Stefan Pohl Computer Chess

Private website for chess engine tests


Latest website news (2019/09/16): The testrun of Scorpio 3 NN Maddex-Int8 (xboard version) has finished. See the results and download the games in the "Lc0 / NN testing" section. Next NN testrun: Allie 0.5dev with the Leelenstein 10.2 net.

The testrun of Stockfish 190915 is still running.

 

Important news: From now on, all testruns are played with Cutechess-cli (V1.1.0 (190724)). Thanks to T. Plaschke, who compiled this version: V1.0 cannot play an engine gauntlet in which all opponent engines play the same openings (and replay them with reversed colors) sequentially out of a small opening set. You can download this new Cutechess-cli (Windows, 64-bit) in the "Downloads & Links" section. The new parameter is: policy=round

 

 

Stay tuned.

 


Stockfish testing

 

Playing conditions:

 

Hardware: i7-6700HQ 2.6GHz Notebook (Skylake CPU), Windows 10 64bit, 8GB RAM

Fritzmark (single thread): 5.3 / 2521 (all engines run on one thread only). Average speed in meganodes/s as displayed by LittleBlitzerGUI: Houdini: 2.6 Mn/s, Stockfish: 2.2 Mn/s, Komodo: 2.0 Mn/s

Hash: 512MB per engine

GUI: Since 2019/09/11: Cutechess-cli (draw at 250 moves; the GUI ends the game when a 5-piece endgame is on the board). Before: LittleBlitzerGUI (draw at 170 moves, resign at -700cp)

Tablebases: None for the engines, 5-piece Syzygy for Cutechess-cli

Openings: HERT testset (by Thomas Zipproth). Download the file in the "Download & Links" section or here. I use a version of HERT in which the positions in the file are ordered differently. This makes no difference to the test results, so don't be confused if you download my gamebase file and the game sequence doesn't match the sequence of your HERT set.

Ponder, Large Memory Pages & learning: Off

Thinking time: 180''+1000ms (= 3'+1'') per game/engine (average game duration: around 7.5 minutes). One testrun of 5000 games takes about 7 days.

The version numbers of the Stockfish engines are the date of the latest patch that was included in the Stockfish source code, written backwards (year, month, day), not the release date of the engine file. Example: 170526 = May 26, 2017.

Since July 2018 I have been using the abrok compiles of Stockfish again (http://abrok.eu/stockfish), because they are now much faster than before: only 1.3% slower than the BrainFish compiles. So there is no longer any reason not to use these "official" development compiles.

Download BrainFish (and the Cerebellum libraries) here
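The two duration figures quoted above (about 7.5 minutes per game, about 7 days per 5000-game testrun) can be cross-checked with a little arithmetic. The sketch below only uses those numbers; the function names are made up for illustration, and the conclusion that several games run side by side is an inference from the figures, not something stated on this page.

```python
GAMES_PER_TESTRUN = 5000   # from the playing conditions
AVG_GAME_MINUTES = 7.5     # "around 7.5 minutes"
TESTRUN_DAYS = 7           # "about 7 days"

def sequential_days(games, minutes_per_game):
    """Days needed if the games were played strictly one after another."""
    return games * minutes_per_game / (60 * 24)

def implied_parallel_games(games, minutes_per_game, wall_days):
    """How many games would have to run at the same time to finish in wall_days."""
    return sequential_days(games, minutes_per_game) / wall_days

if __name__ == "__main__":
    print(round(sequential_days(GAMES_PER_TESTRUN, AVG_GAME_MINUTES), 1))
    print(round(implied_parallel_games(GAMES_PER_TESTRUN, AVG_GAME_MINUTES, TESTRUN_DAYS), 1))
```

Played one at a time, 5000 games would need roughly 26 days, so a 7-day testrun corresponds to between three and four games running concurrently on average. Both input figures are rounded, so this is only a rough estimate.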

 

Each Stockfish version plays 1000 games against each of Komodo 13.1, Houdini 6, Fire 7.1, Xiphos 0.5.6 and Ethereal 11.53. All engines run with default settings.

To avoid distortions in the Ordo Elo calculation, from now on only 2x Stockfish (the latest official release + the latest version), 1x asmFish and 1x BrainFish are stored in the gamebase (the games of all older engine versions are deleted every time a new version has been tested). Older Elo results of Stockfish, asmFish and BrainFish can still be seen in the Elo diagrams below. BrainFish always plays with the latest Cerebellum libraries, of course, because otherwise BrainFish = Stockfish.
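For readers who want to set up a similar gauntlet, here is a minimal sketch of how the playing conditions above might translate into a Cutechess-cli call. It is not the command actually used for these tests. All engine commands, file names and the tablebase path are placeholders, the opening-file format is left as a parameter because it is not stated here, and the value rounds=500 (500 rounds of 2 games = 1000 games per opponent) is an assumption. The flag names are the ones I know from the Cutechess-cli help text; please check "cutechess-cli -help" for your version.

```python
def gauntlet_args(tested, opponents, openings_file, openings_format,
                  tb_path, pgn_out, rounds=500):
    """Build a cutechess-cli argument list (sketch; all paths/names are placeholders)."""
    args = ["cutechess-cli", "-tournament", "gauntlet"]
    # In gauntlet mode the first engine listed plays against all the others.
    for name, cmd in [tested] + list(opponents):
        args += ["-engine", f"name={name}", f"cmd={cmd}"]
    # Settings shared by all engines: 3'+1'' time control, 512MB hash.
    args += ["-each", "proto=uci", "tc=180+1", "option.Hash=512"]
    # policy=round is the new parameter mentioned in the news above.
    args += ["-openings", f"file={openings_file}", f"format={openings_format}",
             "order=sequential", "policy=round"]
    # Two games per opening, replayed with reversed colors.
    # rounds=500 is an assumption chosen to give 1000 games per opponent.
    args += ["-games", "2", "-repeat", "-rounds", str(rounds)]
    # Adjudication: draw at 250 moves, 5-piece Syzygy tablebases for the GUI only.
    args += ["-maxmoves", "250", "-tb", tb_path, "-tbpieces", "5"]
    args += ["-pgnout", pgn_out]
    return args

if __name__ == "__main__":
    # Hypothetical executable names, for illustration only.
    tested = ("Stockfish 190826", "stockfish.exe")
    opponents = [("Komodo 13.1", "komodo.exe"), ("Houdini 6", "houdini.exe"),
                 ("Fire 7.1", "fire.exe"), ("Xiphos 0.5.6", "xiphos.exe"),
                 ("Ethereal 11.53", "ethereal.exe")]
    print(" ".join(gauntlet_args(tested, opponents, "openings.pgn", "pgn",
                                 "C:/syzygy", "games.pgn")))
```

Building the argument list in a small function keeps the per-engine part (name and command) separate from the shared settings, so swapping in a new Stockfish version only changes one tuple.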

 

Latest update: 2019/09/14: Xiphos 0.5.6 (+9 Elo over Xiphos 0.5.3)

 

(Ordo-calculation fixed to Stockfish 10 = 3508 Elo)

 

See the individual statistics of engine-results here

See the ORDO-rating of the archive-gamebase since 2019 here

Download the current gamebase here

Download the archive-gamebase since 2019 here

 

     Program                      Elo    +    -   Games   Score   Av.Op.  Draws

   1 BrainFish-2 190726 bmi2    : 3587    9    9  5000    80.5 %   3329   36.8 %
   2 Stockfish 190826 bmi2      : 3542    8    8  5000    76.3 %   3330   42.1 %
   3 Stockfish 10 181129        : 3508    5    5 15000    76.4 %   3293   40.7 %
   4 Stockfish 9 180201         : 3460    8    8  5000    74.9 %   3257   41.7 %
   5 Houdini 6 pext             : 3430    4    4 20000    63.8 %   3322   49.6 %
   6 Komodo 13.1 bmi2           : 3407    6    6  8000    56.3 %   3360   53.6 %
   7 Komodo 13.01 bmi2          : 3402    6    6  8000    55.7 %   3361   52.7 %
   8 Komodo 12.3 bmi2           : 3395    6    6  8000    63.2 %   3294   49.8 %
   9 Komodo 13.1 MCTS           : 3313    6    6  7000    45.5 %   3348   54.3 %
  10 Komodo 13.01 MCTS          : 3296    7    7  6000    42.2 %   3356   54.4 %
  11 Fire 7.1 popc              : 3281    4    4 20000    44.1 %   3329   52.5 %
  12 Xiphos 0.5.6 bmi2          : 3276    7    7  6000    37.9 %   3368   52.9 %
  13 Xiphos 0.5.3 bmi2          : 3267    5    5 13000    35.8 %   3380   49.7 %
  14 Ethereal 11.53 pext        : 3266    6    6 10000    33.2 %   3401   47.9 %
  15 Komodo 12.3 MCTS           : 3260    6    6  8000    43.7 %   3311   47.1 %
  16 Ethereal 11.25 pext        : 3251    6    6  9000    38.7 %   3338   51.8 %
  17 Laser 1.7 bmi2             : 3201    7    7  6000    30.8 %   3354   45.8 %
  18 Fizbo 2 bmi2               : 3197    8    8  5000    36.0 %   3310   39.0 %
  19 Shredder 13 x64            : 3194    8    8  6000    31.9 %   3343   42.6 %
  20 Booot 6.3.1 popc           : 3184    8    8  5000    34.0 %   3312   44.1 %
  21 Andscacs 0.95 popc         : 3151    8    8  5000    23.1 %   3375   35.4 %
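As a reading aid for the table above, the following sketch shows two standard relationships between its columns. The first splits a score percentage into wins and losses, using the fact that a draw counts as half a point. The second is the usual logistic Elo curve for the expected score at a given rating difference. This is not how Ordo computes the list (Ordo works on the individual games, not on the average opponent), so the second function only gives a rough plausibility check. The function names are made up for illustration.

```python
def win_loss_split(score_pct, draw_pct):
    """Split a score percentage into (win %, loss %); a draw counts as half a point."""
    win = score_pct - draw_pct / 2
    loss = 100 - win - draw_pct
    return win, loss

def expected_score(elo_diff):
    """Standard logistic Elo expectation for a rating advantage of elo_diff."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

if __name__ == "__main__":
    # Row 1 of the table: BrainFish-2 190726, Score 80.5 %, Draws 36.8 %
    win, loss = win_loss_split(80.5, 36.8)
    print(f"wins {win:.1f} %, losses {loss:.1f} %")
    # Same row: Elo 3587 against an average opponent of 3329
    print(f"expected score {100 * expected_score(3587 - 3329):.1f} %")
```

For the BrainFish row this gives about 62.1 % wins and 1.1 % losses, and a logistic expectation of roughly 81.5 % against an average opponent of 3329, close to the 80.5 % actually scored.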

The version numbers of the engines (180622, for example) are the date of the latest patch that was included in the Stockfish source code, not the release date of the engine file. The asmFish engines in particular are often released much later!

Below is a diagram of the progress of Stockfish in my tests since the end of 2018.

Below that diagram are the older diagrams.
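The version-number convention can be decoded mechanically. The sketch below reads a six-digit number as year, month, day and also pulls such a number out of an engine name as it appears in the rating list. The helper names are made up for illustration, and the rule "a standalone group of exactly six digits is the patch date" is an assumption that fits the names in the table.

```python
import re
from datetime import datetime, date
from typing import Optional

def patch_date(version: str) -> date:
    """Decode a version number like '170526' (year, month, day) into a date."""
    return datetime.strptime(version, "%y%m%d").date()

def patch_date_from_name(engine_name: str) -> Optional[date]:
    """Return the patch date found in an engine name, or None if there is none."""
    match = re.search(r"\b(\d{6})\b", engine_name)
    return patch_date(match.group(1)) if match else None

if __name__ == "__main__":
    print(patch_date("170526"))                           # the example from the text
    print(patch_date_from_name("Stockfish 190826 bmi2"))
    print(patch_date_from_name("Komodo 13.1 bmi2"))       # no date in this name
```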

 

You can save the diagrams (as a JPG picture in original size) on your PC: right-click on the diagram and then choose "save image"...

The Elo ratings of older Stockfish dev versions in the Ordo calculation can differ a little from the Elo "dots" in the diagram. When the games of a new Stockfish dev version become part of the Ordo calculation, they can change the Elo ratings of the opponent engines, and that in turn can change the Elo ratings of older Stockfish dev versions in the Ordo rating list. The diagram is not affected: each Elo "dot" is the rating of one Stockfish dev version at the moment when the testrun of that version was finished.

 

 

 

