Stefan Pohl Computer Chess

private website for chessengine-tests


Stockfish Regression Testing

 

 

Latest testrun: Stockfish 210713
The reference point (opponent) is the latest official SF release (currently Stockfish 14).
Each SF-dev version plays 30000 games versus this engine.

 

Hardware: AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM. 20 games are played simultaneously

Speed: Singlethread, TurboBoost-mode switched off, chess starting position: Stockfish: 1.3 mn/s, Komodo: 1.1 mn/s

Hash: 128MB per engine

GUI: Cutechess-cli (the GUI ends the game when a 5-piece endgame is on the board)

Tablebases: None for engines, 5-piece Syzygy for cutechess-cli

Openings: The short time control and the not so beefy hardware spread the Elo distances, so very balanced openings are used, because playing very balanced openings shrinks the Elo distances. The openings are 8-move human openings (both players 2400+ Elo) out of Megabase 2020. KomodoDragon 1.0 analyzed each end position on a quad-core PC with 15 seconds of thinking time, and the eval of KomodoDragon had to be in a very small, balanced interval of [+0.01;+0.15]. Using these balanced openings works very well: in a 30000-game regression testrun of SF 14 vs. SF 13, the result was +36 Elo for SF 14, which fits very well with my regular Stockfish tests versus other engines (on my main site), where SF 14 is +34 Elo better than SF 13.
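
The eval filter described above can be illustrated with a small sketch. It is not the script used for these tests; the `Opening` class, its field names and the sample lines are hypothetical, and evals are stored in centipawns (so +0.01 to +0.15 becomes 1 to 15) only to avoid floating-point comparisons.

```python
from dataclasses import dataclass

# Interval from the text: the eval must lie in [+0.01; +0.15] pawns,
# expressed here in centipawns.
EVAL_MIN_CP = 1
EVAL_MAX_CP = 15


@dataclass
class Opening:
    name: str      # hypothetical label for the opening line
    eval_cp: int   # hypothetical: engine eval of the end position, in centipawns


def is_balanced(opening: Opening, lo: int = EVAL_MIN_CP, hi: int = EVAL_MAX_CP) -> bool:
    """Return True if the eval lies inside the closed interval [lo, hi]."""
    return lo <= opening.eval_cp <= hi


def select_balanced(openings: list[Opening]) -> list[Opening]:
    """Keep only the openings whose eval passes the balance filter."""
    return [o for o in openings if is_balanced(o)]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Opening("line A", 8),
        Opening("line B", 31),
        Opening("line C", -5),
        Opening("line D", 15),
    ]
    for o in select_balanced(sample):
        print(o.name, o.eval_cp)
```

Both interval ends are treated as inclusive here, which is how the bracket notation in the text reads.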

Ponder, Large Memory Pages & learning: Off

Thinking time: 20 sec + 200 ms per game/engine (average game duration: around 1 minute). One 30000-game testrun takes about 25 hours.

Version numbers: The version number of a Stockfish engine is the date of the latest patch that was included in the Stockfish source code (not the release date of the engine file), written backwards (year, month, day). Example: 200807 = August 7, 2020. The SF compile used is the AVX2 compile, which is the fastest on my AMD Ryzen CPU. SF binaries are taken from abrok.eu (except the official SF release versions, which are taken from the official Stockfish website).
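
As a small illustration of the two points above, the following sketch decodes a version number into a date and checks the stated run time from the figures given on this page (30000 games, 20 games in parallel, about 1 minute per game). The function names are hypothetical, and the decoder assumes all dates fall in the years 2000 to 2099.

```python
from datetime import date


def parse_sf_version(tag: str) -> date:
    """Decode a 6-digit version number (year, month, day) into a date.

    Assumption: two-digit years belong to the 2000s.
    """
    if len(tag) != 6 or not tag.isdigit():
        raise ValueError(f"expected 6 digits, got {tag!r}")
    year, month, day = int(tag[:2]), int(tag[2:4]), int(tag[4:])
    return date(2000 + year, month, day)


def estimated_run_hours(games: int, concurrency: int, minutes_per_game: float) -> float:
    """Rough wall-clock estimate: games are split evenly over parallel slots."""
    return games / concurrency * minutes_per_game / 60


if __name__ == "__main__":
    print(parse_sf_version("200807"))         # the example from the text
    print(estimated_run_hours(30000, 20, 1))  # figures from this page
```

With 20 games running at once, 30000 games need 1500 rounds of about one minute each, which matches the roughly 25 hours stated above.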

 

ORDO calculation fixed to reference-engine (Elo = 0)

   # PLAYER                   :  RATING  ERROR   POINTS  PLAYED     W      D     L   (%)
   1 Stockfish 210703 avx2    :     2.0    2.4  15083.5   30000  1304  27559  1137  50.3
   2 Stockfish 210713 avx2    :     1.1    2.5  15049.0   30000  1221  27656  1123  50.2
   3 Stockfish 14 210702      :     0.0    1.2  46417.0   90000  6413  80008  3579  51.6
   4 Stockfish 13 210218      :   -36.3    2.3  13450.5   30000  1054  24793  4153  44.8
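
For orientation, a score percentage can be converted into an approximate Elo difference with the standard logistic formula. This is only a sketch: Ordo fits its ratings with its own method, so its numbers will not match this simple conversion exactly. The function name is hypothetical; the input figures are the POINTS and PLAYED columns of the table.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic model: score = 1 / (1 + 10 ** (-elo / 400)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Stockfish 13 row: 13450.5 points out of 30000 games.
    # The result is roughly -36, close to the Ordo rating in the table.
    print(round(elo_from_score(13450.5 / 30000), 1))
```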

 

Games        : 90000 (finished)

 

White Wins   : 6192 (6.9 %)
Black Wins   : 3800 (4.2 %)
Draws        : 80008 (88.9 %)
Unfinished   : 0
White Score  : 51.3 %
Black Score  : 48.7 %
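
The percentages above follow directly from the win and draw counts (a win counts 1 point, a draw half a point). A short sketch for recomputing them; the function name and the dictionary keys are hypothetical:

```python
def summarize(white_wins: int, black_wins: int, draws: int) -> dict[str, float]:
    """Return win, draw and score percentages, rounded to one decimal."""
    games = white_wins + black_wins + draws
    white_points = white_wins + 0.5 * draws
    black_points = black_wins + 0.5 * draws
    return {
        "games": games,
        "white_wins_pct": round(100 * white_wins / games, 1),
        "black_wins_pct": round(100 * black_wins / games, 1),
        "draws_pct": round(100 * draws / games, 1),
        "white_score_pct": round(100 * white_points / games, 1),
        "black_score_pct": round(100 * black_points / games, 1),
    }


if __name__ == "__main__":
    # Counts taken from the summary above.
    for key, value in summarize(6192, 3800, 80008).items():
        print(key, value)
```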

 

You can download all played games from my Google Drive. Note: I deleted all comments in the pgn files (eval, search depth etc.), because the files would be too big otherwise. Download here