Stefan Pohl Computer Chess

private website for chessengine-tests


Stockfish Regression Testing

 

 

Latest testrun: Stockfish 210915
The reference opponent is the latest official SF release (currently Stockfish 14).
Each SF-dev version plays 30000 games against this engine.

 

Hardware: AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM. 20 games are played simultaneously.

Speed (single thread, TurboBoost mode switched off, chess starting position): Stockfish 11: 1.3 mn/s, Komodo 14: 1.1 mn/s

Hash: 128MB per engine

GUI: Cutechess-cli (the GUI ends the game when a 5-piece endgame is on the board)

Tablebases: None for engines, 5-piece Syzygy for cutechess-cli

Openings: The short time control and the modest hardware spread the Elo distances, so very balanced openings are used, because playing very balanced openings shrinks the Elo distances. The openings are 8-move human openings (both players 2400+ Elo) from Megabase 2020. KomodoDragon 1.0 analyzed each end position on a quad-core PC with 15 seconds of thinking time, and its eval had to lie in a very small, balanced interval of [+0.01;+0.15]. Using these balanced openings works very well: in a 30000-game regression test run of SF 14 vs. SF 13, the result was +36 Elo for SF 14, which fits very well with my regular Stockfish testing versus other engines (on my main site), where SF 14 is +34 Elo better than SF 13.
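The eval filter described above can be sketched in a few lines of Python. This is only an illustration: the interval [+0.01;+0.15] comes from the text, while the function names, the data layout and the sample values are hypothetical and not taken from the actual selection tooling.

```python
# Interval bounds as stated in the text (in pawns, from White's point of view).
EVAL_MIN = 0.01
EVAL_MAX = 0.15


def is_balanced(eval_pawns: float) -> bool:
    """Return True if an end-position eval lies inside the accepted interval."""
    return EVAL_MIN <= eval_pawns <= EVAL_MAX


def filter_openings(evaluated):
    """Keep the ids of openings whose end-position eval is in the interval.

    `evaluated` is an iterable of (opening_id, eval_in_pawns) pairs; this
    layout is an assumption made for the sketch.
    """
    return [opening_id for opening_id, ev in evaluated if is_balanced(ev)]


# Hypothetical sample data, not real analysis results.
sample = [
    ("opening_a", 0.08),
    ("opening_b", 0.00),
    ("opening_c", 0.15),
    ("opening_d", 0.31),
    ("opening_e", -0.05),
]

print(filter_openings(sample))
```

Both interval bounds are treated as inclusive here, matching the closed-bracket notation in the text.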

Ponder, Large Memory Pages & learning: Off
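For readers who want to set up something similar, the settings on this page could be passed to cutechess-cli roughly as follows. This is a sketch, not the command actually used for these test runs: the file paths and engine names are placeholders, reading the time control as 20 seconds plus a 0.2 second increment (tc=20+0.2) is an assumption, and so is playing each opening twice with reversed colors (-games 2 -repeat).

```python
def build_cutechess_command(dev_engine, ref_engine, openings_pgn,
                            syzygy_dir, games=30000):
    """Build an argument list for cutechess-cli (all paths are placeholders)."""
    rounds = games // 2  # assumption: 2 games per round, colors reversed
    return [
        "cutechess-cli",
        "-engine", f"cmd={dev_engine}", "name=dev",
        "-engine", f"cmd={ref_engine}", "name=ref",
        # Settings applied to both engines: UCI protocol, time control, hash.
        "-each", "proto=uci", "tc=20+0.2", "option.Hash=128",
        # 20 games are played simultaneously.
        "-concurrency", "20",
        # Syzygy adjudication by the GUI for positions with 5 pieces or fewer.
        "-tb", syzygy_dir, "-tbpieces", "5",
        "-openings", f"file={openings_pgn}", "format=pgn", "order=random",
        "-games", "2", "-rounds", str(rounds), "-repeat",
        "-pgnout", "games.pgn",
    ]


cmd = build_cutechess_command(
    "./stockfish_dev", "./stockfish_14", "openings.pgn", "/path/to/syzygy"
)
print(" ".join(cmd))
```

Ponder, large memory pages and learning are not set explicitly in this sketch, on the assumption that they are off unless enabled.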

Thinking time: 20 sec + 200 ms per game/engine (average game duration: around 1 minute). One 30000-game test run takes about 30 hours.

The version numbers of the Stockfish engines are the date of the latest patch included in the Stockfish source code (not the release date of the engine file), written backwards as year, month, day (example: 200807 = August 7, 2020). The SF compile used is the AVX2 compile, which is the fastest on my AMD Ryzen CPU. SF binaries are taken from abrok.eu (except the official SF release versions, which are taken from the official Stockfish website).
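The version-number scheme (year, month, day, two digits each) can be decoded with the Python standard library. A minimal sketch; the function name is hypothetical:

```python
from datetime import date, datetime


def decode_version(version: str) -> date:
    """Turn a six-digit version number (YYMMDD) into a calendar date."""
    return datetime.strptime(version, "%y%m%d").date()


print(decode_version("200807"))  # 2020-08-07
```

Because the digits run from year to day, sorting the version numbers as plain strings also sorts the engines by patch date.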

 

ORDO calculation fixed to reference-engine (Elo = 0)

     Program                      Elo      +      -   Games   Score   Av.Op.  Draws

   1 Stockfish 210827 avx2    :   19.4    2.9    2.9 30000    52.8 %      0   88.1 %
   2 Stockfish 210915 avx2    :   18.3    2.7    2.7 30000    52.6 %      0   88.7 %
   3 Stockfish 210831 avx2    :   18.1    2.6    2.6 30000    52.6 %      0   88.5 %
   4 Stockfish 210906 avx2    :   17.8    2.8    2.8 30000    52.5 %      0   88.7 %
   5 Stockfish 210912 avx2    :   17.8    2.8    2.8 30000    52.5 %      0   88.7 %
   6 Stockfish 210910 avx2    :   17.8    2.6    2.6 30000    52.5 %      0   88.7 %
   7 Stockfish 210907 avx2    :   17.0    3.0    3.0 30000    52.4 %      0   88.9 %
   8 Stockfish 210822 avx2    :   16.0    2.7    2.7 30000    52.3 %      0   89.1 %
   9 Stockfish 210818 avx2    :   13.3    2.8    2.8 30000    51.9 %      0   89.1 %
  10 Stockfish 210815 avx2    :   10.3    2.6    2.6 30000    51.5 %      0   88.9 %
  11 Stockfish 210805 avx2    :    5.3    2.6    2.6 30000    50.8 %      0   91.7 %
  12 Stockfish 210730 avx2    :    3.9    2.7    2.7 30000    50.6 %      0   92.1 %
  13 Stockfish 210731 avx2    :    3.8    2.8    2.8 30000    50.5 %      0   91.5 %
  14 Stockfish 210726 avx2    :    3.4    2.6    2.6 30000    50.5 %      0   91.8 %
  15 Stockfish 210703 avx2    :    2.0    2.7    2.7 30000    50.3 %      0   91.9 %
  16 Stockfish 210713 avx2    :    1.1    2.8    2.8 30000    50.2 %      0   92.2 %
  17 Stockfish 210724 avx2    :    1.1    2.7    2.7 30000    50.2 %      0   91.7 %
  18 Stockfish 14 210702      :    0.0    0.6    0.6 540000   48.8 %      8   89.6 %
  19 Stockfish 13 210218      :  -36.3    2.8    2.8 30000    44.8 %      0   82.6 %
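As a rough cross-check of the table, the Score column can be converted to an Elo difference with the standard logistic Elo formula. This sketch does not reproduce Ordo's own calculation; it only shows that, for a match against a single reference engine fixed at Elo 0, the simple formula gives values close to the Elo column. The function name is hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1).

    Uses the logistic Elo relation: score = 1 / (1 + 10 ** (-elo / 400)).
    """
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score fractions taken from the table above.
print(round(elo_from_score(0.528), 1))  # Stockfish 210827, table: 19.4
print(round(elo_from_score(0.448), 1))  # Stockfish 13, table: -36.3
```

Small deviations from the table are expected, since the Score column is rounded to one decimal place.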


Games        : 540000 (finished)

White Wins   : 34161 (6.3 %)
Black Wins   : 21987 (4.1 %)
Draws        : 483852 (89.6 %)
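The percentages above follow directly from the game counts. A small sketch to recompute them (variable and function names are hypothetical):

```python
# Game counts as listed above.
white_wins = 34161
black_wins = 21987
draws = 483852

total = white_wins + black_wins + draws


def share(count: int, total_games: int) -> float:
    """Percentage of all games, rounded to one decimal place."""
    return round(100.0 * count / total_games, 1)


print(total)
print(share(white_wins, total), share(black_wins, total), share(draws, total))
```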

 

You can download all played games from my Google Drive. Note: I deleted all comments in the PGN files (eval, search depth, etc.) because the files would otherwise be too big. Download here

 

Here is the progress in regression testing since Stockfish 14 (2021/07/02) in a diagram, with the Elo of SF 14 set to 3757 (the Elo number of SF 14 in the regular Stockfish testing on the main site vs. other engines):