Stefan Pohl Computer Chess

private website for chessengine-tests


Stockfish Regression Testing with long thinking time 

(10min+3sec = average game duration 30 minutes!!!)

 

Latest testrun: Stockfish 220620
Reference point (opponent) is the latest official SF-release (Stockfish 15 right now).
Each SF-dev version plays 2000 games versus this engine

 

Hardware: AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM. 20 games are played simultaneously

Speed: Singlethread, TurboBoost-mode switched off, chess starting position: Stockfish 15: 750000 n/s

Hash: 512MB per engine

GUI: Cutechess-cli (the GUI ends the game when a 5-piece endgame is on the board)

Tablebases: None for engines, 5-piece Syzygy for cutechess-cli

Openings: Because high draw rates had made it impossible to measure Elo progress in regression tests, my UHO_2022_6mvs_+120_+129 openings are used here (part of my UHO 2022 download).

Ponder, Large Memory Pages & learning: Off

Thinking time: 10min+3sec per game/engine (average game duration: around 30 minutes). One 2000-game testrun takes about 47 hours.

The version numbers of the Stockfish engines are the date of the latest patch included in the Stockfish source code (not the release date of the engine file), written backwards (year, month, day) (example: 200807 = August 7, 2020).

The SF compile used is the AVX2 compile, which is the fastest on my AMD Ryzen CPU. SF binaries are taken from abrok.eu (except the official SF release versions, which are taken from the official Stockfish website).
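The version-number convention described above can be decoded mechanically. The following is only a small sketch: the function name parse_sf_version is hypothetical, and it assumes every tag is exactly six digits and refers to a year in the 2000s.

```python
from datetime import date


def parse_sf_version(tag: str) -> date:
    """Decode a six-digit YYMMDD version tag such as '200807'.

    Assumption (not stated in the text): all tags belong to the 2000s,
    so the two-digit year is offset by 2000.
    """
    if len(tag) != 6 or not tag.isdigit():
        raise ValueError(f"expected six digits, got {tag!r}")
    year = 2000 + int(tag[:2])
    month = int(tag[2:4])
    day = int(tag[4:6])
    # date() itself rejects impossible dates such as month 13.
    return date(year, month, day)


if __name__ == "__main__":
    for tag in ("200807", "220620"):
        print(tag, "->", parse_sf_version(tag).isoformat())
```

Sorting the tags as plain strings also sorts the versions chronologically, which is a practical benefit of writing the date in year, month, day order.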

 

ORDO calculation fixed to reference-engine (Elo = 0)

You can download all played games from my Google-Drive. Download here

 

Here is the progress in regression testing since Stockfish 15 (2022/04/18), with the Elo of SF 15 set to 0:

     Program                    Elo    +    -  Games    Score   Av.Op. Draws

   1 Stockfish 220515 avx2    :   14   10   10  2000    52.0%      0   51.0%
   2 Stockfish 220602 avx2    :   13   10   10  2000    51.9%      0   51.8%
   3 Stockfish 220607 avx2    :   12   11   11  2000    51.8%      0   49.5%
   4 Stockfish 220529 avx2    :   12   10   10  2000    51.7%      0   53.0%
   5 Stockfish 220620 avx2    :    8   10   10  2000    51.1%      0   50.0%
   6 Stockfish 15 220418      :    0    3    3 26000    57.6%    -57   49.6%
   7 Stockfish 220422 avx2    :   -3   10   10  2000    49.5%      0   51.5%
   8 Stockfish 220504 avx2    :   -7   10   10  2000    49.0%      0   49.3%
   9 Stockfish 14.1 211028    :  -53   10   10  2000    42.5%      0   49.8%
  10 Stockfish 14 210702      :  -98   11   11  2000    36.4%      0   52.5%
  11 KomodoDragon 3 avx2      : -108   11   11  2000    35.1%      0   49.0%
  12 Stockfish 13 210218      : -134   11   11  2000    31.8%      0   49.5%
  13 Stockfish 12 200902      : -169   12   12  2000    27.6%      0   49.3%
  14 Stockfish final HCE      : -231   13   13  2000    21.1%      0   38.9%
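As a rough plausibility check of the list above, the Elo column can be compared with the standard logistic Elo relation applied to the Score column. This is only a sketch: elo_from_score is a hypothetical helper, it is not ORDO's actual fitting procedure, and it assumes that each engine played only the reference engine, which is fixed at Elo 0. Because the Score values in the table are rounded, small deviations from the Elo column are to be expected.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model.

    score is the fraction of points won (0 < score < 1) against an
    opponent whose rating is taken as 0.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Score percentages copied from the rating list above.
    rows = [
        ("Stockfish 220515 avx2", 52.0),
        ("Stockfish 14.1 211028", 42.5),
        ("Stockfish final HCE", 21.1),
    ]
    for name, percent in rows:
        print(f"{name:24s} {elo_from_score(percent / 100.0):7.1f}")
```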


Games        : 26000 (finished)

White Wins   : 12805 (49.3 %)
Black Wins   : 293 (1.1 %)
Draws        : 12902 (49.6 %)

Stockfish final HCE (date 2020/07/31) was the last SF dev version before the NNUE neural nets were introduced. So this engine is (and perhaps will stay forever?) the strongest HCE (Hand Crafted Eval) engine on the planet, apart from newer Stockfish engines with the NNUE eval switched off.


Below is the regression gamebase recalculated with my Gamepairs Rescorer Batch-Tool, which realizes the idea of Vondele (Stockfish maintainer): "Thinking uniquely in game pairs makes sense with the biased openings used these days. While pentanomial makes sense it is a bit complicated so we could simplify and score game pairs only (not games) as W-L-D (a traditional score of 2-0, or 1.5-0.5 is just a W)."

 

   # PLAYER                   :  RATING  ERROR  PLAYED     W    D    L  Score

   1 Stockfish 220515 avx2    :      29     15    1000   259  564  177  54.1%
   2 Stockfish 220602 avx2    :      26     15    1000   258  558  184  53.7%
   3 Stockfish 220607 avx2    :      24     15    1000   249  571  180  53.5%
   4 Stockfish 220529 avx2    :      24     15    1000   262  543  195  53.4%
   5 Stockfish 220620 avx2    :      15     15    1000   242  559  199  52.1%
   6 Stockfish 15 220418      :       0
   7 Stockfish 220422 avx2    :      -7     15    1000   208  565  227  49.0%
   8 Stockfish 220504 avx2    :     -14     14    1000   207  545  248  48.0%
   9 Stockfish 14.1 211028    :    -109     16    1000   108  482  410  34.9%
  10 Stockfish 14 210702      :    -211     18    1000    33  396  571  23.1%
  11 KomodoDragon 3 avx2      :    -239     18    1000    27  354  619  20.4%
  12 Stockfish 13 210218      :    -307     20    1000    12  271  717  14.8%
  13 Stockfish 12 200902      :    -466     29    1000     2  127  871   6.5%
  14 Stockfish final HCE      :    -669     54    1000     0   43  957   2.1%
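The game-pair scoring quoted above can be sketched in a few lines. This is not the Gamepairs Rescorer Tool itself, only an illustration under assumptions: the function names are hypothetical, and the input is assumed to be one engine's per-game results (1, 0.5 or 0) ordered so that two consecutive games form one pair played from the same opening with colours reversed.

```python
from collections import Counter


def rescore_pairs(results):
    """Collapse per-game results into W/D/L counts per game pair.

    A pair total above 1 point (2-0 or 1.5-0.5) counts as one W,
    exactly 1 point as one D, and anything below 1 point as one L.
    """
    if len(results) % 2 != 0:
        raise ValueError("results must contain complete game pairs")
    tally = Counter(W=0, D=0, L=0)
    for i in range(0, len(results), 2):
        pair_total = results[i] + results[i + 1]
        if pair_total > 1:
            tally["W"] += 1
        elif pair_total < 1:
            tally["L"] += 1
        else:
            tally["D"] += 1
    return tally


def pair_score(tally):
    """Score fraction over pairs: a W counts 1, a D counts 0.5."""
    pairs = tally["W"] + tally["D"] + tally["L"]
    return (tally["W"] + 0.5 * tally["D"]) / pairs


if __name__ == "__main__":
    # Six made-up pairs: 2-0, 1.5-0.5, 1-1, 1-1, 0.5-1.5, 0-2.
    games = [1, 1, 1, 0.5, 1, 0, 0.5, 0.5, 0.5, 0, 0, 0]
    tally = rescore_pairs(games)
    print(dict(tally), pair_score(tally))
```

Applied to the W, D and L columns of the first row of the list above (259, 564, 177), pair_score gives 0.541, which agrees with the 54.1% shown in the Score column.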

 

You can download my Gamepairs Rescorer Tool right here