Stefan Pohl Computer Chess
Private website for chess engine tests

Stockfish Regression Testing with long thinking time (10min+3sec = average game duration 30 minutes!)
Latest testrun: Stockfish 220504
Hardware: AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM. 20 games are played simultaneously.
Speed: Single thread, TurboBoost mode switched off, chess starting position: Stockfish 15: 750000 n/s
Hash: 512MB per engine
GUI: Cutechess-cli (the GUI ends a game when a 5-piece endgame is on the board)
Tablebases: None for engines, 5-piece Syzygy for cutechess-cli
Openings: Because high draw rates have made it impossible to measure Elo progress in regression tests, my UHO_2022_6mvs_+120_+129 openings are used here (part of my UHO 2022 download).
Ponder, Large Memory Pages & learning: Off
Thinking time: 10min+3sec per game/engine (average game duration: around 30 minutes). One 2000-game testrun takes about 47 hours.

The version numbers of the Stockfish engines are the date of the latest patch included in the Stockfish source code (not the release date of the engine file), written backwards as year, month, day (example: 200807 = August 7, 2020). The SF compile used is the AVX2 compile, which is the fastest on my AMD Ryzen CPU. SF binaries are taken from abrok.eu (except the official SF release versions, which are taken from the official Stockfish website).
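The version-number convention and the testrun duration can be checked with a few lines of Python. This is only a minimal sketch: the function names `decode_sf_version` and `estimated_run_hours` are made up for illustration and are not part of any tool used on this site. The duration estimate assumes that all 20 game slots are busy the whole time.

```python
from datetime import date, datetime


def decode_sf_version(version: str) -> date:
    """Decode a six-digit version number (year, month, day), e.g. '200807'."""
    return datetime.strptime(version, "%y%m%d").date()


def estimated_run_hours(games: int, concurrent: int, avg_game_minutes: float) -> float:
    """Rough wall-clock estimate, assuming every game slot is always in use."""
    return games / concurrent * avg_game_minutes / 60


print(decode_sf_version("200807"))        # 2020-08-07
print(estimated_run_hours(2000, 20, 30))  # 50.0
```

With exactly 30 minutes per game the estimate is 50 hours, so the observed 47 hours corresponds to an average game duration of roughly 28 minutes, which fits "around 30 minutes".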
ORDO calculation fixed to reference engine (Elo = 0)

You can download all played games from my Google Drive.
Here is the progress in regression testing since Stockfish 15 (2022/04/18), with the Elo of SF 15 set to 0, in a diagram:

![]()

   Program                   Elo    +    -   Games   Score   Av.Op.   Draws
 1 Stockfish 15 220418   :     0    4    4   16000   63.4%    -100    48.7%
White Wins : 7926 (49.5 %)

Stockfish final HCE (date 2020/07/31) was the last SF dev version before the NNUE neural nets were introduced. So this engine is (and perhaps will stay forever?) the strongest HCE (Hand Crafted Eval) engine on the planet, apart from newer Stockfish engines with the NNUE eval switched off.

Below is the regression game base recalculated with my Gamepairs Rescorer Batch-Tool. It implements the idea of Vondele (Stockfish maintainer): "Thinking uniquely in game pairs makes sense with the biased openings used these days. While pentanomial makes sense it is a bit complicated so we could simplify and score game pairs only (not games) as W-L-D (a traditional score of 2-0, or 1.5-0.5 is just a W)."
 # PLAYER                :  RATING  ERROR  PLAYED  W  D  L  Score
 1 Stockfish 15 220418   :       0
You can download my Gamepairs Rescorer Tool right here
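The game-pair scoring idea quoted above can be sketched in a few lines of Python. This is not the code of the Gamepairs Rescorer Tool, only an illustration under stated assumptions: the function name `rescore_game_pairs` is hypothetical, the results are given from one engine's point of view (1 = win, 0.5 = draw, 0 = loss), and two consecutive games always form one pair (same opening, colors reversed).

```python
from collections import Counter


def rescore_game_pairs(results):
    """Score consecutive game pairs as W/D/L instead of scoring single games.

    Assumption: results[0] and results[1] form the first pair,
    results[2] and results[3] the second pair, and so on.
    """
    if len(results) % 2 != 0:
        raise ValueError("need an even number of games to form pairs")

    tally = Counter()
    for i in range(0, len(results), 2):
        pair_score = results[i] + results[i + 1]
        if pair_score > 1:      # 2-0 or 1.5-0.5 counts as one pair win
            tally["W"] += 1
        elif pair_score < 1:    # 0-2 or 0.5-1.5 counts as one pair loss
            tally["L"] += 1
        else:                   # 1-1 counts as one pair draw
            tally["D"] += 1
    return tally["W"], tally["D"], tally["L"]


# Five pairs: 1.5, 1.0, 1.0, 0.5, 2.0
games = [1, 0.5, 1, 0, 0.5, 0.5, 0, 0.5, 1, 1]
print(rescore_game_pairs(games))  # (2, 2, 1)
```

Note that a pair with one win and one loss is scored as a pair draw, the same as two drawn games. With biased openings this is the intended effect: winning only the favorable side of an opening proves nothing.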