Stefan Pohl Computer Chess

private website for chess engine testing


Latest Website-News (2022/12/07): Ratinglist and VLTC regression testruns of Stockfish 15.1 finished: +4 Elo to Stockfish 15 in the ratinglist testrun and +26 Elo to Stockfish 15 in the VLTC UHO regression testrun. Additionally, measurable progress in my EAS ratinglist (Stockfish 15 is on rank 20, and Stockfish 15.1 jumped to rank 8!)

 

And I can confirm the result of the UHO testrun on Fishtest (where Stockfish 15.1 won twice as many game pairs against Stockfish 15 as it lost): in my VLTC UHO testrun vs. Stockfish 15, Stockfish 15.1 won 300 game pairs (of 1000) and lost only 150... so the huge progress is confirmed with long thinking time (10min+3sec per game) and unbalanced openings. This is important for the TCEC Superfinals...

 

Stay tuned.


Stockfish VLTC UHO regression testing (2000 games at 10min+3sec vs. Stockfish 15)

Latest testrun:

Stockfish 15.1 221204:  (+571,=1008,-421)= 53.8% = +26 Elo (+5 Elo to previous test)

Best testrun so far:

Stockfish 220917:  (+571,=1016,-413)= 54.0% = +28 Elo (+6 Elo to previous best)
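The Elo numbers above follow from the W/D/L records via the standard logistic Elo model. A minimal sketch (the function name is mine, not from this site):

```python
import math

def elo_from_wdl(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a W/D/L record (logistic Elo model)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games          # score fraction, e.g. 53.8%
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_from_wdl(571, 1008, 421)))  # latest testrun  -> 26
print(round(elo_from_wdl(571, 1016, 413)))  # best run so far -> 28
```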

See all results, get more information and download the games: Click on the yellow link above...


SPCC Top Engines Ratinglist (+ regular testing of Stockfish Dev-versions)

 

Playing conditions:

 

Hardware: Since 20/07/21 AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM. 

Speed: Stockfish 14.1 at 750 kn/s (single thread, TurboBoost mode switched off, chess starting position, measured while 20 games are running simultaneously)

Hash: 256MB per engine

GUI: Cutechess-cli (the GUI ends a game when a 5-piece endgame is on the board; all other games are played until mate or draw by chess rules (threefold repetition, 50-move rule, stalemate, insufficient material))

Tablebases: None for engines, 5 Syzygy for cutechess-cli

Openings: HERT_500 testset (by Thomas Zipproth) (download the file in the "Download & Links" section or here). Note that the HERT set is not an anti-draw (UHO or similar) opening set, but a classical, balanced opening set.

Ponder, Large Memory Pages & learning: Off

Thinking time: 3min+1sec per game/engine (average game duration: 7min 45sec). One 7000-game testrun takes about 2 days. The version numbers of the Stockfish engines are the date of the latest patch included in the Stockfish source code (not the release date of the engine file), written backwards (year, month, day); example: 200807 = August 7, 2020. The SF compile used is the AVX2 compile, which is the fastest on my AMD Ryzen CPU. SF binaries are taken from abrok.eu (except the official SF release versions, which are taken from the official Stockfish website).
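The conditions above can be reproduced with a cutechess-cli call roughly like the following. This is a sketch under assumptions: engine paths, names and file names are placeholders, and exact option names may differ between cutechess-cli versions (check `cutechess-cli --help`):

```shell
# Sketch of the playing conditions above; paths and names are placeholders.
cutechess-cli \
  -engine cmd=./stockfish-15.1 name="Stockfish 15.1" \
  -engine cmd=./stockfish-15 name="Stockfish 15" \
  -each proto=uci tc=180+1 option.Hash=256 \
  -openings file=HERT_500.pgn format=pgn order=sequential \
  -repeat \
  -games 2 -rounds 3500 \
  -concurrency 20 \
  -tb /path/to/syzygy/5men \
  -pgnout games.pgn
```

`-repeat` plays each opening twice with colors reversed, so `-games 2 -rounds 3500` yields the 7000-game testrun; `-tb` points cutechess-cli at the 5-piece Syzygy tablebases for adjudication while the engines themselves get none.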

 

To avoid distortions in the Ordo Elo calculation, from now on only the latest official release and the two latest dev-versions of Stockfish are included (the games of all older engine versions are deleted each time a new version has been tested). The Elo results of older Stockfish versions can still be seen in the Elo diagrams below.

 

Latest update: 2022/12/07: Stockfish 15.1 (+0 Elo to Stockfish 221123 and +4 Elo to Stockfish 15)

(best Stockfish Elo so far: Stockfish 220817 3814 SPCC-Elo)

 

(Ordo-calculation fixed to Stockfish 15 = 3802 Elo)
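That anchoring corresponds to an Ordo invocation roughly like the following sketch. File names are placeholders and switch names may differ between Ordo versions, so check your Ordo build's help text:

```shell
# Anchor Stockfish 15 at 3802 Elo (placeholder file names).
ordo -p games.pgn -A "Stockfish 15 220418" -a 3802 -o ratinglist.txt
```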

 

See the individual statistics of engine-results here

See the Engines Aggressiveness Score Ratinglist here

Download the current gamebase here

Download the complete game-archive here

See the full SPCC-Ratinglist and full EAS-Ratinglist (without Stockfish dev-versions) from 2020 until today here

(calculating the EAS ratings of the full list takes considerable effort and will be done only from time to time, not after each test)

     Program                    Elo    +    -  Games    Score   Av.Op. Draws

   1 Stockfish 15.1 221204    : 3806    8    8  7000    68.8%   3666   62.0%
   2 Stockfish 221123 avx2    : 3806    8    8  7000    68.7%   3666   62.3%
   3 Stockfish 15 220418      : 3802    7    7  9000    70.0%   3651   59.7%
   4 KomodoDragon 3.1 avx2    : 3766    7    7 12000    63.7%   3663   65.9%
   5 KomodoDragon 3.1 MCTS    : 3704    6    6 12000    55.6%   3663   70.8%
   6 Berserk 10 avx2          : 3658    6    6 11000    45.6%   3691   70.6%
   7 Revenge 3.0 avx2         : 3642    6    6 14000    47.0%   3666   71.6%
   8 Koivisto 8.13 avx2       : 3640    6    6 12000    44.3%   3683   71.4%
   9 RubiChess 221120 avx2    : 3633    5    5 16000    47.3%   3654   69.5%
  10 Ethereal 13.75 nnue      : 3619    6    6 13000    42.7%   3674   67.2%
  11 Fire 8.NN avx2           : 3615    6    6  9000    48.1%   3630   66.2%
  12 Seer 2.6.0 avx2          : 3600    6    6 11000    49.9%   3600   73.7%
  13 Slow Chess 2.9 avx2      : 3584    6    6 12000    42.8%   3638   66.6%
  14 Stockfish final HCE      : 3578    6    6 11000    47.7%   3596   60.1%
  15 Fire 8.NN MCTS avx2      : 3575    6    6 10000    47.1%   3597   67.6%
  16 rofChade 3.0 avx2        : 3550    6    6 10000    48.0%   3565   68.1%
  17 Minic 3.30 znver3        : 3525    6    6  8000    49.3%   3529   69.0%
  18 Uralochka 3.38c avx2     : 3510    6    6  8000    49.5%   3513   68.0%
  19 Rebel 15.1a avx2         : 3496    7    7  7000    51.5%   3486   62.6%
  20 PowerFritz 18 avx2       : 3473    6    6  8000    53.2%   3451   64.6%
  21 Arasan 23.4 avx2         : 3473    6    6  9000    49.4%   3477   62.1%
  22 Black Marlin 7.0 avx2    : 3461    6    6  9000    53.4%   3436   62.7%
  23 Nemorino 6.00 avx2       : 3454    6    6  9000    53.8%   3425   55.5%
  24 Igel 3.1.0 popavx2       : 3451    6    6  9000    44.0%   3494   66.9%
  25 Wasp 6.00 avx            : 3435    6    6  9000    48.1%   3448   60.7%
  26 Devre 4.0 avx2           : 3434    5    5  9000    48.6%   3444   66.4%
  27 Caissa 1.4 avx2          : 3416    5    5 12000    50.3%   3414   59.2%
  28 Halogen 11.4 avx2        : 3413    5    5 10000    48.6%   3423   64.2%
  29 Clover 3.1 avx2          : 3397    6    6  8000    47.1%   3417   58.1%
  30 Marvin 6.1.0 avx2        : 3387    6    6  9000    54.4%   3356   57.9%
  31 Tucano 10.00 avx2        : 3376    6    6  9000    50.8%   3370   60.3%
  32 Velvet 4.1.0 avx2        : 3370    6    6  8000    52.4%   3353   51.3%
  33 Coiled 1.1 avx2          : 3346    6    6  8000    46.6%   3370   58.0%
  34 Smallbrain 6.0 avx2      : 3341    6    6  9000    51.8%   3329   49.7%
  35 Scorpio 3.0.14d cpu      : 3337    7    7  8000    52.8%   3317   51.0%
  36 Weiss 220905 popc        : 3326    7    7  8000    49.9%   3326   50.5%
  37 Zahak 10.0 avx           : 3313    6    6  9000    46.8%   3336   50.9%
  38 Dragon 3 aggressive      : 3307    6    6 10000    47.8%   3323   42.8%
  39 Gogobello 3 avx2         : 3300    7    7 10000    48.3%   3313   54.3%
  40 Alexandria 3.0.2         : 3297    7    7  8000    53.1%   3275   52.4%
  41 Counter 5.0 amd64        : 3296    7    7 10000    51.9%   3282   50.5%
  42 Lc0 0.29 dnll 791921     : 3287    7    7 10000    49.7%   3289   46.0%
  43 Combusken 2.0.0 amd64    : 3284    7    7 10000    49.4%   3288   48.6%
  44 Stash 34.0 popc          : 3283    6    6 13000    48.2%   3296   47.9%
  45 Mantissa 3.7.2 avx2      : 3282    6    6 10000    48.7%   3291   53.1%
  46 Chiron 5 x64             : 3241    6    6 10000    41.3%   3305   42.1%
  47 Danasah 9.0 avx2         : 3231    6    6 10000    41.5%   3291   47.3%

The version numbers (180622, for example) of the engines are the date of the latest patch included in the Stockfish source code, not the release date of the engine file. Especially the asmFish engines are often released much later! (Stockfish final HCE is Stockfish 200731, the latest version without a neural net, i.e. with HCE (Hand Crafted Evaluation). This engine is (and perhaps will stay forever?) the strongest HCE engine on the planet. IMHO this makes it very interesting for comparison.)
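The version-number scheme described above can be decoded with a few lines; a small sketch (the function name is mine):

```python
from datetime import datetime

def decode_version(stamp: str) -> str:
    """Decode a YYMMDD version stamp into a readable date."""
    d = datetime.strptime(stamp, "%y%m%d")
    return f"{d.strftime('%B')} {d.day}, {d.year}"

print(decode_version("200807"))  # August 7, 2020
print(decode_version("200731"))  # July 31, 2020 (Stockfish final HCE)
```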

Some engines use a NNUE net based on the evals of other engines. I decided to test these engines, too. As far as I know, the following engines use NNUE nets based on the evals of other engines (if I missed an engine, please contact me):

Fire 8.NN, Nemorino 6.00, Gogobello 3 and Coiled 1.1 (using Stockfish-eval-based NNUE nets or nets taken directly from the Stockfish website); Stockfish since 210615 and Devre 4 (using Lc0-based NNUE nets); Halogen 11.4 and Alexandria 3.0.2 (using a Koivisto-eval-based net).

Some engine testruns were aborted because the engines were too weak (below 3200 SPCC-Elo): LittleGoliath 3.15.3, Viridithas 5.1.0

Some engine testruns were aborted because the new version was clearly weaker than the engine version already listed: Fire 220827

Below you find a diagram of the progress of Stockfish in my tests since April 2022.

And below that diagram, the older diagrams.

 

You can save the diagrams (as JPG pictures in original size) to your PC by right-clicking and choosing "save image"...

The Elo ratings of older Stockfish dev-versions in the Ordo calculation can differ a little from the Elo "dots" in the diagram: when the results/games of a new Stockfish dev-version become part of the Ordo calculation, they can change the Elo ratings of the opponent engines, which in turn can change the Elo ratings of older Stockfish dev-versions (in the Ordo calculation / ratinglist, but not in the diagram, where each Elo "dot" is the rating of one Stockfish dev-version at the moment its testrun was finished).

