Stefan Pohl Computer Chess

private website for chess engine tests


Latest Website-News (2022/08/15): I updated all 3 tools (EAS-tool, Sacrifice Games Search tool and Short Games Analyzer tool), because there is a new pgn-extract version, which is bugfixed (no impact on my tools) and also incredibly fast (binary compiled by Thomas Plaschke): the tools now run around 3 times faster, so they need only about a third of the time! Additionally, I improved the endgame-filter a bit and changed the EAS-scoring system slightly. So, please re-download the tools:

EAS-Tool

Sacrifice Games Search Tool

Short Games Analyzer Tool

 

Ratinglist-testrun of Minic 3.26 finished: Good news: +68 Elo compared to Minic 3.22. Bad news: the EAS-Score of Minic 3.26 (15194) is a clear regression compared to Minic 3.22 (25014), so Minic lost 14 ranks in the EAS-Ratinglist.

 

Next ratinglist-testrun: Arasan 23.4 followed by RubiChess 220813

 

 

Stay tuned.


Stockfish VLTC UHO Regression testing (2000 games (10min+3sec) vs Stockfish 15)

Latest testrun:

Stockfish 220806:  (+550,=996,-454)= 52.4% = +17 Elo (+1 Elo to previous test)

Best testrun so far:

Stockfish 220704:  (+559,=991,-450)= 52.7% = +19 Elo (+5 Elo to previous best)
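
For reference, the Elo differences above follow directly from the score percentage via the standard logistic Elo formula. A minimal Python sketch (the formula is standard; the W/D/L numbers are the results above):

    import math

    def elo_from_wdl(wins, draws, losses):
        """Convert a (win, draw, loss) record into an Elo difference
        using the standard logistic model."""
        games = wins + draws + losses
        score = (wins + 0.5 * draws) / games
        return -400.0 * math.log10(1.0 / score - 1.0)

    print(round(elo_from_wdl(550, 996, 454)))  # Stockfish 220806: +17 Elo
    print(round(elo_from_wdl(559, 991, 450)))  # Stockfish 220704: +19 Elo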

See all results, get more information and download the games: Click on the yellow link above...


SPCC Top Engines Ratinglist (+ regular testing of Stockfish Dev-versions)

 

Playing conditions:

 

Hardware: since 20/07/21, an AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM.

Speed: Stockfish 14.1 reaches 750 kn/s (singlethread, TurboBoost-mode switched off, chess starting position, while 20 games are running simultaneously)

Hash: 256MB per engine

GUI: Cutechess-cli (the GUI ends the game when a 5-piece endgame is on the board); a launch sketch follows below this list

Tablebases: none for the engines, 5-piece Syzygy for cutechess-cli

Openings: HERT_500 testset (by Thomas Zipproth) (download the file in the "Download & Links"-section or here). Note that the HERT-set is not an anti-draw (UHO or similar) opening-set, but a classical, balanced opening-set.

Ponder, Large Memory Pages & learning: Off

Thinking time: 180sec+1000ms (= 3min+1sec) per game/engine (average game-duration: around 7.5 minutes). One 7000-games-testrun takes about 2 days.

The version-numbers of the Stockfish engines are the date of the latest patch included in the Stockfish sourcecode, not the release-date of the engine-file, written backwards (year, month, day; example: 200807 = August 7, 2020). The used SF compile is the AVX2-compile, which is the fastest on my AMD Ryzen CPU. SF binaries are taken from abrok.eu (except the official SF-release versions, which are taken from the official Stockfish website).
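As a rough illustration (not my actual test-script), here is a minimal Python sketch of how one pairing under these conditions could be launched via cutechess-cli; the engine names, file paths and number of rounds are assumptions:

    import subprocess

    # Hypothetical engine paths and opening-file location.
    cmd = [
        "cutechess-cli",
        "-engine", "cmd=./stockfish", "name=Stockfish",
        "-engine", "cmd=./minic", "name=Minic",
        "-each", "proto=uci", "tc=180+1", "option.Hash=256",  # 3min+1sec, 256MB hash
        "-openings", "file=HERT_500.pgn", "format=pgn",
        "-repeat",                       # play each opening with both colors
        "-games", "2", "-rounds", "500",
        "-tb", "/path/to/syzygy",        # 5-piece Syzygy adjudication by the GUI
        "-concurrency", "20",            # 20 games running simultaneously
        "-pgnout", "games.pgn",
    ]
    subprocess.run(cmd, check=True)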

 

To avoid distortions in the Ordo Elo-calculation, from now on only three Stockfish versions are included (the latest official release + the latest 2 dev-versions); the games of all older Stockfish versions are deleted each time a new version has been tested. Stockfish's older Elo-results can still be seen in the Elo-diagrams below.

 

Latest update: 2022/08/14: Minic 3.26 (+68 Elo to Minic 3.22)

 

(Ordo-calculation fixed to Stockfish 15 = 3802 Elo)
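
Anchoring simply shifts the whole rating pool by one constant, so that the anchor engine lands exactly on the fixed value. A minimal illustration in Python (relative values taken from the ratinglist below):

    # Relative ratings as an Ordo run might output them, before anchoring.
    relative = {"Stockfish 15": 0.0, "KomodoDragon 3.1": -33.0, "Berserk 9": -156.0}

    offset = 3802.0 - relative["Stockfish 15"]   # fix the anchor at 3802 Elo
    anchored = {name: rating + offset for name, rating in relative.items()}
    print(anchored)
    # {'Stockfish 15': 3802.0, 'KomodoDragon 3.1': 3769.0, 'Berserk 9': 3646.0}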

 

See the individual statistics of engine-results here

See the Engines Aggressiveness Score Ratinglist here

Download the current gamebase here

Download the complete game-archive here

See the full SPCC-Ratinglist (without Stockfish dev-versions) from 2020 until today here

(calculating the EAS-Ratings of the full list takes a lot of effort and will be done only from time to time, not after each test)

     Program                    Elo    +    -  Games    Score   Av.Op. Draws

   1 Stockfish 220806 avx2    : 3814    7    7  7000    69.8%   3664   59.8%
   2 Stockfish 220724 avx2    : 3809    8    8  7000    69.3%   3664   61.0%
   3 Stockfish 15 220418      : 3802    7    7  8000    69.5%   3655   60.7%
   4 KomodoDragon 3.1 avx2    : 3769    6    6 12000    64.6%   3658   64.2%
   5 KomodoDragon 3.1 MCTS    : 3707    6    6 12000    56.7%   3658   68.8%
   6 Berserk 9 avx2           : 3646    6    6 12000    46.0%   3676   67.3%
   7 Revenge 3.0 avx2         : 3645    6    6 12000    46.0%   3676   69.7%
   8 Koivisto 8.13 avx2       : 3643    6    6 11000    44.5%   3685   69.6%
   9 Ethereal 13.75 nnue      : 3624    6    6 11000    41.6%   3687   65.4%
  10 Fire 8.NN avx2           : 3617    6    6 10000    44.4%   3661   61.1%
  11 Slow Chess 2.9 avx2      : 3589    6    6 10000    42.4%   3646   65.8%
  12 Stockfish final HCE      : 3581    6    6  9000    45.3%   3616   59.9%
  13 Fire 8.NN MCTS avx2      : 3576    6    6  9000    45.8%   3607   68.0%
  14 rofChade 3.0 avx2        : 3555    6    6  9000    49.9%   3556   67.5%
  15 RubiChess 220223 avx2    : 3554    6    6 10000    45.5%   3588   65.5%
  16 Seer 2.5.0 avx2          : 3530    5    5 10000    51.7%   3518   66.4%
  17 Minic 3.26 znver3        : 3520    6    6  7000    57.5%   3465   61.6%
  18 Rebel 15.1 avx2          : 3497    6    6  9000    56.8%   3448   61.0%
  19 Uralochka 3.37c avx2     : 3477    6    6  9000    51.0%   3470   61.5%
  20 Black Marlin 7.0 avx2    : 3474    6    6  8000    55.1%   3437   60.2%
  21 Nemorino 6.00 avx2       : 3455    5    5 11000    53.9%   3426   53.3%
  22 Igel 3.1.0 popavx2       : 3454    5    5 11000    41.4%   3516   66.8%
  23 Minic 3.22 znver3        : 3452    6    6  8000    53.8%   3425   60.0%
  24 Arasan 23.3 avx2         : 3451    5    5 10000    52.3%   3434   58.9%
  25 Halogen 10.23.11 avx2    : 3414    6    6  8000    46.2%   3442   60.9%
  26 Clover 3.1 avx2          : 3401    6    6 10000    49.9%   3402   54.2%
  27 Tucano 10.00 avx2        : 3379    6    6  9000    46.2%   3408   57.3%
  28 Wasp 5.50 avx            : 3366    6    6  9000    47.4%   3386   52.8%
  29 Fritz 18 nnue avx2       : 3360    6    6  8000    46.4%   3386   53.3%
  30 Velvet 4.0.0 avx2        : 3357    6    6  9000    52.8%   3337   50.1%
  31 Coiled 1.1 avx2          : 3350    6    6  8000    44.9%   3386   58.6%
  32 Scorpio 3.0.14d cpu      : 3339    7    7  8000    50.2%   3337   50.5%
  33 Dragon 3 aggressive      : 3311    7    7  7000    50.6%   3306   42.7%
  34 Marvin 6.0.0 avx2        : 3310    7    7  7000    51.5%   3300   49.8%
  35 Gogobello 3 avx2         : 3308    6    6  9000    49.3%   3313   52.9%
  36 Zahak 10.0 avx           : 3306    6    6  8000    42.5%   3360   47.9%
  37 Lc0 0.29 dnnl 791921     : 3293    7    7  9000    51.1%   3285   46.9%
  38 Weiss 2.0 popc           : 3293    6    6  9000    49.0%   3301   49.2%
  39 Combusken 2.0.0 amd64    : 3290    7    7  8000    50.7%   3285   47.6%
  40 Caissa 0.8 avx2          : 3266    7    7  7000    48.8%   3275   42.5%
  41 Stash 33.0 popc          : 3251    7    7  8000    45.9%   3280   47.7%
  42 Chiron 5 x64             : 3246    7    7 10000    42.1%   3304   42.6%
  43 Danasah 9.0 avx2         : 3240    7    7  9000    43.2%   3288   45.8%

The version-numbers (180622, for example) of the engines are the date of the latest patch included in the Stockfish sourcecode, not the release-date of the engine-file. Especially the asmFish-engines were often released much later! (Stockfish final HCE is Stockfish 200731, the latest version without a neural net, using HCE (= Hand Crafted Evaluation). This engine is (and will perhaps stay forever?) the strongest HCE engine on the planet. IMHO this makes it very interesting for comparison.)
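
A worked example of this date scheme in plain Python (nothing engine-specific):

    from datetime import datetime

    # Version string = date of the latest included patch, written as yymmdd.
    print(datetime.strptime("200807", "%y%m%d").date())  # 2020-08-07 = August 7, 2020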

Some engines use a nnue-net based on the evals of other engines. I decided to test these engines, too. As far as I know, the following engines use nnue-nets based on the evals of other engines (if I missed an engine, please contact me):

Fire 8.NN, Nemorino 6.00, Gogobello 3 and Coiled 1.1 (using Stockfish-eval-based nnue-nets or nets taken directly from the Stockfish website). Stockfish since 210615 (using Lc0-based nnue-nets). Halogen 10.23.11 (using a Koivisto-eval-based net).

Some engine-testruns were aborted, because the engine was too weak (below 3200 SPCC-Elo): LittleGoliath 3.15.3

Below you find a diagram of the progress of Stockfish in my tests since April 2022.

And below that diagram, the older diagrams.

 

You can save the diagrams (as JPG-pictures in original size) on your PC by right-clicking them and choosing "save image"...

The Elo-ratings of older Stockfish dev-versions in the Ordo-calculation can differ a little from the Elo-"dots" in the diagram: when the results/games of a new Stockfish dev-version enter the Ordo-calculation, they can change the Elo-ratings of the opponent engines, which in turn can change the Elo-ratings of older Stockfish dev-versions in the ratinglist. The diagram is not affected, because each Elo-"dot" is the rating of one Stockfish dev-version at the moment its testrun was finished.

