Stefan Pohl Computer Chess

Home of famous UHO openings and EAS Ratinglist


 

Here you will find experimental testruns that are not part of my regular testing work.

 

2024/02/15 Experimental testrun of Revenge 1.0 for my UHO-Top15 ratinglist, in order to test whether my EAS-tool works as I predicted.

 

The author of the Willow 4.0 engine said this about my EAS-tool on talkchess:
"Also, the fact that Stockfish and Torch are at the top by a country mile suggests that a large part of what EAS is measuring is engines taking advantage of tactical mistakes by other engines rather than actively seeking out an aggressive play style."

 

So, here is the proof that this is completely wrong and that my EAS-tool works exactly as I always predicted:

I did a testrun of Revenge 1.0 (the strongest truly aggressive engine besides Stockfish and Torch, but light-years weaker than both, of course):
15000 games versus the Top15 engines of my UHO-Top15 ratinglist. Of course, Revenge 1.0 is far too weak for this field of top engines, so its score was only 18.3% (-141 Elo below the weakest engine in my UHO-Top15 ratinglist, RofChade 3.1), and Revenge 1.0 won only 465 games out of 15000 (!!!)
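As a rough cross-check (the standard logistic Elo approximation, not the exact Ordo calculation used for the list):

\[
\Delta \approx 400 \cdot \log_{10}\!\frac{p}{1-p} = 400 \cdot \log_{10}\!\frac{0.183}{0.817} \approx -260 \text{ Elo}
\]

An 18.3% score against opposition averaging 3657 Elo therefore corresponds to a performance of roughly 3400, in line with the 3385 shown for Revenge 1.0 in the table below.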

     Program                    Elo    +    -  Games    Score   Av.Op. Draws

   1 Stockfish 16 230630      : 3821    4    4 15000    73.8%   3628   45.8%
   2 Torch 1 popavx2          : 3783    4    4 15000    69.3%   3631   46.3%
   3 KomodoDragon 3.3 avx2    : 3749    4    4 15000    65.0%   3633   47.0%
   4 Berserk 12 avx2          : 3725    4    4 15000    61.7%   3635   47.1%
   5 RubiChess 240112 avx2    : 3667    4    4 15000    53.7%   3639   48.4%
   6 Ethereal 14.25 nnue      : 3666    4    4 15000    53.5%   3639   49.2%
   7 Caissa 1.16 avx2         : 3665    4    4 15000    53.4%   3639   49.1%
   8 Obsidian 10.0 avx2       : 3653    4    4 15000    51.6%   3640   49.1%
   9 Seer 2.8.0 avx2          : 3621    4    4 15000    47.1%   3642   49.1%
  10 CSTal 2.0 avx2           : 3604    4    4 15000    44.7%   3643   49.5%
  11 Clover 6.1 avx2          : 3596    4    4 15000    43.6%   3643   49.7%
  12 Koivisto 9.2 avx2        : 3589    4    4 15000    42.6%   3644   48.3%
  13 Alexandria 6.0 avx2      : 3584    4    4 15000    41.9%   3644   48.2%
  14 Rebel EAS avx2           : 3573    4    4 15000    40.4%   3645   48.5%
  15 RofChade 3.1 avx2        : 3566    4    4 15000    39.4%   3645   47.1%
  16 Revenge 1.0 avx2         : 3385    5    5 15000    18.3%   3657   30.3%


Games        : 120000 (finished)

White Wins   : 57796 (48.2 %)
Black Wins   : 5741 (4.8 %)
Draws        : 56463 (47.1 %)


But now look at the EAS-ratinglist, calculated from these 120000 ratinglist games:

 

                                 bad  avg.win 
Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    197919  31.18%  29.46%  17.09%   71   Revenge 1.0 avx2  
   2    184362  20.06%  23.61%  09.13%   71   Stockfish 16 230630  
   3    146678  15.17%  27.19%  14.12%   69   Torch 1 popavx2  
   4    122333  15.14%  21.14%  14.53%   72   KomodoDragon 3.3 avx2  
   5    101137  14.39%  17.85%  16.51%   74   RubiChess 240112 avx2  
   6     88201  12.09%  09.84%  16.04%   80   Obsidian 10.0 avx2  
   7     82332  15.98%  10.17%  19.46%   83   Rebel EAS avx2  
   8     81081  10.20%  12.17%  17.87%   80   CSTal 2.0 avx2  
   9     75262  09.37%  12.82%  19.57%   78   Clover 6.1 avx2  
  10     72552  13.23%  08.90%  17.29%   85   Ethereal 14.25 nnue  
  11     69024  10.48%  12.81%  21.57%   76   Caissa 1.16 avx2  
  12     68697  10.94%  09.81%  19.23%   81   Alexandria 6.0 avx2  
  13     66430  09.19%  09.78%  18.59%   80   Berserk 12 avx2  
  14     63224  08.24%  14.94%  23.39%   75   Seer 2.8.0 avx2  
  15     51774  08.79%  13.71%  24.52%   77   RofChade 3.1 avx2  
  16     50559  06.28%  08.08%  21.43%   84   Koivisto 9.2 avx2  
-------------------------------------------------------------------
*** Average length of all won games:     76 moves

 

A: Most high-value sacrifices (3+ pawnunits): [1]:05.38% Revenge 1.0 avx2   

                                              [2]:03.61% Stockfish 16 230630   

                                              [3]:02.31% Rebel EAS avx2   

                                              [4]:02.25% Torch 1 popavx2   

                                              [5]:01.78% Obsidian 10.0 avx2 

 
B: Most sacrifices overall                  : [1]:31.18% Revenge 1.0 avx2   

                                              [2]:20.06% Stockfish 16 230630   

                                              [3]:15.98% Rebel EAS avx2   

                                              [4]:15.17% Torch 1 popavx2   

                                              [5]:15.14% KomodoDragon 3.3 avx2  


C: Very short wins (45 moves or less)       : [1]:04.73% Revenge 1.0 avx2   

                                              [2]:02.85% Stockfish 16 230630   

                                              [3]:01.95% Torch 1 popavx2   

                                              [4]:01.87% KomodoDragon 3.3 avx2   

                                              [5]:01.15% Rebel EAS avx2  


D: Most short wins overall                  : [1]:29.46% Revenge 1.0 avx2   

                                              [2]:27.19% Torch 1 popavx2   

                                              [3]:23.61% Stockfish 16 230630   

                                              [4]:21.14% KomodoDragon 3.3 avx2   

                                              [5]:17.85% RubiChess 240112 avx2  


E: Average length of all won games          : [1]:069 Torch 1 popavx2   

                                              [2]:071 Revenge 1.0 avx2   

                                              [3]:071 Stockfish 16 230630   

                                              [4]:072 KomodoDragon 3.3 avx2   

                                              [5]:074 RubiChess 240112 avx2  

 

So, the clearly (very clearly!) weakest engine is ranked 1st in the EAS-ratinglist! How awesome is that?
Additionally, I added the Revenge 1.0 games to my full UHO ratinglist, so you can download them as part of the gamebase of the full UHO ratinglist.
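For readers who want to compute similar statistics themselves: the snippet below is NOT my EAS-tool, just a minimal Python illustration (using the python-chess library and a hypothetical file name) of how two of the ingredients - the share of short wins among an engine's wins and the average length of its won games - can be counted from a PGN. The sacrifice and bad-draw detection of the real tool is more involved and not shown here, and the exact percentage bases used by the EAS-tool may differ.

import chess.pgn

def win_length_stats(pgn_path, engine_name, short_limit=45):
    """Share of 'short' wins (<= short_limit moves) and average length of won
    games for one engine. Illustrative only - not the real EAS-tool."""
    wins = short_wins = total_moves = 0
    with open(pgn_path, encoding="utf-8", errors="ignore") as f:
        while (game := chess.pgn.read_game(f)) is not None:
            result = game.headers.get("Result", "*")
            white = game.headers.get("White", "")
            black = game.headers.get("Black", "")
            won = (result == "1-0" and engine_name in white) or \
                  (result == "0-1" and engine_name in black)
            if not won:
                continue
            plies = sum(1 for _ in game.mainline_moves())
            moves = (plies + 1) // 2          # full moves actually played
            wins += 1
            total_moves += moves
            short_wins += (moves <= short_limit)
    if wins == 0:
        return 0, 0.0, 0.0
    return wins, 100.0 * short_wins / wins, total_moves / wins

# example call with a hypothetical file name
wins, short_pct, avg_len = win_length_stats("UHO_Top15_plus_Revenge.pgn", "Revenge 1.0")
print(f"won games: {wins}, short wins: {short_pct:.2f}%, avg length of wins: {avg_len:.0f} moves")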


2023/02/22 Experimental testrun of Rebel 16.2 with different values of the Evalcorrect UCI-parameter.

This option can be used to change the playing style of the engine. The default value is 202; increasing the value should increase the engine's aggressiveness.

A 10000-game round-robin tournament was played: 60sec+600ms thinking time, single thread, no ponder, no bases, using my UHO_2022_8mvs_+120_+129 openings.
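For reference, a setup like this can be launched with cutechess-cli roughly as sketched below. This is only an illustrative sketch, not my exact command line: engine paths and names are placeholders, only two of the five configurations are shown, and the exact spelling of the UCI option may differ from "Evalcorrect".

cutechess-cli \
  -engine name="Rebel 16.2 default" cmd=./rebel16.2 \
  -engine name="Rebel 16.2 Ec=400" cmd=./rebel16.2 option.Evalcorrect=400 \
  -each proto=uci tc=60+0.6 option.Hash=256 \
  -openings file=UHO_2022_8mvs_+120_+129.epd format=epd order=random \
  -rounds 1000 -games 2 -repeat -concurrency 10 -pgnout rebel_evalcorrect.pgn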

Download the games of this test right here

     Program                 Elo    +    -  Games    Score   Av.Op. Draws

   1 Rebel 16.2 default    : 3617    6    6  4000    52.4%   3600   56.1%
   2 Rebel 16.2 Ec=256     : 3601    6    6  4000    49.5%   3604   56.2%
   3 Rebel 16.2 Ec=300     : 3600    6    6  4000    49.4%   3604   57.2%
   4 Rebel 16.2 Ec=500     : 3600    6    6  4000    49.3%   3604   55.9%
   5 Rebel 16.2 Ec=400     : 3599    6    6  4000    49.3%   3604   57.0%


Games        : 10000 (finished)
White Wins   : 4190 (41.9 %)
Black Wins   : 162 (1.6 %)
Draws        : 5648 (56.5 %)

Below is the Engine Aggressiveness Scoring (EAS), calculated with my EAS-tool (V5.21):

                                 bad  avg.win 
Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1     75453  10.06%  20.43%  20.36%   79   Rebel 16.2 default  
   2     72790  09.25%  18.51%  20.45%   81   Rebel 16.2 Ec=400  
   3     71327  09.36%  16.14%  19.71%   81   Rebel 16.2 Ec=500  
   4     70839  10.50%  18.20%  20.60%   80   Rebel 16.2 Ec=256  
   5     61260  09.23%  16.55%  21.16%   82   Rebel 16.2 Ec=300  
-------------------------------------------------------------------
*** Average length of all won games:     80 moves

Conclusions: The Evalcorrect parameter seems quite meaningless. As you can see, the strength and the aggressiveness of Rebel are nearly identical for all Evalcorrect values, and a higher Evalcorrect value even seems to lower the aggressiveness instead of increasing it...


2022/10/14 Three experimental testruns of Pedone 3 with different Strength-parameter settings, merged into one pgn-file. 1min+1sec, single thread, no ponder, no bases, balanced openings (Feobos c3). Pedone 3 plays very aggressively and also runs on Android smartphones, so it is a very interesting engine for humans to play against (on an electronic chessboard, for example).

Download Pedone 3 here (note: do not use Pedone 3.1, the successor, because it does not play very aggressively!)

Download the 40500 played testgames and statistics here

(As you can see in the ratinglist, the Strength parameter is a little strange... It has a range of 0 to 100, but several settings do not differ in strength...)

 

     Program             Elo    +    -  Games    Score   Av.Op. Draws

   1 Pedone 3.0 100    : 3350   75   75  2000    99.3%   2008    1.4%
   2 Pedone 3.0 99     : 2938   44   44  2000    93.3%   2029    4.4%
   3 Pedone 3.0 98     : 2740   36   36  2000    88.8%   2038    6.1%
   4 Pedone 3.0 97     : 2534   29   29  2000    82.8%   2049    7.4%
   5 Pedone 3.0 96     : 2375   23   23  2000    76.7%   2057   10.3%
   6 Pedone 3.0 94     : 2183   19   19  2000    66.6%   2066   13.9%
   7 Pedone 3.0 95     : 2181   17   17  2000    66.5%   2066   13.9%
   8 Pedone 3.0 93     : 2008   16   16  2000    53.7%   2075   14.4%
   9 Pedone 3.0 92     : 2007   17   17  2000    53.6%   2075   13.9%
  10 Pedone 3.0 91     : 1904   16   16  2000    44.2%   2080   14.9%
  11 Pedone 3.0 89     : 1823   16   16  2000    36.3%   2084   15.4%
  12 Pedone 3.0 90     : 1821   16   16  2000    36.0%   2084   14.5%
  13 Pedone 3.0 88     : 1817   16   16  2000    35.7%   2085   14.1%
  14 Pedone 3.0 87     : 1815   15   15  2000    35.5%   2085   15.3%
  15 Pedone 3.0 86     : 1720   16   16  2000    26.3%   2089   12.6%
  16 Pedone 3.0 81     : 1718   17   17  2000    26.0%   2090   11.7%
  17 Pedone 3.0 85     : 1717   17   17  2000    25.9%   2090   10.9%
  18 Pedone 3.0 82     : 1716   16   16  2000    25.8%   2090   11.7%
  19 Pedone 3.0 83     : 1715   16   16  2000    25.8%   2090   11.0%
  20 Pedone 3.0 84     : 1715   16   16  2000    25.7%   2090   11.5%
  21 Pedone 3.0 80     : 1712   10   10  5000    49.8%   1793   14.9%
  22 Pedone 3.0 79     : 1706   19   19  1000    62.0%   1619   17.1%
  23 Pedone 3.0 78     : 1704   19   19  1000    61.7%   1619   15.4%
  24 Pedone 3.0 77     : 1619   18   18  1000    48.8%   1628   16.4%
  25 Pedone 3.0 73     : 1605   19   19  1000    46.6%   1629   19.7%
  26 Pedone 3.0 76     : 1604   19   19  1000    46.5%   1629   17.7%
  27 Pedone 3.0 74     : 1594   20   20  1000    45.0%   1630   16.2%
  28 Pedone 3.0 72     : 1592   19   19  1000    44.8%   1630   18.7%
  29 Pedone 3.0 75     : 1590   19   19  1000    44.4%   1631   15.7%
  30 Pedone 3.0 71     : 1586   19   19  1000    43.9%   1631   15.3%
  31 Pedone 3.0 70     : 1584   12   12  5000    41.8%   1644   17.7%
  32 Pedone 3.0 60     : 1582   15   15  4000    54.4%   1550   17.5%
  33 Pedone 3.0 20     : 1517   24   24  4000    50.2%   1516   14.1%
  34 Pedone 3.0 30     : 1517   23   23  4000    50.0%   1517   13.9%
  35 Pedone 3.0 40     : 1517   20   20  4000    50.0%   1517   13.9%
  36 Pedone 3.0 50     : 1517   17   17  4000    45.5%   1549   15.2%
  37 Pedone 3.0 10     : 1515   27   27  2000    49.6%   1517   14.3%


Games        : 40500 (finished)

White Wins   : 18264 (45.1 %)
Black Wins   : 16734 (41.3 %)
Draws        : 5502 (13.6 %)


2022/07/03 Experimental testruns of 2 different TripleBrain "engines", using the aiquiri engine.

Download all 16000 played games and the aiquiri-engine folder here

First of all: aiquiri does not run with all engines. For example, Koivisto 8.13 and Ethereal 13.75 did not work. So, if you use aiquiri, always check (with the Task Manager) that "master.exe", "slave1.exe" and "slave2.exe" are running! I used cutechess-cli for my testruns. There I increased the timemargin parameter to 2000 and set restart=on for the aiquiri engine (= reloading after each finished game). The time parameters in aiquiri were lowered to 35 for the slaves (default 40) and to 15 for the master (default 20). That worked: with my SPCC test setup (3min+1sec, single thread, 20 games running simultaneously) I had only 8 time losses in 16000 games - acceptable. I used my EAS-tool to measure the aggressiveness of play of the engines (look here for more information) and the SPCC-Elo of the engines for the strength.
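For illustration, an engine definition along these lines can be written for cutechess-cli as sketched below (a hedged example, not my actual command line; the paths, names and the opponent are placeholders):

cutechess-cli \
  -engine name="aiquiri TripleBrain" cmd=master.exe dir=C:/aiquiri \
          restart=on timemargin=2000 \
  -engine name="Berserk 9" cmd=./berserk9 \
  -each proto=uci tc=180+1 option.Hash=256 \
  -openings file=openings.epd format=epd order=random \
  -rounds 500 -games 2 -repeat -concurrency 20 -pgnout aiquiri_test.pgn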

I built 2 different TripleBrains: (1) with a strong but solid-playing master and two weaker but very aggressively playing slaves, and (2) with a weaker but very aggressive master and two stronger but solid-playing slaves. Note that the master engine only chooses between the 2 slave moves (if the two slaves suggest different moves); the master engine never plays a move of its own!!!

Here are the results:

TripleBrain (1)   
Slave1:Pedone 3 (Elo: 3341 EAS: 11219)  Slave2:Velvet 3.3.0 (Elo: 3305 EAS: 10251)
                      Master:Berserk 9 (Elo: 3644 EAS: 688)
Result of aiquiri TripleBrain: Elo: 3400 EAS: 5891 

 

TripleBrain (2)
Slave1:Clover 3.1 (Elo: 3401 EAS: 898) Slave2:Igel 3.1.0 (Elo: 3448 EAS: 932)
                      Master:Velvet 3.3.0 (Elo: 3305 EAS: 10251)
Result of aiquiri TripleBrain: Elo: 3379 EAS: 1064

The results were as I expected:

(1) The stronger master increases the Elo of the TripleBrain (compared to the slaves), but the aggressiveness fades away (the EAS-score is only around 50% of the EAS-scores of the slaves).

(2) The weaker master decreases the Elo of the TripleBrain, and the aggressively playing master was not able to extract aggressiveness from the solid-playing slaves (because you cannot play aggressively if you only ever get to choose between 2 solid moves!).

So, IMO, these 2 experiments clearly show that the TripleBrain idea is useless... A solid, strong master makes aggressive slaves play stronger, but their aggressiveness fades away. An aggressively playing master is unable to make 2 stronger, solid-playing slaves play more aggressively.

 

2022/01/20 Experimental testrun of Fat Titz 2 vs. Stockfish 220113 with very long thinking time. Goal: find out whether Fat Titz 2 can benefit from its bigger nnue-net (compared to Stockfish) when the thinking time is very long... Thinking time: 20min+10sec on a single thread, average game duration 65 minutes!!!

Extreme long-time testrun of Fat Titz 2 vs. Stockfish 220113
1000 games, singlethread, no ponder, no bases, i7-8750H 2.6GHz Notebook
Unbalanced Human Openings (UHO_V3_8mvs_+100_+109) for low draw-rate
Thinking-time:  20min+10secs !!! (average game-duration 65 minutes!)
 

     Program                    Elo    +    -   Games   Score   Av.Op.  Draws

   1 Fat Titz 2 bmi2          : 3802    7    7  1000    50.3 %   3800   51.7 %
   2 Stockfish 220113 bmi2    : 3800    7    7  1000    49.8 %   3802   51.7 %

 

Games        : 1000 (finished)
White Wins   : 482 (48.2 %)
Black Wins   : 1 (0.1 %)
Draws        : 517 (51.7 %)

 

Individual statistics: Fat Titz 2 bmi2: 1000 (+244,=517,-239), 50.3 %

 

Gamepairs rescoring tool result:

   # PLAYER                   :  RATING  ERROR  PLAYED    W    D    L   (%)
   1 Fat Titz 2 bmi2          :    3804     21     500  117  272  111  50.6
   2 Stockfish 220113 bmi2    :    3800   ----     500  111  272  117  49.4

 

Individual statistics: Fat Titz 2 bmi2: 500 (+117,=272,-111),  50.6 %
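The gamepairs rescoring presumably collapses each opening pair (the same opening played once with each color) into a single result: a pair counts as won with more than 1.0 points over the two games, as drawn with exactly 1.0, and as lost otherwise; this reading reproduces the numbers above (117 won, 272 drawn, 111 lost pairs = 50.6%). Below is a minimal Python sketch of that rescoring, not the actual tool: it uses the python-chess library, assumes the two games of a pair are stored back to back in the PGN, and the file name is just an example.

import chess.pgn

def points_for(game, engine_name):
    """Points scored by engine_name in one game (1 / 0.5 / 0)."""
    result = game.headers.get("Result", "*")
    is_white = engine_name in game.headers.get("White", "")
    if result == "1/2-1/2":
        return 0.5
    if result == "1-0":
        return 1.0 if is_white else 0.0
    if result == "0-1":
        return 0.0 if is_white else 1.0
    return 0.5   # unfinished games treated as draws, purely for this sketch

def rescore_pairs(pgn_path, engine_name):
    """Collapse consecutive game pairs into single pair results (+/=/-)."""
    points = []
    with open(pgn_path, encoding="utf-8", errors="ignore") as f:
        while (game := chess.pgn.read_game(f)) is not None:
            points.append(points_for(game, engine_name))
    won = drawn = lost = 0
    for first, second in zip(points[0::2], points[1::2]):
        pair = first + second
        if pair > 1.0:
            won += 1
        elif pair < 1.0:
            lost += 1
        else:
            drawn += 1
    return won, drawn, lost

w, d, l = rescore_pairs("fattitz2_vs_sf220113.pgn", "Fat Titz 2")   # hypothetical file name
print(f"pairs: +{w} ={d} -{l}, score = {(w + 0.5 * d) / (w + d + l):.1%}")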

Conclusion: Fat Titz 2 does not benefit from its much bigger nnue-net (compared to Stockfish), even though the thinking time was this long...

Download games and statistics here


2021/11/24 Experimental RoundRobin tournament with 6 different playing-styles of KomodoDragon 2.5 (Default, Defensive, Positional, Human, Active, Aggressive), each style combined with MCTS on and off = 12 engine-settings.

Tournament with 2'+1'' thinking time on an AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM, singlethread mode, no ponder, no bases (except for cutechess-cli, for game adjudication), cutechess-cli, classical, balanced openings, 8 moves deep, taken from human games (Megabase 2020, both players rated 2400 Elo or better). 100 rounds = 13200 games played.

Download the played games here

     Program                         Elo    +    -   Games   Score   Av.Op.  Draws

   1 Dragon 2.5 Default            :    0   16   16  2200    89.1 %   -409   20.8 %
   2 Dragon 2.5 Default MCTS       :  -96   14   14  2200    82.0 %   -400   29.6 %
   3 Dragon 2.5 Defensive          : -320   11   11  2200    58.3 %   -380   51.5 %
   4 Dragon 2.5 Positional         : -321   11   11  2200    58.2 %   -380   43.2 %
   5 Dragon 2.5 Human              : -329   11   11  2200    57.2 %   -379   26.7 %
   6 Dragon 2.5 Active             : -426   11   11  2200    44.5 %   -370   37.3 %
   7 Dragon 2.5 Defensive MCTS     : -428   10   10  2200    44.2 %   -370   49.9 %
   8 Dragon 2.5 Aggressive         : -465   10   10  2200    39.5 %   -367   41.3 %
   9 Dragon 2.5 Positional MCTS    : -469   11   11  2200    39.0 %   -367   39.1 %
  10 Dragon 2.5 Human MCTS         : -472   10   10  2200    38.6 %   -366   26.3 %
  11 Dragon 2.5 Active MCTS        : -575   12   12  2200    26.1 %   -357   32.4 %
  12 Dragon 2.5 Aggressive MCTS    : -600   12   12  2200    23.3 %   -355   32.2 %


Games        : 13200 (finished)

White Wins   : 4731 (35.8 %)
Black Wins   : 3736 (28.3 %)
Draws        : 4733 (35.9 %)


2021/10/09 Experimental RoundRobin tournament with 3 engines (Stockfish 211006, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV-line, and whether Dragon 2.5 MCTS loses less Elo than the AlphaBeta engines when MultiPV is 3 or higher... This is the new testrun with Stockfish 211006 (fixed time-management in MultiPV mode) instead of Stockfish 14. The games download also includes the games and statistics of the first testrun, for comparison. The ratings of the old testrun with Stockfish 14 are below, for comparison.

Tournament with 3'+1'' thinking time on an AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM, singlethread mode, no ponder, no bases (except for cutechess-cli, for game adjudication), cutechess-cli, classical, balanced openings, 6 moves deep, taken from human games (Megabase 2020, both players rated 2400 Elo or better). 100 rounds = 10500 games played. Same engines = same color...

Download the played games here

     Program                      Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 211006   pv=1    :    0   13   13  1400    64.8 %   -110   69.9 %
   2 Stockfish 211006   pv=2    :  -24   13   13  1400    61.4 %   -108   73.1 %
   3 Stockfish 211006   pv=3    :  -49   12   12  1400    57.8 %   -106   74.9 %
   4 KomDragon 2.5 avx2 pv=1    :  -49   13   13  1400    57.8 %   -106   76.4 %
   5 KomDragon 2.5 avx2 pv=2    :  -78   13   13  1400    53.5 %   -104   78.6 %
   6 Stockfish 211006   pv=5    :  -90   13   13  1400    51.7 %   -103   74.1 %
   7 KomDragon 2.5 MCTS pv=1    : -108   13   13  1400    49.1 %   -102   78.6 %
   8 KomDragon 2.5 MCTS pv=2    : -112   13   13  1400    48.5 %   -102   79.7 %
   9 KomDragon 2.5 MCTS pv=3    : -112   12   12  1400    48.5 %   -102   76.1 %
  10 Stockfish 211006   pv=7    : -112   12   12  1400    48.5 %   -102   71.6 %
  11 KomDragon 2.5 MCTS pv=5    : -114   13   13  1400    48.2 %   -101   78.0 %
  12 KomDragon 2.5 MCTS pv=7    : -117   13   13  1400    47.7 %   -101   78.8 %
  13 KomDragon 2.5 avx2 pv=3    : -118   12   12  1400    47.6 %   -101   73.6 %
  14 KomDragon 2.5 avx2 pv=5    : -187   12   12  1400    37.6 %    -96   62.9 %
  15 KomDragon 2.5 avx2 pv=7    : -264   14   14  1400    27.4 %    -91   50.7 %

 

For comparison, here are the ratings with Stockfish 14 (buggy time-management in MultiPV mode):

     Program                      Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 14  avx2 pv=1    :    0   13   13  1400    64.2 %   -105   70.6 %
   2 KomDragon 2.5 avx2 pv=1    :  -32   13   13  1400    59.7 %   -103   76.5 %
   3 Stockfish 14  avx2 pv=2    :  -32   13   13  1400    59.7 %   -103   74.9 %
   4 KomDragon 2.5 avx2 pv=2    :  -55   13   13  1400    56.3 %   -101   77.5 %
   5 Stockfish 14  avx2 pv=3    :  -73   12   12  1400    53.7 %   -100   74.6 %
   6 KomDragon 2.5 MCTS pv=7    :  -86   13   13  1400    51.7 %    -99   77.9 %
   7 KomDragon 2.5 MCTS pv=5    :  -89   13   13  1400    51.3 %    -98   78.0 %
   8 KomDragon 2.5 MCTS pv=1    :  -89   13   13  1400    51.2 %    -98   78.7 %
   9 KomDragon 2.5 MCTS pv=3    :  -92   12   12  1400    50.8 %    -98   77.6 %
  10 KomDragon 2.5 avx2 pv=3    :  -95   12   12  1400    50.3 %    -98   75.9 %
  11 KomDragon 2.5 MCTS pv=2    :  -97   12   12  1400    50.0 %    -98   78.4 %
  12 Stockfish 14  avx2 pv=5    : -145   12   12  1400    43.0 %    -94   68.0 %
  13 KomDragon 2.5 avx2 pv=5    : -169   12   12  1400    39.5 %    -93   63.1 %
  14 Stockfish 14  avx2 pv=7    : -188   13   13  1400    36.8 %    -91   60.7 %
  15 KomDragon 2.5 avx2 pv=7    : -225   14   14  1400    31.8 %    -89   52.9 %

Conclusions (new testrun): 1) The MCTS mode is very good for MultiPV analysis. As you can see, all 5 KomodoDragon 2.5 MCTS MultiPV configurations lie within a very small range of only 9 Elo (!). 

2) The Elo-difference between Stockfish 211006 and KomodoDragon 2.5 non-MCTS is: 

pv=1: 49 Elo / pv=2: 54 Elo / pv=3: 69 Elo / pv=5: 97 Elo / pv=7: 152 Elo. In contrast to the old testrun with SF 14, the Elo-difference between Stockfish 211006 and KomodoDragon 2.5 non-MCTS clearly increases with a higher number of PV-lines.

 

2021/03/02 Huge CloneWars tournament. Stockfish 13 vs. 10 Stockfish derivatives/clones. 20000 games.

 

60''+600ms thinking-time, singlethread, i7-8750H 2.6GHz (Hexacore) Notebook, Windows 10 64bit, no ponder, 5 Syzygy bases for cutechess-cli - none for the engines. All engines bmi2-binary.

My Unbalanced Human Openings V2.00 6moves openings were used (low draw-rate and a wider Elo-spreading than classical opening-sets!).

     Program                   Elo    +    -   Games   Score   Av.Op.  Draws

   1 CFish 210208 bmi2       : 3742   11   11  2000    52.7 %   3723   56.0 %
   2 Stockfish 13 bmi2       : 3723    3    3 20000    55.8 %   3682   54.3 %
   3 CorChess 1.3 bmi2       : 3716   10   10  2000    49.0 %   3723   56.4 %
   4 Sugar AI 1.50 bmi2      : 3707   10   10  2000    47.7 %   3723   56.1 %
   5 Eman 6.93 bmi2          : 3706   10   10  2000    47.6 %   3723   55.0 %
   6 Fat Fritz 2 bmi2        : 3695   10   10  2000    46.0 %   3723   54.0 %
   7 Honey 13 bmi2           : 3687   10   10  2000    44.9 %   3723   55.3 %
   8 ShashChess 15.1 bmi2    : 3670   11   11  2000    42.5 %   3723   55.5 %
   9 Raubfisch x44 bmi2      : 3639   11   11  2000    38.2 %   3723   55.4 %
  10 Fat Fritz 2 free        : 3633   11   11  2000    37.5 %   3723   48.6 %
  11 Crystal 3.1 bmi2        : 3623   11   11  2000    36.0 %   3723   51.1 %


Games        : 20000 (finished)

White Wins   : 8750 (43.8 %)
Black Wins   : 382 (1.9 %)
Draws        : 10868 (54.3 %)

Conclusions: All derivatives/clones play measurably weaker than Stockfish 13, except CFish - no surprise, because CFish runs a little faster than Stockfish and has no other changes. Note that the Unbalanced Human Openings used here spread the Elo distances about 2x wider than a classical opening set...

 

Download the games here

 

2020/12/05 Huge experimental test (3x 7000 games) of the Eman 6.60 learning-feature.

 

I was curious whether (and how much) Eman 6.60 would gain by using its learning function. Eman 6.60 writes an experience file while it is playing. So I did three 7000-game testruns, starting with no experience, and then let Eman learn and learn. Each of the 3 testruns was 100% identical to the others, except that Eman was allowed to learn and to keep the Eman.exp file for the next testrun. All conditions were like a normal Stockfish testrun (see main site): 3'+1'', singlecore, Hash 256MB, no ponder, 500 HERT openings.

As you can see below, the results are a complete disappointment. The second testrun (using the experience file of the first testrun) gave +4 Elo; the third testrun (using the experience file of the first and second testruns) gave no progress at all, even though the Eman.exp file had grown to 72 Megabytes by the time the third testrun was finished...

     Program                      Elo    +    -   Games   Score   Av.Op.  Draws

   1 CFish 12 3xCerebellum      : 3726    9    9  7000    86.1 %   3389   27.3 %
   2 Stockfish 201115 avx2      : 3724    8    8  7000    78.3 %   3473   41.9 %
   3 Stockfish 201126 avx2      : 3722    8    8  7000    78.1 %   3473   42.6 %
   4 Eman 6.60 avx2 3rd_run     : 3716    8    8  7000    77.6 %   3473   43.4 %
   5 Eman 6.60 avx2 2nd_run     : 3716    8    8  7000    77.6 %   3473   43.0 %
   6 Eman 6.60 avx2             : 3712    8    8  7000    77.3 %   3473   43.1 %
   7 CFish 12 avx2              : 3703    8    8  7000    84.6 %   3389   29.1 %
   8 Stockfish 12 200902        : 3684    4    4 25000    74.1 %   3470   45.4 %

 *** rest of the ratinglist deleted ***

 

Individual statistics:

 

Eman 6.60 avx2 3rd_run: 3716 7000 (+3914,=3035,- 51), 77.6 %

 

RubiChess 1.9dev nnue      : 1000 (+803,=195,-  2), 90.0 %
KomodoDragon 1.0 avx2      : 1000 (+227,=758,- 15), 60.6 %
Slow Chess 2.4 popc        : 1000 (+726,=274,-  0), 86.3 %
Ethereal 12.75 avx2        : 1000 (+706,=291,-  3), 85.2 %
Nemorino 6.00 avx2         : 1000 (+634,=363,-  3), 81.5 %
Stockfish 12 200902        : 1000 (+123,=850,- 27), 54.8 %
Houdini 6 pext             : 1000 (+695,=304,-  1), 84.7 %
**********************************************************
Eman 6.60 avx2 2nd_run: 3716 7000 (+3927,=3007,- 66), 77.6 %

 

RubiChess 1.9dev nnue      : 1000 (+802,=196,-  2), 90.0 %
KomodoDragon 1.0 avx2      : 1000 (+202,=782,- 16), 59.3 %
Slow Chess 2.4 popc        : 1000 (+743,=254,-  3), 87.0 %
Ethereal 12.75 avx2        : 1000 (+712,=285,-  3), 85.5 %
Nemorino 6.00 avx2         : 1000 (+645,=353,-  2), 82.2 %
Stockfish 12 200902        : 1000 (+110,=854,- 36), 53.7 %
Houdini 6 pext             : 1000 (+713,=283,-  4), 85.5 %
**********************************************************
Eman 6.60 avx2        : 3712 7000 (+3900,=3018,- 82), 77.3 %

 

RubiChess 1.9dev nnue      : 1000 (+803,=197,-  0), 90.2 %
KomodoDragon 1.0 avx2      : 1000 (+205,=770,- 25), 59.0 %
Slow Chess 2.4 popc        : 1000 (+740,=254,-  6), 86.7 %
Ethereal 12.75 avx2        : 1000 (+718,=277,-  5), 85.7 %
Nemorino 6.00 avx2         : 1000 (+620,=378,-  2), 80.9 %
Stockfish 12 200902        : 1000 (+113,=849,- 38), 53.8 %
Houdini 6 pext             : 1000 (+701,=293,-  6), 84.8 %
**********************************************************

 

2020/02/05 Huge experimental test-tournament (31500 games (!)) of Stockfish 11 with 7 different Contempts (-40, -24, -15, 0, +15, +24 (=default of SF 11), +40).

 

Thinking-time: 1'+1'', singlethread, 256 MB Hash, no ponder, no endgame bases for the engines (Syzygy up to 5 pieces for cutechess-cli). 5-move human openings.

Download all played games here

 

     Program                 Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 11 C=0      : 3558    4    4  9000    50.7 %   3553   79.3 %
   2 Stockfish 11 C=+15    : 3558    4    4  9000    50.6 %   3554   77.8 %
   3 Stockfish 11 C=-24    : 3555    4    4  9000    50.1 %   3554   82.7 %
   4 Stockfish 11 C=-15    : 3554    4    4  9000    50.0 %   3554   81.7 %
   5 Stockfish 11 C=+24    : 3554    4    4  9000    50.0 %   3554   76.2 %
   6 Stockfish 11 C=+40    : 3551    4    4  9000    49.5 %   3555   73.8 %
   7 Stockfish 11 C=-40    : 3549    4    4  9000    49.2 %   3555   84.5 %

 

Conclusions: Only the +40 and -40 contempt results are somewhat weaker. All other contempt values are at the same level of strength, within the error bars.

 

 

 

2019/02/20 Testrun of the new Drawkiller balanced set (and testruns of the Drawkiller tournament, Stockfish Framework 8moves and GM-4moves sets for comparison).

 

3 engines played a round-robin (Stockfish 10, Houdini 6 and Komodo 12), with 500 games in each head-to-head, so each engine played 1000 games. For each game, one opening line was chosen at random by the LittleBlitzerGUI.

Singlecore, 3'+1'', LittleBlitzerGUI, no ponder, no bases, 256 MB Hash, i7-6700HQ 2.6GHz Notebook (Skylake CPU), Windows 10 64bit

 

In the Drawkiller balanced sets, all end-position evals of the opening lines (analyzed by Komodo) lie in a very small interval of [-0.09;+0.09]. The idea is that this should lead to a wider Elo-spreading of the engine ratings, which makes the engine rankings much more statistically reliable (or far fewer games are needed to get the results outside the error bars). On the other hand, this concept of course leads to slightly higher draw-rates...
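A minimal Python sketch of such an eval filter, using the python-chess engine interface (this is not the tooling I used; the engine path, the analysis depth and the file names are placeholder assumptions):

import chess
import chess.engine

BOUND = 0.09                                   # keep only evals inside [-0.09, +0.09]
LIMIT = chess.engine.Limit(depth=20)           # assumed analysis depth

engine = chess.engine.SimpleEngine.popen_uci("./komodo")     # placeholder engine path

with open("candidate_lines.epd") as src, open("drawkiller_balanced.epd", "w") as dst:
    for line in src:
        fields = line.split()
        if len(fields) < 4:
            continue
        board = chess.Board(" ".join(fields[:4]) + " 0 1")   # plain 4-field EPD assumed
        info = engine.analyse(board, LIMIT)
        eval_pawns = info["score"].white().score(mate_score=100000) / 100.0
        if abs(eval_pawns) <= BOUND:
            dst.write(line)

engine.quit()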

Let's see if it worked:

 

Drawkiller balanced: 

 

     Program                Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2    : 3506   11   11  1000    70.9 %   3347   36.2 %
   2 Houdini 6 pext       : 3392   11   11  1000    48.5 %   3404   40.8 %
   3 Komodo 12 bmi2       : 3302   11   11  1000    30.6 %   3449   36.6 %

 

Elo-spreading (1st to last): 204 Elo

Draws: 37.9%

 

 

Drawkiller tournament:

 

     Program                Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2    : 3494   11   11  1000    68.9 %   3353   34.2 %
   2 Houdini 6 pext       : 3387   11   11  1000    47.3 %   3407   38.2 %
   3 Komodo 12 bmi2       : 3320   11   11  1000    33.8 %   3440   36.0 %

 

Elo-spreading (1st to last): 174 Elo

Draws: 36.1%

 

 

GM_4moves:

 

     Program                Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2    : 3475   11   11  1000    65.4 %   3363   53.2 %
   2 Houdini 6 pext       : 3381   10   10  1000    46.0 %   3410   59.9 %
   3 Komodo 12 bmi2       : 3345   10   10  1000    38.5 %   3428   55.9 %

 

Elo-spreading (1st to last): 130 Elo

Draws: 56.3%

 

 

Stockfish framework 8moves:

 

     Program                Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2    : 3463   11   11  1000    63.0 %   3369   59.7 %
   2 Houdini 6 pext       : 3388   10   10  1000    47.5 %   3406   64.2 %
   3 Komodo 12 bmi2       : 3349   10   10  1000    39.5 %   3425   60.1 %

 

Elo-spreading (1st to last): 114 Elo

Draws: 61.3%

 

Conclusions: 

 

1) The Drawkiller balanced idea was a success. The draw-rate is a little higher than with Drawkiller tournament (that is the price we have to pay for 2)), but look at point 2) and note that even this slightly higher draw-rate is still much, much lower than the draw-rate of any non-Drawkiller opening set...

 

2) The Elo-spreading with Drawkiller balanced was measurably higher than with any other opening set. That makes the engine rankings much more statistically reliable, or a much lower number of games is needed to get the results outside the error bars: 

Example: Compared to the result with Stockfish Framework 8moves openings, the Elo-spreading of Drawkiller balanced is nearly doubled, which means the error bars may be twice as large for the same statistical reliability of the engine rankings in a tournament / ratinglist. Note that you have to play 4x more games to halve the size of an error bar! That means that with Drawkiller balanced openings you only have to play about 25%-30% of the games you would need with Stockfish Framework 8moves openings for the same statistical quality of the engine rankings (!!!) - how awesome is that?!?
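A short sanity check of this claim, assuming the usual behaviour that rating error bars shrink with the square root of the number of games, so that the required number of games scales with the inverse square of the Elo-spreading (at a fixed ratio of error bar to spreading):

\[
\text{errorbar} \approx \frac{k}{\sqrt{N}}
\qquad\Longrightarrow\qquad
\frac{N_{\text{Drawkiller}}}{N_{\text{8moves}}}
= \left(\frac{\text{spread}_{\text{8moves}}}{\text{spread}_{\text{Drawkiller}}}\right)^{2}
= \left(\frac{114}{204}\right)^{2} \approx 0.31
\]

So roughly 31% of the games are needed, close to the 25%-30% estimate above.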

 

 

 

2019/01/26 Testrun of the Skill-Levels of Stockfish 10

 

I made two large testruns with Stockfish 10, playing RoundRobin vs. itself with different Skill-Levels.

First testrun: Level 20-10 (11000 games, 1'+1'', singlecore)

Second testrun: Level 10-0 (5500 games 1'+1'', singlecore)

Then both game pools were merged and rated with ORDO (anchored to 3450 Elo for Stockfish 10, Level 20, which is the Elo of Stockfish 10 in the CEGT ratinglist (40m/4', single CPU)).
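For reference, such an anchored Ordo calculation looks roughly like the sketch below (a hedged example; the file name is a placeholder, the anchor name must match the player name in the PGN, and option letters can differ between Ordo versions):

ordo -p skill_levels_all.pgn -a 3450 -A "Stockfish 10 bmi2 (100%)" -o skill_level_ratings.txt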

Specs: Intel quad-core notebook (SF 10 around 1.4 Mn/s in singlecore mode), LittleBlitzerGUI, Stockfish Framework 8move openings. No ponder, no bases. 256MB Hash per engine.

 

     Program                        Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2 (100%)     : 3450   47   47  2000    98.5 %   2601    2.8 %
   2 Stockfish 10 lev=19 (95%)    : 2905   22   22  2000    73.9 %   2656   16.9 %
   3 Stockfish 10 lev=18 (90%)    : 2872   22   22  2000    71.2 %   2659   17.6 %
   4 Stockfish 10 lev=17 (85%)    : 2815   22   22  2000    66.0 %   2665   17.7 %
   5 Stockfish 10 lev=16 (80%)    : 2761   21   21  2000    60.8 %   2670   20.4 %
   6 Stockfish 10 lev=15 (75%)    : 2657   21   21  2000    50.3 %   2681   19.9 %
   7 Stockfish 10 lev=14 (70%)    : 2571   21   21  2000    41.5 %   2689   15.6 %
   8 Stockfish 10 lev=13 (65%)    : 2483   21   21  2000    32.7 %   2698   13.9 %
   9 Stockfish 10 lev=12 (60%)    : 2406   22   22  2000    25.5 %   2706   12.4 %
  10 Stockfish 10 lev=11 (55%)    : 2320   21   21  2000    18.3 %   2714   10.5 %
  11 Stockfish 10 lev=10 (50%)    : 2221   16   16  3000    36.9 %   2386    5.9 %
  12 Stockfish 10 lev=9 (45%)     : 2129   26   26  1000    81.8 %   1720    4.0 %
  13 Stockfish 10 lev=8 (40%)     : 2067   25   25  1000    76.8 %   1726    4.2 %
  14 Stockfish 10 lev=7 (35%)     : 1976   25   25  1000    68.8 %   1735    4.9 %
  15 Stockfish 10 lev=6 (30%)     : 1881   25   25  1000    60.2 %   1745    3.4 %
  16 Stockfish 10 lev=5 (25%)     : 1823   25   25  1000    54.9 %   1751    2.8 %
  17 Stockfish 10 lev=4 (20%)     : 1678   26   26  1000    42.0 %   1765    2.5 %
  18 Stockfish 10 lev=3 (15%)     : 1538   28   28  1000    30.1 %   1779    1.1 %
  19 Stockfish 10 lev=2 (10%)     : 1443   29   29  1000    22.7 %   1789    1.0 %
  20 Stockfish 10 lev=1 (5%)      : 1341   32   32  1000    15.4 %   1799    0.3 %
  21 Stockfish 10 lev=0 (0%)      : 1231   36   36  1000     8.8 %   1810    0.3 %

 

(Stockfish 10: 3450 Elo is the CEGT rating at 40m/4'.) The percentages in brackets are the values of the "strength meter" in the DroidFish app for smartphones...

 

 

2019/01/06 One of the biggest opening-set tests of all time!

 

8 opening-sets were tested: Drawkiller tournament, SALC V5, Noomen (TCEC openings Season 9-13 Superfinal and Gambit openings), Stockfish Framework 2-moves and 8-moves openings, 4 GM moves (from MegaBase 2018, checked with Komodo), the HERT set by Thomas Zipproth and FEOBOS v20.1 contempt 3 (using the contempt 3 openings is recommended by the author, Frank Quisinsky). 7 engines played a 2100-game round-robin tournament with each opening-set (no opening-set playing versus another opening-set!). For each game, one opening line was chosen at random by the GUI.

7 engines played round-robin: Stockfish 10, Houdini 6, Komodo 12, Fire 7.1, Ethereal 11.12, Komodo 12.2.2 MCTS, Shredder 13. 100 games were played in each head-to-head pairing, so in each round-robin each engine played 600 games.

Singlecore, 3'+1'', LittleBlitzerGUI, no ponder, no bases, 256 MB Hash, i7-6700HQ 2.6GHz Notebook (Skylake CPU), Windows 10 64bit. 3 games running in parallel; each testrun took 3-4 days, depending on the average game duration. Draw adjudication after 130 moves played by the engines (after finishing the opening line).


Conclusions (all data and results below!):

 

First of all, the main question: why are low draw-rates and a wide Elo-spreading of engine-testing results better? You can find the answer here

 

This excellent experiment by Andreas Strangmueller shows beyond any doubt that:

The more thinking time (or faster hardware, which is the same thing!) computerchess gets, the more the draw-rates climb and the more the Elo-spreadings shrink. So it is only a question of time until the draw-rates get so high and the Elo-spreading of testing results gets so small that engine testing and engine tournaments no longer give any valuable results, because the Elo-differences will always stay inside the error bars, even with thousands of played games. So it is absolutely necessary to lower the draw-rates and raise the Elo-spreadings, if computerchess is to survive the next decades!

Hence the following conclusions from this huge experiment with different opening-sets:

 

1) The Drawkiller openings are a breakthrough into another dimension of engine-testing: the overall draw-rate (27%) is nearly halved compared to classical opening sets (FEOBOS (51.3%), Stockfish Framework 8moves openings (51.9%)) AND the Elo-spreading is around +150 Elo wider (!!), so the rankings are much more stable and reliable, because the error bars of all results are nearly the same in all testruns. And the average game duration with Drawkiller was 11.5% lower than with a classical opening set. So, in the same time, you can play more than +10% more games on the same machine, which also improves the quality of the results, because the error bars get smaller with more played games. Download the future of computerchess (the Drawkiller openings): here

 

2) The order of rank of the engines is exactly the same in all mini-ratinglists generated by ORDO from these testruns. So what we learn here is that it does not matter whether an opening-set contains all ECO codes (FEOBOS does!) or not (Drawkiller and SALC V5 definitely do not!): the order of rank of the engines in a ratinglist is exactly the same! So the endlessly repeated claim by many people that using all (or the most frequently played) ECO codes in an opening-set is important for engine-testing, because otherwise the results are distorted, is a FAIRY TALE and nothing else !!!

 

3) At the bottom, I added the CEGT and CCRL ratinglists with the same engines that were used for this project (nearly the same versions (Ethereal 11 instead of Ethereal 11.12, for example)). There you can see that the ranking in these ratinglists is exactly the same, too. So what we learn here is that the endlessly repeated claim by many people that it is necessary to test engines against a lot of opponents for a valid rating/ranking is a FAIRY TALE, too: 6 opponents gave the same ranking results in all testruns of this project as CEGT and CCRL do with many, many more opponents.

 

4) The FEOBOS project was a complete waste of time and resources. It took more than one year of work and calculations, but the results are not measurably better than the results with the Stockfish Framework 8moves openings: the overall draw-rate is 0.6% better (= nothing). The Elo-spreading is +21 Elo better (= nearly nothing). And the prime target of FEOBOS was to avoid early draws: the number of early draws within 10 moves (after leaving the opening line) is 0, but with the Stockfish Framework 8moves openings there are only 6 games drawn within 10 moves, out of 2100 games (= 0.29%). And for the number of early draws within 20 and within 30 moves, FEOBOS is slightly worse than the Stockfish Framework 8moves openings. So even the prime target of FEOBOS failed.

 

5) The Noomen openings, i.e. the lines which lower the draw-rate in the TCEC superfinals plus the Noomen gambit lines, lowered the overall draw-rate (43%) compared to classical opening sets (FEOBOS (51.3%), Stockfish Framework 8moves openings (51.9%)), but not by very much (compared to Drawkiller (27%)). And the Elo-spreading is only a little better than that of the classical opening-sets. So these Noomen openings are a small improvement, but not more. And the number of openings is very, very small (only 477 lines): too small for building an opening book. Drawkiller tournament contains 6848 lines and can, of course, be used as an opening book.

 


Short summary (sorted by the overall draw-rate):


Drawkiller tournament:

Draws        : 566 (27.0 %)
Avg game length = 389.777 sec
Elo-spreading: from first to last: 448 Elo


SALC V5:

Draws        : 829 (39.5 %)
Avg game length = 399.781 sec
Elo-spreading: from first to last: 341 Elo


Noomen (TCEC openings Season 9-13 Superfinal and Gambit-openings (477 lines)):

Draws        : 902 (43.0 %)
Avg game length = 405.223 sec
Elo-spreading: from first to last: 312 Elo


Stockfish Framework 2moves openings:

Draws        : 929 (44.2 %)
Avg game length = 430.108 sec
Elo-spreading: from first to last: 333 Elo


4 GM moves (out of MegaBase 2018, checked with Komodo):

Draws        : 975 (46.4 %)
Avg game length = 449.414 sec
Elo-spreading: from first to last: 330 Elo


HERT set (500 pos):

Draws        : 1013 (48.2 %)
Avg game length = 442.339 sec
Elo-spreading: from first to last: 316 Elo


FEOBOS v20.1 contempt 3:

Draws        : 1077 (51.3 %)
Avg game length = 437.481 sec
Elo-spreading: from first to last: 302 Elo


Stockfish Framework 8moves openings:

Draws        : 1090 (51.9 %)
Avg game length = 438.899 sec
Elo-spreading: from first to last: 281 Elo

 

 

Long summary (with ratinglists):


Drawkiller tournament:

 

Avg game length = 389.777 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3459   23   23   600    82.6 %   3157   21.8 %
   2 Houdini 6 pext         : 3356   20   20   600    71.1 %   3174   29.2 %
   3 Komodo 12 bmi2         : 3294   20   20   600    63.3 %   3184   30.8 %
   4 Fire 7.1 popc          : 3145   19   19   600    42.8 %   3209   30.3 %
   5 Ethereal 11.12 pext    : 3076   19   19   600    33.5 %   3221   28.0 %
   6 Komodo 12.2.2 MCTS     : 3060   20   20   600    31.4 %   3223   25.2 %
   7 Shredder 13 x64        : 3011   20   20   600    25.3 %   3231   23.3 %

 

Elo-spreading: from first to last: 448 Elo

Number of early draws:
first 10 moves played by engines: 0 draws= 0%
first 20 moves played by engines: 10 draws= 0.48%
first 30 moves played by engines: 46 draws= 2.19%

 

Games        : 2100 (finished)
White Wins   : 822 (39.1 %)
Black Wins   : 712 (33.9 %)
Draws        : 566 (27.0 %)
White Score  : 52.6 %
Black Score  : 47.4 %


 

SALC V5:

 

Avg game length = 399.781 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3404   21   21   600    78.3 %   3166   32.8 %
   2 Houdini 6 pext         : 3304   19   19   600    65.4 %   3183   39.8 %
   3 Komodo 12 bmi2         : 3266   18   18   600    60.1 %   3189   44.5 %
   4 Fire 7.1 popc          : 3166   18   18   600    45.3 %   3206   46.2 %
   5 Ethereal 11.12 pext    : 3120   18   18   600    38.4 %   3213   43.2 %
   6 Komodo 12.2.2 MCTS     : 3076   19   19   600    32.2 %   3221   34.3 %
   7 Shredder 13 x64        : 3063   19   19   600    30.4 %   3223   35.5 %

 

Elo-spreading: from first to last: 341 Elo

Number of early draws:
first 10 moves played by engines: 5 draws= 0.24%
first 20 moves played by engines: 39 draws= 1.86%
first 30 moves played by engines: 81 draws= 3.86%

 

Games        : 2100 (finished)
White Wins   : 689 (32.8 %)
Black Wins   : 582 (27.7 %)
Draws        : 829 (39.5 %)
White Score  : 52.5 %
Black Score  : 47.5 %

 

 

Noomen (TCEC openings Season 9-13 Superfinal and Gambit-openings (477 lines)):

 

Avg game length = 405.223 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3388   20   20   600    76.8 %   3169   39.7 %
   2 Houdini 6 pext         : 3289   19   19   600    63.6 %   3185   42.8 %
   3 Komodo 12 bmi2         : 3257   18   18   600    58.8 %   3191   42.0 %
   4 Fire 7.1 popc          : 3170   17   17   600    45.6 %   3205   45.8 %
   5 Ethereal 11.12 pext    : 3129   18   18   600    39.5 %   3212   46.3 %
   6 Komodo 12.2.2 MCTS     : 3091   18   18   600    33.9 %   3218   40.2 %
   7 Shredder 13 x64        : 3076   18   18   600    31.8 %   3221   43.8 %

 

Elo-spreading: from first to last: 312 Elo

Number of early draws:
first 10 moves played by engines: 7 draws= 0.33%
first 20 moves played by engines: 32 draws= 1.52%
first 30 moves played by engines: 90 draws= 4.29%

 

Games        : 2100 (finished)
White Wins   : 691 (32.9 %)
Black Wins   : 507 (24.1 %)
Draws        : 902 (43.0 %)
White Score  : 54.4 %
Black Score  : 45.6 %

 

 

Stockfish Framework 2moves openings:

 

Avg game length = 430.108 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3395   20   20   600    77.5 %   3168   35.0 %
   2 Houdini 6 pext         : 3291   18   18   600    63.8 %   3185   46.8 %
   3 Komodo 12 bmi2         : 3254   18   18   600    58.4 %   3191   48.8 %
   4 Fire 7.1 popc          : 3164   17   17   600    44.8 %   3206   46.2 %
   5 Ethereal 11.12 pext    : 3142   18   18   600    41.5 %   3210   48.0 %
   6 Komodo 12.2.2 MCTS     : 3092   19   19   600    34.1 %   3218   44.8 %
   7 Shredder 13 x64        : 3062   19   19   600    30.0 %   3223   40.0 %

 

Elo-spreading: from first to last: 333 Elo

Number of early draws:
first 10 moves played by engines: 1 draws= 0.05%
first 20 moves played by engines: 12 draws= 0.57%
first 30 moves played by engines: 31 draws= 1.48%

 

Games        : 2100 (finished)
White Wins   : 689 (32.8 %)
Black Wins   : 482 (23.0 %)
Draws        : 929 (44.2 %)
White Score  : 54.9 %
Black Score  : 45.1 %

 

 

4 GM moves (out of MegaBase 2018, checked with Komodo):

 

Avg game length = 449.414 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3396   20   20   600    77.5 %   3167   37.3 %
   2 Houdini 6 pext         : 3307   19   19   600    65.9 %   3182   48.2 %
   3 Komodo 12 bmi2         : 3262   18   18   600    59.5 %   3190   52.0 %
   4 Fire 7.1 popc          : 3151   17   17   600    42.9 %   3208   48.5 %
   5 Ethereal 11.12 pext    : 3119   18   18   600    38.2 %   3213   52.0 %
   6 Komodo 12.2.2 MCTS     : 3099   18   18   600    35.3 %   3217   45.3 %
   7 Shredder 13 x64        : 3066   19   19   600    30.7 %   3222   41.7 %

 

Elo-spreading: from first to last: 330 Elo

Number of early draws:
first 10 moves played by engines: 1 draws= 0.05%
first 20 moves played by engines: 7 draws= 0.33%
first 30 moves played by engines: 25 draws= 1.19%

 

Games        : 2100 (finished)
White Wins   : 679 (32.3 %)
Black Wins   : 446 (21.2 %)
Draws        : 975 (46.4 %)
White Score  : 55.5 %
Black Score  : 44.5 %

 

 

HERT set (500 pos):

 

Avg game length = 442.339 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3384   20   20   600    76.3 %   3169   42.2 %
   2 Houdini 6 pext         : 3300   19   19   600    65.1 %   3183   48.2 %
   3 Komodo 12 bmi2         : 3270   19   19   600    60.7 %   3188   53.3 %
   4 Fire 7.1 popc          : 3139   18   18   600    41.0 %   3210   52.3 %
   5 Ethereal 11.12 pext    : 3131   18   18   600    39.8 %   3212   50.8 %
   6 Komodo 12.2.2 MCTS     : 3108   18   18   600    36.4 %   3215   46.5 %
   7 Shredder 13 x64        : 3068   19   19   600    30.8 %   3222   44.3 %

 

Elo-spreading: from first to last: 316 Elo

Number of early draws:
first 10 moves played by engines: 4 draws= 0.19%
first 20 moves played by engines: 19 draws= 0.90%
first 30 moves played by engines: 46 draws= 2.19%

 

Games        : 2100 (finished)
White Wins   : 661 (31.5 %)
Black Wins   : 426 (20.3 %)
Draws        : 1013 (48.2 %)
White Score  : 55.6 %
Black Score  : 44.4 %

 

 

FEOBOS v20.1 contempt 3:

 

Avg game length = 437.481 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3365   19   19   600    73.9 %   3173   45.5 %
   2 Houdini 6 pext         : 3301   19   19   600    65.3 %   3183   51.8 %
   3 Komodo 12 bmi2         : 3265   18   18   600    60.0 %   3189   55.0 %
   4 Fire 7.1 popc          : 3161   17   17   600    44.2 %   3206   59.7 %
   5 Ethereal 11.12 pext    : 3151   18   18   600    42.6 %   3208   53.8 %
   6 Komodo 12.2.2 MCTS     : 3094   18   18   600    34.3 %   3218   47.5 %
   7 Shredder 13 x64        : 3063   19   19   600    29.8 %   3223   45.7 %

 

Elo-spreading: from first to last: 302 Elo

Number of early draws:
first 10 moves played by engines: 0 draws= 0%
first 20 moves played by engines: 22 draws= 1.05%
first 30 moves played by engines: 61 draws= 2.90%

 

Games        : 2100 (finished)
White Wins   : 638 (30.4 %)
Black Wins   : 385 (18.3 %)
Draws        : 1077 (51.3 %)
White Score  : 56.0 %
Black Score  : 44.0 %

 

 

Stockfish Framework 8moves openings:

 

Avg game length = 438.899 sec

 

     Program                  Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 10 bmi2      : 3363   19   19   600    73.9 %   3173   44.8 %
   2 Houdini 6 pext         : 3276   18   18   600    61.8 %   3187   52.8 %
   3 Komodo 12 bmi2         : 3267   18   18   600    60.3 %   3189   55.7 %
   4 Fire 7.1 popc          : 3167   17   17   600    45.0 %   3206   54.0 %
   5 Ethereal 11.12 pext    : 3140   17   17   600    40.8 %   3210   52.3 %
   6 Komodo 12.2.2 MCTS     : 3106   18   18   600    35.8 %   3216   52.0 %
   7 Shredder 13 x64        : 3082   18   18   600    32.3 %   3220   51.7 %

 

Elo-spreading: from first to last: 281 Elo

Number of early draws:
first 10 moves played by engines: 6 draws= 0.29%
first 20 moves played by engines: 20 draws= 0.95%
first 30 moves played by engines: 53 draws= 2.52%

 

Games        : 2100 (finished)
White Wins   : 610 (29.0 %)
Black Wins   : 400 (19.0 %)
Draws        : 1090 (51.9 %)
White Score  : 55.0 %
Black Score  : 45.0 %

 

 

For comparsion:

 

CEGT 40/4 ratinglist (singlecore):

 

1     Stockfish 10.0 x64 1CPU     3450
2     Houdini 6.0 x64 1CPU        3372
3     Komodo 12.1.1 x64 1CPU      3337
4     Fire 7.1 x64 1CPU           3242
5     Ethereal 11.00 x64 1CPU     3186
6     Komodo 12.2 x64 1CPU (MCTS) 3182
7     Shredder 13 x64 1CPU        3152

 

Elo-spreading: from first to last: 298 Elo

 

 

CCRL 40/4 ratinglist (singlecore):

 

1    Stockfish 10 64-bit          3498
2    Houdini 6 64-bit             3446
3    Komodo 12 64-bit             3410
4    Fire 7.1 64-bit              3333
5    Ethereal 11.00 64-bit        3301
6    Komodo 12.2.2 MCTS 64-bit    3288
7    Shredder 13 64-bit           3269

 

Elo-spreading: from first to last: 229 Elo by bayeselo. (With ORDO: 276 Elo)

 

 

 

2017/10/22 Some days ago I had the idea to filter half-closed positions out of my SALC V3 opening-set. This means that in the end positions of the opening lines, the following conditions have to hold:

1) On the d-file or the e-file there is at least one white and one black pawn (= one of the two center files is closed)
2) No pawn capture on the center squares (e4, d4, e5, d5) is possible (i.e. not allowed: white pawn on e4 and black pawn on d5, or white pawn on d4 and black pawn on e5), so the position cannot become fully open after 1 or 2 moves played by the engines.
3) No pawn-free d-file while both queens are on the d-file, so the queens cannot capture each other after 1 or 2 moves played by the engines.

The idea is that in these positions the probability of many quick captures is much lower, so it should take more time (and moves) to reach drawish endgame positions. So the probability of an interesting and long middlegame should be higher...
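Below is a minimal Python sketch (using the python-chess library) of how these three conditions can be checked when filtering an EPD file. It is not the tool I used; condition 1) is implemented according to my reading ("at least one of the two center files carries both a white and a black pawn"), and the file names are just examples.

import chess

D_FILE, E_FILE = chess.BB_FILES[3], chess.BB_FILES[4]   # bitboards of the d- and e-file

def is_half_closed(board: chess.Board) -> bool:
    """Sketch of the three SALC 'half-closed' conditions - illustrative only."""
    wp = board.pieces_mask(chess.PAWN, chess.WHITE)
    bp = board.pieces_mask(chess.PAWN, chess.BLACK)

    # 1) at least one of the two center files carries a white AND a black pawn
    cond1 = any((wp & f) and (bp & f) for f in (D_FILE, E_FILE))

    # 2) no immediate pawn capture on a center square:
    #    forbid (white pawn e4 vs. black pawn d5) and (white pawn d4 vs. black pawn e5)
    def pawn(sq, color):
        return board.piece_at(sq) == chess.Piece(chess.PAWN, color)
    cond2 = not ((pawn(chess.E4, chess.WHITE) and pawn(chess.D5, chess.BLACK)) or
                 (pawn(chess.D4, chess.WHITE) and pawn(chess.E5, chess.BLACK)))

    # 3) if both queens stand on the d-file, the d-file must not be free of pawns
    wq = board.pieces_mask(chess.QUEEN, chess.WHITE)
    bq = board.pieces_mask(chess.QUEEN, chess.BLACK)
    cond3 = not ((wq & D_FILE) and (bq & D_FILE) and not ((wp | bp) & D_FILE))

    return cond1 and cond2 and cond3

# filter a plain 4-field EPD file (file names are just examples)
with open("SALC_V3.epd") as src, open("SALC_half_closed.epd", "w") as dst:
    for line in src:
        fields = line.split()
        if len(fields) < 4:
            continue
        board = chess.Board(" ".join(fields[:4]) + " 0 1")
        if is_half_closed(board):
            dst.write(line)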

I did a testrun with these positions (with exactly the same testing conditions as the experimental testruns of the 2017/10/17 experiment below, so the results are comparable). The result is really surprising and much better than I expected.

 

Games Completed = 1000 of 1000 (Avg game length = 920.236 sec)
Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer\SALC_half_closed.epd(7053)
 1.  asmFish 170426 x64  631.0/1000  387-125-draws:488 (L: m=0 t=0 i=0 a=125) (D: r=143 i=198 f=29 s=4 a=114)    (tpm=6691.2 d=30.13 nps=2329022)
 2.  Komodo 10.4 x64  369.0/1000 125-387-488  (L: m=0 t=0 i=0 a=387) (D: r=143 i=198 f=29 s=4 a=114) (tpm=7026.2 d=26.03 nps=1491141)
 

The result is impressive. The draw-rate (48.8%) is more than 5 percentage points lower than with "normal" SALC (53.9%) and 14.6 percentage points lower than with the Stockfish Framework 8-move opening set (63.4%) (this means the number of draws is 23% lower with SALC half-closed!!!) - that is a huge step forward in my mission to save computerchess from draw-death!

And the Elo-differences of the engine scores are not getting smaller (which would happen if the opening positions had huge advantages for White or Black), they are getting larger (score of asmFish vs. Komodo with the standard openings: 60.3%, and with SALC half-closed: 62.0%... actually 63.1% including the rescaled draws)!

And take a look at the average game length: 1036 sec with the standard openings and only 920 sec with SALC half-closed: you need 11.2% less time, using SALC half-closed, for the same number of games. This testrun (3 games played simultaneously) ran nearly half a day shorter than the testrun with the standard openings.

 

2017/10/17 After the release of the FEOBOS v10 opening books and files (by Frank Quisinsky and Klaus Wlotzka), with the new "contempt books/opening sets", I was curious to see whether the opening set with the highest contempt, 5 (meaning that none of the 10 analyzing engines had a 0.00 evaluation in any opening-line end position), could lower the draw-rate in engine testing (compared to my SALC openings and the standard 8-move opening set of the Stockfish framework). In May 2017, I did 3 huge testruns (using SALC, the Stockfish opening set and FEOBOS v3 beta). Now I have tested the FEOBOS v10 Contempt 5 opening set under exactly the same conditions, so there was no need to replay the testruns using SALC and the Stockfish framework openings... (scroll down to the 3 experimental testruns of 2017/05/19).

 

asmFish played 1000 games versus Komodo 10.4 with each of the 3 books/opening sets (= 3000 games). Not bullet speed, but 5'+3'' (!), singlecore, 256 MB Hash, no pondering, both engines with Contempt=+15. LittleBlitzerGUI (in RoundRobin playmode, in which for each game one opening position is chosen at random from an epd openings file).

 

Games Completed = 1000 of 1000 (Avg game length = 1026.116 sec)
Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer\feobos_v10_c5.epd(12412)
 1.  asmFish 170426 x64 612.0/1000  312-88-draws:600 (L: m=0 t=0 i=0 a=88) (D: r=139 i=221 f=51 s=2 a=187) (tpm=6327.1 d=31.20 nps=2379009)
 2.  Komodo 10.4 x64 388.0/1000  88-312-600 (L: m=0 t=0 i=0 a=312) (D: r=139 i=221 f=51 s=2 a=187) (tpm=6473.7 d=26.68 nps=1493768)

Games Completed = 1000 of 1000 (Avg game length = 944.640 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer\SALC_V2_10moves.epd(10000)

1. asmFish 170426 x64 620.5/1000 351-110-draws: 539 (L: m=0 t=0 i=0 a=110) (D: r=149 i=231 f=38 s=0 a=121) (tpm=6659.0 d=30.93 nps=2552099)

2. Komodo 10.4 x64 379.5/1000 110-351-539 (L: m=0 t=0 i=0 a=351) (D: r=149 i=231 f=38 s=0 a=121) (tpm=6920.9 d=26.71 nps=1619591)

 

Games Completed = 1000 of 1000 (Avg game length = 1036.164 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer3\34700_ok.epd(32000)

1. asmFish 170426 x64 603.0/1000 286-80-draws: 634 (L: m=0 t=0 i=0 a=80) (D: r=148 i=232 f=39 s=1 a=214) (tpm=6334.2 d=31.54 nps=2570164)

2. Komodo 10.4 x64 397.0/1000 80-286-634 (L: m=0 t=2 i=0 a=284) (D: r=148 i=232 f=39 s=1 a=214) (tpm=6473.6 d=27.00 nps=1614400)

 

Conclusions: The FEOBOS v10 Contempt 5 positions lowered the draw-rate compared to the Stockfish opening set from 63.4% to 60.0%, and the share of 3fold-repetition draws from 14.8% to 13.9%. That is small, but measurable, progress. It is still far away from the low draw-rate of the SALC openings (SALC lowered the draw-rate from 63.4% to 53.9% (!)), but note that FEOBOS plays a wide variety of all openings, whereas SALC plays only lines where White and Black castled to opposite sides of the board.

 

 

2017/09/01 I measured the speed of Stockfish compiles (abrok, ultimaiq and BrainFish (without the Cerebellum library, BrainFish is identical to Stockfish)). Stockfish C++ code from 170905, measured with fishbench (10 runs per version), i7-6700HQ 2.6 GHz Skylake CPU. These are the results:

abrok modern    : 1.557 mn/s
abrok bmi2      : 1.611 mn/s

ultimaiq modern : 1.660 mn/s
ultimaiq bmi2   : 1.702 mn/s

brainfish modern: 1.729 mn/s
brainfish bmi2  : 1.764 mn/s


modern:
abrok -> ultimaiq = +6.6% speedup
ultimaiq -> brainfish = +4.2% speedup

 

bmi2:
abrok -> ultimaiq = +5.6% speedup
ultimaiq -> brainfish = +3.6% speedup
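
The percentages above are simple ratios of the fishbench node speeds (mn/s = million nodes per second, averaged over the 10 runs). A small Python sketch to reproduce them:

speeds = {
    "abrok":     {"modern": 1.557, "bmi2": 1.611},
    "ultimaiq":  {"modern": 1.660, "bmi2": 1.702},
    "brainfish": {"modern": 1.729, "bmi2": 1.764},
}

def speedup(new, old):
    # relative speed gain in percent
    return (new / old - 1) * 100

for build in ("modern", "bmi2"):
    a, u, b = (speeds[e][build] for e in ("abrok", "ultimaiq", "brainfish"))
    print(f"{build}: abrok->ultimaiq +{speedup(u, a):.1f} %, "
          f"ultimaiq->brainfish +{speedup(b, u):.1f} %, "
          f"abrok->brainfish +{speedup(b, a):.1f} %")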

 

So, the ultimaiq compiles are around 6% faster than the abrok compiles, but BrainFish is around 10% faster than abrok! From now on, I will use the BrainFish compiles (without the Cerebellum library) for my Stockfish testruns, because they are the fastest compiles at the moment and the results are better comparable with the BrainFish testruns in which BrainFish uses the Cerebellum library.

 

2017/08/23 Using the new HERT openings set (by Thomas Zipproth) for my Stockfish testing was a great opportunity to compare the gamebase played with HERT (which contains positions selected from the most-played variations in engine and human tournaments) with the gamebase played with my SALC openings (SALC means: only positions with castling to opposite sides and both queens still on the board. The idea was to lower the draw-rate in computerchess and make the games more tactical and thrilling, without distorting the results of engine tests and engine tournaments). So, here are the results. Both gamebases were played with 3'+1'', singlecore, 512 MB Hash. The only difference was the opening set (HERT / SALC)...
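My reading of the SALC selection criterion, sketched with the python-chess library (this is only a reconstruction for illustration; it is not the tool the openings were actually built with):

import chess
import chess.pgn

def is_salc_like(game):
    # True if both sides castled to opposite wings and both queens
    # were still on the board at the moment the second side castled
    board = game.board()
    castled = {}                                   # chess.WHITE/BLACK -> "K" or "Q"
    for move in game.mainline_moves():
        if board.is_kingside_castling(move):
            castled[board.turn] = "K"
        elif board.is_queenside_castling(move):
            castled[board.turn] = "Q"
        board.push(move)
        if len(castled) == 2:
            queens_on = (board.pieces(chess.QUEEN, chess.WHITE)
                         and board.pieces(chess.QUEEN, chess.BLACK))
            return castled[chess.WHITE] != castled[chess.BLACK] and bool(queens_on)
    return False

# usage: game = chess.pgn.read_game(pgn_file_handle); is_salc_like(game)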

 

HERT:


     Program                    Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 170526 bmi2    : 3346    7    7  5000    71.3 %   3171   45.6 %
   2 Komodo 11.2.2 x64        : 3314    6    6  5000    66.9 %   3177   45.8 %
   3 Houdini 5 pext           : 3299    6    6  5000    64.7 %   3180   48.5 %
   4 Shredder 13 x64          : 3119    6    6  5000    37.8 %   3216   43.7 %
   5 Fizbo 1.9 bmi2           : 3096    6    6  5000    34.4 %   3221   38.2 %
   6 Andscacs 0.91b bmi2      : 3026    7    7  5000    24.9 %   3235   34.9 %

 

Elo-differences:
1-6: 320 (overall)

1-2: 32
2-3: 15
3-4: 180
4-5: 23
5-6: 70


Games        : 15000 (finished)

average game length: +13.7% compared to SALC games (moves)
                     +10% compared to SALC games (time)
    

White Wins   : 5129 (34.2 %)
Black Wins   : 3455 (23.0 %)
Draws        : 6416 (42.8 %)

 

 

SALC:


     Program                    Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 170526 bmi2    : 3359    7    7  5000    72.7 %   3168   39.9 %
   2 Komodo 11.2.2 x64        : 3327    7    7  5000    68.3 %   3175   38.5 %
   3 Houdini 5 pext           : 3298    6    6  5000    64.4 %   3180   42.2 %
   4 Shredder 13 x64          : 3108    6    6  5000    36.4 %   3218   35.4 %
   5 Fizbo 1.9 bmi2           : 3097    7    7  5000    34.8 %   3221   31.1 %
   6 Andscacs 0.91b bmi2      : 3012    7    7  5000    23.5 %   3238   27.7 %

 

Elo-differences:
1-6: 347 (overall)

1-2: 32
2-3: 29
3-4: 190
4-5: 11
5-6: 85


Games        : 15000 (finished)

White Wins   : 5476 (36.5 %)
Black Wins   : 4154 (27.7 %)
Draws        : 5370 (35.8 %)

 

See the individual statistics (engine vs. engine) here

Download the 2x 15000 games here

 

Conclusions:

1) SALC lowers the draw-rate a lot (35.8%), compared to the HERT openings set (42.8%) - note that the HERT set was already optimized for a low draw-rate: Thomas Zipproth chose only lines which are not too drawish. Using other "classical" opening sets should lead to an even higher draw-rate than using HERT!

2) The rank order of the engines is the same in both gamebases - no distorted results with SALC.
3) The scores of the engines do not move closer to 50% with SALC. The Elo-differences do not get smaller (in fact, they get bigger: the Elo-difference from rank 1 to rank 6 is 320 Elo with HERT, but 347 Elo with SALC), which proves that SALC does not contain many lines that give white or black a clear advantage (and easy wins). And bigger Elo-differences make the results statistically more reliable (a small sketch of the score-to-Elo relation follows below the conclusions).
4) SALC lowers the average game duration by around 10%. That means about 10% more games can be played in the same time, which leads to statistically more valuable results.

 

5) At the moment, using a classical openings set (like HERT) or book is OK when playing engines with a huge Elo-difference and a short thinking time. But if you play only very strong engines (with a small Elo-difference) and/or with a very long thinking time, then using SALC is strongly recommended, because in those cases the draw-rate increases a lot. And in the future, when the hardware gets faster and faster, the draw-rate of computerchess will of course increase more and more. Then using SALC will be the only solution to prevent the "draw-death" of computerchess...
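The score-to-Elo relation behind conclusion 3, in the standard logistic model (the ratinglist program may use a slightly different model, so the table values above will not be reproduced exactly):

import math

def elo_diff(score):
    # Elo difference corresponding to a score fraction in the logistic model
    return 400 * math.log10(score / (1 - score))

print(f"HERT, rank 1 vs field: {elo_diff(0.713):+.0f} Elo")   # about +158
print(f"SALC, rank 1 vs field: {elo_diff(0.727):+.0f} Elo")   # about +170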

So, do not hesitate to download the complete SALC package (opening books and more than 12000 opening positions (PGN and EPD)) here


 

2017/05/19  At the end of 2016, I released version 2.0 of my SALC opening book. The idea was to create a book which lowers the draw-rate in computerchess, because the draw-rate increases more and more as the engines get stronger and the hardware gets faster. In online engine tournaments and in the TCEC tournament, the draw-rates are already around 85%, so the "draw-death" of computerchess is coming closer and closer. As you can see below (experimental testruns 2016/12/09), my SALC V2.0 book lowered the draw-rate a lot in a Stockfish 8 selfplay testrun compared to a classic opening book/position set (from 83% to 68.2% (!)). But in the last months, some people criticized that the openings in the SALC book simply give a huge advantage to one color, which lowers the number of draws. It is clear that this way of creating a book would work: if all lines of a book gave one color an advantage of +9, the draw-rate would of course be 0%. But on the other hand, the scores in an engine tournament using such a book would be 50% for all engines, because the advantage of the opening lines would be randomly distributed, if the number of played games is high enough.
But this was not the idea of the SALC book. The idea was that in all book lines white and black castle to opposite sides, with both queens still on the board, which should lead to more attacks on the king and to more tactical and more thrilling computerchess. All book lines were checked with Komodo 10.2 (20'' per position, running on 3 cores), evaluation inside [-0.6,+0.6] (a sketch of this evaluation filter follows below the list). So, no lines with a huge advantage for white or black are in the SALC book.
If the critics were right and the SALC book lines gave one color too big an advantage, using the SALC book should bring the engine scores in a tournament closer to 50%, compared to a classical opening book.
To verify that this does NOT happen, I did 3 testruns with 3 different opening sets:
1) SALC V2
2) Frank Quisinsky's FEOBOS 3.0 book (beta), a new and very well engine-analyzed and balanced opening book (get more information on his website www.amateurschach.de).
3) the 8-move openings collection, which is used in the Stockfish framework.
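
The evaluation filter mentioned above, sketched with python-chess (the engine path, the "Threads" option name and the way the end positions are fed in are assumptions on my part, not the original setup):

import chess
import chess.engine

def filter_lines(epds, engine_path="komodo", seconds=20, window_cp=60):
    # keep only end positions whose evaluation stays inside [-0.6, +0.6] pawns
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    engine.configure({"Threads": 3})               # assumed option name
    kept = []
    for epd in epds:
        board = chess.Board()
        board.set_epd(epd)
        info = engine.analyse(board, chess.engine.Limit(time=seconds))
        cp = info["score"].white().score(mate_score=100000)
        if abs(cp) <= window_cp:
            kept.append(epd)
    engine.quit()
    return kept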

asmFish played 1000 games against Komodo 10.4 with all 3 books/opening sets (= 3000 games). Not bullet speed, but 5'+3'' (!), singlecore, 256 MB Hash, no pondering, both engines with Contempt=+15. LittleBlitzerGUI in RoundRobin playmode, in which one opening position is chosen at random from an EPD openings file for each game. It took around 12 days to complete these three long testruns.

Games Completed = 1000 of 1000 (Avg game length = 944.640 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer\SALC_V2_10moves.epd(10000)

Time = 945199 sec elapsed, 0 sec remaining

1. asmFish 170426 x64 620.5/1000 351-110-draws: 539 (L: m=0 t=0 i=0 a=110) (D: r=149 i=231 f=38 s=0 a=121) (tpm=6659.0 d=30.93 nps=2552099)

2. Komodo 10.4 x64 379.5/1000 110-351-539 (L: m=0 t=0 i=0 a=351) (D: r=149 i=231 f=38 s=0 a=121) (tpm=6920.9 d=26.71 nps=1619591)

 

Games Completed = 1000 of 1000 (Avg game length = 1049.395 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer2\FEOBOS_v03+.epd(24085)

Time = 1039157 sec elapsed, 0 sec remaining

1. asmFish 170426 x64 601.5/1000 293-90-draws: 617 (L: m=0 t=0 i=0 a=90) (D: r=132 i=221 f=38 s=1 a=225) (tpm=6315.9 d=30.83 nps=2477078)

2. Komodo 10.4 x64 398.5/1000 90-293-617 (L: m=0 t=0 i=0 a=293) (D: r=132 i=221 f=38 s=1 a=225) (tpm=6424.5 d=26.49 nps=1583220)

 

Games Completed = 1000 of 1000 (Avg game length = 1036.164 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer3\34700_ok.epd(32000)

Time = 1036719 sec elapsed, 0 sec remaining

1. asmFish 170426 x64 603.0/1000 286-80-draws: 634 (L: m=0 t=0 i=0 a=80) (D: r=148 i=232 f=39 s=1 a=214) (tpm=6334.2 d=31.54 nps=2570164)

2. Komodo 10.4 x64 397.0/1000 80-286-634 (L: m=0 t=2 i=0 a=284) (D: r=148 i=232 f=39 s=1 a=214) (tpm=6473.6 d=27.00 nps=1614400)


Conclusions:
1) The SALC book lowers the draw-rate a lot (53.9%), compared to the FEOBOS book (61.7%) and the Stockfish-framework opening set (63.4%), although the engines played with Contempt=+15.
2) The scores of the engines do not move closer to 50% with the SALC book. The Elo-differences do not get smaller (in fact, they get bigger!), which proves that the SALC book does not contain many lines that give white or black a clear advantage (and easy wins), compared to both other books.
3) The SALC book lowers the average game duration by around 10% compared to the other books. That means about 10% more games can be played in the same time, which leads to statistically more valuable results (for example: this testrun using SALC ended more than one day before the FEOBOS and the Stockfish-openings testruns).
4) Although there is no doubt that the FEOBOS book is very well balanced and analyzed, and this beta version contains only lines with both queens on board, its draw-rate is only a little lower than with the Stockfish-framework opening set. The number of 3fold-draws is a little lower with FEOBOS (compared to both other books), but 16 fewer 3fold-draws out of 1000 games is not very much (1.6%).

All three books/opening sets were created only for playing engine tournaments/competitions.
But only using the SALC V2.0 book brings clearly measurable benefits: a clearly lower draw-rate, around 10% lower game duration (10% more games in the same time) and the biggest Elo-differences/distances in the engine results/scores. So, the SALC book avoids the "draw-death" of computerchess in the near future, and engine-tournament results using SALC are statistically more valuable than results with other opening books/sets, because the "resolution" of Elo in the results is higher and more games can be played in the same time. Feel free to download the SALC V2 book/openings set and make your own tests. If the number of games is high enough (300+), I have no doubt that the results will confirm my findings and you will see how thrilling watching modern computerchess can still be.


2016/12/09: Some weeks ago, I created my SALC opening book for engine-engine matches. In all lines (created out of 10000 human games, all lines 20 plies deep, all lines checked with Komodo 10.2 (20'' per position, running on 3 cores), evaluation inside [-0.6,+0.6]), white and black castled to opposite sides, with both queens still on the board. The idea is to get more attacks on the king and a lower draw-rate, because the draw-rate in computerchess increases more and more, the stronger the engines and the faster the hardware get. For my Stockfish bullet testruns, I have used 500 SALC positions since 2014, which lowered the draw-rate a lot.

To verify how much the draw-rate is lowered by this new book / opening-position set, I did two testruns, 3000 games each (= 6000 games). Stockfish 8 in selfplay, 70''+700ms thinking time, singlecore, LittleBlitzerGUI (using the 10000-position EPD files, playing in RoundRobin mode, in which one EPD position is chosen at random for each game).


Test 1: 34700 standard 8-move opening epd. Draw rate: 83.0%
Test 2: 10000 SALC V2 epd. Draw rate: 68.2%

 

I think the result is really impressive...


2016/03/12: Testrun of 3 new Stockfish clones. Stockfish played 1000 games against each of them (LittleBlitzerGUI, singlecore, 70''+700ms, 128 MB Hash, no ponder, no bases, no largepages, 500 SALC openings). None of the clones is stronger (no surprise), so don't waste your time with these "engines". The new popcount versions of DON do not run on my system (and the LittleBlitzerGUI), so I could not test DON.

 

     Program                   Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 160302 x64    : 3300    7    7  3000    51.9 %   3287   65.0 %
   2 Venom 3 x64             : 3293   12   12  1000    48.8 %   3300   66.6 %
   3 Anoa 1.1 x64            : 3285   12   12  1000    47.9 %   3300   63.6 %
   4 Sanjib 3 x64            : 3283   13   13  1000    47.8 %   3300   64.7 %

 

 

2015/02/20: A little "Clone Wars" testrun of Stockfish 6 against 5 of its clone engines (70''+700ms, singlecore, SALC openings, 1000-games gauntlet). As you can see, none of the 5 clones is measurably stronger (all results are within a +/-1% score interval and clearly inside the error bar).

 

 

     Program                 Elo    +    -   Games   Score   Av.Op.  Draws

   1 Pepper 150213 x64s    : 3251   13   13  1000    51.0 %   3243   60.9 %
   2 Sugar 5 x64s          : 3248   13   13  1000    50.7 %   3243   57.0 %
   3 Orka 150213 x64s      : 3248   14   14  1000    50.7 %   3243   59.2 %
   4 Salt 5 x64s           : 3247   13   13  1000    50.5 %   3243   60.1 %
   5 Stockfish 6 150128    : 3243    6    6  5000    49.4 %   3247   59.7 %
   6 Shark 150209 x64s     : 3241   13   13  1000    50.0 %   3243   61.3 %