Stefan Pohl Computer Chess

private website for chess engine tests


The Unbalanced Human Openings XXL Project

 

Download the UHO XXL openings here

 

The UHO XXL openings were made for the Stockfish Framework. There, the tests using classical openings have reached draw-rates of around 90%, which is far, far too much for statistically valid results, especially because in the Framework SF dev versions, which are very close in strength, are tested against the current SF master.
In my AntiDraw openings collection (Anti Draw Openings), some very well-working UHO openings sets are included, but the problem is: they are all too small for the SF Framework, where some tests can easily reach 200000 games or more. So UHO openings with at least 100000 (better: more) different opening lines are needed. Therefore, I decided to develop such UHO XXL openings. What have I done?

Step 1: Build a database out of all games since 1945 from the Megabase 2021 (around 8 million human games) plus 2 million human games (2019-2020) from the LiChess Elite Database (all standard games from lichess, filtered to keep only games by players rated 2400+ against players rated 2200+, excluding bullet games).

Step 2: Delete all games from this 10-million-game human database which start from a FEN (Chess960). Delete all comments from the games. Delete all games with fewer than 15 moves.

Step 3: Delete all played moves beyond 8 moves / 16 plies. Delete all games in whose end position not both queens are still on the board.

Step 4: Remove all games with an end position that already occurs earlier in the file (remove all duplicates). Result: 2.7 million opening lines with different end positions, all 8 moves / 16 plies deep.

Step 5: Delete all tags and replace them with the 7 empty standard tags. Add the ECO-code tag. Set all game results to 1/2-1/2.
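
For illustration, here is a minimal python-chess sketch of the filtering, truncation and duplicate removal of Steps 2-5. It is only a sketch: file names are placeholders, and the real files were built with standard PGN tools (pgn-extract did the duplicate removal, see below), not with this script.

import chess
import chess.pgn

MAX_PLIES = 16           # 8 moves / 16 plies (Step 3)
MIN_FULLMOVES = 15       # drop games shorter than 15 moves (Step 2)

seen_endpositions = set()

with open("human_games_since_1945.pgn") as pgn_in, \
     open("openings_16plies_unique.pgn", "w") as pgn_out:
    while True:
        game = chess.pgn.read_game(pgn_in)
        if game is None:
            break
        # Step 2: skip games starting from a FEN (Chess960 etc.)
        if "FEN" in game.headers or game.headers.get("SetUp") == "1":
            continue
        moves = list(game.mainline_moves())
        if len(moves) < 2 * MIN_FULLMOVES:        # fewer than 15 full moves
            continue
        # Step 3: keep only the first 16 plies
        board = chess.Board()
        for move in moves[:MAX_PLIES]:
            board.push(move)
        # Step 3: both queens must still be on the board in the end position
        if not (board.pieces(chess.QUEEN, chess.WHITE) and
                board.pieces(chess.QUEEN, chess.BLACK)):
            continue
        # Step 4: keep only the first game that reaches each end position
        key = board.epd()
        if key in seen_endpositions:
            continue
        seen_endpositions.add(key)
        # Rebuild a comment-free game from the truncated move list
        # (a fresh Game() carries exactly the 7 empty standard tags, Step 5)
        new_game = chess.pgn.Game()
        node = new_game
        for move in moves[:MAX_PLIES]:
            node = node.add_variation(move)
        new_game.headers["Result"] = "1/2-1/2"    # Step 5: all results set to 1/2-1/2
        print(new_game, file=pgn_out, end="\n\n")

The ECO-code tag of Step 5 is not reproduced here; pgn-extract can add ECO tags, python-chess would need a separate ECO lookup.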

Step 6: Evaluate all end positions with KomodoDragon 1.0 (3 sec/move on a quad-core, average search depth = 20), then delete all lines whose end-position eval falls outside the interval [-1.99;+1.99].

2.5 million opening lines remain. They are saved in the rawdata.7z archive.
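
Step 6 can be sketched with python-chess as well: score each end position with a UCI engine and keep only the lines whose eval stays inside [-1.99;+1.99]. Engine path, thread count and file names below are placeholders (the original run used KomodoDragon 1.0 at 3 sec/move on a quad-core); this is an illustration, not the script that produced the archive.

import chess
import chess.pgn
import chess.engine

EVAL_MIN, EVAL_MAX = -199, 199           # centipawns, i.e. [-1.99;+1.99]

engine = chess.engine.SimpleEngine.popen_uci("./komododragon")   # placeholder path
engine.configure({"Threads": 4})         # quad-core, as in the original setup

with open("openings_16plies_unique.pgn") as pgn_in, \
     open("rawdata.pgn", "w") as pgn_out:
    while True:
        game = chess.pgn.read_game(pgn_in)
        if game is None:
            break
        board = game.end().board()                        # the 16-ply end position
        info = engine.analyse(board, chess.engine.Limit(time=3.0))
        cp = info["score"].white().score(mate_score=100000)
        if not (EVAL_MIN <= cp <= EVAL_MAX):
            continue                                       # eval outside the window
        print(game, file=pgn_out, end="\n\n")

engine.quit()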

 

From this rawdata file, 7 Unbalanced Human Openings files with increasing evals were filtered, and additional EPD files were made out of them (a small sketch of the interval filtering follows the list below):

 

UHO_XXL_+0.80_+1.09 = 261043 lines
UHO_XXL_+0.90_+1.19 = 223081 lines
UHO_XXL_+1.00_+1.29 = 186106 lines 
UHO_XXL_+1.10_+1.39 = 155058 lines
UHO_XXL_+1.20_+1.49 = 129823 lines 
UHO_XXL_+1.30_+1.59 = 111314 lines
UHO_XXL_+1.40_+1.69 = 96173 lines
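
As a sketch of this interval filtering (assuming each line's end-position eval is available in centipawns; how the evals were actually stored for the real files is not documented here), the overlapping windows could be applied like this:

INTERVALS = {                       # file name -> (min, max) eval in centipawns
    "UHO_XXL_+0.80_+1.09": ( 80, 109),
    "UHO_XXL_+0.90_+1.19": ( 90, 119),
    "UHO_XXL_+1.00_+1.29": (100, 129),
    "UHO_XXL_+1.10_+1.39": (110, 139),
    "UHO_XXL_+1.20_+1.49": (120, 149),
    "UHO_XXL_+1.30_+1.59": (130, 159),
    "UHO_XXL_+1.40_+1.69": (140, 169),
}

def target_files(cp):
    """Return all UHO XXL files whose eval window contains this end position.
    The windows overlap, so one opening line can end up in several files."""
    return [name for name, (lo, hi) in INTERVALS.items() if lo <= cp <= hi]

# Example: a line evaluated at +1.05 (105 cp) goes into UHO_XXL_+0.80_+1.09,
# UHO_XXL_+0.90_+1.19 and UHO_XXL_+1.00_+1.29.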

 

Why so many different openings sets? At the moment, the first interval [+0.80;+1.09] should work best in the Stockfish Framework, but if Stockfish gains more strength and the hardware gets faster in the future, the draw-rates will rise again. Then an openings set with a higher eval interval can be chosen to bring the draw-rates back down into a valid range of 45%-65%. This concept of different sets with increasing eval intervals was developed by me (and is already used in my AntiDraw openings collections); it makes the draw-rate controllable and keeps it in that valid range. So these openings will work in the present and in the future (faster hardware and stronger engines raise the draw-rate!) and with different test setups (longer thinking-times or more threads raise the draw-rate, too). How cool is that?

 

Important information: Because pgn-extract always keeps the first appearance of an opening end position and removes all following duplicates, the most common human openings tend to be at the beginning of the openings files and the more "exotic" human opening lines tend to be at the end. Because of this, and because my UHO XXL files are really huge, it can make sense to cut off the end of the files if fewer opening lines are needed.
Example: If the SF developers think that perhaps 200000 different openings are enough for valid tests in the Framework, I recommend truncating the bigger files (e.g. UHO_XXL_+0.80_+1.09 = 261043 lines) and keeping just the first 200000 openings. But that is not my decision, so I decided to make the files as huge as possible.
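
Such a cut can be done with any PGN tool; a minimal python-chess sketch (file names and the number 200000 are just examples) could look like this:

import chess.pgn

KEEP = 200000                                   # number of openings to keep

with open("UHO_XXL_+0.80_+1.09.pgn") as pgn_in, \
     open("UHO_XXL_+0.80_+1.09_first200k.pgn", "w") as pgn_out:
    for _ in range(KEEP):
        game = chess.pgn.read_game(pgn_in)
        if game is None:                        # file has fewer games than KEEP
            break
        print(game, file=pgn_out, end="\n\n")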


Tests: AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM. 20 games are played simultaneously.
Single thread, TurboBoost mode switched off; speed in the chess starting position: Stockfish 1.3 mn/s. Hash: 128MB per engine.
GUI: Cutechess-cli
Tablebases: none for the engines, 5-men Syzygy for cutechess-cli
Thinking-time: 60sec + 600ms
Engines: Stockfish 14 vs Stockfish 13

Each openings set was played for 10000 games (each opening replayed with reversed colours, of course):

- Noob_3mvs (standard openings, used in the Stockfish Framework right now)
- BJBraams big openings (bjbraams_chessdb_198350_lines) (unbalanced openings set, derived from Noob-openings)
- UHO_XXL_+0.80_+1.09 openings (one of my new, huge Unbalanced Human Openings sets, 261043 lines)
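
For reference, a cutechess-cli call roughly matching this setup could look like the following; engine paths, names and the output file are placeholders, and the exact command used for these runs is not published here:

cutechess-cli \
  -engine cmd=./stockfish14 name="Stockfish 14 avx2" \
  -engine cmd=./stockfish13 name="Stockfish 13 avx2" \
  -each proto=uci tc=60+0.6 option.Hash=128 option.Threads=1 \
  -openings file=UHO_XXL_+0.80_+1.09.pgn format=pgn order=sequential \
  -repeat -games 2 -rounds 5000 \
  -concurrency 20 \
  -tb /path/to/syzygy/5men \
  -pgnout results_UHO_XXL.pgn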

Here are the final results of the test runs (calculated by ORDO):


Noob_3mvs:
     Program                Elo    +    -   Games   Score   Av.Op.  Draws
   1 Stockfish 14 avx2    :  32    2    2   10000    54.5 %      0   86.1 %
   2 Stockfish 13 avx2    :   0    2    2   10000    45.5 %     32   86.1 %

Games        : 10000 (finished)(SF 14: +1144,=8613,-243)
White Wins   : 812 (8.1 %)
Black Wins   : 575 (5.8 %)
Draws        : 8613 (86.1 %)

 


BJBraams big openings:
     Program                Elo    +    -   Games   Score   Av.Op.  Draws
   1 Stockfish 14 avx2    :  55    2    2   10000    57.8 %      0   64.7 %
   2 Stockfish 13 avx2    :   0    2    2   10000    42.2 %     55   64.7 %

Games        : 10000 (finished)(SF 14: +2548,=6469,-983)
White Wins   : 2676 (26.8 %)
Black Wins   : 855 (8.6 %)
Draws        : 6469 (64.7 %)

 


UHO_XXL_+0.80_+1.09:
     Program                Elo    +    -   Games   Score   Av.Op.  Draws
   1 Stockfish 14 avx2    :  60    2    2   10000    58.5 %      0   54.9 %
   2 Stockfish 13 avx2    :   0    2    2   10000    41.5 %     60   54.9 %

Games        : 10000 (finished)(SF 14: +3104,=5491,-1405)
White Wins   : 4305 (43.0 %)
Black Wins   : 204 (2.0 %)
Draws        : 5491 (54.9 %)

Conclusions:

A) My UHO_XXL openings work clearly better than the BJBraams openings, because:
1) The Elo spread between SF 14 and SF 13 is higher.
2) The draw-rate is clearly lower.
3) The number of wins with Black is clearly lower (the number of Black wins should be very low (but not 0%) when using openings with a measurable advantage for White).

 

B) My UHO_XXL openings are (of course) clearly better than the Noob_3mvs openings, used in the Framework right now for testing:
1) The Elo spread is nearly doubled.
2) The draw-rate is clearly lower (86.1% vs. 54.9% (!!!))

 

 

Idea for the UHO XXL openings and all work done by Stefan Pohl. Using the UHO XXL openings is recommended only for the Stockfish Framework or for other test setups where huge head-to-head tests of engines with 50000 or more games are run. For all other testers, I strongly recommend using the UHO openings from my Anti Draw openings collection, which can be found here: Anti Draw Openings


(C) 2021 Stefan Pohl (SPCC)
www.sp-cc.de