Stefan Pohl Computer Chess

private website for chessengine-tests


The Unbalanced Human Openings XXL Project

 

 

BUGFIX (2022/05/07): The UHO XXL files (both the old and the 2022 files) had a rare bug: some lines ended in a forced mate. This was caused by a bug in the pgnscanner-tool: if Komodo finds a mate, the search-depth goes up to 99 and the eval the pgnscanner returns is bogus... Sorry for that. I fixed all UHO pgn and epd files. Please re-download.

Note: the raw-data files of the old UHO XXL and the all-lines file of UHO 2022 XXL still contain the buggy lines. If you want to use these raw-data files (did anybody ever do this?), please search for "depth=99" in the pgn-files with an editor and remove these games before using them (the bug is very rare, so it is no big deal to find and delete the buggy lines). UHO XXL should be used in the Stockfish-Framework (Fishtest) only, and there the raw-data files are not used, of course. So I see no need to fix and re-upload the raw-data files, because nobody will/should use them...
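
The manual cleanup described above can also be scripted. A minimal sketch (not the author's actual tooling), assuming each game in the file starts with an [Event "..."] tag, as standard PGN exports do:

```python
# Sketch (not the author's pgnscanner): drop every game that contains the
# "depth=99" marker from a PGN file held in memory. Assumes each game
# starts with an [Event "..."] tag, as in standard PGN exports.

def strip_buggy_games(pgn_text, marker="depth=99"):
    """Return the PGN text with all games containing `marker` removed."""
    chunks = pgn_text.split('[Event "')
    head, games = chunks[0], chunks[1:]  # `head` is any text before the first game
    kept = ['[Event "' + g for g in games if marker not in g]
    return head + "".join(kept)

sample = ('[Event "ok"]\n\n1. e4 e5 *\n\n'
          '[Event "buggy"]\n\n1. d4 {depth=99} d5 *\n\n')
print("depth=99" in strip_buggy_games(sample))  # False
```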

Good news: the non-XXL UHOs (and all of my other AntiDraw-openings) are OK, because the bug only occurs in games from the Lichess database, which were only used for building the UHO XXL files...

 

Download the UHO XXL 2022 openings here

 

The UHO XXL 2022 openings were made for the Stockfish Framework. There, tests using classical openings have reached draw-rates around 90%, which is far too high for statistically valid results, especially because in the Framework, SF dev versions, which are very close in strength to the current SF master, are tested against it.
My AntiDraw openings-collection (https://www.sp-cc.de/anti-draw-openings.htm) includes some very well-working UHO openings-sets, but the problem is: they are all too small for the SF Framework, where some tests can easily run 200000 games or more.
So, UHO openings-sets are needed with at least 100000 (better: more) different opening-lines. So, I decided to develop such UHO XXL openings. What have I done?

Step 1: Built a database out of all games since 1945 from the Megabase 2022 (around 8 million human games) plus 3 million human games (2019-2021) from the LiChess Elite Database (all standard games from Lichess, filtered to keep only games by players rated 2400+ against players rated 2200+, excluding bullet games).

Step 2: Deleted all games from this 11-million-games database which start from a FEN (Chess960). Deleted all comments from the games. Deleted all games with fewer than 15 moves.

Step 3: Truncated all games after 8 moves / 16 plies. Deleted all games where not both queens are still on the board in the endposition.

Step 4: Removed all games whose endposition already occurs earlier in the file (removed all doubles). Result: 3.4 million opening-lines with different endpositions, all 8 moves / 16 plies deep.
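
Step 4 can be sketched as a first-occurrence dedup over endpositions. This is an illustration only, not the author's actual tooling; the short strings stand in for real 16-ply games and FEN endpositions:

```python
# Sketch of the step-4 dedup (illustration only, not the author's tool):
# keep only the first game that reaches a given endposition.

def dedup_by_endposition(games):
    """games: iterable of (moves, endposition) pairs; keeps first occurrences."""
    seen = set()
    unique = []
    for moves, endpos in games:
        if endpos not in seen:
            seen.add(endpos)
            unique.append((moves, endpos))
    return unique

games = [("1. e4 e5 2. Nf3 Nc6", "pos-1"),
         ("1. Nf3 Nc6 2. e4 e5", "pos-1"),  # transposition: same endposition
         ("1. d4 d5 2. c4 e6",   "pos-2")]
print(len(dedup_by_endposition(games)))  # 2
```

A side effect of keeping the first occurrence: frequently played openings, which appear early in the database, also end up early in the output file.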

Step 5: Deleted all tags and replaced them with the 7 empty standard tags, added the ECO-code tag, and set all game results to 1/2-1/2.

Step 6: Pre-filtered all endpositions with KomodoDragon 2.6 (1 sec/move on a quadcore), then deleted all lines with an endposition-eval outside the interval [+0.70;+2.99].

1129590 lines remaining

Step 7: Evaluated all pre-filtered endpositions with KomodoDragon 2.6 (8.5 sec/move on a quadcore), then deleted all lines with an endposition-eval outside the interval [+1.00;+1.99].

This evaluation was done with nearly 3x more thinking-time (8.5 sec/move instead of 3 sec/move, on the same hardware) and the much stronger KomodoDragon 2.6 (instead of KomodoDragon 1.0, which was used for the old UHO XXL). So the evaluations of the endpositions are much better and deeper!

IMPORTANT: KomodoDragon 2.6 shows evals which are around +0.20 higher than those of KomodoDragon 1.0, which I used for evaluating the old UHO XXL openings. So the eval-intervals of the new UHO XXL 2022 are +0.20 higher...

663273 lines remaining
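
The two-stage filter of steps 6 and 7 can be sketched as follows; `evaluate` is a hypothetical stand-in for the KomodoDragon engine call, and the toy evals are invented for illustration:

```python
# Sketch of the two-stage eval filter (steps 6 and 7). `evaluate` is a
# hypothetical stand-in for the engine call; the toy evals are invented.

def filter_by_eval(lines, evaluate, lo, hi):
    """Keep only the lines whose endposition-eval lies inside [lo, hi]."""
    return [ln for ln in lines if lo <= evaluate(ln) <= hi]

toy_evals = {"line A": 0.45, "line B": 1.25, "line C": 1.80, "line D": 2.50}
evaluate = toy_evals.get

# Cheap 1 sec/move pass over everything, then the expensive 8.5 sec/move
# pass only over the survivors:
prefiltered = filter_by_eval(toy_evals, evaluate, 0.70, 2.99)
final = filter_by_eval(prefiltered, evaluate, 1.00, 1.99)
print(final)  # ['line B', 'line C']
```

The point of the cheap pre-filter is that the expensive 8.5 sec/move pass only has to look at the roughly 1.1 million survivors instead of all 3.4 million lines.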

Filtered 8 Unbalanced Human Opening files with increasing evals out of this rawdata-file and made additional EPD-files from them:

UHO_XXL_2022_+1.00_+1.29 = 284039 lines
UHO_XXL_2022_+1.10_+1.39 = 253850 lines (+14% bigger than (old) UHO_XXL_+0.90_+1.19)
UHO_XXL_2022_+1.20_+1.49 = 226631 lines 
UHO_XXL_2022_+1.30_+1.59 = 203428 lines
UHO_XXL_2022_+1.40_+1.69 = 180462 lines 
UHO_XXL_2022_+1.50_+1.79 = 159694 lines
UHO_XXL_2022_+1.60_+1.89 = 139237 lines
UHO_XXL_2022_+1.70_+1.99 = 123653 lines

(compared to old UHO XXL, the new UHO XXL 2022 files contain around +13% more lines)
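
The 8 file names above follow a simple pattern: each eval window spans [x, x+0.29] (inclusive endpoints) and is shifted by 0.10 against its neighbour, starting at +1.00. A sketch generating the window bounds:

```python
# The 8 eval windows: each spans [x, x+0.29] (inclusive endpoints) and is
# shifted by 0.10 against its neighbour, starting at +1.00.
windows = [(round(1.00 + 0.10 * i, 2), round(1.29 + 0.10 * i, 2))
           for i in range(8)]
print(windows[0], windows[-1])  # (1.0, 1.29) (1.7, 1.99)
```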

 

Additionally: UHO_MEGA_2022_+1.10_+1.49 = 321415 lines = +44% bigger than (old) UHO_XXL_+0.90_+1.19 (223081 lines) - more information below.

 

Why so many different openings-sets? At the moment, the first intervals should work best in the Stockfish Framework, but as Stockfish gains more strength and hardware gets faster in the future, the draw-rates will rise again. Then an openings-set with a higher eval-interval can be chosen to lower the draw-rates back into a valid range of 45%-65%. This concept of different sets with increasing eval-intervals was developed by me (and is already used in my AntiDraw openings collections); it makes the draw-rate controllable and keeps it in a valid range of 45%-65%. So, these openings will work in the present and in the future (faster hardware and stronger engines raise the draw-rate!) and with different test-setups (longer thinking-times or more threads raise the draw-rate, too).
How cool is that?

 

Conclusions from the tests (see below):

A) As expected, the new UHO_XXL_2022_+110_+139 and UHO_XXL_2022_+120_+149 results are the closest to the old UHO_XXL_+090_+119 openings in these testruns. The Elo-spreading is nearly the same, and their draw-rates of 50.2% and 43.1% are close to the 46.5% of the old UHO-set. UHO_XXL_2022_+100_+129 gives too many draws and UHO_XXL_2022_+130_+159 a rather low draw-rate.
B) Because of A), I had the idea to build a new UHO_MEGA_2022_+110_+149 opening-set.
This means the new UHO_XXL_2022_+110_+139 and UHO_XXL_2022_+120_+149 sets are combined, because the draw-rate of the old UHO set in the testruns lies between the draw-rates of these 2 new UHO-sets: 46.5% is almost exactly between 50.2% and 43.1%... see tests below.
This new UHO_MEGA_2022_+110_+149 set is really huge: 321415 lines = +44% bigger than the (old) UHO_XXL_+0.90_+1.19 (223081 lines). In the Stockfish-Framework, size matters, because some testruns are really huge. So, IMO, this new UHO MEGA set can be useful...
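
Note that the two combined eval windows overlap in [+1.20;+1.39], so the MEGA set is a union, not a concatenation: lines in the overlap are counted once. From the line counts given above, the size of the overlap follows by inclusion-exclusion (derived arithmetic, not a number given in the text):

```python
# Line counts from the text above; the overlap size is derived by
# inclusion-exclusion, it is not a number given in the text.
set_110_139 = 253850   # UHO_XXL_2022_+1.10_+1.39
set_120_149 = 226631   # UHO_XXL_2022_+1.20_+1.49
mega_110_149 = 321415  # UHO_MEGA_2022_+1.10_+1.49

overlap = set_110_139 + set_120_149 - mega_110_149
print(overlap)  # 159066 lines lie in both windows, i.e. in [+1.20;+1.39]
```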

 

Tests: AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM, 20 games played simultaneously
Singlethread, TurboBoost-mode switched off, Hash: 128MB per engine
GUI: Cutechess-cli
Tablebases: None for engines, 5 Syzygy for cutechess-cli
Thinking-time: 60sec+600ms
Engines: Stockfish 220328 vs. Stockfish 14.1

Each openings-set played 10000 games (each opening replayed with reversed colours, of course)

Here are the final results of the testruns (calculated by ORDO):

(old) UHO_XXL_+0.90_+1.19 (223081 lines) for comparison:
     Program                     Elo    +    - Games    Score   Av.Op.  Draws
   1 Stockfish 220328 avx2    : 3842    2    2 10000    55.9 %   3800   46.5 %
   2 Stockfish 14.1 211028    : 3800    2    2 10000    44.1 %   3842   46.5 %
Games        : 10000 (finished)(SF 220328: +3267,=4654,-2079)
White Wins   : 5274 (52.7 %)
Black Wins   : 72 (0.7 %)
Draws        : 4654 (46.5 %)

UHO_XXL_2022_+100_+129:
     Program                     Elo    +    - Games    Score   Av.Op.  Draws
   1 Stockfish 220328 avx2    : 3844    2    2 10000    56.2 %   3800   57.2 %
   2 Stockfish 14.1 211028    : 3800    2    2 10000    43.8 %   3844   57.2 %
Games        : 10000 (finished)(SF 220328: +2757,=5722,-1521)
White Wins   : 4189 (41.9 %)
Black Wins   : 89 (0.9 %)
Draws        : 5722 (57.2 %)

UHO_XXL_2022_+110_+139:
     Program                     Elo    +    - Games    Score   Av.Op.  Draws
   1 Stockfish 220328 avx2    : 3843    2    2 10000    56.1 %   3800   50.2 %
   2 Stockfish 14.1 211028    : 3800    2    2 10000    43.9 %   3843   50.2 %
Games        : 10000 (finished)(SF 220328: +3106,=5016,-1878)
White Wins   : 4892 (48.9 %)
Black Wins   : 92 (0.9 %)
Draws        : 5016 (50.2 %)

UHO_XXL_2022_+120_+149:
     Program                     Elo    +    - Games    Score   Av.Op.  Draws
   1 Stockfish 220328 avx2    : 3842    2    2 10000    56.0 %   3800   43.1 %
   2 Stockfish 14.1 211028    : 3800    2    2 10000    44.0 %   3842   43.1 %
Games        : 10000 (finished)(SF 220328: +3446,=4310,-2244)
White Wins   : 5629 (56.3 %)
Black Wins   : 61 (0.6 %)
Draws        : 4310 (43.1 %)

UHO_XXL_2022_+130_+159:
     Program                     Elo    +    - Games    Score   Av.Op.  Draws
   1 Stockfish 220328 avx2    : 3840    2    2 10000    55.7 %   3800   36.9 %
   2 Stockfish 14.1 211028    : 3800    2    2 10000    44.3 %   3840   36.9 %
Games        : 10000 (finished)(SF 220328: +3729,=3687,-2584)
White Wins   : 6252 (62.5 %)
Black Wins   : 61 (0.6 %)
Draws        : 3687 (36.9 %)
 

UHO_MEGA_2022_+110_+149 (321415 lines = +44% bigger than the (old) UHO_XXL_+0.90_+1.19):
     Program                     Elo    +    - Games    Score   Av.Op.  Draws
   1 Stockfish 220328 avx2    : 3842    2    2 10000    55.9 %   3800   46.9 %
   2 Stockfish 14.1 211028    : 3800    2    2 10000    44.1 %   3841   46.9 %
Games        : 10000 (finished)(SF 220328: +3243,=4689,-2068)
White Wins   : 5226 (52.3 %)
Black Wins   : 85 (0.9 %)
Draws        : 4689 (46.9 %)


 

 

OLD UHO XXL version from 2021

 

Download the UHO XXL openings here

 

The UHO XXL openings were made for the Stockfish Framework. There, tests using classical openings have reached draw-rates around 90%, which is far too high for statistically valid results, especially because in the Framework, SF dev versions, which are very close in strength to the current SF master, are tested against it.
My AntiDraw openings-collection (Anti Draw Openings) includes some very well-working UHO openings-sets, but the problem is: they are all too small for the SF Framework, where some tests can easily run 200000 games or more. So, UHO openings-sets are needed with at least 100000 (better: more) different opening-lines. So, I decided to develop such UHO XXL openings. What have I done?

Step 1: Built a database out of all games since 1945 from the Megabase 2021 (around 8 million human games) plus 2 million human games (2019-2020) from the LiChess Elite Database (all standard games from Lichess, filtered to keep only games by players rated 2400+ against players rated 2200+, excluding bullet games).

Step 2: Deleted all games from this 10-million-games database which start from a FEN (Chess960). Deleted all comments from the games. Deleted all games with fewer than 15 moves.

Step 3: Truncated all games after 8 moves / 16 plies. Deleted all games where not both queens are still on the board in the endposition.

Step 4: Removed all games whose endposition already occurs earlier in the file (removed all doubles). Result: 2.7 million opening-lines with different endpositions, all 8 moves / 16 plies deep.

Step 5: Deleted all tags and replaced them with the 7 empty standard tags, added the ECO-code tag, and set all game results to 1/2-1/2.

Step 6: Evaluated all endpositions with KomodoDragon 1.0 (3 sec/move on a quadcore, average search depth = 20), then deleted all lines with an endposition-eval outside the interval [-1.99;+1.99].

2.5 million opening-lines remaining. Saved in the rawdata.7z archive.

 

Filtered 7 Unbalanced Human Opening files with increasing evals out of this rawdata-file and made additional EPD-files from them:

 

UHO_XXL_+0.80_+1.09 = 261043 lines
UHO_XXL_+0.90_+1.19 = 223081 lines
UHO_XXL_+1.00_+1.29 = 186106 lines 
UHO_XXL_+1.10_+1.39 = 155058 lines
UHO_XXL_+1.20_+1.49 = 129823 lines 
UHO_XXL_+1.30_+1.59 = 111314 lines
UHO_XXL_+1.40_+1.69 = 96173 lines

 

Why so many different openings-sets? At the moment, the first interval [+0.80;+1.09] should work best in the Stockfish Framework, but as Stockfish gains more strength and hardware gets faster in the future, the draw-rates will rise again. Then an openings-set with a higher eval-interval can be chosen to lower the draw-rates back into a valid range of 45%-65%. This concept of different sets with increasing eval-intervals was developed by me (and is already used in my AntiDraw openings collections); it makes the draw-rate controllable and keeps it in a valid range of 45%-65%. So, these openings will work in the present and in the future (faster hardware and stronger engines raise the draw-rate!) and with different test-setups (longer thinking-times or more threads raise the draw-rate, too). How cool is that?

 

Important information: Because pgn-extract always keeps the first appearance of an opening-endposition and removes all following doubles, the most common human openings tend to be at the beginning of the openings-files, and the more "exotic" human opening-lines tend to be at the end. Because of this, and because my UHO XXL-files are really huge, it could make sense to cut off the end of the files if not so many opening-lines are needed.
Example: If the SF developers think that 200000 different openings are enough for valid tests in the Framework, I recommend cutting the bigger files (UHO_XXL_+0.80_+1.09 = 261043 lines) and keeping just the first 200000 openings. But that is not my decision, so I decided to make the files as huge as possible.
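
Cutting a file down to its first N lines is straightforward for the EPD files (one opening per line). A minimal sketch with illustrative filenames:

```python
# Minimal sketch: keep only the first n lines of an EPD file (one opening
# per line). Because doubles are removed keeping the first appearance, the
# common openings cluster near the top, so truncating mostly drops the
# "exotic" tail. Filenames in the usage comment are illustrative.

def truncate_epd(src, dst, n=200000):
    with open(src) as fin, open(dst, "w") as fout:
        for i, line in enumerate(fin):
            if i >= n:
                break
            fout.write(line)

# Usage (illustrative filenames):
# truncate_epd("UHO_XXL_+0.80_+1.09.epd", "UHO_XXL_first200k.epd")
```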

 

 

Huge 10% time-odds test (60sec+600ms vs 66sec+660ms), using Stockfish 210827 (selfplay). 60000 games per testrun, played in the Stockfish-Framework (= 180000 games total(!)). noob_3moves and 8moves_v3 are classical openings-sets which were used in the Stockfish-Framework before my UHO_XXL set (8moves_v3 is still in use for the regression-testruns).

Name                  Draw-rate   Elo     Normalized Elo   Win-Draw-Loss

UHO_XXL_+0.90_+1.19   50%         10.94   24.05            [16069,29751,14180]
noob_3moves           95%          3.39   15.46            [ 1825,56936, 1239]
8moves_v3             91%          3.94   14.97            [ 3144,54393, 2463]
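
The Elo column can be reproduced from the win-draw-loss counts with the standard logistic Elo formula (normalized Elo is a separate Fishtest statistic and is not reproduced here):

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Elo difference implied by a win/draw/loss record (logistic model)."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return 400.0 * math.log10(score / (1.0 - score))

# The UHO_XXL_+0.90_+1.19 row from the table above:
print(round(elo_from_wdl(16069, 29751, 14180), 2))  # 10.94
```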

It is really impressive how much better the results (draw-rate, Elo-spreading, normalized Elo) are when using my UHO_XXL openings!!! The draw-rate drops from more than 90% to 50%, the Elo-spreading is more than 2.7x bigger, and the normalized Elo value is clearly better (more than 1.5x bigger), too. Just awesome!

 

Idea for the UHO XXL openings and all work done by Stefan Pohl. Using the UHO XXL openings is recommended only for the Stockfish Framework or for other test-setups where huge head-to-head tests of engines with 50000 or more games are run. For all other testers, I strongly recommend using the UHO openings in my Anti Draw openings collection, which can be found here: Anti Draw Openings


(C) 2021 Stefan Pohl (SPCC)
www.sp-cc.de