Stefan Pohl Computer Chess
private website for chess engine tests

The Unbalanced Human Openings XXL Project
BUGFIX (2022/05/07): The UHO XXL files (both the old and the 2022 files) had a rare bug: some lines ended in a forced mate in a few moves. This was caused by a bug in the pgnscanner tool: if Komodo finds a mate, the search depth goes up to 99 and the eval the pgnscanner returns is bogus... Sorry for that. I fixed all UHO pgn and epd files. Please re-download. Note that in the raw-data files of the old UHO XXL and in the all-lines file of UHO 2022 XXL, the buggy lines are still included. If you want to use these raw-data files (did anybody ever do this?), please search for "depth=99" in the pgn files with an editor and remove these games before using them... (the bug is very rare, so finding and deleting the buggy lines is no big deal). UHO XXL should only be used in the Stockfish Framework (Fishtest), and there the raw-data files are not used, of course. So I see no need to fix and re-upload the raw-data files, because nobody will/should use them... Good news: the non-XXL UHOs (and all of my other AntiDraw openings) are OK, because the bug only occurs in Lichess-based games, which were only used for building the UHO XXL files...
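The suggested manual cleanup can also be scripted. A minimal sketch in Python, assuming the raw-data PGN has already been split into one string per game (that splitting step is not shown; the "depth=99" marker is the one described above):

```python
# Sketch only: drop every game whose text contains the bogus-mate marker
# "depth=99". Assumes the raw-data PGN was already split into one string
# per game (e.g. with a PGN tool); that step is not shown here.
def drop_buggy_games(games):
    return [g for g in games if "depth=99" not in g]

games = [
    '[Result "1/2-1/2"]\n1. e4 e5 {+1.10/22} 1/2-1/2',
    '[Result "1/2-1/2"]\n1. f3 e5 {+99.00 depth=99} 1/2-1/2',  # buggy line
]
print(len(drop_buggy_games(games)))  # 1 game survives
```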
Download the UHO XXL 2022 openings here
The UHO XXL 2022 openings were made for the Stockfish Framework. There, tests using classical openings have reached draw-rates around 90%, which is far too much for statistically valid results, especially because in the Framework, SF dev versions, which are close in strength, are tested versus the current SF master.
Step 1: Built a database out of all games since 1945 from the Megabase 2022 (around 8 million human games) and 3 million human games (2019-2021) from the LiChess Elite Database (all (standard) games from lichess, filtered to keep only games by players rated 2400+ against players rated 2200+, excluding bullet games).
Step 2: Deleted all games from this 11-million-game database which started from a FEN (Chess960). Deleted all comments from the games. Deleted all games with fewer than 15 moves.
Step 3: Deleted all played moves beyond 8 moves / 16 plies. Deleted all games where not both queens are still on the board in the end position.
Step 4: Removed all games with an end position that is already in the file (removed all doubles). Result: 3.4 million opening lines with different end positions, all 8 moves / 16 plies deep.
Step 5: Deleted all tags and replaced them with 7 empty standard tags. Added the ECO code tag. Set all game results to 1/2-1/2.
Step 6: Pre-filtered all end positions with KomodoDragon 2.6 (1 sec/move on a quadcore), then deleted all lines with an end-position eval outside the interval [+0.70;+2.99]. 1129590 lines remaining.
Step 7: Evaluated all pre-filtered end positions with KomodoDragon 2.6 (8.5 sec/move on a quadcore), then deleted all lines with an end-position eval outside the interval [+1.00;+1.99]. This evaluation was done with nearly 3x more thinking time (8.5 sec/move instead of 3 sec/move, on the same hardware) and the much stronger KomodoDragon 2.6 (instead of KomodoDragon 1.0 for the old UHO XXL). So the evaluations of the end positions are much better and deeper!
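Step 4 (duplicate removal by end position) can be sketched as follows. The data is invented; the real pipeline ran pgn-extract over millions of games:

```python
# Sketch of Step 4: keep only the first opening line that reaches each
# end position. Lines are (movetext, end_position_FEN) pairs; the FENs
# here are placeholders, not real positions.
def remove_duplicate_endpositions(lines):
    seen = set()
    unique = []
    for movetext, end_fen in lines:
        if end_fen not in seen:  # first appearance wins, later doubles are dropped
            seen.add(end_fen)
            unique.append((movetext, end_fen))
    return unique

lines = [
    ("1. e4 e5 2. Nf3 Nc6", "FEN-A"),
    ("1. Nf3 Nc6 2. e4 e5", "FEN-A"),  # transposition -> same end position
    ("1. d4 d5 2. c4 e6", "FEN-B"),
]
print(len(remove_duplicate_endpositions(lines)))  # 2 unique end positions
```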
IMPORTANT: KomodoDragon 2.6 shows evals which are around +0.20 higher than those of KomodoDragon 1.0, which I used for evaluating the old UHO XXL openings. So the eval intervals in the new UHO XXL 2022 are +0.20 higher... 663273 lines remaining. Filtered 8 Unbalanced Human Opening files with increasing evals and made additional EPD files out of this raw-data file: UHO_XXL_2022_+1.00_+1.29 = 284039 lines (compared to the old UHO XXL, the new UHO XXL 2022 files contain around 13% more lines)
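The interval filtering in Steps 6 and 7 boils down to a closed-interval test on the end-position eval. A hedged sketch with invented evals (the real ones came from KomodoDragon 2.6):

```python
# Sketch of the eval-interval filter: keep only lines whose end-position
# eval lies inside the closed interval [lo, hi]. Evals are invented here.
def filter_by_eval(lines, lo, hi):
    return [(fen, ev) for fen, ev in lines if lo <= ev <= hi]

evaluated = [("fen1", 0.45), ("fen2", 1.10), ("fen3", 1.75), ("fen4", 2.30)]
kept = filter_by_eval(evaluated, 1.00, 1.99)  # the UHO XXL 2022 master interval
print([fen for fen, _ in kept])  # ['fen2', 'fen3']
```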
Additionally: UHO_MEGA_2022_+1.10_+1.49 = 321415 lines = 44% bigger than the (old) UHO_XXL_+0.90_+1.19 (223081 lines) - more information below.
Why so many different openings sets? At the moment, the first intervals should work best in the Stockfish Framework, but if Stockfish gains more strength and hardware gets faster in the future, the draw-rates will rise again. Then an openings set with a higher eval interval can be chosen to lower the draw-rates back into a valid range of 45%-65%. This concept of different sets with increasing eval intervals was developed by me (and already used in my AntiDraw openings collections), and it makes the draw-rate controllable, keeping it in a valid range of 45%-65%. So these openings will work in the present and in the future (faster hardware and stronger engines raise the draw-rate!) and with different test setups (longer thinking times or more threads raise the draw-rate, too).
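The selection logic just described can be sketched as a simple rule: walk the sets from the lowest eval interval upward and take the first one whose measured draw-rate falls inside the valid window. The draw-rates below are invented for illustration:

```python
# Hedged sketch of the draw-rate control idea. Sets are ordered from the
# lowest eval interval upward; the measured draw-rates are invented numbers.
def pick_openings_set(measured, lo=0.45, hi=0.65):
    for name, draw_rate in measured:
        if lo <= draw_rate <= hi:
            return name            # first set inside the valid window
    return measured[-1][0]         # fall back to the sharpest set

measured = [
    ("UHO_XXL_2022_+1.00_+1.29", 0.71),  # too drawish for this setup
    ("UHO_XXL_2022_+1.10_+1.39", 0.50),  # inside 45%-65% -> chosen
    ("UHO_XXL_2022_+1.20_+1.49", 0.43),
]
print(pick_openings_set(measured))  # UHO_XXL_2022_+1.10_+1.39
```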
Conclusions from the tests (see below): A) As expected, the new UHO_XXL_2022_+110_+139 and UHO_XXL_2022_+120_+149 results are the closest to the old UHO_XXL_+090_+119 openings in these testruns. The Elo spreading is nearly the same, and the draw-rates of 50.2% and 43.1% are close to the 46.5% of the old UHO set. UHO_XXL_2022_+100_+129 gives too many draws, and UHO_XXL_2022_+130_+159 a very low draw-rate.
Tests: AMD Ryzen 3900 12-core (24 threads) notebook with 32GB RAM. 20 games were played simultaneously. Each openings set played 10000 games (each opening replayed with reversed colours, of course). Here are the final results of the testruns (calculated by ORDO): (old) UHO_XXL_+0.90_+1.19 (223081 lines) for comparison: UHO_MEGA_2022_+110_+149 (321415 lines = 44% bigger than the old UHO_XXL_+0.90_+1.19)
OLD UHO XXL version from 2021
Download the UHO XXL openings here
The UHO XXL openings were made for the Stockfish Framework. There, tests using classical openings have reached draw-rates around 90%, which is far too much for statistically valid results, especially because in the Framework, SF dev versions, which are close in strength, are tested versus the current SF master.
Step 1: Built a database out of all games since 1945 from the Megabase 2021 (around 8 million human games) and 2 million human games (2019-2020) from the LiChess Elite Database (all (standard) games from lichess, filtered to keep only games by players rated 2400+ against players rated 2200+, excluding bullet games).
Step 2: Deleted all games from this 10-million-game database which started from a FEN (Chess960). Deleted all comments from the games. Deleted all games with fewer than 15 moves.
Step 3: Deleted all played moves beyond 8 moves / 16 plies. Deleted all games where not both queens are still on the board in the end position.
Step 4: Removed all games with an end position that is already in the file (removed all doubles). Result: 2.7 million opening lines with different end positions, all 8 moves / 16 plies deep.
Step 5: Deleted all tags and replaced them with 7 empty standard tags. Added the ECO code tag. Set all game results to 1/2-1/2.
Step 6: Evaluated all end positions with KomodoDragon 1.0 (3 sec/move on a quadcore, average search depth = 20), then deleted all lines with an end-position eval outside the interval [-1.99;+1.99]. 2.5 million opening lines remaining. Saved in the rawdata.7z archive.
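Step 3's ply truncation can be sketched with a toy SAN tokenizer. This snippet ignores comments and NAGs, and the real work was done with pgn-extract and similar tools, not this code:

```python
import re

# Toy sketch of Step 3: truncate a game's movetext to 8 moves / 16 plies
# and strip the result token. Ignores comments/NAGs; the real pipeline
# used pgn-extract, not this snippet.
def truncate_to_plies(movetext, max_plies=16):
    out, plies = [], 0
    for tok in movetext.split():
        if plies == max_plies:
            break
        if re.fullmatch(r"\d+\.", tok):              # move-number token
            out.append(tok)
        elif tok in ("1-0", "0-1", "1/2-1/2", "*"):  # game result, stop here
            break
        else:
            out.append(tok)
            plies += 1
    return " ".join(out), plies

moves = ("1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 "
         "6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8 1/2-1/2")
text, plies = truncate_to_plies(moves)
print(plies)  # 16
```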
Filtered 7 Unbalanced Human Opening files with increasing evals and made additional EPD-files out of these rawdata-file:
UHO_XXL_+0.80_+1.09 = 261043 lines
Why so many different openings sets? At the moment, the first interval of [+0.80;+1.09] should work best in the Stockfish Framework, but if Stockfish gains more strength and hardware gets faster in the future, the draw-rates will rise again. Then an openings set with a higher eval interval can be chosen to lower the draw-rates back into a valid range of 45%-65%. This concept of different sets with increasing eval intervals was developed by me (and already used in my AntiDraw openings collections), and it makes the draw-rate controllable, keeping it in a valid range of 45%-65%. So these openings will work in the present and in the future (faster hardware and stronger engines raise the draw-rate!) and with different test setups (longer thinking times or more threads raise the draw-rate, too). How cool is that?
Important information: Because pgn-extract always keeps the first appearance of an opening end position and removes all following doubles, the most common human openings tend to be at the beginning of the openings files, and the more "exotic" human opening lines tend to be at the end. Because of this, and because my UHO XXL files are really huge, it can make sense to cut off the end of the files if not so many opening lines are needed.
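Cutting a file off at the end can be done without loading it fully into memory. A sketch; the file name in the comment is a placeholder:

```python
from itertools import islice

# Sketch: keep only the first n lines of a huge openings file, which
# (per the text above) favors the most common human openings.
def head_lines(lines, n):
    return list(islice(lines, n))

# Usage on a real file (the file name is a placeholder):
#   with open("UHO_XXL_2022_+1.00_+1.29.epd") as f:
#       kept = head_lines(f, 100000)
print(head_lines((f"line{i}" for i in range(5)), 2))  # ['line0', 'line1']
```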
Huge 10% time odds test (60sec+600ms vs 66sec+660ms), using Stockfish 210827 (selfplay). 60000 games each testrun, played in the Stockfish Framework (= 180000 games total(!)). noob_3moves and 8moves_v3 are classical openings sets, which were used in the Stockfish Framework before my UHO_XXL set (8moves_v3 is still in use for the regression testruns).

Name                 Draw-rate  Elo    Normalized Elo  Win-Draw-Loss
UHO_XXL_+0.90_+1.19  50%        10.94  24.05           [16069,29751,14180]
noob_3moves          95%        3.39   15.46           [1825,56936,1239]
8moves_v3            91%        3.94   14.97           [3144,54393,2463]
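The plain Elo column can be recomputed from the Win-Draw-Loss counts with the standard logistic formula. A sketch (the normalized-Elo column needs per-game-pair statistics that are not in the table, so it is not reproduced):

```python
import math

# Recompute plain Elo from W/D/L: score s = (wins + draws/2) / games,
# Elo = 400 * log10(s / (1 - s)). This is the standard logistic formula;
# ORDO's exact model may differ slightly on other data.
def elo_from_wdl(wins, draws, losses):
    games = wins + draws + losses
    s = (wins + 0.5 * draws) / games
    return 400 * math.log10(s / (1 - s))

print(round(elo_from_wdl(16069, 29751, 14180), 2))  # UHO_XXL row: 10.94
print(round(elo_from_wdl(1825, 56936, 1239), 2))    # noob_3moves row: 3.39
```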
Really impressive how much better the results (draw-rate, Elo spreading, normalized Elo) are when using my UHO_XXL openings!!! The draw-rate drops from more than 90% to 50%, the Elo spreading is more than 2.7x bigger, and the normalized Elo value is clearly better (more than 1.5x bigger), too. Just awesome!
Idea for the UHO XXL openings and all work done by Stefan Pohl. Using the UHO XXL openings is recommended only for the Stockfish Framework or for other test setups where huge head-to-head tests of engines with 50000 or more games are done. For all other testers, I strongly recommend using the UHO openings in my Anti Draw openings collection, which can be found here: Anti Draw Openings