Stefan Pohl Computer Chess

Private website for chess engine tests


The Unbalanced Human Openings (UHO)

- the future of Computer Chess (Part 3)

 

What's new in V2.0? All 400,000 end positions were re-evaluated by KomodoDragon 1.0 (instead of Komodo 14 in UHO V1.0), because KomodoDragon 1.0 is around +200 Elo (!) stronger than Komodo 14. As the testing results show, UHO V2.0 performs somewhat better than V1.0, because the end positions of the opening lines are now evaluated at a much higher Elo level.

 

The idea behind the UHO (Unbalanced Human Openings) is quite simple. The chess community is a very conservative group, and I have noticed that many people do not like openings that were constructed (my Drawkiller openings, for example) or modified by hand (like my NBSC openings, where Black is not allowed to castle short). They say that is not "real chess". I do not agree with that personally, but I decided to try to build opening sets that work like NBSC (White has a clear advantage at the beginning of the game) and can therefore be rescored with my Advanced Armageddon scoring system for extremely high Elo spreadings (which makes the rankings in a test or tournament much more reliable with a lower number of played games), but that are completely unmodified and 100% human. So I now release opening sets that are filtered out of the Megabase 2020 (by ChessBase). The only filters I use are the KomodoDragon 1.0 evaluation of the end position of each opening line and the requirement that both queens are still on the board in each end position. KomodoDragon analyzed each end position on a quad-core PC with 15 seconds of thinking time.
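
To illustrate the queen filter, here is a minimal Python sketch (the file name is just a placeholder; the eval filtering is shown further below): an end position qualifies only if both queens are still present, which can be checked directly in the piece-placement field of the FEN/EPD string.

# Minimal sketch of the "both queens still on the board" filter.
# The EPD file name is only a placeholder; one end position per line is assumed.

def both_queens_on_board(epd_line):
    # The first EPD/FEN field is the piece placement; an uppercase Q is the
    # white queen, a lowercase q the black queen.
    piece_placement = epd_line.split()[0]
    return "Q" in piece_placement and "q" in piece_placement

with open("endpositions.epd", encoding="utf-8") as source:
    positions = [line.strip() for line in source if line.strip()]

kept = [epd for epd in positions if both_queens_on_board(epd)]
print(len(kept), "of", len(positions), "end positions keep both queens")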

 

Download the Unbalanced Human Openings right here

 

So the Unbalanced Human Openings contain:

 

- 100% human-played moves only; both players had at least 2300 Elo

- no manually constructed openings (like my Drawkiller openings)

- no manually added moves to make castling impossible (like my NBSC openings)

- no selection of piece patterns (like my SALC openings, where White and Black castle to opposite sides of the board)

 

So, after removing all duplicate end positions, I had to evaluate around 400,000 (!) opening lines (4, 6 and 8 moves deep). Then I ran a lot of pre-tests to find out which eval interval gives the best results (the widest Elo spreading in a test round-robin of Stockfish 8-11). I found two things: the eval interval has to be very narrow, and the best KomodoDragon eval interval is [+0.90;+0.99]. Then I kept only the end positions within that eval interval. And that's all...

The good news is that every human opening system with lines inside that eval interval is part of these UHO sets. The bad news is that around 96% of all lines were filtered out, so the resulting UHO opening sets are pretty small. But they work really well. Not as well as the NBSC openings (forbidden castling is a more stable advantage than an eval value in the end position), but not far from the quality of NBSC (which was a surprise for me!) - see the testing results below.

 

There are 3 UHO opening sets (as PGN and EPD files); all KomodoDragon evals in the end positions are in the interval [+0.90;+0.99]:

 

4mvs_+90_+99: 635 lines

6mvs_+90_+99: 2933 lines

8mvs_+90_+99: 8533 lines

 

And I added a bigger 8-moves file with the eval interval [+0.80;+1.09]:

 

8mvs_big_+80_+109: 25857 lines (opening books for FritzGUI, ShredderGUI, ArenaGUI and in polyglot format were built from this file and the 8mvs_+90_+99 file; you will find them in the Book folder)

 

In all these PGN/EPD-files, the games are sorted by Elo of the players:

Both players 2600 Elo or better, followed by 2500 Elo or better, followed by 2400 Elo or better, followed by 2300 Elo or better.

So, if you use the files sequentially in your engine tests or tournaments, the highest Elo levels are used first.

 

Additionally, I added the unfiltered raw data to the download. In that zip folder you find the 4-, 6- and 8-moves PGN files, which contain no duplicate end positions, are sorted by the Elo of the players and carry the KomodoDragon eval in the Annotator tag. No filtering was done, so all lines within the eval interval [-1.99;+1.99] are in these files. They can therefore be used to create your own opening sets or books by filtering the raw data by eval, by Elo, or both. But do not use them unfiltered - they contain very bad lines for White and for Black!
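
As a sketch of how such a filter could look in Python (the file names are placeholders, the exact number format inside the Annotator tag is an assumption here; WhiteElo and BlackElo are standard PGN tags), using the python-chess package:

# Sketch: build an own opening set from the raw-data PGN by filtering on the
# KomodoDragon eval (Annotator tag) and on the Elo of both players.
# Requires the python-chess package; file names are placeholders and the
# number format inside the Annotator tag is an assumption.
import re
import chess.pgn

EVAL_MIN, EVAL_MAX = 0.90, 0.99   # keep only this eval interval
MIN_ELO = 2500                    # both players at least this strong

def annotator_eval(headers):
    # Pull the first signed decimal number out of the Annotator tag, if any.
    match = re.search(r"[-+]?\d+\.\d+", headers.get("Annotator", ""))
    return float(match.group()) if match else None

def header_elo(headers, key):
    # WhiteElo/BlackElo may be missing or "?", so parse defensively.
    value = headers.get(key, "")
    return int(value) if value.isdigit() else 0

with open("UHO_raw_8mvs.pgn", encoding="utf-8") as source, \
     open("my_filtered_set.pgn", "w", encoding="utf-8") as target:
    while True:
        game = chess.pgn.read_game(source)
        if game is None:
            break
        evaluation = annotator_eval(game.headers)
        lowest_elo = min(header_elo(game.headers, "WhiteElo"),
                         header_elo(game.headers, "BlackElo"))
        if (evaluation is not None
                and EVAL_MIN <= evaluation <= EVAL_MAX
                and lowest_elo >= MIN_ELO):
            print(game, file=target, end="\n\n")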

 

In the Armageddon-Tools folder you find the tools needed for rescoring engine test or tournament results with the Armageddon scoring system (White win = 1 point for White, draw = 1 point for Black, Black win = 1 point for Black) or, strongly recommended, with my new Advanced Armageddon scoring system (White win = 1 point for White, draw = 1 point for Black, Black win = 2 points for Black). As you will see in the testing results below, the Advanced Armageddon rescoring gives an Elo spreading of the results that is around 2x bigger than with classical chess scoring. And keep in mind that you have to play around 4x more games to halve the size of the error bar. So, using the UHO openings with Advanced Armageddon rescoring, you only have to play 25% of the number of games that classical chess scoring would require for the same statistical quality of results!
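
Just to illustrate the two scoring rules (this is only a minimal sketch, not the tool shipped in the Armageddon-Tools folder), here is a small Python function that rescores a list of classical game results:

# Minimal sketch of the Armageddon and Advanced Armageddon scoring rules.
# A result is given from White's point of view: "1-0", "1/2-1/2" or "0-1".

def rescore(results, advanced=True):
    # Return (white_points, black_points) for a list of game results.
    white_pts, black_pts = 0, 0
    for result in results:
        if result == "1-0":                     # White win: 1 point for White
            white_pts += 1
        elif result == "1/2-1/2":               # draw: 1 point for Black
            black_pts += 1
        elif result == "0-1":                   # Black win: 1 point (Armageddon)
            black_pts += 2 if advanced else 1   # or 2 points (Advanced)
        else:
            raise ValueError(f"unexpected result: {result}")
    return white_pts, black_pts

# Example: two White wins, one draw, one Black win
print(rescore(["1-0", "1-0", "1/2-1/2", "0-1"]))                 # (2, 3)
print(rescore(["1-0", "1-0", "1/2-1/2", "0-1"], advanced=False)) # (2, 2)

In a real rescoring the points are of course credited to whichever engine had the white or the black pieces in that particular game, since all openings are replayed with reversed colors.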

 

 

Here is a short summary of my tests and comparisons with classical opening sets:

 

Testing conditions:

2'+1'', Singlethread, i7-8750H Hexacore mobile CPU, 256MB Hash, cutechess-cli

(no TB for the engines, but 5-men Syzygy for cutechess), Contempt=0 for all Stockfish versions.

All openings replayed with reversed colors.

Round robin with SF 11, SF 10, SF 9 and SF 8. Each SF played

250 games against each of the 3 opponents, giving 1500 games in total per test run.

 

*********************************************************************************

Classical scoring system:

 

Average results of 5 classical opening sets (Balsa 2724, SuperGM 4moves,

Chad 8ply, Stockfish Framework 8moves, Hert 500):

Draw-rate : 69.0 % (smaller is better)

Elo spreading (SF 11 to SF 8): 139 Elo (bigger is better)

 

Average results of 3 Unbalanced Human Openings (UHO V2.0) sets (4, 6 and 8 moves):

Draw-rate : 50.2 % (smaller is better) (UHO V1.0: 49.7 %)

Elo spreading (SF 11 to SF 8): 171 Elo (bigger is better) (UHO V1.0: 160 Elo)

*********************************************************************************

Advanced Armageddon scoring system:

 

Average results of the 3 NBSC Advanced Armageddon opening sets (3, 4 and 5 moves):

Draw-rate : 0 %

Elo spreading (SF 11 to SF 8): 318 Elo (bigger is better)

White Score : 48.0 %

 

Average results of 3 Unbalanced Human Openings (UHO V2.0) sets (4, 6 and 8 moves):

Draw-rate : 0 %

Elo spreading (SF 11 to SF 8): 308 Elo (bigger is better) (UHO V1.0: 296 Elo)

White Score : 41.3 %

*********************************************************************************

 

Conclusions:

 

1) When the UHO openings are used with the classical scoring system (which is possible, but not recommended), the results are measurably better than with a classical opening set: the draw rate is clearly lower (50.2% instead of 69.0%) and the Elo spreading is higher (171 Elo instead of 139 Elo).

 

2) My Advanced Armageddon scoring system, which takes into account that the UHO and NBSC openings give White a clear advantage, lifts the results into another dimension:

a) The Elo spreading is at least doubled, which means you only have to play 1/4 of the games to get results bigger than the error bar!

b) As I expected, the NBSC openings give slightly better results than the UHO openings (the White score is closer to 50% and the Elo spreading is higher), because the forbidden short castling for Black is a more stable advantage for White than a high evaluation in the end position of the opening line.

But the NBSC results are not that much better, and the UHO openings have the advantage that they are totally unchanged, 100% human-played opening lines.

Taken from the Megabase 2020, only filtered, but totally unchanged and unmodified!

c) There are no draws anymore... The draw rate is always 0%.

 

 

***********************************************************************************

Additionally, I played 1000 games with an ultra-long thinking time (10'+5''):

Stockfish 12 versus Komodo Dragon 1.0

Singlethread, i7-8750H Hexacore mobile CPU, 512MB Hash, cutechess-cli

(no TB for the engines, but 5-men Syzygy for cutechess), Contempt=0

All openings replayed with reversed colors.

 

For comparison, the result using the classical HERT 500 opening set:

Stockfish was +28 Elo stronger (score 53.9%) and the draw rate was 84.1%

Stockfish 12 vs Dragon 1.0: 1000 (+118,=841,-41), 53.9 %

 

And now the results of the UHO V2.0 6-moves [+0.90;+0.99] openings:

Stockfish was +58 Elo stronger (score 58.1%) and the draw rate was 52.8%

Stockfish 12 vs Dragon 1.0: 1000 (+317,=528,-155), 58.1 %

 

And now the results of the UHO V2.0 6-moves [+0.90;+0.99] openings after the

Advanced Armageddon Rescoring:

Stockfish was +108 Elo stronger (score 64.8%) and the draw rate was 0%

Stockfish 12 vs Dragon 1.0: 1027 (+666,=0,-361), 64.8 %
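
For reference, the quoted Elo differences follow approximately from the scores via the standard logistic relation Elo = 400 * log10(score / (1 - score)); small deviations from the quoted values come only from rounding of the score percentages. A quick check in Python:

# Quick check: convert a score percentage into an Elo difference using
# the standard logistic relation (an approximation, ignoring draw models).
from math import log10

def elo_diff(score):
    # Elo difference implied by a score in (0, 1).
    return 400 * log10(score / (1 - score))

for s in (0.539, 0.581, 0.648):
    print(f"score {s:.1%} -> about {elo_diff(s):+.0f} Elo")
# score 53.9% -> about +27 Elo
# score 58.1% -> about +57 Elo
# score 64.8% -> about +106 Elo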

 

The Elo spreading was more than 2x bigger with the UHO V2.0 openings, and nearly 4x bigger after the Advanced Armageddon rescoring of the UHO V2.0 games!!! That is just awesome, because it means that with the UHO V2.0 openings you only have to play 25% of the games to get an Elo spreading bigger than the error bar.

And with my Advanced Armageddon rescoring you only have to play 12.5% of the games to get an Elo spreading bigger than the error bar, compared to a classical opening set. That means 87.5% of PC playing time can be saved. Wow! Especially when testing with long thinking times, this is extremely helpful.
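
The arithmetic behind the saved games, as a rough sketch (assuming the usual statistics of independent game results): the error bar of a result shrinks with the square root of the number of games, so a doubled Elo spreading tolerates a doubled error bar, which is already reached with roughly a quarter of the games.

# Rough sketch: the error bar of a result scales like 1/sqrt(number of games).
# A doubled Elo spreading therefore tolerates a doubled error bar, which is
# reached with roughly a quarter of the games.
from math import sqrt

baseline_games = 1000
for spreading_factor in (1, 2):
    games_needed = baseline_games / spreading_factor ** 2
    errorbar_ratio = sqrt(baseline_games / games_needed)
    print(f"spreading x{spreading_factor}: about {games_needed:.0f} games, "
          f"error bar x{errorbar_ratio:.0f} (compensated by the bigger spreading)")
# spreading x1: about 1000 games, error bar x1 (compensated by the bigger spreading)
# spreading x2: about 250 games, error bar x2 (compensated by the bigger spreading)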

 

See the detailed and complete testing results in the UHO_Testresults.txt file in the Tests folder. There you will find all played test games, too.

 

Or, to see the testing results directly here on screen, click here

 

 

All work done by Stefan Pohl (SPCC) www.sp-cc.de

(C) 2021