Stefan Pohl Computer Chess - private website for chessengine-tests
The Drawkiller Openings Project - the future of Computerchess (Part 1)
Ideas and development: Stefan Pohl and Hauke Lutz
Alpha testing: Hauke Lutz
Beta and final testings & documentation: Stefan Pohl
Current version: 3.1 (added all opening books in polyglot-format)
Download the Drawkiller-openings here
Download the testgames (Drawkiller, SALC, FEOBOS Noomen) here
The Drawkiller openings are based on the main ideas of my SALC openings, combined with a very good idea of Hauke Lutz: very short opening-lines that contain only pawn-moves. The Drawkiller openings are not meant for playing against other books: all engines must use the Drawkiller openings in a tournament or testrun!
The original SALC openings were filtered out of human chess-games (from the BigDatabase). SALC means "S"hort "A"nd "L"ong "C"astling: white and black castle to opposite sides (if white played 0-0, black played 0-0-0; if white played 0-0-0, black played 0-0), with both queens still on the board. No duplicate games. With SALC openings, the chance of attacks on the opponent's king is much higher than with normal opening-books. Because of this, computerchess with SALC openings brings more action, is more fun to watch and has a measurably lower number of draws. This matters because the faster the computers get, the higher the quality of computerchess gets and the higher the draw-rate in engine-engine matches gets, so computerchess is in danger of dying the "draw-death" in the near future. Using SALC openings gives computerchess a future beyond playing only draws or using strange and incorrect gambit-openings for a lower draw-rate! The SALC openings were a huge success: the draw-rates in the testruns were lowered from 63%-64% (using standard openings) to around 48%, without pushing the scores of asmFish and Komodo closer to 50%. For more information, please check the Readme-file in the SALC V5.02 folder, which you can also download on my website.
The problem with filtering SALC opening-lines out of a database of human games is that those SALC-positions are rare. In order to get a large number of SALC-positions, it was necessary to make the opening-lines quite long (8 moves for the small 500 openings-set and 10 moves for the big 25000 openings-set and opening-book). And with 10 human moves played, the opening is over and the middlegame is already starting.
The idea of the new Drawkiller openings is not to finish the opening in the opening-book, but to let the engines play their own opening-moves. This is especially important when testing the new neural-net based engines (LC Zero, for example), which play very strongly and creatively in the opening.
How can this be done?
Some months ago, Hauke Lutz had the great idea to build some openings-sets that contain pawn moves only (4 pawn-plies, 8 pawn-plies), so that the engines have to develop all other pieces in the opening by themselves. These pawn-plies files do not contain strange pawn-moves (a4, b4, g4, h4), which would lead to really strange positions on the chessboard.
Of course, these pawn-move opening lines cannot be SALC-positions, because all other pieces stay on their starting squares. But then I had the idea that it is possible to combine these pawn-move openings with some moves, which
a) move queen, bishop and knight "out of the way", then
b) either move the white king to a1, the a1-rook to d1 and the queen to e1, or move the white king to h1 and the h1-rook to e1. The same is done for black, moving the king to the opposite side of the board (a8 or h8): when the white king is on a1, the black king is on h8; when the white king is on h1, the black king is on a8. That creates "SALC"-like starting-positions, with the two queens never on the same file (one queen is on the e-file, one queen stays on the d-file). That prevents early queen-captures when the d-file is opened! I took this idea from my SALC-half-closed positions, which measurably lowered the draw-rate (around -5%) compared to not-half-closed SALC-positions.
c) move bishop and knight back to their normal starting squares
And, when this is done, the pawn moves of Hauke Lutz are played.
These are the two move-sequences that create the "SALC"-like king- and rook-positions on the chessboard:
1. e3 d6 2. Nh3 Nc6 3. Bd3 Be6 4. O-O Qd7 5. Kh1 O-O-O 6. Re1 Kb8 7. Bf1 Ka8 8. Ng1 Qe8 9. Nf3 Bc8 10. Ng1 Nb8
1. d3 e6 2. Be3 Bd6 3. Nc3 Nh6 4. Qd2 O-O 5. O-O-O Kh8 6. Kb1 Re8 7. Ka1 Bf8 8. Qe1 Ng8 9. Bc1 Nf6 10. Nb1 Ng8
As you can see, only one pawn-move per side (1.e3/d3 and 1...e6/d6) is needed to open a path for bishop and queen. These pawn-moves were filtered out of the pawn-move openings by Hauke Lutz, so that the pawn-move opening-lines could be linked with the move-sequences above without generating impossible (illegal) move-sequences. The result is a set of "SALC"-like openings (kings on different sides of the board) in which no piece has left its normal starting square, apart from the pawns, the king, one rook and one of the two queens.
Here is an example of a "complete" artificial SALC opening-line from the drawkiller_tournament.pgn file:
1. e3 d6 2. Nh3 Nc6 3. Bd3 Be6 4. O-O Qd7 5. Kh1 O-O-O 6. Re1 Kb8 7. Bf1 Ka8 8. Ng1 Qe8 9. Nf3 Bc8 10. Ng1 Nb8 11. h3 c6 12. e4 f6 13. d4 e5
(The line is 13 moves deep, but only 3 pawn moves were made (and the kings "traveled" to the edge of the board). So for engine-play, this opening-line is only 3 moves deep, and the engines have to play the whole opening by themselves and develop all non-pawn pieces from the back rank.)
These Drawkiller openings combine the advantages of very short opening-lines with the much more spectacular and much less drawish chess that my SALC-idea brings to computerchess.
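The linking step described above can be sketched in a few lines of Python. This is only an illustration, not the tool that was actually used to build the files: the helper names `to_movetext` and `link_opening` are made up here, and no legality check is performed. The prefix is the first of the two move-sequences given above, and the pawn moves are those of the example line.

```python
# SALC-like prefix "white king to h1, black king to a8", copied from the text.
KINGSIDE_PREFIX = (
    "e3 d6 Nh3 Nc6 Bd3 Be6 O-O Qd7 Kh1 O-O-O Re1 Kb8 "
    "Bf1 Ka8 Ng1 Qe8 Nf3 Bc8 Ng1 Nb8"
).split()


def to_movetext(plies):
    """Number a flat list of plies (white, black, white, ...) as PGN movetext."""
    parts = []
    for i in range(0, len(plies), 2):
        pair = " ".join(plies[i:i + 2])
        parts.append(f"{i // 2 + 1}. {pair}")
    return " ".join(parts)


def link_opening(prefix_plies, pawn_plies):
    """Append a pawn-only line to the prefix (hypothetical helper, no legality check)."""
    return to_movetext(list(prefix_plies) + list(pawn_plies))


# Pawn moves of the example line from drawkiller_tournament.pgn.
line = link_opening(KINGSIDE_PREFIX, ["h3", "c6", "e4", "f6", "d4", "e5"])
print(line)
```

The printed movetext should match the example line above: 10 prefix moves followed by the pawn moves numbered 11 to 13.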
The Drawkiller EloZoom lines look different and lead to different non-pawn piece-patterns on the board:
1. e3 d6 2. Bd3 Be6 3. Nf3 Nc6 4. Kf1 Kd7 5. Kg1 Kc8 6. Qe1 Kb8 7. Qd1 Qe8 8. Bf1 Bc8 9. Ne1 Nd8
1. d3 e6 2. Be3 Bd6 3. Nc3 Nf6 4. Kd2 Kf8 5. Kc1 Kg8 6. Kb1 Bf8 7. Qe1 Qe8 8. Bc1 Qd8 9. Nd1 Ne8
(The kings are on the b- and g-files, the rooks stay on the a- and h-files, and the knight that started on the square where the king now stands is on the d- or e-file.) The main ideas of Drawkiller stay the same here: kings on different sides of the board, queens not on the same file, all non-pawn pieces on the 1st and 8th rank...
Important: Note that the Drawkiller openings contain only normal chess-moves; each line starts from the normal starting-position of classical chess. Drawkiller openings are not any kind of chess-variant like Shuffle-Chess or Chess960! Because of this, it was possible to build opening-books for the ChessGUIs (Fritz, Arena, Shredder) out of the Drawkiller openings, which can be used for engine-tournaments in those GUIs. And every chess-engine on the planet can play chess using the Drawkiller openings, because they are normal, classical chess!!!
Test results:
(asmFish 170426 vs. Komodo 10.4, 5'+3'' time-control, singlecore, no ponder, no endgame-bases, LittleBlitzerGUI, 1000 games each testrun(!) except Noomen Gambit-lines (only 246 positions, so 492 games were played) and Noomen TCEC Superfinal (only 100 positions, so 200 games were played))
Stockfish Framework standard 8 move openings: Score: 60.3% - 39.7%, draws: 63.4%
FEOBOS v20 contempt 5 top 500 openings: Score: 58.7% - 41.3%, draws: 64.1%
HERT 500 set: Score: 60.6% - 39.4%, draws: 60.4%
Noomen Gambit-Lines: Score: 59.1% - 40.9%, draws: 59.3%
4 GM-moves short book: Score: 60.5% - 39.5%, draws: 57.1%
Noomen TCEC Superfinal (Season 9+10): Score: 62.5% - 37.5%, draws: 50.0%
SALC V5 half-closed: Score: 61.6% - 38.4%, draws: 49.2%
SALC V5 full-closed 500 positions: Score: 66.5% - 33.5%, draws: 47.7%
NEW:
Drawkiller (normal set): Score: 65.3% - 34.7%, draws: 33.5%
Drawkiller (tournament set): Score: 65.3% - 34.7%, draws: 33.5% (no mistake by me: the results of Drawkiller normal and tournament were exactly the same)
Drawkiller (small 500 positions set): Score: 66.4% - 33.6%, draws: 30.5%
Drawkiller balanced: Score: 69.4% - 30.6%, draws: 36.4%
Drawkiller balanced big (15962 positions): Score: 67.4% - 32.6%, draws: 38.8%
Drawkiller EloZoom: Score: 73.2% - 26.8%, draws: 36.5%
Drawkiller EloZoom big (20043 positions): Score: 69.2% - 30.8%, draws: 40.7%
As you can see, the Drawkiller openings are not just an improvement over my SALC openings, they are a breakthrough into another dimension! Never before has any openings-set given such low draw-rates without squeezing the scores of the engines towards 50%; instead, it pushes the scores away from 50%. The Drawkiller Normal- and Tournament sets nearly halve the draw-rate compared to FEOBOS or the Stockfish Framework 8-move openings, and the small 500 set more than halves it. And take a look at the spread of the results: with Drawkiller EloZoom, asmFish scored more than 72% vs. Komodo. With the Stockfish Framework standard openings, the score is 60.3%. 60.3% means an Elo-distance to Komodo of +72 for asmFish. 72.0% means an Elo-distance to Komodo of +164 for asmFish (!!!) That is more than double the Elo-spreading, which means you have to play less than 1/4 of the number of games with Drawkiller (compared to a standard opening set) to get results outside the errorbars (because you need 4x the number of games to halve the errorbar-interval). I would never have expected that this was possible: the Drawkiller project is really a breakthrough into another dimension. And the Drawkiller project kills the draw-death of computerchess for the next decades. Imagine a TCEC-tournament with nearly halved draw-rates... how awesome would that be?!?
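The Elo figures quoted above are consistent with the standard logistic Elo formula, Elo = 400 * log10(p / (1 - p)) for a score fraction p. The sketch below assumes this formula (the text does not say which tool was used for the conversion); `elo_diff` is a hypothetical helper name.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


standard = elo_diff(0.603)   # Stockfish Framework standard openings
elozoom = elo_diff(0.72)     # conservative Drawkiller EloZoom figure used in the text

print(f"standard openings: {standard:+.1f} Elo")
print(f"Drawkiller EloZoom: {elozoom:+.1f} Elo")
print(f"ratio: {elozoom / standard:.2f}")
```

Under this assumption the two scores map to about 72.6 and 164.1 Elo, in line with the +72 and +164 quoted above, and the ratio between them is a little above 2.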
Enjoy Drawkiller-Chess
As you can see in the test results, the draw-rate of the big Drawkiller openings is a little higher than the draw-rate of the smaller files. So I recommend always using the smallest Drawkiller openings set/book that is possible for your engine tournament or testrun. That will give the best results. The Drawkiller download contains the raw-data, too. The raw-data consists of the unfiltered, unchecked and unmixed pgn-files. Do not use these files for engine-play: they contain very bad endpositions for white or black. The raw-data was included to make it possible to filter it again in the future (with stronger engines or faster machines), so the Drawkiller openings can be rebuilt!
The Drawkiller openings were filtered out of this raw-data with Komodo 11.2.2. Komodo checked all endpositions (using the pgnscanner-tool), running on an i7-6700HQ 2.6GHz Notebook (Skylake CPU) with all 4 cores and 2048 Hash, Contempt=0.
For the normal-file, the Komodo evaluation had to be in an interval: eval: [-0.49;-0.10] or [+0.10;+0.49]
For the tournament-file, the Komodo evaluation had to be in an even smaller interval: eval: [-0.39;-0.20] or [+0.20;+0.39]
For the small 500-file, the Komodo evaluation had to be in an even smaller interval: eval: [-0.36;-0.25] or [+0.25;+0.36]
For the Drawkiller balanced-files, the Komodo evaluation had to be in a small interval around zero: eval: [-0.09;+0.09] (and [-0.19;+0.19] for the big-balanced-file)
For the Drawkiller EloZoom-files, the Komodo evaluation had to be in the interval eval: [-0.09;+0.09] (all EloZoom-files!)
You can see, that the eval-intervals are quite small. No endposition of any Drawkiller opening gives a huge advantage to white or black! Especially the Balanced and EloZoom files have very small eval-intervals. I believe no other opening set has such balanced endpositions...
Thinking-time for each endposition:
small 500 set: 60''
Normal/Tournament sets: 45''
Big sets: 30''
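The evaluation windows listed above can be written down as a small lookup table. The following is a minimal sketch, not the pgnscanner-tool itself: the set labels and the function `accepts` are hypothetical, and evaluations are given in centipawns (so +0.25 becomes 25) to avoid floating-point edge cases at the interval borders.

```python
# Allowed Komodo evaluations per set, in centipawns, copied from the text.
# The dictionary keys are illustrative labels, not real file names.
EVAL_WINDOWS = {
    "normal": [(-49, -10), (10, 49)],
    "tournament": [(-39, -20), (20, 39)],
    "small_500": [(-36, -25), (25, 36)],
    "balanced": [(-9, 9)],
    "balanced_big": [(-19, 19)],
    "elozoom": [(-9, 9)],
}


def accepts(set_name, eval_cp):
    """Return True if an endposition eval (centipawns) fits the given set."""
    return any(low <= eval_cp <= high for low, high in EVAL_WINDOWS[set_name])


print(accepts("normal", 30))       # inside [+0.10;+0.49]
print(accepts("tournament", 15))   # outside both tournament windows
print(accepts("balanced", -5))     # inside [-0.09;+0.09]
```

Note that the normal, tournament and small 500 windows deliberately exclude evaluations close to zero, while the balanced and EloZoom windows are centered on zero.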
Copyright (C) 2019 by Stefan Pohl and Hauke Lutz
2019/02/20 Testrun of the new Drawkiller balanced set (and testruns of the Drawkiller tournament, Stockfish Framework 8moves and GM-4moves sets for comparison).
3 engines played a RoundRobin (Stockfish 10, Houdini 6 and Komodo 12), with 500 games in each head-to-head, so each engine played 1000 games. For each game, one opening-line was chosen at random by the LittleBlitzerGUI. Singlecore, 3'+1'', LittleBlitzerGUI, no ponder, no bases, 256 MB Hash, i7-6700HQ 2.6GHz Notebook (Skylake CPU), Windows 10 64bit
download all played games here
In the Drawkiller balanced sets, all endposition-evals (analyzed by Komodo) of the opening lines are in a very small interval of [-0.09;+0.09]. The idea is that this should lead to a wider Elo-spreading of the engine ratings, which makes the engine rankings much more statistically reliable (or means that a much lower number of games is needed to get the results outside the errorbars). Of course, on the other hand, this concept leads to slightly higher draw-rates... Let's see if it worked:
Drawkiller balanced:
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3506   11   11   1000  70.9 %    3347  36.2 %
Elo-spreading (1st to last): 204 Elo
Draws: 37.9%
Drawkiller tournament:
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3494   11   11   1000  68.9 %    3353  34.2 %
Elo-spreading (1st to last): 174 Elo
Draws: 36.1%
GM_4moves:
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3475   11   11   1000  65.4 %    3363  53.2 %
Elo-spreading (1st to last): 130 Elo
Draws: 56.3%
Stockfish framework 8moves:
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3463   11   11   1000  63.0 %    3369  59.7 %
Elo-spreading (1st to last): 114 Elo
Draws: 61.3%
Conclusions:
1) The Drawkiller balanced idea was a success. The draw-rate is a little higher than with Drawkiller tournament (that is the price we have to pay for point 2), but note that even this slightly higher draw-rate is still much, much lower than the draw-rate of any non-Drawkiller openings set...
2) The Elo-spreading with Drawkiller balanced was measurably higher than with any other openings-set. That makes the engine rankings much more statistically reliable, or, put differently, a much lower number of games is needed to get the results outside the errorbars. Example: compared to the result with the Stockfish Framework 8moves openings, the Elo-spreading of Drawkiller balanced is nearly doubled, which means you can accept errorbars of twice the size for the same statistical reliability of the engine rankings in a tournament / ratinglist. Note that you have to play 4x more games to halve the size of an errorbar! That means that with the Drawkiller balanced openings, you have to play only about 25%-30% of the games you would need with the Stockfish Framework 8moves openings for the same statistical quality of the engine rankings (!!!) - how awesome is that?!?
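The estimate in point 2) follows from the rule that errorbars shrink with the square root of the number of games. A minimal sketch, assuming the per-game noise is the same for both openings-sets (the tables above show the same +/-11 Elo errorbars over 1000 games in each testrun); `games_fraction` is a hypothetical helper name.

```python
def games_fraction(reference_spread, new_spread):
    """Fraction of games needed with the new set for the same relative resolution.

    Assumes errorbars scale as 1/sqrt(games) and are equal for both sets
    at an equal number of games.
    """
    if reference_spread <= 0 or new_spread <= 0:
        raise ValueError("spreads must be positive")
    return (reference_spread / new_spread) ** 2


# Elo-spreading 1st to last: Stockfish Framework 8moves vs. Drawkiller balanced.
fraction = games_fraction(114, 204)
print(f"{fraction:.1%} of the games")
```

For the spreads measured here (114 Elo vs. 204 Elo) this gives about 0.31, a bit under a third of the games.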
2019/01/06 One of the biggest opening-sets testings of all time!
8 opening-sets were tested: Drawkiller tournament, SALC V5, Noomen (TCEC openings Season 9-13 Superfinal and Gambit-openings), Stockfish Framework 2-moves and 8-moves openings, 4 GM moves (out of MegaBase 2018, checked with Komodo), the HERT set by Thomas Zipproth and FEOBOS v20.1 contempt 3 (using contempt 3 openings is recommended by the author, Frank Quisinsky). 7 engines played a 2100 games RoundRobin-tournament with each opening-set (not one opening-set playing vs. another opening-set!). For each game, one opening-line was chosen at random by the GUI. The 7 engines were: Stockfish 10, Houdini 6, Komodo 12, Fire 7.1, Ethereal 11.12, Komodo 12.2.2 MCTS, Shredder 13. 100 games were played in each head-to-head competition, so in each round-robin, each engine played 600 games. Singlecore, 3'+1'', LittleBlitzerGUI, no ponder, no bases, 256 MB Hash, i7-6700HQ 2.6GHz Notebook (Skylake CPU), Windows 10 64bit. 3 games running in parallel, each testrun took 3-4 days, depending on the average game-duration. Draw adjudication after 130 moves played by the engines (counted after the end of the opening-line).
Download all 8 x 2100 = 16800 played games here
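The game counts of the round-robin setups can be checked with a few lines of arithmetic. This is just a sanity-check sketch; the function name `round_robin` is made up for the illustration.

```python
def round_robin(engines, games_per_pairing):
    """Return (pairings, total games, games per engine) for a full round-robin."""
    pairings = engines * (engines - 1) // 2
    total_games = pairings * games_per_pairing
    games_per_engine = (engines - 1) * games_per_pairing
    return pairings, total_games, games_per_engine


# 7 engines, 100 games per head-to-head (this experiment).
print(round_robin(7, 100))
# 3 engines, 500 games per head-to-head (the 2019/02/20 testrun).
print(round_robin(3, 500))
# 8 opening-sets, one round-robin each.
print(8 * round_robin(7, 100)[1])
```

With 7 engines there are 21 pairings, which gives the 2100 games per opening-set, 600 games per engine and 16800 games overall stated above.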
First of all the main question: Why are low draw-rates and wide Elo-spreadings of engine testing-results better? You find the answer here
This excellent experiment by Andreas Strangmueller shows without any doubt that the more thinking-time computerchess gets (or the faster the hardware, which is the same thing!), the more the draw-rates climb and the more the Elo-spreadings shrink. So it is only a question of time until the draw-rates get so high and the Elo-spreadings of test results get so small that engine-testing or engine-tournaments no longer give any valuable results, because the Elo-differences will always stay inside the errorbars, even with thousands of games played. So it is absolutely necessary to lower the draw-rates and raise the Elo-spreadings if computerchess is to survive the next decades! Hence the following conclusions from this huge experiment with different opening-sets:
1) The Drawkiller openings are a breakthrough into another dimension of engine-testing: the overall draw-rate (27%) is nearly halved compared to classical openings sets (FEOBOS (51.3%), Stockfish Framework 8moves openings (51.9%)) AND the Elo-spreading is around +150 Elo better (!!). Because the errorbars of all results are nearly the same in all testruns, the rankings are much more stable and reliable. And the average game-duration using Drawkiller was 11.5% lower than with a classical opening-set. So, in the same time, you can play more than 10% more games on the same machine, which improves the quality of the results, too, because the errorbars get smaller with more games played. Download the future of computerchess (the Drawkiller openings): here
2) The ranking order of the engines is exactly the same in all mini-ratinglists generated by ORDO from these testruns. So what we learn here is that it does not matter whether an opening-set contains all ECO-codes (FEOBOS does!) or not (Drawkiller and SALC V5 definitely do not!). The ranking order of the engines in a ratinglist is exactly the same! So the statement, repeated over and over by many people, that using all ECO-codes, or those most played by human players, in an opening-set is important for engine-testing because otherwise the results are distorted, is a FAIRY TALE and nothing else!!!
3) At the bottom, I added the CEGT and CCRL ratinglists with the same engines that were used for this project (nearly the same versions; Ethereal 11 instead of Ethereal 11.12, for example). There you can see that the ranking in these ratinglists is exactly the same, too. So what we learn here is that the statement, repeated over and over by many people, that it is necessary to test engines against a lot of opponents for a valid rating/ranking is a FAIRY TALE, too: 6 opponents gave the same ranking in all testruns of this project as CEGT and CCRL did with many, many more opponents.
Long summary (with ratinglists):
Drawkiller tournament:
Avg game length = 389.777 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3459   23   23    600  82.6 %    3157  21.8 %
Elo-spreading: from first to last: 448 Elo
Number of early draws:
Games : 2100 (finished)
SALC V5:
Avg game length = 399.781 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3404   21   21    600  78.3 %    3166  32.8 %
Elo-spreading: from first to last: 341 Elo
Number of early draws:
Games : 2100 (finished)
Noomen (TCEC openings Season 9-13 Superfinal and Gambit-openings (477 lines)):
Avg game length = 405.223 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3388   20   20    600  76.8 %    3169  39.7 %
Elo-spreading: from first to last: 312 Elo
Number of early draws:
Games : 2100 (finished)
Stockfish Framework 2moves openings:
Avg game length = 430.108 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3395   20   20    600  77.5 %    3168  35.0 %
Elo-spreading: from first to last: 333 Elo
Number of early draws:
Games : 2100 (finished)
4 GM moves (out of MegaBase 2018, checked with Komodo):
Avg game length = 449.414 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3396   20   20    600  77.5 %    3167  37.3 %
Elo-spreading: from first to last: 330 Elo
Number of early draws:
Games : 2100 (finished)
HERT set (500 pos):
Avg game length = 442.339 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3384   20   20    600  76.3 %    3169  42.2 %
Elo-spreading: from first to last: 316 Elo
Number of early draws:
Games : 2100 (finished)
FEOBOS v20.1 contempt 3:
Avg game length = 437.481 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3365   19   19    600  73.9 %    3173  45.5 %
Elo-spreading: from first to last: 302 Elo
Number of early draws:
Games : 2100 (finished)
Stockfish Framework 8moves openings:
Avg game length = 438.899 sec
   Program               Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 10 bmi2 :  3363   19   19    600  73.9 %    3173  44.8 %
Elo-spreading: from first to last: 281 Elo
Number of early draws:
Games : 2100 (finished)
For comparison:
CEGT 40/4 ratinglist (singlecore):
1 Stockfish 10.0 x64 1CPU 3450
Elo-spreading: from first to last: 298 Elo
CCRL 40/4 ratinglist (singlecore):
1 Stockfish 10 64-bit 3498
Elo-spreading: from first to last: 229 Elo by bayeselo. (With ORDO: 276 Elo)