Stefan Pohl Computer Chess

private website for chessengine-tests


Lc0 (and other NN-engines) testing

 

Playing conditions:

 

Hardware: i7-8750H (Hexacore) Notebook with RTX 2060 GPU, Windows 10 64bit, 16GB RAM

CPU-Speed: Stockfish, running on 11 threads, reaches 7.5 MN/s in the starting position with the CPU speed limited to 97% (to switch off Intel Turbo Boost).

GPU (used by LC Zero): Nvidia RTX 2060 (6GB). LC Zero calculates around 11500 n/s in the starting position (measured with "go infinite") with net size 20x256. I used the MSI Afterburner tool to reduce the speed of the RTX card as far as possible. This gives a Leela-Ratio (what is Leela Ratio? look here) of 1.3. The Leela-Ratio of AlphaZero (which also used a 20x256 net) in the match vs. Stockfish 8 was 1.0, so 1.3 is a high value, but acceptable.
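The Leela-Ratio figure above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the ratio is normalised so that the speeds reported for the AlphaZero vs. Stockfish 8 match (about 80,000 n/s against about 70,000,000 n/s) give exactly 1.0. Those two reference speeds and the function name are assumptions, not taken from this page, and any correction for other net sizes is left out.

```python
# Assumed reference speeds from the AlphaZero vs. Stockfish 8 match
# (not stated on this page; used only to normalise the ratio to 1.0).
ALPHAZERO_NPS = 80_000
STOCKFISH8_NPS = 70_000_000


def leela_ratio(lc0_nps: float, stockfish_nps: float) -> float:
    """Speed of Lc0 relative to Stockfish, scaled so AlphaZero's match equals 1.0."""
    reference = ALPHAZERO_NPS / STOCKFISH8_NPS
    return (lc0_nps / stockfish_nps) / reference


# Values from the playing conditions: 11500 n/s for Lc0, 7.5 MN/s for Stockfish.
ratio = leela_ratio(11_500, 7_500_000)
print(f"Leela-Ratio: {ratio:.1f}")
```

With the speeds listed above this comes out at roughly 1.3, matching the value quoted in the playing conditions.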

Hash: 512 MByte for AB-engines; NNCache size of 500,000 for Leela

GUI: LittleBlitzerGUI, draw at 160 moves, resign at a -700 cp evaluation.

Tablebases: none

Openings: 250 HERT openings. Download them here

Large Memory Pages: Off

Ponder: Off

Thinking time: 50'' + 500ms (average game-duration: 3 minutes). The thinking time is not as short as it seems: note that the AB-engines are running on 5.5 cores, so they calculate around 5x more nodes than on a single core. That corresponds to around 4'+2.5'' thinking time on a single core, which is still longer than the 3'+1'' (singlecore) used for my Stockfish testing on the main site!
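The single-core equivalent quoted above is a simple scaling of base time and increment. A minimal sketch, assuming the speedup from 11 threads is a flat factor of 5 (the "around 5x" from the text; real multi-threaded scaling is not perfectly linear). The helper names are made up for illustration.

```python
def scale_time_control(base_s: float, inc_s: float, speedup: float):
    """Scale base time and increment (both in seconds) by a node-speed factor."""
    return base_s * speedup, inc_s * speedup


def fmt(base_s: float, inc_s: float) -> str:
    """Format as minutes'seconds'' + increment'', like the notation on this page."""
    minutes, seconds = divmod(base_s, 60)
    return f"{int(minutes)}'{int(seconds):02d}'' + {inc_s:g}''"


# 50'' + 500ms on 5.5 cores, assumed to be about 5x a single core.
base, inc = scale_time_control(50, 0.5, 5)
print(fmt(base, inc))  # 4'10'' + 2.5''
```

The result, 4'10'' + 2.5'', is the "around 4'+2.5''" given in the text.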

 

LC Zero Github (information, download Networks and LC Zero Engine): here

Read more about, how LC Zero works, in the LC0-Blog: here

 

Lc0 (or another NN-engine) plays a gauntlet (3000 games) vs. these 6 AB-engines: Stockfish 190622, Houdini 6, Komodo 12.3, Fire 7.1, Ethereal 11.53, Xiphos 0.5.3. All AB-engines run with 11 threads (=5.5 of 6 CPU-cores); 1 thread is left for Windows.
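The game count of the gauntlet can be checked against the opening set. 3000 games against 6 opponents is 500 games per opponent, which fits 250 openings played once with each colour. That pairing scheme is an assumption; this page only gives the totals. A short sketch that builds such a schedule:

```python
OPPONENTS = [
    "Stockfish 190622", "Houdini 6", "Komodo 12.3",
    "Fire 7.1", "Ethereal 11.53", "Xiphos 0.5.3",
]
N_OPENINGS = 250  # size of the HERT opening set

# Assumption: every opening is played twice per opponent, once with each colour.
schedule = [
    (opponent, opening, nn_colour)
    for opponent in OPPONENTS
    for opening in range(1, N_OPENINGS + 1)
    for nn_colour in ("white", "black")
]

games_per_opponent = len(schedule) // len(OPPONENTS)
print(len(schedule), "games in total,", games_per_opponent, "per opponent")
```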

 

Latest update: 2019/07/19: Testrun finished: Lc0 0.21.2 N:42741. +12 Elo compared to Net T40.T8.610 (TCEC Superfinal Season 15 Net).

See the individual statistics of engine-results here

See the ORDO-rating of the archive-gamebase with all NN-testruns here

Download all played NN-games here

 

     Program                    Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 190622 bmi2    : 3529    7    7  6000    71.5 %   3357   44.8 %
   2 Stockfish 10 181129      : 3508    7    7  6000    73.8 %   3320   44.8 %
   3 Lc0 0.21.2 42741         : 3503    9    9  3000    67.5 %   3366   46.7 %
   4 Lc0 0.21.2 T40.T8.610    : 3491    9    9  3000    66.1 %   3366   46.0 %
   5 Houdini 6 pext           : 3430    5    5  8000    56.8 %   3378   53.5 %
   6 Komodo 13.01 bmi2        : 3404    5    5  8000    53.1 %   3381   51.6 %
   7 Fire 7.1 popc            : 3285    5    5  8000    35.8 %   3396   50.4 %
   8 Xiphos 0.5.3 bmi2        : 3275    5    5  8000    34.4 %   3397   49.2 %
   9 Ethereal 11.53 pext      : 3273    6    6  6000    33.7 %   3400   48.8 %
  10 Ethereal 11.25 pext      : 3252    7    7  6000    30.7 %   3405   46.7 %

 

Net T40.T8.610 was used in TCEC Superfinal Season 15

 

This rating-list was built from the gamebase of my Stockfish testing on the main site and the games Lc0 plays here in its testruns. Note that the conditions of both testings are not exactly the same:

Stockfish-testing: 3'+1'' singlecore, HERT openings

Lc0-testing: 50''+500ms RTX 2060 / Hexacore (means around 4'+2.5'' on singlecore for the AB-engines), HERT Openings.

On the other hand, note that Lc0 and classical AB-engines cannot be tested under the same conditions anyway, because Lc0 runs on the GPU and works in a completely different way than AB-engines. We have the Leela-Ratio for comparison, but even a value of 1.0 does not mean exactly fair or identical conditions. So I believe it is possible to merge both testings into one rating-list...

 

I have always believed that it is better to play more games with faster time controls than fewer games with longer time controls, because a higher number of games makes the error bar smaller and the results more valid. But Lc0 runs much slower than the classical AB-engines (like Stockfish). Will that lead to distorted results when Lc0 has to play with a very short thinking time? To answer that question, we can compare the two testruns of Lc0 with Net T40.T8.610:

I did one testrun with 50''+500ms (my new test setting, marked "short") and one with 150''+1500ms (my old test setting, marked "long"), and did an ORDO calculation of both in one rating-list. Here is the result (you can see that Lc0 does not benefit from the 3x longer thinking time: the two ratings differ by only 4 Elo, well within the error bars):

 

     Program                      Elo    +    -   Games   Score   Av.Op.  Draws

   1 BrainFish-2 190531 bmi2    : 3577    9    9  5000    79.7 %   3328   38.1 %
   2 Stockfish 190622 bmi2      : 3531    7    7  5500    73.6 %   3343   43.1 %
   3 Stockfish 190504 bmi2      : 3527    8    8  5100    74.7 %   3329   44.1 %
   4 Stockfish 10 181129        : 3508    6    6 12000    77.2 %   3284   39.5 %
   5 Lc0 0.21.2 T40.T8.610      : 3490    9    9  3000    66.1 %   3365   46.0 % (short)
   6 Lc0 0.21.1 T40.T8.610      : 3486   20   20   700    68.6 %   3337   49.9 % (long)
   7 Stockfish 9 180201         : 3461    8    8  5000    74.9 %   3258   41.7 %
   8 Houdini 6 pext             : 3431    4    4 18600    61.8 %   3337   49.5 %
   9 Komodo 13.01 bmi2          : 3402    6    6  9500    52.3 %   3386   52.3 %
  10 Komodo 12.3 bmi2           : 3395    6    6  9100    59.4 %   3323   50.3 %
  11 Komodo 13.01 MCTS          : 3298    7    7  6000    42.2 %   3358   54.4 %
  12 Fire 7.1 popc              : 3281    4    4 18600    42.3 %   3345   50.3 %
  13 Xiphos 0.5.3 bmi2          : 3272    5    5 12600    34.1 %   3399   48.3 %
  14 Ethereal 11.53 pext        : 3270    7    7  5500    34.6 %   3389   50.0 %
  15 Komodo 12.3 MCTS           : 3261    6    6  8000    43.7 %   3312   47.1 %
  16 Ethereal 11.25 pext        : 3253    5    5 12100    33.0 %   3391   46.2 %
  17 Laser 1.7 bmi2             : 3202    7    7  6100    30.5 %   3357   45.5 %
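To illustrate why more games give a smaller error bar, and why the 4 Elo gap between the short and long Lc0 runs is not meaningful, here is a rough sketch. It estimates a 95% margin from score, draw rate and number of games with a plain normal approximation, and treats the two runs as independent. Both are simplifying assumptions; this is not how ORDO derives the +/- columns, so small differences from the table are expected.

```python
import math


def elo_margin(score: float, draw_rate: float, games: int, z: float = 1.96) -> float:
    """Approximate 95% Elo margin from per-game score variance (normal approximation)."""
    win_rate = score - draw_rate / 2
    # Per-game variance of the result (win=1, draw=0.5, loss=0) around the mean score.
    variance = win_rate + 0.25 * draw_rate - score ** 2
    std_err = math.sqrt(variance / games)
    # Slope of the Elo curve at this score converts score error into Elo error.
    slope = 400 / (math.log(10) * score * (1 - score))
    return z * std_err * slope


# Rows 5 and 6 of the table above (Lc0 with Net T40.T8.610).
short = {"elo": 3490, "score": 0.661, "draws": 0.460, "games": 3000}
long_ = {"elo": 3486, "score": 0.686, "draws": 0.499, "games": 700}

m_short = elo_margin(short["score"], short["draws"], short["games"])
m_long = elo_margin(long_["score"], long_["draws"], long_["games"])

diff = short["elo"] - long_["elo"]
combined = math.hypot(m_short, m_long)  # assumes independent runs
within_noise = abs(diff) < combined

print(f"short: +/-{m_short:.0f} Elo, long: +/-{m_long:.0f} Elo")
print(f"difference {diff} Elo, combined margin {combined:.0f} Elo, within noise: {within_noise}")
```

Under these assumptions the margins come out at about 9 Elo for 3000 games and about 18 Elo for 700 games, close to the 9 and 20 shown in the table, and the 4 Elo difference lies well inside the combined margin.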