Stefan Pohl Computer Chess

Private website for chess engine tests


Lc0 (and other NN-engines) testing

 

Playing conditions:

 

Hardware: i7-8750H (hexacore) notebook with RTX 2060 GPU, Windows 10 64-bit, 16 GB RAM

CPU speed: CPU capped at 97% (to switch off Intel Turbo Boost). Stockfish calculates 7.5 MN/s in the starting position, running on 11 threads.

GPU (used by LC Zero): Nvidia RTX 2060 (6 GB). I used the MSI Afterburner tool to reduce the speed of the RTX card as far as possible. LC Zero calculates around 11500 n/s in the starting position (measured with "go infinite") with Net 32930 (net size 20x256), which means a Leela Ratio (what is the Leela Ratio? look here) of 1.3. The Leela Ratio of AlphaZero (which also used a 20x256 net) in its match vs. Stockfish 8 was 1.0, so 1.3 is a high value, but acceptable.
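A minimal sketch of how the 1.3 figure can be reproduced. It assumes the Leela Ratio is the Lc0/Stockfish nodes-per-second ratio normalised so that AlphaZero's match ratio equals 1.0, using the 80,000 vs. 70 million positions per second quoted in the 2017 AlphaZero preprint. The exact definition is on the linked page; this normalisation is an assumption that happens to reproduce the numbers above, and `leela_ratio` is a hypothetical helper.

```python
# Assumed reference: AlphaZero vs. Stockfish 8 in the 2017 match
# (80,000 vs. 70,000,000 positions per second, per the AlphaZero preprint).
ALPHAZERO_NPS = 80_000
STOCKFISH8_NPS = 70_000_000


def leela_ratio(lc0_nps: float, stockfish_nps: float) -> float:
    """Lc0/Stockfish speed ratio, scaled so AlphaZero's match ratio is 1.0.

    The normalisation is an assumption made for this sketch.
    """
    reference = ALPHAZERO_NPS / STOCKFISH8_NPS
    return (lc0_nps / stockfish_nps) / reference


# Values from the playing conditions: 11,500 n/s (Lc0, net 32930) and
# 7.5 MN/s (Stockfish, 11 threads), both measured in the starting position.
ratio = leela_ratio(11_500, 7_500_000)
print(f"Leela Ratio: {ratio:.1f}")  # Leela Ratio: 1.3
```

By construction, AlphaZero's own speeds give exactly 1.0, so values above 1.0 mean the NN engine gets relatively more nodes than AlphaZero did.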

Hash: 512 MB for the AB engines; NNCache size of 500,000 for Leela

GUI: Since 19/09/11: cutechess-cli (the GUI ends the game when a 5-piece endgame is on the board). Before: LittleBlitzerGUI (draw at 170 moves, resign at -700 cp)

Tablebases: none for the engines; 5-piece Syzygy for cutechess-cli
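The tablebase adjudication rule can be sketched as follows. This is an illustration of the rule, not cutechess-cli's code: `piece_count` and `adjudicate` are hypothetical helpers, and `probe` stands in for a real Syzygy WDL probe (here a stub that is only correct for the drawn example position).

```python
from typing import Callable, Optional

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
KN_VS_K_FEN = "8/8/8/4k3/8/8/8/3NK3 w - - 0 1"  # king+knight vs. king: a dead draw


def piece_count(fen: str) -> int:
    """Number of pieces, kings included, in the board field of a FEN string."""
    board_field = fen.split()[0]
    return sum(1 for ch in board_field if ch.isalpha())


def adjudicate(fen: str, probe: Callable[[str], str],
               max_pieces: int = 5) -> Optional[str]:
    """Return a result once the tablebases cover the position, else None.

    `probe` is a hypothetical stand-in for a Syzygy probe that returns
    "1-0", "0-1" or "1/2-1/2" for the given position.
    """
    if piece_count(fen) <= max_pieces:
        return probe(fen)
    return None  # too many pieces for 5-piece tablebases: keep playing


def stub_probe(fen: str) -> str:
    # Stub: always reports a draw, which is only correct for KN_VS_K_FEN.
    return "1/2-1/2"


print(adjudicate(START_FEN, stub_probe))    # None
print(adjudicate(KN_VS_K_FEN, stub_probe))  # 1/2-1/2
```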

Openings: 250 HERT openings. Download them here

Large Memory Pages: Off

Ponder: Off

Thinking time: 50'' + 500 ms (average game duration: 3 minutes). The thinking time is not as short as it seems: note that the AB engines run on 5.5 cores, so around 5x more nodes are calculated. That corresponds to around 4'+2.5'' of thinking time on a single thread, which is still longer than the 3'+1'' (single thread) used for my Stockfish testing on the main site!
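The single-thread equivalent above is simple scaling by the multi-thread speedup. A small sketch, assuming the ~5x speedup stated above (real SMP scaling is not exactly linear); `singlethread_equivalent` and `fmt` are hypothetical helpers.

```python
from typing import Tuple


def singlethread_equivalent(base_s: float, inc_s: float,
                            speedup: float) -> Tuple[float, float]:
    """Scale base time and increment by an assumed multi-thread speedup."""
    return base_s * speedup, inc_s * speedup


def fmt(base_s: float, inc_s: float) -> str:
    """Format a time control as minutes'seconds''+increment''."""
    minutes, seconds = divmod(round(base_s), 60)
    return f"{minutes}'{seconds:02d}''+{inc_s:g}''"


# 50''+500 ms with an assumed ~5x speedup from 11 threads on 5.5 cores
base, inc = singlethread_equivalent(50, 0.5, 5.0)
print(fmt(base, inc))                  # 4'10''+2.5''
print(base > 3 * 60 and inc > 1.0)     # True: longer than 3'+1''
```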

 

LC Zero GitHub (information, downloads of networks and the LC Zero engine): here

Read more about how LC Zero works in the Lc0 blog: here

 

Lc0 (or another NN engine) plays a gauntlet (3000 games) vs. these 6 AB engines: Stockfish 190622, Houdini 6, Komodo 13.1, Fire 7.1, Ethereal 11.53, Xiphos 0.5.3. All AB engines run with 11 threads (= 5.5 of the 6 CPU cores); 1 thread is left for Windows.
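The 3000 games fit the 250 HERT openings exactly if, as assumed here, each opening is played once with each colour against every opponent: 250 × 2 × 6 = 3000. A minimal sketch of such a schedule; `gauntlet_schedule` is a hypothetical helper, not the tournament manager's logic.

```python
from itertools import product

OPPONENTS = ["Stockfish 190622", "Houdini 6", "Komodo 13.1",
             "Fire 7.1", "Ethereal 11.53", "Xiphos 0.5.3"]
N_OPENINGS = 250


def gauntlet_schedule(test_engine, opponents, n_openings):
    """Yield (white, black, opening_index) pairings.

    Every opening is played twice against every opponent, once with
    each colour, so colour and opening effects cancel out per pairing.
    """
    for opponent, opening in product(opponents, range(n_openings)):
        yield test_engine, opponent, opening
        yield opponent, test_engine, opening


games = list(gauntlet_schedule("Lc0", OPPONENTS, N_OPENINGS))
print(len(games))  # 3000
```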

 

Latest update: 2019/11/13: Testrun finished: Lc0 0.22.0 with the DarkQueen 2.0 net (the first testrun of a net built only from human (online) games). Not very impressive...

Next testrun: Fat Fritz 1.0, the first commercial NN engine (apart from the Patreon-supported Leelenstein nets).

 

See the individual statistics of the engine results here

See the ORDO rating of the archive gamebase with all NN testruns here

Download all played NN-games here

 

     Program                         Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 190622 bmi2         : 3532    5    5 11500    66.8 %   3401   48.9 %
   2 Lc0 0.22.0 T40B.4-160         : 3525    9    9  3000    67.9 %   3386   47.0 %
   3 Lc0 0.21.3 42850              : 3513   10   10  3000    66.7 %   3385   46.0 %
   4 Lc0 0.21.2 T40.T8.610         : 3508    9    9  3000    66.1 %   3385   46.0 %
   5 Lc0 0.22.0 J13B.2-200         : 3487    9    9  3000    63.3 %   3386   47.7 %
   6 Allie 0.5 LS 11.1             : 3481    9    9  3000    62.5 %   3386   50.3 %
   7 Lc0 0.22.0 LD2                : 3480    9    9  3000    62.6 %   3385   46.9 %
   8 Lc0 0.21.4 32930              : 3459    9    9  3000    60.0 %   3385   50.2 %
   9 Scorpio 3.02 32930            : 3456    9    9  3000    59.3 %   3386   54.4 %
  10 Houdini 6 pext                : 3447    4    4 12500    54.7 %   3410   54.3 %
  11 Lc0 0.22.0 11260              : 3431    9    9  3000    56.3 %   3385   53.8 %
  12 Lc0 0.22.0 384x30-t40-1097    : 3425    9    9  3000    55.3 %   3386   48.2 %
  13 Komodo 13.1 bmi2              : 3425    6    6  8000    56.5 %   3375   50.8 %
  14 Komodo 13.01 bmi2             : 3416    6    6  7500    50.6 %   3412   52.3 %
  15 Lc0 0.22.0 60891              : 3378    9    9  3000    49.0 %   3386   48.3 %
  16 Scorpio 3 NN-Maddex           : 3365    9    9  3000    47.2 %   3386   50.6 %
  17 Fire 7.1 popc                 : 3317    4    4 12500    36.2 %   3420   47.3 %
  18 Xiphos 0.5.3 bmi2             : 3302    4    4 12500    34.2 %   3422   46.9 %
  19 Ethereal 11.53 pext           : 3295    4    4 12500    33.2 %   3422   46.1 %
  20 Lc0 0.22.0 DarkQueen 2.0      : 3171   10   10  3000    23.9 %   3386   33.7 %
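The Elo and Score columns are linked by the logistic Elo model. A sketch assuming the standard 400-point logistic curve; since ORDO fits all games at once and the opponents' ratings are spread around Av.Op., the numbers only roughly agree with the table. `expected_score` and `elo_diff_from_score` are hypothetical helpers.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score for a rating advantage of elo_diff (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: rating difference implied by a score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Lc0 0.22.0 T40B.4-160 from the table: Elo 3525, Av.Op. 3386, score 67.9 %
print(f"expected score: {expected_score(3525 - 3386):.3f}")           # 0.690
print(f"Elo diff implied by 67.9 %: {elo_diff_from_score(0.679):.0f}")  # 130
```

The gap between 139 (ORDO) and 130 (single-opponent formula) comes from treating all six opponents as one player rated at Av.Op.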

 

Net 42850 was the final net of the 40xxx training run

Net T40.T8.610 played in the TCEC Season 15 Superfinal

Net 32930 was the final net of the 30xxx training run

Net 11260 was the final net of the 10xxx training run

 

This rating list was built from the gamebase of my Stockfish testing on the main site plus the games Lc0 plays here in its testruns. Note that the conditions of the two testings are not exactly the same:

Stockfish testing: 3'+1'' single core, HERT openings

Lc0 testing: 50''+500 ms, RTX 2060 / hexacore (around 4'+2.5'' on a single core for the AB engines), HERT openings.

On the other hand, Lc0 and classical AB engines cannot be tested under the same conditions anyway, because Lc0 runs on the GPU and works in a completely different way than AB engines. We have the Leela Ratio for comparison, but even a value of 1.0 does not mean exactly fair or identical conditions. So I believe it is reasonable to merge both testings into one rating list.

 

I have always believed that it is better to play more games at faster time controls than fewer games at longer time controls, because a higher number of games makes the error bar smaller and the results more reliable. But Lc0 runs much slower than classical AB engines (like Stockfish). Will that lead to distorted results when Lc0 has to play with a very short thinking time? To answer that question, we can compare the two testruns of Lc0 with Net T40.T8.610:

I did one testrun with 50''+500 ms (my new test setting, "short") and one with 150''+1500 ms (my old test setting, "long"), and did an ORDO calculation of both in one rating list. Here is the result; you can see that Lc0 does not benefit from the 3x longer thinking time:

 

     Program                      Elo    +    -   Games   Score   Av.Op.  Draws

   1 BrainFish-2 190531 bmi2    : 3577    9    9  5000    79.7 %   3328   38.1 %
   2 Stockfish 190622 bmi2      : 3531    7    7  5500    73.6 %   3343   43.1 %
   3 Stockfish 190504 bmi2      : 3527    8    8  5100    74.7 %   3329   44.1 %
   4 Stockfish 10 181129        : 3508    6    6 12000    77.2 %   3284   39.5 %
   5 Lc0 0.21.2 T40.T8.610      : 3490    9    9  3000    66.1 %   3365   46.0 % (short)
   6 Lc0 0.21.1 T40.T8.610      : 3486   20   20   700    68.6 %   3337   49.9 % (long)
   7 Stockfish 9 180201         : 3461    8    8  5000    74.9 %   3258   41.7 %
   8 Houdini 6 pext             : 3431    4    4 18600    61.8 %   3337   49.5 %
   9 Komodo 13.01 bmi2          : 3402    6    6  9500    52.3 %   3386   52.3 %
  10 Komodo 12.3 bmi2           : 3395    6    6  9100    59.4 %   3323   50.3 %
  11 Komodo 13.01 MCTS          : 3298    7    7  6000    42.2 %   3358   54.4 %
  12 Fire 7.1 popc              : 3281    4    4 18600    42.3 %   3345   50.3 %
  13 Xiphos 0.5.3 bmi2          : 3272    5    5 12600    34.1 %   3399   48.3 %
  14 Ethereal 11.53 pext        : 3270    7    7  5500    34.6 %   3389   50.0 %
  15 Komodo 12.3 MCTS           : 3261    6    6  8000    43.7 %   3312   47.1 %
  16 Ethereal 11.25 pext        : 3253    5    5 12100    33.0 %   3391   46.2 %
  17 Laser 1.7 bmi2             : 3202    7    7  6100    30.5 %   3357   45.5 %
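A rough check of both claims above, using a normal approximation of the win/draw/loss result (an assumption; ORDO derives its error bars its own way, so the values differ slightly from the table). The 95% error bar shrinks with 1/sqrt(games), and the 4-Elo gap between the short and long runs lies far inside their combined uncertainty. `elo_error_95` is a hypothetical helper.

```python
import math


def elo_error_95(score: float, draw_rate: float, games: int) -> float:
    """Approximate 95% error bar (in Elo) of a match result.

    Uses the variance of one game's score (1, 0.5 or 0 points) and
    propagates it through the slope of the logistic Elo curve.
    """
    win = score - draw_rate / 2.0
    variance = win + draw_rate / 4.0 - score ** 2  # E[x^2] - E[x]^2
    std_error = math.sqrt(variance / games)
    # derivative of Elo(score) = -400 * log10(1/score - 1)
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_error * slope


# Lc0 0.21.x T40.T8.610 rows from the table above
short = elo_error_95(0.661, 0.460, 3000)  # 50''+500 ms run
long_ = elo_error_95(0.686, 0.499, 700)   # 150''+1500 ms run
combined = math.hypot(short, long_)       # error bar of the difference
print(f"short: +/-{short:.0f}, long: +/-{long_:.0f}")          # short: +/-9, long: +/-18
print(f"3490 - 3486 = 4 Elo vs. combined +/-{combined:.0f}")   # combined +/-20
```

Quadrupling the number of games halves the error bar, which is why the 3000-game run is much more precise than the 700-game run.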