Stefan Pohl Computer Chess

private website for chessengine-tests


Lc0 (and other NN-engines) testing

 

Playing conditions:

 

Hardware: i7-8750H (Hexacore) Notebook with RTX 2060 GPU, Windows 10 64bit, 16GB RAM

CPU-Speed: capped at 97% (to switch off Intel Turbo Boost). Stockfish reaches 7.5 MN/s in the starting position, running on 11 threads.

GPU (used by LC Zero): Nvidia RTX 2060 (6GB). LC Zero calculates around 11500 n/s in the starting position (measured with "go infinite"; I used the MSI-Afterburner-tool to reduce the speed of the RTX-Card as far as possible) with Netsize 20x256, which means a Leela-Ratio (what is Leela Ratio? look here) of 1.3. The Leela-Ratio of AlphaZero (which also used a 20x256 net) in the match vs. Stockfish 8 was 1.0, so 1.3 is a high value, but acceptable.
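
As a rough check of the 1.3 figure, the Leela-Ratio can be recomputed from the two node speeds quoted above. This is only a minimal sketch: it assumes the commonly cited definition (Lc0 nodes per second times 875, divided by Stockfish nodes per second, for a 20x256 net). The constant 875 and the function name are assumptions that do not appear on this page, so the linked explanation remains the authoritative definition.

```python
# ASSUMPTION (not stated on this page): the factor 875 is the node-speed
# ratio usually quoted for the AlphaZero vs. Stockfish 8 match
# (about 70 MN/s for Stockfish versus about 80 kN/s for AlphaZero).
ASSUMED_AZ_FACTOR = 875


def leela_ratio(lc0_nps: float, stockfish_nps: float) -> float:
    """Return the Leela-Ratio for a 20x256 net under the assumed definition."""
    return lc0_nps * ASSUMED_AZ_FACTOR / stockfish_nps


# Speeds from the playing conditions: 11500 n/s (Lc0) and 7.5 MN/s (Stockfish).
ratio = leela_ratio(11_500, 7_500_000)
print(round(ratio, 1))
```

Under this assumed definition, a ratio of 1.0 corresponds to the AlphaZero match conditions and larger values favour the NN-engine.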

Hash: 1 GB for AB-engines; NNCache size of 1,000,000 for Leela

GUI: LittleBlitzerGUI, draw at 160 moves, resign at a -700 cp evaluation.

Tablebases: none

Openings: 4 SuperGM-moves. Download my 4-moves opening-sets and books here

Large Memory Pages: Off

Ponder: Off

Thinking time: 150'' (2.5 minutes) +1500ms (average game-duration: 8-9 minutes)
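
For orientation, the time control also bounds how long a single game can last. The sketch below is a back-of-the-envelope calculation, not something the GUI reports: it assumes the increment is credited once per move to each side, that the 160-move draw limit counts full moves, and that GUI overhead is ignored. The constant and function names are illustrative.

```python
BASE_SECONDS = 150.0      # 2.5 minutes per side
INCREMENT_SECONDS = 1.5   # 1500 ms added per move
DRAW_MOVE_LIMIT = 160     # assumed to count full moves (GUI draw adjudication)


def max_game_seconds(moves_per_side: int) -> float:
    """Upper bound on game length if both sides use up all their clock time."""
    per_side = BASE_SECONDS + INCREMENT_SECONDS * moves_per_side
    return 2 * per_side


longest = max_game_seconds(DRAW_MOVE_LIMIT)
print(f"{longest / 60:.1f} minutes")
```

Under these assumptions no game can run longer than 13 minutes, which is consistent with the stated average of 8-9 minutes.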

 

LC Zero Github (information, download Networks and LC Zero Engine): here

Read more about how LC Zero works in the LC0-Blog: here

 

Lc0 (or another NN-engine) plays a gauntlet (at least 700 games) vs. these 7 AB-engines: Stockfish 190504, Houdini 6, Komodo 12.3, Fire 7.1, Ethereal 11.25, Xiphos 0.5.3, Laser 1.7. All AB-engines run with 11 threads (= 5.5 of 6 CPU-cores); 1 thread is left for Windows.

 

Latest update: 2019/06/17 - download all played Lc0/NN-engine games (11522) here

Lc0 0.21.2 BT40(40x256)103: a regression of 37 Elo compared to the older net BT40(40x256)24.

 

See the individual statistics of engine-results here

See the ORDO-rating of the archive-gamebase with all NN-testruns here

 

     Program                         Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 190504 bmi2         : 3523    7    7  5724    72.9 %   3342   46.0 %
   2 Lc0 0.21.1 N:42100            : 3523   20   20   700    73.2 %   3334   43.3 %
   3 Stockfish 10 181129           : 3508    7    7  8200    73.8 %   3318   43.8 %
   4 Lc0 0.21.1 N:42488            : 3507   19   19   798    70.6 %   3341   48.5 %
   5 Lc0 0.21.1 N:JH.T8.610        : 3487   19   19   700    68.6 %   3340   49.9 %
   6 Lc0 0.21.1 N:42392            : 3486   20   20   700    68.4 %   3341   48.1 %
   7 Lc0 0.21.1 N:32930            : 3455   19   19   700    65.6 %   3334   49.6 %
   8 Allie 0.4 N:42392             : 3430   19   19   700    61.6 %   3341   55.7 %
   9 Houdini 6 pext                : 3427    6    6  9924    59.4 %   3355   52.9 %
  10 Lc0 0.21.1 N:11260            : 3421   19   19   700    60.7 %   3340   59.4 %
  11 Komodo 13.01 bmi2             : 3406    7    7  5524    56.3 %   3359   53.4 %
  12 Komodo 12.3 bmi2              : 3396    6    6  8400    56.3 %   3348   51.4 %
  13 Lc0 0.21.1 BT40(40x256)24     : 3386   18   18   742    56.1 %   3341   51.5 %
  14 Lc0 0.21.2 BT40(40x256)103    : 3349   18   18   728    51.2 %   3341   58.4 %
  15 Fire 7.1 popc                 : 3286    6    6  9924    39.3 %   3370   52.5 %
  16 Xiphos 0.5.3 bmi2             : 3275    6    6  7724    33.3 %   3404   49.9 %
  17 Ethereal 11.25 pext           : 3258    6    6  9924    35.4 %   3373   49.4 %
  18 Xiphos 0.5 bmi2               : 3247    8    8  5200    33.0 %   3380   46.8 %
  19 Laser 1.7 bmi2                : 3213    7    7  5924    28.3 %   3387   44.3 %
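
The Score and Av.Op. columns can be cross-checked against the Elo column with the standard logistic Elo formula. This is a minimal sketch, assuming the rating tool uses the usual 400-point logistic scale; the function name is illustrative. Rows will generally deviate somewhat from the formula, because the list is computed from the individual pairings and not from the average opponent rating.

```python
def expected_score(elo: float, opponent_elo: float) -> float:
    """Expected score (0..1) under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_elo - elo) / 400.0))


# Row 14 of the list: Lc0 0.21.2 BT40(40x256)103, Elo 3349, Av.Op. 3341.
predicted = expected_score(3349, 3341)
print(f"{predicted * 100:.1f} %")
```

For this row the formula gives 51.2 %, matching the Score column.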

 

This rating-list was built from the gamebase of my Stockfish-testing on the main site and the games Lc0 plays here in its testruns. Note that the conditions of the two testings are not exactly the same:

Stockfish-testing: 3'+1'' singlecore, HERT openings

Lc0-testing: 2.5'+1.5'' RTX 2060 / Hexacore, 4 GMmoves openings

On the other hand, note that Lc0 and classical AB-engines cannot be tested under the same conditions anyway, because Lc0 runs on the GPU and works in a completely different way than AB-engines. We have the Leela-Ratio for comparison, but even a value of 1.0 does not mean exactly fair or equal conditions. So I believe it is possible to merge both testings into one rating-list.