Stefan Pohl Computer Chess

Home of the famous UHO openings and the EAS Ratinglist


Lc0 or other GPU-Neural Nets versus Stockfish 15.1 testing

 

The evaluation of the UHO 2024 openings has started. NN testing had to be suspended because the PC is needed for the evaluation. Estimated time needed: around 75-80 days from today (2023/11/14), provided everything runs without crashes or other problems.

 

Playing conditions:

 

Hardware: Ryzen 7 6800H 2.6GHz Notebook, RTX 3060 GPU, Windows 11 64bit, 32GB RAM

Cuda version installed: Cuda 11.7

Speed: Stockfish 15.1 plays with 14 threads (= 7 cores) and reaches 10 MN/s in the middlegame. The Lc0 minibatchsize parameter is set to the best value for each net size, as determined by Lc0's benchmark (backendbench --clippy).

Hash: 2 GB Hash for Stockfish 15.1 / 8192 RamLimitMb for Lc0

GUI: Cutechess-cli (the GUI ends the game when a 5-piece endgame is on the board)

Tablebases: None for engines, 5-piece Syzygy for cutechess-cli

Openings: UHO_2022_6mvs_+120_+129.pgn. Download my UHO 2022 openings here

Ponder, Large Memory Pages & learning: Off

Thinking time: 2min+2sec for Lc0 and 1min+1sec for Stockfish 15.1. I measured nps on my system and compared the values with those of the TCEC: my CPU is far too fast relative to Lc0 running on my RTX 3060 GPU, so it makes sense to give Stockfish only 50% of Lc0's thinking time. This compensates for the fast CPU and for the fact that in the TCEC Lc0 benefits from fast hardware and long thinking time (both are better for Lc0, not for Stockfish).

One test run takes nearly 5 days. Average game duration: 6min 45sec
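The run length follows from the average game duration. A small sketch of the arithmetic, assuming the 1000 games of one test run (the number stated on this page) are played one after another; the concurrency setting is not stated, so this is an assumption:

```python
from datetime import timedelta

# Figures taken from this page.
GAMES_PER_RUN = 1000
AVG_GAME_DURATION = timedelta(minutes=6, seconds=45)

# Assumption: games are played sequentially (one game at a time).
run_duration = GAMES_PER_RUN * AVG_GAME_DURATION
run_days = run_duration / timedelta(days=1)

print(f"One test run: {run_duration} (about {run_days:.2f} days)")
```

The result is a little under 5 days, which agrees with the figure given above.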

 

Each Lc0 / Neural Net plays 1000 games vs. Stockfish 15.1 with my UHO 2022 openings
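The playing conditions above could be expressed as a cutechess-cli call roughly as follows. This is only a sketch, not the command actually used for these tests: all paths, the net file name, the minibatch value and the output file name are hypothetical placeholders, and the pairing setup (2 games per opening with reversed colors, 500 rounds) is inferred from the 1000 games and the game pairs reported on this page.

```python
# Hypothetical placeholders: adjust to your own installation.
LC0_CMD = "engines/lc0.exe"
STOCKFISH_CMD = "engines/stockfish_15.1_avx2.exe"
NET_FILE = "nets/example_net.pb.gz"
SYZYGY_PATH = "tablebases/syzygy5"
PGN_OUT = "lc0_vs_sf151.pgn"

# Opening file name as given on this page.
OPENINGS = "UHO_2022_6mvs_+120_+129.pgn"


def build_command(minibatch_size: int) -> list[str]:
    """Assemble a cutechess-cli argument list for one Lc0 net vs. Stockfish."""
    return [
        "cutechess-cli",
        # Lc0: 2min+2sec, 8192 RamLimitMb, minibatch size chosen per net.
        "-engine", "name=Lc0", f"cmd={LC0_CMD}", "tc=120+2",
        f"option.WeightsFile={NET_FILE}",
        "option.RamLimitMb=8192",
        f"option.MinibatchSize={minibatch_size}",
        # Stockfish 15.1: 1min+1sec, 14 threads, 2 GB hash.
        "-engine", "name=Stockfish 15.1", f"cmd={STOCKFISH_CMD}", "tc=60+1",
        "option.Threads=14",
        "option.Hash=2048",
        "-each", "proto=uci",
        "-openings", f"file={OPENINGS}", "format=pgn",
        # Assumption: each opening is played twice with colors reversed.
        "-games", "2", "-rounds", "500", "-repeat",
        # 5-piece Syzygy adjudication by cutechess-cli, none for the engines.
        "-tb", SYZYGY_PATH, "-tbpieces", "5",
        "-pgnout", PGN_OUT,
    ]


command = build_command(minibatch_size=256)  # 256 is an arbitrary example value
```

The list can be passed to subprocess.run(command) as is; keeping it as a list avoids quoting problems with the space in the engine name.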

 

Learn more about Lc0 (getting started in a GUI, links to net downloads, FAQs, development information and the Leela-Blog) here

 

Latest update: 2023/11/04: Lc0 0.31dev BT3-2860000 (small regression compared to the TCEC 25 SuFi net; BT3-2860000 is the successor of that net)

 

Download all played games (games of the old test-setups, too): here

     Program                              Elo    +    -  Games    Score   Av.Op. Draws

   1 Stockfish 15.1 avx2                :    0    4    4 16000    58.6%    -61   49.4%
   2 Lc0 0.31dev TCEC 25 SuFi           :  -21   15   15  1000    47.0%      0   49.8%
   3 Lc0 0.31dev TCEC 25                :  -22   15   15  1000    46.9%      0   52.3%
   4 Lc0 0.31dev BT3-2860 (15x768)      :  -35   14   14  1000    45.0%      0   50.6%
   5 Lc0 0.30dev T1-4000 (15x768)       :  -39   15   15  1000    44.5%      0   49.8%
   6 Lc0 0.30dev 811107 (19x512)        :  -41   15   15  1000    44.1%      0   46.1%
   7 Lc0 0.30dev TCEC 24                :  -42   15   15  1000    44.1%      0   51.0%
   8 Lc0 0.30rc1 T1-4000 (15x768)       :  -44   16   16  1000    43.7%      0   49.8%
   9 Lc0 0.30dev T1-30875 (15x768)      :  -45   14   14  1000    43.5%      0   47.5%
  10 Lc0 0.30dev BT2-4510 (15x768)      :  -45   16   16  1000    43.5%      0   47.5%
  11 Lc0 0.30.0 815863 (15x768)         :  -73   15   15  1000    39.8%      0   47.8%
  12 Lc0 0.30rc2 814174 (15x768)        :  -80   16   16  1000    38.8%      0   51.0%
  13 Lc0 0.30dev 813207 (15x768)        :  -84   16   16  1000    38.3%      0   49.6%
  14 Lc0 0.30dev TCEC 20                :  -90   16   16  1000    37.5%      0   50.5%
  15 Lc0 0.30dev T1-2432500 (10x256)    :  -94   15   15  1000    36.9%      0   47.2%
  16 Lc0 0.30dev TCEC 22                :  -95   15   15  1000    36.8%      0   49.4%
  17 Lc0 0.30dev TCEC 18                : -133   15   15  1000    31.9%      0   50.5%


Games        : 16000 (finished)

White Wins   : 8032 (50.2 %)
Black Wins   : 64 (0.4 %)
Draws        : 7904 (49.4 %)
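To relate the Score and Elo columns of the list above: Stockfish is fixed at 0 and every Lc0 entry played only against Stockfish, so the standard logistic Elo formula gives values close to the listed ones. This is a sketch of that standard formula, not the rating tool used for the list (which is not named here), so small differences to the table are possible.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic formula."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score values taken from the list above.
print(round(elo_from_score(0.470)))  # about -21, as listed for TCEC 25 SuFi
print(round(elo_from_score(0.450)))  # about -35, as listed for BT3-2860
```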

 

Below is the gamebase, recalculated with my Gamepairs Rescorer Batch-Tool. It implements the idea of Vondele (Stockfish maintainer): "Thinking uniquely in game pairs makes sense with the biased openings used these days. While pentanomial makes sense it is a bit complicated so we could simplify and score game pairs only (not games) as W-L-D (a traditional score of 2-0, or 1.5-0.5 is just a W)."

   # PLAYER                             :  RATING  ERROR  PLAYED     W     D    L   (%)  CFS(%)
   1 Stockfish 15.1 avx2                :       0   ----    8000  3499  3748  753  67.2     100
   2 Lc0 0.31dev TCEC 25 SuFi           :     -44     22     500    78   282  140  43.8      52
   3 Lc0 0.31dev TCEC 25                :     -44     20     500    85   267  148  43.7      96
   4 Lc0 0.31dev BT3-2860 (15x768)      :     -70     21     500    72   257  171  40.1      72
   5 Lc0 0.30dev T1-4000 (15x768)       :     -79     21     500    62   265  173  38.9      59
   6 Lc0 0.30dev 811107 (19x512)        :     -83     22     500    53   278  169  38.4      61
   7 Lc0 0.30dev TCEC 24                :     -87     22     500    56   266  178  37.8      57
   8 Lc0 0.30rc1 T1-4000 (15x768)       :     -90     23     500    62   250  188  37.4      56
   9 Lc0 0.30dev T1-30875 (15x768)      :     -93     23     500    60   251  189  37.1      54
  10 Lc0 0.30dev BT2-4510 (15x768)      :     -94     22     500    60   249  191  36.9     100
  11 Lc0 0.30.0 815863 (15x768)         :    -151     25     500    34   229  237  29.7      83
  12 Lc0 0.30rc2 814174 (15x768)        :    -168     25     500    28   221  251  27.7      67
  13 Lc0 0.30dev 813207 (15x768)        :    -176     25     500    21   226  253  26.8      78
  14 Lc0 0.30dev TCEC 20                :    -190     25     500    25   203  272  25.3      77
  15 Lc0 0.30dev T1-2432500 (10x256)    :    -202     24     500    20   200  280  24.0      59
  16 Lc0 0.30dev TCEC 22                :    -206     26     500    25   186  289  23.6     100
  17 Lc0 0.30dev TCEC 18                :    -315     31     500    12   118  370  14.2     ---

------------------------------------------------------------------- 
--- Number of all Gamepairs          : 8000 
--- Number of drawn Gamepairs overall: 3748 (= 46.85%) 
--- Number of 1:1 drawn Gamepairs    : 1907 (= 23.84%) 
--- Number of 2-draws drawn Gamepairs: 1841 (= 23.01%) 
------------------------------------------------------------------- 

You can download my Gamepairs Rescorer Tool right here
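The game pair scoring itself is simple. The following is a minimal sketch of the idea quoted above, not the code of the Gamepairs Rescorer Tool; the function names and the input format (two game results per pair, from one engine's point of view) are assumptions made for illustration.

```python
from collections import Counter


def score_pair(first: float, second: float) -> str:
    """Score one game pair as W, D or L.

    first/second are the results of the two games of the pair from one
    engine's point of view: 1 = win, 0.5 = draw, 0 = loss.
    """
    total = first + second
    if total > 1.0:      # 2-0 or 1.5-0.5
        return "W"
    if total < 1.0:      # 0-2 or 0.5-1.5
        return "L"
    return "D"           # 1-1 (one win each) or two draws


def pair_percentage(wins: int, draws: int, losses: int) -> float:
    """Score percentage over game pairs: a drawn pair counts half."""
    return 100.0 * (wins + 0.5 * draws) / (wins + draws + losses)


def rescore(pairs):
    """Count W/D/L over an iterable of (first, second) game results."""
    counts = Counter(score_pair(a, b) for a, b in pairs)
    return counts["W"], counts["D"], counts["L"]


# W/D/L values of the Lc0 0.31dev TCEC 25 SuFi row above: gives 43.8
print(round(pair_percentage(78, 282, 140), 1))
```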

 

Please note that this is not a rating list, but only a performance test of Lc0 with different NNs versus Stockfish. For a real rating list including Lc0 running on an RTX GPU (with a valid Leela-Ratio of 1.0), please visit Andreas Strangmueller's excellent website. Just click here