Using LC0's WDL to Analyze Games

Comments on lichess.org/@/jk_182/blog/using-lc0s-wdl-to-analyze-games/spKmwjw5

White had more piece activity and was dominating the chessboard.
Domination: White 41.10% vs Black 6.85%
Lucas Chess's analysis shows it in other ways too.
I like its graph of the average Elo for each move, which shows two lines.

RwSF75

I don't think Lichess's graphs are in centipawns; I think they are WDL too.
They most likely use the same model Lichess uses to classify blunders and to decide how far the eval bar is filled.
That means the graph is damped by this model, which says a player needs an eval of +3 to have a 50% chance of winning.

If, on the other hand, you use Stockfish's built-in WDL model, which says a player needs an eval of +1 to have a 50% chance of winning, you get very different charts that look wild, just like Leela's.
imgur.com/a/gUmscVD
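A minimal sketch of what a Stockfish-style WDL model does with an eval, assuming the eval is already normalized so that +1.00 gives a 50% win chance. The shape (a logistic curve for the win chance, the mirrored curve for the loss chance, draws as the remainder) follows the general form of Stockfish's win-rate model, but `SCALE_CP` is a hypothetical placeholder: the real parameters are fitted to fishtest games and vary with ply or material.

```python
import math

# Midpoint in normalized centipawns: +1.00 (100 cp) is where the win chance is 50%.
MIDPOINT_CP = 100.0
# HYPOTHETICAL spread of the curve, for illustration only; Stockfish's real
# value is fitted to fishtest games and depends on ply/material.
SCALE_CP = 30.0


def sf_style_wdl(cp: float, mid: float = MIDPOINT_CP, scale: float = SCALE_CP):
    """Return (win, draw, loss) for the side whose eval is `cp` centipawns."""
    win = 1.0 / (1.0 + math.exp((mid - cp) / scale))
    loss = 1.0 / (1.0 + math.exp((mid + cp) / scale))  # same curve, eval mirrored
    draw = 1.0 - win - loss  # whatever is not a decisive result
    return win, draw, loss


for cp in (0, 50, 100, 200, 300):
    w, d, l = sf_style_wdl(cp)
    print(f"{cp:+5d} cp -> W {w:.2f}  D {d:.2f}  L {l:.2f}")
```

Because the win chance climbs from almost nothing to 50% within one pawn, graphs built on a curve like this swing much harder than ones built on a flatter curve.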

Craftyawesome

I'd argue the Ponomariov-Carlsen graph is quite good. For some moves it shows far better chances for Black than for White, and it would likely show even more if the calibration Elo were set.

@RwSF75 said in #3:
> I don't think Lichess's graphs are centipawn, I think they are WDL too.
> They are most likely using the same model they use to determine blunders and how the eval bar should be filled.
> This means that it is damped by this model, which says that a player needs an eval of +3 to have a 50% chance of winning.
>
> If on the other hand you use Stockfish's built in WDL model which determines that a player needs an eval of +1 to have a 50% chance of winning you get much different charts which look wild just like Leela's does.
> imgur.com/a/gUmscVD
+3 doesn't sound right. IIRC it was 1.x measured with some version of SF.
SF's WDL model also gives fairly nice graphs, but it has a few limitations that Leela's doesn't. It is sharper than Lc0's because the fishtest LTC games it is based on are stronger than Leela's training games, and there is no setting to change it. Also, SF's WDL model has no direct way to distinguish a dead-drawn 0.00 from a fighting 0.00 (other than ply in most versions, or material in the latest dev version).
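To make the dead-draw versus fighting-draw point concrete: Lc0 reports W, D and L directly, and its scalar value Q is W minus L. The two triples below are made-up examples, not real engine output. Both have Q = 0, so any centipawn score derived from them is 0.00, and an eval-only WDL model (at the same ply or material) must map both to the same triple, even though the draw chances differ completely.

```python
# Two HYPOTHETICAL Lc0-style (win, draw, loss) outputs, both balanced.
DEAD_DRAW = (0.00, 1.00, 0.00)      # nothing left to play for
FIGHTING_DRAW = (0.35, 0.30, 0.35)  # balanced, but either side can still win


def q_value(wdl):
    """Scalar value as Lc0 defines it: Q = W - L, in [-1, 1]."""
    win, _draw, loss = wdl
    return win - loss


for name, wdl in (("dead draw", DEAD_DRAW), ("fighting draw", FIGHTING_DRAW)):
    print(f"{name}: Q = {q_value(wdl):+.2f}, draw chance = {wdl[1]:.0%}")
```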

RwSF75

@Craftyawesome said in #4:
> +3 doesn't sound right. IIRC it was 1.x measured with some version of SF.

rawWinningChances(300) returns 50% (about 0.5).
The model they use is based on 2300+ rated Lichess rapid games from June 2022.

lichess.org/page/accuracy
github.com/lichess-org/lila/blob/038ad6281e1456def0cef78a2f8c0f5457953093/ui/ceval/src/winningChances.ts#L5-L11
github.com/lichess-org/lila/pull/11148
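A Python rendering of the formula, as a sketch: the constant and the Win% definition are the ones published on lichess.org/page/accuracy, and the function names are Python versions of the TypeScript ones, not Lichess code. Note that rawWinningChances returns a value on a -1 to 1 scale, so the "50%" above is a return value of about 0.5; the accuracy page turns that into Win% with 50 + 50 times the raw value.

```python
import math

# Constant as published on lichess.org/page/accuracy.
MULTIPLIER = -0.00368208


def raw_winning_chances(cp: float) -> float:
    """Map a centipawn eval to the [-1, 1] scale used by rawWinningChances."""
    return 2.0 / (1.0 + math.exp(MULTIPLIER * cp)) - 1.0


def win_percent(cp: float) -> float:
    """Win% as defined on the accuracy page: 50 + 50 * rawWinningChances."""
    return 50.0 + 50.0 * raw_winning_chances(cp)


print(f"{raw_winning_chances(300):.3f}")  # 0.502
print(f"{win_percent(300):.1f}")          # 75.1
```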

Craftyawesome

@RwSF75 said in #5:
> rawWinningChances(300) returns 50%
> The model they use is based on Lichess 2300+ Elo rated rapid games from June 2022.
>
> lichess.org/page/accuracy
> github.com/lichess-org/lila/blob/038ad6281e1456def0cef78a2f8c0f5457953093/ui/ceval/src/winningChances.ts#L5-L11
> github.com/lichess-org/lila/pull/11148
Huh, I guess you're right. Maybe I thought they were using internal units and not centipawns?

dboing


I agree that, in the limit of the estimation processes involved in machine-generated chess statistics, WDL carries more information than estimates taken over specific engine-tournament pools dominated by centipawn-born technology.

In theory.

GnocchiPup

How were the WDL graphs made? With Nibbler?

GnocchiPup


As for the Lichess graphs, they aren't WDL.
They're closer to W + 50% of D.

They tell us nothing about the draw percentage.
Say Lichess shows 50%: that could mean 50% White wins and 50% Black wins, 100% draws, or anything in between.
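The same point as arithmetic, in a small sketch (the helper name is made up for illustration): a single expected score E = W + D/2 fixes W and L only once you also pick D, so one number on the graph is consistent with a whole family of triples.

```python
def triple_for_expected_score(expected: float, draw: float):
    """Return the (win, draw, loss) triple with expected score
    win + draw / 2 and the given draw chance, or None if impossible."""
    win = expected - draw / 2.0
    loss = 1.0 - expected - draw / 2.0
    if win < 0.0 or loss < 0.0:
        return None
    return win, draw, loss


for draw in (0.0, 0.5, 1.0):
    print(triple_for_expected_score(0.5, draw))
# (0.5, 0.0, 0.5)
# (0.25, 0.5, 0.25)
# (0.0, 1.0, 0.0)
```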

dboing


Too late to edit. I am also hopeful that the centipawn-born engines will look behind the scenes for feature sets that can become interpretable, while they keep fitting engine-tournament formats and the usual computing-cost constraints.