Weaknesses in eXtreme Gammon?

davidklausa · December 29, 2023, 5:35pm

Frank Berger has talked about some positions where a human can outplay XG. I believe this can sometimes be the case in backgames, and "snake" positions where you're forming a prime in the outfield and gradually rolling it forward. Hopefully Frank sees this and can elaborate. I wonder if BG Blitz is doing anything to address this. And does anyone else have examples?

Since we judge correctness using XG, if a human is outplaying XG how would we even know? Are we just talking about XG making a mistake on + or ++, but getting it right on a rollout?
@bgblitz

bgblitz · December 29, 2023, 6:47pm

A position where a human can outplay XG easily is this one:

XGID=------a----------BBCCBBAn-:1:1:1:00:2:2:0:3:10

As you assumed, it is a snake. Just check it with XG: it sees the opponent with the one straggler as a 60-70% favorite, and a quick 1296-game rollout even makes it a 71% favorite! In fact you win about 3% if you're closed out, and let's say there are another 2-3% from accidents while rolling the snake home. So instead of 60-70%, about 6% would be correct. BGBlitz sees the snake player as a 60-65% favorite, still way off, but far, far better.

Another position:

XGID=---a--A--BBB--ABa---BbAaAA:0:0:1:41:5:6:1:7:10

Here, even after hours of rollout, XG doesn't find bar/24, 20/16* as the best move, and it is off not just by a small margin but by -0.235. BGBlitz selects the move at 1-5 ply, and several strong humans discuss it on the DailyGammon forum (login required).
A rollout reduces random error but it can't reduce systematic bias. If a bot doesn't understand a position, a rollout helps only if the bot understands the positions that arise after the move; otherwise you're wasting computing time. In the opening or the early game a rollout is valuable, in such a difficult position it's pure waste, and in between you never know how much bias you have, so I don't see much value over an XG++ evaluation. But sure, it is an issue with BMAB, UBC etc.
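The difference between random error and systematic bias can be shown with a toy simulation. This is only a sketch under made-up assumptions: the "true" winning chance of 6% and the bot's "believed" 65% loosely echo the snake numbers above, and `rollout` is a hypothetical stand-in that flips a biased coin, not the way XG or BGBlitz actually roll out a position.

```python
import random
import statistics


def rollout(believed_win_prob, n_games, rng):
    """Toy rollout: every game is decided by the bot's (possibly wrong) belief."""
    wins = sum(rng.random() < believed_win_prob for _ in range(n_games))
    return wins / n_games


def spread_and_bias(true_p, believed_p, n_games, n_rollouts=100, seed=1):
    """Return (standard deviation across rollouts, mean result minus true_p)."""
    rng = random.Random(seed)
    results = [rollout(believed_p, n_games, rng) for _ in range(n_rollouts)]
    return statistics.pstdev(results), statistics.fmean(results) - true_p


if __name__ == "__main__":
    # Illustrative numbers only: truth 6%, bot believes 65%.
    for n_games in (36, 1296, 11664):
        spread, bias = spread_and_bias(0.06, 0.65, n_games)
        print(f"{n_games:6d} games: spread {spread:.4f}, bias {bias:+.4f}")
```

In this toy model the spread shrinks as the number of games grows, while the gap to the true value stays where it is, because every simulated game inherits the same wrong belief.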

I believe BGBlitz is a dash worse on average than XG, but it is very robust in extreme position types (at least I'm not aware of positions with clueless play). As quality assurance I let BGBlitz play matches against itself and analyze them with XG. Everything flagged as an error I analyze more deeply, and in about a third of the cases XG finally agrees with BGB (both bots agree so often, depressing for a mere human). When you see XG's evaluation (or any other bot's) jumping between different plies, alarm bells should go off.
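The "jumping values" warning sign could be checked mechanically along these lines. This is a minimal sketch: the equities are invented for illustration, the 0.1 threshold is an arbitrary assumption, and neither XG nor BGBlitz is claimed to expose its evaluations in this form.

```python
def max_ply_jump(evals_by_ply):
    """Largest absolute equity change between consecutive ply levels."""
    plies = sorted(evals_by_ply)
    jumps = (abs(evals_by_ply[b] - evals_by_ply[a]) for a, b in zip(plies, plies[1:]))
    return max(jumps, default=0.0)


def is_suspicious(evals_by_ply, threshold=0.1):
    """Flag a position whose evaluation swings by more than `threshold` (assumed value)."""
    return max_ply_jump(evals_by_ply) > threshold


# Hypothetical equities keyed by ply, made up for this example.
stable = {1: 0.412, 2: 0.405, 3: 0.409, 4: 0.407}
jumpy = {1: 0.35, 2: -0.10, 3: 0.28, 4: -0.05}

if __name__ == "__main__":
    print(is_suspicious(stable), is_suspicious(jumpy))
```

A position flagged this way is only a candidate for a closer look; a steady evaluation does not prove that the bot understands the position.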

This week I started training a new AI (long overdue, the current one is from 2016). My goal is to improve backgames and containment games especially, but you never know whether you will succeed; to some extent it's still a bit of black magic.

davidklausa · December 30, 2023, 12:10am

Thanks for the reply. That snake position is fun to play out. If the cube is in play, the computer with a straggler doubles immediately, then loses... Interestingly, once the snake is successfully rolled about halfway home, XG sees that you are a favorite, and recommends D/P:

XGID=-------A-aBBCBBB-A------n-:0:0:1:00:0:0:0:3:10

I wonder if there is a mistake in the second position,
XGID=—a–A–BBB–ABa—BbAaAA:0:0:1:41:5:6:1:7:10
because I don't see a checker on the bar. I wasn't able to log in to DailyGammon to verify it.

Good points about rollouts: if the position isn't played correctly, it doesn't matter how many times it's played!

That is very cool that you've ironed out some of those problems in BGBlitz! And with the fact that you are actively working to improve it, I'm beginning to see this as a real competitor to XG. Embrace the black magic and something good should come of it!

bgblitz · December 30, 2023, 6:34pm

It seems some characters in the XGID were "optimized" by the website. Here it is once again: XGID=---a--A--BBB--ABa---BbAaAA:0:0:1:41:5:6:1:7:10
[image: difficult_pos1]

To be safe, here it is as a picture as well. BTW it is not a synthetic position, it's from a match. It's also unusual and far more difficult for mere mortals to judge, though playing b/24, 20/16* at least seems pretty obvious: you have to prevent the checker on the 9 from being saved. As mentioned, XG doesn't see it even after hours of rollout, but here I assume no one knows the truth (BGBlitz gets it even at 1-ply, but is the evaluation correct? I don't know).
Therefore, if you play for BMAB or UTC, avoid backgames.

BTW you have to register to read the DailyGammon forum. I like DailyGammon, and it has been reducing my productivity for nearly 20 years. It's like correspondence chess on steroids, and either you love it or you hate it. The killer feature for me: if I have 5 minutes in between, I can make some moves; I don't have to sit down for an hour or two.

davidklausa · December 31, 2023, 5:41pm

Great advice! I also tend to split rather than play two down with 43 or 32 because it's easier to play low PR from a holding game. And to split rather than slot with a 21, to avoid the complications of more checkers back. But if I'm playing an opponent that might get confused and it's not a BMAB event, I try to complicate and gladly embrace backgames.

hemes · January 2, 2024, 5:00pm

Sorry Frank! The software defaults to some "convenient" auto-corrections. I had made a relevant post here noting the issue BUT, after asking for help on the Discourse Meta forums, found a way to disable the markdown typographical auto-corrections!

We lose auto curly quotes and copyright/trademark symbols, etc. but I feel this is a (very) small price to pay to eliminate XGID pasting errors

Should be fixed now; sorry for the inconvenience.

bgblitz · January 2, 2024, 5:40pm

No problem at all:

using preformatted text doesn't do optimizations (I hope I remember)
BGBlitz will un-optimize some Unicode U+20xx characters. I don't know whether all sites turn two dashes into U+2013 and three dashes into U+2014, but at least it works here
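The dash repair described above can be sketched in a few lines. This is an illustration, not BGBlitz's actual code: the function name `restore_xgid` is made up, and the mapping assumes exactly what is stated here, namely that the site turned two dashes into U+2013 (en dash) and three dashes into U+2014 (em dash).

```python
# Assumed substitutions made by the website's typographic auto-correction.
DASH_REPLACEMENTS = {
    "\u2013": "--",   # en dash back to two hyphens
    "\u2014": "---",  # em dash back to three hyphens
}


def restore_xgid(text):
    """Expand en/em dashes back into the plain hyphens an XGID needs."""
    for fancy, plain in DASH_REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text


if __name__ == "__main__":
    garbled = "XGID=\u2014a\u2013A\u2013BBB\u2013ABa\u2014BbAaAA:0:0:1:41:5:6:1:7:10"
    print(restore_xgid(garbled))
```

Applied to the garbled ID quoted earlier in the thread, this gives back the XGID as it was originally posted.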

hemes · January 2, 2024, 5:42pm

using preformatted text doesn't do optimizations (I hope I remember)

With the new site settings, there is no need to use preformatted text blocks. While helpful to call attention to the ID and provide some separation, the site should no longer make the em dash substitution

BGBlitz will un-optimize some Unicode U+20xx characters. I don't know whether all sites turn two dashes into U+2013 and three dashes into U+2014, but at least it works here

That is a pretty cool feature! XG does not do this (as made obvious by David's issues).

bgblitz · August 21, 2024, 12:46pm

Some new info on the XGID=---a--A--BBB--ABa---BbAaAA:0:0:1:41:5:6:1:7:10 position. In the original discussion Nack Ballard agreed with hitting on the 16.

At a recent great tournament in Aachen, Germany (highly recommended), I had the opportunity to show Mochy the position. He wasn't sure and suggested 22-18, but wanted to discuss the position with two buddies (Shahab Ghodsi and Jim Pasko). After the discussion they agreed on 20-16*.

And, for the rollout enthusiasts: XG is 100% sure about 10-6.

davidklausa · August 21, 2024, 2:34pm

I had a "long snake" position in my match against Wolfgang Bacher in Stockholm last month. My big mistake was focusing too much on forming a perfect outside 6-prime, and failing to start making my inner board early enough. I learned it is right to make the 6pt when you can, even if there is a gap in your prime. It doesn't come up often, but it's worth remembering.

Aachen looked like a great tournament! It's on my radar.

bgblitz · August 21, 2024, 6:28pm

Did you record the game? It would be interesting to see.

davidklausa · August 21, 2024, 6:46pm

Yes I do. It doesn't seem there's a way for me to upload a match here but I'm putting it on BG Studio.
In my game, trying to play for the prime was especially difficult since I had my ace point made the whole time!

bgblitz · August 23, 2024, 8:54pm

You may send it to frank at bgblitz.com

I like to see deep backgames.

bgblitz · August 28, 2024, 9:51pm

It's an interesting one. From a backgame of Mochy's 2 or 3 years ago I learned how to prime in the outfield and roll it home. In your game, with 2 checkers on the 1, it is much harder to build a prime. In a BMAB game it can easily ruin your PR.

davidklausa · August 29, 2024, 5:49pm

Agreed. I realized that every play had blunder potential but didn't want to burn all my clock on Game 1. Fortunately it wasn't for BMAB, although if I did submit all my matches from the event it would've helped my average a lot.

bgblitz · September 1, 2024, 5:30pm

I guess you should avoid backgames for BMAB at all costs.
Not only are they really difficult to play, you also should be aware of possible misjudgements by XG... too difficult.

hemes · September 15, 2024, 1:51am

I wonder how much PR it "costs" to avoid a backgame versus just accepting fate and playing it through.

I imagine it's a losing battle and your best bet is to just get "decision lucky," thus avoiding the issue altogether.

bgblitz · September 15, 2024, 5:53pm

Very hard to answer. How do you measure it? You might get an idea only if one or more human experts analyze the positions in question and come to the same conclusion as the bot you use for crosschecking.

Hermes · September 29, 2024, 2:19pm

Interesting you point this out Frank. I played a match the other day and was in a 'snake' type position. XG regarded the position very differently than BGBlitz did. XG does seem to misjudge back games and certain positions. I think the problem does lie in the fact it misjudged the winning chances on both sides which then throws off the analysis just enough

bgblitz · September 29, 2024, 3:48pm

One player far better than me said in an email that XG plays backgames OK, but the issues start after the hit. I have too few examples for a general statement, but I guess some extreme backgames and deep containment games might be misplayed.

It is also very difficult to say whether BGBlitz plays it better. The above position, derived from the Kauder paradox, is easy (although the 3% for accidents might be a bit too small). I have another one where rollouts don't fix the misplay (the right play is regarded as a 300 error), and I have some authorities who agree, but how often do you have the opportunity to show that to a couple of Giants?

Did you record the match? I would like to have a copy.