Weaknesses in eXtreme Gammon?

Frank Berger has talked about some positions where a human can outplay XG. I believe this can sometimes be the case in backgames, and "snake" positions where you're forming a prime in the outfield and gradually rolling it forward. Hopefully Frank sees this and can elaborate. I wonder if BG Blitz is doing anything to address this. And does anyone else have examples?

Since we judge correctness using XG, if a human is outplaying XG how would we even know? Are we just talking about XG making a mistake on + or ++, but getting it right on a rollout?
@bgblitz

A position where a human can outplay XG easily is this one:

XGID=------a----------BBCCBBAn-:1:1:1:00:2:2:0:3:10

As you assumed, it's a snake. Just check it with XG: it sees the opponent with the one straggler as a 60-70% favorite, and a quick 1296-game rollout makes it a 71% favorite! In fact the straggler side has about 3% if it is closed out, and let's say there are another 2-3% from accidents while rolling the snake home. So instead of 60-70%, about 6% would be correct. BGBlitz sees the snake player as a 60-65% favorite, still way off, but far, far better.

Another position:

XGID=---a--A--BBB--ABa---BbAaAA:0:0:1:41:5:6:1:7:10

Here even after hours of rollout XG doesn't find bar/24, 20/16* as the best move, and it misses not just by a small margin but by -0.235. BGBlitz selects it at 1-5 ply; see the discussion among several strong humans on DailyGammon.
A rollout reduces random error, but it can't reduce systematic bias. If a bot doesn't understand a position, a rollout helps only if the bot understands the positions that arise after the move; otherwise you're wasting computing time. In the opening or the early game a rollout is valuable, in such a difficult position it's pure waste, and in between you never know how much bias you have, so I don't see much value over an XG++ evaluation. But sure, it is an issue with BMAB, UBC etc.
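The point about random versus systematic error can be illustrated with a toy simulation. This is only a sketch, not how XG or BGBlitz roll out positions: `rollout`, `true_win_prob` and `policy_bias` are made-up names, and the numbers are borrowed loosely from the snake example (a true 6% that the bot plays out as if it were 71%).

```python
import random

def rollout(true_win_prob, policy_bias, n_games, seed=0):
    """Estimate a win probability from n_games simulated games.

    policy_bias is a hypothetical stand-in for a bot that misplays the
    position: each simulated game is won with probability
    true_win_prob + policy_bias instead of true_win_prob.
    """
    rng = random.Random(seed)
    p = true_win_prob + policy_bias
    wins = sum(1 for _ in range(n_games) if rng.random() < p)
    estimate = wins / n_games
    # Standard error of a proportion: shrinks like 1 / sqrt(n_games).
    std_error = (estimate * (1 - estimate) / n_games) ** 0.5
    return estimate, std_error

# Illustrative numbers only: true chance 6%, bot plays it as if 71%.
for n in (36, 1296, 46656):
    est, se = rollout(0.06, 0.65, n)
    print(f"{n:6d} games: estimate {est:.3f} +/- {se:.3f}")
```

More games make the error bar smaller, but the estimate settles near 0.71, not near 0.06. The rollout becomes more and more sure of the wrong number.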

I believe BGBlitz is a dash worse than XG on average, but it is very robust in extreme position types (at least I'm not aware of positions where its play is clueless). As quality assurance I let BGBlitz play matches against itself and analyze them with XG. Everything flagged as an error I analyze more deeply, and in about a third of the cases XG finally agrees with BGB (both bots agree so often, depressing for a mere human). When you see XG's evaluation (or any other bot's) jumping between different plies, alarm bells should go off.
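The "jumping values" warning sign can be turned into a simple check. A minimal sketch, assuming you have copied a position's win probability at each ply into a dict by hand; `ply_spread`, `is_suspicious` and the 0.10 threshold are all made up for illustration and are not part of XG or BGBlitz.

```python
def ply_spread(evals_by_ply):
    """Gap between the highest and lowest evaluation across plies."""
    values = list(evals_by_ply.values())
    return max(values) - min(values)

def is_suspicious(evals_by_ply, threshold=0.10):
    """Flag a position whose evaluation jumps by more than threshold.

    threshold is an arbitrary assumption; tune it to taste.
    """
    return ply_spread(evals_by_ply) > threshold

# Hypothetical win probabilities keyed by ply, not real bot output.
stable = {1: 0.520, 2: 0.530, 3: 0.525}
jumpy = {1: 0.62, 2: 0.45, 3: 0.70}

print(is_suspicious(stable))
print(is_suspicious(jumpy))
```

A flagged position is not necessarily misjudged, it is just one where the bot's number deserves less trust.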

This week I started training a new AI (long overdue, the current one is from 2016). My goal is to improve especially in backgames and containment games, but you never know whether you'll succeed; to some extent it's still a bit of black magic.

Thanks for the reply. That snake position is fun to play out. If the cube is in play, the computer with a straggler doubles immediately, then loses... Interestingly, once the snake is successfully rolled about halfway home, XG sees that you are a favorite, and recommends D/P:

XGID=-------A-aBBCBBB-A------n-:0:0:1:00:0:0:0:3:10

I wonder if there is a mistake in the second position,
XGID=—a–A–BBB–ABa—BbAaAA:0:0:1:41:5:6:1:7:10
because I don't see a checker on the bar. I wasn't able to log in to DailyGammon to verify it.

Good points about rollouts: if the position isn't played correctly, it doesn't matter how many times it's played!

That is very cool that you've ironed out some of those problems in BGBlitz! And with the fact that you are actively working to improve it, I'm beginning to see this as a real competitor to XG. Embrace the black magic and something good should come of it!

It seems some characters in the XGID were "optimized" by the website. Here it is once again: XGID=---a--A--BBB--ABa---BbAaAA:0:0:1:41:5:6:1:7:10
difficult_pos1

To be safe, here it is additionally as a picture. BTW it is not a synthetic position, it's from a match. It's also unusual and far more difficult for mere mortals to judge, but at least playing b/24, 20/16* seems pretty obvious: you have to prevent the checker on the 9 from being saved. As mentioned, XG doesn't see it even after hours of rollout, but here I assume no one knows the truth (BGBlitz gets it even at 1-ply, but whether the evaluation is correct? I don't know).
Therefore, if you play for BMAB or UBC, avoid backgames.

BTW you have to register to read the DailyGammon forum. I like DailyGammon, and it has been reducing my productivity for nearly 20 years. It's like correspondence chess on steroids, and either you love it or you hate it. The killer feature for me: if I have 5 minutes in between, I can make some moves; I don't have to sit down for an hour or two.


Great advice! I also tend to split rather than play two down with 43 or 32 because it's easier to play low PR from a holding game. And to split rather than slot with a 21, to avoid the complications of more checkers back. But if I'm playing an opponent that might get confused and it's not a BMAB event, I try to complicate and gladly embrace backgames.

Sorry Frank! The software defaults to some "convenient" auto-corrections. I had made a relevant post here noting the issue BUT, after asking for help on the Discourse Meta forums, found a way to disable the markdown typographical auto-corrections!
We lose auto curly quotes and copyright/trademark symbols, etc. but I feel this is a (very) small price to pay to eliminate XGID pasting errors :stuck_out_tongue_winking_eye:

Should be fixed now; sorry for the inconvenience.

No problem at all:

  • using preformatted text doesn't do optimizations (I hope I remember correctly)
  • BGBlitz will unoptimize some Unicode U+20xx characters. I don't know whether all sites turn two dashes into U+2013 and three dashes into U+2014, but at least it works here :slight_smile:

using preformatted text doesn't do optimizations (I hope I remember correctly)

With the new site settings, there is no need to use preformatted text blocks. They are still helpful for calling attention to the ID and providing some separation, but the site should no longer make the em dash substitution.

BGBlitz will unoptimize some Unicode U+20xx characters. I don't know whether all sites turn two dashes into U+2013 and three dashes into U+2014, but at least it works here

That is a pretty cool feature! XG does not do this (as David's issues made obvious).
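For anyone pasting a mangled ID into a tool that lacks this feature, the undo step is easy to sketch. This is not BGBlitz's actual code. `restore_xgid` is a made-up name, it assumes the site only turned "--" into U+2013 and "---" into U+2014, and the 26-character check on the position field is an assumption based on the intact IDs in this thread.

```python
EN_DASH = "\u2013"  # what the site made of "--"
EM_DASH = "\u2014"  # what the site made of "---"

def restore_xgid(text):
    """Undo the dash substitutions in a pasted XGID string."""
    restored = text.replace(EM_DASH, "---").replace(EN_DASH, "--")
    # The position field sits between "XGID=" and the first colon.
    position = restored.split("=", 1)[-1].split(":", 1)[0]
    # Assumption: a valid position field has 26 characters.
    if len(position) != 26:
        raise ValueError(f"position has {len(position)} characters, expected 26")
    return restored

# The mangled ID from earlier in the thread, written with escapes.
mangled = ("XGID=\u2014a\u2013A\u2013BBB\u2013ABa\u2014BbAaAA"
           ":0:0:1:41:5:6:1:7:10")
print(restore_xgid(mangled))
```

Run on the mangled ID quoted earlier, this gives back the ID Frank reposted. An ID that was never mangled passes through unchanged.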