The Diplomacy World Interview with David Norman by Jim Burgess

Jim Burgess (JB): Thanks, David, for being willing to talk about this.

David Norman (DN): No problem.

Just to add to that description, I think it's worth repeating the opening of the article you cite. The key aim of the DAIDE project was not so much to centralise efforts, as to provide a framework for development where Bots could easily compete against each other and against humans. Before the DAIDE project, there had already been two hobby projects to develop a Diplomacy AI - Danny Loeb's DPP, and then Sean Lorber's SeaNail. Both of these had had a huge amount of development effort put into them, but neither had that much use, as the only way for them to play in a game was for a person to manage the program, entering results from the game into the AI, and then submitting the orders generated by the AI to the GM.

So, the DAIDE project set out to provide an environment where AIs could be developed, and then play against each other and against humans. By allowing them to play a lot more games, we could not just develop AIs, but also find out how well they were playing, and refine and improve them.

JB: Let me ask a general question first about the current scope of the project.  I know that there are currently 195 members of the Yahoo Group, though many of them are like me, who do not intend to actually program a bot themselves.  Roughly how many working bots have been designed to your knowledge and how many of the 195 group members would you classify as active programmers?

DN: There have been ten Bots developed so far, by nine different authors, although of course for each of those Bots, there are many different versions. The Bots vary from DumbBot, the first Bot produced by the project, which I wrote in two days, to Albert, which Jason van Hal wrote recently, and which is the best Bot to date. Playing a no-press game against six Alberts is very difficult. And I should emphasise, the Bots do not know that six of the powers are being played by the same Bot, or which power is the human player.

JB: You've developed what I think is a neat three letter token language syntax that to me strikes a near perfect balance (especially for this development phase) in being computer program readable and human readable but expresses most all of the types and levels of negotiation that most players use in working out tactics and cooperation on the board.  Could I get some of your thoughts today on how well this is working in the actual operation and negotiation between bots in tests you've seen?

DN: So far, it hasn't been used that much by the Bots.

The language is split up into 13 levels of increasing complexity - from the first level - where all you can do is offer an alliance, and the second level where you can suggest specific orders, up to the top levels where you can ask for an explanation of a power's press or orders, and pass on messages that you've received from other powers. By splitting it into levels, you can have games where only language up to a certain level is allowed, allowing Bots to build up their press capabilities in stages, and still compete with more advanced Bots.
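As a rough illustration of how a level limit might be enforced, here is a minimal Python sketch. The token names, the level numbers assigned to them, and the function `allowed_in_game` are assumptions made for this example; they are not taken from the DAIDE specification.

```python
# Illustrative only: these token names and level numbers are assumptions
# for this sketch, not the real DAIDE level table.
TOKEN_LEVEL = {
    "ALY": 1,   # offer an alliance (first press level)
    "XDO": 2,   # suggest a specific order (second press level)
    "WHY": 12,  # ask for an explanation of press or orders
    "FWD": 13,  # pass on a message received from another power
}


def allowed_in_game(press_keywords, game_level):
    """Return True if every press keyword is at or below the game's level.

    Keywords missing from TOKEN_LEVEL are treated as not allowed.
    """
    return all(
        TOKEN_LEVEL.get(keyword, float("inf")) <= game_level
        for keyword in press_keywords
    )
```

In a game capped at the second level, a message built only from `ALY` and `XDO` would pass this check, while one containing `WHY` would be rejected.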

So far, none of the Bots can handle more than the bottom two levels.

Having said that, we have had one game using the full language - we had a game between seven human players where the only negotiation allowed was in the DAIDE language - mainly to test the language and find any problems with it before Bots started to use it. This was easier to do than it sounds, because the DAIDE Mapper has a press entry system which allows you to enter press in English by selecting from a list of options, and then translates to and from the tokenised language for you. And of course, we found a number of problems - mostly questions which could be asked but there was no way to express the answer you wanted to give!

JB: Of those, about how many have implemented language syntax above Level 0 (no press)?

DN: Of the ten Bots that have been written so far, seven are no-press only, and three support some press. But as I said, none of these three can handle more than the first two press levels.

JB: Do you feel that current playtest efforts around these have pushed the negotiation side of the project very far to date?  As a non-programmer participating in the group discussion on dipai, I've not seen that much discussion on this, or are people mostly trying to master the efforts to evaluate and improve coordinated tactical movement amongst one's own units?

DN: Yes, the tactical and strategic side is receiving a lot more focus at the moment.

There are two theories on how to write a Diplomacy AI that negotiates. The first is that you need to understand the tactical and strategic side of the game. Once you understand that, you can then understand where cooperation would improve both your prospects, and then that is the foundation for your negotiation. The second is that you negotiate with your neighbours. The agreements you make with them determine your strategy and tactics.

Currently, the first theory seems to be predominant, so people are concentrating on putting all their effort into writing a Bot that can play no-press well, with the expectation that once that works well, press will follow on.

Of course, there is a third theory that the two sides need to feed into each other. But that's well beyond anything anybody's trying to do at the moment!

JB: One of the things that strikes me is the sheer range of types and goals of programming that must be accomplished to design a good bot, it seems to me that more "jointly designed" bots where one person worked on one piece while someone else works on something else (with an understood and planned for goal of integration) would push things forward faster.  This was what Daniel Loeb was doing in the early 1990's in the original Diplomacy Programming Project as he had numerous students working for him on various parts.  One failure in that was the "coordination" part, so there always is a tradeoff between the single mind of a designer and a group effort.  What do you think of the joint design/single designer issue, both historically and in the future of DAIDE?

DN: In the long term, I think the best Bots will have to be a joint development - there's just too much involved for a single person to write it. But the disadvantage of a joint project is that you're unlikely to get several competing joint projects - and at the moment, nobody knows the best way to write a Dip AI. So for the moment, I think we are better off with people doing their own thing, letting the different results play each other, and learning what works and what doesn't.

JB: My understanding of the mapping is that DAIDE would support variant maps (variant rules might be a bit more problematic), but I think one really good use for Diplomacy bots would be in playtesting maps to get general senses of balance between powers.  Most playtests are extremely limited while it is easily possible to run thousands of DAIDE games on a variant map to test its characteristics.  I think I actually have a series of questions about this.  First, do most of the bots people are designing have the capability of operating on other maps?

DN: As far as I know, they all do.

One of the early decisions we made was that the project should not be limited to the standard map, as this may lead to Bots that are coded to take advantage of the public knowledge and specific features of the standard map (such as coding the opening book, the stalemate lines, etc), rather than learning how to take a map and work out the features on the fly.

Hence there is very little to do to make a Bot handle all maps. The full definition of the map is sent to the AI from the server when it connects (whether it's a variant or standard).
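To make the idea concrete, here is a minimal Python sketch of a Bot deriving a map feature from whatever definition it is handed, rather than from hard-coded knowledge of Standard. The dictionary layout of `MAP_DEFINITION`, the toy four-province map, and both function names are assumptions for illustration only; the real format sent by the DAIDE server is not shown here.

```python
from collections import deque

# Hypothetical map definition: this layout and this toy map are assumptions
# for the sketch, not the format the DAIDE server actually sends.
MAP_DEFINITION = {
    "powers": ["RED", "BLUE"],
    "supply_centres": ["A", "C"],
    "adjacencies": [("A", "B"), ("B", "C"), ("C", "D")],
}


def build_graph(map_def):
    """Build an undirected adjacency graph from the listed province pairs."""
    graph = {}
    for first, second in map_def["adjacencies"]:
        graph.setdefault(first, set()).add(second)
        graph.setdefault(second, set()).add(first)
    return graph


def distance_to_nearest_centre(graph, supply_centres, start):
    """Breadth-first search: moves from start to the closest supply centre.

    Returns None if no supply centre is reachable from start.
    """
    centres = set(supply_centres)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        province, distance = queue.popleft()
        if province in centres:
            return distance
        for neighbour in sorted(graph.get(province, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None
```

Because nothing in the functions refers to a particular province or power, the same code would run unchanged on a variant map supplied in the same assumed layout.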

JB: And to the extent they do, it seems it wouldn't be hard to code them into your mapper, would it?

DN: The easiest way to code a new variant map, is to enter it into MapMaker (www.ellought.demon.co.uk/mapmaker.htm). From there, I have a process which can fairly quickly convert it into all the files required by the server and the mapper. Plus MapMaker has a lot of internal checking built in, which will pick up a lot of the common errors made when defining a map.

Entering a variant the size of Standard into MapMaker takes about an hour.

JB: Given current bot capability, do you think a variant map designer would learn much from repeated bot tests of their maps in the design phase?  How do bots do at replicating some of the statistics on regular Diplomacy games (realizing that there are large differences in those across playing groups across time)?

DN: With the early Bots, it definitely wasn't worth it. There was a huge disadvantage to playing some powers. For instance, playing as Austria or Germany against six DumbBots is pretty difficult, as you tend to get attacked from all sides, while playing England, France, Italy or Turkey against six DumbBots is extremely easy - and if you set seven DumbBots playing against each other, it'd almost always be one of those four that won.

But as the Bots have improved, so has the balance of their play. And as that happens, they would become a much better source of testing.

We have run a few DAIDE tournaments between the different Bots, with around 2000 games per tournament. The statistics from these tournaments do show a significant variation of results of each power compared with human games, but unfortunately, there haven't been any such tournaments run recently enough to involve the latest Bots, which I would expect to give results that are far closer to the results of Standard.

Even when Bots are able to play sufficiently well, there are still things that a variant tester would have to note. For instance, a game between Bots has never ended in an agreed draw, as there is no Bot that is yet able to agree to a draw. Furthermore, they also don't have any specific knowledge of how to set up a stalemate line, so almost all games end in a solo. The few that don't are where a Bot manages to form a stalemate line through its other algorithms, and the game is eventually ended by the server terminating it (which is usually set to happen if there have been 50 years without a change of centre ownership!). Because of this, play testing with the current Bots wouldn't tell you if the game is prone to stalemates or solos. But it should give you a good idea of the balance of the strengths of the powers in the variant. And hopefully future Bots will resolve this issue.
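The termination rule mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `should_terminate`, the per-year snapshot representation, and the way the limit is counted are all hypothetical, and the server's actual implementation may differ.

```python
def should_terminate(ownership_by_year, limit=50):
    """Decide whether a game has gone `limit` years with no centre changing hands.

    ownership_by_year is a list of dicts mapping supply centre -> owning power,
    one snapshot per game year, oldest first (an assumed representation).
    `limit` unchanged years need `limit + 1` identical consecutive snapshots.
    """
    if len(ownership_by_year) <= limit:
        return False
    recent = ownership_by_year[-(limit + 1):]
    return all(snapshot == recent[0] for snapshot in recent)
```

With a small limit for testing, four identical yearly snapshots in a row would trigger termination at `limit=3`, while any change of ownership inside that window would not.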

Another thing the Bots can't do, is tell you whether it's actually an interesting variant to play!!!

Of course, there is one additional advantage of testing with Bots. With human players, your results are going to be skewed by the skill level of the players. By testing with every power played by an instance of the same Bot, you have a perfectly level playing field from the player ability perspective!
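For a variant designer, the balance check described here amounts to tallying which power wins across many same-Bot games. A minimal Python sketch follows; the function name `solo_rates` and the input format (one winning power per game, or `None` when the game did not end in a solo) are assumptions for illustration, and no real tournament data is used.

```python
from collections import Counter


def solo_rates(results):
    """Return the fraction of all games that each power won as a solo.

    results holds one entry per game: the winning power's name, or None
    if the game did not end in a solo (an assumed representation).
    """
    total = len(results)
    if total == 0:
        return {}
    wins = Counter(winner for winner in results if winner is not None)
    return {power: count / total for power, count in wins.items()}
```

On a well-balanced map with equally skilled players, the rates would be expected to be roughly equal across powers; a power with a much higher or lower rate would point to an imbalance worth looking at.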

JB: In my view, the negotiation part is not hugely important, I would think that testing a variant map in no press Level 0 would give a designer most of the input they needed, especially regarding statistics on which centers particular bot countries ended up holding.  Do you agree?

DN: I would go further than that. My experience of testing variants is that No Press games generally show up problems with a variant better than press games. Playing a game with press allows the players to compensate for weaknesses in their power, and counteract the strengths of other powers, much better than they are able to in a no-press game. Hence if there is an imbalance, I believe it will show up much better in repeated no-press play than in repeated press play.

Of course, if you are trying to make an unbalanced variant, one where one power is unusually strong, and the other powers have to work together to deal with it, then this doesn't follow. But variants like this are in the small minority.

JB: One of the problems we all have is that this is a hobby.  Daniel Loeb made a fairly significant amount of progress in a relatively short period of time with making his project a school/student activity.  Some of the efforts at developing bots have come from people working on Masters degrees.  But the "professionals" have done a horrible job (my opinion) in designing bots, probably because they were up against commercial constraints that made them repeatedly take inappropriate shortcuts.  I've heard the comment lately about "programming projects taking over your life" as well (knowing you, like me, are much too busy a person to actually have this or any other part of the hobby actually take over).  How would you assess the "incentive" problems, "time" problems, and "gosh darn it, this is just a really difficult programming task" problems in determining the speed and direction of DAIDE to date?

DN: I don't think it should be that big a problem yet. Some people spend years working on a hobby project - indeed, I know Sean Lorber says he spent 15 years developing SeaNail. And yet Albert, the best DAIDE Bot to date, was developed in a number of months. Given this, I don't see why there should be barriers to other people writing better Bots than we currently have while still keeping it as a hobby.

When the time comes that the best Bots really are that good that it's more than a one-man hobby commitment to write a new competitive Bot, that's when I think we really need to look at forming a community project to write the next generation of Bot. But I don't think we're anywhere near that yet.

JB: I'd now like to turn to the future.  I've often said, and still believe, that truly solving the dipai problem is synonymous with the task of solving the "Turing test" of AI that currently fascinates the futurists like Ray Kurzweil and Mitch Kapor, but not much of anyone else.  In that sense, solving the dipai problem is a game, really interesting to crack, but not of much external use.  On the other hand, many of the futurists believe this is a really important hurdle to cross and thus solving the dipai problem in that way (having bots be "indistinguishable" from human players in an open test) could be a huge breakthrough in human evolution.  I don't quite believe either of these extremes, though remain fascinated by the ideas generated.  What do you think?

DN: It's not something I've really considered. I think when it comes to Diplomacy, Bots have some huge advantages and some huge disadvantages. They can calculate a massive number of possible orders in a very short length of time, but on the other hand, they don't have the natural ability to empathise with their ally, or to talk about anything other than the game. Hence I think that when Diplomacy Bots do become competitive with human players, they will do so by out-playing them in the parts of the game they are good at, not by playing like them.

JB: Would you care to give odds on a DAIDE bot passing a Turing test by 2029 (Kurzweil's date)?

DN: As in actually playing like a human, not just playing as well as a human? I'd be very surprised. They may manage it in a no-press game, but in a press game, even using the DAIDE language (or something similar), I wouldn't expect them to be able to accurately mimic a human in the way they use the language.

JB: Any other thoughts on all this you would like to convey?

DN: If people want to get involved in the project, then there are two ways they can. The first is to write their own Bot. If this is of interest, then join the DipAI YahooGroup, and have a look at the DAIDE Homepage (www.daide.org.uk).

The other way they can help is by joining the Real Time Diplomacy group. This is a group of players who play a complete no-press Diplomacy game online in a couple of hours, using the DAIDE software. When there are seven of them available, they play an all-human game, but when there are fewer available, the spaces are filled by Bots. Hence this is a great way for Bots to get some playing experience in a human environment.

There have also been a couple of spinoffs from this project. One of them is, having put together a list of all the concepts you need to negotiate in Diplomacy, I've then laid them out on a double-sided A4 sheet, in multiple languages. Hence you have an instant translator for if you're ever playing FtF Diplomacy with someone who you don't have a common language with. See www.ellought.demon.co.uk/dip_translator. It currently covers five languages (English, French, German, Dutch, Italian).

And taking this one step further, I've already said that the DAIDE Mapper can translate between the tokenised DAIDE language and English. Well there's no direct link between the two, so it could equally translate between DAIDE and French, German, or any other language. Once this has been done, you could have two Mappers in a game, one in English, one in French. Each player enters their negotiation in their own language and it's automatically translated into the language of the other player! It's not there yet, but it's something to look out for in the future...

JB: I wish you luck in this project and hope that more people engage with it over time. One wishes one didn't have to work so much and had more time for play..... people can see your site on this project at: http://www.ellought.demon.co.uk/dipai/