Need help with learning AI

Blitz3D Forums/Blitz3D Beginners Area/Need help with learning AI

Sonic65(Posted 2008) [#1]
Okay, so for my science fair project, I'm programming a Tic-Tac-Toe AI opponent which starts out with no knowledge of the game and learns as it goes along. I tried to implement a simple reinforcement learning system, but it doesn't give good enough results: against a random opponent it peaks at about an 86% win rate early on (around 100 games in), then drops to 77-80% and stays there. I want the learning AI to win over 90% of the time, at least.

Here's the code I'm using (doesn't require any external files):



If anyone could please help me figure out what I'm doing wrong here, it would be greatly appreciated.


H. T. U.(Posted 2008) [#2]
There's a program in the code archives that does exactly what you want (if you want to look).


H. T. U.(Posted 2008) [#3]
There's a program in the code archives that does exactly what you want (if you want to look).

EDIT: Oops, lol!


Nate the Great(Posted 2008) [#4]
Hey, I just made something like that! Haha, wow, that's a coincidence... please give me credit if you use it, but you don't have to. Here's a link:

:)

http://www.blitzbasic.com/codearcs/codearcs.php?code=2370#comments

edit: when I run your code in debug mode I get an "array index out of bounds" error


Sonic65(Posted 2008) [#5]
That program in the code archives is pretty neat; however, it seems like it wouldn't do very well against an opponent that doesn't play well, i.e. a random opponent (also, it uses a LOT of external files). For this project, I'm going to have to test the AI against three different opponents, each playing a different way.

I tried to implement a system to also store the opponent's moves, and give feedback to those, but the results are still the same as before. Here's the new code:




I know I'm doing something wrong, but I have no idea what. (BTW, this should work in debug mode now.)


GIB3D(Posted 2008) [#6]
How can you play Tic-Tac-Toe more than two different ways, trying to win and trying to draw (no one wins)?


Sonic65(Posted 2008) [#7]
The three opponents are as follows:


Random - moves randomly

Expert - tries to win (uses pattern tables)

Control - moves pseudo-randomly; given the same board state, it will
always move in the same position (which is selected randomly when
that board state is first encountered).

Right now, it can't even get 90% against the random opponent. :(
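The "Control" opponent above is the easiest of the three to pin down precisely: it memoizes one random choice per board state. A minimal sketch in Python (the class and names are mine, not from the original Blitz3D code; cell values 0/1/2 for empty/X/O are an assumption):

```python
import random

class ControlOpponent:
    """Plays pseudo-randomly: the first time a board state is seen,
    a random empty cell is chosen; that same cell is replayed every
    time the same state comes up again."""

    def __init__(self, seed=None):
        self.memory = {}                 # board state -> chosen cell
        self.rng = random.Random(seed)

    def move(self, board):
        # board: sequence of 9 cells, 0 = empty, 1 = X, 2 = O
        state = tuple(board)
        if state not in self.memory:
            empties = [i for i, c in enumerate(state) if c == 0]
            self.memory[state] = self.rng.choice(empties)
        return self.memory[state]
```

Because the choice is cached per state, this opponent is deterministic once trained against, which is what makes it a useful control case next to the fully random opponent.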


GIB3D(Posted 2008) [#8]
v Another non-helpful post v

Oh, pattern tables, I would've never thought of using something like that ;)

Good thing you're not trying to make a Human Vs. Computer Sudoku game.

^ Another non-helpful post ^


Nate the Great(Posted 2008) [#9]
I will look over your code when I have time but for now can you explain exactly how it works?

BTW the reason I have my program make all of those external files is so i am able to save it at its state by putting them all in a folder and I can reset it by deleting them or I could even modify them to make it smarter :)


mtnhome3d(Posted 2008) [#10]
A strange game; the only winning move is not to play. You should name the app "Joshua". OK, enough jokes! In the code, which opponent does it simulate? Is there a switch to change the opponent?


Sonic65(Posted 2008) [#11]
mtnhome3d:
It uses the random opponent right now. It needs to be able to beat this first =P



Nate the Great:

Each board is stored in two states; one state for the player's positions (with the board stored as 0 for blank/opponent and 1 for the player), and one state for the opponent's positions (0 for blank/player, 1 for the opponent). This is better than having them in one state because it greatly decreases the number of truly unique states (according to my math, there are only 1024 possible states, instead of the 362,880 states needed if every board was stored as a whole). To store states, the program uses the type 'state'.
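The two-state encoding described above can be sketched as packing each side's marks into a 9-bit pattern, so each pattern is one of 2^9 = 512 values and the two tables together stay within the 1024 states mentioned (function and variable names here are mine, not from the original code):

```python
def side_pattern(board, side):
    """Pack one side's marks into a 9-bit integer.
    board: list of 9 cells, 0 = empty, 1 = player, 2 = opponent.
    side: 1 for the player's pattern, 2 for the opponent's."""
    bits = 0
    for i, cell in enumerate(board):
        if cell == side:
            bits |= 1 << i
    return bits

board = [1, 0, 2,
         0, 1, 0,
         2, 0, 0]
player_state = side_pattern(board, 1)    # bits 0 and 4 set -> 17
opponent_state = side_pattern(board, 2)  # bits 2 and 6 set -> 68
```

Note the two patterns together don't fully reconstruct the board's move history, which is exactly why this is so much more compact than storing every whole board as its own state.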

The AI cycles through each state (player or opponent), to see if it matches the current board. (Note that the AI doesn't take advantage of rotations.) If it does, then it finds the highest-ranking cell in that state, and marks it. It also updates the move and state arrays, which keep track of the moves made and states encountered in the current game. (These are cleared after every game).

If it cannot match a state to the current board, then it either moves randomly or moves into the first available position on the board. (Which of these it does is determined by a random number; there is a 10% chance it will move randomly, and a 90% chance it will move into the first available position). It also updates the move array for the player (this keeps track of the moves made in the current game, so that at the end of the game these moves can be evaluated and their value changed).

After each move, the player and opponent states are updated. If the current board represents a new state, then a new state will be created. Either way, the updating functions update the player and enemy state arrays (which keep track of the states encountered in the current game, in order).

At the end of the game, cell values are updated for all the states encountered during the game. For player states, the update function adds 3 to the value of the cell moved in if the AI won, and subtracts 5 if it lost. For opponent states, it ignores losses, and only updates if the opponent wins; it updates the value of each cell moved in during the last game so that the closer to the end of the game, the higher the reward for that cell. (This way, the opponent's winning move counts more than its first, and the AI theoretically should know that it should move in that space...but it doesn't.)

Sorry if I didn't explain that very well. If there is any specific part that you're unclear about and don't understand from the code, just ask me. =P


Nate the Great(Posted 2008) [#12]
Hmm... great explanation. It sounds like it does something almost exactly the same as my program, except mine takes advantage of flipped and rotated boards. The whole idea of giving each cell a value sounds like one of those things that works easily in theory but is much harder to apply. I think you should start out without giving each cell a value, and instead give each cell a 1 or a 0 for whether it moved there; that may simplify things, but you don't have to do it. In my tic-tac-toe program, it checks each board, and if it matches one where the computer side won, it will most likely move there. It then checks to see if it also matches a losing board, and if it does, it starts over again or moves randomly. Start over (not completely, just go back to the computer moving randomly), then try a simpler concept and build upon it like I did, and you may come up with some better ideas :) It is very hard to just jump into a great idea, but it is easier to look at the great idea and break it down. Then when you start you won't be instantly satisfied, but as you improve it over time it will work out :) Hope this helped.


Sonic65(Posted 2008) [#13]
Success! I decided to research the topic a bit more, and found a Perl program which did exactly what I wanted to do here. After looking over that program and replicating it in Blitz3D, as well as making it learn about 100x faster (in terms of games needed, not program speed), it reaches the goal of 90% in only ~2000 games! The process it uses is very similar to the process I described above, except it stores the board in a string, and uses a different feedback function.
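The Perl program itself isn't shown, so this is only a generic sketch of the approach described: key the value table by the whole board serialized as a string, and back each visited state up toward the game's final reward (the step size and default value here are illustrative assumptions):

```python
def board_key(board):
    """Serialize the whole board into a string key, e.g. '102010200'."""
    return "".join(str(c) for c in board)

def feedback(values, visited, reward, step=0.1):
    """Nudge the value of each visited state toward the final reward.
    visited: list of board-key strings encountered during the game."""
    for key in visited:
        old = values.get(key, 0.5)     # unseen states start neutral
        values[key] = old + step * (reward - old)
```

Compared to the earlier +3/-5 scheme, this kind of smoothed update also gives draws a usable signal (a reward between the win and loss values), which is one plausible reason a value-iteration-style feedback function learns so much faster here.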

Here's the code (again, doesn't require any external files):



If you want to play against it without having to train it first against the random opponent, then download the file below and put it in the same folder as the code (without changing the name of the file), and set the human flag to True.

http://www.fileden.com/files/2006/8/1/150488/trainData.txt


Nate the Great(Posted 2008) [#14]
Hmmm... very nice, but when I set the human flag to true, it only takes 3 turns for it to start beating me, and it always moves in the corner.