As is tradition for CS3243, we had to create, in a team of 4-5, an AI that plays Tetris using the skeleton game given by the university. The project was a disaster: there were hardly any instructions on where to start. Nevertheless, we managed to survive the project and submitted it on time. I hope this post can help future CS3243 students survive the project better.
Tip: don’t attempt a last-minute heroic effort. We did, and we magically survived, but we can’t guarantee the same for you. Spend at least 4-6 weeks on this project.
You can find our project here. Copying our project is not recommended as it is plagiarism and can be very easily detected by the school.
So we were given:
- Some Java files that simulate a Tetris game:
- Had an interface so you can watch it play. This can be turned off to make the game run as fast as possible. However, depending on your implementation, it can still take from 30 minutes to 100 hours to clear 1 million lines.
- The API will be explained later.
- An instruction on when and where to submit the project.
We were expected to submit:
- A PDF report of the project.
- The bot implementation, stuffed into a single file. Changing any files that belong to the game mechanism is forbidden, so if a helper function you need lives inside a game file, you’d better move it into the bot implementation. I know it’s BS, but those are the requirements.
Expanding a little on a Tetris game (you can skip this part if you’re only interested in how to create a Tetris bot in general, not for a CS3243 project):
- We will follow the simplest implementation of a Tetris game:
- 10×21 board size. Row 21 is hidden; if a piece reaches that row, the game ends.
- A filled row will be removed from the board, adding 1 to your total points.
- There are 7 possible Tetris pieces: I, O, J, L, S, Z, T. The pieces are enumerated starting from index 0.
- The pieces can be rotated, the orientation is also enumerated starting from index 0.
- Java’s pseudorandom number generator is used to get the next piece.
- Only one piece is generated at a time, so no lookahead.
- We are given this API:
- Static methods:
- Get the possible orientations of all pieces: Returns an array where arr[PIECE] is the number of possible orientations. So a piece like O will have only 1 possible orientation, a piece like S will have 2.
- Get the possible widths of all pieces: Returns a 2D array where arr[PIECE][ORIENT] is the width of a given piece on a given orientation. Suppose the piece is Z, there are two possible orientations, one having width 3 while the other having width 2.
- Get the possible heights of all pieces: Same as above, but for heights, not widths.
- Get the possible bottoms of all pieces: Returns a 3D array where arr[PIECE][ORIENT][COLUMN] is the bottom row of a given piece in a given orientation on a given column. Take a Z piece for example, if it’s in the Z orientation, the bottoms will be [0,0,1]. If it’s in the N orientation, the bottoms will be [1,0].
- Get the possible tops of all pieces: Same as above. Take a Z piece in the Z orientation, this method returns [1,2,2]. If it’s in the N orientation, the method returns [3,2].
- Instance methods:
- Get the play field: Returns the play field where the play field is a 2D array of size 21×10, where arr[i][j] is the turn where a block is added to the cell on row i, column j. If arr[i][j] = 0, that cell is empty. Using the figure below as an example, we have a sequence of I, O, L, S, J, T.
- Get the list of top cells: Returns a list of length 10 where each element is the top of a particular column. Using the figure above as an example, this function returns [1,1,3,4,4,1,4,5,4,4].
- Get the next piece
- Check if the game has been lost or not
- Get the number of rows cleared: When the game is over, this is the final score of the game.
- Get the number of turns
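Putting the static arrays together: here is a minimal sketch (not the skeleton’s actual code — the array contents are the Z-piece example from above, and the method and variable names are my own) of how the bottom and top profiles determine where a piece lands and what the new column tops become.

```java
// Sketch: computing where a piece lands using the "bottoms" and "tops"
// arrays described above. pBottom/pTop here hold the profiles for one
// piece in one orientation; no line clears or game-over checks.
public class Drop {
    // Given the current column tops, the piece's bottom and top profiles,
    // and the leftmost column of the drop, return the new column tops
    // after the piece settles.
    public static int[] drop(int[] top, int[] pBottom, int[] pTop, int slot) {
        // The piece rests at the lowest base row where no column overlaps.
        int base = Integer.MIN_VALUE;
        for (int c = 0; c < pBottom.length; c++) {
            base = Math.max(base, top[slot + c] - pBottom[c]);
        }
        int[] newTop = top.clone();
        for (int c = 0; c < pTop.length; c++) {
            newTop[slot + c] = base + pTop[c];
        }
        return newTop;
    }

    public static void main(String[] args) {
        int[] top = {0, 0, 0};       // an empty 3-wide strip
        int[] zBottom = {0, 0, 1};   // Z piece, "Z orientation", as in the text
        int[] zTop = {1, 2, 2};
        int[] after = drop(top, zBottom, zTop, 0);
        System.out.println(java.util.Arrays.toString(after)); // [1, 2, 2]
    }
}
```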
Now, with all of this stuff, what do we do?
There are three components in this project, interacting with each other like the figure below:
You will probably spend most of the time tweaking the two steps in red.
Now let’s do the fun thing – coding the real thing.
Evaluating the best move
How do you decide that one move is better than the others?
In particular, we can construct a heuristic function that evaluates a Tetris board position based on some characteristic features of the position. Such features are easily recognizable by experienced players, and include the current height of the wall, the presence of “holes” and “glitches” (severe irregularities) in the first few rows, etc. Consequently, the agent’s utility function can be approximated by a heuristic function that comprises a linear weighted sum of features. For a given
grid state B, we have U(B) = w₁·f₁(B) + w₂·f₂(B) + … + wₙ·fₙ(B), where n is the number of features, (w₁, …, wₙ) is the weight vector, and f₁, …, fₙ are the feature functions for board position B.
– Quoted directly from the project briefing.
So what does this mean? Essentially, there is this thing called a feature, or heuristic: a value computed for a particular board arrangement. The bot evaluates the utility of a move by trying that move and calculating the weighted sum of all features; it then compares the utilities of all moves and selects the move with the highest utility.
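As a sketch (my own toy code, not the skeleton’s API), assuming the feature values for each candidate move have already been computed into an array, move selection boils down to a dot product and an argmax:

```java
// Sketch of the move-selection loop: each candidate move is scored by a
// weighted sum of its feature values, and the highest-scoring move wins.
public class Evaluator {
    // Utility of one move: dot product of weights and feature values.
    public static double utility(double[] weights, double[] features) {
        double u = 0;
        for (int i = 0; i < weights.length; i++) u += weights[i] * features[i];
        return u;
    }

    // Index of the move with the highest utility.
    public static int bestMove(double[] weights, double[][] featuresPerMove) {
        int best = 0;
        double bestU = Double.NEGATIVE_INFINITY;
        for (int m = 0; m < featuresPerMove.length; m++) {
            double u = utility(weights, featuresPerMove[m]);
            if (u > bestU) { bestU = u; best = m; }
        }
        return best;
    }
}
```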
I can say with some confidence that how well your bot performs depends very heavily on how good the features are. You can try the most sophisticated training algorithm, but if your features suck, that’s it.
We used 8 features modified from those introduced by Dellacherie (2003) and Thiery (2009). Note that the Dellacherie-Thiery features produced the best-performing Tetris AI to date. However, because we did this project as a last-minute heroic effort, we didn’t have time to try out all the features and went with these 8:
- Landing bottom: The height where the bottom of the current piece fell (row 6 in the example).
- Landing top: The height where the top of the current piece fell (row 10 in the example).
- Line cleared: Number of lines cleared in one move (2 in the example: rows 6 and 7).
- Number of holes: Total number of holes in the grid. A hole is defined to be an empty cell with at least one filled cell above it in the same column. There are 10 holes in the example.
- Sum of outer wells: A well is a series of empty cells counting down from an empty cell that is horizontally adjacent to two occupied cells. The sum of outer wells is the accumulated well depths of all wells that are above the top of a column. Outer well depths are numbers in red color. In the example, the outer well sum is 2+1=3.
- Sum of inner wells: The sum of inner wells is the accumulated well depths of all wells that are below the top of a column. Inner well depths are the numbers in green. In the example, the inner well sum is 5+3+2+1+1+1+1=14.
- Row transitions: The number of occupied cells adjacent to empty cells, summed over all rows. The two borders also count as occupied cells. Take row 2 for example: the row transitions are marked with thick purple borders.
- Column transitions: Same as row transitions, but summed over all columns.
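As an illustration, here is a sketch of two of these features computed from a simple boolean occupancy grid. This is my own toy representation (row 0 at the bottom, `true` = filled), not the skeleton’s, and the border convention for transitions varies between papers, so treat the details as assumptions:

```java
// Sketch of two features from the list above, on a boolean grid where
// grid[row][col] is true when the cell is filled; row 0 is the bottom.
public class Features {
    // A hole is an empty cell with at least one filled cell above it
    // in the same column.
    public static int countHoles(boolean[][] grid) {
        int holes = 0;
        for (int col = 0; col < grid[0].length; col++) {
            boolean seenFilled = false;
            for (int row = grid.length - 1; row >= 0; row--) { // top to bottom
                if (grid[row][col]) seenFilled = true;
                else if (seenFilled) holes++;
            }
        }
        return holes;
    }

    // Column transitions: filled/empty boundaries scanned up each column.
    // Convention assumed here: the floor counts as filled, the space
    // above the board counts as empty.
    public static int columnTransitions(boolean[][] grid) {
        int transitions = 0;
        for (int col = 0; col < grid[0].length; col++) {
            boolean prev = true; // floor
            for (int row = 0; row < grid.length; row++) {
                if (grid[row][col] != prev) transitions++;
                prev = grid[row][col];
            }
            if (prev) transitions++; // boundary with the empty space above
        }
        return transitions;
    }
}
```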
Training the bot
Now the even more fun part – playing God.
Most people use Genetic Algorithm (GA) to train their bots. For this project, our team used a hybrid of GA and Noisy Cross Entropy (NCE). You can also try Particle Swarm Optimization, Hill Climbing with Simulated Annealing or some other local search algorithms. Personally I don’t think the algorithm is important. The important things are the considerations below.
All weight vectors, or genes, are normalized, meaning they are vectors of length 1. Why normalize? Since the bot only cares about which move is better than the rest, multiplying a gene by a positive constant doesn’t change the outcome. Say you have three moves with utilities 1, 20 and 500. If you multiply every value in the gene by 10, the new utilities are 10, 200 and 5000; the best move is still the third one. Without normalization, you can end up with an old gene, say [1,2,3,4,5], that after one generation evolves into [2,4,6,8,10]. These two genes are effectively identical, and your bot won’t improve if this happens many times: the weight vectors keep growing without making any real change.
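A minimal sketch of the normalization step:

```java
// Normalizing a weight vector to unit length, so two genes that differ
// only by a positive scale factor become the same vector.
public class Normalize {
    public static double[] normalize(double[] w) {
        double norm = 0;
        for (double x : w) norm += x * x;
        norm = Math.sqrt(norm);
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) out[i] = w[i] / norm;
        return out;
    }
}
```

With this, [1,2,3,4,5] and [2,4,6,8,10] normalize to the exact same gene, so the "growing without changing" problem disappears.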
With the 8 features we used, one game usually takes at least 2 hours to finish. Suppose we run 100 genes, each repeated over 10 tests; that’s 1,000 games in total. At 2 hours per game, your AI might very well take 3 months to evolve a single generation. So we increased the probability of the Z and S pieces, since these pieces create more holes and make a game much harder (and therefore much shorter). Indeed, the best gene we got could clear 26 million lines, yet it failed miserably after 800 lines on a game with high Z and S frequency. We used 3/11 for Z and S, and 1/11 for the other pieces.
The number of lines cleared per game follows an exponential distribution; I’ll let you think about why instead of explaining it here. Because of this, you are quite likely to measure a fitness worse than the true expected fitness if you only run 3-5 tests per gene per generation. We used 15 tests per gene per generation. I think this is still too few, and at least 50 would be better, but we couldn’t afford the time. Remember, this project was a last-minute heroic effort.
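If you want to convince yourself of the skew, here is a small simulation (my own toy demo, not project code): because the exponential distribution is right-skewed, the average of a handful of samples falls below the true mean more often than above it.

```java
// Demo: how often does the mean of a few exponential samples
// underestimate the true mean? For 5 samples the answer is ~56%.
import java.util.Random;

public class SkewDemo {
    // Fraction of trials where the mean of `tests` samples from an
    // exponential distribution with mean 1 comes out below 1.
    public static double underestimateRate(int tests, int trials, long seed) {
        Random rng = new Random(seed);
        int under = 0;
        for (int t = 0; t < trials; t++) {
            double sum = 0;
            for (int i = 0; i < tests; i++) {
                sum += -Math.log(1 - rng.nextDouble()); // exponential sample
            }
            if (sum / tests < 1.0) under++;
        }
        return (double) under / trials;
    }

    public static void main(String[] args) {
        System.out.println(underestimateRate(5, 100000, 42));
    }
}
```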
Lookahead means considering all possibilities for the next 2, or 3, or 4, or 999 pieces instead of only the current one. However, say the bot takes 1 ms to evaluate one state. The O piece has 9 possible moves, the Z, S and I pieces have 17 each, and the T, L and J pieces have 34 each, for 162 possible moves in total. So one-stage lookahead is about 160 times slower, and two-stage lookahead about 26,000 times slower. Our bot takes 2 hours to play one game; with just one lookahead stage, it would take about 2 weeks instead. So while lookahead seems promising, it’s not really practical to use.
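The move counts can be rederived from the piece widths: a piece of width w in a given orientation has 10 − w + 1 possible columns. A quick sketch (the widths below are the standard piece shapes; my own code, not the skeleton’s):

```java
// Recomputing the per-piece move counts from orientation widths
// on a 10-column board.
public class MoveCount {
    public static int movesFor(int[] widths) {
        int moves = 0;
        for (int w : widths) moves += 10 - w + 1;
        return moves;
    }

    public static void main(String[] args) {
        int o = movesFor(new int[]{2});            // O: 9
        int i = movesFor(new int[]{4, 1});         // I: 17
        int sz = movesFor(new int[]{3, 2});        // S, Z: 17 each
        int tlj = movesFor(new int[]{3, 2, 3, 2}); // T, L, J: 34 each
        System.out.println(o + i + 2 * sz + 3 * tlj); // 162
    }
}
```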
Multi-threading and renting powerful computers
We used multi-threading to run multiple game instances in parallel to cut down the training time. If you don’t know what multi-threading is, go read up on it a little. We also made extensive use of the SoC Compute Cluster and free AWS Educate credits to run our bot. If you have a server at home, you can use it.
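A sketch of the parallel evaluation using a plain Java thread pool; `playGame` here is a placeholder for a full game simulation returning lines cleared, not our actual code:

```java
// Sketch: evaluating one gene by running several games in parallel
// and averaging their scores.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEval {
    public static double averageScore(double[] weights, int games, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        for (int g = 0; g < games; g++) {
            results.add(pool.submit(() -> playGame(weights)));
        }
        long total = 0;
        try {
            for (Future<Integer> f : results) total += f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return (double) total / games;
    }

    // Placeholder: a real implementation would simulate a full Tetris
    // game with the given weights and return the lines cleared.
    static int playGame(double[] weights) {
        return 0;
    }
}
```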
This one I will explain in the training algorithms below.
Generating the initial population was covered by one of the guys in my team, so I’m not fully clear on the details. For the zeroth generation, he used quasi-randomly generated weight vectors: instead of generating them purely at random, he tried to make them as uniformly spread out as possible.
We followed this article. While the article claimed to clear millions of lines, when we implemented it with the numbers given there, we could only clear 5k lines at most. Furthermore, the bot converged very slowly and often got stuck on weird peaks where all weights were positive (when only one should be positive and the rest negative). The problems were the bad heuristics and the constant noise / mutation factor. With a constant mutation factor, you can very easily jump out of a good peak and land on a bad one. If you want to try GA, I recommend a decaying mutation factor: make the mutation amount smaller as the bot evolves.
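A decaying mutation factor can be as simple as an exponentially shrinking Gaussian step. This is a sketch of the idea, not our exact code; the parameter names and decay schedule are my own:

```java
// Sketch: Gaussian mutation whose magnitude decays exponentially
// with the generation number, so late generations fine-tune instead
// of jumping out of good peaks.
import java.util.Random;

public class Mutation {
    public static double[] mutate(double[] gene, int generation,
                                  double initialSigma, double decay, Random rng) {
        double sigma = initialSigma * Math.pow(decay, generation);
        double[] out = gene.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] += sigma * rng.nextGaussian();
        }
        return out;
    }
}
```

With `initialSigma = 1.0` and `decay = 0.9`, the mutation magnitude at generation 100 is about 2.7e-5, so by then the search is essentially local.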
The result with pure GA was disastrous. The bot couldn’t clear more than 20k lines, and it converged so slowly that at times I wanted to throw my laptop out of the window (there’s a word for this: defenestration).
Noisy Cross Entropy
For the positive weight (lines cleared), I used 0.5 as the mean; for the negative weights, -0.5. All of them had standard deviation 1. Then I generated 100 genes from these normal distributions.
We used these values for this project. All of them are magic numbers; don’t ask why.
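For reference, a sketch of the sampling step only (the means and deviations are the starting values from above; the noisy refit-to-elite step of NCE is omitted, and the code is my own illustration):

```java
// Sketch: the cross-entropy sampling step. Each generation draws genes
// from per-feature normal distributions; the full algorithm would then
// evaluate them and refit the means/deviations to the top performers,
// adding noise in the "noisy" variant.
import java.util.Random;

public class CrossEntropy {
    public static double[][] sampleGenes(double[] mean, double[] stdDev,
                                         int count, Random rng) {
        double[][] genes = new double[count][mean.length];
        for (int g = 0; g < count; g++) {
            for (int i = 0; i < mean.length; i++) {
                genes[g][i] = mean[i] + stdDev[i] * rng.nextGaussian();
            }
        }
        return genes;
    }

    public static void main(String[] args) {
        // 8 features: lines cleared positive, the rest negative, as above.
        double[] mean = {0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5};
        double[] stdDev = {1, 1, 1, 1, 1, 1, 1, 1};
        double[][] genes = sampleGenes(mean, stdDev, 100, new Random(0));
        System.out.println(genes.length + " genes of length " + genes[0].length);
    }
}
```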
The bot converged very fast with NCE. We got a bot that cleared over 5 million lines on average; the best test run was 25.7 million lines. Training was very quick, only a few hours.
May 16, 2018 update: Apparently we got the highest mark out of all groups.