Monday, March 19, 2012

Parallel Game Architecture, Part 1

In this post I will describe the core idea behind the game architecture I chose. If you are more interested in the implementation, have a look at the next post.

The Motivation
Last year I came across the Intel Smoke Demo while researching for the Multicore seminar I attended. Fortunately, I was allowed to pick my own topic, so I chose "Multicore Programming in Computer Games".
After the seminar I was really motivated to use what I had learned to build my own game, so I used the Intel Smoke Demo as a template for my own game architecture. The architecture in the demo supports n-way threading, which is a real advantage if you consider the graph below.
Source: Steam Hardware Survey
You can see that there is a big diversity in the number of CPU cores per gaming PC (the Xbox 360 has three cores, the PS3 has one main core plus six SPU cores available to games). This diversity will only grow as six- and eight-core CPUs become more common.
In the early days of multicore game programming, many games utilized multicore CPUs by splitting their game loop into a render part and a game-logic part. Both parts were then executed separately on their own threads. QUAKE 4, for example, took this approach (see these slides for more info). On the Xbox 360, the game loop was often split into three parts to utilize all cores (have a look at these Kameo slides for instance).
This approach is called functional decomposition. It works well, but it specializes the game for a fixed number of cores: if you have more cores than subsystems, you are wasting CPU power.
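The split can be sketched in a few lines of C++. This is a toy example of my own (not actual code from QUAKE 4 or Kameo): the logic thread publishes each finished frame into a mutex-guarded slot, and the render thread always draws the newest published frame, so the two loops can overlap.

```cpp
#include <mutex>
#include <thread>

// The frame data the logic thread hands to the renderer.
struct FrameState { int frameNumber = 0; };

// Holds the most recently completed frame, guarded by a mutex.
class SharedFrame {
public:
    void Publish(const FrameState& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = s;
    }
    FrameState Snapshot() {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }
private:
    std::mutex mutex_;
    FrameState latest_;
};

// Runs the two decomposed loops and returns the last frame the renderer saw.
int RunDecomposed(int totalFrames) {
    SharedFrame shared;
    int lastDrawn = 0;

    std::thread logic([&] {
        FrameState s;
        for (int f = 1; f <= totalFrames; ++f) {
            s.frameNumber = f;   // simulate the game update for this frame
            shared.Publish(s);   // hand the finished frame to the renderer
        }
    });

    std::thread render([&] {
        while (lastDrawn < totalFrames)
            lastDrawn = shared.Snapshot().frameNumber;  // "draw" the newest frame
    });

    logic.join();
    render.join();
    return lastDrawn;
}
```

A real engine would double-buffer the frame data instead of copying it under a lock, but the structure (two fixed threads, renderer one frame behind) is the same.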
An n-way architecture, like the one used in the Intel Smoke Demo, tries to utilize all cores that are available on the machine. This can be achieved by additionally taking advantage of data parallelism. Simply put, this means calculating for loops in parallel (i.e., distributing loop iterations among threads). For example, if you have 1000 game objects and four threads, you can calculate 250 object updates on each thread, ideally taking only 25% of the time (exactly 25% only if each update takes the same amount of time).
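Such a data-parallel update loop can be sketched like this (the GameObject type and the slicing scheme are my own illustrative choices, not code from the Smoke demo): each thread gets a contiguous slice of the object array and updates it independently.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical game object: each update just advances a position.
struct GameObject {
    float position = 0.0f;
    float velocity = 1.0f;
    void Update(float dt) { position += velocity * dt; }
};

// Distribute the update loop across numThreads threads by giving
// each thread a contiguous slice of the object array.
void ParallelUpdate(std::vector<GameObject>& objects, float dt,
                    std::size_t numThreads) {
    std::vector<std::thread> workers;
    const std::size_t chunk = (objects.size() + numThreads - 1) / numThreads;
    for (std::size_t t = 0; t < numThreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(begin + chunk, objects.size());
        if (begin >= end) break;  // fewer objects than threads
        workers.emplace_back([&objects, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                objects[i].Update(dt);
        });
    }
    for (auto& w : workers) w.join();
}
```

In a real engine you would submit these slices to a persistent thread pool instead of spawning threads every frame, but the slicing logic stays the same.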

The Idea
The core idea behind the Smoke Demo is best illustrated in two pictures.

1. Execution Phase with four threads

In the picture above you can see all the calculations needed for one frame. Each system (AI, Physics, Graphics, Audio, Input) is scheduled onto one of the four available worker threads. Once a system is scheduled, it can split its work into subtasks to take advantage of data parallelism (AI and Physics are doing this in the picture).
Every system works on its own copy of the game data. This is needed to make the systems independent from each other, so that they can run in parallel. After each frame, each system's data copy is updated by the Change Manager (see picture below).
To make this possible, every relevant game data change has to be queued up in the Change Manager beforehand (symbolized by the arrows in picture one).
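The queuing idea can be sketched like this (the class and type names here are my own, not the Smoke demo's actual interfaces): during the execution phase, systems post changes concurrently; between frames, the Change Manager replays all queued changes to every registered observer, which would be the other systems updating their private data copies.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <vector>

// Kinds of changes a system might report (illustrative subset).
enum class ChangeType { Position, Orientation };

// One queued change to a game object's data.
struct Change {
    std::uint32_t objectId;
    ChangeType type;
    float value[3];
};

class ChangeManager {
public:
    // Called concurrently by systems during the execution phase.
    void QueueChange(const Change& c) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(c);
    }

    // Called single-threaded between frames: hand every queued change
    // to each observer so it can update its private data copy.
    void DistributeChanges(
        const std::vector<std::function<void(const Change&)>>& observers) {
        for (const Change& c : pending_)
            for (const auto& observer : observers)
                observer(c);
        pending_.clear();
    }

private:
    std::mutex mutex_;
    std::vector<Change> pending_;
};
```

The single mutex is the simplest thing that works for a sketch; a production version would use per-thread queues to avoid lock contention during the execution phase.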

2. Change distribution Phase
To sum this all up, here is a short list of the pros and cons of this architecture:

  • Performance: Scales to an arbitrary number of CPU cores.
  • Good maintainability: Strict separation of concerns between the systems. You can even replace a system completely without affecting the others.
  • Higher memory consumption: Every system needs its own copy of the game data it uses.
  • Overhead: Caused by change collection and distribution.
  • Latency: All systems work on old data (from the last frame).
In the next post I will describe my implementation of the Smoke architecture.
