Wednesday, March 21, 2012

Parallel Game Architecture, Part 2

As promised, here is the second part of my game architecture post.
Last week I coded something I had wanted in my game for a long time: a scheduling visualizer! With it you get a much better feeling for how all the parallel work is scheduled. In the picture below you can see 20 ms of game time. Each row represents one thread and the rightmost frame is the last frame rendered. Based on this picture I will try to explain how my architecture works. To see the scheduling visualizer in action, jump to the video at the bottom.

20ms of game time: Each row represents one thread (ThreadId is shown at the beginning of a row).
First, there are four systems, with the following colors in the diagram:
  • Graphics: Green
  • Physics: Blue
  • AI: Orange
  • Input: Turquoise
Each system is implemented as a separate Windows Game Library. To fit into the framework, a system has to provide a few objects with specific interfaces. For example, the ISystemScene interface is needed for object creation and destruction. The other important interface is ISystemTask. At the beginning of a frame, the framework calls every system's ISystemTask.Update(DeltaTime) method in parallel; all of a system's work for the frame is done in this update method. That gives us functional parallelism. Additionally, we want data parallelism. To achieve this, every system gets a reference to the framework's TaskManager (which uses the ParallelTasks library) on initialization. A system can use the TaskManager to issue work that should be executed in parallel.
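To make this a bit more concrete, here is a minimal sketch of how these interfaces might look. ISystemScene, ISystemTask and Update(DeltaTime) are the names used above; everything else (the member names, the ITaskManager and ITask types) is an assumption I use purely for illustration, not the framework's actual API.

using System;

// Scene interface: used by the framework for object creation and destruction.
public interface ISystemScene
{
    object CreateObject(string name);       // assumed signature
    void DestroyObject(object sceneObject); // assumed signature
}

// Task interface: Update is called once per frame, in parallel with the
// Update calls of all other systems (functional parallelism).
public interface ISystemTask
{
    void Update(float deltaTime);
}

// Hypothetical view of the framework's task manager, handed to every system
// on initialization so it can issue extra parallel work (data parallelism).
public interface ITaskManager
{
    ITask Start(Action work);
    void ParallelFor(int fromInclusive, int toExclusive, Action<int> body);
}

public interface ITask
{
    void Wait();
}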

Graphics System 
One example of this can be seen in the picture. The graphics system issues a task that updates the positions of all particles in the scene; that is the single small green rectangle. The graphics system issues this task at the beginning of the frame rendering. At the end of the frame rendering, the particles are drawn. If the task hasn't finished by that point, the graphics system has to wait. This happens very rarely, but you never know what the OS scheduler does with your threads.
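In code, the pattern looks roughly like this. It is only a sketch built on the hypothetical ITaskManager/ITask types from above; UpdateParticles, DrawScene and DrawParticles are stand-ins for the graphics system's internals.

// Sketch of the graphics system's frame update (names are hypothetical).
public class GraphicsTask : ISystemTask
{
    private readonly ITaskManager taskManager; // provided by the framework

    public GraphicsTask(ITaskManager taskManager)
    {
        this.taskManager = taskManager;
    }

    public void Update(float deltaTime)
    {
        // Kick off the particle update at the beginning of frame rendering.
        ITask particleTask = taskManager.Start(() => UpdateParticles(deltaTime));

        // Render everything else while the particles are updated in parallel.
        DrawScene();

        // Wait for the particle task before drawing the particles.
        // This rarely blocks, but it guarantees the data is ready.
        particleTask.Wait();
        DrawParticles();
    }

    private void UpdateParticles(float deltaTime) { /* ... */ }
    private void DrawScene() { /* ... */ }
    private void DrawParticles() { /* ... */ }
}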

Physics System
Real data parallel work is done in the physics system. If you look at the picture you can identify four data parallel parts. To be honest, I don't know exactly what is happening there. Jitter, the physics library I use, was already parallelized, and luckily the library is implemented very well and the source code is available too. I only had to replace Jitter's ThreadManager with my own. My custom ThreadManager uses a parallel for loop, provided by the TaskManager, to iterate over the work that Jitter issues. The loop iterations are then distributed among the worker threads. Jitter is really nicely designed; only a few lines of code were needed to achieve this.
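The idea looks roughly like the sketch below. Jitter's real ThreadManager API may differ from what I show here; the point is simply that the work items Jitter queues up get executed by the TaskManager's parallel for loop instead of Jitter's own threads.

using System;
using System.Collections.Generic;

// Sketch of a replacement thread manager for Jitter (member names assumed).
public class ParallelForThreadManager
{
    private readonly ITaskManager taskManager;            // the framework's task manager
    private readonly List<Action> pendingWork = new List<Action>();

    public ParallelForThreadManager(ITaskManager taskManager)
    {
        this.taskManager = taskManager;
    }

    // Jitter calls this for every chunk of work it wants executed in parallel.
    public void AddTask(Action work)
    {
        pendingWork.Add(work);
    }

    // Runs all queued work items; the parallel for distributes the
    // iterations among the worker threads.
    public void Execute()
    {
        taskManager.ParallelFor(0, pendingWork.Count, i => pendingWork[i]());
        pendingWork.Clear();
    }
}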

Change Distribution
Between frames you can see gaps roughly 0.6 ms wide. This is the time needed to distribute the changes made by the systems. For instance, the physics system may have changed the position of an object. The graphics system needs this position to draw the object at the correct place in the next frame.
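I haven't described the change distribution mechanism in detail, but conceptually it could look like the sketch below: systems queue their changes into a thread-safe buffer during the parallel update, and the framework forwards them to interested systems in the gap between frames. All names here are hypothetical.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using Microsoft.Xna.Framework;

// A single recorded change, e.g. a new object position from the physics system.
public struct PositionChange
{
    public int ObjectId;
    public Vector3 NewPosition;
}

public class ChangeManager
{
    private readonly ConcurrentQueue<PositionChange> changes = new ConcurrentQueue<PositionChange>();
    private readonly List<Action<PositionChange>> observers = new List<Action<PositionChange>>();

    // E.g. the graphics system subscribes to position changes.
    public void Subscribe(Action<PositionChange> observer)
    {
        observers.Add(observer);
    }

    // Called by a system (e.g. physics) during its parallel update.
    public void QueueChange(PositionChange change)
    {
        changes.Enqueue(change);
    }

    // Called by the framework after all systems have finished their updates;
    // this is the ~0.6 ms gap visible between frames in the diagram.
    public void DistributeChanges()
    {
        PositionChange change;
        while (changes.TryDequeue(out change))
        {
            foreach (Action<PositionChange> observer in observers)
                observer(change);
        }
    }
}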

To finish, here's a video where you can see the scheduling visualizer in action. The scene contains 2400 asteroids and 60 space ships with a rudimentary AI (only roaming). Without multithreading I get 40 frames per second, with multithreading 60 frames per second. Without video capturing the speedup is even better (50 fps single-threaded vs. 90 fps multithreaded). Nevertheless, with only two systems doing hard work (physics and graphics), the four CPU cores can't be utilized fully. This may change when I implement more stuff in the AI system. Getting more data parallelism out of the physics system would also improve the utilization, but I don't think that can be achieved easily and it's not worth the hassle.
Hardware used: Core 2 Quad Q9300 @ 2.5 GHz, Radeon HD 6700


XNA multithreading from Jan Tepelmann on Vimeo.
