
Wednesday, November 13, 2013

Game Architecture and Networking, Part 2

In this post I will talk in more detail about my game architecture and how the networking is realized in it. For the basics of my architecture, see these two old blog posts from last year:
Parallel Game Architecture, Part 1
Parallel Game Architecture, Part 2
In short, my architecture consists of 6 subsystems which all work on their own copy of a subset of game data. Because they work on a copy, they can be updated concurrently. If a subsystem changes its game data copy, it has to queue the change up in the change manager. After all subsystems are finished updating, the change manager distributes the changes to the other subsystems. Right now my engine has the following 6 subsystems:
  • Input: responsible for reading the input from the gamepad or keyboard
  • Physics: collision detection and physical simulation of the world (uses the Jitter Physics Engine)
  • Graphics and Audio: uses the SunBurn Engine and the Mercury Particle Engine
  • LevelLogic: responsible for checking winning/losing conditions, creating/destroying objects, and so on
  • Networking: communicates with other peers over the network
  • AI: calculates the InputActions for game entities that are not human controlled (allies, enemies, homing missiles, ...)
In the picture below you can see how a player-controllable game entity (for example, the player's space ship) is represented in the game architecture. Each box corresponds to an object in the respective subsystem.
Representation of a player controllable entity on the host and a client system. The numbers in the white circles show the latency in number of frames.
In the input system the pressed buttons are read and translated to InputActions (Accelerate, Rotate, Fire Weapon, ...). These InputActions are interpreted by the physics system, which is the central subsystem of the engine. It informs the other systems about the physical state of all game entities: current position, current acceleration, collisions, and so on. This information is needed by the graphics and audio system to render the game entities' models at the right positions, play sounds, and/or trigger particle effects.
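To make this concrete, here is a minimal sketch of how such a translation could look. The InputAction type and its members are illustrative for this post, not the engine's actual API:

```csharp
using System.Collections.Generic;
using Microsoft.Xna.Framework.Input;

public enum InputActionType { Accelerate, Rotate, FireWeapon }

public struct InputAction
{
    public InputActionType Type;
    public float Amount;   // e.g. thrust strength or rotation speed
}

public static class InputTranslator
{
    // Reads the gamepad state and produces this frame's InputActions.
    public static List<InputAction> Translate(GamePadState pad)
    {
        var actions = new List<InputAction>();

        if (pad.Triggers.Right > 0f)
            actions.Add(new InputAction { Type = InputActionType.Accelerate, Amount = pad.Triggers.Right });

        if (pad.ThumbSticks.Left.X != 0f)
            actions.Add(new InputAction { Type = InputActionType.Rotate, Amount = pad.ThumbSticks.Left.X });

        if (pad.Buttons.A == ButtonState.Pressed)
            actions.Add(new InputAction { Type = InputActionType.FireWeapon, Amount = 1f });

        return actions;
    }
}
```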

Host Side Networking
The networking system has to send the physical state of the game entities to all clients. For important game entities like player ships this is done a few times per second; for AI objects much less frequently, because AI objects behave more or less deterministically (as long as client and host stay in sync!). Some important physical state changes trigger an immediate network send. For example, if a game entity is firing, we want to create the fired projectile on the clients as fast as possible, to minimize the divergence between the host world and the client worlds.
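Sketched out, the send policy could look like this. The entity fields, the concrete send rates, and SendPhysicalState are placeholders for illustration, not the engine's actual code:

```csharp
// Hypothetical sketch of the host-side send policy described above.
public class NetworkedEntity
{
    public bool IsPlayerControlled;
    public bool HasFiredThisFrame;   // example of an "important" change
    public float TimeSinceLastSend;
}

public class HostNetworking
{
    public void Update(NetworkedEntity entity, float elapsedSeconds)
    {
        entity.TimeSinceLastSend += elapsedSeconds;

        // Player ships are sent a few times per second, AI objects far
        // less often because they behave (mostly) deterministically.
        float sendInterval = entity.IsPlayerControlled ? 0.1f : 1.0f;

        if (entity.HasFiredThisFrame || entity.TimeSinceLastSend >= sendInterval)
        {
            SendPhysicalState(entity);
            entity.TimeSinceLastSend = 0f;
        }
    }

    void SendPhysicalState(NetworkedEntity entity)
    {
        // Serialize position, velocity, acceleration, ... and send it
        // to all clients (stubbed out here).
    }
}
```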

Client Side Networking
On the client side, the networking system receives the physical state and calculates a prediction based on the latency, the received velocity, and the current acceleration. In the next change distribution phase the current physical state in the physics system is overwritten by the predicted physical state. It then takes one more frame until the information is actually visible on the client side. On the Xbox my game runs at 60 frames per second on average. This means that if a player fires her weapon, this is seen at the earliest 66 ms (4 frames) + X ms later on another peer. X depends on the latency and on when the network packet is received: if the packet arrives after the networking system has already finished updating, it has to wait until the next frame to get processed.
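The prediction itself is basic dead reckoning. A sketch, assuming XNA's Vector3 and a measured latency in seconds:

```csharp
using Microsoft.Xna.Framework;

public static class DeadReckoning
{
    // Extrapolates the received state by the measured latency:
    // p' = p + v*t + 0.5*a*t^2  and  v' = v + a*t.
    public static void Predict(
        Vector3 position, Vector3 velocity, Vector3 acceleration,
        float latencySeconds,
        out Vector3 predictedPosition, out Vector3 predictedVelocity)
    {
        float t = latencySeconds;
        predictedPosition = position + velocity * t + 0.5f * acceleration * t * t;
        predictedVelocity = velocity + acceleration * t;
    }
}
```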

Conclusion
The latency is a disadvantage of this game architecture. It could be reduced, for example, by merging the input system into the physics system; a separate input system does not improve performance. But merging the two systems would reduce maintainability, so I have no plans to do this in the near future. The separation of concerns between the systems is a really big plus: you could even replace a system completely without affecting the other systems.
One important part of my networking code is not finished yet: the interpolation. Right now the old physical state is simply overwritten by the new one received from the host. If the two states differ significantly, the movement of the game entity looks jerky. To solve this, the difference between the old and the new physical state can be spread evenly over the next few frames, as in the sketch below.
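A sketch of what that smoothing could look like; the smoothing window of six frames is an assumption, not a tuned value:

```csharp
using Microsoft.Xna.Framework;

// Instead of snapping to the new state, spread the positional error
// evenly over the next few frames. Field names are illustrative.
public class SmoothedEntityState
{
    const int SmoothingFrames = 6;   // assumed smoothing window
    Vector3 correctionPerFrame;      // error slice applied each frame
    int framesLeft;

    public Vector3 Position;

    public void OnHostStateReceived(Vector3 predictedHostPosition)
    {
        // Distribute the difference between our state and the
        // (predicted) host state over the next few frames.
        Vector3 error = predictedHostPosition - Position;
        correctionPerFrame = error / SmoothingFrames;
        framesLeft = SmoothingFrames;
    }

    public void ApplySmoothing()   // called once per frame
    {
        if (framesLeft > 0)
        {
            Position += correctionPerFrame;
            framesLeft--;
        }
    }
}
```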

Wednesday, March 21, 2012

Parallel Game Architecture, Part 2

As promised, here is the second part of my game architecture post.
Last week I coded something I had wanted in my game for a long time: a scheduling visualizer! With it you get a much better feeling for how all the parallel stuff works. In the picture below you can see 20 ms of game time. Each row represents one thread, and the rightmost frame is the last frame rendered. Using the picture, I will try to explain how my architecture works. To see the scheduling visualizer in action, jump to the video at the bottom.

20 ms of game time: each row represents one thread (the ThreadId is shown at the beginning of each row).
Firstly, we have four Systems with the following colors in the diagram:
  • Graphics: Green
  • Physics: Blue
  • AI: Orange
  • Input: Turquoise
Each of the systems is implemented as a separate Windows Game Library. To fit into the framework, a system has to provide a few objects with specific interfaces. For example, the ISystemScene interface is needed for object creation and destruction. The other important interface is ISystemTask. At the beginning of a frame, the framework calls every system's ISystemTask.Update(DeltaTime) method in parallel; all the work of the system is done in this update method. So far we have functional parallelism. Additionally, we want data parallelism. To achieve this, every system gets a reference to the framework's task manager (which uses the ParallelTasks library) on initialization. A system can use the task manager to issue work that should be executed in parallel. A rough sketch of this contract follows below.
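The interface names below come straight from the framework, but their members are simplified guesses, and Parallel.ForEach from the TPL stands in for the ParallelTasks task manager just to keep the sketch self-contained:

```csharp
using System.Threading.Tasks;

public interface ISystemScene
{
    object CreateObject(string name);
    void DestroyObject(object obj);
}

public interface ISystemTask
{
    void Update(float deltaTime);
}

public class FrameworkSketch
{
    ISystemTask[] systemTasks;   // one task per system

    public FrameworkSketch(ISystemTask[] tasks) { systemTasks = tasks; }

    public void RunFrame(float deltaTime)
    {
        // Functional parallelism: every system updates concurrently,
        // each on its own copy of the game data.
        Parallel.ForEach(systemTasks, task => task.Update(deltaTime));

        // Afterwards the queued changes are distributed between the
        // systems (the change distribution phase, see below).
        DistributeChanges();
    }

    void DistributeChanges() { /* change distribution phase */ }
}
```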

Graphics System 
One example of this can be seen in the picture. The graphics system issues a task that updates the positions of all particles in the scene; this is the single small green rectangle. The graphics system issues this task at the beginning of the frame rendering. At the end of the frame rendering, the particles are drawn. If the task hasn't finished by that point in time, the graphics system has to wait. This happens very rarely, but you never know what the OS scheduler does with your threads.
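In code, the issue-then-wait pattern is roughly this; again, Task from the TPL stands in for the engine's task manager, and the method names are illustrative:

```csharp
using System.Threading.Tasks;

public class GraphicsSystemSketch
{
    Task particleTask;

    public void RenderFrame()
    {
        // Issue the particle update at the beginning of frame rendering.
        particleTask = Task.Factory.StartNew(UpdateAllParticlePositions);

        RenderModels();

        // Particles are drawn at the end of the frame. If the task is
        // not finished by now, the graphics system has to wait.
        particleTask.Wait();
        DrawParticles();
    }

    void UpdateAllParticlePositions() { /* advance each particle */ }
    void RenderModels() { }
    void DrawParticles() { }
}
```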

Physics System
Real data-parallel work is done in the physics system. If you look at the picture you can identify four data-parallel parts. To be honest, I don't know exactly what is happening there: Jitter, the physics library I use, was already parallelized. Luckily the library is implemented very well, and the source code is available too. I only had to replace Jitter's Threadmanager with my own. My custom Threadmanager uses a parallel for loop, provided by the task manager, to iterate over the work issued by Jitter; the for-loop iterations are then distributed among the worker threads. Jitter is really nicely designed; only a few lines of code were needed to achieve this.
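The replacement boils down to something like this sketch. Jitter's real Threadmanager hook and its exact signatures may differ, and Parallel.For from the TPL stands in for the task manager's parallel for loop:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Collect the work items Jitter issues, then run them through a
// parallel for so the iterations spread across worker threads.
public class ParallelForThreadManager
{
    readonly List<Action> work = new List<Action>();

    public void AddTask(Action task)   // Jitter issues its work here
    {
        work.Add(task);
    }

    public void Execute()              // run all issued work in parallel
    {
        Parallel.For(0, work.Count, i => work[i]());
        work.Clear();
    }
}
```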

Change Distribution
Between frames you can see gaps roughly 0.6 ms wide. This is the time needed to distribute the changes made by the systems. For instance, the physics system may have changed the position of an object; the graphics system needs this position to draw the object at the correct place in the next frame.

To finish, here's a video where you can see the scheduling visualizer in action. The scene contained 2400 asteroids and 60 space ships with a rudimentary AI (roaming only). Without multithreading we get 40 frames per second, with multithreading 60 frames per second. Without video capturing the speedup is even better (50 single-threaded vs. 90 multithreaded). Nevertheless, with only two systems doing hard work (physics and graphics), the four CPU cores can't be fully utilized. This may change when I implement more stuff in the AI system. Getting more data parallelism out of the physics system would also improve utilization, but I don't think that can be achieved easily, and it's not worth the hassle.
Hardware used: Intel Core 2 Quad Q9300 @ 2.5 GHz, Radeon HD 6700


XNA multithreading from Jan Tepelmann on Vimeo.

Monday, March 19, 2012

Parallel Game Architecture, Part 1

In this post I will describe the core idea behind the game architecture I chose. If you are more interested in the implementation, have a look at the next post.

The Motivation
Last year I came across the Intel Smoke Demo while researching for the multicore seminar I attended. Fortunately, I was allowed to pick my own topic, so I chose "Multicore Programming in Computer Games".
After the seminar I was really motivated to use the things I had learned to build my own game, so I used the Intel Smoke Demo as a template for my own game architecture. The architecture in the demo supports n-way threading, which is a really good thing if you consider the graph below.
Source: Steam Hardware Survey
You can see that there is a big diversity in the number of CPU cores per gaming PC (the Xbox 360 has three cores; the PS3 has one main core + 6 SPU cores). In the future there will be even more diversity as six- and eight-core CPUs become more common.
In the early days of multicore game programming, many games utilized multicore CPUs by splitting their game loop into a render part and a game logic part. Both parts were then executed on their own thread. QUAKE 4, for example, took this approach (see these slides for more info). On the Xbox 360, the game loop was often split into three parts to utilize all cores (have a look at these Kameo slides, for instance).
This approach is called functional decomposition. It works great, but you are specializing for a fixed number of cores: if you have more cores than subsystems, you are wasting CPU power.
An n-way architecture, like the one used in the Intel Smoke Demo, tries to utilize all cores available on the machine. This can be achieved by additionally taking advantage of data parallelism. Simply put, this means executing for loops in parallel (i.e., distributing for-loop iterations among threads). For example, if you have 1000 game objects and four threads, you can calculate 250 object updates on each thread, ideally taking only 25% of the time (exactly 25% only if each update takes the same amount of time).
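In code, that looks roughly like this, using the TPL's Parallel.For and a purely illustrative stand-in for a game object:

```csharp
using System.Threading.Tasks;

public static class DataParallelUpdate
{
    // The for-loop iterations are distributed among the worker threads.
    // With 1000 objects and four threads, each thread ideally updates
    // about 250 objects.
    public static void UpdateAll(SimObject[] objects, float deltaTime)
    {
        Parallel.For(0, objects.Length, i => objects[i].Update(deltaTime));
    }
}

public class SimObject   // illustrative stand-in for a game object
{
    public float Position, Velocity;
    public void Update(float dt) { Position += Velocity * dt; }
}
```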

The Idea
The core idea behind the Smoke Demo is best illustrated in two pictures.

1. Execution Phase with four threads

In the picture above you can see all the calculations needed for one frame. Each system (AI, Physics, Graphics, Audio, Input) is scheduled onto one of the four available worker threads. Once a system is scheduled, it can split its work into subtasks to take advantage of data parallelism (AI and Physics do this in the picture).
Every system works on its own copy of the game data. This is needed to make the systems independent of each other, so that they can run in parallel. After each frame, each system's game data copy is updated by the Change Manager (see picture below).
To make this possible, every relevant game data change has to be queued up in the Change Manager beforehand (symbolized by the arrows in picture one).
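Here is a minimal sketch of the Change Manager idea; all types are illustrative, not the Smoke Demo's actual classes:

```csharp
using System.Collections.Generic;

public class ChangeManagerSketch
{
    public struct Change
    {
        public int EntityId;
        public string Property;   // e.g. "Position"
        public object NewValue;
    }

    readonly List<Change> queued = new List<Change>();
    readonly object sync = new object();

    // Called by a system while the systems update in parallel.
    public void QueueChange(Change change)
    {
        lock (sync) { queued.Add(change); }
    }

    // Called by the framework in the change distribution phase: replay
    // the queued changes into the other systems' data copies.
    public void Distribute(IEnumerable<IChangeReceiver> receivers)
    {
        foreach (Change change in queued)
            foreach (IChangeReceiver receiver in receivers)
                receiver.Apply(change);
        queued.Clear();
    }
}

public interface IChangeReceiver
{
    void Apply(ChangeManagerSketch.Change change);
}
```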

2. Change Distribution Phase
To sum this all up, here is a short list of the pros and cons for this architecture:

Pros:
  • Performance: Scales to an arbitrary number of CPU cores.
  • Good maintainability: Strict separation of concerns between the systems. You can even replace a system completely without affecting the other systems.
Cons:
  • Higher memory consumption: Every system has to keep its own copy of the game data it needs.
  • Overhead: Caused by change collection and distribution.
  • Latencies: All systems work on old data (from the last frame).
In the next post I will describe my implementation of the Smoke architecture.