Background Part II


Cloud Graphical Rendering:

Background to Ascender’s Solution

Part II

Joel Isaacson

joel@ascender.com

www.ascender.com/remote-graphics

August 2013

Version 1.5

1 Difficulties By Design

The previous note describes difficulties intrinsic to cloud graphics, but there are also difficulties that stem from design decisions. As a concrete example of cloud graphics, we examine the current paradigm of cloud gaming, exemplified by the Nvidia Grid and AMD's Radeon Sky. In this design, an array of GPUs sits in the cloud; in the Nvidia Grid, each GPU can support up to two remote cloud gamers.

Game applications run in the cloud. The OpenGL code is executed there and each frame is rendered into pixels. The pixels of each frame are encoded with a GPU-based H264 codec and sent over the network to the remote cloud gamer, whose device decodes the H264 stream and displays it.
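Schematically, the cloud side of each frame looks like the sketch below. Every name here is a placeholder standing in for a pipeline stage, not part of any real Nvidia Grid API.

    # Sketch of the per-frame cloud-side flow described above; all names are
    # placeholders, not a real Grid API.

    def render_frame(game_state):
        # OpenGL executed on the cloud GPU; stand-in for a rendered 720p RGBA frame.
        return b"\x00" * (1280 * 720 * 4)

    def h264_encode(pixels):
        # GPU-based H264 encoder stage; stand-in yielding a roughly 20 KByte packet.
        return pixels[: len(pixels) // 180]

    def send_to_client(packet):
        # Network stage toward the remote cloud gamer.
        print(len(packet), "bytes sent for this frame")

    def cloud_frame(game_state):
        send_to_client(h264_encode(render_frame(game_state)))

    cloud_frame(game_state=None)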

1.1 Heavy Use Of Cloud Resources

The Nvidia Grid and the similarly designed AMD Radeon Sky are offerings from hardware manufacturers. Ever more expensive hardware is being proposed to solve the difficult problems of cloud gaming by brute force. This self-serving approach by hardware manufacturers may not be in the best interests of cloud gaming providers.

It is hard to understand the need to centralize GPU hardware given the current trend of ever more capable CPUs and GPUs in consumer devices. Nvidia itself is currently pushing the Tegra 4 processor, with a quad-core ARM Cortex-A15 and a quite capable 72-core GPU.

1.2 Heavy Use Of Network Resources

The standard video codec, H264, is a poor choice for video streams generated by computer gaming. The reason lies in the design principles of the MPEG family of video codecs, of which H264 is a member, and in general principles of information theory. MPEG was designed to compress photographically generated video streams efficiently. The video streams typically presented to the codec form a small subset of all possible video streams: only streams with the characteristic spatial and temporal correlations that make sense to the human visual system fall within the domain of MPEG compression. Essentially, MPEG can handle any stream that can be produced cinematographically.

The video streams generated by a computer game form a far more restricted subset of the streams that MPEG can handle efficiently. Exploiting knowledge of how the game's video stream is generated enables significant efficiencies in its compression and transmission.

A simple argument demonstrates these ideas. The published bandwidth of an Nvidia Grid H264 stream at a 30 fps refresh rate and 720p resolution is 600 KBytes/sec, which means that each hour of game play uses about 2.1 GBytes of network bandwidth. The test case we will examine as a typical game is the popular Android game Temple Run 2, whose app is 31 MBytes. If we run this game remotely via the Nvidia Grid, every 52 seconds of play transfers as much video data as a complete download of the game app. In the following notes we describe Ascender's rendering compression techniques and explain why they use a fraction of the network bandwidth of H264.
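The arithmetic is easy to check; the short Python calculation below reproduces the figures above, assuming decimal units (1 KByte = 1000 bytes).

    # Reproduces the bandwidth figures quoted above (decimal units assumed).
    stream_rate = 600e3                  # published H264 stream rate: 600 KBytes/sec
    bytes_per_hour = stream_rate * 3600
    print(bytes_per_hour / 1e9)          # -> 2.16, i.e. roughly 2.1 GBytes per hour of play

    app_size = 31e6                      # Temple Run 2 app: 31 MBytes
    print(app_size / stream_rate)        # -> 51.7, i.e. roughly 52 seconds of play per app-sized download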

Intuitively, there must be a better way to compress the video stream. After all, we estimate that about half the downloaded app consists of compressed textures and vertices, and since the program is an OpenGL program, the video stream consists solely of geometric and graphical transformations of those textures and vertices. We will see in the next section that transmitting the compressed rendering stream, rather than the compressed pixel stream, leads to a much more efficient way of exporting graphics remotely.
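A rough per-frame comparison makes the intuition concrete. All counts below are illustrative assumptions, not measurements of Temple Run 2.

    # Illustrative per-frame sizes: pixel data versus the rendering-level data that
    # produced it, given textures and vertices already resident on the client.
    raw_pixels = 1280 * 720 * 4        # one uncompressed 720p RGBA frame: ~3.7 MBytes
    h264_frame = 600e3 / 30            # published Grid rate at 30 fps: ~20 KBytes per frame

    # A rendering-level stream carries only draw calls: assume 100 per frame, each a
    # 4x4 float transformation matrix plus a small texture/mesh identifier.
    render_frame = 100 * (16 * 4 + 8)  # ~7 KBytes per frame, before any compression

    print(raw_pixels, h264_frame, render_frame)

Even uncompressed, the assumed rendering stream is already smaller than the H264 frame, and because successive frames typically reuse the same resident textures with only updated transformations, it also compresses well.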

1.3 Encoding Latency

H264 and the other members of the MPEG family were designed for environments where the real-time latency of the video image is not critical. For example, if the source is a DVD or Blu-ray disc, the compressed video is precomputed and completely available, and an arbitrary number of frames can be pre-buffered to allow smooth decompression. Live news feeds and sporting events can lag behind the real-time video stream by several seconds as long as the video flows smoothly. This latitude allowed the MPEG design to introduce the group of pictures (GOP), which, as the name indicates, is a group of successive pictures. Decoding a frame will typically reference frames within the GOP that appear both earlier and later than the current frame, so a frame cannot be decoded until all the frames it references have been received.
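A small example makes the delay concrete. The GOP layout below ("I B B P") and the 30 fps rate are illustrative assumptions, not the actual Grid encoder settings.

    # Illustrative sketch (not a real codec): how referencing a later frame inside
    # a GOP delays decoding.
    fps = 30
    interval = 1000 / fps                       # ms between frame arrivals

    display_order  = ["I", "B1", "B2", "P"]     # order the viewer must see the frames
    transmit_order = ["I", "P", "B1", "B2"]     # the referenced P frame is sent first
    arrival = {f: i * interval for i, f in enumerate(transmit_order)}

    # A frame can be decoded no earlier than the arrival of itself and everything it
    # references; the B frames here reference both the preceding I and the later P.
    refs = {"I": [], "P": ["I"], "B1": ["I", "P"], "B2": ["I", "P"]}
    ready = {f: max([arrival[f]] + [arrival[r] for r in refs[f]]) for f in display_order}

    for i, f in enumerate(display_order):
        ideal = i * interval                    # display time with zero added latency
        print(f, "ready at", round(ready[f], 1), "ms, ideal display at", round(ideal, 1), "ms")

    # B1 is ready only at ~66.7 ms but should be shown at ~33.3 ms, so the whole
    # stream must be buffered by at least one extra frame interval.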

When an MPEG codec is used in interactive gaming, many of these design assumptions become problematic and lead to unacceptable latency. Tuning the video codec to provide a low-latency stream adversely affects the compression efficiency and/or the video quality.

1.4 Image Quality

Compressed video compromises image quality for bandwidth. The compressed images are noticeably degraded.

1.5 Frame Rate

On the Android tablet (a Nexus 7) used for testing, Temple Run 2 runs close to the modern standard of 60 fps. The Nvidia Grid uses a frame rate of 30 fps, a compromise for high-quality games.

1.6 Rendering Latency

There are graphics systems that use a remote rendering-level protocol with local generation of pixels; X11 and its OpenGL extension GLX are well-known examples. Unfortunately, they are not useful in environments where the round-trip latency is large. If an OpenGL application is run remotely over a low-latency local network using GLX, it is possible to achieve a frame rate of a few tens of frames per second; the same application run over a network with tens of milliseconds of latency will yield an unacceptable frame rate. The reason is that each OpenGL command entails a round-trip delay while waiting for the routine's return value. Any practical remote rendering protocol must deal with OpenGL's return-value semantics.
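A rough model shows how quickly this degrades. The figure of 50 synchronous calls per frame below is purely an assumption chosen for illustration; real applications issue anywhere from a handful to many thousands of calls per frame.

    # Rough model of a protocol, like GLX, in which each command blocks on a round
    # trip before the next can be issued. The calls-per-frame figure is assumed.

    def max_fps(rtt_ms, sync_calls_per_frame):
        frame_time_ms = rtt_ms * sync_calls_per_frame
        return 1000.0 / frame_time_ms

    print(max_fps(rtt_ms=0.5, sync_calls_per_frame=50))   # low-latency LAN: ~40 fps
    print(max_fps(rtt_ms=40.0, sync_calls_per_frame=50))  # tens-of-ms WAN:  ~0.5 fps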

H264 video transport does not suffer from this type of rendering latency, since the protocol is simplex (it transfers data in one direction only).