Creating VR video trailers for cardboard games

Matthew Wellings 2-Feb-2016

Google's new VR video support in YouTube seems to be getting a lot of press in the last few weeks, giving people to opportunity to explore new spaces using video in 360 degree panorama and stereoscopic 3D. The ability to support these three features simultaneously gives us exiting new opportunities, not just for recorded video but also for the presentation of 3D rendered content. Unlike producing recorded VR video (or Omni‐Directional Stereo Video to give it it's formal title) 3D rendered ODS video does not require us to possess any special hardware or software. This leads us to condider the idea that VR games should have VR video trailers.

In this post we will walk through the steps required to make an ODS video of a simple cardboard game that I have made available on GitHub. This game is an adaptation of the Google Cardboard sample. Google was good enough to provide the community with a document explaining how ODS video works. This document provides us with some pseudo-code for both ray tracing and traditional methods. We will implement the method that they recommend for OpenGL rendering that has the advantage of not requiring a custom virtual camera. This means that the existing shades used for 3D content and the code that prepares your MVP matrices will not have to be changed. The major drawback of this method is that the scene has to be rendered in strips and will require the scene render function to be called many times (four times for each column of the output frame). This drawback will make rendering very slow but also means you may have to check your render function for any stray game logic/physics that should be called from onNewFrame() instead.

I will not go into too much detail about how ODS video works as Google have explained it well but it is worth mentioning that we are after a pair of equirectangular images stored one above the other Each image is presented to one eye as with regular stereo video. Each strip (column) of each image must be rendered, not just at a different angle, but from a slightly different position on a circle so that the stereo offset is maintained as the viewer rotates their perspective.

The first problem to resolve is how to encode video generated with OpenGL in an Android app (without breaking the existing propitiatory OpenGL code in the Cardboard SDK). Fortunately there is a good explanation of how to render OpenGL to video, with examples, on BigFlake. This method uses the OS provided MediaCodec object which gives access to the GPU's hardware video encoder (the hardware that encodes videos recorded with the phone's camera). Using this method we create a new OpenGL context shared with the context we are given by the SDK. In the code example I have provided a pair of classes that are a slight modification of an example from BigFlake.com

Let's look at some code, starting with the setup code in onSurfaceCreated(). Before we set up our video surface we will make a backup of the state the SDK provided for us:

mScreenEglDisplay = EGL14.eglGetCurrentDisplay();
mScreenEglDrawSurface = EGL14.eglGetCurrentSurface(EGL14.EGL_DRAW);
mScreenEglReadSurface = EGL14.eglGetCurrentSurface(EGL14.EGL_READ);
mScreenEglContext = EGL14.eglGetCurrentContext();

Then set up the encoder (in this case for full HD):

mVideoEncoder = new VideoEncoder();
mVideoEncoder.prepare(1920, 1080, 16000000, mScreenEglContext);

Notice that we passed the current (SDK provided) context, this will alow the OpenGL textures, shaders, VBOs etc to be accessible from both contexts.

Next we set up a texture to store the strip that we render to. This is done in the usual way:


int[] stripFramebufferArray = new int[1];
int[] stripTextureArray = new int[1];
int[] stripDepthRenderbufferArray = new int[1];
GLES20.glGenFramebuffers(1, stripFramebufferArray, 0);
stripFramebuffer=stripFramebufferArray[0];
GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, stripFramebuffer);
GLES20.glGenTextures(1, stripTextureArray, 0);
stripTexture=stripTextureArray[0];
GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, stripTexture);

GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGBA, 2, mVideoEncoder.height()/2, 0, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, null);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);

GLES20.glGenRenderbuffers(1, stripDepthRenderbufferArray, 0);
stripDepthRenderbuffer=stripDepthRenderbufferArray[0];
GLES20.glBindRenderbuffer(GLES20.GL_RENDERBUFFER, stripDepthRenderbuffer);
GLES20.glRenderbufferStorage(GLES20.GL_RENDERBUFFER, GLES20.GL_DEPTH_COMPONENT16, 2, mVideoEncoder.height()/2);

GLES20.glFramebufferRenderbuffer(GLES20.GL_FRAMEBUFFER, GLES20.GL_DEPTH_ATTACHMENT, GLES20.GL_RENDERBUFFER, stripDepthRenderbuffer);
GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER, GLES20.GL_COLOR_ATTACHMENT0, GLES20.GL_TEXTURE_2D, stripTexture, 0);

Notice that the strip is taller and wider than the final space it will be rendered to. The method the Google doc that we are following uses involes rendering the top and bottom halves of each strip (for each eye) in two parts, each with a 90 degree field of view, the first looking up at 45 degrees and the second looking down. The area we render the strip to is therefore one quarter the height of the final frame and one pixel wide. The extra size is used for areas where the reprojection stretches the image and to allow for anti-aliasing.
When we set up the video we switched to the new context. We restore the old one with:

EGL14.eglMakeCurrent(mScreenEglDisplay, mScreenEglDrawSurface, mScreenEglReadSurface, mScreenEglContext);

There is also some code to set up the fixed matrices:

Matrix.setIdentityM(lookup, 0);
Matrix.setIdentityM(lookdown, 0);
Matrix.rotateM(lookup, 0, -45, 1, 0, 0);
Matrix.rotateM(lookdown, 0, 45, 1, 0, 0);

Now we have finished with the set up let's look at the code in onNewFrame():

mScreenEglDisplay = EGL14.eglGetCurrentDisplay();
mScreenEglDrawSurface = EGL14.eglGetCurrentSurface(EGL14.EGL_DRAW);
mScreenEglReadSurface = EGL14.eglGetCurrentSurface(EGL14.EGL_READ);
mScreenEglContext = EGL14.eglGetCurrentContext();

int maxVideoFrames = 1;
if (frameNo<=maxVideoFrames) {
  mVideoEncoder.inputSurface().makeCurrent();
  mVideoEncoder.drain(false);
  Log.i(TAG, "Generating frame " + frameNo);
  generateVideoFrame(mVideoEncoder.width(), mVideoEncoder.height());
  mVideoEncoder.inputSurface().setPresentationTime((long)(frameNo*(1000000000f/30f)));
  mVideoEncoder.inputSurface().swapBuffers();
}
if (frameNo==maxVideoFrames)
{
  mVideoEncoder.drain(true);
  mVideoEncoder.release();
  Log.i(TAG, "Recording Finished");
}

EGL14.eglMakeCurrent(mScreenEglDisplay, mScreenEglDrawSurface, mScreenEglReadSurface, mScreenEglContext);

This code will backup the context. If we are still recording switch to rendering context, drain the buffer, render the frame with generateVideoFrame(), set the timestamp, push the frame with swapBuffers() and finally restore the Cardboard SDK provided context. The function generateVideoFrame() will produce one video frame. Once the above code is finished the onNewFrame() function will return and the Cardboard SDK will call onDrawEye() as usual.

Now let's look at the code within generateVideoFrame() starting with setting the 90 degree vertical field of view perspective matrix:

Matrix.perspectiveM(perspective, 0, 90, (float)1/(float)height,0.5f,10);

Create some more matrices including setting up where the first strip will be rendered in the final frame:

float[] stripPaint = new float[16];
Matrix.setIdentityM(stripPaint, 0);
Matrix.scaleM(stripPaint, 0, 1f/(float)width, -1f/4f, 1);
Matrix.translateM(stripPaint, 0, 0, -3f, 0);
Matrix.translateM(stripPaint, 0, -((float)width-1f), 0, 0);

Set the amount to rotate by for each new strip:

//Pixel angular width:
float apwidth = 360f/(float)width;

All the strips must be rendered for each eye and, within that loop, for each of looking up and looking down:

for (int whichEye = 0; whichEye<2; whichEye++) {
  for (int upDown = 0; upDown<2; upDown++) {

Set the halfeye matrix for looking up or looking down:

    if (upDown==0)
      Matrix.multiplyMM(halfEye, 0, lookup, 0, eye, 0);
    else
      Matrix.multiplyMM(halfEye, 0, lookdown, 0, eye, 0);

The inner loop renders each of the strips looking at the correct angle and from different view points:

    for (int i = 0; i < width; i++) {
      float angleDeg = apwidth * (float) i - 180;
      float angleRad = -angleDeg / 360f * (2f * (float) Math.PI);
      Matrix.setRotateM(rotation, 0, angleDeg, 0, 1, 0);
      Matrix.setIdentityM(eyePos, 0);
      if (whichEye==1)
	Matrix.translateM(eyePos, 0, (float) -Math.cos(angleRad) * ipd_2, 0, (float) Math.sin(angleRad) * ipd_2);
      else
	Matrix.translateM(eyePos, 0, (float) -Math.cos(angleRad+Math.PI) * ipd_2, 0, (float) Math.sin(angleRad+Math.PI) * ipd_2);
	
      Matrix.multiplyMM(viewMatrix, 0, eyePos, 0, camera, 0);
      Matrix.multiplyMM(viewMatrix, 0, rotation, 0, viewMatrix, 0);
      Matrix.multiplyMM(viewMatrix, 0, halfEye, 0, viewMatrix, 0);

In the above code we are modelling a human head being turned a full 360 decrees about the point between the eyes. The trig function places each eye on a circle looking directly forward along the tangent to that circle, as shown in the diagram on page 8 of Google's document.

Now we bind the framebuffer that we will render the strip to, render our scene and then bind the default frame-buffer:

GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, stripFramebuffer);
GLES20.glViewport(0, 0, 2, height/2);

renderScene();

GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
GLES20.glViewport(0, 0, width, height);

renderScene() is the function we have moved all our rendering code to It was in the onDrawEye(Eye eye) function but we will have to move it out of there and call renderScene() from onDrawEye(Eye eye) instead.

Now we can render the strip to the final frame:

GLES20.glUseProgram(stripProgram);
GLES20.glEnableVertexAttribArray(stripPositionParam);
GLES20.glEnableVertexAttribArray(stripCoordParam);
GLES20.glVertexAttribPointer(stripPositionParam, COORDS_PER_VERTEX, GLES20.GL_FLOAT, false, 0, rectVertices);
GLES20.glVertexAttribPointer(stripCoordParam, 2, GLES20.GL_FLOAT, false, 0, rectTXCoords);
GLES20.glUniformMatrix4fv(stripModelViewProjectionParam, 1, false, stripPaint, 0);
GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, stripTexture);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
GLES20.glDrawArrays(GLES20.GL_TRIANGLES, 0, 6);

The is a pretty standard OpenGL texture render but we have loaded a shader that will perform a vertical projection correction as described in Google's document:

precision mediump float;
uniform sampler2D s_texture;
uniform float u_Trans;
varying vec2 v_TexCoord;

#define M_PI 3.1415926535897932384626433832795
void main() {
    float phy = v_TexCoord.y * M_PI / 2.0 - M_PI / 4.0;
    float perspective_y = (tan(phy) * 0.5 + 0.5);
    gl_FragColor = texture2D(s_texture, vec2(v_TexCoord.x, perspective_y));
}

We now move the position of the strip ready for the next iteration:


      Matrix.translateM(stripPaint, 0, 2, 0, 0);
    }
    Matrix.translateM(stripPaint, 0, -((float)width*2.0f), 0, 0);
    Matrix.translateM(stripPaint, 0, 0, 2f, 0);
  }
}

I have already mentioned that your actual render code needs moving from onDrawEye(Eye eye). In this function we will leave the set up code for the view matrix and the perspective matrix and call the function we have moved the render code to.

public void onDrawEye(Eye eye) {
  // Apply the eye transformation to the camera.
  Matrix.multiplyMM(viewMatrix, 0, eye.getEyeView(), 0, camera, 0);
  perspective = eye.getPerspective(Z_NEAR, Z_FAR);
  renderScene();
}

The full code is on GitHub. The original game code without any VRVideo rendering is also on GitHub for comparison.

Choreography

As video encoding using this method is very slow, recording a person playing your game may be impractical. You may wish to automate all the user actions and make game events deterministic for the recording of your video.

Sound

Recording sound in the video while it is being rendered this slowly may also be impractical, a simple solution will be to produce the sound-track separately and use your automation timings to sequence and synchronise the sounds separately.

Editing

Editing VR videos can be done with ordinary video editing software as they are stored as regular video files. Adding titles will need to be done using software that can render them with the equirectangular projection and with a correct stereo offset. The easiest way to do this may be to store your titles in textures and render them to plains in your 3D scene.
Remember that you do not know which direction the user will be facing at any particular time. This means that some games, like the Cardboard demo where one has to look for an object and "shoot it" may be completely unsuitable for VR video as it is quite possible for the viewer to miss all the action and find themselves looking around at an apparently empty scene. This problem may also apply to your titles. I like the format of rendering the titles three or four times around the scene. Guiding the user to look in a particular direction seems to defeat the point of having a 360 degree field of view.

It may also be worth mentioning that the grid background used in this game may not be the best option as aliasing is especially problematic on VR videos, partly because the user only sees a small area of the video, but also because it is stored using a non-linear projection, even straight horizontal lines will have uneven aliasing.

Uploading to YouTube

Before you upload your final video to YouTube you will need to add some meta-data. This can be achieved using the Spatial Media python script found on Google's GitHub page.