Paul's picture

In the first post I explained how to divide the world in chunks, and how it can have a huge impact for frames per second, by dramatically reducing the back and forth communication with the video card, aka draw calls.

The rest of the optimizations will deal with reducing either the amount of triangles, vertices or the time it takes to create the triangles

Optimization II

If a tree falls down in a forrest, but there is no one to hear the sound, does it exist? When making a video game, the answer is: it does not matter. 

When creating a world, a lot of blocks will be covered by other blocks. Consider a chunk seen from the side like this:

         
         
         
         
         

All the dirt blocks are covered by grass, so it does not matter if they exist, no one can possibly see them. There is no need to render the dirt blocks at all. In fact, there is no need to render the bottom of the grass either. All that is visible is the surface of the grass. This is all that needs to be rendered:

         
         
         
         
         

How can you do that in code? well, lets consider just rendering the top of the blocks, it would look something like this:

using System.Collections.Generic;
using UnityEngine;


class ChunkRenderer
{

    public static Mesh render(Chunk chunk)
    {
        List<Vector3> vertices = new List<Vector3>();
        List<int> triangles = new List<int>();

        for (int x=0; x< 32; x++)
            for (int y=0; y< 32; y++)
                for (int z = 0; z < 32; z++)
                {
                    //operator[] for chunk is overloaded to prevent out of array error 
                    //see explanation in the comments.

                    Block block = chunk[x, y, z];                    
                    Block top = chunk[x, y + 1, z];

                    // we are checking the top face of the block, so see if the top is exposed
                    if (block.IsSolid() && top.IsAir())
                    {
                        int vertexIndex = vertices.Count;
                        vertices.Add(new Vector3(x, y + 1, z));
                        vertices.Add(new Vector3(x, y + 1, z+1));
                        vertices.Add(new Vector3(x+1, y + 1, z+1));
                        vertices.Add(new Vector3(x+1, y + 1, z));

                        // first triangle for the block top
                        triangles.Add(vertexIndex);
                        triangles.Add(vertexIndex+1);
                        triangles.Add(vertexIndex+2);
                        
                        // second triangle for the block top
                        triangles.Add(vertexIndex+2);
                        triangles.Add(vertexIndex+3);
                        triangles.Add(vertexIndex);
                    }

                    // do the other faces of the block here
                }

        // Build the Mesh:
        Mesh mesh = new Mesh();
        mesh.vertices = vertices.ToArray();
        mesh.triangles = triangles.ToArray();

        return mesh;
    }
}

For every block that you render, you just check the block right on top of it. You render the surface only if this block is visible and the one on top is invisible.

Performance Analysis

In time, this algorithm is still performing O(n3) because there are 3 nested loops, and we are examining 1 block in each iteration. That said, the time it takes to create the Mesh is O(n2) because there are O(n2) triangles on average (see below). This is big overhead, so this is a significant speedup

In space, the chunk themselves still require O(n3) blocks. However, the meshes on average have roughly 1 layer of blocks. Nothing above or below this layer is drawn. The meshes now have O(n2) triangles and vertices in the average case. The worst case is still O(n3).

Draw calls are still O(n3 / (a*b*c) ) where a*b*c is the size of the chunks. However, since the meshes have dramatically less triangles and vertices, they can be bigger which reduces draw calls. This optimization allows you to go from chunks of 16*16*16 to bigger like 16*128*16, or 32*32*32, which is 8 times bigger than before. This means that there will be 8 times less draw calls for the same world.

In practice, this algorithm is fairly fast, I was seeing between 40ms to 60ms per chunk.

Comming up...

What is it with the weird numbers?
Why do I pick 16, 32, 128 as chunk sizes as opposed to 10, 50, 100 which are easier for us humans to understand?