Block engine demystified

This will the first of a series of technical blog posts in where I explain how I achieved high speed rendering.

The problem

First let me explain what the problem I am describing is.  

The world as you see in my game is composed of blocks. But the video card does not understand blocks. All it knows how to render are triangles. A block has 6 faces: top, bottom, east, west, north and south. Each face of the block is a square and can be rendered using 2 triangles. Rendering a full block would take 12 triangles.

Now, because of the way video cards work, the vertices for the top face can not be shared with the vertices from the east, west north and south faces, because their normals are different. You need a minimum of 4 vertices per face, that is for each block you need 24 vertices and 12 triangles.

So the question becomes how to produce triangles out of your blocks in an efficient way.

Brute force approach

The first brute approach would be to simply use cubes for the blocks. Cubes in Unity (my favorite game engine) have 24 vertices and 12 triangles. 

The problem with this approach, is that if you have say even a tiny world of 100x100x100, then you are talking about having 1,000,000 cubes. Each cube is drawn in a separate draw call (submitted independently to the video card). No video card in existance can take that kind of punishment, they are really good at processing large meshes, but very bad at processing tons of small objects.

Performance Analisys

In time, this algorithm performs O(n3) because each block needs to be examined once. What this means is the total time spent can be approximated by  K * n * n * n + J  where K and J depend on how fast your computer is. The faster your computer the smaller the K and J constants. But notice if I increase the world size by 2 on each side, the total time will be roughly 8 times larger.

In space, this algorithm performs O(n3) because the total amount of triangles and vertices is proportional to the total amount of blocks.

The total amount of draw calls is O(n3) because there is a draw call for every block.

This is simply a non starter.

Optimization I

Video cards love few large meshes. They call these "draw calls", and the fewer draw calls you have, the better. This is the primary driver for FPS (frames per second) on a 3d game, more so than the amount of triangles you have.

The first optimization is not to do each block independently, but create meshes composed of many blocks at a time. What I do is divide the world in chunks, a chunk is simply a region of 32x32x32 blocks. Different people pick different chunk sizes, and that has deep consequences. Minecraft for example have chunks of 16x128x16. 

With this optimization, displaying a world of 256 x 256 x 256 would take a total of 8 x 8 x 8 = 512 chunks instead of the 256 x 256 x 256 = 16,777,216 that would be necesary for individual block meshes. 

This now the video card can do, though there is a _lot_ of room for improvement.

Performance Analisys

In time, this algorithm performs O(n3) where n is the length of the world, because each block needs to be examined once.

In space, this algorithm performs O(n3) where n is the length of the world as the total amount of vertices and triangles is proportinal to the total amount of blocks.

The amount of draw calls is O(n3 / (a * b * c)) where  a* b*c is the size of the chunks.

When I tried this initially, rendering each chunk took roughly 1/2 a second but I was using much smaller chunks. When you have to render hundreds of chunks, this was painfully slow.

Comming up...

On future posts, I will go over further optimizations. They will get progresivelly more and more technical, and I will achieve better and better performance with each one.

Please comment if you want to see more posts on this subject.

Block engine demystified II -> Out of sight, out of mind

In the first post I explained how to divide the world in chunks, and how it can have a huge impact for frames per second, by dramatically reducing the back and forth communication with the video card, aka draw calls.

The rest of the optimizations will deal with reducing either the amount of triangles, vertices or the time it takes to create the triangles

Optimization II

If a tree falls down in a forrest, but there is no one to hear the sound, does it exist? When making a video game, the answer is: it does not matter. 

When creating a world, a lot of blocks will be covered by other blocks. Consider a chunk seen from the side like this:

         
         
         
         
         

All the dirt blocks are covered by grass, so it does not matter if they exist, no one can possibly see them. There is no need to render the dirt blocks at all. In fact, there is no need to render the bottom of the grass either. All that is visible is the surface of the grass. This is all that needs to be rendered:

         
         
         
         
         

How can you do that in code? well, lets consider just rendering the top of the blocks, it would look something like this:

using System.Collections.Generic;
using UnityEngine;


class ChunkRenderer
{

    public static Mesh render(Chunk chunk)
    {
        List<Vector3> vertices = new List<Vector3>();
        List<int> triangles = new List<int>();

        for (int x=0; x< 32; x++)
            for (int y=0; y< 32; y++)
                for (int z = 0; z < 32; z++)
                {
                    //operator[] for chunk is overloaded to prevent out of array error 
                    //see explanation in the comments.

                    Block block = chunk[x, y, z];                    
                    Block top = chunk[x, y + 1, z];

                    // we are checking the top face of the block, so see if the top is exposed
                    if (block.IsSolid() && top.IsAir())
                    {
                        int vertexIndex = vertices.Count;
                        vertices.Add(new Vector3(x, y + 1, z));
                        vertices.Add(new Vector3(x, y + 1, z+1));
                        vertices.Add(new Vector3(x+1, y + 1, z+1));
                        vertices.Add(new Vector3(x+1, y + 1, z));

                        // first triangle for the block top
                        triangles.Add(vertexIndex);
                        triangles.Add(vertexIndex+1);
                        triangles.Add(vertexIndex+2);
                        
                        // second triangle for the block top
                        triangles.Add(vertexIndex+2);
                        triangles.Add(vertexIndex+3);
                        triangles.Add(vertexIndex);
                    }

                    // do the other faces of the block here
                }

        // Build the Mesh:
        Mesh mesh = new Mesh();
        mesh.vertices = vertices.ToArray();
        mesh.triangles = triangles.ToArray();

        return mesh;
    }
}

For every block that you render, you just check the block right on top of it. You render the surface only if this block is visible and the one on top is invisible.

Performance Analysis

In time, this algorithm is still performing O(n3) because there are 3 nested loops, and we are examining 1 block in each iteration. That said, the time it takes to create the Mesh is O(n2) because there are O(n2) triangles on average (see below). This is big overhead, so this is a significant speedup

In space, the chunk themselves still require O(n3) blocks. However, the meshes on average have roughly 1 layer of blocks. Nothing above or below this layer is drawn. The meshes now have O(n2) triangles and vertices in the average case. The worst case is still O(n3).

Draw calls are still O(n3 / (a*b*c) ) where a*b*c is the size of the chunks. However, since the meshes have dramatically less triangles and vertices, they can be bigger which reduces draw calls. This optimization allows you to go from chunks of 16*16*16 to bigger like 16*128*16, or 32*32*32, which is 8 times bigger than before. This means that there will be 8 times less draw calls for the same world.

In practice, this algorithm is fairly fast, I was seeing between 40ms to 60ms per chunk.

Comming up...

What is it with the weird numbers?
Why do I pick 16, 32, 128 as chunk sizes as opposed to 10, 50, 100 which are easier for us humans to understand?

Block engine demystified III -> Do I have 16 fingers or what?

In my first 2 postings, I was always talking about chunk sizes of 16, 32, 128. What is this weird obsession with awkward numbers? you might ask. Or maybe you took for granted that Paul has 16 fingers. Well, no, it would be cool, and maybe my typing speed would go up 60%, but unfortunatelly that is not the case.

So then what is it?

If you follow my first post, you would have something like me: a World class would contain a 3d array of chunks (2d if you have finite height), and a Chunk would contain a 3d array of Blocks. Something like this:

class Chunk
{
    public const int SizeX = 10;
    public const int SizeY = 10;
    public const int SizeZ = 10;

    private Block[,,] blocks = new Block[SizeX, SizeY, SizeZ];

    // addresses relative to chunk origin, so 0,0,0 gives the first block in the chunk
    public Block this[int lx, int ly, int lz]
    {
        get {
            return blocks[lx,ly,lz];
        }               
        set {
            blocks[lx,ly,lz] = value;
        }
    }

}

class World
{
    // how far we can see
    public const int VisibleX = 5;
    public const int VisibleY = 5;
    public const int VisibleZ = 5;

    // the chunks around the player. The player should be at chunk
    // VisibleX,VisibleY,VisibleZ
    private Chunk[, ,] chunks = new Chunk[VisibleX * 2, VisibleY * 2, VisibleZ * 2];

    // origin of world in chunk coordinates, meaning (0,0,0) is the first chunk (1,0,0) is the chunk next to it
    // as the player walks, the chunks in the above array are shifted, and these coordinates 
    // are adjusted to keep the player at the center of the world
    private int c0x, c0y, c0z;


    // get the block at position (wx,wy,wz) where these are the world coordinates (not adjusted for shifts or anything)
    public Block this[int wx, int wy, int wz]
    {
        get {
            // first calculate which chunk we are talking about:
            int cx = (wx / Chunk.SizeX) - c0x;
            int cy = (wy / Chunk.SizeY) - c0y;
            int cz = (wz / Chunk.SizeZ) - c0z;

            // request can be out of range, then return a special
            // Unknown block type
            if (cx < 0 || cx > chunks.Length(0))
                return new Block(BlockType.Unknown);
            if (cy < 0 || cy > chunks.Length(1))
                return new Block(BlockType.Unknown);
            if (cz < 0 || cz > chunks.Length(2))
                return new Block(BlockType.Unknown);
            Chunk chunk = chunks[cx,cy,cz];

            // this figures out the coordinate of the block relative to
            // chunk origin.
            int lx = wx % Chunk.SizeX;
            int ly = wy % Chunk.SizeY;
            int lz = wz % Chunk.SizeZ;

            return chunk[lx,ly,lz];
        }
        set {
            // first calculate which chunk we are talking about:
            int cx = (wx / Chunk.SizeX) - c0x;
            int cy = (wy / Chunk.SizeY) - c0y;
            int cz = (wz / Chunk.SizeZ) - c0z;

            // cannot modify chunks that are not within the visible area
            if (cx < 0 || cx > chunks.Length(0))
                throw new Exception("Cannot modify world outside visible area");
            if (cy < 0 || cy > chunks.Length(1))
                throw new Exception("Cannot modify world outside visible area");
            if (cz < 0 || cz > chunks.Length(2))
                throw new Exception("Cannot modify world outside visible area");
            Chunk chunk = chunks[cx,cy,cz];

            // this figures out the coordinate of the block relative to
            // chunk origin.
            int lx = wx % Chunk.SizeX;
            int ly = wy % Chunk.SizeY;
            int lz = wz % Chunk.SizeZ;

            chunk[lx,ly,lz] = value;
        }
    }
}

This is a very simplified version of the Chunk and World classes. For both of them I overload the [] operators so it is convenient and readable to access blocks and set blocks in the world.

The convention is the w coordinates are world coordinates, that is as the player sees it essentially. The c coordinates are chunk coordinates, which help enumerate the chunks. And the l coordinates are local coordinates to each chunk, so they always go between 0 and Chunk.SizeX

Now, keep in mind that the world[] operator will be invoked millions of times when rendering or doing other work. Go on, take a guess, what is slow about this? all operations are O(1). In C# the / and % operators are extremelly slow. To give an idea, a division or reminder operation is roughly equivalent to 40 additions. Normally this really does not matter, but when you do it millions of time, it matters a lot.

Another problem is that this code DOES NOT WORK for negative coordinates. If the w coordinates are negatives, then the division rounds the number up, not down which is what we really need.

Optimization III

But what if we make the chunk size powers of 2? On any computer languages, you have bitwise operators. Here are some interesting facts:

x * 2y = x << y
x / 2y = x >> y      // except that / rounds up for negatives, and >> always rounds down
x % 2y = x & (2y - 1)

If you are a muggle and this sorcery scares you, allow me to explain what it does: These operators will read your mind, figure out what number you want it to calculate, and return it. Pay no attention to potential side effects like sending me your credit card information or rewriting your living will to make me the primary and sole beneficiary. The important thing is that even with the minor side effects these operations are about 40 times faster than division and reminder.

What this means, is that if our chunk sizes are powers of 2, we could use the >> and & operators to do the division and reminders. 

They call this kind of optimization microoptimization. They do not change the algorithm, they simply replace some instruction with faster ones. I normally don't bother with them in my software as it makes it less readable. But in this case, the fact that it is done so frequently, makes it actually a big deal.

So let's change the classes to take advantage of this:

class Chunk
{
    public const int LogSizeX = 5;
    public const int LogSizeY = 5;
    public const int LogSizeZ = 5;

    // This is the same as 1 * 2^LogSizeX = 2 ^ 5 = 32
    public const int SizeX = 1 << LogSizeX;
    public const int SizeY = 1 << LogSizeY;
    public const int SizeZ = 1 << LogSizeZ;

    // this is the same as 2^LogSizeX - 1 = 2^5 -1 = 31
    // from the sorcery above, it can be used to calculate x % 32
    public const int MaskX = SizeX - 1;
    public const int MaskY = SizeX - 1;
    public const int MaskZ = SizeX - 1;


    private Block[,,] blocks = new Block[SizeX, SizeY, SizeZ];

    // addresses relative to chunk origin, so 0,0,0 gives the first block in the chunk
    public Block this[int lx, int ly, int lz]
    {
        get {
            return blocks[lx,ly,lz];
        }               
        set {
            blocks[lx,ly,lz] = value;
        }
    }

}

class World
{
    // how far we can see
    const int VisibleX = 5;
    const int VisibleY = 5;
    const int VisibleZ = 5;

    // the chunks around the player. The player should be at chunk
    // VisibleX,VisibleY,VisibleZ
    private Chunk[,,] chunks = new Chunk[VisibleX*2, VisibleY*2, VisibleZ*2];

    // origin of world in chunk coordinates, meaning (0,0,0) is the first chunk (1,0,0) is the chunk next to it
    // as the player walks, the chunks in the above array are shifted, and these coordinates 
    // are adjusted to keep the player at the center of the world
    private int c0x, c0y, c0z;


    // get the block at position (wx,wy,wz) where these are the world coordinates (not adjusted for shifts or anything)
    public Block this[int wx, int wy, int wz]
    {
        get {
            // first calculate which chunk we are talking about:
            int cx = (wx >> Chunk.LogSizeX) - c0x;
            int cy = (wy >> Chunk.LogSizeY) - c0y;
            int cz = (wz >> Chunk.LogSizeZ) - c0z;

            // request can be out of range, then return a special
            // Unknown block type
            if (cx < 0 || cx > chunks.Length(0))
                return new Block(BlockType.Unknown);
            if (cy < 0 || cy > chunks.Length(1))
                return new Block(BlockType.Unknown);
            if (cz < 0 || cz > chunks.Length(2))
                return new Block(BlockType.Unknown);
            Chunk chunk = chunks[cx,cy,cz];

            // this figures out the coordinate of the block relative to
            // chunk origin.
            int lx = wx & Chunk.MaskX;
            int ly = wy & Chunk.MaskY;
            int lz = wz & Chunk.MaskZ;

            return chunk[lx, ly, lz];
        }               
        set {
            // first calculate which chunk we are talking about:
            int cx = (wx >> Chunk.LogSizeX) - c0x;
            int cy = (wy >> Chunk.LogSizeY) - c0y;
            int cz = (wz >> Chunk.LogSizeZ) - c0z;

            // cannot modify chunks that are not within the visible area
            if (cx < 0 || cx > chunks.Length(0))
                throw new Exception("Cannot modify world outside visible area");
            if (cy < 0 || cy > chunks.Length(1))
                throw new Exception("Cannot modify world outside visible area");
            if (cz < 0 || cz > chunks.Length(2))
                throw new Exception("Cannot modify world outside visible area");
            Chunk chunk = chunks[cx,cy,cz];

            // this figures out the coordinate of the block relative to
            // chunk origin.
            int lx = wx & Chunk.MaskX;
            int ly = wy & Chunk.MaskY;
            int lz = wz & Chunk.MaskZ;

            chunk[lx,ly,lz] = value;
        }
    }
}

Now, there are no divisions and remainders at all. All operations will be much faster. If you want to change the size of the Chunks, the LogSize should be 4 for 16, 5 for 32, 6 for 64, 7 for 128 and so on.

As an added bonus, this code works perfectly with negative numbers. So if you want to compare apples to apples, you would have to add code to the original version to check for negatives and adjust accordingly.

Performance Analysis

This microoptimization does not change the algorithms in any way, so the performance in O notation is exactly the same as whatever you had before.

If someone was listening out there, wants to try this, and give me real life numbers to see how much of an impact this is, it would be great. Please make sure you put your credit card information and living will on files on the same directory, trust me, it will make it faster.

Comming up...

The blocks, and memory considerations.

PS

This guy is following this blog, and writing the code in javascript. His demo looks great. 

Update

wasstraat65 reported results from this optimization:

So I found out in the lighting calculations, the / and % operators are used. So to start I switched the % with & ( chunkwidth - 1), and this simple change, reduced the total generation time for a 512x128x512 world of 32x128x32 blocks from ~40,8 seconds to a stunning ~36,1 seconds. With the >> operator added, it is reduced to 29 seconds! That's halve of the total ligh calculation process!

Block engine demystified IV -> Packing lunch for millions

The world has different types of blocks. They behave in different ways, and have different properties. In object oriented programming this is called polymorphism. A textbook OOD (Object Oriented Design) for this would look like:

class Block
{
    // you might want some attributes common to all blocks
    private byte light;

    // and some behaviour which could change according to implementation
    public virtual bool IsSolid();
    ...
}

class Air : Block
{
    public override bool IsSolid()
    {
       return false;
    }
    ...
}

class Dirt : Block
{
    public override bool IsSolid()
    {
       return true;
    }
    ...
}
...

// Then you have your chunk class which will contain a 3d array of blocks
// Chunks are arrays of blocks
class Chunk
{
    private Block[,,] blocks = new Block[SizeX, SizeY, SizeZ];
    ...
}

That would be conceptually correct. But here is where it pays to know how computer languages and in this case C# work.

Instances of classes are created in what is called the heap by the memory manager. The memory manager takes accounting overhead of about 16 bytes for each object.

The block array in the chunk is not really an array of blocks, it is instead an array of pointers to the blocks that are allocated in the heap. Each pointer on a 32 bit machine takes 4 bytes, on a 64 bit machine it takes 8 bytes.

Every object has a pointer to what is called the virtual table. This is used to figure out which method should really be called when methods like IsSolid are invoked. Again, this is another 4 or 8 bytes depending on your platform.

Moreover, for every single block, the memory manager has to go through the heap to try to find space where to put it, this is called allocation. Also for each block, the memory manager has to check if it is still in use in order to garbage collect it.

In summary, it would take roughly 24 to 32 bytes per block. If we have even a small world of 512x128x512 that would mean roughly 800 MB to 1GB.  Also we would be doing O(n3) allocations and garbage collections which is going to be very slow. Actually, if you try it, the memory manager will complain that there are too many objects allocated, write a goodby letter and commit suicide.

Optimization IV

Those that know c# know to use structs for this. But we do want to mantain polymorphism which you can not do with structs. To do this I essentially reinvent the virtual table. 

What I do is have 1 byte on the block structure, which gives you the type. Then you use that byte to figure out the class that has the actual methods. Like this:

struct Block
{
    private readonly byte type;
    // you might want some attributes common to all blocks
    private readonly byte light;

    // the is solid method, looks up what is the class that should serve this request, and forwards the request to it
    // do this for all polymorphic methods in your block
    public bool IsSolid() {
       BlockTypes.GetBlockType(type).IsSolid(this);
    }
    ...
}

interface BlockType
{
    // you only need the block for some of the methods
    // for example if your behaviour depended on data stored per block
    // otherwise, you can remove the parameter
    public bool IsSolid(Block block);
    ...
}

class AirType : BlockType
{
    public bool IsSolid(Block block)
    {
        return false;
    }
    ...
}

class DirtType : BlockType
{
    public bool IsSolid(Block block)
    {
        return true;
    }
    ...
}
...

class static BlockTypes
{
    private static BlockType[] types = new BlockType[] { new AirType(), new DirtType() };
    
    private static GetBlockType(int type)
    {
         return types[type];
    }
}

// Then you have your chunk class which will contain a 3d array of blocks
// Chunks are arrays of blocks
class Chunk
{
    private Block[,,] blocks = new Block[SizeX, SizeY, SizeZ];
    ...
}

Structs are not allocated in the heap, they are simply allocted consecutivelly in the array. There is no hidden overhead per block. The block structure here, takes only 2 bytes so when you allocate an array of 100 blocks, it only takes 200 bytes.  The type field is a byte, which means that you can have up to 256 block types, which is pretty reasonable for this game. If you actually need more, just make it an ushort.

In the case of 512x128x512, you would need only about 64 MB plus some small overhead per chunk. 

Now the garbage collector allocates the whole array at once, rather than one object at a time. It garbage collects the whole array at a time as well. This means much less overhead for the memory manager. 

With this implementation, you preserve polymorphism (up to a point, you can not add additional properties for some blocks), yet manage memory in an efficient way.

Another thing is that getting the type for the block does not need to follow a pointer, so there is one less indirection here. This is a microoptimization, but one that needs to be executed millions of times, so it makes a big difference.

Apparently you can do this in unityscript (javascript) too. You cannot do it in java. Kudos to Notch that got Minecraft working without structs. But then his engine is several times slower than mine Happy

Performance analysis

The total amount of memory is still proportional to the total amount of blocks. Therefore the performance is still O(n3) in space.

The total amount of allocations and garbage collections is now O(n3/(a*b*c)) where n3 is the size of the world, and a*b*c is the size of the chunks. This affects time, since allocations take time.

A big win is cache locality. Blocks are neatly packed one after another in the array, instead of scattered all over the place in the heap. This means that if you access blocks in order (which you will do), you take advantage of the hardware cache which loads large regions of consecutive memory at a time.

In real world, back when I was experimenting with this, it took 3 minutes to generate the chunks using classes, and it went down to 1:40 minutes simply by switching to struct. Nowadays I got it down to under 3 seconds in the same machine.

Updated

As GIB3D pointed out BlockType should be an interface.

Also, the IsSolid method should really be a property getter. The only reason I did not do it in this code is to make it more readable especially for the non C# devs. It is a property in my own code.

Block engine demystified V -> The circle of life

I have been writting articles explaining some of the details of how to implement a block engine. But it occurred to me that maybe I need to take it one step back and talk about the big picture. I am so close to this tree that might have taken for granted the fact that there was a forrest around me.

So in this article, I am just going to talk about the big picture of the lifecycle of chunks. How do we get from a seed (random string typically provided by user) to triangles in your screen from a 10000 mile high POV.

The earth took billions of years to form. If you are a christian, then you would say it took 7 days to create the earth. If you are making a block engine, you need to be a hell of a lot better than that, try a few seconds. And you want to make it interesting with mountains, trees, caves, seas, etc... 

So how can you optimize Genesys down to a few seconds?

The first thing to notice, is that you do not have to create the whole world at once. The player can only see so far, so all you really need to create are chunks that are close to the player. A world object then needs to watch the player, and load and unload chunks as they become visible. This is similar to something multiplayer games do called Region-based Interest Management system (If there is interest, I will write an article on that).

As the world loads chunks, the chunks go through the following life cycle:

Prioritize -> Generate -> Physics -> Light -> Render -> Dispose

Prioritize

First, determine which chunks need to be generated. It is also important to determine in which order. You want the chunks that are closest to the player to be done first, and then the chunks that are far from the player. 

So you loop through the coordinates of the chunks within an area of the player, and place them in a PriorityQueue ordered by the distance to the player. The closest chunks towards the top of the the list. You could come up with better algorithms to sort the chunks, but the fact is that it does not matter since we are talking about just a few dozen chunks.

Then for each chunk that needs to be generated, you go throgh the rest of the process.

Generate

Before generation, there is nothing, just thin air. Generation consists of creating the chunk as the 3d array of blocks, and start filling it up with actual blocks based on a seed.

The heart of this process is the simplex noise function (perlin noise also works). Without going into all the nitty gritty details, the simplex noise function receives a coordinate in 3d (any dimension really) and returns a float number from -1 to 1. The important part is that this number is always the same for the same coordinate, it is smooth, and appears random. 

Then you combine this noise function with different frequencies and different amplitudes to determine wether a particular block is air or solid and what kind it is.

Physics

Once you initialize the chunk, you might need to run some physics, like water flow. Note I have not done that yet in my engine. You essentially loop through all the blocks that you generated, and determine if there is some action that needs to be taken for it. For example, if you see a water block, check its neighbors, if neighbors are air, then change neighbor to have some water.

Light

Essentially you perform a flood fill, where a block with light, throws some light to its neighbor blocks, and then you repeat the process for the neighbor blocks until you run out of light.

Render

Once light is calculated, you are ready to start producing triangles. This step consists of looping through the blocks, and produce triangles for each visible one. See my previous article for very basic information on how to do this.

Dispose

Once the player walks away, the chunk needs to be discarded, otherwise you would run out of memory quickly. You might need to save the chunk to persistent storage at this point.

 

From this process, the slowest step by far is Generate. It heavily depends on how complex you want your terrain to be, how many times you call the noise function.

Note that if the player changes the world, then you essentially need to rerun physics, light and render. The good part is that you can skip generation completelly which is the expensive part. So this can be done very quickly.

I can elaborate on any one of these steps but I will require from you the reader to request which one you would like to know more about.

Update

Charles Barros has taken it upon himself to proof that I am not just talking nonsense.  He has generously implemented a block engine based on some of the things I wrote about and made it open source.

If you don't want to start from scratch,  you could get that, and work your way up.

 

Block Engine demystified VI -> The crabby patty recipe

Plankton has gotten really close to figuring out how to make a crabby patty. But no matter how hard he tries, he still can't figure out the ingredients for that bread + lettuce + tomato + patty sandwich.

So how did Block Story succeed where Plankton failed? How does the game know that when you combine certain items in some order and quantity you get the crabby patty?

The problem

You have a grid 2x2 or 3x3 (or any arbitrary grid size for that matter). You have potentially hundreds of recipes, each one with 1 to 9 items in any quantity. For example:

 
       Bread       
  Patty  
  Bread  

V

Crabby Patty

When the user enters ingredients, the goal is to recognize that his ingredients match one of the recipes and can produce some output.

Notice we would like to make it so that if the ingredients are in the first, second or last column, it still matches.

We would also like to make it so that if the user puts 10 of each ingreadients,  it still matches,  produces 1 crabby patty, and then he is left with 9 of each ingredients. 

Another thing to watch out for:  it could be possible to have more than 1 recipe matching the user input. So the recipes should be designed carefuly so that they are never ambiguous  

Brute force

The naive approach, would be:  every time the table changes contents,  look at each recipe and see if it matches.  Notice if you have say 500 recipes,  this could actually take some time.  There also remains the issue of dealing with alignment problems (having the ingredients in any column). Checking one by one would require comparing each recipe and for each recipe, comparing all cells,  so its performance is O(n*s) where n is the amount of recipes, and s is the size of the grid.

If you think this is a reasonable approach. Then what the heck are you doing building a block engine? just quit right now and come back in a few years.

Solving the Alignment problem

The alignment problem is easy to solve. When reading a recipe,  only store columns and rows that have something,  throw away all empty columns and rows. For example, Plankton would only have to remember this:

 

 
Bread
Patty
Bread

V

Crabby Patty
 

 

 

Meaning,  he only has to remember that 1 column with bread, patty and bread produces a crabby patty, and forget that this is for 3x3 or 10x10 grid.

When matching the user input, he can do the same process: eliminate all empty columns and rows. So if the user enters any one of these:

 
Bread         
Patty    
Bread        
 
       Bread       
  Patty  
  Bread  
 
              Bread
    Patty
    Bread

The result will be exactly the same:

 

 
Bread
Patty
Bread

 

Now we compare the results with the recipes.  This solves the alignment problem for all recipes regardless of grid size. But I have still not touched on making it efficient.

Matching recipes efficiently

I don't do it exactly like this, but this is a convenient way to explain the algorithm.  For both the recipe and the user input,  you can encode it as a string,  like this:

 
Bread
Patty
Bread
 
1x3BPB

Which means the recipe is 1 column by 3 rows, and it has from left to right top to bottom Bread, Patty and Bread.

You can build an entire table from these strings with all the recipes.  For example:

Encoded ingredients Output Description
1x3BPB 1 crabby patty 1 column 3 rows recipe for crabby patty
1x1R 3 red dyes just 1 cell with 1 ruby ore produces 3 red ores
3x3WWWW_WWWW 1 chest

1 rows with 3 wood,
1 row with 1 wood empty and wood,
1 row with wood produces a chest

...    

Now, when the user puts something in the crafting table,  encode what he enters into a string using the same algorithm,  and then it is just a matter of looking up the string in this table.

The simplest way to do this, is to store that table in a Dictionary or Hash table, then the lookup can be done in O(1), meaning it does not matter how many recipes there are.

So the algorithm is:

  1. At game startup,
    1. eliminate all empty rows and columns from all recipes
    2. encode each recipe into a string
    3. put all encoded recipes into a hash table indexed by the encoded string
  2. When the user puts something in the crafting table
    1. eliminate all empty rows and columns from the input
    2. encode the result into a string
    3. lookup that string in the recipe hash table
    4. if matches, then show the output for that recipe

At game startup, the algorithm performs O(n*s) which is fine because it is done only once while the user enjoys our cool loading screen.

When the user enters something, the algorithm does the matching in O(s) where s is the size of the grid.

 

Block Engine demystified VII -> from a word to a world

I have gotten a few requests about this:  How do seeds work?  What kind of sourcery is behind generating a humongous world from just 1 word. Well tighten your math belt for this ride can be rough.

There are essentially 2 ways to do procedural terrain generation:  

* Generate a blank canvas and add features to it like mountains and trees and houses and stuff.  This would allow you to have perfect structures,  but it is very complicated when a feature is generated right between chunks, especially chunks that are not loaded yet.

* Generate each voxel separately based on a math function without knowing it is part of something bigger.  The key here is that the math function needs to be written in such way that it can generate voxels in a way that look like structures. This is the approach I took. The whole terrain generation is thus just 1 function:  f(seed,x,y,z)  that returns a block.  

I could write books on this subject, but let's start by giving a big picture of how it works, and on subsequent articles I could dig a little deeper on each part.

The easiest way to explain it that I can find is by showing the structure of a terrain generation in layers.  A layer uses the data provided by the lower layer in order to calculate what it returns. The very basic structure is this:

Terrain Generation
Simplex noise
Pseudo-random number generator
Seed

The seed

At the lowest level,  we have the seed.  The seed is what the user enters,  it can be any random string.

Pseudo random number generator

The pseudo random number generator (PRNG) is a function that receives the seed and a coordinate and returns a number. It should have the following properties:

* Given the same seed and coordinates it always returns the same number,  so it is not true random.

* Given 2 different seeds (even if just 1 bit),  it should generate completely diferent results

* It should behave in such way that 2 coordinates don't seem to have any correlation, for example [PRNG(seed,x,y,z), PRNG(seed,x,y,z+1), PRNG(seed,x,y,z+2), PRNG(seed,x,y,z + 3), ...]  should look like a completely random sequence of numbers.

Simplex noise

The simplex noise, is a masterpiece algorithm from Ken Perlin.  The idea of the simplex noise is to create a function that is continuous and derivable over any number of dimensions, with the result between 0 and 1.  To get an idea,  this is what simplex noise generates on a 2 dimension plane:

I changed the reference implementation for simplex noise to use my PRNG function.  So the result is that 2 different seeds would produce a different simplex noise.

Continuous means that if you move slightly on any axis,  the value changes only slightly. A non continuous function would have jumps, and lines that cut across.

Derivable means that it does not have sharp edges,  but it is always smooth at any point.

Terrain Generation

Look at the simplex noise picture above.  Imagine that white is high and black is low.  Now you are picturing mountains and valleys.  

But if that was all I did,  you would never get caves, and the mountains would be all the same size and very regular. It would look very bad,  kind of like this:

 

And you would spit in my face if that was the terrain that block story generated.

A particularly useful technique for generating realistic terrain is called fbm (fractal brownian motion).  It evaluates noise functions several times at different frequencies and amplitudes, to generate mountains with features.  A 2d fbm looks like this:

 

Note it is much more rich than a plain simplex node.  And you can see big mountains with hills in them.

Here is an fbm generated terrain:

 

 

Now,  before I dig deeper into any one of these layers,  let's hear it:  what would you like to know more about?

Block Engine demystified VIII -> The terrain generator

In my previous article,  I explained that the generator is divided in layers.

Let's go deeper into the terrain generation layer which is where the meat is.  Here is an illustrated explanation for how infinite terrain is generated:

3670