N3CTAR: Neural 3D Cellular Tessellated Automata Rendering

Team 44: Annabel Ng, George Rickus, Henry Ko, Samarth Jajoo

Abstract

Neural Cellular Automata (NCA) is a powerful framework for simulating the evolution of cellular structures over time, where each cell's state is directly influenced by its neighbors. It has been applied to image generation, texture synthesis, and even physics and biology simulations. However, most existing work has focused on 2D cellular automata or static 3D voxel grids with limited user interaction. In this project, we extend the NCA framework to 3D voxel grids and build a real-time rendering pipeline that supports dynamic user destruction of the voxel grid. We first convert an input colored triangle mesh into a 3D voxel representation, then train a 3D convolutional neural network that learns to create and regenerate this voxel representation from a minimal or damaged voxel grid. The model architecture includes three 3D convolutional layers, a LayerNorm layer, and a pooling step for dimensionality reduction. The final trained model is visualized with a custom interactive renderer built with VisPy that renders the model output in real time and lets the user damage the voxel grid with the mouse cursor to simulate destruction and regeneration.

Technical Approach

Pipeline GIF

1. 3D Mesh to 3D Voxel Pipeline

We used the GREYC 3D colored mesh dataset, which contains 15 different .PLY files. Each vertex of a mesh is represented by three coordinates (x, y, z) and an RGB color (r, g, b). The dataset includes a variety of objects; we chose to work with the Mario, Mario Kart, and Duck meshes.



3d database img

Since our neural network trains on voxel grids rather than triangle meshes, we wrote a script to convert the colored 3D mesh into voxels stored in an .NPY file. The voxelization process starts by normalizing the triangle mesh into voxel grid space so that it fits within the given resolution x resolution x resolution voxel grid. We then create a blank 3D voxel grid and iterate through each triangle in the mesh. For each triangle, we compute the voxel bounding box that contains it, loop through each voxel in the bounding box, and use barycentric coordinates to check whether the voxel center lies within the triangle. If it does, we assign that triangle's color to the voxel. When multiple triangles map to the same voxel, we simply keep the color of the triangle with the largest area. Here's an example below of our voxelization:

3D Mesh Mario
3D Voxelized Mario
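The per-triangle rasterization step above can be sketched as follows. The function and variable names here are our illustration rather than the exact script, and the largest-area tie-breaking between overlapping triangles is omitted for brevity; a plane-distance check stands in for a full voxel-triangle overlap test.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w

def voxelize_triangle(tri, color, colors, filled, resolution):
    """Rasterize one triangle (already normalized to grid space) into the voxel grid."""
    tri = np.asarray(tri, dtype=float)
    a, b, c = tri
    normal = np.cross(b - a, c - a)
    normal = normal / np.linalg.norm(normal)
    # Voxel-space bounding box of the triangle, clamped to the grid.
    lo = np.clip(np.floor(tri.min(axis=0)).astype(int), 0, resolution - 1)
    hi = np.clip(np.ceil(tri.max(axis=0)).astype(int), 0, resolution - 1)
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for k in range(lo[2], hi[2] + 1):
                center = np.array([i, j, k], dtype=float) + 0.5
                # Skip centers farther from the triangle's plane than half a voxel diagonal.
                if abs((center - a) @ normal) > 0.87:
                    continue
                u, v, w = barycentric(center, a, b, c)
                if u >= 0 and v >= 0 and w >= 0:
                    colors[i, j, k] = color
                    filled[i, j, k] = True
```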

We also implemented a simple flood fill algorithm to fill in the empty voxels inside the voxel object. The flood fill starts at an exterior boundary voxel and uses BFS to find all connected voxels that are not already filled (essentially finding the air outside the object). Taking the complement of both these "air" voxels and the filled voxels with inside_filled = ~flood_fill & ~filled gives the empty voxels inside the object, and we assign these interior voxels a flesh-toned pink of (255, 200, 200).
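A minimal version of this flood fill might look like the sketch below; the function name and 6-connected neighborhood are our assumptions, but the logic follows the description above (BFS from the boundary to mark air, then `~air & ~filled` gives the interior).

```python
import numpy as np
from collections import deque

def fill_interior(filled, colors, interior_color=(255, 200, 200)):
    """Fill enclosed interior voxels: BFS flood fill from the grid boundary marks
    exterior air, and everything that is neither air nor filled is interior."""
    R = filled.shape[0]
    air = np.zeros_like(filled, dtype=bool)
    q = deque()
    # Seed the BFS from every empty voxel on the grid boundary.
    for i, j, k in np.argwhere(~filled):
        if 0 in (i, j, k) or R - 1 in (i, j, k):
            if not air[i, j, k]:
                air[i, j, k] = True
                q.append((i, j, k))
    # 6-connected BFS over empty voxels marks all air outside the object.
    while q:
        i, j, k = q.popleft()
        for di, dj, dk in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
            ni, nj, nk = i + di, j + dj, k + dk
            if 0 <= ni < R and 0 <= nj < R and 0 <= nk < R:
                if not filled[ni, nj, nk] and not air[ni, nj, nk]:
                    air[ni, nj, nk] = True
                    q.append((ni, nj, nk))
    inside = ~air & ~filled          # inside_filled = ~flood_fill & ~filled
    colors[inside] = interior_color  # flesh-toned pink
    filled |= inside
    return inside
```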


2. 3D Cellular Automata Neural Network

Each voxel state has 16 channels: the first 4 correspond to RGBA values, and the other 12 can be thought of as "hidden states" that convey information to neighboring voxels on each update. The model is built on three 3D convolutions. The intuition behind the architecture is to first perceive the surroundings, pooling information from the 3x3x3 grid of neighboring voxels. Then, after a LayerNorm (for regularization purposes), we process the pooled information with kernel-size-1 layers, eventually shrinking the dimensionality to our desired output.
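The update rule can be sketched in PyTorch as below. The 16-channel state and the 3x3x3 perception followed by two 1x1x1 convolutions come from the description above; the hidden width of 128 and the residual update are our assumptions.

```python
import torch
import torch.nn as nn

CHANNELS = 16  # 4 visible (RGBA) + 12 hidden channels

class NCA3D(nn.Module):
    """Sketch of the 3D NCA update: perceive 3x3x3 neighborhood, normalize, process."""
    def __init__(self, channels=CHANNELS, hidden=128):
        super().__init__()
        # Perceive: pool information from the 3x3x3 grid of neighboring voxels.
        self.perceive = nn.Conv3d(channels, hidden, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(hidden)  # applied over the channel dimension
        # Process: kernel-size-1 convs shrink back down to the state size.
        self.fc1 = nn.Conv3d(hidden, hidden, kernel_size=1)
        self.fc2 = nn.Conv3d(hidden, channels, kernel_size=1)

    def forward(self, x):
        y = self.perceive(x)
        # LayerNorm expects channels last: permute, normalize, permute back.
        y = self.norm(y.permute(0, 2, 3, 4, 1)).permute(0, 4, 1, 2, 3)
        y = torch.relu(self.fc1(y))
        return x + self.fc2(y)  # residual update per NCA step
```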

Initially, we trained our model to grow: it starts from a single black voxel (with a learned hidden state), and we optimize it to construct the full mesh within 16-64 iterations (the iteration count is sampled uniformly). The model learns this relatively quickly, but it does not learn to maintain the voxel grid; within a few more iterations, the grid often degenerates into chaos. So, in the next stage of training, we start from the voxel grid the model created and optimize it to maintain that grid. This way, the model learns both to grow the voxel grid and to keep it stable. Now for the most interesting part: making the voxel grid resilient to damage. This stage of training randomly corrupts portions of the voxel grid and trains the model to reconstruct them, resulting in a dynamic, living 3D object.
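The corruption step in the damage stage can be sketched as below. The spherical damage region and the fixed radius are our assumptions for illustration; the source only specifies that random portions of the grid are zeroed out.

```python
import torch

def damage_sphere(state, radius=4):
    """Zero out a random spherical region of each sample's voxel state,
    simulating the random corruption applied during the damage-training stage."""
    B, R = state.shape[0], state.shape[2]
    axes = torch.arange(R).float()
    # (R, R, R, 3) grid of voxel coordinates.
    coords = torch.stack(torch.meshgrid(axes, axes, axes, indexing="ij"), dim=-1)
    for b in range(B):
        center = torch.randint(0, R, (3,)).float()
        mask = ((coords - center) ** 2).sum(dim=-1) < radius ** 2
        state[b, :, mask] = 0.0  # clear all 16 channels inside the sphere
    return state
```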

We built a curriculum to manage all of these learning tasks while still preventing catastrophic forgetting: each curriculum stage adds 64 iterations to the previous one, i.e., 0->64, 64->128, and so on, up to 1024.
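Concretely, the stage schedule described above amounts to:

```python
# Each curriculum stage rolls the NCA out 64 iterations further than the last:
# (0, 64), (64, 128), ..., (960, 1024) -- 16 stages in total.
stages = [(start, start + 64) for start in range(0, 1024, 64)]
```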

Our model's loss function consists of three terms: undergrowth, overgrowth, and stability. The stability weight follows a linear schedule, since we want the model to focus purely on growing at first (weights: undergrowth at 1, overgrowth at 10, and stability ramping from 0 to 10).
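One way to realize these three terms is sketched below. The per-term formulas (squared error on the alpha channel, change between consecutive states for stability) are our assumptions; the weights 1, 10, and the linear 0-to-10 ramp come from the text.

```python
import torch

def nca_loss(alpha, target, prev_alpha, step, total_steps):
    """Three-term NCA loss sketch: undergrowth, overgrowth, stability."""
    under = ((target - alpha).clamp(min=0) ** 2).mean()  # target voxels left unfilled
    over = ((alpha - target).clamp(min=0) ** 2).mean()   # voxels grown outside the target
    stability = ((alpha - prev_alpha) ** 2).mean()       # change between consecutive states
    w_stability = 10.0 * step / total_steps              # linear schedule: 0 -> 10
    return 1.0 * under + 10.0 * over + w_stability * stability
```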

Mario Epochs 1000
Mario before stabilization (32x32x32)
Best Mario
Best Mario after stabilization (32x32x32)
Big Kart Results
Big Mario Kart Results (64x64x64)

3. Interactive Voxel Rendering and Model Evaluation

Once the model has stabilized, we can visualize our NCA with a custom interactive GUI built with VisPy and PyQt. VisPy is a high-performance Python library powered by OpenGL, ideal for rendering large 2D and 3D visualizations like voxel grids. Its compatibility with PyTorch and PyQt made it well suited for integrating real-time model inference with an interactive GUI. To get the interactive renderer working, we had to implement several key components:

  1. Rendering the voxel grid
  2. Transforming cursor clicks into 3D space
  3. Handling ray intersections with the voxel grid and damaging it
  4. Polishing the user experience
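The ray-intersection step (item 3) can be sketched with a standard Amanatides-Woo voxel traversal: march the click ray cell by cell through the grid and damage the first filled voxel it hits. The function name and the DDA approach are our illustration; the renderer's exact intersection code may differ.

```python
import numpy as np

def raycast_voxels(origin, direction, filled, max_t=200.0):
    """3D DDA (Amanatides-Woo): step through the voxel grid along a ray and
    return the index of the first filled voxel hit, or None if the ray misses."""
    R = filled.shape[0]
    d = direction / np.linalg.norm(direction)
    pos = np.floor(origin).astype(int)           # current voxel index
    step = np.where(d >= 0, 1, -1)
    safe_d = np.where(d == 0, 1e-12, d)          # avoid division by zero
    t_delta = np.abs(1.0 / safe_d)               # t to cross one voxel per axis
    next_bound = pos + (step > 0)
    t_max = (next_bound - origin) / safe_d       # t to reach the next voxel boundary
    t = 0.0
    while t <= max_t:
        if np.all((0 <= pos) & (pos < R)) and filled[tuple(pos)]:
            return tuple(pos)                    # first solid voxel along the ray
        axis = int(np.argmin(t_max))             # cross the nearest boundary
        t = t_max[axis]
        pos[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return None
```

A hit voxel (and its neighborhood) can then be cleared from the NCA state to trigger regeneration.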

Results

Best Mario
Best Mario (32x32x32)
Duck Results
Duck (32x32x32)
Mario Kart
Mario Kart (32x32x32)
Bald Mario
Bald Mario - flood fill blooper (32x32x32)
Mario 64
Big Mario (64x64x64)
Mario 64
Big Duck (64x64x64)
Mario 64
Big Mario Kart (64x64x64)

4. Training Infrastructure and Quantization

  1. GPU Infrastructure


  2. Half Precision Training


  3. Post-Training Int8 Quantization


  4. Post-Training FP16 Quantization


  5. Quantization comparison: FP32 vs. INT8 Quantized vs. FP16
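As an illustration of the post-training FP16 path, casting a trained model's weights (and its inputs) to half precision roughly halves storage; the model below is a hypothetical stand-in, not our actual NCA. (PyTorch's built-in dynamic INT8 quantization primarily targets Linear/LSTM layers, so the INT8 path for 3D convolutions requires a different workflow.)

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained NCA (the real architecture is described above).
model = nn.Sequential(
    nn.Conv3d(16, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(128, 16, kernel_size=1),
).eval()

bytes_fp32 = sum(p.numel() * p.element_size() for p in model.parameters())

# Post-training FP16: a dtype cast of all weights; inputs must be cast to match.
model_fp16 = model.half()
bytes_fp16 = sum(p.numel() * p.element_size() for p in model_fp16.parameters())
# Storage shrinks by about 2x relative to FP32.
```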

References


Contributions

  • Annabel Ng: Developed the 3D mesh to 3D voxel pipeline, debugged the flood fill algorithm, and implemented all of the interactive voxel rendering
  • George Rickus: Focused on model training and figured out how to grow and maintain the voxel grid while also making the voxel grid resilient to damage
  • Henry Ko: Debugged the 3D mesh to 3D voxel pipeline, set up the GPU infrastructure for training all of these models, and experimented with quantization
  • Samarth Jajoo: Focused on model training and supported George in figuring out how to stabilize the voxel grid