Author: Me (and Copilot)
This project implements a simple neural network entirely in x86 assembly language to recognize handwritten digits from the MNIST dataset. The goal was to understand how neural networks work at the lowest level — from memory layout and arithmetic operations to training logic — without relying on high-level libraries or frameworks. It runs in a lightweight Debian Slim environment via Docker for easy setup.
Sometimes, we think we truly understand something, until we try to build it from scratch. When theory meets practice, every small detail becomes a challenge.
Not only is this a pure assembly implementation, but I also optimized it for performance:
- Using AVX-512 ZMM registers, I can compute 16 float32 operations in parallel, adding SIMD acceleration to the neural network computations (see the sketch below).
- As a result, this assembly implementation is roughly 7× faster than a Python implementation using NumPy (which itself relies on C libraries).
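To make that concrete, here is a minimal sketch of an AVX-512 dot-product inner loop, assuming 64-byte-aligned float32 buffers and a length that is a multiple of 16. The register assignments and labels are illustrative, not the project's actual routine:

```nasm
; Minimal sketch (not the project's actual code): accumulate a float32
; dot product 16 lanes at a time with AVX-512.
;   rsi = pointer to vector a, rdi = pointer to vector b, rcx = element count
;   assumes rcx is a multiple of 16 and both buffers are 64-byte aligned
dot_product_f32:
    vpxord      zmm0, zmm0, zmm0        ; running sums across 16 lanes
.dot_loop:
    vmovaps     zmm1, [rsi]             ; load 16 floats from a
    vfmadd231ps zmm0, zmm1, [rdi]       ; zmm0 += a[i..i+15] * b[i..i+15]
    add         rsi, 64
    add         rdi, 64
    sub         rcx, 16
    jnz         .dot_loop
    ; horizontal reduction of the 16 partial sums down to a scalar in xmm0
    vextractf64x4 ymm1, zmm0, 1
    vaddps      ymm0, ymm0, ymm1
    vextractf128 xmm1, ymm0, 1
    vaddps      xmm0, xmm0, xmm1
    vhaddps     xmm0, xmm0, xmm0
    vhaddps     xmm0, xmm0, xmm0        ; xmm0[0] now holds the dot product
    ret
```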
 
I decided to write this project in pure assembly language to push my limits and see what really happens behind high-level neural network frameworks. It was a deep dive into how each operation — from matrix multiplication to gradient updates — actually works at the CPU level.
Throughout this project, I faced many challenges, both in the implementation and in understanding neural network concepts. But this is just the beginning — I plan to keep building the deep learning and neural network pieces I want to understand from scratch, implementing everything in assembly to truly see how they work underneath.
| Layer | Size | Activation | 
|---|---|---|
| Input | 784 | – (flattened MNIST image) | 
| Hidden 1 | 128 | ReLU | 
| Hidden 2 | 64 | ReLU | 
| Output | 10 | Softmax | 
- Epochs: 10
- Batch size: 32
- Training samples: 60,000
- Test samples: 10,000
- Learning rate: 0.01
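In the code, these sizes and hyperparameters boil down to plain constants and data. A minimal sketch of how they might be declared in NASM (the names are illustrative, not necessarily the ones used in the project):

```nasm
; Illustrative constants only -- label names may differ from the project's.
INPUT_SIZE    equ 784          ; 28x28 flattened MNIST image
HIDDEN1_SIZE  equ 128
HIDDEN2_SIZE  equ 64
OUTPUT_SIZE   equ 10
EPOCHS        equ 10
BATCH_SIZE    equ 32
TRAIN_SAMPLES equ 60000
TEST_SAMPLES  equ 10000

section .data
learning_rate: dd 0.01         ; float32 learning rate
```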
 
- I first wanted to build this for Windows, but Windows API calls in assembly were too complicated to debug. So I switched to Linux, where I could use GDB in the terminal, which was hard but fun to learn.
- For the softmax and loss functions, I needed exp() and log(). Instead of writing them myself, I used the C math library by linking with -lc -lm (see the call sketch after this list).
- To help debug, I wrote the same neural network in Python to compare outputs and find where my assembly was wrong.
- I first implemented all operations using 64-bit doubles, but then switched to 32-bit floats to reduce memory usage and improve performance. This required changing memory layouts, instructions, and data handling throughout the code.
- I added parallelism using AVX-512 ZMM registers, allowing 16 float32 calculations to be performed simultaneously in dot products and matrix operations, which sped up computation significantly.
- I discovered that reading each MNIST image individually from disk was a major bottleneck. I solved this by loading the entire dataset into RAM at once, which drastically reduced file I/O overhead (sketched after this list).
- I tried to make my functions flexible and reusable across different parts of the neural network.
- I tried to minimize stack usage and rely on registers instead. It was really hard to find registers that weren't already being used somewhere else.
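As a sketch of the libm route mentioned above: under the System V AMD64 calling convention, a float argument and result both travel in xmm0, so a call to expf from NASM can look roughly like this. The alignment fix-up assumes nothing else has been pushed since function entry:

```nasm
extern  expf

; xmm0 = x  ->  xmm0 = expf(x)
; At function entry rsp is 8 bytes off 16-byte alignment, so one 8-byte
; adjustment restores the alignment the ABI requires at a call instruction.
    sub     rsp, 8
    call    expf wrt ..plt          ; float argument and result both in xmm0
    add     rsp, 8
; expf may clobber caller-saved registers (rax, rcx, rdx, rsi, rdi, r8-r11
; and all vector registers), so any live values must be saved around the call.
```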
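And a hypothetical sketch of the load-everything-into-RAM idea using raw Linux syscalls. The path and labels are illustrative, error handling is omitted, and a robust version would loop until read has returned all 47,040,016 bytes of the training-image file (16-byte IDX header plus 60,000 × 784 pixels):

```nasm
; Hypothetical sketch: slurp the whole MNIST training-image file into RAM
; with one open/read/close instead of one read per image.
IMAGE_FILE_SIZE equ 47040016

section .data
image_path:   db "data/train-images-idx3-ubyte", 0   ; path is illustrative

section .bss
image_buffer: resb IMAGE_FILE_SIZE

section .text
load_images:
    mov     rax, 2                  ; sys_open
    lea     rdi, [rel image_path]
    xor     esi, esi                ; O_RDONLY
    syscall

    mov     rdi, rax                ; fd returned by open
    xor     eax, eax                ; sys_read -- one big read
    lea     rsi, [rel image_buffer]
    mov     rdx, IMAGE_FILE_SIZE
    syscall

    mov     rax, 3                  ; sys_close (rdi still holds the fd)
    syscall
    ret
```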
 
The program starts from _start.asm and trains the neural network on MNIST data for 10 epochs. It processes images in batches, passing them through three layers (128 → 64 → 10 neurons) with ReLU activation. For each batch, it calculates the loss and uses backpropagation (backprop.asm) to compute gradients for all weights and biases (dW1, dbias1, ...).
After each batch, gradient.asm updates the weights using these gradients and a learning rate of 0.01, as sketched below. layers_buffer.asm provides the memory space for all layer outputs and gradients. Once training completes, the program tests the network on unseen images and prints the final accuracy.
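A hedged sketch of the kind of update gradient.asm is described as performing, w = w - lr * dw on 16 float32 values per iteration. The labels W1 and dW1 are hypothetical, and the constants are the illustrative ones sketched earlier:

```nasm
; Illustrative SGD step for one weight matrix; 784*128 weights is a
; multiple of 16, so no scalar tail loop is needed here.
sgd_update_w1:
    vbroadcastss zmm2, [rel learning_rate]  ; lr replicated into all 16 lanes
    lea     rsi, [rel W1]                   ; weights (hypothetical label)
    lea     rdi, [rel dW1]                  ; gradients from backprop
    mov     rcx, INPUT_SIZE * HIDDEN1_SIZE
.update_loop:
    vmovups zmm0, [rsi]                     ; 16 weights
    vmovups zmm1, [rdi]                     ; 16 gradients
    vfnmadd231ps zmm0, zmm1, zmm2           ; zmm0 = zmm0 - zmm1 * zmm2
    vmovups [rsi], zmm0
    add     rsi, 64
    add     rdi, 64
    sub     rcx, 16
    jnz     .update_loop
    ret
```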
The build.sh script assembles all NASM files into object files, links them with the math libraries, and produces the final executable.
This project can be built and run inside a Docker container with NASM and build tools installed. Follow these steps:
docker build -t nasm-assembly .

docker run \
  --volume="PATH/TO/PROJECT:/mnt/project" \
  --cpus=4 \
  --memory=4g \
  --memory-swap=4g \
  nasm-assembly
Replace PATH/TO/PROJECT with your local project folder, e.g.:
- Windows: C:/Users/YourName/NASM_MNIST
- Linux/Mac: /home/username/NASM_MNIST
Then open a shell inside the running container:
docker exec -it <container_id_or_name> bash
This project includes a build.sh script to assemble, link, and run the NASM MNIST neural network.
./build.sh
- This will assemble all .asm files, link them, and produce the executable ./mnist.
 
Run the neural network:
./mnist
This project includes a build_prod.sh script for creating an optimized production executable:
./build_prod.sh




