
A Deep Learning Framework You Can Hold in Your Head

There’s something oddly satisfying about rebuilding the things we use every day.

We, as engineers, work with deep-learning stacks all the time - PyTorch, TensorFlow, JAX, Triton - yet we rarely pause to ask how they actually work under the hood.

So I decided to peel back all the abstractions - no GPUs, no kernels, no runtime magic - and rebuild the core ideas from scratch: tensors, computation graphs, and the chain rule. These are the building blocks that form the heart of every modern DNN library.

And that’s how Synapse began: a lean, CPU-only (for now) deep-learning runtime that fits comfortably in a few hundred lines of Python and NumPy.

The goal isn’t to reinvent PyTorch. It’s to see the entire machine working at once - every gear, every gradient.

When you call loss.backward(), there’s a quiet symphony unfolding beneath the surface: contexts capture intermediate states, the computation graph gets topologically sorted, gradients ripple backward through nodes, and broadcasting semantics tie everything neatly together.

The idea behind Synapse is to build that entire picture - end to end - in the simplest way possible, while still preserving the correctness and semantics you’d expect from libraries like PyTorch or JAX.

The Heartbeat of Synapse

At the center of Synapse is the autograd engine. Every tensor knows how it was created through a tiny structure called grad_fn. That node links to the operation that produced it - Add, Matmul, ReLU - and carries a small context object that remembers just enough from the forward pass to reconstruct its gradient later.
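To make that concrete, here is a minimal sketch of the pattern, not Synapse's actual code: the names `Context`, `save_for_backward`, and `Mul` are illustrative, echoing the familiar convention of an op that stashes what it needs during the forward pass and reads it back during the backward pass.

```python
import numpy as np

class Context:
    """Scratch space: the forward pass stashes whatever its backward pass will need."""
    def __init__(self):
        self.saved = ()

    def save_for_backward(self, *arrays):
        self.saved = arrays

class Mul:
    """An example op node: remembers its inputs so it can apply the product rule later."""
    @staticmethod
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a * b

    @staticmethod
    def backward(ctx, grad_out):
        a, b = ctx.saved
        # d(a*b)/da = b and d(a*b)/db = a, each scaled by the incoming gradient
        return grad_out * b, grad_out * a

# Forward records the context; backward replays it.
ctx = Context()
out = Mul.forward(ctx, np.array([2.0, 3.0]), np.array([4.0, 5.0]))
grad_a, grad_b = Mul.backward(ctx, np.ones_like(out))
```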

During the backward pass, the engine performs a simple topological sort of all connected nodes and then walks them in reverse. Each node calls its own backward function, using what it saved earlier in the context. It’s pure chain rule, expressed in code.
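A rough sketch of that engine loop might look like the following. It assumes each tensor carries a `grad_fn` node with `parents`, an `output` reference, and a `backward` method; all of these names are illustrative, not necessarily Synapse's actual API.

```python
import numpy as np

def backward(loss):
    """Run the reverse pass from `loss`: topologically sort the graph, then walk it backward."""
    order, visited = [], set()

    def visit(node):
        # Depth-first post-order traversal yields a topological ordering.
        if node is None or node in visited:
            return
        visited.add(node)
        for parent in node.parents:        # tensors that fed this op
            visit(parent.grad_fn)
        order.append(node)

    visit(loss.grad_fn)

    # Seed dL/dL = 1, then apply each node's backward rule in reverse order.
    grads = {loss: np.ones_like(loss.data)}
    for node in reversed(order):
        grad_out = grads[node.output]
        for parent, grad in zip(node.parents, node.backward(grad_out)):
            grads[parent] = grads.get(parent, 0) + grad   # accumulate partials
    return grads
```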

Here’s the satisfying part: once you write the 30–40 lines that make this work, you can literally watch gradients flow through your graph like water: `x + y → z → sum() → .backward()`. Each node lights up in reverse order, accumulating its partials as it goes.
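Here is what that chain looks like in use, as a hypothetical snippet that assumes Synapse exposes a PyTorch-like `Tensor` API (the `from synapse import Tensor` path and argument names are assumptions for illustration):

```python
import numpy as np
from synapse import Tensor   # assumed import path for illustration

x = Tensor(np.array([1.0, 2.0]), requires_grad=True)
y = Tensor(np.array([3.0, 4.0]), requires_grad=True)

z = x + y          # records an Add node with x and y as parents
loss = z.sum()     # records a Sum node on top of z
loss.backward()    # topological sort, then the reverse walk

print(x.grad)      # [1. 1.]  -- d(sum(x + y))/dx is all ones
print(y.grad)      # [1. 1.]
```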


Fig: A simple compute graph of an elementwise addition of two tensors producing a loss function L. Green arrows represent the forward pass; red arrows represent the backward pass.

The figure above is a simple demonstration of a compute graph. When the backward pass starts, it travels from the loss function $L$ back to the nodes that created it, traversing $Z$, the addition node, and then its inputs, $X$ and $Y$. Gradient computation happens through the chain rule: the gradient of $L$ w.r.t. $Z$ is used to compute the gradients of $L$ w.r.t. $X$ and $Y$.
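In symbols, for the elementwise addition $Z = X + Y$, the chain rule reduces to a pass-through:

$$
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Z} \cdot \frac{\partial Z}{\partial X} = \frac{\partial L}{\partial Z},
\qquad
\frac{\partial L}{\partial Y} = \frac{\partial L}{\partial Z} \cdot \frac{\partial Z}{\partial Y} = \frac{\partial L}{\partial Z},
$$

since $\partial Z / \partial X = \partial Z / \partial Y = 1$ for elementwise addition; the Add node simply forwards the incoming gradient to both of its inputs.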

The Next Steps

This post was more of an announcement and introduction. In the next post, we'll go over the mathematics of backpropagation, covering all the fundamental principles we'll use as we construct the autodiff engine for Synapse.

Till then, sit tight!