In my previous blog post, I walked through how to train a simple bigram language model locally with PyTorch, following Andrej Karpathy’s “Let’s build GPT: from scratch, in code, spelled out” tutorial (code, video).

That was a fun and insightful exercise, but it only scratched the surface of modern AI. Real-world machine learning today is not just about clever models or algorithms; it is about infrastructure, scalability, and efficient computation.

In this post, I want to take things a step further. We’ll again build and train a GPT-like model from scratch, but this time with an eye toward distributed computing and production readiness, using Ray, Metaflow, Docker, and Kubernetes.