I wanted to better understand how AI models are created. Not to become an expert, but to gain an appreciation for the abstractions I use every day.
This post will highlight what I’ve learned so far. It’s written for other engineers who are new to topics like neural networks, deep learning, and transformers.
Traditional software is deterministic. Given some input, if you run the program again, you will get the same output. A developer has explicitly written code to handle each case.
Most AI models¹ are not this way — they are probabilistic. Developers don’t have to explicitly program the instructions.
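To make the contrast concrete, here's a toy sketch in Python. The deterministic function always returns the same thing; the probabilistic one stands in for a language model by sampling from a distribution. The words and weights are made up for illustration:

```python
import random

def add(a: int, b: int) -> int:
    # Deterministic: the same inputs always produce the same output.
    return a + b

def next_word(prompt: str) -> str:
    # Probabilistic (a toy stand-in for a language model): the output
    # is sampled from a distribution, so repeated calls can differ.
    candidates = {"Francisco": 0.90, "Diego": 0.08, "kitten": 0.02}
    words = list(candidates)
    weights = list(candidates.values())
    return random.choices(words, weights=weights, k=1)[0]

print(add(2, 3))         # always 5
print(next_word("San"))  # usually "Francisco", but not always
```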
Machine learning teaches software to recognize patterns from data. Given some input, you might not get the same output². AI models like GPT (from OpenAI), Claude (from Anthropic), and Gemini (from Google) are “trained” on a large chunk of internet documents. These models learn patterns during training.
Then, there’s an API or chat interface where you can talk to the model. Based on some input, it can predict and generate sentences, images, or audio as output. You can think of machine learning as a subset of the broader AI category.
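Going back to the API point: here's roughly what talking to a model over an API looks like, using OpenAI's Python SDK. This assumes you have the `openai` package installed and an API key in your environment; the model name is just an example:

```python
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Finish this sentence: San ..."}],
)
print(response.choices[0].message.content)
```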
AI models are built on neural networks — think of them as a giant web of decision-making pathways that learn from examples. Neural networks can be applied to many kinds of tasks, but I'll focus on language models.
These networks consist of layers of interconnected neurons that process information:
For example, if the input was “San”, the activation for “Francisco” as the next word would likely be close to 1. An unrelated word like “kitten” would be close to 0.
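Here's a toy version of that idea. A language model produces a raw score for each word, and a softmax squashes those scores into activations between 0 and 1. The words and scores below are ones I made up for illustration:

```python
import math

def softmax(logits):
    # Convert raw scores into probabilities that sum to 1.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hand-picked scores for the next word after "San" (purely illustrative).
words = ["Francisco", "Diego", "kitten"]
logits = [6.0, 3.0, -2.0]

for word, p in zip(words, softmax(logits)):
    print(f"{word}: {p:.3f}")
# Francisco: ~0.952, Diego: ~0.047, kitten: ~0.000
```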
A big takeaway for me: it’s just math. You can build a neural network from first principles using linear algebra, calculus, and statistics. You likely won’t do this in practice when there are helpful abstractions like PyTorch, but it helped me demystify what is happening under the hood.
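To make that concrete, here's a single neuron built from scratch, trained with a short loop of gradient descent. Everything here (the input, target, and learning rate) is made up for illustration:

```python
import math

def sigmoid(z):
    # Squash any number into the range (0, 1).
    return 1 / (1 + math.exp(-z))

x, target = 2.0, 1.0  # one training example (invented)
w, b = 0.1, 0.0       # parameters the neuron will learn
lr = 0.5              # learning rate

for step in range(20):
    # Forward pass: linear algebra (w * x + b) plus a nonlinearity.
    pred = sigmoid(w * x + b)
    loss = (pred - target) ** 2

    # Backward pass: calculus (the chain rule) gives the gradients.
    dloss_dpred = 2 * (pred - target)
    dpred_dz = pred * (1 - pred)  # derivative of the sigmoid
    grad_w = dloss_dpred * dpred_dz * x
    grad_b = dloss_dpred * dpred_dz

    # Gradient descent: nudge the parameters to reduce the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"prediction after training: {sigmoid(w * x + b):.3f}")  # approaches 1.0
```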
Deep learning is a subset of machine learning that involves neural networks with many layers—hence the term "deep." While a simple neural network may have just one or two hidden layers, deep learning models can have hundreds or even thousands of layers.
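In PyTorch terms, “deep” just means stacking more hidden layers. Here's a sketch comparing a shallow network to a deeper one; the layer sizes are arbitrary, chosen for illustration:

```python
import torch
import torch.nn as nn

# A "shallow" network: one hidden layer.
shallow = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 8),
)

# A "deep" network: the same idea, with more hidden layers stacked up.
deep = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 8),
)

x = torch.randn(1, 16)  # a random input vector
print(deep(x).shape)    # torch.Size([1, 8])
```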
These additional layers enable the network to learn complex patterns. For example, large language models have been trained on multi-language datasets. This allows them to understand, generate, and translate text in multiple languages.