Understanding AI

Fizan

May 8, 2023

This is not my blog; it is a clone from a git repo used for building the system.

I wanted to better understand how AI models are created. Not to become an expert, but to gain an appreciation for the abstractions I use every day.

This post will highlight what I’ve learned so far. It’s written for other engineers who are new to topics like neural networks, deep learning, and transformers.

Machine Learning

Traditional software is deterministic. Given the same input, running the program again produces the same output. A developer has explicitly written code to handle each case.

Most AI models¹ are not this way; they are probabilistic. Developers don’t have to explicitly program instructions for each case.

Machine learning teaches software to recognize patterns from data. Given the same input, you might not get the same output every time². AI models like GPT (from OpenAI), Claude (from Anthropic), and Gemini (from Google) are “trained” on a large chunk of internet documents. These models learn patterns during training.
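To make the contrast concrete, here is a minimal sketch in plain Python. The `add` function stands in for ordinary deterministic code, and `sample_next_word` stands in for a probabilistic model that draws its answer from a distribution. The words and probabilities are made up for illustration; a real model learns them during training.

```python
import random


def add(a, b):
    # Deterministic: the same inputs always produce the same output.
    return a + b


def sample_next_word(probabilities, rng):
    # Probabilistic: draw one word, weighted by its probability.
    words = list(probabilities)
    weights = [probabilities[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical probabilities for the word that follows "The sky is".
next_word_probs = {"blue": 0.80, "clear": 0.15, "falling": 0.05}

rng = random.Random()  # unseeded, so results can differ between runs
samples = [sample_next_word(next_word_probs, rng) for _ in range(10)]

print(add(2, 3), add(2, 3))  # always the same pair of values
print(samples)               # mostly "blue", but not guaranteed
```

Running the script twice will print the same sums but may print a different list of sampled words.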

Then, there’s an API or chat interface where you can talk to the model. Based on some input, it can predict and generate sentences, images, or audio as output. You can think about machine learning as a subset of the broader AI category.

Neural Networks

AI models are built on neural networks. Think of them as a giant web of decision-making pathways that learn from examples. Neural networks can be applied to many kinds of problems, but I'll focus on language models.

These networks consist of layers of interconnected neurons that process information:

An input layer where data enters the system. Input is converted into a numerical representation of words or tokens (more on tokens later).
Many hidden layers that create an understanding of patterns in the system. Neurons inside the layer apply weights (also known as parameters) to the input data and pass the result through an activation function³. This function outputs a value, often between 0 and 1, representing the neuron's level of activation.
An output layer which produces the final result, such as predicting the next word in a sentence. The outputs at this stage are often referred to as logits, which are raw scores that get transformed into probabilities.
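The three layers above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how a production model is built: the input vector, weights, and layer sizes are all made up, the bias terms are an addition the list above does not mention, and sigmoid is just one possible activation function that outputs values between 0 and 1.

```python
import math


def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    # Each neuron computes a weighted sum of its inputs plus a bias.
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Input layer: a made-up numerical representation of one token.
token_vector = [0.5, -1.0, 2.0]

# Hidden layer: 2 neurons, each with 3 weights (hypothetical values).
hidden_weights = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
hidden_biases = [0.0, 0.1]
hidden = [sigmoid(z) for z in dense(token_vector, hidden_weights, hidden_biases)]

# Output layer: one raw score (logit) per word in a 3-word vocabulary.
output_weights = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
output_biases = [0.0, 0.0, 0.0]
logits = dense(hidden, output_weights, output_biases)

print("hidden activations:", hidden)
print("logits:", logits)
```

In a trained model the weights are learned rather than written by hand, and there are far more of them.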

For example, if the input was “San”, the model would likely assign a probability close to 1 to “Francisco” as the next word. An unrelated word like “kitten” would be close to 0.
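One common way to turn logits into probabilities is the softmax function. The post does not name the function, so treat this as an assumption. The sketch below uses made-up logits for three candidate words after “San”; real models score tens of thousands of tokens.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates and raw scores for the word after "San".
candidates = ["Francisco", "Diego", "kitten"]
logits = [6.0, 2.0, -3.0]

probs = softmax(logits)
for word, p in zip(candidates, probs):
    print(f"{word}: {p:.4f}")
```

The probabilities always sum to 1, and the highest logit ends up with the largest share.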

A big takeaway for me is: it’s just math. You can build a neural network from first principles using linear algebra, calculus, and statistics. You likely won’t do this when there are helpful abstractions like PyTorch, but it helps me demystify what is happening under the hood.

Deep Learning

Deep learning is a subset of machine learning that involves neural networks with many layers—hence the term "deep." While a simple neural network may have just one or two hidden layers, deep learning models can have hundreds or even thousands of layers.

These additional layers enable the network to learn complex patterns. For example, large language models have been trained on multi-language datasets. This allows them to understand, generate, and translate text in multiple languages.
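As a rough sketch of what “deep” means structurally, the code below stacks the same kind of layer many times and feeds each layer's output into the next. The layer width, the random weights, and the helper names (`random_layer`, `forward`, `count_parameters`) are all hypothetical. Nothing here is trained, so the outputs are meaningless; the point is only how depth adds layers and parameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    # Weighted sum of inputs plus a bias, one value per neuron.
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    # Untrained layer: random weights, zero biases.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    # The output of each layer becomes the input of the next one.
    activations = inputs
    for weights, biases in layers:
        activations = [sigmoid(z) for z in dense(activations, weights, biases)]
    return activations


def count_parameters(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
width = 4
shallow = [random_layer(width, width, rng) for _ in range(2)]
deep = [random_layer(width, width, rng) for _ in range(100)]

x = [1.0, 0.5, -0.5, 2.0]
print("shallow:", len(shallow), "layers,", count_parameters(shallow), "parameters")
print("deep:", len(deep), "layers,", count_parameters(deep), "parameters")
print("deep output:", forward(x, deep))
```

The forward pass is the same loop in both cases; the deep network simply runs it more times and carries more parameters.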