Making Sense Of Embeddings
Post 2/N
When you search on Amazon for “running shoes,” the system doesn’t just look for those exact words – it also shows you “jogging sneakers,” “athletic footwear,” and “marathon trainers.” When Spotify recommends a song you’ve never heard, it’s not because you searched for it – it’s because your listening history is close to people who love that song.
This magic happens because of embeddings. Embeddings are how AI systems understand that things are similar – even when they look completely different on the surface. This post will help you visualize what embeddings are, how they’re created, and why they power almost every modern AI application you use. No heavy math – just mental models you can carry with you.
What Is an Embedding, Really?
An embedding is a way to represent something – a word, a sentence, a product, a song, a user – as a point in space. Not physical space, but a mathematical space where closeness means similarity.
Mental Model: The Music Festival Seating Chart
Imagine you’re organizing a massive music festival and you need to seat 10,000 attendees. You want people who would enjoy talking to each other to sit nearby. You ask each person two questions: “How much do you like rock?” and “How much do you like electronic?” (Scale 1-10). Now each person can be placed on a 2D grid. Rock fans cluster in one corner. EDM fans cluster in another. Pop fans who like both end up in the middle.
You’ve just created an embedding space. Each person is now a point – a coordinate – based on their preferences. And proximity in this space means similarity.
```
 Electronic
 ^
10 |  * EDM fans
   |  *   *
 5 |        * Pop fans
   |
 1 |              *  * Rock fans
   +---------------------→ Rock
     1        5        10
```
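The seating chart translates directly into code. Here is a minimal sketch with invented preference scores (the names and numbers are made up for illustration):

```python
import math

# Invented (rock, electronic) preference scores on a 1-10 scale.
attendees = {
    "rock_fan": (9, 2),
    "edm_fan":  (2, 9),
    "pop_fan":  (6, 6),
}

def distance(a, b):
    """Euclidean distance between two points in preference space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Proximity means similarity: the pop fan sits closer to the rock fan
# than the EDM fan does.
print(distance(attendees["pop_fan"], attendees["rock_fan"]))  # 5.0
print(distance(attendees["edm_fan"], attendees["rock_fan"]))  # ≈ 9.9
```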
Where It’s Used: Every search engine, recommendation system, and LLM-powered product you interact with runs on embeddings under the hood. Google Search, Spotify Discover Weekly, Netflix recommendations, Amazon product suggestions – all of them convert things into points in space and find what’s nearby.
Why Two Dimensions Aren’t Enough
The music festival example used two dimensions (rock preference, electronic preference). But two questions can’t capture the full complexity of someone’s taste.
Now imagine you add more questions: How much do you like jazz? Country? Hip-hop? With 5 questions, each person becomes a point in 5-dimensional space. You can’t visualize 5 dimensions, but the math works the same way – people with similar answers across all 5 questions will be “close” in this 5D space.
Real embedding systems use hundreds or thousands of dimensions. OpenAI’s text-embedding-ada-002 model, for example, uses 1,536 dimensions. Each dimension captures some aspect of meaning – though unlike our music festival example, these dimensions aren’t hand-picked by humans. The AI learns them automatically.
Mental Model: The Questionnaire
Think of each dimension as one question on a massive personality quiz. Two dimensions is like judging someone’s music taste with just two questions – you’ll get the broad strokes but miss the nuance. 1,536 dimensions is like a 1,536-question quiz. The more questions, the more precisely you can place someone in “personality space.” Two people who answer all 1,536 questions similarly are almost certainly alike.
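The math scales unchanged: the same distance function works whether the "quiz" has 2 questions or 1,536. A sketch with made-up answer vectors (the names are hypothetical, and `bob` is deliberately constructed to answer almost exactly like `alice`):

```python
import math
import random

def distance(a, b):
    """Euclidean distance works identically in 2, 5, or 1,536 dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
dims = 1536

# Made-up "quiz answers": bob answers almost like alice, carol is unrelated.
alice = [random.random() for _ in range(dims)]
bob   = [a + random.uniform(-0.01, 0.01) for a in alice]
carol = [random.random() for _ in range(dims)]

print(distance(alice, bob) < distance(alice, carol))  # True: similar answers, nearby point
```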
How AI Learns Embedding Dimensions
In our music festival, we chose the dimensions ourselves: rock preference, electronic preference. But AI systems don’t have humans hand-picking dimensions. Instead, they learn dimensions from context.
Mental Model: The New Employee
Imagine you’re a new employee at a company and you don’t speak the language. You can’t understand what anyone is saying, but you can observe who hangs out with whom. Over months, you notice: Sarah, Mike, and Priya always eat lunch together and carry laptops. John, Carlos, and Wei always eat together and carry hard hats. The first group walks toward the office building. The second group walks toward the construction site.
Without understanding a single word, you’ve learned that Sarah, Mike, and Priya are probably office workers, and John, Carlos, and Wei are construction workers. You figured out the structure from context, not from labels.
This is exactly how embedding models learn. They observe which words appear near each other in millions of sentences. Words that appear in similar contexts get placed close together in embedding space.
Consider these sentences from training data:
"I ate an apple for breakfast"
"I ate an orange for breakfast"
"I ate a banana for breakfast"
"I drove my car to work"
"I parked my car in the garage"
The model notices that “apple,” “orange,” and “banana” all appear after “ate” and before “for breakfast.” It notices “car” appears after “drove” and “parked” and near “work” and “garage.” From this, it learns: apple, orange, and banana belong together. Car is different.
The model doesn’t know that apples are fruits or that cars have wheels. It just knows that words used in similar ways should be close together. This is the key insight: embeddings don’t capture dictionary definitions. They capture usage patterns.
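You can observe this context-sharing directly by counting neighboring words in the five training sentences above. This is a toy sketch of the idea, not a real embedding model:

```python
from collections import defaultdict

sentences = [
    "i ate an apple for breakfast",
    "i ate an orange for breakfast",
    "i ate a banana for breakfast",
    "i drove my car to work",
    "i parked my car in the garage",
]

# Record the words that appear immediately before and after each word.
contexts = defaultdict(set)
for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        if i > 0:
            contexts[w].add(words[i - 1])
        if i < len(words) - 1:
            contexts[w].add(words[i + 1])

def shared(a, b):
    """How many context words do two words have in common?"""
    return len(contexts[a] & contexts[b])

print(shared("apple", "orange"))  # 2 - fruits share contexts ("an", "for")
print(shared("apple", "car"))     # 0 - no overlap at all
```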
The Apple-Orange-Ball Problem: Why Embeddings Are Subtle
Here’s something that reveals the real power of embeddings. Consider three items: apple, orange, and ball.
Visually, an orange and a ball look more similar – both are round, roughly the same size. But in embedding space, apple and orange are much closer together. Why?
"She threw the ball across the yard" — ball appears with "threw"
"She ate the apple at lunch" — apple appears with "ate"
"She ate the orange at lunch" — orange appears with "ate"
Apple and orange share contexts (eating, breakfast, fruit salad, grocery store). Ball shares contexts with throw, catch, play, sports. The model places apple and orange close together, and ball farther away – even though ball and orange look similar physically.
This is the power of embeddings: they capture semantic similarity, not visual similarity.
^ "edible/food context"
|
10 | * apple * orange
| * banana
5 | * ball
|
1 | * car * truck
+-----------------------------------→ "transportation context"
1 5 10
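In practice, similarity in embedding space is often measured with cosine similarity (the angle between vectors) rather than raw distance. A sketch using invented 2D coordinates loosely matching this picture:

```python
import math

# Hypothetical 2D embeddings on (transportation-context, food-context) axes.
emb = {
    "apple":  (1.0, 9.0),
    "orange": (2.0, 9.0),
    "ball":   (3.0, 5.0),
    "car":    (9.0, 1.0),
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Semantic, not visual, similarity wins: apple is nearer to orange than to ball.
print(cosine(emb["apple"], emb["orange"]) > cosine(emb["apple"], emb["ball"]))  # True
```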
What Dimensions Actually Represent
In learned embeddings, the dimensions are abstract – they don’t have clean labels like “rock preference.” But researchers have found that certain directions in embedding space capture meaningful concepts.
The famous example:
```
king - man + woman = queen
```
This shows that there’s a “gender direction” in the embedding space. If you take the point for “king,” subtract the direction for “man,” and add the direction for “woman,” you end up near “queen.”
Similarly:
```
Paris - France + Italy = Rome
```
There’s a “capital city” direction. The relationship between Paris and France is similar to the relationship between Rome and Italy. These directions weren’t programmed – they emerged from the patterns in language.
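With toy 2D vectors on invented "royalty" and "femaleness" axes, the arithmetic is easy to verify by hand. Real models learn hundreds of unlabeled axes instead, but the geometric idea is the same:

```python
# Made-up 2D vectors: first axis = "royalty", second axis = "femaleness".
vec = {
    "king":  (9.0, 1.0),
    "queen": (9.0, 9.0),
    "man":   (1.0, 1.0),
    "woman": (1.0, 9.0),
}

# king - man + woman, computed component by component.
result = tuple(k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"]))
print(result)                   # (9.0, 9.0)
print(result == vec["queen"])   # True in this toy example
```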
The PM Takeaway: You’ll never need to inspect individual dimensions. But knowing that directions in embedding space encode real relationships explains why vector arithmetic works for analogies, why bias shows up in embeddings (if the training data is biased, the geometry will be too), and why embeddings are the foundation of every semantic feature your team builds.
How Embeddings Power Real Applications
Once everything lives in embedding space, powerful operations become trivially easy.
Semantic Search (Google, Amazon, Notion)
Old keyword search: “running shoes” only matches documents containing those exact words. Embedding search: “running shoes” becomes a point in space. The system finds documents whose embeddings are close to that point – including documents about “jogging sneakers” and “marathon footwear.”
Query: "running shoes" → Point Q in embedding space
Documents: Each doc → Point D in embedding space
Results: Return docs where distance(Q, D) is smallest
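That retrieval loop is only a few lines of code. The 3-dimensional vectors and document titles below are invented; a real system would get embeddings from an embedding model and use a vector index for speed:

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings for the query and a tiny document collection.
query = (0.9, 0.1, 0.4)  # "running shoes"
docs = {
    "jogging sneakers review": (0.85, 0.15, 0.38),
    "marathon footwear guide": (0.80, 0.20, 0.50),
    "garage door repair tips": (0.10, 0.90, 0.70),
}

# Rank documents by distance to the query; closest first.
ranked = sorted(docs, key=lambda d: distance(query, docs[d]))
print(ranked[0])  # "jogging sneakers review"
```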
Recommendation Systems (Spotify, Netflix, TikTok)
Your listening history becomes an embedding. Songs become embeddings. Recommend songs that are close to your user embedding.
```
Your taste: [rock=8, electronic=3, jazz=5, ...] → Point U
Song A:     [rock=7, electronic=4, jazz=4, ...] → Point A → distance = 2.4 → Recommend!
Song B:     [rock=2, electronic=9, jazz=1, ...] → Point B → distance = 8.7 → Skip
```
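The same nearest-point logic as search, sketched with invented 3-dimensional taste vectors and a hypothetical recommendation cutoff:

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up (rock, electronic, jazz) scores for one user and two songs.
user   = (8, 3, 5)
song_a = (7, 4, 4)
song_b = (2, 9, 1)

THRESHOLD = 5.0  # hypothetical "close enough to recommend" cutoff
for name, song in [("Song A", song_a), ("Song B", song_b)]:
    d = distance(user, song)
    print(name, "Recommend" if d < THRESHOLD else "Skip")
```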
Clustering and Segmentation (Marketing, Product Analytics)
Embed all your users. Users who cluster together probably have similar behaviors. Name the clusters: “power users,” “casual browsers,” “deal hunters.”
Duplicate Detection (Customer Support, Data Cleaning)
Two support tickets might use completely different words but mean the same thing: “My order hasn’t arrived” and “Package delivery is late.” Embed both. If they’re close in embedding space, they’re probably duplicates or should be routed to the same team.
RAG for LLMs (ChatGPT, Claude)
When an LLM answers questions about your documents, it embeds your question, finds document chunks whose embeddings are close, and feeds those chunks to the LLM as context. This is why the AI can “know” about your specific documents without being trained on them. We’ll cover RAG architecture in detail in Posts 5 and 6.
The Training Loop: How Embeddings Are Created
The model starts with random embeddings – every word is assigned a random point in space. Then it trains on billions of sentences with a simple game.
Mental Model: The Prediction Game
Take a sentence: “The cat sat on the ___.” Mask one word. Ask the model to predict it. If it predicts wrong, adjust the embeddings so words that should predict each other are closer. After billions of these predictions, “cat” and “dog” end up close (both appear in “The ___ sat on the mat”) and “cat” and “car” end up far apart (they rarely appear in similar contexts).
The embeddings organize themselves so that prediction becomes easier. Similarity emerges as a side effect of prediction.
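A single "adjustment" step from the prediction game can be caricatured like this. The numbers are made up, and a real model would compute the update from a prediction loss and gradients rather than a simple nudge:

```python
# Embedding of the masked word before the update (made-up values).
cat = [0.0, 0.0]

# Average embedding of the surrounding context words ("the", "sat", "on", ...).
context_mean = [1.0, 2.0]

lr = 0.1  # learning rate: how far to nudge per mistake

# Gradient-free caricature: pull the word's embedding toward its context.
cat = [c + lr * (m - c) for c, m in zip(cat, context_mean)]
print(cat)  # [0.1, 0.2]
```

Repeat this billions of times over billions of sentences and words that share contexts drift together, which is exactly the "similarity as a side effect" described above.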
Embeddings vs. One-Hot Encoding
Before embeddings, the standard approach was one-hot encoding — giving each word a unique binary vector:
```
Vocabulary: [apple, orange, banana, car, truck]

apple  = [1, 0, 0, 0, 0]
orange = [0, 1, 0, 0, 0]
banana = [0, 0, 1, 0, 0]
car    = [0, 0, 0, 1, 0]
truck  = [0, 0, 0, 0, 1]
```
The problem: every word is equally distant from every other word. Apple is as different from orange as it is from car. No notion of similarity at all.
Embeddings fix this by learning a dense representation where similar things are close:
```
apple  = [0.8, 0.2, 0.9, ...]  ← close to orange
orange = [0.7, 0.3, 0.8, ...]  ← close to apple
car    = [0.1, 0.9, 0.2, ...]  ← far from fruits
```
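The contrast is easy to check in code. The dense values below are the made-up illustrative ones from above, truncated to three dimensions:

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# One-hot: every word is exactly as far from every other word.
one_hot = {
    "apple":  (1, 0, 0),
    "orange": (0, 1, 0),
    "car":    (0, 0, 1),
}
print(distance(one_hot["apple"], one_hot["orange"]))  # sqrt(2) ≈ 1.414
print(distance(one_hot["apple"], one_hot["car"]))     # sqrt(2) ≈ 1.414

# Dense (made-up values): similar words sit close, dissimilar words sit far.
dense = {
    "apple":  (0.8, 0.2, 0.9),
    "orange": (0.7, 0.3, 0.8),
    "car":    (0.1, 0.9, 0.2),
}
print(distance(dense["apple"], dense["orange"]) < distance(dense["apple"], dense["car"]))  # True
```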
| Factor | One-Hot Encoding | Dense Embeddings |
|---|---|---|
| Vector Size | Vocabulary size (100K+) | Fixed (300-1,536) |
| Similarity | Every word equidistant | Similar words are close |
| Storage | Sparse, wasteful | Dense, compact |
| Meaning | No semantic info | Captures relationships |
| Learned | No training needed | Requires training on data |
Common Embedding Models You’ll Encounter
| Model | Dimensions | Best For |
|---|---|---|
| Word2Vec | 300 | Classic word embeddings, fast and lightweight |
| Sentence-BERT | 768 | Sentence-level similarity and search |
| OpenAI Ada | 1,536 | General-purpose text embeddings |
| Cohere Embed | 1,024 | Multilingual, search-optimized |
| CLIP | 512 | Images and text in the same space |
More dimensions generally means more nuance, but also more storage and computation. Choosing the right model is a cost-accuracy tradeoff your team will navigate for every feature.
Common Misconceptions
“Embeddings understand meaning like humans do.” They don’t. They understand usage patterns. If the training data consistently uses a word in a biased way, the embedding will encode that bias. Embeddings are a mirror of the data, not a source of truth.
“Higher dimensions are always better.” Not necessarily. More dimensions capture more nuance but increase storage, latency, and cost. For many product use cases, 384 or 768 dimensions are more than enough. Don’t default to the biggest model.
“You need to build your own embeddings.” For most product teams, pre-trained embedding models (OpenAI, Cohere, Sentence-BERT) work out of the box. Fine-tuning or training from scratch only makes sense when your domain is highly specialized (medical, legal, internal jargon).
The Mental Models – Your Cheat Sheet
| Concept | Mental Model | One-Liner |
|---|---|---|
| Embedding | Music Festival Seating | Similar preferences sit together |
| High Dimensions | The 1,536-Question Quiz | More questions = more precise placement |
| Learning Dimensions | The New Employee | Figure out structure from context, not labels |
| Semantic vs. Visual Similarity | Apple-Orange-Ball | Usage patterns trump appearances |
| Vector Directions | King – Man + Woman = Queen | Relationships emerge as directions in space |
| Training Process | The Prediction Game | Similarity emerges from predicting neighbors |
| One-Hot vs. Dense | Every seat equidistant vs. grouped by taste | Dense captures similarity, one-hot can’t |
Final Thought
Embeddings are how AI systems understand that things are related – even when they look completely different on the surface. They work by observing patterns: words that appear in similar contexts get placed close together in a high-dimensional space.
The mental model to carry with you:
- Everything becomes a point in space. Words, sentences, images, users, products – anything can be embedded.
- Closeness means similarity. The entire point of embedding space is that distance equals meaning. Nearby points are semantically related.
- Dimensions are learned from patterns, not programmed. The AI discovers the structure of meaning by observing billions of examples, not by humans labeling axes.
- Once things are embedded, finding similar things is just finding nearby points. Search, recommendations, deduplication, RAG – they’re all the same operation: find what’s close.
The next time someone says “we’re using vector embeddings for search,” picture a vast coordinate system where every document has a GPS location, and search is just finding the closest locations to your query. That’s all embeddings are – and that mental model will serve you well.
In the next post, we’ll zoom into Word2Vec – the model that started the dense embeddings revolution – and see exactly how the “prediction game” works under the hood.