Building AI Intuition

Connecting the dots...

Machine Learning Basics

Making Sense Of Embeddings

By Archit Sharma
9 Min Read
Updated on March 1, 2026

Post 2/N

When you search on Amazon for “running shoes,” the system doesn’t just look for those exact words – it also shows you “jogging sneakers,” “athletic footwear,” and “marathon trainers.” When Spotify recommends a song you’ve never heard, it’s not because you searched for it – it’s because your listening history is close to people who love that song.

This magic happens because of embeddings. Embeddings are how AI systems understand that things are similar – even when they look completely different on the surface. This post will help you visualize what embeddings are, how they’re created, and why they power almost every modern AI application you use. No heavy math – just mental models you can carry with you.


What Is an Embedding, Really?

An embedding is a way to represent something – a word, a sentence, a product, a song, a user – as a point in space. Not physical space, but a mathematical space where closeness means similarity.

Mental Model: The Music Festival Seating Chart

Imagine you’re organizing a massive music festival and you need to seat 10,000 attendees. You want people who would enjoy talking to each other to sit nearby. You ask each person two questions: “How much do you like rock?” and “How much do you like electronic?” (Scale 1-10). Now each person can be placed on a 2D grid. Rock fans cluster in one corner. EDM fans cluster in another. Pop fans who like both end up in the middle.

You’ve just created an embedding space. Each person is now a point – a coordinate – based on their preferences. And proximity in this space means similarity.

        ^ Electronic
    10  |     * EDM fans
        |   *   * 
     5  |          * Pop fans
        |  
     1  | * *  Rock fans
        +------------------→ Rock
          1    5    10
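The seating-chart idea can be sketched in a few lines of code. This is a toy illustration with made-up attendees and scores, not how a production system represents users:

```python
import math

# Hypothetical attendees: (rock preference, electronic preference) on a 1-10 scale.
attendees = {
    "Ana":   (9, 2),   # rock fan
    "Ben":   (8, 1),   # rock fan
    "Chloe": (2, 9),   # EDM fan
    "Dev":   (5, 6),   # pop fan who likes both
}

def distance(p, q):
    """Euclidean distance between two points: smaller = more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Ana and Ben (both rock fans) sit close; Ana and Chloe are far apart.
print(distance(attendees["Ana"], attendees["Ben"]))    # small
print(distance(attendees["Ana"], attendees["Chloe"]))  # large
```

The same `distance` function works unchanged whether points have 2 coordinates or 1,536 – that is why the mental model scales to real embeddings.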

Where It’s Used: Every search engine, recommendation system, and LLM-powered product you interact with runs on embeddings under the hood. Google Search, Spotify Discover Weekly, Netflix recommendations, Amazon product suggestions – all of them convert things into points in space and find what’s nearby.


Why Two Dimensions Aren’t Enough

The music festival example used two dimensions (rock preference, electronic preference). But two questions can’t capture the full complexity of someone’s taste.

Now imagine you add more questions: How much do you like jazz? Country? Hip-hop? With 5 questions, each person becomes a point in 5-dimensional space. You can’t visualize 5 dimensions, but the math works the same way – people with similar answers across all 5 questions will be “close” in this 5D space.

Real embedding systems use hundreds or thousands of dimensions. OpenAI’s embeddings use 1,536 dimensions. Each dimension captures some aspect of meaning – though unlike our music festival example, these dimensions aren’t hand-picked by humans. The AI learns them automatically.

Mental Model: The Questionnaire

Think of each dimension as one question on a massive personality quiz. Two dimensions is like judging someone’s music taste with just two questions – you’ll get the broad strokes but miss the nuance. 1,536 dimensions is like a 1,536-question quiz. The more questions, the more precisely you can place someone in “personality space.” Two people who answer all 1,536 questions similarly are almost certainly alike.


How AI Learns Embedding Dimensions

In our music festival, we chose the dimensions ourselves: rock preference, electronic preference. But AI systems don’t have humans hand-picking dimensions. Instead, they learn dimensions from context.

Mental Model: The New Employee

Imagine you’re a new employee at a company and you don’t speak the language. You can’t understand what anyone is saying, but you can observe who hangs out with whom. Over months, you notice: Sarah, Mike, and Priya always eat lunch together and carry laptops. John, Carlos, and Wei always eat together and carry hard hats. The first group walks toward the office building. The second group walks toward the construction site.

Without understanding a single word, you’ve learned that Sarah, Mike, and Priya are probably office workers, and John, Carlos, and Wei are construction workers. You figured out the structure from context, not from labels.

This is exactly how embedding models learn. They observe which words appear near each other in millions of sentences. Words that appear in similar contexts get placed close together in embedding space.

Consider these sentences from training data:

"I ate an apple for breakfast"
"I ate an orange for breakfast"
"I ate a banana for breakfast"
"I drove my car to work"
"I parked my car in the garage"

The model notices that “apple,” “orange,” and “banana” all appear after “ate” and before “for breakfast.” It notices “car” appears after “drove” and “parked” and near “work” and “garage.” From this, it learns: apple, orange, and banana belong together. Car is different.

The model doesn’t know that apples are fruits or that cars have wheels. It just knows that words used in similar ways should be close together. This is the key insight: embeddings don’t capture dictionary definitions. They capture usage patterns.
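You can see this "learning from context" idea with a tiny sketch. Here each word's vector is just a count of which neighbors it appears next to in the five example sentences above – a crude stand-in for what real models learn, but it shows why apple and orange end up similar and car doesn't:

```python
import math
from collections import Counter, defaultdict

# The toy training sentences from above.
sentences = [
    "i ate an apple for breakfast",
    "i ate an orange for breakfast",
    "i ate a banana for breakfast",
    "i drove my car to work",
    "i parked my car in the garage",
]

# For each word, count the words that appear within 2 positions of it.
context = defaultdict(Counter)
for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                context[w][words[j]] += 1

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# "apple" and "orange" share contexts (ate, breakfast); "car" shares none.
print(cosine(context["apple"], context["orange"]))  # high
print(cosine(context["apple"], context["car"]))     # low
```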


The Apple-Orange-Ball Problem: Why Embeddings Are Subtle

Here’s something that reveals the real power of embeddings. Consider three items: apple, orange, and ball.

Visually, an orange and a ball look more similar – both are round, roughly the same size. But in embedding space, apple and orange are much closer together. Why?

"She threw the ball across the yard"     — ball appears with "threw"
"She ate the apple at lunch"             — apple appears with "ate"
"She ate the orange at lunch"            — orange appears with "ate"

Apple and orange share contexts (eating, breakfast, fruit salad, grocery store). Ball shares contexts with throw, catch, play, sports. The model places apple and orange close together, and ball farther away – even though ball and orange look similar physically.

This is the power of embeddings: they capture semantic similarity, not visual similarity.

        ^ "edible/food context"
        |
    10  |  * apple    * orange
        |     * banana
     5  |                    * ball
        |  
     1  |                           * car  * truck
        +-----------------------------------→ "transportation context"
          1         5         10

What Dimensions Actually Represent

In learned embeddings, the dimensions are abstract – they don’t have clean labels like “rock preference.” But researchers have found that certain directions in embedding space capture meaningful concepts.

The famous example:

king - man + woman = queen

This shows that there’s a “gender direction” in the embedding space. If you take the point for “king,” subtract the direction for “man,” and add the direction for “woman,” you end up near “queen.”

Similarly:

Paris - France + Italy = Rome

There’s a “capital city” direction. The relationship between Paris and France is similar to the relationship between Rome and Italy. These directions weren’t programmed – they emerged from the patterns in language.
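The arithmetic itself is just vector addition and a nearest-neighbor lookup. Here is a sketch with hand-made 3-D vectors chosen so the "gender" and "royalty" directions are obvious; real learned embeddings have no such clean labels, but the mechanics are identical:

```python
import math

# Hypothetical 3-D embeddings: dimension 0 ~ "gender", dimension 2 ~ "royalty".
vectors = {
    "king":  [ 1.0, 0.2, 1.0],
    "queen": [-1.0, 0.2, 1.0],
    "man":   [ 1.0, 0.1, 0.0],
    "woman": [-1.0, 0.1, 0.0],
}

def nearest(target, exclude):
    """Return the vocabulary word closest (Euclidean) to the target point."""
    def dist(w):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[w], target)))
    return min((w for w in vectors if w not in exclude), key=dist)

# king - man + woman lands at the point for queen.
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```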

The PM Takeaway: You’ll never need to inspect individual dimensions. But knowing that directions in embedding space encode real relationships explains why vector arithmetic works for analogies, why bias shows up in embeddings (if the training data is biased, the geometry will be too), and why embeddings are the foundation of every semantic feature your team builds.


How Embeddings Power Real Applications

Once everything lives in embedding space, powerful operations become trivially easy.

Semantic Search (Google, Amazon, Notion)

Old keyword search: “running shoes” only matches documents containing those exact words. Embedding search: “running shoes” becomes a point in space. The system finds documents whose embeddings are close to that point – including documents about “jogging sneakers” and “marathon footwear.”

Query: "running shoes"  →  Point Q in embedding space
Documents: Each doc     →  Point D in embedding space
Results: Return docs where distance(Q, D) is smallest
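The steps above can be sketched directly. The vectors here are made up for illustration – a real system would get them from an embedding model – but the ranking logic is the whole algorithm:

```python
import math

# Toy 3-D document embeddings (a real system would call an embedding API).
docs = {
    "jogging sneakers review":  [0.9, 0.8, 0.1],
    "marathon footwear guide":  [0.8, 0.9, 0.2],
    "garage door installation": [0.1, 0.0, 0.9],
}
query_embedding = [0.9, 0.9, 0.1]  # pretend this is embed("running shoes")

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Rank every document by distance to the query; closest first.
results = sorted(docs, key=lambda d: euclidean(docs[d], query_embedding))
print(results[0])  # top hit shares no keywords with the query
```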

Recommendation Systems (Spotify, Netflix, TikTok)

Your listening history becomes an embedding. Songs become embeddings. Recommend songs that are close to your user embedding.

Your taste: [rock=8, electronic=3, jazz=5, ...]  →  Point U
Song A:     [rock=7, electronic=4, jazz=4, ...]  →  Point A  →  distance = 2.4  →  Recommend!
Song B:     [rock=2, electronic=9, jazz=1, ...]  →  Point B  →  distance = 8.7  →  Skip

Clustering and Segmentation (Marketing, Product Analytics)

Embed all your users. Users who cluster together probably have similar behaviors. Name the clusters: “power users,” “casual browsers,” “deal hunters.”

Duplicate Detection (Customer Support, Data Cleaning)

Two support tickets might use completely different words but mean the same thing: “My order hasn’t arrived” and “Package delivery is late.” Embed both. If they’re close in embedding space, they’re probably duplicates or should be routed to the same team.

RAG for LLMs (ChatGPT, Claude)

When an LLM answers questions about your documents, it embeds your question, finds document chunks whose embeddings are close, and feeds those chunks to the LLM as context. This is why the AI can “know” about your specific documents without being trained on them. We’ll cover RAG architecture in detail in Posts 5 and 6.


The Training Loop: How Embeddings Are Created

The model starts with random embeddings – every word is assigned a random point in space. Then it trains on billions of sentences with a simple game.

Mental Model: The Prediction Game

Take a sentence: “The cat sat on the ___.” Mask one word. Ask the model to predict it. If it predicts wrong, adjust the embeddings so words that should predict each other are closer. After billions of these predictions, “cat” and “dog” end up close (both appear in “The ___ sat on the mat”) and “cat” and “car” end up far apart (they never appear in similar contexts).

The embeddings organize themselves so that prediction becomes easier. Similarity emerges as a side effect of prediction.
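Here is a deliberately oversimplified caricature of that loop: start with random points and nudge words toward each other whenever they co-occur. Real models (Word2Vec, BERT) do this with gradient descent on a prediction loss, but the emergent effect – frequent neighbors drift together – is the same:

```python
import random

random.seed(0)

# Random starting embeddings in 2-D for three words.
words = ["cat", "dog", "car"]
emb = {w: [random.uniform(-1, 1), random.uniform(-1, 1)] for w in words}

# Toy "training data": cat and dog co-occur often, cat and car rarely.
pairs = [("cat", "dog")] * 50 + [("cat", "car")] * 2

for a, b in pairs:
    for i in range(2):  # nudge the pair slightly closer on each dimension
        delta = 0.1 * (emb[b][i] - emb[a][i])
        emb[a][i] += delta
        emb[b][i] -= delta

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(emb[a], emb[b])) ** 0.5

# Frequent co-occurrence pulled cat and dog together; car stayed distant.
print(dist("cat", "dog") < dist("cat", "car"))  # True
```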


Embeddings vs. One-Hot Encoding

Before embeddings, the standard approach was one-hot encoding — giving each word a unique binary vector:

Vocabulary: [apple, orange, banana, car, truck]

apple  = [1, 0, 0, 0, 0]
orange = [0, 1, 0, 0, 0]
banana = [0, 0, 1, 0, 0]
car    = [0, 0, 0, 1, 0]
truck  = [0, 0, 0, 0, 1]

The problem: every word is equally distant from every other word. Apple is as different from orange as it is from car. No notion of similarity at all.

Embeddings fix this by learning a dense representation where similar things are close:

apple  = [0.8, 0.2, 0.9, ...]   ← close to orange
orange = [0.7, 0.3, 0.8, ...]   ← close to apple  
car    = [0.1, 0.9, 0.2, ...]   ← far from fruits
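The "every word equidistant" problem is easy to verify numerically. Using a 3-word vocabulary (the dense values are toy numbers, as above):

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# One-hot: every pair of distinct words is exactly the same distance apart.
one_hot = {"apple": [1, 0, 0], "orange": [0, 1, 0], "car": [0, 0, 1]}
print(dist(one_hot["apple"], one_hot["orange"]))  # 1.414...
print(dist(one_hot["apple"], one_hot["car"]))     # 1.414... -- identical

# Dense: similar words can actually be close.
dense = {"apple": [0.8, 0.2, 0.9], "orange": [0.7, 0.3, 0.8], "car": [0.1, 0.9, 0.2]}
print(dist(dense["apple"], dense["orange"]))  # small
print(dist(dense["apple"], dense["car"]))     # large
```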

Factor      | One-Hot Encoding        | Dense Embeddings
------------|-------------------------|--------------------------
Vector Size | Vocabulary size (100K+) | Fixed (300-1,536)
Similarity  | Every word equidistant  | Similar words are close
Storage     | Sparse, wasteful        | Dense, compact
Meaning     | No semantic info        | Captures relationships
Learned     | No training needed      | Requires training on data

Common Embedding Models You’ll Encounter

Model         | Dimensions | Best For
--------------|------------|-----------------------------------------------
Word2Vec      | 300        | Classic word embeddings, fast and lightweight
Sentence-BERT | 768        | Sentence-level similarity and search
OpenAI Ada    | 1,536      | General-purpose text embeddings
Cohere Embed  | 1,024      | Multilingual, search-optimized
CLIP          | 512        | Images and text in the same space

More dimensions generally mean more nuance, but also more storage and computation. Choosing the right model is a cost-accuracy tradeoff your team will navigate for every feature.


Common Misconceptions

“Embeddings understand meaning like humans do.” They don’t. They understand usage patterns. If the training data consistently uses a word in a biased way, the embedding will encode that bias. Embeddings are a mirror of the data, not a source of truth.

“Higher dimensions are always better.” Not necessarily. More dimensions capture more nuance but increase storage, latency, and cost. For many product use cases, 384 or 768 dimensions are more than enough. Don’t default to the biggest model.

“You need to build your own embeddings.” For most product teams, pre-trained embedding models (OpenAI, Cohere, Sentence-BERT) work out of the box. Fine-tuning or training from scratch only makes sense when your domain is highly specialized (medical, legal, internal jargon).


The Mental Models – Your Cheat Sheet

Concept                        | Mental Model                                 | One-Liner
-------------------------------|----------------------------------------------|-----------------------------------------------
Embedding                      | Music Festival Seating                       | Similar preferences sit together
High Dimensions                | The 1,536-Question Quiz                      | More questions = more precise placement
Learning Dimensions            | The New Employee                             | Figure out structure from context, not labels
Semantic vs. Visual Similarity | Apple-Orange-Ball                            | Usage patterns trump appearances
Vector Directions              | King – Man + Woman = Queen                   | Relationships emerge as directions in space
Training Process               | The Prediction Game                          | Similarity emerges from predicting neighbors
One-Hot vs. Dense              | Every seat equidistant vs. grouped by taste  | Dense captures similarity, one-hot can't

Final Thought

Embeddings are how AI systems understand that things are related – even when they look completely different on the surface. They work by observing patterns: words that appear in similar contexts get placed close together in a high-dimensional space.

The mental model to carry with you:

  1. Everything becomes a point in space. Words, sentences, images, users, products – anything can be embedded.
  2. Closeness means similarity. The entire point of embedding space is that distance equals meaning. Nearby points are semantically related.
  3. Dimensions are learned from patterns, not programmed. The AI discovers the structure of meaning by observing billions of examples, not by humans labeling axes.
  4. Once things are embedded, finding similar things is just finding nearby points. Search, recommendations, deduplication, RAG – they’re all the same operation: find what’s close.

The next time someone says “we’re using vector embeddings for search,” picture a vast coordinate system where every document has a GPS location, and search is just finding the closest locations to your query. That’s all embeddings are – and that mental model will serve you well.

In the next post, we’ll zoom into Word2Vec – the model that started the dense embeddings revolution – and see exactly how the “prediction game” works under the hood.

Related Posts:

  • Word2Vec: Start of Dense Embeddings
  • Measuring Meaning: Cosine Similarity
  • How CNNs Actually Work
  • AI Paradigm Shift: From Rules to Patterns

Tags: ai, artificial-intelligence, llm, rag, technology
Copyright 2026 — Building AI Intuition. All rights reserved.