Building AI Intuition

Connecting the dots...
Machine Learning Basics

AI Paradigm Shift: From Rules to Patterns

By Archit Sharma
11 Min Read
Updated on March 1, 2026

Post 1/N

Every piece of software you’ve ever shipped or have seen shipped works the same way. A developer sits down, thinks through the logic, and writes explicit rules. If the user clicks here, do this. If the input is greater than 100, reject it. If the date is past the deadline, send an email.

This approach has built the entire digital world – from banking systems to flight booking engines to the app on your phone. But it has one fatal flaw: it breaks the moment the problem gets ambiguous. Try writing explicit rules to detect sarcasm in a customer review. Or to identify a cat in a photo. Or to predict which ad a user will click next Tuesday at 2pm. You can’t – because the rules are too complex, too numerous, and too fluid for any human to write down.

This is the paradigm shift that machine learning represents. Instead of humans writing rules, we show the machine thousands (or millions) of examples and let it figure out the rules on its own. This post will give you the mental model for understanding this shift, when classical ML is enough versus when you need deep learning, and the single most important tradeoff that governs every AI system ever built. No equations – just intuition.


The Two Paradigms: Rules vs. Patterns

Let’s start with the clearest possible distinction.

Mental Model: The Restaurant Kitchen

Traditional software is like a kitchen that runs on a recipe book. Every dish has an exact recipe. The chef follows it step by step. If a customer asks for something not in the book, the kitchen can’t make it. This is rule-based programming – powerful, predictable, and completely helpless when the recipe book doesn’t cover the situation.

Machine learning is like hiring a chef who has eaten at 10,000 restaurants. You don’t give them recipes. Instead, you show them 500 plates of “good pasta” and 500 plates of “bad pasta” and say, “figure out what makes pasta good.” They taste patterns – salt levels, texture, sauce ratios – that no recipe book would ever capture. They can even handle a pasta dish they’ve never seen before, because they’ve internalized the pattern of what “good” means.

This is the fundamental shift. Traditional software encodes human knowledge as rules. Machine learning encodes human examples as patterns.

Traditional Software:
  Input → [Human-Written Rules] → Output
  "IF email contains 'Nigerian prince' AND has attachment THEN → spam"

Machine Learning:
  Input + Labeled Examples → [Algorithm Finds Patterns] → Output
  "Here are 100,000 emails labeled spam/not-spam. YOU figure out the rules."

Where It Matters for PMs and Engineers: The moment you find yourself writing increasingly complex if-else chains, or your rule-based system has 400 edge cases and counting, that’s your signal. You’ve hit the ceiling of the rules paradigm. The problem likely needs a pattern-based approach.
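To make the contrast concrete, here is a toy sketch (all data and rules invented for illustration): a hand-written rule next to a "learner" that derives its own rule, a simple threshold, from labeled examples.

```python
# Toy contrast between the two paradigms. Data and rules are made up.

def rule_based_spam(email: str) -> bool:
    # Paradigm 1: a human wrote this rule explicitly.
    text = email.lower()
    return "nigerian prince" in text and "attachment" in text

def learn_spam_threshold(examples):
    # Paradigm 2: given (exclamation_count, is_spam) examples, "learn" the
    # count threshold that classifies the most training examples correctly.
    best_t, best_correct = 0, -1
    for t in sorted({count for count, _ in examples}):
        correct = sum((count >= t) == is_spam for count, is_spam in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

labeled = [(0, False), (1, False), (4, True), (6, True), (5, True), (1, False)]
threshold = learn_spam_threshold(labeled)   # nobody hard-coded this number
is_spam = lambda count: count >= threshold
```

The learned threshold is a laughably simple "model," but the shape of the workflow is the real thing: examples go in, a decision rule comes out.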


Classical ML vs. Deep Learning: When to Use What

Not all pattern-finding is created equal. The ML world splits into two broad camps, and knowing which one to reach for is one of the highest-leverage decisions a product team makes.

Classical Machine Learning

Classical ML algorithms – think linear regression, decision trees, random forests, XGBoost – are pattern-finders that work on structured, tabular data. The kind of data that lives in spreadsheets and SQL databases.

Mental Model: The Experienced Real Estate Agent

A good real estate agent can look at a house listing – square footage, number of bedrooms, zip code, year built – and predict the price within 10%. They’ve seen enough deals that they’ve internalized the relationship between these features and the price. They don’t need to see a photo of the house. The structured data is enough.

That’s classical ML. You give it a table of features (columns) and outcomes (labels), and it finds the mathematical relationship between them.

Classical ML is mature, fast, interpretable, and cheap to run. For a shocking number of real-world product problems, it’s all you need.

Real-World Examples:

  • Credit scoring (Will this person default on a loan?)
  • Churn prediction (Will this user cancel their subscription?)
  • Demand forecasting (How many units will we sell next quarter?)
  • Fraud detection (Is this transaction suspicious?)
  • Ad click-through rate prediction (Will this user click this ad?)
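In the spirit of the real-estate-agent mental model, here is a minimal classical-ML sketch (every number is invented): an ordinary least-squares linear model that learns the feature-to-price relationship from a tiny table.

```python
import numpy as np

# Classical ML on structured data: a least-squares linear model learns the
# relationship between tabular features and price. All numbers are made up.

# Features: [square_footage, bedrooms]; target: price in $1000s.
X = np.array([[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 4]], float)
y = np.array([200, 280, 350, 440, 510], float)

# Add an intercept column and solve the least-squares problem directly.
Xb = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def predict_price(sqft, bedrooms):
    # Price estimate for a listing the model has never seen.
    return float(w @ [1.0, sqft, bedrooms])
```

Five rows is far too little data for a real model, but the point stands: with structured features, a cheap, interpretable linear fit already makes sensible interpolations.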
Deep Learning

Deep learning uses neural networks – layers of interconnected nodes that can learn incredibly complex patterns from unstructured data: images, text, audio, video.

Mental Model: The Blind Men and the Elephant

You’ve heard the story. Six blind men each touch a different part of an elephant. One feels the trunk and says “it’s a snake.” Another feels the leg and says “it’s a tree.” Each one detects a local pattern but misses the whole picture.

A deep neural network is like stacking many layers of these blind men. The first layer detects tiny local patterns (edges in an image, individual words in text). The second layer combines those into bigger patterns (shapes, phrases). The third layer combines those into even bigger concepts (faces, sentences). By the time you reach the final layer, the network has assembled a complete understanding from raw, unstructured input – something no single rule or simple algorithm could do.

Real-World Examples:

  • Image recognition (Google Photos, medical imaging)
  • Natural language processing (ChatGPT, Google Translate)
  • Speech recognition (Siri, Alexa)
  • Recommendation systems (YouTube, TikTok’s “For You” page)
  • Autonomous driving (Tesla, Waymo)
The Decision Matrix
Factor              | Classical ML                  | Deep Learning
--------------------|-------------------------------|-----------------------------------
Data Type           | Structured (tables, numbers)  | Unstructured (text, images, audio)
Data Volume Needed  | Hundreds to thousands of rows | Thousands to millions of examples
Training Time       | Minutes to hours              | Hours to weeks
Interpretability    | High (you can explain why)    | Low (it’s a black box)
Compute Cost        | Low (runs on a laptop)        | High (needs GPUs)
When to Use         | Clear features, tabular data  | Raw signals, complex patterns

The PM Takeaway: Before you pitch an LLM-powered feature, ask yourself: “Can this problem be solved with a well-designed XGBoost model on structured data?” If yes, you’ll ship faster, spend less, and have a system you can actually explain to stakeholders. Not every problem needs a neural network, and reaching for one when you don’t need it is one of the most expensive mistakes in AI product development.


Supervised vs. Unsupervised vs. Reinforcement Learning: Teaching with and without Answers

Now that you know what kind of pattern-finder to use, the next question is how you teach it.

Supervised Learning

Mental Model: The Flashcard Tutor

Imagine tutoring a student with flashcards. You show them a card with a question on the front (“What’s in this image?”) and the answer on the back (“A cat”). After hundreds of flashcards, the student starts getting the answers right on their own – even for cards they’ve never seen.

Supervised learning works exactly like this. You provide the algorithm with labeled data – inputs paired with correct outputs – and it learns the mapping between them.

This is the most common and well-understood form of ML. Every time you see “training data,” it almost always means labeled examples.

Two flavors:

  • Classification: The output is a category. “Is this email spam or not?” “Is this tumor malignant or benign?”
  • Regression: The output is a number. “What will this house sell for?” “How many minutes will delivery take?”
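A minimal sketch of the flashcard idea (points and labels are made up): a 1-nearest-neighbour classifier, which answers a new "card" by recalling the most similar labeled example it has seen.

```python
# Supervised classification in miniature: 1-nearest-neighbour.
# Like the flashcard tutor, it answers new cards by recalling the most
# similar labeled example. Features and labels are hypothetical.

training_cards = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.9), "cat"),
    ((5.0, 5.2), "dog"),
    ((4.8, 5.1), "dog"),
]

def classify(point):
    # Return the label of the closest labeled example (squared distance).
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(training_cards, key=lambda card: dist2(card[0], point))
    return nearest[1]
```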
Unsupervised Learning

Mental Model: The Librarian Who Organizes by Instinct

Imagine you dump 10,000 books on a table with no labels, no categories, no Dewey Decimal numbers. A skilled librarian would start grouping them – this stack is romance, that stack is science fiction, these are cookbooks – just by reading the content. Nobody told them the categories. They discovered the structure.

That’s unsupervised learning. The algorithm finds hidden patterns and groupings in data without being told what to look for.

Real-World Examples:

  • Customer segmentation (Grouping users by behavior for marketing)
  • Anomaly detection (Finding the one weird transaction in millions of normal ones)
  • Topic discovery (What are people talking about in these 50,000 support tickets?)

Where It Gets Practical: In product work, unsupervised learning is often a first step, not a final product. You cluster your users to discover segments, then build supervised models for each segment. You detect anomalies to flag potential fraud, then a human reviews the flags.
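A minimal sketch of that discovery process (synthetic points, deliberately naive initialization): k-means with k = 2, which finds the two groupings without ever seeing a label.

```python
# Unsupervised learning in miniature: k-means with k = 2 on synthetic points.
# Nobody labels anything; the algorithm discovers the two natural groups,
# like the librarian sorting unlabeled books.

points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),   # one natural group
          (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]   # another natural group

def kmeans(points, k=2, iters=10):
    centroids = points[:k]                      # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl
                     else centroids[i] for i, cl in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans(points)
```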

Reinforcement Learning

Mental Model: The Dog Trainer

You don’t show a dog 10,000 labeled flashcards of “sit” and “not sit.” Instead, you let the dog try things. When it sits, it gets a treat. When it jumps on the couch, it gets a firm “no.” Over time, the dog learns a strategy – a sequence of behaviors that maximizes treats and minimizes scolding. Nobody gave it a rulebook. Nobody showed it labeled examples. It learned by doing and getting feedback.

That’s reinforcement learning. An agent takes actions in an environment, receives rewards or penalties, and learns a strategy (called a policy) that maximizes long-term reward.

The key difference from supervised learning: there’s no “right answer” for each input. There’s only a score at the end. The agent has to figure out which of its many actions along the way contributed to winning or losing – like a chess player reviewing a game and asking “which move was the mistake?”
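A minimal sketch of that feedback loop, using a two-armed bandit, the simplest special case of RL (no states, just actions and rewards; the hidden reward probabilities are invented): an epsilon-greedy agent that learns which action pays off purely from trial and reward.

```python
import random

# Reinforcement learning in miniature: an epsilon-greedy agent learning
# which of two "levers" pays off, purely from trial and feedback.
# The agent never sees a labeled answer -- only rewards.

random.seed(0)
true_reward_prob = [0.2, 0.8]   # hidden from the agent
estimates = [0.0, 0.0]          # agent's learned value of each action
counts = [0, 0]

for _ in range(2000):
    if random.random() < 0.1:                        # explore 10% of the time
        action = random.randrange(2)
    else:                                            # otherwise exploit
        action = max(range(2), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # Incremental running average of observed reward for this action.
    estimates[action] += (reward - estimates[action]) / counts[action]
```

After a couple of thousand trials, the agent's value estimates rank the good lever above the bad one, and it pulls it far more often, a strategy learned from nothing but reward.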

Real-World Examples:

  • Game-playing AI (AlphaGo, OpenAI Five for Dota 2)
  • Robotics (teaching a robot arm to pick up objects through trial and error)
  • Recommendation feeds (TikTok’s algorithm optimizing for watch time)
  • RLHF – Reinforcement Learning from Human Feedback – the technique that turned raw GPT into ChatGPT by rewarding helpful, harmless responses

The PM Takeaway: You won’t reach for RL in most product work – it’s data-hungry, slow to train, and hard to debug. But you should know it exists because it powers two things you’ll encounter constantly: the fine-tuning behind every major LLM (RLHF), and the optimization engines behind recommendation and ad-ranking systems. We’ll go deep on RL mechanics in Post 17.


The Bias-Variance Tradeoff: The One Concept That Rules Them All

If there’s one idea that governs every ML system – from a simple linear regression to GPT-4 – it’s this tradeoff. Get it wrong, and your model either fails spectacularly or fails quietly. Both are bad.

Mental Model: The Portrait Artist

Imagine you hire an artist to draw a portrait of your face from a photo.

High Bias (Underfitting): The artist draws a generic smiley face. It vaguely looks like a human, but it doesn’t capture you at all. The artist has oversimplified – they used a mental model too simple to capture the complexity of your actual face. This is a model that’s too simple for the data.

High Variance (Overfitting): The artist draws an incredibly detailed portrait – but they included the pimple on your nose, the ketchup stain on your shirt, and the shadow from the lamp behind you. The portrait looks exactly like the photo, but if you took a new photo the next day, the portrait would look nothing like you. The artist memorized the specific photo instead of learning what you actually look like. This is a model that’s too complex and has memorized the training data, including its noise.

The Sweet Spot: A skilled artist captures the essential features of your face – bone structure, eye shape, expression – while ignoring the accidental details. That’s what a well-tuned model does.

Model Complexity Spectrum:

   Underfitting                    Sweet Spot                    Overfitting
   (High Bias)                                                  (High Variance)
       |                              |                              |
  "Generic smiley"           "Captures your essence"       "Memorized the photo"
       |                              |                              |
  Too simple.                  Just right.                   Too complex.
  Misses real patterns.        Generalizes well.             Memorizes noise.
  Fails on training data.      Works on new data.            Fails on new data.

Why This Matters in Production:

Overfitting is the silent killer of AI products. Your model crushes the test set during development, everyone celebrates, and then it falls apart in production because real-world data is messier, more diverse, and constantly shifting compared to the neat training set.

Every decision in ML – how much data to collect, how complex a model to use, when to stop training, which features to include – is ultimately a decision about where to sit on this spectrum.
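You can watch the tradeoff happen in a few lines of NumPy (synthetic data; the true pattern is a quadratic plus noise): fit polynomials of different complexity on a small noisy sample, then score them on held-out data.

```python
import numpy as np

# The bias-variance tradeoff, demonstrated: fit polynomials of increasing
# degree to 20 noisy samples of a quadratic, then score on held-out data.
# The straight line underfits; the degree-15 polynomial memorizes the noise.

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(-3, 3, n)
    return x, x ** 2 + rng.normal(0, 1.0, n)    # true pattern: y = x^2 + noise

x_train, y_train = sample(20)
x_test, y_test = sample(200)

def errors(degree):
    # Fit on the training set, then measure mean squared error on both sets.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

line_test = errors(1)[1]              # underfit: too simple for the curve
quad_test = errors(2)[1]              # matches the true pattern
overfit_train, overfit_test = errors(15)
```

The signature of overfitting shows up in the numbers: the degree-15 fit scores far better on the data it saw than on fresh data, while the quadratic generalizes.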

The PM Takeaway: When your data science team says “the model is overfitting,” what they mean is: “It memorized the training data instead of learning the real pattern, so it won’t work on new users.” When they say “it’s underfitting,” they mean: “The model is too simple – we need more features, more data, or a more powerful algorithm.” Your job is to help them get the right data and set the right success criteria so the model lands in the sweet spot.


How It All Connects

Here’s how these concepts form a decision tree for any AI product initiative:

Step 1: Can you solve this with explicit rules?
         |                    |
        YES                  NO
         |                    |
   Write code.         You need ML.
   Ship it.                   |
                     Step 2: What's your data?
                       |                |
                  Structured         Unstructured
                  (tables)        (text/images/audio)
                       |                |
                  Classical ML     Deep Learning
                  (XGBoost, etc.)  (Neural Networks)
                       |                |
                     Step 3: How does the system learn?
                       |                |                    |
                   Labeled data?    No labels?        Reward signal?
                       |                |                    |
                  Supervised       Unsupervised      Reinforcement
                  Learning         Learning           Learning
                       |                |                    |
                     Step 4: Tune for the bias-variance sweet spot.
                              Monitor in production.
                              Iterate.

This is a simplification, of course – real products mix and match. A recommendation system might use unsupervised clustering to segment users, then supervised models to rank content within each segment, with deep learning for processing the content itself. But this decision tree gives you the mental scaffolding to navigate those conversations.


Common Misconceptions

“More data always helps.” Not if the data is mislabeled, biased, or irrelevant. 1,000 clean, representative examples often outperform 100,000 noisy ones. Data quality beats data quantity almost every time.

“Deep learning is always better than classical ML.” On structured tabular data, XGBoost and similar algorithms routinely beat deep learning in Kaggle competitions. Deep learning shines on unstructured data. Using an LLM to predict churn from a user behavior table is like using a sledgehammer to hang a picture frame.

“The model is wrong, so we need a better algorithm.” Nine times out of ten, the problem is the data, not the algorithm. Garbage in, garbage out is the oldest truth in computer science, and it applies doubly to ML.

“AI replaces human judgment.” AI augments human judgment. The best AI products are designed with humans in the loop – especially for high-stakes decisions. The model predicts, the human decides.


The Mental Models – Your Cheat Sheet

Concept                     | Mental Model                | One-Liner
----------------------------|-----------------------------|----------------------------------------------
Rules vs. Patterns          | The Restaurant Kitchen      | Recipes vs. a chef who’s tasted 10,000 dishes
Classical ML                | The Real Estate Agent       | Predicts price from structured features
Deep Learning               | Blind Men and the Elephant  | Layers build understanding from raw input
Supervised Learning         | The Flashcard Tutor         | Learn from labeled question-answer pairs
Unsupervised Learning       | The Librarian               | Discover structure without labels
Reinforcement Learning      | The Dog Trainer             | Learn a strategy with reward and penalty
Underfitting (High Bias)    | The Generic Smiley Face     | Too simple to capture real patterns
Overfitting (High Variance) | The Over-Detailed Portrait  | Memorized noise instead of signal
The Sweet Spot              | The Skilled Portrait Artist | Captures essence, ignores accidents

Final Thought

The paradigm shift from rules to patterns is the single biggest mental model change for anyone coming from traditional software engineering. Once you internalize it, everything else in AI – from embeddings to transformers to RAG pipelines – becomes a variation on the same theme: teaching machines to find patterns instead of writing rules.

Three things to carry with you:

  1. Start simple. If rules work, use rules. If classical ML works, use classical ML. Reach for deep learning only when the problem demands it. The best AI product teams are disciplined about using the simplest tool that solves the problem.
  2. The data is the product. In traditional software, the code is the asset. In ML, the data is the asset. The quality, diversity, and representativeness of your training data matters more than which algorithm you pick. If you’re a PM, fight harder for data quality than for a fancier model.
  3. Bias-variance is your north star. Every performance issue in an ML system – from a model that doesn’t work at all to one that works in testing but fails in production – maps back to this tradeoff. Learn to diagnose which side you’re on, and you’ll always know what lever to pull.

In the next post, we’ll go deeper into the foundational data structure of modern AI: embeddings and vector spaces. This is how machines translate human concepts – words, images, preferences – into math they can actually work with. It’s the key that unlocks semantic search, recommendation engines, and the entire RAG architecture. See you there.


Tags:

Machine Learning, Reinforcement Learning, Supervised Learning, Unsupervised Learning
Copyright 2026 — Building AI Intuition. All rights reserved.