Building AI Intuition
Machine Learning Basics

[ML 1] AI Paradigm Shift: From Rules to Patterns

By Archit Sharma
17 Min Read
Updated on March 11, 2026

Every piece of software you’ve ever shipped works the same way. A developer thinks through the logic and writes explicit rules — if the user clicks here, do this; if the input exceeds 100, reject it; if the date passes the deadline, send an email. This approach built the entire digital world, from banking systems to flight booking engines.

But it has one fatal flaw: it breaks the moment the problem gets ambiguous. Try writing explicit rules to detect sarcasm in a customer review, identify a cat in a photo, or predict which ad a user will click next Tuesday at 2pm. You can’t — because the rules are too complex, too numerous, and too fluid for any human to write down.

This is the paradigm shift that machine learning represents. Instead of humans writing rules, we show the machine millions of examples and let it figure out the patterns. This post gives you the mental model for understanding that shift — when rules work, when they break, which AI approach fits which problem, and the one tradeoff that governs every ML system ever built. No equations — just mental models you can carry with you.


The Lay of the Land

Before diving in, here’s the map of what this post covers and how the pieces connect:

The Core Question: How should a system make decisions?
        |
        ├── Option 1: Write Rules
        |       Powerful for simple, stable, well-defined problems
        |       Breaks under edge case explosion and rule conflicts
        |
        ├── Option 2: Learn Patterns (Machine Learning)
        |       ├── Classical ML — structured/tabular data
        |       ├── Deep Learning — unstructured data (text, images, audio)
        |       └── How it learns: Supervised / Unsupervised / Reinforcement
        |
        ├── The Core Tradeoff: Bias vs. Variance
        |       The one concept that governs every ML system ever built
        |
        ├── The Problem with Pure AI
        |       Non-determinism, misplaced confidence, accountability gaps
        |
        └── Option 3: Hybrid Systems
                Rules for the predictable core. ML for the edge cases.
                How the best production systems are actually built.

By the end of this post you’ll know where each option fits, when to combine them, and why hybrid is how serious production systems are built.


Part 1: The Two Paradigms — Rules vs. Patterns

Traditional software and machine learning are two fundamentally different answers to the same question: how does a system decide what to do?

Mental Model: The Restaurant Kitchen

Traditional software is like a kitchen that runs on a recipe book — every dish has an exact recipe, the chef follows it step by step, and the output is perfectly predictable. But if a customer asks for something not in the book, the kitchen freezes.

Machine learning is like hiring a chef who has eaten at 10,000 restaurants. You don’t give them recipes — you show them 500 plates of “good pasta” and 500 plates of “bad pasta” and say: “figure out what makes pasta good.” They internalize patterns that no recipe book would ever capture. They can even handle a dish they’ve never seen before, because they’ve learned what “good” actually means.

Traditional software encodes human knowledge as rules. Machine learning encodes human examples as patterns.

Traditional Software:
  Input → [Human-Written Rules] → Output
  "IF email contains 'Nigerian prince' AND has attachment THEN → spam"

Machine Learning:
  Input + Labeled Examples → [Algorithm Finds Patterns] → Output
  "Here are 100,000 emails labeled spam/not-spam. YOU figure out the rules."

[Image: Two-path diagram — left path shows a human writing if-then rules flowing to output; right path shows labeled examples flowing into a model flowing to output]

Real-World Examples: Rule-based systems power tax calculation engines (TurboTax), payment routing logic, and compliance checks — any domain where the answer is unambiguous and stable. ML powers Gmail’s spam filter, Netflix recommendations, Uber’s surge pricing, and Google Search — anything where the variety of inputs is too high for any human to enumerate.

Trade-offs:

  • Rule-based systems are brittle at scale — every new edge case requires a human to write a new rule
  • Rule conflicts grow combinatorially — N rules can create N² interaction problems
  • Cannot generalize — any situation not explicitly covered produces no answer or the wrong one
  • High maintenance burden — rules must be updated every time the world changes
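The contrast between the two paradigms can be sketched in a few lines of code. This is a toy illustration only — real spam filters use far richer features and models — and the keywords and examples are made up. Paradigm 1 is a human-written rule; Paradigm 2 derives crude word weights from labeled examples:

```python
from collections import Counter

# Paradigm 1: a human writes the rule.
def rule_based_is_spam(email: str) -> bool:
    text = email.lower()
    return "nigerian prince" in text and "attachment" in text

# Paradigm 2: the machine derives word weights from labeled examples.
def learn_spam_words(labeled_emails):
    """Count how often each word appears in spam vs. ham."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in labeled_emails:
        (spam_counts if is_spam else ham_counts).update(text.lower().split())
    return spam_counts, ham_counts

def learned_is_spam(email, spam_counts, ham_counts):
    """Score: does the email's vocabulary look more like spam or ham?"""
    score = sum(spam_counts[w] - ham_counts[w] for w in email.lower().split())
    return score > 0

examples = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting notes attached", False),
    ("lunch tomorrow at noon", False),
]
spam_c, ham_c = learn_spam_words(examples)
print(learned_is_spam("claim your free money", spam_c, ham_c))   # True
print(learned_is_spam("notes from the meeting", spam_c, ham_c))  # False
```

Notice that nobody told the learned version which words signal spam — it inferred them from the four labeled examples, which is the whole paradigm shift in miniature.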

Part 2: Why Rules Break Down — The Edge Case Explosion

Rule-based systems work beautifully right up until the real world shows up.

Consider a bank writing rules to detect fraudulent transactions. You start simple: flag transactions over $10,000, flag activity from new countries, flag 3am purchases. Reasonable — until a legitimate customer travels internationally, withdraws a large amount for a house down payment, or works a night shift. Now you’re flagging everything.

So you add exceptions — and exceptions to the exceptions. After six months you have 400 rules, and some of them contradict each other.

That’s the second problem: rule conflicts.

When two rules fire at the same time, which one wins? If a customer is a frequent international traveler and the transaction is unusually large and it’s to a new payee — do you flag it or not? Someone has to decide the priority order of every rule, for every possible combination. Miss one and you either let fraud through or block a legitimate customer.

The Signal: When your rule-based system has 400+ edge cases and counting, and you’re spending more time adjudicating conflicts than building — you’ve hit the ceiling of the rules paradigm.

[Image: A branching decision tree that starts simple with 3 rules, then explodes into dozens of overlapping branches with conflict annotations]

Real-World Examples: Insurance underwriting systems at large carriers have accumulated thousands of overlapping rules over decades — requiring dedicated “rules governance” teams just to manage conflicts. Early content moderation at Facebook was rule-based and required constant manual updates as edge cases multiplied. The explosion of conflicts was a primary driver toward ML-based moderation.

Trade-offs:

  • Rule explosion is inevitable — complexity grows faster than any team’s ability to manage it
  • No graceful degradation — the system gives a wrong answer, not an uncertain one
  • Audit trails are clear but maintenance costs compound every year

Part 3: Classical ML vs. Deep Learning — When to Use What

Not all pattern-finding is created equal. The ML world splits into two broad camps, and knowing which to reach for is one of the highest-leverage decisions a product team makes.

Classical Machine Learning

Classical ML algorithms — linear regression, decision trees, random forests, XGBoost — find patterns in structured, tabular data. The kind of data that lives in spreadsheets and SQL databases.

Mental Model: The Experienced Real Estate Agent

A seasoned agent can look at a house listing — square footage, bedrooms, zip code, year built — and predict the price within 10%. They’ve seen enough deals that they’ve internalized the relationship between these structured features and price. The table of numbers is enough — they don’t need to see a photo.

Classical ML is mature, fast, interpretable, and cheap to run. For a large number of real-world product problems, it is all you need.

[Image: Table of structured features (columns: bedrooms, sqft, zipcode, year) with an arrow pointing to a predicted price output]

Real-World Examples: Credit scoring at banks (FICO, Experian), churn prediction at SaaS companies (Salesforce), demand forecasting at retailers (Walmart, Amazon), fraud scoring at payment processors (Stripe, Adyen), ad click-through rate prediction at Google and Meta.

Trade-offs:

  • Requires features to be hand-crafted by a human — poor raw signal handling
  • Performance plateaus when underlying relationships are highly non-linear
  • Cannot process images, audio, or free-form text directly
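The real estate agent's intuition is, at its simplest, a line fit through past deals. Here is a minimal sketch of classical ML on structured data — ordinary least squares with one feature, in closed form. The prices and square footages are made up for illustration:

```python
# Fit price ~ sqft with ordinary least squares (closed form).
sqft  = [1000, 1500, 2000, 2500, 3000]
price = [200_000, 290_000, 410_000, 490_000, 610_000]

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n

# slope = covariance(x, y) / variance(x); intercept from the means
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price))
         / sum((x - mean_x) ** 2 for x in sqft))
intercept = mean_y - slope * mean_x

def predict(x):
    return intercept + slope * x

print(round(predict(1800)))  # 359200
```

Five rows of a table and a dozen lines of arithmetic — no GPUs, no labels beyond the prices themselves, and the "why" of any prediction is fully inspectable (slope and intercept).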
Deep Learning

Deep learning uses neural networks — layers of interconnected nodes that learn complex patterns from unstructured data: images, text, audio, video.

Mental Model: The Blind Men and the Elephant

Each blind man touches a different part — one feels the trunk and says “snake,” another feels the leg and says “tree.” Each perceives only a local fragment of the whole. A deep neural network stacks many layers of such local perceivers: the first detects tiny local patterns (edges in an image, individual words in text), the next combines them into bigger patterns (shapes, phrases), and so on until the final layer assembles complete understanding from raw, unstructured input.

Deep learning handles raw signals no spreadsheet could represent — but requires far more data and compute than classical ML.

[Image: Neural network diagram showing layers — raw pixels at left, through edge detection and shape detection layers, to final object classification at right]

Real-World Examples: Google Photos (image recognition), ChatGPT and Google Translate (language), Siri and Alexa (speech recognition), TikTok and YouTube (recommendations), Tesla and Waymo (autonomous driving perception).

Trade-offs:

  • Data-hungry — requires thousands to millions of labeled examples to train well
  • Compute-intensive — needs GPUs, costs significant money to train and serve
  • Black box — difficult to explain why a specific decision was made
  • Overkill for structured tabular data — classical ML often outperforms it there
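The "layers combining local patterns" idea can be shown without any framework. This is a hand-built forward pass, not a trained network — the weights are hand-picked purely to illustrate how one layer's detectors feed the next:

```python
# A minimal forward pass through two layers to show composition:
# layer 1 detects simple local patterns, layer 2 combines them.
# Weights are hand-picked for illustration, not learned.

def relu(v):
    return [max(0.0, x) for x in v]

def dense(inputs, weights, biases):
    """One fully connected layer: each output mixes every input."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

x = [0.0, 1.0, 1.0, 0.0]  # raw input (think: 4 pixel intensities)

# Layer 1: two "edge detectors" over the raw input
h = relu(dense(x, [[-1, 1, 0, 0], [0, 0, 1, -1]], [0.0, 0.0]))

# Layer 2: combines the edge detectors into one "shape" score
y = dense(h, [[1.0, 1.0]], [0.0])
print(h, y)  # [1.0, 1.0] [2.0]
```

In a real network the weights are learned via gradient descent (covered in [ML 1.b]), and there are millions of them — but the structure is exactly this: layers of weighted sums and non-linearities, each building on the one below.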
The Decision Matrix
Factor              | Classical ML                   | Deep Learning
--------------------|--------------------------------|------------------------------------
Data Type           | Structured (tables, numbers)   | Unstructured (text, images, audio)
Data Volume Needed  | Hundreds to thousands of rows  | Thousands to millions of examples
Training Time       | Minutes to hours               | Hours to weeks
Interpretability    | High — you can explain why     | Low — it’s a black box
Compute Cost        | Low — runs on a laptop         | High — needs GPUs
When to Use         | Clear features, tabular data   | Raw signals, complex patterns

Key Takeaway: Before reaching for an LLM or neural network, ask: “Can this problem be solved with a well-designed XGBoost model on structured data?” If yes, you’ll ship faster, spend less, and have a system you can explain to stakeholders. Reaching for deep learning when you don’t need it is one of the most expensive mistakes in AI product development.


Part 4: Supervised, Unsupervised, and Reinforcement Learning

Now that you know what kind of pattern-finder to use, the next question is how you teach it. There are three fundamentally different approaches.

Supervised Learning

Mental Model: The Flashcard Tutor

You show a student a card with a question on the front (“What’s in this image?”) and the answer on the back (“A cat”). After thousands of flashcards, the student gets answers right even on cards they’ve never seen. Supervised learning works exactly like this — labeled inputs paired with correct outputs, and the algorithm learns the mapping between them.

This is the most common form of ML. It comes in two flavors: Classification (output is a category — spam or not spam) and Regression (output is a number — predicted house price, estimated delivery time).

Real-World Examples: Gmail spam filtering, medical diagnosis assistance (tumor detection in radiology scans), Uber’s delivery time estimation, bank credit risk scoring, Amazon product review sentiment classification.

Trade-offs:

  • Requires labeled data — labeling at scale is expensive and time-consuming
  • Only as good as the quality and representativeness of the labels provided
  • Struggles with novel situations not represented in the training set
Unsupervised Learning

Mental Model: The Librarian Who Organizes by Instinct

You dump 10,000 unlabeled books on a table. A skilled librarian starts grouping them — romance here, science fiction there, cookbooks in that corner — just by reading the content. Nobody told them the categories. They discovered the structure in the data.

Unsupervised learning finds hidden patterns without being told what to look for. In product work it’s often a first step — you discover segments, then build supervised models for each.

Real-World Examples: Customer segmentation at marketing firms (Mailchimp, HubSpot), anomaly detection in cybersecurity (Darktrace), topic discovery in support ticket systems (Zendesk), user behavior clustering at Spotify and Netflix.

Trade-offs:

  • No ground truth — hard to evaluate whether discovered clusters are actually meaningful
  • Results require human interpretation and labeling after the fact
  • Rarely a standalone product — most useful as input to a downstream supervised model
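The librarian's instinct is, mechanically, a clustering algorithm. Here is a minimal 1-D k-means sketch — the session lengths are invented, and real clustering works in many dimensions, but the key point survives: no labels appear anywhere, yet the algorithm discovers the two groups on its own:

```python
# Minimal sketch of unsupervised learning: 1-D k-means with k=2.
def kmeans_1d(points, k=2, iters=20):
    centers = [min(points), max(points)]  # crude initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest center
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # move each center to the mean of its assigned points
        centers = [sum(c) / len(c) for c in clusters]
    return centers, clusters

# Session lengths in minutes: casual browsers vs. binge watchers (made up)
data = [2, 3, 4, 5, 58, 60, 62, 65]
centers, clusters = kmeans_1d(data)
print(centers)  # [3.5, 61.25]
```

The output is two segments nobody defined in advance — which is exactly why the results then need a human to interpret and name them ("casual" vs. "binge"), per the trade-offs above.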
Reinforcement Learning

Mental Model: The Dog Trainer

You don’t show a dog 10,000 labeled flashcards of “sit” and “not sit.” You let it try things — when it sits, it gets a treat; when it jumps on the couch, it gets a firm “no.” Over time it learns the strategy that maximizes treats. That’s reinforcement learning: an agent takes actions in an environment, receives rewards or penalties, and learns a policy that maximizes long-term reward.

There’s no “right answer” per input — only a score at the end. The agent must work out which actions contributed to winning or losing, like a chess player reviewing a game asking “which move was the mistake?”

Real-World Examples: DeepMind’s AlphaGo (beat the world Go champion), robotic arms in Amazon fulfillment centers, TikTok’s feed optimization for watch time, and RLHF — Reinforcement Learning from Human Feedback — the technique that turned raw GPT into ChatGPT by rewarding helpful, harmless responses.

Trade-offs:

  • Slow and data-hungry — requires massive trial-and-error cycles to converge
  • Reward signals can produce unexpected behaviors if poorly designed
  • Hard to debug — the learned policy is often opaque
  • Rarely used in standard product work but underpins every major LLM fine-tuning pipeline
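The dog-trainer loop fits in a short script. This is the simplest RL setting (a two-armed bandit, epsilon-greedy) with made-up payout probabilities — far simpler than policies over sequences of actions, but it shows the essential shape: no labels, only trial, reward, and an updated strategy:

```python
import random

random.seed(0)
true_payout = [0.3, 0.7]   # hidden from the agent
est = [0.0, 0.0]           # agent's running reward estimates
pulls = [0, 0]

for step in range(2000):
    if random.random() < 0.1:              # explore 10% of the time
        arm = random.randrange(2)
    else:                                  # otherwise exploit best guess
        arm = 0 if est[0] >= est[1] else 1
    reward = 1.0 if random.random() < true_payout[arm] else 0.0
    pulls[arm] += 1
    est[arm] += (reward - est[arm]) / pulls[arm]  # incremental mean

print(pulls)  # the agent ends up favoring the better arm
```

Note the explore/exploit split: without the 10% random exploration, the agent could lock onto whichever arm paid off first and never discover the better one — a tiny preview of how reward design shapes behavior.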

Part 5: The Bias-Variance Tradeoff — The One Concept That Rules Them All

If there’s one idea that governs every ML system — from a simple regression to GPT-4 — it’s this. Get it wrong and your model fails spectacularly or fails quietly. Both are expensive.

Bias: The Model That Doesn’t Know Enough

Mental Model: The Out-of-Depth Dinner Guest

The conversation turns to Middle East geopolitics and oil prices. You have zero background — no knowledge of OPEC dynamics, Gulf politics, or how sanctions affect supply. Whatever you say will be generic and miss the real drivers. You’re not wrong because of bad data — you’re wrong because your internal model of the world doesn’t have enough signal in this domain at all.

A model with high bias has underfitted — it’s too simple to capture the actual patterns. It makes broad, oversimplified predictions that miss what’s actually happening, even on data it was trained on.

Variance: The Model That Memorized Instead of Learned

Mental Model: The Overconfident Oil Trader

A veteran trader with 20 years on the desk has lived through 1990, 2008, 2014. Every time prices move in a certain pattern, he says: “I’ve seen this before — this ALWAYS leads to X.” He’s supremely confident. But he’s so anchored to historical patterns that he’s ignoring what’s genuinely different this time: new geopolitical actors, the green energy transition, fundamentally different demand structures. He memorized his training data so precisely that he can’t generalize to a new situation.

A model with high variance has overfitted — it learned the training data too well, including noise and quirks, and fails the moment the world looks even slightly different.

The Sweet Spot
Model Complexity Spectrum:

   Underfitting                    Sweet Spot                    Overfitting
   (High Bias)                                                  (High Variance)
       |                              |                              |
  "No clue about oil"         "Understands the market"     "Memorized 2008"
       |                              |                              |
  Too simple.                  Just right.                   Too rigid.
  Misses real patterns.        Generalizes to new data.     Breaks on new conditions.
  Fails even on training.      Handles novelty well.        Overconfident and wrong.

[Image: U-shaped error curve — high error on the left for underfitting, low in the middle for the sweet spot, high on the right for overfitting; x-axis labeled model complexity]

Real-World Examples: An overfitted fraud detection model works perfectly on last year’s fraud patterns but misses entirely when fraudsters change tactics. An underfitted churn model predicts “no churn” for almost everyone because it hasn’t learned enough signal from the data.

Trade-offs:

  • Too simple: misses real patterns, fails even on familiar data
  • Too complex: memorizes noise, fails on new data — the silent production killer
  • The sweet spot requires careful tuning of data volume, model complexity, and training duration
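Both failure modes can be reproduced in a few lines on synthetic data. Here the underfit model ignores the input entirely (predicts the training mean), and the overfit model memorizes the training set (1-nearest-neighbor). The data-generating line and noise level are arbitrary:

```python
import random
random.seed(1)

def make_data(n):
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 5 + random.gauss(0, 1)) for x in xs]

train, test = make_data(40), make_data(40)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfit: too simple — ignores x entirely
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# Overfit: memorizes the training set (1-nearest-neighbor lookup)
overfit = lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

print("underfit train/test MSE:",
      round(mse(underfit, train), 1), round(mse(underfit, test), 1))
print("overfit  train/test MSE:",
      round(mse(overfit, train), 1), round(mse(overfit, test), 1))
```

The signature appears immediately: the underfit model is bad on training and test alike, while the overfit model is perfect on training (zero error — it memorized) and worse on test. That train-perfect/test-worse gap is the diagnostic mentioned in the takeaway below.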

Key Takeaway: When a model works in testing but fails in production, overfitting is the first suspect. When a model barely works at all, underfitting is the diagnosis. Every model performance problem maps to one of these two — knowing which tells you exactly what lever to pull.


Part 6: The Problem with Pure Pattern-Based AI

If rules are so brittle, why not just use AI for everything?

Because pattern-based AI has its own sharp edge: it is not deterministic. The same input can produce slightly different outputs. More critically, AI gives you probabilities, not guarantees — “there’s an 87% chance this is fraud” is not the same as “this is fraud, full stop.”

Humans are wired for certainty. We want a binary answer and we want to know who is responsible when something goes wrong. A rule you wrote is your decision — you own it. A pattern the model found in 10 million transactions is… whose decision exactly?

This creates real friction in regulated industries (banking, healthcare, legal), in high-stakes decisions (hiring, lending, criminal justice), and anywhere a human needs to explain why a decision was made. “The model said so” is not acceptable in a compliance review.

And the bias-variance tradeoff compounds this: an overfitted model doesn’t know what it doesn’t know — it gives high-confidence predictions on situations it has never genuinely seen before. This is the root cause of AI hallucinations and misplaced confidence — a topic important enough to deserve its own post (coming up: [ML 1.c] When AI Gets Confidently Wrong).

Real-World Examples: Amazon scrapped an AI hiring tool in 2018 after discovering it systematically downgraded resumes from women — the model had learned patterns from historical male-dominated hiring data and encoded that bias. Several healthcare AI systems have shown high accuracy in trials but degraded significantly in deployment when real hospital data differed from training data in ways nobody anticipated.

Trade-offs:

  • Non-deterministic — same input can yield different outputs across runs
  • Probabilistic outputs require humans to set thresholds, introducing new judgment calls
  • Hard to audit — regulators in finance and healthcare require explainability that black-box models can’t easily provide
  • Encodes historical bias — training data that reflects past discrimination produces models that perpetuate it

Part 7: The Hybrid Approach — Best of Both Worlds

The sharpest AI systems in production today don’t choose between rules and patterns — they combine them. Rules handle well-understood, high-stakes, must-be-deterministic cases. AI handles the ambiguous, pattern-dependent edge cases.

Mental Model: The Chef, Revisited

The best restaurant kitchen isn’t pure recipe book and it isn’t pure improvisation. Core dishes have exact recipes everyone follows — consistency is what keeps customers coming back. But when a customer makes a special request — “I’m lactose intolerant, can you adapt the pasta?” — the chef draws on years of pattern knowledge and culinary instinct to handle it. The recipe book covers the 80%. The chef’s experience covers the rest.

Real-World Examples: Visa and Mastercard use hard rules for mandatory regulatory reporting (any transaction over $10,000 triggers a filing regardless of model output), while ML scores every transaction in real time to catch sophisticated fraud no rule would anticipate. Enterprise chatbots (Salesforce, Zendesk) use rules for deterministic intents (“cancel my order”) and ML for ambiguous ones (“I’m thinking of leaving”). YouTube uses hard rules to immediately remove content matching known illegal material, then ML to handle the context-dependent grey zone.

Trade-offs:

  • More complex to build and maintain than either pure approach
  • Requires deliberate design decisions about which cases belong to rules vs. ML
  • Boundary cases — where rules end and ML begins — need ongoing governance
  • Audit complexity: rule decisions are explainable, ML decisions less so
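The hybrid architecture reduces to a decision function with a clear boundary: deterministic rules first, model score for the ambiguous middle. The rules, thresholds, and `ml_score` stub below are all illustrative — in production the thresholds are tuned and the stub is a trained model:

```python
KNOWN_BAD_PAYEES = {"acme-mule-account"}  # hypothetical blocklist

def decide_transaction(txn, ml_score, report_threshold=10_000):
    # Hard rule: regulatory reporting is non-negotiable and deterministic.
    if txn["amount"] > report_threshold:
        return "flag_for_regulatory_report"
    # Hard rule: known-bad payees are blocked outright.
    if txn["payee"] in KNOWN_BAD_PAYEES:
        return "block"
    # Everything else goes to the model, with human-set thresholds.
    score = ml_score(txn)  # probability of fraud, in [0, 1]
    if score > 0.9:
        return "block"
    if score > 0.5:
        return "manual_review"
    return "approve"

stub_model = lambda txn: 0.12  # stand-in for a real trained model

print(decide_transaction({"amount": 15_000, "payee": "landlord"}, stub_model))
# flag_for_regulatory_report
print(decide_transaction({"amount": 80, "payee": "coffee shop"}, stub_model))
# approve
```

The governance burden the trade-offs list describes lives in exactly two places here: which cases the hard rules claim, and where the score thresholds sit — both are explicit, reviewable lines of code rather than buried model behavior.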

Rules vs. Patterns vs. Hybrid — The Full Picture

Dimension                | Rule-Based                          | Pattern-Based (ML)                | Hybrid
-------------------------|-------------------------------------|-----------------------------------|-------------------------------------
How it works             | Humans write if-then logic          | Machine learns from examples      | Rules for core, ML for edges
Output type              | Deterministic                       | Probabilistic                     | Deterministic where required
Handles edge cases?      | Poorly                              | Well                              | Best of both
Rule conflicts           | Real problem                        | Not applicable                    | Contained to defined scope
Scales with complexity?  | No                                  | Yes                               | Yes
Interpretability         | High                                | Low                               | Mixed
Accountability           | Clear                               | Murky                             | Clear where it counts
Regulatory compliance    | Easy to audit                       | Hard to audit                     | Auditable where required
Best for                 | Simple, stable, compliance-critical | Ambiguous, high-variety, evolving | Most real production systems
Real examples            | Tax calc, compliance limits         | Spam filter, image recognition    | Fraud detection, content moderation

How It All Connects

Step 1: Can explicit rules solve this reliably?
         |                         |
        YES                        NO
         |                         |
   Write code.              You need ML.
   Ship it.                        |
                      Step 2: Are there high-stakes cases
                              requiring guaranteed answers?
                         |                    |
                        YES                   NO
                         |                    |
                  Hybrid: rules for      Pure ML approach
                  those, ML for rest            |
                                     Step 3: What is your data?
                                         |              |
                                    Structured      Unstructured
                                    (tables)     (text/images/audio)
                                         |              |
                                   Classical ML    Deep Learning
                                              |
                               Step 4: How does the system learn?
                                  |              |              |
                            Labeled data?   No labels?   Reward signal?
                                  |              |              |
                             Supervised   Unsupervised  Reinforcement
                                              |
                               Step 5: Tune for the bias-variance
                                       sweet spot.
                                       Monitor in production.
                                       Iterate.

Common Misconceptions

“AI is always better than rules — just use AI for everything.” This is the most dangerous oversimplification in product AI. Rules are faster, cheaper, fully auditable, and deterministic. For well-defined stable problems — compliance logic, payment routing, eligibility checks — rules are the right tool. Using ML where rules suffice adds cost, opacity, and failure modes you didn’t need.

“More data always fixes the problem.” Not if the data is mislabeled, biased, or irrelevant. 1,000 clean representative examples routinely outperform 100,000 noisy ones. Data quality beats data quantity almost every time.

“Deep learning is always the most powerful approach.” On structured tabular data — which is the majority of business data — XGBoost and similar classical algorithms routinely outperform deep learning. Using an LLM to predict churn from a user behavior table is like hiring a surgeon to put on a bandage.

“The model is wrong, so we need a better algorithm.” Nine times out of ten, the problem is the data, not the algorithm. Garbage in, garbage out is the oldest truth in computing and it applies doubly to ML.

“AI gives confident answers, so confident answers are correct.” AI gives probabilities, not truths. A model can be 95% confident and completely wrong — especially for inputs that differ from the training distribution. High confidence is not the same as high accuracy.


Summary

  • Rules work until they don’t. Powerful and predictable for stable problems — but edge case explosion and rule conflicts make them unmanageable at scale.
  • Pattern-based AI handles what rules can’t. ML learns from examples and generalizes — but gives probabilities, not guarantees, and encodes whatever bias exists in the training data.
  • Classical ML vs. deep learning is a data type decision. Structured tabular data → classical ML. Unstructured data → deep learning. Wrong tool choice wastes money and time.
  • How you teach matters as much as which model you pick. Supervised needs labels. Unsupervised discovers structure. Reinforcement learns through reward signals.
  • Bias and variance are your diagnostic tools. Underfitting — too simple, misses patterns. Overfitting — memorized noise, fails on new data. Every model performance problem maps to one of these two.
  • Pure AI has real failure modes. Non-determinism, misplaced confidence, and accountability gaps make pure ML the wrong choice for high-stakes or regulated decisions.
  • Hybrid is how serious production systems are built. Rules for the deterministic core. ML for the ambiguous edges.

Concept                 | Mental Model                  | One-Liner
------------------------|-------------------------------|--------------------------------------------------------
Rules vs. Patterns      | The Restaurant Kitchen        | Recipes vs. a chef who’s tasted 10,000 dishes
Rule Conflicts          | The 400-Rule Bank             | Two rules fire at once — which one wins?
Classical ML            | The Real Estate Agent         | Predicts price from a table of structured features
Deep Learning           | Blind Men and the Elephant    | Layers build understanding from raw unstructured input
Supervised Learning     | The Flashcard Tutor           | Learn from labeled question-answer pairs
Unsupervised Learning   | The Librarian                 | Discover structure without being told what to look for
Reinforcement Learning  | The Dog Trainer               | Learn a strategy through reward and penalty
Bias / Underfitting     | The Out-of-Depth Dinner Guest | No domain knowledge — can’t even start
Variance / Overfitting  | The Overconfident Oil Trader  | Memorized the past, blind to what’s new
Hybrid Systems          | The Chef + Recipe Book        | Rules for the core, pattern knowledge for the edges

Final Thought

The shift from rules to patterns is the single biggest mental model change for anyone coming from traditional software. Once you internalize it, everything else in AI — embeddings, transformers, RAG pipelines, agents — becomes a variation on the same theme: teaching machines to find patterns instead of writing rules by hand.

Three things to carry with you:

  1. Start with the simplest tool that works. Rules first. Classical ML if rules fail. Deep learning only when the data demands it. Complexity has a cost that compounds in production.
  2. The data is the product. In traditional software the code is the asset. In ML the data is the asset. The quality, diversity, and representativeness of your training data matters more than which algorithm you pick.
  3. Bias-variance is your north star. Every performance failure in an ML system maps back to this tradeoff. Learn to diagnose which side you’re on and you’ll always know what lever to pull.

In the next post we go deeper into the foundational data structure of modern AI: embeddings and vector spaces. This is how machines translate human concepts — words, images, preferences — into math they can actually work with. It’s the key that unlocks semantic search, recommendation engines, and the entire RAG architecture.


Math Reference

The concepts in this post are powered by a small set of formulas. You do not need to work these to understand the intuition — the body of this post covered that. But if you encounter them in a paper, textbook, or codebase, here is what each one is.

Loss Function (generic form)
L = (1/n) * SUM[ difference between predicted and actual ]

What it measures: The single number a model is trying to minimize during training — the average gap between what the model predicts and what actually happened.


Mean Squared Error (MSE)
MSE = (1/n) * SUM[ (y_actual - y_predicted)^2 ]

What it measures: The average squared gap between actual and predicted values — the most common loss function for regression problems.


Bias-Variance Decomposition
Total Error = Bias^2 + Variance + Irreducible Noise

What it measures: The three sources of prediction error in any ML model — systematic wrongness (bias), sensitivity to training data fluctuations (variance), and noise that no model can eliminate.
