AI Paradigm Shift: From Rules to Patterns
Post 1/N
Every piece of software you’ve ever shipped or seen shipped works the same way. A developer sits down, thinks through the logic, and writes explicit rules. If the user clicks here, do this. If the input is greater than 100, reject it. If the date is past the deadline, send an email.
This approach has built the entire digital world – from banking systems to flight booking engines to the app on your phone. But it has one fatal flaw: it breaks the moment the problem gets ambiguous. Try writing explicit rules to detect sarcasm in a customer review. Or to identify a cat in a photo. Or to predict which ad a user will click next Tuesday at 2pm. You can’t – because the rules are too complex, too numerous, and too fluid for any human to write down.
This is the paradigm shift that machine learning represents. Instead of humans writing rules, we show the machine thousands (or millions) of examples and let it figure out the rules on its own. This post will give you the mental model for understanding this shift, when classical ML is enough versus when you need deep learning, and the single most important tradeoff that governs every AI system ever built. No equations – just intuition.
The Two Paradigms: Rules vs. Patterns
Let’s start with the clearest possible distinction.
Mental Model: The Restaurant Kitchen
Traditional software is like a kitchen that runs on a recipe book. Every dish has an exact recipe. The chef follows it step by step. If a customer asks for something not in the book, the kitchen can’t make it. This is rule-based programming – powerful, predictable, and completely helpless when the recipe book doesn’t cover the situation.
Machine learning is like hiring a chef who has eaten at 10,000 restaurants. You don’t give them recipes. Instead, you show them 500 plates of “good pasta” and 500 plates of “bad pasta” and say, “figure out what makes pasta good.” They taste patterns – salt levels, texture, sauce ratios – that no recipe book would ever capture. They can even handle a pasta dish they’ve never seen before, because they’ve internalized the pattern of what “good” means.
This is the fundamental shift. Traditional software encodes human knowledge as rules. Machine learning encodes human examples as patterns.
Traditional Software:
Input → [Human-Written Rules] → Output
"IF email contains 'Nigerian prince' AND has attachment THEN → spam"
Machine Learning:
Input + Labeled Examples → [Algorithm Finds Patterns] → Output
"Here are 100,000 emails labeled spam/not-spam. YOU figure out the rules."
Where It Matters for PMs and Engineers: The moment you find yourself writing increasingly complex if-else chains, or your rule-based system has 400 edge cases and counting, that’s your signal. You’ve hit the ceiling of the rules paradigm. The problem likely needs a pattern-based approach.
Classical ML vs. Deep Learning: When to Use What
Not all pattern-finding is created equal. The ML world splits into two broad camps, and knowing which one to reach for is one of the highest-leverage decisions a product team makes.
Classical Machine Learning
Classical ML algorithms – think linear regression, decision trees, random forests, XGBoost – are pattern-finders that work on structured, tabular data. The kind of data that lives in spreadsheets and SQL databases.
Mental Model: The Experienced Real Estate Agent
A good real estate agent can look at a house listing – square footage, number of bedrooms, zip code, year built – and predict the price within 10%. They’ve seen enough deals that they’ve internalized the relationship between these features and the price. They don’t need to see a photo of the house. The structured data is enough.
That’s classical ML. You give it a table of features (columns) and outcomes (labels), and it finds the mathematical relationship between them.
Classical ML is mature, fast, interpretable, and cheap to run. For a shocking number of real-world product problems, it’s all you need.
Real-World Examples:
- Credit scoring (Will this person default on a loan?)
- Churn prediction (Will this user cancel their subscription?)
- Demand forecasting (How many units will we sell next quarter?)
- Fraud detection (Is this transaction suspicious?)
- Ad click-through rate prediction (Will this user click this ad?)
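The real-estate intuition reduces to a one-feature sketch: fit a line through past (square footage, price) deals, then price an unseen listing. The numbers below are made up, and a single-feature least-squares fit stands in for what libraries like scikit-learn or XGBoost do over dozens of features:

```python
# Toy "real estate agent": learn price-per-sqft from past deals.
# Ordinary least squares for one feature: y = slope * x + intercept.
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Past deals: square footage -> sale price (invented data).
sqft   = [1000, 1500, 2000, 2500, 3000]
prices = [200_000, 290_000, 410_000, 500_000, 590_000]

slope, intercept = fit_line(sqft, prices)
predicted = slope * 1800 + intercept  # an unseen 1,800 sqft listing
print(round(predicted))  # 358400
```

The "training" here is just finding the coefficients that best explain past outcomes – which is, at its core, what every classical ML model does.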
Deep Learning
Deep learning uses neural networks – layers of interconnected nodes that can learn incredibly complex patterns from unstructured data: images, text, audio, video.
Mental Model: The Blind Men and the Elephant
You’ve heard the story. Six blind men each touch a different part of an elephant. One feels the trunk and says “it’s a snake.” Another feels the leg and says “it’s a tree.” Each one detects a local pattern but misses the whole picture.
A deep neural network is like stacking many layers of these blind men. The first layer detects tiny local patterns (edges in an image, individual words in text). The second layer combines those into bigger patterns (shapes, phrases). The third layer combines those into even bigger concepts (faces, sentences). By the time you reach the final layer, the network has assembled a complete understanding from raw, unstructured input – something no single rule or simple algorithm could do.
Real-World Examples:
- Image recognition (Google Photos, medical imaging)
- Natural language processing (ChatGPT, Google Translate)
- Speech recognition (Siri, Alexa)
- Recommendation systems (YouTube, TikTok’s “For You” page)
- Autonomous driving (Tesla, Waymo)
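The layer-stacking idea can be made concrete with a deliberately tiny network. Real networks *learn* their weights from data; here they are set by hand so the composition is visible. Layer 1 detects simple patterns (OR, AND), and layer 2 combines them into a concept neither unit captures alone (XOR) – the same way edge detectors combine into shape detectors:

```python
# A hand-wired two-layer network, illustrating layered pattern building.
def step(x: float) -> int:
    """A hard threshold activation: fire (1) if the input is positive."""
    return 1 if x > 0 else 0

def xor_net(a: int, b: int) -> int:
    h_or  = step(a + b - 0.5)        # layer 1, unit 1: fires if a OR b
    h_and = step(a + b - 1.5)        # layer 1, unit 2: fires if a AND b
    return step(h_or - h_and - 0.5)  # layer 2: "OR but not AND" = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))  # 0, 1, 1, 0
```

A single layer cannot represent XOR at all – the stacking is what unlocks it. Scale this to millions of learned units and you get face detectors and language models.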
The Decision Matrix
| Factor | Classical ML | Deep Learning |
|---|---|---|
| Data Type | Structured (tables, numbers) | Unstructured (text, images, audio) |
| Data Volume Needed | Hundreds to thousands of rows | Thousands to millions of examples |
| Training Time | Minutes to hours | Hours to weeks |
| Interpretability | High (you can explain why) | Low (it’s a black box) |
| Compute Cost | Low (runs on a laptop) | High (needs GPUs) |
| When to Use | Clear features, tabular data | Raw signals, complex patterns |
The PM Takeaway: Before you pitch an LLM-powered feature, ask yourself: “Can this problem be solved with a well-designed XGBoost model on structured data?” If yes, you’ll ship faster, spend less, and have a system you can actually explain to stakeholders. Not every problem needs a neural network, and reaching for one when you don’t need it is one of the most expensive mistakes in AI product development.
Supervised vs. Unsupervised vs. Reinforcement Learning: Teaching with and without Answers
Now that you know what kind of pattern-finder to use, the next question is how you teach it.
Supervised Learning
Mental Model: The Flashcard Tutor
Imagine tutoring a student with flashcards. You show them a card with a question on the front (“What’s in this image?”) and the answer on the back (“A cat”). After hundreds of flashcards, the student starts getting the answers right on their own – even for cards they’ve never seen.
Supervised learning works exactly like this. You provide the algorithm with labeled data – inputs paired with correct outputs – and it learns the mapping between them.
This is the most common and well-understood form of ML. Every time you see “training data,” it almost always means labeled examples.
Two flavors:
- Classification: The output is a category. “Is this email spam or not?” “Is this tumor malignant or benign?”
- Regression: The output is a number. “What will this house sell for?” “How many minutes will delivery take?”
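The flashcard idea in code: the simplest supervised classifier just memorizes labeled examples and answers a new "card" by finding the most similar one it has seen (1-nearest-neighbor). The features and labels below are invented for illustration – say, (average daily logins, support tickets) labeled with whether the user stayed:

```python
import math

# "Flashcards": labeled examples of (avg daily logins, support tickets).
flashcards = [
    ((9.0, 0.0), "stays"),
    ((8.0, 1.0), "stays"),
    ((1.0, 5.0), "churns"),
    ((0.5, 4.0), "churns"),
]

def predict(point: tuple[float, float]) -> str:
    # 1-nearest neighbor: answer with the label of the closest known card.
    return min(flashcards, key=lambda card: math.dist(card[0], point))[1]

print(predict((7.5, 0.5)))  # near the "stays" examples
print(predict((1.2, 6.0)))  # near the "churns" examples
```

This is classification (the output is a category). Swap the string labels for numbers and average the nearest neighbors, and the same idea becomes regression.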
Unsupervised Learning
Mental Model: The Librarian Who Organizes by Instinct
Imagine you dump 10,000 books on a table with no labels, no categories, no Dewey Decimal numbers. A skilled librarian would start grouping them – this stack is romance, that stack is science fiction, these are cookbooks – just by reading the content. Nobody told them the categories. They discovered the structure.
That’s unsupervised learning. The algorithm finds hidden patterns and groupings in data without being told what to look for.
Real-World Examples:
- Customer segmentation (Grouping users by behavior for marketing)
- Anomaly detection (Finding the one weird transaction in millions of normal ones)
- Topic discovery (What are people talking about in these 50,000 support tickets?)
Where It Gets Practical: In product work, unsupervised learning is often a first step, not a final product. You cluster your users to discover segments, then build supervised models for each segment. You detect anomalies to flag potential fraud, then a human reviews the flags.
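The librarian's instinct has a classic algorithmic counterpart: k-means clustering. Here's a minimal one-dimensional sketch with invented session-length data – note that no labels appear anywhere; the two groups emerge on their own:

```python
# The "librarian" sketch: 1-D k-means clustering in pure Python.
def kmeans_1d(points: list[float], centers: list[float], iters: int = 10) -> list[float]:
    for _ in range(iters):
        clusters: list[list[float]] = [[] for _ in centers]
        for p in points:  # assign each point to its nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # move each center to the mean of its assigned points
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Hypothetical session lengths in minutes: a quick-check group
# and a binge group, never labeled as such.
sessions = [2, 3, 4, 3, 45, 50, 48, 52]
print(kmeans_1d(sessions, centers=[0.0, 10.0]))  # [3.0, 48.75]
```

The algorithm discovers "short sessions" and "long sessions" as structure in the data – naming those segments and deciding what to do with them is still your job.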
Reinforcement Learning
Mental Model: The Dog Trainer
You don’t show a dog 10,000 labeled flashcards of “sit” and “not sit.” Instead, you let the dog try things. When it sits, it gets a treat. When it jumps on the couch, it gets a firm “no.” Over time, the dog learns a strategy – a sequence of behaviors that maximizes treats and minimizes scolding. Nobody gave it a rulebook. Nobody showed it labeled examples. It learned by doing and getting feedback.
That’s reinforcement learning. An agent takes actions in an environment, receives rewards or penalties, and learns a strategy (called a policy) that maximizes long-term reward.
The key difference from supervised learning: there’s no “right answer” for each input. There’s only a score at the end. The agent has to figure out which of its many actions along the way contributed to winning or losing – like a chess player reviewing a game and asking “which move was the mistake?”
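The simplest RL setting – a multi-armed bandit – makes this concrete. Below, an epsilon-greedy "dog" chooses among three tricks with hidden treat probabilities and learns from reward alone. All the numbers (probabilities, epsilon, step count) are made up for illustration:

```python
import random

random.seed(42)
treat_probs = [0.2, 0.5, 0.8]  # hidden from the agent
estimates = [0.0, 0.0, 0.0]    # the agent's learned value of each trick
counts = [0, 0, 0]

for _ in range(2000):
    if random.random() < 0.1:   # explore: try a random trick
        action = random.randrange(3)
    else:                        # exploit: do the best-known trick
        action = max(range(3), key=lambda a: estimates[a])
    # The environment answers with a reward, never a "correct label".
    reward = 1 if random.random() < treat_probs[action] else 0
    counts[action] += 1
    # Update a running average of the reward observed for this trick.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(range(3), key=lambda a: estimates[a]))  # the trick it learned to favor
```

Notice there is no labeled dataset anywhere – just actions, rewards, and a strategy that improves with feedback. Full RL adds the credit-assignment problem across long action sequences, but the loop is the same.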
Real-World Examples:
- Game-playing AI (AlphaGo, OpenAI Five for Dota 2)
- Robotics (teaching a robot arm to pick up objects through trial and error)
- Recommendation feeds (TikTok’s algorithm optimizing for watch time)
- RLHF – Reinforcement Learning from Human Feedback – the technique that turned raw GPT into ChatGPT by rewarding helpful, harmless responses
The PM Takeaway: You won’t reach for RL in most product work – it’s data-hungry, slow to train, and hard to debug. But you should know it exists because it powers two things you’ll encounter constantly: the fine-tuning behind every major LLM (RLHF), and the optimization engines behind recommendation and ad-ranking systems. We’ll go deep on RL mechanics in Post 17.
The Bias-Variance Tradeoff: The One Concept That Rules Them All
If there’s one idea that governs every ML system – from a simple linear regression to GPT-4 – it’s this tradeoff. Get it wrong, and your model either fails spectacularly or fails quietly. Both are bad.
Mental Model: The Portrait Artist
Imagine you hire an artist to draw a portrait of your face from a photo.
High Bias (Underfitting): The artist draws a generic smiley face. It vaguely looks like a human, but it doesn’t capture you at all. The artist has oversimplified – they used a mental model too simple to capture the complexity of your actual face. This is a model that’s too simple for the data.
High Variance (Overfitting): The artist draws an incredibly detailed portrait – but they included the pimple on your nose, the ketchup stain on your shirt, and the shadow from the lamp behind you. The portrait looks exactly like the photo, but if you took a new photo the next day, the portrait would look nothing like you. The artist memorized the specific photo instead of learning what you actually look like. This is a model that’s too complex and has memorized the training data, including its noise.
The Sweet Spot: A skilled artist captures the essential features of your face – bone structure, eye shape, expression – while ignoring the accidental details. That’s what a well-tuned model does.
Model Complexity Spectrum:

| | Underfitting (High Bias) | Sweet Spot | Overfitting (High Variance) |
|---|---|---|---|
| The portrait | "Generic smiley" | "Captures your essence" | "Memorized the photo" |
| The model | Too simple | Just right | Too complex |
| The pattern | Misses real patterns | Generalizes well | Memorizes noise |
| The result | Fails even on training data | Works on new data | Fails on new data |
Why This Matters in Production:
Overfitting is the silent killer of AI products. Your model crushes the test set during development, everyone celebrates, and then it falls apart in production because real-world data is messier, more diverse, and constantly shifting compared to the neat training set.
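You can watch this failure mode happen in a few lines. Below, a "memorizer" (an extreme overfit that recalls every training pair exactly) is compared against a simple fitted line, on invented data where the truth is y = 2x plus noise:

```python
import random

random.seed(0)

# Training data: y = 2x plus noise. Test data: unseen inputs, no noise.
train = [(float(x), 2 * x + random.uniform(-1, 1)) for x in range(20)]
test  = [(x + 0.5, 2 * (x + 0.5)) for x in range(20)]

lookup = dict(train)  # the overfit model: perfect recall of training pairs

def memorizer(x: float) -> float:
    # Exact on training inputs; falls back to the nearest memorized input.
    return lookup[x] if x in lookup else lookup[min(lookup, key=lambda k: abs(k - x))]

def fit_slope(data: list[tuple[float, float]]) -> float:
    # Least-squares line through the origin: the "simple" model.
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

slope = fit_slope(train)

def mse(model, data) -> float:
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print("memorizer, train error:", mse(memorizer, train))  # exactly 0.0 -- looks perfect
print("memorizer, test error: ", round(mse(memorizer, test), 2))
print("line,      test error: ", round(mse(lambda x: slope * x, test), 2))
```

The memorizer scores a flawless zero on the data it trained on and then loses badly on new inputs – exactly the "crushes the test set, falls apart in production" pattern, compressed into a toy.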
Every decision in ML – how much data to collect, how complex a model to use, when to stop training, which features to include – is ultimately a decision about where to sit on this spectrum.
The PM Takeaway: When your data science team says “the model is overfitting,” what they mean is: “It memorized the training data instead of learning the real pattern, so it won’t work on new users.” When they say “it’s underfitting,” they mean: “The model is too simple – we need more features, more data, or a more powerful algorithm.” Your job is to help them get the right data and set the right success criteria so the model lands in the sweet spot.
How It All Connects
Here’s how these concepts form a decision tree for any AI product initiative:
Step 1: Can you solve this with explicit rules?
- YES → Write code. Ship it.
- NO → You need ML. Go to Step 2.

Step 2: What's your data?
- Structured (tables) → Classical ML (XGBoost, etc.)
- Unstructured (text, images, audio) → Deep Learning (neural networks)

Step 3: How does the system learn?
- Labeled data → Supervised Learning
- No labels → Unsupervised Learning
- Reward signal → Reinforcement Learning

Step 4: Tune for the bias-variance sweet spot. Monitor in production. Iterate.
This is a simplification, of course – real products mix and match. A recommendation system might use unsupervised clustering to segment users, then supervised models to rank content within each segment, with deep learning for processing the content itself. But this decision tree gives you the mental scaffolding to navigate those conversations.
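For fun, the whole decision tree fits in one (deliberately simplified) triage function – the argument names and return strings are mine, not an industry standard:

```python
# The four-step decision tree, encoded as a tiny triage helper.
def recommend_approach(rules_suffice: bool, data_kind: str, signal: str) -> str:
    if rules_suffice:                                   # Step 1
        return "traditional software"
    family = ("classical ML (e.g. XGBoost)"             # Step 2
              if data_kind == "structured" else "deep learning")
    learning = {                                        # Step 3
        "labels": "supervised",
        "no labels": "unsupervised",
        "reward": "reinforcement",
    }[signal]
    return f"{family}, {learning} learning"

print(recommend_approach(False, "structured", "labels"))
print(recommend_approach(False, "unstructured", "reward"))
```

Step 4 – tuning, monitoring, iterating – is the part no helper function can do for you.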
Common Misconceptions
“More data always helps.” Not if the data is mislabeled, biased, or irrelevant. 1,000 clean, representative examples often outperform 100,000 noisy ones. Data quality beats data quantity almost every time.
“Deep learning is always better than classical ML.” On structured tabular data, XGBoost and similar algorithms routinely beat deep learning in Kaggle competitions. Deep learning shines on unstructured data. Using an LLM to predict churn from a user behavior table is like using a sledgehammer to hang a picture frame.
“The model is wrong, so we need a better algorithm.” Nine times out of ten, the problem is the data, not the algorithm. Garbage in, garbage out is the oldest truth in computer science, and it applies doubly to ML.
“AI replaces human judgment.” AI augments human judgment. The best AI products are designed with humans in the loop – especially for high-stakes decisions. The model predicts, the human decides.
The Mental Models – Your Cheat Sheet
| Concept | Mental Model | One-Liner |
|---|---|---|
| Rules vs. Patterns | The Restaurant Kitchen | Recipes vs. a chef who’s tasted 10,000 dishes |
| Classical ML | The Real Estate Agent | Predicts price from structured features |
| Deep Learning | Blind Men and the Elephant | Layers build understanding from raw input |
| Supervised Learning | The Flashcard Tutor | Learn from labeled question-answer pairs |
| Unsupervised Learning | The Librarian | Discover structure without labels |
| Underfitting (High Bias) | The Generic Smiley Face | Too simple to capture real patterns |
| Overfitting (High Variance) | The Over-Detailed Portrait | Memorized noise instead of signal |
| The Sweet Spot | The Skilled Portrait Artist | Captures essence, ignores accidents |
| Reinforcement Learning | The Dog Trainer | Learn a strategy with reward and penalty |
Final Thought
The paradigm shift from rules to patterns is the single biggest mental model change for anyone coming from traditional software engineering. Once you internalize it, everything else in AI – from embeddings to transformers to RAG pipelines – becomes a variation on the same theme: teaching machines to find patterns instead of writing rules.
Three things to carry with you:
- Start simple. If rules work, use rules. If classical ML works, use classical ML. Reach for deep learning only when the problem demands it. The best AI product teams are disciplined about using the simplest tool that solves the problem.
- The data is the product. In traditional software, the code is the asset. In ML, the data is the asset. The quality, diversity, and representativeness of your training data matters more than which algorithm you pick. If you’re a PM, fight harder for data quality than for a fancier model.
- Bias-variance is your north star. Every performance issue in an ML system – from a model that doesn’t work at all to one that works in testing but fails in production – maps back to this tradeoff. Learn to diagnose which side you’re on, and you’ll always know what lever to pull.
In the next post, we’ll go deeper into the foundational data structure of modern AI: embeddings and vector spaces. This is how machines translate human concepts – words, images, preferences – into math they can actually work with. It’s the key that unlocks semantic search, recommendation engines, and the entire RAG architecture. See you there.