Building AI Intuition

Connecting the dots...

Concepts

The curious case of R-Squared: Keep Guessing

By Archit Sharma
5 Min Read

Most explanations of R-squared start with a formula:

R² = 1 − (SS_res / SS_tot)

Then they say something like “the proportion of variance explained by the model” and move on. And you nod, and you write it down, and somewhere in the back of your head a small voice says: what does “explained” actually mean?

I went down this rabbit hole recently, and the answer turned out to be much simpler — and much more useful — than the textbook framing. So here it is, the way I wish someone had told me on day one.


The Question Nobody Asks

Mean Squared Error is straightforward. For every house in your dataset, you take the model’s predicted price, subtract the actual price, square it, and average. The smaller the number, the closer your predictions are to reality. Done.

We square the errors in MSE so that negative errors don't artificially cancel out positive ones. And to bring the metric back into real-world units, we can take a square root at the end, giving RMSE – root mean squared error.
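That computation, sketched in plain Python with made-up prices (all numbers here are illustrative, not from real data):

```python
# MSE and RMSE on a toy dataset of house prices (in $1,000s; made up).
actual = [250, 310, 190, 420, 280]
predicted = [240, 330, 200, 400, 290]

# Squaring keeps negative and positive errors from cancelling each other.
mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# The square root brings the error back to the original units ($1,000s).
rmse = mse ** 0.5

print(mse)   # 220.0
print(rmse)  # ~14.83, i.e. off by about $14,830 per house
```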

But here’s the thing: an MSE of 1,800 means nothing on its own. Is that good? Bad? It depends entirely on what you’re predicting. An MSE of 1,800 on house prices in dollars is excellent. An MSE of 1,800 on shoe sizes is a disaster.

MSE has no built-in sense of scale. It tells you how wrong you are in absolute terms, but it can’t tell you whether your model is doing anything clever — or whether you’d have done about the same by flipping a coin.

That’s the gap R-squared fills.


The Naive Baseline

Imagine you have ten houses and you have to predict their prices, but you’re not allowed to look at any features — no square footage, no bedrooms, no zip code. Nothing.

What’s the best you can do?

Guess the average. For every single house, you predict the mean price of all ten. It’s a lazy, dumb prediction — but it’s the best you can do without information. Some houses you’ll overshoot, some you’ll undershoot, and the average squared error you rack up is exactly the variance of your dataset.

That number — the squared error you get from blindly predicting the mean — is your floor. It’s what life looks like with zero intelligence.

Now you build a real model. It looks at features. It learns weights. It produces a new MSE. And the only question worth asking is:

How much better is my model than the lazy version that just guesses the average?

That’s R-squared. That’s the whole thing.

R² = 1 − (model's MSE / variance of the data)
        \_______________/
         "what fraction of the dumb baseline error
          did the model manage to get rid of?"

If the model cuts the lazy baseline error by 70%, R² = 0.7. If it cuts it by 36%, R² = 0.36. If it does no better than the mean, R² = 0. If it’s worse than the mean (yes, this happens), R² goes negative.
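The whole calculation fits in a few lines. A sketch with toy numbers (not from the post), including the R² = 0 case:

```python
# R² as "what fraction of the lazy baseline error did the model remove?"
prices = [200, 300, 250, 400, 350]   # actual values (made up)
preds  = [210, 280, 260, 390, 360]   # some model's predictions (made up)

n = len(prices)
mean = sum(prices) / n

# The lazy baseline: predict the mean for everyone.
# Its mean squared error is exactly the variance of the data.
variance = sum((y - mean) ** 2 for y in prices) / n

model_mse = sum((p - y) ** 2 for p, y in zip(preds, prices)) / n

r2 = 1 - model_mse / variance
print(r2)  # ~0.968: the model removed about 97% of the baseline error

# A "model" that just predicts the mean scores exactly zero.
baseline_r2 = 1 - variance / variance
print(baseline_r2)  # 0.0
```

A model with a larger MSE than the variance would push that ratio above 1, which is how R² goes negative.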


The Circle Analogy

Here’s the picture I find easiest to hold in my head.

              N
              *
              |
              |
    W *-------+-------* E  ← four houses on a circle of radius r
              |              around the true center (the mean)
              |
              *
              S

Imagine four houses sitting at the four cardinal points of a circle, radius r, centered on the average price. If your “model” is just predict the mean for everyone, then every house is r away from your prediction. Each squared error is r², and the total squared error across the four houses is 4r². That is your baseline — the total sum of squares. That is the floor.

Now a real model shows up. For each house, it predicts a point on a smaller, concentric circle of radius r/2 — closer to each true house than the center was. Each squared error is now (r/2)² = r²/4. Across four houses, the total is r².

Plug it in:

R² = 1 − (r² / 4r²) = 1 − 0.25 = 0.75

Your model explains 75% of the spread. Meaning: of the original 4r² of “wrongness” you’d have eaten by guessing the mean, the model has eliminated three-quarters of it. The remaining quarter is what the model still can’t account for.

Suddenly “explained variance” stops sounding like jargon and starts sounding like what it actually is: how much of the original mess did we manage to clean up? Or: how much closer are our estimates to the real values, compared to just predicting the mean every time?
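The arithmetic of the circle picture checks out numerically (radius r = 10 chosen arbitrarily):

```python
r = 10.0

# Baseline: predict the center (the mean) — each of the 4 houses is r away.
baseline_sq_error = 4 * r ** 2        # 4r²

# Real model: predictions on the inner circle — each house is r/2 away.
model_sq_error = 4 * (r / 2) ** 2     # 4 * (r/2)² = r²

r_squared = 1 - model_sq_error / baseline_sq_error
print(r_squared)  # 0.75
```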


A Detail Worth Noticing

Here’s a fun follow-up. What if the model’s predictions sit on a circle of radius 1.5r — bigger than the original spread, on the opposite side of each true house?

Each squared error is still (0.5r)² = r²/4. Total is still r². R² is still 0.75.

R-squared doesn’t care which side of the truth your prediction sits on, or whether your prediction is bigger or smaller than the actual value. It only cares about the size of the gap. Two very different-looking models can have identical R² scores. Worth remembering before you celebrate one.
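A small check of that claim: two hypothetical models whose predictions sit on opposite sides of each true value, with identical gaps, score exactly the same R².

```python
actual = [90, 110, 95, 105]        # made-up values; their mean is 100

under = [a - 3 for a in actual]    # always 3 below the truth (inner circle)
over  = [a + 3 for a in actual]    # always 3 above the truth (outer circle)

def r_squared(preds, ys):
    mean = sum(ys) / len(ys)
    ss_tot = sum((y - mean) ** 2 for y in ys)
    ss_res = sum((p - y) ** 2 for p, y in zip(preds, ys))
    return 1 - ss_res / ss_tot

# Same gap size, opposite signs — identical score (~0.856 for both).
print(r_squared(under, actual) == r_squared(over, actual))  # True
```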


So How Do MSE and R² Actually Relate?

They’re not competing metrics. They’re answering two different questions about the same numbers.

Metric | Question it answers
MSE    | On average, how far off are my predictions, in the units of the thing I’m predicting?
R²     | Compared to the dumbest possible prediction (the mean), what fraction of the error did I get rid of?

MSE is the raw measurement. R² is MSE graded on a curve — the curve being the worst you’d do without any model at all. One is absolute, the other is relative. You usually want both: MSE tells you what your error looks like in dollars or degrees or grams; R² tells you whether the model is actually pulling its weight.


The One-Liner

R² is just asking: how much better is your model than guessing the average?

That’s it. Not “proportion of variance explained.” Not a ratio of sum-of-squares. Just: did the model beat the laziest possible baseline, and by how much?

Once that clicks, the formula is no longer something you memorize. It’s something you can re-derive on the back of a napkin.

P.S. – Just as we bring MSE back to real-world units by taking a square root (giving RMSE), we can take the square root of R² too. For simple linear regression with a single feature, that square root is the magnitude of r, the Pearson correlation coefficient.
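A pure-Python sanity check of that relationship: fit a one-feature least-squares line, then compare its R² against the squared Pearson correlation (the x/y values below are made up):

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n

# Pearson correlation coefficient r.
pearson = cov / (var_x * var_y) ** 0.5

# Fit y = a*x + b by least squares, then score the fit with R².
a = cov / var_x
b = my - a * mx
preds = [a * x + b for x in xs]

ss_res = sum((p - y) ** 2 for p, y in zip(preds, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

# For simple linear regression, R² equals r² (up to float rounding).
print(abs(r2 - pearson ** 2) < 1e-9)  # True
```

For multiple regression the analogue is the correlation between predictions and actuals, not between a single feature and the target.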


Tags:

R Squared, Mean Square Error, MSE, RMSE

Copyright 2026 — Building AI Intuition. All rights reserved.