Building AI Intuition
Exploring “Linear” in Linear Regression

By Archit Sharma · April 7, 2026

Linear regression is one of those things you learn early, use forever, and never quite slow down to inspect. So here’s a slow inspection — three questions that look obvious until you actually try to answer them.

  1. What’s linear about linear regression?
  2. Why do we still bother with it when we could fit a curve?
  3. If linear regression and logistic regression both start with the same linear combination, can I run linear regression first and then squash the result with a sigmoid to do classification?

The first two are warm-ups. The third is where most intuition quietly breaks.


What “Linear” Actually Means

A linear combination is just this:

output = w1·x1 + w2·x2 + w3·x3 + ... + bias

Each input is multiplied by a weight. The results are added up. That’s it.

The word linear refers to one specific thing: no input is raised to a power other than one, and no input is transformed before being weighted. No x², no √x, no log(x), no x1 · x2 interaction terms. Each feature contributes proportionally to the output. Double the square footage, double its contribution to the predicted price. Halve it, halve the contribution.

The relationship between an individual input and the output is a straight line. Always. That’s where the name comes from — not from the regression line on the chart, but from the straight-line relationship between each feature and the prediction.

A feature can still contribute negatively (a higher crime score lowers the price), and a feature can still be inversely related to the output (a higher weight on 1/distance_to_city punishes distant houses). What it can’t do is bend. No curves.
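The proportionality claim is easy to check numerically. Here's a minimal sketch — the features, weights, and bias are made up for illustration:

```python
# A linear combination: weighted inputs summed, plus a bias.
def linear_combination(features, weights, bias):
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical house-price model: [sqft, crime_score, 1/distance_to_city]
weights = [200.0, -5000.0, 30000.0]
bias = 50000.0

base = linear_combination([1000, 2, 0.25], weights, bias)
doubled_sqft = linear_combination([2000, 2, 0.25], weights, bias)

# Doubling square footage adds exactly 200 * 1000 more dollars —
# the contribution scales in a straight line, always.
print(doubled_sqft - base)  # 200000.0
```

Note the negative weight on crime score and the positive weight on inverse distance: both still straight-line relationships, just pointed downward or inverted.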


Linear Combination ≠ Linear Regression

Worth pausing here, because people use these as if they were the same thing.

  • Linear combination is the structure: just weighted inputs added together.
  • Linear regression is the process of finding the specific weights that make that structure predict your data as accurately as possible.

Linear combination is the equation. Linear regression is the search for the best version of that equation given a pile of real examples. One is a noun, the other is a verb.

Put the two together and you get the cleanest definition of the whole thing: linear regression is the art of finding the best linear combination that fits a set of data points — the one that minimizes the gap between the line and each actual point, and therefore acts as the best prediction line for everything that comes next.


Why a Straight Line, When Curves Fit Better?

The honest answer is: because straight lines are the only thing humans can reason about cleanly.

You could fit a curve. Polynomial regression exists. Splines exist. Neural networks exist. They will often fit your training data more tightly. But the moment you bend the line, three things start to go wrong:

  1. Interpretability collapses. With a straight line you can say “every extra square foot adds $200 to the price.” Try saying that about a sixth-degree polynomial. The model becomes a black box that spits out numbers nobody can defend in a meeting.
  2. Visualization collapses. A line in 2D you can draw. A curve in 12-dimensional feature space you cannot, and neither can your stakeholders.
  3. Overfitting becomes the default. A curve flexible enough to hit every training point will hit every quirk and accident in your data. On new data, it falls apart.

Linear regression is the model that says: I will take the hit on accuracy in exchange for being something a human can hold in their head and explain to their boss. For a huge number of real-world problems, that trade is worth it.
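Point 3 is easy to demonstrate: fit the same noisy straight-line data with a line and with a high-degree polynomial, then compare errors on the training points versus fresh points. A toy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # True relationship is a straight line: y = 2x + 3, plus noise.
    x = rng.uniform(0, 10, size=n)
    y = 2 * x + 3 + rng.normal(0, 1, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(100)

errors = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

On a run like this, the degree-9 polynomial typically drives training error toward zero while its test error balloons; the straight line's two numbers generalize because they can't bend to fit the noise.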


Now the Trap: Linear Regression → Sigmoid?

Here’s the question that catches people. Logistic regression is “linear regression plus a sigmoid,” right? You take the linear combination, push it through 1 / (1 + e^(-x)), and out comes a probability between 0 and 1.

So can you just train a linear regression on house prices, take its output, and pipe that into a sigmoid to answer the question “will this house sell for more than $500K?”

It feels like it should work. It doesn’t. And the reason is genuinely interesting.

Imagine you train linear regression on house prices. It learns weights to minimize the dollar error between predicted and actual prices. You feed it a house, and it predicts $50,000. Now you push that $50,000 through a sigmoid (scaled down first — raw dollar values would saturate the sigmoid to exactly 1) and get out, say, 0.2.

Here’s the trap: that 0.2 is fixed. It is now your model’s answer to every question you might ask:

  • “Will this house sell for more than $50K?” → 0.2
  • “Will it sell for more than $500K?” → 0.2
  • “Will it sell for more than $5?” → 0.2

The 0.2 has no idea what threshold you care about. It’s just a deterministic transformation of the dollar prediction. The probability isn’t probability of anything in particular. It’s just a squashed price.
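The trap fits in a few lines. The sigmoid just rescales one number into (0, 1) — the price has to be shrunk before squashing or the output pins to 1.0 (the scaling factor below is arbitrary, chosen only for illustration):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

predicted_price = 50_000
squashed = sigmoid(predicted_price / 100_000)  # arbitrary scaling

# One fixed number, whatever question you ask afterwards:
for threshold in (5, 50_000, 500_000):
    print(f"P(price > ${threshold})?  ->  {squashed:.3f}")
```

Every question gets the same answer, because the threshold never entered the computation anywhere.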


The Reversal: What Logistic Regression Actually Does

Logistic regression doesn’t work that way. It doesn’t predict a price and then ask a yes/no question afterwards. The yes/no question is baked in from the start.

When you train a logistic model with the threshold “above $500K,” the training process does this: for every house, it knows the true answer (yes or no), and it adjusts the weights so that the linear combination, after sigmoid, comes out near 1 for the yeses and near 0 for the nos. The weights it learns are the weights that best separate those two classes — not the weights that best predict the dollar price.

If you change the threshold to $50K, the training data labels flip for a bunch of houses, and the model learns a completely different set of weights. Different boundary, different problem, different model.
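A sketch of that in code: train the same logistic model twice on the same houses, once with labels from a $500K threshold and once from a $50K threshold, and compare the learned weights. Everything here is toy — synthetic prices, one feature, plain gradient descent on the log-loss:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy houses: one feature (square footage) and a synthetic price.
sqft = rng.uniform(500, 4000, size=200)
price = 150 * sqft + rng.normal(0, 50_000, size=200)
x = (sqft - sqft.mean()) / sqft.std()  # standardize the feature

def train_logistic(x, labels, steps=2000, lr=0.5):
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(w * x + b)))   # sigmoid of the linear combination
        # Gradient of the log-loss with respect to w and b:
        w -= lr * np.mean((p - labels) * x)
        b -= lr * np.mean(p - labels)
    return w, b

# Same houses, two different questions -> two different sets of weights.
w_hi, b_hi = train_logistic(x, (price > 500_000).astype(float))
w_lo, b_lo = train_logistic(x, (price > 50_000).astype(float))

print("above $500K:", round(w_hi, 2), round(b_hi, 2))
print("above  $50K:", round(w_lo, 2), round(b_lo, 2))
```

The two models share every piece of machinery except the labels, and they come out with different weights — because they were asked to draw different boundaries.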

So the relationship between linear and logistic regression isn’t:

Step 1: Run linear regression. Step 2: Squash the output.

It’s:

Both models use a linear combination as their core. Linear regression trains the weights to minimize squared dollar error. Logistic regression trains the weights to maximize how cleanly the sigmoid output separates the yeses from the nos.

Same machinery. Different thing being optimized. Different weights at the end. You cannot extract one from the other.
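"Different thing being optimized" means, literally, a different loss function wrapped around the same linear combination. Side by side (a sketch, with a single feature for brevity):

```python
import math

def linear(x, w, b):
    return w * x + b

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Linear regression's loss: squared dollar error against a true price.
def squared_error(x, y_price, w, b):
    return (linear(x, w, b) - y_price) ** 2

# Logistic regression's loss: log-loss against a yes/no label.
def log_loss(x, y_label, w, b):
    p = sigmoid(linear(x, w, b))
    return -(y_label * math.log(p) + (1 - y_label) * math.log(1 - p))
```

Same `linear` in both. Swap the loss and you swap what the weights end up meaning.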


Why This Matters

This is the kind of confusion that reads “minor technicality” but is actually the whole point. Two models can share an architecture and still be answering completely different questions, because what makes a model what it is isn’t its structure — it’s the loss function it was trained against.

A linear combination is just a shape. Linear regression pours a “minimize squared error” loss into that shape and gets a price predictor. Logistic regression pours a “separate the classes” loss into the same shape and gets a classifier. The shape is identical. The thing being chiseled is the same block of marble. But the sculpture at the end is not the same.

Once you see that, a lot of ML stops being a zoo of unrelated models and starts looking like one machine — linear combination + activation + loss — being pointed at different problems by swapping the last two pieces.


The One-Liner

Linear regression and logistic regression aren’t a pipeline. They’re the same machine pointed at different questions, and the question gets baked in during training — not bolted on at the end.

Try to bolt it on at the end and you get a number that looks like a probability and means nothing.
