Building AI Intuition
Exploring “Linear” in Linear Regression

By Archit Sharma · April 7, 2026

Linear regression is one of those things you learn early, use forever, and never quite slow down to inspect. So here’s a slow inspection — three questions that look obvious until you actually try to answer them.

  1. What’s linear about linear regression?
  2. Why do we still bother with it when we could fit a curve?
  3. If linear regression and logistic regression both start with the same linear combination, can I run linear regression first and then squash the result with a sigmoid to do classification?

The first two are warm-ups. The third is where most intuition quietly breaks.


What “Linear” Actually Means

A linear combination is just this:

output = w1·x1 + w2·x2 + w3·x3 + ... + bias

Each input is multiplied by a weight. The results are added up. That’s it.

The word linear refers to one specific thing: no input is raised to a power other than one, and no input is transformed before being weighted. No x², no √x, no log(x), no x1 · x2 interaction terms. Each feature contributes proportionally to the output. Double the square footage, double its contribution to the predicted price. Halve it, halve the contribution.

The relationship between an individual input and the output is a straight line. Always. That’s where the name comes from — not from the regression line on the chart, but from the straight-line relationship between each feature and the prediction.

A feature can still contribute negatively (a higher crime score lowers the price), and a feature can still be inversely related to the output (a higher weight on 1/distance_to_city punishes distant houses). What it can’t do is bend. No curves.
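The proportionality claim is easy to check numerically. Here's a minimal sketch — the features, weights, and bias are made up for illustration:

```python
# A linear combination: weighted inputs summed, plus a bias.
def linear_combination(features, weights, bias):
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical house-price model: [sqft, crime_score, 1/distance_to_city]
weights = [200.0, -5000.0, 30000.0]
bias = 50000.0

base = linear_combination([1000, 2, 0.25], weights, bias)
doubled_sqft = linear_combination([2000, 2, 0.25], weights, bias)

# Doubling square footage adds exactly 200 * 1000 more dollars —
# the contribution scales in a straight line, always.
print(doubled_sqft - base)  # 200000.0
```

Note the negative weight on crime score and the positive weight on inverse distance: both still straight-line relationships, just pointed downward or inverted.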


Linear Combination ≠ Linear Regression

Worth pausing here, because people use these as if they were the same thing.

  • Linear combination is the structure: just weighted inputs added together.
  • Linear regression is the process of finding the specific weights that make that structure predict your data as accurately as possible.

Linear combination is the equation. Linear regression is the search for the best version of that equation given a pile of real examples. One is a noun, the other is a verb.

Put the two together and you get the cleanest definition of the whole thing: linear regression is the art of finding the best linear combination that fits a set of data points — the one that minimizes the gap between the line and each actual point, and therefore acts as the best prediction line for everything that comes next.


Why a Straight Line, When Curves Fit Better?

The honest answer is: because straight lines are the only thing humans can reason about cleanly.

You could fit a curve. Polynomial regression exists. Splines exist. Neural networks exist. They will often fit your training data more tightly. But the moment you bend the line, three things start to go wrong:

  1. Interpretability collapses. With a straight line you can say “every extra square foot adds $200 to the price.” Try saying that about a sixth-degree polynomial. The model becomes a black box that spits out numbers nobody can defend in a meeting.
  2. Visualization collapses. A line in 2D you can draw. A curve in 12-dimensional feature space you cannot, and neither can your stakeholders.
  3. Overfitting becomes the default. A curve flexible enough to hit every training point will hit every quirk and accident in your data. On new data, it falls apart.

Linear regression is the model that says: I will take the hit on accuracy in exchange for being something a human can hold in their head and explain to their boss. For a huge number of real-world problems, that trade is worth it.
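Point 3 is easy to demonstrate: fit the same noisy straight-line data with a line and with a high-degree polynomial, then compare errors on the training points versus fresh points. A toy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # True relationship is a straight line: y = 2x + 3, plus noise.
    x = rng.uniform(0, 10, size=n)
    y = 2 * x + 3 + rng.normal(0, 1, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(100)

errors = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

On a run like this, the degree-9 polynomial typically drives training error toward zero while its test error balloons; the straight line's two numbers generalize because they can't bend to fit the noise.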


Now the Trap: Linear Regression → Sigmoid?

Here’s the question that catches people. Logistic regression is “linear regression plus a sigmoid,” right? You take the linear combination, push it through 1 / (1 + e^(-x)), and out comes a probability between 0 and 1.

So can you just train a linear regression on house prices, take its output, and pipe that into a sigmoid to answer the question “will this house sell for more than $500K?”

It feels like it should work. It doesn’t. And the reason is genuinely interesting.

Imagine you train linear regression on house prices. It learns weights to minimize the dollar error between predicted and actual prices. You feed it a house, and it predicts $50,000. Now you push that $50,000 through a sigmoid (scaled down first — raw dollar values would saturate the sigmoid to exactly 1) and get out, say, 0.2.

Here’s the trap: that 0.2 is fixed. It is now your model’s answer to every question you might ask:

  • “Will this house sell for more than $50K?” → 0.2
  • “Will it sell for more than $500K?” → 0.2
  • “Will it sell for more than $5?” → 0.2

The 0.2 has no idea what threshold you care about. It’s just a deterministic transformation of the dollar prediction. The probability isn’t probability of anything in particular. It’s just a squashed price.
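The trap fits in a few lines. The sigmoid just rescales one number into (0, 1) — the price has to be shrunk before squashing or the output pins to 1.0 (the scaling factor below is arbitrary, chosen only for illustration):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

predicted_price = 50_000
squashed = sigmoid(predicted_price / 100_000)  # arbitrary scaling

# One fixed number, whatever question you ask afterwards:
for threshold in (5, 50_000, 500_000):
    print(f"P(price > ${threshold})?  ->  {squashed:.3f}")
```

Every question gets the same answer, because the threshold never entered the computation anywhere.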


The Reversal: What Logistic Regression Actually Does

Logistic regression doesn’t work that way. It doesn’t predict a price and then ask a yes/no question afterwards. The yes/no question is baked in from the start.

When you train a logistic model with the threshold “above $500K,” the training process does this: for every house, it knows the true answer (yes or no), and it adjusts the weights so that the linear combination, after sigmoid, comes out near 1 for the yeses and near 0 for the nos. The weights it learns are the weights that best separate those two classes — not the weights that best predict the dollar price.

If you change the threshold to $50K, the training data labels flip for a bunch of houses, and the model learns a completely different set of weights. Different boundary, different problem, different model.
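A sketch of that in code: train the same logistic model twice on the same houses, once with labels from a $500K threshold and once from a $50K threshold, and compare the learned weights. Everything here is toy — synthetic prices, one feature, plain gradient descent on the log-loss:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy houses: one feature (square footage) and a synthetic price.
sqft = rng.uniform(500, 4000, size=200)
price = 150 * sqft + rng.normal(0, 50_000, size=200)
x = (sqft - sqft.mean()) / sqft.std()  # standardize the feature

def train_logistic(x, labels, steps=2000, lr=0.5):
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(w * x + b)))   # sigmoid of the linear combination
        # Gradient of the log-loss with respect to w and b:
        w -= lr * np.mean((p - labels) * x)
        b -= lr * np.mean(p - labels)
    return w, b

# Same houses, two different questions -> two different sets of weights.
w_hi, b_hi = train_logistic(x, (price > 500_000).astype(float))
w_lo, b_lo = train_logistic(x, (price > 50_000).astype(float))

print("above $500K:", round(w_hi, 2), round(b_hi, 2))
print("above  $50K:", round(w_lo, 2), round(b_lo, 2))
```

The two models share every piece of machinery except the labels, and they come out with different weights — because they were asked to draw different boundaries.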

So the relationship between linear and logistic regression isn’t:

Step 1: Run linear regression. Step 2: Squash the output.

It’s:

Both models use a linear combination as their core. Linear regression trains the weights to minimize squared dollar error. Logistic regression trains the weights to maximize how cleanly the sigmoid output separates the yeses from the nos.

Same machinery. Different thing being optimized. Different weights at the end. You cannot extract one from the other.
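"Different thing being optimized" means, literally, a different loss function wrapped around the same linear combination. Side by side (a sketch, with a single feature for brevity):

```python
import math

def linear(x, w, b):
    return w * x + b

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Linear regression's loss: squared dollar error against a true price.
def squared_error(x, y_price, w, b):
    return (linear(x, w, b) - y_price) ** 2

# Logistic regression's loss: log-loss against a yes/no label.
def log_loss(x, y_label, w, b):
    p = sigmoid(linear(x, w, b))
    return -(y_label * math.log(p) + (1 - y_label) * math.log(1 - p))
```

Same `linear` in both. Swap the loss and you swap what the weights end up meaning.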


Why This Matters

This is the kind of confusion that reads “minor technicality” but is actually the whole point. Two models can share an architecture and still be answering completely different questions, because what makes a model what it is isn’t its structure — it’s the loss function it was trained against.

A linear combination is just a shape. Linear regression pours a “minimize squared error” loss into that shape and gets a price predictor. Logistic regression pours a “separate the classes” loss into the same shape and gets a classifier. The shape is identical. The thing being chiseled is the same block of marble. But the sculpture at the end is not the same.

Once you see that, a lot of ML stops being a zoo of unrelated models and starts looking like one machine — linear combination + activation + loss — being pointed at different problems by swapping the last two pieces.


The One-Liner

Linear regression and logistic regression aren’t a pipeline. They’re the same machine pointed at different questions, and the question gets baked in during training — not bolted on at the end.

Try to bolt it on at the end and you get a number that looks like a probability and means nothing.
