
What is Regression vs Classification?
Classification predicts a discrete category label ("spam" or "not spam"), while regression predicts a continuous numerical value (a house price of €350,000). These are the two fundamental types of supervised machine learning problems, and most supervised prediction tasks can be framed as one or the other.
Why It Matters
The problem type determines the choice of model, loss function, and evaluation metrics. Misframing a problem (treating a regression task as classification or vice versa) leads to poor results, because the model is then trained and evaluated against the wrong objective. Getting this distinction right is the first step in any ML project.
How It Works
Classification:
- Output: discrete class label from a finite set
- Examples: spam detection (spam/not spam), image recognition (cat/dog/bird), diagnosis (positive/negative)
- Loss functions: cross-entropy, hinge loss
- Metrics: accuracy, precision, recall, F1, AUC-ROC
- Models: logistic regression (a classifier, despite its name), random forest, SVM, neural network classifier
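The classification loss and metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration for the binary case, not a reference implementation: the function names, the toy labels, and the 0.5 decision threshold are all assumptions made for the example.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; y_true in {0, 1}, p_pred = predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for y, yh in zip(y_true, y_pred) if y == yh) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical toy data: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.8, 0.6]               # predicted P(spam)
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))              # 0.6
print(precision_recall_f1(y_true, y_pred))   # each value is 2/3
print(binary_cross_entropy(y_true, p_pred))
```

Note that the loss is computed on the predicted probabilities, while accuracy, precision, recall, and F1 are computed on the hard labels obtained after thresholding.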
Regression:
- Output: continuous numerical value
- Examples: house price prediction (€350K), temperature forecasting (21.5°C), stock price prediction, demand estimation
- Loss functions: MSE (mean squared error), MAE (mean absolute error)
- Metrics: RMSE, MAE, R² score
- Models: linear regression, polynomial regression, random forest regressor, neural network regressor
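The regression losses and metrics above follow directly from their definitions. The sketch below uses plain Python and made-up prices (in thousands of euros); the function names and the data are assumptions for illustration only.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors quadratically."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in the target's units."""
    return sum(abs(y - yh) for y, yh in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE, back in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

def r2_score(y_true, y_pred):
    """R^2: 1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical house prices in thousands of euros.
y_true = [300, 350, 400, 450]
y_pred = [310, 340, 420, 440]

print(mse(y_true, y_pred))       # 175.0
print(mae(y_true, y_pred))       # 12.5
print(rmse(y_true, y_pred))      # about 13.23
print(r2_score(y_true, y_pred))  # about 0.944
```

MAE and RMSE are expressed in the same units as the target, which makes them easier to interpret than MSE; R² is unitless and compares the model against always predicting the mean.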
How to decide:
- "Which category?" → Classification
- "How much/many?" → Regression
- "Will it rain?" (yes/no) → Classification
- "How many mm of rain?" → Regression
Grey areas:
- Ordinal classification — classes have an order (low/medium/high risk) — can be treated as classification or regression
- Binning — converting continuous values into categories (age → "young/middle/senior") turns regression into classification
- Probability output — classifiers can output probabilities (0.87 = 87% likely spam), blurring the line
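The three grey areas above can be made concrete with a short Python sketch. The age cut points (30 and 60), the 0.5 default threshold, and all function names are hypothetical choices for the example, not values taken from any standard.

```python
import math

# Binning: a continuous value becomes a category label.
def bin_age(age):
    if age < 30:          # assumed cut point
        return "young"
    if age < 60:          # assumed cut point
        return "middle"
    return "senior"

# Probability output: the classifier's score is continuous,
# and the class label only appears after thresholding.
def label_from_probability(p_spam, threshold=0.5):
    return "spam" if p_spam >= threshold else "not spam"

# Ordinal classes: encode the order as integers so a regressor can be
# trained on them, then map a continuous prediction back to a class.
ORDINAL_LEVELS = ["low", "medium", "high"]

def encode_ordinal(level):
    return ORDINAL_LEVELS.index(level)

def decode_ordinal(score):
    idx = math.floor(score + 0.5)                    # round half up
    idx = min(max(idx, 0), len(ORDINAL_LEVELS) - 1)  # clip to valid range
    return ORDINAL_LEVELS[idx]

print(bin_age(45))                                  # middle
print(label_from_probability(0.87))                 # spam
print(label_from_probability(0.87, threshold=0.9))  # not spam
print(decode_ordinal(1.4))                          # medium
```

The same probability of 0.87 yields different labels under different thresholds, which is why the threshold is a design decision rather than part of the model itself.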
LLMs and classification/regression:
- LLMs perform classification via prompting: "Classify this review as positive or negative"
- LLMs handle regression less naturally, but can produce numeric estimates when prompted: "Estimate the price of this house given..."
- For high-volume tasks, purpose-built classifiers/regressors are typically faster and cheaper per prediction than LLMs
Example
A real estate company uses both: a classifier determines the property type (apartment/house/commercial) and a regressor predicts the sale price. The input data is the same (location, size, age, features), but the two prediction tasks require different models and evaluation metrics.
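As a rough sketch of this scenario, the Python example below runs a k-nearest-neighbours classifier and a k-nearest-neighbours regressor over the same features. Everything here is an assumption made for illustration: the listings are invented, only two features (size and age) are used, and k-nearest-neighbours is just one simple model that happens to work for both tasks.

```python
import math
from collections import Counter

# Hypothetical listings: (size in m2, age in years), property type, price in EUR.
LISTINGS = [
    ((55, 10), "apartment", 210_000),
    ((70, 5), "apartment", 280_000),
    ((140, 30), "house", 390_000),
    ((160, 12), "house", 460_000),
    ((400, 20), "commercial", 900_000),
    ((450, 8), "commercial", 1_050_000),
]

def nearest(features, k):
    """Return the k listings closest to `features` (Euclidean distance).
    Features are left unscaled here for brevity; real data would need scaling."""
    return sorted(LISTINGS, key=lambda row: math.dist(row[0], features))[:k]

def predict_type(features, k=2):
    """Classification: majority vote over the neighbours' labels."""
    votes = Counter(label for _, label, _ in nearest(features, k))
    return votes.most_common(1)[0][0]

def predict_price(features, k=2):
    """Regression: mean of the neighbours' prices."""
    prices = [price for _, _, price in nearest(features, k)]
    return sum(prices) / len(prices)

query = (150, 15)            # same input for both tasks
print(predict_type(query))   # house
print(predict_price(query))  # 425000.0
```

The neighbour search is shared; only the final aggregation differs. A vote produces a category, and an average produces a number, which mirrors the classification/regression split described above.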