What is Feature Engineering?

Feature engineering is the process of selecting, transforming, and creating input variables (features) from raw data to improve a machine learning model's performance. It's the art and science of representing data in a way that helps models learn the right patterns.

Why It Matters

In classical ML, feature engineering often matters more than model choice: a simple model with well-designed features frequently outperforms a complex model trained on poor ones. Deep learning and LLMs have automated part of this work by learning representations directly from raw data, but the concept remains essential for tabular data and time series, and for understanding how models extract signal from noise.

How It Works

Types of feature engineering:

1. Feature selection:

Choose which raw features to include
Remove irrelevant, redundant, or noisy features
Methods: correlation analysis, mutual information, recursive feature elimination
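As a minimal sketch of filter-style selection, the Python below keeps features whose absolute Pearson correlation with the target clears a threshold and drops any feature that nearly duplicates one already kept. The thresholds (0.3 and 0.9), the helper names, and the toy data are illustrative assumptions. Pearson correlation only detects linear relationships; mutual information and recursive feature elimination (available in scikit-learn as mutual_info_regression and RFE) can catch what this misses.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def select_features(columns, target, min_abs_corr=0.3, max_pair_corr=0.9):
    """Filter-style selection: drop weakly related and redundant features.

    columns: dict mapping feature name -> list of values
    target:  list of target values (same length as each column)
    """
    # Rank candidates by absolute correlation with the target, strongest first.
    ranked = sorted(
        ((name, abs(pearson(vals, target))) for name, vals in columns.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    selected = []
    for name, score in ranked:
        if score < min_abs_corr:
            continue  # irrelevant: weak linear relationship with the target
        if any(abs(pearson(columns[name], columns[kept])) > max_pair_corr
               for kept in selected):
            continue  # redundant: nearly duplicates a feature already kept
        selected.append(name)
    return selected

# Hypothetical toy data: area_sqft repeats area in another unit,
# noise is unrelated to the target.
columns = {
    "area": [50, 60, 80, 100, 120],
    "area_sqft": [538, 646, 861, 1076, 1292],
    "noise": [2, 5, 1, 5, 2],
}
target = [150, 180, 240, 300, 360]
print(select_features(columns, target))
```

On this toy data the filter keeps area and discards area_sqft as redundant and noise as irrelevant.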

2. Feature transformation:

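Scaling: put features on comparable scales, e.g. standardization to zero mean and unit variance (StandardScaler) or rescaling to a fixed range such as [0, 1] (MinMaxScaler)
Log transform: compress right-skewed distributions (income, prices); log(1 + x) also handles zeros
Encoding: convert categorical variables to numbers (one-hot encoding; label encoding, which implies an order that may not exist)
Binning: group continuous values into categories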
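The transformations above can be sketched in plain Python. These helpers are hypothetical stand-ins for library classes such as scikit-learn's StandardScaler and MinMaxScaler, not their implementation. One practical difference: a library scaler is fit on training data and then reused unchanged on new data, whereas these functions compute their statistics from whatever list they receive.

```python
import math

def standardize(values):
    """Zero mean, unit variance (population std, as StandardScaler uses)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # A constant column has std 0; map it to all zeros instead of dividing by 0.
    return [(v - mean) / std if std else 0.0 for v in values]

def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def log_transform(values):
    """log(1 + x): compresses a long right tail and is defined at 0."""
    return [math.log1p(v) for v in values]

def one_hot(categories):
    """Return the sorted vocabulary and one 0/1 indicator row per input value."""
    vocab = sorted(set(categories))
    return vocab, [[1 if c == v else 0 for v in vocab] for c in categories]

def bin_values(values, edges, labels):
    """Map each value to a label; needs len(labels) == len(edges) + 1.

    A value goes to the first bin whose upper edge it does not exceed;
    anything above the last edge gets the final label.
    """
    binned = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v <= edge:
                binned.append(label)
                break
        else:
            binned.append(labels[-1])
    return binned

print(min_max([10, 20, 30]))             # [0.0, 0.5, 1.0]
print(one_hot(["red", "blue", "red"]))   # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
print(bin_values([25, 45, 70], [30, 60], ["young", "middle", "senior"]))
```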

3. Feature creation:

Combine existing features: price_per_sqm = price / area
Extract from dates: day_of_week, is_weekend, month
Text features: word count, sentiment score, TF-IDF
Aggregations: average_purchase_last_30_days, total_logins
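A short sketch of the created-feature types listed above, using hypothetical function names and data. The aggregation only looks at events strictly before the as_of date, which keeps the feature from using information that would not exist at prediction time.

```python
from datetime import date, timedelta

def price_per_sqm(price, area):
    """Ratio feature combining two raw columns."""
    return price / area

def date_features(d):
    """Calendar features extracted from a datetime.date."""
    return {
        "day_of_week": d.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,
        "month": d.month,
    }

def average_purchase_last_30_days(purchases, as_of):
    """Mean amount over purchases dated in [as_of - 30 days, as_of).

    purchases: list of (date, amount) tuples. Only events strictly before
    as_of are used, so the feature never sees the future.
    """
    window_start = as_of - timedelta(days=30)
    recent = [amount for day, amount in purchases if window_start <= day < as_of]
    return sum(recent) / len(recent) if recent else 0.0

# Hypothetical purchase history for one customer.
purchases = [
    (date(2024, 3, 1), 100.0),   # older than 30 days: excluded
    (date(2024, 5, 10), 20.0),
    (date(2024, 5, 25), 40.0),
    (date(2024, 6, 1), 999.0),   # same day as as_of: excluded
]
print(date_features(date(2024, 6, 1)))
print(average_purchase_last_30_days(purchases, date(2024, 6, 1)))
```

Returning 0.0 for a customer with no recent purchases is a design choice; a missing value (None) plus a separate "has_recent_purchase" flag is a common alternative.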

4. Domain-specific features:

Finance: moving averages, volatility, RSI (relative strength index)
NLP: n-grams, part-of-speech (POS) tags, named entities
Computer vision: HOG, SIFT, edge histograms (the standard approach before deep learning)
Time series: lag features, rolling statistics, Fourier components
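For time series, lag features and rolling statistics reduce to simple index arithmetic. The sketch below is illustrative and its helper names are hypothetical; pandas provides the same operations through Series.shift and Series.rolling.

```python
import math

def lag(series, k):
    """Value from k steps earlier; None where no history exists yet."""
    return [series[i - k] if i >= k else None for i in range(len(series))]

def rolling_mean(series, window):
    """Trailing moving average over the last `window` points, inclusive."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history for a full window
        else:
            out.append(sum(series[i - window + 1 : i + 1]) / window)
    return out

def rolling_std(series, window):
    """Trailing population standard deviation over the last `window` points.

    Applied to returns rather than raw prices, this gives a basic volatility measure.
    """
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i - window + 1 : i + 1]
            mean = sum(chunk) / window
            out.append(math.sqrt(sum((x - mean) ** 2 for x in chunk) / window))
    return out

prices = [10, 12, 11, 15, 14]  # hypothetical series
print(lag(prices, 1))           # [None, 10, 12, 11, 15]
print(rolling_mean(prices, 3))
```

Because each rolling window includes the current point, a model predicting the value at step t should use statistics computed only up to t-1; otherwise the feature already contains the answer.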

Deep learning and feature engineering:

Neural networks learn features automatically (representation learning)
Convolutional layers learn image features; transformer layers learn text features
This reduced (but didn't eliminate) the need for manual feature engineering
Tabular data still benefits significantly from manual feature engineering

Feature stores:

Centralized systems for storing, versioning, and serving features
Keep feature computation consistent between training and inference (avoiding training-serving skew)
Tools: Feast, Tecton, Vertex AI Feature Store
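To make the consistency point concrete, here is a toy in-memory feature store. The class, its methods, and the sqm_per_room feature are hypothetical and do not mirror the API of Feast, Tecton, or Vertex AI Feature Store; the idea it illustrates is that one versioned feature definition feeds both offline training and online lookup.

```python
from typing import Callable, Dict, Tuple

class InMemoryFeatureStore:
    """Toy store: versioned feature definitions plus an online value cache."""

    def __init__(self) -> None:
        self._definitions: Dict[Tuple[str, int], Callable[[dict], float]] = {}
        self._online: Dict[Tuple[str, int, str], float] = {}

    def register(self, name: str, version: int, fn: Callable[[dict], float]) -> None:
        """Add a definition; existing versions are immutable."""
        key = (name, version)
        if key in self._definitions:
            raise ValueError(f"{name} v{version} is already registered")
        self._definitions[key] = fn

    def compute(self, name: str, version: int, raw_row: dict) -> float:
        """Single code path used when building training sets and when serving."""
        return self._definitions[(name, version)](raw_row)

    def materialize(self, name: str, version: int, entity_id: str, raw_row: dict) -> None:
        """Precompute a value and cache it for low-latency online lookup."""
        self._online[(name, version, entity_id)] = self.compute(name, version, raw_row)

    def get_online(self, name: str, version: int, entity_id: str) -> float:
        return self._online[(name, version, entity_id)]

store = InMemoryFeatureStore()
store.register("sqm_per_room", 1, lambda row: row["area"] / row["rooms"])

# Offline: build training features from historical rows.
training_rows = [{"area": 80, "rooms": 4}, {"area": 120, "rooms": 5}]
train_values = [store.compute("sqm_per_room", 1, r) for r in training_rows]

# Online: the same definition produces the value served at inference time.
store.materialize("sqm_per_room", 1, "house-42", {"area": 90, "rooms": 3})
serving_value = store.get_online("sqm_per_room", 1, "house-42")
print(train_values, serving_value)  # [20.0, 24.0] 30.0
```

Registering a changed definition under a new version number, rather than overwriting the old one, lets models trained on version 1 keep reading exactly the features they were trained on.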

Example

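Predicting house prices from raw data: a good feature engineer creates distance_to_city_center from coordinates, property_age from build_year and the sale date, and neighborhood_avg_price by aggregating earlier nearby sales. A price_per_sqm feature can also help, but only when computed from other, earlier sales (for example, a neighborhood average); dividing this house's own price by its area would leak the target into the inputs. These engineered features capture relationships the model might struggle to learn from raw numbers alone.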
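A sketch of three of the house-price features, assuming each record carries latitude, longitude, build year, and sale date, and computing property_age at the sale date. Field names, the 1 km radius, and the coordinates are hypothetical. The distance uses the haversine great-circle formula, and the neighborhood average only uses sales that closed before this house sold, so the house's own price never enters its features.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def neighborhood_avg_price(house, sales, radius_km=1.0):
    """Mean price of sales within radius_km that closed before this house sold."""
    prices = [
        s["price"] for s in sales
        if s["sold"] < house["sold"]  # past only: avoids target leakage
        and haversine_km(house["lat"], house["lon"], s["lat"], s["lon"]) <= radius_km
    ]
    return sum(prices) / len(prices) if prices else None

def engineer_features(house, center, sales):
    return {
        "distance_to_city_center": haversine_km(house["lat"], house["lon"], *center),
        "property_age": house["sold"].year - house["build_year"],
        "neighborhood_avg_price": neighborhood_avg_price(house, sales),
    }

# Hypothetical records; the house's own sale price is never read.
center = (52.3731, 4.8922)
house = {"lat": 52.37, "lon": 4.89, "build_year": 1995, "sold": date(2024, 6, 1)}
sales = [
    {"lat": 52.37, "lon": 4.89, "price": 400_000, "sold": date(2023, 3, 15)},
    {"lat": 52.371, "lon": 4.891, "price": 500_000, "sold": date(2024, 1, 10)},
    {"lat": 52.37, "lon": 4.89, "price": 999_999, "sold": date(2024, 7, 1)},  # later: excluded
    {"lat": 53.0, "lon": 5.5, "price": 100_000, "sold": date(2022, 5, 1)},    # far away: excluded
]
features = engineer_features(house, center, sales)
print(features)
```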