
What is Feature Engineering?
Feature engineering is the process of selecting, transforming, and creating input variables (features) from raw data to improve a machine learning model's performance. It's the art and science of representing data in a way that helps models learn the right patterns.
Why It Matters
In classical ML, feature engineering often matters more than model choice: a simple model with well-designed features frequently outperforms a complex model with poor ones. Deep learning and LLMs have automated some feature engineering by learning representations directly from raw data, but the concept remains essential for tabular data, time series, and understanding how AI extracts signal from noise.
How It Works
Types of feature engineering:
1. Feature selection:
- Choose which raw features to include
- Remove irrelevant, redundant, or noisy features
- Methods: correlation analysis, mutual information, recursive feature elimination
2. Feature transformation:
- Scaling — normalize features to similar ranges (StandardScaler, MinMaxScaler)
- Log transform — handle skewed distributions (income, prices)
- Encoding — convert categorical variables to numbers (one-hot encoding, label encoding)
- Binning — group continuous values into categories
3. Feature creation:
- Combine existing features: price_per_sqm = price / area
- Extract from dates: day_of_week, is_weekend, month
- Text features: word count, sentiment score, TF-IDF
- Aggregations: average_purchase_last_30_days, total_logins
4. Domain-specific features:
- Finance: moving averages, volatility, RSI
- NLP: n-grams, POS tags, named entities
- Computer vision: HOG, SIFT, edge histograms (hand-crafted descriptors that were standard before deep learning)
- Time series: lag features, rolling statistics, Fourier components
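Several of the transformations and created features listed above can be sketched in plain Python. This is a minimal, dependency-free illustration rather than a reference implementation: the function names are hypothetical, and in practice a library such as scikit-learn (StandardScaler, MinMaxScaler) would normally handle the scaling.

```python
import math
from datetime import date


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no signal
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def log_transform(values):
    """Compress a right-skewed, non-negative column with log(1 + x)."""
    return [math.log1p(v) for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator columns, one per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def date_features(d):
    """Extract calendar features from a datetime.date."""
    return {
        "day_of_week": d.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,
        "month": d.month,
    }


def lag_feature(series, lag):
    """Value observed `lag` steps earlier; None where no history exists."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


def rolling_mean(series, window):
    """Mean over the current and previous window-1 values; None until the window fills."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out
```

For example, min_max_scale([10, 20, 30]) gives [0.0, 0.5, 1.0], and lag_feature([1, 2, 3, 4], 1) gives [None, 1, 2, 3]. In a real pipeline the minimum, maximum, mean, and standard deviation would be computed on the training set only and then reused at inference.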
Deep learning and feature engineering:
- Neural networks learn features automatically (representation learning)
- Convolutional layers learn image features; transformer layers learn text features
- This reduced (but didn't eliminate) the need for manual feature engineering
- Tabular data still benefits significantly from manual feature engineering
Feature stores:
- Centralized systems for storing, versioning, and serving features
- Ensure the same feature definitions are used in training and at inference, avoiding training-serving skew
- Tools: Feast, Tecton, Vertex AI Feature Store
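The consistency point can be illustrated with a toy registry. The sketch below is not the API of Feast, Tecton, or Vertex AI Feature Store: InMemoryFeatureStore, its methods, and the record fields are all hypothetical. It only shows the core idea that training and serving read the same versioned feature definitions.

```python
class InMemoryFeatureStore:
    """Toy stand-in for a feature store: a registry of versioned feature functions."""

    def __init__(self):
        # Maps (feature name, version) to a function of one raw record.
        self._features = {}

    def register(self, name, version, fn):
        """Store a feature definition under an explicit version."""
        self._features[(name, version)] = fn

    def compute(self, feature_refs, raw_record):
        """Compute the requested (name, version) features for one raw record."""
        return {
            name: self._features[(name, version)](raw_record)
            for name, version in feature_refs
        }


store = InMemoryFeatureStore()
store.register("total_logins", 1, lambda r: len(r["logins"]))
store.register(
    "average_purchase",
    1,
    lambda r: sum(r["purchases"]) / len(r["purchases"]) if r["purchases"] else 0.0,
)

# The model pins exact feature versions (hypothetical list).
FEATURES = [("total_logins", 1), ("average_purchase", 1)]

# Training path: build feature rows from historical records.
history = [
    {"logins": ["mon", "tue", "fri"], "purchases": [20.0, 40.0]},
    {"logins": ["sat"], "purchases": []},
]
training_rows = [store.compute(FEATURES, record) for record in history]

# Serving path: the same definitions are applied to a live request.
live_request = {"logins": ["mon", "wed"], "purchases": [15.0]}
serving_row = store.compute(FEATURES, live_request)
```

Production feature stores also persist and serve precomputed values; this sketch leaves that out and keeps only the shared, versioned definitions.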
Example
Predicting house prices from raw data: a good feature engineer creates distance_to_city_center from coordinates, property_age from build_year and current_year, and neighborhood_avg_price from aggregating earlier nearby sales. These engineered features capture relationships the model might struggle to learn from raw numbers alone. A ratio such as price_per_sqm is only safe when it is computed from other properties' past sales, because deriving it from the listing's own price would leak the prediction target into the inputs.
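A rough sketch of how some of these features might be computed is shown below. The field names (lat, lon, build_year, neighborhood, price), the CITY_CENTER coordinates, and the use of "same neighborhood" as a proxy for "nearby" are assumptions made for illustration. price_per_sqm is left out because computing it for the listing itself would require the price being predicted.

```python
import math

# Hypothetical city-center coordinates as (latitude, longitude) in degrees.
CITY_CENTER = (52.5200, 13.4050)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def engineer_features(listing, past_sales, current_year):
    """Build model inputs for one listing without touching its own price."""
    # Prices of earlier sales in the same neighborhood (proxy for "nearby").
    nearby_prices = [
        sale["price"]
        for sale in past_sales
        if sale["neighborhood"] == listing["neighborhood"]
    ]
    return {
        "distance_to_city_center": haversine_km(
            listing["lat"], listing["lon"], *CITY_CENTER
        ),
        "property_age": current_year - listing["build_year"],
        "neighborhood_avg_price": (
            sum(nearby_prices) / len(nearby_prices) if nearby_prices else None
        ),
    }


listing = {"lat": 52.5200, "lon": 13.4050, "build_year": 1990, "neighborhood": "Mitte"}
past_sales = [
    {"neighborhood": "Mitte", "price": 300_000},
    {"neighborhood": "Mitte", "price": 500_000},
    {"neighborhood": "Spandau", "price": 250_000},
]
features = engineer_features(listing, past_sales, current_year=2024)
```

Here the listing sits exactly at the assumed city center, so its distance is zero, its age is 34 years, and the neighborhood average is taken over the two earlier sales in the same neighborhood.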