Core Concepts
41 concepts

Scaling Laws for LLMs
Empirical patterns showing that LLM capabilities improve predictably as model size, training data, and compute increase — enabling reliable planning of AI investments
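A scaling law is typically a power law in model size, data, or compute. The sketch below illustrates the functional form only; the constants are made up for illustration, not fitted values from any published study.

```python
# Illustrative power-law scaling: loss falls predictably as parameters grow.
# L(N) = a * N**(-alpha); a and alpha here are hypothetical placeholders.
def scaling_law_loss(n_params, a=10.0, alpha=0.076):
    return a * n_params ** (-alpha)

small_model_loss = scaling_law_loss(1e8)    # 100M parameters
large_model_loss = scaling_law_loss(1e10)   # 10B parameters: lower loss
```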

AI Hallucination
When an LLM confidently generates false or fabricated information

AI Inference
The process of running a trained LLM to generate output from input

Fine-Tuning
Training a pre-trained LLM further on domain-specific data to specialize its behavior

Temperature in AI
A parameter controlling the randomness of LLM output — lower values produce consistent results, higher values increase creativity
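Temperature works by rescaling the model's logits before the softmax. A minimal sketch of that mechanism, using a toy logit vector:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before the softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # more peaked, more consistent
hot = softmax_with_temperature(logits, 2.0)   # flatter, more varied
```

At low temperature the top token dominates; at high temperature probability spreads across alternatives.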

Top-p (Nucleus) Sampling
A decoding method that samples from the smallest set of tokens whose cumulative probability exceeds a threshold p — adapting candidate pool size to model confidence
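The adaptive pool size can be shown with a few lines of Python. This sketch only builds the candidate set; a real decoder would renormalize over it and sample:

```python
def nucleus_candidates(probs, p):
    # Sort token probabilities descending, then keep the smallest
    # prefix whose cumulative probability reaches p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

confident = nucleus_candidates([0.9, 0.05, 0.03, 0.02], p=0.9)   # 1 token
uncertain = nucleus_candidates([0.25, 0.25, 0.25, 0.25], p=0.9)  # 4 tokens
```

When the model is confident the pool shrinks to a single token; when it is uncertain the pool widens.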

Context Window
The maximum number of tokens an LLM can process in a single request

Large Language Model (LLM)
A neural network trained on massive text data to understand and generate human-like language

Neural Network
A network of interconnected artificial neurons that learns patterns from data — the foundational architecture behind modern deep learning

Prompt
The input text or instructions given to an LLM to generate a response

Token in AI
The smallest unit of text an LLM processes — in English, roughly 4 characters or 0.75 words on average

Embedding
A numerical vector that captures the semantic meaning of text, enabling similarity search


Batch Size
Batch size is the number of training examples processed in one forward/backward pass before the model's weights are updated; together with the learning rate (the step size for those updates), it is one of the most important hyperparameters controlling how neural networks train.

Overfitting
Overfitting means a model memorizes training data without generalizing; underfitting means a model is too simple to learn the underlying patterns. Balancing them is key to effective ML.

Artificial Intelligence (AI)
Artificial intelligence is the field of computer science that builds systems capable of performing tasks normally requiring human intelligence, such as learning, reasoning, and perception.

Backpropagation
Backpropagation is the algorithm that trains neural networks by computing how each weight contributes to prediction error and adjusting weights to reduce that error.
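The chain rule at the heart of backpropagation fits in a few lines for a one-weight "network". This is a deliberately minimal sketch: one neuron, squared-error loss, plain gradient steps.

```python
# One-neuron "network": y = w * x, loss = (y - target)**2.
# Backprop applies the chain rule: dL/dw = dL/dy * dy/dw = 2*(y - target) * x.
def backprop_step(w, x, target, learning_rate):
    y = w * x                      # forward pass
    grad_w = 2 * (y - target) * x  # backward pass (chain rule)
    return w - learning_rate * grad_w

w = 0.0
for _ in range(50):
    w = backprop_step(w, x=2.0, target=6.0, learning_rate=0.05)
# w approaches 3.0, since 3.0 * 2.0 == 6.0
```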

Catastrophic Forgetting
Catastrophic forgetting is when training a neural network on new data overwrites previously learned knowledge, causing it to lose earlier capabilities.

Continual Learning
Continual learning enables AI systems to learn new tasks over time without forgetting previous knowledge, addressing the stability-plasticity dilemma.

Cosine Similarity
Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them — the standard metric for comparing AI embeddings.
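The formula is cos(θ) = (a · b) / (‖a‖ ‖b‖). A minimal stdlib-only implementation:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # same direction: 1.0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated: 0.0
```

Because only the angle matters, vectors of different magnitudes but similar direction still score as similar — the property that makes it useful for comparing embeddings.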

Deep Learning
Deep learning is a machine learning technique using multi-layered neural networks that automatically learn hierarchical data representations, powering modern AI breakthroughs.

Feature Engineering
Feature engineering transforms raw data into informative input variables for ML models — selecting, creating, and encoding features that help models learn effectively.

Federated Learning
Federated learning trains AI models across decentralized devices by sharing model updates instead of raw data, enabling privacy-preserving machine learning.

Generative AI
Generative AI is a category of AI systems that create new content — text, images, audio, code — rather than just analyzing existing data.

Gradient Descent
Gradient descent is the optimization algorithm that trains neural networks by iteratively adjusting parameters in the direction that reduces prediction error.

Latent Space
Latent space is the internal representation space learned by neural networks — a compressed mathematical space where data is mapped to vectors capturing essential features and relationships.

Machine Learning (ML)
Machine learning is a branch of AI where systems learn patterns from data to improve at a task without being explicitly programmed.

Natural Language Processing (NLP)
Natural language processing is the AI field that enables computers to understand, interpret, and generate human language, underpinning chatbots, translation, and LLMs.

Pre-training
Pre-training is the initial training phase where an AI model learns broad patterns from a large general-purpose dataset before being adapted for specific tasks.

Reasoning in AI
AI reasoning is the ability of models to think step by step, using techniques like Chain-of-Thought and reasoning models (o1, o3) for complex problem-solving.

Reinforcement Learning (RL)
Reinforcement learning is a machine learning paradigm where an agent learns optimal behavior through trial-and-error interaction with an environment, guided by reward signals.

Self-Supervised Learning
Self-supervised learning trains models by generating labels from the data itself — like predicting the next token — enabling pre-training on virtually unlimited unlabeled data.

Supervised Learning
Supervised learning is a machine learning approach where models learn from labeled input-output pairs to make predictions on new data.

Synthetic Data
Synthetic data is artificially generated data that mimics real-world patterns, used when real data is scarce, biased, or privacy-restricted.

Transfer Learning
Transfer learning is a technique where knowledge from a model trained on one task is reused for a different task, enabling powerful AI with less data and compute.

Unsupervised Learning
Unsupervised learning is a machine learning approach where models discover patterns and structure in unlabeled data without being given correct outputs.

Benchmark (AI Evaluation)
A benchmark is a standardized test used to measure and compare AI model performance, providing reproducible scores across tasks like reasoning, coding, and knowledge.

Classifier
A classifier is an ML model that assigns inputs to predefined categories — the foundation of spam filters, sentiment analysis, image recognition, and fraud detection.

Loss Function
A loss function measures how wrong a model's predictions are, providing the error signal that training algorithms minimize to improve the model.
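Two common loss functions, sketched with the Python stdlib: mean squared error for regression and cross-entropy for classification.

```python
import math

def mse(predictions, targets):
    # Mean squared error: average squared difference from the targets.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(probs, target_index):
    # Negative log-probability the model assigned to the correct class.
    return -math.log(probs[target_index])

perfect = mse([1.0, 2.0], [1.0, 2.0])           # 0.0 for exact predictions
confident_right = cross_entropy([0.9, 0.1], 0)  # small loss
confident_wrong = cross_entropy([0.1, 0.9], 0)  # large loss
```

Training minimizes these values: a confidently wrong prediction is penalized far more than a confidently right one.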

Tokenizer
A tokenizer converts raw text into tokens — the discrete units a language model processes — using subword algorithms like BPE or SentencePiece.
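A toy longest-match tokenizer illustrates the text-to-token mapping. Real tokenizers use learned BPE or SentencePiece vocabularies; this sketch uses a tiny hand-picked vocabulary purely for illustration.

```python
def greedy_tokenize(text, vocab):
    # Greedy longest-match subword tokenization over a fixed vocabulary.
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

vocab = {"un", "break", "able"}
pieces = greedy_tokenize("unbreakable", vocab)  # ['un', 'break', 'able']
```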

Difference Between Classification and Regression
Classification predicts categories (spam/not spam); regression predicts continuous values (house price). These are the two fundamental supervised ML problem types.
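The difference in output type is easy to show side by side. Both models below are hypothetical stand-ins (a made-up spam threshold and a made-up linear price model), chosen only to contrast discrete versus continuous outputs:

```python
# Classification: maps an input to a discrete category.
def classify_spam(score):
    return "spam" if score > 0.5 else "not spam"

# Regression: maps an input to a continuous value.
def predict_price(square_feet):
    return 150.0 * square_feet + 20000.0

label = classify_spam(0.92)    # a category: "spam"
price = predict_price(1000)    # a number: 170000.0
```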

Difference Between Training and Inference
Training teaches a model by adjusting its parameters on data (expensive, done once); inference uses the trained model to make predictions (cheap, done millions of times).