Models & Architecture
39 concepts

Emotion Vectors
Measurable internal neural representations inside AI models that function like emotions and causally steer the model's behavior.

Adaptive Thinking in AI
A reasoning strategy where AI models dynamically adjust how much they think per turn — from instant responses to deep multi-step deliberation — based on task complexity.

Adversarial Cost to Exploit (ACE)
A security benchmark that measures the economic token cost an adversary must spend to trick an AI agent into unauthorized tool use, replacing static pass/fail evaluations with game-theoretic cost analysis.

Automated Alignment Research
Using frontier AI models to autonomously discover methods for aligning other AI systems — addressing the scalable oversight challenge by letting safety research scale with capabilities.

DeepStack Injection
A VLM architecture that routes abstract visual features to early Transformer layers and high-resolution details to later layers for optimal document parsing in compact models.

GRPO (Group Relative Policy Optimization)
A reinforcement learning algorithm that aligns language models by scoring groups of sampled outputs against each other, using the group's average reward as the baseline and eliminating the need for a separate value (critic) model.
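A minimal sketch of the group comparison, assuming each sampled output already has a scalar reward: every reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not a reference implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Outputs that beat the group average get a positive advantage and the rest a negative one, so no learned value estimate is needed as a baseline.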

Gemma 4
Google DeepMind's open-weight multimodal model family that natively handles text, vision, and audio on-device.

LoRA (Low-Rank Adaptation)
An efficient fine-tuning method that freezes the base model and trains only small low-rank adapter matrices instead of all of its weights.
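The low-rank idea can be sketched in plain Python, assuming the common formulation in which a frozen weight matrix `w` is adjusted by the product of two small trainable matrices `b` and `a`, scaled by `alpha / rank`. All names here are illustrative, and real implementations operate on tensors rather than nested lists.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha, rank):
    """Return w + (alpha / rank) * (b @ a); w stays frozen, only a and b train."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x rank) @ (rank x d_in) -> (d_out x d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 base weight
b = [[1.0], [2.0]]                      # trainable, 2x1
a = [[0.5, 0.0, 1.0]]                   # trainable, 1x3
adapted = lora_effective_weight(w, a, b, alpha=2.0, rank=1)
```

Here the adapter holds 5 trainable values against 6 in the full matrix; the saving grows quickly as the matrix dimensions increase while the rank stays small.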

Model Distillation
Training a smaller 'student' model to replicate a larger 'teacher' model's capabilities at a fraction of the cost and latency.

PEFT (Parameter-Efficient Fine-Tuning)
A family of techniques that adapt large AI models to specific tasks by updating only a tiny fraction of parameters, cutting fine-tuning costs by 90–99%.

Perplexity in NLP
A standard metric for evaluating language models: the exponential of the average negative log-likelihood per token, measuring how well a model predicts text. Lower values mean the model assigns higher probability to the observed text.
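As a small worked example, perplexity can be computed from the probabilities a model assigned to each token that actually occurred. The function name and input format are assumptions made for illustration.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that gives every observed token probability 0.25 is, on average,
# as uncertain as a uniform choice among four options.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
confident_ppl = perplexity([0.9, 0.8, 0.95, 0.85])
```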

Quantization
Reducing the numerical precision of model weights from 16- or 32-bit floating point to 8- or 4-bit representations to shrink model size and speed up inference.
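One simple scheme, shown here as a sketch, is symmetric absolute-maximum quantization to 8-bit integers: a single scale maps the largest weight magnitude to 127. This is only one of several approaches, and the function names are illustrative.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored weight differs from the original by at most half a quantization step, which is the rounding error the technique trades for the smaller storage format.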

RAG (Retrieval-Augmented Generation)
A technique that combines LLMs with external knowledge retrieval to improve accuracy and reduce hallucinations.

RLHF (Reinforcement Learning from Human Feedback)
A training technique that uses human preference ratings to align LLM behavior with human values.

Text/Action Mismatch
A failure mode where an LLM verbally refuses a restricted request in its text output while simultaneously executing the forbidden action in its structured tool-call output.

Mixture-of-Experts (MoE) Model
An architecture that routes each token to a small subset of specialized sub-networks (experts), increasing model capacity without a proportional increase in computing costs.
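Token routing can be sketched as top-k gating: only the k highest-scoring experts run for a given token, and their outputs are mixed by normalized gate weights. In this sketch the router scores are passed in directly (in practice a learned layer produces them from the token), the experts are toy functions, and applying softmax over only the selected experts is one common choice rather than the only one.

```python
import math

def top_k_route(router_logits, k=2):
    """Return [(expert_index, gate_weight)] for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and combine their outputs."""
    return sum(gate * experts[i](x) for i, gate in top_k_route(router_logits, k))

# Four hypothetical experts; only two of them run for this token.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]
routes = top_k_route(router_logits, k=2)
y = moe_forward(3.0, experts, router_logits, k=2)
```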

Transformer
The neural network architecture underlying nearly all modern LLMs, using attention mechanisms to process text.

VLM (Vision-Language Model)
An AI model architecture that jointly processes visual and textual inputs, enabling tasks like document understanding, image reasoning, and visual question answering.

Attention Mechanism
The mathematical mechanism that allows transformers to dynamically focus on the most relevant parts of the input when processing each token.
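A minimal sketch of scaled dot-product attention for a single query, written with plain lists for readability. Real models apply this to batched tensors with learned projections; the vectors below are made-up values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return output, weights

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attention([4.0, 0.0], keys, values)
```

The query aligns with the first and third keys, so those two values dominate the output while the second contributes little.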

KV Cache
An inference optimization that stores previously computed key and value tensors in transformer attention layers, trading extra memory for the avoidance of redundant computation and accelerating generation 3-5×.
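The saving can be illustrated with a toy decoder that counts key/value projections. `project_kv` is a hypothetical stand-in for the learned projections, and the counts only illustrate the scaling, not real-world speedups.

```python
class KVCache:
    """Holds one (key, value) pair per processed token for one attention layer."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

def project_kv(embedding):
    """Hypothetical stand-in for the learned key and value projections."""
    return [2.0 * x for x in embedding], [x + 1.0 for x in embedding]

def decode_without_cache(embeddings):
    calls, pairs = 0, []
    for step in range(1, len(embeddings) + 1):
        pairs = [project_kv(e) for e in embeddings[:step]]  # re-project whole prefix
        calls += step
    return pairs, calls

def decode_with_cache(embeddings):
    cache, calls = KVCache(), 0
    for e in embeddings:
        cache.append(*project_kv(e))  # project only the newest token
        calls += 1
    return list(zip(cache.keys, cache.values)), calls

tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pairs_plain, calls_plain = decode_without_cache(tokens)
pairs_cached, calls_cached = decode_with_cache(tokens)
```

Both paths end with the same keys and values, but the uncached loop performs a number of projections that grows quadratically with sequence length while the cached loop grows linearly.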

Self-Evolving Agentic Models
AI systems that autonomously improve their own capabilities by generating synthetic training data, debugging their own learning process, and modifying their reasoning strategies—early steps toward recursive self-improvement.

Autoregressive Generation
Autoregressive generation is how LLMs produce text: predicting one token at a time, with each new token conditioned on all previously generated tokens.
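The token-by-token loop can be sketched with a toy model. The lookup table below is a made-up stand-in that conditions only on the last token, whereas a real LLM conditions on the full sequence; greedy selection is also just one possible decoding rule.

```python
def generate(next_token_probs, prompt, max_new_tokens, eos="<eos>"):
    """Append one token at a time, feeding the growing sequence back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)   # distribution over the next token
        token = max(probs, key=probs.get)  # greedy: take the most likely token
        tokens.append(token)
        if token == eos:
            break
    return tokens

# Hypothetical toy distribution table keyed by the previous token.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "<eos>": 0.1},
    "dog": {"ran": 1.0},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def toy_model(tokens):
    return BIGRAMS[tokens[-1]]

sentence = generate(toy_model, ["the"], max_new_tokens=5)
```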

Beam Search
Beam search generates text by exploring multiple candidate sequences in parallel, keeping the k most promising partial sequences (the beam width) at each step to approximate the highest-probability output.
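A minimal sketch of the procedure over a made-up probability table, chosen so that the locally best first token leads to a worse overall sequence. The table, token names, and function signature are all illustrative assumptions.

```python
import math

def beam_search(next_token_probs, start, beam_width, max_len, eos="<eos>"):
    """Keep the beam_width best partial sequences by cumulative log-probability."""
    beams = [(0.0, [start])]
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == eos:
                candidates.append((score, tokens))  # finished sequences carry over
                continue
            for token, p in next_token_probs(tokens).items():
                candidates.append((score + math.log(p), tokens + [token]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(tokens[-1] == eos for _, tokens in beams):
            break
    return beams

# Hypothetical toy model that looks only at the last token.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
    "x": {"<eos>": 1.0}, "y": {"<eos>": 1.0},
    "z": {"<eos>": 1.0}, "w": {"<eos>": 1.0},
}

def toy_model(tokens):
    return TABLE[tokens[-1]]

greedy = beam_search(toy_model, "<s>", beam_width=1, max_len=5)
wide = beam_search(toy_model, "<s>", beam_width=2, max_len=5)
```

With a width of 1 the search commits to "a" and ends at probability 0.30, while a width of 2 keeps "b" alive long enough to find the 0.36 sequence.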

DeepSeek
A highly efficient, open-weight AI model family that delivers frontier-level coding and reasoning capabilities at significantly lower computational costs.

Flash Attention
A hardware-aware exact attention algorithm that speeds up LLM processing by tiling the computation to minimize reads and writes to GPU high-bandwidth memory, making very long context windows practical.

GPT
GPT (Generative Pre-trained Transformer) is OpenAI's family of large language models that demonstrated how scaling transformers produces increasingly capable AI.

Gemini Omni
Google's any-to-any multimodal foundation model capable of generating any output (text, image, audio, video) from any input, with physics-grounded video generation as its first major capability.

Mamba
An efficient sequence architecture that replaces Transformer attention with selective state-space models, processing very long text sequences with a fixed-size state rather than a growing attention cache.

MiniMax-M2
A 229.9B parameter Mixture-of-Experts model with only 9.8B active parameters per token, optimized for agentic tasks and exhibiting early signs of self-evolution—autonomously debugging its own training and modifying its scaffolding.

Nemotron-Labs Diffusion
NVIDIA's family of language models (3B-14B) that merge autoregressive and diffusion generation into one architecture, enabling both GPT-style sequential generation and 10-50x faster parallel diffusion mode.

Positional Encoding
Positional encoding tells transformers the order of tokens in a sequence, since self-attention alone is position-agnostic. Modern approaches like RoPE enable 128K+ context windows.
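For illustration, here is the fixed sinusoidal scheme from the original Transformer, which is simpler than RoPE and is shown only to make the idea concrete: each position maps to a distinct vector of sines and cosines at different frequencies. The function name is an assumption.

```python
import math

def sinusoidal_encoding(position, d_model):
    """Return the d_model-dimensional sinusoidal encoding for one position."""
    encoding = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))      # even dimension
        if i + 1 < d_model:
            encoding.append(math.cos(angle))  # odd dimension
    return encoding

first = sinusoidal_encoding(0, 4)
second = sinusoidal_encoding(1, 4)
```

These vectors are added to the token embeddings so that otherwise identical tokens at different positions produce different inputs to attention.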

Speculative Decoding
Speculative decoding speeds up LLM inference by having a small draft model generate candidate tokens that the large model verifies in parallel — same quality, 2-3x faster.
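A simplified greedy sketch of the draft-and-verify loop follows. It assumes both models are deterministic functions from a token list to the next token; production systems verify all drafted positions in one batched forward pass and use a probabilistic acceptance rule for sampling, neither of which is shown here. The toy models are made up.

```python
def greedy_decode(model, prompt, n_new):
    """Baseline: the large model generates every token itself."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens

def speculative_decode(target, draft, prompt, n_new, k=4):
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each position (simulated here with a loop).
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if guess != expected:
                tokens.extend(proposal[:i])  # keep the agreed prefix
                tokens.append(expected)      # take the target's correction
                break
        else:
            tokens.extend(proposal)          # every drafted token accepted
    return tokens

# Hypothetical toy models: the draft agrees with the target most of the time.
def target_model(tokens):
    return (tokens[-1] + 1) % 10

def draft_model(tokens):
    return 0 if tokens[-1] % 4 == 3 else (tokens[-1] + 1) % 10

fast = speculative_decode(target_model, draft_model, [0], n_new=8)
slow = greedy_decode(target_model, [0], n_new=8)
```

Because every accepted token is one the target would have chosen itself, the output matches plain greedy decoding with the target model.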

Diffusion Model
A diffusion model generates data, most commonly images, by learning to reverse a noise-adding process, iteratively refining random noise into coherent outputs, often guided by text prompts.

Foundation Model
A foundation model is a large AI model pre-trained on broad data at scale that can be adapted to many downstream tasks through prompting, fine-tuning, or retrieval augmentation.

GAN (Generative Adversarial Network)
A GAN uses two competing neural networks — a generator and a discriminator — to produce realistic synthetic data through adversarial training.

State-Space Model (SSM)
An efficient AI architecture that maintains a continuously updating internal state to process massive sequences of data without the memory overhead of Transformers.

Activation Function
Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Common ones: ReLU, GELU (transformers), sigmoid, softmax.
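The four functions named above can be written directly from their standard definitions. This sketch uses scalar inputs and the exact (erf-based) form of GELU; libraries typically provide vectorized versions and sometimes a tanh approximation instead.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gelu(x):
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```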

Encoder-Decoder Architecture
An encoder-decoder architecture pairs an encoder (which reads and compresses input) with a decoder (which generates output). It is the design of the original Transformer and of T5, while BERT keeps only the encoder and GPT only the decoder.

Bicameral Model
A neural architecture that couples two parallel language models via their hidden states for real-time latent-channel coordination, dramatically improving reasoning accuracy without token overhead.