Models & Architecture
24 concepts

Emotion Vectors
Measurable neural representations inside AI models that function like emotions and causally steer the model's behavior.

Adaptive Thinking in AI
A reasoning strategy where AI models dynamically adjust how much they think per turn — from instant responses to deep multi-step deliberation — based on task complexity.

Adversarial Cost to Exploit (ACE)
A security benchmark that measures the economic token cost an adversary must spend to trick an AI agent into unauthorized tool use, replacing static pass/fail evaluations with game-theoretic cost analysis.

Automated Alignment Research
Using frontier AI models to autonomously discover methods for aligning other AI systems — addressing the scalable oversight challenge by letting safety research scale with capabilities.

DeepStack Injection
A VLM architecture that routes abstract visual features to early Transformer layers and high-resolution details to later layers for optimal document parsing in compact models.

GRPO (Group Relative Policy Optimization)
A reinforcement learning algorithm that aligns language models by comparing groups of outputs against each other, eliminating the need for a separate reward model.
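GRPO's core trick can be sketched in a few lines: each sampled completion's advantage is computed relative to its own group, so no separate value or reward-critic network is needed. This is an illustrative sketch of the group-normalization step only, not a full training loop.

```python
# GRPO advantage sketch: normalize each reward within its sampled group,
# A_i = (r_i - mean) / std, replacing a learned value baseline.

def group_relative_advantages(rewards):
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # Guard against a degenerate group where every reward is identical:
    return [(r - mean) / (std if std > 0 else 1.0) for r in rewards]

# Four sampled completions for the same prompt, scored by some reward function:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Advantages within a group always sum to zero, so above-average completions are reinforced and below-average ones are penalized, with no critic involved.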

Gemma 4
Google DeepMind's open-weight multimodal model family that natively handles text, vision, and audio on-device.

LoRA (Low-Rank Adaptation)
An efficient fine-tuning method that trains only small low-rank adapter matrices instead of the full model.
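The idea can be sketched with numpy (shapes and hyperparameters below are illustrative assumptions): instead of updating a full d_out × d_in weight W, train two small factors A and B so the effective weight becomes W + (alpha/r)·BA.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor, small init
B = np.zeros((d_out, r))                # trainable, zero init -> no change at start

def lora_forward(x):
    # Frozen path plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
baseline = W @ x            # output of the frozen layer
adapted = lora_forward(x)   # identical until B is trained away from zero

full_params = W.size
lora_params = A.size + B.size   # tiny fraction of the full matrix
```

Because B starts at zero, the adapted model initially reproduces the pretrained model exactly, and only the small A and B matrices receive gradients.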

Model Distillation
Training a smaller 'student' model to replicate a larger 'teacher' model's capabilities at a fraction of the cost and latency.
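A common distillation objective is a KL divergence between temperature-softened output distributions; this sketch (the temperature T and logits are illustrative assumptions) shows why a student matching its teacher incurs zero loss.

```python
import math

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the teacher's
    # relative preferences among wrong answers ("dark knowledge").
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

zero = distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])  # student matches teacher
gap = distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])   # student disagrees
```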

PEFT (Parameter-Efficient Fine-Tuning)
A family of techniques that adapt large AI models to specific tasks by updating only a tiny fraction of parameters, cutting fine-tuning costs by 90–99%.

Perplexity in NLP
The standard metric for evaluating language model quality — measuring how well a model predicts text, where lower values indicate better language understanding.
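Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to each token. The per-token probabilities below are made up purely for illustration.

```python
import math

def perplexity(token_probs):
    # Average negative log-likelihood per token, then exponentiate.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = perplexity([0.9, 0.8, 0.95])  # model predicts tokens well -> low PPL
uncertain = perplexity([0.1, 0.2, 0.05])  # poor predictions -> high PPL
```

A model that assigns probability 1.0 to every token has perplexity 1, the theoretical minimum; higher perplexity means the model is, on average, "more surprised" by the text.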

Quantization
Reducing model weight precision from 16 or 32 bits to 8 or 4 bits to shrink model size and speed up inference.
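A minimal sketch of one common scheme, symmetric per-tensor int8 quantization: map float weights to 8-bit integers with a single scale, then dequantize for use. Real quantizers use per-channel scales and calibration; this is the bare idea.

```python
import numpy as np

def quantize_int8(w):
    # One scale per tensor, chosen so the largest weight maps to +/-127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

max_err = np.abs(w - w_hat).max()   # bounded by half a quantization step
```

The int8 tensor occupies a quarter of the float32 storage, and the worst-case reconstruction error is at most scale/2.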

RAG (Retrieval-Augmented Generation)
A technique that combines LLMs with external knowledge retrieval to improve accuracy and reduce hallucinations.
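The pipeline reduces to "retrieve, then prepend": fetch the most relevant document and put it in the prompt so the model can ground its answer. This toy retriever scores by word overlap; real systems use embedding similarity, but the shape of the prompt is the same.

```python
# Toy RAG sketch: word-overlap retrieval (stand-in for a vector database),
# then a grounded prompt for the LLM. Documents here are invented examples.

docs = [
    "The Eiffel Tower is 330 metres tall.",
    "Mount Everest is 8849 metres high.",
]

def retrieve(query, documents):
    """Return the document sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query):
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

prompt = build_prompt("How tall is the Eiffel Tower?")
```

The retrieved passage, not the model's parametric memory, becomes the source of truth for the answer, which is what curbs hallucination.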

RLHF (Reinforcement Learning from Human Feedback)
A training technique that uses human preference ratings to align LLM behavior with human values.
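At the heart of the pipeline is a reward model trained on pairwise preferences. A common choice is a Bradley-Terry loss, sketched below with illustrative scores: the loss is small when the model scores the human-preferred response higher.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """-log sigmoid(r_chosen - r_rejected): low when the reward model
    agrees with the human preference, high when it ranks them backwards."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

agrees = preference_loss(2.0, -1.0)     # reward model matches the human label
disagrees = preference_loss(-1.0, 2.0)  # reward model ranks them backwards
```

The trained reward model then supplies the reward signal for the RL step (classically PPO) that fine-tunes the LLM itself.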

Text/Action Mismatch
A failure mode where an LLM verbally refuses a restricted request in its text output while simultaneously executing the forbidden action in its structured tool-call output.

Mixture-of-Experts (MoE) Model
An architecture that routes tokens to specialized sub-networks, increasing model capacity without a proportional increase in computing costs.
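The routing step can be sketched directly (sizes and the random "experts" below are illustrative assumptions): a gating network scores all experts, only the top-k run, and their outputs are mixed by the gate's softmax weights.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_experts, k = 16, 8, 2

gate = rng.normal(size=(n_experts, d))               # router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):
    logits = gate @ x
    top = np.argsort(logits)[-k:]                    # indices of the k best experts
    w = np.exp(logits[top])
    w /= w.sum()                                     # softmax over the chosen experts
    # Only the k selected expert networks execute for this token:
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top)), top

x = rng.normal(size=d)
y, chosen = moe_forward(x)

active_fraction = k / n_experts   # fraction of expert parameters actually used
```

Total capacity grows with n_experts, but per-token compute grows only with k, which is the source of the favorable capacity/cost trade-off.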

Transformer
The neural network architecture underlying nearly all modern LLMs, using attention mechanisms to process text.

VLM (Vision-Language Model)
An AI model architecture that jointly processes visual and textual inputs, enabling tasks like document understanding, image reasoning, and visual question answering.

Attention Mechanism
The mathematical mechanism that allows transformers to dynamically focus on the most relevant parts of the input when processing each token.
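The standard formulation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d)·V. This numpy sketch uses assumed shapes (seq_len tokens, head dimension d) and omits masking and multiple heads.

```python
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)           # relevance of every token to every other
    # Numerically stable softmax over each row:
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights             # each output is a weighted mix of values

rng = np.random.default_rng(3)
seq_len, d = 5, 8
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))
out, w = attention(Q, K, V)
```

Each row of the weight matrix is a probability distribution over the input tokens, which is the precise sense in which the model "focuses" on relevant positions.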

KV Cache
A memory optimization that stores previously computed key-value pairs in transformer attention layers — avoiding redundant computation and accelerating generation 3-5×.
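The mechanism can be sketched as incremental decoding (the projection matrices here are random stand-ins): each new token's key and value are computed once and appended, so attention at step t reuses the t−1 cached pairs instead of recomputing them.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []

def decode_step(x):
    """Attend the new token over itself plus all cached key/value pairs."""
    k_cache.append(Wk @ x)                 # K and V computed once per token, ever
    v_cache.append(Wv @ x)
    q = Wq @ x
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

for _ in range(4):
    out = decode_step(rng.normal(size=d))

cache_len = len(k_cache)   # grows by one entry per generated token
```

Without the cache, step t would redo t projections; with it, each step does constant per-token K/V work at the cost of memory that grows linearly with sequence length.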

DeepSeek
A highly efficient, open-weight AI model family that delivers frontier-level coding and reasoning capabilities at significantly lower computational costs.

Flash Attention
A hardware-aware algorithm that massively speeds up LLM processing by optimizing GPU memory reads, enabling very long context windows.

Mamba
A highly efficient AI architecture that uses State-Space Models instead of Transformers to process massive amounts of text with very low memory usage.

State-Space Model (SSM)
An efficient AI architecture that maintains a continuously updating internal state to process massive sequences of data without the memory overhead of Transformers.
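The defining recurrence is h_t = A·h_{t−1} + B·x_t with output y_t = C·h_t; the matrices below are illustrative stand-ins (real SSMs like Mamba learn and discretize them). The key property on display is that the state h has a fixed size regardless of sequence length.

```python
import numpy as np

rng = np.random.default_rng(5)
d_state, d_in = 4, 1

A = np.eye(d_state) * 0.9            # decaying (stable) state transition
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(1, d_state))

h = np.zeros(d_state)
outputs = []
for x in rng.normal(size=(1000, d_in)):  # 1000-step sequence, O(1) state memory
    h = A @ h + B @ x                    # fold the new input into the state
    outputs.append(float((C @ h)[0]))    # read out from the current state

state_size = h.size   # constant, however long the sequence grows
```

Contrast this with a transformer's KV cache, whose memory grows linearly with sequence length: the SSM compresses the entire history into a fixed-size state.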