
An embedding is a numerical vector — a list of hundreds to thousands of numbers — that represents the semantic meaning of a piece of text in a high-dimensional mathematical space. Embedding models convert words, sentences, or entire documents into these dense vectors, positioning semantically similar content close together and dissimilar content far apart. The sentences "How do I reset my password?" and "I forgot my login credentials" would have vectors pointing in nearly the same direction despite sharing almost no vocabulary, because they express the same intent. Embeddings are the foundational technology behind semantic search, recommendation systems, and Retrieval-Augmented Generation (RAG).
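The "pointing in nearly the same direction" idea can be made concrete with cosine similarity. The vectors below are hand-made 4-dimensional stand-ins for illustration only — a real embedding model would produce vectors with hundreds or thousands of dimensions:

```python
import math

# Toy 4-dimensional "embeddings" -- illustrative stand-ins, not the output
# of a real model (real embeddings have hundreds or thousands of dimensions).
reset_password = [0.82, 0.51, 0.10, 0.05]   # "How do I reset my password?"
forgot_login   = [0.79, 0.55, 0.12, 0.08]   # "I forgot my login credentials"
pizza_recipe   = [0.05, 0.10, 0.88, 0.46]   # "Best pizza dough recipe"

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(reset_password, forgot_login))  # close to 1.0
print(cosine_similarity(reset_password, pizza_recipe))  # much lower
```

The two password-related sentences score near 1.0 while the unrelated sentence scores far lower — exactly the geometry semantic search exploits.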
Why it matters
Embeddings solve a fundamental problem: how do you make a computer understand that "car" and "automobile" mean the same thing, or that a customer asking "How do I cancel?" relates to your "Subscription termination" documentation? Traditional keyword search fails here entirely. Embeddings enable semantic search — finding content based on meaning rather than exact word matches. This capability powers RAG systems that give LLMs access to organizational knowledge, recommendation engines that surface relevant content, and clustering algorithms that automatically organize documents by topic. For any AI application that needs to search, compare, or organize text, embeddings are the enabling technology.
How it works
Embedding models are neural networks trained specifically to produce meaningful vector representations. They learn from massive datasets of text, developing the ability to position related concepts near each other in vector space. When you send text to an embedding model, it outputs a fixed-length vector (commonly 768, 1536, or 3072 dimensions). The geometric relationships between these vectors encode semantic relationships: the vector for "king", minus the vector for "man", plus the vector for "woman", lands close to the vector for "queen." In practice, similarity between embeddings is measured using cosine similarity (the angle between vectors) or dot product. You generate embeddings once for your document corpus, store them in a vector database, and at query time generate an embedding for the user's question, then find the most similar stored vectors.
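The embed-once, query-many workflow described above can be sketched in a few lines. This uses small hand-made vectors in place of a real embedding model and a plain dictionary in place of a vector database — the document IDs and vector values are invented for illustration:

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# 1. Index time: embed each document once, store (id, vector) pairs.
#    Hypothetical doc IDs and toy 3-d vectors stand in for real embeddings.
corpus = {
    "reset-guide":   [0.9, 0.4, 0.1],   # "Resetting your password"
    "billing-faq":   [0.2, 0.9, 0.3],   # "Billing and invoices"
    "cancel-how-to": [0.1, 0.8, 0.6],   # "Subscription termination"
}
index = {doc_id: normalize(vec) for doc_id, vec in corpus.items()}

def search(query_vec, index, top_k=2):
    """Rank stored vectors by cosine similarity to the query vector."""
    q = normalize(query_vec)
    scores = {doc_id: sum(a * b for a, b in zip(q, v))
              for doc_id, v in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# 2. Query time: embed the user's question, find the nearest stored vectors.
query = [0.15, 0.75, 0.65]              # "How do I cancel?"
print(search(query, index))             # "cancel-how-to" ranks first
```

A production system would swap the dictionary for a vector database with an approximate-nearest-neighbor index, since brute-force scoring stops scaling past a few hundred thousand vectors.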
Example
A law firm builds an internal knowledge system spanning 50,000 legal documents. Traditional search requires lawyers to guess the exact terms used in documents — searching "breach of fiduciary duty" misses documents that discuss "violation of trustee obligations" despite identical legal meaning. By generating embeddings for every document paragraph, the firm enables semantic search: a query about "executive liability for misleading shareholders" retrieves relevant sections from case law, regulatory filings, and internal memos regardless of the specific terminology used. The system finds relevant precedents that keyword search would miss, reducing research time from hours to minutes. Combined with an LLM to synthesize the retrieved passages, the firm has a RAG-powered legal research assistant that understands legal concepts rather than just matching words.