
What is Cosine Similarity?
Cosine similarity is a mathematical measure of similarity between two vectors by computing the cosine of the angle between them. In AI, it's the standard way to compare embeddings β measuring how semantically similar two pieces of text, images, or other data are.
Why It Matters
Cosine similarity is the math behind semantic search, RAG, recommendation systems, and duplicate detection. Whenever an AI system needs to answer "how similar are these two things?", cosine similarity is typically the metric used. Understanding it demystifies how vector databases and embedding-based systems make decisions.
How It Works
The formula:
cosine_similarity(A, B) = (A Β· B) / (||A|| Γ ||B||)
Where:
- A Β· B is the dot product of vectors A and B
- ||A|| and ||B|| are the magnitudes (lengths) of the vectors
Properties:
- Returns a value between -1 and 1 (for normalized embeddings, typically 0 to 1)
- 1 = vectors point in the same direction (identical meaning)
- 0 = vectors are perpendicular (unrelated)
- -1 = vectors point in opposite directions (opposite meaning)
Why cosine over Euclidean distance?