Core Concepts
Intermediate
2026-W17

What is Cosine Similarity?

Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them. It is the most widely used metric for comparing AI embeddings.

Also known as:
cosinusgelijkenis (Dutch)
vector similarity
Related term: cosine distance (1 − cosine similarity)

Cosine similarity is a mathematical measure of how similar two vectors are, computed as the cosine of the angle between them. In AI, it is the standard way to compare embeddings, that is, to measure how semantically similar two pieces of text, images, or other data are.

Why It Matters

Cosine similarity underpins semantic search, retrieval in RAG, recommendation systems, and duplicate detection. Whenever an AI system needs to answer "how similar are these two things?", cosine similarity (or the equivalent dot product on unit-length vectors) is usually the metric used. Understanding it explains how vector databases and other embedding-based systems decide which items count as similar.

How It Works

The formula:

cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)

Where:

  • A · B is the dot product of vectors A and B
  • ||A|| and ||B|| are the magnitudes (lengths) of the vectors
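A minimal sketch of the formula in plain Python, assuming both vectors are equal-length sequences of floats. The function name `cosine_similarity` is our own choice for illustration, not a specific library's API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Compute (A · B) / (||A|| × ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))     # A · B
    norm_a = math.sqrt(sum(x * x for x in a))  # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))  # ||B||
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Two vectors 45 degrees apart: cos(45°) ≈ 0.7071
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```

The zero-vector check matters because the denominator would otherwise be 0; real systems usually guard against empty or all-zero inputs before scoring.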

Properties:

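  • Returns a value between -1 and 1; for vectors with only non-negative components (such as TF-IDF vectors) the range is 0 to 1, and scores between text embeddings usually fall in a narrower, mostly positive band that varies by model
  • 1 = vectors point in the same direction (maximally similar, though not necessarily identical)
  • 0 = vectors are perpendicular (no directional similarity; unrelated)
  • -1 = vectors point in opposite directions (in embedding spaces, scores near -1 are rare and do not reliably indicate opposite meaning)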
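The three reference values can be reproduced with hand-picked 2-dimensional vectors (real embeddings have hundreds of dimensions, but the geometry is the same). This is a self-contained sketch; the helper simply restates the formula. Note that a score of 1 only requires the same direction, not the same length.

```python
import math


def cosine_similarity(a, b):
    # (A · B) / (||A|| × ||B||); math.hypot(*v) gives the Euclidean length of v
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


same_direction = cosine_similarity([1.0, 0.0], [3.0, 0.0])  # parallel, different lengths
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 2.0])   # 90 degrees apart
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])       # 180 degrees apart

print(same_direction, perpendicular, opposite)  # 1.0 0.0 -1.0
```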

Why cosine over Euclidean distance?

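  • Cosine similarity depends only on the angle between vectors, not on their magnitude
  • With count-based representations such as term-frequency vectors, a long document about "AI" has a larger vector than a short one on the same topic, but both point in a similar direction, so cosine still rates them as similar
  • Euclidean distance is affected by vector magnitude; cosine is not. When vectors are normalized to unit length, the two measures produce the same ranking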
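A small sketch of the difference, using made-up 3-dimensional vectors where `long_doc` is `short_doc` scaled by 10 (a hypothetical stand-in for a longer document under a count-based representation). Cosine similarity is unchanged by the scaling, while Euclidean distance grows sharply. The last check illustrates why the two agree on unit-length vectors: when ||A|| = ||B|| = 1, ||A − B||² = 2 − 2 × cos(A, B).

```python
import math


def cosine_similarity(a, b):
    # (A · B) / (||A|| × ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def normalize(v):
    # Rescale v to unit length so that ||v|| = 1.
    length = math.hypot(*v)
    return [x / length for x in v]


query = [1.0, 2.0, 0.5]
short_doc = [0.9, 2.1, 0.4]
long_doc = [10 * x for x in short_doc]  # same direction, 10x the magnitude

# Cosine similarity: effectively identical for both documents.
print(cosine_similarity(query, short_doc), cosine_similarity(query, long_doc))

# Euclidean distance (math.dist): much larger for the scaled-up document.
print(math.dist(query, short_doc), math.dist(query, long_doc))

# On unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity.
q, d = normalize(query), normalize(long_doc)
print(math.isclose(math.dist(q, d) ** 2, 2 - 2 * cosine_similarity(q, d), abs_tol=1e-12))  # True
```

Because squared distance is a decreasing function of cosine similarity on unit vectors, sorting by either gives the same order.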

In practice:

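  • Text embedding models from OpenAI, Cohere, and open-source projects typically produce vectors with a few hundred to a few thousand dimensions (for example, 384 for all-MiniLM-L6-v2 and 3072 for OpenAI's text-embedding-3-large)
  • Vector databases use approximate nearest-neighbor (ANN) indexes such as HNSW to find the vectors most similar to the query among millions of stored vectors in milliseconds, without scoring every one of them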
  • Top-k results (highest similarity scores) are returned as search results
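The retrieval step can be sketched as an exact, brute-force top-k scan: normalize the vectors, score each stored vector with a dot product (which equals cosine similarity for unit vectors), and keep the k highest scores. The `top_k` helper, document IDs, and 3-dimensional vectors below are invented for illustration; at the scale of millions of vectors, vector databases typically replace this full scan with an approximate nearest-neighbor index.

```python
import heapq
import math


def normalize(v):
    # Rescale v to unit length so that a dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def top_k(query, docs, k=2):
    """Return the k (score, doc_id) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = []
    for doc_id, vec in docs.items():
        d = normalize(vec)  # in practice, stored vectors are normalized once at index time
        score = sum(x * y for x, y in zip(q, d))  # dot product of unit vectors = cosine
        scored.append((score, doc_id))
    return heapq.nlargest(k, scored)  # highest scores first


# Hypothetical 3-dimensional "embeddings" for illustration only.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "kittens": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
for score, doc_id in top_k([1.0, 0.15, 0.05], docs, k=2):
    print(doc_id, round(score, 3))  # prints "cats" first, then "kittens"
```

`heapq.nlargest` avoids fully sorting the scores, which helps when k is much smaller than the number of stored vectors.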

Example

Suppose we embed three sentences:

  • A: "The cat sat on the mat" → vector [0.2, 0.8, 0.1, ...]
  • B: "A kitten rested on the rug" → vector [0.19, 0.78, 0.12, ...]
  • C: "Stock markets rallied today" → vector [0.7, 0.1, 0.6, ...]

Cosine similarity (illustrative values): A↔B ≈ 0.95 (very similar meaning), A↔C ≈ 0.15 (unrelated). This is how semantic search ranks B as relevant to a query like A, and C as not.
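For a rough feel of the arithmetic, the sketch below scores only the three components shown for each sentence, treating them as toy 3-dimensional vectors. The remaining dimensions are elided, so these numbers will not match the full-embedding scores above; what the toy version shares with them is the ordering, with B scoring far higher against A than C does.

```python
import math


def cosine_similarity(a, b):
    # (A · B) / (||A|| × ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Only the first three components from the example; the real vectors are much longer.
a = [0.2, 0.8, 0.1]     # "The cat sat on the mat"
b = [0.19, 0.78, 0.12]  # "A kitten rested on the rug"
c = [0.7, 0.1, 0.6]     # "Stock markets rallied today"

sim_ab = cosine_similarity(a, b)
sim_ac = cosine_similarity(a, c)
print(f"A-B: {sim_ab:.3f}, A-C: {sim_ac:.3f}")
```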


© 2026 BVDNET. All rights reserved.
