If you scratch the surface of almost any modern AI product — semantic search, recommendation engines, retrieval-augmented chatbots, duplicate detection, personalisation — you find the same underlying idea. Text, images, audio, and user behaviour are all turned into lists of numbers called embeddings, and those lists are then compared using a tiny bit of geometry. Understanding embeddings is the difference between treating AI as magic and actually being able to design with it. This guide strips the jargon away, shows exactly what embeddings are and how they work, and walks through why "nearby in vector space" turned out to be the most useful operation in modern machine learning.
Embeddings in one sentence
An embedding is a list of numbers that represents the meaning of something — a word, a sentence, an image, an audio clip — in a way that makes similar things have similar lists. That is the entire trick. Everything else about embeddings — dimensions, cosine similarity, embedding models, vector databases — follows from that one idea.
Concretely, a modern embedding is a vector of somewhere between 256 and 3072 floating-point numbers. When you compute the embedding of "dog" and the embedding of "puppy," the two lists will look numerically similar. When you compute the embedding of "dog" and "spreadsheet," the two lists will be noticeably different. The model that produces the embeddings has been trained so that this property holds across virtually any concept you might compare.
That is the core magic. Once similarity in meaning is captured as similarity in numbers, you can do an enormous number of useful things with simple math.
The geometry of meaning
Every embedding lives in a high-dimensional space. "High-dimensional" sounds intimidating; it just means there are more coordinates than you can visualise. A 1024-dimensional embedding is a point with 1024 coordinates. You cannot draw it on paper, but you can compute distances between two such points the same way you would in two or three dimensions.
Two useful measures answer the question "how similar are these two points?"
Cosine similarity measures the angle between two vectors. If two embeddings point in the same direction, cosine similarity is 1. If they are perpendicular, it is 0. If they point in opposite directions, -1. This is the default similarity measure in virtually every embedding-based system because it cares about direction, not magnitude.
Euclidean distance measures the straight-line distance between the two points. It is used occasionally, especially when magnitudes carry information, but most embedding models are trained to be meaningful under cosine similarity specifically.
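Both measures take only a few lines of NumPy. A minimal sketch, using hand-made 3-dimensional toy vectors in place of real embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = perpendicular."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between the two points."""
    return float(np.linalg.norm(a - b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, twice the magnitude
c = np.array([-2.0, 1.0, 0.0])  # perpendicular to a

print(cosine_similarity(a, b))   # 1.0 — direction matters, magnitude does not
print(cosine_similarity(a, c))   # 0.0
print(euclidean_distance(a, b))  # nonzero even though the direction is identical
```

Note that b is just a scaled copy of a: cosine similarity calls them identical while Euclidean distance does not, which is exactly why cosine is the default for direction-encoding embeddings.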
Once you have a way to measure similarity, you can do semantic search (find items close to a query), clustering (find groups of items that are close to each other), classification (assign labels by finding the closest labelled examples), and a dozen other operations — all by turning the problem into geometry.
Where embeddings come from
Embeddings do not arise from nowhere. They are produced by embedding models — neural networks specifically trained so that their output vectors have the similarity property.
The modern embedding model of choice for text is almost always a transformer, often a specialised variant of an LLM architecture. It takes text as input, runs it through transformer layers, and outputs a single vector that represents the whole input. Training is done with contrastive objectives: the model sees pairs of related texts (like a question and its answer) and pairs of unrelated texts, and is trained so that related pairs have nearby embeddings while unrelated pairs have distant ones. Run this training at scale on hundreds of millions of pairs and you get a model that can embed arbitrary text in a way that captures meaning.
Popular embedding models in 2026 include OpenAI's text-embedding-3-large and -small, Cohere's embed v3, Voyage AI embeddings, Jina AI embeddings, and open-source models like BGE-M3, Nomic, and E5. The field moves quickly; a new top-of-the-leaderboard model drops every few months.
The same principle applies to images and audio. CLIP (from OpenAI) trained on image-caption pairs produces image and text embeddings in a shared space, so you can search images with text queries. Wav2Vec and similar models produce audio embeddings. Modern multimodal embedding models handle text, images, and audio natively in a single model.
The classic example: king minus man plus woman equals queen
One of the earliest and still-striking observations about embeddings, from the Word2Vec work in 2013, is that simple vector arithmetic often corresponds to semantic relationships.
Take the embedding of "king." Subtract the embedding of "man." Add the embedding of "woman." The resulting vector, when you find the word whose embedding is closest to it, turns out to be "queen." Similarly, the embedding of "Paris" minus "France" plus "Japan" gives you "Tokyo." "Walking" minus "walk" plus "swim" gives "swimming."
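The mechanics of analogy-by-arithmetic can be shown with hand-made 2-dimensional toy vectors — these are NOT real Word2Vec outputs, just a contrived space where one axis loosely encodes royalty and the other gender:

```python
import numpy as np

# Contrived toy vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender".
vocab = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def nearest(target: np.ndarray, exclude: set) -> str:
    """Return the vocab word whose vector is most cosine-similar to target.
    Excluding the input words is standard practice in the Word2Vec analogy test."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in vocab.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(candidates[w], target))

target = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

In this toy space the arithmetic lands exactly on "queen"; with real Word2Vec vectors it lands near it, which is why the nearest-neighbour lookup is part of the recipe.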
This was astonishing in 2013 and remains a useful demonstration of what embeddings capture. The model has, without being told anything about grammar or semantics, discovered that a direction in the vector space encodes gender, another direction encodes capital-country relationship, another encodes verb tense. These relationships were never labelled or explicitly trained; they emerge from the contrastive objective.
Modern sentence and document embeddings are more powerful and harder to visualise, but the same phenomenon persists at higher levels of abstraction. Embeddings capture meaning in ways that reveal themselves through geometry.
Semantic search, the killer application
The most widely deployed embedding application is semantic search. Traditional keyword search finds documents that contain the exact words you typed. Semantic search finds documents that mean what you typed, regardless of whether they use the same words.
The pipeline is straightforward. At ingestion time, embed every document (or document chunk) and store the embeddings in a vector database. At query time, embed the user's query, and find the documents whose embeddings are closest. Return those.
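The whole pipeline fits in a page of Python. In this sketch, `embed` is a stand-in bag-of-words function over a toy vocabulary — a real system would call an embedding model and store vectors in a vector database rather than a Python list:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model: normalised word counts over a
    fixed toy vocabulary. Swap in an API call or local model in practice."""
    vocab = ["password", "reset", "account", "recovery", "invoice", "billing"]
    v = np.array([float(text.lower().count(w)) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Ingestion time: embed every document once and keep the vectors.
docs = [
    "Account recovery and password reset procedure",
    "How billing and invoice disputes are handled",
]
index = [(d, embed(d)) for d in docs]

# Query time: embed the query, rank documents by similarity, return the best.
query_vec = embed("how to reset my password")
ranked = sorted(index, key=lambda pair: float(np.dot(pair[1], query_vec)), reverse=True)
print(ranked[0][0])  # the account-recovery document
```

Even this crude stand-in retrieves the right document; with real embeddings the match works on meaning rather than shared words.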
The result is search that handles synonyms, paraphrasing, and conceptual similarity without any manual effort. A user searching for "how to reset my password" finds articles titled "Account recovery procedure" because the embeddings land near each other in vector space, even though they share almost no keywords.
Semantic search underpins virtually every RAG system. It also drives product recommendations ("items similar to this one"), duplicate detection ("is this question already in our FAQ?"), content moderation ("does this post resemble known harmful posts?"), and personalisation ("content similar to what this user engages with"). Once you see the pattern, you start noticing embeddings everywhere.
Practical choices: dimensions, normalisation, and models
A few engineering details that matter once you start using embeddings for real.
Dimensions. Higher-dimensional embeddings can represent more nuance but cost more to store and search. A 3072-dim embedding takes four times the memory of a 768-dim one and is roughly four times slower to search. For most retrieval tasks, 768 to 1536 is a good sweet spot. Some modern embedding models support "Matryoshka" truncation, where you can use the first n dimensions of a long embedding and still get meaningful similarity — letting you trade retrieval quality for storage and speed.
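For models trained with a Matryoshka objective, truncation is just slicing plus re-normalising. A sketch — this assumes the model actually supports Matryoshka truncation; embeddings from a plain model degrade badly when sliced:

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, n: int) -> np.ndarray:
    """Keep the first n dimensions of a Matryoshka-trained embedding and
    re-normalise to unit length so cosine/dot-product comparisons stay valid."""
    truncated = embedding[:n]
    return truncated / np.linalg.norm(truncated)

full = np.random.default_rng(0).normal(size=3072)  # stand-in for a real embedding
full /= np.linalg.norm(full)

short = truncate_matryoshka(full, 256)
print(short.shape)            # (256,) — 12x smaller to store and search
print(np.linalg.norm(short))  # 1.0 — still unit length
```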
Normalisation. If you are using cosine similarity, normalise your embeddings to unit length. This lets you use dot product as a faster equivalent to cosine and helps many vector databases index more efficiently.
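The dot-product shortcut is easy to verify: once both vectors are unit length, the norms in the cosine formula are 1 and drop out.

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = rng.normal(size=768), rng.normal(size=768)

# Normalise once, at indexing time...
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# ...then a plain dot product IS the cosine similarity: no norms at query time.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot_of_units = float(np.dot(a_unit, b_unit))
print(abs(cosine - dot_of_units) < 1e-12)  # True
```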
Model choice. Do not just pick the highest-ranked model on MTEB. Benchmarks do not always reflect your domain. Test three or four candidates on a representative set of your own query-document pairs and measure retrieval precision at k=10. Differences are often dramatic.
Consistency. Always use the same embedding model for indexing and querying. Switching models invalidates your entire index and will produce nonsense retrieval until you re-embed.
Visualising high-dimensional embedding space
Working with 1024-dimensional vectors is unintuitive. Human minds evolved to visualise three dimensions, not a thousand. The classic tool for peering into embedding space is dimensionality reduction — algorithms that collapse a high-dimensional cloud of points into a two-dimensional or three-dimensional plot while preserving as much of the neighbourhood structure as possible.
t-SNE produces beautiful 2D visualisations where similar items cluster into visible regions. It is the most popular choice for illustrations and exploratory analysis, though it distorts global distances and should not be trusted for quantitative claims.
UMAP is faster than t-SNE and preserves more of the global structure, making it the default in many modern tools.
PCA — principal component analysis — is the simplest and fastest option, useful as a first look. It captures linear structure well but misses non-linear relationships.
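PCA is simple enough to write from scratch with an SVD. A first-look sketch — for real exploratory work you would reach for scikit-learn, UMAP, or t-SNE:

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project an (n_items, n_dims) embedding matrix onto its top two
    principal components. Captures linear structure only."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Stand-in for 100 document embeddings of dimension 768.
points = np.random.default_rng(1).normal(size=(100, 768))
coords = pca_2d(points)
print(coords.shape)  # (100, 2) — ready to scatter-plot
```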
These visualisations are genuinely illuminating the first few times you see them. Running UMAP on your document embeddings and discovering that your corpus naturally falls into clusters you did not design is a near-magical experience. It also quickly reveals problems: if your "technical documentation" and "marketing copy" land in the same cluster, your embeddings are probably not capturing the distinction you care about.
Embedding models have tiers and tradeoffs
Just as with LLMs, embedding models come in a range of sizes and capabilities, and the right choice depends on your specific needs. A rough 2026 landscape.
Frontier embedding models like Cohere embed v3, OpenAI text-embedding-3-large, and Voyage AI's top tier produce exceptional retrieval quality but charge per million tokens and run over API. Best for production quality when cost per embedding is not a primary constraint.
Mid-tier open-source models like BGE-M3, E5-Mistral, and Nomic embed are free to run, match or beat older closed models on benchmarks, and are the practical choice for self-hosted production systems at scale.
Small embedding models like BGE-small, All-MiniLM, and Jina embed tiny variants produce slightly lower retrieval quality but run in milliseconds on CPU and embed thousands of items per second. Great for latency-critical or edge-device applications.
Domain-specific embeddings trained on medical, legal, code, or scientific corpora can dramatically outperform general-purpose models in their specialty. If you operate in a specialised domain, testing a domain-tuned model is worth the effort.
Never pick an embedding model by reputation alone. Test two or three on a representative sample of your query-document pairs and measure retrieval precision. The best model for your domain may not be the one topping the general leaderboard.
Beyond text: images, audio, multimodal
Embeddings work just as well beyond text. A few useful capabilities become possible with multimodal embeddings.
Image search by image. Embed a product photo and find visually similar products. Embed a screenshot and find matching UI components. Embed a face and find similar faces — a powerful and sensitive capability that touches on privacy law.
Image search by text. CLIP-style shared embedding spaces let you search an image library with natural language. "Find me photos of sunsets over mountains" turns into an embedding that matches images whose CLIP embeddings are closest to the query.
Audio similarity. Embed songs to find musically similar songs. Embed voice recordings to find speakers with similar voices. Embed podcast episodes to recommend similar ones. Spotify's and Netflix's recommendation systems rely heavily on embeddings in this space.
Cross-modal retrieval. Search a video library with a spoken description. Find images that match a mood specified in text. Retrieve product docs using a photo of the product. Multimodal embeddings, still improving rapidly in 2026, make all of this possible with a single vector database.
Clustering, classification, and anomaly detection
Beyond search, embeddings power several other classic ML tasks with remarkable simplicity.
Clustering — grouping similar items — becomes trivial with embeddings. Run k-means on the embedding vectors and you get semantic clusters. Customer support teams use this to discover common ticket themes. Product teams use it to group similar user feedback. Content teams use it to find topic clusters on their sites.
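How little machinery k-means needs is worth seeing once. This is a minimal hand-rolled version over stand-in embeddings (two synthetic blobs in place of two semantic clusters); production code would use scikit-learn's KMeans:

```python
import numpy as np

def kmeans(embeddings: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Minimal k-means: assign points to nearest centroid, move centroids
    to the mean of their points, repeat."""
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every embedding to every centroid, then nearest wins.
        dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = embeddings[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated synthetic blobs stand in for two semantic clusters.
rng = np.random.default_rng(2)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 8))
labels, _ = kmeans(np.vstack([blob_a, blob_b]), k=2)
print(labels)  # each blob collapses to a single cluster label
```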
Classification with zero training can be done by computing the embedding of each class label (or a representative example) and assigning a new item to the class whose embedding is closest. This zero-shot approach is often good enough for moderate-quality classification without any labelled data.
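Zero-shot classification is a nearest-label lookup. As before, `embed` here is a stand-in word-count function over a toy vocabulary, not a real model, and the class labels are hypothetical ticket categories:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedder: normalised counts over a toy vocabulary.
    A real system would call an embedding model here."""
    vocab = ["refund", "charge", "login", "password", "crash", "error"]
    v = np.array([float(text.lower().count(w)) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# One embedding per class label; no training, no labelled examples.
labels = ["refund charge billing", "login password access", "crash error bug"]
classes = {label: embed(label) for label in labels}

def classify(text: str) -> str:
    """Assign the class whose label embedding is closest to the input."""
    v = embed(text)
    return max(classes, key=lambda label: float(np.dot(classes[label], v)))

print(classify("I was charged twice, I want a refund"))  # refund charge billing
```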
Anomaly detection becomes "find items whose embeddings are far from everything else." A new support ticket whose embedding has no close neighbours in the existing corpus is probably a novel problem. A new transaction whose embedding does not cluster with past normal transactions is a candidate for fraud review. The same pattern applies across domains.
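"Far from everything else" can be scored as distance to the nearest neighbour in the existing corpus. A sketch using random vectors as stand-in ticket embeddings:

```python
import numpy as np

def anomaly_score(candidate: np.ndarray, corpus: np.ndarray) -> float:
    """Distance from the candidate to its nearest neighbour in the corpus.
    A large score means no close neighbours: likely novel or anomalous."""
    dists = np.linalg.norm(corpus - candidate, axis=1)
    return float(dists.min())

rng = np.random.default_rng(3)
normal_tickets = rng.normal(loc=0.0, scale=0.5, size=(200, 16))  # stand-in embeddings
usual = rng.normal(loc=0.0, scale=0.5, size=16)                  # resembles the corpus
novel = np.full(16, 4.0)                                         # far from everything

print(anomaly_score(usual, normal_tickets) < anomaly_score(novel, normal_tickets))  # True
```

In practice you would set an alerting threshold empirically from the score distribution of known-normal items.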
Common pitfalls and how to dodge them
A few traps that bite almost every team early on.
Embedding space staleness. If you switch embedding models (because a better one shipped, or you are switching vendors), you must re-embed your entire corpus. Mixing embeddings from different models produces garbage similarity scores. Version your embedding pipeline.
Domain gap. General-purpose embedding models underperform in specialised domains (medical, legal, code). A domain-specific fine-tuned embedding model, or a general model post-processed with contrastive fine-tuning on your own data, can dramatically improve retrieval quality. Do not assume the off-the-shelf model is optimal for your use case.
Chunking mismatch. Embedding long documents as single vectors loses detail; embedding very short chunks loses context. Most production systems use chunks in the 200-1000 token range with some overlap.
Cosine similarity myths. Cosine similarity of 0.85 sounds impressive but is not meaningfully different from 0.83 in most embedding models. Absolute similarity scores are almost never interpretable; only relative rankings matter.
Privacy and leakage. Embeddings can sometimes leak information about their source text. If you are embedding sensitive data, understand that the vectors themselves may need to be treated as sensitive too.
What changes next for embeddings
A few trends shaping the embedding landscape in 2026 and beyond.
Multimodal embeddings are becoming unified. Instead of a text embedding model, an image embedding model, and an audio embedding model, a single multimodal model handles all three in the same space. This simplifies infrastructure and enables cross-modal applications out of the box.
Long-context embeddings are emerging. Models that can embed whole documents of tens of thousands of tokens, preserving hierarchical meaning, are making chunking less essential for some use cases.
Learned retrievers are blurring the line between embeddings and retrieval. Models that jointly learn to embed and to rank are producing better search quality than separated embed-then-rank pipelines for complex domains.
On-device embedding models are improving fast. Running embeddings locally on a phone or laptop is now practical for moderate volumes, unlocking private semantic search, email filtering, and personal assistants that never send your data to the cloud.
A first project to cement the concept
If you have never built anything with embeddings, here is a 30-minute project that will lock the concept in. Take the README files from ten of your favourite open-source projects. Embed each one using any embedding model (OpenAI, Cohere, or a Hugging Face model). Store the ten embeddings. Embed a natural-language query like "a library for caching" or "something for time-zone handling." Find which of the ten READMEs has the highest cosine similarity. You have just built a tiny semantic search engine with maybe 50 lines of code.
Expand the idea: embed every page of your company wiki and you have an internal semantic search. Embed every customer support ticket and you have duplicate detection. Embed every product in your catalogue and you have recommendations. The pattern repeats endlessly. Once you see it, embedding-based solutions appear everywhere you look.
Embeddings turn words, images, and audio into number lists where nearby points mean similar things — and that is the whole magic.
The short version
An embedding is a vector of numbers produced by a neural network, trained so that items with similar meaning have similar vectors. Once you can measure similarity as geometric distance, you can do semantic search, clustering, classification, and personalisation with almost trivial code. Every RAG system, every recommendation engine, and most modern AI infrastructure depends on embeddings somewhere. Understand them well and most of the AI stack becomes obvious. Ignore them and you will keep wondering why everyone else's system just works better.