Tech
Feb 3, 2026
Wolt’s Embeddings Framework: Building scalable machine learning infrastructure for embeddings in search and recommendations

TL;DR
At Wolt, embeddings power search, recommendations, and personalization across millions of items and venues. Our first embedding pipeline helped us move fast early on — but it didn’t scale.
In this post, we share how we built a streaming-first Embedding Framework that supports real-time updates, deduplication, versioning, and shared use across teams — and how it now powers semantic search, substitutions, and discovery features at Wolt scale.
Every time someone opens Wolt to find something they need, whether it's a quick bite, a new toy for their pet, or a last-minute birthday gift, embeddings are quietly at work. They power how items are searched, recommended, and personalized across the app.
Embeddings are dense vector representations of text, images, or multi-modal data, and they form the foundation of many modern Machine Learning (ML) systems. At Wolt, we use embeddings to represent core entities in our domain, such as items, venues, or customers. These entities contain different types of structured and unstructured data like menus, pictures, customer interactions, text descriptions, etc. Wolt’s Applied Scientists use embeddings to abstract all of this data into vectors that can be used as input to our ML models.
But using embeddings at Wolt’s scale is not trivial.
With hundreds of thousands of merchant partners, hundreds of millions of items, and tens of millions of customers across dozens of countries, embedding systems quickly run into hard constraints around scale, cost, and reliability. Generating embeddings for millions of entities takes time, especially when third-party APIs impose rate limits. Recomputing embeddings unnecessarily drives up costs. And serving embeddings efficiently requires supporting very different access patterns — from large offline training jobs to low-latency, online inference.
Beyond the technical challenges, organizational complexity adds another layer. Embeddings are often reused across teams and use cases, making sharing, versioning, and change management essential. Without the right abstractions, teams risk duplicating work, locking themselves into specific models, or breaking downstream dependencies when embeddings evolve.
In this blog post, we describe how we addressed these challenges by building Wolt’s Embedding Framework — a streaming-first, event-driven approach to generating, managing, and serving embeddings across the company. We’ll walk through the lessons from our first implementation, the architecture we designed for scale, and how this framework now powers production use cases like semantic search, real-time recommendations, and substitutions.
Generating Embeddings at Scale
Lessons from Our First Embedding Pipeline
Our first embedding pipeline aimed at generating embeddings for the items in our assortment. It was built for simplicity, not scale. Twice a week, it pulled item data for each country, sent it to OpenAI for embedding, and upserted the results into MongoDB. This worked well for early experiments — but cracks quickly started to show.
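For illustration, here is a simplified sketch of what that batch job looked like conceptually (the model name, database collections, and batch size are placeholders, not the actual code):

```python
# Simplified sketch of the original batch pipeline (illustrative names only):
# pull items per country, embed everything, upsert into MongoDB.
from openai import OpenAI
from pymongo import MongoClient, UpdateOne

openai_client = OpenAI()
db = MongoClient()["catalog"]                       # hypothetical database
items, embeddings = db["items"], db["item_embeddings"]

BATCH_SIZE = 100
docs = list(items.find({"country": "FIN"}))
for i in range(0, len(docs), BATCH_SIZE):
    batch = docs[i:i + BATCH_SIZE]
    texts = [f"{d['name']} {d.get('description', '')}" for d in batch]
    # Every item is re-embedded on every run, even if nothing has changed.
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    embeddings.bulk_write([
        UpdateOne({"_id": d["_id"]}, {"$set": {"embedding": e.embedding}}, upsert=True)
        for d, e in zip(batch, resp.data)
    ])  # large upserts put heavy pressure on MongoDB index updates
```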
The pipeline re-embedded every item on every run, even when nothing had changed. This drove up API costs and wasted processing time. New or updated items could take days to appear in production. Identical products, like the same Cola bottle sold by thousands of venues, were embedded again and again.
The system was also tied to one model and configuration, making experimentation or switching providers difficult. Without timestamps or versioning, tracking changes was nearly impossible. And whenever we tried to process large batches, MongoDB struggled to keep up with index updates.
The takeaway was clear: the pipeline worked for prototypes, but not in production. We needed a scalable, modular approach that could handle Wolt’s growth.
Introducing Wolt’s Embedding Framework
That realization led to the Embedding Framework — a new approach to generating and managing embeddings across Wolt. Instead of patching the old system, we went back and asked: what would an embedding system look like if it were designed for scale from day one? The answer was not a centralized “platform,” but a set of abstractions, best practices, and architectural patterns that teams could adopt and adapt. We wanted something flexible enough for diverse use cases, yet robust enough to avoid reinventing the wheel each time.
We grounded the design around three simple ideas:
Platform no, abstractions yes — build reusable building blocks, not rigid infrastructure
Start with real pain points — solve today’s concrete problems
Start specific, then generalize — prove the value in one use case first
This mindset gave birth to the first implementation of Wolt’s embedding framework. It’s our proof that embedding generation can be scalable, cost-efficient, and still give teams the freedom to customize to their needs.
A Streaming-First Architecture for Embeddings

To overcome the limitations of the old batch pipeline, we designed a streaming-first embedding framework built around a simple principle: whether data arrives in large backfills or as real-time updates, it should flow through one unified, reliable process.
At its core, the architecture consists of three layers — data ingestion, embedding computation, and storage — connected by event queues that ensure consistency, scalability, and fault tolerance:
Ingestion handles how data enters the system. It can extract data from static sources like Snowflake or consume live updates from Kafka. Regardless of the source, everything is normalized and published into a central events queue, providing a single interface for all the embedding tasks.
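As a minimal sketch (topic name, event fields, and serialization are assumptions, not our actual schema), ingestion can be thought of as a thin producer that turns any source record into the same kind of event:

```python
# Minimal sketch of the ingestion layer: every source (Snowflake backfill or live
# Kafka update) is normalized into one event shape and published to a single queue.
import json
from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_embedding_task(entity_id: str, entity_type: str, text: str) -> None:
    """Normalize an entity into a generic embedding task and enqueue it."""
    event = {
        "entity_id": entity_id,
        "entity_type": entity_type,  # e.g. "item" or "venue"
        "text": text,                # normalized text to embed
    }
    producer.send("embedding-tasks", value=event)

# The caller can be a batch backfill job or a real-time change consumer.
publish_embedding_task("item-123", "item", "Oat milk 1L, lactose free")
```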
Computation is handled by an embedding consumer that reads events from the queue, batches requests efficiently, and communicates with the embedding model via API calls. It includes robust error handling and retry logic to ensure reliability at scale. To avoid unnecessary recomputation, the framework applies deduplication. Identical or near-identical inputs — such as the same product sold by many venues (think Coca-Cola bottle) — reuse existing embeddings instead of generating new ones. This reduced the total number of embeddings we needed to compute by up to 10x, significantly lowering both cost and processing time.
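The core of the deduplication idea can be sketched in a few lines, assuming a content hash over the normalized input text and a lookup store of already-computed embeddings (both are simplifications of the real consumer):

```python
# Sketch of hash-based deduplication in the computation layer: identical normalized
# inputs share one embedding instead of triggering a new API call each time.
import hashlib

embedding_cache: dict[str, list[float]] = {}  # content hash -> embedding (a KV store in practice)

def content_hash(text: str) -> str:
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

def embed_batch(texts: list[str], embed_fn) -> list[list[float]]:
    """Embed a batch of inputs, reusing embeddings for content seen before."""
    hashes = [content_hash(t) for t in texts]
    missing = [(h, t) for h, t in zip(hashes, texts) if h not in embedding_cache]
    if missing:
        # One batched model call (wrapped with retries in the real consumer) for new inputs only.
        new_vectors = embed_fn([t for _, t in missing])
        for (h, _), vec in zip(missing, new_vectors):
            embedding_cache[h] = vec
    return [embedding_cache[h] for h in hashes]
```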
Storage acts as the system’s source of truth. The embeddings are sent to a dedicated Kafka topic. Each record includes metadata such as the embedding model version, creation timestamp, and the embedded fields. For low-latency use cases, embeddings can be read from the stream directly and propagated to specialized stores like vector databases or ML feature stores.
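A record on that topic might look roughly like the following (field names are illustrative, not our exact schema):

```python
# Sketch of the record shape written to the output topic, carrying the metadata
# needed for versioning and change tracking downstream.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EmbeddingRecord:
    entity_id: str
    embedding: list[float]
    model_version: str          # e.g. "text-embedding-3-small"
    embedded_fields: list[str]  # which source fields went into the embedding
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
```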
Finally, the framework is fully observable: we track metrics such as API usage, embedding costs, update latencies, and error rates in our monitoring platform, giving teams transparency and control as the system scales.
This architecture directly addresses the pain points of the old pipeline: eliminating unnecessary recomputation, enabling real-time updates, adding metadata and versioning, and improving scalability through batching and asynchronous queues. The result is a unified, production-ready framework capable of generating and managing embeddings efficiently and reliably — truly Wolt-grade. 🩵
Similarity Search Using Embeddings
One of the most direct applications of these embeddings in our system is finding similar items.
By representing items as dense vectors in a shared embedding space, we can measure semantic similarity using metrics such as cosine similarity or dot product. This allows us to identify items that are meaningfully related — not just textually similar.
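Cosine similarity, for example, compares the direction of two vectors and is trivial to compute once the embeddings exist (the vectors below are random stand-ins):

```python
# Cosine similarity between two embeddings: close to 1.0 for semantically similar
# items, near 0 for unrelated ones. With unit-normalized vectors it equals the dot product.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

oat_milk, soy_milk = np.random.rand(1536), np.random.rand(1536)  # stand-in vectors
print(cosine_similarity(oat_milk, soy_milk))
```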
Behind the scenes, similarity search boils down to a nearest-neighbor search problem in a high-dimensional vector space. With more than 100 million items in Wolt’s catalog, a naive linear scan over all embeddings is computationally infeasible for real-time use. To solve this, we store embeddings in DoorDash’s in-house search engine (built on top of Apache Lucene). The engine supports Approximate Nearest Neighbor (ANN) search, enabling us to retrieve the closest vectors without scanning the entire dataset. The system leverages the Hierarchical Navigable Small World (HNSW) graph algorithm.
HNSW constructs a multi-layer graph over the embeddings, which allows queries to traverse the space efficiently and converge on the nearest neighbors. This approach balances:
High recall — retrieving most of the true nearest neighbors.
Low latency — queries scale sublinearly with the number of items, making real-time recommendations feasible.
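Our production index lives inside DoorDash’s Lucene-based engine, but the same HNSW mechanics can be illustrated with the open-source hnswlib library (toy scale and parameters, purely for illustration):

```python
# Build an HNSW index over item embeddings and run an approximate nearest-neighbor query.
import numpy as np
import hnswlib

dim, num_items = 1536, 100_000                      # toy scale, not the real catalog
item_vectors = np.random.rand(num_items, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)  # graph construction parameters
index.add_items(item_vectors, np.arange(num_items))
index.set_ef(50)                                    # query-time recall vs. latency trade-off

query = item_vectors[0]
labels, distances = index.knn_query(query, k=10)    # approximate nearest neighbors
```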
By connecting the real-time embedding updates from the Embedding Framework directly into our search ingestion pipeline, new or updated items are immediately represented in the vector index.

This integration gave us a fast feedback loop: instead of building bespoke infrastructure for each use-case, we can now iterate on vector search–powered product features quickly without having to worry about data availability and freshness.

Use cases
Recommending Alternative Items / Substitutions
The first embeddings powered by the Embedding Framework were item embeddings over Wolt’s entire item catalog. We used Large Language Models (LLMs) to embed textual item features such as their name, product description, ingredients and conditions of use. Combining these embeddings with the similarity search capability described above, we delivered the “You might also like” carousel, shown when a user clicks on an item card. This carousel surfaces highly similar items from the store’s inventory, such as the same chocolate bar in a different flavor or a different size package of the same product. It reduces the friction of navigating large menus and helps customers quickly find what they are looking for.

Another closely related use-case is substitutions. When an item is out of stock, we can recommend replacements in real-time. While the underlying mechanism is the same — finding the nearest neighbors in the embedding space — the user experience differs: instead of exploring, the goal is to offer the best possible replacement for a missing item.
Although both features rely on the same embedding-based similarity search, we can tune the output for each context. For example, by:
Adjusting similarity thresholds to control how close alternatives need to be.
Applying metadata filters (e.g. same product family, brand, or category) to ensure recommendations feel natural and relevant.
This flexibility allows us to use a single embedding pipeline to power multiple product experiences, each with slightly different requirements on precision and diversity.
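As a rough sketch of what this tuning looks like in practice (thresholds, field names, and data are illustrative), the same neighbor list can simply be post-filtered differently per surface:

```python
# Post-filter the same nearest-neighbor results differently for exploration vs. substitutions.
anchor = {"id": "cola-05l", "category": "soft-drinks"}
neighbors = [  # (item, cosine similarity) pairs returned by the vector index
    ({"id": "cola-15l", "category": "soft-drinks"}, 0.93),
    ({"id": "cola-zero-05l", "category": "soft-drinks"}, 0.90),
    ({"id": "lemon-soda-05l", "category": "soft-drinks"}, 0.78),
    ({"id": "salted-chips", "category": "snacks"}, 0.55),
]

def pick_alternatives(neighbors, anchor, *, min_similarity, same_category_only):
    results = []
    for item, score in neighbors:
        if score < min_similarity:
            continue
        if same_category_only and item["category"] != anchor["category"]:
            continue
        results.append(item)
    return results

# "You might also like": looser threshold, allow related categories.
carousel = pick_alternatives(neighbors, anchor, min_similarity=0.70, same_category_only=False)
# Substitutions: only very close matches from the same category.
substitutes = pick_alternatives(neighbors, anchor, min_similarity=0.85, same_category_only=True)
```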
“Because You Viewed” Carousel: Real-Time Recommendations
Another use case for embedding-based similarity search is powering real-time recommendations on Wolt’s Discovery page. As users browse retail items in the app, we collect interaction signals: item clicks, searches, impressions, and more. These events are ingested in real-time, and for each user we maintain a short history of their most recent clicks on items they explored but did not purchase.
When building the “Because you viewed” carousel, we choose an anchor item from this history to drive the recommendations. The anchor selection takes into account:
Recency — more recently clicked items are prioritized.
Frequency within product families — repeated clicks on similar categories suggest higher intent.
Once the anchor is selected, we use its embedding to retrieve semantically similar items via vector search. These items are displayed in the carousel, nudging the user back toward products they showed interest in but haven’t yet purchased.
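A minimal sketch of this anchor selection, assuming a simple recency decay multiplied by a product-family frequency count (the weights and event shape are assumptions, not our production logic):

```python
# Pick an anchor item from recent, unpurchased clicks: newer clicks and repeated
# interest in the same product family score higher.
import time

def pick_anchor(click_history: list[dict], half_life_s: float = 3600.0) -> dict:
    now = time.time()
    family_counts: dict[str, int] = {}
    for event in click_history:
        family_counts[event["product_family"]] = family_counts.get(event["product_family"], 0) + 1

    def score(event: dict) -> float:
        recency = 0.5 ** ((now - event["clicked_at"]) / half_life_s)  # exponential recency decay
        frequency = family_counts[event["product_family"]]            # repeated families signal intent
        return recency * frequency

    return max(click_history, key=score)

history = [
    {"item_id": "dog-toy-1", "product_family": "pet-toys", "clicked_at": time.time() - 300},
    {"item_id": "dog-toy-2", "product_family": "pet-toys", "clicked_at": time.time() - 1200},
    {"item_id": "shampoo-1", "product_family": "hair-care", "clicked_at": time.time() - 600},
]
anchor = pick_anchor(history)  # its embedding then seeds the vector search
```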

Semantic Search with Embeddings at Wolt
Historically, Wolt’s search relied on lexical matching, using inverted indexes to match user queries with item and venue titles, descriptions, and tags. This method works well for direct searches like "Milk" or "Burger King."
However, lexical search struggles when users search for items using more conceptual or descriptive language — for instance, queries like "father's day gifts" or "medicine for headache". In these cases, exact keyword matches are often insufficient.
This is where item embeddings become valuable.
By representing items as dense vectors in a high-dimensional space, item embeddings capture the semantic meaning and underlying characteristics of each product. This allows our search engine to understand not just keywords, but also the intent behind a user's query and the relationships between different items.

For the system to work effectively, both sides of the search interaction — the query and the items — must be represented in the same embedding space. When a user searches with a query, we first convert it into a vector representation (a query embedding). This embedding is then used to find semantically similar item embeddings with our semantic search engine (powered by ANN search), enabling us to retrieve products that match the meaning of the query, not just the exact words.
To ensure quality and precision, we also apply a similarity threshold on the retrieved results. The semantic search engine computes a similarity score (e.g., cosine similarity) for each candidate item with respect to the query embedding, and only items scoring above a defined threshold are considered relevant. This prevents semantically distant or noisy matches from surfacing, maintaining a balance between recall (finding enough results) and precision (keeping them truly relevant).
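Put together, the query path is a three-step sketch: embed the query, retrieve approximate neighbors, and drop weak matches below the threshold (the model, index, and threshold value here are illustrative):

```python
# Embed a free-text query, look up approximate neighbors, and keep only confident matches.
import numpy as np
from openai import OpenAI

client = OpenAI()
SIM_THRESHOLD = 0.75  # illustrative cut-off between relevant and noisy matches

def semantic_search(query: str, index, k: int = 50) -> list[tuple[int, float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=query)
    q = np.asarray(resp.data[0].embedding, dtype=np.float32)
    labels, distances = index.knn_query(q, k=k)  # e.g. the HNSW index sketched earlier
    # hnswlib returns cosine distance; convert to similarity before applying the threshold.
    return [(int(i), 1.0 - float(d))
            for i, d in zip(labels[0], distances[0])
            if 1.0 - d >= SIM_THRESHOLD]
```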
In practice, the best results come from combining semantic and lexical search. Lexical search excels at exact matches, while semantic search captures intent and meaning. Together, they provide a more complete and accurate search experience.
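One common way to blend the two result lists (not necessarily how we do it in production) is reciprocal rank fusion, which rewards items ranked highly by either retriever:

```python
# Reciprocal rank fusion: merge lexical and semantic rankings into a single ordering.
def reciprocal_rank_fusion(lexical: list[str], semantic: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (lexical, semantic):
        for rank, item_id in enumerate(ranking):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = reciprocal_rank_fusion(["milk-1l", "oat-milk-1l"], ["oat-milk-1l", "lactose-free-milk"])
```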


What’s Next
Building the Embedding Framework has been more than an infrastructure upgrade — it’s been a shift in how we think about embeddings at Wolt. By turning them into a shared, scalable capability, we’ve made it possible for teams across the company to experiment, build, and ship ML-powered features faster and more reliably.
What started as a simple batch pipeline has evolved into a streaming-first, event-driven system that scales across millions of entities and serves several production use cases — from personalised recommendations to semantic search. The result: fresher data, smarter models, and a shorter path from idea to impact.
And this is just the beginning. With new embedding models emerging and multimodal data becoming the norm, this framework too will evolve. As we’re currently building a new foundation for our machine learning initiatives across Wolt and DoorDash, we will soon extend that foundation to embeddings as well, ensuring the framework continues to power the next generation of intelligent, personalised experiences across Wolt.
👋 Want to help shape machine learning at Wolt? Check out our open engineering roles.
Written by Steffen Klempau, Attila Nagy, Sowmya Yellapragada, Sergio Gonzalez Sanz