Why does this course exist?
Most engineering programs teach databases or machine learning. Almost none teach search as a systems discipline. Students graduate knowing SQL and transformers but have no mental model of:
- **How a search query actually executes.** Not the API call — the data structures, the scoring, the disk reads underneath.
- **Why Elasticsearch exists separately from PostgreSQL.** What retrieval needs that a general-purpose database cannot efficiently provide.
- **What "relevant" means mechanically.** Not philosophically — as a number, computed from corpus signals.
- **How vector search and keyword search relate.** Why one doesn't replace the other, and when to use both.
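What "executes underneath" can be made concrete with a toy inverted index — a hypothetical sketch with made-up documents, not any real engine's implementation. A real engine adds compression, skip lists, on-disk segments, and a proper scoring function; this only shows the shape of the lookup:

```python
from collections import defaultdict

# Toy corpus (made-up documents for illustration).
docs = {
    0: "red running shoes",
    1: "blue suede shoes",
    2: "red dress",
}

# Inverted index: term -> list of (doc_id, term_frequency).
index = defaultdict(list)
for doc_id, text in docs.items():
    counts = defaultdict(int)
    for term in text.split():
        counts[term] += 1
    for term, tf in counts.items():
        index[term].append((doc_id, tf))

def search(query):
    """Score every document matching at least one query term."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf  # naive scoring; real engines use BM25 etc.
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(search("red shoes"))  # doc 0 matches both terms, so it ranks first
```

The point of the sketch: the query never touches the documents at retrieval time. It walks a precomputed structure and ranks whatever it finds there.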
Search is the most deployed non-trivial backend system in production. Every e-commerce site has one. Every SaaS product has one. Every RAG pipeline starts with retrieval. Most engineers know how to call the API. Few understand why it returns what it returns.
Why now
RAG (Retrieval Augmented Generation) put search infrastructure back in the spotlight. Every LLM application needs a retrieval layer. If you're building one, you need to understand what happens below the API.
Why build from scratch
You don't understand a system until you've made its mistakes yourself. Using Elasticsearch as a black box teaches you nothing about why recall drops after an index merge, or why BM25 outperforms a neural model on short queries.
Who is this for?
Module map
One codebase, IndexZero, extended incrementally across 10 modules. Each module produces a working system, not just a subsystem.
M0: The Problem
No code. No setup. Just observation, curiosity, and a hypothesis document that the rest of the course will systematically prove or disprove.
Before you build anything, you have to feel the problem.
Search looks simple from the outside. You type words. Results appear. The illusion breaks the moment you ask: why this result, and not that one? This module is about breaking that illusion — before you have the vocabulary to explain it.
What students will learn
- **Search results are not retrieved — they are ranked.** There is no list of "correct" answers. There is a scoring function applied to a corpus. This is the first mental shift.
- **Relevance is not binary.** A result is not relevant or irrelevant — it is more or less relevant for a specific query, for a specific user, in a specific context.
- **The same query returns different results on different sites.** Because they use different signals, different corpora, and different definitions of "good." There is no universal ranking.
- **Position matters enormously — and is itself a signal.** Click-through rate on result #1 vs result #5 is not linear. This feedback loop shapes the ranking over time.
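"A scoring function applied to a corpus" and "different definitions of good" can both be shown in a few lines. The products and signal values below are invented for illustration; the point is that reweighting the same signals over the same corpus reorders the results:

```python
# Hypothetical products with made-up signal values in [0, 1].
products = [
    {"name": "Budget runner", "text_match": 0.9, "popularity": 0.3, "recency": 0.8},
    {"name": "Bestseller",    "text_match": 0.6, "popularity": 0.9, "recency": 0.4},
    {"name": "New arrival",   "text_match": 0.7, "popularity": 0.2, "recency": 1.0},
]

def rank(products, weights):
    """Rank by a weighted sum of signals — one 'definition of good'."""
    score = lambda p: sum(w * p[s] for s, w in weights.items())
    return [p["name"] for p in sorted(products, key=score, reverse=True)]

# Same corpus, same signals, two different definitions of "good":
print(rank(products, {"text_match": 1.0, "popularity": 0.1, "recency": 0.1}))
print(rank(products, {"text_match": 0.2, "popularity": 1.0, "recency": 0.1}))
```

A text-heavy weighting puts the best lexical match first; a popularity-heavy weighting promotes the bestseller. Neither ranking is "the correct one" — each is just a different scoring function.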
What they will get wrong (and that's the point)
- **"The site shows the most popular product first."** Popularity is one of many signals — weighted against recency, margin, inventory, query match, and personalisation.
- **"Better search means more results."** Precision and recall trade off. Showing more results lowers average relevance. Good search is ruthlessly selective.
- **"AI / semantic search is just better."** Keyword search outperforms vector search on exact queries and fresh content. Neither dominates universally.
- **"The search box just queries a database."** A separate index, built offline and structured for retrieval, is what gets queried. It's not the same store as the product database.
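The precision/recall trade-off is simple arithmetic. A toy example with invented document IDs and an assumed-known relevance set:

```python
def precision_recall(returned, relevant):
    """Precision: fraction of returned results that are relevant.
    Recall: fraction of relevant documents that were returned."""
    hits = len(set(returned) & set(relevant))
    return hits / len(returned), hits / len(relevant)

relevant = {"d1", "d2", "d3"}                      # ground truth (assumed known)
short_list = ["d1", "d2", "d4"]                    # selective: 3 results
long_list = ["d1", "d2", "d4", "d5", "d6", "d3"]   # exhaustive: 6 results

print(precision_recall(short_list, relevant))  # ~0.67 precision, ~0.67 recall
print(precision_recall(long_list, relevant))   # 0.5 precision, 1.0 recall
```

Doubling the result count here captures every relevant document (recall hits 1.0) while average relevance drops (precision falls from ~0.67 to 0.5). "More results" and "better results" pull in opposite directions.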
The M0 Exercise: Ranking Audit
Reverse-engineer a real search result page
Pick any Indian e-commerce site you actually use — Flipkart, Meesho, Nykaa, Zepto, whatever. You will observe, hypothesize, and document.
1. Run 3 different searches on the same site. Choose one broad query ("shoes"), one specific query ("Nike Air Max size 10"), one ambiguous query ("blue").
2. Screenshot the top 10 results for each search.
3. For each of the top 3 results per query: write one hypothesis for why it ranked there. Be specific — not "it's popular" but "it ranked here because X."
4. Find one result that surprises you — either too high or too low. Hypothesize why the system got it wrong.
5. Write a one-paragraph answer to: "What signals do you think this search engine is using?" You will revisit this answer at M9.
The narrative thread
The Ranking Hypothesis Doc is not graded for correctness. It is a baseline artifact. At M9, after building a full search system, students revisit it. The delta between their M0 hypothesis and their M9 understanding is the most honest measure of what they learned.
Most students will find their M0 hypotheses were partially right and fundamentally incomplete. That gap is the course.
What data will students work with?
One dataset, used progressively deeper across all modules. Students build familiarity with the corpus the same way they build familiarity with the system.
Primary dataset: Amazon ESCI — real product queries with human relevance labels (Exact, Substitute, Complement, Irrelevant). Familiar domain, real queries, proper ground truth for eval.
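Graded labels like ESCI's are what make ranking evaluation possible: metrics such as NDCG reward putting higher-gain results earlier. The sketch below maps E/S/C/I labels to numeric gains — the specific gain values are one common choice, a design decision rather than part of the dataset:

```python
import math

# ESCI labels mapped to numeric gains (one common choice, not canonical).
GAIN = {"E": 1.0, "S": 0.1, "C": 0.01, "I": 0.0}

def dcg(labels):
    """Discounted cumulative gain for a ranked list of ESCI labels."""
    return sum(GAIN[l] / math.log2(i + 2) for i, l in enumerate(labels))

def ndcg(labels):
    """DCG normalised by the best possible ordering of the same labels."""
    ideal = sorted(labels, key=lambda l: -GAIN[l])
    return dcg(labels) / dcg(ideal)

# A ranking that puts a Substitute above an Exact match is penalised:
print(ndcg(["E", "S", "I"]))  # 1.0 — already in ideal order
print(ndcg(["S", "E", "I"]))  # below 1.0 — Exact match demoted
```

This is why "proper ground truth" matters: without graded labels, you cannot say whether swapping results #1 and #2 made the ranking better or worse.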
Interested?
This course is in active development. If you'd like early access or want to use it in a classroom setting, get in touch.
Reach out on X or by email.