Why does this course exist?
Most engineering programs teach databases or machine learning. Almost none teach search as a systems discipline. Students graduate knowing SQL and transformers but have no mental model of:
- **How a search query actually executes.** Not the API call — the data structures, the scoring, the disk reads underneath.
- **Why Elasticsearch exists separately from PostgreSQL.** What retrieval needs that a general-purpose database cannot efficiently provide.
- **What "relevant" means mechanically.** Not philosophically — as a number, computed from corpus signals.
- **How vector search and keyword search relate.** Why one doesn't replace the other, and when to use both.
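What "executes underneath" can be made concrete with a toy inverted index — a hypothetical sketch with made-up documents, not any real engine's implementation. A real engine adds compression, skip lists, on-disk segments, and a proper scoring function; this only shows the shape of the lookup:

```python
from collections import defaultdict

# Toy corpus (made-up documents for illustration).
docs = {
    0: "red running shoes",
    1: "blue suede shoes",
    2: "red dress",
}

# Inverted index: term -> list of (doc_id, term_frequency).
index = defaultdict(list)
for doc_id, text in docs.items():
    counts = defaultdict(int)
    for term in text.split():
        counts[term] += 1
    for term, tf in counts.items():
        index[term].append((doc_id, tf))

def search(query):
    """Score every document matching at least one query term."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf  # naive scoring; real engines use BM25 etc.
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(search("red shoes"))  # doc 0 matches both terms, so it ranks first
```

The point of the sketch: the query never touches the documents at retrieval time. It walks a precomputed structure and ranks whatever it finds there.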
Search is the most deployed non-trivial backend system in production. Every e-commerce site has one. Every SaaS product has one. Every RAG pipeline starts with retrieval. Most engineers know how to call the API. Few understand why it returns what it returns.
Why now
RAG (Retrieval Augmented Generation) put search infrastructure back in the spotlight. Every LLM application needs a retrieval layer. If you're building one, you need to understand what happens below the API.
Why build from scratch
You don't understand a system until you've made its mistakes yourself. Using Elasticsearch as a black box teaches you nothing about why recall drops after an index merge, or why BM25 outperforms a neural model on short queries.
Who is this for?
Module map
One codebase, IndexZero, extended incrementally across 10 modules. Each module produces a working system, not just a subsystem.
M0: The Problem
No code. No setup. Just observation, curiosity, and a hypothesis document that the rest of the course will systematically prove or disprove.
Before you build anything, you have to feel the problem.
Search looks simple from the outside. You type words. Results appear. The illusion breaks the moment you ask: why this result, and not that one? This module is about breaking that illusion — before you have the vocabulary to explain it.
What students will learn
- **Search results are not retrieved — they are ranked.** There is no list of "correct" answers. There is a scoring function applied to a corpus. This is the first mental shift.
- **Relevance is not binary.** A result is not relevant or irrelevant — it is more or less relevant for a specific query, for a specific user, in a specific context.
- **The same query returns different results on different sites.** Because they use different signals, different corpora, and different definitions of "good." There is no universal ranking.
- **Position matters enormously — and is itself a signal.** Click-through rate on result #1 vs result #5 is not linear. This feedback loop shapes the ranking over time.
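"A scoring function applied to a corpus" and "different definitions of good" can both be shown in a few lines. The products and signal values below are invented for illustration; the point is that reweighting the same signals over the same corpus reorders the results:

```python
# Hypothetical products with made-up signal values in [0, 1].
products = [
    {"name": "Budget runner", "text_match": 0.9, "popularity": 0.3, "recency": 0.8},
    {"name": "Bestseller",    "text_match": 0.6, "popularity": 0.9, "recency": 0.4},
    {"name": "New arrival",   "text_match": 0.7, "popularity": 0.2, "recency": 1.0},
]

def rank(products, weights):
    """Rank by a weighted sum of signals — one 'definition of good'."""
    score = lambda p: sum(w * p[s] for s, w in weights.items())
    return [p["name"] for p in sorted(products, key=score, reverse=True)]

# Same corpus, same signals, two different definitions of "good":
print(rank(products, {"text_match": 1.0, "popularity": 0.1, "recency": 0.1}))
print(rank(products, {"text_match": 0.2, "popularity": 1.0, "recency": 0.1}))
```

A text-heavy weighting puts the best lexical match first; a popularity-heavy weighting promotes the bestseller. Neither ranking is "the correct one" — each is just a different scoring function.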
What they will get wrong (and that's the point)
- **"The site shows the most popular product first."** Popularity is one of many signals — weighted against recency, margin, inventory, query match, and personalisation.
- **"Better search means more results."** Precision and recall trade off. Showing more results lowers average relevance. Good search is ruthlessly selective.
- **"AI / semantic search is just better."** Keyword search outperforms vector search on exact queries and fresh content. Neither dominates universally.
- **"The search box just queries a database."** A separate index, built offline and structured for retrieval, is what gets queried. It's not the same store as the product database.
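The precision/recall trade-off is simple arithmetic. A toy example with invented document IDs and an assumed-known relevance set:

```python
def precision_recall(returned, relevant):
    """Precision: fraction of returned results that are relevant.
    Recall: fraction of relevant documents that were returned."""
    hits = len(set(returned) & set(relevant))
    return hits / len(returned), hits / len(relevant)

relevant = {"d1", "d2", "d3"}                      # ground truth (assumed known)
short_list = ["d1", "d2", "d4"]                    # selective: 3 results
long_list = ["d1", "d2", "d4", "d5", "d6", "d3"]   # exhaustive: 6 results

print(precision_recall(short_list, relevant))  # ~0.67 precision, ~0.67 recall
print(precision_recall(long_list, relevant))   # 0.5 precision, 1.0 recall
```

Doubling the result count here captures every relevant document (recall hits 1.0) while average relevance drops (precision falls from ~0.67 to 0.5). "More results" and "better results" pull in opposite directions.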
The M0 Exercise: Ranking Audit
Reverse-engineer a real search result page
Pick any Indian e-commerce site you actually use — Flipkart, Meesho, Nykaa, Zepto, whatever. You will observe, hypothesize, and document.
1. Run 3 different searches on the same site. Choose one broad query ("shoes"), one specific query ("Nike Air Max size 10"), one ambiguous query ("blue").
2. Screenshot the top 10 results for each search.
3. For each of the top 3 results per query: write one hypothesis for why it ranked there. Be specific — not "it's popular" but "it ranked here because X."
4. Find one result that surprises you — either too high or too low. Hypothesize why the system got it wrong.
5. Write a one-paragraph answer to: "What signals do you think this search engine is using?" You will revisit this answer at M9.
The narrative thread
The Ranking Hypothesis Doc is not graded for correctness. It is a baseline artifact. At M9, after building a full search system, students revisit it. The delta between their M0 hypothesis and their M9 understanding is the most honest measure of what they learned.
Most students will find their M0 hypotheses were partially right and fundamentally incomplete. That gap is the course.
What data will students work with?
One dataset, used progressively deeper across all modules. Students build familiarity with the corpus the same way they build familiarity with the system.
Primary dataset: Amazon ESCI — real product queries with human relevance labels (Exact, Substitute, Complement, Irrelevant). Familiar domain, real queries, proper ground truth for eval.
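Graded labels like ESCI's are what make ranking evaluation possible: metrics such as NDCG reward putting higher-gain results earlier. The sketch below maps E/S/C/I labels to numeric gains — the specific gain values are one common choice, a design decision rather than part of the dataset:

```python
import math

# ESCI labels mapped to numeric gains (one common choice, not canonical).
GAIN = {"E": 1.0, "S": 0.1, "C": 0.01, "I": 0.0}

def dcg(labels):
    """Discounted cumulative gain for a ranked list of ESCI labels."""
    return sum(GAIN[l] / math.log2(i + 2) for i, l in enumerate(labels))

def ndcg(labels):
    """DCG normalised by the best possible ordering of the same labels."""
    ideal = sorted(labels, key=lambda l: -GAIN[l])
    return dcg(labels) / dcg(ideal)

# A ranking that puts a Substitute above an Exact match is penalised:
print(ndcg(["E", "S", "I"]))  # 1.0 — already in ideal order
print(ndcg(["S", "E", "I"]))  # below 1.0 — Exact match demoted
```

This is why "proper ground truth" matters: without graded labels, you cannot say whether swapping results #1 and #2 made the ranking better or worse.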
Interested?
This course is in active development. If you'd like early access or want to use it in a classroom setting, get in touch.
Reach out on X or by email.