Reranking reorders retrieved documents or chunks by their relevance to a query. It acts as a second-stage filter after initial retrieval: the most useful results move to the top and the less relevant ones are dropped.

📊 When Does Reranking Have the Most Impact?
Scenario | Effect of Reranking |
---|---|
Large chunks (>1000 tokens) | ✅ Demotes large text blocks that match the query only in a small or incidental section. |
Small chunks (<400 tokens) | ✅ Helps pick the most relevant pieces out of many short, similar-looking candidates. |
Many retrieved documents (k > 10) | ✅ Prevents irrelevant results from polluting the output. |
Semantically complex queries | ✅ Identifies meaningful matches beyond simple keyword overlaps. |

⚙️ How is Reranking Applied?
- Initial retrieval (BM25, vector search, or hybrid search) → Fetches the top-k candidate chunks.
- Reranking with a dedicated model (e.g., Cohere Rerank, or a cross-encoder built on BERT/RoBERTa) → Scores each query-chunk pair jointly and reorders the candidates by that deeper relevance score.
- Final selection → Only the best-ranked chunks are passed to the LLM for processing.
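
The three steps above can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the function names (`retrieve`, `rerank`, `keyword_overlap`, `toy_relevance`) and the sample corpus are hypothetical, and `toy_relevance` is only a stand-in for a real reranking model such as a cross-encoder or a hosted rerank API. The point is the shape of the flow: a cheap first pass fetches k candidates, then a more careful scorer reorders them and keeps the top few.

```python
from typing import Callable, List, Tuple

# A scorer takes (query, document) and returns a relevance score.
Scorer = Callable[[str, str], float]
PUNCT = ".,!?;:()"


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (raw.strip(PUNCT).lower() for raw in text.split())
    return [t for t in tokens if t]


def keyword_overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: how many distinct query terms appear in doc."""
    return float(len(set(tokenize(query)) & set(tokenize(doc))))


def toy_relevance(query: str, doc: str) -> float:
    """ASSUMPTION: stand-in for a real reranking model.
    Scores the fraction of doc tokens that are query terms, so long
    chunks with only incidental matches score low."""
    query_terms = set(tokenize(query))
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return 0.0
    return sum(1 for t in doc_tokens if t in query_terms) / len(doc_tokens)


def retrieve(query: str, corpus: List[str], k: int,
             scorer: Scorer = keyword_overlap) -> List[str]:
    """Step 1: fetch the top-k candidates with a cheap scorer."""
    ranked = sorted(corpus, key=lambda doc: scorer(query, doc), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: List[str], scorer: Scorer,
           top_n: int) -> List[Tuple[str, float]]:
    """Steps 2 and 3: rescore each candidate, reorder, keep the best top_n."""
    scored = [(doc, scorer(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical sample data for illustration only.
corpus = [
    "How does one improve a garden? Retrieval of tools from the shed does "
    "help, and this long note covers soil, water, seeds and sunlight in detail.",
    "Reranking can improve retrieval precision.",
    "Bananas are rich in potassium.",
    "BM25 is a keyword ranking function.",
]
query = "how does reranking improve retrieval"

candidates = retrieve(query, corpus, k=3)
top = rerank(query, candidates, scorer=toy_relevance, top_n=2)

for doc, score in top:
    print(f"{score:.3f}  {doc[:60]}")
```

With this toy data, the first stage ranks the long gardening note first because it happens to contain more query terms, while the second stage promotes the short, focused chunk. In a real system the `scorer` passed to `rerank` would call an actual reranking model; the rest of the flow can stay the same.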