Why Keyword Search Breaks on Real Resumes
Keyword search looks good in demos because demos are clean. Real recruiter queries are not. A recruiter rarely types only one skill or one exact title. They usually ask for a combination of role, stack, seniority, employer context, notice period, location, and exclusions in one sentence, and they expect the system to understand what is negotiable and what is not.
Real resumes are equally messy. One candidate may describe the same work as backend engineering, platform engineering, distributed systems, Java microservices, enterprise APIs, or cloud modernization. Another may have the right experience buried inside project descriptions rather than in the headline title. Exact string matching misses these candidates even when the fit is obvious to a human recruiter.
That gap is what pushed us away from treating candidate search as a plain text lookup problem. The actual problem is translation. The recruiter expresses hiring intent in natural language, while the resume expresses career evidence in uneven language. Good search has to bridge the two without flattening everything into fuzzy matching.
Turn the Query Into a Search Plan
Inside the backend, our semantic search flow is deliberately staged. The orchestration layer runs query parsing, filter validation, concept normalization, embedding generation, retrieval, post-filtering, scoring or reranking, and final response assembly as separate steps. That separation matters because recruiter queries mix interpretation problems with business-rule problems, and those should not be handled by the same mechanism.
The first stage uses an LLM to parse the recruiter request into structured JSON rather than only producing a single embedding. We extract a normalized semantic query, a search plan, concepts, and filters. But we do not trust that first response blindly. If parsing fails, the system falls back to a vector-only search using the raw recruiter query so search can still proceed instead of collapsing into an error state.
We also apply extra guardrails around the parsed output. If the LLM returns a search plan, we convert it into authoritative concepts and a normalized query. If the recruiter says things like "not from X," "exclude Y," or "without Z," our negation fallback logic injects explicit must-not concepts even if the model under-parsed the exclusion. That detail sounds small, but in recruiting it is one of the differences between a clever demo and a trustworthy tool.
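The parse-then-guard flow above can be sketched roughly as follows. This is a minimal illustration, not Ovii's implementation: `llm_parse` stands in for whatever model call produces the structured plan, and the exclusion patterns are simplified examples of the negation fallback.

```python
import re

def parse_recruiter_query(raw_query, llm_parse):
    """Parse a recruiter query into a structured plan, degrading gracefully."""
    try:
        plan = llm_parse(raw_query)  # expected: dict with a normalized semantic query
        if not isinstance(plan, dict) or "semantic_query" not in plan:
            raise ValueError("under-parsed response")
    except Exception:
        # Fallback: vector-only search on the raw text, no structured plan,
        # so search still proceeds instead of collapsing into an error state.
        return {"semantic_query": raw_query, "must": [], "must_not": [], "fallback": True}

    plan.setdefault("must_not", [])
    # Negation guardrail: if the recruiter text contains exclusion phrasing
    # the model missed, inject explicit must-not concepts.
    for pattern in (r"not from ([\w .&-]+)", r"exclude ([\w .&-]+)", r"without ([\w .&-]+)"):
        for match in re.findall(pattern, raw_query, flags=re.IGNORECASE):
            term = match.strip().rstrip(",.")
            if term.lower() not in (c.lower() for c in plan["must_not"]):
                plan["must_not"].append(term)
    plan["fallback"] = False
    return plan
```

The important property is that both failure modes are explicit: a bad parse yields a usable vector-only query, and a good parse still gets its exclusions double-checked.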
Ovii turns recruiter text into a structured search plan, keeps exclusions explicit, and still falls back gracefully when parsing is uncertain.
Semantics for Meaning, Rules for Truth
One of the biggest lessons from building this is that semantic search should not try to replace every structured rule. Some parts of a recruiter query are naturally fuzzy, like adjacent role language, transferable domain exposure, or synonymous technical phrasing. Other parts are hard constraints, like notice period, location, salary band, final-assessment status, or exact employer and institution anchors in the right context.
Our search flow validates filters separately and even applies hard-filter fallbacks based on the original recruiter phrasing so normalization does not erase operational meaning. For example, join-window phrasing like "can join in 20 days" or "within a few weeks" needs to survive interpretation as a real constraint, not become vague semantic decoration around the query.
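A join-window extractor of this kind can be sketched as below. The phrase-to-days table and regex are illustrative assumptions, not the production rules; the point is that the phrasing becomes a numeric hard filter rather than loose semantics.

```python
import re

# Rough phrase-to-days table for vague join windows (an assumption for
# illustration, not Ovii's actual mapping).
VAGUE_WINDOWS = {"immediately": 0, "a few weeks": 21, "a month": 30}

def extract_join_window_days(query):
    """Turn join-window phrasing into a hard max-notice-period filter (in days)."""
    q = query.lower()
    # Explicit forms: "can join in 20 days", "within 2 weeks", "join in 1 month".
    m = re.search(r"(?:join\s+in|within)\s+(\d+)\s+(day|week|month)s?", q)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return n * {"day": 1, "week": 7, "month": 30}[unit]
    # Vague forms fall back to a conservative table.
    for phrase, days in VAGUE_WINDOWS.items():
        if phrase in q:
            return days
    return None  # no join constraint detected
```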
We also added post-retrieval checks for anchor-heavy cases. If a recruiter searches for people who worked at a specific company or studied at a specific institution, we do not want semantically nearby but factually wrong candidates surviving only because their profiles mention related terms. That is why strict anchor filtering and hard-filter revalidation happen after retrieval too. In practice, the rule is simple: use semantics to understand meaning, but use exact logic where the truth value matters.
Index Resumes by Section, Not as a Blob
Early semantic systems often embed an entire profile as one document and hope the vector captures everything important. That approach loses too much signal for recruiting. A resume is not one piece of meaning. It is a bundle of different evidence zones: summary, experience, education, skills, projects, certifications, location, languages, availability, and awards. Recruiters search across those zones differently.
Our resume chunking pipeline reflects that reality. We split structured resumes into semantic sections such as SUMMARY, EDUCATION, EXPERIENCE, SKILLS, PROJECTS, CERTIFICATIONS, DEMOGRAPHICS, LANGUAGES, AVAILABILITY, and AWARDS. Long experience sections can be split into multiple chunks, and each chunk is tagged with its section type. That means a search for a university, a notice period, or a specialized project can retrieve the most relevant evidence instead of competing with the full resume body.
We also add context to each chunk before embedding it. A chunk is prefixed with candidate-level metadata such as name, current title, and section label so the embedding has more situational meaning. On top of that, we build contextual resume text using current role, company, total experience, location, top skills, previous employers, education, and certifications. The effect is that the vector is not only about isolated sentences. It is about those sentences in the context of an actual career profile.
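The section-aware chunking plus context prefixing described above can be sketched like this. The split threshold and prefix format are hypothetical; what matters is that each chunk carries its section label and candidate-level context into the embedding.

```python
from dataclasses import dataclass

MAX_CHUNK_CHARS = 1200  # illustrative split threshold, not the production value

@dataclass
class Chunk:
    section: str  # e.g. "EXPERIENCE", "SKILLS", "EDUCATION"
    text: str     # context-prefixed text that actually gets embedded

def chunk_resume(sections, name, current_title):
    """Split a structured resume into section-tagged, context-prefixed chunks."""
    chunks = []
    for section, body in sections.items():
        if not body:
            continue
        # Long sections (typically EXPERIENCE) become multiple chunks.
        parts = [body[i:i + MAX_CHUNK_CHARS] for i in range(0, len(body), MAX_CHUNK_CHARS)]
        for part in parts:
            prefix = f"Candidate: {name} | Current title: {current_title} | Section: {section}\n"
            chunks.append(Chunk(section=section, text=prefix + part))
    return chunks
```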
Build an Ingestion Pipeline That Can Be Trusted
Search quality starts before search time. It starts at ingestion time. In our system, resume embedding generation runs through a RabbitMQ consumer so processing can be asynchronous, observable, and resilient under real operational load. The embedding listener validates payloads, deduplicates messages for idempotency, classifies failures, and avoids reprocessing the same event inside the deduplication window.
We also handle source-specific business rules during indexing. Candidate self-uploads are not indexed until the resume has actually been approved, while recruiter-side uploads can be processed immediately. That matters because candidate search should not surface drafts or unreviewed states as if they were finalized profiles.
When a resume is ready, we chunk it semantically and generate embeddings for all chunks in one batch API call rather than one network call per section. If structured chunking fails or returns nothing useful, we still fall back to a single contextual chunk so the profile remains searchable. The theme again is controlled degradation. We prefer a lower-fidelity search document over a missing search document.
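The batch-embed-with-fallback step can be sketched as follows, assuming an `embed_batch` function that takes a list of texts and returns one vector per text (the function name and shape are illustrative):

```python
def embed_resume(chunks, contextual_text, embed_batch):
    """Embed all chunks in one batch call; degrade to one contextual chunk."""
    if chunks:
        try:
            vectors = embed_batch(list(chunks))  # one network call, not one per section
            if vectors and len(vectors) == len(chunks):
                return list(zip(chunks, vectors))
        except Exception:
            pass  # fall through to controlled degradation
    # Lower-fidelity fallback: a single chunk built from contextual resume
    # text, so the profile stays searchable instead of missing from the index.
    vec = embed_batch([contextual_text])[0]
    return [(contextual_text, vec)]
```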
Make Candidate State Searchable
Another important design choice is that the searchable payload should include more than raw resume content. Recruiters do not just ask "who has Java" or "who knows fintech." They ask who can join quickly, who has already cleared a final assessment, who appears ready for a particular stage, and who belongs in a company-specific or shared talent pool.
Our indexing payload carries structured fields such as current job title, current company, total experience, skills, previous employers, institutions, certifications, location, availability, notice period, and expected joining date. We also enrich the search payload with cross-domain signals like whether the candidate has taken a final round and what their latest final-round score is when that signal is search-eligible.
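A payload assembler along these lines might look as follows. The field names and the `search_eligible` gate are illustrative, not Ovii's schema; the shape shows how resume facts and workflow signals travel together.

```python
def build_index_payload(candidate, final_round=None):
    """Assemble the searchable payload: resume facts plus workflow signals."""
    payload = {
        "current_title": candidate.get("current_title"),
        "current_company": candidate.get("current_company"),
        "total_experience_years": candidate.get("total_experience_years"),
        "skills": candidate.get("skills", []),
        "previous_employers": candidate.get("previous_employers", []),
        "location": candidate.get("location"),
        "notice_period_days": candidate.get("notice_period_days"),
    }
    # Cross-domain enrichment: only attach final-round signals when the
    # candidate's score is eligible to appear in search.
    if final_round and final_round.get("search_eligible"):
        payload["has_final_round"] = True
        payload["final_round_score"] = final_round.get("score")
    return payload
```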
That changes what candidate discovery can mean inside the product. Search is no longer limited to semantic matching over prose. It becomes a retrieval layer over profile evidence, hiring readiness, and workflow-relevant metadata.
Use Hybrid Retrieval, Not One Vector Lookup
When the recruiter query reaches retrieval, we still do not reduce it to one raw sentence embedding and hope for the best. The retrieval layer builds an enriched embedding text from the normalized vector query, concept phrases, aliases, and selected text filters. It also attaches per-concept embeddings so the system can search not just for the whole idea of the query but for the important concepts inside it.
The runtime path prefers Qdrant when available and falls back to Weaviate if needed. On the Qdrant side, search is hybrid and multi-collection. We search both the company-specific private collection and the public shared collection in parallel, then merge and aggregate results at the candidate level so one strong chunk can support a candidate without flooding the ranking with duplicate chunk hits.
Hybrid here means more than dense vector similarity. The Qdrant path combines the main dense query, concept-specific dense lookups, lexical search phrases, and supplemental recall paths for availability-heavy and final-assessment-heavy searches. Those branches are fused with reciprocal rank fusion rather than trusting one scoring channel to be perfect. In plain English, we are combining meaning-based retrieval and literal evidence retrieval instead of forcing recruiters to choose between them.
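Reciprocal rank fusion itself is simple; a minimal sketch, with `k=60` as the commonly used constant, looks like this. Each input list is one retrieval branch (dense, lexical, concept-specific), ordered best-first:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first ranked lists into one ordering.

    A candidate's fused score is the sum of 1/(k + rank) over every list it
    appears in, so agreement across branches beats one strong outlier score.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, candidate_id in enumerate(ranking, start=1):
            scores[candidate_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because no branch's raw scores are compared directly, a lexical hit and a dense hit contribute on the same scale, which is exactly why fusion tolerates imperfect individual channels.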
Search blends company and public talent pools with dense, lexical, and concept-level retrieval before ranking candidates.
Route Search Across the Right Talent Pools
Recruiting systems rarely live in a single clean corpus. Some candidates belong only to a company context. Others should be discoverable from a shared public pool. Some are uploaded by recruiters or vendors. Others come from candidates directly, from job boards, or from campus and partner channels. Those sources should not all be indexed or exposed in the same way.
Our vector routing layer handles that explicitly. Recruiter uploads, bulk uploads, vendor uploads, and employee referrals can be stored on a dual path: the company collection and the public collection. Candidate uploads, job-board applications, TPO students, and branded-career-board profiles can be routed public-only. That gives us a better balance between company-specific recall and broader discovery.
We also delete stale vectors before reinserting fresh chunk sets for the same candidate. That sounds operational, but it is directly tied to trust. Without it, edited profiles can leave old evidence behind in the vector store, and search starts surfacing yesterday's version of the candidate.
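The routing and delete-before-insert behavior can be sketched together. The source names, collection names, and store interface below are illustrative stand-ins, not the real schema:

```python
# Source -> collection routing (names are illustrative, not Ovii's schema).
DUAL_PATH_SOURCES = {"RECRUITER_UPLOAD", "BULK_UPLOAD", "VENDOR_UPLOAD", "EMPLOYEE_REFERRAL"}
PUBLIC_ONLY_SOURCES = {"CANDIDATE_UPLOAD", "JOB_BOARD", "TPO_STUDENT", "CAREER_BOARD"}

def route_collections(source, company_id):
    """Decide which vector collections a resume from this source lands in."""
    if source in DUAL_PATH_SOURCES:
        return [f"company_{company_id}", "public_pool"]
    if source in PUBLIC_ONLY_SOURCES:
        return ["public_pool"]
    raise ValueError(f"unknown resume source: {source}")

def reindex_candidate(store, candidate_id, source, company_id, chunks):
    """Delete stale vectors first, then insert the fresh chunk set."""
    for collection in route_collections(source, company_id):
        store.delete(collection, candidate_id)  # avoid surfacing yesterday's evidence
        store.insert(collection, candidate_id, chunks)
```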
Guard the Results After Retrieval
Retrieval is only the middle of the story. Once candidates come back, they still need cleanup and protection. Our post-filtering layer deduplicates candidates using identity signals such as email, name, and title, and if company-accessible duplicates exist, it can prefer the better company-relevant version over a weaker duplicate.
Then we apply strict anchor filtering when it is needed. Employer, institution, and location-style anchors often need exact textual evidence rather than semantic approximation. We also revalidate hard filters like notice-period constraints after deduplication so the final result set honors the recruiter intent even if some earlier step under-enforced it.
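A post-retrieval guard of this kind can be sketched as one pass that demands literal anchor evidence and rechecks hard filters. Field names are illustrative:

```python
def enforce_anchors_and_hard_filters(candidates, employer_anchor=None, max_notice_days=None):
    """Post-retrieval guard: exact anchor evidence plus hard-filter recheck."""
    kept = []
    for c in candidates:
        if employer_anchor is not None:
            employers = [e.lower() for e in c.get("employers", [])]
            # The anchor must appear as literal evidence in the profile,
            # not merely as a semantic neighbor of the query.
            if employer_anchor.lower() not in employers:
                continue
        if max_notice_days is not None:
            notice = c.get("notice_period_days")
            # Missing or too-long notice fails the hard constraint.
            if notice is None or notice > max_notice_days:
                continue
        kept.append(c)
    return kept
```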
The same goes for the parsed search plan. Required concepts and exclusions are checked again against the candidate set so the system does not simply say it understood the recruiter. It proves it in the final list. This is one of the most important philosophical choices in the whole stack: retrieval can be generous for recall, but the final answer has to be strict enough to earn trust.
Rank for Coverage, Focus, and Evidence
Even after good retrieval and filtering, ranking remains the hardest problem. Pure similarity scores are not enough because recruiters care about coverage. A candidate who partly matches many surface words is not always better than a candidate who strongly satisfies the role, domain, and readiness criteria that actually matter.
Our scoring layer combines concept coverage, focus alignment, specialization signals, transition evidence, and availability scoring into the final ordering. It also keeps the result labels recruiter-friendly by harmonizing them into bands like STRONG MATCH, GOOD MATCH, and RELATED. Those labels matter because raw decimal scores are not how recruiters make quick decisions.
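The blend-then-band step can be sketched as below. The weights and cut-offs are placeholder values for illustration; in production they are tuned, not fixed:

```python
def blended_score(coverage, focus, availability, weights=(0.5, 0.3, 0.2)):
    """Combine concept coverage, focus alignment, and availability (all in 0-1)."""
    w_cov, w_focus, w_avail = weights
    return w_cov * coverage + w_focus * focus + w_avail * availability

# Illustrative band thresholds; real cut-offs are tuned against recruiter feedback.
def harmonize_label(score):
    """Map a blended 0-1 score onto recruiter-friendly match bands."""
    if score >= 0.75:
        return "STRONG MATCH"
    if score >= 0.55:
        return "GOOD MATCH"
    return "RELATED"
```

The banding is the point: recruiters scan labels, not decimals, so the raw blend is an internal ordering detail while the band is the product surface.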
When reranking is enabled, we build depth-first candidate documents for the reranker rather than dumping raw text. The document is structured to present role, experience duration, project highlights, skills, company and location context, resume excerpt, and AI summary fallback in a deliberate order. Low-confidence results for anchor-heavy queries also need lexical evidence to survive. The overall effect is that shallow keyword stuffing has a harder time outranking substantively stronger candidates.
Make Semantic Search Explainable
One reason recruiters distrust semantic search is that many tools act like black boxes. The search returns people, but the user cannot see what the system thought the query meant or how to correct it without starting from scratch. We wanted the opposite experience.
Our response builder returns the normalized query, the structured search plan, interpreted criteria, applied filters, and dropped filters. On the recruiter side, the Candidate Discover panel surfaces that through a "We Understood" interpretation panel. Recruiters can inspect must-have, should-have, any-of, must-not, and filter chips, then move or remove concepts and rerun the search through a dedicated refine flow.
That interaction changes the trust model. The recruiter is not forced to either accept or reject a mysterious result set. They can edit the system's interpretation, see the impact, and continue. Semantic search becomes collaborative instead of opaque.
Recruiters can see what the system understood, refine the interpretation, and move strong candidates straight into pipeline workflow.
Connect Discovery to Hiring Workflow
Search is only valuable if the next step is obvious. Once a recruiter finds a strong fit, the product should let them act immediately instead of exporting the result mentally into another system. That is why semantic candidate search in Ovii is wired into downstream hiring operations.
From the recruiter experience, a discovered candidate can be added directly into the job pipeline. On the backend, that path checks for duplicates, creates the job application, places the candidate into the intake stage, and then triggers async scoring against the job description when resume text is available. In other words, the search result is not a dead end. It becomes part of the evaluation workflow.
This is a subtle but important product distinction. We were not trying to build a beautiful search widget in isolation. We were trying to make discovery a reliable first-class step in structured hiring.
What We Learned Building It
The clearest lesson is that semantic candidate search is not the same thing as storing resumes in a vector database. Real recruiting search needs language understanding, but it also needs schema awareness, source-aware indexing, exact anchor handling, business-rule enforcement, resilient fallbacks, and workflow integration.
Another lesson is that trust is cumulative. Recruiters trust the system more when it handles negation correctly, when it understands join windows, when it does not hallucinate employer matches, when it explains what it understood, and when strong candidates can move straight into pipeline actions. No single feature creates trust on its own. The whole chain does.
Finally, good semantic search in recruiting should widen recall without lowering standards. The point is not to retrieve more profiles for the sake of volume. The point is to help recruiters surface better-fit candidates faster, with enough explanation and control that they can confidently move from search to selection.