Three interviewers come out of a debrief. One likes the candidate. One is on the fence. The third "had some concerns" but can't name them. When the hiring manager asks which way the panel is leaning, nobody can give a direct answer. The search continues.

This happens constantly, not because interviewers are careless but because the format invites it. When every interviewer asks different questions in a different order and evaluates candidates against different implicit criteria, the panel produces judgments that cannot be compared. That is not a calibration problem. It is a structural one.

The Research Has Said the Same Thing Since the 1940s

In personnel selection research, "validity" refers to how well a tool predicts what you actually care about: job performance. The evidence on unstructured interviews has been consistent for more than eighty years. They are not good predictors. Not because interviewers are unintelligent, but because a conversation without structure does not generate consistent, comparable signal across candidates.

A 2022 re-analysis of decades of personnel selection research, summarized by the Society for Industrial and Organizational Psychology, found that structured interviews had a mean operational validity of r = .42, placing them above cognitive ability tests as the strongest predictor of job performance in the dataset. That is a striking result: after correcting for methodological overcorrections that had inflated earlier estimates, structured interviews still came out on top.

The U.S. Office of Personnel Management (OPM) published a comprehensive guide on structured interviewing that practitioners and researchers continue to reference. It describes unstructured interviews as showing "low to moderate levels of validity" for predicting job performance, while structured interviews demonstrate "a high degree of reliability, validity, and legal defensibility." That guide was published in 2008, but the underlying research it draws on has been replicated across multiple subsequent meta-analyses, and the pattern it describes has not changed.

The research has been there. The practice has not caught up.

What "Unstructured" Looks Like in Practice

Unstructured does not mean chaotic. Most interviewers believe they are running thoughtful, rigorous conversations. The problem is that each interviewer is running a different conversation.

One interviewer goes deep on technical criteria. Another covers culture through a free-form discussion about career trajectory. A third asks behavioral questions but follows up differently based on how each candidate responds. When the debrief happens, the panel is comparing answers to different questions, weighted differently, scored against different implicit standards.

That is unstructured interviewing. It is also the dominant format across most hiring processes.

What makes an interview structured: the same questions, asked in the same order, to every candidate for the same role, with pre-defined scoring criteria applied consistently across responses. Behaviorally anchored rating scales describe what a strong, adequate, and weak answer looks like for each question. Interviewers score independently before the debrief begins.
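
For teams that track interview data in an ATS or a spreadsheet, the rubric idea is easy to make concrete. The following is a minimal sketch in Python, offered as illustration only: every name in it (Question, Scorecard, the anchor wording) is invented for this example and is not drawn from any tool or study mentioned in this piece. It represents a behaviorally anchored scale as plain data and keeps each interviewer's scores separate until the debrief.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Question:
    text: str
    # Behaviorally anchored rating scale: what each score looks like in practice.
    anchors: dict[int, str]


@dataclass
class Scorecard:
    interviewer: str
    # Question index -> score, recorded independently before the debrief.
    scores: dict[int, int] = field(default_factory=dict)


# Every candidate for the role gets the same questions, in the same order.
questions = [
    Question(
        text="Tell me about a time you had to cut scope to hit a deadline.",
        anchors={
            5: "Names the tradeoff, who was affected, and the decision made",
            3: "Describes the situation but not the reasoning behind the cut",
            1: "Generic answer with no specific situation or outcome",
        },
    ),
]

# Interviewers score the same candidate independently; the debrief then
# reviews numbers and evidence instead of negotiating impressions.
panel = [
    Scorecard(interviewer="A", scores={0: 4}),
    Scorecard(interviewer="B", scores={0: 2}),
]

for i, question in enumerate(questions):
    given = [card.scores[i] for card in panel if i in card.scores]
    print(f"Q{i + 1}: mean {mean(given):.1f}, spread {max(given) - min(given)}")
```

Even in toy form, the point of the structure is visible: scores attach to named criteria, and disagreement shows up as a measurable spread rather than a vague concern.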

None of this requires reading candidates less carefully. It requires reading all candidates against the same standard.

Why the Gap Between Research and Practice Persists

If the evidence has been this clear for this long, why do most interview processes still look the way they do?

Unstructured interviews feel like they work. When an interviewer clicks with a candidate and the conversation flows, it feels like meaningful signal. Research on interviewer accuracy suggests that an interviewer's confidence in a judgment has essentially no correlation with that judgment's accuracy. The interview felt good. That is not the same as the hire working out.

There is also the logistics problem. Writing structured question banks tied to a job analysis, building scoring rubrics, calibrating interviewers before a panel: this is real work upfront, and most recruiting teams are filling roles while simultaneously trying to build better infrastructure for filling them.

Hiring manager culture creates its own friction. Asking a senior engineer or a VP of Sales to score candidates against pre-defined criteria can feel like a bureaucratic constraint on a judgment call they believe they are uniquely qualified to make. That belief is understandable. It is also the condition under which interviewer bias operates most freely, precisely because there is no shared standard to return to.

What Structure Actually Changes

When teams move to structured interviews, a few things shift.

The signal improves. Candidates who are strong on the actual criteria required for the role show up more clearly, because those criteria are being assessed directly rather than inferred from conversational impression.

The debrief changes. When interviewers score independently against the same rubric before discussion, the debrief becomes a review of evidence rather than a negotiation of feelings. "I had concerns" becomes: "My score on criterion three was a two out of five; here is what I observed." That is a conversation the team can actually work with.

Legal exposure decreases. The OPM's framing of structured interviews as having "legal defensibility" is not incidental. When every candidate is asked the same questions and evaluated on the same criteria, you have a documented, consistently applied process. That matters when a hiring decision is later questioned.

Where Sia Fits into the Structured Process

Structuring the initial screening conversation is often where the gains are largest, because it is the first place in the funnel where inconsistency compounds. A recruiter running thirty screens for a single role in a single week will not ask the same questions the same way to every candidate. Volume makes consistency nearly impossible by hand.

Sia, Eximius's AI screening agent, conducts structured screening conversations with candidates across SMS, WhatsApp, and email. The recruiter sets the criteria for the role; Sia asks the same questions to every candidate and surfaces responses against those criteria. Every candidate gets screened against the same standard, without variation based on who ran the screen that day or how many open reqs the recruiter was carrying.

The screening stage is not where the hire is made. It is where the slate is built. Structure at this stage means the recruiter reviews a slate built on consistent signal, not a stack assembled through whatever happened to surface first.

The hiring decision stays with the team. Sia runs the part of the process that should not have required a recruiter's full attention to begin with.

The Design Problem Underneath

Most hiring teams treat the interview as the place where the real evaluation happens. Eighty years of research confirms that is true, but only when the interview is structured.

The unstructured conversation, however well-intentioned, introduces variance that the research consistently shows carries no predictive signal. Structure removes that variance. It does not remove judgment. It focuses judgment where it actually matters: on the criteria relevant to the role, evaluated consistently, across every candidate in the slate.

The question is not whether your team is capable of reading candidates. The question is whether your process is designed to give them consistent signal to read. Most processes are not. That is a design problem, and design problems have design solutions.

Want to see what structured screening looks like on your req volume? Book a pilot and we'll run your next role through the Eximius workflow.