When interviews earn their place
A user interview is a one-to-one conversation, typically forty-five to sixty minutes, in which the researcher elicits behavioural, contextual and motivational data from a participant who matches a defined recruitment profile. That is the textbook definition. The operational definition is shorter: interviews are how you find out what is actually happening when the numbers can't tell you.
Interviews earn their place when the team needs to understand motivations behind observed behaviour, mental models of a domain or product, decision contexts that surveys flatten, the workflow that surrounds a single point of product friction, or the language users actually use to describe what they're trying to do. They do not earn their place when the team needs to know how widespread a behaviour is, when self-report would be unreliable, or when the decision is between two equally specified designs (in which case usability testing or A/B testing is the right tool).
Interview research is also expensive. Recruiting, scheduling, conducting and synthesising twelve interviews takes two to four weeks of researcher effort, plus the participant cost. A team that runs interviews to confirm a decision it has already made is wasting that effort; a team that runs interviews before the decision is framed will produce findings the room can act on. The cluster pillar covers the broader decision of choosing a research method; this guide covers the operational reality once that choice has been made.
Planning the study
The first hour of any interview study is spent on a kickoff document that names four things. Anything cut from this list weakens the study disproportionately.
Four questions to answer before recruiting
- What decision does this research inform? Name it specifically. "Should we build feature X?" "Why are users abandoning at checkout step 3?" "What workflow does the field worker actually follow?" If no decision is named, the study will produce findings that no one acts on.
- Who, specifically, are we interviewing? The segment, the use case, the experience level, the recency of the relevant behaviour. The screener is a function of this answer; weak screeners come from vague answers.
- What do we expect to learn? Write out the three to five most likely findings up front. This is not a prediction to confirm; it's a way to recognise the moment when the study tells you something unexpected, and to avoid running a study that only confirms what you already knew.
- What does the readout look like? A one-pager? A deck for leadership? A working session with the engineering team? The form of the readout shapes what kind of evidence you need to capture during the interviews.
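The four kickoff questions above can be treated as a checklist that blocks recruiting until every answer is written down. The sketch below is one minimal way to do that; the `Kickoff` class, its field names and the example answers are hypothetical, not part of any established template.

```python
from dataclasses import dataclass, fields


@dataclass
class Kickoff:
    """Hypothetical container for the four kickoff answers."""
    decision: str = ""           # the decision this research informs
    audience: str = ""           # who, specifically, is being interviewed
    expected_findings: str = ""  # the three to five most likely findings
    readout_format: str = ""     # one-pager, deck, working session, ...


def unanswered(kickoff: Kickoff) -> list[str]:
    """Return the names of kickoff questions that are still blank."""
    return [f.name for f in fields(kickoff) if not getattr(kickoff, f.name).strip()]


# Illustrative draft with only two of the four questions answered.
draft = Kickoff(
    decision="Why are users abandoning at checkout step 3?",
    audience="Customers who abandoned a basket recently",
)
print(unanswered(draft))  # ['expected_findings', 'readout_format']
```

A non-empty result is a signal to go back to the stakeholder before the screener is written.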
Push back on stakeholders who ask for "user interviews" without specifying the underlying decision. Doing the requested study verbatim, when the underlying question is unclear, makes the researcher complicit in the failure that follows. Senior practitioners spend a meaningful portion of any kickoff forcing the decision into the open.
Recruitment
Recruitment is the single largest source of failure in interview studies. Bad recruitment means the segment you intended to study is not the segment you actually spoke with, which means the findings don't transfer to the decision they were meant to inform. The operational reality is that recruitment takes longer than every other stage of the study and is rarely budgeted as such.
The screener is a short questionnaire that filters volunteers against the target profile. It needs four things: a behavioural qualifier (have you done X in the last Y), a context qualifier (in what setting), a non-leading framing (no "are you interested in topic Z" questions), and a quota or no-quota stance per segment. Screeners written by stakeholders almost always include an unintentional self-selection bias; researcher review is non-optional.
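Three of the four screener elements (behavioural qualifier, context qualifier, quota stance) are mechanical enough to express as a filter. The sketch below assumes a hypothetical `ScreenerResponse` shape and invented thresholds; non-leading framing is a property of the question wording and cannot be checked this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreenerResponse:
    """One volunteer's answers; all field names are hypothetical."""
    days_since_behaviour: Optional[int]  # None means "has never done X"
    setting: str                         # where the behaviour happened
    segment: str                         # segment used for quota counting


def qualifies(resp: ScreenerResponse, max_days: int, allowed_settings: set,
              quotas: dict, filled: dict) -> bool:
    """Apply the behavioural, context and quota checks in that order."""
    if resp.days_since_behaviour is None or resp.days_since_behaviour > max_days:
        return False  # behavioural qualifier: done X in the last Y days
    if resp.setting not in allowed_settings:
        return False  # context qualifier
    quota = quotas.get(resp.segment)  # a missing entry means "no quota"
    if quota is not None and filled.get(resp.segment, 0) >= quota:
        return False  # this segment's quota is already full
    return True


# Illustrative values only.
quotas = {"new": 4, "experienced": 4}
filled = {"new": 4}
experienced = ScreenerResponse(10, "at work", "experienced")
newcomer = ScreenerResponse(10, "at work", "new")
print(qualifies(experienced, 30, {"at work"}, quotas, filled))  # True
print(qualifies(newcomer, 30, {"at work"}, quotas, filled))     # False
```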
Three recruitment sources cover most product research.
- Internal customer list. Ideal for product-specific behavioural studies and the cheapest source, but only available to teams whose customers know the brand.
- Panel providers like User Interviews, Respondent, Prolific or Ethnio. Faster and lower-friction, but more expensive and with sample profile risks.
- Open recruitment via social channels. Lowest cost and highest self-selection bias; viable for generative discovery work, weaker for evaluative studies.
Incentives are inevitable; for forty-five-minute professional interviews, £75 to £150 is the 2026 range in the UK and US.
The discussion guide
The discussion guide is the structured list of question areas the moderator will work through. It is not a script. The discipline is to write it as a series of behavioural prompts and probes rather than as a fixed wording, so the moderator can adapt to what the participant actually says.
The standard structure runs warm-up, context, current behaviour, candidate concept (if evaluative), wrap. The warm-up establishes rapport and orients the participant; ten minutes of context-setting is rarely wasted. The middle is where the real material lives: probing recent behaviour, walking through workflows, surfacing decisions and mental models. The wrap closes out, asks whether there is anything we should have asked, and thanks the participant.
Two structural choices separate competent guides from weak ones. First, the guide moves from broad to specific, not the other way round. Asking about a specific feature too early biases the participant's recall of the broader context. Second, the guide treats every "would you" question with suspicion. Hypothetical answers are unreliable; ask about what people did last time, not what they might do next time.
Questioning technique
The single most useful technique in user interviews is the behavioural retrospective. Instead of asking "how do you typically handle X?", ask "walk me through the last time you did X". The retrospective version produces concrete detail, surfaces real workflows, and reveals the workaround behaviours that aspirational answers conceal.
A short list of disciplines distinguishes strong interview practice from weak.
- Ask about behaviour, not preference. "What did you do" beats "what do you prefer".
- Use retrospectives, not hypotheticals. "Last time you" beats "if you had to".
- Probe, don't lead. "Tell me more about that" beats "and was that frustrating?".
- Let silence work. A five-second silence after a participant trails off often produces the most useful follow-up. Filling the silence is the moderator's most common mistake.
- Single-barrelled questions only. "What did you do, and how did you feel about it?" splits the answer; ask each separately.
- Avoid jargon. Both yours and theirs. If a participant uses a term, ask what they mean by it before assuming you share the definition.
- Don't validate. Reacting with "right, that's a great point" subtly trains the participant to give you more of what you reacted to. Neutral acknowledgements only.
The two most useful moves in interview moderation, in my experience, are saying "Tell me more about that" and (when the participant pauses) saying nothing. The first invites depth without leading. The second lets the participant find the thought they're reaching for. Most weak interview transcripts I've reviewed contain too much moderator and not enough participant; the discipline is to talk less, prompt better, and trust silence.
Avoiding bias
Bias contamination is the most common cause of qualitative findings that don't survive contact with the product. Six bias sources are worth naming because they are addressable; the others are structural and need awareness rather than a fix.
- Confirmation bias: hearing what you expected to hear. Mitigated by writing predictions before the study and explicitly testing whether the transcripts contradict them.
- Leading questions: signalling the answer in the question. Mitigated by piloting the guide and rewriting any prompt that previews a desired response.
- Acquiescence bias: participants agreeing with the interviewer to be polite. Mitigated by asking for examples and counter-examples, never "do you agree?".
- Recency bias: over-weighting the most recent interview in synthesis. Mitigated by tagging during fieldwork, not after it, and revisiting earlier transcripts during analysis.
- Recruitment bias: speaking with self-selected enthusiasts. Mitigated by behavioural screeners and quota-based recruitment.
- Synthesis bias: pattern-matching against what the stakeholders want to find. Mitigated by validating themes against the raw quotes that support them, not from memory.
Two operational practices halve the typical bias load. Have a second person attend at least every third interview; their observations during synthesis surface what the moderator missed. And explicitly validate findings against source transcripts during write-up, not against general recollection.
Analysis and synthesis
The synthesis phase is where studies are won or lost. Strong recruiting and good moderation produce raw material that still has to be turned into a finding that changes a decision. Synthesis takes longer than most stakeholders assume: as a rule of thumb, count on three to four hours of synthesis per hour of interview audio, even in AI-assisted workflows.
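The rule of thumb above turns into a quick planning estimate. The sketch below simply multiplies total audio hours by three and by four; the function name and the example study size are illustrative.

```python
def synthesis_hours(interviews: int, minutes_each: float) -> tuple[float, float]:
    """Low and high synthesis estimates at 3x and 4x the audio length."""
    audio_hours = interviews * minutes_each / 60
    return (3 * audio_hours, 4 * audio_hours)


# Twelve interviews that each ran forty-five minutes: nine hours of audio.
low, high = synthesis_hours(interviews=12, minutes_each=45)
print(f"{low:.0f} to {high:.0f} hours")  # 27 to 36 hours
```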
The standard analysis cycle: tag transcripts with descriptive codes; cluster codes into themes; rank themes by frequency, severity and confidence; validate each theme against the raw quotes; write the narrative; produce the readout artefacts. A senior researcher does each of these steps with discipline; a junior researcher typically over-tags, under-clusters and produces a list of forty findings that the stakeholder room collapses into the three they wanted anyway.
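The first three steps of the cycle (tag, cluster, rank) can be sketched in a few lines. Everything in the example is invented for illustration: the codes, the theme names, the 0 to 4 severity and 0 to 1 confidence values, and the choice to sort by frequency first, then severity, then confidence. A real study would choose its own weighting.

```python
from collections import defaultdict

# (participant, code) pairs from the tagging pass; all values are made up.
tags = [
    ("P1", "exports to spreadsheet"), ("P2", "exports to spreadsheet"),
    ("P3", "retypes totals by hand"), ("P1", "unsure which plan they are on"),
    ("P4", "retypes totals by hand"),
]
# Code-to-theme mapping from the clustering pass (a researcher judgement).
theme_of = {
    "exports to spreadsheet": "works around reporting",
    "retypes totals by hand": "works around reporting",
    "unsure which plan they are on": "pricing confusion",
}
# (severity, confidence) per theme, assigned by the researcher.
judgement = {"works around reporting": (3, 0.8), "pricing confusion": (2, 0.5)}

# Frequency is the number of distinct participants behind each theme.
participants = defaultdict(set)
for person, code in tags:
    participants[theme_of[code]].add(person)

# Rank by frequency, then severity, then confidence, highest first.
ranked = sorted(
    participants,
    key=lambda t: (len(participants[t]), *judgement[t]),
    reverse=True,
)
for theme in ranked:
    print(theme, len(participants[theme]), judgement[theme])
```

Counting distinct participants rather than raw tag occurrences keeps one talkative participant from inflating a theme.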
By 2026, AI accelerates the first two steps credibly. A tagging pass that took an afternoon in 2022 takes minutes in 2026, and the cluster summary that the model produces is a usable first draft for an experienced researcher. The risk is the researcher who skips validation: the model's clusters are plausible, the researcher signs off without checking the raw quotes that support them, and a hallucinated finding makes it into the readout. The full operational view of AI's role in synthesis sits in AI-assisted UX research; the rule of thumb is that AI accelerates the mechanical layer of analysis and degrades the senior judgement layer when validation is skipped.
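Part of the validation step can be mechanised: check that every quote attached to a theme actually appears in a transcript. The sketch below assumes transcripts are available as plain strings and uses a simple case-insensitive substring match; the function names and sample data are hypothetical. It catches invented quotes, not misread ones, so it supplements the researcher's own reading rather than replacing it.

```python
import re


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps don't break matching."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def unsupported_quotes(themes: dict[str, list[str]],
                       transcripts: list[str]) -> dict[str, list[str]]:
    """Return, per theme, the quotes that appear in no transcript."""
    corpus = [normalise(t) for t in transcripts]
    missing = {}
    for theme, quotes in themes.items():
        bad = [q for q in quotes if not any(normalise(q) in t for t in corpus)]
        if bad:
            missing[theme] = bad
    return missing


# Illustrative data: the second quote was never said by any participant.
transcripts = ["I just export it and\nfix the totals in a spreadsheet."]
themes = {
    "works around reporting": ["fix the totals in a spreadsheet"],
    "pricing confusion": ["I never know what I'm paying for"],
}
print(unsupported_quotes(themes, transcripts))
# {'pricing confusion': ["I never know what I'm paying for"]}
```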
The readout
The readout is half the work of the study. A brilliantly designed study presented badly produces no decision change; a modestly scoped study presented well changes minds. Senior researchers in 2026 spend a disproportionate amount of their effort on stakeholder communication, because the readout is where the decision actually shifts.
The strongest readouts have a consistent shape. Headline: the single most important finding, stated in one sentence. Evidence: the three to five themes that support or qualify the headline, each anchored to participant quotes. Recommendations: what the team should do differently, ranked by impact and confidence. Open questions: what the study didn't answer, where additional work is needed. Appendices: participant breakdown, methodology, the discussion guide, full quote bank.
Three things to do in the readout, regardless of audience. Lead with the finding the room least expects, because that's the one most likely to shift behaviour. Use direct participant quotes rather than paraphrasing; the verbatim language is more persuasive than the synthesised summary. And rank recommendations explicitly; a list of seven equally-weighted ideas invites the room to cherry-pick the one they already wanted.
Five mistakes to avoid
These are the patterns I see most often in interview practice audits.
- Running interviews to confirm a decision already made. The fastest tell: the kickoff document says "validate the proposed feature". Validation is not research; it's theatre. Either commit to learning or skip the study.
- Over-recruiting and under-analysing. Twenty interviews recorded, six analysed, fourteen abandoned. Better to recruit eight and synthesise rigorously than to over-collect and run out of analysis time.
- Asking hypothetical questions. "Would you use a feature that did X?" produces optimistic answers that don't survive launch. Ask about behaviour, not preference; about the last time, not the next time.
- Synthesising from memory. A week after fieldwork, the moderator writes a summary based on what they remember. The vivid quotes survive; the structurally important findings vanish. Tag the source data; validate findings against quotes.
- A readout that lists everything. Forty observations presented as equally important gives the room permission to act on none of them. Rank the findings, defend the ranking, name the trade-offs.
Templates and tools
These are the operational artefacts for the cluster. The interview prep worksheet covers candidate-side preparation, but its structure (questions to anticipate, examples to prepare) maps cleanly onto the moderator-side preparation a researcher does before running their own interviews.
UX Interview Prep Worksheet
Structured worksheet for the four stages of an interview-led conversation. Useful for candidates preparing for interview rounds and for researchers doing moderator prep.
The 0–4 Severity Framework
Use the same rubric on interview findings as on usability test findings. Severity, frequency, fix cost; ranked output the stakeholder room can prioritise.
Frequently asked questions
How many users should I interview?
For a homogeneous segment and a clearly scoped question, six to eight interviews surface most of the themes you would see at twelve. For two segments, double the number. For three or more segments, reconsider whether you're trying to answer too many questions at once. Theme convergence by participant five or six is the honest signal that you have the right segment and question.
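Theme convergence can be tracked during fieldwork by counting how many themes each interview introduces for the first time. The sketch below assumes each interview has already been reduced to a set of theme labels; the labels and session data are invented for illustration.

```python
def new_themes_per_interview(themes_by_interview: list[set[str]]) -> list[int]:
    """Count themes first seen in each interview, in fieldwork order."""
    seen: set[str] = set()
    counts = []
    for themes in themes_by_interview:
        fresh = themes - seen  # themes not heard in any earlier interview
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Six illustrative sessions from one segment.
sessions = [
    {"workaround", "pricing"}, {"workaround", "handoff"}, {"pricing", "trust"},
    {"handoff"}, {"trust", "workaround"}, {"pricing"},
]
print(new_themes_per_interview(sessions))  # [2, 1, 1, 0, 0, 0]
```

A run of zeros towards the end suggests convergence; new themes still arriving at interview six suggest the segment or the question is too broad.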
How long should a user interview be?
Forty-five to sixty minutes for most product research. Schedule sixty, target forty-five, leave a buffer for follow-up and rapport. Anything beyond ninety minutes belongs in contextual enquiry territory rather than interview territory; participant attention fades, and the marginal insight rarely justifies the additional time.
Should I record user interviews?
Yes, with explicit consent, named purpose, retention limit and clear opt-out. Recording lets you listen for what you missed and validate synthesis later. Store recordings in tooling that satisfies the data-protection rules of the participant's jurisdiction, delete them after the project, and never paste raw transcripts into consumer AI tools without explicit permission to do so.
What questions should I ask in a user interview?
Open, behavioural, retrospective. Ask about what people did, not what they would do. Ask about the last time something happened, not how often. Use "why" sparingly because direct why-questions invite rationalisation; instead, probe the behaviour and let the reasoning emerge. Avoid hypotheticals, double-barrelled questions, and anything that signals the answer you want to hear.
How do I avoid bias in user interviews?
Five disciplines reduce most bias: pilot the discussion guide on a colleague first; phrase questions behaviourally not evaluatively; treat silence as room to think; have a second person attend at least every third interview; and validate themes against source transcripts rather than from memory. The deepest bias risk is the moderator's own prior beliefs about what the study will find.
What's the difference between user interviews and customer interviews?
User interviews ask how people use the product. Customer interviews ask how people decided to buy it and what would make them leave. Both have a place; product teams typically need user interviews more, sales and marketing need customer interviews more. Conflating them is a common mistake; a user interview that drifts into purchase rationale rarely produces useful product insight.
How do I analyse user interviews?
Tag the transcript with codes describing behaviour and reasoning. Cluster codes into themes. Validate themes against the raw quotes that support them. Rank themes by impact and confidence. AI can do the first pass of tagging and clustering credibly in 2026; the senior researcher validates, challenges and reframes, then writes the synthesis. Skipping validation turns AI-assisted analysis into AI-substituted analysis, which produces plausible but unreliable findings.