User Interviews Guide (2026) — How Modern Teams Run Them

Q: How many users should I interview?

For a clearly scoped research question and a homogeneous segment, six to eight interviews surface most of the themes you'll see at twelve. For two segments, double the number; for three or more, reconsider whether you're trying to answer too many questions at once. The honest signal is theme convergence: by participant five or six, recurring patterns should be visible. If they're not, the screener is wrong, the segment isn't homogeneous, or the underlying question is shallower than assumed.

Q: How long should a user interview be?

Forty-five to sixty minutes for most product research. Longer interviews lose participant attention; shorter ones rarely get past surface answers into the texture of behaviour. Schedule sixty minutes, target forty-five, leave a buffer. Anything over ninety minutes belongs in contextual enquiry territory rather than interview territory.

Q: Should I record user interviews?

Yes, with explicit consent, named purpose, retention limit, and a clear opt-out. Recording is the only way to listen for what you missed in the moment and to validate your synthesis later. The privacy obligations are real: store recordings in a tool that satisfies the participant's jurisdiction, delete after the project, never paste raw transcripts into consumer AI tools without permission.

Q: How do I analyse user interviews?

Tag the transcript with codes that describe behaviour and reasoning. Cluster the codes into themes. Validate the themes against the raw quotes that support them. Rank the themes by impact and confidence. In 2026, AI can do the first pass of tagging and clustering credibly; the senior researcher validates, challenges and reframes, then writes the synthesis. Skipping the validation step is the mistake; the AI output is a draft, not a deliverable.

When interviews earn their place

A user interview is a one-to-one conversation, typically forty-five to sixty minutes, in which the researcher elicits behavioural, contextual and motivational data from a participant who matches a defined recruitment profile. That is the textbook definition. The operational definition is shorter: interviews are how you find out what is actually happening when the numbers can't tell you.

Interviews earn their place when the team needs to understand motivations behind observed behaviour, mental models of a domain or product, decision contexts that surveys flatten, the workflow that surrounds a single point of product friction, or the language users actually use to describe what they're trying to do. They do not earn their place when the team needs to know how widespread a behaviour is, when self-report would be unreliable, or when the decision is between two equally specified designs (in which case usability testing or A/B is the right tool).

Interviews are the method of choice when the team has run out of explanations. The numbers say something is wrong; nobody knows why. That gap is where interviews earn their cost.

Interview research is also expensive in stakeholder time. Recruiting, scheduling, conducting and synthesising twelve interviews is two to four weeks of researcher effort, plus the participant cost. A team that runs interviews to confirm a decision they have already made is wasting that effort; a team that runs interviews before the decision is framed will produce findings the room can act on. The cluster pillar covers the broader decision of choosing a research method; this guide covers the operational reality once that choice has been made.

Planning the study

The first hour of any interview study is spent on a kickoff document that names four things. Anything cut from this list weakens the study disproportionately.

Interview kickoff

Four questions to answer before recruiting

What decision does this research inform? Name it specifically. "Should we build feature X?" "Why are users abandoning at checkout step 3?" "What workflow does the field worker actually follow?" If no decision is named, the study will produce findings that no one acts on.
Who, specifically, are we interviewing? The segment, the use case, the experience level, the recency of the relevant behaviour. The screener is a function of this answer; weak screeners come from vague answers.
What do we expect to learn? Write out the three to five most likely findings up front. This is not a prediction to confirm; it's a way to recognise the moment when the study tells you something unexpected, and to avoid running a study that only confirms what you already knew.
What does the readout look like? A one-pager? A deck for leadership? A working session with the engineering team? The form of the readout shapes what kind of evidence you need to capture during the interviews.

Stakeholders who ask for "user interviews" without specifying the underlying decision should be pushed back on. Doing the requested study verbatim, when the underlying question is unclear, makes the researcher complicit in the failure that follows. Senior practitioners spend a meaningful portion of any kickoff forcing the decision into the open.

Recruitment

Recruitment is the single largest source of failure in interview studies. Bad recruitment means the segment you intended to study is not the segment you actually spoke with, which means the findings don't transfer to the decision they were meant to inform. The operational reality is that recruitment takes longer than every other stage of the study and is rarely budgeted as such.

The screener is a short questionnaire that filters volunteers against the target profile. It needs four things: a behavioural qualifier (have you done X in the last Y), a context qualifier (in what setting), a non-leading framing (no "are you interested in topic Z" questions), and a quota or no-quota stance per segment. Screeners written by stakeholders almost always include an unintentional self-selection bias; researcher review is non-optional.
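The filtering logic of a screener can be sketched in a few lines of Python. This is a minimal illustration rather than a recruitment tool: the field names, the 30-day recency window, the allowed settings and the quota sizes are all hypothetical placeholders for whatever the target profile specifies. Non-leading framing is a wording concern and is not something this sketch can check.

```python
from dataclasses import dataclass


# Hypothetical screener response; field names are illustrative only.
@dataclass
class Response:
    participant_id: str
    segment: str
    days_since_last_behaviour: int  # behavioural qualifier: how recently they did X
    setting: str                    # context qualifier: where the behaviour happened


# Assumed target profile. These values are placeholders, not recommendations.
MAX_RECENCY_DAYS = 30
ALLOWED_SETTINGS = {"on_site", "in_vehicle"}
QUOTAS = {"field_worker": 2, "dispatcher": 1}  # segments not listed are not recruited


def screen(responses):
    """Return accepted participant ids in arrival order, honouring per-segment quotas."""
    accepted = []
    filled = {segment: 0 for segment in QUOTAS}
    for r in responses:
        if r.segment not in QUOTAS:
            continue  # outside the target profile
        if r.days_since_last_behaviour > MAX_RECENCY_DAYS:
            continue  # fails the behavioural qualifier
        if r.setting not in ALLOWED_SETTINGS:
            continue  # fails the context qualifier
        if filled[r.segment] >= QUOTAS[r.segment]:
            continue  # quota for this segment is already full
        filled[r.segment] += 1
        accepted.append(r.participant_id)
    return accepted


responses = [
    Response("p1", "field_worker", 5, "on_site"),
    Response("p2", "field_worker", 90, "on_site"),     # behaviour too long ago
    Response("p3", "dispatcher", 10, "office"),        # wrong setting
    Response("p4", "field_worker", 12, "in_vehicle"),
    Response("p5", "field_worker", 3, "on_site"),      # quota already full
    Response("p6", "dispatcher", 7, "on_site"),
]
accepted = screen(responses)
print(accepted)
```

Writing the qualifiers down this explicitly, even on paper, tends to expose vague profile definitions before recruiting starts.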

Three recruitment sources cover most product research. The internal customer list is ideal for product-specific behavioural studies and is the cheapest source, but it is only available to teams whose customers know the brand. Panel providers such as User Interviews, Respondent, Prolific or Ethnio are faster and lower-friction, but more expensive and carry sample-profile risks. Open recruitment via social channels has the lowest cost and the highest self-selection bias; it is viable for generative discovery work and weaker for evaluative studies. Incentives are inevitable; for forty-five-minute professional interviews, £75 to £150 is the 2026 range in the UK and US.

The discussion guide

The discussion guide is the structured list of question areas the moderator will work through. It is not a script. The discipline is to write it as a series of behavioural prompts and probes rather than as a fixed wording, so the moderator can adapt to what the participant actually says.

The standard structure runs warm-up, context, current behaviour, candidate concept (if evaluative), wrap. The warm-up establishes rapport and orients the participant; ten minutes of context-setting is rarely wasted. The middle is where the real material lives: probing recent behaviour, walking through workflows, surfacing decisions and mental models. The wrap closes out the session, asks whether there is anything we should have asked, and thanks the participant.

Two structural choices separate competent guides from weak ones. First, the guide moves from broad to specific, not the other way round. Asking about a specific feature too early biases the participant's recall of the broader context. Second, the guide treats every "would you" question with suspicion. Hypothetical answers are unreliable; ask about what people did last time, not what they might do next time.

Questioning technique

The single most useful technique in user interviews is the behavioural retrospective. Instead of asking "how do you typically handle X?", ask "walk me through the last time you did X". The retrospective version produces concrete detail, surfaces real workflows, and reveals the workaround behaviours that aspirational answers conceal.

A short list of disciplines distinguishes strong interview practice from weak.

Ask about behaviour, not preference. "What did you do" beats "what do you prefer".
Use retrospectives, not hypotheticals. "Last time you" beats "if you had to".
Probe, don't lead. "Tell me more about that" beats "and was that frustrating?".
Let silence work. A five-second silence after a participant trails off often produces the most useful follow-up. Filling the silence is the moderator's most common mistake.
Single-barrelled questions only. "What did you do, and how did you feel about it?" splits the answer; ask each separately.
Avoid jargon. Both yours and theirs. If a participant uses a term, ask what they mean by it before assuming you share the definition.
Don't validate. Reacting with "right, that's a great point" subtly trains the participant to give you more of what you reacted to. Neutral acknowledgements only.

From the practice

The two most useful sentences in interview moderation, in my experience: "Tell me more about that" and (when the participant pauses) saying nothing. The first invites depth without leading. The second lets the participant find the thought they're reaching for. Most weak interview transcripts I've reviewed contain too much moderator and not enough participant; the discipline is to talk less, prompt better, and trust silence.

Avoiding bias

Bias contamination is the most common cause of qualitative findings that don't survive contact with the product. Six bias sources are worth naming because they are addressable; the others are structural and need awareness rather than a fix.

Confirmation bias: hearing what you expected to hear. Mitigated by writing predictions before the study and explicitly testing whether the transcripts contradict them.
Leading questions: signalling the answer in the question. Mitigated by piloting the guide and rewriting any prompt that previews a desired response.
Acquiescence bias: participants agreeing with the interviewer to be polite. Mitigated by asking for examples and counter-examples, never "do you agree?".
Recency bias: over-weighting the most recent interview in synthesis. Mitigated by tagging during fieldwork, not after it, and revisiting earlier transcripts during analysis.
Recruitment bias: speaking with self-selected enthusiasts. Mitigated by behavioural screeners and quota-based recruitment.
Synthesis bias: pattern-matching against what the stakeholders want to find. Mitigated by validating themes against the raw quotes that support them, not from memory.

Two operational practices halve the typical bias load. Have a second person attend at least every third interview; their observations during synthesis surface what the moderator missed. And explicitly validate findings against source transcripts during write-up, not against general recollection.

Analysis and synthesis

The synthesis phase is where studies are won or lost. Strong recruiting and good moderation produce raw material that still has to be turned into a finding that changes a decision. Synthesis takes longer than most stakeholders assume: as a rule of thumb, count on three to four hours of synthesis per hour of interview audio, even in AI-assisted workflows.
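As a worked example of that rule of thumb, the sketch below estimates synthesis effort for a study. The function name and the inputs (eight interviews of forty-five minutes) are illustrative assumptions; only the ratio of three to four hours per hour of audio comes from the text.

```python
def synthesis_hours(interviews, minutes_each, low_ratio=3.0, high_ratio=4.0):
    """Return (low, high) estimated synthesis hours for a study.

    The ratios are hours of synthesis per hour of interview audio.
    """
    audio_hours = interviews * minutes_each / 60
    return audio_hours * low_ratio, audio_hours * high_ratio


low, high = synthesis_hours(interviews=8, minutes_each=45)
print(f"{low:.0f} to {high:.0f} hours of synthesis")
```

Eight forty-five-minute interviews are six hours of audio, so the estimate comes to roughly eighteen to twenty-four hours of synthesis, which is worth stating in the kickoff document rather than discovering after fieldwork.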

The standard analysis cycle: tag transcripts with descriptive codes; cluster codes into themes; rank themes by frequency, severity and confidence; validate each theme against the raw quotes; write the narrative; produce the readout artefacts. A senior researcher does each of these steps with discipline; a junior researcher typically over-tags, under-clusters and produces a list of forty findings that the stakeholder room collapses into the three they wanted anyway.
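The ranking step can be made explicit with a small scoring sketch. Everything specific here is an assumption for illustration: the theme names, the 0 to 4 severity scale, the 0 to 1 confidence value, and above all the choice to multiply frequency, severity and confidence into a single score. A team may reasonably weight these differently; the point is only that the ranking rule is written down and applied consistently.

```python
from dataclasses import dataclass


@dataclass
class Theme:
    name: str
    participants: set    # ids of participants whose quotes support the theme
    severity: int        # assumed 0-4 scale
    confidence: float    # researcher's judgement, assumed 0.0-1.0


def rank_themes(themes, total_participants):
    """Sort themes by frequency x severity x confidence, highest first."""
    def score(theme):
        frequency = len(theme.participants) / total_participants
        return frequency * theme.severity * theme.confidence

    return sorted(themes, key=score, reverse=True)


# Hypothetical themes from a six-participant study.
themes = [
    Theme("Dislikes colour scheme", {"p1", "p3", "p4", "p6"}, 1, 0.5),
    Theme("Unsure which plan applies", {"p2", "p4"}, 4, 0.6),
    Theme("Re-enters address at checkout", {"p1", "p2", "p3", "p5", "p6"}, 3, 0.9),
]
ranked = rank_themes(themes, total_participants=6)
for theme in ranked:
    print(theme.name)
```

Note that the frequently mentioned but low-severity theme drops to the bottom, which is the behaviour a ranked readout needs: frequency alone does not decide priority.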

By 2026, AI accelerates the first two steps credibly. A tagging pass that took an afternoon in 2022 takes minutes in 2026, and the cluster summary that the model produces is a usable first draft for an experienced researcher. The risk is the researcher who skips validation: the model's clusters are plausible, the researcher signs off without checking the raw quotes that support them, and a hallucinated finding makes it into the readout. The full operational view of AI's role in synthesis sits in AI-assisted UX research; the rule of thumb is that AI accelerates the mechanical layer of analysis and degrades the senior judgement layer when it is allowed to substitute for it.
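One mechanical part of that validation can be sketched in code: checking that every quote attached to a theme actually appears in the named participant's transcript. The data shapes, names and sample text below are hypothetical, and the check is deliberately narrow. It catches invented or misattributed quotes; it does not judge whether a real quote genuinely supports the theme, which remains the researcher's call.

```python
import re


def normalise(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def unsupported_quotes(themes, transcripts):
    """Return (theme, participant_id, quote) triples whose quote is not found
    verbatim in that participant's transcript."""
    problems = []
    for theme, quotes in themes.items():
        for participant_id, quote in quotes:
            transcript = normalise(transcripts.get(participant_id, ""))
            if normalise(quote) not in transcript:
                problems.append((theme, participant_id, quote))
    return problems


# Hypothetical transcripts and model-suggested themes.
transcripts = {
    "p1": "Last time I just typed the address in again because it had not saved.",
    "p2": "I gave up at step three and   phoned the support line instead.",
}
themes = {
    "Re-enters data": [("p1", "typed the address in again")],
    "Abandons checkout": [
        ("p2", "gave up at step three and phoned the support line"),
        ("p1", "I always abandon the checkout"),  # not in p1's transcript
    ],
}
problems = unsupported_quotes(themes, transcripts)
for theme, participant_id, quote in problems:
    print(f"{theme}: no match for {participant_id} quote {quote!r}")
```

Any theme that appears in the output should go back to the source recordings before it goes anywhere near the readout.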

The readout

The readout is half the work of the study. A brilliantly designed study presented badly produces no decision change; a modestly scoped study presented well changes minds. Senior researchers in 2026 spend a disproportionate amount of their effort on stakeholder communication, because the readout is where the decision actually shifts.

The strongest readouts have a consistent shape. Headline: the single most important finding, stated in one sentence. Evidence: the three to five themes that support or qualify the headline, each anchored to participant quotes. Recommendations: what the team should do differently, ranked by impact and confidence. Open questions: what the study didn't answer, where additional work is needed. Appendices: participant breakdown, methodology, the discussion guide, full quote bank.

Three things to do in the readout, regardless of audience. Lead with the finding the room least expects, because that's the one most likely to shift behaviour. Use direct participant quotes rather than paraphrasing; the verbatim language is more persuasive than the synthesised summary. And rank recommendations explicitly; a list of seven equally-weighted ideas invites the room to cherry-pick the one they already wanted.

Five mistakes to avoid

The patterns I see most often in interview practice audits.

Running interviews to confirm a decision already made. The fastest tell: the kickoff document says "validate the proposed feature". Validation is not research; it's theatre. Either commit to learning or skip the study.
Over-recruiting and under-analysing. Twenty interviews recorded, six analysed, fourteen abandoned. Better to recruit eight and synthesise rigorously than to over-collect and run out of analysis time.
Asking hypothetical questions. "Would you use a feature that did X?" produces optimistic answers that don't survive launch. Ask about behaviour, not preference; about the last time, not the next time.
Synthesising from memory. A week after fieldwork, the moderator writes a summary based on what they remember. The vivid quotes survive; the structurally important findings vanish. Tag the source data; validate findings against quotes.
A readout that lists everything. Forty observations presented as equally important gives the room permission to act on none of them. Rank the findings, defend the ranking, name the trade-offs.

Templates and tools

Operational artefacts for the cluster. The interview prep worksheet covers candidate-side preparation, but its structure (questions to anticipate, examples to prepare) maps cleanly onto the moderator-side preparation a researcher does before running their own interviews.

Companion · Worksheet

UX Interview Prep Worksheet

Structured worksheet for the four stages of an interview-led conversation. Useful for candidates preparing for interview rounds and for researchers doing their moderator prep.


Companion · Severity rubric

The 0–4 Severity Framework

Use the same rubric on interview findings as on usability test findings. Severity, frequency, fix cost; ranked output the stakeholder room can prioritise.


Frequently asked questions

How many users should I interview?

For a homogeneous segment and a clearly scoped question, six to eight interviews surface most of the themes you would see at twelve. For two segments, double the number. For three or more segments, reconsider whether you're trying to answer too many questions at once. Theme convergence by participant five or six is the honest signal that you have the right segment and question.
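Theme convergence can be tracked during fieldwork with a simple count of how many previously unseen themes each interview adds. The sketch below is one way to do that, assuming each interview has already been coded into a set of theme labels; the labels and the six-participant sequence are invented for illustration.

```python
def new_themes_per_participant(coded_interviews):
    """For each interview, in fieldwork order, count themes not seen in any earlier one."""
    seen = set()
    counts = []
    for themes in coded_interviews:
        fresh = set(themes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Hypothetical theme labels for six interviews, in the order they were run.
coded = [
    {"manual re-entry", "distrusts autosave"},
    {"manual re-entry", "asks a colleague"},
    {"distrusts autosave", "prints a backup"},
    {"manual re-entry", "asks a colleague"},
    {"distrusts autosave"},
    {"prints a backup", "manual re-entry"},
]
counts = new_themes_per_participant(coded)
print(counts)
```

A run of zeros towards the end suggests the themes are converging. If later interviews keep adding new themes at the same rate as the first ones, that is a prompt to revisit the screener, the segment definition or the question.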

How long should a user interview be?

Forty-five to sixty minutes for most product research. Schedule sixty, target forty-five, leave a buffer for follow-up and rapport. Anything beyond ninety minutes belongs in contextual enquiry territory rather than interview territory; participant attention fades, and the marginal insight rarely justifies the additional time.

Should I record user interviews?

Yes, with explicit consent, named purpose, retention limit and clear opt-out. Recording lets you listen for what you missed and validate synthesis later. Store recordings in tooling that satisfies the participant's jurisdiction, delete after the project, and never paste raw transcripts into consumer AI tools without explicit permission to do so.

What questions should I ask in a user interview?

Open, behavioural, retrospective. Ask about what people did, not what they would do. Ask about the last time something happened, not how often. Use "why" sparingly because direct why-questions invite rationalisation; instead, probe the behaviour and let the reasoning emerge. Avoid hypotheticals, double-barrelled questions, and anything that signals the answer you want to hear.

How do I avoid bias in user interviews?

Five disciplines reduce most bias: pilot the discussion guide on a colleague first; phrase questions behaviourally not evaluatively; treat silence as room to think; have a second person attend at least every third interview; and validate themes against source transcripts rather than from memory. The deepest bias risk is the moderator's own prior beliefs about what the study will find.

What's the difference between user interviews and customer interviews?

User interviews ask how people use the product. Customer interviews ask how people decided to buy it and what would make them leave. Both have a place; product teams typically need user interviews more, sales and marketing need customer interviews more. Conflating them is a common mistake; a user interview that drifts into purchase rationale rarely produces useful product insight.

How do I analyse user interviews?

Tag the transcript with codes describing behaviour and reasoning. Cluster codes into themes. Validate themes against the raw quotes that support them. Rank themes by impact and confidence. AI can do the first pass of tagging and clustering credibly in 2026; the senior researcher validates, challenges and reframes, then writes the synthesis. Skipping validation turns AI-assisted analysis into AI-substituted analysis, which produces plausible but unreliable findings.
