What UX research is for
UX research exists to reduce the cost of being wrong. That is the operational definition. Every research method, framework and ritual ultimately serves that purpose, and methods that don't pay back the cost of running them eventually get cut.
Three things change once you accept that framing. Method selection becomes a function of decision risk rather than personal preference. Sample sizes shrink to what the decision actually needs. And the standard for "good" research is no longer "rigorous" but "informed the decision before the decision was made". A piece of research that arrives the week after the team has shipped is not late research. It is wasted research.
This guide is structured around that view. It maps the major methods to the kinds of decisions they inform, names the trade-offs in time and stakeholder cost, and identifies the situations where each method is wasted effort. It is the pillar of UX Companion's research cluster, and the spokes that follow (user interviews, usability testing, and more to come) cover the operational details of each method.
The two axes
Almost every UX research method can be located on two axes. Naming them up front means the rest of the guide reads as a map rather than a list.
How modern teams categorise UX research
Qualitative versus quantitative. Qualitative answers "why" and "how". It is depth at the cost of scale. Quantitative answers "how much", "how often", and "which segment". It is scale at the cost of depth. Mature research practices combine both: the qualitative finding informs the quantitative measure; the quantitative anomaly triggers the qualitative investigation.
Generative versus evaluative. Generative methods discover what the problem is. They are used early, when the team is uncertain. Evaluative methods test a specific design or hypothesis. They are used later, when the team has a candidate solution.
Most methods sit in one cell of this 2x2 grid; a few move between cells depending on how they are run. Interviews are qualitative-generative. Surveys with closed questions are quantitative-evaluative. Usability testing is qualitative-evaluative. Analytics is quantitative-generative or quantitative-evaluative depending on what you are looking for. Knowing the cell tells you what the method is for and what it cannot do.
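The grid can also be written down as data, which makes the placements easy to check and extend. This is an illustrative sketch only: the names `Cell`, `METHOD_CELLS` and `methods_in` are hypothetical, and the placements are the four named in the paragraph above.

```python
from enum import Enum


class Cell(Enum):
    QUAL_GENERATIVE = "qualitative-generative"
    QUAL_EVALUATIVE = "qualitative-evaluative"
    QUANT_GENERATIVE = "quantitative-generative"
    QUANT_EVALUATIVE = "quantitative-evaluative"


# Placements taken from the paragraph above; analytics can serve two cells.
METHOD_CELLS = {
    "user interviews": {Cell.QUAL_GENERATIVE},
    "closed-question surveys": {Cell.QUANT_EVALUATIVE},
    "usability testing": {Cell.QUAL_EVALUATIVE},
    "behavioural analytics": {Cell.QUANT_GENERATIVE, Cell.QUANT_EVALUATIVE},
}


def methods_in(cell):
    """Return the methods that can serve the given cell, sorted by name."""
    return sorted(name for name, cells in METHOD_CELLS.items() if cell in cells)


print(methods_in(Cell.QUANT_EVALUATIVE))
```

Storing a set of cells per method, rather than a single cell, is what lets a method such as analytics appear in more than one place.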
Qualitative methods
Qualitative methods produce small numbers of deep observations. They are the method of choice when the team needs to understand a behaviour, motivation, mental model, or workflow at a level of detail that numbers cannot reach. The cost is sample size: a qualitative study of 8 to 12 participants surfaces what a survey of 500 cannot, but it cannot tell you how widespread the finding is across the user base.
Five qualitative methods cover the majority of practical product work. User interviews for understanding motivations, mental models, and decision contexts. Contextual enquiry for observing real-world workflows in their environment, especially valuable for enterprise software and field-worker products. Usability testing for evaluating a design against a task. Diary studies for capturing experiences that unfold over days or weeks rather than a single session. And concept testing for sounding out a candidate idea before committing engineering effort.
The honest signal in qualitative work is convergence. By participant five or six in a qualitative study, themes recur. If they don't, the recruitment screener was wrong, the segment isn't homogeneous, or the question being studied is shallower than the team assumed. Senior researchers read non-convergence as a signal, not a failure.
Quantitative methods
Quantitative methods produce numbers across enough people to generalise. They are the method of choice when the team needs to prioritise, segment, or demonstrate effect. They are also the method most frequently misused, because the apparent precision of a number invites overconfidence.
Five quantitative methods cover the majority of practical product work. Behavioural analytics for understanding what users actually do across the product. Surveys for capturing self-reported attitude, satisfaction, and demographic distribution. Card sorting at scale for testing information architecture. Tree testing for evaluating navigation findability. A/B testing for measuring the lift of a specific design change in production.
The recurring quantitative trap is the assumption that a statistically significant result is a meaningful one. A 0.4 percent lift in clickthrough on a multi-million-user surface can be real and statistically robust and still not be worth the engineering cost to maintain. A senior researcher's job is to translate the statistical claim into a commercial one before stakeholders run with the headline number.
Generative versus evaluative
The single most common research mistake in product teams is reaching for the wrong end of this axis for the question at hand. Generative methods are used when the team does not yet know what the problem is. Evaluative methods are used when the team has a candidate solution and needs to know if it works.
A team that runs usability tests on a prototype before deciding whether the underlying problem is the right one to solve has skipped the generative step. The usability test will produce findings; those findings will not answer the strategic question. A team that runs more interviews after a clear opportunity has emerged is in the opposite trap: spending effort on confirmation when they could be testing the candidate solution.
The most common stakeholder confusion I see is the request for "research" without specifying the decision the research has to inform. The first hour of any research kickoff should be spent forcing the decision into the open. "What changes if the research says X? What changes if it says Y?" If the answer is the same, the research is not worth running.
The eleven core methods
An operational reference. For each method: what it is, when it earns its place, when it doesn't, and the rough time-to-insight on a realistic project timeline.
1. User interviews
Cell. Qualitative-generative. Best for. Understanding motivations, mental models, decision contexts, the "why" behind observed behaviour. Sample size. 6 to 12 per segment. Avoid when. The team needs to know how widespread a behaviour is rather than why it happens. Time to insight. Two to four weeks from kickoff to readout, recruitment-bound. The operational detail is covered in the cluster's dedicated spoke, the user interviews guide.
2. Usability testing
Cell. Qualitative-evaluative. Best for. Identifying friction in a specific design against a defined task. Sample size. 5 to 8 per segment for qualitative findings; larger if numeric measures matter. Avoid when. The team has not yet decided what the design should do, or when the prototype is too sketchy to elicit realistic behaviour. Time to insight. One to two weeks. The cluster spoke on usability testing covers task design, moderation, and severity scoring.
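The 5-to-8 range is often justified with the cumulative problem-discovery model: if each participant independently hits a given issue with probability p, the expected share of issues seen at least once after n participants is 1 - (1 - p)^n. The sketch below assumes p = 0.31, a commonly quoted average from Nielsen and Landauer's work. Treat that value as an assumption, because rare issues and complex products can have a much lower p. The function name is hypothetical.

```python
def share_of_issues_found(n_participants, p_detect=0.31):
    """Expected share of issues observed at least once after n participants.

    Assumes every issue is hit independently with the same probability
    p_detect per participant, which is a simplification.
    """
    return 1 - (1 - p_detect) ** n_participants


for n in (3, 5, 8):
    print(n, round(share_of_issues_found(n), 2))
```

Under that assumption the curve flattens quickly after the fifth participant, which is why extra sessions within one segment tend to repeat known findings.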
3. Behavioural analytics
Cell. Quantitative-generative or evaluative. Best for. Understanding what users actually do at scale, surfacing drop-off points, segmenting behaviour by cohort. Sample size. Already in the data. Avoid when. The instrumentation is unreliable, bots or misattributed sessions obscure who the "users" are, or the team conflates correlation with explanation. Time to insight. Hours to days for a focused analysis; longer if instrumentation needs fixing first.
4. Surveys
Cell. Quantitative, generative or evaluative depending on question type. Best for. Measuring attitude, satisfaction, demographic distribution, and segment prevalence at scale. Sample size. 150 to 400 per cell as a serviceable minimum; effect sizes determine the floor. Avoid when. The team needs to know why people do things, when self-report is unreliable for the behaviour in question, or when the team is fishing for a finding rather than testing a hypothesis. Time to insight. One to three weeks.
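To see what 150 versus 400 respondents buys, it helps to compute the margin of error on a reported percentage. The sketch below uses the normal approximation at 95 percent confidence with the worst case p = 0.5. It assumes simple random sampling, which real survey panels rarely achieve, so read the numbers as a floor on uncertainty. The function name is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (150, 400):
    print(n, f"+/- {margin_of_error(n) * 100:.1f} points")
```

At 150 respondents a reported 50 percent carries roughly eight points of uncertainty either way; at 400 it is roughly five. Whether that is tight enough depends on the size of the difference the decision turns on.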
5. Card sorting
Cell. Mixed; qualitative if moderated and open, quantitative if unmoderated and closed. Best for. Understanding how users group concepts, informing information architecture. Sample size. 15 to 30 per segment for unmoderated; 8 to 12 for moderated. Avoid when. The IA decisions have already been made and the card sort is being used as theatre. Time to insight. Two weeks for unmoderated; longer if recruitment is hard.
6. Tree testing
Cell. Quantitative-evaluative. Best for. Testing whether users can find content within a proposed navigation, isolated from visual design. Sample size. 50 to 100 per tree. Avoid when. The team is testing wayfinding in the context of real visual design, where tree testing strips away the cues that matter. Time to insight. One to two weeks.
7. Diary studies
Cell. Qualitative-generative. Best for. Capturing experiences that unfold over time, where moment-in-time methods miss the texture of how a behaviour fits into a life or workflow. Sample size. 10 to 20 participants over 2 to 6 weeks. Avoid when. The behaviour you care about happens in a single session, or when participant fatigue would make the data unreliable. Time to insight. Six to ten weeks including fieldwork.
8. First-click testing
Cell. Quantitative-evaluative. Best for. Measuring whether users click the right thing first on a screen, isolated from completion. Sample size. 30 to 50 per design. Avoid when. The task realistically involves more than one decision, in which case completion rates and usability testing serve better. Time to insight. Days.
9. Contextual enquiry
Cell. Qualitative-generative. Best for. Observing users doing real work in their real environment. Disproportionately valuable for enterprise software, field-worker products, and any domain where the office desk study would miss the actual workflow. Sample size. 6 to 10 participants. Avoid when. The product is fully online and consumer-facing, where remote methods produce equivalent insight at lower cost. Time to insight. Three to six weeks.
10. Concept testing
Cell. Qualitative-evaluative or quantitative-evaluative. Best for. Sounding out a candidate idea before committing engineering effort. Catches the ideas that no user wants before the team builds them. Sample size. 8 to 12 qualitative; 200 to 400 quantitative. Avoid when. The concept is too abstract to evaluate without a working artefact, in which case usability testing on a prototype works better. Time to insight. One to three weeks.
11. A/B testing
Cell. Quantitative-evaluative. Best for. Measuring the behavioural lift of a specific change in production with real users and real stakes. Sample size. A function of base conversion rate and minimum detectable effect; many product teams chronically under-power tests. Avoid when. The change is small and the audience is small, when the team will run only one test, or when the metric being optimised is poorly chosen. Time to insight. Two to six weeks per test cycle.
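The dependence on base rate and minimum detectable effect can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a planning sketch rather than a substitute for your experimentation platform's calculator. The function name, the defaults (two-sided alpha of 0.05, power of 0.8) and the example rates are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(base_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion test.

    base_rate: control conversion rate, e.g. 0.05 for 5 percent.
    mde_abs:   minimum detectable effect in absolute terms, e.g. 0.005.
    """
    p1, p2 = base_rate, base_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical example: 5 percent base rate, detect a move to 5.5 percent.
print(sample_size_per_arm(0.05, 0.005))
```

With these example inputs the answer is roughly 31,000 users per arm, and halving the detectable effect roughly quadruples it. That scaling is the usual reason small-audience tests end up under-powered.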
Choosing the right method
The decision tree for method selection is shorter than most research literature suggests. Five questions get you most of the way there.
Five questions that pick the method
- What decision will this research inform? If there isn't one, the research is theatre. Name the decision before naming the method.
- Do we know what the problem is? If no, generative. If yes, evaluative.
- Do we need depth or scale? If depth, qualitative. If scale, quantitative. Most product research needs depth first, scale second.
- What's the cost of being wrong? High cost justifies higher-rigour methods. A pricing-page tweak does not justify a six-week diary study; a B2B product redesign does.
- What can the team actually act on? The most rigorous study in the world produces nothing if the organisation has no capacity to change the product in response.
Senior researchers often run this decision tree in their heads in five minutes during a stakeholder conversation. The discipline is making the questions explicit when the room is sceptical, when budgets are tight, or when a junior researcher needs to see how the call gets made.
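For a junior researcher, the five questions above can be sketched as a small function. This is a deliberately crude encoding, not a replacement for judgement: the function name, parameter names and return shape are all hypothetical, and real answers are rarely this binary.

```python
def pick_method_family(decision, problem_known, need, cost_of_wrong, can_act):
    """Turn answers to the five questions into a grid cell and a rigour level.

    decision:      the decision the research informs, or "" if none is named.
    problem_known: True if the team already knows what the problem is.
    need:          "depth" or "scale".
    cost_of_wrong: "low" or "high".
    can_act:       True if the organisation can change the product in response.
    Returns None when the research should not be run at all.
    """
    if not decision:   # question 1: no decision, no research
        return None
    if not can_act:    # question 5: findings nobody can act on
        return None
    stage = "evaluative" if problem_known else "generative"  # question 2
    kind = "qualitative" if need == "depth" else "quantitative"  # question 3
    rigour = "higher" if cost_of_wrong == "high" else "lightweight"  # question 4
    return {"cell": f"{kind}-{stage}", "rigour": rigour}


print(pick_method_family("Redesign onboarding?", False, "depth", "high", True))
```

The two early returns come first on purpose: they are the cases where the right amount of research is none.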
Six mistakes to avoid
These are the recurring patterns in weak research practice. Each is a failure mode I see often in audits of UX practice inside product teams.
- Method first, decision second. "Let's run usability tests" before the team has decided what the design should do. The findings inform nothing because there is no decision to inform.
- Over-recruiting and under-analysing. Twenty interviews recorded, three analysed, seventeen abandoned. Better to recruit six and synthesise rigorously.
- Leading questions, predictable answers. "How much do you love this feature?" gets the answer it deserves. Bias contamination is the most common cause of useless qualitative data.
- Surveys that ask everyone everything. Forty-question surveys with eight-percent completion rates and self-selection bias. The longer the survey, the worse the sample.
- Findings without prioritisation. A readout that lists forty observations gives stakeholders permission to pick the three they already wanted to act on. Rank the findings by impact and confidence; defend the ranking.
- Research that arrives after the decision. A four-week study that lands the week after the design is committed. The research wasn't bad; it was late. Lateness is a research failure.
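The ranking step named under "Findings without prioritisation" can be as simple as a scored sort. The sketch below is one possible scheme, not a standard: the 1-to-5 scales, the multiplicative score, and the example findings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    summary: str
    impact: int      # 1 (cosmetic) to 5 (blocks the core task); hypothetical scale
    confidence: int  # 1 (single anecdote) to 5 (seen across most participants)

    @property
    def score(self):
        return self.impact * self.confidence


def rank(findings):
    """Highest score first; ties keep their original order (sorted is stable)."""
    return sorted(findings, key=lambda f: f.score, reverse=True)


# Hypothetical findings for illustration only.
findings = [
    Finding("Export button hard to find", impact=2, confidence=5),
    Finding("Users abandon at address form", impact=5, confidence=4),
    Finding("One participant disliked the colour", impact=1, confidence=1),
]

for f in rank(findings):
    print(f.score, f.summary)
```

The score is only there to force an explicit ordering; the readout still has to defend why each item sits where it does.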
Stakeholder reality
UX research is performed inside an organisational context that rarely matches the textbook. Three patterns are worth naming because they shape what good research practice actually looks like in commercial environments.
Stakeholders rarely want the research they ask for. The PM who asks for "user research" usually wants validation that the planned feature is correct. The job of the senior researcher is to surface that, push back where appropriate, and translate the underlying decision into a research design that actually informs it. Doing the requested study verbatim, when the stakeholder is asking the wrong question, makes the researcher complicit in the failure that follows.
The deadline is usually fixed before the method is chosen. "We need findings by Friday" arrives more often than "what research would best inform this decision". A competent researcher trades scope and rigour against the deadline transparently: "If you give me two weeks, I can run six interviews; if I have three days, I can run a hallway test." The trade gets named in writing.
The readout is half the work. A brilliantly designed study presented badly produces zero behaviour change. A modestly scoped study presented well changes minds. Senior researchers in 2026 spend a disproportionate amount of their effort on stakeholder communication, not because the research matters less, but because the readout is where the decision actually shifts.
AI's place in research
By 2026, AI has rearranged what a researcher's day looks like rather than replaced the role. The pattern is consistent across product teams: AI accelerates the synthesis layer, which means more research can be turned around in the same calendar time, which means senior researchers do more strategic and stakeholder work and less mechanical coding.
The areas where AI now earns its place: tagging and clustering interview transcripts as a first pass, drafting summary findings, scoping survey questions, generating opportunity-sizing scaffolds from mixed inputs, and producing first-draft persona structures that humans then validate. The areas where AI continues to underperform: recognising what an interviewee is not saying, recruiting and rapport-building, deciding which finding matters, presenting findings to a sceptical room, and translating insight into product decisions.
The full operational view sits in the AI cluster: AI-assisted UX research covers the specific tools, workflows and risks. The framework for deciding which research tasks AI should and shouldn't touch is in what AI should not replace in UX.
Operational templates
Reference artefacts for the cluster. The library is growing; the items below are live now.
The UX Audit Severity Framework
The 0–4 severity rubric used in audit readouts. Applies cleanly to usability test findings and research observations as well.
Human Judgement vs AI Support framework
The three-category decision framework. Which research tasks should be human-led, AI-assisted, or AI-led with human review.
Frequently asked questions
What are the main UX research methods?
The major methods are user interviews, usability testing, surveys, behavioural analytics, card sorting, tree testing, diary studies, contextual enquiry, concept testing, first-click testing, and A/B testing. They divide on two axes: qualitative versus quantitative, and generative versus evaluative. Modern product research practices typically run two to four of these methods regularly, choosing based on the decision the research has to inform.
How do I choose the right UX research method?
Name the decision the research must inform. If the team does not yet know what the problem is, choose generative methods (interviews, diary studies, behavioural analytics). If the team has a candidate solution, choose evaluative methods (usability testing, tree testing, A/B). If the decision is about prioritisation across many users, choose quantitative methods (surveys, analytics segments). The method follows the decision.
What's the difference between generative and evaluative research?
Generative research uncovers what the problem is. It is broad, exploratory, and used early. Examples: interviews, diary studies, behavioural analytics. Evaluative research tests a specific design or hypothesis. It is narrow, confirmatory, and used later. Examples: usability testing, tree testing, A/B. Running evaluative methods before the underlying problem is understood is the most common research misstep.
How many users do I need for UX research?
For qualitative research, 5 to 8 users per segment surface most usability issues; 8 to 12 cover most generative questions. For quantitative research, the floor is set by the effect size you need to detect; 150 to 400 respondents per cell is a serviceable starting point for survey work. The constraint on research quality is almost always synthesis time, not sample size; most teams over-recruit and under-analyse.
How long does UX research take?
A scoped piece of qualitative research runs two to four weeks from kickoff to readout. A usability test runs one to two weeks. A diary study runs six to ten weeks because of fieldwork. Behavioural analytics on existing data takes hours to days. On the calendar, recruitment rather than analysis is the usual bottleneck. Stakeholders typically assume research is slower than it actually is.
Do startups need UX research?
Yes, but in a different shape from enterprise research. Startups need fewer participants, shorter cycles, and more weight on behavioural analytics and rapid usability tests. Five-user studies and two-week cycles serve startup contexts better than the eight-week study cycles common in mature organisations. The cost of getting the problem wrong is higher at a startup, which means research matters more, not less.
Can AI replace UX research?
No, but it reshapes the role. AI accelerates synthesis, coding, theme clustering, and first-pass analysis. It cannot recruit, build rapport, recognise what an interviewee is not saying, decide which finding matters, or carry an insight into a sceptical stakeholder room. Senior researchers in 2026 spend less time on mechanical synthesis and more time on study design, stakeholder facilitation, and decision translation.
What's the difference between UX research and market research?
Market research asks who the customer is and what they will buy. UX research asks how that customer behaves with a specific product, where they struggle, and what would help. They overlap at the boundaries: ethnography is shared territory, and segmentation work informs both. Healthy product organisations keep the two functions close together and have them share fieldwork.