What heuristic evaluation is
Heuristic evaluation is an expert review method. A senior practitioner walks through a product and rates it against a set of established usability principles. The output is a list of findings, each citing the heuristic it violates, each scored for severity, each paired with a recommended fix.
It was defined by Jakob Nielsen and Rolf Molich in the early 1990s. The Nielsen Norman Group still maintains the canonical list. Three decades on, it remains the single most useful framework in UX because it covers the structural failure modes of digital products with very little overlap and very few gaps.
When to use it
These are the right moments to run a heuristic evaluation, in rough order of return on time invested:
- Before a usability test. Catch the obvious failures so the test isn't dominated by them.
- Before a redesign or replatforming. Baseline the original to avoid regressing the working parts.
- After a conversion drop. Heuristic review surfaces the qualitative failures analytics can't see.
- As a quarterly product health check. Bake it into the team's cadence. Every quarter, a senior practitioner walks the core flows against the heuristics.
- As part of a wider UX audit. Heuristic review is the methodological spine; most audits include it as the first pass. See the full UX audit guide for context.
What follows is the canonical ten, in Nielsen's original order. Each entry covers the principle and the failure patterns we see most often, with a note on fixing or scoring where one applies.
1. Visibility of system status
The system always keeps users informed about what is going on, through appropriate feedback within reasonable time.
Failure patterns
- Loading states absent or misleading.
- Submitting forms with no acknowledgement.
- Multi-step flows that don't show progress.
- Async background actions (file uploads, payments) with no status surface.
- "Saved" indicators that never appear or never disappear.
2. Match between system and the real world
The system should speak the user's language, with words, phrases and concepts familiar to the user. It should follow real-world conventions and present information in a natural and logical order.
Failure patterns
- Internal jargon leaking into the UI ("entity", "object", "instance").
- Date formats that don't match the user's locale.
- Icons that mean something in the design system but nothing to the user.
- Information architectures organised by the company's org chart rather than the user's mental model.
- Error codes shown without translation.
The most common version of this failure is shipping the engineering team's vocabulary unchanged. Audit fix: a UX writing pass on every user-facing label.
3. User control and freedom
Users often choose system functions by mistake. They need a clearly marked emergency exit to leave the unwanted state without going through an extended dialogue. Support undo and redo.
Failure patterns
- Destructive actions with no confirmation step.
- Multi-step flows with no obvious way to back out.
- Modal dialogs that trap focus and offer only one path forward.
- No undo on delete, cancel, or unsubscribe actions.
- Saved drafts that can't be recovered after navigating away.
This heuristic overlaps with WCAG 3.3.4 (error prevention for legal, financial or data submissions). Where they overlap, cite the WCAG criterion in the audit — it carries more weight than the heuristic alone.
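To make the "no undo on delete" pattern concrete, here is a minimal sketch of a reversible delete. The `Inbox` class and its method names are hypothetical, chosen only for illustration; a real product would also need persistence and an expiry window for the undo.

```python
from dataclasses import dataclass, field


@dataclass
class Inbox:
    """Hypothetical list of items that supports undoing a delete."""
    items: list[str] = field(default_factory=list)
    _trash: list[tuple[int, str]] = field(default_factory=list)

    def delete(self, index: int) -> str:
        """Remove an item but remember its position so the action can be undone."""
        item = self.items.pop(index)
        self._trash.append((index, item))
        return item

    def undo_delete(self) -> bool:
        """Restore the most recently deleted item; return False if nothing to undo."""
        if not self._trash:
            return False
        index, item = self._trash.pop()
        self.items.insert(index, item)
        return True
```

The design choice worth noting is that the delete is immediate and the exit is cheap: the user is not interrupted with a confirmation dialog, but a single "Undo" restores the previous state.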
4. Consistency and standards
Users should not have to wonder whether different words, situations or actions mean the same thing. Follow platform and industry conventions.
Failure patterns
- Two different button styles for the same action across the product.
- Save buttons in different positions in different flows.
- Drift between the design system and the production code.
- Custom controls that mimic but don't behave like native ones (dropdowns that don't open on space, sliders that don't move on arrow keys).
- Different terminology for the same concept across pages.
5. Error prevention
Better than good error messages is a careful design that prevents a problem from occurring in the first place.
Failure patterns
- Free-text inputs where a constrained input would do (postcodes, phone numbers, dates).
- No inline validation; errors only surface on submit.
- Identical or near-identical buttons placed close together ("Save" next to "Delete").
- Time-bounded actions with no warning before timeout.
- Auto-correcting input that silently changes user data without confirmation.
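Two of the patterns above, free-text input and submit-only validation, can be illustrated with a small sketch. It assumes a hypothetical date-of-birth field; `validate_birth_date` and its messages are our own illustration, not part of any library.

```python
from datetime import date
from typing import Optional


def validate_birth_date(raw: str, today: date) -> Optional[str]:
    """Return a message to show inline next to the field, or None if the value is fine.

    Intended to run on every change or blur, not only on submit.
    """
    try:
        value = date.fromisoformat(raw.strip())
    except ValueError:
        # Name the expected format and give an example instead of "Invalid input".
        return "Enter the date as YYYY-MM-DD, for example 1990-04-23."
    if value > today:
        return "Date of birth can't be in the future."
    return None
```

A constrained control such as a date picker removes the first branch entirely, which is usually the better fix; the validator is the fallback when free text has to stay.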
6. Recognition rather than recall
Minimise the user's memory load by making objects, actions and options visible. The user should not have to remember information from one part of the dialogue to another.
Failure patterns
- Multi-step flows that surface key context only on the first step.
- Form fields where the label disappears when the user starts typing.
- Sub-menus that require the user to remember the path to find them again.
- Reference codes shown briefly then never again.
- Search results that strip the original query from the page.
7. Flexibility and efficiency of use
Accelerators, unseen by the novice user, may speed up interaction for the expert user. Allow users to tailor frequent actions.
Failure patterns
- No keyboard shortcuts for repetitive actions.
- Bulk actions absent in lists that obviously need them (selecting many items, sending many invites).
- No way to save common configurations as templates or favourites.
- Workflows that force novice steps on expert users with no opt-out.
This heuristic is the one most often deprioritised in audits because expert users are a smaller cohort. Worth scoring honestly; the lifetime value of expert users is disproportionate.
8. Aesthetic and minimalist design
Dialogues should not contain information that is irrelevant or rarely needed. Every extra unit of information competes with the relevant units and diminishes their relative visibility.
Failure patterns
- Form fields that ask for information the company won't use.
- Dashboards with twenty metrics where four would do.
- Marketing chrome around transactional moments (signup, checkout, password reset).
- Notifications, banners and cookie modals stacked on the same page.
- Decorative animation that competes with the primary action.
"Aesthetic" here doesn't mean prettier; it means quieter. The audit fix is almost always to remove, not to redesign.
9. Help users recognise, diagnose and recover from errors
Error messages should be expressed in plain language, precisely indicate the problem, and constructively suggest a solution.
Failure patterns
- "Something went wrong" with no specific guidance.
- Validation messages that name the rule violated but not the fix ("Invalid input").
- Error codes shown without explanation.
- Blame-laden language ("You entered an incorrect password" rather than "That password didn't match").
- Errors that disappear before the user can read them.
This is the heuristic most directly improved by good UX writing. The UX writing generator produces compliant variants for any error context.
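One way to enforce the "problem plus fix" structure is to make both parts mandatory wherever error copy is defined. The sketch below assumes hypothetical internal codes (`AUTH_401`, `CARD_DECLINED`) and hypothetical copy; the point is only that every message, including the fallback, tells the user what to do next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserError:
    problem: str  # what happened, in plain language and without blame
    fix: str      # what the user can do next

    def render(self) -> str:
        return f"{self.problem} {self.fix}"


# Hypothetical internal codes mapped to user-facing copy.
MESSAGES = {
    "AUTH_401": UserError("That password didn't match.",
                          "Try again or reset your password."),
    "CARD_DECLINED": UserError("Your bank declined the payment.",
                               "Check the card details or use another card."),
}

# Unknown codes still get a next step rather than a bare "Something went wrong".
FALLBACK = UserError("We couldn't complete that request.",
                     "Please try again, and contact support if it keeps happening.")


def message_for(code: str) -> str:
    """Translate an internal error code into plain-language copy."""
    return MESSAGES.get(code, FALLBACK).render()
```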
10. Help and documentation
Even though it is better if the system can be used without documentation, it may be necessary to provide help. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.
Failure patterns
- Help links that lead to generic marketing pages.
- Documentation organised by feature rather than by user task.
- "Contact us" as the only available support path.
- In-product help that opens a new tab and loses the user's context.
- Tooltips and microcopy missing on complex controls.
Severity scoring
Findings without severity scores get ignored. Use a 0 to 4 scale, applied honestly. Most teams default to scoring everything as severity 3 because it feels safe; this dilutes the prioritisation and the roadmap becomes a flat list.
How to score each finding
- 0 — Not a usability problem. The finding was an opinion, not an issue against the heuristic. Drop it from the report.
- 1 — Cosmetic. Need not be fixed unless extra time is available. Visual polish, minor inconsistencies.
- 2 — Minor. Fix if time allows. Users encounter the problem occasionally and work around it.
- 3 — Major. Must be fixed before the next release. Users encounter the problem frequently and it slows or blocks them.
- 4 — Catastrophe. Imperative to fix. Users cannot complete the task, or the failure causes data loss, financial harm, or accessibility exclusion.
Severity is a function of three factors: frequency (how often it occurs), impact (how hard it is for the user to overcome), and persistence (whether users keep running into it, or can avoid it once they know about it). Score honestly; the roadmap is the document the team will actually act on.
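The scale itself does not prescribe how to combine the three factors. As a rough sketch, assuming each factor is rated 1 (low) to 3 (high), one possible mapping looks like this. The thresholds and the `blocks_task` flag are our own assumptions, not part of Nielsen's scale, and a score of 0 is handled by dropping the finding before this step.

```python
def severity(frequency: int, impact: int, persistence: int,
             blocks_task: bool = False) -> int:
    """Map three 1-3 ratings to a 1-4 severity score (illustrative thresholds)."""
    for name, value in (("frequency", frequency),
                        ("impact", impact),
                        ("persistence", persistence)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2 or 3, got {value}")

    # Task failure, data loss, financial harm or exclusion is always a 4.
    if blocks_task:
        return 4

    total = frequency + impact + persistence  # ranges from 3 to 9
    if total <= 4:
        return 1  # cosmetic
    if total <= 6:
        return 2  # minor
    return 3      # major
```

Writing the rule down, whatever the exact thresholds, makes it harder to default every finding to 3.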
Operational workflow
A defensible heuristic evaluation runs in five steps. Same shape for a one-person review and a five-person panel.
Running the evaluation
- Define the scope. Which screens, which flows, which user roles. Write it down in a single paragraph.
- Two passes per evaluator. First pass: get a feel for the product. Second pass: methodically check each screen against the ten heuristics. The two-pass rule materially improves catch rate.
- Capture findings as they appear. Each finding: screenshot, heuristic violated, severity score, recommended fix, estimated effort. Don't wait until the end to write them up.
- Consolidate across evaluators. If you used more than one evaluator, merge duplicates, reconcile severity scores, and rank.
- Present, don't just send. Walk the team through the top findings live. The deliverable is behaviour change, not a document.
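The capture and consolidation steps above can be sketched as a small data structure and a merge function. The field names, the `issue` slug used to detect duplicates, and the choice of the upper median to reconcile scores are all assumptions made for illustration; many teams reconcile by discussion instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median_high


@dataclass(frozen=True)
class Finding:
    evaluator: str
    screen: str
    heuristic: int        # 1-10, in Nielsen's order
    issue: str            # short slug shared between evaluators (assumption)
    severity: int         # 0-4
    fix: str
    effort_days: float
    screenshot: str = ""  # path or URL to the evidence


def consolidate(findings: list[Finding]) -> list[dict]:
    """Merge duplicates, reconcile severity, and rank highest severity first."""
    groups: dict[tuple[str, int, str], list[Finding]] = defaultdict(list)
    for f in findings:
        groups[(f.screen, f.heuristic, f.issue)].append(f)

    merged = []
    for (screen, heuristic, issue), dupes in groups.items():
        score = median_high(f.severity for f in dupes)
        if score == 0:
            continue  # severity 0 is dropped from the report
        merged.append({
            "screen": screen,
            "heuristic": heuristic,
            "issue": issue,
            "severity": score,
            "reported_by": sorted({f.evaluator for f in dupes}),
            "fix": dupes[0].fix,
        })
    # Ties on severity are broken by how many evaluators reported the issue.
    return sorted(merged, key=lambda m: (-m["severity"], -len(m["reported_by"])))
```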
Three to five evaluators is the empirical sweet spot. A single evaluator catches around 35 percent of issues in the typical product; three evaluators catch around 60 percent; five catch around 75 percent. Beyond five, the return diminishes sharply. If you're auditing alone, score everything more conservatively — your blind spots are larger than you think.
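The diminishing return is easier to see as percentage points gained per additional evaluator. The short sketch below uses only the approximate figures quoted above; the function name is our own.

```python
# Approximate share of issues found, as quoted above (evaluators -> percent).
CATCH_RATE = {1: 35, 3: 60, 5: 75}


def marginal_gain(rates: dict[int, float]) -> dict[int, float]:
    """Percentage points gained per added evaluator between the quoted team sizes."""
    gains = {}
    prev_n, prev_rate = 0, 0.0
    for n in sorted(rates):
        gains[n] = (rates[n] - prev_rate) / (n - prev_n)
        prev_n, prev_rate = n, rates[n]
    return gains


print(marginal_gain(CATCH_RATE))  # {1: 35.0, 3: 12.5, 5: 7.5}
```

On these figures the first evaluator contributes 35 points, the second and third about 12.5 each, and the fourth and fifth about 7.5 each.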
Frequently asked questions
What is heuristic evaluation in UX?
An expert review method where a senior practitioner rates a product against established usability principles. Nielsen's 10 heuristics are the most common. The output is prioritised, severity-scored findings.
What are Nielsen's 10 heuristics?
Visibility of system status; match between system and the real world; user control and freedom; consistency and standards; error prevention; recognition rather than recall; flexibility and efficiency of use; aesthetic and minimalist design; help users recognise, diagnose and recover from errors; help and documentation.
How many evaluators do I need?
Three to five. A single evaluator catches around 35 percent of issues; three catch around 60 percent; five catch around 75 percent. Beyond five, the return diminishes.
How do I score severity?
Use a 0 to 4 scale: 0 is not a problem, 1 cosmetic, 2 minor, 3 major, 4 catastrophe. Severity is a function of frequency, impact and persistence. Score honestly.
Heuristic evaluation vs usability testing — which first?
Heuristic evaluation first. It catches the known failure modes cheaply so the usability test can focus on unknowns. Reversing the order means paying users to discover problems a senior practitioner would have flagged in a morning.