
I Use AI to Complete UX Research 20% Faster. The Time It Frees Up Goes on Strategy, Stakeholder Influence, and the Strategic Judgment AI Can't Match Yet.

I turned 400+ qualitative observations into a quantitative prioritisation framework, built with AI and validated by hand. The methodology spans synthesis, storytelling, discovery, and documentation. It includes where AI failed, because knowing the limits matters as much as the applications.

Five vintage toy robots in blue, purple, and grey lined up on white background representing AI as tools for UX research.
Toy robots from an era when AI was science fiction. The real thing is more useful than they imagined, and more limited than most people admit. Photo by Eric Krull on Unsplash.
The 400+ observation case study on this page comes from work at By Miles in 2025. The methodology it demonstrates has been applied and refined across every study since.

The Headlines

20% faster research. A quantitative prioritisation framework built from 400+ qualitative observations. A frank account of where AI failed completely.

I use AI selectively across synthesis, storytelling, discovery, and documentation. Every application is planned, validated, and sense-checked against raw data. The strategic judgment stays with me. This page shows where the line is drawn and why it produces better research.


AI Earns Its Place. It Doesn't Get a Free Pass.

My strategic work requires human judgment, business context, and domain expertise no AI model currently has. Problem reframing, opportunity identification, stakeholder influence, roadmap input: these stay with me. AI earns its place on the time-consuming work that doesn't require that judgment. Sentiment analysis, report drafting, discovery validation, journey narratives.

When I kick off a study, I map the full workflow before touching any tools, identifying where AI can act efficiently and ethically, and where it can't. Every AI output gets validated against raw data. PII gets removed before anything leaves my hands. And I know from direct experience where AI fails completely, because I've watched it fabricate sources, invent quotes, and confuse one participant's feedback with another's.

That experience is part of the methodology. Knowing the limits is as important as knowing the applications.


400+ Observations. No Way to Prioritise Them. So I Built One.

In October 2025, I led By Miles' first remote, moderated member interview study. Three hour-long sessions. Eight emerging themes. 400+ individual observations. A senior product manager who needed a defensible prioritisation framework, not just a list of findings.

Qualitative research doesn't naturally produce one. So I built it.

Six Days of Observation Work AI Can't Touch

I watched each session twice: once without notes to absorb the full picture, once capturing everything. Verbatim quotes, body language, on-screen interactions, hesitancy, dwell time. Every observation landed as an individual sticky in FigJam, organised by theme and participant. 400+ stickies before AI touched a single one.

PII came out before anything moved. Names, postcodes, policy numbers, email addresses, all scrubbed. Only then did the data transfer into Google Sheets, structured and ready for analysis.
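The scrubbing step can be partly automated before a manual check. The following is a minimal sketch, not the process used in the study: the regular expressions are illustrative, the `POL-` policy-number format is hypothetical, and names are only caught if they are supplied in a `known_names` list, so a human pass over the data is still needed.

```python
import re

# Illustrative patterns only. Real data needs patterns tuned to the
# organisation's own formats, plus a manual review afterwards.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "POSTCODE": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b", re.IGNORECASE),
    "POLICY": re.compile(r"\bPOL-\d{6,}\b"),  # hypothetical policy-number format
    "PHONE": re.compile(r"(?<!\w)(?:\+44\s?|0)\d(?:\s?\d){8,9}(?!\w)"),
}


def scrub(text, known_names=()):
    """Replace recognisable PII with bracketed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    # Regex cannot recognise names, so they are passed in per study.
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text


sample = ("Sarah emailed sarah.j@example.com from SW1A 1AA, "
          "policy POL-1234567, call 07700 900123")
print(scrub(sample, known_names=["Sarah"]))
```

Placeholders rather than deletions keep each observation readable, so it is still clear that a participant mentioned an email address or a policy number without revealing which one.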

AI can't watch videos. It can't read body language. It can't tell the difference between what a participant does and what they say. The six days of manual observation aren't a preliminary step. They're where the strategic judgment happens.

Three Days. This Is Where AI Earned Its Place.

Sentiment Analysis

I fed ChatGPT the spreadsheet in small chunks, one task at a time, asking it to extract up to seven insights per sentiment category: positive, neutral, negative, and ideas for improvement. Seven, not five or ten. Five left gaps; ten produced too much noise to work with efficiently across multiple participants.

Every insight went back to the raw transcripts. ChatGPT hallucinates. It makes up quotes. It confuses one participant's feedback with another's when conversations run too long. Roughly 15 to 25% of its output needed correction: hallucinations, misattributions, or generic insights with no grounding in the data.

That error rate isn't a reason to abandon the tool. It's a reason to validate obsessively.

Empathy Categorisation

I asked ChatGPT to classify each observation across three dimensions: empathy lens (Pain, Neutral, or Gain), empathy category (Does, Says, Feels, or Thinks), and a one-sentence reasoning for each classification. Traditional empathy mapping across 400+ observations would have been unworkable. Spreadsheet format made it scalable.

The Does category carries the most weight in my scoring. Observed behaviour is more reliable than stated preference, particularly when participants are incentivised. A £50 Amazon voucher changes what people say. It doesn't change what they do on screen. That distinction shaped the entire scoring mechanism.
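Because the classification uses fixed label sets, model output can be checked mechanically before any human review of the reasoning. A minimal sketch, assuming each classified observation is a dictionary with `lens`, `category`, and `reasoning` keys (these field names are assumptions, not the study's actual column names):

```python
# Allowed values, taken from the three classification dimensions described above.
ALLOWED = {
    "lens": {"Pain", "Neutral", "Gain"},
    "category": {"Does", "Says", "Feels", "Thinks"},
}


def invalid_rows(rows):
    """Return (row_index, problem) pairs for classifications needing a manual look."""
    problems = []
    for i, row in enumerate(rows):
        for field, allowed in ALLOWED.items():
            if row.get(field) not in allowed:
                problems.append((i, f"{field}={row.get(field)!r} is not an allowed value"))
        if not (row.get("reasoning") or "").strip():
            problems.append((i, "missing reasoning"))
    return problems


rows = [
    {"lens": "Pain", "category": "Does", "reasoning": "Hesitated on the top-up screen."},
    {"lens": "Painful", "category": "Says", "reasoning": ""},
]
for index, problem in invalid_rows(rows):
    print(index, problem)
```

A check like this only catches malformed labels. Whether "Pain" or "Does" is the right label for an observation is still a judgment made against the raw data.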

Hybrid Scoring Mechanism

Building a scoring system manually would have taken two days and produced something far less sophisticated. With AI, it took a fraction of that time and produced something the product team could interrogate, filter, and reuse.

I worked with ChatGPT to build eight metrics into the framework:

| Metric | Calculation | Purpose |
| --- | --- | --- |
| Evidence Weight | Does = 3, Says = 2, Feels = 2, Thinks = 1 | Observed behaviour is most reliable |
| Sentiment Weight | Negative = 3, Idea = 2, Neutral = 1, Positive = 0 | Prioritises actionable insights |
| Lens Weight | Pain = 3, Neutral = 2, Gain = 1 | Gives more weight to pain points |
| Theme Frequency | Count of observations with same theme | Identifies recurring patterns |
| Opportunity Score | Sentiment + Lens + log(1 + Frequency) | Highlights impactful, recurring issues |
| Pain Count | Observations where Lens = Pain for theme | Identifies problem areas |
| Gain Count | Observations where Lens = Gain for theme | Identifies positive experiences |
| Tension Index | Absolute difference: \|Pain Count - Gain Count\| | Spots conflicting feedback |

I didn't accept the formulas blindly. I stepped through every calculation, questioned the methodology, and cut the metrics that didn't hold up under scrutiny. The Tension Index didn't pass that test. It didn't make it into the final recommendations.

AI enabled a level of analytical sophistication I couldn't have achieved alone. Every formula was validated by hand before it was used.
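The metrics in the table can be reproduced outside a spreadsheet for hand-checking. This is a minimal sketch under stated assumptions: the field names (`theme`, `category`, `sentiment`, `lens`) and the sample observations are invented for illustration, and the logarithm is taken as the natural log because the table does not state a base. Evidence Weight is kept as its own column, since the table does not say how it combines with the Opportunity Score.

```python
import math
from collections import Counter

EVIDENCE_WEIGHT = {"Does": 3, "Says": 2, "Feels": 2, "Thinks": 1}
SENTIMENT_WEIGHT = {"Negative": 3, "Idea": 2, "Neutral": 1, "Positive": 0}
LENS_WEIGHT = {"Pain": 3, "Neutral": 2, "Gain": 1}


def score(observations):
    """Add the eight metrics from the table to each observation."""
    frequency = Counter(o["theme"] for o in observations)
    pain = Counter(o["theme"] for o in observations if o["lens"] == "Pain")
    gain = Counter(o["theme"] for o in observations if o["lens"] == "Gain")
    scored = []
    for o in observations:
        theme = o["theme"]
        sentiment = SENTIMENT_WEIGHT[o["sentiment"]]
        lens = LENS_WEIGHT[o["lens"]]
        scored.append({
            **o,
            "evidence_weight": EVIDENCE_WEIGHT[o["category"]],
            "sentiment_weight": sentiment,
            "lens_weight": lens,
            "theme_frequency": frequency[theme],
            # Natural log is an assumption; the source does not give the base.
            "opportunity_score": sentiment + lens + math.log(1 + frequency[theme]),
            "pain_count": pain[theme],
            "gain_count": gain[theme],
            "tension_index": abs(pain[theme] - gain[theme]),
        })
    return scored


# Made-up observations, purely to exercise the formulas.
observations = [
    {"theme": "top-ups", "category": "Does", "sentiment": "Negative", "lens": "Pain"},
    {"theme": "top-ups", "category": "Says", "sentiment": "Positive", "lens": "Gain"},
    {"theme": "pricing", "category": "Thinks", "sentiment": "Idea", "lens": "Pain"},
]
for row in sorted(score(observations), key=lambda r: r["opportunity_score"], reverse=True):
    print(row["theme"], round(row["opportunity_score"], 2), row["tension_index"])
```

The logarithm dampens frequency, so a theme mentioned forty times does not automatically outrank a severe pain point mentioned five times.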

A Few Hours. A Report the Whole Team Could Act On.

I built two documentation worksheets into the spreadsheet before the report stage: a Start Here tab explaining the spreadsheet's purpose, and a Key / Column Definitions tab detailing every column, its values, and how they were calculated. Both were originally built for stakeholders.

They turned out to be the most effective prompting tool I had. Starting a fresh ChatGPT conversation to avoid context bleed, I fed it the completed spreadsheet and had it read both tabs before drafting anything. The structure I'd designed to help stakeholders understand the data became the structure ChatGPT used to write about it.

The format I outlined: Executive Summary, What We Learned, What We Don't Know, Next Steps. ChatGPT drafted each section. I rewrote for tone, nuance, accuracy, and plain English, then ran everything through Grammarly before final edits.

What would have taken several days took a few hours. The strategic framing was mine throughout.

Speechless. Then a Reusable Framework.

I presented the spreadsheet at the Product Team Show and Tell. The Head of Product, not someone given to effusive reactions, said one word: "Speechless." The Product Designer I line managed was more precise: "You've found a way of analysing qualitative research using quantitative analysis methods. That's impressive."

The whole team asked about the scoring methodology. It became a reusable template for future studies and the team's standard framework for turning research into roadmap decisions. Top Opportunities drove planning conversations. Most Friction surfaced quick wins.

The study also gave leadership confidence that member research could be run efficiently at scale, which led directly to more discovery work being commissioned across the product.

Time saved:

| Approach | Time |
| --- | --- |
| Without AI | ~11-12 days (9 days synthesis + 2-3 days report writing) |
| With AI | ~9.5 days (9 days synthesis + a few hours report writing) |
| Efficiency gain | ~20% |
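The arithmetic behind the table can be checked in a few lines. A minimal sketch, assuming "a few hours" is counted as half a day, which is what the ~9.5 day total implies:

```python
def saving(baseline_days, with_ai_days):
    """Fraction of the baseline time saved."""
    return (baseline_days - with_ai_days) / baseline_days


WITH_AI_DAYS = 9.5  # 9 days synthesis + report drafting counted as half a day

low = saving(11, WITH_AI_DAYS)   # against the shorter baseline estimate
high = saving(12, WITH_AI_DAYS)  # against the longer baseline estimate
print(f"{low:.0%} to {high:.0%}")  # prints "14% to 21%"
```

The ~20% figure corresponds to the longer, 12-day baseline. In absolute terms the saving is 1.5 to 2.5 days per study.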

Where AI Fits Across the Rest of My Research Practice

Turning Insights Into Stories Stakeholders Remember

Generic insights don't move stakeholders. "37% of members feel anxious about running out of miles" lands differently when it becomes a specific person's story: a 45-year-old teacher who bought By Miles to save money on her commute, who books a trip to Scotland and suddenly can't work out whether she'll run out of miles, whether to top up now, or what happens if she goes over. Her anxiety isn't about cost. It's about control.

That reframe, from cost comprehension to confidence and control, changed the entire solution approach.

I use a prompt framework built around Pixar's storytelling principles: emotional journey, friction points, what the member needs from us. AI drafts the narrative. I refine for authenticity, tone, and strategic accuracy. What would take an hour takes twenty minutes. The insight and the framing are mine throughout.

Using AI to Pressure-Test My Own Thinking

I always generate my own thinking before opening any AI tool. A timer set to fifteen minutes, or a minimum of ten ideas produced manually. The reason is deliberate: AI anchors thinking. See its output first and it shapes what you generate next. My framing comes first. AI helps me find what I might have missed.

In practice: abstraction laddering done manually, then ChatGPT asked for an alternative ladder to compare against. Problem statements flipped myself, then five more inversions requested to stress-test the framing. The first three steps of a Five Whys taken manually, then ChatGPT pushed to go deeper, with the results checked against what the research actually shows.

What AI hasn't done in any of these exercises is reframe a problem better than I would have manually. It surfaces options faster. The judgment about which options matter is still mine.

Finding Competitors Faster Than I Would Manually

For competitive analysis, AI is useful for surfacing insurers outside the UK that share By Miles' proposition but aren't direct competitors. Pay-per-mile startups in Scandinavia, usage-based insurance disruptors in South America or Australia. These are companies I'd have found eventually through LinkedIn or Wellfound, but AI gets me there faster.

Most of the time. There have been occasions where manual research surfaced companies ChatGPT didn't return. AI accelerates the scanning. It doesn't replace the judgment about which competitors are worth learning from and why.

Deliberately Rough Visuals. Deliberately Fast Decisions.

I use DALL-E to generate rough visual concepts for workshops and stakeholder discussions, deliberately low-fidelity. Three to five different approaches to communicating a concept, produced in minutes, before a designer commits time to any of them.

The point isn't polish. It's to surface which strategic direction is worth pursuing before the real work begins. Sacrificial by design.

Documentation That Used to Take Two Hours. Now Takes Thirty Minutes.

Interview discussion guides, research plans, workshop summaries: I outline the structure and key points, ChatGPT drafts the sections, I refine for tone, accuracy, and context. A two-hour task becomes thirty minutes.

The time saved isn't a minor efficiency gain. Across a full research study, it compounds.

A Final Pass for Tone, Clarity, and Plain English

Before any research report or stakeholder presentation leaves my hands, it goes through two tools. Grammarly catches grammar and style issues. Claude checks tone, clarity, and accessibility, with a particular focus on plain English and scannability.

I use Claude rather than ChatGPT for this stage because it handles British English more reliably and is more nuanced about tone, particularly in communications aimed at senior stakeholders. Different tools for different jobs.


When AI Fails Completely. And Why That's Worth Documenting.

For complex strategic questions, I experimented with Google Gemini's Deep Research feature. Member attitudes toward safer driving, the use of driving data for personalised insurance. Questions that would take days to research manually.

Gemini produced a polished report in minutes. Credible-looking sources, complete with author names, publication dates, and URLs. Every one of them invented. The information it presented as established fact was fabricated from nothing.

Fact-checking that output took longer than doing the research manually would have. For strategic questions that require accuracy and depth, AI isn't a shortcut. It's a liability.

For that work, I rely on manual research, member surveys, qualitative interviews, CX team insights, first-party data, and previous studies. Sources I can verify. Evidence I can stand behind in a stakeholder meeting.


The Work AI Doesn't Touch

Live user interviews. Watching research videos. Strategic decisions. Final recommendations without validation. Stakeholder persuasion. Design critique. Assumption mapping. Roadmap prioritisation.

These are the parts of the practice AI doesn't touch. Not because the tools aren't capable enough yet. Because the work itself requires human judgment, domain expertise, and contextual understanding no prompt can replicate.

Live interviews need empathy, real-time adaptation, and the ability to follow a thread the participant didn't know they were pulling. Research videos need someone who can read what a participant does against what they say. Strategic decisions need business context and accountability. Roadmap prioritisation needs stakeholder alignment and risk assessment that lives outside any spreadsheet. Assumption mapping needs years of domain knowledge to get right.

AI informs. It doesn't decide. That line doesn't move.


What Makes This Methodology Defensible

No PII Leaves My Hands. Ever.

Working in insurance sharpens PII discipline in ways that carry across every regulated industry. Before anything leaves my hands, I scrub names, postcodes, policy numbers, email addresses, phone numbers, and any detail that could identify a participant. Every time, without exception.

In regulated industries, one data breach destroys trust and triggers regulatory action. Rigorous PII removal isn't just an ethical obligation. It's a practical competitive advantage in fintech, healthtech, legal, or any sector where data governance is non-negotiable.

One Task. One Participant. One Conversation.

ChatGPT loses context in long threads. It confuses participants, misattributes feedback, and hallucinates. One task per conversation, one participant per conversation. Start fresh every time.

In the 400+ observations study I learned this the hard way: feeding multiple participants into a single conversation caused ChatGPT to attribute one participant's feedback to another. The rule came from that mistake. It hasn't changed since.
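The rule can be enforced when preparing the data, so that no prompt ever mixes participants. A minimal sketch, assuming each spreadsheet row is a dictionary with a `participant` key and that 25 rows counts as a "small chunk" (both are assumptions for illustration):

```python
from collections import defaultdict


def batches_per_participant(rows, chunk_size=25):
    """Yield (participant, rows) batches that never mix participants."""
    by_participant = defaultdict(list)
    for row in rows:
        by_participant[row["participant"]].append(row)
    for participant, items in by_participant.items():
        for start in range(0, len(items), chunk_size):
            yield participant, items[start:start + chunk_size]


rows = [
    {"participant": "P1", "observation": "Paused on the pricing page."},
    {"participant": "P2", "observation": "Asked what happens at renewal."},
    {"participant": "P1", "observation": "Scrolled past the top-up button."},
]
for participant, batch in batches_per_participant(rows, chunk_size=2):
    print(participant, len(batch))
```

Each batch is then pasted into its own fresh conversation, with one task per conversation.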

Raw Data Is the Final Authority. Not AI Output.

Every AI output gets cross-referenced with the original transcripts. Every insight, every quote, every theme. Roughly 15 to 25% of ChatGPT's output needs correction: hallucinations, misattributions, generic insights with no grounding in the data, sentiment misclassifications that sound plausible but don't hold up.

That validation step isn't overhead. It's where the strategic judgment happens. AI surfaces patterns. I determine which ones matter and why.
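Part of the cross-referencing can be automated: a quote that does not appear in the transcript of the participant it is attributed to is either fabricated or misattributed. A minimal sketch, assuming insights carry `participant` and `quote` fields and transcripts are plain text keyed by participant (the structure and the sample data are invented for illustration):

```python
import re


def normalise(text):
    """Lower-case, straighten curly quotes, and collapse whitespace."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text.lower()).strip()


def unverified_quotes(insights, transcripts):
    """Return insights whose quote is not found in the attributed participant's transcript."""
    flagged = []
    for insight in insights:
        transcript = normalise(transcripts.get(insight["participant"], ""))
        quote = normalise(insight["quote"])
        if not quote or quote not in transcript:
            flagged.append(insight)
    return flagged


transcripts = {
    "P1": "I never know how many miles I've got left.",
    "P2": "The app is fine.",
}
insights = [
    {"participant": "P1", "quote": "how many miles I\u2019ve got left"},  # genuine
    {"participant": "P2", "quote": "how many miles I've got left"},       # misattributed
    {"participant": "P1", "quote": "It is too expensive"},                # fabricated
]
for insight in unverified_quotes(insights, transcripts):
    print(insight["participant"], insight["quote"])
```

This only catches quotes that fail a verbatim match. Paraphrased insights, generic claims, and plausible-sounding sentiment misclassifications still need to be read against the transcript by a person.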

Garbage In, Garbage Out. Structure the Data First.

AI can't read FigJam. It struggles with unstructured text. The workflow that made the 400+ observations study work: observations captured in FigJam for spatial thinking, transferred to Google Sheets for structure, then fed to ChatGPT in clean, analysable format.

The quality of AI output depends entirely on the quality of human-prepared inputs. Structure the data first. Everything else follows from that.

AI Enables Sophistication I Couldn't Achieve Alone

The hybrid scoring mechanism is the clearest proof of this. Building it manually would have taken two days and produced something far less sophisticated. AI made it possible in a fraction of the time and produced something the product team could interrogate, filter, and act on.

The same principle applies across smaller tasks: generating additional How Might We statements after I've exhausted my own thinking, validating abstraction ladders against AI-generated alternatives, drafting Pixar-style journey narratives I'd otherwise spend an hour writing. AI extends the range of what's achievable. The strategic thinking that makes it useful is still mine.

Map the Workflow Before Touching Any Tools

The moment a research study begins, I'm mapping the AI workflow: where it can act ethically and efficiently, where it can't, what prompts I'll need, and where the risks sit. Planning happens before any tools are opened.

The 400+ observations study worked because the workflow was designed before the research started. Manual observation capture first, because AI can't watch videos. PII removal before anything moved. Structured data transfer to enable analysis. AI-assisted sentiment scoring. Human validation at every output. AI-assisted report drafting. Human editing and strategic framing throughout.

Every step was intentional. That's why it worked.


Plan Before You Prompt. Everything Else Follows.

Every study starts with the same question: where can AI act ethically and efficiently, and where can it not? The answer changes by project, by method, and by what the research needs to produce. The discipline of asking it before touching any tools is what makes the methodology work.

Context shapes everything. A well-structured prompt built on clean, organised data produces something useful. An improvised prompt fed into a long, unfocused conversation produces output that undermines the research rather than accelerating it.

AI can't watch videos. It can't read body language. It can't make strategic decisions or build stakeholder trust. But planned deliberately, used selectively, and validated obsessively, it produces research that's faster, more sophisticated, and more defensible.

That's what this methodology delivers. And it's replicable across any research practice, in any regulated industry, at any scale.


What You Get When This Methodology Is in Your Team

Research completed 20% faster, without sacrificing depth or rigour. Two days per study freed up for strategic work rather than synthesis and documentation.

Hybrid scoring systems that turn qualitative research into quantitative prioritisation frameworks. Defensible, filterable, and reusable across future studies. The product team at By Miles now uses the methodology I built as their standard for turning research into roadmap decisions.

PII discipline and regulatory rigour built in from the start. A practical competitive advantage in any industry where data governance is non-negotiable.

More time on the work that moves products forward: stakeholder influence, problem reframing, discovery planning, and product collaboration. Less time on the parts AI can handle.


If you're looking for a researcher who treats AI as a methodology rather than a shortcut, I'm available for permanent, fully remote roles at Lead or Senior level.
