And how this fits inside real user research workflows
Mark has been a marketing researcher since 1984, starting his first company that year and never really stopping. Across five companies and four trade associations, he has focused on helping clients understand customers, make better decisions and avoid chasing false signals.
Today, he is doing that with AI. As regional director for the AI Collective in the US Southeast, and a long-time independent researcher, Mark is experimenting with digital twins and synthetic personas as a way to extend qualitative research rather than replace it.
In this episode of How I Vibe Design, he walks through how he builds synthetic users, why citations matter more than “vibes”, and where this practice fits inside real research workflows. This article captures the biggest takeaways from that conversation.
When I talk about synthetic users, I am usually talking about two different things: personas and digital twins. People often mix them together, but they behave very differently in research.
A persona is a composite, not a person. When I build a persona, I am blending insights from many people who share similar traits and contexts. In marketing language, it sits between an ICP (ideal customer profile) and a segment. If I am working with a baby products company, for example, I might have younger urban moms, rural moms with more space and different budgets, dads who do most of the shopping, and grandparents who spoil the kids. Each persona is defined by demographics, attitudes and archetypes, plus their role and KPIs on the client side.
A key point for me: for personas, I strip out PII. I am not trying to mirror any one individual. I am trying to represent a recognizable type in the market.
A digital twin is the opposite. A twin is based on one real individual with a clear digital footprint and often a direct relationship with me or my client. To build a twin, I pull together:
What I am trying to capture is not just biography. I want their belief system, decision logic and operating style. I use digital twins for people who are:
I always get consent when I do this for client work. In some cases I will offer to remove any identifying markers so the twin is de-identified.
TL;DR: Personas are composite and anonymized. Digital twins are grounded in a real person, used with permission and built from their own words.
Most of the negative press around synthetic users is aimed at a pattern I try to avoid: people let the model invent everything. If you say, “Give me a synthetic Head of Product for a SaaS company,” and stop there, the model will happily blend internet stereotypes into something that looks polished but has no lineage. It feels like insight, but it is an average of content, not a person or segment.
My approach is the opposite. I start from trusted, cited sources and treat synthetic users as structured research artifacts.
For every persona or twin, I want:
Even opinions should be anchored in specific places. I want to be able to say, “This comes from what they said in this talk,” or “This came from this paper.” If I cannot point to a source, I do not consider it part of the brain.
I prefer tools that automatically surface citations, show when a link is broken or suspect, and let me limit answers to my own corpora when needed. In other words, a synthetic user is only as trustworthy as the body of data behind it. The safest way to work is to make that body small, clear, and fully cited.
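To make that constraint concrete, here is a minimal sketch in Python. The sources and the scoring rule are purely illustrative, not the tooling described here; the point is the shape of the behavior: every passage in an answer carries its citation, and a question the corpus cannot cover gets a refusal instead of an improvisation.

```python
# Minimal illustration: a "brain" is a small, fully cited corpus,
# and answers may only be assembled from passages inside it.
# Sources and scoring are hypothetical placeholders.

BRAIN = {
    "Conference talk transcript, 2024": "What the person actually said about risk...",
    "Blog post, 2023 (public URL)": "What they wrote about evaluating vendors...",
    "Interview with me, 2025-03 (consented)": "What they told me about their KPIs...",
}

def retrieve(question: str, corpus: dict, min_overlap: int = 2):
    """Return (citation, passage) pairs whose passages share words with the question."""
    q_words = set(question.lower().split())
    hits = []
    for citation, passage in corpus.items():
        overlap = len(q_words & set(passage.lower().split()))
        if overlap >= min_overlap:
            hits.append((overlap, citation, passage))
    return [(c, p) for _, c, p in sorted(hits, reverse=True)]

def answer(question: str) -> str:
    hits = retrieve(question, BRAIN)
    if not hits:
        # The safe behavior: admit the sources do not cover this.
        return "This is outside my area and the sources do not cover it."
    return "\n".join(f"[{citation}] {passage}" for citation, passage in hits)
```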
I like to call the knowledge base for a persona or twin a brain. Technically, it is just a curated corpus, a bit like a small retrieval-augmented generation (RAG) system. But the “brain” metaphor is useful when I explain the idea to clients. Here is how I build one.
Step 1: Map the digital footprint
For a twin, I start by mapping where their public presence actually lives. That usually includes:
If it is someone I have interviewed, I add:
I also disambiguate people with the same name so I am not mixing two lives together.
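One lightweight way to keep that footprint organized, and to keep namesakes separated, is a simple source inventory. The fields below are hypothetical, just a sketch of the bookkeeping:

```python
from dataclasses import dataclass

@dataclass
class Source:
    """One entry in a twin's source inventory (hypothetical fields)."""
    citation: str      # how this source will be cited inside the brain
    location: str      # URL, file path, or "local transcript"
    person_check: str  # note confirming it is the right individual, not a namesake
    consented: bool    # True for interviews and other non-public material
    retrieved: str     # when the material was captured, e.g. "2025-12-05"

inventory = [
    Source("Conference talk, 2024", "https://example.com/talk",
           "Employer and bio match their LinkedIn profile", False, "2025-12-05"),
    Source("Interview with me, March 2025", "local transcript",
           "Spoke with them directly", True, "2025-12-05"),
]
```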
Step 2: Consolidate into a single document
Once I have a list of sources, I consolidate. Instead of uploading 20 different files, I:
That PDF is the brain for that twin or persona. It is much easier to manage and reason about one file per brain.
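As a rough sketch of the consolidation step (file paths and citations are hypothetical, and the final conversion to PDF is left to whatever tool you already use), the idea is simply to merge every cited source into one document, with each section headed by its citation:

```python
from pathlib import Path

# Hypothetical source list: (citation, path to the extracted text).
SOURCES = [
    ("LinkedIn profile, retrieved 2025-12-05", "sources/linkedin.txt"),
    ("Conference talk transcript, 2024", "sources/talk_2024.txt"),
    ("Interview notes, March 2025 (with consent)", "sources/interview_2025_03.txt"),
]

def consolidate(sources, out_path="SaaS_Product_Leaders_2025-12-05_MB.md"):
    """Merge all cited sources into one document, one section per source."""
    parts = []
    for citation, path in sources:
        text = Path(path).read_text(encoding="utf-8")
        parts.append(f"## Source: {citation}\n\n{text.strip()}\n")
    Path(out_path).write_text("\n".join(parts), encoding="utf-8")
    return out_path

if __name__ == "__main__":
    print("Wrote brain file:", consolidate(SOURCES))
```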
Step 3: Name it clearly and date it
Every brain file follows a simple naming convention, for example:
SaaS_Product_Leaders_2025-12-05_MB.pdf
Where I include the segment or person the brain covers, the date it was compiled and the initials of whoever curated it.
This tells me at a glance how fresh the brain is, and who curated it.
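That convention is easy to automate. A small helper along these lines (the field breakdown is my reading of the example above) keeps every brain file consistent:

```python
from datetime import date

def brain_filename(segment: str, curator_initials: str,
                   compiled: date | None = None, ext: str = "pdf") -> str:
    """Build a brain file name: <segment>_<YYYY-MM-DD>_<initials>.<ext>."""
    compiled = compiled or date.today()
    segment = "_".join(segment.split())  # spaces become underscores
    return f"{segment}_{compiled.isoformat()}_{curator_initials.upper()}.{ext}"

# brain_filename("SaaS Product Leaders", "mb", date(2025, 12, 5))
# -> "SaaS_Product_Leaders_2025-12-05_MB.pdf"
```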
Step 4: Upload into my analysis environment
I then upload this PDF into a workspace that lets me:
Once that is in place, the twin is ready for structured interviews. If this sounds abstract, the easiest way to start is to build a twin of yourself.
That is often what I do in workshops. We:
The value is not in having a twin. The value is in how you talk to it and how you validate its answers. I try to ask grounded questions that the sources can reasonably answer.
For example: “How has this person talked about risk and uncertainty in past talks?” “When they evaluate a new vendor, what tradeoffs do they emphasize?” “Based on their public writing, how might they respond to this concept?” These questions are grounded in the data, and if I know a rough true range for an answer, I can tell when the twin is drifting.
What I avoid are fantasy prompts like: “Invent a new product for them.” That is not research any more, that is creative writing.
I routinely ask twins obviously irrelevant questions as a stress test. If I have a twin of a Delta Airlines pilot and ask them how they would design a candy line for toddlers, and I get a confident, detailed answer, that tells me the system is stepping outside the brain. A good synthetic user should say something closer to: “This is outside my area and the sources do not cover this.”
For each twin, I carry a mental map of what is in scope: the core domain where they are experts, plus the adjacent topics they mention occasionally. This also helps me rule out areas that are clearly irrelevant.
When a synthetic user speaks with a lot of confidence far outside the domain of the sources, I treat that as a model problem, not a human insight.
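One way to make that pressure test repeatable is a small probe harness that asks deliberately out-of-scope questions and flags any answer that does not decline. The sketch below assumes an ask_twin(question) callable wired to whatever tool hosts the twin; the probes and refusal markers are illustrative only:

```python
# Pressure test: ask deliberately out-of-scope questions and flag
# confident answers that should have been refusals. `ask_twin` is an
# assumed callable wired to whatever hosts the twin; the probes and
# refusal markers are illustrative, not exhaustive.

OUT_OF_SCOPE_PROBES = [
    "How would you design a candy line for toddlers?",
    "What is your favorite 17th-century opera, and why?",
]

REFUSAL_MARKERS = ("outside my area", "sources do not cover", "not covered", "don't know")

def pressure_test(ask_twin, probes=OUT_OF_SCOPE_PROBES):
    """Return the probes the twin answered confidently instead of declining."""
    failures = []
    for question in probes:
        reply = ask_twin(question).lower()
        if not any(marker in reply for marker in REFUSAL_MARKERS):
            failures.append((question, reply[:120]))  # keep a short excerpt
    return failures

# Anything in the returned list suggests the model is stepping outside
# the brain, which should be treated as a model problem, not an insight.
```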
Lastly, in the setups I use, I make it explicit that I want no sycophancy, just direct, specific feedback. I want the behavior of a careful, slightly conservative participant, not a hype machine.
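In practice, that kind of instruction lives in the setup prompt. The wording below is illustrative, not a quote from any real configuration, but it shows the shape of the framing:

```python
# Illustrative setup instructions for a twin; the wording is an example,
# not a quote from any real configuration.
TWIN_SYSTEM_PROMPT = """\
You are a research participant built only from the attached, cited sources.
Rules:
- Answer only from the sources, and cite the source for every claim.
- If the sources do not cover a question, say so plainly and stop.
- No sycophancy. Give direct, specific feedback, including criticism.
- Behave like a careful, slightly conservative participant, not a hype machine.
"""
```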
I do not see synthetic users as a replacement for research. I see them as a way to extend what good qualitative work already does. Here is where they have been most useful in my practice:
When there is real money or policy on the line, I still go back to live humans and proper methods.
There is a lot of skepticism about synthetic users right now, and honestly, much of it is justified. If your process is to describe a generic buyer persona and let the model hallucinate one, without showing any sources, then yes, that should be challenged. You are mixing marketing fantasy with a probabilistic model and calling it insight.
However, if you start from grounded, cited data, keep the corpus narrow and intentional, validate the behavior with pressure tests, and use the outputs as directional, not definitive, then synthetic users can become a practical extension of research, not a shortcut around it.
It is hard to change minds with arguments alone. It is easier to show a twin built from real interviews and public talks, walk through the sources, and compare that to a generic, invented persona. The difference is obvious once you see it side by side.