Is the writing I submit used to train AI?

No. Your writing is used only to run the simulation and is never used to train models. This is stated in our Terms of Service, and right after analysis you choose for yourself whether to delete or keep the original.

Can't I just ask ChatGPT to "review this from the perspective of a woman in her 30s"?

You can, but the result isn't the same. ChatGPT imagines a single hypothetical reviewer — a 'Korean woman in her 30s' — from within its training distribution. Ilkim follows the Statistics Korea (KOSIS) distribution to draw N readers who, even within women in their 30s, vary in occupation, region, education, and interests. You see the distribution, not the average. ChatGPT also tends to rate your own writing favorably. Ilkim's personas aren't designed to flatter — on a dull paragraph they simply drop off, and that's what gets recorded.

ChatGPT is free — is there really a reason to pay?

To mimic 30 readers with ChatGPT, you'd prompt it 30 times and tally the answers into an average by hand. It takes over an hour, and you still don't get a distribution. Ilkim runs 30 personas at once in a single click and organizes the drop-off points, completion rate, and segment-level responses for you. What you save isn't money — it's time.

How closely do the personas' responses match real readers?

We target 70%+ similarity based on beta-user evaluation. During the beta we share cases where simulation results were validated against the real comments and responses on already-published writing.

Does this replace focus groups or surveys?

It fills the stage before them rather than fully replacing them. External quantitative research typically costs a few hundred to several thousand dollars and takes 2–4 weeks. Ilkim shows segment-level responses quantitatively in 90 seconds, right before you publish, so you can narrow down which hypotheses are worth researching, or settle smaller decisions without commissioning a study at all.

Can it analyze writing in English or Japanese, not just Korean?

For now it's specialized for Korean content and Korean readers, because the dataset follows the Statistics Korea distribution. Global expansion needs separate data infrastructure — it's on the roadmap, but the Korean market is the priority.

Company security makes it hard to send our writing to an external API. What then?

On-premises and private deployment options for in-house enterprise content teams are available under the Enterprise plan. We provide a package that runs on your internal GPU environment with no external transfer of processed data.

If I upgrade from Free to PRO, is my data kept?

Yes. All projects and analysis history are preserved. Downgrading is also possible — on downgrade only what exceeds the monthly limit is deactivated, and no data is deleted.


What Is a Synthetic Persona vs a Marketing Persona?

June 17, 2026 · 6 min read · Ilkim Team

When you want to know how a draft will land before you publish it, more teams are reaching for a synthetic persona — the idea of letting an AI-generated reader read your draft first. But "synthetic persona" gets used loosely. What is it, exactly, and how is it different from the marketing personas teams already build?

What is a synthetic persona?

A synthetic persona is a population of simulated readers generated to match a statistical demographic distribution. The key is that it is not data about a real individual — it is a profile synthesized to resemble the makeup of a population.

Both "synthetic" and "population" are key concepts here. "Synthetic" means the profile is generated from data rather than drawn from one real person's records. "Population" means it is not one reader but many — people whose jobs, ages, regions, and interests are spread out the way the real population is. Each one reads the same piece from its own vantage point, finishes or drops off, and leaves a score and a comment. The output is not a single verdict but a distribution of reactions.

How is it different from a marketing persona?

The biggest difference is one versus many. A marketing persona imagines a single representative person to define your target; a synthetic persona generates a crowd that mirrors the population and shows you the distribution of reactions.

A marketing persona is something like "Jane, 35, working parent" — one representative figure you build to align a team's strategy. That is useful for direction, but it is a fixed profile built from assumptions, so it cannot tell you how a specific piece of writing will actually split a real audience.

Dimension	Marketing persona	Synthetic persona
Unit	One representative (imagined)	A crowd matching the distribution (N people)
Basis	Team assumptions, a few interviews	Statistical demographic data
Output	A fixed profile document	Individual reactions (completion, score, comments)
Question it answers	"Who is our target?"	"Who reacts strongly, and who leaves?"
Refresh	Manual, rare	Re-simulated per piece

The two are not rivals. You use a marketing persona to decide who you are speaking to, and a synthetic persona to check — before publishing — how that piece reads across a varied audience.

How is a synthetic persona built?

A synthetic persona is only as trustworthy as what it was synthesized from. A profile invented out of thin air and one anchored to an official statistical distribution produce very different results.

For example, Ilkim draws its synthetic personas from the population distribution of KOSIS (Statistics Korea). Even within "women in their 30s," the profiles scatter across jobs, regions, and interests the way the real distribution does. The data is built on NVIDIA's Nemotron-Personas-Korea dataset (CC BY 4.0) together with KOSIS distributions. The starting point is "a statistically grounded crowd," not "one imagined person."
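To make "drawing a crowd from a distribution" concrete, here is a minimal Python sketch. It is not Ilkim's implementation: the share tables are invented placeholders rather than real KOSIS figures, the function name `draw_personas` is hypothetical, and sampling each attribute independently is a simplification (a real pipeline would presumably respect the joint distribution).

```python
import random
from collections import Counter

# Hypothetical shares for illustration only; these are NOT real KOSIS figures.
OCCUPATION_SHARES = {
    "office worker": 0.40,
    "service/sales": 0.25,
    "professional": 0.20,
    "homemaker": 0.15,
}
REGION_SHARES = {
    "Seoul": 0.30,
    "Gyeonggi": 0.35,
    "other": 0.35,
}


def draw_personas(n, seed=0):
    """Draw n persona profiles, sampling each attribute in proportion to its share.

    Assumption: attributes are sampled independently, which ignores any
    correlation between occupation and region.
    """
    rng = random.Random(seed)  # fixed seed so the same crowd can be re-drawn
    occupations = rng.choices(
        list(OCCUPATION_SHARES), weights=list(OCCUPATION_SHARES.values()), k=n
    )
    regions = rng.choices(
        list(REGION_SHARES), weights=list(REGION_SHARES.values()), k=n
    )
    return [
        {"segment": "women in their 30s", "occupation": o, "region": r}
        for o, r in zip(occupations, regions)
    ]


personas = draw_personas(30)
# How the 30 simulated readers spread across occupations.
occupation_counts = Counter(p["occupation"] for p in personas)
```

Every profile shares the same segment label, yet the occupations and regions scatter in rough proportion to the weights, which is the point of the "crowd, not one person" starting position.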

The industry view lines up with this: synthetic personas deliver the most value when they are grounded in credible signals, clearly labeled as synthetic, and used to generate falsifiable hypotheses (deepsona.ai). The difference is between dressing up a baseless profile and tying it to a real distribution.

Why a crowd instead of one person?

Because content succeeds or fails in the tails of the distribution, not at the average.

When a small group reacts strongly and shares your content, reach explodes; when a particular segment bails in the first sentence, real reach can sink even though the average score looks fine. "Not bad on average" hides both outcomes. You cannot see those two tails through a single imagined evaluator or one representative persona.

Averages create a comforting illusion of safety. What actually makes or breaks a piece is the minority far from the average.
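A small worked example shows how the same average can hide very different crowds. The scores below are made up for illustration, and the thresholds for "high tail" and "low tail" are arbitrary assumptions, not Ilkim's scoring rules.

```python
from statistics import mean

# Hypothetical 1-10 scores from ten simulated readers for two drafts.
polarized = [9, 9, 10, 2, 1, 2, 9, 1, 10, 2]  # half love it, half bail
flat = [5, 6, 5, 6, 5, 6, 5, 6, 5, 6]         # everyone is lukewarm


def summarize(scores, low=3, high=8):
    """Return the mean plus how many readers sit in each tail.

    The low/high cutoffs are illustrative assumptions.
    """
    return {
        "mean": mean(scores),
        "high_tail": sum(s >= high for s in scores),
        "low_tail": sum(s <= low for s in scores),
    }


polarized_summary = summarize(polarized)
flat_summary = summarize(flat)
```

Both drafts average 5.5, but the polarized one has five readers in each tail while the flat one has none. A single averaged verdict would treat them as the same piece.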

The same trap shows up when you ask a general-purpose LLM to role-play one reader — covered in depth in why asking ChatGPT to "evaluate this as a woman in her 30s" is risky.

What can a synthetic persona validate (and what can't it)?

A synthetic persona is powerful before you publish, but it does not replace real customers on purchase or pricing decisions. Knowing where it fits is the whole skill.

Good fits

Estimating how a draft will read — completion rate and likely drop-off points — before anyone sees it.
Surfacing weaknesses in your title or opening before exposure.
Validating without live visitor traffic — especially useful for new blogs or newsletters with small audiences (see why A/B testing alone isn't enough).
Checking content that is hard to reverse once it ships, like print or press releases.
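As a rough sketch of the first item in the list above, completion rate and drop-off points can be computed from per-reader records like this. The record format (the paragraph index where a reader stopped, or `None` if they finished) and the function names are assumptions made for illustration, not a description of Ilkim's output.

```python
from collections import Counter

# Hypothetical records: paragraph index where each simulated reader stopped,
# or None if the reader finished the piece.
stopped_at = [None, 2, None, 2, 5, None, 2, None, None, 1]


def completion_rate(records):
    """Share of readers who reached the end of the piece."""
    return sum(r is None for r in records) / len(records)


def drop_off_points(records):
    """(paragraph index, reader count) pairs, most common first."""
    return Counter(r for r in records if r is not None).most_common()


rate = completion_rate(stopped_at)
hotspots = drop_off_points(stopped_at)
```

With these sample records, half the readers finish, and paragraph 2 stands out as the place where most of the others leave, which is the paragraph to revise first.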

What it does not replace

Any question whose answer would change a business decision — pricing strategy, a buying choice — still needs validation with real customers. A synthetic persona is a fast way to pre-screen hypotheses, not a stand-in for a real human voice.

So the safest order is complementary: pre-screen weaknesses with synthetic personas before publishing, then confirm with real reader data afterward. For a more concrete pre-publish workflow, see how to preview reader reactions before publishing.

Frequently asked questions

Is a synthetic persona the same as a marketing persona?

No. A marketing persona defines one imagined representative to align strategy; a synthetic persona generates a crowd that mirrors a population to reveal the distribution of reactions. Seeing many readers instead of one is the decisive difference.

Is a synthetic persona based on real people's data?

No. It is not any individual's information — it is a profile synthesized to follow a statistical demographic distribution. That lets you estimate population-like reactions without collecting or exposing personal data.

If I validate with synthetic personas, do I still need real audience research?

It complements rather than replaces it. Use synthetic personas to pre-screen weaknesses before publishing, with no traffic required, then fine-tune with real reader data afterward. The two cover each other's blind spots.

In short, a synthetic persona is a crowd of simulated readers generated from a statistical demographic distribution. Unlike a marketing persona that imagines one person, it shows the distribution of reactions and lets you screen weaknesses before publishing — without live traffic. It does not replace real customers on final decisions, so pairing pre-publish synthetic validation with post-publish real data is the safe play.

Tags: synthetic persona, AI reader simulation, content validation