Is the writing I submit used to train AI?

No. Your writing is used only to run the simulation and is never used to train models. This is stated in our Terms of Service, and right after the analysis you decide for yourself whether to delete or keep the original.

Can't I just ask ChatGPT to "review this from a woman in her 30s' perspective"?

You can, but the result isn't the same. ChatGPT imagines a single hypothetical reviewer — a 'Korean woman in her 30s' — from within its training distribution. Ilkim follows the Statistics Korea (KOSIS) distribution to draw N readers who, even within women in their 30s, vary in occupation, region, education, and interests. You see the distribution, not the average. ChatGPT also tends to rate your own writing favorably. Ilkim's personas aren't designed to flatter — on a dull paragraph they simply drop off, and that's what gets recorded.

ChatGPT is free — is there really a reason to pay?

To mimic 30 readers with ChatGPT, you'd prompt it 30 times and tally the answers into an average by hand. It takes over an hour, and you still don't get a distribution. Ilkim runs 30 personas at once in a single click and organizes the drop-off points, completion rate, and segment-level responses for you. What you save isn't money — it's time.

How closely do the personas' responses match real readers?

We target 70%+ similarity based on beta-user evaluation. During the beta we share cases where simulation results were validated against the real comments and responses on already-published writing.

Does this replace focus groups or surveys?

It fills the stage before them rather than fully replacing them. Outsourced quantitative research typically costs a few hundred to several thousand dollars and takes 2–4 weeks. Ilkim shows segment-level responses quantitatively in 90 seconds, right before you publish, so you can narrow down which hypotheses are worth researching, or settle smaller decisions without commissioning a study at all.

Can it analyze writing in English or Japanese, not just Korean?

For now it's specialized for Korean content and Korean readers, because the dataset follows the Statistics Korea distribution. Global expansion needs separate data infrastructure — it's on the roadmap, but the Korean market is the priority.

Company security makes it hard to send our writing to an external API. What then?

On-premise and private deployment options for in-house enterprise content teams are available under the Enterprise plan. We provide a package that runs on your internal GPU environment with no external transfer of processed data.

If I upgrade from Free to PRO, is my data kept?

Yes. All projects and analysis history are preserved. Downgrading is also possible — on downgrade only what exceeds the monthly limit is deactivated, and no data is deleted.


Why A/B Testing Isn't Enough: Validating Content Before You Publish

June 16, 2026 · 5 min read · Ilkim Team

When you're unsure whether a piece will land, a lot of teams say "let's just A/B test it." Ship two headlines, two thumbnails, and let the data pick a winner. It's a reasonable instinct. But there is a clear class of questions A/B testing can't answer for you, and that's often where the budget quietly leaks.

What A/B testing tells you — and what it doesn't

A/B testing tells you which of two already-shipped versions won on average. What it doesn't tell you matters more.

A/B testing splits live traffic across two or more variants, compares a performance metric (click-through, read-through, conversion), and decides whether the difference is statistically meaningful. Because it measures real behavior on live traffic, it's a more trustworthy signal than a survey or an internal opinion.
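As a rough illustration of what "statistically meaningful" means here, the sketch below runs a two-proportion z-test on click counts for two variants. The function name and the traffic numbers are hypothetical, and real A/B tools may use a different test; this is only one common way to make the comparison.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so nothing to distinguish
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical example: headline A got 120 clicks from 2,400 readers,
# headline B got 156 clicks from 2,400 readers.
z, p = two_proportion_z_test(120, 2400, 156, 2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (commonly below 0.05) suggests the gap between the two variants is unlikely to be noise. Note that the test only becomes available once both variants have already been shown to real readers.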

The catch is when and under what conditions that signal arrives. A/B testing works only after you publish, and you can only trust the result once enough traffic has flowed through it. Those two preconditions break more often than people expect.

Three real weaknesses of A/B testing

A/B testing's limits don't come from the method being wrong — they come from its operating conditions being demanding. There are three big ones.

It's reactive. To see a result, you have to publish first. By then, the losing variant has already reached some of your readers. For newsletters, press releases, or print — things you can't quietly pull back — the lesson you learn only helps the next send.
It depends on traffic. Reaching a statistically meaningful conclusion needs a sufficient sample. A new blog with few visitors, a newsletter with a small list, or a freshly launched product page may take a long time to show a real difference — or never reach one.
It shows the average. By default, A/B testing tells you which version won on average. Who reacted strongly and who quietly left — the distribution of reactions — isn't visible out of the box. Segmenting reveals some of it, but each segment then needs its own sufficient sample, which only raises the traffic bar.
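To make the traffic requirement concrete, here is a minimal sketch of the standard sample-size approximation for comparing two rates. The function names, the 5% significance level, the 80% power, and the example rates are all assumptions chosen for illustration. The segment multiplier assumes equal-sized segments that each need their own full sample, which is an optimistic simplification.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(base_rate, lifted_rate, alpha=0.05, power=0.8):
    """Approximate readers needed per variant to detect base_rate vs lifted_rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = base_rate * (1 - base_rate) + lifted_rate * (1 - lifted_rate)
    effect = lifted_rate - base_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def total_traffic_needed(base_rate, lifted_rate, segments=1, variants=2):
    """Total readers if every segment needs its own adequately sized test."""
    return sample_size_per_variant(base_rate, lifted_rate) * variants * segments


# Hypothetical example: detecting a lift from a 5% to a 6.5% click-through rate.
per_variant = sample_size_per_variant(0.05, 0.065)
overall = total_traffic_needed(0.05, 0.065)
with_four_segments = total_traffic_needed(0.05, 0.065, segments=4)
print(per_variant, overall, with_four_segments)
```

Under these assumptions a single comparison already needs a few thousand readers per variant, and splitting the audience into four segments multiplies the total by four. Smaller expected lifts push the requirement up quickly, because the effect size is squared in the denominator.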

A/B testing is strong at "which of two won on average," but it can't answer "before I ship, who will read this and how."

That third weakness is tied directly to reach. Virality is usually decided not by the average reaction but by the tails of the distribution — the small group that reacts strongly and shares, and the segment that bounces at the first sentence. We went deeper on this in why asking a general LLM to role-play a specific reader misleads you.
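A small sketch of why the average can hide the tails: the two sets of reader scores below are invented for illustration, on an assumed 0 to 10 scale, and the thresholds for "strong reaction" and "bounce" are arbitrary choices. Both sets have the same mean, but only one has readers at the extremes.

```python
from statistics import mean


def summarize_reactions(scores, strong_threshold=9, bounce_threshold=2):
    """Return the mean plus the share of readers in each tail."""
    n = len(scores)
    return {
        "mean": mean(scores),
        "strong_share": sum(s >= strong_threshold for s in scores) / n,
        "bounce_share": sum(s <= bounce_threshold for s in scores) / n,
    }


# Hypothetical scores from ten readers for two different drafts.
lukewarm = [5, 5, 6, 5, 4, 5, 6, 5, 4, 5]
polarizing = [10, 9, 1, 2, 10, 1, 9, 2, 5, 1]

lukewarm_summary = summarize_reactions(lukewarm)
polarizing_summary = summarize_reactions(polarizing)
print(lukewarm_summary)
print(polarizing_summary)
```

An average-only comparison would call these two drafts equivalent. Looking at the tail shares shows that the second draft has both a group likely to share it and a group that leaves early, which is the information the average hides.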

The gap pre-publish validation fills

Pre-publish validation looks at reactions before A/B testing can even start — at the point where you haven't shipped to anyone yet. It sidesteps both the reactive nature and the traffic dependency.

The method is simple: before exposing the piece to real readers, you let a crowd that mirrors the real population read the draft first. Validating content before you publish lets you filter out concepts that pass internal approval but fail with actual audiences (marketingmag.com.au). The core move is shifting the measurement from "after shipping" to "before shipping."

A/B testing vs. pre-publish simulation

The two methods differ in when they run and what they look at. They're complementary, not substitutes.

Dimension	A/B testing	Pre-publish simulation
Timing	After publishing	Before publishing
Requirement	Enough live traffic	No live traffic needed
Unit	Average performance of two versions	Reactions of a crowd (N readers) that mirrors the distribution
Reveals	Which version won on average	Who reacts strongly and who leaves
Reversibility	Difficult (already exposed to readers)	Easy (free to revise before publishing)

So the safest order is: filter weaknesses with a distribution before publishing, then fine-tune on live traffic with A/B testing afterward. It isn't one or the other — you're splitting validation into two moments in time.

How to add pre-publish validation to your workflow

Here's how to add a pre-publish step without dropping A/B testing.

Ilkim samples synthetic personas that follow the population distribution from KOSIS (Statistics Korea). They're people whose occupation, region, and interests are spread out the way the statistics say they are, and each reads your draft from their own vantage point and returns completion/drop-off, a score, and a comment. The result isn't one number — it's a distribution of reactions. This is built on NVIDIA's Nemotron-Personas-Korea dataset (CC BY 4.0) together with KOSIS distributions.
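The idea of drawing readers according to a population distribution can be sketched with weighted sampling. This is not Ilkim's implementation: the shares below are made-up placeholders rather than real KOSIS figures, the attribute names are hypothetical, and sampling each attribute independently ignores the correlations a real population would have.

```python
import random

# Placeholder shares for illustration only, NOT real KOSIS statistics.
REGION_SHARES = {"Seoul": 0.40, "Gyeonggi": 0.35, "Busan": 0.25}
OCCUPATION_SHARES = {
    "office worker": 0.5,
    "self-employed": 0.2,
    "student": 0.1,
    "homemaker": 0.2,
}


def draw_personas(n, seed=0):
    """Draw n synthetic readers whose attributes follow the given shares."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    regions = rng.choices(
        list(REGION_SHARES), weights=list(REGION_SHARES.values()), k=n
    )
    occupations = rng.choices(
        list(OCCUPATION_SHARES), weights=list(OCCUPATION_SHARES.values()), k=n
    )
    return [
        {"region": region, "occupation": occupation}
        for region, occupation in zip(regions, occupations)
    ]


personas = draw_personas(30)
print(personas[:3])
```

The point of sampling rather than describing one "typical" reader is that the crowd ends up varied in roughly the proportions the statistics describe, so a reaction from a small segment still shows up in the results.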

1. Finish a draft: define your target and write the piece as usual.
2. Simulate before publishing: submit the draft and let a crowd read it to see the distribution of reactions.
3. Fix the weak points: look at the tails, not the average, and fix where drop-off clusters.
4. A/B test after publishing: fine-tune the remaining variables on live traffic.
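The "fix the weak points" step depends on turning individual reader results into a picture of where drop-off clusters. The sketch below shows one way to do that aggregation. The Reading record, its field names, and the sample data are hypothetical and do not reflect Ilkim's actual output format.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    segment: str               # e.g. the reader's occupation
    dropped_at: Optional[int]  # paragraph where the reader stopped; None = finished
    score: int                 # reader's overall score for the draft


def summarize(readings):
    """Aggregate per-reader results into completion and drop-off statistics."""
    finished = [r for r in readings if r.dropped_at is None]
    drop_offs = Counter(r.dropped_at for r in readings if r.dropped_at is not None)
    by_segment = defaultdict(list)
    for r in readings:
        by_segment[r.segment].append(r.dropped_at is None)
    return {
        "completion_rate": len(finished) / len(readings),
        # Paragraphs where the most readers stopped, as (paragraph, count) pairs.
        "drop_off_hotspots": drop_offs.most_common(3),
        "completion_by_segment": {
            seg: sum(flags) / len(flags) for seg, flags in by_segment.items()
        },
    }


# Hypothetical results from six simulated readers.
readings = [
    Reading("office worker", None, 8),
    Reading("office worker", 3, 4),
    Reading("student", 3, 3),
    Reading("student", 1, 2),
    Reading("homemaker", None, 7),
    Reading("homemaker", 3, 5),
]
report = summarize(readings)
print(report)
```

In this invented sample, most of the readers who left did so at the same paragraph, which is the kind of cluster worth revising before publishing. The per-segment breakdown shows which groups the draft is losing entirely.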

Frequently asked questions

Does this mean I should stop A/B testing?

No. A/B testing is a powerful way to measure real behavior on live traffic. It just works only after publishing and requires enough traffic, so it can't replace validation at the pre-publish stage. Use them as complements: filter with a distribution before publishing, then fine-tune with A/B testing after.

How do I validate content when I don't have much traffic?

Pre-publish simulation doesn't need real visitor traffic. Because a crowd of synthetic personas that mirrors a statistical distribution reads your draft, you can gauge reactions before publishing even on a new blog or a newsletter with a small list.

How can I know reader reactions before publishing?

Before exposing the piece to real readers, let a crowd of synthetic personas that mirror the population distribution read the draft and measure completion rate, drop-off points, and scores. It moves reactions you could previously see only in post-publish metrics to the pre-publish stage.

In short: A/B testing picks the better of two already-shipped versions on average, but it works only after publishing, requires enough traffic, and doesn't show the distribution of reactions by default. Let a crowd that mirrors the distribution read your draft before publishing to filter out weaknesses, then fine-tune with A/B testing afterward — and the two methods cover each other's gaps.

A/B testing
Pre-publish validation
Content marketing