Over the past few weeks, I’ve written a lot about AI survey fraud, human verification, and the $10 million fraud case involving Op4G and Slice.
That story raised a lot of justified fear but also some confusion.
Specifically, around one term that keeps getting thrown around:
Synthetic data.
Let’s clear this up now:
- Synthetic data is not inherently bad
- It is not the same thing as survey fraud
- And yes, it can be useful in market research and CX
But here’s the catch: Only if it’s built on a foundation of verified, real human feedback.
So What Is Synthetic Data?
Synthetic data is data that’s artificially generated by algorithms to reflect the statistical structure of real-world data, without obtaining that data from actual individuals.
It’s often used for:
- AI model training
- Simulation of rare events
- Privacy protection in healthcare
- Controlled testing environments
It’s not “fake data pretending to be real.”
It’s simulated data, clearly labeled as such, used in specific contexts.
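To make that definition concrete, here is a minimal sketch of the idea: measure the statistical shape of real scores, then sample new records from that shape and label them as synthetic. The scores, the function names, and the choice of a simple normal distribution are all assumptions made for illustration. Real synthetic data tools use far richer models.

```python
import random
import statistics

# Hypothetical verified survey scores on a 0-10 scale (made-up numbers).
real_scores = [8, 9, 7, 6, 9, 10, 8, 7, 5, 9, 8, 7]


def generate_synthetic(scores, n, seed=0):
    """Draw n synthetic records from a normal distribution fitted to scores.

    Every record carries a "source" label so it cannot be mistaken
    for a real human response.
    """
    rng = random.Random(seed)
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)
    records = []
    for _ in range(n):
        # Clamp each draw to the valid 0-10 range of the survey scale.
        value = min(10.0, max(0.0, rng.gauss(mean, stdev)))
        records.append({"score": round(value, 1), "source": "synthetic"})
    return records


synthetic = generate_synthetic(real_scores, n=1000)
real_mean = statistics.mean(real_scores)
synthetic_mean = statistics.mean(r["score"] for r in synthetic)

print(f"real mean:      {real_mean:.2f}")
print(f"synthetic mean: {synthetic_mean:.2f}")
```

The synthetic mean should land close to the real one, which is the whole point: the simulated records mirror the structure of the real data, and the label keeps them honest.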
Let’s Be Clear: This Is Not the Slice Case
The Op4G/Slice scandal involved:
- Fraudulent survey responses
- Real people pretending to be other people
- Manual VPN masking and coaching scripts
- And $10M+ in deception
That wasn’t synthetic data.
That was fraud.
Synthetic data, when used correctly, is generated transparently, and it’s never passed off as actual human response data.
When Synthetic Data Can Be Useful — With One Major Condition
There are valid use cases for synthetic data in market research and CX, such as:
- Training AI models to detect sentiment or classify feedback
- Stress testing different experience scenarios
- Simulating low-incidence populations or segments
- Privacy-conscious modeling in healthcare environments
But here’s the key:
Synthetic data in market research or CX is only useful if it’s based on high-integrity, verified human feedback.
If your base data is unreliable, whether it’s fraudulent, filler, or bot-generated, then your synthetic data isn’t simulation. It’s distortion.
And that distortion compounds with every layer of analysis.
This is why PeopleMetrics is so focused on human-verified data:
- Our custom research panels are built from the ground up with recruited, vetted, real people
- Our CX data comes directly from real customer transactions
- When we use online sample providers, we rigorously vet each partner up front and run multiple checkpoints on the data once we receive it
If we can’t prove it’s a real human, it doesn’t get into our datasets!
Synthetic modeling becomes trustworthy only when it’s built on a foundation of truth.
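To see why the base data matters so much, here is a rough sketch, assuming the same simple approach of fitting a normal distribution to survey scores. The numbers are invented for illustration: one base set contains only vetted respondents, and the other has bot-style straight-line 10s mixed in.

```python
import random
import statistics


def synthesize(base, n, seed=0):
    """Sample n synthetic scores from a normal distribution fitted to base."""
    rng = random.Random(seed)
    mean = statistics.mean(base)
    stdev = statistics.stdev(base)
    # Clamp each draw to the valid 0-10 range of the survey scale.
    return [min(10.0, max(0.0, rng.gauss(mean, stdev))) for _ in range(n)]


# Hypothetical numbers for illustration only.
verified = [6, 7, 5, 8, 6, 7, 4, 6, 7, 5]  # real, vetted respondents
bot_filler = [10] * 5                       # fraudulent straight-line 10s
contaminated = verified + bot_filler

clean_mean = statistics.mean(synthesize(verified, 2000))
dirty_mean = statistics.mean(synthesize(contaminated, 2000))

print(f"synthetic mean from verified base:     {clean_mean:.2f}")
print(f"synthetic mean from contaminated base: {dirty_mean:.2f}")
```

The generator has no way to tell a bot from a customer. Whatever bias sits in the base data gets reproduced in every synthetic record, and in anything you build on top of those records.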
Synthetic ≠ Strategic (Unless It’s Transparent)
Synthetic data can help you simulate, prototype, or scale ideas.
But it can’t tell you:
- What real customers feel
- Why they are frustrated or delighted
- What’s truly working (or broken) in the experience
Only real humans can do that.
So use synthetic data if:
- You disclose it
- You understand its limits
- You base it on verified ground truth
And never use it as a shortcut for talking to real people.
Bottom Line
Synthetic data isn’t the problem.
In market research and CX, synthetic data can complement human insight. But it can’t replace it, and it can’t be trusted unless it’s modeled on verified, fraud-free data.
If you start with fiction, you’ll scale fiction.
If you start with truth, you can scale trust.
The difference is everything.
Up next:
Post 10: The Future Belongs to the Real