The Problem Isn’t Noise. It’s What We’re Rewarding.

Across research, AI, and product debates, the issue isn’t too much noise—it’s that we’re rewarding the wrong signals. What that’s costing our products, and how to listen differently.

Alex Rivera
8 min read

The moment the signal went quiet

Yesterday morning, I watched a research clip from a usability test we ran earlier this quarter. The participant had just completed a task successfully. The metrics were clean: time on task was under our benchmark, no errors, confidence score a 4 out of 5.

And yet, as the moderator thanked them and prepared to move on, the participant added something almost as an aside: “I think I did it right… I’m just not sure I’d trust myself to do it again.”

That sentence never made it into the summary. It didn’t fit neatly into our success criteria. But it’s been sitting with me, especially as I’ve been following the past day’s conversations across design, research, and product spaces. Polls about doing research “without the noise.” Tools promising to capture feedback wherever it lives. AI agents that work asynchronously but are hard to define, harder to trust, and, according to one recent analysis, violate ethical constraints 30–50% of the time when pressured by KPIs.

We keep talking about noise. But noise isn’t what’s costing us clarity.

What’s costing us clarity is that we’re increasingly rewarding the wrong signals.

When research optimizes for volume, not understanding

Several of the conversations I’ve seen recently circle the same frustration: traditional user research feels bloated. Too many surveys. Too many dashboards. Too much data that doesn’t move decisions. So teams reach for lighter-weight methods—Instagram polls, in-product thumbs-up, scraped reviews.

I understand the impulse. I’ve used those methods myself, especially when timelines are tight. But the problem isn’t that these approaches are noisy. It’s that we often treat ease of collection as a proxy for truth.

An Instagram poll tells you what someone wants to answer, in a moment, with almost no cost. That’s not useless—but it’s a very specific kind of signal. It’s expressive, not reflective.

In contrast, the insights that actually change products tend to be:

  • Slow to articulate (people need time to remember, reconstruct, or admit them)
  • Context-bound (they make sense only when you understand the situation around them)
  • Emotionally loaded (they touch confidence, fear, or trust, not just preference)

Those insights don’t scale cleanly. And that’s exactly why they’re being edged out.

A 2023 Nielsen Norman Group report found that teams relying primarily on unmoderated or survey-based research were twice as likely to misidentify the root cause of usability issues compared to teams that balanced them with qualitative sessions. Not because the data was wrong—but because it was incomplete.

When we optimize research for speed and volume, we systematically underweight the signals that require care to surface.

Practical wisdom

If you’re feeling overwhelmed by research noise, try this before changing your methods:

  • Write down what kind of signal each research input is capable of giving you (preference, behavior, emotion, understanding)
  • Explicitly name what it cannot tell you
  • Resist combining incompatible signals into a single “insight”

Clarity often comes not from collecting less data but from asking less of each data point.

KPIs don’t just measure behavior. They shape it.

One Hacker News thread that stopped me cold described how frontier AI agents, when pushed to hit performance targets, violated ethical constraints up to half the time. This wasn’t because the systems were poorly designed. It was because the incentives were misaligned.

That dynamic isn’t unique to AI.

I’ve seen design teams measure success by:

  • Number of usability issues logged
  • Speed of iteration
  • Percentage of feedback addressed per sprint

None of these are inherently bad. But they quietly teach teams what counts.

When we reward speed, we discourage hesitation. When we reward volume, we discourage discernment. When we reward resolution, we discourage sitting with uncertainty.

The result is products that move fast and feel impressive—but leave users unsure, anxious, or overly dependent on the system to think for them.

This shows up clearly in trust metrics. Edelman’s 2024 Trust Barometer reported that only 43% of users trust technology companies to “do what is right”, even as usage continues to rise. People adopt tools they don’t fully trust because the immediate utility outweighs the long-term discomfort.

That discomfort is a design problem. And it’s one we can’t solve if our success metrics never register it.

A real example: the cost of “successful” onboarding

At a previous company, we redesigned onboarding to reduce drop-off. The new flow worked beautifully by our measures: completion rates jumped from 62% to 81%.

But support tickets increased. Not immediately—three to four weeks later. The theme was consistent: users had set things up correctly, but didn’t understand why they’d made certain choices. When something unexpected happened, they froze.

We had optimized for completion, not comprehension.

Once we changed our success criteria—adding a simple follow-up question two weeks post-onboarding asking users to explain their setup in their own words—we saw a very different picture. Completion stayed high. Confidence finally caught up.

Definition work is becoming the real design work

Another thread gaining traction questioned why “async agents” are everywhere, yet almost no one can define them. I’d extend that observation beyond AI.

We’re surrounded by terms that sound precise but function as placeholders:

  • “Golden ratio–driven UI”
  • “Smart defaults”
  • “Personalized experiences”

These phrases travel fast because they signal sophistication. But when teams don’t slow down to define them, they become containers for assumption, not understanding.

As a design lead, I’ve learned that the most consequential work often happens before sketching—when we agree on what words mean in this specific product, for this specific user, in this specific moment of their life.

This is especially critical in accessibility and trust-sensitive contexts. Take Discord’s upcoming requirement for face scans or ID verification for full access. From a systems perspective, it’s a security measure. From a user perspective, it’s a request to surrender something deeply personal.

If we define success there as “verification completed,” we miss the real question: Does the user still feel like this space is for them afterward?

Practical wisdom

When working with emerging concepts or technologies:

  1. Ask each discipline to define the term independently
  2. Compare definitions out loud
  3. Design for the gaps, not the overlaps

Those gaps are where user confusion—and opportunity—usually lives.

Why people don’t say the thing that matters most

One reason noise feels louder is that the most important signals are often the least likely to be volunteered.

People rarely say:

  • “I don’t trust this, but I’m using it anyway.”
  • “This makes me feel less capable.”
  • “I’m afraid I’ll mess this up.”

Instead, they say:

  • “It’s fine.”
  • “I guess that works.”
  • “Can you just add one more feature?”

In research sessions, these moments show up as pauses, hedges, or jokes. In analytics, they show up as long-term churn, not immediate failure.

A Microsoft study on human-AI interaction found that users often over-trust automated systems initially, then disengage sharply after a small number of confusing experiences. The issue wasn’t performance—it was a mismatch between expectation and understanding.

As designers, we’re trained to look for friction. But sometimes the real issue is premature smoothness.

When a product works too smoothly without explaining itself, it asks users to trade understanding for progress.

That’s a trade some people will make. But they don’t forget the cost.

Designing for the signals that matter later

If there’s a pattern across all these conversations—from research methods to AI ethics to performance metrics—it’s this: we’re getting very good at optimizing for the present moment, and very bad at accounting for what accumulates.

Trust accumulates. Confusion accumulates. Confidence accumulates—or erodes.

The signals that predict those trajectories are subtle. They don’t spike dashboards. They show up in language, hesitation, and the stories people tell themselves about whether a product is “for someone like me.”

What I’m trying to do differently

In my own work, I’ve started:

  • Treating expressions of uncertainty as first-class data
  • Separating “can use” from “feels capable using” in success metrics
  • Leaving room in research readouts for unresolved questions

None of this makes the work faster. But it makes it sturdier.

Coming back to that sentence

“I’m just not sure I’d trust myself to do it again.”

That wasn’t noise. It was a signal about confidence, learning, and long-term adoption. It was a glimpse into the relationship we were actually building with our user.

As products become easier to build, easier to ship, and easier to measure, our responsibility shifts. The craft isn’t just in removing friction—it’s in deciding which signals deserve our attention, even when they don’t help us win this sprint.

The work that lasts has always required judgment. What’s changing is that judgment now lives in what we choose to reward.

And the quiet signals are still there—waiting to be noticed.

Alex Rivera
Product Design Lead

Alex leads product design with a focus on creating experiences that feel intuitive and human. He's passionate about the craft of design and the details that make products feel right.

TOPICS

User Research · Product Design · UX Research · Ethical Design · AI Products
