
Stop Estimating Stories Like They’re Mini Waterfalls

Why “team sizing” beats siloed estimates, and how to make it work

By Gary Worthington, More Than Monkeys

Let me start with a small request: Can we all agree to stop calling it estimation?

The word “estimate” suggests we’re aiming for some kind of precision. A guess, yes, but one rooted in accuracy, hours, and individual effort. It leads teams down the road of false certainty. “It’ll take three days to code and one day to test, so that’s four points.” No, it’s not. That’s just an illusion of control, and it’s holding your team back.

We should be calling it sizing, because that’s what we’re really doing: looking at a piece of work and saying, “Compared to other things we’ve done as a team, how big is this?”

And yet, here we are. Still breaking work down like it’s 2006.

The Mini Waterfall Problem

You’ve probably seen it. Sprint planning becomes an internal negotiation:

  • “This is a 3-pointer for dev.”
  • “QA says it’s a 2.”
  • “Well, Dev is busy this sprint, so we’ll say 5 total.”

The work hasn’t even started, and already it’s split across roles. Coding first. Then testing. Then fixing. Then maybe releasing. That’s not Scrum. That’s a mini waterfall, and it’s wrapped in the comforting blanket of team velocity. But it’s not close to being agile, and it’s not honest.

Let me be clear: estimating like this defeats the whole point of working as a cross-functional team.

Stories Should Be Sized by the Team, Not the Role

If your team estimates stories based on how long each role thinks it will take, you’re building a broken plan because that’s not how software gets delivered.

A story is only done when it’s potentially shippable. Not when it’s coded. Not when it’s “with QA.” Done means tested, integrated, and valuable. And the work required to get there is everyone’s job.

When we size stories, we’re not trying to divvy up the work like tasks on a Gantt chart. We’re trying to get a shared sense of the total effort, risk, and complexity to deliver a working, finished outcome.

Real World Example: The Misleading “3 from Dev, 5 from QA”

I was working with a team that picked up a story:

“As a user, I want to download my account statement as a PDF.”

The dev looked at it and said:

“It’s about a 3. We’ve got the data model, we just need to plug in the export service and generate the PDF.”

Then QA chimed in:

“That’s probably a 5 from me. I’ll need two and a half days to validate formatting, check multiple account types, and do cross-browser download tests.”

Now stop and think about what just happened.
QA didn’t estimate complexity or risk — they estimated time. And that’s where the process broke down.

Everyone nodded, the story went in as an “8,” and it blew the sprint apart.

Why? Because:

  • The dev estimated relative complexity: a 3 compared to similar export features.
  • QA estimated absolute time, “two and a half days of work,” which they loosely translated into a 5 using a hidden points-to-time conversion.
  • The team combined those mismatched estimates into an 8 without questioning the logic.
  • No one clarified what Done actually looked like: formatting edge cases, download fallbacks, and error handling weren’t defined.

What Went Wrong?

The estimates weren’t connected to team effort — they were disconnected guesses:

  • Dev used relative sizing based on past features
  • QA converted a to-do list into hours, then hours into points

This is what happens when teams confuse sizing with time estimation. You get mismatched expectations and unreliable forecasts.
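
To make the mismatch concrete, here is a minimal Python sketch of the arithmetic behind that “8.” The function names are made up for illustration, and the rate of two points per day is an assumption inferred from “two and a half days” becoming a 5; nobody on the team wrote it down, which is rather the point.

```python
# Illustrative only: reproduces the arithmetic from the PDF statement story.
# POINTS_PER_DAY is an assumption inferred from "two and a half days" -> 5.
POINTS_PER_DAY = 2.0


def qa_points_from_days(days: float) -> float:
    """Convert a duration into points: the hidden step in the QA estimate."""
    return days * POINTS_PER_DAY


def siloed_size(dev_relative_points: float, qa_days: float) -> float:
    """Add a relative size to a converted duration, mixing two different units."""
    return dev_relative_points + qa_points_from_days(qa_days)


dev_points = 3   # relative: "about a 3" compared to similar export features
qa_days = 2.5    # absolute: days of validation work, not complexity or risk

print(siloed_size(dev_points, qa_days))  # 8.0
```

The sum runs without complaint, which is the trap: one operand is a comparison against past work, the other is a calendar duration in disguise.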

If the team had sized the story together, the conversation would have looked different:

  • “Let’s walk through what Done means for this story.”
  • “What makes this risky?”
  • “Where might we get stuck?”
  • “Are we clear on what we’re validating?”
  • “Does anything make this more complex than a typical export?”

That would likely have landed the story as a 5 or 8, but with alignment. And with QA flagging test complexity before planning, not during testing.

Why Sizing as a Team Matters

Here’s what you gain when you size as a team:

1. Shared Understanding

When the whole team talks through a story and agrees on a size, everyone gets clearer on:

  • What needs doing
  • Where the risk lives
  • What “done” actually means

It becomes about alignment, not division.

2. Reliable Near-Term Forecasting

Using velocity to help with Sprint Planning only works if your data is consistent. If you estimate coding separately from testing, you’re not tracking the true cost of delivery and your historical velocity becomes meaningless.
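
As a rough sketch of what velocity-based forecasting can look like, the Python below averages the points completed in recent sprints and pulls stories in priority order until the next one would not fit. The sprint history, story names, three-sprint window, and both function names are all hypothetical; the only assumption that matters is that every number is a whole-team size for a finished story.

```python
from statistics import mean


def average_velocity(completed_points: list[int], window: int = 3) -> float:
    """Mean of the points completed in the most recent `window` sprints."""
    if not completed_points:
        raise ValueError("need at least one completed sprint")
    return mean(completed_points[-window:])


def stories_that_fit(backlog: list[tuple[str, int]], capacity: float) -> list[str]:
    """Take stories in priority order until the next one would exceed capacity."""
    selected: list[str] = []
    used = 0
    for name, size in backlog:
        if used + size > capacity:
            break
        selected.append(name)
        used += size
    return selected


# Hypothetical data: whole-team sizes of finished stories, totalled per sprint.
history = [21, 18, 24]
backlog = [
    ("PDF statement export", 8),
    ("Password reset", 5),
    ("Audit log", 8),
    ("Dark mode", 3),
]

velocity = average_velocity(history)
print(velocity)
print(stories_that_fit(backlog, velocity))
```

If some of those historical numbers were dev-only sizes and others were dev plus QA, the average would still compute, but it would no longer describe what the team can actually ship.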

3. Less Finger-Pointing, More Ownership

If QA is pressured because “Dev said it was a 2-pointer,” the trust in the team breaks down. But if everyone agreed it was a 5 based on the entire flow to production, then everyone owns the outcome.

Common Objections (And How to Tackle Them)

“But QA doesn’t know how long dev will take.”

Perfect. That’s why we talk as a team. QA will ask good questions. So will DevOps. That’s the point. Sizing is a conversation, not a solo guess.

“We need separate estimates for resource planning.”

Then do resource planning elsewhere. Sizing is for the team’s understanding and sprint forecasting. If you want time tracking, use time tracking tools. Don’t conflate them.

“We always do dev first, then test — why not size them separately?”

Because that’s not agile. That’s sequential. The whole reason we work in cross-functional teams is to collaborate on delivery, not take turns like we’re on a relay team.

Sizing Sessions: How to Do It Right

  1. Use Planning Poker: get the whole team involved.
  2. Talk in Terms of Effort + Risk + Uncertainty: not hours.
  3. Anchor to Golden Stories: “This feels like the same size as X.”
  4. Don’t Get Hung Up on Points: use T-shirt sizes if you like. The goal is alignment.
  5. Challenge the Split Mindset: if people say “dev is 3, QA is 2,” ask “what’s the size to deliver this story, start to finish?”
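
One way to picture a Planning Poker round is the small Python sketch below. Everyone votes on the whole story, and a wide spread triggers a conversation rather than an average. The deck values, the “within one card” agreement rule, the choice to take the larger card, and the team member names are all assumptions for illustration, not a standard.

```python
DECK = [1, 2, 3, 5, 8, 13]  # assumed deck; use whatever cards your team uses


def poker_round(votes: dict[str, int], max_spread: int = 1) -> dict:
    """Summarise one round of whole-story votes.

    The round counts as agreed when every vote sits within `max_spread`
    steps on the deck; otherwise the lowest and highest voters are asked
    to explain their thinking before the team votes again.
    """
    if not votes:
        raise ValueError("no votes cast")
    unknown = [v for v in votes.values() if v not in DECK]
    if unknown:
        raise ValueError(f"votes not in deck: {unknown}")

    positions = {name: DECK.index(v) for name, v in votes.items()}
    low, high = min(positions.values()), max(positions.values())

    if high - low <= max_spread:
        # Assumed convention: when votes are close, take the larger card.
        return {"agreed": True, "size": DECK[high], "discuss": []}

    outliers = [name for name, pos in positions.items() if pos in (low, high)]
    return {"agreed": False, "size": None, "discuss": outliers}


# Hypothetical team: each person sizes the full story, start to finish.
first = poker_round({"Asha": 3, "Ben": 8, "Chloe": 5})
second = poker_round({"Asha": 5, "Ben": 8, "Chloe": 5})
print(first)
print(second)
```

Note that no vote here is tagged with a role. The function has nowhere to put “dev is 3, QA is 2,” which is the behaviour the session is trying to encourage.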

But What If You Do Want to Track Role-Based Effort?

You can! Just don’t conflate that with story sizing.

Track individual effort separately in your time tools or capacity planning exercises. But let story sizing remain a team exercise focused on delivering outcomes, not measuring input.

Final Thought: Language Shapes Behaviour

“Estimation” invites false precision. “Sizing” invites relative judgement.

And “this is a 5-pointer” means “we know what this is, and we know it’ll take meaningful effort to ship it.”

When you shift your language and your process, you shift the mindset.

Stop splitting the story. Stop estimating in silos. Start sizing like a team that delivers together.

Let’s stop pretending we can divide delivery up by discipline and still call it agile. It’s not, never was and never will be. Scrum teams succeed when they swarm and collaborate, not when they segment.

If you want your team to improve flow, forecast better, and actually work like a team, start by sizing stories the way you deliver them: together.

Gary Worthington is a software engineer, delivery consultant, and agile coach who helps teams move fast, learn faster, and scale when it matters. He writes about modern engineering, product thinking, and helping teams ship things that matter.

Through his consultancy, More Than Monkeys, Gary helps startups and scaleups improve how they build software — from tech strategy and agile delivery to product validation and team development.

Visit morethanmonkeys.co.uk to learn how we can help you build better, faster.

Follow Gary on LinkedIn for practical insights into engineering leadership, agile delivery, and team performance.