Wednesday 16 October 2019

How to choose a useful measure of incremental progress for your team

Recently I had an interesting call with a senior QA leader. He reached out because he wanted to get a better sense of how his people were doing, both as a functional unit and individually. Primarily, I suspect, he wanted to be proactive and have some kind of numerical early warning system in place, which he could cross-reference with common sense and the qualitative input he got elsewhere.

As we spoke, he kept using the term "velocity"; however, it became clear that he meant velocity in a much looser sense than the typical iterative scrum/agile one. In that strict sense, velocity doesn't really work for what he wanted to achieve.

Here's what I mean:

Core metrics to baseline progress iteratively

What is velocity anyway?

Velocity itself is first and foremost a team output metric, not an individual one. It is a measure of story points completed over a unit of elapsed time.

It gives visibility on whether the product development team is functioning effectively as a system for generating new features. In this context, new features are what the customer is expected to value most, so we track only that. It is not an efficiency measure, and shouldn't be confused for one. Traditionally this approach came from a software development environment, but it can be applied anywhere significant complexity and thought are required. In other words: knowledge work.

These story points are the primary "raw material" to generate estimates relative to a goal or target date. Once you have a sense of:

  • who you're building it for and why
  • what you want to build, i.e. the actual stories defined
  • and you have estimated the stories using story points

then the dance around the iron triangle begins.

When the product or project work starts, you keep track of how many story points are completed over time, and use that to improve future planning. Usually this works in "sprints": predetermined lengths of time used to plan and track progress. For example, in the popular flavor of agile called scrum, these typically last 1-4 weeks.

Realized velocity

Let's use 2 weeks as an example. The newly formed team has started working on a new product or project. The backlog of items is defined and estimated for the absolute "must have" features.

At this point, if you're being completely transparent, you don't know how fast the team will actually go. You can also negotiate what exactly is "must have" to help reduce the time required (less work, done faster). And ideally you'll also all agree on a quality standard that everyone is ok with--which will also have schedule implications (higher bar takes more time per feature on average). So your initial realized velocity/sprint is 0, and you have a guess as to what the expected velocity will be.

You agree (with the team) which stories will be accomplished in the first sprint. And after 2 weeks, you sit down with the team, and compare what actually happened with what you'd hoped would happen. At this early stage, there are likely to be a lot of learning outcomes in general, as it's a new effort. But among other things, you can add up the story points completed by the team. This is your first realized velocity.

Expected velocity

After 3 sprints, you should start to see some kind of trend emerge in terms of an average velocity. Sometimes it's worth giving the team the benefit of the doubt, as they might pick up the pace once they get their collective heads around what needs to be done.

Usually this number will be significantly different from your expected velocity for the dates you'd like to hit. To see by how much, calculate the total story points needed for the "must have" initial release, and divide that by the realized velocity so far. To simplify the thought process, assume the velocity will stay fixed.

This gives you a sense of how many sprints of work will be needed to hit that final date. Usually, there will be a gap between what's happening vs. what's expected. It's best to know this as early as possible. In fact, this transparency is one of agile's strengths. It's difficult to sugarcoat reality if you can see what is being delivered. Moreover, you also see how many initially estimated story points of cognitive effort were realized.
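To make the arithmetic concrete, here's a minimal sketch in Python. The release size, completed points, and velocity figures are all hypothetical:

```python
import math

def sprints_remaining(total_story_points: int,
                      points_completed: int,
                      realized_velocity: float) -> int:
    """Sprints needed to finish, assuming velocity stays fixed."""
    remaining = total_story_points - points_completed
    # Round up: a partially used sprint still occupies the calendar.
    return math.ceil(remaining / realized_velocity)

# Hypothetical "must have" release: 120 points total, 40 completed,
# realized velocity of 20 points per sprint so far.
print(sprints_remaining(120, 40, 20))  # -> 4
```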

Warning: This type of analysis can cause some healthy consternation and discussion. That's intentional. Using this performance data, you can re-prioritize, change resourcing levels, change scope, or whatever else you think might help the team at that stage.

Expected velocity is the ideal pace you'd like to keep, in order to hit your business goals. Often, in more traditional environments, this will be expressed in terms of a target release date. But it can also be in other forms, depending on what's actually important to the business as a whole.

The core difference between realized and expected velocities is their time orientation. The former measures the velocity trend in the recent past. The latter is more of a business requirement, translated into a number. Expected velocity is a practical way to "have a relationship with your target date". This is a metric which translates longer term expectations into an early warning system checked regularly. When compared to your realized velocity, you'll know whether or not your teams are going too slow to hit your dates.
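That early warning check can be sketched in a few lines. The numbers, and the 90% tolerance threshold, are my own illustrative choices:

```python
def expected_velocity(remaining_points: int, sprints_until_deadline: int) -> float:
    """The pace required to hit the target date: a business requirement as a number."""
    return remaining_points / sprints_until_deadline

def on_track(realized: float, expected: float, tolerance: float = 0.9) -> bool:
    # Flag teams running below, say, 90% of the required pace.
    return realized >= expected * tolerance

required = expected_velocity(remaining_points=100, sprints_until_deadline=4)
print(required)                # -> 25.0 points per sprint needed
print(on_track(18, required))  # -> False: too slow to hit the date
```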

Cycle time

Cycle time comes from a lean background. It's a measure of how long it takes to build one unit of output. In practical terms, it's a measurement of the elapsed time from the start to the end of your production process.

cycle time = time(end of process) - time(start of process)

It includes both the actual time spent working by the team and all of the wait time in between steps of the process.

Unlike story points, the unit of measurement is time. This is probably cycle time's greatest strength. Time can be subject to arithmetic and statistics like mean and standard deviation, and even compared across various aggregations (e.g. among QA team members). It's also less subjective, as there is no estimation required up front; it's just measured continuously. It gives you a sense of what's been happening, and how healthy your process is.
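Because the unit is time, those statistics are straightforward to compute. A minimal sketch with made-up start/end timestamps:

```python
from datetime import datetime
from statistics import mean, stdev

# Hypothetical (start, end) timestamps for three completed items.
items = [
    (datetime(2019, 10, 1, 9), datetime(2019, 10, 2, 17)),
    (datetime(2019, 10, 2, 9), datetime(2019, 10, 4, 12)),
    (datetime(2019, 10, 4, 9), datetime(2019, 10, 7, 10)),
]

# cycle time = time(end of process) - time(start of process), in hours
cycle_hours = [(end - start).total_seconds() / 3600 for start, end in items]

print(round(mean(cycle_hours), 1))   # -> 52.0
print(round(stdev(cycle_hours), 1))  # -> 20.5
```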

Now for the downsides. Cycle time implicitly assumes:

  • that the units of output are fairly standard, uniform, and therefore of similar size
  • when aggregated, that there is no difference between types of work. In reality, building new features and fixing bugs in already built features doesn't take the same amount of time.
  • that there is no goal. It only measures efficiency, not effectiveness

Cycle time works well, as a metric, in software for two scenarios:

  • When stories aren't estimated, but are all broken down to a maximum expected length of, for example, 2 days per story.
  • When working on maintenance development, where general process monitoring is needed so that extremes can be investigated, but where time pressures tend to be issue and person specific rather than team-wide.

Takt Time

Takt time operates within a similar framework to that of cycle time. However, instead of measuring what has been happening, it's used to quantify expectations so that they can be continuously monitored.

In a nutshell, takt time measures the slowest expected rate at which you need to complete production processes in order to meet customer demand. It's calculated as

takt time = net production time / total output needed by customer


Anyhoo, there are a number of really helpful attributes of takt time. It expresses expectations numerically, in terms of how much time should be spent on each item in order to hit a target output. For example, if takt time is 10 minutes, every 10 minutes you should be pushing out another unit. If you are faster, great! If not, you need to troubleshoot and improve your production process, resources, or context.

The "total output needed by customer" can be measured in just units, e.g. number of stories. This way you don't need estimation and estimation won't introduce subjective bias.

Like expected velocity, it gives the team a number, in the moment, to help establish an operational relationship with a longer term goal or target that has business meaning.

Isn't this all a bit abstract and self-referential?

Yes. It is.

The primary measure of progress in an agile framework is "working software". Or, to be more general, demonstrably completed work. It's demoed for everyone to see and comment on, and should be done in a generic way so that anyone can participate (i.e. not only people with PhDs in Computer Science). Anyone should be able to see the new features working.

That said, not everything is software. And not all software has a user interface. So it's a bit harder to apply this, particularly in the early days of a new product.

In that case, you can use these metrics to monitor effectiveness and efficiency. You can hold both yourself and the team accountable. You have a numerical framework to deliberate with stakeholders, one that can be checked at any given moment, where you don't need to "check with the team" every time someone wants an update. And like the senior QA leader above, you can use this as a proactive early warning system. If you oversee a number of efforts and one of them starts going off the rails, you'd naturally want some way of knowing that something is off.

So that's the menu. Which one to choose?

It depends where you are in your efforts, how much time you want to spend on estimation itself, and how much you need to make comparisons.

Where you are in your efforts:

Early on in a project, you have a lot of unknowns. They tend to be interdependent. For example, in order to give a date estimate, you need to agree on what you're building, and how you're building it. That might depend on the market segmentation or main business goals you want to achieve, which also might need to be negotiated. And if you tweak any one of these, all the rest are also affected.

At this point, if you add technical estimation with story points for granular tasks to the mix, you expose even more uncertainty. You might be better off delaying story point estimation and just using cycle time until you have a clearer picture. This way, you maximize the team's time spent delivering actual work, rather than estimating under conditions of high uncertainty and both business and technical complexity.

Once you get to a stable team and vision and roughly stable scope, it might be worth doing some estimation and prioritization of the bigger epics. Follow this with the breakdown (into stories) and estimation of the highest priority epic or two. If your initial scope is very large, you'll spend a lot of time estimating something you don't really understand very well yet (yet another reason to be deliberate and precise with your initial release).

How much time you want to spend on estimation & monitoring:

This is a more general question about the ratio of time spent doing vs. monitoring the work. Estimation is a tool to help you monitor and measure the work. Ideally, it's good to do some estimation, so that you can slot in work tactically. In particular, it's most useful when considering the business value generated and comparing it to the amount of work required to complete it.

But estimating out a year's worth of work, especially if there are no releases to customers during that entire period--that's a notch short of madness. Ideally your releases should be tight, with feedback coming both from individual customers and from the market as a whole.

How much you need to make comparisons:

Like in the example opening this blog post, if you want to measure and compare individual or team efficiency, then cycle time is easily comparable. This is because the "denominator" is the same in all cases, namely elapsed time:

  • You can compare cycle time across various team members, ideally if they are doing similar work, for example QA.
  • Also you'd be able to compute averages to compare between teams, i.e. QA across different teams.
  • Standard deviation in cycle time can also help you figure out what is truly exceptional, so that you can diagnose and troubleshoot (if bad) or repeat (if good)
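The outlier check in the last bullet can be sketched like this. The cycle times are made up, and the two-standard-deviation threshold is my own choice, not a standard:

```python
from statistics import mean, stdev

# Hypothetical cycle times, in days, for one QA team member.
cycle_days = [2.0, 1.5, 2.5, 2.0, 9.0, 1.8, 2.2]
mu, sigma = mean(cycle_days), stdev(cycle_days)

# "Truly exceptional": more than two standard deviations from the mean.
exceptional = [t for t in cycle_days if abs(t - mu) > 2 * sigma]
print(exceptional)  # -> [9.0]: worth diagnosing
```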

Next steps

That should hopefully give you enough to get started. The next step is choosing which is most relevant for you, and figuring out how to gather the raw data from internal company systems. Ideally, this is done automatically & behind the scenes using software, so that your teams don't need to enter data manually, esp. time spent.

Key Takeaways

  • Velocity is a team based output metric that tracks story points completed over time.
  • Estimation can improve accountability and prioritization, but it costs time and is subject to bias.
  • Keep customer facing releases small, as this will improve your accuracy and reduce estimate variability.

Wednesday 9 October 2019

Why estimating cognitive effort simplifies knowledge work

"There were only an estimated two to five thousand humans alive in Africa sixty thousand years ago. We were literally a species on the brink of extinction! And some scientists believe (from studies of carbon-dated cave art, archaeological sites, and human skeletons) that the group that crossed the Red Sea to begin the great migration was a mere one hundred fifty. Only the most innovative survived, carrying with them problem-solving traits that would eventually give rise to our incredible imagination and creativity." --Marc Sisson

Imagination fed into our uniquely human ability to cooperate flexibly in large numbers. So fast forward to today. Our most valuable and exciting work, particularly in the context of innovation, still relies on our ability to imagine what needs to be done, start, and continuously course correct.

In this case, we're using imagination to structure and agree how work needs to happen, and to map that to a subjective estimate of effort.

First, we imagine what needs to be built, why it needs to be built, and how it needs to work. Then, we subdivide the big overall vision into lots of little pieces, and divvy it up among a group of people who go execute on the vision. Before they do that, though, these people imagine, analyze, and discuss doing the work involved on each specific piece of the overall vision. They all need to agree how much effort it will take to complete that task. If there are differences of opinion, they should be ironed out up front.

If done successfully, this generates full buy-in and alignment from everyone involved. Even if the end product isn't a physical thing, this approach works. The benefits of trusting people and harnessing all their energy and imagination far outweigh the inherent risks. It's already done by tens of thousands of teams around the world in various digital industries, including software.

Relative Cognitive Effort is what we're imagining.

The key number used for tracking this is a measure of how much "cognitive effort" was completed over a predetermined unit of time. Agile and scrum use the concept of a story instead of a task, in order to help describe complex needs in narrative form if needed. Usually this includes elements such as: the user problem, what's missing, and acceptance criteria for the required solution. The unit of measure for the cognitive effort expected to complete a story is therefore called a story point.

Imagining size

Each story is sized in terms of story points. Story points themselves are quite abstract. They relate to the relative complexity of each item. If this task is more complex than that one, then it should have more story points. Story points primarily refer to how difficult the team expects a specific task to be.

For example, it's more precise to track the number of story points completed in the last 2 weeks than just the raw number of stories completed, as stories can be of different sizes.
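A toy illustration of why raw story counts mislead. The story sizes are hypothetical:

```python
# Story points per completed story in two different sprints.
sprint_a = [1, 1, 2, 1, 3]
sprint_b = [5, 8, 5, 3, 5]

print(len(sprint_a), len(sprint_b))  # -> 5 5: identical raw story counts
print(sum(sprint_a), sum(sprint_b))  # -> 8 26: very different output
```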

Now it's time for a few disclaimers...

1. Story points are not measures of developer time required.

Cognitive complexity isn't necessarily the same thing as how time consuming it will be to achieve. For example, a complex story may require a lot of thought to get right, but once you figure out how to do it, it can be a few minor changes in the codebase. Or it could be a relatively simple change that needs to be done many times over, which in and of itself increases potential complexity and the risk of side effects.


The main purpose of story points is to help communicate--up front--how much effort a given task will require. To have meaning for the team, the estimate should be generated by the team who will actually be doing the work. These estimates can then be used by non-technical decision-makers to prioritize, order, and plan work accordingly. They can then take into account the amount of effort and trade it off against the expected business value of each particular story.

2. Story points related to time are a lagging indicator for team performance.

The key, though, is that story points shouldn't be derived as "1 story point = half a day, so this item will be 3 story points because I expect it will take 1.5 days". That type of analysis can only be done after the fact, and only relative to entire timeboxes, like a 2 week period. Instead, the team should be comparing the story they are estimating to other stories already estimated on the backlog:

  • Do you think it will be bigger than story X123? Or smaller?
  • What about X124?

The team needs to get together regularly and estimate the relative size of each story, compared to every other story.

This generates a lot of discussion. It takes time. And therefore estimation itself has a very real cost. Some technical people view it as a distraction from "doing the work". Rightly so.

3. Story Points assume you fix all bugs & address problems as you discover them.

Only new functionality has a story point value associated with it. This means that you are incentivized to create new functionality. While discovering and fixing problems takes up time, it doesn't contribute to the final feature set upon release, or to the value a user will get from the product.

Anything that is a bug or a problem with existing code needs to be logged and addressed as soon as possible, ideally before any new functionality is started, to be certain that anything "done" (where the story points have been credited) is actually done. If you don't do this, then you will have a lot of story points completed, but you won't be able to release the product because of the number of known bugs. What's worse, bugfixing can drag on for months if you delay it until the end. It's highly unpredictable how long it will take a team to fix all bugs, as each bug can take a few minutes or a few weeks. If you fix bugs immediately, you have a much higher chance of fixing them quickly, as the work is still fresh in the team's collective memory.

Fixing bugs as soon as they're discovered is a pretty high bar in terms of team discipline. And a lot will depend on the organizational context where the work happens. Is it really OK to invest 40% more time to deliver stories with all unit testing embedded, and deliver fewer features that we're more confident in? Or is the release date more important?

4. One team's trash is another team's treasure.

Finally, it's worth noting that story points themselves will always be team-specific. In other words, a "3" in one team won't necessarily be equal to a "3" in another team. Each team has its own relative strengths, levels of experience with different technologies, and levels of understanding of how to approach a particular technical problem.

Moreover, there are lots of factors which can affect both estimates and comparability. It wouldn't make sense to compare the story point estimates of a team working on an established legacy code base with those of a team building an initial prototype for a totally new product. The two have very different technical ramifications and "cognitive loads".

Conversely, you can compare story points over time within one team, as it was the same team who provided the estimates. So you can reason about how long it took to deliver a 3 story point story now vs. six months ago--by the same team only.

Wait, can't Story Point estimation be gamed?

As a system, story points gamify completing the work. Keen observers sarcastically claim they will just do a task to help the team "score a few points".

But then again, that's the idea behind the approach of measuring story points. To draw everyone's attention to what matters the most: fully specifying, developing, and testing new features as an interdependent delivery team.

Moreover, all of this discussion focuses on capacity and allocation. The key measure of progress (in an agile context) is working software, or new product features in a non-software context. Not story points completed. If you start to set goals using story points, for example velocity targets, you introduce trade-offs, usually around quality:

  • Should we make it faster or should we make it better?
  • Do we have enough time for Refactoring?
  • Why not accumulate some Technical Debt to increase our Velocity?

Story points completed are only a proxy for completed features. They come in handy in scenarios where you don't have a clear user interface to see a feature in action. For example, on an infrastructure project with a lot of back-end services, you might not be able to demo much until you have the core in place.

Example: Adding technical scope to an already tight schedule

On a client project, I had a really good architect propose and start a major restructuring of the code base. It was kicked off by his frustration with trying to introduce a small change. A fellow developer had tried to add something that should have taken an hour or two, but it took a few days. The architect decided the structure of the project was at fault.

Yet this refactoring started to stretch into a few weeks. The majority of the team was blocked from doing anything important, while the architect worked on an important part of the final deliverable. While the work he was doing was necessary, it would have been good to figure out its elapsed time impact on the overall deliverable, so that it could be coordinated with everyone interested.

As the sprint ended, I proposed we define the work explicitly on the backlog, and estimate it as a team. This way, the architectural work would be a bit more "on the radar". There were around nine tasks left. The team said the remaining work was comparable across all of them, and collectively decided each was about 5 story points in size. So we had added roughly 45 story points of effort.

Knowing that the team was averaging around 20 story points per elapsed week, it became clear we had suddenly added over 2 weeks' worth of work--without explicitly acknowledging what this might do to the final delivery date. While the architect was quite productive, and claimed he could do it faster, there was still an opportunity cost: he wasn't working on something else that was important.
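The schedule arithmetic from this anecdote, spelled out (the figures come from the story above; only the code framing is mine):

```python
tasks_remaining = 9    # comparable architectural tasks left
points_per_task = 5    # the team's collective estimate per task
weekly_velocity = 20   # story points per elapsed week, on average

added_points = tasks_remaining * points_per_task  # new, unplanned scope
weeks_added = added_points / weekly_velocity      # impact on the schedule
print(added_points, weeks_added)  # -> 45 2.25
```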

In this case, story points helped come up with a realistic impact to schedule that senior stakeholders and sponsors needed to know about. The impact on the initial launch date was material. So the estimation with story points helped provide an "elapsed time" estimate of an otherwise invisible change.

While not perfect, Story Points are primarily a tool for capacity planning, not whip cracking.

So to step back, you can see that story points are a useful abstraction which gets at the core of what everyone cares about: new product features. While subjective, for the same task--as long as it's well defined--most of the members of a team usually come up with pretty close estimates. It's kind of surprising at first, but eventually you get used to it. And you look forward to differences, because a difference means there may be something that needs to be discussed or agreed first. That is the primary purpose of story points. As a side effect, they can help you get to grips with a much larger backlog, and plan roughly how many teams need to be involved.

However, this approach only works within relatively strict parameters and disclaimers if you want the numbers to mean anything. It is at a level of resolution that proxies progress, but makes external micromanagement difficult. If you want the team to self-manage and self-organize, this is a feature not a bug of the whole approach. Ultimately the core goal is correctly functioning new features. Best not to lose sight of that.

Wednesday 2 October 2019

How to simplify a complicated process, so that even a 2.5 year old can understand it

A few years ago, we had a significant challenge with our 2 year old daughter. Morning and evening routines were an uphill battle every day. Getting out the door to her childminder quickly enough to make my first meeting in the morning was often a drawn out battle of wills.

While it was clear she wanted to collaborate and appease us as parents, she didn't understand what we expected of her. Moreover, her brain development still seemed to be behind; the neocortex doesn't really kick into overdrive growth until later. She was also awash in hormones, which is completely normal for this age, and this caused the temper tantrums typical of a two year old. They're called the "terrible twos" for a reason. We were also frustrated as parents, and we didn't know how to help her. Fundamentally, this was an issue of her feeling overwhelmed, and unable to sort out what's important from what isn't.

In a professional context, visualization works really well to help stop overwhelm. Whether it's to map out a business process, plan a large scale software system, or figure out a business model, it always helps to have everyone involved "brain dump" onto post-its, and then to organize them. This usually unleashes a lot of latent creativity. Plus it helps front-load difficult discussions. You find out really quickly what the major challenges are with a new initiative.

Example of eventstorm output

How it all started

One Saturday afternoon, as I was watching her learn to draw at the coffee table, I had the idea to map out her morning and evening routines as a process. This would be analogous to a lightweight lean value stream map or a business process Eventstorm. First and foremost, I wanted to do it with her, not to her. As she was already drawing and playing around, I felt a little more comfortable drawing my chicken-scratch cartoons. I don't feel like drawing was ever a personal strength of mine.

So I pitched it to her as a fun project we can do together. I pulled out some bigger post-its and a sharpie, and sat down at the coffee table with her.

First, I suggested that we brainstorm all of the things which she does in the morning. As she was coming up with specific actions, like eating breakfast, I would sketch out in cartoon format some kind of symbol of that particular activity.

As she clearly wasn't able to read yet, images reduced the cognitive load for her. And she was excited to see me draw things she understood on the fly; it isn't that common a sight, to be honest. As an output of each suggestion, we drew that activity on a wide green post-it and put it on the coffee table.

Once we had a handful of these, I suggested a few others which she might have missed. I also suggested a few which were incorrect, just to make sure she was paying attention.

After this, we moved to a "converging" phase. I suggested that she take the post-its and put them in order on the wall. We had to do it together in practice, but the key was that I gave her the final say on the actual order. I would hold the relevant two post-its and ask questions like: do you "eat breakfast" before you "descend the stairs"? Doing this multiple times, we came up with a chronologically ordered list of post-its that reflected her morning routine.

Morning and evening routine prototype

At that moment, she seemed to step back and view the whole process. And she was absolutely beaming, proud of both of us for doing it together. But also happy that she finally understood what her parents were on about every morning. I think this was all because she felt less overwhelmed.

So she felt confident that she would now be able to achieve what was expected of her. Because she understood what was expected of her for the first time in her life!

Wrap up and implementation

We then did the same thing with her evening routine on dark blue post-its, ordered the other way around, finishing with her in bed and falling asleep.

When thinking about it, I realized that some of the activities are performed on the ground floor of our house, and some on the first floor, where her bedroom and the bathroom were. So I unwrapped a brown paper roll, ripped off two pieces about a meter long, and sat down with my daughter. We put all of the ground floor post-its on one brown paper sheet, and all of the first floor post-its on the other.

upstairs process mapped out, with modifications/corrections from my daughter

Finally, we hung up the ground floor post its in our dining room, and the first floor post its in her bedroom. So in the end, she had a detailed map of her daily routines, organized chronologically and physically near the place where she would actually do them.

What happened in practice

My wife and I were shocked at how effective this was. The daily tantrums nearly disappeared completely overnight. If there was push-back from her, it lasted 15 seconds, not the 15-45 minutes it frequently had in the past.

The fastest way to help her calm down, when she looked like she was about to blow up, was to walk her over to the post-its and ask her where we were at that moment in the process. She would point to the relevant one. Her emotions would calm down, as this required some cognitive effort from her, and we could continue with the rest of the routine that morning or evening.

About a year later, as I was putting her to bed, she said

"Daddy, that picture there is wrong" pointing at the one where she brushes her teeth.

"Oh realy, what do you mean?" I asked.

"By the time I am brushing my teeth, I'm already wearing my PJs, not a dress".

From a dress to pajamas

She was absolutely right. The next weekend, I drew out a version of the same post-it with her avatar dressed in pajamas.

Her brain development had caught up enough to understand what this map meant. She had full ownership of the process, because she'd been involved from the beginning. And most importantly, she could call out specific ideas for improvement.

Lessons learned

This experience made it clear to me how powerful the principle of visualization actually is.

  • It can help make sense of initially overwhelming complexity, by putting everything "out there" on the wall rather than in your own head.
  • It helps participants feel empowered and in control of what is happening, thus improving motivation once decisions are made.
  • It helps everyone involved to view a situation more objectively, both the big picture and the smaller details (what should I draw on your plate when you eat breakfast?)
  • In the case of my daughter, the increased clarity and reduced overwhelm also helped with emotional regulation. While (hopefully) not as necessary in a professional environment, it's good to know that this a welcome side effect.
  • It doesn't even require the ability to read or write.

Visualizing waste and complexity is a very powerful way to help get a grip on it. Clearly, the visual component speaks to us at a primordial level. Cavemen drew images. Medieval religious communication was all based on paintings and images.

In software terms, this would be like the kernel of the operating system. So you really get through to the root causes of problems and address them, rather than just yelling louder and pressuring people, regardless of age, who don't act according to your wishes.