Estimation antipattern – Comparing teams’ velocities

In this article, we’re going to examine another antipattern around story point estimation. Last time we discussed the fallout of using story points as targets. Now, we caution you against using story points to compare teams. It’s a common enough trap, but we must hold ourselves back. By the end of this article, you’ll know why such comparison is a meaningless exercise. 

What’s the comparison about?

To put it simply, using story points to compare teams has accusatory implications: “Hey, this other team is doing more points per iteration than your team, so that makes them better.” Team A could be completing 20 points and Team B 12 points, and because Team A’s velocity is greater, it is declared the better team. When it is put like that, we will naturally see Team B desperately try to outdo Team A, or at least match its velocity. The end result: pressure builds up, the environment can get ugly, team members may resort to workarounds to meet the numbers, and little thought goes toward what should really be the focus, namely the quality of the software that’s going into production.

Here’s why such comparison is meaningless 

Despite the exercise having no real meaning or value, we still find many teams engaging in it during story point estimation. We reiterate that such thinking, i.e., comparing teams on the basis of completed points, is totally flawed, and here are our reasons for taking this stance: 

Every team is unique

Each team has its own distinct personality and comes with diverse skill sets and levels of experience. No two teams are alike in their interpersonal and working relationships: how well they gel together, how effectively they collaborate, and so on. Then there are further factors, such as whether they have any dependencies and what environment they work in.

All these factors contribute to the uniqueness of each team. Hence it is completely unfair and impractical to compare velocities, because it would be impossible to create two teams that have the exact same skills and the same personalities and do the same kind of work in the same environment.

Velocity is a reflection of many things happening within a team

As we saw above, each team comes with its unique composition, skills, experience, and dynamics. All of these have a bearing on how long that team will take to complete a story and how much they can pack into one iteration. This output will vary across teams and there is no ground for comparison.

The understanding of a point is not common across teams

The basis of comparison is itself skewed. As we saw earlier, comparison takes the form of: Team A completed 20 story points but Team B could complete only 12 in the same duration. What we fail to realize is that each team’s understanding of what makes up 1 story point is limited to that team and will differ across teams. The sense of how big the backlog is has no standard; it’s relative to each team. What Team A considers to be 1 point can be very different from what Team B considers it to be. Note that 1 point is just a numeric representation of a “small” story, and 4 points is just a numeric representation of a “large” story. But the buckets that define a small story are relevant and meaningful only within the team. How then can we compare their performance and efficiency on the basis of the number of points completed?

A standard definition of 1 point doesn’t work

To counter the above problem, some folks go a step further and say everyone must agree upon a standard definition of 1 point, so that everyone can align the sizes of stories. While this sounds reasonable, it is very difficult to achieve in practice. The cost of the bureaucracy required to enforce it, especially on a large program or across an entire organisation, is hardly worth the ability to compare teams’ velocities.

The nature of each team’s work can be different

So let’s say there are two teams in the organization that are doing fundamentally different kinds of work. One is building APIs and the other is building a mobile app. It’s a no-brainer that the tech landscape and the kinds of complexity faced by each team are extremely different. In such a scenario, comparing how fast they are completing their points is meaningless!

Similar work cannot be compared either

Let’s look at cases where teams could be involved in similar work, such as when both teams are building on top of a common API: one is building an Android app and the other an iOS app. It may seem like they are doing similar work, but even then, it’s not so straightforward that we can begin comparing their velocities. That’s because contributing factors such as environment and skills remain different.

Let’s look at a typical development scenario where two teams are working on the same backlog: one is onshore, the other is offshore. On one hand, we might assume that the onshore team will go faster because the Product Owners are sitting next to them, providing quick answers and feedback. On the other hand, this same situation may work to their detriment. The team members may find themselves getting distracted because the client stakeholders are right there. The offshore team doesn’t face that pressure and may be able to focus more effectively and actually end up with a higher velocity. But can we truly conclude that they are more efficient? We cannot, because the comparison itself has no real meaning.

How to avoid the antipattern of comparing teams using story points

The above reasons illustrate the futility of such a comparison. That’s why we strongly recommend against comparing teams. Declaring one team better than another is not, in itself, a valuable exercise. But if you must compare, then track the trends of each team, such as its level of predictability, its frequency of releases, or the business impact it creates.

Predictability

Have the teams set up short-term goals or iteration-level targets and then track how often each team is able to meet its targets. The targets can vary across iterations, depending on factors like yesterday’s weather. But if a team says on a Monday that in the next two-week-long iteration, they will complete 10 points, then how often do they actually achieve that completion? Or how close do they get to it every time? This level of predictability can be compared, if at all one really has to.
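As a rough illustration of tracking predictability, the sketch below computes two numbers from one team's own iteration history: how often the team met the target it set for itself, and how close it came on average. The `Iteration` record, the function names, and the sample figures are assumptions made for this example and do not come from any particular tool. Each team is measured only against its own commitments, never against another team's points.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    committed: int  # points the team signed up for at the start (assumed > 0)
    completed: int  # points actually finished by the end


def hit_rate(history):
    """Fraction of iterations in which the team met its own target."""
    if not history:
        raise ValueError("need at least one iteration")
    hits = sum(1 for it in history if it.completed >= it.committed)
    return hits / len(history)


def mean_closeness(history):
    """Average of completed/committed, capped at 1.0 per iteration."""
    if not history:
        raise ValueError("need at least one iteration")
    ratios = [min(it.completed / it.committed, 1.0) for it in history]
    return sum(ratios) / len(ratios)


# Hypothetical history for one team (illustrative numbers only).
history = [Iteration(10, 10), Iteration(10, 8), Iteration(12, 12), Iteration(8, 6)]
print(hit_rate(history))        # share of iterations where the target was met
print(mean_closeness(history))  # how close the team got on average
```

Because both figures are ratios against the team's own sign-up, they stay meaningful even when two teams size their stories very differently.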

Frequency of releases

The more frequently the team is putting software into production and into the hands of real users, the more value they are creating. So you can compare how frequently each team is able to create such value.

Business Impact

Mature XP teams don’t even bother with intermediate output; they are more concerned with the outcome, i.e., the business impact the team is able to create. The way you do that is to have the team sign up for key OKRs. These could be key business results they want to achieve such as improving customer conversion or reducing churn in customers. That’s the business objective they work toward. And whether they deliver software to achieve that or do something else – that’s entirely up to the team.

This extreme practice of measuring only business outcomes may not be feasible in every software development scenario. The takeaway, though, is to measure not just output but also these more meaningful aspects.

However, it’s again important to bear in mind that you shouldn’t be comparing on parameters such as the revenue each team is able to generate. That’s because each team functions in a different context and could have distinct strategies. One team’s focus could be revenue generation, another team could be driving consumer engagement–which again eliminates any common ground for comparison. So, even if we’re comparing on the parameter of business impact, the OKRs are not to be compared. How far each team is able to achieve its OKRs is more important. 

By now, you have seen the various reasons to avoid comparing teams at all. Comparing teams on velocity and the points they’ve delivered has absolutely no meaning. If you must compare, you can certainly find ways to make the comparison more relevant and meaningful. The outcome of such comparisons has more value.

How Story Points Make Our Life Better

The entire series on estimation has focused on the inherent problems of time-based estimates. We introduced and recommended story points as a more viable option to time estimates. In this series finale article, we look at story point estimation more closely and understand how it resolves the issues that crop up with time estimates. 

Creates confidence by asking the right questions

A prime concern with time estimates is that they are almost always wrong. That’s because you’re asking too complex a question–combining two aspects, viz., how big something is and how fast you can go. Story point estimation makes things easier at the very outset by separating these two questions. 

Further, it simplifies the “How big?” question by asking how big the stories are relative to a benchmark we’ve set, which is typically the smallest story in the backlog. This process of Relative Sizing, or allocating a size to a story by comparing it with a standard, can be completed quickly. In addition to speed, this process generates a high level of confidence in the sizes. In fact, we need not call them estimates at all; they are definite sizes. There is no scope for ambiguity and there is no need to revisit or change the sizes.

The second of the 2 sub-questions that story points ask is “How fast can you go?”. We admit upfront that nobody really knows the answer. So we move on to the next best way to approach this question and that’s through informed and logical guesswork–using the exercise of Raw Velocity. This involves multiple developers picking up multiple and diverse items from the backlog with their estimates hidden and stating which ones they can finish in a given time period (an iteration). Then we add up the estimates for the items that were picked up to find the “gut feel” of the story points that can be completed in an iteration. We do this math over several rounds with each of the developers. The average derived thus reflects a gut feeling of the majority of the team regarding the time required to complete the given backlog. 

While it’s accepted as guesswork, these numbers evoke a higher level of confidence because the developers have answered a comparatively easier question–regarding their own capabilities and not that of someone else. Hence the answer is likely to be accurate. 

Moreover, we use the Raw Velocity numbers only for a short duration – during the first couple of iterations. After those initial iterations, we are in a position to make more informed velocity calculations because we have real data from the real work done during the first few iterations. Going forward, we use our judgement based on this data and not on the guesswork we started out with. Thus the answer to “How fast?” is now rooted in accuracy and we can proceed with confidence and conviction in our sign-up for the subsequent iteration.

Story point estimation, thus, instills confidence by asking the right questions. The responses to these questions leave no room for confusion or doubt. From one question we get definite sizes and the response to the other is grounded in real data. We can successfully avoid the feeling of being wrong–which is typically what happens in time estimation. On the contrary, we are now confident and convinced about how much the team can do especially in the short term.

Eliminates pressure

Time-based estimation also creates pressure–at the 2 stages of estimation and execution. Let’s dive deep…

During estimation

There are 2 problem-creating scenarios with time estimates during the estimation phase:

  • The compound question that kickstarts time estimation is itself a pressure point because you’ve mixed up 2 sub-questions. In an attempt to respond to that compound question, you try to cover all possible scenarios and come up with precise numbers. Too much time is spent on seeking too many details too early on. 
  • The other problem is that you have too many options to choose from in terms of time period. This wide range of choices significantly raises the probability of errors. You may end up either over- or under-estimating. In either case, you end up feeling pressured. Typically you will react to such pressure by being extra vigilant about every possible scenario or by adding buffers to play it safe.

Story point estimation has no scope for such problems. Firstly, it limits the number of size options to just 3 or 4. All we’re doing is picking up each story and comparing it to a sample set in terms of size. Once we’ve assigned a size, it’s frozen. We don’t need any more details; we don’t need to revisit the sizes. The issues–and consequent pressure–of time estimates do not even exist in the story points world.

During execution

In a time estimates scenario, you may realize that the size of the task is different from what was assumed during planning, or that the pace of work is getting affected by extraneous conditions. However, you’ve already committed to getting an X amount of work done in a given time-frame. You naturally feel pressured because time is running out. 

Story points avoid this pressure by setting the right expectations from the start. We are cognizant that Raw Velocity is a guess and those numbers will change in the face of real work. We take a cue from previously completed iterations and incorporate those learnings into Estimated Velocity. The guesswork progresses into informed estimation, which definitely has more value. Over time, this boosts the team’s confidence too. 

Replaces negative pressure with positive ambition

Using Relative Sizing and Velocity, we’ve eliminated the pressure of completing an X amount of work in a given period of time. With that pressure gone, what we experience is a positive emotion and the aspiration and ambition to strive for continuous delivery and improvement. 

Having sifted out questions that have no real meaning, we zero in on the most relevant ones in our effort to achieve continuous delivery: What should we take up in the next iteration? What will be of highest value to go next? With these questions, we strive to make the delivery process better, more productive, and centered around excellence.

Empowers the team

Story points create a transition from individual performance and individual goals to team goals. The commitment now is to put out a set of features into production. The onus is on the team as a whole and so developers are likely to help each other and even help other roles to complete what has been started. There is no rush to start new items from the backlog. The emphasis is on completing what’s been started–in the right sequence, with the right quality, with mutual collaboration as a team. The debilitating competitive pressure gives way to a positive environment and a collective intention to achieve. The atmosphere is still charged with high intensity, but it’s energizing and collaborative.

Retains the focus on completion

At this point, it’s important to remember that story points work around a rule of averages without any direct conversion of points into days. If we still did the conversion in our minds, we would lose some of the benefits intrinsic to this type of estimation. There can be no blanket rule that a story of X points will always get done in Y days. Conditions are fluid, and each story and each developer has its own pace. If this is accepted, then we’ve reduced some of the pressure of dates and time commitments faced by developers.

However, it’s imperative to be aware that doing away with the pressure of time commitments does not equate to development meandering along at its own pace or to the absence of accountability around completion. We definitely must ask pertinent questions about progress and completion, but with the intention and aim of removing impediments and achieving progress. The ticking clock must not be allowed to assume nightmarish proportions. Developers should breathe easy and strive for excellence and continuous improvement instead of struggling for self-preservation.

Enables effective tracking

Time-based estimation poses a tracking hazard because we cannot segregate how much time was spent doing real development work and how much was lost in breaks or leaves. If we do attempt to track break and leave time, it would lead to micromanagement of the team, and we do not want to go there. 

Separately, with time estimates, there is no way to apply the learnings from previous work to what will be taken up next. For example, if one story gets delayed, we cannot predict the completion time of the next story.

Story point estimation is able to counter such problems. Velocity is an average across multiple developers and multiple iterations. After the initial stories, the guesswork is replaced by insights gathered from real work and real data. We are in a good position to estimate how much time will be required to complete the next story, based on our learnings from previous iterations. This is valuable and it helps boost the team’s confidence in its own capabilities. 

Similarly, if we have visibility into who’s going to be away next week, we count them out of capacity and make adjustments accordingly. For example, if the full team’s velocity is 10, we might plan for only 8, knowing that there will be 2 fewer people on the team.
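This capacity adjustment is simple arithmetic. One way to sketch it is to assume that capacity scales in proportion to the number of people present. That proportional rule is an assumption, since the text does not prescribe a formula, and the team size of 10 is likewise assumed so that the numbers match the example.

```python
def planned_points(full_velocity, team_size, people_away):
    """Scale the full-team velocity by the share of the team that is present.

    Proportional scaling is an illustrative assumption, not a rule from XP.
    """
    if team_size <= 0 or not 0 <= people_away <= team_size:
        raise ValueError("invalid team size or absence count")
    return full_velocity * (team_size - people_away) / team_size


# Full velocity of 10, hypothetical team of 10, 2 people away next week.
print(planned_points(full_velocity=10, team_size=10, people_away=2))  # 8.0
```

In practice a team may prefer to adjust by judgement rather than by formula; the sketch only makes the reasoning behind "plan for 8 instead of 10" explicit.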

Clarity that results from learning from previous iterations or from accounting for leaves and breaks prevents us from erroneously concluding that the team hasn’t been working hard enough if our targets aren’t met. The team’s morale isn’t affected adversely and we stay focused on guiding the next iteration in the correct way, using practices that can ensure accomplishment and achievement as a team. 

Targets continuous delivery even if project is lagging

Despite the best intentions and care, the project can go off track. Imagine that it’s not possible to meet the target of 10 points per iteration. At such a time, story point estimation adopts a rational problem-solving approach instead of mindlessly indulging in the blame game. Since XP’s version of the Golden Triangle keeps Scope variable, we can proceed by going live with the most important, prioritized stories on the said date. The rest of the features can get added over the next few iterations. 

This ensures that we deliver on the committed date, even if it’s not the entire bulk of scheduled deliverables. This practice of Continuous Delivery instills confidence in the client stakeholders that there is no major delay harming the project. We’ve ensured that the most relevant features are already in production by the half-way mark, and what didn’t get completed will be delivered in the next few iterations, which will be just a couple of weeks out. We’ve succeeded in derisking the project to a large extent quite early in the game. 

Story point estimation makes life better

At the end of this series, we’ve firmly established that story point estimation and the XP way of planning make estimation easier and more accurate. They create the right atmosphere for individuals and for the team as a whole. Software quality and excellence remain the star of the show throughout. Harmful after-effects and negative behavior patterns arising out of pressure find no place in this process. Stakeholders also experience a high level of confidence as they get a clear view into the project’s progress and can take efficient and timely decisions in response to possible changes. Story point estimation is a win-win for all parties, and it’s the only effective method for project estimation and planning.

Answering the question “How fast will we go?”

To corroborate our advocacy of story points over time-based estimates, our last article recommended the use of Relative Sizing to answer the question “How big is this task?” We ended that piece by highlighting how the sizing process provides a good grip on the project’s scope. We now move ahead in the estimation and planning journey to the next question “How fast can it be done?”

“How fast?” has no concrete answer

The stark truth is that the question “How fast?” cannot really have a correct and definitive answer. The only honest answer is “I don’t know.” Many factors contribute to this ambiguity: 

  1. Team composition – How fast the team can get something done depends directly on who is on the team, what their experience with this specific technology is, and their overall exposure and experience within the given domain. 
  2. Team dynamics – However, it’s not just about putting the right team together and ticking off the technical requirements. It’s important for the team to collaborate efficiently and harmoniously. They must be able to work well together in order to get things done. 
  3. Environmental factors – The ecosystem in which the team will work also affects their speed. For example, will they use fast machines, fast internet, and fast servers that serve data quickly? Or will it be virtual networking and logging on to remote terminals that can compromise the pace of work? Even if the team has ideal experience and exposure levels, such environmental factors will have an effect. 
  4. Business stakeholders’ response time – This impacts task completion in a big way. Are clients available to answer your questions as soon as required? Can anything in their work environment lead to deprioritization of your questions? Maybe they don’t have the answers ready and need to conduct research or ask around…delaying their response time and consequently the team’s progress. 

Answering the “How fast?” question helps planning

Given the above unknowns, the only honest answer remains “I don’t know.” Any claim to a precise response should raise suspicion because nobody can really know.

If not knowing is the only thing we really know, do we still need to even address this question? And the answer is yes, we do, because its response guides the important activity of planning. Planning is a must because it gives us a chance to uncover assumptions, become aware of risks, and get a clearer picture of the reality of the project. Of course, with the caveat that the response is going to be based on guesswork. 

Keep the guesswork logical

Within this framework, we must be as logical as possible while estimating how fast we will go. That can be achieved by having more people guess how fast they can get through various items in your backlog. Ask them questions about different stories, types of technologies, the UI, database, services, and the interplay between these. Gather all these responses and average them out. That way you’ve got as many inputs and insights as possible into a wide range of relevant concerns. Averaging all that will help you make headway, but it will still be guesswork.

Raw Velocity provides a logical starting point

We recommend doing this guesswork logically by using the Raw Velocity exercise. Let’s quickly understand what it is and why we recommend it:

  1. Enlist 4 or 5 developers with diverse experiences and exposure to your technology and domain. The more diverse they are, the better the results of the guesswork will be. 
  2. Assign Developer A a batch of 10 random stories from your backlog of, say, 100 stories. Offer a mix of story sizes, features, etc. Make sure you hide the size or points. Ask the developer how many of these stories s/he can get done in 1 iteration (the timespan of an iteration is fixed, e.g., 1 week). 
  3. The developer assesses the stories and gives the response. For example, s/he could pick stories P, Q, and R from this batch, and they turn out to be 2 Smalls and 1 Medium. So Dev A can do 4 points in 1 iteration. 
  4. Give Dev A more batches of 10, and at the end, average out their responses. 
  5. Conduct similar exercises with the remaining developers.
  6. Finally average across rounds and across people–and that average is your Raw Velocity. 
  7. You then factor in other variables such as paid time off, unplanned leave, etc. and calculate the team’s planning velocity. 
  8. This is then used to create an initial plan, which is taken to the stakeholders to set expectations. You can start development with the points you get from the Raw Velocity exercise. But 2-3 weeks later, revisit these expectations, because the real picture and real feedback once you start work will provide additional, and more valuable, lessons. This initial plan will keep evolving over time based on real, observed velocity.
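The averaging in steps 3 to 6 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed tool: the `POINTS` mapping, the function names, and the sample picks are all hypothetical. Small = 1 and Medium = 2 follow from the "2 Smalls and 1 Medium = 4 points" example, while Large = 4 is an assumed value. The adjustments in step 7 (time off, unplanned leave) are deliberately left out.

```python
from statistics import mean

# Point values per size bucket. Small and Medium follow the example in
# step 3; the value for Large is an assumption made for this sketch.
POINTS = {"S": 1, "M": 2, "L": 4}


def round_points(picked_sizes):
    """Points for the stories one developer picked in one round."""
    return sum(POINTS[size] for size in picked_sizes)


def raw_velocity(picks):
    """Average per developer across rounds, then average across developers."""
    per_developer = [
        mean([round_points(r) for r in rounds]) for rounds in picks.values()
    ]
    return mean(per_developer)


# Hypothetical picks: the hidden sizes of the stories each developer chose.
picks = {
    "Dev A": [["S", "S", "M"], ["L", "M"], ["L", "S"]],  # 4, 6, 5 points
    "Dev B": [["S", "M"], ["L"], ["S", "S"]],            # 3, 4, 2 points
}
print(raw_velocity(picks))
```

Averaging per developer first keeps one developer who happened to do more rounds from dominating the result; when everyone does the same number of rounds, it gives the same answer as one overall average.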

Advantages of Raw Velocity 

While we are fully aware that Raw Velocity also entails guesswork, the process is better informed. 

  • It is an average of multiple developers picking up multiple and diverse items from the backlog and doing the math over several rounds. The average arrived at thus reflects the gut feeling of the majority of your team. 
  • Each developer is answering the considerably easier question of how many stories they can complete in an iteration. It’s about their own capabilities and not a notional response about a third person. Hence it has more potential for accuracy. 
  • This averaging exercise is done speedily: 10 rounds with each of 4 developers can be accomplished in just 2-3 hours. 
  • Since you hide the story points from each developer, you ensure that the previously assigned sizes cannot bias their answers. What you get are fresh, original estimates from each developer about the batch in each round.

Call it Estimated Velocity

Since it’s based on guesswork, it is best to call this “Estimated Velocity”. This is in contrast to the definitive sizes we got from Relative Sizing, where we recommended not using the word “estimated”.

It’s important to bear in mind that just because you guessed a number, it does not follow that it is the only correct number. Hence some of the anti-patterns in this space must be avoided, viz., labels such as “Target Velocity” or “Planned Velocity”. These create the impression that the number is a target that must be achieved. No, you’re just guessing, and the guess is bound to differ from the ground reality. Let’s just be honest and call this what it is: guessed velocity, estimated velocity, etc.

Set clear expectations 

With the understanding that this is guessed or estimated velocity, set clear expectations within your team and with your stakeholders. Emphasise that this guesswork will need to be revisited periodically–it’s more of a short-term view. Take a cue from previously completed iterations and incorporate those learnings into the next version of estimated velocity. Staying in touch with reality is what will refine your guesswork and make it more probable. Over time, your confidence about your guesses will improve too. 
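One possible way to fold completed iterations back into the estimate is sketched below. It assumes the guessed Raw Velocity is used until a couple of iterations are done, and that a plain average of the most recent iterations is used afterwards. The `warmup` and `window` parameters and their default values are assumptions for illustration; a team may well prefer judgement over a fixed formula.

```python
def estimated_velocity(raw_velocity, observed, warmup=2, window=3):
    """Return the velocity to plan the next iteration with.

    raw_velocity: the initial guess from the Raw Velocity exercise
    observed:     points actually completed in each finished iteration
    warmup:       iterations to finish before trusting real data (assumed 2)
    window:       how many recent iterations to average (assumed 3, >= 1)
    """
    if len(observed) < warmup:
        # Too little real data yet: keep using the guess.
        return raw_velocity
    recent = observed[-window:]
    return sum(recent) / len(recent)


print(estimated_velocity(10, [7]))            # still the guess
print(estimated_velocity(10, [7, 9]))         # average of real data
print(estimated_velocity(10, [7, 9, 8, 11]))  # average of the last 3
```

The result is still an estimate to be revisited every iteration, not a target to be hit.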

Conclusion

To sum it up, the only honest answer to “How fast can this be done?” is “I don’t know.” The next best thing to do is use logical guesswork to arrive at estimated velocity. Furthermore, stay aligned with the real picture and be prepared to relook at this estimated velocity as needed. 

In our next article, we’ll talk about the real show–what happens when you start actually working on the project, what kind of exceptional situations can crop up, and the resultant reactions… Keep watching this space! 

Time-based estimates are a bad idea

Time-based estimation is frequently used in software development projects, even though it is far from being accurate or efficient. Here’s a deep dive into the inherent problems that should prompt both developers and managers to avoid this approach.
