Month: January 2020

Estimation antipattern – Comparing teams’ velocities

In this article, we’re going to examine another antipattern around story point estimation. Last time we discussed the fallout of using story points as targets. Now, we caution you against using story points to compare teams. It’s a common enough trap, but we must hold ourselves back. By the end of this article, you’ll know why such comparison is a meaningless exercise. 

What’s the comparison about?

To put it simply, using story points to compare team has accusatory implications. “Hey this other team is doing more points per iteration than your team, so that makes them better.” So Team A could be doing 20 points and Team B could be completing 12 points, and because Team A’s velocity is greater, it’s the better team. When put like that, we will naturally witness Team B desperately try to outdo Team A, or at least match its velocity. End result: Pressure builds up, the environment can get ugly, team members may resort to workarounds to meet the numbers, and little thought will go toward what should really be in the focus – the quality of the software that’s going into production. 

Here’s why such comparison is meaningless 

Despite the exercise having no real meaning or value, we still find many teams engaging in it during story point estimation. We reiterate that such thinking, i.e., comparing teams on the basis of completed points, is totally flawed, and here are our reasons for taking this stance: 

Every team is unique

Each team has its own distinct personality and comes with diverse skill sets and levels of experience. No two teams are alike in their inter-personal as well as working relationships – how well they gel together, how effectively can they collaborate, etc. Then there are more factors like do they have any dependencies, what’s the environment in which they work, and so on.

All these factors contribute to the uniqueness of each team. Hence it is completely unfair and impractical to compare velocities because it would be impossible to create two teams that have the exact same skills and the same personalities and do the same kind of work in the same environment. Velocity is a reflection of many things happening within a team – As we saw above, each team comes with its unique composition, skills, experience, and dynamics. All of these have a bearing on how long that team will take to complete a story and how much they can pack into one iteration. This output will vary across teams and there is no ground for comparison.

The understanding of a point is not common across teams – The basis of comparison is itself skewed. As we saw earlier, comparison takes the form of: Team A completed 20 story points but Team B could complete only 12 in the same duration. What we fail to realize is that each team’s understanding of what makes up 1 story point is limited to that team and it will differ across teams. This understanding of how big is the backlog is not standard; it’s relative to each team. What Team A considers to be 1 point can be very different from the perspective that Team B takes. Note that 1 point is just a numeric representation for a “small” story. And 4 points would be just a numeric representation of a large story. But the buckets of what is a small story is only relevant and meaningful within the team. How then can we compare their performance and efficiency on the basis of the number of points completed?

A standard definition of 1 point doesn’t work

To counter the above problem, some folks go a step further and say everyone must agree upon a standard definition of 1 point, so everyone can align the sizes for stories. While this sounds reasonable this is very difficult to achieve in practice. The cost of the bureaucracy required to ensure this especially on a large program OR for the entire organisation is hardly worth the ability to compare teams’ velocities. 

The nature of each team’s work can be different

So let’s say there are two teams in the organization that are fundamentally doing very different kinds of work. One is building some API’s and the other is building a mobile app. It’s a no-brainer that the tech landscape and the kind of complexities faced by each team are extremely different. In such a scenario, comparing how fast they are completing their points is fundamentally meaningless!

Similar work cannot be compared either

Let’s look at cases where teams could be involved in similar work, such as both teams are building on top of a common API. One is building an Android app and the other an iOS app. It may seem like they are doing similar work, but even then, it’s not so straightforward that we can begin comparing their velocities. That’s because contributing factors such as environment, skills, etc. continue to remain different. 

Let’s look at a typical development scenario where two teams are working on the same backlog – one is onshore, the other is offshore. On one hand, we can safely assume that that the onsite time will go faster because the Product Owners are sitting next to them, providing real quick answers or feedback. On the other hand, this same situation may work to their detriment. The team members may find themselves getting distracted because the client stakeholders are right there. The offshore team doesn’t face that pressure and may be able to focus more effectively and actually end up with a higher velocity. But, truly, can we conclude that they are really more efficient? We cannot, because the comparison itself has no real meaning. 

How to avoid the antipattern of comparing teams using story points

The above reasons illustrate the futility of such a comparison. That’s why we strongly recommend against comparing teams. It’s not a valuable exercise to say one team is better than the other by itself. But if at all you have to compare or you want to compare, then track the trends of each team, such as the level of predictability, frequency of releases, or the business impact they create. 

Predictability

Have the teams set up short-term goals or iteration-level targets and then track how often each team is able to meet its targets. The targets can vary across iterations, depending on factors like yesterday’s weather. But if a team says on a Monday that in the next two-week-long iteration, they will complete 10 points, then how often do they actually achieve that completion? Or how close do they get to it every time? This level of predictability can be compared, if at all one really has to.

Frequency of releases

The more frequently the team is putting software into production and into the hands of real users, the more value they are creating. So you can compare how frequently each team is able to create such value.

Business Impact

Mature XP teams don’t even bother with intermediate output; they are more concerned with the outcome, i.e., the business impact the team is able to create. The way you do that is to have the team sign up for key OKRs. These could be key business results they want to achieve such as improving customer conversion or reducing churn in customers. That’s the business objective they work toward. And whether they deliver software to achieve that or do something else – that’s entirely up to the team.

However, this extreme practice of only measuring business outcomes may not be feasible in every software development scenario, but the takeaway from that is to not just measure output, but measure on these more meaningful aspects.

However, it’s again important to bear in mind that you shouldn’t be comparing on parameters such as the revenue each team is able to generate. That’s because each team functions in a different context and could have distinct strategies. One team’s focus could be revenue generation, another team could be driving consumer engagement–which again eliminates any common ground for comparison. So, even if we’re comparing on the parameter of business impact, the OKRs are not to be compared. How far each team is able to achieve its OKRs is more important. 

By now, you have got a chance to understand the various reasons to avoid comparing teams in itself. Comparing teams on velocity and the points they’ve delivered has absolutely no meaning. If at all you have to compare and want to compare, then you can certainly find ways to make the comparison more relevant and meaningful. The outcome of such comparisons has more value. 

How Story Points Make Our Life Better

The entire series on estimation has focused on the inherent problems of time-based estimates. We introduced and recommended story points as a more viable option to time estimates. In this series finale article, we look at story point estimation more closely and understand how it resolves the issues that crop up with time estimates. 

Creates confidence by asking the right questions

A prime concern with time estimates is that they are almost always wrong. That’s because you’re asking too complex a question–combining two aspects, viz., how big something is and how fast you can go. Story point estimation makes things easier at the very outset by separating these two questions. 

Further, it simplifies the “How big?” question by asking how big the stories are relative to a benchmark we’ve set, which is typically the smallest story in the backlog. This process of Relative Sizing or allocating a size to a story by comparing it with a standard can be completed quickly. In addition to speed, this process generates a high level of confidence in the sizes. Infact, we need not call them estimates at all; they are definite sizes. There is no scope for ambiguity and there is no need to revisit or change the sizes.  

The second of the 2 sub-questions that story points ask is “How fast can you go?”. We admit upfront that nobody really knows the answer. So we move on to the next best way to approach this question and that’s through informed and logical guesswork–using the exercise of Raw Velocity. This involves multiple developers picking up multiple and diverse items from the backlog with their estimates hidden and stating which ones they can finish in a given time period (an iteration). Then we add up the estimates for the items that were picked up to find the “gut feel” of the story points that can be completed in an iteration. We do this math over several rounds with each of the developers. The average derived thus reflects a gut feeling of the majority of the team regarding the time required to complete the given backlog. 

While it’s accepted as guesswork, these numbers evoke a higher level of confidence because the developers have answered a comparatively easier question–regarding their own capabilities and not that of someone else. Hence the answer is likely to be accurate. 

Moreover, we use the Raw Velocity numbers only for a short duration – during the first couple of iterations. After those initial iterations, we are in a position to make more informed velocity calculations because we have real data from the real work done during the first few iterations. Going forward, we use our judgement based on this data and not on the guesswork we started out with. Thus the answer to “How fast?” is now rooted in accuracy and we can proceed with confidence and conviction in our sign-up for the subsequent iteration.

Story point estimation, thus, instills confidence by asking the right questions. The responses to these questions leave no room for confusion or doubt. From one question we get definite sizes and the response to the other is grounded in real data. We can successfully avoid the feeling of being wrong–which is typically what happens in time estimation. On the contrary, we are now confident and convinced about how much the team can do especially in the short term.

Eliminates pressure

Time-based estimation also creates pressure–at the 2 stages of estimation and execution. Let’s dive deep…

During estimation

There are 2 problem-creating scenarios with time estimates during the estimation
phase:

  • The compound question that kickstarts time estimation is itself a pressure point because you’ve mixed up 2 sub-questions. In an attempt to respond to that compound question, you try to cover all possible scenarios and come up with precise numbers. Too much time is spent on seeking too many details too early on. 
  • The other problem is that you have too many options to choose from in terms of time period. This wide range of choices significantly raises the probability of errors. You may end up either over- or under-estimating. In either case, you end up feeling pressured. Typically you will react to such pressure by being extra vigilant about every possible scenario or then by adding buffers to play it safe. 

Story point estimation has no scope for such problems. Firstly, it limits the number of size options to just 3 or 4. All we’re doing is picking up each story and comparing it to a sample set in terms of size. Once we’ve assigned a size, it’s frozen. We don’t need any more details; we don’t need to revisit the sizes. The issues–and consequent pressure–of time estimates do not even exist in the story points world.

During execution

In a time estimates scenario, you may realize that the size of the task is different from what was assumed during planning, or that the pace of work is getting affected by extraneous conditions. However, you’ve already committed to getting an X amount of work done in a given time-frame. You naturally feel pressured because time is running out. 

Story points avoid this pressure by setting the right expectations from the start. We are cognizant that Raw Velocity is a guess and those numbers will change in the face of real work. We take a cue from previously completed iterations and incorporate those learnings into Estimated Velocity. The guesswork progresses into informed estimation, which definitely has more value. Over time, this boosts the team’s confidence too. 

Replaces negative pressure with positive ambition

Using Relative Sizing and Velocity, we’ve eliminated the pressure of completing an X amount of work in a given period of time. With that pressure gone, what we experience is a positive emotion and the aspiration and ambition to strive for continuous delivery and improvement. 

Having sifted out questions that have no real meaning, we zone in on the most relevant ones in our effort to achieve continuous delivery – What should we take up in the next iteration? What will be of highest value to go next? With these questions, we strive to make the delivery process better, more productive and centered around excellence. 

Empowers the team

Story points create a transition from individual performance and individual goals to team goals. The commitment now is to put out a set of features into production. The onus is on the team as a whole and so developers are likely to help each other and even help other roles to complete what has been started. There is no rush to start new items from the backlog. The emphasis is on completing what’s been started–in the right sequence, with the right quality, with mutual collaboration as a team. The debilitating competitive pressure gives way to a positive environment and a collective intention to achieve. The atmosphere is still charged with high intensity, but it’s energizing and collaborative.

Retains the focus on completion

At this point, it’s important to remember that story points work around a rule of averages without any direct conversion of points into days. If we still did the conversion in our minds , we would lose some of the benefits intrinsic to this type of estimation. There can be no blanket rule that a story of X points will always get done in Y days. Conditions are fluid and each story and each developer has their own pace. If this is accepted, then we’ve reduced some of the pressure of dates and time commitments faced by developers.

However, it’s imperative to be aware that doing away with the pressure of time commitments does not equate to development meandering along at its own pace or the absence of accountability around completion. We definitely must ask these pertinent questions, but with the intention and aim of removing impediments and achieving progress. The ticking clock must not be allowed to assume nightmarish proportions. Developers should breathe easy and strive for excellence and continuous improvement instead of struggling for self-preservation. 

Enables effective tracking

Time-based estimation poses a tracking hazard because we cannot segregate how much time was spent doing real development work and how much was lost in breaks or leaves. If we do attempt to track break and leave time, it would lead to micromanagement of the team, and we do not want to go there. 

Separately, in time estimates, there is no way to apply the learnings from previous work to what will be taken up next. For example, if one story gets delayed, we cannot predict the time completion for the next story. 

Story point estimation is able to counter such problems. Velocity is an average across multiple developers and multiple iterations. After the initial stories, the guesswork is replaced by insights gathered from real work and real data. We are in a good position to estimate how much time will be required to complete the next story, based on our learnings from previous iterations. This is valuable and it helps boost the team’s confidence in its own capabilities. 

Similarly, if we have visibility into who’s going to be away next week, we count them out of capacity and make adjustments accordingly. For e.g., if the full team’s velocity is 10, we might only plan for 8, knowing that there will be 2 less on the team. 

Clarity that results from learning from previous iterations or from accounting for leaves and breaks prevents us from erroneously concluding that the team hasn’t been working hard enough if our targets aren’t met. The team’s morale isn’t affected adversely and we stay focused on guiding the next iteration in the correct way, using practices that can ensure accomplishment and achievement as a team. 

Targets continuous delivery even if project is lagging

Despite the best intentions and care, the project can go off track. Imagine that it’s not possible to meet the target of 10 points per iteration. At such a time, story point estimation adopts a rational problem-solving approach instead of mindlessly indulging in the blame game. Since XP’s version of the Golden Triangle keeps Scope variable, we can proceed by going live with the most important, prioritized stories on the said date. The rest of the features can get added over the next few iterations. 

This ensures that we deliver on the committed date, even if it’s not the entire bulk of scheduled deliverables. This practice of Continuous Delivery instills confidence in the client stakeholders that there is no major delay harming the project. We’ve ensured that the most relevant features are already in production by the half-way mark, and what didn’t get completed will be delivered in the next few iterations, which will be just a couple of weeks out. We’ve succeeded in derisking the project to a large extent quite early in the game. 

Story point estimation makes life better

At the end of this series, we’ve firmly established that story point estimation and the XP way of planning makes estimation easier and more accurate. It creates the right atmosphere for individuals and the team as a whole. Software quality and excellence remain the star of the show throughout. Harmful after-effects and negative behavior patterns arising out of pressure find no place in this process. Stakeholders also experience a high level of confidence as they get a clear view into the project’s progress and can take efficient and timely decisions in response to possible changes. Story point estimation is a win-win for all parties, and it’s the only effective method for project estimation and planning. 

The XP Game

Let’s begin this article with a recap of what we’ve covered. We started out with the argument against time-based estimates. We then sought to answer the question “How big something is?” using story points and relative sizing. Next, we attempted to meaningfully respond to “How fast can we go?” and explored the concept of velocity in that context. Now, we move on to “When will all of this get done?”

The question is not meaningful

While we may be eager to know the answer, the question “When will all of this get done?” is not meaningful and needs to be rephrased for 2 reasons:

  1. While Raw Velocity is a logical starting point to estimate how fast you will go, the averages derived from it remain guesswork. It offers us a reasonable starting point that must be periodically revisited once actual work commences. The numbers are bound to change in real work situations and that presents a challenge. To illustrate this: If your scope is 200 points and the Raw Velocity is 10 points per iteration, then it’s safe to say it will take 20 iterations to complete the entire scope. But once you start working, you may realize that this 10 itself is going to change–it may come down to 8 or 5, or even go up to 12. This fluctuating number affects the response to “When will all of this get done?”
  2. Moreover, the “all” in question, i.e., the scope, is also variable and therein lies the second challenge. You may have estimated 200 points as scope while initiating the project. But dynamic situations in the market or the organisation can impact this number. The scope is very likely to change over the course of the project and it’s prudent to proceed with that awareness.

So, what we know is that neither the scope nor the velocity will remain constant. Hence, the question “When will all of this get done?” has no real value. 

What, then, should we be really asking?

The Triple Constraints Triangle

Before we frame the right question, let’s take a step back to understand the oft-cited paradigm of project management – the Triple Constraints Triangle – and how XP connects with it. Scope, Time and Cost are the 3 points of this triangle, with Quality as the central, overarching theme. What happens if we arrange these 3 constraints into different permutations? 

  • Flexible time, fixed scope and cost – This is not advisable because we could spend years building out “all” the (ever increasing) scope we can think of, but not be able to reap the benefits of the software we are building. This has been the fate of many-a-traditional software projects.
  • Flexible cost, fixed scope and time – Keeping cost flexible essentially means agreeing to varying the team size in response to variations in scope or velocity. However, growing and shrinking the team is neither feasible nor useful.One of the most important ingredients of a successful software team is the context and the knowledge that they build over a period of time about the problem at hand and the solution. A team that’s constantly changing its composition will not be able to take advantage of such a benefit.  
  • Everything fixed – The biggest casualty in this scenario would be quality because if everything is fixed, quality is the only thing that can change and it will change. 

The XP view of Triple Constraints Triangle

The third permutation is typical to traditional software development. XP, however, takes a different view. We advocate keeping quality fixed at the highest possible level as well as freezing time and cost, and keeping scope as the only variable. Freezing the team size (i.e., the cost) is age-old wisdom that adding people to a late software project makes it later. And, time must be frozen too and you should commit to yourself and to your stakeholders that you will go live by a certain date. 

However, be mindful that the concept of “freezing time” in XP refers to freezing time in multiple small chunks or iterations. At the beginning of each iteration, ask yourself how much will you get done in that period. And at the end of each iteration, be sure to deliver something into production. 

The meaningful questions to ask

Having understood and accepted this XP-variant of the Triple Constraints Triangle, we frame the right and relevant questions that you should really be concerned about: 

  1. “How much are you able to do in each iteration?”
  2. “What should you do in the next iteration?”

Now the focus shifts to prioritization. Instead of debating over the timeframe for the entire project completion and then scrambling about to achieve that, you are now asking what should be completed next. You are systematically breaking down “all” the scope into small releases and iterations. By asking yourself the above 2 questions regularly and frequently, say every 2 weeks, you nail down a priority list. This also helps in appropriately spreading work out over iterations. 

The correct questions reduce pressure

The exercise of Raw Velocity–as we saw in previous articles–provides a good starting point. Since it’s an average derived from multiple developers across multiple stories, it instills some confidence even if it’s guesswork. As you progress with these numbers, the guesswork gets replaced by real numbers derived from real data. The first few iterations give a more realistic estimate of how many story points can be completed per iteration, and this Actual Velocity–based on real experience–guides future iterations. 

Having understood how much can get done in each iteration, you can now prioritize what to take up in the next iteration. For example, if the last 3 iterations showed you that it’s possible to complete 10 points per iteration (that’s the response to our first question), you can then decide which 10 points to take up next (the response to the second question). Such prioritization helps you move the most important features ahead in production. And this momentum of continuous delivery ensures the project doesn’t lag behind terribly or goes completely off-track. 

Here’s a practical scenario borrowed from the real world to illustrate the above: 

We’d said we’d go live with a set of features in 3 months. At the end of 3 months, we’ve built 80% of the features. It’s imperative to now go live with that 80%. The good part is because we have effectively prioritized what needs to be taken up in every iteration, we have covered the most important aspects of those features in that 80% and in the estimated 3 months. The remaining 20% will get covered in the next few weeks or so. The client doesn’t have to choose between “We’ll go live with everything or nothing”. They feel assuaged that the most important 80% is out there already and the remaining pieces will be put out shortly. Likewise, the team doesn’t feel pressured about performance or productivity, keeping the environment positive and collaborative.    

Prioritization enhances outcome

We reiterate that such prioritization and focus on “What to do in the next iteration?” also ensures that you take the most valuable pieces into production first. What needs to get delivered on priority gets picked up on priority. When you’re thinking about what should go into production next, here are 2 useful approaches: 

  1. Learn from software that’s been put into production. For example, you have previously built a feature to increase conversion rate of your customers. Now, you may want to evaluate the progress of that metric and assess how it can be made more effective. So you can pick that up as the next piece to deliver based on what’s already been done. 
  2. Get early feedback from real users and use that as a guide. Build prototypes to carry out user testing with real users–even before you write a single line of code. Analyze its results to determine what your users really want and schedule that as your next to-do item.

It’s important to note that we are making space for new scope. We’ve accepted that scope will change and we welcome that. Based on yesterday’s weather – average of actual velocity from the last few iterations, we can go back and identify what to pick up next. We may even end up identifying new features to build in the next iteration based on user/market response. Our process of prioritization and our Velocity numbers give us room to be flexible in order to deliver what is critical and remain responsive to what users want or what the market needs. In a nutshell, we’ve ensured the outcome is truly effective and efficient. 

Summary

Let’s pause at this point for a quick review. 

“When will all of this get done?” is really not the question you should fuss about. Instead, you should try to answer the question “How much are we actually able to do every iteration?” and then concentrate on prioritising the most valuable functionality at the beginning of each iteration.  We recommend keeping scope flexible, while freezing time, cost, and most importantly, quality. Freezing time in XP refers to the concept of of iterations–with the emphasis being on continuous release into production at the boundary of each iteration or even more frequently if possible. This practice of Continuous Delivery works hand in hand with Continuous Discovery–as we investigate and assess the next valuable thing to take up next. And to do that right, we either learn from software that’s already in production or use feedback from user testing. 

Going forward, we will see how this XP perspective of software development and project management successfully overcomes many of the typical hurdles that are common in the traditional development space that uses time-based estimates.