3. Key takeaways
• Why is it important to measure?
• What can we learn from these metrics?
• How do we use the data responsibly?
4. “When a measure becomes a target, it ceases to be a good measure” – Goodhart’s Law
“Measures tend to be corrupted/gamed when used for target-setting” – Campbell’s Law
“Monitoring a metric may subtly influence people to maximize that measure” – The Observer Effect
(via https://www.industriallogic.com/blog/what-should-we-measure/)
8. Acquisition
• WHAT: # new users in a given period
• WHY: Upward trends are indicators of positive user engagement
• HOW: Visitor data
9. Activation
• WHAT: A measure of how users are engaging with your product for the first time
• WHY: Focus on making our product more engaging to new users and converting them into loyal users
• HOW: Track multiple page visits, new account sign ups
10. Retention
• WHAT: A measure of how often and for how long our customers are coming back
• WHY: Another measure of even higher engagement
• HOW: Repeat visits, length of session
11. Referral
• WHAT: Measuring how many customers come to us through referrals from existing customers
• WHY: People will generally only refer a product that they find valuable and/or love, so referrals are
a good indicator that we are building a high value product
• HOW: Dependent on referral mechanism (for example, surveys, email clickthroughs)
12. Revenue
• WHAT: The amount of revenue that can directly be attributed to users of your product
• WHY: If our products are successful, they should generate revenue.
• HOW: Track all revenue-generating activities, such as product purchases or licences purchased
(Note, revenue may be replaced by “some benefit” if you are a team that builds internal products)
16. Why is flow important?
• Predictability
• Fewer bottlenecks
• Frequent delivery of value to the customer
• Easier to manage capacity
17. How can we measure flow?
• Cycle and lead time
• Queue sizes
• Takt time
• Throughput
• Cumulative flow
• Process control charts
18. Process Control Charts
• WHAT: Measuring and visualising the uniformity of your cycle time.
• WHY: Predictability, drive conversations around outliers (mura).
• HOW: Use process control charts from your cycle time data.
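A minimal sketch of the calculation, assuming cycle times are recorded as a plain list of days per story (the data below is hypothetical); it uses the common mean plus/minus three sigma control limits and flags the outliers worth a conversation.

    from statistics import mean, stdev

    # Hypothetical cycle times in days, one entry per completed story.
    cycle_times = [3, 5, 4, 6, 4, 12, 5, 3, 4, 5]

    avg = mean(cycle_times)
    sigma = stdev(cycle_times)
    ucl = avg + 3 * sigma          # upper control limit
    lcl = max(avg - 3 * sigma, 0)  # lower control limit (days cannot be negative)

    print(f"mean={avg:.1f}d  UCL={ucl:.1f}d  LCL={lcl:.1f}d")

    # Points outside the limits are the outliers (mura) to talk about.
    for story, ct in enumerate(cycle_times, start=1):
        if not lcl <= ct <= ucl:
            print(f"story {story}: {ct}d is outside the control limits")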
20. Cumulative Flow
• WHAT: Number of stories in each lane, over time.
• WHY: It highlights when there are bottlenecks in the process.
• HOW: Count cards in each step daily.
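A minimal sketch of the daily count, assuming each card is tagged with its current lane (the board below is hypothetical); appending one snapshot per day builds the data behind a cumulative flow diagram.

    from collections import Counter
    from datetime import date

    # Hypothetical board state: one entry per card, tagged with its lane.
    board = ["todo", "todo", "todo", "dev", "dev", "test", "done", "done"]

    # Take one snapshot per day and append it to a persisted history.
    snapshot = {"date": date.today().isoformat(), **Counter(board)}
    cfd_history = []  # in practice, load and save this between runs
    cfd_history.append(snapshot)

    print(cfd_history[-1])
    # A lane whose count keeps growing faster than the lanes after it
    # is the bottleneck the diagram will highlight.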
23. What is waste?
• Any activity leading to a suboptimal system
• An activity that does not add to the value stream
• 3 Lean wastes: muda, mura and muri
24. How can we measure waste?
• Value stream mapping
• Cycle time throughout the value stream, especially wait times
• Failure demand
25. Failure Demand
• WHAT: Identify and visualise the time spent on failure demand work (bug fixing, production
support issue, unnecessary rework).
• WHY: Identify areas of improvement by highlighting areas of concern.
• HOW: Simple timers when working on certain activities.
26. "You could have a timer go off a dozen times per day and record whether you
were a) wasting time, b) working on new stuff, or c) fixing stuff that should have
been done correctly the first time. If the team does that for a week, a picture will
start to emerge. After a month, you'll know how much of your capacity is spent on
failure demand”
Kent Beck
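A minimal sketch of summarising such samples, assuming each timer answer is logged as a simple category string (the week of data below is hypothetical).

    from collections import Counter

    # Hypothetical week of self-sampled answers, a dozen timer prompts a day:
    # "new" = new work, "failure" = fixing what should have been done
    # correctly the first time, "waste" = time wasted.
    samples = ["new"] * 33 + ["failure"] * 19 + ["waste"] * 8

    counts = Counter(samples)
    for category, n in counts.most_common():
        print(f"{category:>8}: {n}/{len(samples)} ({n / len(samples):.0%})")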
28. A better way to measure progress…
• Visualising progress in a post burn-up/down world
• Providing the “full picture” to all stakeholders
• Release confidence, risk management, deployment cadence
29. Release Confidence
• WHAT: A quantifiable measure of how confident the team is of releasing the next version of a
product within defined time frames.
• WHY: It replaces the concept of a deadline with a conversation about how confident we are as a
team that we will be ready to release. It drives conversations when dissonance exists: what does
person A know that makes them far less confident than person B?
• HOW: Vote on confidence based on bands (e.g. 0 - 4 weeks, 4 - 8 weeks etc). Visualise reasons for
change, such as increased understanding leading to broader scope.
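A minimal sketch of tallying the vote, assuming each person simply names the band they believe the release falls in (the votes below are hypothetical).

    from collections import Counter

    # Hypothetical confidence votes, one band per team member.
    bands = ["0-4 weeks", "4-8 weeks", "8-12 weeks", "12+ weeks"]
    votes = ["0-4 weeks", "0-4 weeks", "4-8 weeks", "0-4 weeks", "8-12 weeks"]

    tally = Counter(votes)
    for band in bands:
        print(f"{band:>10}: {'#' * tally[band]}")

    # Votes spread across distant bands signal dissonance: time to ask what
    # person A knows that makes them far less confident than person B.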
31. Risk Radiators
• WHAT: A collaborative approach to triaging, assessing and mitigating risks combined with metrics
to watch for trends.
• WHY: Managing risks early is extremely important. Upward trends in high priority risks may be a
sign to reset.
• HOW: Visualise risks on the wall, identify risks early (e.g. during inception) and review regularly.
Identify severity and likelihood. Track mitigation efforts. Visualise trends.
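A minimal sketch of the scoring behind the radiator, assuming a simple 1-5 severity and likelihood scale (the register entries and the threshold of 12 are hypothetical choices).

    # Hypothetical risk register entries on a 1-5 severity/likelihood scale.
    risks = [
        {"name": "key dependency unavailable", "severity": 4,
         "likelihood": 3, "mitigated": False},
        {"name": "unclear compliance rules", "severity": 5,
         "likelihood": 2, "mitigated": True},
    ]

    for r in risks:
        r["score"] = r["severity"] * r["likelihood"]  # simple triage score

    open_high = [r for r in risks if not r["mitigated"] and r["score"] >= 12]
    print(f"open high-priority risks: {len(open_high)}")
    # Record this count at each review; an upward trend may be a sign to reset.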
36. What do we mean by TEAM HEALTH?
• How engaged is the team with the work and with each other?
• Are we creating a happy and safe environment?
• Are there underlying issues we are not addressing?
• Do we have time for learning and continuous improvement? Are we managing “muri”?
37. How can we measure team health?
• Health checks
• Safety and happiness indicators
• Feedback health
• Slack time
• Continuous improvement actions
38. Team Health Check
• WHAT: A more detailed analysis of factors that contribute to the overall health of the team.
• WHY: We can target specific areas for kaizen and track how we are improving.
• HOW: Spotify health check, for example. Adapt based on the team’s needs. Remember it’s about the
data AND the conversations.
40. Slack Time
• WHAT: Identify and visualise the time spent on learning and improving.
• WHY: Slack time is essential for continuous improvement and innovation.
• HOW: Track activities. Team surveys.
42. Driving a focus on quality
• Lack of quality increases failure demand
• How confident are we in our code and systems?
• How robust are our incident response processes?
44. System Health
• WHAT: Measurements that indicate the health of our systems in production, such as load times.
• WHY: Monitoring system health allows us to be proactive in ensuring a good user experience. For
example, if we see page load times trending up, we can assess and fix before users contact us.
• HOW: Use automated tools such as New Relic, but most importantly, visualise these metrics in your
team area.
45. System Confidence
• WHAT: A shared understanding, often subjective but always collaborative, of how confident we are in
our ability to maintain and build new features upon our systems.
• WHY: It is a tool to help us prioritise work and identify areas of concern that we may need to address
(such as an increase in technical debt).
• HOW: Any form of health check on system confidence would be sufficient.
46. Measuring our incident responses
• Number of production incidents
• How long it takes us to detect an incident
• How long it takes us to resolve an incident
47. Mean Time to Detect (MTTD)
• WHAT: The mean time taken by the team to detect a production incident.
• WHY: The less time it takes to detect a production incident, the less likely it is to severely impact
users, revenue and/or business reputation, because resolution can begin sooner. Always run
post-incident reviews with an aim to generate actions that lower MTTD.
• HOW: The best time to gather this information is during a post-incident review by doing a timeline
exercise. Keep track of the time taken to detect and note that it may vary depending on incident
type.
48. Mean Time to Resolve (MTTR)
• WHAT: The mean time taken by the team to resolve a production incident.
• WHY: Quicker resolution times can usually reduce the impact an incident may have, so track this
with the aim of trending downward. Look for outliers and call these out in retrospectives.
• HOW: Gather this information during post-incident reviews by doing a timeline exercise.
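A minimal sketch of both calculations, assuming the timeline exercise yields occurred/detected/resolved timestamps per incident (the records below are hypothetical; MTTR here is measured from detection to resolution).

    from datetime import datetime
    from statistics import mean

    # Hypothetical timeline records gathered during post-incident reviews.
    FMT = "%Y-%m-%d %H:%M"
    incidents = [
        {"occurred": "2024-03-01 09:00", "detected": "2024-03-01 09:40",
         "resolved": "2024-03-01 12:00"},
        {"occurred": "2024-03-10 14:00", "detected": "2024-03-10 14:05",
         "resolved": "2024-03-10 15:30"},
    ]

    def minutes_between(start, end):
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        return delta.total_seconds() / 60

    mttd = mean(minutes_between(i["occurred"], i["detected"]) for i in incidents)
    mttr = mean(minutes_between(i["detected"], i["resolved"]) for i in incidents)
    print(f"MTTD: {mttd:.0f} min   MTTR: {mttr:.0f} min")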
50. Cycle Time
• WHAT: How long a “unit of work” takes to move through a process step. In most cases, we are
referring to the number of days a user story takes to go from “dev start” to done.
• WHY: Reduce cycle time to shorten the feedback loop.
• HOW: Dot the card (this has multiple benefits) and record the number of days when done.
51. Cycle Time (rolling average)
• WHAT: Average cycle time over a shorter (rolling) period.
• WHY: A long-term, overall average can “smooth out” trends and hide areas of concern.
• HOW: Calculate average CT only for the last x-weeks.
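A minimal sketch of the comparison, assuming each completed story is logged with its done date and cycle time (the log and the four-week window are hypothetical choices).

    from datetime import date, timedelta

    # Hypothetical completion log: (date done, cycle time in days) per story.
    completed = [
        (date(2024, 5, 1), 4), (date(2024, 5, 6), 6), (date(2024, 5, 13), 3),
        (date(2024, 5, 20), 9), (date(2024, 5, 27), 5), (date(2024, 6, 3), 4),
    ]

    WINDOW_WEEKS = 4  # the "last x weeks"; pick what suits your team
    cutoff = max(done for done, _ in completed) - timedelta(weeks=WINDOW_WEEKS)

    overall = [ct for _, ct in completed]
    recent = [ct for done, ct in completed if done >= cutoff]
    print(f"overall average: {sum(overall) / len(overall):.1f}d")
    print(f"rolling {WINDOW_WEEKS}-week average: {sum(recent) / len(recent):.1f}d")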
52. Takt Time
• WHAT: The time between starting new work.
• WHY: Another way of measuring predictability and flow and highlighting issues using trends.
• HOW: Record dates on which new work is started. This can be done at various levels of
granularity (for example, user stories, epics, features, initiatives)
53. Throughput
• WHAT: Number of stories completed in a given time period.
• WHY: Helps us plan by measuring consistent flow / rate of work.
• HOW: Date a card when completed and derive a count for a period, e.g. stories per month
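A minimal sketch of both takt time and throughput from dated cards (the dates below are hypothetical).

    from datetime import date

    # Hypothetical dates: when each story was started and when it was done.
    started = [date(2024, 5, d) for d in (1, 3, 4, 8, 9, 14)]
    completed = [date(2024, 5, d) for d in (6, 9, 10, 15, 16, 21)]

    # Takt time: days between successive starts; watch the trend over time.
    takt = [(b - a).days for a, b in zip(started, started[1:])]
    print(f"takt times (days): {takt}")

    # Throughput: stories completed in a period, e.g. per month.
    may_done = [d for d in completed if d.month == 5]
    print(f"throughput for May: {len(may_done)} stories")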
54. Work-In-Progress Limits
• WHAT: Limit the number of stories in each step (lane) and track when and why limits are exceeded.
• WHY: Improve flow and reduce context switching.
• HOW: Use cumulative flow diagrams to inform WIP limits.
55. Lead Time
• WHAT: Amount of time it takes a unit of work to go through the entire value stream.
• WHY: Gives us an insight into our value stream and can highlight areas of waste. We can watch
for issues such as long queues or slow deployment processes.
• HOW: Track time from idea to market, for example, date the card when you write it and add it to
the backlog and date it when it is in production.
56. Queue Size
• WHAT: The amount of work sitting in queues, such as backlogs.
• WHY: Drives prioritisation conversations. Aim to minimise the size of queues (both quantity of
work and the length of time it takes to pass through).
• HOW: Count cards. Date when they are added to backlog.
57. Time Between Production Deployments
• WHAT: Time between deployments to production.
• WHY: Working software is the primary measure of progress, so we want to see it put in front of
users as often as possible. Is it frequent enough for us? If time is trending upward, is it a symptom
of batch sizes being too large?
• HOW: Track date (and time) between production deployments and visualise means and trends
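A minimal sketch, assuming production deployment timestamps are logged somewhere queryable (the timestamps below are hypothetical).

    from datetime import datetime
    from statistics import mean

    # Hypothetical production deployment timestamps.
    deploys = [
        datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 8, 15, 30),
        datetime(2024, 5, 20, 9, 0), datetime(2024, 6, 5, 14, 0),
    ]

    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(deploys, deploys[1:])]
    print(f"gaps between deploys (days): {[round(g, 1) for g in gaps]}")
    print(f"mean gap: {mean(gaps):.1f} days")
    # A mean gap trending upward may be a symptom of batch sizes growing too large.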
58. Team Happiness Indicator
• WHAT: A simple indicator of how happy people are in the team.
• WHY: We want the team to be happy as it helps us do our best work. Are people in the team
enjoying what they do? Are they fulfilled? Do they have any unmet needs?
• HOW: Simple traffic light, ad-hoc or at regular intervals, anonymous or otherwise (depending on
safety).
60. Team Safety Indicator
• WHAT: A simple indicator of how safe people feel to speak openly within the team, to speak their
minds and voice concerns.
• WHY: If people do not feel safe to speak their minds, we will struggle to improve as a team, and
people will feel unhappy and stressed.
• HOW: Team safety check (1 - 5). This could be done as part of retrospectives.
62. Feedback Health
• WHAT: Measuring how often people in the team are giving and receiving constructive feedback.
• WHY: A healthy feedback culture is integral to an environment of continuous improvement.
• HOW: Feedback matrix.
64. Retrospective Actions
• WHAT: Tracking the number of retro actions completed as a percentage of retro actions raised.
• WHY: Measure the effectiveness of retrospectives (large, unactionable results from retros will lead to
disengagement and loss of continuous improvement opportunities).
• HOW: Tracking retro actions raised and completed (e.g. retro action kanban).
Note, this is not highly useful on its own, but could be a leading indicator for team health problems.
65. Test Coverage
• WHAT: A measure of the degree to which your code base is covered by tests, usually as a percentage
of the total code.
• WHY: Testing allows your team to make changes to code with a higher level of confidence. Test
coverage will tell you how much of your code is tested (or untested) but won’t tell you about the
quality of these tests.
• HOW: Many automated tools will calculate code coverage. Often teams will agree on a minimum
level.
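A minimal sketch of enforcing such an agreed minimum with coverage.py's Python API (the 80% threshold is a hypothetical team agreement; many teams get the same result from their CI tool's built-in threshold instead).

    import coverage

    cov = coverage.Coverage()
    cov.start()
    # ... run your test suite here (e.g. via unittest) ...
    cov.stop()
    cov.save()

    total = cov.report()  # prints a per-file table and returns the total percentage
    MIN_COVERAGE = 80     # hypothetical team-agreed minimum
    if total < MIN_COVERAGE:
        raise SystemExit(f"coverage {total:.1f}% is below the agreed {MIN_COVERAGE}%")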
66. Number of Production Incidents
• WHAT: A count of production incidents (e.g. bugs, security incidents, outages).
• WHY: An upward trend in the number of production incidents is often a trailing indicator of potential
quality issues in our systems. We should be aiming to always trend down.
• HOW: Keep track of production incidents (date occurred) and visualise trends/counts/days since last
incident.
Editor's Notes
Metrics are integral to the BUILD-MEASURE-LEARN cycle
How can we experiment when we don’t have data to tell us if our hypothesis was wrong?
“Measure what is important, don’t make important what you can measure” Robert McNamara
Measure for your TEAM not your MANAGER (but keep your manager very happy as a result!)
Pick and choose metrics to suit your context.
Don’t measure everything.
Use the data as indicators and “dials to turn”, not performance targets or mechanisms to compare teams.
It is important to note that most team metrics are about TRENDS in the data more so than the raw values.
For example, a cycle time of 5 days may be working perfectly for a team, while a cycle time of 1 day is reasonable for another team.
Developed by Dave McClure in the context of startups but we can apply it to all teams delivering value to customers.
Particularly in the world of product development we often ask our teams to “act like start ups”
How do we find our ‘revenue’ if we don’t have an immediate impact on financials?
A delivery pipeline should be predictable in its flow and cycle time is one of the major influences on flow.
Mura is the waste of unevenness, one of the 3 M’s in Lean: mura (unevenness), muri (overburden) and muda (any activity that doesn’t add value).
Things to call out:
Lack of uniformity at start while team is forming
Outliers drive conversation, inspect-and-adapt
Muda is any wasteful activity. We can think about non-value add activities (with caveats). This is what we’re focusing on in the next couple of slides.
Mura is waste due to unevenness (lack of predictable flow) which we have touched on with metrics around flow.
Muri is the waste of overburden (unnecessary stress) which we will address shortly.
Things to call out:
When there are large discrepancies between team members
Don’t just harvest the data, talk about it!
Do it frequently and keep track of changes
Where cycle time can tell us about the size of our user stories to help us slice, lead time tells us more about our delivery pipeline and impediments to delivery