Breaking changes are sad. We’ve all been there: someone else changes their API in a way you weren’t expecting, and now you have a live-ops incident you need to fix urgently to get your software working again. Of course, many of us are on the other side too: we build APIs that other people’s software relies on. There are many strategies to mitigate breaking changes (think SemVer or API versioning), but they all rely on the API developer identifying which changes are breaking.
That’s the bit we’ll focus on in this talk: how do you (a) get really good at identifying which changes might break someone’s integration, (b) help your API consumers build integrations that are resilient to these kinds of changes, and (c) release potentially breaking changes as safely as possible?
5. @paprikati_eng
Adding a mandatory field to an endpoint
Breaking apart a database transaction
Introducing a rate limit
Changing an error response string
Changing the timing of batch processing
Reducing the latency on an API call
How do assumptions develop?
01 Documentation
02 Support Articles & Blog Posts
03 Ad Hoc Communication
04 Industry Standards
05 Observed behaviour
Avoiding bad assumptions
01 Documentation
02 Support Articles & Blog Posts
03 Ad Hoc Communication
04 Industry Standards
05 Observed behaviour
Avoiding bad assumptions
Given that many integrators just look at the HTTP examples, naming is critical
Deliberately call out tripwires in your docs to combat pattern matching
Restrict and document your behaviour as explicitly as you can
Releasing a potentially breaking change
01 Pull Comms: updating docs or a changelog
02 Push Comms: newsletter or email to integrators
03 Ack’d Comms: wait for a positive response from integrators before rolling out a change
Escalate from 01 to 03 with the likelihood of a change being breaking
Can you make the change incremental?
Can you release the change into a test environment?
Can you easily roll back if there are unexpected consequences?
Releasing a potentially breaking change
CREDITS: This presentation template was adapted from a template by Slidesgo, including icons by Flaticon, and images by Unsplash
Editor’s Notes
I’m Lisa Karlin Curtis, born and bred in London.
I’m a software engineer at GoCardless working in our core-banking team.
I’m gonna be talking about how to stop breaking other people’s things
We’re going to start with a sad story.
A developer notices that they have an endpoint that has a really high latency compared to what they’d expect.
They find a performance issue in the code (essentially an exacerbated N+1 problem), and they deploy a fix.
The latency on the endpoint goes down by a half. The developer stares at the beautiful graph with a lovely cliff shape, feels good about themselves, and moves on.
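The kind of N+1 fix described might look like the sketch below. This is a hedged illustration, not GoCardless code: the toy `CountingDB` store and its `fetch_one`/`fetch_many` helpers are invented here purely to make the query counts visible.

```python
class CountingDB:
    """Toy in-memory payments store that counts queries,
    just to make the N+1 pattern visible."""
    def __init__(self, statuses):
        self.statuses = statuses  # {payment_id: status}
        self.queries = 0

    def fetch_one(self, payment_id):
        self.queries += 1
        return self.statuses[payment_id]

    def fetch_many(self, payment_ids):
        self.queries += 1
        return {p: self.statuses[p] for p in payment_ids}


def statuses_n_plus_one(db, payment_ids):
    # Before the fix: one query per payment, so N round-trips.
    return [db.fetch_one(p) for p in payment_ids]


def statuses_batched(db, payment_ids):
    # After the fix: a single query fetches every payment at once.
    found = db.fetch_many(payment_ids)
    return [found[p] for p in payment_ids]
```

Both functions return the same statuses; the batched version just collapses N round-trips into one, which is exactly the kind of change that halves an endpoint’s latency.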
Somewhere else in the world, another developer gets paged - their database CPU usage has spiked and it is struggling to handle the load.
So what happened here?
They start investigating - there’s no obvious cause. No recent changes, request volume is pretty much as expected.
They start scaling down queues to relieve the pressure, which solves the immediate issue. The database seems to have recovered.
Then they notice something strange. They’ve suddenly started processing webhooks much more quickly than they used to.
It turns out that our integrator had a webhook handler that would receive a webhook from us and then make a request back to find the status of the resource.
This was the endpoint that we had fixed earlier that day.
By the way, I’m going to use the word integrator a lot - what I mean is people who are integrating against the API that you are maintaining.
Sometimes that will be inside your company, sometimes it will be a customer.
Back to the story.
That webhook handler spent most of its time waiting for our response, before then updating its own database.
So the slow endpoint was essentially rate limiting the webhook handler’s interaction with its own database.
It’s worth noting that our webhooks are often a result of batch processes, so they are really spiky - we send lots of them in a short space of time, a couple of times a day
As the endpoint got faster, during those spikes, the webhook handler started to apply more load to the database than normal, to such an extent that an engineer got paged to resolve a service degradation.
The fix here is fairly simple: scale down the webhook handlers so they process fewer webhooks and the database usage returns to normal.
Or alternatively, beef up your database.
This shows us just how easy it is to accidentally break someone else’s thing - even if you’re trying to do right by your integrators.
When do we break things?
To set the scene, here are some examples of changes that have broken code in the past:
Traditional API changes - adding a mandatory field, removing an endpoint, changing validation logic - I think we’re all comfortable with this stuff
Introducing a rate limit / changing your rate limiting logic - docker did this recently and I think communicated really clearly, but it obviously impacted lots of their integrators
Changing an error string: At GoCardless we found a bug where we weren’t respecting the accept-language header on a few of our endpoints, and we fixed it, and one of our integrators raised a ticket saying that we’d broken their software - it turned out they were relying on us not translating that particular error.
Breaking apart a database transaction
Changing the timing of your batch processing
We can see from our logs that certain integrators create lots of payments ‘just-in-time’ - i.e. just before our daily payment run, so we know that changing our timings without communicating with them would cause significant issues
Reducing the latency on an API call
END SLIDE at about 5-6 mins
I’m gonna define a breaking change as something where I (the API developer) do a thing and someone’s integration breaks.
And that happens because an assumption made by that integrator is no longer correct.
When this happens, it’s easy to criticise the engineer who made that assumption.
Assumptions are inevitable - as a developer you really can’t get anywhere without them
Even if it is their fault, it’s often your problem. Possibly not if you’re Google or AWS (unless it’s Slack that you’ve killed), but for most companies, if your integrators are feeling pain then you’ll feel it too, either immediately or in the long term when you’re trying to renew contracts.
There are a few different ways that assumptions develop
Some of these are explicit: an integrator asks a question, gets an answer, and builds their system based on that answer.
The first step when you’re building an integration is often to look at the documentation.
Although it’s worth noting that people often skip to the examples and don’t actually read any of the text that you have slaved over, so you really need to make sure that your examples are genuinely representative.
They might also look at support articles and blog posts - either stuff you’ve published
Or maybe from a third party.
And then you have ad hoc communication
So what I mean by this is random emails or phone calls, maybe with a pre-sales team or your solution engineers,
it might be a conversation that gets had on a support ticket.
It might be emailing the friend that you have that used to work at the company
and all of that kind of ad hoc communication is still driving the assumptions that integrators make about how your software is going to behave.
Other assumptions are more implicit.
Industry standards are quite interesting: if you send me a JSON response, you’re going to give me an application/json Content-Type header.
So I don’t need to tell my HTTP client that it’s going to be JSON because it can work that out for itself, and as an integrator I’m going to assume that never changes.
Similarly, I assume that you will keep my secrets safe.
So if you tell me my access token was used to create something, I’ll assume it was me.
Generally this stuff is fine, but in some cases you can find yourself in trouble if these standards change
We had a really bad incident where we upgraded our HAProxy version, which was observing the new industry standard
And downcased all our outgoing HTTP headers.
According to the spec, HTTP header names should not be treated as case sensitive,
but a couple of key integrators had been relying on the previous behaviour and had a significant outage.
And that outage was actually exacerbated by the fact that their requests were being processed but they weren’t processing our responses,
and that meant that we had two systems that were out of sync in a really unfortunate way.
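A defensive, case-insensitive header lookup on the integrator side would have absorbed that change. A minimal sketch, assuming headers arrive as a plain dict (many real HTTP libraries already expose case-insensitive header mappings):

```python
def get_header(headers, name):
    """Case-insensitive header lookup: HTTP field names are defined
    as case-insensitive, so 'Content-Type' and 'content-type' must
    be treated as the same header."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get(name.lower())
```

With this in place, a proxy downcasing every outgoing header is invisible to the consumer.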
Observed behaviour
Skip to next slide!
As an integrator, you want the engineers who run the services you use to be constantly improving them and adding features,
but in a way you also want them to never touch anything, so you can be sure the behaviour won’t change.
As soon as a developer sees something, whether that’s
An undocumented header on an HTTP response
A batch process that happens at the same time every day
A particular API latency
They assume it’s reliable and build their systems accordingly.
Humans also pattern match really aggressively - not just in software but in all walks of life.
We find it very easy to convince ourselves that correlation = causation
And that means particularly if we can come up with an explanation of why A always means B, we are quick to accept and rely on it.
When you think about it, this is a bit bizarre - we are all employed to make changes to our own systems,
We should understand that they are constantly in flux.
We also all encounter interesting edge cases every day where someone has hit some incredibly unlikely scenario that’s caused your code to misbehave.
But we all assume that everyone else’s will stay exactly the same forever.
T-15 mins
None of this stuff is new. A great example of this is MS-DOS.
MS-DOS was released with a number of documented interrupts, calls, hooks - all that retro stuff - but early application developers found that they weren’t able to achieve everything they wanted.
This was made worse because Microsoft would use undocumented calls in their own software, so it was impossible to compete using only what was in the documentation.
So like all good engineers, they started decompiling the OS, and writing lists of undocumented information like Ralf Brown’s Interrupt List.
This information was shared, and using these undocumented features became so widespread that Microsoft couldn’t change anything without breaking all these applications that people used every day.
We can think of the interrupt list being analogous to someone writing a blog on medium called ‘10 things you didn’t know that X API could do’
Some of these assumptions are also unconscious.
Once something is stable for a while, we sort of just assume it will never break.
We also make our resourcing choices based on previous data, because napkin math is always quite haphazard.
So when I’m choosing how much CPU to allocate to my pod, I pick a number out of thin air, see what happens, and then change it until it’s happy.
That works fine as long as what that pod is being asked to do is reasonably consistent over time, but as we've discussed that's not always true.
We can think about this in our first story - the database had plenty of resource until our endpoint got faster
So if we want to stop breaking other people’s things, we need to help our integrators stop making bad assumptions.
Document edge cases
Discoverability is important - think about SEO and also search within your docs site
Don’t ever deliberately not document something. If it’s subject to change, call it out so there’s no ambiguity.
Keep your own religiously up-to-date and searchable
If you’ve got 3rd party blogs that are incorrect, try contacting the author or commenting with the fix needed to make the guide work, or point them at an equivalent page.
If you get unlucky, that 3rd party content can become the equivalent of ralf brown’s interrupt list.
Consistency is key.
If a developer wants to understand what might break things, they need to know what communication is going out, ideally in a super searchable format.
In my experience many B2B software companies end up emailing random PDFs around or creating shared slack channels, at which point the engineers working on the product don’t really stand a chance of knowing what assumptions might have been made as a result.
Follow them where you can
Flag really loudly if you can’t, or where the industry has not yet settled
There’s a lot to think about with observed behaviour
Naming is really important. Particularly when developers don’t read the docs and just look at the examples
An example is numbers that begin with zeros, which often get truncated (e.g. company registration numbers).
We also have a field in our API called ‘account_number_ending’, but unfortunately in Australia some account numbers have letters in them, which is pretty sad.
You can also try to draw attention to it in the docs - particularly by making the example include the edge case
Use documentation and communication to combat pattern matching
If you know you could change your batch timings, call that out in the docs ‘we currently run it once a day at 11am, but this is likely to change’
Expose information on your API that you might want to change - it’s a good flag.
Restrict your own behaviour both by documenting a limit and then implementing it in the code to ensure you keep to that commitment.
We had an issue at GoCardless where somebody that we integrate with started adding a lot of extra events to each webhook
And our webhook handlers ran out of memory because they were loading so much data.
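One way to make that commitment concrete is to enforce the documented limit in code, so the docs and the behaviour can’t drift apart. A minimal sketch, where the cap of 50 events per webhook is an illustrative number, not a real GoCardless limit:

```python
MAX_EVENTS_PER_WEBHOOK = 50  # illustrative documented limit


def build_webhook_payloads(events):
    """Split a batch of events into payloads that never exceed the
    documented cap, so integrators can safely size their handlers
    against the limit published in the docs."""
    return [events[i:i + MAX_EVENTS_PER_WEBHOOK]
            for i in range(0, len(events), MAX_EVENTS_PER_WEBHOOK)]
```

Because the sender enforces the same number it documents, an integrator who sizes memory for 50 events can never be surprised by 5,000.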
T - 11 mins
For complex products, it’s very unlikely that all your integrators will have avoided bad assumptions.
So we need to find strategies to mitigate the impact of our changes.
The first thing to remember is that a change isn’t either breaking or not breaking. If an integrator has done something strange enough, almost anything can be breaking.
This binary is historically used to assign blame: if it’s not ‘breaking’ then it’s the integrator’s fault.
As we discussed earlier, it may not be technically ‘your fault’ but it’s probably still your problem.
If your biggest customer’s integration breaks, the fact that you didn’t ‘break the rules’ will be little consolation to the engineers up all night trying to resolve it.
So instead of thinking about it as a yes/no question - we should think about it in terms of probabilities.
How likely is it that someone is relying on this behaviour.
Not all breaking changes are equal - yes some changes are 100% breaking (e.g. killing an endpoint).
But many are neither 0% nor 100%
Try to empathise with your integrators about what assumptions they might have made.
Use people in your organisation who are less familiar with the specifics than you are to rubber duck.
If possible, try and talk to some of them.
If you can, find ways to dogfood your APIs to find tripwires. This is particularly good as an onboarding exercise - it helps your new joiners immediately put themselves in the shoes of your integrators,
And helps you keep docs and guides up-to-date as well as introducing them to your product.
Sometimes you can even measure it - add observability to help you look for people relying on this undocumented behaviour - for example we can see a spike in Payment Create requests every day just before our payment run.
This can also help you identify which integrators will be impacted
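A sketch of that kind of measurement, with the 11am cutoff and 30-minute window borrowed from the batch-timing example earlier (both values are illustrative):

```python
from datetime import datetime, timedelta


def is_just_in_time(request_time, cutoff_hour=11,
                    window=timedelta(minutes=30)):
    """Flag requests that land just before the daily batch run.
    A daily spike of these suggests integrators are relying on the
    current timing, so changing it is a likely breaking change."""
    cutoff = request_time.replace(hour=cutoff_hour, minute=0,
                                  second=0, microsecond=0)
    return cutoff - window <= request_time < cutoff
```

Tagging requests this way (and grouping by integrator) turns “probably breaking” into a measured list of who would actually be affected.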
Scale your release approach depending on how many integrators you think have made the bad assumption.
We want to have different strategies to employ at different levels.
If we over communicate, we get into a ‘boy who cried wolf’ situation where no-one reads anything you send them, and their stuff ends up breaking anyway.
Surprisingly, the email in their inbox that they didn’t read doesn’t make them feel better.
Start at pull comms - updating docs or a changelog. This is useful to help integrators recover after they’ve found an issue
You can then upgrade to push comms - perhaps a newsletter or email.
This is where it gets tough - we all ignore emails every day - so try to make sure the content is as relevant as possible.
Don’t tell integrators about changes to features they don’t use, and try to resist the temptation to include marketing content in the developer-focussed comms.
Then if you’re really worried, you can use explicitly acknowledged comms.
This works well if you have a few key integrators you want to check in with before pulling the trigger.
T-5 mins
We can also mitigate the impact of a breaking change by releasing it in different ways.
If at all possible, you want to try and make changes incrementally to help give early warning signs to your integrators.
For example, apply the new behaviour to a % of requests.
That will help integrators avoid performance cliffs and could turn a potential outage into a minor service degradation.
Many integrators will have ‘near miss’ alerting to help them identify problems before they cause significant damage.
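A common way to implement the percentage rollout is deterministic hashing, so a given integrator always sees the same behaviour while the change ramps up. A sketch under assumed names (the bucketing scheme and `in_rollout` function are illustrative):

```python
import hashlib


def in_rollout(integrator_id, percent):
    """Deterministically bucket integrators into [0, 10000) and
    enable the new behaviour for the lowest `percent` of buckets.
    The same id always lands in the same bucket, so behaviour is
    stable for each integrator as `percent` is ramped up."""
    digest = hashlib.sha256(integrator_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 10000  # 0..9999
    return bucket < percent * 100
```

Ramping `percent` from 1 to 100 over days gives integrators’ near-miss alerting a chance to fire before the change hits all of their traffic.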
If you’ve got a test or sandbox environment, that’s also a great candidate. Making changes there (if integrators are actively using it) can act as the canary in the coal mine.
The final point is about rolling back - if your biggest integrator phones you and tells you that you’ve broken their integration, it’s really nice to have a kill switch in your back pocket to stop the bleeding.
Now that's obviously not always possible, because it totally depends on the nature of the change.
But it’s worth knowing what that kill switch is, and also being really clear internally about when that is and isn’t possible, so that as soon as that call comes in, you know what your options are.
The only way to truly avoid breaking other people’s things, is to not change anything at all, and often even that is not possible.
Also, we’d mostly be out of a job.
Instead, we should think in terms of managing risk.
We’ve talked about ways of preventing these issues by helping your integrators make good assumptions in the first place,
And how important it is to build and maintain a capability to communicate when you are making potentially breaking changes to help mitigate the impact
But, you aren’t a mind reader, and integrators are sometimes careless and under pressure, just like you.
So be cautious; assume that your integrators didn’t read the docs perfectly, or at all, and may have cut corners.
They may not have the observability of their systems that you might hope or expect.
You need to find the balance between caution and product delivery that’s right for your organisation.
For all the modern talk of ‘move fast and break things’, it is still painful when stuff breaks and it can take a lot of time and energy to recover.
Building trust with your integrators is critical to the success of a product, but so is delivering features.
We may not be able to completely stop breaking other people’s things, but we can definitely make it much less likely if we put the effort in.
I hope you’ve enjoyed the talk - thank you for listening!
Please find me on twitter at @paprikati_eng if you’d like to chat about anything we’ve covered today
Have a great day!