SlideShare a Scribd company logo
1 of 54
Resilience and Compliance
at Speed and Scale
ISACA SV Spring Conference
Jason Chan
chan@netflix.com
linkedin.com/in/jasonbchan
@chanjbs
About Me
 Engineering Director @ Netflix:
 Security: product, app, ops, IR, fraud/abuse
 Previously:
 Led infosec team @ VMware
 Consultant - @stake, iSEC Partners
About Netflix
Common Approaches to Reslience
Common Controls to Promote Resilience
 Architectural committees
 Change approval boards
 Centralized deployments
 Vendor-specific, component-
level HA
 Standards and checklists
 Designed to standardize on
design patterns, vendors, etc.
 Problems for Netflix:
 Freedom and Responsibility
Culture
 Highly aligned and loosely
coupled
 Innovation cycles
Common Controls to Promote Resilience
 Architectural committees
 Change approval boards
 Centralized deployments
 Vendor-specific, component-
level HA
 Standards and checklists
 Designed to control and de-
risk change
 Focus on artifacts, test and
rollback plans
 Problems for Netflix:
 Freedom and Responsibility
Culture
 Highly aligned and loosely
coupled
 Innovation cycles
Common Controls to Promote Resilience
 Architectural committees
 Change approval boards
 Centralized deployments
 Vendor-specific, component-
level HA
 Standards and checklists
 Separate Ops team deploys at
a pre-ordained time (e.g.
weekly, monthly)
 Problems for Netflix:
 Freedom and Responsibility
Culture
 Highly aligned and loosely
coupled
 Innovation cycles
Common Controls to Promote Resilience
 Architectural committees
 Change approval boards
 Centralized deployments
 Vendor-specific, component-
level HA
 Standards and checklists
 High reliance on vendor
solutions to provide HA and
resilience
 Problems for Netflix:
 Traditional data center oriented
systems do not translate well
to the cloud
 Heavy use of open source
Common Controls to Promote Resilience
 Architectural committees
 Change approval boards
 Centralized deployments
 Vendor-specific, component-
level HA
 Standards and checklists
 Designed for repeatable
execution
 Problems for Netflix:
 Reliance on humans
Approaches to Resilience @ Netflix
What does the business value?
 Customer experience
 Innovation and agility
 In other words:
 Stability and availability for customer experience
 Rapid development and change to continually improve product
and outpace competition
 Not that different from anyone else
Overall Approach
 Understand and solve for relevant failure modes
 Rely on automation and tools, not humans or
committees
 Make no assumptions that planned controls will work
 Provide train tracks and guardrails, but invite deviation
Resilience and Compliance at Speed and Scale
Goals of Simian Army
“Each system has to be able to succeed, no matter what, even all on its own.
We’re designing each distributed system to expect and tolerate failure from
other systems on which it depends.”
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
Systems fail
Resilience and Compliance at Speed and Scale
Chaos Monkey
 “By frequently causing failures, we force our services to
be built in a way that is more resilient.”
 Terminates cluster nodes during business hours
 Rejects “If it ain’t broke, don’t fix it”
 Goals:
 Simulate random hardware failures, human error at small scale
 Identify weaknesses
 No service impact
Lots of systems fail
Resilience and Compliance at Speed and Scale
Chaos Gorilla
 Chaos Monkey’s bigger brother
 Standard deployment pattern is to distribute
load/systems/data across three data centers (AZs)
 What happens if one is lost?
 Goals:
 Simulate data center loss, hardware/service failures at larger
scale
 Identify weaknesses, dependencies, etc.
 Minimal service impact
What about larger catastrophes?
Resilience and Compliance at Speed and Scale
Chaos Kong
 Simulate an entire region (US west coast, US east coast)
failing
 For example – hurricane, large winter storm, earthquake, etc.
 Goals:
 Exercise end-to-end large-scale failover (routing, DNS, scaling
up)
The sick and wounded
Resilience and Compliance at Speed and Scale
Latency Monkey
 Distributed systems have many upstream/downstream
connections
 How fault-tolerant are systems to dependency
failure/slowdown?
 Goals:
 Simulate latencies and error codes, see how a service responds
 Survivable services regardless of dependencies
Outliers and rebels
Resilience and Compliance at Speed and Scale
Conformity Monkey
 Without architecture review, how do you ensure designs
leverage known successful patterns?
 Conformity Monkey provides automated analysis for
pattern adherence
 Goals:
 Evaluate deployment modes (data center distribution)
 Evaluate health checks, discoverability, versions of key libraries
 Help ensure service has best chance of successful operation
Cruft, junk, and clutter
Resilience and Compliance at Speed and Scale
Janitor Monkey
 Clutter accumulates, in the form of:
 Complexity
 Vulnerabilities
 Cost
 Janitor identifies unused resources and reaps them to
save money and reduce exposure
 Goals:
 Automated hygiene
 More freedom for engineers to innovate and move fast
Non-Simian Approaches
 Org model
 Engineers write, deploy, support code
 Culture
 De-centralized with as few processes and rules as possible
 Lots of local autonomy
 “If you’re not failing, you’re not trying hard enough”
 Peer pressure
 Productive and transparent incident reviews
Software Deployment for Compliance-Sensitive Apps
Control Objectives for Software Deployments
Visibility and transparency
 Who did what, when?
 What was the scope of the
change or deployment?
 Was it reviewed?
 Was it tested?
 Was it approved?
Typically attempted via:
 Restricted access/SoD
 CMDBs
 Change management
processes
 Test results
 Change windows
Large and Dynamic Systems Need a Different Approach
 No operations organization
 No acceptable windows for downtime
 Thousands of deployments and changes per day
Control Objectives Haven’t Changed
Visibility and transparency
 Who did what, when?
 What was the scope of the change or deployment?
 Was it reviewed?
 Was it tested?
 Was it approved?
System-wide view on changes
Access to changes by app,
region, environment, etc.
Lookback in time
as needed
Changes, via email
When?
By who?
What changed?
Integrated awareness
Chat integration
lets engineers
easily access info
Automated testing
Resilience and Compliance at Speed and Scale
1000+ tests to compare
proposed vs. existing
Automated scoring and
deployment decision
Complete view of deployment lifecycle
Jenkins
(CI) job
App name
Currently
running clusters
by
region/environm
ent
Cluster ID
Deployment
details
AMI version
SCM commit
Modified
files
Source
diffs
Link to
relevant
JIRA(s)
Resilience and Compliance at Speed and Scale
Takeaway
 Control objectives have not changed, but advantages of
new technologies and operational models dictate
updated approaches
Netflix References
 http://netflix.github.com
 http://techblog.netflix.com
 http://slideshare.net/netflix
Questions?
chan@netflix.com

More Related Content

What's hot

Securing Systems at Cloud Scale with DevSecOps
Securing Systems at Cloud Scale with DevSecOpsSecuring Systems at Cloud Scale with DevSecOps
Securing Systems at Cloud Scale with DevSecOpsAmazon Web Services
 
Security at the Speed of Software Development
Security at the Speed of Software DevelopmentSecurity at the Speed of Software Development
Security at the Speed of Software DevelopmentDevOps.com
 
Proactive Security AppSec Case Study
Proactive Security AppSec Case StudyProactive Security AppSec Case Study
Proactive Security AppSec Case StudyAndy Hoernecke
 
Overcoming Security Challenges in DevOps
Overcoming Security Challenges in DevOpsOvercoming Security Challenges in DevOps
Overcoming Security Challenges in DevOpsAlert Logic
 
Best Practices for Workload Security: Securing Servers in Modern Data Center ...
Best Practices for Workload Security: Securing Servers in Modern Data Center ...Best Practices for Workload Security: Securing Servers in Modern Data Center ...
Best Practices for Workload Security: Securing Servers in Modern Data Center ...CloudPassage
 
Maturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOpsMaturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOpsAmazon Web Services
 
A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016
A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016
A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016Shannon Lietz
 
Cloud Security Essentials 2.0 at RSA
Cloud Security Essentials 2.0 at RSACloud Security Essentials 2.0 at RSA
Cloud Security Essentials 2.0 at RSAShannon Lietz
 
ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015Shannon Lietz
 
DevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With AutomationDevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With AutomationUtkarsh Pandey
 
Chaos Engineering and Systems Reliability
Chaos Engineering and Systems ReliabilityChaos Engineering and Systems Reliability
Chaos Engineering and Systems ReliabilitySylvain Hellegouarch
 
Shared Security Responsibility for the Azure Cloud
Shared Security Responsibility for the Azure CloudShared Security Responsibility for the Azure Cloud
Shared Security Responsibility for the Azure CloudAlert Logic
 
DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015Shannon Lietz
 
From Zero to ATO: A Step-by-Step Guide on the DoD Compliance Framework
From Zero to ATO: A Step-by-Step Guide on the DoD Compliance FrameworkFrom Zero to ATO: A Step-by-Step Guide on the DoD Compliance Framework
From Zero to ATO: A Step-by-Step Guide on the DoD Compliance FrameworkAmazon Web Services
 
DevSecOps - CrikeyCon 2017
DevSecOps - CrikeyCon 2017DevSecOps - CrikeyCon 2017
DevSecOps - CrikeyCon 2017kieranjacobsen
 
CSS17: Atlanta - Realities of Security in the Cloud
CSS17: Atlanta - Realities of Security in the CloudCSS17: Atlanta - Realities of Security in the Cloud
CSS17: Atlanta - Realities of Security in the CloudAlert Logic
 
Managed Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS ApplicationsManaged Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS ApplicationsAlert Logic
 

What's hot (20)

Securing Systems at Cloud Scale with DevSecOps
Securing Systems at Cloud Scale with DevSecOpsSecuring Systems at Cloud Scale with DevSecOps
Securing Systems at Cloud Scale with DevSecOps
 
Introduction to DevSecOps
Introduction to DevSecOpsIntroduction to DevSecOps
Introduction to DevSecOps
 
Security at the Speed of Software Development
Security at the Speed of Software DevelopmentSecurity at the Speed of Software Development
Security at the Speed of Software Development
 
Proactive Security AppSec Case Study
Proactive Security AppSec Case StudyProactive Security AppSec Case Study
Proactive Security AppSec Case Study
 
Overcoming Security Challenges in DevOps
Overcoming Security Challenges in DevOpsOvercoming Security Challenges in DevOps
Overcoming Security Challenges in DevOps
 
Best Practices for Workload Security: Securing Servers in Modern Data Center ...
Best Practices for Workload Security: Securing Servers in Modern Data Center ...Best Practices for Workload Security: Securing Servers in Modern Data Center ...
Best Practices for Workload Security: Securing Servers in Modern Data Center ...
 
Maturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOpsMaturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOps
 
A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016
A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016
A Throwaway Deck for Cloud Security Essentials 2.0 delivered at RSA 2016
 
Cloud Security Essentials 2.0 at RSA
Cloud Security Essentials 2.0 at RSACloud Security Essentials 2.0 at RSA
Cloud Security Essentials 2.0 at RSA
 
Implementing DevSecOps
Implementing DevSecOpsImplementing DevSecOps
Implementing DevSecOps
 
ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015
 
DevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With AutomationDevOps In Azure: Deliver Value With Automation
DevOps In Azure: Deliver Value With Automation
 
Chaos Engineering and Systems Reliability
Chaos Engineering and Systems ReliabilityChaos Engineering and Systems Reliability
Chaos Engineering and Systems Reliability
 
Shared Security Responsibility for the Azure Cloud
Shared Security Responsibility for the Azure CloudShared Security Responsibility for the Azure Cloud
Shared Security Responsibility for the Azure Cloud
 
DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015DevSecCon KeyNote London 2015
DevSecCon KeyNote London 2015
 
From Zero to ATO: A Step-by-Step Guide on the DoD Compliance Framework
From Zero to ATO: A Step-by-Step Guide on the DoD Compliance FrameworkFrom Zero to ATO: A Step-by-Step Guide on the DoD Compliance Framework
From Zero to ATO: A Step-by-Step Guide on the DoD Compliance Framework
 
Azure Security Center
Azure Security CenterAzure Security Center
Azure Security Center
 
DevSecOps - CrikeyCon 2017
DevSecOps - CrikeyCon 2017DevSecOps - CrikeyCon 2017
DevSecOps - CrikeyCon 2017
 
CSS17: Atlanta - Realities of Security in the Cloud
CSS17: Atlanta - Realities of Security in the CloudCSS17: Atlanta - Realities of Security in the Cloud
CSS17: Atlanta - Realities of Security in the Cloud
 
Managed Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS ApplicationsManaged Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS Applications
 

Viewers also liked

Amazon Web Services Security
Amazon Web Services SecurityAmazon Web Services Security
Amazon Web Services SecurityJason Chan
 
The Psychology of Security Automation
The Psychology of Security AutomationThe Psychology of Security Automation
The Psychology of Security AutomationJason Chan
 
Defending Netflix from Abuse
Defending Netflix from AbuseDefending Netflix from Abuse
Defending Netflix from AbuseJason Chan
 
Cloud Application Security: Lessons Learned
Cloud Application Security: Lessons LearnedCloud Application Security: Lessons Learned
Cloud Application Security: Lessons LearnedJason Chan
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud SecurityJason Chan
 
Practical Security Automation
Practical Security AutomationPractical Security Automation
Practical Security AutomationJason Chan
 
Careers in Security
Careers in SecurityCareers in Security
Careers in SecurityJason Chan
 
Real World Cloud Application Security
Real World Cloud Application SecurityReal World Cloud Application Security
Real World Cloud Application SecurityJason Chan
 
Security at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at YahooSecurity at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at YahooAlex Stamos
 
Virtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit PerspectivesVirtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit PerspectivesJason Chan
 
Cloud Security @ Netflix
Cloud Security @ NetflixCloud Security @ Netflix
Cloud Security @ NetflixJason Chan
 
Ibm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalIbm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalaspyker
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integrationaspyker
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Sourceaspyker
 
Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4aspyker
 
AWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's PerspectiveAWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's PerspectiveJason Chan
 
Netflix Webkit-Based UI for TV Devices
Netflix Webkit-Based UI for TV DevicesNetflix Webkit-Based UI for TV Devices
Netflix Webkit-Based UI for TV DevicesMatt McCarthy
 
Netflix and Containers: Not A Stranger Thing
Netflix and Containers:  Not A Stranger ThingNetflix and Containers:  Not A Stranger Thing
Netflix and Containers: Not A Stranger Thingaspyker
 

Viewers also liked (20)

Amazon Web Services Security
Amazon Web Services SecurityAmazon Web Services Security
Amazon Web Services Security
 
The Psychology of Security Automation
The Psychology of Security AutomationThe Psychology of Security Automation
The Psychology of Security Automation
 
Defending Netflix from Abuse
Defending Netflix from AbuseDefending Netflix from Abuse
Defending Netflix from Abuse
 
Cloud Application Security: Lessons Learned
Cloud Application Security: Lessons LearnedCloud Application Security: Lessons Learned
Cloud Application Security: Lessons Learned
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud Security
 
Practical Security Automation
Practical Security AutomationPractical Security Automation
Practical Security Automation
 
Careers in Security
Careers in SecurityCareers in Security
Careers in Security
 
Real World Cloud Application Security
Real World Cloud Application SecurityReal World Cloud Application Security
Real World Cloud Application Security
 
Security at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at YahooSecurity at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at Yahoo
 
Analyze System and Code Interactions
Analyze System and Code InteractionsAnalyze System and Code Interactions
Analyze System and Code Interactions
 
Virtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit PerspectivesVirtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit Perspectives
 
Cloud Security @ Netflix
Cloud Security @ NetflixCloud Security @ Netflix
Cloud Security @ Netflix
 
Ibm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinalIbm cloud nativenetflixossfinal
Ibm cloud nativenetflixossfinal
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integration
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Source
 
Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4Netflix OSS Meetup Season 4 Episode 4
Netflix OSS Meetup Season 4 Episode 4
 
AWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's PerspectiveAWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's Perspective
 
Netflix Webkit-Based UI for TV Devices
Netflix Webkit-Based UI for TV DevicesNetflix Webkit-Based UI for TV Devices
Netflix Webkit-Based UI for TV Devices
 
Netflix and Containers: Not A Stranger Thing
Netflix and Containers:  Not A Stranger ThingNetflix and Containers:  Not A Stranger Thing
Netflix and Containers: Not A Stranger Thing
 

Similar to Resilience and Compliance at Speed and Scale

Enterprise DevOps: Scaling Build, Deploy, Test, Release
Enterprise DevOps: Scaling Build, Deploy, Test, ReleaseEnterprise DevOps: Scaling Build, Deploy, Test, Release
Enterprise DevOps: Scaling Build, Deploy, Test, ReleaseIBM UrbanCode Products
 
Dev ops developer (session 3)
Dev ops developer (session 3)Dev ops developer (session 3)
Dev ops developer (session 3)MSDEVMTL
 
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...agilemaine
 
Implementing a testing strategy
Implementing a testing strategyImplementing a testing strategy
Implementing a testing strategyDaniel Giraldo
 
Use DevOps to Respond Faster to End Customers
Use DevOps to Respond Faster to End CustomersUse DevOps to Respond Faster to End Customers
Use DevOps to Respond Faster to End CustomersInfo-Tech Research Group
 
From Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.auFrom Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.auevanbottcher
 
Risk Driven Testing
Risk Driven TestingRisk Driven Testing
Risk Driven TestingJorge Boria
 
WKS402 Well-Architected Workshop
WKS402 Well-Architected WorkshopWKS402 Well-Architected Workshop
WKS402 Well-Architected WorkshopAmazon Web Services
 
Continuous delivery
Continuous deliveryContinuous delivery
Continuous deliveryMasas Dani
 
Andy singleton continuous delivery-fcb - nov 2014
Andy singleton   continuous delivery-fcb - nov 2014Andy singleton   continuous delivery-fcb - nov 2014
Andy singleton continuous delivery-fcb - nov 2014Brad Power
 
Best practice adoption (and lack there of)
Best practice adoption (and lack there of)Best practice adoption (and lack there of)
Best practice adoption (and lack there of)John Pape
 
ალექსანდრე ნემსაძე - Release it
ალექსანდრე ნემსაძე - Release itალექსანდრე ნემსაძე - Release it
ალექსანდრე ნემსაძე - Release itunihack
 
Anti Patterns Siddhesh Lecture2 Of3
Anti Patterns Siddhesh Lecture2 Of3Anti Patterns Siddhesh Lecture2 Of3
Anti Patterns Siddhesh Lecture2 Of3Siddhesh Bhobe
 
DevOps Roadshow - removing barriers between development and operations
DevOps Roadshow - removing barriers between development and operationsDevOps Roadshow - removing barriers between development and operations
DevOps Roadshow - removing barriers between development and operationsMicrosoft Developer Norway
 
Large scale agile development practices
Large scale agile development practicesLarge scale agile development practices
Large scale agile development practicesSkills Matter
 
Raise the Bar! Reloaded
Raise the Bar! ReloadedRaise the Bar! Reloaded
Raise the Bar! ReloadedCodemotion
 
Encontrando la Aguja en el Rendimiento de Aplicaciones
Encontrando la Aguja en el Rendimiento de AplicacionesEncontrando la Aguja en el Rendimiento de Aplicaciones
Encontrando la Aguja en el Rendimiento de AplicacionesSoftware Guru
 
No Devops Without Continuous Testing
No Devops Without Continuous TestingNo Devops Without Continuous Testing
No Devops Without Continuous TestingParasoft
 

Similar to Resilience and Compliance at Speed and Scale (20)

Enterprise DevOps: Scaling Build, Deploy, Test, Release
Enterprise DevOps: Scaling Build, Deploy, Test, ReleaseEnterprise DevOps: Scaling Build, Deploy, Test, Release
Enterprise DevOps: Scaling Build, Deploy, Test, Release
 
Dev ops developer (session 3)
Dev ops developer (session 3)Dev ops developer (session 3)
Dev ops developer (session 3)
 
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
Continuous Delivery and Continuous Agile by Andy Singleton - Agile Maine Day...
 
Implementing a testing strategy
Implementing a testing strategyImplementing a testing strategy
Implementing a testing strategy
 
Use DevOps to Respond Faster to End Customers
Use DevOps to Respond Faster to End CustomersUse DevOps to Respond Faster to End Customers
Use DevOps to Respond Faster to End Customers
 
From Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.auFrom Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.au
 
Risk Driven Testing
Risk Driven TestingRisk Driven Testing
Risk Driven Testing
 
WKS402 Well-Architected Workshop
WKS402 Well-Architected WorkshopWKS402 Well-Architected Workshop
WKS402 Well-Architected Workshop
 
Continuous delivery
Continuous deliveryContinuous delivery
Continuous delivery
 
Andy singleton continuous delivery-fcb - nov 2014
Andy singleton   continuous delivery-fcb - nov 2014Andy singleton   continuous delivery-fcb - nov 2014
Andy singleton continuous delivery-fcb - nov 2014
 
Best practice adoption (and lack there of)
Best practice adoption (and lack there of)Best practice adoption (and lack there of)
Best practice adoption (and lack there of)
 
ალექსანდრე ნემსაძე - Release it
ალექსანდრე ნემსაძე - Release itალექსანდრე ნემსაძე - Release it
ალექსანდრე ნემსაძე - Release it
 
Enterprise DevOps
Enterprise DevOpsEnterprise DevOps
Enterprise DevOps
 
Anti Patterns Siddhesh Lecture2 Of3
Anti Patterns Siddhesh Lecture2 Of3Anti Patterns Siddhesh Lecture2 Of3
Anti Patterns Siddhesh Lecture2 Of3
 
DevOps Roadshow - removing barriers between development and operations
DevOps Roadshow - removing barriers between development and operationsDevOps Roadshow - removing barriers between development and operations
DevOps Roadshow - removing barriers between development and operations
 
Large scale agile development practices
Large scale agile development practicesLarge scale agile development practices
Large scale agile development practices
 
Raise the Bar! Reloaded
Raise the Bar! ReloadedRaise the Bar! Reloaded
Raise the Bar! Reloaded
 
Raise the bar! Reloaded
Raise the bar! ReloadedRaise the bar! Reloaded
Raise the bar! Reloaded
 
Encontrando la Aguja en el Rendimiento de Aplicaciones
Encontrando la Aguja en el Rendimiento de AplicacionesEncontrando la Aguja en el Rendimiento de Aplicaciones
Encontrando la Aguja en el Rendimiento de Aplicaciones
 
No Devops Without Continuous Testing
No Devops Without Continuous TestingNo Devops Without Continuous Testing
No Devops Without Continuous Testing
 

Recently uploaded

UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 

Recently uploaded (20)

UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 

Resilience and Compliance at Speed and Scale

  • 1. Resilience and Compliance at Speed and Scale ISACA SV Spring Conference Jason Chan chan@netflix.com linkedin.com/in/jasonbchan @chanjbs
  • 2. About Me  Engineering Director @ Netflix:  Security: product, app, ops, IR, fraud/abuse  Previously:  Led infosec team @ VMware  Consultant - @stake, iSEC Partners
  • 5. Common Controls to Promote Resilience  Architectural committees  Change approval boards  Centralized deployments  Vendor-specific, component- level HA  Standards and checklists  Designed to standardize on design patterns, vendors, etc.  Problems for Netflix:  Freedom and Responsibility Culture  Highly aligned and loosely coupled  Innovation cycles
  • 6. Common Controls to Promote Resilience  Architectural committees  Change approval boards  Centralized deployments  Vendor-specific, component- level HA  Standards and checklists  Designed to control and de- risk change  Focus on artifacts, test and rollback plans  Problems for Netflix:  Freedom and Responsibility Culture  Highly aligned and loosely coupled  Innovation cycles
  • 7. Common Controls to Promote Resilience  Architectural committees  Change approval boards  Centralized deployments  Vendor-specific, component- level HA  Standards and checklists  Separate Ops team deploys at a pre-ordained time (e.g. weekly, monthly)  Problems for Netflix:  Freedom and Responsibility Culture  Highly aligned and loosely coupled  Innovation cycles
  • 8. Common Controls to Promote Resilience  Architectural committees  Change approval boards  Centralized deployments  Vendor-specific, component- level HA  Standards and checklists  High reliance on vendor solutions to provide HA and resilience  Problems for Netflix:  Traditional data center oriented systems do not translate well to the cloud  Heavy use of open source
  • 9. Common Controls to Promote Resilience  Architectural committees  Change approval boards  Centralized deployments  Vendor-specific, component- level HA  Standards and checklists  Designed for repeatable execution  Problems for Netflix:  Reliance on humans
  • 11. What does the business value?  Customer experience  Innovation and agility  In other words:  Stability and availability for customer experience  Rapid development and change to continually improve product and outpace competition  Not that different from anyone else
  • 12. Overall Approach  Understand and solve for relevant failure modes  Rely on automation and tools, not humans or committees  Make no assumptions that planned controls will work  Provide train tracks and guardrails, but invite deviation
  • 14. Goals of Simian Army “Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.” http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
  • 17. Chaos Monkey  “By frequently causing failures, we force our services to be built in a way that is more resilient.”  Terminates cluster nodes during business hours  Rejects “If it ain’t broke, don’t fix it”  Goals:  Simulate random hardware failures, human error at small scale  Identify weaknesses  No service impact
  • 20. Chaos Gorilla  Chaos Monkey’s bigger brother  Standard deployment pattern is to distribute load/systems/data across three data centers (AZs)  What happens if one is lost?  Goals:  Simulate data center loss, hardware/service failures at larger scale  Identify weaknesses, dependencies, etc.  Minimal service impact
  • 21. What about larger catastrophes?
  • 23. Chaos Kong  Simulate an entire region (US west coast, US east coast) failing  For example – hurricane, large winter storm, earthquake, etc.  Goals:  Exercise end-to-end large-scale failover (routing, DNS, scaling up)
  • 24. The sick and wounded
  • 26. Latency Monkey  Distributed systems have many upstream/downstream connections  How fault-tolerant are systems to dependency failure/slowdown?  Goals:  Simulate latencies and error codes, see how a service responds  Survivable services regardless of dependencies
  • 29. Conformity Monkey  Without architecture review, how do you ensure designs leverage known successful patterns?  Conformity Monkey provides automated analysis for pattern adherence  Goals:  Evaluate deployment modes (data center distribution)  Evaluate health checks, discoverability, versions of key libraries  Help ensure service has best chance of successful operation
  • 30. Cruft, junk, and clutter
  • 32. Janitor Monkey  Clutter accumulates, in the form of:  Complexity  Vulnerabilities  Cost  Janitor identifies unused resources and reaps them to save money and reduce exposure  Goals:  Automated hygiene  More freedom for engineers to innovate and move fast
  • 33. Non-Simian Approaches  Org model  Engineers write, deploy, support code  Culture  De-centralized with as few processes and rules as possible  Lots of local autonomy  “If you’re not failing, you’re not trying hard enough”  Peer pressure  Productive and transparent incident reviews
  • 34. Software Deployment for Compliance-Sensitive Apps
  • 35. Control Objectives for Software Deployments Visibility and transparency  Who did what, when?  What was the scope of the change or deployment?  Was it reviewed?  Was it tested?  Was it approved? Typically attempted via:  Restricted access/SoD  CMDBs  Change management processes  Test results  Change windows
  • 36. Large and Dynamic Systems Need a Different Approach  No operations organization  No acceptable windows for downtime  Thousands of deployments and changes per day
  • 37. Control Objectives Haven’t Changed Visibility and transparency  Who did what, when?  What was the scope of the change or deployment?  Was it reviewed?  Was it tested?  Was it approved?
  • 39. Access to changes by app, region, environment, etc. Lookback in time as needed
  • 46. 1000+ tests to compare proposed vs. existing Automated scoring and deployment decision
  • 47. Complete view of deployment lifecycle
  • 48. Jenkins (CI) job App name Currently running clusters by region/environm ent
  • 52. Takeaway  Control objectives have not changed, but advantages of new technologies and operational models dictate updated approaches
  • 53. Netflix References  http://netflix.github.com  http://techblog.netflix.com  http://slideshare.net/netflix