How can infrastructure engineers empower their product developers with easy-to-use systems and processes that abstract the complexity of core infrastructure? This talk focuses on Envoy configuration management, and how the networking team at Lyft builds on top of Envoy to allow Lyft engineers to focus on business logic.
3. My time at Lyft
1. Initial Envoy open sourcing: documentation, and docker sandbox examples
2. Create Envoyoutbound: enable developers to easily communicate with partners
over stable IPs
3. Open sourcing ratelimit, and a couple other golang libraries: provide ample
documentation for consumers
4. Expand Envoy’s outlier detection system, and build tooling (stats, logging) to
help developers understand anomalies in their services
5. xDS APIs and the future of Envoy configuration management at Lyft: how do we
make the control plane accessible and easy to use
@junr03
4. There is a pattern...
1. Open sourcing envoy: documentation, and docker sandbox examples
2. Create Envoyoutbound: enable developers to easily communicate with
partners over stable IPs
3. Open sourcing ratelimit, and a couple other golang libraries: provide ample
documentation for consumers
4. Expand Envoy’s outlier detection system, and build tooling (stats, logging) to
help developers understand anomalies in their services
5. xDS APIs and the future of Envoy configuration management at Lyft: how do we
make the control plane accessible and easy to use
The focus is on developer productivity!
@junr03
5. The Story
Envoy is a powerful and complex tool.
How does the Networking Team at
Lyft hide the complexity to allow
service developers to leverage the
power of Envoy?
@junr03
6. Why is this important?
• Lyft engineers are the Infra org’s customers
• Lyft is about to have a lot more engineers
• The number of services at Lyft is ever increasing
@junr03
8. Frame of Reference - The Control Plane
• Proxy configuration is complicated: envoy is not the exception
• Operating the data plane should be reserved to a select few
• Configuring some options of the data plane via the control plane should be
open to all service owners
@junr03
9. Configuration Management - The Past
Initially static files
‒ Only two types: edge proxy, service sidecar
‒ Deployed on a deploy bundle out to the edge proxy, and to all services in the mesh
Human Static Files
“Deploy
Magic”
Proxies
@junr03
10. Configuration Management - The Past
As complexity grew we moved to templated files
‒ Jinja2 templates, and some python glue
‒ Expose certain “knobs” to the service engineers at Lyft
‒ At deploy time, create the configuration file
Human
Exposed
Knobs
“Deploy
Magic”
Proxies
Jinja2
Templates
+
@junr03
11. Use case: create a new public route
• Service developers manipulate edge proxy route table
• Deploying public routing changes was tied to an Envoy binary deployment
• Erroneous configuration could be deployed next to complex code
Front Envoy
/new/route
New Service
@junr03
12. Pain points
• Configuration deployment was tied to binary deployment
• UX is tedious and fragmented
The Complexity is in Plain Sight
@junr03
13. Configuration Management - The Present
Mid 2017: xDS APIs for configuration management.
• gRPC/protobuf based
• Bi-directional gRPC streaming
• Interacting with the control plane is separated from data plane operation
• Enable us to develop smart, robust control plane solutions
RDS - Route Discovery Service
CDS - Cluster DS
LDS - Listener DS
...
@junr03
14. Configuration Management - The Present
Envoymanager
/ /
service
deployment
envoy-static-config
service
“manifest”
Document
Cloud Storage
@junr03
15. Configuration Management - The Present
envoy-static-config
service
“manifest”
match:
path: /rider/
route:
cluster: pagelauncher
@junr03
31. Wins
• Allows service developers to own configuration changes all the way to
production
• Most configuration changes do not entail an envoy restart
• Most configuration changes do not entail an envoy binary deploy
• Opens up the world to more friendly UX for configuration changes
@junr03
33. The networking team focuses on building
accessible and easy-to-use systems for
service developers to successfully
configure, operate, and debug Envoy
@junr03
I am an Envoy Maintainer, but I am also a software engineer in Lyft’s networking team.
So I am in an interesting spot, because I help write Envoy, but I also have to operate it, and productionalize it for the rest of the engineering org at Lyft.
I wanted to show you my timeline because I think that a very clear pattern emerges.
As infrastructure developers we need to enable developers so that they can execute fast in a reliable manner.
We need to provide great, and clear documentation. We need to provide easy to follow examples. We need to build tooling that is accessible and easy to use.
Today I have focused on configuration management but the networking team does a great deal of to accelerate developer productivity:
Default dashboards
Access logging
Tracing
DoS protection