SlideShare a Scribd company logo
1 of 28
Download to read offline
Reveal Your Deepest
Kubernetes Secrets Metrics
KubeCon 2017 Prometheus Salon- 12/6/2017
Bob Cotton - FreshTracks.io
About Me
● Co-Founder - FreshTracks.io - A CA Accelerator Incubation
● bob@freshtracks.io
● @bob_cotton
Agenda
● Sources of metrics
○ Node
○ kubelet and containers
○ Kubernetes API
○ etcd
○ Derived metrics (kube-state-metrics)
● The new K8s metrics server
● Horizontal pod auto-scaler
● Prometheus re-labeling and recording rules
● K8s cluster hierarchies and metrics aggregation
Sources of Metrics in Kubernetes
Host Metrics from the node_exporter
● Standard Host Metrics
○ Load Average
○ CPU
○ Memory
○ Disk
○ Network
○ Many others
● ~1000 Unique series in a typical node
Node
node_exporter
/metrics
Container Metrics from cAdvisor
● cAdvisor is embedded into the kubelet, so we
scrape the kubelet to get container metrics
● These are the so-called “core” metrics
● For each container on the node:
○ CPU Usage (user and system) and time throttled
○ Filesystem read/writes/limits
○ Memory usage and limits
○ Network transmit/receive/dropped
Node
node_exporter
/metrics
kubelet
cAdvisor
Kubernetes Metrics from the K8s API Server
● Metrics about the performance of the K8s API Server
○ Performance of controller work queues
○ Request Rates and Latencies
○ Etcd helper cache work queues and cache performance
○ General process status (File Descriptors/Memory/CPU
Seconds)
○ Golang status (GC/Memory/Threads)
Node
node_exporter
kubelet
cAdvisor
Any other Pod
/metrics
API Server
Etcd Metrics from etcd
● Etcd is “master of all truth” within a K8s cluster
○ Leader existence and leader change rate
○ Proposals committed/applied/pending/failed
○ Disk write performance
○ Network and gRPC counters
K8s Derived Metrics from kube-state-metrics
● Counts and meta-data about many K8s types
○ Counts of many “nouns”
○ Resource Limits
○ Container states
■ ready/restarts/running/terminated/waiting
○ _labels series just carries labels from Pods
● cronjob
● daemonset
● deployment
● horizontalpodautoscaler
● job
● limitrange
● namespace
● node
● persistentvolumeclaim
● pod
● replicaset
● replicationcontroller
● resourcequota
● service
● statefulset
Sources of Metrics in Kubernetes
● Node via the node_exporter
● Container metrics via the kubelet and cAdvisor
● Kubernetes API server
● etcd
● Derived metrics via kube-state-metrics
Scheduling and Autoscaling
i.e. The Metrics Pipeline
The New “Metrics Server”
● Replaces Heapster
● Standard (versioned and auth) API aggregated into the K8s API Server
● In “beta” in K8s 1.8
● Used by the scheduler and (eventually) the Horizontal Pod Autoscaler
● A stripped-down version of Heapster
● Reports on “core” metrics (CPU/Memory/Network) gathered from cAdvisor
● For internal to K8s use only.
● Pluggable for custom metrics
Feeding the Horizontal Pod Autoscaler
● Before the metrics server the HPA utilized Heapster for it’s Core metrics
○ This will be the metrics-server going forward
● API Adapter will bridge to third party monitoring system
○ e.g. Prometheus
Labels, Re-Label and Recording Rules
Oh My...
Metric Metadata
In the beginning:
<metric name> = <metric value>
http_requests_total = 1.4
Increased complexity lead to workarounds
region.az.instance_type.instance.hostname.http_requests_total = 5439
Metric Metadata - Metrics 2.0
us-east.us-east-1.m2_xlarge.i-3582k8.host1.http_requests_total = 5439
http_requests_total{region=” us-east”,
az=”us-east-1”,
instance_type=” m2.xlarge”,
instance=” i-3582k8”,
hostname=” host1”} = 5439
Kubernetes Labels
● Kubernetes gives us labels on all the things
● Our scrape targets live in the context of the K8s labels
● We want to enhance the scraped metric labels with K8s labels
● This is why we need relabel rules in Prometheus
Prometheus
K8s API Server
TSDB
Kublet
(cAdvisor)
node-exporter
kube_state_metrics
App containers
other exporters
node_exporter
App containers
Kublet
(cAdvisor)
Service Discovery
K8s API Server
TSDB
Scrape Target
Service Discovery
Prometheus
0="{__address__ 300.196.17.41}"
1="{__meta_kubernetes_namespace default}"
2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}"
3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}"
4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}"
5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu
request for container prometheus-configmap-reload; cpu request for container data-sidecar}"
6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}"
7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}"
8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}"
9="{__meta_kubernetes_pod_host_ip 172.20.42.119}"
10="{__meta_kubernetes_pod_ip 100.96.17.41}"
11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}"
12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}"
13="{__meta_kubernetes_pod_label_run data-sidecar}"
14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}"
15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}"
16="{__meta_kubernetes_pod_ready false}"
17="{__metrics_path__ /metrics}"
18="{__scheme__ http}"
19="{job ftio-data-sidecar-calc}"
<relabel_config>
{__address__ 300.196.17.41:8077}
{__scheme__ http}
{__metrics_path__ /metrics}
{job ftio-data-sidecar-calc}
{kubernetes_namespace default}
{container_name prometheus-configmap-reload}
http_requests_total{region=”us-east”,
az=”us-east-1”, instance_type=”m2.xlarge”,
instance=”i-3582k8”, hostname=”host1”} = 5439
http_requests_total{region=”us-east”,
az=”us-east-1”,
instance_type=”m2.xlarge”,
instance=”i-3582k8”,
hostname=”host1”,
instance=”300.196.17.41:8077”,
job=”ftio-data-sidecar-calc”,
kubernetes_namespace=”default”,
container_name=”prometheus-configmap-reload”,
} = 5439
<metric_relabel_config>
Recording Rules
Create a new series, derived from one or more existing series
# The name of the time series to output to. Must be a valid metric name.
record: <string>
# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>
# Labels to add or overwrite before storing the result.
labels:
[ <labelname>: <labelvalue> ]
Recording Rules
Create a new series, derived from one or more existing series
record: pod_name:cpu_usage_seconds:rate5m
expr: sum(rate(container_cpu_usage_seconds_total{pod_name=~"^(?:.+)$"}[5m]))
BY (pod_name)
labels:
ft_target: "true"
Kubernetes
Hierarchy and Aggregation
Core Metrics Aggregation
● K8s clusters form a hierarchy
● We can aggregate the “core” metrics to any level
● This allows for some interesting monitoring
opportunities
○ Using Prometheus “recording rules” aggregate the core
metrics at every level
○ Insights into all levels of your Kubernetes cluster
● This also applies to any custom application metric
Demo
Namespace
Deployment
Pod
Container
FreshTracks.io is Hiring!
https://searchjobs.ca.com/go/Accelerator/3279000/
Questions?
Resources
● Prometheus.io
● Core Metrics in Kubelet
● Kubernetes monitoring architecture
● What is the new metrics-server?

More Related Content

What's hot

OpenCensus with Prometheus and Kubernetes
OpenCensus with Prometheus and KubernetesOpenCensus with Prometheus and Kubernetes
OpenCensus with Prometheus and KubernetesJinwoong Kim
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)DoiT International
 
Kafka on Kubernetes—From Evaluation to Production at Intuit
Kafka on Kubernetes—From Evaluation to Production at Intuit Kafka on Kubernetes—From Evaluation to Production at Intuit
Kafka on Kubernetes—From Evaluation to Production at Intuit confluent
 
Architectural caching patterns for kubernetes
Architectural caching patterns for kubernetesArchitectural caching patterns for kubernetes
Architectural caching patterns for kubernetesRafał Leszko
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...NETWAYS
 
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...Stephen Gordon
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionJohn Garbutt
 
Distributed Locking in Kubernetes
Distributed Locking in KubernetesDistributed Locking in Kubernetes
Distributed Locking in KubernetesRafał Leszko
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High AvailabilityJakub Pavlik
 
Swami osi bangalore2017days pike release_updates
Swami osi bangalore2017days pike release_updatesSwami osi bangalore2017days pike release_updates
Swami osi bangalore2017days pike release_updatesRanga Swami Reddy Muthumula
 
Patroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionPatroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionAlexander Kukushkin
 
Metal-k8s presentation by Julien Girardin @ Paris Kubernetes Meetup
Metal-k8s presentation by Julien Girardin @ Paris Kubernetes MeetupMetal-k8s presentation by Julien Girardin @ Paris Kubernetes Meetup
Metal-k8s presentation by Julien Girardin @ Paris Kubernetes MeetupLaure Vergeron
 
Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!Rafał Leszko
 
Architecting for Microservices Part 2
Architecting for Microservices Part 2Architecting for Microservices Part 2
Architecting for Microservices Part 2Elana Krasner
 
Kubernetes #2 monitoring
Kubernetes #2   monitoring Kubernetes #2   monitoring
Kubernetes #2 monitoring Terry Cho
 
Heroku to Kubernetes & Gihub to Gitlab success story
Heroku to Kubernetes & Gihub to Gitlab success storyHeroku to Kubernetes & Gihub to Gitlab success story
Heroku to Kubernetes & Gihub to Gitlab success storyJérémy Wimsingues
 

What's hot (20)

OpenCensus with Prometheus and Kubernetes
OpenCensus with Prometheus and KubernetesOpenCensus with Prometheus and Kubernetes
OpenCensus with Prometheus and Kubernetes
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)Kubernetes - State of the Union (Q1-2016)
Kubernetes - State of the Union (Q1-2016)
 
Kafka on Kubernetes—From Evaluation to Production at Intuit
Kafka on Kubernetes—From Evaluation to Production at Intuit Kafka on Kubernetes—From Evaluation to Production at Intuit
Kafka on Kubernetes—From Evaluation to Production at Intuit
 
Topologies of OpenStack
Topologies of OpenStackTopologies of OpenStack
Topologies of OpenStack
 
Architectural caching patterns for kubernetes
Architectural caching patterns for kubernetesArchitectural caching patterns for kubernetes
Architectural caching patterns for kubernetes
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
 
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
Containers for the Enterprise: Delivering OpenShift on OpenStack for Performa...
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
 
Distributed Locking in Kubernetes
Distributed Locking in KubernetesDistributed Locking in Kubernetes
Distributed Locking in Kubernetes
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High Availability
 
Swami osi bangalore2017days pike release_updates
Swami osi bangalore2017days pike release_updatesSwami osi bangalore2017days pike release_updates
Swami osi bangalore2017days pike release_updates
 
Kubernetes Node Deep Dive
Kubernetes Node Deep DiveKubernetes Node Deep Dive
Kubernetes Node Deep Dive
 
Patroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionPatroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companion
 
Metal-k8s presentation by Julien Girardin @ Paris Kubernetes Meetup
Metal-k8s presentation by Julien Girardin @ Paris Kubernetes MeetupMetal-k8s presentation by Julien Girardin @ Paris Kubernetes Meetup
Metal-k8s presentation by Julien Girardin @ Paris Kubernetes Meetup
 
Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!
 
Architecting for Microservices Part 2
Architecting for Microservices Part 2Architecting for Microservices Part 2
Architecting for Microservices Part 2
 
Intro to Kubernetes
Intro to KubernetesIntro to Kubernetes
Intro to Kubernetes
 
Kubernetes #2 monitoring
Kubernetes #2   monitoring Kubernetes #2   monitoring
Kubernetes #2 monitoring
 
Heroku to Kubernetes & Gihub to Gitlab success story
Heroku to Kubernetes & Gihub to Gitlab success storyHeroku to Kubernetes & Gihub to Gitlab success story
Heroku to Kubernetes & Gihub to Gitlab success story
 

Similar to KubeCon Prometheus Salon -- Kubernetes metrics deep dive

Autoscaling in kubernetes v1
Autoscaling in kubernetes v1Autoscaling in kubernetes v1
Autoscaling in kubernetes v1JurajHantk
 
KubeCon EU 2016: A Practical Guide to Container Scheduling
KubeCon EU 2016: A Practical Guide to Container SchedulingKubeCon EU 2016: A Practical Guide to Container Scheduling
KubeCon EU 2016: A Practical Guide to Container SchedulingKubeAcademy
 
Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Idan Atias
 
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...Tobias Schneck
 
How to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtime
How to Migrate 100 Clusters from On-Prem to Google Cloud Without DowntimeHow to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtime
How to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtimeloodse
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 introTerry Cho
 
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기Jinsu Moon
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetescraigbox
 
Kubernetes Problem-Solving
Kubernetes Problem-SolvingKubernetes Problem-Solving
Kubernetes Problem-SolvingAll Things Open
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKevin Lynch
 
Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesAjeet Singh Raina
 
Docker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker eeDocker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker eeDocker, Inc.
 
OSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
OSDC 2018 | Monitoring Kubernetes at Scale by Monica SarbuOSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
OSDC 2018 | Monitoring Kubernetes at Scale by Monica SarbuNETWAYS
 
Kubernetes and bluemix
Kubernetes  and  bluemixKubernetes  and  bluemix
Kubernetes and bluemixDuckDuckGo
 
The journey to the kubernetes metrics
The journey to the kubernetes metricsThe journey to the kubernetes metrics
The journey to the kubernetes metricsChenYiHuang5
 
Getting started with Kubernetes and Azure DevOps
Getting started with Kubernetes and Azure DevOpsGetting started with Kubernetes and Azure DevOps
Getting started with Kubernetes and Azure DevOpsRaTul Basak
 
Extending kubernetes
Extending kubernetesExtending kubernetes
Extending kubernetesGigi Sayfan
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle ManagementDoKC
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle ManagementDoKC
 
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019UA DevOps Conference
 

Similar to KubeCon Prometheus Salon -- Kubernetes metrics deep dive (20)

Autoscaling in kubernetes v1
Autoscaling in kubernetes v1Autoscaling in kubernetes v1
Autoscaling in kubernetes v1
 
KubeCon EU 2016: A Practical Guide to Container Scheduling
KubeCon EU 2016: A Practical Guide to Container SchedulingKubeCon EU 2016: A Practical Guide to Container Scheduling
KubeCon EU 2016: A Practical Guide to Container Scheduling
 
Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)
 
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
 
How to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtime
How to Migrate 100 Clusters from On-Prem to Google Cloud Without DowntimeHow to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtime
How to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtime
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
 
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
kubernetes를 부탁해~ Prometheus 기반 Monitoring 구축&활용기
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetes
 
Kubernetes Problem-Solving
Kubernetes Problem-SolvingKubernetes Problem-Solving
Kubernetes Problem-Solving
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best Practices
 
Docker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker eeDocker on docker leveraging kubernetes in docker ee
Docker on docker leveraging kubernetes in docker ee
 
OSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
OSDC 2018 | Monitoring Kubernetes at Scale by Monica SarbuOSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
OSDC 2018 | Monitoring Kubernetes at Scale by Monica Sarbu
 
Kubernetes and bluemix
Kubernetes  and  bluemixKubernetes  and  bluemix
Kubernetes and bluemix
 
The journey to the kubernetes metrics
The journey to the kubernetes metricsThe journey to the kubernetes metrics
The journey to the kubernetes metrics
 
Getting started with Kubernetes and Azure DevOps
Getting started with Kubernetes and Azure DevOpsGetting started with Kubernetes and Azure DevOps
Getting started with Kubernetes and Azure DevOps
 
Extending kubernetes
Extending kubernetesExtending kubernetes
Extending kubernetes
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle Management
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle Management
 
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
 

Recently uploaded

A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 

Recently uploaded (20)

A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 

KubeCon Prometheus Salon -- Kubernetes metrics deep dive

  • 1. Reveal Your Deepest Kubernetes Secrets Metrics KubeCon 2017 Prometheus Salon- 12/6/2017 Bob Cotton - FreshTracks.io
  • 2. About Me ● Co-Founder - FreshTracks.io - A CA Accelerator Incubation ● bob@freshtracks.io ● @bob_cotton
  • 3. Agenda ● Sources of metrics ○ Node ○ kubelet and containers ○ Kubernetes API ○ etcd ○ Derived metrics (kube-state-metrics) ● The new K8s metrics server ● Horizontal pod auto-scaler ● Prometheus re-labeling and recording rules ● K8s cluster hierarchies and metrics aggregation
  • 4. Sources of Metrics in Kubernetes
  • 5. Host Metrics from the node_exporter ● Standard Host Metrics ○ Load Average ○ CPU ○ Memory ○ Disk ○ Network ○ Many others ● ~1000 Unique series in a typical node Node node_exporter /metrics
  • 6. Container Metrics from cAdvisor ● cAdvisor is embedded into the kubelet, so we scrape the kubelet to get container metrics ● These are the so-called “core” metrics ● For each container on the node: ○ CPU Usage (user and system) and time throttled ○ Filesystem read/writes/limits ○ Memory usage and limits ○ Network transmit/receive/dropped Node node_exporter /metrics kubelet cAdvisor
  • 7. Kubernetes Metrics from the K8s API Server ● Metrics about the performance of the K8s API Server ○ Performance of controller work queues ○ Request Rates and Latencies ○ Etcd helper cache work queues and cache performance ○ General process status (File Descriptors/Memory/CPU Seconds) ○ Golang status (GC/Memory/Threads) Node node_exporter kubelet cAdvisor Any other Pod /metrics API Server
  • 8. Etcd Metrics from etcd ● Etcd is “master of all truth” within a K8s cluster ○ Leader existence and leader change rate ○ Proposals committed/applied/pending/failed ○ Disk write performance ○ Network and gRPC counters
  • 9. K8s Derived Metrics from kube-state-metrics ● Counts and meta-data about many K8s types ○ Counts of many “nouns” ○ Resource Limits ○ Container states ■ ready/restarts/running/terminated/waiting ○ _labels series just carries labels from Pods ● cronjob ● daemonset ● deployment ● horizontalpodautoscaler ● job ● limitrange ● namespace ● node ● persistentvolumeclaim ● pod ● replicaset ● replicationcontroller ● resourcequota ● service ● statefulset
  • 10. Sources of Metrics in Kubernetes ● Node via the node_exporter ● Container metrics via the kubelet and cAdvisor ● Kubernetes API server ● etcd ● Derived metrics via kube-state-metrics
  • 11. Scheduling and Autoscaling i.e. The Metrics Pipeline
  • 12. The New “Metrics Server” ● Replaces Heapster ● Standard (versioned and auth) API aggregated into the K8s API Server ● In “beta” in K8s 1.8 ● Used by the scheduler and (eventually) the Horizontal Pod Autoscaler ● A stripped-down version of Heapster ● Reports on “core” metrics (CPU/Memory/Network) gathered from cAdvisor ● For internal to K8s use only. ● Pluggable for custom metrics
  • 13.
  • 14. Feeding the Horizontal Pod Autoscaler ● Before the metrics server the HPA utilized Heapster for it’s Core metrics ○ This will be the metrics-server going forward ● API Adapter will bridge to third party monitoring system ○ e.g. Prometheus
  • 15. Labels, Re-Label and Recording Rules Oh My...
  • 16. Metric Metadata In the beginning: <metric name> = <metric value> http_requests_total = 1.4 Increased complexity lead to workarounds region.az.instance_type.instance.hostname.http_requests_total = 5439
  • 17. Metric Metadata - Metrics 2.0 us-east.us-east-1.m2_xlarge.i-3582k8.host1.http_requests_total = 5439 http_requests_total{region=” us-east”, az=”us-east-1”, instance_type=” m2.xlarge”, instance=” i-3582k8”, hostname=” host1”} = 5439
  • 18. Kubernetes Labels ● Kubernetes gives us labels on all the things ● Our scrape targets live in the context of the K8s labels ● We want to enhance the scraped metric labels with K8s labels ● This is why we need relabel rules in Prometheus
  • 19. Prometheus K8s API Server TSDB Kublet (cAdvisor) node-exporter kube_state_metrics App containers other exporters node_exporter App containers Kublet (cAdvisor) Service Discovery
  • 20. K8s API Server TSDB Scrape Target Service Discovery Prometheus 0="{__address__ 300.196.17.41}" 1="{__meta_kubernetes_namespace default}" 2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}" 3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}" 4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}" 5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}" 6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}" 7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}" 8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}" 9="{__meta_kubernetes_pod_host_ip 172.20.42.119}" 10="{__meta_kubernetes_pod_ip 100.96.17.41}" 11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}" 12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}" 13="{__meta_kubernetes_pod_label_run data-sidecar}" 14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}" 15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}" 16="{__meta_kubernetes_pod_ready false}" 17="{__metrics_path__ /metrics}" 18="{__scheme__ http}" 19="{job ftio-data-sidecar-calc}" <relabel_config> {__address__ 300.196.17.41:8077} {__scheme__ http} {__metrics_path__ /metrics} {job ftio-data-sidecar-calc} {kubernetes_namespace default} {container_name prometheus-configmap-reload} http_requests_total{region=”us-east”, az=”us-east-1”, instance_type=”m2.xlarge”, instance=”i-3582k8”, hostname=”host1”} = 5439 http_requests_total{region=”us-east”, az=”us-east-1”, instance_type=”m2.xlarge”, instance=”i-3582k8”, hostname=”host1”, instance=”300.196.17.41:8077”, job=”ftio-data-sidecar-calc”, kubernetes_namespace=”default”, container_name=”prometheus-configmap-reload”, } = 5439 <metric_relabel_config>
  • 21. Recording Rules Create a new series, derived from one or more existing series # The name of the time series to output to. Must be a valid metric name. record: <string> # The PromQL expression to evaluate. Every evaluation cycle this is # evaluated at the current time, and the result recorded as a new set of # time series with the metric name as given by 'record'. expr: <string> # Labels to add or overwrite before storing the result. labels: [ <labelname>: <labelvalue> ]
  • 22. Recording Rules Create a new series, derived from one or more existing series record: pod_name:cpu_usage_seconds:rate5m expr: sum(rate(container_cpu_usage_seconds_total{pod_name=~"^(?:.+)$"}[5m])) BY (pod_name) labels: ft_target: "true"
  • 24. Core Metrics Aggregation ● K8s clusters form a hierarchy ● We can aggregate the “core” metrics to any level ● This allows for some interesting monitoring opportunities ○ Using Prometheus “recording rules” aggregate the core metrics at every level ○ Insights into all levels of your Kubernetes cluster ● This also applies to any custom application metric Demo Namespace Deployment Pod Container
  • 25.
  • 28. Resources ● Prometheus.io ● Core Metrics in Kubelet ● Kubernetes monitoring architecture ● What is the new metrics-server?