SlideShare uma empresa Scribd logo
1 de 30
Low Latency Streaming Using Heron
Karthik Ramasamy
@karthikz
Co-founder of Streamlio
2
Real-time is key
Information Age
Ká
3
Real Time Connected World
Internet of Things
30 B connected devices by 2020
Health Care
153 Exabytes (2013) -> 2314 Exabytes
(2020)
Machine Data
40% of digital universe by 2020
Connected Vehicles
Data transferred per vehicle per month
4 MB -> 5 GB
Digital Assistants (Predictive Analytics)
$2B (2012) -> $6.5B (2019) [1]
Siri/Cortana/Google Now
Augmented/Virtual Reality
$150B by 2020 [2]
Oculus/HoloLens/Magic Leap
Ñ
+
>
[1] http://www.siemens.com/innovation/en/home/pictures-of-the-future/digitalization-and-software/digital-assistants-trends.html
[2] http://techcrunch.com/2015/04/06/augmented-and-virtual-reality-to-hit-150-billion-by-2020/#.7q0heh:oABw
4
Value of Data
Value&of&Data&to&Decision/Making&
Time&
Preven8ve/&
Predic8ve&
Ac8onable&
Reac8ve&
Historical&
Real%&
Time&
Seconds& Minutes& Hours& Days&
Tradi8onal&“Batch”&&&&&&&&&&&&&&&
Business&&Intelligence&
Informa9on&Half%Life&
In&Decision%Making&
Months&
Time/cri8cal&
Decisions&
[1] Courtesy Michael Franklin, BIRTE, 2015.
5
Introducing Heron
! Scaling
! Debugging
! Consistent performance
! Yet another system to manage
! Consistent performance at scale
! Easy to debug and tune
! Fast/Efficient General purpose streaming engine
! Storm API compatibile
! Latency/Thruput configurability
! Library not a service
Issues with Apache Storm Heron Design Goals
6
Heron in Production @ Twitter
Completely replaced Storm 3 years ago
3x reduction in cores and memory
Significantly reduced operational overhead
10x reduction in production incidents
7
Heron Use Cases
REALTIME
ETL
REAL TIME
BI
SPAM
DETECTION REAL TIME
TRENDS
REALTIME
ML
REAL TIME
OPS
8
Open Souring
https://github.com/twitter/heron
http://heron.io
Apache 2.0 License
Contributions from
Microsoft, Machine Zone, Mesosphere, Google,
Wanda Group, WeChat, Fitbit and growing
OPEN SOURCED
MAY 2016
9
Heron Core Concepts
Topology
Directed acyclic graph
vertices = computation, and
edges = streams of data tuples
Spouts
Sources of data tuples for the topology
Examples - Kafka/Kestrel/MySQL/Postgres
Bolts
Process incoming tuples, and emit outgoing
tuples
Examples - filtering/aggregation/join/any
function
,
%
10
Sample Heron Topology
%
%
%
%
%
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
11
Topology Architecture
Topology
Master
ZK
Cluster
Stream
Manager
I1 I2 I3 I4
Stream
Manager
I1 I2 I3 I4
Logical Plan,
Physical Plan and
Execution State
Sync Physical Plan
CONTAINER CONTAINER
Metrics
Manager
Metrics
Manager
12
Stream Manager - Design Goals
Core logic in one centralized place
Super Efficient
Pluggable
Transport (tcp sockets, unix sockets,
shared memory)
Interlanguage Data Format (Protobufs,
Cap N’ Proto, etc)
Protocol (HTTP, gRPC, custom, etc)
Oculus/HoloLens/Magic Leap
Ñ
+
>
Multilanguage Instances (C++, Java,
Python),
13
Stream Manager - Current Implementation
Implements at most once and at least
once
Written in C++ for efficiency
Custom protocol (very similar to gRPC)
Transport using TCP Sockets
Protobuf data format
Ñ
+
14
Stream Manager - Shortcomings
01 02 03
Transport
Shortcomings
TCP Overhead
Multiple memory copies
Protobuf
Shortcomings
Serde is very expensive
Full deserialization necessary to access any field
Creation/Deletion is very expensive
Core Logic
Implementation
Followed immutable pattern
Easy to reason but inefficient
/ .
-
15
Stream Manager - Performance Analysis
Too slow
Too much overhead
Changes what we are trying to observe
Very fast
Doesn’t do code instrumentation
cpu-profiling/memory-profiling in one tool
Ñ
Valgrind Google Perftools
16
Stream Manager - Performance Analysis
17% in new/delete
15% immutable pattern
15% eager deserialization
12% protobuf size collection
17
Stream Manager - Optimization 1
! new/delete overhead
! Problem:- Create/Delete a new protobuf object every time we read/wrote
something.
! Protobuf sacrifices speed for safety
! Solution
! Create protobuf pools at startup
! We do a “new” only when the pool is exhausted
! The pool is bounded in size to avoid running out of memory
18
Stream Manager - Optimization 2
! Immutable pattern
! Problem:- In general case, one tuple can fan-out to multiple downstream instances
! For each downstream instance, we made a new copy
! Solution
! Do early serialization to create an immutable byte array
! Just copy the raw bytes
19
Stream Manager - Optimization 3
! Eager Deserialization
! Problem:- Protobuf deserializes the entire message even if we access just the ‘header’
! Solution
! Change the protobuf message to have raw bytes to avoid expensive deserialization
! Lazy deserialization is done manually only when needed
20
Stream Manager - Optimization 4
! Calculation of Protobuf ByteSize
! Problem:- Bytesize computation is expensive and every time the computation is from the scratch
! Solution
! Used CachedByteSize when possible
21
Benchmark Settings
Components Expt 1 Expt 2 Expt 3
Spout 25 100 200
Bolt 25 100 200
# Heron
Containers
25 100 200
Dual Intel Xeon
E5645@2.4GHz, 72GB RAM,
500GB Disk
175K Random words generated
Word Count Topology
22
Benchmark - At most once throughput
5 - 6x
23
Benchmark - At least once throughput
4 - 5x
24
Benchmark - At least once latency
2 - 4x
25
Real-Time is Messy, Unpredictable and Hard
Aggregation
Systems
Messaging
Systems
Result
Engine
HDFS
Queryable
Engines
26
Real Time - End to End
Storm API DSL SQL
Application
Builder
Ingestion
API
Query
API
27
Curious to Learn More?
Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg,
Sailesh Mittal, Jignesh M. Patel*,1
, Karthik Ramasamy, Siddarth Taneja
@sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg,
@saileshmittal, @pateljm, @karthikz, @staneja
Twitter, Inc., *University of Wisconsin – Madison
ABSTRACT
Storm has long served as the main platform for real-time analytics
at Twitter. However, as the scale of data being processed in real-
time at Twitter has increased, along with an increase in the
diversity and the number of use cases, many limitations of Storm
have become apparent. We need a system that scales better, has
better debug-ability, has better performance, and is easier to
manage – all while working in a shared cluster infrastructure. We
considered various alternatives to meet these needs, and in the end
concluded that we needed to build a new real-time stream data
processing system. This paper presents the design and
implementation of this new system, called Heron. Heron is now
the de facto stream data processing engine inside Twitter, and in
this paper we also share our experiences from running Heron in
production. In this paper, we also provide empirical evidence
demonstrating the efficiency and scalability of Heron.
ACM Classification
H.2.4 [Information Systems]: Database Management—systems
Keywords
Stream data processing systems; real-time data processing.
1. INTRODUCTION
Twitter, like many other organizations, relies heavily on real-time
system process, which makes debugging very challenging. Thus, we
needed a cleaner mapping from the logical units of computation to
each physical process. The importance of such clean mapping for
debug-ability is really crucial when responding to pager alerts for a
failing topology, especially if it is a topology that is critical to the
underlying business model.
In addition, Storm needs dedicated cluster resources, which requires
special hardware allocation to run Storm topologies. This approach
leads to inefficiencies in using precious cluster resources, and also
limits the ability to scale on demand. We needed the ability to work
in a more flexible way with popular cluster scheduling software that
allows sharing the cluster resources across different types of data
processing systems (and not just a stream processing system).
Internally at Twitter, this meant working with Aurora [1], as that is
the dominant cluster management system in use.
With Storm, provisioning a new production topology requires
manual isolation of machines, and conversely, when a topology is
no longer needed, the machines allocated to serve that topology
now have to be decommissioned. Managing machine provisioning
in this way is cumbersome. Furthermore, we also wanted to be far
more efficient than the Storm system in production, simply
because at Twitter’s scale, any improvement in performance
translates into significant reduction in infrastructure costs and also
significant improvements in the productivity of our end users.
Storm @Twitter
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni,
Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
@ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk,
@jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog
Twitter, Inc., *University of Wisconsin – Madison
Streaming@Twitter
Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry,
Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu
Twitter, Inc.
Abstract
Twitter generates tens of billions of events per hour when users interact with it. Analyzing these
events to surface relevant content and to derive insights in real time is a challenge. To address this, we
developed Heron, a new real time distributed streaming engine. In this paper, we first describe the design
goals of Heron and show how the Heron architecture achieves task isolation and resource reservation
to ease debugging, troubleshooting, and seamless use of shared cluster infrastructure with other critical
Twitter services. We subsequently explore how a topology self adjusts using back pressure so that the
pace of the topology goes as its slowest component. Finally, we outline how Heron implements at most
once and at least once semantics and we describe a few operational stories based on running Heron in
production.
1 Introduction
Stream processing platforms enable enterprises to extract business value from data in motion similar to batch
processing platforms that facilitated the same with data at rest [42]. The goal of stream processing is to enable
real time or near real time decision making by providing capabilities to inspect, correlate and analyze data as
28
Curious to Learn More?
29
WHAT WHY WHERE WHEN WHO HOW
Any Question ???
30
@karthikz
Get in Touch

Mais conteúdo relacionado

Mais procurados

The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...HostedbyConfluent
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastoreKishore Gopalakrishna
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...confluent
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinTyler Wishnoff
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyKishore Gopalakrishna
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Kai Wähner
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScaleSeunghyun Lee
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Databricks
 
Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...
Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...
Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...Spark Summit
 
Apache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkApache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkLuiz Henrique Zambom Santana
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 

Mais procurados (20)

The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case study
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
 
Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...
Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...
Lessons Learned Developing and Managing Massive (300TB+) Apache Spark Pipelin...
 
Apache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkApache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with Spark
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 

Semelhante a Real Time Processing Using Twitter Heron by Karthik Ramasamy

Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Karthik Ramasamy
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Data Culture Series - Keynote - 3rd Dec
Data Culture Series - Keynote - 3rd DecData Culture Series - Keynote - 3rd Dec
Data Culture Series - Keynote - 3rd DecJonathan Woodward
 
Overview Of Parallel Development - Ericnel
Overview Of Parallel Development -  EricnelOverview Of Parallel Development -  Ericnel
Overview Of Parallel Development - Ericnelukdpe
 
Cytoscape CI Chapter 2
Cytoscape CI Chapter 2Cytoscape CI Chapter 2
Cytoscape CI Chapter 2bdemchak
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Stavros Kontopoulos
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionAmazon Web Services
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationBig Data Value Association
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaborationJulien Pivotto
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonIRJET Journal
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Tal Bar-Zvi
 
Azure machine learning ile tahminleme modelleri
Azure machine learning ile tahminleme modelleriAzure machine learning ile tahminleme modelleri
Azure machine learning ile tahminleme modelleriKoray Kocabas
 
Mini-course "Practices of the Web Giants" at Global Code - São Paulo
Mini-course "Practices of the Web Giants" at Global Code - São PauloMini-course "Practices of the Web Giants" at Global Code - São Paulo
Mini-course "Practices of the Web Giants" at Global Code - São PauloOCTO Technology
 
Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023Chris Grundemann
 

Semelhante a Real Time Processing Using Twitter Heron by Karthik Ramasamy (20)

Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Path to continuous delivery
Path to continuous deliveryPath to continuous delivery
Path to continuous delivery
 
Data Culture Series - Keynote - 3rd Dec
Data Culture Series - Keynote - 3rd DecData Culture Series - Keynote - 3rd Dec
Data Culture Series - Keynote - 3rd Dec
 
Overview Of Parallel Development - Ericnel
Overview Of Parallel Development -  EricnelOverview Of Parallel Development -  Ericnel
Overview Of Parallel Development - Ericnel
 
Cytoscape CI Chapter 2
Cytoscape CI Chapter 2Cytoscape CI Chapter 2
Cytoscape CI Chapter 2
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in Action
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform Optimization
 
Streaming is a Detail
Streaming is a DetailStreaming is a Detail
Streaming is a Detail
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaboration
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & Python
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019
 
Azure machine learning ile tahminleme modelleri
Azure machine learning ile tahminleme modelleriAzure machine learning ile tahminleme modelleri
Azure machine learning ile tahminleme modelleri
 
Mini-course "Practices of the Web Giants" at Global Code - São Paulo
Mini-course "Practices of the Web Giants" at Global Code - São PauloMini-course "Practices of the Web Giants" at Global Code - São Paulo
Mini-course "Practices of the Web Giants" at Global Code - São Paulo
 
Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023
 

Mais de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Mais de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Último

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Real Time Processing Using Twitter Heron by Karthik Ramasamy

  • 1. Low Latency Streaming Using Heron Karthik Ramasamy @karthikz Co-founder of Streamlio
  • 3. 3 Real Time Connected World Internet of Things 30 B connected devices by 2020 Health Care 153 Exabytes (2013) -> 2314 Exabytes (2020) Machine Data 40% of digital universe by 2020 Connected Vehicles Data transferred per vehicle per month 4 MB -> 5 GB Digital Assistants (Predictive Analytics) $2B (2012) -> $6.5B (2019) [1] Siri/Cortana/Google Now Augmented/Virtual Reality $150B by 2020 [2] Oculus/HoloLens/Magic Leap Ñ + > [1] http://www.siemens.com/innovation/en/home/pictures-of-the-future/digitalization-and-software/digital-assistants-trends.html [2] http://techcrunch.com/2015/04/06/augmented-and-virtual-reality-to-hit-150-billion-by-2020/#.7q0heh:oABw
  • 4. 4 Value of Data Value&of&Data&to&Decision/Making& Time& Preven8ve/& Predic8ve& Ac8onable& Reac8ve& Historical& Real%& Time& Seconds& Minutes& Hours& Days& Tradi8onal&“Batch”&&&&&&&&&&&&&&& Business&&Intelligence& Informa9on&Half%Life& In&Decision%Making& Months& Time/cri8cal& Decisions& [1] Courtesy Michael Franklin, BIRTE, 2015.
  • 5. 5 Introducing Heron ! Scaling ! Debugging ! Consistent performance ! Yet another system to manage ! Consistent performance at scale ! Easy to debug and tune ! Fast/Efficient General purpose streaming engine ! Storm API compatibile ! Latency/Thruput configurability ! Library not a service Issues with Apache Storm Heron Design Goals
  • 6. 6 Heron in Production @ Twitter Completely replaced Storm 3 years ago 3x reduction in cores and memory Significantly reduced operational overhead 10x reduction in production incidents
  • 7. 7 Heron Use Cases REALTIME ETL REAL TIME BI SPAM DETECTION REAL TIME TRENDS REALTIME ML REAL TIME OPS
  • 8. 8 Open Souring https://github.com/twitter/heron http://heron.io Apache 2.0 License Contributions from Microsoft, Machine Zone, Mesosphere, Google, Wanda Group, WeChat, Fitbit and growing OPEN SOURCED MAY 2016
  • 9. 9 Heron Core Concepts Topology Directed acyclic graph vertices = computation, and edges = streams of data tuples Spouts Sources of data tuples for the topology Examples - Kafka/Kestrel/MySQL/Postgres Bolts Process incoming tuples, and emit outgoing tuples Examples - filtering/aggregation/join/any function , %
  • 10. 10 Sample Heron Topology % % % % % Spout 1 Spout 2 Bolt 1 Bolt 2 Bolt 3 Bolt 4 Bolt 5
  • 11. 11 Topology Architecture Topology Master ZK Cluster Stream Manager I1 I2 I3 I4 Stream Manager I1 I2 I3 I4 Logical Plan, Physical Plan and Execution State Sync Physical Plan CONTAINER CONTAINER Metrics Manager Metrics Manager
  • 12. 12 Stream Manager - Design Goals Core logic in one centralized place Super Efficient Pluggable Transport (tcp sockets, unix sockets, shared memory) Interlanguage Data Format (Protobufs, Cap N’ Proto, etc) Protocol (HTTP, gRPC, custom, etc) Oculus/HoloLens/Magic Leap Ñ + > Multilanguage Instances (C++, Java, Python),
  • 13. 13 Stream Manager - Current Implementation Implements at most once and at least once Written in C++ for efficiency Custom protocol (very similar to gRPC) Transport using TCP Sockets Protobuf data format Ñ +
  • 14. 14 Stream Manager - Shortcomings 01 02 03 Transport Shortcomings TCP Overhead Multiple memory copies Protobuf Shortcomings Serde is very expensive Full deserialization necessary to access any field Creation/Deletion is very expensive Core Logic Implementation Followed immutable pattern Easy to reason but inefficient / . -
  • 15. 15 Stream Manager - Performance Analysis Too slow Too much overhead Changes what we are trying to observe Very fast Doesn’t do code instrumentation cpu-profiling/memory-profiling in one tool Ñ Valgrind Google Perftools
  • 16. 16 Stream Manager - Performance Analysis 17% in new/delete 15% immutable pattern 15% eager deserialization 12% protobuf size collection
  • 17. 17 Stream Manager - Optimization 1 ! new/delete overhead ! Problem:- Create/Delete a new protobuf object every time we read/wrote something. ! Protobuf sacrifices speed for safety ! Solution ! Create protobuf pools at startup ! We do a “new” only when the pool is exhausted ! The pool is bounded in size to avoid running out of memory
  • 18. 18 Stream Manager - Optimization 2 ! Immutable pattern ! Problem:- In general case, one tuple can fan-out to multiple downstream instances ! For each downstream instance, we made a new copy ! Solution ! Do early serialization to create an immutable byte array ! Just copy the raw bytes
  • 19. 19 Stream Manager - Optimization 3 ! Eager Deserialization ! Problem:- Protobuf deserializes the entire message even if we access just the ‘header’ ! Solution ! Change the protobuf message to have raw bytes to avoid expensive deserialization ! Lazy deserialization is done manually only when needed
  • 20. 20 Stream Manager - Optimization 4 ! Calculation of Protobuf ByteSize ! Problem:- Bytesize computation is expensive and every time the computation is from the scratch ! Solution ! Used CachedByteSize when possible
  • 21. 21 Benchmark Settings Components Expt 1 Expt 2 Expt 3 Spout 25 100 200 Bolt 25 100 200 # Heron Containers 25 100 200 Dual Intel Xeon E5645@2.4GHz, 72GB RAM, 500GB Disk 175K Random words generated Word Count Topology
  • 22. 22 Benchmark - At most once throughput 5 - 6x
  • 23. 23 Benchmark - At least once throughput 4 - 5x
  • 24. 24 Benchmark - At least once latency 2 - 4x
  • 25. 25 Real-Time is Messy, Unpredictable and Hard Aggregation Systems Messaging Systems Result Engine HDFS Queryable Engines
  • 26. 26 Real Time - End to End Storm API DSL SQL Application Builder Ingestion API Query API
  • 27. 27 Curious to Learn More? Twitter Heron: Stream Processing at Scale Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*,1 , Karthik Ramasamy, Siddarth Taneja @sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg, @saileshmittal, @pateljm, @karthikz, @staneja Twitter, Inc., *University of Wisconsin – Madison ABSTRACT Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real- time at Twitter has increased, along with an increase in the diversity and the number of use cases, many limitations of Storm have become apparent. We need a system that scales better, has better debug-ability, has better performance, and is easier to manage – all while working in a shared cluster infrastructure. We considered various alternatives to meet these needs, and in the end concluded that we needed to build a new real-time stream data processing system. This paper presents the design and implementation of this new system, called Heron. Heron is now the de facto stream data processing engine inside Twitter, and in this paper we also share our experiences from running Heron in production. In this paper, we also provide empirical evidence demonstrating the efficiency and scalability of Heron. ACM Classification H.2.4 [Information Systems]: Database Management—systems Keywords Stream data processing systems; real-time data processing. 1. INTRODUCTION Twitter, like many other organizations, relies heavily on real-time system process, which makes debugging very challenging. Thus, we needed a cleaner mapping from the logical units of computation to each physical process. The importance of such clean mapping for debug-ability is really crucial when responding to pager alerts for a failing topology, especially if it is a topology that is critical to the underlying business model. In addition, Storm needs dedicated cluster resources, which requires special hardware allocation to run Storm topologies. This approach leads to inefficiencies in using precious cluster resources, and also limits the ability to scale on demand. We needed the ability to work in a more flexible way with popular cluster scheduling software that allows sharing the cluster resources across different types of data processing systems (and not just a stream processing system). Internally at Twitter, this meant working with Aurora [1], as that is the dominant cluster management system in use. With Storm, provisioning a new production topology requires manual isolation of machines, and conversely, when a topology is no longer needed, the machines allocated to serve that topology now have to be decommissioned. Managing machine provisioning in this way is cumbersome. Furthermore, we also wanted to be far more efficient than the Storm system in production, simply because at Twitter’s scale, any improvement in performance translates into significant reduction in infrastructure costs and also significant improvements in the productivity of our end users. Storm @Twitter Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy @ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk, @jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog Twitter, Inc., *University of Wisconsin – Madison Streaming@Twitter Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry, Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu Twitter, Inc. Abstract Twitter generates tens of billions of events per hour when users interact with it. Analyzing these events to surface relevant content and to derive insights in real time is a challenge. To address this, we developed Heron, a new real time distributed streaming engine. In this paper, we first describe the design goals of Heron and show how the Heron architecture achieves task isolation and resource reservation to ease debugging, troubleshooting, and seamless use of shared cluster infrastructure with other critical Twitter services. We subsequently explore how a topology self adjusts using back pressure so that the pace of the topology goes as its slowest component. Finally, we outline how Heron implements at most once and at least once semantics and we describe a few operational stories based on running Heron in production. 1 Introduction Stream processing platforms enable enterprises to extract business value from data in motion similar to batch processing platforms that facilitated the same with data at rest [42]. The goal of stream processing is to enable real time or near real time decision making by providing capabilities to inspect, correlate and analyze data as
  • 29. 29 WHAT WHY WHERE WHEN WHO HOW Any Question ???