Arun Kejariwal, MZ Research
Karthik Ramasamy, Twitter
Anomaly Detection in Real-Time Data Streams
Using Heron
DATA @ MZ: An Overview
GAME OF WAR (GOW) AND MOBILE STRIKE
Peaked at 1M events/sec
MARKETING
Serve >1B impressions/day worldwide
Integrated with >150 distinct advertising channels
POTPOURRI
~35B messages/day
Writes: 20TB/day
EXPLOSION IN DATA VELOCITY AND VOLUME
SENSORS
Monitoring
Smartwatches, Refrigerators
Wearables
ACTUATORS
Automation
Manufacturing
Robotics
DRONES
Expanding the scope
Delivery, Real Estate
Power Transmission Lines
MOBILE
Life’s Remote Control
Personalization
Productivity
ANOMALY DETECTION: WHY BOTHER?
MANUFACTURING
HEALTH CARE
POWER GRID
GAS PIPELINES
SECURITY
OPERATIONS
ROBOTICS
# TWEETS PER MINUTE
DIGITAL MARKETING
CONNECTED CARS
ANOMALY  DETECTION:  LIVE  EXAMPLE
ANOMALY  DETECTION:  HISTORY
RESEARCHED FOR >100 YEARS
ANOMALY DETECTION: APPLICATION DOMAINS
Manufacturing
Econometrics
Networking
Image Processing
Computer Vision
(Cyber) Security
Text Mining
Signal Processing
Finance
Experimental Social Psychology
Web Operations
Statistics (and Time Series Analysis)
Data Fidelity
Astronomy
ANOMALY DETECTION: RECENT WORKS IN INDUSTRY
[Timeline figure: works dated Jan ’15, Mar ’15, Jul ’15, Aug ’15 (two), Nov ’15 (two) and Jun ’16]
WHY NOT USE OFF-THE-SHELF?
Anomalies are CONTEXTUAL
FALSE POSITIVE RATE
FALSE NEGATIVE RATE
SCALE
Data Granularity
SEVERITY
Different actions: page or not
DATA CHARACTERISTICS
Stationarity, normal distribution
DATA FIDELITY
Missing data, data corruption
MOSTLY UNSUPERVISED
DATA VISUALIZATION
Not viable in practice
COMMONLY USED STATISTICS
MEAN AND STANDARD DEVIATION
Mean: can be computed incrementally
Not robust in the presence of anomalies
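The incremental computation can be sketched with Welford's online algorithm, a standard textbook method that updates the mean and variance one observation at a time in constant memory. It is not necessarily what the authors used, and the class name `RunningStats` is illustrative. The usage also shows the lack of robustness: a single spike drags the mean far from the typical value and inflates the standard deviation.

```python
import math


class RunningStats:
    """Welford's online algorithm: one pass, O(1) memory, numerically stable."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the new mean

    @property
    def std(self):
        """Sample standard deviation (0.0 until two values have been seen)."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 8]
clean, spiked = RunningStats(), RunningStats()
for x in readings:
    clean.update(x)
    spiked.update(x)
spiked.update(1000)  # a single anomalous reading
print(f"clean:  mean={clean.mean:.2f}  std={clean.std:.2f}")
print(f"spiked: mean={spiked.mean:.2f}  std={spiked.std:.2f}")
```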
TRIMMED MEAN
Robust in the presence of anomalies
Small samples?
How to handle asymmetric distributions? (The trimmed mean is then a biased estimator of the mean)
What should the trimming boundaries be?
WINSORIZED MEAN
L-ESTIMATORS
Linear combinations of order statistics
ROBUST  STATISTICS
MEDIAN AND MEDIAN ABSOLUTE DEVIATION (MAD)
Robust in the presence of anomalies
Not amenable to incremental computation
Approximate with streaming quantile sketches (q-digest, t-digest)
What if MAD is zero?
Happens when more than half the values in a sample are identical
BROADENED MEDIAN, M-ESTIMATORS, SN AND QN
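A sketch of a median/MAD-based score (a "modified z-score") that also handles the zero-MAD case. The constant 1.4826 makes MAD a consistent estimator of the standard deviation for normal data. The fallback to the mean absolute deviation around the median, scaled by about sqrt(pi/2) ≈ 1.2533, is one common convention and an assumption here, not the authors' method; Sn, Qn or M-estimators are the alternatives the slide lists. Exact medians require buffering the window; in a stream one would substitute a q-digest or t-digest estimate, which is not shown.

```python
import statistics

MAD_TO_SIGMA = 1.4826     # consistency constant for normally distributed data
MEANAD_TO_SIGMA = 1.2533  # approx. sqrt(pi / 2), used only when MAD is zero


def robust_scores(xs):
    """Return (x - median) / scale for each x, with a MAD-based scale."""
    med = statistics.median(xs)
    abs_dev = [abs(x - med) for x in xs]
    scale = MAD_TO_SIGMA * statistics.median(abs_dev)
    if scale == 0:
        # More than half the values equal the median: fall back to the mean
        # absolute deviation around the median (an assumed convention).
        scale = MEANAD_TO_SIGMA * sum(abs_dev) / len(abs_dev)
    if scale == 0:
        # Every value is identical, so nothing deviates.
        return [0.0 for _ in xs]
    return [(x - med) / scale for x in xs]


print(robust_scores([10, 11, 9, 10, 12, 10, 50]))     # MAD = 1
print(robust_scores([5, 5, 5, 5, 5, 5, 6, 7, 100]))   # MAD = 0, fallback used
```

A frequently cited rule of thumb flags points whose absolute score exceeds 3.5; the right threshold depends on the severity and context discussed earlier.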
MULTIPLE TIME SERIES: Methods
ANALYZE INDIVIDUAL TIME SERIES
Too many alerts
Not actionable
Alert Fatigue
MINIMUM COVARIANCE DETERMINANT (MCD)
Proposed by Rousseeuw, 1984
Mahalanobis distance [1]
FastMCD
[1] “On the generalised distance in statistics”, P. C. Mahalanobis, 1936.
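A simplified two-dimensional sketch of MCD: search for the h-subset of points whose covariance matrix has the smallest determinant, refining random starting subsets with the concentration step ("C-step") used by FastMCD, then flag points whose squared Mahalanobis distance exceeds the 97.5% chi-square quantile. It omits FastMCD's consistency correction, reweighting step and large-n tricks, so it is illustrative only; the names `mcd_2d` and `flag_outliers` are hypothetical. scikit-learn's `MinCovDet` provides a full FastMCD implementation.

```python
import math
import random

# The chi-square quantile for 2 degrees of freedom has a closed form: -2 ln(1 - q).
CUTOFF_975 = -2 * math.log(1 - 0.975)


def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def mahalanobis_sq(p, mu, cov):
    """Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def mcd_2d(points, starts=20, max_steps=50, seed=0):
    """Raw MCD estimate: random h-subsets refined by FastMCD-style C-steps."""
    h = (len(points) + 2 + 1) // 2  # (n + p + 1) // 2 with p = 2 dimensions
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        mu, cov = mean_cov(rng.sample(points, h))
        if det(cov) <= 0:
            continue  # degenerate start (e.g. collinear points): skip it
        for _ in range(max_steps):
            # C-step: refit on the h points closest under the current estimate.
            ranked = sorted(points, key=lambda p: mahalanobis_sq(p, mu, cov))
            new_mu, new_cov = mean_cov(ranked[:h])
            if not 0 < det(new_cov) < det(cov):
                break  # determinant stopped decreasing (or degenerated): converged
            mu, cov = new_mu, new_cov
        if best is None or det(cov) < det(best[1]):
            best = (mu, cov)
    if best is None:
        raise ValueError("every starting subset was degenerate")
    return best


def flag_outliers(points):
    mu, cov = mcd_2d(points)
    return [mahalanobis_sq(p, mu, cov) > CUTOFF_975 for p in points]


# Synthetic data: y is roughly 2x, plus two points far off that line.
points = [(i, 2 * i + (i * 7) % 5 - 2) for i in range(30)] + [(5, 60), (25, 0)]
print([p for p, bad in zip(points, flag_outliers(points)) if bad])
```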
MULTIPLE TIME SERIES: Other Methods
CORRELATION
Direction
Magnitude
n × n correlation matrix?
Bake in context
Exploit topology
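One way to read "exploit topology": rather than computing the full n × n correlation matrix (O(n²) pairs), correlate only the pairs of series whose sources are connected in a known dependency graph. The sign of each Pearson coefficient gives the direction and its absolute value the magnitude. This is a sketch assuming a hypothetical `edges` list describing that topology; the metric names are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def topology_correlations(series, edges):
    """Correlate only the pairs named in `edges`, not all n * (n - 1) / 2 pairs."""
    return {(a, b): pearson(series[a], series[b]) for a, b in edges}


# Hypothetical metrics: front-end and back-end request rates, plus an unrelated one.
series = {
    "frontend_rps": [100, 120, 130, 110, 150, 160],
    "backend_rps": [50, 60, 65, 55, 75, 80],
    "disk_util": [30, 31, 29, 30, 32, 30],
}
edges = [("frontend_rps", "backend_rps")]  # assumed dependency: front end calls back end
print(topology_correlations(series, edges))
```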
MULTIPLE TIME SERIES: Other Methods
CHALLENGES
Susceptible to Anomalies
Data Skew
Missing Data
Speed
TECHNIQUES
Robust Correlation
Cross Correlation
Intersection Analysis
Trade-off between speed and accuracy
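Two of these techniques can be sketched briefly. Spearman rank correlation is one robust alternative to Pearson (the slide does not say which robust correlation the authors used): because it works on ranks, a single spike cannot dominate it. A lagged cross-correlation search finds the shift at which two series line up best. The function names are illustrative. The speed/accuracy trade-off shows up directly: ranking costs O(n log n) versus O(n) for Pearson, and scanning lags multiplies the cost by the number of lags tried.

```python
import math


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def ranks(xs):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Rank-based correlation: a spike's magnitude only matters through its rank."""
    return pearson(ranks(xs), ranks(ys))


def best_lag(xs, ys, max_lag):
    """Lag L maximizing corr(xs[t], ys[t + L]); positive L means ys trails xs."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = xs[:len(xs) - lag], ys[lag:]
        else:
            a, b = xs[-lag:], ys[:len(ys) + lag]
        n = min(len(a), len(b))
        c = pearson(a[:n], b[:n])
        if best is None or c > best[1]:
            best = (lag, c)
    return best


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 6, 8, 10, 12, 14, 1000]  # perfectly monotone apart from one spike
print(pearson(x, y), spearman(x, y))
```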
THE BIG PICTURE
THE FLOW: RTplatform and Heron
[Diagram: Live Data, Streaming Computation, RTplatform]
RTplatform
Cloud-based platform built for connecting, processing,
and reacting to live data.
+ Extreme scale
+ High performance
+ Unprecedented reliability
+ Natively serverless
RTplatform
“Real-time” has many definitions, each with different KPIs.
Real-time results on data at rest, not on live data
Live Stream Bots
A backbone for live data:
Free messaging for publishers and subscribers
Filter, analyze and transform messages in the live stream
Notify
Anomaly detection
RTplatform
MESSAGING: Real-time pub/sub with ultra-low latency and high fanout
QUERYING: Filter, analyze, and transform messages live, in-stream
BOTS: Deploy rule-based bots for real-time anomaly detection/reaction
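The slides do not show RTplatform's API, so the following is only a toy, in-process illustration of the pattern: a pub/sub broker that fans messages out to every subscriber, and a rule-based bot that filters a topic in-stream and republishes an alert when a field crosses a threshold. `Broker`, `threshold_bot` and the topic names are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy in-process pub/sub: every subscriber of a topic sees every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:  # fan-out to all subscribers
            callback(message)


def threshold_bot(broker, in_topic, out_topic, field, limit):
    """Rule-based bot: filter in-stream and publish an alert when `field` > `limit`."""
    def on_message(msg):
        value = msg.get(field)
        if value is not None and value > limit:
            broker.publish(out_topic, {"alert": f"{field}={value} > {limit}", "source": msg})
    broker.subscribe(in_topic, on_message)


broker = Broker()
alerts = []
broker.subscribe("alerts", alerts.append)  # a simple "notify" sink
threshold_bot(broker, "payments", "alerts", "latency_ms", 250)
for latency in [120, 180, 900, 140]:
    broker.publish("payments", {"latency_ms": latency})
print(alerts)
```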
RTplatform
HERON
HERON DESIGN GOALS
Task isolation
Ease of debuggability/isolation/profiling
Support for back pressure
Topologies should be self-adjusting
Efficiency
Reduce resource consumption
Off-the-shelf schedulers
Unmanaged - Apache YARN/Mesos
Managed - Apache Aurora, Amazon ECS
Use of mainstream languages
C++, Java and Python
Batching of tuples
Amortizing the cost of transferring tuples
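Batching amortizes per-transfer overhead: a sender buffers tuples and drains the buffer either when it reaches a size limit or when a drain interval has elapsed. The performance slides vary this interval as "cache drain frequency"; a longer interval tends to yield larger batches at the cost of added latency. This is a minimal sketch with hypothetical names (`TupleBatcher`, `max_batch`, `drain_interval_s`), not Heron's stream manager code; the injectable clock keeps it deterministic for testing.

```python
import time


class TupleBatcher:
    """Buffers tuples and hands them to `send_batch` in groups."""

    def __init__(self, send_batch, max_batch=100, drain_interval_s=0.01, clock=time.monotonic):
        self.send_batch = send_batch
        self.max_batch = max_batch
        self.drain_interval_s = drain_interval_s
        self.clock = clock
        self.buffer = []
        self.last_drain = clock()

    def add(self, tup):
        self.buffer.append(tup)
        if len(self.buffer) >= self.max_batch:
            self.drain()  # size trigger bounds memory and batch size

    def tick(self):
        """Call periodically; drains if the interval has elapsed since the last drain."""
        if self.buffer and self.clock() - self.last_drain >= self.drain_interval_s:
            self.drain()

    def drain(self):
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []
        self.last_drain = self.clock()


now = [0.0]  # fake clock, in seconds
batches = []
b = TupleBatcher(batches.append, max_batch=3, drain_interval_s=0.010, clock=lambda: now[0])
for t in range(7):
    b.add(t)    # size trigger sends [0, 1, 2] and [3, 4, 5]
now[0] = 0.005
b.tick()        # only 5 ms since the last drain: [6] stays buffered
now[0] = 0.012
b.tick()        # interval elapsed: [6] is sent
print(batches)
```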
HERON ARCHITECTURE
[Diagram: Topology Submission → Scheduler → Topology 1, Topology 2, …, Topology N]
TOPOLOGY ARCHITECTURE
[Diagram: a Topology Master and a ZK Cluster holding the Logical Plan, Physical Plan and Execution State; the physical plan is synced to each CONTAINER, which runs a Stream Manager, a Metrics Manager and instances I1-I4]
STREAM MANAGER: Sample Topology
[Diagram: sample topology with spout S1 and bolts B2, B3 and B4]
HERON PHYSICAL EXECUTION
[Diagram: four Stream Managers, each serving a container with instances drawn from S1, B2, B3 and B4]
BACKPRESSURE
Stragglers are the norm in multi-tenant distributed systems
BAD HOST
EXECUTION SKEW
INADEQUATE PROVISIONING
BACKPRESSURE: Approaches to Handle Stragglers
SENDERS TO STRAGGLER: DROP DATA
DETECT STRAGGLERS AND RESCHEDULE THEM
SENDERS SLOW DOWN TO THE SPEED OF STRAGGLER
BACKPRESSURE: Data Drop Strategy
UNPREDICTABLE
AFFECTS ACCURACY
POOR VISIBILITY
BACKPRESSURE: Slow Down Sender
HANDLES TEMPORARY SPIKES
PROCESSES DATA AT MAXIMUM RATE
PROVIDES PREDICTABILITY
REDUCES RECOVERY TIMES
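A toy model contrasting the two strategies, using a bounded `queue.Queue` between a fast sender and a slow (straggler) consumer. With `drop_when_full=True` the sender discards tuples that do not fit, so data is silently lost; otherwise `put` blocks, so the sender slows down to the consumer's pace and nothing is lost. The function name and parameters are illustrative; Heron implements backpressure inside its stream manager, not with Python queues.

```python
import queue
import threading
import time

_STOP = object()  # sentinel telling the consumer to exit


def run_pipeline(n_items, capacity, consumer_delay, drop_when_full):
    """Return (tuples received by the slow consumer, tuples dropped by the sender)."""
    q = queue.Queue(maxsize=capacity)
    received = []

    def consumer():
        while True:
            item = q.get()
            if item is _STOP:
                return
            time.sleep(consumer_delay)  # simulate a straggler
            received.append(item)

    t = threading.Thread(target=consumer)
    t.start()
    dropped = 0
    for i in range(n_items):
        if drop_when_full:
            try:
                q.put_nowait(i)  # never wait: discard when the buffer is full
            except queue.Full:
                dropped += 1
        else:
            q.put(i)  # blocks while the buffer is full: the sender slows down
    q.put(_STOP)  # blocking put, so the sentinel itself is never dropped
    t.join()
    return received, dropped


print(run_pipeline(50, 4, 0.001, drop_when_full=False)[1])  # nothing dropped
print(run_pipeline(50, 4, 0.001, drop_when_full=True)[1])   # many dropped
```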
BACKPRESSURE: Stream Manager
TCP backpressure
Spout-based backpressure
Stagewise backpressure
BACKPRESSURE - TCP: Stream Manager
Slows upstream and downstream instances
[Diagram: four Stream Managers, each serving a container with instances drawn from S1, B2, B3 and B4]
BACKPRESSURE - SPOUT: Stream Manager
[Diagram: four Stream Managers with bolt instances B2, B3 and B4; the spout instances S1 are drawn separately from their containers]
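A deterministic, tick-based sketch of spout-based backpressure with high and low watermarks: when the buffer in front of a slow instance reaches the high watermark, the spouts are told to stop emitting; once it drains to the low watermark, they resume. The rates, the watermark values and the single shared buffer are assumptions chosen for illustration, not Heron's configuration. The run shows the "self adjusts" property: the buffer stays bounded and the topology runs at the straggler's pace.

```python
def simulate(ticks, spouts=3, emit_per_spout=2, drain_per_tick=4, high=20, low=8):
    """Return per-tick (buffer size, spouts paused?) for a slow downstream bolt."""
    buffer = 0
    spouts_paused = False
    history = []
    for _ in range(ticks):
        if not spouts_paused:
            buffer += spouts * emit_per_spout  # 6 tuples/tick offered with the defaults
        buffer = max(0, buffer - drain_per_tick)  # straggler handles at most 4/tick
        if not spouts_paused and buffer >= high:
            spouts_paused = True   # "stop reading from local spouts", sent to every stream manager
        elif spouts_paused and buffer <= low:
            spouts_paused = False  # backlog cleared: spouts resume
        history.append((buffer, spouts_paused))
    return history


hist = simulate(100)
print(max(buf for buf, _ in hist), sum(paused for _, paused in hist))
```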
BACKPRESSURE: In Practice
IN MOST SCENARIOS BACK PRESSURE RECOVERS
Without any manual intervention
SOMETIMES USERS PREFER DROPPING DATA
They care only about the latest data
SUSTAINED BACK PRESSURE
Irrecoverable GC cycles, bad or faulty host
BACKPRESSURE: Advantages
PREDICTABILITY
Tuple failures are more deterministic
SELF ADJUSTS
Topology goes as fast as the slowest component
HERON: EXTENSIBLE STREAMING ENGINE
[Diagram: layered components: Hardware; Basic Inter/Intra IPC; Topology Master, Stream Manager, Instance, Metrics Manager (Scribe, Graphite); Scheduler; State Manager]
HERON: EXTENSIBLE STREAMING ENGINE
PLUG AND PLAY COMPONENTS
As the environment changes, the core does not change
MULTI-LANGUAGE INSTANCES
Support for multiple language APIs with native instances
MULTIPLE PROCESSING SEMANTICS
Efficient stream managers for each semantics
EASE OF DEVELOPMENT
Faster development of components with little dependency
OPTIMIZING HERON
REPEATED SERIALIZATION
Java objects → byte arrays → Protocol Buffers
EAGER DESERIALIZATION
Stream manager deserializes the entire tuple even though its full contents are not examined
IMMUTABILITY
Stream manager does not reuse any ProtoBuf objects
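The eager-deserialization point can be illustrated with a hypothetical wire format: a fixed-size header carrying the routing fields, followed by an opaque payload. A router that unpacks only the header can forward the message bytes untouched, and only the consuming instance pays for full decoding. The format and names (`HEADER`, `route`, `decode`) are assumptions; Heron's real messages are Protocol Buffers, and JSON stands in here only to keep the sketch in the standard library.

```python
import json
import struct

# Hypothetical header: stream id, destination task id, payload length (uint32 each).
HEADER = struct.Struct("!III")


def encode(stream_id, dest_task, record):
    payload = json.dumps(record).encode("utf-8")
    return HEADER.pack(stream_id, dest_task, len(payload)) + payload


def route(message, outboxes):
    """Forward by unpacking only the header; the payload stays opaque bytes."""
    _stream_id, dest_task, _length = HEADER.unpack_from(message)
    outboxes.setdefault(dest_task, []).append(message)  # forwarded as-is, no re-serialization
    return dest_task


def decode(message):
    """Full deserialization, done once by the consuming instance."""
    _stream_id, _dest_task, length = HEADER.unpack_from(message)
    return json.loads(message[HEADER.size:HEADER.size + length])


outboxes = {}
msg = encode(7, 2, {"user": "u1", "latency_ms": 900})
route(msg, outboxes)
print(decode(outboxes[2][0]))
```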
HERON: PERFORMANCE (At-most-once semantics)
[Chart: Throughput (million tuples/min) vs. spout parallelism (25, 100, 200), without and with optimizations]
[Chart: Throughput per core (million tuples/min) vs. spout parallelism (25, 100, 200), without and with optimizations]
HERON: PERFORMANCE (At-least-once semantics)
[Chart: Throughput (million tuples/min) vs. spout parallelism (25, 100, 200), without and with optimizations]
[Chart: Latency (ms) vs. spout parallelism (25, 100, 200), without and with optimizations]
HERON: PERFORMANCE (At-least-once semantics, impact of cache drain frequency)
[Chart: Throughput (million tuples/min) vs. cache drain frequency (ms), for spout parallelism 200, 100 and 25]
[Chart: Latency (ms) vs. cache drain frequency (ms), for spout parallelism 200, 100 and 25]
WE ARE HIRING!
Halbert Nakagawa, Co-Founder & CTO
Francois Orsini, CTO
Josh Lulewicz, Head of Data Platform
Karthik Ramasamy, Manager
QUESTIONS & ANSWERS
Go ahead. Don't hesitate.
READINGS
STORM @ TWITTER
A. Toshniwal et al., SIGMOD 2014.
TWITTER HERON: STREAM PROCESSING AT SCALE
S. Kulkarni et al., SIGMOD 2015.
STREAMING @ TWITTER
M. Fu et al., 2016.
TWITTER HERON: TOWARDS EXTENSIBLE STREAMING ENGINES
M. Fu et al., ICDE 2017.
READINGS
LIMIT THEOREMS FOR THE MEDIAN DEVIATION
P. Hall and A. H. Welsh, 1985.
ALTERNATIVES TO THE MEDIAN ABSOLUTE DEVIATION
P. J. Rousseeuw and C. Croux, 1993.
ASYMPTOTIC INDEPENDENCE OF MEDIAN AND MAD
M. Falk, 1997.
BAHADUR REPRESENTATIONS FOR THE MEDIAN ABSOLUTE DEVIATION AND ITS MODIFICATIONS
S. Mazumder and R. Serfling, 2009.
THE MINIMUM REGULARIZED COVARIANCE DETERMINANT ESTIMATOR
K. Boudt, P. J. Rousseeuw, S. Vanduffel and T. Verdonck, 2017.
THANK YOU
For your attention!

Serverless Streaming Architectures and Algorithms for the Enterprise
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
 
Model Serving via Pulsar Functions
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar Functions
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data Applications
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
 
Deep Learning for Time Series Data
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series Data
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architectures
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
 
Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ Twitter
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy service
 
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient FashionGimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
 
A Systematic Approach to Capacity Planning in the Real World
A Systematic Approach to Capacity Planning in the Real WorldA Systematic Approach to Capacity Planning in the Real World
A Systematic Approach to Capacity Planning in the Real World
 
Isolating Events from the Fail Whale
Isolating Events from the Fail WhaleIsolating Events from the Fail Whale
Isolating Events from the Fail Whale
 
Techniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud Footprint
 
A Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the CloudA Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the Cloud
 

Último

NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 

Último (20)

NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 

Anomaly detection in real-time data streams using Heron

  • 1. Anomaly Detection in Real-Time Data Streams Using Heron. Arun Kejariwal (MZ Research), Karthik Ramasamy (Twitter)
  • 2.
  • 3. DATA @ MZ: An Overview. GOW AND MOBILE STRIKE: peaked at 1M events/sec. MARKETING: serves >1B impressions/day worldwide; integrated with >150 distinct advertising channels. POTPOURRI: ~35B messages/day; writes: 20 TB/day.
  • 4. EXPLOSION IN DATA VELOCITY AND VOLUME. SENSORS: monitoring (smartwatches, refrigerators, wearables). ACTUATORS: automation (manufacturing, robotics). DRONES: expanding the scope (delivery, real estate, power transmission lines). MOBILE: life's remote control (personalization, productivity).
  • 5. ANOMALY DETECTION: WHY BOTHER? Manufacturing, health care, power grid, gas pipelines, security, operations, robotics, number of tweets per minute, digital marketing, connected cars.
  • 8. ANOMALY DETECTION: APPLICATION DOMAINS. Researched for >100 years: manufacturing, econometrics, networking, image processing, computer vision, (cyber) security, text mining, signal processing, finance, experimental social psychology, web operations, statistics (and time series analysis), data fidelity, astronomy.
  • 9. ANOMALY DETECTION: RECENT WORKS IN INDUSTRY. [Timeline: Jan '15, Mar '15, Aug '15, Nov '15, Nov '15, Aug '15, Jul '15, Jun '16.]
  • 10. WHY NOT USE OFF-THE-SHELF? Anomalies are CONTEXTUAL. False positive rate; false negative rate; scale (data granularity).
  • 11. MOSTLY UNSUPERVISED. Severity: different actions (page or not). Data characteristics: stationarity, normal distribution. Data fidelity: missing data, data corruption.
  • 12. DATA VISUALIZATION: not viable in practice.
  • 13. COMMONLY USED STATISTICS. MEAN AND STANDARD DEVIATION: the mean can be computed incrementally, but neither statistic is robust in the presence of anomalies. TRIMMED MEAN: robust in the presence of anomalies, but what should the trimming boundaries be, and how should small samples be handled? For asymmetric distributions, trimming results in a biased estimator. WINSORIZED MEAN. L-ESTIMATORS: linear combinations of order statistics.
  • 14. ROBUST STATISTICS. MEDIAN AND MEDIAN ABSOLUTE DEVIATION (MAD): robust in the presence of anomalies, but not amenable to incremental computation; use q-digest or t-digest instead. What if the MAD is zero, e.g., for a sample with many similar values? Alternatives: broadened median, M-estimators, Sn and Qn.
  • 15. MULTIPLE TIME SERIES: Methods. Analyzing each time series individually yields too many alerts that are not actionable, leading to alert fatigue. MINIMUM COVARIANCE DETERMINANT (MCD): proposed by Rousseeuw, 1984; uses the Mahalanobis distance [1]; FastMCD. [1] "On the generalised distance in statistics," P. C. Mahalanobis, 1936.
  • 16. MULTIPLE TIME SERIES: Other Methods. CORRELATION: direction and magnitude. An n×n correlation matrix? Bake in context; exploit topology.
  • 17. MULTIPLE TIME SERIES: Other Methods. CHALLENGES: susceptibility to anomalies, data skew, missing data, speed. TECHNIQUES: robust correlation, cross-correlation, intersection analysis. Trade-off between speed and accuracy.
  • 19. THE FLOW: RTplatform and Heron. [Diagram: live data, streaming computation, RTplatform.]
  • 20. RTplatform: a cloud-based platform built for connecting, processing, and reacting to live data. Extreme scale; high performance; unprecedented reliability; natively serverless.
  • 21. RTplatform: "real-time" has many definitions, each with different KPIs; often it means real-time results on data at rest, not on live data.
  • 22. RTplatform: a backbone for live data (live stream, bots). Free messaging for publishers and subscribers; filter, analyze, and transform messages in the live stream; notify on anomalies. MESSAGING: real-time pub/sub with ultra-low latency and high fanout. QUERYING: filter, analyze, and transform messages live, in-stream. BOTS: deploy rule-based bots for real-time anomaly detection/reaction.
  • 24. HERON
  • 25. HERON DESIGN GOALS. Task isolation: ease of debuggability, isolation, and profiling. Support for backpressure: topologies should be self-adjusting. Efficiency: reduce resource consumption. Off-the-shelf schedulers: unmanaged (Apache YARN/Mesos) and managed (Apache Aurora, Amazon ECS). Use of mainstream languages: C++, Java, and Python. Batching of tuples: amortizing the cost of transferring tuples.
  • 27. TOPOLOGY ARCHITECTURE. [Diagram: a Topology Master and a ZK cluster holding the logical plan, physical plan, and execution state; the physical plan is synced to the containers, each running a Stream Manager, a Metrics Manager, and instances I1 to I4.]
  • 28. STREAM MANAGER: Sample Topology. [Diagram: spout S1 and bolts B2, B3, B4.]
  • 29. HERON PHYSICAL EXECUTION. [Diagram: instances of S1, B2, B3, and B4 distributed across four Stream Managers.]
  • 30. BACKPRESSURE: Stragglers are the norm in multi-tenant distributed systems. Causes: bad host, execution skew, inadequate provisioning.
  • 31. BACKPRESSURE: Approaches to Handle Stragglers. Senders to the straggler drop data; detect stragglers and reschedule them; senders slow down to the speed of the straggler.
  • 32. BACKPRESSURE: Data Drop Strategy. Unpredictable; affects accuracy; poor visibility.
  • 33. BACKPRESSURE: Slow Down Sender. Handles temporary spikes; processes data at the maximum rate; provides predictability; reduces recovery times.
  • 34. BACKPRESSURE: Stream Manager. TCP backpressure; spout-based backpressure; stage-wise backpressure.
  • 35. BACKPRESSURE: TCP (Stream Manager). Slows both upstream and downstream instances. [Diagram: instances of S1, B2, B3, and B4 distributed across four Stream Managers.]
  • 36. BACKPRESSURE: Spout (Stream Manager). [Diagram: spout S1 instances and bolts B2, B3, B4 distributed across four Stream Managers.]
  • 37. BACKPRESSURE: In Practice. In most scenarios backpressure recovers without any manual intervention. Sometimes the user prefers dropping data because only the latest data matters. Sustained backpressure is irrecoverable: GC cycles, bad or faulty host.
  • 38. BACKPRESSURE: Advantages. PREDICTABILITY: tuple failures are more deterministic. SELF-ADJUSTS: the topology goes as fast as its slowest component.
  • 39. HERON: EXTENSIBLE STREAMING ENGINE. [Diagram: hardware; basic inter/intra IPC; Topology Master, Stream Manager, Instance, Metrics Manager; Scheduler and State Manager; Scribe, Graphite.]
  • 40. HERON: EXTENSIBLE STREAMING ENGINE. PLUG-AND-PLAY COMPONENTS: as the environment changes, the core does not change. MULTI-LANGUAGE INSTANCES: support multiple language APIs with native instances. MULTIPLE PROCESSING SEMANTICS: efficient stream managers for each semantics. EASE OF DEVELOPMENT: faster development of components with few dependencies.
  • 41. OPTIMIZING HERON. REPEATED SERIALIZATION: Java objects → byte arrays → protocol buffers. EAGER DESERIALIZATION: the stream manager deserializes the entire tuple even though its full contents are not examined. IMMUTABILITY: the stream manager does not reuse any ProtoBuf objects.
  • 42. HERON: PERFORMANCE, at-most-once semantics. [Charts: throughput (million tuples/min) and throughput per core (million tuples/min) versus spout parallelism (25, 100, 200), each comparing runs without and with optimizations.]
  • 43. HERON: PERFORMANCE, at-least-once semantics. [Charts: throughput (million tuples/min) and latency (ms) versus spout parallelism (25, 100, 200), each comparing runs without and with optimizations.]
  • 44. HERON: PERFORMANCE, at-least-once semantics: impact of cache drain frequency. [Charts: throughput (million tuples/min) and latency (ms) versus cache drain frequency (ms), for spout parallelism 25, 100, and 200.]
  • 45. WE ARE HIRING! Halbert Nakagawa, Co-Founder & CTO; Francois Orsini, CTO; Josh Lulewicz, Head of Data Platform; Karthik Ramasamy, Manager.
  • 46. QUESTIONS AND ANSWERS. Go ahead. Don't hesitate.
  • 47. READINGS. "Storm @ Twitter," A. Toshniwal et al., SIGMOD 2014. "Twitter Heron: Stream Processing at Scale," S. Kulkarni et al., SIGMOD 2015. "Streaming @ Twitter," M. Fu et al., 2016. "Twitter Heron: Towards Extensible Streaming Engines," M. Fu et al., ICDE 2017.
  • 48. READINGS. "Limit Theorems for the Median Deviation," P. Hall and A. H. Welsh, 1985. "Alternatives to the Median Absolute Deviation," P. J. Rousseeuw and C. Croux, 1993. "Asymptotic Independence of Median and MAD," M. Falk, 1997. "Bahadur Representations for the Median Absolute Deviation and Its Modifications," S. Mazumder and R. Serfling, 2009. "The Minimum Regularized Covariance Determinant Estimator," K. Boudt, P. J. Rousseeuw, S. Vanduffel, and T. Verdonck, 2017.
  • 49. THANK YOU for your attention!
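Slide 3 gives its volumes per day. Converting them into average per-second rates gives a feel for the sustained load. This assumes traffic is spread evenly over 86,400 seconds and that 1 TB = 10^12 bytes. Real traffic is bursty, so peak rates will be higher.

```latex
\begin{aligned}
\frac{35 \times 10^{9}\ \text{messages/day}}{86\,400\ \text{s/day}} &\approx 4.05 \times 10^{5}\ \text{messages/s} \\
\frac{20 \times 10^{12}\ \text{bytes/day}}{86\,400\ \text{s/day}} &\approx 2.31 \times 10^{8}\ \text{bytes/s} \approx 231\ \text{MB/s}
\end{aligned}
```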
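Slide 13 contrasts the incrementally computable mean with trimmed and winsorized means. The following is a minimal illustration, not code from the talk. `RunningStats` uses Welford's online algorithm. The trimming `proportion` is an arbitrary parameter you would have to choose, which is exactly the "trimming boundaries" question the slide raises.

```python
import math


class RunningStats:
    """Welford's online algorithm: updates the mean and variance one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; reported as 0.0 until two values have been seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def _trim_count(n, proportion):
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    return int(proportion * n)


def trimmed_mean(values, proportion):
    """Drop the k smallest and k largest values (k = floor(proportion * n)), average the rest."""
    xs = sorted(values)
    k = _trim_count(len(xs), proportion)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Clamp the k most extreme values on each side to the nearest retained value, then average."""
    xs = sorted(values)
    k = _trim_count(len(xs), proportion)
    lo, hi = xs[k], xs[len(xs) - 1 - k]
    return sum(min(max(x, lo), hi) for x in xs) / len(xs)
```

On the sample `[10, 11, 9, 10, 12, 10, 11, 9, 10, 500]`, the single spike drags the running mean to about 59.2. By contrast, `trimmed_mean(data, 0.1)` returns 10.375 and `winsorized_mean(data, 0.1)` returns 10.4.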
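Slide 14 recommends the median and MAD and asks what to do when the MAD is zero. What follows is a minimal batch sketch over one window of data:

- The 1.4826 factor makes the MAD consistent with the standard deviation for normally distributed data.
- The fallback to the mean absolute deviation around the median (scaled by sqrt(pi/2)) is our assumption. It is only one possible remedy and is not among the estimators the slide lists.
- A streaming version would replace the exact medians with q-digest or t-digest quantile estimates.

```python
import math
import statistics

MAD_TO_SIGMA = 1.4826  # makes the MAD consistent with sigma under normality
MEANAD_TO_SIGMA = math.sqrt(math.pi / 2)  # same idea for the mean absolute deviation


def robust_scale(values):
    """Return (median, scale), where scale is the normalized MAD when it is non-zero."""
    med = statistics.median(values)
    deviations = [abs(x - med) for x in values]
    mad = statistics.median(deviations)
    if mad > 0:
        return med, MAD_TO_SIGMA * mad
    # Assumed fallback: when at least half the values sit exactly at the median, the
    # MAD is 0, so use the mean absolute deviation around the median instead.
    return med, MEANAD_TO_SIGMA * statistics.fmean(deviations)


def robust_zscores(values):
    """Distance of each value from the median, in units of the robust scale."""
    med, scale = robust_scale(values)
    if scale == 0:  # every value is identical: nothing stands out
        return [0.0 for _ in values]
    return [(x - med) / scale for x in values]
```

With six of eight values equal to 5, as in `[5, 5, 5, 5, 5, 5, 7, 100]`, the MAD is 0. The fallback still gives 100 a large score while 7 stays near zero.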
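Slide 15 cites the Minimum Covariance Determinant (MCD) together with the Mahalanobis distance. For intuition only, the following computes the raw MCD exactly for two series. It tries every h-point subset and keeps the one whose covariance matrix has the smallest determinant, then measures every point's Mahalanobis distance from that robust fit.

This brute force is exponential in n. It also skips the consistency correction and reweighting steps of the full estimator. FastMCD exists precisely to avoid this search. The function names are illustrative.

```python
import itertools
import math


def mean_and_cov(points):
    """Sample mean vector and 2x2 covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def det2(c):
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]


def mahalanobis(p, mean, cov):
    """sqrt((p - mean)^T cov^-1 (p - mean)), using the closed-form 2x2 inverse."""
    d = det2(cov)
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # cov^-1 = (1/d) * [[syy, -sxy], [-sxy, sxx]]
    q = (cov[1][1] * dx * dx - 2 * cov[0][1] * dx * dy + cov[0][0] * dy * dy) / d
    return math.sqrt(q)


def exact_mcd(points, h):
    """Raw MCD by exhaustive search: the h-subset with the smallest covariance determinant."""
    best = None
    for subset in itertools.combinations(points, h):
        mean, cov = mean_and_cov(subset)
        d = det2(cov)
        if d > 0 and (best is None or d < best[0]):  # skip degenerate (collinear) subsets
            best = (d, mean, cov)
    if best is None:
        raise ValueError("every h-subset is degenerate")
    return best[1], best[2]
```

For nine points, `h = 6` follows the usual choice h = floor((n + p + 1) / 2) with p = 2. In the test data, the point (4.0, 9.0) breaks an otherwise tight relationship between the two series and receives the largest robust distance.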
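Slide 16 uses correlation for its direction (the sign) and magnitude (the absolute value). Because the data arrive as streams, a per-pair accumulator that updates in O(1) per sample is a natural fit. The class below is a sketch that uses a Welford-style co-moment update. An n×n correlation matrix would need one such accumulator per pair, n(n-1)/2 in total.

```python
import math


class StreamingCorrelation:
    """Pearson correlation of two series, updated one (x, y) pair at a time."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x
        self.m2_y = 0.0  # sum of squared deviations of y
        self.c_xy = 0.0  # co-moment: sum of (x - mean_x) * (y - mean_y)

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean of x
        self.mean_x += dx / self.n
        dy = y - self.mean_y          # deviation from the old mean of y
        self.mean_y += dy / self.n
        # Each term multiplies an old-mean deviation by a new-mean deviation.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    @property
    def r(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else 0.0
```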
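Slide 17 lists robust correlation and cross-correlation among the techniques. The slides do not name a specific method, so the following uses one possible instance: Spearman rank correlation. Because it correlates ranks rather than raw values, no single extreme value can dominate the result. The sketch also scans a range of lags to find where two series line up best. `best_lag` and its `max_lag` parameter are hypothetical names.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def spearman(xs, ys):
    """Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def best_lag(xs, ys, max_lag):
    """Lag L in [0, max_lag] maximizing spearman(xs[t], ys[t + L]), i.e. ys trailing xs by L."""
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = spearman(xs[:len(xs) - lag], ys[lag:])
    return max(scores, key=scores.get), scores
```

Take `xs = list(range(1, 11))` and a copy of it whose last value is replaced by 1000. `spearman` stays at 1.0 for this pair, while `pearson` drops to roughly 0.53.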
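Slide 22 describes rule-based bots for real-time anomaly detection and reaction. The slides do not show RTplatform's bot API, so `Rule` and `RuleBot` are hypothetical names for a generic pattern. A rule pairs a predicate on a message with a persistence requirement, here a run of consecutive matches. Requiring a run cuts down on alerts triggered by single noisy values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Fires once when `predicate` has held for `consecutive` messages in a row."""
    name: str
    predicate: Callable[[Dict], bool]
    consecutive: int = 1
    _streak: int = field(default=0, init=False, repr=False)

    def check(self, message: Dict) -> bool:
        self._streak = self._streak + 1 if self.predicate(message) else 0
        # Fire only on the message that completes the run, not on every later one.
        return self._streak == self.consecutive


class RuleBot:
    """Evaluates every rule against each incoming message and notifies on matches."""

    def __init__(self, rules: List[Rule], notify: Callable[[str, Dict], None]):
        self.rules = rules
        self.notify = notify

    def on_message(self, message: Dict) -> None:
        for rule in self.rules:
            if rule.check(message):
                self.notify(rule.name, message)
```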
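Slide 25 lists batching of tuples to amortize transfer cost, and slide 44 measures throughput and latency against cache drain frequency. The following shows a generic size-or-time flush pattern of the kind that creates that trade-off. `TupleBatcher`, `batch_size`, and `drain_interval_s` are hypothetical names, not Heron classes or configuration keys.

```python
import time


class TupleBatcher:
    """Buffers tuples and drains them in batches, when full or when an interval elapses."""

    def __init__(self, send_batch, batch_size=100, drain_interval_s=0.01, clock=time.monotonic):
        self.send_batch = send_batch          # callable receiving a list of tuples
        self.batch_size = batch_size
        self.drain_interval_s = drain_interval_s
        self.clock = clock                    # injectable so tests can control time
        self.buffer = []
        self.last_drain = clock()

    def add(self, tup):
        self.buffer.append(tup)
        if len(self.buffer) >= self.batch_size:
            self.drain()

    def tick(self):
        """Call periodically; drains a partial batch once the interval has elapsed."""
        if self.buffer and self.clock() - self.last_drain >= self.drain_interval_s:
            self.drain()

    def drain(self):
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []
        self.last_drain = self.clock()
```

Larger batches and longer drain intervals amortize more per-transfer overhead, but they hold tuples longer before sending. Passing a fake `clock` makes the timing behavior easy to test.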
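Slides 27 to 29 separate the logical plan (components and their parallelism) from the physical plan (which container runs which instance). A round-robin assignment is one simple way to derive the physical plan from the logical one. The function below is an illustrative sketch with a hypothetical name, not Heron's packing code, and it ignores per-instance resource requirements.

```python
from collections import defaultdict


def round_robin_plan(parallelism, num_containers):
    """Assign every component instance to a container in round-robin order.

    parallelism: component name -> instance count (what the logical plan fixes).
    Returns container id -> list of (component, instance index), i.e. a physical plan.
    """
    if num_containers < 1:
        raise ValueError("need at least one container")
    plan = defaultdict(list)
    slot = 0
    for component in sorted(parallelism):  # sorted for a deterministic layout
        for index in range(parallelism[component]):
            plan[slot % num_containers].append((component, index))
            slot += 1
    return dict(plan)
```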
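Slides 33 to 36 describe stream managers slowing senders down rather than dropping data. One common way to do this without rapidly toggling the spouts on and off is hysteresis between two thresholds. The Heron paper listed on slide 47 (Kulkarni et al., SIGMOD 2015) describes high and low water marks for spout backpressure. The class below is a minimal, hypothetical sketch of that idea, not Heron's implementation.

```python
class BackpressureMonitor:
    """Watermark hysteresis on a stream manager's pending-bytes buffer.

    Backpressure starts when the buffer exceeds the high watermark and stops only
    after it drains below the low watermark.
    """

    def __init__(self, high_watermark, low_watermark):
        if low_watermark >= high_watermark:
            raise ValueError("low_watermark must be below high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.pending = 0
        self.active = False  # True while the spouts should stop emitting

    def on_enqueue(self, nbytes):
        self.pending += nbytes
        if not self.active and self.pending > self.high:
            self.active = True   # a real system would signal the spouts to pause here
        return self.active

    def on_dequeue(self, nbytes):
        self.pending = max(0, self.pending - nbytes)
        if self.active and self.pending < self.low:
            self.active = False  # ... and to resume here
        return self.active
```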
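Slide 41 names eager deserialization as a cost: the stream manager decodes the entire tuple even though it does not examine all of it. The following illustrates the opposite, lazy approach on a hypothetical length-prefixed wire format (not Heron's protocol buffers). Only the routing key is decoded, and the original bytes are forwarded unchanged, so nothing is re-serialized.

```python
import struct
import zlib

# Hypothetical wire format: 4-byte big-endian key length, the key bytes, then an opaque payload.
_HEADER = struct.Struct(">I")


def encode(key: bytes, payload: bytes) -> bytes:
    return _HEADER.pack(len(key)) + key + payload


class LazyTuple:
    """Holds the raw bytes and decodes only the fields that are actually read."""

    __slots__ = ("_raw", "_key")

    def __init__(self, raw: bytes):
        self._raw = raw
        self._key = None  # decoded on first access, then cached

    @property
    def key(self) -> bytes:
        if self._key is None:
            (length,) = _HEADER.unpack_from(self._raw, 0)
            self._key = self._raw[_HEADER.size:_HEADER.size + length]
        return self._key

    @property
    def raw(self) -> bytes:
        # Forwarding reuses the original bytes, so nothing is re-serialized.
        return self._raw


def route(raw: bytes, num_tasks: int) -> int:
    """Choose a downstream task from the key alone; the payload is never parsed."""
    return zlib.crc32(LazyTuple(raw).key) % num_tasks
```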