Hotspot Detection in a Service Oriented
Architecture
Pranay Anchuri, anchupa@cs.rpi.edu,
http://cs.rpi.edu/~anchupa
Rensselaer Polytechnic Institute, Troy, NY
Roshan Sumbaly, roshan@coursera.org
Coursera, Mountain View, CA
Sam Shah, samshah@linkedin.com
LinkedIn, Mountain View, CA
Introduction
 Largest professional network.
 300M members from 200 countries.
 2 new members per second.
Service Oriented Architecture
What is a Hotspot?
 Hotspot: a service responsible for the suboptimal performance of a user-facing functionality.
 Performance measures:
 Latency
 Cost to serve
 Error rate
Who uses hotspot detection?
 Engineering teams:
 Minimize latency for the user.
 Increase the throughput of the servers.
 Operations teams:
 Reduce the cost of serving user requests.
Goal
Data - Service Call Graphs
 Service call metrics are logged into a central system.
 Call graph structure is reconstructed from a random trace ID (see the sketch below).
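As a rough sketch of this reconstruction (the record fields and grouping below are hypothetical, not LinkedIn's actual logging schema), records sharing a trace ID can be grouped and ordered back into per-request call graphs:

```python
from collections import defaultdict

# Hypothetical call record: each service logs (trace_id, caller, callee,
# start, end) to the central system. Grouping by trace_id recovers one
# call graph per request; sorting by start time recovers call order.
def build_call_graphs(records):
    by_trace = defaultdict(list)
    for rec in records:
        by_trace[rec["trace_id"]].append(rec)

    graphs = {}
    for trace_id, calls in by_trace.items():
        subcalls = defaultdict(list)  # caller service -> its outgoing calls
        for call in sorted(calls, key=lambda c: c["start"]):
            subcalls[call["caller"]].append(call)
        graphs[trace_id] = subcalls
    return graphs
```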
Example of Service Call Graph
[Figure: call graph of a "Read profile" request spanning Content Service, Context Service, a nested Content Service call, Entitlements, and Visibility, with numeric edge labels 3, 7, 12, 10, and 11.]
Challenges in mining hotspots
Structure of call graphs
 Structure of call graphs changes rapidly across requests:
 Depends on the member's attributes.
 A/B testing.
 Changes to the code base.
 Over 90% unique structures for the most requested services.
Asynchronous service calls
 Calls A→B and A→C can be (see the sketch below):
 Serial: C is called after B returns to A.
 Parallel: B and C are called at the same time or within a brief time span.
 Parallel service calls are particularly difficult to handle.
 Degree of parallelism ~ 20 for some services.
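One way to tell the two cases apart from logged timestamps is to check whether sibling call intervals overlap; this is a simplified sketch, not the paper's exact criterion, and the tolerance parameter is an assumption:

```python
# Classify two sibling calls A->B and A->C from their logged intervals.
# Overlapping (or near-overlapping) intervals indicate parallel calls;
# the tolerance for "a brief time span" is an assumed parameter.
def is_parallel(call_b, call_c, tolerance=0.0):
    b_start, b_end = call_b
    c_start, c_end = call_c
    # Serial means one call starts only after the other has returned.
    return not (c_start >= b_end + tolerance or b_start >= c_end + tolerance)

print(is_parallel((1.0, 2.0), (1.3, 1.6)))   # True: C runs while B runs
print(is_parallel((0.0, 3.0), (4.0, 11.0)))  # False: C starts after B ends
```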
Related Work
 Hu et al. [SIGCOMM 04, INFOCOM 05]
 Tools to detect bottlenecks along network paths.
 Mann et al. [USENIX 11]
 Models to estimate latency as a function of RPC latencies.
Why existing methods don't work?
 The metric cannot be controlled as in bottleneck detection algorithms.
 Analyzing millions of small networks.
 Parallel service calls.
Our approach
Optimize and summarize approach
● Given call graphs
● Hotspots in each call graph
● Ranking hotspots
What are the top-k hotspots in a call graph?
 Hotspots in a specific call graph, irrespective of other call graphs for the same type of request.
Key Idea
Which k services, if already optimized, would have led to the maximum reduction in the latency of the request?
(Specific to a particular call graph)
Quantifying impact of a service
 What if a service was optimized by θ? (think after the fact)
 Its internal computations are θ times faster.
 No effect on the overall latency if its parent is waiting on another service call to return.
Example
[Figure: a call-graph timeline with execution intervals [0,11], [0,3], [1,2], [1.3,1.6], [2.1,2.5], [4,11], [6,9], [7,8]; successive builds show the effect of a 2x speedup of one service on the overall latency. The tree is encoded below.]
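For concreteness, the example timeline can be encoded as a tree of execution intervals; the nesting below is one plausible reading of the figure, not taken verbatim from it:

```python
from dataclasses import dataclass, field

# One service invocation: an execution interval plus the subcalls it made.
@dataclass
class Call:
    start: float
    end: float
    children: list = field(default_factory=list)

# One plausible nesting of the intervals shown in the figure: the root
# request spans [0, 11] and fans out into nested subcalls.
root = Call(0, 11, [
    Call(0, 3, [Call(1, 2), Call(1.3, 1.6), Call(2.1, 2.5)]),
    Call(4, 11, [Call(6, 9, [Call(7, 8)])]),
])
```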
Local effect of optimization
 Latency: sum of computation and waiting times, $r(v) = \sum_{i=1}^{m} (c_i + w_i)$.
 Effect: shorter computation times and earlier subcalls. For a service $v$ optimized by a factor $\theta$, where $v_j$ is a subcall of $v$ issued after $n$ of $v$'s computation intervals (a sketch of these rules follows):
 1) $e'_v = e_v - (1 - \theta^{-1}) \sum_{i=1}^{m} c_i$
 2) $s'_{v_j} = s_{v_j} - (1 - \theta^{-1}) \sum_{i=1}^{n} c_i$
 3) $e'_{v_j} = e_{v_j} - (1 - \theta^{-1}) \sum_{i=1}^{n} c_i$
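A minimal sketch of these update rules, assuming computation-interval lengths are given explicitly (how they are extracted from the logged timeline is left out):

```python
# Direct implementation of the update rules: optimizing service v by a
# factor theta shrinks each of its computation intervals c_i by a factor
# (1 - 1/theta), pulling its end time and its subcalls' start/end times
# earlier.
def local_effect(e_v, comp, subcalls, theta):
    """e_v: end time of v; comp: lengths of v's computation intervals
    c_1..c_m; subcalls: list of (s_vj, e_vj, n) where n is the number of
    computation intervals of v completed before subcall v_j is issued."""
    saved = 1.0 - 1.0 / theta
    new_e_v = e_v - saved * sum(comp)            # rule (1)
    new_subcalls = [
        (s - saved * sum(comp[:n]),              # rule (2)
         e - saved * sum(comp[:n]),              # rule (3)
         n)
        for (s, e, n) in subcalls
    ]
    return new_e_v, new_subcalls

# A 2x speedup (theta=2) halves each computation interval.
print(local_effect(e_v=11.0, comp=[2.0, 3.0],
                   subcalls=[(4.0, 9.0, 1)], theta=2.0))
# (8.5, [(3.0, 8.0, 1)])
```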
Negative example
[Figure: the same interval timeline ([0,11], [0,3], [1,2], [1.3,1.6], [2.1,2.5], [4,11], [6,9], [7,8]); speeding up a service yields no end-to-end improvement when its parent is waiting on another call to return.]
Under the propagation assumption
 Computing the optimal k services is NP-hard.
 Reduction from a variation of the subset sum problem.
 Construction and proof in the paper.
Relaxation
 Variation of the propagation assumption that allows a service to propagate fractional effects to its parent.
 Leads to a greedy algorithm.
Greedy algorithm to compute top-k hotspots
 Given an optimization factor θ (see the sketch below):
 Repeatedly select the service that has the maximum impact on the frontend service.
 Update the times after each selection.
 Stop after k iterations.
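A sketch of the greedy loop; `impact` (latency reduction at the frontend if a service were optimized by θ) and `optimize` (graph with updated times after a selection) are hypothetical callbacks standing in for the propagation rules above:

```python
# Greedy top-k hotspot selection for one call graph.
def top_k_hotspots(graph, services, impact, optimize, k):
    hotspots = []
    for _ in range(k):
        remaining = [s for s in services if s not in hotspots]
        if not remaining:
            break
        # Pick the service whose optimization most reduces frontend latency.
        best = max(remaining, key=lambda s: impact(graph, s))
        hotspots.append(best)
        # Update the timeline so later picks account for earlier selections.
        graph = optimize(graph, best)
    return hotspots

# Toy usage with fixed per-service impacts and no time propagation.
impacts = {"A": 3.0, "B": 5.0, "C": 1.0}
print(top_k_hotspots({}, ["A", "B", "C"],
                     impact=lambda g, s: impacts[s],
                     optimize=lambda g, s: g,
                     k=2))  # ['B', 'A']
```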
Ranking hotspots
 Top-k services change significantly across different call graphs.
 Rank hotspots on (a frequency-ranking sketch follows):
 Frequency (itemset mining).
 Impact on the frontend service.
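A minimal sketch of the frequency side, counting how often each service appears among the per-graph top-k sets; the itemset mining used in the paper is richer than this plain support count:

```python
from collections import Counter

# Rank services by how often they appear in the per-call-graph top-k
# hotspot sets. A simple support count standing in for itemset mining.
def rank_by_frequency(per_graph_hotspots):
    support = Counter()
    for hotspots in per_graph_hotspots:
        support.update(set(hotspots))
    return support.most_common()

print(rank_by_frequency([["A", "B"], ["B", "C"], ["B", "A"]]))
# [('B', 3), ('A', 2), ('C', 1)]
```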
Rest of the paper
 Similar approach applied to the cost-of-request metric.
 Generalized framework for optimizing arbitrary metrics.
 Other ranking schemes.
Results
Dataset
Request
type
Avg # of
call graphs
per day*
Avg # of
service call
per
request
Avg # of
subcalls
per service
Max # of
parallel
subcalls
Home 10.2 M 16.90 1.88 9.02
Mailbox 3.33 M 23.31 1.9 8.88
Profile 3.14 M 17.31 1.86 11.04
Feed 1.75 M 16.29 1.87 8.97
* Scaled down by a constant factor
vs Baseline algorithm
Users of the system
Consistency over a time period
Conclusion
Conclusions
 Defined hotspots in service oriented architectures.
 Framework to mine hotspots w.r.t. various performance metrics.
 Experiments on real-world, large-scale datasets.
Thanks
Questions?
Editor's Notes
1. Alternative: k services with the maximum latency.
2. Baseline algorithm: the top-k hotspots in a call graph are the top-k services in decreasing order of their latency.