cikm14

Hotspot Detection in a Service-Oriented Architecture
Pranay Anchuri, anchupa@cs.rpi.edu,
http://cs.rpi.edu/~anchupa
Rensselaer Polytechnic Institute, Troy, NY
Roshan Sumbaly, roshan@coursera.org
Coursera, Mountain View, CA
Sam Shah, samshah@linkedin.com
LinkedIn, Mountain View, CA

www.rpi.edu
● LinkedIn: largest professional network.
● 300M members from 200 countries.
● 2 new members per second.

Service Oriented Architecture

What is a Hotspot?
● Hotspot: a service responsible for suboptimal performance of a user-facing functionality.
● Performance measures:
  ● Latency
  ● Cost to serve
  ● Error rate

Who uses hotspot detection?
● Engineering teams:
  ● Minimize latency for the user.
  ● Increase the throughput of the servers.
● Operations teams:
  ● Reduce the cost of serving user requests.

Data - Service Call Graphs
● Service call metrics are logged into a central system.
● The call graph structure is reconstructed from a random trace ID.
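
A minimal sketch of how call graphs might be rebuilt from centrally logged call records by grouping on the trace ID. The record fields (trace_id, caller, callee, start, end) are assumptions made for illustration and are not the logging schema described in the talk.

```python
from collections import defaultdict


def build_call_graphs(records):
    """Group logged calls by trace ID; return {trace_id: {caller: [callee, ...]}}."""
    graphs = defaultdict(lambda: defaultdict(list))
    # Sort by start time so each caller's subcalls are listed in call order.
    for rec in sorted(records, key=lambda r: r["start"]):
        graphs[rec["trace_id"]][rec["caller"]].append(rec["callee"])
    return {tid: dict(edges) for tid, edges in graphs.items()}


# Hypothetical log records from two interleaved requests.
records = [
    {"trace_id": "t1", "caller": "frontend", "callee": "profile", "start": 0.0, "end": 3.0},
    {"trace_id": "t2", "caller": "frontend", "callee": "feed", "start": 0.5, "end": 2.0},
    {"trace_id": "t1", "caller": "profile", "callee": "content", "start": 1.0, "end": 2.0},
    {"trace_id": "t1", "caller": "frontend", "callee": "ads", "start": 4.0, "end": 5.0},
]
graphs = build_call_graphs(records)
print(graphs["t1"])
```

The sketch keys nodes by service name, so it would merge repeated calls to the same service within one request; a real reconstruction would need a per-call identifier.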

Example of Service Call Graph
[Figure: call graph for a "Read profile" request. Node labels include Content Service (twice), Context Service, Entitlements, and Visibility; the numeric labels 3, 7, 12, 10, 11 appear in the figure.]

Challenges in mining hotspots

Structure of call graphs
● The structure of call graphs changes rapidly across requests.
  ● Depends on the member's attributes.
  ● A/B testing.
  ● Changes to the code base.
● Over 90% unique structures for the most requested services.
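
One way to measure how many structures are unique is to reduce each call tree to a canonical form and count the distinct forms. The sketch below is illustrative only: the (name, children) tree encoding is hypothetical, and treating sibling order as irrelevant is an assumption, not something the slides state.

```python
def canonical(tree):
    """Encode a call tree (name, [children]) as a nested tuple, ignoring sibling order."""
    name, children = tree
    return (name, tuple(sorted(canonical(child) for child in children)))


def unique_fraction(trees):
    """Fraction of call trees whose canonical structure is distinct."""
    return len({canonical(t) for t in trees}) / len(trees)


# Hypothetical call trees for the same request type.
trees = [
    ("frontend", [("profile", []), ("ads", [])]),
    ("frontend", [("ads", []), ("profile", [])]),  # same structure, different order
    ("frontend", [("profile", [("content", [])])]),
    ("frontend", [("profile", [])]),
]
print(unique_fraction(trees))
```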

Asynchronous service calls
● Calls A→B, A→C are:
  ● Serial: C is called after B returns to A.
  ● Parallel: B and C are called at the same time or within a brief time span.
● Parallel service calls are particularly difficult to handle.
● Degree of parallelism ~ 20 for some services.
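
The serial/parallel distinction can be sketched from the logged [start, end] intervals of two subcalls made by the same parent. The threshold eps for a "brief time span" and the rule that overlapping calls count as parallel are assumptions for illustration.

```python
def classify(b, c, eps=0.05):
    """Classify two sibling subcalls, each given as a (start, end) interval."""
    first, second = sorted([b, c])
    started_together = second[0] - first[0] <= eps  # issued within a brief span
    overlaps = second[0] < first[1]                 # issued before the first returned
    return "parallel" if started_together or overlaps else "serial"


print(classify((0.0, 3.0), (4.0, 11.0)))   # second call starts after the first returns
print(classify((1.0, 3.0), (1.02, 2.0)))   # both calls issued almost simultaneously
```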

Related Work
● Hu et al. [SIGCOMM 04, INFOCOM 05]
  ● Tools to detect bottlenecks along network paths.
● Mann et al. [USENIX 11]
  ● Models to estimate latency as a function of RPC latencies.

Why don't existing methods work?
● The metric cannot be controlled, unlike in bottleneck detection algorithms.
● Analyzing millions of small networks.
● Parallel service calls.

Optimize and summarize approach
● Given call graphs
● Hotspots in each call graph
● Ranking hotspots

What are the top-k hotspots in a call graph?
● Hotspots in a specific call graph, irrespective of other call graphs for the same type of request.

Key Idea
Which k services, had they already been optimized, would have led to the maximum reduction in the latency of the request?
(Specific to a particular call graph)

Quantifying impact of a service
● What if a service was optimized by θ? (think after the fact)
  ● Its internal computations are θ times faster.
  ● No effect on the overall latency if its parent is waiting on another service call to return.

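Example
[Figure: call graph annotated with [start, end] intervals: [0,11], [0,3], [1,2], [1.3,1.6], [2.1,2.5], [4,11], [6,9], [7,8]. One service is marked "2x faster" and the figure shows the effect of the 2x speedup.]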

Local effect of optimization
● Latency: sum of computation and waiting times.
  $r(v) = \sum_{i=1}^{m} (c_i + w_i)$
● Effect: shorter computation times and earlier subcalls.
  1) $e'(v) = e(v) - (1 - \theta^{-1}) \sum_{i=1}^{m} c_i$
  2) $s'(v_j) = s(v_j) - (1 - \theta^{-1}) \sum_{i=1}^{n} c_i$
  3) $e'(v_j) = e(v_j) - (1 - \theta^{-1}) \sum_{i=1}^{n} c_i$
$v$ is a service and $v_j$ is its subcall after $n$ computation intervals.
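
A small worked instance of the three update rules, assuming that s and e denote start and end times and that c_1..c_m are the computation intervals of service v. The numbers are made up for illustration.

```python
def saving(comp, theta, upto=None):
    """(1 - 1/theta) times the sum of the first `upto` computation intervals (all if None)."""
    return (1 - 1 / theta) * sum(comp[:upto])


comp = [1.0, 2.0, 1.0]  # hypothetical computation intervals c_1..c_3 of service v
theta = 2.0             # v's internal computations become 2x faster

# Rule 1: v ends earlier by the saving over all m computation intervals.
e_v = 11.0
new_e_v = e_v - saving(comp, theta)

# Rules 2 and 3: a subcall v_j issued after n = 1 computation intervals
# starts and ends earlier by the saving over the first n intervals.
s_j, e_j = 1.0, 3.0
shift = saving(comp, theta, upto=1)
new_s_j, new_e_j = s_j - shift, e_j - shift

print(new_e_v, new_s_j, new_e_j)
```

The subcall keeps its own duration; only its position in time moves, which is why rules 2 and 3 subtract the same amount.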

Negative example
[Figure: call graph annotated with [start, end] intervals: [0,11], [0,3], [1,2], [1.3,1.6], [2.1,2.5], [4,11], [6,9], [7,8].]

Under the propagation assumption
● Computing the optimal $k$ services is NP-hard.
  ● Reduction from a variation of the subset sum problem.
  ● Construction and proof are in the paper.

Relaxation
● A variation of the propagation assumption that allows a service to propagate fractional effects to its parent.
● Leads to a greedy algorithm.

Greedy algorithm to compute top-k hotspots
● Given an optimization factor θ:
  ● Repeatedly select the service that has maximum impact on the frontend service.
  ● Update the times after each selection.
  ● Stop after k iterations.
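
The greedy loop above can be sketched on a toy latency model. This is not the paper's propagation model: it assumes each service has a single computation time, that subcalls are either all serial (latencies add) or all parallel (the slowest dominates), and that service names are unique. The Service class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    compute: float                      # own computation time
    children: list = field(default_factory=list)
    parallel: bool = False              # True: subcalls run concurrently


def latency(node):
    """Own computation plus the sum (serial) or max (parallel) of subcall latencies."""
    subcalls = [latency(c) for c in node.children]
    if not subcalls:
        return node.compute
    return node.compute + (max(subcalls) if node.parallel else sum(subcalls))


def all_services(node):
    yield node
    for child in node.children:
        yield from all_services(child)


def greedy_top_k(root, k, theta):
    """Pick up to k services whose theta-times speedup most reduces root latency."""
    chosen = []
    for _ in range(k):
        base = latency(root)
        best, best_gain = None, 0.0
        for svc in all_services(root):
            if svc.name in chosen:
                continue
            old = svc.compute
            svc.compute = old / theta   # tentatively optimize this service
            gain = base - latency(root)
            svc.compute = old           # undo the tentative change
            if gain > best_gain:
                best, best_gain = svc, gain
        if best is None:                # no remaining service helps
            break
        best.compute /= theta           # commit: update times after the selection
        chosen.append(best.name)
    return chosen


db = Service("db", 2.0)
profile = Service("profile", 6.0, [db])
ads = Service("ads", 4.0)
root = Service("frontend", 1.0, [profile, ads], parallel=True)
hotspots = greedy_top_k(root, k=2, theta=2.0)
print(hotspots, latency(root))
```

In this example speeding up "ads" alone gains nothing, because the frontend is still waiting on the slower "profile" branch.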

Ranking hotspots
● The top-$k$ services change significantly across different call graphs.
● Rank hotspots on:
  ● Frequency (itemset mining)
  ● Impact on the frontend service.
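
Frequency-based ranking can be illustrated by counting how often services, and small sets of services, appear together in the per-call-graph top-k lists. This is a brute-force count standing in for a real itemset mining algorithm; the parameters max_size and min_support are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_hotspot_sets(top_k_lists, max_size=2, min_support=2):
    """Count service sets of up to max_size that co-occur in per-graph top-k lists."""
    counts = Counter()
    for hotspots in top_k_lists:
        items = sorted(set(hotspots))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    # Keep only sets seen in at least min_support call graphs, most frequent first.
    return [(s, c) for s, c in counts.most_common() if c >= min_support]


# Hypothetical top-2 hotspots from three call graphs of the same request type.
top_k_lists = [["profile", "db"], ["profile", "ads"], ["profile", "db"]]
ranked = frequent_hotspot_sets(top_k_lists)
print(ranked)
```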

Rest of the paper
● Similar approach applied to the cost-of-request metric.
● Generalized framework for optimizing arbitrary metrics.
● Other ranking schemes.

Dataset
Request type | Avg # of call graphs per day* | Avg # of service calls per request | Avg # of subcalls per service | Max # of parallel subcalls
Home         | 10.2 M                        | 16.90                              | 1.88                          | 9.02
Mailbox      | 3.33 M                        | 23.31                              | 1.9                           | 8.88
Profile      | 3.14 M                        | 17.31                              | 1.86                          | 11.04
Feed         | 1.75 M                        | 16.29                              | 1.87                          | 8.97
* Scaled down by a constant factor

vs Baseline algorithm

User of the system

Consistency over a time period

Conclusions
● Defined hotspots in service-oriented architectures.
● Framework to mine hotspots w.r.t. various performance metrics.
● Experiments on real-world, large-scale datasets.

Thanks
Questions?
