Practices of Presto & Alluxio in E-commerce Big Data

The Practice of Presto & Alluxio in
E-Commerce Big Data Platform
2020-12
Wenjun Tao, JD.com
Big Data Platform Engineer

Contents
1. JD BDP: Introduction of the JD.com BDP architecture
2. Practice with Presto in BDP: Introduction of Presto and its practice in JD BDP
3. Presto & Alluxio Stack: Our use case of Presto & Alluxio
4. Ongoing Exploration: The features we are exploring

JD BDP
Cluster scale
• Tens of thousands of nodes
• Thousands of users
Computing ability
• Tens of PB offline data daily
• Millions of jobs daily
Storage capacity
• Hundreds of PB data
• Tens of PB daily increase
Business scale
• Tens of business units
• Hundreds of data models

Our Work on Presto
01 Cluster Scaling
02 Job Isolation & ERP Authorization
03 Operation & Maintenance
04 Query Result Cache

Presto on YARN
• YARN: unified resource management
• Dynamic resource: Presto worker scaling
• Configuration: configure Presto in Web
• Plugin: load/unload plugins

PowerServer for operation and maintenance
Plugin manager
• export query result
• update plugin
Dynamic Configuration
• route query to cluster
• adjust resource group
ERP Authorization
• track users' queries
• security
Auto Maintenance
• dynamic auto-scale
• start/stop cluster

Application Scenario
Periodic Queries
◉ controllable data range
◉ high query frequency
◉ high data reuse rate
◉ high proportion
Unpredictable Queries
◉ controllable data range
◉ low query frequency
◉ low data reuse rate
◉ low proportion

Query Result Cache
• 0 calculation for workers
• Return query result quickly
• Relieve cluster pressure
1. Cache based on TTL
With so many users, many identical SQL statements arrive within the same time period. However, if users update a table, the cached result may become dirty.
2. Cache based on Hive MetaStore
From the last modification time of the metadata, we can judge whether the data has been updated and so determine whether the cache is still valid.
3. Query pre-execute
Even with the two cache conditions above, there is still room to improve the cache hit rate.
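The first two cache conditions can be illustrated with a minimal Python sketch. It is not JD's implementation: the class `QueryResultCache`, the `get_table_mtime` callback (standing in for a Hive MetaStore lookup of a table's last modification time), and the `execute` callback (standing in for running the query on Presto) are all hypothetical names. The sketch assumes identical SQL means identical text and that the caller supplies the list of tables a query reads.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class CacheEntry:
    rows: List[tuple]
    cached_at: float                # clock value when the result was stored
    table_mtimes: Dict[str, float]  # table modification times seen before execution


class QueryResultCache:
    """Hypothetical result cache combining a TTL with a metadata freshness check."""

    def __init__(self, ttl_seconds: float,
                 get_table_mtime: Callable[[str], float],
                 clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._get_table_mtime = get_table_mtime  # assumed metastore lookup
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    @staticmethod
    def _key(sql: str) -> str:
        # Assumption: only textually identical SQL shares a cache entry.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def _is_valid(self, entry: CacheEntry) -> bool:
        # Condition 1: the entry must be younger than the TTL.
        if self._clock() - entry.cached_at > self._ttl:
            return False
        # Condition 2: no table read by the query was modified since caching.
        return all(self._get_table_mtime(table) == mtime
                   for table, mtime in entry.table_mtimes.items())

    def run(self, sql: str, tables: Sequence[str],
            execute: Callable[[str], List[tuple]]) -> List[tuple]:
        key = self._key(sql)
        entry = self._entries.get(key)
        if entry is not None and self._is_valid(entry):
            return entry.rows  # cache hit: no work for the workers
        # Record modification times before executing, so an update that lands
        # during execution invalidates the entry on the next lookup.
        mtimes = {table: self._get_table_mtime(table) for table in tables}
        rows = execute(sql)
        self._entries[key] = CacheEntry(rows, self._clock(), mtimes)
        return rows
```

A caller would wrap each query as `cache.run(sql, tables, execute)`; repeated identical queries are then served from the cache until the TTL expires or a referenced table changes.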

Data Ecosystem with Alluxio
• Apps only talk to Alluxio
• Simple Add/Remove
• No App Changes
• Highest performance in Memory

Presto + Alluxio = Better Together
Higher query throughput
Consistent low query latency
Eliminates network traffic

JD Contribution to Alluxio
Business
Strategy
New Web UI: ui-grid based; sort/pagination/filter; add an input field
Watermark Evict Strategy: high watermark starts eviction; low watermark stops eviction
Cache Consistency: check at startup; check every time
JVM Pause Monitor: monitor JVM pause periodically; log message and metrics
Shell Command: cp/ls/load/rm/format
Bugfix: deadlock; add Thrift timeout; …
Change Log Level: shell; RESTful API
Test: SyncQuery; AlluxioTools; …

Watermark Evict Strategy
• Sync Evict Strategy
• Async Evict Strategy
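The watermark rule (start evicting at the high watermark, stop at the low watermark) can be sketched as follows. This is an illustrative sketch, not Alluxio's evictor: the class `WatermarkCache`, its parameters, and the choice of least-recently-used order for picking victims are assumptions made for the example.

```python
from collections import OrderedDict
from typing import List


class WatermarkCache:
    """Hypothetical storage tier that evicts between two watermarks."""

    def __init__(self, capacity_bytes: int,
                 high_watermark: float = 0.9, low_watermark: float = 0.7):
        if not 0 < low_watermark < high_watermark <= 1:
            raise ValueError("require 0 < low_watermark < high_watermark <= 1")
        self.capacity = capacity_bytes
        self.high = high_watermark
        self.low = low_watermark
        self.blocks = OrderedDict()  # block_id -> size, least recently used first
        self.used = 0

    def touch(self, block_id: str) -> None:
        # Mark a block as recently used (assumed LRU ordering).
        self.blocks.move_to_end(block_id)

    def put(self, block_id: str, size: int) -> List[str]:
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        return self.evict_if_needed()

    def evict_if_needed(self) -> List[str]:
        evicted: List[str] = []
        # Eviction starts only once usage reaches the high watermark ...
        if self.used < self.high * self.capacity:
            return evicted
        # ... and continues until usage drops to the low watermark.
        while self.blocks and self.used > self.low * self.capacity:
            block_id, size = self.blocks.popitem(last=False)
            self.used -= size
            evicted.append(block_id)
        return evicted
```

Here eviction runs inline inside `put`, which resembles a synchronous arrangement; calling `evict_if_needed` from a background thread instead would be one way to make it asynchronous.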

Cache Consistency
Keep Alluxio & HDFS Consistency
To ensure that dirty data is not read, the Alluxio master checks file cache consistency. There are three ways to trigger the file consistency check:
• RPC API: a client requests metadata through getFileId, getFileInfo, listStatus, etc.
• RESTful API: calling reloadMetaData triggers Alluxio to reload all metadata
• Alluxio Master startup: file cache consistency is checked while the master starts up
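The three triggers can be pictured with a small sketch. The slides do not say how a stale file is detected, so the comparison used here (file length plus modification time in the under store) is an assumption, and `FileStatus`, `ConsistencyChecker`, and the `ufs_status` callback are hypothetical names rather than Alluxio APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FileStatus:
    length: int
    mtime: int


class ConsistencyChecker:
    """Hypothetical master-side check of cached metadata against the under store."""

    def __init__(self, ufs_status: Callable[[str], Optional[FileStatus]]):
        self._ufs_status = ufs_status          # assumed lookup of the HDFS file status
        self._cached: Dict[str, FileStatus] = {}

    def check(self, path: str) -> bool:
        """Return True if the cached entry matched the under store; refresh it otherwise."""
        current = self._ufs_status(path)
        cached = self._cached.get(path)
        if current is None:
            # The file no longer exists in the under store: drop the cached entry.
            self._cached.pop(path, None)
            return cached is None
        if cached == current:
            return True
        self._cached[path] = current  # stale or missing: reload the metadata
        return False

    def get_file_info(self, path: str) -> Optional[FileStatus]:
        # Trigger 1: a metadata request checks the file before answering.
        self.check(path)
        return self._cached.get(path)

    def reload_all(self) -> List[str]:
        # Triggers 2 and 3: an explicit reload call or master startup
        # re-checks every cached path and reports the ones that were stale.
        return [path for path in list(self._cached) if not self.check(path)]
```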

Presto on Alluxio
Why Presto on Alluxio?
High Performance
Consistent Low Query Latency
Eliminate Network Traffic
Others: Fault-tolerant & Pluggable
When we adopted Alluxio for Presto, we made some changes and added some useful features:
• Alluxio led to a 10x performance improvement
• Hundreds of nodes
• More than 3 years in the production environment

Presto Exploration
Presto Master Load Balancing
As the amount of data grows, the cluster size becomes larger, and the number of query tasks increases, the Master will become a performance bottleneck. Improving Presto to achieve load balancing will be a challenge.
Thread Level Resource Isolation
The execution tasks running on the workers compete for resources, especially jobs in the test phase. If we can restrict the execution tasks with CGroups, it will reduce the mutual impact among queries.
Unify Larger Clusters
Large-scale clusters help improve resource utilization. In the past year, we have reduced the number of clusters from more than 100 to 20. While maintaining query efficiency, we will further increase the cluster size to reduce the number of clusters.

Alluxio Exploration
Exploring more application scenarios
Storing MapReduce/Spark shuffle data in Alluxio, to reduce disk storage pressure and speed up access to shuffle data.
Porting HDFS Authentication to Alluxio
We are going to port the custom authentication on our HDFS to Alluxio.
HDFS RBF or Alluxio
We have tried HDFS router-based federation, but its performance does not meet our online requirements. We found that Alluxio also has forwarding capabilities and hope that Alluxio will perform better. That is what we are working on.
