This document discusses JD.com's use of Presto and Alluxio in their big data platform (BDP) architecture. It provides an introduction to Presto and how JD.com uses it in their BDP, including scaling Presto on YARN and using PowerServer for operations and maintenance. It also discusses how Presto and Alluxio are used together to improve query performance through caching and eliminating network traffic. Finally, it outlines ongoing explorations around improving Presto and Alluxio, such as load balancing, resource isolation, supporting larger clusters, and porting HDFS authentication to Alluxio.
Practices of Presto & Alluxio in E-commerce Big Data
1. The Practice of Presto & Alluxio in
E-Commerce Big Data Platform
2020-12
Wenjun Tao, JD.com
Big Data Platform Engineer
2. Contents
(1) JD BDP: Introduction to the JD.com BDP architecture
(2) Practice with Presto in BDP: Introduction to Presto and its practice in JD BDP
(3) Presto & Alluxio Stack: Our use case of Presto & Alluxio
(4) Ongoing Exploration: The features we are exploring
4. JD BDP
Cluster scale
• Tens of thousands of nodes
• Thousands of users
Computing ability
• Tens of PB of offline data daily
• Millions of jobs daily
Storage capacity
• Hundreds of PB of data
• Tens of PB daily increase
Business scale
• Tens of business units
• Hundreds of data models
12. Application Scenario
Periodic Queries
◉ controllable data range
◉ high query frequency
◉ high data reuse rate
◉ high proportion
Unpredictable Queries
◉ controllable data range
◉ low query frequency
◉ low data reuse rate
◉ low proportion
13. Query Result Cache
• Zero computation for workers
• Query results are returned quickly
• Relieves cluster pressure
(1) Cache based on TTL
With so many users, many identical SQL queries arrive in the same time period. However, if a user updates a table, the cached result may become dirty.
(2) Cache based on Hive MetaStore
From the last modification time of the metadata, we can judge whether the data has been updated, and so determine whether the cache is still valid.
(3) Query pre-execute
Even with the two cache conditions above, there is still room to improve the cache hit rate.
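The first two cache conditions can be combined roughly as follows. This is a minimal sketch, not JD.com's implementation: the class `QueryResultCache`, the callback `get_table_mtime` (standing in for a lookup of a table's last modification time in the Hive MetaStore), and the injectable `clock` are all hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Optional


@dataclass
class CacheEntry:
    result: Any
    cached_at: float                 # time the result was stored
    table_mtimes: Dict[str, float]   # table -> last modification time when cached


class QueryResultCache:
    """Hypothetical result cache: a hit needs a live TTL and unchanged tables."""

    def __init__(self, ttl_seconds: float,
                 get_table_mtime: Callable[[str], float],
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds
        self.get_table_mtime = get_table_mtime  # assumed metastore lookup
        self.clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, sql: str, tables: Iterable[str], result: Any) -> None:
        # Remember each table's modification time alongside the result.
        mtimes = {t: self.get_table_mtime(t) for t in tables}
        self._entries[sql] = CacheEntry(result, self.clock(), mtimes)

    def get(self, sql: str) -> Optional[Any]:
        entry = self._entries.get(sql)
        if entry is None:
            return None
        expired = self.clock() - entry.cached_at > self.ttl_seconds
        # Dirty if any referenced table changed since the result was cached.
        dirty = any(self.get_table_mtime(t) != m
                    for t, m in entry.table_mtimes.items())
        if expired or dirty:
            del self._entries[sql]
            return None
        return entry.result
```

In this sketch the cache key is the raw SQL text, so only byte-identical queries hit; a real system would likely normalize the query first. A cached result of `None` is indistinguishable from a miss here, which is acceptable for a sketch.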
18. JD Contribution to Alluxio
Business Strategy
New Web UI
• ui-grid based
• sort/pagination/filter
• add an input field
Watermark Evict Strategy
• start evicting at the high watermark
• stop evicting at the low watermark
Cache Consistency
• check at startup
• check every time
JVM Pause Monitor
• monitor JVM pauses periodically
• log messages and metrics
Shell Command
• cp/ls/load/rm/format
Bugfix
• Deadlock
• thrift: add a timeout
• …
Change Log Level
• shell
• RESTful API
Test
• SyncQuery
• AlluxioTools
• …
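The watermark evict strategy on this slide can be illustrated with a small sketch. It assumes byte-based thresholds and an oldest-inserted-first eviction order; the slide specifies neither, and the class `WatermarkCache` and its parameters are hypothetical names, not Alluxio APIs.

```python
from collections import OrderedDict
from typing import List


class WatermarkCache:
    """Hypothetical block cache: eviction starts above the high watermark
    and continues until usage is at or below the low watermark."""

    def __init__(self, high_bytes: int, low_bytes: int) -> None:
        if not 0 <= low_bytes < high_bytes:
            raise ValueError("require 0 <= low_bytes < high_bytes")
        self.high_bytes = high_bytes
        self.low_bytes = low_bytes
        self.used_bytes = 0
        self._blocks: "OrderedDict[str, int]" = OrderedDict()  # id -> size

    def put(self, block_id: str, size: int) -> None:
        # Re-inserting a block replaces its old size and makes it newest.
        if block_id in self._blocks:
            self.used_bytes -= self._blocks.pop(block_id)
        self._blocks[block_id] = size
        self.used_bytes += size
        if self.used_bytes > self.high_bytes:
            self._evict_to_low_watermark()

    def _evict_to_low_watermark(self) -> None:
        # Assumed order: evict the oldest-inserted block first.
        while self._blocks and self.used_bytes > self.low_bytes:
            _, size = self._blocks.popitem(last=False)
            self.used_bytes -= size

    def block_ids(self) -> List[str]:
        return list(self._blocks)
```

The gap between the two watermarks means one eviction pass frees a batch of space, so writes that arrive just after a pass do not each trigger their own eviction.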
20. Cache Consistency
Keep Alluxio & HDFS Consistency
To ensure that dirty data is not read, there are three ways to trigger a file consistency check.
RPC API
When a client requests metadata through getFileId, getFileInfo, listStatus, etc., the Alluxio master checks file cache consistency.
RESTful API
Calling reloadMetaData triggers Alluxio to reload all metadata.
Alluxio Master startup
File cache consistency is checked while the master starts up.
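The slide does not say how a single file is judged consistent. One plausible approach, sketched below purely as an assumption, is to compare the length and modification time recorded when the file was cached with the current status in HDFS, and to drop the cached entry on a mismatch. `FileStatus`, `is_cache_consistent`, and `check_on_metadata_request` are hypothetical names, not Alluxio or HDFS APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FileStatus:
    length: int
    modification_time_ms: int


def is_cache_consistent(cached: FileStatus,
                        in_hdfs: Optional[FileStatus]) -> bool:
    """Assumed rule: consistent only if the file still exists in HDFS
    with the same length and modification time."""
    return in_hdfs is not None and cached == in_hdfs


def check_on_metadata_request(
        path: str,
        cached_meta: Dict[str, FileStatus],
        hdfs_status: Callable[[str], Optional[FileStatus]]) -> bool:
    """Run the check for one path; drop the cached entry if it is dirty.

    Returns True if a consistent cached entry remains, False otherwise.
    """
    cached = cached_meta.get(path)
    if cached is None:
        return False
    if is_cache_consistent(cached, hdfs_status(path)):
        return True
    del cached_meta[path]  # force a reload from HDFS on the next read
    return False
```

The same check function could be applied to every cached path at master startup or on a full metadata reload, which corresponds to the other two triggers listed above.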
21. Presto on Alluxio
Why Presto on Alluxio?
High Performance
Consistent Low Query Latency
Eliminate Network Traffic
Others: Fault-tolerant & Pluggable
When we adopted Alluxio for Presto, we made some changes and added some useful features.
• Alluxio led to a 10x performance improvement
• Hundreds of nodes
• More than 3 years in the production environment
28. Presto Exploration
Presto Master Load Balancing
As the amount of data grows, the cluster size increases, and the number of query tasks rises, the Master will become a performance bottleneck. Improving Presto to achieve load balancing will be a challenge.
Thread Level Resource Isolation
The execution tasks running on the workers compete for resources, especially jobs in the test phase. If we can restrict the execution tasks with CGroups, it will reduce the mutual impact among queries.
Unify Larger Clusters
Large-scale clusters help improve resource utilization. In the past year, we have reduced the number of clusters from more than 100 to 20. While maintaining query efficiency, we will further increase the cluster size to reduce the number of clusters.
29. Alluxio Exploration
Exploring more application scenarios
Store MapReduce/Spark shuffle data in Alluxio to reduce disk storage pressure and speed up access to shuffle data.
Porting HDFS Authentication to Alluxio
We are going to port the custom authentication on our HDFS to Alluxio.
HDFS RBF or Alluxio
We have tried HDFS router-based federation (RBF), but its performance does not meet our online requirements. We found that Alluxio also has forwarding capabilities, and we hope that Alluxio will perform better. That is what we are working on.