Alluxio Product School Webinar
Nov. 17, 2022
For more Alluxio Events: https://www.alluxio.io/events/
Speaker: Adit Madan (Director of Product Management, Alluxio)
In this webinar, Adit Madan, Director of Product Management at Alluxio, will highlight new features, enhanced manageability, and improved security and performance in the Alluxio 2.9 release.
2. Strong Demand For Simplification
• APPLICATION PORTABILITY: Agility across private, hybrid, or multi-cloud environments
• DATA ACCESS ACROSS SILOS: Efficiently serve data to analytics & AI
• MULTI-TENANT ISOLATION: Isolation when expanding to new apps
3. MEET ALLUXIO
Data API, Data Cache, Data Tiering
• Application interfaces: HDFS Interface, S3 Interface, POSIX Interface, Java File API
• Storage drivers: HDFS Driver, Azure Driver, GCS Driver, S3 Driver
4. Example Solution Architecture with Alluxio
MULTI-CLOUD ANALYTICS & AI PLATFORM
(Diagram labels: Datacenter 1, Datacenter 2, Hive)
Unified Namespace
• Application portability across compute & cloud. For example, SQL applications running in one environment can be migrated to a different compute engine in another environment.
• No-copy data access across silos, for multiple instances of compute, whether ephemeral or elastic.
5. Whatʼs New in Alluxio 2.9
Multi-tenant & Multi-Environment Consistency
1. Cross-cluster Synchronization
Resource Efficiency
2. Paging Storage on Workers (Preview)
Application Portability
3. Flexible Access Control with S3 Data API
Efficient Operations
4. Kubernetes Operator
5. Master Health Status
ALLUXIO 5
6. Cross-cluster Synchronization
SCENARIO 1: MULTI-TENANT ISOLATION (new in 2.9)
Enables
• Isolation across tenants for agility
How
• Satellite Alluxio clusters as a popular way of splitting resources across tenants
Whatʼs New
• Scalable metadata synchronization across multiple consumers
• Reduced synchronization load on data lake storage with each new tenant
(Diagram: Tenants 1, 2, and 3, each with its own Alluxio cluster, kept in sync across any environment.)
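To make the reduced-synchronization-load point concrete, here is a toy Python model. It is an assumption-laden sketch, not Alluxio's implementation: `SharedStore`, `Cluster`, and `connect` are hypothetical names. A cluster that writes through to the shared bucket notifies its peers, which drop only the affected metadata entry, so repeated reads on any tenant are served from cache instead of re-querying the bucket.

```python
"""Toy model of cross-cluster metadata invalidation (illustrative only).

Assumption: this is NOT Alluxio's implementation. Writers notify peer
clusters so peers invalidate one cached entry instead of re-listing storage.
"""
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Stands in for the shared data lake bucket; counts metadata lookups."""
    files: dict = field(default_factory=dict)
    metadata_calls: int = 0

    def stat(self, path):
        self.metadata_calls += 1
        return self.files.get(path)


@dataclass
class Cluster:
    """A hypothetical tenant cluster with a local metadata cache."""
    name: str
    store: SharedStore
    peers: list = field(default_factory=list, repr=False)
    cache: dict = field(default_factory=dict)

    def write(self, path, data):
        # Write through to shared storage (like CACHE_THROUGH), then notify peers.
        self.store.files[path] = data
        self.cache[path] = data
        for peer in self.peers:
            peer.invalidate(path)

    def invalidate(self, path):
        # Drop only the stale entry; all other cached metadata stays valid.
        self.cache.pop(path, None)

    def read(self, path):
        # Only a cache miss reaches the shared store.
        if path not in self.cache:
            self.cache[path] = self.store.stat(path)
        return self.cache[path]


def connect(*clusters):
    """Make every cluster a peer of every other cluster."""
    for c in clusters:
        c.peers = [p for p in clusters if p is not c]
```

For example, after C1 writes `/a`, the first read of `/a` on C2 costs one lookup against the shared store, and later reads cost none until some cluster writes `/a` again.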
7. Cross-cluster Synchronization
SCENARIO 2: MULTI-ENVIRONMENT CONSISTENCY (new in 2.9)
Enables
• Platform elasticity across environments
How
• Port compute clusters across environments for additional capacity
Whatʼs New
• Consistent view of data across environments
(Diagram: Alluxio clusters in Env 1 and Env 2, kept in sync.)
8. Cross-cluster Synchronization
GETTING STARTED (new in 2.9)
Scenario
• Two Alluxio clusters and a shared, synchronized S3 bucket
Cluster Membership Master (new stateless process)
alluxio-start.sh cross_cluster_master
Cluster 1
alluxio.master.cross.cluster.id=C1
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
Cluster 2
alluxio.master.cross.cluster.id=C2
alluxio.master.cross.cluster.rpc.addresses=master-hostname1:20009
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.mount.table.root.cross.cluster=true
alluxio.master.mount.table.root.ufs=s3://my-bucket
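In the two-cluster example, the settings differ only in alluxio.master.cross.cluster.id; both clusters point alluxio.master.cross.cluster.rpc.addresses at the same address (master-hostname1:20009), presumably where the Cluster Membership Master runs. A small script can render each cluster's properties from one shared template. The property keys and values are copied from the slide; the helper `render_site_properties` is hypothetical.

```python
"""Render per-cluster properties for the two-cluster example.

Property keys and values come from the slide; the helper is hypothetical.
"""

# Settings shared by every cluster that syncs against the same bucket.
COMMON = {
    "alluxio.master.cross.cluster.rpc.addresses": "master-hostname1:20009",
    "alluxio.user.file.writetype.default": "CACHE_THROUGH",
    "alluxio.master.mount.table.root.cross.cluster": "true",
    "alluxio.master.mount.table.root.ufs": "s3://my-bucket",
}


def render_site_properties(cluster_id: str) -> str:
    """Return Java-properties text for one cluster (hypothetical helper)."""
    # The cluster ID is the only per-cluster value in this scenario.
    props = {"alluxio.master.cross.cluster.id": cluster_id, **COMMON}
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    for cid in ("C1", "C2"):
        print(f"# Cluster {cid}")
        print(render_site_properties(cid))
```

Keeping the shared settings in one place means a third tenant needs only a new cluster ID, not another hand-edited copy of the file.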
9. Paging Storage (Preview)
RESOURCE EFFICIENCY (new in 2.9)
Enables
• Reduced infrastructure costs
How
• Fine-grained, page-level (e.g. 1 MB) storage representation for caching on Alluxio workers, as an alternative to block-based storage (e.g. 64 MB)
What to Expect
• Lower space utilization, with reduced amplification of data in the cache
• Depending on the workload, amplification can be brought down from 3x (block-based storage) to 1.3x (with paging)
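Why smaller units reduce amplification can be shown with simple arithmetic. The Python sketch below uses an assumed, simplified model (not Alluxio's accounting): every read caches each fixed-size unit it overlaps, and amplification is cached bytes divided by requested bytes. The function names are hypothetical.

```python
"""Estimate cache space amplification for block- vs page-granular caching.

Simplified model (an assumption): each read caches every unit it overlaps;
amplification = cached bytes / requested bytes.
"""

MB = 1024 * 1024


def cached_units(reads, unit_size):
    """Indexes of every fixed-size unit overlapped by the (offset, length) reads."""
    units = set()
    for offset, length in reads:
        first = offset // unit_size
        last = (offset + length - 1) // unit_size
        units.update(range(first, last + 1))
    return units


def cached_bytes(reads, unit_size, file_size):
    """Bytes held in cache when every overlapped unit is stored whole."""
    total = 0
    for idx in cached_units(reads, unit_size):
        start = idx * unit_size
        total += min(unit_size, file_size - start)  # the file's last unit may be short
    return total


def amplification(reads, unit_size, file_size):
    """Cached bytes / requested bytes (assumes the reads do not overlap)."""
    requested = sum(length for _, length in reads)
    return cached_bytes(reads, unit_size, file_size) / requested


if __name__ == "__main__":
    file_size = 256 * MB
    # Four 2 MB reads scattered across one 256 MB file.
    reads = [(0, 2 * MB), (70 * MB, 2 * MB), (140 * MB, 2 * MB), (200 * MB, 2 * MB)]
    for label, unit in (("64 MB blocks", 64 * MB), ("1 MB pages", 1 * MB)):
        print(f"{label}: {amplification(reads, unit, file_size):.1f}x")
```

With these aligned, scattered reads the model gives 32x for 64 MB blocks and 1x for 1 MB pages. That is a deliberately extreme case; the 3x and 1.3x figures on the slide depend on the actual workload.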
10. Flexible Access Control with S3 Data API
APPLICATION PORTABILITY (new in 2.9)
Enables
• A secure data API across data sources
How
• A northbound S3 API for a variety of applications and compute engines
Whatʼs New
• Open authentication protocol to integrate with environment-agnostic identity management systems (such as PingFederate)
• Example: user- and resource-attribute-based access control for more flexible policies, using Open Policy Agent (OPA)
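Attribute-based access control decides each request from attributes of the user and of the resource rather than from a fixed per-user list. The slide names OPA as the example policy engine, where such rules would be written in Rego; the plain-Python sketch below only illustrates the decision logic. The attribute names (department, clearance, owner_dept, sensitivity) and the policy itself are hypothetical. `s3:GetObject` and `s3:PutObject` are standard S3 action names.

```python
"""Attribute-based access control (ABAC) decision, sketched in plain Python.

Assumption: illustrates the idea only; a real deployment would express the
policy in OPA (Rego). All attribute names below are hypothetical.
"""
from dataclasses import dataclass

# Ordered sensitivity / clearance levels (hypothetical).
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class User:
    name: str
    department: str
    clearance: str  # one of LEVELS


@dataclass(frozen=True)
class Resource:
    bucket: str
    key: str
    owner_dept: str
    sensitivity: str  # one of LEVELS


def allow(user: User, action: str, resource: Resource) -> bool:
    """Reads need sufficient clearance; writes also need the owning department."""
    cleared = LEVELS[user.clearance] >= LEVELS[resource.sensitivity]
    if action == "s3:GetObject":
        return cleared
    if action == "s3:PutObject":
        return cleared and user.department == resource.owner_dept
    return False  # default deny for any other action
```

Because the rule refers to attributes, onboarding a new user or bucket only means tagging it, not editing the policy.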
11. Kubernetes Operator
EFFICIENT OPERATIONS (new in 2.9)
Enables
• Reduced operations costs
How
• A Kubernetes CRD as the way to manage multiple Alluxio clusters
Whatʼs New
• Phase 1: parity with the Alluxio Helm charts, plus logging of resource usage across clusters
(Diagram: Tenants 1, 2, and 3, each running an Alluxio cluster in Namespace 1, 2, and 3 respectively, kept in sync.)
12. Master Health Status
EFFICIENT OPERATIONS (new in 2.9)
What
• New metric: master.system.status
How
• Infer the overall state by inspecting a combination of resource usage and critical internal data structures
Possible Statuses
• IDLE
• ACTIVE
• STRESSED
• OVERLOADED
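Combining several signals into one status can be pictured as a rule that reports the most severe level any signal triggers. In the Python sketch below, the four statuses come from the slide, but the inputs (RPC queue length standing in for an internal data structure, heap usage, CPU load), the thresholds, and the name `classify` are hypothetical; this is not how Alluxio computes master.system.status.

```python
"""Toy classifier mapping master signals to a health status.

Assumption: statuses are from the slide; inputs and thresholds are
hypothetical and do not reflect Alluxio's actual logic.
"""
from enum import Enum


class SystemStatus(Enum):
    IDLE = "IDLE"
    ACTIVE = "ACTIVE"
    STRESSED = "STRESSED"
    OVERLOADED = "OVERLOADED"


def classify(rpc_queue_len: int, heap_used_ratio: float, cpu_load: float) -> SystemStatus:
    """Return the most severe status that any signal triggers (hypothetical thresholds)."""
    # Check from most to least severe so the worst signal wins.
    if heap_used_ratio >= 0.95 or rpc_queue_len >= 1000:
        return SystemStatus.OVERLOADED
    if heap_used_ratio >= 0.80 or cpu_load >= 0.80 or rpc_queue_len >= 100:
        return SystemStatus.STRESSED
    if cpu_load >= 0.05 or rpc_queue_len > 0:
        return SystemStatus.ACTIVE
    return SystemStatus.IDLE
```

A single categorical status like this is easier to alert on than several raw metrics: an operator could, for example, page only on STRESSED or OVERLOADED.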
13. 2.9 Summary
USE CASE: HYBRID AND MULTI-CLOUD
Enables
• Agility, and lower people and infrastructure costs
How
• As the number of Alluxio clusters across environments grows, weʼre improving both:
(a) people efficiency with better management tooling, and
(b) resource efficiency with I/O optimizations
(Diagram: Tenants 1, 2, and 3, each with its own Alluxio cluster, kept in sync.)
14. Additional Resources
For more updates, check out the following resources:
1. Product Blog
2. Release Notes
a. Community Edition
b. Enterprise Edition
Free downloads of Alluxio 2.9 open source Community Edition and trials of
Alluxio Enterprise Edition are immediately available here:
https://www.alluxio.io/download/.
Questions or Feedback? Letʼs Connect on our Slack Channel