© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Serverless Data Lake Workshop
Amardeep Chudda
Solutions Architect
Amazon Web Services
ARC302
Mike Gillespie
Solutions Architect
Amazon Web Services
Agenda
Development Environment Setup
Review Data Lake Architecture
Why Serverless?
Glue Extract Transform Load (ETL)
Data Governance
Bonus Content
Related breakouts
Tuesday, Nov 27
ANT354-R - [REPEAT] Build a Query to Analyze Data in Your Amazon
Redshift Warehouse & S3 Data Lake Together
Time – 8:30 AM to 9:30 PM | Mirage
Friday, Nov 30
AIM405-R1 - [REPEAT 1] Better Analytics Through Natural
Language Processing
Time – 11:30 PM to 12:30 PM | Venetian
Thursday, Nov 29
ADT301 - Create a Serverless Web Event Pipeline
Time – 4:00 PM to 5:00 PM | Mirage
Scenario
You support a successful online ecommerce website with millions of users. The
website tracks end-user activity and buying habits online.
Your analytics team would like to query the data both ad hoc and through
Business Intelligence tools, with the end goal of helping business teams derive
efficiencies in their marketing campaigns. You want to enable your analytics team
without losing focus on data quality and governance controls.
Data sources include weblogs, NoSQL databases, and other data sources.
Your task is to build a cost-effective solution that provides a unified analytics
environment.
re:Invent workshop summary
• Ingest data from various data sources and join them together
• Enrich raw data
• Convert data to parquet for efficient querying
• Grant access to roles based on the data classification
• SQL Access for Data Scientists
• Data Visualization with charts and graphs
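The "convert data to parquet" step matters because Parquet is a columnar format: a query that needs one column can skip the rest. The sketch below is not Parquet itself, only a plain-Python illustration of row layout versus column layout. The records and field names are hypothetical.

```python
import json

# Hypothetical click records; the field names are illustrative only.
rows = [
    {"user_id": 1, "page": "/home", "bytes_sent": 5120},
    {"user_id": 2, "page": "/cart", "bytes_sent": 2048},
    {"user_id": 1, "page": "/checkout", "bytes_sent": 4096},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every record carries the same fields.
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row layout: a scan reads every field of every record.
row_bytes = sum(len(json.dumps(r)) for r in rows)

# Column layout: a query on one column reads only that column's values.
columns = to_columns(rows)
column_bytes = {name: len(json.dumps(values)) for name, values in columns.items()}

# Summing one column touches only column_bytes["bytes_sent"] bytes.
total_sent = sum(columns["bytes_sent"])
```

Real Parquet files add compression, encoding, and per-column statistics on top of this layout, so the savings in practice depend on the data and the query.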
Requirements
1. Your own device for console access
2. An AWS account that you are able to use for testing.
(Should not be used for production or other purposes.)
3. Workshop on GitHub at https://bit.ly/2RX54o3
Development environment
Your Cloud Engineering team has deployed a development environment for you
Ingestion / Data Generation
Kinesis / Log Data
Data Generation Lambda Functions
Amazon Simple Storage Service (Amazon S3) Buckets
Amazon DynamoDB
AWS Glue Management Console / Development Endpoint
Amazon Athena
Amazon QuickSight
Deploy the lab environment
1. Deploy the Lab CloudFormation template from here: https://bit.ly/2RX54o3
2. Examine the environment in AWS CloudFormation Designer
3. Deploy your stack
(Diagram labels: Template, Stack)
High-level architecture
Amazon Kinesis Data Firehose
• Serverless, easy to use
• Seamless integration with AWS data stores
• Support for serverless transformation
• Near real-time ingestion
• Pay only for what you use
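"Support for serverless transformation" refers to Firehose invoking a Lambda function on each batch of records before delivery. A minimal sketch of such a handler follows. The record envelope (`recordId`, `result`, base64-encoded `data`) follows the Firehose transformation contract; the `status` field and the `is_error` enrichment are hypothetical and not taken from the workshop.

```python
import base64
import json


def lambda_handler(event, context):
    """Add a derived field to each JSON record delivered by Firehose."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical enrichment: flag requests that returned a server error.
            payload["is_error"] = int(payload.get("status", 0)) >= 500
            # Newline-delimit the output so downstream readers can split records.
            data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": data.decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError, AttributeError):
            # Hand the record back untouched and mark it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record.get("data", ""),
            })
    return {"records": output}
```

Every incoming `recordId` has to appear in the response, which is why failed records are returned with a `ProcessingFailed` result rather than dropped silently.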
Amazon Simple Storage Service (Amazon S3)
• Object store
• Highly durable
• Limitless scalability
• Pay for what you use
• Comprehensive security & compliance capabilities
• Support for query in place
AWS Glue
• Serverless ETL
• Universal Data Catalog
• Open source Apache Spark environment
• DynamicFrame – Built in functions
• Seamless integration with AWS services
• Support for on-premises data stores
Amazon Athena
• Serverless interactive query service
• Integrated with AWS Glue Data Catalog
• Open source, built on Presto, query with standard SQL
• Pay per query
• Support for standard formats like CSV, JSON, ORC, Avro and Parquet
• Fast parallel query execution
Amazon QuickSight
• Serverless, end to end BI solution
• Built-in SPICE engine
• Smart visualizations
• Seamless integration with AWS services
• On-premises database support
• Pay only for what you use
• Multiple device support
• Share and collaborate
Data classification and security
• Grant S3 access by role to bucket / prefix
• Approaches to segment data
  • Multiple copies of the data in different buckets
  • Tokenization, join to tokenized tables, and views to resolve them
(Diagram labels: Bucket with objects, Role Permissions)
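Granting S3 access by role to a bucket and prefix usually comes down to an IAM policy attached to the role. The sketch below builds such a policy document as a Python dictionary. The bucket name and prefix are hypothetical placeholders, and the policy is a read-only example rather than the one used in the workshop.

```python
import json


def prefix_read_policy(bucket, prefix):
    """Build an IAM policy document allowing read access to one S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, limited to keys under the prefix.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads are granted on the objects under the prefix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix for a role that may read only public data.
policy = prefix_read_policy("example-datalake-bucket", "curated/public/")
policy_json = json.dumps(policy, indent=2)
```

A role cleared for sensitive data would get a second policy of the same shape pointing at the secure prefix or bucket.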
Duplication

UserProfile
ID  First  Last
1   Sam    Smith
2   Jane   Jones

UserProfileSecure
ID  First  Last   SSN
1   Sam    Smith  111-11-1111
2   Jane   Jones  222-22-2222
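The duplication approach can be sketched in a few lines: the full table is kept as the secure copy, and a second copy is written with the sensitive columns removed. The snippet below uses in-memory rows from the slide. In the workshop setting each copy would live in its own bucket or prefix, which is an assumption here rather than something the slide spells out.

```python
SENSITIVE_COLUMNS = {"SSN"}

secure_rows = [
    {"ID": 1, "First": "Sam", "Last": "Smith", "SSN": "111-11-1111"},
    {"ID": 2, "First": "Jane", "Last": "Jones", "SSN": "222-22-2222"},
]


def strip_sensitive(rows, sensitive=SENSITIVE_COLUMNS):
    """Return a copy of the rows with the sensitive columns removed."""
    return [
        {name: value for name, value in row.items() if name not in sensitive}
        for row in rows
    ]


# UserProfileSecure keeps every column; UserProfile is the redacted copy.
user_profile_secure = secure_rows
user_profile = strip_sensitive(secure_rows)
```

The trade-off is storage and consistency: every change to the source has to be propagated to both copies.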
Tokenization

UserProfile
ID  First  Last   SSN_Token
1   Sam    Smith  8c9d409dcc43
2   Jane   Jones  06a38ea94e69

SSN_Tokens
Token         SSN
8c9d409dcc43  111-11-1111
06a38ea94e69  222-22-2222
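A minimal sketch of the tokenization step, assuming random tokens and an in-memory lookup table: the sensitive column is swapped for a token, and the token-to-value mapping is kept separately so it can be locked down. The `tokenize` helper and its parameters are hypothetical; the slides do not say how the tokens were generated.

```python
import secrets


def tokenize(rows, column, token_bytes=6):
    """Replace `column` with a random token and return (rows, token table)."""
    token_table = {}  # token -> original value
    seen = {}         # original value -> token, so repeated values share a token
    tokenized = []
    for row in rows:
        value = row[column]
        if value not in seen:
            token = secrets.token_hex(token_bytes)  # 6 bytes -> 12 hex characters
            while token in token_table:             # regenerate on a collision
                token = secrets.token_hex(token_bytes)
            seen[value] = token
            token_table[token] = value
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[f"{column}_Token"] = seen[value]
        tokenized.append(new_row)
    return tokenized, token_table


profiles = [
    {"ID": 1, "First": "Sam", "Last": "Smith", "SSN": "111-11-1111"},
    {"ID": 2, "First": "Jane", "Last": "Jones", "SSN": "222-22-2222"},
]
user_profile, ssn_tokens = tokenize(profiles, "SSN")
```

Random tokens carry no information about the original value, so the profile table can be shared more widely while `ssn_tokens` stays behind stricter permissions.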
Tokenization

ProfileView
ID  First  Last
1   Sam    Smith
2   Jane   Jones

ProfileSecureView
ID  First  Last   SSN
1   Sam    Smith  111-11-1111
2   Jane   Jones  222-22-2222
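The two views can be expressed as plain SQL: one hides the token column, the other joins the token table back in. The sketch below uses SQLite from the Python standard library as a local stand-in so it runs anywhere. The workshop would define the views in a query engine such as Athena, whose DDL may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserProfile (ID INTEGER, "First" TEXT, "Last" TEXT, SSN_Token TEXT);
CREATE TABLE SSN_Tokens (Token TEXT, SSN TEXT);

INSERT INTO UserProfile VALUES
    (1, 'Sam', 'Smith', '8c9d409dcc43'),
    (2, 'Jane', 'Jones', '06a38ea94e69');
INSERT INTO SSN_Tokens VALUES
    ('8c9d409dcc43', '111-11-1111'),
    ('06a38ea94e69', '222-22-2222');

-- General-purpose view: no token, no SSN.
CREATE VIEW ProfileView AS
    SELECT ID, "First", "Last" FROM UserProfile;

-- Restricted view: resolves each token back to the SSN.
CREATE VIEW ProfileSecureView AS
    SELECT p.ID, p."First", p."Last", t.SSN
    FROM UserProfile AS p
    JOIN SSN_Tokens AS t ON t.Token = p.SSN_Token;
""")

profile_rows = conn.execute("SELECT * FROM ProfileView ORDER BY ID").fetchall()
secure_rows = conn.execute("SELECT * FROM ProfileSecureView ORDER BY ID").fetchall()
```

Only roles that can read both `UserProfile` and `SSN_Tokens` can resolve the secure view, which is what ties this approach back to role-based access on the underlying data.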
Redshift Spectrum

UserProfileSecure
ID  First  Last   SSN
1   Sam    Smith  111-11-1111
2   Jane   Jones  222-22-2222
Bonus Content
• AWS Glue Development Endpoints – Apache Zeppelin notebook
• Amazon Redshift/Spectrum Integration
• AWS Database Migration Service (DMS) - Importing files from S3 to
DynamoDB
Thank you!
Amar, Mike

Architecting a Serverless Data Lake (ARC302) - AWS re:Invent 2018
