John Mallory,
Principal BDM, Storage, AWS
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Defining the data lake
Getting to results
Making it easier
Finding value in data is a journey
Business transformation
Business optimization
Business insights
Business monitoring
Defining the AWS data lake
Data lakes are designed to provide:
Relational and non-relational data
Scale-out to exabytes
Diverse set of analytics and machine learning tools
Catalog
Workloads shown: business intelligence, machine learning, DW queries, big data processing, interactive, real-time
Data lake on AWS components
S3
AWS Snowball
Amazon Kinesis Data Firehose
AWS Direct Connect
AWS Database Migration Service
AWS Storage Gateway
Fortnite
Epic Games uses data lakes and analytics
Game services
Databases
S3
ETL using S3
Use Amazon EMR for large batch data processing
Tableau/BI
Ad-hoc SQL
Why do we use Amazon S3 for data lakes?
Unmatched durability, availability, and scalability
Best security, compliance, and audit capabilities
Object-level controls
Business insights into your data
Most ways to bring data in
Ingest methods
Rapidly ingest all data sources
IoT, sensor data, clickstream data, social media feeds, streaming logs
A data lake is not one big bucket
Highly decoupled configurations scale better, are more fault tolerant, and are cost optimized
Set up a catalog, ETL, and data prep with AWS Glue
Serverless provisioning, configuration, and scaling to run your ETL jobs on Apache Spark
Pay only for the resources used for jobs
Crawl your data sources, identify data formats, and suggest schemas and transformations
Automates the effort in building, maintaining, and running ETL jobs
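As a rough illustration of the crawling step, the sketch below builds the keyword arguments for the Glue CreateCrawler API call (boto3's `create_crawler`). The crawler name, role ARN, database name, bucket path, and schedule are all hypothetical placeholders and do not come from the slides; the actual API call is left as a comment because it needs AWS credentials.

```python
def crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Build keyword arguments for the Glue CreateCrawler API call."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must look like s3://bucket/prefix/")
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler scans this S3 path and records inferred schemas in the catalog.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        request["Schedule"] = schedule  # Glue uses cron(...) expressions
    return request


# All values below are hypothetical examples.
request = crawler_request(
    name="example-raw-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="example_raw_db",
    s3_path="s3://example-data-lake-raw/clickstream/",
    schedule="cron(0 2 * * ? *)",
)
# With credentials configured: boto3.client("glue").create_crawler(**request)
```

Keeping the request as plain data makes it easy to review or version-control before any crawler is created.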
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Event-driven AWS Glue ETL pipeline
Let Amazon CloudWatch Events and AWS Lambda drive the pipeline
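One way such a pipeline might be wired is sketched below: an AWS Lambda handler that starts an AWS Glue job run for each object named in the triggering event. The slide does not show the event source, so the S3 event notification shape, the job name `example-etl-job`, and the `--source_bucket` / `--source_key` job arguments are assumptions made for illustration.

```python
import urllib.parse

# Hypothetical Glue job name; replace with a job defined in your account.
GLUE_JOB_NAME = "example-etl-job"


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run per newly arrived object."""
    if glue_client is None:
        import boto3  # imported lazily so the parsing logic can be tested offline
        glue_client = boto3.client("glue")
    run_ids = []
    for bucket, key in extract_s3_objects(event):
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Assumed argument names; the Glue job script must read the same keys.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```

The optional `glue_client` parameter exists only so the handler can be exercised with a stand-in client; Lambda itself calls the handler with `event` and `context`.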
Tools shown: Presto, Hive, Zeppelin, …
• Teams share S3 buckets and a “shared services” architecture
Amazon S3 buckets
“Fine-grained” ownership
Access controls applied at pipeline stages
Pipeline stages and who can access each:
Data Origin: Service role
Amazon S3 (Ingested Data): Data mgmt. service
Amazon S3 (Staged or Sandboxed Data): Data mgmt. service
Amazon S3 (Data Lake): Data mgmt. service
Amazon Redshift: Reporting service
Amazon EMR performs data cleaning/prep between the Amazon S3 stages.
Control access to data with AWS Identity and Access Management (IAM)
Configure Amazon S3 permissions
• Use S3 bucket policies for easy cross-account data sharing
Diagram labels: IAM principals, Amazon EMR, Amazon Redshift
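To make the cross-account sharing bullet concrete, here is a minimal sketch that builds a read-only S3 bucket policy document as JSON. The bucket name, account ID, and prefix are hypothetical, and the chosen actions (`s3:ListBucket`, `s3:GetObject`) are one reasonable read-only set rather than anything prescribed by the slides.

```python
import json


def cross_account_read_policy(bucket, account_id, prefix=""):
    """Build a bucket policy granting another AWS account read-only access."""
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # bucket-level action
            },
            {
                "Sid": "CrossAccountGet",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",  # object-level action
            },
        ],
    }


# Hypothetical bucket, account ID, and prefix.
policy_json = json.dumps(
    cross_account_read_policy("example-data-lake", "111122223333", "curated/"),
    indent=2,
)
```

The bucket policy alone is not sufficient: the consuming account still has to grant its own IAM principals permission to call these actions on the bucket.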
Optimizing data lake performance & costs
Optimize costs with data tiering
Hot: HDFS. Use Amazon EMR/Hadoop with local HDFS for hottest datasets
Cold: S3 Glacier Deep Archive
Amazon S3 Analytics
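Tiering from hot to cold storage can be expressed as an S3 lifecycle configuration. The sketch below builds one as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The day thresholds, the intermediate storage classes, and the prefix are assumptions chosen for illustration; the slides do not specify them.

```python
def tiering_lifecycle(prefix, ia_days=30, glacier_days=90, deep_archive_days=365):
    """Build a lifecycle configuration that moves objects to colder tiers over time."""
    if not (0 < ia_days < glacier_days < deep_archive_days):
        raise ValueError("transition days must be positive and strictly increasing")
    rule_id = "tier-" + (prefix.strip("/").replace("/", "-") or "all")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


# Hypothetical prefix; thresholds use the defaults above.
config = tiering_lifecycle("raw/logs/")
# With credentials configured:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-data-lake", LifecycleConfiguration=config)
```

Thresholds like these are usually tuned from observed access patterns, which is the kind of information storage class analysis is meant to provide.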
S3 Intelligent-Tiering automates cost savings NEW!
Process data in place…
Amazon S3
S3 Select
Seamless integration with Amazon S3
Link your Amazon S3 dataset to your Amazon FSx for Lustre file system, then….
Sysco—Analytics on the data lake
AWS Lake Formation
Build a secure data lake in days
Most partners to complement AWS offerings
Featured Data & Analytics APN Consulting Partners
AWS Data & Analytics Competency Partners have demonstrated success helping customers evaluate
and use the tools and best practices for collecting, storing, governing, and analyzing data, at any scale.
They help customers use data and analytics as a competitive differentiator and a primary source of
value generation. This includes designing and deploying data lakes and analytic solutions; defining and
enforcing data policies; security and management of personal information; creating data catalogs and
glossaries; data integration, data warehousing, reporting, dashboarding, data visualization; and more.
Learn from AWS experts. Advance your skills and
knowledge. Build your future in the AWS Cloud.
Why work with an APN Partner?
APN Partners are uniquely positioned to help your organization at any stage of your cloud adoption journey, and they:
• Share your goals, focused on your success
• Help you take full advantage of all the business benefits that AWS has to offer
• Provide services and solutions to support any AWS use case across your full customer life cycle
APN Partners with deep expertise in AWS services:
AWS Managed Service Provider (MSP) Partners
APN Partners with cloud infrastructure and application migration expertise
AWS Competency Partners
APN Partners with verified, vetted, and validated specialized offerings
AWS Service Delivery Partners
APN Partners with a track record of delivering specific AWS services to customers
aws-apac-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws