1. Page 1 Hortonworks © 2014
Distilling Hadoop Patterns of Use
Shaun Connolly, Hortonworks
@shaunconnolly
March 25, 2014
2. Page 2 Hortonworks © 2014
Our Mission: Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop
Our Commitment
Open Leadership: Drive innovation in the open, exclusively via the Apache community-driven open source process
Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind
Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills
Headquarters: Palo Alto, CA
Employees: 300+ and growing
Reseller Partners
3. Page 3 Hortonworks © 2014
Data Continues to Grow Sharply
2012: Digital universe = 20 Zettabytes
2020: Digital universe = 40 Zettabytes
1 Zettabyte (ZB) = 1 billion Terabytes (TB)
2014: 31% of enterprises managing more than 1 Petabyte
85% of growth from new types of data, with machine-generated data increasing 15x
Data sources: Social Networks; Machine Generated; Documents, Emails; OLTP, ERP, CRM Systems; Geolocation Data; Sensor Data; Web Logs, Click Streams
Sources: IDC and IDG Enterprise
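The unit relationship on this slide (1 ZB = 1 billion TB) can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, which the slide does not state; the function and constant names are illustrative only.

```python
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes and 1 ZB = 10**21 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15
BYTES_PER_ZB = 10**21


def zb_to_tb(zettabytes):
    """Convert zettabytes to terabytes using integer arithmetic."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_TB


tb_per_zb = zb_to_tb(1)                    # terabytes in one zettabyte
pb_per_zb = BYTES_PER_ZB // BYTES_PER_PB   # petabytes in one zettabyte
universe_2020_tb = zb_to_tb(40)            # the slide's 2020 figure, in terabytes
```

Under these assumptions one zettabyte is a billion terabytes, or a million of the 1 Petabyte stores that the slide uses as its enterprise benchmark.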
4. Page 4 Hortonworks © 2014
Cameras and microphones widely deployed
New routes to market via intelligent objects
Content and services via connected products
Everything has a URL
Remote sensing of objects and environment
Augmented reality
Situational decision support
Building and infrastructure management
Over 50% of Internet connections are things:
2011: 15+ billion permanent, 50+ billion intermittent
2020: 30+ billion permanent, >200 billion intermittent
Source: Gartner Keynote at Hadoop Summit 2013
5. Page 5 Hortonworks © 2014
Harnessing Big Data is transformational to business models
Enables the move from post-transaction, reactive analysis of subsets of data stored in silos to a world of pre-transaction, interactive insights across all data that impacts both the top and bottom lines
6. Page 6 Hortonworks © 2014
Modern Data Architecture with Hadoop
APPLICATIONS: Statistical Analysis; BI / Reporting, Ad Hoc Analysis; Interactive Web & Mobile Applications; Enterprise Applications
DATA SYSTEMS: Repositories (RDBMS, EDW, MPP); ENTERPRISE HADOOP (Governance & Integration, Security, Operations, Data Access, Data Management)
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
OPERATIONS TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
7. Page 7 Hortonworks © 2014
MDA Unlocks New Approach to Insight
Current Approach: Apply schema on write; dependent on IT
SQL: Single Query Engine
Repeatable Linear Process: Determine list of questions; Design solutions; Collect structured data; Ask questions from list; Detect additional questions
Augment with Hadoop: Apply schema on read; support range of access patterns to data stored in HDFS
Enterprise Hadoop: Multiple Query Engines (Batch, Interactive, Real-time, Streaming)
Iterative Process: Explore, Transform, Analyze
8. Page 8 Hortonworks © 2014
Schema-on-Write vs. Schema-on-Read
Standard Digital Camera
§ Zoom & focus first
§ Capture limited set of pixels
§ Crop around the focused area
Lytro Lightfield Camera
§ Capture entire lightfield
§ Infinite zoom & focus
§ Crop any captured areas
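The camera analogy can be made concrete with a small sketch. It uses plain Python and hypothetical JSON events rather than any Hadoop API: schema-on-write keeps only the columns chosen at load time, while schema-on-read keeps the raw lines and projects columns when a question is asked. The event fields and function names are assumptions for illustration.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120, "geo": "US"}',
    '{"user": "u2", "page": "/cart", "ms": 340}',
    '{"user": "u1", "page": "/cart", "ms": 95, "geo": "DE"}',
]


def schema_on_write(lines, columns):
    """Keep only the columns chosen at load time; everything else is lost."""
    return [{c: json.loads(line).get(c) for c in columns} for line in lines]


def schema_on_read(lines, columns):
    """Keep the raw lines and project columns lazily, at query time."""
    for line in lines:
        record = json.loads(line)
        yield {c: record.get(c) for c in columns}


# Schema-on-write: the "crop" is decided before storage.
warehouse = schema_on_write(RAW_EVENTS, ["user", "page"])

# Schema-on-read: a question nobody planned for can still be answered later.
geo_view = list(schema_on_read(RAW_EVENTS, ["user", "geo"]))
```

The `geo` field never reaches `warehouse`, so a later question about geography cannot be answered from it. The raw lines still carry that field, which is the "capture the entire lightfield" side of the analogy.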
9. Page 9 Hortonworks © 2014
MDA Uses Commodity Compute + Storage
Hadoop Enables Scalable Compute & Storage at a Compelling Cost Structure
[Chart: Fully Loaded Cost per Raw TB of Data (min – max cost); axis from $0 to $180,000; categories: Cloud Storage, HADOOP, NAS, Engineered System, EDW/MPP, SAN]
10. Page 10 Hortonworks © 2014
MDA Optimizes Data Warehouse
Current Reality
EDW workload: Analytics 20%, ETL Process 30%, Operations 50%
§ EDW at capacity; some usage from low value workloads
§ Older transformed data archived, unavailable for ongoing exploration
§ Source data often discarded
Augment with Hadoop
EDW workload: Operations 50%, Analytics 50%
HADOOP: Parse, cleanse, apply structure, transform
§ Free up EDW resources from low value tasks
§ Keep 100% of source data and historical data for ongoing exploration
§ Mine data for value after loading it because of schema-on-read
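The four steps the slide assigns to Hadoop (parse, cleanse, apply structure, transform) can be sketched as a small pipeline. This is plain Python standing in for whatever engine would run on the cluster; the log format, field names and functions are hypothetical.

```python
from collections import Counter

# Hypothetical raw log lines as they might land in Hadoop, untouched.
RAW_LINES = [
    "2014-03-25 10:00:01 GET /home 200",
    "2014-03-25 10:00:02 GET /cart 500",
    "corrupted line",
    "2014-03-25 10:00:03 POST /cart 200",
]

FIELDS = ("date", "time", "method", "path", "status")


def parse(line):
    """Split a raw line into tokens."""
    return line.split()


def cleanse(tokens):
    """Reject rows that do not have the expected number of fields."""
    return tokens if len(tokens) == len(FIELDS) else None


def apply_structure(tokens):
    """Name the fields and convert the status code to an integer."""
    row = dict(zip(FIELDS, tokens))
    row["status"] = int(row["status"])
    return row


def transform(rows):
    """Aggregate to the small summary a warehouse would actually load."""
    return Counter((row["path"], row["status"]) for row in rows)


def run_pipeline(lines):
    cleaned = (cleanse(parse(line)) for line in lines)
    rows = [apply_structure(tokens) for tokens in cleaned if tokens is not None]
    return transform(rows)


summary = run_pipeline(RAW_LINES)
```

Only `summary` would need to travel to the EDW. `RAW_LINES` stays where it is, so the malformed row and every original field remain available for later exploration.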
11. Page 11 Hortonworks © 2014
Integrating with Existing Investments
APPLICATIONS: BusinessObjects BI
DATA SYSTEM: RDBMS, EDW, MPP, HANA
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE
12. Page 12 Hortonworks © 2014
Powering the Modern Data Architecture
Enables deep insight across a large, broad, diverse set of data at efficient scale
Multi-Use Data Platform: Store all data in one place, process in many ways
Applications 1 ... n: Batch, Interactive, Real-time, Streaming
YARN: Data Operating System
Data Lake that contains ALL data; raw sources and any processed data over extended periods of time.
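"Store all data in one place, process in many ways" can be illustrated with a toy example: one raw data set, read once by a batch-style aggregate and once by a streaming-style filter. This sketch uses plain Python and no YARN or Hadoop API; the readings, threshold and function names are assumptions.

```python
# Hypothetical raw sensor readings kept in one place: (sensor_id, value).
RAW_READINGS = [("s1", 20.5), ("s2", 19.0), ("s1", 22.5), ("s2", 35.0)]


def batch_average(readings):
    """Batch style: scan everything, then report one average per sensor."""
    totals = {}
    for sensor, value in readings:
        count, total = totals.get(sensor, (0, 0.0))
        totals[sensor] = (count + 1, total + value)
    return {sensor: total / count for sensor, (count, total) in totals.items()}


def stream_alerts(readings, threshold):
    """Streaming style: emit an alert as soon as a reading crosses the threshold."""
    for sensor, value in readings:
        if value > threshold:
            yield sensor, value


averages = batch_average(RAW_READINGS)
alerts = list(stream_alerts(RAW_READINGS, threshold=30.0))
```

Both functions read the same `RAW_READINGS`; neither needs its own copy or its own silo.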
13. Page 13 Hortonworks © 2014
How Hadoop?
“Hadoop can be used to create a ‘data lake’ – an integrated repository of data from internal and external data sources... Data combined from multiple silos can help your organization find answers to complex questions that no one has previously dared ask or known how to ask.”
-- Forrester
14. Page 14 Hortonworks © 2014
The Common Journey with Hadoop
SCALE
SCOPE
More data and
analytic apps
New Analytic Apps
New types of data
LOB-driven
A Modern Data Architecture
RDBMS
MPP
EDW
Governance & Integration
Security
Operations
Data Access
Data Management
15. Page 15 Hortonworks © 2014
Unlock Value in New Types of Data
1. Social
Understand how people are feeling and interacting –
right now
2. Clickstream
Capture and analyze website visitors’ data trails and
optimize your website
3. Sensor/Machine
Discover patterns in data streaming from remote
sensors and machines
4. Geographic
Analyze location-based data to manage operations
where they occur
5. Server Logs
Diagnose process failures and prevent security
breaches
6. Unstructured (text, video, pictures, etc.)
Understand patterns in files across millions of web
pages, emails, and documents
Value
+ Online archive
Data that was once purged or moved
to tape can be stored in Hadoop to
discover long term trends and
previously hidden value
16. Page 16 Hortonworks © 2014
20 Business Applications of Hadoop
Industry | Use Case | Type of Data
Financial Services | New Account Risk Screens | Text, Server Logs
Financial Services | Trading Risk | Server Logs
Financial Services | Insurance Underwriting | Geographic, Sensor, Text
Telecom | Call Detail Records (CDRs) | Machine, Geographic
Telecom | Infrastructure Investment | Machine, Server Logs
Telecom | Real-time Bandwidth Allocation | Server Logs, Text, Social
Retail | 360° View of the Customer | Clickstream, Text
Retail | Localized, Personalized Promotions | Geographic
Retail | Website Optimization | Clickstream
Manufacturing | Supply Chain and Logistics | Sensor
Manufacturing | Assembly Line Quality Assurance | Sensor
Manufacturing | Crowdsourced Quality Assurance | Social
Healthcare | Use Genomic Data in Medical Trials | Structured
Healthcare | Monitor Patient Vitals in Real-Time | Sensor
Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream
Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic
Oil & Gas | Unify Exploration & Production Data | Sensor, Geographic & Unstructured
Oil & Gas | Monitor Rig Safety in Real-Time | Sensor, Unstructured
Government | ETL Offload in Response to Federal Budgetary Pressures | Structured
Government | Sentiment Analysis for Government Programs | Social
17. Page 17 Hortonworks © 2014
360° Customer View for Home Supply Retailer
Problem
Disjoint customer engagement across all channels
Data repositories on website traffic, POS transactions and in-home services exist in separate silos
Unable to perform analytics on customer buying behavior
across all channels
Limited ability for targeted marketing to specific segments
Solution
Unified system of engagement via “golden record”
Golden record enables targeted marketing capabilities:
customized coupons, promotions and emails
Deep visibility into all customers and all market segments
Unlocks rich, informed cross-sell & up-sell opportunities
Creating Opportunity
Data: Clickstream,
Unstructured, Structured
Retail
Major home
improvement retailer
>$74B in revenue
>300K employees
>2,200 stores
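A "golden record" of this kind can be sketched as a merge of per-channel records on a shared customer key. The slide does not describe the retailer's actual implementation, so everything below (field names, the `customer_id` key, the sample segment) is a hypothetical illustration in plain Python.

```python
from collections import defaultdict

# Hypothetical records from three separate silos, keyed by a shared customer_id.
WEB_VISITS = [
    {"customer_id": "c1", "page": "/paint"},
    {"customer_id": "c1", "page": "/ladders"},
    {"customer_id": "c2", "page": "/garden"},
]
POS_TRANSACTIONS = [
    {"customer_id": "c1", "amount": 42.50},
    {"customer_id": "c2", "amount": 19.99},
    {"customer_id": "c1", "amount": 7.50},
]
HOME_SERVICES = [
    {"customer_id": "c2", "service": "installation"},
]


def build_golden_records(visits, transactions, services):
    """Merge per-channel records into one profile per customer."""
    records = defaultdict(
        lambda: {"pages": [], "total_spend": 0.0, "services": []}
    )
    for visit in visits:
        records[visit["customer_id"]]["pages"].append(visit["page"])
    for txn in transactions:
        records[txn["customer_id"]]["total_spend"] += txn["amount"]
    for job in services:
        records[job["customer_id"]]["services"].append(job["service"])
    return dict(records)


golden = build_golden_records(WEB_VISITS, POS_TRANSACTIONS, HOME_SERVICES)

# Example segment: customers who browsed paint but have no in-home service yet.
paint_prospects = [
    cid for cid, rec in golden.items()
    if "/paint" in rec["pages"] and not rec["services"]
]
```

A segment such as `paint_prospects` is only possible once web, POS and in-home data sit in the same record, which is the cross-channel analysis the Problem list says was missing.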
18. Page 18 Hortonworks © 2014
Monetize Anonymous & Aggregate Banking Data
Problem
Unable to unlock valuable cross-sell banking data
Bank possesses data that indicates larger macro-economic
trends, which can be monetized in secondary markets
Data sets are isolated in legacy silos controlled by LOBs
Regulations and company policies protect customer privacy
IT challenged by joining data while guaranteeing anonymity
Solution
Create cross-LOB data lake of de-identified data
Mortgage bankers, consumer bankers, credit card group and
treasury bankers have access to the same cross-sell data
Single point of security & privacy for de-identification, masking,
encryption, authentication and access control
Interoperability with SAS, Red Hat & Splunk
Creating Opportunity
Data: Structured,
Clickstream, Social &
Unstructured
Banking
One of the largest
US banks
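One common way to join data across lines of business without exposing identities is to replace identifiers with a keyed hash and mask account numbers before loading. The slide does not say which technique the bank used, so this is only a sketch under that assumption, using Python's standard `hmac` and `hashlib` modules; the key, field names and records are placeholders. Keyed hashing is pseudonymization and does not by itself guarantee anonymity.

```python
import hashlib
import hmac

# Assumption: a secret key managed outside the data lake; this value is a placeholder.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(customer_id, key=SECRET_KEY):
    """Replace an identifier with a keyed hash so joins still work across LOBs."""
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def mask_account(account_number, visible=4):
    """Hide all but the last `visible` characters of an account number."""
    hidden = max(len(account_number) - visible, 0)
    return "*" * hidden + account_number[hidden:]


def de_identify(record):
    """Return a copy without the raw identifier, for loading into a shared lake."""
    return {
        "customer_token": pseudonymize(record["customer_id"]),
        "account": mask_account(record["account"]),
        "balance": record["balance"],
    }


# Hypothetical records from two lines of business for the same customer.
mortgage = de_identify(
    {"customer_id": "cust-001", "account": "1234567890", "balance": 250000}
)
card = de_identify(
    {"customer_id": "cust-001", "account": "9876543210", "balance": 1200}
)
```

Because both records carry the same `customer_token`, mortgage and card data can still be joined, while the original `customer_id` never enters the lake.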
19. Page 19 Hortonworks © 2014
Optimize High-Tech Manufacturing
Improving Efficiency
Data: Sensor
Problem
Ineffective root cause analysis on product defects
200 million digital storage devices manufactured yearly
>10K faulty devices returned by customers every month
Limited data available for root cause analysis means that
diagnosing problems is highly manual (physical inspections)
Subset of sensor data from QA testing retained 3-12 months
Solution
Created sensor data lake for 10x quality improvement
Repository holds 24 months of data for each device
Manufacturing dashboard allows >1,000 employees to search
data, with results returned in less than 1 second
Quality improved 10x: rate down to ~1K faulty devices / month
Manufacturing
Digital Storage
Devices
>$15B in revenue
>85K employees
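The 10x figure follows directly from the two return counts on the slide, and the same numbers give a rough return rate. The sketch below treats ">10K" and "~1K" as exactly 10,000 and 1,000, and assumes returns are compared against an even monthly share of yearly output; both are simplifications, not figures from the source.

```python
# Figures quoted on the slide; ">10K" and "~1K" are approximate, so results are rough.
UNITS_PER_YEAR = 200_000_000
RETURNS_BEFORE_PER_MONTH = 10_000
RETURNS_AFTER_PER_MONTH = 1_000

# Assumption: production is spread evenly across the twelve months.
units_per_month = UNITS_PER_YEAR / 12

improvement_factor = RETURNS_BEFORE_PER_MONTH / RETURNS_AFTER_PER_MONTH

# Returned devices as a share of monthly output, before and after.
return_rate_before = RETURNS_BEFORE_PER_MONTH / units_per_month
return_rate_after = RETURNS_AFTER_PER_MONTH / units_per_month
```

Under these assumptions the return rate falls from roughly 0.06% to roughly 0.006% of monthly output.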
21. Page 21 Hortonworks © 2014
Enabling Hadoop for the Enterprise Journey
1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Integration: Interoperable with existing data center investments
3. Skills: Leverage your existing skills: development, analytics, operations
Scale
Scope
More data and
analytic apps
New Analytic Apps
New types of data
LOB-driven
A Modern Data Architecture
RDBMS
MPP
EDW
Governance & Integration
Security
Operations
Data Access
Data Management
22. Page 22 Hortonworks © 2014
Try Hadoop Today… Get Involved
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
San Jose, CA
June 3 - 5, 2014
Amsterdam
April 2 - 3, 2014