Big Data Architecture Overview
Copyright © Capgemini 2016. All Rights Reserved
Gartner Hype Cycle – Emerging Technologies
Benefits
Big Data and its Dimensions
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
• Variety: Manage the complexity of data in many different structures, ranging from relational, to logs, to raw text.
• Velocity: Streaming data and large-volume data movement.
• Volume: Scale from terabytes to petabytes (1K TBs) to zettabytes (1B TBs).
• Veracity: Having a lot of data in different volumes coming in at high speed is worthless if that data is incorrect. Organizations need to ensure that both the data and the analyses performed on it are correct.
• Value: Discovering value from multichannel datasets.
Applications for Big Data Analytics
• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel Sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn
General reference architecture for Big Data Analytics
Source data → Process → Information → Analyze → Insight → Act → Value
• Source data (internal and external): IT-managed applications (ERP, SCM, CRM); master and reference data; business-owned informal data; documents, mail, images, voice, video; web and mobile apps; B2B; Internet, social, Internet of Things (machine, sensor); third-party data (market, weather, climate, geolocation); open data
• Process: ingest, catalog, stream, store, prepare, refine and blend, mask, manage lifecycle
• Information: data asset descriptions; processed data (measures, KPIs, dimensions, master data); granular data (events, context information)
• Analyze: search (What is relevant?), explorative (How does it work?), descriptive (What happened?), diagnostic (Why did it happen?), predictive (What will happen?), prescriptive (How to act next?)
• Act: business applications (customer campaign, trigger activity); business processes (trigger event, adjust process); decision makers (approve/reject business opportunities, develop new business models and products)
• Value: customer profitability, operational cost cutting, risk prevention, market share increase; business performance and performance improvement
• Use cases: customer experience, operational process optimization, risk and fraud, disruptive business model
• Manage (across all stages): data governance and security, data privacy, compliance, collaboration, value generation, program delivery, data-driven culture, information strategy, skill development, master data management, metadata management, data quality management, operations and SLAs, orchestration
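To make the difference between two of the Analyze stages concrete, here is a minimal sketch in Python: a descriptive question (What happened?) answered with a total, and a predictive one (What will happen?) answered by extrapolating an ordinary least-squares trend one period ahead. The function names and sales figures are hypothetical, and a straight-line trend is only one simple stand-in for a predictive model.

```python
from statistics import mean


def descriptive_total(values: list[float]) -> float:
    """Descriptive analytics ("What happened?"): summarize the observed values."""
    return sum(values)


def predictive_next(values: list[float]) -> float:
    """Predictive analytics ("What will happen?"): fit an ordinary least-squares
    line through (period index, value) pairs and extrapolate one period ahead."""
    if len(values) < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(values)


monthly_sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical history
print(descriptive_total(monthly_sales))  # 52.0
print(predictive_next(monthly_sales))    # 18.0
```

Diagnostic and prescriptive analytics would build on the same data, for example by breaking the total down by cause or by choosing an action that optimizes the forecast.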
The Business Data Lake (BDL) is also aligned with our principles:
• Unleash Data and Insights as-a-Service: we (Capgemini) intend to offer the BDL as-a-Service.
• Make Insight-driven Value a Crucial Business KPI
• Empower your People with Insights at the Point of Action: bringing business value by delivering insights at the point of action is the motto of the BDL.
• Develop an Enterprise Data Science Culture: the BDL also comes with a new, specific mindset that has to be addressed at the enterprise level.
• Master Governance, Security and Privacy of your Data Assets: it integrates Unified Data Management capabilities to manage governance, security, privacy, MDM, RDM, etc.
• Enable your Data Landscape for the Flood coming from Connected People and Things: it works with high volumes of all kinds of data.
• Embark on the Journey to Insights within your Business and Technology Context: it concerns both business and (disruptive) technology.
Business Data Lake Reference Architecture - Conceptual
Characteristics
• Store anything; analyze everything
• Blend traditional data elements with new data types
• Manage centrally, govern locally
• Future-proof design
• Highly scalable and available
Components
• Sources: structured data sources; transactional systems (RES/CRM); un/semi-structured data sources
• Data Ingestion Layer: extract & load; streams
• Data Lake Layer: landing, ODS, sandbox; SQL-on-Hadoop; in-memory grid
• Data Distillation Layer: Data Quality Governance Framework (business rules, transformation, aggregation); Customer Master (CRM)
• Data Provisioning and Data Dissemination Layers: data marts (HR Mart 1, HR Mart 2); data virtualization or blending
• Data Access Layer: self-service; data visualization and reporting; advanced analytics; API layer
• Distributed Compute Layer / Services and Distributed Storage Layer
• Cross-cutting: metadata management; data governance (audit, lineage) and data governance integration; data security (authentication, authorization, Kerberos)
Business Data Lake Reference Architecture - Logical
The logical view keeps the layers of the conceptual view and maps them onto specific products:
• Data ingestion: Talend 6.3 or later (extract & load); Kafka with Spark Streaming/Storm (streams)
• Data lake, distributed storage and compute: Hortonworks HDP 2.5 or later, with HBase, Hive and Spark
• Data marts: HBase/Hive data marts; Redshift
• Data access: Zeppelin; RESTful service; self-service data visualization and reporting; advanced analytics
• Data security (authentication, authorization, Kerberos): Ranger, Knox
• Data governance (audit, lineage) and metadata management: Atlas
Detailed layer breakup
Reference architecture for data ingestion - Indicative
Functionality: Ingest data from a variety of sources, with varying latency, into the Data Lake.
• Data Sources: structured, semi-structured and unstructured
• Data Sourcing, via Data Integration Services:
    • S/FTP-based push (logs, text, other file-based sources)
    • Changed Data Management (delta extracts, event management)
    • Source Extraction Services (XML, relational, other extracts)
• Data State:
    • Data at rest (ETL pushdown, batch using standard DI tools or Sqoop)
    • Data in motion (fast data, processed via tools like Flume, Storm, Spark, etc.)
• Data Transformation, via Transformation Services:
    • Fast data manipulation: sorting, file merges, joins, file splitting, others
    • Transform routines: aggregation, mappings, lookups, calculations, others
    • Big data transformations: user-defined functions / custom MapReduce code (Java, Python, etc.) for complex logic
    • ETL pushdown processing: execute mapping jobs on the Hadoop cluster (HDFS/Hive/Spark…)
• Data Persistence
• Common Services: metadata management, automation services, deployment (jobs and others), error handling, clustering and capacity
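The transform routines listed above (mappings, lookups, calculations, aggregation) can be illustrated with a small, single-process sketch. It assumes hypothetical sales rows with `prod_cd`, `qty` and `unit_price` fields and a hypothetical `CATEGORY_LOOKUP` reference table; on a cluster, the same steps would typically run as pushdown jobs rather than a Python loop.

```python
from collections import defaultdict

# Hypothetical reference data for the lookup step (product code -> category).
CATEGORY_LOOKUP = {"P1": "Electronics", "P2": "Grocery"}


def transform(records: list[dict]) -> dict[str, float]:
    """Apply mapping, lookup, calculation and aggregation routines to raw sales rows."""
    revenue_by_category = defaultdict(float)
    for rec in records:
        # Mapping: normalize the raw source code to the target format.
        code = rec["prod_cd"].strip().upper()
        # Lookup: enrich with reference data; unmatched codes go to "Unknown".
        category = CATEGORY_LOOKUP.get(code, "Unknown")
        # Calculation: derive revenue from quantity and unit price.
        revenue = rec["qty"] * rec["unit_price"]
        # Aggregation: total revenue per category.
        revenue_by_category[category] += revenue
    return dict(revenue_by_category)


rows = [
    {"prod_cd": " p1 ", "qty": 2, "unit_price": 5.0},
    {"prod_cd": "P2", "qty": 1, "unit_price": 3.0},
    {"prod_cd": "X9", "qty": 4, "unit_price": 1.0},
]
print(transform(rows))  # {'Electronics': 10.0, 'Grocery': 3.0, 'Unknown': 4.0}
```

Routing unmatched lookups to an explicit bucket, instead of dropping them, keeps the aggregate totals reconcilable with the source.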
Characteristics
• The data ingestion design principles are based on integrating raw data characterized by extreme scale and variability, with provisions for both 'data at rest' (batch) and 'data in motion' (low latency).
• The framework combines traditional data integration methodologies based on the Extract-Transform-Load approach and extends them to also process semi-structured and unstructured data elements.
• The classical model of tracking data elements through their lifecycle and providing lineage can be added to this framework.
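The distinction between 'data at rest' and 'data in motion' can be sketched with the same counting logic applied two ways: once over a complete, bounded batch, and once incrementally as events arrive. This is a conceptual illustration in plain Python, not the API of Flume, Storm, Spark or Sqoop; the event values are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_counts(events: Iterable[str]) -> Counter:
    """Data at rest: process the complete, bounded dataset in one pass."""
    return Counter(events)


def streaming_counts(events: Iterable[str]) -> Iterator[Counter]:
    """Data in motion: update running state as each event arrives and emit
    a point-in-time snapshot, so results are available with low latency."""
    state: Counter = Counter()
    for event in events:
        state[event] += 1
        yield Counter(state)  # copy, so later updates don't change earlier snapshots


clicks = ["web", "mobile", "web", "web"]
print(batch_counts(clicks))                    # Counter({'web': 3, 'mobile': 1})
snapshots = list(streaming_counts(clicks))
print(snapshots[0])                            # Counter({'web': 1})
print(snapshots[-1] == batch_counts(clicks))   # True
```

Once the stream has been fully consumed, the last snapshot equals the batch result; the streaming path trades one final answer for a sequence of early, partial ones.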
Data Acquisition and Reconciliation
Data Reconciliation (Optional)
Data reconciliation is part of data quality and ensures data integrity in the data lake. The reconciliation process checks that the data has been loaded properly, ensuring its accuracy and completeness.
• Master data: This is a fairly simple process, as master data is not subject to frequent changes. The granularity of the data remains the same in the source and the target.
• Transactional data: Reconciliation of transactional data is instrumental to the success of big data systems. Reconciliation can run on the entire dataset or on the incremental data, depending on how the data is ingested.
Separate metadata tables or files are designed specifically for reconciliation. They are populated by reconciliation queries, and reconciliation reports are generated after the data is loaded into the data lake.
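A common way to implement the reconciliation check is to compare row counts and a checksum computed on both source and target. The sketch below assumes rows arrive as tuples and that both sides are serialized identically (here with `str()` and a `|` delimiter, an assumption); the returned dictionary is the kind of record that could be written to the reconciliation metadata tables described above.

```python
import hashlib
from typing import Iterable, Sequence

MASK_64 = (1 << 64) - 1


def fingerprint(rows: Iterable[Sequence[object]]) -> tuple[int, int]:
    """Return (row count, order-independent checksum) for a set of rows.

    Each row is serialized as pipe-delimited text and hashed with SHA-256; the
    first 8 bytes of each hash are summed modulo 2**64, so the checksum does not
    depend on load order but will almost certainly change if any value differs.
    """
    count, checksum = 0, 0
    for row in rows:
        digest = hashlib.sha256("|".join(map(str, row)).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest[:8], "big")) & MASK_64
        count += 1
    return count, checksum


def reconcile(source_rows, target_rows) -> dict:
    """Compare source and target fingerprints and return a reconciliation record."""
    src_count, src_sum = fingerprint(source_rows)
    tgt_count, tgt_sum = fingerprint(target_rows)
    return {
        "source_count": src_count,
        "target_count": tgt_count,
        "count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }


source = [(1, "Alice", 100.0), (2, "Bob", 250.5)]
target = [(2, "Bob", 250.5), (1, "Alice", 100.0)]  # same rows, different load order
print(reconcile(source, target))
# {'source_count': 2, 'target_count': 2, 'count_match': True, 'checksum_match': True}
```

Summing rather than XOR-ing the per-row hashes means a duplicated row changes the checksum instead of cancelling out.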
Data Acquisition
Data acquisition combines the landing zone, data validation, delta detection and data enrichment.
• Landing zone: the area where data from all source systems across the client's landscape lands for use by downstream systems.
• Data validation: the first checkpoint, where MDM-based checks are applied to the incoming source data files.
• Delta detection: applies to data feeds from source systems that can provide incremental (delta) data for regular, ongoing processing into the data lake.
• Data enrichment: processes used to enhance, refine or otherwise improve raw data. Data from various enrichment sources is pushed to the data lake via the landing zone to enrich existing data.
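Validation and delta processing can be sketched together: each incoming record is checked against master data before being merged into the current state. The `op` flag (I/U/D), the `id` key and the set of valid country codes standing in for MDM reference data are all hypothetical.

```python
from typing import Iterable


def validate(record: dict, master_countries: set[str]) -> list[str]:
    """Data validation: return the list of rule violations (empty if the record passes)."""
    errors = []
    if not record.get("id"):
        errors.append("missing id")
    if record.get("country") not in master_countries:
        errors.append(f"unknown country {record.get('country')!r}")
    return errors


def apply_delta(current: dict, delta_feed: Iterable[dict], master_countries: set[str]):
    """Merge an incremental feed into the current state, keyed on "id".

    Each delta record carries a hypothetical "op" flag: "I" (insert), "U" (update)
    or "D" (delete). Invalid records are rejected rather than loaded.
    """
    rejected = []
    for rec in delta_feed:
        op = rec.get("op")
        if op == "D":
            current.pop(rec.get("id"), None)
            continue
        errors = validate(rec, master_countries) if op in ("I", "U") else [f"unknown op {op!r}"]
        if errors:
            rejected.append((rec, errors))
            continue
        current[rec["id"]] = {k: v for k, v in rec.items() if k != "op"}
    return current, rejected


state = {"1": {"id": "1", "country": "FR"}}
feed = [
    {"op": "U", "id": "1", "country": "DE"},
    {"op": "I", "id": "2", "country": "XX"},  # fails the master-data check
    {"op": "I", "id": "3", "country": "FR"},
    {"op": "D", "id": "1"},
]
state, rejected = apply_delta(state, feed, master_countries={"FR", "DE"})
print(state)                           # {'3': {'id': '3', 'country': 'FR'}}
print([errs for _, errs in rejected])  # [["unknown country 'XX'"]]
```

Keeping rejected records together with their error list lets them be reported and reprocessed instead of silently disappearing from the feed.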
Data Distillation in the Data Lake: approach to provisioning for data consumption
Characteristics
• Uniform approach for distilling information from the data lake
• A centralized data quality engine that applies uniform data quality rules across the enterprise
• An integrated data quality function to cleanse, standardize, enrich and de-duplicate data
• A console for design, development and validation of rules
• Data quality services for integration with operational systems and MDM
• An exception management solution for resolving data issues and errors
• Data quality processes running on the data are translated into MapReduce jobs for faster processing
Diagram: Data Persistence Layer → Distillation Layer (extract, transform, aggregation (Σ), secure), supported by:
• Data Quality Engine: data profiling, data cleansing, match & merge, data enrichment
• Data Quality Console: rule manager, DQ metadata, data dashboard, exception management, data quality configurator
• Data Quality Store: exception repository, DQ mart
Functionality: Ingest data from the storage tier and convert it to structured data for easier analysis by downstream applications. This is done by extracting, transforming and aggregating high-quality data from the data lake and making it available to analytical and reporting applications. Transformation also involves data quality checks and corrections, such as profiling, validating and cleansing structured and unstructured data based on business rules. Data is distilled (or prepared) on a per-function basis and made available for consumption, consistent with the design practice of 'manage data centrally and provision locally'.
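The cleanse, standardize and match & merge steps can be illustrated in a few lines. This single-process sketch uses hypothetical customer records, a hypothetical match rule (same email means same customer) and a simple survivorship rule (fill blanks only); records that fail a rule go to an exception list rather than being dropped, mirroring the exception management component. A production engine would express the same grouping by match key as distributed jobs.

```python
import re


def standardize(rec: dict) -> dict:
    """Cleansing/standardization: collapse whitespace, title-case names, lower-case emails."""
    name = re.sub(r"\s+", " ", rec.get("name", "")).strip().title()
    email = rec.get("email", "").strip().lower()
    return {**rec, "name": name, "email": email}


def distill(records: list[dict]) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Standardize records, route rule failures to an exception list, then match & merge.

    Match rule (assumption): two records with the same email are the same customer.
    Merge rule (assumption): keep existing values and fill only blank fields.
    """
    golden: dict[str, dict] = {}
    exceptions: list[tuple[dict, str]] = []
    for rec in map(standardize, records):
        if "@" not in rec["email"]:
            exceptions.append((rec, "invalid email"))
            continue
        survivor = golden.setdefault(rec["email"], {})
        for field, value in rec.items():
            if value and not survivor.get(field):
                survivor[field] = value
    return list(golden.values()), exceptions


customers = [
    {"name": "  ada   lovelace ", "email": "ADA@Example.com ", "phone": ""},
    {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "555-0100"},
    {"name": "Bob", "email": "bob-at-example"},
]
golden, exceptions = distill(customers)
print(golden)            # [{'name': 'Ada Lovelace', 'email': 'ada@example.com', 'phone': '555-0100'}]
print(exceptions[0][1])  # invalid email
```

Standardizing before matching matters: the two Ada records only share a match key after the email has been trimmed and lower-cased.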
Data Persistence Layer: Schema on Read & Distill on Demand
Diagram: a Hadoop Distributed File System (HDFS) storage cluster/rack with a NameNode, DataNodes with replication, and a Job/Task Tracker
Characteristics
• Deliver a single, comprehensive view of all data across functional areas to support deep analysis
• A multi-tiered data lake whose tiers serve distinct functions, e.g., landing, staging and curated stores
• A landing area containing both traditional and non-traditional data, characterized by the attributes of value, veracity, volume, velocity and variety
• Eliminate the need for upfront schema design and rigid pre-configured models
• Easy, cost-effective configuration to scale up and scale down
• Store everything, distill on demand
Data Lake tiers: Landing → Staging → Curated, fed by Data Ingestion, with Audit, Metadata and Search services
Functionality: Create a single repository for information and deliver a single, silo-less store that handles all types of data for all reporting, analysis and discovery requirements.
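Schema on read means the landing tier keeps records as received and a schema is imposed only when a consumer reads them. A minimal sketch, assuming newline-delimited JSON in the landing tier and a hypothetical `SALES_SCHEMA` chosen by one consumer; another consumer could read the same raw lines with a different schema.

```python
import json

# Landing tier: raw records kept exactly as received; nothing is enforced on write.
RAW_LANDING = [
    '{"id": 1, "amount": "12.50", "ts": "2016-05-01"}',
    '{"id": 2, "amount": 7, "channel": "web"}',
    "not json at all",
]

# Hypothetical read-time schema chosen by one consumer: field name -> type cast.
SALES_SCHEMA = {"id": int, "amount": float, "channel": str}


def read_with_schema(lines: list[str], schema: dict) -> tuple[list[dict], int]:
    """Apply a schema at read time: parse, project onto the schema's fields, cast.

    Missing fields become None; lines that cannot be parsed or cast are counted
    as rejects instead of stopping the read.
    """
    rows, rejects = [], 0
    for line in lines:
        try:
            obj = json.loads(line)
            if not isinstance(obj, dict):
                raise ValueError("expected a JSON object")
            row = {f: (cast(obj[f]) if obj.get(f) is not None else None) for f, cast in schema.items()}
        except (ValueError, TypeError):  # json.JSONDecodeError is a ValueError subclass
            rejects += 1
            continue
        rows.append(row)
    return rows, rejects


rows, rejects = read_with_schema(RAW_LANDING, SALES_SCHEMA)
print(rows)     # [{'id': 1, 'amount': 12.5, 'channel': None}, {'id': 2, 'amount': 7.0, 'channel': 'web'}]
print(rejects)  # 1
```

The raw `ts` field is ignored by this consumer but still available in the landing tier, which is the practical payoff of "store everything, distill on demand".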
Approach to Data Provisioning
Diagram (Data Access Layer): data provisioning through a discovery platform / sandboxes, analytical views and data virtualization; data dissemination through subject-specific marts (HR Mart 1 to HR Mart 4)
Characteristics
• The Data Marts & Aggregate Structures layer includes subject-specific data mart structures that various tools can use to retrieve data and information. This layer also supports user-specific sandboxes where power users can perform activities such as data mining, identifying data patterns, and running analytical and statistical models with various tools.
• If required, there will be multiple versions of the subject areas for different production streams.
• Data marts and aggregate structures such as summary tables are created based on business and performance requirements. Wherever possible, database-managed aggregates such as computed views and indexes are created to reduce ETL-based data movement.
• Data virtualization addresses combining datasets from multiple data stores across the various layers of the data lake stack.
Functionality: Provision datasets to create various combinations of custom views, by specific function or department as well as for cross-functional access.
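Data virtualization can be sketched as a query-time view over two stores: an HR mart (an in-memory SQLite table here) and a curated file in the data lake (an inline CSV string). Table, column and department names are hypothetical; a real deployment would use a virtualization or SQL-on-Hadoop engine rather than hand-written Python.

```python
import csv
import io
import sqlite3

# Stand-in for a curated file in the data lake (hypothetical content).
LAKE_TRAINING_CSV = "emp_id,training_hours\n1,12\n2,30\n3,8\n"


def open_hr_mart() -> sqlite3.Connection:
    """Stand-in for an HR data mart held in a relational store (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT)")
    con.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Sales"), (2, "IT"), (3, "Sales")])
    return con


def training_hours_by_dept(mart: sqlite3.Connection, lake_csv: str) -> dict[str, float]:
    """Virtual view: combine the mart and the lake file at query time.

    Both sources are read in place on each call; no combined copy is persisted,
    so the view always reflects the current contents of both stores.
    """
    hours = {int(r["emp_id"]): float(r["training_hours"])
             for r in csv.DictReader(io.StringIO(lake_csv))}
    totals: dict[str, float] = {}
    for emp_id, dept in mart.execute("SELECT emp_id, dept FROM employee ORDER BY emp_id"):
        totals[dept] = totals.get(dept, 0.0) + hours.get(emp_id, 0.0)
    return totals


print(training_hours_by_dept(open_hr_mart(), LAKE_TRAINING_CSV))  # {'Sales': 20.0, 'IT': 30.0}
```

The trade-off versus a physical mart is that every query pays the cost of reading both sources, in exchange for avoiding ETL-based data movement.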
© David Feinleib
CWIN17 India / Bigdata architecture, Yashowardhan Sowale

Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Denodo
 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Jeffrey T. Pollock
 
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJIoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJDaniel Madrigal
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...DataWorks Summit
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data BSP Media Group
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionMongoDB
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonReal-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonSynerzip
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshare8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshareJulianna DeLua
 
Using the information server toolset to deliver end to end traceability
Using the information server toolset to deliver end to end traceabilityUsing the information server toolset to deliver end to end traceability
Using the information server toolset to deliver end to end traceabilityIBM Sverige
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Chain Sys Corporation
 

Semelhante a CWIN17 India / Bigdata architecture yashowardhan sowale (20)

Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!
 
Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks
 
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJIoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJ
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
 
Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reduction
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
 
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughtonReal-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshare8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshare
 
Using the information server toolset to deliver end to end traceability
Using the information server toolset to deliver end to end traceabilityUsing the information server toolset to deliver end to end traceability
Using the information server toolset to deliver end to end traceability
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
 
KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
 
DataPlatform.pptx
DataPlatform.pptxDataPlatform.pptx
DataPlatform.pptx
 


CWIN17 India / Big Data Architecture, Yashowardhan Sowale

  • 1. Big Data Architecture Overview
    Copyright © Capgemini 2016. All Rights Reserved.
  • 2. Gartner Hype Cycle – Emerging Technologies
  • 3. Benefits
  • 4. Big Data and its Dimensions
    Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
    - Variety: Manage the complexity of data in many different structures, ranging from relational data to logs to raw text.
    - Velocity: Streaming data and large-volume data movement.
    - Volume: Scale from terabytes to petabytes (1K TB) to zettabytes (1B TB).
    - Veracity: Having a lot of data arriving in different volumes at high speed is worthless if that data is incorrect. Organizations need to ensure that both the data and the analyses performed on it are correct.
    - Value: Discovering value from multichannel datasets.
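The volume figures assume decimal (SI) prefixes, where each step up the scale is a factor of 1,000. A quick check in Python, assuming 1 TB = 10^12 bytes (binary units such as TiB and PiB would use factors of 1,024 instead, so the ratios would differ slightly):

```python
# Decimal (SI) storage units expressed in bytes (assumption: not binary TiB/PiB/ZiB).
TB = 10**12
PB = 10**15
ZB = 10**21


def in_terabytes(num_bytes: int) -> int:
    """Return how many whole decimal terabytes fit in num_bytes."""
    return num_bytes // TB


if __name__ == "__main__":
    print(in_terabytes(PB))  # 1000        -> "1K TB"
    print(in_terabytes(ZB))  # 1000000000  -> "1B TB"
```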
  • 5. Applications for Big Data Analytics
    Homeland security, finance, smarter healthcare, multi-channel sales, telecom, manufacturing, traffic control, trading analytics, fraud and risk, log analysis, search quality, retail (churn).
  • 6. General reference architecture for Big Data Analytics
    Flow: Source data → Process → Information → Analyze → Insight → Act → Value
    - Source data (internal and external): IT-managed applications (ERP, SCM, CRM); master and reference data; business-owned informal data; documents, mail, images, voice and video; web and mobile apps; B2B; Internet, social and Internet of Things (machine, sensor) data; third-party data (market, weather, climate, geolocation); open data.
    - Process: ingest, catalog, stream, store, prepare, refine and blend, mask, manage lifecycle.
    - Information: data asset descriptions; processed data (measures and KPIs, dimensions and master data); granular data (events, context information).
    - Analyze: search (What is relevant?), explorative (How does it work?), descriptive (What happened?), diagnostic (Why did it happen?), predictive (What will happen?), prescriptive (How to act next?).
    - Insight: customer experience, operational process optimization, risk and fraud, disruptive business models.
    - Act: business applications (customer campaign, trigger activity); business processes (trigger event, adjust process); decision makers (approve or reject business opportunities, develop new business models and products).
    - Value (business performance and performance improvement): customer profitability, operational cost cutting, risk prevention, market share increase.
    - Manage (across all stages): data governance and security, data privacy, compliance, collaboration, value generation, program delivery, data-driven culture, information strategy, skill development, master data management, metadata management, data quality management, operations and SLAs, orchestration.
  • 7. The Business Data Lake (BDL) is also aligned with our principles
    Principles:
    - Unleash data and insights as-a-service
    - Make insight-driven value a crucial business KPI
    - Empower your people with insights at the point of action
    - Develop an enterprise data science culture
    - Master governance, security and privacy of your data assets
    - Enable your data landscape for the flood coming from connected people and things
    - Embark on the journey to insights within your business and technology context
    How the BDL reflects them:
    - It concerns both business and (disruptive) technology.
    - It works with high volumes of all kinds of data.
    - It integrates Unified Data Management capabilities to manage governance, security, privacy, MDM, RDM, etc.
    - It also comes with a new, specific mindset that has to be addressed at the enterprise level.
    - We (Capgemini) intend to offer the BDL as-a-service.
    - Bringing business value by delivering insights at the point of action is the motto of the BDL.
  • 8. Business Data Lake Reference Architecture - Conceptual
    Characteristics:
    - Store anything; analyze everything
    - Blend traditional data elements with new data types
    - Manage centrally, govern locally
    - Future-proof design
    - Highly scalable and available
    Layers and components:
    - Sources: structured data sources; transactional systems (RES/CRM); un/semi-structured data sources
    - Data Ingestion Layer: extract & load; streams
    - Data Lake Layer: landing
    - Data Distillation Layer: Data Quality Governance Framework (business rules, transformation, aggregation); Customer Master (CRM)
    - Data Provisioning Layer and Data Dissemination Layer
    - Data Access Layer: self-service data visualization and reporting; advanced analytics
    - API Layer
    - Distributed Compute Layer / Services; Distributed Storage Layer
    - Other components: ODS; sandbox; SQL-on-Hadoop; in-memory grid; data virtualization or blending; marts (HR Mart 1, HR Mart 2)
    - Cross-cutting: data governance (audit, lineage); metadata management; data governance integration; data security (authentication, authorization, Kerberos)
  • 9. Business Data Lake Reference Architecture - Logical
    The same layers as the conceptual view, implemented with the following technologies:
    - Platform: Hortonworks HDP 2.5 or later, with Spark, HBase and Hive
    - Data integration: Talend 6.3 or later
    - Streaming ingestion: Kafka; Spark Streaming/Storm
    - Data marts: HBase/Hive data marts; Redshift
    - Data access: Zeppelin; RESTful service (API layer); self-service data visualization and reporting; advanced analytics
    - Data governance and metadata management: Atlas
    - Data security: Ranger, Knox
  • 10. Detailed layer breakdown
  • 11. Reference architecture for data ingestion - Indicative
    Functionality: Ingest data from a variety of sources, with varying latency, into the data lake.
    - Data sources: structured, semi-structured, unstructured
    - Data sourcing (Data Integration Services): S/FTP-based push (logs, text, other file-based data); changed data management (delta extracts, event management); source extraction services (XML, relational, other extracts)
    - Data state: data at rest (ETL pushdown, batch using standard DI tools or Sqoop); data in motion (fast data, processed via tools such as Flume, Storm or Spark)
    - Data transformation (Transformation Services): fast data manipulation (sorting, file merges, joins, file splitting, others); transform routines (aggregation, mappings, lookups, calculations, others); big data transformations (user-defined functions or custom MapReduce code in Java, Python, etc. for complex logic); ETL pushdown processing (executing mapping jobs on the Hadoop cluster over HDFS/Hive/Spark…)
    - Common services: metadata management; automation services; deployment (jobs and others); error handling; clustering and capacity
    - Data persistence
    Characteristics:
    - The data ingestion design principles are based on integrating raw data characterized by extreme scale and variability, with provisions for both "data at rest" (batch) and "data in motion" (low latency).
    - The framework combines traditional data integration methodologies based on the Extract-Transform-Load (ETL) approach and extends them to also process semi-structured and unstructured data elements.
    - The classical model of tracking data elements through their lifecycle and providing lineage can be added to this framework.
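"Changed data management (delta extracts)" can be illustrated with a watermark-based incremental pull. The sketch below is one common technique, not necessarily the one the deck intends: it assumes a hypothetical source table `orders` whose `updated_at` column the source system maintains on every insert and update, and it uses an in-memory SQLite database as a stand-in for the source.

```python
import sqlite3


def make_demo_source():
    """Build an in-memory stand-in for a source system (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2016-01-01T00:00:00"), (2, 25.5, "2016-01-02T00:00:00")],
    )
    return conn


def delta_extract(conn, last_watermark):
    """Return rows changed since last_watermark, plus the new watermark.

    ISO-8601 strings in one fixed format sort chronologically, so a plain
    string comparison is enough for this sketch.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


if __name__ == "__main__":
    src = make_demo_source()
    # First run: an "epoch" watermark turns the delta extract into a full load.
    batch, wm = delta_extract(src, "1970-01-01T00:00:00")
    print(len(batch), wm)  # 2 2016-01-02T00:00:00
    # The source updates order 2 and inserts order 3.
    src.execute(
        "UPDATE orders SET amount = 30.0, updated_at = '2016-01-03T00:00:00' WHERE id = 2"
    )
    src.execute("INSERT INTO orders VALUES (3, 5.0, '2016-01-03T12:00:00')")
    batch, wm = delta_extract(src, wm)
    print([row[0] for row in batch], wm)  # [2, 3] 2016-01-03T12:00:00
```

A timestamp watermark is simple but can miss rows committed late with a timestamp at or below the stored watermark, and it cannot see hard deletes; log-based change data capture avoids both issues at the cost of more infrastructure.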
  • 12. Data Acquisition and Reconciliation
    Data Acquisition: a combination of the landing zone and data validation, delta detection, and data enrichment.
    - Landing zone: the area where data from all source systems across the client's landscape lands, for use by downstream systems.
    - Data validation: the first checkpoint, where MDM-based checks are applied to incoming source data files.
    - Delta detection: applies to data feeds from source systems that can provide incremental (delta) data for regular, ongoing processing into the data lake.
    - Data enrichment: processes used to enhance, refine or otherwise improve raw data. Data from various enrichment sources is pushed to the data lake via the landing zone to enrich existing data.
    Data Reconciliation (optional): reconciliation is part of data quality and ensures data integrity in the data lake. The reconciliation process checks whether data has been loaded properly, ensuring its accuracy and completeness.
    - Master data: a fairly simple process, since master data is not subject to frequent changes and its granularity is the same in the source and the target.
    - Transactional data: reconciliation of transactional data is instrumental to the success of big data systems. It can run on the entire data set or on the incremental data, depending on how the data is ingested.
    - Separate metadata tables or files are designed specifically for reconciliation. They are populated by reconciliation queries, and reconciliation reports are generated after data is loaded into the data lake.
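The reconciliation step above can be sketched as a set of count and control-total checks whose results become rows of a reconciliation metadata table. This is a minimal illustration, assuming records are plain dicts with a hypothetical `id` key and numeric `amount` column; a real implementation would run the equivalent queries against the source system and the data lake.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ReconCheck:
    """One row of a (hypothetical) reconciliation metadata table."""
    batch_id: str
    check: str
    source_value: object
    target_value: object

    @property
    def status(self) -> str:
        return "PASS" if self.source_value == self.target_value else "FAIL"


def control_total(rows, column):
    # Decimal avoids float rounding noise when summing monetary amounts.
    return sum((Decimal(str(r[column])) for r in rows), Decimal("0"))


def reconcile(batch_id, source_rows, target_rows, key="id", amount="amount"):
    """Compare completeness (counts, keys) and accuracy (totals) of one load."""
    return [
        ReconCheck(batch_id, "row_count", len(source_rows), len(target_rows)),
        ReconCheck(batch_id, "distinct_keys",
                   len({r[key] for r in source_rows}),
                   len({r[key] for r in target_rows})),
        ReconCheck(batch_id, "amount_total",
                   control_total(source_rows, amount),
                   control_total(target_rows, amount)),
    ]


def missing_keys(source_rows, target_rows, key="id"):
    """Keys present in the source extract but absent from the data lake."""
    return sorted({r[key] for r in source_rows} - {r[key] for r in target_rows})


if __name__ == "__main__":
    source = [{"id": 1, "amount": 10.10}, {"id": 2, "amount": 5.25}, {"id": 3, "amount": 7.00}]
    target = [{"id": 1, "amount": 10.10}, {"id": 2, "amount": 5.25}]
    for c in reconcile("2016-11-02-orders", source, target):
        print(c.batch_id, c.check, c.source_value, c.target_value, c.status)
    print("missing keys:", missing_keys(source, target))
```

For incremental loads, the same checks would be run per delta batch, which is why each result carries a `batch_id`.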
  • 13. Data Distillation in the Data Lake: approach to provisioning for data consumption
    Functionality: The ability to ingest data from the storage tier and convert it to structured data for easier analysis by downstream applications. This is done through a combination of extraction, transformation and aggregation of high-quality data from the data lake, making it available to analytical and reporting applications. Transformation also involves data quality checks and corrections, such as profiling, validating and cleansing structured and unstructured data based on business rules. Data is distilled (or prepared) on a per-function basis and made available for consumption, consistent with the design practice of "manage data centrally and provision locally".
    Characteristics:
    - A uniform approach for distilling information from the data lake
    - A centralized data quality engine that applies uniform data quality rules across the enterprise
    - An integrated data quality function to cleanse, standardize, enrich and de-duplicate data
    - A console for the design, development and validation of rules
    - Data quality services for integration with operational systems and MDM
    - An exception management solution for resolving data issues and errors
    - Data quality processes running on the data are translated into MapReduce for faster processing
    Components (from the Data Persistence Layer into the Distillation Layer): extract, transform, aggregation (Σ), secure; data quality store; data quality console; data quality engine (data profiling, data cleansing, match & merge, data enrichment); rule manager; DQ metadata; data dashboard; exception management; data quality configurator; exception repository; DQ mart.
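The data quality engine's cleanse, validate, match & merge and exception-management steps can be sketched in a few functions. This is a single-process illustration only (the deck's design translates these processes into MapReduce); the record layout, the two business rules and the email-based match key are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical business rule: a deliberately loose email shape check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class DQResult:
    clean: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)  # (raw record, reason) pairs


def cleanse(rec: dict) -> dict:
    """Standardize: trim whitespace, title-case names, lower-case emails."""
    return {
        "id": rec.get("id"),
        "name": (rec.get("name") or "").strip().title(),
        "email": (rec.get("email") or "").strip().lower(),
    }


def validate(rec: dict):
    """Return a failure reason, or None if the record passes all rules."""
    if not rec["name"]:
        return "missing name"
    if not EMAIL_RE.match(rec["email"]):
        return "invalid email"
    return None


def run_dq(records) -> DQResult:
    result = DQResult()
    seen = {}  # match key -> index of the surviving record in result.clean
    for raw in records:
        rec = cleanse(raw)
        reason = validate(rec)
        if reason:
            # Route failures to the exception repository instead of dropping them.
            result.exceptions.append((raw, reason))
            continue
        key = rec["email"]  # naive match key for match & merge
        if key in seen:
            # Merge: keep the first record, fill its empty fields from the duplicate.
            survivor = result.clean[seen[key]]
            for k, v in rec.items():
                if not survivor.get(k):
                    survivor[k] = v
        else:
            seen[key] = len(result.clean)
            result.clean.append(rec)
    return result
```

Keeping rejected records together with a reason, rather than discarding them, is what allows an exception management process to resolve them later and re-submit.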
  • 14. Data Persistence Layer: Schema on Read & Distill on Demand
    Functionality: Create a single repository for information and deliver a single, silo-less store that handles all types of data for all reporting, analysis and discovery requirements.
    Characteristics:
    - Deliver a single, comprehensive view of all data across functional areas, to conduct deep analysis
    - A multi-tiered data lake that serves distinct functions, e.g., landing, staging and curated stores
    - A landing area containing both traditional and non-traditional data, characterized by the attributes of value, veracity, volume, velocity and variety
    - Eliminate the need for upfront schema design and rigid, pre-configured models
    - Easy and cost-effective configuration to scale up and scale down
    - Store everything, distill on demand
    Components: data ingestion into the data lake zones (landing, staging, curated), plus audit, metadata and search; storage on the Hadoop Distributed File System (HDFS), with a NameNode, DataNodes, replication, JobTracker/TaskTracker, and storage clusters/racks.
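"Schema on read" means raw records are stored exactly as they arrive and a schema is applied only when a consumer reads them, so different consumers can impose different schemas on the same data. A minimal sketch, assuming the landing zone holds JSON lines (the field names and the two consumer schemas are hypothetical):

```python
import json
from datetime import date

# Raw events land exactly as received; no schema is enforced on write.
RAW_LANDING = [
    '{"user": "u1", "amount": "12.50", "day": "2016-11-02", "channel": "web"}',
    '{"user": "u2", "amount": 7, "day": "2016-11-03"}',
    '{"user": "u3", "note": "free-text record with no amount"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: project fields and coerce their types.

    `schema` maps field name -> converter; fields missing from a record
    are returned as None instead of failing the whole read.
    """
    for line in lines:
        raw = json.loads(line)
        yield {
            name: (convert(raw[name]) if name in raw else None)
            for name, convert in schema.items()
        }


# Two consumers read the same raw data through different schemas.
finance_schema = {"user": str, "amount": float, "day": date.fromisoformat}
marketing_schema = {"user": str, "channel": str}

if __name__ == "__main__":
    for row in read_with_schema(RAW_LANDING, finance_schema):
        print(row)
    for row in read_with_schema(RAW_LANDING, marketing_schema):
        print(row)
```

The trade-off is that data errors surface at read time rather than at load time, which is one reason the deck pairs the persistence layer with a distillation layer that curates data for consumption.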
  • 15. Approach to Data Provisioning
    Functionality: Provision datasets to create various combinations of custom views, by specific function or department, as well as for cross-functional access.
    Components: data access layer; data provisioning (discovery platform/sandboxes, analytical views, data virtualization); data dissemination (HR Mart 1 to HR Mart 4).
    Characteristics:
    - The data marts and aggregate structures layer will include subject-specific data mart structures that various tools can use to retrieve data and information. This layer will also support user-specific sandboxes for power users to perform activities such as data mining, identifying data patterns, and running analytical and statistical models with various tools.
    - If required, there will be multiple versions of the subject areas for different production streams.
    - Data marts and aggregate structures such as summary tables will be created based on business and performance requirements. Where possible, database-managed aggregates such as computed views and indexes will be created to reduce ETL-based data movement.
    - Data virtualization will combine datasets from multiple data stores across the various layers of the data lake stack.
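A database-managed aggregate can be sketched as a view over a mart table: the database computes the summary when it is queried, so no separate ETL job has to keep a summary table in sync. The example uses Python's built-in `sqlite3` purely as a stand-in (the logical architecture names HBase/Hive data marts and Redshift), and the `employee_fact` table and `dept_summary` view are hypothetical.

```python
import sqlite3


def build_hr_mart(conn):
    """Create a tiny, hypothetical HR mart with an index and a summary view.

    The view is evaluated by the database on read, so it always reflects
    the current contents of employee_fact without an ETL refresh.
    """
    conn.executescript("""
        CREATE TABLE employee_fact (
            emp_id     INTEGER PRIMARY KEY,
            department TEXT NOT NULL,
            salary     REAL NOT NULL
        );
        CREATE INDEX idx_employee_dept ON employee_fact(department);
        CREATE VIEW dept_summary AS
            SELECT department,
                   COUNT(*)    AS headcount,
                   AVG(salary) AS avg_salary
            FROM employee_fact
            GROUP BY department;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_hr_mart(conn)
    conn.executemany(
        "INSERT INTO employee_fact VALUES (?, ?, ?)",
        [(1, "Finance", 50000), (2, "Finance", 70000), (3, "IT", 65000)],
    )
    for row in conn.execute("SELECT * FROM dept_summary ORDER BY department"):
        print(row)  # ('Finance', 2, 60000.0) then ('IT', 1, 65000.0)
```

When the aggregate is too expensive to compute on every query, a materialized summary table refreshed on a schedule is the usual alternative, which matches the deck's "based on business and performance requirements" caveat.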
  • 16. © David Feinleib