White Paper
Abstract
Big Data use cases are maturing, and customers are using Big
Data to improve both top-line revenue and the bottom line. As a result,
enterprise readiness and data management needs are becoming
increasingly important. Many organizations are working to make
their Hadoop & NoSQL environments enterprise ready. Data
lakes are the new repository that is becoming a single source of
truth. Today, there is a lack of operational data protection, and
enterprise readiness is becoming the main inhibitor to Big Data
adoption. EMC is providing an effective data protection strategy
that addresses these Big Data challenges.
January 2015
Data Lake Protection
A Technical Review
Copyright © 2015 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as
of its publication date. The information is subject to change
without notice.
The information in this publication is provided “as is.” EMC
Corporation makes no representations or warranties of any kind
with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or
fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in
this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.
Part Number h13932
Contents
Executive Summary
The challenge
Solution overview
Introduction
Audience
Background
What is a data lake?
Big Data use cases are maturing
Data protection is a core component of any data lake
EMC Data Domain protection storage high level overview
EMC Isilon scale-out NAS storage high level overview
EMC Elastic Cloud Storage (ECS) high level overview
EMC solutions for data lake protection
Overview of Hadoop Distributed Copy data protection
Overview of Isilon snapshots managed by NetWorker Snapshot Management
EMC target storage options
Data lake protection using Hadoop Distributed Copy
Hadoop Distributed Copy data protection to Data Domain
Benefits of using Hadoop Distributed Copy to Data Domain
Hadoop Distributed Copy data protection to Isilon
Benefits of using Hadoop Distributed Copy to Isilon
Hadoop Distributed Copy data protection to ECS
Benefits of using Hadoop Distributed Copy to ECS
Data lake protection using Isilon snapshots managed by EMC NetWorker
Isilon snapshots managed by NetWorker Snapshot Management to Data Domain
Benefits of using NetWorker managed Isilon snapshots to Data Domain
Isilon snapshots managed by NetWorker Snapshot Management to Isilon
Benefits of using NetWorker managed Isilon snapshots to Isilon
Isilon snapshots managed by NetWorker Snapshot Management to ECS
Benefits of using NetWorker managed Isilon snapshots to ECS
Customer benefits
Conclusion
Executive Summary
Big Data use cases are maturing, and customers are using Big Data to improve both
top-line revenue and the bottom line. As a result, the need for enterprise readiness and data
management is becoming increasingly important.
The challenge
Today many organizations are working hard to make Hadoop and NoSQL enterprise
ready. Hadoop has moved beyond batch processing, and SQL on Hadoop has become the new
battlefield. Data lakes are the new repository and are becoming the single source of
truth. Most enterprises need to bridge the gap between the 2nd Platform and Big Data. Today
there is a lack of operational Big Data protection and management, and enterprise
readiness is becoming the main inhibitor to Big Data adoption. Without proper data
protection, operationalizing Big Data analytics will be a nightmare, and
businesses will not be able to put Big Data to effective use.
Solution overview
EMC® is providing an effective data protection strategy to address the challenges
associated with Big Data. This paper discusses two EMC Data Lake protection
solutions that provide:
• Enterprise readiness for compute and storage stacks
• Cost-effective backup for operational recovery and disaster recovery leveraging
existing infrastructure, people, and processes
• Support for different data lake deployments with different source and
destination storage (Data Domain, Isilon, ECS) and different data
movers (DistCp, snapshots, NDMP, etc.)
Introduction
The purpose of this white paper is to explain why data lake protection has become
critically important and to describe the EMC Data Lake protection solutions that can
help customers achieve higher degrees of business value and operational efficiency
with their important structured, semi-structured, and unstructured data in a data lake.
Audience
This white paper is intended for IT & Hadoop Administrators, systems engineers,
partners and members of the EMC and partner professional services community who
are looking to better understand and implement EMC Data Lake protection options.
Background
What is a data lake?
In simple terms, a data lake is the evolution of an Enterprise Data Warehouse (EDW)
into an active repository for structured, semi-structured, and unstructured data that
retains all attributes against which you can run important business analytics. Said
another way, data lakes are emerging as the primary landing zone for many disparate
data sources, such as clickstreams, weblogs, and sensor data. The data lake is formed by
the combination of Hadoop and NoSQL. One of the main characteristics of an
effective data lake is that it doesn’t require an upfront schema, which means it is
much more flexible and makes it easy to add new data sources and store them in
their native format. This flexibility allows customers to easily add and leverage many
other data sources in order to make more holistic business decisions on their data.
Big Data use cases are maturing
Big Data is transforming every business, and use cases such as EDW optimization, Big
Data archive, security and operational analytics, and 360-degree customer analytics for
targeted marketing are maturing. Large retailers are leveraging Big Data for price
optimization, financial services firms conduct fraud analysis and risk profiling, and large
pharmaceutical companies use Big Data to support drug development. Recent
advances in Hadoop have taken data lakes beyond batch processing; data lakes are now
more interactive and real-time, which is making the data lake the primary data
source.
Data protection is a core component of any data lake
As Big Data use cases and data lakes have become more business-critical, there is
now an important operational requirement for enterprise-grade data lake
protection and disaster recovery. Insufficient data protection can have costly
consequences, keeping organizations from tackling new Big Data projects and
reaching new markets. EMC is uniquely positioned to provide the critical data
protection needed by leveraging the broad solution strengths of the EMC Federation
including partners such as Pivotal.
EMC Data Domain protection storage high level overview
EMC® Data Domain® protection storage systems deliver industry-leading speed and
efficiency with throughput up to 15 TB/hour enabling more backups to complete
sooner and reducing pressure on backup windows. Data Domain systems leverage
variable-length deduplication to minimize disk requirements and ensure data lands
on disk already deduplicated. This reduces backup and archive storage requirements
by an average of 10 to 30x, making disk a cost-effective alternative to tape. Data on
disk is available online and onsite for longer retention periods and restores and
retrievals become fast and reliable. This efficiency enables Data Domain systems to
protect up to 55 PB of logical capacity for backup and archive data on a single
system.
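Variable-length deduplication chooses chunk boundaries from the data itself rather than at fixed offsets, so inserting a few bytes near the start of a file changes only the chunks around the edit instead of shifting every chunk that follows. The minimal Python sketch below illustrates that general idea with a simple gear-style rolling hash and SHA-256 fingerprints. It is an illustration under stated assumptions, not Data Domain's actual algorithm; the function names, chunk-size parameters, and random seeds are all hypothetical, and it needs Python 3.9 or later for `randbytes`.

```python
import hashlib
import random
from typing import Dict, List

# Hypothetical parameters for illustration only (not Data Domain settings).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random 64-bit value per byte value
MASK64 = (1 << 64) - 1


def chunk_boundaries(data: bytes, avg_bits: int = 12,
                     min_size: int = 1024, max_size: int = 16384) -> List[bytes]:
    """Split data into variable-length chunks whose cut points depend on content."""
    chunks: List[bytes] = []
    start, h = 0, 0
    for i, byte in enumerate(data):
        # Gear rolling hash: each shift pushes older bytes out of the 64-bit window.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        # Cut when the top avg_bits bits are zero (about 1 in 2**avg_bits positions),
        # subject to minimum and maximum chunk sizes.
        if (size >= min_size and (h >> (64 - avg_bits)) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_ratio(streams: List[bytes]) -> float:
    """Logical bytes ingested divided by unique bytes stored."""
    store: Dict[str, int] = {}  # SHA-256 fingerprint -> chunk length
    logical = 0
    for stream in streams:
        for chunk in chunk_boundaries(stream):
            logical += len(chunk)
            store.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    return logical / sum(store.values())


if __name__ == "__main__":
    base = random.Random(7).randbytes(200_000)
    edited = base[:100] + b"new record" + base[100:]  # small insert near the front
    print(f"ratio for two backups: {dedup_ratio([base, edited]):.2f}x")
```

Because the cut points resynchronize shortly after the edit, the second backup adds only a few new chunks and the ratio stays close to 2x; with fixed-size blocks, the 10-byte insert would shift every later block and almost nothing would deduplicate.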
Data Domain systems are designed as the storage of last resort – built to ensure you
can reliably recover your data with confidence. The Data Domain Data Invulnerability
Architecture is built into the Data Domain Operating System (DD OS) to provide the
industry’s best defense against data integrity issues. For additional information on
Data Domain systems please refer to the EMC Data Domain Data Sheet, and the EMC
Data Domain Data Invulnerability Architecture white paper.
EMC Isilon scale-out NAS storage high level overview
EMC® Isilon® scale-out storage solutions are designed for enterprises that want to
manage their data, not their storage. Isilon storage systems are powerful yet simple
to install, manage, and scale to virtually any size. And, unlike traditional enterprise
storage, Isilon solutions stay simple no matter how much storage capacity is added,
how much performance is required, or how business needs change in the future.
Isilon challenges enterprises to think differently about their storage, because when
they do, they’ll recognize there’s a better, simpler way – with EMC Isilon.
Through the winning combination of the groundbreaking Isilon OneFS operating
system, high-performance industry-standard hardware, and powerful data and
storage management software, Isilon provides a complete portfolio of innovative
storage solutions that drive business value for customers by optimizing mission-
critical applications, workflows, and processes. Isilon storage enables enterprise and
research organizations worldwide to manage large and rapidly growing amounts of
data in a highly scalable, easy-to-manage, and cost-effective way. Every Isilon
solution is designed to accelerate workflow productivity and reduce capital and
operational expenditures, while seamlessly scaling storage in lockstep with the
growth of mission-critical data. For additional information on Isilon please refer to
the EMC Isilon Data Sheet.
EMC Elastic Cloud Storage (ECS) high level overview
Customers are continually looking for more efficient architectures to manage today’s
hyperscale growth. Powered by EMC® ViPR®, the new Elastic Cloud Storage (ECS™)
Appliance provides a complete hyperscale storage infrastructure designed to meet
the requirements of modern applications. Regardless of the size of your organization,
the ECS Appliance lets you deliver competitive cloud storage services and grow
effortlessly. The ECS Appliance brings the cost profile, simplicity and scale of public
cloud services to anyone – with the trust, reliability and support you expect from EMC.
The ECS Appliance helps:
• Data Scientists accelerate Big Data initiatives
• Cloud Providers deliver competitive Cloud Storage services at scale
• Enterprises and software developers accelerate development
The ECS Appliance makes hyperscale storage and cloud economics viable for any size
business by combining the power of ViPR on a low-cost, high-density, scale-out
commodity hardware platform. The ECS Appliance is available in multiple form
factors that can be deployed and expanded incrementally, so each customer can
choose the right size for their immediate needs and projected growth. Customers can
now optimize their solution based on their application and access needs – giving
them the flexibility and control they want. For additional information on Elastic Cloud
Storage please refer to the EMC ECS Data Sheet.
EMC solutions for data lake protection
EMC offers two different solution options for Data Lake protection: Hadoop
Distributed Copy for deployments where compute and storage are integrated
(DAS architecture), and Isilon snapshots managed by EMC® NetWorker® Snapshot
Management for deployments where compute is separate from storage and the
storage is shared. Both solutions are illustrated in Figure 1 and explained in more
detail throughout the rest of this paper.
Figure 1: Data Lake protection solutions
Overview of Hadoop Distributed Copy data protection
This solution, illustrated on the left in Figure 1, leverages the native Distributed Copy
(DistCp) utility built into Pivotal HD (HDFS) to copy data from the integrated compute
& storage data lake to EMC Data Domain, EMC Isilon, or EMC ECS storage. This
approach leverages all of the nodes in the cluster to push the data.
Overview of Isilon snapshots managed by NetWorker Snapshot Management
Isilon snapshots managed by NetWorker Snapshot Management, illustrated on the
right in Figure 1, applies to data lake deployments where the compute and storage
are separated and the HDFS layer is running on shared storage. Because the storage
is shared, customers can leverage all the data management capabilities
that are built into that storage layer. This means customers can leverage Isilon
snapshot functionality managed by EMC NetWorker and can also perform rollovers to Data
Domain protection storage. A rollover refers to performing a backup of a snapshot to
a secondary protection storage device via NDMP. This is typically done when longer
term retention of data is a requirement.
EMC target storage options
As described in the preceding paragraphs, both EMC Data Lake protection solutions
illustrated in Figure 1 can leverage EMC Data Domain, EMC Isilon, or EMC Elastic
Cloud Storage (ECS) as target storage, depending on a number of factors including
accessibility, storage efficiency, and capacity needs. Data Domain systems are ideal
for workloads that deduplicate well (databases, files, etc.) and provide storage
savings through industry leading variable-length deduplication. Isilon is a good fit for
data sets that don’t deduplicate well (video, voice, etc.) and provides efficient, cost-
effective storage from a single system. ECS is a good fit for object workloads at cloud
(exabyte) scale.
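The selection guidance above, together with the per-solution questions later in this paper, can be summarized as a simple decision function. This is a hypothetical simplification for illustration, not an EMC sizing tool; the `suggest_target` name, the scale labels, and the protocol names are assumptions, and real sizing also depends on performance, retention, and cost.

```python
from typing import FrozenSet


def suggest_target(dedupes_well: bool, scale: str, protocols: FrozenSet[str]) -> str:
    """Map the paper's selection questions to a target (hypothetical simplification).

    scale: "TB", "PB", or "EB"; protocols: e.g. {"NFS", "SMB", "HDFS", "Object"}.
    """
    if scale not in {"TB", "PB", "EB"}:
        raise ValueError(f"unknown scale: {scale}")
    # Data Domain: data that deduplicates well, terabyte scale, NFS/SMB access.
    if dedupes_well and scale == "TB" and protocols <= {"NFS", "SMB"}:
        return "Data Domain"
    # ECS: exabyte scale or object access.
    if scale == "EB" or "Object" in protocols:
        return "ECS"
    # Otherwise Isilon, e.g. poorly deduplicating data at petabyte scale over NFS/SMB/HDFS.
    return "Isilon"


if __name__ == "__main__":
    print(suggest_target(False, "PB", frozenset({"NFS", "HDFS"})))  # Isilon
```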
Data lake protection using Hadoop Distributed Copy
Hadoop Distributed Copy data protection to Data Domain
This section provides more details about leveraging the native Distributed Copy
(DistCp) utility that ships with Hadoop to back up and restore data from
an integrated compute and storage data lake to an on-premises Data Domain protection
storage system.
The choice of using Data Domain systems as the target protection storage for this
solution will typically be made by customers based on a consideration of 3 primary
factors:
1. Will your data benefit from Data Domain variable-length deduplication and
compression?
2. Does Data Domain storage scalability (terabytes) meet your needs?
3. Does NFS/SMB (CIFS) meet your accessibility requirements?
DistCp (distributed copy) is a standard tool, included with all Hadoop distributions
and versions, that can be used to copy entire Hadoop directories. DistCp runs as a
MapReduce job to perform file copies in parallel, fully utilizing your systems if
desired. There is also an option to limit the bandwidth to control the impact on other
tasks.
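As a rough illustration of the bandwidth control described above, the Python sketch below only builds (does not run) a `hadoop distcp` command line. `-m` (maximum simultaneous copies, i.e. map tasks) and `-bandwidth` (per-map limit in MB/second) are standard DistCp options; the default of 20 maps in the helper, the cluster URI, and the `/mnt/dd_backup` mount point are hypothetical. Writing to an NFS target through a `file://` URI also assumes the export is mounted at the same path on every node that runs map tasks.

```python
import shlex
from typing import List, Optional


def distcp_command(src: str, dst: str, max_maps: int = 20,
                   bandwidth_mb_per_map: Optional[int] = None) -> List[str]:
    """Build a DistCp invocation; aggregate copy traffic is capped near maps x per-map limit."""
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    cmd = ["hadoop", "distcp", "-m", str(max_maps)]
    if bandwidth_mb_per_map is not None:
        cmd += ["-bandwidth", str(bandwidth_mb_per_map)]  # MB/second per map task
    return cmd + [src, dst]


if __name__ == "__main__":
    cmd = distcp_command("hdfs://namenode:8020/data/sales",
                         "file:///mnt/dd_backup/sales",
                         max_maps=20, bandwidth_mb_per_map=50)
    print(shlex.join(cmd))
    # 20 maps x 50 MB/s: roughly 1,000 MB/s of copy traffic at most
```

Capping both the number of maps and the per-map bandwidth is what lets a backup run alongside production jobs without saturating the network.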
This solution can be used in two different ways:
1. The first approach takes a Pivotal HD HDFS snapshot from the Hadoop application
and then moves the snapshot using Pivotal HD DistCp to the protection storage.
2. The second approach uses Pivotal HD DistCp directly to the protection storage.
The advantage of the first approach is that the application is freed up after the
snapshot finishes, because DistCp then copies a read-only, point-in-time image rather
than data that is still changing.
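The snapshot-first approach can be sketched with standard Apache Hadoop snapshot commands: `hdfs dfs -createSnapshot` creates a read-only point-in-time image that is readable under the directory's `.snapshot/<name>` path, and DistCp then copies from that frozen image while applications keep writing to the live directory. The Python sketch below only builds the command sequence. It assumes the directory was already made snapshottable with `hdfs dfsadmin -allowSnapshot`; the paths, the naming scheme, and the choice to delete the snapshot afterwards are hypothetical, and Pivotal HD tooling may wrap these steps differently.

```python
from datetime import datetime
from typing import List


def snapshot_backup_plan(dir_path: str, target: str, now: datetime,
                         keep_snapshot: bool = False) -> List[List[str]]:
    """Commands to snapshot an HDFS directory, copy the snapshot, then optionally drop it."""
    base = dir_path.rstrip("/")
    name = "backup-" + now.strftime("%Y%m%d-%H%M%S")
    plan = [
        ["hdfs", "dfs", "-createSnapshot", base, name],  # near-instant, read-only image
        ["hadoop", "distcp", f"{base}/.snapshot/{name}",  # copy the frozen image
         f"{target.rstrip('/')}/{name}"],
    ]
    if not keep_snapshot:
        plan.append(["hdfs", "dfs", "-deleteSnapshot", base, name])
    return plan


if __name__ == "__main__":
    for cmd in snapshot_backup_plan("/data/sales", "file:///mnt/dd_backup/sales",
                                    datetime(2015, 1, 15, 2, 0, 0)):
        print(" ".join(cmd))
```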
In this data lake protection scenario, the Hadoop Administrator uses Pivotal HD
DistCp to perform full backups using NFS over Ethernet to an on-premises Data Domain
system. The Data Domain system will ingest the backup data and perform variable-
length deduplication and compression.
The standard method to restore a DistCp backup from a Data Domain system to a
traditional Hadoop infrastructure is to run DistCp in the reverse direction, again using
NFS over Ethernet. This is done simply by swapping the source and target paths. You
can perform partial or full restores, and restores can be directed to the original
location or an alternate location.
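Because a restore is the same copy with the source and target swapped, it can be derived from the backup's paths. The Python sketch below builds a restore command for a full or partial restore, to either the original or an alternate location. It adds DistCp's `-overwrite` option so that files in the restore scope are replaced and, when the destination directory already exists, the contents of the source directory are copied into it rather than nested beneath it. The helper names and paths are hypothetical.

```python
from typing import List, Optional


def _join(base: str, subpath: str) -> str:
    """Append an optional relative subpath to a path or URI."""
    sub = subpath.strip("/")
    return f"{base.rstrip('/')}/{sub}" if sub else base.rstrip("/")


def restore_command(backup_src: str, backup_dst: str, subpath: str = "",
                    alternate: Optional[str] = None) -> List[str]:
    """Reverse a DistCp backup: the former target becomes the source."""
    source = _join(backup_dst, subpath)  # partial restore when subpath is given
    dest = alternate if alternate is not None else _join(backup_src, subpath)
    return ["hadoop", "distcp", "-overwrite", source, dest]


if __name__ == "__main__":
    # Partial restore of one subdirectory back to its original location.
    print(" ".join(restore_command("hdfs://namenode:8020/data/sales",
                                   "file:///mnt/dd_backup/sales",
                                   subpath="2014/q4")))
```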
Customers have the option of leveraging Data Domain replication to a separate Data
Domain system installed at a second site for additional disaster recovery protection.
DistCp restores could then be performed from the Data Domain system on the second
site for disaster recovery.
Benefits of using Hadoop Distributed Copy to Data Domain
Customers will realize several important benefits from Distributed Copy data lake
protection to Data Domain systems. First and most importantly, this Data Lake
protection solution provides enterprise-grade protection of Hadoop data against
loss or corruption. This solution also gives the Hadoop Administrator direct visibility
and control over their data lake protection.
Data Domain’s Data Invulnerability Architecture provides the ultimate in data
protection ensuring that data from your data lake can be recovered when needed and
the data can be trusted. Data Domain systems provide storage efficiency through
variable-length deduplication and compression, typically reducing storage
requirements by 10-30x. Data Domain systems are also very fast, capable of
performing backups at up to 15 TB/hour, which minimizes the time it takes to complete your
data lake backups. If Data Domain systems are used for other data protection needs,
then the same processes and expertise can be leveraged for data lake protection.
And finally, Data Domain Replicator can be leveraged for bandwidth-efficient
replication to a Data Domain system at a second site for optional disaster recovery.
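To put these figures in context, the short calculation below estimates the physical capacity consumed by retained full backups and the time one full backup takes at a given ingest rate. The 10-30x reduction and 15 TB/hour values come from this paper; the reduction is a typical range and the throughput is a maximum, so actual results depend on the data, the Data Domain model, the network, and DistCp parallelism. The 50 TB backup size, four retained copies, and 8-hour window are hypothetical inputs.

```python
def physical_tb(full_backup_tb: float, copies_retained: int, reduction: float) -> float:
    """Physical capacity after deduplication and compression for retained full backups."""
    return full_backup_tb * copies_retained / reduction


def backup_hours(full_backup_tb: float, ingest_tb_per_hour: float) -> float:
    """Hours to ingest one full backup at a sustained rate."""
    return full_backup_tb / ingest_tb_per_hour


def fits_window(full_backup_tb: float, ingest_tb_per_hour: float, window_hours: float) -> bool:
    """True if one full backup completes within the backup window."""
    return backup_hours(full_backup_tb, ingest_tb_per_hour) <= window_hours


if __name__ == "__main__":
    full_tb, copies = 50.0, 4  # e.g. four retained weekly full backups
    for ratio in (10, 30):
        print(f"{ratio}x: {physical_tb(full_tb, copies, ratio):.1f} TB physical")
    print(f"{backup_hours(full_tb, 15):.1f} h at 15 TB/hour;",
          "fits 8 h window:", fits_window(full_tb, 15, 8))
```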
Hadoop Distributed Copy data protection to Isilon
This section provides more detail about leveraging the native Distributed Copy
(DistCp) utility that ships with Hadoop to back up and restore data from an
integrated compute and storage data lake to an on-premises Isilon storage system.
The choice of using Isilon as the target storage for this solution will typically be made
by customers based on a consideration of 3 primary factors:
1. Do you already know that your data would not gain significant storage savings
from the variable-length deduplication & compression that Data Domain
systems would provide?
2. Does Isilon storage scalability (petabytes) meet your needs?
3. Does your organization have NFS/SMB (CIFS)/HDFS accessibility
requirements?
DistCp (distributed copy) is a standard tool, included with all Hadoop distributions
and versions, that can be used to copy entire Hadoop directories. DistCp runs as a
MapReduce job to perform file copies in parallel, fully utilizing your systems if
desired. There is also an option to limit the bandwidth to control the impact on other
tasks.
This solution can be used in two different ways:
1. The first approach takes a Pivotal HD HDFS snapshot from the Hadoop application
and then moves the snapshot using Pivotal HD DistCp to the target storage.
2. The second approach uses Pivotal HD DistCp directly to the target storage.
The advantage of the first approach is that the application is freed up after the
snapshot finishes.
In this data lake protection scenario, the Hadoop Administrator uses Pivotal HD
DistCp to perform full backups using NFS over Ethernet to an on-premises Isilon
system. Isilon will ingest the backup data and perform post-process deduplication
and compression.
The standard method to restore a DistCp backup from Isilon to a traditional Hadoop
infrastructure is to run DistCp in the reverse direction. This is done simply by
swapping the source and target paths. You can perform partial or full restores, and
restores can be directed to the original location or an alternate location. The backup
target files on Isilon are accessible from Hadoop applications in the same way as the
source files due to Isilon’s support for HDFS. This provides a method to use your
backup data directly, without having to first restore it to your original source Hadoop
environment, which can save you analysis time overall.
Customers have the option of leveraging Isilon replication to a separate Isilon system
installed at a second site for additional disaster recovery protection. DistCp restores
could then be performed from the Isilon system on the second site for disaster
recovery.
Benefits of using Hadoop Distributed Copy to Isilon
Customers will realize several important benefits from Distributed Copy data lake
protection to Isilon systems. First and most importantly, this Data Lake protection
solution provides enterprise-grade protection of Hadoop data against loss or
corruption. This solution also gives the Hadoop Administrator direct visibility and
control over their data lake protection.
Isilon is an ideal platform for Hadoop and other Big Data applications. It uses erasure
coding to protect data with greater than 80% storage efficiency, in contrast to
traditional HDFS, whose default three-way replication yields 33% storage efficiency.
Isilon has several classes of node types, which allows different Isilon tiers to be
optimized for particular workloads. Backing up traditional Hadoop environments to
Isilon is straightforward and provides a dense HDFS backup target.
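The efficiency comparison above follows from simple ratios: with N-way replication only 1/N of raw capacity holds unique data, while an erasure-coded stripe of k data units plus m protection units stores data in k/(k+m) of the raw capacity. The Python sketch below computes both. The 8+2 and 16+2 layouts and the 1,000 TB raw capacity are hypothetical examples; actual Isilon efficiency depends on cluster size and the protection level configured.

```python
def replication_efficiency(copies: int) -> float:
    """Fraction of raw capacity holding unique data with N-way replication."""
    return 1 / copies


def erasure_coding_efficiency(data_units: int, protection_units: int) -> float:
    """Fraction of raw capacity holding data for a k+m erasure-coded stripe."""
    return data_units / (data_units + protection_units)


if __name__ == "__main__":
    raw_tb = 1000
    layouts = {
        "HDFS 3x replication": replication_efficiency(3),
        "EC 8+2": erasure_coding_efficiency(8, 2),
        "EC 16+2": erasure_coding_efficiency(16, 2),
    }
    for label, eff in layouts.items():
        print(f"{label:20} {eff:6.1%}  usable ~{raw_tb * eff:,.0f} TB")
```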
If the customer already uses Isilon for other needs, then the same processes and
expertise can be leveraged for data lake protection. And finally, Isilon replication can
be leveraged to a second site for optional disaster recovery. DistCp restores could
then be performed from the second site Isilon system for disaster recovery.
Hadoop Distributed Copy data protection to ECS
This section provides more detail about leveraging the native Distributed Copy
(DistCp) utility that ships with Hadoop to back up and restore data from an
integrated compute and storage data lake to an on-premises Elastic Cloud Storage
Appliance.
The choice of using ECS as the target storage for this solution will typically be made
by customers based on a consideration of 3 primary factors:
1. Do you already know that your data would not gain significant storage savings
from the variable-length deduplication & compression that Data Domain
systems would provide?
2. Do you require the hyperscale (exabytes) that ECS provides?
3. Do you require Object/HDFS accessibility?
DistCp (distributed copy) is a standard tool, included with all Hadoop distributions
and versions, that can be used to copy entire Hadoop directories. DistCp runs as a
MapReduce job to perform file copies in parallel, fully utilizing your systems if
desired. There is also an option to limit the bandwidth to control the impact on other
tasks.
This solution can be used in two different ways:
1. The first approach takes a Pivotal HD HDFS snapshot from the Hadoop application
and then moves the snapshot using Pivotal HD DistCp to the target storage.
2. The second approach uses Pivotal HD DistCp directly to the target storage.
The advantage of the first approach is that the application is freed up after the
snapshot finishes.
In this data lake protection scenario, the Hadoop Administrator uses Pivotal HD
DistCp to perform full backups using NFS over Ethernet to an on-premises ECS
Appliance.
The standard method to restore a DistCp backup from ECS to a traditional Hadoop
infrastructure is to run DistCp in the reverse direction. This is done simply by
swapping the source and target paths. You can perform partial or full restores, and
restores can be directed to the original location or an alternate location.
Customers have the option of leveraging ECS replication to a separate ECS Appliance
installed at a second site for additional disaster recovery protection. DistCp restores
could then be performed from the second site ECS Appliance for disaster recovery.
Benefits of using Hadoop Distributed Copy to ECS
Customers will realize several important benefits from Distributed Copy data lake
protection to Elastic Cloud Storage. First and most importantly, this Data Lake
protection solution provides enterprise-grade protection of Hadoop data against
loss or corruption. This solution also gives the Hadoop Administrator direct visibility
and control over their data lake protection.
The ECS Appliance makes hyperscale storage and cloud economics viable for any size
business by combining the power of ViPR on a low-cost, high-density, scale-out
commodity hardware platform. The ECS Appliance can be deployed and expanded
incrementally, so you can choose the right size for your immediate needs and your
projected growth. ECS allows you to optimize your data lake protection solution
based on your applications, storage requirements, and access needs – giving you the
flexibility and control that you want.
If the customer already uses Elastic Cloud Storage for other needs, then the same
processes and expertise can be leveraged for data lake protection.
Data lake protection using Isilon snapshots managed by EMC NetWorker
Isilon snapshots managed by NetWorker Snapshot Management to Data Domain
This section provides more detail about leveraging EMC NetWorker Snapshot
Management for data lake protection in deployments where the compute and storage
are separated and the HDFS layer is running on Isilon storage. Because you are using
shared Isilon storage, you can leverage all Isilon data management capabilities that
are built into the storage layer. In this data lake protection scenario, NetWorker
manages Isilon snapshots, which are then rolled over to an on-premises Data Domain
storage system.
The choice of using Data Domain systems as the target protection storage for this
solution will typically be made by customers based on a consideration of 3 primary
factors:
1. Will your data benefit from Data Domain variable-length deduplication and
compression?
2. Does Data Domain storage scalability (terabytes) meet your needs?
3. Does NFS meet your accessibility requirements?
The NetWorker Administrator can define a single policy to automate the data
protection process, including initiating a snapshot on the data lake Isilon system and
then executing a rollover of that Isilon snapshot using NDMP Tape Server over
Ethernet to an on-premises Data Domain system. The Data Domain system will ingest
the snapshot data and perform variable-length deduplication and compression.
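The single-policy workflow described above, a snapshot followed by a rollover of that snapshot, can be modeled as an ordered pair of steps that records catalog entries. The Python sketch below is a hypothetical model for illustration only: `ProtectionPolicy`, `run_policy`, the `/ifs/data/hadoop` path, and the stubbed `take_snapshot` and `rollover` callables are not NetWorker APIs; in practice NetWorker drives the Isilon snapshot and the NDMP rollover itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class ProtectionPolicy:
    """Hypothetical snapshot-then-rollover policy (not a NetWorker object)."""
    name: str
    source_path: str
    snapshot_retention: timedelta
    rollover_retention: timedelta


def run_policy(policy: ProtectionPolicy, now: datetime,
               take_snapshot: Callable[[str, datetime], str],
               rollover: Callable[[str], str]) -> List[Dict[str, object]]:
    """Run the steps in order; if the snapshot step raises, no rollover is attempted."""
    snap_id = take_snapshot(policy.source_path, now)  # step 1: array snapshot
    saveset_id = rollover(snap_id)                    # step 2: roll the snapshot over (NDMP in this paper)
    return [  # catalog entries, later used to choose a restore source
        {"kind": "snapshot", "id": snap_id, "time": now,
         "expires": now + policy.snapshot_retention},
        {"kind": "saveset", "id": saveset_id, "time": now,
         "expires": now + policy.rollover_retention},
    ]


if __name__ == "__main__":
    policy = ProtectionPolicy("datalake-daily", "/ifs/data/hadoop",
                              timedelta(days=7), timedelta(days=90))
    entries = run_policy(policy, datetime(2015, 1, 15, 2, 0),
                         take_snapshot=lambda path, t: f"snap-{t:%Y%m%d}",
                         rollover=lambda snap: f"ss-{snap}")
    for e in entries:
        print(e["kind"], e["id"], "expires", e["expires"].date())
```

Keeping short snapshot retention and longer rollover retention in one policy mirrors the paper's split between fast local recovery and longer-term copies on Data Domain.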
NetWorker maintains catalogs for all backups, snapshots, and clones which makes
restores for this data lake protection solution simple and straightforward. NetWorker
can also manage snapshot retention. To perform a restore, the NetWorker
Administrator can simply and quickly restore from the initial snapshot, or can select
one of the NDMP backup savesets that has been rolled over to the Data Domain
system and then restore it back to the primary Isilon system using NDMP over
Ethernet. Restoring from the snapshot offers a much quicker recovery time objective (RTO),
while recovery from a backup on Data Domain provides quick access to older recovery
points retained for longer periods. NetWorker can perform full or partial restores, and
restores can be directed to the original location or an alternate location on the same
device.
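Choosing between the snapshot and a rolled-over saveset is essentially a catalog lookup: take the most recent unexpired copy at or before the desired point in time, and prefer the snapshot when both hold the same point in time because it restores faster. The Python sketch below shows that selection rule under those assumptions; `CatalogEntry`, `SPEED_RANK`, and `pick_recovery_copy` are hypothetical and do not reflect NetWorker's catalog schema or recovery logic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

# Lower value = faster restore (snapshot on the primary Isilon vs. NDMP from Data Domain).
SPEED_RANK = {"snapshot": 0, "saveset": 1}


@dataclass(frozen=True)
class CatalogEntry:
    kind: str              # "snapshot" or "saveset"
    point_in_time: datetime
    expires: datetime


def pick_recovery_copy(catalog: Iterable[CatalogEntry], wanted: datetime,
                       now: datetime) -> Optional[CatalogEntry]:
    """Latest unexpired copy not newer than `wanted`; ties go to the faster kind."""
    candidates = [e for e in catalog if e.point_in_time <= wanted and e.expires > now]
    if not candidates:
        return None
    return max(candidates, key=lambda e: (e.point_in_time, -SPEED_RANK[e.kind]))


def day(n: int) -> datetime:
    return datetime(2015, 1, n)


if __name__ == "__main__":
    catalog = [CatalogEntry("snapshot", day(10), day(17)),
               CatalogEntry("saveset", day(10), day(31)),
               CatalogEntry("saveset", day(3), day(31))]
    print(pick_recovery_copy(catalog, wanted=day(11), now=day(12)).kind)  # snapshot
    print(pick_recovery_copy(catalog, wanted=day(11), now=day(20)).kind)  # saveset
    print(pick_recovery_copy(catalog, wanted=day(5), now=day(12)).point_in_time.date())  # 2015-01-03
```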
Customers have the option of leveraging NetWorker controlled replication to a
separate Data Domain system installed at a second site for additional disaster
recovery protection. NetWorker restores could then be performed from the second
site Data Domain system for disaster recovery.
Benefits of using NetWorker managed Isilon snapshots to Data Domain
Customers will realize several important benefits from NetWorker management of Isilon
snapshots for data lake protection to a Data Domain system. First and most
importantly, this Data Lake protection solution provides enterprise-grade protection
of Hadoop data against loss or corruption and provides superior RTOs.
NetWorker Snapshot Management simplifies the data protection process by
automating both the array snapshots & the rollovers to Data Domain. This data
protection solution provides multiple recovery options including recovery from the
initial snapshot and from rollover savesets on Data Domain protection storage.
Data Domain’s Data Invulnerability Architecture provides the best-in-class data
protection ensuring that data from your data lake can be recovered when needed and
the data can be trusted. Data Domain systems provide storage efficiency through
variable-length deduplication and compression, typically reducing storage
requirements by 10-30x. Data Domain systems are also very fast, capable of
ingesting data at up to 15 TB/hour, which minimizes the time it takes to complete data lake
protection backups. If the customer already uses NetWorker or Data Domain systems for
other data protection needs, then the same processes and expertise can be leveraged
for data lake protection. And finally, NetWorker can be leveraged to manage
bandwidth-efficient Data Domain replication to a Data Domain system at a second
site for optional disaster recovery.
Isilon snapshots managed by NetWorker Snapshot Management to Isilon
This section provides more detail about leveraging EMC NetWorker Snapshot
Management for data lake protection in deployments where the compute and storage
are separated and the HDFS layer is running on Isilon storage. Because you are using
shared Isilon storage, you can leverage all Isilon data management capabilities that
are built into the storage layer. In this data lake protection scenario, NetWorker
manages Isilon snapshots, which are then replicated to a second on-premises Isilon
storage system.
The choice of using Isilon snap and replicate protection for this solution will typically
be made by customers based on a consideration of 4 primary factors:
1. Do you already know that your data would not gain significant storage savings
from the variable-length deduplication & compression that Data Domain
systems would provide?
2. Is it feasible to protect the required amount of data within
the allotted backup windows?
3. Does Isilon storage scalability (petabytes) meet your needs?
4. Does your organization have NFS/SMB (CIFS)/HDFS accessibility
requirements?
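Factor 2 reduces to simple arithmetic: the data volume divided by the sustained throughput of the protection path must fit inside the backup window. The sketch below is a minimal check under stated assumptions. The 120 TB data lake and 8-hour window are hypothetical, and the rates are placeholders for whatever your path (Isilon replication, or the 15 TB/hour peak Data Domain ingest figure cited in this paper) actually sustains in practice.

```python
def backup_hours(data_tb: float, ingest_tb_per_hour: float) -> float:
    """Hours needed to move data_tb at a sustained ingest rate."""
    if ingest_tb_per_hour <= 0:
        raise ValueError("ingest rate must be positive")
    return data_tb / ingest_tb_per_hour


def required_rate(data_tb: float, window_hours: float) -> float:
    """Sustained TB/hour needed to finish data_tb inside the window."""
    if window_hours <= 0:
        raise ValueError("window must be positive")
    return data_tb / window_hours


def fits_window(data_tb: float, ingest_tb_per_hour: float, window_hours: float) -> bool:
    """True if the transfer completes within the window at the given rate."""
    return backup_hours(data_tb, ingest_tb_per_hour) <= window_hours


# Hypothetical 120 TB full backup with an 8-hour window.
for rate in (15.0, 10.0):  # peak figure vs. an assumed effective rate
    hours = backup_hours(120, rate)
    print(f"{rate:>4} TB/h -> {hours:.1f} h, fits 8 h window: {fits_window(120, rate, 8)}")
print(f"rate needed for 8 h: {required_rate(120, 8):.1f} TB/h")
```

Planning against a measured effective rate rather than the peak figure leaves headroom for network and source-side bottlenecks.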
The NetWorker Administrator can define a single policy that automates the data
protection process, including initiating a snapshot on the data lake Isilon system and
automatically controlling the replication of that snapshot, using Isilon SyncIQ, to a
second on-premises Isilon system. The second Isilon system stores a copy of the
snapshot data replicated by NetWorker and Isilon SyncIQ.
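The idea of one policy driving several dependent steps can be sketched as an ordered list of actions that stops at the first failure, so replication runs only after a successful snapshot. This is a hypothetical model for illustration: the `Policy` and `Action` classes, the `take_snapshot` and `replicate` stand-ins, and the `/ifs/data/hadoop` path are assumptions, not the NetWorker or SyncIQ API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model of a protection policy; these classes are illustrative
# and are not the NetWorker or SyncIQ API.


@dataclass
class Action:
    name: str
    run: Callable[[dict], bool]  # returns True on success


@dataclass
class Policy:
    name: str
    actions: list[Action] = field(default_factory=list)

    def execute(self, ctx: dict) -> list[str]:
        """Run actions in order and stop at the first failure; return completed names."""
        completed = []
        for action in self.actions:
            if not action.run(ctx):
                break
            completed.append(action.name)
        return completed


def take_snapshot(ctx: dict) -> bool:
    # Stand-in for creating a snapshot of the data lake path on the source Isilon.
    ctx["snapshot"] = f"{ctx['path']}@run{ctx['run_id']}"
    return True


def replicate(ctx: dict) -> bool:
    # Stand-in for replicating that snapshot to the second Isilon system.
    if "snapshot" not in ctx:
        return False
    ctx.setdefault("target", []).append(ctx["snapshot"])
    return True


policy = Policy("data-lake-daily", [Action("snapshot", take_snapshot),
                                    Action("replicate", replicate)])
ctx = {"path": "/ifs/data/hadoop", "run_id": 1}  # hypothetical HDFS root on Isilon
print(policy.execute(ctx), ctx["target"])
```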
NetWorker maintains catalogs for all backups, snapshots, and clones, which makes
restores for this data lake protection solution simple and straightforward. NetWorker
can also manage snapshot retention. To perform a restore, the NetWorker
Administrator can restore from the initial snapshot, or select one of the
snapshots that have been replicated to the target Isilon system and restore it
back to the primary Isilon system. NetWorker can perform full or partial restores, and
restores can be directed to the original location or to an alternate location on the same
device.
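Choosing what to restore from a catalog can be sketched as: take the latest copy created at or before the point you want to return to, and when the same point in time exists both on the primary Isilon and on the replica, prefer the local snapshot, which typically restores faster because no data has to cross the network. The `CatalogEntry` record and `pick_restore_point` helper are hypothetical and do not reflect NetWorker's actual catalog schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical catalog record; NetWorker's real catalog schema is not shown here.


@dataclass(frozen=True)
class CatalogEntry:
    snapshot_id: str
    created: datetime
    location: str  # "source" (primary Isilon) or "replica" (second Isilon)


def pick_restore_point(catalog: list[CatalogEntry], before: datetime) -> Optional[CatalogEntry]:
    """Latest copy taken at or before `before`; on a tie prefer the source snapshot."""
    candidates = [e for e in catalog if e.created <= before]
    if not candidates:
        return None
    return max(candidates, key=lambda e: (e.created, e.location == "source"))


catalog = [
    CatalogEntry("snap-1", datetime(2015, 1, 18, 22), "replica"),
    CatalogEntry("snap-2", datetime(2015, 1, 19, 22), "replica"),
    CatalogEntry("snap-2", datetime(2015, 1, 19, 22), "source"),
    CatalogEntry("snap-3", datetime(2015, 1, 20, 22), "source"),
]
# Corruption detected; restore to the last copy before 03:00 on January 20.
print(pick_restore_point(catalog, datetime(2015, 1, 20, 3)))
```

If the source snapshot has already expired, the same call falls back to the replicated copy on the second Isilon system.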
In a Remote Replication scenario, NetWorker can additionally orchestrate and
manage NDMP rollover to a Data Domain system or other backup target at the remote
site, completely offloading backup from the production Isilon system. This allows for
weekly or quarterly backups of larger datasets without impacting daily production.
Benefits of using NetWorker managed Isilon snapshots to Isilon
Customers realize important benefits from NetWorker management of Isilon
snapshots for data lake protection to Isilon storage. First and most importantly, this
data lake protection solution provides enterprise-grade protection for Hadoop data
against loss or corruption and delivers superior RTOs.
NetWorker Snapshot Management simplifies the data protection process by
automating both the initial snapshots and the replication to a secondary Isilon system.
This data protection solution provides multiple recovery options, including recovery
from the initial snapshot on the source Isilon system and from replicated snapshots
on the second Isilon system. In addition, the ability to roll over to a Data Domain
system enables longer-term retention and greater protection from data corruption and
disaster. The snapshot, replication, and rollover steps can all be controlled by a
single policy.
Isilon is an ideal platform for Hadoop and other Big Data applications. It uses erasure
coding to protect data with greater than 80% storage efficiency, in contrast to the 33%
storage efficiency of traditional HDFS, which by default stores three copies of every
block. Isilon offers several classes of node types, which allows different Isilon tiers
to be optimized for particular workloads.
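The efficiency comparison follows directly from how each scheme stores data. With r-way replication only 1/r of raw capacity is usable; with an erasure-coded stripe of k data units and m parity units, k/(k+m) is usable. The sketch below works through both. The 10+2 and 16+2 stripe widths are illustrative assumptions (both tolerate two failures, like 3x replication); actual OneFS efficiency depends on cluster size and the configured protection level.

```python
from fractions import Fraction


def replication_efficiency(copies: int) -> Fraction:
    """Usable fraction of raw capacity when every block is stored `copies` times."""
    return Fraction(1, copies)


def erasure_efficiency(data_units: int, parity_units: int) -> Fraction:
    """Usable fraction of raw capacity for a stripe of data_units + parity_units."""
    return Fraction(data_units, data_units + parity_units)


def raw_capacity(usable_tb: float, efficiency: Fraction) -> Fraction:
    """Raw capacity needed to hold usable_tb at the given efficiency."""
    return Fraction(usable_tb) / efficiency


layouts = {
    "HDFS 3x replication": replication_efficiency(3),
    "erasure coding 10+2": erasure_efficiency(10, 2),
    "erasure coding 16+2": erasure_efficiency(16, 2),
}
for name, eff in layouts.items():
    print(f"{name:<22} efficiency {float(eff):.1%}, raw for 1000 TB usable: "
          f"{float(raw_capacity(1000, eff)):.0f} TB")
```

Exact fractions avoid rounding noise, so 1000 TB usable needs 3000 TB raw under 3x replication but 1125 TB raw with a 16+2 stripe.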
If a customer already uses Isilon or NetWorker for other needs, the same processes
and expertise can be leveraged for this data lake protection solution. NetWorker
Snapshot Management is an integrated feature of NetWorker that uses common
workflows and a common user interface for both snapshots and backups. And finally,
NetWorker can be leveraged to manage Isilon replication to another Isilon system at a
second site for optional disaster recovery.
Isilon snapshots managed by NetWorker Snapshot Management to ECS
This section provides more detail about leveraging EMC NetWorker Snapshot
Management for data lake protection in deployments where the compute and storage
are separated and the HDFS layer is running on Isilon storage. Because you are using
shared Isilon storage, you can leverage all Isilon data management capabilities that
are built into the storage layer. In this data lake protection scenario, NetWorker
manages Isilon snapshots, which are then rolled over to an on-premises Elastic Cloud
Storage (ECS) Appliance.
Customers typically choose ECS as the target storage for this solution based on
three primary factors:
1. Do you already know that your data would not gain significant storage savings
from the variable-length deduplication and compression that Data Domain
systems would provide?
2. Do you require the hyperscale capacity (exabytes) that ECS provides?
3. Do you require object or HDFS accessibility?
The NetWorker Administrator can define a single policy that automates the data
protection process, including initiating a snapshot on the data lake Isilon system and
then rolling that snapshot over to a separate on-premises ECS Appliance using ECS
APIs over Ethernet.
NetWorker maintains catalogs for all backups, snapshots, and clones, which makes
restores for this data lake protection solution simple and straightforward. NetWorker
can also manage snapshot retention. To perform a restore, the NetWorker
Administrator can restore from the initial snapshot, or select one of the
savesets that have been rolled over to the ECS system and restore it back to the
primary Isilon system using ECS APIs over Ethernet. NetWorker can perform full or
partial restores, and restores can be directed to the original location or to an
alternate location on the same device.
Customers also have the option of NetWorker-controlled replication to a
separate ECS Appliance installed at a second site for additional disaster recovery
protection. NetWorker restores can then be performed from the second-site ECS
Appliance for disaster recovery.
Benefits of using NetWorker managed Isilon snapshots to ECS
Customers realize important benefits from NetWorker management of Isilon
snapshots for data lake protection to an Elastic Cloud Storage (ECS) solution. First
and most importantly, this data lake protection solution provides enterprise-grade
protection for Hadoop data against loss or corruption and delivers superior RTOs.
NetWorker Snapshot Management simplifies the data protection process by
automating both the initial snapshots and the rollovers to ECS. This data protection
solution provides multiple recovery options, including recovery from the initial
snapshot and from rollover savesets on ECS storage.
The ECS Appliance makes hyperscale storage and cloud economics viable for businesses
of any size by combining ViPR software with a low-cost, high-density, scale-out
commodity hardware platform. The ECS Appliance can be deployed and expanded
incrementally, so you can size it for your immediate needs and your
projected growth. ECS allows you to optimize your data lake protection solution
based on your applications, storage requirements, and access needs, giving you the
flexibility and control that you want.
If a customer already uses NetWorker or Elastic Cloud Storage for other needs, the
same processes and expertise can be leveraged for data lake protection.
Customer benefits
As stated previously, all of the Data Lake protection solutions presented in this paper
provide much-needed enterprise-grade protection for Hadoop data against loss or
corruption. EMC gives customers choice in selecting the best data lake protection
solution depending on the size of their data lake, their data types, their accessibility
requirements, and their existing storage and data protection expertise.
The Data Lake protection solution options described in this paper that leverage Data
Domain systems as the protection storage target provide additional benefits that are
unique to Data Domain. Data Domain's Data Invulnerability Architecture provides the
ultimate in data protection, ensuring that data from your data lake can be recovered
when needed and can be trusted. Data Domain systems provide storage
efficiency through variable-length deduplication and compression, typically reducing
storage requirements by 10-30x. Data Domain systems are also very fast, capable of
ingesting data at up to 15 TB/hour, which minimizes the time it takes to complete data
lake protection backups. If your organization already uses Data Domain for other data
protection needs, the same processes and expertise can be leveraged to protect your
data lake.
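The 10-30x reduction translates into protection-storage sizing with a short calculation: estimate the logical data retained across all kept backups, then divide by the expected overall reduction ratio. The sketch below assumes a hypothetical 200 TB data lake, four retained weekly fulls, and 24 retained daily incrementals at a 2% daily change rate; only the 10-30x range comes from this paper. Data that is already compressed or encrypted typically reduces far less, which is why factor 1 in the Isilon and ECS sections asks whether your data would benefit at all.

```python
def retained_logical_tb(full_tb: float, fulls: int, daily_change: float,
                        incrementals: int) -> float:
    """Logical TB held by `fulls` full backups plus `incrementals` daily incrementals."""
    return full_tb * fulls + full_tb * daily_change * incrementals


def physical_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical TB after deduplication and compression at the given overall ratio."""
    if reduction_ratio < 1:
        raise ValueError("ratio must be >= 1")
    return logical_tb / reduction_ratio


# Hypothetical: 200 TB data lake, 4 weekly fulls, 24 daily incrementals at 2% change.
logical = retained_logical_tb(200, 4, 0.02, 24)
for ratio in (10, 30):  # the 10-30x range cited in this paper
    print(f"{ratio}x: {logical:.0f} TB logical -> {physical_tb(logical, ratio):.1f} TB physical")
```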
The Data Lake protection solution options described in this paper that leverage Isilon
systems as the storage target provide their own set of unique customer
benefits. Isilon uses erasure coding to protect data with greater than 80% storage
efficiency, in contrast to traditional HDFS with only 33% storage efficiency. Isilon has
several classes of node types, which allow different Isilon tiers to be optimized for
particular workloads. If your organization already uses Isilon for other needs, the
same processes and expertise can be leveraged for these data lake protection
solution options.
The Data Lake protection solution options described in this paper that leverage
Elastic Cloud Storage (ECS) as the storage target provide scalability and accessibility
advantages. The ECS Appliance makes hyperscale storage and cloud economics
viable for businesses of any size by combining ViPR software with a low-cost,
high-density, scale-out commodity hardware platform. ECS allows you to optimize your
data lake protection solution based on your applications, storage requirements, and
access needs, giving you the flexibility and control that you want. And finally, if your
organization already uses Elastic Cloud Storage for other needs, the same
processes and expertise can be leveraged for data lake protection.
The Data Lake protection solutions described in this paper that leverage NetWorker
provide a number of additional advantages regardless of the storage option used.
The NetWorker administrator can define data protection policies that automate all
the snapshot and rollover activities, making day-to-day operations simple and
effective. NetWorker also provides control over retention of backups, snapshots, and
rollovers, minimizing manual retention effort. And the NetWorker solution options
include the ability to recover from Isilon snapshots in addition to the rollover savesets,
providing superior RTOs and maximum flexibility.
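Policy-driven retention can be pictured as a periodic sweep that compares each copy's age with the retention period for its type and marks the expired ones for removal. The sketch below is a minimal, hypothetical model: the `ProtectionCopy` record, the `split_expired` helper, and the 7/30/90-day periods are assumptions for illustration, not NetWorker defaults or its API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention periods per copy type; real values come from your policy.
RETENTION = {
    "snapshot": timedelta(days=7),
    "backup": timedelta(days=30),
    "rollover": timedelta(days=90),
}


@dataclass(frozen=True)
class ProtectionCopy:
    copy_id: str
    kind: str          # "snapshot", "backup", or "rollover"
    created: datetime


def split_expired(copies: list[ProtectionCopy], now: datetime,
                  retention: dict[str, timedelta] = RETENTION
                  ) -> tuple[list[ProtectionCopy], list[ProtectionCopy]]:
    """Partition copies into (keep, expire) by age against each kind's retention."""
    keep, expire = [], []
    for c in copies:
        (expire if now - c.created > retention[c.kind] else keep).append(c)
    return keep, expire


copies = [
    ProtectionCopy("snap-0120", "snapshot", datetime(2015, 1, 20)),
    ProtectionCopy("snap-0128", "snapshot", datetime(2015, 1, 28)),
    ProtectionCopy("roll-0101", "rollover", datetime(2015, 1, 1)),
]
keep, expire = split_expired(copies, now=datetime(2015, 1, 31))
print("keep:", [c.copy_id for c in keep], "expire:", [c.copy_id for c in expire])
```

Giving rollover savesets a longer period than snapshots reflects the layering described above: short-lived snapshots for fast recovery, longer-lived rollovers for retention.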
Conclusion
This paper has described how Big Data use cases have matured, defined what a data
lake is, and explained why customers now demand serious enterprise-grade data lake
protection solutions. As a thought leader in Big Data solutions, EMC has presented in
this paper a data protection strategy and multiple data protection solution options to
protect your data lake. EMC gives customers a choice of the solution approach and
target storage option that best meets their scalability and accessibility needs and
lets them leverage any in-house storage or data protection expertise they already have.
For more information on EMC Big Data and Data Lake solutions, please check out our
Big Data solutions page on EMC.com.
Additional resources
EMC Data Domain Operating System Data Sheet
EMC Isilon Scale-Out Storage Product Family Data Sheet
EMC ECS Appliance, powered by ViPR Data Sheet
EMC Data Domain Data Invulnerability Architecture white paper
EMC NetWorker Data Sheet
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsEMC
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookEMC
 
2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS Breach2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS BreachEMC
 

Mais de EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBook
 
2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS Breach2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS Breach
 

Último

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Último (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Executive Summary
Big Data use cases are maturing, and customers are using Big Data to improve both top-line and bottom-line revenue. As a result, the need for enterprise readiness and data management is becoming increasingly important.

The challenge
Today many organizations are working hard to make Hadoop and NoSQL enterprise ready. Hadoop has moved past batch processing, and SQL has become the new battlefield. Data lakes are the new repository and are becoming the single source of truth. Most enterprises need to bridge the gap between the 2nd platform and Big Data. Today there is a lack of operational Big Data protection and management, and the lack of enterprise readiness is becoming the main inhibitor of Big Data adoption. Without proper data protection, operationalizing Big Data analytics is going to be a nightmare, and businesses will not be able to put Big Data to effective use.

Solution overview
EMC® is providing an effective data protection strategy to address the challenges associated with Big Data. This paper discusses two EMC Data Lake protection solutions, which provide:
• Enterprise readiness for compute and storage stacks
• Cost-effective backup for operational recovery and disaster recovery, leveraging existing infrastructure, people, and processes
• Support for different data lake deployments, with different source and destination storage (Data Domain, Isilon, ECS) and different data movers (DistCp, snapshots, NDMP, etc.)
Introduction
The purpose of this white paper is to explain why data lake protection has become critically important and to describe the EMC Data Lake protection solutions that can help customers achieve greater business value and operational efficiency from the structured, semi-structured, and unstructured data in their data lakes.

Audience
This white paper is intended for IT and Hadoop administrators, systems engineers, partners, and members of the EMC and partner professional services community who want to better understand and implement EMC Data Lake protection options.

Background
What is a data lake?
In simple terms, a data lake is the evolution of an Enterprise Data Warehouse (EDW) into an active repository for structured, semi-structured, and unstructured data that retains all attributes against which you can run important business analytics. Said another way, data lakes are emerging as the primary landing zone for many disparate data sources, such as clickstreams, weblogs, and sensor data. The data lake is formed by the combination of Hadoop and NoSQL. One of the main characteristics of an effective data lake is that it does not require an upfront schema, which makes it much more flexible and makes it easy to add new data sources and store them in their native format. This flexibility allows customers to add and leverage many other data sources in order to make more holistic business decisions.

Big Data use cases are maturing
Big Data is transforming every business, and use cases such as EDW optimization, Big Data archive, security and operational analytics, and 360-degree customer analytics for targeted marketing are maturing.
Large retailers are leveraging Big Data for price optimization, financial services firms use it for fraud analysis and risk profiling, and large pharmaceutical companies use it to support drug development. Recent advances in Hadoop have taken data lakes beyond batch processing toward interactive and real-time workloads, and the data lake is becoming the primary data source.

Data protection is a core component of any data lake
As Big Data use cases and data lakes have become more critical, enterprise-grade data protection and disaster recovery for the data lake have become an important operational requirement. Insufficient data protection can have costly consequences, keeping organizations from tackling new Big Data projects and reaching new markets. EMC is uniquely positioned to provide the critical data protection needed by leveraging the broad solution strengths of the EMC Federation, including partners such as Pivotal.

EMC Data Domain protection storage high level overview
EMC® Data Domain® protection storage systems deliver industry-leading speed and efficiency, with throughput of up to 15 TB/hour, enabling more backups to complete sooner and reducing pressure on backup windows. Data Domain systems use variable-length deduplication to minimize disk requirements and ensure data lands on disk already deduplicated. This reduces backup and archive storage requirements by an average of 10 to 30x, making disk a cost-effective alternative to tape. Data on disk is available online and onsite for longer retention periods, and restores and retrievals are fast and reliable. This efficiency enables Data Domain systems to protect up to 55 PB of logical capacity for backup and archive data on a single system.
Data Domain systems are designed as the storage of last resort, built to ensure you can reliably recover your data with confidence. The Data Domain Data Invulnerability Architecture is built into the Data Domain Operating System (DD OS) to provide the industry's best defense against data integrity issues. For additional information on Data Domain systems, please refer to the EMC Data Domain Data Sheet and the EMC Data Domain Data Invulnerability Architecture white paper.

EMC Isilon scale-out NAS storage high level overview
EMC® Isilon® scale-out storage solutions are designed for enterprises that want to manage their data, not their storage. Isilon storage systems are powerful yet simple to install, manage, and scale to virtually any size. Unlike traditional enterprise storage, Isilon solutions stay simple no matter how much storage capacity is added, how much performance is required, or how business needs change in the future.
Isilon challenges enterprises to think differently about their storage, because when they do, they'll recognize there's a better, simpler way with EMC Isilon. Through the combination of the Isilon OneFS operating system, high-performance industry-standard hardware, and data and storage management software, Isilon provides a complete portfolio of storage solutions that drive business value for customers by optimizing mission-critical applications, workflows, and processes. Isilon storage enables enterprise and research organizations worldwide to manage large and rapidly growing amounts of data in a highly scalable, easy-to-manage, and cost-effective way. Every Isilon solution is designed to accelerate workflow productivity and reduce capital and operational expenditures, while scaling storage in lockstep with the growth of mission-critical data. For additional information on Isilon, please refer to the EMC Isilon Data Sheet.
EMC Elastic Cloud Storage (ECS) high level overview
Customers are continually looking for more efficient architectures to manage today's hyperscale growth. Powered by EMC® ViPR®, the new Elastic Cloud Storage (ECS™) Appliance provides a complete hyperscale storage infrastructure designed to meet the requirements of modern applications. Regardless of the size of your organization, the ECS Appliance lets you deliver competitive cloud storage services and grow effortlessly. The ECS Appliance brings the cost profile, simplicity, and scale of public cloud services to anyone, with the trust, reliability, and support you expect from EMC.
The ECS Appliance helps:
• Data scientists accelerate Big Data initiatives
• Cloud providers deliver competitive cloud storage services at scale
• Enterprises and software developers accelerate development
The ECS Appliance makes hyperscale storage and cloud economics viable for any size business by running ViPR on a low-cost, high-density, scale-out commodity hardware platform. The ECS Appliance is available in multiple form factors that can be deployed and expanded incrementally, so each customer can choose the right size for their immediate needs and projected growth. Customers can optimize their solution based on their application and access needs, giving them the flexibility and control they want. For additional information on Elastic Cloud Storage, please refer to the EMC ECS Data Sheet.

EMC solutions for data lake protection
EMC offers two different solution options for data lake protection: Hadoop Distributed Copy for deployments where compute and storage are integrated (DAS architecture), and Isilon snapshots managed by EMC® NetWorker® Snapshot Management for deployments where compute is separate from storage and the storage is shared. Both solutions are illustrated in Figure 1 and explained in more detail throughout the rest of this paper.
Figure 1: Data Lake protection solutions
Overview of Hadoop Distributed Copy data protection
This solution, illustrated on the left in Figure 1, uses the native Distributed Copy (DistCp) utility built into Pivotal HD (HDFS) to copy data from the integrated compute and storage data lake to EMC Data Domain, EMC Isilon, or EMC ECS storage. This approach uses all of the nodes in the cluster to push the data.

Overview of Isilon snapshots managed by NetWorker Snapshot Management
Isilon snapshots managed by NetWorker Snapshot Management, illustrated on the right in Figure 1, apply to data lake deployments where compute and storage are separated and the HDFS layer runs on the shared storage. Because the storage is shared, customers can leverage all the data management capabilities built into that storage layer. This means customers can use Isilon snapshot functionality managed by EMC NetWorker and can also perform rollovers to Data Domain protection storage. A rollover is a backup of a snapshot to a secondary protection storage device via NDMP, typically performed when longer-term retention of data is a requirement.

EMC target storage options
As described in the preceding paragraphs, both EMC Data Lake protection solutions illustrated in Figure 1 can use EMC Data Domain, EMC Isilon, or EMC Elastic Cloud Storage (ECS) as target storage, depending on a number of factors including accessibility, storage efficiency, and capacity needs. Data Domain systems are ideal for workloads that deduplicate well (databases, files, etc.) and provide storage savings through industry-leading variable-length deduplication. Isilon is a good fit for data sets that don't deduplicate well (video, voice, etc.) and provides efficient, cost-effective storage from a single system. ECS is a good fit for object workloads at cloud (exabyte) scale.
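The target storage guidance above can be read as a simple rule of thumb. The Python sketch below encodes it, together with the physical capacity implied by the 10 to 30x average reduction quoted for Data Domain. The `Workload` type, its fields, the function names, and the rule precedence (object workloads checked first) are hypothetical illustrations, not an EMC sizing tool; real deduplication ratios depend entirely on the data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Workload:
    """Hypothetical description of a data set to protect (not an EMC API)."""
    name: str
    dedupes_well: bool   # e.g. databases, files
    object_access: bool  # accessed as objects, potentially at exabyte scale
    logical_tb: float    # logical (pre-deduplication) size in TB


def pick_target(w: Workload) -> str:
    """Rule of thumb paraphrasing the target storage guidance in the text.

    Checking object access first is an assumption of this sketch.
    """
    if w.object_access:
        return "ECS"
    if w.dedupes_well:
        return "Data Domain"
    return "Isilon"  # e.g. video or voice data that deduplicates poorly


def dd_physical_tb(logical_tb: float, low: float = 10.0,
                   high: float = 30.0) -> Tuple[float, float]:
    """Physical Data Domain capacity implied by a low..high reduction ratio.

    Returns (best case, worst case). Defaults are the 10x to 30x average
    quoted in the text; actual ratios depend on the data being protected.
    """
    return logical_tb / high, logical_tb / low


if __name__ == "__main__":
    workloads = [
        Workload("orders_db", dedupes_well=True, object_access=False, logical_tb=300),
        Workload("call_recordings", dedupes_well=False, object_access=False, logical_tb=120),
        Workload("sensor_objects", dedupes_well=False, object_access=True, logical_tb=5000),
    ]
    for w in workloads:
        print(f"{w.name}: {pick_target(w)}")
    print(dd_physical_tb(300))  # (10.0, 30.0)
```

For 300 TB of logical data, the quoted average reduction implies roughly 10 to 30 TB of physical Data Domain capacity; accessibility requirements (NFS/SMB versus object) can still override the deduplication-based choice.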
Data lake protection using Hadoop Distributed Copy

Hadoop Distributed Copy data protection to Data Domain

This section provides more detail about using the native Distributed Copy (DistCp) utility built into HDFS (Hadoop Distributed File System) to back up and restore data from an integrated compute and storage data lake to an on-premises Data Domain protection storage system. Customers will typically choose Data Domain systems as the target protection storage for this solution based on three primary factors:
1. Will your data benefit from Data Domain variable-length deduplication and compression?
2. Does Data Domain storage scalability meet your needs? (terabytes)
3. Does NFS/SMB (CIFS) meet your accessibility requirements?

DistCp (distributed copy) is a standard tool that comes with all Hadoop distributions and versions and can be used to copy entire Hadoop directories. DistCp runs as a MapReduce job to perform file copies in parallel, fully utilizing your systems if desired. There is also an option (-bandwidth) to limit the bandwidth used by each map task, to control the impact on other tasks.

This solution can be used in two different ways:
1. The first approach takes a Pivotal HD HDFS snapshot from the Hadoop application and then uses Pivotal DistCp to copy the snapshot to the protection storage.
2. The second approach uses Pivotal HD DistCp to copy directly to the protection storage.
The advantage of the first approach is that the application is freed up as soon as the snapshot finishes.

In this data lake protection scenario, the Hadoop Administrator uses Pivotal HD DistCp to perform full backups using NFS over Ethernet to an on-premises Data Domain system. The Data Domain system ingests the backup data and performs variable-length deduplication and compression.

The standard method to restore a DistCp backup from a Data Domain system to a traditional Hadoop infrastructure is to run DistCp in the reverse direction, again using NFS over Ethernet. This is done simply by swapping the source and target paths. You can perform partial or full restores, and restores can be directed to the original location or an alternate location.

Customers have the option of using Data Domain replication to a separate Data Domain system installed at a second site for additional disaster recovery protection. DistCp restores could then be performed from the Data Domain system at the second site.

Benefits of using Hadoop Distributed Copy to Data Domain

Customers will realize several important benefits from Distributed Copy data lake protection to Data Domain systems.
First and most importantly, this data lake protection solution provides enterprise-grade protection for Hadoop against data loss or corruption. This solution also gives the Hadoop Administrator direct visibility and control over their data lake protection. Data Domain's Data Invulnerability Architecture provides the ultimate in data protection, ensuring that data from your data lake can be recovered when needed and can be trusted. Data Domain systems provide storage efficiency through variable-length deduplication and compression, typically reducing storage requirements by 10-30x. Data Domain systems are also fast, capable of backup throughput of up to 15 TB/hour, minimizing the time it takes to complete your data lake backups. If Data Domain systems are already used for other data protection needs, the same processes and expertise can be applied to data lake protection. Finally, Data Domain Replicator can provide bandwidth-efficient replication to a Data Domain system at a second site for optional disaster recovery.
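The snapshot-then-copy approach described earlier for Data Domain can be sketched as a short Python driver that builds the command lines. This is a minimal sketch under stated assumptions, not an EMC-supplied script: it assumes the Data Domain NFS export is mounted at the same hypothetical path (/mnt/datadomain) on every node that runs DistCp map tasks, so that the `file://` target resolves everywhere, and that the source directory was made snapshottable once with `hdfs dfsadmin -allowSnapshot`. Paths, the snapshot naming scheme, and the bandwidth values are illustrative; `-createSnapshot`, `-bandwidth`, and `-m` are standard Apache HDFS/DistCp options, and Pivotal HD tooling may differ.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_name(when: datetime) -> str:
    """Hypothetical naming scheme: one snapshot per backup run."""
    return "backup-" + when.strftime("%Y%m%d-%H%M%S")


def backup_commands(source_dir: str, target_uri: str, snap: str,
                    bandwidth_mb: Optional[int] = None,
                    maps: Optional[int] = None) -> List[List[str]]:
    """Snapshot the source, then copy the read-only snapshot with DistCp."""
    source_dir = source_dir.rstrip("/")
    commands = [["hdfs", "dfs", "-createSnapshot", source_dir, snap]]

    distcp = ["hadoop", "distcp"]
    if bandwidth_mb is not None:
        distcp += ["-bandwidth", str(bandwidth_mb)]  # MB/s cap per map task
    if maps is not None:
        distcp += ["-m", str(maps)]  # maximum number of simultaneous copy tasks
    # HDFS exposes snapshots under <dir>/.snapshot/<name>.
    distcp += [f"{source_dir}/.snapshot/{snap}", f"{target_uri.rstrip('/')}/{snap}"]
    commands.append(distcp)
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    snap = snapshot_name(datetime.now(timezone.utc))
    run(backup_commands("/data/lake", "file:///mnt/datadomain/lake", snap,
                        bandwidth_mb=50, maps=20))
```

Because the application only waits for the snapshot, writes to the source directory can continue while DistCp copies the frozen image. Once a copy has been verified, the snapshot could be removed with `hdfs dfs -deleteSnapshot`.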
Hadoop Distributed Copy data protection to Isilon

This section provides more detail about using the native Distributed Copy (DistCp) utility built into HDFS (Hadoop Distributed File System) to back up and restore data from an integrated compute and storage data lake to an on-premises Isilon storage system. Customers will typically choose Isilon as the target storage for this solution based on three primary factors:
1. Do you already know that your data would not gain significant storage savings from the variable-length deduplication and compression that Data Domain systems would provide?
2. Does Isilon storage scalability meet your needs? (petabytes)
3. Does your organization have NFS/SMB (CIFS)/HDFS accessibility requirements?

DistCp (distributed copy) is a standard tool that comes with all Hadoop distributions and versions and can be used to copy entire Hadoop directories. DistCp runs as a MapReduce job to perform file copies in parallel, fully utilizing your systems if desired. There is also an option (-bandwidth) to limit the bandwidth used by each map task, to control the impact on other tasks.

This solution can be used in two different ways:
1. The first approach takes a Pivotal HD HDFS snapshot from the Hadoop application and then uses Pivotal DistCp to copy the snapshot to the target storage.
2. The second approach uses Pivotal HD DistCp to copy directly to the target storage.
The advantage of the first approach is that the application is freed up as soon as the snapshot finishes.

In this data lake protection scenario, the Hadoop Administrator uses Pivotal HD DistCp to perform full backups using NFS over Ethernet to an on-premises Isilon system. Isilon ingests the backup data and performs post-process deduplication and compression.

The standard method to restore a DistCp backup from Isilon to a traditional Hadoop infrastructure is to run DistCp in the reverse direction. This is done simply by swapping the source and target paths. You can perform partial or full restores, and restores can be directed to the original location or an alternate location.

Because Isilon supports HDFS, Hadoop applications can access the backup files on Isilon in the same way as the source files. This lets you use your backup data directly, without first restoring it to the original Hadoop environment, which can reduce overall analysis time.

Customers have the option of using Isilon replication to a separate Isilon system installed at a second site for additional disaster recovery protection. DistCp restores could then be performed from the Isilon system at the second site.
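Restoring reverses the copy direction, with two practical details. If the backup was taken from a snapshot, the original source was a read-only `.snapshot` path, so the restore target should be the live directory (original or alternate), not the snapshot path. Also, DistCp copies a source directory into an existing target directory as a subdirectory unless `-update` or `-overwrite` is given, in which case it copies the directory's contents. The Python sketch below builds such a restore command; the Isilon host name, port, and paths are hypothetical, and it illustrates the idea rather than documenting an EMC procedure.

```python
from typing import List, Optional


def restore_command(backup_dir: str, restore_dir: str,
                    subpath: str = "",
                    into_existing: bool = False,
                    bandwidth_mb: Optional[int] = None) -> List[str]:
    """Build a DistCp command that copies a backup back into Hadoop.

    backup_dir: the backup copy, e.g. an HDFS URI on Isilon (hypothetical)
    restore_dir: the live original location or an alternate location
    subpath: optional subdirectory of the backup, for a partial restore
    into_existing: merge contents into an existing directory via -update
    """
    source = backup_dir.rstrip("/")
    if subpath.strip("/"):
        source = f"{source}/{subpath.strip('/')}"

    cmd = ["hadoop", "distcp"]
    if into_existing:
        # With -update, DistCp copies the contents of `source` into
        # `restore_dir` and skips files that already appear identical.
        cmd.append("-update")
    if bandwidth_mb is not None:
        cmd += ["-bandwidth", str(bandwidth_mb)]  # MB/s cap per map task
    return cmd + [source, restore_dir]


if __name__ == "__main__":
    backup = "hdfs://isilon.example.com:8020/backups/lake/backup-20150101-000000"
    # Full restore back over the original location.
    print(restore_command(backup, "/data/lake", into_existing=True))
    # Partial restore of one directory into a new, alternate location.
    print(restore_command(backup, "/restore/sales-2014", subpath="sales/2014"))
```

Because the backup path is an ordinary HDFS URI in this sketch, the same string could also be given to an analysis job as input, which is the direct-use case described for Isilon.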
Benefits of using Hadoop Distributed Copy to Isilon

Customers will realize several important benefits from Distributed Copy data lake protection to Isilon systems. First and most importantly, this data lake protection solution provides enterprise-grade protection for Hadoop against data loss or corruption. This solution also gives the Hadoop Administrator direct visibility and control over their data lake protection. Isilon is an ideal platform for Hadoop and other Big Data applications. It uses erasure coding to protect data with greater than 80% storage efficiency, in contrast to traditional HDFS, whose default three-way replication yields about 33% storage efficiency. Isilon offers several classes of node types, allowing different Isilon tiers to be optimized for particular workloads. Backing up traditional Hadoop environments to Isilon is straightforward and provides a dense HDFS backup target. If the customer already uses Isilon for other needs, the same processes and expertise can be applied to data lake protection. Finally, Isilon replication to a second site can be used for optional disaster recovery, with DistCp restores then performed from the second-site Isilon system.

Hadoop Distributed Copy data protection to ECS

This section provides more detail about using the native Distributed Copy (DistCp) utility built into HDFS (Hadoop Distributed File System) to back up and restore data from an integrated compute and storage data lake to an on-premises Elastic Cloud Storage Appliance. Customers will typically choose ECS as the target storage for this solution based on three primary factors:
1. Do you already know that your data would not gain significant storage savings from the variable-length deduplication and compression that Data Domain systems would provide?
2. Do you require the hyperscale that ECS provides? (exabytes)
3. Do you require Object/HDFS accessibility?
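The selection factors listed for Data Domain, Isilon, and ECS targets can be condensed into a rough decision helper. The Python sketch below is a hypothetical illustration of those questions, not an EMC sizing tool: the scale tiers are a literal reading of the paper's parenthetical units (terabytes, petabytes, exabytes), and the protocol sets mirror the accessibility questions. Real selection also weighs existing storage and data protection expertise, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Requirements:
    dedupes_well: bool         # factor 1: does the data benefit from deduplication?
    protected_tb: float        # factor 2: capacity to protect, in TB
    protocols: FrozenSet[str]  # factor 3: e.g. {"NFS", "SMB", "HDFS", "OBJECT"}


def scale_tier(tb: float) -> str:
    """Crude mapping onto the paper's units: TB, PB, or EB scale."""
    if tb < 1_000:
        return "TB"
    if tb < 1_000_000:
        return "PB"
    return "EB"


def suggest_target(req: Requirements) -> str:
    """Apply the three factor lists in order: Data Domain, Isilon, ECS."""
    protocols = {p.upper() for p in req.protocols}
    tier = scale_tier(req.protected_tb)
    if req.dedupes_well and tier == "TB" and protocols <= {"NFS", "SMB"}:
        return "Data Domain"
    if tier != "EB" and protocols <= {"NFS", "SMB", "HDFS"}:
        return "Isilon"
    if protocols <= {"OBJECT", "HDFS"}:
        return "ECS"
    return "no single match; review requirements"


if __name__ == "__main__":
    print(suggest_target(Requirements(False, 5_000, frozenset({"HDFS", "NFS"}))))
```

The checks run in the same order as the paper's three sections, so a workload that deduplicates well and fits Data Domain's scale and protocols is matched there first.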
DistCp (distributed copy) is a standard tool that comes with all Hadoop distributions and versions and can be used to copy entire Hadoop directories. DistCp runs as a MapReduce job to perform file copies in parallel, fully utilizing your systems if desired. There is also an option (-bandwidth) to limit the bandwidth used by each map task, to control the impact on other tasks.

This solution can be used in two different ways:
1. The first approach takes a Pivotal HD HDFS snapshot from the Hadoop application and then uses Pivotal DistCp to copy the snapshot to the target storage.
2. The second approach uses Pivotal HD DistCp to copy directly to the target storage.
The advantage of the first approach is that the application is freed up as soon as the snapshot finishes.
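Because DistCp's `-bandwidth` option caps throughput per map task (in MB/s), the aggregate ceiling for a run is roughly the number of maps times that cap. The sketch below uses this to estimate the shortest possible copy time and the per-map cap needed to fit a backup window. It is a simplification under stated assumptions: decimal units, a perfectly parallel copy, and no other bottleneck (network, NFS mount, or target ingest) slower than the cap, so real runs will take longer. The data size, map count, and window are hypothetical.

```python
import math


def aggregate_cap_mb_s(maps: int, bandwidth_mb_per_map: float) -> float:
    """Upper bound on DistCp throughput when every map runs at its cap."""
    return maps * bandwidth_mb_per_map


def min_copy_hours(data_tb: float, maps: int, bandwidth_mb_per_map: float) -> float:
    """Shortest possible copy time at the configured cap (decimal units)."""
    data_mb = data_tb * 1_000_000
    return data_mb / aggregate_cap_mb_s(maps, bandwidth_mb_per_map) / 3600


def bandwidth_for_window(data_tb: float, maps: int, window_hours: float) -> int:
    """Smallest whole-MB/s per-map cap that could finish within the window."""
    data_mb = data_tb * 1_000_000
    return math.ceil(data_mb / (window_hours * 3600) / maps)


if __name__ == "__main__":
    # Hypothetical: 36 TB to copy, 20 maps, an 8-hour backup window.
    print(f"{min_copy_hours(36, 20, 50):.1f} h at 50 MB/s per map")
    print(bandwidth_for_window(36, 20, 8), "MB/s per map needed for an 8 h window")
```

Treat the result as a necessary condition only: if even the capped estimate misses the window, more maps, a higher cap, or the snapshot-then-copy approach is needed.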
In this data lake protection scenario, the Hadoop Administrator uses Pivotal HD DistCp to perform full backups using NFS over Ethernet to an on-premises ECS Appliance. The standard method to restore a DistCp backup from ECS to a traditional Hadoop infrastructure is to run DistCp in the reverse direction. This is done simply by swapping the source and target paths. You can perform partial or full restores, and restores can be directed to the original location or an alternate location.

Customers have the option of using ECS replication to a separate ECS Appliance installed at a second site for additional disaster recovery protection. DistCp restores could then be performed from the second-site ECS Appliance.

Benefits of using Hadoop Distributed Copy to ECS

Customers will realize several important benefits from Distributed Copy data lake protection to Elastic Cloud Storage. First and most importantly, this data lake protection solution provides enterprise-grade protection for Hadoop against data loss or corruption. This solution also gives the Hadoop Administrator direct visibility and control over their data lake protection. The ECS Appliance makes hyperscale storage and cloud economics viable for any size of business by combining the power of ViPR with a low-cost, high-density, scale-out commodity hardware platform. The ECS Appliance can be deployed and expanded incrementally, so you can choose the right size for your immediate needs and your projected growth. ECS allows you to optimize your data lake protection solution based on your applications, storage requirements, and access needs, giving you the flexibility and control that you want. If the customer already uses Elastic Cloud Storage for other needs, the same processes and expertise can be applied to data lake protection.
Data lake protection using Isilon snapshots managed by EMC NetWorker

Isilon snapshots managed by NetWorker Snapshot Management to Data Domain

This section provides more detail about using EMC NetWorker Snapshot Management for data lake protection in deployments where compute and storage are separated and the HDFS layer runs on Isilon storage. Because you are using shared Isilon storage, you can use all of the Isilon data management capabilities built into the storage layer. In this data lake protection scenario, NetWorker manages Isilon snapshots, which are then rolled over to an on-premises Data Domain storage system. Customers will typically choose Data Domain systems as the target protection storage for this solution based on three primary factors:
1. Will your data benefit from Data Domain variable-length deduplication and compression?
2. Does Data Domain storage scalability meet your needs? (terabytes)
3. Does NFS meet your accessibility requirements?

The NetWorker Administrator can define a single policy that automates the data protection process, including initiating a snapshot on the data lake Isilon system and then executing a rollover of that Isilon snapshot using NDMP Tape Server over Ethernet to an on-premises Data Domain system. The Data Domain system ingests the snapshot data and performs variable-length deduplication and compression.

NetWorker maintains catalogs for all backups, snapshots, and clones, which makes restores for this data lake protection solution simple and straightforward. NetWorker can also manage snapshot retention. To perform a restore, the NetWorker Administrator can quickly restore from the initial snapshot, or can select one of the NDMP backup savesets that has been rolled over to the Data Domain system and restore it to the primary Isilon system using NDMP over Ethernet. Restoring from the snapshot offers a much quicker RTO, while recovering from the backups on the Data Domain system provides access to older recovery points kept for longer-term retention. NetWorker can perform full or partial restores, and restores can be directed to the original location or an alternate location on the same device.

Customers have the option of using NetWorker-controlled replication to a separate Data Domain system installed at a second site for additional disaster recovery protection. NetWorker restores could then be performed from the second-site Data Domain system.

Benefits of using NetWorker-managed Isilon snapshots to Data Domain

Customers will realize several important benefits from NetWorker management of Isilon snapshots for data lake protection to a Data Domain system.
First and most importantly, this data lake protection solution provides enterprise-grade protection for Hadoop against data loss or corruption and provides superior RTOs. NetWorker Snapshot Management simplifies the data protection process by automating both the array snapshots and the rollovers to Data Domain. This data protection solution provides multiple recovery options, including recovery from the initial snapshot and from rollover savesets on Data Domain protection storage. Data Domain's Data Invulnerability Architecture provides best-in-class data protection, ensuring that data from your data lake can be recovered when needed and can be trusted. Data Domain systems provide storage efficiency through variable-length deduplication and compression, typically reducing storage requirements by 10-30x. Data Domain systems are also fast, capable of ingesting data at up to 15 TB/hour, minimizing the time it takes to complete data lake protection backups. If the customer already uses NetWorker or Data Domain systems for other data protection needs, the same processes and expertise can be applied to data lake protection. Finally, NetWorker can manage bandwidth-efficient Data Domain replication to a Data Domain system at a second site for optional disaster recovery.

Isilon snapshots managed by NetWorker Snapshot Management to Isilon

This section provides more detail about using EMC NetWorker Snapshot Management for data lake protection in deployments where compute and storage are separated and the HDFS layer runs on Isilon storage. Because you are using shared Isilon storage, you can use all of the Isilon data management capabilities built into the storage layer. In this data lake protection scenario, NetWorker manages Isilon snapshots, which are then replicated to a second on-premises Isilon storage system. Customers will typically choose Isilon snapshot-and-replicate protection for this solution based on four primary factors:
1. Do you already know that your data would not gain significant storage savings from the variable-length deduplication and compression that Data Domain systems would provide?
2. Is it feasible to protect the required amount of data within the allotted backup windows?
3. Does Isilon storage scalability meet your needs? (petabytes)
4. Does your organization have NFS/SMB (CIFS)/HDFS accessibility requirements?

The NetWorker Administrator can define a single policy that automates the data protection process, including initiating a snapshot on the data lake Isilon system and then automatically replicating that Isilon snapshot to a second on-premises Isilon system using Isilon SyncIQ. The second Isilon system stores a copy of the snapshot data replicated by NetWorker and Isilon SyncIQ.

NetWorker maintains catalogs for all backups, snapshots, and clones, which makes restores for this data lake protection solution simple and straightforward. NetWorker can also manage snapshot retention.
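The policy behavior described above (take a snapshot, replicate it, and let the catalog drive retention) can be modeled in a few lines. The Python sketch below is a hypothetical model of time-based snapshot retention, not NetWorker's policy engine or API: `create_snapshot`, `replicate`, and `delete_snapshot` are placeholder callables standing in for whatever actually drives the Isilon snapshot and SyncIQ replication, and the retention window and "always keep the newest" rule are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass
class SnapshotRecord:
    name: str
    created: datetime
    replicated: bool = False


def apply_retention(catalog: List[SnapshotRecord], now: datetime,
                    retain_for: timedelta, keep_min: int = 1
                    ) -> Tuple[List[SnapshotRecord], List[SnapshotRecord]]:
    """Split the catalog into (kept, expired), both ordered newest first.

    A snapshot is kept if it is no older than `retain_for`; the newest
    `keep_min` snapshots are always kept regardless of age.
    """
    newest_first = sorted(catalog, key=lambda s: s.created, reverse=True)
    kept, expired = [], []
    for i, snap in enumerate(newest_first):
        if i < keep_min or now - snap.created <= retain_for:
            kept.append(snap)
        else:
            expired.append(snap)
    return kept, expired


def run_policy_cycle(catalog: List[SnapshotRecord], now: datetime,
                     retain_for: timedelta,
                     create_snapshot: Callable[[datetime], str],
                     replicate: Callable[[str], None],
                     delete_snapshot: Callable[[str], None]) -> List[str]:
    """One scheduled run: snapshot, replicate, then expire old snapshots.

    Updates `catalog` in place and returns the names that were expired.
    """
    record = SnapshotRecord(create_snapshot(now), now)
    replicate(record.name)  # placeholder for SyncIQ replication
    record.replicated = True
    catalog.append(record)

    kept, expired = apply_retention(catalog, now, retain_for)
    for snap in expired:
        delete_snapshot(snap.name)
    catalog[:] = kept
    return [s.name for s in expired]


if __name__ == "__main__":
    start = datetime(2015, 1, 1)
    catalog: List[SnapshotRecord] = []
    for day in range(10):
        now = start + timedelta(days=day)
        gone = run_policy_cycle(
            catalog, now, timedelta(days=7),
            create_snapshot=lambda t: "lake-" + t.strftime("%Y%m%d"),
            replicate=lambda name: None,
            delete_snapshot=lambda name: None,
        )
        if gone:
            print(now.date(), "expired", gone)
```

Keeping the catalog as the single source of truth mirrors the paper's point that restores and retention both work from NetWorker's catalog rather than from ad hoc lists of snapshots.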
To perform a restore, the NetWorker Administrator can restore from the initial snapshot, or can select one of the snapshots that have been replicated to the target Isilon system and restore it to the primary Isilon system. NetWorker can perform full or partial restores, and restores can be directed to the original location or an alternate location on the same device.

In a remote replication scenario, NetWorker can additionally orchestrate and manage NDMP rollover to a Data Domain system or other backup target at the remote site, completely offloading backup from the production Isilon system. This allows weekly or quarterly backups of larger datasets without impacting daily production.

Benefits of using NetWorker-managed Isilon snapshots to Isilon

Customers will realize several important benefits from NetWorker management of Isilon snapshots for data lake protection to Isilon storage. First and most importantly, this data lake protection solution provides enterprise-grade protection for Hadoop against data loss or corruption and provides superior RTOs. NetWorker Snapshot Management simplifies the data protection process by automating both the initial snapshots and the replication to a secondary Isilon system. This data protection solution provides multiple recovery options, including recovery from the initial snapshot on the source Isilon system and from replicated snapshots on the second Isilon system. In addition, the ability to roll over to a Data Domain system enables longer-term retention and greater protection from data corruption and disaster. The snapshot, replication, and rollover processes can all be controlled by a single policy. Isilon is an ideal platform for Hadoop and other Big Data applications. It uses erasure coding to protect data with greater than 80% storage efficiency, in contrast to traditional HDFS, whose default three-way replication yields about 33% storage efficiency. Isilon offers several classes of node types, allowing different Isilon tiers to be optimized for particular workloads. If the customer already uses Isilon or NetWorker for other needs, the same processes and expertise can be applied to this data lake protection solution. NetWorker Snapshot Management is an integrated feature of NetWorker that uses common workflows and a common user interface for both snapshots and backups. Finally, NetWorker can manage Isilon replication to another Isilon system at a second site for optional disaster recovery.

Isilon snapshots managed by NetWorker Snapshot Management to ECS

This section provides more detail about using EMC NetWorker Snapshot Management for data lake protection in deployments where compute and storage are separated and the HDFS layer runs on Isilon storage. Because you are using shared Isilon storage, you can use all of the Isilon data management capabilities built into the storage layer.
In this data lake protection scenario, NetWorker manages Isilon snapshots, which are then rolled over to an on-premises Elastic Cloud Storage (ECS) Appliance. Customers will typically choose ECS as the target storage for this solution based on three primary factors:
1. Do you already know that your data would not gain significant storage savings from the variable-length deduplication and compression that Data Domain systems would provide?
2. Do you require the hyperscale that ECS provides? (exabytes)
3. Do you require Object/HDFS accessibility?

The NetWorker Administrator can define a single policy that automates the data protection process, including initiating a snapshot on the data lake Isilon system and then executing a rollover of that Isilon snapshot using ECS APIs over Ethernet to a separate on-premises ECS Appliance.

NetWorker maintains catalogs for all backups, snapshots, and clones, which makes restores for this data lake protection solution simple and straightforward. NetWorker can also manage snapshot retention. To perform a restore, the NetWorker Administrator can restore from the initial snapshot, or can select one of the savesets that has been rolled over to the ECS system and restore it to the primary Isilon system using ECS APIs over Ethernet. NetWorker can perform full or partial restores, and restores can be directed to the original location or an alternate location on the same device.

Customers have the option of using NetWorker-controlled replication to a separate ECS Appliance installed at a second site for additional disaster recovery protection. NetWorker restores could then be performed from the second-site ECS Appliance.

Benefits of using NetWorker-managed Isilon snapshots to ECS

Customers will realize several important benefits from NetWorker management of Isilon snapshots for data lake protection to an Elastic Cloud Storage solution. First and most importantly, this data lake protection solution provides enterprise-grade protection for Hadoop against data loss or corruption and provides superior RTOs. NetWorker Snapshot Management simplifies the data protection process by automating both the initial snapshots and the rollovers to ECS. This data protection solution provides multiple recovery options, including recovery from the initial snapshot and from rollover savesets on ECS storage. The ECS Appliance makes hyperscale storage and cloud economics viable for any size of business by combining the power of ViPR with a low-cost, high-density, scale-out commodity hardware platform. The ECS Appliance can be deployed and expanded incrementally, so you can choose the right size for your immediate needs and your projected growth. ECS allows you to optimize your data lake protection solution based on your applications, storage requirements, and access needs, giving you the flexibility and control that you want.
If the customer already uses NetWorker or Elastic Cloud Storage for other needs, the same processes and expertise can be applied to data lake protection.

Customer benefits

As stated previously, all of the data lake protection solutions presented in this paper provide much-needed enterprise-grade protection for Hadoop against data loss or corruption. EMC gives customers a choice in selecting the best data lake protection solution depending on the size of their data lake, their data types, their accessibility requirements, and their existing storage and data protection expertise.

The data lake protection solution options described in this paper that use Data Domain systems as the protection storage target provide additional benefits unique to Data Domain. Data Domain's Data Invulnerability Architecture provides the ultimate in data protection, ensuring that data from your data lake can be recovered when needed and can be trusted. Data Domain systems provide storage efficiency through variable-length deduplication and compression, typically reducing storage requirements by 10-30x. Data Domain systems are also fast, capable of ingesting data at up to 15 TB/hour, minimizing the time it takes to complete data lake protection backups. If the customer already uses Data Domain for other data protection needs, the same processes and expertise can be applied to protect the data lake.

The data lake protection solution options described in this paper that use Isilon systems as the storage target provide their own set of unique customer benefits. Isilon uses erasure coding to protect data with greater than 80% storage efficiency, in contrast to traditional HDFS, whose default three-way replication yields only about 33% storage efficiency. Isilon offers several classes of node types, which allow different Isilon tiers to be optimized for particular workloads. If your organization already uses Isilon for other needs, the same processes and expertise can be applied to these data lake protection options.

The data lake protection solution options described in this paper that use Elastic Cloud Storage (ECS) as the storage target provide scalability and accessibility advantages. The ECS Appliance makes hyperscale storage and cloud economics viable for any size of business by combining the power of ViPR with a low-cost, high-density, scale-out commodity hardware platform. ECS allows you to optimize your data lake protection solution based on your applications, storage requirements, and access needs, giving you the flexibility and control that you want. Finally, if your organization already uses Elastic Cloud Storage for other needs, the same processes and expertise can be applied to data lake protection.

The data lake protection solutions described in this paper that use NetWorker provide a number of additional advantages regardless of the storage option used.
The NetWorker administrator can define data protection policies that automate all snapshot and rollover activities, making day-to-day operations simple and effective. NetWorker also controls retention of backups, snapshots, and rollovers, minimizing manual retention effort. And the NetWorker solution options include the ability to recover from Isilon snapshots in addition to rollover savesets, providing superior RTOs and maximum flexibility.

Conclusion

This paper has stated that Big Data use cases have matured, has defined what a data lake is, and has explained why customers are now demanding enterprise-grade data lake protection solutions. As a thought leader in Big Data solutions, EMC has presented in this paper a data protection strategy and multiple solution options to protect your data lake. EMC gives customers a choice of solution approach and target storage option to best meet their scalability and accessibility needs, while leveraging any in-house storage or data protection expertise that may already exist. For more information on EMC Big Data and data lake solutions, please check out our Big Data solutions page on EMC.com.
Additional resources
• EMC Data Domain Operating System Data Sheet
• EMC Isilon Scale-Out Storage Product Family Data Sheet
• EMC ECS Appliance, powered by ViPR Data Sheet
• EMC Data Domain Data Invulnerability Architecture white paper
• EMC NetWorker Data Sheet