Operating in a Multi-execution Engine Hadoop Environment by Erik Halseth of Datameer

Businesses want to run analytical jobs at scale in Hadoop, but different parts of a job are candidates for different execution engines, so that each part runs under optimal conditions. In addition, older engines such as classic MapReduce may be giving way to newer ones such as Spark. This talk demonstrates how the Datameer application hides the complexity of choosing the right execution engine for an analytical job at scale in Hadoop, and how Spark fits into this context.

© 2014 Datameer, Inc. All rights reserved.
Datameer’s Vision

Make big data analytics simple for everyone

What Datameer Offers
Wizard-led Data Integration
• No ETL
• 59 Connectors + plug-in API
• Smart Sampling
Point-and-click Analytics
• Interactive spreadsheet UI
• 270 pre-built analytic functions
• Macros & function plug-in API
Drag-and-Drop Visualization
• Blank canvas for design
• HTML5, consumable on any device
• Visualization plug-in API

Smart Analytics
Column Dependencies
Decision Tree
Recommendation Engine
Clustering

Where does Datameer sit?

Classic Business Analytics Data Flow

New Business Analytics Data Flow

Datameer On-Premise Installation

Datameer Implementation - Cloud

Stefan Groschupf, CEO, Co-Founder
Problem

Typical Data Analytics Funnel
Raw Data (TB-PB) to Insights (KB)
• More sophisticated
• Less change
• High value
• Power users
• Planned / scheduled

• More ad hoc
• More change
• High & low value
• Casual users
• Interactive sessions

5-15 steps, iterative algorithms
Explore, Summarize, Prepare, Learn, Aggregate, Present, Slice

Current Approaches: Either - Or
MapReduce: Raw Data (TB-PB) to Insights (KB)
• Inefficient for small data
• High latency
In-Memory: Raw Data (<TB) to Insights (KB)
• Only small data
• Very expensive
• Not Hadoop
Not New

Small Data, Big Machine
VS

600 h Spent on Jobs < 100 MB

Stefan Groschupf, CEO, Co-Founder
Our Solution

Smart Execution
Raw Data (TB-PB) to Insights (KB)
New
• Optimized MapReduce
• In-Memory
• Single Node

Architecture
Data Integration, Spreadsheet, Visualization, Other (SQL)
Smart Execution Engine
Dataflow Graph Engine
MapReduce, Tez, In-Memory, Single Node, Others
Hadoop, YARN

Workflow
Data Sets
System Resources
Analytics
• Optimized MapReduce
• Single Node
• In-Memory
• Future Technology
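
The workflow slide names the inputs to engine selection (data sets, system resources, analytics) but not the rule itself. A minimal Python sketch of one plausible size-based rule follows. Everything in it is an assumption for illustration: the names `Engine`, `ClusterResources`, and `choose_engine`, the `headroom` factor, and the thresholds are hypothetical and do not describe Datameer's actual logic.

```python
from dataclasses import dataclass
from enum import Enum

MB = 1024 ** 2
GB = 1024 ** 3


class Engine(Enum):
    """The three engines named on the Smart Execution slide."""
    SINGLE_NODE = "single-node"
    IN_MEMORY = "in-memory"
    OPTIMIZED_MAPREDUCE = "optimized-mapreduce"


@dataclass(frozen=True)
class ClusterResources:
    """Hypothetical view of the system resources available to a job."""
    node_memory_bytes: int      # memory usable on a single worker
    cluster_memory_bytes: int   # aggregate memory across all workers


def choose_engine(input_bytes: int, resources: ClusterResources,
                  headroom: float = 0.5) -> Engine:
    """Pick an engine from the input size and the available memory.

    `headroom` is an assumed safety factor: only that fraction of memory
    is treated as usable for the input data itself.
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    # Small inputs: avoid distributed scheduling overhead entirely.
    if input_bytes <= resources.node_memory_bytes * headroom:
        return Engine.SINGLE_NODE
    # Medium inputs: fit in the cluster's combined memory.
    if input_bytes <= resources.cluster_memory_bytes * headroom:
        return Engine.IN_MEMORY
    # Everything else falls back to disk-based batch processing.
    return Engine.OPTIMIZED_MAPREDUCE


resources = ClusterResources(node_memory_bytes=8 * GB,
                             cluster_memory_bytes=256 * GB)
print(choose_engine(100 * MB, resources))        # Engine.SINGLE_NODE
print(choose_engine(50 * GB, resources))         # Engine.IN_MEMORY
print(choose_engine(2 * 1024 * GB, resources))   # Engine.OPTIMIZED_MAPREDUCE
```

A real selector would presumably also weigh the kind of analytics being run, which this sketch ignores.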

DAG Processing
vs.
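
As a rough illustration of DAG processing, the Python sketch below treats a workflow as a dependency graph and groups its stages into waves, where every stage in a wave has all of its inputs ready and could run concurrently. The stage names are borrowed from the funnel slide; the dependencies between them and the helper `execution_waves` are hypothetical. It uses the standard-library `graphlib` module (Python 3.9+), not any Hadoop or Tez API.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each stage maps to the set of stages it depends on.
workflow = {
    "prepare": {"explore"},
    "aggregate": {"prepare"},
    "learn": {"prepare"},
    "present": {"aggregate", "learn"},
}


def execution_waves(dependencies):
    """Return stages grouped into waves that can run in parallel.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # All stages whose dependencies have completed.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(workflow), start=1):
    print(number, wave)
# 1 ['explore']
# 2 ['prepare']
# 3 ['aggregate', 'learn']
# 4 ['present']
```

In this example `aggregate` and `learn` share a wave because both depend only on `prepare`.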

Transparent for End Users

@Datameer