“Big Data” 
Edgars Ruņģis 
1 Copyright © 2012, Oracle and/or its affiliates. All rights 
reserved.
What is Big Data?
VOLUME
• Enormous volumes of real-time data streams from internal and external sources, plus historic data
• Hyper-volumes of structured, semi-structured and unstructured data
VELOCITY
• Combine historic data with data streams and feeds
• Detect significant events from real-time data streams
• Respond automatically to detected events by raising alerts
VARIETY
• Call Data Records, Social Media Traffic, Videos, Audio
• Financial Transactions
• Sensor-based data, border crossings, airline passenger records
[Image labels: Social, Blog, Sensors]
Big Data ≈ Hadoop 
Hadoop Can Be Confusing 
What is Hadoop? 
Hadoop 
The Apache Hadoop software library is a framework that allows for the 
distributed processing of large data sets across clusters of computers 
using simple programming models. Hadoop is designed to scale up from 
single servers to thousands of machines, each offering local 
computation and storage. Rather than rely on hardware to deliver high-availability, 
the library itself is designed to detect and handle failures at 
the application layer, so delivering a highly-available service on top of a 
cluster of computers, each of which may be prone to failures. 
What to Pay Attention To
• Distributed Storage
– HDFS
• Parallel Processing Framework
– MapReduce
• Higher-Level Languages
– Hive
– Pig
– Etc.
HDFS
The Distributed Filesystem
• What is it?
– The petabyte-scale distributed file system at the core of Hadoop.
• Benefits
– Linearly scalable on commodity hardware
– An order of magnitude cheaper per TB
– Designed around schema-on-read
• Limitations
– Low security
– Write-once, read-many model
Interacting with HDFS
• NameNodes and DataNodes
– NameNodes hold the filesystem metadata (the namespace and the edit log)
– DataNodes store the data blocks
• Command-line access resembles UNIX filesystems
– ls (list)
– cat, tail (concatenate or tail file)
– cp, mv (copy or move within HDFS)
– get, put (copy between local file system and HDFS)
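The commands listed on this slide can be sketched as argument lists for the `hadoop fs` client. This is a minimal sketch that only builds and prints the command lines; the helper name `hdfs_command` and all paths are hypothetical and chosen for illustration.

```python
import shlex

# Subcommands named on the slide; each is passed to the client as `hadoop fs -<name>`.
HDFS_SUBCOMMANDS = {"ls", "cat", "tail", "cp", "mv", "get", "put"}


def hdfs_command(subcommand, *paths):
    """Return the argv list for `hadoop fs -<subcommand> <paths...>` (hypothetical helper)."""
    if subcommand not in HDFS_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    if not paths:
        raise ValueError("at least one path is required")
    return ["hadoop", "fs", f"-{subcommand}", *paths]


# Hypothetical paths, for illustration only.
examples = [
    hdfs_command("ls", "/tmp/hadoop_dir"),
    hdfs_command("put", "source.file", "/tmp/hadoop_dir"),
    hdfs_command("get", "/tmp/hadoop_dir/source.file", "copy.file"),
]

for argv in examples:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

On a machine with the Hadoop client installed, each list could be handed to `subprocess.run(argv, check=True)` instead of being printed.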
HDFS Mechanics
Suppose we have a large file and a set of DataNodes.
[Diagram: one large file and six DataNodes]
HDFS Mechanics
• The file will be broken up into blocks
• Blocks are stored in multiple locations
• Allows for parallelism and fault-tolerance
• Nodes operate on their local data
[Diagram: blocks of the file replicated across six DataNodes]
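The mechanics above can be sketched in a few lines. This is a simplified illustration, not HDFS's actual placement policy (which is rack-aware): the 128 MB block size, the replication factor of 3, the round-robin placement and the node names dn1 to dn6 are all assumptions made for the example.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of `file_size` bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication):
    """Assign each block to `replication` distinct DataNodes, round-robin (assumed policy)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = {}
    for block in range(num_blocks):
        start = block % len(datanodes)
        placement[block] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a surviving node."""
    return all(
        any(node not in failed for node in nodes) for nodes in placement.values()
    )


# Hypothetical cluster of six DataNodes and a 300 MB file.
datanodes = ["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"]
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_replicas(len(blocks), datanodes, replication=3)

print([size // MB for size in blocks])
print(placement)
print(readable_after_failure(placement, failed={"dn3"}))
```

Because every block lives on several nodes, losing one node leaves the file readable, and several nodes can work on different blocks at the same time.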
MapReduce
The Parallel Processing Framework
• What is it?
– The parallel processing framework that dominates the Big Data landscape.
• Benefits
– Provides data-local computation
– Fault-tolerant
– Scales just like HDFS
• Limitations
– You are the optimizer
– Batch-oriented
MapReduce Mechanics
Suppose 3 face cards are removed.
How do we find which suits are short using MapReduce?
MapReduce Mechanics
Map Phase:
Each TaskTracker has some data local to it.
Map tasks operate on this local data.
If face_card: emit(suit, card)
[Diagram: four TaskTracker/DataNode pairs, each holding part of the deck]
MapReduce Mechanics
Shuffle/Sort:
Intermediate data is shuffled and sorted for delivery to the reduce tasks
[Diagram: map output → Sort → To Reducers]
MapReduce Mechanics
Reduce Phase:
Reducers operate on local data to produce the final result
emit(key, count(key))
[Diagram: four TaskTrackers with their results]
Spades: 3 Hearts: 2 Diamonds: 2 Clubs: 2
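The three phases of the card example can be simulated in plain Python. This is a single-process sketch of the idea, not Hadoop code: the choice of which three face cards are removed and the way the deck is split across four nodes are assumptions made for illustration.

```python
from collections import defaultdict

SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
FACE_RANKS = {"J", "Q", "K"}

# Hypothetical choice of the three removed face cards.
REMOVED = {("Hearts", "J"), ("Diamonds", "Q"), ("Clubs", "K")}

deck = [(s, r) for s in SUITS for r in RANKS if (s, r) not in REMOVED]


def map_task(cards):
    """Map: emit (suit, card) for every face card in this node's local split."""
    for suit, rank in cards:
        if rank in FACE_RANKS:
            yield suit, (suit, rank)


def shuffle(pairs):
    """Shuffle/sort: group the values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_task(key, values):
    """Reduce: emit (key, count(key))."""
    return key, len(values)


# Four splits stand in for the data local to four TaskTracker/DataNode pairs.
splits = [deck[i::4] for i in range(4)]
mapped = [pair for split in splits for pair in map_task(split)]
counts = dict(reduce_task(key, values) for key, values in shuffle(mapped))

# A suit that lost all of its face cards would not appear in `counts` at all.
short_suits = sorted(suit for suit in SUITS if counts.get(suit, 0) < 3)

print(counts)
print(short_suits)
```

A full deck has three face cards per suit, so any suit with a count below 3 is short.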
Flow of key/value pairs
Phase    Input          Output
Map      K0, V0         K1, V1
Reduce   K1, list(V1)   K2, V2
The default way to process data into Hadoop 
SQL style!
What should I do if I don’t know Java but still want to process data in HDFS?
HDFS + MapReduce = Hadoop
Use Hive!
[Diagram labels: Application, NoSQL DB Driver, NoSQL, BigData, Hadoop]
Hive
A move toward declarative language
• What is it?
– A SQL-like language for Hadoop.
• Benefits
– Abstracts MapReduce code
– Schema-on-read via InputFormat and SerDe
– Provides and preserves metadata
• Limitations
– Not ideal for ad hoc work (slow)
– Subset of SQL-92
– Immature optimizer
Storing a Clickstream
• Storing large amounts of clickstream data is a common use for HDFS
• Individual clicks aren’t valuable by themselves
• We’d like to write queries over all clicks
Defining Tables Over HDFS
• Hive allows us to define tables over HDFS directories
• The syntax is simple SQL
• SerDes allow Hive to deserialize data
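The role a SerDe plays in schema-on-read can be sketched in plain Python: the raw lines stay as text on disk, and a schema is applied only when they are read. This is an illustration of the idea, not Hive's SerDe interface; the `Click` record, its three columns, the tab delimiter and the sample lines are all hypothetical, and skipping bad lines is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Click:
    """Hypothetical table schema applied at read time."""
    ts: datetime
    user_id: int
    url: str


def deserialize(line, delimiter="\t"):
    """Turn one raw text line into a typed Click; unparseable lines become None."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != 3:
        return None
    try:
        return Click(
            ts=datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S"),
            user_id=int(fields[1]),
            url=fields[2],
        )
    except ValueError:
        return None


# Hypothetical raw clickstream lines, as they might sit in an HDFS directory.
raw_lines = [
    "2013-05-01 10:00:00\t42\t/home\n",
    "2013-05-01 10:00:05\t42\t/products\n",
    "corrupted line\n",
]

clicks = [c for c in map(deserialize, raw_lines) if c is not None]
print(len(clicks), clicks[0].url)
```

The files themselves are never rewritten; a different schema could be laid over the same directory by changing only the read-side code.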
How Does It Work 
Anatomy of a Hive Query 
SELECT suit, COUNT(*) 
FROM cards 
WHERE face_value > 10 
GROUP BY suit; 
How does Hive execute 
this query?
Anatomy of a Hive Query
SELECT suit, COUNT(*)
FROM cards
WHERE face_value > 10
GROUP BY suit;
1. Hive optimizer builds a MapReduce job
2. Projections and predicates become Map code
3. Aggregations become Reduce code
4. Job is submitted to the MapReduce JobTracker
Map task: If face_card: emit(suit, card)
Shuffle
Reduce task: emit(suit, count(suit))
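The translation described in steps 2 and 3 can be checked on a small scale. In this sketch SQLite stands in for Hive purely to run the declarative query, and a filter-then-count in Python stands in for the generated map and reduce code; the sample rows and the encoding J=11, Q=12, K=13 are assumptions, and an ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3
from collections import Counter

# Hypothetical rows of the cards table: (suit, face_value), with J=11, Q=12, K=13.
cards = [
    ("Spades", 11), ("Spades", 12), ("Spades", 13),
    ("Hearts", 12), ("Hearts", 13), ("Hearts", 7),
    ("Clubs", 11), ("Clubs", 2),
]

# Declarative form: SQLite is only a stand-in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cards (suit TEXT, face_value INTEGER)")
conn.executemany("INSERT INTO cards VALUES (?, ?)", cards)
sql_rows = conn.execute(
    "SELECT suit, COUNT(*) FROM cards "
    "WHERE face_value > 10 GROUP BY suit ORDER BY suit"
).fetchall()
conn.close()

# Map side: the predicate (face_value > 10) and the projection (suit).
mapped = [suit for suit, face_value in cards if face_value > 10]

# Reduce side: the aggregation (COUNT(*) per suit).
plan_rows = sorted(Counter(mapped).items())

print(sql_rows)
print(plan_rows == sql_rows)
```

Both forms produce the same rows; the difference is that with Hive the map and reduce code is generated from the query rather than written by hand.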
Hadoop Programming - Summary 
• HDFS - Hadoop Distributed File System 
– Designed to achieve SCALE and aggregate THROUGHPUT on commodity hardware 
– Not a database; data remains in its original format on disk (the query engine does NOT own the data format)
• MapReduce 
– Simple programming model for large scale data processing (NOT performance !!!) 
– Checkpoints to disk for fault tolerance 
– Leverages InputFormat, RecordReader and SerDe 
– Leverages multiple copies of data (speculative execution) 
• Various higher level languages 
– Hive (SQL implemented as MapReduce) 
– Pig - Scripting Languages like Python 
Impala 
What is Impala?
• Massively parallel processing (MPP) database engine, developed by Cloudera
• Integrated into the Hadoop stack at the same level as MapReduce, not above it (as Hive and Pig are)
• Impala processes data in the Hadoop cluster without using MapReduce
[Stack diagram: Pig and Hive on top of MapReduce; MapReduce and Impala side by side on top of HDFS]
Impala Architecture 
My Query Needs to Go Faster
• Impala, Spark, Stinger, Shark, Tajo, Presto and a whole bunch more
– Remove MapReduce and checkpointing to disk
– Sacrifice resilience and scale for performance
– Limited SQL capability and access paths
– Create a (new) SQL database on top of HDFS
– Create and load data into optimized storage formats to optimize performance
How to load data in Hadoop ? 
Hadoop command-line
• The source server must have the Hadoop client installed.
• A file can be loaded with the following command:
hadoop fs -put source.file /tmp/hadoop_dir
Load data into Hadoop from RDBMS (Sqoop) 
Acquire stream data. Flume 
http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html
Flume for collecting Twitter feeds 
Flume – twitter collecting 
http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html
Hadoop distributions
Cloudera 
Hortonworks 
Big Data and Oracle 
Oracle Big Data Solution
[Diagram: pipeline stages Stream – Acquire – Organize – Analyze – Decide]
Components shown: Oracle Event Processing, Apache Flume, Oracle GoldenGate, Cloudera Hadoop, Oracle NoSQL Database, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Database, Oracle Advanced Analytics, Oracle Spatial & Graph, Endeca Information Discovery, Oracle BI Foundation Suite, Oracle Real-Time Decisions
Oracle Big Data Connectors 
Licensed Together 
• Oracle Loader for Hadoop 
• Oracle SQL Connector for HDFS 
• Oracle R Advanced Analytics for Hadoop 
• Oracle XQuery for Hadoop 
• Oracle Data Integrator Application Adapters for Hadoop 
Oracle Loader for Hadoop
Load data from Hadoop into Oracle Database
[Diagram: unstructured data flows through INPUT, MAP, SHUFFLE/SORT and REDUCE stages in Hadoop; the REDUCE output is loaded into a local Oracle table in Oracle Database]
Oracle SQL Connector for HDFS
Use Oracle SQL to Access Data on HDFS
• Generate external table in database pointing to HDFS data
• Load into database or query data in place on HDFS
• Fine-grained control over data type mapping
• Parallel load with automatic load balancing
• Kerberos authentication
Access or load into the database in parallel using external table mechanism
[Diagram: in Oracle Database, a SQL Query reads an External Table through several OSCH processes, each using an HDFS Client to read from Hadoop]
New Data Sources for Oracle External Tables
CREATE TABLE movielog
  (click VARCHAR2(4000))
ORGANIZATION EXTERNAL
  ( TYPE ORACLE_HIVE
    DEFAULT DIRECTORY Dir1
    ACCESS PARAMETERS
    (
      com.oracle.bigdata.tablename logs
      com.oracle.bigdata.cluster mycluster
    )
  )
REJECT LIMIT UNLIMITED
• New set of properties
– ORACLE_HIVE and ORACLE_HDFS access drivers
– Identify a Hadoop cluster, data source, column mapping, error handling, overflow handling, logging
• New table metadata passed from Oracle DDL to Hadoop readers at query execution
• Architected for extensibility
– StorageHandler capability enables future support for other data sources
– Examples: MongoDB, HBase, Oracle NoSQL DB
Oracle SQL Connector for HDFS 
• Load data from external table with Oracle SQL 
– INSERT INTO <tablename> SELECT * FROM <external tablename>
• Access data in-place on HDFS with Oracle SQL 
– Note: No indexes, no partitioning, so queries are a full table scan 
• Read data in parallel 
– Ex: If there are 96 data files and the database can support 96 PQ slaves, all 96 files can 
be read in parallel 
Oracle SQL Connector for HDFS 
Input Data Formats 
• Text files 
• Hive tables (text data) 
• Oracle Data Pump files generated by Oracle Loader for 
Hadoop 
Oracle SQL Connector for HDFS 
Data Pump Files 
• Oracle Data Pump: Binary format data file 
• Load of Oracle Data Pump files is more efficient – uses about 50% 
less database CPU 
– Hadoop does more of the work, transforming text data into binary data 
optimized for Oracle 
Note: Only Oracle Data Pump files generated by Oracle Loader for Hadoop are supported
Using Hadoop To Optimize IT 
Big Data and Optimized Operations 
• Big Data can handle a lot of heavy lifting 
– It’s a complement to the database 
• Big Data allows access to more detailed data for less cost
• We can use Big Data to make the database do more 
Optimizing ETL
[Diagram labels: Base Table; Long-running batch transformation (Big Data Problem); Copy/Move Base Table to Hadoop; Long-running batch transformation; Load to Oracle; Mission Critical Reporting; Ad Hoc Analysis]
Q&A
Top 15 MySQL parameters
 
Riga Dev Day vestule
Riga Dev Day vestuleRiga Dev Day vestule
Riga Dev Day vestule
 
Rdd2016 featured talks
Rdd2016 featured talksRdd2016 featured talks
Rdd2016 featured talks
 
Rdd2016 flyer
Rdd2016 flyerRdd2016 flyer
Rdd2016 flyer
 
meetup #15
meetup #15meetup #15
meetup #15
 
OTN tour 2015 press release in Russian
OTN tour 2015 press release in RussianOTN tour 2015 press release in Russian
OTN tour 2015 press release in Russian
 
OTN tour 2015, 100miles
OTN tour 2015, 100milesOTN tour 2015, 100miles
OTN tour 2015, 100miles
 
OTN tour 2015 benchmarking oracle io performance with Orion by Alex Gorbachev
OTN tour 2015 benchmarking oracle io performance with Orion by Alex GorbachevOTN tour 2015 benchmarking oracle io performance with Orion by Alex Gorbachev
OTN tour 2015 benchmarking oracle io performance with Orion by Alex Gorbachev
 
OTN tour 2015 Oracle Enterprise Manager 12c – Proof of Concept
OTN tour 2015 Oracle Enterprise Manager 12c – Proof of ConceptOTN tour 2015 Oracle Enterprise Manager 12c – Proof of Concept
OTN tour 2015 Oracle Enterprise Manager 12c – Proof of Concept
 
OTN tour Oracle db Cloud by Alex Gorbachev
OTN tour Oracle db Cloud by Alex GorbachevOTN tour Oracle db Cloud by Alex Gorbachev
OTN tour Oracle db Cloud by Alex Gorbachev
 
OTN tour 2015 Experience in implementing SSL between oracle db and oracle cli...
OTN tour 2015 Experience in implementing SSL between oracle db and oracle cli...OTN tour 2015 Experience in implementing SSL between oracle db and oracle cli...
OTN tour 2015 Experience in implementing SSL between oracle db and oracle cli...
 
OTN tour 2015 AWR data mining
OTN tour 2015 AWR data miningOTN tour 2015 AWR data mining
OTN tour 2015 AWR data mining
 

Último

4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsManeerUddin
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 

Último (20)

4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 

Big data overview by Edgars

  • 1. “Big Data” Edgars Ruņģis 1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 2. What is Big Data? Volume, velocity, variety. (Sources pictured: social media, blogs, sensors.) Enormous volumes of real-time data streams from internal and external sources and historic data; hyper-volumes of structured, semi-structured and unstructured data. Combine historic data with data streams and feeds; detect significant events from real-time data streams; respond automatically to detected events by raising alerts. Examples: Call Data Records, social media traffic, videos, audio, financial transactions, sensor-based data, border crossings, airline passenger records.
  • 3. Big Data ≈ Hadoop
  • 4. Hadoop Can Be Confusing
  • 5. What is Hadoop?
  • 6. Hadoop: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
  • 7. What to Pay Attention To: distributed storage (HDFS); parallel processing framework (MapReduce); higher-level languages (Hive, Pig, etc.).
  • 8. HDFS, the Distributed Filesystem. What is it? The petabyte-scale distributed file system at the core of Hadoop. Benefits: linearly scalable on commodity hardware; an order of magnitude cheaper per TB; designed around schema-on-read. Limitations: low security; write-once, read-many model.
  • 9. Interacting with HDFS. NameNodes and DataNodes: NameNodes hold the file system metadata (the edit log and the organization of files); DataNodes store the data. Command-line access resembles UNIX filesystems: ls (list); cat, tail (print a file or its tail); cp, mv (copy or move within HDFS); get, put (copy between the local file system and HDFS).
  • 10. HDFS Mechanics: Suppose we have a large file and a set of DataNodes. (Diagram: six DataNodes.)
  • 11. HDFS Mechanics: The file will be broken up into blocks. Blocks are stored in multiple locations. This allows for parallelism and fault tolerance. Nodes operate on their local data.
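The block mechanics on slides 10 and 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 128 MB block size, the replication factor of 3, the six node names and the round-robin placement are all chosen here for the example, and real HDFS uses its own placement policy.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]

def place_replicas(num_blocks, datanodes, replication):
    """Assign each block to `replication` distinct DataNodes (simple round-robin,
    an assumption for illustration, not the real HDFS placement policy)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {block: [datanodes[(block + r) % len(datanodes)]
                    for r in range(replication)]
            for block in range(num_blocks)}

def lost_blocks(placement, failed_nodes):
    """Blocks whose replicas all live on failed nodes."""
    return [block for block, nodes in placement.items()
            if all(node in failed_nodes for node in nodes)]

# A hypothetical 300 MB file on six hypothetical DataNodes.
blocks = split_into_blocks(file_size=300 * MB, block_size=128 * MB)
nodes = [f"datanode{i}" for i in range(1, 7)]
placement = place_replicas(len(blocks), nodes, replication=3)

for block, holders in placement.items():
    print(block, blocks[block], holders)
```

Because every block has several holders, losing one or two nodes in this toy layout leaves all blocks readable, which is the fault-tolerance point the slide makes.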
  • 12. MapReduce, the Parallel Processing Framework. What is it? The parallel processing framework that dominates the Big Data landscape. Benefits: provides data-local computation; fault-tolerant; scales just like HDFS. Limitations: you are the optimizer; batch-oriented.
  • 13. MapReduce Mechanics: Suppose 3 face cards are removed from a deck of cards. How do we find which suits are short using MapReduce?
  • 14. MapReduce Mechanics, Map Phase: Each TaskTracker has some data local to it. Map tasks operate on this local data: if face_card: emit(suit, card). (Diagram: four TaskTracker/DataNode pairs.)
  • 15. MapReduce Mechanics, Shuffle/Sort: Intermediate data is shuffled and sorted for delivery to the reduce tasks.
  • 16. MapReduce Mechanics, Reduce Phase: Reducers operate on local data to produce the final result: emit(key, count(key)). Result: Spades: 3, Hearts: 2, Diamonds: 2, Clubs: 2.
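The three phases of the card example can be imitated in plain Python to make the data flow concrete. This is a single-process sketch, not Hadoop code: the choice of which three face cards are removed, the four "splits" standing in for TaskTracker-local data, and all function names are assumptions made for the illustration.

```python
from collections import defaultdict
from itertools import product

SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
FACE_RANKS = {"J", "Q", "K"}

# Hypothetical choice of the three removed face cards.
removed = {("Hearts", "J"), ("Diamonds", "Q"), ("Clubs", "K")}
deck = [card for card in product(SUITS, RANKS) if card not in removed]

# Four splits stand in for the data local to four TaskTracker/DataNode pairs.
splits = [deck[i::4] for i in range(4)]

def map_task(cards):
    """Map phase: emit (suit, card) for every face card in the local split."""
    for suit, rank in cards:
        if rank in FACE_RANKS:
            yield suit, rank

def shuffle(pairs):
    """Shuffle/sort: group intermediate values by key and sort the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_task(key, values):
    """Reduce phase: emit (key, count of values for that key)."""
    return key, len(values)

intermediate = [pair for split in splits for pair in map_task(split)]
counts = dict(reduce_task(key, values) for key, values in shuffle(intermediate))

# A suit is short if it has fewer than three face cards left.
short_suits = [suit for suit in SUITS if counts.get(suit, 0) < len(FACE_RANKS)]
print(counts)
print(short_suits)
```

The final comparison uses `counts.get(suit, 0)` so that a suit that lost all of its face cards, and therefore never appears as a key, is still reported as short.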
  • 17. Flow of key/value pairs. Map: input (K0, V0), output (K1, V1). Reduce: input (K1, list(V1)), output (K2, V2).
  • 18. The default way to process data in Hadoop
  • 19. SQL style! What should I do if I don’t know Java but still want to process data in HDFS? Use Hive! (Diagram: Application, NoSQL DB driver, NoSQL; HDFS + MapReduce = Hadoop; Big Data.)
  • 20. Hive: a move toward declarative language. What is it? A SQL-like language for Hadoop. Benefits: abstracts MapReduce code; schema-on-read via InputFormat and SerDe; provides and preserves metadata. Limitations: not ideal for ad hoc work (slow); subset of SQL-92; immature optimizer.
  • 21. Storing a Clickstream: Storing large amounts of clickstream data is a common use for HDFS. Individual clicks aren’t valuable by themselves. We’d like to write queries over all clicks.
  • 22. Defining Tables Over HDFS: Hive allows us to define tables over HDFS directories. The syntax is simple SQL. SerDes allow Hive to deserialize data.
  • 23. How Does It Work? Anatomy of a Hive Query: SELECT suit, COUNT(*) FROM cards WHERE face_value > 10 GROUP BY suit; How does Hive execute this query?
  • 24. Anatomy of a Hive Query: SELECT suit, COUNT(*) FROM cards WHERE face_value > 10 GROUP BY suit; 1. The Hive optimizer builds a MapReduce job. 2. Projections and predicates become Map code. 3. Aggregations become Reduce code. 4. The job is submitted to the MapReduce JobTracker. Map task: if face_card: emit(suit, card). Shuffle. Reduce task: emit(suit, count(suit)).
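The translation described on slide 24 can be mimicked with a toy "planner" that turns a WHERE predicate and a GROUP BY column into a mapper and a reducer. This is a hedged sketch, not how Hive is implemented: `plan_group_count`, `run_job`, the sample rows and the encoding of face cards as `face_value` 11 to 13 are all assumptions made for the example.

```python
from collections import defaultdict

def plan_group_count(predicate, group_key):
    """Build (mapper, reducer) for: SELECT group_key, COUNT(*) ... WHERE predicate GROUP BY group_key."""
    def mapper(row):
        # The predicate (WHERE) and the projection (group column) run map-side.
        if predicate(row):
            yield row[group_key], 1

    def reducer(key, values):
        # The aggregation (COUNT(*)) runs reduce-side.
        return key, sum(values)

    return mapper, reducer

def run_job(rows, mapper, reducer):
    """Run map, shuffle/sort and reduce in one process."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))

# Hypothetical sample of the cards table.
cards = [
    {"suit": "Spades", "face_value": 11},
    {"suit": "Spades", "face_value": 12},
    {"suit": "Spades", "face_value": 13},
    {"suit": "Spades", "face_value": 5},
    {"suit": "Hearts", "face_value": 11},
    {"suit": "Hearts", "face_value": 12},
    {"suit": "Hearts", "face_value": 2},
]

mapper, reducer = plan_group_count(lambda row: row["face_value"] > 10, "suit")
result = run_job(cards, mapper, reducer)
print(result)
```

Emitting a 1 per matching row and summing in the reducer, instead of shipping whole cards, keeps the intermediate data small, which is one reason a generated plan can differ from the hand-written pseudocode on the slide.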
  • 25. Hadoop Programming: Summary. HDFS (Hadoop Distributed File System): designed to achieve SCALE and aggregate THROUGHPUT on commodity hardware; not a database: data remains in its original format on disk (the query engine does NOT own the data format). MapReduce: simple programming model for large-scale data processing (NOT performance!); checkpoints to disk for fault tolerance; leverages InputFormat, RecordReader and SerDe; leverages multiple copies of data (speculative execution). Various higher-level languages: Hive (SQL implemented as MapReduce); Pig (scripting language, like Python).
  • 26. Impala
  • 27. What is Impala? A massively parallel processing (MPP) database engine developed by Cloudera. It is integrated into the Hadoop stack on the same level as MapReduce, not above it (as Hive and Pig are). Impala processes data in a Hadoop cluster without using MapReduce. (Diagram: Pig and Hive sit on MapReduce; MapReduce and Impala sit on HDFS.)
  • 28. Impala Architecture
  • 29. My query needs to go faster. Impala, Spark, Stinger, Shark, Tajo, Presto and a whole bunch more: remove MapReduce and checkpointing to disk; sacrifice resilience and scale for performance; limited SQL capability and access paths; create a (new) SQL database on top of HDFS; create and load data into optimized storage formats to optimize performance.
  • 30. How to load data into Hadoop?
  • 31. Hadoop command line: The source server should have the Hadoop client program installed. A file may be loaded with the following command: hadoop fs -put source.file /tmp/hadoop_dir
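When the upload on slide 31 has to be scripted, the same command can be driven from Python. The sketch below is an assumption about how one might wrap it, not part of the deck: the helper names are hypothetical, and `hdfs_put` only works on a machine where the Hadoop client is installed and on the PATH.

```python
import shlex
import subprocess

def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list for `hadoop fs -put <local_path> <hdfs_dir>`."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]

def hdfs_put(local_path, hdfs_dir):
    """Run the upload; raises CalledProcessError if the hadoop client reports failure."""
    subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)

# Only build and show the command here, so the sketch runs without a cluster.
command = hdfs_put_command("source.file", "/tmp/hadoop_dir")
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.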
  • 32. Load data into Hadoop from an RDBMS (Sqoop)
  • 33. Acquire stream data: Flume. Source: http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html
  • 34. Flume for collecting Twitter feeds. Source: http://archive.cloudera.com/cdh4/cdh/4/flume-ng/FlumeUserGuide.html
  • 35. Hadoop distributions
  • 36. Cloudera
  • 37. Hortonworks
  • 38. Big Data and Oracle
  • 39. Oracle Big Data Solution (diagram). Phases: Stream, Acquire, Organize, Analyze, Decide. Components shown: Oracle Event Processing, Apache Flume, Oracle GoldenGate, Oracle NoSQL Database, Cloudera Hadoop, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Database, Oracle Advanced Analytics, Oracle Spatial & Graph, Oracle BI Foundation Suite, Endeca Information Discovery, Oracle Real-Time Decisions.
  • 40. Oracle Big Data Connectors (licensed together): Oracle Loader for Hadoop; Oracle SQL Connector for HDFS; Oracle R Advanced Analytics for Hadoop; Oracle XQuery for Hadoop; Oracle Data Integrator Application Adapters for Hadoop.
  • 41. Oracle Loader for Hadoop: Load data from Hadoop into Oracle Database. (Diagram: input data, including unstructured data, passes through map, shuffle/sort and reduce tasks and is loaded into a local Oracle table in Oracle Database.)
  • 42. Oracle SQL Connector for HDFS: Use Oracle SQL to access data on HDFS. Generate an external table in the database pointing to HDFS data. Load into the database or query data in place on HDFS. Fine-grained control over data type mapping. Parallel load with automatic load balancing. Kerberos authentication. Access or load into the database in parallel using the external table mechanism. (Diagram: a SQL query in Oracle Database reads an external table through OSCH processes and the HDFS client from Hadoop.)
  • 43. New Data Sources for Oracle External Tables: CREATE TABLE movielog (click VARCHAR2(4000)) ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY Dir1 ACCESS PARAMETERS (com.oracle.bigdata.tablename logs com.oracle.bigdata.cluster mycluster)) REJECT LIMIT UNLIMITED. New set of properties: ORACLE_HIVE and ORACLE_HDFS access drivers; identify a Hadoop cluster, data source, column mapping, error handling, overflow handling, logging. New table metadata is passed from Oracle DDL to Hadoop readers at query execution. Architected for extensibility: StorageHandler capability enables future support for other data sources; examples: MongoDB, HBase, Oracle NoSQL DB.
  • 44. Oracle SQL Connector for HDFS. Load data from an external table with Oracle SQL: INSERT INTO <tablename> SELECT * FROM <external tablename>. Access data in place on HDFS with Oracle SQL (note: no indexes and no partitioning, so queries are a full table scan). Read data in parallel; for example, if there are 96 data files and the database can support 96 PQ slaves, all 96 files can be read in parallel.
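The arithmetic behind the 96-file example can be sketched as a simple assignment of data files to parallel query (PQ) slaves. The round-robin scheme, the function name and the file names below are assumptions for illustration only; they are not the connector's actual load-balancing logic.

```python
def assign_files(files, num_workers):
    """Distribute files over workers round-robin and return one list per worker."""
    buckets = [[] for _ in range(num_workers)]
    for index, name in enumerate(files):
        buckets[index % num_workers].append(name)
    return buckets

# Hypothetical data file names.
files = [f"part-{i:05d}" for i in range(96)]

one_each = assign_files(files, 96)     # 96 PQ slaves: every file is read at the same time
three_each = assign_files(files, 32)   # 32 PQ slaves: each slave reads three files in turn

print(max(len(bucket) for bucket in one_each))
print(max(len(bucket) for bucket in three_each))
```

With as many slaves as files, the longest queue has one file, so the whole read is a single parallel pass; with fewer slaves the longest queue sets the number of passes.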
  • 45. Oracle SQL Connector for HDFS, input data formats: text files; Hive tables (text data); Oracle Data Pump files generated by Oracle Loader for Hadoop.
  • 46. Oracle SQL Connector for HDFS, Data Pump files: Oracle Data Pump is a binary data file format. Loading Oracle Data Pump files is more efficient: it uses about 50% less database CPU, because Hadoop does more of the work, transforming text data into binary data optimized for Oracle. Note: only Oracle Data Pump files generated by Oracle Loader for Hadoop are supported.
  • 47. Using Hadoop to Optimize IT
  • 48. Big Data and Optimized Operations: Big Data can handle a lot of heavy lifting; it’s a complement to the database. Big Data allows access to more detailed data for less. We can use Big Data to make the database do more.
  • 49. Optimizing ETL Load to Oracle (diagram). Problem: the base table serves mission-critical reporting, ad hoc analysis and a long-running batch transformation. Big Data: copy/move the base table to Hadoop and run the long-running batch transformation there.
  • 50. Q&A