Big Data Analysis Tools & Methods
Spring 2015
OCCC - Tehran
Personal Profile:
● Ehsan Derakhshan
● Founder & CEO at innfinision Cloud & BigData Solutions
● More than 15 years of experience (Telecom & Datacom)
● Ehsan.derakhshan@innfinision.net
● Innfinision.net
About innfinision:
● Providing Cloud, Virtualization and Data Center Solutions
● BigData Management - Analysis & Development Solutions
● Developing Software for Cloud Environments
● Providing Services to Telecom, Education, Banking & more...
● Supporting OpenStack Foundation as the First Iranian Company
● Partner of: Docker - MongoDB - RedHat
BigData Analysis Tools & Methods innfinision.net
Agenda:
● What is Data & BigData?
● Important Questions
● Tools & Solutions
● Advantages - Why & Where
What is Data & BigData ?
What is Data?
Data is a collection of facts, such as numbers, words, measurements, observations or
even just descriptions of things.
Data can exist in a variety of forms -- as numbers or text on pieces of paper, as bits and
bytes stored in electronic memory, or as facts stored in a person's mind. Strictly
speaking, data is the plural of datum, a single piece of information.
Big data can be described by the following characteristics:
1- Volume
2- Velocity
3- Variety
4- Variability
5- Veracity
6- Complexity
7- And others
These characterize information assets that demand cost-effective, innovative forms of
information processing for enhanced insight and decision making.
Important Questions
Important Question:
Can a database really deliver quantifiable business advantage?
To some, the database is a low-level infrastructure component of a much larger
application -- something that only developers, DBAs and operations staff need to
care or worry about.
However, in the digital economy, data is the raw currency. How an organization
stores, manages, analyzes and uses data has a direct impact on its success -- and its
costs. Its choice of database affects how quickly it can deliver new applications to
market, support business growth and improve customer experience.
Consider these examples:
- After trying for eight years to build a single view of their customer, one of the
world's leading insurance companies changed database and delivered the project
in just three months
- A leading telecommunications provider adopted a new database technology and
was able to accelerate time to market by 4x, reduce engineering costs by 50%
and improve customer experience by 10x
- A Tier 1 investment bank rebuilt its globally-distributed reference data platform
on a new database technology, enabling it to save an estimated $40M over five
years
- Singles can now find their ideal partner 95% faster after one of the world's leading
relationship providers switched data and machine learning to a new platform
So why is database selection becoming so critical?
Because the requirements of modern applications and the demands of
sophisticated, data-savvy users are changing.
Data is being generated at much faster rates than ever before and can yield
insights never previously possible. The data no longer fits neatly into structured
rows and columns. Windows of market opportunity are getting smaller. Underlying
infrastructure is being commoditized, with powerful systems available for just
pennies per hour.
The database chosen by a project team can be the enabler -- or the blocker -- to
success. All of the assumptions that have dictated database selection over the
past 30 years are being revisited as a result of the factors discussed above.
Challenges for Database Selection:
- Risk tolerance for bugs and unmapped behaviors
- High availability (HA)
- Redundancy
- Access- and location-based requirements
- Security requirements
- Skill sets and tooling
- Architecture and infrastructure
- Growth expectations and the timeline therein (Scalable)
- Support? Community?
- Free Schema (Flexible Data Model)
- Scale Out
- Real-time
- Rich Queries
- Migration
- Drivers
- Faster
- Agile
- Backup/Restore
- Monitoring & …
Tools & Solutions
Innfinision BigData Solutions:
1- MongoDB :
MongoDB (from 'humongous') is a scalable, high-performance, open-source,
schema-free, document-oriented database.
MongoDB provides high performance, high availability, and easy scalability.
It is a document database: documents (objects) map nicely to programming language
data types. Embedded documents and arrays reduce the need for joins. A dynamic
schema makes polymorphism easier.
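To illustrate how documents map onto programming language data types, the following minimal sketch builds an order record as a plain Python dictionary with one embedded document and one array. All field names (`customer`, `items` and so on) are hypothetical, and no MongoDB server or driver is involved; the snippet only shows the shape such a document could take and that it passes through JSON-style text unchanged.

```python
import json

# Hypothetical order document: every field name here is an illustrative assumption.
order = {
    "order_id": 1001,
    # Embedded document: customer details live inside the order,
    # so no join against a separate customers table is needed.
    "customer": {"name": "Sara", "city": "Tehran"},
    # Array of embedded documents: one entry per line item.
    "items": [
        {"sku": "A-1", "qty": 2, "price": 10.0},
        {"sku": "B-7", "qty": 1, "price": 25.5},
    ],
}

# Nested fields are read with ordinary dictionary and list access.
total = sum(item["qty"] * item["price"] for item in order["items"])

# The document round-trips through JSON-style text without a mapping layer.
restored = json.loads(json.dumps(order))
```

Because the line items are stored inside the order, computing the total needs only one document rather than a join across two tables.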
2- PyTables :
PyTables is a package for managing hierarchical datasets, designed to cope efficiently
with extremely large amounts of data.
It is built on top of the HDF5 library and the NumPy package. It features an
object-oriented interface that, combined with C extensions for the performance-critical
parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use
tool for interactively saving and retrieving very large amounts of data. One
important feature of PyTables is that it optimizes memory and disk resources so
that data takes much less space (by a factor of 3 to 5, and more if the data is
compressible) than with other solutions such as relational or object-oriented
databases.
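As a rough sketch of what a hierarchical dataset looks like in practice, the example below writes a small table of sensor readings into an HDF5 file and queries it back. It assumes the third-party `tables` (PyTables) package is installed; the `Reading` schema, the group and table names, the `units` attribute and the synthetic values are all hypothetical choices made for illustration.

```python
import os
import tempfile

import tables  # third-party PyTables package (assumed to be installed)


class Reading(tables.IsDescription):
    """Hypothetical row layout: one sensor reading per row."""
    sensor_id = tables.Int32Col()
    value = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "readings.h5")

# Write: a group acts like a folder, a table like a typed, on-disk array of rows.
with tables.open_file(path, mode="w", title="Sensor archive") as h5:
    group = h5.create_group("/", "sensors", "Sensor data")
    table = h5.create_table(
        group, "readings", Reading, "Raw readings",
        filters=tables.Filters(complevel=5, complib="zlib"),  # on-disk compression
    )
    table.attrs.units = "celsius"  # metadata stored next to the data
    row = table.row
    for i in range(1000):
        row["sensor_id"] = i % 10
        row["value"] = 20.0 + (i % 7)  # synthetic values between 20 and 26
        row.append()
    table.flush()

# Read: the condition is evaluated by PyTables while scanning the table on disk.
with tables.open_file(path, mode="r") as h5:
    table = h5.root.sensors.readings
    hot = table.read_where("value > 25.0")
    units = table.attrs.units
```

The result `hot` is a NumPy structured array, so it can be passed straight to NumPy-based analysis code.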
3- Blosc :
Blosc is a high performance compressor optimized for binary data. It has been
designed to transmit data to the processor cache faster than the traditional, non-
compressed, direct memory fetch approach via a memcpy OS call. Blosc is the first
compressor (that I'm aware of) that is meant not only to reduce the size of large
datasets on-disk or in-memory, but also to accelerate memory-bound
computations.
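Blosc itself is a C library, so the sketch below does not call it. As a stand-in, it uses Python's standard `zlib` module together with a byte shuffle step, which regroups the bytes of fixed-width values so that bytes of equal significance sit together; Blosc offers a shuffle filter of this kind ahead of its compression codec. The `shuffle` and `unshuffle` helpers and the sample data are hypothetical, and the example shows only the lossless round trip and the size reduction, not the speed claims made above.

```python
import struct
import zlib

ITEM_SIZE = 8  # bytes per value (64-bit integers)


def shuffle(raw: bytes, item_size: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    return b"".join(raw[i::item_size] for i in range(item_size))


def unshuffle(shuffled: bytes, item_size: int) -> bytes:
    """Invert shuffle(): interleave the byte planes back into whole items."""
    count = len(shuffled) // item_size
    out = bytearray(len(shuffled))
    for i in range(item_size):
        out[i::item_size] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


# Synthetic binary data: 10,000 consecutive 64-bit little-endian integers.
values = list(range(10_000))
raw = struct.pack(f"<{len(values)}q", *values)

# Compress the shuffled bytes, then reverse both steps to recover the input.
packed = zlib.compress(shuffle(raw, ITEM_SIZE))
restored = unshuffle(zlib.decompress(packed), ITEM_SIZE)
```

For slowly varying numeric data, the high-order bytes of neighbouring values are mostly identical, which is why grouping them tends to help a general-purpose compressor.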
4- Blaze :
Blaze is a high-level user interface for databases and array computing systems. It
consists of the following components:
- A symbolic expression system to describe and reason about analytic queries
- A set of interpreters from that query system to various databases /
computational engines
This architecture allows a single piece of Blaze code to run against several
computational backends. Blaze interacts rapidly with the user and only communicates
with the database when necessary. Blaze is also able to analyze and optimize queries
to improve the interactive experience.
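The split between a symbolic expression and its interpreters can be illustrated with a toy example. This is not Blaze's API: the `Column`, `Greater`, `to_sql` and `run_in_python` names are hypothetical, and only a single "greater than" comparison is supported. The point is merely that one expression object can be translated to SQL text for a database (here an in-memory SQLite database) or evaluated directly against in-memory rows.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Symbolic reference to a column; comparing it builds an expression."""
    name: str

    def __gt__(self, value):
        return Greater(self, value)


@dataclass(frozen=True)
class Greater:
    """Symbolic 'column > value' node; it holds no data and runs nothing."""
    column: Column
    value: float


def to_sql(expr: Greater, table: str) -> str:
    """Interpreter 1: translate the expression into SQL text.
    Plain string formatting is used for brevity; real code should bind parameters."""
    return f"SELECT * FROM {table} WHERE {expr.column.name} > {expr.value!r}"


def run_in_python(expr: Greater, rows: list) -> list:
    """Interpreter 2: evaluate the same expression over a list of dicts."""
    return [row for row in rows if row[expr.column.name] > expr.value]


# One symbolic query, written once.
expr = Column("amount") > 100

rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}, {"id": 3, "amount": 300}]

# Backend A: plain Python objects.
python_ids = [row["id"] for row in run_in_python(expr, rows)]

# Backend B: a SQL database receives the translated query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (:id, :amount)", rows)
sql = to_sql(expr, "orders")
sql_ids = [row[0] for row in conn.execute(sql + " ORDER BY id")]
conn.close()
```

Both backends return the same rows even though the query was written only once, which is the property the architecture above is aiming for.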
Advantages - Why - Where
MongoDB Advantages:
Any relational database has a typical schema design that shows the number of tables
and the relationships between them. MongoDB, by contrast, has no concept of
relationships.
Advantages of MongoDB over RDBMS
- Schema-less: MongoDB is a document database in which one collection can hold
different documents. The number of fields, content and size can differ from one
document to another.
- The structure of a single object is clear.
- No complex joins.
- Deep query-ability: MongoDB supports dynamic queries on documents using a
document-based query language that is nearly as powerful as SQL.
- Tuning
- Ease of scale-out: MongoDB is easy to scale.
- No conversion or mapping of application objects to database objects is needed.
- It uses internal memory to store the (windowed) working set, enabling faster
access to data.
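A small in-memory imitation can make the schema-less and query-by-document ideas concrete. The `matches` helper below is hypothetical and is not MongoDB's query engine; it handles only equality and the `$gt` comparison, applied to plain Python dictionaries whose fields are illustrative.

```python
def matches(document: dict, query: dict) -> bool:
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        if field not in document:
            return False  # documents need not share fields
        value = document[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gt": 30}; only $gt is handled in this sketch.
            for op, operand in condition.items():
                if op != "$gt":
                    raise ValueError(f"unsupported operator: {op}")
                if not value > operand:
                    return False
        elif value != condition:
            return False  # plain value means equality
    return True


# One "collection" holding documents with different sets of fields.
users = [
    {"name": "Ali", "age": 34, "city": "Tehran"},
    {"name": "Mina", "age": 28},
    {"name": "Reza", "age": 41, "city": "Shiraz", "tags": ["admin"]},
]

# Queries are themselves documents, so they can be built dynamically at runtime.
over_30 = [u["name"] for u in users if matches(u, {"age": {"$gt": 30}})]
in_tehran = [u["name"] for u in users if matches(u, {"city": "Tehran"})]
```

Note that the second query simply skips the document without a `city` field; no schema change is needed to store or filter such records.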
Why use MongoDB?
- Document-oriented storage: data is stored in the form of JSON-style
documents
- Index on any attribute
- Replication & High Availability
- Auto-Sharding
- Rich Queries
- Fast In-Place Updates
- Professional Support
Where to use MongoDB?
- Big Data
- Content Management and Delivery
- Mobile and Social Infrastructure
- User Data Management
- Data Hub
Why use PyTables?
PyTables can be used in any scenario where you need to save and retrieve large
amounts of data and provide metadata (that is, data about actual data) for it.
Whether you want to work with large datasets of (potentially multidimensional)
data, save and structure your NumPy datasets, or just provide a categorized
structure for some portions of your cluttered RDBMS, give PyTables a try. It
works well for storing data from data acquisition systems, sensors in geosciences,
simulation software, network data monitoring systems or as a centralized
repository for system logs, to name only a few possible uses.
However, it's important to emphasize the fact that PyTables is not designed to
work as a relational database competitor, but rather as a teammate. For example,
if you have very large tables in your existing relational database, then you can
move those tables to PyTables so as to reduce the burden of your existing
database while efficiently keeping those huge tables on-disk.
Why use Blosc?
- It is a multi-threaded compressor that can transmit data from caches to memory,
and back
- Its speed can exceed that of an OS memcpy()
Why use Blaze?
Because Blaze is a query system that looks like NumPy/Pandas. You write Blaze
queries, Blaze translates those queries to something else (like SQL), and ships
those queries to various databases to run on other people's fast code. It smooths
out this process to make interacting with foreign data as accessible as using
Pandas. This is actually quite difficult.
Ehsan Derakhshan
Ehsan.Derakhshan@innfinision.net
innfinision.net
Thank you

Mais conteúdo relacionado

Mais procurados

Intro to big data and applications - day 2
Intro to big data and applications - day 2Intro to big data and applications - day 2
Intro to big data and applications - day 2Parviz Vakili
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data ScienceBrijeshGoyani
 
Open Source Tools for Big Data
Open Source Tools for Big DataOpen Source Tools for Big Data
Open Source Tools for Big DataTeemu Heikkilä
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataVipin Batra
 
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...Edureka!
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIMC Institute
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021Dendej Sawarnkatat
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data AnalyticsS P Sajjan
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introductionSujaMaryD
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 

Mais procurados (20)

Intro to big data and applications - day 2
Intro to big data and applications - day 2Intro to big data and applications - day 2
Intro to big data and applications - day 2
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Big Data Hadoop
Big Data HadoopBig Data Hadoop
Big Data Hadoop
 
Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data Science
 
Open Source Tools for Big Data
Open Source Tools for Big DataOpen Source Tools for Big Data
Open Source Tools for Big Data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
Big Data vs Data Science vs Data Analytics | Demystifying The Difference | Ed...
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data Science
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introduction
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 

Semelhante a BigData Analysis

Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBMongoDB
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackAnant Corporation
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Prof.Balakrishnan S
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBMongoDB
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesDenodo
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Daniel Zivkovic
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading StrategiesMongoDB
 
Overcoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDBOvercoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDBMongoDB
 
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Denodo
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL TechnologiesAmit Singh
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?Denodo
 
Traditional data word
Traditional data wordTraditional data word
Traditional data wordorcoxsm
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Connecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationConnecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationDenodo
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 

Semelhante a BigData Analysis (20)

Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 
La creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDBLa creación de una capa operacional con MongoDB
La creación de una capa operacional con MongoDB
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
AtomicDBCoreTech_White Papaer
AtomicDBCoreTech_White PapaerAtomicDBCoreTech_White Papaer
AtomicDBCoreTech_White Papaer
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
Overcoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDBOvercoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDB
 
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
 
Big Data using NoSQL Technologies
Big Data using NoSQL TechnologiesBig Data using NoSQL Technologies
Big Data using NoSQL Technologies
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?
 
Traditional data word
Traditional data wordTraditional data word
Traditional data word
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Connecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationConnecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data Virtualization
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
SegmentOfOne
SegmentOfOneSegmentOfOne
SegmentOfOne
 

Mais de Innfinision Cloud and BigData Solutions (6)

OpenStack as an Infrastructure
OpenStack as an InfrastructureOpenStack as an Infrastructure
OpenStack as an Infrastructure
 
OpenStack Introduction
OpenStack IntroductionOpenStack Introduction
OpenStack Introduction
 
OpenStack vs VMware vCloud
OpenStack vs VMware vCloudOpenStack vs VMware vCloud
OpenStack vs VMware vCloud
 
oVirt introduction
oVirt introductionoVirt introduction
oVirt introduction
 
Large Data Analyze With PyTables
Large Data Analyze With PyTablesLarge Data Analyze With PyTables
Large Data Analyze With PyTables
 
Docker Container Introduction
Docker Container IntroductionDocker Container Introduction
Docker Container Introduction
 

Último

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
BigData Analysis

  • 1. Big Data Analysis Tools & Methods Spring 2015 OCCC - Tehran
  • 2. Personal Profile: ● Ehsan Derakhshan ● Founder & CEO at innfinision Cloud & BigData Solutions ● More than 15 years of experience (Telecom & Datacom) ● Ehsan.derakhshan@innfinision.net ● Innfinision.net
  • 3. About innfinision: ● Providing Cloud, Virtualization and Data Center Solutions ● BigData Management - Analysis & Development Solutions ● Developing Software for Cloud Environments ● Providing Services to Telecom, Education, Banking & more... ● First Iranian company to support the OpenStack Foundation ● Partner of: Docker - MongoDB - RedHat
  • 4. Agenda: ● What is Data & BigData? ● Important Questions ● Tools & Solutions ● Advantages - Why & Where
  • 5. What is Data & BigData?
  • 6. What is Data? Data is a collection of facts, such as numbers, words, measurements, observations or even just descriptions of things. Data can exist in a variety of forms -- as numbers or text on pieces of paper, as bits and bytes stored in electronic memory, or as facts stored in a person's mind. Strictly speaking, data is the plural of datum, a single piece of information.
  • 7. Big data consists of information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. It can be described by the following characteristics: 1- Volume 2- Velocity 3- Variety 4- Variability 5- Veracity 6- Complexity 7- and others
  • 9. Important Question: Can a database really deliver quantifiable business advantage? To some, the database is a low-level infrastructure component of a much larger application -- something that only developers, DBAs and operations staff need to care or worry about. However, in the digital economy, data is the raw currency. How an organization stores, manages, analyzes and uses data has a direct impact on its success -- and its costs. Its choice of database affects how quickly it can deliver new applications to market, support business growth and improve customer experience.
  • 10. Consider these examples:
      - After trying for eight years to build a single view of their customer, one of the world's leading insurance companies changed database and delivered the project in just three months.
      - A leading telecommunications provider adopted a new database technology and was able to accelerate time to market by 4x, reduce engineering costs by 50% and improve customer experience by 10x.
      - A Tier 1 investment bank rebuilt its globally-distributed reference data platform on a new database technology, enabling it to save an estimated $40M over five years.
      - Singles can now find their ideal partner 95% faster after one of the world's leading relationship providers switched its data and machine learning to a new platform.
  • 11. So why is database selection becoming so critical? Because the requirements of modern applications and the demands of sophisticated, data-savvy users are changing. Data is being generated at much faster rates than ever before and can yield insights never previously possible. The data no longer fits neatly into structured rows and columns. Windows of market opportunity are getting smaller. Underlying infrastructure is being commoditized, with powerful systems available for just pennies per hour. The database chosen by a project team can be the enabler -- or the blocker -- to success. All of the assumptions that have dictated database selection over the past 30 years are being revisited as a result of the factors discussed above.
  • 12. Challenges for Database Selection:
      - Risk tolerance for bugs and unmapped behaviors
      - HA
      - Redundancy
      - Access- and location-based requirements
      - Security requirements
      - Skill sets and tooling
      - Architecture and infrastructure
      - Growth expectations and the timeline therein (Scalable)
      - Support? Community?
      - Free Schema (Flexible Data Model)
      - Scale Out
      - Real-time
      - Rich Queries
      - Migration
      - Drivers
      - Faster
      - Agile
      - Backup/Restore
      - Monitoring & …
  • 13. Tools & Solutions
  • 14. Innfinision BigData Solutions:
      1- MongoDB: MongoDB (from 'humongous') is a scalable, high-performance, open-source, schema-free, document-oriented database. It provides high performance, high availability, and easy scalability. As a document database, its documents (objects) map nicely to programming language data types, embedded documents and arrays reduce the need for joins, and the dynamic schema makes polymorphism easier.
      2- PyTables: PyTables is a package for managing hierarchical datasets, designed to cope efficiently with extremely large amounts of data. It is built on top of the HDF5 library and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast, yet extremely easy to use tool for interactively saving and retrieving very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data takes much less space (by a factor of 3 to 5, and more if the data is compressible) than in other solutions such as relational or object-oriented databases.
  • 15. 3- Blosc: Blosc is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy OS call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.
      4- Blaze: Blaze is a high-level user interface for databases and array computing systems. It consists of the following components:
      - A symbolic expression system to describe and reason about analytic queries
      - A set of interpreters from that query system to various databases / computational engines
      This architecture allows a single piece of Blaze code to run against several computational backends. Blaze interacts rapidly with the user and only communicates with the database when necessary. Blaze is also able to analyze and optimize queries to improve the interactive experience.
  • 16. Advantages - Why - Where
  • 17. MongoDB Advantages: Any relational database has a typical schema design that shows the number of tables and the relationships between them, while MongoDB has no concept of relationships. Advantages of MongoDB over RDBMS:
      -- Schema-less: MongoDB is a document database in which one collection holds different documents. The number of fields, content and size can differ from one document to another.
      -- Structure of a single object is clear.
      -- No complex joins.
      -- Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
      -- Tuning.
      -- Ease of scale-out: MongoDB is easy to scale.
      -- No conversion / mapping of application objects to database objects is needed.
      -- Uses internal memory for storing the (windowed) working set, enabling faster access to data.
  • 18. Why use MongoDB?
      - Document-Oriented Storage: data is stored in the form of JSON-style documents
      - Index on any attribute
      - Replication & High Availability
      - Auto-Sharding
      - Rich Queries
      - Fast In-Place Updates
      - Professional Support
      Where to use MongoDB?
      - Big Data
      - Content Management and Delivery
      - Mobile and Social Infrastructure
      - User Data Management
      - Data Hub
  • 19. Why use PyTables? PyTables can be used in any scenario where you need to save and retrieve large amounts of data and provide metadata (that is, data about the actual data) for it. If you want to work with large datasets of (potentially multidimensional) data, save and structure your NumPy datasets, or just provide a categorized structure for some portions of your cluttered RDBMS, give PyTables a try. It works well for storing data from data acquisition systems, sensors in geosciences, simulation software, network data monitoring systems or as a centralized repository for system logs, to name only a few possible uses. However, it's important to emphasize that PyTables is not designed to work as a relational database competitor, but rather as a teammate. For example, if you have very large tables in your existing relational database, you can move those tables to PyTables to reduce the burden on your existing database while efficiently keeping those huge tables on-disk.
  • 20. Why use Blosc?
      - It is a multi-threaded compressor that can transmit data from caches to memory, and back.
      - Its speed can be higher than that of an OS memcpy().
      Why use Blaze? Because Blaze is a query system that looks like NumPy/Pandas. You write Blaze queries, Blaze translates those queries to something else (like SQL), and ships those queries to various databases to run on other people's fast code. It smooths out this process to make interacting with foreign data as accessible as using Pandas. This is actually quite difficult.
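
The MongoDB points in slides 14 and 17 (embedded documents, fields that differ from one document to another, dynamic queries) can be illustrated without a running server. The sketch below is plain Python and does not use the MongoDB driver: the `get_path`, `matches` and `find` helpers are hypothetical and imitate only a small subset of MongoDB-style filters (equality, dotted paths and `$gt`), and the sample data is invented.

```python
# Toy in-memory "collection": plain dicts stand in for MongoDB documents.
# Nothing here talks to a real MongoDB server.
users = [
    # This document embeds an address and an array of orders (no join needed).
    {"name": "Sara", "age": 31, "address": {"city": "Tehran"},
     "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    # Documents in the same collection may carry different fields.
    {"name": "Omid", "age": 24, "email": "omid@example.com"},
    {"name": "Leila", "age": 45, "address": {"city": "Shiraz"}},
]


def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; return None if absent."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(doc, query):
    """Check a document against a filter: equality or {'$gt': value}."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict) and "$gt" in condition:
            if value is None or not value > condition["$gt"]:
                return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Return every document in the collection that satisfies the filter."""
    return [doc for doc in collection if matches(doc, query)]


older = [d["name"] for d in find(users, {"age": {"$gt": 30}})]
in_tehran = [d["name"] for d in find(users, {"address.city": "Tehran"})]
# The embedded orders array is read directly from its parent document.
total_qty = sum(item["qty"] for item in users[0]["orders"])
print(older, in_tehran, total_qty)
```

The queries are built at run time as ordinary dictionaries, which is the sense in which the slides call them dynamic. A real deployment would send comparable filter documents to the server rather than scanning a Python list.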
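
The Blosc slides describe a compressor tuned for binary data. As a rough illustration of why binary layout matters, the sketch below uses the standard-library `zlib` codec as a stand-in (it is not Blosc and says nothing about the memcpy speed claim) together with a byte-shuffle step of the kind Blosc offers as a pre-compression filter. The `shuffle` and `unshuffle` helpers and the sample data are assumptions made for this example.

```python
import struct
import zlib

ITEM_SIZE = 4  # bytes per little-endian 32-bit integer


def shuffle(buf, item_size):
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    return b"".join(buf[i::item_size] for i in range(item_size))


def unshuffle(buf, item_size):
    """Inverse of shuffle: interleave the byte planes back into items."""
    n_items = len(buf) // item_size
    out = bytearray(len(buf))
    for i in range(item_size):
        out[i::item_size] = buf[i * n_items:(i + 1) * n_items]
    return bytes(out)


# Slowly varying integers, typical of sensor readings or counters.
values = list(range(0, 200_000, 2))
raw = struct.pack(f"<{len(values)}i", *values)

plain = zlib.compress(raw)                         # compress as-is
shuffled = zlib.compress(shuffle(raw, ITEM_SIZE))  # shuffle, then compress

# The shuffle is lossless: decompress and unshuffle to get the input back.
restored = unshuffle(zlib.decompress(shuffled), ITEM_SIZE)
assert restored == raw

print(len(raw), len(plain), len(shuffled))
```

Shuffling places the rarely changing high-order bytes next to each other, so a general-purpose codec finds longer repeats. Blosc itself adds blocking sized for CPU caches and multi-threading on top of this idea, which this sketch does not attempt.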
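
The Blaze architecture from slides 15 and 20 (one symbolic expression, several interpreters) can be sketched in a few lines. This is not Blaze's API: the `Filter` class and the `to_sql`, `run_on_sqlite` and `run_on_python` functions are hypothetical names, and the standard-library `sqlite3` module stands in for "various databases".

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """Symbolic query: select `column` from `table` where `field` > `threshold`."""
    table: str
    column: str
    field: str
    threshold: float


def to_sql(expr):
    """Interpreter 1: translate the expression into a parameterized SQL query.

    Identifiers are interpolated directly, so they must come from trusted code.
    """
    sql = f"SELECT {expr.column} FROM {expr.table} WHERE {expr.field} > ?"
    return sql, (expr.threshold,)


def run_on_sqlite(expr, conn):
    """Ship the generated SQL to the database and collect the single column."""
    sql, params = to_sql(expr)
    return [row[0] for row in conn.execute(sql, params)]


def run_on_python(expr, tables):
    """Interpreter 2: evaluate the same expression over in-memory dicts."""
    return [r[expr.column] for r in tables[expr.table]
            if r[expr.field] > expr.threshold]


rows = [{"name": "a", "amount": 5.0}, {"name": "b", "amount": 50.0},
        {"name": "c", "amount": 500.0}]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (name TEXT, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (:name, :amount)", rows)

# One expression, written once, evaluated by two different backends.
query = Filter(table="payments", column="name", field="amount", threshold=10)

from_sql = run_on_sqlite(query, conn)
from_python = run_on_python(query, {"payments": rows})
print(from_sql, from_python)
```

Because the query is data rather than code bound to one engine, it can also be inspected or rewritten before it is executed, which is where the slides' point about analyzing and optimizing queries comes in.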