Open Source
Data Warehousing
Abbas Baniasadsi Moghadam
baniasadi@um.ac.ir
Data warehousing has two aspects:
A data warehouse, or data store
Application software or a system for enterprise reporting, analysis, data mining,
and ETL (extract, transform, load) capabilities for business intelligence.
There are several open source database management systems, but only a few of them are suitable for
data warehousing.
The capability that matters most for data warehousing is a column-oriented architecture.
A column-oriented DBMS is a database management system (DBMS) that stores its content by
column rather than by row.
Description of Column-Oriented Architecture
A database program must show its data as two-dimensional tables, of columns and rows, but
store it as one-dimensional strings. For example, a database might have this table.
EmpId Lastname Firstname Salary
1 Smith Joe 40000
2 Jones Mary 50000
3 Johnson Cathy 44000
A row-oriented database serializes all of the values in a row together, then the values in the next
row, and so on.
1,Smith,Joe,40000;
2,Jones,Mary,50000;
3,Johnson,Cathy,44000;
A column-oriented database serializes all of the values of a column together, then the values of
the next column, and so on.
1,2,3;
Smith,Jones,Johnson;
Joe,Mary,Cathy;
40000,50000,44000;
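The two layouts can be reproduced with a minimal Python sketch. It assumes the three-row employee table from the example; the function names are illustrative only and do not come from any particular DBMS.

```python
# The example table, held as one tuple per row.
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_row_oriented(rows):
    """Write all values of a row together, then the next row."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)


def serialize_column_oriented(rows):
    """Write all values of a column together, then the next column."""
    columns = zip(*rows)  # transpose: one tuple per column
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


row_layout = serialize_row_oriented(rows)
column_layout = serialize_column_oriented(rows)
print(row_layout)
print(column_layout)
```

In the column layout the salaries sit next to each other, so a query that only needs salaries can read one contiguous run instead of skipping over names.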
Partitioning, indexing, caching, views, OLAP cubes, and transactional systems such as write
ahead logging or multiversion concurrency control all dramatically affect the physical organization.
The online transaction processing (OLTP)-focused RDBMS systems are more row-oriented.
The online analytical processing (OLAP)-focused systems are a balance of row-oriented and
column-oriented.
Comparison between row-oriented and column-oriented systems
1. Column-oriented systems are more efficient when an aggregate needs to be computed over many rows
but only for a notably smaller subset of all columns of data, because reading that smaller subset of data
can be faster than reading all data.
2. Column-oriented systems are more efficient when new values of a column are supplied for all rows at
once, because that column data can be written efficiently and replace old column data without touching
any other columns for the rows.
3. Row-oriented systems are more efficient when many columns of a single row are required at the same
time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
4. Row-oriented systems are more efficient when writing a new row if all of the column data is
supplied at the same time, as the entire row can be written with a single disk seek.
Compression
Column data is of uniform type; therefore, there are some opportunities for storage size optimizations
available in column-oriented data that are not available in row-oriented data.
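As one illustration of such an optimization, the sketch below applies run-length encoding to a single column. This is a generic technique chosen for the example, not necessarily what any of the systems named here use, and the `department` column is hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical column, sorted so that equal values are adjacent.
department = ["HR"] * 4 + ["IT"] * 3 + ["Sales"] * 5
encoded = run_length_encode(department)
print(encoded)
```

Twelve stored values shrink to three pairs. In a row layout the same values would be interleaved with other columns, so the runs would not be adjacent.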
Current examples of column-oriented DBMSs:
* Calpont's InfiniDB Community Edition: MySQL front end, GPLv2
* C-Store: no new release since Oct 2006
* GenoByte: column-based storage system and API for handling genotype data
* Lemur Bitmap Index C++ Library (GPL)
* FastBit
* Infobright Community Edition: regular updates
* LucidDB and Eigenbase
* MonetDB: academic project
* Metakit
* The S programming language and GNU R incorporate column-oriented data structures for
statistical analyses.
Only InfiniDB and Infobright are based on MySQL, which makes them suitable for people with MySQL
skills.
The others are academic projects or libraries only.
Infobright was the best MySQL partner in 2009 and has the bigger community.
Infobright Community Edition Key benefits
Ideal for data volumes of 500GB to 30TB.
Market-leading data compression (from 10:1 to over 40:1), which drastically reduces I/O
(improving query performance) and results in significantly less storage than alternative solutions.
No licensing fees.
Fast response times for complex analytic queries.
Query and load performance remains constant as the size of the database grows.
No requirement for specific schemas, e.g. star schema.
No requirement for materialized views, complex data partitioning strategies, or indexing.
Simple to implement and manage, requiring little administration.
Reduction in data warehouse capital and operational expenses by reducing the number of servers,
the amount of storage needed and their associated maintenance costs, and a significant reduction
in administrative costs.
Runs on low-cost, off-the-shelf hardware.
Is compatible with major business intelligence tools such as Pentaho, JasperSoft, Cognos,
Business Objects, and others.
Infobright Architecture
1. Column Orientation
2. Data Packs and Data Pack Nodes
3. Knowledge Nodes and the Knowledge Grid
4. The Optimizer
5. Data Compression
1. Column Orientation
Infobright at its core is a highly compressed column-oriented data store, which means that
instead of the data being stored row by row, it is stored column by column.
There are many advantages to column orientation, including more efficient
data compression: each column stores a single data type (as opposed to rows, which
typically contain several data types), so compression can be optimized for each
particular data type, significantly reducing disk I/O.
Most analytic queries involve only a subset of a table's columns, so a
column-oriented database retrieves only the data that is required.
2. Data Organization and the Knowledge Grid
Infobright organizes the data into 3 layers:
• Data Packs
The data itself within the columns is stored in 65,536 item groupings
called Data Packs. The use of Data Packs improves data compression
since they are smaller subsets of the column data (hence less variability)
and the compression algorithm can be applied based on data type.
• Data Pack Nodes (DPNs)
Data Pack Nodes contain a set of statistics about the data that is stored and compressed in
each of the Data Packs. There is always a 1 to 1 relationship between Data Packs and DPNs.
DPNs always exist, so Infobright has some information about all the data in the database,
unlike traditional databases, where indexes are created for only a subset of columns.
• Knowledge Nodes
These are a further set of metadata related to Data Packs or column relationships. They can
be more introspective on the data, describing ranges of value occurrences, or can be
extrospective, describing how they relate to other data in the database.
Most KNs are created at load time, but others are created in response to queries in order to
optimize performance.
This is a dynamic process, so certain Knowledge Nodes may or may not exist at a particular
point in time.
The DPNs and KNs form the Knowledge Grid. Unlike traditional database indexes, they are not
manually created, and require no ongoing care and feeding. Instead, they are created and
managed automatically by the system. In essence, they create a high level view of the entire
content of the database.
3. The Infobright Optimizer
The Optimizer is the highest level of intelligence in the architecture. It uses the Knowledge Grid
to determine the minimum set of Data Packs that need to be decompressed in order to satisfy
a given query in the fastest possible time.
4. Resolving Complex Analytic Queries without Indexes
The Infobright data warehouse resolves complex analytic queries without the need for traditional
indexes.
Data Packs
As noted, Data Packs consist of groupings of 65,536 items within a given column.
For example, for the table T with columns A, B and 300,000 records, Infobright would have the
following Packs:
Pack A1: values of A for rows no. 1-65,536
Pack A2: values of A for 65,537-131,072
Pack A3: values of A for 131,073-196,608
Pack A4: values of A for 196,609-262,144
Pack A5: values of A for 262,145-300,000
Pack B1: values of B for rows no. 1-65,536
Pack B2: values of B for 65,537-131,072
Pack B3: values of B for 131,073-196,608
Pack B4: values of B for 196,609-262,144
Pack B5: values of B for 262,145-300,000
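The pack boundaries above follow directly from the pack size. A minimal sketch of the arithmetic, assuming 1-based row numbers as in the list (the function name `pack_ranges` is made up for illustration):

```python
PACK_SIZE = 65_536  # items per Data Pack, as stated in the text


def pack_ranges(row_count, pack_size=PACK_SIZE):
    """Return 1-based inclusive (first_row, last_row) ranges, one per pack."""
    ranges = []
    for first in range(1, row_count + 1, pack_size):
        last = min(first + pack_size - 1, row_count)  # the last pack may be partial
        ranges.append((first, last))
    return ranges


ranges = pack_ranges(300_000)
for number, (first, last) in enumerate(ranges, start=1):
    print(f"Pack A{number}: rows {first:,}-{last:,}")
```

For 300,000 rows this gives four full packs and a fifth, partial pack holding the remaining 37,856 rows.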
The Knowledge Grid
The Infobright Knowledge Grid includes Data Pack Nodes and Knowledge Nodes.
For example, for the above table T, assume that both A and B store some numeric values.
The following table should be read as follows:
for the first 65,536 rows in T, the minimum value of A is 0, the maximum is 5, and the sum of the
values of A is 100,000 (and there are no null values).
DPNs are accessible without the need to decompress the corresponding Data Packs.
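The kind of statistics a DPN holds can be sketched as follows. The `DataPackNode` class and its field names are hypothetical stand-ins built from the statistics mentioned in the text (minimum, maximum, sum, nulls). Infobright's actual on-disk format is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DataPackNode:
    """Hypothetical stand-in for a DPN: summary statistics for one pack."""
    minimum: Optional[int]
    maximum: Optional[int]
    total: int
    null_count: int


def build_dpn(pack: Sequence[Optional[int]]) -> DataPackNode:
    """Compute the statistics once, when the pack is loaded."""
    non_null = [v for v in pack if v is not None]
    return DataPackNode(
        minimum=min(non_null) if non_null else None,
        maximum=max(non_null) if non_null else None,
        total=sum(non_null),
        null_count=len(pack) - len(non_null),
    )


dpn = build_dpn([0, 5, 3, None, 2])
print(dpn)
```

Once these values are stored next to the pack, a question such as "can any value here exceed 6?" is answered from `maximum` alone, without touching the compressed data.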
Knowledge Nodes (KNs) were developed to efficiently deal with complex, multiple-table
queries (joins, sub-queries, etc.).
To process multi-table join queries and sub-queries, the Knowledge Grid uses multi-table
Pack-To-Pack KNs that indicate which pairs of Data Packs from different tables should actually
be considered while joining the tables.
KNs can be compared to the indexes used by traditional databases; however, KNs work on Packs
instead of rows. Therefore KNs are 65,536 times smaller than indexes (or even 65,536 times
65,536 times smaller for the Pack-To-Pack Nodes, because of the size decrease for each of the
two tables involved).
In general the overhead is around 1% of the data, compared to classic indexes, which can be
20-50% of the size of the data.
Knowledge Nodes are created on data load and may also be created during query. They are
automatically created and maintained by the Knowledge Grid Manager based on the column
type and definition, so no intervention by a DBA is necessary.
The Infobright Optimizer
The Optimizer applies DPNs and KNs for the purpose of splitting Data Packs among the three
following categories for every query coming into the optimizer:
• Relevant Packs – in which each element (the record’s value for the given column) is identified,
based on DPNs and KNs, as applicable to the given query.
• Irrelevant Packs – based on DPNs and KNs, the Pack holds no relevant values.
• Suspect Packs – some elements may be relevant, but there is no way to claim that the Pack is
either fully relevant or fully irrelevant, based on DPNs and KNs.
While querying, Infobright does not need to decompress either Relevant or Irrelevant Data Packs.
Irrelevant Packs are simply not taken into account at all. In case of Relevant Packs, Infobright
knows that all elements are relevant, and the required answer is obtainable.
Query: SELECT SUM(B) FROM T WHERE A > 6;
Packs A1, A2, A4 are Irrelevant – none of the rows can satisfy A > 6 because all these packs
have maximum values below 6. Consequently, Packs B1, B2, B4 will not be analyzed while
calculating SUM(B) – they are Irrelevant too.
Pack A3 is Relevant – all the rows with numbers 131,073-196,608 satisfy A > 6. It means Pack
B3 is Relevant too. The sum of values on B within Pack B3 is one of the components of the final
answer. Based on B3’s DPN, Infobright knows that this sum equals 100. This is everything
Infobright needs to know about this portion of the data.
Pack A5 is Suspect – some rows satisfy A > 6, but it is not known which ones. As a
consequence, Pack B5 is Suspect too. Infobright will need to decompress both A5 and B5 to find
which rows out of 262,145-300,000 satisfy A > 6 and sum the values of B for precisely
those rows. The result will be added to the value of 100 previously obtained for Pack B3 to form
the final answer to the query.
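The walkthrough of SELECT SUM(B) FROM T WHERE A > 6 can be sketched in Python. This is a simplified model and not Infobright's implementation. It assumes tiny packs of 4 values instead of 65,536, invented data, no nulls, and per-pack (min, max, sum) tuples standing in for DPNs.

```python
PACK_SIZE = 4  # tiny packs for readability; the text uses 65,536


def split_into_packs(values, size=PACK_SIZE):
    return [values[i:i + size] for i in range(0, len(values), size)]


def build_stats(packs):
    """Per-pack (min, max, sum), computed once at load time."""
    return [(min(p), max(p), sum(p)) for p in packs]


def classify(a_min, a_max, threshold):
    """Classify a pack of A for the predicate A > threshold."""
    if a_max <= threshold:
        return "irrelevant"  # no value can satisfy the predicate
    if a_min > threshold:
        return "relevant"    # every value satisfies the predicate
    return "suspect"         # mixed: the pack has to be opened


def sum_b_where_a_gt(a_packs, b_packs, a_stats, b_stats, threshold):
    total, opened = 0, 0
    for i, (a_min, a_max, _) in enumerate(a_stats):
        kind = classify(a_min, a_max, threshold)
        if kind == "relevant":
            total += b_stats[i][2]  # precomputed sum, no pack access
        elif kind == "suspect":
            opened += 1             # only here are the packs read
            total += sum(b for a, b in zip(a_packs[i], b_packs[i]) if a > threshold)
    return total, opened


# Invented data: five packs per column, mirroring the A1..A5 / B1..B5 example.
a = [0, 5, 3, 1,  2, 4, 1, 0,  7, 9, 8, 10,  3, 2, 5, 1,  4, 8, 6, 9]
b = [1, 1, 1, 1,  2, 2, 2, 2,  25, 25, 25, 25,  3, 3, 3, 3,  10, 20, 30, 40]
a_packs, b_packs = split_into_packs(a), split_into_packs(b)
a_stats, b_stats = build_stats(a_packs), build_stats(b_packs)

total, opened = sum_b_where_a_gt(a_packs, b_packs, a_stats, b_stats, 6)
print(total, opened)
```

With this data the third pack is Relevant and contributes its precomputed sum of 100, the fifth is Suspect and is the only one opened, and the other three are skipped entirely.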
5. Data Loading and Compression
During load, 65,536 values of the given column are treated as a sequence with zero or more null
values occurring anywhere in the sequence.
Information about the null positions is stored separately (within the Null Mask). Then the
remaining stream of the non-null values is compressed, taking full advantage of regularities
inside the data.
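The separation of null positions from values can be sketched as follows. The function names are illustrative only, and the mask is kept as a plain list of booleans. A real engine would presumably store it more compactly and then compress the non-null stream.

```python
def split_nulls(pack):
    """Separate a pack into a null mask and the stream of non-null values."""
    null_mask = [value is None for value in pack]
    non_null = [value for value in pack if value is not None]
    return null_mask, non_null


def restore(null_mask, non_null):
    """Rebuild the original pack from the mask and the value stream."""
    values = iter(non_null)
    return [None if is_null else next(values) for is_null in null_mask]


pack = [7, None, 7, 8, None, 9]
null_mask, non_null = split_nulls(pack)
print(null_mask)
print(non_null)
```

Removing the nulls leaves a denser, more regular stream (here `7, 7, 8, 9`) for the compressor to work on, and `restore` shows that no information is lost.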
With Infobright, a 1TB database becomes a 100GB database. Since the data is much smaller,
the disk transfer rate is improved (even with the overhead of compression).
One of Infobright’s benefits is its industry-leading compression. Unlike traditional row-based
data warehouses, data is stored by column, allowing compression algorithms to be finely tuned
to the column data type.
Moreover, for each column, the data is split into Data Packs with each storing up to 65,536
values of a given column. Infobright then applies a set of patent-pending compression
algorithms that are optimized by automatically self-adjusting various parameters of the
algorithm for each data pack.
An average compression ratio of 10:1 is achieved in Infobright. For example, 10TB of raw data
can be stored in about 1TB of space on average (including the overhead associated with Data
Pack Nodes and the Knowledge Grid).
Within Infobright, the compression ratio may differ depending on data types and content.
Additionally, some data may turn out to be more repetitive than others and hence compression
ratios can be as high as 40:1.
6. How Infobright Leverages MySQL
In the data warehouse marketplace, the database must integrate with a variety of drivers and
tools. By integrating with MySQL, Infobright leverages the extensive driver connectivity
provided by MySQL connectors (C, JDBC, ODBC, .NET, Perl, etc.).
MySQL also provides cataloging functions, such as table definitions, views, users, permissions,
etc. These are stored in a MyISAM database.
Although MySQL provides a storage engine interface which is implemented in Infobright, the
Infobright technology has its own advanced Optimizer. This is required because of the column
orientation, but also because of the unique Knowledge Node technology described above.
Conclusion
Infobright has developed an architecture designed for analytic data warehousing, which also
requires much less work from IT organizations, delivering faster time-to-market for analytic
applications.
The open source Infobright Community Edition makes it affordable for companies of all sizes.
Almost half of IT directors are prevented from rolling out the BI initiatives they desire because
of expensive licensing and specialized hardware costs. Infobright can help by supplying the
low TCO that IT executives are looking for.

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Datawarehouse olap olam
Datawarehouse olap olamDatawarehouse olap olam
Datawarehouse olap olam
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Seminar datawarehousing
Seminar datawarehousingSeminar datawarehousing
Seminar datawarehousing
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing and Data mining
Data warehousing and Data mining Data warehousing and Data mining
Data warehousing and Data mining
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Introduction Data warehouse
Introduction Data warehouseIntroduction Data warehouse
Introduction Data warehouse
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
Ppt
PptPpt
Ppt
 
Data Warehousing Overview
Data Warehousing OverviewData Warehousing Overview
Data Warehousing Overview
 
Hadoop & Data Warehouse
Hadoop & Data Warehouse Hadoop & Data Warehouse
Hadoop & Data Warehouse
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousing
 

Destaque

Benchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsBenchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsRim Moussa
 
Business Intelligence with SQL Server
Business Intelligence with SQL ServerBusiness Intelligence with SQL Server
Business Intelligence with SQL ServerPeter Gfader
 
Testing data warehouse applications by Kirti Bhushan
Testing data warehouse applications by Kirti BhushanTesting data warehouse applications by Kirti Bhushan
Testing data warehouse applications by Kirti BhushanKirti Bhushan
 
Build a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesBuild a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesCaserta
 
Data_Warehouse
Data_WarehouseData_Warehouse
Data_WarehouseThang Luu
 
ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話Preferred Networks
 
Using SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesUsing SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesCode Mastery
 
Tài liệu data warehouse vietsub
Tài liệu data warehouse  vietsubTài liệu data warehouse  vietsub
Tài liệu data warehouse vietsubhoangdat1361
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouseJ M
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data modeljagdish_93
 
Importance of data model
Importance of data modelImportance of data model
Importance of data modelyhen06
 
Cloud Computing and your Data Warehouse
Cloud Computing and your Data WarehouseCloud Computing and your Data Warehouse
Cloud Computing and your Data Warehousedrluckyspin
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
3 Tier Architecture
3  Tier Architecture3  Tier Architecture
3 Tier ArchitectureWebx
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 

Destaque (20)

Benchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsBenchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metrics
 
Business Intelligence with SQL Server
Business Intelligence with SQL ServerBusiness Intelligence with SQL Server
Business Intelligence with SQL Server
 
Testing data warehouse applications by Kirti Bhushan
Testing data warehouse applications by Kirti BhushanTesting data warehouse applications by Kirti Bhushan
Testing data warehouse applications by Kirti Bhushan
 
Build a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesBuild a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 Minutes
 
Data_Warehouse
Data_WarehouseData_Warehouse
Data_Warehouse
 
ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話
 
Using SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesUsing SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS Cubes
 
Tài liệu data warehouse vietsub
Tài liệu data warehouse  vietsubTài liệu data warehouse  vietsub
Tài liệu data warehouse vietsub
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data model
 
Importance of data model
Importance of data modelImportance of data model
Importance of data model
 
Cloud Computing and your Data Warehouse
Cloud Computing and your Data WarehouseCloud Computing and your Data Warehouse
Cloud Computing and your Data Warehouse
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Different data models
Different data modelsDifferent data models
Different data models
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
3 Tier Architecture
3  Tier Architecture3  Tier Architecture
3 Tier Architecture
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Data mining
Data miningData mining
Data mining
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Semelhante a Open Source Datawarehouse

Infobright Column-Oriented Analytical Database Engine
Infobright Column-Oriented Analytical Database EngineInfobright Column-Oriented Analytical Database Engine
Infobright Column-Oriented Analytical Database EngineAlex Esterkin
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018Dave Stokes
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoDave Stokes
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniquesIJDKP
 
Data Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxData Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxTusharAgarwal49094
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented DatabaseSuvradeep Rudra
 
NoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseNoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseMohammad Shaker
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataVipin Batra
 
Bc0058 data warehousing
Bc0058   data warehousingBc0058   data warehousing
Bc0058 data warehousingsmumbahelp
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxDURGADEVIL
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...IJCERT JOURNAL
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...ijdms
 

Semelhante a Open Source Datawarehouse (20)

Infobright Column-Oriented Analytical Database Engine
Infobright Column-Oriented Analytical Database EngineInfobright Column-Oriented Analytical Database Engine
Infobright Column-Oriented Analytical Database Engine
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniques
 
Data Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxData Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptx
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
Lecture3.ppt
Lecture3.pptLecture3.ppt
Lecture3.ppt
 
very large database
very large databasevery large database
very large database
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
NoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseNoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to Couchbase
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Bc0058 data warehousing
Bc0058   data warehousingBc0058   data warehousing
Bc0058 data warehousing
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docx
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
Bigtable_Paper
Bigtable_PaperBigtable_Paper
Bigtable_Paper
 
Vertica
VerticaVertica
Vertica
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
 
Column oriented Transactions
Column oriented TransactionsColumn oriented Transactions
Column oriented Transactions
 

Mais de عباس بني اسدي مقدم

چارچوب متن باز جهت توسعه سیستم های نرم افزاری
چارچوب متن باز جهت توسعه سیستم های نرم افزاریچارچوب متن باز جهت توسعه سیستم های نرم افزاری
چارچوب متن باز جهت توسعه سیستم های نرم افزاریعباس بني اسدي مقدم
 
طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی
طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی
طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی عباس بني اسدي مقدم
 
طرح چارچوب متن باز تولید نرم افزار
طرح چارچوب  متن باز تولید نرم افزار طرح چارچوب  متن باز تولید نرم افزار
طرح چارچوب متن باز تولید نرم افزار عباس بني اسدي مقدم
 
طرح رایانش ابری در صنعت برق خراسان
طرح رایانش ابری در صنعت برق خراسانطرح رایانش ابری در صنعت برق خراسان
طرح رایانش ابری در صنعت برق خراسانعباس بني اسدي مقدم
 
دستورالعمل تعیین مستمر تلفات انرژی
دستورالعمل تعیین مستمر تلفات انرژیدستورالعمل تعیین مستمر تلفات انرژی
دستورالعمل تعیین مستمر تلفات انرژیعباس بني اسدي مقدم
 
معماری سازمانی سیستم های اطلاعاتی
معماری سازمانی سیستم های اطلاعاتی معماری سازمانی سیستم های اطلاعاتی
معماری سازمانی سیستم های اطلاعاتی عباس بني اسدي مقدم
 
گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات
گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات
گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات عباس بني اسدي مقدم
 
استراتژی نرم افزار در شرکت توزیع برق مشهد
استراتژی نرم افزار در شرکت توزیع برق مشهداستراتژی نرم افزار در شرکت توزیع برق مشهد
استراتژی نرم افزار در شرکت توزیع برق مشهدعباس بني اسدي مقدم
 
انقلاب تکنولوژیک در نرم ساخت رایانه های شخصی
انقلاب تکنولوژیک در نرم ساخت رایانه های شخصیانقلاب تکنولوژیک در نرم ساخت رایانه های شخصی
انقلاب تکنولوژیک در نرم ساخت رایانه های شخصیعباس بني اسدي مقدم
 
زیر ساخت نرم افزاری شرکت توزیع برق مشهد
زیر ساخت نرم افزاری شرکت توزیع برق مشهدزیر ساخت نرم افزاری شرکت توزیع برق مشهد
زیر ساخت نرم افزاری شرکت توزیع برق مشهدعباس بني اسدي مقدم
 

Mais de عباس بني اسدي مقدم (20)

Covid19
Covid19Covid19
Covid19
 
پروژه پورتال جامع سازمانی
پروژه پورتال جامع سازمانیپروژه پورتال جامع سازمانی
پروژه پورتال جامع سازمانی
 
چارچوب متن باز جهت توسعه سیستم های نرم افزاری
چارچوب متن باز جهت توسعه سیستم های نرم افزاریچارچوب متن باز جهت توسعه سیستم های نرم افزاری
چارچوب متن باز جهت توسعه سیستم های نرم افزاری
 
Postgresql Server Programming
Postgresql Server ProgrammingPostgresql Server Programming
Postgresql Server Programming
 
An Introduction to Postgresql
An Introduction to PostgresqlAn Introduction to Postgresql
An Introduction to Postgresql
 
طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی
طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی
طراحی سیستم های اطلاعاتی بر مبنای قابلیت های Nosql بانک های اطلاعاتی
 
Software architecture002
Software architecture002Software architecture002
Software architecture002
 
طرح چارچوب متن باز تولید نرم افزار
طرح چارچوب  متن باز تولید نرم افزار طرح چارچوب  متن باز تولید نرم افزار
طرح چارچوب متن باز تولید نرم افزار
 
سیستم رسیدگی به شکایات
سیستم رسیدگی به شکایاتسیستم رسیدگی به شکایات
سیستم رسیدگی به شکایات
 
گزارش دستیابی به اهداف ۱۴۰۵
گزارش دستیابی به اهداف ۱۴۰۵گزارش دستیابی به اهداف ۱۴۰۵
گزارش دستیابی به اهداف ۱۴۰۵
 
طرح رایانش ابری در صنعت برق خراسان
طرح رایانش ابری در صنعت برق خراسانطرح رایانش ابری در صنعت برق خراسان
طرح رایانش ابری در صنعت برق خراسان
 
فروش اینترنتی انشعاب
فروش اینترنتی انشعابفروش اینترنتی انشعاب
فروش اینترنتی انشعاب
 
دستورالعمل تعیین مستمر تلفات انرژی
دستورالعمل تعیین مستمر تلفات انرژیدستورالعمل تعیین مستمر تلفات انرژی
دستورالعمل تعیین مستمر تلفات انرژی
 
معماری جاری نرم افزار های شرکت
معماری جاری نرم افزار های شرکتمعماری جاری نرم افزار های شرکت
معماری جاری نرم افزار های شرکت
 
معماری سازمانی سیستم های اطلاعاتی
معماری سازمانی سیستم های اطلاعاتی معماری سازمانی سیستم های اطلاعاتی
معماری سازمانی سیستم های اطلاعاتی
 
گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات
گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات
گزارش عملکرد دفتر فن آوری اطلاعات و ارتباطات
 
استراتژی نرم افزار در شرکت توزیع برق مشهد
استراتژی نرم افزار در شرکت توزیع برق مشهداستراتژی نرم افزار در شرکت توزیع برق مشهد
استراتژی نرم افزار در شرکت توزیع برق مشهد
 
انقلاب تکنولوژیک در نرم ساخت رایانه های شخصی
انقلاب تکنولوژیک در نرم ساخت رایانه های شخصیانقلاب تکنولوژیک در نرم ساخت رایانه های شخصی
انقلاب تکنولوژیک در نرم ساخت رایانه های شخصی
 
مهاجرت به متن باز
مهاجرت به متن بازمهاجرت به متن باز
مهاجرت به متن باز
 
زیر ساخت نرم افزاری شرکت توزیع برق مشهد
زیر ساخت نرم افزاری شرکت توزیع برق مشهدزیر ساخت نرم افزاری شرکت توزیع برق مشهد
زیر ساخت نرم افزاری شرکت توزیع برق مشهد
 

Open Source Datawarehouse

  • 1. Open Source Data Warehousing Abbas Baniasadsi Moghadam baniasadi@um.ac.ir
  • 2. Data warehousing has two aspects: (1) a data warehouse, or data store, and (2) application software or a system that provides enterprise reporting, analysis, data mining, and ETL (extract, transform, load) capabilities for business intelligence. There are several open source database management systems, but only a few of them are suitable for data warehousing. The feature that best suits data warehousing is a column-oriented architecture. A column-oriented DBMS is a database management system (DBMS) that stores its content by column rather than by row.
  • 3. Description of Column-Oriented Architecture: A database program must show its data as two-dimensional tables of columns and rows, but store it as one-dimensional strings. For example, a database might have this table. EmpId Lastname Firstname Salary 1 Smith Joe 40000 2 Jones Mary 50000 3 Johnson Cathy 44000 A row-oriented database serializes all of the values in a row together, then the values in the next row, and so on. 1,Smith,Joe,40000; 2,Jones,Mary,50000; 3,Johnson,Cathy,44000; A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on. 1,2,3; Smith,Jones,Johnson; Joe,Mary,Cathy; 40000,50000,44000;
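The two serializations on slide 3 can be reproduced with a short sketch. This is only an illustration of the layout idea: the function names are hypothetical, and a real DBMS stores binary pages rather than comma-separated strings.

```python
# The example employee table from slide 3, held as a list of rows.
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_row_oriented(table):
    """Write all values of one row together, then the next row."""
    return "".join(",".join(str(v) for v in row) + ";" for row in table)


def serialize_column_oriented(table):
    """Write all values of one column together, then the next column."""
    columns = zip(*table)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)


row_layout = serialize_row_oriented(rows)
column_layout = serialize_column_oriented(rows)
print(row_layout)
print(column_layout)
```

In the column layout all salaries sit next to each other, so an aggregate over Salary only has to read the last segment of the string.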
  • 4. Partitioning, indexing, caching, views, OLAP cubes, and transactional mechanisms such as write-ahead logging or multiversion concurrency control all dramatically affect the physical organization. Online transaction processing (OLTP)-focused RDBMS systems are more row-oriented. Online analytical processing (OLAP)-focused systems are a balance of row-oriented and column-oriented. Comparison between row-oriented and column-oriented systems: 1. Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data. 2. Column-oriented systems are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows. 3. Row-oriented systems are more efficient when many columns of a single row are required at the same time, and when row size is relatively small, as the entire row can be retrieved with a single disk seek. 4. Row-oriented systems are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek. Compression: Column data is of uniform type; therefore, column-oriented data offers some opportunities for storage size optimization that are not available in row-oriented data.
  • 5. Current examples of column-oriented DBMSs: * Calpont's InfiniDB Community Edition (MySQL front end, GPLv2) * C-Store (no new release since Oct 2006) * GenoByte (column-based storage system and API for handling genotype data) * Lemur Bitmap Index C++ Library (GPL) * FastBit * Infobright Community Edition (regular updates) * LucidDB and Eigenbase * MonetDB (academic project) * Metakit * The S programming language and GNU R incorporate column-oriented data structures for statistical analyses. Only InfiniDB and Infobright are based on MySQL and suitable for people with MySQL skills. The others are academic projects or only libraries. Infobright was MySQL's best partner in 2009 and has the bigger community.
  • 6. Infobright Community Edition: Key benefits. Ideal for data volumes of 500 GB to 30 TB. Market-leading data compression (from 10:1 to over 40:1), which drastically reduces I/O (improving query performance) and requires significantly less storage than alternative solutions. No licensing fees. Fast response times for complex analytic queries. Query and load performance remain constant as the size of the database grows. No requirement for specific schemas, e.g. star schema. No requirement for materialized views, complex data partitioning strategies, or indexing. Simple to implement and manage, requiring little administration. Reduces data warehouse capital and operational expenses by reducing the number of servers, the amount of storage needed, and their associated maintenance costs, and significantly reduces administrative costs. Runs on low-cost, off-the-shelf hardware. Compatible with major business intelligence tools such as Pentaho, JasperSoft, Cognos, Business Objects, and others.
  • 7. Infobright Architecture: 1. Column Orientation 2. Data Packs and Data Pack Nodes 3. Knowledge Nodes and the Knowledge Grid 4. The Optimizer 5. Data Compression. 1. Column Orientation: Infobright at its core is a highly compressed column-oriented data store, which means that instead of the data being stored row by row, it is stored column by column. Column orientation has many advantages. Because each column stores a single data type (as opposed to rows, which typically contain several data types), compression can be optimized for each particular data type, significantly reducing disk I/O. Most analytic queries involve only a subset of the columns of the tables, so a column-oriented database retrieves only the data that is required.
  • 8. 2. Data Organization and the Knowledge Grid: Infobright organizes the data into 3 layers: • Data Packs: The data within the columns is stored in groupings of 65,536 items called Data Packs. The use of Data Packs improves data compression since they are smaller subsets of the column data (hence less variability) and the compression algorithm can be applied based on data type. • Data Pack Nodes (DPNs): Data Pack Nodes contain a set of statistics about the data that is stored and compressed in each of the Data Packs. There is always a 1-to-1 relationship between Data Packs and DPNs. DPNs always exist, so Infobright has some information about all the data in the database, unlike traditional databases where indexes are created for only a subset of columns. • Knowledge Nodes (KNs): These are a further set of metadata related to Data Packs or column relationships. They can be introspective, describing ranges of value occurrences within the data, or extrospective, describing how the data relates to other data in the database. Most KNs are created at load time, but others are created in response to queries in order to optimize performance. This is a dynamic process, so certain Knowledge Nodes may or may not exist at a particular point in time. The DPNs and KNs form the Knowledge Grid. Unlike traditional database indexes, they are not manually created and require no ongoing maintenance. Instead, they are created and managed automatically by the system. In essence, they create a high-level view of the entire content of the database.
  • 9. 3. The Infobright Optimizer: The Optimizer is the highest level of intelligence in the architecture. It uses the Knowledge Grid to determine the minimum set of Data Packs that need to be decompressed in order to satisfy a given query in the fastest possible time.
  • 10. 4. Resolving Complex Analytic Queries without Indexes: The Infobright data warehouse resolves complex analytic queries without the need for traditional indexes. Data Packs: As noted, Data Packs consist of groupings of 65,536 items within a given column. For example, for a table T with columns A and B and 300,000 records, Infobright would have the following Packs: Pack A1: values of A for rows no. 1-65,536 Pack A2: values of A for 65,537-131,072 Pack A3: values of A for 131,073-196,608 Pack A4: values of A for 196,609-262,144 Pack A5: values of A for 262,145-300,000 Pack B1: values of B for rows no. 1-65,536 Pack B2: values of B for 65,537-131,072 Pack B3: values of B for 131,073-196,608 Pack B4: values of B for 196,609-262,144 Pack B5: values of B for 262,145-300,000
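The pack boundaries listed on slide 10 follow directly from the pack size. A minimal sketch, assuming 1-based row numbers as on the slide (the function name `pack_ranges` is hypothetical, not part of Infobright):

```python
PACK_SIZE = 65_536  # items per Data Pack, as stated on the slide


def pack_ranges(row_count, pack_size=PACK_SIZE):
    """Return the inclusive (first_row, last_row) range covered by each pack."""
    ranges = []
    for start in range(1, row_count + 1, pack_size):
        end = min(start + pack_size - 1, row_count)  # the last pack may be partial
        ranges.append((start, end))
    return ranges


ranges = pack_ranges(300_000)
for number, (first, last) in enumerate(ranges, start=1):
    print(f"Pack {number}: rows {first:,}-{last:,}")
```

The same ranges apply to every column of the table, which is why packs A3 and B3 describe the same rows.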
  • 11. The Knowledge Grid: The Infobright Knowledge Grid includes Data Pack Nodes and Knowledge Nodes. For example, for the above table T, assume that both A and B store some numeric values. The following table should be read as follows: for the first 65,536 rows in T, the minimum value of A is 0, the maximum is 5, and the sum of the values of A is 100,000 (and there are no null values).
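The per-pack statistics described on slide 11 (minimum, maximum, sum, and null count) can be sketched as follows. The dictionary layout and the name `data_pack_node` are assumptions made for illustration; the slides do not show how Infobright actually stores a DPN.

```python
def data_pack_node(pack):
    """Compute summary statistics for one pack; None stands for a null value."""
    non_null = [v for v in pack if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "sum": sum(non_null),
        "nulls": len(pack) - len(non_null),
    }


# A tiny stand-in pack; a real Data Pack holds up to 65,536 values.
dpn = data_pack_node([0, 5, 3, None, 2])
print(dpn)
```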
  • 12. DPNs are accessible without the need to decompress the corresponding Data Packs. Knowledge Nodes (KNs) were developed to deal efficiently with complex, multiple-table queries (joins, sub-queries, etc.). To process multi-table join queries and sub-queries, the Knowledge Grid uses multi-table Pack-To-Pack KNs that indicate which pairs of Data Packs from different tables should actually be considered while joining the tables. KNs can be compared to the indexes used by traditional databases; however, KNs work on Packs instead of rows. Therefore KNs are 65,536 times smaller than indexes (or even 65,536 times 65,536 for the Pack-To-Pack Nodes, because of the size decrease for each of the two tables involved). In general the overhead is around 1% of the data, compared to classic indexes, which can be 20-50% of the size of the data. Knowledge Nodes are created on data load and may also be created during queries. They are automatically created and maintained by the Knowledge Grid Manager based on the column type and definition, so no intervention by a DBA is necessary.
  • 13. The Infobright Optimizer: For every incoming query, the Optimizer applies DPNs and KNs to split Data Packs among the following three categories: • Relevant Packs – in which each element (the record’s value for the given column) is identified, based on DPNs and KNs, as applicable to the given query. • Irrelevant Packs – based on DPNs and KNs, the Pack holds no relevant values. • Suspect Packs – some elements may be relevant, but there is no way to claim that the Pack is either fully relevant or fully irrelevant, based on DPNs and KNs. While querying, Infobright does not need to decompress either Relevant or Irrelevant Data Packs. Irrelevant Packs are simply not taken into account at all. In the case of Relevant Packs, Infobright knows that all elements are relevant, and the required answer is obtainable. Query: SELECT SUM(B) FROM T WHERE A > 6;
  • 14. Packs A1, A2, A4 are Irrelevant – none of the rows can satisfy A > 6 because all these packs have maximum values below 6. Consequently, Packs B1, B2, B4 will not be analyzed while calculating SUM(B) – they are Irrelevant too. Pack A3 is Relevant – all the rows with numbers 131,073-196,608 satisfy A > 6. This means Pack B3 is Relevant too. The sum of the values of B within Pack B3 is one of the components of the final answer. Based on B3’s DPN, Infobright knows that this sum equals 100, and this is everything Infobright needs to know about this portion of data. Pack A5 is Suspect – some rows satisfy A > 6 but it is not known which ones. As a consequence, Pack B5 is Suspect too. Infobright will need to decompress both A5 and B5 to find which rows out of 262,145-300,000 satisfy A > 6 and sum the values of B for precisely those rows. The result will be added to the value of 100 previously obtained for Pack B3 to form the final answer to the query.
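The walk-through on slides 13 and 14 can be sketched in a few lines. Everything here is illustrative: the pack contents are made up (three values per pack instead of 65,536, chosen so that pack B3 sums to 100 as on the slide), nulls are ignored, and the function names are hypothetical rather than Infobright's.

```python
def build_dpn(pack):
    """Statistics kept per pack, computed once at load time."""
    return {"min": min(pack), "max": max(pack), "sum": sum(pack)}


def classify(dpn_a, threshold):
    """Classify a pack of column A for the predicate A > threshold."""
    if dpn_a["max"] <= threshold:
        return "irrelevant"  # no value can satisfy the predicate
    if dpn_a["min"] > threshold:
        return "relevant"    # every value satisfies the predicate
    return "suspect"         # must look inside the pack


def sum_b_where_a_gt(packs_a, packs_b, dpns_a, dpns_b, threshold):
    """Evaluate SELECT SUM(B) FROM T WHERE A > threshold pack by pack."""
    total, opened = 0, []
    for i, dpn_a in enumerate(dpns_a):
        kind = classify(dpn_a, threshold)
        if kind == "relevant":
            total += dpns_b[i]["sum"]  # answered from the DPN alone
        elif kind == "suspect":
            opened.append(i)           # only these packs are "decompressed"
            total += sum(b for a, b in zip(packs_a[i], packs_b[i]) if a > threshold)
    return total, opened


# Hypothetical packs A1..A5 and B1..B5.
packs_a = [[0, 5, 2], [1, 3, 4], [7, 9, 8], [2, 0, 6], [5, 10, 7]]
packs_b = [[1, 1, 1], [2, 2, 2], [30, 30, 40], [5, 5, 5], [10, 20, 30]]
dpns_a = [build_dpn(p) for p in packs_a]
dpns_b = [build_dpn(p) for p in packs_b]

kinds = [classify(d, 6) for d in dpns_a]
total, opened = sum_b_where_a_gt(packs_a, packs_b, dpns_a, dpns_b, 6)
print(kinds, total, opened)
```

With this data only the fifth pack pair is opened; the other four are resolved from their statistics, which is the saving the slides describe.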
  • 15. 5. Data Loading and Compression During load, 65,536 values of the given column are treated as a sequence with zero or more null values occurring anywhere in the sequence. Information about the null positions is stored separately (within the Null Mask). Then the remaining stream of the non-null values is compressed, taking full advantage of regularities inside the data.
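The null handling described on slide 15 can be sketched as below. The split into a null mask and a non-null stream follows the slide; the compression step uses `zlib` over a JSON encoding purely as a stand-in, since the slides do not describe Infobright's own algorithms. The function names are hypothetical.

```python
import json
import zlib


def split_nulls(values):
    """Separate null positions (the Null Mask) from the non-null value stream."""
    null_mask = [v is None for v in values]
    non_null = [v for v in values if v is not None]
    return null_mask, non_null


def restore(null_mask, non_null):
    """Rebuild the original sequence from the mask and the value stream."""
    stream = iter(non_null)
    return [None if is_null else next(stream) for is_null in null_mask]


values = [7, None, 7, 7, None, 8]
null_mask, non_null = split_nulls(values)

# Stand-in compression of the non-null stream (not Infobright's algorithm).
compressed = zlib.compress(json.dumps(non_null).encode("utf-8"))
decompressed = json.loads(zlib.decompress(compressed))

round_trip = restore(null_mask, decompressed)
print(null_mask, non_null, round_trip)
```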
  • 16. With Infobright, a 1 TB database becomes a 100 GB database. Since the data is much smaller, the effective disk transfer rate is improved (even with the overhead of compression). One of Infobright’s benefits is its industry-leading compression. Unlike in traditional row-based data warehouses, data is stored by column, allowing compression algorithms to be finely tuned to the column data type. Moreover, for each column, the data is split into Data Packs, each storing up to 65,536 values of the given column. Infobright then applies a set of patent-pending compression algorithms that automatically self-adjust various parameters for each Data Pack. Infobright achieves an average compression ratio of 10:1. For example, 10 TB of raw data can be stored in about 1 TB of space on average (including the overhead associated with Data Pack Nodes and the Knowledge Grid). The compression ratio may differ depending on data types and content. Some data is more repetitive than other data, so compression ratios can be as high as 40:1.
  • 17. 6. How Infobright Leverages MySQL: In the data warehouse marketplace, the database must integrate with a variety of drivers and tools. By integrating with MySQL, Infobright leverages the extensive driver connectivity provided by MySQL connectors (C, JDBC, ODBC, .NET, Perl, etc.). MySQL also provides cataloging functions, such as table definitions, views, users, permissions, etc. These are stored in a MyISAM database. Although Infobright implements the storage engine interface that MySQL provides, the Infobright technology has its own advanced Optimizer. This is required because of the column orientation, but also because of the unique Knowledge Node technology described above.
  • 18. Conclusion: Infobright has developed an architecture designed for analytic data warehousing that requires much less work from IT organizations and delivers faster time-to-market for analytic applications. The open source Infobright Community Edition makes it affordable for companies of all sizes. Almost half of IT directors are prevented from rolling out the BI initiatives they desire because of expensive licensing and specialized hardware costs. Infobright can help by supplying the low TCO that IT executives are looking for.