SlideShare uma empresa Scribd logo
1 de 36
Baixar para ler offline
Pinot
Kishore Gopalakrishna
Tuesday, August 18, 15
Agenda
• Pinot @ LinkedIn - Current
• Pinot - Architecture
• Pinot Operations
• Pinot @ LinkedIn - Future
Tuesday, August 18, 15
WVMP
Tuesday, August 18, 15
Slice and Dice Metrics
Tuesday, August 18, 15
Pinot @ LinkedIn
Customers Members Internal tools
Tuesday, August 18, 15
• 100B documents
• 1B documents ingested per day
• 100M queries per day
• 10’s of ms latency
• 30 tables in prod, 250 * 3 std app nodes

 

Pinot @ LinkedIn
Tuesday, August 18, 15
Key features
SQL-like
interface
Columnar
storage and
indexing
Real-time
data load
Tuesday, August 18, 15
(S)QL: Filters and Aggs
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y’ AND
action = 'stop'
Tuesday, August 18, 15
(S)QL: Group By
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y’
GROUP BY action
Tuesday, August 18, 15
(S)QL: ORDER BY and LIMIT
SELECT *
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
entityId = 1000 AND
action = 'start'
ORDER BY creationTime DESC LIMIT 1
Tuesday, August 18, 15
Whats not supported
• JOIN: unpredictable performance
• NOT A SOURCE OF TRUTH
• Mutation
Tuesday, August 18, 15
Pinot
• Data flow
• Query Execution
• How to use/operate
• Pinot @ LinkedIn - Future
Tuesday, August 18, 15
Broker Helix
Real
time
Historical
Kafka Hadoop
Pinot
Architecture
Queries
Raw
Data
Tuesday, August 18, 15
Pinot
• Pinot segments
Tuesday, August 18, 15
Pinot Segment layout: Columnar storage
Tuesday, August 18, 15
Pinot Segment layout: Sorted Forward Index
Tuesday, August 18, 15
Pinot Segment layout: Other techniques
• Indexes: Inverted index, Bitmap, RoaringBitmap
• Compression: Dictionary Encoding, P4Delta
• Multi Valued columns, skip lists,
• Hyperloglog for unique
• T-digest for Percentile, Quantile

Tuesday, August 18, 15
Data aware
pre-computation
Star tree Index
Tuesday, August 18, 15
Pinot
• Query Execution
Tuesday, August 18, 15
Pinot Query Execution: Distributed
Servers
S1
S3
S2
S1
S3
S2
Helix
Brokers
Tuesday, August 18, 15
Pinot Query Execution: Distributed
Servers
1.Query
S1
S3
S2
S1
S3
S2
Helix
Brokers
Tuesday, August 18, 15
Pinot Query Execution: Distributed
Servers
1.Query
S1
S3
S2
S1
S3
S2
Helix
2. Fetch routing table from HelixBrokers
Tuesday, August 18, 15
Pinot Query Execution: Distributed
Servers
1.Query
S1
S3
S2
S1
S3
S2
Helix
2. Fetch routing table from HelixBrokers
3. Scatter Request
Tuesday, August 18, 15
Pinot Query Execution: Distributed
Servers
1.Query
S1
S3
S2
S1
S3
S2
Helix
2. Fetch routing table from HelixBrokers
3. Scatter Request
4. Process Request
&
send response
Tuesday, August 18, 15
Pinot Query Execution: Distributed
Servers
1.Query
S1
S3
S2
S1
S3
S2
Helix
2. Fetch routing table from HelixBrokers
3. Scatter Request
4. Process Request
&
send response
5. Gather Response
Tuesday, August 18, 15
Pinot Query Execution: Distributed
Servers
1.Query
S1
S3
S2
S1
S3
S2
Helix
2. Fetch routing table from HelixBrokers
3. Scatter Request
4. Process Request
&
send response
5. Gather Response
6. Return Response
Tuesday, August 18, 15
Pinot Query Execution: Single Node Architecture
EXECUTION ENGINE
INVERTED
INDEX
BITMAP
INDEX
COLUMN FORMAT
PLANNER
Tuesday, August 18, 15
Pinot Query Execution: Single Node Architecture
SELECT
campaignId,
sum(clicks)
FROM Table A
WHERE
accountId = 121011
AND
'day' >= 15949
GROUP BY
campaignId
account Id daycampaign Id click
Filter
Operator
Projection
Operator
Aggregation
Group by
Operator
Combine Operator
Pinot
Segments
Data sources
Matching
doc ids
campaignId,Click tuple
Tuesday, August 18, 15
Pinot
• Operations
Tuesday, August 18, 15
Cluster Management: Deployment
Helix
Brokers
Servers
• Brokers and Servers register themselves in Helix
• All servers start with no use case specific configuration
Controller
Tuesday, August 18, 15
On boarding new use case
Helix
Brokers
Servers
XLNT XLNT
XLNT
Create Table
command
Controller
XLNT
XLNTTag
Servers
TableName
Brokers
3
XLNT_T1
1
Tuesday, August 18, 15
Segment Assignment
Servers
S3
S2
S1
Upload Segment S2
S1
S3
S2
S1
S3
Helix
Brokers
Copies
TableName
2
XLNT_T1
Controller
Tuesday, August 18, 15
• AUTO recovery mode: Automatically redistribute
segments on failure/addition of new nodes
• Custom mode: Run in degraded mode until node is
restarted/replaced.
Pinot - Fault tolerance/Elasticity
Tuesday, August 18, 15
Pinot vs Druid
Druid Pinot
Architecture
Realtime + Offline,
Realtime only
Realtime + Offline
Realtime only -> consistency is hard and
schema evolution/Bootstrap is hard
Inverted Index
Always On all columns,
Fixed
Configurable on per
column basis
Allows trade off between scanning v/s
inverted index + scanning. More data can be
fit in given memory size
Data organization N/A Sorts data
Organizing data provides speed/better
compression and removes the need for
inverted index
Smart pre-
materialization
N/A star-tree Allows trade off between latency and space
Query Execution
Layer
Fixed Plan
Split into Planning
and execution
Smart choices can be made at runtime
based on metadata/query.
Tuesday, August 18, 15
• Documentation & tooling
• In progress - consistency among real time replicas.
• Improve cost to serve - leverage SSD, partial pre
materialization
• ThirdEye - Business Metrics Monitoring
Pinot - Future
Tuesday, August 18, 15
Thank You
30
Tuesday, August 18, 15

Mais conteúdo relacionado

Mais procurados

Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache PinotSiddharth Teotia
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...HostedbyConfluent
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)Ryan Blue
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Flink Forward
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Edureka!
 

Mais procurados (20)

Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache Pinot
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
 
Intro to Pinot (2016-01-04)
Intro to Pinot (2016-01-04)Intro to Pinot (2016-01-04)
Intro to Pinot (2016-01-04)
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
 

Destaque

2017 holiday survey: An annual analysis of the peak shopping season
2017 holiday survey: An annual analysis of the peak shopping season2017 holiday survey: An annual analysis of the peak shopping season
2017 holiday survey: An annual analysis of the peak shopping seasonDeloitte United States
 
Inside Google's Numbers in 2017
Inside Google's Numbers in 2017Inside Google's Numbers in 2017
Inside Google's Numbers in 2017Rand Fishkin
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Issac Buenrostro
 
Penyimpangan Nilai "Persatuan" dalam pancasila
Penyimpangan Nilai "Persatuan" dalam pancasilaPenyimpangan Nilai "Persatuan" dalam pancasila
Penyimpangan Nilai "Persatuan" dalam pancasilahelda1234
 
PENYIMPANGAN KEPADA SILA KE 3
PENYIMPANGAN KEPADA SILA KE 3PENYIMPANGAN KEPADA SILA KE 3
PENYIMPANGAN KEPADA SILA KE 3Aldya Rachma
 
オールフェスタ Git勉強会資料 (public)
オールフェスタ Git勉強会資料 (public)オールフェスタ Git勉強会資料 (public)
オールフェスタ Git勉強会資料 (public)Shunsuke Tadokoro
 
Do Fluxo de Caixa ao Planejamento Financeiro
Do Fluxo de Caixa ao Planejamento FinanceiroDo Fluxo de Caixa ao Planejamento Financeiro
Do Fluxo de Caixa ao Planejamento FinanceiroGranatum
 
自習形式で学ぶ「DIGITS による画像分類入門」
自習形式で学ぶ「DIGITS による画像分類入門」自習形式で学ぶ「DIGITS による画像分類入門」
自習形式で学ぶ「DIGITS による画像分類入門」NVIDIA Japan
 
Mother teresa!
Mother teresa!Mother teresa!
Mother teresa!lsammut
 
A Tribute to Mother Teresa !!
A Tribute to Mother Teresa !!A Tribute to Mother Teresa !!
A Tribute to Mother Teresa !!Supriya S.
 
Mother Teresa: Saint of the Gutters
Mother Teresa: Saint of the GuttersMother Teresa: Saint of the Gutters
Mother Teresa: Saint of the Guttersguimera
 
Перелік об'єктів державної власності, які рекомендовано до передачі в концесію
Перелік об'єктів державної власності, які рекомендовано до передачі в концесіюПерелік об'єктів державної власності, які рекомендовано до передачі в концесію
Перелік об'єктів державної власності, які рекомендовано до передачі в концесіюtsnua
 
Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...
Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...
Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...tsnua
 

Destaque (20)

10 facts about jobs in the future
10 facts about jobs in the future10 facts about jobs in the future
10 facts about jobs in the future
 
The AI Rush
The AI RushThe AI Rush
The AI Rush
 
2017 holiday survey: An annual analysis of the peak shopping season
2017 holiday survey: An annual analysis of the peak shopping season2017 holiday survey: An annual analysis of the peak shopping season
2017 holiday survey: An annual analysis of the peak shopping season
 
Inside Google's Numbers in 2017
Inside Google's Numbers in 2017Inside Google's Numbers in 2017
Inside Google's Numbers in 2017
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
 
Penyimpangan Nilai "Persatuan" dalam pancasila
Penyimpangan Nilai "Persatuan" dalam pancasilaPenyimpangan Nilai "Persatuan" dalam pancasila
Penyimpangan Nilai "Persatuan" dalam pancasila
 
Why OpenDaylight
Why OpenDaylightWhy OpenDaylight
Why OpenDaylight
 
PENYIMPANGAN KEPADA SILA KE 3
PENYIMPANGAN KEPADA SILA KE 3PENYIMPANGAN KEPADA SILA KE 3
PENYIMPANGAN KEPADA SILA KE 3
 
オールフェスタ Git勉強会資料 (public)
オールフェスタ Git勉強会資料 (public)オールフェスタ Git勉強会資料 (public)
オールフェスタ Git勉強会資料 (public)
 
Do Fluxo de Caixa ao Planejamento Financeiro
Do Fluxo de Caixa ao Planejamento FinanceiroDo Fluxo de Caixa ao Planejamento Financeiro
Do Fluxo de Caixa ao Planejamento Financeiro
 
自習形式で学ぶ「DIGITS による画像分類入門」
自習形式で学ぶ「DIGITS による画像分類入門」自習形式で学ぶ「DIGITS による画像分類入門」
自習形式で学ぶ「DIGITS による画像分類入門」
 
Presentation r4i
Presentation r4i Presentation r4i
Presentation r4i
 
National Research Award_2559
National Research Award_2559National Research Award_2559
National Research Award_2559
 
Presentation talent mobility
Presentation talent mobilityPresentation talent mobility
Presentation talent mobility
 
Research r4i
Research r4iResearch r4i
Research r4i
 
Mother teresa!
Mother teresa!Mother teresa!
Mother teresa!
 
A Tribute to Mother Teresa !!
A Tribute to Mother Teresa !!A Tribute to Mother Teresa !!
A Tribute to Mother Teresa !!
 
Mother Teresa: Saint of the Gutters
Mother Teresa: Saint of the GuttersMother Teresa: Saint of the Gutters
Mother Teresa: Saint of the Gutters
 
Перелік об'єктів державної власності, які рекомендовано до передачі в концесію
Перелік об'єктів державної власності, які рекомендовано до передачі в концесіюПерелік об'єктів державної власності, які рекомендовано до передачі в концесію
Перелік об'єктів державної власності, які рекомендовано до передачі в концесію
 
Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...
Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...
Перелік об'єктів державної власності, що підлягають приватизації у 2017–2020 ...
 

Semelhante a Pinot: Realtime Distributed OLAP datastore

Cloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan WangCloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan WangDatabricks
 
ADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs Presentation
ADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs PresentationADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs Presentation
ADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs Presentationprashant3535
 
Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023
Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023
Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023Icinga
 
Truck and Body Presentation
Truck and Body PresentationTruck and Body Presentation
Truck and Body PresentationCBN2014
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at HotstarKafkaZone
 
Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...
Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...
Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...Kondo Mitsumasa
 
8051,chapter1,architecture and peripherals
8051,chapter1,architecture and peripherals8051,chapter1,architecture and peripherals
8051,chapter1,architecture and peripheralsamrutachintawar239
 
ITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp
 
Accumulo Tutorial — Up and Running (or at Least Walking) in 90 Minutes
Accumulo Tutorial — Up and Running (or at Least Walking) in 90 MinutesAccumulo Tutorial — Up and Running (or at Least Walking) in 90 Minutes
Accumulo Tutorial — Up and Running (or at Least Walking) in 90 MinutesAccumulo Summit
 
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDVSalesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDVAmit Chaudhary
 
Monitorama: How monitoring can improve the rest of the company
Monitorama: How monitoring can improve the rest of the companyMonitorama: How monitoring can improve the rest of the company
Monitorama: How monitoring can improve the rest of the companyJeff Weinstein
 
NoSQL Tel Aviv Meetup#1: Introduction to Polyglot Persistance
NoSQL Tel Aviv Meetup#1: Introduction to Polyglot PersistanceNoSQL Tel Aviv Meetup#1: Introduction to Polyglot Persistance
NoSQL Tel Aviv Meetup#1: Introduction to Polyglot PersistanceNoSQL TLV
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...DataStax
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookTreasure Data, Inc.
 
Active Directory Recon 101
Active Directory Recon 101Active Directory Recon 101
Active Directory Recon 101prashant3535
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talkKrishna Gade
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven ProductsDataWorks Summit
 
Hypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLHypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLYuzuko Hosoya
 

Semelhante a Pinot: Realtime Distributed OLAP datastore (20)

Cloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan WangCloud Cost Management and Apache Spark with Xuan Wang
Cloud Cost Management and Apache Spark with Xuan Wang
 
ADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs Presentation
ADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs PresentationADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs Presentation
ADRecon BH USA 2018 : Arsenal and DEF CON 26 Demo Labs Presentation
 
Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023
Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023
Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023
 
Truck and Body Presentation
Truck and Body PresentationTruck and Body Presentation
Truck and Body Presentation
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
 
Postgres
PostgresPostgres
Postgres
 
Scaling postgres
Scaling postgresScaling postgres
Scaling postgres
 
Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...
Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...
Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool...
 
8051,chapter1,architecture and peripherals
8051,chapter1,architecture and peripherals8051,chapter1,architecture and peripherals
8051,chapter1,architecture and peripherals
 
ITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depth
 
Accumulo Tutorial — Up and Running (or at Least Walking) in 90 Minutes
Accumulo Tutorial — Up and Running (or at Least Walking) in 90 MinutesAccumulo Tutorial — Up and Running (or at Least Walking) in 90 Minutes
Accumulo Tutorial — Up and Running (or at Least Walking) in 90 Minutes
 
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDVSalesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
 
Monitorama: How monitoring can improve the rest of the company
Monitorama: How monitoring can improve the rest of the companyMonitorama: How monitoring can improve the rest of the company
Monitorama: How monitoring can improve the rest of the company
 
NoSQL Tel Aviv Meetup#1: Introduction to Polyglot Persistance
NoSQL Tel Aviv Meetup#1: Introduction to Polyglot PersistanceNoSQL Tel Aviv Meetup#1: Introduction to Polyglot Persistance
NoSQL Tel Aviv Meetup#1: Introduction to Polyglot Persistance
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Active Directory Recon 101
Active Directory Recon 101Active Directory Recon 101
Active Directory Recon 101
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talk
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products
 
Hypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLHypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQL
 

Mais de Kishore Gopalakrishna

Multi-Tenant Data Cloud with YARN & Helix
Multi-Tenant Data Cloud with YARN & HelixMulti-Tenant Data Cloud with YARN & Helix
Multi-Tenant Data Cloud with YARN & HelixKishore Gopalakrishna
 
Untangling cluster management with Helix
Untangling cluster management with HelixUntangling cluster management with Helix
Untangling cluster management with HelixKishore Gopalakrishna
 
Data driven testing: Case study with Apache Helix
Data driven testing: Case study with Apache HelixData driven testing: Case study with Apache Helix
Data driven testing: Case study with Apache HelixKishore Gopalakrishna
 
Apache Helix presentation at ApacheCon 2013
Apache Helix presentation at ApacheCon 2013Apache Helix presentation at ApacheCon 2013
Apache Helix presentation at ApacheCon 2013Kishore Gopalakrishna
 
Apache Helix presentation at SOCC 2012
Apache Helix presentation at SOCC 2012Apache Helix presentation at SOCC 2012
Apache Helix presentation at SOCC 2012Kishore Gopalakrishna
 

Mais de Kishore Gopalakrishna (7)

Multi-Tenant Data Cloud with YARN & Helix
Multi-Tenant Data Cloud with YARN & HelixMulti-Tenant Data Cloud with YARN & Helix
Multi-Tenant Data Cloud with YARN & Helix
 
Helix talk at RelateIQ
Helix talk at RelateIQHelix talk at RelateIQ
Helix talk at RelateIQ
 
Untangling cluster management with Helix
Untangling cluster management with HelixUntangling cluster management with Helix
Untangling cluster management with Helix
 
Data driven testing: Case study with Apache Helix
Data driven testing: Case study with Apache HelixData driven testing: Case study with Apache Helix
Data driven testing: Case study with Apache Helix
 
Apache Helix presentation at Vmware
Apache Helix presentation at VmwareApache Helix presentation at Vmware
Apache Helix presentation at Vmware
 
Apache Helix presentation at ApacheCon 2013
Apache Helix presentation at ApacheCon 2013Apache Helix presentation at ApacheCon 2013
Apache Helix presentation at ApacheCon 2013
 
Apache Helix presentation at SOCC 2012
Apache Helix presentation at SOCC 2012Apache Helix presentation at SOCC 2012
Apache Helix presentation at SOCC 2012
 

Último

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Último (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Pinot: Realtime Distributed OLAP datastore

  • 2. Agenda • Pinot @ LinkedIn - Current • Pinot - Architecture • Pinot Operations • Pinot @ LinkedIn - Future Tuesday, August 18, 15
  • 4. Slice and Dice Metrics Tuesday, August 18, 15
  • 5. Pinot @ LinkedIn Customers Members Internal tools Tuesday, August 18, 15
  • 6. • 100B documents • 1B documents ingested per day • 100M queries per day • 10’s of ms latency • 30 tables in prod, 250 * 3 std app nodes Pinot @ LinkedIn Tuesday, August 18, 15
  • 8. (S)QL: Filters and Aggs SELECT count(*) FROM companyFollowHistoricalEvents WHERE entityId = 121011 AND 'day' >= 15949 AND 'day' <= 15963 AND paid = 'y’ AND action = 'stop' Tuesday, August 18, 15
  • 9. (S)QL: Group By SELECT count(*) FROM companyFollowHistoricalEvents WHERE entityId = 121011 AND 'day' >= 15949 AND 'day' <= 15963 AND paid = 'y’ GROUP BY action Tuesday, August 18, 15
  • 10. (S)QL: ORDER BY and LIMIT SELECT * FROM companyFollowHistoricalEvents WHERE entityId = 121011 AND entityId = 1000 AND action = 'start' ORDER BY creationTime DESC LIMIT 1 Tuesday, August 18, 15
  • 11. Whats not supported • JOIN: unpredictable performance • NOT A SOURCE OF TRUTH • Mutation Tuesday, August 18, 15
  • 12. Pinot • Data flow • Query Execution • How to use/operate • Pinot @ LinkedIn - Future Tuesday, August 18, 15
  • 15. Pinot Segment layout: Columnar storage Tuesday, August 18, 15
  • 16. Pinot Segment layout: Sorted Forward Index Tuesday, August 18, 15
  • 17. Pinot Segment layout: Other techniques • Indexes: Inverted index, Bitmap, RoaringBitmap • Compression: Dictionary Encoding, P4Delta • Multi Valued columns, skip lists, • Hyperloglog for unique • T-digest for Percentile, Quantile Tuesday, August 18, 15
  • 18. Data aware pre-computation Star tree Index Tuesday, August 18, 15
  • 20. Pinot Query Execution: Distributed Servers S1 S3 S2 S1 S3 S2 Helix Brokers Tuesday, August 18, 15
  • 21. Pinot Query Execution: Distributed Servers 1.Query S1 S3 S2 S1 S3 S2 Helix Brokers Tuesday, August 18, 15
  • 22. Pinot Query Execution: Distributed Servers 1.Query S1 S3 S2 S1 S3 S2 Helix 2. Fetch routing table from HelixBrokers Tuesday, August 18, 15
  • 23. Pinot Query Execution: Distributed Servers 1.Query S1 S3 S2 S1 S3 S2 Helix 2. Fetch routing table from HelixBrokers 3. Scatter Request Tuesday, August 18, 15
  • 24. Pinot Query Execution: Distributed Servers 1.Query S1 S3 S2 S1 S3 S2 Helix 2. Fetch routing table from HelixBrokers 3. Scatter Request 4. Process Request & send response Tuesday, August 18, 15
  • 25. Pinot Query Execution: Distributed Servers 1.Query S1 S3 S2 S1 S3 S2 Helix 2. Fetch routing table from HelixBrokers 3. Scatter Request 4. Process Request & send response 5. Gather Response Tuesday, August 18, 15
  • 26. Pinot Query Execution: Distributed Servers 1.Query S1 S3 S2 S1 S3 S2 Helix 2. Fetch routing table from HelixBrokers 3. Scatter Request 4. Process Request & send response 5. Gather Response 6. Return Response Tuesday, August 18, 15
  • 27. Pinot Query Execution: Single Node Architecture EXECUTION ENGINE INVERTED INDEX BITMAP INDEX COLUMN FORMAT PLANNER Tuesday, August 18, 15
  • 28. Pinot Query Execution: Single Node Architecture SELECT campaignId, sum(clicks) FROM Table A WHERE accountId = 121011 AND 'day' >= 15949 GROUP BY campaignId account Id daycampaign Id click Filter Operator Projection Operator Aggregation Group by Operator Combine Operator Pinot Segments Data sources Matching doc ids campaignId,Click tuple Tuesday, August 18, 15
  • 30. Cluster Management: Deployment Helix Brokers Servers • Brokers and Servers register themselves in Helix • All servers start with no use case specific configuration Controller Tuesday, August 18, 15
  • 31. On boarding new use case Helix Brokers Servers XLNT XLNT XLNT Create Table command Controller XLNT XLNTTag Servers TableName Brokers 3 XLNT_T1 1 Tuesday, August 18, 15
  • 32. Segment Assignment Servers S3 S2 S1 Upload Segment S2 S1 S3 S2 S1 S3 Helix Brokers Copies TableName 2 XLNT_T1 Controller Tuesday, August 18, 15
  • 33. • AUTO recovery mode: Automatically redistribute segments on failure/addition of new nodes • Custom mode: Run in degraded mode until node is restarted/replaced. Pinot - Fault tolerance/Elasticity Tuesday, August 18, 15
  • 34. Pinot vs Druid Druid Pinot Architecture Realtime + Offline, Realtime only Realtime + Offline Realtime only -> consistency is hard and schema evolution/Bootstrap is hard Inverted Index Always On all columns, Fixed Configurable on per column basis Allows trade off between scanning v/s inverted index + scanning. More data can be fit in given memory size Data organization N/A Sorts data Organizing data provides speed/better compression and removes the need for inverted index Smart pre- materialization N/A star-tree Allows trade off between latency and space Query Execution Layer Fixed Plan Split into Planning and execution Smart choices can be made at runtime based on metadata/query. Tuesday, August 18, 15
  • 35. • Documentation & tooling • In progress - consistency among real time replicas. • Improve cost to serve - leverage SSD, partial pre materialization • ThirdEye - Business Metrics Monitoring Pinot - Future Tuesday, August 18, 15