Five Data Trends You Should Know
Tomasz Tunguz, Managing Director, Redpoint Ventures
@ttunguz & tomtunguz.com
5 Major Trends in Data You Should Know
Metatrend:
Rise of Data Engineering as Craft
Why Has Data Become So Ubiquitous?
When a Single Monolithic Pipeline Worked, It Looked Like This:
Data Produced (Oracle, SAP, logs, transactions, user actions) → Aggregated into EDW → Output (Cognos, Tableau)
But Everyone Wanted One
Exec Team, Marketing, Product, Sales
And They Each Need Data from the Others
This is a Data Mesh:
A Network of Data Producers & Consumers
Centralize and Move It to a Cloud Data Lake
Without the right tooling, you have a Data Mess
But You Could Have a Breathtaking Machine When It All Comes Together
Who Will Come to Save the Day?
What Is a Data Engineer?
Data Engineers: the people who move, shape, and transform data from where it is generated to where it is needed, and do it
1. Consistently
2. Efficiently
3. Scalably
4. Accurately
5. Compliantly
aka Software Engineers Deep in Data
Insight: Software Engineers Have Experience, Tools, and Patterns for Writing Code
Ex: the Software Development Lifecycle
What Is the Data Engineering Equivalent?
The Data Engineering Lifecycle
Each Step of the DELC Needs New Tools
Data Pipelines: Watermains of Data
Code in a modern language to repeatably move data around
Innovators: Airflow, Elementl, Prefect
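The pattern these tools share can be sketched in plain Python: tasks declared as code, with explicit dependencies, run in dependency order and retried on failure. This is an illustration of the idea only, not the actual API of Airflow, Elementl, or Prefect; every name below is invented.

```python
# A toy pipeline: tasks with declared dependencies, run in order with retries.
from collections import defaultdict

class Pipeline:
    def __init__(self):
        self.tasks = {}                 # name -> callable
        self.deps = defaultdict(list)   # name -> upstream task names

    def task(self, name, *, after=()):
        def register(fn):
            self.tasks[name] = fn
            self.deps[name] = list(after)
            return fn
        return register

    def run(self, retries=2):
        done, results = set(), {}
        while len(done) < len(self.tasks):
            ready = [n for n in self.tasks
                     if n not in done and all(d in done for d in self.deps[n])]
            if not ready:
                raise RuntimeError("cycle in task graph")
            for name in ready:
                for attempt in range(retries + 1):
                    try:
                        results[name] = self.tasks[name](results)
                        break
                    except Exception:
                        if attempt == retries:
                            raise
                done.add(name)
        return results

pipe = Pipeline()

@pipe.task("extract")
def extract(results):
    # Stand-in for pulling rows from a source system.
    return [{"user": "a", "amount": 10}, {"user": "b", "amount": 25}]

@pipe.task("transform", after=("extract",))
def transform(results):
    # Filter to the rows downstream consumers care about.
    return [r for r in results["extract"] if r["amount"] > 15]

@pipe.task("load", after=("transform",))
def load(results):
    # Stand-in for writing to a warehouse; returns rows "loaded".
    return len(results["transform"])
```

Running `pipe.run()` executes extract, then transform, then load, exactly the repeatable move-data-with-code loop the slide describes.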
Compute Engines: Access Cloud Data
Query data in the cloud, without moving it. Key insight: separation of data and compute.
Innovators: Dremio, Databricks
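The "query it where it lives" idea can be sketched as: data sits as immutable files in object storage, and stateless compute workers scan it on demand, projecting and filtering as they read, rather than first copying everything into a warehouse. This is a toy illustration of the separation of data and compute, not Dremio's or Databricks' API; `OBJECT_STORE`, `scan`, and `query` are invented names.

```python
import csv, io

# "Object storage": immutable files, addressed by key.
OBJECT_STORE = {
    "events/2024-01.csv": "user,action,ms\na,click,120\nb,buy,300\n",
}

def scan(key, columns, predicate):
    """One stateless compute step: stream one object, filter, project."""
    reader = csv.DictReader(io.StringIO(OBJECT_STORE[key]))
    for row in reader:
        if predicate(row):
            yield {c: row[c] for c in columns}

def query(keys, columns, predicate):
    # Any number of workers could run scan() over disjoint keys in
    # parallel; the stored data never moves, only the results do.
    return [r for key in keys for r in scan(key, columns, predicate)]
```

Because `scan` holds no state, compute can be scaled up for a big query and torn down afterward, while the files stay put in cheap storage.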
Data Modeling: Universal Metrics Library
Define metrics once for the entire organization
Innovators: Transform Data, Looker
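"Define once for the entire organization" can be sketched as a single registry of metric definitions that every dashboard, report, and notebook computes through. An illustrative sketch of the pattern only, not Transform's or Looker's real modeling layer; the names here are made up.

```python
# One registry, one definition per metric, shared by every consumer.
METRICS = {}

def metric(name):
    def register(fn):
        METRICS[name] = fn
        return fn
    return register

@metric("arr")
def arr(rows):
    """Annual recurring revenue: 12x the sum of active monthly subscriptions."""
    return 12 * sum(r["mrr"] for r in rows if r["status"] == "active")

def compute(name, rows):
    # Marketing's dashboard and finance's report both call
    # compute("arr", ...), so the number can never drift between teams.
    return METRICS[name](rows)
```

The point of the pattern is that changing the definition of "arr" in one place changes it everywhere at once.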
Data Products: Stand on the Shoulders of Gigabytes
Build and deploy data products internally and externally
Innovators: BI: Preset; ML: Streamlit, Tecton
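A minimal internal data product can be sketched with the standard library alone: a metric computed from data and exposed over HTTP as JSON, so other teams consume a stable interface rather than raw tables. Purely illustrative; tools like Preset and Streamlit productionize far richer versions of this idea, and all names here are invented.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SIGNUPS = [{"day": "mon", "count": 12}, {"day": "tue", "count": 17}]

def signups_payload():
    """The product's contract: a stable, documented JSON shape."""
    return {
        "metric": "weekly_signups",
        "total": sum(r["count"] for r in SIGNUPS),
        "by_day": SIGNUPS,
    }

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(signups_payload()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve it internally: HTTPServer(("", 8000), Handler).serve_forever()
```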
Data Quality: Harness & Tame Error
Develop tests and monitor data flows to ensure data integrity
Innovators: Monte Carlo, Great Expectations, Soda Data, Data Gravity
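The expectation-testing pattern these tools formalize can be sketched simply: declare what healthy data looks like, run the checks on every batch, and fail loudly before bad rows reach a dashboard. Illustrative only, not any tool's real API; `expect` and `validate` are invented names.

```python
def expect(rows, check, description):
    """Run one declared expectation over a batch and report failures."""
    failures = [r for r in rows if not check(r)]
    return {"check": description, "passed": not failures, "failures": failures}

def validate(rows):
    # The suite of expectations a batch must pass before it is published.
    return [
        expect(rows, lambda r: r["email"] is not None, "email is never null"),
        expect(rows, lambda r: 0 <= r["age"] <= 120, "age is in a plausible range"),
    ]

batch = [
    {"email": "a@x.com", "age": 34},
    {"email": None, "age": 29},   # should be caught by the null check
]
results = validate(batch)
```

In practice the same suite runs on every new batch in the pipeline, turning data integrity into something monitored continuously rather than discovered by an angry dashboard user.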
5 Data Trends You Should Know
1. Data Pipelines – move data with code
2. Compute Engines – query cloud data
3. Data Modeling – define metrics once
4. Data Products – squeeze insight from data
5. Data Quality – keep data accurate
The Future Depends on You
Five Data Trends You Should Know
Tomasz Tunguz, Managing Director, Redpoint Ventures
@ttunguz & tomtunguz.com

Speaker Notes

  1. Thank you for the warm introduction, Jason. I’m thrilled to be here. My name is Tomasz Tunguz. I’m a managing director at Redpoint Ventures and I write a blog at tomtunguz.com. It’s a data-infused collection of posts about startups.
  2. Let me tell you about Redpoint. Redpoint is a venture firm based in Silicon Valley. We invest anywhere from $1M to $50M in companies, primarily in the US. We’re a group of founders and operators who have founded startups, operated at hypergrowth companies, and helped startups scale to terrific heights.
  3. We work or have worked with 26 unicorns and some iconic companies with more than $25B in market cap, including Stripe, Hashicorp, Twilio, Duo Security, and Zendesk. We have deep domain experience in data: we were early investors in Looker, Snowflake, and Dremio. We evaluate about 7,000 investment opportunities annually, and this presentation distills some of the trends we see in the market.
  4. I’m passionate about data. I was first exposed to the power of data studying machine learning in college. I studied control systems for satellites and saw how that technology could be used in the stock market. Then I went to Google. Google’s business is entirely predicated on data. I saw firsthand the impact and the leverage we could drive from great data if properly managed through the right systems and tools. I have been deep in data ever since. I co-authored a book called Winning with Data, which researched the challenges modern organizations face with data and how the best companies in the world mitigate those challenges and transform data into competitive advantage. Like you all, I love data and the power and insight it can give businesses.
  5. Today, I’ll share with you 5 trends we’re seeing in the data world. But you should know there is one megatrend, a huge wave, furthering these trends. That trend is the rise of data engineering as a new craft. The word data engineer is new, and the idea is important. Data engineers will define the next decade. Ten years ago, the people working with data, moving it, shaping it, slicing it, came from many different backgrounds. Some came from finance; others have statistics backgrounds; still others came from customer support (like me) and they all found themselves in data roles. This convergence across disciplines occurred because data has become a critical part of every modern company’s technology stack. Data has become essential. So companies invest in specialized people, processes and systems to maximize the benefit they can squeeze from data.
  6. Data engineering has come about because data is everywhere, and every bit of a business’s data is valuable. The reason data has become so ubiquitous is that it costs much less to store than it did 20 years ago. 20 years ago, we stored data in Oracle databases that were expensive and required new licenses as data scaled. So we filtered it aggressively. Today, we store exabytes of data in files on S3, because we can afford it. For the price of two oat-milk macchiatos at BlueBottle, I can store half a terabyte of data on Amazon for a month. So we store data because we can afford it. And we store buckets, reams, mountains of it. Since we have all that data at hand, we decided to use it. 20 years ago, IT bought the systems to extract value from data. They procured them, installed them, and managed these systems. But ten years ago, forward-thinking teams decided to do it for themselves. IT was too slow. A modern marketing team can’t wait 3 to 6 months to get the answers to their questions; they’ll be toast. So the marketing team bought their own system. And then the marketing team created data products. At first, these data products were dashboards: How many new clicks? How many leads? How many customers? How much ad spend? Then marketing operations teams became more sophisticated. They started to run scenarios to test different ideas and experiment with new techniques. Today, marketing is a panoply of machine learning algorithms stuffed to the gills with first-party data, a quantitative hedge fund for buying online ads. All in 15 years. Those predictive systems create data of their own, which is stored and processed. This is more than a process; it’s a flywheel that goes faster and faster and faster. A massive digital boulder of ones and zeros coming down the hill at top speed. The problem is that this boulder isn’t just in marketing. It’s everywhere within a company.
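The macchiato arithmetic works out under an assumed S3 Standard price of roughly $0.023 per GB-month (actual rates vary by region and storage tier):

```python
# Back-of-envelope for the storage-cost claim above.
PRICE_PER_GB_MONTH = 0.023   # USD; assumed S3 Standard rate, varies by region
half_terabyte_gb = 500

monthly_cost = half_terabyte_gb * PRICE_PER_GB_MONTH
# Roughly $11.50 a month: two fancy coffees for half a terabyte.
```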
7. Let me explain. 20 years ago, this is how the data world worked at the highest level. Systems produced data: system logs, transactions, customer actions on websites. The data was filtered into an enterprise data warehouse and data cube because of cost. And then pumped into a legacy output system like Tableau or Cognos. This worked for small data volumes. But it’s expensive, inflexible, closed, and slow. Pop quiz: how long does it take to update a report in Cognos? Too long. Your business is dead. But this was state of the art.
8. And everyone wanted one. Each team manager saw success with data. The authority, the command of the business, the ideas that flowed from the data. It’s intoxicating when you can use data to see around corners, inspire confidence, and lead teams boldly into the future. So each team developed their own data systems. IT couldn’t keep up. And the consumerization of IT was born. For every $1 IT spent on technology, department heads spent an additional 47 cents to outfit their teams with the best kit. At the outset, departments built small systems. But then each hired operations teams, doublespeak for data analysts and data engineers, to help them understand the data, predict the future, and build data products on top. A thousand digital flowers bloomed. And they grew and grew and grew.
9. And that garden quickly became overrun with complexity. Leaves and thorns everywhere. The marketing team decided they needed data from other places, not just the central data store administered by IT. The marketing team needs access to the CRM database to understand customer value. Oh, and customer support data to understand customer lifecycles. Plus, billing data from the finance team. And a bit of product data: those web analytics inform customer conversion. It wasn’t just marketing that was tapping data from other teams. Each department needed data from the others to operate their business best. Which created a completely new concept.
  10. This idea has a name. It is called the data mesh. It is a network of data producers and consumers within an organization. Each team is responsible for producing its own data, publishing data via some API or common format. It’s responsible for documenting the data, explaining the lineage, keeping it up to date, so other teams can use it and rely on it to decide. In exchange, other teams do the same. This creates a mesh, and enables the organization to send the data, use APIs, and develop increasingly sophisticated data products at scale.
11. And then, importantly, modern companies move this all to a cloud data lake. In the cloud, data is elastic, cheap, maintained by someone else, and accessible by everyone (with the right IAM permissions, of course). More importantly, teams store data in these cloud data lakes in standard, open-source formats like Parquet and Arrow. These formats accelerate queries and create a single standard, which makes it easier to work with the tools you have today and the tools that have not been invented yet. That’s the vision. That’s where the industry is going. But we are all in different states of getting there. And the reality is more complicated than these beautiful diagrams.
12. In fact, today many companies don’t have a data mesh; they have a data mess. Each team has their own tools, data storage depots, and infrastructure. It’s a big bucket of Legos. Systems that don’t talk to each other. Confusion about three different definitions of revenue. Where is the customer support data table? Oh, that’s the old version. And that column that reads date_final_final is actually the wrong format. We moved it to a new column called dff..f. And to access that table you need to speak COBOL. But we lost the COBOL/NodeJS connector.
13. Data messes have four consistent problems. Data breadlines: I have a question about the business. Let me go and ask the engineer I met at lunch if she’ll do me the favor of pulling the data, again. Data breadlines are the invisible lines of people waiting around for answers to their data questions, who ask a question and go to the back of the line when they need a refinement. Data obscurity, or rogue databases: when I was at Google, I operated a rogue database. I asked an engineer to run a MapReduce job to help me understand the competition and dump that table on a server underneath my desk. Then I built reports on that table and we used it to prioritize customer acquisition techniques. No one knew it was there. No one validated the data. Data fragmentation: the challenge of finding out where data is. You see the dashboard in front of you. You know the data is stored somewhere in the company, but where? Who owns it? Data brawls: the fights between teams about the definition of payback period.
14. The vision, as it has always been with data systems, is to put it all together and develop a breathtaking machine that enables a company to grow significantly faster. I can tell you from working with some of the leading companies in the data world, when you do achieve this vision, it’s a transformation. It enables teams to move faster, execute better, and outperform the competition. I saw it at Google. I saw it at Looker and many of Looker’s customers. And we’re seeing it at Dremio too. Companies that can migrate to data meshes suddenly unlock hidden productivity. It’s a big leap and a big challenge.
15. But getting there and building that machine is not easy. So the question is: when you put up the bat signal, who will come to save the day? There’s a simple answer. It’s the data engineer. This is why this role has evolved. Because the complexity has gotten to a point that we need specialized people to manage this infrastructure and empower everyone within a company to use data effectively. We believe that data engineering is the customer success of this decade. A new role, critically important to a company, that will champion a discipline of the future. Although I can’t see you, I’m confident many of you in the audience are exactly that superhero, maybe minus the Batmobile.
16. What is a data engineer? They are the people who move, shape, and transform data from where it is generated to where it is needed, and do it consistently, efficiently, scalably, accurately, and compliantly. Data engineers have many different skills. Some of them are infrastructure specialists. Others have focused on reporting and the tools associated with analytics. Still others develop, host, and maintain machine learning infrastructure. It is a broad discipline of very smart people who are going to be key to business success in the next 10 years.
17. In other words, these people are software engineers who work deep in data.
18. In researching this market, we had an insight. Software engineers have decades of experience writing software, building tooling, and developing patterns for writing code.
19. The most recent example of this is the Cloud Native Computing Foundation’s software development lifecycle. This is an ouroboros, a snake eating its tail, an infinite cycle. It is a consistent process for how to manage software releases in the most modern way. Vendors within that ecosystem use this diagram within their pitches to customers and describe exactly which part of the process they address. Managers use this process to talk about tooling at different steps of the engineering process. It has 8 steps. Plan the software you want to build. Code it. Build it and package it to ship. Test it with a testing harness. Release the software by pushing it into the production environment. Deploy the software across your cloud. Operate it. Monitor it. And repeat.
  20. If data engineering really is software engineers deep in data, what is the data engineering equivalent of the software development lifecycle? I haven’t been able to find one. But, in talking to hundreds of potential buyers of this kind of software, we have a hypothesis of what it should look like.
21. This is what we observe in the market for the data engineering lifecycle. It has six steps. Ingesting: pulling data from whatever data producer is spewing it into a storage system like Amazon S3. Planning: the phase of deciding what it is that you want to do with this data. Query: modern compute engines run over the data to filter and aggregate it in a way that’s useful to a particular product. Data modeling: the work of defining a metric once in a central place so that everyone within the company can benefit from it. Developing product: the work of actually building a product around the data and the insights contained within it. Monitoring: the act and process of ensuring data is flowing normally and is accurate at all times. This cycle creates more data, which is then ingested, saved, and pumped back into the rest of the cycle.
  22. In each of the steps of the data engineering lifecycle, new tools are emerging to support the work of the data engineer. These are the five major trends within the data world.
23. First, data pipelines. These are the watermains of data, moving data from where it’s produced to where it can be leveraged. Data pipelines have been around forever. The main advances in modern data pipelines are: using modern computing languages; creating higher levels of abstraction to enable engineers to reuse code across different data pipelines and improve productivity; monitoring within the pipelines; and visualization of the DAG, the directed acyclic graph of all the steps involved.
24. Here are screenshots of Prefect’s product, which ingests code and then creates a DAG visualization. You can see the different steps in the data process. And on the right, there is a monitoring dashboard that shows the state of the data pipeline, the errors, and the activity. The idea is to treat data pipelines as real code with true monitoring to ensure data is always accurate.
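The core idea behind these tools can be sketched in a few lines of plain Python. This is a toy model, not Prefect’s or Airflow’s actual API: tasks form a directed acyclic graph, and the runner executes each task only after its upstream dependencies have finished, passing results downstream.

```python
# A toy DAG runner (illustrative only): tasks are callables, dependencies
# form a DAG, and tasks run in topological order with upstream outputs
# passed as arguments.
from graphlib import TopologicalSorter

def run_pipeline(tasks, deps):
    """tasks: name -> callable; deps: name -> set of upstream task names."""
    results = {}
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        # Each task receives the outputs of its dependencies, in order.
        results[name] = tasks[name](*[results[d] for d in deps.get(name, ())])
    return results, order

if __name__ == "__main__":
    tasks = {
        "extract": lambda: [1, 2, 3],                 # pull raw rows
        "transform": lambda rows: [r * 10 for r in rows],
        "load": lambda rows: sum(rows),               # land an aggregate
    }
    deps = {"transform": {"extract"}, "load": {"transform"}}
    results, order = run_pipeline(tasks, deps)
    print(order, results["load"])  # ['extract', 'transform', 'load'] 60
```

Real pipeline frameworks add what this toy omits: retries, scheduling, observability, and the DAG visualization shown in the screenshots.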
25. Compute engines query the data within the cloud without moving it. This enables teams to get access to all the information they need from a single place in a cost-effective, compliant, and fast way. These compute engines are the execution layer that sits on top of all the open-format files. Compute engines accelerate queries. They make them faster not just for a single user but for everybody. They reduce cost because you’re not having to move data around. They eliminate data lock-in because any tool can talk to them, provided it uses an open format like Arrow. We’ve been lucky enough to work with Dremio from the beginning; it’s a company that saw this trend years ago and developed the infrastructure to enable you to achieve this vision.
26. The next step in the process is data modeling. The idea is to define revenue once so that the sales team and the marketing team both have the same definition and don’t get into data disputes with each other. Make sure that the entire company is aligned on a single number. I’m sure we’ve all lived through a meeting where we are arguing about a topic, and we’ve each got a different number for revenue or lead count or payback period. Modeling is all about creating an owner of a metric, explaining what that metric is, and describing the lineage, so that everybody is on the same page and using the right number in the right column to make the best decision.
27. The other important part of data modeling is to ensure you understand what your data is telling you. Variations in data definitions can have a meaningful impact on how you interpret a number. So companies like Transform develop systems where dimensions and metrics of data are defined once, in a central place. This code is checked into GitHub. Then, whenever you need data, you query the data modeling interface, which ensures the revenue metric you are asking for is the revenue metric that everybody else is using and the one that has been approved by finance.
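The “define it once” pattern can be sketched in a few lines. This is an illustrative toy, not Transform’s actual API: a central registry holds each metric’s name, owner, description, and computation, and every consumer resolves the metric through the registry instead of re-deriving it by hand.

```python
# A toy metric registry (illustrative, not any vendor's API): the approved
# definition of "revenue" lives in one place, with an owner and lineage notes.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Metric:
    name: str
    owner: str          # who approves changes (e.g. finance)
    description: str
    compute: Callable[[list[dict]], float]

REGISTRY: dict[str, Metric] = {}

def register(metric: Metric) -> None:
    REGISTRY[metric.name] = metric

def query(name: str, rows: list[dict]) -> float:
    # Every team resolves "revenue" to the same approved definition.
    return REGISTRY[name].compute(rows)

register(Metric(
    name="revenue",
    owner="finance",
    description="Sum of booked order amounts, net of refunds.",
    compute=lambda rows: sum(r["amount"] - r.get("refund", 0.0) for r in rows),
))

if __name__ == "__main__":
    orders = [{"amount": 100.0}, {"amount": 50.0, "refund": 10.0}]
    print(query("revenue", orders))  # 140.0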
28. Data products are the insights, analytics, and software built using data within a company. There are two big buckets I’ll talk about today. First, there are next-generation data visualization companies like Preset that enable teams to visualize trends within the data, share that insight with others, and publish it on an ongoing basis to key stakeholders. Preset is a company commercializing an open-source project called Superset, which was created at Airbnb. In fact, the founder of the project and the company, Max, spoke to you earlier today. Preset adopts many of the open principles that are consistent with the rest of this ecosystem and applies them to data visualization and data exploration. In addition, there is a parallel world of machine learning tooling. This world is huge, with many key players in it. Streamlit enables machine learning engineers to share their models with non-technical users, either for direct consumption of those models, like a recommendation system within a customer support tool for recommending email responses, or for help training a model, in an autonomous vehicle use case for example.
29. To give you a sense, this is a screenshot of Preset’s mapping capability in San Francisco. This is entirely open-source software.
30. This is an example of a Streamlit data product. On the left is the code, written in Python. On the right, you see the web UI that is created. In this case, it is an example that allows an end user to tweak and tune a data scientist’s machine learning model. And that user doesn’t have to be technical. It could be someone who operates autonomous vehicles, helping data scientists tune the object-avoidance algorithm.
31. Last, data quality. Data quality was a wave in the late 90s. But it disappeared for about 20 years, or at least hasn’t been adopted within modern data stacks until now. Software engineering has many different systems to ensure new code operates well. There is a battery of performance tests, functional tests, unit tests, regression tests, concepts of test coverage, monitoring tools, and anomaly detection tools. But we don’t have that for data today. And it manifests itself in the worst way. Has your CEO ever looked at a report you showed him and said the numbers look way off? Has a customer ever called out incorrect data in your product’s dashboards? Data quality is meant to solve that issue and restore consistent credibility among the people who use data.
32. There are two different approaches to data quality. The first is to write explicit tests. This is an expectation from Great Expectations. It says the room temperature column should remain between 60 and 75 degrees for 95% of instances. This type of data integrity testing is like functional testing in software. If engineers know what to expect, this is an effective tool. It does require writing a huge battery of tests and tracking a test coverage metric, similar to software.
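The semantics of that expectation are simple enough to write by hand. Great Expectations expresses it as `expect_column_values_to_be_between(..., min_value=60, max_value=75, mostly=0.95)`; the plain-Python sketch below re-implements the same check so the logic is explicit.

```python
# Hand-rolled version of the expectation described above: pass if at least
# `mostly` of the values fall inside [low, high].
def mostly_between(values, low, high, mostly=0.95):
    """Return True when the in-range fraction meets the `mostly` threshold."""
    in_range = sum(low <= v <= high for v in values)
    return in_range / len(values) >= mostly

if __name__ == "__main__":
    temps = [62, 64, 70, 73, 71, 68, 66, 74, 61, 90]  # one outlier reading
    # 9 of 10 readings are in range (90%), below the 95% bar, so this fails.
    print(mostly_between(temps, 60, 75))
```

The `mostly` threshold is what separates this from a hard assertion: it tolerates a known rate of noise instead of failing the whole pipeline on a single bad row.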
33. There’s another approach using machine learning. Companies like Soda Data and Monte Carlo use ML to understand data patterns and then discover anomalies. These anomalies might be differences in data volumes: a data feed is broken. Or there’s a change in the distribution of the data: instead of a Gaussian distribution, now it’s a Zipf distribution, which has implications for analysis downstream. The machine learning approach comes from anomaly detection in security systems. And the benefit is that the system is autonomous. The challenge is ensuring the signal-to-noise ratio is strong and meaningful; otherwise, users won’t pay attention to the results.
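A minimal sketch of the volume-anomaly case: learn the typical daily row count from history, then flag days whose count deviates by more than a chosen number of standard deviations. Real tools like Soda and Monte Carlo learn far richer patterns (schema, freshness, distributions); this only illustrates the principle.

```python
# Toy volume-based anomaly detector (illustrative only): flag any day whose
# row count is more than `threshold` standard deviations from the mean.
from statistics import mean, stdev

def volume_anomalies(daily_counts, threshold=3.0):
    mu, sigma = mean(daily_counts), stdev(daily_counts)
    if sigma == 0:
        return []  # perfectly stable feed: nothing to flag
    return [i for i, c in enumerate(daily_counts)
            if abs(c - mu) / sigma > threshold]

if __name__ == "__main__":
    counts = [1000, 1020, 990, 1010, 1005, 0, 995]  # day 5: a broken feed
    print(volume_anomalies(counts, threshold=2.0))  # day index 5 is flagged
```

The signal-to-noise concern from the talk shows up directly in the `threshold` parameter: set it too low and every normal fluctuation pages someone; too high and the broken feed slips through.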
34. So, in summary, these are the five data trends you should know. These are the trends we have observed after meeting thousands of companies and talking to hundreds of prospective buyers. These are the technologies that we expect will define the data world over the next 10 years. But these five trends alone are not enough.
35. It’s really early in this decade of data engineering. We are six months into a ten-year-long movement. The future depends on you. We need engineers to weave all these different technologies together into a beautiful data tapestry. These are not easy problems, and the landscape underneath you is changing all the time. There are new software tools, legacy applications, and lots of demands from everybody around you to get them exactly what they need, when they need it, which is yesterday. But at Redpoint, we believe this decade is the decade of the data engineer: an entirely new role that specializes in the critically important functions of getting data from the places it is generated to the places it creates insights and unlocks powerful decision-making ability within businesses. The future depends on you.