Taming the ever-evolving Compliance Beast:
Lessons learnt at LinkedIn
Sept 28, 2017
Shirshanka Das, Principal Staff Engineer, LinkedIn
Tushar Shanbhag, Head of Data Products, LinkedIn
@shirshanka, @tusharis
Data Protection in a Digital World
PLAYING CATCH-UP WITH INNOVATION
GDPR
metric scripts
production code
Business facing
decision making
OUR VISION
Create economic opportunity for every
member of the global workforce
LinkedIn’s Vision
29K schools
10M companies
11B endorsements
500M members
10M jobs
The LinkedIn Privacy Paradox
“On one hand, the company has 500+ million members trusting the company to protect highly sensitive data.
On the other hand, one only joins the largest professional network on the Internet because they want to be found!”
Kalinda Raina,
Head of Global Privacy, LinkedIn
MEMBER PRIVACY <> MEMBER DISCOVERY
Members First is a Core Value for LinkedIn
MEMBER PRIVACY WHILE DELIVERING MEMBER VALUE
Well-connected.
Get relevance right.
Few connections.
Give them inventory.
Example
Member value is proportional to knowledge
Member privacy is paramount for LinkedIn
We strive to maintain this fine balance
Data Is the Lifeblood of LinkedIn
MEMBER EXPERIENCES + BUSINESS DECISIONS
Member Data
System of Intelligence
Member Experiences
Business Decisions
We needed data democracy to
deliver member value
LinkedIn Data Science
I want to analyze as much data as
possible so my models are accurate
Data Democracy
ALL THE DATA, ALL THE TIME
I want to discover data that’s needed for my
analysis as fast as possible
I want to access that data as quickly as
possible for my analysis

I want my personal data to be stored only
where needed and not propagated
unnecessarily
Data Protection
Need to Ensure Member Privacy
LinkedIn Members
STORE, PROCESS, DELETE,..
I want my personal data to be deleted when
I close my account or request deletion
I want my personal data to only be
processed if essential and only if I consent
DATA DEMOCRACY <> DATA PROTECTION
More Data
Discover Data
Easy Access
Less Data
Discover Violations
Restricted Access
The Data Paradox
LinkedIn’s Data Ecosystem
Data Hubs at LinkedIn
In Motion
Scale: O(10) clusters, ~2.3 trillion messages, ~450 TB
At Rest
Scale: O(10) clusters, ~10K machines, ~100 PB
Data Integration
Connectors shown: SFTP, JDBC, REST, Azure Blob, Data Lake Storage
Apache Gobblin: Simplifying Data Integration
@LinkedIn
Hundreds of TB per day
Thousands of datasets
~30 different source systems
80%+ of data ingest
Open source @ https://gobblin.apache.org/
Stream + Batch
Adopted by LinkedIn, Intel, PayPal, Apple, IBM,
Swisscom, Prezi, AppLift, NerdWallet and many more…
REQUIREMENTS
Less Data
Legal: Right to Erasure or Right to be Forgotten
“Delete all my personal data without undue delay when it is no
longer necessary / when consent has been withdrawn”
Engineering:
Need the ability to delete some specific subset or all data associated
with a specific LinkedIn member from all our data systems
A lot of data, different formats
Challenges
Understand HDFS data: organization, formats, …
Cycle asynchronously, within an SLA, deleting
records, without affecting running jobs
Quarantine exceptional records for manual triage
Can scale to processing hundreds of PB of data
Data Deletion
IMPLICATIONS FOR HADOOP
Gobblin: The Logical Pipeline
Source -> Work Units
Task (one per Work Unit): Extract -> Convert -> Quality -> Write
Data Publish
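The logical pipeline on this slide can be sketched roughly as follows. This is a minimal Python illustration of the flow only, not Gobblin's actual Java API: `WorkUnit`, `run_task`, and `run_job` are hypothetical names, and "write" here just appends to an in-memory staging list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkUnit:
    """Hypothetical unit of work: one slice of records from the source."""
    records: List[dict]


def run_task(unit: WorkUnit,
             convert: Callable[[dict], dict],
             is_valid: Callable[[dict], bool]) -> List[dict]:
    """Extract -> Convert -> Quality -> Write for a single work unit."""
    staged = []
    for record in unit.records:        # Extract
        converted = convert(record)    # Convert
        if is_valid(converted):        # Quality check
            staged.append(converted)   # Write (to a staging buffer)
    return staged


def run_job(work_units: List[WorkUnit],
            convert: Callable[[dict], dict],
            is_valid: Callable[[dict], bool]) -> List[dict]:
    """Run one task per work unit, then publish all staged output together."""
    staged_outputs = [run_task(u, convert, is_valid) for u in work_units]
    return [r for out in staged_outputs for r in out]  # Data Publish


units = [
    WorkUnit([{"id": 1, "name": " a "}]),
    WorkUnit([{"id": None, "name": "b"}]),
]
result = run_job(
    units,
    convert=lambda r: {**r, "name": r["name"].strip()},
    is_valid=lambda r: r["id"] is not None,
)
```

Publishing only after every task has staged its output is one way to keep readers from seeing partial results; whether Gobblin is configured that way in a given deployment is not stated in the slides.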
Gobblin: Extending for Purge
HDFS -> Work Unit -> Task (Extract -> Convert -> Quality -> Write) -> Data Publish -> HDFS
Per record, checked against Member’s Delete Requests:
If needs purge
then drop
else continue
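The "if needs purge then drop, else continue" step could look something like the sketch below. It assumes records are dictionaries keyed by a `memberId` field and that delete requests arrive as a set of member IDs; both are illustrative assumptions, and `purge_records` is a hypothetical helper rather than a Gobblin class. Records that cannot be classified are set aside, in the spirit of the "quarantine exceptional records for manual triage" challenge listed earlier.

```python
from typing import Iterable, List, Set, Tuple


def purge_records(records: Iterable[dict],
                  delete_requests: Set[int],
                  id_field: str = "memberId") -> Tuple[List[dict], List[dict]]:
    """Split records into (kept, quarantined); purged records are dropped."""
    kept, quarantined = [], []
    for record in records:
        member_id = record.get(id_field)
        if member_id is None:
            # No owner can be determined: hold for manual triage.
            quarantined.append(record)
        elif member_id in delete_requests:
            # Needs purge: drop the record.
            continue
        else:
            # Not affected: pass through unchanged.
            kept.append(record)
    return kept, quarantined


records = [
    {"memberId": 1, "event": "view"},
    {"memberId": 2, "event": "click"},
    {"event": "orphan"},
]
kept, quarantined = purge_records(records, delete_requests={2})
```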
STATUS AND CHALLENGES
Gobblin: Data Lifecycle Management at Scale
Status
Number of datasets: many thousands
Amount of data scanned for purge: XXX TB/day
Challenges
Immutable Storage Formats + Right to Erasure = Unhappy Disks
“Widespread implementation will surely lead to innovation in these formats!”
DATA DEMOCRACY <> DATA PROTECTION
More Data
Discover Data
Easy Access
Less Data
Discover Violations
Restricted Access
The Data Paradox
DATA LIFECYCLE MANAGEMENT
LinkedIn’s Data Ecosystem
Metadata based Search Experience
for Data Scientists
Data Discovery
Where is dataset X?
How did it get created?
Usage : In production since 2014
Users : Data Scientists, Product Engineers
Use Cases: Discovery, Impact Analysis
WhereHows
FIND DATA, NAVIGATE RELATIONSHIPS
Open source @ github.com/linkedin/wherehows
SEARCH SCREENSHOTS
WhereHows
LINEAGE SCREENSHOTS
WhereHows
More than just Discovery
Use Cases
Which datasets at LinkedIn contain PII or highly
confidential data?
How many contain member-member messages?
How many of them are accessible by team X?
Have all datasets been purged within SLA?
Discovering Violations
ANSWERING HARDER QUESTIONS
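Questions like these become simple filters once each dataset carries the right metadata. The sketch below is only an illustration of that idea: `DatasetMeta`, its fields, and the tag names are hypothetical and do not reflect the WhereHows data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class DatasetMeta:
    """Hypothetical per-dataset metadata record."""
    name: str
    tags: Set[str] = field(default_factory=set)      # e.g. "pii"
    readers: Set[str] = field(default_factory=set)   # teams with access
    last_purged: datetime = datetime.min


def with_tag(catalog: List[DatasetMeta], tag: str) -> List[str]:
    """Which datasets carry a given tag (PII, member messages, ...)?"""
    return [d.name for d in catalog if tag in d.tags]


def accessible_by(catalog: List[DatasetMeta], team: str) -> List[str]:
    """Which datasets can a given team read?"""
    return [d.name for d in catalog if team in d.readers]


def purge_sla_violations(catalog: List[DatasetMeta],
                         now: datetime,
                         sla: timedelta) -> List[str]:
    """Which datasets have not been purged within the SLA window?"""
    return [d.name for d in catalog if now - d.last_purged > sla]


now = datetime(2017, 9, 28)
catalog = [
    DatasetMeta("member_profile", {"pii"}, {"team_x"}, datetime(2017, 9, 27)),
    DatasetMeta("messages", {"pii", "member_messages"}, {"team_y"},
                datetime(2017, 8, 1)),
    DatasetMeta("page_views", set(), {"team_x"}, datetime(2017, 9, 1)),
]
pii_datasets = with_tag(catalog, "pii")
late = purge_sla_violations(catalog, now, timedelta(days=30))
```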
Wide + Deep
Metadata
Comprehensive coverage of data systems at LinkedIn
We have > 20 systems!
SQL, NoSQL, Indexes, Blob Stores, …
Deeper understanding of each dataset
Schema is not enough
Need to understand semantics
Discovering Violations
REQUIREMENTS
A METADATA REFINERY APPROACH
WhereHows Architecture @ 10,000 ft
ML driven
refinements
DATA DEMOCRACY <> DATA PROTECTION
More Data
Discover Data
Easy Access
Less Data
Discover Violations
Restricted Access
The Data Paradox
DATA LIFECYCLE MANAGEMENT
METADATA
FREEDOM OF EXPRESSION
Many Transformation Engines @ LinkedIn
In Motion
At Rest
HARD TO CHANGE ANYTHING UNDERNEATH!
Challenge for Infrastructure Providers
(Pig scripts)
My Raw Data
Native readers, dependencies on path, format hard-coded
Hard to move to better formats without breaking everyone or copying data twice
My Raw Data
HARD TO CHANGE ANYTHING UPSTREAM!
Semantic Challenges
Data is unclean (bad data on certain dates)
Data models are in constant flux (split event into multiple)
Have to change data processing logic everywhere!
My Raw Data
AN API TO MANAGE EVOLUTION
We need “microservices” for Data
My Data API
My Raw Data
A DATA ACCESS LAYER FOR LINKEDIN
We built Dali to solve this
Logical Tables + Views
Logical FileSystem
Abstract away underlying physical details to
allow users to focus solely on the logical
concerns
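To illustrate what "abstracting away physical details" buys, here is a small sketch in which consumers read by logical name only. It is not Dali's API: `LogicalCatalog`, `register`, and `read` are hypothetical, and in-memory strings stand in for files on a real filesystem.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Tuple


def _read_csv(payload: str) -> List[dict]:
    return [dict(row) for row in csv.DictReader(io.StringIO(payload))]


def _read_json_lines(payload: str) -> List[dict]:
    return [json.loads(line) for line in payload.splitlines() if line]


# Physical format -> reader. Consumers never pick one of these directly.
READERS: Dict[str, Callable[[str], List[dict]]] = {
    "csv": _read_csv,
    "jsonl": _read_json_lines,
}


class LogicalCatalog:
    """Maps a logical dataset name to its current physical representation."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Tuple[str, str]] = {}

    def register(self, name: str, fmt: str, payload: str) -> None:
        self._datasets[name] = (fmt, payload)

    def read(self, name: str) -> List[dict]:
        fmt, payload = self._datasets[name]
        return READERS[fmt](payload)


catalog = LogicalCatalog()
catalog.register("page_views", "csv", "memberId,country\n1,US\n")
before = catalog.read("page_views")

# The infrastructure team migrates the format; consumer code is unchanged.
catalog.register("page_views", "jsonl", '{"memberId": "1", "country": "US"}\n')
after = catalog.read("page_views")
```

Because callers depend only on the name `page_views`, the format change is invisible to them, which is the property the slides describe as missing when readers, paths, and formats are hard-coded.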
Dali: Implementation Details in Context
Dali FileSystem
Processing Engine
(MR, Spark)
Dali Datasets (Tables+Views)
Dataflow APIs
(MR, Spark,
Scalding)
Query Layers
(Pig, Hive,
Spark)
Dali CLI
Data Catalog
Git + Artifactory
View Def +
UDFs
Dataset
Owner
Data Source
Data Sink
Simple to Complex
Different Types
Basic Restrictions
Access to dataset based on business need
Privacy by Default
Analysts shouldn’t get access to raw PII by default
Consent-based Access
Access to certain data elements only available if member has consented for that particular use-case
Access Restrictions
REQUIREMENTS
STEP 1: DATA + METADATA
Solving for Compliant Access
Schema = {
int memberId
String firstName
String lastName
Position[] positions
educationHistory[] educationHistory
…
}
MemberProfile
MEMBER_ID
NAME
PROFILE DATA
NAME : is_pii
MEMBER_ID : is_pii
Raw
Dataset
Meta
Data
STEP 2: A MEMBER’S PREFERENCES
Privacy Preferences
A BITMAP DATASET: ONE PER MEMBER
Privacy Preferences
Member Privacy
Preferences
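One way to picture a per-member preference bitmap is with one bit per use case. The sketch below makes that assumption; the `UseCase` names and the `has_consented` helper are hypothetical and are not taken from LinkedIn's schema. Members with no stored bitmap are treated as not having consented.

```python
from enum import IntFlag
from typing import Dict


class UseCase(IntFlag):
    """Hypothetical use cases, one bit each."""
    ADS_TARGETING = 1
    RESEARCH = 2
    RECOMMENDATIONS = 4


# One bitmap per member: the set bits are the use cases consented to.
preferences: Dict[int, UseCase] = {
    101: UseCase.ADS_TARGETING | UseCase.RESEARCH,
    102: UseCase(0),
}


def has_consented(prefs: Dict[int, UseCase],
                  member_id: int,
                  use_case: UseCase) -> bool:
    """True only if every bit of `use_case` is set for this member."""
    bitmap = prefs.get(member_id, UseCase(0))  # unknown member: no consent
    return (bitmap & use_case) == use_case


ok = has_consented(preferences, 101, UseCase.RESEARCH)
```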
Solving for Compliant Access With Dali
Raw
Dataset
Meta
Data
Member Privacy
Preferences
Dali Reader responsibility:
Given:
(Dataset, Metadata, UseCase)
Generate:
Dataset and Column-level
transformations
(obfuscate, null, …)
Auto-join with Member
Privacy Preferences
(filter out data elements that
are not consented to)
Processing
Logic
Dali
Reader
Library
Use
Case = X
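The reader's responsibility described on this slide can be sketched as a single function: given rows, column metadata, and a use case, it joins against consent data and applies column-level transformations. Everything here is an assumption made for illustration: `read_compliant` and `obfuscate` are hypothetical names, consent is modeled as a set of use-case strings per member, filtering is done per row rather than per data element, and a truncated unsalted hash is only a stand-in for a real obfuscation scheme.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def obfuscate(value: object) -> str:
    """Stand-in obfuscation: stable token, not the raw value."""
    return hashlib.sha256(str(value).encode()).hexdigest()[:12]


def read_compliant(rows: Iterable[dict],
                   metadata: Dict[str, Set[str]],
                   consents: Dict[int, Set[str]],
                   use_case: str,
                   id_field: str = "memberId") -> List[dict]:
    """Return only rows and column values permitted for `use_case`."""
    out = []
    for row in rows:
        # Auto-join with privacy preferences: skip non-consenting members.
        if use_case not in consents.get(row[id_field], set()):
            continue
        clean = {}
        for col, value in row.items():
            if "is_pii" not in metadata.get(col, set()):
                clean[col] = value             # non-PII passes through
            elif col == id_field:
                clean[col] = obfuscate(value)  # keep joinable, hide raw ID
            else:
                clean[col] = None              # null out other PII
        out.append(clean)
    return out


metadata = {"memberId": {"is_pii"}, "firstName": {"is_pii"}}
consents = {1: {"X"}, 2: set()}
rows = [
    {"memberId": 1, "firstName": "Ada", "industry": "Software"},
    {"memberId": 2, "firstName": "Bob", "industry": "Finance"},
]
safe_rows = read_compliant(rows, metadata, consents, use_case="X")
```

Placing this logic in a shared reader library means processing code never has to re-implement the policy, which matches the slide's framing of the reader as the single point of enforcement.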
Solving for Compliant Purging With Dali + Gobblin
Raw
Dataset
Meta
Data
Member Privacy
Preferences
Gobblin
Purger
Dali
Reader
Library
Use
Case =
Purge
Member’s Delete
Requests
Purged
Dataset
DATA DEMOCRACY <> DATA PROTECTION
More Data
Discover Data
Easy Access
Less Data
Discover Violations
Restricted Access
The Data Paradox
DATA LIFECYCLE MANAGEMENT
METADATA
DATA ACCESS LAYER
DATA DEMOCRACY <> DATA PROTECTION
More Data
Discover Data
Easy Access
Less Data
Discover Violations
Restricted Access
The Data Paradox : Solved !
METADATA
DATA ACCESS LAYER
DATA LIFECYCLE MANAGEMENT
DATA DEMOCRACY + DATA PROTECTION
The Technology Blueprint
WhereHows*
Dali
Apache Gobblin*
* Open Source : We can collaborate on these together!
DATA LIFECYCLE MANAGEMENT
DATA ACCESS LAYER
METADATA
Core company value, implemented
by Technology & Process
Privacy By Design
Privacy : Technology + Process
SUSTAINABILITY IS CRITICAL
Product : Security & Privacy Review
Data : Data Model Review
Legal : Regulation change -> Tech requirements
Company-wide : “Horizontal” Initiatives
Getting Stricter and more complex
Data Protection
Key Takeaways
THE BEAST IS REAL
Stricter regulations in a digital world
Increasingly more complex to implement
This is an accelerating global trend
We’ve established a blueprint to
sustainably address privacy
Learnings at LinkedIn
Key Takeaways
THE BEAST CAN BE TAMED !
Privacy By Design : baked into technology
stack & product development process
Standardization : To solve at scale, certain
parts need to be centralized and standardized
Company-wide : Needs co-ordinated effort
across various functions
DATA DEMOCRACY <> DATA PROTECTION
More Data
Discover Data
Easy Access
Less Data
Discover Violations
Restricted Access
The Data Paradox : Solved !
METADATA
DATA ACCESS LAYER
DATA LIFECYCLE MANAGEMENT
Thank You!
Taming the ever-evolving Compliance Beast: Lessons learnt at LinkedIn [Strata NYC 2017]

Balancing Data Democracy with Data Privacy: The LinkedIn Story
Balancing Data Democracy with Data Privacy: The LinkedIn StoryBalancing Data Democracy with Data Privacy: The LinkedIn Story
Balancing Data Democracy with Data Privacy: The LinkedIn StoryAnthony Hsu
 
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)Jun Rao
 
Amundsen: From discovering to security data
Amundsen: From discovering to security dataAmundsen: From discovering to security data
Amundsen: From discovering to security datamarkgrover
 
Linked Data Planet Key Note
Linked Data Planet Key NoteLinked Data Planet Key Note
Linked Data Planet Key Noterumito
 
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bhaskar Ghosh
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8dallemang
 
Qiagram
QiagramQiagram
Qiagramjwppz
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Mark Tabladillo
 
Data, Metadata, GenAI (Seminar by IEEE, New Zealand North Section)
Data, Metadata, GenAI (Seminar by IEEE, New Zealand North Section)Data, Metadata, GenAI (Seminar by IEEE, New Zealand North Section)
Data, Metadata, GenAI (Seminar by IEEE, New Zealand North Section)mars197365
 
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsEnterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsDenodo
 
ECS19 - Mike Ammerlaan - Microsoft Graph Data Connect
ECS19 - Mike Ammerlaan - Microsoft Graph Data ConnectECS19 - Mike Ammerlaan - Microsoft Graph Data Connect
ECS19 - Mike Ammerlaan - Microsoft Graph Data ConnectEuropean Collaboration Summit
 
Getting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesGetting Started with Data Virtualization – What problems DV solves
Getting Started with Data Virtualization – What problems DV solvesDenodo
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?Denodo
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And HadoopEdureka!
 
Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane
 
BigDataRevealed SecureSequesterEncrypt - iot easy as 1-2-3 - catalog-metadata...
BigDataRevealed SecureSequesterEncrypt - iot easy as 1-2-3 - catalog-metadata...BigDataRevealed SecureSequesterEncrypt - iot easy as 1-2-3 - catalog-metadata...
BigDataRevealed SecureSequesterEncrypt - iot easy as 1-2-3 - catalog-metadata...Steven Meister
 

Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strata NYC 2017]

  • 1. Taming the ever-evolving Compliance Beast: Lessons learnt at LinkedIn. Sept 28, 2017. Shirshanka Das, Principal Staff Engineer, LinkedIn. Tushar Shanbhag, Head of Data Products, LinkedIn. @shirshanka, @tusharis
  • 2. Data Protection in a Digital World PLAYING CATCH-UP WITH INNOVATION GDPR
  • 3. LinkedIn’s Vision (OUR VISION): Create economic opportunity for every member of the global workforce. 500M members, 10M jobs, 10M companies, 29K schools, 11B endorsements.
  • 4. The LinkedIn Privacy Paradox (MEMBER PRIVACY <> MEMBER DISCOVERY): “On one hand, the company has 500+ million members trusting the company to protect highly sensitive data. On the other hand, one only joins the largest professional network on the Internet because they want to be found!” Kalinda Raina, Head of Global Privacy, LinkedIn
  • 5. Members First is a Core Value for LinkedIn (MEMBER PRIVACY WHILE DELIVERING MEMBER VALUE). Member value is proportional to knowledge. Example: Well-connected: get relevance right. Few connections: give them inventory. Member privacy is paramount for LinkedIn. We strive to maintain this fine balance.
  • 6. Data Is the Lifeblood of LinkedIn (MEMBER EXPERIENCES + BUSINESS DECISIONS): Member Data, System of Intelligence, Member Experiences, Business Decisions
  • 7. We needed data democracy to deliver member value LinkedIn Data Science I want to analyze as much data as possible so my models are accurate Data Democracy ALL THE DATA, ALL THE TIME I want to discover data that’s needed for my analysis as fast as possible I want to access that data as quickly as possible for my analysis

  • 8. I want my personal data to be stored only where needed and not propagated unnecessarily Data Protection Need to Ensure Member Privacy LinkedIn Members STORE, PROCESS, DELETE,.. I want my personal data to be deleted when I close my account or request deletion I want my personal data to only be processed if essential and only if I consent
  • 9. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox
  • 16. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox
  • 17. Data Hubs at LinkedIn. In Motion: scale O(10) clusters, ~2.3 trillion messages, ~450 TB. At Rest: scale O(10) clusters, ~10K machines, ~100 PB.
  • 18. In Motion At Rest Data Integration SFTP JDBC REST Azure Blob, Data Lake Storage
  • 19. Apache Gobblin: Simplifying Data Integration @LinkedIn. SFTP, JDBC, REST, Azure Blob, Data Lake Storage. Hundreds of TB per day. Thousands of datasets. ~30 different source systems. 80%+ of data ingest. Stream + Batch. Open source @ https://gobblin.apache.org/ Adopted by LinkedIn, Intel, PayPal, Apple, IBM, Swisscom, Prezi, AppLift, NerdWallet and many more…
  • 20. REQUIREMENTS Less Data Legal: Right to Erasure or Right to be Forgotten “Delete all my personal data without undue delay when it is no longer necessary / when consent has been withdrawn” Engineering: Need the ability to delete some specific subset or all data associated with a specific LinkedIn member from all our data systems
  • 21. Data Deletion (IMPLICATIONS FOR HADOOP). Challenges: A lot of data, different formats. Understand HDFS data: organization, formats, … Cycle asynchronously, within an SLA, deleting records, without affecting running jobs. Quarantine exceptional records for manual triage. Can scale to processing hundreds of PB of data.
  • 22. Gobblin: The Logical Pipeline. Source -> Work Units -> Tasks (each Task: Extract -> Convert -> Quality -> Write) -> Data Publish
  • 23. Gobblin: Extending for Purge. HDFS -> Work Unit -> Tasks (Extract -> Convert -> Quality -> Write) -> Data Publish -> HDFS. Checked against Member’s Delete Requests: if needs purge then drop, else continue.
  • 24. Gobblin: Data Lifecycle Management at Scale (STATUS AND CHALLENGES). Status: Number of datasets: many thousands. Amount of data scanned for purge: XXX TB/day. Challenges: Immutable Storage Formats + Right to Erasure = Unhappy Disks. “Widespread implementation will surely lead to innovation in these formats!”
  • 25. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox DATA LIFECYCLE MANAGEMENT
  • 26. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox DATA LIFECYCLE MANAGEMENT
  • 28. WhereHows: Data Discovery (FIND DATA, NAVIGATE RELATIONSHIPS). Metadata-based search experience for data scientists. Where is dataset X? How did it get created? Usage: In production since 2014. Users: Data Scientists, Product Engineers. Use Cases: Discovery, Impact Analysis. Open source @ github.com/linkedin/wherehows
  • 31. Discovering Violations (ANSWERING HARDER QUESTIONS). More than just Discovery. Use Cases: Which datasets at LinkedIn contain PII or highly confidential data? How many contain member-member messages? How many of them are accessible by team X? Have all datasets been purged within SLA?
  • 32. Discovering Violations (REQUIREMENTS): Wide + Deep Metadata. Wide: comprehensive coverage of data systems at LinkedIn. We have > 20 systems! SQL, NoSQL, Indexes, Blob Stores, … Deep: deeper understanding of each dataset. Schema is not enough. Need to understand semantics.
  • 33. A METADATA REFINERY APPROACH WhereHows Architecture @ 10,000 ft ML driven refinements
  • 34. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox DATA LIFECYCLE MANAGEMENT METADATA
  • 35. METADATA DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox DATA LIFECYCLE MANAGEMENT
  • 36. FREEDOM OF EXPRESSION Many Transformation Engines @ LinkedIn In Motion At Rest
  • 37. HARD TO CHANGE ANYTHING UNDERNEATH! Challenge for Infrastructure Providers (Pig scripts) My Raw Data Native readers, dependencies on path, format hard-coded Hard to move to better formats without breaking everyone or copying data twice My Raw Data
  • 38. HARD TO CHANGE ANYTHING UPSTREAM! Semantic Challenges Data is unclean (bad data on certain dates) Data models are in constant flux (split event into multiple) Have to change data processing logic everywhere! My Raw Data
  • 39. AN API TO MANAGE EVOLUTION We need “microservices” for Data My Data API My Raw Data
  • 40. A DATA ACCESS LAYER FOR LINKEDIN We built Dali to solve this Logical Tables + Views Logical FileSystem Abstract away underlying physical details to allow users to focus solely on the logical concerns
  • 41. Dali: Implementation Details in Context Dali FileSystem Processing Engine (MR, Spark) Dali Datasets (Tables+Views) Dataflow APIs (MR, Spark, Scalding) Query Layers (Pig, Hive, Spark) Dali CLI Data Catalog Git + Artifactory View Def + UDFs Dataset Owner Data Source Data Sink
  • 42. Access Restrictions (REQUIREMENTS). Different Types, Simple to Complex. Basic Restrictions: Access to dataset based on business need. Privacy by Default: Analysts shouldn’t get access to raw PII by default. Consent-based Access: Access to certain data elements only available if member has consented for that particular use-case.
  • 43. Solving for Compliant Access (STEP 1: DATA + METADATA). Raw Dataset: MemberProfile (MEMBER_ID, NAME, PROFILE DATA), Schema = { int memberId; String firstName; String lastName; Position[] positions; educationHistory[] educationHistory; … }. Meta Data: NAME : is_pii, MEMBER_ID : is_pii
  • 44. STEP 2: A MEMBER’S PREFERENCES Privacy Preferences
  • 45. A BITMAP DATASET: ONE PER MEMBER Privacy Preferences Member Privacy Preferences
  • 46. Solving for Compliant Access With Dali. Inputs: Raw Dataset, Meta Data, Member Privacy Preferences, Use Case = X. Components: Processing Logic, Dali Reader Library. Dali Reader responsibility: Given: (Dataset, Metadata, UseCase). Generate: Dataset and Column-level transformations (obfuscate, null, …). Auto-join with Member Privacy Preferences (filter out data elements that are not consented to).
  • 47. Solving for Compliant Purging With Dali + Gobblin. Inputs: Raw Dataset, Meta Data, Member Privacy Preferences, Member’s Delete Requests. Gobblin Purger with Dali Reader Library, Use Case = Purge. Output: Purged Dataset
  • 48. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox DATA LIFECYCLE MANAGEMENT METADATA DATA ACCESS LAYER
  • 49. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox : Solved ! METADATA DATA ACCESS LAYER DATA LIFECYCLE MANAGEMENT
  • 50. The Technology Blueprint (DATA DEMOCRACY + DATA PROTECTION). METADATA: WhereHows*. DATA ACCESS LAYER: Dali. DATA LIFECYCLE MANAGEMENT: Apache Gobblin*. * Open Source: We can collaborate on these together!
  • 51. Privacy By Design (SUSTAINABILITY IS CRITICAL). Core company value, implemented by Technology & Process. Privacy: Technology + Process. Product: Security & Privacy Review. Data: Data Model Review. Legal: Regulation change -> Tech requirements. Company-wide: “Horizontal” Initiatives.
  • 52. Key Takeaways: THE BEAST IS REAL. Data Protection: getting stricter and more complex. Stricter regulations in a digital world. Increasingly complex to implement. This is an accelerating global trend.
  • 53. Key Takeaways: THE BEAST CAN BE TAMED! Learnings at LinkedIn: we’ve established a blueprint to sustainably address privacy. Privacy By Design: baked into technology stack & product development process. Standardization: To solve at scale, certain parts need to be centralized and standardized. Company-wide: Needs co-ordinated effort across various functions.
  • 54. DATA DEMOCRACY <> DATA PROTECTION More Data Discover Data Easy Access Less Data Discover Violations Restricted Access The Data Paradox : Solved ! METADATA DATA ACCESS LAYER DATA LIFECYCLE MANAGEMENT
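
The purge rule on slide 23 ("if needs purge then drop else continue"), together with the quarantine requirement on slide 21, can be sketched as a record filter. This is a minimal illustration in Python and not Gobblin's actual (Java) API: the function name `purge_records`, the `member_id` field, and the choice to quarantine records that carry no member id are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Iterator, List, Set

Record = Dict[str, Any]


def purge_records(
    records: Iterable[Record],
    delete_requests: Set[int],
    quarantine: List[Record],
    id_field: str = "member_id",  # hypothetical field name
) -> Iterator[Record]:
    """Yield only the records that may be kept after a purge pass."""
    for record in records:
        member_id = record.get(id_field)
        if member_id is None:
            # Assumption: a record we cannot attribute to a member is
            # "exceptional" and is set aside for manual triage.
            quarantine.append(record)
            continue
        if member_id in delete_requests:
            # Needs purge: drop the record.
            continue
        # Otherwise continue down the pipeline unchanged.
        yield record


# Usage: member 2 has asked for deletion; one record has no member id.
rows = [
    {"member_id": 1, "event": "view"},
    {"member_id": 2, "event": "click"},
    {"event": "orphan"},
]
triage: List[Record] = []
kept = list(purge_records(rows, {2}, triage))
```

Writing the filter as a generator keeps memory use flat regardless of dataset size, which matters when the same logic has to run over very large files.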
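
The Dali Reader responsibility on slides 43 to 46 (column-level transformations driven by metadata, plus an auto-join with per-member privacy preferences stored as a bitmap) can be sketched as follows. This is a hedged Python illustration, not Dali's real interface: `read_compliant`, `USE_CASE_BITS`, the use-case names, the bit layout, and the decision to drop whole rows for non-consenting members are assumptions made for the example.

```python
import hashlib
from typing import Any, Dict, Iterable, Iterator, Set

# Hypothetical mapping from use case to bit position in a member's
# consent bitmap (slide 45 describes one bitmap per member).
USE_CASE_BITS = {"ads_targeting": 0, "research": 1}


def obfuscate(value: Any) -> str:
    """Replace a raw identifier with a short, stable hash."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def read_compliant(
    rows: Iterable[Dict[str, Any]],
    pii_columns: Set[str],            # from metadata, e.g. columns tagged is_pii
    preferences: Dict[int, int],      # member id -> consent bitmap
    use_case: str,
    key: str = "memberId",
) -> Iterator[Dict[str, Any]]:
    """Yield rows transformed for the given use case."""
    bit = 1 << USE_CASE_BITS[use_case]
    for row in rows:
        # Auto-join with privacy preferences; no entry means no consent.
        if not preferences.get(row[key], 0) & bit:
            continue
        out: Dict[str, Any] = {}
        for column, value in row.items():
            if column == key:
                out[column] = obfuscate(value)  # obfuscate the join key
            elif column in pii_columns:
                out[column] = None              # null out other PII
            else:
                out[column] = value
        yield out


# Usage: only member 1 has consented to the "research" use case.
profiles = [
    {"memberId": 1, "firstName": "A", "positions": 3},
    {"memberId": 2, "firstName": "B", "positions": 5},
]
consent = {1: 0b10, 2: 0b01}
visible = list(read_compliant(profiles, {"memberId", "firstName"}, consent, "research"))
```

Keeping the transformation inside the reader means the processing logic never sees raw PII unless the use case and the member's consent both allow it, which matches the "Privacy by Default" requirement on slide 42.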