Multimedia Privacy
Gerald Friedland
Symeon Papadopoulos
Julia Bernd
Yiannis Kompatsiaris
ACM Multimedia, Amsterdam, October 16, 2016
What’s the Big Deal?
Overview of Tutorial
• Part I: Understanding the Problem
• Part II: User Perceptions About Privacy
• Part III: Multimodal Inferences
• Part IV: Some Possible Solutions
• Part V: Future Directions
Part I:
Understanding the
Problem
What Can a Mindreader Read?
• These vulnerabilities are a problem with any type of
public or semi-public post. They’re not specific to a
particular type of information, e.g. text, image, or
video.
• However, let’s focus on multimedia data: images,
audio, video, social media context, etc.
Multimedia on the Internet Is Big!
Source: Domosphere
Resulting Problem
• More multimedia data = Higher demand for
retrieval and organization tools.
• But multimedia retrieval is hard!
• Researchers work on making retrieval better (cf. latest
advances in Deep Learning for content-based retrieval).
• Industry develops workarounds to make retrieval easier
right away.
Hypothesis
• Retrieval is already good enough to cause major
issues for privacy that are not easy to solve.
• Let’s take a look at some retrieval approaches:
• Image tagging
• Geo-tagging
• Multimodal Location Estimation
• Audio-based user matching
Workaround: Manual Tagging
Workaround: Geo-Tagging
Source: Wikipedia
Geo-Tagging
Allows easier clustering of photo and video series,
among other things.
Geo-Tagging Everywhere
Part of the location-based service hype:
But: Geo-coordinates + Time = Unique ID!
Support for Geo-Tags
• Social media portals provide APIs to connect geo-tags
with metadata, accounts, and web content.
• Allows easy search, retrieval, and ad placement.

Portal    %*    Total
YouTube   3.0   3M
Flickr    4.5   180M
*estimate (2013)
Hypothesis
• Since geo-tagging is a workaround for multimedia
retrieval, it allows us to peek into a future where
multimedia retrieval works perfectly.
• What if multimedia retrieval actually just worked?
Related Work
“Be careful when using social location sharing services, such
as Foursquare.”
Related Work
Mayhemic Labs, June 2010: “Are you aware that Tweets are geo-tagged?”
Can you do real harm?
• Cybercasing: Using online (location-based) data
and services to enable physical-world crimes.
• Three case studies:
G. Friedland and R. Sommer: "Cybercasing the Joint: On the Privacy
Implications of Geotagging", Proceedings of the Fifth USENIX Workshop
on Hot Topics in Security (HotSec 10), Washington, D.C., August 2010.
Case Study 1: Twitter
• Pictures in Tweets can be geo-tagged
• From a tech-savvy celebrity we found:
• Home location (several pics)
• Where the kids go to school
• Where he/she walks the dog
• “Secret” office
Celebs Unaware of Geo-Tagging
Source: ABC News
Celebs Unaware of Geotagging
Google Maps Shows Address...
Case Study 2: Craigslist
“For Sale” section of the Bay Area Craigslist:
• 4 days: 68,729 pictures total; 1.3% geo-tagged
Users Are Unaware of Geo-Tagging
• Many “anonymized” ads had geo-location
• Sometimes selling high-value goods, e.g. cars,
diamonds, etc.
• Sometimes “call Sunday after 6pm”
• Multiple photos allow interpolation of coordinates
for higher accuracy
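A minimal sketch of that interpolation, assuming Pillow for EXIF parsing; the file names are hypothetical, and a real analysis would need per-photo error handling:

```python
from PIL import Image
from PIL.ExifTags import GPSTAGS

def gps_decimal(path):
    """Return (lat, lon) in decimal degrees from a JPEG's EXIF, or None."""
    gps_ifd = Image.open(path).getexif().get_ifd(0x8825)  # 0x8825 = GPSInfo IFD
    gps = {GPSTAGS.get(k, k): v for k, v in gps_ifd.items()}
    if "GPSLatitude" not in gps:
        return None
    def to_deg(dms, ref):
        d, m, s = (float(x) for x in dms)  # degrees, minutes, seconds
        deg = d + m / 60 + s / 3600
        return -deg if ref in ("S", "W") else deg
    return (to_deg(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
            to_deg(gps["GPSLongitude"], gps["GPSLongitudeRef"]))

# Average the coordinates of all geo-tagged photos in one ad (hypothetical files)
coords = [c for c in map(gps_decimal, ["ad_photo1.jpg", "ad_photo2.jpg"]) if c]
if coords:
    lat = sum(c[0] for c in coords) / len(coords)
    lon = sum(c[1] for c in coords) / len(coords)
    print(f"Interpolated location: {lat:.6f}, {lon:.6f}")
```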
Craigslist: Real Example
Geo-Tagging Resolution
Measured accuracy: +/- 1m (iPhone 3G picture vs. Google Street View)
What About Inference?
(Example photo annotations: “Owner”, “Valuable”)
Case Study 3: YouTube
Recall:
• Once data is published, the Internet keeps it (often
with many copies).
• APIs are easy to use and allow quick retrieval of
large amounts of data.
Can we find people on vacation using YouTube?
Cybercasing on YouTube
Experiment: Cybercasing using the YouTube API (240
lines in Python)
Cybercasing on YouTube
Input parameters
Location: 37.869885,-122.270539
Radius: 100km
Keywords: kids
Distance: 1000km
Time-frame: this_week
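A hedged sketch of such a query against today’s YouTube Data API v3 (the 2010 experiment used the older GData API); YOUR_API_KEY and the date bound are placeholders:

```python
import requests

# Search for recent videos matching a keyword, geo-tagged near a point.
params = {
    "part": "snippet",
    "type": "video",                            # location filters require type=video
    "q": "kids",
    "location": "37.869885,-122.270539",
    "locationRadius": "100km",
    "publishedAfter": "2016-10-09T00:00:00Z",   # "this_week" (placeholder date)
    "maxResults": 50,
    "key": "YOUR_API_KEY",
}
resp = requests.get("https://www.googleapis.com/youtube/v3/search", params=params)
for item in resp.json().get("items", []):
    print(item["id"]["videoId"], item["snippet"]["title"])
```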
Cybercasing on YouTube
Output
• Initial videos: 1000 (max_res)
• User hull: ~50k videos
• Vacation hits: 106
• Cybercasing targets: >12
The Threat Is Real!
Question
Do you think geo-tagging should be illegal?
a) No, people just have to be more careful. The
possibilities still outweigh the risks.
b) Maybe it should be regulated somehow to make
sure no harm can be done.
c) Yes, absolutely! This information is too
dangerous.
But…
Is this really about geo-tags?
(remember: hypothesis)
But…
Is this really about geo-tags?
No, it’s about the privacy implications of
multimedia retrieval in general.
Question
And now? What do you think should be done?
a) Nothing can be done. Privacy is dead.
b) I will think before I post, but I don’t know that it
matters.
c) We need to educate people about this and try to
save privacy. (Fight!)
d) I’ll never post anything ever again! (Flight!)
Observations
• Many applications encourage heavy data sharing,
and users go with it.
• Multimedia isn’t only a lot of data; it’s also a lot of
implicit information.
• Both users and engineers are often unaware of the
hidden retrieval possibilities of shared (multimedia)
data.
• Local anonymization and privacy policies may be
ineffective against cross-site inference.
Dilemma
• People will continue to want social networks and
location-based services.
• Industry and research will continue to improve
retrieval techniques.
• Government will continue to do surveillance and
intelligence-gathering.
Solutions That Don’t Work
• “I blur the faces”
  • Audio and image artifacts can still give you away
• “I only share with my friends”
  • But who are they sharing with, on what platforms?
• “I don’t do social networking”
  • Others may do it for you!
Further Observations
• There is not much incentive to worry about privacy,
until things go wrong.
• People’s perception of the Internet does not match
reality (enough).
Basics:
Definitions and Background
Definition
• Privacy is the right to be let alone
(Warren and Brandeis)
• Privacy is:
a) the quality or state of being apart from company
or observation
b) freedom from unauthorized intrusion
(Merriam-Webster)
Starting Points
• Privacy is a human right. Every individual has a
need to keep something about themselves private.
• Companies have a need for privacy.
• Governments have a need for privacy (currently
heavily discussed).
Where We’re At (Legally)
Keep an eye out for multimedia inference!
A Taxonomy of Social Networking Data
• Service data: Data you give to an OSN to use it, e.g.
name, birthday, etc.
• Disclosed data: What you post on your page/space
• Entrusted data: What you post on other people’s
pages, e.g. comments
• Incidental data: What other people post about you
• Behavioural data: Data the site collects about you
• Derived data: Data that a third party infers about you
based on all that other data
B. Schneier. “A Taxonomy of Social Networking Data,” IEEE Security & Privacy, vol. 8, no. 4, p. 88, July-Aug. 2010.
Privacy Bill of Rights
In February 2012, the US Government released
CONSUMER DATA PRIVACY IN A NETWORKED
WORLD:
A FRAMEWORK FOR PROTECTING PRIVACY AND PROMOTING
INNOVATION IN THE GLOBAL DIGITAL ECONOMY
http://www.whitehouse.gov/sites/default/files/privacy-final.pdf
Privacy Bill of Rights
1) Individual Control: Consumers have a right to
exercise control over what personal data
organizations collect from them and how they use it.
2) Transparency: Consumers have a right to easily
understandable and accessible information about
privacy and security practices.
3) Respect for Context: Consumers have a right to
expect that organizations will collect, use, and
disclose personal data in ways consistent with
the context in which consumers provide the data.
Privacy Bill of Rights
4) Security: Consumers have a right to secure and
responsible handling of personal data.
5) Access and Accuracy: Consumers have a right to
access and correct personal data in usable
formats, in a manner that is appropriate to the
sensitivity of the data and the risk of adverse
consequences to citizens if the data is inaccurate.
Privacy Bill of Rights
6) Focused Collection: Consumers have a right to
reasonable limits on the personal data that
organizations collect and retain.
7) Accountability: Consumers have a right to have
personal data handled by organizations with
appropriate measures in place to assure they
adhere to the Consumer Privacy Bill of Rights.
One View
The Privacy Bill of Rights could serve as a
requirements framework for an ideally
privacy-aware Internet service.
...if it were adopted.
Limitations
• The Privacy Bill of Rights is subject to
interpretation.
• What is “reasonable”?
• What is “context”?
• What is “personal data”?
• The Privacy Bill of Rights presents technical
challenges.
Personal Data Protection in EU
• The Data Protection Directive* (aka Directive 95/46/EC on
the protection of individuals with regard to the processing
of personal data and on the free movement of such data)
is an EU directive adopted in 1995 which regulates the
processing of personal data within the EU. It is an
important component of EU privacy and human rights
law.
• The General Data Protection Regulation, in progress since
2012 and adopted in April 2016, will supersede the Data
Protection Directive and be enforceable as of 25 May
2018.
• Objectives
• Give control of personal data to citizens
• Simplify regulatory environment for businesses
* A directive is a legal act of the European Union, which requires member states to
achieve a particular result without dictating the means of achieving that result.
When is it legitimate…?
Collecting and processing the personal data of
individuals is only legitimate in one of the following
circumstances (Article 7 of Directive):
• The individual gives unambiguous consent
• If data processing is needed for a contract (e.g. electricity bill)
• If processing is required by a legal obligation
• If processing is necessary to protect the vital interests of the
person (e.g. processing medical data of an accident victim)
• If processing is necessary to perform tasks of public interest
• If the data controller or a third party has a legitimate interest in
doing so, as long as this does not affect the interests of the
data subject or infringe his/her fundamental rights
Obligations of data controllers in EU
Data controllers must respect the following rules:
• Personal data must be collected and used for explicit and
legitimate purposes
• It must be adequate, relevant and not excessive in
relation to the above purposes
• It must be accurate and updated when needed
• Data subjects must be able to correct, remove, etc.
incorrect data about themselves (access)
• Personal data should not be kept longer than necessary
• Data controllers must protect personal data (incl. from
unauthorized access to third parties) using appropriate
measures of protection (security, accountability)
Handling sensitive data
Definition of sensitive data in EU:
• religious beliefs
• political opinions
• health
• sexual orientation
• race
• trade union membership
Processing sensitive data comes under a stricter set of
rules (Article 8)
Enforcing data protection in EU?
• The Directive states that every EU country must
provide one or more independent supervisory
authorities to monitor its application.
• In principle, all data controllers must notify their
supervisory authorities when they process
personal data.
• The national authorities are also in charge of
receiving and handling complaints from
individuals.
Data Protection: US vs EU
• US has no legislation that is comparable to EU’s
Data Protection Directive.
• US privacy legislation is adopted on ad hoc basis,
e.g. when certain sectors and circumstances
require (HIPAA, CTPCA, FCRA)
• US adopts a more laissez-faire approach
• In general, US privacy legislation is considered
“weaker” compared to EU
Example: What Is Sensitive Data?
Public records indicate you own a house.
Example: What Is Sensitive Data?
A geo-tagged photo taken by a friend reveals who attended your party!
Example: What Is Sensitive Data?
Facial recognition match with a public record: Prior
arrest for drug offense!
Example: What Is Sensitive Data?
1) Public records indicate you own a house
2) A geo-tagged photo taken by a friend reveals
who attended your party
3) Facial recognition match with a public record:
Prior arrest for drug offense!
→ “You associate with convicts”
Example: What Is Sensitive Data?
“You associate with convicts”
What will this do for your reputation when you:
• Date?
• Apply for a job?
• Want to be elected to public office?
Example: What Is Sensitive Data?
But: Which of these is the sensitive data?
a) Public record: You own a house
b) Geotagged photo taken by a friend at your party
c) Public record: A friend’s prior arrest for a drug
offense
d) Conclusion: “You associate with convicts.”
e) None of the above.
Who Is to Blame?
a) The government, for its Open Data policy?
b) Your friend who posted the photo?
c) The person who inferred data from publicly
available information?
Part II:
User Perceptions About
Privacy
Study 1:
Users’ Understandings of
Privacy
The Teaching Privacy Project
• Goal: Create a privacy curriculum for K-12 and
undergrad, with lesson plans, teaching tools,
visualizations, etc.
• NSF sponsored. (CNS-1065240 and DGE-1419319;
all conclusions ours.)
• Check It Out: Info, public education, and teaching
resources: http://teachingprivacy.org
Based on Several Research Strands
• Joint work between Friedland, Bernd, Serge Egelman,
Dan Garcia, Blanca Gordo, and many others!
• Understanding of user perceptions comes from:
• Decades of research comparing privacy comprehension,
preferences, concerns, and behaviors, including by
Egelman and colleagues at CMU
• Research on new Internet users’ privacy perceptions,
including Gordo’s evaluations of digital-literacy programs
• Observation of multimedia privacy leaks, e.g.
“cybercasing” study
• Reports from high school and undergraduate teachers
about students’ misperceptions
• Summer programs for high schoolers interested in CS
Common Research Threads
• What happens on the Internet affects the “real”
world.
• However: Group pressure, impulse, convenience,
and other factors usually dominate decision
making.
• Aggravated by lack of understanding of how
sharing on the Internet really works.
• Wide variation in both comprehension and actual
preferences.
Multimedia Motivation
• Many current multimedia R&D applications have a
high potential to compromise the privacy of
Internet users.
• We want to continue pursuing fruitful and
interesting research programs!
• But we can also work to mitigate negative effects
by using our expertise to educate the public about
effects on their privacy.
What Do People Need to Know?
Starting point: 10 observations about frequent
misperceptions + 10 “privacy principles” to address them
Illustrations by Ketrina Yim.
Misconception #1
• Perception: I keep track of what I’m posting. I am in
control. Websites are like rooms, and I know what’s
in each of them.
• Reality: Your information footprint is larger than you
think!
• An empty Twitter post has kilobytes of publicly available
metadata.
• Your footprint includes what others
post about you, hidden data
attached by services, records of
your offline activities… Not to
mention inferences that can be
drawn across all those “rooms”!
Misconception #2
• Perception: Surfing is anonymous. Lots of sites allow
anonymous posting.
• Reality: There is no anonymity on the Internet.
• Bits of your information footprint (geo-tags, language
patterns, etc.) may make it possible for someone to
uniquely identify you, even without a name.
Misconception #3
• Perception: There’s nothing interesting about what I
do online.
• Reality: Information about you on the Internet will
be used by somebody in their interest — including
against you.
• Every piece of information has value to somebody:
other people, companies, organizations, governments...
• Using or selling your data is how Internet companies
that provide “free” services make money.
Misconception #4
• Perception: Communication on the Internet is
secure. Only the person I’m sending it to will see
the data.
• Reality: Communication over a network, unless
strongly encrypted, is never just between two
parties.
• Online data is always routed through intermediary
computers and systems...
• ...which are connected to many more computers
and systems.
Misconception #5
• Perception: If I make a mistake or say something
dumb, I can delete it later. Anyway, people will get
what I mean, right?
• Reality: Sharing information over a network means
you give up control over that information — forever!
• The Internet never forgets. Search engines, archives,
and reposts duplicate data; you can’t “unshare”.
• Websites sell your information, and data can be
subpoenaed.
• Anything shared online is open to misinterpretation.
The Internet can’t take a joke!
Misconception #6
• Perception: Facial recognition/speaker ID isn’t
good enough to find this. As long as no one can find
it now, I’m safe.
• Reality: Just because it can’t be found today doesn’t
mean it can’t be found tomorrow.
• Search engines get smarter.
• Multimedia retrieval gets better.
• Analog information gets digitized.
• Laws, privacy settings, and privacy policies change.
Misconception #7
• Perception: What happens on the Internet stays on
the Internet.
• Reality: The online world is inseparable from the
“real” world.
• Your online activities are as much a part of your life
as your offline activities.
• People don’t separate what they know about
Internet-you from what they know about in-person you.
Misconception #8
• Perception: I don’t chat with strangers. I don’t
“friend” people on Facebook that I don’t know.
• Reality: Are you sure? Identity isn’t guaranteed on
the Internet.
• Most information that “establishes” identity in social
networks may already be public.
• There is no foolproof way to match a real person
with their online identity.
Misconception #9
• Perception: I don’t use the Internet. I am safe.
• Reality: You can’t avoid having an information
footprint by not going online.
• Friends and family will post about you.
• Businesses and government share data about you.
• Companies track transactions online.
• Smart cards transmit data online.
Misconception #10
• Perception: There are laws that keep companies and
people from sharing my data. If a website has a
privacy policy, that means they won’t share my
information. It’s all good.
• Reality: Only you have an interest in maintaining
your privacy!
• Internet technology is rarely designed to protect
privacy.
• “Privacy policies” are there to protect providers
from lawsuits.
• Laws are spotty and vary from place to place.
• Like it or not, your privacy is your own responsibility!
What Came of All This?
Example: “Ready or Not?” educational app
What Came of All This?
Example: “Digital Footprints” video
Study 2:
Perceived vs. Actual Predictability of
Personal Information in Social Nets
Papadopoulos and Kompatsiaris, with Eleftherios Spyromitros-Xioufis, Giorgos Petkos, and Rob Heyman (iMinds)
Personal Information in OSNs
Participation in OSNs comes at a price!
• User-related data is shared with:
• a) other OSN users, b) the OSN itself, c) third parties
(e.g. ad networks)
• Disclosure of specific types of data:
• e.g. gender, age, ethnicity, political or religious beliefs,
sexual preferences, employment status, etc.
• Information isn’t always explicitly disclosed!
• Several types of personal information can be accurately
inferred based on implicit cues (e.g. Facebook likes) and
machine learning! (cf. Part III)
Inferred Information & Privacy in OSNs
• Study of user awareness with regard to inferred
information has been largely neglected by social research.
• Privacy usually presented as a question of giving
access or communicating personal information to
some party, e.g.:
“The claim of individuals, groups, or institutions to
determine for themselves when, how, and to what extent
information about them is communicated to others.”
(Westin, 1970)
[1] Alan Westin. Privacy and freedom. Bodley Head, London, 1970.
Inferred Information & Privacy in OSNs
• However, access control is non-existent for
inferred information:
• Users are unaware of the inferences being made.
• Users have no control over the way inferences are
made.
• Goal: Investigate whether and how users intuitively
grasp what can be inferred from their disclosed
data!
Main Research Questions
1. Predictability: How predictable are different types of
personal information, based on users’ OSN data?
2. Actual vs. perceived predictability: How realistic are
user perceptions about the predictability of their
personal information?
3. Predictability vs. sensitivity: What is the relationship
between perceived sensitivity and predictability of
personal information?
• Previous work has focused mainly on Q1
• We address Q1 using a variety of data and methods,
and additionally we address Q2 and Q3
Data Collection
• Three types of data about 170 Facebook users:
• OSN data: Likes, posts, images -- collected through a test
Facebook application
• Answers to questions about 96 personal attributes,
organized into 9 categories, e.g. health factors, sexual
orientation, income, political attitude, etc.
• Answers to questions related to their perceptions about
the predictability and sensitivity of the 9 categories
http://databait.eu
http://www.usemp-project.eu
Example From Questionnaire
• What is your sexual orientation? →
ground truth
• Do you think the information on your
Facebook profile reveals your sexual
orientation? Either because you
yourself have put it online, or it could
be inferred from a combination of
posts. → perceived predictability
• How sensitive do you find the
information you had to reveal about
your sexual orientation? (1=not
sensitive at all, 7= very sensitive) →
perceived sensitivity
Responses (ground truth):
  heterosexual  147
  homosexual    14
  bisexual      7
  n/a           2

Responses (perceived predictability):
  yes  134
  no   33
  n/a  3
Features Extracted From OSN Data
• likes: binary vector denoting presence/absence of a like (#3.6K)
• likesCats: histogram of like category frequencies (#191)
• likesTerms: Bag-of-Words (BoW) of terms in description, title,
and about sections of likes (#62.5K)
• msgTerms: BoW vector of terms in user posts (#25K)
• lda-t: Distribution of topics in the textual contents of both likes
(description, title, and about section) and posts
• Latent Dirichlet Allocation with t=20,30,50,100
• visual: concepts depicted in user images (#11.9K), detected
using a CNN, top 12 concepts per image, 3 variants
• visual-bin: hard 0/1 encoding
• visual-freq: concept frequency histogram
• visual-conf: sum of detection scores across all images
Experimental Setup
• Evaluation method: repeated random sub-sampling
• Data split randomly n=10 times into train (67%) / test (33%)
• Model fit on train / accuracy of inferences assessed on test
• 96 questions (user attributes) were considered
• Evaluation measure: area under ROC curve (AUC)
• Appropriate for imbalanced classes
• Classification algorithms
• Baseline: k-nearest neighbors, decision tree, naïve Bayes
• SoA: AdaBoost, random forest, regularized logistic regression
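A minimal sketch of this protocol, assuming scikit-learn; X and y below are synthetic stand-ins for the real user features and attribute labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(170, 50))        # placeholder features for 170 users
y = rng.integers(0, 2, size=170)      # placeholder labels for one attribute

aucs = []
for seed in range(10):                # n = 10 repeated random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.33, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
print(f"Mean AUC over 10 splits: {np.mean(aucs):.3f}")
```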
Predictability per Attribute
(Chart: AUC per attribute, with examples such as nationality, is employed, can be moody, smokes cannabis, plays volleyball.)
What Is More Predictable?
Rank  Perceived predictability               Actual predictability                  Shift  SoA predictability*
1     Demographics                           Demographics                           -      Demographics
2     Relationship status/living condition   Political views                        +3     Political views
3     Sexual orientation                     Sexual orientation                     -      Religious views
4     Consumer profile                       Employment/Income                      +4     Sexual orientation
5     Political views                        Consumer profile                       -1     Health status
6     Personality traits                     Relationship status/living condition   -4     Relationship status/living condition
7     Religious views                        Religious views                        -
8     Employment/Income                      Health status                          +1
9     Health status                          Personality traits                     -3

* Kosinski, et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
Predictability Versus Sensitivity
Part III:
Multimodal Inferences
Personal Data: Truly Multimodal
• Text: posts, comments, content of articles you
read/like, etc.
• Images/Videos: posted by you, liked by you,
posted by others but containing you
• Resources: likes, visited websites, groups, etc.
• Location: check-ins, GPS of posted images, etc.
• Network: what your friends look like, what they
post, what they like, community where you belong
• Sensors: wearables, fitness apps, IoT
What Can Be Inferred?
A lot….
Three Main Approaches
• Content-based
• What you post is what/where/how/etc. you are
• Supervised learning
• Learn by example
• Network-based
• Show me your friends and I’ll tell you who you are
Content-Based
Beware of your posts…
Location
Multimodal Location Estimation
Multimodal Location Estimation
http://mmle.icsi.berkeley.edu
Multimodal Location Estimation
We infer the location of a video based on visual
stream, audio stream, and tags:
• Use geo-tagged data as training data
• Allows faster search, inference, and intelligence-
gathering, even without GPS.
G. Friedland, O. Vinyals, and T. Darrell: "Multimodal Location Estimation," pp. 1245-
1251, ACM Multimedia, Florence, Italy, October 2010.
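A toy sketch of the tag-based component: estimate a query video’s location from geo-tagged training videos sharing its rarest tag. The real system fuses tags with visual and acoustic evidence; the data and matching rule here are illustrative:

```python
import numpy as np
from collections import defaultdict

train = [  # (tags, (lat, lon)) from geo-tagged training videos
    ({"berkeley", "campanile"}, (37.8721, -122.2578)),
    ({"berkeley", "haas"}, (37.8716, -122.2533)),
    ({"paris", "eiffel"}, (48.8584, 2.2945)),
]

by_tag = defaultdict(list)
for tags, loc in train:
    for t in tags:
        by_tag[t].append(loc)

def estimate(tags):
    """Centroid of training videos sharing the query's rarest known tag."""
    known = [t for t in tags if t in by_tag]
    if not known:
        return None
    rarest = min(known, key=lambda t: len(by_tag[t]))  # rare tags localize best
    return tuple(np.mean(by_tag[rarest], axis=0))

print(estimate({"campanile", "sunset"}))  # -> roughly (37.87, -122.26)
```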
Intuition for the Approach
(Graph illustration: videos tagged {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}.)
• Node: geolocation of a video
• Edge: correlated locations (e.g. common tag, visual, acoustic features)
• Edge potential: strength of an edge (e.g. posterior distribution of locations given common tags)
MediaEval
J. Choi, G. Friedland, V. Ekambaram, K. Ramchandran: "Multimodal Location Estimation of
Consumer Media: Dealing with Sparse Training Data," in Proceedings of IEEE ICME 2012,
Melbourne, Australia, July 2012.
YouTube Cybercasing Revisited
YouTube Cybercasing With Geo-Tags vs.
Multimodal Location Estimation
                  Old experiment (geo-tags)   No geo-tags (MLE)
Initial videos    1000 (max)                  107
User hull         ~50k                        ~2000
Potential hits    106                         112
Actual targets    >12                         >12
Account Linking
Can we link accounts based on their
content?
Using Internet Videos: Dataset
Test videos from Flickr (~40 sec)
• 121 users to be matched, 50k trials
• 70% have heavy noise
• 50% speech
• 3% professional content
H. Lei, J. Choi, A. Janin, and G. Friedland: “Persona Linking: Matching Uploaders of Videos Across Accounts”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Prague, May 2011.
Matching Users Within Flickr
Algorithm:
1) Take 10 seconds of the soundtrack of a video
2) Extract the Spectral Envelope
3) Compare using Manhattan Distance
(Figure: spectral envelope of an audio excerpt)
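A minimal sketch of the matching step, assuming librosa and scipy. The mean log-mel spectrum stands in for the spectral envelope; the exact features and settings of the original system are not reproduced here, and the file names are hypothetical:

```python
import librosa
import numpy as np
from scipy.spatial.distance import cityblock

def envelope(path, offset=0.0, duration=10.0):
    """Mean log-mel spectrum of a 10-second audio excerpt."""
    y, sr = librosa.load(path, offset=offset, duration=duration)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    return np.log(mel + 1e-10).mean(axis=1)   # average over time frames

# Soundtracks of two videos from different accounts; lower distance = likelier match
d = cityblock(envelope("video_account_a.wav"), envelope("video_account_b.wav"))
print(f"Manhattan distance between spectral envelopes: {d:.2f}")
```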
User ID on Flickr Videos
Persona Linking Using Internet Videos
Result:
• On average, having 40 seconds in the test and training
sets leads to a 99.2% chance of a true positive match!
Another Linkage Attack
Exploiting users’ online activity to link accounts
• Link based on where and when a user is posting
• Attack model is individual targeting
• Datasets: Yelp, Flickr, Twitter
• Methods
• Location profile
• Timing profile
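A minimal sketch of a timing profile, assuming numpy: histogram each account’s posts by hour of day and compare the normalized profiles with cosine similarity. The timestamps are hypothetical:

```python
import numpy as np

def timing_profile(post_hours):
    """Normalized 24-bin histogram of the hours at which an account posts."""
    hist = np.bincount(post_hours, minlength=24).astype(float)
    return hist / hist.sum()

yelp_hours = [12, 13, 19, 20, 20, 21]          # posting hours of a Yelp account
twitter_hours = [12, 12, 13, 19, 20, 21, 22]   # a candidate Twitter account

p, q = timing_profile(yelp_hours), timing_profile(twitter_hours)
cos = p @ q / (np.linalg.norm(p) * np.linalg.norm(q))
print(f"Timing-profile similarity: {cos:.3f}")  # higher = likelier same user
```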
When a User Is Posting
Where a User Is Posting
(Maps: Twitter locations vs. Yelp locations)
De-Anonymization Model
(Diagram: a targeted account (Yelp users are identified) is matched against a candidate list of Twitter accounts.)
Datasets
• Three social networks: Yelp, Twitter, Flickr
• Two types of data sets
• Ground truth data set
• Yelp-Twitter: 2,363 -> 342 (with geotags) -> 57 (in SF bay)
• Flickr-Twitter: 6,196 -> 396 (with geotags) -> 27 (in SF bay)
• Candidate Twitter list data set: 26,204
Performance on Matching
Supervised Learning
Learn by example
Inferring Personal Information
• Supervised learning algorithms
• Learn a mapping (model) from inputs x_i to outputs y_i by analyzing a set of
training examples D = {(x_i, y_i)}, i = 1, ..., N
• In this case
• y_i corresponds to a personal user attribute, e.g. sexual orientation
• x_i corresponds to a set of predictive attributes or features, e.g. user likes
• Some previous results
• Kosinski et al. [1]: likes features (SVD) + logistic regression: Highly accurate
inferences of ethnicity, gender, sexual orientation, etc.
• Schwartz et al. [2] status updates (PCA) + linear SVM: Highly accurate
inference of gender
[1] Kosinski, et al. Private traits and attributes are predictable from digital records of human
behavior. Proceedings of the National Academy of Sciences, 2013.
[2] Schwartz, et al. Personality, gender, and age in the language of social media: The open-
vocabulary approach. PloS one, 2013.
What Do Your Likes Say About You?
M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable
from Digital Records of Human Behavior”. PNAS 110: 5802 – 5805, 2013
Results: Prediction Accuracy
M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable
from Digital Records of Human Behavior”. PNAS 110: 5802 – 5805, 2013
The More You Like…
Our Results: USEMP Dataset (Part II)
Testing different classifiers
Our Results: USEMP Dataset (Part II)
Testing different features
Our Results: USEMP Dataset (Part II)
Testing combinations of features
Caution: Reliability of Predictions
(Diagram: N models, each trained on a random α% subsample of the training set, combined into an ensemble.)
Caution: Reliability of Predictions
(Plot: the percentage of users for which the individual models have low agreement (Sx < 0.5), and the classification accuracy for those users. MyPersonality dataset, subset.)
Conclusions
• Representing users as feature vectors and using
supervised learning can help achieve pretty good
accuracy in several cases.
However:
• There will be several cases where the output of the
trained model will be unreliable (close to random).
• For many classifiers and for abstract feature
representations (e.g. SVD), it is very hard to explain
why a particular user has been classified as
belonging to a given class.
Network-Based Learning
Show me your friends….
with Georgios Rizos
Network-Based Classification
• People with similar interests tend to connect
→ homophily
• Knowing about one’s connections could reveal
information about them
• Knowing about the whole network structure could
reveal even more…
My Social Circles
A variety of affiliations:
• Work
• School
• Family
• Friends
…
SoA: User Classification (1)
Graph-based semi-supervised learning:
• Label propagation (Zhu and Ghahramani, 2002)
• Local and global consistency (Zhou et al., 2004)
Other approaches to user classification:
• Hybrid feature engineering for inferring user behaviors
(Pennacchiotti et al., 2011; Wagner et al., 2013)
• Crowdsourcing Twitter list keywords for popular users
(Ghosh et al., 2012)
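A minimal sketch of the label-propagation idea on a toy graph, assuming networkx; the two seed nodes and the majority-vote update are illustrative simplifications of Zhu and Ghahramani (2002):

```python
import networkx as nx
from collections import Counter

G = nx.karate_club_graph()          # stand-in for a social graph
labels = {0: "A", 33: "B"}          # two annotated (seed) users

for _ in range(10):                 # iterate the propagation a few rounds
    updates = {}
    for node in G:
        if node in (0, 33):         # seed labels stay fixed
            continue
        votes = Counter(labels[n] for n in G[node] if n in labels)
        if votes:
            updates[node] = votes.most_common(1)[0][0]
    labels.update(updates)

print(Counter(labels.values()))     # how many users received each label
```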
SoA: Graph Feature Extraction (2)
Use of community detection:
• EdgeCluster: Edge centric k-means (Tang and Liu, 2009)
• MROC: Binary tree community hierarchy (Wang et al., 2013)
Low-rank matrix representation methods:
• Laplacian Eigenmaps: k eigenvectors of the graph Laplacian
(Belkin and Niyogi, 2003; Tang and Liu, 2011)
• Random-Walk Modularity Maximization: Does not suffer from
the resolution limit of ModMax (Devooght et al., 2014)
• Deepwalk: Deep representation learning (Perozzi et al., 2014)
Overview of Framework
(Pipeline: online social interactions (retweets, mentions, etc.) → social interaction user graph → ARCTE → supervised graph feature representation; partial/sparse annotation drives feature weighting and user label learning → classified users.)
ARCTE: Intuition
Evaluation: Datasets
Ground truth generation:
• SNOW2014 Graph: Twitter list aggregation & post-processing
• IRMV-PoliticsUK: Manual annotation
• ASU-YouTube: User membership to group
• ASU-Flickr: User subscription to interest group
Dataset                                       Labels  Vertices   Vertex type      Edges      Edge type
SNOW2014 Graph (Papadopoulos et al., 2014)    90      533,874    Twitter account  949,661    Mentions + retweets
IRMV-PoliticsUK (Greene & Cunningham, 2013)   5       419        Twitter account  11,349     Mentions + retweets
ASU-YouTube (Mislove et al., 2007)            47      1,134,890  YouTube channel  2,987,624  Subscriptions
ASU-Flickr (Tang and Liu, 2009)               195     80,513     Flickr account   5,899,882  Contacts
Example: Twitter
Twitter handle   Labels
@nytimes         usa, press, new york
@HuffPostBiz     finance
@BBCBreaking     press, journalist, tv
@StKonrath       journalist

Examples from the SNOW 2014 Data Challenge dataset
Evaluation: SNOW 2014 dataset
SNOW2014 Graph (534K, 950K): Twitter mentions +
retweets, ground truth based on Twitter list processing
Evaluation: ASU-YouTube
• ASU-YouTube (1.1M, 3M): YouTube subscriptions, ground
truth based on membership to groups
Part IV:
Some Possible Solutions
Solution 1:
Disclosure Scoring Framework
with Georgios Petkos
Problem and Motivation
• Several studies have shown that privacy is a challenging
issue in OSNs.
• Madejski et al. performed a study with 65 users, asking
them to carefully examine their profiles → all of them
identified a sharing violation.
• Information about a user may appear not only
explicitly, but also implicitly, and may therefore be
inferred (also think of institutional privacy).
• Different users have different attitudes towards privacy
and online information sharing (Knijnenbourg, 2013).
Madejski et al., “A study of privacy setting errors in an online social network”. PERCOM, 2012
Knijnenbourg, “Dimensionality of information disclosure behavior”. IJHCS, 2013
Disclosure Scoring
“A framework for quantifying the type of information
one is sharing, and the extent of such disclosure.”
Requirements:
• It must take into account the fact that privacy
concerns are different across users.
• Different types of information have different
significance to users.
• Must take into account both explicit and inferred
information.
Related Work
1. Privacy score [Liu10]: based on the concepts of
visibility and sensitivity
2. Privacy Quotient and Leakage [Srivastava13]
3. Privacy Functionality Score [Ferrer10]
4. Privacy index [Nepali13]
5. Privacy Scores [Sramka15]
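A minimal sketch of a visibility-times-sensitivity score in the spirit of [Liu10]; the attribute names and values are illustrative:

```python
attributes = {
    # attribute: (sensitivity in [0, 1], visibility in [0, 1])
    "home_location":     (0.9, 0.7),
    "political_views":   (0.8, 0.2),
    "employment_status": (0.4, 0.9),
}

# Per-attribute risk is sensitivity times visibility; the score sums them.
score = sum(s * v for s, v in attributes.values())
for name, (s, v) in attributes.items():
    print(f"{name}: {s * v:.2f}")
print(f"Overall privacy score: {score:.2f}")
```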
Types of Personal Information
aka Disclosure Dimensions
Overview of PScore
(Diagram: disclosure dimensions A–F, each with sub-dimensions, e.g. A.1–A.6, F.1–F.3.)
Per attribute:
• Explicitly disclosed / inferred
• Value / predicted value
• Confidence of prediction
• Level of sensitivity
Per dimension:
• Level of disclosure
• Reach of disclosure
• Level of sensitivity
Pipeline: observed data (URLs, likes, posts) → inference algorithms → user attributes → disclosure dimensions
Example
Visualization
Bubble color/size
proportional to
disclosure score →
red/big corresponds to
more sensitive/risky
Visualization
Hierarchical exploration of types
of personal information.
http://usemp-mklab.iti.gr/usemp/
Solution 2:
Personalized Privacy-Aware Image
Classification
with Eleftherios Spyromitros-Xioufis and Adrian Popescu
(CEA-LIST)
Privacy-Aware Image Classification
• Photo sharing may compromise privacy
• Can we make photo sharing safer?
• Yes: build “private” image detectors
• Alert the user whenever a “private” image is shared
• Personalization is needed because privacy is subjective!
- Would you share such an image?
- Does it depend on whom you share it with?
Previous Work, and Limitations
• Focus on generic (“community”) notion of privacy
• Models trained on PicAlert [1]: Flickr images annotated
according to a common privacy definition
• Consequences:
• Variability in user perceptions not captured
• Over-optimistic performance estimates
• Justifications are barely
comprehensible
[1] Zerr et al., I know what you did last summer!: Privacy-aware image classification
and search, CIKM, 2012.
Goals of the Study
• Study personalization in image privacy classification
• Compare personalized vs. generic models
• Compare two types of personalized models
• Semantic visual features
• Better justifications and privacy insights
• YourAlert: more realistic than existing benchmarks
Personalization Approaches
• Full personalization:
• A different model for each user, relying only on their
feedback
• Disadvantage: requires a lot of feedback
• Partial personalization:
• Models rely on user feedback + feedback from other
users
• Amount of personalization controlled via instance
weighting
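A minimal sketch of the partial-personalization idea, assuming scikit-learn: one classifier is fit on generic plus user-specific examples, with the user’s own examples up-weighted via sample_weight. The data is synthetic, and w is the personalization knob:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_gen, y_gen = rng.normal(size=(400, 64)), rng.integers(0, 2, 400)   # generic
X_usr, y_usr = rng.normal(size=(40, 64)), rng.integers(0, 2, 40)     # this user

w = 2.0  # weight of user-specific examples relative to generic ones
X = np.vstack([X_gen, X_usr])
y = np.concatenate([y_gen, y_usr])
weights = np.concatenate([np.ones(len(y_gen)), w * np.ones(len(y_usr))])

clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```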
Visual and Semantic Features
• vlad [1]: aggregation of local image descriptors
• cnn [2]: deep visual features
• semfeat [3]: outputs of ~17K concept detectors
• Trained using cnn
• Top 100 concepts per image
[1] Spyromitros-Xioufis et al., A comprehensive study over vlad and product quantization in
large-scale image retrieval. IEEE Transactions on Multimedia, 2014.
[2] Simonyan and Zisserman, Very deep convolutional networks for large-scale image
recognition, ArXiv, 2014.
[3] Ginsca et al., Large-Scale Image Mining with Flickr Groups, MultiMedia Modeling, 2015.
Explanations via Semfeat
• Semfeat can be used to justify predictions
• A tag cloud of the most discriminative visual concepts
• Explanations may often be confusing
• Concept detectors are not perfect
• Semfeat vocabulary (ImageNet) is not privacy-oriented
(Example tag cloud: knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian)
semfeat-LDA: Enhanced Explanations
• Project semfeat to a latent space (second level
semantic representation)
• Images treated as text documents (top 10 concepts)
• Text corpus created from private images (Pic+YourAlert)
• LDA is applied to create a topic model (30 topics)
• 6 privacy-related topics are identified (manually)
Topic      Top 5 semfeat concepts
children   dribbler, child, godson, wimp, niece
drinking   drinker, drunk, tipper, thinker, drunkard
erotic     slattern, erotic, cover-girl, maillot, back
relatives  great-aunt, second-cousin, grandfather, mother, great-grandchild
vacations  seaside, vacationer, surf-casting, casting, sandbank
wedding    groom, bride, celebrant, wedding, costume
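A hedged sketch of this second-level step, assuming scikit-learn: each image’s top detected concepts become a tiny “document” and an LDA topic model is fit over the corpus. The study used 30 topics on a much larger corpus; the toy corpus and topic count here are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

image_docs = [                      # top concepts per image, as pseudo-documents
    "groom bride wedding celebrant costume",
    "drinker drunk cigar-smoker smoker",
    "seaside vacationer sandbank surf-casting",
]
# Keep hyphenated concept names as single tokens
counts = CountVectorizer(token_pattern=r"[^\s]+").fit_transform(image_docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(counts)
topic_mix = lda.transform(counts[:1])   # topic distribution of the first image
print(topic_mix.round(2))
```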
semfeat-LDA: Example
(Figure: the 1st-level semantic representation (concepts such as knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian) is projected to the 2nd-level topic representation.)
YourAlert: A Realistic Benchmark
• User study
• Participants annotate their own photos (informed
consent, only extracted features shared)
• Annotation based on the following definitions:
• Private: “would share only with close OSN friends or not at all”
• Public: “would share with all OSN friends or even make public”
• Resulting dataset: YourAlert
• 1.5K photos, 27 users, ~16 private/40 public per user
• Main advantages:
• Facilitates realistic evaluation of privacy models
• Allows development of personalized models
Publicly available at: http://mklab.iti.gr/datasets/image-privacy/
Generic Models: PicAlert vs. YourAlert
Key Findings
• Almost perfect performance for PicAlert with CNN
• semfeat performs similarly to CNN
• Significantly worse performance for YourAlert
• Similar performance for all features
• Additional findings
• Using more generic training examples does not help
• Large variability in performance across users
Personalized Privacy Models
• Evaluation carried out on YourAlert
• A modified k-fold cross-validation for unbiased
estimates
• Personalized model types
• ‘user’: only user-specific examples from YourAlert
• ‘hybrid’: a mixture of user-specific examples from
YourAlert and generic examples from PicAlert
• User-specific examples are weighted higher
Evaluation of Personalized Models
(Diagram, built up over several slides: for each YourAlert user u1, u2, u3, ..., a 3-fold cross-validation with fold k=1 as the test set. Training data is either the user’s remaining folds alone (model type ‘user’), or those folds plus generic PicAlert examples with the user’s examples weighted by w (model types ‘hybrid w=1’ and ‘hybrid w=2’).)
Results
Privacy Insights via Semfeat
(Example discriminative concepts: child, mate, son → private; uphill, lakefront, waterside → public.)
Identifying Recurring Privacy Themes
• A prototype semfeat-LDA vector for each user
• The centroid of the semfeat-LDA vectors of their private
images
• K-means (k=5) clustering on the prototype vectors
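A minimal sketch of the clustering step, assuming scikit-learn: each user is summarized by the centroid of the semfeat-LDA vectors of their private images, and the per-user prototypes are clustered with k-means. The data shapes are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Per user: semfeat-LDA vectors (30 topics) of a handful of private images
users = [rng.dirichlet(np.ones(30), size=int(rng.integers(5, 20)))
         for _ in range(27)]

prototypes = np.vstack([u.mean(axis=0) for u in users])  # one centroid per user
themes = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(prototypes)
print("Theme cluster per user:", themes)
```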
Would you share the following?
With whom would you share the photos in the
following slides:
a) family
b) friends
c) colleagues
d) your Facebook friends
e) everyone (public)
(Three example photos.)
Part V:
Future Directions
Towards Private Multimedia Systems
We should:
• Research methods to help mitigate risks and offer
choice.
• Develop privacy policies and APIs that take into
account multimedia retrieval.
• Educate users and engineers on privacy issues.
...before panic slows progress in the multimedia
field.
The Role of Research
Research can help:
• Describe and quantify risk factors
• Visualize and offer choices in UIs
• Identify privacy-breaking information
• Filter out “irrelevant information” through content
analysis
Reality Check
Can we build a privacy-proof system?
No. We can’t build a theft-proof car either.
However, we can make a system more (or less) privacy-protective.
Emerging Issue: Internet of Things
Graphic by Applied Materials using International Data Corporation data.
Emerging Issue: Wearables
Source: Amish Gandhi via SlideShare
Multimedia Things
• Much of the IoT data collected is multimedia data.
• Requires (exciting!) new approaches to real-time
multimedia content analysis.
• Presents new threats to security and privacy.
• Requires new best practices for Security and Privacy
by Design, and new privacy-enhancing technologies (PETs).
• Presents opportunities to work on privacy
enhancements to multimedia!
Example IoT Advice From Future of Privacy Forum
• Get creative with using multimedia affordances (visual,
audio, tactile) to alert users to data collection.
• Respect for context: Users may have different expectations
for data they input manually and data collected by sensors.
• Inform users about how their data will be used.
• Choose de-identification practices according to your
specific technical situation.
  • In fact, multimedia expertise can contribute to
improving de-identification!
• Build trust by allowing users to engage with their own
data, and to control who accesses it.
Source: Christopher Wolf, Jules Polonetsky, and Kelsey Finch, A Practical Privacy
Paradigm for Wearables. Future of Privacy Forum, 2015.
One Privacy Design Practice Above All
Think about privacy (and security) as you
BEGIN designing a system or planning a
research program.
Privacy is not an add-on!
Describing Risks
A Method from Security Research
• Build a model for potential attacks
as a set of:
• attacker properties
• attack goals
• Proof your system against it as much as possible.
• Update users’ expectations about residual risk.
Attacker Properties: Individual Privacy
• Resources
• individual/institutional/moderate resource
• Target Model
• targeted individual/easiest k of N/everyone
• Database access
• full (private, public) data access/well-indexed
access/poorly indexed access/hard retrieval/soft
retrieval (multimedia)
Goals of Privacy Attacks
• Cybercasing (attack preparation)
• Cyberstalking
• Socio-Economic profiling
• Espionage (industry, country)
• Cybervetting
• Cyberframing
Towards Privacy-Proof MM Systems
• Match users’ expectations of privacy in system
behavior (e.g. include user evaluation)
• If that’s not possible, educate users about risks
• Ask yourself: What is the best trade-off for the
users between privacy, utility, and convenience?
• Don’t expose as much information as possible,
expose only as much information as is required!
Engineering Rules From the Privacy Community
• Inform users of the privacy model and quantify the
possible audience:
• Public/link-to-link/semi-public/private
• How many people will see the information (avg. friends-
of-friends on Facebook: 70k people!)
• If users expect anonymity, explain the risks of
exposure
  • Self-posting of PII, hidden metadata, etc.
• Provide tools that make it easier to stay (more)
anonymous, based on expert knowledge (e.g. erase EXIF;
see the sketch below)
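A minimal sketch of one such tool, assuming Pillow: rebuilding the image from pixel data alone drops EXIF metadata, including any GPS tags. The file names are hypothetical:

```python
from PIL import Image

img = Image.open("photo_with_gps.jpg")
clean = Image.new(img.mode, img.size)
clean.putdata(list(img.getdata()))   # copy pixels only; metadata is left behind
clean.save("photo_clean.jpg")
```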
Engineering Rules from the Privacy Community
• Show users what metadata is collected by your
service/app and to whom it is made available (AKA
“Privacy Nutrition Label”)
• At the least, offer an opt-out!
• Make settings easily configurable (Facebook is not
easily configurable)
• Offer methods to delete and correct data
• If possible, trigger search engine updating after
deletion
• If possible, offer “deep deletion” (i.e. delete re-posts, at
least within-system)
Closing Thought Exercise: Part 1
Take two minutes to think about the following
questions:
• What’s your area of expertise? What are you
working on right now?
• How does it interact with privacy? What are the
potential attacks and potential consequences?
• What can you do to mitigate negative privacy
effects?
• What can you do to educate users about possible
privacy implications?
Closing Thought Exercise: Part 2
• Turn to the person next to you and share your
thoughts. Ask each other questions!
• You have five minutes.
Acknowledgments
Work together with:
• Jaeyoung Choi, Luke Gottlieb, Robin Sommer,
Howard Lei, Adam Janin, Oana Goga, Nicholas
Weaver, Dan Garcia, Blanca Gordo, Serge Egelman,
and others
• Georgios Petkos, Eleftherios Spyromitros-Xioufis,
Adrian Popescu, Rob Heyman, Georgios Rizos,
Polychronis Charitidis, Thomas Theodoridis and
others
Thank You!
Acknowledgements:
• This material is based upon work supported by the
US National Science Foundation under Grant Nos.
CNS-1065240 and DGE-1419319, and by the
European Commission under Grant No. 611596 for
the USEMP project.
• Any opinions, findings, and conclusions or
recommendations expressed in this material are
those of the authors and do not necessarily reflect
the views of the funding bodies.
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...Symeon Papadopoulos
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionSymeon Papadopoulos
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationSymeon Papadopoulos
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Symeon Papadopoulos
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingSymeon Papadopoulos
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSymeon Papadopoulos
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualitySymeon Papadopoulos
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentSymeon Papadopoulos
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetSymeon Papadopoulos
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionSymeon Papadopoulos
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterSymeon Papadopoulos
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersSymeon Papadopoulos
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Symeon Papadopoulos
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Symeon Papadopoulos
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceSymeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Symeon Papadopoulos
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsSymeon Papadopoulos
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsSymeon Papadopoulos
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Symeon Papadopoulos
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015Symeon Papadopoulos
 

Mais de Symeon Papadopoulos (20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia content
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air Quality
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering Detection
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging Performance
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online Discussions
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 

Último

AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 

Último (17)

AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 

large amounts of data.
Can we find people on vacation using YouTube?
Cybercasing on YouTube
Experiment: Cybercasing using the YouTube API
(240 lines in Python)
Cybercasing on YouTube
Input parameters:
• Location: 37.869885,-122.270539
• Radius: 100km
• Keywords: kids
• Distance: 1000km
• Time-frame: this_week
Cybercasing on YouTube
Output:
• Initial videos: 1000 (max_res)
• User hull: ~50k videos
• Vacation hits: 106
• Cybercasing targets: >12
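The 240-line tool itself is not shown in the slides. The following is a minimal sketch of its first step, finding recent keyword-matching uploads near a location, written against the current YouTube Data API v3 search endpoint (the original study used the older v2 API). The API key, parameter values, and helper name are placeholders, and the later "user hull" crawling and vacation filtering are omitted.

# Minimal sketch (not the original tool): query YouTube for recent,
# keyword-matching videos uploaded near a given location.
import datetime
import requests

API_KEY = "YOUR_API_KEY"  # assumption: a YouTube Data API v3 key

def nearby_videos(lat, lng, radius_km, keyword, days=7):
    """Return (videoId, title) pairs for recent uploads near (lat, lng)."""
    published_after = (datetime.datetime.utcnow()
                       - datetime.timedelta(days=days)).isoformat() + "Z"
    params = {
        "part": "snippet",
        "type": "video",
        "q": keyword,
        "location": f"{lat},{lng}",
        "locationRadius": f"{radius_km}km",
        "publishedAfter": published_after,
        "maxResults": 50,
        "key": API_KEY,
    }
    r = requests.get("https://www.googleapis.com/youtube/v3/search",
                     params=params, timeout=10)
    r.raise_for_status()
    return [(item["id"]["videoId"], item["snippet"]["title"])
            for item in r.json().get("items", [])]

# Input parameters from the experiment above:
# nearby_videos(37.869885, -122.270539, 100, "kids")

The original experiment then crawled each uploader's other videos (the "user hull") and flagged users whose recent uploads were far from their home region.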
The Threat Is Real!
Question
Do you think geo-tagging should be illegal?
a) No, people just have to be more careful. The possibilities still outweigh the risks.
b) Maybe it should be regulated somehow to make sure no harm can be done.
c) Yes, absolutely! This information is too dangerous.
But…
Is this really about geo-tags?
(remember: hypothesis)
But…
Is this really about geo-tags?
No, it’s about the privacy implications of multimedia retrieval in general.
Question
And now? What do you think should be done?
a) Nothing can be done. Privacy is dead.
b) I will think before I post, but I don’t know that it matters.
c) We need to educate people about this and try to save privacy. (Fight!)
d) I’ll never post anything ever again! (Flight!)
Observations
• Many applications encourage heavy data sharing, and users go along with it.
• Multimedia isn’t only a lot of data; it’s also a lot of implicit information.
• Both users and engineers are often unaware of the hidden retrieval possibilities of shared (multimedia) data.
• Local anonymization and privacy policies may be ineffective against cross-site inference.
Dilemma
• People will continue to want social networks and location-based services.
• Industry and research will continue to improve retrieval techniques.
• Governments will continue to do surveillance and intelligence-gathering.
Solutions That Don’t Work
• I blur the faces
• Audio and image artifacts can still give you away
• I only share with my friends
• But who are they sharing with, on what platforms?
• I don’t do social networking
• Others may do it for you!
Further Observations
• There is not much incentive to worry about privacy until things go wrong.
• People’s perception of the Internet does not match reality (closely enough).
Definition
• Privacy is “the right to be let alone.” (Justices Warren and Brandeis)
• Privacy is:
a) the quality or state of being apart from company or observation
b) freedom from unauthorized intrusion
(Merriam-Webster’s)
Starting Points
• Privacy is a human right. Every individual has a need to keep something about themselves private.
• Companies have a need for privacy.
• Governments have a need for privacy (currently heavily discussed).
Where We’re At (Legally)
Keep an eye out for multimedia inference!
A Taxonomy of Social Networking Data
• Service data: Data you give to an OSN to use it, e.g. name, birthday, etc.
• Disclosed data: What you post on your page/space
• Entrusted data: What you post on other people’s pages, e.g. comments
• Incidental data: What other people post about you
• Behavioural data: Data the site collects about you
• Derived data: Data that a third party infers about you based on all that other data
B. Schneier. A Taxonomy of Social Networking Data, IEEE Security & Privacy, vol. 8, no. 4, p. 88, July-Aug. 2010.
Privacy Bill of Rights
In February 2012, the US Government released
CONSUMER DATA PRIVACY IN A NETWORKED WORLD: A FRAMEWORK FOR PROTECTING PRIVACY AND PROMOTING INNOVATION IN THE GLOBAL DIGITAL ECONOMY
http://www.whitehouse.gov/sites/default/files/privacy-final.pdf
Privacy Bill of Rights
1) Individual Control: Consumers have a right to exercise control over what personal data organizations collect from them and how they use it.
2) Transparency: Consumers have a right to easily understandable and accessible information about privacy and security practices.
3) Respect for Context: Consumers have a right to expect that organizations will collect, use, and disclose personal data in ways consistent with the context in which consumers provide the data.
Privacy Bill of Rights
4) Security: Consumers have a right to secure and responsible handling of personal data.
5) Access and Accuracy: Consumers have a right to access and correct personal data in usable formats, in a manner that is appropriate to the sensitivity of the data and the risk of adverse consequences to citizens if the data is inaccurate.
Privacy Bill of Rights
6) Focused Collection: Consumers have a right to reasonable limits on the personal data that organizations collect and retain.
7) Accountability: Consumers have a right to have personal data handled by organizations with appropriate measures in place to assure they adhere to the Consumer Privacy Bill of Rights.
One View
The Privacy Bill of Rights could serve as a requirements framework for an ideally privacy-aware Internet service.
...if it were adopted.
Limitations
• The Privacy Bill of Rights is subject to interpretation.
• What is “reasonable”?
• What is “context”?
• What is “personal data”?
• The Privacy Bill of Rights presents technical challenges.
Personal Data Protection in EU
• The Data Protection Directive* (aka Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data) is an EU directive adopted in 1995 which regulates the processing of personal data within the EU. It is an important component of EU privacy and human rights law.
• The General Data Protection Regulation, in progress since 2012 and adopted in April 2016, will supersede the Data Protection Directive and be enforceable as of 25 May 2018.
• Objectives:
• Give control of personal data to citizens
• Simplify the regulatory environment for businesses
* A directive is a legal act of the European Union, which requires member states to achieve a particular result without dictating the means of achieving that result.
When is it legitimate…?
Collecting and processing the personal data of individuals is only legitimate in one of the following circumstances (Article 7 of the Directive):
• The individual gives unambiguous consent
• If data processing is needed for a contract (e.g. electricity bill)
• If processing is required by a legal obligation
• If processing is necessary to protect the vital interests of the person (e.g. processing medical data of an accident victim)
• If processing is necessary to perform tasks of public interest
• If the data controller or a third party has a legitimate interest in doing so, as long as this does not affect the interests of the data subject or infringe his/her fundamental rights
Obligations of data controllers in EU
Respect for the following rules:
• Personal data must be collected and used for explicit and legitimate purposes
• It must be adequate, relevant, and not excessive in relation to the above purposes
• It must be accurate and updated when needed
• Data subjects must be able to correct, remove, etc. incorrect data about themselves (access)
• Personal data should not be kept longer than necessary
• Data controllers must protect personal data (incl. from unauthorized access by third parties) using appropriate measures of protection (security, accountability)
Handling sensitive data
Definition of sensitive data in EU:
• religious beliefs
• political opinions
• health
• sexual orientation
• race
• trade union membership
Processing sensitive data comes under a stricter set of rules (Article 8).
Enforcing data protection in EU?
• The Directive states that every EU country must provide one or more independent supervisory authorities to monitor its application.
• In principle, all data controllers must notify their supervisory authorities when they process personal data.
• The national authorities are also in charge of receiving and handling complaints from individuals.
Data Protection: US vs EU
• The US has no legislation that is comparable to the EU’s Data Protection Directive.
• US privacy legislation is adopted on an ad hoc basis, e.g. when certain sectors and circumstances require it (HIPAA, CTPCA, FCRA).
• The US adopts a more laissez-faire approach.
• In general, US privacy legislation is considered “weaker” compared to the EU’s.
Example: What Is Sensitive Data?
Public records indicate you own a house.
Example: What Is Sensitive Data?
A geo-tagged photo taken by a friend reveals who attended your party!
Example: What Is Sensitive Data?
Facial recognition match with a public record: Prior arrest for drug offense!
Example: What Is Sensitive Data?
1) Public records indicate you own a house
2) A geo-tagged photo taken by a friend reveals who attended your party
3) Facial recognition match with a public record: Prior arrest for drug offense!
→ “You associate with convicts”
Example: What Is Sensitive Data?
“You associate with convicts”
What will this do for your reputation when you:
• Date?
• Apply for a job?
• Want to be elected to public office?
Example: What Is Sensitive Data?
But: Which of these is the sensitive data?
a) Public record: You own a house
b) Geo-tagged photo taken by a friend at your party
c) Public record: A friend’s prior arrest for a drug offense
d) Conclusion: “You associate with convicts.”
e) None of the above.
Who Is to Blame?
a) The government, for its Open Data policy?
b) Your friend who posted the photo?
c) The person who inferred data from publicly available information?
Part II: User Perceptions About Privacy
The Teaching Privacy Project
• Goal: Create a privacy curriculum for K-12 and undergrad, with lesson plans, teaching tools, visualizations, etc.
• NSF sponsored. (CNS-1065240 and DGE-1419319; all conclusions ours.)
• Check It Out: Info, public education, and teaching resources: http://teachingprivacy.org
Based on Several Research Strands
• Joint work between Friedland, Bernd, Serge Egelman, Dan Garcia, Blanca Gordo, and many others!
• Understanding of user perceptions comes from:
• Decades of research comparing privacy comprehension, preferences, concerns, and behaviors, including by Egelman and colleagues at CMU
• Research on new Internet users’ privacy perceptions, including Gordo’s evaluations of digital-literacy programs
• Observation of multimedia privacy leaks, e.g. the “cybercasing” study
• Reports from high school and undergraduate teachers about students’ misperceptions
• Summer programs for high schoolers interested in CS
Common Research Threads
• What happens on the Internet affects the “real” world.
• However: Group pressure, impulse, convenience, and other factors usually dominate decision making.
• This is aggravated by a lack of understanding of how sharing on the Internet really works.
• There is wide variation in both comprehension and actual preferences.
Multimedia Motivation
• Many current multimedia R&D applications have a high potential to compromise the privacy of Internet users.
• We want to continue pursuing fruitful and interesting research programs!
• But we can also work to mitigate negative effects by using our expertise to educate the public about effects on their privacy.
What Do People Need to Know?
Starting point: 10 observations about frequent misperceptions + 10 “privacy principles” to address them
Illustrations by Ketrina Yim.
Misconception #1
• Perception: I keep track of what I’m posting. I am in control. Websites are like rooms, and I know what’s in each of them.
• Reality: Your information footprint is larger than you think!
• An empty Twitter post has kilobytes of publicly available metadata.
• Your footprint includes what others post about you, hidden data attached by services, records of your offline activities… Not to mention inferences that can be drawn across all those “rooms”!
Misconception #2
• Perception: Surfing is anonymous. Lots of sites allow anonymous posting.
• Reality: There is no anonymity on the Internet.
• Bits of your information footprint (geo-tags, language patterns, etc.) may make it possible for someone to uniquely identify you, even without a name.
Misconception #3
• Perception: There’s nothing interesting about what I do online.
• Reality: Information about you on the Internet will be used by somebody in their interest, including against you.
• Every piece of information has value to somebody: other people, companies, organizations, governments...
• Using or selling your data is how Internet companies that provide “free” services make money.
Misconception #4
• Perception: Communication on the Internet is secure. Only the person I’m sending it to will see the data.
• Reality: Communication over a network, unless strongly encrypted, is never just between two parties.
• Online data is always routed through intermediary computers and systems…
• Which are connected to many more computers and systems...
Misconception #5
• Perception: If I make a mistake or say something dumb, I can delete it later. Anyway, people will get what I mean, right?
• Reality: Sharing information over a network means you give up control over that information, forever!
• The Internet never forgets. Search engines, archives, and reposts duplicate data; you can’t “unshare”.
• Websites sell your information, and data can be subpoenaed.
• Anything shared online is open to misinterpretation. The Internet can’t take a joke!
Misconception #6
• Perception: Facial recognition/speaker ID isn’t good enough to find this. As long as no one can find it now, I’m safe.
• Reality: Just because it can’t be found today doesn’t mean it can’t be found tomorrow.
• Search engines get smarter.
• Multimedia retrieval gets better.
• Analog information gets digitized.
• Laws, privacy settings, and privacy policies change.
Misconception #7
• Perception: What happens on the Internet stays on the Internet.
• Reality: The online world is inseparable from the “real” world.
• Your online activities are as much a part of your life as your offline activities.
• People don’t separate what they know about Internet-you from what they know about in-person you.
Misconception #8
• Perception: I don’t chat with strangers. I don’t “friend” people on Facebook that I don’t know.
• Reality: Are you sure? Identity isn’t guaranteed on the Internet.
• Most information that “establishes” identity in social networks may already be public.
• There is no foolproof way to match a real person with their online identity.
Misconception #9
• Perception: I don’t use the Internet. I am safe.
• Reality: You can’t avoid having an information footprint by not going online.
• Friends and family will post about you.
• Businesses and government share data about you.
• Companies track transactions online.
• Smart cards transmit data online.
Misconception #10
• Perception: There are laws that keep companies and people from sharing my data. If a website has a privacy policy, that means they won’t share my information. It’s all good.
• Reality: Only you have an interest in maintaining your privacy!
• Internet technology is rarely designed to protect privacy.
• “Privacy policies” are there to protect providers from lawsuits.
• Laws are spotty and vary from place to place.
• Like it or not, your privacy is your own responsibility!
What Came of All This?
Example: “Ready or Not?” educational app (LINK)
What Came of All This?
Example: “Digital Footprints” video
Study 2: Perceived vs. Actual Predictability of Personal Information in Social Nets
Papadopoulos and Kompatsiaris, with Eleftherios Spyromitros-Xioufis, Giorgos Petkos, and Rob Heyman (iMinds)
Personal Information in OSNs
Participation in OSNs comes at a price!
• User-related data is shared with: a) other OSN users, b) the OSN itself, c) third parties (e.g. ad networks)
• Disclosure of specific types of data: e.g. gender, age, ethnicity, political or religious beliefs, sexual preferences, employment status, etc.
• Information isn’t always explicitly disclosed!
• Several types of personal information can be accurately inferred based on implicit cues (e.g. Facebook likes) and machine learning! (cf. Part III)
Inferred Information & Privacy in OSNs
• The study of user awareness with regard to inferred information has been largely neglected by social research.
• Privacy is usually presented as a question of giving access or communicating personal information to some party, e.g.:
“The claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.” (Westin, 1970)
[1] Alan Westin. Privacy and Freedom. Bodley Head, London, 1970.
Inferred Information & Privacy in OSNs
• However, access control is non-existent for inferred information:
• Users are unaware of the inferences being made.
• Users have no control over the way inferences are made.
• Goal: Investigate whether and how users intuitively grasp what can be inferred from their disclosed data!
Main Research Questions
1. Predictability: How predictable are different types of personal information, based on users’ OSN data?
2. Actual vs. perceived predictability: How realistic are user perceptions about the predictability of their personal information?
3. Predictability vs. sensitivity: What is the relationship between perceived sensitivity and predictability of personal information?
• Previous work has focused mainly on Q1.
• We address Q1 using a variety of data and methods, and additionally we address Q2 and Q3.
Data Collection
• Three types of data about 170 Facebook users:
• OSN data: likes, posts, images, collected through a test Facebook application
• Answers to questions about 96 personal attributes, organized into 9 categories, e.g. health factors, sexual orientation, income, political attitude, etc.
• Answers to questions related to their perceptions about the predictability and sensitivity of the 9 categories
http://databait.eu
http://www.usemp-project.eu
Example From Questionnaire
• What is your sexual orientation? → ground truth
• Do you think the information on your Facebook profile reveals your sexual orientation? Either because you yourself have put it online, or because it could be inferred from a combination of posts. → perceived predictability
• How sensitive do you find the information you had to reveal about your sexual orientation? (1 = not sensitive at all, 7 = very sensitive) → perceived sensitivity
Responses (ground truth): heterosexual 147, homosexual 14, bisexual 7, n/a 2
Responses (perceived predictability): yes 134, no 33, n/a 3
Features Extracted From OSN Data
• likes: binary vector denoting presence/absence of a like (#3.6K)
• likesCats: histogram of like category frequencies (#191)
• likesTerms: Bag-of-Words (BoW) vector of terms in the description, title, and about sections of likes (#62.5K)
• msgTerms: BoW vector of terms in user posts (#25K)
• lda-t: distribution of topics in the textual contents of both likes (description, title, and about section) and posts
• Latent Dirichlet Allocation with t = 20, 30, 50, 100
• visual: concepts depicted in user images (#11.9K), detected using a CNN, top 12 concepts per image, 3 variants:
• visual-bin: hard 0/1 encoding
• visual-freq: concept frequency histogram
• visual-conf: sum of detection scores across all images
Experimental Setup
• Evaluation method: repeated random sub-sampling
• Data split randomly n = 10 times into train (67%) / test (33%)
• Model fit on train / accuracy of inferences assessed on test
• 96 questions (user attributes) were considered
• Evaluation measure: area under the ROC curve (AUC)
• Appropriate for imbalanced classes
• Classification algorithms:
• Baseline: k-nearest neighbors, decision tree, naïve Bayes
• SoA: AdaBoost, random forest, regularized logistic regression
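As a concrete illustration of this protocol, the following sketch runs the repeated 67/33 sub-sampling and reports the mean and standard deviation of the AUC. It assumes a feature matrix X (e.g. the binary likes vector) and a binary attribute vector y; the scikit-learn names are real, the data and the choice of logistic regression as the example classifier are placeholders.

# Sketch of the evaluation protocol: repeated random 67/33 sub-sampling,
# model fit on the train split, AUC measured on the held-out split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def repeated_subsampling_auc(X, y, n_repeats=10):
    aucs = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.33, random_state=seed)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
    return np.mean(aucs), np.std(aucs)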
Predictability per Attribute
(chart: AUC per attribute; examples shown include nationality, is employed, can be moody, smokes cannabis, plays volleyball)
What Is More Predictable?
Rank | Perceived predictability | Actual predictability (Δ vs. perceived) | SoA*
1 | Demographics | Demographics (–) | Demographics
2 | Relationship status and living condition | Political views (+3) | Political views
3 | Sexual orientation | Sexual orientation (–) | Religious views
4 | Consumer profile | Employment/Income (+4) | Sexual orientation
5 | Political views | Consumer profile (-1) | Health status
6 | Personality traits | Relationship status and living condition (-4) | Relationship status and living condition
7 | Religious views | Religious views (–) |
8 | Employment/Income | Health status (+1) |
9 | Health status | Personality traits (-3) |
* Kosinski et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
Personal Data: Truly Multimodal
• Text: posts, comments, content of articles you read/like, etc.
• Images/Videos: posted by you, liked by you, posted by others but containing you
• Resources: likes, visited websites, groups, etc.
• Location: check-ins, GPS of posted images, etc.
• Network: what your friends look like, what they post, what they like, the community where you belong
• Sensors: wearables, fitness apps, IoT
What Can Be Inferred?
A lot….
Three Main Approaches
• Content-based: What you post is what/where/how/etc. you are
• Supervised learning: Learn by example
• Network-based: Show me your friends and I’ll tell you who you are
Multimodal Location Estimation
We infer the location of a video based on its visual stream, audio stream, and tags:
• Use geo-tagged data as training data
• Allows faster search, inference, and intelligence-gathering, even without GPS.
G. Friedland, O. Vinyals, and T. Darrell: "Multimodal Location Estimation," pp. 1245-1251, ACM Multimedia, Florence, Italy, October 2010.
Intuition for the Approach
Example tag sets: {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}
• Node: geolocation of a video
• Edge: correlated locations (e.g. common tag, visual, acoustic features)
• Edge potential: strength of an edge (e.g. posterior distribution of locations given common tags)
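The actual system performs joint inference over visual, acoustic, and tag features. As a rough illustration of the tag intuition only, the sketch below implements the simplest possible baseline: place a test item at the mean location of the geo-tagged training items that share its rarest (most location-specific) tag, e.g. "campanile" rather than "berkeley". All names and the fallback behavior are illustrative assumptions, not the paper's method.

# Illustrative tag-only baseline (not the full multimodal system):
# place a test item at the mean location of training items sharing
# its rarest tag; rare tags tend to be the most location-specific.
from collections import defaultdict

def build_tag_index(training_items):
    """training_items: iterable of (tags, (lat, lng)) pairs."""
    index = defaultdict(list)
    for tags, coords in training_items:
        for tag in tags:
            index[tag].append(coords)
    return index

def estimate_location(test_tags, index):
    known = [t for t in test_tags if t in index]
    if not known:
        return None  # a real system would fall back to a global prior
    rarest = min(known, key=lambda t: len(index[t]))
    pts = index[rarest]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))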
MediaEval
J. Choi, G. Friedland, V. Ekambaram, K. Ramchandran: "Multimodal Location Estimation of Consumer Media: Dealing with Sparse Training Data," in Proceedings of IEEE ICME 2012, Melbourne, Australia, July 2012.
YouTube Cybercasing Revisited
YouTube cybercasing with geo-tags vs. multimodal location estimation:
                  Old Experiment   No Geo-Tags
Initial Videos    1000 (max)       107
User Hull         ~50k             ~2000
Potential Hits    106              112
Actual Targets    >12              >12
Account Linking
Can we link accounts based on their content?
Using Internet Videos: Dataset
Test videos from Flickr (~40 sec):
• 121 users to be matched, 50k trials
• 70% have heavy noise
• 50% speech
• 3% professional content
H. Lei, J. Choi, A. Janin, and G. Friedland: “Persona Linking: Matching Uploaders of Videos Across Accounts”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Prague, May 2011.
Matching Users Within Flickr
Algorithm:
1) Take 10 seconds of the soundtrack of a video
2) Extract the spectral envelope
3) Compare using the Manhattan distance
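A rough sketch of these three steps is shown below, using librosa for audio loading and an average log-magnitude STFT as the spectral envelope. The exact envelope features, window sizes, and decision threshold of the original study are not given in the slides, so these choices are assumptions.

# Rough sketch of the matching steps: load 10 s of audio, summarize it
# as a long-term log-spectral envelope, and compare envelopes with the
# Manhattan (L1) distance; a smaller distance suggests the same user.
import numpy as np
import librosa

def spectral_envelope(path, seconds=10, sr=16000):
    y, _ = librosa.load(path, sr=sr, duration=seconds)
    spec = np.abs(librosa.stft(y, n_fft=512))  # magnitude spectrogram
    return np.log(spec + 1e-8).mean(axis=1)    # average over time

def manhattan(a, b):
    return float(np.abs(a - b).sum())

# same_user = manhattan(spectral_envelope("video1.wav"),
#                       spectral_envelope("video2.wav")) < threshold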
User ID on Flickr Videos
Persona Linking Using Internet Videos
Result:
• On average, having 40 seconds in the test and training sets leads to a 99.2% chance of a true positive match!
Another Linkage Attack
Exploiting users’ online activity to link accounts:
• Link based on where and when a user is posting
• Attack model is individual targeting
• Datasets: Yelp, Flickr, Twitter
• Methods: location profile, timing profile
When a User Is Posting
Where a User Is Posting
(map: Twitter locations vs. Yelp locations)
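As an illustration of the timing-profile idea, the sketch below represents each account by a normalized 24-bin histogram of posting hours and ranks candidate accounts by similarity to the target. The similarity measure used in the actual study is not specified in the slides; cosine similarity here is an assumption, and all names are placeholders.

# Sketch of a timing profile: a normalized 24-bin histogram of posting
# hours; accounts with the most similar histograms are candidate links.
import numpy as np

def timing_profile(post_timestamps):
    """post_timestamps: iterable of datetime objects for one account."""
    hist = np.zeros(24)
    for ts in post_timestamps:
        hist[ts.hour] += 1
    return hist / max(hist.sum(), 1)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_candidates(target_posts, candidates):
    """candidates: dict of account -> list of datetimes; best match first."""
    t = timing_profile(target_posts)
    return sorted(candidates,
                  key=lambda acc: cosine(t, timing_profile(candidates[acc])),
                  reverse=True)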
De-Anonymization Model
(diagram: a targeted account, e.g. an identified Yelp user, is matched against a candidate list of accounts)
Datasets
• Three social networks: Yelp, Twitter, Flickr
• Two types of data sets:
• Ground truth data set:
• Yelp-Twitter: 2,363 → 342 (with geo-tags) → 57 (in SF Bay)
• Flickr-Twitter: 6,196 → 396 (with geo-tags) → 27 (in SF Bay)
• Candidate Twitter list data set: 26,204
Inferring Personal Information
• Supervised learning algorithms:
• Learn a mapping (model) from inputs x_i to outputs y_i by analyzing a set of training examples D = {(x_i, y_i)}, i = 1, …, N
• In this case:
• y_i corresponds to a personal user attribute, e.g. sexual orientation
• x_i corresponds to a set of predictive attributes or features, e.g. user likes
• Some previous results:
• Kosinski et al. [1]: likes features (SVD) + logistic regression: highly accurate inferences of ethnicity, gender, sexual orientation, etc.
• Schwartz et al. [2]: status updates (PCA) + linear SVM: highly accurate inference of gender
[1] Kosinski et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
[2] Schwartz et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 2013.
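A minimal sketch of the likes-based pipeline, in the spirit of Kosinski et al. [1]: reduce the sparse user-by-like matrix with truncated SVD and fit a regularized logistic regression. The component count and regularization strength are illustrative choices, not the paper's exact settings.

# Sketch of a Kosinski-style pipeline: dimensionality reduction of the
# sparse user-by-like matrix, followed by logistic regression.
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# likes: sparse matrix of shape (n_users, n_pages), 1 = user liked page
# y: binary attribute to infer (e.g. a ground-truth survey answer)
def fit_attribute_model(likes, y, n_components=100):
    model = make_pipeline(
        TruncatedSVD(n_components=n_components, random_state=0),
        LogisticRegression(C=1.0, max_iter=1000))
    return model.fit(likes, y)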
What Do Your Likes Say About You?
M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable from Digital Records of Human Behavior”. PNAS 110: 5802-5805, 2013.
Results: Prediction Accuracy
M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable from Digital Records of Human Behavior”. PNAS 110: 5802-5805, 2013.
The More You Like…
M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable from Digital Records of Human Behavior”. PNAS 110: 5802-5805, 2013.
Our Results: USEMP Dataset (Part II)
Testing different classifiers
Our Results: USEMP Dataset (Part II)
Testing different features
Our Results: USEMP Dataset (Part II)
Testing combinations of features
Caution: Reliability of Predictions
(diagram: an ensemble of N models, each trained on a random α% subset of the training set)
Caution: Reliability of Predictions
(charts, MyPersonality dataset subset: percentage of users for which individual models have low agreement (Sx < 0.5), and classification accuracy for those users)
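A sketch of this reliability check: train N models on random α% subsets of the training data and measure per-user agreement on the test set. The slides do not define Sx precisely; here it is taken as Sx = 2·|p − 0.5|, where p is the fraction of models voting for the positive class, which is consistent with Sx < 0.5 flagging split votes. Binary 0/1 labels are assumed.

# Sketch of the ensemble reliability check: train N models on random
# subsets of the training data and measure how much they agree per user.
import numpy as np
from sklearn.linear_model import LogisticRegression

def agreement_scores(X_train, y_train, X_test, n_models=10, alpha=0.7):
    rng = np.random.default_rng(0)
    votes = np.zeros((n_models, X_test.shape[0]))
    for m in range(n_models):
        idx = rng.choice(len(y_train), int(alpha * len(y_train)),
                         replace=False)
        clf = LogisticRegression(max_iter=1000).fit(X_train[idx],
                                                    y_train[idx])
        votes[m] = clf.predict(X_test)
    p = votes.mean(axis=0)      # fraction of positive votes per user
    return 2 * np.abs(p - 0.5)  # low score = unreliable prediction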
Conclusions
• Representing users as feature vectors and using supervised learning can achieve pretty good accuracy in several cases. However:
• There will be several cases where the output of the trained model is unreliable (close to random).
• For many classifiers, and for abstract feature representations (e.g. SVD), it is very hard to explain why a particular user has been classified as belonging to a given class.
Network-Based Learning
Show me your friends….
with Georgios Rizos
Network-Based Classification
• People with similar interests tend to connect → homophily
• Knowing about one’s connections could reveal information about them
• Knowing about the whole network structure could reveal even more…
My Social Circles
A variety of affiliations:
• Work
• School
• Family
• Friends
…
SoA: User Classification (1)
Graph-based semi-supervised learning:
• Label propagation (Zhu and Ghahramani, 2002)
• Local and global consistency (Zhou et al., 2004)
Other approaches to user classification:
• Hybrid feature engineering for inferring user behaviors (Pennacchiotti et al., 2011; Wagner et al., 2013)
• Crowdsourcing Twitter list keywords for popular users (Ghosh et al., 2012)
SoA: Graph Feature Extraction (2)
Use of community detection:
• EdgeCluster: edge-centric k-means (Tang and Liu, 2009)
• MROC: binary tree community hierarchy (Wang et al., 2013)
Low-rank matrix representation methods:
• Laplacian Eigenmaps: k eigenvectors of the graph Laplacian (Belkin and Niyogi, 2003; Tang and Liu, 2011)
• Random-Walk Modularity Maximization: does not suffer from the resolution limit of ModMax (Devooght et al., 2014)
• DeepWalk: deep representation learning (Perozzi et al., 2014)
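As a concrete example of the graph-based semi-supervised idea, here is a compact sketch of label propagation (Zhu and Ghahramani, 2002): repeatedly replace each node's label distribution with the average of its neighbors' while clamping the nodes whose labels are known. The dense adjacency matrix and fixed iteration count are simplifications for illustration.

# Compact sketch of label propagation (Zhu & Ghahramani, 2002).
import numpy as np

def label_propagation(A, labels, n_classes, iters=50):
    """A: (n, n) adjacency matrix; labels: int array, class id or -1."""
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1e-12)  # row-normalized transition matrix
    F = np.zeros((n, n_classes))
    known = labels >= 0
    F[known, labels[known]] = 1.0
    for _ in range(iters):
        F = P @ F                    # average neighbors' distributions
        F[known] = 0.0
        F[known, labels[known]] = 1.0  # clamp the known labels
    return F.argmax(axis=1)          # predicted class per node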
Overview of Framework
Online social interactions (retweets, mentions, etc.) → social interaction user graph → ARCTE → supervised graph feature representation (with partial/sparse annotation) → feature weighting → user label learning → classified users
Evaluation: Datasets
Ground truth generation:
• SNOW2014 Graph: Twitter list aggregation & post-processing
• IRMV-PoliticsUK: manual annotation
• ASU-YouTube: user membership to group
• ASU-Flickr: user subscription to interest group
Dataset | Labels | Vertices | Vertex Type | Edges | Edge Type
SNOW2014 Graph (Papadopoulos et al., 2014) | 90 | 533,874 | Twitter Account | 949,661 | Mentions + Retweets
IRMV-PoliticsUK (Greene & Cunningham, 2013) | 5 | 419 | Twitter Account | 11,349 | Mentions + Retweets
ASU-YouTube (Mislove et al., 2007) | 47 | 1,134,890 | YouTube Channel | 2,987,624 | Subscriptions
ASU-Flickr (Tang and Liu, 2009) | 195 | 80,513 | Flickr Account | 5,899,882 | Contacts
Example: Twitter
Twitter Handle | Labels
@nytimes | usa, press, new york
@HuffPostBiz | finance
@BBCBreaking | press, journalist, tv
@StKonrath | journalist
Examples from the SNOW 2014 Data Challenge dataset
Evaluation: SNOW 2014 dataset
SNOW2014 Graph (534K vertices, 950K edges): Twitter mentions + retweets, ground truth based on Twitter list processing
Evaluation: ASU-YouTube
ASU-YouTube (1.1M vertices, 3M edges): YouTube subscriptions, ground truth based on membership to groups
Solution 1: Disclosure Scoring Framework
with Georgios Petkos
Problem and Motivation
• Several studies have shown that privacy is a challenging issue in OSNs.
• Madejski et al. performed a study with 65 users, asking them to carefully examine their profiles → all of them identified a sharing violation.
• Information about a user may appear not only explicitly but also implicitly, and may therefore be inferred (also think of institutional privacy).
• Different users have different attitudes towards privacy and online information sharing (Knijnenburg, 2013).
Madejski et al., “A study of privacy setting errors in an online social network”. PERCOM, 2012.
Knijnenburg, “Dimensionality of information disclosure behavior”. IJHCS, 2013.
Disclosure Scoring
“A framework for quantifying the type of information one is sharing, and the extent of such disclosure.”
Requirements:
• It must take into account the fact that privacy concerns differ across users.
• Different types of information have different significance to users.
• It must take into account both explicit and inferred information.
Related Work
1. Privacy score [Liu10]: based on the concepts of visibility and sensitivity
2. Privacy Quotient and Leakage [Srivastava13]
3. Privacy Functionality Score [Ferrer10]
4. Privacy index [Nepali13]
5. Privacy Scores [Sramka15]
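A minimal sketch of the sensitivity-times-visibility formulation behind the privacy score of [Liu10]: each profile item contributes its sensitivity weighted by how visible the user's setting makes it. The item weights below are purely illustrative assumptions.

# Minimal sketch of a sensitivity-x-visibility privacy score in the
# spirit of [Liu10]: higher score = more disclosure risk.
def privacy_score(items):
    """items: list of (sensitivity, visibility) pairs, both in [0, 1]."""
    return sum(s * v for s, v in items)

# Example with illustrative weights: a birthday is mildly sensitive but
# fully public; a phone number is highly sensitive but friends-only.
score = privacy_score([(0.3, 1.0),   # birthday, visible to everyone
                       (0.9, 0.4)])  # phone number, visible to friends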
Types of Personal Information
aka Disclosure Dimensions
Overview of PScore
Observed data (URLs, likes, posts) → inference algorithms → disclosure dimensions and user attributes, each characterized by:
• Explicitly Disclosed / Inferred
• Value / Predicted Value
• Confidence of Prediction
• Level of Sensitivity
• Level of Disclosure
• Reach of Disclosure
Visualization
Bubble color/size is proportional to the disclosure score → red/big corresponds to more sensitive/risky.
Visualization
Hierarchical exploration of types of personal information.
http://usemp-mklab.iti.gr/usemp/
Solution 2: Personalized Privacy-Aware Image Classification
with Eleftherios Spyromitros-Xioufis and Adrian Popescu (CEA-LIST)
Privacy-Aware Image Classification
• Photo sharing may compromise privacy.
• Can we make photo sharing safer?
• Yes: build “private” image detectors
• Alerts whenever a “private” image is shared
• Personalization is needed because privacy is subjective!
• Would you share such an image?
• Does it depend on with whom?
Previous Work, and Limitations
• Focus on a generic (“community”) notion of privacy
• Models trained on PicAlert [1]: Flickr images annotated according to a common privacy definition
• Consequences:
• Variability in user perceptions is not captured
• Over-optimistic performance estimates
• Justifications are barely comprehensible
[1] Zerr et al., I know what you did last summer!: Privacy-aware image classification and search, CIKM, 2012.
Goals of the Study
• Study personalization in image privacy classification
• Compare personalized vs. generic models
• Compare two types of personalized models
• Semantic visual features
• Better justifications and privacy insights
• YourAlert: more realistic than existing benchmarks
Personalization Approaches
• Full personalization:
• A different model for each user, relying only on their feedback
• Disadvantage: requires a lot of feedback
• Partial personalization:
• Models rely on user feedback + feedback from other users
• Amount of personalization controlled via instance weighting
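A sketch of partial personalization via instance weighting, assuming dense feature arrays and scikit-learn: generic examples get weight 1 and the user's own examples weight w, mirroring the 'hybrid w=1' and 'hybrid w=2' variants evaluated later. The function and variable names are placeholders.

# Sketch of partial personalization via instance weighting: generic
# examples get weight 1, the user's own examples weight w.
import numpy as np
from sklearn.linear_model import LogisticRegression

def hybrid_model(X_generic, y_generic, X_user, y_user, w=2.0):
    X = np.vstack([X_generic, X_user])
    y = np.concatenate([y_generic, y_user])
    weights = np.concatenate([np.ones(len(y_generic)),
                              w * np.ones(len(y_user))])
    # w controls how strongly the model is personalized to this user
    return LogisticRegression(max_iter=1000).fit(X, y,
                                                 sample_weight=weights)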
  • 157. Visual and Semantic Features • vlad [1]: aggregation of local image descriptors • cnn [2]: deep visual features • semfeat [3]: outputs of ~17K concept detectors • Trained using cnn • Top 100 concepts per image [1] Spyromitros-Xioufis et al., A comprehensive study over vlad and product quantization in large-scale image retrieval. IEEE Transactions on Multimedia, 2014. [2] Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition, ArXiv, 2014. [3] Ginsca et al., Large-Scale Image Mining with Flickr Groups, MultiMedia Modeling, 2015.
  • 158. Explanations via Semfeat • Semfeat can be used to justify predictions • A tag cloud of the most discriminative visual concepts • Explanations may often be confusing • Concept detectors are not perfect • Semfeat vocabulary (ImageNet) is not privacy-oriented • Example of a confusing tag cloud: knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian
  • 159. semfeat-LDA: Enhanced Explanations • Project semfeat to a latent space (second-level semantic representation) • Images treated as text documents (top 10 concepts) • Text corpus created from private images (PicAlert + YourAlert) • LDA is applied to create a topic model (30 topics) • 6 privacy-related topics are identified (manually):
  Topic: Top 5 semfeat concepts assigned to each topic
  children: dribbler, child, godson, wimp, niece
  drinking: drinker, drunk, tipper, thinker, drunkard
  erotic: slattern, erotic, cover-girl, maillot, back
  relatives: great-aunt, second-cousin, grandfather, mother, great-grandchild
  vacations: seaside, vacationer, surf-casting, casting, sandbank
  wedding: groom, bride, celebrant, wedding, costume
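Conceptually, the step from concept detections to topics can be sketched as follows (a minimal illustration, not the authors' implementation; the example pseudo-documents and the library choice are ours):

```python
# Sketch of the semfeat-LDA idea: each image's top detected concepts are
# treated as a tiny text document, and an LDA topic model is fit on them.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative pseudo-documents (in practice: the top-10 semfeat
# concepts of each private image from PicAlert + YourAlert).
image_docs = [
    "groom bride celebrant wedding costume",
    "seaside vacationer casting sandbank",
    "drinker drunk tipper drunkard",
]

counts = CountVectorizer().fit_transform(image_docs)
lda = LatentDirichletAllocation(n_components=30, random_state=0)  # 30 topics
topic_vectors = lda.fit_transform(counts)  # one 30-dim topic vector per image
```

Privacy-related topics are then picked out by manually inspecting each topic's top concepts, as in the table above.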
  • 161. YourAlert: A Realistic Benchmark • User study • Participants annotate their own photos (informed consent; only extracted features shared) • Annotation based on the following definitions: • Private: “would share only with close OSN friends or not at all” • Public: “would share with all OSN friends or even make public” • Resulting dataset: YourAlert • 1.5K photos, 27 users, ~16 private / ~40 public photos per user • Main advantages: • Facilitates realistic evaluation of privacy models • Allows development of personalized models Publicly available at: http://mklab.iti.gr/datasets/image-privacy/
  • 162. Generic Models: PicAlert vs. YourAlert
  • 163. Key Findings • Almost perfect performance for PicAlert with CNN • semfeat performs similarly to CNN • Significantly worse performance for YourAlert • Similar performance for all features • Additional findings • Using more generic training examples does not help • Large variability in performance across users
  • 164. Personalized Privacy Models • Evaluation carried out on YourAlert • A modified k-fold cross-validation for unbiased estimates • Personalized model types: • ‘user’: only user-specific examples from YourAlert • ‘hybrid’: a mixture of user-specific examples from YourAlert and generic examples from PicAlert • User-specific examples are weighted higher
  • 165–173. Evaluation of Personalized Models (build-up diagram, shown in turn for model types ‘user’, ‘hybrid w=1’, and ‘hybrid w=2’) • For each YourAlert user (u1, u2, u3, ...): 3-fold cross-validation over their own photos, with each fold (e.g. k=1) held out in turn as the test set • ‘hybrid’ models additionally train on generic PicAlert examples
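Our reading of this evaluation scheme, as a hedged sketch (the exact protocol and classifier in the paper may differ; all names here are assumptions, and inputs are numpy arrays):

```python
# Per-user evaluation: 3-fold CV over one user's YourAlert photos.
# 'hybrid' models also train on generic PicAlert examples, with the
# user's own examples weighted w times higher.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def evaluate_user(X_user, y_user, X_gen, y_gen, w=2.0):
    scores = []
    for tr, te in StratifiedKFold(n_splits=3).split(X_user, y_user):
        X = np.vstack([X_gen, X_user[tr]])
        y = np.concatenate([y_gen, y_user[tr]])
        sw = np.concatenate([np.ones(len(y_gen)), w * np.ones(len(tr))])
        model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sw)
        scores.append(model.score(X_user[te], y_user[te]))
    return float(np.mean(scores))
```

The ‘user’ model type corresponds to training on the user's own folds alone, without the generic PicAlert examples.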
  • 175. Privacy Insights via Semfeat • Concepts most associated with private photos: child, mate, son • Concepts most associated with public photos: uphill, lakefront, waterside
  • 176. Identifying Recurring Privacy Themes • A prototype semfeat-LDA vector for each user • The centroid of the semfeat-LDA vectors of their private images • K-means (k=5) clustering on the prototype vectors
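A compact sketch of that clustering step (the data below is a random stand-in; k = 5 and the 30-dimensional topic vectors follow the slides, everything else is illustrative):

```python
# One prototype per user: the centroid of the semfeat-LDA topic vectors
# of that user's private images; k-means then groups users into themes.
import numpy as np
from sklearn.cluster import KMeans

def user_prototypes(per_user_topic_vectors):
    # per_user_topic_vectors: list of (n_private_images x 30) arrays
    return np.vstack([v.mean(axis=0) for v in per_user_topic_vectors])

# Illustrative stand-in: 27 users with ~16 private images each
prototypes = user_prototypes([np.random.rand(16, 30) for _ in range(27)])
themes = KMeans(n_clusters=5, n_init=10).fit_predict(prototypes)  # 5 themes
```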
  • 177. Would you share the following? With whom would you share the photos in the following slides: a) family, b) friends, c) colleagues, d) your Facebook friends, e) everyone (public)?
  • 182. Towards Private Multimedia Systems We should: • Research methods to help mitigate risks and offer choice. • Develop privacy policies and APIs that take into account multimedia retrieval. • Educate users and engineers on privacy issues. ...before panic slows progress in the multimedia field.
  • 183. The Role of Research Research can help: • Describe and quantify risk factors • Visualize and offer choices in UIs • Identify privacy-breaking information • Filter out “irrelevant information” through content analysis
  • 184. Reality Check Can we build a privacy-proof system? No. We can’t build a theft-proof car either. But just as cars can be made harder to steal, systems can be made more (or less) privacy-preserving.
  • 185. Emerging Issue: Internet of Things Graphic by Applied Materials using International Data Corporation data.
  • 186. Emerging Issue: Wearables Source: Amish Gandhi via SlideShare
  • 187. Multimedia Things • Much of the data collected by IoT devices is multimedia data. • Requires (exciting!) new approaches to real-time multimedia content analysis. • Presents new threats to security and privacy. • Requires new best practices for Security and Privacy by Design and new privacy-enhancing technologies (PETs). • Presents opportunities to work on privacy enhancements to multimedia!
  • 188. Example IoT Advice From Future of Privacy Forum • Get creative with using multimedia affordances (visual, audio, tactile) to alert users to data collection. • Respect for context: users may have different expectations for data they input manually and data collected by sensors. • Inform users about how their data will be used. • Choose de-identification practices according to your specific technical situation. • In fact, multimedia expertise can contribute to improving de-identification! • Build trust by allowing users to engage with their own data, and to control who accesses it. Source: Christopher Wolf, Jules Polonetsky, and Kelsey Finch, A Practical Privacy Paradigm for Wearables. Future of Privacy Forum, 2015.
  • 189. One Privacy Design Practice Above All Think about privacy (and security) as you BEGIN designing a system or planning a research program. Privacy is not an add-on!
  • 190. Describing Risks: A Method from Security Research • Build a model for potential attacks as a set of: • attacker properties • attack goals • Harden your system against that model as much as possible. • Set users’ expectations about the residual risk.
  • 191. Attacker Properties: Individual Privacy • Resources: individual / moderate / institutional • Target model: targeted individual / easiest k of N / everyone • Database access: full (private + public) data access / well-indexed access / poorly indexed access / hard retrieval / soft retrieval (multimedia)
  • 192. Goals of Privacy Attacks • Cybercasing (attack preparation) • Cyberstalking • Socio-Economic profiling • Espionage (industry, country) • Cybervetting • Cyberframing
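To make the method concrete, such an attacker model can be written down as a simple data structure (a sketch of our own; the field names and the example are not from the tutorial):

```python
# Hypothetical encoding of an attacker model: properties plus goal.
from dataclasses import dataclass

@dataclass
class AttackerModel:
    resources: str  # "individual" | "moderate" | "institutional"
    target: str     # "targeted individual" | "easiest k of N" | "everyone"
    db_access: str  # "full" | "well-indexed" | "poorly indexed" | "soft retrieval"
    goal: str       # e.g. "cybercasing", "cyberstalking", "cybervetting"

# Example: an opportunistic burglar preparing a physical-world crime
burglar = AttackerModel(resources="individual", target="easiest k of N",
                        db_access="well-indexed", goal="cybercasing")
```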
  • 193. Towards Privacy-Proof MM Systems • Match users’ expectations of privacy in system behavior (e.g. include user evaluation) • If that’s not possible, educate users about the risks • Ask yourself: what is the best trade-off for the users between privacy, utility, and convenience? • Don’t expose as much information as possible; expose only as much information as is required!
  • 194. Engineering Rules From the Privacy Community • Inform users of the privacy model and quantify the possible audience: • Public / link-to-link / semi-public / private • How many people will see the information (avg. friends-of-friends on Facebook: 70k people!) • If users expect anonymity, explain the risks of exposure • Self-posting of PII, hidden metadata, etc. • Provide tools that make it easier to stay (more) anonymous, based on expert knowledge (e.g. erase EXIF metadata, as sketched below)
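For the EXIF point, one possible tool sketch (Pillow is our choice here, not a tool the tutorial prescribes; services would typically strip metadata server-side on upload):

```python
# Minimal EXIF-stripping sketch with Pillow: copy only the pixel data
# into a fresh image, so EXIF (including GPS tags) is left behind.
from PIL import Image

def strip_exif(src_path, dst_path):
    img = Image.open(src_path)
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))
    clean.save(dst_path)
```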
  • 195. Engineering Rules from the Privacy Community • Show users what metadata is collected by your service/app and to whom it is made available (AKA “Privacy Nutrition Label”) • At the least, offer an opt-out! • Make settings easily configurable (Facebook is not easily configurable) • Offer methods to delete and correct data • If possible, trigger search engine updating after deletion • If possible, offer “deep deletion” (i.e. delete re-posts, at least within-system)
  • 196. Closing Thought Exercise: Part 1 Take two minutes to think about the following questions: • What’s your area of expertise? What are you working on right now? • How does it interact with privacy? What are the potential attacks and potential consequences? • What can you do to mitigate negative privacy effects? • What can you do to educate users about possible privacy implications?
  • 197. Closing Thought Exercise: Part 2 • Turn to the person next to you and share your thoughts. Ask each other questions! • You have five minutes.
  • 198. Acknowledgments Joint work with: • Jaeyoung Choi, Luke Gottlieb, Robin Sommer, Howard Lei, Adam Janin, Oana Goga, Nicholas Weaver, Dan Garcia, Blanca Gordo, Serge Egelman, and others • Georgios Petkos, Eleftherios Spyromitros-Xioufis, Adrian Popescu, Rob Heyman, Georgios Rizos, Polychronis Charitidis, Thomas Theodoridis, and others
  • 199. Thank You! Acknowledgements: • This material is based upon work supported by the US National Science Foundation under Grant No. CNS-1065240 and DGE-1419319, and by the European Commission under Grant No. 611596 for the USEMP project. • Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding bodies.