Innovation Opportunities and Key Challenges of Big and Open Data (巨量與開放資料之創新機會與關鍵挑戰)
Vincent S. Tseng (曾新穆)
Department of Computer Science
National Chiao Tung University
Taiwan
Starting with Some
Innovative Applications
Google Flu Trends
J. Ginsberg, et al.,
Detecting influenza epidemics using search engine query data,
Nature, February 2009
Link: www.google.com/flutrends
Application in Movie Industry
• The film "The Avengers": production cost of US$200 million
• How can the audience's interest and reaction be gauged?
• How should the best marketing strategy be set?
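Application in Movie Industry (cont.)
• Big Data Analytics was used to monitor and analyze social media reactions to the movie trailer:
  • 1.1 billion Tweets/min
  • 5.7 million Blogs/min
  • 3.5 million Messages/min
  • Extract key messages, analyze topics, and determine user intent → summarize users' views and ratings of the trailer
• The studio adjusted its marketing strategy according to the analysis results
• Box office of "The Avengers":
  • After its release in May 2012, the US domestic opening-week box office reached US$200 million (equal to the production cost), setting the record for the highest opening week in US film history
  • Total 2012 box office reached US$1.5 billion, ranking third in world film history, behind only "Avatar" and "Titanic"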
Architecture for Big Data Analytics
[Figure: architecture diagram. A high-performance computing platform (Cloud, Stream, In-Memory, …) hosts data access, data preparation, mining & learning (data mining, text mining, machine learning, statistical learning), rules retrieval, and prediction components. Structured and unstructured input data flow through data access, data preparation, modeling, and deployment to produce models/rules, predictive models, interesting patterns, reports, and presentation/applications.]
Tackling Some Key Challenges
• Data Preprocessing Phase
  • Data quality problem: noise, incompleteness, sparsity
  • Veracity issue: Is bigger always better?
• Data Understanding Phase
  • Key feature discovery: finding the needle in a haystack
• Learning and Modeling Phase
  • Timeliness vs. precision: issues for data sampling
  • Need for more sophisticated methodologies
• Post-processing Phase
Netflix Overview
Business Model
Personalized Recommendation System
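Personalized Recommendation System (cont.)
• Recommendation system & filtering system
• Uses Big Data Analytics to analyze customer preferences
• Offers non-blockbuster titles to balance and satisfy customer demand; non-blockbuster titles account for 70% of rentals
• When a recommended niche film turns out to be very good, the feeling is incomparable
• Three quarters of recommended titles are rated higher than the newest releases; this is the real value of the recommendation system
• The world's largest movie-rating database, far exceeding the service value competitors can provide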
Big Data in Netflix
• 62M+ subscribers in over 50 countries
• 4M ratings/day
• 3M searches/day
• 30M+ plays/day
• Streaming hours
  • 2B hours in Q1/2012
  • 10B hours in Q1/2015
Netflix Prize
• Grand Prize: $1M USD for a 10% improvement in prediction accuracy
• Progress Prize: $50,000 USD every year
• Started Oct. 2, 2006
• Ends Oct. 2, 2011
  • Or when a team reaches the 10% goal
(Ref: Netflix 2012)
Recommendation Problem:
- Collaborative Filtering-based Methods

User-based Collaborative Filtering
        itm1  itm2  itm3  itm4  itm5
Andre   ?     1     1     4     5
Ben     1     2     0     2     0
Juice   3     1     2     4     5
David   1     1     0     1     0

Item-based Collaborative Filtering
        itm1  itm2  itm3  itm4  itm5
Andre   ?     1     0     4     5
Ben     1     2     0     2     0
Juice   3     1     2     4     5
David   1     1     0     1     0

Unifying User-based and Item-based Collaborative Filtering
        itm1  itm2  itm4  itm3  itm5
Andre   ?     1     4     0     5
Ben     1     2     2     0     0
Juice   3     1     2     4     5
David   1     1     1     0     0
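As a rough illustration of user-based collaborative filtering, the sketch below predicts Andre's missing itm1 rating from the first table as a similarity-weighted average of the other users' itm1 ratings. The choice of cosine similarity, the treatment of 0 as an ordinary rating value, and the function names are assumptions made for this example; the slides do not specify them.

```python
from math import sqrt

# Ratings from the user-based table; None marks the unknown rating to predict.
ratings = {
    "Andre": [None, 1, 1, 4, 5],
    "Ben":   [1, 2, 0, 2, 0],
    "Juice": [3, 1, 2, 4, 5],
    "David": [1, 1, 0, 1, 0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(target, item, table):
    """Similarity-weighted average of other users' ratings for `item`."""
    others = [i for i in range(len(table[target])) if i != item]
    target_vec = [table[target][i] for i in others]
    num = den = 0.0
    for user, row in table.items():
        if user == target or row[item] is None:
            continue
        sim = cosine(target_vec, [row[i] for i in others])
        num += sim * row[item]
        den += abs(sim)
    return num / den if den else None

prediction = predict("Andre", 0, ratings)
print(f"Predicted rating of itm1 for Andre: {prediction:.2f}")
```

Item-based filtering follows the same pattern with the roles of rows and columns swapped: similarities are computed between item columns instead of user rows.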
Netflix Analytics Work
• Dataset consists of 100M+ training entries
  • Each training entry is a quadruplet
  • <user, movie, date, grade>, each an integer
• The qualifying dataset consists of 2.8M entries
  • <user, movie, date> without grades
• Error measure: RMSE (root mean square error)
RMSE Scores
• 0.8563 (10%): Grand Prize
• 0.8643 (9.15%): Leader
• 0.8667 (8.9%): Current progress
• 0.8712 (8.43%): Progress Prize winner 2007
• 0.9514 (0%): Netflix Cinematch
• 1.0540 (-10.78%): Movie average
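The percentages next to each RMSE score are relative improvements over the Cinematch baseline of 0.9514. A minimal sketch of both calculations follows; the function names are illustrative, not taken from the competition materials.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def improvement(baseline, score):
    """Relative RMSE improvement over the baseline (positive is better)."""
    return (baseline - score) / baseline

CINEMATCH = 0.9514
for label, score in [("Progress Prize 2007", 0.8712), ("Movie average", 1.0540)]:
    print(f"{label}: {improvement(CINEMATCH, score) * 100:.2f}%")

# The Grand Prize threshold is a 10% improvement over the baseline.
print(f"Grand Prize target RMSE: {CINEMATCH * 0.9:.4f}")
```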
2009 Grand Prize Winner: BellKor's Pragmatic Chaos
Challenges
• Data sparsity problem
  • Highly sparse data & cold-start problem: traditional approaches like CF are not feasible
    → Specialized methods are needed
  • Netflix Prize winner: BellKor's Pragmatic Chaos
• Gap between complex models and deployment
  • Winner's solution: a complex composition of hundreds/thousands of learned models
    → Hard to deploy in real applications
• Similar scenarios exist in many big data applications, and effective solutions are desired!
Is bigger always better?
- Veracity issue
Google Flu Trends
J. Ginsberg, et al.,
Detecting influenza epidemics using search engine query data,
Nature, February 2009
Link: www.google.com/flutrends
Google Flu Trends -- Idea
• Certain web search terms are good indicators of flu activity.
• Google Flu Trends uses aggregated search data on flu indicators.
• It estimates current flu activity around the world in real time.
• For example, Google Flu Trends detects increased flu activity two weeks before the CDC.
*CDC: Centers for Disease Control and Prevention
Google Flu Trends -- Model
• Data:
  • Look at all search queries in Google from 2003 to 2008
  • Several hundred billion individual searches in the United States
  • Keep track of only the 50 million most common queries
  • Keep a weekly count for each query
  • Also keep counts of each query by geographic region (requires use of geo-location from IP addresses: >95% accurate)
  • So counts for 50 million queries x 170 weeks x 9 regions
• Target variable to be predicted:
  • For each week, for each region:
    I(t) = percentage of physician visits that are ILI (as compiled by CDC)
• Input variable (query selection, then constructing the ILI-related query fraction):
    Q(t) = sum of top n highest-correlated queries / total number of queries that week
• "Model learning" (logistic regression):
    log( I(t) / [1 − I(t)] ) = α · log( Q(t) / [1 − Q(t)] ) + noise
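To make the model-learning step concrete, here is a small sketch that fits a straight line between the logit of the query fraction Q(t) and the logit of the ILI percentage I(t) by ordinary least squares. The data are synthetic, the intercept term is an assumption not shown on the slide, and the helper names are made up for this example.

```python
from math import exp, log

def logit(p):
    """Log-odds transform, defined for 0 < p < 1."""
    return log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: maps a real number back to (0, 1)."""
    return 1 / (1 + exp(-x))

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic weekly query fractions and noise-free ILI fractions (illustration only).
query_fraction = [0.001, 0.002, 0.004, 0.008, 0.016]
true_intercept, true_slope = 1.0, 0.9
ili_fraction = [inv_logit(true_intercept + true_slope * logit(q)) for q in query_fraction]

intercept, slope = fit_line([logit(q) for q in query_fraction],
                            [logit(i) for i in ili_fraction])

# Estimate the ILI fraction for a new week from its query fraction.
predicted = inv_logit(intercept + slope * logit(0.01))
print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted ILI={predicted:.4f}")
```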
The Parable of Google Flu: Traps in Big Data Analysis (Science, Mar. 2014)
Deep Understanding of Key Features
• A large-scale research initiative aimed at
  • Innovations around smartphone-based research
  • Collecting smartphone data in everyday-life conditions
  • Community-based evaluation of related mobile data analysis methodologies
• Data source: Lausanne Data Collection Campaign
User Profile/Behavior Modeling and Prediction
• Personal information
• Media files
• Device information
• Process
• Calendar
• Applications
• Social information
• Accelerometer
• System information
• Location information
• Call log
• Contacts
• Bluetooth
• GSM
• WLAN
• Sequence of place visits
MDC 2012 Tracks
• Main goals
  • User profile/behavior modeling and prediction
• Dedicated track
  • Demographic attribute prediction
    • Predict gender, age group, marital status, job type, etc. of a user
  • Semantic place prediction
    • Predict the semantic meaning of a user's visited places
  • Next place prediction
    • Predict the next destination of a user
Demographic Attribute Prediction
• One of the items: prediction of gender
Modeling Flow
Demographic Attribute Prediction
• Lots of features could be extracted from the data
  • 10,000+ features used by the winning team!
  • High accuracy achieved: 96%
  • Location features, media features, sensor features, …
Very high dimensional complexity
- Feasibility problem in real applications!
Is there some key/dominating feature?
Demographic Attribute Prediction (cont.)
• The accelerometer is actually a key/dominating feature!
  • Supports accuracy of around 95%
  • Underlying reasoning?
Very different behavior between males and females!
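One simple way to look for a dominating feature is to score each feature on its own and rank the results. The sketch below does this with a one-feature threshold rule on a tiny made-up dataset; the feature names, values, and the screening method itself are assumptions for illustration, not the procedure used in the MDC work.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of a single-feature threshold rule, in either direction."""
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values)):
        correct = sum((v >= threshold) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best

# Hypothetical per-user features; labels are a binary demographic attribute.
features = {
    "accelerometer_energy": [0.9, 0.8, 0.85, 0.2, 0.3, 0.25],
    "call_count":           [5, 1, 3, 4, 2, 6],
    "media_files":          [10, 40, 20, 30, 15, 35],
}
labels = [1, 1, 1, 0, 0, 0]

ranking = sorted(features,
                 key=lambda name: best_threshold_accuracy(features[name], labels),
                 reverse=True)
for name in ranking:
    print(name, round(best_threshold_accuracy(features[name], labels), 3))
```

A feature that alone reaches accuracy close to the full model, as the accelerometer does here, is a candidate for a much cheaper deployed model.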
Timeliness in Big Data Analytics
(Source: IBM white paper)
One Solution: Data Sampling
- Bias on Data Samples
• Twitter provides two main outlets for researchers to access tweets in real time:
  • Streaming API (~1% of all public tweets, free)
  • Firehose (100% of all public tweets, costly)
• Streaming API data is often used by researchers to validate hypotheses.
• How well does the sampled Streaming API data measure the true activity on Twitter?
Bias on Data Samples (cont.)
Source: [Huan Liu et al. AAAI ICWSM2013]
National Health Insurance Data
National Health Insurance Research Database (NHIRD) in Taiwan
• National Health Insurance (NHI)
  • Established on March 1, 1995
  • Serves 99.2% of the Taiwanese population (20M+)
  • Covers 92.62% of medical institutions
• Longitudinal Health Insurance Database (LHID)
  • Sampled from the NHIRD
  • Includes the health records of 951,044 people
  • 1997 to now
• Strongly representative of Taiwan
  • Every living region
  • Long time span: 15+ years
Reference: National Health Insurance, http://www.nhi.gov.tw
Linking with More Heterogeneous Datasets
[Figure: the NHI, CR, COD, and BR datasets are linked with environmental monitoring data, lab data & patient-reported outcomes, and sensor-based biomarker monitoring data through cloud computing to produce smart health risk alerts.]
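NHI Data Sampling Method
• Data content
  • One million people randomly drawn from those insured in 2010, based on the 2010 enrollment data file
• Sampling population
  • The 2010 enrollment data file provided by the National Health Insurance Administration was consolidated per person by "national ID number plus date of birth plus sex", yielding records for 27,378,403 people, which serve as the master data file.
• Sampling method
  • A random number generator produced at least one million random numbers (1,074,263 were actually obtained); the serial numbers matching one million of these random numbers were used to randomly select the required sample of insured persons.
  • The random numbers were generated with Oracle's DBMS_RANDOM package.
Source: National Health Insurance Research Database, http://nhird.nhri.org.tw/date_cohort.htm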
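The serial-number sampling described above can be sketched in a few lines. This is a Python analogue only: the original work used Oracle's DBMS_RANDOM, and the scaled-down sizes, seed, and function name here are assumptions for illustration.

```python
import random

def draw_sample(population_size, sample_size, seed=0):
    """Draw distinct serial numbers in 1..population_size uniformly at random."""
    rng = random.Random(seed)          # fixed seed so the draw is reproducible
    chosen = set()
    while len(chosen) < sample_size:   # duplicates are discarded, so keep drawing
        chosen.add(rng.randint(1, population_size))
    return sorted(chosen)

# Scaled-down stand-in for drawing 1,000,000 people out of 27,378,403.
sample = draw_sample(population_size=27_378, sample_size=1_000)
print(len(sample), sample[:5])
```

Because duplicate draws are discarded, more random numbers than the target sample size are usually needed, which matches the slide's note that more than one million random values were generated.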
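NHI Data Sampling Method (cont.)
• How the one-million-person sample was validated against the sampling population (the whole population)
  • The distributions of age, sex, and annual number of births, and the average insured amount, were compared between the one-million-person sample and the sampling population to check for differences
  • The figures were also compared with those published by the Ministry of the Interior
  • Chi-square analysis was used to assess how representative the one-million-person sample is of the sampling population
  • All results were below the 5% significance level
Source: National Health Insurance Research Database, http://nhird.nhri.org.tw/date_cohort.htm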
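A chi-square goodness-of-fit check of the kind mentioned above compares observed sample counts with the counts expected from the population proportions. The sketch below uses made-up sex counts and proportions; only the 5% critical value for one degree of freedom (3.841) is a standard table value.

```python
def chi_square(observed, expected_proportions):
    """Pearson chi-square statistic for observed counts vs. expected proportions."""
    total = sum(observed)
    return sum((o - p * total) ** 2 / (p * total)
               for o, p in zip(observed, expected_proportions))

CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

# Hypothetical example: sex distribution in a one-million sample vs. the population.
sample_counts = [498_900, 501_100]
population_proportions = [0.499, 0.501]

stat = chi_square(sample_counts, population_proportions)
print(f"chi-square = {stat:.4f}")
print("no significant difference" if stat < CRITICAL_5PCT_DF1 else "significant difference")
```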
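Disease Factor Analysis
Linked data is biased!
[Figure: the daily number of visits for disease X (from the LHID2000 one-million-person sample file) is linked by monitoring station and date with atmospheric environment data and air pollution data from monitoring stations.]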
Mining User Preference
- for POI Recommendation
Goal
• How can POI recommendation be done by utilizing a user's social network log (e.g., check-ins)?
Urban Point-of-Interest Recommendation by Mining User Check-in Behaviors
Josh Jia-Ching Ying, Eric Hsueh-Chan Lu, Wen-Ning Kuo and Vincent S. Tseng
2012 ACM SIGKDD Int'l Workshop on Urban Computing (UrbComp 2012)
Proposed Method – UPOI-Mine
[Figure: UPOI-Mine framework. Feature Extraction: from the LBSN dataset, construct Social Factor, Individual Preference, and POI Popularity. Relevance Learning: build the User-POI relevance matrix. POI Recommendation: on a user request, select the top-k nearest POIs, rank them, and return the POI recommending list.]
Social Factor (SF)
\[ Relation_{i,k} = w \cdot CheckSim_{i,k} + (1 - w) \cdot DisSim_{i,k} \]
\[ SF(user_i, POI_j) = \sum_{k=1}^{|F|} \left[ Relation_{i,k} \times Interest_{k,j} \right] \]
\[ Interest_{k,j} = \frac{checkin_{k,j}}{\sum_{s=1}^{|S|} checkin_{k,s}} \]
w: weight
F: friends of user i
S: the set of POIs
checkin_{k,*} = check-ins of user k at POI *
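The Social Factor can be read as "how much do my friends like this POI, weighted by how close each friend is to me". The sketch below computes it on toy data; CheckSim and DisSim are passed in as precomputed values because the slides do not define them, and all names and numbers are illustrative assumptions.

```python
def interest(checkins, k, j):
    """Share of user k's check-ins that fall on POI j."""
    total = sum(checkins[k].values())
    return checkins[k].get(j, 0) / total if total else 0.0

def relation(check_sim, dis_sim, w):
    """Weighted mix of check-in similarity and distance similarity."""
    return w * check_sim + (1 - w) * dis_sim

def social_factor(i, j, friends, checkins, check_sim, dis_sim, w=0.5):
    """Sum over user i's friends of Relation(i, k) * Interest(k, j)."""
    return sum(relation(check_sim[(i, k)], dis_sim[(i, k)], w) * interest(checkins, k, j)
               for k in friends[i])

# Toy data: u1 has two friends with different check-in habits.
friends = {"u1": ["u2", "u3"]}
checkins = {"u2": {"A": 3, "B": 1}, "u3": {"A": 1, "B": 1}}
check_sim = {("u1", "u2"): 0.8, ("u1", "u3"): 0.4}
dis_sim = {("u1", "u2"): 0.6, ("u1", "u3"): 0.2}

sf = social_factor("u1", "A", friends, checkins, check_sim, dis_sim)
print(f"SF(u1, A) = {sf:.3f}")
```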
Individual Preference (IP)
• Individual Preference (IP)
  • HPref_{i,h}: preference of user i for highlight h
  • CPref_{i,c}: preference of user i for category c
\[ IP(user_i, POI_j) = \alpha \sum_{c \in C} CPref_{i,c} \cdot I_{ctg(c)}(POI_j) + (1 - \alpha) \sum_{h \in H} HPref_{i,h} \cdot \frac{HCount_{h,j}}{\sum_{g \in H} HCount_{g,j}} \]
where α is the weight between the two terms and I(·) is an indicator function defined as
\[ I_{ctg(c)}(POI_j) = \begin{cases} 1 & \text{if } POI_j \in ctg(c) \\ 0 & \text{otherwise} \end{cases} \]
POI Popularity (PP)
• POI popularity
  • Relative popularity of a POI
  • Normalized based on category
\[ RP_j = \frac{checkins_j}{\sum_{POI_k \in CS_j} checkins_k} \]
where CS_j is the set of POIs in the same category as POI_j.
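A minimal sketch of category-normalized popularity follows. It reads "normalized based on category" as each POI's share of all check-ins within its own category, which is an interpretation of the slide rather than a confirmed definition; the POI names and counts are made up.

```python
def relative_popularity(checkins, category):
    """Each POI's check-ins divided by the total check-ins of its category."""
    totals = {}
    for poi, count in checkins.items():
        totals[category[poi]] = totals.get(category[poi], 0) + count
    return {poi: count / totals[category[poi]] for poi, count in checkins.items()}

# Toy data: two cafes share a category, the museum is alone in its category.
checkins = {"cafe_a": 30, "cafe_b": 10, "museum_c": 5}
category = {"cafe_a": "cafe", "cafe_b": "cafe", "museum_c": "museum"}

rp = relative_popularity(checkins, category)
print(rp)
```

Normalizing within a category keeps a busy cafe from being compared directly with a rarely visited museum.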
Relevance Estimation
• Target: to estimate the relevance of each user-POI pair, we use these features to learn a regression-tree model.

User ID  POI ID  SF     PP    IP     Relevance
1        A       0.2    0.1   0.001  3
1        B       0.05   0.2   0.1    5
1        C       0.004  0.1   0.9    1
…        …       …      …     …      …
N        D       0.5    0.15  0.06   2

Regression-Tree Model
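To show what learning a regression tree over (SF, PP, IP) looks like, here is a tiny from-scratch tree that greedily picks the split with the lowest squared error. It is trained only on the four rows visible in the table, so it is a toy; the tree-building details (depth limit, split rule) are assumptions and not the configuration used in the paper.

```python
def sse(ys):
    """Sum of squared errors around the mean of ys."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def build_tree(rows, targets, depth=0, max_depth=3):
    """Return a leaf value (float) or a (feature, threshold, left, right) node."""
    mean = sum(targets) / len(targets)
    if depth >= max_depth or len(set(targets)) == 1:
        return mean
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] < t]
            right = [i for i, r in enumerate(rows) if r[f] >= t]
            if not left or not right:
                continue
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return mean
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [targets[i] for i in left], depth + 1, max_depth),
            build_tree([rows[i] for i in right], [targets[i] for i in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] < t else right
    return tree

# (SF, PP, IP) feature rows and relevance targets from the table above.
rows = [(0.2, 0.1, 0.001), (0.05, 0.2, 0.1), (0.004, 0.1, 0.9), (0.5, 0.15, 0.06)]
relevance = [3, 5, 1, 2]

tree = build_tree(rows, relevance)
print([predict(tree, r) for r in rows])
```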
Experimental Evaluation
• Real dataset crawled from Gowalla
  • In the New York City area
  • 1,964,919 POIs
  • 18,159 people
  • 5,341,191 check-ins
  • 392,246 friendship links
Comparisons with Other Recommenders
Better way for modeling?
- UPOI-Walk
In ACM Transactions on Intelligent Systems and Technology, 2014
Motivation
• The existing models could not deal with such heterogeneous features well
• The existing models try to combine all features into one measure for building a single model → Bias!
[Figure: UPOI-Walk framework. Feature Extraction: from the LBSN dataset, construct Social Factor, Individual Preference, and POI Popularity. User-POI Graph Construction: build the User-POI graphs. Relevance Learning: a HITS-based random walk produces the User-POI relevance matrix. POI Recommendation: on a user request, select the top-k nearest POIs, rank them, and return the POI recommending list.]
Random Walk Model
HITS-based Random Walk
X. Cao, et al., "Mining significant semantic locations from GPS data," Proceedings of the VLDB Endowment, v.3 n.1-2, September 2010
[Figure: example graph with edge weights 0.3, 0.2, 0.4, 0.1.]
Given an m × n hits value matrix M:
\[ x_{POI}^{k+1} = \left( \alpha M_{col}^{T} + (1 - \alpha) U_1 \right) x_{user}^{k} \]
\[ x_{user}^{k+1} = \left( \alpha M_{row} + (1 - \alpha) U_2 \right) x_{POI}^{k+1} \]
Here α is the mixing weight, and U_1 and U_2 denote the second matrix in each mixture.
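Dynamic HITS-Based Random Walk
Network Set = {M, N, X, Y, Z, …}
Randomly select hits value matrices from the Network Set at each step:
\[ v_{POI}^{k+1} = \left( \alpha M_{col}^{T} + (1 - \alpha) U_1 \right) v_{user}^{k} \]
\[ v_{user}^{k+1} = \left( \alpha N_{row} + (1 - \alpha) U_2 \right) v_{POI}^{k+1} \]
\[ v_{POI}^{k+2} = \left( \alpha X_{col}^{T} + (1 - \alpha) U_1 \right) v_{user}^{k+1} \]
\[ v_{user}^{k+2} = \left( \alpha Y_{row} + (1 - \alpha) U_2 \right) v_{POI}^{k+2} \]
\[ v_{POI}^{k+3} = \left( \alpha Z_{col}^{T} + (1 - \alpha) U_1 \right) v_{user}^{k+2} \]
…
till converged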
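The alternating user/POI updates can be sketched as a damped power iteration. This is a simplified stand-in, not the UPOI-Walk algorithm itself: it uses a single fixed matrix instead of a randomly chosen one per step, a uniform restart term in place of the second matrix in each mixture, and re-normalizes each vector to sum to 1. The matrix values and parameter names are made up.

```python
def normalize(v):
    """Scale a non-negative vector so its entries sum to 1."""
    total = sum(v)
    return [x / total for x in v]

def hits_random_walk(M, alpha=0.85, iters=100, tol=1e-10):
    """Alternate POI and user score updates on an m x n user-by-POI matrix."""
    m, n = len(M), len(M[0])
    user, poi = [1 / m] * m, [1 / n] * n
    for _ in range(iters):
        # POI scores gather weight from users (transpose direction).
        new_poi = normalize([alpha * sum(M[i][j] * user[i] for i in range(m))
                             + (1 - alpha) / n for j in range(n)])
        # User scores gather weight from the freshly updated POI scores.
        new_user = normalize([alpha * sum(M[i][j] * new_poi[j] for j in range(n))
                              + (1 - alpha) / m for i in range(m)])
        delta = max(abs(a - b) for a, b in zip(new_poi + new_user, poi + user))
        poi, user = new_poi, new_user
        if delta < tol:  # stop once the scores no longer change
            break
    return user, poi

# Toy check-in counts: rows are users, columns are POIs.
M = [[3, 1, 0],
     [2, 1, 0],
     [1, 0, 1]]
user_scores, poi_scores = hits_random_walk(M)
print([round(s, 3) for s in poi_scores])
```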
Comparison with Existing Recommenders - NDCG
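NDCG (normalized discounted cumulative gain) scores a ranked list by giving more credit to relevant items placed near the top. The sketch below uses the common linear-gain form with a log2 position discount; the slides do not say which NDCG variant was used, so this form and the example relevance values are assumptions.

```python
from math import log2

def dcg(relevances):
    """Discounted cumulative gain: relevance at rank r is divided by log2(r + 1)."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Relevance of the POIs in the order a recommender returned them.
ranked = [3, 2, 3, 0, 1, 2]
print(f"NDCG = {ndcg(ranked):.4f}")
```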
Beautiful algorithms still matter a lot for Big Data Analytics!
Medical Cloud Project
National Health Insurance Data Value-Added Project
Early Prediction of Diseases
Huizinga, T. W. J., & van der Helm-van Mil, A. H. M. (2007). Prediction and prevention of rheumatoid arthritis. Revista Colombiana de Reumatología, 14(2), 106-114.
[Figure: timeline with the labels Very Early Detection (~X years), Early RA (12 months), Early RA (18 months), and RA Diagnosis.]
Analytics Framework
[Figure: analytics framework. Off-line: raw data → preprocessed data → target data → data mining techniques → discovered rules → classifier. On-line: the health records of a potential patient are fed to the morbidity risk prediction system, which returns the predicted risk to the doctor/hospital.]
Rules Produced
Too many rules! Postprocessing is essential!
Post-Processing – Rules Filtering
Rules:
Lift > 1: 11,004 rules
Lift = 1: 357 rules
Lift < 1: 7,543 rules
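Lift compares how often the two sides of a rule occur together with how often they would co-occur if they were independent: a lift above 1 means a positive association. The sketch below computes lift from counts and keeps only rules with lift above 1; the rule names and counts are invented for illustration.

```python
def lift(n_total, n_a, n_b, n_ab):
    """Lift of rule A -> B: P(A and B) / (P(A) * P(B)), computed from counts."""
    return (n_ab * n_total) / (n_a * n_b)

# Hypothetical rules: (name, records with A, records with B, records with both).
N = 1000
rules = [
    ("disease X -> RA", 100, 50, 20),
    ("disease Y -> RA", 200, 50, 10),
    ("disease Z -> RA", 400, 50, 5),
]

# Keep only rules whose antecedent raises the chance of the consequent.
interesting = [name for name, n_a, n_b, n_ab in rules if lift(N, n_a, n_b, n_ab) > 1]
print(interesting)
```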
Postprocessing: Literature Search (Pubmed)
[Figure: bar chart of PubMed literature counts for the diseases appearing in the discovered rules. Counts range from 0 to 4,557. Disease labels legible in the extraction include acute laryngopharyngitis, manic disorder, neoplasm of breast, adhesive capsulitis of shoulder, pterygium, vaginitis, conjunctivitis, cervical spondylosis, spinal stenosis, bronchitis, allergic rhinitis, fasciitis, coronary atherosclerosis, peptic ulcer, cataract, sicca syndrome, dyspepsia, anxiety, neuropathy, dermatitis, nephropathy, systemic lupus erythematosus, diabetes, osteoporosis, and asthma.]
A More Complete Framework
(in PLOS One 2015)
After Postprocessing: Interesting Rules
How to summarize, validate, and interpret the discovered results is the important last mile for Big Data Analytics!
Concluding Remarks: Grand Opportunities
• "Data is King": the age of data monetization
• Data vs. ideas vs. technologies
  • From data to idea
  • From idea to data
  • Utilization of the right technologies
• Visioning
  • Those who own valuable data can be king
  • Those who have no data but have innovative ideas can also be king
  • Innovative ideas + right tech on valued data => Smart King
Grand Challenges, Big Opportunities!
Thanks for your attention
Mais conteúdo relacionado

Mais procurados

Module 1 introduction to machine learning
Module 1  introduction to machine learningModule 1  introduction to machine learning
Module 1 introduction to machine learningSara Hooker
 
Barga Data Science lecture 1
Barga Data Science lecture 1Barga Data Science lecture 1
Barga Data Science lecture 1Roger Barga
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1Sara Hooker
 
Tales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareTales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareGreg Makowski
 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-surveyAdam Rabinovitch
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesRukshan Batuwita
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkRoger Barga
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI DutchJos van Dongen
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningPruet Boonma
 
Generative Adversarial Networks and Their Medical Imaging Applications
Generative Adversarial Networks and Their Medical Imaging ApplicationsGenerative Adversarial Networks and Their Medical Imaging Applications
Generative Adversarial Networks and Their Medical Imaging ApplicationsKyuhwan Jung
 
Strategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert MunroStrategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert MunroRobert Munro
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learningShishir Choudhary
 
[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies
[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies
[WI 2017] Context Suggestion: Empirical Evaluations vs User StudiesYONG ZHENG
 
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic ProfilesA Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic ProfilesBesnik Fetahu
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsAndre Freitas
 

Mais procurados (20)

Why Data Science is a Science
Why Data Science is a ScienceWhy Data Science is a Science
Why Data Science is a Science
 
Module 1 introduction to machine learning
Module 1  introduction to machine learningModule 1  introduction to machine learning
Module 1 introduction to machine learning
 
Barga Data Science lecture 1
Barga Data Science lecture 1Barga Data Science lecture 1
Barga Data Science lecture 1
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
 
Tales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareTales from an ip worker in consulting and software
Tales from an ip worker in consulting and software
 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-survey
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Kenett On Information NYU-Poly 2013
Kenett On Information NYU-Poly 2013Kenett On Information NYU-Poly 2013
Kenett On Information NYU-Poly 2013
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
Active learning
Active learningActive learning
Active learning
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited Talk
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI Dutch
 
Agile Deep Learning
Agile Deep LearningAgile Deep Learning
Agile Deep Learning
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Generative Adversarial Networks and Their Medical Imaging Applications
Generative Adversarial Networks and Their Medical Imaging ApplicationsGenerative Adversarial Networks and Their Medical Imaging Applications
Generative Adversarial Networks and Their Medical Imaging Applications
 
Strategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert MunroStrategies for Practical Active Learning, Robert Munro
Strategies for Practical Active Learning, Robert Munro
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learning
 
[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies
[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies
[WI 2017] Context Suggestion: Empirical Evaluations vs User Studies
 
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic ProfilesA Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge Graphs
 

Destaque

人口統計應用於選舉預測-蔡佳泓
人口統計應用於選舉預測-蔡佳泓人口統計應用於選舉預測-蔡佳泓
人口統計應用於選舉預測-蔡佳泓台灣資料科學年會
 
顏汝芳/從薪酬制度讀 CEO 的行為心理學
顏汝芳/從薪酬制度讀 CEO 的行為心理學顏汝芳/從薪酬制度讀 CEO 的行為心理學
顏汝芳/從薪酬制度讀 CEO 的行為心理學台灣資料科學年會
 
周世恩/資料分析前的奏曲 : 談資料收集的挑戰
周世恩/資料分析前的奏曲 : 談資料收集的挑戰周世恩/資料分析前的奏曲 : 談資料收集的挑戰
周世恩/資料分析前的奏曲 : 談資料收集的挑戰台灣資料科學年會
 
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)台灣資料科學年會
 
[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123台灣資料科學年會
 
讓數字說話:資料的公益責信應用
讓數字說話:資料的公益責信應用讓數字說話:資料的公益責信應用
讓數字說話:資料的公益責信應用台灣資料科學年會
 
從 2013 社群網絡活動看台灣社會發展趨勢
從 2013 社群網絡活動看台灣社會發展趨勢從 2013 社群網絡活動看台灣社會發展趨勢
從 2013 社群網絡活動看台灣社會發展趨勢台灣資料科學年會
 
劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義
劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義
劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義台灣資料科學年會
 
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析台灣資料科學年會
 
從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪
從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪
從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪台灣資料科學年會
 
莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例
莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例
莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例台灣資料科學年會
 
黃從仁/心理與行為資料中的因與果
黃從仁/心理與行為資料中的因與果黃從仁/心理與行為資料中的因與果
黃從仁/心理與行為資料中的因與果台灣資料科學年會
 
以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻
以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻
以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻台灣資料科學年會
 
軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中
軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中
軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中台灣資料科學年會
 
天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統
天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統
天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統台灣資料科學年會
 
林煜軒…œ/從手機解讀行為與心理
林煜軒…œ/從手機解讀行為與心理林煜軒…œ/從手機解讀行為與心理
林煜軒…œ/從手機解讀行為與心理台灣資料科學年會
 
吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室
吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室
吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室台灣資料科學年會
 

Destaque (20)

人口統計應用於選舉預測-蔡佳泓
人口統計應用於選舉預測-蔡佳泓人口統計應用於選舉預測-蔡佳泓
人口統計應用於選舉預測-蔡佳泓
 
顏汝芳/從薪酬制度讀 CEO 的行為心理學
顏汝芳/從薪酬制度讀 CEO 的行為心理學顏汝芳/從薪酬制度讀 CEO 的行為心理學
顏汝芳/從薪酬制度讀 CEO 的行為心理學
 
周世恩/資料分析前的奏曲 : 談資料收集的挑戰
周世恩/資料分析前的奏曲 : 談資料收集的挑戰周世恩/資料分析前的奏曲 : 談資料收集的挑戰
周世恩/資料分析前的奏曲 : 談資料收集的挑戰
 
海量視覺資料-孫民
海量視覺資料-孫民海量視覺資料-孫民
海量視覺資料-孫民
 
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
 
[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123
 
[系列活動] 機器學習速遊
[系列活動] 機器學習速遊[系列活動] 機器學習速遊
[系列活動] 機器學習速遊
 
讓數字說話:資料的公益責信應用
讓數字說話:資料的公益責信應用讓數字說話:資料的公益責信應用
讓數字說話:資料的公益責信應用
 
從 2013 社群網絡活動看台灣社會發展趨勢
從 2013 社群網絡活動看台灣社會發展趨勢從 2013 社群網絡活動看台灣社會發展趨勢
從 2013 社群網絡活動看台灣社會發展趨勢
 
劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義
劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義
劉正山/世代之爭爭什麼 ? 談談如何從調查資料挖掘出豐厚的意義
 
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
 
從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪
從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪
從網頁存取記錄瞭解使用者行為與網頁區塊貢獻分析-崔殷豪
 
莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例
莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例
莊坤達/資料科學與防疫應用的結合 : 以登革熱防治為例
 
黃從仁/心理與行為資料中的因與果
黃從仁/心理與行為資料中的因與果黃從仁/心理與行為資料中的因與果
黃從仁/心理與行為資料中的因與果
 
以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻
以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻
以健保資料分析對抗健康新聞的恐慌症候群-張俊鴻
 
軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中
軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中
軟工人的資料科學奇航-線上遊戲、網路學習與中華職棒 by 許懷中
 
天氣/氣候大數據的應用與展望
天氣/氣候大數據的應用與展望天氣/氣候大數據的應用與展望
天氣/氣候大數據的應用與展望
 
天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統
天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統
天下武功唯快不破:利用串流資料實做出即時分類器和即時推薦系統
 
林煜軒…œ/從手機解讀行為與心理
林煜軒…œ/從手機解讀行為與心理林煜軒…œ/從手機解讀行為與心理
林煜軒…œ/從手機解讀行為與心理
 
吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室
吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室
吳齊軒/漫談 R 的學習挑戰與 R 語言翻轉教室
 

Innovation Opportunities and Key Challenges of Big and Open Data (巨量與開放資料之創新機會與關鍵挑戰) - Vincent S. Tseng (曾新穆)

  • 1. Innovation Opportunities and Key Challenges of Big and Open Data. Vincent S. Tseng (曾新穆), Department of Computer Science, National Chiao Tung University, Taiwan
  • 3. Google Flu Trends. J. Ginsberg, et al., Detecting influenza epidemics using search engine query data, Nature, February 2009. Link: www.google.com/flutrends
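  • 4. Application in Movie Industry. The film The Avengers (復仇者聯盟) cost US$200 million to make. How can the studio learn what interests audiences and how they react? How should it set the best marketing strategy?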
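  • 5. Application in Movie Industry (cont.). Big Data analytics was used to monitor and analyze social media reactions to the movie trailer: 1.1 billion tweets/min, 5.7 million blog posts/min, 3.5 million messages/min. Key messages were extracted, topics analyzed, and user intent judged, yielding a summary of how users viewed and rated the trailer. The studio then adjusted its marketing strategy based on the analysis. Box office of The Avengers: after its release in May 2012, the US domestic opening-week box office reached US$200 million (equal to the production cost), the highest opening week in US film history. Total box office in 2012 reached US$1.5 billion, ranking third in worldwide box office history, behind only Avatar and Titanic.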
  • 6.
  • 7. Architecture for Big Data Analytics [diagram]. High-Performance Computing Platform (Cloud, Stream, In-Memory, …). Pipeline stages: Data Access, Data Preparation, Data Modeling, Deploy, Presentation/Applications. Input: structured and unstructured data. Components: data access, data preparation, mining and learning (data mining, text mining, machine learning, statistical learning), rules retrieval, prediction, applications module. Outputs: models/rules (clusters, association rules, predictive models, interesting patterns) and reports.
  • 8. Tackling Some Key Challenges. Data Preprocessing Phase: data quality problems (noise, incompleteness, sparsity); veracity issue (is bigger better?). Data Understanding Phase: key feature discovery (finding the needle in a haystack). Learning and Modeling Phase: timeliness vs. precision (issues for data sampling); need for more sophisticated methodologies. Post-processing Phase.
  • 9. Some Key Challenges. Data Preprocessing Phase: data quality problems (noise, incompleteness, sparsity); veracity issue (is bigger better?). Data Understanding Phase: key feature discovery (finding the needle in a haystack). Learning and Modeling Phase: timeliness vs. precision (issues for data sampling); need for more sophisticated methodologies. Post-processing Phase.
  • 10.
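  • 14. Personalized Recommendation System (cont.). Recommendation system and filtering system. Big Data analytics is used to analyze customer preferences. Non-popular titles are offered to balance and satisfy customer demand; they account for 70% of rentals. When an obscure film recommended to you turns out to be excellent, the feeling is incomparable. Three quarters of recommended titles are rated higher than newly released titles, and that is the real value of the recommendation system. The world's largest movie rating database delivers service value far beyond what competitors can offer.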
  • 15. Big Data in Netflix. 62M+ subscribers in over 50 countries; 4M ratings/day; 3M searches/day; 30M+ plays/day. Streaming hours: 2B hours in Q1 2012; 10B hours in Q1 2015.
  • 16. Netflix Prize. Grand Prize: US$1M for a 10% improvement in prediction accuracy. Progress Prize: US$50,000 every year. Running since Oct. 2, 2006; ends Oct. 2, 2011, or when a team reaches the 10% goal. (Ref: Netflix 2012)
  • 17. Recommendation Problem: Collaborative Filtering-based Methods
      User-based Collaborative Filtering:
                 itm1 itm2 itm3 itm4 itm5
        Andre     ?    1    1    4    5
        Ben       1    2    0    2    0
        Juice     3    1    2    4    5
        David     1    1    0    1    0
      Item-based Collaborative Filtering:
                 itm1 itm2 itm3 itm4 itm5
        Andre     ?    1    0    4    5
        Ben       1    2    0    2    0
        Juice     3    1    2    4    5
        David     1    1    0    1    0
      Unifying User-based and Item-based Collaborative Filtering:
                 itm1 itm2 itm4 itm3 itm5
        Andre     ?    1    4    0    5
        Ben       1    2    2    0    0
        Juice     3    1    2    4    5
        David     1    1    1    0    0
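The user-based variant on slide 17 can be illustrated with a minimal sketch. It is not the method used by Netflix or by the speaker: it assumes cosine similarity over the items Andre has already rated, treats the zeros in the table as ordinary rating values, and predicts the missing rating as a similarity-weighted average. The names `RATINGS`, `cosine`, and `predict_user_based` are hypothetical.

```python
from math import sqrt

# Ratings from the first table on slide 17; None marks the rating to predict.
# Assumption: zeros are treated as ordinary values, not as missing ratings.
RATINGS = {
    "Andre": [None, 1, 1, 4, 5],
    "Ben":   [1, 2, 0, 2, 0],
    "Juice": [3, 1, 2, 4, 5],
    "David": [1, 1, 0, 1, 0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_user_based(ratings, target_user, item):
    """Predict ratings[target_user][item] from similar users.

    Returns (prediction, similarities); prediction is None when no
    neighbour has a positive total similarity.
    """
    target = ratings[target_user]
    # Compare users only on the items the target user has rated.
    known = [i for i, r in enumerate(target) if r is not None and i != item]
    target_vec = [target[i] for i in known]

    sims = {}
    for user, row in ratings.items():
        if user == target_user or row[item] is None:
            continue
        sims[user] = cosine(target_vec, [row[i] for i in known])

    total = sum(sims.values())
    if total == 0:
        return None, sims
    # Similarity-weighted average of the neighbours' ratings for the item.
    prediction = sum(s * ratings[u][item] for u, s in sims.items()) / total
    return prediction, sims


prediction, similarities = predict_user_based(RATINGS, "Andre", 0)
print(f"Predicted rating of itm1 for Andre: {prediction:.2f}")
print("Most similar user:", max(similarities, key=similarities.get))
```

Under these assumptions Juice is Andre's nearest neighbour, so the prediction is pulled toward Juice's rating of itm1. An item-based variant would apply the same similarity to columns instead of rows.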
  • 18. Netflix Analytics Work. The dataset consists of 100M+ training entries. Each training entry is a quadruplet <user, movie, date, grade>, where each element is an integer. The qualifying dataset consists of 2.8M entries of the form <user, movie, date>, without grades. Error measure: RMSE (root mean square error).
  • 19. RMSE Scores. 0.8563 (10%) Grand Prize; 0.8643 (9.15%) Leader; 0.8667 (8.9%) Current progress; 0.8712 (8.43%) Progress Prize Winner 2007; 0.9514 (0%) Netflix Cinematch; 1.0540 (-10.78%) Movie Average
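The percentages on slide 19 are relative improvements over the Cinematch baseline RMSE of 0.9514. The sketch below shows how RMSE and that relative improvement can be computed; the function names are hypothetical and the sample ratings in the usage lines are made up for illustration.

```python
from math import sqrt

CINEMATCH_RMSE = 0.9514  # baseline score listed on the slide


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def improvement_pct(score, baseline=CINEMATCH_RMSE):
    """Relative RMSE improvement over the baseline, in percent."""
    return (baseline - score) / baseline * 100


# Made-up ratings, only to show the call.
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))

# Scores taken from the slide.
for label, score in [("Grand Prize", 0.8563),
                     ("Progress Prize Winner 2007", 0.8712),
                     ("Movie Average", 1.0540)]:
    print(f"{label}: {improvement_pct(score):.2f}%")
```

A negative value, as for the Movie Average predictor, means the score is worse than the baseline.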
  • 21. Challenges. Data sparsity problem: with highly sparse data and the cold-start problem, traditional approaches like CF are not feasible, so specialized methods are needed. Netflix Prize winner: BellKor's Pragmatic Chaos. Gap between complex models and deployment: the winner's solution is a complex composition of hundreds or thousands of learned models, which is hard to deploy in real applications. Similar scenarios exist in many big data applications, and effective solutions are desired!
  • 22. Some Key Challenges. Data Preprocessing Phase: data quality problems (noise, incompleteness, sparsity); veracity issue (is bigger better?). Data Understanding Phase: key feature discovery (finding the needle in a haystack). Learning and Modeling Phase: timeliness vs. precision (issues for data sampling); need for more sophisticated methodologies. Post-processing Phase.
  • 23. Is bigger always better? (Veracity issue)
  • 24. Google Flu Trends. J. Ginsberg, et al., Detecting influenza epidemics using search engine query data, Nature, February 2009. Link: www.google.com/flutrends
  • 25. Google Flu Trends: Idea. Certain web search terms are good indicators of flu activity. Google Flu Trends uses aggregated search data on these flu indicators to estimate current flu activity around the world in real time. For example, Google Flu Trends detects increased flu activity two weeks before the CDC. (CDC: Centers for Disease Control and Prevention)
  • 26. Google Flu Trends: Model
      Data: all search queries in Google from 2003 to 2008, several hundred billion individual searches in the United States. Only the 50 million most common queries are tracked, with a weekly count for each query and counts by geographic region (this requires geo-location from IP addresses, >95% accurate). This gives counts for 50 million queries x 170 weeks x 9 regions.
      Target variable to be predicted: for each week and each region, I(t) = percentage of physician visits that are ILI (influenza-like illness), as compiled by the CDC.
      Input variable (query selection, constructing the ILI-related query fraction): Q(t) = sum of the top n highest correlated queries / total number of queries that week.
      Model learning (logistic regression): log( I(t) / [1 - I(t)] ) = α log( Q(t) / [1 - Q(t)] ) + noise
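The model-learning step on slide 26 relates the log-odds of the ILI visit fraction I(t) to the log-odds of the query fraction Q(t). The sketch below fits that relation by ordinary least squares with a single multiplicative coefficient, called `alpha` here, and no intercept. This is an assumption about the form of the fit, not the original implementation. The data are synthetic, I(t) is expressed as a fraction rather than a percentage, and all function names are hypothetical.

```python
from math import exp, log


def logit(p):
    """Log-odds of a fraction p with 0 < p < 1."""
    return log(p / (1 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a fraction."""
    return 1 / (1 + exp(-x))


def fit_alpha(query_fractions, ili_fractions):
    """Least-squares alpha for logit(I) = alpha * logit(Q) + noise.

    Assumes a no-intercept fit; inputs are fractions in (0, 1).
    """
    xs = [logit(q) for q in query_fractions]
    ys = [logit(i) for i in ili_fractions]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict_ili(query_fraction, alpha):
    """Estimated ILI visit fraction for a given query fraction."""
    return inv_logit(alpha * logit(query_fraction))


# Synthetic weekly data generated from a known coefficient, for illustration.
TRUE_ALPHA = 0.5
weekly_q = [0.0001, 0.0002, 0.0005, 0.001]
weekly_ili = [inv_logit(TRUE_ALPHA * logit(q)) for q in weekly_q]

alpha = fit_alpha(weekly_q, weekly_ili)
print(f"fitted alpha = {alpha:.3f}")
print(f"estimated ILI fraction at Q = 0.0003: {predict_ili(0.0003, alpha):.4f}")
```

Because the synthetic data contain no noise, the fit recovers the generating coefficient. With real data the residual noise term would make the estimate approximate.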
  • 27. The Parable of Google Flu: Traps in Big Data Analysis (Science, Mar. 2014)
  • 28. Some Key Challenges. Data Preprocessing Phase: data quality problem (noise, incompleteness, sparsity); veracity issue (is bigger always better?). Data Understanding Phase: key feature discovery (finding the needle in a haystack). Learning and Modeling Phase: timeliness vs. precision (issues for data sampling); need for more sophisticated methodologies. Post-processing Phase.
  • 29. Deep Understanding of Key Features
  • 30. A large-scale research initiative aimed at: innovations around smartphone-based research; collecting smartphone data in everyday life conditions; community-based evaluation of related mobile data analysis methodologies. Data source: Lausanne Data Collection Campaign.
  • 31. User Profile/Behavior Modeling and Prediction. Available data: personal information, media files, device information, process, calendar, applications, social information, accelerometer, system information, location information, call log, contacts, Bluetooth, GSM, WLAN, sequence of place visits.
  • 32. MDC 2012 Tracks. Main goal: user profile/behavior modeling and prediction. Dedicated tracks: demographic attribute prediction (predict gender, age group, marital status, job type, etc. of a user); semantic place prediction (predict the semantic meaning of a user's visited places); next place prediction (predict the next destination of a user).
  • 33. Demographic Attribute Prediction. One of the items: prediction of gender.
  • 36. Demographic Attribute Prediction. Lots of features could be extracted from the data (location features, media features, sensor features, ...). 10,000+ features were used by the winning team! High accuracy achieved: 96%.
  • 37. Very high dimensional complexity: a feasibility problem in real applications! Is there some key/dominating feature among the location, media, and sensor features?
  • 38. Demographic Attribute Prediction (cont.). The accelerometer is actually a key/dominating feature! It supports accuracy of around 95%. Underlying reasoning?
  • 39. Very different behavior between males and females!
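One simple way to look for a key/dominating feature is to score each feature on its own and rank the results. The Python sketch below does this with a one-feature threshold rule (a decision stump). The feature names, the toy values, and the use of stumps are assumptions for illustration; the slides do not state how the accelerometer feature was identified.

```python
def stump_accuracy(values, labels):
    """Best training accuracy of a single-threshold rule on one feature.

    The rule predicts label 1 when value >= threshold; the flipped rule
    (predict 1 when value < threshold) is covered by taking 1 - accuracy.
    """
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values)):
        hits = sum((v >= threshold) == bool(y) for v, y in zip(values, labels))
        best = max(best, hits / n, 1 - hits / n)
    return best


def rank_features(rows, labels):
    """Rank features by how well each one alone separates the labels."""
    scores = {
        name: stump_accuracy([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one row per user, label 1/0 for the two classes.
rows = [
    {"accel_energy": 0.90, "n_photos": 12, "n_places": 5},
    {"accel_energy": 0.80, "n_photos": 3, "n_places": 7},
    {"accel_energy": 0.85, "n_photos": 20, "n_places": 4},
    {"accel_energy": 0.30, "n_photos": 15, "n_places": 6},
    {"accel_energy": 0.20, "n_photos": 4, "n_places": 5},
    {"accel_energy": 0.40, "n_photos": 18, "n_places": 8},
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels)
```

In this toy data the accelerometer-derived feature separates the two classes on its own, so it comes first in `ranking`. A real study would score on held-out data rather than training accuracy.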
  • 40. Some Key Challenges. Data Preprocessing Phase: data quality problem (noise, incompleteness, sparsity); veracity issue (is bigger always better?). Data Understanding Phase: key feature discovery (finding the needle in a haystack). Learning and Modeling Phase: timeliness vs. precision (issues for data sampling); need for more sophisticated methodologies. Post-processing Phase.
  • 41. Timeliness in Big Data Analytics (Source: IBM white paper)
  • 42. One Solution: Data Sampling - Bias on Data Samples. Twitter provides two main outlets for researchers to access tweets in real time: the Streaming API (~1% of all public tweets, free) and the Firehose (100% of all public tweets, costly). Streaming API data is often used by researchers to validate hypotheses. How well does the sampled Streaming API data measure the true activity on Twitter?
  • 43. Bias on Data Samples (cont.). Source: [Huan Liu et al., AAAI ICWSM 2013]
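One way to make the question on this slide concrete is to compare what a sample ranks as most frequent with what the full stream ranks as most frequent. The Python sketch below computes a top-k overlap. The hashtags, counts, and the two sampling schemes are made up for illustration; this is not the methodology of the cited study, and no Twitter API is called.

```python
from collections import Counter


def top_k(items, k):
    """The k most frequent items in a stream, most frequent first."""
    return [item for item, _ in Counter(items).most_common(k)]


def top_k_overlap(full_stream, sampled_stream, k):
    """Fraction of the full stream's top-k items found in the sample's top-k."""
    full_top = set(top_k(full_stream, k))
    sample_top = set(top_k(sampled_stream, k))
    return len(full_top & sample_top) / k


# Hypothetical "Firehose": 100 tweets, each reduced to one hashtag.
full_stream = ["#flu"] * 50 + ["#movie"] * 30 + ["#news"] * 15 + ["#music"] * 5

# An evenly spread 10% sample versus a sample taken only from the tail.
even_sample = full_stream[::10]
skewed_sample = full_stream[60:]

even_overlap = top_k_overlap(full_stream, even_sample, k=2)
skewed_overlap = top_k_overlap(full_stream, skewed_sample, k=2)
```

The evenly spread sample keeps both top hashtags, while the skewed one keeps only one of them, which is the kind of distortion a biased sample can introduce.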
  • 45. National Health Insurance Research Database in Taiwan. National Health Insurance (NHI): established on March 1, 1995; serves 99.2% of the Taiwanese population (20M+); covers 92.62% of medical institutions. Longitudinal Health Insurance Database (LHID): sampled from NHIRD; includes the health records of 951,044 people; 1997 to now. Strongly representative of Taiwan: every living region; long time span (15+ years). Reference: National Health Insurance, http://www.nhi.gov.tw
  • 46. Linking with More Heterogeneous Datasets. [Diagram labels: NHI, CR, COD, BR; environmental monitoring data; lab data & patient-reported outcome; sensor-based biomarker monitoring data; cloud computing; smart health risk alert]
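  • 47. NHI Data Sampling Method. Data content: 1 million people randomly drawn from those "insured in 2010" in the 2010 enrollment data file. Sampling population: the 2010 enrollment data file provided by the National Health Insurance Administration, consolidated per person by "national ID number plus date of birth plus sex," yields records for 27,378,403 people, which serve as the master file. Sampling method: a random number generator is used to produce at least 1 million random numbers (1,074,263 were actually obtained); the serial numbers matching 1 million of these random numbers are taken to randomly draw the required sample of insured persons. The random numbers are generated with Oracle's DBMS_RANDOM package. Source: National Health Insurance Research Database, http://nhird.nhri.org.tw/date_cohort.htm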
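The sampling step can be sketched in a few lines. The Python example below draws unique serial numbers without replacement from a numbered master file. It swaps in Python's `random` module for Oracle's DBMS_RANDOM, draws the sample in one call rather than generating surplus random numbers first, and uses sizes scaled down by a factor of 1,000; the function name and seed are hypothetical.

```python
import random


def draw_sample(population_size, sample_size, seed=None):
    """Draw unique serial numbers (1-based) uniformly at random.

    random.sample draws without replacement, so no serial number repeats.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Scaled-down stand-in for 1,000,000 drawn out of 27,378,403 records.
sample_ids = draw_sample(population_size=27_378, sample_size=1_000, seed=42)
```

Fixing the seed makes the draw reproducible, which helps when the sample later has to be validated against the master file.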
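  • 48. NHI Data Sampling Method (cont.). Validating the 1-million sample against the sampling population (the entire population): compare the distributions of age, sex, and number of births per year, as well as the average insured amount, between the 1-million sample and the sampling population to check for differences; also compare with the figures published by the Ministry of the Interior; use chi-square analysis to assess how well the 1-million sample represents the sampling population; all results are below the 5% significance level. Source: National Health Insurance Research Database, http://nhird.nhri.org.tw/date_cohort.htm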
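A chi-square goodness-of-fit check of this kind can be sketched as follows. The Python example compares the sex distribution of a sample with the population shares. The counts and shares are hypothetical; 3.841 is the standard chi-square critical value for 1 degree of freedom at the 5% level.

```python
def chi_square_statistic(observed_counts, population_proportions):
    """Pearson chi-square statistic of sample counts against population shares."""
    total = sum(observed_counts)
    statistic = 0.0
    for observed, proportion in zip(observed_counts, population_proportions):
        expected = total * proportion
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_5PCT_DF1 = 3.841

# Hypothetical numbers: a sample of 10,000 people from a 50/50 population.
sample_sex_counts = [5050, 4950]
population_sex_share = [0.5, 0.5]

stat = chi_square_statistic(sample_sex_counts, population_sex_share)
no_significant_difference = stat < CRITICAL_5PCT_DF1
```

Here the statistic is 1.0, below the critical value, so the sample's sex distribution is not significantly different from the population's at the 5% level. Age groups or birth years would use more categories and a correspondingly larger critical value.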
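  • 49. Disease Factor Analysis: linked data is biased! [Diagram labels: atmospheric environment data (monitoring stations); air pollution data (monitoring stations); station; date; daily number of patient visits for disease X. Uses the LHID2000 one-million sample file.]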
  • 50. Some Key Challenges. Data Preprocessing Phase: data quality problem (noise, incompleteness, sparsity); veracity issue (is bigger always better?). Data Understanding Phase: key feature discovery (finding the needle in a haystack). Learning and Modeling Phase: timeliness vs. precision (issues for data sampling); need for more sophisticated methodologies. Post-processing Phase.
  • 51. Mining User Preference for POI Recommendation
  • 52. Goal: how to do POI recommendation by utilizing a user's social network log (e.g., check-ins)?
  • 53. Urban Point-of-Interest Recommendation by Mining User Check-in Behaviors. Josh Jia-Ching Ying, Eric Hsueh-Chan Lu, Wen-Ning Kuo, and Vincent S. Tseng. 2012 ACM SIGKDD Int'l Workshop on Urban Computing (UrbComp 2012)
  • 54. Proposed Method: UPOI-Mine. [Diagram components: LBSN dataset; feature extraction (social factor, individual preference, POI popularity); user-POI graph construction; relevance learning; user-POI relevance matrix; user request; top-k nearest POI selection; top-k nearest POIs; POI ranking; POI recommending list (POI recommendation)]
  • 55. Social Factor (SF): $SF(user_i, POI_j) = \sum_{k=1}^{|F|} [Relation_{i,k} \times Interest_{k,j}]$, where $Relation_{i,k} = w \cdot CheckSim_{i,k} + (1 - w) \cdot DisSim_{i,k}$ with weight $w$, and $Interest_{k,j} = checkin_{k,j} / \sum_{s=1}^{|S|} checkin_{k,s}$. F: friends of user i; S: the set of POIs; $checkin_{k,*}$: check-ins of user k at POI *.
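The social factor can be sketched in Python as below. The sketch reads the slide as a sum over the user's friends of Relation times Interest, with Relation a w-weighted mix of CheckSim and DisSim and Interest the friend's share of check-ins at the POI; that reading, the data layout, and the toy numbers are assumptions. CheckSim and DisSim are taken as given inputs because the slide does not define them.

```python
def relation(check_sim, dis_sim, w):
    """Weighted mix of check-in similarity and distance similarity."""
    return w * check_sim + (1.0 - w) * dis_sim


def interest(friend_checkins, poi):
    """Share of a friend's check-ins that were made at the given POI."""
    total = sum(friend_checkins.values())
    return friend_checkins.get(poi, 0) / total if total else 0.0


def social_factor(friends, poi, w=0.5):
    """Sum over friends of relation strength times interest in the POI.

    Each friend is a dict with 'check_sim', 'dis_sim', and 'checkins'
    (a mapping from POI name to check-in count).
    """
    return sum(
        relation(f["check_sim"], f["dis_sim"], w) * interest(f["checkins"], poi)
        for f in friends
    )


# Hypothetical friends of one user.
friends = [
    {"check_sim": 0.8, "dis_sim": 0.4, "checkins": {"cafe": 3, "museum": 1}},
    {"check_sim": 0.2, "dis_sim": 0.6, "checkins": {"cafe": 1, "park": 1}},
]

sf_cafe = social_factor(friends, "cafe", w=0.5)
```

For the toy data, the first friend contributes 0.6 x 0.75 and the second 0.4 x 0.5, giving a social factor of 0.65 for the cafe.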
  • 56. Individual Preference (IP): combines highlight preference $HPref_{i,h}$ and category preference $CPref_{i,c}$: $IP(user_i, POI_j) = \alpha \sum_{c \in C} CPref_{i,c} \cdot I_{ctg(c)}(POI_j) + (1 - \alpha) \sum_{h \in H} HPref_{i,h} \cdot \frac{HCount_{h,j}}{\sum_{g \in H} HCount_{g,j}}$, where $I(\cdot)$ is an indicator function defined as $I_{ctg(c)}(POI_j) = 1$ if $ctg(POI_j) = c$, and 0 otherwise.
  • 57. POI Popularity (PP): relative popularity of a POI, normalized based on category: $RP_j = checkins_j / \sum_{POI_k \in CS_j} checkins_k$, where $CS_j$ is the set of POIs which are in the same category with $POI_j$.
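Relative popularity, read here as a POI's check-ins divided by the total check-ins of all POIs in its category, can be sketched as follows. The POI names, categories, and counts are hypothetical.

```python
def relative_popularity(checkins, category, poi):
    """Check-ins at a POI divided by all check-ins within its category."""
    same_category_total = sum(
        count for other, count in checkins.items()
        if category[other] == category[poi]
    )
    return checkins[poi] / same_category_total if same_category_total else 0.0


# Hypothetical check-in counts and category labels.
checkins = {"cafe_a": 30, "cafe_b": 10, "museum_a": 5}
category = {"cafe_a": "cafe", "cafe_b": "cafe", "museum_a": "museum"}

rp_cafe_a = relative_popularity(checkins, category, "cafe_a")
rp_museum_a = relative_popularity(checkins, category, "museum_a")
```

Normalizing within a category keeps a busy category (such as cafes) from drowning out the most popular POI of a quieter one: `museum_a` scores 1.0 despite having the fewest check-ins overall.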
  • 58. Relevance Estimation. Target: to estimate the relevance of each user-POI pair, we use these features to learn a regression-tree model.
      User ID | POI ID | SF    | PP   | IP    | Relevance
      1       | A      | 0.2   | 0.1  | 0.001 | 3
      1       | B      | 0.05  | 0.2  | 0.1   | 5
      1       | C      | 0.004 | 0.1  | 0.9   | 1
      …       | …      | …     | …    | …     | …
      N       | D      | 0.5   | 0.15 | 0.06  | 2
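To show what learning a regression tree on (SF, PP, IP) features involves, here is a minimal Python sketch of a greedy regression tree that minimizes squared error. It is a generic textbook construction, not the model from the paper; the four training rows are the example rows shown on the slide, and the depth limit is an arbitrary choice.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=3):
    """Greedy regression tree: pick the split with the lowest total SSE."""
    if depth >= max_depth or len(y) < 2 or sse(y) == 0:
        return {"leaf": mean(y)}
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted(set(row[feature] for row in X)):
            left = [i for i, row in enumerate(X) if row[feature] <= threshold]
            right = [i for i, row in enumerate(X) if row[feature] > threshold]
            if not left or not right:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, feature, threshold, left, right)
    if best is None:
        return {"leaf": mean(y)}
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean relevance."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Feature rows (SF, PP, IP) and relevance values from the slide's example table.
X = [[0.2, 0.1, 0.001], [0.05, 0.2, 0.1], [0.004, 0.1, 0.9], [0.5, 0.15, 0.06]]
y = [3, 5, 1, 2]

tree = build_tree(X, y)
predictions = [predict(tree, row) for row in X]
```

With only four rows the tree memorizes the training data; on a real check-in dataset the depth and minimum leaf size would need tuning to avoid overfitting.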
  • 59. Experimental Evaluation. Real dataset crawled from Gowalla in the New York City area: 1,964,919 POIs; 18,159 people; 5,341,191 check-ins; 392,246 friendship links.
  • 61. A better way for modeling? UPOI-Walk. In ACM Transactions on Intelligent Systems and Technology, 2014
  • 62. Motivation. The existing models could not deal well with such heterogeneous features. The existing models try to combine all features into one measure for building a single model → Bias! [Diagram components: LBSN dataset; feature extraction (social factor, individual preference, POI popularity); user-POI graph construction; user-POI graphs; relevance learning with HITS-based random walk; user-POI relevance matrix; user request; top-k nearest POI selection; top-k nearest POIs; POI ranking; POI recommending list (POI recommendation)]
  • 64. HITS-based Random Walk. Random walk reference: X. Cao, et al., "Mining significant semantic locations from GPS data," Proceedings of the VLDB Endowment, v.3 n.1-2, September 2010. Given an m × n hits value matrix M, the POI and user score vectors are updated alternately: $x_{POI}^{(k+1)}$ is computed from $M_{col}^{T}$ and $x_{user}^{(k)}$, and $x_{user}^{(k+1)}$ from $M_{row}$ and $x_{POI}^{(k+1)}$, each update also carrying a $(1 - \cdot)$-weighted term.
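The mutual reinforcement behind a HITS-style walk can be illustrated with plain HITS power iteration on a user-by-POI matrix. The Python sketch below is only that plain form: it does not reproduce the slide's exact update (the column/row-normalized matrices and the weighting terms are left out), and the check-in matrix is hypothetical.

```python
def normalize(vector):
    """Scale a non-negative vector so that its entries sum to 1."""
    total = sum(vector)
    return [v / total for v in vector]


def hits_scores(M, iterations=50):
    """Alternate POI and user score updates on an m x n matrix M.

    M[i][j] is the link weight between user i and POI j. POIs score high
    when visited by high-scoring users, and users score high when they
    visit high-scoring POIs.
    """
    n_users, n_pois = len(M), len(M[0])
    user = [1.0 / n_users] * n_users
    poi = [1.0 / n_pois] * n_pois
    for _ in range(iterations):
        poi = normalize([
            sum(M[i][j] * user[i] for i in range(n_users))
            for j in range(n_pois)
        ])
        user = normalize([
            sum(M[i][j] * poi[j] for j in range(n_pois))
            for i in range(n_users)
        ])
    return user, poi


# Hypothetical check-in counts: 2 users (rows) x 3 POIs (columns).
checkin_matrix = [
    [3, 1, 0],
    [2, 0, 1],
]

user_scores, poi_scores = hits_scores(checkin_matrix)
```

In this toy matrix the first POI, visited most by both users, ends up with the highest score, and the first user, who visits it most, outranks the second.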
  • 65. Dynamic HITS-Based Random Walk. Network Set = {M, N, X, Y, Z, …}. At each step, a hits value matrix is randomly selected from the Network Set: the alternating POI and user updates use $M_{col}^{T}$, $N_{row}$, $X_{col}^{T}$, $Y_{row}$, $Z_{col}^{T}$, … in turn, until convergence.
  • 66. Comparison with Existing Recommenders - NDCG
  • 67. Beautiful algorithms still matter a lot for Big Data Analytics!
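For readers unfamiliar with the metric, NDCG (normalized discounted cumulative gain) can be computed as in the Python sketch below. It uses the common linear-gain, log2-discount form; whether the paper used this exact variant is an assumption, and the relevance lists are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    k = len(relevances) if k is None else k
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0


# Relevance of recommended POIs in the order they were recommended.
perfect_order = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
```

A ranking that lists the most relevant POIs first scores 1.0; any other ordering of the same items scores lower.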
  • 68. Some Key Challenges. Data Preprocessing Phase: data quality problem (noise, incompleteness, sparsity); veracity issue (is bigger always better?). Data Understanding Phase: key feature discovery (finding the needle in a haystack). Learning and Modeling Phase: timeliness vs. precision (issues for data sampling); need for more sophisticated methodologies. Post-processing Phase.
  • 71. Early Prediction of Diseases. Huizinga, T. W. J., & van der Helm-van Mil, A. H. M. (2007). Prediction and prevention of rheumatoid arthritis. Revista Colombiana de Reumatología, 14(2), 106-114. [Timeline labels: early RA, 12 months; early RA, 18 months; RA diagnosis; very early detection, ~X years]
  • 72. Analytics Framework. [Diagram components. Off-line: raw data; target data; preprocessed data; data mining techniques; discovered rules; classifier. On-line: health records; morbidity risk prediction system; predicted risk; potential patient; doctor/hospital]
  • 73. Rules Produced: too many rules! Postprocessing is essential!
  • 74. Post-Processing: Rules Filtering. Rules with lift > 1: 11,004; lift = 1: 357; lift < 1: 7,543.
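Filtering by lift can be sketched as follows. The Python example uses the standard definition, lift(A → B) = support(A ∪ B) / (support(A) · support(B)), and splits rules into the three groups listed on the slide. The patient records, diagnosis codes, and rules are hypothetical.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def lift(records, antecedent, consequent):
    """Observed co-occurrence relative to what independence would predict."""
    joint = support(records, set(antecedent) | set(consequent))
    return joint / (support(records, antecedent) * support(records, consequent))


def split_by_lift(records, rules, tolerance=1e-9):
    """Group (antecedent, consequent) rules into lift > 1, = 1, and < 1."""
    groups = {"lift > 1": [], "lift = 1": [], "lift < 1": []}
    for antecedent, consequent in rules:
        value = lift(records, antecedent, consequent)
        if abs(value - 1.0) <= tolerance:
            groups["lift = 1"].append((antecedent, consequent))
        elif value > 1.0:
            groups["lift > 1"].append((antecedent, consequent))
        else:
            groups["lift < 1"].append((antecedent, consequent))
    return groups


# Hypothetical records: each set holds one patient's diagnosis codes.
records = [
    {"dx_A", "dx_B", "dx_E"},
    {"dx_A", "dx_B", "dx_E"},
    {"dx_A", "dx_C", "dx_E"},
    {"dx_C", "dx_D", "dx_E"},
]
rules = [
    (("dx_A",), ("dx_B",)),  # co-occur more often than chance
    (("dx_A",), ("dx_E",)),  # dx_E is in every record, so lift is exactly 1
    (("dx_A",), ("dx_C",)),  # co-occur less often than chance
]

groups = split_by_lift(records, rules)
```

Rules with lift above 1 indicate a positive association and are the usual candidates to keep; rules at or below 1 carry no positive signal.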
  • 75. Postprocessing: Literature Search (PubMed). [Chart: literature counts per disease, ranging from 0 to 4,557; disease labels include systemic lupus erythematosus, diabetes, osteoporosis, asthma, anxiety, neuropathy, dermatitis, nephropathy, peptic ulcer, cataract, sicca syndrome, dyspepsia, bronchitis, allergic rhinitis, fasciitis, coronary atherosclerosis, conjunctivitis, cervical spondylosis, spinal stenosis, vaginitis, pterygium, acute laryngopharyngitis, manic disorder, neoplasm of breast, and adhesive capsulitis of shoulder]
  • 76. A More Complete Framework (in PLOS ONE 2015)
  • 78. How to summarize/validate/interpret the discovered results is an important last mile for Big Data Analytics!
  • 79. Concluding Remarks: Grand Opportunities. "Data is King": the age of data monetization. Data vs. ideas vs. technologies: from data to idea; from idea to data; utilization of the right technologies. Visioning: those who own valuable data can be king; those who own no data but have innovative ideas can also be king. Innovative Ideas + Right Tech on Valued Data => Smart King.
  • 81. Grand Challenges, Big Opportunities!
  • 82. Thanks for your attention