4. classification v.s clustering
• Data is not labeled
• Group pointed that are “close” to each other
• Unsupervised learning
• Labeled data points
• Want a “rule” that assigns labels to new points
• Supervised learning
21. 誰最近呢?
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
E 和⼤大夥的距離
200
450
3925
3125
E A和 最近,所以A是⼤大隻
那E也是⼤大隻
22.
23.
24. Brute Force
N samples in D dimensions
N = 4 D = 2
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
找出所有距離 N*D
找出最近的點 N
時間複雜度
O [ DN2 ]
25.
26. K-D Tree
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
165
180
175160
190
27. K-D Tree
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
165,50
180,90
175,85
160,45
190,100
⽐比⾝身⾼高
⽐比體重
時間複雜度
O [ DN logN ]
28.
29. Where KD trees partition data along
Cartesian axes, ball trees partition data
in a series of nesting hyper-spheres.
41. You must predict the total count of bikes
rented during each hour
42. datetime hourly date + timestamp
season
1 = spring, 2 = summer, 3 = fall, 4 =
winter
weather
1: Clear, Few clouds, Partly cloudy,
Partly cloudy
2: Mist + Cloudy, Mist + Broken
temp temperature in Celsius
humidity relative humidity
windspeed wind speed
holiday
whether the day is considered a
holiday
workingday
whether the day is neither a
weekend nor holiday