
A Super-Brief Introduction to Machine Learning

Yi-Ren Yeh (葉倚任)


Department of Mathematics

National Kaohsiung Normal University

From Human Learning to Machine Learning

Human learning: acquiring a specific ability by learning from experience with observed things and events (observations).

Observations + Feedback → Learning → Ability

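From Human Learning to Machine Learning

Machine learning: acquiring a specific capability by learning from collected data.

Data → Machine Learning Algorithm → Ability

Example: data (digits) → machine learning → specific capability (a recognition model). Given a new digit image, the recognition model answers: "87% chance that it is a 5".
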
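How Does Machine Learning Work?

[Diagram: data (digits) → machine learning → a human skill (a recognition model). Given a new digit image, the recognition model answers: "87% chance that it is a 5".]

Data Representation → Learning Algorithm → Evaluation
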
What is Data?

In general, data can be divided into two parts:
Feature: describes each data instance; usually denoted by X.
Label: the output associated with each data instance. The output can take different forms (for example, a category or a real value); usually denoted by Y.

Data
Y: Male or Female
X: weight, height, hair color, hair length, ...

• Use these features to describe the data instances.
• Features can be discrete, numeric, etc.

How to extract good features is an important task in machine learning.
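
The X/Y split above can be sketched in a few lines of Python. This is only an illustration: the three instances and all of their values are made up, and the units (kg, cm) are an assumption, since the slide does not give any.

```python
# Each row of X describes one data instance by its features:
# (weight in kg, height in cm, hair color, hair length in cm).
# All values are invented for illustration.
X = [
    (72.0, 178.0, "black", 5.0),
    (55.0, 162.0, "brown", 40.0),
    (80.0, 183.0, "black", 3.0),
]

# Y holds one label per instance, in the same order as X.
Y = ["Male", "Female", "Male"]

# Features may be numeric (weight) or discrete (hair color);
# every instance must come with exactly one label.
assert len(X) == len(Y)
for features, label in zip(X, Y):
    print(features, "->", label)
```
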


Weather Data Example
X: [Figure: a weather data table]
Y: Will it rain tomorrow or not?

Image Data Example
[Figure: example images (X) with their labels (Y)]

What is X for these image data?

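One possible answer to the question above, offered as an assumption rather than as the slide's own answer, is to use the raw pixel intensities as the feature vector. The sketch below flattens a hypothetical 3x3 grayscale image into a list of 9 numbers; the image and the helper name `flatten_image` are invented for illustration.

```python
def flatten_image(image):
    """Turn a 2-D grid of 0-255 pixel intensities into a flat feature vector in [0, 1]."""
    return [pixel / 255 for row in image for pixel in row]

# A hypothetical 3x3 grayscale image showing a vertical stroke.
image = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]

x = flatten_image(image)
print(len(x))  # 9 features, one per pixel
print(x)
```
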
Text Data Example

Y: Spam email or not?
X: [Figure: example emails]

What is X for these text data?

Data Come from Many Different Sources!

These formats are not consistent.

How can learning algorithms be applied to these data?

The First Step of Data Analysis: Feature Extraction

What features would you define to detect spam mail?

Text Representation Example

Bag-of-Words(BoW) Model

A spam:
The high salary and stability of your life...One your phone call can change your life. Thousand of people worldwide have already got new highly paid work thanks to us We work for you 7 days in a week and We send the certificate to all countries. If you the citizen of the USA: 1603-509-2001 Outside USA: +1603-509-2001Please Inform your Name and your phone number and your country code

Dictionary: high, salary, phone, …, country

Term Frequency
high  salary  phone  …  country
1     1       2      …  2

(Representing mails in a vector space)

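The term-frequency vector above can be reproduced with a minimal bag-of-words sketch. Several things here are assumptions, not part of the slide: the dictionary is cut down to the four terms the slide shows, tokens are lowercase runs of letters, and the helper `normalize` maps "countries" to "country" so that the count for "country" matches the slide.

```python
import re
from collections import Counter

def normalize(token):
    # Crude plural handling (an assumption): "countries" -> "country".
    return token[:-3] + "y" if token.endswith("ies") else token

def term_frequencies(text, dictionary):
    """Count how often each dictionary term occurs in the text."""
    tokens = [normalize(t) for t in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(tokens)
    return [counts[term] for term in dictionary]

# The spam mail from the slide, copied verbatim.
spam = (
    "The high salary and stability of your life...One your phone call can "
    "change your life. Thousand of people worldwide have already got new "
    "highly paid work thanks to us We work for you 7 days in a week and We "
    "send the certificate to all countries. If you the citizen of the USA: "
    "1603-509-2001 Outside USA: +1603-509-2001Please Inform your Name and "
    "your phone number and your country code"
)

# Only the four dictionary terms visible on the slide.
dictionary = ["high", "salary", "phone", "country"]

print(term_frequencies(spam, dictionary))  # [1, 1, 2, 2]
```

Each mail becomes a vector of the same length as the dictionary, so mails of different lengths can be compared in one vector space.
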
Why Segment Words?

Once the text is segmented into words, the data can be represented further!

What is a Learning Algorithm?
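
[Diagram: data (digits) → machine learning → a human skill (a recognition model). Given a new digit image, the recognition model answers: "87% chance that it is a 5".]

Let’s start with a simple classification algorithm: K-nearest neighbors

Data Representation → Learning Algorithm → Evaluation
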
1-Nearest Neighbor classifier

An instance is classified by its nearest neighbor.

[Figure: classification with k=1]

K-Nearest Neighbor (kNN) Classifier

More neighbors can be taken into account!

[Figure: the same example classified with k=1 and with k=3]

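A minimal sketch of a kNN classifier follows. It assumes squared Euclidean distance and a simple majority vote, neither of which the slide specifies. The function name `knn_predict` and the 2-D points are hypothetical, chosen so that k=1 and k=3 give different answers.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Return the majority label among the k training points closest to query."""
    # Sort training points by squared Euclidean distance to the query.
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    # Let the k nearest neighbors vote on the label.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points with their labels.
train = [
    ((1.0, 1.0), "red"),
    ((3.0, 3.0), "blue"),
    ((3.0, 4.0), "blue"),
    ((6.0, 6.0), "red"),
]
query = (1.9, 1.9)

print(knn_predict(train, query, k=1))  # red: the single nearest point is red
print(knn_predict(train, query, k=3))  # blue: two of the three nearest are blue
```
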
Use 1-NN to Classify Weather Data

Temp.  Humidity  Play
85     85        no
80     90        no
83     86        yes
70     96        yes
68     80        yes
65     70        no
64     65        yes
72     95        no
69     70        yes
75     80        yes
75     70        yes
72     90        yes
81     75        yes
71     91        no

[Figure: plots titled "Training data" and "Testing data"; legend markers distinguish yes from no]

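The 1-NN rule can be tried on the weather table above with the sketch below. Two things are assumptions: all fourteen listed rows are treated as training data (the slide's training/testing split is not recoverable from the text), and the two query points are hypothetical. Distance is squared Euclidean on the raw temperature and humidity values.

```python
# (temperature, humidity, play) rows copied from the weather table.
weather = [
    (85, 85, "no"), (80, 90, "no"), (83, 86, "yes"), (70, 96, "yes"),
    (68, 80, "yes"), (65, 70, "no"), (64, 65, "yes"), (72, 95, "no"),
    (69, 70, "yes"), (75, 80, "yes"), (75, 70, "yes"), (72, 90, "yes"),
    (81, 75, "yes"), (71, 91, "no"),
]

def nearest_label(rows, temp, humidity):
    """1-NN: return the label of the single closest training row."""
    nearest = min(
        rows,
        key=lambda r: (r[0] - temp) ** 2 + (r[1] - humidity) ** 2,
    )
    return nearest[2]

# Hypothetical test instances, not taken from the slide.
print(nearest_label(weather, 74, 79))  # yes: closest row is (75, 80, yes)
print(nearest_label(weather, 66, 68))  # no: closest row is (65, 70, no)
```
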
How Can This Be Applied to QnA?

First, prepare your QnA!

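How Can This Be Applied to QnA?

Compare similarity
User question: How do married couples file their taxes?

Q1: When should the final individual income tax return be filed? How is it filed?
Q2: Should spouses file their individual income tax jointly? If they marry or divorce during the year, how should they file?
...
Qn: What methods are there for receiving an individual income tax refund?

Highest similarity!
Q2: Should spouses file their individual income tax jointly? If they marry or divorce during the year, how should they file?

Find the corresponding A and reply
A2: This question is explained in two points:
1. Article 15 of the Income Tax Act provides that, for tax years up to and including 102, the income of the taxpayer's spouse must be filed and paid jointly by the taxpayer; the tax on the salary income of the taxpayer or the spouse may be computed separately, but it must still be filed and paid jointly by the taxpayer. Starting from tax year 103 (inclusive) . . .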
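
The retrieval step above amounts to a nearest-neighbor search over the stored questions, which can be sketched as follows. The slide does not say how similarity is computed; bag-of-words vectors with cosine similarity are an assumption here. The questions are English translations, and the answers are placeholder strings because the slide truncates the answer text. The original Chinese questions would first need word segmentation.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Prepared QnA pairs; answers are placeholders, not the real answer text.
qna = [
    ("When should the final individual income tax return be filed? "
     "How is it filed?", "A1"),
    ("Should spouses file their individual income tax jointly? If they marry "
     "or divorce during the year, how should they file?", "A2"),
    ("What methods are there for receiving an individual income tax refund?",
     "An"),
]

query = "How do married couples file their taxes?"

# Pick the stored question most similar to the query and reply with its answer.
best_question, best_answer = max(
    qna, key=lambda qa: cosine(bow(query), bow(qa[0]))
)
print(best_answer)  # A2
```
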
