
Text Categorization: Methods

ChengXiang ("Cheng") Zhai


Department of Computer Science
University of Illinois at Urbana-Champaign

Overview
What is text categorization?
Why text categorization?
How to do text categorization?
Generative probabilistic models
Discriminative approaches

How to evaluate categorization results?



Categorization Methods: Manual


Determine the category based on rules that are carefully designed
to reflect the domain knowledge about the categorization
problem
Works well when
The categories are very well defined
Categories are easily distinguished based on surface features in text
(e.g., special vocabulary is known to only occur in a particular category)
Sufficient domain knowledge is available to suggest many effective rules

Problems
Labor-intensive; doesn't scale up well
Can't handle uncertainty in rules; rules may be inconsistent, hence not
robust
Both problems can be solved/alleviated by using machine learning
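
To make the manual approach concrete, below is a minimal Python sketch of rule-based categorization driven by hand-written keyword rules; the categories, keywords, and function name are illustrative assumptions, not taken from the lecture.

```python
# A minimal sketch of manual (rule-based) categorization: hand-crafted keyword
# rules assign a category when a document's surface vocabulary matches.
# Rules, categories, and the example text are illustrative only.

RULES = [
    ("sports",  {"game", "score", "team", "season"}),
    ("finance", {"stock", "market", "earnings", "dividend"}),
]

def categorize_by_rules(text: str, default: str = "other") -> str:
    words = set(text.lower().split())
    for category, keywords in RULES:
        # Fire the rule if any of its special vocabulary occurs in the text.
        if words & keywords:
            return category
    return default

print(categorize_by_rules("The team won the game with a record score"))  # sports
```

Rules like these only work as long as the keyword lists stay consistent and complete, which is exactly where the labor and robustness problems noted above arise.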

Categorization Methods: Automatic


Use human experts to
Annotate data sets with category labels (training data)
Provide a set of features to represent each text object that can potentially
provide a clue about the category

Use machine learning to learn soft rules for categorization from
the training data
Figure out which features are most useful for separating different
categories
Optimally combine the features to minimize the errors of categorization
on the training data
The trained classifier can then be applied to a new text object to predict
the most likely category (that a human expert would assign to it)
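As a concrete sketch of this workflow, the example below uses scikit-learn (an assumed choice; the lecture does not prescribe a library): a few expert-labeled toy texts serve as training data, word counts serve as features, and the trained classifier is applied to a new text object.

```python
# A sketch of the automatic workflow: annotated training data -> features ->
# learned classifier -> prediction on new text. The library choice
# (scikit-learn) and the toy data are assumptions for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data "annotated by human experts" (illustrative only).
train_texts  = ["the team won the game", "stocks fell on weak earnings",
                "the season opener was close", "the market rallied today"]
train_labels = ["sports", "finance", "sports", "finance"]

vectorizer = CountVectorizer()                      # word-count features
X_train = vectorizer.fit_transform(train_texts)
classifier = LogisticRegression().fit(X_train, train_labels)

# Apply the trained classifier to a new text object.
X_new = vectorizer.transform(["the market rallied on earnings"])
print(classifier.predict(X_new))                    # ['finance'] on this toy data
```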

Machine Learning for Text Categorization


General setup: learn a classifier f: X → Y
Input: X = all text objects; Output: Y = all categories
Learn a classifier function f: X → Y, such that f(x) = y ∈ Y gives the "correct"
category for x ∈ X ("correct" is based on the training data)

All methods
Rely on discriminative features of text objects to distinguish categories
Combine multiple features in a weighted manner
Adjust weights on features to minimize errors on the training data

Different methods tend to vary in


Their way of measuring the errors on the training data (may optimize a
different objective/loss/cost function)
Their way of combining features (e.g., linear vs. non-linear)
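As a minimal sketch of these shared ingredients, the snippet below combines features through a weighted (linear) score and picks the highest-scoring category; the feature names and weight values are made up for illustration, and real methods differ in the loss they minimize when learning such weights and in whether the combination is linear.

```python
# A sketch of weighted feature combination: score(x, y) = sum_i w_{y,i} * x_i,
# then choose the category with the highest score. The weights shown here are
# hand-picked for illustration; in practice they are learned from training data.

def score(features: dict, weights: dict) -> float:
    # Weighted (linear) combination of feature values.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def classify(features: dict, weights_per_category: dict) -> str:
    # Pick the category whose weight vector gives the highest score.
    return max(weights_per_category,
               key=lambda c: score(features, weights_per_category[c]))

weights_per_category = {
    "sports":  {"word=game": 2.0, "word=market": -1.0},
    "finance": {"word=game": -1.0, "word=market": 2.0},
}
print(classify({"word=game": 1.0}, weights_per_category))  # sports
```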

Generative vs. Discriminative Classifiers


Generative classifiers (learn what the data looks like in each
category)
Attempt to model p(X,Y) = p(Y)p(X|Y) and compute p(Y|X) based on p(X|Y)
and p(Y) by using Bayes' Rule
Objective function is likelihood, thus indirectly measuring training errors
E.g., Naïve Bayes

Discriminative classifiers (learn what features separate categories)


Attempt to model p(Y|X) directly
Objective function directly measures errors of categorization on training
data
E.g., Logistic Regression, Support Vector Machine (SVM), k-Nearest
Neighbors (kNN)
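To illustrate the generative view, the hand-worked sketch below estimates p(Y) and p(word|Y) from toy labeled data and applies Bayes' Rule, p(Y|X) ∝ p(Y) · Π p(w|Y), to pick the most probable category; the data and the smoothing choice are assumptions for illustration, and a discriminative model such as logistic regression would instead fit p(Y|X) directly.

```python
# A hand-worked Naive Bayes sketch (generative): estimate p(Y) and p(w|Y) from
# labeled data, then rank categories by p(Y) * product of p(w|Y) via Bayes' Rule.
# The toy data and add-one smoothing are illustrative choices.
import math
from collections import Counter

train = [("the team won the game", "sports"),
         ("stocks fell on weak earnings", "finance"),
         ("the season opener was close", "sports"),
         ("the market rallied on strong earnings", "finance")]

# Estimate p(Y) and per-category word counts from the training data.
label_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in label_counts}
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

def log_posterior(text, label):
    # log p(Y) + sum of log p(w|Y), with add-one smoothing over the vocabulary.
    logp = math.log(label_counts[label] / len(train))
    total = sum(word_counts[label].values())
    for w in text.split():
        logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return logp

new_text = "earnings report moved the market"
print(max(label_counts, key=lambda y: log_posterior(new_text, y)))  # finance
```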
