TM and Stylo

Source: RPubs, https://rpubs.com/sgeletta/95577


Text Mining With R and the "tm" Package

Data Science Capstone

Simon Geletta
Saturday, July 25, 2015

Milestone Report
Introduction and Objectives
The main goal of this report is to demonstrate competency in working with unstructured data to
produce a structured set of records that can then be used for statistical modeling. The first step
in any such task is to learn as much as possible about what the raw data (the document corpus)
contains, and to separate the useful from the not-so-useful information. Because running the code
while preparing the document for publication on RPubs took an unreasonably long time, this report
is based on a 10% sample of the full data set that was provided. It is offered as evidence of what
I will do with the entire data set at the end of the capstone project.
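
As an illustration of the 10% sampling mentioned above, here is a minimal sketch in R. It is not the report's own code: the helper name sample_lines, the fixed seed, and the toy input are assumptions made for the example. In practice the input would come from readLines() on one of the data files.

```r
# Hypothetical helper (not from the original report): keep a random
# fraction of the elements of a character vector, preserving their order.
sample_lines <- function(lines, fraction = 0.10, seed = 1234) {
  set.seed(seed)                                   # fixed seed for a reproducible sample
  n_keep <- max(1, round(length(lines) * fraction))
  keep <- sort(sample.int(length(lines), n_keep))  # sorted indices keep the original order
  lines[keep]
}

# Toy stand-in for the lines of one text file.
toy <- sprintf("line %d", 1:200)
toy_sample <- sample_lines(toy)
length(toy_sample)
```

Sampling whole lines keeps each blog post, news item, or tweet intact, which matters if the sample is later tokenized into word sequences.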

Methods
The first task is to download the raw resources used for the analysis, chiefly the three data
sources en_US.blogs.txt, en_US.news.txt, and en_US.tweets.txt. In addition, a list of bad/profane
words was obtained so that those words could later be excluded from the analysis. The raw data
were downloaded in compressed form from
http://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
and uncompressed locally. The bad/profane words were downloaded from
https://raw.githubusercontent.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/master/en
and stored locally as en_bws.txt. The following chunk of code shows how the file acquisition went.

## download and unzip the data set unless the archive is already present
dtsrc <- "http://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip"
if (!file.exists("coursera-swiftkey.zip")) {
  download.file(dtsrc, destfile = "coursera-swiftkey.zip")
  unzip("coursera-swiftkey.zip")
}
## list of bad/profane words download from github
bwsrc1<-"https://raw.githubusercontent.com/shutterstock/List-of-Dirty-Naughty-Obscen
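
The profanity-list download described in the Methods text could be sketched as follows. This is not the author's code. The helper fetch_once and the variable bw_url are hypothetical names introduced for illustration; the URL and the local file name en_bws.txt are taken from the text above.

```r
# Hypothetical helper: download a file only if no local copy exists yet.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile = destfile)
  }
  invisible(destfile)   # return the local path so calls can be chained
}

# URL and local file name as given in the Methods text.
bw_url <- "https://raw.githubusercontent.com/shutterstock/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/master/en"

# Not run here because it needs network access:
# fetch_once(bw_url, "en_bws.txt")
# badwords <- readLines("en_bws.txt", encoding = "UTF-8", warn = FALSE)
```

Checking for the local file first mirrors the guard used for the zip archive, so repeated runs of the report do not download the list again.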
