Faculty of Economics and Business Administration
Department of Accounting, Information Systems and Statistics
Data Analysis & Data Science with R
By Marin Fotache
Introduction to Data Processing
Why Do We Need Data Processing?
Data is gathered from different sources:
- Databases (PostgreSQL, Oracle, MongoDB, etc.)
- Spreadsheets (MS Excel, Google Spreadsheets)
- HTML/XML tables, web pages
- Other statistical packages
It is extremely rare that imported data is ready for analysis (a few of these problems are sketched in R after this list):
- There are missing values (NAs)
- Data was gathered with errors
- Variables must be renamed, recoded, or removed
- Data are spread among many datasets
- Aggregation is needed
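A minimal sketch of how some of these problems surface in R; the data frame raw_df and its columns are invented for illustration:

# hypothetical imported data frame showing typical problems
raw_df <- data.frame(
  prod_code = c(194392, 194392, NA, 200117),
  prod_name = c("white socks", "White Socks", "black socks", NA),
  price     = c("9.99", "9,99", "12.50", "11.00"),  # numbers read as strings
  stringsAsFactors = FALSE
)
colSums(is.na(raw_df))         # missing values (NAs) per column
str(raw_df)                    # spot variables imported with the wrong type
names(raw_df)[2] <- "product"  # renaming a variable
# recoding: fix the decimal comma, then convert to numeric
raw_df$price <- as.numeric(sub(",", ".", raw_df$price, fixed = TRUE))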
According to some authors, 80% of data analysis time is spent on data preparation/processing.
Topics covered
- Other basic functions (in base R and in packages) for dealing with NAs, NULLs, strings, dates, etc.
- Options for renaming, recoding and removing variables in data frames
- Subsetting
- Data bind and join
- Aggregation
- Reshaping
- Basics of programming in R
- Group operations: the "apply" family, packages "plyr" and "dplyr" (a small taste follows this list)
- Case studies
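As a small taste of the group operations listed above, a hedged sketch using the built-in mtcars data set (the dplyr package must be installed):

library(dplyr)

# base R "apply" family: mean mpg for each number of cylinders
tapply(mtcars$mpg, mtcars$cyl, mean)

# the same aggregation, dplyr style
mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))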
Data cleaning best practices
Source: http://www.quora.com/What-are-some-of-the-best-practices-for-data-cleaning (Leon Polovets)
- Sort the data by different attributes. Negative numbers, strings that start with obscure symbols, and other outliers will often appear at the top of your dataset after a sort.
- Look at summary statistics (mean, standard deviation, number of missing values, etc.) for each column. These can be used to quickly zero in on the most common problems. (Both practices are sketched in R after this list.)
- Keep track of every cleaning operation you do so that you can modify, repeat, or remove operations as necessary.
- For large, messy datasets, reaching 100% cleanliness is next to impossible. If you have a list of cleanliness issues, sort them by estimated frequency and attack the most common problems until your dataset is good enough.
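The first two practices translate into a few lines of R; a minimal sketch, where df is a hypothetical data frame:

# hypothetical data with a suspicious negative value and a missing one
df <- data.frame(amount = c(120, -3, 45, NA, 78))

# sort by an attribute: outliers surface at the top
df[order(df$amount), , drop = FALSE]

# summary statistics per column (summary() also counts NAs)
summary(df)
sd(df$amount, na.rm = TRUE)  # standard deviation, ignoring NAs
colSums(is.na(df))           # number of missing values per column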
Data cleaning best practices (cont.)
Source: http://www.quora.com/What-are-some-of-the-best-practices-for-data-cleaning (Leon Polovets)
- You can also use sampling to test data quality. For example, the acceptance test for your dataset could be something like, "If you pick 250 random rows, no more than 5 of them should have formatting issues." (A possible R version follows this list.)
- Tools like OpenRefine can save you a lot of time by performing many simple cleanings with very little user input.
- Become an expert at using regular expressions.
- Create a set of utility functions/scripts/tools to handle common cleaning tasks. These might include: regex search-and-replace, remapping values based on a SQL database or CSV file, blanking out all values that don't match a regex or an allowed list of values, deleting rows that do/don't match some filter, and so on.
- If you find yourself repeating several steps over and over, write a script to handle all of the steps at once.
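A hedged R sketch of the sampling acceptance test and one regex-based utility; the column name order_date and the date pattern are assumptions:

set.seed(42)
# hypothetical data: dates stored as text, a few of them malformed
orders <- data.frame(
  order_date = c("2023-01-15", "15/01/2023", "2023-02-30x",
                 rep("2023-03-01", 297)),
  stringsAsFactors = FALSE
)

# acceptance test: in 250 random rows, at most 5 may fail the format check
sample_rows <- orders[sample(nrow(orders), 250), , drop = FALSE]
bad <- !grepl("^\\d{4}-\\d{2}-\\d{2}$", sample_rows$order_date)
sum(bad) <= 5

# reusable utility: blank out values that do not match an allowed pattern
blank_non_matching <- function(x, pattern) ifelse(grepl(pattern, x), x, NA)
orders$order_date <- blank_non_matching(orders$order_date, "^\\d{4}-\\d{2}-\\d{2}$")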
Data cleaning best practices (cont.)
- Check how values were imported: numbers may have been read as strings, dates as ranges (for example 1-10 instead of 10/1).
- Inconsistencies. For example, say you have a list of product code numbers and the names of the products (194392: white socks). You should check that you don't have any other product names associated with 194392, and no other product code for white socks.
- Also, when adding labels to records based on a certain value (e.g. Small, Medium, Large for company size, based on the number of employees), avoid the practice of using an if statement with logic such as "if none of the previous n-1 labels, then it's the nth label". That would be a catch-all that leads to mistakes. (Both checks are sketched in R below.)
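A sketch of both checks in R; the product table and the size thresholds are invented for illustration:

# hypothetical product table with an inconsistency: two names for one code
products <- data.frame(
  code = c(194392, 194392, 205511),
  name = c("white socks", "White Socks", "black socks"),
  stringsAsFactors = FALSE
)

# codes with more than one distinct name (and vice versa) are suspects
tapply(products$name, products$code, function(x) length(unique(x)))
tapply(products$code, products$name, function(x) length(unique(x)))

# explicit conditions for every label instead of a catch-all "else" branch
label_size <- function(employees) {
  ifelse(employees < 50,  "Small",
  ifelse(employees < 250, "Medium",
  ifelse(employees >= 250, "Large", NA)))  # unmatched values stay NA
}
label_size(c(10, 120, 800, NA))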