APPLIED DATA SCIENCE

ORTEGA, MARY ALYSSA T. 2nd Qtr SY 2019-2020

WORKSHEET #4: CLEANING DATA IN PYTHON

1 Date: DECEMBER 17, 2019


Import dob_job_application_filings.csv. Write code below to determine the first five columns of the dataset. How
many records are in the dataset?
Code

Answers to Questions
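
One possible approach to #1 is sketched below. The real CSV is not reproduced in this worksheet, so the sketch reads a tiny inline stand-in; the column names and rows in csv_text are assumptions for illustration only, and the counts it prints describe the stand-in, not the actual dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for dob_job_application_filings.csv (assumed columns, made-up rows).
csv_text = """Job #,Doc #,Borough,House #,Street Name,Block
1,1,MANHATTAN,10,MAIN ST,100
2,1,BROOKLYN,20,OAK AVE,200
3,2,QUEENS,30,ELM RD,300
"""
# With the real file this would be: pd.read_csv('dob_job_application_filings.csv')
df = pd.read_csv(io.StringIO(csv_text))

first_five = list(df.columns[:5])  # names of the first five columns
n_records = df.shape[0]            # number of rows, i.e. records

print(first_five)
print(df.iloc[:, :5].head())       # first rows of those five columns
print(n_records)
```

df.shape returns (rows, columns), so df.shape[0] answers the record-count question; len(df) gives the same number.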

2 Date: DECEMBER 17, 2019


Print out the value count for the boroughs. Write the code and the output.
Code

Output
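
A minimal sketch for #2, assuming the borough column is named 'Borough'. The rows are made up, so the counts shown by this sketch are not the worksheet's expected output.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data comes from dob_job_application_filings.csv.
df = pd.DataFrame({
    'Borough': ['MANHATTAN', 'BROOKLYN', 'MANHATTAN', 'QUEENS', 'MANHATTAN', 'BROOKLYN']
})

# value_counts() tallies each distinct value, most frequent first.
# dropna=False also reports missing values as their own entry, if any exist.
borough_counts = df['Borough'].value_counts(dropna=False)
print(borough_counts)
```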


3 Date: DECEMBER 17, 2019


Create a histogram for the column ‘Existing Zoning Sqft’ of the same dataset. Rotate the axis labels by 70 degrees and
use a log scale for both axes. Write the code here and submit a copy of the plot through WS04-03: P04 Cleaning Data in Python
(#3).
Code Output
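
A sketch of one way to produce the plot for #3, using pandas' built-in plotting on top of matplotlib. The values are made up for illustration, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in values for the 'Existing Zoning Sqft' column.
df = pd.DataFrame({'Existing Zoning Sqft': [100, 1000, 2500, 10000, 50000, 100000]})

# rot=70 rotates the tick labels; logx/logy put both axes on a log scale.
ax = df['Existing Zoning Sqft'].plot(kind='hist', rot=70, logx=True, logy=True)

# plt.show() would display the plot interactively;
# plt.savefig('hist.png') would write a copy to disk for submission.
```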

4 Date: DECEMBER 17, 2019


Import airquality.csv. Melt the columns Ozone, Solar.R, Wind and Temp into rows and assign to
airquality_melt. Rename the default variable column to measurement and the default value column to reading. Print
head() of airquality_melt. Write the code here and submit a copy of the output through WS04-04: P04 Cleaning Data in
Python (#4).
Code Output

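import pandas as pd

# Load the data, then melt the four measurement columns into rows,
# keeping Month and Day as identifier columns.
airquality = pd.read_csv('airquality.csv')
airquality_melt = pd.melt(frame=airquality, id_vars=['Month', 'Day'],
                          value_vars=['Ozone', 'Solar.R', 'Wind', 'Temp'],
                          var_name='measurement', value_name='reading')

# print() so the head is shown when run as a script, not only in a notebook.
print(airquality_melt.head())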

5 Date: DECEMBER 17, 2019


Pivot airquality_melt from #4, with the rows indexed by ‘Month’ and ‘Day’, the columns taken from
‘measurement’, and the values taken from ‘reading’. Assign this to airquality_pivot. Print out the head of airquality_pivot. Write
the code here and submit a copy of the output through WS04-05: P04 Cleaning Data in Python (#5).
Code Output
import pandas as pd

# Rebuild airquality_melt as in #4.
airquality = pd.read_csv('airquality.csv')
airquality_melt = pd.melt(frame=airquality, id_vars=['Month', 'Day'],
                          value_vars=['Ozone', 'Solar.R', 'Wind', 'Temp'],
                          var_name='measurement', value_name='reading')

# pivot_table averages duplicate (Month, Day, measurement) entries by default.
airquality_pivot = airquality_melt.pivot_table(index=['Month', 'Day'],
                                               columns='measurement', values='reading')

print(airquality_pivot.head())


6 Date: DECEMBER 17, 2019


Import the following files: uber_apr.csv, uber_may.csv, uber_jun.csv. Concatenate these files into a single DataFrame, uber.
Print out the head of uber. Write the code here and submit a copy of the output through WS04-06: P04 Cleaning Data in Python
(#6).
Code Output
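
A minimal sketch for #6. The three frames below are hypothetical stand-ins for the monthly files, with assumed column names and made-up rows; with the real data each would come from pd.read_csv.

```python
import pandas as pd

# Hypothetical stand-ins; in the worksheet: pd.read_csv('uber_apr.csv'), and so on.
uber_apr = pd.DataFrame({'Date/Time': ['4/1/2014 0:11:00', '4/1/2014 0:17:00'],
                         'Lat': [40.7690, 40.7267], 'Lon': [-73.9549, -74.0345],
                         'Base': ['B02512', 'B02512']})
uber_may = pd.DataFrame({'Date/Time': ['5/1/2014 0:02:00', '5/1/2014 0:06:00'],
                         'Lat': [40.7521, 40.6965], 'Lon': [-73.9914, -73.9715],
                         'Base': ['B02512', 'B02512']})
uber_jun = pd.DataFrame({'Date/Time': ['6/1/2014 0:00:00', '6/1/2014 0:01:00'],
                         'Lat': [40.7293, 40.7131], 'Lon': [-73.9920, -74.0097],
                         'Base': ['B02512', 'B02512']})

# Stack the frames row-wise; ignore_index=True renumbers the rows 0..n-1
# instead of repeating each file's own 0-based index.
uber = pd.concat([uber_apr, uber_may, uber_jun], ignore_index=True)
print(uber.head())
```

Leaving out ignore_index keeps the original per-file row labels, which also works but produces duplicate index values.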

7 Date: DECEMBER 17, 2019


Merge the following files: ‘site.csv’ and ‘visited.csv’. The output should be as follows:

Code
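
A sketch of a merge for #7. The expected output is not shown in this copy of the worksheet, so the column names (name, lat, long in site; ident, site, dated in visited), the rows, and the choice of join keys are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for site.csv and visited.csv.
site = pd.DataFrame({'name': ['DR-1', 'DR-3', 'MSK-4'],
                     'lat': [-49.85, -47.15, -48.87],
                     'long': [-128.57, -126.72, -123.40]})
visited = pd.DataFrame({'ident': [619, 622, 734, 837],
                        'site': ['DR-1', 'DR-1', 'DR-3', 'MSK-4'],
                        'dated': ['1927-02-08', '1927-02-10', '1939-01-07', '1932-01-14']})

# The key has a different name in each frame, so left_on/right_on name it explicitly.
# The default how='inner' keeps only rows whose key appears in both frames.
merged = pd.merge(left=site, right=visited, left_on='name', right_on='site')
print(merged)
```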


8 Date: DECEMBER 17, 2019


Import tips.csv. Write the name and data type of the seven columns of this dataset.
Code

Answer to Question
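
One way to list column names and data types for #8 is sketched below. The inline CSV is a hypothetical stand-in for tips.csv with assumed column names and made-up rows; the types reported for the real file should be read from its own output.

```python
import io
import pandas as pd

# Hypothetical stand-in for tips.csv (seven assumed columns).
csv_text = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""
# With the real file this would be: pd.read_csv('tips.csv')
tips = pd.read_csv(io.StringIO(csv_text))

# dtypes is a Series mapping each column name to its data type.
column_types = tips.dtypes
print(column_types)
```

tips.info() prints the same information along with non-null counts per column.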

9 Date:
Convert the sex and smoker columns to ‘category’.
Code
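
A minimal sketch for #9, using a small made-up frame in place of the tips data from #8.

```python
import pandas as pd

# Hypothetical stand-in for the tips DataFrame.
tips = pd.DataFrame({'sex': ['Female', 'Male', 'Male', 'Female'],
                     'smoker': ['No', 'No', 'Yes', 'No'],
                     'tip': [1.01, 1.66, 3.50, 3.31]})

# astype('category') stores each distinct value once and encodes rows as integer codes.
tips['sex'] = tips['sex'].astype('category')
tips['smoker'] = tips['smoker'].astype('category')

print(tips.dtypes)
```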

10 Date:
Import tips_1.csv. Convert the total_bill and tip columns to ‘numeric’.
Code
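
A sketch for #10. It assumes the two columns in tips_1.csv are read as text because some entries are not numbers; the placeholder string 'missing' and the rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for tips_1.csv, with numbers stored as strings.
tips_1 = pd.DataFrame({'total_bill': ['16.99', 'missing', '21.01'],
                       'tip': ['1.01', '1.66', 'missing']})

# errors='coerce' turns entries that cannot be parsed into NaN instead of raising.
tips_1['total_bill'] = pd.to_numeric(tips_1['total_bill'], errors='coerce')
tips_1['tip'] = pd.to_numeric(tips_1['tip'], errors='coerce')

print(tips_1.dtypes)
print(tips_1)
```

Without errors='coerce', pd.to_numeric raises a ValueError on the first unparseable entry, which can be useful for spotting bad values before deciding how to handle them.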
