This document summarizes Felicity Clemens' presentation on data cleaning techniques in Stata. It discusses identifying and removing duplicate records manually or automatically, merging datasets using Stata's merge command, and generating a moving target variable to identify the year a chemical concentration changed from its 2002 level by using a forval loop to examine relationships between years. The presentation provided hints and tips for common data cleaning problems in Stata.
Copyright: Attribution Non-Commercial (BY-NC)
Data cleaning hints and tips
Felicity Clemens
Stata Users’ Group meeting, London, 17 & 18 May 2005
Felicity Clemens 18 May 2005
Introduction
Data cleaning is one of the most time-consuming jobs of all!
There are many ways of attacking the same problem when using Stata.
The talk describes some common problems and proposes possible solutions.
These are mostly reminders!
Contents
1) Introduction to the first datasets
2) Identifying and removing duplicates by hand
3) Merging data and uses of the merge command
4) Generating a moving target variable

The study
A case-control study carried out across 3 central European countries
Exposure of interest: exposure to chemicals in the environment
Outcome of interest: cancer
Identifying duplicates in a dataset
This can be done automatically, using the duplicates set of commands (e.g. duplicates report, duplicates list, duplicates drop).
We will demonstrate a manual method of identifying duplicates.
Two different possibilities:
the same data have been entered on more than one occasion;
different data have been entered using the same identifier (id number).

The merge command
A necessary command in the data management of most big studies.
There are many different uses of the merge command; we look at two of them:
simple merge on id;
multiple merge on id.
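The slides do not show the Stata syntax, so here is a hedged sketch of the first use, a simple one-to-one merge on id, written in Python only to make the matching logic explicit. It mimics the _merge variable that Stata's merge creates (1 = observation in the master data only, 2 = in the using data only, 3 = in both) and, like Stata's default behaviour, keeps the master value when both datasets contain the same variable. The function name merge_on_id, the record layout and the example data are hypothetical, and the "multiple merge on id" case is not covered.

```python
"""Sketch of a simple one-to-one merge on id (hypothetical names; Python, not Stata)."""
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_on_id(master: List[Record], using: List[Record], key: str = "id") -> List[Record]:
    """Merge two datasets one-to-one on `key`, adding a Stata-style _merge code.

    _merge = 1: key only in master; 2: key only in using; 3: key in both.
    For variables present in both datasets the master value is kept.
    """
    # Index the using data by key, refusing non-unique ids (a 1:1 merge needs them unique).
    using_by_key: Dict[Any, Record] = {}
    for rec in using:
        if rec[key] in using_by_key:
            raise ValueError(f"{key} {rec[key]!r} is not unique in the using data")
        using_by_key[rec[key]] = rec

    merged: List[Record] = []
    seen = set()
    for rec in master:
        k = rec[key]
        if k in seen:
            raise ValueError(f"{key} {k!r} is not unique in the master data")
        seen.add(k)
        out = dict(rec)
        if k in using_by_key:
            for var, value in using_by_key[k].items():
                out.setdefault(var, value)  # master value wins on conflicts
            out["_merge"] = 3
        else:
            out["_merge"] = 1
        merged.append(out)

    # In this sketch, observations found only in the using data come last.
    for k, rec in using_by_key.items():
        if k not in seen:
            out = dict(rec)
            out["_merge"] = 2
            merged.append(out)
    return merged


if __name__ == "__main__":
    cases = [{"id": 1, "age": 54}, {"id": 2, "age": 61}]
    exposure = [{"id": 2, "conc": 32}, {"id": 3, "conc": 41}]
    for row in merge_on_id(cases, exposure):
        print(row)
```

In the demo, id 1 is flagged 1, id 2 is flagged 3 and id 3 is flagged 2; unmatched observations like these are usually the first thing to inspect after a merge.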
Identifying a moving target
Scenario: we have data for each town giving the chemical concentration for each year between 1982 and 2002.
Problem: counting backwards from 2002, we need to identify the year in which the chemical concentration changed from its 2002 level.
Why? We need to overwrite the 2002 value with a new value, and then keep overwriting earlier years, working backwards, until we reach the year in which the value changed.

Identifying a moving target (2)
rescode   y1990  y1991  y1992
1010113      65     32     32
1010114      41     41     41
1010115      78     23     23
1010116      44     44     44
1010117      82     82     29
1010118      25     25     25
1010119      12     12      6
1010120      40     12      7
Identifying a moving target (3)
We will use a forval loop to compare each year’s observed value with the observed value for the previous year.
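A minimal sketch of that loop, written in Python rather than Stata so that it is self-contained: for one town it walks backwards from the final year, comparing each year with the previous year, and stops at the first change. It assumes each town's series is held as a year-to-value mapping (in the Stata data these are the wide variables y1982 to y2002), and it reads "the year in which the chemical changed" as the start of the final run of values equal to the 2002 level. The names run_start, overwrite_final_run and new_value are hypothetical. The demo rows are the 1990 to 1992 columns of the table, with 1992 standing in for the final year.

```python
"""Sketch of the 'moving target' search (hypothetical names; Python, not Stata)."""
from typing import Dict

Series = Dict[int, int]  # year -> observed concentration for one town


def run_start(series: Series, first_year: int, last_year: int) -> int:
    """Earliest year from which every value up to last_year equals the last_year value."""
    start = last_year
    # Walk backwards, comparing each year with the previous year (cf. a forval loop).
    for year in range(last_year, first_year, -1):
        if series[year] != series[year - 1]:
            break  # the value changed between year - 1 and year
        start = year - 1  # previous year still matches, so the run extends back
    return start


def overwrite_final_run(series: Series, first_year: int, last_year: int,
                        new_value: int) -> Series:
    """Return a copy with new_value written from the start of the final run to last_year."""
    out = dict(series)
    for year in range(run_start(series, first_year, last_year), last_year + 1):
        out[year] = new_value
    return out


# Rows from the slide; only y1990-y1992 are shown there (the full data run 1982-2002).
TABLE: Dict[int, Series] = {
    1010113: {1990: 65, 1991: 32, 1992: 32},
    1010114: {1990: 41, 1991: 41, 1992: 41},
    1010115: {1990: 78, 1991: 23, 1992: 23},
    1010116: {1990: 44, 1991: 44, 1992: 44},
    1010117: {1990: 82, 1991: 82, 1992: 29},
    1010118: {1990: 25, 1991: 25, 1992: 25},
    1010119: {1990: 12, 1991: 12, 1992: 6},
    1010120: {1990: 40, 1991: 12, 1992: 7},
}

if __name__ == "__main__":
    for rescode, series in TABLE.items():
        print(rescode, "final run starts in", run_start(series, 1990, 1992))
```

For 1010113 the final run starts in 1991 (32, 32), so overwrite_final_run replaces 1991 and 1992; for 1010117 the last two years already differ, so only 1992 is overwritten.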
Summary
Identifying duplicates: can be done by hand or automatically using the “duplicates” set of commands
Use of the merge command: to merge on a specific variable, and to perform a multiple merge of datasets
Generating a moving target variable: the use of the “forval” loop