
December, 2012

Training Guide for Using Random Forests to Classify Satellite Images


Introduction

This guide explains how to download and install R and then use the random forest algorithm to classify satellite imagery. Once the R software is installed the process is quite simple. To complete this guide you need to be able to digitize training areas and save them as an ESRI Shapefile. The output images will be in the GeoTiff format if the tutorial script is used, but the script can be modified to work with other image formats.

The script included with this guide must be modified before it can be used with your own data set. Near the top of the script are a few variables that must be edited to make it work for your data. The attributes that must be changed are noted throughout this guide. The image that you use for the classification can include spectral data (image bands) and other continuous or categorical data such as a DEM, climate layers, and soil maps.

Before running the script it is important that R is installed on your computer and that you have downloaded the necessary packages. Instructions for installing R and the necessary packages can be found on the Biodiversity Informatics website.

1. Download and install R on a Windows computer

Instructions for downloading and installing R can be found on the CBC R scripts and guides website: http://biodiversityinformatics.amnh.org/index.php?section=R_Scripts.

2. Digitize training areas

The script that goes with this tutorial requires that training areas are digitized as polygons and saved as an ESRI Shapefile. Any GIS software can be used for this. If you do not have access to GIS software, I suggest you try QGIS (http://www.qgis.org/), gvSIG (http://www.gvsig.gva.es/) or another open source GIS package, but any GIS will work just fine. The attribute table of the shapefile must have an attribute of data type integer that stores the class number for each class to be included in the output classified map. The numbers for this attribute must be sequential without skipping any numbers (i.e., 1,2,3,4,5,6).
The default name of the attribute (column name) for this tutorial script is "type_id". The attribute name can be changed by modifying this line in the script [attName <- 'type_id']. Additional attributes will be ignored. Here is an example of an attribute table (note that the "cover" attribute in the example below is optional and will be ignored by the script).

id    type_id    cover
0     3          Water
1     3          Water
2     4          Other
3     4          Other
4     4          Other
5     1          Tall shrub
6     1          Tall shrub
7     2          Short shrub
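Before running the script it can help to confirm that the class numbers really are sequential. The following is a minimal sketch in base R, not part of the tutorial script: the data frame is the example table typed in by hand, is_sequential is a hypothetical helper, and it assumes the numbering starts at 1 as in the example above.

```r
# Attribute table from the example above, typed in by hand for illustration.
training <- data.frame(
  id      = 0:7,
  type_id = c(3L, 3L, 4L, 4L, 4L, 1L, 1L, 2L),
  cover   = c("Water", "Water", "Other", "Other", "Other",
              "Tall shrub", "Tall shrub", "Short shrub")
)

# Hypothetical helper (an assumption, not from the tutorial script):
# TRUE when the class numbers are whole numbers running 1, 2, ..., n
# with no gaps.
is_sequential <- function(ids) {
  classes <- sort(unique(ids))
  length(classes) > 0 &&
    all(classes == round(classes)) &&
    all(classes == seq_along(classes))
}

is_sequential(training$type_id)  # classes 1, 2, 3, 4 are all present
is_sequential(c(1, 2, 4))        # class 3 is skipped
```

The first call returns TRUE and the second FALSE, so a skipped class number can be caught before any image processing starts.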

The random forest algorithm is non-parametric so it is not necessary to keep training areas homogeneous. For example, you can have a "cloud and shadow" class with both clouds and shadows in it. You can have as many polygons as you want for any class. All of the polygons for a single class (defined by type_id in this example) are combined when the script is run. In other words, all of the pixels that fall under a particular class, even if there are several training polygons for that class, will be grouped together.
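The grouping described above can be illustrated with simulated data. This is a sketch in base R under stated assumptions, not the tutorial script itself: the pixel table, the polygon_id and band1 columns, and the rule of taking every pixel when a class has fewer than numsamps pixels are all made up for illustration.

```r
set.seed(42)  # only so the illustration is repeatable

# Simulated pixels: each row is one pixel that fell inside a training polygon.
# The polygon sizes and the band1 values are invented for this example.
sizes  <- c(50, 30, 20, 40, 10, 60, 15, 25)
pixels <- data.frame(
  polygon_id = rep(0:7, times = sizes),
  type_id    = rep(c(3L, 3L, 4L, 4L, 4L, 1L, 1L, 2L), times = sizes),
  band1      = runif(sum(sizes))
)

numsamps <- 40  # samples (pixels) to draw at random for each class

# Pool the pixels by class, ignoring which polygon they came from,
# then draw up to numsamps rows at random from each pool.
pools   <- split(pixels, pixels$type_id)
samples <- do.call(rbind, lapply(pools, function(p) {
  p[sample(nrow(p), min(numsamps, nrow(p))), ]
}))

table(samples$type_id)
```

Class 2 has only 25 pixels in this simulation, so it contributes 25 samples while the other classes contribute 40 each. How the actual script handles a class with too few pixels is not stated in this guide.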

3. Editing the script

The random forest classification process is controlled through the use of a script. By modifying this script you are able to customize it for your own application. Near the top of the script there are a number of attributes that can be changed to customize it for your application. Here is a list of the attributes with an explanation of each:

shapefile = Name and path for the Shapefile (do not include the .shp extension).
numsamps = Number of samples (pixels) to select at random for each land cover class. The samples are used to train the random forest model. The larger this number the more memory you will need.
attName = Name of the attribute in the ESRI Shapefile that holds the integer land cover type identifier.
inImage = Name and path for the input satellite image (GeoTiff works and other formats probably do as well).
outImage = Name and path of the output GeoTiff image.

Notes:

When specifying the directory path in R for a Windows computer it is necessary to use a double-backslash ("\\") instead of a single backslash ("\"), or you can use a forward slash ("/"), which will also work on Apple and Linux operating systems. For example, the directory path C:\Data would be typed C:\\Data or C:/Data.

R is case sensitive. In other words "a" is different from "A". Make sure the directory path and file name are exactly as they appear when you list the directory contents. Also, note that in R variable or file names should not start with a number or most special characters. It is best to start variable names with an upper or lower-case letter.

Other parts of the script can also be modified but you will need to have a good understanding of how the different commands work. The script file must be saved as an ASCII text file.

4. Running the random forest script

To run the random forest classification click on "File => Open script" and then navigate to the script and click on "Open". The script will open in the R Editor. You can run the script one or several lines at a time, or you can run the entire script by Edit => Run all (note that the R Editor window must be selected for this option to appear). You can also type the path and the name of the script file into the R Console using this syntax: source("path and filename"). For example: source("C:\\ IE\\R_scripts\\rf_classification_windows.R").
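The settings from section 3 and the path rules in the notes can be sketched in base R as follows. Every path and value here is a hypothetical placeholder rather than something taken from the tutorial script; only the five variable names come from the list in section 3.

```r
# Hypothetical settings; replace each placeholder with your own data.
shapefile <- "C:/Data/training_areas"   # no .shp extension
numsamps  <- 100                        # pixels sampled per class
attName   <- "type_id"                  # integer class attribute
inImage   <- "C:/Data/input_image.tif"
outImage  <- "C:/Data/classified.tif"

# Inside an R string "\\" stands for one backslash, so these two
# strings name the same Windows folder written two different ways.
with_backslashes <- "C:\\Data"
with_slashes     <- "C:/Data"
nchar(with_backslashes)  # 7 characters: C : \ D a t a
same_folder <- gsub("\\\\", "/", with_backslashes) == with_slashes

# file.path() joins the pieces with forward slashes, which work on
# Windows, Apple and Linux alike.
joined <- file.path("C:", "Data", "input_image.tif")
```

Here same_folder is TRUE and joined is identical to inImage, which is why forward slashes are often the simpler choice when the same script is shared between operating systems.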
Remember to use a double backslash instead of a single backslash in the directory path, unless you are running the script in Linux, in which case you would use a single forward slash. When the processing starts, messages are printed in the R console, and once the image classification step begins a status bar is displayed so you can monitor the progress. Large images can take several hours to process. When the classification is finished you will need to check the result to see if it is okay. If it is not, you will need to select more training sites and/or edit some you have already defined.

When you exit out of R you do not need to save your workspace, although there is no harm done if you do.

Appendices

A - Citations and license information

If you cite this document we ask that you include the following information: Horning, N. 2012. Training Guide for Using Random Forests to Classify Satellite Images - vK. American Museum of Natural History, Center for Biodiversity and Conservation. Available from http://biodiversityinformatics.amnh.org/. (accessed on the date).

This document is licensed under a Creative Commons Attribution-Share Alike 3.0 License. You are free to alter the work, copy, distribute, and transmit the document under the following conditions:

You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

Any questions or comments related to this document should be sent to Ned Horning, horning@amnh.org.

B - Acknowledgements

We would like to thank the John D. and Catherine T. MacArthur Foundation for supporting this effort.
