Maintenance

Geog 176B Lecture 8

Data Collection
  One of the most expensive GIS activities
  Many diverse sources (source integration, data fusion, interoperability)
  Two broad types of collection
    Data capture (direct collection)
    Data transfer
  Two broad capture methods
    Primary (direct measurement)
    Secondary (indirect derivation)

Stages in Data Collection Projects
  Planning
  Preparation
  Digitizing / Transfer
  Editing / Improvement
  Evaluation

Data Collection Techniques
             Raster                          Vector
  Primary    Digital remote sensing images   GPS measurements
             Digital aerial photographs      Survey measurements
  Secondary  Scanned maps                    Topographic surveys
             DEMs from maps                  Toponymy data sets from atlases

Primary Data Capture
  Capture specifically for GIS use
  Raster: remote sensing
    e.g. SPOT and IKONOS satellites and aerial photography
    Passive and active sensors
  Resolution is the key consideration
    Spatial
    Spectral
    Temporal

www.spot.ucsb.edu
Imagery for GIS

Vector Primary Data Capture
  Surveying
    Locations of objects determined by angle and distance measurements from known locations
    Uses expensive field equipment and crews
    Most accurate method for large scale, small areas
  GPS
    Collection of satellites used to fix locations on Earth's surface
    Differential GPS used to improve accuracy

Total Station
Pen/Portable PC and GPS

Secondary Geographic Data Capture
  Data collected for other purposes can be converted for use in GIS
  Raster conversion
    Scanning of maps, aerial photographs, documents, etc.
    Important scanning parameters are spatial and spectral (bit depth) resolution
Scanner

Raster to vector conversion

Vector Secondary Data Capture
  Collection of vector objects from maps, photographs, plans, etc.
  Digitizing
    Manual (table)
    Heads-up and vectorization
  Photogrammetry: the science and technology of making measurements from photographs, etc.
Digitizer

Data Transfer
  Buy vs. build is an important question
  Many widely distributed sources of GI
  Includes geocoding
  Key catalogs include
    Geodata.gov
    Geography Network
  Access technologies
    Translation
    Direct read

Managing Data Capture Projects
  Key principles
    Clear plan, adequate resources, appropriate funding, and sufficient time
  Fundamental tradeoff among
    Quality, accuracy, speed and price
  Two strategies
    Incremental
    Blitzkrieg
  Alternative resource options
    In house
    Specialist external agency

Map scale       Ground distance corresponding to 0.5 mm map distance
1:1250          62.5 cm
1:2500          1.25 m
1:5000          2.5 m
1:10,000        5 m
1:24,000        12 m
1:50,000        25 m
1:100,000       50 m
1:250,000       125 m
1:1,000,000     500 m
1:10,000,000    5 km
A useful rule of thumb is that positions measured from maps are accurate to about 0.5 mm on the map. Multiplying this by the scale of the map gives the corresponding distance on the ground.
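The rule of thumb above is a single multiplication, and a short sketch can reproduce the table. The function name `ground_distance_m` and its default argument are illustrative choices, not part of the lecture.

```python
def ground_distance_m(scale_denominator, map_error_mm=0.5):
    """Ground distance in metres for a given error on a 1:scale_denominator map.

    map_error_mm defaults to the 0.5 mm rule of thumb from the slide.
    """
    # Map millimetres times the scale denominator gives ground millimetres;
    # dividing by 1000 converts to metres.
    return map_error_mm * scale_denominator / 1000.0


if __name__ == "__main__":
    for denom in (1250, 24000, 1000000):
        print(f"1:{denom:,} -> {ground_distance_m(denom):g} m")
```

For a 1:24,000 sheet this gives 12 m, which matches the table row for that scale.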
Positional Accuracy (cont.)
  Within a database, a typical UTM coordinate pair might be:
    Easting 579124.349 m
    Northing 5194732.247 m
  If the database was digitized from a 1:24,000 map sheet, the last four digits in each coordinate (units, tenths, hundredths, thousandths) would be questionable

Testing Positional Accuracy
  Use an independent source of higher accuracy:
    find a larger scale map
    use precision GPS
  Use internal evidence:
    digitized polygons that are unclosed, lines that overshoot or undershoot nodes, etc. are indications of error
    sizes of gaps, overshoots, etc. may be a measure of positional accuracy

Testing Accuracy (cont.)
  Compute accuracy from knowledge of the errors introduced by different sources, e.g.,
    1 mm in source document
    0.5 mm in map registration for digitizing
    0.2 mm in digitizing
  If sources combine independently, we can get an estimate of overall accuracy:
    $(1^2 + 0.5^2 + 0.2^2)^{0.5} \approx 1.14$ mm
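The combination of independent error sources above is a root-sum-of-squares calculation. A minimal sketch, assuming the three error values from the slide; the function name `combined_error` is illustrative.

```python
import math


def combined_error(*errors_mm):
    """Root-sum-of-squares of independent error components (same units in and out)."""
    return math.sqrt(sum(e * e for e in errors_mm))


if __name__ == "__main__":
    # Source document, map registration, and digitizing errors from the slide (mm).
    total = combined_error(1.0, 0.5, 0.2)
    print(f"{total:.2f} mm")
```

The largest component dominates the result: dropping the 0.2 mm digitizing error changes the total only slightly, so improving the source document matters most.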
Definitions
  Database: an integrated set of data (attributes) on a particular subject
  Geographic (= spatial) database: database containing geographic data of a particular subject for a particular area
  Database Management System (DBMS): software to create, maintain and access databases

A GIS links attribute and spatial data
  Attribute Data
    Flat File
    Relations
  Map Data
    Point File
    Line File
    Area File
    Topology
    Theme

Advantages of Databases over Files
  Avoids redundancy and duplication
  Reduces data maintenance costs
  Faster for large datasets
  Applications are separated from the data
  Applications persist over time
  Support multiple concurrent applications
  Better data sharing
  Security and standards can be defined and enforced
Disadvantages of Databases over Files
  Expense
  Complexity
  Performance, especially for complex data types
  Integration with other systems can be difficult

Types of DBMS Model
  Hierarchical
  Network
  Relational - RDBMS
  Object-oriented - OODBMS
  Object-relational - ORDBMS
  Relational Databases rule now

Characteristics of DBMS (1)
  Data model support for multiple data types
    e.g. MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, OLE Object (MS Object linking and embedding), Hyperlink, Lookup Wizard
  Load data from files, databases and other applications
  Index for rapid retrieval

Characteristics of DBMS (2)
  Query language: SQL
  Security: controlled access to data
    Multi-level groups (e.g. census, NGA)
  Controlled update using a transaction manager
  Versioning
  Backup and recovery
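Two of the characteristics listed above, indexing and controlled update through transactions, can be sketched with Python's built-in `sqlite3` module. The `parcels` table, its columns, and its rows are hypothetical illustration data, and SQLite stands in here for whichever DBMS is actually used.

```python
import sqlite3

# In-memory database; the parcels table and its contents are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT, area_m2 REAL)"
)
# An index on owner supports rapid retrieval by that column.
conn.execute("CREATE INDEX idx_parcels_owner ON parcels (owner)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "Smith", 520.0), (2, "Jones", 310.5)],
)
conn.commit()

try:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE parcels SET area_m2 = 0 WHERE id = 1")
        # Duplicate primary key: this fails, so the UPDATE is undone too.
        conn.execute("INSERT INTO parcels VALUES (2, 'Duplicate', 1.0)")
except sqlite3.IntegrityError:
    pass

area = conn.execute("SELECT area_m2 FROM parcels WHERE id = 1").fetchone()[0]
print(area)
```

Because both statements belong to one transaction, the failed insert leaves the parcel area at its original value rather than at zero.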
Characteristics of DBMS (3)
  Applications
    Forms builder
    Reportwriter
    Internet Application Server
    CASE tools
  Programmable API (Applications program interface)

Role of DBMS
  System: Geographic Information System
    Task: Data load, Editing, Visualization, Mapping, Analysis
  System: Database Management System
    Task: Storage, Indexing, Security, Query
  Data

Relational DBMS (1)
  Data stored as tuples (tup-el), conceptualized as tables
  Table: data about a class of objects
    Two-dimensional list (array)
    Rows = objects
    Columns = object states (properties, attributes)
Table
  Row = object
  Vector feature
  Column = attribute

Relational DBMS (2)
  Most popular type of DBMS
    Over 95% of data in DBMS is in RDBMS
  Commercial systems
    IBM DB2
    Informix
    Microsoft Access
    Microsoft SQL Server
    Oracle
    Sybase

SQL
  Structured (Standard) Query Language (pronounced SEQUEL)
  Developed by IBM in 1970s
  Now de facto and de jure standard for accessing relational databases
  Three types of usage
    Stand alone queries
    High level programming
    Embedded in other applications

Types of SQL Statements
  Data Definition Language (DDL)
    Create, alter and delete data
    CREATE TABLE, CREATE INDEX
  Data Manipulation Language (DML)
    Retrieve and manipulate data
    SELECT, UPDATE, DELETE, INSERT
  Data Control Language (DCL)
    Control security of data
    GRANT, CREATE USER, DROP USER

Relational Join
  Fundamental query operation
  Occurs because
    Data created/maintained by different users, but integration needed for queries
  Table joins use common keys (column values)
  Table (attribute) join concept has been extended to geographic case

Join
  Record ID   Address           #cars
  1241        123 State St.     3
  1242        1801 Main St.     1
  1243        2106 Elm St.      2
  1244        7262 Pine Drive   1

  1241   Ford     2003
  1241   Subaru   2000
  1241   Honda    1999

  1241   123 State St.   Ford
  1241   123 State St.   Subaru
  1241   123 State St.   Honda
  1242   1801 Elm St.    Kia

Spatial indexing
  Many maps tiled
  B-tree (Balanced)
  Grid indexing
  Quad tree: Points/regions
  R-tree (Based on MBR)
  New global/spatial grids:
    QTM
    Go2 Grids
      38:53:22.08N 077:02:06.86W
      US.DC.WAS.54.18.28.83.11
      US.CA.SBA.UCSB.UCEN

Spatial Search: Gateway to Spatial Analysis
  Overlay is a spatial retrieval operation that is equivalent to an attribute join.
  Buffering is a spatial retrieval around points, lines, or areas based on distance.
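The attribute join from the Join slide can be sketched with Python's built-in `sqlite3` module. The table names `households` and `cars` and the column names are assumptions made for illustration. Only the rows legible on the slide are loaded, so the Kia record, which has no visible source row, is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the values are those shown on the slide.
conn.execute(
    "CREATE TABLE households "
    "(record_id INTEGER PRIMARY KEY, address TEXT, n_cars INTEGER)"
)
conn.execute("CREATE TABLE cars (record_id INTEGER, make TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO households VALUES (?, ?, ?)",
    [
        (1241, "123 State St.", 3),
        (1242, "1801 Main St.", 1),
        (1243, "2106 Elm St.", 2),
        (1244, "7262 Pine Drive", 1),
    ],
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [(1241, "Ford", 2003), (1241, "Subaru", 2000), (1241, "Honda", 1999)],
)

# Join the two tables on their common key, record_id.
rows = conn.execute(
    "SELECT h.record_id, h.address, c.make "
    "FROM households AS h JOIN cars AS c ON h.record_id = c.record_id "
    "ORDER BY c.year DESC"
).fetchall()

for row in rows:
    print(row)
```

Each car row picks up the address of the household with the same record ID, so the one-to-many relationship repeats the address once per car. An inner join drops households with no matching car rows; a LEFT JOIN would keep them with empty car fields.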