Sei sulla pagina 1di 15

Database Concepts

Objectives

o Why should a database be used?

o What do databases have to offer?

o What should be taken into account when designing a database and setting it up?

o What are objects of a database, and how can they be used?

o Considerations in deciding whether to use a database

Purpose

Increase your awareness of the importance of managing information, and the vital role
that databases play in accomplishing that.

Learn to set up data properly using a systematic approach that starts with general
planning and ends with a database ready for use.

End results should include ability to:

o describe information management, what it is composed of, and why it is important;

o develop appropriate data structures and management procedures for a study / project
of your choosing.

History

o 1980s: dramatic reduction in the differences in capabilities of mainframe, mini-


computer, and microcomputer systems

o Virtual explosion in computer power, accessibility

Capability of storing terabytes of data


Information Management

o Common usage: worksheets (eg. Excel, Lotus)


ascii files

o Databases: can do the same thing as worksheets or ascii files, but allow
you to have the data represented as part of a bigger picture

“Relational” part of the name means that a given table in the database IS RELATED
TO, or can be linked with other related datasets (tables) in the database.

Relating Tables

By using variables that are found in both tables of a relational database, you can
merge information from these tables to generate 1 view of the results.

TABLE 1 TABLE 2
Name Name
Location Email address
Phone
Position
Department

Match up names, select people found in Peavy


Hall, and give me their e-mail addresses.

Peavy Hall Email Addresses


Name
Location
Email address

Database Design

Step 1: Determine what datasets you are interested in maintaining and viewing
regularly.

How would this data appear if it all had to be represented in a table?


o No introductory paragraphs
o No end comments

EXCEL Worksheet Containing Meteorology Data:


Dataset MetLocI Date / AirT3 AirTdel3 AirTMa AirTMi AirT
Code d Time m m1 x n Commen
ts
HOEM_AIRT HOEM 12/18/19 -2.63 0.15 -2.58 -2.69
93
19:30:00
HOEM_AIRT HOEM 12/18/19 -2.65 0.08 -2.54 -2.73
93
19:40:00
HOEM_AIRT HOEM 12/18/19 -2.42 0.02 -2.35 -2.51
93
19:50:00
HOEM_AIRT HOEM 12/18/19 -2.56 0.11 -2.40 -2.74
93
20:00:00
HOEM_AIRT HOEM 12/18/19 -2.81 0.09 -2.76 -2.85
93
20:10:00
HOEM_AIRT HOEM 12/18/19 -2.67 0.04 -2.57 -2.77
93
20:20:00
HOEM_AIRT HOEM 12/18/19 -2.45 0.04 -2.40 -2.55
93
20:30:00
HOEM_AIRT HOEM 12/18/19 -2.56 0.02 -2.48 -2.63
93
20:40:00
HOEM_AIRT HOEM 12/18/19 -2.56 0.01 -2.52 -2.61
93
20:50:00
HOEM_AIRT HOEM 12/18/19 -2.51 -4.0e-03 -2.48 -2.54
93
21:00:00

The file is already organized with 1 variable per column and is straightforward for
moving into a database table.
Ascii file:

PROGRAM NAME - Breckenridge


LOCATION - MacDONALD
ELEVATION IN FEET - 542
DATE(S) MEASUREMENTS WERE TAKEN – every 15 minutes from 1/3/91 7:30:00
to 1/3/91 13:45:00
VARIABLES: Discharge Rate,Discharge Quality,Water Temperature, Water
Temperature Quality,Conductivity,Conductivity Quality

45.11,good,3.40,good,29.60,good
45.11,good,3.60,good,29.60,good
44.26,good,3.80,good,29.60,good
43.44,good,3.90,good,29.60,good
43.44,good,4.00,good,29.60,good
43.44,good,4.60,good,29.60,good
42.62,good,4.80,good,29.60,good
41.80,good,4.90,good,29.60,good
41.80,good,5.20,good,29.60,good
42.62,good,5.00,good,29.60,good
42.62,good,4.70,good,29.60,good
42.62,good,4.40,good,30.30,good
42.62,good,5.60,good,30.30,good
42.62,good,6.20,good,28.90,good
43.44,good,5.70,good,29.60,good
44.26,good,5.60,good,29.60,good
45.93,good,5.40,good,29.60,good
46.78,good,5.60,good,29.60,good
47.63,good,6.10,good,28.90,good
48.51,good,6.50,good,28.30,good
50.26,good,6.60,good,28.30,good
51.11,good,6.70,good,28.30,good
52.02,good,6.60,good,28.20,good
52.90,good,6.10,good,28.90,good
52.90,good,6.00,good,30.30,good
52.90,good,6.60,good,29.60,good

You will need to go through this file and figure out how it would be represented if it were
all tabular. The headings will need to be represented as columns or variables in the table.

Possible result of cleaning up ascii file for stream hydrology:

Breckenridge,MacDONALD,1/3/91 7:30:00,45.11,good,3.40,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,45.11,good,3.60,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,44.26,good,3.80,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,3.90,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,4.00,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,4.60,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,4.80,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,41.80,good,4.90,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,41.80,good,5.20,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,5.00,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,4.70,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,4.40,good,30.30,good
Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,5.60,good,30.30,good
Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,6.20,good,28.90,good
Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,5.70,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,44.26,good,5.60,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,45.93,good,5.40,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,46.78,good,5.60,good,29.60,good
Breckenridge,MacDONALD,1/3/91 7:30:00,47.63,good,6.10,good,28.90,good
Breckenridge,MacDONALD,1/3/91 7:30:00,48.51,good,6.50,good,28.30,good
Breckenridge,MacDONALD,1/3/91 7:30:00,50.26,good,6.60,good,28.30,good
Breckenridge,MacDONALD,1/3/91 7:30:00,51.11,good,6.70,good,28.30,good
Breckenridge,MacDONALD,1/3/91 7:30:00,52.02,good,6.60,good,28.20,good
Breckenridge,MacDONALD,1/3/91 7:30:00,52.90,good,6.10,good,28.90,good
Breckenridge,MacDONALD,1/3/91 7:30:00,52.90,good,6.00,good,30.30,good
Breckenridge,MacDONALD,1/3/91 7:30:00,52.90,good,6.60,good,29.60,good

It may look like the first two columns of this table are repeated a lot, but it allows you to
load all the information in a database table, and the more populated / complex your
database becomes, the more you’ll appreciate having this information within the table. If
you think about what would happen if you had datasets like this for 10 different programs
covering 20 different locations spanning a 5 year period, you’ll begin to understand why
it is helpful to add the program, location and date fields to the table.

Database Design

Step 2: Determine how datasets you’re interested in relate to one another.

o Step back and figure out how the datasets fit into a larger picture.

o What would be a logical / useful way of linking the tables together to allow
fields from them to be viewed simultaneously?

Meteorology & Stream Discharge Measurements:

Meteorology station: measures precipitation,


temperature, humidity; averaged every 10 minutes
Stream gauge: measures discharge, temperature,
conductivity; averaged every 15 minutes

It may not look like meteorological data has much in common with streamflow. On a
larger scale, time and location would be a logical way of linking them.

Relating meteorology and hydrology tables:

Meteorology Hydrology
Station Station
Date / Time Date / Time
Precipitation Discharge
Air Temperature Water Temperature
Humidity Conductivity

Select meteorology station A, stream gauge B,


match by date & show average air
temperature, total precipitation, and total
discharge by day.

Temperature, Precipitation & Stream Discharge


Date
Average Air Temperature
Total Precipitation @ Meteorology Station
Total Discharge @ Stream Gauge
If the tables do not already contain the date fields, which are important in pulling them
together, you need to be sure to add them. If this is not determined and taken care of at
the start, it creates a lot of extra work to fix this in the future.

Relationships between the meteorology and hydrology tables can be even more
formalized by setting up a location table. It may look something like this:

MEASURING SITES:

Location LocID Latitude Longitude Elevation Description Basin


Name (ft)
Beacon BENM 49.681 38.422 E 176 Cloverdal
met N e
station
Borney BOYS 56.994 50.981 E 264.43 Lorendcoo
stream N p
gauge
Boonswor BRHM 56.483 50.05647 279.0754 Lorendcoo
th met N E 6 p
station
Canada CAAM 45.753 48.42128 264.4289 Frosty
met N E 2
station
Cornwealt COHS 49.664 38.52343 290.2425 Cloverdal
h stream N E 6 e
gauge

Beacon met station and Cornwealth stream gauge are found in the same basin according
to this table.
Since the meteorology table and streamflow tables both contain a column for station
name, you can then refer to any records in the meteorology table and the stream table that
have a station found in a given basin:

MEASURING SITES
location name
locid
latitude
longitude
elevation
description
Meteorology Hydrology
basin
Station Station
Date / Time Date / Time
Precipitation Discharge
Air Temperature Water Temperature
Humidity Conductivity

This type of setup would allow you to:

1) Find all records in the measuring sites table where the basin is
“Lorendcoop”.

2) List the station name for those records.

3) Using that narrowed down list of station names, show the total daily
precipitation and stream discharges, and average air and water
temperatures sorted by date for the month of October 2001.
Resulting Table / View:
Stream Meteorolog Basin Date Tota Avg Air Total Avg
Location y Location l Ppt Temp Discharge Water
Temp
Borney Boonsworth Lorendcoop 10/1/0 0 54 2.09 38
stream met station 1
gauge
Borney Boonsworth Lorendcoop 10/2/0 0.02 49 2.73 39
stream met station 1
gauge
Borney Boonsworth Lorendcoop 10/3/0 0.01 51 2.51 37
stream met station 1
gauge

Database Design

Step 3: Formally set up tables and their relationships in the database.

Databases are populated through tables within the database.


o If you think of a database like a house, that database will be made up of tables,
like a house is made up of lots of rooms.
o Doorways in a house tie the rooms together. Relationships, or links using
fields in common tie tables of a database together.
o To access information from more than one table of a relational database at
once, the links that are set up in the design of the database will be used.

MEASURING SITES
location name (primary key)
locid
latitude
longitude
elevation
description
Meteorology Hydrology
basin
Station (foreign key) Station (foreign key)
Date / Time Date / Time
Precipitation Discharge
Air Temperature Water Temperature
Humidity Conductivity

If we go back to the example we’ve been working with, we have 3 tables to set up in the
database: measuring sites, meteorology, and hydrology.
o Location name in the measuring sites table needs to be present in order for
the hydrologists and meteorologists to link up with that information:
location name will serve the role as PRIMARY KEY.
PRIMARY KEY: unique identifier of a given field; no duplicate entries are
allowed for the field serving this role.

Any locations (stations) we want to use in our database must have an entry
for that location before using it in linked tables.

o Entry for station fields in the meteorology and hydrology tables must be
represented in the location name field of the measuring sites table: station
fields will serve the role as FOREIGN KEY.

FOREIGN KEY: field found in a dependent table which must contain a


matching entry in the primary key table / field.
Specifying the primary keys and foreign keys of a relational database tells how you link
the tables together.

If you try to make an entry in a foreign key field that is not present in your primary key
table, the database won’t allow it. It would return a primary key – foreign key constraint
error and will disregard the entry. This provides database integrity.

In the house analogy, the primary key – foreign key relationships provide doorways to go
from room-to-room in the house.

MEASURING SITES
location name (primary key)
locid
latitude
longitude
elevation
description
Meteorology Hydrology
basin
Station (foreign key) Station (foreign key)
Date / Time Date / Time
Precipitation Discharge
Air Temperature Water Temperature
Humidity Conductivity

In our example:
o the station names for the meteorology and hydrology tables aren’t the same,
but those stations would be represented in the measuring sites table.
o Using records in the measuring sites table, we are able to find all hydrology
and meteorology stations from a given basin, and
o merge fields from the hydrology and meteorology tables using the basin and
dates they have in common.

Database Usage
MS Access is one of the most common database software packages. This package
contains a series of “objects” that it would be useful to become familiar with.

MS Access objects:
o tables
o queries
o forms
o reports
o pages
o macros
o modules
MS Access Objects

MS Access objects are different ways of either viewing or processing / updating the data
stored in your database.

MS Access Tables

A table is a “collection of data about a specific topic”, such as hydrology or meteorology.

Using a separate table for each topic means that you store that data only once, which
makes your database more efficient, and reduces data-entry errors.

Tables are the official place where database records are stored. All the other objects in
MS Access are based on records in the tables.

MS Access Queries

Queries are used to view, change, and analyze data in different ways. They can also be
used as the source of records for forms, reports, and data access pages.

Four basic types of queries include;


o select – calls up a given group of records matching the constraints specified;

o update – changes / updates the values of field / record combinations meeting


specified criteria;

o append – adds new records to the bottom of a table;

o delete – removes records matching specified constraints from a table.

Basic statistical functions (eg. mean, max, min, standard deviation) can also be performed
with queries.
MS Access Forms

Forms are a way of viewing the information in your tables in other types of layouts. This
can serve a number of purposes:
o create a data-entry form to enter data into a table;
o create a custom dialog box to accept user input and then carry out an action based
on that input;
o create a switchboard form to open other forms or reports.

Most of the information in a form comes from an underlying record source (eg. table or
query). Other information in the form is stored in the form’s design.

MS Access Reports

Reports are a way to present your data in a printed format.

You have control over the size and appearance of everything on a report, so you can
display the information the way you want to see it.

Most of the information in a report comes from an underlying record source (eg. table or
query). Other information in the report is stored in the report’s design.

MS Access Pages

A data access page is a special type of Web page designed for viewing and working with
data from an Internet or intranet – data that is stored in an MS Access database or an MS
SQL server database.

A data access page may also include data from other sources such as MS Excel.
Data access pages are designed in MS Access. However, they’re stored in a separate file
outside of MS Access. A shortcut to that file is automatically added in the database
window.

MS Access Macros

A macro is a set of one or more actions that each perform a particular operation, such as
opening a form or printing a report.

Macros can help you to automate common tasks. For example, you can run a macro that
prints a report when a user clicks a command button on a form.

MS Access Modules

A module is a collection of Visual Basic for Applications declarations and procedures that
are stored together as a unit.

If you know how to write what you want in Visual Basic, you can bypass the queries and
macros, and go right to the modules.

Considerations in Deciding to use a Database

Under what circumstances is it worth my time to put my datasets into a database


program? When am I better off with an ascii, or some other representation of my data?

CONSIDER:
o size of file

o amount of data / number of tables

o what will you be doing with the data?


 searches, grouping, calculations, summary stats can be very efficient in
databases
 plotting, mapping, more complex statistics (eg. regression, ANOVA)
should be done in other software.

Considerations in Deciding to use a Database (cont’d)

What pieces of software can easily deal with database files?

MS Access and SQL Server tables can easily be saved / exported in the following
formats:
• dbase III, IV or 5 (*.dbf)
• Microsoft Excel 3, 4, 5-7 or 97-2000 (*.xls)
• HTML documents (*.html, *.htm)
• Lotus 1-2-3 WJ2, WK1 or WK3
• Paradox 3, 4, 5 or 7-8 (*.db)
• Text files (*.txt, *.csv, *.tab, *.asc)
• Microsoft Active Server Pages (*.asp)
• Microsoft IIS 1-2 (*.htx, *.idc)
• Rich text format (*.rtf)
• Microsoft Word Merge (*.txt)
• Microsoft Access databases (*.mdb)
• ODBC databases

Considerations in Deciding to use a Database (cont’d)

What pieces of software can easily deal with database files?

The following software packages have options for calling up database tables:

• MS Excel: Data  Get External Data  New Database Query

• ArcView:
o Using ODBC (open database connectivity), database tables can be linked
to ArcView.
o ArcView’s SQL connect feature allows you to query a database using SQL
and store the returned records in an ArcView table.

• SAS: File  Import  Microsoft Access table  specify database, table

• S-Plus: File  Import  From file  Files of type MS Access file (*.mdb)

CONCLUSIONS

Why should a database be used?


To make the whole greater than the individual parts.

What do databases have to offer?


They pull information from various sources together so you can see how 1 table /
dataset fits in to the greater picture.

What should be considered to set up a database?


What does the data consist of?
How will the data be used?
How should the data be organized to fit into the overall picture?

Potrebbero piacerti anche