
ITECH 2004 DATA MODELLING

ASSIGNMENT 1
Data mining project related to
mobility using GPS track logs.

Student Name: Prabin Bhusal

Student ID: 30356718

Date: 30/08/2019
Contents
Introduction
ER Diagram
Assumption
Normalization
Dependency Diagram and Relational Schema
Discussion
Entity and Attributes Table
Reference
Introduction
This report presents an entity-relationship diagram (ERD), the normalization of the design, and
the resulting relational schema for the database of a data mining project. The database stores
GPS track logs and keeps a record of the calculations and transformations that have been carried
out to convert these track logs into different formats. The entities and attributes are listed,
and each table is normalized in the report. Algorithms are applied to the tracks and used in the
experiments, and each experiment produces a result.
ER Diagram

Assumption
1. The relationship between FORMAT and TRACK is many to one: one track can have many
formats, while each format belongs to one track.
2. The relationship between TRACK and POINT is many to many, so a composite entity,
REGISTER, is included. The project description states that a track has multiple points,
and the composite entity resolves this many-to-many relationship.
3. The relationship between TRACK and ALGORITHM is one to many: one track can use
many algorithms.
4. The relationship between ALGORITHM and EXPERIMENT is many to one: each
experiment is carried out using many algorithms.
5. The relationship between EXPERIMENT and RESULT is one to one because each
experiment produces one result.
6. ALGORITHM is a supertype with ALGORITHM_TYPE as its discriminator, because there
are two types of algorithm, SIMPLE and COMPLEX. The subtypes are disjoint.
7. The RESULT table contains EXPMT_NAME as a foreign key because each result is
associated with a specific experiment.
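
Assumption 2 can be illustrated with a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. The reduced column set, the composite primary key on REGISTER, the helper function names, and the sample rows are all assumptions made for illustration; none of them come from the project description.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

conn.executescript("""
CREATE TABLE TRACK (
    TRACK_ID   INTEGER PRIMARY KEY,
    TRACK_NAME TEXT
);
CREATE TABLE POINT (
    POINT_ID INTEGER PRIMARY KEY
);
-- REGISTER is the composite (bridge) entity between TRACK and POINT.
-- Assumption: the pair (TRACK_ID, POINT_ID) serves as its primary key.
CREATE TABLE REGISTER (
    TRACK_ID      INTEGER NOT NULL REFERENCES TRACK(TRACK_ID),
    POINT_ID      INTEGER NOT NULL REFERENCES POINT(POINT_ID),
    REGISTER_TIME TEXT,
    PRIMARY KEY (TRACK_ID, POINT_ID)
);
""")

# Point 10 is registered on both tracks, which is what makes the relationship many to many.
conn.executemany("INSERT INTO TRACK VALUES (?, ?)", [(1, "track one"), (2, "track two")])
conn.executemany("INSERT INTO POINT VALUES (?)", [(10,), (11,)])
conn.executemany(
    "INSERT INTO REGISTER VALUES (?, ?, ?)",
    [(1, 10, "08:00:00"), (1, 11, "08:00:05"), (2, 10, "18:30:00")],
)


def points_of_track(track_id):
    """Return the POINT_IDs registered on one track, in ascending order."""
    rows = conn.execute(
        "SELECT POINT_ID FROM REGISTER WHERE TRACK_ID = ? ORDER BY POINT_ID",
        (track_id,),
    )
    return [row[0] for row in rows]


def tracks_of_point(point_id):
    """Return the TRACK_IDs on which one point is registered, in ascending order."""
    rows = conn.execute(
        "SELECT TRACK_ID FROM REGISTER WHERE POINT_ID = ? ORDER BY TRACK_ID",
        (point_id,),
    )
    return [row[0] for row in rows]
```

Both directions of the relationship are answered from REGISTER alone, so neither TRACK nor POINT needs a repeating group of identifiers.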

Normalization
Normalization is a process for evaluating and correcting table structures to minimize data
redundancies, thereby reducing the likelihood of data anomalies. The normalization process
involves assigning attributes to tables based on the concepts of determination and functional
dependency. Normalization works through a series of stages called normal forms. The first
three stages are described as first normal form (1NF), second normal form (2NF), and third
normal form (3NF). From a structural point of view, 2NF is better than 1NF, and 3NF is better
than 2NF. For most purposes in business database design, 3NF is as high as we need to go in the
normalization process.

Although normalization is a very important ingredient in database design, we should not
assume that the highest level of normalization is always the most desirable. Generally, the
higher the normal form, the more relational join operations we need to produce a specified
output. Also, more resources are required by the database system to respond to end-user
queries. A successful design must also consider end-user demand for fast performance.
Therefore, we will occasionally need to denormalize some portions of a database design to
meet performance requirements. Denormalization produces a lower normal form; for example, a
table in 3NF may be converted to 2NF through denormalization.
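
The idea of determination can be made concrete with a small sketch in Python. The function name `holds` and the sample TRACK rows are hypothetical and used only for illustration. A sample of rows can refute a functional dependency but cannot prove it, because a dependency is a rule about all valid data.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not contradicted by the rows.

    rows is a list of dicts; determinant and dependent are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for a key and returns it afterwards,
        # so a different value for the same key contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the TRACK table.
track_rows = [
    {"TRACK_ID": 1, "TRACK_NAME": "run", "TRACK_LOCATION": "north"},
    {"TRACK_ID": 2, "TRACK_NAME": "ride", "TRACK_LOCATION": "north"},
    {"TRACK_ID": 3, "TRACK_NAME": "walk", "TRACK_LOCATION": "south"},
]

id_determines_name = holds(track_rows, ["TRACK_ID"], ["TRACK_NAME"])
location_determines_name = holds(track_rows, ["TRACK_LOCATION"], ["TRACK_NAME"])
```

In this sample TRACK_ID determines TRACK_NAME, while TRACK_LOCATION does not, because the location "north" appears with two different names.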

Dependency Diagram and Relational Schema


1. FORMAT Table

Relational Schema: 2NF(FORMAT_TYPE, FORMAT_NAME, FORMAT_SIZE, FORMAT_DESC)


Conversion to 3NF

Relational Schema: 3NF(FORMAT_TYPE, FORMAT_NAME, FORMAT_SIZE)

Note: To reach 3NF, the determinant is kept in the table and the attribute that depends on it
(FORMAT_DESC) is removed.

2. TRACK Table

Relational Schema: 3NF(TRACK_ID, TRACK_NAME, TRACK_DATE, TRACK_LOCATION)


3. REGISTER Table

Relational Schema: 3NF(TRACK_ID, POINT_ID, REGISTER_TIME)


4. POINT Table

Relational Schema: 3NF(POINT_ID, POINT_LONGITUDE, POINT_LATITUDE, POINT_DATE, POINT_TIME)

5. ALGORITHM Table

Relational Schema: 2NF(ALGORITHM_TYPE, ALGORITHM_NAME, ALGORITHM_DESC, TRACK_ID)

Conversion to 3NF

Relational Schema: 3NF(ALGORITHM_TYPE, ALGORITHM_NAME, TRACK_ID)

6. SIMPLE Table

Relational Schema: 3NF(ALGORITHM_TYPE, ALGORITHM_NAME)


7. COMPLEX Table

Relational Schema: 3NF(ALGORITHM_TYPE, ALGORITHM_NAME)

8. EXPERIMENT Table

Relational Schema: 3NF(EXPMT_NAME, EXPMT_DATE, EXPMT_RANGE, EXPMT_NOTE, ALGORITHM_NAME)

9. RESULT Table

Relational Schema: 3NF(RESULT_NAME, RESULT_TYPE, RESULT_DATE, EXPMT_NAME)
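
The one-to-one relationship between EXPERIMENT and RESULT could be enforced as in the following minimal sketch, assuming Python's built-in sqlite3 module. The UNIQUE constraint on the foreign key, the reduced EXPERIMENT column set, the helper `add_result`, and the sample rows are assumptions for illustration and are not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

conn.executescript("""
CREATE TABLE EXPERIMENT (
    EXPMT_NAME TEXT PRIMARY KEY,
    EXPMT_DATE TEXT
);
-- Assumption: UNIQUE on the foreign key allows at most one RESULT per EXPERIMENT.
CREATE TABLE RESULT (
    RESULT_NAME TEXT PRIMARY KEY,
    RESULT_TYPE TEXT,
    RESULT_DATE TEXT,
    EXPMT_NAME  TEXT NOT NULL UNIQUE REFERENCES EXPERIMENT(EXPMT_NAME)
);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO EXPERIMENT VALUES (?, ?)",
    [("exp-1", "2019-08-30"), ("exp-2", "2019-08-30")],
)
conn.execute("INSERT INTO RESULT VALUES ('res-1', 'A', '2019-08-30', 'exp-1')")


def add_result(result_name, expmt_name):
    """Try to store a result; return False if a constraint rejects the row."""
    try:
        conn.execute(
            "INSERT INTO RESULT (RESULT_NAME, EXPMT_NAME) VALUES (?, ?)",
            (result_name, expmt_name),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With these rows, a second result for `exp-1` is rejected by the UNIQUE constraint, while a first result for `exp-2` is accepted.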

Discussion
1. The ALGORITHM and FORMAT tables each contain a transitive dependency, which is
why they are in 2NF. To convert them to 3NF, the dependent attributes are taken out
and placed in a separate table.
2. All of the other tables are already in 3NF because they contain no partial or
transitive dependencies.
3. In the relational schema, primary keys are underlined, and foreign keys are
underlined and set in italics.
4. The many-to-many relationship between TRACK and POINT is resolved by introducing
a new composite entity, REGISTER.
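
The conversion described in item 1 can be sketched in Python for the FORMAT table. This is an illustration under stated assumptions, not the report's own dependency diagram: it assumes FORMAT_NAME is the key and that FORMAT_TYPE determines FORMAT_DESC. The second relation `format_type_info`, the helper names, and the sample rows are hypothetical.

```python
def project(rows, attributes):
    """Project rows onto the given attributes, dropping duplicate tuples."""
    unique = {tuple(row[a] for a in attributes) for row in rows}
    return [dict(zip(attributes, values)) for values in sorted(unique)]


def natural_join(left, right, on):
    """Join two lists of dicts on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


# Hypothetical 2NF rows: the description is repeated for every file of the same type.
format_2nf = [
    {"FORMAT_NAME": "a.gpx", "FORMAT_TYPE": "GPX", "FORMAT_SIZE": 120,
     "FORMAT_DESC": "GPS Exchange Format"},
    {"FORMAT_NAME": "b.gpx", "FORMAT_TYPE": "GPX", "FORMAT_SIZE": 95,
     "FORMAT_DESC": "GPS Exchange Format"},
    {"FORMAT_NAME": "a.kml", "FORMAT_TYPE": "KML", "FORMAT_SIZE": 210,
     "FORMAT_DESC": "Keyhole Markup Language"},
]

# 3NF: keep the determinant (FORMAT_TYPE) in FORMAT and move the dependent
# attribute (FORMAT_DESC) into its own relation keyed by that determinant.
format_3nf = project(format_2nf, ["FORMAT_TYPE", "FORMAT_NAME", "FORMAT_SIZE"])
format_type_info = project(format_2nf, ["FORMAT_TYPE", "FORMAT_DESC"])

# Joining the two relations on FORMAT_TYPE rebuilds the original rows.
rejoined = natural_join(format_3nf, format_type_info, "FORMAT_TYPE")
```

Each description is now stored once per type, so changing it requires updating a single row instead of every file of that type.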
Entity and Attributes Table
Entity       Attributes        Data Type   Description
FORMAT       FORMAT_NAME       VARCHAR     PRIMARY KEY
             FORMAT_SIZE       INTEGER     VALUE
             FORMAT_DESC       VARCHAR
TRACK        TRACK_ID          INTEGER     PRIMARY KEY
             TRACK_NAME        VARCHAR
             TRACK_DATE        DATETIME    Y/M/D
             TRACK_LOCATION    VARCHAR
REGISTER     TRACK_ID          INTEGER     FOREIGN KEY
             POINT_ID          INTEGER     FOREIGN KEY
             REGISTER_TIME     DATETIME    H/M/S
POINT        POINT_ID          INTEGER     PRIMARY KEY
             POINT_LONGITUDE   INTEGER
             POINT_LATITUDE    INTEGER
             POINT_DATE        DATETIME    Y/M/D
             POINT_TIME        DATETIME    H/M/S
ALGORITHM    ALGORITHM_TYPE    VARCHAR     PRIMARY KEY
             ALGORITHM_NAME    VARCHAR
             ALGORITHM_DESC    VARCHAR
             TRACK_ID          INTEGER     FOREIGN KEY
SIMPLE       ALGORITHM_TYPE    CHAR        PRIMARY KEY, FOREIGN KEY
             ALGORITHM_NAME    VARCHAR
COMPLEX      ALGORITHM_TYPE    CHAR        PRIMARY KEY, FOREIGN KEY
             ALGORITHM_NAME    VARCHAR
EXPERIMENT   EXPMT_NAME        VARCHAR     PRIMARY KEY
             EXPMT_DATE        DATETIME    Y/M/D
             EXPMT_RANGE       INTEGER
             EXPMT_NOTE        VARCHAR
             ALGORITHM_TYPE    CHAR        FOREIGN KEY
RESULT       RESULT_NAME       VARCHAR     PRIMARY KEY
             RESULT_TYPE       CHAR
             RESULT_DATE       DATETIME    Y/M/D
             EXPMT_NAME        VARCHAR     FOREIGN KEY
Reference
No materials from any external source were used.
