Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Database Design
The most challenging aspect of a database project is design. Database design
includes the entire planning phase, cost-benefits analysis, selecting a DBMS, user
interface objectives, business rules, prototyping, testing, data migration,
interviewing users, modeling, and other activities. In this chapter, I will focus on
the most important topics related to database design: Planning, Discovery, and
Data Modeling. Other topics will be discussed in this chapter as passing concepts
and/or addressed in other chapters throughout this text.
The database design concepts have three major subjects, Planning, Discovery,
and Data-Modeling. These three subjects will be discussed in detail to ensure that
you can design a good database. In this chapter, I will discuss the approaches to
database design from both a theoretical and practical perspective. Both
approaches are important. Database design theory exists to ensure that your
design has met the standards of the database community. The practical
discussions are important to ensure that your database users will understand your
approach.
Planning
Too many database projects have been abandoned because of a lack of planning,
poor planning, and even a failure to plan. The planning phase of a database is
used to ensure that a database will actually solve a business objective. Planning a
database should include a feasibility study to ensure that the cost of implementing
a database solution is actually worth the efforts. It would be a shame to
implement a database system that later proves too expensive. Time should be
spent on evaluating the appropriate personnel to maintain the database project
once the database is live. A database is useless without the appropriate people to
support the database system. An understanding of their strengths, weaknesses,
and synergy is necessary. Make sure that the business rules can be
accommodated by the database system. A company should never have to make
changes to work with a business system; however, the reverse should be
explored. If a company currently has a database solution, an understanding of
how the old systems data will be migrated to the new database solution needs to
be understood. Users of the new database system should not be expected to re-
enter the years of data entry in the new project. Finally collecting all the
personnel who will interact directly and indirectly with the database system is
needed to ensure that the new database project objectives will be met.
The Planning process entails four phases, Discovery, Conceptual, Logical, and
Physical design. Each phase will be discussed in this next section to ensure that a
good database can be designed. As an example to illustrate the planning phases, I
will use a basic enterprise system, which will include Order, Customer Service,
Marketing, Sales Reps, and Inventory Management.
Planning Phases
Discovery
ER Diagram
Conceptual Design
Logical Design
Physical Design
Discovery
The first step in the planning phase in database design is the discovery phase.
The discovery phase is used to find facts about the company needs, employee
expectations, current systems, current practices, employee concerns, and
documentation. There are a number of techniques used to today to discover facts
about an organization, but I will focus on the four most common. The four most
common techniques used to discover facts about an enterprise are through
interviewing, document reviews, observation, and research. Each technique will
be described in the next few sections. After describing the discovery section, I
will then discuss the next important phase of planning, which is the conceptual
database design.
Interviewing
The Interviewing technique is the most practiced in database planning. There are
many interviewing approaches however. There is the one, on, one approach,
where you meet with every member of an organization who will be involved in
using the database. There is also the group approach where you can field
questions and answers based on the departments where each employee works.
Then there is a enterprise interview approach where you will generally meet with
middle management from each department that will use the new database.
Group Interview
The group interview process is much less expensive because you will spend less
time interviewing however, you do not always hear everyone's input because of
the competition for attention from the interviewer. The competition among those
interviewed to get their thoughts expressed tends to make the group interview
process chaotic.
Enterprise Interview
The enterprise interview process, like group interviews tends to be less expensive
and a little more formal. Those involved in the enterprise interview process act as
representatives for their department. The representative would usually be a
department manager, or someone who has intimate knowledge of the business
process in his or her respective department.
Examining Documents
Another important discovery technique is to examine the company's existing
documentation. The company uses the existing documentation to make
decisions. What is on the document that employees need to make decisions?
How are the documents filed? How are the documents retrieved? All of this is
important in the planning phase of database design. Not only do you want to see
the user documents, but you want the documentation on the existing system, such
as the data dictionary, flow charts, and user manuals, as well.
Observations
When articulating needs become too difficult, consider another discovery
strategy, observation. Take the time to watch how an employee, or a group of
employees, operates during a specified business process. A major disadvantage
to this process is that users may behave differently if they know they are being
watched. Another major disadvantage is the timing in which you have decided to
make observations. There is no point observing the data entry process of student
schedules during the first few weeks of a semester.
Research
In helping the development process, a lot can be gained by obtaining information
from other resources outside the company walls. There are credible bulletin
boards, textbooks on the specific model, journals on the industry you are
modeling, and user groups.
Examining the existing system can illustrate to the database developers the
problems that the user is experiencing, live. There is nothing better than a real
life experience to gain a perspective on a situation. However, there is a great
danger in evaluating an existing system. The major disadvantage in examining an
existing electronic system is that it can negatively influence your decisions in
making the new system. It is very simple to take an old model and put a new face
over it. It is very simple to take an older DBMS and upgrade it to a newer more
popular DBMS. The objective of the discovery phase is to determine how you
can improve the business process, not duplicate it. The only advantage to
reviewing the existing system is to ensure the user that you will not take away an
interface or a specific process that the users do enjoy. In order to make sure that
the learning curve is as flat as possible, you can get copies of the systems reports
and screen captures to make sure that your new design is relatively similar.
Identify Subjects
The next consideration of the conceptual design phase is to identify the subjects.
Every database project will have a subject. More than likely, your database
project will have multiple subjects, related based on common information
supplied by mutual attributes. One manner in determining subjects is based on
the discovery forms detailed in the previous section; the other is through
interviewing clients and having discussions about their objectives about the future
database. The business that you are evaluating may have multiple subjects per
view. For example, the Orders view will have a number of subjects such as;
Order, Products, Customers, and Sales Representatives. In some cases, a subject
may be found in multiple views. For example, the Customers information would
not only be important to those individuals satisfying customer orders, but also to
those individuals whose responsibility it is to handle customer satisfaction.
Further, the subject Sales Representatives are not only important to the order
takers but to those personnel who will manage the Sales department. As the
developer, you will manage this problem by creating a local data model, and
then merging your design later in what is known as a global data model. In each
local modal of the same subject, you may discover that each department may
want different information about a specific subject. You will then merge the
attributes in a single subject later in the subject implementation stage.
During the subject identification stage, you will identify them by nouns. For
example, after a discussion with the customer service manager, you determine
that they continuously make references to the "customer complaint form". This
file:///C|/Books/Books/Database/Support/Database%20Design.htm (6 of 37) [12/18/2005 1:23:41 PM]
Database Design
Identify Relationships
Once the modules and entities have been identified, you can concern yourself
with the relationship between each entity. In this stage, you want to make sure
that every entity is related to at least one other entity. Generally, you would use
nouns to identify the modules and tables, but with identifying relationships, you
would look for verbs when you are interviewing users. A user would tell you, for
example, that a customer is assigned a sales representative, a department has
employees, or a building has classrooms. The most common way to express
relationships among entities is to do so visually. The most common and standard
method is to create an Entity Relation (ER) Diagram, which is used to graphically
represent the structural design of a database project. More detail on ER diagrams
will be discussed later in this chapter.
Identify Attributes
Once you have identified all of the entities in the database project, you will then
want to list every category of data that will be found in your project. Do not
concern yourself with which entity the attribute belongs. Just list every category
of information that you have gathered. You may find a number of problems.
Calculated fields
Now consider calculated fields. A calculated field is created because of the
value(s) of other attributes. It is not uncommon to want to know immediately the
age of a student, or the length of service in years an employee has, or the number
of publications an author has in total. I can tell you that it is never a good idea to
store calculated fields in the basic structure of a database. The nature of
calculated fields are that they depend on values in other fields. That means you
can supply that calculated information to the users who want the information
when they need it. But it should not be part of the database structure. Calculated
fields take up unnecessary space within a database project. If disk space becomes
a problem, consider removing any and all calculated fields. Another problem
associated with calculated fields is with the accuracy. A calculated field that is
derived from multiple records may not have yet been updated. As a result,
making decisions on poor information can prove embarrassing for managers.
Consider a customer who has a credit limit of $10,000.00 who calls in to place a
new order. Well, the customer's credit limit is reduced based on an outstanding
balance of previous orders. A customer's outstanding balance would need to be
checked by summing the total sales for all orders that have not been satisfied.
This scenario will require SQL to help determine the value. When you pull a
customer who wants to place an order, you will probably be using SQL, and that
takes system resources. Your calculated field would also take system resources.
In the long run, this would slowly compromise your systems performance. A
solution would be to use an UPDATE method in SQL, which will be discussed in
a later chapter.
Atomic fields
Finally, atomic attributes are fields that store one piece of data. A field called
Address should not store 21Main Street, Wyandanch, New York 11798 as an
example. The problem is if you wanted a list of all customer from a specific
state, or zip, the records could not be retrieved. As an alternative, you will create
atomic fields where you would have a field called Street that stores 21 Main
Street, City that store Wyandanch, State that stores New York, and Zip that stores
11798.
Determine Keys
At this stage of the conceptual design phase, we want to determine the keys for
each entity. A Primary Key (PK) is used to uniquely identify a record within an
entity. A PK can be a single column or multiple attributes. Any field that could
potentially uniquely identify a record is known as a Candidate Key (CK).
Consider a table used to store the college faculty. You may have a field called
FacultyID, which could automatically generate values to a record, and that same
entity could have field called SocSecNo. Both fields could uniquely identify a
specific faculty, but you will have to choose one to be the primary key. In this
case, the SocSecNo field is now called an Alternate Key (AK). An AK is an
attribute or a set of attributes that could identify a record, but is not used as the
PK. If you need more than one attribute to make a PK, this is known as a
Composite Key (CK). Consider, for example, a student registering for a course.
You will probably have a table called StudentCourse. In order to prevent a
student from registering for the same class more than once, we use the student's
ID and the course ID as a primary key. Also consider a collection of authors and
textbooks. As you know, more than one author can write a textbook. In order to
credit the authors with their related texts, you will need to create an author ID
field, and a book ID field, and make them both the primary key.
There is a lot to consider when designing a database project. The second phase is
the conceptual database design phase. Below, I have listed the most common
stages of the conceptual design phase.
The first step of the logical design phase is to remove all complex relationships.
An example of a complex relationship would be one to one, and many to many.
A one to one (1:1) relationship is where you have a record in one relation only
relate to a record in another table, and the reverse is true. 1:1 relationships are
rare and should be used as infrequently as possible. An example of a 1:1
relationship would be an employee relation and a benefits table. A single
employee will generally only have a single benefits record. However, different
HR staff normally handles the two subjects. An illustration of a one to one
relationship will be found below.
One to one relationships can create update problems in the database project. An
easier solution would be to merge the two tables and use some security measure
to prevent unauthorized usage to specific attributes.
Integrity Constraints
During the logic phase of database design, you will want to consider the
constraints that should be imposed on your data and project. First let's consider
the constraints placed on your data. A constraint is a restriction imposed on your
data. Constraints can have many different perspectives, which will be discussed
in this section. Each attribute in your entities will require data of course. Take
this time to determine the fields where the information being placed in the record
is required. Making an entry required would force the user to enter information
into the RecordSet before moving to a new record. For example, consider
entering a student into the Registration module. In order to enter the student, the
social security number must be known. Generally, the social security number
cannot be null. Any field that is required cannot be permitted to remain without a
value.
Finally, you want to impose referential integrity to ensure that you are not
creating anomalies inadvertently. Referential Integrity is a record that is related
to records in other relations, most commonly a parent/child relation. An example
of referential integrity is when you prevent the user from allowing a student to
enroll in a class that does not exist in a related table. Another example of
referential integrity would not allow a user to delete a customer who has
outstanding invoices. Of all the constraints, referential integrity is the most
important. You will have to consider what you want to happen when a records
key is modified. For example, when you make a change to a primary key, would
you want to update all of its related records, or do you want to leave the records
as orphans? There are a number of options you would have as a developer when
considering referential integrity. (1) A: Cascade Update/Delete is used to ensure
that referential integrity is imposed within a given relationship. Cascade Update
will update the changes made on the parent side of a relationship to the child
side. For example, if the wrong ISBN was entered into the database for a given
book, cascade update would make the changes to the related publisher and
authors. B: Cascade Delete is used to delete the related child records when the
parent record is removed. For example, if you were to delete a student from the
database project, you would also delete the related course information and all
other records related to the student. This is not a recommended constraint for
obvious reasons. Imposing cascade delete would not only remove the parent
record, but all related child records in other entities. (2) Another option is to set
all child side relations to Null. For example, you may have a database project
with all the Little League teams and its related players. When deleting teams,
rather than deleting the child side as well (players), you can set the TeamID to
null. This way you can later return all players who are not associated with a
team. A parallel example would be a major concentration that is no longer
supported at the university. Rather than deleting the related students, you would
set their concentration to Null. So a student whose major is business does not yet
have a concentration like economics, human resources, or even information
systems. (3) Rather than setting the foreign key to null, you can also set the value
to a default. When a concentration is eliminated at the university, the student's
major-concentration will now be set to General. So a student can major in music-
general, business-general, or mathematics-general. (4) The final option is the
most preferred and practiced, which is to prevent a parent record from being
deleted if the record has related records in other tables. So, before you can delete
a team from the Little League database project, you will first have to reassign the
current team members to another team.
The final constraints are business rules. Business rules are constraints that will
further dictate the behavior of your database and the validity of the data it will
manage. For example, a business rule can be that no student may enroll in more
than 5 classes in a given term. Or, a customer cannot order more products in
advance that exceed his or her credit limit.
Integrity Constraints
Required
Data Validation
Primary Key
Referential Integrity
Business Rules
Normalize Relations
Once you have successfully considered the integrity constraints, you are now
ready to normalize your subjects. The concepts of normalization will be
discussed in the Normalization chapter. However, I can say now that
Normalization is a process of asking critical questions about your entities and
making the necessary changes to avoid a duplication of data, and to avoid
anomalies. Each series of questions is called normal forms. The first question is
called the first normal form or 1NF. The second question(s) are called the second
normal form or 2NF, and so on. It is the industry standard to go to 3NF. In short,
the 1NF is used to eliminate repeating groups; 2NF is used to remove any and all
instances of partial dependencies; and, 3NF is used to remove any signs of
transitive dependencies.
You merge your modules by replacing redundant subjects with a single subject.
Make sure that all the attributes collected from the separated redundant entities
are represented in the merged version. For example, consider the Inventory,
Packaging, and Orders modules. The common subject all three modules would
have is Product. Products are retrieved from inventory when picked for shipping,
pulled from the shelves for packaging, and made available in orders to set prices.
Each module is certainly interested in Products, but it may not be aware of the
same types of information. Below is an illustration of how each module will have
an impact on products.
As you can see, each module has an interest in products; however, each module
clearly has a different interest. The persons responsible for orders were interested
in setting prices, but as you can see, those people in the warehouse or packaging
department would not be interested in the value of the product. The same is true
for the Inventory and Packaging departments; they are both interested in category
so they understand where the product must be placed on the shelves prior to
shipping. We cannot create three separate relations for the same subject because
that would be a problem with redundancy. At this point you will want to merge
the categories in order to create a single relation. But there are a few other
problems.
Some of the terminology is not very clear. What is the difference between
ProductID, ProductNo, or No? Or, ProductName, Name, or ProdName? It is not
uncommon to run into problems that are simply semantic in nature. Be careful to
avoid categories that have the same or similar names but have very different
meanings in separate parts of the organization. As the liaison for the separate
departments, you will have to ask the users to agree on a common term to avoid
any further confusion during the development cycle.
Does Pieces in the Inventory view mean the same as QOH (Quantity On Hand) in
the Orders view? Of course, you will have to work out the terminology before
you begin the merge process so that you do not create duplicate categories, or
leave out an important field. Below is an example illustration of the merging of
the Products subject from the three modules discussed above.
Create Relations
In this stage of the physical design phase, you will finally want to create the
relations. However, in order to do so, you will require an intimate knowledge of
the DBMS you have chosen. Consult Appendix B-Creating Tables chapter later
in this book to get a review of how to create tables in many of the more common
DBMS, such as Access, SQL Server, Oracle, or FoxPro. In this stage, you will
need to have knowledge of creating the tables, naming the field in each table,
identifying the appropriate data types, field lengths, establish the primary key,
create any computed attributes, establish default values, implement data
validations, and establish how attributes will handle Nulls. All of this
information in order to create the tables for your database project will come from
the DBDL and Data Dictionary, which will be discussed later in the chapter.
When creating tables, there are a few considerations you should make. The first
is the table naming conventions. Make sure that all entities in your database
project are named consistently. Make sure that each table truly represents a
subject within your database module. Do not use proper names such as Sal's
Customer List. If Sal is a sales rep, then his customers can be retrieved using
SQL. Always use singular form when naming tables. A table named Customers
suggests that you are working with multiple customers. The truth of the matter is
that more times than not, you will be dealing with one record at a time. So a
more appropriate table name would be Customer.
Naming Relations
To expand on the concept of table naming conventions and types. You should be
clear on the nature of the table you are creating. In other words, clarify if the
tables are subject oriented or if they act as validation for subject type tables.
These types of tables are known as strong and weak entities. A Strong Entity is
a table that is based on the higher-level subject of the project. A weak entity
cannot exist without the existence of the strong entity. A Weak Entity acts as a
point of validation for an attribute in the strong entity. Consider for example a
student table used to record students at a community college and a major table
used to categorize each student by a major. The student table would be
considered a strong entity because the student would be one of the more
important subjects in a higher education database project. The major subject
would not exist if it were not for the student table, so we can identify the major
table as weak and the student table as strong.
For hints in identifying strong and weak entities, you can usually assume that an
entity is strong because it has a number of foreign key attributes that relate to
smaller tables that are designed to validate values in the stronger table. It is also
common to identify strong entities as a Parent and weak entities as a Child
entity. Another common name for strong and weak entities is Data and
Validation respectively. So it would be helpful to name relations based on the
nature of the table. So the student table described above may be called
tdatStudent and the major table may be named tvalMajor. The t is used to
confirm that the object is a table as opposed to a query. The dat string is used to
identify the table as a data table, so you could easily determine that the relation is
based on a higher-level subject. The val string is used to suggest that the table is
a weak entity, so the val string represents a validation. Below is table that
describes the types of tables and the common prefixing assigned for improved
clarity.
Implement Attributes
Each entity is useless without the implementation of its attributes. All attributes
except for calculated columns of data should be included as this stage of the
physical design phase. You want to make sure that each attribute is accounted for
based on the information obtained from the DBDL and Data Dictionary.
There are a few guidelines you will want to consider when implementing fields in
your relations. Avoid duplicate field names in your database project. Make sure
your names are unique throughout the entire database project. The First Name
field is the most obvious culprit. In a higher education database, you will have
multiple instances of a first name. Faculty, students, alumni, staff, contacts,
vendors, donators all have first names. To remove any confusion about this fact,
it is a common practice to prefix duplicate field names with the entity acronym or
prefix. For example, your new First Name fields will be listed as studFirstName,
facFirstName, alumFirstName, venFirstName, etc. Another consideration was
just illustrated, single word field names. It will be challenging (to say the least) to
your DBA to retrieve data from field names that are made up of multiple words.
SQL has a solution, and it is to use brackets to close field names. For example,
the Student First Name field can be retrieved by stating [Student First Name].
However, it is not a good practice use separate words in a field name, and it
should be avoided. Your DBMS will likely have a Caption property, which can
be used to make an attribute like facLastName appear as Faculty Last Name. Be
very careful when using acronyms when creating attributes. The field QOH may
mean nothing to some users, and Quantity On Hand to others. Only use
acronyms if the organization or the members of a given module, universally
understands them. Another consideration is to make sure that your field names
describe a single point of data. For example, a field named Phone/Fax is
inappropriate. A normalized database would separate these fields creating a
Phone attribute and Fax attribute. Also, make sure that you field names are in
singular form. Plural form suggests to the user that you can maintain multiple
values in a single cell. The field Phone Numbers suggests that you can place as
many phone number as you wish, which would create problems with multi-valued
columns. Finally, make sure that your attributes maintain a single value. Placing
multiple values in a single cell makes data retrieval very difficult to nearly
file:///C|/Books/Books/Database/Support/Database%20Design.htm (19 of 37) [12/18/2005 1:23:41 PM]
Database Design
impossible.
Creating an SQL statement that will return the number of rooms for a given
building could derive the above information. In the first picture below, I have a
small example table with a list of rooms for a single building. In the second
picture below, I am displaying the number rooms using a field alias (RoomCount)
to return the total number of rooms that are available in a given building.
General Specs
Just to recap, the Name property is used to represent the attribute for the given
relation. At times, the Name property is not very clear, so many DBMS will have
a Caption property, which is used to display a more meaningful name to the user.
For example, the attribute cusFirstName may have the caption First Name. Some
DBMS have a Description property, which can be used to offer a more detailed
explanation of a given attribute.
Physical Specs
The physical field specifications for an attribute include the data type, length,
character support, formatting, and the input mask. The data type is used to
establish the appropriate data constraints the attribute will manage. For example,
the Boolean data type is used to manage fields that are used to return true or false
values. In Microsoft Access, this data type is called Yes/No. At the end of the
section, I have included a table of the most common data types with brief
explanations of their limitations.
The Input Mask would used to allow the user to enter the important information
such as the numbers, and the DBMS will apply the appropriate syntax once the
data has been entered. Finally, some DBMS allow you to restrict the values that
will be placed in a specific field. For example, take the state field which should
not accommodate numeric values, and use the appropriate character support to
restrict the field to only allow for character strings. For example, [A-Z], or ![0-
9].
Data Types
Data Type Description
LOGICAL Used to store attributes as True/false
values
CHAR Used to store any type of numeric or
text data. Use the String data type for
number that will not be calculated in
the database project
INTEGER Used to store records as whole numbers
LONG Used to store number that exceeds the
value of an integer and it can maintain
decimal values
DATE Used to store date value
TIME Used to store time values
MONEY Used to store monetary data
Logical Specs
The last category of field specifications is the logical category. The logical
specifications discussed in this text will include the required, default, duplicates,
and validation properties. If a field cannot remain null, then the required
property must be set to true. You want to make sure that all categories of data
that is required are enforced in the physical design stage to ensure integrity.
Sometimes, in order to make data entry as easy as possible, you will want to
In order to ensure data integrity, you may only want a specific value to be stored
once in a database. Set the duplicate property to false to prevent a value from
appearing more than once in the relation. And the final property to consider is
validation. You may want to list the valid values in a relation, or impose a logical
reference to what a logical value would be. For example, the validation property
for gender would list M for male, and F for female. A logical expression could be
a built in SQL statement that would restrict the list of options a student choose in
declaring a major based on the appropriate prerequisites.
Establish Indexes
As your database projects grow, there will be a decrease in the performance of the
system. Indexes are used to help to make sure that your retrieval of data in a
given project sill return information as quickly as possible. They are especially
valuable with multiple table queries. In short, indexes improve the optimization
of your data.
When you create indexes, your queries will spend less time cycling through an
entire relations. Your records behind the scenes are sorted so that your queries
will be as efficient as possible. In many cases, a DBMS will make the Primary
Key attribute an index. However, the key attribute may not be an appropriate
manner for searching a record. Consider an attribute that is not as clear as an
EmployeeID. The EmployeeID column is used as the primary key and is used to
uniquely identify each employee. In many cases a user of your database may not
readily know the EmploeeID. It is more likely that he or she will want to search
for an employee using more natural descriptions such as LastName. An index
imposed on employee last names would mean that the DBMS would spend less
time searching for the desired records. The DBMS would go directly to the
record(s) that was selected. This could be a serious improvement in speed,
especially when you are considering tens of thousands or even millions of
records.
Depending on the DBMS, the implementation of indexes can take place either
manually, programmatically, or the local version of SQL. Implementing indexes
using SQL will be discussed in more detail in the SQL-Action chapter. See your
DBMS on-line help to determine if your system supports indexes, and how it can
be implemented.
You may have hundreds or even thousands of questions to prepare your database
project for. To ensure that you do not ask the same questions multiple times,
make sure that you categorize your questions in each module. Secondly, make
sure that you use parameter queries as often as possible to allow a single query to
have even greater flexibility. Consider a report that was designed to display
summary information about a single product. Now consider creating a new query
for each product that your company manufactures. You could potentially have
over one thousand queries to answer very similar questions. A parameter query
will allow the developer to create very few queries, but design them so that they
are flexible enough to filter for different criteria.
Impose Constraints
By now, you would have already imposed the more obvious relational
constraints, such as handling cascading issues. At this stage, you will want to
impose the business rules that are not readily manageable by any particular
DBMS. For example, many DBMS cannot restrict a single sales rep to have no
more than one hundred customers to manage. Business rules are important to
ensure that the system is not directing the organization, but the other way around.
Migrate Data
Assuming that your database project is replacing an existing system, you will
want to carefully migrate a copy of the data from the older database project, to the
new one. You will have to be careful when migrating data from another system.
There are a lot of issues to consider. For example, compatibility in data types is
required to ensure that data are not lost. Appropriate field lengths should be
considered to ensure that data is not lost because your current constraints were
not consistent with the older systems. Finally decomposition of data is another
serious issue. How do you migrate unnormalized data from an older system to
the new one? You may have to take the more costly approach and have someone
re-enter some important data. In other cases, you may be able to implement a
custom program. A custom program would take minutes to implement, the
manual solution may take weeks, and then there is a concern with data entry
errors.
Implement Security
Security is imposed on a database project to ensure that the structure of your
database is safe from harm. That harm could be those who actively seek to
destroy database data. Others are those who seek to compromise the confidence
of the company by making inappropriate data available. And, finally, there are
others who have no intended malice, but their access to certain data is
inappropriate. There are different levels of security within a database project.
Enterprise
Enterprise security is used in order to prevent access to the entire database
product. The module the user may want to gain access to is not consequential at
this level. Once the users gains access, you can then determine what permissions
the user may have based on the security information already provided
Module
In a modulated database project environment, you may want to restrict users to a
specific department of the data. For example, they log onto a database project
that only has the given module available to the user.
Table
At times, you may only want to allow users access to specified subjects within a
given project.
Attribute
Even with access to an entire relation, it may not be appropriate for a user to see
information based on specific categories of information. An Order Entry module
may have an attribute named DateEntered, which is used to record when the
values where stored. The user would generally not be given access to this field
file:///C|/Books/Books/Database/Support/Database%20Design.htm (26 of 37) [12/18/2005 1:23:41 PM]
Database Design
for fear of his or her changing this information. Managers generally use this field
to record the speed in which an order is processed.
Record
In a database project, you can even protect your database to the specific record.
Consider two sales reps who would typically have the same security rights. With
that said, they could potentially view one another's customers. You can restrict
the viewing of specific customers based on the sales reps' rights. Sales Rep A can
only see the customers who are assigned to Sales Rep A.
Security access can even be specific to the module, entity, attribute, or even
record. Module level security suggests that we are dealing with an enterprise
system, and that users who access the shipping module would not have rights to
the customer service module. There are even time when users in the same
module will not have access to the same subjects within the same module.
Always document your security rights. In addition, keep those security rights in a
secure location.
Entity-Relationship Modeling
With smaller database projects, you can probably get away with designing a
solution without the aid of modeling. However, with larger scaled projects, you
will appreciate the help a graphic display an Entity Relation (ER) diagram can
offer. An ER Diagram is used to graphically represent your database to display
its structure and relations. In order to create an ER Diagram, you must
understand the basic notations involved.
The purpose of this section is to show the common graphic notations in order to
accurately display the design of a database project. There are many CASE tools
available that one could use to create such as diagram. The two more common
players in the market are SmartDraw, Microsoft Visio; even Microsoft Word.
There are some CASE tools that will eventually create a database structure based
on the ER Diagram called reverse engineering. In this section I will be
consistent with the terminology used in creating entity diagrams.
Strategies
There are a number of ways to represent the structure of a database. One strategy
is to use the Database Design Language (DBDL). This strategy consists of your
listing the table, with attributes, and identifying the Primary, Alternate,
Secondary, and Foreign Keys. Below is an example illustration based on Books
and Authors.
In the above example, the name of the relation is tblBook. The primary key is
denoted by the underlined attribute in the list of attributes. Below the list of
attributes, you will find the Alternate Key (AK), Secondary Key (SK), and
Foreign Key (FK), respectively.
A popular strategy is the ER Diagram. In this strategy there are two different
formats a developer can take. One is to list each item in objects such as bubbles,
diamonds and rectangles. Another strategy is the use of standard rectangles to
illustrate each entity drawing lines known as crow's feet to illustrate
relationships.
Entity
An Entity as described earlier is a subject. You use an entity to categorize the
subjects you want to capture in a database. More specifically, an entity could be
a person, place, thing, an event or activity, or even a concept. You notate an
entity by using the entity object in a CASE tool.
Lets consider a Book database project where the subjects are books, authors, and
publishers. The first entity display I will draw will be for books.
Note that the text found in the entity graph is in an uppercase format. It is the
industry standard to case the entity descriptions in an uppercase format. The
book example is considered a strong entity. A strong entity is a subject that is
independent of all other subjects. For example, you do not need a subject Author
or Publisher to have a book. A weak entity, on the other hand, is a subject that
cannot be independent of other entities. For example, the entity Type is used to
identify if the book has a hardcover, or is spiral bound, etc. The entity Type
relies on the entity Book in order for it to exist. Without the Book entity, the
Type entity would be useless. A weak entity is denoted by an entity graph that is
similar to the standard except that it appears to be one entity superimposed within
another.
Attributes
The major property within an entity is known as an attribute. An attribute is a
category, field, or even described as a column. A simple attribute is used to
record the areas of interest for a given entity. The graphic display used to identity
an attribute is the ellipses.
An area of interest Book entity is the Title of the book. Below is a graphic
notation.
There are however different types of attributes within an entity. For example, the
primary key is treated differently from other attributes. This would be known as
the key attribute. You notate the primary key in an attribute the same as the
previous except the attribute is underlined. The ISBN would be an example
attribute that would be the primary key for the Book entity. Below is an example
illustration.
into smaller attributes. For example, the attributes Name could be decomposed to
First, Middle, Last, Prefix, and Suffix. Another example would be points of
contact. There are number of ways to communicate with an individual. For
example, you can use the phone, fax, or e-mail, all of which are points of contact.
Another example would be an address. An address is a composite attribute
because it too, like name and points of contact, is really a category of
decomposed attributes. Address attributes would be street1, street2, city, state,
and zip code. Below is an example illustration.
You will want to display attributes that may contain multiple values. A multi-
valued attribute is one that can contain more than one value. For example, a book
could be on more than one subject. This book, for example, covers database
concepts, SQL, Visual Basic, Access, and HTML. Below is an example
illustration of a multi-valued attribute.
Some attributes of interest are not documented; they are determined by values
found in the current entity or other entities within the database project. These are
known as derived or calculated attributes. For example, in order to derive your
employee's age, you would take the employee's date of birth and subtract it from
today's date, and divide that from 365 days in a year to return the employee's age
in years. Derived attributes look similar to the simple attribute notation except
the derived attributes are perforated.
Relationships
The last topic in the section is on relationships. It is important to graphically
represent relations in an ER diagram to truly get a perspective of how each entity
works together with the others. I will only focus on a standard relationship to
entities and weak entities. The diamond illustrated below graphically represents a
relationship between entities.
There are other notations to describe relationships. These are known as crow's
feet because of their obvious appearance. Below is an example notation to
describe a many to many relationship between a book and publisher.
In the above illustration, it shows that a book can only be written for a single
publisher, but that a publisher can have many books. The degree of a relation is
import to notate graphically so errors are avoided during the physical design
phase. Below are graphic representations of the one to one, one to many and
many to many symbols, respectively.
Not all relationships are required, as you already know. This fact can be
illustrated by using the following notation.
Like the more graphical approach, the strategy that includes crow's feet has
accommodations for specific entity types. For example, a weak entity is denoted
by the following illustration.
Summary
As you have discovered, the database design is quite complex and detailed. The
first phase is the conceptual design, where you only consider the implications of
file:///C|/Books/Books/Database/Support/Database%20Design.htm (34 of 37) [12/18/2005 1:23:41 PM]
Database Design
your design. You spend time thinking about the modules that will be involved in
the database project. You will consider the entities found in each module. If
there is redundancy in the entities, you will merge them later. You will also
consider the attributes during the conceptual design phase, the validation, keys,
and recapitulation.
An important aspect of the design process is to make your design clear. The most
widely used strategy is to use an ER Diagram. An ER diagram will make your
design more obvious to a lay person. In addition, there are many CASE tools
available that will help create ER diagrams. You can even use a tool as simple as
a word processor.
Questions
Answers
1. Keys, Validation, Recap, Modules, Entities, and Attributes
2. ER Diagram
3. Planning
4. Calculated or Derived
5. Physical
6. Attribute Name, Caption, and/or Description
7. Enterprise
8. The graphic is perforated
9. False
10. DBDL
Terms