Normalization is the process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminate redundant data (for example, storing the same data in more
than one table) and ensure data dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.
First normal form (1NF) sets the very basic rules for an organized database:
1. Eliminate duplicative columns from the same table.
2. Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the primary key).
Second normal form (2NF) further addresses the concept of removing duplicative data:
1. Meet all the requirements of the first normal form.
2. Remove subsets of data that apply to multiple rows of a table and place them in separate
tables.
3. Create relationships between these new tables and their predecessors through the use of
foreign keys.
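As a sketch of these rules (all table and column names here are illustrative, not taken from any particular system), an order table that violates 1NF and its decomposition might look like this:

CREATE TABLE OrderData (                        -- unnormalized
    OrderID      int,
    CustomerName varchar (50),
    CustomerCity varchar (50),                  -- depends on the customer, not the order
    Item1        varchar (50),                  -- duplicative columns violate 1NF
    Item2        varchar (50)
)

CREATE TABLE Customer (
    CustomerID int PRIMARY KEY,                 -- unique row identifier (1NF)
    Name       varchar (50),
    City       varchar (50)
)

CREATE TABLE Orders (
    OrderID    int PRIMARY KEY,
    CustomerID int REFERENCES Customer (CustomerID)  -- foreign key to the predecessor table (2NF)
)

CREATE TABLE OrderItem (
    OrderID int REFERENCES Orders (OrderID),    -- repeating items become rows, not columns
    Item    varchar (50)
)

Customer data that applied to multiple order rows now lives in its own table, linked back through foreign keys.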
Q A clustered table stores its rows physically on disk in order by a specified column or columns.
The difference between the two index types is that a clustered index is unique for any given table:
we can have only one clustered index on a table. The leaf level of a clustered index is the actual
data, and the data is re-sorted when the clustered index is created. In a non-clustered index, the
leaf level contains pointers to the data rows, so we can have as many non-clustered indexes as we
need on the table. The clustering index forces table rows to be stored in ascending order by the
indexed columns. There can be only one clustering sequence per table (because physically the
data can be stored in only one sequence). Good candidates for clustering columns include:
• Join columns, to optimize SQL joins where multiple rows match for one or both
tables participating in the join
• Foreign key columns because they are frequently involved in joins and the DBMS
accesses foreign key values during declarative referential integrity checking
• Predicates in a WHERE clause
• Range columns
• Columns that do not change often (reduces physical reclustering)
• Columns that are frequently grouped or sorted in SQL statements
Q Triggers are used primarily to implement business rules. A trigger is similar to a stored
procedure; the difference is that a trigger is activated automatically when data is added, edited,
or deleted in a table in a database.
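A minimal sketch of such a trigger (the Orders table, Customer table, and CreditHold column are illustrative assumptions, not part of any schema discussed here):

CREATE TRIGGER trg_Orders_CreditCheck
ON Orders
FOR INSERT, UPDATE
AS
-- Business rule: reject orders from customers on credit hold
IF EXISTS (SELECT *
           FROM inserted I
           JOIN Customer C ON C.CustomerID = I.CustomerID
           WHERE C.CreditHold = 1)
BEGIN
    RAISERROR ('Customer is on credit hold.', 16, 1)
    ROLLBACK TRANSACTION
END

Unlike a stored procedure, nothing calls this code explicitly; SQL Server fires it whenever rows in Orders are inserted or updated.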
Normalization & Denormalization 2
Q The DELETE command removes rows from a table based on the condition that we provide in a
WHERE clause. TRUNCATE removes all the rows from a table, and there will be no data in the
table after we run the command.
The DELETE command writes an entry in the transaction log for each row it deletes; TRUNCATE
does not. This is why the TRUNCATE command is much faster than DELETE.
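For example (the table and column names are illustrative):

DELETE FROM Orders WHERE OrderDate < '19990101'  -- logs each deleted row; WHERE is optional
TRUNCATE TABLE Orders                            -- removes every row with minimal logging

Note that TRUNCATE TABLE takes no WHERE clause; if you need to remove only some of the rows, DELETE is your only choice.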
Data integrity (DI) is an important feature in SQL Server. When used properly, it ensures that
data is accurate, correct, and valid. It also acts as a trap for otherwise undetectable bugs within
your applications. However, DI remains one of the most neglected SQL Server features.
1. Entity Integrity
2. Domain Integrity
3. Referential Integrity
4. User-Defined Integrity
Each of these categories of data integrity can be enforced by the appropriate
constraints. Microsoft SQL Server supports the following constraints:
PRIMARY KEY
UNIQUE
FOREIGN KEY
CHECK
NOT NULL
A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table
should have a primary key constraint to uniquely identify each row, and only one primary key
constraint can be created for each table. Primary key constraints are used to enforce entity
integrity.
A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no duplicate
values are entered. Unique key constraints are used to enforce entity integrity, as the primary
key constraints are.
A FOREIGN KEY constraint prevents any action that would destroy the links between tables with
corresponding data values. A foreign key in one table points to a primary key in another table.
Foreign keys prevent actions that would leave orphaned rows carrying foreign key values for
which no matching primary key exists. Foreign key constraints are used to enforce referential
integrity.
A CHECK constraint is used to limit the values that can be placed in a column. The check
constraints are used to enforce domain integrity.
A NOT NULL constraint enforces that the column will not accept null values. Not null
constraints are used to enforce domain integrity, as the check constraints are.
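The five constraints can be sketched in a pair of CREATE TABLE statements built on the Vendor tables used later in this chapter (the TaxID column is an illustrative addition):

CREATE TABLE Vendor (
    VendorID int          NOT NULL,                 -- NOT NULL: domain integrity
    Name     varchar (50) NOT NULL,
    TaxID    char (9)     NULL,
    CONSTRAINT PK_Vendor PRIMARY KEY (VendorID),    -- entity integrity
    CONSTRAINT UQ_Vendor_TaxID UNIQUE (TaxID),      -- entity integrity
    CONSTRAINT CK_Vendor_TaxID
        CHECK (TaxID NOT LIKE '%[^0-9]%')           -- domain integrity: digits only
)

CREATE TABLE VendorAddress (
    VendorID    int          NOT NULL,
    AddressType varchar (20) NOT NULL,
    CONSTRAINT FK_VendorAddress_Vendor
        FOREIGN KEY (VendorID)
        REFERENCES Vendor (VendorID)                -- referential integrity
)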
Denormalization
Denormalization is the art of introducing database design mechanisms that enhance
performance. Successful denormalization depends upon a solid database design normalized to at
least the third normal form. Only then can you begin a process called responsible
denormalization. The only reason to denormalize a database is to improve performance.
In this section, we will discuss techniques for denormalization and when they should be used.
The techniques are redundant data kept current by triggers, summary columns, summary tables,
and data partitioning.
You can keep redundant data up-to-date with a trigger so that when the base table changes, the
tables carrying the redundant data also change.
An example of this technique is a vendor tracking system that contains several tables, as shown
in Figure 16.6, an entity-relationship diagram. A vendor can have multiple addresses, one of
which is marked “billing”; a vendor can have multiple contacts, one marked “primary”; and each
contact can have multiple phone numbers, one of which is marked “main.”
The following Select statement creates a view that joins all the tables.
SELECT V.VendorID,
V.Name,
VA.Address1,
VA.Address2,
VA.City,
VA.State,
VA.PostalCode,
VA.Country,
VC.FirstName,
VC.LastName,
CP.Phone
FROM Vendor V,
VendorAddress VA,
VendorContact VC,
ContactPhone CP
WHERE V.VendorID = VA.VendorID
AND VA.AddressType = 'Billing'
AND V.VendorID = VC.VendorID
AND VC.ContactType = 'Primary'
AND VC.ContactID = CP.ContactID
AND CP.PhoneType = 'Main'
The above view is often too slow for quick lookup and retrieval. Converting that view to a table
with the same columns and datatypes and using a trigger to keep it up-to-date could give you
improved performance. More important, creating a table from a view gives developers a choice of
retrieving data from either the base tables or the redundant table.
The drawback to this technique is the performance penalty of keeping the table up-to-date. You
would use it only if the retrieval speed outweighed the cost of updating the table.
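As a sketch of the technique, you could materialize the view into a redundant table and keep one of its columns current with a trigger on a base table (the VendorLookup name and the update-only trigger are illustrative; a complete solution needs insert, update, and delete triggers on every base table):

SELECT V.VendorID, V.Name, VA.City, VC.LastName, CP.Phone
INTO   VendorLookup
FROM   Vendor V, VendorAddress VA, VendorContact VC, ContactPhone CP
WHERE  V.VendorID = VA.VendorID
  AND  VA.AddressType = 'Billing'
  AND  V.VendorID = VC.VendorID
  AND  VC.ContactType = 'Primary'
  AND  VC.ContactID = CP.ContactID
  AND  CP.PhoneType = 'Main'

CREATE TRIGGER trg_Vendor_SyncLookup
ON Vendor
FOR UPDATE
AS
-- Push name changes in the base table into the redundant table
UPDATE VL
SET    Name = I.Name
FROM   VendorLookup VL
JOIN   inserted I ON VL.VendorID = I.VendorID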
Summary columns, on the other hand, are examples of denormalization that can have substantial
performance gains. Good examples include adding MonthToDate, QtrToDate, and YearToDate
columns in the Account table of a general ledger system. Each of these columns summarizes
data from many different records. By adding these columns, you avoid the long calculation needed
for the summary and get the same data from one read instead. Of course, this option is not very
flexible, because if you need a total from a certain date range, you still have to go back to the
base data to summarize the detail records.
A poor example of using summary fields is carrying an invoice subtotal on the header record
when your orders average five lines — for example, in a high-priced, low-volume inventory
system, such as a major appliance store. It doesn’t take much time or effort for SQL Server to add
five rows. But what if you have a low-priced, high-turnover inventory, such as a grocery store, with
hundreds of lines per order? Then it would make more sense to carry an InvoiceSubtotal field on
the invoice header record.
Summarizing data by time period (day, month, year, etc.) or across product lines and saving the
summaries in a table or tables is a good strategy to shortcut the reporting process. This technique
is sometimes called a data warehouse. It can be expensive to keep up-to-date, because the data
collection process must also update the summary tables. One strategy to bypass the cost of the
data collection process is to replicate the transaction data to another database on another server
where the data warehouse is periodically updated.
Data Partitioning
You can partition data vertically or horizontally, depending on whether you want to split columns
or rows. You can also partition data across servers. We consider each type of partitioning below.
Subsets/Vertical Partitioning
Also called subsets, vertical partitions split columns that are not used often into new tables.
Subsets are zero- or one-to-one relationships, meaning that for each row in the main table, one
record (at most) exists in the subset table. The advantage to vertical partitioning is that the main
table has narrower records, which results in more rows per page. When you do need data from
the subset table, a simple join can retrieve the record.
Horizontal Partitioning
Horizontal partitioning, also known as row partitioning, is a more difficult type of partitioning to
program and manage. The table is split so that rows with certain values are kept in different
tables. For example, the data for each branch office could be kept in a separate table. A common
example in data warehousing — historical transactions — keeps each month of data in a
separate table.
The difficulty in horizontal partitioning comes in returning rows from each table as one result set.
You use a Union statement to create this result set, as demonstrated below.
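Under the monthly-table scheme just described, such a Union statement might look like this (table and column names are illustrative):

SELECT TranID, TranDate, Amount FROM TranHistJan
UNION ALL
SELECT TranID, TranDate, Amount FROM TranHistFeb
UNION ALL
SELECT TranID, TranDate, Amount FROM TranHistMar

UNION ALL is used here because the partitions cannot contain duplicate rows, and it spares SQL Server the sort that a plain UNION performs to remove duplicates.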
The keys to designing an architecture involving multiple servers are a thorough knowledge of
replication and of returning results from remote procedures. For a complete discussion of
replication, see Chapter 11.
Here is a simple example of returning a result set from a remote procedure and inserting it into a
table.
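A sketch of that technique (the server, database, and procedure names are illustrative):

INSERT INTO MonthlySummary (ProductID, TotalSales)
EXEC RemoteServer.SalesDB.dbo.usp_GetMonthlySales

The remote procedure's result set must match the column list of the local table, column for column.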
SUMMARY
Performance tuning can be a never-ending series of adjustments in which you alleviate one
bottleneck only to find another immediately. If your server is performing poorly, you need to
answer two basic questions: “Do I add more memory, disk space, or CPU?” and “Do I have
enough capacity to handle the anticipated growth of the company?” This book gives you enough
information to make informed decisions. Although more can be gained from tuning hardware than
ever before, the real gains come from tuning your queries and making good use of indexes, so
don’t stop reading here.
Introduction
Transaction Isolation Levels
Lock types
Locking optimizer hints
Deadlocks
View locks (sp_lock)
Literature
Introduction
In this article, I want to tell you about SQL Server 7.0/2000 Transaction Isolation Levels: what
kinds of Transaction Isolation Levels exist, how you can set the appropriate Transaction
Isolation Level, what Lock types and Locking optimizer hints are, what deadlocks are, and how
you can view locks by using the sp_lock stored procedure.
There are four Transaction Isolation Levels:
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
Microsoft SQL Server supports all of these Transaction Isolation Levels and can distinguish
between REPEATABLE READ and SERIALIZABLE.
READ UNCOMMITTED
When it's used, SQL Server does not issue shared locks while reading data. So, you can read an
uncommitted transaction that might be rolled back later. This isolation level is also called dirty
read. This is the lowest isolation level. It ensures only that physically corrupt data will not be
read.
READ COMMITTED
This is the default isolation level in SQL Server. When it's used, SQL Server will use shared
locks while reading data. It ensures that physically corrupt data will not be read and that the
transaction will never read data that another application has changed but not yet committed.
However, it does not ensure that the data will not be changed before the end of the transaction.
REPEATABLE READ
When it's used, dirty reads and nonrepeatable reads cannot occur. This means that locks will be
placed on all data that is used in a query, and other transactions cannot update that data.
This is the definition of nonrepeatable read from SQL Server Books Online:
nonrepeatable read
When a transaction reads the same row more than one time, and between the
two (or more) reads, a separate transaction modifies that row. Because the
row was modified between reads within the same transaction, each read
produces different values, which introduces inconsistency.
SERIALIZABLE
This is the most restrictive isolation level. When it's used, phantoms cannot occur. It prevents
other users from updating or inserting rows into the data set until the transaction is completed.
phantom
Phantom behavior occurs when a transaction attempts to select a row that
does not exist and a second transaction inserts the row before the first
transaction finishes. If the row is inserted, the row appears as a phantom
to the first transaction, inconsistently appearing and disappearing.
You can set the appropriate isolation level for an entire SQL Server session by using the SET
TRANSACTION ISOLATION LEVEL statement.
This is the syntax from SQL Server Books Online:
SET TRANSACTION ISOLATION LEVEL
    { READ COMMITTED
    | READ UNCOMMITTED
    | REPEATABLE READ
    | SERIALIZABLE }
You can view the isolation level of the current session with:
DBCC USEROPTIONS
GO
Lock types
There are three main types of locks that SQL Server 7.0/2000 uses:
Shared locks
Update locks
Exclusive locks
Shared locks are used for operations that do not change or update data, such as a SELECT
statement.
Update locks are used when SQL Server intends to modify a page, and later promotes the
update page lock to an exclusive page lock before actually making the changes.
Exclusive locks are used for the data modification operations, such as UPDATE, INSERT, or
DELETE.
Shared locks are compatible with other Shared locks or Update locks.
Let me describe this with a real example. There are four processes that attempt to lock the
same page of the same table. The processes start one after another, so Process1 is the first
process, Process2 is the second process, and so on.
Process1 : SELECT
Process2 : SELECT
Process3 : UPDATE
Process4 : SELECT
Process1 sets a Shared lock on the page, because there are no other locks on the page.
Process2 sets a Shared lock on the page, because Shared locks are compatible with other
Shared locks.
Process3 wants to modify the data and wants to set an Exclusive lock, but it cannot do so until
Process1 and Process2 have finished, because an Exclusive lock is not compatible with the other
lock types. So, Process3 sets an Update lock.
Process4 cannot set a Shared lock on the page until Process3 has finished, so there is no
lock starvation. Lock starvation occurs when read transactions can monopolize a table or
page, forcing a write transaction to wait indefinitely. So, Process4 waits until Process3 has
finished.
After Process1 and Process2 have finished, Process3 converts its Update lock into an Exclusive
lock to modify the data. After Process3 has finished, Process4 sets a Shared lock on the page to
select the data.
Locking optimizer hints
SQL Server 7.0/2000 supports the following Locking optimizer hints:
NOLOCK
HOLDLOCK
UPDLOCK
TABLOCK
PAGLOCK
TABLOCKX
READCOMMITTED
READUNCOMMITTED
REPEATABLEREAD
SERIALIZABLE
READPAST
ROWLOCK
NOLOCK is also known as "dirty reads". This option directs SQL Server not to issue shared locks
and not to honor exclusive locks. So, if this option is specified, it is possible to read an
uncommitted transaction. This results in higher concurrency but lower consistency.
HOLDLOCK directs SQL Server to hold a shared lock until completion of the transaction in which
HOLDLOCK is used. You cannot use HOLDLOCK in a SELECT statement that includes the FOR
BROWSE option. HOLDLOCK is equivalent to SERIALIZABLE.
UPDLOCK instructs SQL Server to use update locks instead of shared locks while reading a
table and holds them until the end of the command or transaction.
TABLOCK takes a shared lock on the table that is held until the end of the command. If you also
specify HOLDLOCK, the lock is held until the end of the transaction.
PAGLOCK is used by default. Directs SQL Server to use shared page locks.
TABLOCKX takes an exclusive lock on the table that is held until the end of the command or
transaction.
READCOMMITTED
Perform a scan with the same locking semantics as a transaction running at the READ
COMMITTED isolation level. By default, SQL Server operates at this isolation level.
READUNCOMMITTED
Equivalent to NOLOCK.
REPEATABLEREAD
Perform a scan with the same locking semantics as a transaction running at the REPEATABLE
READ isolation level.
SERIALIZABLE
Perform a scan with the same locking semantics as a transaction running at the SERIALIZABLE
isolation level. Equivalent to HOLDLOCK.
READPAST
Skip locked rows. This option causes a transaction to skip over rows locked by other transactions
that would ordinarily appear in the result set, rather than block the transaction waiting for the
other transactions to release their locks on these rows. The READPAST lock hint applies only to
transactions operating at READ COMMITTED isolation and will read only past row-level locks.
Applies only to the SELECT statement.
You can only specify the READPAST lock in the READ COMMITTED or REPEATABLE READ
isolation levels.
ROWLOCK
Use row-level locks rather than use the coarser-grained page- and table-level locks.
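A few of these hints in use, applied to the Vendor table from the earlier example (the queries themselves are illustrative):

SELECT * FROM Vendor WITH (NOLOCK)                       -- dirty read; no shared locks issued
SELECT * FROM Vendor WITH (UPDLOCK) WHERE VendorID = 1   -- update locks instead of shared locks
SELECT * FROM Vendor WITH (TABLOCKX)                     -- exclusive lock on the whole table
SELECT * FROM Vendor WITH (READPAST)                     -- skip rows locked by other transactions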
Deadlocks
A deadlock occurs when two users have locks on separate objects and each user wants a lock on
the other's object. For example, User1 has a lock on object "A" and wants a lock on object "B",
and User2 has a lock on object "B" and wants a lock on object "A". In this case, SQL Server ends
the deadlock by choosing one of the users as the deadlock victim. SQL Server then rolls back
the victim's transaction, sends message number 1205 to notify the victim's application that the
transaction was rolled back, and allows the other user's process to continue.
You can decide which connection will be the candidate for deadlock victim by using SET
DEADLOCK_PRIORITY. Otherwise, SQL Server selects the deadlock victim by choosing the
process that completes the circular chain of locks.
So, in a multiuser situation, your application should check for error 1205, which indicates that
the transaction was rolled back, and if so, restart the transaction.
Note. To reduce the chance of a deadlock, you should minimize the size of transactions and
transaction times.
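For example, a low-priority reporting connection can volunteer to be the victim so that the online transactions it collides with are allowed to finish:

SET DEADLOCK_PRIORITY LOW     -- this session is preferred as the deadlock victim
GO
SET DEADLOCK_PRIORITY NORMAL  -- the default

The retry itself belongs in the application: after receiving error 1205, resubmit the whole transaction, not just the statement that was running when the deadlock was detected.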
Sometimes you need a reference to information about locks. Microsoft recommends using the
sp_lock system stored procedure to report lock information. This very useful procedure returns
the ID of the SQL Server process that holds the lock, the locked database, the locked table ID,
the locked page, and the type of lock (the locktype column).
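For example:

EXEC sp_lock        -- report all locks currently held
EXEC sp_lock 53     -- report only the locks held by process ID 53 (the spid here is illustrative)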
Check the link below if you need to get the user name, host name, database name, index name,
object name, and object owner instead of their ID numbers:
Detailed locking view: sp_lock2