Chapter 18
Data and Storage Management
Agenda
Disk
Modern disk drives are more reliable than in years past,
with an ever-increasing mean time between failures
(MTBF).
Disk drives may achieve in excess of a hundred thousand
hours of availability before failing.
But the mechanical nature of disk drives renders them more
vulnerable to failure than other computer components.
As the number of disk drives in a system increases, the
vulnerability of the system increases.
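A rough calculation illustrates how vulnerability grows with the number of drives. The sketch below assumes that drive failures are independent and follow an exponential model with a constant failure rate of 1/MTBF. That model, the 100,000-hour figure used as an example, and the function names are assumptions made for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def drive_failure_probability(mtbf_hours: float, hours: float) -> float:
    """Probability that one drive fails within `hours` (exponential model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def any_drive_failure_probability(mtbf_hours: float, hours: float, drives: int) -> float:
    """Probability that at least one of `drives` independent drives fails."""
    single_survives = math.exp(-hours / mtbf_hours)
    return 1.0 - single_survives ** drives


if __name__ == "__main__":
    for n in (1, 10, 100):
        p = any_drive_failure_probability(100_000, HOURS_PER_YEAR, n)
        print(f"{n:>3} drives: {p:.1%} chance of at least one failure in a year")
```

Under these assumptions a single drive has roughly an 8% chance of failing in a year, while a system with ten such drives is more likely than not to see at least one failure.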
Data Growth
IBM: Inside IBM we talk about ten times more connected people, a
hundred times more network speed, a thousand times more devices, and
a million times more data.
IDC Corporation: The amount of digital information created in the world in
2010 exceeded a zettabyte (1 trillion gigabytes) for the first time. The
amount of digital information created in 2011 will surpass 1.8 zettabytes.
A 2011 study shows that almost all shops reported data growth over the
past year and one-third reported the amount of data within their
enterprises grew by 25% or more during the past year.
About 10% have data stores in the petabyte range.
27% exceeded 100TB total
Abbreviation   Term        Size          Power of 2
               Byte        8 bits
KB             Kilobyte    1,024 bytes   2^10 bytes
MB             Megabyte    1,024 KB      2^20 bytes
GB             Gigabyte    1,024 MB      2^30 bytes
TB             Terabyte    1,024 GB      2^40 bytes
PB             Petabyte    1,024 TB      2^50 bytes
EB             Exabyte     1,024 PB      2^60 bytes
ZB             Zettabyte   1,024 EB      2^70 bytes
YB             Yottabyte   1,024 ZB      2^80 bytes
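The unit ladder above can be applied mechanically by dividing by 1,024 until the value fits the current unit. The following is a minimal sketch; the function name `format_bytes` is an illustrative choice.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(n: int) -> str:
    """Render a byte count using the binary (1,024-based) units."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,024,
        # or at the largest unit if the value is enormous.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:,.1f} {unit}"
        value /= 1024


print(format_bytes(2**40))        # 1.0 TB
print(format_bytes(100 * 2**40))  # 100.0 TB
print(format_bytes(2**70))        # 1.0 ZB
```

Note that a binary zettabyte (2^70 bytes) is about 1.18 x 10^21 bytes, so it is somewhat larger than the decimal "1 trillion gigabytes" figure quoted earlier.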
Storage Devices
The DBA may choose to use multiple storage devices for
the different files to:
Align the performance requirements of the file with the
appropriate disk device
Separate indexes from data for performance reasons
Isolate the transaction log on a separate and very fast
device
Isolate temporary and work files on a single volume; if
a disk error occurs, temporary files can be deleted and
redefined with no backup and recovery implications
Spread the data across multiple devices to facilitate
parallel access.
Space Management
[Figure: page layout showing data rows and the offset table]
Page Components
The page layout for a database object usually consists of
three basic components.
Page header: housekeeping information.
The page header may include a page identifier, forward and
backward links to other data pages, an identifier indicating to
which table the page belongs, free space pointers, and the
minimum row length for the table.
Allocation Pages
The DBMS uses an allocation page to manage
the other pages in the database object.
Sometimes called a space map page.
Row Data.
This consists of the actual data contents of the data columns
for the row, in the order of their definition. Depending on the
DBMS, the variable- and fixed-length columns may be
separated.
Offset Tables.
Optionally, the record may contain an offset table with
pointers to manage and control where variable-length fields
are stored within the row.
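As an illustration of how an offset table can locate variable-length fields inside a row, the sketch below packs two fixed-length integers, then a small offset table, then two variable-length strings. The layout, the column names (`emp_id`, `salary`, `name`, `dept`), and the 2-byte offset size are hypothetical choices for this example; each DBMS defines its own row format.

```python
import struct

FIXED_FORMAT = "<ii"                            # two 4-byte integers
FIXED_SIZE = struct.calcsize(FIXED_FORMAT)      # 8 bytes


def encode_row(emp_id: int, salary: int, name: str, dept: str) -> bytes:
    """Pack fixed-length columns, an offset table, then variable-length columns."""
    fixed = struct.pack(FIXED_FORMAT, emp_id, salary)
    var_fields = [name.encode("utf-8"), dept.encode("utf-8")]
    # One 2-byte entry per variable-length field, plus one marking the row end.
    table_size = 2 * (len(var_fields) + 1)
    offset = FIXED_SIZE + table_size            # where variable data begins
    offsets = []
    for field in var_fields:
        offsets.append(offset)
        offset += len(field)
    offsets.append(offset)                      # end of the row
    table = struct.pack(f"<{len(offsets)}H", *offsets)
    return fixed + table + b"".join(var_fields)


def decode_row(row: bytes):
    """Use the offset table to slice the variable-length fields back out."""
    emp_id, salary = struct.unpack_from(FIXED_FORMAT, row, 0)
    name_start, dept_start, end = struct.unpack_from("<3H", row, FIXED_SIZE)
    name = row[name_start:dept_start].decode("utf-8")
    dept = row[dept_start:end].decode("utf-8")
    return emp_id, salary, name, dept


row = encode_row(7, 50000, "Ada", "Research")
print(len(row), decode_row(row))
```

Because each field's start and end are read from the table, the decoder never has to scan the row to find where a variable-length value begins.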
Row length.
For variable-length keys, the index may need to store the
actual length of the indexed data.
Page pointer.
Points to the physical location of the data page in the table
that actually holds the indexed data.
etc.
Index fragmentation.
A more serious concern for DBAs.
Indexes can become disorganized and fragmented as data is
added, modified, and removed from indexed tables.
Fragmented indexes consist of many scattered areas of storage
that are too small to be used productively.
This causes wasted space, which can hinder performance and
increase storage costs.
Use tools to scan indexes for fragmentation and take actions to
defragment or rebuild indexes on a regular basis.
Storage Options
SSD
RAID
JBOD
SAN
NAS
Tiered Storage
RAID
RAID: Redundant Arrays of Independent
Disks
RAID combines multiple disk devices into an
array.
There are many levels of RAID technology,
which deliver different degrees of fault
tolerance and performance.
RAID can improve availability, remove the
need for outages to change hardware, and
overall minimize downtime.
RAID Levels
Vendors provide varying levels of
support for the RAID levels that have
been defined.
These various levels of RAID support
continuous availability through
combinations of functions:
Mirroring
Striping
Parity
Mirroring
Mirroring occurs when complete
copies of the data are made on at
least two disk drives, and all changes
made to the data are made
simultaneously to both copies.
If one fails, access is
automatically shifted
to the remaining copy.
Striping
Striping occurs when subsets of data
are spread across multiple disk
drives.
If any one drive fails, the impact of the
failure is limited to the data within the
stripe on that disk.
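Striping can be pictured as dealing fixed-size chunks of data round-robin across the drives. The sketch below models each drive as a Python list; the stripe size and function names are assumptions for illustration, and real controllers work at the block level.

```python
def stripe(data: bytes, drives: int, stripe_size: int) -> list[list[bytes]]:
    """Distribute fixed-size stripes of `data` round-robin across `drives`."""
    layout = [[] for _ in range(drives)]
    for i in range(0, len(data), stripe_size):
        drive_index = (i // stripe_size) % drives
        layout[drive_index].append(data[i:i + stripe_size])
    return layout


def unstripe(layout: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one stripe from each drive in turn."""
    out = []
    rounds = max(len(drive) for drive in layout)
    for r in range(rounds):
        for drive in layout:
            if r < len(drive):
                out.append(drive[r])
    return b"".join(out)


layout = stripe(b"ABCDEFGHIJ", drives=3, stripe_size=2)
print(layout)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(layout))  # b'ABCDEFGHIJ'
```

In this layout, losing the second drive affects only the stripes `CD` and `IJ`; without mirroring or parity, however, those stripes cannot be recovered.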
Parity
Parity bits are encoded data that can
be used to facilitate the
reconstruction of the original data
in the event that all or part of the data
cannot be accessed because a drive fails.
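One common way to compute parity is a byte-wise XOR across the data blocks; the source does not specify an encoding, so XOR is an assumption here. The sketch shows that any single missing block can be rebuilt by XOR-ing the surviving blocks with the parity block.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if stored on three different drives.
data_blocks = [b"DATA", b"BASE", b"DISK"]
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild its block
# from the two surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == b"BASE"
```

A single XOR parity block protects against the loss of one block per stripe; it cannot recover two simultaneous losses.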
RAID Levels
Level       Fault tolerance   Read performance   Write performance   Cost
No RAID     No                Normal             Normal              Inexpensive
Level 0     No                Fast               Fast                Expensive
Level 1     Yes               Normal             Normal              Moderate
Level 2     Yes               Normal             Normal              Moderate
Level 3     Yes               Normal             Normal              Moderate
Level 4     Yes               Normal             Slow                Moderate
Level 5     Yes               Fast               Slow                Expensive
Level 6     Yes               Fast               Slow                Expensive
Level 10    Yes               Fast               Normal              Expensive
Level 50    Yes               Normal             Normal              Expensive
Level 0+1   Yes               Fast               Fast                Very Expensive
RAID-0
RAID-1
RAID-2
RAID-3
RAID-4
RAID-5
RAID-6
RAID-10
RAID-50
RAID-0+1
Proprietary RAID
A number of proprietary variants and
levels of RAID have been defined by
the storage vendors.
If you are in the market for RAID
storage, be sure you understand
exactly what the storage vendor is
delivering.
http://en.wikipedia.org/wiki/RAID
Evaluating RAID
Favor fault-tolerant RAID levels for database files. Database
files not on fault-tolerant disks are subject to downtime and
lost data.
Choose the appropriate disk system for the type of activity
each database object will experience. For example, you
might want to implement two separate RAID systems: one
at RAID-5 for data that is heavily read-focused, such as
analysis and reporting, and another at RAID-1 (or RAID-0+1)
for transaction data that is frequently written and updated.
For high-performance, mission-critical implementations with
sufficient budget, consider RAID-10 for its performance and
fault tolerance.
If you have the budget at your disposal, consider RAID-0+1
because it has fast read, fast write, and fault tolerance.
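These guidelines can be checked against the RAID level comparison from this chapter by encoding its rows and filtering on requirements. The ratings below are copied from that comparison; the `candidates` helper, its parameters, and the cost ranking are hypothetical conveniences for this sketch.

```python
from typing import NamedTuple


class RaidLevel(NamedTuple):
    name: str
    fault_tolerant: bool
    read: str
    write: str
    cost: str


RAID_LEVELS = [
    RaidLevel("No RAID", False, "Normal", "Normal", "Inexpensive"),
    RaidLevel("RAID-0", False, "Fast", "Fast", "Expensive"),
    RaidLevel("RAID-1", True, "Normal", "Normal", "Moderate"),
    RaidLevel("RAID-2", True, "Normal", "Normal", "Moderate"),
    RaidLevel("RAID-3", True, "Normal", "Normal", "Moderate"),
    RaidLevel("RAID-4", True, "Normal", "Slow", "Moderate"),
    RaidLevel("RAID-5", True, "Fast", "Slow", "Expensive"),
    RaidLevel("RAID-6", True, "Fast", "Slow", "Expensive"),
    RaidLevel("RAID-10", True, "Fast", "Normal", "Expensive"),
    RaidLevel("RAID-50", True, "Normal", "Normal", "Expensive"),
    RaidLevel("RAID-0+1", True, "Fast", "Fast", "Very Expensive"),
]

COST_RANK = {"Inexpensive": 0, "Moderate": 1, "Expensive": 2, "Very Expensive": 3}


def candidates(read=None, write=None, max_cost="Very Expensive",
               require_fault_tolerance=True):
    """Return the table rows that satisfy the stated requirements."""
    return [
        level for level in RAID_LEVELS
        if (level.fault_tolerant or not require_fault_tolerance)
        and (read is None or level.read == read)
        and (write is None or level.write == write)
        and COST_RANK[level.cost] <= COST_RANK[max_cost]
    ]


# Fault-tolerant with fast reads and fast writes, budget permitting.
print([level.name for level in candidates(read="Fast", write="Fast")])
# Fault-tolerant with fast reads, excluding the most expensive option.
print([level.name for level in candidates(read="Fast", max_cost="Expensive")])
```

The first query returns only RAID-0+1, and the second returns RAID-5, RAID-6, and RAID-10, which matches the recommendations above.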
JBOD
JBOD stands for "just a bunch of disks."
The term is used to differentiate
traditional disk technologies from newer
storage technology.
Benefits of SAN
SAN affords the following
benefits:
Shared storage between multiple
hosts
High I/O performance
Server and storage consolidation
NAS: Network-Attached Storage
NAS refers to storage that can be
accessed directly from the network.
Hosts or client systems can read and write
data over a network interface.
Tiered Storage
With tiered storage, different categories of data are
assigned to different types of storage media in order
to reduce total storage cost.
Can be important for organizations that manage significant
amounts of data that continues to grow in volume.
Tiered storage can offer some financial relief.
Multi-Temperature Data
Popularized by Teradata.
This technique deploys four categories:
Hot
Warm
Cool
Dormant
Categorize Devices to Temperature
The next step is to categorize your storage devices for use with
each data temperature.
Hot data that is I/O intensive and requires high availability can be
placed on storage devices offering high performance, reliability,
advanced features and large capacity.
RAID or SSD
Warm data is less frequently accessed than hot data and often is
read more than it is modified. It can be placed on less expensive disk
that offers good performance and reliability but is not top-of-the-line.
SATA and SCSI configured storage can work well for warm data.
Cool data is not accessed often. Such data usually still needs to
reside on direct access storage devices.
Perhaps NAS (network attached storage) or object based storage.
Dormant data has not been accessed for a long time
(perhaps years), and its data model is stable.
Offline storage systems such as intelligent tape or optical disk.
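The mapping from data temperature to storage class can be written down as a simple lookup. The sketch below classifies data by days since last access, which is a simplification: the categories above also depend on I/O intensity and availability needs. The day thresholds are hypothetical and would need tuning for a real environment.

```python
from datetime import date

# Storage suggestions taken from the categories described above.
STORAGE_BY_TEMPERATURE = {
    "hot": "RAID or SSD",
    "warm": "SATA or SCSI disk",
    "cool": "NAS or object-based storage",
    "dormant": "offline storage (intelligent tape or optical disk)",
}


def classify_temperature(last_access: date, today: date) -> str:
    """Assign a temperature from days since last access (hypothetical thresholds)."""
    days = (today - last_access).days
    if days <= 7:
        return "hot"
    if days <= 90:
        return "warm"
    if days <= 365:
        return "cool"
    return "dormant"


today = date(2024, 6, 30)
for name, last_access in [
    ("orders", date(2024, 6, 29)),
    ("q1_reports", date(2024, 4, 15)),
    ("archive_2019", date(2019, 12, 31)),
]:
    temperature = classify_temperature(last_access, today)
    print(f"{name}: {temperature} -> {STORAGE_BY_TEMPERATURE[temperature]}")
```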
Capacity Planning
Capacity planning measures and compares system
capacity against requirements.
Determine whether your existing infrastructure can
sustain the anticipated workload by:
Measuring current capacity
Gauging the growth of capacity over time
Factoring in the anticipated capacity requirements of new
corporate and IT initiatives
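The three steps above can be turned into a simple projection: start from measured usage, add the storage that planned initiatives are expected to need, and compound the growth rate until capacity is exceeded. The sketch assumes a constant annual growth rate; the function name, the parameter names, and the sample figures are illustrative assumptions.

```python
def years_until_exceeded(current_tb: float, capacity_tb: float,
                         annual_growth: float, planned_tb: float = 0.0) -> int:
    """Years of compound growth after which usage first exceeds capacity.

    `planned_tb` is a one-time addition for new initiatives, applied up front.
    Returns 0 if usage already exceeds capacity.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    used = current_tb + planned_tb
    years = 0
    while used <= capacity_tb:
        used *= 1 + annual_growth
        years += 1
    return years


# 60 TB used of 100 TB, growing 25% per year.
print(years_until_exceeded(60, 100, 0.25))                 # 3
# Same system, with a new initiative needing 30 TB up front.
print(years_until_exceeded(60, 100, 0.25, planned_tb=30))  # 1
```

With 25% annual growth, the rate one-third of shops reported earlier in this chapter, a system at 60% utilization runs out of room in the third year, and a single large initiative can pull that forward to the first.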
Storage Planning
From a storage perspective, this may involve
simply adding more disk devices and assigning
them to the DBMS.
However, it may involve additional tasks to
support additional data and users, such as the
following:
Redesigning applications
Redesigning databases
Modifying DBMS parameters
Reconfiguring hardware components
Adjusting software interfaces
Questions