Marc Farley President of Building Storage, Inc Author, Building Storage Networks, Inc.
Agenda
Lesson 1: Basics of SANs
Lesson 2: The I/O path
Lesson 3: Storage subsystems
Lesson 4: RAID, volume management and virtualization
Lesson 5: SAN network technology
Lesson 6: File systems
Lesson #1
Connecting
Concentrator Router
Dish
VPN
Connecting
Networking or bus technology
Cables + connectors
System adapters + network device drivers
Network devices such as hubs, switches, routers
Virtual networking
Flow control
Network security
Storing
Host Software
Mirroring Software
Storage Protocol
Storage Devices
Tape Drives
Storing
Device (target) command and control
Drives, subsystems, device emulation
Filing
C:\directory\file
Database Object
User/Application View
User/Application View
Filing
Namespace presents data to end users and applications as files and directories (folders)
Manages use of storage address spaces
Metadata for identifying data: file name, owner, dates
Filing
Wiring Connecting
HBA or NIC
Cable Cable
Storing
Network Switch/hub
Disk Drive
NAS is filing over a network
SAN is storing over a network
NAS and SAN are independent technologies
They can be implemented independently
They can co-exist in the same environment
They can both operate and provide services to the same users/applications
NAS
Filing Filing
SAN Network
Storing Storing
Wiring Connecting
Storing Storing
Lesson #2
Memory
Processor
Memory Bus
MultiPathing
Device Driver
Switches, hubs, routers, bridges, gateways
Port buffers, processors
Backplane, bus, crossbar, mesh, memory
Fabric Services
Routing
Flow Control
Virtual Networking
Network Ports
Cache
Resource Manager
Disk drives
Tape drives
Tape Media
App
Cabling
Network Systems
Fabric Services
Routing
Flow Control
Virtual Networking
Cache
Resource Manager
Disk drives
Tape drives
Lesson #3
Storage subsystems
Cache Memory
Power
Multiple hot swappable power supplies
Hot swappable cooling fans
Data redundancy via RAID
Multi-path support
Network ports to storage resources
Exported storage
unique ID (UUID) - sometimes referred to as a serial number
An LU can be represented by multiple logical unit numbers (LUNs)
Provisioning associates LUNs with LUs & subsystem ports
Provisioning storage
LUN 0 LUN 1
SCSI LU UUID A
Port S1
LUN 1 LUN 2 LUN 2
SCSI LU UUID B
Port S2
Port S3
LUN 3 LUN 3
SCSI LU UUID C
Port S4
LUN 0
SCSI LU UUID D
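The provisioning idea above (LUNs associated with LUs and subsystem ports) can be sketched as a small lookup table. This is only an illustration: the `Export` record, the `EXPORTS` table and the specific port-to-LUN assignments are assumptions made for the example, not a mapping recovered from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Export:
    port: str      # subsystem port the LUN is presented on
    lun: int       # logical unit number hosts see on that port
    lu_uuid: str   # unique ID of the underlying logical unit

# Hypothetical provisioning table; one LU may appear under several LUNs/ports.
EXPORTS = [
    Export("S1", 0, "A"),
    Export("S1", 1, "B"),
    Export("S2", 2, "B"),
    Export("S3", 3, "C"),
    Export("S4", 0, "D"),
]

def resolve(port, lun):
    """Return the UUID of the LU behind (port, lun), or None if not provisioned."""
    for export in EXPORTS:
        if export.port == port and export.lun == lun:
            return export.lu_uuid
    return None

def paths_to(lu_uuid):
    """List every (port, lun) pair that leads to the given LU."""
    return [(e.port, e.lun) for e in EXPORTS if e.lu_uuid == lu_uuid]
```

Note that the same LUN number (0) can name different LUs on different ports, which is why the UUID, not the LUN, identifies the logical unit.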
Controller functions
Multipathing
LUN X
Path 1
SCSI LU UUID A
MP SW
LUN X
Path 2
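As a rough sketch of what multipathing software (MP SW) does with two paths to the same LU, the class below picks the first healthy path and fails over when one is marked bad. The class name, the method names and the simple first-available policy are assumptions for illustration; real multipathing drivers also balance load and probe paths.

```python
class MultipathDevice:
    """One logical unit reachable over several paths (illustrative only)."""

    def __init__(self, lu_uuid, paths):
        self.lu_uuid = lu_uuid
        self.paths = list(paths)   # preferred order
        self.failed = set()

    def mark_failed(self, path):
        self.failed.add(path)

    def mark_restored(self, path):
        self.failed.discard(path)

    def select_path(self):
        """Return the first path that has not failed."""
        for path in self.paths:
            if path not in self.failed:
                return path
        raise IOError("no path available to LU " + self.lu_uuid)

dev = MultipathDevice("A", ["Path 1", "Path 2"])
first_choice = dev.select_path()
dev.mark_failed("Path 1")
after_failure = dev.select_path()
```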
Caching
Exported Volume Exported Volume
Write Caches
1. Write Through (to disk)
2. Write Back (from cache)
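The difference between the two write cache modes can be sketched as follows. The `WriteCache` class and the dictionary standing in for the disk are assumptions for the example: in write-through mode the write reaches disk before it is acknowledged, in write-back mode it is held in cache and written later.

```python
class WriteCache:
    """Toy write cache; `disk` is a dict standing in for the drive (block -> data)."""

    def __init__(self, disk, write_back=False):
        self.disk = disk
        self.write_back = write_back
        self.cache = {}
        self.dirty = set()     # blocks in cache that are not yet on disk

    def write(self, block, data):
        self.cache[block] = data
        if self.write_back:
            self.dirty.add(block)      # acknowledged from cache, disk updated later
        else:
            self.disk[block] = data    # write through to disk immediately

    def flush(self):
        """Write all dirty blocks to disk (write-back mode)."""
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()
```

Write-back acknowledges sooner, but dirty blocks are exposed to loss until they are flushed, which is one reason subsystems protect cache memory with redundant power.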
Tape subsystems
Tape Drive Tape Drive Tape Drive Tape Drive
Tape Slots
Robot
Subsystem management
Ethernet/TCP/IP
Out-of-band management port
In-band management
Storage Subsystem
Data redundancy
Duplication
2n
Parity
n+1
Difference
d(x) = f(x) - f(x-1)
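The three redundancy types can be compared with a short sketch: duplication needs 2n members, parity needs n+1, and difference redundancy keeps only what changed between one version and the previous one. The function names and the dictionary representation of a version (block number to contents) are assumptions for illustration.

```python
def members_needed(n, scheme):
    """Members required to protect n members' worth of data."""
    if scheme == "duplication":
        return 2 * n
    if scheme == "parity":
        return n + 1
    raise ValueError("unknown scheme: " + scheme)

def difference(current, previous):
    """Blocks whose contents differ between the current and previous version."""
    return {block: data for block, data in current.items()
            if previous.get(block) != data}
```

For four data disks this gives 8 disks with duplication and 5 with parity; a difference copy of a lightly changed volume is smaller still, but it is only useful together with the version it was taken against.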
I/O Path
Mirroring Operator
Terminate I/O & regenerate new I/Os
Error recovery/notification
Point-in-time snapshot
Subsystem Snapshot
Host
Lesson #4
Duplication
2n
Parity
n+1
Difference
d(x) = f(x) - f(x-1)
Original goals were to reduce the cost and increase the capacity of large disk storage
Benefits of RAID
Capacity scaling
Disk mirroring (striping to 2 disks) Parity RAID (striping to more than 2 disks)
Capacity scaling
Combined extents 1 - 12
Storage extent 1
Storage extent 2
Storage extent 3
Storage extent 4
Storage extent 5
Storage extent 6
Storage extent 7
Storage extent 8
Storage extent 11
Storage extent 12
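Capacity scaling by combining extents amounts to mapping one large address space onto several smaller ones. A minimal sketch, assuming simple concatenation and a hypothetical `locate` helper (the slide does not say how the extents are combined):

```python
def locate(block, extent_sizes):
    """Map a block in the combined address space to (extent number, block within extent).

    extent_sizes lists the size in blocks of each extent, in order; extents are
    numbered from 1 to match the slide.
    """
    if block < 0:
        raise IndexError("negative block address")
    for number, size in enumerate(extent_sizes, start=1):
        if block < size:
            return number, block
        block -= size
    raise IndexError("block is beyond the combined address space")

sizes = [100] * 12   # twelve extents of 100 blocks each (assumed sizes)
```

With these sizes, `locate(250, sizes)` lands in extent 3 at offset 50.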
Performance
RAID controller (microsecond performance)
Disk drive
Disk drive
Disk drive
Disk drive
Disk drive
Disk drive
Disk drives (Millisecond performance) from rotational latency and seek time
Parity redundancy
RAID arrays use XOR for calculating parity
Operand 1    Operand 2    XOR Result
False        False        False
False        True         True
True         False        True
True         True         False
XOR {M1&M2&M3&P}
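A small sketch of how parity is derived from three members with XOR. The byte values and the `xor_blocks` helper are made up for the example; only the use of XOR comes from the slide.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

m1 = b"\x0f\xf0"
m2 = b"\x33\x33"
m3 = b"\x55\xaa"
parity = xor_blocks(m1, m2, m3)

# XOR of all members plus parity is always zero, which is what makes
# any single missing member recoverable.
check = xor_blocks(m1, m2, m3, parity)
```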
Parity rebuild
The process of recreating data on a replacement member is called a parity rebuild.
Parity rebuilds are often scheduled for non-production hours because performance disruptions can be so severe.
XOR {M1&M2&M3&P}
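A parity rebuild can be sketched as XORing every surviving member (data and parity) to regenerate the lost one. The function name and sample bytes are assumptions for illustration; a real rebuild does this for every stripe on the array, which is why it is so disruptive.

```python
def rebuild_member(surviving):
    """Recreate a lost member by XORing all surviving members, parity included."""
    lost = bytearray(len(surviving[0]))
    for block in surviving:
        for i, byte in enumerate(block):
            lost[i] ^= byte
    return bytes(lost)

m1, m2, m3 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
p = rebuild_member([m1, m2, m3])        # parity is the XOR of the data members
recovered = rebuild_member([m1, m3, p]) # M2 is lost and rebuilt from the rest
```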
RAID 0+1, 10
RAID Controller Hybrid RAID: 0+1
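Hybrid RAID combines striping with mirroring, so every logical block is written to two places. The sketch below assumes a RAID 0+1 layout (a stripe set that is mirrored as a whole) with a stripe unit of one block; the function name and the labels for the two stripe sets are hypothetical.

```python
def raid01_locations(block, members_per_stripe):
    """Return both physical locations of a logical block as (copy, member, offset)."""
    member = block % members_per_stripe    # which disk within the stripe set
    offset = block // members_per_stripe   # block position on that disk
    return [(copy, member, offset) for copy in ("stripe set A", "stripe set B")]
```

For example, with three members per stripe set, logical block 5 sits on member 2 at offset 1 in both copies.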
OS kernel
File system
Volume management
RAID & partition management
Device driver layer between the kernel and storage I/O drivers
Volume Manager
Server system
Volume managers can use all available connections and resources and can span multiple SANs as well as SCSI and SAN resources
Virtual Storage
SAN HBA
In-band virtualization
Exported virtual storage
Disk subsystems
Out-of-band virtualization
Virtualization agents
Disk subsystems
Lesson #5
SAN networks
Fibre channel
The first major SAN networking technology
Very low latency
High reliability
Fiber optic cables
Copper cables
Extended distance
1, 2 or 4 Gb transmission speeds
Strongly typed
Fibre channel
A Fibre Channel fabric presents a consistent interface and set of services across all switches in a network
Hosts and subsystems all 'see' the same resources
SAN Storage Target Subsystem
SAN Storage Target Subsystem
Leveraging the installed base of Ethernet and TCP/IP networks
iSCSI: native SAN over IP
FC/IP: FC SAN extensions over IP
iSCSI
Native storage I/O over TCP/IP
New industry standard
Locally over Gigabit Ethernet
Remotely over ATM, SONET, 10Gb Ethernet
iSCSI equipment
Cables
Copper and fiber
Network systems
Switches/routers Firewalls
FC/IP
Extending FC SANs over TCP/IP networks
FCIP gateways operate as virtual E-port connections
FCIP creates a single fabric where all resources appear to be local
E-port
OneTCP/IP fabric
LAN, MAN or WAN
E-port
High-end SAN switches have latencies of 1 - 3 µsec
Transaction processing requires lowest latency
Most other applications do not
Directors
64+ ports
HA redundancy
Dual system supervisor
Live SW upgrades
Dual star
Simple network + redundancy
Single hop
Independent or integrated fabric(s)
Ring + Star
Scalable integrated fabric 1 to 3 hops
Lesson #6
File systems
Filing Filing
Storing Storing
Think of the storage address space as a sequence of storage locations (a flat address space)
Superblocks
Superblocks are known addresses used to find file system roots (and mount the file system)
SB
SB
Filing and Scaling
File systems must have a known and dependable address space
The fine print in scalability - How does the filing function know about the new storing address space?
Filing Filing
Host system
LUN
Host system
SCSI arbitration
Host system ID 7
Target IDs
6 5 4 3
2 1 0
The highest number address 'wins' arbitration to access the bus next
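The arbitration rule on a parallel SCSI bus can be sketched in one function: among the IDs requesting the bus, the highest one wins. The function name is an assumption for the example.

```python
def arbitrate(requesting_ids):
    """Return the SCSI ID that wins arbitration, or None if nobody is requesting."""
    if not requesting_ids:
        return None
    return max(requesting_ids)
```

This is why the host system is normally given ID 7 on a bus with target IDs 0 to 6: `arbitrate([3, 7, 5])` returns 7, so the host always gets the bus ahead of any target, while low-numbered targets can be kept waiting on a busy bus.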
Host system
LUN
write
Overlapped I/O
read
status
(Reshuffled I/Os)
Bus length
servers and storage tied together
Single initiator
access to data depends on server
Lesson #4
Disk drives
Disk drive components
Areal density
Rotational latency
Seek time
Buffer memory
Dual porting
Disk drive areal density
Amount of signal per unit area of media
Keeps pace with Moore's law
Areal density doubles approximately every 18 months
Increasingly smaller magnetic particles Continued refinement of head technology Electro-magnetic physics research
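The doubling claim above can be turned into a quick projection. This is only arithmetic on the stated 18-month doubling period; the function name and the use of a relative density of 1.0 as a starting point are assumptions.

```python
def density_after(initial, months, doubling_months=18.0):
    """Project areal density after `months`, doubling every `doubling_months`."""
    return initial * 2 ** (months / doubling_months)
```

At this rate density quadruples in three years (`density_after(1.0, 36)` gives 4.0) and grows roughly a hundredfold in ten years.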
Application level I/O operations can generate multiple disk accesses, each impacted by rotational latency
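Average rotational latency is the time for half a revolution, so it follows directly from spindle speed. A minimal sketch (the function name and the sample speeds are chosen for the example):

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2
```

A 7,200 RPM drive averages about 4.2 ms and a 15,000 RPM drive 2 ms per access, so an application I/O that triggers several disk accesses pays this cost several times over.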
Memory: nanoseconds (10^-9 seconds)
Disk media
Seek Time
Time needed to position the actuator over the track
Equivalent to rotational latency in time
Disk head
Disk actuator
Disk media
Overcome mechanical latencies with faster memory storage
Enables overlapped I/Os to multiple drives
Performance metrics
Burst transfer rate = transfer in/out buffer memory (Sustained transfer rate = transfer with track changes)
Duplication
2n
Parity
n+1
Difference
d(x) = f(x) - f(x-1)
Lesson #8
Backup systems Backup operations Media rotation Backup metadata Backup challenges
Duplication
2n
Parity
n+1
Difference
d(x) = f(x) - f(x-1)
Magnetic 'ribbon'
multiple layers of backing, adhesive, magnetic particles and lube/coating
corrodes and cracks
requires near-perfect conditions for long-term storage
Sequential access
slow load and seek times
reasonable transfer rates
can hold multiple versions of files
Highly differentiated
Speeds (3MB/s to 30MB/s)
Capacities (20GB to 160GB)
Physical formats (layouts)
4mm, 8mm, inch, DLT, LTO, 19mm
Cartridge construction, tape lengths
Compatibility is a constant issue
Mostly parallel SCSI
Longitudinal tracking
Parallel data tracks written lengthwise on tape by a 'stack' of heads
Tape heads
Helical tracking
Single data tracks written diagonally across tape by a rotating cylindrical head assembly
Tape
Tape head
Data tracks
Tapes
Robot
Tape drive
Tape drive
Tape drive
Tape subsystems
I/O bus/network subsystem
Work scheduler & manager
Data mover
Metadata (database or catalog)
Media manager (rotation scheduler)
File system and database backup agents
File server
Backup agent
Generic Network Backup System
Web server
Backup agent
DB server
Backup agent
APP server
Backup agent
Ethernet network
SCSI bus
Backup server
Full
Duplication redundancy One backup for complete redundancy
Incremental
Difference redundancy Multiple backups for complete redundancy
Differential
Difference redundancy Two backups for complete redundancy
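How many backups a restore needs under each scheme can be sketched as follows. The `restore_set` helper and the list-of-strings history are assumptions for illustration: an incremental holds changes since the previous backup, while a differential holds all changes since the last full backup and so supersedes anything taken in between.

```python
def restore_set(history):
    """Return indexes of the backups needed to restore the latest state.

    history is a list of "full", "incremental" or "differential", oldest first,
    and must contain at least one "full" (otherwise ValueError is raised).
    """
    last_full = max(i for i, kind in enumerate(history) if kind == "full")
    needed = [last_full]
    for i in range(last_full + 1, len(history)):
        if history[i] == "incremental":
            needed.append(i)            # every incremental since is required
        elif history[i] == "differential":
            needed = [last_full, i]     # replaces everything since the full
    return needed
```

A week of incrementals needs every tape since the full backup, while a week of differentials needs only two: the full backup and the most recent differential.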
Media rotation problems
What happens when wrong tapes are used by mistake?
Say you use last Friday's tape on the following Tuesday?
Backup window = time allotted for daily backups
Starts after daily processing finishes
Ends before next day's processing begins
SAN
Backup software Backup software Backup software
SAN switch
Backup software
File server
Web server
DB server
APP server
LAN
Consolidated resources (especially media)
Centralized administration
Performance
Offloads LAN traffic
Platform optimization
SAN
Path management
Dual pathing
Zoning
LUN masking
Reserve / release
Routing
Virtual networking
Dual pathing
Zoning 1
I/O segregation
Switch function that restricts forwarding
Zone membership is based on port or address
Port zoning
Address zoning
Zone 1
Addr 1 Addr 2
Zone 2
Addr 3 Addr 4
Zone 3
Addr 5 Addr 6
Zoning 2
Address zoning allows nodes to belong to more than one zone
Zone 1
Addr 1 (server A) Addr 2 (disk subsystem port target address A) Addr 7 (tape subsystem port target address A)
#1 #2
Addr 3 Addr 4 Addr 7
Zone 2
Addr 3 (server B) Addr 4 (disk subsystem port target address B) Addr 7 (tape subsystem port target address A)
Zone 3
Addr 5 (server C) Addr 6 (disk subsystem port target address C) Addr 7 (tape subsystem port target address A)
#3
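The address zoning example above, where each server has its own disk port but all three zones share the tape port at Addr 7, can be sketched as a membership check. The `ZONES` table mirrors the three zones listed; the `can_communicate` function is a hypothetical stand-in for the switch's forwarding decision.

```python
ZONES = {
    "Zone 1": {"Addr 1", "Addr 2", "Addr 7"},   # server A, disk port A, tape port
    "Zone 2": {"Addr 3", "Addr 4", "Addr 7"},   # server B, disk port B, tape port
    "Zone 3": {"Addr 5", "Addr 6", "Addr 7"},   # server C, disk port C, tape port
}

def can_communicate(a, b, zones=ZONES):
    """Frames are forwarded only between nodes that share at least one zone."""
    return any(a in members and b in members for members in zones.values())
```

Every server can reach the tape subsystem, but server A cannot reach server B's disk port.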
Zoning 3
Zones (or zone memberships) can be 'swapped' to reflect different operating environments
Changing zones
LUN masking
Restricts subsystem access to defined servers
Target or LUN level masking
Non-response to SCSI inquiry CDB
Can be used with zoning for multi-level control
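LUN masking can be sketched as an access list consulted by the subsystem before it answers an inquiry. The `MASKS` table, the server names and the `inquiry` function are assumptions for the example; returning `None` models the non-response to a masked initiator.

```python
# Hypothetical masking table: (port, LUN) -> initiators allowed to see it.
MASKS = {
    ("Port S1", 0): {"server A"},
    ("Port S1", 1): {"server A", "server B"},
}

def inquiry(initiator, port, lun, masks=MASKS):
    """Answer an inquiry only if the initiator is allowed to see the LUN."""
    allowed = masks.get((port, lun), set())
    if initiator not in allowed:
        return None    # masked: the subsystem does not respond
    return {"port": port, "lun": lun}
```

Because a masked server never learns the LUN exists, masking complements zoning: zoning controls which ports can talk, masking controls which LUNs are visible behind a port.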
Reserve / Release
SCSI function
1st access
Typically implemented in SCSI/SAN storage routers
Used to reserve tape resources during backups
Reserved
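Reserve / release behaves like a simple ownership flag on the device: the first initiator to reserve it gets access, and others are refused until it is released. The class and method names below are assumptions for illustration, not the SCSI command set itself.

```python
class ReservableDevice:
    """Toy model of a device that honors reserve / release."""

    def __init__(self):
        self.holder = None

    def reserve(self, initiator):
        """Grant the reservation if the device is free or already held by this initiator."""
        if self.holder in (None, initiator):
            self.holder = initiator
            return True
        return False

    def release(self, initiator):
        """Only the current holder can release the device."""
        if self.holder == initiator:
            self.holder = None
            return True
        return False
```

During a backup, the backup server would reserve the tape drive first, so that a second server's attempt fails instead of interleaving data on the same tape.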
Routing
Path decisions made by switches
Large TCP/IP networks require routing in switches instead of in end nodes
Looping is avoided by spanning tree algorithms that ensure a single path
FSPF (Fabric Shortest Path First), which is modeled on OSPF, is the path selection protocol for Fibre Channel
Routing is not HA failover technology
Name Space
The Name Space is the representation of data to end users and applications
Identification and searching
Organizational structure
Associations of data
Database indexing File system linking
(Security)
Intrinsic information and accounting information
Access control determines how (or if) a user or application can use the data (for example, read-only)
Access control is often incorporated with metadata but can be a separate function
Data has attributes that describe it Storage is managed based on data attributes
Data can have security associated with it. Data can be erased, copied, renamed, etc.
Managing multiple users or applications
Locking with concurrent access to data
Locking has been done in multi-user systems for decades
Locking in NAS has been a central issue
Lock persistence
File systems organize data in blocks
Blocks are SCSI's address abstraction layer
Filing functions use block addresses to communicate with storing level entities
Filing systems manage the utilization of block address spaces (space management)
Block address structures typically are uniform
Block address boundaries are static for efficient and error-free space management
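Because block boundaries are uniform and static, translating a byte range in a file into block addresses is plain integer arithmetic. A minimal sketch, assuming a 4096-byte block size and a hypothetical `blocks_for` helper (the slide names neither):

```python
BLOCK_SIZE = 4096   # assumed block size in bytes

def blocks_for(offset, length, block_size=BLOCK_SIZE):
    """Return the range of block numbers touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)
```

A 200-byte write at offset 4000 straddles a boundary and touches blocks 0 and 1, even though it is far smaller than one block.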
Journaling
File system structure has to be verified when mounting (FSCHECK)
FSCHECK can take hours on large file systems
Journaling file systems keep a log of file system updates
Like a database log file, journal updates can be checked against actual structures
Incomplete updates can be rolled forward or backward to maintain system integrity
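Journal recovery can be sketched as replaying a log: committed updates are rolled forward and incomplete ones are rolled back. The journal layout used here (a list of transactions, each with a `committed` flag and `(block, old, new)` writes) is an assumption for the example and not the format of any particular file system.

```python
def recover(journal, blocks):
    """Bring `blocks` (block -> contents) to a consistent state using the journal."""
    # Roll forward: reapply every committed transaction in log order.
    for txn in journal:
        if txn["committed"]:
            for block, _old, new in txn["writes"]:
                blocks[block] = new
    # Roll back: undo incomplete transactions, newest first.
    for txn in reversed(journal):
        if not txn["committed"]:
            for block, old, _new in reversed(txn["writes"]):
                blocks[block] = old
    return blocks

journal = [
    {"committed": True,  "writes": [(1, "a", "b")]},
    {"committed": False, "writes": [(2, "x", "x-partial")]},
]
state = recover(journal, {1: "a", 2: "x-partial"})
```

Only the blocks named in the journal are examined, which is why recovery takes seconds rather than the hours a full structure check can need.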
V/VM and Filing
Filing is a filing function
Virtualization & volume management (V/VM) is a storing function
V/VM manipulates block addresses and creates real and virtual address spaces
Filing manages the placement of data in the address spaces exported by virtualization