
STORAGE ARCHITECTURE/ GETTING STARTED: SAN SCHOOL 101

Marc Farley, President of Building Storage, Inc. Author, Building Storage Networks.

Agenda
Lesson 1: Basics of SANs
Lesson 2: The I/O path
Lesson 3: Storage subsystems
Lesson 4: RAID, volume management and virtualization
Lesson 5: SAN network technology
Lesson 6: File systems

Lesson #1

Basics of storage networking

Connecting

[Diagram: a computer system with an HBA or NIC connecting through network devices — switch/hub, bridge, router, concentrator, VPN, satellite dish]

Connecting
Networking or bus technology
Cables + connectors
System adapters + network device drivers
Network devices such as hubs, switches, routers
Virtual networking
Flow control
Network security

Storing

[Diagram: the storing stack — host software (volume manager, mirroring, storage device drivers) speaking a storage command and transfer protocol over the wiring (network transmission frames) to storage devices: disk drives, tape drives and RAID subsystems]

Storing
Device (target) command and control
Drives, subsystems, device emulation
Block storage address space manipulation (partition management)
Mirroring, RAID, striping, virtualization, concatenation

Filing
[Diagram: filing maps the user/application view of data — e.g. C:\directory\file or a database object — onto logical blocks in storage]

Filing
Namespace presents data to end users and applications as files and directories (folders)
Manages use of storage address spaces
Metadata for identifying data: file name, owner, dates

Connecting, storing and filing as a complete storage system

[Diagram: within a computer system, the filing function and the storing function (part of it implemented in the HBA driver) connect through an HBA or NIC, cables and a network switch/hub — the wiring/connecting layer — to a disk drive]

NAS and SAN analysis

NAS is filing over a network
SAN is storing over a network
NAS and SAN are independent technologies
They can be implemented independently
They can co-exist in the same environment
They can both operate and provide services to the same users/applications

Protocol analysis for NAS and SAN

[Diagram: NAS adds a filing layer on top of the storing and wiring (connecting) layers; a SAN carries only the storing and wiring layers over the network]

Integrated SAN/NAS environment


[Diagram: a NAS client uses filing over its wiring/connecting layer to reach a NAS head (a NAS + server system acting as a SAN initiator); the NAS head uses storing over the SAN's wiring/connecting layer to reach the SAN target storage]

Common wiring with NAS and SAN


[Diagram: with common wiring, the NAS client, NAS head system and SAN storage target all attach to the same network; filing runs between the client and the NAS head, storing between the NAS head and the SAN target]

Lesson #2

The I/O path

Host hardware path components

Memory

Processor

Memory Bus

System I/O Bus

Storage Adapter (HBA)

Host software path components

Application
Operating System
Filing System
Cache Manager
Volume Manager

MultiPathing

Device Driver

Network hardware path components

Cabling: fiber optic, copper
Switches, hubs, routers, bridges, gateways
Port buffers, processors
Backplane, bus, crossbar, mesh, memory

Network software path components

Access and Security

Fabric Services

Routing

Flow Control

Virtual Networking

Subsystem path components

Network Ports

Access and Security

Cache

Resource Manager

Internal Bus or Network

Device and media path components

Disk drives

Tape drives

Tape Media

Solid state devices

The end to end I/O path picture


[Diagram: the end-to-end I/O path — application and operating system (filing system, cache manager, volume manager, multipathing, device driver) on the processor, memory and memory bus; the system I/O bus and storage adapter (HBA); cabling and network systems (access and security, fabric services, routing, flow control, virtual networking); the subsystem's network port, access and security, cache, resource manager and internal bus or network; and finally the disk drives and tape drives]

Lesson #3

Storage subsystems

Generic storage subsystem model


Controller (logic + processors): access control, network ports, resource manager
Storage Resources

Cache Memory

Internal Bus or Network

Power

Redundancy for high availability

Multiple hot-swappable power supplies
Hot-swappable cooling fans
Data redundancy via RAID
Multi-path support: network ports to storage resources

Physical and virtual storage


[Diagram: the subsystem controller's resource manager (RAID, mirroring, etc.) maps several physical storage devices, plus a hot spare device, onto multiple exported storage volumes]

SCSI communications architectures determine SAN operations


SCSI communications are independent of connectivity
SCSI initiators (HBAs) generate I/O activity
They communicate with targets
Targets have communications addresses
Targets can have many storage resources
Each resource is a single SCSI logical unit (LU) with a universally unique ID (UUID), sometimes referred to as a serial number
An LU can be represented by multiple logical unit numbers (LUNs)
Provisioning associates LUNs with LUs & subsystem ports (see the sketch below)

A storage resource is not a LUN, it's an LU
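A minimal Python sketch of the LUN/LU distinction described above, with purely illustrative port and UUID names (not a real subsystem's provisioning interface):

# Illustrative only: provisioning maps (port, LUN) pairs onto logical units
# (LUs) identified by UUIDs. The same LU can appear under different LUNs.
provisioning = {
    ("S1", 0): "UUID-A",   # LU A exported as LUN 0 on port S1
    ("S1", 1): "UUID-B",   # LU B exported as LUN 1 on port S1
    ("S2", 1): "UUID-A",   # LU A again, as LUN 1 on port S2
}

def resolve_lu(port, lun):
    """Return the LU a host reaches through a given port/LUN, or None if unmapped."""
    return provisioning.get((port, lun))

print(resolve_lu("S1", 0))  # -> 'UUID-A'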

Provisioning storage
[Diagram: SCSI LUs with UUIDs A-D, residing on physical storage devices, are provisioned through subsystem ports S1-S4; the same LU can be presented under different LUN numbers (LUN 0-3) on different ports]

Controller functions

Multipathing

[Diagram: multipathing software (MP SW) in the host reaches the same SCSI LU (UUID A) as LUN X over two paths, path 1 and path 2]

Caching
[Diagram: the controller cache manager sits between the exported volumes and the storage resources]

Read caches: 1. recently used data, 2. read-ahead
Write caches: 1. write-through (to disk), 2. write-back (from cache)
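The two write policies differ in when the write is acknowledged. A minimal Python sketch, illustrative only (a dict stands in for the backing disk, not any vendor's controller code):

# Write-through: acknowledge only after the data reaches disk.
# Write-back: acknowledge once the data is in cache; flush to disk later.
disk, cache, dirty = {}, {}, set()

def write_through(block, data):
    cache[block] = data
    disk[block] = data          # write completes only after it is on disk

def write_back(block, data):
    cache[block] = data         # acknowledged immediately
    dirty.add(block)            # destaged to disk later

def flush():
    for block in dirty:
        disk[block] = cache[block]
    dirty.clear()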

Tape subsystems
[Diagram: a tape subsystem controller manages multiple tape drives, tape slots and a robot]

Subsystem management

Now with SMI-S

Management station browser-based network mgmt software

Ethernet/TCP/IP
Out-of-band management port

In-band management

Storage Subsystem

Exported Storage Resource

Data redundancy

Duplication: 2n
Parity: n+1
Difference: d(x) = f(x) − f(x−1)

Duplication redundancy with mirroring


Host-based or within a subsystem
[Diagram: a mirroring operator in the I/O path terminates each incoming I/O and regenerates new I/Os onto I/O path A and I/O path B; it also handles error recovery/notification]

Duplication redundancy with remote copy


Host-based; uni-directional (writes only)

Point-in-time snapshot

Subsystem Snapshot
Host

Lesson #4

RAID, volume management and virtualization

RAID = parity redundancy

Duplication: 2n
Parity: n+1
Difference: d(x) = f(x) − f(x−1)

History of RAID Late 1980s R&D project at UC Berkeley


David Patterson, Garth Gibson
Redundant array of inexpensive (later independent) disks

Striping without redundancy was not defined (RAID 0)

Original goals were to reduce the cost and increase the capacity of large disk storage

Benefits of RAID

Capacity scaling

Combine multiple address spaces into a single virtual address space

Performance through parallelism

Spread I/Os over multiple disk spindles

Reliability/availability with redundancy

Disk mirroring (striping to 2 disks)
Parity RAID (striping to more than 2 disks)

Capacity scaling
[Diagram: a RAID controller (resource manager) combines storage extents 1-12 into a single exported RAID disk volume with one address space (combined extents 1-12)]
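A minimal Python sketch of the address-space combination shown above, assuming (purely for illustration) that every extent holds the same number of blocks:

# Concatenation: blocks of the exported volume map onto (extent, offset).
EXTENT_BLOCKS = 1000   # assumed uniform extent size
NUM_EXTENTS = 12

def locate(logical_block):
    """Map a block of the exported volume to (extent number, offset in extent)."""
    extent = logical_block // EXTENT_BLOCKS
    offset = logical_block % EXTENT_BLOCKS
    if extent >= NUM_EXTENTS:
        raise ValueError("address beyond the exported volume")
    return extent + 1, offset   # extents numbered 1..12 as in the diagram

print(locate(2500))  # -> (3, 500)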

Performance
[Diagram: a RAID controller (microsecond performance) spreads I/O over several disk drives, each with millisecond performance due to rotational latency and seek time]
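A short Python sketch of how striping spreads consecutive I/O across spindles to get this parallelism; the stripe-unit size and drive count are assumptions for illustration, not prescribed values:

# Striping: consecutive stripe units are placed round-robin across the drives,
# so a large transfer keeps several spindles busy at once.
STRIPE_UNIT = 64 * 1024   # bytes per stripe unit (assumed)
DRIVES = 6                # number of spindles (assumed)

def stripe_location(byte_offset):
    unit = byte_offset // STRIPE_UNIT
    drive = unit % DRIVES                    # which spindle holds this unit
    unit_on_drive = unit // DRIVES           # which unit on that spindle
    return drive, unit_on_drive * STRIPE_UNIT + byte_offset % STRIPE_UNIT

print(stripe_location(200 * 1024))  # -> (3, 8192): lands on the 4th drive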

Parity redundancy
RAID arrays use XOR for calculating parity
Operand 1:   False  False  True   True
Operand 2:   False  True   False  True
XOR result:  False  True   True   False

XOR is the inverse of itself
Apply XOR to any two of the rows above to obtain the third

Reduced mode operations


When a member is missing, data that is accessed must be reconstructed with XOR
An array that is reconstructing data is said to be operating in reduced mode
System performance during reduced mode operations can be significantly reduced

XOR {M1&M2&M3&P}
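A small Python sketch of the XOR property that makes both reduced-mode reads and rebuilds work (member values are arbitrary example bytes):

# Parity is the XOR of all members; a missing member is recovered by XOR-ing
# the surviving members with the parity.
m1, m2, m3 = 0b10110100, 0b01101001, 0b11000011
p = m1 ^ m2 ^ m3                 # parity written during normal operation

rebuilt_m2 = m1 ^ m3 ^ p         # reduced mode / rebuild: member 2 is lost
assert rebuilt_m2 == m2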

Parity rebuild

The process of recreating data on a replacement member is called a parity rebuild
Parity rebuilds are often scheduled for non-production hours because performance disruptions can be so severe

RAID Parity Rebuild

XOR {M1&M2&M3&P}

RAID 0+1, 10
[Diagram: a RAID controller implements hybrid RAID 0+1 across a set of disk drives, forming mirrored pairs of striped members]

Volume management and virtualization


Storing-level functions
Provide RAID-like functionality in host systems and SAN network systems
Aggregation of storage resources for:
scalability, availability, cost/efficiency, manageability

OS kernel

File system

Volume management

RAID & partition management
Device driver layer between the kernel and storage I/O drivers

Volume Manager

HBA drivers HBAs

Server system

Volume managers can use all available connections and resources and can span multiple SANs as well as SCSI and SAN resources

Virtual Storage

[Diagram: the server's volume manager and HBA drivers drive both a SCSI HBA on a SCSI bus, reaching a SCSI disk resource, and a SAN HBA cabled to a SAN switch, reaching SAN disk resources; together they are presented as virtual storage]

SAN storage virtualization

RAID and partition management in SAN systems Two architectures:

In-band virtualization (synchronous) Out-of-band virtualization (asynchronous)

In-band virtualization
[Diagram: a SAN virtualization system — one or more systems, a switch or a router — sits in the I/O path and exports virtual storage backed by disk subsystems]

Out-of-band virtualization

Virtualization management system

Distributed volume management

Virtualization agents

Virtualization agents are managed from a central system in the SAN

Disk subsystems

Lesson #5

SAN networks

Fibre channel
The first major SAN networking technology
Very low latency
High reliability
Fiber optic cables
Copper cables
Extended distance
1, 2 or 4 Gb transmission speeds
Strongly typed

Fibre channel
A Fibre Channel fabric presents a consistent interface and set of services across all switches in a network Host and subsystems all 'see' the same resources

[Diagram: several hosts and SAN target storage subsystems attached across multiple switches all see the same fabric resources]

Fibre channel port definitions

FC ports are defined by their network role

N-ports: end node ports connecting to fabrics


L-ports: end node ports connecting to loops
NL-ports: end node ports connecting to fabrics or loops
F-ports: switch ports connecting to N-ports
FL-ports: switch ports connecting to N-ports or NL-ports in a loop
E-ports: switch ports connecting to other switch ports
G-ports: generic switch ports that can act as F, FL or E ports

Ethernet / TCP / IP SAN technologies

Leveraging the installed base of Ethernet and TCP/IP networks
iSCSI: native SAN over IP
FC/IP: FC SAN extensions over IP

iSCSI
Native storage I/O over TCP/IP
New industry standard
Locally over Gigabit Ethernet
Remotely over ATM, SONET, 10 Gb Ethernet

Protocol stack: iSCSI / TCP / IP / MAC / PHY

iSCSI equipment

Storage NICs (HBAs)


SCSI drivers

Cables
Copper and fiber

Network systems
Switches/routers Firewalls

FC/IP
Extending FC SANs over TCP/IP networks
FCIP gateways operate as virtual E-port connections
FCIP creates a single fabric where all resources appear to be local

E-port

FCIP FCIP Gateway Gateway

One fabric
TCP/IP LAN, MAN or WAN

FCIP FCIP Gateway Gateway

E-port

SAN switching & fabrics

High-end SAN switches have latencies of 1-3 µs
Transaction processing requires the lowest latency
Most other applications do not

Transaction processing requires non-blocking switches


No internal delays preventing data transfers

Switches and directors


Switches
8-48 ports
Redundant power supplies
Single system supervisor

Directors
64+ ports
HA redundancy
Dual system supervisors
Live SW upgrades

SAN topologies Star



Simplest single hop

Dual star

Simple network + redundancy Single hop Independent or integrated fabric(s)

SAN topologies N-wide star



Scalable Single hop Independent or integrated fabric(s)

Core-edge
Scalable
1-3 hops
Integrated fabric

SAN topologies Ring



Scalable integrated fabric
1 to N/2 hops

Ring + Star

Scalable integrated fabric 1 to 3 hops

Lesson #6

File systems

File system functions


Name space Access control Metadata Locking Address space management

Filing Filing

Storing Storing

Think of the storage address space as a sequence of storage locations (a flat address space)

Superblocks
Superblocks are known addresses used to find file system roots (and mount the file system)
[Diagram: superblock (SB) copies at known locations in the flat address space]

Filing and Scaling
File systems must have a known and dependable address space
The fine print in scalability: how does the filing function know about the new storing address space?

[Diagram: a single filing function whose block address space has been extended across multiple storing functions]

Lesson #2 SCSI's role in storage networking


Legacy open systems server storage
Physical parallel bus
Independent master/slave protocol
Storing in SANs
Compatibility requirements with system software force the use of the SCSI protocol

Storing and wiring in NAS


SCSI and ATA (IDE) used with NAS

Parallel SCSI bus technologies

A bus, with address lines and data lines


8-bit and 16-bit (narrow and wide)
Single-ended, differential, low-voltage differential (LVD) electronics
5, 10, 20, 40, 80, 160 and 320 MB/sec transfer rates
Ultra SCSI 3 is 320 MB/sec

Distances vary from 3 to 25 meters


Current LVD SCSI is 12 meters

SCSI command protocol
Master/slave relationships


host = master, device = slave

Independent of physical connectivity CDBs = Command Descriptor Blocks


Command format
Used for both device operations and data transfers

Serial SCSI standard created and implemented as:


Fibre Channel Protocol (FCP) iSCSI

SCSI addressing model

Host system

Target storage subsystem

LUN

16 bus addresses with LUN sub-addressing

SCSI daisy chain connectivity

Host system

Target devices or subsystems


[Diagram: the host system daisy-chained through the in/out storage interfaces of each target device or subsystem]

SCSI arbitration

Host system ID 7

Target IDs

6 5 4 3

2 1 0

The highest number address 'wins' arbitration to access the bus next
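A trivial Python sketch of that arbitration rule (illustrative only):

# Among the IDs competing for the bus, the highest ID wins; the host adapter
# is conventionally given ID 7 so it wins over the target devices.
def arbitrate(requesting_ids):
    return max(requesting_ids)

print(arbitrate([2, 5, 7]))  # -> 7, the host wins the bus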

SCSI resource discovery
SCSI inquiry CDB: "tell me your resources"

Host system

Target storage subsystem

LUN

There is no domain server concept in SCSI
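A simplified Python sketch of the 6-byte INQUIRY CDB (opcode 0x12) a host sends during discovery; the field layout follows the SPC-3-style encoding with a 16-bit allocation length, and the default length here is just an illustrative choice:

# Build a 6-byte SCSI INQUIRY CDB (simplified; not a complete SCSI stack).
def inquiry_cdb(allocation_length=96):
    return bytes([
        0x12,                              # operation code: INQUIRY
        0x00,                              # EVPD = 0 (standard inquiry data)
        0x00,                              # page code (unused when EVPD = 0)
        (allocation_length >> 8) & 0xFF,   # allocation length, big-endian
        allocation_length & 0xFF,
        0x00,                              # control byte
    ])

print(inquiry_cdb().hex())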

SCSI performance capabilities

write

Overlapped I/O
read

status

Tagged command queuing

(Reshuffled I/Os)

Parallel SCSI bus shortcomings

Bus length
servers and storage tied together

Single initiator
access to data depends on server

A standard full of variations


change is the only constant

Lesson #4

Disk drives
Disk drive components
Areal density
Rotational latency
Seek time
Buffer memory
Dual porting

Disk drives Complex electro-mechanical devices


Media
Motor and speed control logic
Bearings and suspension
Actuator (arm)
Read/write heads
Read/write channels
I/O controller (external interface + internal operations)
Buffer memory
Power

Disk drive areal density
Amount of signal per unit area of media
Keeps pace with Moore's law: areal density doubles approximately every 18 months

Increasingly smaller magnetic particles Continued refinement of head technology Electro-magnetic physics research

Rotational latency
Time for data on the media to rotate underneath the heads
Faster rotational speed = lower rotational latency
2 to 10 milliseconds are common
Application-level I/O operations can generate multiple disk accesses, each impacted by rotational latency
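A worked example in Python: average rotational latency is the time for half a revolution, so it falls directly out of the spindle speed:

# Average rotational latency = half a revolution.
def avg_rotational_latency_ms(rpm):
    seconds_per_rev = 60.0 / rpm
    return (seconds_per_rev / 2.0) * 1000.0

for rpm in (5400, 7200, 15000):
    print(rpm, "rpm ->", round(avg_rotational_latency_ms(rpm), 2), "ms")
# 5400 rpm -> ~5.56 ms, 7200 rpm -> ~4.17 ms, 15000 rpm -> ~2.0 ms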

Memory: nanoseconds (10^-9)
SAN switch: microseconds (10^-6)
Disk drive: milliseconds (10^-3)

Rotational latency & filing systems

Filing systems determine contiguous data lengths


(file systems and databases)

Block size definitions


512 2k 4k 16k 512k 2M

Disk media

Seek time
Time needed to position the actuator over the track
Comparable to rotational latency in duration
[Diagram: disk head, disk actuator and disk media]

Disk drive buffer memory

FIFO memory for data transfers


not cache

Overcome mechanical latencies with faster memory storage
Enables overlapped I/Os to multiple drives
Performance metrics:
Burst transfer rate = transfer into/out of buffer memory
Sustained transfer rate = transfer including track changes

Dual-ported disk drives

Redundant connectivity interfaces
Only FC to date
[Diagram: controller A and controller B each connect to one of the drive's two ports]

Forms of data redundancy

Duplication: 2n
Parity: n+1
Difference: d(x) = f(x) − f(x−1)

Business Continuity
24 x 7 data access is the goal

5 nines through planning and luck

There are many potential threats


People Power Natural disasters Fires

Redundancy is the key


Multiple techniques cover different threats

Lesson #8

Backup and recovery


Removable media, usually tape
removable redundancy

Backup systems
Backup operations
Media rotation
Backup metadata
Backup challenges

Forms of data redundancy in backup

Duplication: 2n
Parity: n+1
Difference: d(x) = f(x) − f(x−1)

Backup and recovery tape media

Magnetic 'ribbon'
multiple layers of backing, adhesive, magnetic particles and lube/coating
corrodes and cracks
requires near-perfect conditions for long-term storage

Sequential access
slow load and seek times
reasonable transfer rates
can hold multiple versions of files

Tape drives Two primary geometries


Longitudinal tracking Helical tracking

Highly differentiated
Speeds (3 MB/s to 30 MB/s)
Capacities (20 GB to 160 GB)
Physical formats (layouts)
Compatibility is a constant issue
Mostly parallel SCSI

Tape drive formats Two primary geometries


Longitudinal tracking Helical tracking

Highly differentiated
Speeds (3 MB/s to 30 MB/s)
Capacities (20 GB to 160 GB)
Physical formats (layouts)
4mm, 8mm, ½-inch, DLT, LTO, 19mm
Cartridge construction, tape lengths
Compatibility is a constant issue
Mostly parallel SCSI

Longitudinal tracking
Parallel data tracks written lengthwise on the tape by a 'stack' of heads

Tape heads

Data tracks
Technologies: DLT, SDLT, LTO, QIC

Helical tracking
Single data tracks written diagonally across the tape by a rotating cylindrical head assembly

Tape

Tape head

Data tracks

Technologies: 4mm, 8mm, 19mm

Tape subsystems Tape libraries & autoloaders

Tape Subsystem Controller

Tapes

Robot

Tape drive

Tape drive

Tape drive

Generic backup system components

Tape subsystems
I/O bus/network subsystem
Work scheduler & manager
Data mover
Metadata (database or catalog)
Media manager (rotation scheduler)
File system and database backup agents

Generic Network Backup System
[Diagram: file, web, DB and APP servers each run a backup agent; over the Ethernet network they send data to a backup server (work scheduler, data mover, metadata system, media manager), which writes over a SCSI bus to tape drive(s) or a tape subsystem]

Backup operations

Full (all data)
Longest backup operations
Usually done on weekends
Easiest recovery, with one tape set

Incremental (changed data)
Shortest backup operations
Often done on weekdays
Most involved recovery

Differential (accumulated changed data)
Compromise for easier backups and recovery
Maximum two tape sets to restore
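A small Python sketch of what each strategy implies at restore time (labels are illustrative, not a backup product's API):

# Which tape sets are needed to restore under each backup strategy.
def restore_sets(strategy, days_since_full):
    if strategy == "full":
        return ["full"]                                   # one tape set
    if strategy == "differential":
        return ["full", "latest differential"]            # at most two sets
    if strategy == "incremental":
        return ["full"] + [f"incremental day {d}"
                           for d in range(1, days_since_full + 1)]
    raise ValueError(strategy)

print(restore_sets("incremental", 4))  # the full plus every incremental since it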

Backup operations and data redundancy

Full
Duplication redundancy: one backup for complete redundancy

Incremental
Difference redundancy: multiple backups for complete redundancy

Differential
Difference redundancy: two backups for complete redundancy

Media rotations
Change of tapes with common names and purposes


Tape sets - not individual tapes

Backup job schedules anticipate certain tapes


Monday, Tuesday, Wednesday, etc.
Even days, odd days
1st Friday, 2nd Friday, etc.
January, February, March, etc.
1st Qtr, 2nd Qtr, etc.

Media rotation problems What happens when wrong tapes are used by mistake?
Say you use the last Friday's tape on the next Tuesday?

Data you might need to restore sometime can be overwritten!


Backup system logic may have to choose between:
Not completing the backup (restore will fail)
Deleting older backup files (restore might fail)

Backup metadata
A database for locating data on tape:
Version: create/modify date & size
Date/time of backup job
Tape names & backup job ID on tape
Owner
Delete records (don't restore deleted data!)

Transaction processing during backup
Many small files create heavy processor loads
This is where backup fails to scale
Backup databases need to be pruned
Performance and capacity problems

Traditional backup challenges

Completing backups within the backup window

Backup window = time allotted for daily backups
Starts after daily processing finishes
Ends before the next day's processing begins

Media management and administration


Thousands of tapes to manage
Audit requirements are increasing
On/offsite movement for disaster protection

Balancing backup time against restore complexity

LAN-free backup in SANs

[Diagram: LAN-free backup — file, web, DB and APP servers each run backup software and attach both to the Ethernet client network (LAN) and to a SAN switch; backup data travels over the SAN directly to the tape drives or tape subsystem]

Advantages of LAN-free backup

Consolidated resources (especially media)
Centralized administration
Performance
Offloads LAN traffic
Platform optimization

SAN

Path management
Dual pathing
Zoning
LUN masking
Reserve / release
Routing
Virtual networking

Dual pathing

System software for redundant paths


Path management is a super-driver process
Redirects I/O traffic over a different path to the same storage resource
Typically invoked after SCSI timeout errors
Active/active or active/passive

Static load balancing only
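A minimal Python sketch of active/passive failover after a timeout on the active path (illustrative only; real multipathing drivers sit in the I/O stack, not in application code):

# Two paths to the same storage resource; fail over after a SCSI timeout.
paths = ["path_A", "path_B"]
active = 0

def send_io(io, transmit):
    global active
    try:
        return transmit(paths[active], io)
    except TimeoutError:                 # timeout on the active path
        active = 1 - active              # redirect to the alternate path
        return transmit(paths[active], io)

def flaky_transmit(path, io):            # stand-in for the real transport
    if path == "path_A":
        raise TimeoutError("no response")
    return f"{io} completed on {path}"

print(send_io("read block 42", flaky_transmit))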

Zoning 1
I/O segregation
Switch function that restricts forwarding
Zone membership is based on port or address

Port zoning

Address zoning

Zone 1
Addr 1 Addr 2

Zone 2
Addr 3 Addr 4

Zone 3
Addr 5 Addr 6
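A minimal Python sketch of the forwarding restriction: the switch lets two addresses communicate only if they share a zone (the zone and address names mirror the figures on these slides):

# Address zoning: forwarding is allowed only within a common zone.
zones = {
    "zone1": {"addr1", "addr2", "addr7"},
    "zone2": {"addr3", "addr4", "addr7"},
    "zone3": {"addr5", "addr6", "addr7"},
}

def can_communicate(a, b):
    return any(a in members and b in members for members in zones.values())

print(can_communicate("addr1", "addr7"))  # True: the tape port is in every zone
print(can_communicate("addr1", "addr4"))  # False: no shared zone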

Zoning 2
Address zoning allows nodes to belong to more than one zone

Zone 1

For example, tape subsystems can belong to all zones

Addr 1 (server A) Addr 2 (disk subsystem port target address A) Addr 7 (tape subsystem port target address A)

#1 #2
Addr 3 Addr 4 Addr 7

Zone 2
Addr 3 (server B) Addr 4 (disk subsystem port target address B) Addr 7 (tape subsystem port target address A)

Addr1 Addr 2 Addr 7

Zone 3
Addr 5 (server C) Addr 6 (disk subsystem port target address C) Addr 7 (tape subsystem port target address A)

#3

Addr 5 Addr 6 Addr 7

Zoning 3
Zones (or zone memberships) can be 'swapped' to reflect different operating environments

Changing zones

LUN masking
Restricts subsystem access to defined servers

SCSI inquiry CDB

Target- or LUN-level masking
Non-response to SCSI inquiry CDB
Can be used with zoning for multi-level control

No response to SCSI inquiry
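A small Python sketch of masking by non-response: the subsystem answers a host's inquiry for a LUN only if that host has been granted access (names and fields are illustrative):

# LUN masking: unmasked hosts get inquiry data, masked hosts get no response.
masking = {
    ("server_A", 0): True,
    ("server_B", 0): False,
}

def inquiry(host, lun):
    if masking.get((host, lun)):
        return {"lun": lun, "vendor": "ACME", "product": "Virtual Disk"}
    return None          # no response: the LUN stays invisible to this host

print(inquiry("server_A", 0))
print(inquiry("server_B", 0))   # None: masked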

Reserve / Release

SCSI function

1st access

Typically implemented in SCSI/SAN storage routers
Used to reserve tape resources during backups

Reserved

2nd access blocked Storage router

tape drives robotics

Routing
Path decisions made by switches
Large TCP/IP networks require routing in switches instead of in end nodes
Looping is avoided by spanning-tree-style algorithms that ensure a single path
FSPF (a link-state protocol similar to OSPF) is the path selection technology for Fibre Channel
Routing is not an HA failover technology

Name Space
The name space is the representation of data to end users and applications
Identification and searching
Organizational structure

Directories or folders in file systems Rows and columns in databases

Associations of data
Database indexing File system linking

Metadata and Access Control (Security)
Metadata is the description of data
Intrinsic information and accounting information
Access control determines how (or if) a user or application can use the data, for example read-only
Access control is often incorporated with metadata but can be a separate function

Data has attributes that describe it Storage is managed based on data attributes

Activity info Owner info Capacity info Whatever info

Data can have security associated with it. Data can be erased, copied, renamed, etc.

Managing multiple users or applications
Locking with concurrent access to data
Locking has been done in multi-user systems for decades
Locking in NAS has been a central issue

NFS advisory locks provide no guarantees
CIFS oplocks are enforced

Lock persistence

File systems organize data in blocks
Blocks are SCSI's address abstraction layer
Filing functions use block addresses to communicate with storing-level entities
Filing systems manage the utilization of block address spaces (space management)
Block address structures are typically uniform
Block address boundaries are static for efficient and error-free space management
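A minimal Python sketch of that space management idea, assuming (for illustration) a uniform 4 KB block size and a simple per-file block map:

# Filing turns a byte offset in a file into a block address for the storing level.
BLOCK_SIZE = 4096

def byte_to_block(file_offset, file_block_map):
    """file_block_map[i] is the storage block holding the file's i-th block."""
    index = file_offset // BLOCK_SIZE
    return file_block_map[index], file_offset % BLOCK_SIZE

# a small file whose three blocks are scattered across the address space
print(byte_to_block(5000, [120, 987, 43]))   # -> (987, 904)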

Journaling
File system structure has to be verified when mounting (fsck)
Fsck can take hours on large file systems
Journaling file systems keep a log of file system updates
Like a database log file, journal updates can be checked against actual structures
Incomplete updates can be rolled forward or backward to maintain system integrity

Filing and V/VM
Virtualization & volume management (V/VM) is a storing function; filing is a filing function
V/VM manipulates block addresses and creates real and virtual address spaces
Filing manages the placement of data in the address spaces exported by virtualization
