
Shared Storage Pools

Service Education
Vasu Vallabhaneni
Jacob Rosales

2011 IBM Corporation


Shared Storage Pools (SSP)


SSP is a server-based storage virtualizer that is clustered across multiple Power servers
It's an extension of PowerVM's existing storage virtualization (VIOS vSCSI)
Combines existing SCSI emulation with clustering technology and a distributed data object repository
The distributed data object repository is an advanced filesystem-like function developed specifically for storage virtualization
SSP provides the same standard vSCSI Target interface to the client host
VIOS 2.2.1.0


Server and Storage integration

[Diagram: multiple Power servers, each running VIOS NG partitions on the Power Hypervisor (PHYP), all attached to a common Storage Pool (SAN & NAS)]

PowerVM VIOS Shared Storage Pools


Extending Integrated Storage Virtualization Beyond a Single System

[Diagram: on the left, "PowerVM" — the predominant PowerVM usage model today: LPARs served by a VIOS through the Power Hypervisor, with backing storage on the SAN. On the right, "PowerVM with Shared Storage Pools" — a high-value extension of today's model: the VIOS partitions of multiple systems share common Storage Pools backed by the SAN.]

SSP Software Stack


Internals of SSP Commands


Pre CLI Phase
Pre API Phase
CAA Phase
  CAA
  Dynamic Configuration
Post API Phase
Post CLI Phase


Post CLI Phase


Based on the return code, process the exceptions
Logging of traces in /home/ios/logs/ioscli_global.trace


Create SSP
SSP can be created from CLI or CFGASSIST interface
Requirements
One Disk for Cluster Repository [size > 1 GB]
One Disk for Pool [size >= 10 GB]
SSP Cluster Name
Storage Pool Name
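The size requirements above can be expressed as a small pre-flight check. This is a hypothetical helper for illustration only (the function and constant names are not part of the VIOS CLI):

```python
# Hypothetical pre-flight check for the SSP creation size requirements:
# repository disk > 1 GB, each pool disk >= 10 GB. Illustrative only.

REPO_MIN_MB = 1024    # repository disk must be larger than 1 GB
POOL_MIN_MB = 10240   # each pool disk must be at least 10 GB

def validate_ssp_disks(repo_disk_mb, pool_disk_mbs):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if repo_disk_mb <= REPO_MIN_MB:
        problems.append("repository disk must be > 1 GB")
    for i, size in enumerate(pool_disk_mbs):
        if size < POOL_MIN_MB:
            problems.append(f"pool disk #{i} must be >= 10 GB")
    return problems
```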


CLI to Create SSP


cluster -create -clustername ClusterName -repopvs PhysicalVolume
... -spname StoragePool -sppvs PhysicalVolume ... -hostname
HostName
Example:
cluster -create -clustername ssp1 -repopvs hdisk1 -spname pool1 -sppvs hdisk2 -hostname node1.austin.ibm.com


Create SSP Pre API Phase


Validation of the disk usage & size
Call CAA API for create cluster
Logging of traces in /home/ios/logs/ioscli_global.trace


Create SSP CAA Phase [CAA]


Refer to CAA presentation
Logging of traces in /var/adm/ras/syslog.caa


Create SSP CAA Phase [Dynamic Configuration]


Starts when CAA calls the script /usr/sbin/vioscmd.sh
Dynamic Configuration is split into the following operations for create SSP:
  Add Node
  Join Node
  Add Disk
Each operation is split into the following phases:
  Check
  Pre
  Do It
  Post
  UndoPre
Logging of traces in /home/ios/logs/vioCmd.log
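The Check/Pre/Do It/Post/UndoPre pattern amounts to a small transaction: if a later phase fails after Pre has run, UndoPre rolls the node back. A minimal sketch of that control flow (the phase names come from the slide; everything else is illustrative):

```python
# Sketch of the dynamic-configuration phase sequence: Check, Pre, Do It,
# Post, with UndoPre run as rollback if a phase fails after Pre succeeded.
def run_operation(phases):
    """phases: dict mapping phase name -> callable returning True on success."""
    if not phases["check"]():
        return "check failed"
    if not phases["pre"]():
        return "pre failed"
    for name in ("do_it", "post"):
        if not phases[name]():
            phases["undo_pre"]()       # roll back the Pre phase work
            return f"{name} failed, rolled back"
    return "ok"
```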


Dynamic Configuration Add Node


This operation is a NOP for create SSP


Dynamic Configuration Join Node


Check Phase: check whether the node is part of a cluster
Pre Phase: NOP
Post Phase:
  Verify and create vioscluster0 with Cluster ID and Cluster Name
UndoPre Phase: NOP


Dynamic Configuration Add Disk

Check Phase: check usage of the disk
Pre Phase: NOP
Post Phase: NOP
UndoPre Phase: NOP


Create SSP Post API Phase

Create the Pool
Create and initialize the SSP Database
Start vio_daemon
Verify all the services are running:
  vio_daemon
  solidDB
Return status to CLI
Logging of traces in /home/ios/logs/ioscli_global.trace


How to verify create SSP?


cluster -status -clustername mycluster

Cluster Name    State
mycluster       OK

Node Name    MTM                 Partition Num   State   Pool State
vios123      8233-E8B02108F9BP   2               OK      OK
als092124    8233-E8B02108F9BP   3               OK      OK
als092100    8233-E8B0210BBE8P   2               OK      OK
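When scripting health checks, the per-node table of `cluster -status` can be scanned for any state other than OK. A rough sketch, assuming whitespace-separated columns as in the sample output (not an official parser):

```python
# Rough parser for the per-node rows of `cluster -status` output, assuming
# whitespace-separated columns: node name, MTM, partition, state, pool state.
def parse_node_lines(text):
    nodes = []
    for line in text.strip().splitlines():
        name, mtm, part, state, pool_state = line.split()
        nodes.append({"name": name, "mtm": mtm, "partition": int(part),
                      "state": state, "pool": pool_state})
    return nodes

def unhealthy(nodes):
    """Names of nodes whose node state or pool state is not OK."""
    return [n["name"] for n in nodes if n["state"] != "OK" or n["pool"] != "OK"]
```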

How to verify create SSP?


lssrc -ls vio_daemon

Node ID:            a405d026e60f11e0b317fe6a2accf50b
Log File:           /home/ios/logs/viod.log
VSP Socket:         0
  AF family:        0
  Port:             0
  Addr:             0.0.0.0
VKE Kernel Socket:  3
VKE Daemon Socket:  4
  Bound to:         /home/ios/socks/vioke_unix
API Socket:         6
  Bound to:         /home/ios/socks/api_eve_unix
Cluster Name:       mycluster
Cluster ID:         ea93b8c4e5ff11e0a394fe6a2accf50b
PNN NODE ID:        00000000000000000000000000000000
DBN NODE ID:        ea87514ce5ff11e0a394fe6a2accf50b
Pool Label:         mypool
Pool VIO Name:      D_E_F_A_U_L_T_061310
Pool ID:            0000000009035C7B000000004E7CB1D6
Pool State:         UP

Add Node to SSP


Node can be added from CLI or Smit interface
Requirements
Node should have access to Cluster Repository disk
Node should have access to Storage Pool disks
Node should have network configured


CLI to Add Node to SSP


cluster -addnode -clustername ClusterName -hostname HostName
Example:
cluster -addnode -clustername ssp1 -hostname node2.austin.ibm.com


Add Node to SSP Pre API Phase


Validation of the node
Call CAA API for change cluster
Logging of traces in /home/ios/logs/ioscli_global.trace


Add Node to SSP CAA Phase [CAA]


Refer to CAA presentation
Logging of traces in /var/adm/ras/syslog.caa


Add Node to SSP CAA Phase [Dynamic Configuration]

Starts when CAA calls the script /usr/sbin/vioscmd.sh
Dynamic Configuration is split into the following operations for adding a node to SSP:
  Add Node
  Join Node
Each operation is split into the following phases:
  Check
  Pre
  Do It
  Post
  UndoPre
Logging of traces in /home/ios/logs/vioCmd.log


Dynamic Configuration Add Node


Check Phase: NOP
Pre Phase: NOP
Post Phase: all nodes except the target node will try to add the node to the SSP Database
UndoPre Phase: NOP


Dynamic Configuration Join Node [Target Node]


Check Phase:
  Check whether the node is part of a cluster
  Check that the software levels match the cluster
Pre Phase: NOP
Post Phase:
  Verify and create vioscluster0 with Cluster ID and Cluster Name
  Query another node for storage pool ID & name
  Populate vioscluster0 with storage pool ID & name
  Start Pool
  Start vio_daemon
  Update SSP Database with node information [MTM, partition number]
UndoPre Phase: NOP


Dynamic Configuration Join Node [Initiator/Other Members]

Check Phase: NOP
Pre Phase: NOP
Post Phase: NOP
UndoPre Phase: NOP


Add Node to SSP Post API Phase


Add the node to SSP Database
Return status to CLI
Logging of traces in /home/ios/logs/ioscli_global.trace


Commands to Debug Issues?

lscluster -m
lssrc -ls vio_daemon
cluster -status -clustername ssp1
From a root shell:
clcmd lssrc -ls vio_daemon


Add/Replace Disks to SSP


Disks can be added to or replaced in an SSP using the CLI or Smit interface
Requirements
Node should have access to disks
Node should be a member of the SSP


CLI to Add Disks to SSP


chsp -add [-f] -clustername ClusterName -sp StoragePool
PhysicalVolume ...
Example:
chsp -add -clustername ssp1 -sp pool1 hdisk3 hdisk4


Add Disks to SSP Pre API Phase


Validation of the disks
Call CAA API for adding disks to the cluster, one disk at a time
Logging of traces in /home/ios/logs/ioscli_global.trace


Add Disk to SSP CAA Phase [CAA]


Refer to CAA presentation
Logging of traces in /var/adm/ras/syslog.caa


Add Disk to SSP CAA Phase [Dynamic Configuration]

Starts when CAA calls the script /usr/sbin/vioscmd.sh
Dynamic Configuration operation for adding a disk to SSP:
  Add Disk
Each operation is split into the following phases:
  Check
  Pre
  Do It
  Post
  UndoPre
Logging of traces in /home/ios/logs/vioCmd.log


Dynamic Configuration Add Disk

Check Phase: validate disk usage
Pre Phase: NOP
Post Phase: NOP
UndoPre Phase: NOP


Add Disk to SSP Post API Phase


Add the disks to Storage Pool
Return status to CLI
Logging of traces in /home/ios/logs/ioscli_global.trace


Add Disks to SSP Post CLI Phase


Based on the return code, process the exceptions
Logging of traces in /home/ios/logs/ioscli_global.trace


CLI to Replace Disks from SSP


chsp -replace -clustername ClusterName -sp StoragePool
-oldpv PhysicalVolume ... -newpv PhysicalVolume ...
Example:
chsp -replace -clustername ssp1 -sp pool1 -oldpv hdisk1 -newpv hdisk5


Replace Disks from SSP Pre API Phase

Validation of the new disks
Create an entry for the Replace Disks Operation in the SSP Database
Call CAA API for adding new disks to the cluster, one disk at a time
Logging of traces in /home/ios/logs/ioscli_global.trace


Add Disk to SSP CAA Phase [CAA]


Refer to CAA presentation
Logging of traces in /var/adm/ras/syslog.caa


Add Disk to SSP CAA Phase [Dynamic Configuration]

Starts when CAA calls the script /usr/sbin/vioscmd.sh
Dynamic Configuration operation for adding a disk to SSP:
  Add Disk
Each operation is split into the following phases:
  Check
  Pre
  Do It
  Post
  UndoPre
Logging of traces in /home/ios/logs/vioCmd.log


Dynamic Configuration Add Disk

Check Phase: validate disk usage
Pre Phase: NOP
Post Phase: NOP
UndoPre Phase: NOP


Replace Disks from SSP Post API Phase


Add the new disks to the Storage Pool
Remove old disks from the SSP one disk at a time by calling the CAA API
  Operation: RM DISK (vioscmd.sh)
    Check Phase: NOP
    Pre Phase: NOP
    Post Phase: NOP
    UndoPre Phase: NOP
Clean up the Replace Disks Operation from the SSP Database
Return status to CLI
Logging of traces in /home/ios/logs/ioscli_global.trace


What happens in case of failures?


Initiator node crashes or the command cores:
  If it happens after the SSP Database changes, the DBN will clean up
  If it happens while adding the disks to the Storage Pool, the DBN will take over the operation
If adding the new disks fails, clean up and fail the command
If removing the old disks fails:
  Return an error to the command
  The DBN will take over the clean up


Commands to Debug Issues?


lscluster -d
lssp -clustername ClusterName
lspv -clustername ClusterName -sp SPName

$ lssp -clustername mycluster
Pool     Size(mb)   Free(mb)   TotalLUSize(mb)   LUs   Type     PoolID
mypool   20352      19956      40960             2     CLPOOL   0000000009035C7B000000004E7CB1D6

$ lspv -clustername mycluster -sp mypool
PV NAME   SIZE(MB)   PVUDID
hdisk6    20480      3E213600A0B80006E44820000640F4E6F46560F1818 FAStT03IBMfcp


Remove Node from SSP


Node can be removed from SSP using CLI or Smit interface
Requirements
Node should be a member of SSP


CLI to Remove Node from SSP


cluster -rmnode [-f] -clustername ClusterName -hostname HostName
Plain rmnode assumes all the mappings have been removed
The -f [force] option has to be used when the node still has mappings
Example:
cluster -rmnode -clustername ssp1 -hostname node2.austin.ibm.com


Remove Node from SSP Pre API Phase


Validation of the node
Call CAA API for change cluster
Logging of traces in /home/ios/logs/ioscli_global.trace


Remove Node from SSP CAA Phase [CAA]


Refer to CAA presentation
Logging of traces in /var/adm/ras/syslog.caa


Remove Node from SSP CAA Phase [Dynamic Configuration]

Starts when CAA calls the script /usr/sbin/vioscmd.sh
Dynamic Configuration is split into the following operations for removing a node from SSP:
  Stop Node
  RM Node
Each operation is split into the following phases:
  Check
  Pre
  Do It
  Post
  UndoPre
Logging of traces in /home/ios/logs/vioCmd.log


Dynamic Configuration Stop Node [Initiator]

Check Phase:
  Request vio_daemon to relinquish roles
Pre Phase:
  Verify the DBN role has been relinquished by the vio_daemon
  All SSP VTDs moved to Defined state [rmdev -l <vtd name>]
  Stop vio_daemon
  Stop Pool
Post Phase:
  Remove all the SSP VTDs from the ODM
  Remove vioscluster0 from the ODM
UndoPre Phase:
  Start Pool
  Start vio_daemon
  Move all SSP VTDs to Available state


Dynamic Configuration RM Node [Target Node]


Check Phase: NOP
Pre Phase: NOP
Post Phase:
  Remove all the SSP VTDs from the ODM
  Remove vioscluster0 from the ODM
UndoPre Phase: NOP


Dynamic Configuration RM Node [Initiator/Other Members]

Check Phase: NOP
Pre Phase: NOP
Post Phase:
  Remove all the VTD entries from the Database
  Remove the node entry from the Database
UndoPre Phase: NOP


Remove Node from SSP Post API Phase

Remove all the VTD entries from the SSP Database
Remove the node from the SSP Database
Return status to CLI
Logging of traces in /home/ios/logs/ioscli_global.trace


Delete SSP
Delete SSP using the CLI or Smit interface
Requirements
  All the SSP objects have to be removed before initiating delete SSP:
    LUs
    Clones
    Images


CLI to Delete SSP


cluster -delete -clustername ClusterName
Example:
cluster -delete -clustername ssp1


Delete SSP Pre API Phase


Validate all the SSP objects have been removed
Call CAA API for delete cluster
Logging of traces in /home/ios/logs/ioscli_global.trace


Delete SSP CAA Phase [CAA]


CAA removes one node at a time using the STOP NODE and RM NODE operations
CAA makes sure that the initiator node is the last one to be removed
Logging of traces in /var/adm/ras/syslog.caa


Dynamic Configuration Stop Node [Initiator]


Check Phase
Pre Phase:
  Stop vio_daemon
  Remove the SSP Database
  Clean up ODM
  Stop Pool
Post Phase: NOP
UndoPre Phase: NOP


Delete SSP Post API Phase


Return status to CLI
Logging of traces in /home/ios/logs/ioscli_global.trace


Logical Units
Overview
Base building blocks for device virtualization and advanced functionality
  Snapshot/Rollback
  Thin/Thick
  IM
Collection of files within one or more filesets
SSP DB is the final arbiter for LU management

[Diagram: LU files VOL1 ... VOLX stored within filesets under /var/vio/SSP//D_E_F_A_U_L_T...]


Logical Units
Creation
MKBDSP command enhanced for clustering
API Validation
Device creation within SSP DB
Object creation within pool
Rollback on failure

Debugging
  ioscli_global.*
  viod.log

Flow: CLI (Create Request) -> API (Validation) -> API (DB Inserts) -> API (Pool OBJ Creation) -> API/DAEMON (LCE)

Logical Units
Removal
RMBDSP command enhanced for clustering
  API Validation
  Object removal within pool
  Device removal within SSP DB
Debugging
  ioscli_global.*
  viod.log

Flow: CLI (Remove Request) -> API (Validation) -> API (Pool OBJ Removal) -> API (DB Delete) -> API/DAEMON (LCE)

Logical Units
Provisioning
MKBDSP command enhanced for clustering
  API Validation
  Device creation within SSP DB
  Device creation within local system (ODM)
  Rollback on failure
Debugging
  ioscli_global.*
  viosCmd.log
  cfglog
  viod.log

Flow: CLI (Map Request) -> API (Validation) -> API (DB Inserts) -> API (MKDEV) -> CFG (Add Child) -> API (DB Update)

SSP Database
Configuration
  Database artifacts
    /var/vio/SSP/<CLUSTER>/D_E_F_A_U_L_T_061310/VIOSCFG/DB
      Catalog file (DB)
      Transaction logs
      Backup
  Startup and FFDC
    /var/vio/SSP/<CLUSTER>/local/VIOSCFG/DB
      solid.ini
      Error logs (solmsg/solerror/soltrace).out
  Checkpoints: every 5 transactions or 5 minutes
  DB backup once a day at 9pm
Debug
  Pool full, 64MB minimum space
  Error Report
  viod.log


SSP Database
LIBVIO
  Provides database services and abstraction utilizing ODBC
    Database start up and shutdown
    Connection handles
    Database query and object modification
  Only VIOS and Trusted Logging support
Debug
  ioscli_global.*
  vioCmd.log
  viod.log
  cfglog


VIO Daemon Election Framework


Provides role-based election services:
  Database Node
  Primary Notification Node
Provides a method to establish additional services based on role once the election completes:
  Database access and management
  Database cleanup/error recovery for long/complex operations
  Life Cycle Event Notification
  Alert Notification
Provides the opportunity to register critical processes:
  A process can be monitored and actions taken based on termination of those critical processes
  Can start/stop those processes based on election/relinquish actions
  One process supported per role


VIO Daemon Election Framework


The election framework is driven by code within the vio daemon and utilizes election control file access and process monitoring interfaces from VKE
It is dependent on the following sub-components:
  VIOS API
  VIOS Daemon
  VKE
  SFStore
  CAA
  RSCT


Election Framework

[State diagram: states are Init, Elect Start, Elect Self, Elect Wait Timer (transitional), Elect Fail, and Primary (final). Key transitions: if the node meets the requirements, it attempts Elect Self; on success it becomes Primary, and heartbeat/validate success keeps it there. On error, validate failure, or the node no longer meeting the requirements, the Primary relinquishes. If the node does not meet the requirements, it calls the next candidate. If an Elect Self request receives BUSY, the node starts a timer and waits (Elect Wait Timer); when the elect timer expires and no Primary is detected, it retries. Detecting a new Primary ends the election. If the node is the elector and has exhausted the node list, the election fails (Elect Fail).]

Election Success Example

Nodes A, B, C, and D are cluster members. Node A starts the election but fails to elect itself, and calls the next node (Node B).

Initial ECF:
# cat dbn.ecf
ecf_version:1
cluster_id:0
primary_node_id:0
primary_node_ipaddr:0.0.0.0
elector_node_id:0
election_result:1

Node B attempts to elect itself, fails, and calls the next node (Node C). ECF during the election:
# cat dbn.ecf
ecf_version:1
cluster_id:Cluster ID
primary_node_id:0
primary_node_ipaddr:0.0.0.0
elector_node_id:Node A ID
election_result:1

Node C elects itself and updates the ECF. Node C is Primary:
# cat dbn.ecf
ecf_version:1
cluster_id:Cluster ID
primary_node_id:Node C ID
primary_node_ipaddr:9.3.2.14
elector_node_id:0
election_result:2

Election Fail Example

Node A starts the election, fails to elect itself, and calls the next node.

Initial ECF:
# cat dbn.ecf
ecf_version:1
cluster_id:0
primary_node_id:0
primary_node_ipaddr:0.0.0.0
elector_node_id:0
election_result:1

Node B attempts to elect itself, fails, and calls the next node; meanwhile Node A is in the Wait Timer state. ECF during the election:
# cat dbn.ecf
ecf_version:1
cluster_id:Cluster ID
primary_node_id:0
primary_node_ipaddr:0.0.0.0
elector_node_id:Node A ID
election_result:1

Node A detects that it started the election, fails the election, and updates the ECF:
# cat dbn.ecf
ecf_version:1
cluster_id:Cluster ID
primary_node_id:0
primary_node_ipaddr:0.0.0.0
elector_node_id:0
election_result:4

Election Control File (ECF)


All access to the ECF is through VKE
VKE utilizes open-exclusive logic on the ECF to serialize elections across the cluster
Resides in the storage pool
One ECF per primary role
  /var/vio/SSP/<cluster_name>/D_E_F_A_U_L_T_061310/VIOSCFG/Election
  dbn.ecf - Database Node
  pnn.ecf - Primary Notification Node
Contains election information
  During an election, contains the elector id: the id of the node that started the election process
  After the election completes, contains the newly elected primary node's information


Contents of ECF

# cat dbn.ecf
ecf_version:1                                      - File version
cluster_id:7a454cd2e95f11e084b736867718cf0b        - Unique identifier for cluster
primary_node_id:7a39063ee95f11e084b736867718cf0b   - Unique ID for the primary node*
primary_node_ipaddr:9.3.92.142                     - IP addr for the primary node
elector_node_id:0                                  - Unique identifier for the elector
election_result:2                                  - Result of previous elections**

* The primary node id stored here is for debug only; the actual primary node identifier is stored in the ECF extended attributes
** Election Result values (defines in header file):
#define VIO_ELECT_INIT        0x1  /** Leader has not been elected, waiting for trigger to start election */
#define VIO_ELECT_MET_REQ     0x2  /** At least one node meets the req, listener has established connection */
#define VIO_ELECT_NOT_MET_REQ 0x4  /** Election process has failed so no nodes in the cluster meet the req */


ECF Extended Attributes

File extended attributes are used as an atomic election operation
EAs store the primary node id of the elected node and a sequence number
The sequence number is increased every time a non-zero value is set for the EA node id
Use the following command to read the ECF EA:

# getea dbn.ecf
EAName: Owner
EAValue:
7A39063EE95F11E084B736867718CF0B - 0000000000000001
[Primary Node ID]                  [Sequence Number]
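The Owner EA value shown by getea is the primary node id followed by the sequence number; splitting it is straightforward. A sketch based on the sample output (treating the sequence number as hex is an assumption; the sample value is the same either way):

```python
# Split the ECF "Owner" extended-attribute value into the primary node id
# and the sequence number, following the getea sample format:
#   "<node id> - <sequence number>"
def parse_owner_ea(value):
    node_id, seq = [part.strip() for part in value.split("-")]
    return node_id, int(seq, 16)  # assumption: sequence number is hex
```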

Primary Roles

DBN - Database Node
  Controls access to the clusterwide DB process
  Handles error recovery and cleanup for database errors
PNN - Primary Notification Node
  Alert Notification
    Pool up/down
    Node up/down
  Life Cycle Event Notification to registered listeners
    Currently only the Director VSP CAS sub-agent is registered
    Sends XML data to the CAS sub-agent regarding changes to SSP objects (create, modify, and delete object changes)
    If an LCE cannot be sent to a registered listener, the PNN triggers a resync event which tells the registered listener to re-discover the SSP configuration


VKE Services

Interface Abstraction Services
  Emulator <-> Cluster Communication
  Emulator -> Daemon
  Emulator <-> Database
  Daemon <-> Cluster Communication
InterNode Consistency Services
  Election Control File Access
  Election PID Monitoring and Cleanup


VKE Architecture

Module Breakdown by Interface
  Separate anchor block per interface, with lock protection independent of each other
Kproc and Threads
  Maintains threads to monitor daemon sockets (for detecting close in order to recreate)
  Work threads to handle high-priority requests and normal-priority requests

VKE RAS

Component Trace
  vke component
    Traces all interfaces except the syscall interface
  vke.sc component
    Traces all syscalls with viodHead information (wraps quickly)
Error Log
KDB
  Everything under one command: vke
  Use subcommands to get interface info; vke -? for usage
https://w3.tap.ibm.com/w3ki03/display/vio/VKE+Debug


Common VKE Trace Events

SACCENTR, SACCEXIT/SMONRDY
  Normal daemon socket connect sequence
ESTART/ECASSOC/ECOMPLETE
  Normal election sequence. ECASSOC is not required in all cases
ERELINQH
  Election relinquish handle not found. Common trace if the 1st data word (the handle) is 0
FMHBTOUT
  Daemon heartbeat timed out. VKE will kill the daemon PID
ECFTIMER
  Election timer expired. VKE will kill the daemon PID


VKE KDB Commands

One root command with subcommands:
  vke anchor [daemon | cluster | kext | all | root | events | file]
  vke req    [<address> | -h <handle> | pend | act | to | free]
  vke ngdev  [<address> | -u <udid>]
  vke fcb    [<address> | main]
  vke pcb    [<address> | main]
The Build String in the root anchor block shows date/time and build cycle


vSCSI SSP Concepts


SSP LUs are files contained within a shared storage pool
vSCSI maintains a sidecar file for each LU
  Contains metadata needed to manage the virtual disk
  Located in the storage pool so it's accessible to all nodes in the cluster


Mapping Problems
Configuring SSP VTDs is more complicated than other types of VTDs, due to interactions with the SSP database and storage pool.
Check the error log. If there is no vhost error log, then the CLI never got far enough to call mkdev; check the CLI traces.
If the error log has Detail Data like this:

Detail Data
ADDITIONAL INFORMATION
module: ngdisk_init_lun 1.67 rc: 0000000000000016 location: 00000583
data: 13D49BA2AD9F11E08AF76A6C1249000D888100000000000000

there was a problem in the driver:
  module contains the function name (probably ngdisk_init_lun or ngdisk_finish_init) and the source file version number
  location is usually the line number (in decimal)
  rc and data are in hex
This should be enough to determine the nature of the error.
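Because the module/rc/location fields are fixed-label text, they can be pulled out with a regex when triaging many logs. A sketch (the field layout follows the sample above; spacing in real logs may differ):

```python
import re

# Extract module name, file version, rc, and location from an SSP error-log
# Detail Data line such as:
#   module: ngdisk_init_lun 1.67 rc: 0000000000000016 location: 00000583
LINE = re.compile(
    r"module:\s*(?P<module>\w+)\s+(?P<version>[\d.]+)\s+"
    r"rc:\s*(?P<rc>[0-9A-Fa-f]+)\s+location:\s*(?P<location>[0-9A-Fa-f]+)"
)

def parse_detail(line):
    m = LINE.search(line)
    if not m:
        return None
    d = m.groupdict()
    d["rc"] = int(d["rc"], 16)          # rc is in hex
    d["location"] = int(d["location"])  # location is usually decimal
    return d
```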


Mapping Problems (cont)


If the error log has Detail Data like this:

Detail Data
ADDITIONAL INFORMATION
module: add_child rc: 0000000000000016 location: 00001302
data: 21

this indicates that the problem was not with the device driver, but the configuration method. Check the config log:

MS 08061154 8913046 cfg_vt_ngdisk -l vtscsi1
M 08061154 cfg_vtdev_ngdisk.c 94 > Enter configure_device for cfg_vt_ngdisk
M 08061154 cfg_vtdev_ngdisk.c 260 error accessing database for vtscsi1 rc 1.
M 08061154 cfg_vtdev_ngdisk.c 136 ioctl(VSCSI_HOST_ADD_NGDISK) failed. vhost=vhost0, vtdev=vtscsi1, errno=22
M 08061154 cfg_vtdev_ngdisk.c 146 exit with ERROR, rc=33, rc2=47, cfg_failed=0x21. vhost=vhost0, vtdev=vtscsi1

cfg_failed not being 0 confirms that the config method encountered an error, in this case with accessing the database.


Hung I/O
In kdb, you can use the svvtd vhostX command to list information about all VTDs mapped to the specified vhost adapter, including any commands on their active queues:

Command Element at F1000A0400AFC348
  cmd_list.next: F1000A0400AFC348   cmd_list.prev: F1000A0400AFC348
  delay_devstrat.next: 0            delay_devstrat.prev: 0
  working_area: 0                   tag: F1000A0015B91548
  srp_id: 61300                     time0: 1E8116F20683BF
  start_time: 1E8116F20699C6        proc_time1: 1607
  proc_time2: 0                     wait_time: 0
  lua: 8100000000000000             lun: F1000A0400CFBC00
  num_bufs: 1                       iodones_rcvd: 0
  task_attrib: 00  flags: 00  status_qualifier: 00
  status: 00  non_scsi_status: 00  iu_len: 40  resid: 0
  add_len_cdb: 0   sense_size: 0
  CDB 2A000006B09000000804000000000000
[...]

Note the command pointer.


Hung I/O (cont)


KDB(0)> svng cmd F1000A0400AFC348
Kernel Thread Data at F1000A0400A642C0 (tid=0x2AB0059)
  link.next: F1000A0400A642C0  link.prev: F1000A0400A642C0
  sync: FFFFFFFFFFFFFFFF  sleep: FFFFFFFFFFFFFFFF
  state: 0  cmd: F1000A0400AFC348
  prev_cmd: F1000A0400AFC9D8
  cmd.srp_id: 61300  cmd.tag: F1000A0015B91548

The upper portion of the tid is the thread slot number. (Note that 0x2AB = 683.)
KDB(0)> f 683
pvthread+02AB00 STACK:
[000093C4] .unlock_enable_mem+0000B8 ()
[000D6DBC] e_block_thread+00049C ()
[00014F50] .kernel_add_gate_cstack+000030 ()
[F1000000C066EF4C] osCondBlock+00000C ()
[F1000000C066B104] ioWaitPager+000284 (??)
[F1000000C08384F4] cfsDIOWrite+0011F4 (??,??,??,??,??,??,??,??)
[F1000000C083AED4] cfsRioMove+0004F4 (??,??,??,??,??,??)
[F1000000C07DC164] cfsDataWrite+000724 (??,??,??,??,??,??,??)
[F1000000C07BED10] cfsRdwrAttr+0008F0 (??,??,??,??,??,??,??,??)
[005984A8] vnop_rdwr+0001A8 (??,??,??,??,??,??,??,??)
[005B0314] vno_rw+0000B4 (??,??,??,??,??)
[0055EDDC] fp_rwuio+00029C (??,??,??,??)
[00014F50] .kernel_add_gate_cstack+000030 ()
[F1000000C025E180] ngdisk_io_proc+000A00 (??,??)
[F1000000C0267204] ngdisk_thread+0003E4 (??)
[00014D70] .hkey_legacy_gate+00004C ()
[00254234] threadentry+000094 (??,??,??,??)
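The slot extraction above (upper portion of the tid) can be checked quickly: 0x2AB0059 shifted right by 16 bits gives 0x2AB = 683, the value used with the f command. A one-line sketch; the 16-bit split is inferred from this single example, so treat it as an assumption:

```python
# Derive the pvthread slot number from a kdb tid, assuming the slot occupies
# the bits above the low 16 (inferred from the 0x2AB0059 -> 683 example).
def tid_to_slot(tid):
    return tid >> 16
```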


Hung I/O (cont)


You can also view general information, including a list of all active SSP threads, with the svanng command. Note that running this on a live system with active clients can result in an infinite loop traversing the busy queue.

KDB(0)> svanng
vke_handle: 0x4C7  node_id: 0x21B08D4E94411E0B7336A6C1249000D
vke.version: 0x2   pool_q: 0xF1000A0400A64180
Pool at 0xF1000A0400A64180  pool_id: 0x281D39C91558D45E
  use_count: 0x1  err_logged: 0x0
  enospc_time: 0x0  flags: 0x0
  threads.busy_q: F1000A0400A642C0  threads.free_q: F1000A0400A64480
  threads.wait_q: 0  threads.pid: 0x270052
  threads.num_ng_luns: 0x1  threads.num_threads: 0x10
  threads.max_threads: 0x3E8  threads.proc_sleep: 0x3E0081
  threads.cfg_sleep: 0xFFFFFFFFFFFFFFFF  threads.lock: 0x0
  threads.reserved[0]: 0x0  threads.reserved[1]: 0x0
  &threads.reserved: 0xF1000000C029F9C8  threads.flags (0x00000003): NG_PROC_RUNNING|NG_PROC_SLEEPING
Traversing Busy queue:
Kernel Thread Data at F1000A0400A642C0 (tid=0x2AB0059)
  link.next: F1000A0400A642C0  link.prev: F1000A0400A642C0
  sync: FFFFFFFFFFFFFFFF  sleep: FFFFFFFFFFFFFFFF
  state: 0  cmd: F1000A0400AFC348
  prev_cmd: F1000A0400AFC9D8
  cmd.srp_id: 61300  cmd.tag: F1000A0015B91548
Found 1 elements on list


I/O Errors
Any errors attempting to write to the backing SP file will generate an error log
DetailData
ADDITIONALINFORMATION
module:ngdisk_io_proc1.64rc:0000000000000034location:00004563
data:2370BF6E94411E0B7336A6C1249000D21B08D4E94411E0B7336A6C1249000D

Exceptions:

Consecutive errors will generate only one error log entry

Once an I/O succeeds, future errors will generate new error log entries

ENOSPC Errors (rc = 1C)
Can occur on writes to new blocks of thin-provisioned LUs when the SP is full
Occur at most once per day
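The rc values in these error log entries are hexadecimal errno values; a minimal sketch of decoding the documented ENOSPC case, assuming the standard errno numbering that Python's errno module also uses:

```python
import errno

# rc values from SSP error log entries are hexadecimal errno values.
# The documented ENOSPC case reports rc = 1C.
rc = int("1C", 16)

# 0x1C == 28 == ENOSPC ("No space left on device")
assert rc == 28 == errno.ENOSPC
```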

VIOSBR Backup
VIOSBR has been updated to support backup of SSP
VIOSBR backs up the following data


Backup of the cluster information

Backup of all the SSP and Classic Mappings for all the nodes in UP state in the
cluster

Backup of the SSP database


VIOSBR Backup Command


viosbr -backup -clustername clusterName -file FileName [-frequency
daily|weekly|monthly [-numfiles fileCount]]
$ viosbr -backup -clustername ITL_UPT -file test
Backup of node vios163.austin.ibm.com successfull
Backup of node vios164.austin.ibm.com successfull
Backup of node vios165.austin.ibm.com successfull
Backup of this node (vios162.austin.ibm.com) successful
Name of the backup file will be <filename>.<clustername>.tar.gz
$ ls cfgbackups
test.ITL_UPT.tar.gz
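The backup archive name follows the <filename>.<clustername>.tar.gz pattern shown above; a trivial illustrative sketch of composing that name (pure illustration, not part of viosbr):

```python
def backup_file_name(filename, clustername):
    # viosbr names the archive <filename>.<clustername>.tar.gz
    return f"{filename}.{clustername}.tar.gz"

# Matches the "ls cfgbackups" output above
assert backup_file_name("test", "ITL_UPT") == "test.ITL_UPT.tar.gz"
```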


VIOSBR Content of Backup file


ITL_UPTDB
ITL_UPTMTM8233-E8B02061AAFPP2.xml <<< Backups of the nodes; file name format:
ITL_UPTMTM8233-E8B02061AAFPP4.xml <<< <Clustername>MTM<MTM of the Node>P<Partition Number>.xml
ITL_UPTMTM8233-E8B02061AAFPP1.xml
ITL_UPTMTM8233-E8B02061AAFPP3.xml
./ITL_UPTDB:
0e89745a167d11e1a5dcee967c979102
./ITL_UPTDB/0e89745a167d11e1a5dcee967c979102:
DBCOMPLETE
VIOS_FILE.dat
VIOS_SP_THRESHOLD_ALERT.dat
VIOCLUST.xml
VIOS_FILESET.ctr
VIOS_STORAGE_POOL.ctr
VIOS.sql
VIOS_FILESET.dat
VIOS_STORAGE_POOL.dat
VIOS_CAPABILITY.ctr
VIOS_LU.ctr
VIOS_TIER.ctr
VIOS_CAPABILITY.dat
VIOS_LU.dat
VIOS_TIER.dat
VIOS_CLIENTIMAGE.ctr
VIOS_MAP.ctr
VIOS_TRANSACTION.ctr
VIOS_CLIENTIMAGE.dat
VIOS_MAP.dat
VIOS_TRANSACTION.dat
VIOS_CLIENTOP.ctr
VIOS_MASTER_IMAGE.ctr
VIOS_VIOS_CLUSTER.ctr
VIOS_CLIENTOP.dat
VIOS_MASTER_IMAGE.dat
VIOS_VIOS_CLUSTER.dat
VIOS_CLIENT_PARTITION.ctr
VIOS_NODE_VERSION.ctr
VIOS_VIRTUAL_LOG.ctr
VIOS_CLIENT_PARTITION.dat
VIOS_NODE_VERSION.dat
VIOS_VIRTUAL_LOG.dat
VIOS_CLUSTER_VERSION.ctr
VIOS_PARTITION.ctr
VIOS_VIRTUAL_LOG_REPOSITORY.ctr
VIOS_CLUSTER_VERSION.dat
VIOS_PARTITION.dat
VIOS_VIRTUAL_LOG_REPOSITORY.dat
VIOS_DISK.ctr
VIOS_RESOURCE.ctr
VIOS_VIRTUAL_SERVER_ADAPTER.ctr
VIOS_DISK.dat
VIOS_RESOURCE.dat
VIOS_VIRTUAL_SERVER_ADAPTER.dat
VIOS_FILE.ctr
VIOS_SP_THRESHOLD_ALERT.ctr
./ITL_UPTDB/0e89745a167d11e1a5dcee967c979102/DBCOMPLETE:
sol00004.log sol00005.log solid.db
solid.ini solmsg.out <<< SSP Database
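The per-node XML names follow the <Clustername>MTM<MTM of the Node>P<Partition Number>.xml format noted above. A hypothetical parser for that pattern; the regex, and in particular how it splits the MTM/serial portion, are assumptions for illustration:

```python
import re

# <Clustername>MTM<machine type-model/serial>P<partition number>.xml
NODE_XML = re.compile(r"^(?P<cluster>.+?)MTM(?P<mtm>.+)P(?P<part>\d+)\.xml$")

m = NODE_XML.match("ITL_UPTMTM8233-E8B02061AAFPP1.xml")
assert m is not None
assert m.group("cluster") == "ITL_UPT"  # cluster name
assert m.group("part") == "1"           # partition number
```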


VIOSBR - View
VIOSBR has been updated to view SSP backup files
viosbr -view -file FileName -clustername clusterName [-type devType][-detail | -mapping]
Example: viosbr -view -file test.ITL_UPT.tar.gz -clustername ITL_UPT | more
Files in the cluster Backup
===========================
ITL_UPTDB
ITL_UPTMTM8233-E8B02061AAFPP1.xml
ITL_UPTMTM8233-E8B02061AAFPP2.xml
ITL_UPTMTM8233-E8B02061AAFPP3.xml
ITL_UPTMTM8233-E8B02061AAFPP4.xml
===========================
Details in: /home/ios/ITL_UPT.8716414/ITL_UPTMTM8233-E8B02061AAFPP1.xml
===============================================================
Controllers:
============
Name      Phys Loc
----      --------
iscsi0
pager0    U8233.E8B.061AAFP-V1-C32769-L0-L0
vasi0     U8233.E8B.061AAFP-V1-C32769
vbsd0     U8233.E8B.061AAFP-V1-C32769-L0
fcs0      U5877.001.0080617-P1-C1-T1


VIOSBR Scenarios for SSP Restore


1. Repository Disk is corrupted
2. One of the nodes in the SSP is reinstalled
3. SSP Database is corrupted
4. Restore to old configuration on the node (changes done to SSP mappings on the node after the backup)


VIOSBR SSP Restore Scenario 1


VIOSBR does the following to restore SSP

Validates the backup file

Creates the cluster

Adds the nodes to the cluster

Starts the pool

Restores the SSP database from the backup

Starts vio_daemon

Using clcmd, runs restore on all the nodes with the -skipcluster option

Restores all the devices on the initiating node

VIOSBR always validates before restoring

In case of failure, change the configuration if needed and rerun the command

Example: Nodes are added one at a time. If adding a node fails, viosbr returns with an error. Do the following:
Verify whether the node is in the cluster using lscluster -m
Verify the mappings on the node using lsmap -all
Run the restore command again

Limitations


viosbr doesn't restore the network configuration if SSP is configured on the system
Restore the network configuration before restoring SSP on all the nodes using:
viosbr -restore -clustername ITL_UPT -file test.ITL_UPT.tar.gz -type net


VIOSBR SSP Restore command


Command to restore SSP along with the database from the backup:
viosbr -restore -clustername ITL_UPT -file test.ITL_UPT.tar.gz

Command to restore SSP without the SSP database:
viosbr -restore -clustername ITL_UPT -file test.ITL_UPT.tar.gz -currentdb

Command to restore SSP using a new repository disk:
viosbr -restore -clustername ITL_UPT -file test.ITL_UPT.tar.gz -repopvs hdisk45


VIOSBR SSP Restore Scenario 2


VIOSBR does the following to restore an SSP node after reinstall
Validates the backup file and the current node configuration
Restores the RSCT Node ID
Makes sure the backup is for the current cluster
Don't run this option if the node is already part of the cluster
Verify using lscluster and lssrc -ls vio_daemon
Restores the cluster configuration
Restores the classic mappings

Command:
viosbr -restore -file test.ITL_UPT.tar.gz -clustername ITL_UPT -subfile
ITL_UPTMTM8233-E8B02061AAFPP1.xml


VIOSBR SSP Restore Scenario 3


VIOSBR does the following
Validates the backup file
Verifies that all the nodes in the cluster are in DOWN state
Recovers the database either from the backup file or from the daily backup in the SSP

Command to recover database from the backup


viosbr -recoverdb -file test.ITL_UPT.tar.gz -clustername ITL_UPT

Command to recover database from the daily backup in SSP


viosbr -recoverdb -clustername ITL_UPT


VIOSBR SSP Restore Scenario 4


VIOSBR does the following to restore the configuration
Validates the backup
Validates the XML file using the MTM and Partition ID of the node
Validates the SSP
Validates and generates a list of missing SSP VTDs
Restores the VTDs

Command to restore:
viosbr -restore -file test.ITL_UPT.tar.gz -clustername ITL_UPT -subfile
ITL_UPTMTM8233-E8B02061AAFPP1.xml -xmlvtds


Questions?
