
Front cover


IBM Scale Out Network Attached Storage
Architecture, Planning and Implementation Basics

Learn to set up and customize the IBM Scale Out NAS

Details hardware and software architecture

Includes daily administration scenarios

Mary Lovelace
Vincent Boucher
Shradha Nayak
Curtis Neal
Lukasz Razmuk
John Sing
John Tarella

ibm.com/redbooks

International Technical Support Organization

SONAS Architecture and Implementation

November 2010

SG24-7875-00

Note: Before using this information and the product it supports, read the information in “Notices” on
page xiii.

First Edition (November 2010)

This edition applies to IBM Scale Out Network Attached Storage V1.1.1.

This document was created or updated on November 1, 2010.

© Copyright International Business Machines Corporation 2010. All rights reserved.


Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.

Contents

Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii

Chapter 1. Introduction to Scale Out Network Attached Storage . . . . . . . . . . . . . . . . . 1


1.1 Marketplace requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Understanding I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 File I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Block I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Network Attached Storage (NAS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Scale Out Network Attached Storage (SONAS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 SONAS architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 SONAS scale out capability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 SONAS software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.4 High availability design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 SONAS architectural concepts and principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.1 Create, write, and read files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.2 Creating and writing a file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.3 Scale out more performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.4 Reading a file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4.5 Scale out parallelism and high concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.4.6 Manage storage centrally and automatically. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4.7 SONAS logical storage pools for tiered storage . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4.8 SONAS Software central policy engine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4.9 High performance SONAS scan engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.4.10 High performance physical data movement for ILM / HSM. . . . . . . . . . . . . . . . . 31
1.4.11 Hierarchical storage management, backup/restore to external storage . . . . . . . 33
1.4.12 Requirements for high performance external HSM and backup restore . . . . . . . 34
1.4.13 SONAS high performance HSM using Tivoli Storage Manager . . . . . . . . . . . . . 34
1.4.14 SONAS high performance backup/restore using Tivoli Storage Manager . . . . . 35
1.4.15 SONAS and Tivoli Storage Manager integration in more detail . . . . . . . . . . . . . 36
1.4.16 Summary - lifecycle of a file using SONAS Software . . . . . . . . . . . . . . . . . . . . . 39
1.4.17 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Chapter 2. Hardware architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


2.1 Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.1.1 Interface nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1.2 Storage nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.1.3 Management nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.1 Internal Infiniband switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.2 Internal private Ethernet switch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.3 External Ethernet switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2.4 External ports - 1 GbE / 10 GbE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.3 Storage pods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50


2.3.1 SONAS storage controller. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51


2.3.2 SONAS storage expansion unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.4 Connection between components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.4.1 Interface node connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.4.2 Storage node connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4.3 Management node connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.4.4 Internal POD connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.4.5 Data Infiniband network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.4.6 Management ethernet network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.4.7 Connection to the external customer network. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.5 Different SONAS configurations available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.5.1 Rack types - how to choose the correct rack for your solution . . . . . . . . . . . . . . 61
2.5.2 Drive types - how to choose between different drive options . . . . . . . . . . . . . . . . 66
2.5.3 External ports - 1 GbE / 10 GbE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.6 SONAS with XIV Storage overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.6.1 Differences between SONAS with XIV and standard SONAS system . . . . . . . . . 68
2.6.2 SONAS with XIV configuration overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.6.3 SONAS base rack configuration when used with XIV storage . . . . . . . . . . . . . . . 70
2.6.4 SONAS with XIV configuration and component considerations . . . . . . . . . . . . . . 70

Chapter 3. Software architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73


3.1 SONAS Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.2 SONAS data access layer - file access protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2.1 File export protocols: CIFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.2 File export protocols: NFS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.3 File export protocols: FTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.4 File export protocols: HTTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.5 SONAS Locks and Oplocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.3 SONAS Cluster Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.3.1 Introduction to the SONAS Cluster Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3.2 Principles of SONAS workload allocation to interface nodes . . . . . . . . . . . . . . . . 81
3.3.3 Principles of interface node failover and failback . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.3.4 Principles of storage node failover and failback . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.3.5 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.3.6 SONAS Cluster Manager manages multi-platform concurrent file access . . . . . . 86
3.3.7 Distributed metadata manager for concurrent access and locking . . . . . . . . . . . . 88
3.3.8 SONAS Cluster Manager components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4 SONAS authentication and authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.4.1 SONAS authentication concepts and flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.5 Data repository layer - SONAS file system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.5.1 SONAS file system scalability and maximum sizes . . . . . . . . . . . . . . . . . . . . . . . 97
3.5.2 Introduction to SONAS File System parallel clustered architecture . . . . . . . . . . . 97
3.5.3 SONAS File system performance and scalability . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.6 SONAS data management services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.6.1 SONAS - Using the central policy engine and automatic tiered storage. . . . . . . 107
3.6.2 Using and configuring Tivoli Storage Manager HSM with SONAS basics . . . . . 111
3.7 SONAS resiliency using snapshots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.7.1 Integration with Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.8 SONAS resiliency using asynchronous replication . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.9 SONAS and Tivoli Storage Manager integration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.9.1 General TSM and SONAS guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.9.2 Basic SONAS to TSM setup procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.9.3 TSM software licensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123


3.9.4 How to protect SONAS files without TSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124


3.10 SONAS system management services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.10.1 Management GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
3.10.2 Health Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.10.3 Command Line Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
3.10.4 External notifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.11 Grouping concepts in SONAS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.11.1 Node grouping and TSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.11.2 Node grouping and async replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.12 Summary - SONAS Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.12.1 SONAS goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

Chapter 4. Networking considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139


4.1 Review of network attached storage concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.1.1 File systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.1.2 Redirecting I/O over the network to a NAS device . . . . . . . . . . . . . . . . . . . . . . . 140
4.1.3 Network file system protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.1.4 Domain Name Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.1.5 Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.2 Domain Name Server as used by SONAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.2.1 Domain Name Server configuration recommendations . . . . . . . . . . . . . . . . . . . 144
4.2.2 Domain Name Server balances incoming workload . . . . . . . . . . . . . . . . . . . . . . 145
4.2.3 Interface node failover / failback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.3 Bonding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.3.1 Bonding modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.3.2 Monitoring bonded ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.4 Network groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.5 Implementation networking considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.5.1 Network interface names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.5.2 Virtual Local Area Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.5.3 IP address ranges for internal connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.5.4 Use of Network Address Translation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.5.5 Management node as NTP server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.5.6 Maximum Transmission Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.5.7 Considerations and restrictions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.6 The impact of network latency on throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

Chapter 5. SONAS policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157


5.1 Creating and managing policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.1.1 The SCAN engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.2 SONAS CLI policy commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.3 SONAS policy best practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.3.1 Cron jobs considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.3.2 Policy rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
5.3.3 Peered policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.3.4 Tiered policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.3.5 HSM policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.3.6 Policy triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.3.7 Weight expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.3.8 Migration filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.3.9 General considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.4 Policy creation and execution walkthrough . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.4.1 Create storage pool using the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168


5.4.2 Create storage pool using the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170


5.4.3 Create and apply policies using the GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.4.4 Create and apply policies using the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.4.5 Testing policy execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Chapter 6. Backup and recovery, availability and resiliency functions . . . . . . . . . . . 177


6.1 High Availability and protection in base SONAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.1.1 Cluster Trivial Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.1.2 DNS performs IP address resolution and load balancing . . . . . . . . . . . . . . . . . . 180
6.1.3 File sharing protocol error recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.2 Backup and restore of file data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.2.1 Tivoli Storage Manager terminology and operational overview . . . . . . . . . . . . . 181
6.2.2 Methods to back up a SONAS cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.2.3 TSM client and server considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.2.4 Configuring interface nodes for TSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.2.5 Performing TSM backup and restore operations . . . . . . . . . . . . . . . . . . . . . . . . 184
6.2.6 Using TSM HSM client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.3 Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.3.1 Snapshot considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.3.2 VSS snapshot integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.3.3 Snapshot creation and management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.4 Local and remote replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.4.1 Synchronous versus asynchronous replication. . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.4.2 Block level versus file level replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
6.4.3 SONAS cluster replication. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
6.4.4 Local synchronous replication. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.4.5 Remote async replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.5 Disaster recovery methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.5.1 Backup of SONAS configuration information . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
6.5.2 Restore data from a traditional backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
6.5.3 Restore data from a remote replica. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

Chapter 7. Configuration and sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215


7.1 Tradeoffs between configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.1.1 Rack configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.1.2 Switch configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.1.3 Storage Pod configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.1.4 Interface Node configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7.1.5 Rack configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7.2 Considerations for sizing your configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
7.3 Inputs for SONAS sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.3.1 Application characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.3.2 Workload characteristics definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.3.3 Workload characteristics impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
7.3.4 Workload Characteristics measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
7.4 Powers of two and powers of ten: the missing space . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.5 Sizing the SONAS appliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.5.1 Capacity requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
7.5.2 Storage Subsystem disk type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
7.5.3 Interface node connectivity and memory configuration. . . . . . . . . . . . . . . . . . . . 237
7.6 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.6.1 Workload analyzer tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

Chapter 8. Installation planning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243


8.1 Physical planning considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244


8.2 Installation checklist questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
8.3 Storage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
8.3.1 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
8.3.2 Asynch replication considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
8.3.3 Block size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
8.3.4 File system overhead and characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.3.5 SONAS master file system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.3.6 Failure groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
8.3.7 Setting up storage pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
8.4 SONAS integration into your network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
8.4.1 Authentication using AD or LDAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
8.4.2 Planning IP addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.4.3 Data access and IP address balancing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.5 Attachment to customer applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
8.5.1 Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
8.5.2 Share access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
8.5.3 Caveats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
8.5.4 Backup considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

Chapter 9. Installation and configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273


9.1 Pre-Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
9.2 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
9.2.1 Hardware installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
9.2.2 Software installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
9.2.3 Check health of the Node hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
9.2.4 Additional hardware health checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
9.3 Post Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
9.4 Software Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
9.5 Sample environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
9.5.1 Initial hardware installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
9.5.2 Initial software configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
9.5.3 Understanding the IP Addresses for Internal Networking . . . . . . . . . . . . . . . . . . 286
9.5.4 Configure the Cluster Manager (CTDB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.5.5 List all available Disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
9.5.6 Adding a second Failure Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
9.5.7 Create the GPFS File System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
9.5.8 Configure the DNS Server IP addresses and domains. . . . . . . . . . . . . . . . . . . . 290
9.5.9 Configure the NAT Gateway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
9.5.10 Configure Authentication - AD and LDAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
9.5.11 Configure Data Path IP Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
9.5.12 Configure Data Path IP Address Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.5.13 Attach the Data Path IP Address Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.6 Creating Exports for data access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.7 Modify ACLs to the shared export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
9.8 Test access to the SONAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

Chapter 10. SONAS administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305


10.1 Using the management interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.1.1 GUI tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.1.2 Accessing the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
10.2 SONAS administrator tasks list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
10.3 Cluster Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343


10.3.1 Add/Delete cluster to the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343


10.3.2 View Cluster status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
10.3.3 View Interface Node and Storage Node Status . . . . . . . . . . . . . . . . . . . . . . . . 343
10.3.4 Modify Interface and Storage Nodes status . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
10.4 File system management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
10.4.1 Create a File system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
10.4.2 List the Filesystem status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
10.4.3 Mount the File system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
10.4.4 Unmount the File system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
10.4.5 Modify the File system configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
10.4.6 Delete File system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
10.4.7 Master and Non-Master file system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
10.4.8 Quota Management for File systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
10.4.9 Fileset management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
10.5 Creating and managing exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
10.5.1 Create Exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
10.5.2 List and view status of exports created . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
10.5.3 Modify exports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
10.5.4 Remove service/protocols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
10.5.5 Activate Exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
10.5.6 Deactivate Exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
10.5.7 Remove Exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
10.5.8 Test accessing the Exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
10.6 Disk Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
10.6.1 List Disks and View Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
10.6.2 Change Properties of disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
10.6.3 Start Disks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
10.6.4 Remove Disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
10.7 User management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
10.7.1 SONAS administrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
10.7.2 SONAS end users. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
10.8 Services Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
10.8.1 Management Service administration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
10.8.2 Manage Services on the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
10.9 Real-time and historical reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
10.9.1 System Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
10.9.2 File System Utilization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
10.9.3 Utilization Thresholds and Notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
10.10 Scheduling tasks in SONAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
10.10.1 List tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
10.10.2 Remove task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
10.10.3 Modify the Schedule Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
10.11 Health Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
10.11.1 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
10.11.2 Default Grid view. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
10.11.3 Event logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
10.12 Call home . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425

Chapter 11. Migration overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427


11.1 SONAS file system authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
11.1.1 SONAS file system ACLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
11.1.2 File sharing protocols in SONAS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
11.1.3 Windows CIFS and SONAS considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . 430


11.2 Migrating files and directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431


11.2.1 Data migration considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
11.2.2 Metadata migration considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
11.2.3 Migration tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
11.3 Migration of CIFS Shares and NFS Exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
11.4 Migration considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
11.4.1 Migration data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
11.4.2 Types of migration approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
11.4.3 Sample throughput estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
11.4.4 Migration throughput example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439

Chapter 12. Getting started with SONAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441


12.1 Quick start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
12.1.1 Quick start tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
12.2 Connecting to the SONAS system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
12.2.1 Connect to SONAS appliance using GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
12.2.2 Connect to SONAS appliance using CLI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
12.3 Create SONAS administrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
12.3.1 Creating a SONAS administrator using the CLI . . . . . . . . . . . . . . . . . . . . . . . . 444
12.3.2 Creating a SONAS administrator using the GUI . . . . . . . . . . . . . . . . . . . . . . . . 444
12.4 Monitoring your SONAS environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
12.4.1 Topology view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
12.4.2 SONAS Logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
12.4.3 Performance and reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
12.4.4 Threshold monitoring and notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
12.5 Create a filesystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
12.5.1 Creating a filesystem using the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
12.5.2 Creating a filesystem using the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
12.6 Creating an export. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
12.6.1 Configuring exports using the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
12.6.2 Configuring exports using the CLI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
12.7 Accessing an export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
12.7.1 Accessing a CIFS share from Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
12.7.2 Accessing a CIFS share from Windows command prompt. . . . . . . . . . . . . . . . 462
12.7.3 Accessing a NFS share from Linux. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
12.8 Creating and using snapshots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
12.8.1 Creating snapshots with the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
12.8.2 Creating snapshots with the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
12.8.3 Accessing and using snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
12.9 Backing up and restoring data with TSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466

Chapter 13. Hints, tips and how to information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469


13.1 What to do when you receive an error message. . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
13.1.1 EFSSG0026I management service stopped. . . . . . . . . . . . . . . . . . . . . . . . . . . 470
13.2 Debugging SONAS with Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
13.2.1 CTDB Health Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
13.2.2 GPFS Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
13.2.3 CTDB Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
13.2.4 Samba/Winbind Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
13.3 CTDB Unhealthy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
13.3.1 CTDB manages Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
13.3.2 Master file system umounted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
13.3.3 CTDB manages GPFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472


13.3.4 GPFS unable to mount . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473

Appendix A. Additional component detail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475


CTDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
File system concepts and access permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
GPFS overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
GPFS Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
GPFS File Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
GPFS Performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
GPFS High Availability solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
GPFS failure group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Other GPFS features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Tivoli Storage Manager (TSM) overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Tivoli Storage Manager concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Tivoli Storage Manager architectural overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
Tivoli Storage Manager storage management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
Policy management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Hierarchical storage management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512

Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517


IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
How to get Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that does
not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.


Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AFS® GPFS™ System i®
AIX® HACMP™ System p5®
BladeCenter® IBM® System Storage®
DB2® Lotus® System x®
Domino® PowerVM™ Tivoli®
Enterprise Storage Server® pSeries® XIV®
eServer™ Redbooks® xSeries®
FlashCopy® Redbooks (logo) ® z/OS®

The following terms are trademarks of other companies:

Snapshot, and the Network Appliance logo are trademarks or registered trademarks of Network Appliance,
Inc. in the U.S. and other countries.

Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other
countries, or both.

Microsoft, Windows NT, Windows, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.

Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.


Preface

IBM® Scale Out Network Attached Storage (SONAS) is a Scale Out NAS offering designed
to manage vast repositories of information in enterprise environments requiring very large
capacities, high levels of performance, and high availability.

The IBM Scale Out Network Attached Storage provides a range of reliable, scalable storage
solutions for a variety of storage requirements. These capabilities are achieved by using
network access protocols such as NFS, CIFS, HTTP, and FTP. Utilizing built-in RAID
technologies, all data is well protected, with options to add further protection through
mirroring, replication, snapshots, and backup. These storage systems are also characterized
by simple management interfaces that make installation, administration, and troubleshooting
uncomplicated and straightforward.

This book provides the reader with details of the hardware and software architecture that
make up the SONAS appliance, along with configuration, sizing, and performance
considerations. It provides information about integrating SONAS into an existing network.
Administration of the SONAS appliance through the GUI and CLI is demonstrated, as well as
backup and availability scenarios. A quick start scenario takes you through common SONAS
administration tasks to familiarize you with the SONAS system.

The team who wrote this book


This book was produced by a team of specialists from around the world working at the
International Technical Support Organization, San Jose Center.

Mary Lovelace is a Consulting IT Specialist at the International Technical Support
Organization. She has more than 20 years of experience with IBM in large systems, storage,
and storage networking product education, system engineering and consultancy, and
systems support. She has written many Redbooks® publications about IBM Tivoli® Storage
Productivity Center, Tivoli Storage Manager, and z/OS® storage products.

Vincent Boucher is an IT Specialist and a member of the EMEA Products and Solutions
Support Center (PSSC) in Montpellier, France. His role within the Storage Benchmark team is
to demonstrate the efficiency of IBM solutions and their added value to customers. He holds
an Engineering degree in Mathematics and Computer Science from the ENSEEIHT
Engineering school in Toulouse. Vincent's areas of expertise include Linux®, IBM System x,
mid-range IBM Storage, and GPFS™, drawing on both his past High Performance Computing
and current Storage benchmark experience.

Shradha Nayak is a staff software engineer working with IBM India Software Labs in Pune,
India. She holds a Bachelor of Computer Science Engineering degree and has around 6.5
years of experience. She has worked in the storage domain since then and has good
expertise in Scale out File Services (SoFS) and Scale Out Network Attached Storage
(SONAS). Prior to this, she worked as a Level-3 developer for Distributed File Service (DFS)
and also worked on AFS®. Shradha focuses on storage products and cloud storage and is
currently part of the Level-3 developer team for SONAS. As part of the SONAS development
and testing team, she has developed a thorough knowledge of SONAS, its components, and
how it works. In this book, she has mainly focused on the installation, configuration, and
administration of SONAS. Shradha is also interested in social media and social networking
tools and methodologies.


Curtis Neal is an Executive IT Specialist working for the System Storage® Group in San
Jose, California. He has over 25 years of experience in various technical capacities, including
mainframe and open system test, design, and implementation. For the past eight years, he
has led the Open Storage Competency Center, which helps customers and IBM Business
Partners with the planning, demonstration, and integration of IBM System Storage Solutions.

Lukasz Razmuk is an IT Architect at IBM Global Technology Services in Warsaw, Poland.
He has six years of IBM experience in designing, implementing, and supporting solutions in
AIX®, Linux, pSeries®, virtualization, high availability, GPFS, Storage Area Networks (SAN),
storage for open systems, and IBM Tivoli Storage Manager. He also acts as a Technical
Account Advocate for Polish clients. He holds a Master of Science degree in Information
Technology from the Polish-Japanese Institute of Information Technology in Warsaw, as well
as many technical certifications, including IBM Certified Advanced Technical Expert System p5®,
IBM Certified Technical Expert pSeries HACMP™, Virtualization Technical Support, and
Enterprise Technical Support AIX 5.3.

John Sing is an Executive IT Consultant with IBM Systems and Technology
Group. John has specialties in large Scale Out NAS, in IT Strategy and Planning, and in IT
High Availability and Business Continuity. Since 2001, John has been an integral member of
the IBM Systems and Storage worldwide planning and support organizations. He started in
the Storage area in 1994 while on assignment to IBM Hong Kong (S.A.R. of China), and IBM
China. In 1998, John joined the Enterprise Storage Server® Planning team for PPRC, XRC,
and FlashCopy®. He has been the marketing manager for these products, and in 2002,
began working in Business Continuity and IT Strategy and Planning. Since 2009, John has
also added focus on IT Competitive Advantage strategy including Scale Out NAS and Cloud
Storage. John is the author of three ITSO Redbooks on these topics, and in 2007, celebrated
his 25th anniversary of joining IBM.

John Tarella is a Senior Consulting IT Specialist who works for IBM Global Services in Italy.
He has twenty-five years of experience in storage and performance management in
mainframe and distributed environments. He holds a degree in Seismic Structural Engineering
from Politecnico di Milano, Italy. His areas of expertise include IBM Tivoli Storage Manager
and storage infrastructure consulting, design, implementation services, open systems
storage, and storage performance monitoring and tuning. He is presently focusing on storage
solutions for business continuity, information lifecycle management, and infrastructure
simplification. He has written extensively on z/OS DFSMS, IBM Tivoli Storage Manager,
SANs, storage business continuity solutions, content management and ILM solutions. John is
currently focusing on cloud storage delivery. He also has an interest in Web2.0 and social
networking tools and methodologies.


Figure 0-1 The team, from left: Curtis, Lukasz, John Tarella, Mary, Vincent, Shradha, John Sing

Thanks to the following people for their contributions to this project:

Sven Oehme
Mark Taylor
Alexander Saupp
Mathias Dietz
Jason Auvenshine
Greg Kishi
Scott Fadden
Leonard Degallado
Todd Neville
Warren Saltzman
Wen Moy
Tom Beglin
Adam Childers
Frank Sowin
Pratap Banthia
Dean Hanson
Everett Bennally
Ronnie Sahlberg
Christian Ambach
Andreas Luengen
Bernd Baeuml


Now you can become a published author, too!


Here's an opportunity to spotlight your skills, grow your career, and become a published
author - all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.

Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html

Comments welcome
Your comments are important to us!

We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
򐂰 Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
򐂰 Send your comments in an e-mail to:
redbooks@us.ibm.com
򐂰 Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400


Chapter 1. Introduction to Scale Out File


Network Attached Storage
SONAS is designed to address the new storage challenges posed by the continuing
explosion of data. Leveraging mature technology from IBM’s High Performance Computing
experience, and based upon IBM’s General Parallel File System (GPFS), SONAS is an
easy-to-install, turnkey, modular, scale out NAS solution that provides the performance,
clustered scalability, high availability and functionality that are essential to meeting strategic
Petabyte Age and cloud storage requirements.

The high-density, high-performance SONAS can help organizations consolidate and manage
data affordably, reduce crowded floor space, and reduce management expense associated
with administering an excessive number of disparate storage systems. With its advanced
architecture, SONAS virtualizes and consolidates multiple filers into a single, enterprise-wide
file system, which can translate into reduced total cost of ownership, reduced capital
expenditure and enhanced operational efficiency.


1.1 Marketplace requirements


There are several factors driving the need for a new way of looking at information and the way
we make decisions based on that information. The changes in our world today, the
instrumentation, interconnectedness, and intelligence of our environments, are producing a
massive amount of new information, from new sources, with new needs to leverage it.
This exacerbates some of the challenges that we have been dealing with for a while now, just
on a whole new scale.

There is an explosion in the amount of data, of course, but also shifts in the nature of data
(see Figure 1-1 on page 2). Once, virtually all the information available to be processed was
authored by someone. Now that kind of data is being overwhelmed by machine-generated
data streaming out of sensors, RFID tags, meters, microphones, surveillance systems, GPS
systems, and all manner of animate and inanimate objects. With this expansion of the sources
of information comes large variance in the quality of the available data, which is often noisy
and error-prone, with no time to cleanse it in a world of real-time decision making.

Also, consider that today’s economic times require corporations and governments to analyze
new information faster and make timely decisions for achieving business goals. As the
volume, variety and velocity of information and decision making increases, this places a
larger burden on organizations to effectively and efficiently distribute the right information, at
the right time, to the people, processes and applications that are reliant upon that information
to make better business decisions.

All of these situations create challenges, but they also provide an excellent opportunity for
driving an information-led transformation.

Figure 1-1 Explosion of data demands an information-led transformation


Today’s businesses are demanding the ability to create, manage, retrieve, protect and share
business and social digital content or large rich media files over a broadband Internet that
reaches to every corner of the globe (see Figure 1-2 on page 3). Users are creating and
using data that is redefining our business and social world in real time. Unlike traditional IT
data, this rich digital content is almost entirely file-based or object-based, and it is growing
ever larger in size, with highly diverse and unpredictable usage patterns.

Figure 1-2 Today’s workloads demand a new approach to data access

Innovative applications in business analytics, digital media, medical data and cloud storage
are creating requirements for data access rates and response times to individual files that
were previously unique to high-performance computing environments—and all of this is
driving a continuing explosion of business data.
While many factors are contributing to data growth, these trends are significant contributors:
򐂰 Digital representation of physical systems and processes
򐂰 Capture of digital content from physical systems and sources
򐂰 Deliveries of digital content to a global population
Additional trends are driven by the following kinds of applications:
• Product Life Cycle Management (PLM) systems, which include Product Data Management
systems and mechanical, electronic, and software design automation
• Service Life Cycle Management (SLM) systems
• Information Life Cycle Management (ILM), including e-mail archiving
• Video on demand: Online, broadcast, and cable
• Digital Video Surveillance (DVS): Government and commercial
• Video animation rendering
• Seismic modeling and reservoir analysis
• Pharmaceutical design and drug analysis
• Digital health care systems


• Web 2.0 and service-oriented architecture

When it comes to traditional IT workloads, traditional storage will continue to excel for the
applications for which it was designed. But solutions like traditional Network Attached
Storage (NAS) were not intended to scale to the high levels and extremely challenging
workload characteristics required by today’s Internet-driven, Petabyte Age applications.

1.2 Understanding I/O


A major source of confusion regarding NAS is the concept of File I/O versus Block I/O.
Understanding the difference between these two forms of data access is crucial to realizing
the potential benefits of any SAN-based or NAS-based solution.

1.2.1 File I/O


When a partition on a hard drive is under the control of an operating system (OS), the OS will
format it. Formatting of the partition occurs when the OS lays a file system structure on the
partition. This file system is what enables the OS to keep track of where it stores data. The file
system is an addressing scheme the OS uses to map data on the partition. Now, when you
want to get to a piece of data on that partition, you must request the data from the OS that
controls it.

For example, suppose that Windows® 2000 formats a partition (or drive) and maps that
partition to your system. Every time you request to open data on that partition, your request is
processed by Windows 2000. Since there is a file system on the partition, it is accessed via
File I/O. Additionally, you cannot request access to just the last 10 KB of a file. You must
open the entire file, which is another reason that this method is referred to as File I/O.

Using File I/O is like using an accountant. Accountants are good at keeping up with your
money for you, but they charge you for that service. For your personal checkbook, you
probably want to avoid that cost. On the other hand, for a corporation where many different
kinds of requests are made, an accountant is a good idea. That way, checks are not written
when they should not be.

A file I/O specifies the file. It also indicates an offset into the file (see Figure 1-3 on page 5).
For instance, the I/O may specify “Go to byte ‘1000’ in the file (as if the file were a set of
contiguous bytes), and read the next 256 bytes beginning at that position.” Unlike block I/O,
there is no awareness of a disk volume or disk sectors in a file I/O request. Inside the NAS
appliance, the operating system keeps track of where files are located on disk. It is the NAS
OS which issues a block I/O request to the disks to fulfill the client file I/O read and write
requests it receives.
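
To make the file I/O model concrete, here is a minimal sketch in Python. The path is
hypothetical and assumed to sit on an NFS- or CIFS-mounted NAS directory; the NAS
appliance, not the client, performs the underlying block I/O.

# Ask the file system for a named file, seek to byte 1000, and read the
# next 256 bytes from that position. The offset is relative to the file,
# not to any disk volume or sector.
with open("/mnt/nas/web/important_big_spreadsheet.xls", "rb") as f:
    f.seek(1000)          # byte offset into the file
    chunk = f.read(256)   # read the next 256 bytes
print(len(chunk), "bytes read")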

By default, a database application that is accessing a remote file located on a NAS device is
configured to run with File System I/O. It cannot utilize raw I/O to achieve improved
performance.


Figure 1-3 File I/O

1.2.2 Block I/O


Block I/O (raw disk) is handled differently (see Figure 1-4 on page 5). There is no OS format
done to lay out a file system on the partition. The addressing scheme that keeps up with
where data is stored is provided by the application using the partition.

An example of this would be DB2® using its tables to keep track of where data is located
rather than letting the OS do that job. That is not to say that DB2 cannot use the OS to keep
track of where files are stored. It is just more efficient for the database to bypass the cost of
requesting the OS to do that work.

Figure 1-4 Block I/O
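
As a contrast, the following is a minimal Python sketch of block I/O under stated assumptions:
/dev/sdb is a hypothetical raw partition with no file system on it, the process has permission
to read it, and the application (like a database) keeps its own map of what is stored where.

import os

DEVICE = "/dev/sdb"   # hypothetical raw partition, no file system on it
SECTOR = 512          # assumed sector size

fd = os.open(DEVICE, os.O_RDONLY)
try:
    # The application addresses the device directly by sector offset;
    # there is no file name and no OS-maintained file layout involved.
    os.lseek(fd, 2048 * SECTOR, os.SEEK_SET)
    data = os.read(fd, 8 * SECTOR)    # read 8 sectors (4 KB)
    print(len(data), "bytes read from the raw device")
finally:
    os.close(fd)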

When sharing files across a network, something needs to control when writes can be done. The
operating system fills this role. It does not allow multiple writes at the same time, even
though many write requests are made. Databases are able to control this writing function on
their own, so in general they run faster by skipping the OS, although this depends on the
efficiency of the file system and database implementations.

1.2.3 Network Attached Storage (NAS)


Storage systems which optimize the concept of file sharing across the network have come to
be known as NAS. NAS solutions utilize the mature Ethernet IP network technology of the
LAN. Data is sent to and from NAS devices over the LAN using TCP/IP protocol.

One of the key differences in a NAS appliance, compared to direct attached storage (DAS) or
other network storage solutions such as SAN or iSCSI, is that all client I/O operations to the
NAS use file level I/O protocols. File I/O is a high level type of request that, in essence,
specifies only the file to be accessed, but does not directly address the storage device. This is
done later by other operating system functions in the remote NAS appliance.

By making storage systems LAN addressable, the storage is freed from its direct attachment
to a specific server, and any-to-any connectivity is facilitated using the LAN fabric. In
principle, any user running any operating system can access files on the remote storage
device. This is done by means of a common network access protocol—for example, NFS for
UNIX® servers and CIFS for Windows servers. In addition, a task such as backup to tape can
be performed across the LAN using software like Tivoli Storage Manager (TSM), enabling
sharing of expensive hardware resources (for example, automated tape libraries) between
multiple servers.

NAS file system access and administration


Network access methods like NFS and CIFS can handle only file I/O requests to the remote
file system, which is located in the operating system of the NAS device. I/O requests are
packaged by the initiator into TCP/IP protocols to move across the IP network. The remote
NAS file system converts the request to block I/O and reads or writes the data to the NAS
disk storage. To return data to the requesting client application, the NAS appliance software
repackages the data in TCP/IP protocols to move it back across the network.

A storage device cannot just attach to a LAN. It needs intelligence to manage the transfer and
the organization of data on the device. The intelligence is provided by a dedicated server to
which the common storage is attached. It is important to understand this concept. NAS
comprises a server, an operating system, and storage which is shared across the network by
many other servers and clients. So a NAS is a specialized server or appliance, rather than a
network infrastructure, and shared storage is attached to the NAS server.

However, traditional NAS filers do not scale to high capacities. When one filer was fully utilized, a
second, third, and more filers were installed. As a result, administrators found themselves
managing 'silos' of filers. Capacity on individual filers could not be shared. Some filers
were heavily accessed while others were mostly idle.

Managing the many different filers adds complexity to the administrator's job. While we can add
more storage capacity to some filers, we cannot add more performance to a particular file
than what is possible with the single disk drive or controller that the filer typically uses. In
other words, there is limited parallelism (typically one, two, or a few controllers) for
serving an individual file. Figure 1-5 on page 7 is a summary of traditional NAS limitations.


Figure 1-5 Network Attached Storage limitations

This is compounded by the fact that at hundreds of terabytes or more, conventional backup of such a
large storage farm is difficult, if not impossible. Even with incremental-only backup,
scanning hundreds of terabytes simply to identify the changed
files or changed blocks can itself take too long and impose too much overhead.

A further issue is that there may be no way to centrally and automatically apply file
placement, migration, deletion, and management policies from one centrally managed,
centrally deployed control point. Manual management of tens or hundreds of filers has
proven to be neither timely nor cost-effective, and effectively prohibits any feasible way to
globally implement automated tiered storage.

1.3 Scale Out Network Attached Storage (SONAS)


IBM Scale Out Network Attached Storage (SONAS) is designed to address the new storage
challenges posed by the continuing explosion of data. IBM recognizes that a critical
component of future enterprise storage is a scale-out architecture that takes advantage of
industry trends to create a truly efficient and responsive storage environment, eliminating the
waste created by the proliferation of scale-up systems and providing a platform for file server
consolidation—and that is where SONAS comes in (see Figure 1-6 on page 8).

Leveraging mature technology from IBM’s High Performance Computing experience, and
based upon IBM’s flagship General Parallel File System (GPFS), SONAS is an
easy-to-install, turnkey, modular, scale out NAS solution that provides the performance,
clustered scalability, high availability and functionality that are essential to meeting strategic
Petabyte Age and cloud storage requirements.

Simply put, SONAS is a scale-out storage system that combines high-speed interface nodes,
storage capacity, and GPFS, enabling organizations to scale
performance alongside capacity in an integrated, highly available system. The high-density,
high-performance SONAS can help your organization consolidate and manage data
affordably, reduce crowded floor space, and reduce the management expense associated with
administering an excessive number of disparate storage systems.

Figure 1-6 IBM SONAS overview

1.3.1 SONAS architecture


The SONAS system is available in as small a configuration as 20 terabytes (TB) usable in the
base rack, up to a maximum of 30 interface nodes and 60 storage nodes in 30 storage pods.
The storage pods fit into 15 storage expansion racks. The 60 storage nodes can contain a
total of 7200 hard-disk drives when fully configured and you are using 96-port InfiniBand
switches in the base rack. With its advanced architecture, SONAS virtualizes and
consolidates multiple filers into a single, enterprise-wide file system, which can translate into
reduced total cost of ownership, reduced capital expenditure, and enhanced operational
efficiency. Figure 1-7 on page 9 provides a high level overview of the SONAS architecture.


Figure 1-7 SONAS architecture

Assuming 2 TB SATA disk drives, such a system has 14.4 petabytes (PB) of raw storage and
billions of files in a single large file system. You can have as few as eight file systems in a fully
configured 14.4 PB SONAS system or as many as 256 file systems. It provides automated
policy-based file management that controls backups and restores, snapshots, and remote
replication. It also provides:
– A single global namespace with logical paths that do not change because of physical
data movement
– Support for Serial Attached SCSI (SAS) and Serial Advanced Technology Attachment
(SATA) drives
– High-availability and load-balancing
– Centralized management
– Centralized backup
– An interconnected cluster of file-serving and network-interfacing nodes in a redundant
high-speed data network
– Virtually no capacity limits
– Virtually no scalability limits
– IBM Call Home trouble reporting and IBM Tivoli Assist On Site (AOS) remote support
capabilities
– Enhanced support for your Tivoli Storage Manager Server product, with a preinstalled
Tivoli Storage Manager client
– Support for the cloud environment. A controlled set of end users, projects, and
applications can:
• Share files with other users within one or more file spaces
• Control access to their files using access control lists (Microsoft® Windows clients)
and user groups


• Manage each file space with a browser-based tool

Global namespace
SONAS provides a global namespace that enables your storage infrastructure to scale to
extreme amounts of data, from terabytes to petabytes. Within the solution, centralized
management, provisioning, control, and automated information life-cycle management (ILM)
are integrated as standard features to provide the foundation for a truly cloud storage enabled
solution.

Interface nodes
The high-performance interface nodes provide connectivity to your Internet Protocol (IP)
network for file access and support both 1-gigabit Ethernet (GbE) and 10-GbE connection
speeds. Each interface node can connect to the IP network with up to eight separate
data-path connections. Performance and bandwidth scalability are achieved by adding
interface nodes, up to the maximum of 30 nodes, each of which has access to all files in all
file systems.

You can scale out to thirty interface nodes. Each interface node has its own cache memory,
so adding an interface node increases the caching memory and data paths available for
file serving. Of course, you also increase file-serving processor capacity. If raw storage
capacity is the prime constraint in the current system, the SONAS system scales out to as
much as 14.4 petabytes (PBs) with 2 terabyte (TB) SATA drives, with up to 256 file systems
that can each have up to 256 file-system snapshots. Most systems that a SONAS system
typically displaces cannot provide clients with access to so much storage from a single
file-serving head. Every interface node has access to all of the storage capacity in the
SONAS system.

1.3.2 SONAS scale out capability


SONAS provides extreme scale out capability, a globally clustered NAS file system built upon
IBM GPFS. The global namespace is maintained across the entire global cluster of multiple
storage pods and multiple interface nodes. All interface nodes and all storage nodes share
equally in the cluster to balance workloads dynamically and provide parallel performance to
all users and storage, while also assuring high availability and automated failover.

SONAS is a scalable virtual file storage platform that grows as data grows. It meets
demanding performance requirements as new processors can be added independently or as
storage capacity is added, eliminating a choke point found in traditional scale-up systems.
SONAS is designed for high availability 24x7 environments with a clustered architecture that
is inherently available and, when combined with the global namespace, allows for much
higher utilization rates than found in scale-up environments.

1.3.3 SONAS software


SONAS software provides powerful cross-platform access to the same files, with locking for
data integrity. In addition, SONAS provides highly available Linux, UNIX, and CIFS (Windows)
sessions with no client-side changes (see Figure 1-8 on page 11). Deploying SONAS allows
users to reduce the overall number of disk drives and file storage systems that need to be
housed, powered, cooled, and managed relative to scale-up systems.


Figure 1-8 SONAS software

Storage Management benefits


SONAS also provides integrated support of policy-based automated placement and
subsequent tiering and migration of data. Customers can provision storage pools and store
file data according to its importance to the organization. For example, a user can define
multiple storage pools with different drive types and performance profiles. They can create a
higher performance storage pool with SAS drives and define a less expensive (and lower
performance) pool with SATA drives.

Rich, sophisticated policies are built into SONAS which can transparently migrate data
between pools based on many characteristics, such as capacity threshold limits and age of
the data. This helps to address business critical performance requirements. Leveraging
automated storage tiering, users can finally realize the cost savings and business benefits of
information lifecycle management (ILM) at an immense scale.

1.3.4 High availability design


SONAS provides a NAS storage platform for global access of your business critical data.
Your business critical data can be secured with both information protection and business
continuity solutions, giving you a high level of business continuity assurance.

In the event of data corruption or an unexpected disaster that could harm corporate data,
SONAS helps you to recover and quickly resume normal enterprise and data center
operations (see Figure 1-9 on page 12).

SONAS supports large enterprise requirements for remote replication, point-in-time copy (file
system-level snapshots), and scalable automated storage tiering, all managed as a single
instance within a global namespace. SONAS asynchronous replication is specifically
designed to cope with connections that provide low bandwidth, high latency and low

reliability. The scheduled asynchronous replication process will pick up the updates on the source SONAS
system and write them to the target SONAS system, which may be thousands of miles away.

Security and information protection are enhanced in a consolidated SONAS environment. For
example, users considering the implementation of security and protection solutions are
concerned about maintaining data availability as systems scale—a key design point for
SONAS. Its clustered architecture is designed for high availability at scale and 24 x 7 x
Forever operation, complementing consolidated security and protection solutions to provide
an always-on information infrastructure.

Figure 1-9 High availability and disaster recovery design

1.4 SONAS architectural concepts and principles


In this section we review the overall SONAS architecture and operational principles. We will
start with the logical diagram in Figure 1-10.


Figure 1-10 Logical Diagram of IBM SONAS

In the top half of this diagram, we see the logical file directory structure as seen by the users.
SONAS presents and preserves this same logical appearance to the users, no matter how
it physically manages these files, and all files in the SONAS, from creation to deletion.
The user sees only his global namespace, his user directories, and files. As a SONAS
expands, manages, and changes the physical data location and supporting physical
infrastructure, the users still see the unchanged appearance of one single logical global
namespace, and maintain their logical file structure without change.

In the lower half of this diagram, we see a representation of the SONAS internal architectural
components. SONAS has interface nodes, which serve data to and from the users, over the
network. SONAS also has storage nodes, which service the storage for the SONAS clustered
file system.

All SONAS nodes are in a global cluster, connected via Infiniband. All interface nodes have
full read/write access to all storage nodes. All storage nodes have full read/write access to all
interface nodes. Each of the nodes runs a copy of IBM SONAS Software (5639-SN1), which
provides all the functions of SONAS, including a Cluster Manager which manages the cluster
and dispatches workload evenly across the cluster.

Also included is the SONAS central storage policy engine, which runs in a distributed fashion
across all the nodes in the SONAS. The SONAS policy engine provides central management
of the lifecycle of all files, in a centrally deployed, centrally controlled, enforceable manner. The
policy engine function is not tied to a particular node; it executes in a distributed manner
across all nodes. Not shown is the SONAS management node(s), which monitors the health
of the SONAS.

IBM SONAS Software manages the cluster and maintains the coherency and consistency of
the file system(s), providing file level and byte level locking, using a sophisticated distributed,
token (lock) management architecture that is derived from IBM General Parallel File System

(GPFS) technology. As we shall see, the SONAS clustered grid architecture provides the
foundation for automatic load balancing, high availability, and scale-out high performance,
with multiple parallel, concurrent writers and readers.

Physical disk drives are allocated to SONAS logical storage pools. Typically, we would
allocate a high performance pool of storage (which uses the fastest disk drives), and a lower
tier of storage for capacity (less expensive, slower spinning drives). In the example above, we
have allocated three logical storage pools.

1.4.1 Create, write, and read files


To understand the operation of SONAS Software and the interaction of the SONAS Software
functions, it is best to follow the lifecycle of a file as it flows from creation through automated
tiered storage management, to eventual destaging to backups, external storage, or deletion.

We will follow the lifecycle of reading and writing files as they traverse the SONAS Software
and the SONAS central policy engine, and in this way we will see how the SONAS software
and policy engine are used to manage these files in three different logical storage pools. Via
easy-to-write rules, a SONAS administrator can automate the storage management, file
placement, and file migration within a SONAS. The central policy engine in SONAS is the one
central location that provides an enforceable, powerful set of policies and rules to globally manage
all physical storage, while preserving the appearance of a global namespace and an
unchanging file system to the users.

1.4.2 Creating and writing a file


When a ‘create file’ request comes into SONAS, it is directed to the SONAS Software
central policy engine. The policy engine has file placement rules, which determine to which of
the logical storage pools the file is to be written.

SONAS Software works together with an external Domain Name Server to allocate an
interface node to handle this client. The incoming workload is IP-balanced equally by the
external network Domain Name Server, across all SONAS interface nodes. As shown in the
following figure, Figure 1-11, we have selected an interface node, and are determining file
placement via the policy engine:


Figure 1-11 Create and write file 1 - step 1 Policy

All incoming create file requests pass through the SONAS central policy engine in order to
determine file placement.

The interface node takes the incoming create file request, and based on the logical storage
pool for the file, passes the write request to the appropriate storage nodes. A logical storage
pool can and often does span storage nodes. The storage nodes, in parallel, perform a large
data striped write into the appropriate logical storage pool, exploiting the parallelism of writing
the data simultaneously across multiple physical disk drives.

SONAS data writes are done in a wide parallel data stripe write, across all disk drives in the
logical storage pool. In this way, SONAS Software architecture aggregates the file write and
read throughput of multiple disk drives, thus providing high performance. SONAS Software
will write the file in physical blocksize chunks, according to the blocksize specified at the file
system level.

The default blocksize for a SONAS file system is 256 KB, which IBM recommends as a good
blocksize for the large majority of workloads, especially where there will be a mix of small
random I/Os and large sequential workloads within the same file system. You may choose to
define the file system with other blocksizes; for example, where the workload is known to be
highly sequential in nature, you may choose to define the file system with a large 1 MB or
even 4 MB blocksize. See the detailed sizing sections of this book for further
recommendations.
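
The effect of the blocksize on wide striping can be illustrated with a small, purely conceptual
Python sketch. Real SONAS/GPFS block allocation is more sophisticated; the file size and
drive count below are illustrative assumptions.

FILE_SIZE = 10 * 1024 * 1024     # a 10 MB file
BLOCK_SIZE = 256 * 1024          # default 256 KB file system blocksize
DRIVES_IN_POOL = 12              # assumed drives in the logical storage pool

blocks = -(-FILE_SIZE // BLOCK_SIZE)       # ceiling division: 40 blocks
blocks_per_drive = {d: 0 for d in range(DRIVES_IN_POOL)}
for b in range(blocks):
    blocks_per_drive[b % DRIVES_IN_POOL] += 1   # round-robin across the pool

print(blocks, "blocks spread as", blocks_per_drive)

With a larger pool, the same file spreads across more drives, which is what allows the write
(and later the read) to aggregate the throughput of many spindles.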

This wide data striping architecture has algorithms that determine where the data blocks
should physically reside; this gives the SONAS Software the ability to automatically tune
and equally load balance all disk drives in the storage pool. This is shown in the following
figure:


Figure 1-12 Create and write file 1 - step 2 - wide parallel data stripe write

Now, let’s write another file to the SONAS. This time, another interface node is
appropriately selected for this incoming work request by the Domain Name Server, and the
file is passed to that interface node for writing, as shown below:

Figure 1-13 Create and write file 2 - step 1 Policy

Notice that a different interface node has been chosen; this illustrates the automatic
balancing of the incoming workload across the interface nodes. The interface node is told by
the policy engine that this file is to be written to the 1 TB intermediate pool. In the same
manner as previously described, the file is written in a wide data stripe, as shown below:


Figure 1-14 Create and write file 2 - step 2 - wide parallel data stripe write

Finally, let’s write a third file. As shown in the following figure, a third interface node is
selected by the Domain Name Server:

Figure 1-15 Create and write file 3 - step 1 Policy

The SONAS policy engine has specified that this file is to be written into the 2TB SATA pool.
A wide data stripe parallel write is done as shown in the following figure:


Figure 1-16 Create and write file 3 - step 2 - wide parallel data stripe write

With these illustrations, we can now see how the components of the SONAS Software use
the SONAS policy engine together with the Domain Name Server to drive workload equally
across interface nodes. The SONAS Software then appropriately distributes workload among
the storage nodes and physical disk drives. This is summarized in the following figure:

Figure 1-17 Create and write files - summary


In summary, SONAS Software will automatically balance workload across all interface nodes.
SONAS Software will write all data in wide stripes, across all disks in the logical storage pool,
providing high performance, automatic tuning, and automatic load balancing.

Most importantly, note that from the user’s perspective, these three files can all reside in the
same logical path and directory. The user does not know that his files are physically located
on different classes of storage (or that the physical location may change over time). This
provides the ability to implement automatic physical tiered storage without impact to users,
and without necessitating time-consuming, process-intensive application-level changes and
change control. The SONAS Software will continue to maintain this same logical file structure
and path, regardless of physical file location changes, as the file is managed from creation
through its life cycle using SONAS automatic tiered storage.

1.4.3 Scale out more performance


Next, let’s see how the SONAS Software architecture is leveraged to scale out increased
performance.

Performance in SONAS can be increased by simply adding more disk drives to a SONAS
logical storage pool. With more drives, SONAS Software will write a wider parallel data stripe.
The data stripe is not limited to any particular storage pod or storage node; a logical storage
pool can span multiple storage nodes and storage pods, as shown in the following figure:

Figure 1-18 Scale out more disk drives for more write performance

By simply adding more disk drives, the SONAS Software architecture provides the ability to
scale out both the number of disk drives and the number of storage nodes that can be applied
to support a greater amount of parallel physical data writes. The logical storage pool can be as
large as the entire file system, and the SONAS file system can be as large as the entire
SONAS machine. In this way, SONAS provides an extremely scalable and flexible architecture
for serving large scale NAS storage.


The SONAS Software architecture provides the ability to expand the scale and capacity of the
system in any direction that is desired. The additional disk drives and storage nodes can be
added non-disruptively to the SONAS. Immediately upon doing so, SONAS Software starts
to automatically balance and tune new workload onto the additional disks, and
automatically starts taking advantage of the additional resources.

1.4.4 Reading a file


Reading data in SONAS applies the same principles of exploiting the wide data stripe for
aggregating the performance of reading data in parallel, across multiple disk drives, as shown
in the following figure:

Figure 1-19 Read files - aggregate parallel data reads

Furthermore, the interface node is designed to utilize advanced algorithms that improve
read-ahead and write-behind file functions, and recognizes and does intelligent pre-fetch
caching of typical access patterns like sequential, reverse sequential and random, as shown
in the following figure:


Figure 1-20 Read files - read-ahead caching, intelligent pre-fetch

In the same way that write performance can be enhanced by simply adding more disk drives
to the logical storage pool, read performance can also be enhanced, as shown in
the following figure:

Figure 1-21 Scale out more disk drives for read performance - parallel data stripe read

Note that the parallelism in the SONAS for an individual client is in the storage read/write; the
connection from the interface node to the client is a single connection and single stream. This is
done on purpose, so that any standard CIFS, NFS, FTP, or HTTPS client can access the IBM
SONAS interface nodes without requiring any modification or any special code. Throughput
between the interface nodes and the users is enhanced by sophisticated read-ahead and
pre-fetching and large memories on each interface node, to provide very high capacity and
throughput on the network connection to the user.

As requirements for NAS storage capacity or performance increase, the SONAS Software
scale out architecture provides linearly scalable, high performance, parallel disk I/O
capabilities by:
򐂰 Striping data across multiple disks, across multiple storage nodes and storage pods
򐂰 Reading and writing data in parallel wide data stripes. Increasing the number of disk
drives in the logical storage pool can increase the performance
򐂰 Supporting a large block size, configurable by the administrator, to fit I/O requirements
򐂰 Utilizing advanced algorithms that improve read-ahead and write-behind file functions.
SONAS recognizes typical access patterns like sequential, reverse sequential and
random and optimizes I/O access for these patterns

This scale-out architecture of SONAS Software provides superb parallel performance,
especially for larger data objects, and excellent performance for large aggregates of smaller
objects.

1.4.5 Scale out parallelism and high concurrency


The SONAS Software scale out architecture is designed to leverage the ability to have many
nodes active concurrently, thus providing the ability to scale to many tens or hundreds of
thousands of users in parallel.

Today’s SONAS provides a scale out NAS architecture that can grow to 30 interface
nodes and 60 storage nodes, all in a global active-active share-everything cluster. The
technology that allows SONAS to do this is derived from the IBM General Parallel File System
(GPFS), which has proven, in large supercomputing environments, the ability to scale to
the high level of many thousands of nodes. IBM is scaling down that GPFS capability and
providing access to it in the SONAS appliance.

With the SONAS clustered node architecture, the larger the machine, the greater the
capability to concurrently scale out capacity and performance across many individual
nodes, and to serve many concurrent users and their storage requests in parallel. This is shown
in the following figure:


Figure 1-22 SONAS Software parallel concurrent file access

The value of the SONAS scale out architecture is the ability to flexibly and dynamically add as
many nodes as needed, to increase the number of parallel concurrent users that can be
supported. Each individual node works in parallel to service clients, as shown in the following figure:

Figure 1-23 SONAS Software parallel concurrent service of multiple users

SONAS has the same operational procedures and read/write file system architectural
philosophy whether you have a small SONAS with two interface nodes and two storage
nodes, or a very large SONAS with 30 interface nodes and 60 storage nodes.


1.4.6 Manage storage centrally and automatically


Now that we have seen how SONAS Software can provide linear scale out performance and
capacity for petabytes of storage, we next need to consider how the software provides tools
to physically manage this storage when we are operating at this level of scale. In particular:
򐂰 How do we affordably automate the physical management of that much storage?
򐂰 What automated tools does SONAS provide for me to do this?
򐂰 Will these tools allow me to operate at this scale with fewer people and fewer resources?

The answer is a definite “Yes”.

Let’s see how SONAS Software provides integrated, automated tools to help you accomplish
these goals.

1.4.7 SONAS logical storage pools for tiered storage


SONAS Software is designed to help you achieve data lifecycle management efficiencies
by providing integrated policy-driven automation and tiered storage management, all as
part of the base SONAS Software license. SONAS Software provides integrated logical
storage pools, filesets, and user-defined policies that enable automated tiered storage, and
therefore more efficiently match the cost of your storage to the value of your
data.

SONAS logical storage pools allow you to allocate physical hard drives to logical
storage pools within the SONAS file system. Using logical storage pools, you can create tiers
of storage by grouping physical disk storage based on performance, locality, or reliability
characteristics. Logical storage pools can span storage nodes and storage pods. You may
have multiple logical storage pools (up to 8 per file system), the size of a storage pool can be
as big as the entire file system, and the file system can be as big as the entire SONAS.
SONAS automatically manages and load-balances storage utilization at the level
of the entire logical storage pool. In the example shown below, multiple logical storage pools
have been set up:


Figure 1-24 SONAS Logical Storage Pools

Logical storage pool #1 could be high performance SAS disks, and logical storage pool #2
might be more economical, large-capacity Nearline disk drives. Logical storage pool #3 might be
another large-capacity drive storage pool defined with external Hierarchical Storage Management,
for when the data is intended to be staged in and out of the SONAS to external storage,
external tape or tape libraries, or external data de-duplication technology. Within the
internal SONAS logical storage pools, all of the data management, from creation to physical
data movement to deletion, is done by SONAS Software1.

In addition to internal storage pools, SONAS also supports external storage pools that are
managed through an external Tivoli Storage Manager server. When moving data to an
external pool, SONAS Software utilizes a high performance scan engine to locate and identify
files that need to be managed, and then hands the list of files either to the
SONAS Software data movement functions (for moving data internally within the SONAS), or
to Tivoli Storage Manager for backup and
restore, or for HSM storage on alternate media, such as tape, a tape library, a virtual tape
library, or data de-duplication devices.

If the data is moved for Hierarchical Storage Management purposes, a stub file is left on the
disk, and the HSM data can be retrieved from the external storage pool on demand when an
application opens the file. HSM data can also be retrieved in a batch operation if desired.

SONAS Software provides the file management concept of a fileset. A fileset is a sub-tree of
the file system namespace, and provides a way to partition the global namespace into
smaller, more manageable units. A fileset is essentially a named collection of files and
directories that you wish to operate upon or maintain as a common unit. Filesets provide an
administrative boundary that can be used to set quotas and can be specified in a
user-defined policy to control initial data placement or data migration. Currently, up to 1,000
filesets can be defined per file system, and increasing this number is a known requirement
for the future.

Data and files in a single SONAS fileset can reside in one or more logical storage pools. As
the data is physically migrated according to storage policy, the fileset grouping is maintained.

Where the file data physically resides, and how and when it is migrated, is based on a set of
rules in a user-defined policy that is managed by the SONAS Software policy engine. We look
at this SONAS central policy engine next.

1.4.8 SONAS Software central policy engine


All files under the control of SONAS Software are managed by an integrated central storage
policy engine. Within the central policy engine are rules that specify all aspects of file
management. There are two types of rules:
򐂰 Initial physical file placement
򐂰 Physical file management: this includes physical movement of the file between tiers of
disk storage, and between disk storage and external tape, virtual tape library, or
de-duplication storage. File management rules can also include backup/restore, global
alteration of file characteristics according to any file system criteria, and deletion of files
that have expired.

File placement policies determine which storage pool file data is initially placed in. File
placement rules are determined by attributes known when a file is created, such as file name,
user, group or the fileset. Examples might be: ‘place all files that end in .avi onto the silver
storage pool’, ‘place all files created by the performance critical applications in the gold
storage pool’, or ‘place all files in the fileset ‘development’ in the copper pool’.

Files written to SONAS are physically placed according to these rules, and these rules are
contained in a SONAS storage policy. The SONAS administrator writes these rules, which
are SQL-like statements.
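
For illustration only, and following the style of the policy examples shown later in Figure 1-25,
placement rules like the ones just described might be sketched as follows (the pool names,
the numeric user ID, and the fileset name are assumptions made for this example):

   rule 'videofiles' set pool 'silver' where lower(name) like '%.avi'
   rule 'critical_app' set pool 'gold' where user_id = 1001
   rule 'devfiles' set pool 'copper' for fileset ('development')
   rule 'default' set pool 'silver'

A catch-all default rule is commonly written last, so that any file not matched by an earlier rule
still has a storage pool assigned at creation time.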

Further examples of these rules, covering migration and deletion policies, are shown in Figure 1-25:


Automated tiered storage policy statement examples:
򐂰 Migration policies, evaluated periodically:
   rule 'cleangold' migrate from pool 'TIER1' threshold(90,70) to pool 'TIER2'
   rule 'hsm' migrate from pool 'TIER3' threshold(90,85) weight(current_timestamp - access_time) to pool 'HSM' where file_size > 1024kb
   rule 'cleansilver' when day_of_week()=Monday migrate from pool 'silver' to pool 'bronze' where access_age > 30 days
򐂰 Deletion policies, evaluated periodically:
   rule 'purgebronze' when day_of_month()=1 delete from pool 'bronze' where access_age > 365 days
򐂰 There are also policies for file-based backup/archive, restore/retrieve, and many more options.

Figure 1-25 SONAS Software policy engine and storage policies

Once files exist in a SONAS file system, SONAS Software file management policies allow
you to move, change the replication status or delete files. You can use file management
policies to move data from one pool to another, without changing the file’s location in the
directory structure.

The rules are very flexible; as an example, you may write a rule that says: ‘replicate all files in
/database/payroll which have the extension *.dat and are greater than 1 MB in size to storage
pool 2’. In addition, file management policies allow you to prune the file system, deleting files
as defined by policy rules.

File management policies can use more attributes of a file than file placement policies,
because once a file exists there is more known about the file. In addition to the file placement
attributes, the policies can now utilize attributes such as last access time, size of the file or a
mix of user and file size. This may result in policy statements such as: ‘delete all files with a
name ending in .temp that have not been accessed in 30 days’, ‘move all files that are larger
than 2 GB to pool2’, or ‘migrate all files owned by GroupID=Analytics that are larger than 4GB
to the SATA storage pool’.
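
Expressed in the same SQL-like style, and purely as a sketch (the pool names, the numeric
group ID standing in for the Analytics group, and the directory path are assumptions for this
example), such file management rules might look as follows:

   rule 'payrollcopy' migrate from pool 'pool1' to pool 'pool2' replicate(2) where path_name like '/database/payroll/%' and name like '%.dat' and file_size > 1mb
   rule 'cleantemp' delete where name like '%.temp' and access_age > 30 days
   rule 'bigtosata' migrate from pool 'pool1' to pool 'SATA' where group_id = 8001 and file_size > 4gb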

Rules can include attributes related to a pool instead of a single file, using the threshold
option. Using thresholds you can create a rule that moves files out of the high performance
pool if it is more than 80% full, for example.

The threshold option comes with the ability to set high, low, and pre-migrate thresholds.
SONAS Software begins migrating data when the high threshold is reached, and continues
until occupancy drops to the low threshold. If a pre-migrate threshold is set, SONAS Software
also copies (pre-migrates) data until the pre-migrate threshold is reached. Pre-migrated data
can continue to be accessed in the original pool, and can then be quickly deleted to free up
space the next time the high threshold is reached.
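
As a sketch of a threshold-driven rule (the pool names and percentages are assumptions for
this example), the three values are the high, low, and pre-migrate thresholds:

   rule 'offload' migrate from pool 'gold' threshold(80,60,50) to pool 'silver' weight(current_timestamp - access_time)

In this example, migration out of the gold pool starts when it is more than 80% full and
continues until occupancy falls to 60%; additional files are pre-migrated (copied but left in
place) down to the 50% level, and the least recently accessed files are selected first.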

Policy rule syntax is based on the SQL 92 syntax standard and supports multiple complex
statements in a single rule enabling powerful policies. Multiple levels of rules can be applied
because the complete policy rule set is evaluated for each file when the policy engine
executes.
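
As a sketch of such compound statements (the fileset name, pool names, and limits are
assumptions for this example), several conditions can be combined in a single rule, and
several rules can be stacked in one policy:

   rule 'weblogs' when day_of_month()=1 migrate from pool 'silver' to pool 'bronze' for fileset ('web') where name like '%.log' and access_age > 90 days and file_size > 10mb
   rule 'webhot' migrate from pool 'bronze' to pool 'silver' for fileset ('web') where access_age < 7 days

Because rules are evaluated in order for each file, more specific rules are normally listed
before more general ones.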


1.4.9 High performance SONAS scan engine


We apply these storage management rules to all files in the SONAS.

However, as the number of files and amount of storage grows from terabytes into petabytes,
storage management faces a major new requirement: how can we scan the file system(s)
fast enough to identify files that must be:
򐂰 Migrated to a different storage pool
򐂰 Propagated to remote site(s)
򐂰 Backed up
򐂰 Restored
򐂰 Deleted
򐂰 Handled by any other storage management operation

As the number of files continues to grow, the time required for this scan using traditional
‘walk the directory tree’ methods becomes a major obstacle to effective storage
management. Shrinking backup and storage management windows require scan times to
stay small or even shrink, even as the file systems continue to grow to the hundreds of
terabytes to petabyte level. At the level of petabyte storage scalability, it becomes unfeasible
to use the traditional methods of ‘walking the directory trees’ to identify files; that simply takes
too long.

To address this essential requirement, the SONAS is specifically designed to provide a high
performance, high speed scan engine.

The SONAS scan engine is an integrated part of the file system. Also integrated into the
SONAS file system is an internal database of file system metadata, which is specifically
designed for the integrated scan engine. The goal of these two functions is to provide the
ability to scan the file system very quickly, at any scale, extending to billions of files.

Let’s see how this works in more detail. To begin a scan to identify files, we submit a job to
the SONAS Software central policy engine to evaluate a set of policy rules, as shown in
Figure 1-26:


The central policy engine starts the scan by reading the policies. The scan engine reads
internal SONAS file system metadata; it does not need to read the file or directory tree, and
all nodes can participate in the scan of the file system.

Figure 1-26 High performance scan engine - start scan by reading policies

The SONAS Software high performance scan engine is designed to utilize the multiple
hardware nodes of the SONAS in parallel to scan the internal file system metadata. The
policy engine rule evaluation, file scan identification, and subsequent data movement
responsibilities are spread equally over the multiple nodes in the SONAS cluster.

If greater scan speed is required, more SONAS nodes can be allocated to the scan, and each
node scans only its equal portion of the total workload. This architectural aspect of SONAS
Software provides a very scalable, high performance, scale out rule processing engine that
can provide the speed and parallelism required to address petabyte file system scan
requirements. This is shown in the following figure:


Some or all nodes (both storage and interface) participate in the parallel metadata scan, at
rates exceeding 10 million files per minute.

Figure 1-27 High performance scan engine - parallel scan of metadata by all nodes

The results of the parallel scan are aggregated, and returned as the actionable list of
candidate files, as shown in the following figure:

The scan results are completed in a much shorter period of time than with traditional
methods.

Figure 1-28 High performance scan engine - return results of parallel scan

Notice that the SONAS scan engine is not limited to tiered storage management. The scan
engine can also be used to:
򐂰 Reset file attributes according to policy (change deletions, change storage pool
allocation, and so on)


򐂰 Run reports on file system usage and user activities


򐂰 Identify changed data blocks for asynchronous replication to remote site

Summary - SONAS Software high performance scan engine


The SONAS parallel scan engine functionality:
򐂰 Reads policy engine policies
򐂰 Identifies files that need to be moved within the physically tiered storage, sent to remote
sites, etc.
򐂰 Enables and makes feasible automated tiered storage at terabyte and petabyte scale

The SONAS high performance scan engine:


򐂰 Does not need to read the file or directory tree
򐂰 Reads special metadata integrated and maintained by the file system
򐂰 All nodes can participate in parallel scan of file system
򐂰 Delivers very high performance scan with minimized impact on concurrent workloads
򐂰 Can perform scan on frequent basis due to low overhead

As long as the data movement is within the SONAS, or between SONAS devices, all physical
data movement is done solely by SONAS Software, with no involvement of any external
servers or external software.

This combination of the internal file system metadata and the SONAS scale out parallel grid
software architecture enables SONAS to provide an architectural solution for scanning the
file system(s) quickly and efficiently, at the level of millions and billions of files, in a short
period of time. The SONAS Software integrated high performance scan engine and data
movement engine work together to make automated tiered storage feasible, with physical
data movement transparent to the users, at the level of hundreds of terabytes to petabytes in
the file systems.

1.4.10 High performance physical data movement for ILM / HSM


Now that we have used the scan engine to identify candidate files for automated storage
management, let’s see how the parallel grid architecture of SONAS is used to scale out
physical data movement.

Once the list of candidate files has been identified using the SONAS parallel scan engine,
SONAS Software then performs physical data movement according to the outcome of the
rules. Physical data movement is also performed using the multiple hardware nodes of the
SONAS cluster in parallel.

An example of physically moving a file from ‘Storage Pool 1’ to ‘Storage Pool 2’ is shown in
the following figure:


All nodes (both storage and interface) can participate in the parallel data movement that
performs the results of the scan.

Figure 1-29 High performance parallel data movement for ILM - pool 1 to pool 2

All files remain online and fully accessible during this physical data movement; the logical
appearance of the file path and location to the user does not change. The user has no
indication that the physical location of the file has moved. This is one of the design objectives
of the SONAS.

According to the results of the scan, SONAS continues with other physical file movement.
According to policy, data can be up-staged as well as down-staged, as shown in Figure 1-30
on page 33:


Figure 1-30 High performance parallel data movement for ILM - pool 2 to pool 1

As the SONAS grows in capacity over time, it is a straightforward matter to add additional
nodes to the parallel cluster, thus maintaining the ability to perform and complete file system
scans and physical data movement in a timely manner, even as the file system grows into
hundreds of terabytes and petabytes.

1.4.11 Hierarchical storage management, backup/restore to external storage


SONAS also supports the ability to extend physical data movement to external storage
outside of the SONAS. There are two types of operations to external storage:
򐂰 Hierarchical storage management (HSM) - migrate inactive files to external storage,
while leaving a stub file on disk.
򐂰 Backup/restore (B/R) - back up or restore copies of files, to and from SONAS and
external storage.

Traditional software can accomplish these functions on SONAS by walking the directory
trees, identifying candidate files through normal means, and performing normal LAN I/O for
data movement. In this case, the normal parameters of file system scan time apply.

Tip: IBM has an extensive Independent Software Vendor (ISV) certification program for
SONAS. Enterprises use many ISV applications for their storage management to address
business requirements. IBM has done extensive testing and intends to continue to ensure
interoperability and compatibility of the leading ISV applications with SONAS to reduce
deployment risks.


1.4.12 Requirements for high performance external HSM and backup/restore

SONAS can support any standard HSM and backup/restore software. These conventional
solutions use normal ‘walk the directory tree’ methods to identify files that need to be
managed and moved, and then copy these files using conventional methods.

However, as file systems continue to grow to the hundreds of terabytes to petabyte level, the
following requirements have arisen:
򐂰 The elapsed time for traditional scans to identify files that need to be moved for HSM or
backup/restore purposes is becoming too long. In other words, due to the scale of the
search, the time required to ‘walk the directory tree’ is too long, and it incurs a very large
amount of small block IOPs.
򐂰 These long scan times can severely inhibit the ability to manage a large amount of
storage. In many cases, the scan time alone could be longer than the backup or tiered
storage management window.
򐂰 In addition, once the files are identified, the large amount of data that they can represent
often drives the need for very high data rates in order to accomplish the needed amount of
Hierarchical Storage Management or backup/restore data movement within a desired
(and continually shrinking) time window.

Therefore, to address these issues and make automated tiered storage feasible at large
scale, SONAS provides a specific set of technology exploitations that significantly reduce
this overly long scan time and perform efficient data movement, as well as hierarchical
storage management to external storage.

SONAS does this by providing optional (yet highly recommended) exploitation and integration
with IBM Tivoli Storage Manager. SONAS Software has specific high performance integration
with Tivoli Storage Manager to provide accelerated backup/restore and accelerated, more
functional hierarchical storage management to external storage.

1.4.13 SONAS high performance HSM using Tivoli Storage Manager


The SONAS scan engine and IBM Tivoli Storage Manager work together, combining the
SONAS parallel grid architecture with software parallelism in Tivoli Storage Manager, to
significantly enhance the speed, performance, and scale of both HSM and backup/restore
processes.

In a SONAS environment, Tivoli Storage Manager does not need to walk directory trees to
identify files that need to be moved to external storage, backed up, or restored. Instead, the
SONAS high performance scan engine is used to identify files to be migrated, and Tivoli
Storage Manager server(s) are architected to exploit multiple SONAS interface nodes, in
parallel, for data movement.

The architecture of SONAS hierarchical storage management to external storage is shown
in the following figure:


The scan engine results are passed to the TSM/HSM server, and parallel data streams
migrate inactive data to tape, a tape library, or a de-duplication device. A stub file is left on
disk; the remainder of the file is migrated to external storage.

Figure 1-31 Hierarchical Storage Management to external storage using Tivoli Storage Manager

In this SONAS + Tivoli Storage Manager HSM scenario, a stub file is left on disk, allowing the
file to appear active in the file system. Many operations, such as ‘list files’, are satisfied by the
stub file without any need for recall. You have flexible control over the HSM implementation,
such as specifying the size of the stub file, the minimum size of a file to be eligible for
migration, and so on. If a file is accessed but is resident only on external storage, the file is
transparently auto-recalled from the external storage through the TSM server. Data
movement to and from external storage is done in parallel through as many SONAS interface
nodes as desired, maximizing throughput through parallelism. Data can be pre-migrated,
re-staged, and de-staged according to policy.

In this manner, SONAS provides the ability to support the storing of petabytes of data in the
online file system, yet staging only the desired portions of the file system on the actual
SONAS disk. The external storage can be any TSM-supported storage, including external
disk, tape, virtual tape libraries, or data de-duplication devices.

1.4.14 SONAS high performance backup/restore using Tivoli Storage Manager


The same SONAS and Tivoli Storage Manager architecture is used for backup and restore
acceleration. The first step in backup/restore is to identify the files that need to be backed up
for Tivoli Storage Manager’s ‘incremental forever’ method of operation. The SONAS high
performance scan engine is called by Tivoli Storage Manager to perform this task, and the
SONAS scan engine passes the list of identified changed files to be backed up to TSM.
Rules for backup are submitted to be included in a SONAS policy engine scan of the file
system. The high performance scan engine locates files that need to be backed up, builds a
list of these files, and then passes the results to the Tivoli Storage Manager server, as shown
in Figure 1-32:


The scan engine results are passed to the Tivoli Storage Manager backup server, and
parallel data streams move the data to any TSM-supported devices, including ProtecTIER
de-duplication, virtual tape library, or tape.

Figure 1-32 Backup and restore acceleration using Tivoli Storage Manager

Let’s now examine how the SONAS and Tivoli Storage Manager exploitation works in a little
more detail.

1.4.15 SONAS and Tivoli Storage Manager integration in more detail


The first step in either HSM or backup/restore is to identify the files that need to be migrated
or backed up. We submit Tivoli Storage Manager rules to the SONAS central policy engine
scan of the file system, which specify whether to perform external storage HSM or a high
performance backup or restore.

After scanning the file system as previously described in this chapter, SONAS passes the
scan engine results back to the Tivoli Storage Manager server, as shown in Figure 1-33
below:


The scan engine identifies the files to be restaged (up or down) or backed up, and the list of
files is passed to the Tivoli Storage Manager server.

Figure 1-33 SONAS scan engine and Tivoli Storage Manager

Rather than walking the directory tree, the SONAS scan engine uses multiple SONAS
interface nodes to scan the file system in parallel and identify the list of changed files.
SONAS then passes the list of changed files directly to the TSM server. In this way, the
SONAS scan engine avoids the need to walk the directory tree, and avoids the associated
traditional time-consuming small block directory I/Os.

The results of the scan are divided up among multiple interface nodes. These interface
nodes then work in parallel with the TSM server(s) to initiate the HSM or backup/restore data
movement, creating parallel data streams. The TSM software implements a ‘virtual node’
function that allows the multiple SONAS interface nodes to stream data in parallel to a TSM
server, as shown in Figure 1-34 on page 38:


Multiple interface nodes perform high performance parallel data movement to the Tivoli
Storage Manager server, for backup to tape, virtual tape library, or de-duplication devices.

Figure 1-34 Parallel data streams between SONAS and TSM

In this way, the SONAS Software and Tivoli Storage Manager work together to exploit the
SONAS scale out architecture to perform these functions at petabyte levels of scalability and
performance. As higher data rates are required, more interface nodes may be allocated to
scale out the performance in a linear fashion, as shown in the following figure:

Multiple interface nodes stream data in parallel to multiple TSM servers; stub files are left on
disk for auto-recall.

Figure 1-35 High performance parallel data movement at scale, from SONAS to external storage


SONAS scale out architecture combined with Tivoli Storage Manager can be applied to
maintain desired time windows for automated tiered storage, hierarchical storage
management, and backup/restore, even as file systems grow into hundreds of terabytes to
petabytes.

Note: SONAS only requires external Tivoli Storage Manager servers if you wish to exploit:
򐂰 Accelerated Hierarchical Storage Management to external storage pools
򐂰 Accelerated backup/restore and HSM that exploit the SONAS Software scan engine
򐂰 Accelerated external data movement that exploits multiple parallel interface nodes to
raise the backup/restore and HSM data rates

All internal data movement within a SONAS (that is, between internal SONAS logical
storage pools) or between SONAS machines (that is, SONAS async replication) is done by
the SONAS Software itself, and does not require any involvement of external Tivoli
Storage Manager servers.

Of course, SONAS also supports conventional external software that performs
backup/restore and hierarchical storage management through normal ‘walk the directory
tree’ methods and normal copying of files.

More information on SONAS and Tivoli Storage Manager integration may be found in
“SONAS and Tivoli Storage Manager integration” on page 118 and “Backup and restore of file
data” on page 181.

1.4.16 Summary - lifecycle of a file using SONAS Software


In this section, we have seen the lifecycle of a file in SONAS, and through that, have seen an
overview of the operational characteristics of SONAS Software. We have seen how SONAS
Software performs:
򐂰 Creation of files
򐂰 Serving of files in a high performance manner, including providing scalability and parallel
high performance using wide striping
򐂰 Automated tiered storage management, effecting physical storage movement using the
central policy engine
򐂰 Migration of data to external storage for hierarchical storage management and for backup,
using an external Tivoli Storage Manager server

In the remainder of this chapter, we will explore in more detail how SONAS Software provides
a rich, integrated set of functions to accomplish these operational methods.

As shown in the following figure, this is the end result of the SONAS automated tiered
storage and centralized data management capability to manage the lifecycle of a file:


All three files remain in the same logical directory; the physical data movement is transparent
to users, and stub files are left on disk with auto-recall.

Figure 1-36 End result - SONAS automated tiered storage

During all of these physical data movement and management operations, the user’s logical
file path and appearance remains untouched. The user has no indication that this large
scale, high performance physical data management is being automatically performed on his
or her behalf.

We will now begin a discussion of more detail on SONAS Software architecture, components,
and operational methodologies.

1.4.17 Summary
SONAS is a compelling choice for organizations seeking to better manage their growing
demand for file-based storage. SONAS is designed to consolidate data that is scattered in
multiple storage locations and allow it to be efficiently shared and managed. The solution
helps improve productivity by providing automated ILM, automatic storage allocation, user
management by storage quota, and universal reporting and performance management.


Chapter 2. Hardware architecture


The basic hardware structure of the SONAS appliance product is discussed in this chapter.
The configuration consists of a collection of interface nodes, which provide file services to
external application machines running standard file access protocols such as NFS or CIFS;
a collection of storage nodes, which provide a gateway to the storage; and at least one
management node, which provides a management interface to the configuration. In addition
to the nodes, there are switches and storage pods.

This chapter describes the hardware components of the SONAS appliance and provides the
basis for the configuration, sizing, and performance considerations discussed throughout the
book.


2.1 Nodes
The SONAS system consists of three types of server nodes:
򐂰 A set of interface nodes, which provide connectivity to your Internet Protocol (IP) network
for file services to external application machines running standard file access protocols
such as NFS or CIFS
򐂰 A management node, which provides a management interface to the SONAS
configuration
򐂰 Storage nodes, which provide a gateway to the SONAS storage

The management node, the interface nodes, and the storage nodes all run the SONAS
software product in a Linux operating system. Product software updates to the management
node are distributed and installed on each of the interface nodes and storage nodes in the
system.

The interface nodes, management nodes, and storage nodes are connected through a
scalable, redundant InfiniBand fabric, allowing data to be transferred between the interface
nodes, which provide access to the applications, and the storage nodes, which have direct
attachments to the storage. InfiniBand was chosen for its low overhead and high speed:
20 Gbit/sec for each port on the switches. The basic SONAS hardware structure is shown in
Figure 2-1.

Figure 2-1 Overview of SONAS hardware structure


2.1.1 Interface nodes


The Interface node is a 2U server that provides the TCP/IP data network connectivity and the
file services for the SONAS system. SONAS supports the following file-serving capabilities:
򐂰 Common Internet File System (CIFS)
򐂰 Network File System version 3 (NFSv3)
򐂰 File Transfer Protocol (FTP)
򐂰 HyperText Transfer Protocol Secure (HTTPS)

The interface node contains two redundant hot-swappable 300GB 2.5-inch 10K RPM SAS
hard disk drives (HDDs) with mirroring between the two HDDs for high-availability. The HDDs
contain the SONAS system software product, containing the operating system and all other
software needed for an operational SONAS system.

These nodes can operate at up to 10 Gb speeds with optional adapters, providing extremely
fast access to data. They are connected to the rest of the SONAS system via a redundant
high speed InfiniBand data network.

Files can be accessed through each of the interface nodes which provide a highly scalable
capability for data access. Additional data access performance can be obtained by adding
interface nodes up to the limits of SONAS. SONAS R1 allows a minimum of two interface
nodes and a maximum of 30 interface nodes. Collections of files (file systems) are provided
via storage nodes which are gateways to storage controllers and disk drawers. All interface
nodes can access all storage on all storage nodes. All storage nodes can send data to any
interface node.

Two of the onboard Ethernet ports connect to the internal private management network within
the SONAS system for health monitoring and configuration. The other two onboard Ethernet
ports connect to the IP external network for network file serving capabilities. Two of the PCIe
adapter slots are available for use to add more adapters for host IP interface connectivity.

The InfiniBand Host Channel Adapters (HCAs) attach to the two independent InfiniBand
switches in the system to interconnect the interface nodes to the management nodes and the
storage nodes in an InfiniBand fabric.

An Interface node contains the following:


򐂰 Two Intel® Nehalem EP quad-core processors
򐂰 32GB of Double Data Rate (DDR3) memory standard, with options to expand to 64GB or
128GB.
򐂰 Four onboard 10/100/1000 Ethernet ports:
– Two of which are used within the system for management features
– Two of which are available for connectivity to customer’s, external IP network
򐂰 Two 300GB 2.5-inch Small Form Factor (SFF) 10K RPM SAS Slim-HS hard disk drives
with mirroring between the two HDDs (RAID 1) for high availability
򐂰 Four PCIe Gen 2.0 x8 adapter slots:
– The top two adapter slots each contain a single-port 4X DDR InfiniBand Host Channel
Adapter (HCA) card that connects to the SONAS InfiniBand fabric for use within the
system.
– One of the bottom two adapter slots can optionally contain zero or one Quad-port
10/100/1000 Ethernet NIC (FC 1100)
– One of the bottom two adapter slots can optionally contain zero or one Dual-port 10Gb
Converged Network Adapter (CNA) (FC 1101)


򐂰 The bottom two adapter slots can have zero, one, or two optional Ethernet cards, but only
one of each kind of adapter with a maximum total number of six additional ports.
򐂰 Integrated Baseboard Management Controller (iBMC) with an Integrated Management
Module (IMM)
򐂰 Two redundant hot-swappable power supplies
򐂰 Six redundant hot-swappable cooling fans

Optional features for interface nodes


The following are optional features for interface nodes that need to be considered when
planning for your SONAS device.
– Additional 32GB of memory (FC 1000)
Feature code 1000 provides an additional 32 GB of memory in the form of eight 4 GB
1333MHz double-data-rate three (DDR3) memory dual-inline-memory modules
(DIMMs). The feature enhances system throughput performance, but it is optional. You
can order only one of FC 1000 per interface node. Installation of FC 1000 into an
already installed interface node is a disruptive operation that requires you to shut down
the interface node. However, a system with a functioning interface node continues to
operate with the absence of the interface node being upgraded.
– 128 GB memory option (FC 1001)
Provides 128 GB of memory in the interface node. This feature provides for installation
of 128 GB of memory in the interface node in the form of sixteen 8 GB 1333 MHz
DDR3 memory DIMMs. Only one of FC 1001 may be ordered. Feature #1001 is
mutually exclusive with feature #1000 on the initial order of an interface node.
Installation of FC 1001 in an existing already installed SONAS interface node is a
disruptive operation requiring the interface node to be shut down. However, a SONAS
system will continue to operate with the absence of the interface node being upgraded
with the addition of FC 1001.
– Quad-port 1GbE NIC (FC 1100)
This provides a quad-port 10/100/1000 Ethernet PCIe x8 adapter card. This NIC
provides four RJ45 network connections for additional host IP network connectivity.
This adapter supports a maximum distance of 100m using Category 5 or better
unshielded twisted pair (UTP) four-pair media. The customer is responsible for
providing the network cables to attach the network connections on this adapter to their
IP network. One of feature code 1100 can be ordered for an interface node. The
manufacturer of this card is Intel, OEM part number: EXPI9404PTG2L20
– Dual-port 10Gb Converged Network Adapter (FC 1101)
This provides a PCIe 2.0 Gen 2 x8 low-profiles dual-port 10Gb Converged Network
Adapter (CNA) with two SFP+ optical modules. The CNA supports short reach (SR)
850nm multimode fiber (MMF). The customer is responsible for providing the network
cables to attach the network connections on this adapter to their IP network. One of
feature code 1101 can be ordered for an interface node. The manufacturer of this card
is Qlogic, OEM part number: FE0210302-13

Note: Cat 5e cable or better is required to support 1 Gb network speeds. Cat 6


provides better support for 1 Gbps network speeds.

The 10 GbE data-path connections support short reach (SR) 850 nanometer (nm)
multimode fiber (MMF) optic cables that typically can reliably connect equipment up
to a maximum of 300 meters (M) using 50/125µ (2000MHz*km BW) OM3 fiber.


2.1.2 Storage nodes


The storage node is a 2U server that connects the SONAS system to the InfiniBand cluster
and also directly connects to the fibre channel attachment on the SONAS storage controller.

Storage nodes are configured in high-availability (HA) pairs that are connected to one or two
SONAS storage controllers. Two of the onboard Ethernet ports connect the storage node to
the internal private management network, and two provide a NULL Ethernet connection to
the disk storage controller. All interface nodes can access all storage on all storage nodes.
All storage nodes can send data to any interface node. A SONAS system contains a
minimum of two storage nodes and a maximum of 60 when using 96-port InfiniBand switches
in the base rack. When using 36-port InfiniBand switches in the base rack, the maximum
number of storage nodes is 28.

The Storage node contains two redundant hot-swappable 300 GB 2.5-inch 10K RPM SAS
HDDs with mirroring between them for high-availability. The hard disk drives contain the
SONAS System Software product which hosts the operating system and all other software
needed for an operational SONAS system.

All of the PCIe x8 adapter slots in the storage node are already populated with adapters. Two
of the PCIe adapter slots are populated with two single-port 4X DDR InfiniBand HCAs for
attaching to the two InfiniBand switches in the SONAS system. The other two PCIe x8
adapter slots are populated with two dual-port 8 Gbps Fibre Channel Host Bus Adapters
(HBAs) for attaching to the SONAS Storage Controller.

A storage node contains the following:


򐂰 Two Intel Nehalem EP quad-core processors
򐂰 8GB of Double Data Rate (DDR3) memory
򐂰 Four onboard 10/100/1000 Ethernet ports:
– Two of which are used within the system for management features
– Two of which connect directly to Disk Storage Controllers.
򐂰 Two 300GB 2.5-inch Small Form Factor (SFF) 10K RPM SAS Slim-HS hard disk drives
with mirroring between the two HDDs (RAID 1) for high availability
򐂰 Four PCIe Gen 2.0 x8 adapter slots:
– The top two adapter slots each contain a single-port 4X DDR InfiniBand Host Channel
Adapter (HCA) card that connects to the SONAS InfiniBand fabric for use within the
system.
– The bottom two adapter slots each contain a dual-port 8 Gb/s Fibre Channel Host Bus
Adapter (HBA) for attaching to the SONAS Storage Controller.
򐂰 Integrated Baseboard Management Controller (iBMC) with an Integrated Management
Module (IMM)
򐂰 Two redundant hot-swappable power supplies
򐂰 Six redundant hot-swappable cooling fans

2.1.3 Management nodes


The Management node is a 2U server that provides a central point for the system
administrator to configure, monitor, and manage the operation of the SONAS cluster. The
management node supports both a browser-based graphical user interface (GUI) and a
command line interface (CLI). It also provides a System Health Center for monitoring the


overall health of the system. A single management node is required. The SONAS system will
continue to operate without a management node, but configuration changes can only be
performed from an active management node.

Note: SONAS administrators only interact with the storage and interface nodes directly
for debug purposes under the guidance of IBM service. You have no need to access the
underlying SONAS technology components for SONAS management functions, and no
need to directly access the interface or storage nodes.

The management node contains two hot-swappable 300GB 2.5-inch 10K RPM SAS hard disk
drives with mirroring between the two HDDs for high-availability. The hard disk drives contain
the SONAS System Software product, containing the operating system and all other software
needed for an operational SONAS system. A third hot-swappable 300GB 2.5-inch 10K
RPM SAS hard disk drive stores the logging and trace information for the entire SONAS
system.

Two of the PCIe x8 adapter slots are already populated with two single-port 4X Double Data
Rate (DDR) InfiniBand Host Channel Adapters (HCA). The two HCAs attach to two
independent InfiniBand switches in the SONAS system and interconnect the management
node to the other components of the SONAS system.

A Management node contains the following:


򐂰 Two Intel Nehalem EP quad-core processors
򐂰 32GB of Double Data Rate (DDR3) memory standard
򐂰 Four onboard 10/100/1000 Ethernet ports:
– Two of which are used to connect to your Internet Protocol (IP) network for health
monitoring and configuration.
– Two of which connect to the internal private management network within the SONAS
system for health monitoring and configuration
򐂰 Two 300GB 2.5-inch Small Form Factor (SFF) 10K RPM SAS Slim-HS hard disk drives
with mirroring between the two HDDs (RAID 1) for high availability
򐂰 One non-mirrored 300 GB 2.5-inch SFF 10K RPM SAS Slim-HS hard disk drive for
centralized collection of log files and trace data
򐂰 Four PCIe Gen 2.0 x8 adapter slots:
– Two of which each contain a single-port 4X Double Data Rate (DDR) InfiniBand Host
Channel Adapter (HCA) for use within the system
– Two of which are available for your use to add more adapters for host IP interface
connectivity
򐂰 Integrated Baseboard Management Controller (iBMC) with an Integrated Management
Module (IMM)
򐂰 Two redundant hot-swappable power supplies
򐂰 Six redundant hot-swappable cooling fans

The management node comes with all of the cables that are required to connect it to the
switches within the base rack. The management node is assumed to be in the SONAS base
rack with the two InfiniBand switches.


2.2 Switches
The SONAS system contains internal InfiniBand switches, internal Ethernet switches, and
external customer-supplied Ethernet switches.

2.2.1 Internal Infiniband switch


All major components of a SONAS system, such as the interface nodes, storage nodes, and
management node, are interconnected by a high-performance, low-latency InfiniBand 4X
Double Data Rate (DDR) fabric. Two redundant InfiniBand switches are included inside each
SONAS system. For small and medium configurations, a 1U 36-port 4X DDR InfiniBand
switch is available. For larger configurations, a 7U 96-port 4X DDR InfiniBand switch is
available. You can choose the larger InfiniBand switch but start with a basic configuration
with just the default 24-port InfiniBand line board in each switch; as demand for your system
grows, you can order additional 24-port line boards until you scale out the system to the
maximum of 96 InfiniBand ports in each switch. Two identical InfiniBand switches must be
ordered for a SONAS system, either two 36-port InfiniBand switches or two 96-port
InfiniBand switches.

Important: At the time of ordering the SONAS, you must choose between the smaller
36-port InfiniBand switch and the larger 96-port InfiniBand switch. It is not possible to field
upgrade the 36-port InfiniBand switch to the 96-port InfiniBand switch. Your only option
would be to purchase a new base rack with the 96-port InfiniBand switch, and the
migration would require a file system outage. It is important to take future growth in your
environment into consideration and order appropriately.

The SONAS InfiniBand network supports high bandwidth, low latency file system data and
control traffic among the nodes of the system. The data carried by the InfiniBand fabric
includes low level file system data, as well as the TCP/IP-based locking and management
messaging traffic. The locking and management traffic occurs on IP over InfiniBand, which
bonds a single IP address across all of the InfiniBand adapters. There are two switch sizes:
the total backplane bandwidth is 144 Gigabytes/sec for the smaller 36-port InfiniBand
switches, and 384 Gigabytes/sec for the larger 96-port InfiniBand switches.

Cabling within a rack is provided as part of the rack configuration and it is done by IBM. You
must order InfiniBand cable features for inter-rack cabling after determining the layout of your
multi-rack system.

36-port InfiniBand switch


The SONAS 36-port InfiniBand switch (2851-I36) is a 1U 4X DDR InfiniBand switch that
provides 36 QSFP ports that each operate at 20 gigabits per second (Gbps).

This InfiniBand switch provides a maximum backplane bandwidth of 1.44 terabits per second
(Tbps) and contains an embedded InfiniBand fabric manager.

The switch provides two redundant hot-swappable power supplies. The 36-port InfiniBand
switch (2851-I36) has no options or upgrades.

96-port InfiniBand switch


The SONAS 96-port InfiniBand switch (2851-I96) provides a 7U Voltaire ISR2004 96-port 4X
DDR Infiniband switch providing up to 96 4x DDR CX4 switch ports. The 96-port InfiniBand
switch is intended for large SONAS system configurations. The InfiniBand switch provides a
maximum backplane bandwidth of 3.84 Tbps. The 96-port InfiniBand switch comes standard
with the following:


򐂰 Two Voltaire sFB-2004 Switch Fabric boards


򐂰 One Voltaire sLB-2024 24-port 4X DDR InfiniBand Line Board
򐂰 One Voltaire sMB-HM Hi-memory Management board containing an embedded InfiniBand
fabric manager.
򐂰 Two sPSU power supplies
򐂰 All fan assemblies

The 96-port switch comes standard with one 24-port 4X DDR InfiniBand line board providing
24-port 4X DDR (20 Gbps) IB ports. Up to three additional sLB-2024 24-port 4X DDR
InfiniBand line boards may be added for a total of 96-ports.

The 96-port InfiniBand switch comes standard with two sFB-2004 Switch Fabric boards. Up to
two additional sFB-2004 Switch Fabric boards may be added to provide additional backplane
bandwidth.

The 96-port InfiniBand switch comes standard with two power supplies. Up to two additional
sPSU power supplies may be added for redundancy. The two standard power supplies are
capable of powering a fully configured 96-port InfiniBand switch with:
򐂰 Four sFB-2004 Switch Fabric boards
򐂰 Four sLB-2024 24-port 4X DDR line boards
򐂰 Two sMB-HM Hi-memory Management boards

You may upgrade the 96 port switch non-disruptively.

Infiniband backplane bandwidth:

Infiniband switches = 20 Gbit/sec per port (2 GBytes/sec per port)

Infiniband 36-port switch backplane = 1.44 Tbits/sec (144 GBytes/sec total)

Infiniband 96-port switch backplane = 3.84 Tbits/sec (384 GBytes/sec total)

The Infiniband switches have sufficient bandwidth capability to handle a fully configured
SONAS solution.

2.2.2 Internal private Ethernet switch


The management network is an internal network to the SONAS components. Each SONAS
rack has a pair of 50-port Ethernet switches, which are interconnected to form this network.
The management network is designed to carry low bandwidth messages in support of the
configuration, health, and monitoring of the SONAS subsystem. All major components of a
SONAS system like interface nodes, storage nodes, management nodes, infiniband switches
are connected to internal ethernet network via internal, private Ethernet switches.

The adapters connecting to the Ethernet switches are bonded with one another so that they
share a single IP address. This requires that the Ethernet switches in each rack be
interconnected with one another so that they form a single network.

For multi-site “sync” configurations, the management network needs to be extended between
the sites, connecting the internal switches from the base racks at each site.


The internal Ethernet switches cannot be shared with external customer connections. They
can be used only for internal communication within the SONAS cluster.

Internal IP addresses
During installation you can choose one of the IP address ranges listed below. The range you
select must not conflict with the IP addresses used for the customer Ethernet connections to
the Management Node(s) or Interface Nodes.

The available IP address ranges are:


򐂰 172.31.*.*
򐂰 192.168.*.*
򐂰 10.254.*.*

Integrated Management Module


The Integrated Baseboard Management Controller (iBMC) with an Integrated Management
Module (IMM) on all interface nodes, storage nodes, and the management node may be
connected to the internal management network. Additional details about the Baseboard
Management Controller can be found in IBM eServer xSeries and BladeCenter Server
Management, SG24-6495.

Ethernet cabling within a rack is provided as part of the rack and rack components order. You
must order inter-rack Ethernet cables to connect the Ethernet switches in a rack to the
Ethernet switches in the base rack.

2.2.3 External Ethernet switches


You have to provide an external TCP/IP infrastructure for external connections in your
installation (data and management). This infrastructure cannot be shared with the internal
SONAS Ethernet switches. The network infrastructure has to support 1 Gb/s or 10 Gb/s
Ethernet links, depending on the NIC/CNA installed in the interface nodes. These switches
and cables are not provided as part of the SONAS appliance.

All interface nodes have to be connected to your external network infrastructure for data
serving. The management node has to be connected to your external network infrastructure
for managing the operation of the SONAS cluster via the browser-based graphical user
interface (GUI) and the command line interface (CLI).

2.2.4 External ports - 1 GbE / 10 GbE


SONAS supports up to 30 interface nodes, which connect to the customer Ethernet network
over up to two 10 Gb/s ports or up to six 1 Gb/s ports per interface node. SONAS is designed
as a parallel grid architecture. Every node is a balanced modular building block, with
sufficient main memory and PCI bus capacity to provide full throughput with the adapters
configured to that node. Therefore, it is not possible, for example, for the 10 Gb/s Ethernet
card to overrun the PCI bus of the interface node, because the PCI bus has more capacity
than the 10 Gb/s Ethernet card. The InfiniBand switches have sufficient bandwidth capability
to handle a fully configured SONAS. This means that the customer can choose between
1 Gb/s and 10 Gb/s communication between SONAS clients and SONAS storage; the
internal SONAS network will not be a bottleneck in either case. The choice should depend on
the required speed between SONAS clients and SONAS storage and on the current network
infrastructure. By default, all 10 Gb Ethernet ports in interface nodes work in a bonding
configuration as active-backup interfaces (only one slave in the bond is active), and all 1 Gb
ports work in a load-balancing bonding configuration, so transfer rates even with 1 Gb/s
adapters can be sufficient in some cases (see “Bonding” on page 148).

2.3 Storage pods


Each storage pod consists of a pair of storage nodes and at least one storage controller with
high density storage (at least 60 disks). A pod may be expanded with a second storage
controller and with one storage expansion unit attached to each controller, each providing
60 additional disks. Each storage pod provides dual paths to all storage for reliability. Each
interface node and each storage pod operate in parallel with each other. There is no fixed
relationship between the number of interface nodes and storage pods.

Storage pods can be scaled, providing the ability to increase storage capacity and bandwidth
independently of the interface nodes. Capacity may be added to the system in two ways: by
adding disks to existing storage pods, or by adding storage pods. Each storage pod supports
a maximum of 240 hard disk drives.

Storage pod expansion


Figure 2-2 on page 50 shows two possible ways to expand a storage pod, and Figure 2-3 on
page 51 shows how to further expand the storage pod. A maximum of 30 storage pods are
supported, for a maximum of 7200 hard disk drives in a single system. Because SONAS
stripes data across all disks in the storage pool, storage pod performance may be increased
by adding more disks to the pod. The highest performance is achieved by adding a new
storage controller instead of expanding an existing one with a storage expansion unit. You
can mix SAS and SATA drives within a storage pod, but all the drives within an enclosure
have to be the same type.

(The figure shows a storage pod that starts with two storage nodes and one storage
controller drawer of 60 SATA or SAS disks, and the two possible ways to expand it: attaching
a disk storage expansion unit (2851-DE1) to the existing controller, giving approximately
1.85 times the performance of a 60-disk pod, or adding a second storage controller
(2851-DR1), giving approximately 2 times the performance of a 60-disk pod.)

Figure 2-2 Adding additional storage controllers and expansion units to storage pod


(The figure shows how partially expanded storage pods are grown further: a second storage
controller with a drawer of 60 drives can be added to either pod, and each pod is then
completed by adding disk storage expansion units until it contains two storage controllers
and two disk storage expansion units.)

Figure 2-3 Adding additional storage controllers and expansion units to storage pod.

Note: The storage within the IBM SONAS is arranged in storage pods, where each
Storage pod contains:
򐂰 Two storage nodes
򐂰 One or two high density storage controllers
򐂰 Zero, one or two high density disk Storage expansion units

2.3.1 SONAS storage controller


The high-density SONAS storage controller (2851-DR1) is a 4U enclosure that connects the
SONAS system to its storage. It contains dual redundant hot-swappable active/active RAID
controllers, dual redundant hot-swappable power supplies and cooling modules, and 60 hard
disk drives. Each SONAS storage controller supports up to 60 3.5-inch SAS or SATA
hard-disk drives.

The storage controller is configured to use either


򐂰 RAID 5 with 450 GB or 600 GB SAS hard-disk drives
򐂰 RAID 6 with 1 or 2 terabyte (TB) SATA hard-disk drives

All 60 disk drives in the storage controller must be the same type: either six 10-packs of 7.2K
RPM 1 TB SATA hard-disk drives, six 10-packs of 2 TB SATA drives, or six 10-packs of 15K
RPM 450 GB or 600 GB SAS drives. You cannot mix drive types or sizes within an enclosure.
A controller and its attached expansion unit can contain different disk types. You can order
one high-density disk storage expansion unit to attach to each storage controller.

Each SONAS storage controller supports up to four Fibre Channel host connections, two per
RAID controller. Each connection is auto-sensing and supports 2 Gb/s, 4 Gb/s, or 8 Gb/s.

Each RAID controller contains:


򐂰 4 GB of cache
򐂰 Two 8 Gbps fibre channel host ports
򐂰 One drive-side SAS expansion port

The storage controller supports RAID 1, 5, 6, and 10, but it is configured by default to use
only RAID 5 or RAID 6 arrays according to hard-disk drive type. Currently you cannot change
the predefined RAID levels. The storage controller's automatic drive failure recovery
procedures ensure that data integrity is maintained while operating in degraded mode. Both
full and partial (fractional) rebuilds are supported in the storage controller. Rebuilds are done
at the RAID array level. Partial rebuilds reduce the time needed to return the RAID array to
full redundancy. A timer begins when a disk in a RAID array is declared missing. If the disk
reappears before the timer expires, a fractional rebuild is done. Otherwise, the disk is
declared failed and replaced by a spare, and a full rebuild begins to return the array to full
redundancy. The default partial rebuild timer (Disk Timeout) setting is 10 minutes. The
controller supports values between 0 and 240 minutes, but currently only the default setting
is supported. Under heavy write workloads, it is possible that the number of stripes that need
to be rebuilt exceeds the system's internal limits before the timer expires; when this happens,
a full rebuild is started automatically instead of waiting for the partial rebuild timeout (see
Table 2-1 on page 52).
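
The rebuild decision just described can be summarized with the following conceptual Python
sketch. It is not the controller firmware; the stripe limit value and the function names are
hypothetical, and only the 10-minute default timer comes from the text.

# Conceptual sketch of the full-versus-partial rebuild decision described above.
# DISK_TIMEOUT_MINUTES is the documented default; STRIPE_LIMIT is a hypothetical
# placeholder for the controller's internal limit on stripes awaiting rebuild.

DISK_TIMEOUT_MINUTES = 10
STRIPE_LIMIT = 100_000

def rebuild_action(minutes_missing, disk_reappeared, stripes_to_rebuild):
    """Return the rebuild the controller would start for a missing disk."""
    if stripes_to_rebuild > STRIPE_LIMIT:
        return "full rebuild (stripe limit exceeded before the timer expired)"
    if minutes_missing < DISK_TIMEOUT_MINUTES:
        if disk_reappeared:
            return "partial (fractional) rebuild of the returning disk"
        return "waiting: Disk Timeout timer still running"
    return "full rebuild onto a spare (disk declared failed)"

print(rebuild_action(3, True, 5_000))      # partial rebuild
print(rebuild_action(12, False, 5_000))    # full rebuild onto a spare
print(rebuild_action(4, False, 250_000))   # full rebuild, limit exceeded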

Table 2-1 Configured and supported RAID arrays

  Disk drive type      RAID level   RAID arrays per controller   RAID            Total    Raw usable capacity
                                    or expansion unit            configuration   drives   (bytes)
  1 TB 7.2K RPM SATA   RAID 6       6                            8+P+Q           60       46 540 265 619 456
  2 TB 7.2K RPM SATA   RAID 6       6                            8+P+Q           60       93 956 704 567 296
  450 GB 15K RPM SAS   RAID 5       6                            8+P+Spare       60       20 564 303 413 248
  600 GB 15K RPM SAS   RAID 5       6                            8+P+Spare       60       27 419 071 217 664

Figure 2-4 shows the layout of drives in a SONAS storage controller.


Figure 2-4 SONAS Storage controller drive layout

2.3.2 SONAS storage expansion unit


The high-density disk storage expansion unit (2851-DE1) is a 4U enclosure containing
redundant connections, redundant power and cooling modules, and 60 hard disk drives. One
disk storage expansion unit can be attached to each storage controller. The storage
controller and the disk storage expansion unit support both high-performance 15K RPM SAS
disk drives and high-capacity 7.2K RPM SATA disk drives, but all the drives within an
enclosure have to be the same type.

2.4 Connection between components


This section describes the connections between the SONAS components.

2.4.1 Interface node connections


A single interface node has five 1-gigabit Ethernet (GbE) connections (ports) on the system
board. Two of the onboard Ethernet ports connect to the internal private management
network within the SONAS system for health monitoring and configuration. Two other
onboard Ethernet ports are used for connectivity to your external IP network for network file
serving, and one is used for connectivity to the integrated Baseboard Management Controller
(iBMC) with an Integrated Management Module (IMM), which enables the user to manage
and control the server. Two of the PCIe adapter slots are available for customer use to add
more adapters for host IP connectivity, so up to six additional Ethernet connections to the
customer TCP/IP data network are possible for each interface node: a 4-port Ethernet
adapter card feature provides four 1 GbE connections, and a 2-port Ethernet adapter card
feature provides two 10 GbE connections. You can have zero or one of each feature in a
single interface node. Table 2-2 on page 54 summarizes the possible data port configurations
for an interface node, and Figure 2-5 on page 54 shows the possible configurations of data
path connections (ports).


Table 2-2 Possible data port configurations

Number of ports in various configurations of a single interface node:

  Onboard 1 GbE data path   Feature Code 1100,          Feature Code 1101,            Total data path
  connectors (always 2      Quad-port 1 GbE Network     Dual-port 10 GbE Converged    connectors
  ports are available)      Interface Card (NIC)        Network Adapter (CNA)
  2                         0                           0                             2
  2                         0                           1 (with 2 ports)              4
  2                         1 (with 4 ports)            0                             6
  2                         1 (with 4 ports)            1 (with 2 ports)              8

Figure 2-5 Interface node connectivity

Interface Node rear view


Figure 2-6 on page 55 shows the rear view of an interface node.


Figure 2-6 Interface node

1. PCI slot 1 (SONAS single-port 4X DDR InfiniBand HCA)


2. PCI slot 2 (SONAS quad-port 1 GbE or dual-port 10 GbE feature for additional TCP/IP
data path connectors)
3. PCI slot 3 (SONAS single-port 4X DDR InfiniBand HCA)
4. PCI slot 4 (SONAS quad-port 1 GbE or dual-port 10 GbE feature for additional TCP/IP
data path connectors)
5. Ethernet 2 (SONAS GbE management network connector)
6. Ethernet 1 (SONAS GbE management network connector)
7. Ethernet 4 (TCP/IP data path connector)
8. Ethernet 3 (TCP/IP data path connector)
9. Integrated Baseboard Management Controller (iBMC) with an Integrated Management
Module (IMM)

Note: If you are using only Ethernet ports 3 (point 8) and 4 (point 7) for external network
connections to an interface node, then that daughter card is a single point of failure for that
one node. If an entire network card fails, the interface node with this network card is taken
offline, and the workload running on that interface node is failed over, by the SONAS
Software, to another interface node. Failure of a single interface node should therefore not
be a significant concern, although very small systems could see a performance impact.
For example, in a system with the minimum of two interface nodes, the workload on the
remaining interface node could double if one interface node fails.

2.4.2 Storage node connections


Two of the 1-gigabit Ethernet (GbE) connections go to the storage controllers that the
storage node uses, one to each controller; if only one storage controller is present, only one
of these cables is installed. The storage node also has an additional integrated Baseboard
Management Controller (iBMC) port (the service maintenance port) connected to one of the
GbE switches. Two of the four 8-gigabit Fibre Channel connections attach to each storage
controller; if only one storage controller is present, only two Fibre Channel cables are
present. Figure 2-7 on page 56 shows the physical connectivity for each card in a storage
node.


Figure 2-7 Storage node connectivity

2.4.3 Management node connections


The management node connects to the two Gigabit Ethernet (GbE) switches at the top of the
base rack and to the two InfiniBand switches in the base rack. Two of the onboard Ethernet
ports connect to the internal private management network within the SONAS system for
health monitoring and configuration. The other two onboard Ethernet ports connect to the
customer network for GUI and CLI access. The management node also has an additional
integrated Baseboard Management Controller (iBMC) port (the service maintenance port)
connected to one of the GbE switches in the base rack. Figure 2-8 on page 57 shows the
physical connectivity for each card in a management node.


Figure 2-8 Management node connectivity

2.4.4 Internal POD connectivity


Figure 2-9 on page 58 shows the internal connectivity within a storage pod. The two storage
nodes in a storage pod are connected to two different InfiniBand switches (InfiniBand fabrics)
within the SONAS rack and are configured as a high-availability pair. The two storage nodes
in the HA pair are directly attached through Fibre Channel links to one or two storage
controllers. Disk storage expansion units are directly attached to the storage controllers
through a 3 Gb/s SAS interface. All connections within a storage pod are redundant and
configured in a dual-fabric configuration, and all of them are installed by an IBM Customer
Engineer (CE).


Figure 2-9 Internal Storage pod cabling

2.4.5 Data Infiniband network


Figure 2-10 on page 59 shows all the internal InfiniBand connections within the SONAS
cluster.

There are two InfiniBand switches within SONAS, and all nodes (management, interface,
and storage) are connected to both switches. All SONAS nodes use a bonded IP address to
communicate with each other, which means that in case of a link failure, traffic is moved to
another available interface. The SONAS InfiniBand network carries file system data as well
as low-level control and management traffic. Expansion racks can be placed apart from each
other up to the length of the InfiniBand cables; currently the longest cable available is 50 m.


Figure 2-10 SONAS internal Infiniband connections

2.4.6 Management ethernet network


The management Ethernet network carries SONAS administration traffic such as monitoring
and configuration through the GUI and CLI. Figure 2-11 on page 60 shows all the internal
Ethernet connections within a SONAS cluster; the blue connections carry low-bandwidth
management messages and the green connections are for Integrated Management Module
connections. Internal Ethernet switches are always installed in all racks. All SONAS nodes
use a bonded IP address to communicate with each other, so in case of a link failure, traffic
is moved to another available interface.


Figure 2-11 SONAS internal Ethernet connections

2.4.7 Connection to the external customer network


Figure 2-12 on page 61 shows all the external Ethernet connections between the SONAS
cluster and your network. Interface nodes use a bonded IP address to communicate with
external SONAS clients, so in case of a link failure, traffic is moved to another available
interface. By default, interface nodes work in an active-backup configuration for 10 Gb ports
and in a load-balancing configuration for 1 Gb ports. Currently the management node does
not offer IP bonding for external administrator connectivity (refer to 4.3, “Bonding” on
page 148).


Figure 2-12 SONAS external customer’s Ethernet connections

2.5 Different SONAS configurations available


In this section we will look at the available SONAS configurations.

2.5.1 Rack types - how to choose correct rack for your solution
A SONAS system can consist of one or more racks, into which the components of the system
are installed. A 42U enterprise class rack is available. Note that installation of SONAS
components in customer-supplied racks is not permitted.

The rack may have two or four power distribution units (PDU) mounted inside of it. The PDUs
do not consume any of the rack’s 42U of space. The first pair of PDUs is mounted in the lower
left and lower right sidewalls. The second pair of PDUs is mounted in the upper left and upper
right sidewalls. The rack supports either Base PDUs or Intelligent PDUs (iPDUs). The iPDUs
can be used with the Active Energy Manager component of IBM Systems Director to monitor
the energy consumption of the components in the rack. When installed in the rack, the iPDUs
are designed to collect energy usage information on the components in the rack and report
the information to the IBM Active Energy Manager over an attached customer-provided local
area network (LAN). Using iPDUs and IBM Systems Director Active Energy manager, you
can gain a more complete view of energy used with the datacenter.

There are three variations of the SONAS rack:


Base rack
The Scale Out Network Attached Storage (SONAS) system always contains a base rack that
contains the management node, the InfiniBand switches, a minimum of two interface nodes,
and a keyboard, video, and mouse (KVM) unit. The capacity of the SONAS system that you
order affects the number of racks in your system and the configuration of the base rack.
Figure 2-13 on page 62 shows the three basic SONAS racks.

Figure 2-13 SONAS base rack options

There are three available options of the SONAS base rack: 2851-RXA feature code 9003,
9004 and 9005. Your first base rack will depend on how you are going to scale out the
SONAS system in the future.

SONAS base rack feature code 9003 (left rack in Figure 2-13)
This base rack has two Gigabit Ethernet (GbE) switches located at the top of the rack, a
management node, two smaller 36-port InfiniBand switches, and at least two interface nodes.
There is no storage in this rack, so you have to add storage in additional storage expansion
racks. You can also add interface expansion racks to the base rack. In this rack you are
limited to 2x36 InfiniBand ports; these switches cannot be expanded or exchanged. The rack
can be expanded with a maximum of 14 additional interface nodes.


Note:
򐂰 The rack must have two 50-port 10/100/1000 Ethernet switches for internal IP
management network.
򐂰 The rack must have at least one Management node installed.
򐂰 The rack must have a minimum of two Interface Nodes, with the rest of the
interface node bays being expandable options for a total of 16 interface nodes.

SONAS base rack Feature Code 9004 (middle rack on Figure 2-13)
This base rack has two Gigabit Ethernet (GbE) switches at the top, a management node, two
larger 96-port InfiniBand switches, and at least two interface nodes. There is no storage in
this rack, so you have to add storage in storage expansion racks. You can also add interface
expansion racks. The rack can be expanded with a maximum of eight additional interface
nodes. This rack allows you to scale to the maximum configuration, because it has the large
96-port InfiniBand switches installed.

Note:
򐂰 The rack must have two 50-port 10/100/1000 Ethernet switches for internal IP
management network.
򐂰 The rack must have at least one Management node installed.
򐂰 The rack must have a minimum of two Interface Nodes, with the rest of the
interface node bays being expandable options for a total of 10 interface nodes.

SONAS base rack Feature Code 9005 (right rack on Figure 2-13)
This base rack has two Gigabit Ethernet (GbE) switches at the top, a management node, two
smaller 36-port InfiniBand switches, at least two interface nodes, and at least one storage
pod, which consists of two storage nodes and a storage controller.

Note:
򐂰 The rack must have two 50-port 10/100/1000 Ethernet switches for internal IP
management network.
򐂰 The rack must have at least one Management node installed.
򐂰 The rack must have a minimum of two Interface Nodes, with the rest of the
interface node bays being expandable options.
򐂰 The rack must have two Storage Nodes
򐂰 The rack must have a minimum of one Storage Controller, and can be extended up
to a total of two disk storage expansion units and two storage controllers.

Interface Expansion Rack


The IBM SONAS interface expansion rack extends the number of interface nodes in an
already existing base rack by providing up to 20 additional interface nodes. The total number
of interface nodes cannot exceed 30. Two 50-port 10/100/1000 Ethernet switches and at
least one interface node are mandatory in each interface expansion rack. Figure 2-14 on
page 64 shows the interface expansion rack.


Figure 2-14 Interface Expansion Rack

Note:
򐂰 The rack must have two 50-port 10/100/1000 Ethernet switches for internal IP
management network.
򐂰 The rack must have a minimum of one Interface Node, with the rest of the
interface node bays being expandable options for a total of 20 interface nodes.

Storage expansion
The storage expansion rack extends the storage capacity of an already existing base rack by
adding up to four additional storage nodes, up to four additional storage controllers, and up
to four disk storage expansion units. The eight possible disk storage expansion and controller
units can hold a maximum of 480 hard-disk drives. Up to 30 storage pods can exist in a
SONAS system. Figure 2-15 on page 65 shows the storage expansion rack.


Figure 2-15 Storage expansion rack


Note:
򐂰 The rack must have two 50-port 10/100/1000 Ethernet switches for internal IP
management network.
򐂰 The rack must have two Storage Nodes
򐂰 The rack must have a minimum of one Storage Controller and may be expanded
by:
• Add a disk storage expansion unit #1.2 to the first storage controller #1.1.
• Add a 2nd storage controller #2.1 to the first storage pod.
• Add a disk storage expansion unit #2.2 to the second storage controller in
the 1st storage pod, if the first storage controller #1.1 also has a disk
storage expansion unit.
• Add the start of a second storage pod, which include two storage nodes
(#3 and #4) and another storage controller #3.1 attached to these storage
nodes.
• Add a disk storage expansion unit #3.2 to storage controller #3.1.
• Add a second Storage controller #4.1 to the second storage pod.
• Add a disk Storage expansion unit #4.2 to storage controller #4.1, if
storage controller #3.1 also has a disk storage expansion unit.

Power limitations affect the number of SAS drives per rack

At the current time, SONAS does not yet support a 60A service option, which can limit the
total amount of hardware that can be installed in the expansion rack (2851-RXB).

Because of the power consumption of a storage controller (MTM 2851-DR1) fully populated
with sixty (60) 15K RPM SAS hard disk drives and of a disk storage expansion unit (MTM
2851-DE1) fully populated with sixty (60) 15K RPM SAS hard disk drives, a storage
expansion rack (2851-RXB) is limited to a combined total of six storage controllers and disk
storage expansion units when they are fully populated with 15K RPM SAS hard disk drives.

2.5.2 Drive types - how to choose between different drive options


The lowest hardware storage component of SONAS is the physical disk. Disks are grouped
in sets of 10, and each set consists of a single type of physical disk drive. An enclosure holds
either six 10-packs of 7.2K RPM 1 TB SATA hard-disk drives, six 10-packs of 2 TB SATA
drives, or six 10-packs of 15K RPM 450 GB or 600 GB SAS drives. You cannot mix drive
types or sizes within an enclosure.

The SONAS storage controller uses RAID 5 with 450 GB or 600 GB SAS hard-disk drives
and RAID 6 with 1 TB or 2 TB SATA hard-disk drives. The entire capacity of a RAID array is
mapped into a single LUN, which is mapped to all hosts. Each LUN is presented across a
Fibre Channel interface and detected as a multipath device on each storage node. The
SONAS code assigns a unique multipath alias to each LUN based on its WWID.

SAS disk drives require more power than SATA drives, so each storage expansion rack can
hold up to 360 450 GB SAS drives or up to 480 1 TB or 2 TB SATA drives (see Figure 8-4 on
page 247). SATA drives are always configured within a storage controller or storage
expansion unit as RAID 6 arrays. There are eight data drives and two parity drives per array,
which means that a storage controller or storage expansion unit holds 48 data drives. SAS
drives are always configured within a storage controller or storage expansion unit as RAID 5
arrays. There are eight data drives, one parity drive, and one spare drive per array, which
again means that a storage controller or storage expansion unit holds 48 data drives.
Table 2-3 on page 67 shows a summary of the possible configurations; a capacity sketch
follows the table. These configurations are predefined and cannot be changed.

Table 2-3 Drive types configuration summary

  Drive type   Drive capacity   RAID array   Total drives   Data drives   Parity drives   Spare drives
  SATA         1 or 2 TB        RAID 6       60             48            12              0
  SAS          450 GB           RAID 5       60             48            6               6
  SAS          600 GB           RAID 5       60             48            6               6
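
As a rough illustration of the arithmetic behind these layouts, the following Python sketch
computes the data-drive count and nominal usable capacity per 60-drive enclosure. It uses
nominal drive capacities, so the results are slightly higher than the formatted capacities listed
in Table 2-1, and the helper is only an illustration.

# Illustrative sketch: data drives and nominal usable capacity per 60-drive
# enclosure (storage controller or disk storage expansion unit).

ARRAYS_PER_ENCLOSURE = 6   # six 10-drive RAID arrays per enclosure

LAYOUTS = {
    # layout name: (data, parity, spare) drives per 10-drive RAID array
    "SATA RAID 6 (8+P+Q)":    (8, 2, 0),
    "SAS RAID 5 (8+P+spare)": (8, 1, 1),
}

def enclosure_capacity(layout, drive_capacity_tb):
    """Return (data drives, nominal usable TB) for one 60-drive enclosure."""
    data, parity, spare = LAYOUTS[layout]
    data_drives = ARRAYS_PER_ENCLOSURE * data
    return data_drives, data_drives * drive_capacity_tb

print(enclosure_capacity("SATA RAID 6 (8+P+Q)", 1.0))      # (48, 48.0)  ~48 TB nominal
print(enclosure_capacity("SATA RAID 6 (8+P+Q)", 2.0))      # (48, 96.0)
print(enclosure_capacity("SAS RAID 5 (8+P+spare)", 0.45))  # (48, 21.6)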

Generally speaking, SAS drives have a smaller seek time, a larger data transfer rate, and a
higher Mean Time Between Failures (MTBF) than the cheaper but higher-capacity SATA
drives. The SONAS internal storage performs disk scrubbing, as well as isolation of failed
disks for diagnosis and attempted repair. The storage can conduct low-level formatting of
drives, power-cycle individual drives if they become unresponsive, correct data using
checksums on the fly, rewrite corrected data back to the disk, and use smart diagnostics on
SATA disks to determine whether the drives need to be replaced.

SONAS supports a drive intermix: it is possible to have, within the same storage pod, a
storage enclosure with high-performance SAS disks and another storage enclosure with
high-capacity SATA disks. Non-enterprise-class application data or rarely used data can be
automatically migrated within SONAS from the faster but smaller and more expensive SAS
disks to the slower but larger and cheaper SATA disks.

2.5.3 External ports - 1 GbE / 10 GbE


SONAS supports up to 30 interface nodes, each of which connects to the customer Ethernet
network for data traffic through up to two 10 Gb/s ports or up to six 1 Gb/s ports, as described
in 2.2.4, “External ports - 1 GbE / 10 GbE”. Because every node is a balanced modular
building block, with sufficient main memory and PCI bus capacity to provide full throughput
with the adapters configured to that node, and because the InfiniBand switches have
sufficient bandwidth to handle a fully configured SONAS, the internal SONAS network will not
be a bottleneck with either choice. The choice between 1 Gb/s and 10 Gb/s should depend
on the required speed between SONAS clients and SONAS storage and on your current
network infrastructure. By default, all 10 Gb Ethernet ports in interface nodes are bonded as
active-backup interfaces (only one slave in the bond is active), and all 1 Gb ports are bonded
in a load-balancing configuration, so the achievable transfer rate even with 1 Gb/s adapters
can be sufficient in some cases.


2.6 SONAS with XIV Storage overview


IBM offers a specialized version of the IBM Scale Out NAS (SONAS), which uses the IBM
XIV® Storage System as the storage. This section describes how the specialized SONAS
configuration attaches XIV storage and outlines general configuration considerations.

IBM SONAS with XIV storage is a specialized SONAS configuration, available under special
bid only. This offering modifies the SONAS base rack to attach only XIV storage.

In this configuration, the SONAS will have one storage pod, with two Storage Nodes, without
integrated SONAS storage. To these two storage nodes, you may attach one or two external
XIV storage systems. The XIV storage systems provide the necessary usable disk storage for
the SONAS file systems.

The XIVs may be shared with other Fibre Channel or iSCSI hosts that are attached to the
XIVs, provided that all LUNs allocated to SONAS are hard allocated (no thin provisioning).

All of the normal functionality of the SONAS system will be available when XIV is used as the
back-end storage systems, which includes:
򐂰 Network file serving via NFS, CIFS, FTP, and HTTPS.
򐂰 Quotas
򐂰 Snapshots
򐂰 TSM backup and recovery
򐂰 Information Lifecycle Management (ILM) and Hierarchical Space management (HSM)

Note: Because no intermix with regular SONAS integrated storage is supported, and
because XIV only supports SATA disk, SONAS HSM is limited in the sense that no tiered
disk storage pooling is necessary.

However, if desired, multiple logical storage pools can be defined for management
purposes. SONAS HSM can be used as normal for transparent movement of data out to
external storage, such as tape, tape libraries, or data de-duplication devices.

2.6.1 Differences between SONAS with XIV and standard SONAS system
The SONAS with XIV will be similar to a standard SONAS system, with the following
exceptions:
򐂰 It will be limited to the SONAS Base Rack (2851-RXA) Configuration #3 (FC 9005), with
two 36-ports InfiniBand switches (2851-I36), a management node (2851-SM1),
Keyboard-video-mouse (KVM) unit and two 10/100/1000 Level 2 50-port Ethernet
switches
򐂰 It can have between two and six SONAS interface nodes (2851-SI1) within the base rack.
򐂰 It will have one pair of storage nodes (2851-SS1) within the base rack
򐂰 The SONAS Software Management GUI and Health Center do not provide monitoring or
management functionality with regards to the XIV, or the SAN switch(es) by which it is
connected.

The requirement for built-in SONAS storage is removed as part of this specialized
configuration. This specialized configuration of SONAS does not support a mixed storage
environment; there cannot be a combination of internal SONAS storage and external XIV
storage.

2.6.2 SONAS with XIV configuration overview


The SONAS with XIV offering will allow a SONAS system to be ordered without any SONAS
Storage controllers (2851-DR1) and without any Disk Storage expansion units (2851-DE1). In
other words, this will be a SONAS Storage pod without any integrated SONAS storage. The
Storage pod will consist of two standard SONAS storage nodes (2851-SS1), ordered and
placed into a SONAS rack 2851-RXA.

The SONAS with XIV storage pod, connecting to XIV storage, requires the customer to
supply:
򐂰 Two external SAN switches currently supported by XIV for attachment, such as (but not
limited to) the IBM SAN24B 24-port 8Gbps Fibre Channel (FC) switches (2498-B24)
򐂰 One or two XIV storage systems. Existing XIVs can be used, as long as they meet the
minimum required XIV microcode level

Figure 2-16 on page 69 shows a high-level representation of a single storage pod connecting
to XIV storage.

Figure 2-16 SONAS storage pod attachment to XIV storage


2.6.3 SONAS base rack configuration when used with XIV storage
Figure 2-17 on page 70 shows the maximum configuration of the SONAS Base Rack
(2851-RXA) when ordered with specify code #9006 (indicating configuration #3) and the
SONAS i-RPQ number #8S1101.

Note that to mitigate tipping concerns, the SONAS interface nodes will be moved to the
bottom of the rack. Also note that components that are not part of the SONAS appliance
(including SAN switches) may not be placed in the empty slots.

Figure 2-17 Maximum configuration for SONAS base rack for attaching XIV storage

2.6.4 SONAS with XIV configuration and component considerations


The SONAS with XIV offering requires that you purchase one or two XIV systems, as well as
a pair of XIV-supported SAN switches. This is in addition to the SONAS base rack order
shown above in Figure 2-17 on page 70.

When all of these components have been delivered to your site, the IBM Customer Engineer
(CE) service representative will connect, power up, and perform the initial configuration of
the SONAS and XIV components. SAN switches must follow the normal setup rules for that
SAN switch.


This specialized SONAS configuration is available for original order plant manufacture only. It
is not available as a field MES.

The SAN switches must be mounted in a customer supplied rack and may not be mounted in
the SONAS rack. The open slots in the SONAS base rack will be covered with filler panels for
aesthetic and airflow reasons. Components that are not part of the SONAS (including SAN
switches) may not be placed in the empty slots.

One or two XIV storage systems may be attached. Any model of XIV storage may be used.
All available XIV configurations starting from 6 modules are supported. The firmware code
level on the XIV must be version 10.2 or higher.

Larger SAN switches, such as the SAN40B, may also be used. Any switch on XIV’s
supported switch list may be used, provided there are sufficient open, active ports with SFPs
to support the required connectivity. Connectivity for external block device users sharing the
XIV(s) is beyond the scope of this specialized offering, and must be planned/set up by the
end user so as not to interfere with the connectivity requirements for the IBM SONAS and
XIV.

The following SONAS file system settings have been tested and are intended to be used:
򐂰 256K Block Size
򐂰 Scatter Block Allocation
򐂰 One failure group if only one XIV is present
򐂰 Two failure groups (supports metadata replication requirement), if two XIVs are present.
򐂰 Metadata replication: if two XIVs are present, the SONAS file system metadata will be
replicated across XIV systems.

It is supported to add more storage to the XIV system, provision LUNs on that additional
storage, and have the LUNs recognized by the SONAS system and made available to be
added to an existing file system or a new file system in the SONAS system.

It is supported to share the XIV system(s) between SONAS and other block storage
applications, provided that the LUNs allocated to SONAS are hard allocated (no thin
provisioning).

While the current specialized offering only supports one storage pod with one pair of storage
nodes, and only supports one or two XIV systems, this is not an architectural limitation. It is
only a testing and support limitation.

IBM requires the use of IBM Services to install this specialized configuration. This assures
that the proper settings and configuration are done on both the XIV and the SONAS for this
offering.

The intent of this specialized SONAS configuration offering is to allow existing or aspiring
users of the IBM XIV Storage System to attach XIV to an IBM SONAS.


Chapter 3. Software architecture


This chapter provides a description of the software architecture, operational characteristics,
and components of the IBM Scale Out NAS (SONAS) software. We will review the design and
concepts of the SONAS Software licensed program product that operates the SONAS
parallel clustered architecture. We provide an overview of the SONAS Software functionality
stack, the file access protocols, the SONAS Cluster Manager, the parallel file system, the
central policy engine, the scan engine, automatic tiered storage, workload allocation,
availability, administration, snapshots, asynchronous replication, and the system
management services.

This chapter is an excellent place to gain an overview of all of these SONAS Software
concepts, and it provides the base knowledge for the more detailed discussions of these
topics in subsequent chapters.


3.1 SONAS Software


The functionality of IBM SONAS is provided by IBM SONAS Software (5639-SN1). Each
node of the IBM SONAS is licensed and pre-installed with one copy of SONAS Software.
According to the role of the node (interface node, storage node, or management node), the
appropriate functions are invoked from the common software code load. Each node
and each copy of SONAS Software operates together in a parallel grid cluster architecture,
working in parallel to provide the functions of the IBM SONAS.

SONAS Software provides multiple elements and integrated components that work together
in a coordinated manner to provide the functions shown in Figure 3-1. In this chapter we give
an overview of each of these components:

(The figure shows the layers of the SONAS Software stack: the CIFS, NFS, FTP, and HTTPS
file access protocols, with room for future protocols, on top of the SONAS Cluster Manager
and the parallel file system, surrounded by HSM and ILM, backup and restore, snapshots
and replication, the policy engine and scan engine, monitoring agents, GUI/CLI management
interfaces, and security, all running on Enterprise Linux on IBM servers.)

Figure 3-1 SONAS software functional components

This chapter has the following sections:


򐂰 Describe the SONAS data access layer: the CIFS, NFS, FTP, HTTPS file protocols
򐂰 Describe the SONAS Cluster Manager for workload allocation and high availability
򐂰 Describe SONAS authentication and authorization
򐂰 Describe the SONAS data repository layer: the parallel clustered file system
򐂰 Describe details of SONAS data management services
– Automated data placement and management: Information Lifecycle Management
(ILM) and Hierarchical Storage Management (HSM)


– Backup and restore data protection and Hierarchical Storage Management (HSM),
using integration with Tivoli Storage Manager (TSM), discussed in “SONAS and Tivoli
Storage Manager integration” on page 118
– Snapshots for local data resiliency
– Remote async replication for remote recovery
򐂰 Describe the SONAS system management services
– GUI, Health Center, CLI, and management interfaces
– Monitoring agents, security, and access control lists

In this chapter, we will review the functions of each of the SONAS Software components as
shown in Figure 3-1 on page 74, starting at the top and working our way down.

3.2 SONAS data access layer - file access protocols


We begin by examining the SONAS data access layer, and the file access protocols that are
currently supported, as shown in Figure 3-2 on page 75:

(The figure highlights the file access protocol layer of the SONAS Software stack, that is,
CIFS, NFS, FTP, and HTTPS, which sits between the network storage users on the Ethernet
network and the SONAS Cluster Manager.)

Figure 3-2 SONAS Software - file access protocols

The network file access protocols supported by SONAS today are CIFS, NFS, FTP, and
HTTPS. These file access protocols provide the mapping of client file requests onto the
SONAS parallel file system; the file requests are translated from the network file access
protocol to the SONAS native file system protocol. The SONAS Cluster Manager provides
cross-node and cross-protocol locking services for the file serving functions in CIFS, NFS,
FTP, and HTTPS. The CIFS file serving function maps CIFS semantics and security onto the
POSIX-based parallel file system with native NFSv4 Access Control Lists.


Following this section, we discuss the role the SONAS Cluster Manager plays in concurrent
access to a file from multiple platforms (for example, concurrent access to a file from both
CIFS and NFS). For additional information about creating exports for the file sharing
protocols, refer to “Creating and managing exports” on page 369.

3.2.1 File export protocols: CIFS


SONAS CIFS support has been explicitly tested for data access from clients running
Microsoft Windows (2000, XP, Vista 32-bit, Vista 64-bit, Windows 2008 Server), Linux with
SMBClient, Mac OS X 10.5, and Windows 7.

The base SONAS file system is a full POSIX-compliant file system, i.e. a Unix/Linux-style file
system. SONAS communicates with CIFS clients and Microsoft Windows clients by emulating
CIFS Windows file system behavior over this POSIX-compliant SONAS file system. For
Windows clients, the SONAS system maps Unix/Linux Access Control Lists (ACLs) to
Windows security semantics.

A multitude of file access concurrency and cross-platform mapping functions are performed
by the SONAS Software, especially in the SONAS Cluster Manager. This support includes
the following capabilities, which allow Windows users to interact transparently with the
SONAS file system:
򐂰 The full CIFS data access and transfer capabilities are supported with normal locking
semantics.
򐂰 User authentication is provided through Microsoft Active Directory or through LDAP
򐂰 NTFS Access Control Lists (ACLs) are enforced on files and directories; they can be
modified using the standard Windows tools
򐂰 Semi-transparent failover, if the CIFS application supports network retry1
򐂰 Consistent central Access Control List (ACL) enforcement across all platforms: ACLs are
enforced on files and directories, and they can be modified (with proper authority and
ownership) using the standard Windows tools
򐂰 Supports the win32 share modes for opening and creating files
򐂰 Supports case insensitive file lookup
򐂰 Support for DOS attributes on files and directories
򐂰 Archive bit, ReadOnly bit, System bit, other semantics not requiring POSIX file attributes
򐂰 MS-DOS / 16 bit Windows short file names
򐂰 Supports generation of 8.3 character file names
򐂰 Notification support of changes to file semantics to all clients in session with the file
򐂰 Provides consistent locking across platforms by supporting mandatory locking
mechanisms and strict locking
򐂰 Opportunistic locks and Leases are supported, supports lease management for enabling
client side caching
򐂰 Off-line or de-staged file support
– Windows files that have been de-staged to external tape storage, using the SONAS
Hierarchical Storage Management function through Tivoli Storage Manager, will be
displayed as off-line within the Windows Explorer because they are marked with an
hourglass symbol, the off-line bit.
– Users and applications can see in advance that a file is off-line
– Recall to disk is transparent to the application, so no additional operation beside the
file open is needed.
– Directory browsing using the Windows Explorer supports file property display without
the need to recall off-line or migrated files
򐂰 Support of SONAS Snapshot™ integrated into the Windows Explorer VSS (Volume
Shadow Services) interface, allowing users with proper authority to recall files from
1 See 3.3.3, “Principles of interface node failover and failback” on page 83 for details


SONAS Snapshots. This file version history support is for versions created by SONAS
Snapshots
򐂰 The standard CIFS timestamps are made available:
– Created Time stamp: It is the time when the file is created in the current directory.
When the file is copied to a new directory, a new value will be set.
– Modified Time stamp: It is the time when the file is last modified. When the file is copied
to elsewhere, the same value will be carried over to the new directory.
– Accessed Time stamp: It is the time when the file is last accessed. This value is set by
the application program that sets or revises the value (this is application dependent -
unfortunately some applications do not revise this value)

3.2.2 File export protocols: NFS


NFSv2 and NFSv3 are supported by SONAS, and any standard NFS client can be used.
NFSv4 is not currently supported by SONAS; this is a known requirement. The following
characteristics apply to the NFS exports:
򐂰 Normal NFS data access functions are fully supported with NFS consistency guarantees
򐂰 Authorization and ACLs are supported
򐂰 Supports client machine authorization through NFS host lists
򐂰 Supports enforcement of Access Control Lists (ACLs). Supports reading and writing of the
standard NFSv3 / POSIX bits
򐂰 Supports the NFSv3 advisory locking mechanism
򐂰 Semi-transparent node failover (the application must support network retry)2

Note that the SONAS Software file system implements NFSv4 Access Control Lists (ACLs)
for security, regardless of the actual network storage protocol used. This provides the
strength of the NFSv4 ACLs even to clients that access SONAS via the NFSv2, NFSv3,
CIFS, FTP, and HTTPS protocols.

Do not mount the same NFS export on one client from two different SONAS interface nodes,
because data corruption might occur, and do not mount the same export twice on the same
client.

2 See 3.3.3, “Principles of interface node failover and failback” on page 83 for details

3.2.3 File export protocols: FTP


SONAS provides FTP support from any program that supports the FTP protocol; a brief
access example follows the list below. The following characteristics apply:
򐂰 Data transfer to and from any standard FTP client
򐂰 Supports user authentication through Microsoft Active Directory and through LDAP
򐂰 Supports enforcement of Access Control Lists (ACLs) and retrieval of POSIX attributes.
ACLs cannot be modified using FTP, because there is no support for the chmod command
򐂰 On node failover, SONAS supports FTP resume for applications that support network retry
򐂰 Characters for file names and directory names are UTF-8 encoded
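
As a hedged illustration of basic FTP access to a SONAS export, the following Python sketch
lists a directory and retrieves a file with the standard ftplib module. The host name,
credentials, and paths are hypothetical placeholders, not values from a real system.

# Illustrative sketch: retrieve a file from a SONAS FTP export using ftplib.
# Host, credentials, and paths are hypothetical examples.
from ftplib import FTP

HOST = "sonas.example.com"       # hypothetical interface node address

with FTP(HOST) as ftp:
    ftp.login(user="aduser", passwd="secret")   # authenticated through AD or LDAP
    ftp.cwd("/shares/projects")                 # hypothetical export path
    print(ftp.nlst())                           # list directory contents
    with open("report.dat", "wb") as out:
        ftp.retrbinary("RETR report.dat", out.write)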

3.2.4 File export protocols: HTTPS


SONAS supports simple read-only data access to files through the HTTPS protocol from any
web browser. All web downloads from SONAS are via HTTPS; if you try to connect to
SONAS via HTTP, you are automatically redirected to an HTTPS connection. The reason for
this design is security, because a secure logon mechanism must be provided for access
authorization. The following features are supported through this protocol:
򐂰 Read access to appropriately formatted files
򐂰 Supports enforcement of Access Control Lists (ACLs); ACLs cannot be modified or
viewed using this protocol
򐂰 Supports user authentication through Microsoft Active Directory or LDAP
򐂰 If a node fails during a file transfer, the transfer is cancelled and must be retried at another
node. Partial retrieve is supported, which minimizes duplicate transfers in a failover
situation
򐂰 Characters for file names and directory names are UTF-8 encoded

The Apache daemon provides HTTPS access to the SONAS file system. SONAS supports
secure access only, so the credentials are always SSL encrypted. SONAS uses HTTP
aliases as a vehicle to emulate the share concept; for example, share XYZ is accessible
via https://server.domain/XYZ.
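
To illustrate read-only HTTPS access through a share alias, the following sketch downloads
a file with Python's standard library. The server name, share alias, file name, and credentials
are hypothetical, and the basic-authentication handler shown is a generic example; the
actual logon mechanism depends on your SONAS and directory configuration.

# Illustrative sketch: read-only download over HTTPS from a SONAS share alias.
# Server name, share alias, file name, and credentials are hypothetical examples.
import urllib.request

url = "https://sonas.example.com/XYZ/reports/summary.txt"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://sonas.example.com", "aduser", "secret")
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

with opener.open(url) as response:          # HTTP requests are redirected to HTTPS
    data = response.read()
print(len(data), "bytes downloaded")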

Note that WebDAV3 and the REST4 API are not supported at this time in SONAS; they are
known requirements.

3.2.5 SONAS Locks and Oplocks


POSIX byte-range locks set by NFS clients are stored in the SONAS file system, and
Windows clients accessing the cluster using CIFS honor these POSIX locks. Mapping of
CIFS locks to POSIX locks is updated dynamically on each locking change.

Unless the applications specifically know how to handle byte-range locks on a file or are
architected for multiple concurrent writes, concurrent writes to a single file are not desirable in
any operating system.

To maintain data integrity, locks are used to guarantee that only one process can write to a
file, or to a byte range in a file, at a time. Although older operating systems and file systems
locked the entire file, newer ones such as SONAS allow a range of bytes within a file to be
locked. Byte-range locking is supported for both CIFS and NFS, but it does require the
application to know how to exploit this capability; byte-range locks are handled by the
SONAS parallel file system.
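
As a hedged illustration of POSIX byte-range locking as an NFS client might use it, the
following Python sketch locks a 4 KB region of a file on a mounted export. The mount path is
a hypothetical example, and the lock shown is an advisory POSIX lock of the kind discussed
above.

# Illustrative sketch: take an advisory POSIX byte-range lock on a file that
# resides on a mounted SONAS NFS export. The mount point is a hypothetical example.
import fcntl
import os

path = "/mnt/sonas_export/shared/data.bin"   # hypothetical NFS mount

fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    # Lock bytes 0..4095 exclusively; other cooperating processes (local or on
    # other NFS clients) that honor advisory locks will block or fail here.
    fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 0, os.SEEK_SET)
    os.write(fd, b"update protected by a byte-range lock")
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0, os.SEEK_SET)
    os.close(fd)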

If another process attempts to write to a file, or a section of one, that is already locked, it will
receive an error from the operating system and will wait until the lock is released.

SONAS Software supports the standard DOS and NT file system (deny-mode) locking
requests, which allow only one process to write to an entire file on a server at a given time,
as well as byte-range locking. In addition, SONAS Software supports the Windows locking
known as opportunistic locking, or oplock.

CIFS byte-range locks set by Windows clients are stored both in the SONAS interface node
cluster-wide database, and by mapping to POSIX byte-range locks in the SONAS file system.
This mapping ensures that NFS clients see relevant CIFS locks as POSIX advisory locks,
and NFS clients honor these locks.

3
WebDAV is "Web-based Distributed Authoring and Versioning"
4 REST API is "Representational State Transfer”


3.3 SONAS Cluster Manager


Next, we will examine the SONAS Cluster Manager, as shown in Figure 3-3 on page 79. The
SONAS cluster manager is a core SONAS component that coordinates and orchestrates
SONAS functions.

(The figure highlights the SONAS Cluster Manager layer of the SONAS Software stack,
which sits between the file access protocols and the parallel file system with its policy
engine, scan engine, and data management services.)

Figure 3-3 SONAS Cluster Manager

The SONAS Cluster Manager has these responsibilities:


1. Coordinates the mapping of the different file sharing protocols onto the SONAS parallel
file system. The CIFS file serving function maps CIFS semantics and security onto the
POSIX-based parallel file system and NFSv4 Access Control Lists.
2. Provides the clustered implementation and management of the interface nodes, including
tracking and distributing record updates across the interface nodes in the cluster.
3. Control of the interface nodes in the cluster. The SONAS Cluster Manager controls the
public IP addresses used to publish the NAS services, and moves them as necessary
between nodes. Via monitoring scripts, the SONAS Cluster Manager monitors and
determines the health state of each individual interface node. If an interface node has
problems, such as hardware failures or software failures (broken services, network links,
and so on), the node becomes unhealthy. In this case, the SONAS Cluster Manager
dynamically migrates the affected public IP addresses and in-flight workloads to healthy
interface nodes, and uses ‘tickle-ack’ technology with the affected user clients so that
they reestablish their connection to the new interface node (a conceptual sketch of this
IP failover follows this list).
4. SONAS Cluster Manager provides the interface to manage cluster IP addresses, add and
remove nodes, ban and disable nodes.
5. The SONAS Cluster Manager coordinates advanced functions such as the byte-range
locking available in the SONAS parallel file system. It manages the interface nodes and
coordinates the multiple file sharing protocols to work in conjunction with the SONAS
parallel file system base technology, so as to allow concurrent access, with parallel read
and write access, for multiple protocols and multiple platforms across multiple SONAS
interface nodes. It is the key to guaranteeing full data integrity for all files, anywhere
within the file system.
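
The following sketch is a purely conceptual illustration of the public IP failover behavior
described in item 3: public addresses served by an unhealthy interface node are reassigned
across the remaining healthy nodes. It is not the actual SONAS Cluster Manager
implementation, and all names and addresses are hypothetical.

# Conceptual sketch (not the SONAS implementation): redistribute public IP
# addresses from unhealthy interface nodes across the healthy ones.
from itertools import cycle

def redistribute(public_ips, node_health):
    """Map each public IP to a healthy interface node, round-robin."""
    healthy = [node for node, ok in node_health.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy interface nodes available")
    assignment = {}
    nodes = cycle(sorted(healthy))
    for ip in sorted(public_ips):
        assignment[ip] = next(nodes)
    return assignment

# Example: int002 fails, so its addresses move to the remaining nodes.
ips = ["10.0.0.101", "10.0.0.102", "10.0.0.103", "10.0.0.104"]
health = {"int001": True, "int002": False, "int003": True}
print(redistribute(ips, health))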

For information on how to administer the SONAS cluster manager you can refer to section
“Cluster Management” on page 343.

3.3.1 Introduction to the SONAS Cluster Manager


The SONAS global Cluster Manager provides workload allocation and high availability.
SONAS provides high availability through a sophisticated implementation of a global cluster
of active-active peer nodes. Each SONAS node is a peer to all other SONAS nodes and is in
an active-active relationship with them, and incoming workload can be evenly distributed
among all SONAS nodes of the same type. If a node fails, the SONAS Software automatically
fails over the workload to another healthy node of the same type. We will review:
򐂰 The operational characteristics and types of SONAS nodes in the global cluster
򐂰 Clustered node failover/failback for both CIFS and NFS
򐂰 Dynamic insertion/deletion of nodes into the cluster

Let’s start by reviewing the SONAS architecture, as shown in Figure 3-4.

(The figure shows the SONAS architecture: HTTP, NFS, CIFS, FTP, and other clients access
a global namespace through the customer IP network; the management node and multiple
interface nodes are connected over the internal management IP network and the InfiniBand
data network to multiple storage pods, each containing storage nodes, storage controllers
with disk, and storage expansion units, with tape also attached to the data network.)

Figure 3-4 SONAS architecture

There are three types of nodes in a SONAS system, each configured according to one of
three roles. All nodes are in a global cluster, and a copy of SONAS Software runs on each
node. A node performs only one of the three roles below:
򐂰 Interface node - provides the connections to the customer IP network for file serving.
These nodes establish and maintain the connections to CIFS, NFS, FTP, or HTTP users,
and serve the file requests. All four of these protocols can and do co-exist on the same
interface node. Each interface node can talk to any of the storage nodes.
򐂰 Storage node - acts as a storage server, and reads and writes data to and from the
actual storage controllers and disks. Each storage node can talk to any of the interface
nodes. A storage node serves file and data requests from any requesting interface node.
SONAS Software writes data in a wide stripe across multiple disk drives in a logical
storage pool. If the logical storage pool is configured to span multiple storage nodes and
storage pods, the data striping will also span storage nodes and storage pods.
򐂰 Management node - monitors and manages the SONAS global cluster of nodes, and
provides Command Line Interface management and GUI interface for administration.
Command Line Interface commands come into the SONAS through the Management
node.

Notice that SONAS is a two-tier architecture, because there are multiple clustered interface
nodes in the interface tier and multiple clustered storage nodes in the storage tier. This is an
important aspect of the design, as it allows the interface nodes (which provide user file
serving throughput) to be scaled independently of the storage pods and storage nodes
(which provide storage capacity and performance).

Each SONAS node is an IBM System x® commercial enterprise class 2U server, and each
node runs a copy of IBM SONAS Software licensed program product (5639-SN1). SONAS
Software manages the global cluster of nodes, provides clustered auto-failover, and provides
the following functions:
򐂰 The IBM SONAS Software manages and coordinates each of these nodes running in a
peer-peer global cluster, sharing workload equitably, striping data, running the central
policy engine, performing automated tiered storage.
򐂰 The cluster of SONAS nodes is an all-active clustered design, based upon proven
technology derived from the IBM General Parallel File System (GPFS)
򐂰 All interface nodes are active and serving file requests from the network, and passing
them to the appropriate storage nodes. Any interface node can talk to any storage node.
򐂰 All storage nodes are active and serving file and data requests from any and all interface
nodes. Any storage node can respond to a request from any interface node. SONAS
Software will stripe data across disks, storage RAID controllers, and storage pods.
򐂰 SONAS Software also coordinates automatic node failover and failback if necessary.
򐂰 From a maintenance or failover and failback standpoint, any node may be dynamically
deleted or inserted into the global cluster. Upgrades or maintenance may be performed by
taking a node out of the cluster, upgrading it if necessary, and re-inserting it into the
cluster. This is a normal mode of operation for SONAS, and this is the manner in which
rolling upgrades of software and firmware are performed.

SONAS Software is designed with the understanding that over time, different generations and
speeds of System x servers will be used in the global SONAS cluster. SONAS Software
understands this and is able to distribute workload equitably, among different speed interface
nodes and storage nodes within the cluster.

3.3.2 Principles of SONAS workload allocation to interface nodes


In this section we will discuss how workload is allocated and distributed among the multiple
interface nodes, and the role played within that by the SONAS Cluster Manager.

In order to cluster SONAS interface nodes so that they can serve the same data, the interface
nodes must coordinate their locking and recovery. This coordination is done through the
SONAS Cluster Manager. It is the SONAS Cluster Manager’s role to manage all aspects of
the SONAS interface nodes in the cluster.


Clusters usually cannot outperform a standalone server for a single client, due to cluster
overhead. At the same time, clusters can outperform standalone servers in aggregate
throughput to many clients, and clusters can provide superior high availability. SONAS is a
hybrid design that provides the best of both approaches.

From an incoming workload allocation standpoint, SONAS uses the Domain Name Server
(DNS) to perform round-robin IP address balancing, to spread workload equitably on an IP
address basis across the interface nodes, as shown in Figure 3-5.

Figure 3-5 SONAS interface node workload allocation

SONAS allocates a single user network client to a single interface node, to minimize cluster
overhead. SONAS Software does not rotate a single client’s workload across interface nodes:
such rotation is not supported by DNS or CIFS, and it would also decrease performance,
because caching and read-ahead are done in the interface node. For this reason, any one
individual client is assigned to one interface node, for the duration of its session, at the time it
authenticates and accesses the SONAS. At the same time, the workload from multiple users,
potentially numbering into the thousands or more, is spread equitably across as many
SONAS interface nodes as are available. If more user network capacity is required, you
simply add more interface nodes. The SONAS scale out architecture thus provides linear
scalability as the number of users grows.
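
As a minimal illustration of this behavior (a conceptual sketch in Python, not SONAS code; the
cluster name and addresses are hypothetical), the following fragment shows how round-robin
DNS resolution spreads many client sessions evenly across the pool of interface node IP
addresses, while each individual client keeps the single address it resolved for the duration of
its session:

import itertools
from collections import Counter

# Hypothetical pool of interface node IP addresses published in DNS for the
# SONAS cluster name (for example, sonas.virtual.com).
interface_node_ips = ["10.0.0.10", "10.0.0.11", "10.0.0.12",
                      "10.0.0.13", "10.0.0.14", "10.0.0.15"]

# Round-robin name resolution: each successive lookup returns the next
# address in the pool, emulating DNS round-robin behavior.
rotation = itertools.cycle(interface_node_ips)

def resolve(client_name):
    """Return the single interface node IP this client uses for its whole session."""
    return next(rotation)

# 600 clients resolving the cluster name: the sessions are spread evenly,
# but each client talks to exactly one interface node.
sessions = {f"client{n}": resolve(f"client{n}") for n in range(600)}
print(Counter(sessions.values()))   # about 100 sessions per interface node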

Independently of the application and of the interface node serving the request, SONAS
Software always stripes data across disks, storage RAID controllers, and storage pods, thus
providing wide data striping performance and parallelism for file serving requests from any
interface node. This is shown in Figure 3-6 on page 83.


Figure 3-6 SONAS interface node workload allocation - parallelism at storage level

SONAS provides a single high performance NFS, CIFS, FTP, or HTTPS connection for each
individual network client. In aggregate, multiple users are IP-balanced equitably across all the
interface nodes, thus providing scale out capability: the more interface nodes, the more user
capacity is available.

SONAS was designed to make the connection a standard CIFS, NFS, FTP, or HTTP
connection, in order to allow attachability by as wide a range of standard clients as possible,
and to avoid requiring the installation of any client side code.

3.3.3 Principles of interface node failover and failback


If an interface node fails, for example because of a fatal error, or if the interface node simply
needs to be upgraded or maintained, interface nodes may be dynamically removed from and
later re-inserted into the SONAS cluster. The normal method of upgrading or repairing an
interface node is to take the interface node out of the cluster, as described in “Modify Interface
and Storage Nodes status” on page 344. The SONAS Cluster Manager manages the failover
of the workload to the remaining healthy interface nodes in the SONAS cluster. The offline
interface node may then be upgraded or repaired, and then re-inserted into the SONAS
cluster, and the workload is then automatically rebalanced across the interface nodes in the
SONAS.

The SONAS Software component that actually performs this function of managing the
interface node monitoring, failover and failback, is the SONAS Cluster Manager. When an
interface node is removed from the cluster, or if there is an interface node failure, healthy
interface nodes take over the load of the failed node, as shown in Figure 3-7 on page 84.

In this case, the SONAS Software Cluster Manager will automatically perform the following:
򐂰 Terminate old network connections and move the network connections to a healthy
interface node. IP addresses are automatically re-allocated to a healthy interface node
– Session and state information that was kept in the Cluster Manager is used to support
re-establishment of the session and maintaining IP addresses, ports, etc.


– This state and session information and metadata for each user and connection is
stored in memory in each node in a high performance clustered design, along with
appropriate shared locking and any byte-range locking requests as well as other
information needed to maintain cross-platform coherency between CIFS, NFS, FTP,
HTTP users
򐂰 Notification technologies called ‘tickle-ack’ are used to tickle the application and cause it
to reset the network connection

Figure 3-7 SONAS interface node failover

At the time of the node failover, if the session or application is not actively transferring data
over a connection, the failover can usually be transparent to the client. If the client is
transferring data, whether the application service failover is transparent to the client depends
on the protocol, the nature of the application, and what is occurring at the time of the failover.

In particular, if the client application, in response to the SONAS failover and SONAS
notifications, automatically does a retry of the network connection, then it is possible that the
user will not see an interruption of service. Examples of software that do this include many
NFS-based applications, as well as Windows applications that do retries of the network
connection, such as the Windows XCOPY utility.

If the application does not do automatic network connection retries, or if the protocol in
question is stateful (as CIFS is), then a client-side reconnection may be necessary to
re-establish the session. For most CIFS connections, this is the likely case.
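
To make the retry behavior concrete, the following Python fragment is a simplified sketch (it is
not part of SONAS Software, and the host name and port are hypothetical) of the kind of
client-side retry loop that allows an interface node failover to appear transparent: when the
connection is reset during the failover, the client simply reconnects to the same virtual cluster
name, which by then is served by a healthy interface node.

import socket
import time

def send_with_retries(host, port, payload, retries=5, delay=2.0):
    """Send payload to the NAS service, reconnecting if the session is reset.

    host is the SONAS cluster name; after a failover the same name is served
    by a healthy interface node, so a plain reconnect succeeds.
    """
    for attempt in range(1, retries + 1):
        try:
            with socket.create_connection((host, port), timeout=10) as sock:
                sock.sendall(payload)
                return True
        except (ConnectionResetError, ConnectionRefusedError, socket.timeout):
            time.sleep(delay)   # give the Cluster Manager time to move the IP address
    return False

# Example call (hypothetical cluster name):
# send_with_retries("sonas.virtual.com", 2049, b"...data...")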

More information on interface node cluster failover is in Chapter 6, “Backup and recovery,
availability and resiliency functions” on page 177.

3.3.4 Principles of storage node failover and failback


We previously discussed interface node failover and failback in SONAS. A similar principle is
operational for storage node failover and failback. The SONAS Cluster Manager does not
directly participate in storage node failover and failback as it is the SONAS parallel file system
that manages the storage node failover.


In SONAS, there is the concept of a storage pod as a modular building block of storage, as
illustrated in Figure 3-4 on page 80. Each storage pod contains between 60 and 240 disk
drives, arranged in groups of 60 drives, and each storage pod contains two active-active
storage nodes. The two storage nodes provide resiliency and backup for each other in the
storage pod.

If a storage node should fail, the remaining healthy storage node in the storage pod takes
over the load of the failed storage node. An individual storage node is very high in capacity
and throughput, to allow good operation in the event of a failed storage node.

Furthermore, as described in the SONAS introduction chapter of this book, SONAS can be
configured to perform storage load balancing by striping data across:
򐂰 Disks
򐂰 Storage RAID controllers
򐂰 Storage pods

Logical storage pools in SONAS can be defined, and usually are defined, to span disks,
storage RAID controllers and storage pods. Furthermore, the data striping means that files
are spread in blocksize ‘chunks’ across these components in order to achieve parallel
performance and balanced utilization of the underlying storage hardware.

One of the purposes of this dispersion of SONAS data is to mitigate the effect of a failed
storage node. Files in SONAS are spread across multiple storage nodes and storage pods,
with the intent that only a small portion of any file is affected by a storage node failure, and
only in terms of performance, not of data availability, which is maintained.

As the SONAS grows larger and scales out to more and more storage nodes, the failure of
any one storage node becomes a smaller and smaller percentage of the overall storage node
aggregate capacity. The SONAS scale out architecture thus has the effect of reducing the
amount of impact of a storage node failure, as the SONAS grows.

Just as with interface nodes, storage nodes may be dynamically removed and re-inserted into
a cluster. Similar to the interface node methodology, the method of upgrade or repair of a
storage node is to take the storage node out of the cluster. The remaining storage node in the
storage pod will dynamically assume the workload of the pod. The offline storage node may
then be upgraded or repaired, and then re-inserted into the cluster. When this is done,
workload will then be automatically rebalanced across the storage nodes in the storage pod.
During all of these actions, the file system stays online and available and file access to the
users is maintained.

3.3.5 Summary
We have seen that the IBM SONAS provides equitable workload allocation to a global cluster
of interface nodes, including high availability through clustered auto-failover. We saw that:
򐂰 All SONAS nodes operate in a global cluster
򐂰 Workload allocation to the interface nodes is done in conjunction with external Domain
Name Server(s)
򐂰 The global SONAS cluster offers dynamic failover/failback, and if the application supports
network connection retries, can provide transparent failover of the interface nodes
򐂰 Normal upgrade and maintenance for SONAS nodes is via dynamic removal and insertion
of nodes into and out of the cluster

Let’s now proceed to discuss in more detail the SONAS Software components that provide
the functionality to support these capabilities.


3.3.6 SONAS Cluster Manager manages multi-platform concurrent file access


One of the primary functions of the SONAS Cluster Manager is to support concurrent access
to many files by concurrent users, spread across multiple different network protocols and
platforms. SONAS Software also supports, with proper authority, concurrent read and write
access to the same file, including byte-range locking. Byte-range locking means that two
users can access the same file concurrently, and each user can lock and update a subset of
the file, with full integrity among updaters.
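
The following Python fragment is a minimal sketch of byte-range locking as seen from a client
of the SONAS file system (the mount point and file name are hypothetical, and this is
illustrative client-side code, not SONAS Software): it locks and updates only the first 4 KB of a
file, so that a second client can concurrently lock and update a different byte range of the
same file with full integrity.

import fcntl
import os

# Hypothetical NFS mount point of a SONAS export on the client.
path = "/mnt/sonas/home/appl/data/web/writing_reading_the_file.dat"

fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

# Lock only bytes 0..4095; another client can lock bytes 4096..8191 in parallel.
fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 0)
try:
    os.pwrite(fd, b"x" * 4096, 0)          # update only the locked region
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0)
    os.close(fd)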

All file accesses from the users to the SONAS parallel file system logically traverse the
SONAS Cluster Manager, as shown in Figure 3-8. The term logically implies that the cluster
manager handles metadata and locking, but does not handle data transfer; in other words,
the cluster manager is not in-band with regard to data transfer.

Figure 3-8 All file accesses traverse the SONAS Cluster Manager including concurrent accesses

The SONAS Cluster Manager is logically positioned in the file access path, as it is the SONAS
Cluster Manager that provides the mapping of the multiple protocols onto the SONAS parallel
file system, while simultaneously managing the locking necessary to guarantee data integrity
across all the interface nodes. Finally, the SONAS Cluster Manager provides the failover and
failback capabilities if an interface node experiences an unhealthy or failed state.

The SONAS Cluster Manager works together with the SONAS parallel file system to provide
concurrent file access from multiple platforms in the following way:
򐂰 SONAS Cluster Manager - provides the mapping and concurrency control across multiple
interface nodes, and when multiple different protocols access the same file, and provides
locking across users across the interface nodes
򐂰 SONAS parallel file system - provides the file system concurrent access control at the
level of the physical file management, provides ability to manage and perform parallel
access, provides the NFSv4 access control list security, and provides the foundational file
system data integrity capabilities


We’ll discuss the parallel file system in more detail a little later. First, let’s discuss how the
IBM SONAS Cluster Manager provides multiple concurrent interface node file serving with
data integrity, across the following network protocols at the same time:
򐂰 CIFS (typically these are Windows users)
򐂰 NFS (typically these are Unix or Linux users)
򐂰 FTP
򐂰 HTTPS

SONAS Software Cluster Manager functionality supports multiple exports and shares of the
file system, over multiple interface nodes, by providing distributed lock, share, and lease
support. The SONAS Cluster Manager is transparent to the NFS, CIFS, FTP, and HTTPS
clients; these clients are unaware of, and do not need to know that the SONAS Cluster
Manager is servicing and managing these multiple protocols concurrently.

When sharing files and directories, SONAS reflects changes made by one authorized user to
all other users that are sharing the same files and directories. As an example, if a
SONAS-resident file is renamed, changed, or deleted, this change is immediately reflected
properly to all SONAS-attached clients on other platforms, including those using other
protocols, as shown in Figure 3-9.

Figure 3-9 SONAS concurrent access to a shared directory from multiple users

SONAS Software employs sophisticated distributed cluster management, distributed
metadata management, and a scalable token management system to provide data
consistency while supporting concurrent file access from thousands of users. All read and
write locking types are kept completely coherent between NFS and CIFS clients, globally,
across the cluster.

SONAS Cluster Manager provides the capability to export data from a collection of nodes
using CIFS, NFSv2, NFSv3, FTP, and HTTPS.


3.3.7 Distributed metadata manager for concurrent access and locking


In order to assure data consistency, SONAS Software provides a sophisticated multi-platform
interface node locking capability that works in conjunction with a sophisticated token (lock)
management capability in the file system, derived from the IBM General Parallel File System.
This capability coordinates shared-everything global access from any and all interface nodes
to any and all disk storage, assuring the consistency of file system data and metadata when
different nodes access the same file.

SONAS Software is designed to provide a flexible, multi-platform environment, as shown in
Figure 3-10.

Figure 3-10 SONAS provides concurrent access and locking from multiple platforms

SONAS Software has multiple facilities that provide scalability. These include the distributed
ability for multiple nodes to act as token managers for a single file system. SONAS Software
also provides scalable metadata management by providing for a distributed metadata
management architecture, thus allowing all nodes of the cluster to dynamically share in
performing file metadata operations while accessing the file system.

This distinguishes SONAS from other cluster NAS filer architectures which may have a
centralized metadata server handling fixed regions of the file namespace. A centralized
metadata server can often become a performance bottleneck for metadata intensive
operations and can represent a scalability limitation and single point of failure. SONAS solves
this problem by managing metadata at the node which is using the file or in the case of
parallel access to the file, at a dynamically selected node which is using the file.

3.3.8 SONAS Cluster Manager components


The SONAS Cluster Manager provides services to the following file serving protocols:
򐂰 NFS file serving
򐂰 CIFS file serving via SONAS CIFS component


򐂰 Clustered CIFS provided by Clustered Trivial Data Base (CTDB) - which clusters SONAS
CIFS component, monitors interface node services including start / failover / failback of
public IP address
򐂰 FTP daemon
򐂰 HTTPS daemon

In SONAS Cluster Manager, the software used includes:


򐂰 SONAS CIFS component to provide Windows CIFS file serving, including mapping CIFS
semantics, userids, security identifiers, NTFS access control lists, and other required
CIFS mapping, to the underlying SONAS parallel file system.
򐂰 Clustered Trivial Data Base (CTDB), which in combination with SONAS CIFS component
provides a fully clustered CIFS capability

Working together, the SONAS Cluster Manager and these components provide true
multi-protocol, active/active clustering within a single global namespace, spanning multiple
interface nodes, all clustered transparently to applications.

SONAS CIFS component and CTDB in the SONAS Cluster Manager


IBM SONAS Cluster Manager uses SONAS CIFS component technology to provide the CIFS
file serving capability on individual interface nodes. SONAS Cluster Manager uses the
open-source Clustered Trivial Data Base (CTDB) technology to store important cluster and
locking information in small databases called TDBs (trivial data bases). The local Trivial
Database (TDB) files also contain the messaging and locking details for files, and information
about open files that are accessed by many clients. Each TDB holds metadata for the POSIX
to CIFS semantics mapping, and vice versa, across the cluster.

CTDB addresses the fact that a SONAS CIFS component process, running by itself on an
interface node, does not know about the locking information held by the SONAS CIFS
component processes running locally on the other interface nodes.

CTDB provides the functionality to coordinate the SONAS CIFS component processes that
run on the different SONAS interface nodes. To keep data access and writes consistent,
CTDB provides the mechanism by which the SONAS CIFS components running on each
interface node communicate with each other and effectively share the information required
for proper locking, with high performance, assuring data integrity by avoiding shared data
corruption.

An example of this operation is as follows:


򐂰 Suppose that file1 has been accessed through multiple different nodes by multiple end
user clients.
򐂰 These multiple nodes need to know about the locks, and need to know that each of them
has accessed the same file.
򐂰 CTDB provides the architecture and the function to provide lightweight, fast, scalable
intercommunication between all the nodes in the cluster, to coordinate the necessary
cross-node communication, and to intelligently minimize that cross-communication to
assure scalability.
򐂰 If any of the nodes wish to write to the file, the SONAS Cluster Manager CTDB function
assures that proper file integrity is maintained. CTDB performs the services and high
performance architecture for individual SONAS interface nodes (regardless of protocol
used to access the file) to take ownership and transfer ownership of individual records, as
necessary, to assure data integrity.
򐂰 CTDB assures data integrity by tracking and assuring that only the owning interface node
has the most recent copy of the record and that only the proper owning node has the
ability to update the record.


򐂰 When required, CTDB is specifically architected to provide the high performance,
lightweight messaging framework and trivial data bases to quickly cross-notify,
cross-share, and properly pass ownership among requesting interface nodes, to update
records with integrity and high performance.
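
The following Python fragment is a conceptual illustration only (it is not the CTDB
implementation, and the class and node names are made up): it shows the idea that exactly
one interface node owns the authoritative copy of a lock record at any time, and that
ownership migrates, by a lightweight message, to the node that needs to update the record.

class RecordDatabase:
    """Toy per-node view of a clustered lock-record database."""
    def __init__(self, node_name):
        self.node = node_name
        self.records = {}                  # key -> (owner_node, record_data)

    def request_ownership(self, key, cluster):
        """Pull the latest copy of a record and become its owner."""
        owner, data = cluster.lookup(key)
        if owner != self.node:
            cluster.transfer(key, new_owner=self.node)   # lightweight message
        self.records[key] = (self.node, data)
        return data

class Cluster:
    """Toy stand-in for the cross-node messaging layer."""
    def __init__(self):
        self.master = {}                   # key -> (owner_node, record_data)

    def lookup(self, key):
        return self.master.get(key, (None, {}))

    def transfer(self, key, new_owner):
        _, data = self.master.get(key, (None, {}))
        self.master[key] = (new_owner, data)

cluster = Cluster()
node1 = RecordDatabase("int1")
node2 = RecordDatabase("int2")
node1.request_ownership("lock:file1", cluster)   # int1 now owns the record
node2.request_ownership("lock:file1", cluster)   # ownership migrates to int2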

More information about SONAS CIFS component and Clustered TDB


The Clustered TDB is a shared TDB approach to distributing locking state in small, fast
access files that are called ‘trivial data bases’ (they are called this because they are designed
to be very lightweight in message size and very fast in speed of access). In this approach, all
cluster nodes access the same TDB files. Clustered TDB (CTDB) provides the same types of
functions as TDB but in a clustered fashion, providing a TDB-style database that spans
multiple physical hosts in a cluster, while preserving the high speed of access and the very
small, lightweight message size.

CTDB technology also provides the fundamental SONAS Cluster Manager failover
mechanisms to ensure data integrity is not lost if any interface node goes down while serving
data.

In summary, the CTDB functionality provides important capabilities for the SONAS Cluster
Manager to present a global namespace virtual file server to all users from any protocol, in
which all the interface nodes appear as a single virtual file server. The CTDB also assures
that all the SONAS CIFS components on each interface node are able to talk to each other in
a high performance, scalable manner, and update each other about the locking and other
information held by the other SONAS CIFS components.

SONAS Cluster Manager summary


In summary, the SONAS Cluster Manager provides SONAS interface node clustering through
integrating, testing, and providing the enterprise class support for the SONAS Software
capabilities. SONAS Cluster Manager provides the multi-protocol, cross-platform locking and
control, interface node monitoring, IP address management, and interface node failover and
failback.

A summary of the functions of the SONAS Cluster Manager is as follows:


򐂰 Provides ability for all user clients to be able to connect to any interface node
򐂰 All interface nodes appear to the users as a single large global namespace NAS server
򐂰 Fully supports and exploits the internal SONAS parallel file system, from which all
interface nodes can serve out different files or the same set of files, with parallel high
performance
򐂰 Provides full data integrity across all interface nodes and across all concurrent users and
applications, from multiple network storage access protocols
򐂰 Interface nodes can fail and clients are transparently reconnected to another interface
node
򐂰 All file changes are immediately seen on all interface nodes and all other clients accessing
the same file
򐂰 Minimizes the latency and cross-communication required of interface nodes to check for
proper file and data integrity
򐂰 Provides ability to scale in a linear, non-disruptive fashion by simply adding more interface
nodes or storage nodes

Specific enhancements that IBM has made in the SONAS Cluster Manager for CIFS include:
򐂰 Clustering enhancements


– Multiple exports and shares of the same file system over multiple nodes including
distributed lock, share and lease support
– Failover capabilities on the server
– Integration with the NFS, FTP, and HTTPS daemons with regard to locking, failover,
and authorization
򐂰 Performance optimization with the SONAS file system (i.e. with GPFS)
򐂰 NTFS Access Control List (ACL) support in SONAS CIFS component using the native
GPFS NFSv4 ACL support
򐂰 HSM support within SONAS CIFS component to allow destaging of files to tape and user
transparent recall.
򐂰 VSS integration of SONAS Snapshots

Next, we will examine SONAS Authentication.

3.4 SONAS authentication and authorization


SONAS requires an external service to provide authentication and authorization of client
users. Authentication is the process of verifying client user identity, and is typically performed
by verifying credentials such as user ID and password. Authorization is the process of
deciding which resources a user can access; for example, a user may have full control over
one directory, allowing read, write, create, delete, and execute, and have no access to a
different directory.

SONAS supports the following authentication methods:


򐂰 Microsoft Active Directory
– Active Directory itself provides the Kerberos infrastructure
– Active Directory with SFU (Services for Unix, RFC2307 schema)
򐂰 LDAP (Lightweight Directory Access Protocol)
– including LDAP with MIT Kerberos
򐂰 Samba Primary Domain Controller PDC / NT4 mode
򐂰 Network Information Service (NIS) with NFS NetGroup support only for ID mapping

The authentication server is external to SONAS and must have proper connectivity to
SONAS. The authentication server is configured externally to the SONAS, using normal
authentication server skills. Note that the external authentication server is an essential
component in the SONAS data access flow: if the authentication server is unavailable, then
data access is not available. At the current SONAS 1.1.1 release level, a single SONAS
system supports only one of the above authentication methods at a time, and in order to
access a SONAS, the user must be authenticated using the authentication method that is
configured on that particular SONAS system.

Only the SONAS interface nodes are configured for authentication by users. SONAS storage
nodes are not part of this configuration. In the current SONAS release, the number of groups
per user is limited to approximately 1000.

Care must be taken with time synchronization: all of the SONAS nodes must have their time
set by a Network Time Protocol (NTP) server, and the same server must synchronize the time
for the authentication server, such as the Active Directory (AD) server and/or the Kerberos
KDC server. Note that an Active Directory (AD) domain controller can be used as an NTP
time source.

To set up a SONAS system, administrative information for the selected authentication server
should be obtained in advance. Examples of the information required are the administrative
account, password, SSL certificate, and Kerberos keytab file. Refer to the Managing
authentication server integration chapter in the SONAS Administrator's Guide for the
information required for each authentication protocol. Additional information can also be
found in section “Authentication using AD or LDAP” on page 259.

3.4.1 SONAS authentication concepts and flow


To access files, client users must authenticate with SONAS. How authentication operates
depends on the file sharing protocol that is used.

CIFS authentication concepts


For the CIFS protocol, user authentication is performed using the challenge-response
method, in which the server issues a challenge and the client proves knowledge of the
password in its response. In the case of CIFS, no password is transferred over the wire;
instead, only a password hash is sent by the client. For this reason, LDAP needs a special
schema to store password hashes. With Kerberos, the CIFS client can also authenticate
using a valid Kerberos ticket that has been granted by a trusted authority, the KDC.

HTTP/FTP/SCP authentication concepts


When using HTTP, FTP, or SCP, user authentication is done by transferring the password to
the protocol server; in the case of HTTP and SCP, the password is encrypted. The Linux
Pluggable Authentication Module (PAM) system forwards the authentication request to the
configured authentication system. These protocols do not support the use of Kerberos tickets.

NFS authentication concepts


In the case of NFS, authentication is performed only by host name and IP address; there is
no user authentication concept. Authorization is based on UNIX user IDs (uids) and group
IDs (gids). The NFS client sends the uid/gid of the current user to the NFS server inside
SONAS. To guarantee consistent authorization, you must ensure that the client has the same
ID mapping, that is, the same uid/gid mappings, as the NFS server in SONAS. How to do this
is explained in “SONAS Authentication - ID Mapping” on page 93. With Kerberos, the NFS
client can also authenticate using a valid Kerberos ticket that has been granted by a trusted
authority, the KDC.
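
A small sketch of the kind of check this implies is shown below in Python (it is not SONAS
code; the user and group names are examples): running the same lookup on every NFS
client, and comparing the result with the IDs held by the directory service used by SONAS,
confirms that the uid/gid mappings agree.

import grp
import pwd

def show_ids(user_name, group_name):
    """Print the uid/gid an NFS client will send for this user and group."""
    u = pwd.getpwnam(user_name)
    g = grp.getgrnam(group_name)
    print(f"{user_name}: uid={u.pw_uid} gid={u.pw_gid}  {group_name}: gid={g.gr_gid}")

# Run the same call on each NFS client and compare the output with the IDs
# configured in the directory service (for example AD with SFU, LDAP, or NIS):
# show_ids("appluser", "applgroup")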

The SONAS authentication of users occurs according to the diagram shown in Figure 3-11.


Figure 3-11 SONAS authentication flow

Clients without Kerberos (1) send a user authentication request to SONAS, which (2) sends
the authentication request to the external authentication server. The authentication server
then (3) sends a response to SONAS, and SONAS then (4) sends the response back to the
client.

In the case of Kerberos, the client (1) sends a user authentication request directly to the
authentication server, which also hosts a Kerberos Key Distribution Center (KDC). The
authentication server then (2) replies with a Kerberos ticket for the client. The client then (3)
sends a request to SONAS with the Kerberos ticket that was granted, and SONAS then (4)
sends the response back to the client. Kerberos tickets have a lease time before expiring, so
a client can access SONAS multiple times without requiring re-authentication with the
Kerberos KDC.

SONAS Authentication - ID Mapping


SONAS software is designed to support multiple different platforms and protocols all
accessing the SONAS concurrently. However, Windows CIFS systems use Security
Identifiers (SIDs) internally to identify users and groups, whereas UNIX systems use a 32-bit
user ID and group ID (uid/gid). To make both worlds work together in SONAS and provide full
concurrent and consistent access from both platforms, SONAS performs a user mapping
between the Windows SID and the UNIX uid/gid.

As the underlying SONAS data is stored in a POSIX-compliant UNIX and Linux style file
system based on IBM GPFS, all Access Control List (ACL) and access information is
ultimately controlled using the SONAS GPFS uid/gid, which is the standard way of controlling
user access in UNIX based environments.

Therefore, when accessing SONAS data from UNIX or Linux systems using the NFS protocol,
there are no issues, because these uids/gids map directly to the UNIX system uid and gid.

However, when Windows clients access the SONAS, SONAS Software provides the mapping
between the Windows user SID (Security identifier) and the internal file system UID to identify
users. In SONAS, depending on the type of authentication used, different methods are
applied to solve this UID to SID mapping requirement. The SONAS user ID mapping flow is
shown in Figure 3-12.


Figure 3-12 SONAS authentication - userid mapping

To solve the ID mapping issue SONAS supports multiple authentication server integrations:
򐂰 LDAP and LDAP with MIT Kerberos
򐂰 Samba primary domain controller (PDC) for Microsoft Windows NT® version 4 (NT4)
򐂰 Active Directory Server (ADS itself works as Kerberos), and AD with Microsoft Windows
Services for UNIX (SFU)
򐂰 Network Information Service (NIS) as an extension to AD/Samba PDC

SONAS Active Directory authentication


In the case of Active Directory (AD), SONAS generates a uid for each SID, using
auto-increment logic. This means that when any new user accesses SONAS, SONAS creates
a uid at runtime and stores it in the SONAS Cluster Manager Trivial Data Base (TDB) that
was discussed earlier in this chapter. Therefore, when using AD as authentication for
SONAS, you should be aware of this and plan to create the user on the UNIX/Linux machine
with a uid that matches the uid created on SONAS.
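
The fragment below is a conceptual Python sketch of this auto-increment style of ID mapping
(it is not the SONAS implementation, and the SIDs and the starting uid are invented for the
example): the first time a SID is seen, a new uid is allocated and remembered, and every later
access by the same SID returns the same uid.

class AutoIncrementIdMap:
    def __init__(self, first_uid=10000000):
        self.next_uid = first_uid
        self.sid_to_uid = {}

    def uid_for_sid(self, sid):
        """Return the uid for a Windows SID, allocating one on first access."""
        if sid not in self.sid_to_uid:
            self.sid_to_uid[sid] = self.next_uid   # allocated at runtime, then kept
            self.next_uid += 1
        return self.sid_to_uid[sid]

idmap = AutoIncrementIdMap()
print(idmap.uid_for_sid("S-1-5-21-111-222-333-1104"))   # first access allocates a uid
print(idmap.uid_for_sid("S-1-5-21-111-222-333-1104"))   # the same SID maps to the same uid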

SONAS Active Directory authentication with SFU


When using AD with AD Services for Unix (SFU), SONAS uses AD SFU to read the SID to
uid mapping. In this case, when users access SONAS, the SONAS Software fetches this SID
to uid mapping from SFU. The uid/gid is stored in a dedicated field in the user/group object on
the AD server; this requires the SFU schema extension or Windows Server 2003 R2.

SONAS with LDAP authentication


For SONAS authentication using UNIX/Linux Lightweight Directory Access Protocol (LDAP),
the SID to uid mapping is kept in the LDAP server itself; the uid/gid is stored in a dedicated
field in the user/group object on the LDAP server. This is typically a very straightforward
authentication environment, with little or no ID mapping issue in LDAP.


SONAS NIS authentication extension for ID mapping


NIS is used in UNIX based environments for centralized management of users and other
services, and keeps user, domain, and netgroup information. Using NIS for user and
hostname management ensures that all machines have the same user information, which is
useful when using NAS data stores such as SONAS through the NFS protocol. The netgroup
construct is used to name a group of client machine IP addresses or hostnames; the
netgroup name is then specified when creating NFS exports, instead of individual client
machines. NIS is also used for user authentication for services such as ssh, ftp, and http.

Note: SONAS does not support NIS as an authentication mechanism; NIS is used
exclusively for netgroup support and ID mapping.

SONAS supports the following three modes of NIS configuration:

򐂰 Plain NIS without any authentication, just for netgroup support.
– In this case only the NFS protocol is supported and the other protocols are disabled.
– This mode is used only for customer sites where the customer has only NFS client
access, without any authentication.
– Any previous authentication is removed.
– SONAS uses the NIS default domain to resolve netgroups, even though multiple NIS
domains are supported.
򐂰 NIS for netgroup support only, with Active Directory for authentication and the AD
auto-increment ID mapping logic.
– This mode is used only for customers needing netgroup support for NFS clients, while
the other protocols use AD.
– This is an extension to the existing AD authentication.
– All protocols use AD for authentication, and ID mapping is done using auto-increment
logic.
– SONAS needs to be configured with AD using cfgad and then configured for NIS using
cfgnis.
򐂰 NIS with ID mapping, as an extension to Active Directory, plus netgroup support.
– This configuration is used when you have both Windows and UNIX systems and you
want to keep a known mapping of UNIX users to Windows users.
– SONAS needs to be configured with AD using cfgad, and then cfgnis is run to
configure NIS.
– In this mode NIS becomes an extension to the existing AD authentication.
– All protocols use AD for authentication, and ID mapping is done using NIS.
– For a user accessing SONAS, the SID to uid mapping is done by the NIS ID mapping
logic, with the help of domain map and user map rules.

Samba PDC is also supported with NIS; in the above discussion, what is valid for AD is also
valid for Samba PDC.

3.5 Data repository layer - SONAS file system


In this section, we describe in more detail the internal architecture of the SONAS file system,
which is based upon the IBM General Parallel File System.

In the SONAS Software, the parallel file system, which includes the central policy engine and
the high performance scan engine, is at the heart of the SONAS Software functionality, as
illustrated in Figure 3-13 on page 96.


Figure 3-13 SONAS Software - parallel file system, policy engine, scan engine

We will discuss core SONAS file system concepts, including the high-performance file system
itself, the manner in which the policy engine and scan engine provide the foundation for
SONAS Information Lifecycle Management (ILM) (discussed in detail in “SONAS data
management services” on page 107), and characteristics of the SONAS file system for
configuration, performance, scalability, and storage management.

As mentioned, the SONAS file system is based upon the IBM General Parallel File System
(GPFS), so if you are familiar with IBM GPFS, you will be quite familiar with the concepts
discussed in this section.

The SONAS file system offers more than a traditional file system; it is the core foundation for
an end-to-end NAS file management infrastructure within SONAS. IBM leverages IBM GPFS
technology to provide a proven high performance parallel grid file system architecture, with
high reliability and high scalability.

In addition to providing file storage capabilities, the SONAS file system also provides storage
management and information lifecycle management tools, centralized administration, and
facilities that, in conjunction with the SONAS Cluster Manager, allow shared high performance
access from multiple NAS protocols simultaneously.

IBM SONAS was designed to leverage the long history of IBM GPFS as a high performance
parallel file system, supporting many different types of applications ranging from relational
databases, to digital media, to high performance analytics, to scalable file serving.

The core GPFS technology is installed today across many industries, including financial,
retail, and government applications. GPFS has been tested in very demanding large
environments for over 15 years, making GPFS a solid foundation for use within SONAS as
the central parallel file system.

For more detailed information on configuring SONAS file systems, refer to section “File
system management” on page 345. We now discuss the SONAS file system in greater detail.


3.5.1 SONAS file system scalability and maximum sizes


The SONAS maximum file system size for standard support is 2 PB. Larger multi-petabyte
file systems are possible by submitting a request to IBM for support. The SONAS file system
is based upon IBM GPFS technology, which today runs on many of the world's largest
supercomputers. The largest existing GPFS configurations run into the tens of thousands of
nodes. IBM GPFS has been available on IBM AIX since 1998, and on Linux since 2001. IBM
GPFS has been field proven time and again on some of the world's most powerful
supercomputers to provide efficient use of disk bandwidth, and it is this technology that is
being packaged in a Scale Out NAS form factor, manageable with standard NAS
administrator skills.

SONAS leverages the fact that GPFS was designed from the beginning to support extremely
large, extremely challenging high performance computing environments. Today, SONAS uses
that technology to support building a single global namespace and a single file system over
the entire 14.4 PB current maximum size of a physical SONAS system. The theoretical limits
of the SONAS file system are shown in Table 3-1. The currently supported maximum number
of files per SONAS file system is 2^31 - 1, approximately 2 billion.

Table 3-1 SONAS file system theoretical limits

Attribute                                            Theoretical limit
Maximum SONAS capacity                               134,217,728 Yobibytes (2^107 Bytes)
Maximum size of a single shared file system          524,288 Yobibytes (2^99 Bytes)
Maximum number of file systems within one cluster    256
Maximum size of a single file                        16 Exbibytes (2^64 Bytes)
Maximum number of files per file system              2.8 quadrillion (2^48)
Maximum number of snapshots per file system          256
Maximum number of subdirectories per directory       2^16 (65,536)

The SONAS cluster can contain up to 256 mounted file systems. There is no limit placed
upon the number of simultaneously opened files within a single file system.

3.5.2 Introduction to SONAS File System parallel clustered architecture


The SONAS file system is built upon a collection of disks which contain the file system data
and metadata. A file system can be built from a single disk or contain thousands of disks
storing Petabytes of data.

SONAS implements its file system upon a grid parallel architecture, in which every 'node'
runs a copy of SONAS software and thus has a copy of the SONAS parallel file system code
running on it. SONAS implements a two-tier global cluster, with SONAS interface nodes as
the upper tier of network file serving nodes, and with SONAS storage nodes serving as the
lower tier.

On the interface nodes, the SONAS file system code acts as a file system storage requester,
a Network Shared Disk (NSD) client in GPFS terminology. On the storage nodes, the SONAS
file system code acts as a file system storage server, an NSD server in GPFS terminology.
All SONAS interface, management, and storage nodes form a peer-to-peer global cluster.


SONAS leverages the experience of current GPFS customers worldwide, who are using
single file systems from 10 to 20 PB in size and growing, and other GPFS user file systems
containing hundreds of millions of files.
3.5.3 SONAS File system performance and scalability


SONAS file system achieves high performance I/O by using the following techniques:
򐂰 Striping data across multiple disks attached to multiple nodes:
– All data in the SONAS file system is read and written in wide parallel stripes.
– The blocksize for the file system determines the size of the block writes.
– The blocksize is specified at the time the file system is defined, is global across the file
system, and cannot be changed after the file system is defined.
򐂰 Optimizing for small block writes: a SONAS block is sub-divided into 32 sub-blocks, so
that multiple small application writes can be aggregated and stored in a single SONAS
file system block, without unnecessarily wasting space in the block (see the sketch after
this list).
򐂰 Providing a high performance metadata (inode) scan engine, to scan the file system very
rapidly in order to enable fast identification of data that needs to be managed or migrated
in the automated tiered storage environment, or replicated to a remote site.
򐂰 Supporting a large block size, configurable by the SONAS administrator, to fit I/O
requirements:
– The typical blocksize is the default 256 KB, which is good for most workloads,
especially mixed small random and large sequential workloads.
– For large sequential workloads, the SONAS file system can optionally be defined with
blocksizes of 1 MB or 4 MB.
򐂰 Utilizing advanced algorithms that improve read-ahead and write-behind file functions for
caching in the interface node.
򐂰 Using block level locking, based on a very sophisticated scalable token management
system, to provide data consistency while allowing multiple application nodes concurrent
access to the files.
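
The following Python fragment is a worked sketch of the block and sub-block arithmetic (the
sizes, NSD names, and round-robin placement are simplified assumptions for illustration, not
the actual SONAS allocation code): a file is split into full blocks plus a final fragment measured
in sub-blocks, and the blocks are spread across the available disks.

# Default 256 KB file system block, divided into 32 sub-blocks of 8 KB each.
block_size = 256 * 1024
sub_block = block_size // 32
nsds = ["nsd1", "nsd2", "nsd3", "nsd4"]      # disks behind the storage nodes

def layout(file_size):
    """Return (full blocks, sub-blocks in the tail fragment, simplified placement)."""
    full_blocks, tail = divmod(file_size, block_size)
    tail_sub_blocks = -(-tail // sub_block)  # round up to whole sub-blocks
    blocks_to_place = full_blocks + (1 if tail else 0)
    placement = [nsds[i % len(nsds)] for i in range(blocks_to_place)]
    return full_blocks, tail_sub_blocks, placement

# A ~1 MB file: 3 full blocks plus a fragment of 27 sub-blocks, striped over the NSDs.
print(layout(1_000_000))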

Let's see how SONAS scalability is achieved using the expandability of the SONAS building
block approach. In the diagram in Figure 3-14 on page 99, we see a SONAS interface node
performing a small single read or write on a disk. Because this read or write is small in size,
the resources of one path, one storage node, and one RAID controller and disk are sufficient
to handle the I/O operation.


Figure 3-14 A small single read or write in the SONAS file system

The power of the SONAS file system, however, is in its ability to read or write files in parallel
chunks of the defined blocksize, across multiple disks, controllers, and storage nodes inside a
storage pod, as shown in Figure 3-15.

Figure 3-15 A highly parallel read or write in the SONAS file system. This could be one file.

The scalability of the SONAS file system does not stop with a single storage pod. A single
storage pod already provides very large parallel read and write capability, as shown in
Figure 3-16 on page 100.


Figure 3-16 SONAS file system parallel read / write capability to one storage pod

If the file is big enough, or if the aggregate workload is big enough, the SONAS file system
easily expands to multiple storage pods in parallel as shown in Figure 3-17.

Figure 3-17 SONAS file system parallel read/write capability to multiple storage pods

We can see that the SONAS file system provides the capability for extremely high parallel
performance. This is especially applicable to modern day analytics-intensive data types with
the associated large data objects and unstructured data.

The SONAS file system recognizes typical access patterns like sequential, reverse sequential
and random and optimizes I/O access for these patterns.

Distributed metadata and distributed locking


The SONAS file system also implements the sophisticated GPFS-based token lock
management, which coordinates access to shared disks ensuring the consistency of file
system data and metadata when different nodes access the same file.

The SONAS file system has implemented a sophisticated distributed metadata server
function, in which multiple nodes act, share, acquire, and relinquish roles as token managers
for a single file system. This distributed architecture avoids metadata server bottlenecks, and has been
proven to scale to very large file systems.

Along with distributed token management, the SONAS file system provides scalable
metadata management by allowing all nodes of the cluster accessing the file system to
perform file metadata operations. This key and unique feature distinguishes SONAS from
other cluster file systems, which may have a centralized metadata server handling fixed
regions of the file namespace. The SONAS file system design avoids a centralized metadata
server, which can become a performance bottleneck for metadata intensive operations. This
design also improves availability, as the distributed metadata server function provides
additional insulation against a metadata server single point of failure.

SONAS implements the GPFS technology that solves this problem by managing metadata at
the node which is using the file or in the case of parallel access to the file, at a dynamically
selected node which is using the file.

SONAS file system administration


The SONAS file system provides an administration model that is easy to use and consistent
with standard Linux file system administration, while providing extensions for the clustering
aspects of SONAS.

These functions support cluster management and other standard file system administration
functions such as user quotas, snapshots and extended access control lists.

The SONAS file system provides functions that simplify cluster-wide tasks. A single SONAS
command or GUI command can perform a file system function across the entire SONAS file
system cluster.

The distributed SONAS file system architecture facilitates a rolling upgrade methodology, to
allow you to upgrade individual SONAS nodes in the cluster while the file system remains
online. The SONAS file system also supports a mix of nodes running at current and new
release levels, to enable dynamic SONAS Software upgrades.

SONAS file system implements quotas to enable the administrator to control and monitor file
system usage by users and groups across the cluster. The SONAS file system provides
commands to generate quota reports including user, group and fileset inode and data block
usage.

SONAS file system snapshots


In the current release, up to 256 read-only snapshots of an entire GPFS file system may be
created to preserve the file system's contents at a single point in time. The SONAS file
system implements a space-efficient snapshot mechanism that generates a map of the file
system at the time the snapshot is taken; it is space efficient because it maintains a copy of
only the file system data that has been changed since the snapshot was created. This is
done using copy-on-write techniques. The snapshot function allows a backup program, for
example, to run while the file system is in use and still obtain a consistent copy of the file
system as it was when the snapshot was created.
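
The following Python fragment is a conceptual copy-on-write sketch (it is not the SONAS
snapshot implementation): a snapshot initially shares all blocks with the live file system and
preserves a private copy of a block only when that block is about to be changed, which is
why an unchanged file system consumes almost no additional snapshot space.

class SnapshotView:
    def __init__(self, live_blocks):
        self.live = live_blocks            # reference to the live block map
        self.preserved = {}                # block number -> original contents

    def before_write(self, block_no):
        """Called before the live file system overwrites a block."""
        if block_no not in self.preserved:
            self.preserved[block_no] = self.live[block_no]   # copy on first write only

    def read(self, block_no):
        return self.preserved.get(block_no, self.live[block_no])

live = {0: b"AAA", 1: b"BBB"}
snap = SnapshotView(live)
snap.before_write(1)
live[1] = b"CCC"                           # the live file system changes block 1
print(snap.read(1))                        # the snapshot still returns b'BBB'
print(snap.read(0))                        # unchanged blocks are shared, not copied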

In addition, SONAS snapshots provide an online backup capability that allows files to be
recovered easily from common problems such as accidental deletion of a file. It is a known
requirement to increase the snapshot granularity to include filesets, directories, and individual
files, and IBM intends to address these requirements in a future SONAS release.


SONAS storage pools


SONAS storage pools are a collection of storage resources that allow you to group storage LUNs taken from multiple storage subsystems into a single file system. SONAS storage pools
allow you to perform complex operations such as moving, mirroring, or deleting files across
multiple storage devices, providing storage virtualization and a single management context.
Storage pools also provide you with a method to partition file system storage for
considerations such as:
򐂰 Storage optimization by matching the cost of storage to the value of the data
򐂰 Improved performance by:
– Reducing the contention for premium storage
– Reducing the impact of slower devices to critical applications
– Allowing you to retrieve HSM-archived data when needed
򐂰 Improved reliability by providing for:
– Granular replication based on need
– Better failure containment
– Creation of new storage pools as needed

There are two different types of storage pools: internal storage pools and external storage
pools.

Internal storage pools are used for managing online storage resources. SONAS supports a maximum of eight internal storage pools per file system. A minimum of one pool, called the system storage pool, is required, and SONAS supports up to seven optional user pools.
GPFS assigns file data to internal storage pools under these circumstances:
򐂰 During file creation the storage pool is determined by the file placement policy
򐂰 Attributes of the file, such as file size or access time, match the rules of a policy that
directs the file to be migrated to a different storage pool

External storage pools are intended for use as near-line storage and archival HSM
operations. External storage pools require the use of an external storage management
application and SONAS supports TSM. The TSM external storage manager is responsible for
moving files from the SONAS filesystem and returning them upon the request of an
application accessing the file system.

SONAS filesets
SONAS also utilizes a file system object called a fileset. A fileset is a directory subtree of a file
system namespace that in many respects behaves like an independent file system. Filesets
provide a means of partitioning the file system to allow administrative operations at a finer
granularity than the entire file system. Filesets allow the following operations:
򐂰 Define quotas on both data blocks and inodes.
򐂰 Can be specified in a policy to control initial data placement, migration, and replication of
the file’s data.

SONAS supports a maximum of 1000 filesets per file system.

High performance scan engine


The most important step in file management operations is processing the file metadata. The
SONAS high performance metadata scan interface allows you to efficiently process the
metadata for billions of files. Once the candidate list of files is identified, data movement
operations can be done by multiple nodes in the cluster. SONAS has the ability to spread rule
evaluation and data movement responsibilities over multiple nodes in the cluster providing a
very scalable, high performance rule processing engine.


The SONAS file system implements a high performance scan engine, which can be used to
rapidly identify files that need to be managed within the SONAS file system concept of logical tiered storage pools. The SONAS file system can transparently perform physical movement of data
between pools of logical tiered storage, and can also perform Hierarchical Storage
Management (HSM) to external storage, using an external Tivoli Storage Manager server.

Access Control
The SONAS filesystem uses NFSv4 enhanced access control so that all SONAS users, regardless of the NAS protocol by which they access the SONAS, can take advantage of this robust level of central security and access control to protect directories and files. SONAS file system implements NFSv4 access control lists (ACLs) in addition to
files. SONAS file system implements NFSv4 access control lists (ACLs) in addition to
traditional ACL support.

SONAS ACLs are based on the POSIX model. Access control lists (ACLs) extend the base
permissions, or standard file access modes, of read (r), write (w), and execute (x) beyond the
three categories of file owner, file group, and other users, to allow the definition of additional
users and user groups. In addition, SONAS introduces a fourth access mode, control (c),
which can be used to govern who can manage the ACL itself.
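
As an illustration of this base permission model, a traditional GPFS-style ACL listing might look like the following sketch; the owner, group, and named users are placeholders, and the exact listing format in your SONAS release may differ.

   #owner:projadmin
   #group:projects
   user::rwxc
   group::rwx-
   other::r-x-
   mask::rwxc
   user:alice:rw--
   group:auditors:r-x-

Here the fourth position in each permission string is the control (c) mode that governs who may change the ACL itself.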

Exporting or sharing the SONAS file system


The SONAS file system is exported so that SONAS users can access it through NFS, CIFS, FTP or HTTPS by means of the clustered capability of the SONAS Cluster Manager. The SONAS Cluster Manager function works in conjunction with the SONAS file system to provide clustered NFS, clustered CIFS, clustered FTP, and clustered HTTPS. With the SONAS Cluster Manager, SONAS provides a super-scalable, high performance file system capability with simultaneous access to a common set of data from multiple interface nodes.

The SONAS file system works in conjunction with the rest of the SONAS Software to provide
a comprehensive, integrated set of storage management tools including monitoring of file
services, load balancing and IP address fail over.

File system high availability


The SONAS clustered file system architecture provides high availability and parallel cluster fault tolerance. The SONAS file system provides for continuous access to data, even if cluster nodes or storage systems fail. This is accomplished through robust clustering features together with internal or external data replication.

The SONAS file system continuously monitors the health of the file system components. If
failures are detected, appropriate recovery action is taken automatically. Extensive logging
and recovery capabilities are provided which maintain metadata consistency when nodes
holding locks or performing services fail.

Internal data replication, SONAS Software RAID-1 mirroring, can optionally be configured to
provide further protection over and above the SONAS hardware storage redundancy and
RAID. In addition, the SONAS file system automatically self-replicates and internally mirrors
file system journal logs and metadata to assure hot failover and redundancy and continuous
operation of the rest of the file system, even if all paths to a disk or storage pod fail.

SONAS file system Information lifecycle management (ILM)


The SONAS file system provides the foundation for the Data Management services that will
be discussed in “SONAS data management services” on page 107. SONAS file system is
designed to help achieve data lifecycle management efficiencies through policy-driven
automation and tiered storage management.


The SONAS file system implements the concept of logical storage pools, filesets and
user-defined policies to provide the ability to better match the cost of your storage to the value
of your data.

SONAS logical storage pools allow you to create groups of disks within a file system. Using
logical storage pools, you can create tiers of storage by grouping disks based on
performance, locality or reliability characteristics. For example, one pool could be high
performance SAS disks and another more economical SATA storage.

These types of internal logical storage pools are the constructs within which all of the data
management is done within SONAS. In addition to internal storage pools, SONAS supports
external storage pools, via an external Tivoli Storage Management HSM server.

Standard, commonly available Tivoli Storage Manager skills and servers are used to provide
this HSM function, and especially helps those customers who are already using TSM to
further leverage their TSM investment.

When moving data to an external pool, SONAS file system handles all the metadata
processing through the SONAS high performance scan engine, and then hands a list of the
data to be moved to the TSM Server for backup, restore, or HSM to external storage on any
of the TSM supported external storage devices, including external disk storage,
de-duplication devices, VTLs, or tape libraries for example. Data can be retrieved from the
external HSM storage pools on demand, as a result of an application opening a file.

SONAS file system provides the concept of a fileset, which is a sub-tree of the file system
namespace and provides a way to partition the global namespace into smaller, more
manageable units. Filesets provide an administrative boundary that can be used to set quotas
and be specified in a user defined policy to control initial data placement or data migration.

Data in a single fileset can reside in one or more storage pools. Where the file data resides
and how it is migrated is based on a set of SONAS file system rules in a user defined policy.

There are two types of user defined policies in SONAS: file placement and file management.
File placement policies determine which storage pool file data is initially placed in. File
placement rules are determined by attributes known when a file is created such as file name,
user, group or the fileset.

Examples include: place all files that end in .avi in the platinum storage pool, place all files created by the CEO in the gold storage pool, or place all files in the fileset ‘development’ in the bronze pool.
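
As a minimal sketch, placement rules corresponding to these examples could be written as follows; the pool names, the fileset name, and the numeric user ID for the CEO are illustrative assumptions.

   /* Illustrative placement rules -- pool, fileset and user ID values are assumptions */
   RULE 'avi'     SET POOL 'platinum' WHERE lower(NAME) LIKE '%.avi'
   RULE 'ceo'     SET POOL 'gold'     WHERE USER_ID = 1001
   RULE 'dev'     SET POOL 'bronze'   FOR FILESET ('development')
   RULE 'default' SET POOL 'system'

The final rule acts as a catch-all so that any file not matched by an earlier rule is placed in the system pool.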

Once files exist in a file system, SONAS file management policies allow you to move, change
the replication status or delete files. You can use file management policies to move data from
one pool to another without changing the file’s location in the directory structure.

File management policies can also be used to change the replication (mirroring) status at the file level, allowing fine-grained control over the space used for data availability. You can use a policy that says: replicate all files in /database/payroll which have the extension *.dat and are greater than 1 MB in size to storage pool #2.
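
A hedged sketch of such a rule follows; the path, pool names, and size threshold are taken from the example above, and the exact attribute names available in your release should be checked before use.

   /* Illustrative file management rule -- names are assumptions */
   RULE 'payroll-repl' MIGRATE FROM POOL 'pool_1' TO POOL 'pool_2' REPLICATE(2)
        WHERE PATH_NAME LIKE '/database/payroll/%'
          AND lower(NAME) LIKE '%.dat'
          AND KB_ALLOCATED > 1024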

In addition, file management policies allow you to prune the file system, deleting files as
defined by policy rules. File management policies can use more attributes of a file than
placement policies because once a file exists there is more known about the file.

In addition to the file placement attributes you can now utilize attributes such as last access
time, size of the file or a combination of user and file size. This may result in policies like:
delete all files with a name ending in .temp that have not been accessed in 30 days, move all files that are larger than 2 GB to pool2, or migrate all files owned by Sally that are larger than
4GB to the SATA storage pool. Rules can include attributes related to a pool instead of a
single file using the threshold option. Using thresholds you can create a rule that moves files
out of the high performance pool if it is more than 80% full, for example. The threshold option
comes with the ability to set high, low, and pre-migrate thresholds. This means that GPFS
begins migrating data at the high threshold, until the low threshold is reached. If a pre-migrate
threshold is set GPFS continues to copy data to TSM until the pre-migrate threshold is
reached. This allows the data to continue to be accessed in the original pool until it is quickly
deleted to free up space the next time the high threshold is reached.

SONAS file system policy rule syntax is based on the SQL 92 syntax standard and supports
multiple complex statements in a single rule enabling powerful policies. Multiple levels of
rules can be applied because the complete policy rule set is evaluated for each file when the
policy engine executes.

All of these data management functions will be described in more detail in “SONAS
data management services” on page 107.

SONAS cluster two-tier configuration


The SONAS file system is built in a two-tiered architecture, wherein the interface nodes are
not directly attached to the storage. In this configuration, the SONAS file system makes use of the
GPFS-based network block device capability. SONAS uses the GPFS-provided block level
interface, called the Network Shared Disk (NSD) protocol, which operates over the internal
SONAS InfiniBand network.

In this configuration, the interface nodes are 'GPFS Network Shared Disk (NSD) clients', in
that they make GPFS NSD storage read and write requests over the internal InfiniBand
network to the storage nodes, which are 'GPFS Network Shared Disk (NSD) servers'. The
internal SONAS file system thus transparently handles I/O requests between the interface
nodes and the storage nodes.

SONAS clusters use this GPFS-based Network Shared Disk (NSD) protocol to provide high
speed data access from the interface nodes to the storage nodes. Storage is direct attached
to the storage nodes (i.e. the GPFS NSD storage servers). Each storage node (i.e. NSD
server) provides storage serving to its own particular section of the overall SONAS file system disk collection. Note that every SONAS storage pod has two storage nodes (i.e. two
GPFS NSD servers) that provide dual paths to serve each disk, thus avoiding single points of
failure in the disk hardware.

The internal SONAS file system cluster uses the internal InfiniBand network for the transfer of
both file system control information between all nodes, as well as for all data transfer between
the interface nodes (GPFS NSD clients) and the storage nodes (GPFS NSD servers).

The SONAS file system internal architecture is shown in Figure 3-18.


(Figure shows the two-tier architecture: interface nodes acting as GPFS NSD clients connect over the internal InfiniBand network to storage nodes acting as GPFS NSD servers; each storage pod contains two storage nodes attached to high density storage arrays.)

Figure 3-18 SONAS file system two-tier architecture with internal GPFS NSD clients and NSD servers

As shown above, the fact that the disks are remote to the interface nodes is transparent both to the interface nodes themselves and to the users. The storage nodes (NSD server nodes) are responsible for serving disk data blocks across the internal InfiniBand network.

The SONAS file system is thus composed of storage pod 'building blocks' for storage, in
which a balanced number of storage nodes, NSD storage servers, are preconfigured within
the SONAS storage pod, to provide optimal performance from the disk storage.

The SONAS cluster runs on enterprise class commercial Intel-based servers - and these run
on a Linux-based kernel. The SONAS file system nodes use a native InfiniBand protocol
built on Remote Direct Memory Access (RDMA) technology to transfer data directly between
the interface node NSD client memory and the storage node NSD server memory thus
exploiting the 20 Gbit/sec per port data transfer rate of the current SONAS internal InfiniBand
switches, maximizing throughput, and minimizing node CPU utilization.

SONAS file system summary


The SONAS file system is at the heart of the SONAS Software stack. Based on IBM GPFS,
SONAS file is highly scalable:
򐂰 Symmetric, scalable software architecture
򐂰 Distributed metadata management
򐂰 Allows for incremental scaling of system in terms of nodes and disk space with ease
򐂰 Based on GPFS technology, which today runs on tens of thousands of nodes in a single cluster

SONAS file system is a high performance file system


򐂰 Large and tunable block size support with wide striping across nodes and disks
򐂰 Parallel access to files from multiple nodes
򐂰 Supports byte-range locking and distributed token locking management
򐂰 Efficient deep prefetching: read ahead, write behind
򐂰 Recognize access patterns with adaptable mechanisms
򐂰 Highly multi threaded

SONAS file system is highly available and fault tolerant


򐂰 Data protection mechanisms include journaling, replication, and mirroring


򐂰 Internal peer-to-peer global cluster heartbeat mechanism to recover from multiple disk,
node, connectivity failures
򐂰 Recovery software mechanisms implemented in all layers

Let’s now examine the SONAS data management services in more detail.

3.6 SONAS data management services


We now turn our attention to describing the operational concepts of SONAS data
management that uses the central policy engine to automatically place and move files on
tiered storage using the integrated HSM and ILM capabilities.

The SONAS Software functions that we will examine in this section are shown in Figure 3-19 on page 107. These services are supplied by the policy and scan engines in the parallel file system, and by the data movement, copy, and replication functions such as HSM and ILM, backup, and replication.

(Figure shows the SONAS Software stack: CIFS, NFS, FTP, and HTTPS access through the SONAS Cluster Manager; HSM and ILM, backup and restore, snapshots and replication, monitoring agents, GUI/CLI management interfaces, and security, built around the parallel file system with its policy and scan engines, running on Enterprise Linux on IBM servers.)

Figure 3-19 SONAS Software data management services components

We will also discuss the role and usage of Tivoli Storage Manager (TSM) together with
external TSM servers to provide accelerated backup and restore, and to provide HSM to
external storage pools. Finally, we will also describe local data resiliency using Snapshots,
and remote resiliency using asynchronous replication.

3.6.1 SONAS - Using the central policy engine and automatic tiered storage
SONAS uses policies to control the lifecycle of files that it manages and consequently control
the costs of storing data by automatically aligning data to the appropriate storage tier based
on the policy rules set up by the SONAS administrator. Figure 3-20 illustrates a tiered storage environment that contains multiple storage tiers; each tier has specific performance characteristics and associated costs. For example, the pool poolfast contains fast and expensive disk, whereas pooltape contains relatively inexpensive tapes.


Figure 3-20 Policy-based storage tiering

Evidently performance comes at a price and is the main cost differentiator in storage acquisitions. For this reason, setting policies can help control costs by using the appropriate storage tier for specific sets of data and making room on the more expensive tiers for new data with higher performance requirements.

The SONAS policy implementation is based on and uses the GPFS policy implementation.
Files reside in SONAS storage pools, and policies are assigned to files and control the
placement and movement of files between storage pools.

A SONAS policy consists of a collection of rules, and the rules control what actions are executed and against which files the actions are performed. The smallest entity controlled by a rule is therefore a file. SONAS has three types of policies:
Initial file placement These rules control the placement of newly created files in a specific
storage pool.
File management These rules control movement of existing files between storage pools
and the deletion of old files. Migration policies are used to transfer
data between the SONAS storage pools and to the external HSM
storage pool and to control replication of SONAS data.
Restore of file data These rules control what happens when data gets restored to a
SONAS file system.

SONAS policy rules are single statements that define an operation, such as migrating or replicating a file. There are three uses for rules:
򐂰 Initial file placement
򐂰 File management
򐂰 Restoring file data

Policy rules are SQL-like statements that specify conditions that, when true, cause the rule to be applied. Conditions that cause GPFS to apply a rule include:
򐂰 Date and time when the rule is evaluated, that is, the current date and time
򐂰 Date and time when the file was last accessed
򐂰 Date and time when the file was last modified
򐂰 Fileset name
򐂰 File name or extension
򐂰 File size
򐂰 User ID and group ID


SONAS evaluates policy rules in order, from first to last, as they appear in the policy. The first
rule that matches determines what is to be done with that file. Example 3-1shows sample rule
syntax:

Example 3-1 Sample rule syntax


RULE 'mig1' MIGRATE FROM POOL 'pool_1'
     THRESHOLD(90,80,70) WEIGHT(KB_ALLOCATED) TO POOL 'pool_2'

RULE 'del1' DELETE FROM POOL 'pool_1'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30)
     AND lower(NAME) LIKE '%.tmp'

Each SONAS filesystem is mapped to storage pools. The default pool for a filesystem is the
system pool, also called pool1. A file system can have one or more additional storage pools
after the system pool.

Each storage pool is associated with one or more NSDs or LUNs. SONAS also manages
external storage pools. An external storage pool is not mapped to standard NSD devices; it is
a mechanism for SONAS to store data in an external manager such as TSM. SONAS
interfaces with the external manager using a standard protocol called Data Management API
(DMAPI) that is implemented in the SONAS GPFS filesystem. Policies control the location of
files among storage pools in the same filesystem. Figure 3-21 shows a conceptual
representation of a filesystem, pools and NSDs:

Figure 3-21 SONAS filesystem and policies

A filesystem is managed by one active policy, policy1 in the example. The initial file
placement policies control the placement of new files. File placement policies are evaluated
and applied at file creation time. If placement policies are not defined all new files are placed
in the system storage pool. Migration and deletion rules, or file management rules, control the
movement of files between SONAS disk storage pools and external storage pools like TSM
HSM and the deletion of old files. Migration and deletion rules can be scheduled using the
cron scheduler. File migration between pools can also be controlled by specifying thresholds.
Figure 3-22 on page 110 shows a conceptual representation of these rules.


Figure 3-22 File placement and migration rules

SONAS introduces the concept of tiered and peered storage pools:


Tiered Pools      The pools that NSDs are assigned to can be tiered in a hierarchy
                  using GPFS file management policies. These hierarchies are typically
                  used to transfer data between a fast pool and a slower pool
                  (Pool1 -> Pool2) using migration. When coupled with HSM, data flows
                  in a hierarchy from Pool1 -> Pool2 -> Pool3 (HSM).
Peered Pools      The pools that NSDs are assigned to can be operated as peers in a
                  hierarchy using GPFS initial file placement policies. These policies
                  allow files to be placed according to rules in either the fast pool
                  Pool1 or the slower pool Pool2. When coupled with HSM, data flows to
                  either Pool1 or Pool2 based on initial file placement policies, then
                  from both Pool1 and Pool2 the data flows to Pool3 (HSM) based on
                  file management policies.

To simplify implementation of HSM and storage pooling, SONAS provides templates for
several standard usage cases. Customized cases can be created from the default templates
by using the SONAS CLI. The standard usage cases, also called ILM profiles, are shown in
the diagram in Figure 3-23 on page 111.


(Figure shows the six standard ILM profiles:
򐂰 Default pool: all NSDs in the same pool; new files are placed in pool1
򐂰 Peered pools: placement policies only; new files are placed in pool1 or pool2
򐂰 Tiered pools: files are placed in pool1 and then moved to pool2
򐂰 Default pool and HSM: files are placed in pool1 and then moved to the TSM HSM pool3
򐂰 Peered pools and HSM: placement policies for pool1 and pool2, and migration from pool1 and pool2 to pool3
򐂰 Tiered pools and HSM: files are placed in pool1, then migrated to pool2 and then to the TSM HSM pool3)

Figure 3-23 Standard ILM policy profiles

The standard ILM policy profiles are based on the assumption that pool1 is the fastest pool, using the fastest storage devices such as SAS disks, and pool2 is based on less expensive disk such as SATA. SONAS GPFS metadata should always reside in the fastest storage pool (pool1 in our examples), as it is the data that has the highest I/O requirements when SONAS GPFS file system scan operations are performed. For additional information on
configuration of SONAS policy rules refer to “SONAS policies” on page 157.

3.6.2 Using and configuring Tivoli Storage Manager HSM with SONAS basics
The use of SONAS HSM provides the following advantages:
򐂰 It frees administrators and users from manual file system pruning tasks, and defers the
need to purchase additional disk storage.
򐂰 It allows the TSM HSM to extend the SONAS disk space and automates the movement of
seldom-used files to and from external near line storage
򐂰 It allows pre-migration, a method that sends a copy of the file to be migrated to the TSM
server prior to migration, allowing threshold migration to quickly provide space by simply
stubbing the premigrated files.

To use the TSM HSM client, you must provide a TSM server external to the SONAS system; the server is accessed through the Ethernet connections on the interface nodes.

See “SONAS and Tivoli Storage Manager integration” on page 118 for more information on
the configuration requirements and connection of a SONAS and Tivoli Storage Manager
server.

The current version of SONAS requires that HSM be configured and managed using the CLI because, at the time of writing, GUI support for HSM is not present. HSM migration work may cause additional overhead on the SONAS interface nodes, especially in environments that regularly create large amounts of data and want to migrate it early, and so care should be taken when
planning the timing and frequency of migration jobs.

When using HSM space management on a filesystem, each file in the filesystem can be in
one of three different states:
򐂰 resident when the file resides on disk in the SONAS appliance
򐂰 premigrated when the file resides both on the disk in the SONAS and in TSM HSM
򐂰 migrated when the file resides only in TSM

Files are created and modified on the SONAS filesystem and when they are physically
present in the filesystem they are said to be in the resident state.

Files in an HSM managed filesystem can be migrated to TSM HSM storage for a variety of
reasons, such as when a predefined file system utilization threshold is exceeded. Migrated
files are copied to TSM and replaced by a stub file that has a preset size. Using a stub file
leaves a specified amount of file data at the front of the file on the SONAS disk, allowing it to
be read without triggering a recall. In a SONAS GPFS environment, a small file that is less
than the 1/32 of the filesystem blocksize, or one subblock, can become larger after an HSM
migration because SONAS GPFS adds meta information to the file during the migration.
Because another block on the file system is allocated for the meta information, this increases
the space allocated for the file. If a file system is filled to its maximum capacity with many
small files, it is possible that the file system can run out of space during the file migration.

A recall is triggered when the first byte of storage not on the SONAS disk is accessed. When
a migrated file is accessed, it is recalled from the external TSM storage into the SONAS
storage. If you have files with headers that will be periodically accessed and do not wish to
trigger recalls on those header accesses, you should use the appropriate stub file size to
ensure that an appropriate amount of file header stays on the SONAS disk.

Note: At the time of writing, SONAS only supports a stub file size of zero, so migrated files will be recalled as soon as the first byte of the file is accessed. Care should therefore be taken with the kind of utilities you run on HSM-enabled filesystems.

As data is accessed via CIFS or NFS, when a migrated file is opened and a byte of data that
is not in the SONAS cache is accessed, that access triggers a Data Management API
(DMAPI) event in the SONAS. That event is sent to the primary TSM client, that resides on
one of the interface nodes, and it triggers a recall. If the primary TSM client is not overloaded,
it issues the recall itself, otherwise it sends the recall to another TSM client node. In practice
most recalls will be performed by the primary TSM client interface node.

Because a recall from physical tape requires waiting for cartridge fetching, tape drive loading
and tape movement to the desired file, physical tape recalls can take significant numbers of
seconds to start, so the application needs to plan for this delay.

The TSM requirements for HSM are:


򐂰 You must supply a TSM server that can be accessed by the SONAS interface nodes
򐂰 You must ensure that sufficient network bandwidth and connectivity exists between the
interface nodes they select to run HSM on to the external storage server they are
providing.
򐂰 The TSM server has to be prepared for use by the SONAS system
򐂰 A TSM storage pool to store the migrated data is set up
򐂰 Server time should be synchronized with the SONAS system; both systems should access the same NTP server
򐂰 TSM server authentication should be on
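
To illustrate the kind of preparation this implies on the TSM server, the following administrative command sequence is one possible sketch; the device class, library, storage pool, node name, and password are all placeholder assumptions that a TSM administrator would replace with real values.

   set authentication on
   define devclass ltoclass devtype=lto library=lib1
   define stgpool hsmpool ltoclass maxscratch=50
   register node sonas_hsm secretpw domain=standard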


HSM can be added to a filesystem at the time of filesystem creation or at a later time.

Note: HSM cannot be removed from a file system through CLI commands; services need to be engaged.

The diagram in Figure 3-24 shows the steps that need to be performed to add HSM to a
SONAS filesystem using the SONAS CLI.

(Figure shows the command flow: mkfs/chfs to create or change the file system, startbackup to verify the TSM connection, cfghsmnode to create the TSM parameters, cfghsmfs to connect the file system to HSM, mkpolicy to create the policy, setpolicy to apply the policy to the file system, and mkpolicytask to schedule the policy.)

Figure 3-24 Steps for adding HSM to a filesystem

The mkfs and chfs commands are used to create a new filesystem or modify a filesystem for
HSM usage, as these commands allow you to add multiple NSDs and storage pools to the
filesystem.

The cfghsmnode command is used to validate the connection to TSM and sets up HSM
parameters. The startbackup command can optionally be used to verify the TSM connection
for a specific filesystem; if startbackup executes correctly, you know you have a valid
connection to TSM for use by HSM.

The cfghsmfs command adds HSM support for a given filesystem; it enables SONAS CIFS
component HSM support and stores HSM configuration information to the CTDB registry.

You then create a policy with the mkpolicy command and set the policy for a filesystem with
the setpolicy command. For more information on creating and managing policies see Figure
10-149, “Call Home test” on page 425.

After creation of the policy you can schedule the policy execution with the SONAS scheduler
by using the mkpolicyrule command.

SONAS HSM also provides the lshsmlog command to view HSM errors and the lshsmstatus
command to verify HSM execution status.
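
Putting these commands together, a hedged end-to-end sketch of enabling HSM on a filesystem might look like the following sequence. The argument placeholders in angle brackets stand for options whose exact syntax must be taken from the SONAS CLI reference; only the command names themselves are taken from the description above.

   cfghsmnode   <TSM server connection parameters>   # validate the TSM connection and set HSM parameters
   cfghsmfs     <filesystem>                         # add HSM support to the filesystem
   mkpolicy     <policy name> <rules>                # create the policy containing the migration rules
   setpolicy    <filesystem> <policy name>           # apply the policy to the filesystem
   mkpolicytask <filesystem> <schedule>              # schedule periodic policy execution
   lshsmstatus                                       # verify HSM execution status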


3.7 SONAS resiliency using snapshots


In this section, we will overview how SONAS Software implements space-efficient Snapshots.
Snapshots are a standard, included feature of the SONAS Software and do not require any
additional licensing. SONAS Snapshot enables online backups to be maintained, providing
near instantaneousness access to previous versions of data without requiring complete,
separate copies or resorting to offline backups.

SONAS Snapshots can be scheduled or performed by authorized users or by the SONAS


administrator, with the capability of up to 256 active Snapshots, per file system, at any one
time.

SONAS Snapshot technology makes efficient use of storage by storing only block-level
changes between each successive Snapshot. Only the changes made to the original file
system consume additional physical storage, thus minimizing physical space requirements
and maximizing recoverability.

At the current release level, SONAS Snapshot is a read-only, point-in-time consistent version
of an entire SONAS file system, frozen at a point in time:
– Each SONAS file system can maintain up to 256 Snapshots concurrently
– Snapshots only consume space when the file system changes
– Snapshots use no additional disk space when first taken
– Snapshots are enforced to be consistent across the file system to a single point in time
– Snapshots can be taken manually or automatically on a schedule
– For CIFS users, SONAS Snapshots are readily accessible via Microsoft Volume
Shadow Services (VSS) integration into the Windows Explorer interface

Snapshots can be made by administrators with proper authority through the SONAS
Management GUI, or through the SONAS Command Line Interface (CLI). The snapshot appears as a special directory called .snapshots located in the filesystem root directory, as shown in Figure 3-25 on page 114.

Before the snapshot, the file system contains:
   /fs1/file1
   /fs1/file2
   /fs1/subdir1/file3
   /fs1/subdir1/file4
   /fs1/subdir2/file5

After the snapshot, a read-only copy of the directory structure and files appears under the .snapshots directory; only changes to the original files consume disk space:
   /fs1/file1
   /fs1/file2
   /fs1/subdir1/file3
   /fs1/subdir1/file4
   /fs1/subdir2/file5
   /fs1/.snapshots/snap1/file1
   /fs1/.snapshots/snap1/file2
   /fs1/.snapshots/snap1/subdir1/file3
   /fs1/.snapshots/snap1/subdir1/file4
   /fs1/.snapshots/snap1/subdir2/file5

Figure 3-25 SONAS Snapshot appears as a special directory in the file system


Snapshots of a SONAS file system are read-only; changes are made only to the active (that is, normal, non-snapshot) files and directories. Snapshots are only made of active file systems; you cannot make a snapshot of an existing snapshot. Individual files, groups of files,
or entire directories can be restored or copied back from Snapshots. For additional
information on configuring snapshots refer to “Snapshots” on page 188.
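
As a brief illustration, a snapshot could be taken and then used to recover a file as sketched below; the CLI command names mksnapshot and lssnapshot and their arguments are assumptions to be verified against the SONAS CLI reference, while the .snapshots path follows the structure shown in Figure 3-25.

   mksnapshot fs1      # take a snapshot of file system fs1 (illustrative syntax)
   lssnapshot fs1      # list the existing snapshots of fs1

   # From a client that has the file system mounted, an accidentally deleted file can be copied back out of the snapshot:
   cp /fs1/.snapshots/snap1/subdir1/file3 /fs1/subdir1/file3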

3.7.1 Integration with Windows


SONAS Snapshot supports the Microsoft Volume Shadow Copy Service (VSS) function to allow display of older file and folder versions from within the Windows Explorer. Snapshots are exported to Windows CIFS clients via the Volume Shadow Copy Service (VSS) API.

This means that SONAS Snapshot data can be accessed and copied back through the Previous Versions dialog in the Microsoft Windows Explorer, as shown in Figure 3-26 on page 115.

(Figure shows the Windows Explorer Previous Versions dialog; its Copy and Restore buttons are used to copy or restore the snapshot version of a file or folder.)

Figure 3-26 SONAS Snapshots are accessible for Windows CIFS users via Windows Explorer

SONAS Snapshots are intended as a point in time copy of an entire SONAS file system, and
preserves the contents of the file system at a single point in time. The snapshot function
allows a backup or mirror program to run concurrently with user updates and still obtain a
consistent copy of the file system as of the time that the snapshot was created. SONAS
Snapshots also provide an online backup capability that allows easy recovery from common
problems such as accidental deletion of a file, and comparison with older versions of a file.

3.8 SONAS resiliency using asynchronous replication


In this section, we will overview how SONAS asynchronous replication is designed to provide
a bandwidth-friendly mechanism that is tolerant of telecommunication bandwidth shortages.
This implementation is space efficient, transferring only the changed blocks of a file, not the
whole file again. Resource efficiency and high performance is achieved by using multiple
interface nodes in parallel, to transfer the data.


SONAS asynchronous replication can also be useful for the idea of ‘backup-less backup’
disaster recovery, in other words, using direct disk to disk incremental change replication to a
disaster recovery remote site. This is particularly important when the raw amount of data for
backup/restore for large amounts of storage, is so large that a tape restore at a disaster
recovery site may be unfeasible from a time-to-restore standpoint.

In this section, we will discuss the SONAS asynchronous replication capability that is
designed to address these requirements. At a high level, SONAS asynchronous replication
works as follows.

The first step is to execute a central policy engine scan for async replication. The SONAS
high performance scan engine is used for this scan. As part of the asynchronous replication,
an internal snapshot will be made of both the source file system and the target file system.
The first step is shown in Figure 3-27.

(Figure shows step 1: the policy engine within the IBM Scale Out NAS global namespace reads the replication policies, and snapshots are taken of the source file systems and of the target file systems on the remote Scale Out NAS.)

Figure 3-27 SONAS async replication step 1 - execute a policy, makes snapshots

The next step is to make a mathematical hash of the source and target snapshots, and
compare them, as shown in Figure 3-28 on page 116.

(Figure shows step 2: the source and target snapshots are scanned and their mathematical hashes are compared to determine the incremental changes to send.)

Figure 3-28 SONAS async replication step 2 - compare mathematical hash of snapshots


The final step is to exploit the parallel data transfer capabilities of SONAS by having multiple
nodes participate in the transfer of the async replication changed blocks to the target remote
file systems, as shown in Figure 3-29.

(Figure shows step 3: the changed data is transmitted in parallel by multiple interface nodes from the source file system snapshots to the target file systems on the remote Scale Out NAS.)

Figure 3-29 SONAS async replication step 3 - transfer data using multiple interface nodes

The internal snapshot at the source side assures that the data being transmitted maintains integrity and consistency, and represents a single point in time. The internal snapshot at the target is
there to provide a backout point in time capability, if for any reason the drain of the changes
from source to target fails before it is complete.

Let’s review a few more details about the SONAS asynchronous replication.

SONAS asynchronous replication is designed to cope with connections that provide low
bandwidth, high latency and low reliability. The basic steps of SONAS asynchronous replication are:
򐂰 Take a snapshot of both the local and remote file system(s). This ensures first that we are
replicating a frozen and consistent state of the source file system.
򐂰 Collect a file path list with corresponding stat information, comparing the two snapshots with a mathematical hash, in order to identify changed blocks
򐂰 Distribute the changed file list to a specified list of source interface node(s)
򐂰 Run a scheduled process that performs rsync operations on the set of interface nodes, for
a given file list, to the destination SONAS. Rsync is a well-understood open source utility that will pick up the changed blocks on the source SONAS file system, stream those changes in parallel to the remote site, and write them to the target SONAS file system.
򐂰 The snapshot at the remote SONAS system ensures that a safety fallback point is available should there be a failure in the drain of the new updates.
򐂰 Once the drain is complete, the remote file system is ready for use.
򐂰 Both snapshots are automatically deleted after a successful replication run.

The target SONAS system is an independent SONAS cluster that may be thousands of miles
away.

At the current release level SONAS R1.1.1, asynchronous replication is available for
replicating incremental changes at the file system level to one other site. Asynchronous
replication is done using an IBM enhanced and IBM supported version of open source 'rsync'.
The enhancements include the ability to have multiple SONAS nodes in parallel work on the
rsync transfer of the files.

The asynchronous replication is unidirectional; changes on the target site are not replicated back. The replication schedule is configured through the SONAS GUI or via the CLI. The minimal replication interval will vary depending on the amount of data and the number of files to be sent. For additional information on how to configure
asynchronous replication refer to “Local and remote replication” on page 193.

3.9 SONAS and Tivoli Storage Manager integration


In this section we will provide more SONAS configuration details on how the SONAS and
Tivoli Storage Manager (TSM) work together and are configured. You may choose to use the
SONAS-specific integration and exploitation with Tivoli Storage Manager for either or both of
these two functions:
򐂰 Protect the data in SONAS with backup and restore functionality to guarantee data
availability in case of data corruption, accidental deletion or hardware loss
򐂰 Migrate low access data from SONAS to TSM managed storage devices such as tape to
free up space inside the SONAS system

The SONAS to TSM integration is designed to support accelerated backups on the file-level
of entire GPFS filesystems using a TSM client to an external TSM server, and to provide a
file-level restore capability.

The SONAS to TSM integration also offers SONAS customers the ability to perform
Hierarchical Storage Management (HSM) to external TSM managed storage devices to free
up space in the SONAS system. Files that have been moved by HSM to external, TSM
managed, storage are called migrated files. When a migrated file is accessed SONAS
initiates a recall operation to bring the file back from TSM storage to SONAS disk and this
recall is transparent to the SONAS client accessing the file; the client will only notice a delay
proportional to the time required to recall the file from TSM.

The TSM to SONAS backup integration is file based, which means that TSM performs backup and restore at the file level and handles individual files as individual TSM objects. This offers the flexibility of incremental backup and gives us the ability to restore individual files. With this architecture, the TSM database needs to be sized appropriately, because it will hold an entry for each file being backed up.

The SONAS system runs the TSM clients on all or a subset of interface nodes. These
interface nodes connect to an external, customer supplied, TSM server through the customer
LAN network. The TSM server contains a database that inventories all files that have been
backed up or migrated to TSM and owns the storage devices where backed up and migrated
data is stored. Figure 3-30 shows a diagram of the SONAS and Tivoli Storage Manager
configuration.


(Figure shows the configuration: the TSM client code is preinstalled on the SONAS interface nodes, which connect over Ethernet only to a TSM server external to SONAS; the management node, storage pod, and SONAS disk remain inside the SONAS system.)

Figure 3-30 SONAS and Tivoli Storage Manager configuration

As compared to normal, conventional backup software, SONAS and TSM integration is


designed to provide significantly accelerated backup elapsed times or high performance HSM
to external storage, by exploiting the following technologies:
򐂰 The fast SONAS scan engine is used to identify files for TSM to back up or migrate. This is much faster compared to standard TSM backups, or other conventional backup software, which need to traverse potentially large filesystems and check each file against the TSM server. The SONAS scan engine is part of the SONAS file system; it knows exactly which files to back up and migrate and builds a list of files, the filelist, to back up or migrate. The list is then passed to TSM for processing. In contrast, the standard operation of TSM requires that it traverse all files in the file system and send the information to the TSM server to determine which files need a backup and which files are already present in the file server.
򐂰 Multiple SONAS interface nodes can be configured to work in parallel so that multiple
TSM clients can stream data to the TSM server at an accelerated rate.
򐂰 The SONAS Software will distribute parts of the filelist as backup jobs to multiple TSM
clients configured on a given set of interface nodes. Each interface node then operates in
parallel on its own subset of the files in the filelist. Each TSM process can establish
several sessions to the TSM server.

TSM customers can make use of their existing TSM servers to back up SONAS, if the server has enough capacity to accommodate the new workload. Configuring SONAS to perform TSM functions requires only a few commands; these commands need to be issued both on the TSM server and the SONAS system. These commands perform both the initial configuration and the scheduling of the backup operations. HSM migration operations are configured
separately using the policy engine, as discussed in “SONAS data management services” on
page 107.

In SONAS, TSM backup is performed over the LAN through one or more interface nodes, and these connect to one or more TSM servers. It is not possible at this time to do LAN-free backup from SONAS directly to storage devices managed by the TSM server.


For more information on how to configure SONAS with TSM refer to “Backup and restore of
file data” on page 181.

3.9.1 General TSM and SONAS guidelines


A SONAS system can accommodate large quantities of data, both in terms of space used and in terms of number of files. SONAS supports up to 256 filesystems, and each filesystem can have hundreds of millions of files. Whether you are considering backup or HSM management
of the SONAS primary space, you have to take into account your expected workload
characteristics.

The TSM server has a DB2 database (DB) that inventories all files that are stored in TSM, and each file requires around 1 KB of TSM DB space. As a general rule the TSM DB should be limited to a size of 1 TB, and consequently it can accommodate around 1 billion files. Currently an individual SONAS filesystem can be backed up by only a single TSM server; you cannot split a filesystem to back up to different TSM servers. You can configure multiple or all filesystems to use the same TSM server. If your filesystems have large numbers of files, in the order of a billion or more, you should plan the filesystem to TSM server association so as not to overwhelm the TSM server with files, and you will probably require multiple TSM servers.

You should also consider the required throughput of the backup system in terms of files and amount of data per unit of time. Assume you have 10 TB of data, an average file size of 100 KB and 100 million files, and a daily change rate of 10%. This gives 1 TB/day and 10 million files to back up. If you have a 4 hour backup window, your backup environment will need to accommodate something like 250 GB/h or 70 MB/sec, and 2.5 million files/hour or around 700 files/sec. The data rate can be easily accommodated, but the number of files to handle could be a challenge and may require multiple TSM servers.

TSM manages multiple storage devices; these can be disk and tape technologies. Disk has good random and sequential performance characteristics and low latency. TSM disk storage can accommodate multiple read and write streams in parallel. You should also consider disk contention, as multiple parallel streams can cause disk contention, and the aggregate throughput can be less than that of a single stream.

Tape storage devices offer different characteristics; they can store data for long periods of time and are very energy efficient, as tapes consume no energy at rest. Current tape technologies have high sequential data rates in the order of 100-200 MB/sec. With TSM, each backup session uses an individual tape drive. Tapes are usually mounted automatically by a tape library; the mount time depends on the library model, but in general you can assume it to be between 40 and 120 seconds. Tapes then need to be positioned, and this can take around 30 seconds depending on the drive technology. During this time the application using the tape sees a delay. In the case of backup this delay is generally a small part of the total backup operation duration. In the case of HSM it is felt directly by the application that uses the file, because it waits until the data is recalled to SONAS disk.

We will give some general guidelines regarding the use of SONAS with TSM. These have to
be taken in the context of your specific data characteristics, such as file size and amount of data, and your workload requirements in terms of daily backup traffic and restore speed expectations.
򐂰 If you have many small files to back up on a daily basis and you need to send multiple backup streams to the TSM server, you should consider using a TSM disk pool as the primary pool to store data. If you configure the disk pool larger than the normal amount of data that gets backed up per backup run, so that all data first gets copied to disk, then no tape mount is required during a backup.


򐂰 Depending on the amount of data in SONAS, it might be necessary to have one dedicated
TSM server per filesystem considering that one SONAS filesystem could contain 2 billion
files.
򐂰 If you need to back up large files to TSM, say larger than 1 MB, then you may consider sending them directly to tape without storing them in a disk storage pool. You will need to configure as many tape drives as the number of parallel sessions you have configured to TSM in SONAS.
򐂰 When using SONAS HSM, which migrates data outside the SONAS environment, you should probably consider using tape as the final destination of the data, because using disk would defeat the purpose of migration.
򐂰 When using HSM to tape remember to plan for the application delay in accessing the data
because of the time required to mount and position the tape and then the time required to
recall the data to SONAS disk.

The TSM backup does not use the classical TSM backup process, which traverses the filesystem, compares the client contents with those on the server, and identifies the changes, as this would be time-consuming due to the interaction between the filesystem, the TSM client and the remote TSM server.

Instead the SONAS Software is called to use the high performance scan engine and the
policy engine to identify changes in the filesystem, and to generate the list of files that need to
be expired, and the list of files that need to be backed up.

Several scripts are provided with the SONAS Software to define the interface nodes involved
in the backup, the relationship of which filesystem needs to be backed up to which TSM
server, and to schedule, start, and stop backup and restore operations.
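
As an illustrative sketch only, a typical flow could resemble the following; apart from startbackup, which is described earlier in this section, the command names and their arguments are assumptions and must be checked against the SONAS CLI reference.

   cfgtsmnode  <TSM server> <interface nodes>   # define which interface nodes act as TSM clients (name is an assumption)
   cfgbackupfs <filesystem> <TSM server>        # associate a filesystem with a TSM server (name is an assumption)
   startbackup <filesystem>                     # start a backup run for the filesystem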

You should not consider the use of SONAS HSM with TSM as a replacement for backups. HSM should be viewed as an external storage extension of local SONAS disk storage. A TSM backup implies two concepts. The first is that the backup is a copy of the original file, regardless of where the original file is; that can be either inside a SONAS filesystem or in TSM external storage. The second is that the backup file can exist in multiple versions inside TSM storage, based on the TSM backup policies you configure. TSM backups will allow you to restore a file that has been damaged or lost, either because of deletion or logical corruption of the original file, or because of media failure either in SONAS storage or in TSM storage.

When a file is migrated using the TSM HSM server to the external TSM HSM storage, there is
still only one copy of the file available, because the original is deleted on the SONAS file
system itself, and replaced by the TSM/HSM stub file. Also, HSM with Tivoli Storage Manager
maintains only the current copy of the file, giving no opportunity to store multiple versions. In
comparison, TSM backup/archive (or typically any backup/archive software) gives you the full
ability to store multiple backup versions of a file, and to track and manage these backup
copies in an automated way.

It is Tivoli Storage Manager best practice to back up a file before the file is migrated by Tivoli Storage Manager HSM to external storage. With proper configuration, you can specify in TSM management classes that a file is not eligible for HSM migration unless a backup has been made first with the TSM backup-archive capability. Generally, an HSM managed file lifecycle implies file creation, the backup of the file shortly after creation, a period during which the file stays on disk, and later migration to TSM HSM storage. If the file becomes a candidate for migration very shortly after creation, one of the following two scenarios can occur:


򐂰 If you specify in TSM that migration requires backup, then the file will not be migrated until a backup cycle has successfully completed for the file. The file will be copied from SONAS to TSM two times: one time for backup and one time for migration.
򐂰 If you specify in TSM that migration does not require backup, then the file will be migrated, and a subsequent backup cycle will cause the file to be copied inside TSM from TSM HSM storage to TSM backup storage. The file will be copied from SONAS to TSM only one time, and the second copy will be made by the TSM server.

Note: If the ACL data of a premigrated file is modified, these changes are not written to the TSM server when the file is subsequently migrated. To avoid losing the modified ACL data, use the option migraterequiresbackup yes. This setting does not allow migration of files whose ACL data has been modified while no current backup version exists on the server.

You can back up and migrate your files to the same IBM Tivoli Storage Manager server or to
different IBM Tivoli Storage Manager servers. If you back up and migrate files to the same
server, the HSM client can verify that current backup versions of your files exist before you
migrate them. For this purpose, the same server stanza for backup and migration must be
used. For example, if you are using the defaultserver and migrateserver TSM options, they
must both point to the same server stanza within the TSM dsm.sys file. You cannot point to
different server stanzas, even if they are pointing to the same Tivoli Storage Manager server.
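
As a minimal sketch of such a configuration, a dsm.sys fragment on an interface node could look like the following; the stanza name tsmsrv1, the server address, and the node names are hypothetical, and both the defaultserver and migrateserver options point to the same stanza:

* dsm.sys fragment - names and addresses are illustrative only
DEFAULTSERVER    tsmsrv1
MIGRATESERVER    tsmsrv1

SERVERNAME       tsmsrv1
   COMMMETHOD        TCPIP
   TCPSERVERADDRESS  tsmsrv1.example.com
   TCPPORT           1500
   NODENAME          sonas_int1
   ASNODENAME        sonas_proxy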

To restore stub files rather than backup versions of your files, for example if one or more of
your local file systems is damaged or lost, use the TSM backup-archive client restore
command with the restoremigstate option. Your migrated and premigrated files remain intact
on the Tivoli Storage Manager server, and you need only restore the stub files on your local
system.

However, you cannot use the backup-archive client to restore stub files for migrated files
that were backed up before the migration. Instead, use the TSM HSM
dsmmigundelete command to recreate stub files for any migrated or premigrated files whose
stubs are lost.
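
As a minimal sketch, assuming a hypothetical SONAS file system path /ibm/gpfs0 and an interface node with a working TSM client configuration, the two cases look like this:

dsmc restore "/ibm/gpfs0/data/*" -subdir=yes -restoremigstate=yes
dsmmigundelete /ibm/gpfs0

The first command restores stub files rather than resident file copies for files that are still migrated or premigrated on the same TSM server; the second recreates stub files for migrated and premigrated files, for example after the file system content itself has been restored.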

If you back up and migrate data to tape volumes in the same library, make sure that there are
always some tape drives available for space management. You can achieve this by limiting
the number of tape drives that can be used simultaneously by backup and archive
operations. Specify a number for the mount limit that is less than the total number of drives
available in the library (see the mountlimit option of the define devclass command in the IBM
Tivoli Storage Manager Administrator's Reference for your operating system). Using disk
storage as your primary storage pool for space management might, depending on the
average size of your files, result in better performance than using tape storage pools.
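
For example, in a library with six drives you might reserve two drives for space management by limiting backup and archive operations to four concurrent mounts. The device class name LTOCLASS in this sketch is hypothetical:

update devclass LTOCLASS mountlimit=4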

If you back up files to one TSM server and migrate them to a different TSM server, or if you
are using different TSM server stanzas for backup and migration, the HSM function cannot
verify that current backup versions of your files exist before you migrate them. Use the
backup-archive client to restore the actual backup versions only.

Archiving and retrieving files


TSM archiving of files refers to the operation of storing a copy of the files in TSM that is then
retained for a specific period of time, as specified in the TSM management class associated
with the file. TSM archived files are not subject to versioning; the file exists in TSM regardless of
what happens to the file in primary SONAS storage. Archiving is used only to retain a copy of
the file for long periods of time.

SONAS does not support archiving of files; the SONAS TSM client interface
does not allow you to specify archive operations.


If you wish to use the TSM archiving function, install a TSM client on a datamover
system (a server external to SONAS), mount on this server the SONAS exported file systems you wish to
archive to TSM, and then initiate the archive operation using the TSM archive
command on this server. The same process can be used to retrieve files archived to TSM.
Note that the performance of the archive operation can be impacted if you need to archive
large numbers of small files. Ensure that the user on the datamover system has the
necessary authority to read and write the files.
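
A minimal sketch of this approach, run on the external datamover server, follows; the export name, mount point, and archive management class are hypothetical, and the exact dsmc options you need depend on your TSM client configuration:

# Mount the SONAS export on the datamover (NFS in this example)
mount -t nfs sonas.example.com:/ibm/gpfs0/projects /mnt/sonas_projects
# Archive the directory tree, then retrieve a single file later if needed
dsmc archive "/mnt/sonas_projects/*" -subdir=yes -archmc=LONGTERM -description="projects archive"
dsmc retrieve "/mnt/sonas_projects/reports/plan.doc"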

Restoring file systems overview


If you lose an entire file system and you attempt to restore backup versions of all your files,
including those that are migrated and premigrated, plan properly to avoid the
file system running out of space. If your file system runs out of space during the restore
process, the HSM function must begin migrating files to storage to make room for additional
restored files, thereby slowing the restore process. You should evaluate the dsmmigundelete
command to restore migrated files as stub files.

TSM Manuals and Information


More information on TSM may be found at the online Tivoli Storage Manager information
center at:

http://publib.boulder.ibm.com/infocenter/tsminfo/v6/index.jsp

3.9.2 Basic SONAS to TSM setup procedure


In a SONAS environment, the basic setup procedure for connecting SONAS to Tivoli Storage
Manager is as follows (a hedged sketch of the TSM server-side commands follows this list):
򐂰 The TSM servers (TSM server V5.5 or later is supported) need to be connected to the
network and reachable from the SONAS interface nodes
򐂰 Each TSM server used for SONAS needs to be set up with a backup storage pool to use the
backup feature
򐂰 Each TSM server used for SONAS needs to have one node name registered for each
interface node that is being used for backup from SONAS, plus an additional virtual
proxy node to represent the SONAS system
򐂰 The setup can be such that one SONAS file system is backed up to TSM server1, while
another SONAS file system is backed up to TSM server2
򐂰 Create a proxy node name on the TSM server
򐂰 Grant each of the cluster nodes access to back up data to this proxy node name, because
the virtual proxy node may be used from more than one node in SONAS
򐂰 TSM server authentication is set to ON (“set auth on”)
򐂰 The TSM server date/time and the SONAS nodes date/time need to be in sync
򐂰 Create a new SONAS backup schedule using the StartBackupTSM template
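
The TSM server-side portion of this setup can be sketched as follows; the node names, the placeholder passwords, and the policy domain name SONASDOM are hypothetical:

register node sonas_int1 <password> domain=SONASDOM
register node sonas_int2 <password> domain=SONASDOM
register node sonas_proxy <password> domain=SONASDOM
grant proxynode target=sonas_proxy agent=sonas_int1,sonas_int2
set authentication on

The register node commands create one TSM node per interface node plus the virtual proxy node, grant proxynode lets the interface nodes store data under the proxy node name, and set authentication on enables password authentication as required above.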

3.9.3 TSM software licensing


The TSM client software code is preinstalled within SONAS and is resident in the code from
the factory, so there is no need to order it and install it separately. If you are not using TSM
functions, there is no charge for the fact that this TSM client code is present in the SONAS
Software. The TSM client code is installed only on the interface nodes and not on the storage
nodes and, as of the writing of this book, TSM client version 6.2.2 is used internally in
SONAS Software.

There are two separate TSM clients installed in SONAS: the TSM backup/archive (TSM b/a)
client, used for backup and restore of SONAS files, and the TSM HSM client, used for space
management by offloading files from SONAS storage to TSM storage and offering
transparent recall. The TSM b/a client is supplied as part of the IBM Tivoli Storage Manager
Standard Edition and the IBM Tivoli Storage Manager Extended Edition products. The TSM
HSM client is part of the IBM Tivoli Storage Manager for Space Management product. These
TSM client components need to connect to a TSM server, external to SONAS, that also
needs to be licensed; you can use an existing TSM server if your installation already has one.

You are required to pay a license charge for the TSM client code only if you are using TSM
functions, and you pay the license charge only for the interface nodes that are attached
to TSM servers and actively run TSM client code. The TSM HSM client requires the TSM b/a
client, so to use HSM functionality both clients must be licensed for each interface node
running the code. Even though TSM can be licensed for a subset of interface nodes, our
recommendation is to license the function on all interface nodes, for multiple reasons such as:
򐂰 A SONAS filesystem can be mounted on a subset of nodes or on all nodes; mounting the
file system on all nodes guarantees the maximum level of availability of the resource in
case of failover
򐂰 To manage a file system TSM code must run on at least one of the nodes where the file
system is mounted
򐂰 It is best to run TSM code on multiple nodes where the filesystem is mounted to guarantee
service during failover
򐂰 The TSM client can execute parallel backup streams from multiple nodes for the same
filesystem thus increasing backup and restore throughput
򐂰 When using TSM HSM, file recalls can occur on any node and need to be serviced by a
local TSM HSM client

The TSM licensing is calculated on the processor value units (PVU) of the SONAS interface
node or group of nodes that run the TSM code. TSM client licensing is not calculated on the
terabytes of storage that may be on the SONAS system. For a more detailed explanation of
TSM PVU licensing, refer to:

http://www-01.ibm.com/software/lotus/passportadvantage/pvu_licensing_for_customers
.html

At the time of writing, each SONAS interface node has two sockets, each with a quad-core
processor, for a total of 8 cores. The interface node has Xeon Nehalem EP processors, which
imply a value of 70 PVU per core. The required TSM PVUs for each interface node running
TSM code are therefore 560 PVU, which corresponds to 8 cores times 70 PVU per core. If you choose to
run TSM code on 3 interface nodes, you need to license 1680 PVUs.

For additional information, there are several Redbooks publications and white papers about TSM,
TSM sizing guidelines, TSM performance optimization, and tuning, available at the
Redbooks website:

http://www.redbooks.ibm.com/

3.9.4 How to protect SONAS files without TSM


If you do not have TSM in your environment and would like to back up the SONAS data, you
need to use an external datamover system that can mount the SONAS file system exports.
This is similar to the procedure discussed in “Archiving and retrieving files” on page 122.

You should install a backup client of your choice on the external datamover server. Ensure
that the user on the datamover system has the necessary authority to read and
write the files. You can then start the backup and restore operations using your backup
software. Note that the performance of the backup operation can be impacted if you need to
back up large file systems with large numbers of files.

3.10 SONAS system management services


SONAS provides a comprehensive set of facilities for globally managing and centrally
deploying SONAS storage. In this section we will provide an overview of the Management
GUI, the Command Line Interface and the Health Center. For information on accessing the
GUI and command line refer to “Using the management interface” on page 306.

The SONAS GUI and command line interface (CLI) connect to a server that runs on the SONAS
management node, as illustrated in Figure 3-31. The server collects data from the interface
and storage nodes and stores the data in a database. It can also run data collection tasks on
the SONAS nodes, and this data is also stored in the database. The data is then served to
the CLI by the CLI server component and to the GUI through the ISC controller. Data
displayed on the GUI and CLI is mainly retrieved from the database.

Figure 3-31 SONAS GUI and CLI backend

SONAS uses multiple specialized gatherer tasks to collect data and update the database, as
shown in Figure 3-32. For example, clicking the refresh button on the File Systems GUI
page starts a File System gatherer task, which gets the needed information from the nodes
attached to the cluster. The last time the gatherer was run is displayed on the bottom right
button in the File Systems GUI window.


Figure 3-32 SONAS backend gatherer tasks

3.10.1 Management GUI


SONAS provides a centralized web-based graphical user interface and Health Center for
configuration and monitoring tasks. Users access the GUI / Health Center via a standard web
browser. There is a command line interface (CLI) as well.

The SONAS Management GUI server runs on the SONAS management node and is web-based;
you can access it from a remote web browser using the HTTPS protocol. It provides role-based
authorization for users, and enables the administrator to maintain the SONAS cluster. These
roles are used to segregate GUI administrator users according to their working scope within
the Management GUI. The defined roles are as follows:
򐂰 Administrator - this role has access to all features and functions provided by the GUI. This
role is the only one that can manage GUI users and roles.
򐂰 Operator - the operator can do the following:
– Check the health of the cluster.
– View the cluster configuration.
– Verify the system and file system utilization.
– Manage threshold and notification settings.
򐂰 Export administrator - the export administrator is allowed to create and manage shares,
plus perform the tasks the operator can execute.
򐂰 Storage administrator - the storage administrator is allowed to manage disks and storage
pools, plus perform the tasks the operator can execute.
򐂰 System administrator - the system administrator is allowed to manage nodes and tasks,
plus perform the tasks the operator can execute.

For additional information on administration roles and defining administrators see “User
management” on page 389.

SONAS has a central database that stores configuration information and events. This
information is used and displayed by the management node and collected to the
management node from the other nodes in the cluster. The SONAS Management GUI and
Health Center provide panels for most functions, a partial list follows:
򐂰 Storage management
򐂰 File system management


򐂰 Pool management
򐂰 Fileset management
򐂰 Policy management
򐂰 Access control list (ACL) management
򐂰 Synchronous replication management
򐂰 Hierarchical storage management
򐂰 Tivoli Storage Manager backup management
򐂰 Async replication management
򐂰 Snapshot management
򐂰 Quota management
򐂰 Cluster management
򐂰 Protocol management (CIFS, NFS, HTTPS, FTP)
򐂰 Export management
򐂰 Event log
򐂰 Node availability
򐂰 Node utilization (CPU, memory, I/O)
򐂰 Performance management (CPU, memory, I/O)
򐂰 File system utilization (capacity)
򐂰 Pool / disk utilization (capacity)
򐂰 Notifications / call-home
򐂰 Hardware monitoring
򐂰 File access services such as NFS, HTTPS, FTP, and CIFS
򐂰 File system services
򐂰 Nodes, including CPUs, memory DIMMs, VRMs, disk drives, power supplies, fans, and
onboard network interface ports
򐂰 I/O adapters including storage and network access
򐂰 Storage utilization

Panels are available for most of the major functions, as shown in Figure 3-33:

Figure 3-33 SONAS Management GUI has panels for most aspects of SONAS

SONAS has a complete Topology Viewer that shows, in graphical format, the internal
components of the SONAS, reports on their activity, and provides a central place to monitor
and display alerts. You can click an icon and drill down into the details of the particular
component; this function is especially useful when drilling down to solve a problem. In
Figure 3-34 on page 128, we see an example of the SONAS Management GUI Topology
Viewer:


Figure 3-34 SONAS Management GUI - Topology Viewer

Each of the icons is clickable, and will expand to show the status of the individual components.
The SONAS Management GUI is the focal point for extended monitoring facilities and the
SONAS Health Center.

3.10.2 Health Center


The SONAS Health Center provides a central place to view the overall SONAS health,
including examining the System Log, Alert Log, and System Utilization Reports and graphs.
Through the SONAS Management GUI, repeating tasks can be set up, utilization thresholds
set, notification settings refined and notification recipients defined.

SONAS tracks historical performance and utilization information, and provides the ability to
graphically display the current and historical trends.

This is shown in Figure 3-35 on page 129.


Figure 3-35 SONAS Health Center historical system utilization graphical reports

The length of time that can be reported is determined by the amount of log space set aside to
capture data. For additional information on the health center refer to “Health Center” on
page 411.

3.10.3 Command Line Interface


The SONAS command line interface (CLI) runs on the SONAS management node. The CLI
provides the ability to perform SONAS administrative tasks, and implements about 110 CLI
commands. The focus is on enabling scripting of administrative tasks. The CLI is primarily
intended for installation and setup commands, with additional configuration functionality.

The CLI includes commands for all SONAS functions:


򐂰 Cluster configuration
򐂰 Authentication
򐂰 Network
򐂰 Files
򐂰 File Systems
򐂰 Exports
򐂰 File Sets
򐂰 Quotas
򐂰 Snapshots
򐂰 Replication
򐂰 ILM automatic tiered storage
򐂰 Hierarchical storage management
򐂰 Physical management of disk storage
򐂰 Performance and Reports
򐂰 System Utilization


򐂰 SONAS Console Settings


򐂰 Scheduled Tasks

The SONAS CLI is designed to be familiar to the standard Unix, Windows, and NAS
administrator.

3.10.4 External notifications


SONAS collects data and can send event information to external recipients. To get proactive
notifications for events that need to be supervised, the administrator can configure
thresholds, the events that trigger a notification, and who the notification recipient should be.
This ensures that the administrator is informed when an incident takes place. Figure 3-36
shows the SONAS notification monitoring architecture.

Figure 3-36 Notification monitoring architecture

SONAS supports the following kinds of notifications:

򐂰 Summary e-mail that collects all messages and sends out the list on a regular basis
򐂰 Immediate e-mail and SNMP traps
– the content is the same for both e-mail and SNMP
– log messages are instantly forwarded
– a maximum number of messages can be defined; after that number is reached, further
messages are collected and a summary is sent

These messages originate from multiple sources, including syslog, the GUI gatherer (GPFS status,
CTDB status, and so on), CIM messages from providers in the cluster, and SNMP messages from nodes
in the cluster. SONAS also allows you to set utilization thresholds; when a threshold is reached,
a notification is sent. Thresholds are available for various resources, including:
򐂰 CPU usage
򐂰 File system usage
򐂰 GPFS usage
򐂰 Memory usage
򐂰 Network errors


3.11 Grouping concepts in SONAS


SONAS is a scale out architecture where multiple interface nodes and storage pods act
together in a coordinated way to deliver service. SONAS commands and processes can be
configured and run on all nodes or on a subset of nodes, that is, a group of nodes. In this section
we discuss these SONAS grouping concepts, show where they are applied, and discuss grouping
dependencies.

In a SONAS cluster we have multiple interface nodes that can export CIFS and NFS shares
for the same set of underlying SONAS file systems. After creating a file system, it can be
mounted on a subset of nodes or on all nodes using the mountfs command. The example in
Figure 3-37 shows that filesys#1 is mounted on node#1, node#2 and node#3, whereas
filesys#3 is mounted on nodes #2, #3 and #4. To make the file system available to users, it is
then exported using the mkexport command. The mkexport command does not allow you to specify a
given subset of nodes; it makes the export available on all nodes. This is why the diagram
shows an exports box across all interface nodes, with three exports for all the file systems:
exportfs#1 for filesys#1, and so on. When the CTDB manages exports, it will give a warning for
nodes that do not have the file system mounted, because the export of a file system depends on
a mount of that file system being present.

The network group concept represents a named collection of interface nodes. In our example
in Figure 3-37 on page 132 we have three network groups: netwkgrp#1, associated with
interface nodes #1, #2 and #3; netwkgrp#2, associated with interface nodes #2, #3 and #4; and
lastly the default netwkgrp, associated with all interface nodes.

The network object is a collection of common properties that describe a network, such as the
subnet mask, gateway, VLAN ID, and so on; a network aggregates a pool of IP addresses.
These IP addresses are assigned to the raw interfaces or to the bonds of the associated
interface nodes. The example in Figure 3-37 shows that network#A is associated with IP
addresses IPA1, IPA2 and IPA3, and so on.

Networks are connected to one single network group using the attachnw command. Our
example in Figure 3-37 shows three networks, network#A, network#B and network#C,
attached respectively to network group netwkgrp#1, netwkgrp#2 and the default netwkgrp.

A DNS alias can be created for each network that resolves to all the IP addresses in the given
network. For example, if network#A has IP addresses IPA1, IPA2 and IPA3, we create a DNS
alias called, for example, SONAS#A that resolves to the three IP addresses above. We also
create DNS aliases for network#B and network#C that resolve to the network#B and network#C
IP addresses.


Figure 3-37 Filesystems, exports and networks

In the above example, file system filesys#2 is accessible through the export exportfs#2
through all networks. File system filesys#1, instead, will be accessible only through network#A
and the DNS alias SONAS#A. Accessing filesys#1 over network#B and DNS alias SONAS#B
could cause problems, because the DNS could return IP address IPB3, which is associated with
node#4, a node that does not mount filesys#1.

When creating network groups care should be taken to ensure that filesystems accessed
through a given network are mounted on all that network’s network group nodes.

In failover situations, the IP address will be taken over by another node in that specific network
group, and this ensures that in case of failover that specific IP address will still allow you to
access all file systems associated with, or mounted on the nodes of, that network group.

One way to ensure that there are no mismatches between mounted filesystems and network
groups is to mount the share on all interface nodes and access it only on a given network
group.

Network groups can be used for multiple reasons. When we limit client access to two or
three nodes, we increase the probability of finding data in cache compared to spreading the
access across many nodes, which gives a performance benefit. Another use is to
segregate workloads, such as production and test, in the same SONAS cluster.

3.11.1 Node grouping and TSM


Grouping concepts also apply to SONAS backups with TSM, as illustrated in Figure 3-38. In
this case we have three file systems, filesys#1, #2 and #3, that we wish to back up to two
different TSM servers: TSMs#1 for filesys#1 and TSMs#2 for filesys#2 and filesys#3. Because a
SONAS GPFS parallel file system can be accessed by multiple interface nodes and also
backed up from multiple interface nodes, we have to allow for multiple TSM client
requests being made to the TSM server for the same file system.

To accommodate this behavior, on the TSM server we have a grouping concept: on the
TSM server we define or register individual node names, one for each interface node that will
connect to the TSM server, and then we define a target node; the interface node definitions
are a proxy of the target node. In our example in Figure 3-38 on page 134 we back up
filesys#1 to TSM server TSMs#1; as filesys#1 is accessed through interface nodes node#1,
#2 and #3, in the TSM server we define the TSM client nodes node#1, #2 and #3, and these
nodes act as a proxy for the target node fs1tsm. In TSM terms, all data received from
node#1, #2 and #3 will be stored under the name of the proxy target fs1tsm. This allows
data backed up on any agent node to be restored on any other agent node, because all requests to
the TSM server from the above three nodes are serviced by the common proxy target fs1tsm.

We then have TSM client configuration parameters, also called TSM stanzas, on the interface
nodes. These are configured using the cfgtsmnode SONAS command; the command is executed
once on each interface node that needs the definition. This command configures a named TSM
server instance for the interface node TSM client that contains the server connection parameters,
the TSM client name and password, and the TSM server proxy target.

For example, on node#2 we use the cfgtsmnode command twice, because we will use this node to
back up both filesys#1 and filesys#3. For filesys#1 we configure a stanza called tsms#1 that
points to TSM server tsms#1 and proxy target fs1tsm, and for filesys#3 we configure a stanza
called tsms#2 that points to TSM server tsms#2 and TSM proxy target fs3tsm.

Using these definitions, a TSM client on the interface nodes can connect to the TSM server;
that is, a TSM client running on the interface node is enabled to connect to the assigned
TSM server, but where the client actually runs is a different matter. Backup execution is
controlled using the cfgbackupfs command. The diagram in Figure 3-38 on page 134 shows
that, for example, file system filesys#2 is enabled for backup on nodes node#2 to node#6,
because there is a TSM server stanza defined on each of those interface nodes, but there is no
stanza definition on node#1 even though this node has filesys#2 mounted. We can execute the
backups for filesys#2 on nodes node#2 to node#6, but we decide to segregate backup operations
to only nodes node#5 and node#6, so we execute the cfgbackupfs command for filesys#2 specifying
only nodes node#5 and node#6.
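
As a purely illustrative sketch of this example (the argument order and option names are assumptions; consult the SONAS CLI reference for the exact syntax of cfgtsmnode and cfgbackupfs at your release level), the flow is:

cfgtsmnode tsms#2 ...                      (run once on node#5 and once on node#6 to create the tsms#2 stanza)
cfgbackupfs filesys#2 tsms#2 node#5,node#6 (restrict backup execution for filesys#2 to node#5 and node#6)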


Figure 3-38 TSM grouping concepts

We have multiple grouping concepts in action here. On the TSM server side we define one
proxy target for each filesystem and this proxy target is associated with multiple proxy agent
nodes. You can define a subset of nodes as proxy agents but this may lead to errors if a
backup is run from a node that is not defined as a proxy agent so, to avoid such errors, define
all interface nodes as proxy agents for TSM.

The cfgtsmnode command creates a TSM server definition, or stanza, on the node where
the command is run; running the command on multiple nodes creates a group of
definitions for the same server. To avoid missing TSM server stanzas on a node, you can
define all available TSM servers on all nodes.

The cfgbackupfs command configures the backup to run on a subset group of nodes. To
execute the backup of a filesystem on a node the following requirements must be met:
򐂰 The filesystem must be mounted on that node
򐂰 A TSM server stanza must have been defined on the node for the target TSM server
򐂰 TSM server proxy target and agent node definitions need to be in place for that node
򐂰 The interface node must have network connectivity to the TSM server

The arrows in Figure 3-38 show the data path for the backups. Data flows from the
file system to the group of interface nodes defined with the cfgtsmnode and cfgbackupfs
commands, and on to the TSM server through the network. We see that groups of nodes can
perform backup operations in parallel; for example, backups for filesys#3 are executed by
nodes node#2, #3 and #4.

The network must be available to access the TSM servers. Because the network is accessed from
the interface nodes using network groups, the TSM server used to back up a given file system
must be accessible from the interface nodes where the file system is mounted and where the
TSM server stanza has been defined.

Node grouping and HSM


Data movement to external storage devices in SONAS is managed by an integrated TSM
HSM client that connects to a TSM server to migrate and recall data. Because a file system can be
mounted on and exported by multiple interface nodes, the HSM component needs to be
installed and active on all these interface nodes; this is because a recall request for a migrated
file may be started from any interface node.

SONAS uses the cfghsmnode CLI command to configure HSM connectivity to a given TSM
server on a group of interface nodes as follows:
cfghsmnode <TSMserver_alias> <intNode1,intNode2,...,intNodeN>

So with HSM you have a group of nodes that can connect to a TSM server as TSM HSM
clients. HSM is then enabled for a given filesystem using the cfghsmfs command.
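
For example, using the node and server names from the earlier scenario (which are illustrative only), enabling the HSM connection for the three interface nodes that mount filesys#1 might look like this, following the syntax shown above:

cfghsmnode tsms#1 node#1,node#2,node#3

The subsequent cfghsmfs step that enables HSM on a specific file system has its own arguments, which you should take from the SONAS CLI reference.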

3.11.2 Node grouping and async replication


Asynchronous replication processes run on a group of one or more interface nodes that
mount the filesystem to be replicated. You use the cfgrepl command to configure
asynchronous replication and you specify one or more source-target interface node pairs that
will run the replication operation. The number or group of source-destination pairs can be
scaled up as the amount of data to replicate grows.

3.12 Summary - SONAS Software


As we close this chapter, we have seen that SONAS Software provides a comprehensive,
integrated software functionality stack that includes, in one software license, all capabilities
required to manage a SONAS system from the very small to the very large.

SONAS Software provides, in one license, all of the components shown in Figure 3-39.


Figure 3-39 Summary - SONAS software functional components

The SONAS Software provides the ability for central management of storage, providing the
functionality for a highly automated, extremely flexible, and highly scalable self-managing
system. You may start with a small SONAS of less than 100 TB, and continue to
seamlessly grow, scaling capacity and performance linearly, using the SONAS Software
to manage scalability to petabytes.

SONAS Software supports the full capability of the current SONAS, which scales up to 30
interface nodes and 60 storage nodes. The current largest SONAS configuration is
capable of supporting up to 14.4 petabytes of raw storage. One copy of SONAS Software
runs on each node of a SONAS. A current maximum SONAS configuration is shown in
Figure 3-40.


Figure 3-40 SONAS Software manages all aspects of a maximum size SONAS

As storage needs continue to grow over time, the SONAS Software is designed to continue to
scale out and support even larger configurations, while still maintaining all the storage
management and high performance storage characteristics that we discussed in this chapter.

SONAS Software is designed to provide:


򐂰 A single software license that provides all capabilities required to manage a simple,
expandable Storage appliance, including ease of ordering, deployment, and management
򐂰 Centralized Management of all files, single namespace, which provides reduced
Administrative costs, faster response time to end users
򐂰 File Placement policies including automation, which provides optimized storage costs and
reduced administrative costs
򐂰 No individual chargeable add-on software, which provides reduced TCO, simpler
procurement process
򐂰 Automated policy based hierarchical storage management: HSM, ILM, which provides
reduced administrative costs, optimized storage costs
򐂰 Independent scalability of storage and nodes, which provides simple but flexible
configurations tailored to your specific workload characteristics, yet remains flexible and
reconfigurable for the future
򐂰 Concurrent access to files from all nodes, distributed token management, and automatic
self-tuning and workload balancing, and high availability via the Cluster Manager. These
combine to provide very high performance, reduced administrative costs related to
migrating hot spot files
򐂰 Storage Pool striping, which provides very high performance - fast access to data
򐂰 High performance metadata scanning across all available resources/nodes, and integrated
TSM clients, which provide the ability to perform HSM and automatic tiered storage at high
scalability, as well as accelerate backups of files
򐂰 Snapshots, Asynchronous Replication, which provides robust data protection and disaster
Recovery.


3.12.1 SONAS goals


In summary, SONAS Software provides the software foundation to meet the goals of IBM
SONAS stated below:
򐂰 Unified management of petabytes of storage
– Automated tiered storage, centrally managed and deployed
򐂰 Global access to data, from anywhere
– Single global namespace, across petabytes of data
򐂰 Based on standard, open architectures
– Not proprietary
– Avoids lock-ins
– Leverage worldwide Open Source innovative technology
򐂰 Meets and exceeds today’s requirements for:
– Scale-out capacity, performance, global virtual file server
– Extreme scalability with modular expansion
򐂰 High ROI
– Significant cost savings due to auto-tune, auto-balance, automatic tiered storage
򐂰 Position to exploit the next generation technology
– Superb foundation for cloud storage

In the remaining chapters of this book, we will continue to explore all aspects of SONAS in
more detail.


Chapter 4. Networking considerations


This chapter provides information on networking as related to SONAS implementation and
configuration.

We begin with a brief review of Network Attached Storage concepts and terminology.
Following that, we will discuss some of the technical networking implementation details for
SONAS.


4.1 Review of network attached storage concepts


In this section we have a brief review of network attached storage concepts as they pertain to
SONAS, for those readers who are more familiar with block I/O SAN-attached storage and
terminology.

4.1.1 File systems


A file system is the physical structure an operating system uses to store and organize files on
a storage device. To manage how data is laid out on the disk, an operating system adds a
hierarchical directory structure. Many different file systems have been developed to operate
with different operating systems. They reflect different OS requirements and performance
assumptions. Some file systems work well on small computers; others are designed to exploit
large, powerful servers. An early PC file system is the File Allocation Table (FAT) file system
used by the MS-DOS operating system. Other file systems include the High Performance
File System (HPFS), initially developed for IBM OS/2, the Windows NT File System (NTFS),
the Journal File System (JFS) developed for the IBM AIX OS, and the General Parallel File System
(GPFS), also developed by IBM for AIX. There are many others.

A file system does not work directly with the disk device. A file system works with abstract
logical views of the disk storage. The file system maintains a map of the data on the disk
storage. From this map the file system finds space which is available to store the file. The file
system also creates metadata (data describing the file) which is used for systems and
storage management purposes, and determines access rights to the file.

The file system is usually tightly integrated with the operating system. However, in network
attached storage, it is physically separated from the OS and distributed to multiple remote
platforms. This is to allow a remote file system (or part of a file system) to be accessed as if it
were part of a local file system. This is what happens with Network File System (NFS) and
Common Internet File System (CIFS).

4.1.2 Redirecting I/O over the network to a NAS device


In the case of network-attached storage, input/output (I/O) is redirected out through the
network interface card (NIC) attachment to the network.

The NIC contains a network protocol driver in firmware, which describes the operations
exchanged over the underlying network protocol (such as TCP/IP). Now one of the network
file protocols (such as NFS or CIFS) comes into play. The I/O operation is transferred using
this network protocol to the remote network attached storage. With Windows operating
systems the file protocol is usually CIFS; with UNIX and Linux, it is usually NFS. Or it may be
File Transfer Protocol (FTP).

When the remote server, or NAS appliance, receives the redirected I/O, the I/O requests are
“unbundled” from their TCP/IP network protocols. The I/O request is submitted to the NAS
appliance’s operating system, which manages the scheduling of the I/O, and security
processes to the local disk. From then on the I/O is handled as a local I/O. It is routed via the
appliance’s file system, which establishes the file’s identity and directory, and eventually
converts the I/O request to a storage system protocol (that is, a block I/O operation). Finally,
the I/O request is routed to the physical storage device itself to satisfy the I/O request.

The receiving NAS system keeps track of the initiating client’s details, so that the response
can be directed back to the correct network address. The route for the returning I/O follows
more or less the reverse path outlined above.


Network File I/O differences from local SAN I/O


One of the key differences of a NAS device, compared to direct attached storage or SAN
storage, is that all I/O operations use file-level I/O protocols. The network access methods
such as NFS and CIFS can only handle file I/O requests to the remote file system located in
the operating system of the NAS device. This is because they have no knowledge of the
characteristics of the remote storage device.

I/O requests are transferred across the network, and it is the NAS OS file system which
converts the request to block I/O and reads or writes the data to the NAS disk storage.
Clearly, the network file I/O process involves many more steps than storage protocol (block) I/O,
and it is this software stack overhead that is a factor in comparing the performance of a NAS I/O
to a DAS or SAN-attached I/O.

An example of a network attached storage file I/O is shown in Figure 4-1.

Figure 4-1 Tracing the path of a network file I/O operation

It is important to note that a database application accessing a remote file located on a NAS
device, by default, must be configured to run with file system I/O. As we can see from the
diagram above, it cannot use raw I/O to achieve improved performance (that is only possible
with locally attached storage).

4.1.3 Network file system protocols


Network File System (NFS) is a network-based file protocol that is typically used by UNIX and
Linux operating systems. NFS is designed to be machine-independent, operating
system-independent, and transport protocol-independent.


Common Internet File System (CIFS) is a network-based file protocol that was designed by
Microsoft to work on Windows workstations.

In this section below, we provide a high-level comparison of NFS and CIFS.

Making file systems available to clients


NFS servers make their file systems available to other systems in the network by exporting
directories and files over the network. An NFS client “mounts” a remote file system from the
exported directory location. NFS controls access by giving client-system level user
authorization. The assumption is that a user who is authorized to the system must be
trustworthy. Although this type of security is adequate for many environments, it can be
abused by knowledgeable users who can access a UNIX system via the network.

On the other hand, CIFS systems create “file shares” which are accessible by authorized
users. CIFS authorizes users at the server level, and can use Windows domain controllers
(Windows Active Directory is a common example) for this purpose. CIFS security can be
generally considered to be stronger than NFS in this regard.

Stateless versus stateful


NFS is a stateless service. In other words, NFS is not aware of the activities of its clients.
Any failure in the link will be transparent to both client and server. When the session is
re-established, the two can immediately continue to work together again.

CIFS is session-oriented and stateful. This means that both client and server share a history
of what is happening during a session, and they are aware of the activities occurring. If there
is a problem, and the session has to be re-initiated, a new authentication process has to be
completed.

Security
For directory and file level security, NFS uses UNIX concepts of “User”, “Groups” (sets of
users sharing a common ID), and “Other” (meaning no associated ID). For every NFS
request, these IDs are checked against the UNIX file system’s security. However, even if the
IDs do not match, a user may still have access to the files.

CIFS, however, uses access control lists that are associated with the shares, directories, and
files, and authentication is required for access.

Locking
The locking mechanism principles are very different. When a file is in use NFS provides
“advisory lock” information to subsequent access requests. These inform subsequent
applications that the file is in use by another application, and for what it is being used. The
later applications can decide if they want to abide by the lock request or not. So UNIX or
Linux applications can access any file at any time. The system relies on “good neighbor”
responsibility and proper system administration is clearly essential.

CIFS, on the other hand, effectively locks the file in use. During a CIFS session, the lock
manager has historical information concerning which client has opened the file, for what
purpose, and in which sequence. The first access must complete before a second application
can access the file.


4.1.4 Domain Name Server


At the machine level, IP network connections use numeric IP addresses. However, because
these addresses are obviously much harder to remember (and manage) than just the name
of a system, modern networking uses symbolic host names.

For example, instead of typing:

http://10.12.7.14 (a completely fictitious address)

You could type:

http://www.ibm.com

In this case, the network Domain Name Servers (DNS) handle the mappings between
symbolic name (www.ibm.com) and the actual IP addresses. The DNS will take
www.ibm.com, and translate that to the IP address 10.12.7.14. By using DNS, a powerful
variety of network balancing and management functions can be achieved.

4.1.5 Authentication
SONAS supports the following authentication methods:
򐂰 Microsoft Active Directory
򐂰 LDAP (Lightweight Directory Access Protocol)
򐂰 NIS (Network Information Service)
򐂰 Samba PDC / NT4 mode

At the current release level, a single SONAS system can support only one of these
authentication methods at a time.

In order to access a SONAS, the user must be authenticated using the authentication method
that is implemented on a particular SONAS machine.

4.2 Domain Name Server as used by SONAS


SONAS uses the Domain Name Server (DNS) function to perform round-robin IP address
balancing for spreading workload equitably, on an IP address basis, across the SONAS
interface nodes.

As shown in Figure 4-2 on page 144, when a user requests SONAS.virtual.com, the Domain
Name Server (DNS) must have been previously defined with a list of IP addresses that DNS
is to balance across the SONAS interface nodes.

The user request for SONAS.virtual.com is translated by DNS to a physical IP address, and
DNS then allocates that user to the SONAS interface node associated with that IP address.

Subsequent requests from other users for SONAS.virtual.com are allocated equitably (on an
IP address basis) to other SONAS interface nodes. Each interface node can handle multiple
clients on the same IP address; the unique pairing of the client IP address and the SONAS
interface node IP address is what determines the connection. See Figure 4-2 on page 144.


Figure 4-2 SONAS interface node workload allocation

As shown above in Figure 4-2, in SONAS each network client is allocated to one and only one
interface node, in order to minimize cluster overhead. SONAS Software does not rotate a
single client’s workload across interface nodes. That is not only unsupported by DNS or
CIFS, but would also decrease performance, because caching and read-ahead are done per
SONAS interface node. At the same time, workload from multiple users, numbering into the
thousands or more, is equitably spread across as many SONAS interface nodes as are
available. If more user network capacity is required, you simply add more interface nodes.
The SONAS scale out architecture provides linear scalability as the number of users grows.

SONAS requires an external server that runs an instance of the Domain Name Server
(DNS). Based on DNS round robin, each incoming client request is directed to the next
available public IP address on an available interface node. SONAS serves multiple
IP addresses and a client gets one of these IP addresses in a round-robin manner. If one of the
interface nodes goes down, another interface node starts serving the same IP address.
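
As a minimal sketch of the DNS side, assuming a BIND name server and the host name and addresses used in Figure 4-2 (which are examples only), the zone simply contains multiple A records for the same name:

; zone file fragment - illustrative name and addresses
sonas.virtual.com.    IN  A  10.0.0.10
sonas.virtual.com.    IN  A  10.0.0.11
sonas.virtual.com.    IN  A  10.0.0.12
sonas.virtual.com.    IN  A  10.0.0.13
sonas.virtual.com.    IN  A  10.0.0.14
sonas.virtual.com.    IN  A  10.0.0.15

Depending on the name server, you may also need to configure the address rotation policy (for example, the rrset-order statement in BIND) so that the order of the returned addresses cycles; many DNS servers rotate multiple A records by default.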

4.2.1 Domain Name Server configuration recommendations


We recommend using multiple public IP addresses per interface node, for better load
balancing in the event of an interface node outage. DNS round robin provides the IP address
load balancing, while the workload failover is performed internally by SONAS Software.

Using the host name in SONAS is recommended, but you can use IP addresses instead of the
DNS host name. In this case, clients will be bound to the provided static IP address. SONAS
offers a global namespace, so SONAS clients use one host name which is spread
across independent interface nodes and logical storage pools. One client is connected to
one interface node until reboot or other interruption; there is no IP address caching
mechanism for clients at the DNS level. Connecting a client simultaneously to multiple interface
nodes would decrease performance due to cache misses; moreover, this is not supported by
DNS and CIFS. When you expand your SONAS and add new interface nodes, you should add
the new IP addresses to the DNS server, and load will be distributed across the newly configured
nodes.

4.2.2 Domain Name Server balances incoming workload


Figure 4-3 shows the steps that are performed to balance an incoming SONAS client
request.

Figure 4-3 DNS load balancing in SONAS

The external DNS server contains duplicate address records (A records) with different IP
addresses. These IP addresses are configured on the interface nodes. The name server rotates
the addresses for a name that has multiple A records. The following is a description of the
DNS load balancing steps in Figure 4-3.

Step 1: The first SONAS client sends a request to the external DNS server for the sonas.pl.ibm.com IP
address.

Step 2: The DNS server rotates the addresses for the name and returns the first available IP
address to the client: 192.168.0.11.

Step 3: Client 1 connects to its data via interface node 1.

Step 4: SONAS client 2 sends a request to the external DNS server for the sonas.pl.ibm.com IP
address.

Step 5: The DNS server rotates the addresses for the name and returns the next IP address to the
client: 192.168.0.12.

Step 6: The second client connects to its data via interface node 2.

4.2.3 Interface node failover / failback


Interface nodes may be dynamically removed from and re-inserted into a cluster. The method for
upgrade or repair of an interface node is to take the interface node out of the cluster.
The remaining interface nodes assume the workload. The interface node can then be
upgraded or repaired, and then re-inserted into the cluster, and the workload will then be
automatically rebalanced across the interface nodes in the SONAS.

When an interface node is removed from the cluster, or if there is an interface node failure,
healthy interface nodes take over the load of the failed node. In this case, the SONAS
Software Cluster Manager will automatically:
򐂰 Terminate old network connections and move the network connections to a healthy
interface node. IP addresses are automatically re-allocated to a healthy interface node
– Session and state information that was kept in the Cluster Manager is used to support
re-establishment of the session and maintaining IP addresses, ports, etc.
– This state and session information and metadata for each user and connection is
stored in memory in each node in a high performance clustered design, along with
appropriate shared locking and any byte-range locking requests, as well as other
information needed to maintain cross-platform coherency between CIFS, NFS, FTP,
HTTP users
򐂰 Notification technologies are used to ‘tickle’ the application and cause a reset of the
network connection

This is shown in Figure 4-4:


Figure 4-4 SONAS interface node failover

At the time of the failover of the node, if the session or application is not actively transferring
data over a connection, the failover can usually be transparent to the client. If the client
is transferring data, the application service failover may or may not be transparent to the client,
depending on the protocol, the nature of the application, and what is occurring at the time of
the failover.

In particular, if the client application, in response to the SONAS failover and SONAS
notifications, automatically does a retry of the network connection, then it is possible that the
user will not see an interruption of service. Examples of software that do this can include
many NFS-based applications, as well as Windows applications that do retries of the network
connection, such as the Windows XCOPY utility.

If the application does not do automatic network connection retries, or the protocol in question
is stateful (for example, CIFS), then a client-side reconnection may be necessary to re-establish the
session. Unfortunately, for most CIFS connections this will likely be the case.

If an interface node fails, all IP addresses configured on that node are taken over and
balanced by the remaining interface nodes. IP balancing is done by a round robin algorithm, which
means that SONAS does not check which node is more loaded in terms of cache or bandwidth.
This is illustrated in Figure 4-5. The IP addresses configured on interface node 2 are moved by
SONAS to interface nodes 1 and 3. This means that, from the SONAS client point of view, the host
name and IP address are still the same. The failure of the node is almost transparent to the client,
which now accesses its data via interface node 3, as indicated by Step 6.


Figure 4-5 Interface node failure - failover and load balancing

Note: NFS consists of multiple separate services, protocols, and daemons that need to share metadata among each other. If a client crashes and, on reboot, is redirected to another interface node, there is a remote possibility that the locks are lost on the client but are still present on the previous interface node, creating connection problems. Therefore, the use of DNS host names for mounting NFS shares is not supported. To balance the load on SONAS, mount NFS shares using different IP addresses. This is an NFS limitation; CIFS, for example, uses only a single session, so DNS host names can be used.
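
As a minimal sketch of this recommendation (the export path and IP addresses are hypothetical), two NFS clients can each mount the same SONAS export through a different interface node IP address rather than through the round-robin DNS name:

# Client A mounts the export through the IP address of one interface node
mount -t nfs 10.0.0.11:/ibm/gpfs0/shared /mnt/sonas
# Client B mounts the same export through the IP address of another interface node
mount -t nfs 10.0.0.12:/ibm/gpfs0/shared /mnt/sonas

Spreading the mounts across the interface node IP addresses in this way balances the NFS load manually, while CIFS clients can continue to use the DNS host name.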

4.3 Bonding
Bonding is a method in which multiple network interfaces are combined to function as one logical bonded interface for redundancy or increased throughput. SONAS network ports can be bonded in two configurations using the standard IBM SONAS bonding tools. Before creating a bond interface, be sure that no network is assigned to the slaves and that there is no active IP address on any of the slaves. When network interfaces are bonded, a new logical interface is created that consists of the slave physical interfaces. The bonded devices can be monitored through the IBM SONAS GUI Topology pages. The MAC address of the bonding device is taken from the first added slave device; the MAC address is then passed to all following slaves and remains persistent until the bonding logical device is brought down or deconfigured. The bonding interface has a hardware address of 00:00:00:00:00:00 until the first slave is added.

4.3.1 Bonding modes


Currently SONAS supports the following two bonding modes; neither of them requires any specific configuration of your switch.
򐂰 mode 1 - active backup configuration: only one slave in the bond configuration is active
at a time. The other slaves remain inactive until the active, primary slave fails. To avoid
confusing the switch, the MAC address is externally visible on only one port. This mode
provides fault tolerance. Currently, the 10 Gbit Converged Network Adapters (CNAs) in
interface nodes for external data connectivity are configured to handle IP over InfiniBand
in this mode. Moreover, all internal management Network Interface Cards (NICs) and
internal data InfiniBand Host Channel Adapters (HCAs) are configured in SONAS by
default in this active-backup configuration. This means that all internal SONAS networks
share a single IP address and work in a hot standby configuration.
򐂰 mode 6 - adaptive load balancing: the outgoing traffic is redistributed between all slaves
working in the bond configuration according to the current load on each slave. Receive
load balancing is achieved through ARP negotiation; the receive load is redistributed
using a round-robin algorithm among the group of slaves in the bond. Effectively, this
configuration combines bandwidth into a single connection, so it provides both fault
tolerance and load balancing. Currently, the 1 Gb Network Interface Cards (NICs) in
interface nodes for external data connectivity are configured in this mode by default.
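
The bonding configuration itself is created with the SONAS CLI rather than with operating system tools. The following is only an illustrative sketch: it assumes a mknwbond-style CLI command, and the node name, slave port names, and options are hypothetical and should be checked against the SONAS CLI reference for your release.

# Assumption: a CLI command such as mknwbond builds a bond from slave ports on a node;
# the node and port names below are hypothetical
mknwbond int001st001 eth2,eth3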

Important: The current SONAS version does not support bonding network interfaces in the management node for external administrator connectivity. This means that if the link fails, the administrative IP address is not moved automatically to a backup interface. In that case SONAS will still be serving data to clients, but the SONAS administrator will not be able to reconfigure the SONAS system through the GUI and CLI.

4.3.2 Monitoring bonded ports


SONAS uses a tool for monitoring bonded ports. The tool periodically checks the carrier state of each slave; in case of failure, SONAS marks the device as down and takes appropriate actions. The tool monitors links from the devices to the nearest connected switch, so it is important to understand that it cannot detect a network failure that occurred beyond the nearest switch, or a switch that is refusing to pass traffic while still maintaining carrier. This issue is especially important for external data networks. Internally, SONAS always uses a two-switch configuration, so if a single internal link or switch fails, SONAS remains up and running. When you are planning external data connectivity, a multi-switch configuration to SONAS is recommended.

4.4 Network groups


A network group is a set of SONAS interface nodes that use the same network configuration. You can separate traffic between SONAS and external clients by using network groups. To do that, you create a new global name space in DNS with its own IP address ranges that contains only the interface nodes belonging to the network group. You can use different physical adapters or different VLAN IDs. The network group concept is shown in Figure 4-6 on page 150.


Figure 4-6 Network group concept in SONAS

If an interface node in a network group fails, its IP addresses are taken over only by the remaining interface nodes in that network group. This is shown in Figure 4-7 on page 150.

Figure 4-7 Failure of a node in a network group

This concept can be useful to separate traffic between the production and test environments, or between two applications. It is important to understand that you can separate only network traffic; you cannot separate internal data traffic. All interface nodes have access to all exports, and file system data is accessible by the interface nodes via all storage pods. To limit data placement, you can use policies as described in “SONAS - Using the central policy engine and automatic tiered storage” on page 107, but it may still be impossible to effectively separate traffic between two environments.

You can limit or separate only the network traffic to and from the SONAS front end (the interface nodes). All data can be written to and read from all storage pods, according to the logical storage pool configuration and the policy engine rules that are in effect.

By default, a single group contains all interface nodes; that group is called the default network group. You can configure and add nodes to custom network groups only when these nodes are detached from the default network group. It is not possible to configure a node in both the default and a custom network group. It is not possible to remove the default network group, but it can be empty.
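
The following sketch shows how such a separation might be configured from the CLI. It is illustrative only: the group name, node name, subnet, and IP addresses are hypothetical, and the command names mknwgroup, mknw, and attachnw and their options are assumptions that should be verified against the SONAS CLI reference for your release.

# Assumption: mknwgroup creates a custom network group containing selected interface nodes
mknwgroup testgroup int003st001
# Assumption: mknw defines a public network with its pool of IP addresses
mknw 10.0.0.0/24 --add 10.0.0.5,10.0.0.6
# Assumption: attachnw attaches the network definition to the network group
attachnw 10.0.0.0/24 ethX0 -g testgroup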

4.5 Implementation networking considerations


In this section we discuss network considerations when implementing your SONAS system.

4.5.1 Network interface names


During the SONAS installation process, network interfaces are created. These interfaces have preconfigured names, and when you create, for example, a new bond configuration, SONAS creates new predefined names for the new interfaces. In SONAS you can find the following interfaces:
򐂰 ethX0...ethXn - bonded interfaces for the public network
򐂰 ethXsl0...ethXsln - slave interfaces for the public network
򐂰 ethXn.vlanid - VLAN interface created on top of interface ethXn
򐂰 eth0...ethn - interfaces without a bond configuration
򐂰 mgmt0...mgmtn - bonded interfaces for the management network
򐂰 mgmtsl0...mgmtsln - slave interfaces for the management network
򐂰 data0...datan - bonded interfaces of the InfiniBand network
򐂰 ib0...ibn - slave interfaces of the InfiniBand network

4.5.2 Virtual Local Area Networks


In SONAS it is possible to configure Virtual Local Area Networks (VLANs). Packet tagging is supported at the protocol level. This means that VLAN interfaces must be added on top of a bonding or a physical interface.
򐂰 VLAN trunking is supported. Multiple networks, including VLAN IDs, can be defined in
SONAS, and networks can be assigned to physical ports in an n:n relationship.
򐂰 Overlapping VLANs are supported. Multiple VLANs can be assigned to a single adapter.
򐂰 A VLAN ID must be in the range from 1 to 4095. You can create many VLAN IDs and
logically separate the traffic to and from the SONAS cluster.
򐂰 The VLAN concept can be useful with network group configurations. With VLANs, you can
separate traffic between SONAS network groups and external clients.
򐂰 SONAS CLI commands can define different aggregates such as networks, port groups, and
VLANs; these can be changed with single commands and mapped to each other.

You should ensure that all IP addresses that belong to the VLAN ID can communicate with your external clients. In addition, VLAN tagging can be used for backing up your SONAS.


4.5.3 IP address ranges for internal connectivity


SONAS is preconfigured to use the following IP address ranges for internal connectivity:
򐂰 172.31.*.*
򐂰 192.168.*.*
򐂰 10.254.*.*

You can choose the range during SONAS installation. The range you select must not conflict
with the IP addresses used for the customer Ethernet connections to the Management
node(s) and Interface nodes (refer to “Planning IP addresses” on page 262).

4.5.4 Use of Network Address Translation


When a node becomes unhealthy, all public IP addresses are withdrawn by SONAS and thus all routes disappear. This might lead to a situation where the node cannot become healthy again, because a required service (for example, winbind) needs a route to an external server (for example, Active Directory).

To address this issue, one solution is to have static public addresses assigned to each node, thus allowing a node to always be able to route traffic to the external network. This is the simplest solution, but it uses up a large number of additional IP addresses.

A more sophisticated solution, and the one used in SONAS, is to use Network Address Translation (NAT). In this mode, only one additional external IP address is needed. One of the nodes in the SONAS cluster is elected to host this IP address, so it can reach the external services; this node is called the NAT gateway. All other nodes, if their external IP address is not accessible, route data via the NAT gateway node to external networks for authentication and authorization purposes.

In this way, Network Address Translation (NAT) is used in SONAS to remove the need for external access from the internal private SONAS network. NAT (a technique used with network routers) allows a single external IP address to be mapped to one or more private network IP address ranges. In SONAS, NAT allows a single customer IP address to be used to access the Management node and Interface nodes at their internal private network IP addresses.

To the external network, we define SONAS with IP addresses on the external customer network. These addresses are mapped to the internal private network addresses, and through the network address translation, authorized external users can then gain access to the Management node(s) and Interface nodes in the internal network.

A network router is configured to translate the external IP address and port on the customer network to a corresponding IP address and port on the internal SONAS private network. Only one IP address is used for the whole SONAS cluster on the customer network, while the port is used to specify the various servers accessed using that single IP address. The SONAS Management node and Interface nodes are assigned their private IP addresses on the internal SONAS network during SONAS installation.

Note that this external IP address is not a Data Path connection; it is not used to transfer data from and to the Interface nodes. Rather, this external IP address is used to provide a path from the Management node and Interface nodes to the customer network, for authentication and authorization purposes. Even if a node has its Data Path ports disabled (for example, an Interface node with a hardware problem could have its Data Path ports disabled under the control of the software), the node can still access the Active Directory or LDAP server for authentication and authorization.


This mechanism ensures that, in case of a hardware problem or an administrative mistake, if a required service (for example, winbind) requires a route to an external server (for example, Active Directory), a route is still available. This provides additional assurance that SONAS can keep working in a healthy state, and that data clients are not affected.
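
Conceptually the translation resembles generic destination NAT as performed by a network router. The following sketch is purely illustrative and is not the actual SONAS implementation; the external address, port, and internal address are hypothetical:

# Illustrative only: forward customer-network address 9.11.136.50, TCP port 2222,
# to the management node's internal private address 172.31.132.1, port 22
iptables -t nat -A PREROUTING -d 9.11.136.50 -p tcp --dport 2222 \
         -j DNAT --to-destination 172.31.132.1:22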

4.5.5 Management node as NTP server


In SONAS, all nodes are configured by default to use the management node as the Network Time Protocol (NTP) server; this ensures that all SONAS nodes operate on a common time. You can assign an external NTP server on the management node to propagate the configuration to the whole SONAS cluster.

4.5.6 Maximum Transmission Unit


A default of 1500 bytes is used for the maximum transmission unit (MTU). MTU sizes can be configured, and jumbo frames are supported with the 10 GbE ports (users can configure MTUs up to 9000).

In the case of VLAN interfaces, 4 bytes are subtracted from the MTU of the corresponding regular interface to compensate for the additional 4-byte (32-bit) VLAN tag.

4.5.7 Considerations and restrictions


Note that SONAS does not yet support the following networking functions. This should not be considered an all-inclusive list; these are known requirements for SONAS:
򐂰 IPv6
򐂰 NFSv4
򐂰 LDAP signing

4.6 The impact of network latency on throughput


Network latency can impact throughput negatively, and the impact grows with the bandwidth of the network link, so adverse latency effects are felt more with 10GigE links than with 1GigE links. The effect discussed here is true for a single client sending requests to a file server such as the SONAS server. Figure 4-8 on page 153 illustrates a typical IO request from an application client to the file server. We have the following times:
t_lat The network latency time spent getting the request from the
application client to the file server
t1 The time to transfer the request over the network
t2 The latency inside the file server, 0 in our example
t3 The time to transfer the response back to the client

Figure 4-8 Schematic of an IO request with latency


The total time taken for an IO request is the sum of t_lat, t1, t2, and t3; we will call this sum t_sum. Figure 4-9 shows the time it takes to transfer requests and responses over the network links. For example, a 61440 byte response requires 0.457764 msec over a 1GigE link, which can transfer 134217728 bytes/second, and 10 times less, or 0.045776 msec, over a 10GigE link.

Size (bytes)  IO type      1GigE ms/req (134217728 B/sec)  10GigE ms/req (1342177280 B/sec)
117           t1 request   0.000872                        0.000087
61440         t3 response  0.457764                        0.045776
Figure 4-9 Request time on the network link

The faster the request transfer time over the link, the more requests (requests/sec or IO/sec) you can get over the link per unit of time, and consequently the greater the amount of data that can be transferred over the link per unit of time (MB/sec).

Now introduce network latency into the equation. Each IO will be delayed by a given amount of latency, t_lat, in milliseconds, so each request from the application client will have periods of data transfer, t1 and t3, and idle periods measured by t_lat. During the t_lat periods the network bandwidth is not used by the application client, so it is effectively wasted; the bandwidth really available to the application client is thus diminished by the sum of the idle periods. The table shown in Figure 4-10 calculates how the reduction of effective bandwidth is correlated with increasing network latency, and how this changes over 1GigE and 10GigE links. The last four lines show a 10GigE link, with latency (t_lat) increasing from 0 to 0.001, 0.01, and 0.1 msec; t1 and t3 are the times spent on the network link, a function of the bandwidth in bytes/sec, and t2, the internal latency in the server, is assumed to be zero. The t_sum value is the sum t_lat+t1+t2+t3 and represents the request response time. So, for the 10GigE case with 0.01 msec t_lat we have a response time t_sum of 0.055864 msec, and so we can drive 17901 IO/sec. Each IO transfers a 117 byte request plus a 61440 byte response, or 61557 bytes in total, and at 17901 IO/sec we can drive a throughput of 61557 x 17901 or 1051MB/sec (tot). Considering only the effective data transferred back to the client, 61440 bytes per IO, we have 61440 x 17901 or 1049MB/sec.
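
The same arithmetic can be applied to any row of the table in Figure 4-10. As a worked check of the 1GigE line with zero latency:

t_sum  = t_lat + t1 + t2 + t3 = 0 + 0.000872 + 0 + 0.457764 ≈ 0.4586 msec
IO/sec = 1000 / t_sum ≈ 1000 / 0.4586 ≈ 2180 IO/sec
MB/sec ≈ (61440 bytes per response x 2180 IO/sec) / 1048576 ≈ 128 MB/sec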

Network link  t_lat ms  t1 ms     t2 ms  t3 ms     t_sum     IO/sec  MB/sec (tot)  MB/sec (resp)
1g            0         0.000872  0      0.457764  0.458635  2180    128           128
1g            0.001     0.000872  0      0.457764  0.459635  2176    128           127
1g            0.01      0.000872  0      0.457764  0.468635  2134    125           125
1g            0.1       0.000872  0      0.457764  0.558635  1790    105           105
10g           0         0.000087  0      0.045776  0.045864  21804   1280          1278
10g           0.001     0.000087  0      0.045776  0.046864  21339   1253          1250
10g           0.01      0.000087  0      0.045776  0.055864  17901   1051          1049
10g           0.1       0.000087  0      0.045776  0.145864  6856    402           402
Figure 4-10 Latency to throughput correlation

We can see that with a latency value of 0 on a 10GigE link we can get a throughput of 1278MB/sec, and adding a network latency of 0.1 msec we get a throughput of 402MB/sec, which represents a 69% reduction in effective bandwidth. This reduction may appear surprising given the theoretical bandwidth available. The charts in Figure 4-11 on page 155 show how bandwidth decreases for a single client accessing a server as latency increases. Looking at the first chart, we see that the drop is much greater at higher bandwidth values: the 10G MB/s line drops much more sharply as latency increases than the 1G MB/s line does, which means that the adverse effect of latency is more pronounced the greater the link bandwidth. The second chart shows the effect of latency on a workload with a smaller blocksize or request size: 30720 bytes instead of 61440 bytes. The chart shows that at 0.1 msec latency the throughput drops to just over 200MB/sec with a 30720 byte response size, instead of the 400MB/sec we get with the same latency of 0.1 msec but a response size of 61440 bytes.

[Two charts plot throughput in MB/sec against network latency (0, 0.001, 0.01, and 0.1 msec) for 1GigE and 10GigE links: the first for 117 byte requests with 61440 byte responses, the second for 117 byte requests with 30720 byte responses.]
Figure 4-11 Effect of latency on network throughput

To summarize, you should evaluate your network latency to understand the effect it can have
on expected throughput for single client applications. Latency has a greater impact with larger
network bandwidth links and smaller request sizes.

These adverse effects can be offset by having multiple different clients access the server in
parallel so they can take advantage of the unused bandwidth.


Chapter 5. SONAS policies


This chapter provides information on how you create and use SONAS policies. We will
discuss the following topics:

򐂰 Creating and managing policies
򐂰 Policy command line syntax
򐂰 Policy rules and best practices
򐂰 Sample policy creation walkthrough


5.1 Creating and managing policies


We discuss what policies and rules consist of, show examples of policies and rules and
discuss the SONAS commands that manage policies and rules. We illustrate how to create a
storage pool and extend a filesystem to use the storage pool. We then show how to create
and apply data allocation policies.

File placement policies for a filesystem are set using the setpolicy command and are evaluated when a file is created. If no file placement rule is in place, GPFS stores data in the system pool, also called pool1.

File management policies are used to control the space utilization of online storage pools; they can be tied to file attributes such as age and size, and also to pool utilization thresholds. The file management rules are evaluated when the runpolicy command is executed or when a task scheduled with the mkpolicytask command is executed.

A policy consists of a list of one or more policy rules. Each policy rule, or rule for short, is a SQL-like statement that instructs SONAS GPFS what to do with a file in a specific storage pool if the file meets specific criteria. A rule can apply to a single file, a fileset, or a whole filesystem. A rule specifies conditions that, when true, cause the action stated in the rule to be applied. A sample file placement rule statement to put all text files in pool2 looks like the following:
RULE ‘textfiles’ SET POOL ‘pool2’ WHERE UPPER(name) LIKE ‘%.TXT’

A rule can specify many different types of conditions, for example:


򐂰 File creation, access or modification date and time
򐂰 Date and time when rule is evaluated
򐂰 Fileset name
򐂰 File name and extension
򐂰 File size and attributes such as user and group IDs

SONAS supports eight kinds of rules:


File placement rule controls allocation pool of new files
File migration rule controls file movement between pools
File deletion rule controls file deletion
File exclusion rule excludes files from placement in a pool
File list rule generates a list of files that match given criteria
File restore rule controls where to restore files
External storage pool definition rule creates a list of files for the TSM server
External list definition rule creates a list of files

Rules must adhere to a specific syntax that is documented in the Managing Policies chapter of IBM Scale Out Network Attached Storage Administrator’s Guide, GA32-0713. This syntax is similar to the SQL language as it contains statements such as WHEN (TimeBooleanExpression) and WHERE SqlExpression. Rules also contain SQL expression clauses that allow you to reference different file attributes as SQL variables and combine them with SQL functions and operators. Depending on the clause, a SQL expression must evaluate to either true or false, a numeric value, or a character string. Not all file attributes are available to all rules.
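
As an illustration of two of the other rule kinds listed above, the following statements are hedged examples only (the pool names, file name pattern, and age thresholds are hypothetical) showing a migration rule and a deletion rule written in the same SQL-like syntax:

RULE 'oldtosilver' MIGRATE FROM POOL 'system' TO POOL 'silver'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
RULE 'scratchcleanup' DELETE
     WHERE NAME LIKE '%.tmp' AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 365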

5.1.1 The SCAN engine


Pool selection and error checking for file-placement policies is performed in the following
phases:

򐂰 When you install a new policy, the basic syntax of all the rules in the policy will be
checked.
򐂰 Also all references to storage pools will be checked. If a rule in the policy refers to a
storage pool that does not exist, the policy is not installed and an error is returned.
򐂰 When a new file is created, the rules in the active policy are evaluated in order. If an error
is detected, an error will be written to the log, all subsequent rules will be skipped, and an
EINVAL error code will be returned to the application.
򐂰 Otherwise, the first applicable rule is used to store the file data.

File management policies are executed and evaluated by the runpolicy command. A sample file management template policy is shown in Example 5-1. This policy migrates files from the silver storage pool to the HSM storage pool if the silver pool is more than 80% full, and stops at 70% pool utilization. It excludes migrated files and predefined excludes from the migration, and performs the migration in the order established by the weight expression.

Example 5-1 SONAS HSM template policy


[root@plasma]# lspolicy -P TEMPLATE-HSM
Policy Name Declaration Name Default Declarations
TEMPLATE-HSM stub_size N define(stub_size,0)
TEMPLATE-HSM is_premigrated N define(is_premigrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED > stub_size))
TEMPLATE-HSM is_migrated N define(is_migrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED == stub_size))
TEMPLATE-HSM access_age N define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
TEMPLATE-HSM mb_allocated N define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
TEMPLATE-HSM exclude_list N define(exclude_list,(PATH_NAME LIKE '%/.SpaceMan/%' OR NAME LIKE '%dsmerror.log%' OR PATH_NAME LIKE
'%/.ctdb/%'))
TEMPLATE-HSM weight_expression N define(weight_expression,(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age
WHEN is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated * access_age END))
TEMPLATE-HSM hsmexternalpool N RULE 'hsmexternalpool' EXTERNAL POOL 'hsm' EXEC 'HSMEXEC'
TEMPLATE-HSM hsmcandidatesList N RULE 'hsmcandidatesList' EXTERNAL LIST 'candidatesList' EXEC 'HSMLIST'
TEMPLATE-HSM systemtotape N RULE 'systemtotape' MIGRATE FROM POOL 'silver' THRESHOLD(80,70) WEIGHT(weight_expression) TO POOL
'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated)

A file can be a potential candidate for only one migration or deletion operation during one runpolicy run; only one action will be performed. The SONAS runpolicy command uses the SONAS scan engine to determine the files on which to apply specific actions. The SONAS scan engine is based on the GPFS mmapplypolicy command in the background, and mmapplypolicy runs in three phases:

Phase one selects candidate files. All files in the selected filesystem device are scanned and all policy rules are evaluated in order for each file. Files are either excluded or become candidates for migration or deletion, and each candidate file is assigned a weight or priority. Thresholds are also determined, and all the candidate files are sent as input to the next phase.

Phase two chooses and schedules files. It takes the output of phase one and orders it so that candidates with higher weights are chosen before those with lower weights. Files are grouped into batches for processing, generally according to weight, and the process is repeated until the threshold objectives are met or until the file list is finished. Generally, files are no longer chosen in this phase after the occupancy level of the source pool falls below the low threshold, or when the occupancy of the target pool rises above the limit or 99% of total capacity.

Phase three performs the actual file migration and deletion: the candidate files that were chosen and scheduled by the second phase are migrated or deleted, each according to its applicable rule. For migrations, if the applicable rule has a REPLICATE clause, the replication factors are also adjusted accordingly. It is also possible for the source and destination pools to be the same, because this can be used to adjust the replication factors of files without necessarily moving them from one pool to another.

In SONAS R1.1.1 the threshold implementation is a single policy per filesystem. This means that a single threshold is managed at a time. For example, in a Pool1->Pool2->Pool3 setup, data can transfer from Pool1 to Pool2, causing Pool2 to exceed its threshold. In that case Pool2 will not start transferring data to Pool3 until Pool1 has finished its transfer to Pool2. Sufficient headroom must be provided in the secondary and lower tiers of storage to deal with this delayed threshold management.

Note: The Policy rules - examples and tips section in IBM Scale Out Network Attached
Storage Administrator’s Guide, GA32-0713 contains some good advice and tips on how to
get started in writing policies and rules.

5.2 SONAS CLI policy commands


The SONAS CLI has multiple commands to create and manage policies. Policies are created
using the mkpolicy and mkpolicyrule commands.

Figure 5-1 shows the CLI policy commands and their interaction.

Figure 5-1 CLI policy commands and their interaction

The mkpolicy command creates a new policy template with a name and a list of one or more rules. The policy and rules are stored in the SONAS management database; validation of the rules is not performed at this time. The command is invoked as follows:
mkpolicy policyName [-CP <policyName> | -R <rules>] [-D]

The policy has a name and a set of rules specified with the -R switch. The -D switch sets the default policy for a filesystem. Optionally, a policy can be created by copying an existing policy or a predefined policy template with the mkpolicy command and the -CP oldpolicy option. The policy will later be applied to a SONAS filesystem.

The rules for a policy must be entered as a single string and separated by semicolons, and there must be no leading or trailing blanks surrounding the semicolon(s).

This can be accomplished in one of two ways:

򐂰 The first method is to enter the rules as a single long string

mkpolicy ilmtest -R "RULE 'gtktosilver' SET POOL 'silver' WHERE NAME LIKE '%gtk
%';RULE 'ftktosystem' SET POOL 'system' WHERE NAME LIKE '%ftk%';RULE 'default'
SET POOL 'system'"
򐂰 The second method uses the Linux line continuation character (backslash) to enter rules.
mkpolicy ilmtest -R "\
> RULE 'gtktosilver' SET POOL 'silver' WHERE NAME LIKE '%gtk%';\
> RULE 'ftktosystem' SET POOL 'system' WHERE NAME LIKE '%ftk%';\
> RULE 'default' SET POOL 'system'"

Some sample uses of the mkpolicy command are shown:


򐂰 Create a policy with the name "test" with two rules assigned.
mkpolicy test -R "set pool 'system';DELETE WHERE NAME LIKE '%temp%'"
򐂰 Create a policy with the name "test_copy" as a copy of the existing policy "test“
mkpolicy test_copy -CP test
򐂰 Create a policy with the name "default" with two rules assigned and mark it as the default
policy
mkpolicy default -R "set pool 'system';DELETE WHERE NAME LIKE '%temp%'" -D

The chpolicy command modifies an existing policy by adding, appending, or deleting rules; the rmpolicy command removes a policy from the SONAS database, but it does not remove a policy from a filesystem.

The chkpolicy command allows you to check policy syntax and to test the policy as follows:
chkpolicy device [-c <cluster name or id>] -P <policyName> [-T]

Where <device> specifies the filesystem and <policyName> the policy contained in the
database to be tested. Without the -T option, the policy will only be checked for correctness
against the file system. Using the -T option will do a test run of the policy, outputting the result
of applying the policy to the file system and showing which files would be migrated, as shown
in Example 5-2:

Example 5-2 Checking policies for correctness


[root@plasma.mgmt001st001 ~]# chkpolicy gpfs0 -P HSM_external -T
...
WEIGHT(inf) MIGRATE /ibm/gpfs0/mike/fset2/sonaspb26/wv_4k/dir1/test184/f937.blt
TO POOL hsm SHOW()
...
[I] GPFS Policy Decisions and File Choice Totals:
Chose to migrate 311667184KB: 558034 of 558039 candidates;
Chose to premigrate 0KB: 0 candidates;
Already co-managed 0KB: 5 candidates;
Chose to delete 0KB: 0 of 0 candidates;
Chose to list 0KB: 0 of 0 candidates;
0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
silver 4608 6694109184 0.000069%
system 46334172 6694109184 6.921624%
EFSSG1000I Command successfully completed.

Multiple named policies can be stored in the SONAS database. Policies can be listed with the lspolicy command. Using lspolicy without arguments returns the names of all the policies stored in the SONAS database. Specifying -P policyname lists all the rules in a policy, and specifying lspolicy -A lists the filesystems with applied policies. Example 5-3 shows examples of the list command:

Example 5-3 Listing policies


[root@plasma.mgmt001st001 ~]# lspolicy

Policy Name Declarations (define/RULE)


TEMPLATE-HSM stub_size,is_migrated,access_age,weight_expression,hsmexternalpool,hsmcandidatesList,systemtotape
TEMPLATE-ILM stub_size,is_premigrated,is_migrated,access_age,mb_allocated,exclude_list,weight_expression
gtkilmhack gtktosilver,ftktosystem,default
gtkpolicyhsm stub_size,is_premigrated,is_migrated
gtkpolicyhsm_flushat2000 stub_size,is_premigrated,is_migrated,access_age

[root@plasma]# lspolicy -P TEMPLATE-HSM


Policy Name Declaration Name Default Declarations
TEMPLATE-HSM stub_size N define(stub_size,0)
TEMPLATE-HSM is_premigrated N define(is_premigrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED > stub_size))
TEMPLATE-HSM is_migrated N define(is_migrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED == stub_size))
TEMPLATE-HSM access_age N define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
TEMPLATE-HSM mb_allocated N define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
TEMPLATE-HSM exclude_list N define(exclude_list,(PATH_NAME LIKE '%/.SpaceMan/%' OR NAME LIKE '%dsmerror.log%' OR PATH_NAME LIKE
'%/.ctdb/%'))
TEMPLATE-HSM weight_expression N define(weight_expression,(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age
WHEN is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated * access_age END))
TEMPLATE-HSM hsmexternalpool N RULE 'hsmexternalpool' EXTERNAL POOL 'hsm' EXEC 'HSMEXEC'
TEMPLATE-HSM hsmcandidatesList N RULE 'hsmcandidatesList' EXTERNAL LIST 'candidatesList' EXEC 'HSMLIST'
TEMPLATE-HSM systemtotape N RULE 'systemtotape' MIGRATE FROM POOL 'silver' THRESHOLD(80,70) WEIGHT(weight_expression) TO POOL
'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated)

[root@plasma.mgmt001st001 ~]# lspolicy -A


Cluster Device Policy Set Name Policies Applied Time Who applied it?
plasma.storage.tucson.ibm.com testsas gtkpolicyhsm_flushat_4_12_20hr gtkpolicyhsm_flushat_4_12_20hr 4/26/10 11:17 PM root

A named policy stored in the SONAS database can be applied to a filesystem using the
setpolicy command. Policies set with the setpolicy command become the active policy for a
filesystem. The active policy controls the allocation and placement of new files in the
filesystem. The setpolicy -D command can also be used to remove an active policy for a
filesystem.

The runpolicy command executes, or runs, a policy on a filesystem. Either the default policy, the one set on the filesystem using the setpolicy command, can be run by specifying the -D option, or a different policy stored in the SONAS database can be run by specifying the -P option. The runpolicy command executes migration and deletion rules.

The mkpolicytask command creates a SONAS cron job, a scheduled operation, that applies the currently applied policy on a filesystem at a specified time. The mkpolicytask command takes the filesystem as an argument. To remove scheduled policy tasks from a filesystem, use the rmpolicytask command with the filesystem as the argument.
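
Putting these commands together, a typical lifecycle for the ilmtest policy created earlier might look like the following sketch; the filesystem name redbook2 is taken from the walkthrough later in this chapter, and the exact option letters should be confirmed with the CLI help for your release:

# Apply the stored policy to the filesystem so it becomes the active policy for new files
setpolicy redbook2 -P ilmtest
# Run the active policy now to evaluate migration and deletion rules against existing files
runpolicy redbook2 -D
# Schedule the applied policy to be run regularly by a SONAS cron job
mkpolicytask redbook2
# Later, remove the scheduled task and the active policy if they are no longer needed
rmpolicytask redbook2
setpolicy redbook2 -D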

5.3 SONAS policy best practices


We will now introduce some policy considerations and best practices that you should keep in
mind when developing and coding SONAS policies.

5.3.1 Cron jobs considerations


We will analyze a sample scenario that assumes that the filesystem has already been set up for ILM with tiered or peered pools, with or without HSM. Cron jobs here are used to move data between storage tiers. In the Pool1 to Pool2 (tiered) to Pool3 (TSM HSM) case, a cron job would typically be used on either the Pool1 to Pool2 or the Pool2 to Pool3 migration. In the peered case, where we have Pool1 and Pool2 (peered) that then migrate to Pool3 (TSM HSM), a cron job would typically be used on the Pool1 to Pool3 and Pool2 to Pool3 migrations.

A typical use case for a cron job is to transfer large amounts of data at known periods of low activity, so that the migration thresholds set in the filesystem policy are rarely activated. If the files being transferred are going to external TSM storage and will be accessed at a later time, the files can be premigrated by the cron job; otherwise they can be migrated by the cron job.

Now assume the filesystem is a single pool called Pool1 and migrates data to an external
TSM pool called Pool3. The thresholds for this pool are 80,75 so that if the filesystem is over
80% full, then HSM will migrate data until the pool is 75% full.

Assume for discussion a usage pattern that is heavy write activity from 8AM to 12PM, then
heavy mixed activity (reads and writes) from 12PM to 6PM, then activity tapers off, and the
system is essentially idle at 10PM.

With normal threshold processing, the 80% threshold is most likely to be hit between 8AM
and 12PM when Pool1 is receiving the most new data. Hitting this threshold will cause the
filesystem to respond by migrating data to Pool3. The read activity associated with this
migration will compete with the current host activity, slowing down the host jobs and
lengthening the host processing window.

If the daily write activity consisted of 10-20% of the size of the disk pool, migration would not
be required during the host window if the pool started at 80%-20%=60% full. A 5% margin
may be reasonable to ensure that the threshold is never hit in normal circumstances.

A reasonable cron job for this system is to have a migration policy set for 10PM that has a
migration threshold of 60,55 so if the filesystem is over 60% full, migrate to 55%. In addition a
cron job should be registered to trigger the policy at 10PM.

The cron job will activate the policy that is currently active on the filesystem. The policy will
need to include two migration clauses to implement these rules, a standard threshold
migration rule using threshold 80,75:
RULE defaultmig MIGRATE FROM POOL 'system' THRESHOLD (80,75)
WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT
(is_migrated)

And a specific 10PM migration rule using threshold 60,55:


RULE deepmig MIGRATE FROM POOL 'system' THRESHOLD (60,55)
WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT
(is_migrated) AND HOUR(CURRENT_TIMESTAMP)=22

This scenario has an issue: the SONAS filesystem will use the lower of the two configured thresholds to trigger its "lowDiskSpace" event:
RULE defaultmig MIGRATE FROM POOL 'system' THRESHOLD (80,75) …
RULE deepmig MIGRATE FROM POOL 'system' THRESHOLD (60,55)…

In this case the SONAS filesystem will trigger a policy scan at 60%, and this will happen every 2 minutes, generally not at 10PM.

The scan will traverse all files in the filesystem and, as it is not 10PM, it will not find any candidates, but it will create a lot of wasted metadata activity; the policy will work, it will just burn lots of CPU and disk IOPs. How can this behaviour be avoided? There are two solutions: either avoid the threshold in the cron job call, or use backup coupled with HSM.

To avoid the threshold, consider your storage usage and determine a time period that accomplishes your goal without using a threshold, for example using a rule that migrates all files that have not been accessed in the last 3 days with a statement such as
(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 2
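
A minimal sketch of such a rule, reusing the exclusion defines from the HSM template (the pool names are illustrative only):

RULE 'nightlymig' MIGRATE FROM POOL 'system' TO POOL 'hsm'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 2
     AND NOT (exclude_list) AND NOT (is_migrated)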

This has the advantage that it avoids threshold "spin" but has the disadvantage that it cannot
premigrate files.

To avoid the current cron job limitation that only allows you to run the active filesystem policy, the one that was put there with setpolicy, you can use an external scheduler to execute a SONAS command using ssh and do a runpolicy <mySpecificPolicy> with a command similar to the following one.
ssh <SONASuser@mgmtnode.customerdomain.com> runpolicy <mySpecificPolicy>

5.3.2 Policy rules


We will now illustrate the mechanics of SONAS policy rules, explain the rule syntax, and give rule coding best practices. For a detailed explanation of SONAS policy rules, refer to the GPFS Advanced Administration Guide - Version 3 Release 3, Chapter 2, Information Lifecycle Management for GPFS.

You should start creating policies from the SONAS supplied templates called
TEMPLATE-ILM and TEMPLATE-HSM. You can list the HSM template using the following
command:
lspolicy -P TEMPLATE-HSM

You will see a policy similar to that shown in Figure 5-2. Make sure you use the HSMEXEC and HSMLIST statements as coded in the templates, and ensure that you keep the file exclusion rules that were in the sample policy.

define(stub_size,0)
define(is_premigrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED > stub_size))
define(is_migrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED == stub_size))
define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
define(exclude_list,(PATH_NAME LIKE '%/.SpaceMan/%' OR
    NAME LIKE '%dsmerror.log%' OR PATH_NAME LIKE '%/.ctdb/%'))      <== keep these rules/defines
define(weight_expression,(CASE WHEN access_age < 1 THEN 0
    WHEN mb_allocated < 1 THEN access_age
    WHEN is_premigrated THEN mb_allocated * access_age * 10
    ELSE mb_allocated * access_age
    END))                                                           <== modify this weight expression
RULE 'hsmexternalpool' EXTERNAL POOL 'hsm' EXEC 'HSMEXEC'
RULE 'hsmcandidatesList' EXTERNAL LIST 'candidatesList' EXEC 'HSMLIST'
RULE 'systemtotape' MIGRATE
    FROM POOL 'silver' THRESHOLD(80,70)                             <== tweak these thresholds
    WEIGHT(weight_expression) TO POOL 'hsm'
    WHERE NOT (exclude_list) AND NOT (is_migrated)                  <== keep this clause
RULE 'default' set pool 'system'                                    <== add a default placement rule
Figure 5-2 Sample policy syntax constructs

All SONAS policies must end with a default placement rule.

If you are running with HSM, consider using default = system to set the system pool as the default. Data will probably be configured to cascade, so put most data in the fastest pool and then let it cascade through the tiers. The statement to use is:

If you are running with ILM and tiered storage, consider default = pool2, where pool2 is a slower pool. The files default to the slower pool, and you select files for the faster pool explicitly. That way, if you forget a filter, a file goes into the slower, and probably larger, pool. Use a statement like:
RULE 'default' set pool 'pool2'

Remember that placement rules only apply to files created after the placement rule is applied, and that placement rules do not affect recalled files, as they return to the pool they migrated from.

Policies can be coded using defines, also called macro defines. These are essentially named variables used to make rules easier to read. For example, the following statement creates a define named mb_allocated and sets it to the size of the file in MB:
define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))

Defines offer a convenient way to encapsulate weight expressions so as to provide common definitions across the policy. Some common exclusions would be:
򐂰 "special file" migration exclusion definition – always use this when migrating
򐂰 "migrated file" migration exclusion definition – always use this when migrating

A policy is a set of rules; macros can be used to make the rules easier to read. Rules determine what the policy does, and the first rule matched applies to a file, so order matters. There are two major types of rules: placement rules determine what pool a file is placed in when it first appears in the filesystem, and migration rules specify the conditions under which a file that exists in the filesystem is moved to a different pool. Migration policies must include the special file exclusion clause and the migrated file exclusion clause.

5.3.3 Peered policies


Peered policies contain placement rules only. Defines are generally not required for peered ILM policies. Placement rules select files by user-defined criteria or policy, for example:
RULE 'P1' set pool 'system' where upper(name) like '%SO%'
RULE 'P1' set pool 'system' where upper(name) like '%TOTALLY%'

Peered policies must contain a default placement rule that by default puts files in the lower performance pool; you then select groups of files using rules for placement into the higher performance pool. For example:
RULE 'default' set pool 'slowpool'

5.3.4 Tiered policies


Tiered policies contain both migration rules and optional placement rules. This type of policy requires the defines contained in the sample TEMPLATE-ILM policy. You can also encapsulate the weight expression as a define. Optional placement rules select files by policy. For the migration rules, some best practices are shown below:
򐂰 Make sure at least one threshold exists as a Safety Net, even if using other rules
򐂰 Suggest including exclusion clauses for migrated and special files in migration rules even
if not using HSM, so it can be added later
򐂰 Non-threshold migration will need an associated cron job to trigger it, as discussed in a
later section on migration filters

The policy is terminated by the default placement rule:


RULE 'default' set pool 'system'

We suggest a default of the higher performance pool, as subsequent tiering will cascade data from the high performance to the low performance pools.

5.3.5 HSM policies


Use the defines from the TEMPLATE-HSM rules. You can again encapsulate the weight expression as a define and optionally have placement rules to select files by policy.

Follow these best practices for the migration rules:


򐂰 External pool rules – use rules from template (HSMEXEC, HSMLIST)
򐂰 Threshold – Make sure at least one exists as a safety net even if using other rules
򐂰 Always include exclusion clauses (migrated, special files) in migration rules
򐂰 Non-threshold migration – need an associated cron job to trigger – may want to have
"time" clause to prevent running on threshold trigger
򐂰 Define at least one rule for each migration "level" (system->pool2, pool2->hsm)
򐂰 External pool rules – use rules from template (HSMEXEC, HSMLIST)

Remember to terminate the policy with a default placement rule.

5.3.6 Policy triggers


Policies can be applied to a filesystem or only reside in the SONAS database.
򐂰 Filesystem policy
– "active" – one per filesystem, loaded from database (setpolicy)
򐂰 Database policy
– "inactive" – they are not running
– "default" = quick path to recalling a policy – this is a db state only

Triggers control when policies are activated. Policies only do something if triggered. We have
the following kinds of triggers:
򐂰 Manual trigger
– The runpolicy command allows a database policy to be run
򐂰 Automated triggers, also referred to as callbacks
– Threshold
i. The SONAS GPFS file system manager detects that disk space is running below
the low threshold that is specified in the current policy rule and raises a
lowDiskSpace event
ii. The lowDiskSpace event initiates a SONAS GPFS migration callback procedure
iii. The SONAS GPFS migration callback executes the SONAS script defined for that
callback
iv. The SONAS script executes the active filesystem policy
򐂰 Cron
– In SONAS R1.1.1, cron activates the default filesystem policy; later releases may allow a different database policy to be selected rather than the default policy for the filesystem.

When SONAS identifies that a threshold has been reached, it triggers a new lowspace event every two minutes as long as the fill level of the filesystem is above the threshold. SONAS knows that a migration was already triggered, so it ignores the new trigger and does not do any additional processing; the migration that started earlier continues execution.

5.3.7 Weight expressions


Weight expressions are used with threshold migration rules. The threshold limits the amount of data moved, and the weight expression determines the order of the files being migrated, so that files with the highest weight are moved first, until the threshold is satisfied.

We suggest you code the weight expression as a define because it makes the rule easier to read, as the following rule shows:
RULE 'systemtosilver' MIGRATE FROM POOL 'system' THRESHOLD(15,10)
WEIGHT(weight_expression) TO POOL 'silver' WHERE NOT (exclude_list) AND NOT
(is_migrated)

Where weight_expression is:


define(weight_expression,(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated <
1 THEN access_age WHEN is_premigrated THEN mb_allocated * access_age * 10
ELSE mb_allocated * access_age END))

The above two statements are simpler to read than the combined statements:
RULE 'systemtosilver' MIGRATE FROM POOL 'system' THRESHOLD(15,10) WEIGHT(CASE
WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age WHEN
is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated *
access_age END) TO POOL 'silver' WHERE NOT (exclude_list) AND NOT
(is_migrated)

5.3.8 Migration filters


Migration filters are used to control what gets migrated and when. Exclusion rules, or filters, should include:
򐂰 Migrated and special files – these should be used from the templates
򐂰 Optionally, small files – leave small files behind for efficiency if they can fit on disk; note
that a threshold + weight rule might do this anyway, so this might not be a useful rule
򐂰 The fine print – this means that small files will not be migrated to offline storage and cannot
be recovered from the offline storage. Although HSM can be used to recover files, it is NOT
recommended and is NOT supported as a customer action; customers should use
backup/restore. In that case, if they run coupled with backup, the small files will be
backed up, just not migrated

Time filters may be useful when coupled with cron jobs, for example a cron job every Sunday at 4:05 AM, perhaps flushing a lot of files not accessed for a week, as in the sketch below.
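
A hedged sketch of such a time-filtered migration rule follows; the pool names are illustrative, the weekly cron job that triggers the policy is scheduled separately, and DAYOFWEEK(CURRENT_TIMESTAMP) returning 1 for Sunday follows the usual SQL convention:

RULE 'weeklyflush' MIGRATE FROM POOL 'system' TO POOL 'hsm'
     WHERE DAYOFWEEK(CURRENT_TIMESTAMP) = 1
     AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 7
     AND NOT (exclude_list) AND NOT (is_migrated)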

5.3.9 General considerations


Understand your SONAS and TSM throughputs and loads, and make sure your thresholds leave sufficient free space to finish without running out of disk space. Note that the bandwidth to TSM may only reduce the rate at which the filesystem fills during peak usage, and not necessarily at a fast enough rate, depending on your configuration. The filesystem high threshold should allow the peak use period to finish without filling the filesystem 100%. Always use a threshold if you are using ILM/HSM. Even if you do not expect to hit the threshold, this provides a safety net in case your other policies have bugs, or in case your usage profile changes. Be aware that a cron job that exploits a "low threshold" rule will cause "metadata spin". Migration rules with no threshold do not trigger automatically; they need a cron job for that.

TSM will clone backups if HSM migration is done first; migration will still take the same amount of time to move data from SONAS to TSM, but backups may be faster depending on server throughput. The migrequiresbackup option can be set at the TSM server, and the option can be used to prevent the following scenario:

If the ACL data of a premigrated file is modified, these changes are not written to the TSM server if the file is migrated after this change. To avoid losing the modified ACL data, use the option migrequiresbackup yes. This setting does not allow the migration of files whose ACL data has been modified and for which no current backup version exists on the server.

When using migrequiresbackup you must back up files, or you may run out of space, as HSM will not move files.

5.4 Policy creation and execution walkthrough


We will now illustrate the operational steps required to set up and execute SONAS policies, both using the SONAS GUI and using the SONAS CLI.

5.4.1 Create storage pool using the GUI


To create a storage pool using the GUI, connect to the SONAS GUI and navigate to Storage -> Disks. You will see a display of all the NSD disks that are available, as shown in Figure 5-3. We see that all disks are in the system pool.

Figure 5-3 List of available NSD devices

You can also list the available storage pools for a specific filesystem by selecting Storage -> Storage Pools, as shown in Figure 5-4. Note that there is only one storage pool, system, for our file system. System is the default storage pool name and cannot be removed.

Figure 5-4 Storage pools details for selected file system

To assign a disk to a filesystem, proceed to the Files -> File Systems panel. Select the redbook file system to which we want to assign a new NSD disk with a different storage pool. After selecting the filesystem, you will see the File System Disks window as shown in Figure 5-5 on page 169.

Figure 5-5 File System Disks window

Press the Add a disk to the file system button and a panel like that in Figure 5-6 is shown. Select the disks to add, choose a disk type, specify a storage pool name, and press OK.

Figure 5-6 Add a disk to the filesystem panel

After the task completes, you will see that the filesystem now resides on two disks, and you will see the file system and storage pool usage as shown in Figure 5-7 on page 170:

Figure 5-7 File system disk and usage display

5.4.2 Create storage pool using the CLI


Connect to the SONAS CLI. You can list the NSD volumes using the lsdisk command as
shown in Example 5-4:

Example 5-4 Listing NSDs


[sonas02.virtual.com]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs2nsd gpfs0 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs3nsd redbook 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs5nsd redbook2 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs4nsd 1 dataAndMetadata system ready 4/21/10 10:50 PM
gpfs6nsd 1 system ready 4/22/10 10:42 PM

We will modify the storage pool assignment for the NSD called gpfs4nsd using the chdisk and lsdisk commands, as shown in Example 5-5. Attributes such as pool name, usage type, and failure group cannot be changed for disks that are active in a filesystem.

Example 5-5 Change storage pool and data type assignment


[sonas02.virtual.com]$ chdisk gpfs4nsd --pool silver --usagetype dataonly
EFSSG0122I The disk(s) are changed successfully!

[sonas02.virtual.com]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs2nsd gpfs0 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs3nsd redbook 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs5nsd redbook2 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs4nsd 1 dataOnly silver ready 4/21/10 10:50 PM
gpfs6nsd 1 system ready 4/22/10 10:46 PM

To add the gpfs4nsd disk to the redbook2 filesystem, use the chfs command as shown in
Example 5-6.

Example 5-6 Add a disk to the redbook2 filesystem


[sonas02.virtual.com]$ chfs --add gpfs4nsd redbook2


The following disks of redbook2 will be formatted on node strg002st002.virtual.com:


gpfs4nsd: size 1048576 KB
Extending Allocation Map
Creating Allocation Map for storage pool 'silver'
31 % complete on Thu Apr 22 22:53:20 2010
88 % complete on Thu Apr 22 22:53:25 2010
100 % complete on Thu Apr 22 22:53:26 2010
Flushing Allocation Map for storage pool 'silver'
Disks up to size 24 GB can be added to storage pool 'silver'.
Checking Allocation Map for storage pool 'silver'
83 % complete on Thu Apr 22 22:53:32 2010
100 % complete on Thu Apr 22 22:53:33 2010
Completed adding disks to file system redbook2.
mmadddisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
EFSSG0020I The filesystem redbook2 has been successfully changed.

You can verify the storage pool and NSD assignments with the lspool command as shown in
Example 5-7:

Example 5-7 Listing storage pools


[sonas02.virtual.com]$ lspool
Filesystem Name Size Usage Available fragments Available blocks Disk list
gpfs0 system 2.00 GB 4.2% 350 kB 1.91 GB gpfs1nsd;gpfs2nsd
redbook system 1.00 GB 14.7% 696 kB 873.00 MB gpfs3nsd
redbook2 silver 1.00 GB 0.2% 14 kB 1021.98 MB gpfs4nsd
redbook2 system 1.00 GB 14.7% 704 kB 873.00 MB gpfs5nsd

Repeat the lsdisk command to confirm the correct file system to disk assignments as shown
in Example 5-8:

Example 5-8 Listing NSD disks


[sonas02.virtual.com]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs2nsd gpfs0 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs3nsd redbook 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs4nsd redbook2 1 dataOnly silver ready up 4/21/10 10:50 PM
gpfs5nsd redbook2 1 dataAndMetadata system ready up 4/22/10 3:03 AM
gpfs6nsd 1 system ready 4/22/10 10:59 PM

5.4.3 Create and apply policies using the GUI


To create and apply a policy using the SONAS GUI select Files → Policies. You will see the
Policies List window with a list of file systems, as shown in Figure 5-8. Selecting a file system
displays the Policy Details window for that file system below.


Figure 5-8 Policy list window

In the policy details section of the window, type your policy. Note that you can also load the
policy from a file on your computer by pressing the Load policy button. Press the Set policy
button and choose apply at the prompt to set the policy. After this, press the Apply policy
button and choose apply at the prompt to apply the policy. After applying the policy you will
see a panel, as shown in Figure 5-9 on page 173, with a summary of the policy that will be
applied. The policies are now active.


Figure 5-9 Apply policy task progress window

5.4.4 Create and apply policies using the CLI


We will create a new policy called redpolicy using the CLI that contains the rules shown in
Example 5-9:

Example 5-9 Policy rules


RULE 'txtfiles' set POOL 'silver' WHERE UPPER(name) like '%.TXT'
RULE 'movepdf' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF'
RULE 'default' set POOL 'system'

Note: The CLI mkpolicy and mkpolicyrule commands do NOT accept the RULE clause,
so the RULE keyword and rule name must be removed from all policy rule statements.


We use the mkpolicy command to create the redpolicy policy with the first rule and the
mkpolicyrule command to append the remaining rules to it, as shown in Example 5-10:

Example 5-10 Create a new policy


[sonas02]# mkpolicy -P "redpolicy" -R " set POOL 'silver' WHERE UPPER(name) like '%.TXT' ;"
[sonas02]# mkpolicyrule -P "redpolicy" -R " MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ;"
[sonas02]# mkpolicyrule -P "redpolicy" -R " set POOL 'system' "

We can list all policies defined using the CLI with the lspolicy -P all command as shown in
Example 5-11:

Example 5-11 List all policies


[sonas02]# lspolicy -P all
Policy Name Rule Number Rule Is Default
redpolicy 1 set POOL 'silver' WHERE UPPER(name) like '%.TXT' N
redpolicy 2 MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' N
redpolicy 3 set POOL 'system' N

Note: You cannot list policies created using the GUI with lspolicy.

Now validate the policy using the chkpolicy command as shown in Example 5-12:

Example 5-12 Validate the policy


[sonas02]# chkpolicy -P "redpolicy" -T validate -d redbook2 -c sonas02.virtual.com
No error found. All the placement rules have been validated.

After successful validation set the policy for filesystem redbook2 using the setpolicy
command as shown in Example 5-13 on page 174. We then run the lspolicy -A command to
verify what filesystems have policies.

Example 5-13 Set the policy


[sonas02]# setpolicy -P "redpolicy" -d redbook2 -c sonas02.virtual.com

[root@sonas02.mgmt001st002 ~]# lspolicy -A


Cluster Device Policy Name Applied Time Who applied it?
sonas02.virtual.com redbook2 redpolicy 4/26/10 11:00 PM root
sonas02.virtual.com gpfs0 N/A
sonas02.virtual.com redbook N/A

Note: Policies created with the GUI do not appear in the output of the SONAS CLI lspolicy -A
command. The redbook file system does have a valid policy that was set using the GUI, as
shown in Example 5-14. You can tell it was created using the GUI because it contains named
RULE statements, which are not accepted by the CLI.

Example 5-14 Policies applied to filesystems


[sonas02]# lspolicy -d redbook
Cluster Device Policy
Last update
sonas02.virtual.com redbook RULE 'txtfiles' set POOL 'silver' WHERE UPPER(name) like '%.TXT' ; RULE 'movepdf' MIGRATE FROM POOL 'system'
TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ; RULE 'default' set POOL 'system' 4/26/10 10:59 PM

[sonas02]# lspolicy -d redbook2


Cluster Device Policy
Last update
sonas02.virtual.com redbook2 /* POLICY NAME: redpolicy */ ; RULE '1' set POOL 'silver' WHERE UPPER(name) like '%.TXT' ; RULE '2' MIGRATE
FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ; RULE '3' set POOL 'system' 4/26/10 11:10 PM


5.4.5 Testing policy execution


We will now connect to the SONAS management node as root to run the GPFS mmlsattr
command. Policies will be verified in the redbook file system. The policy being executed is
shown in Example 5-15:

Example 5-15 Sample policy


RULE 'txtfiles' set POOL 'silver' WHERE UPPER(name) like '%.TXT'
RULE 'movepdf' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF'
/* This is a new policy for Lukasz & John */
RULE 'default' set POOL 'system'

We will verify that files that end with the .txt extension are placed in the silver pool, that other
files go to the system pool, and that .pdf files are allocated in the system pool and subsequently
moved to the silver pool. We have created three files; we list them with ls -la and then run the
GPFS mmlsattr command to verify file placement, as shown in Example 5-16. The files are
placed as follows:
򐂰 test1.mp3 on the system pool
򐂰 test2.txt on the silver pool
򐂰 test3.pdf on the system pool

Example 5-16 Files allocated and managed by policies


[root@sonas02.mgmt001st002 export2]# ls -la
drwxr-xr-x 2 VIRTUAL\administrator root 8192 Apr 23 04:52 .
drwxr-xr-x 4 root root 32768 Apr 22 02:32 ..
-rw-r--r-- 1 root root 0 Apr 23 04:51 test1.mp3
-rw-r--r-- 1 root root 0 Apr 23 04:51 test2.txt
-rw-r--r-- 1 root root 0 Apr 23 04:52 test3.pdf

[root@sonas02.mgmt001st002 export2]# mmlsattr -L test*


file name: test1.mp3
metadata replication: 1 max 2
data replication: 1 max 2
immutable: no
flags:
storage pool name: system
fileset name: root
snapshot name:

file name: test2.txt


metadata replication: 1 max 2
data replication: 1 max 2
immutable: no
flags:
storage pool name: silver
fileset name: root
snapshot name:

file name: test3.pdf


metadata replication: 1 max 2
data replication: 1 max 2
immutable: no
flags:
storage pool name: system
fileset name: root
snapshot name:

Now we can apply the policy using the GUI by going to Files → Policies, selecting our file
system, clicking the Apply policy button and choosing apply. Applying the policy causes the
migration rule to be executed. After policy execution we verify the correct placement of the
files using the mmlsattr command as shown in Example 5-17. The files will now be placed on
storage pools as follows:
򐂰 test1.mp3 remains on the system pool
򐂰 test2.txt remains on the silver pool
򐂰 test3.pdf has been moved to the silver pool


Example 5-17 List file status


[root@sonas02.mgmt001st002 export2]# mmlsattr -L test*
file name: test1.mp3
metadata replication: 1 max 2
data replication: 1 max 2
immutable: no
flags:
storage pool name: system
fileset name: root
snapshot name:

file name: test2.txt


metadata replication: 1 max 2
data replication: 1 max 2
immutable: no
flags:
storage pool name: silver
fileset name: root
snapshot name:

file name: test3.pdf


metadata replication: 1 max 2
data replication: 1 max 2
immutable: no
flags:
storage pool name: silver
fileset name: root
snapshot name:

Note: mmlsattr is a GPFS command that must be run on SONAS with root authority. SONAS
does not officially support running commands with root authority. SONAS development
recognizes the need for an equivalent SONAS command to verify the placement of files in
storage pools.


Chapter 6. Backup and recovery, availability and resiliency functions
This chapter illustrates SONAS components and external products that can be used to
guarantee data availability and resiliency. We also provide details on the Tivoli Storage
Manager integration. Topics discussed in this chapter include:
򐂰 Backup and recovery of files in a SONAS cluster
򐂰 Configuring SONAS to use HSM
򐂰 Replication of SONAS data
򐂰 SONAS snapshots


6.1 High Availability and protection in base SONAS


A SONAS cluster offers many high availability and data protection features that are part of the
base configuration and do not need to be ordered separately. SONAS is a grid-like storage
solution. By design all the components in a SONAS cluster are redundant so there is no
single point of failure; for example, there are multiple interface nodes for client access and
data can be replicated across multiple storage pods. The software components included in
the SONAS cluster also offer high availability functions; for example, the SONAS GPFS
filesystem is accessed concurrently from multiple interface nodes and offers data protection
through synchronous replication and snapshots. Refer to Chapter 3, “Software architecture”
on page 73 for more details.

SONAS also includes TSM client software for data protection and backup to an external TSM
server, and asynchronous replication functions to send data to a remote SONAS cluster or
file server.

Data is accessed through interface nodes, which are deployed in groups of two or more to
guarantee data accessibility in case an interface node is no longer accessible. The SONAS
software stack manages service availability and access failover between multiple interface
nodes. This allows clients to continue accessing data when an interface node is unavailable.
The SONAS Cluster Manager comprises three fundamental components for data access
failover:
򐂰 The Cluster Trivial Database (CTDB), which monitors services, restarts them on an
available node, and offers concurrent access from multiple nodes with locking to maintain
data integrity
򐂰 DNS, which performs IP address resolution and round robin IP load balancing
򐂰 File sharing protocol error retry mechanisms

These three components, together with the retry mechanisms in the file sharing protocols,
make SONAS a high availability file sharing solution.

In this chapter we will introduce and discuss the SONAS high availability and data protection
functions and how they can be applied in your environment.

6.1.1 Cluster Trivial Database


CTDB is used for two major functions. First, it provides a clustered manager that scales well
to large numbers of nodes. Second, it controls the cluster: CTDB controls the public IP
addresses used to publish the NAS services and moves them between nodes. Using
monitoring scripts, CTDB determines the health state of each node. If a node has problems,
such as broken services or network links, the node becomes unhealthy. In this case, CTDB
migrates all public IP addresses to healthy nodes and sends so-called CTDB “tickle-acks” to
the clients so that they reestablish their connections. CTDB also provides the API to manage
cluster IP addresses, add and remove nodes, and ban and disable nodes.

CTDB must be healthy on each node of the cluster for SONAS to work correctly. When
services are down for any reason, the CTDB state might become unhealthy. CTDB services
can be restarted on a node using either the SONAS GUI or the command line. It is also
possible to change CTDB configuration parameters such as public addresses, log file
information and debug level.


Suspending and resuming nodes


The SONAS administrator GUI or command line allow you to perform multiple operations on
a node.

The suspendnode and resumenode CLI commands provide control of the status of an
interface node in the cluster. The suspendnode command suspends a specified interface
node. It does this by banning the node at the CTDB level. A banned node does not participate
in the cluster and does not host any records for the CTDB. The IP addresses of a suspended
node are taken over by another node and no services are hosted on the suspended node.
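
The same operation can be performed from the CLI. As a minimal sketch, assuming the
commands accept the interface node host name as their argument (verify the exact syntax
with the CLI help for your release):

suspendnode int001st002
resumenode int001st002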

GUI example of suspendnode command


Following is an example of suspending and resuming a node from the GUI:
򐂰 Select Clusters → Interface Nodes from the GUI
򐂰 Select the cluster from the Active Cluster pulldown menu
򐂰 Select cluster node int001st002.virtual.com by marking the checkbox and click the
Suspend button to suspend the node
򐂰 After a short pause the screen shown in Figure 6-1 will appear, showing that the status for
node int001st002.virtual.com is stopped and that all active IP addresses are on node
int002st002.virtual.com:

Figure 6-1 Suspended node display

򐂰 To re-enable activity on node int001st002.virtual.com we select it and press the Resume
button. Figure 6-2 on page 180 shows the resulting status. Note that the public IP
addresses have been rebalanced across the nodes and that the status for the node is
active.


Figure 6-2 Interface node IP addresses after node resume

6.1.2 DNS performs IP address resolution and load balancing


What happens when a problem occurs on a SONAS interface node, or on the network that
connects the client to the SONAS interface node, depends on multiple factors such as the
file sharing protocol in use and specific SONAS configuration parameters. We will illustrate
some failover considerations.

All requests from a client to a SONAS cluster for data access are serviced through the SONAS
public IP addresses. These public IP addresses are similar to virtual addresses because in
general the client can access the same service, at different moments in time, over several
different public IP addresses. SONAS interface nodes can have multiple public IP addresses
for load balancing and IP failover; for example, the lsnwinterface -x CLI command displays all
public addresses on the interface nodes as shown in Figure 6-3. The figure shows two
interface nodes, int001st002 and int002st002, each with two public IP addresses assigned on
interfaces eth1 and eth2. The management node is also shown but it does not host any public
IP addresses.

[SONAS]$ lsnwinterface -x
Node Interface MAC Master/Slave Up/Down IP-Addresses
int001st002.virtual.com eth0 02:1c:5b:00:01:01 UP
int001st002.virtual.com eth1 02:1c:5b:00:01:02 UP 10.0.1.121
int001st002.virtual.com eth2 02:1c:5b:00:01:03 UP 10.0.2.122
int002st002.virtual.com eth0 02:1c:5b:00:02:01 UP
int002st002.virtual.com eth1 02:1c:5b:00:02:02 UP 10.0.1.122
int002st002.virtual.com eth2 02:1c:5b:00:02:03 UP 10.0.2.121
mgmt001st002.virtual.com eth0 02:1c:5b:00:00:01 UP
mgmt001st002.virtual.com eth1 02:1c:5b:00:00:02 UP
mgmt001st002.virtual.com eth2 02:1c:5b:00:00:03 UP

Figure 6-3 Public IP addresses before IP address failover

In Figure 6-3 we see that under normal operating conditions each interface node has two
public IP addresses. Figure 6-4 shows that after a node failover all public IP addresses have
been moved to interface node int002st002, and node int001st002 hosts no IP addresses.


[SONAS]$ lsnwinterface -x
Node Interface MAC Master/Slave Up/Down IP-Addresses


int001st002.virtual.com eth0 02:1c:5b:00:01:01 UP
int001st002.virtual.com eth1 02:1c:5b:00:01:02 UP
int001st002.virtual.com eth2 02:1c:5b:00:01:03 UP
int002st002.virtual.com eth0 02:1c:5b:00:02:01 UP
int002st002.virtual.com eth1 02:1c:5b:00:02:02 UP 10.0.1.121,10.0.1.122
int002st002.virtual.com eth2 02:1c:5b:00:02:03 UP 10.0.2.121,10.0.2.122
mgmt001st002.virtual.com eth0 02:1c:5b:00:00:01 UP
mgmt001st002.virtual.com eth1 02:1c:5b:00:00:02 UP
mgmt001st002.virtual.com eth2 02:1c:5b:00:00:03 UP

Figure 6-4 Public IP addresses after IP address failover

6.1.3 File sharing protocol error recovery


Depending on the data access protocol, different behaviors may be observed. The FTP and
SFTP protocols typically fail because they do not survive a failed TCP connection; in this case
the user should restart the session, for example by reconnecting to the file server, using the
same IP address to get access to the new node. The CIFS protocol behaves well and
demonstrates failover using either DNS name resolution or static IP addressing.

The NFS protocol is not always successful on failover. To avoid reconnection problems we
recommend that NFS shares be accessed using static IP addresses and not DNS address
resolution. The reason for this is that when an NFS client uses DNS addresses, after a client
failure it may get a different IP address when it reconnects to the SONAS cluster, but the NFS
file locks are dependent on, and still held under, the NFS client’s original IP address. In this
situation the NFS client may hang indefinitely waiting for a lock on the file. To clean up the
NFS lock situation you can recover CTDB services on the failed node from the SONAS GUI
using the Clusters → Interface Node → Restart option.

6.2 Backup and restore of file data


This section discusses the backup and restore methods and techniques for SONAS file data.
It does not address the protection of SONAS metadata; for a discussion of that topic see
6.5.1, “Backup of SONAS configuration information” on page 212.

6.2.1 Tivoli Storage Manager terminology and operational overview


IBM Tivoli Storage Manager, working together with IBM SONAS, provides an end-to-end
comprehensive solution for backup/restore, archival, and hierarchical storage management
(HSM).

In order to best understand how IBM SONAS works together with IBM Tivoli Storage
Manager, it is useful here to review and compare the specific Tivoli Storage Manager
terminology and processes involved with:
򐂰 Backing up and restoring files
򐂰 Archiving and retrieving them
򐂰 Migrating and recalling them (Hierarchical Storage Management)

Tivoli Storage Manager terminology


If you use Tivoli Storage Manager to back up files (which will invoke the TSM backup-archive
client code on the interface nodes), copies of the files are created on the Tivoli Storage


Manager server external storage, and the original files remain in your local file system. To
obtain a backed up file from Tivoli Storage Manager storage, for example in case the file is
accidentally deleted from the local file system, you restore the file.

If you use Tivoli Storage Manager to archive files to Tivoli Storage Manager storage, those
files are removed from your local file system, and if needed later, you retrieve them from
Tivoli Storage Manager storage.

If you use Tivoli Storage Manager to migrate SONAS files to external storage (which will
invoke the TSM HSM client code on the interface nodes), you move the files to external
storage attached to the Tivoli Storage Manager server, and TSM will replace each file with a
stub file in the SONAS file system. You may accept the default stub file size or, if you wish,
specify the size of your TSM HSM stub files to accommodate applications that need to read
headers or initial portions of the file. To users, the files appear to be online in the file system.
If a migrated file is accessed, TSM HSM will automatically initiate a recall of the full file from
its migration location in external Tivoli Storage Manager-attached storage. The effect to the
user will simply be a longer response time while the file is being recalled and reloaded into
internal SONAS storage. You may also initiate recalls proactively if desired.

6.2.2 Methods to back up a SONAS cluster


SONAS is a storage device that stores your file data, so it is important to develop an
appropriate file data protection and backup plan to be able to recover data in case of disaster,
accidental deletion or data corruption.

We discuss how to back up the data contained in a SONAS cluster using either Tivoli Storage
Manager (TSM) or other ISV backup products; we do not discuss the backup of SONAS
configuration information here. SONAS cluster configuration information is stored on the
management node in multiple repositories. SONAS offers the backupmanagementnode
command to back up SONAS cluster configuration information. The use of this command is
described in 6.5, “Disaster recovery methods”.

SONAS clusters are preloaded with the TSM client to back up file systems. The SONAS TSM
client requires an external, customer-supplied and licensed TSM server.

6.2.3 TSM client and server considerations


The TSM client integrated into SONAS is at version 6.1, and this client version is compatible
with TSM servers at versions 5.5, 6.1 and 6.2. The TSM client runs on the SONAS interface
nodes. Each interface node can open up to eight sessions to the TSM server, and multiple
interface nodes can initiate proportionally more sessions to the TSM server; for example,
10 interface nodes could initiate up to 80 TSM sessions. We suggest setting the TSM server
maxsessions parameter to a value of 100 for SONAS. If the TSM server cannot handle such
a large number of sessions it may be necessary to reduce the number of interface nodes
involved in a backup, because server sessions that hang or are disconnected may result in
incomplete or failed backups.

Note: As each node can start up to eight parallel sessions, the maxnummp parameter for each
TSM client node should be set to eight. This means that a TSM client node can initiate up to
eight mount requests for TSM sequential media on the server.
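
As a sketch of how these settings might be applied from a TSM administrative command line
(the node name reuses the interface node client definition shown in 6.2.4, and the option
names should be verified against the documentation for your TSM server level):

setopt MaxSessions 100
update node int1st2node maxnummp=8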

SONAS currently supports only LAN backup through the preinstalled TSM backup/archive
client running on the interface nodes; LAN-free backup is not supported or implemented. TSM
uses the backup component; the archiving component is


not used. All backup and restore operations are executed using the SONAS CLI commands;
native TSM commands are not supported. The TSM client is configured to retry the backup of
open files and to continue without backing up the file after a set number of retries. The TSM
backup path length is limited to 1024 characters, including both the file name and directory
path. File names must not contain the following characters: " or ' or linefeed (0x0A). Databases
should be shut down or frozen before a backup occurs to put them into a consistent state.
Backup jobs run serially, that is, only one backup job for one filesystem can run at any point
in time.

TSM database sizing


The TSM server and the TSM server database must be sized appropriately based on the
number of files that will be backed up. Each file that is backed up is an entry in the TSM
database, and each file entry in the TSM database uses between 400 and 600 bytes, or
around 0.5 KB, so we can give a rough estimate of the size of the database by multiplying the
number of files by the average file entry size; for example, a total of 200 million files will
consume around 100 GB of TSM database space. As of TSM 6.1 the maximum
recommended size for one TSM database is 1000 GB. When very large numbers of files
need to be backed up you may need to deploy multiple TSM servers. The smallest SONAS
unit that can be handled by a TSM server is a file system, which means that only one TSM
server can back up and restore files for a given filesystem. When you have n filesystems you
can have between 1 and n TSM servers.
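
As a quick check of the estimate above, assuming roughly 500 bytes per backed-up file entry,
the calculation can be reproduced with simple shell arithmetic:

# 200 million files at about 500 bytes per TSM database entry is roughly 100 GB
echo $((200000000 * 500 / 1000000000)) GB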

6.2.4 Configuring interface nodes for TSM


You must set up the interface nodes to work with TSM before you can configure and perform
backup and restore operations. Before starting the configuration the following information is
required:
򐂰 The TSM server name, IP address and port for the TSM servers to be configured
򐂰 The host names of the interface nodes that will run the backups

The following procedure also assumes that TSM server configuration elements such as the
policy domain, management class and storage pools have been set up beforehand. For
additional information on TSM configuration refer to the IBM Tivoli Storage Manager
Implementation Guide, SG24-5416, which can be downloaded at:

http://www.redbooks.ibm.com/abstracts/sg245416.html?Open

Set up the SONAS client definitions on the TSM servers; you must execute these steps on all
the TSM servers:
1. Connect to the first TSM server to be configured as a TSM administrator with the TSM
command line interface (CLI) client by running the dsmadmc command on a system with
the TSM administrative interface installed.
2. Register a virtual node name for the SONAS cluster. You can choose any name you like,
providing it is not already registered to TSM. You might choose the SONAS cluster name,
for example sonas1 with password sonas1secret, and register the node to a TSM domain
called sonasdomain. Use the TSM register node command as follows:
register node sonas1 sonas1secret domain=sonasdomain
3. Register one TSM client node for each SONAS interface node that will run the TSM client.
Assuming we have the following three interface nodes: int1st2, int2st2 and int3st2, we
register a separate TSM node and password for each one using the TSM register node
command as follows:
register node int1st2node int1st2pswd domain=sonasdomain


register node int2st2node int2st2pswd domain=sonasdomain


register node int3st2node int3st2pswd domain=sonasdomain
4. Grant all the TSM client nodes representing the interface nodes proxy access to the TSM
virtual node representing the SONAS cluster using the TSM grant proxynode
administrator command. Assuming we have the following three interface node TSM
clients: int1st2node, int2st2node and int3st2node, and that the cluster is called sonas1, we
run the following TSM administrator command:
grant proxynode target=sonas1 agent=int1st2node,int2st2node,int3st2node
5. Now we create a TSM server stanza, an entry in the TSM configuration file, on all the
SONAS interface nodes. Assume the TSM server is called tsmsrv1, is reachable at
tsmsrv1.com on port 1500, and that we have the following three interface nodes to
configure for backup: int1st2, int2st2 and int3st2.
6. Connect to node int1st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int1st2node sonas1 int1st2 int1st2pswd
7. Connect to node int2st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int2st2node sonas1 int2st2 int2st2pswd
8. Connect to node int3st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int3st2node sonas1 int3st2 int3st2pswd
9. Repeat steps from 1 to 8 for all the TSM servers you wish to configure.

Now the TSM servers will be configured on all interface nodes. You can verify this by issuing
the SONAS lstsmnode command without arguments to see all TSM stanza information on all
interface nodes.

6.2.5 Performing TSM backup and restore operations


A prerequisite to performing backup and restore operations is that all SONAS interface nodes
have been configured to connect to the TSM servers as outlined in 6.2.4, “Configuring
interface nodes for TSM”. We now configure individual file systems for backup to a specific
TSM server using the cfgbackupfs command, which defines what filesystem to back up, where
to back it up to, and where to run the backup operation. This command does not perform the
actual backup operation; it just configures the SONAS cluster for backing up a specific
filesystem to a TSM server. For example, to back up filesystem gpfsjt to TSM server tsmsrv1
and execute the backup operation on the two nodes int1st2 and int2st2, issue the following
command from the SONAS CLI:
cfgbackupfs gpfsjt tsmsrv1 int1st2,int2st2

The possible interface nodes were supplied when configuring the TSM server stanza with the
cfgtsmnode SONAS command as shown in 6.2.4, “Configuring interface nodes for TSM” on
page 183. More than one interface node can be specified in a comma-separated list,
providing each node has already been defined with the cfgtsmnode command. You can use
the lsbackupfs command to list the configured backups as shown in Example 6-1:

Example 6-1 List filesystem backup configuration and status


# lsbackupfs
File system TSM serv List of nodes Status Start time End time
gpfsjt tsmsrv1 int1st2,int2st2 NOT_STARTED N/A N/A

Now that the backup is fully configured we can run our first backup operation using the
SONAS startbackup CLI command. The command accepts a list of one or more filesystems,


and specifying no arguments makes the command back up all filesystems with configured
backup destinations. For example, to start backing up the file system gpfsjt, issue:
startbackup gpfsjt

The command starts backup execution as a background operation and returns control to the
caller. You will have to monitor the status and completion of the backup operation for the
specific filesystem using the lsbackup SONAS command, as shown in Example 6-2:

Example 6-2 lsbackup command output


# lsbackup gpfsjt
Filesystem Date Message
gpfsjt 20.01.2010 02:00:00 G0300IEFSSG0300I The filesys gpfsjt backup started.
gpfsjt 19.01.2010 12:30:52 G0702IEFSSG0702I The filesys gpfsjt backup was done
successfully.
gpfsjt 18.01.2010 02:00:00 G0300IEFSSG0300I The filesys gpfsjt backup started.

You can also list the TSM server and backup interface node associations and the status of
the latest backup, and validate the backup configuration, by using the lsbackupfs -validate
SONAS command, as shown in Example 6-3 on page 185:

Example 6-3 Listing backup configuration and status


# lsbackupfs -validate
File system TSM server List of nodes Status Start time
gpfsjt tsmsrv1 int1st2,int2st2 COMPLETED_SUCCESSFULLY 1/21/10 04:26
(.. continuation of lines above ..)
.. End time Message Validation Last update
.. 1/21/10 04:27 INFO: backup ok (rc=0). Node is OK,Node is OK 1/21/10 04:27

TSM backups can be scheduled using the CLI or GUI through the scheduled task template
called StartBackupTSM. To schedule a backup of all SONAS filesystems at 4:15 AM, use
mktask as shown below:
mktask StartBackupTSM --parameter ”sonas02.virtual.com” --minute 15 --hour 4

Files backed up to TSM can be restored using the startrestore SONAS CLI command. The
startrestore command takes a file name or pattern as an argument, so you need to know the
names of the files or directories to restore; you can also specify a restore date and time.
Specifying no date and time filters returns the most recent backup data. The files will be
restored to the original location, or to a different location if desired, and you can choose
whether to replace the original files. An example of the restore command with the replace
option is:
startrestore "/ibm/gpfsjt/dirjt/*" -R

The lsbackupfs command will show whether a restore is currently running by displaying
RESTORE_RUNNING in the message field.

6.2.6 Using TSM HSM client


SONAS offers a Hierarchical Storage Management (HSM) integration to send data to
external storage devices managed by TSM. The TSM HSM clients run on the SONAS
interface nodes and use the Ethernet connections of the interface nodes to connect to the
external, customer-provided TSM server. The primary goal of the HSM support is to provide a
high performance HSM link between a SONAS subsystem and an external tape subsystem.
SONAS HSM support requires the following:


򐂰 One or more external TSM servers must be provided and the servers must be accessible
through the external ethernet connections on the interface nodes
򐂰 The SONAS cfgtsmnode command must be run to configure the TSM environment
򐂰 SONAS GPFS policies drive migration so TSM HSM automigration needs to be disabled

Every interface node has a TSM HSM client installed alongside the standard TSM
backup/archive client. An external TSM server is attached to the interface node through the
interface node Ethernet connections. The TSM HSM client supports the SONAS GPFS
filesystem through the use of the Data Management API (DMAPI).

Before configuring HSM for a filesystem you must complete the TSM initial setup using the
cfgtsmnode command as illustrated in 6.2.4, “Configuring interface nodes for TSM”. SONAS
HSM uses the same TSM server that was configured for the SONAS TSM backup client;
using the same server allows TSM to clone data between the TSM server backup storage
pools and HSM storage pools.

With the SONAS TSM client, one TSM server stanza is provided for each GPFS filesystem.
Therefore, one GPFS filesystem can be connected to one single TSM server. Multiple GPFS
filesystems may use either the same or different TSM servers. Multiple TSM servers may be
needed when you have a large number of files in a filesystem.

Note: At the time of writing you cannot remove SONAS HSM without help from IBM.

The SONAS HSM client must be configured to run on all the interface nodes in the SONAS
cluster, because migrated files can be accessed from any node and so the TSM HSM client
needs to be active on all the nodes. All SONAS HSM configuration commands are run using
the SONAS CLI and not the GUI.

To configure SONAS HSM, use the cfghsmnode command, which validates the connection to
the provided TSM server, sets up HSM parameters, and registers the migration callback. The
command is invoked as follows:
cfghsmnode <TSMserver_alias> <intNode1,intNode2,...,intNodeN> [ -c <clusterId |
clusterName> ]

where <TSMserver_alias> is the name of the TSM server set up by the backup/archive client,
<intNode1,intNode2,...> is the list of interface nodes that will run HSM to the attached TSM
server, and <clusterId> or <clusterName> is the cluster identifier.

You then use the cfghsmfs SONAS command as follows:


cfghsmfs <TSMserv> <filesystem> [-P pool] [-T(TIER/PEER)] [-N <ifnodelist>] [-S
stubsize]

where <TSMserv> is the name of the TSM server set up with the cfgtsmnode command,
<filesystem> is the name of the SONAS filesystem to be managed by HSM, <pool> is the
name of the user pool, TIER/PEER specifies whether the system pool and the specified user
pool are set up as TIERed or PEERed, <ifnodelist> is the list of interface nodes that will
interface with the TSM server for this filesystem, and <stubsize> is the HSM stub file size in
bytes.

For debugging purposes there are two commands that can be used: lshsmlog shows the
HSM error log output (/var/log/dsmerror.log) and lshsmstatus shows the current HSM status.


SONAS HSM concepts


Using SONAS Hierarchical Storage Manager (HSM), new and most frequently used files
remain on your local file systems, while those you use less often are automatically migrated
to storage media managed by an external TSM server. Migrated files still appear local and
are transparently migrated to and retrieved from the TSM server. Files can also be prioritized
for migration according to their size and/or the number of days since they were last accessed,
which allows users to maximize local disk space. Enabling space management for a file
system can provide the following benefits:
򐂰 Extends local disk space by utilizing storage on the TSM Server
򐂰 Takes advantage of lower-cost storage resources that are available in your network
environment
򐂰 Allows for automatic migration of old and/or large files to the TSM Server
򐂰 Helps to avoid out-of-disk space conditions on client file systems

To migrate a file, HSM sends a copy of the file to a TSM server and replaces the original file
with a stub file on the local file system. A stub file is a small file that contains the information
required to locate and recall a migrated file from the TSM Server. It also makes it appear as
though the file still resides on your local file system. Similar to backups and archives,
migrating a file does not change the access time (atime) or permissions for that file.

SONAS storage management policies control and automate the migration of files between
storage pools and external storage.

A feature of automatic migration is the premigration of eligible files. When file system
utilization exceeds the defined high threshold, the HSM client detects this condition and
begins to automatically migrate eligible files to the TSM server. This migration process
continues to migrate files until the file system utilization falls below the defined low threshold
value. At that point, the HSM client begins to premigrate files. To premigrate a file, HSM
copies the file to TSM storage and leaves the original file intact on the local file system (that
is, no stub file is created). An identical copy of the file resides both on the local file system
and in TSM storage. The next time migration starts for this file system, HSM can quickly
change premigrated files to migrated files without having to spend time copying the files to
TSM storage. HSM verifies that the files have not changed since they were premigrated and
replaces the copies of the files on the local file system with stub files. When automatic
migration is performed, premigrated files are processed before resident files, as this allows
space to be freed in the file system more quickly.

A file managed by HSM can be in multiple states:


Resident A resident file resides on the local file system. For example, a newly
created file is a resident file.
Migrated A migrated file is a file that has been copied from the local file system
to TSM storage and replaced with a stub file.
Premigrated A premigrated file is a file that has been copied from the local file
system to TSM storage but has not been replaced with a stub file. An
identical copy of the file resides both on the local file system and in
TSM storage. A file can be in the premigrated state after premigration.
If a file is recalled but not modified, it will also be in the premigrated
state.

To return a migrated file to your workstation, access the file in the same way as you would
access a file that resides on your local file system. The HSM recall daemon automatically
recalls the migrated file from Tivoli Storage Manager storage. This process is referred to as
transparent recall.


6.3 Snapshots
SONAS offers filesystem level snapshots that allow you to create a point in time copy of all
the user data in a filesystem. System data and currently existing snapshots are not copied
with the snapshot operation. The snapshot function allows other programs, such as backups,
to run concurrently with user updates and still obtain a consistent copy of the file system at
the time the snapshot copy was created. Snapshots also provide an online backup capability
that allows easy recovery from common problems such as accidental deletion of a file, and
comparison with older versions of a file.

One SONAS cluster supports a maximum of 256 snapshots for each filesystem. When you
exceed the 256 snapshot limit you will not be able to create new snapshots and will receive
an error until you remove one or more existing snapshots.

The SONAS snapshots are space efficient because they only keep a copy of data blocks that
have subsequently been changed or have been deleted from the filesystem after the
snapshot has been taken.

6.3.1 Snapshot considerations


Because snapshots are not copies of the entire file system, they should not be used as
protection against media failure.

A snapshot file is independent from the original file because it only contains the user data and
user attributes of the original file. For Data Management API (DMAPI) managed file systems,
the snapshot will not be DMAPI managed, regardless of the DMAPI attributes of the original
file, because the DMAPI attributes are not inherited by the snapshot. For example, consider a
base file that is a stub file because its contents have been migrated by TSM HSM to offline
media: the snapshot copy of the file will not be managed by DMAPI because it has not
inherited any DMAPI attributes, and consequently referencing a snapshot copy of a TSM HSM
managed file will not cause TSM to initiate a file recall.

6.3.2 VSS snapshot integration


Snapshots can be integrated into a Microsoft Windows environment using the Windows
Volume Shadow Copy Service (VSS). For seamless SONAS integration with VSS the
following snapshot naming convention must be followed: @GMT-yyyy.MM.dd-HH.mm.ss,
where the letter groupings indicate a unique date and time. Snapshots created using the CLI
automatically adhere to this naming convention. Snapshots that are created with this name
will be visible in the “Previous Versions” window of Windows Explorer, as illustrated in
Figure 6-5 on page 189. Note that Windows displays the date and time based on the user's
date and time settings.


Figure 6-5 Example Windows Explorer folder previous versions tab

6.3.3 Snapshot creation and management


We will show how to create and manage SONAS snapshots using both the command line
and the GUI. SONAS snapshot commands create a snapshot of the entire file system at a
specific point in time. Snapshots appear in a hidden subdirectory of the root directory called
.snapshots.
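
For example, assuming the gpfsjt file system is mounted under /ibm/gpfsjt (the path used in
the restore example earlier in this chapter), its snapshots can be listed from a node that has
access to the file system; the snapshot names shown are illustrative:

ls /ibm/gpfsjt/.snapshots
@GMT-2010.04.08-20.52.41  @GMT-2010.04.09-00.32.43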

Create snapshots from the GUI


To create a snapshot of a sample filesystem called gpfsjt through the SONAS GUI proceed
as follows:
1. Log in to the SONAS management GUI
2. Select Files → Snapshots
3. Select the active cluster and the filesystem you wish to snapshot as shown in Figure 6-6
on page 190:


Figure 6-6 Select cluster and filesystem for snapshot

4. Click on the create new snapshot button.


5. You will be prompted for a name for the new snapshot; accept the default name if you
wish for the snapshot to be integrated with Windows VSS previous versions and click ok to
proceed.
6. You will see a task progress indicator window as shown in Figure 6-7. You can monitor
task progression using this window:

Figure 6-7 Snapshot task progress indicator

7. You can close the task progress window by clicking the close button.
8. You will now be presented with the list of available snapshots as shown in Figure 6-8 on
page 191.


Figure 6-8 List of completed snapshot

Creating and listing snapshots from the CLI


You can create snapshots from the SONAS CLI command line using the mksnapshot
command, as shown in Figure 6-9 on page 191:

[SONAS]$ mksnapshot gpfsjt


EFSSG0019I The snapshot @GMT-2010.04.09-00.32.43 has been successfully created.

Figure 6-9 Create a new snapshot

To list all snapshots from all filesystems you can use the lssnapshot command as shown in
Figure 6-10. The command retrieves data regarding the snapshots of a managed cluster from
the database and returns a list of snapshots:

[SONAS]$ lssnapshot
Cluster ID Device name Path Status Creation Used (metadata) Used (data) ID Timestamp
72..77 gpfsjt @GMT-2010.04.09-00.32.43 Valid 09.04.2010 02:32:43.000 16 0 5 20100409023246
72..77 gpfsjt @GMT-2010.04.08-23.58.37 Valid 09.04.2010 01:59:06.000 16 0 4 20100409023246
72..77 gpfsjt @GMT-2010.04.08-20.52.41 Valid 08.04.2010 22:52:56.000 64 1 1 20100409023246

Figure 6-10 List all snapshots for all filesystems

Note that the ID Timestamp field is the same for all snapshots; it indicates the timestamp of
the last SONAS database refresh. The lssnapshot command with the -r option forces a
refresh of the snapshot data in the SONAS database by scanning all cluster snapshots
before retrieving the data for the list from the database.

Removing snapshots
Snapshots can be removed using the rmsnapshot command or from the GUI. For example, to
remove a snapshot for filesystem gpfsjt using the command line, proceed as shown in
Figure 6-11, which shows the following steps:
򐂰 Issue the lssnapshot command for filesystem gpfsjt and choose a snapshot to remove by
choosing that snapshot’s name, for example @GMT-2010.04.08-23.58.37
򐂰 Issue the rmsnapshot command with the name of the filesystem and the name of the
snapshot
򐂰 To verify if the snapshot has been removed issue the lssnapshot command again and
check that the removed snapshot is no longer present


[SONAS]$ lssnapshot -d gpfsjt


ClusID Devname Path Status Creation Used (metadata) Used (data) ...
72..77 gpfsjt @GMT-2010.04.09-00.32.43 Valid 09.04.2010 02:32:43.000 16 0 ...
72..77 gpfsjt @GMT-2010.04.08-23.58.37 Valid 09.04.2010 01:59:06.000 16 0 ...
72..77 gpfsjt @GMT-2010.04.08-20.52.41 Valid 08.04.2010 22:52:56.000 64 1 ...

[SONAS]$ rmsnapshot gpfsjt @GMT-2010.04.08-23.58.37

[SONAS]$ lssnapshot -d gpfsjt


ClusID DevName Path Status Creation Used (metadata) Used (data) ...
72..77 gpfsjt @GMT-2010.04.09-00.32.43 Valid 09.04.2010 02:32:43.000 16 0 ...
72..77 gpfsjt @GMT-2010.04.08-20.52.41 Valid 08.04.2010 22:52:56.000 64 1 ...

Figure 6-11 Removing snapshots

Scheduling snapshots at regular intervals


To automate the task of creating snapshots at regular intervals you can create a repeating
SONAS task based on the snapshot task template called MkSnapshotCron. For example, to
schedule a snapshot every 5 minutes on filesystem gpfsjt, issue the command shown in
Figure 6-12:

[SONAS]$ mktask MkSnapshotCron --parameter "sonas02.virtual.com gpfsjt" --minute */5


EFSSG0019I The task MkSnapshotCron has been successfully created.

Figure 6-12 Create a task to schedule snapshots

Note that to create scheduled cron tasks you must issue the mktask command from the CLI;
it is not possible to create cron tasks from the GUI. To list the snapshot task that you have
created you can use the lstask command as shown in Figure 6-13.

[SONAS]$ lstask -t cron


Name Description Status Last run Runs on Schedule
MkSnapshotCron This is a cronjob for scheduled snapshots. NONE N/A Mgmt node Runs at every 5th minute.

Figure 6-13 List scheduled tasks

To verify that snapshots are being created correctly you can use the lssnapshot command as
shown in Figure 6-14.

[SONAS]$ lssnapshot
Cluster ID Device name Path Status Creation Used (metadata) Used (data) ID
72..77 gpfsjt @GMT-2010.04.09-03.15.06 Valid 09.04.2010 05:15:08.000 16 0 9
72..77 gpfsjt @GMT-2010.04.09-03.10.08 Valid 09.04.2010 05:10:11.000 16 0 8
72..77 gpfsjt @GMT-2010.04.09-03.05.03 Valid 09.04.2010 05:05:07.000 16 0 7
72..77 gpfsjt @GMT-2010.04.09-03.00.06 Valid 09.04.2010 05:00:07.000 16 0 6
72..77 gpfsjt @GMT-2010.04.09-00.32.43 Valid 09.04.2010 02:32:43.000 16 0 5
72..77 gpfsjt @GMT-2010.04.08-20.52.41 Valid 08.04.2010 22:52:56.000 64 1 1

Figure 6-14 List snapshots

Microsoft Windows Viewing previous versions


Snapshots created with a name following the convention @GMT-yyyy.MM.dd-HH.mm.ss will
be visible in the “Previous Versions” window of Windows Explorer, as illustrated in


Figure 6-15 on page 193. The snapshots are only visible at the export level. To see the
previous versions for an export follow these steps:
1. Open a Windows Explorer window to see the share for which you want previous versions
displayed; \\10.0.0.21 in our example is the server and sonas21jt is our share
2. Click the sonas21jt share name with the right mouse button to bring up the sonas21jt share
properties window as shown in step (1)
3. Double-click with the mouse to select a timestamp for which you wish to see the previous
versions, Today, April 09, 2010, 12:15 PM, as shown in step (2)
4. You are now presented with a screen (3) showing the previous versions of files and
directories contained in the sonas21jt folder


Figure 6-15 Microsoft Windows - viewing previous versions

6.4 Local and remote replication


Data replication functions create a second copy of the file data and are used to offer a certain
level of protection against data unavailability. Replication generally offers protection against
component unavailability, such as a missing storage device or storage pod, but does not offer
protection against logical file data corruption. When we replicate data we usually want to send
it some distance as a protection against hardware failure or a disaster event that makes the
primary copy of data unavailable; in the case of disaster protection we usually talk about
sending data to a remote site at a reasonable distance from the primary site.

6.4.1 Synchronous versus asynchronous replication


Data replication can occur in two different ways depending on when the acknowledgement to
the writing application is returned: it can be synchronous or asynchronous. With synchronous
replication both copies of the data are written to their respective storage repositories before
an acknowledgement is returned to the writing application. With asynchronous replication one
copy of the data is written to the primary storage repository, then an acknowledgement is
returned to the writing application, and only subsequently is the data written to the
secondary storage repository. Asynchronous replication can be further broken down into


continuous or periodic replication depending on the frequency with which batches of updates
are sent to the secondary storage. The replication taxonomy is illustrated in Figure 6-16 on
page 194.

Figure 6-16 Replication types

Asynchronous replication is normally used when the additional latency due to distance
becomes problematic because it causes an unacceptable increase in response times for the
primary application.

6.4.2 Block level versus file level replication


Replication can occur at different levels of granularity: it can be block level, when we replicate
a disk or LUN, or file level, when we replicate files or some portion of a file system such as a
directory or a fileset.

File level replication can then be either stateless or stateful. Stateless file replication occurs
when we replicate a file to a remote site and then lose track of it, whereas stateful replication
tracks and coordinates updates made to the local and remote files so as to maintain the two
copies of the file in sync.

6.4.3 SONAS cluster replication


Replication can occur inside one single SONAS cluster or between a local SONAS cluster
and a remote SONAS cluster. The term intracluster replication refers to replication between
storage pods in the same SONAS cluster, whereas intercluster replication occurs between
one SONAS cluster and a remote destination that can be a separate SONAS cluster or a file
server. With intracluster replication the application does not need to be aware of the location
of the file and failover is transparent to the application itself, whereas with intercluster
replication the application needs to be aware of the file's location and will need to connect to
the new location to access the file. Figure 6-17 on page 195 shows two SONAS clusters with
file1 replicated using intracluster replication and file2 replicated with intercluster replication.


Figure 6-17 Replication options

Table 6-1 on page 195 shows the possible SONAS replication scenarios.

Table 6-1 SONAS replication solutions

Type          Intracluster or intercluster   Stateful or stateless   Local or remote distance
synchronous   intracluster                   stateful                local
asynchronous  intercluster                   stateless               remote

6.4.4 Local synchronous replication


Local synchronous replication is implemented within a single SONAS cluster, so it is defined
as intracluster replication. Synchronous replication is protection against total loss of a whole
storage building block or storage pod, and it is implemented by writing all data blocks to two
storage building blocks that are part of two separate failure groups. Synchronous replication
is implemented using separate GPFS failure groups. Currently synchronous replication
applies to an entire filesystem and not to individual filesets.

Since the writes are acknowledged to the application only when both writes have been
completed, write performance is dictated by the slower storage building block. High latencies
will degrade performance, and therefore this is a short distance replication mechanism.
Synchronous replication requires an InfiniBand connection between both sites, and an
increase in distance will decrease performance.

Another use case is protection against total loss of a complete site. In this scenario a
complete SONAS cluster (including interface and storage nodes) is split across two sites. The
data is replicated between both sites, so that every block is written to a building block on both
sites. For proper operation the administrator has to define correct failure groups. For the
two-site scenario we need one failure group for each site. As of release 1.1 this use case is
not completely applicable because all InfiniBand switches reside in the same rack, and
unavailability of this rack will stop SONAS cluster communications.

Synchronous replication does not distinguish between the two storage copies; they are
both peers. SONAS does not have a preferred failure group concept where it sends all
reads; reads are sent to disks in both failure groups.


Synchronous replication in the SONAS filesystem offers the following replication choices:
򐂰 No replication at all
򐂰 Replication of metadata only
򐂰 Replication of data and metadata

It is recommended that metadata replication always be used for file systems within a SONAS
cluster. Synchronous replication can be established at file system creation time or later, when
the filesystem already contains data. Depending on when replication is applied, different
procedures must be followed to enable synchronous replication. Synchronous replication
requires that the disks belong to two distinct failure groups so as to ensure that the data and
metadata are not replicated to the same physical disks. It is recommended that the different
failure groups be defined on different storage enclosures and storage controllers to guarantee
the possibility of failover in the case that a physical disk component becomes unavailable.

Synchronous replication has the following prerequisites:


򐂰 Two separate failure groups must be present
򐂰 The two failure groups must have the same number of disks
򐂰 The same number of disks from each failure group, and the same disk usage type, must be
assigned to the filesystem.

Establishing synchronous replication at filesystem creation


Synchronous replication across failure groups can be established as an option at filesystem
creation time, using either the GUI or the mkfs CLI command with the -R option (a hedged
example follows the list below). The -R option sets the level of replication used in this file
system and can be one of:
򐂰 none, which means no replication at all
򐂰 meta, which indicates the file system metadata is synchronously mirrored
򐂰 all, which indicates the file system data and metadata are synchronously mirrored
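
For illustration only, a hedged sketch of such a command might look like the following, where
the filesystem name gpfsjt is a placeholder and any further required arguments (mount point,
disk selection, block size, and so on) are omitted because their syntax is not covered in this
section; consult the SONAS CLI reference for the complete mkfs syntax:

[SONAS]$ mkfs gpfsjt -R all

This would create the filesystem with both data and metadata synchronously mirrored across
the two failure groups; specifying -R meta instead would mirror only the metadata.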

Establishing synchronous replication after filesystem creation


Establishing synchronous replication after file system creation cannot be done using the GUI;
it requires the CLI interface. To enable synchronous replication the following two steps
must be carried out:
򐂰 Enable synchronous replication with the change filesystem chfs command, specifying
the -R option
򐂰 Redistribute the filesystem data and metadata using the restripefs command

The following section will show how to enable synchronous replication on an existing
filesystem called gpfsjt:
򐂰 We use lsdisk to see the available disks and lsfs to see the filesystems as shown in
Figure 6-18:


[SONAS]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs2nsd gpfs0 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs3nsd gpfsjt 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs4nsd 1 dataAndMetadata userpool ready 4/13/10 1:55 AM
gpfs5nsd 1 dataAndMetadata system ready 4/13/10 1:55 AM
gpfs6nsd 2 dataAndMetadata userpool ready 4/13/10 1:55 AM

[SONAS]$ lsfs
Cluster Device Mountpoint .. Data replicas Metadata replicas Replication policy Dmapi
sonas02 gpfs0 /ibm/gpfs0 .. 1 1 whenpossible F
sonas02 gpfsjt /ibm/gpfsjt .. 1 1 whenpossible T

Figure 6-18 Disks and filesystem before replication

򐂰 Using the example in Figure 6-18 on page 197, we verify the number of disks currently
assigned to the gpfsjt filesystem in the lsdisk output and see there is only one disk used,
called gpfs3nsd. To create the synchronous replica we need the same number of disks as
the number of disks currently assigned to the filesystem. From the lsdisk output we also
verify that there is a sufficient number of free disks that are not assigned to any
filesystem. We will use the disk called gpfs5nsd to create the data replica.
򐂰 The disk called gpfs5nsd is currently in failure group 1, the same as the primary disk; we must
assign the disk to a separate failure group 2, using the chdisk command as shown in Figure 6-19,
and then verify the disk status with lsdisk. Also verify that the new disk, gpfs5nsd, is in
the same pool as the current disk gpfs3nsd:

[SONAS]$ chdisk gpfs5nsd --failuregroup 2


EFSSG0122I The disk(s) are changed successfully!

[SONAS]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs2nsd gpfs0 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs3nsd gpfsjt 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs4nsd 1 dataAndMetadata userpool ready 4/13/10 2:15 AM
gpfs5nsd 2 dataAndMetadata system ready 4/13/10 2:15 AM
gpfs6nsd 2 dataAndMetadata userpool ready 4/13/10 2:15 AM

Figure 6-19 Assign a new failure group to a disk

򐂰 At this point we add the new disk to file system gpfsjt using the chfs -add command as
illustrated in Figure 6-20 and verify the outcome using the lsdisk command:


[SONAS]$ chfs gpfsjt -add gpfs5nsd


The following disks of gpfsjt will be formatted on node mgmt001st002.virtual.com:
gpfs5nsd: size 1048576 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
52 % complete on Tue Apr 13 02:22:03 2010
100 % complete on Tue Apr 13 02:22:05 2010
Completed adding disks to file system gpfsjt.
mmadddisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
EFSSG0020I The filesystem gpfsjt has been successfully changed.

[SONAS]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs2nsd gpfs0 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs3nsd gpfsjt 1 dataAndMetadata system ready up 4/12/10 3:03 AM
gpfs5nsd gpfsjt 2 dataAndMetadata system ready up 4/13/10 2:26 AM
gpfs4nsd 1 dataAndMetadata userpool ready 4/13/10 2:26 AM
gpfs6nsd 2 dataAndMetadata userpool ready 4/13/10 2:26 AM

[SONAS]$ lsfs
Cluster Device Mountpoint .. Data replicas Metadata replicas Replication policy Dmapi
sonas02 gpfs0 /ibm/gpfs0 .. 1 1 whenpossible F
sonas02 gpfsjt /ibm/gpfsjt .. 1 1 whenpossible T

Figure 6-20 Add a disk to a filesystem

򐂰 From the lsdisk output we can see that gpfs5nsd is assigned to filesystem gpfsjt and from
the lsfs output we notice that we still only have one copy of data and metadata as shown
in the Data replicas and Metadata replicas columns. To activate data and metadata
replication we need to execute the chfs -R command as shown in Figure 6-21:

[SONAS]$ chfs gpfsjt -R all


EFSSG0020I The filesystem gpfsjt has been successfully changed.

[SONAS]$ lsfs
Cluster Device Mountpoint .. Data replicas Metadata replicas Replication policy Dmapi
sonas02 gpfs0 /ibm/gpfs0 .. 1 1 whenpossible F
sonas02 gpfsjt /ibm/gpfsjt .. 2 2 whenpossible T

Figure 6-21 Activate data replication

򐂰 The lsfs command now shows that there are two copies of the data in the gpfsjt filesystem.
򐂰 Now perform the restripefs command with the replication switch to redistribute data and
metadata as shown in Figure 6-22 on page 199:


[SONAS]$ restripefs gpfsjt --replication


Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
64 % complete on Thu Apr 15 23:11:00 2010
85 % complete on Thu Apr 15 23:11:06 2010
100 % complete on Thu Apr 15 23:11:09 2010
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
EFSSG0043I Restriping of filesystem gpfsjt completed successfully.
[root@sonas02.mgmt001st002 dirjt]#
Figure 6-22 Restripefs to activate replication

SONAS does not offer any command to verify that the file data is actually being replicated.
To verify the replication status, connect to SONAS as the root user and issue the mmlsattr
command with the -L switch as illustrated in Figure 6-23. The report shows the metadata and
data replication status, and we can see that we have two copies of both metadata and data.

[root@sonas02.mgmt001st002 userpool]# mmlsattr -L *


file name: f1.txt
metadata replication: 2 max 2
data replication: 2 max 2
immutable: no
flags:
storage pool name: system
fileset name: root
snapshot name:

file name: f21.txt


metadata replication: 2 max 2
data replication: 2 max 2
immutable: no
flags:
storage pool name: userpool
fileset name: root
snapshot name:
Figure 6-23 Verify that file data is replicated

Filesystem synchronous replication can also be disabled using the chfs command, as shown in
the following example:
chfs gpfsjt -R none

After changing the filesystem attributes, the restripefs command needs to be issued to
remove the replicas of the data, as shown in the following example:
restripefs gpfsjt --replication


6.4.5 Remote async replication


The ability to continue operations in the face of a regional disaster is handled through the
async replication mechanism of the SONAS appliance. Async replication allows one or
more file systems within a SONAS file name space to be defined for replication to another
SONAS system over the customer network infrastructure. As the name async implies, files
created, modified or deleted at the primary location are propagated to the remote system
some time after the change of the file in the primary system.

The async replication process looks for changed files in a defined file system of the source
SONAS since the last replication cycle was started against it, and uses the rsync tool to
efficiently move only the changed portions of a file from one location to the other. In addition to
the file contents, all extended attribute information about the file is also replicated to the
remote system.

Async replication is defined in a single direction, such that one site is considered the source
of the data and the other is the target, as illustrated in Figure 6-24. The replica of the file
system at the remote location should be used in read-only mode until it needs to be made
usable in the event of a disaster.

Figure 6-24 Async replication source and target

The SONAS Interface nodes are defined as the elements for performing the replication
functions. When using async replication, the SONAS system detects the modified files from
the source system, and only moves the changed contents from each file to the remote
destination to create an exact replica. By only moving the changed portions of each modified
file, the network bandwidth is used very efficiently.

The file based movement allows the source and destination file trees to be of differing sizes
and configurations, as long as the destination file system is large enough to hold the contents
of the files from the source.

Async replication allows all or portions of the data of a SONAS system to be replicated
asynchronously to another SONAS system; in the event of an extended outage or loss of
the primary system, the data kept by the backup system is accessible in R/W by the
customer applications. Async replication also offers a mechanism to replicate the data back
to the primary site after the outage, or after a new system is restored. The backup system also
offers concurrent R/O access to the copy of the primary data for testing and validation of the
disaster recovery mirror. The data at the backup system can be accessed by all of the protocols
in use on the primary system. You can take a R/W snapshot of the replica, which can be used
to allow for full function disaster recovery testing against the customer's applications.
Typically, the R/W snapshot is deleted after the disaster recovery test has concluded.

File shares defined at the production site are not automatically carried forward to the
secondary site and must be manually redefined by the customer for the secondary location.
These shares should be defined as R/O until such time as production work needs to be done
against the remote system in full R/W, for example for business continuance in the face
of a disaster. Redefinition to R/W shares can be done by using the CLI or GUI.

The relationship between the primary and secondary site is 1:1: one primary and one
secondary site. The scope of an async replication relationship is a file system. Best
practices must be followed to ensure that the HSM systems are configured and
managed to avoid costly performance impacts during the async replication cycles; these
impacts can occur when a file has been migrated to offline storage before being replicated
and must be recalled from offline storage for replication to proceed.

User authentication and mapping requirements


Async replication requires coordination of the customer's Windows SID domain information
with the UID/GID mapping internal to the SONAS cluster, because the ID mapping from the
Windows domain to the UNIX UID/GID is not exchanged between the SONAS systems. As
the mappings are held external to the SONAS system, in LDAP, NIS, or AD with
Microsoft SFU, the external customer servers hold the mapping information and must provide
coordinated resolution between the primary and secondary sites.

Async replication will only be usable for installations that use LDAP, NIS, or AD with the SFU
extensions, note that standard AD, without SFU, will not be sufficient. The reason is that
async replication can only move the files and their attributes from one site to the next.
Therefore, the UID/GID information which GPFS maintains is carried forward to the
destination. However, Active Directory only supplies a SID (windows authentication ID), and
the CIFS server inside of the SONAS maintains a mapping table of this SID to the UID/GID
kept by GPFS. This CIFS server mapping table is not carried forward to the destination
SONAS.

Given this, when users attempt to talk to the SONAS at the remote site, they will not have a
mapping from their Active Directory SID to the UID/GID of the destination SONAS, and their
authentication will not work properly; for example, users may map to the wrong users' files.

LDAP, NIS and AD with SFU maintain the SID to UID/GID mapping external to the SONAS,
and therefore as long as their authentication mechanism is visible to the SONAS at the
source and the destination site they do not have a conflict with the users and groups.

The following assumptions are made for the environment supporting async replication:
򐂰 One of the following authentication mechanisms: either an LDAP or AD w/SFU
environment which is resolvable across their sites, or is mirrored/consistent across their
sites such that the SONAS at each site is able to authenticate from each location.
򐂰 The authentication mechanism is the same across both locations
򐂰 The time synchronization across both sites is sufficient to allow for successful
authentication

Async replication considerations


This section outlines some of the key considerations of async replication design and
operation which need to be highlighted and understood.


At release 1.1.1 we suggest you limit the files in a filesystem that uses async replication to
approximately 60 million files to limit scan time and avoid scalability issues.

Replication is done on a filesystem basis and filesets on the source SONAS cluster do not
retain the fileset information on the destination SONAS cluster. The file tree on the source is
replicated to the destination, but the fact that it is a fileset, or any quota information, is not
carried forward to the destination cluster's file tree.

The path to the source and destination locations given to the underlying cnreplicate CLI
command must not contain ':' '\' '\n' or any whitespace characters. The underlying paths within
the directory tree being replicated are allowed to have them.

Moving large amounts of data, such as the first async replication of a large existing file
system or the failback to an empty SONAS after a disaster, takes a large amount of time
and network bandwidth. Other means of restoring the data, such as a physical restore from
a backup, can be a preferred way of populating the destination cluster because they greatly
reduce the restore time and the burden on the network.

Disk I/O: the I/O performance is driven by GPFS and its ability to load balance across the
nodes participating in the file system. Async replication performance is driven by metadata
access for the scan part, and customer data access for the rsync movement of data. The
number and classes of disks for metadata and customer data are an important part of the
overall performance.

TSM HSM stub files are replicated as regular files, and an HSM recall is performed for each
file, or they can be omitted using the command line.

HSM considerations in an async replication environment


Async replication can co-exist with SONAS file systems being managed via the TSM HSM
software, which seamlessly moves files held within a SONAS file system to and from a
secondary storage media such as tape.

The key concept is that the TSM HSM client hooks into the GPFS file system within the
SONAS to replace a file stored within the SONAS with a stub file, which appears to the end
user as if the file still exists within the SONAS GPFS file system on disk, although it actually
has been moved to the secondary storage device. Upon access to the file, the TSM HSM
client suspends the GPFS request for data within the file until it retrieves the file from the
secondary storage device and replaces it back within the SONAS primary storage, at which
point the file can be accessed directly again by the end users through the SONAS.

The primary function of this is to allow the capacity of the primary storage to be less than
the actual amount of data it is holding, using the secondary (cheaper/slower) storage to retain
the overflow of data. The following list describes key implications of using the HSM functionality
with file systems being backed up for disaster recovery purposes with async replication:

Source and destination primary storage capacities


The primary storage on the source and destination SONAS systems should be reasonably
balanced in terms of capacity. Since HSM allows for the retention of more data than the primary
storage capacity and async replication is a file based replication, planning must be done to
ensure the destination SONAS system has enough storage to hold the entire contents of the
source data (both primary and secondary storage).

HSM management at destination


If the destination system uses HSM management of the SONAS storage, enough primary
storage should be provided at the destination to hold the change delta that is replicated into
its primary storage as part of the DR process. If the movement of the data from the
destination location's primary to secondary storage is not fast enough, the replication
process can outpace this movement, causing a performance bottleneck in completing the
disaster recovery cycle.

Therefore, the capacity of the destination system to move data to the secondary storage
should be configured to ensure that enough data has been pre-migrated to the secondary
storage before the next async replication cycle, so that the incoming data can be received
without waiting for movement to secondary storage. For example, enough TSM managed
tape drives, and enough media, need to be allocated and operational to ensure that enough
data can be moved from the primary storage to tape, so that enough space is available for
the next wave of replicated data.

Replication intervals with HSM at source location


Planning should be done to ensure that the frequency of the async replication is such that the
changed data at the source location is still in primary storage when the async process is
initiated. This requires balancing the source primary storage capacity, the change rate of the
data, and the frequency of the async replication scan intervals.

If changed data is moved from primary to secondary storage before the async process can
replicate it to the destination, the next replication cycle will need to recall it from the
secondary storage back to the primary in order to copy it to the destination. The number of
files which need to be recalled back into primary storage, and the duration to move them back
into primary storage, will directly impact the time which the async process needs in order to
finish replicating.

SONAS async replication configurations


For business continuance in a disaster, SONAS supports an asynchronous replication
between two SONAS systems in a 1:1 relationship. The SONAS systems are distinct from
one another, such that they are independent clusters with a non-shared InfiniBand
infrastructure, separate interface, storage and management nodes and so on. The
connectivity between the systems is via the customer network between the customer facing
network adapters in the interface nodes. The local and remote SONAS systems do not
require the same hardware configuration in terms of nodes or disks; only the space at the
secondary site needs to be large enough to contain the data replicated from the primary site.

The systems must be capable of routing network traffic between one another using the
customer supplied IP addresses or fully qualified domain names on the interface nodes.

Async replication in single direction


There are two primary disaster recovery topologies for a SONAS system. The first is where
the second site is a standby disaster recovery site, such that it maintains a copy of file
systems from the primary location only. It can be used for testing purposes, for continuing
production in a disaster, or for restoring the primary site after a disaster. Figure 6-25 on
page 204 illustrates the relationship between the primary and secondary sites for this
scenario:


Figure 6-25 Async replication with single active direction

Async replication in two active directions


The second scenario, shown in Figure 6-26 on page 204, is when the second site exports
shares of a file system in addition to holding mirrors of a file tree from the primary site. In this
scenario the SONAS at both sites is used for production I/O, in addition to being the
target mirror for the other SONAS system's file structure. This may be in both directions, such
that both SONAS systems have their own file trees in addition to having the file tree of
the other; or it may be that both have their own file tree, and only one has the mirror of the
other.

Figure 6-26 Bidirectional async replication and snapshots

Async replication configuration


The asynchronous replication code runs on the management and interface nodes. The
configuration of async replication must be coordinated between the destination SONAS
system and the source SONAS system. Asynchronous replication processes run on one or
more nodes in the cluster.

This is done through administration commands; you start on the destination SONAS system:
1. Define the source SONAS system to the destination SONAS


cfgrepl sourcecluster -target


Where sourcecluster is the hostname or IP address of the source cluster's Management
Node.
2. Define the file tree target on the destination SONAS to hold the source SONAS file tree.
This creates the directory on the destination SONAS to be used as the target of the data
for this replication relationship.
mkrepltarget path sourcecluster
Where path is the file system path on the destination SONAS which should be used to
hold the contents of the source SONAS file tree and the source cluster is the hostname or
IP address of the source cluster's Management Node (matching the one provided to the
cfgrepl command).

Once the destination SONAS system is defined, the source SONAS needs to be configured
through the following administrative actions:
1. Configure the async relationship on the source SONAS cluster
cfgrepl targetcluster {-n count | --pairs source1:target1 [, source2:target2
…]} --source
where:
– targetcluster is the hostname or IP address of the target cluster's Management Node
– count is the number of node pairs to use for replication
– pairs is the explicit mapping of the source/destination node pairs to use for replication
2. Define the relationship of the source file tree to the target file tree (a combined example
follows these steps)
cfgreplfs filesystem targetpath
where filesystem is the source file tree to be replicated to the destination and targetpath is
the full path on the destination where the replica of the source should be made.
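
Putting the preceding steps together, the following hedged sketch shows the order of
operations. All host names, IP addresses, node pairs, filesystem names and paths are
placeholders; only the commands and options described above (cfgrepl, mkrepltarget,
cfgreplfs) are taken from this section.

On the destination (target) SONAS management node:
[SONAS]$ cfgrepl sonas1-mgmt.example.com -target
[SONAS]$ mkrepltarget /ibm/gpfs0/async sonas1-mgmt.example.com

On the source SONAS management node:
[SONAS]$ cfgrepl sonas2-mgmt.example.com --pairs 10.0.0.11:203.0.113.21 --source
[SONAS]$ cfgreplfs gpfs0 /ibm/gpfs0/async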

The configuration of the async replication determines how the system performs the mirroring
of the data for disaster recovery. The configuration step identifies which SONAS nodes
participate in the replication for the source and destination systems.

At least one source and target pair must be specified with this CLI command, and multiple
pairs can be entered separated by commas. When setting up replication with this command,
the following restrictions apply:
򐂰 All source nodes must be in the same cluster.
򐂰 The IP addresses of the source nodes should be the "Internal" IP addresses associated
with the InfiniBand network within the SONAS
򐂰 All target nodes must be in the same cluster. The IP addresses of the target nodes should
be the public IP addresses of the Interface nodes that CTDB controls
򐂰 Source and target cannot be in the same cluster
򐂰 The first source node specified controls the replication, and is considered the replication
manager node
򐂰 Multiple source nodes can replicate to the same destination

The cfgrepl command creates a configuration file, /etc/asnc_repl/arepl_table.conf, which
contains the information provided with the following internal structure:
src_addr1 dest_addr1
src_addr2 dest_addr2
src_addr3 dest_addr3


Part of the async configuration needs to ensure that the source cluster can communicate to
the destination cluster without being challenged with the SSH/scp password requests. To
achieve this, the ssh key from the id_rsa.pub from the destination SONAS system needs to
be added to the authorized_keys file of the source nodes participating in the async operation.
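
As a hedged illustration of this key exchange (the id_rsa.pub and authorized_keys locations
shown are standard OpenSSH defaults and are assumptions for the SONAS nodes; host names
are placeholders), the destination management node's public key could be appended to each
participating source node as follows:

[root@destination-mgmt]# cat /root/.ssh/id_rsa.pub | ssh root@source-int001 'cat >> /root/.ssh/authorized_keys'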

Async replication operation


The primary function of the async replication is to make a copy of the customer data,
including file system metadata, from one SONAS system to another over a standard IP
network. The design also attempts to minimize network bandwidth usage by moving only the
portions of the file which have been modified to the destination system.

The primary elements of the async replication operation include:


򐂰 SONAS code performs key replication tasks such as scanning for changed files, removing
files which are deleted at the source on the destination and recovery and retry of failures.
򐂰 Unix rsync replication tool for comparing the source/destination files for differences, and
only moving and writing the delta information on the destination to ensure the destination
matches the source.

The main steps involved in the async replication process are enumerated below:
1. Create local snapshot of source filesystem
2. Scan and collect a full file path list with the stat information
3. Build a new, changed and deleted file and directory list, including hard links
4. Distribute rsync tasks among defined nodes configured to participate in async replication
5. Remove deleted files and create hard links on the remote site
6. Create remote snapshot of replica file system if indicated in async command
7. Remove local snapshot if created from specified async command

Async replication tools will, by default, create a local snapshot of the file tree being replicated,
and use the snapshot as the source of the replication to the destination system. This is the
preferred method as it creates a well defined point-in-time of the data being protected against
a disaster. The scan and resulting rsync commands would be invoked against a stable,
non-changing file tree which provides a known state of the files to be coordinated with the
destination. Async replication does have a parameter which tells the system to skip the
creation of the snapshot of the source, but the scan and following rsync will be performed on
changing files. This has the following implications:
򐂰 Inconsistent point-in-time value of the destination system, as changes to the tree during
the async process could cause files scanned and replicated first to be from a potentially
earlier state than files later in the scan.
򐂰 Files changed after the scan cycle had taken place would be omitted from the replication
򐂰 A file could be in flux during the rsync movement

The name of the snapshot is based on the path to the async replication directory on the
destination system, with the extension _cnreplicate_tmp appended to it. For example, if the
destination file tree for async is /ibm/gpfsjt/async, then the resulting snapshot directory will be
created in the source file system:
/ibm/gpfs0/.snapshots/ibm_gpfsjt_async_cnreplicate_tmp

These snapshots are alongside any other snapshots created by the system as a part of user
request. The async replication tool will ensure that it only operates on snapshots it created
with its own naming convention. These snapshots do count towards the 256 snapshot limit
per file system, and should therefore be accounted for with the other snapshots used by the
system. After the successful completion of async replication, the snapshot created in the
source file system is removed.

After the completion of the async replication, a snapshot of the filesystem containing the
replica target is taken. The name of the snapshot is based on the destination path to
the async replication directory with the extension _cnreplicate_tmp appended to it.

As with source snapshots, these snapshots sit alongside any other snapshots created by the
system as a part of user requests. The async replication tool will ensure that it only operates
on snapshots it created with this naming convention. These snapshots do count towards the
256 snapshot limit per file system, and should therefore be accounted for with the other
snapshots used by the system.

Replication frequency and Recovery Point Objective considerations


To ensure that data at the remote SONAS site is as current as possible and has a small
Recovery Point Objective, it would seem natural to run async replication as frequently as
possible. The frequency of the replication needs to take into account a number of
factors, including:
򐂰 The change rate of the source data
򐂰 The number of files contained within the source file tree
򐂰 The network between SONAS systems, including bandwidth, latency, and sharing aspects
򐂰 The number of nodes participating in the async replication

A replication cycle has to complete before a new cycle can be started. The key metric in
determining the time it takes for a replication cycle to complete is the time it takes to move
the changed contents of the source to the destination, based on the change rate of the data
and the network capabilities.

For example, a 10 TB file tree with a 5% daily change rate would need to move 500 GB of
data over the course of a day (5.78 MB/s average over the day). Note that actual daily change
rates are probably not consistent over the 24 hour period, and planning should be based on
the maximum change rate per hour over the day. The required network bandwidth to achieve
this is based on the Recovery Point Objective (RPO). With an RPO of 1 hour, enough
network bandwidth is needed to ensure that the maximum change rate over the day
can be replicated to the destination in under an hour.
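
To make the arithmetic explicit with an assumed, purely illustrative change distribution: if the
busiest hour accounts for 10% of the daily change in the 10 TB example above, that hour
produces roughly 50 GB of changed data, and meeting a 1 hour RPO therefore requires a
sustained replication throughput of about 50,000 MB / 3600 s, or approximately 14 MB/s,
plus headroom for the scan phase, rather than the 5.78 MB/s daily average.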

Part of the async replication algorithm is the determination of the changed files, which can be
a CPU and disk intensive process which should be accounted for as part of the impact.
Continually running replications below the required RPO could cause undue impact to other
workloads using the system.

Async replication scenarios


Before performing async replication verify that the following conditions are met:
򐂰 Ensure you have consistent Active Directory with SFU or LDAP authentication across the
sites participating in the disaster recovery environment.
򐂰 Mapping of users across both sites need to be consistent from Windows SID Domain to
Unix UID/GID
򐂰 Ensure sufficient storage at destination for holding replica of source file tree and
associated snapshots
򐂰 Network between source and destination need to be capable of supporting SSH
connections and rsync operations.


򐂰 The network between the source and destination Interface Nodes should have sufficient
bandwidth in order to account for the change rate of data being modified at the source
between replicas, and the required RTO/RPO objectives to meet disaster recovery
criteria.
򐂰 Define the async relationship between interface nodes of the source and destination, define
the target filesystem, and create the source/destination file system relationship with the
cfgrepl, mkrepltarget, and cfgreplfs commands

Performing async replications


The following are the considerations and actions to protect the data against an extended
outage or disaster to the primary location. The protection is via carrying out async replications
between the source and destination systems.
򐂰 Perform async replication between source and destination SONAS systems. Replication
can be carried out manually or via scheduled operation.
– Manually invoke the startrepl command to initiate an async replication cycle against
the directory tree structure specified in the command for the source and destination
locations (a hedged example follows this list).
– Define an automated schedule for the async replication to be carried out by the system
on defined directory tree structures.
򐂰 Monitor the status of the current and previous async replication processes to ensure
successful completion.
– Async replication will raise a CIM indication to the Health Center, which can be
configured to generate SMTP and/or SNMP alerts.
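
As a hedged sketch of the manual invocation (only the startrepl command name is taken from
this section; the positional filesystem argument shown is an assumption, so consult the SONAS
CLI reference for the exact syntax), a replication cycle for a filesystem already configured with
cfgreplfs might be started with:

[SONAS]$ startrepl gpfsjt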

Disaster recovery testing


Define shares as R/O to the destination file tree for accessing file resources at the destination.
򐂰 Modification of the destination file tree as part of the validation of data or testing of DR
procedures should not be done. Changes to the destination file tree are not tracked, and will
cause the destination to differ from the source
򐂰 FTP, HTTP, and SCP shares cannot be created R/O, and are a point of risk in being able
to modify the target directory tree. Note that modifications to the target directory tree are not
tracked by the DR recovery process, and can lead to discrepancies between the source
and target file tree structures.

You must access the disaster recovery location file structure as read-only. You must create the
shares at the destination site which are to be used to access the data from the disaster
recovery location.

Business continuance
The overall steps for enabling the recovery site involve the following major components:
1. Perform baseline file scan of file tree replica used as the target for the async replication
2. Define shares/exports to the file tree replica
3. Continue production operation against remote system

The baseline scan establishes the state of the remote system files as last received from the
production site, so that changes made from this point forward can be tracked. For the
configuration where the secondary site was strictly a backup for the production site,
establishing the defined shares for the replica to enable it for production is the primary
consideration. Figure 6-27 on page 209 illustrates this scenario:


Figure 6-27 Business continuance, active - passive, production site failure

If the second site contained its own production file tree in addition to replicas, then the failure
also impacts the replication of its production file systems back to the first site as illustrated in
Figure 6-28:

Figure 6-28 Business continuance, active - active, production site failure

The steps to recover at the disaster recovery site are:


򐂰 Run the startrepl command with the -S parameter to run a scan only on the destination
system to establish a point in time of the current file tree structure. This allows the system
to track changes to the destination file tree in order to assist in delta file updates back to the
original production system (a hedged sketch follows this list).
򐂰 Define shares to destination file systems as R/W using the mkexport command, or change
existing R/O shares used for validation/testing to R/W using the chexport command.
򐂰 Proceed with R/W access to data at disaster recovery location against the file tree.
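
A minimal hedged sketch of the first recovery step (the filesystem argument is an assumption;
only the startrepl command and its -S parameter are taken from this section, and the
mkexport/chexport syntax is intentionally not shown here):

[SONAS]$ startrepl gpfsjt -S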

Recovery from disaster


The recovery of a SONAS system at a site following an extended outage will depend on the
scope of the failure. The following are primary scenarios from the resulting outage:
򐂰 The failing site was completely lost, such that no data was retained


򐂰 The failing site had an extended outage, but data was retained
򐂰 The failing site had an extended outage, and some amount of data has been lost. It is
assumed that the amount and scope of the loss are unknown.

Recovery to an empty SONAS


If the failing site was completely lost, the recovery must take place against an empty system:
either a new site location with a new SONAS system, or the previous SONAS system was
restored but contains none of the previously stored data. For the purposes of this document,
it is assumed that the SONAS system has been installed and configured with IP addresses
and connections to authentication servers, so that the system can be brought to an online
state.

The recovery steps for an active-passive configuration are the following:


1. Configure the async replication policies such that the source to destination relationship
moves from the secondary site to the new primary site. For new primary site, you need to
enable it to be the destination of an async relationship and create target file tree for async
replication. For the secondary site, you configure it as an async source, and define the
async relationship with its file tree as the source and the one configured on the new
primary site as the target.
2. Perform async replication back to the new primary site; note that it can take a long
time to transfer the entire contents electronically, based on the amount of data
and the network capabilities.
3. Halt production activity to secondary site, perform another async replication to ensure that
primary and secondary sites are identical
4. Perform baseline scan of primary site file tree
5. Define exports/shares to primary site
6. Begin production activity to primary site
7. Configure async replication of the source/destination nodes to direct replication back from
the new primary site to the secondary site.
8. Resume original async replication of primary to secondary site as previously defined
before disaster.

Figure 6-29 on page 210 illustrates disaster failback to an empty SONAS:

Figure 6-29 Disaster failback to an empty SONAS


In the scenario where the second site was used for both active production usage and as a
replication target, the recovery would be as illustrated in Figure 6-30.

Figure 6-30 Failback to an empty SONAS in an active-active environment

The loss of the first site also lost the replica of the second site's file systems, which need to
be replicated back to the first site. The outline of the recovery steps for an active-active
configuration is the following:
򐂰 Configure the async replication policies such that the source to destination relationship
moves from the secondary site to the new primary site for file tree A
򐂰 Perform async replication of file tree A with the "full" replication parameter back to the new
primary site; the time to transfer the entire contents electronically can be long, based
on the amount of data and network capabilities.
򐂰 Halt production activity to secondary site, perform another async replication to ensure that
primary and secondary sites are identical
򐂰 Perform baseline scan of file tree A at site 1
򐂰 Define exports and shares to file tree A at site 1
򐂰 Begin production activity to file tree A at site 1
򐂰 Configure async replication of the source/destination nodes to direct replication back from
new primary site to secondary site for file tree A
򐂰 Resume original async replication of file tree A from new primary site to secondary site
򐂰 For the first async replication of file tree B from the secondary site to the new primary site,
ensure that the full replication parameter is invoked, to ensure that all contents from file tree B
are sent from the secondary site to the new primary site.

6.5 Disaster recovery methods


To rebuild a SONAS cluster, in the case of a disaster that makes the whole SONAS cluster
unavailable, two types of data are required:
򐂰 The data contained on the SONAS cluster
򐂰 The SONAS cluster configuration files


The data contained in the SONAS cluster can be backed up to a backup server such as TSM
or another supported backup product, or it can be recovered from a replica of the data
previously replicated to a remote SONAS cluster or file server.

6.5.1 Backup of SONAS configuration information


SONAS configuration information can be backed up using the backupmanagementnode
SONAS CLI command. This command makes a backup of the local management node, on
which the command is running, and stores it on another remote host or server.

This command allows you to backup one or more of the following SONAS configuration
components:
򐂰 auth
򐂰 callhome
򐂰 cimcron
򐂰 ctdb
򐂰 derby
򐂰 misc
򐂰 role
򐂰 sonas
򐂰 ssh
򐂰 user
򐂰 yum

The command allows you to specify how many previously preserved backup versions must
be kept; older backups are deleted. The default value is three versions. You can also
specify the target host name where the backup is stored (by default the first found storage
node of the cluster) and the target directory path within the target host where the backup is
stored (by default /var/sonas/managementnodebackup). The example in Figure 6-31 shows
the backupmanagementnode command used to back up management node configuration
information for the components auth, ssh, ctdb, and derby:

[root@sonas02 bin]# backupmanagementnode --component auth,ssh,ctdb,derby


EFSSG0200I The management node mgmt001st002.virtual.com(10.0.0.20) has been successfully backuped.

[root@sonas02 bin]# ssh strg001st002.virtual.com ls /var/sonas/managementnodebackup


mgmtbak_20100413041835_e2d9a09ea1365d02ac8e2b27402bcc31.tar.bz2
mgmtbak_20100413041847_33c85e299643bebf70522dd3ff2fb888.tar.bz2
mgmtbak_20100413041931_547f94b096436838a9828b0ab49afc89.tar.bz2
mgmtbak_20100413043236_259c7d6876a438a03981d1be63816bf9.tar.bz2

Figure 6-31 Backup of management node configuration information

Note: Whereas administrator backup of management node configuration information is
allowed and documented in the manuals, the procedure to restore the configuration
information is not documented and needs to be performed under the guidance of IBM
support personnel.

The restoration of configuration data is done using the cnmgmtconfbak command, which is
used by the GUI when building up a new management node. The cnmgmtconfbak command
can also be used to list the available archives; it requires you to specify --targethost <host>
and --targetpath <path> for any backup, restore, or list operation. Figure 6-32 on page 213
shows the command switches and how to get a list of available backups:

212 SONAS Architecture and Implementation


Draft Document for Review November 1, 2010 9:32 am 7875Availability.fm

[root@sonas02]# cnmgmtconfbak
Usage: /opt/IBM/sofs/scripts/cnmgmtconfbak <command> <mandatory_parameters> [<options>]
commands:
backup - Backup configuration files to the bak server
restore - Restore configuration files from the bak server
list - List all available backup data sets on the selected server
mandatory parameters:
--targethost - Name or IP address of the backup server
--targetpath - Backup storage path on the server
options: [-x] [-v] [-u N *] [-k N **]
-x - Debug
-v - Verbose
--component - Select data sets for backup or restore (if archive contains
data set. (Default:all - without yum!)
Legal component names are:
auth, callhome, cim, cron, ctdb, derby, role, sonas, ssh, user, yum, misc
(Pls. list them separated with commas without any whitespace)
only for backup
-k|--keep - Keep N old bak data set (default: keep all)
only for restore
-p|--fail_on_partial - Fail if archive does not contain all required components
-u|--use - Use Nth bak data set (default: 1=latest)

[root@sonas02]# cnmgmtconfbak list --targethost strg001st002.virtual.com --targetpath (..cont..)


/var/sonas/managementnodebackup
1 # mgmtbak_20100413043236_259c7d6876a438a03981d1be63816bf9.tar.bz2
2 # mgmtbak_20100413041931_547f94b096436838a9828b0ab49afc89.tar.bz2
3 # mgmtbak_20100413041847_33c85e299643bebf70522dd3ff2fb888.tar.bz2
4 # mgmtbak_20100413041835_e2d9a09ea1365d02ac8e2b27402bcc31.tar.bz2

Figure 6-32 Configuration backup restore command

Note: You can back up the configuration data to a remote server external to the SONAS
cluster by specifying the --targethost switch. The final copy of the archive file is
performed by the scp command, so the target remote server can be any server to which
passwordless access has been established. Establishing passwordless access to a
remote server does require root access to the SONAS cluster.
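
As a hedged illustration only (and, as noted above, a restore should be performed only under
the guidance of IBM support personnel), a restore of selected components from the most
recent archive could be invoked using switches taken from the usage output in Figure 6-32;
the host and path are the same placeholders used in that figure:

[root@sonas02]# cnmgmtconfbak restore --targethost strg001st002.virtual.com --targetpath /var/sonas/managementnodebackup -u 1 --component auth,ssh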

6.5.2 Restore data from a traditional backup


The data contained in the SONAS cluster can be backed up to a backup server such as TSM
or another supported backup product. Using that backup it is possible to recover all the data
that was contained in the SONAS cluster. Backup and restore procedures are discussed in
more detail in 6.2, “Backup and restore of file data”.

6.5.3 Restore data from a remote replica


SONAS data can also be recovered from SONAS data replicas stored on a remote SONAS
cluster or on a file server that is the target for SONAS asynchronous replication. To recover
data stored on a remote system you can use utilities such as xcopy and rsync to copy the
data back to the original location. The copy can be performed from one of two places:
1. From a SONAS interface node on the remote system, using asynchronous replication to
realign the data


2. From an external SONAS client that mounts the shares for both the remote system that
contain a copy of the data to be restored and for the local system that needs to be
repopulated with data.

The first method requires that the remote system be a SONAS cluster, whereas the second
method works regardless of the type of remote system.
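
As a hedged sketch of the second method (mount points and share names are placeholders,
and the rsync options shown are standard options for preserving permissions, ACLs and
extended attributes; they are not taken from SONAS documentation), an external client that
has mounted both shares could copy the data back as follows:

[client]# rsync -avAX /mnt/remote_sonas/share1/ /mnt/local_sonas/share1/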

For additional information on how to recover from an asynchronous replica, refer to
“Recovery from disaster” on page 209.


Chapter 7. Configuration and sizing


This chapter provides information on the different SONAS configurations and the sizing
considerations to address before ordering your SONAS appliance:
򐂰 What you need to know and do to order the box
򐂰 SONAS capacity planning tools
򐂰 Guidelines


7.1 Tradeoffs between configurations


As explained in Chapter 2, “Hardware architecture” on page 41, the SONAS solution has been
designed to provide the best flexibility between user needs and storage performance.

SONAS can have multiple configurations according to your needs, from the rack level down
to the device level inside Interface Nodes. Table 7-1 on page 216 provides a
summary of the SONAS product names and the corresponding IBM machine type/model
numbers (MTMs) assigned to each product. All SONAS hardware products are under a
single IBM machine type of 2851.

Table 7-1 SONAS configurations and model numbers

IBM Product Name                     Model number
SONAS Interface node                 2851-SI1
SONAS Management node                2851-SM1
SONAS Storage node                   2851-SS1
SONAS RAID storage controller        2851-DR1
SONAS Storage expansion unit         2851-DE1
SONAS 36-port InfiniBand switch      2851-I36
SONAS 96-port InfiniBand switch      2851-96
SONAS Base rack                      2851-RXA
SONAS Storage Expansion rack         2851-RXB
SONAS Interface expansion rack       2851-RXC

7.1.1 Rack configurations


In this section we describe all available configurations, from a macro level (rack) to a
micro level (hardware device).
򐂰 SONAS Rack
– SONAS Base Rack
Three versions of the SONAS Base Rack are available. Only one of them can provide the
smallest SONAS configuration, see Figure 7-2 on page 220. The two remaining versions
have to be used with Storage Expansion Racks (see Figure 7-3 on page 221
and Figure 7-4 on page 222) because they do not include a Storage Pod.
– SONAS Storage Expansion Rack
An expansion rack which, by definition, cannot be used alone and needs to be connected
to a Base Rack, see Figure 7-5 on page 223.
– SONAS Interface Expansion Rack
An expansion rack which, by definition, cannot be used alone and needs to be connected
to a Base Rack, see Figure 7-6 on page 224.


7.1.2 Switch configurations


The section “Switches” on page 47 provides information on the internal and external switches
in the SONAS appliance. In this section we discuss switch-related considerations to take into
account prior to ordering your SONAS appliance.
򐂰 InfiniBand switch configuration
– All major components of a SONAS system, such as the interface nodes, storage nodes and
management node, are interconnected by a high-performance low-latency InfiniBand
4X Double Data Rate (DDR) fabric. The 36 port configuration, with two InfiniBand
switches for redundancy, allows you to interconnect up to 36 nodes inside your SONAS
cluster. Nodes can be one Management Node, some Storage Nodes and some
Interface Nodes. Remember that you have two Storage Nodes per Storage Pod.
– Like the 36 port configuration, the 96 port configuration also has two switches for
redundancy, and it provides the largest SONAS configuration with up to 60
Storage Nodes (or 30 Storage Pods) and 30 Interface Nodes, plus the
Management Node. However, the 96 port InfiniBand switch is made of up to
four Board Lines. Each Board Line is composed of 24 InfiniBand ports, which means
that even with the 96 port configuration you may start with only 24 ports and grow if
needed.
Note that there is no inexpensive way to change from a 36 port configuration to a
96 port configuration (or vice versa), as they are part of different Base Rack models. So
configure your SONAS Storage Solution accordingly. InfiniBand is the layer which interconnects
all Storage Nodes, Interface Nodes and Management Nodes. Note however that four
InfiniBand ports of each InfiniBand switch are reserved for the following components:
• Two reserved for future use
• One for the required management node
• One for the optional redundant management node
The remaining InfiniBand ports are available for interface nodes and storage nodes.
Table 7-3 on page 225 and Table 7-4 on page 226 show the maximum
capacity available inside your SONAS Storage Solution based on the InfiniBand switch
configuration.

7.1.3 Storage Pod configuration


The section “Storage pods” on page 50 provides details on the Storage Pods in the SONAS
appliance and should be reviewed prior to reading this section. This section provides
considerations related to Storage Pod configuration.
– Controller configuration.
A Storage Pod contains at least one Storage Controller, which is mandatory, and up to
two Storage Controllers, each one with an optional Storage Expansion.
Intermediate configurations are allowed, and performance scales up with the number of
disks inside the Storage Pod: up to 100% over the single Storage Controller
configuration when adding another Storage Controller, and up to 85% when adding a
Storage Expansion.
– SAS drive configuration.
You can choose to fill one Storage Controller or one Storage Expansion with SAS or
SATA drives. SAS drives have a faster spindle speed than SATA drives and are more
fault tolerant, but they have a smaller capacity (450 GB for the SONAS solution) and
higher power consumption. The maximum number of SAS drives inside a single Storage
Expansion Rack is 360, which means one full Storage Pod and a second one with a
Storage Controller only.
– SATA drive configuration.
SATA drives have a larger capacity than SAS drives, up to 2 TB in the SONAS
configuration, and require less power than SAS drives. That is why the maximum
configuration inside a single Storage Expansion Rack with SATA drives is 480 drives.
You can choose between 1 TB and 2 TB drives, but note that you cannot mix drive
capacities within the same enclosure (Storage Controller or Storage Expansion).

7.1.4 Interface Node configuration


The section “Interface nodes” on page 43 should be reviewed for an understanding of the
function of the Interface Node before reading this section. This section provides
considerations related to Interface Node configuration.
򐂰 Memory capacity. By default one Interface Node comes with 32 GB of memory. This
memory is used for caching: from a performance perspective, Interface Nodes cache
frequently used files in memory. Because SONAS is designed to keep the connection
between a client and an Interface Node until the client unmounts the share, client
performance increases due to this caching mechanism. For enhanced performance you
can increase this amount of memory, and therefore the chance of finding files still in
cache, by purchasing an additional 32 GB of memory (FC 1000) or 128 GB of memory
(FC 1001) for your Interface Node.
Feature code 1000 provides an additional 32 GB of memory in the form of eight 4 GB
1333MHz double-data-rate three (DDR3) memory dual-inline-memory modules (DIMMs).
You can order only one of FC 1000 per interface node. FC 1001 installs a total of 128GB
of memory in the interface node. Installation of FC 1000 or FC 1001 into an already
installed interface node is a disruptive operation that requires you to shut down the
interface node. However, the system continues to operate using the remaining
functioning interface nodes while the interface node being upgraded is unavailable.
򐂰 Client network connectivity. We introduce the SONAS concepts and mechanisms in Chapter 1,
“Introduction to Scale Out File Network Attached Storage” on page 1, where we describe
how clients access data through the Interface Nodes. This access is physically provided
by the connectivity between the Interface Nodes and the clients, namely the client
network.
Each interface node has five 1 gigabit Ethernet (GbE) connections (ports) on the system
board. Two of the onboard Ethernet ports connect to the internal private management
network within the SONAS system for health monitoring and configuration, one is used
for connectivity to the Integrated Management Module (IMM) that enables the user to
remotely manage the interface node, and the two remaining onboard Ethernet ports are
used for the client network.
By default these two 1GbE connections are configured in active/failover mode. You can
change this default configuration to an aggregate mode and increase the theoretical
bandwidth from 1Gb/s to 2Gb/s. If this default bandwidth does not fulfill client needs, you
may add an extra Quad-port 1GbE NIC (FC 1100) adapter.
This feature provides a quad-port 10/100/1000 Ethernet PCIe x8 adapter card. This NIC
provides four RJ45 network connections for additional host IP network connectivity. This
adapter supports a maximum distance of 100m using Category 5 or better unshielded
twisted pair (UTP) four-pair media. You are responsible for providing the network cables to
attach the network connections on this adapter to your IP network. Only one of feature
code 1100 can be ordered per interface node. The manufacturer of this card is Intel, OEM
part number: EXPI9404PTG2L20.


The other option is to add an extra Dual-port 10Gb Converged Network Adapter (FC
1101). This feature provides a PCIe 2.0 Gen 2 x8 low-profile dual-port 10Gb
Converged Network Adapter (CNA) with two SFP+ optical modules. The CNA supports
short reach (SR) 850nm multimode fiber (MMF). You are responsible for providing the
network cables to attach the network connections on this adapter to your IP network. Only
one of feature code 1101 can be ordered per interface node. The manufacturer of this card
is Qlogic, OEM part number: FE0210302-13.
The last option is to purchase both adapters.
Table 7-2 summarizes the available connectivity configurations within a single
Interface Node.

Table 7-2 Number of ports available in Interface Node.


Number of ports in various configurations of a single Interface Node

On-board 1GbE connectors | Feature Code 1100, Quad-port 1GbE NIC | Feature Code 1101, Dual-port 10GbE CNA | Total number of data path connectors
2 | 0 | 0 | 2
2 | 0 | 1 (with 2 ports) | 4
2 | 1 (with 4 ports) | 0 | 6
2 | 1 (with 4 ports) | 1 (with 2 ports) | 8

򐂰 Cabling considerations
For each Interface Node in the base rack no InfiniBand cables need to be ordered. Copper
InfiniBand cables are automatically provided for all Interface Nodes in the base rack; the
length of the copper InfiniBand cables provided is based on the position of the Interface
Node in the rack. You must however order InfiniBand cable features for inter-rack cabling
after determining the layout of your multi-rack system, for instance when an Interface
Expansion Rack is required. Multiple InfiniBand cable features are available; the main
difference is whether you are using the 36 or 96 port InfiniBand switch configuration. The
connectors are not the same in the two models: the 36 port model requires QSFP
connectors while the 96 port model requires X4 connectors, as shown in Figure 7-1.

Figure 7-1 Infiniband connectors.

For the additional Quad-port adapter, Cat 5e or better cables are required to support 1 Gbps
network speeds; Cat 6 cables provide better support for 1 Gbps network speeds.


The 10 GbE data-path connections support short reach (SR) 850 nanometer (nm)
multimode fiber (MMF) optic cables that typically can reliably connect equipment up to a
maximum of 300 meters (m) using 50/125 µm (2000 MHz*km BW) OM3 fiber.

7.1.5 Rack configurations


This section provides configuration considerations related to the available SONAS racks.

Feature Code 9005 Base Rack

The Feature Code 9005 Base Rack is the only Base Rack configuration that supports the
smallest possible SONAS configuration. In this rack several elements are mandatory,
including:
򐂰 Embedded GigE switches
򐂰 Management Node
򐂰 Infiniband switches
򐂰 Two Interface Nodes
򐂰 Two Storage Nodes
򐂰 One Storage Controller

Assuming 2TB SATA drives in the Storage Controller, the capacity would be 240 TB.

If more capacity is required, a Disk Storage Expansion and a second Storage Controller can
be added to reach a full Storage Pod configuration.

Should future growth require an additional Storage Pod, or more Interface Nodes, Storage
Expansion racks and Interface Expansion racks can be attached to this Base Rack.

The only limitation is the number of Infiniband connections remaining on the 36 port
Infiniband switches.

Figure 7-2 Base Rack Feature Code 9005

For additional information on the Feature Code 9005 SONAS Base Rack refer to “Rack types
- how to choose correct rack for your solution” on page 61.


Feature Code 9003 Base Rack

The Feature Code 9003 Base Rack configuration contains Interface Nodes but no Storage
Pods, so it must be used with at least one Storage Expansion Rack. In this rack some
elements are mandatory:
򐂰 Embedded GigE switches
򐂰 Management Node
򐂰 Infiniband switches
򐂰 Two Interface Nodes

All these mandatory components have to be used with an additional Storage Expansion rack.

There are no Storage Pods in this Base Rack. According to the storage capacity needed, you
add as many Storage Expansion racks as required.

If more Interface Nodes are needed for external client network connectivity, you may add one
Interface Expansion rack.

The only limitation is the number of Infiniband connections remaining on the 36 port
Infiniband switches.

Figure 7-3 Base Rack Feature Code 9003

For additional information on the Feature Code 9003 SONAS Base Rack refer to “Rack types
- how to choose correct rack for your solution” on page 61.


Feature Code 9004 SONAS Base Rack

The Feature Code 9004 Base Rack configuration is used in a SONAS configuration where
the large 96 port switch is needed. In this rack the following elements are required:
򐂰 Embedded GigE switches
򐂰 Management Node
򐂰 Infiniband switches
򐂰 Two Interface Nodes

These mandatory components have to be used with an additional Storage Expansion rack.

Note that there are no Storage Pods in this Base Rack. Based on your storage capacity
requirements, you add as many Storage Expansion racks as required.

If additional Interface Nodes are needed for external client network connectivity, you may add
one Interface Expansion rack.

With the two 96 port Infiniband switches you can build the largest SONAS configuration,
which means a 14.4 PB capacity with 2TB SATA drives.

Figure 7-4 Base Rack Feature Code 9004

For additional information on the Feature Code 9004 SONAS Base Rack refer to “Rack types
- how to choose correct rack for your solution” on page 61.


Storage Expansion Rack

The Storage Expansion Rack is used in combination with a Base Rack. In this rack the
mandatory elements are:
򐂰 Embedded GigE switches
򐂰 Two Storage Nodes
򐂰 One Storage Controller

All these mandatory components have to be used in combination with one of the Base Racks.

If you need more capacity in your SONAS configuration, you can add a Disk Storage
Expansion, or even a second Storage Controller, to the existing Storage Pod located in the
Base Rack. The other option is to add an additional Storage Pod through a Storage
Expansion Rack.

A Storage Expansion Rack contains a minimum of two and a maximum of four Storage
Nodes, and each Storage Node requires an Infiniband connection.

The maximum number of Storage Pods in a SONAS environment is 30, which corresponds to
15 Storage Expansion Racks. This maximum assumes that you are not limited by the number
of available Infiniband connections.

Figure 7-5 Storage Expansion Rack.

To read more details on this SONAS Storage Expansion Rack refer to “SONAS storage
expansion unit” on page 53.


Interface Expansion Rack

The Interface Expansion Rack is used in combination with a Base Rack. In this rack some
elements are mandatory:
򐂰 Embedded GigE switches
򐂰 One Interface Node

All these mandatory components have to be used in combination with a Base Rack (whatever
the model).

If you need more bandwidth for your client network, you may add Interface Nodes to your
Base Rack. The other option is to add additional Interface Nodes through an Interface
Expansion Rack. There is a minimum of one Interface Node per Interface Expansion Rack,
and each Interface Node requires an Infiniband connection.

The maximum number of Interface Nodes in a SONAS cluster is 30. This maximum assumes
that you are not limited by the number of available Infiniband connections.

Figure 7-6 Interface Expansion Rack.

To read more details on this SONAS Interface Expansion Rack refer to “Rack types - how to
choose correct rack for your solution” on page 61.

7.2 Considerations for sizing your configuration


We have described above the flexibility you have when designing the most appropriate
SONAS solution for your environment. Using your requirements (see “Inputs for SONAS
sizing” on page 227), you will be able to size your SONAS environment as described in
“Sizing the SONAS appliance” on page 235. This means determining:
򐂰 The appropriate number of Interface Nodes,
򐂰 The appropriate capacity of the system and Storage Pods,
򐂰 The appropriate Infiniband switch configuration,
and also:
򐂰 The appropriate client network connectivity,


򐂰 The appropriate amount of memory inside Interface Nodes,


򐂰 The appropriate disk technology.

As there is no inexpensive way to switch from a 36 port configuration to a 96 port
configuration, you have to size your basic SONAS environment accordingly. If you know that
your needs will grow and that you will need many expansion racks, you can still start with the
96 port configuration without filling the entire switch: a full 96 port Infiniband switch is
composed of four board lines of 24 Infiniband ports each.

Hardware configuration and sizing is the most difficult part. Regarding the software
configuration, everything is included with the SONAS software, so you do not have to worry
about key features that are not included and that you might need later.

Every piece of software, from the operating system to the backup client software, is included;
for more details see Chapter 3, “Software architecture” on page 73.

Table 7-3 shows the maximum storage capacity using the 36 port Infiniband
switch.

Table 7-3 Maximum storage capacity with the 36 port Infiniband switch configuration.
Interface nodes | Maximum Storage pods | Number of storage nodes | Maximum number of Storage controllers | Maximum number of Disk Storage Expansion units | Maximum number of hard disk drives | Maximum storage capacity in TB (2 TB SATA disks)

3 14 28 28 28 3360 6720

4 14 28 28 28 3360 6720

5 13 26 26 26 3120 6240

6 13 26 26 26 3120 6240

7 12 24 24 24 2880 5760

8 12 24 24 24 2880 5760

9 11 22 22 22 2640 5280

10 11 22 22 22 2640 5280

11 10 20 20 20 2400 4800

12 10 20 20 20 2400 4800

13 9 18 18 18 2160 4320

14 9 18 18 18 2160 4320

15 8 16 16 16 1920 3840

16 8 16 16 16 1920 3840

17 7 14 14 14 1680 3360

18 7 14 14 14 1680 3360

19 6 12 12 12 1440 2880

20 6 12 12 12 1440 2880

21 5 10 10 10 1200 2400

22 5 10 10 10 1200 2400

23 4 8 8 8 960 1920

24 4 8 8 8 960 1920

25 3 6 6 6 720 1440

26 3 6 6 6 720 1440

27 2 4 4 4 480 960

28 2 4 4 4 480 960

29 1 2 2 2 240 480

30 1 2 2 2 240 480

Table 7-4 on page 226 shows the maximum storage capacity using the 96 port Infiniband
switch.

Table 7-4 Maximum storage capacity with the 96 port Infiniband switch configuration.
Number of Storage pods | Number of Storage nodes | Number of Storage controllers | Number of Disk Storage Expansion units | Maximum number of hard disk drives | Maximum storage capacity in TB (2 TB SATA disks)

1 2 2 2 240 480

2 4 4 4 480 960

3 6 6 6 720 1440

4 8 8 8 960 1920

5 10 10 10 1200 2400

6 12 12 12 1440 2880

7 14 14 14 1680 3360

8 16 16 16 1920 3840

9 18 18 18 2160 4320

10 20 20 20 2400 4800

11 22 22 22 2640 5280

12 24 24 24 2880 5760

13 26 26 26 3120 6240

14 28 28 28 3360 6720

15 30 30 30 3600 7200

16 32 32 32 3840 7680


17 34 34 34 4080 8160

18 36 36 36 4320 8640

19 38 38 38 4560 9120

20 40 40 40 4800 9600

21 42 42 42 5040 10080

22 44 44 44 5280 10560

23 46 46 46 5520 11040

24 48 48 48 5760 11520

25 50 50 50 6000 12000

26 52 52 52 6240 12480

27 54 54 54 6480 12960

28 56 56 56 6720 13440

29 58 58 58 6960 13920

30 60 60 60 7200 14400
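The arithmetic behind these two tables is straightforward to reproduce. The following Python sketch is our own illustration (it is not part of the SONAS software, and the helper names are ours); it assumes four reserved InfiniBand ports per switch, one port per Interface Node, two ports per Storage Pod (its two Storage Nodes), 240 drives in a full Storage Pod, 2 TB SATA drives, and the SONAS maximum of 30 Storage Pods.

# Hypothetical helper reproducing the port and capacity arithmetic of
# Table 7-3 and Table 7-4 (our own sketch, not SONAS code).
RESERVED_PORTS = 4         # 2 future use + management node + redundant management node
PORTS_PER_POD = 2          # each Storage Pod has two Storage Nodes
DRIVES_PER_FULL_POD = 240
TB_PER_DRIVE = 2           # 2 TB SATA assumption
MAX_PODS = 30              # SONAS maximum

def max_storage_pods(switch_ports, interface_nodes):
    usable = switch_ports - RESERVED_PORTS - interface_nodes
    return min(max(usable // PORTS_PER_POD, 0), MAX_PODS)

def max_capacity_tb(switch_ports, interface_nodes):
    return max_storage_pods(switch_ports, interface_nodes) * DRIVES_PER_FULL_POD * TB_PER_DRIVE

print(max_storage_pods(36, 5), max_capacity_tb(36, 5))    # 13 pods, 6240 TB (as in Table 7-3)
print(max_storage_pods(96, 30), max_capacity_tb(96, 30))  # 30 pods, 14400 TB (as in Table 7-4)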

7.3 Inputs for SONAS sizing


We have previously described the SONAS architecture in detail, from both a hardware and a
software point of view. We then described the different SONAS appliance configurations
available, which provide great flexibility.

The question now is: what is the most appropriate SONAS configuration? Will your SONAS
Storage Solution fit all your needs, being neither too large nor too small?

As explained above, SONAS is a file system solution that provides access to SONAS clients
through network shares over NFS, CIFS, or FTP. Your daily business applications run on top
of these protocols. These critical applications may also rely on ISV software such as
databases, or work in combination with other solutions such as virtualization or backup. For
more details regarding SONAS ISV support, refer to the chapter titled ISV support in the
book IBM Scale Out Network Attached Storage Concepts, SG24-7874.

Depending on your application, and on your entire software stack, you may require more
performance or capacity in one place or another in order to meet your requirements. As with
all storage solutions, the better you understand how your application works, the easier it is to
size the storage that will host your data.

In this section we describe in detail what your business application characteristics may look
like from a storage point of view, and how they can impact your sizing choices.

First of all, keep in mind that network file based solutions are not always the most appropriate
option for a given workload. SONAS is only one product from the wide IBM Storage product
portfolio. For instance, if your daily business application uses Direct Attached Storage (DAS),
that locally attached storage has, by design, a lower latency than a network attached solution
such as SONAS. If your business application is latency bound, SONAS may not be the best
option. Similarly, if your application performs very small accesses in a random way, a network
attached solution will not provide outstanding performance. For more details regarding good
candidates for a SONAS solution, see the chapter titled SONAS usage cases in the book IBM
Scale Out Network Attached Storage Concepts, SG24-7874.

7.3.1 Application characteristics


Typically a list of key application characteristics for SONAS would be:
򐂰 Capacity and bandwidth
򐂰 Access pattern
򐂰 Cache hit ratio

The first characteristic in the above list is the easiest one to determine. As said above, the
better you know your application, the easier it is to size the storage solution; however, it can
be very challenging to determine these characteristics precisely. The capacity and bandwidth
required, or currently in use in your existing environment, are the easiest to find, whereas the
cache hit ratio is much more complex to determine.

We first define these concepts, then explain how they can impact performance from a
physical point of view, and finally how to measure them.

7.3.2 Workload characteristics definition


When planning your SONAS appliance the characteristics of your workloads must be taken
into consideration.

Capacity and bandwidth


The capacity you require for your SONAS depends on your utilization. If you plan to use
SONAS only for your business application, you should be able to determine the capacity you
need. But you may use your application in a virtualized or database environment, which
means you also need to take this middleware into account to determine the total capacity
needed. Moreover, if you also plan to host your users’ data on your SONAS, with or without
quotas, you have to extrapolate the amount of storage they will need. Last but not least,
backup and restore or disaster recovery policies may significantly increase the total amount
of space you need. For more details regarding these critical aspects, refer to Chapter 6,
“Backup and recovery, availability and resiliency functions” on page 177.

The bandwidth requirement is more related to the business application. Depending on your
application, for instance digital media or high performance computing, you may require the
largest bandwidth possible. Keep in mind that you will have to process the data afterwards:
there is no need for a large bandwidth if your application cannot handle it. However, you may
plan to have multiple user sessions running in parallel, which requires a larger bandwidth.

You will probably not use only one application. You may have many applications, and many
users using the shares for different purposes through NFS or CIFS, and you may also have
ISV software running and accessing data. You have to cumulate bandwidth and capacity
across all these workloads.


Access pattern
The access pattern is more difficult to identify. What we mean by access pattern is the
workload access type (random or sequential), the file size, and the read/write ratio.

When your application performs IO on the storage pool, the IO access is considered
sequential if successive requests read or write data at consecutive physical addresses. On
the contrary, it is considered random if, in order to retrieve contiguous data, you have to
access non-consecutive locations on the drive.

In both cases, random or sequential, these accesses write or read files. The file size is
basically the size required on the storage solution to store these files; we do not take into
account snapshot, backup, or replication concepts, which can increase the size required to
store a single file.

Finally your business application does not perform reads or writes exclusively. The read/write
ratio is actually the ratio between the average number of reads and the average number of
writes during execution.

Once again, you will probably not use only one application. Because SONAS allows all users
and applications to use a single global namespace, multiple applications lead to a mix of
access types. If one application performs sequential access while a second one accesses
data randomly, the global access on the SONAS file system may be neither 100% sequential
nor 100% random. The same applies to file size and read/write ratio.

Cache hit ratio


The cache hit ratio is definitely the most complex information to retrieve. From your
application point of view, IO operations are embedded in computations or processing
operations. When performing an IO read request, after the first request the data is stored in
memory, that is, the data is cached. The next time you need this exact same data (or an
extract of it), you may retrieve it directly from the cache and avoid the access time from the
storage pool. This is much more efficient, because access from memory is far faster than
access from disk. If you are able to retrieve the data from cache, this is a cache hit. If you are
not, and need to access the disk again, this is a cache miss. The cache hit ratio is the ratio
between the number of cache hits and the number of access requests.

With multiple applications, or many software layers or middleware accessing data, it can be
even more difficult to determine this cache hit ratio.

We have defined some key characteristics of your application. We now explain how they can
impact performance from a storage point of view.

7.3.3 Workload characteristics impact


In this section we describe the impact the various workload characteristics can have on your
SONAS performance.

Capacity and bandwidth


From a performance point of view, capacity has no real impact in standard utilization.
Obviously, if you have no space left to perform snapshots, backups, or even IO operations for
your application, you will simply no longer be able to use the system. If you did not use all the
available NSDs when you created your first SONAS file system, you can include them to add
space, or do some cleaning; refer to Chapter 10, “SONAS administration” on page 305 for
more details. Also keep in mind that your storage needs will grow over months and years of
utilization anyway. Refer to Table 7-5 on page 236 for an overview of raw usable capacity.

Regarding bandwidth, the overall storage bandwidth is determined first by the number of
Storage Pods inside your SONAS Storage Solution, then by the number of Storage
Controllers and Storage Expansions inside each Storage Pod, and last but not least by the
type of disks inside them (refer to Chapter 2, “Hardware architecture” on page 41). Because
SONAS is based on GPFS, which is a scalable file system solution, your overall bandwidth
increases with the number of storage elements inside.

The above storage consideration is only the first step: because SONAS users access data
on shares through the Interface Nodes, you need to ensure that the Interface Nodes are able
to deliver all the storage bandwidth. There are two ways to increase the Interface Node
bandwidth: increase the number of Interface Nodes, or increase their network connectivity
bandwidth. The latter can be done with additional features such as the 10GbE adapter or the
quad-port GbE adapter (refer to “Tradeoffs between configurations” on page 216).

If your SONAS Storage Solution bandwidth does not fit your environment's needs, your
application will take longer to complete. This can be particularly harmful for real-time
applications such as video surveillance.

The above considerations are provided to help you understand how to size your SONAS so
that it fits your requirements the first time. SONAS is a scale out storage solution, which
simply means that if you need more capacity, more bandwidth, or a combination of both, you
can add extra storage: the GPFS layer allows you to add this new capacity and bandwidth to
your existing environment, up to 14.4 PB. But as described in “Tradeoffs between
configurations” on page 216, even if SONAS provides flexibility, there are configuration
decisions, such as the Infiniband switch capacity, that cannot be changed inexpensively later.
This is exactly the aim of this section: to size for today and foresee further needs.

Access pattern
As discussed above, the access type can be random or sequential. On top of the access
type you have the file size and the read/write ratio.

From benchmark considerations, small random accesses are often associated with IO/s
performance, while large sequential accesses are often associated with MB/s performance.
These two values are basic disk and Storage Controller characteristics, but they can also be
appropriate metrics for your business application: instead of access type and file size, you
may have a better idea of what your application or environment needs in terms of IO/s or
MB/s.

At the disk level, the IO/s is determined by the technology. The IO/s metric can be determined
from intrinsic disk characteristics such as:
򐂰 Average Seek Time.
򐂰 Rotational Latency

The average seek time is the time required by the disk drive to position the drive head over
the correct track, while the rotational latency is the time required for the target sector to
rotate under the disk head before it can be read or written. Average rotational latency is
estimated as the time required for one half of a full rotation. You can find the average seek
time and rotational latency in the manufacturer specifications.

High performance 15K RPM Serial Attached SCSI (SAS) disk drives have a lower average
seek time and rotational latency (due to their higher rotational speed) than high capacity
7.2K RPM Serial Advanced Technology Attachment (SATA) disk drives. 15K RPM SAS disk
drives generally have seek times in the 3-4 millisecond range and are capable of sustaining
an IOPS rate between 160 and 200 per disk drive, while 7.2K RPM SATA disk drives
generally have longer seek times in the 8-9 millisecond range and are capable of sustaining
70-80 IOPS per disk drive.
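As a back-of-the-envelope illustration of this rule of thumb, the following Python sketch (our own example, with seek times picked as assumptions from the ranges above) estimates the random IOPS of a single drive as the inverse of its average seek time plus its average rotational latency (half a rotation).

# Rough per-drive IOPS estimate (our own sketch, not manufacturer data).
def rotational_latency_ms(rpm):
    # Average rotational latency: half of one full rotation, in milliseconds.
    return 0.5 * 60000.0 / rpm

def theoretical_iops(avg_seek_ms, rpm):
    # One random IO roughly costs a seek plus half a rotation.
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))

print(round(theoretical_iops(3.5, 15000)))  # ~182 IOPS for a 15K RPM SAS drive
print(round(theoretical_iops(8.5, 7200)))   # ~79 IOPS for a 7.2K RPM SATA drive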

The information provided above is a general rule of thumb for planning your SONAS system.
If you have 60 disks (SATA or SAS) within a single Storage Controller, this does not mean
that the Storage Controller delivers 60 times the performance of a single disk. Likewise, if
you add a Storage Expansion to increase the number of disks to 120 per Storage Controller,
the overall performance will not be 120 times the performance of a single disk. The first
reason is the software RAID technology used in the Storage Controller: read and write
performance are not the same. Even though a read and a write are each by definition a
single IO, they do not perform identically, and even two different write operations may not
perform the same. Because of the RAID 5 and RAID 6 definitions, as described in Figure 7-7,
you have to deal with parity; depending on the RAID algorithm, the parity is not always on
the same disk.

Figure 7-7 Raid 5 and Raid 6 definition

The biggest performance penalty occurs when you update a single piece of data inside your
RAID array, as shown in Figure 7-8 on page 232: four IO operations are required for a single
data update.


Figure 7-8 Raid 5 write penalty

For a full-stripe write across all disks there is no such penalty; see Figure 7-9 on page 232.

Figure 7-9 Raid 5 entire write

There is also a bottleneck due to the Storage Controller itself: like every storage controller, it
cannot scale perfectly with the number of disks behind it.


Regarding the bandwidth, or MB/s, characteristic, a single disk, SATA or SAS, also has a
limited bandwidth, which differs for read and write access, and the overall bandwidth of the
Storage Controller is not the sum of the individual disk bandwidths, both because of the RAID
overhead and because of the Storage Controller bottleneck. MB/s performance, which relates
to sequential access, depends on the Storage Controller technology and algorithms. Refer to
Chapter 2, “Hardware architecture” on page 41 to review the performance differences
between a configuration with two storage controllers and a configuration with one storage
controller and one storage expansion unit: both configurations have the same number of
disks, but performance is better with two controllers.

Because read and write performance are not identical from a Storage Controller point of view
(which is why storage controllers publish both read and write figures), the read/write ratio can
help you size your SONAS environment more precisely. Even though the IO/s and MB/s
characteristics of a single disk are nominally the same for read and write requests, the RAID
layer adds overhead to write access, even though the algorithms used in the Storage
Controller are designed to perform as well as possible for both reads and writes. This means
that for the exact same capacity and storage configuration, you will see better performance
with a high read/write ratio, simply because your application performs many more reads than
writes.
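To make the effect of the read/write ratio concrete, the following Python sketch applies the textbook small-write penalty (four back-end IOs per host write for RAID 5, six for RAID 6) to an array of disks. It is our own rule-of-thumb model, not a SONAS sizing formula, and it ignores controller caching and other optimizations, so treat the numbers only as an illustration.

# Illustrative host IOPS estimate with a RAID small-write penalty
# (our own simplified model, not a SONAS sizing formula).
def host_iops_estimate(disks, iops_per_disk, read_fraction, write_penalty=4):
    backend_iops = disks * iops_per_disk          # raw back-end capability
    write_fraction = 1.0 - read_fraction
    # Each host read costs 1 back-end IO, each host write costs write_penalty IOs.
    return backend_iops / (read_fraction + write_fraction * write_penalty)

# 60 SATA drives at ~75 IOPS each, with a RAID 6 style penalty of 6:
print(round(host_iops_estimate(60, 75, read_fraction=0.9, write_penalty=6)))  # ~3000
print(round(host_iops_estimate(60, 75, read_fraction=0.5, write_penalty=6)))  # ~1286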

Cache Hit Ratio


As described above, the cache hit ratio describes the reuse potential of your business
application. With a high cache hit ratio, you reuse the data stored in cache more frequently.
Because memory accesses are far more efficient than disk accesses, the more your
application reuses cached data, the faster the access is and the faster the IO completes. To
show the advantage of the caching effect, consider some rough figures (rules of thumb). The
caching effect means the data is in the Interface Node memory. The access time to retrieve
data from server memory is roughly a few microseconds (µs); this is small compared to the
hundreds of microseconds needed to cross the network (1GigE or 10GigE). That means
roughly a few hundred microseconds to access data on a cache hit.

For a cache miss, you have to add the InfiniBand latency to reach the Storage Nodes, which
is also a few microseconds. If you are lucky, the data is in the Storage Controller cache,
which costs a few milliseconds, that is, a few thousand microseconds in total (the GigE
network and InfiniBand latencies are negligible compared to the millisecond-range Storage
Controller cache access). If you are unlucky, the data is on disk, and you need an additional
few tens of milliseconds, that is, tens of thousands of microseconds, to access it. Roughly,
there is a factor of 10 000 between an access served from Interface Node memory and one
that has to go to disk. You can find more information describing the latency impact in
Figure 4.6 on page 153.
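The effect of the cache hit ratio on the average access time can be sketched with the order-of-magnitude latencies quoted above. The following Python snippet is our own illustration; the latency constants are rough assumptions, not measured SONAS values.

# Weighted average access time as a function of the cache hit ratio
# (our own sketch with assumed order-of-magnitude latencies).
CACHE_HIT_US = 300      # hundreds of microseconds: client network + Interface Node memory
DISK_MISS_US = 30000    # tens of milliseconds when the data must come from disk

def average_access_us(hit_ratio, miss_latency_us=DISK_MISS_US):
    return hit_ratio * CACHE_HIT_US + (1.0 - hit_ratio) * miss_latency_us

print(average_access_us(0.9))  # 3270.0 us with a 90% cache hit ratio
print(average_access_us(0.2))  # 24060.0 us with a 20% cache hit ratio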

This is exactly the same concern inside a server CPU: when executing algorithms, data can
be stored in and accessed from memory, or from different levels of cache. Depending on the
CPU architecture you can have up to three levels of cache. The closer the data is to the CPU
(cpu <=> cache level 1 <=> cache level 2 (<=> cache level 3) <=> memory), the faster the
result is. Moreover, a cache miss means that you first have to look for the data before
fetching it from the next level, which wastes extra time and adds a further performance
penalty.

In the SONAS Storage Solution, Interface Nodes are designed to cache data for reuse. Based
on the GPFS file system, Interface Nodes access data from Storage Pods, and more
precisely from Storage Nodes, through the InfiniBand network, and then keep that data in
cache. The connection between a SONAS client and the Interface Node that grants access to
the NFS or CIFS share is kept precisely to give clients the benefit of this caching. The amount
of cache can be increased with the appropriate feature, giving an Interface Node an additional
32 GB of memory on top of the 32 GB in the default configuration.

If your application does a lot of reuse, that is, has a high cache hit ratio, this caching
capability can increase performance, especially with the additional memory feature. But if
this ratio is low, the caching effect will not help you much. Keep in mind that even if your
application has a high cache hit ratio, if you have many applications running, many users
accessing data, or a significant software stack, you may not have enough memory to take
advantage of this caching: if each layer is accessing data on the SONAS Storage Solution,
the Interface Nodes may need to evict application data to cache middleware or user data,
reducing the caching benefit for your business application.

If you do not know your workload characteristics precisely, some methods to obtain them are
described below. This is useful not only for your SONAS sizing but also for understanding the
IO behavior of your daily business application more precisely.

7.3.4 Workload Characteristics measurement


In this section we describe how to measure the different workload characteristics.

Capacity and bandwidth


Your storage administrators should be able to determine fairly easily the amount of storage
your current environment is using. It may take longer if you are currently using an
environment with separate islands of data that have to be managed independently: you will
have to connect to, or at least run commands from, servers on the network to gather the
information, if your storage management software allows it. Another option is Tivoli Storage
Productivity Center: if it is installed and monitoring your storage environment, you may be
able to retrieve this information directly from it.

Regarding the bandwidth, one option is to measure bandwidth from your storage
subsystems, but just like the capacity, you will need to measure it from each individual
storage subsystem. The second option is to measure it from the servers running your
application and any other software layers that may require bandwidth.

Access pattern
Tivoli Storage Productivity Center is again an appropriate option to retrieve any kind of IO
utilization and access information.

If Tivoli Storage Productivity Center is not set up in your current environment, you may
retrieve the read and write information, and possibly the access size, from your storage
subsystems if they include monitoring through a graphical view or CLI. On your servers you
can find this information with tools such as iostat, dstat, netstat, vmstat, and nmon. You can
also ask your application developers for information regarding IO access.

Cache hit ratio


The cache hit ratio is the hardest characteristic to find. The only options are Tivoli Storage
Productivity Center or the monitoring GUI available on your storage subsystems.

However, keep in mind that you may only find partial information. If you are looking from a
storage subsystem point of view, you will find information regarding that particular storage
subsystem, and therefore only for the applications running on it. Likewise, if you are looking
at the application level with your application developers, you may not be aware of the IO
access required by middleware or other software layers.


For all of these characteristics, you can use the nmon and nmon_analyser tools on UNIX
systems (NFS), or the perfmon tool with appropriate counters on Windows (CIFS), to produce
graphical reports. For NFS access you can also use the iostat, dstat, netstat, and vmstat tools.

7.4 Powers of two and powers of ten: the missing space


To avoid miscalculations and surprises it is important that you understand the measurement
units in which computing capacities are expressed. We have bits and bytes, megabytes and
gigabytes. How large is a gigabyte? It depends on how you calculate it. When disk vendors
discuss storage capacity, they usually present it in powers of 10, so that 1 GB is 10^9 (ten to
the power of 9) or 1,000,000,000 bytes. When you format or report on the capacity of a
storage device, the numbers are generally represented in a binary scale based on 2^10, or
1,024 bytes, also termed a kilobyte. Using this notation, 1 GB of space in a file system is
calculated as 1024^3, which is equivalent to 2^30, or 1,073,741,824 bytes. So if you format
your new 1 GB (decimal) drive you will see only 0.93 GB (binary). You are missing 7% of your
space, and the effect gets more pronounced as the capacity grows. The table in Figure 7-10
on page 235 shows how space is calculated using the decimal and binary notations and the
percentage by which they differ, calculated as the binary representation divided by the
decimal representation minus one.

Unit | Decimal | Binary | Difference (binary/decimal - 1)
kilo | 10^3 | 2^10 | 2%
mega | 10^6 | 2^20 | 5%
giga | 10^9 | 2^30 | 7%
tera | 10^12 | 2^40 | 10%
peta | 10^15 | 2^50 | 13%
exa | 10^18 | 2^60 | 15%
zetta | 10^21 | 2^70 | 18%
Figure 7-10 Space difference with binary and decimal notations

Note that at the terabyte scale the difference is around 10%, and it grows to 13% at the
petabyte scale. That is also why a laptop's 60 GB drive shows only around 55 GB of space.
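The decimal versus binary gap is easy to reproduce. The following Python sketch (our own illustration) computes the difference for a few unit prefixes and the laptop-drive example above.

# Difference between binary and decimal units (our own sketch).
def binary_vs_decimal_gap(power_of_ten, power_of_two):
    return 2 ** power_of_two / 10 ** power_of_ten - 1

print(f"{binary_vs_decimal_gap(9, 30):.1%}")   # giga: ~7.4%
print(f"{binary_vs_decimal_gap(12, 40):.1%}")  # tera: ~10.0%
print(f"{binary_vs_decimal_gap(15, 50):.1%}")  # peta: ~12.6%
print(60e9 / 2**30)                            # a 60 GB (decimal) drive is ~55.9 binary GB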

From a SONAS perspective, disk storage space is presented and discussed in decimal
notation, so 60 TB of disk is 60x10^12 bytes of storage. On the other hand, when you format
the disk drives the space is reported using binary notation, so 1 TB is 2^40 bytes. Note that
network capacities and bandwidth, such as the Gbit and 10 Gbit Ethernet adapters, are
expressed in decimal notation, so a 1 Gbit Ethernet link corresponds to 10^9 bits per second,
or 125,000,000 bytes per second.

7.5 Sizing the SONAS appliance


We have looked at SONAS trade-off configurations, and workload characteristics. We will
now look at how to size the appropriate SONAS Storage Solution.

The main goal in this section is to determine:


򐂰 Capacity requirements
򐂰 Storage Subsystem disk type
򐂰 Interface node connectivity and memory configuration


򐂰 Base Rack Model

7.5.1 Capacity requirements


We previously looked at the capacity and bandwidth workload characteristics. According to
these values you first have to size the Storage Capacity of your SONAS appliance.

The minimum SONAS configuration, as described in “Tradeoffs between configurations” on
page 216, is only 60 disks within a single controller. Depending on the disk technology, this
minimum raw usable capacity is 20 TB with SAS disks and up to 93 TB with 2 TB SATA disks;
see Table 7-5.

Table 7-5 Disk type and capacity


Feature Code | Disk Technology | Disk Capacity | Total Disks | Data Disks | Raw Usable Capacity (bytes)
6 x 1300 | SATA | 1 TB | 60 | 48 | 46 540 265 619 456
6 x 1301 | SATA | 2 TB | 60 | 48 | 93 080 531 238 912
6 x 1310 | SAS | 450 GB | 60 | 48 | 20 564 303 413 248
6 x 1311 | SAS | 600 GB | 60 | 48 | 27 419 071 217 664

This minimum configuration can grow up to four times the above capacity if you use a full
Storage Pod and do not mix disk types: that leads to 372 TB with 2 TB SATA disks exclusively,
and a minimum of 82 TB with 450 GB SAS disks exclusively. If you need more capacity, you
have to use additional Storage Expansion Racks (see “Tradeoffs between configurations” on
page 216), each holding up to two Storage Pods.
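For a quick first-pass estimate of how many Storage Pods a required usable capacity translates into, a calculation like the following Python sketch can help. This is our own helper, not a SONAS tool; it simply multiplies the 48 data disks per enclosure by four enclosures per full Storage Pod, so it is slightly optimistic compared with the raw usable capacities in Table 7-5, which also account for formatting and RAID overhead.

# First-pass Storage Pod count estimate (our own sketch, optimistic
# compared with the raw usable capacities in Table 7-5).
import math

DATA_DISKS_PER_ENCLOSURE = 48
ENCLOSURES_PER_FULL_POD = 4   # 2 Storage Controllers + 2 Storage Expansions

def pods_needed(required_tb, drive_tb):
    tb_per_pod = DATA_DISKS_PER_ENCLOSURE * ENCLOSURES_PER_FULL_POD * drive_tb
    return math.ceil(required_tb / tb_per_pod)

print(pods_needed(1000, 2.0))   # about 3 full pods of 2 TB SATA drives for ~1 PB
print(pods_needed(1000, 0.45))  # about 12 full pods of 450 GB SAS drives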

7.5.2 Storage Subsystem disk type


The choice of storage subsystem disk type is more related to the access pattern
considerations from the previous section. As explained above, you can choose between SAS
and SATA technology, and you can even mix them inside the same Storage Pod. Refer to
Chapter 6, “Backup and recovery, availability and resiliency functions” on page 177 for more
information regarding best practices and storage pools. If you are more interested in MB/s
than in IO/s, then SATA drives may be the right option: SATA drives offer good results on
large sequential accesses, both read and write, in comparison to SAS drives, and with more
capacity.

However, if your application is more IO/s oriented, which means small and random accesses,
then SAS drives are the better option. If you do not need a large capacity, SAS drives are a
good option for both random and sequential access, small or large, read or write. If you have
more than one application running on your SONAS environment, you can try a mix of SAS
and SATA drives and create separate shares accordingly. Keep in mind that although you get
less capacity with SAS drives than with SATA drives (refer to Table 7-5 on page 236), you get
better resiliency and additional spare drives.

After you have determined your capacity requirements and designed a draft version of your
SONAS Storage Solution, you may need to do an additional iteration with slight changes
because of the storage subsystem disk types: because SAS drives are smaller, you may
need more Storage Pods for a given capacity.


Then, after considering the overall capacity and the storage subsystem disk types, do a final
iteration that takes backup and restore or disaster recovery considerations into account, to
get a more precise view of the number of Storage Pods required.

7.5.3 Interface node connectivity and memory configuration


You may notice that sometimes we refer to the SONAS bandwidth and sometimes to the
MB/s of the Storage Controller or disks. The bandwidth is also seen from the SONAS users'
point of view, which means from the Interface Nodes' point of view; Interface Nodes have
little to do with classic storage metrics such as MB/s. The bandwidth required by your
applications, users, and middleware determines the Interface Node configuration. On the one
hand, we determined the best Storage Pod configuration in the previous sections; on the
other hand, we now determine the most appropriate Interface Node configuration in order to
size the entire SONAS Storage Solution. Depending on the bandwidth you need, you have
two options: increase the number of Interface Nodes or increase their network connectivity
capability (or do both). The first thing to do is to determine whether your requirement is an
overall bandwidth or a peak bandwidth. If you plan to have many users accessing the SONAS
through multiple shares, accessing data independently but in parallel, then you are more
interested in overall bandwidth. If you plan to access the data hosted by your SONAS Storage
Solution through a few servers running a daily business application that requires a huge
bandwidth, then you are more focused on peak bandwidth.

As previously described, the Interface Node default configuration is two GbE connections to
the public network in an active/failover configuration. This means a single 1 Gb/s connection
for each Interface Node. Moreover, access is via the NFS or CIFS protocol, which can add
extra overhead (the maximum packet size for NFS is 32 KB, for example). The first option is
to double this bandwidth by changing the configuration to aggregate mode, giving a 2 Gb/s
bandwidth (still with NFS or CIFS on top). To increase the overall bandwidth, the simplest
way is to add extra Interface Nodes to your SONAS configuration; you can even add an
Interface Expansion Rack to increase the number of Interface Nodes to the maximum allowed
in a SONAS configuration. If you are more focused on peak bandwidth, the first option is to
add the extra quad-port GbE connectivity feature. This gives a total of six GbE connections,
which can be configured as a single failover/backup configuration (the default, which does
not increase your bandwidth), as three failover/backup pairs, which results in a 3 Gb/s
bandwidth, or as an aggregate configuration, which leads to a 6 Gb/s bandwidth, still with the
NFS and CIFS protocols on top. Another option is to add the 10GbE dual-port adapter, which
can also be configured in failover/backup or aggregate mode; this leads to a 10 Gb/s or
20 Gb/s bandwidth respectively, still with the NFS and CIFS protocols on top. The last option
is to use both additional adapters, which means six GbE connections and two 10GbE
connections. Obviously, if you add these extra features to each Interface Node, you
mechanically increase the overall bandwidth.
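A simple way to turn a target aggregate client bandwidth into a first estimate of the number of Interface Nodes is shown in the following Python sketch. It is our own helper; the per-node figures are the theoretical link speeds of the options just described, and NFS or CIFS protocol overhead will reduce them in practice.

# First estimate of Interface Node count from a target aggregate
# bandwidth (our own sketch using theoretical link speeds).
import math

PER_NODE_GBPS = {
    "2 x 1GbE failover": 1,
    "2 x 1GbE aggregated": 2,
    "6 x 1GbE aggregated (FC 1100)": 6,
    "2 x 10GbE aggregated (FC 1101)": 20,
}

def interface_nodes_needed(target_gbps, option):
    return math.ceil(target_gbps / PER_NODE_GBPS[option])

print(interface_nodes_needed(40, "2 x 1GbE aggregated"))            # 20 nodes
print(interface_nodes_needed(40, "2 x 10GbE aggregated (FC 1101)")) # 2 nodes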

Here again, the above bandwidth considerations lead to a draft Interface Node configuration.
As with the Storage Pods, the cache hit ratio will make you do a second iteration on your
Interface Node configuration. A high cache hit ratio means that your application can reuse
data and take advantage of the SONAS caching capability. To increase this caching potential
you have two options: increase the number of Interface Nodes, or increase the amount of
memory inside each Interface Node.

Based on the capacity and storage subsystem disk type, you have identified a number of
Storage Pods for the SONAS configuration. Based on the Interface Node connectivity and
memory considerations previously discussed, you have identified a number of Interface
Nodes for your SONAS configuration. These two numbers now help you determine the
appropriate Base Rack model. As described in “Rack types - how to choose correct rack for
your solution” on page 61, there are three Base Rack models: the first contains Interface
Nodes, no Storage Pod, and a 36 port Infiniband switch; the second contains Interface
Nodes, no Storage Pods, and a 96 port Infiniband switch; and the third contains both
Interface Nodes and a Storage Pod, with a 36 port Infiniband switch. Depending on the total
number of Storage Pods and Interface Nodes from the three previous sections, you can
determine the total number of Infiniband ports required for your SONAS configuration. Keep
in mind that a single Storage Pod requires two Infiniband ports because it contains two
Storage Nodes.

If the total exceeds what the 36 port Infiniband switches can accommodate, you need the
base rack model with the 96 port Infiniband switch. Here again, SONAS is designed to be a
scale out solution, which means extra storage and extra Interface Nodes can be added when
needed, so you do not have to be extremely precise and exhaustive when configuring your
SONAS. The only requirement is to choose the base rack model, and therefore the Infiniband
switch, carefully, because there is no inexpensive way to change the base rack model later.
You can still order the 96 port Infiniband switch, partially populate it with a single 24 port
Infiniband board line, and scale out later if needed.

7.6 Tools
There are tools available that can be used to help you analyze your workload and, using
workload characteristics, size your SONAS system.

7.6.1 Workload analyzer tools


In this section we describe tools that you can use to help you understand your workload
characteristics.

nmon tool
nmon is a free tool to analyze AIX and Linux performance that gives you a large amount of
information on one screen. Instead of using five or six separate tools, nmon gathers
information such as CPU utilization, memory use, disk I/O rates, transfers and read/write
ratios, free space on file systems, disk adapters, network I/O rates, transfers and read/write
ratios, Network File System (NFS) statistics, and much more, on one screen, and updates it
dynamically.

The nmon tool can also capture the same data into a text file for later analysis and graphing
for reports. The output is in a spreadsheet format (.csv).

As described above, you can use nmon to monitor your environment dynamically, but you
can also capture data into a .csv file and use other tools such as nmon_analyser or
nmon_consolidator to analyze the data and generate graphs or tables.

The aim of nmon_analyser is to take the nmon .csv output files generated during your run as
input and generate an Excel spreadsheet in which each tab gathers information regarding
CPU consumption, memory utilization, or disk usage, and describes the results with charts
and tables.

In a big infrastructure you may need to monitor every node, server, and client. If you need a
big picture instead of one screenshot per node, you may want to gather all the nmon
information for a typical application, a typical run, or a typical subset of nodes. Instead of
nmon_analyser you would then use nmon_consolidator, which is basically the same tool but
consolidates many .csv files into a single Excel spreadsheet document. This can also be
useful in a virtualized environment, where you may need to monitor resources from a host
point of view (RedHat 5.4 host, VMware ESX, or AIX PowerVM™) instead of from a Virtual
Machine point of view. In Figure 7-11 and Figure 7-12 on page 239, you can see a CPU
utilization summary both for a single LPAR (Power virtualization with AIX) and for the entire
system (Power AIX).

Figure 7-11 Single Partition output

Figure 7-12 Entire System output

Links
For more detailed information regarding:
򐂰 nmon tool, refer to the following URL:
http://www.ibm.com/developerworks/aix/library/au-analyze_aix/
򐂰 nmon_analyser refer to:
http://www.ibm.com/developerworks/wikis/display/Wikiptype/nmonanalyser
򐂰 nmon_consolidator refer to:
http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmonconsolidator

perfmon tool
Like the nmon tool suite for Linux and AIX systems, the Windows perfmon tool can be used
to gather and analyze your application workload. Windows operating systems provide the
perfmon (perfmon.exe) utility to collect data. Perfmon allows real-time performance counter
visualization or historical reporting. There are several performance indicators, or counters,
which are grouped into objects.

For instance an object can be:


򐂰 processor,
򐂰 memory,
򐂰 physical disks,
򐂰 network interfaces.

Then each object provides several individual counters, such as:


򐂰 %processor time for processor,
򐂰 Pages read/s for memory,
򐂰 %disk write time for physical disks,
򐂰 current bandwidth for network interfaces.

As explained above, once you have selected the appropriate counters, you can visualize the
results dynamically, or record them for later analysis and reporting in an Excel spreadsheet.
Unlike nmon, you do not need additional tools for the analysis. First launch the perfmon tool,
then generate a data collection, for example during the application execution or during the
whole day. Once the data collection is generated, you can open the generated log file,
visualize it, and even generate a .csv file. Finally, open the generated .csv file with Excel and
create charts and tables as described in Figure 7-13 on page 240 and Figure 7-14 on
page 241.

Figure 7-13 Processor counters


Figure 7-14 Physical Disks counters


Chapter 8. Installation planning


This chapter provides information on the basic installation planning of the SONAS appliance.
It does not include considerations for Tivoli Storage Manager, replication, or ILM.


8.1 Physical planning considerations


In Chapter 7, “Configuration and sizing” on page 215 we looked at the possible SONAS
configurations, and we provided information to size the appropriate SONAS configuration
according to your needs. In this chapter you complete an entire questionnaire in order to
prepare and plan your future SONAS installation.

Before going further, there are critical physical considerations you need to think about. In this
section we provide technical information regarding the physical requirements. As for any IT
solution, you will have to handle physical constraints related to floor load or power
requirements.

All of the following considerations are IBM certified; they are not merely measurements. This
is exactly what is required in your data center.

Space and floor requirements


We described in detail in Chapter 7, “Configuration and sizing” on page 215 all the SONAS
frame model types: Base and Expansion Rack models. You also used these rack models in
“Sizing the SONAS appliance” on page 235 to create the SONAS Storage Solution
configuration that fits all your performance and scale out requirements.

Your SONAS Storage Solution may be the smallest one, which means a single Base Rack
model, or the largest one, which means 17 racks in your data center, including the Base
Rack, the Interface Expansion rack, and 15 Storage Expansion racks.

It will probably be something in between, but whatever the number of SONAS racks you
need, there are floor load requirements to consider inside your data center. Although your
SONAS configuration may contain some half-empty racks because of the mandatory
components (see Chapter 2, “Hardware architecture” on page 41), all weight considerations
assume that all racks are full.

Even before considering the floor requirements, you must ensure that all logistics and setup aspects meet the IBM requirements. You can find these logistics considerations (loading dock, elevator, or shipping containers) and setup considerations (such as the use of a raised floor or rack cabling) in the manual IBM Scale Out Network Attached Storage Introduction and Planning Guide, GA32-0716.

The next step is to identify each SONAS rack model in your configuration and find out the floor load rating of the location where you plan to install the SONAS system. Floor load requirements are critical, so carefully follow the IBM recommendations and ensure that your location meets the minimum used by IBM, which is 342 kg per m² (70 lb per ft²).

If it does, you then need to determine the appropriate weight distribution area by following the IBM rules described in Figure 8-1 on page 245. Remember that the weight distribution areas must not overlap; they are calculated for the maximum weight of the racks (racks full) and for an individual frame model only.


Figure 8-1 Weight distribution area

Assume that the sizing you have done led to a configuration with one Base Rack and two Storage Expansion Racks, which will be set up in the same row in your data center. You must first ensure that all SONAS racks have at least 762 mm (30 in.) of free space in front of and behind them, and, according to the weight distribution areas above, that they also have 155+313=468 mm or 313+313=626 mm between them, as described in Figure 8-3 on page 246.

Figure 8-2 shows the detailed SONAS rack dimensions.

Figure 8-2 Rack dimensions


Figure 8-3 Floor loading example

As described above, the minimum space between a Base Rack (RXA) and a Storage Expansion Rack (RXB) is 468 mm (18.4 in.), whereas the minimum space between two Storage Expansion Racks is 626 mm (24.6 in.).

Power Consumption
Each SONAS Rack has either four intelligent PDUs (iPDUs) or four base PDUs. The iPDUs
can collect energy use information from energy-management components in IBM devices
and report the data to the Active Energy Manager feature of IBM Systems Director for power
consumption monitoring.

Each SONAS rack also requires four 30A line cords, two as primary and two as secondary.

As described in Chapter 2, “Hardware architecture” on page 41, you can configure your SONAS system with either SATA drives, SAS drives, or a combination of both. Keep in mind that SAS and SATA drives do not have the same power consumption requirements. In Figure 8-4 on page 247, you can see the maximum number of storage controllers and disk expansion units you may have inside a single Storage Expansion Rack according to the power consumption requirements.


Figure 8-4 Maximum number of Storage according to SAS drives power consumption

In Figure 8-5, you can find additional information regarding the power consumption measurements done for a heavy usage scenario with fully populated SONAS racks and SAS drives exclusively.

Figure 8-5 Power consumption measurements

Noise
Based on acoustic tests performed for a SONAS system:
򐂰 90 dB was registered for a fully populated Base Rack (2851-RXA) system
򐂰 up to 93 dB was registered in the worst case scenario with a fully populated Storage Expansion Rack (2851-RXB) system

The system operating acoustic noise specification is a declared sound power level (LWAd) of less than 94 dBA at 1 m at 23°C.

However, you can reduce the audible sound level of the components installed in each rack by up to 6 dB with the acoustic doors feature (feature code 6249 of each SONAS rack).

Heat and cooling


To optimize the cooling of your SONAS Storage Solution, you can use a raised floor in combination with perforated tiles to increase air circulation.

You can find more detailed information regarding such a setup in the manual SONAS Introduction and Planning Guide, GA32-0716.

Regarding the temperature and humidity while the system is in use or shut down, refer to Figure 8-6 on page 248.


Figure 8-6 Cooling measurements

8.2 Installation checklist questions


In this section we review the questionnaire you have to fill in to complete your SONAS solution. This information is critical, and we refer to some of these questions in later sections.

First, Table 8-1 gathers questions related to the Management Node configuration, such as the cluster name and domain name. Then, in Table 8-2 on page 249 you will find questions regarding remote access for the Call Home feature.

Next, you provide information on the quorum configuration in Table 8-3 on page 250, CLI credentials in Table 8-4 on page 250, and the node locations in Table 8-5 on page 250.

The last questions refer to the DNS, NAT, and authentication method configurations. You will be prompted to fill in fields related to these topics in Table 8-6 on page 251, Table 8-7 on page 251, and Table 8-8 on page 252.

Table 8-1 Management Node configuration


Question # Field Value Notes

1 Cluster Name This is the name of your IBM SONAS cluster.


Example: sonascluster

2 Domain Name This is your network domain name.
Example: mydomain.com
Note: The Cluster Name and Domain Name are typically used in combination.
Example: sonascluster.mydomain.com

3 Internal IP Address Range Specify 1, 2, or 3. You need to use a predetermined range for your private IP network. This range must not conflict with your existing network configuration, which will be used as the public IP network to access the Management or Interface Nodes.
The available IP address ranges are:
1. 172.31.*.*
2. 192.168.*.*
3. 10.254.*.*
Note: If you are already using the first range, choose the second one; if you are using both, choose the third one.

4 Management console IP address This IP address will be associated with the Management Node. It has to be on the public network and accessible by the storage administrator.

5 Management console gateway This is the numeric gateway of your Management console.

6 Management console subnet mask This is the numeric subnet mask of your Management console.

7 Host name mgmt001st001 This is your preassigned Management Node host name.

8 Root Password You can specify here the password you want to be set on the Management Node for root access. By default it is Passw0rd (where P is capitalized and 0 is zero).

9 NTP Server IP Address SONAS needs to synchronize all nodes inside the cluster with your authentication method, so you have to provide at least one NTP server. A second NTP server is recommended for redundancy.
Note: The Network Time Protocol (NTP) server(s) can be either local or on the internet.
Note: Only the Management Node requires a connection to your NTP server; it then becomes the NTP server for the whole cluster.

10 Time Zone Referring to the time zone list, specify the number corresponding to your location.

11 Number of frames being installed You need to specify here the total quantity of rack frames in this cluster.

In the next table, Table 8-2, you need to provide information regarding the remote configuration of your SONAS in order to enable the Call Home feature.

Table 8-2 Remote configuration


Question # Field Value Notes

12 Company Name

13 Address This is the address where your SONAS is located. Example: Bldg 123, Room 456, 789 N DataCenter Rd, City, State

14 Customer Contact Phone Number In case of a severe issue, this is the primary contact that IBM service will call.

15 Off Shift Customer Contact Phone Number This is the alternate phone number.

16 IP Address of Proxy Server (for Call Home) Optional. You have to provide the IP address of the Proxy Server if it is needed to access the internet for the Call Home feature.

17 Port of Proxy Server (for Call Home) Optional. You have to provide the port of the above Proxy Server if it is needed to access the internet for the Call Home feature.

18 Userid for Proxy Server (for Call Home) Optional. You have to provide the userid of the above Proxy Server if it is needed to access the internet for the Call Home feature.

19 Password for Proxy Server (for Call Home) Optional. You have to provide the password of the above Proxy Server if it is needed to access the internet for the Call Home feature.

In Table 8-3 you will need to provide the quorum topology of your SONAS system.

Table 8-3 Quorum Topology


Question # Field Value Notes

20 Quorum Storage nodes
21 Quorum Interface nodes
1. Your first action is to select an odd number of quorum nodes; you can use both Interface and Storage Nodes.
2. Valid choices are 3, 5, or 7.
3. If your cluster is composed of more than a single frame, you must spread your quorum nodes across several frames.
4. Once you have built the appropriate topology, write the Interface and Storage Node numbers in the table.

In Table 8-4 you have to determine the CLI credentials. Your SONAS administrator will use these credentials to connect to the CLI or GUI in order to manage your entire SONAS Storage Solution.

Table 8-4 CLI credentials


Question # Field Value Notes

22 CLI User ID Your SONAS administrator will use this ID for


GUI or CLI connection, for instance: myuserid

23 CLI Password This is the password corresponding to the User


ID above. Example: mypassword

You fill in Table 8-5 below with the location of all the SONAS nodes in your data center.

Table 8-5 Nodes Location


Question # Node Number Rack number/position Node Serial Number Infiniband Port Number

24 Management Node

25 Interface Node #1

...

26 Storage Node

...


The Rack number is the number of the rack containing this node, whereas the position indicates the position (U) where the node is installed in the rack. The Node Serial Number is the serial number of the node, and the Infiniband Port Number is the InfiniBand switch port number where the node is connected. You do not have to give this information for preinstalled nodes.

The next two tables, Table 8-6 and Table 8-7, gather information regarding your existing DNS and NAT configuration.

Table 8-6 DNS configuration


Question # Field Value Notes

27 IP Address of Domain Name Services (DNS) Server(s) You need to provide here the numeric IP address of one or more Domain Name Services (DNS) servers you are using inside your network. To avoid a bottleneck caused by a single DNS server, and to improve performance, you can set up multiple DNS servers in a round-robin configuration.

28 Domain This is the Domain Name of your cluster (such


as mycompany.com).
Note: This field is not required and may be left
blank. If it is left blank then no Domain name will
be set for the cluster.

29 Search String(s) This is a list of one or more Domain Names to be


used when trying to resolve a shortname
(example: mycompany.com,
storage.mycompany.com,
servers.mycompany.com). Note: This field is not
required and may be left blank. If it is left blank
then no search string will be set for the cluster.

Table 8-7 NAT configuration


Question # Field Value Notes

30 IP Address The numeric IP address requested here is the IP


address needed to access the Management and
Interface Nodes through the internal private
network connections using NAT overloading,
meaning that a combination of this IP Address
and a unique port number will correspond to
each node (Management Node and Interface
Nodes only). This IP Address must not be the
same as the Management Node IP Address or
the Interface Node IP Addresses .

31 Subnet Mask This is the Subnet Mask associated with the IP


Address above.

32 CIDR Equivalent of the Subnet Mask This is the CIDR (/XX) equivalent of the Subnet Mask specified above.

33 Gateway This is the Default Gateway associated with the


IP Address above.


The next step is to provide details of your authentication method in Table 8-8. You will have to integrate your SONAS system into your existing authentication environment, which can be Active Directory (AD) or Lightweight Directory Access Protocol (LDAP).

Table 8-8 Authentication Methods


Question # Field Value Notes

34 Authentication Method [ ] Microsoft Active Directory or [ ] LDAP What is the authentication method you are using in your environment?

35 AD Server IP address In case of an Active Directory configuration you


need to provide the numeric IP address of the
Active Directory server.

36 AD Directory UserID This User ID and the Password below will be


used to authenticate to the Active Directory
server.

37 AD Password This is the password associated to the userid


above.

38-0 LDAP IP Address In case of a LDAP configuration, you need to


provide the numeric IP address of the remote
LDAP server.

38 LDAP SSL Method [ ] Off, [ ] SSL (Secure Sockets Layer), or [ ] TLS (Transport Layer Security) In the case of an LDAP configuration you can choose to use an open (unencrypted) or a secure (encrypted) communication between your SONAS cluster and the LDAP server. For secured communication two methods can be used: SSL or TLS.
Note: When SSL or TLS is used, a security certificate file must be copied from your LDAP server to the IBM SONAS Management Node.

39 LDAP Cluster Name This is the Cluster Name specified in Table 8-1
(example sonascluster)

40 LDAP Domain Name This is the Domain Name specified in Table 8-1
(example mydomain.com)

41 LDAP Suffix
42 LDAP rootdn
43 LDAP rootpw
These are the suffix, rootdn, and rootpw from the /etc/openldap/slapd.conf file on your LDAP server.

44 LDAP Certificate Path If you chose the SSL or TLS method above, you need to provide the path on the IBM SONAS Management Node where you will copy the certificate file.

Once your SONAS has been integrated into your existing environment and the authentication method has been set up accordingly, you will be able to create exports in order to grant access to SONAS users. Before you create these exports, the information in Table 8-9 is required.


Table 8-9 Protocols access


Question # Field Value Notes

45 Protocols [ ] CIFS (Common Internet File System), [ ] FTP (File Transfer Protocol), [ ] NFS (Network File System) These are all the supported protocols which can be used to access data. Check one or more according to your needs.

46 Owner This is the owner of the shared disk space. It can


be a username, or a combination of
Domain\username.
Example: admin1
Example: Domain1\admin1

47 CIFS Options If you need a CIFS share you need to detail


some options. The options are a
comma-separated key-value pair list. Valid CIFS
options are:
browseable=yes
comment="Place comment here"
Example: -cifs browseable=yes,comment="IBM
SONAS"

48 IP Address: / Subnet Mask: / CIDR Equivalent: / Access Options: [ ] ro or [ ] rw; [ ] root_squash or [ ] no_root_squash; [ ] async or [ ] sync
If you need an NFS share you need to provide some NFS options. If NFS options are not specified, the NFS shared disk will not be accessible by SONAS clients. NFS options include a list of client machines allowed to access the NFS shared drive, and the type of access to be granted to each client machine.
Example: -nfs "9.11.0.0/16(rw,no_root_squash,async)"

In the last Table 8-10, you will need to provide details on Interface subnet and network
information.

Table 8-10 Interface Subnet


Question # Field Value Notes

49 Subnet Basically this is the public network. This network will be used for communication between the SONAS Interface Nodes and your application servers. As an example, if you have three Interface Nodes on a single network, with IP addresses from 9.11.136.101 through 9.11.136.103, then your subnet will be 9.11.136.0 and the subnet mask 255.255.255.0 (/24 in CIDR format).

50 Subnet Mask This is the Subnet Mask associated with the


Subnet listed above.

51 CIDR Equivalent of the Subnet Mask This is the Subnet Mask listed above, converted to CIDR format.


52 VLAN ID Optional. This is a list of one or more Virtual LAN


Identifiers. A VLAN ID must be in the range from
1 to 4095. If you do not use VLANs then leave
this field blank.

53 Group Name Optional. This is a name assigned to a network


group. This allows you to reference a set of
Interface Nodes using a meaningful name
instead of a list of IP addresses or host names.
If you do not use network groups then leave this
field blank.

54 Interface Node Number / hostname IP Address, Subnet/Subnet mask, and Gateway (repeat for each Interface Node).

You must complete the tables above prior to any SONAS setup. Some of the information in the tables is critical, for example the authentication method; your SONAS Storage Solution will not work if it is not properly configured.

In this section and the previous one, our main concern was the pre-installation process. In the following sections we assume that your SONAS Storage Solution has been properly preconfigured and set up with all required information. However, your SONAS is not yet ready to use; you will have to complete some additional planning steps regarding the storage and network configuration, and last but not least the authentication method and IP address load balancing configuration.

8.3 Storage considerations


In this section we discuss storage considerations.

8.3.1 Storage
SONAS storage consists of disks grouped in sets of 60 SATA or SAS hard drives. Enclosures with SATA drives are always configured as RAID 6 arrays; enclosures with SAS drives are always configured as RAID 5 arrays. Because of power consumption it is possible to use a maximum of 360 SAS 450 GB or 600 GB drives and 480 SATA 1 or 2 TB drives per rack. This means that the maximum capacity with SATA drives is 14.4 PB and with SAS drives is 2.43 PB.

SONAS supports up to 2,147,483,647 files per file system with a 1 MB block size, and a maximum of approximately 60 million files is supported with async replication. The maximum number of files in a SONAS file system is constrained by the formula:
maximum number of files = (total file system space / 2) / (inode size + subblock size)
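As a rough worked example (the file system size is an assumption chosen only for illustration), consider a 10 TB file system with the default 512 byte inode size and the default 256 KB block size, which implies an 8 KB subblock:

maximum number of files = (10 TB / 2) / (512 bytes + 8,192 bytes)
                        = 5,497,558,138,880 / 8,704
                        ≈ 630 million files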

For file systems that will be doing parallel file creates, if the total number of free inodes is not
greater than 5% of the total number of inodes there is the potential for slowdown in file system
access. Take this into consideration when changing your file system.


8.3.2 Asynch replication considerations


If you are going to use async replication with SONAS, take into consideration that the async replication tool will, by default, create a local snapshot of the file tree being replicated and use that snapshot as the source of the replication to the destination system. This is the preferred method because it creates a well defined point-in-time image of the data being protected against a disaster.

After the successful completion of async replication, the snapshot created in the source file system is removed. However, you have to ensure sufficient storage at the replication source and destination to hold the replica of the source file tree and the associated snapshots. A snapshot is a
space efficient copy of a file system at the point when the snapshot is initiated. The space
occupied by the snapshot at the time of creation and before any files are written to the file
system is a few KB for control structures. There is no additional space required for data in a
snapshot prior to the first write to the file system after the creation of the snapshot. As files
are updated, the space consumed increases to reflect the main branch copy and also a copy
for the snapshot. The cost of this is the actual size of the write rounded up to the size of a file
system block for larger files or the size of a sub-block for small files. In addition, there is a cost
for additional inode space and indirect block space to keep data pointers to both the main
branch and snapshot copies of the data. This cost grows as more files in the snapshot differ
from the main branch, but the growth is not linear because the unit of allocation for inodes is
chunks in the inode file which are the size of the file system sub-block.

After the completion of the async replication, a snapshot of the file system containing the replica target is performed. The impact of snapshots on SONAS capacity depends on the purpose for which the snapshots are used. If the snapshots are used temporarily for the purpose of creating an external backup and are removed afterwards, the impact is most likely not significant for configuration planning. In cases where snapshots are taken frequently for replication, or as a backup to enable users to do an easy restore, the impact cannot be disregarded. The concrete impact depends on the frequency at which snapshots are taken, the length of time each snapshot exists, the number of files in the file system that are changed by users, and the size of the writes/changes.

Back up for disaster recovery purposes


The following list has key implications of using the HSM functionality with file systems that are backed up for disaster recovery purposes with the async replication engine:
򐂰 Source/Destination Primary Storage Capacity - The primary storage on the source and destination SONAS systems should be reasonably balanced in terms of capacity. Since HSM allows for the retention of more data than the primary storage capacity, and async replication is a file based replication, planning must be done to ensure that the destination SONAS system has enough storage to hold the entire contents of the source data (both primary and secondary storage).
򐂰 HSM Management at Destination - If the destination system uses HSM management of the SONAS storage, enough primary storage at the destination should be provided to ensure that the change delta can be replicated into its primary storage as part of the disaster recovery process. If the movement of the data from the destination location’s primary to secondary storage is not fast enough, the replication process can outpace this movement, causing a performance bottleneck in completing the disaster recovery cycle. Therefore, the capacity of the destination system to move data to the secondary storage should be sized to ensure that enough data has been pre-migrated to the secondary storage to account for the next async replication cycle, so that the data can be replicated without waiting for movement to secondary storage. For example, enough TSM managed tape drives need to be allocated and operational, along with enough media.


8.3.3 Block size


The size of data blocks in a SONAS file system is specified at file system creation and cannot be changed without recreating the file system. SONAS offers the following block sizes for file systems: 16 KB, 64 KB, 128 KB, 256 KB (the default value), 512 KB, 1 MB, 2 MB, and 4 MB. The block size defines the maximum size of a single I/O request that SONAS sends to the underlying disk drivers, while a sub-block, which is 1/32 of the block size, defines the minimum amount of space that file data can occupy. File system blocks are divided into sub-blocks for use by files smaller than a full block or for the end of a file where the last block is not fully used. A sub-block, 1/32 of a file system block, is the smallest unit of allocation for file data. As an example, the sub-block size in a file system with the default 256 KB block size is 8 KB. The smallest file will be allocated 8 KB of disk space, and files greater than 8 KB but smaller than 256 KB will be rounded up to a multiple of 8 KB sub-blocks. Larger files will use whole data blocks. The file system attempts to pack multiple small files into a single file system block, but it does not implement a guaranteed best-fit algorithm, for performance reasons.
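As an illustration of sub-block rounding (the file sizes are arbitrary examples), with the default 256 KB block size and its 8 KB sub-block:
򐂰 a 2 KB file occupies one 8 KB sub-block
򐂰 a 20 KB file is rounded up to three sub-blocks, that is 24 KB
򐂰 a 300 KB file occupies one full 256 KB block plus six 8 KB sub-blocks (48 KB), for a total of 304 KB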

A storage controller in SONAS is configured to use a 32 KB chunk size; this value is preconfigured and cannot be changed. This means that, for example, the default SONAS block size (256 KB) is divided across the 8 data disks in each RAID array and written to 8 different drives in 32 KB chunks. Each RAID array in SONAS consists of 10 disks, 8 of which are available for data, because a SATA RAID array consists of 8+P+Q drives and a SAS RAID array consists of 8+P+spare drives. Figure 8-7 shows how SONAS writes a block of data with the default block size.

Figure 8-7 How SONAS writes data to disks (a 256 KB block is split into 32 sub-blocks of 8 KB and written as eight 32 KB chunks across the data drives of a RAID array, plus parity)


In file systems with a wide variation in file sizes, using a small block size would have a large impact on performance when accessing large files. For this kind of system it is suggested that you use a block size of 256 KB (8 KB sub-block). Even if only 1% of the files are large, the amount of space taken by the large files usually dominates the amount of space used on disk, and the waste in the sub-blocks used for small files is usually insignificant. Larger block sizes, up to 1 MB, are often a good choice when the performance of large files accessed sequentially is the dominant workload for the file system. The effect of block size on file system performance largely depends on the application I/O pattern. A larger block size is often beneficial for large sequential read and write workloads. A smaller block size is likely to offer better performance for small file, small random read and write, and metadata-intensive workloads. The efficiency of many algorithms that rely on caching file data in a page pool depends more on the number of blocks cached than on the absolute amount of data. For a page pool of a given size, a larger file system block size means fewer blocks cached. Therefore, when you create file systems with a block size larger than the default of 256 KB, it is recommended that you increase the page pool size in proportion to the block size. Data is cached in Interface Node memory, so it is important to correctly plan the RAM size of the Interface Nodes.

8.3.4 File system overhead and characteristics


There are two classes of file system overhead in the SONAS file system. The first is the basic overhead of a file system and the overhead required to manage an amount of storage; this includes disk headers, basic file system structures, and allocation maps for disk blocks. The second is the space required in support of user usage of the file system; this includes user directories plus the inodes and indirect blocks for files and potential files. Both classes of metadata are replicated in a SONAS system for fault tolerance.

The system overhead depends on the number of LUNs and the size of the LUNs assigned to a file system, but is typically on the order of a few hundred MB or less per file system. The metadata in support of usage can be far higher, but is largely a function of usage. The cost of directories is entirely a function of usage and file naming structures. A directory costs at least the minimum file size for each directory, and more if the number of entries is large. For a 256 KB block size file system, the minimum directory size would be 8 KB. The number of directory entries per directory block varies with customer usage. For example, if the average directory contained 10 entries, the directory cost per file would be about 800 bytes (8 KB divided by 10 entries). This number is doubled for metadata replication.

The cost of inodes is a function of how the file system is configured. By default, SONAS is configured with 50 million inodes preallocated and a maximum allowed inodes value of 100 million. By default, an inode requires 512 bytes of storage. The defaults therefore require about 50 GB of storage for inodes (512 bytes * 50 million * 2 for replication). If the user actually had 50 million files with an average directory holding 10 files, the cost for directories would be about 80 GB. Higher density directories would require less space for the same number of files. There may also be a requirement for some amount of space for indirect blocks for larger files. These two categories dominate the overhead for a file system; there are other minor usages such as recovery logs or message logs.
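To summarize the default overhead estimate as a small worked calculation (the figure of 10 entries per directory is the assumption used in the text above):

inode space     = 50,000,000 inodes x 512 bytes x 2 (replication) ≈ 51 GB
directory space = 50,000,000 files x 800 bytes x 2 (replication)  ≈ 80 GB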

8.3.5 SONAS master file system


A file system should be assigned as the master file system to avoid a split-brain scenario from the viewpoint of the Clustered Trivial Database (CTDB). A split-brain scenario occurs when two nodes within SONAS lose communication with each other (for example, the network breaks); in this case, without a master file system it is not possible to decide which node should be the recovery master. Without the master file system mechanism, in case of a failure, an internal part of the SONAS data may be corrupted. CTDB requires a shared file system so that all nodes in the cluster access the same lock file. This mechanism assures that only one part of the cluster in a split-brain scenario stays up and running and can access the data. The recovery master
checks the consistency of the cluster and, in case of failure, performs the recovery process. Only one node at a time can act as the recovery master. Which node is designated the recovery master is decided by an election process in the recovery daemons running on each node. The SONAS master file system may be shared with your data. It is not possible to unmount or delete a file system that is the master file system, but you can remove the master flag from a file system and then unmount it, for example, for maintenance reasons. It is highly recommended not to remain without a master file system for a longer period of time than absolutely necessary; during this time the system is exposed to a possible split-brain scenario and data corruption, so the master file system should be reactivated as soon as possible. Setting and unsetting the master file system can be done only with the CLI.

8.3.6 Failure groups


SONAS allows you to organize your hardware into failure groups. A failure group is a set of
disks that share a common point of failure that could cause them all to become
simultaneously unavailable. SONAS software can provide RAID 1 mirroring at the software
level. In this case failure groups are defined which are duplicates of each other, defined to
reside on different disk subsystems. In the event that a disk subsystem failed and could not
be accessed, SONAS software will automatically switch to the other half of the failure group.
Expansion racks with storage pods may be moved away from each other up to the length of the InfiniBand cables. Currently the longest available cable is 50 m. This means that, for example, you can stretch the cluster, move two Storage Expansion Racks to a distance of 50 m from each other, and create a mirror at the failure group level between these two racks.

With failure of a single disk, if you have not specified multiple failure groups and replication of
metadata, SONAS will not be able to continue because it cannot write logs or other critical
metadata. If you have specified multiple failure groups and replication of metadata, the failure
of multiple disks in the same failure group will put you in the same position. In either of these
situations, GPFS will forcibly unmount the file system. It is recommended to replicate at least
metadata between two storage pods, so you have to create two failure groups for two storage
pods.

8.3.7 Setting up storage pools


A storage pool is a collection of disks with similar properties which provide a specific quality of service for a specific use, such as to store all files for a particular application or a specific business division. Using storage pools, you can create tiers of storage by grouping storage devices based on performance or reliability characteristics. For example, one pool could be an enterprise class storage system that hosts high-performance SAS disks and another pool might consist of a set of economical SATA disks. The disks in a pool are managed together as a group, and storage pools provide a means to partition the file system’s storage. There are two types of storage pools:
򐂰 system storage pool (exists by default)
A storage pool that contains the system metadata (system and file attributes, directories, indirect blocks, symbolic links, the policy file, configuration information, and metadata server state) that is accessible to all metadata servers in the cluster. Metadata cannot be moved out of the system storage pool. The system storage pool is allowed to store user data, and by default user data goes into the system storage pool unless a placement policy is activated. The system storage pool cannot be removed without deleting the entire file system. Disks inside the system pool can be deleted as long as there is at least one disk assigned to the system pool, or enough disks with space to store the existing metadata. The system storage pool contains metadata, so you should use the fastest and most reliable disks, for reasons such as
better performance of the whole SONAS file system and failure protection. There can be only one system pool per file system, and this pool is required.
򐂰 user storage pool
Up to 7 user storage pools can be created per file system. A user storage pool does not contain metadata; it only stores data, so disks that are assigned to a user storage pool can only have the usage type “data only”.

A maximum of 8 storage pools per file system can be created, including the required system storage pool. The storage pool is an attribute of each disk and is specified as a field in each disk descriptor when the file system is created or when a disk is added to an existing file system.

SONAS offers internal storage pools and external storage pools. Internal storage pools are managed within SONAS. External storage pools are managed by an external application such as Tivoli Storage Manager; SONAS manages the movement of data to and from external storage pools. SONAS provides integrated automatic tiered storage (Information Lifecycle Management, ILM) and an integrated global policy engine to enable centralized management of files and file sets in one or multiple logical storage pools. This flexible arrangement allows file based movement down to a per-file basis if needed (refer to 3.6.1, “SONAS - Using the central policy engine and automatic tiered storage” on page 107).
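As an illustration only, placement rules for the policy engine are written in the GPFS policy language; the pool names and the file name pattern below are hypothetical:

RULE 'scratch' SET POOL 'userpool1' WHERE UPPER(NAME) LIKE '%.TMP'
RULE 'default' SET POOL 'system'

The first rule places newly created temporary files in a user pool, and the last rule sends all other files to the system storage pool.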

8.4 SONAS integration into your network


In this section we describe how to integrate your new SONAS system into your existing network environment. This network integration first requires a user authentication method to grant access to SONAS, then planning of the public and private networks, and finally the configuration of an IP address load balancing mechanism.

8.4.1 Authentication using AD or LDAP


You can use your existing authentication environment to grant user access to SONAS. SONAS supports the following authentication method configurations:
򐂰 Microsoft Active Directory
򐂰 Lightweight Directory Access Protocol (LDAP)
򐂰 LDAP With MIT Kerberos
򐂰 SAMBA primary domain controller (PDC).

However SONAS does not support multiple authentication methods running in parallel. The
rule is only one type of authentication method at any given time.

When a user attempts to access the IBM SONAS, he enters a user ID and password. The
user ID and password are sent across the customer's network to the remote Authentication
and Authorization server, which compares the user ID and password to valid user ID and
password combinations in its local database. If they match, the user is considered
Authenticated. The remote server sends a response back to the IBM SONAS, confirming that
the user has been Authenticated and providing Authorization information.

Authentication is the process to identify a user, while Authorization is the process to grant
access to resources to the identified user.


We provide more detail on the AD and LDAP configuration in Chapter 9, “Installation and configuration” on page 273, but briefly, through the command line interface it can be performed with the cfgad/cfgldap and chkauth commands.

MS Active Directory
One method for user authentication is to communicate with a remote Authentication and
Authorization server running Microsoft Active Directory software. The Active Directory
software provides Authentication and Authorization services.

For the cfgad command you need to provide some information, such as the Active Directory server IP address and the cluster name. This information was requested in Table 8-8 on page 252; here we need the answers to questions #35 to #37.

Through the Command Line Interface run the following command, see Example 8-1 on
page 260.

Example 8-1 cfgad command example

cfgad -as <ActiveDirectoryServerIP> -c <clustername>.<domainname> -u <username> -p <password>

where:
򐂰 <ActiveDirectoryServerIP> is the IP Address of the remote Active Directory server as
specified in Table 8-8 on page 252, question #35.
򐂰 <clustername> is the Cluster Name as specified in Table 8-1 on page 248, question #1.
򐂰 <domainname> is the Domain Name as specified in Table 8-1 on page 248, question #2.
򐂰 <username> is the Active Directory User ID as specified in Table 8-8 on page 252, question #36.
򐂰 <password> is the Active Directory Password as specified in Table 8-8 on page 252, question #37.
Example: cli cfgad -as 9.11.136.116 -c sonascluster.mydomain.com -u aduser -p
adpassword

To check if this cluster is now part of the Active Directory domain, run the following command; see Example 8-2:

Example 8-2 chkauth command example

cli chkauth -c <clustername>.<domainname> -t

where:
򐂰 <clustername> is the Cluster Name specified in Table 8-1 on page 248, question #1.
򐂰 <domainname> is the Domain Name as specified in Table 8-1 on page 248, question #2.

Example: cli chkauth -c sonascluster.mydomain.com -t

If the above cfgad command was successful, in the output from the chkauth command you
will see 'CHECK SECRETS OF SERVER SUCCEED' or a similar message.


LDAP
Another method for user authentication is to communicate with a remote Authentication and
Authorization server running Lightweight Directory Access Protocol (LDAP) software. The
LDAP software provides Authentication and Authorization services.

For the cfgldap command you need to provide some information, such as the LDAP server IP address and the cluster name. This information was requested in Table 8-8 on page 252; here we need the answers to questions #38 to #44.

Through the Command Line Interface run the following command, see Example 8-3:

Example 8-3 cfgldap command example


cfgldap -c <cluster name> -d <domain name> -lb <suffix> -ldn <rootdn> -lpw
<rootpw> -ls <ldap server> -ssl <ssl method> -v
where:
򐂰 <cluster name> is the Cluster Name as specified in Table 8-8 on page 252, Question #39
򐂰 <domain name> is the Domain Name as specified in Table 8-8 on page 252, Question
#40
򐂰 <suffix> is the suffix as specified in Table 8-8 on page 252, Question #41
򐂰 <rootdn> is the rootdn as specified in Table 8-8 on page 252, Question #42
򐂰 <rootpw> is the password for access to the remote LDAP server as specified in Table 8-8
on page 252, Question #43
򐂰 <LDAP Server IP> is the IP Address of the remote LDAP server as specified in Table 8-8 on page 252, Question #38-0.
򐂰 <ssl method> is the SSL method as specified in Table 8-8 on page 252, Question #38

Example: cli cfgldap -c sonascluster -d mydomain.com -lb "dc=sonasldap,dc=com"


-ldn "cn=Manager,dc=sonasldap,dc=com" -lpw secret -ls 9.10.11.12 -ssl tls -v

To check that the cluster can now authenticate against the LDAP server, run the command described in Example 8-2 on page 260.


8.4.2 Planning IP addresses


In this section we briefly describe the public and private IP addresses, in order to avoid any kind of conflict during SONAS utilization. For more details regarding these networks, both private and public, see “Understanding the IP Addresses for Internal Networking” on page 286.

In Table 8-1 on page 248, Question #3, you were prompted for an available internal IP address range.

As described in Chapter 2, “Hardware architecture” on page 41, SONAS is made up of three different networks. The public network is used by SONAS users and administrators to access the Interface Nodes and the Management Node, respectively. The two other networks are the private network, or management network, which is used by the Management Node to manage the whole cluster, and the data network, or InfiniBand network, on top of which the SONAS file system is built. These last two networks, private and data, are not used by SONAS users or administrators, but as they coexist on all nodes with the public network, you should ensure that you do not use the same address ranges, in order to avoid IP conflicts.

There are only three choices for the private network range. The default setting for the private IP addresses is the range 172.31.*.*, but you may already use this particular range in your existing environment, in which case the 192.168.*.* range may be more appropriate. Similarly, if you are already using both the 172.31.*.* and 192.168.*.* ranges, then the range 10.254.*.* should be used as the private network instead.

To determine the IP address ranges currently in use in your data center, ask your network administrators.

8.4.3 Data access and IP address balancing


We now highlight the information required to set up the SONAS IP address balancing. This IP balancing is handled both by the DNS and by the CTDB layer.

In this section we will show you how the CTDB layer works, in coordination with the DNS, to
provide SONAS users an access to data.

As you noticed in the installation checklist questions section, some details regarding your DNS configuration are required. With this information you are able to set up the connection between your DNS and your SONAS. For data access through the client network, SONAS users mount exports via the CIFS, NFS, or FTP protocols.

As the SONAS Storage Solution has also been designed to be a good candidate for cloud storage, accessing SONAS data should be as transparent as possible from a technical point of view. Basically, your SONAS users do not have to know or even understand how data is accessed; they just access it.

As mentioned above, this process works thanks to an appropriate DNS configuration and the CTDB layer. First, the DNS is responsible for routing SONAS user requests to the Interface Nodes in a round-robin manner, which means that two consecutive requests access data through two distinct Interface Nodes.
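The round-robin behavior is obtained by defining a single DNS host name with several address records, one per Interface Node public IP address. The following is only a sketch in BIND-style zone file syntax, using the host name from the checklist and the example IP addresses used in the figures below:

sonascluster    IN    A    10.10.10.1
sonascluster    IN    A    10.10.10.2
sonascluster    IN    A    10.10.10.3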

In the tables in “Installation checklist questions” on page 248, we used the sonascluster.mydomain.com DNS host name as an example. For consistency we keep the same one in the following diagrams. These diagrams describe step by step the DNS and CTDB mechanisms in a basic environment. This environment is composed of three Interface Nodes, one DNS server, and two active clients, one running a Linux
Operating System and the other one running a Windows Operating System. Again for consistency we also represent the Management Node and the Storage Pods, even though they do not have any impact on the DNS and CTDB mechanisms. The FTP client is also shown as a reminder of the third data access protocol in use in SONAS.

Assume that the first SONAS user, running the Linux operating system, wants to mount an NFS share on his workstation. He runs a mount command with the sonascluster.mydomain.com DNS host name, as described in the top left corner of Figure 8-8 on page 263. This request is caught by the DNS server (step 1), which looks in its list of IP addresses and, in a round-robin way, selects the appropriate Interface Node (step 2), sending the answer back to the Linux SONAS user (step 3). The connection between the first SONAS user and that Interface Node is then established, as you can see from the dashed arrow in Figure 8-8 on page 263.
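On the Linux workstation, the mount command looks similar to the following sketch; the export path and mount point are hypothetical and depend on the exports you have defined:

mount -t nfs sonascluster.mydomain.com:/ibm/gpfs0/export1 /mnt/sonas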

Figure 8-8 SONAS user accessing data with NFS protocol

Assume now a second SONAS user, who also needs to access data hosted on the SONAS Storage Solution, using the CIFS protocol from a Windows laptop. He runs a net use command (or uses the Map Network Drive tool) with the same sonascluster.mydomain.com DNS host name, as you can see in Figure 8-9 on page 264.
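On the Windows laptop, the command looks similar to the following sketch; the share name and drive letter are hypothetical:

net use Z: \\sonascluster.mydomain.com\export1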

This second request will be caught here again by the DNS server which, in a round robin way,
will assign to this second user the next IP address. Then steps 1 to 3 are repeated as
described in Figure 8-9 on page 264. The final connection between the second SONAS user
and the Interface Node is then established, see the new dashed arrow on the right.


Figure 8-9 SONAS user accessing data with CIFS protocol

Connections between SONAS users and Interface Nodes remain active until the shares are unmounted by the SONAS users, or until an Interface Node fails.

In case of an Interface Node failure, the IP address balancing is handled by the CTDB layer. To handle Interface Node failures, the CTDB layer works with a table. Briefly, this table is recreated as soon as a new event happens; an event can be an Interface Node failure or recovery. The table entries are Interface Node identifiers and public IP addresses.

In the example shown in Figure 8-10 on page 265, the SONAS has been configured in such a way that the CTDB has a table with three Interface Node identifiers and three public IP addresses for SONAS users.


Figure 8-10 CTDB table with three Interface Nodes and three public IP addresses

In our environment we have three Interface Nodes #1, #2 and #3 and three IP addresses.
The CTDB table has been created with these entries:
򐂰 #1, #2, #3
򐂰 10.10.10.1, 10.10.10.2, 10.10.10.3

From the CTDB point of view:


򐂰 #1 is responsible for 10.10.10.1.
򐂰 #2 is responsible for 10.10.10.2.
򐂰 #3 is responsible for 10.10.10.3.

With your two SONAS users connected as in Figure 8-10 above, only the first two Interface Nodes are used. The first Interface Node is using the 10.10.10.1 IP address while the second one is using 10.10.10.2, according to the CTDB table. In case of a failure of the first Interface Node, which was in charge of the 10.10.10.1 IP address, this IP address is then handled by the last Interface Node, as described in Figure 8-11 on page 266.

From the CTDB point of view you now have:


򐂰 #2 is responsible for 10.10.10.2.
򐂰 #3 is responsible for 10.10.10.3 and 10.10.10.1.


Figure 8-11 IP address failover to the third Interface Node after the first Interface Node fails

As you can see in Figure 8-11 above, the first NFS SONAS user now has an active connection to the last Interface Node.

This is basically how CTDB handles the IP address balancing: your DNS handles the round-robin distribution while CTDB is in charge of the IP failover.

However, in the above example there is a potential load balancing bottleneck in case of the failure of one Interface Node. Assume a third user accesses the SONAS through the FTP protocol, as described in Figure 8-12 on page 267; the connection is established, shown with the last dashed arrow, on the third Interface Node. The first NFS user is still connected to the SONAS through the first Interface Node, the second CIFS user is connected through the second Interface Node, and the last FTP user is accessing the SONAS through the third Interface Node (the DNS here again handed out the next IP address).


Figure 8-12 FTP user connected through the third Interface Node

You may notice that, from here on, all incoming users will be assigned to Interface Nodes one, two, or three in the same way because of the DNS round-robin configuration. You may have, for example, four users connected to each Interface Node, as described in Figure 8-13 on page 268.


Figure 8-13 Four users connected to each Interface Node

The bottleneck we mentioned earlier appears if one Interface Node fails. The IP address handled by the failing Interface Node migrates, with all its users and their workload, to another Interface Node according to the CTDB table. You then have one Interface Node handling a single IP address and four user workloads (the second Interface Node), and the third Interface Node handling two IP addresses and eight user workloads, as described in Figure 8-14 on page 269.


Figure 8-14 Unbalanced workload after an Interface Node failure

The original overall SONAS user workload was equally load balanced between the three Interface Nodes, 33% of the workload each. After the Interface Node crash, and with the above CTDB configuration, the workload is now 33% on the second Interface Node and 66% on the third Interface Node.

To avoid this, a simple approach is to create more IP addresses than there are Interface Nodes. In our example, six IP addresses, two per Interface Node, would be more appropriate, as shown in Figure 8-15 on page 270.


Figure 8-15 Six public IP addresses spread across three Interface Nodes

In that case the original CTDB table is:

򐂰 #1 is responsible for 10.10.10.1 and 10.10.10.4
򐂰 #2 is responsible for 10.10.10.2 and 10.10.10.5
򐂰 #3 is responsible for 10.10.10.3 and 10.10.10.6

In case of a failure, the failing Interface Node, previously in charge of two IP addresses, offloads its first IP address to the second Interface Node and its second IP address to the third Interface Node. Below is the new CTDB table:
򐂰 #2 is responsible for 10.10.10.1, 10.10.10.2, and 10.10.10.5
򐂰 #3 is responsible for 10.10.10.3, 10.10.10.4, and 10.10.10.6

The result is a 50-50% workload spread across the two remaining Interface Nodes after the crash, as described in Figure 8-16 on page 271.


Figure 8-16 Workload spread evenly across the two remaining Interface Nodes

Once the first Interface Node is back again, this is a new event and the new CTDB table is once again:
򐂰 #1 is responsible for 10.10.10.1 and 10.10.10.4
򐂰 #2 is responsible for 10.10.10.2 and 10.10.10.5
򐂰 #3 is responsible for 10.10.10.3 and 10.10.10.6

This means the traffic is then load balanced across the three Interface Nodes again.

8.5 Attachment to customer applications


This section summarizes what you need to keep in mind before integrating your SONAS into your existing infrastructure and starting to use it.

8.5.1 Redundancy
SONAS, as explained in this book, has been designed to be a highly available storage solution. This high availability relies on hardware redundancy and on software high availability with GPFS and CTDB. But as you plan to integrate SONAS into your own infrastructure, you have to ensure that all external services and equipment are also highly available. Your SONAS needs an Active Directory server (or LDAP) for authentication: is this authentication server redundant? The same question applies to the NTP and DNS servers. From a hardware point of view, do you have redundant power? Are there redundant network switches for the public network?


8.5.2 Share access


As described in the previous section on data access and IP address balancing, you have to attach your SONAS to your existing DNS and use a DNS round-robin configuration in order to load balance the SONAS user IP requests across all Interface Nodes (beware that this is not workload load balancing). However, for specific reasons you may want to use the IP address directly instead of the DNS host name. Regarding the CTDB layer, the previous section showed you how to configure your public IP network and CTDB in order to load balance the workload from one failed Interface Node to the remaining ones. Typical SONAS use is to map one SONAS user to a single Interface Node in order to take advantage of the caching inside the Interface Node, but for some particular reason you may need to use the same CIFS share twice from the same SONAS user (through two drive letters) and thus use two Interface Nodes. However, you should not do this with NFS shares; because of the NFS design, the NFS protocol needs to send metadata to different NFS services, which may be located on two separate nodes in such a configuration.

8.5.3 Caveats
If you plan to migrate your existing environment and business applications to a SONAS Storage Solution, be aware that NAS storage is not always the most appropriate option. If your business application currently writes or reads data on a locally attached solution (DAS), a network-based storage solution will, by design, significantly increase latency. Similarly, if your application performs a huge number of writes, even small ones, on a locally attached solution, it will quickly overload your network switches. A workaround for these issues is, first, to use caching on the client side to reduce the impact of the network on performance, and to combine I/O requests on the client side in order to reduce the number of I/O operations sent over the network. You may also modify your application to be more tolerant of packet loss or timeouts due to the IP protocol, and make it retry.

8.5.4 Backup considerations


There are also good practices for backing up your storage. First, stop your application cleanly in order to have consistent data, then take a snapshot, restart your application, and use the snapshot as the source for the backup process.


Chapter 9. Installation and configuration


This chapter provides information on the basic installation and configuration of your SONAS
appliance.


9.1 Pre-Installation
At this point, you have completed your IBM SONAS purchase and it has been delivered. You are now ready to proceed with the installation and integration of your SONAS appliance.
1. Review the floor plan and pre-installation planning sheet to determine whether all
information has been provided.
2. If the pre-installation planning sheet is not complete, contact the Storage Administrator.
This information will be required through the rest of the installation, and the install cannot
start until the Preinstallation Planning Sheet is done.
3. The IBM authorized service provider will perform all the necessary preliminary planning
work including verifying the information in the planning worksheets in order to make sure
you are well aware of the specific requirements like physical environment or networking
environment for the SONAS system.

9.2 Installation
Installation of a SONAS appliance consists of both hardware installation and software installation.

9.2.1 Hardware installation


This section provides a high level overview of the tasks to complete the SONAS hardware
installation.

The IBM SONAS appliance must be unpacked and moved to the desired location. As shipped from IBM manufacturing, the appliance already has all the connections to the nodes inside the rack in place. The internal connections include the InfiniBand connections through which the nodes communicate with each other.

The IBM authorized service provider performs the following tasks:

1. Builds and assembles the hardware components into the final SONAS system.
2. Checks the InfiniBand switches to ensure that all storage locations are ready to use.
3. Connects the expansion rack, if required.
4. Loads the software stack on each node of the rack.
5. Loads the disk drive modules into the storage drawers.
6. Powers on the storage controllers, storage expansions, KVM switch, and display module.
7. Powers on the Management Node.

9.2.2 Software installation


After the hardware is ready, the IBM authorized service provider begins the software installation process. During this process the script first_time_install is run.

The initial steps require you to provide the IBM authorized service provider with the configuration information needed to set up the internal network and to connect the Management Node to your network. Refer to the planning tables in Chapter 8, “Installation planning” on page 243 for the following:


1. Cluster name
2. Management console IP address
3. Management Node gateway address
4. Management Node subnet mask
5. Root password
6. NTP server IP address

After you have provided the required parameters, the script asks you to power on all the nodes.

Once all the nodes are powered on, the configuration script first detects the Interface Nodes. Review the list of Interface Nodes on the screen to determine whether the ID, Frame, Slot, and Quorum settings are correct.

The Storage Nodes are then detected and configured in a similar way. Review the list of Storage Nodes to determine whether the ID, Frame, Slot, and Quorum settings are correct.

Check the health of the Management Node, Interface Nodes, and Storage Nodes to ensure that the Management Node can communicate with the Interface Nodes and Storage Nodes.

9.2.3 Check health of the Node hardware


The IBM authorized service provider uses a script that checks the health of the Management Node, Interface Nodes, and Storage Nodes. It ensures that the Management Node can communicate with the Interface Nodes and the Storage Nodes.

The verify_hardware_wellness script checks the node health. It searches for the nodes, checks the Ethernet connections to each node, and displays the results.

If the check is successful, it displays the number of nodes it detected. Compare this list with the list of nodes in the pre-installation planning sheet. If the number of nodes displayed is correct, type Y and press Enter. If the number of nodes displayed is not correct, type N and press Enter.

If the check detects a problem, it displays one or more error messages. Refer to the Problem Determination and Troubleshooting Guide, GA32-0717, for detailed information.

9.2.4 Additional hardware health checks


This procedure checks the health of the Ethernet switches, InfiniBand switches, and storage drawers.

The IBM authorized service provider runs the command cnrsscheck. This command runs all the checks and displays the results.

Review the results of the checks and verify that each check has a status of OK. For any problems reported by this command, refer to the Problem Determination and Troubleshooting Guide, GA32-0717.


9.3 Post Installation


At the end of the software installation, the SONAS system has a fully configured clustered file system (GPFS) with all the disks configured for the file system, the Management GUI running, and the CLI running.

Create a CLI user ID using the CLI command mkuser. In order to run CLI commands on the SONAS appliance, add the cluster to the CLI interface using the command addcluster.

The SONAS appliance is now ready for further configuration.

9.4 Software Configuration


The software configuration can be performed either by IBM personnel as an additional service offering or by you, as the system administrator of the SONAS appliance. It is carried out after the hardware and software installation. The pre-installation planning sheets in Chapter 8, “Installation planning” on page 243 require you to fill in the administrative details and environment. These details include information regarding the network and the Microsoft Active Directory or LDAP server. The software configuration procedure uses a series of CLI commands.

We describe the procedure as a high level overview below; a condensed command sketch follows the list:

1. Verify that the nodes are ready by checking their status with the command lscurrnode and ensuring that they are in the Ready state. This command also displays the role of each node along with its status.
2. Configure the Cluster Manager (CTDB).
3. Create the failure group and file system (GPFS) using the commands chdisk and mkfs.
4. Configure the DNS server IP addresses and domain using the command setnwdns.
5. Configure the NAT gateway using the command mknwnatgateway.
6. Integrate with an authentication server, such as Active Directory (AD) or LDAP, with either cfgad (and optionally cfgsfu), cfgldap, or cfgnt4.
7. Configure the data path IP addresses, the network group, and the attachment of the IP addresses using the commands mknw, mknwgroup, and attachnw respectively.
8. Connect a client workstation through a configured export to verify that the system is reachable remotely and that the interfaces work as expected.
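The following is a condensed sketch of that command sequence, using values from the sample environment in 9.5, “Sample environment”. The disk lists, node lists, IP addresses, and authentication parameters are placeholders that you replace with the values from your planning sheets; the detailed commands and their output are shown in the sections that follow.

lscurrnode                                # 1. verify that all nodes are in the ready state
cfgcluster Furby                          # 2. configure the Cluster Manager (CTDB)
chdisk <disk_list> --failuregroup 2       # 3. create a second failure group ...
mkfs gpfs0 /ibm/gpfs0 -F <disk_list> --master -R meta --nodmapi    #    ... and the GPFS file system
setnwdns 9.11.136.132,9.11.136.116 --search storage3.tucson.ibm.com            # 4. DNS servers
mknwnatgateway 9.11.137.246/23 ethX0 9.11.136.1 172.31.128.0/17 <node_list>    # 5. NAT gateway
cfgad -as 9.11.136.132 -c Furby.storage.tucson.ibm.com -u Administrator -p <password>    # 6. AD authentication
mknw 9.11.136.0/23 0.0.0.0/0:9.11.136.1 --add 9.11.137.10,9.11.137.11          # 7. data path IP addresses ...
mknwgroup int <node_list>                 #    ... network group ...
attachnw 9.11.136.0/23 ethX0 -g int       #    ... attached to the Interface Node group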


9.5 Sample environment


Let us now take an example and go through the steps for installation and configuration of a
SONAS appliance.

Consider the following setup:
򐂰 Hardware considerations: the rack contains one Management Node, six Interface Nodes, two Storage Nodes, switches, and InfiniBand connections.
򐂰 Software considerations: AD/LDAP is already configured on an external server, and the file system and export information is already available.

The cluster name in this example is: Furby.storage.tucson.ibm.com

9.5.1 Initial hardware installation


As mentioned in Section 9.2.1, “Hardware installation” on page 274, the IBM authorized service provider prepares the SONAS system. The racks are assembled and the nodes are interconnected so that they can communicate with each other. The software code is installed on the nodes.

The cluster configuration data should already be present in the pre-installation planning sheet. The Management Node is then powered on, keeping the other nodes shut down. The script first_time_install is then run, which configures the cluster.

The following screen captures show some of the steps during the installation procedure carried out by the IBM authorized service provider. Figure 9-1 shows the options that are displayed when the first_time_install script is run.

Figure 9-1 Sample of script first_time_install being run


As the script proceeds, it asks for the configuration parameters to configure the cluster.
These details include the Management Node IP, Internal IP Range, Root Password, Subnet
IP Address, NTP Server IP and more.

The Interface Nodes and the Storage Nodes are then powered on. The script checks for these nodes and identifies them according to the configuration provided during the installation procedure.

Figure 9-2 shows the detection of Interface Nodes and Storage Nodes when powered on.

Figure 9-2 The script detecting the Interface Node and Storage Node

Figure 9-3 on page 279 and Figure 9-4 on page 279 show the assignment of IDs for the Interface Nodes and Storage Nodes in the cluster. This step also includes deciding whether or not each node is a quorum node.


Figure 9-3 Identifying the sequence of the Interface Nodes and assigning Quorum Nodes

Figure 9-4 Identifying sequence of Storage Nodes and assigning QuorumNodes


The next screen, shown in Figure 9-5, shows the configuration of the cluster, where each of the Interface and Storage Nodes is added as part of the cluster and the cluster nodes are prepared to communicate with each other.

Figure 9-5 Cluster being configured with all the Nodes

The screen shown in Figure 9-6 on page 281 shows the end of the script after which the
cluster has been successfully configured.


Figure 9-6 Cluster now being created and first_time_install script completes

The health of the system is then checked. The IBM authorized service provider logs in to the Management Node and runs the health check commands.

The verify_hardware_wellness script checks the connectivity between the Management Node, Interface Nodes, and Storage Nodes. The command cnrsscheck is then run to check the health of the Ethernet switches, InfiniBand switches, and storage drawers, to verify that the nodes have their roles assigned correctly, and to confirm that they are able to communicate with each other. Example 9-1 shows the command output for our example cluster setup.

Example 9-1 Running script verify_hardware_wellness and cnrsscheck to check the overall Health of cluster created.
# verify_hardware_wellness
[NFO] [2010-04-21 15:59:06] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
[NFO] [2010-04-21 15:59:06] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
3 minutes.
[NFO] [2010-04-21 16:00:54] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
minutes.
[NFO] [2010-04-21 16:04:10] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
1 minutes.
[NFO] [2010-04-21 16:04:18] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
[NFO] [2010-04-21 16:04:28] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
Discovery results:

There are 6 interface nodes.


There are 2 storage nodes.
There is 1 management node.

Is this configuration correct? (y/n): y


Hardware configuration verified as valid configuration.

[root@Humboldt.mgmt001st001 ~]# cnrssccheck --nodes=all --checks=all


vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv mgmt001st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
================================================================================
Run checks on mgmt001st001
It may take a few minutes.

EthSwCheck ... OK
IbSwCheck ... OK
NodeCheck ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:07:57+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: mgmt001st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0
Degrades: 0
Failures: 0
Offlines: 0
================================================================================
Ethernet Switch status:

Verify Ethernet Switch Configuration (Frame:1, Slot:41) OK


Verify Ethernet Switch Hardware (Frame:1, Slot:41) OK
Verify Ethernet Switch Firmware (Frame:1, Slot:41) OK
Verify Ethernet Switch Link (Frame:1, Slot:41) OK
Verify Ethernet Switch Configuration (Frame:1, Slot:42) OK
Verify Ethernet Switch Hardware (Frame:1, Slot:42) OK
Verify Ethernet Switch Firmware (Frame:1, Slot:42) OK
Verify Ethernet Switch Link (Frame:1, Slot:42) OK
================================================================================
InfiniBand Switch status:

Verify InfiniBand Switch Configuration (Frame:1, Slot:35) OK


Verify InfiniBand Switch Hardware (Frame:1, Slot:35) OK
Verify InfiniBand Switch Firmware (Frame:1, Slot:35) OK
Verify InfiniBand Switch Link (Frame:1, Slot:35) OK
Verify InfiniBand Switch Configuration (Frame:1, Slot:36) OK
Verify InfiniBand Switch Hardware (Frame:1, Slot:36) OK
Verify InfiniBand Switch Firmware (Frame:1, Slot:36) OK
Verify InfiniBand Switch Link (Frame:1, Slot:36) OK
================================================================================
Node status:

Verify Node General OK


================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mgmt001st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv strg001st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv


================================================================================
Run checks on strg001st001
It may take a few minutes.

FcHbaCheck ... OK
DdnCheck ... OK
DdnLogCollector ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:10:46+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: strg001st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0
Degrades: 0
Failures: 0
Offlines: 0
================================================================================
DDN Disk Enclosure status:

Verify Disk Enclosure Configuration (Frame:1, Slot:1) OK


Verify Disk in Disk Enclosure (Frame:1, Slot:1) OK
Verify Disk Enclosure Hardware (Frame:1, Slot:1) OK
Verify Disk Enclosure Firmware (Frame:1, Slot:1) OK
Verify Array in Disk Enclosure (Frame:1, Slot:1) OK
================================================================================
FibreChannel HBA status:

Verify Fibre Channel HBA Configuration (Frame:1, Slot:17, Instance:0) OK


Verify Fibre Channel HBA Firmware (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Configuration (Frame:1, Slot:17, Instance:1) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:17, Instance:1) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:17, Instance:1) OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ strg001st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv strg002st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv


================================================================================
Run checks on strg002st001
It may take a few minutes.

FcHbaCheck ... OK
DdnCheck ... OK
DdnLogCollector ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:13:26+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: strg002st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================


================================================================================
Summary of NON-OK Statuses:
Warnings: 0
Degrades: 0
Failures: 0
Offlines: 0
================================================================================
DDN Disk Enclosure status:

Verify Disk Enclosure Configuration (Frame:1, Slot:1) OK


Verify Disk in Disk Enclosure (Frame:1, Slot:1) OK
Verify Disk Enclosure Hardware (Frame:1, Slot:1) OK
Verify Disk Enclosure Firmware (Frame:1, Slot:1) OK
Verify Array in Disk Enclosure (Frame:1, Slot:1) OK
================================================================================
FibreChannel HBA status:

Verify Fibre Channel HBA Configuration (Frame:1, Slot:19, Instance:0) OK


Verify Fibre Channel HBA Firmware (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Configuration (Frame:1, Slot:19, Instance:1) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:19, Instance:1) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:19, Instance:1) OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ strg002st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note: All the commands run for the configuration of SONAS are run as root. You can either export the PATH variable to include the CLI path or run the commands from the CLI directory. In our example we change to the CLI directory by running:

# cd /opt/IBM/sofs/cli

At the end of the hardware installation, the cluster is created. The IBM authorized service provider then creates a CLI user and adds the cluster to the GUI, as shown in Example 9-2.

Example 9-2 Creating a new CLI user using CLI command mkuser
[root@furby.mgmt001st001 cli]# mkuser -p Passw0rd cliuser
EFSSG0019I The user cliuser has been successfully created.

[root@furby.mgmt001st001 cli]# addcluster -h int001st001 -p Passw0rd


EFSSG0024I The cluster Furby.storage.tucson.ibm.com has been successfully added

You need to enable the license as shown in Example 9-3; the cluster is then ready for the rest of the software configuration.

Example 9-3 Enabling License.


[root@furby.mgmt001st001 cli]# enablelicense
EFSSG0197I The license was enabled successfully!


9.5.2 Initial software configuration


The initial software configuration is a series of CLI commands run either by IBM personnel as an additional service offering or by you, as the system administrator of the SONAS appliance. The procedure is explained below. To start, log in to the Management Node using the root user ID and the root password, and make sure the cluster has been added to the Management Interface.

Verify that the Nodes are Ready


Before you begin the configuration, make sure that the cluster has been added to the Management Interface and that the nodes are in the ready state. Verify this using the command lsnode, as shown in Example 9-4, and confirm that the command output displays all the nodes correctly and that each node has OK as its connection status.

Example 9-4 Verifying that the nodes are all ready by running CLI command lsnode.
[root@furby.mgmt001st001 cli]# lsnode -v
Hostname IP Description Role Product Version Connection stat
int001st001 172.31.132.1 interface 1.1.0.2-7 OK
int002st001 172.31.132.2 interface 1.1.0.2-7 OK
int003st001 172.31.132.3 interface 1.1.0.2-7 OK
int004st001 172.31.132.4 interface 1.1.0.2-7 OK
int005st001 172.31.132.5 interface 1.1.0.2-7 OK
int006st001 172.31.132.6 interface 1.1.0.2-7 OK
mgmt001st001 172.31.136.2 management 1.1.0.2-7 OK
strg001st001 172.31.134.1 storage 1.1.0.2-7 OK
strg002st001 172.31.134.2 storage 1.1.0.2-7 OK

Note: The actual command output displayed on the screen has many more fields than shown in this example. The example has been simplified to ensure that the important information is clear.

Now check the state of the nodes using the command lscurrnode, as shown in Example 9-5.

Example 9-5 Running command lscurrnode to check node state.


[root@furby.mgmt001st001 cli]# lscurrnode
Node ID Node type Node state Management IP Address Infiniband IP address
int001st001 Interface ready 172.31.4.1 172.31.132.1
int002st001 Interface ready 172.31.4.2 172.31.132.2
int003st001 Interface ready 172.31.4.3 172.31.132.3
int004st001 Interface ready 172.31.4.4 172.31.132.4
int005st001 Interface ready 172.31.4.5 172.31.132.5
int006st001 Interface ready 172.31.4.6 172.31.132.6
mgmt001st001 Management ready 172.31.8.1 172.31.136.1
strg001st001 Storage ready 172.31.6.1 172.31.134.1
strg002st001 Storage ready 172.31.6.2 172.31.134.2

Note: The actual command output displayed on the screen has many more fields than shown in this example. The example has been simplified to ensure that the important information is clear.

Column 3 in Example 9-5 displays the state of the node. Verify that the state of each Node is
Ready.


9.5.3 Understanding the IP Addresses for Internal Networking


For internal networking, there are two networks: the management network and the InfiniBand network. As you can see in the example above, the management and InfiniBand addresses are in the 172.31.*.* range, which is chosen from the three available options. Refer to Chapter 8, “Installation planning” on page 243.

For our example, we have chosen the 172.31.*.* IP address range. While the first two parts of the IP address remain constant, the last two parts vary.

Management IP Range
This is the network that the Management Node uses to send management data to the Interface Nodes and Storage Nodes. It is a private network that is not reachable by outside clients. No file data is transferred on this network, only management related communication, such as commands or management information passed from the Management Node to the Interface Nodes and Storage Nodes. You can read more in Chapter 4, “Networking considerations” on page 139.

From Example 9-5 on page 285, you can see that the management IP range is 172.31.4.* for Interface Nodes, 172.31.8.* for the Management Node, and 172.31.6.* for Storage Nodes. These addresses are assigned by the install script while creating the SONAS cluster.

As you can see, the first two parts of the IP address are constant, and the management IP address is then assigned depending on the node type:
򐂰 Interface Node: 172.31.4.*
򐂰 Management Node: 172.31.8.*
򐂰 Storage Node: 172.31.6.*

The last part of the IP address is incremented sequentially depending on the number of Interface Nodes and Storage Nodes. At the time of writing, only a single Management Node is supported.

InfiniBand IP Range
This is the network range used for data transfer between the Interface Nodes and the Storage Nodes. Like the management network, it is a private network and not reachable by outside clients. Refer to Chapter 9, “Installation and configuration” on page 273.

From Example 9-5 on page 285, you can see that the InfiniBand IP range is 172.31.132.* for Interface Nodes, 172.31.136.* for the Management Node, and 172.31.134.* for Storage Nodes. These addresses are assigned by the install script while creating the SONAS cluster.

As before, the first two parts of the IP address are constant, and the InfiniBand IP address is assigned depending on the node type:
򐂰 Interface Node: 172.31.132.*
򐂰 Management Node: 172.31.136.*
򐂰 Storage Node: 172.31.134.*

9.5.4 Configure the Cluster Manager (CTDB)


The Cluster Manager manages much of the SONAS cluster. It is an integral part of the SONAS appliance and holds some of the most important configuration data of the cluster.


CTDB acts as the Cluster Manager for the SONAS appliance. More information about CTDB can be found in the appendix “CTDB” on page 476.

The SONAS Cluster Manager (CTDB) is configured using a CLI command on the Management Node.

The command requires you to provide a public cluster name, which is the name used to advertise the cluster to the neighbouring network, for example to a Windows client machine. This name is limited to 15 ASCII characters without spaces or special characters, as shown in Example 9-6.

Example 9-6 Configuring the Cluster Manager using CLI command cfgcluster.
[root@furby.mgmt001st001 cli]# cfgcluster Furby
Are you sure to initialize the cluster configuration ?
Do you really want to perform the operation (yes/no - default no): yes
(1/6) - Prepare CIFS configuration
(2/6) - Write CIFS configuration on public nodes
(3/6) - Write cluster manager configuration on public nodes
(4/6) - Import CIFS configuration into registry
(5/6) - Write initial configuration for NFS,FTP,HTTP and SCP
(6/6) - Restart cluster manager to activate new configuration
EFSSG0114I Initialized cluster configuration successfully

The command prompts, “Do you really want to perform the operation?” Type yes and
press Enter to continue.

Verify that the cluster has been configured by running the lscluster command. This command displays the CTDB cluster name that was used to configure the Cluster Manager. The output of the command is shown in Example 9-7; the public cluster name is Furby.storage.tucson.ibm.com.

Example 9-7 Verifying the cluster details using CLI command lscluster.
[root@furby.mgmt001st001 cli]# lscluster
ClusterId Name PrimaryServer SecondaryServ
12402779238926611101 Furby.storage.tucson.ibm.com strg001st001 strg002st001

9.5.5 List all available Disks


The available disks can be checked using the CLI command lsdisk. The output of the command is shown in Example 9-8.

Example 9-8 Listing the available disks using CLI command lsdisk.
[root@furby.mgmt001st001 cli]# lsdisk
Name File system Failure group Type Pool Status
array0_sata_60001ff0732f8548c000000 1 system ready
array0_sata_60001ff0732f8568c020002 1 system ready
array0_sata_60001ff0732f8588c040004 1 system ready
array0_sata_60001ff0732f85a8c060006 1 system ready
array0_sata_60001ff0732f85c8c080008 1 system ready
array0_sata_60001ff0732f85e8c0a000a 1 system ready
array1_sata_60001ff0732f8558c010001 1 system ready
array1_sata_60001ff0732f8578c030003 1 system ready
array1_sata_60001ff0732f8598c050005 1 system ready


array1_sata_60001ff0732f85d8c090009 1 system ready


array1_sata_60001ff0732f85f8c0b000b 1 system ready
array1_sata_60001ff0732f8608c0f000c 1 system ready

9.5.6 Adding a second Failure Group


As you can see in Example 9-8, all the disks are in the default failure group, which is assigned at the time of cluster creation. To enable replication of data on the file system, more than one failure group must be available.

In our example we create the file system with replication enabled, and therefore we change the failure group of some of the disks that will be part of the file system. The chdisk command allows you to modify the failure group property of a disk. See Example 9-9.

Example 9-9 Changing Failure Group of disks using CLI command chdisk.
[root@furby.mgmt001st001 cli]# chdisk
array1_sata_60001ff0732f8558c010001,array1_sata_60001ff0732f8578c030003,array1_sata_60001ff0732f
8598c050005,array1_sata_60001ff0732f85d8c090009,array1_sata_60001ff0732f85f8c0b000b,
array1_sata_60001ff0732f8608c0f000c --failuregroup 2

You can verify the changed failure groups using the command lsdisk, as seen in 9.5.5, “List all available Disks” on page 287. Example 9-10 displays the output after changing the failure groups.

Example 9-10 Verifying the changed Failure Groups of disks using CLI command lsdisk.
[root@furby.mgmt001st001 cli]# lsdisk
Name File system Failure group Type Pool Status
array0_sata_60001ff0732f8548c000000 1 system ready
array0_sata_60001ff0732f8568c020002 1 system ready
array0_sata_60001ff0732f8588c040004 1 system ready
array0_sata_60001ff0732f85a8c060006 1 system ready
array0_sata_60001ff0732f85c8c080008 1 system ready
array0_sata_60001ff0732f85e8c0a000a 1 system ready
array1_sata_60001ff0732f8558c010001 2 dataAndMetadata system ready
array1_sata_60001ff0732f8578c030003 2 dataAndMetadata system ready
array1_sata_60001ff0732f8598c050005 2 dataAndMetadata system ready
array1_sata_60001ff0732f85d8c090009 2 dataAndMetadata system ready
array1_sata_60001ff0732f85f8c0b000b 2 dataAndMetadata system ready
array1_sata_60001ff0732f8608c0f000c 2 dataAndMetadata system ready

9.5.7 Create the GPFS File System


The underlying clustered file system that SONAS uses is IBM GPFS. Use the CLI command mkfs to create the file system. Example 9-11 shows how to create the file system using this command. Note that we do not use all the available disks; we use four disks from failure group 1 and four disks from failure group 2.

Example 9-11 Creating the root Files system using CLI command mkfs.
[root@furby.mgmt001st001 cli]# mkfs gpfs0 /ibm/gpfs0 -F
array0_sata_60001ff0732f8548c000000,array0_sata_60001ff0732f8568c020002,array0_sata_60001ff0732f


8588c040004,array0_sata_60001ff0732f85a8c060006,array1_sata_60001ff0732f8558c010001,array1_sata_
60001ff0732f8578c030003,array1_sata_60001ff0732f8598c050005,array1_sata_60001ff0732f85d8c090009
--master -R meta --nodmapi
The following disks of gpfs0 will be formatted on node strg001st001:
array0_sata_60001ff0732f8548c000000: size 15292432384 KB
array0_sata_60001ff0732f8568c020002: size 15292432384 KB
array0_sata_60001ff0732f8588c040004: size 15292432384 KB
array0_sata_60001ff0732f85a8c060006: size 15292432384 KB
array1_sata_60001ff0732f8558c010001: size 15292432384 KB
array1_sata_60001ff0732f8578c030003: size 15292432384 KB
array1_sata_60001ff0732f8598c050005: size 15292432384 KB
array1_sata_60001ff0732f85d8c090009: size 15292432384 KB
Formatting file system ...
Disks up to size 141 TB can be added to storage pool 'system'.
Creating Inode File
0 % complete on Wed Apr 21 16:36:30 2010
1 % complete on Wed Apr 21 16:37:08 2010
2 % complete on Wed Apr 21 16:37:19 2010
3 % complete on Wed Apr 21 16:37:30 2010
5 % complete on Wed Apr 21 16:37:35 2010
9 % complete on Wed Apr 21 16:37:40 2010
13 % complete on Wed Apr 21 16:37:45 2010
18 % complete on Wed Apr 21 16:37:50 2010
23 % complete on Wed Apr 21 16:37:55 2010
27 % complete on Wed Apr 21 16:38:00 2010
32 % complete on Wed Apr 21 16:38:05 2010
37 % complete on Wed Apr 21 16:38:10 2010
42 % complete on Wed Apr 21 16:38:15 2010
46 % complete on Wed Apr 21 16:38:20 2010
51 % complete on Wed Apr 21 16:38:25 2010
56 % complete on Wed Apr 21 16:38:30 2010
61 % complete on Wed Apr 21 16:38:35 2010
66 % complete on Wed Apr 21 16:38:40 2010
70 % complete on Wed Apr 21 16:38:45 2010
75 % complete on Wed Apr 21 16:38:50 2010
80 % complete on Wed Apr 21 16:38:55 2010
84 % complete on Wed Apr 21 16:39:00 2010
89 % complete on Wed Apr 21 16:39:05 2010
94 % complete on Wed Apr 21 16:39:10 2010
99 % complete on Wed Apr 21 16:39:15 2010
100 % complete on Wed Apr 21 16:39:16 2010
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool 'system'
20 % complete on Wed Apr 21 16:39:33 2010
38 % complete on Wed Apr 21 16:39:38 2010
57 % complete on Wed Apr 21 16:39:43 2010
74 % complete on Wed Apr 21 16:39:48 2010
92 % complete on Wed Apr 21 16:39:53 2010
100 % complete on Wed Apr 21 16:39:55 2010
Completed creation of file system /dev/gpfs0.
EFSSG0019I The filesystem gpfs0 has been successfully created.

EFSSG0038I The filesystem gpfs0 has been successfully mounted.


EFSSG0140I Applied master role to file system gpfs0

EFSSG0015I Refreshing data ...

In Example 9-11 on page 288, the file system gpfs0 is created with replication of the metadata enabled, and therefore it uses disks from two failure groups. The second failure group was created in Example 9-9 on page 288. The file system is also marked as the master file system.

The master file system is a unique file system in the SONAS appliance. It holds the shared information that is used by the Cluster Manager (CTDB).

You can verify the creation of the file system using the command lsfs. Example 9-12 displays the output for the newly created file system.

Example 9-12 Verifying the creation of file system using CLI command lsfs.
[root@furby.mgmt001st001 cli]# lsfs
Cluster Devicename Mountpoint
Furby.storage.tucson.ibm.com gpfs0 /ibm/gpfs0

Note: The actual information displayed on the screen has many more fields than are shown in Example 9-12 and is too large to show here. The example has been simplified to ensure that the important information is clear.

The command lsdisk now shows the list of disks used for the gpfs0 file system, as shown in Example 9-13.

Example 9-13 Verifying the disks used for the file system created using CLI command lsdisk
lsdisk
Name File system Failure group Type Pool Status
array0_sata_60001ff0732f8548c000000 gpfs0 1 dataAndMetadata system ready up
array0_sata_60001ff0732f8568c020002 gpfs0 1 dataAndMetadata system ready up
array0_sata_60001ff0732f8588c040004 gpfs0 1 dataAndMetadata system ready up
array0_sata_60001ff0732f85a8c060006 gpfs0 1 dataAndMetadata system ready up
array1_sata_60001ff0732f8558c010001 gpfs0 2 dataAndMetadata system ready up
array1_sata_60001ff0732f8578c030003 gpfs0 2 dataAndMetadata system ready up
array1_sata_60001ff0732f8598c050005 gpfs0 2 dataAndMetadata system ready up
array1_sata_60001ff0732f85d8c090009 gpfs0 2 dataAndMetadata system ready up
array0_sata_60001ff0732f85c8c080008 1 system ready
array0_sata_60001ff0732f85e8c0a000a 1 system ready
array1_sata_60001ff0732f85f8c0b000b 2 dataAndMetadata system ready
array1_sata_60001ff0732f8608c0f000c 2 dataAndMetadata system ready

As you can see, the disks specified at file system creation are now part of the file system (gpfs0 in the example), which includes disks from both failure groups.

9.5.8 Configure the DNS Server IP addresses and domains


The SONAS appliance must be configured with the IP addresses of the Domain Name Service (DNS) servers and the domains. These IP addresses are also called the public IP addresses, which are accessible on your network. Only the Management Node and the Interface Nodes are accessible on your network.


DNS can be configured using the CLI command setnwdns. We use three examples to explain the command.

In the first example, the setnwdns command is run with a single DNS server with IP address 9.11.136.116 and no domain or search string (see Example 9-14).

Example 9-14 Configuring DNS with only DNS server IP.


[SONAS]$ setnwdns 9.11.136.116

In the second example, the setnwdns command is run with a single DNS server with IP address 9.11.136.116, a domain name of storage.ibm.com, and a single search string, servers.storage.ibm.com (see Example 9-15).

Example 9-15 Configuring DNS with DNS server IP, domain name and Search string.
[SONAS]$ setnwdns 9.11.136.116 --domain storage.ibm.com --search servers.storage.ibm.com

In the third example, the setnwdns command is run with multiple DNS servers with IP addresses 9.11.136.116 and 9.11.137.101, a domain name of storage.ibm.com, and multiple search strings, servers.storage.ibm.com and storage.storage.ibm.com (see Example 9-16 on page 291).

Example 9-16 Configuring DNS with DNS server IP, domain name and multiple search strings
[SONAS]$ setnwdns 9.11.136.116,9.11.137.101 --domain storage.ibm.com --search
servers.storage.ibm.com,storage.storage.ibm.com

For our example cluster setup, we use setnwdns with three search string options, storage3.tucson.ibm.com, storage.tucson.ibm.com, and sonasdm.storage.tucson.ibm.com, as shown in Example 9-17. Here our DNS server IPs are 9.11.136.132 and 9.11.136.116.

Example 9-17 Configuring DNS with DNS server IPs and multiple search strings using CLI command setnwdns
[root@furby.mgmt001st001 cli]# setnwdns 9.11.136.132,9.11.136.116 --search
storage3.tucson.ibm.com,storage.tucson.ibm.com,sonasdm.storage.tucson.ibm.com

To verify that the DNS server IP addresses and domains have been successfully configured, check the content of the resolv.conf file on each Management and Interface Node. Keep in mind that the Management Node and Interface Nodes are the only nodes accessible from your network, and hence only these nodes are used to set up DNS. The steps to verify the DNS configuration are shown in Example 9-18.

Example 9-18 Verifying that the DNS has been successfully configured
[root@furby.mgmt001st001]$ onnode all cat /etc/resolv.conf

>> NODE: 172.31.132.1 <<


search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.2 <<


search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132


nameserver 9.11.136.116

>> NODE: 172.31.132.3 <<


search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.4 <<


search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.5 <<


search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.6 <<


search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.136.2 <<


search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

In Example 9-18 on page 291, the SONAS setup has one Management Node and six Interface Nodes. The DNS server IPs used are 9.11.136.132 and 9.11.136.116, and three search strings, storage3.tucson.ibm.com, storage.tucson.ibm.com, and sonasdm.storage.tucson.ibm.com, are used.

The Management Node IP is: 172.31.136.2 and Interface Node IPs are: 172.31.132.* as
described in 9.5.3, “Understanding the IP Addresses for Internal Networking” on page 286.

9.5.9 Configure the NAT Gateway


Network Address Translation (NAT) is a technique used with network routers. The SONAS appliance Interface Nodes talk to each other using private IP addresses on a network that is not accessible from your network. The public IP addresses are the addresses through which you can access the Management Node as well as the Interface Nodes. Hence, the Management Node and Interface Nodes have a private IP address for internal communication and a public IP address for external communication. NAT allows a single IP address on your network, a public IP address, to be used to access the Management Node and Interface Nodes on their private network IP addresses.

The network router converts the IP address and port on your network to a corresponding IP address and port on the private network. This IP address is not a data path connection and is not used for reading or writing files. It is used to provide a path from the Management Node and Interface Nodes to your network for the authorization and authentication process.

The CLI command mknwnatgateway is used to configure NAT on the SONAS appliance. Example 9-19 shows how NAT is configured using this CLI command.


Example 9-19 Setting up the NAT gateway using CLI command mknwnatgateway.
[root@furby.mgmt001st001]$ mknwnatgateway 9.11.137.246/23 ethX0 9.11.136.1 172.31.128.0/17
mgmt001st001,int001st001,int002st001,int003st001,int004st001,int005st001,int006st001
EFSSG0086I NAT gateway successfully configured.

As you can see in Example 9-19, the public NAT gateway IP is 9.11.137.246/23, the interface is ethX0, the default gateway is 9.11.136.1, the private network is 172.31.128.0/17, and the nodes specified are the Management Node and the six Interface Nodes. This means that all the Management and Interface Nodes talk to the outside world on their public IP through the NAT gateway.

Confirm that NAT has been configured using the CLI command lsnwnatgateway, as shown in Example 9-20.

Example 9-20 Verifying that the NAT gateway has been successfully configured using CLI command lsnwnatgateway.
[root@furby.mgmt001st001]$ lsnwnatgateway
Public IP Public interface Default gateway Private network Nodes
9.11.137.246/23 ethX0 9.11.136.1 172.31.128.0/17,
172.31.136.2,172.31.132.1,172.31.132.2,172.31.132.3,172.31.132.4,172.31.132.5,172.31.132.6

Another way to check that the NAT gateway has been successfully configured is to verify that the Management Node and Interface Nodes can ping the specified gateway (see Example 9-21).

Example 9-21 Verifying that all the Nodes of the cluster can ping the NAT Gateway
onnode all ping -c 2 9.11.137.246

>> NODE: 172.31.132.1 <<


PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.022 ms

--- 9.11.137.246 ping statistics ---


2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.022/0.028/0.034/0.006 ms

>> NODE: 172.31.132.2 <<


PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.034 ms

--- 9.11.137.246 ping statistics ---


2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.034/0.034/0.035/0.005 ms

>> NODE: 172.31.132.3 <<


PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.029 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.023 ms

--- 9.11.137.246 ping statistics ---


2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.023/0.026/0.029/0.003 ms


>> NODE: 172.31.132.4 <<


PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.024 ms

--- 9.11.137.246 ping statistics ---


2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.024/0.026/0.028/0.002 ms

>> NODE: 172.31.132.5 <<


PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.022 ms

--- 9.11.137.246 ping statistics ---


2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.022/0.028/0.035/0.008 ms

>> NODE: 172.31.132.6 <<


PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.016 ms

--- 9.11.137.246 ping statistics ---


2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.016/0.026/0.036/0.010 ms

>> NODE: 172.31.136.2 <<


PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.020 ms

--- 9.11.137.246 ping statistics ---


2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.020/0.023/0.027/0.006 ms

A successful ping shows that the NAT gateway has been successfully configured.

9.5.10 Configure Authentication - AD and LDAP


SONAS requires that users accessing the appliance are authorized and authenticated. You can choose to use Active Directory (AD) or LDAP for authentication and authorization. SONAS supports both authentication methods and has equivalent CLI commands for their configuration.

When users access IBM SONAS, they are required to enter their user ID and password. This user ID and password pair is sent across the network to the remote authentication/authorization server, which compares the user ID and password to the valid user ID and password combinations in its database. If they match, the user is considered to be authenticated. The remote server then sends a response to SONAS confirming that the user has been successfully authenticated.


Note:
1. Authentication is the process of verifying the identity of the user: users confirm that they are indeed who they claim to be. This is typically accomplished by verifying the user ID and password.
2. Authorization is the process of determining what the user is allowed to access. A user may have permission to access certain files but not others. This is typically controlled by ACLs.

The sections below describe the configuration of an Active Directory (AD) server and of LDAP in detail. You choose one of the two authentication methods.

Configure using Active Directory (AD)


The CLI command cfgad allows you to configure the AD server. After the configuration, check whether it has been successful using the CLI command chkauth. See Example 9-22 for the command usage; in our example, the AD server has IP address 9.11.136.132 and we join the domain with the AD Administrator user.

Example 9-22 Configuring using Windows AD using CLI command cfgad.


[root@furby.mgmt001st001]$ cfgad -as 9.11.136.132 -c Furby.storage.tucson.ibm.com -u
Administrator -p Ads0nasdm
(1/11) Parsing protocol
(2/11) Checking node accessibility and CTDB status
(3/11) Confirming cluster configuration
(4/11) Detection of AD server and fetching domain information from AD server
(5/11) Checking reachability of each node of the cluster to AD server
(6/11) Cleaning previous authentication configuration
(7/11) Configuration of CIFS for AD
(8/11) Joining with AD server
(9/11) Configuration of protocols
(10/11) Executing the script configADForSofs.sh
(11/11) Write auth info into database
EFSSG0142I AD server configured successfully

Now verify that the cluster is part of the Active Directory (AD) domain using the command shown in Example 9-23:

Example 9-23 Verifying that the Windows AD server has been successfully configured.
[root@furby.mgmt001st001]$ chkauth -c Furby.storage.tucson.ibm.com -t
Command_Output_Data UID GID Home_Directory Template_Shell
CHECK SECRETS OF SERVER SUCCEED

Configure using Lightweight Directory Access Protocol (LDAP)


The CLI command cfgldap allows you to configure the LDAP server. After the configuration, check whether it has been successful using the CLI command chkauth. See Example 9-24 for the command usage. In Example 9-24, the LDAP server (-ls) is sonaspb29, and the other parameters that LDAP requires are: the suffix (-lb) "dc=sonasldap,dc=com", the rootdn (-ldn) "cn=Manager,dc=sonasldap,dc=com", the password (-lpw) secret, and the SSL method tls. You can get this information from your LDAP administrator; it is found in the /etc/openldap/slapd.conf file on the LDAP server. This information is also collected in the pre-installation planning sheet in Chapter 8, “Installation planning” on page 243.


Example 9-24 Configuring using LDAP using CLI command cfgldap.


[root@furby.mgmt001st001]$ cfgldap -c Furby.storage.tucson.ibm.com -d storage.tucson.ibm.com -lb
“dc=sonasldap,dc=com” -ldn “cn=Manager,dc=sonasldap,dc=com” -lpw secret -ls sonaspb29 -ssl tls
-v

Now verify that the cluster is part of the LDAP server using the command shown in Example 9-25.

Example 9-25 Verifying that the LDAP server has been successfully configured.
[root@furby.mgmt001st001]$chkauth -c Furby.storage.tucson.ibm.com -t
Command_Output_Data UID GID Home_Directory Template_Shell
CHECK SECRETS OF SERVER SUCCEED

9.5.11 Configure Data Path IP Addresses


The CLI command mknw helps you configure the data path IP addresses. The command creates a network configuration that can be applied to the Interface Nodes of the cluster. This operation only creates the configuration; it takes effect when you attach the network configuration to an interface and, optionally, to a host group (see Example 9-26).

In the example we use the public IP addresses 9.11.137.10, 9.11.137.11, 9.11.137.12, 9.11.137.13, 9.11.137.14, and 9.11.137.15; the subnet is 9.11.136.0/23 and the subnet gateway is 0.0.0.0/0:9.11.136.1.

Example 9-26 Configuring the Data Path IP using the CLI command mknw.
[root@furby.mgmt001st001]$ mknw 9.11.136.0/23 0.0.0.0/0:9.11.136.1 --add
9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

Verify that the data path IP addresses have been successfully configured using the CLI command lsnw, as shown in Example 9-27.

Example 9-27 Verifying that the network is successfully configured using CLI command lsnw.
[root@furby.mgmt001st001]$ lsnw -r
Network VLAN ID Network Groups IP-Addresses Routes
9.11.136.0/23 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

The above command is used with no VLAN. You can also run it with the VLAN option, as shown in Example 9-28. In the example, 101 is the identification number of the VLAN.

Example 9-28 Configuring the Data Path IP with VLAN using the CLI command mknw.
[root@furby.mgmt001st001]$ mknw 9.11.136.0/23 0.0.0.0/0:9.11.136.1 --vlan 101 --add
9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

Verify that the command is successful by running the CLI command lsnw. Example 9-29 shows sample output.

Example 9-29 Verifying that the network is successfully configured using CLI command lsnw.
[root@furby.mgmt001st001]$ lsnw -r
Network VLAN ID Network Groups IP-Addresses Routes
9.11.136.0/23 101 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15


9.5.12 Configure Data Path IP Address Group


The CLI command mknwgroup helps you configure the data path IP address group. The command creates a group of nodes with the name groupName. An existing network configuration can be attached to this group of nodes (see Example 9-30).

Example 9-30 Configure Data Path IP Group using CLI command mknwgroup.
[root@furby.mgmt001st001]$ mknwgroup int
int001st001,int002st001,int003st001,int004st001,int005st001,int006st001

Verify that the command is successful by running the CLI command lsnwgroup, as shown in Example 9-31.

Example 9-31 Verifying that the Data Path IP Group has been successfully configured using CLI command lsnwgroup
[root@furby.mgmt001st001]$ lsnwgroup -r
Network Group Nodes Interfaces
DEFAULT int int001st001,int002st001,int003st001,int004st001,int005st001,int006st001

9.5.13 Attach the Data Path IP Address Group


The CLI command attachnw attaches the data path IP address group. The command attaches a network to a specified network group. All nodes in the network group are configured so that the cluster manager can start any of the IP addresses configured for the specified network on the specified interface (see Example 9-32).

Example 9-32 Attaching the Data Path IP group using CLI command attachnw.
[root@furby.mgmt001st001]$ attachnw 9.11.136.0/23 ethX0 -g int

Verify that the command is successful by running the CLI command lsnw, as shown in Example 9-33.

Example 9-33 Verify that the Data Path IP has been successfully attached using CLI command lsnw
[root@furby.mgmt001st001]$ lsnw -r
Network VLAN ID Network Groups IP-Addresses Routes
9.11.136.0/23 int 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

9.6 Creating Exports for data access


SONAS allows clients to access the data stored on the file system using protocols such as CIFS, NFS, and FTP. Data exports are created, and as long as the CIFS, NFS, and FTP protocols are active, these exports can be accessed using those protocols. An export is a shared disk space. Exports can be created using the CLI command mkexport and also using the GUI.

The mkexport command requires you to enter a share name and a directory path for the export. The directory path is the location of the directory that is to be accessed by the clients. This directory is seen by the clients under its share name. The share name is used by the clients to mount or access the export.

Depending on the protocols you wish to configure your export for, you need to pass the respective parameters. As the administrator, you provide all these details.
1. FTP takes no parameters.
2. NFS requires you to pass the following parameters:
– Client/IP/Subnet/Mask: the clients that can access the NFS share; "*" means that all clients can access it.
– ro or rw: whether the export has read-only or read-write access.
– root_squash: enabled by default; you can set it to no_root_squash.
– async: enabled by default; you can set it to sync if required.
3. CIFS requires you to pass parameters such as:
– browseable: yes by default; you can change it to no.
– comment: any comment you want to attach to the CIFS export.

In Example 9-34, we show how an export is created for all the protocols: CIFS, NFS, and FTP. The share name is shared and the directory path is /ibm/gpfs0/shared. We also need to be sure that the file system gpfs0 already exists and is mounted on all the nodes. We set the default parameters for CIFS and specify the minimum parameters for NFS.

Example 9-34 Creating Data export using CLI command mkexport.


[root@furby.mgmt001st001]$ mkexport shared /ibm/gpfs0/shared --nfs "*(rw,no_root_squash,async)" --ftp
--cifs browseable=yes,comment="IBM SONAS" --owner "STORAGE3\eebbenall"
EFSSG0019I The export shared has been successfully created.

Verify that the export has been created correctly using the command lsexport. See Example 9-35:

Example 9-35 Verifying that the export has been successfully created using CLI command lsexport.
[root@furby.mgmt001st001]$ lsexport -v
Name Path Protocol Active Timestamp Options
shared /ibm/gpfs0/shared FTP true 4/14/10 6:13 PM
shared /ibm/gpfs0/shared NFS true 4/14/10 6:13 PM *=(rw,no_root_squash,async,fsid=693494140)
shared /ibm/gpfs0/shared CIFS true 4/14/10 6:13 PM browseable,comment=IBM SONAS
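As a quick check that the export is reachable (step 8 of the configuration overview in 9.4, “Software Configuration”), the following is a minimal sketch of mounting the export from a Linux workstation; the cluster DNS name, mount points, and user name are placeholders, and the user needs appropriate permissions as described in the next section.

# Mount the NFS export created above and list its contents
mkdir -p /mnt/shared
mount -t nfs furby.storage.tucson.ibm.com:/ibm/gpfs0/shared /mnt/shared
ls -l /mnt/shared

# A Linux client with cifs-utils installed can reach the same data by its share name
mkdir -p /mnt/shared_cifs
mount -t cifs //furby.storage.tucson.ibm.com/shared /mnt/shared_cifs -o username=<your_AD_user>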

9.7 Modify ACLs to the shared export


Access Control Lists (ACLs) are used to specify the authority that a user or group has to access a file, directory, or file system. A user or group can be granted read-only access to files in one directory, while being given full (create/write/read/execute) access to files in another directory. Only a user who has been granted authorization in the ACLs is able to access files on the IBM SONAS.


Access rights and ACL management should normally be performed from Windows clients. If a Windows workstation is not available, the ACLs can be changed on the CLI using GPFS commands. This requires root access to the SONAS system, and the user must be familiar with the vi editor.

Go through the following steps to set the ACLs:


1. First, change the group or user who can access the shared export using the chgrp command, as in the example below, where the group is “domain users” from the domain STORAGE3 and the export is /ibm/gpfs0/shared (see Example 9-36).

Example 9-36 Changing group ownership using the chgrp command


$ chgrp "STORAGE3\domain users" /ibm/gpfs0/shared
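
You can confirm the new group ownership with a standard ls -ld on the export directory (a minimal sketch; the exact output will vary):

$ ls -ld /ibm/gpfs0/shared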

2. You should use a Windows workstation on your network to modify the ACLs in order to
provide the appropriate authorization. The following sub-steps can be used as a guide:
– Access the shared folder using Windows Explorer.

Note: This procedure must be used by the owner.

– Right click on the folder, and select 'Sharing and Security...'.


– Use the functions on the Sharing tab and/or the Security tab to set the appropriate
authorization.
3. If a Windows workstation is not available for modifying the ACLs, use the following
sub-steps to manually edit the ACLs:

Note: This requires manual editing of the ACL entries with the vi editor, so it should only be done by administrators who are familiar with vi. You must be root to execute these commands.

Run the commands shown in Example 9-37 to modify the ACLs. First specify that you want to use vi as the editor, then open the ACL of the export directory with the GPFS command mmeditacl and press Enter.

Example 9-37 Opening the ACL for editing using the GPFS command mmeditacl
$ export “EDITOR=/bin/vi”
$ mmeditacl /ibm/gpfs0/shared

The following ACL is displayed:

#NFSv4 ACL
#owner: STORAGE3\eebenall
#group: STORAGE3\domain users

special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:rwx-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Now change the ACLs by adding the group:STORAGE3\domain admins entry shown in bold in Example 9-38:

Example 9-38 Adding new group to export using the GPFS command mmeditacl.
#NFSv4 ACL
#owner: STORAGE3\eebenall
#group: STORAGE3\domain users

group:STORAGE3\domain admins:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:rwx-:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Verify that the new group has been added by running the command mmgetacl against the directory whose ACLs were changed. The output should now include the newly added group entry. See Example 9-39 below:

Example 9-39 Verifying that the new group was successfully added to export using GPFS command mmgetacl.
$ mmgetacl /ibm/gpfs0/shared

#NFSv4 ACL
#owner: STORAGE3\administrator
#group: STORAGE3\domain users

group:domain1\domain admins:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:rwx-:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED


9.8 Test access to the SONAS


Now that we have completed the Installation and Configuration, let us see how to access the
SONAS appliance.
1. You should already be connected to the Management Node. If not, log in with the user ID root and its password.
2. From the Management Node, you can view the Health Center using the following steps:
a. Select Applications  SONAS  SONAS GUI as shown in Figure 9-7.

Figure 9-7 Showing how to access SONAS GUI

b. If an Alert is displayed, warning you about an invalid security certificate, click on OK.
c. If a 'Secure Connection Failed' message is displayed, as shown below in Figure 9-8:

Figure 9-8 Security Connection Failed message when accessing GUI

Click on Add Exception as shown in Figure 9-9.

Figure 9-9 Adding exception


A new window appears as shown below in Figure 9-10. Click on Get Certificate, and
click on Confirm Security Exception.

Figure 9-10 Get Certificate and Confirm Security Exception

d. At the Scale Out File Services login screen, log in with User ID root and the root
password (see Figure 9-11 below).

Figure 9-11 Login into Management GUI Interface with root user ID and Password

e. If you are asked if you want Firefox to remember this password, click on “Never for
This Site.”
f. The first time you log into the GUI, you will be asked to accept the software license
agreement. Follow the instructions on the screen to accept the software license
agreement.


g. Click on Health Summary.


h. Click on Alert Log. The Alert Log will be displayed.
i. Review the Alert Log entries. Figure 9-12 shows an example of Alert Log.

Figure 9-12 Example Alert Log. Ignore the Critical Errors

Note: It is normal for one or more informational entries (entries with a severity of
info) to be in the log following the installation. These can be ignored.

j. If any problems are logged, click on the Event ID for more information. The Information
Center will be displayed with information about the Event ID.
k. If a 'Firefox prevented this site from opening a pop-up window.' message is displayed,
click on Preferences, and click on Allow pop-ups for localhost.
l. Resolve any problems by referring to the Problem Determination guide in the
Information Center. If unable to resolve a problem, contact your next level of support.
m. When any problems have been resolved, clear the System Log by clicking on System
Log then clicking on Clear System Log.
n. When you are finished using the SONAS GUI, click on Logout. The Scale Out File
Services login screen will be displayed.
o. Close the browser by clicking X. The Linux desktop will be displayed.
p. Logout by selecting System  Log Out root.


3. Connect the Ethernet network cables.



Note: Connecting the customer Ethernet cables is a customer responsibility.

4. Connect each cable to an available Ethernet port on the Interface Node.


5. If the Rack contains another Interface Node, repeat the steps in this section until all
Interface Nodes in the Rack have been cabled.
6. If you are installing more than one Rack, repeat the steps in this section until all Interface
Nodes in all of the Racks you are installing have been cabled.

The IBM SONAS system is now ready for use.


10

Chapter 10. SONAS administration


This chapter provides information on how you use the GUI and CLI to administer your
SONAS. Daily administrator tasks are discussed and examples provided.


10.1 Using the management interface


The SONAS appliance can be administered using the Graphical User Interface (GUI) and the Command Line Interface (CLI) provided by SONAS. The GUI has different administrative panels for carrying out administrative tasks, while the CLI allows you to administer the system using commands.

Both the CLI and the GUI provide details and help for each task and command. The CLI also provides man pages that you can use to get more information about a particular command. The GUI tasks are designed to be self-explanatory, and there are tool tips for every text box or control that give more information about what is to be done. There is also a “?” (help) icon in the upper right corner of each GUI panel.

In this chapter, most of the important and commonly used commands are explained for both
the GUI and the CLI.

10.1.1 GUI tasks


You can start and stop the GUI as the root user on the Management Node using the startmgtsrv and stopmgtsrv commands. GUI tasks are those that you carry out using the graphical interface of the SONAS system. You log in to the Management Node using a web browser such as Internet Explorer or Firefox. In the URL bar, you need to type the following:

https://management_node_name_or_ip_address:1081/ibm/console.
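
Before logging in through the browser, note that the GUI service itself runs on the Management Node and can be restarted there if it is not responding. A minimal sketch, run as root (the assumption here is that both commands are run with no arguments):

# stop and then restart the management (GUI) service
stopmgtsrv
startmgtsrv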

In our example, the Management Node has the IP address 9.11.137.220, so you can access the GUI using:

https://9.11.137.220:1081/ibm/console

Figure 10-1 on page 306 shows the login page. Enter the login name and password and click Log in to log in to the Management Interface.

Figure 10-1 SONAS Management GUI asking for login details

Once logged in, you will see the screen shown in Figure 10-2 on page 307 below.


Figure 10-2 SONAS GUI when logged in as GUI user

The left frame of the GUI, shown in Figure 10-2 above, presents a collapsed view of all the categories available in the GUI.

Figure 10-3 on page 308 illustrates the various areas of the GUI navigation panel. On the left is the main navigation pane, which allows us to select the component we wish to view or the task we wish to perform. At the top we see the currently logged-in administrative user name, and just below that the navigation tabs that allow you to switch between multiple open tasks. Underneath is a panel with context-sensitive help and minimize and maximize buttons at its top right. We then have action buttons and table selection, sorting, and filtering controls. Below that is a table listing the objects. At the bottom right is a refresh button that shows the time the data was last collected and refreshes the displayed data when pressed. Clicking an individual object brings up a detailed display of information for that object.


Figure 10-3 ISC GUI navigation panel areas

The expanded view of all the tasks in the left frame is shown in Figure 10-4 on page 309. As mentioned above, in the URL bar you provide the Management GUI IP address or Management Node hostname together with the correct path to access the GUI.

Once logged in, near the top center of the main page you can see the CLI user name that is currently logged in to the GUI, and in the right corner there is a link to log out of the GUI. The left frame is a list of categories that provide links to perform any task on the cluster. Click the links on the left to open the corresponding panel on the right.

The categories are described at a high level in the section below:


Figure 10-4 Expanded view of the left panel with all tasks available in SONAS GUI

The GUI categories are divided into several tasks, seen as links in the left frame. Broadly, these tasks are listed below:
1. Health Center: This panel shows the health of the cluster and its nodes. It gives a topological view of the cluster and its components, provides the system and alert logs, and offers additional features such as Call Home.
2. Clusters: This panel allows you to manage the cluster, the Interface Nodes, and the Storage Nodes.
3. Files: All file system related tasks can be performed in this panel.
4. Storage: The back-end storage can be managed using the tasks available here.
5. Performance and Reports: The SONAS GUI provides reports and graphs of various parameters that you can measure, such as file system utilization, disk utilization, and others.
6. SONAS Console Settings: In this section you can enable threshold limits for utilization monitoring and view the tasks scheduled on the SONAS system. You can also configure notification emails to be sent when a threshold value is crossed.


7. Settings: In this Panel, you can manage users and also enable tracing.

In the next sections, each of the categories and its underlying tasks are discussed:

Health Summary
This category allows you to check the health of the cluster, including the Interface Nodes, Storage Nodes, and the Management Node. It consists of three panels:
1. Topology: This panel displays a graphical representation of the SONAS system topology. It provides information about the Management Nodes, Interface Nodes, and Storage Nodes, and includes the state of the data networks and storage blocks. It also shows file system related information, such as the number of file systems that exist, the number mounted, the number of exports, and more. See Figure 10-5 below. You can click each component to see further details about it.

Figure 10-5 Topology View of the SONAS cluster

2. Alert Log: The Alert Log panel displays the alert events that are generated by the SONAS software. Each page displays around 50 log entries. The severity of an event can be Info, Warning, or Critical, displayed in blue, yellow, and red respectively. You can filter the entries in the table by severity, time period, and source; the source is the host on which the event occurred. See Figure 10-6 on page 311 below.


Figure 10-6 Alert Logs in the GUI for SONAS

3. System Log: This panel displays system log events that are generated by the SONAS
Software, which includes management console messages, system utilization incidents,
status changes and syslog events.
Each page displays around 50 log entries. System logs have three levels: Information (INFO), Warning (WARNING), and Severe (SEVERE). You can filter the logs by log level, component, host, and more. Figure 10-7 on page 312 shows the System Log panel in the GUI.


Figure 10-7 System Logs in the GUI for the SONAS cluster

Clusters
This panel allows you to administer the cluster, including the Interface Nodes and Storage Nodes, and to modify the cluster configuration parameters. Each panel and its tasks are discussed in the following section:
1. Clusters:
a. Add/Delete cluster to Management Node: The GUI allows you to manage not just the cluster it is a part of, but also other clusters. You can also delete a cluster from the GUI in order to stop managing it.
You can add the cluster you wish to manage using the “Add cluster” option in the “Select Action” drop-down box. This opens a new panel (in the same window) in which you enter the IP address of one of the nodes of the cluster and its password. The cluster is identified and added to the GUI. Figure 10-8 on page 313 shows how you can add the cluster.


Figure 10-8 Cluster Page on the GUI. Select Action to add cluster in the GUI

You can also delete a cluster previously added to the GUI by selecting the check box in front of the cluster name and clicking the “Delete cluster” option in the “Select Action” drop-down box. You will be asked for confirmation before the cluster is deleted. Figure 10-9 on page 313 shows how you can delete a cluster that was added to the GUI.

Figure 10-9 Cluster Page on the GUI. Select Action to delete cluster in the GUI

b. View Cluster Status


This panel displays the clusters that have been added to the GUI. See Figure 10-10 below.


Figure 10-10 View Cluster Status

c. Nodes: This panel is one of the tabs in the lower part of the Clusters panel. Here you can view the connection status of all the nodes: the Management Node, Interface Nodes, and Storage Nodes. Figure 10-11 shows the Nodes panel.

Figure 10-11 Node information seen

Clicking the node links, shown in blue in the figure, takes you to the respective Interface Nodes or Storage Nodes panel, explained in points 2 and 3 of the section “Clusters” on page 312.
d. File Systems: This panel displays all the file systems on the SONAS appliance, with information such as the mount point, the size of the file system, free space, used space, and more. See Figure 10-12 below. This is a read-only panel.

Figure 10-12 Filesystem Information seen

e. Storage Pools: The Storage Pools panel displays information about the various storage pools that exist. It shows which file system belongs to each storage pool and the capacity used. See Figure 10-13 below. This is a read-only panel; you cannot modify any storage pool parameters.


Figure 10-13 Storage Pool Information seen

f. Services: The Services panel shows the various services that are configured on the SONAS system and whether each is Active or Inactive. The supported services are FTP, CIFS, NFS, HTTP, and SCP. These services are required for the data exports on SONAS: end users access data stored on SONAS through the exports, so the corresponding service must be configured before data can be shared over that protocol. You cannot modify the status of the services from this panel. See Figure 10-14 on page 315.

Figure 10-14 Service information seen

g. General Options: This option allows you to view and modify the cluster configuration, including some of its global options as well as node-specific parameters. You can also view cluster details such as the cluster name, cluster ID, primary and secondary servers, and more. See Figure 10-15 on page 316.


Figure 10-15 General Options for the cluster seen

h. Interface Details: This panel shows the Cluster Manager (CTDB) configuration details. In the panel shown in Figure 10-16 you can see the NetBIOS name, the workgroup name, whether the Cluster Manager manages winbind, and more. The panel itself is read-only; changes are made through the Advanced Options button described below.

Figure 10-16 Interface details for the Cluster

The Advanced Options button, on the panel shown in Figure 10-17 on page 317, allows you to view and modify the CTDB configuration parameters. You can modify the “reclock” path and other advanced options of CTDB. CTDB manages many services and has a configurable parameter for each, which you can set to control whether CTDB manages that service.
Some of the parameters are as follows:
– CTDB_MANAGES_VSFTPD, CTDB_MANAGES_NFS
– CTDB_MANAGES_WINBIND
– CTDB_MANAGES_HTTPD
– CTDB_MANAGES_SCP
By default these values are set to “yes”, and CTDB manages the corresponding services. You can change a value to “no” if you do not want CTDB to manage that service. When CTDB is not managing a service and the service goes down, CTDB does not signal this by going unhealthy; it remains in the OK state. To have the service monitored, set the value to “yes”.


Figure 10-17 Advanced Options under Interface details seen as CTDB information

2. Interface Nodes: The Interface Nodes panel allows you to view the node status. It displays the public IP for each node, the active IP address it is servicing, the CTDB status, and more. You can also carry out operations on a node, such as Suspend, Resume, Restart, or Recover CTDB: select the node on which you wish to perform the action and then click the corresponding button. You must also select the cluster whose Interface Nodes you want to check from the “Active cluster” drop-down menu. Figure 10-18 shows the Interface Nodes panel.


Figure 10-18 Interface Node details and operations that can be performed on them

3. Storage Nodes: This panel displays information about the Storage Nodes, including their IP addresses, connection status, GPFS status, and more. It also allows you to start and stop Storage Nodes using the “Start” and “Stop” buttons: select the Storage Nodes you want to start or stop and click the corresponding button. You must also select the cluster whose Storage Nodes you want to check from the “Active cluster” drop-down menu.
You will notice that the first Storage Node is highlighted by default. In the lower part of the same pane you can see the details of that Storage Node, such as the hostname, operating system, host IP address, controller information, and more. Figure 10-19 on page 319 below shows the Storage Nodes panel.


Figure 10-19 Storage Node details and operations that can be performed on them

Files
This panel allows you to carry out file system related tasks. You can create file systems, file sets, exports, snapshots, and more. You must select the cluster on which you want to perform the tasks from the “Active cluster” drop-down menu. Each of the tasks that can be performed is described in the following section:
1. File Systems: This panel has four sections, each described below.
a. File System: This section allows you to create file systems. The underlying file system that SONAS creates is a GPFS clustered file system. If a file system already exists, the section displays basic information about it, such as the name, mount point, size, usage, and more. You can also perform operations such as mounting, unmounting, and removing a file system, using the buttons on the panel. See Figure 10-20 on page 320 below.
If the list of file systems extends to another page, you can click the arrow buttons to move between pages. The table also has a refresh button in the lower right corner, which refreshes the list of file systems in the table. You can also select entries individually, select all, or invert the selection in this table.


Figure 10-20 File system list and operations that can be performed on them

b. File System Configuration: This section displays the configuration details of the highlighted file system. It shows information such as the device number, ACL type, number of inodes, replication details, quota details, mount information, and more. It also allows you to modify the ACL type, locking type, and number of inodes for the file system. Click the “Apply” button to apply the new configuration parameters. See Figure 10-21 on page 320 below.

Figure 10-21 File system detail


c. File System Disks: This section displays the disks used by the file system and their disk usage type. You can add disks to the file system by clicking the “Add a disk to the file system” button, or remove disks from the file system by selecting a disk and clicking the “Remove” button. See Figure 10-22 below.

Figure 10-22 Disk information of the cluster

d. File System Usage: This section displays file system usage information, such as the number of free inodes, the number of used inodes, and the storage pool usage and details. See Figure 10-23 below.

Figure 10-23 File system usage for the cluster

2. Exports: This panel displays the exports created on the SONAS system for clients to access the data. It also allows you to create, delete, and modify exports, and to change the configuration parameters for protocols such as CIFS and NFS. Each of the sections is described below.


a. Exports: This section displays all the exports that have been created, with details such as the share name, directory path, and the protocols configured for each export. You can add a new export using the “Add” button. For existing exports, you can carry out operations such as modifying the export, removing protocols, activating or deactivating the export, and removing the export, by selecting the export you wish to act on and clicking the respective button. Figure 10-24 on page 322 below shows the panel with some existing exports as examples. As you can see, the exports continue up to page 4; you can click the arrow buttons to move between pages. The table also has a refresh button in the lower right corner, which refreshes the list of exports in the table. You can also select exports individually, select all, or invert the selection. By default the first export is highlighted, and the protocol details of that export are displayed in the lower section, explained in detail below.

Figure 10-24 Exports existing in the cluster and operations you can perform on them

b. CIFS Export Configuration: This section displays the CIFS configuration details of the highlighted export. As seen in Figure 10-24 above, the first export is highlighted by default; you can select other exports from the table. The panel displays the configured parameters for the CIFS export, such as the comment, browseable option, and read-only option, and allows you to modify them. You can also use the Add, Modify, and Remove buttons to add, modify, and remove any advanced options. Click Apply to apply the new configuration parameters. Figure 10-25 shows the panel.


Figure 10-25 Configuration details for CIFS protocol

c. NFS Export Configuration: This section displays the list of NFS clients configured to access the NFS export, together with their options. You can modify existing client details using the “edit” link in the table, remove a client using the “remove” link, and add a new client using the “Add Client” button. Click the “Apply” button to apply the changes. See Figure 10-26 below.

Figure 10-26 NFS configuration details

3. Policies: This panel displays, and allows you to set, policies for the existing file systems. A policy is a rule that you can apply to your file system; policies are discussed in detail in “Call Home test” on page 425.
The Policies panel has two sections, explained below:
a. Policies List: This section allows you to view the policies set for the available file systems. By default the first file system is highlighted and its policy details are shown in the lower section of the panel. You can set a default policy for a file system by clicking the “Set Default Policy” button. Figure 10-27 below shows the Policies panel.


Figure 10-27 Policies listed for the file systems in the cluster

In the above example there is currently no policy set for the file system.
b. Policy Details: This section shows the policy details for the file system. It includes a policy editor, a text box in which you can write new policies for the file system. You can apply the policy using the “Apply Policy” button, set the policy by clicking the “Set Policy” button to the right of the editor, and load policies using the “Load Policy” button. See Figure 10-28.

Figure 10-28 Editor to create policies for the file systems

4. File Sets: This panel displays the file sets existing in the file system. Choose the file system whose file sets you want to view from the “Active file system” drop-down list, along with the active cluster from the “Active cluster” drop-down menu. The table then displays all the file sets that exist for that file system. The root file set is created by the system; it is the default one and is created when you create the first file system. Below the table you can also view details of the file sets, such as the name, the path they are linked to, the status, and more. You need to highlight or select the file set whose details you wish to see.
By default the first file set is highlighted, and in the lower section of the panel you can view and modify its other details. You can view the details of another file set by clicking it to highlight it. Figure 10-29 on page 325 shows the list of all the file sets and information about the highlighted file set. In our example below, we have just the root file set listed.

Figure 10-29 Listing file sets in the cluster and displaying information about the highlighted file set

You can use the “Create a File Set” button to create a new file set. You can also delete or unlink existing file sets by selecting the file sets you want to operate on and clicking the “Delete” or “Unlink” button respectively.


5. Quota: The clustered file system allows you to enable quotas and assign them to users and groups, on file sets and file systems. There are soft limits and hard limits for disk space and for the number of i-nodes, and a grace time can be applied when setting quotas. These concepts are described here:
Soft Limit Disk: The soft limit defines a level of disk space and files below which the user,
group of users or file set can safely operate. Specify soft limits for disk space in units of
kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the
value is assumed to be in bytes.
Hard Limit Disk: The hard limit defines the maximum disk space and files the user, group
of users or file set can accumulate. Specify hard limits for disk space in units of kilobytes
(k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the value is
assumed to be in bytes.
Soft Limit I-nodes: The i-node soft limit defines the number of i-nodes below which a
user, group of users or file set can safely operate. Specify soft limits for i-nodes in units of
kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the
value is assumed to be in bytes.
Hard Limit I-nodes: The i-node hard limit defines the maximum number of i-nodes that a
user, group of users, or file set can accumulate. Specify hard limits for i-nodes in units of
kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the
value is assumed to be in bytes.
Grace Time: Grace time allows the user, group of users, or file set to exceed the soft limit
for a specified period of time (the default is one week). If usage is not reduced to a level
below the soft limit during that time, the quota system interprets the soft limit as the hard
limit and no further allocation is allowed. The user, group of users, or file set can reset this
condition by reducing usage enough to fall below the soft limit.
Figure 10-30 below shows how the Quota panel looks in the GUI. On SONAS, the GUI currently allows read-only access to quotas, which means you can view quotas but not enable or set them. In our example, the default quota is displayed for the file systems for the user root.

Figure 10-30 Quotas page in the GUI
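
Because the GUI is read-only for quotas, quota values are listed and changed from the CLI (see the lsquota and setquota entries in the command list in Example 10-1 and the task list in 10.2). A minimal sketch, output omitted:

$ lsquota              # list the current quota settings
$ man setquota         # review the exact setquota syntax before changing any limits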


6. Snapshots: This panel displays the snapshots existing in the file system. Choose the file system whose snapshots you want to view from the “Active file system” drop-down list, along with the active cluster from the “Active cluster” drop-down menu. The table that lists the snapshots also shows details such as the name, status, creation timestamp, and more. You can remove an existing snapshot from the cluster by selecting the snapshot you wish to remove and pressing the “Remove” button, and you can create snapshots using the “Create a new Snapshot of the active cluster and filesystem” button. By default the first snapshot is selected and highlighted, and in the lower section of the panel you can see its details; choose another snapshot from the list to see its corresponding details. See Figure 10-31 on page 327 below.

Figure 10-31 Snapshot lists that exist in cluster and its details

Storage
This panel allows you to view the storage disks and storage pool details. You can perform certain operations on the disks, such as removing, suspending, or resuming them, and you can view the available storage pools and their usage details. You must select the cluster on which you want to perform the tasks from the “Active cluster” drop-down menu. Each of the tasks that can be performed is described in the following section:
1. Disks: This panel displays the disks that are available in the SONAS system and information such as their usage, the file system they are attached to, their status, failure group, the storage pool they belong to, and more. The table also has a refresh button in the lower right corner, which refreshes the list of disks in the table. You can also select disks individually, select all, or invert the selection, and you can filter the table using filter parameters.
By default the first disk is highlighted, and in the lower part of the pane other details of that disk are displayed, including the volume ID, sector size, the list of disk servers it resides on, and more. See Figure 10-32 below.


Figure 10-32 List of Storage disks in the cluster and their details

2. Storage Pools: This panel displays the storage pool list for a file system. The main table shows the file systems existing in the cluster, along with the pool usage and i-node usage for each file system.
By default the first file system in the list is highlighted, and for this file system the lower section of the panel shows storage pool related details such as the number of free i-nodes, the maximum i-nodes, and the allocated i-nodes, as well as the size of the pool. You can also see the size, free blocks, and fragment details of the NSDs (disks) in the system. See Figure 10-33 on page 329 below.


Figure 10-33 Storage pools existing in the cluster and their details

Performance and Reports
This panel allows you to monitor performance and generate reports. You can check the performance of the system in the System Utilization panel and of the file systems in the File System Utilization panel, and you can generate reports for daily, weekly, monthly, or other periods. The charts are a pictorial representation of the data. Each of the panels is described below.
1. System Utilization: In this panel you can view the performance of the system and generate reports or charts that illustrate the system utilization of the specified cluster nodes. You need to choose at least one node and the measurement settings, then click the “Generate Charts” button to display the chart you want. See Figure 10-34 on page 330 below.


Figure 10-34 System Utilization details for the Nodes in the cluster.

2. File System Utilization: This panel generates charts that illustrate the utilization of the specified file system. The table shows the file systems that exist in the cluster, along with other details such as the cluster name, disk usage for the file system, and more. Initially no chart is displayed; select the file system and duration and click the “Generate Charts” button. See Figure 10-35 on page 331 below.


Figure 10-35 File system Utilization for the file systems in the cluster

SONAS Console Settings
This panel allows you to carry out various tasks, such as viewing and adding thresholds to monitor the cluster, listing the scheduled tasks, and creating new tasks to be scheduled. You can also set up notifications so that, if a threshold is reached or crossed, an event report is generated and an email is sent to the administrator or other recipients. Additionally, you can add or remove recipients and edit the contact information. You must select the cluster on which you want to perform the tasks from the “Active cluster” drop-down menu. Each of the tasks that can be performed is described in the following section:
1. Utilization Thresholds: This panel lists all thresholds for the various utilization monitors, per cluster. A corresponding log message is generated for a monitor if the warning or error level value has been exceeded by the values measured the last “recurrences” times. The table displays the details of all the thresholds that have been added, such as their warning level, error level, and more. You can remove a previously added threshold by selecting it and clicking the “Remove” button, and add new thresholds using the “Add Threshold” button. See Figure 10-36 on page 332 below. The generation of charts is explained in detail for system utilization in 10.9.1, “System Utilization” on page 402 and for file system utilization in 10.9.2, “File System Utilization” on page 404.


Figure 10-36 Threshold details for the cluster

2. Scheduled Tasks: This panel allows you to view and manage tasks. SONAS has a list of predefined tasks for the management node. A predefined task can be a GUI task or a cron task. GUI tasks can be scheduled only once and run only on the management node, whereas cron tasks can be scheduled multiple times and for the different clusters managed by the management node. Cron tasks are predefined to run either on all nodes of the selected cluster or on the recovery master node only. You can add new tasks and remove or execute existing ones.
This panel has two sections: the upper part is a table that lists all the tasks that are already scheduled, and the lower section shows the details of each task. The two sections are explained below:
a. Tasks List: This is the upper section of the pane. It lists the tasks that are already scheduled in the form of a table, including the task name, schedule, execution node, status of the last run, and more. You can execute or remove any task in the list by selecting the task and clicking the “Execute” or “Remove” button respectively, and you can add a new task using the “Add Task” button. See Figure 10-37 on page 333 below. You can select any other task to see its details, which are displayed in the next section. You can also select single or multiple entries and filter them using filter parameters, and use the arrow buttons to view tasks on the next page, if any.


Figure 10-37 Scheduled tasks list for the cluster

b. Task Details: By default the first task is highlighted in the table; you can change the selection by clicking any other task in the table. When a task is selected, its details are shown in the lower section of the pane, including the task name, description, task parameters, schedule time, and more. See Figure 10-38 on page 333 below.

Figure 10-38 Task details


3. Notification Settings: This panel allows you to define notification settings for the selected cluster. Choose the Default option in the “Active cluster” drop-down menu to apply the settings as default values for all clusters.
As you can see in Figure 10-39, the panel has many options for which you can enable notifications, such as utilization monitoring, GUI events, syslog events, quota checking, and many more. You must also fill in the “General Email Settings” section of the panel with email addresses and details, so that when an event is generated because a threshold has been reached, the respective users receive a notification email. Define a header and/or footer for the email if required.
To finish, complete the “SNMP Settings” section with the required details and make sure you click the “Apply” button to save your settings. See Figure 10-39 on page 334 below.

Figure 10-39 Notification settings

4. Notification Recipients:
This panel lists all the recipients who are configured to receive notification emails when a threshold that you are monitoring has been crossed. Select the cluster from the “Active cluster” drop-down menu. The table lists the name, email ID, status, and more. You can also remove an existing recipient. See Figure 10-40 on page 335 below.


Figure 10-40 Notification settings details for added recipients

5. Contact Information: The internal contact information is used as reference data only.
You can enter the data for the internal contact who has been chosen to address any
SONAS questions or issues. The details you must add are: Customer name, Main phone
contact, Site phone contact, E-mail contact, Location, Comment. You can do so using the
“Edit” button. See Figure 10-41 on page 335 below.

Figure 10-41 Contact details of the customer

Settings
This panel allows you to manage the Console settings and tracing. It also allows you to
manage Users allowed to access SONAS using the Management Interface (GUI). The two
sections are described briefly below:


1. Console Logging and Tracing: This panel allows you to view and modify the configuration properties of the console server diagnostic trace services. Changes to the configuration take effect after pressing 'OK'. See Figure 10-42 on page 336 below.

Figure 10-42 Console Logging and tracing

2. Console User Authority: Configuration for adding, updating, and removing Console
users. See Figure 10-43 on page 337 below.


Figure 10-43 Console User authority

10.1.2 Accessing the CLI


To access the CLI, ssh to the Management Node and log in using the CLI user ID and password. You are taken to a restricted shell that allows you to run only CLI commands plus a small number of Linux commands.

For example, if the Management Node hostname is Furby.storage.tucson.ibm.com, you can log in to the CLI using:

#ssh Furbymgmt.storage.tucson.ibm.com

You will be asked to enter the CLI user ID and password, after which you are taken to a CLI prompt. See Figure 10-44 on page 337 below.

Figure 10-44 CLI user logging in to the Management Node from a Linux client

Example 10-1 contains a list of commands that are available for a CLI user.

Example 10-1 Command list for a CLI user.


[Furby.storage.tucson.ibm.com]$ help
Known commands:
addcluster Adds an existing cluster to the management.
attachnw Attach a given network to a given interface of a network group.


backupmanagementnode Backup the management node


cfgad configures AD server into the already installed CTDB/SMABA
cluster.Previously configured authentication server settings will be erased
cfgbackupfs Configure file system to TSM server association
cfgcluster Creates the initial cluster configuration
cfghsm Configure HSM on each client facing node
cfgldap configure LDAP server against an existing preconfigured cluster.
cfgnt4 configure NT4 server against an existing preconfigured cluster.
cfgsfu Configures user mapping service for already configured AD
cfgtsmnode Configure tsm node.
chdisk Change a disk.
chexport Modifies the protocols and their settings of an existing export.
chfs Changes a new filesystem.
chfset Change a fileset.
chkauth Check authentication settings of a cluster.
chkpolicy validates placement rules or get details of management rules of a policy on
a specified cluster for specified device
chnw Change a Network Configuration for a sub-net and assign multiple IP
addresses and routes
chnwgroup Adds or removes nodes to/from a given network group.
chuser Modifies settings of an existing user.
confrepl Configure asynchronous replication.
dblservice stop services for an existing preconfigured server.
detachnw Detach a given network from a given interface of a network group.
eblservice start services for an existing preconfigured server.
enablelicense Enable the license agreement flag
initnode Shutdown or reboot a node
linkfset Links a fileset
lsauth List authentication settings of a cluster.
lsbackup List information about backup runs
lsbackupfs List file system to tsm server and backup node associations
lscfg Displays the current configuration data for a GPFS cluster.
lscluster Lists the information of all managed clusters.
lsdisk Lists all discs.
lsexport Lists all exports.
lsfs Lists all filesystems on a given device in a cluster.
lsfset Lists all filesets for a given device in a cluster.
lshist Lists system utilization values
lshsm Lists configured hsm file systems cluster
lslog Lists all log entries for a cluster.
lsnode Lists all Nodes.
lsnw List all public network configurations for the current cluster
lsnwdns List all DNS configurations for the current cluster
lsnwgroup List all network group configurations for the current cluster
lsnwinterface List all network interfaces
lsnwnatgateway List all NAT gateway configurations for the current cluster
lsnwntp List all NTP configurations for the current cluster
lspolicy Lists all policies
lspool Lists all pools.
lsquota Lists all quotas.
lsrepl List result of the asynchronous replications.
lsservice Lists services
lssnapshot Lists all snapshots.
lstask Lists all (background) tasks for the management node.
lstsmnode Lists defined tsm nodes in the cluster


lsuser Lists all users of this management node.


mkexport Creates a new export using one or more protocols.
mkfs Creates a new filesystem.
mkfset Creates a fileset
mknw Create a new Network Configuration for a sub-net and assign multiple IP
addresses and routes
mknwbond Makes a network bond from slave interfaces
mknwgroup Create a group of nodes to which a network configuration can be attached.
See also the commands mknw and attachnw.
mknwnatgateway Makes a CTDB NAT gateway
mkpolicy Makes a new policy into database
mkpolicyrule Appends a rule to already existing policy
mksnapshot creates a snapshot from a filesystem
mktask Schedule a predefined task for
mkuser Creates a new user for this management node.
mountfs Mount a filesystem.
querybackup Query backup summary
restripefs Rebalances or restores the replication of all files in a file system.
resumenode Resumes an interface node.
rmbackupfs Remove file system to TSM server association
rmcluster Removes the cluster from the management (will not delete cluster).
rmexport Removes the given export.
rmfs Removes the given filesystem.
rmfset Removes a fileset
rmlog Removes all log entries from database
rmnw Remove an existing public network configuration
rmnwbond Deletes a regular bond interface.
rmnwgroup Remove an existing group of nodes. A maybe attached public network
configuration must be detached in advance
rmnwnatgateway Unconfigures a CTDB NAT gateway.
rmpolicy Removes a policy and all the rules belonging to it
rmpolicyrule Removes one or more rules from given policy
rmsnapshot Removes a filesystem snapshot
rmtask Removes the given scheduled task.
rmtsmnode Remove TSM server stanza for node
rmuser Removes the user from the management node.
rpldisk Replaces current NSD of a filesystem with a free NSD
runpolicy Migrates/deletes already existing files on the GPFS file system based on the
rules in policy provided
setnwdns Sets nameservers
setnwntp Sets NTP servers
setpolicy sets placement policy rules of a given policy on cluster passed by user.
setquota Sets the quota settings.
showbackuperrors Shows errors of a backup session
showbackuplog Shows the log of the recent backup session.
showrestoreerrors Shows errors of a restore session
showrestorelog Shows the log of the recent restore session.
startbackup Start backup process
startreconcile Start reconcile process
startrepl Start asynchronous replication.
startrestore Start restore process
stopbackup Stops a running TSM backup session
stoprepl Stop asynchronous replication.
stoprestore Stops a running TSM restore session
suspendnode Suspends an interface node.


unlinkfset Unlink a fileset.


unmountfs Unmount a filesystem.

Plus the UNIX commands: grep, initnode, man, more, sed, startmgtsrv, stopmgtsrv, sort, cut,
head, less, tail, uniq
For additional help on a specific command use 'man command'.
[Furby.storage.tucson.ibm.com]$

In SONAS, some tasks can be done exclusively by a CLI user, while other tasks can be performed using both CLI commands and the GUI. The commands shown above are a combination of both.

Each command has usage help that can be viewed with:

# man <command_name>

or

# <command_name> --help

For example, let us look at the help for the command mkfs using --help and its man page. See Example 10-2 for the complete help output and Figure 10-45 on page 341, which shows a snapshot of the man page.

Example 10-2 Help or usage for CLI command mkfs taken as example
[Furby.storage.tucson.ibm.com]$ mkfs --help
usage: mkfs filesystem [mountpoint] [-b <blocksize>] [-c <cluster name or id>] [--dmapi |
--nodmapi] [-F <disks>] [-i <maxinodes>] [-j <blockallocationtype>] [--master] [-N
<numnodes>][--noverify] [--pool <arg>] [-R <replica>]
filesystem
The device name of the file system to be created. File system names need not be
fully-qualified.
mountpoint
Specifies the mount point directory of the GPFS file system.
-b,--blocksize <blocksize> blocksize
-c,--cluster <cluster name or id> define cluster
--dmapi enable the DMAPI support for HSM
-F,--disks <disks> disks
-i,--numinodes <maxinodes> Set the maximal number of inodes in the
file system.
-j,--blockallocationtype <blockallocationtype> blockallocationtype
--master master
-N,--numnodes <numnodes> numnodes
--nodmapi disable the DMAPI support for HSM
--noverify noverify
--pool <arg> pool
-R,--replica <replica> Sets the level of replication used in this
file system. Either none, meta or all


Figure 10-45 Manpage for CLI command mkfs taken as example

Similarly, you can run --help for each of the commands available in the CLI, and view the man page for each.

10.2 SONAS administrator tasks list


In this section we look at the administrator tasks that can be performed on the SONAS appliance. Some of these tasks can be carried out with both the CLI commands and the SONAS GUI, some only with CLI commands, and some only through the SONAS GUI.

Below is the complete list of tasks that can be performed on the SONAS system. Some of the important and commonly used commands are discussed in detail in the sections below.

Tasks that can be performed only by the SONAS GUI


1. Configure GUI user roles.
2. Configure notification setting and recipients.
3. Configure threshold settings.
4. Change log or trace settings.
5. Show Alert log.
6. Show Health Center (Topology).
7. Show on which nodes the file system is mounted.
8. Start, resume, or suspend an NSD.
9. Query file system space.

Tasks that can be performed only by the SONAS CLI


1. Start or stop management service.
2. Start, stop, or change asynchronous replication.
3. Create or delete console users.


4. Create or change quotas.
5. Create, list, change, or remove a CLI user.
6. Create, remove, list, and restore backups with TSM.
7. Configure and show network configuration (DNS, NTP, etc).
8. Configure and show authentication server integration.
9. Shutdown or reboot a node.
10.Restripe the file system.
11.Replace a disk (LUN).
12.Change disk properties.
13.Set/unset a master file system.

Tasks that can be performed by the SONAS GUI and SONAS CLI
1. Configure protocols and their settings.
2. Add or remove a cluster to/from the management node.
3. Add or remove Network Shared Disks (NSDs).
4. Create or delete a file system.
5. Create or delete exports.
6. Create or delete tasks.
7. Create or delete snapshots.
8. Start or stop storage nodes.
9. Change file system parameters.
10.Change cluster parameters.
11.Change disk or NSD status.
12.Change policies.
13.Link or unlink file sets.
14.Mount or unmount a file system.
15.Select the GPFS cluster.
16.Show node status.
17.Show cluster status.
18.Show system utilization (CPU, RAM, and so on).
19.Show snapshots.
20.Show file system utilization.
21.Show NSD status.
22.Show file system status.
23.Show or filter quotas.
24.Show storage pools.
25.Show policies.
26.Show file sets.
27.Show the event log.
28.Show tasks.


10.3 Cluster Management


Cluster-related commands are used to view or modify the cluster configuration. This includes configuration of Management Nodes, Interface Nodes, Storage Nodes, or the cluster as a whole. Some of the common cluster tasks are described in detail below.

10.3.1 Add/Delete cluster to the GUI


Using the GUI: You can add a cluster to the GUI using the “Add/Delete Cluster” option in the
Clusters panel of the GUI. See details in section 1.a of “Clusters” on page 312.

Using CLI: You can add the cluster to the CLI using the command: addcluster. Example 10-3
below shows the usage and command output.

Example 10-3 Usage and command output for CLI command addcluster.
[Furby.storage.tucson.ibm.com]$ addcluster --help
usage: addcluster -h <host> -p <password>
-h,--host <host> host
-p,--password <password> password
[Furby.storage.tucson.ibm.com]$ addcluster -h int001st001 -p Passw0rd
EFSSG0024I The cluster Furby.storage.tucson.ibm.com has been successfully added

10.3.2 View Cluster status


Using the GUI: You can view the details of the cluster that was added by clicking the “Clusters” link under “Clusters” on the left side frame of the GUI. This opens a panel with the Cluster Details and all other information related to the cluster. For more information refer to point 1.b of section “Clusters” on page 312.

Using the CLI: You can view the cluster status by running the CLI command lscluster. See
Example 10-4 below for command output.

Example 10-4 Command output for CLI command lscluster


[Furby.storage.tucson.ibm.com]$ lscluster
ClusterId Name PrimaryServer SecondaryServer
12402779238924957906 Furby.storage.tucson.ibm.com strg001st001 strg002st001

10.3.3 View Interface Node and Storage Node status


Using the GUI: You can view the node status from the Management GUI. For this, you need to click the “Interface Node” or “Storage Node” link in the Clusters category. Upon clicking the links, the respective pages open and display the corresponding status.

Refer to the point 2 and point 3 from “Clusters” on page 312 section to view the status of the
Interface Nodes and Storage Nodes.

Using the CLI: You can view the status of the nodes using the CLI command lsnode. Example 10-5 below shows the usage and command output. You can get more information using the -v option, or see output formatted with delimiters using the -Y option; a brief sketch follows Example 10-5.


Example 10-5 Usage and command output for CLI command lsnode
[Furby.storage.tucson.ibm.com]$ lsnode --help
usage: lsnode [-c <cluster name or id>] [-r] [-v] [-Y]
-c,--cluster <cluster name or id> define cluster
-r,--refresh refresh list
-v,--verbose extended list
-Y format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsnode
Hostname IP Description Role Product Version Connection status GPFS status CTDB status
Last updated
int001st001 172.31.132.1 interface 1.1.0.2-7 OK active active
4/22/10 3:59 PM
int002st001 172.31.132.2 interface 1.1.0.2-7 OK active active
4/22/10 3:59 PM
mgmt001st001 172.31.136.2 management 1.1.0.2-7 OK active active
4/22/10 3:59 PM
strg001st001 172.31.134.1 storage 1.1.0.2-7 OK active
4/22/10 3:59 PM
strg002st001 172.31.134.2 storage 1.1.0.2-7 OK active
4/22/10 3:59 PM

10.3.4 Modify Interface and Storage Nodes status


You can also modify the status of Interface and Storage Nodes in the following way:

Interface Nodes:
1. Suspend Node: This command suspends the Interface Node, BANS the CTDB on it, and disables the node. A banned node does not participate in the cluster and does not host any records for the CTDB. Its IP address is taken over by another node and no services are hosted.
Using the GUI: Refer to the point 2 from “Clusters” on page 312 section to view the
operations that you can perform on the Interface Nodes and Storage Nodes.
Using the CLI: Use the CLI command suspendnode. Example 10-6 below shows the
usage and command output.

Example 10-6 Command usage and output for CLI command suspendnode
[Furby.storage.tucson.ibm.com]$ suspendnode --help
usage: suspendnode nodeName [-c <cluster name or id>]
nodeName
Specifies the name or ip of the node for identification.
-c,--cluster <cluster name or id> define cluster

[Furby.storage.tucson.ibm.com]$ suspendnode int002st001 -c Furby.storage.tucson.ibm.com


EFSSG0204I The node(s) are suspended successfully!

2. Resume Node: This command resumes the Suspended Interface node. It UNBANS the
CTDB on that node and enables the node. The resumed node participates in the cluster
and hosts records for the clustered trivial database (CTDB). It takes back its IP address
and starts hosting services.
Using the GUI: Refer to the point 2 from “Clusters” on page 312 section to view the
operations that you can perform on the Interface Nodes and Storage Nodes.


Using the CLI: Use the CLI command resumenode. Example 10-7 below shows the usage and command output.

Example 10-7 Command usage and output for CLI command resumenode.
[Furby.storage.tucson.ibm.com]$ resumenode --help
usage: resumenode Node [-c <cluster name or id>]
Node
Specifies the name of the node for identification.
-c,--cluster <cluster name or id> define cluster

[Furby.storage.tucson.ibm.com]$ resumenode int002st001


EFSSG0203I The node(s) are resumed successfully!

Note: Recover Node and Restart Node cannot be done using the CLI. These operations can be done only using the GUI.

Storage Nodes
This section shows Storage Node commands.
1. Stop Node: This command unmounts the filesystem on that node and shuts down the GPFS daemon.
Using the GUI: Refer to point 3 from the “Clusters” on page 312 section to view the operations that you can perform on the Interface Nodes and Storage Nodes.
Using the CLI: This task cannot be run using the CLI. There is no command to perform this operation.
2. Start Node: This command starts the GPFS daemon on the selected Storage Node and mounts the filesystem on that node.
Using the GUI: Refer to point 3 from the “Clusters” on page 312 section to view the operations that you can perform on the Interface Nodes and Storage Nodes.
Using the CLI: This task cannot be run using the CLI. There is no command to perform this operation.

10.4 File system management


File system management is one of the essential tasks on the SONAS system. The file system created is a GPFS file system. Under this category there are many tasks that you can perform, from creating, mounting, unmounting, and deleting file systems to changing file system details, adding disks, and more. We describe some of the important and commonly used file system tasks in detail below:

10.4.1 Create a File system


Using the GUI: You can create the file system using the GUI by clicking the “File System” link or task under the “Files” category in the GUI. Upon clicking this link, a page opens on the right hand side with a table that lists the file systems that already exist. In our example, the filesystem gpfs0 already exists. Below this table is the “Create a File System” button. See Figure 10-46 below.


Figure 10-46 File system details in the Management GUI

To create the File System, click on the “Create a File System” button. A new panel will open
which asks you to enter the details like:
1. Select NSD: Here you select the NSDs you wish to add to the filesystem. At least one NSD must be defined. In case of replication, select at least two NSDs that belong to different failure groups, so that if one NSD fails the replica remains available for access. Select the NSD by clicking the check box on the left of the table (see Figure 10-47 below). Click on the next tab in the panel.

Figure 10-47 Select NSDs from list available in SONAS GUI to create filesystem

2. Basic Information: This is the next tab. Enter the “Mount point” for the file system and the “Device name” or “File System Name” you wish to create. Choose the block size from the list available in the “Block Size” drop down menu. You can use the “Force” option if you do not want GPFS to check whether the chosen NSD has already been used by another file system. It is advisable to use this option only if you are sure that the NSD is not currently being used by any file system and is free (see Figure 10-48 on page 347). Click on the next tab in the panel.


Figure 10-48 Enter basic information in SONAS GUI to create filesystem

3. Locking and access control lists (ACL): This tab is for the ACLs and the locking type. Currently only the NFSv4 locking type and NFSv4 ACL type are supported, and they are already chosen by default in the GUI. The drop down menus for both “Locking Type” and “ACL type” are therefore disabled. See Figure 10-49 on page 347 below. Click on the next tab.

Figure 10-49 Locking and ACL information to create filesystem

4. Replication: This tab allows you to choose whether you want replication enabled. If you enable replication, you need to select at least two NSDs, as mentioned above, and the two NSDs must belong to two different failure groups. The “Enable Replication Support” option enables replication support for all files and metadata in the file system. This setting cannot be changed once the file system has been created. The value for both the maximum data and metadata replicas is set to 2.
To set replication to true, select the “Enable Replication” check box. See Figure 10-50 below. Click on the next tab.


Figure 10-50 Replication information for creating new filesystem

5. Automount: This tab allows you to set Automount to true, which means that after every node restart the file system is automatically mounted on the nodes. If set to false, or not selected, the file system needs to be manually mounted on the nodes. See Figure 10-51 below. Click on the next tab.

Figure 10-51 Setting automount for the new filesystem

6. Limits: In this tab you need to enter the number of nodes you want the file system to be
mounted on and the maximum number of files that the filesystem can hold.
Enter the “Number of Nodes” in the text box available. This is estimated number of nodes
that will mount the file system. This is used as a best guess for the initial size of some file
system data structures. The default is 32. This value cannot be changed after the file
system has been created. When you create a GPFS file system, consider over estimating
the number of nodes that will mount the file system. GPFS uses this information for
creating data structures that are essential for achieving maximum parallelism in file
system operations. Although a large estimate consumes additional memory, under
estimating the data structure allocation can reduce the efficiency of a node when it
processes some parallel requests such as the allotment of disk space to a file. If you
cannot predict the number of nodes that will mount the file system, use the default value. If
you are planning to add nodes to your system, you should specify a number larger than
the default. However, do not make estimates that are not realistic, because specifying an
excessive number of nodes might have an adverse effect on buffer operations.
Enter the number of “Maximum number of Files” in the text box available. This will be the
maximum number of files that will be allowed to be created on this file system.
See Figure 10-52 below. Click the next tab.


Figure 10-52 Setting inode limits and maximum number of files for the new filesystem

7. Miscellaneous: Using this tab you can enable other options such as Quota, DMAPI, atime, and mtime. Check boxes are provided; select a check box to enable an option and clear it if you do not want the option. See Figure 10-53 on page 349 below.

Figure 10-53 Miscellaneous information for the file system

8. Final Step: Go through each tab again to verify that all the necessary parameters are selected. Once you have confirmed all the parameters for the filesystem, click the “OK” button. This button is on the lower end of the “Create File system” panel.
When clicked, the task begins and a Task Progress window appears that displays the task being performed and its details. When done, at the end of each task there should be a green check mark. If any error occurs, there will be a red cross (x) and an error message will appear. Check the error, correct it, and retry. When the task is completed, click the “Close” button to close the window. See Figure 10-54 below.


Figure 10-54 Task Progress bar for completion

Using the CLI: You can create a filesystem using the mkfs CLI command. The NSD name is mandatory and you need to enter at least one NSD. Set -R (replication) to none if you do not wish to enable replication. If you enable replication, you need to enter at least two NSDs that belong to different failure groups. The block size and replication factors chosen affect file system performance. Example 10-8 below shows the help and usage of the command. In the example, the block size was left at the default of 256 KB and replication was not enabled. A hedged sketch of a replicated invocation follows the example.

Example 10-8 mkfs command example


[Furby.storage.tucson.ibm.com]$ mkfs --help
usage: mkfs filesystem [mountpoint] [-b <blocksize>] [-c <cluster name or id>] [--dmapi |
--nodmapi] [-F <disks>] [-i <maxinodes>] [-j <blockallocationtype>] [--master] [-N
<numnodes>][--noverify] [--pool <arg>] [-R <replica>]
filesystem
The device name of the file system to be created. File system names need not be
fully-qualified.
mountpoint
Specifies the mount point directory of the GPFS file system.
-b,--blocksize <blocksize> blocksize
-c,--cluster <cluster name or id> define cluster
--dmapi enable the DMAPI support for HSM
-F,--disks <disks> disks
-i,--numinodes <maxinodes> Set the maximal number of inodes in the
file system.
-j,--blockallocationtype <blockallocationtype> blockallocationtype
--master master
-N,--numnodes <numnodes> numnodes
--nodmapi disable the DMAPI support for HSM
--noverify noverify
--pool <arg> pool
-R,--replica <replica> Sets the level of replication used in this
file system. Either none, meta or all

[Furby.storage.tucson.ibm.com]# mkfs gpfs1 --nodmapi -F array0_sata_60001ff0732f85c8c080008 -R


none --noverify
The following disks of gpfs1 will be formatted on node strg001st001:
array0_sata_60001ff0732f85c8c080008: size 15292432384 KB
Formatting file system ...
Disks up to size 125 TB can be added to storage pool 'system'.
Creating Inode File


3 % complete on Fri Apr 23 09:54:04 2010


5 % complete on Fri Apr 23 09:54:09 2010
7 % complete on Fri Apr 23 09:54:14 2010
9 % complete on Fri Apr 23 09:54:19 2010
11 % complete on Fri Apr 23 09:54:24 2010
13 % complete on Fri Apr 23 09:54:29 2010
15 % complete on Fri Apr 23 09:54:34 2010
17 % complete on Fri Apr 23 09:54:39 2010
19 % complete on Fri Apr 23 09:54:44 2010
21 % complete on Fri Apr 23 09:54:49 2010
23 % complete on Fri Apr 23 09:54:54 2010
25 % complete on Fri Apr 23 09:54:59 2010
27 % complete on Fri Apr 23 09:55:04 2010
29 % complete on Fri Apr 23 09:55:09 2010
31 % complete on Fri Apr 23 09:55:15 2010
33 % complete on Fri Apr 23 09:55:20 2010
35 % complete on Fri Apr 23 09:55:25 2010
37 % complete on Fri Apr 23 09:55:30 2010
39 % complete on Fri Apr 23 09:55:35 2010
41 % complete on Fri Apr 23 09:55:40 2010
43 % complete on Fri Apr 23 09:55:45 2010
45 % complete on Fri Apr 23 09:55:50 2010
47 % complete on Fri Apr 23 09:55:55 2010
48 % complete on Fri Apr 23 09:56:00 2010
50 % complete on Fri Apr 23 09:56:05 2010
52 % complete on Fri Apr 23 09:56:10 2010
54 % complete on Fri Apr 23 09:56:15 2010
56 % complete on Fri Apr 23 09:56:20 2010
58 % complete on Fri Apr 23 09:56:25 2010
60 % complete on Fri Apr 23 09:56:30 2010
62 % complete on Fri Apr 23 09:56:35 2010
64 % complete on Fri Apr 23 09:56:40 2010
66 % complete on Fri Apr 23 09:56:45 2010
67 % complete on Fri Apr 23 09:56:50 2010
69 % complete on Fri Apr 23 09:56:55 2010
71 % complete on Fri Apr 23 09:57:00 2010
73 % complete on Fri Apr 23 09:57:05 2010
75 % complete on Fri Apr 23 09:57:10 2010
77 % complete on Fri Apr 23 09:57:15 2010
79 % complete on Fri Apr 23 09:57:20 2010
81 % complete on Fri Apr 23 09:57:25 2010
82 % complete on Fri Apr 23 09:57:30 2010
84 % complete on Fri Apr 23 09:57:35 2010
86 % complete on Fri Apr 23 09:57:40 2010
88 % complete on Fri Apr 23 09:57:45 2010
90 % complete on Fri Apr 23 09:57:50 2010
92 % complete on Fri Apr 23 09:57:55 2010
94 % complete on Fri Apr 23 09:58:00 2010
96 % complete on Fri Apr 23 09:58:05 2010
97 % complete on Fri Apr 23 09:58:10 2010
99 % complete on Fri Apr 23 09:58:15 2010
100 % complete on Fri Apr 23 09:58:16 2010
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map


Formatting Allocation Map for storage pool 'system'


60 % complete on Fri Apr 23 09:58:31 2010
100 % complete on Fri Apr 23 09:58:34 2010
Completed creation of file system /dev/gpfs1.
EFSSG0019I The filesystem gpfs1 has been successfully created.

EFSSG0038I The filesystem gpfs1 has been successfully mounted.

EFSSG0015I Refreshing data ...


[Furby.storage.tucson.ibm.com]#
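As a further illustration, the following is a hedged sketch, not taken from the test environment, of creating a replicated file system. The disk names and the comma-separated list passed to -F are assumptions; the two NSDs are assumed to belong to different failure groups so that -R all can protect both data and metadata:

mkfs gpfs3 /ibm/gpfs3 -F array0_sata_disk1,array1_sata_disk2 -R all

With two replicas, a replicated file system consumes roughly twice the raw capacity, so enable replication only where the availability requirement justifies the cost.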

10.4.2 List the Filesystem status


In this section we show how to list the filesystem status.

Using the GUI: When clicking the “File System” link or task from the “Files” category, you can see the table that lists all the filesystems in the cluster. By default the first file system is highlighted or selected. The details of the filesystem are shown in the lower section of the panel. You can see File System, Disk, and Usage information. If you want to look at the details of another file system, select it so that it is highlighted. More information can be found in point 1 of “Files” on page 319.

Using the CLI: You can view the status of the file systems using the CLI command lsfs. The command displays the file system names, mount point, quota, block size, ACL types, replication details, and more. See the usage and command output in Example 10-9 below; a brief sketch of the -d option follows the example.

Example 10-9 Command usage and output for the CLI command lsfs
[Furby.storage.tucson.ibm.com]$ lsfs --help
usage: lsfs [-c <cluster name or id>] [-d <arg>] [-r] [-Y]
-c,--cluster <cluster name or id> define cluster
-d,--device <arg> define device
-r,--refresh refresh list
-Y format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsfs
Cluster Devicename Mountpoint Type Remote device Quota Def. quota
Blocksize Locking type ACL type Inodes Data replicas Metadata replicas Replication policy Dmapi Block
allocation type Version Last update Master
Humboldt.storage.tucson.ibm.com gpfs0 /ibm/gpfs0 local local user;group;fileset
256K nfs4 nfs4 100.000M 1 2 whenpossible F scatter
11.05 4/23/10 5:15 PM YES
Humboldt.storage.tucson.ibm.com gpfs2 /ibm/gpfs2 local local user;group;fileset 64K
nfs4 nfs4 14.934M 1 1 whenpossible F scatter
11.05 4/23/10 5:15 PM NO
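A minimal sketch of the -d option shown in the usage above, which restricts the listing to a single device (output not shown):

lsfs -d gpfs0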

10.4.3 Mount the File system


In this section we show how to mount the file system.

Using the GUI: You can mount the file system you have created using the GUI by clicking the filesystem that you want to mount and then clicking the “Mount” button. The task then asks you to choose the nodes on which you want to mount the filesystem.


You can either choose “Mount on all nodes” or “Choose nodes” from the drop down menu, as seen in Figure 10-55.

Figure 10-55 Select to mount the file system on all or selective nodes

Choosing to mount on selected nodes requires you to select the nodes on which you want to mount the file system. The window seen is similar to Figure 10-56.

Figure 10-56 Select the nodes if tomount on selective nodes

When done, click OK in the same window. The filesystem is then mounted on the nodes specified. The task progress window displays the progress and, once successful, shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-57 on page 353 below.

Figure 10-57 Task Progress bar for completion

If successful, close the window by clicking the “Close” button. The window disappears and you are brought back to the main File Systems page. The table on the main File System page should now show the filesystem as mounted on the selected number of nodes.

Using the CLI: You can mount the filesystem using the CLI command mountfs. The command allows you to choose to mount the file system on all the nodes or on specific interface nodes. The usage and command output are displayed below in Example 10-10. In the example, the filesystem gpfs2 is mounted on all nodes and hence the -n option is omitted. A brief sketch of mounting on specific nodes follows the example.


Example 10-10 Command usage and output for the CLI command mountfs
[Furby.storage.tucson.ibm.com]$ mountfs --help
usage: mountfs filesystem [-c <cluster name or id>] [-n <nodes>]
filesystem
Identifies the file system name of the file system. File system names need not be
fully-qualified.
-c,--cluster <cluster name or id> define cluster
-n,--nodes <nodes> nodes
[Furby.storage.tucson.ibm.com]$ mountfs gpfs2
EFSSG0038I The filesystem gpfs2 has been successfully mounted.
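A hedged sketch of mounting on specific interface nodes only, using the -n option from the usage above; the node names are those shown by lsnode earlier in this chapter, and the comma separator is an assumption:

mountfs gpfs2 -n int001st001,int002st001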

10.4.4 Unmount the File system


Using the GUI: You can unmount the file system using the GUI by clicking the filesystem that you want to unmount and then clicking the “Unmount” button. The task then asks you to choose the nodes from which you want to unmount the filesystem. You can either choose “Unmount on all nodes” or “Choose nodes” from the drop down menu, as seen in Figure 10-58 below.

Figure 10-58 Select if to be unmounted from all or selective nodes

Choosing to unmount on selected nodes requires you to select the nodes from which you want to unmount the file system. The window seen is shown below in Figure 10-59.

Figure 10-59 Select nodes to unmount from

When done, click OK in the same window. The file system is then unmounted from the nodes specified. The task progress window displays the progress and, once successful, shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-60 below.


Figure 10-60 Task Progress bar for completion

Once the operation has completed, close the window by clicking the “Close” button. The window disappears and you are brought back to the main File Systems page. The table on the main File System page should now show the filesystem as unmounted from the selected nodes.

Using the CLI: You can unmount the filesystem using the CLI command unmountfs. The command allows you to choose to unmount the file system on all the nodes or on specific interface nodes. The usage and command output are displayed in Example 10-11. In the example, the filesystem gpfs2 is unmounted on all nodes and hence the -n option is omitted.

Example 10-11 Command usage and output for the CLI command unmountfs.
[Furby.storage.tucson.ibm.com]$ unmountfs --help
usage: unmountfs filesystem [-c <cluster name or id>] [-n <nodes>]
filesystem
Specifies the name of the filesystem for identification.
-c,--cluster <cluster name or id> define cluster
-n,--nodes <nodes> nodes

[Furby.storage.tucson.ibm.com]$ unmountfs gpfs2


EFSSG0039I The filesystem gpfs2 has been successfully unmounted.

10.4.5 Modify the File system configuration


Using the GUI: The SONAS GUI allows you to modify the configuration of a filesystem that has already been created. Some of the parameters require that the filesystem is unmounted, while others can be changed while it is still mounted.

The lower section of the File System panel, which also displays the status, disks, and usage information of the filesystem, has check boxes and text boxes that can be edited to modify some parameters.
1. Modifying the “File system Configuration” parameters: As shown in Figure 10-61 below, the text box for “Number of iNodes” and the drop down menus for “Locking Type” and “ACL type” can be modified while the filesystem is still mounted.


Figure 10-61 Panel to view and modify the Filesystem details

The three check boxes, “Enable Quota”, “Suppress atime”, and “Exact mtime”, require the filesystem to be unmounted. In Figure 10-61 above, they are shown with a red asterisk (*).

After modifying the parameters, click the “OK” button for the task to proceed. The task bar shows the progress of the operation and, once successful, shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-62 on page 356. Click the “Close” button to close the window.

Figure 10-62 Task Progress bar for completion

2. Modifying the “Disks” for the File System: You can add disks to or remove disks from the filesystem. The file system must have at least one disk.


a. Adding New Disks: You can add more disks by clicking the “Add Disk to the file system” button. A new window appears listing the free disks, from which you can choose the disk to add. Choose the “disk type”. You can also specify the “Failure Group” and “Storage pool” of the disk when adding it. When done, click OK. See Figure 10-63 below.

Figure 10-63 Select disk to add to the file system

The task progress bar appears, showing the progress of the operation, and once successful it shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-64 on page 357 below. The new disk is then successfully added. Click the “Close” button to close the window.

Figure 10-64 Task Progress bar for completion

b. Remove Disks: You can also remove a disk by selecting the disk you want to delete and clicking the “Remove” button in the panel in the lower section of the File System page, as shown below in Figure 10-65 on page 358.


Figure 10-65 Select the disk to be removed from the list of disks for the file system selected

On clicking the “Remove” button, a new window will appear which asks for confirmation
to remove the disk as shown below in Figure 10-66 on page 358.

Figure 10-66 Confirmation for removal of disks

To confirm, click the “OK” button. The task progress bar appears, showing the progress of the operation, and once successful it shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-67 below. The disk is then successfully removed. Click the “Close” button to close the window.

Figure 10-67 Task Progress bar for completion

Using the CLI: You can change the filesystem parameters using the command chfs. Example 10-12 below describes the usage and shows the command output of chfs used to add a new disk to the filesystem. A brief sketch of further chfs changes follows the example.

Example 10-12 Command usage and output for changing properties of file system by adding disk.
[Furby.storage.tucson.ibm.com]$ chfs --help


usage: chfs filesystem [--add <disks> | --noverify | --pool <arg>] [--atime


<{exact|suppress}>] [-c <cluster name or id>] [--force | --remove <disks>] [-i <maxinodes>]
[--master | --nomaster] [--mtime <{exact|rough}>][-q <{enable|disable}>] [-R
<replica>]
filesystem
The device name of the file system to be changed. File system names need not be
fully-qualified.
--add <disks> Adds disks to the file system.
--atime <{exact|suppress}> If set to exact the file system will stamp access times
on every access to a file or directory. Otherwise access times will not be recorded.
-c,--cluster <cluster name or id> define cluster
--force enforce disk removal without calling back the user
-i,--numinodes <maxinodes> Set the maximal number of inodes in the file system.
--master master
--mtime <{exact|rough}> If set to exact the file or directory modification times
will be updated immediately. Otherwise modification times will be updated after a several second
delay.
--nomaster nomaster
--noverify noverify
--pool <arg> pool
-q,--quota <{enable|disable}> Enables or disables quotas for this file system.
-R,--replica <replica> Sets the level of replication used in this file system.
Either none, meta or all
--remove <disks> Removes disks from the file system.

[Furby.storage.tucson.ibm.com]$ chfs gpfs2 --add array0_sata_60001ff0732f85c8c080008


The following disks of gpfs2 will be formatted on node strg001st001:
array0_sata_60001ff0732f85c8c080008: size 15292432384 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
9 % complete on Mon Apr 26 12:14:19 2010
10 % complete on Mon Apr 26 12:14:24 2010
18 % complete on Mon Apr 26 12:14:29 2010
26 % complete on Mon Apr 26 12:14:34 2010
27 % complete on Mon Apr 26 12:14:39 2010
35 % complete on Mon Apr 26 12:14:44 2010
43 % complete on Mon Apr 26 12:14:49 2010
44 % complete on Mon Apr 26 12:14:55 2010
52 % complete on Mon Apr 26 12:15:00 2010
53 % complete on Mon Apr 26 12:15:05 2010
61 % complete on Mon Apr 26 12:15:10 2010
62 % complete on Mon Apr 26 12:15:15 2010
70 % complete on Mon Apr 26 12:15:20 2010
71 % complete on Mon Apr 26 12:15:25 2010
77 % complete on Mon Apr 26 12:15:30 2010
83 % complete on Mon Apr 26 12:15:35 2010
90 % complete on Mon Apr 26 12:15:40 2010
95 % complete on Mon Apr 26 12:15:45 2010
100 % complete on Mon Apr 26 12:15:49 2010
Completed adding disks to file system gpfs2.
mmadddisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
EFSSG0020I The filesystem gpfs2 has been successfully changed.
EFSSG0015I Refreshing data ...
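Beyond adding disks, chfs also covers the other changes described in the GUI section. The following is a hedged sketch using only options shown in the usage above; the disk name is the one added in Example 10-12, and remember that some changes, such as enabling quotas, require the file system to be unmounted:

chfs gpfs2 -q enable
chfs gpfs2 --remove array0_sata_60001ff0732f85c8c080008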


10.4.6 Delete File system


Using the GUI: To delete an existing file system from the cluster, click the “File system” link in the “Files” category on the left panel. Select the filesystem you want to delete and click the “Remove” button. A window asking for your confirmation appears. Click the “OK” button if you are sure. See Figure 10-68 on page 360 below.

Note: Make sure that the file system is unmounted at this point.

Figure 10-68 Confirmation to delete the filesystem

Once you have confirmed, the operation is carried out. The task progress bar appears, showing the progress of the operation, and once successful it shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-69 on page 360 below. The file system is then successfully removed. Click the “Close” button to close the window.

Figure 10-69 Task Progress bar for completion

Using the CLI: You can delete an existing file system from the cluster using the CLI command rmfs. The command usage and output are shown below in Example 10-13.

Example 10-13 Command usage and output for removing the file system.
[Furby.storage.tucson.ibm.com]$ rmfs --help
usage: rmfs filesystem [-c <cluster name or id>] [--force]
filesystem
The device name of the file system to contain the new fileset. File system names need
not be fully-qualified.
-c,--cluster <cluster name or id> define cluster
--force enforce operation without calling back the user

[Furby.storage.tucson.ibm.com]$ rmfs gpfs2


Do you really want to perform the operation (yes/no - default no): yes
All data on following disks of gpfs2 will be destroyed:
array1_sata_60001ff0732f85f8c0b000b
Completed deletion of file system /dev/gpfs2.
mmdelfs: Propagating the cluster configuration data to all
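The --force option shown in the usage skips the interactive confirmation, which can be useful in scripts; a minimal sketch:

rmfs gpfs2 --force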


10.4.7 Master and Non-Master file system


A master file system is a special type of file system. At least one file system must be designated as the master file system; it is used for split-brain detection. Only a single filesystem should be the master file system, and the master role is moved if another file system already has it. Split-brain detection and node failover for NFS will not work properly without a master file system.

More about Master File system is explained in the CTDB section of the Appendix A,
“Additional component detail” on page 475.

Using the GUI: As of now, you cannot create a master filesystem from the GUI.

Using the CLI: A master file system can be created using the CLI command mkfs and the option --master.

When creating the first file system in the cluster, it is automatically set as master even if --master has not been specified. You can make the first filesystem non-master by creating the file system with the mkfs command and using the --nomaster flag.

See Example 10-8 on page 350, which shows the creation of a file system using the CLI command mkfs, for more about the mkfs command.
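A minimal sketch of explicitly designating a new file system as the master at creation time, reusing the mkfs options shown in Example 10-8; the disk name is hypothetical:

mkfs gpfs3 /ibm/gpfs3 -F array1_sata_disk1 --master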

10.4.8 Quota Management for File systems


The SONAS file system enables you to add quotas to the file systems and to the users and groups that exist on the system. You can set the quota for a user, a group, or a fileset. Soft limits are subject to reporting only, while hard limits are enforced by the file system. As of now, the GUI only allows you to view the quotas that are enabled; you need to set quotas using the CLI command setquota. For setting quotas, the file system must have the quota option enabled. You can enable it using the GUI or the CLI command chfs; the file system must be unmounted while making this change.

Quota tasks are discussed below:


1. View or List quota
Using the GUI: You can list the quotas using the GUI by clicking the “Quota” link under the “Files” category. See point 5 under “Files” on page 319 for more explanation; that section explains viewing quotas from the GUI.
Using the CLI: You can view the quota from the CLI using the CLI command lsquota. The
command retrieves data regarding the quota managed by the management node from the
database and returns a list in either a human-readable format or in a format that can be
parsed.
Example 10-14 below shows the usage and the command output for the lsquota
command.

Example 10-14 Command usage and output for CLI command lsquota.
[Furby.storage.tucson.ibm.com]$ lsquota --help
usage: lsquota [-c <cluster name or id>] [-r] [-Y]
-c,--cluster <cluster name or id> define cluster
-r,--refresh refresh list
-Y format output as delimited text


[Furby.storage.tucson.ibm.com]$ lsquota
Cluster Device SL(usage) HL(usage) Used(usage) SL(inode) HL(inode) Used(inode)
Furby.storage.tucson.ibm.com gpfs0 --- --- 16 kB --- --- 1
Furby.storage.tucson.ibm.com tms0 --- --- 13.27 MB --- --- 135
Furby.storage.tucson.ibm.com tms0 --- --- 13.27 MB --- --- 135
Furby.storage.tucson.ibm.com tms0 --- --- 13.27 MB --- --- 135
Furby.storage.tucson.ibm.com gpfs0 --- --- 832 kB --- --- 23
Furby.storage.tucson.ibm.com gpfs0 --- --- 832 kB --- --- 23
Furby.storage.tucson.ibm.com gpfs0 --- --- 816 kB --- --- 22

Note: The actual command output displayed on the screen has many more fields than shown in this example. This example has been simplified to ensure the important information is clear.

2. Set Quota
Using the GUI: You cannot set quotas from the GUI. The GUI shows only a read-only representation of the quota management.

Using the CLI: You can set the quota for the filesystem using the CLI command setquota.
This command sets the quota for a user, a group, or a file set. Soft limits are subject to
reporting only; hard limits will be enforced by the file system. Disk area sizes are either
without suffix (byte) or with "k" (kilobyte), "m" (megabyte), "g" (gigabyte), "t" (terabyte), or
"p" (petabyte). These values are not case sensitive. The effective quotas are passed in
kilobytes and matched to block sizes. I-node limits accept only "k" and "m" suffixes. The
maximal value for i-node limits is 2 GB.

Warning: Setting the quota does not update the database, because the refresh takes too much time. If you want to see the result immediately with the lsquota command, invoke it using the -r option (lsquota -r).

Example 10-15 below shows the command usage and the output for the CLI command setquota. In the example, we set hard and soft limits for disk usage for the user “eebenall” from the domain “STORAGE3” on the file system gpfs0. A further sketch, setting a fileset quota, follows the example.

Example 10-15 Command usage and output of CLI command setquota


[Furby.storage.tucson.ibm.com]$ setquota --help
usage: setquota device [-c <cluster name or id>] [-g <arg>] [-h <arg>] [-H <arg>] [-j <arg>]
[-S <arg>] [-s <arg>] [-u <arg>]
device
The mount point or device of the filesystem.
-c,--cluster <cluster name or id> define cluster
-g,--group <arg> name of the group
-h,--hard <arg> hardlimit of the disk usage in bytes, KB, MB, GB, TB or
PB
-H,--hardinode <arg> hardlimit of the inodes in bytes, KB or MB
-j,--fileset <arg> name of the fileset
-S,--softinode <arg> softlimit of the inodes in bytes, KB or MB
-s,--soft <arg> softlimit of the disk usage in bytes, KB, MB, GB, TB or
PB
-u,--user <arg> name of the user


accepted postfixes:
'k' : kiloByte, 'm' :MegaByte, 'g' : GigaByte, 't' : TeraByte, 'p' : PetaByte

[Furby.storage.tucson.ibm.com]$ setquota gpfs0 -u STORAGE3\\eebenall -h 400g -s 200g


EFSSG0040I The quota has been successfully set.
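As a further hedged sketch, the following sets soft and hard inode limits on a fileset (the fileset name newfileset is the one created in 10.4.9) and then refreshes the quota listing immediately with -r, as recommended in the warning above:

setquota gpfs0 -j newfileset -S 100k -H 200k
lsquota -r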

10.4.9 Fileset management


The SONAS system allows you to create filesets. A fileset is a group of files, created inside an existing file system. Filesets are similar to filesystems in some ways, as you can perform file system operations on them: you can replicate them, set quotas, and also create snapshots. Filesets are not mounted but are linked or unlinked. You can link a fileset to a directory; this function creates a junction, or link. The directory to which you link the fileset must not be an existing directory. It is created when you link the fileset and deleted when you unlink it. More is explained in further sections. Other tasks you can perform on a fileset are view, create, remove, link, and unlink. Let us see each in detail below.
1. View or List Filesets
Using the GUI: You can view the filesets created and their information by clicking the “Filesets” link under the “Files” category. This is already described in point 4 under “Files” on page 319.
Using the CLI: You can view the filesets in the cluster by using the CLI command lsfset. This command lists all the filesets along with their details. In the example below, you can also see an additional fileset, “newfileset”, along with the default root fileset. Example 10-16 shows the command usage and output of the CLI command lsfset.

Example 10-16 Usage and output for the CLI command lsfset
[Furby.storage.tucson.ibm.com]$ lsfset --help
usage: lsfset device [-c <cluster name or id>] [-r] [-Y]
device
The device name of the file system to contain the fileset. File system names need not
be fully-qualified.
-c,--cluster <cluster name or id> define cluster
-r,--refresh refresh list
-Y format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsfset gpfs0


ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 4/21/10 4:39 PM root fileset 4/26/10 5:10 PM
1 newfileset Unlinked -- 4/26/10 10:33 AM this is a test fileset 4/26/10 5:10 PM

2. Create Filesets
Using the GUI: You can create a fileset by clicking the “Create a Fileset” button on the main Filesets page. This opens a new window asking for the details of the fileset, such as “Name” and an optional “Comment”. Click “OK” when done. The task creates the fileset, and the newly created fileset is displayed in the table of all the filesets. You can click on it to see its details. At this point, the fileset is not linked to any directory and cannot be used to store data. You need to link the fileset, similar to mounting a file system, before using it to store data. Figure 10-70 on page 364 below shows the dialog box for creating a fileset.


Figure 10-70 Creating a new fileset

The task progress bar appears, showing the progress of the operation, and once successful it shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-71 below. The new fileset is then successfully created. Click the “Close” button to close the window.

Figure 10-71 Task bar showing progress of creating fileset

Using the CLI: You can create the fileset using the CLI command mkfset. This command constructs a new fileset with the specified name. The new fileset is empty except for a root directory, and does not appear in the directory namespace until the linkfset command is issued to link the fileset. The command usage and output are shown below in Example 10-17. In the example, we create a new fileset called “newfileset” in the gpfs0 file system. This fileset is not yet linked, so the “Path” column shows no value. We can verify that the fileset was created successfully with the lsfset command, whose output is also shown in the example.

Example 10-17 Command usage and output for CLI command mkfset and lsfset.
[Furby.storage.tucson.ibm.com]$ mkfset --help
usage: mkfset device filesetName [-c <cluster name or id>] [-t <comment>]
device
The device name of the file system to contain the new fileset. File system names need
not be fully-qualified.
filesetName
Specifies the name of the newly created fileset.
-c,--cluster <cluster name or id> define cluster
-t <comment> comment

[Furby.storage.tucson.ibm.com]$ mkfset gpfs0 newfileset -t “This is a new Fileset”


EFSSG0070I Fileset newfileset created successfully!


[Furby.storage.tucson.ibm.com]$ lsfset gpfs0


ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 2:06 AM
1 newfileset Unlinked -- 5/5/10 2:06 AM this is a new fileset 5/5/10 2:06 AM

3. Link Filesets
When the filesets are linked, a junction is created. The junction is a special directory entry,
much like a POSIX hard link, that connects a name in a directory of one file set, the
parent, to the root directory of a child file set. From the user’s viewpoint, a junction
always appears as if it were a directory, but the user is not allowed to issue the unlink or
rmdir commands on a junction. Instead, the unlinkfset command must be used to remove
a junction. As a prerequisite, the file system must be mounted and the junction path
must be under the mount point of the file system.
Using the GUI: When you create a fileset, it is not linked by default. You need to manually link it to a directory that does not already exist. In the GUI, when you click the new fileset that you have created in the table, the section below in the Filesets panel displays the information about the fileset. In our example, we have created a new fileset called “newfileset” which is not yet linked.
The lower section displays the details like Name, Status, and more. Along with this, if the
fileset is not yet linked, the button “Link” is enabled. You can click the button “Link”. A new
window opens asking for the path. Click “OK” when done and the fileset will then be linked
to this directory. See Figure 10-72 below.

Figure 10-72 Details of the fileset created is seen. The fileset is currently not linked

If the fileset is already linked, the “Unlink” button is enabled and the text box for the path and the “Link” button are disabled.
In our example, we now link the fileset to a path “/ibm/gpfs0/redbook”. The fileset
“newfileset” is now linked to this path. See Figure 10-73 on page 366 below which shows
the dialog box that opens to enter the path to link the fileset.


Figure 10-73 Linking the fileset to the path /ibm/gpfs0/redbook

The task progress bar appears. Click “Close” when the task completes successfully. The details of the fileset are then seen as in Figure 10-74 on page 366 below.

Figure 10-74 Fileset details after linking fileset

Using the CLI: You can link the fileset using the CLI linkfset command. The command links the fileset to the directory specified; this directory is the “junctionPath” in the command. In Example 10-18 below, the fileset “newfileset” created on filesystem gpfs0 is linked, and lsfset is run to confirm that the fileset is linked.

Example 10-18 Linking fileset using CLI command linkfset. lsfset verifies the link.
[Furby.storage.tucson.ibm.com]$ linkfset --help
usage: linkfset device filesetName [junctionPath] [-c <cluster name or id>]
device
The device name of the file system to contain the new fileset. File system names need
not be fully-qualified.
filesetName
Specifies the name of the fileset for identification.
junctionPath
Specifies the name of the junction. The name must not refer to an existing file system
object.
-c,--cluster <cluster name or id> define cluster

[Furby.storage.tucson.ibm.com]$ linkfset gpfs0 newfileset /ibm/gpfs0/redbook


EFSSG0078I Fileset newfileset successfully linked!

[Furby.storage.tucson.ibm.com]$ [root@st002.mgmt001st002 ~]# lsfset gpfs0


ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 3:10 AM


1 newfileset Linked /ibm/gpfs0/redbook 5/5/10 2:06 AM this is a new fileset 5/5/10 3:10 AM

4. Unlink Filesets
Using the GUI: You can unlink the fileset by clicking the “Unlink” button. From the table that lists all the filesets, click the fileset you wish to unlink. Upon clicking the fileset, you see the fileset details below the table, including the “Unlink” button. See Figure 10-74 on page 366, which displays the details of the fileset and the Unlink button. When you click this button, a new window opens asking for confirmation. See Figure 10-75 below.

Figure 10-75 Confirm to unlink fileset

Click “OK” to confirm. The task progress bar appears. Click “Close” when the task completes successfully. The fileset is then successfully unlinked.
Using the CLI: You can unlink the fileset using the CLI command unlinkfset. The command unlinks a linked fileset; the specified fileset must exist in the specified file system. See the command usage and output in Example 10-19 below. The example uses the fileset “newfileset” created on filesystem gpfs0, and also shows the output of the lsfset command confirming that the fileset was unlinked.

Example 10-19 Command usage and output for unlinking fileset using CLI Command unlinkfset and lsfset to verify
[Furby.storage.tucson.ibm.com]$ unlinkfset --help
usage: unlinkfset device filesetName [-c <cluster name or id>] [-f]
device
The device name of the file system to contain the new fileset. File system names need
not be fully-qualified.
filesetName
Specifies the name of the fileset for identification.
-c,--cluster <cluster name or id> define cluster
-f force

[Furby.storage.tucson.ibm.com]$ unlinkfset gpfs0 newfileset


EFSSG0075I Fileset newfileset unlinked successfully!

[Furby.storage.tucson.ibm.com]$ lsfset gpfs0


ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 3:26 AM
1 newfileset Unlinked -- 5/5/10 2:06 AM this is a new fileset 5/5/10 3:26 AM

5. Remove Filesets
Using the GUI: You can remove a fileset by selecting the fileset you want to delete and clicking the “Delete” button. The task opens a new window asking for confirmation before deleting (see Figure 10-76).


Figure 10-76 Delete file set confirmation

Click “OK” to confirm. The task progress bar appears, showing the progress of the operation, and once successful it shows green check marks. If any error occurs, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-77 below. The fileset is then successfully deleted. Click the “Close” button to close the window.

Figure 10-77 Task bar showing progress of deleting fileset

Using the CLI: You can delete a fileset using the CLI command rmfset. The command asks for confirmation and, on confirmation, deletes the specified fileset. The rmfset command fails if the fileset is currently linked into the namespace. By default, the rmfset command also fails if the fileset contains any contents except for an empty root directory. The root fileset cannot be deleted.
Example 10-20 below shows the command usage and output for deleting a fileset; the fileset used is “newfileset” created on filesystem gpfs0.

Example 10-20 rmfset command example


[Furby.storage.tucson.ibm.com]$ rmfset --help
rmfset usage: rmfset device filesetName [-c <cluster name or id>] [-f] [--force]
device
The device name of the file system to contain the new fileset. File system names need
not be fully-qualified.
filesetName
Specifies the name of the fileset for identification.
-c,--cluster <cluster name or id> define cluster
-f Forces the deletion of the file set. All file set
contents are deleted. Any child file sets are first unlinked.
--force enforce operation without calling back the user

[Furby.storage.tucson.ibm.com]$ rmfset gpfs0 newfileset


Do you really want to perform the operation (yes/no - default no): yes
EFSSG0073I Fileset newfileset removed successfully!


[Furby.storage.tucson.ibm.com]$ lsfset gpfs0


ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 4:25 A

10.5 Creating and managing exports


Data stored in directories, filesets, and file systems can be accessed using data access protocols such as CIFS, NFS, and FTP. For this, you need to configure the services and also create shares, or exports, on the GPFS filesystem, through which you can then access the data using any of the above protocols. As of now, SONAS does not support HTTP and SCP.

Services are configured during the installation and configuration of the SONAS system. Once the services are configured, you can share your data using any of the protocols by creating exports using the command line or the GUI.

You can add more protocols to an export if they are not already enabled, and you can remove protocols from an export if you do not wish to export the data using that service or protocol. You can also activate and deactivate an export, and finally you can delete an existing export. All of this is explained later in this section.

Click on the “Exports” link under the “Files” category on the SONAS GUI to view and manage
all the exports in SONAS. A table that lists all existing exports is seen. Read point 2 from
“Files” on page 319 for more on the Exports configuration page.

Note: As of now, IBM SONAS supports FTP, CIFS, and NFS exports. Even though the GUI and the CLI command both show options for adding HTTP and SCP exports, these are not officially supported yet.

10.5.1 Create Exports


Using the GUI: Click the “Add” button to create a new export. This opens a new page which asks for more details of the export, such as the name of the export that will be seen by the end users; this is the “sharename”. You also have to enter the “Directory path” that you want to export. This is the actual data directory that you wish to export to the end users. If the directory does not exist, it is created; however, the path up to the directory to be exported must already exist.

You can also assign an owner name, so that the owner gets the required ACLs to access this directory. The last step is to identify the protocols by which you want to share this directory. You can choose any or all of CIFS, FTP, and NFS. Click Next when done. See Figure 10-78 on page 370 below.


Figure 10-78 Panel to create a new export. Provide the sharename, pathname, owner and services

A new page opens which asks you for protocol-related information. Each protocol is described here:
1. FTP: FTP does not take any parameters during its configuration. Proceed by clicking Next, as shown in Figure 10-79 below.

Figure 10-79 Panel that shows that directory in path given is created but has default ACLs


Note: The warning message here appears because the folder in the directory path mentioned does not exist. However, this directory is created by this operation. The warning message informs you that the directory has been created and has the default ACLs, which need to be modified if required.

2. NFS Export: NFS exports are accessed per client or host, not per user. Hence, you need to specify which hosts or clients can access the NFS export. On the new page that opens, you need to add the client details in the “Client settings” section as follows:
– Client Name: Add the name of a host that can access the export. You can enter individual hostnames or “*” for all clients/hosts.
– Read Only: Check this box if you want the clients to have read-only access. Unchecking it gives the clients both read and write access.
– Sync: Check this box if you want replies to requests only after the changes are committed to stable storage.
– Root Squash: This option maps requests from uid/gid 0 to the anonymous uid/gid.
Click the “Add Client” button to add the client. When added, it appears in the table that displays all clients for the NFS export. Now click the “Next” button. See Figure 10-80 below.

Figure 10-80 NFS configuration panel. Add client and other properties

3. CIFS Export: The CIFS configuration takes parameters as follows:


– Comment: This can be any user-defined comment.
– Browsable: If checked, this check box allows the export to be visible in the “net view” command output and in the browse list.
– ACL / Access Rights: If checked, the export has read-only access only.
See Figure 10-81 on page 372 below.

Figure 10-81 Panel for CIFS configuration

Click on the “Next” button to proceed. The next page is the Final page which asks for
confirmation before configuring the exports. See Figure 10-82 below.

Figure 10-82 Final configuration page

Click the “Finish” button to confirm. Click “Back” to go back and make changes. Click
“Cancel” to cancel the creation of the export; this brings you back to the main “Exports” page.

Once you have confirmed and clicked Finish, the task is carried out. The task progress bar
appears and shows the progress of the operation; when successful, it shows green
check marks. If there is an error, the error message is shown and the window shows a
red cross sign (x); check the logs, correct the problem and retry. See
Figure 10-83 on page 373 below. The new export is created successfully. Click the
“Close” button to close the window.


Figure 10-83 Task Progress bar for completion

The newly created exports will be added to the table on the main page of the exports.

Using the CLI: You can create an export using the CLI command mkexport. This command
takes the “sharename” and the “directory path” of the share you wish to create.
You can create FTP, CIFS and NFS shares with this command. An FTP share does not need any
parameters; CIFS and NFS take some parameters. Using the command you can also create
an “inactive” share. An inactive share is one whose creation is complete but which
cannot be used by the end users. By default, the share is active. You can also specify an
“owner”, which gives that user the ACLs required to access the share.

The command usage and output are shown in Example 10-21 below. In this example, an export
with the FTP, CIFS and NFS protocols is created.

Example 10-21 Command usage and output for creating export using CLI command mkexport
[Furby.storage.tucson.ibm.com]$ mkexport --help
usage: mkexport sharename path [-c <cluster name or id>] --cifs <CIFS options> | --ftp |
--http | --nfs <NFS client definition> | --scp [--inactive][--owner <owner>]
sharename
Specifies the name of the newly created export.
path
Specifies the name of the path which will be share.
-c,--cluster <cluster name or id> define cluster
--cifs <CIFS options> enable CIFS protocol [using CIFS options]
--ftp enable FTP protocol
--http enable HTTP protocol
--inactive share is inactive
--nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)]
--owner <owner> directory owner
--scp

[Furby.storage.tucson.ibm.com]$ mkexport shared /ibm/gpfs0/shared --ftp --nfs "*(rw,no_root_squash,async)" --cifs browseable=yes,comment="IBM SONAS" --owner "SONASDM\eebanell"


You can also create an inactive share using the --inactive option of the mkexport command.
This cannot be done from the GUI.
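As a minimal sketch, the following command would create a CIFS-only export that starts out
inactive; the share name and directory path used here are hypothetical:

[Furby.storage.tucson.ibm.com]$ mkexport projects /ibm/gpfs0/projects --cifs browseable=yes --inactive

You would then expect lsexport to list the new export with Active set to false until it is
activated with chexport --active, as described later in this section.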

10.5.2 List and view status of exports created


Using the GUI: You can view the exports that have been created by clicking the “Exports”
link under the “Files” category in the left panel. More about the listing of exports is
covered in point 2 of section “Files” on page 319.

Using the CLI: You can list the exports or shares using the CLI command lsexport. This
command lists all the exports, with one entry for each protocol for which they are created.
Example 10-22 below shows the command usage and output.

Example 10-22 Command usage and output for listing exports using CLI command lsexport.
[Furby.storage.tucson.ibm.com]$ lsexport --help
usage: lsexport [-c <cluster name or id>] [-r] [-v] [-Y]
-c,--cluster <cluster name or id> define cluster
-r,--refresh refresh list
-v,--verbose extended list
-Y format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsexport
Name Path Protocol Active Timestamp
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 FTP true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 HTTP true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 NFS true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 CIFS true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 SCP true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 FTP true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 HTTP true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 NFS true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 CIFS true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 SCP true 4/28/10 11:03 AM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 FTP true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 HTTP true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 NFS true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 CIFS true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 SCP true 4/28/10 18.38 PM

10.5.3 Modify exports


Using the GUI: You can modify an export by adding more services or protocols to the export
or by changing protocol parameters for the existing export. You cannot delete protocols using
this button.
1. Add Protocols: If you have already created an export for FTP access and would like to
provide it with NFS and CIFS too, click the existing export in the table
that lists the exports and click the “Modify” button.
This opens a new window asking for the protocols you wish to add to the existing export.
See Figure 10-84 below.


Figure 10-84 Panel to add new protocols to the export already existing

As you can see, the protocols that are already added are disabled. The sharename and path
are also disabled so that you cannot change them. Select the protocols that you wish to add
and click Next. From this step on, the procedure is the same as when creating the export:
provide details for each protocol that you add (for example, for the NFS protocol; FTP
takes none) and click Next to continue until you finish. For detailed steps, refer
to “Create Exports” on page 369.
2. Change Protocol Parameters: You can change parameters for both NFS and CIFS. On
the main page of “Exports” under the “Files” category you can see the table that displays
all the existing exports. If you click any export, the section lower on that same page
shows protocol information for that export. Details are shown only for the CIFS and
NFS protocols.
a. NFS Details: You can change NFS details by adding more clients or removing existing
ones. You can also edit an existing client and add more options, as seen in Figure 10-85
on page 375 below.

Figure 10-85 Modifying NFS configuration by editing clients or adding new clients

You can click the “edit” link to change the options of a client that has been added, and
you can remove a client using the “remove” link. You can also add a new client using the “Add
Client” button. When you edit or add a client, a new window opens asking
for details of the client, as shown in Figure 10-86 on page 376 below.


Figure 10-86 Modify settings for clients

For a new client, you need to add the client name and check or uncheck read-only,
root-squash, sync and the other options as required. For an existing client, the name field is
disabled because the client already exists. To remove the client, click the “remove” link.

b. CIFS Details: You can modify the CIFS export parameters by editing details like the
comment, the Browsable option and the ACLs. You can also add, modify or remove advanced
options for a CIFS share using the Advanced Option Add, Modify and Delete buttons.
See Figure 10-87 below.

Figure 10-87 Modify configuration for CIFS share

Using the CLI: You can modify an existing share or export using the CLI command
chexport. Unlike the GUI, the CLI lets you both add and remove protocols with
the same command; each action has its own options. In this section, adding new protocols
is discussed.

You can add new protocols by specifying the --cifs, --ftp and --nfs options and their protocol
definitions. The command usage and output are shown in Example 10-23 below. In this
example, the existing export was a CIFS export; FTP and NFS are added using chexport.

Example 10-23 Command usage and output for adding new protocols to an existing share
[Furby.storage.tucson.ibm.com]$ chexport --help
usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp
<arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>]
[--nfsremove <NFS clients>] [--scp <arg>]
sharename
Specifies the name of the export for identification.
--active share is active
-c,--cluster <cluster name or id> define cluster

--cifs <CIFS options> enable CIFS protocol [using CIFS options]
--ftp <arg> FTP
--http <arg> HTTP
--inactive share is inactive
--nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)]
--nfsadd <NFS clients> add NFS clients
--nfsremove <NFS clients> remove NFS clients
--scp <arg> SCP

[Furby.storage.tucson.ibm.com]$ chexport shared --ftp --nfs "*(rw,no_root_squash,async)"


EFSSG0022I Protocol FTP is configured for share shared.
EFSSG0034I NFS Export shared is configured, added client(s): *, removed client(s): None.

You can add or remove clients for the NFS protocol, or modify CIFS options, using
the [--nfs <NFS client definition>], [--nfsadd <NFS clients>], [--nfsremove <NFS clients>]
and [--cifs <CIFS options>] options. In Example 10-24 below, a new client is added
to the NFS export.

Example 10-24 Command output to add new NFS clients to existing NFS share.
[Furby.storage.tucson.ibm.com]$ chexport shared --nfsadd "9.1.2.3(rw,no_root_squash,async)"
EFSSG0034I NFS Export shared is configured, added client(s): 9.1.2.3, removed client(s): None.
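Similarly, a client can be removed again with the --nfsremove option. A minimal sketch,
removing the client that was just added (the output message is not reproduced here):

[Furby.storage.tucson.ibm.com]$ chexport shared --nfsremove "9.1.2.3"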

10.5.4 Remove service/protocols


Using the GUI: You can remove protocols from an existing export using the “Remove
Service” button found on the main “Exports” page in the “Files” category.

Click the existing export or share that you want to modify, then click the “Remove
Service” button. A new window opens that asks for the protocols you wish to
remove, with a check box against each protocol. Select the ones you wish to remove. Only
select protocols that the export is already configured with; you will get an error if you select
protocols that are not configured. See Figure 10-88 below. Click “OK” when done.

Figure 10-88 Panel to remove protocols from share

When you click the OK button in that window, the task progress bar appears and shows
the progress of the operation; when successful, it shows green check marks. If there is an
error, the error message is shown and the window shows a red cross sign (x); check the
logs, correct the problem and retry. See Figure 10-89 below. The export is modified
successfully. Click the “Close” button to close the window.

Figure 10-89 Task Progress bar for completion

Using the CLI: You can remove a protocol from an existing share using the chexport
command with the value off for a protocol option. The command usage and output are shown
in Example 10-25. In this example the existing share or export is configured for CIFS, FTP
and NFS; the command removes FTP and NFS.

Example 10-25 Command usage and output to remove protocols from existing share
[Furby.storage.tucson.ibm.com]$ chexport --help
usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp
<arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>]
[--nfsremove <NFS clients>] [--scp <arg>]
sharename
Specifies the name of the export for identification.
--active share is active
-c,--cluster <cluster name or id> define cluster
--cifs <CIFS options> enable CIFS protocol [using CIFS options]
--ftp <arg> FTP
--http <arg> HTTP
--inactive share is inactive
--nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)]
--nfsadd <NFS clients> add NFS clients
--nfsremove <NFS clients> remove NFS clients
--scp <arg> SCP

[Furby.storage.tucson.ibm.com]$ chexport shared --ftp off --nfs off


EFSSG0023I Protocol FTP is removed from share shared.
EFSSG0023I Protocol NFS is removed from share shared.

10.5.5 Activate Exports


Exports are active when created unless specified otherwise. When an export is active, you can
access the data in it. When an export is inactive, the configuration data related to the export is
removed from all the nodes; even though the export still exists, it cannot be accessed by
an end user.

Using the GUI: You can activate an existing export that has been deactivated using the
“Activate” button in the GUI. This task creates all the configuration data needed for the
export on all the nodes so that the share/export is available for access by the end users.


Using the CLI: You can activate a share using the CLI command chexport with the --active
option. The command usage and output are shown below in Example 10-26.

Example 10-26 Command usage and help to activate an existing share.


[Furby.storage.tucson.ibm.com]$ chexport --help
usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp
<arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>]
[--nfsremove <NFS clients>] [--scp <arg>]
sharename
Specifies the name of the export for identification.
--active share is active
-c,--cluster <cluster name or id> define cluster
--cifs <CIFS options> enable CIFS protocol [using CIFS options]
--ftp <arg> FTP
--http <arg> HTTP
--inactive share is inactive
--nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)]
--nfsadd <NFS clients> add NFS clients
--nfsremove <NFS clients> remove NFS clients
--scp <arg> SCP

[Furby.storage.tucson.ibm.com]$ chexport shared --active


EFSSG0037I The share shared is activated.

10.5.6 Deactivate Exports


Using the GUI: You can deactivate an existing export that is active using the “Deactivate”
button in the GUI. This task removes the configuration data for the export from all
the nodes so that the share/export is no longer available for access by the end users.

Using the CLI: You can deactivate a share using the CLI command chexport with the --inactive
option. The command usage and output are shown below in Example 10-27.

Note: You can also create an export that is already inactive by using the --inactive option
of the mkexport command. For more information read “Create Exports” on page 369.

Example 10-27 Command usage and output to deactivate an existing share


[Furby.storage.tucson.ibm.com]$ chexport --help
usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp
<arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>]
[--nfsremove <NFS clients>] [--scp <arg>]
sharename
Specifies the name of the export for identification.
--active share is active
-c,--cluster <cluster name or id> define cluster
--cifs <CIFS options> enable CIFS protocol [using CIFS options]
--ftp <arg> FTP
--http <arg> HTTP
--inactive share is inactive
--nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)]
--nfsadd <NFS clients> add NFS clients

--nfsremove <NFS clients> remove NFS clients
--scp <arg> SCP

[Furby.storage.tucson.ibm.com]$ chexport shared --inactive


EFSSG0037I The share shared is inactivated.

10.5.7 Remove Exports


Using the GUI: You can remove existing exports using the “Remove export” button on the
main page of “Exports” under the “Files” category.

Click the export you wish to remove in the table that lists all exports, then click the
“Remove Export” button. You will be asked to confirm the removal of the export. See
Figure 10-90 below.

Figure 10-90 Confirmation to remove exports

Click the “OK” button to proceed with removing the export or share. Once it has been
removed, it no longer appears in the table that shows existing exports.

Using the CLI: You can remove an existing export using the CLI command rmexport. The
command asks for your confirmation; if you confirm, it removes all the configuration
details of the export from all nodes. See the command usage and output in Example 10-28 below.

Example 10-28 Command usage and output to remove an existing export


[Furby.storage.tucson.ibm.com]$ rmexport --help
usage: rmexport sharename [-c <cluster name or id>] [--force]
sharename
Specifies the name of the export for identification.
-c,--cluster <cluster name or id> define cluster
--force enforce operation without calling back the user

[Furby.storage.tucson.ibm.com]$ rmexport shared


Do you really want to perform the operation (yes/no - default no): yes
EFSSG0021I The export shared has been successfully removed.

10.5.8 Test accessing the Exports


Below we explain how to access the shares. NFS and CIFS exports are accessed by mounting
them. We look at each protocol in turn.

CIFS
A CIFS export needs to be mounted before it can be accessed. A CIFS share can be accessed
from both Windows and UNIX machines.


1. Accessing CIFS using Windows:

To mount a CIFS share from Windows, right-click “My Computer” and click “Map a
Network Drive” as shown in Figure 10-91 below.

Figure 10-91 Mapping a drive on Windows to access a CIFS share

A new window opens that asks you to enter the drive and path details. Choose a drive letter
from the drop-down list. Enter the path of the share you want to access in the format
“\\cluster_name\sharename”, where cluster_name is the name of the cluster you want to
access and sharename is the name of the share that you want to mount.
In our example, as seen in Figure 10-92, we specify the cluster_name as the IP address
9.11.137.219 and the sharename is shared. We mount the share on the “X” drive.

Figure 10-92 Choose Drive letter and path to mount

Click the “different user name” link in that window and enter the Windows user
name and password. This user must have access, or ACLs set, to access this share. In our
example, the user is “STORAGE3\\eebenall”, belonging to the domain “STORAGE3”. See
Figure 10-93 on page 382 below.

Figure 10-93 Adding user name and Password to access the share

Click “Finish”. The share should be mounted successfully. You can then access the
share through “My Computer” and the “X” drive that you just mounted. Double-click
the drive to see the contents of the share, as shown in Figure 10-94
on page 382 below. An equivalent command-line sketch follows the figure.

Figure 10-94 Data seen from mounted share
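Instead of the Map Network Drive wizard, the same mapping can be made from a Windows
command prompt with the net use command. This is a minimal sketch using the cluster IP,
share name and user from this example:

C:\> net use X: \\9.11.137.219\shared /user:STORAGE3\eebenall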


2. Accessing CIFS using UNIX:

Mount the CIFS share using the mount.cifs command. Figure 10-95 below shows
the command. In our example, we use the client “sonaspb44”, which is a Linux client. We
create a directory cifs_export in the /mnt directory, where we mount the share. The cluster
is “Furby.storage.tucson.ibm.com” and the share is “shared”. The user we use to
access the share is “STORAGE3\\eebenall”, belonging to the domain “STORAGE3”. See
Figure 10-95 below; a sketch of the command follows the figure.

Figure 10-95 Command to mount and access the data from Unix
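Figure 10-95 is a screen capture; the command it shows takes roughly the following form.
The mount point and option names are typical for a Linux client and may vary by
distribution:

# mkdir -p /mnt/cifs_export
# mount.cifs //Furby.storage.tucson.ibm.com/shared /mnt/cifs_export -o user=eebenall,domain=STORAGE3
Password:
# ls /mnt/cifs_export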

NFS
NFS shares also have to be mounted before their data can be accessed. The following shows
how to mount them on UNIX clients. In our example, we use the Linux client “sonaspb44” and
have created a directory “nfs_export” in “/mnt”, where we mount the NFS export. The cluster is
“Furby.storage.tucson.ibm.com” and the share is “shared”. See Figure 10-96 below; a sketch of
the command follows the figure.

Figure 10-96 NFS share mount
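Figure 10-96 is a screen capture; the mount command it shows takes roughly the following
form. The exported path /ibm/gpfs0/shared is taken from the earlier mkexport example, and
the mount point is an assumption for this sketch:

# mkdir -p /mnt/nfs_export
# mount -t nfs Furby.storage.tucson.ibm.com:/ibm/gpfs0/shared /mnt/nfs_export
# ls /mnt/nfs_export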

FTP
FTP shares can be accessed from both Windows and UNIX. Use the “ftp” command to access the
export. You can also use external FTP client applications on Windows to access the share.
Below we explain access from both Windows and UNIX.
1. Accessing FTP from Windows:
You can use any FTP client to access data from the FTP export. We use the command
prompt to demonstrate this. In our example, the cluster is “Furby.storage.tucson.ibm.com”
and the share is “shared”. See Figure 10-97 on page 384 below.
When you run FTP, you are prompted to enter the user and password. In this example, the
user is “STORAGE3\\eebenall”, belonging to the domain “STORAGE3”. See Figure 10-97
on page 384. You then need to run a “cd” at the FTP prompt to the sharename you
want to access. As shown below, we run “ftp> cd shared” to access the FTP export
“shared”. A sketch of such a session follows the figure.

Figure 10-97 Accessing FTP share from Windows
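Figure 10-97 is a screen capture; a representative session, using the cluster name, user and
share from this example, looks roughly like the following (the server banner and prompts
vary and are not reproduced exactly):

C:\> ftp Furby.storage.tucson.ibm.com
User: STORAGE3\eebenall
Password:
ftp> cd shared
ftp> ls
ftp> bye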

2. Accessing FTP from UNIX:

You can access the FTP data by running the ftp command from the UNIX client. In our
example, the cluster is “Furby.storage.tucson.ibm.com”, the share is “shared” and we use the
Linux client “sonaspb44”.
When you run FTP, you are prompted to enter the user and password. In this example, the
user is “STORAGE3\\eebenall”, belonging to the domain “STORAGE3”. See Figure 10-98
on page 385. You then need to run a “cd” at the FTP prompt to the sharename you
want to access. As shown below, we run “ftp> cd shared” to access the FTP export
“shared”.


Figure 10-98 Accessing the FTP share from the Linux client

10.6 Disk Management


Each of the disks that exist in the SONAS system can be managed. You can view the status of
the disks and also perform actions like Suspend and Resume. You can also start disks.

10.6.1 List Disks and View Status:


Using the GUI: Click the “Disks” link under the “Storage” category. This displays a table
with all the disks and information about each one: the Name, the file system it is
attached to, usage details, failure group, storage pool and more. Refer to “Disks”
in section “Storage” on page 327.

Using the CLI: You can list the disks in the cluster using the CLI command lsdisk. This
command lists the existing disks along with information such as the file system each disk is
attached to, the failure group, the storage pool and the type of disk. The command usage and
output are shown in Example 10-29 on page 385 below.

Example 10-29 Command usage and help to list the disks in the cluster.
[Furby.storage.tucson.ibm.com]$ lsdisk --help
usage: lsdisk [-c <cluster name or id>] [-d <arg>] [-r] [-v] [-Y]
-c,--cluster <cluster name or id> define cluster
-d,--device <arg> define device
-r,--refresh refresh list
-v,--verbose extra columns
-Y format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsdisk


Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 4004 dataAndMetadata system ready up 4/28/10 3:03 AM
gpfs2nsd gpfs0 4004 dataAndMetadata system ready up 4/28/10 3:03 AM
gpfs3nsd gpfs0 4004 dataAndMetadata system ready up 4/28/10 3:03 AM
gpfs4nsd gpfs1 4004 dataAndMetadata system ready up 4/28/10 3:03 AM
gpfs5nsd 4004 system ready 4/28/10 4:42 AM

Suspend disk
Using the GUI: You can suspend disks using the “Suspend” button. Select the disk you want
to suspend and click the “Suspend” button. The operation opens a new window asking for
your confirmation before suspending the disk. See Figure 10-99 below.

Figure 10-99 Confirmation before suspending disk

Click “OK” to confirm. The task progress bar appears and shows the progress of the
operation; when successful, it shows green check marks. If there is an error, the error
message is shown and the window shows a red cross sign (x); check the
logs, correct the problem and retry. See Figure 10-100 below. The disk is
suspended successfully. Click the “Close” button to close the window.

Figure 10-100 Task Progress bar for completion

When suspended, the disk appears in the table with status “Suspended” as shown in
Figure 10-101 on page 386 below.

Figure 10-101 Panel shows disk is suspended

Using the CLI: There is no CLI command as of now to suspend a disk.


Resume Disks
Using the GUI: You can resume disks using the “Resume” button. Select the suspended
disk you wish to resume and click the “Resume” button. The operation opens a new window
that asks for confirmation. See Figure 10-102 below.

Figure 10-102 Confirmation before resuming a suspended disk

Click “OK” to confirm. The task progress bar appears and shows the progress of the
operation; when successful, it shows green check marks. If there is an error, the error
message is shown and the window shows a red cross sign (x); check the
logs, correct the problem and retry. See Figure 10-103 below. The disk is
resumed successfully. Click the “Close” button to close the window.

Figure 10-103 Task Progress bar for completion

The disk that was suspended before now has the status ready, as shown in Figure 10-104 on
page 387 below.

Figure 10-104 Panel shows that the disk has been successfully resumed

Using the CLI: There is no CLI command to resume a disk as of now.

10.6.2 Change Properties of disks


Using the GUI: As of now there is no way to change the properties of a disk using the GUI.


Using the CLI: You can change the properties of a disk using the CLI command chdisk. The
properties that you can modify for a disk are the failure group, the storage pool and the usage
type. Usage of the command is shown below in Example 10-30.

Example 10-30 Command usage for changing properties of a disk


[Furby.storage.tucson.ibm.com]$ chdisk --help
usage: chdisk disks [-c <cluster name or id>] [--failuregroup <failuregroup>] [--pool <pool>]
[--usagetype <usagetype>]
disks
The name of the device
-c,--cluster <cluster name or id> define cluster
--failuregroup <failuregroup> failure group
--pool <pool> pool name
--usagetype <usagetype> usage type

Each of the parameters that can be changed is explained below.

1. Failure Group: You can change the failure group of a disk by using the
--failuregroup option of the chdisk command.
2. Storage Pool: You can change the storage pool of a disk by using the
--pool option of the chdisk command.
3. Usage Type: You can change the usage type of a disk by using the
--usagetype option of the chdisk command.

In Example 10-31 we change each of the parameters for one of the disks,
“array1_sata_60001ff0732f85f8c0b000b”. The example also shows the state of the disk
before the change; the disk whose information is changed is shown in bold.

Example 10-31 Command output for CLI command lsdisk and using chdisk to change failure group of disk
[Furby.storage.tucson.ibm.com]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
array0_sata_60001ff0732f8548c000000 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f8568c020002 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f8588c040004 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f85a8c060006 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8558c010001 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8578c030003 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8598c050005 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f85d8c090009 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f85e8c0a000a tms0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8608c0f000c tms0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f85c8c080008 1 dataAndMetadata system ready 4/23/10 10:00 AM
array1_sata_60001ff0732f85f8c0b000b 2 dataAndMetadata newpool ready 4/24/10 3:05 AM


[Furby.storage.tucson.ibm.com]$ chdisk array1_sata_60001ff0732f85f8c0b000b --failuregroup 200 --pool newpool --usagetype descOnly
EFSSG0122I The disk(s) are changed successfully!

[Furby.storage.tucson.ibm.com]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
array0_sata_60001ff0732f8548c000000 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f8568c020002 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f8588c040004 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f85a8c060006 gpfs0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8558c010001 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8578c030003 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8598c050005 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f85d8c090009 gpfs0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f85e8c0a000a tms0 1 dataAndMetadata system ready up 4/26/10 3:03 AM
array1_sata_60001ff0732f8608c0f000c tms0 2 dataAndMetadata system ready up 4/26/10 3:03 AM
array0_sata_60001ff0732f85c8c080008 1 dataAndMetadata system ready 4/23/10 10:00 AM
array1_sata_60001ff0732f85f8c0b000b 200 descOnly newpool ready 4/28/10 10:14 PM

10.6.3 Start Disks


Using the GUI: Select the disk you want to Start. Click on the “Start” button.

Using the CLI: There is no CLI command as of now to start the disk.

10.6.4 Remove Disks


Using the GUI: Select the disk you want to Remove. Click on the “Remove” button.

Using the CLI: There is no CLI command as of now to remove the disk.

10.7 User management


Users who can access the SONAS system are of two types: the system
administrator, who needs to manage the SONAS system, and the end users, who
access, read and write the data in the file systems. In this section, we look at
each in detail.


10.7.1 SONAS administrator


The SONAS administrator is the user who has all administrative rights to carry out
operations on the SONAS system. The administrator manages the cluster nodes, storage, file
systems and exports, sets up the required monitoring and thresholds, and
can view the status or health of the whole cluster.
The SONAS administrator can be either a CLI user or a GUI user. User roles are
currently defined only for GUI users; as of now all CLI users have the rights to perform
all commands. Now let’s look at the details:
1. SONAS CLI user: A SONAS CLI user is created using the SONAS CLI command mkuser.
This user is a special user with a restricted bash shell. The user cannot run any UNIX
commands except a few such as: grep, initnode, man, more, sed, sort, cut, head, less, tail, uniq.
All the other commands that the administrator can run are SONAS CLI commands.
For the list of commands, run help at the command prompt after you are logged in as a
CLI user. The output is as shown in Example 10-32 below.

Example 10-32 Commands that a SONAS user can execute.


[Furby.storage.tucson.ibm.com]$ cli help
Known commands:
addcluster Adds an existing cluster to the management.
addnode Adds a new cluster node.
attachnw Attach a given network to a given interface of a network group.
backupmanagementnodeBackup the managament node
cfgad configures AD server into the already installed CTDB/SMABA
cluster.Previously configured authentication server settings will be erased
cfgbackupfs Configure file system to TSM server association
cfgcluster Creates the initial cluster configuration
cfghsm Configure HSM on each client facing node
cfgldap configure LDAP server against an existing preconfigured cluster.
cfgnt4 configure NT4 server against an existing preconfigured cluster.
cfgsfu Configures user mapping service for already configured AD
cfgtsmnode Configure tsm node.
chavailnode Change an available node.
chcurrnode Changes current node
chdisk Change a disk.
chexport Modifies the protocols and their settings of an existing export.
chfs Changes a new filesystem.
chfset Change a fileset.
chkauth Check authentication settings of a cluster.
chkpolicy validates placement rules or get details of management rules of a policy on
a specified cluster for specified device
chnw Change a Network Configuration for a sub-net and assign multiple IP
addresses and routes
chnwgroup Adds or removes nodes to/from a given network group.
chservice Change the configuration of a protocol service
chuser Modifies settings of an existing user.
confrepl Configure asynchronous replication.
dblservice stop services for an existing preconfigured server.
detachnw Detach a given network from a given interface of a network group.
eblservice start services for an existing preconfigured server.
enablelicense Enable the license agreement flag
initnode Shutdown or reboot a node
linkfset Links a fileset
lsauth List authentication settings of a cluster.

lsavailnode List available nodes.
lsbackup List information about backup runs
lsbackupfs List file system to tsm server and backup node associations
lscfg Displays the current configuration data for a GPFS cluster.
lscluster Lists the information of all managed clusters.
lscurrnode List current nodes.
lsdisk Lists all discs.
lsexport Lists all exports.
lsfs Lists all filesystems on a given device in a cluster.
lsfset Lists all filesets for a given device in a cluster.
lshist Lists system utilization values
lshsm Lists configured hsm file systems cluster
lslog Lists all log entries for a cluster.
lsnode Lists all Nodes.
lsnw List all public network configurations for the current cluster
lsnwdns List all DNS configurations for the current cluster
lsnwgroup List all network group configurations for the current cluster
lsnwinterface List all network interfaces
lsnwnatgateway List all NAT gateway configurations for the current cluster
lsnwntp List all NTP configurations for the current cluster
lspolicy Lists all policies
lspool Lists all pools.
lsquota Lists all quotas.
lsrepl List result of the asynchronous replications.
lsservice Lists services
lssnapshot Lists all snapshots.
lstask Lists all (background) tasks for the management node.
lstsmnode Lists defined tsm nodes in the cluster
lsuser Lists all users of this mangement node.
mkavailnode Add an available node to the database.
mkcurrnode Makes current node
mkexport Creates a new export using one or more protocols.
mkfs Creates a new filesystem.
mkfset Creates a fileset
mknw Create a new Network Configuration for a sub-net and assign multiple IP
addresses and routes
mknwbond Makes a network bond from slave interfaces
mknwgroup Create a group of nodes to which a network configuration can be attached.
See also the commands mknw and attachnw.
mknwnatgateway Makes a CTDB NAT gateway
mkpolicy Makes a new policy into database
mkpolicyrule Appends a rule to already existing policy
mkservice Configure services
mksnapshot creates a snapshot from a filesystem
mktask Schedule a prefedined task for
mkuser Creates a new user for this management node.
mountfs Mount a filesystem.
querybackup Query backup summary
restripefs Rebalances or restores the replication of all files in a file system.
resumenode Resumes an interface node.
rmbackupfs Remove file system to TSM server association
rmcluster Removes the cluster from the management (will not delete cluster).
rmexport Removes the given export.
rmfs Removes the given filesystem.
rmfset Removes a fileset

rmlog Removes all log entries from database
rmnode Removes a node from the cluster.
rmnw Remove an existing public network configuration
rmnwbond Deletes a regular bond interface.
rmnwgroup Remove an existing group of nodes. A maybe attached public network
configuration must be detached in advance
rmnwnatgateway Unconfigures a CTDB NAT gateway.
rmpolicy Removes a policy and all the rules belonging to it
rmpolicyrule Removes one or more rules from given policy
rmsnapshot Removes a filesystem snapshot
rmtask Removes the given scheduled task.
rmtsmnode Remove TSM server stanza for node
rmuser Removes the user from the management node.
rpldisk Replaces current NSD of a filesystem with a free NSD
runpolicy Migrates/deletes already existing files on the GPFS file system based on the
rules in policy provided
setnwdns Sets nameservers
setnwntp Sets NTP servers
setpolicy sets placement policy rules of a given policy on cluster passed by user.
setquota Sets the quota settings.
showbackuperrors Shows errors of a backup session
showbackuplog Shows the log of the recent backup session.
showrestoreerrors Shows errors of a restore session
showrestorelog Shows the log of the recent restore session.
startbackup Start backup process
startreconcile Start reconcile process
startrepl Start asynchronous replication.
startrestore Start restore process
stopbackup Stops a running TSM backup session
stoprepl Stop asynchronous replication.
stoprestore Stops a running TSM restore session
suspendnode Suspends an interface node.
unlinkfset Unlink a fileset.
unmountfs Unmount a filesystem.

Plus the UNIX commands: grep, initnode, man, more, sed, startmgtsrv, stopmgtsrv, sort, cut,
head, less, tail, uniq
For additional help on a specific command use 'man command'.

To get more help on each of the commands, the administrator can check the
man page by running man <command_name> or <command_name> --help, as in
the sketch below, where the command mkuser is used.
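This is only a sketch of the two invocations; the exact help and man page text depends on
the SONAS release and is not reproduced here:

[Furby.storage.tucson.ibm.com]$ man mkuser
[Furby.storage.tucson.ibm.com]$ mkuser --help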
As mentioned previously, the CLI user as of now has no roles defined: a CLI
user can run all the administrative commands to manage the cluster, storage,
file systems and exports. The administrator can also look into the logs and utilization
charts for information on the health of the cluster and its components.
2. SONAS GUI user: The SONAS GUI user needs to be added into the GUI by the root user.
After an installation, the root user is automatically added into the GUI. Log in with the root user
name and password. Click the “Console User Authority” link under the “Settings” category
of the GUI. This opens a page with a table that lists all the
GUI users who can access the GUI and their roles. See point 2 under “Settings” on
page 335.
The tasks that you can perform in that panel are adding a new user and removing a GUI user.
You can do this using the “Add” and “Remove” buttons respectively, as explained below.


i. Add user: Add a user by clicking the “Add” button. A new page asking for the
user details opens. Type in the user name. This user should be an existing CLI
user already created with the mkuser command. You also need to specify the
role for the user. The available roles are:
- Administrator: This user has all the administrator rights and can perform all
the operations, like the CLI user.
- Storage Administrator: This user has rights to manage the storage; all tasks
that operate on the storage can be done by this user.
- Operator: The operator has only read access. This user can view the logs,
health status and overall topology of the cluster.
- System administrator: This user can administer the system as a whole.
Click “OK” when done. Figure 10-105 below shows the panel to add a new user.

Figure 10-105 Panel to add user and user roles to the Users

After user is added, the table will display the newly added user as shown in
Figure 10-106 below.

Figure 10-106 Panel displaying the newly added user

ii. Remove User: Select the user to delete and click the “Remove” button. The user
is deleted from the GUI.

Note: Deleting a user from the GUI does not delete the user from the CLI. The CLI
user still exists.

iii. Logout: A selected user can be logged out using the “Logout” button.

Depending on the role given to the user, the GUI user has different access
permissions and can perform different operations.


10.7.2 SONAS end users


The SONAS end users are the users who access the data stored in the file system.
They can both write and read data.

Data in the cluster can be accessed by the end users only through the data exports. The
protocols that SONAS currently supports are CIFS, FTP and NFS.

To access the data using these protocols, the users need to authenticate. SONAS supports
Windows AD authentication servers and LDAP servers. To learn more about integrating an
authentication server with the SONAS appliance, refer to chapter 6, section 6.7.8 Configure
Authentication - AD and LDAP.

NFS is an exception: it does not require users to authenticate because it checks the authenticity
of the client or host. The other protocols, FTP and CIFS, currently require that the users
authenticate. CIFS authenticates with the Windows AD server, while FTP works with both
Windows AD users and LDAP users.

Authentication is the process of verifying the identity of the user: users confirm that they
are indeed who they claim to be. This is typically accomplished by verifying the
user ID and password against the authentication server.

Authorization is the process of determining what the users are allowed to access. Users may
have permissions to access certain files but not others.
This is typically controlled by ACLs.

The file system ACLs supported in the current SONAS release are GPFS ACLs, which are NFSv4
ACLs. Directories and exports need to be given the right ACLs for users to be able to
access them. As of now you can give the owner the rights or permissions to an export by
specifying the owner option when creating it, from either the GUI or the CLI. If you want to give
other users access, you need to modify the GPFS ACL for the directory or export
using the GPFS command mmeditacl. You can view the ACLs by using the GPFS command
mmgetacl.

Note: As of now you need to use the GPFS commands to view or edit ACLs. These
commands need root access.

Example 10-33 shows how to view the current ACLs of a directory or export.

Example 10-33 Viewing current ACLs for an export using GPFS command mmgetacl
export EDITOR=/bin/vi

$ mmgetacl /ibm/gpfs0/Sales
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED


special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Example 10-34 adds an ACL entry for another user.
In the example, we give read-write access to the Windows AD user
“David” for an already existing export named “Sales” in the “/ibm/gpfs0” file system.

Example 10-34 Adding ACL for giving user DAVID access to the export
$ mmeditacl /ibm/gpfs0/Sales
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

user:STORAGE3\david:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Save the file and, when you quit, answer “yes” when asked to confirm the ACLs. The new
ACLs are then written for the user and the export.

Depending on the users you want to give access to, you can add them to the ACL in the
same way. You can also give group access in a similar way and add users to the group,
as sketched below.
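For example, to give a Windows AD group read-write access, you would add an ACL entry of
the following form with mmeditacl; the group name used here is hypothetical, and the detailed
permission flag lines are filled in or adjusted in the same way as for the user entry in
Example 10-34:

group:STORAGE3\sales:rwxc:allow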

10.8 Services Management


In this section we discuss the Management Service function and administration.

10.8.1 Management Service administration


The Management Service handles both the GUI interface and the CLI
interface of SONAS. For working with either of these interfaces, the Management
Service must be running.
1. Stop Management Service


Using the GUI: As of now the Management Service cannot be stopped from the GUI.
Using the CLI: You can stop the Management Service by running the CLI command
stopmgtsrv. The command usage and output are shown below in Example 10-35.

Example 10-35 Command usage and output for CLI command stopmgtsrv to stop Management or CLI service
[Furby.storage.tucson.ibm.com]$ stopmgtsrv --help
usage: stopmgtsrv
stop the management service

[Furby.storage.tucson.ibm.com]$ stopmgtsrv
EFSSG0008I Stop of management service initiated by root

2. Start Management Service:


Using the GUI: You cannot start the Management Service using the GUI.
Using the CLI: You can start the Management Service using the CLI command
startmgtsrv. The command usage and output are shown below in Example 10-36.

Example 10-36 Command usage and output for starting Management or CLI service
[Furby.storage.tucson.ibm.com]$ startmgtsrv --help

usage: startmgtsrv [-f | --force]


start the management service
-f, --force restart gui if already running

[Furby.storage.tucson.ibm.com]$ startmgtsrv
EFSSG0007I Start of management service initiated by root

Once the service has started, you can verify it by running the help command on the CLI
or by accessing the GUI. CLI help should display all the commands that are available to the CLI
user, and the GUI should prompt for a user ID and password.
If you are unable to access the GUI or the CLI commands, restart the Management
Service using the startmgtsrv command with the --force option. The command
output is shown in Example 10-37.

Example 10-37 CLI command startmgtsrv to restart the CLI and Management service forcefully
[Furby.storage.tucson.ibm.com]$ startmgtsrv --force
EFSSG0008I Stop of management service initiated by root
EFSSG0007I Start of management service initiated by root

10.8.2 Manage Services on the cluster


The services that run on the SONAS appliance are CIFS, FTP, HTTP, NFS and SCP.
These services are needed for clients to access the SONAS data exports. All these services
are already configured during the configuration of the SONAS system using the cfgad and
cfgldap commands.

You can view the status of the configured services, and you can also enable and disable them.
They need to be configured before you can carry out any operations on them. Below we discuss
each task that can be carried out on the services.


1. List the Service Status


Using the GUI: You can view the services that are active from the GUI by checking the
“Services” tab under the Clusters section of the Cluster category. You cannot disable or
enable any service from the GUI. See point 1.f under “Clusters” on page 312 for more information.
Using the CLI: You can list the services with the CLI command lsservice. This command lists all
the services, their state, and whether each is configured or not. The command usage and output are
shown in Example 10-38 below.

Example 10-38 Example for usage and command output for CLI command lsservice
[Furby.storage.tucson.ibm.com]$ lsservice --help
usage: lsservice [-c <cluster name or id>] [-r] [-Y]
-c,--cluster <cluster name or id> define cluster
-r,--refresh refresh list
-Y format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsservice
Name Description Is active Is configured
FTP FTP protocol yes yes
HTTP HTTP protocol yes yes
NFS NFS protocol yes yes
CIFS CIFS protocol yes yes
SCP SCP protocol yes yes

In the example, you can see that all the services are configured. This means that all the
configuration files for the services are up to date on each node of the cluster. Under the
column “Is active”, you can see whether the service is active or inactive. Active denotes that the
service is up and running; exports can be accessed using that service, and users or clients
can access the data exported using that protocol. Inactive means that the
service is not running and hence all data connections for it will break.
2. Enable Service
Using the GUI: You cannot enable services using the GUI.
Using the CLI: You can enable services using the eblservice CLI command. The command
usage and output are shown in Example 10-39 below. To enable services, you need to pass the
cluster name or cluster ID as a mandatory parameter, along with the names of the services you
want to enable as a comma-separated list. To enable all services, you can pass “all”. The
command asks for confirmation; you can use the --force option to force the operation
and override the confirmation. In our example, the FTP and NFS services are
disabled, and we enable them using the eblservice command.

Example 10-39 Example showing usage and command output for CLI command eblservice
[Furby.storage.tucson.ibm.com]$ eblservice --help
usage: eblservice -c <cluster name or id> [--force] [-s <services>]
-c,--cluster <cluster name or id> define cluster
--force enforce operation without prompting for confirmation
-s,--services <services> services

[Furby.storage.tucson.ibm.com]$ lsservice
Name Description Is active Is configured
FTP FTP protocol no yes
HTTP HTTP protocol yes yes
NFS NFS protocol no yes
CIFS CIFS protocol yes yes


SCP SCP protocol yes yes

[Furby.storage.tucson.ibm.com]$ eblservice -c st002.vsofs1.com -s ftp,nfs --force

[Furby.storage.tucson.ibm.com]$ lsservice
Name Description Is active Is configured
FTP FTP protocol yes yes
HTTP HTTP protocol yes yes
NFS NFS protocol yes yes
CIFS CIFS protocol yes yes
SCP SCP protocol yes yes

3. Disable Service
Using the GUI: You cannot disable services using the GUI.
Using the CLI: You can disable services using the CLI command dblservice. The
command usage and output are shown in Example 10-40 below. To disable services, you need to
pass the names of the services you want to disable as a comma-separated list. To
disable all services, you can pass “all”. The command asks for your confirmation; you can skip
the confirmation by using the --force option, which forces the disabling of the services. You can
confirm the result with the lsservice command, as shown in the same Example 10-40.
CIFS and SCP always need to be running: CIFS is required for CTDB to be
healthy, and SCP, which is the SSH service, cannot be stopped because all the internal
communication between the nodes is done using SSH. If you pass CIFS or SCP,
they are not stopped and a warning message is shown, while the other services are
stopped. The second example, Example 10-41 on page 399, shows disabling all the services with
the --force option; you can also see the warning messages for CIFS and SCP in this case.

Example 10-40 Usage for CLI command dblservice and output when disabling FTP only
[Furby.storage.tucson.ibm.com]$ dblservice --help
usage: dblservice [-c <cluster name or id>] [--force] -s <services>
-c,--cluster <cluster name or id> define cluster
--force enforce operation without prompting for confirmation
-s,--services <services> services

[Furby.storage.tucson.ibm.com]$ dblservice -s ftp


Warning: Proceeding with this operation results in a temporary interruption of file services
Do you really want to perform the operation (yes/no - default no): yes
EFSSG0192I The FTP service is stopping!
EFSSG0194I The FTP service is stopped!

[Furby.storage.tucson.ibm.com]$ lsservice
Name Description Is active Is configured
FTP FTP protocol no yes
HTTP HTTP protocol yes yes
NFS NFS protocol yes yes
CIFS CIFS protocol yes yes
SCP SCP protocol yes yes


Example 10-41 Example where all services are disabled - CIFS and SCP show warning message
[Furby.storage.tucson.ibm.com]$ dblservice -s all --force
EFSSG0192I The NFS service is stopping!
EFSSG0192I The HTTP service is stopping!
EFSSG0193C Disable SCP services failed. Cause: Never stop scp/sshd service. We didn't stop
scp/sshd service but other passed services were stopped.
EFSSG0192I The FTP service is stopping!
EFSSG0193C Disable CIFS services failed. Cause: Never stop cifs service. We didn't stop cifs
service but other passed services were stopped.
EFSSG0109C Disable services failed on cluster st002.vsofs1.com. Cause: SCP : Never stop scp/sshd
service. We didn't stop scp/sshd service but other passed services were stopped.CIFS : Never
stop cifs service. We didn't stop cifs service but other passed services were stopped.

[Furby.storage.tucson.ibm.com]$ lsservice
Name Description Is active Is configured
FTP FTP protocol no yes
HTTP HTTP protocol no yes
NFS NFS protocol no yes
CIFS CIFS protocol yes yes
SCP SCP protocol yes yes

4. Change service configuration

Using the GUI: You can change the configuration of each configured service using the
GUI. As seen in point 1.f under “Clusters” on page 312, that panel shows the table
containing the list of services. Each of these services is a link. When you
click one, a new window opens that allows you to change the configuration
parameters for that service. We look at each service or protocol below.
– FTP: A new window, shown in Figure 10-107 below, shows the various parameters
that you can change for the FTP configuration. Click “Apply” when done. The new
configuration data is written into the CTDB registry and into the FTP configuration files
on each node.


Figure 10-107 FTP configuration parameters

– HTTP: HTTP requires you to install an HTTP certificate. When you click the HTTP
link, a window like the one in Figure 10-108 below opens.
You can install an existing certificate or generate a new one, as discussed below.

Figure 10-108 HTTP Configuration panel

• Upload an Existing Certificate: You can upload an existing certificate, which is a
“.crt” or “.key” file. Click the “Upload Certificate” button to upload a new certificate.


A new window, shown in Figure 10-109 below, opens and asks you for the path
of the certificate. Click Browse and locate the certificate file, then click the
“Upload” button to upload the file. The window then closes. Click the “Install
Certificate” button as shown above in Figure 10-108.

Figure 10-109 Uploading the certificate file

• Generate a New Certificate: Fill out all the text boxes as shown in Figure 10-108.
Click the “Generate and Install certificate” button. This generates a new certificate
and installs it.

– NFS: NFS currently has no configuration parameters to modify.

– CIFS: As shown in Figure 10-110 below, you can change several parameters for CIFS,
including some common parameters and some “Advanced Options”. You can add, modify,
or remove advanced parameters using the respective buttons in the panel. Click the Apply
button when done; the configuration is then written to all nodes.

Figure 10-110 CIFS configuration parameters

– SCP: When you select the SCP protocol by clicking its link, a new window opens.
Figure 10-111 below shows the parameters you can modify for the SCP service. The SCP
protocol also provides the SFTP method for data access; you can allow or disallow SFTP
by using the corresponding check box. Click “Apply” to apply the changes.


Figure 10-111 SCP Configuration Details
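
To prepare a certificate and key pair for the HTTP certificate upload described above, you can,
for example, create a self-signed pair outside of SONAS. The following is only a minimal sketch
using the standard openssl tool; the file names and the host name in the subject are hypothetical,
and a production system would normally use a certificate signed by your certificate authority:

openssl req -x509 -newkey rsa:2048 -nodes -days 365 -subj "/CN=sonas.example.com" -keyout sonas.key -out sonas.crt

The resulting sonas.crt and sonas.key files can then be uploaded through the panel shown in
Figure 10-108.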

Using the CLI: At this time, you cannot change the service configuration parameters from the
SONAS CLI.

10.9 Real-time and historical reporting


The SONAS GUI allows for real-time and historical reporting. You can generate reports
and charts to display cluster utilization, covering both System Utilization and File
System Utilization.
As seen in “Performance and reports:” on page 329, the GUI has panels for both System
and File System Utilization. You can also configure emails to be sent to the administrator
to report an event or threshold; you set this up using the GUI. This feature is not
available in the CLI.
Reporting and chart generation are explained in the following sections.

10.9.1 System Utilization:


You can use this section to generate reports. To access this feature, click the
“Performance and Reports” category to expand the options, then click the “System Utilization”
link.

The System Utilization table displays information about the nodes in the cluster. Select the
nodes you want to generate charts for, then choose the “Measurement variable” from the drop-down
menu. This is a list of system variables whose utilization you can measure, including
CPU usage, memory usage, network usage and errors, and disk I/O and usage.
See Figure 10-112 below.

Figure 10-112 Measurement Variable for System Utilization charts

Select the “Measurement duration” from the drop-down menu. This list allows you to select
the period of time over which you want to measure the utilization of the system. You can
choose durations such as Daily, Weekly, Monthly, 6monthly, 18monthly, and more. See
Figure 10-113 below.

Figure 10-113 Measurement Duration for System Utilization charts

Once you have selected the nodes, the measurement variable, and the duration, click the
“Generate Charts” button and the chart is generated. The figures below show two examples:
one displaying a Daily - Memory Usage chart for the
Management Node (Figure 10-114 on page 403) and the other a Weekly - Disk I/O chart for
Interface Node 2 (Figure 10-115 on page 404).

Figure 10-114 Daily memory Utilization charts for Management Node


Figure 10-115 Weekly Disk I/O Utilization charts for Interface Node 2

The above examples show only some of the options; you can also generate charts for all
nodes or for a selection of nodes.

10.9.2 File System Utilization


You can use this section to generate reports. To access this feature, click the
“Performance and Reports” category to expand the options, then click the “File System
Utilization” link.

The File System Utilization table displays information about the file systems in the cluster.
Figure 10-35 on page 331 shows the panel for File System Utilization. Select the file system
whose usage charts you want to generate, then select the duration from the drop-down menu.
This list allows you to select the period of time over which you want to measure the utilization
of the system. You can choose durations such as Daily, Weekly, Monthly, 6monthly, 18monthly,
and more. See Figure 10-116 on page 405.

Figure 10-116 Duration period for File System Utilization charts

Once you have selected the file system and the duration, click the “Generate Charts” button and
the chart is generated. The figures below show two examples for our single file system, gpfs0:
one daily and one weekly File System Usage chart (Figure 10-117 and Figure 10-118 on
page 406).

Figure 10-117 Weekly File System Utilization charts for gpfs0


Figure 10-118 Daily File System Utilization Charts for gpfs0

10.9.3 Utilization Thresholds and Notification


Click the “Utilization Thresholds” link under the “SONAS Console Settings” category in the
Management GUI. This pane displays all thresholds for utilization monitoring; see
Figure 10-36 on page 332. The table displays all the threshold details that have been added.

You can add a threshold by clicking the “Add Threshold” button. A new window opens as
shown below. Enter the details of the threshold you want to add. You can choose from the
drop-down menu the variable you want to monitor, such as CPU, file system, GPFS, memory,
and network usage. See Figure 10-119 below.

Figure 10-119 Add new utilization thresholds details panel

Choose the warning level, the error level, and the number of recurrences you want to track.
See Figure 10-120 on page 407.


Figure 10-120 Utilization thresholds panel parameters

When done, click OK. A new threshold is added to the list. You need to configure the
recipients in order to receive email notifications.

To do so, click the “Notification Settings” link under the “Console Settings” category.

10.10 Scheduling tasks in SONAS


SONAS allows you to schedule some tasks to be performed without any manual intervention.
Both GUI tasks and CRON tasks can be scheduled, and there is currently a fixed list of tasks
that you can schedule on the SONAS appliance.

You create a task which schedules a predefined task for the management node. A
predefined task can be a GUI task or a cron task. GUI tasks can be scheduled only one time
and run only on the management node, whereas cron tasks can be scheduled multiple times
and for the different clusters managed by the management node. Cron tasks are predefined
to run either on all nodes of the selected cluster or on the recovery master node only.

An error is returned to the caller if either of the following conditions is met:


1. An already scheduled GUI task is scheduled for another time.
2. A task with the denoted name does not exist.

There are several operations you can perform on the tasks using the GUI or the CLI. We
describe both in detail below.

10.10.1 List tasks


Using the GUI: You can list the tasks that are already defined. Click the “Scheduled
Tasks” link under the “SONAS Console Settings” category. This displays all the tasks that
have already been added to the cluster; these are the predefined tasks. The GUI panel for
listing is described above; see point 2 under “SONAS Console Settings” on page 331 for
more information. The tasks that are executed by the GUI run only on the Management Node.
The cron tasks can run on one or all nodes of the cluster.

Using the CLI: Tasks can be scheduled using the CLI command mktask. The command
takes input values such as the cluster name, seconds, minutes, hours, and other time values
for the task to run. In addition, there is an option called “parameter”. This option is
optional and only valid for a cron task; GUI tasks currently do not have any parameters, and
an error is returned to the caller if this option is denoted for a GUI task. The parameter
value is a space-separated list of parameters. The command usage and output for adding
both GUI and CRON tasks are shown in Example 10-42 below.

The CRON tasks that are available in SONAS are:
1. MkSnapshotCron:

Parameter: The cron job expects 2 parameters in the following order:

clusterName - the name of the cluster the file system belongs to

filesystem - the file system description (for example, /gpfs/office)


2. StartReplCron:

Parameter: The cron job expects 2 parameters in the following order:

source_path - the directory that shall be replicated

target_path - the directory to which the data shall be copied


3. StartBackupTSM:

Parameter: The cron job expects 1 parameter

clusterName - the cluster of the file systems which must be backed up


4. StartReconcileHSM:

Parameter: The cron job expects 3 parameters in the following order:

clusterName - the cluster of the file systems which must be backed up

filesystem - the file system to be reconciled

node - the node on which the file system is to be reconciled


5. BackupTDB:

Parameter: The cron job expects 1 parameter

target_path - the directory to which the backup shall be copied

For more information on how to add these parameters for these CRON tasks, you can refer to
the manpage for the command mktask.
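
As an additional sketch using the same documented syntax (the directory paths shown here are
hypothetical and must be replaced with your own values), a replication task and a TSM backup
task could be scheduled as follows:

mktask StartReplCron --parameter "/gpfs/office /gpfs/office_copy" --minute 0 --hour 1
mktask StartBackupTSM --parameter "Furby.storage.tucson.ibm.com" --minute 30 --hour 0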

The first example in Example 10-42 shows the addition of the MkSnapshotCron task. This CRON
task takes two parameters, the cluster name and the file system name. In our example, the
cluster name is “Furby.storage.tucson.ibm.com” and the file system is “gpfs0”.

In the second example in Example 10-42, we add a task that is a GUI task.

Example 10-42 Command usage and output in adding CRON and GUI tasks using CLI command mktask
[Furby.storage.tucson.ibm.com]$ mktask --help
usage: mktask name [-c <cluster name or id>] [--dayOfMonth <dayOfMonthdef>] [--dayOfWeek
<dayOfWeekdef>] [--hour <hourdef>] [--minute <minutedef>] [--month <monthdef>] [-p <parameter>]
[--second <seconddef>]
name
Specifies the name of the newly created task.
-c,--cluster <cluster name or id> define cluster
--dayOfMonth <dayOfMonthdef> define the scheduler option for the dayOfMonth

--dayOfWeek <dayOfWeekdef> define the scheduler option for the dayOfWeek


--hour <hourdef> define the scheduler option for the minute
--minute <minutedef> define the scheduler option for the minute
--month <monthdef> define the scheduler option for the month
-p,--parameter <parameter> denote the parameter passed to the scheduled cron task
--second <seconddef> define the scheduler option for the second

[Furby.storage.tucson.ibm.com]$ mktask MkSnapshotCron --parameter "Furby.storage.tucson.ibm.com gpfs0" --minute 10 --hour 2 --dayOfMonth */3
EFSSG0019I The task MkSnapshotCron has been successfully created.

[Furby.storage.tucson.ibm.com]$ mktask FTP_REFRESH --minute 2 --hour 5 --second 40
EFSSG0019I The task FTP_REFRESH has been successfully created.

10.10.2 Remove task:


Using the GUI: You can remove a task by selecting it from the table of tasks and
clicking the “Remove” button. The operation opens a new window asking for confirmation,
as seen in Figure 10-121 below.

Figure 10-121 Confirmation to remove the tasks

Click “OK” to confirm. The task progress bar appears and shows the progress of the
operation; once successful, it shows green check marks. If an error occurs, the error
message is shown and the window displays a red cross sign (x); in that case, check the
logs, correct the problem, and retry. See Figure 10-122 below. The task is then
successfully removed. Click the “Close” button to close the window.

Figure 10-122 Task Progress bar for completion

Using the CLI: You can remove a task using the CLI command rmtask. This
command deletes the task from the list of tasks to be scheduled by the system. An error
is returned to the caller if a task that does not exist is denoted. The command usage and
output are shown in Example 10-43. In the first example we delete the CRON task added
earlier, MkSnapshotCron, and in the second example we delete the GUI task FTP_REFRESH.

Example 10-43 rmtask CLI command example


[Furby.storage.tucson.ibm.com]$ rmtask --help
usage: rmtask name [-c <cluster name or id>]

name
Specifies the name of the task for identification.
-c,--cluster <cluster name or id> define cluster

[Furby.storage.tucson.ibm.com]$ rmtask MkSnapshotCron


EFSSG0021I The task MkSnapshotCron has been successfully removed.

[Furby.storage.tucson.ibm.com]$ rmtask FTP_REFRESH


EFSSG0021I The task FTP_REFRESH has been successfully removed.

10.10.3 Modify the Schedule Tasks


Using the GUI: You can modify a task from the GUI. Select the task that you need to modify
from the table of tasks and click it so that its details are displayed below the table. Some of
these details can be modified: the schedule of any task is modifiable, and for CRON tasks
the task parameters are also modifiable. The figures below display the panels that allow you
to modify the task details. Figure 10-123 shows the panel for a CRON task, in this example
the CRON task “MkSnapshotCron”. Here you can modify both the schedule of the task and
the task parameters. Click “Apply” for the changes to be applied.

Figure 10-123 Panel to modify a CRON task- the Schedule and Task parameter can be modified

Figure 10-124 shows the panel for a GUI task, in this example the GUI task FTP_REFRESH.
As you can see, only the schedule of the task can be modified. Click “Apply” when done to
apply the changes.


Figure 10-124 Panel to modify the GUI task FTP_REFRESH

10.11 Health Center


From the GUI you can access the Health Summary located on the left panel. The Health
Summary provides detailed information about any component inside your
SONAS Storage Solution through the topology, alert log, and system log features described
in “GUI tasks” on page 306. Together with the Call Home and SNMP trap features, these
make up the SONAS Health Center.

10.11.1 Topology
From the GUI, in the Health Summary section on the left panel, you can reach the topology
feature. It displays a graphical representation of the various SONAS architectural
components: you will find there the status of the Management and Interface Nodes, as well
as the public and data networks, the Storage Pods, the file systems, and the exports.

Overview
The first view, shown in Figure 10-125, gives you the big picture of your system; for more
details on a specific area you have to expand that area.


Figure 10-125 SONAS topology overview

In the topology view all components of your SONAS are described. When you move your
cursor over a component you see a tooltip, as shown in the figure above. For more
information on one of these components, click the appropriate link.

Topology layered displays and drilldown


The topology view gives a quick cluster status overview; the displayed data is retrieved from
the SONAS Health Center backend database. The topology webpage makes heavy use of the
Dojo toolkit to display data, which is retrieved via AJAX calls at different intervals depending
on the selected view. The level of detail of the displayed data is split into three layers, each
going into greater detail.

Layer 1, shown in Figure 10-126 on page 412, gives a short status overview of the main
system components; the view is updated every 5 seconds. You can click a layer 1
icon to drill down to the level 2 displays.

(Each component in the layer 1 view is shown with a component title, a component icon,
component details, and a component status summary; clicking these areas opens the level 2
or level 3 detail views.)

Figure 10-126 Topology view at layer 1


Layer 2, shown in Figure 10-127, displays details about the Interface Nodes and Storage
building blocks and is updated every 5 seconds. Clicking a layer 2 icon brings up the layer
3 view.

(In this view the Interface Nodes are displayed by their logical internal name; for example,
001 represents int001st001. Click a node icon to open the level 3 view.)

Figure 10-127 Topology view at layer 2

Layer 3, an example of which is shown in Figure 10-128, gives the deepest level of detail.
Modal dialog windows are opened to display the details, which are updated every 30 seconds
or can be refreshed manually by clicking the refresh icon.

Figure 10-128 Topology view at layer 3 example

All Interface, Management, and Storage Node detail views have the same tabs:
Hardware, Operating System, Network, NAS Services, and Status.

Interface Node
For instance, if you need more information regarding the Interface Nodes because of the
warning message in the figure above click on the “Interface Nodes (6)” link in the figure above
and you will see the Figure 10-129 on page 414.


Figure 10-129 Interface Nodes overview

The window above shows an overview of the Interface Node configuration of your
SONAS Storage Solution. We can see here that the warning message propagated to the
global overview is actually not a warning message for a particular node, but for all of them.
Again, to get more details for a given Interface Node, click the chosen node and you will see
all related information, as described in Figure 10-130 on page 415.

10.11.2 Default Grid view


In all of the following sections we describe the graphical version that uses icons, which is
part of the default Grid view; you can instead request the listing view by clicking the List tab
beneath the Interface Node icons.

This new window provides information on several domains, such as the hardware, the
operating system, the network, the NAS services, and the status, each in a corresponding
tab. Figure 10-130 on page 415 shows the information in the Hardware section; in this
section you can get an even finer granularity with the Motherboard, CPU, FAN, HDD,
Memory Modules, Power, and Network Card tabs.


Figure 10-130 Interface Node Hardware information

In the next section, Operating System, you will find details regarding the Computer System,
the Operating System, and the local file systems, as described in Figure 10-131.

Figure 10-131 Interface Node Operating System information

If you need information about the network status of a particular Interface Node, choose the
Network section; there you will find information about all of the network bonding interfaces
configured on the selected Interface Node, as shown in Figure 10-132 on page 416.


Figure 10-132 Interface Node Network information

Similarly, if you need information regarding the NAS services or the Interface Node status,
choose the appropriate tabs, as described in Figure 10-133 and Figure 10-134 on page 417
below.

Figure 10-133 Interface Node NAS Services information


Figure 10-134 Interface Node Status Message Information

The NAS Services section shows the status of all export protocols, such as CIFS, NFS, HTTP,
FTP, and SCP, as well as the status of services such as CTDB or GPFS; the Status section
gathers all of the previous information sections with more details. Whereas the first three
sections are static, containing only configuration information, the last two are dynamic, and
the warning icon seen at a higher level (the Interface Nodes level or the topology level) refers
only to the Status section (NAS services issues are also included in the Status section), and
more precisely to its first line with the “degraded” level. Once the issue is fixed, the warning
icon disappears.

Management Node
Back in the topology overview, if you are interested in Management Node information, click
the Management Node link and you will see the same windows and hierarchy as described
earlier for the Interface Nodes. The sections and tabs are the same, except that the NAS
Services section does not exist and is replaced by the Management section, as you can see
in Figure 10-135.

Figure 10-135 Management Node Management information


Interface Network
From the topology overview you can also get information about the Interface Network by
clicking the “Interface Network” link. There you will find information regarding the public IP
addresses in use and the authentication method, as described in Figure 10-136 on page 418
and Figure 10-137 below.

Figure 10-136 Interface Network Public IPs information

Figure 10-137 Interface Network Authentication information

Data Network
Once again from the topology overview, if you need information regarding the data network,
or InfiniBand network, click the “data network” link and you will see something similar to
Figure 10-138 on page 419. In the first tab, called “Network”, you will find information
regarding the state, IP address, and throughput of each InfiniBand connection, filtered by
Interface, Management, and Storage Nodes in the tabs on the left. The second tab, called
“Status”, gathers information in the same way as the Status tab of each individual Interface
Node in the Interface Node topology.


Figure 10-138 Data Network information

Storage Building Block


The last hardware component of the topology overview is the Storage Building Blocks. A
Storage Building Block is a list of the Storage Pods used for your SONAS file systems, on top
of which you have built your SONAS shares. Once you have selected the appropriate
component, you will see a window similar to Figure 10-139, which represents the first (and,
in this example, only) Storage Pod used.

Figure 10-139 Storage Pod overview


If you click the Storage Pod icon you will see another familiar window, which enumerates all
components of this Storage Pod, as described in Figure 10-140 below.

Figure 10-140 Storage Controller view in Storage Building Block

The first tab describes the storage components of the Storage Pod. In our case we have a
single Storage Controller, but you can have up to two Storage Controllers and two Storage
Expansion units. The first tab of this storage components view shows storage details (more
precisely, controller details in our example), and the Status tab shows the same kind of details
you saw in the Status tab of the Interface Node above; see Figure 10-134 on page 417.

This was the information related to the storage part of the Storage Pod. If you are looking for
information related to the Storage Nodes, there is a dedicated tab for each of the two Storage
Nodes inside the Storage Pod. If you click the Storage Node name tabs above, you will find
more detailed information, as described in Figure 10-141 and Figure 10-142 on page 421:

Figure 10-141 Storage Node view in the Storage Building Block


Figure 10-142 Second Storage Node in the Storage Building Block

For these two Storage Nodes you find information similar to what we described earlier for the
Interface Nodes; the only difference is the Storage tab, where you find information regarding
the SONAS file system, as shown in Figure 10-143.

Figure 10-143 SONAS File System information for each Storage Node

This Storage Building Block view, where you can find any information regarding the Storage
Pods used by your SONAS file systems, is the last hardware component of the SONAS
Storage Solution. From the overview window you can also find information related to the file
systems and the exported shares.

File System
From the overview window you can request file system information by clicking the File
System component. You will then see a window as described in Figure 10-144 on page 422.


Figure 10-144 SONAS File System information

The window above shows typical file system information such as the device name, the
mount point, the size, and the available space left. Each SONAS file system created results
in one entry in this table.

Shares
As for the SONAS file systems, you can request information about the shares you created
from these file systems. To get this information from the topology overview, click the
appropriate component and you will see something like Figure 10-145.

Figure 10-145 Shares information

In the window above you can see the status, the name, and the directory associated with
your share and, more importantly, the protocols through which SONAS users can access the
share. In our example the share is accessible by FTP, NFS, and CIFS.

These two last components complete the topology view of the Health Center. The following
sections describe the system logs, Call Home, and SNMP features.

10.11.3 Event logs


Event logs are basically composed of two kinds of logs, the Alert log and the System log. The
Red Hat Linux operating system reports all internal information, events, issues, and failures in
a ‘syslog’ file located in /var/log/messages. Each Interface and Storage Node has its own
syslog file. In SONAS all nodes send their syslog files to the Management Node, which
consolidates these files and displays them in the System Log available from the GUI. It is a
‘raw’ display of these files with some filtering tools, as described in Figure 10-146 on
page 423. Each page displays around 50 log entries. System logs have three levels:
Information (INFO), Warning (WARNING), and Severe (SEVERE). You can filter the logs by
log level, component, host, and more.


Figure 10-146 System Logs window

The Alert log panel extracts specific information, warning, and critical events from the syslog
and displays them in a summarized view. As SONAS administrator you should look at this log
first when investigating problems. Each page displays around 50 log entries, one per event,
which can be an Info, Warning, or Critical message, displayed in blue, yellow, and red
respectively. You can filter the logs in the table by severity, time period, and source; the
source of a log entry is the host on which the event occurred, as shown in Figure 10-147 on
page 424.


Figure 10-147 Alert Logs window

The System Log panel displays system log events that are generated by the SONAS
Software, including management console messages, system utilization incidents, status
changes, and syslog events. Figure 10-146 on page 423 shows how the System Log panel in
the GUI looks.

10.12 Call home


The SONAS Storage Solution has been designed to provide you with full support. We have
described above how to use the SONAS GUI and find information in the topology overview or
directly from the event logs.

Each SONAS hardware component has at least one error detection method. One such
method is the Denali code, a Director API module that checks and monitors the Interface,
Storage, and Management Nodes. Another is the System Checkout code, based on tape
products, which monitors components such as InfiniBand switches, Ethernet switches, Fibre
Channel connections, and Storage Controllers. The last one is the SNMP mechanism, used
inside SONAS only, which monitors every component: servers, switches, and Storage
Controllers.

The Denali method uses CIM providers, which are also used by the System Checkout
method, and SNMP traps are also converted into CIM providers. All these methods provide
input to the GUI Health Center event log, as described in the section above. Depending on
the severity of an issue, it can raise an Electronic Customer Care (ECC) Call Home. The Call
Home feature has been designed to start first with hardware events based on unique error
codes. The Call Home feature is configured as part of the first time installation and is used to
send hardware events to IBM support. Call Homes are based only on Denali and System
Checkout errors; SNMP traps do not initiate a Call Home.

The valid machine models that will call home are:


򐂰 2851-SI1 – Interface Nodes
򐂰 2851-SM1 – Management Nodes
򐂰 2851-SS1 – Storage Nodes
򐂰 2851-DR1 – Storage Controller
򐂰 2851-I36 – 36 Port Infiniband Switch
򐂰 2851-I96 – 96 Port Infiniband Switch

There will be no Call Home against a 2851-DE1 Storage Expansion unit, as any errors from it
will call home against its parent 2851-DR1 Storage Controller unit. Similarly, any errors
against the Ethernet switches will call home against the 2851-SM1 Management Node.

Figure 10-148 shows an example of a Call Home, which initiates an Error ID-based Call
Home using an 8-character hex value as defined in the RAS Error Code Mapping File.

Figure 10-148 Sample Error ID Call Home

The following Figure 10-149 shows a Call Home test with the -t option:

Figure 10-149 Call Home test


11

Chapter 11. Migration overview


In this chapter we discuss how to migrate your existing file server or NAS filer to the SONAS
system. Migration of data on file systems is more complex than migration of data on block
devices, and there is no universal tool or method for file migration.

This chapter will cover the following aspects:


򐂰 Migration of user authentication and ACLs
򐂰 Migration of files and directories
򐂰 Migration of CIFS shares and NFS exports


11.1 SONAS file system authentication


In this section we illustrate the authentication services offered by the SONAS file system.

11.1.1 SONAS file system ACLs


The SONAS file system is provided by GPFS technology, so most of the implementation
details and considerations are similar to those in a GPFS environment. We refer
interchangeably to the SONAS file system and to the SONAS GPFS file system. The SONAS
file system supports the NFSv4 ACL model and so offers much better support for Windows
ACLs. For more information on NFSv4 ACLs see section 5.11 of RFC3530, available at
http://www.nfsv4.org/. NFS V4 ACLs are very different from traditional ACLs and provide
much finer control of file and directory access.

The NFS version 4 ACL attribute is an array of access control entries (ACEs). Although the
client can read and write the ACL attribute, in the NFSv4 model the server does all access
control based on the server's interpretation of the ACL. If at any point the client wants to
check access without issuing an operation that modifies or reads data or metadata, the client
can use the OPEN and ACCESS operations to do so.

In the case of NFS V4 ACLs, there is no concept of a default ACL. Instead, there is a single
ACL and the individual ACL entries can be flagged as being inherited either by files,
directories, both, or neither.

SONAS file ACLs can be listed by issuing the mmgetacl command, as shown in Example 11-1:

Example 11-1 mmgetacl output for a file


[root@plasma.mgmt001st001 b031]# mmgetacl pump0426_000019a4
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rw-c:allow
(X)READ/LIST (X)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (X)CHOWN (-)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:r---:allow:DirInherit
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

user:redbook:r---:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

group:library:r---:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

special:everyone@:r---:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

An NFS V4 ACL consists of a list of ACL entries. The GPFS representation of an NFS V4 ACL
entry is three lines, due to the increased number of available permissions beyond the
traditional rwxc.

The first line has several parts separated by colons (’:’).


򐂰 The first two parts identify the user or group and the name of the user or group.
򐂰 The third part displays a rwxc translation of the permissions that appear on the
subsequent two lines.
򐂰 The fourth part is the ACL type. NFS V4 provides both an allow and deny type:


allow Means to allow (or permit) those permissions that have been
selected with an ’X’.
deny Means to not allow (or deny) those permissions that have been
selected with an ’X’.
򐂰 The fifth, optional, and final part is a list of flags indicating inheritance. Valid flag values
are:
FileInherit Indicates that the ACL entry should be included in the initial ACL
for files created in this directory.
DirInherit Indicates that the ACL entry should be included in the initial ACL for
subdirectories created in this directory (as well as the current
directory).
InheritOnly Indicates that the current ACL entry should NOT apply to the
directory, but SHOULD be included in the initial ACL for objects
created in this directory.

As in traditional ACLs, users and groups are identified by specifying the type and name. For
example, group:staff or user:bin. NFS V4 provides for a set of special names that are not
associated with a specific local UID or GID. These special names are identified with the
keyword special followed by the NFS V4 name. These names are recognized by the fact that
they end with the character ’@’. For example, special:owner@ refers to the owner of the file,
special:group@ the owning group, and special:everyone@ applies to all users.
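
As a worked reading of one entry from Example 11-1: the entry
special:group@:r---:allow:DirInherit applies to the owning group of the file; the rwxc
translation r--- and the subsequent two lines show that only read-type permissions
(READ/LIST, SYNCHRONIZE, READ_ACL, READ_ATTR) are selected; the type allow means those
permissions are granted; and the DirInherit flag means that, when such an ACL is applied to a
directory, the entry is also included in the initial ACL of subdirectories created in it.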

11.1.2 File sharing protocols in SONAS


SONAS supports multiple file sharing protocols such as CIFS, NFS, FTP and others to
access files over the network. We will introduce some of these protocols and their
implications with SONAS.

NFS protocol
The Network File System (NFS) protocol specifies how computers can access files over the
network in a similar manner to how files are accessed locally. NFS is now an open standard
and is implemented in most major operating systems. There are multiple versions of NFS:
NFSv4 is the most current and emerging version, while NFSv3 is the most widespread in use.
SONAS supports NFSv3 as a file sharing protocol for data access, and the SONAS file system
implements NFSv4 ACLs.

The NFS protocol is a client-server protocol where the NFS client accesses data from an NFS
server. The NFS server (SONAS acts as an NFS server) exports directories. NFS allows
parameters such as read only, read write, and root squash to be specified for a specific
export. The NFS client mounts exported directories using the mount command, as sketched below.
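
As a minimal client-side sketch (the host name, export path, and mount point are hypothetical),
a Linux NFS client could mount a SONAS export as follows; the vers=3 option requests NFSv3,
which is the version SONAS supports for client data access:

mount -t nfs -o vers=3 sonas.example.com:/gpfs/office /mnt/office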

Security in NFS is managed as follows. Authentication, the process of verifying whether the
NFS client machine is allowed to access the NFS server, is performed on the IP address of
the NFS client; client IP addresses are defined on the NFS server when configuring the
export. Authorization, or verifying whether the user can access a specific file, is based on the
user and group of the originating NFS client, matched against the file ACLs. Because the
user on the NFS client is passed as-is to the NFS server, an NFS client root user has root
access on the NFS server; to avoid an NFS client gaining root access to the NFS server you
can specify the root_squash option.

FTP protocol
File Transfer Protocol (FTP) is a protocol to copy files from one computer to another over a
TCP/IP connection. FTP is a client server architecture where the FTP client accesses files
from the FTP server. Most current operating systems support the FTP protocol natively, and
so do most web browsers. FTP supports user authentication and anonymous users.

SONAS supports FTP authentication through the SONAS AD/LDAP servers. File access
authorization is done with ACL support: SONAS supports enforcement of ACLs and the
retrieval of POSIX attributes, but ACLs cannot be modified using FTP.

CIFS protocol
The protocol used in Windows environments to share files is the Server Message Block
(SMB) protocol, sometimes called the Common Internet File System (CIFS) protocol. The
SMB protocol originated in IBM, was later enhanced by Microsoft, and was renamed
CIFS. Among the services that Windows file and print servers provide are browse lists,
authentication, file serving, and print serving. Print serving is outside the scope of our
discussion. Browse lists offer a service to clients that need to find a share using the Windows
net use command or Windows Network Neighborhood. The file serving function in CIFS
comprises the following functions:
򐂰 Basic server function
򐂰 Basic client function
򐂰 Distributed File System (Dfs)
򐂰 Offline files/Client side caching
򐂰 Encrypted File System (EFS)
򐂰 Backup and restore
򐂰 Anti-virus software
򐂰 Quotas

The protocol also includes authentication and authorization and related functions such as:
򐂰 NT Domains
򐂰 NT Domain trusts
򐂰 Active Directory
򐂰 Permissions and Access Control Lists
򐂰 Group policies
򐂰 User profile and logon scripts
򐂰 Folder redirection
򐂰 Logon hours
򐂰 Software distribution, RIS and Intellimirror
򐂰 Desktop configuration control

Simple file serving in SONAS is relatively straightforward; however, duplicating some of the
more advanced functions available on Windows servers can be more difficult to set up.

SONAS uses the CIFS component to serve files. Authentication is provided through LDAP or
AD, with or without the Microsoft SFU component. Authorization is supported using ACLs
that are enforced on files and directories, for users with up to 1020 group memberships.
Windows tools can be used to modify ACLs. ACL inheritance is similar, but not identical, to
Microsoft Windows, and SACLs are not supported.
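
As a minimal client-side sketch (the drive letter, host name, share name, and account are
hypothetical), a Windows client can map a SONAS CIFS share with the net use command:

net use S: \\sonas.example.com\office /user:MYDOMAIN\alice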

11.1.3 Windows CIFS and SONAS considerations


Windows clients can use the CIFS protocol to access files on the SONAS file server. The
SONAS CIFS implementation is nearly transparent to the users. SONAS behaves like a
Windows file and print server, but cannot behave completely as a Windows 2000 or XP server.

Server-side encryption and file transfer compression (server-side compression) are not
supported. SONAS does not support signed SMB requests. SONAS can participate in a DFS
infrastructure but cannot act as the DFS root. Transparent failover between nodes is
supported, provided the application supports network retry. To ensure consistency, files are
synced to disk when CIFS closes the file. SONAS CIFS supports byte-range strict locking.
SONAS supports lease management for client-side caching; it supports level 1 opportunistic
locks (oplocks) but not level 2 oplocks.

Files migrated by SONAS HSM will be shown as offline files and marked with the hourglass
symbol in Windows Explorer.

SONAS supports the standard CIFS timestamps for the client:


򐂰 Created time stamp: When the file was created in the current directory; when the file is
copied to a new directory a new time stamp is created.
򐂰 Modified time stamp: When the file was last modified. When the file is copied to another
directory the modified time stamp remains the same.
򐂰 Accessed time stamp: The time when the file was last accessed. This value is set by the
application program; it is application dependent, and not all applications update this
timestamp.
򐂰 Metadata change time stamp: The last change to the file metadata.

SONAS snapshots are exported to Windows CIFS clients via the VSS API. This means that
the snapshot data can be accessed through the Previous Versions dialog in Windows
Explorer.

SONAS supports case-insensitive file lookup for CIFS clients. SONAS also supports the DOS
attributes on files; the read-only bit is propagated to the POSIX bits to make it available to
NFS and other clients. SONAS supports the automatic generation of MS-DOS 8.3 character
file names.

Symbolic links are supported for clients such as Linux and Mac OS X that use the SMB UNIX
extensions. Symbolic links are followed on the server side for Microsoft SMB clients, but they
are not displayed as symbolic links; instead, the files or directories referenced by the link are
shown.

SONAS does not support access-based enumeration, also called hide unreadable, which
hides directories and files from users that have no read access. SONAS does not support
sparse files. SONAS does not support interoperation with WINS to appear in the “Network
Neighborhood” of Windows clients. SONAS does not currently support the SMB2 and
SMB2.1 enhancements of the SMB protocol introduced by Windows 2008 and Windows 7.

Multiple protocols can be selected and configured when creating an export. If there is no
specific need to configure multiple protocols, a single protocol should be used: this increases
performance because leases and share modes do not have to be propagated into the Linux
kernel to allow proper interaction with direct file system access by multiple protocols.

11.2 Migrating files and directories


When you deploy a new SONAS infrastructure in an existing environment you may need to
migrate files and directories from your current file servers to SONAS. This process needs to
be planned carefully, as the migration of files and directories may require a considerable
amount of time and the migration of file metadata requires careful evaluation.


11.2.1 Data migration considerations


The migration of files and directories consists of copying the data from a source file server to
the destination SONAS appliance using specific tools such as robocopy or rsync. These
tools work by reading data from the source file server and writing it to the destination SONAS
appliance, as shown in Figure 11-1 on page 432. We use an intermediate system, a data
mover system, that mounts both the old file server shares and the new SONAS appliance
shares and executes the migration software tool, which copies all the data over.

Figure 11-1 Migration scenario with data mover
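
As a minimal sketch of the data mover approach shown in Figure 11-1 (the host names, export
paths, and mount points are hypothetical), a Linux data mover could mount both systems over
NFS and run a copy tool between the two mount points:

mount -t nfs oldfiler.example.com:/vol/projects /mnt/source
mount -t nfs sonas.example.com:/gpfs/office /mnt/target
rsync -a /mnt/source/ /mnt/target/

The handling of ACLs and other metadata by the copy tool is discussed in 11.2.2, “Metadata
migration considerations”.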

In the diagram above the data flows twice through the network: it is read from the old file
server and written to the new SONAS. Depending on the type of the file server, it may be
possible to run the migration software tool on the old file server system and eliminate one
network hop. The amount of time to copy the data over is affected by multiple factors, such
as the following:
򐂰 The amount of data to migrate; the more data there is, the longer the migration will take.
򐂰 The network bandwidth available for the migration; the greater the bandwidth, the shorter
the time. One approach is to dedicate network links to the migration process. The data
mover system has greater bandwidth requirements because it has to read the data from
the source system and write it out again to the destination system, so it needs twice the
network bandwidth of the SONAS appliance. One way to reduce contention is to use two
different adapters, one to the source filer and a separate one to the SONAS system.
򐂰 The utilization of the file server. Contention for file server resources may slow down the
migration. The file server may still be in production use, so you should evaluate file server
disk and server utilization before the migration.
򐂰 Average file size may impact migration times, as smaller files have more metadata
overhead to manage for a given amount of data and so take longer to migrate.
򐂰 Disk fragmentation on the source file server may slow down the reading of large
sequential files.

Applications and users typically need access to a whole export or share, or to a whole
subdirectory tree in an export or share; in general an application cannot work with access to
only a subset of its directories or shares. Consequently, the migration of files requires
downtime: during the migration process some files are already migrated while others are not,
and there is no mechanism for synchronization between the migration and user or application
access to the files, so applications and users cannot access the data while files are being
migrated.


The data migration process can be executed in different ways:


򐂰 Migration of a file server – in a single step
– Needs long downtime for larger file servers
– Requires downtime for all apps/users
– IP address of old file server can be replaced by IP address of SONAS in DNS server
– File access path does not change from an app/user point of view
򐂰 Migration of a file server – one share/export after the other
– Shorter downtime than in the case above
– Requires downtime for some apps/users
– DNS update does not work
• Old file server and SONAS run in parallel
• Applications and users must use new access path, once files are migrated, and this
requires client side changes
򐂰 Migration of a file server – one subdirectory after the other
– Requires a shorter downtime than the case above
– Same considerations as for migration by share/export

The use of tools that allow incremental resynchronization of changes from source to target
opens up additional possibilities. The two options are:
򐂰 Stopping the client applications, copying the files from the source system to the
destination, and then redirecting clients to the SONAS target. This approach requires
potentially long downtime to copy all the data.
򐂰 Copying the data to the SONAS target while clients access the data. After most of the data
has been copied, the client applications are stopped and only the modified data is copied.
This approach reduces the downtime to a synchronization of the data updated since the
last copy was performed. It requires that the file copy tool that you use supports
incremental file resynchronization.

11.2.2 Metadata migration considerations


The migration of file metadata needs careful planning. SONAS 1.1 stores access control lists
as GPFS NFSv4 ACLs; this is made possible because GPFS supports the NFSv4 ACL model.
UNIX POSIX bits are mapped to GPFS ACLs by SONAS internally. Windows CIFS ACLs are
mapped to GPFS ACLs, and Windows DOS attributes such as hidden, system, archive, and
read-only are stored in the GPFS inode. Data migration tools such as xcopy and robocopy
need to copy the metadata correctly.

NFS v3 installations can use standard UNIX tools such as cp and rsync to copy the data from
the source to the destination system. Files owned by root may require special attention, as
different UNIX platforms may have different default groups for root; also, the root squash
export option may remap the root user UID/GID to a different UID/GID, the one assigned to
the nobody account on the SONAS server or on the data mover machine. NFS v4 client
access is currently not supported by SONAS.

Installations using CIFS client access can use standard Windows tools for file migration,
such as xcopy or robocopy. SONAS ACLs are not fully interoperable with Windows ACLs. If
you have complex ACL structures, for example structures that contain large numbers of users
and groups or nested groups, an expert assessment of the ACL structure is strongly
recommended, and a proof of concept may be needed to verify differences and develop a
migration strategy.


If you have a mixture of NFS v3 and CIFS access to your file systems, you must decide
whether to use Windows or UNIX copy tools, as only one tool can be used for the migration.
As Windows metadata tends to be more complex than UNIX metadata, we suggest that you
use the Windows migration tools for the migration and then verify whether the UNIX metadata
was copied correctly.

Additional challenges may be present when you must migrate entities such as sparse files,
hard and soft links, and shortcuts. For example, using a program that does not support
sparse files to read a sparse file that occupies 10 MB of disk space and represents 1 GB of
space causes 1 GB of data to be transferred over the network. You need to evaluate these
cases individually to decide how to proceed.

Note: The migration of ACLs makes sense only when the destination system will operate
within the same security context as the source system, in the sense that both will use the
same AD or LDAP server.

11.2.3 Migration tools


There are multiple tools that can be used to migrate files from one file server to another file
server or SONAS appliance. We will illustrate some of the available tools:
xcopy The xcopy utility is an extended version of the Windows copy
command. This utility comes with the Windows NT and 200x operating
systems. Xcopy was developed to copy groups of files between a
source and a destination. As of Windows 2000, xcopy can copy file and
directory ACLs; these were not copied in Windows NT. xcopy is
deprecated and has been superseded by the robocopy command.
Xcopy should be used to migrate CIFS shares.
robocopy Robocopy is a Windows tool that has been developed as a follow-on to
xcopy. Robocopy was introduced with the Windows NT 4.0 resource
kit and has become a standard feature of Windows Vista, Windows 7,
and Windows Server 2008. Robocopy offers mirroring of directories
and can copy NTFS file data together with attributes, timestamps, and
NTFS ACLs. It supports restart after network failure and also
bandwidth throttling. Robocopy also supports a mirror mode to align
the contents of directories and remove destination files that have been
removed from the local directory. Robocopy offers a GUI interface that
can be used to execute file migrations or generate a command line
script for deferred execution. The robocopy utility should be used to
migrate CIFS shares (a command sketch appears after this list).
richcopy Richcopy is a freely available Microsoft tool similar to robocopy that
has a GUI interface. One of the advantages of richcopy is that it allows
multithreading of copy operations, and this can improve migration
performance.
secure copy The secure copy or scp command is an implementation of the secure
copy protocol. It allows you to securely transmit files and directories,
together with their timestamps and permissions. It should be used to
transport data between NFS shares.
rsync The rsync tool is a UNIX application to synchronize files and directories
between different servers. It minimizes data transfer between the sites
because it can find and transmit only file differences; this can be useful
when performing data migration in incremental steps. The rsync tool
supports compression and encrypted transmission of data, and it offers
bandwidth throttling to limit both bandwidth usage and load on the
source system. It supports the copying of links, devices, owners,
groups, permissions, and ACLs; it can exclude files from the copy and
copy links and sparse files. The rsync tool should be used to transport
data between NFS shares (a command sketch appears after this list).
net rpc share Samba offers a utility called net that, when used as net rpc share
migrate files, can be used to copy files and directories with full
preservation of ACLs and DOS file attributes. To use this utility to
migrate files from an existing Windows file server to a SONAS system
you need a separate data mover system running Linux with Samba.
For more information on the command see:
http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/NetCommand.html
This migration approach can be used to transport data between CIFS
shares.
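
The following is a minimal sketch of how robocopy and rsync might be invoked for a migration;
the server names, share names, paths, and log file are hypothetical and the options should be
reviewed against your own requirements. The robocopy /MIR option mirrors the directory tree
and /COPYALL copies data, attributes, timestamps, and NTFS security information; the rsync -a
option preserves standard UNIX metadata and --bwlimit throttles bandwidth in KB per second:

robocopy \\oldfiler\projects \\sonas\projects /MIR /COPYALL /R:3 /W:5 /LOG:C:\logs\projects.log
rsync -a --delete --bwlimit=20000 /mnt/source/ /mnt/target/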

There are other tools for file and directory copy, but they are outside the scope of our
discussion. Whatever tool is chosen, you should test the migration process and the resulting
migrated files and directories before performing the real migration and switchover. Special
care should be taken in verifying the permissions and ACL migration. Tools such as Brocade
VMF/StorageX have been discontinued. For information on the F5 file virtualization solution
go to:

http://www.f5.com/solutions/technology-alliances/infrastructure/ibm.html

There are various products on the market that can perform transparent file migration from a
source file server to a destination file server such as SONAS. These products act as a
virtualization layer that sits between the client applications and the file servers and migrates
data in the background while redirecting user access to the data.

The F5 intelligent file virtualization solutions enable you to perform seamless migrations
between file servers and NAS devices such as SONAS. No client reconfiguration is required
and the migration process that runs in the background does not impact user access to data.
For more information please refer to:

http://www.f5.com/solutions/storage/data-migration/

The AutoVirt file virtualization software offers a policy-based file migration function that can
help you schedule and automate file migration tasks and then perform the file migration
activities transparently in the background while applications continue to access the data. For
more information refer to:

http://www.autovirt.com/

The SAMBA suite offers a set of tools that can assist in migration to a Linux SAMBA
implementation. The net rpc vampire utility can be used to migrate one or more NT4 or later
domain controllers to a SAMBA domain controller running on Linux; the vampire utility acts as
a backup domain controller and replicates all definitions from the primary domain controller.
SAMBA also offers the net rpc migrate utility, which can be used in multiple ways as illustrated:
򐂰 net rpc share migrate all migrates shares from a remote server to a destination server
򐂰 net rpc share migrate files migrates files and directories from a remote server to a destination server
򐂰 net rpc share migrate security migrates share ACLs from a remote server to a destination server
򐂰 net rpc share migrate shares migrates share definitions from a remote server to a destination server


11.3 Migration of CIFS Shares and NFS Exports


After having migrated files and their related permissions you need to access the shares over
the network using a protocol such as CIFS or NFS. The access path for a user or an
application comprises the following components:
򐂰 The DNS name or the IP address of the file server
򐂰 The name of the CIFS network drive or the NFS export directory
򐂰 The file path inside the network drive or export directory (also called the subdirectory tree) and the filename

After migration to the new server you can either change the IP address of the file server
configured on the clients to point to the new server, or you can change the IP address behind
the DNS name to point to the new server. The second approach requires less effort because only a
DNS change is required, but it can only be used with an offline migration: in coexistence mode,
when both the old and new file servers are in use, you require two distinct IP addresses.

The name of the CIFS network drive or of the NFS export should be kept unchanged on the
new server to simplify migration. The file path should also be kept unchanged by the file and
directory migration process to minimize application disruption.

Samba offers a utility called net rpc share migrate shares that can be used to copy share
definitions from a source file server to a destination file server. You can also obtain a list of all
shares on a source server with the net rpc share list command, as sketched below.
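
The following is a minimal sketch of both commands, run from a Linux data mover with Samba installed. The server names (oldfiler, newnas) and the administrator account are hypothetical, and the exact option names should be verified against the net man page for your Samba version:

   # list the shares defined on the source file server
   net rpc share list -S oldfiler -U administrator

   # copy the share definitions from the source server to the destination server
   net rpc share migrate shares -S oldfiler --destination=newnas -U administrator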

11.4 Migration considerations


File data migration in a NAS environments is quite different from block data migration that is
traditionally performed on storage subsystems. In a block environment you migrate LUNs in
relatively straightforward ways. When you migrate files in a NAS environment you have to
take into account additional aspects such as the multiple access protocols that can be used
and the multiple security and access rights mechanisms that the customr uses and how these
fit in with SONAS.

The challanges in migrating file data in are:


򐂰 Keep downtime to a minimum or even no down time
򐂰 Ensure there is no data loss or corruption
򐂰 Consolidation, where multiple source file servers are migrated into one target

For completeness of our discussion, one way to avoid migration challenges is to avoid data
migration and repopulate the new environment from scratch. This is easire done for specific
environments such as digital media or data mining but can be prolematic for user files as you
cannot expect end users to start from scratch with their files.

When planning for the migration you shoud consider the downtime required to perform the
migration, downtime during which filesystems will not be accesible to the users. The
downtime duration is the time taken to copy all the files from the source filesystem to the
target filesystem and reconfigure users to point to the target filesystem and it is proportional
to the amount of data to be copied. To reduce the downtime you may plan to do an initial copy
of all the data whil eusers keep accessing it, then terminate user access and copy only the
files that have changed since the full copy was completed, thus reducing the downlime
required to perform the migration.


11.4.1 Migration data collection


Before initiating file data migration to SONAS you should gather information on the customer
environment and the amount of data to be migrated. Table 11-1 illustrates the main
information that is required to plan for file data migration.

Table 11-1 Migration data collection


Value Description

number of filesystems How many individual filesystems need to be migrated

total amount of data GB The quantity of data to migrate in all filesystems in GB. This is the
used space in the filesystems, not the allocated space.

average file size MB The average file size on all filesystems.

size of largest filesystem GB Are there any single very large filesystems?

list of filesystems with size & A list of all filesystems with the size and number of files for each
number of files one is useful for detailed planning, if available.

data change rate GB The amount of data that is changed in the filesystem on a daily or
weekly basis. This can be obtained from backup statistics as the
change rate corresponds to the number of files backed up each
day with incremental backups.

number of users The number of users accessing the filesystems

authentication protocols What kind of protocols are currently being used for authentication:
AD, LDAP, SAMBA PDC?

Windows to UNIX sharing Are files shared between Windows and UNIX environments? What
method is used for mapping Windows SIDs to UNIX
UIDs/GIDs?

Network bandwidth How are the filers connected to the network? What is the maximum
bandwidth available for the data migration? Will the migration of data
impact the same network connections used by the end users?

routers and firewalls Are there routers and firewalls between the source and destination
file servers?

With the above information you can start to estimate aspects such as the amount of time it will
physically take to migrate the data, in what timeframe this can be done and what impact it will
have on users.

11.4.2 Types of migration approaches


There are multiple migration methods, including:
򐂰 Block level migration - not applicable in our case but included to show differences
򐂰 File system migration based on network copy from source to target
򐂰 File level backup and restore

A block-level migration can be performed when moving data to a new file server using the
same operating system platform. When performing block-level migration the advantages are
that you do not have to concern yourself with file metadata, file ACLs, or permissions, and that
the migration can often be performed transparently in the background and is fast. The possible
disadvantages are that it may require additional hardware with associated installation costs
and may require multiple service interruptions to introduce and remove the migration
appliance. This approach is not applicable to SONAS because SONAS is an appliance and the
data to migrate comes from external systems.

A file system migration uses tools to copy files and file permissions from a source file server to
a target file server. The advantages are that there are multiple free software tools, such as
xcopy and rsync, to do this; they are relatively easy to set up and require little or no new
hardware. The disadvantages are that the migration of ACLs needs administrative account
rights for the duration of the migration, that it is generally slower than block-level migration because
the throughput is gated by the network and the migration server, and that you must plan for mapping
CIFS to NFS shares.

File level backup and restore is also a viable migration option. It has to be a file-level backup,
so NDMP backups are not an option because they are full backups written in an appliance
specific format. Also, the file level backups have to come from the same operating system
type as the target system, so in the case of SONAS the source system should be a UNIX or
Linux system. The advantages of this approach are that it is fast, that the backup environment is
most likely already in place, and that there are minimal issues due to ACLs and file attributes. The
possible disadvantages are that restores from these backups need to be tested before
the migration date, that tape drives may be tied up by the migration so scheduled backups might
be at risk, and that there may be network congestion if there is no dedicated backup network.

The diagram in Figure 11-2 shows the components and flows for a file-system copy migration:

Figure 11-2 File system copy migration flow. The figure shows the source file server (1), the target SONAS (2), the authentication server with AD or LDAP (3), a Windows server running robocopy for CIFS (4), a Linux server running rsync for NFS (5), and the UNIX NFS and Windows CIFS clients (6), together with the authentication, data access, and data migration flows.

First note that all components share and access one common authentication service (3) that
runs one of the protocols supported in a SONAS environment. The UNIX NFS clients and
Windows CIFS clients (6) are connected to the source file server (1). We use one data mover
server per file sharing protocol: for UNIX file systems we use the Linux server with rsync (5),
and for Windows file systems we use the Windows server with robocopy (4). The Linux server (5)
and the Windows server (4) connect to both the source file server (1) and to the SONAS target
server (2) over the customer LAN network. The robocopy or rsync utilities running on these
servers read file data and metadata from the source file server (1) and copy them file by
file to the target SONAS (2). A sample rsync and robocopy invocation is sketched after the
step list below. The migration steps in this scenario are as follows:
1. Copy one group of shares or exports at a time, from source filer to SONAS
2. Shutdown all clients using those shares or exports
3. Copy any files that have been changed since last copy
4. Remap the clients to access the SONAS and restart the clients
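
The following is a minimal sketch of such a copy. The share names and mount points are hypothetical, and the options should be validated against the tool versions on your data movers before a real migration:

   # On the Linux data mover (5): initial copy of an NFS share while users are still active
   rsync -avAXSH /mnt/oldfiler/projects/ /mnt/sonas/projects/
   # After user access is stopped: copy only what changed since the first pass
   rsync -avAXSH --delete /mnt/oldfiler/projects/ /mnt/sonas/projects/

   # On the Windows data mover (4): a roughly equivalent robocopy run for a CIFS share
   robocopy \\oldfiler\projects \\sonas\projects /MIR /COPYALL /R:1 /W:1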


11.4.3 Sample throughput estimates


The performance you can get when migrating data depends on many factors. We show a
sample scenario based on the following environment:
򐂰 Network
– 10 G Brocade Ethernet network switch TurboIron X24
– 1 G Cisco Ethernet network switch
򐂰 Servers
– IBM System X 3650-M2 Windows server with Qlogic 10G Ethernet card and 1G
Broadcom Extreme II
– IBM System X 3650-M2 Linux server with Qlogic 10G Ethernet card and 1G Broadcom
Extreme II
򐂰 Storage
– Nseries N6070 with 28 disks for NFS share
– Nseries N6070 with 28 disks for CIFS share
– SONAS 2851 base config (3x IO nodes, 2x storage nodes, 120x disk drives 60xSAS
and 60xSATA)

The migration throughput test results that were obtained in this environment are:
򐂰 With 10G Ethernet we got the following:
– Rsync, NFS mount on Linux server with 10 GigE interface: 140 MB/s
– Robocopy, CIFS share on Windows 2008/2003 server with 10 GigE: 70 MB/s
– Richcopy, CIFS share on Windows 2008/2003 server with 10 GigE: 90 MB/s
򐂰 With 10G Ethernet and jumbo frames enabled we got the following:
– Rsync, NFS mount on Linux server with 10 GigE interface: 140 MB/s
– Robocopy, CIFS share on Windows 2008/2003 server with 10 GigE: 95 MB/s
– Richcopy, CIFS share on Windows 2008/2003 server with 10 GigE: 110 MB/s
򐂰 With 1G Ethernet we got the following:
– Rsync, NFS mount on Linux server with 1 GigE interface: 35-40 MB/s
– Robocopy, CIFS share on Windows 2008/2003 server with 1 GigE: 25-35 MB/s
– Richcopy, CIFS share on Windows 2008/2003 server with 1 GigE: 28-35 MB/s

To get a definitive performance estimate for your environment you should run migration test
runs where you copy data to the new server without affecting or modifying end user access to
the source file server.

11.4.4 Migration throughput example


Let us discuss one migration example. Our company has 2000 users. Each user has a mailbox
file of about 500 MB and an archive file of 1.5 GB. Assuming the mailbox file changes
about 25% a day per user, we calculate the daily change rate:
500 MB * 2000 * 0.25 = 250000 MB, or about 244 GB/day.

Assuming we can start the migration after office hours gives us a window of about 8-10 hours.
Copying 244 GB in 10 hours requires about 25 GB/h, that is, a migration speed in the order of
6.9 MB/s or 55 Mb/s.
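
As a quick way to reproduce this sizing arithmetic, the same figures can be checked with the standard bc calculator; the numbers are simply the example values above:

   echo "2000*500*0.25" | bc               # daily change: 250000 MB, about 244 GB
   echo "scale=1; 250000/(10*3600)" | bc   # rate needed for a 10 hour window: about 6.9 MB/s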

Assume that after test runs in our environment we measured the following throughputs:
򐂰 63 MB/s or 504 Mb/s on a 10Gb link
򐂰 48MB/s or ~380 Mb/s on a 1Gb link


In this case the migration of the new data would last about 1.1 hours on the 10 Gb link and about
1.4 hours on the 1 Gb link. In our migration test this translates to roughly 1 to 1.5 hours of
migration time for 244 GB of data, and a maximum migratable daily change of about 1.3 TB,
assuming a 6 hour migration window and a 63 MB/s data rate on the 10 Gb link.

Continuing with the example above, in addition to the 244 GB/day of change to the mailboxes,
users also archive their changes. Assuming that the complete archive file must also be
migrated, this results in the following durations:
򐂰 10 Gb link: (500 MB + 1500 MB) * 2000 / 63 MB/s = ~17 h
򐂰 1 Gb link: (500 MB + 1500 MB) * 2000 / 48 MB/s = ~23 h

In this case the migration would run longer than the allocated window. You then have two
options: split the migration load across two separate migration servers, or run the migration
tool more frequently, as most tools only migrate the delta between the source and the target files.

As mentioned before, the right measure will probably only be determined by test runs.



Chapter 12. Getting started with SONAS


This chapter takes you through common tasks, from the basic implementation through
monitoring your SONAS system, using a scenario we developed based on our experience
writing this book.


12.1 Quick start


In this scenario, we go through the tasks in a day in the life of a SONAS administrator. The
tasks we walk through include creating a user and a share, monitoring the appliance, and
executing sample GUI and CLI tasks.

12.1.1 Quick start tasks


For this quick start chapter we document the following tasks performed by a SONAS
administrator. Root access to a live SONAS system is needed to perform these tasks:
򐂰 Log in using the CLI and GUI
򐂰 Create a new administrative user
򐂰 Create a filesystem
򐂰 Create exports
򐂰 Access the export from a client (requires a client system)
򐂰 Create and access a snapshot
򐂰 Back up and restore files with TSM (requires a TSM server)
򐂰 Create and apply a policy
򐂰 Monitoring: add a new public IP address

Additional tasks, such as setting up asynchronous replication, TSM backup, the installation
procedure, and failover testing, require access to real SONAS hardware or additional systems
such as a TSM server.

12.2 Connecting to the SONAS system


We can connect to the SONAS appliance using either the Command Line Interface (CLI) or
the Graphical User Interface (GUI).

We show how to connect to the SONAS appliance at address 9.11.102.6, which is the
management node public IP address of the example SONAS appliance; your address will be
different.

12.2.1 Connect to SONAS appliance using GUI


To connect to the appliance using the GUI open a web browser. In our environment we use
the following URL to connect:
https://9.11.102.6:1081/ibm/console

You will see a Login screen as shown in Figure 12-1 on page 443.

442 SONAS Architecture and Implementation


Draft Document for Review November 1, 2010 9:32 am 7875QuickStart.fm

Figure 12-1 SONAS login screen

Enter the administrator user and password and press the Log in button, and you will be
connected to the SONAS admin GUI as shown in Figure 12-2. Note that the default userid
and password for a newly installed SONAS appliance are root and Passw0rd.

Figure 12-2 SONAS GUI welcome screen

Note: You can access SONAS help information from the SONAS at the following URL:
https://9.11.102.6:1081/help

12.2.2 Connect to SONAS appliance using CLI


To connect to the appliance using the CLI, start an ssh client session to the SONAS
management node public address as shown in Example 12-1:


Example 12-1 Connect to the SONAS CLI using ssh
# ssh root@9.11.102.6
root@9.11.102.6's password:
Last login: Mon Aug 3 13:37:00 2009 from 9.145.111.26

12.3 Create SONAS administrators


SONAS administrators manage the SONAS cluster. You can create an administrator using
either the GUI or the command line:

12.3.1 Creating a SONAS administrator using the CLI


To create a SONAS user using the CLI you use the mkuser command as shown in
Example 12-2:

Example 12-2 Create a SONAS administrator with the mkuser command
[root@sonasisv.mgmt001st001 ~]# mkuser my_admin -p segreta
EFSSG0019I The user my_admin has been successfully created.

We can list the users with the lsuser command as shown in Example 12-3

Example 12-3 List users with the lsuser command
[root@sonasisv.mgmt001st001 ~]# lsuser
Name ID GECOS Directory Shell
cluster 901 /home/cluster /usr/local/bin/rbash
cliuser 902 /home/cliuser /usr/local/bin/rbash
my_admin 903 /home/my_admin /usr/local/bin/rbash

12.3.2 Creating a SONAS administrator using the GUI


From the SONAS GUI select Settings and then Console User Authority and you will see a list
of defined users as shown in Figure 12-3 on page 445.


Figure 12-3 Display the user list

To add a new CLI user to the GUI, for example my_admin, use the Add button in
the Console User Authority window shown in Figure 12-3. You will see a screen as shown
in Figure 12-4 on page 445. Enter the administrator name, select the administrator role to
grant this administrator maximum privileges, and press Ok.

Figure 12-4 Create a new admin user

SONAS offers multiple GUI administrative roles to limit an administrator's working scope within
the GUI. The roles are:
administrator This role has access to all features and functions provided by the GUI,
it is the only role that can manage GUI users and roles. It is the default
when adding a user with the CLI.
operator The operator can do the following: Check cluster health, view cluster
configuration, verify system and file system utilization and manage
thresholds and notifications settings.


export administrator The export administrator is allowed to create and manage shares, plus
perform the tasks the operator can execute.
storage administrator The storage administrator is allowed to manage disks and storage
pools, plus perform the tasks the operator can execute.
system administrator The system administrator is allowed to manage nodes and tasks, plus
perform the tasks the operator can execute.

Note: These user roles only limit the working scope of the user within the GUI. This
limitation does not apply to the CLI, which means the user has full access to all CLI
commands.

12.4 Monitoring your SONAS environment


You can monitor your SONAS system using the GUI. The GUI offers multiple tools and
interfaces to view the health of the system. Selected resources can also be monitored using
the command line, but the GUI provides the most complete view.

12.4.1 Topology view


Select Health Summary → Topology view as shown in Figure 12-5 on page 446:

Figure 12-5 Topology view

The topology view offers a high-level overview of the SONAS appliance. It highlights errors
and problems and allows you to quickly drill down to get more detail on individual
components. The topology view offers an overview of the following components:
򐂰 Networks: interface and data networks
򐂰 Nodes: interface, management and storage


򐂰 Filesystems and exports

In Figure 12-6 we see that an interface node is in critical status, flagged with a red
circle with an x inside. To expand the interface nodes, click the blue Interface Nodes link or
the plus (+) sign at the bottom right of the interface nodes display; you will then see the
interface node list and the current status as shown in Figure 12-6 on page 447.

Figure 12-6 Interface node status list

To see the reason for the critical error status for a specific node click on the node entry in the
list and you get a status display of all events as shown in Figure 12-7 on page 447:

Figure 12-7 Node status messages


The first line shows that the problem originates from a critical SNMP error. After having
corrected the error situation you can mark it as resolved by right-clicking the error line and
clicking the Mark Selected Errors as Resolved box as shown in Figure 12-8 on
page 448:

Figure 12-8 Marking an error as resolved

From the topology view you can display and easily drill down to SONAS appliance
information. For example, to view the filesystem information click the filesystems link in the
topology view as shown in Figure 12-9 on page 448.

Figure 12-9 Open filesystem details information

You will get a display such as the one shown in Figure 12-10.

Figure 12-10 Filesystem details display

If, instead, you click the open-in-new-window icon as shown in Figure 12-11 on page 448:

Figure 12-11 Open filesystem page

You will be moved to a new SONAS filesystem configuration window as shown in Figure 12-12 on page 449:


Figure 12-12 Filesystems page display

12.4.2 SONAS Logs


SONAS offers multiple logs for monitoring its status.

The alert log contains SONAS messages and events. It is accessed from Health
Summary → Alert Log and a sample is shown in Figure 12-13.

Figure 12-13 Alert log display

The system log contains operating system messages and events. It is accessed from Health
Summary → System Log and a sample is shown in Figure 12-14.

Figure 12-14 System log display

12.4.3 Performance and reports


The performance and reports option allows you to generate and display SONAS hardware
component and filesystem utilization reports.


Hardware component reports


Select Performance and Reports → System Utilization and you will see a list of SONAS
nodes as illustrated in Figure 12-15 on page 450:

Figure 12-15 System utilization display

You can report on CPU, memory, network, and disk variables and generate reports covering
periods from one day up to three years.

To generate a disk I/O report for strg001st001, select the storage node, select Disk I/O as the
Measurement Variable, select Monthly Chart for the Measurement Duration, and press the
Generate Charts button. You will get a chart as illustrated in Figure 12-16.

Figure 12-16 Local disk IO utilization trend report

Filesystem utilization
Select Performance and Reports → Filesystem Utilization and you will see a list of
SONAS filesystems as illustrated in Figure 12-17 on page 451:


Figure 12-17 Filesystem utilization selection screen

You can generate space usage charts by selecting a filesystem and a duration such as
Monthly Chart and pressing Generate Charts; you will get a chart like the one shown in Figure 12-18.

Figure 12-18 Disk space utilization trend report

12.4.4 Threshold monitoring and notification


SONAS can be configured to monitor specific events and thresholds and to send emails and
SNMP traps. To set up notification, connect to the SONAS GUI and select SONAS Console
Settings → Notification Settings. This brings up the notification settings screen. Enter the
notification settings on the panel illustrated in Figure 12-19 on page 452 and press the Apply button:


Figure 12-19 Notification settings panel

The next step is to configure notification recipients. Select SONAS Console Settings →
Notification Recipients → Add Recipient and you are presented with the panel shown in
Figure 12-20 on page 452:

Figure 12-20 Add recipients panel


The notification recipients screen is now updated with the email recipient as shown in
Figure 12-21 on page 453:

Figure 12-21 Notification recipients panel

You can monitor specific utilization thresholds by going to SONAS Console Settings →
Utilization Thresholds, where you will see a panel as illustrated in Figure 12-22, and pressing
the Add Thresholds button:

Figure 12-22 Utilization threshold display panel

You are prompted for a threshold to monitor from the following list:
򐂰 File system usage
򐂰 GPFS usage
򐂰 CPU usage
򐂰 Memory usage
򐂰 Network errors


Specify warning and error levels and also recurrences of the event as shown in Figure 12-23
on page 454 and press Ok.

Figure 12-23 Add new threshold panel

12.5 Create a filesystem


You can create a new filesystem using either the GUI or the CLI.

12.5.1 Creating a filesystem using the GUI


To create a new filesystem using the GUI select Files → Filesystems and you are presented
with the panel shown in Figure 12-24:

Figure 12-24 Filesystems display panel

Press the Create a File System button and you will be presented with the panel shown in
Figure 12-25 on page 455:


Figure 12-25 Create filesystem panel - select NSDs

On this panel you will see multiple tabs:


򐂰 Select NSDs - to choose what NSDs to use
򐂰 Basic - to select mount point, block size, and device name
򐂰 Locking and ACLs
򐂰 Replication settings
򐂰 Automount settings
򐂰 Limits - maximum nodes
򐂰 Miscellaneous - for quota management settings

Choose one or more NSDs and then select the Basic tab and specify mount point and device
name as shown in Figure 12-26. Accept the default for all remaining options.

Figure 12-26 Create filesystem panel - basic information

Now press the Ok button at the bottom of the screen (not shown). A progress indicator is
displayed as shown in Figure 12-27 on page 456. Press Close to close the progress
indicator:


Figure 12-27 Filesystem creation task progress

After completion you will see the filesystems list screen with the new redbook filesystem as
shown in Figure 12-28

Figure 12-28 Filesystem display panel

Note: To display additional information on a given filesystem, click the filesystem name
in the list. The name will be highlighted and the detailed filesystem information for the
selected filesystem will be shown below.


Note that the redbook filesystem is not mounted, as 0 nodes appears in the Mounted on Host
column shown in Figure 12-28. Select the redbook filesystem entry and click the Mount
button, and you will be presented with a box asking where to mount the filesystem. Select
Mount on all nodes and press Ok as shown in Figure 12-29 on page 457:

Figure 12-29 Filesystem mount panel

The file system will now be mounted on all interface nodes and on the management node.

12.5.2 Creating a filesystem using the CLI


To create a filesystem called redbook2 using the command line proceed as follows. List the
available disks using the lsdisk command as shown in Example 12-4:

Example 12-4 Check available disks with lsdisk


[sonas02.virtual.com]$ lsdisk
Name File system Failure group Type Pool Status Availability Timestamp
gpfs1nsd gpfs0 1 dataAndMetadata system ready up 4/21/10 11:22 PM
gpfs2nsd gpfs0 1 dataAndMetadata system ready up 4/21/10 11:22 PM
gpfs3nsd redbook 1 dataAndMetadata system ready up 4/21/10 11:22 PM
gpfs4nsd 1 dataOnly userpool ready 4/21/10 10:50 PM
gpfs5nsd 1 dataAndMetadata system ready 4/21/10 10:50 PM
gpfs6nsd 1 system ready 4/21/10 11:58 PM

To create a new filesystem using the gpfs5nsd disk use the mkfs command as shown in
Example 12-5:

Example 12-5 Create the file redbook2 filesystem


mkfs redbook2 /ibm/redbook2 -F "gpfs5nsd" --noverify -R none

To list the new filesystems you can use the lsfs command as shown in Example 12-6:

Example 12-6 List all filesystems


[sonas02.virtual.com]$ lsfs
Cluster Devicename Mountpoint Type Remote device Quota Def. quota Blocksize Locking type ACL type Inodes
Data replicas Metadata replicas Replication policy Dmapi Block allocation type Version Last update
sonas02.virtual.com gpfs0 /ibm/gpfs0 local local user;group;fileset 64K nfs4 nfs4 33.536K 1
1 whenpossible F cluster 11.05 4/22/10 1:34 AM
sonas02.virtual.com redbook /ibm/redbook local local user;group;fileset 256K nfs4 nfs4 33.792K 1
1 whenpossible T scatter 11.05 4/22/10 1:34 AM
sonas02.virtual.com redbook2 /ibm/redbook2 local local user;group;fileset 256K nfs4 nfs4 33.792K 1
1 whenpossible T scatter 11.05 4/22/10 1:34 AM

Note: The lsfs command returns a subset of the information available in the SONAS GUI.
Information not available from lsfs includes whether and where the filesystem is mounted and
its space utilization. To get this information from the command line you need to run GPFS
commands as root.


To make the filesystem available you mount it on all interface nodes using the mountfs
command as shown in Example 12-7:

Example 12-7 Mount filesystem redbook2


[sonas02.virtual.com]$ mountfs redbook2
EFSSG0038I The filesystem redbook2 has been successfully mounted.

The file system can also be unmounted as shown in Example 12-8

Example 12-8 Unmount filesystem redbook2


[sonas02.virtual.com]$ unmountfs redbook2
EFSSG0039I The filesystem redbook2 has been successfully unmounted.

12.6 Creating an export


You can configure exports using either the GUI or the CLI.

12.6.1 Configuring exports using the GUI


Connect to the SONAS GUI and select Files → Exports, and a screen like the one in
Figure 12-30 will be displayed.

Figure 12-30 Exports configuration


To create a new export for the redbook filesystem click the Add button and you will see the
first screen of the export configuration wizard as shown in Figure 12-31 on page 459. Select
an export name and directory path and select the protocols you wish to configure and click
the Next> button.

Figure 12-31 Export configuration wizard

You are presented with the NFS configuration screen shown in Figure 12-32 on page 460. Add
a client called "*", which represents all hostnames or IP addresses used by the clients. Unselect
the read only and root squash attributes and press the Add Client button. When all clients
have been added press the Next button.


Figure 12-32 Export configuration wizard NFS settings

You are now presented with the CIFS configuration screen shown in Figure 12-33 on
page 460. Accept the defaults and press the Next button:

Figure 12-33 Export configuration wizard CIFS settings


On the last screen press the Finish button to finalize the configuration. Close the task
progress window that will appear and you will see the exports list screen shown in
Figure 12-34 on page 461:

Figure 12-34 Export list screen

Note: To display additional information on a given export, click the export name in the
list. The name will be highlighted and the detailed export information for the selected export
will be shown below.

12.6.2 Configuring exports using the CLI


Use the mkexport command to create an export using the CLI, as shown in Example 12-9.

Example 12-9 Create an export using CLI


[sonas02.virtual.com]$ mkexport my_redbook2 /ibm/redbook2/export1 --cifs "browseable=yes" --owner "VIRTUAL\administrator"
EFSSG0019I The export my_redbook2 has been successfully created.

To list the newly created export use the lsexport command as shown in Example 12-10:

Example 12-10 List all defined exports


[sonas02.virtual.com]$ lsexport -v
Name Path Protocol Active Timestamp Options
my_redbook /ibm/redbook/export1 NFS true 4/22/10 3:05 AM *=(rw,no_root_squash,fsid=1490980542)
my_redbook /ibm/redbook/export1 CIFS true 4/22/10 3:05 AM browseable
my_redbook2 /ibm/redbook2/export1 CIFS true 4/22/10 3:05 AM browseable

Note: The SONAS CLI does not show all export attributes, for example the owner value is
not shown. To determine the owner use the GUI or the root account.

12.7 Accessing an export


We show how to access an export from Windows and Linux clients.


12.7.1 Accessing a CIFS share from Windows


To access a CIFS share from a Windows system, log on to a Windows system that is part
of the same domain as SONAS; Active Directory must be configured prior to performing this
step. Then open My Computer from the Start menu and select Tools → Map Network
Drive. You will see a window as shown in Figure 12-35. Enter the SONAS cluster
name and export name, \\sonas2\my_redbook, in the folder field and select a drive letter.
Press the Finish button.

Figure 12-35 Map network drive

Open My Computer and verify that you can see the mapped network drive called
my_redbook, as shown in Figure 12-36 on page 462.

Figure 12-36 Verify mapped drive

12.7.2 Accessing a CIFS share from Windows command prompt


Connect to a Windows system, open a Command Prompt from the Start menu, and enter
the Windows net use command shown in Example 12-11:


Example 12-11 Windows net use to access shares I


C:\Documents and Settings\administrator.ADS>net use z: \\sonas02.virtual.com\my_redbook *
Type the password for \\sonas02.virtual.com\my_redbook:
The command completed successfully.

Verify that the share is mapped by running the net use command again, as shown in
Example 12-12:

Example 12-12 Verifying the mapped drive with net use


C:\Documents and Settings\administrator.ADS>net use
New connections will not be remembered.

Status Local Remote Network


-------------------------------------------------------------------------------
OK Z: \\sonas02.virtual.com\my_redbook Microsoft Windows Network
The command completed successfully.

12.7.3 Accessing an NFS share from Linux


To access an NFS share from a Linux host, connect to the Linux host with a user that is defined
in the same authentication server used by the SONAS appliance. Create a mount point for your
SONAS export, for example:
mkdir /sonas02/my_redbook

Now enter the mount command to mount the filesystem exported from SONAS and then
repeat the mount command without arguments to display all mounted filesystems as shown in
Example 12-13 on page 463:

Example 12-13 Mounting a SONAS filesystem on Linux


[root@tsm001st010 ~]# mount -t nfs sonas02.virtual.com:/ibm/redbook/export1 /sonas02/my_redbook

[root@tsm001st010 ~]# mount


/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
sonas02.virtual.com:/ibm/redbook/export1 on /sonas02/my_redbook type nfs (rw,addr=sonas02.virtual.com)

To verify which CIFS shares are exported and can be mounted by your client, you can use the
smbclient -L command as shown in Example 12-14:

Example 12-14 Listing available exports


[root@tsm001st010 ~]# smbclient -L sonas02.virtual.com -U "virtual\administrator"
Enter virtual\administrator's password:
Domain=[VIRTUAL] OS=[Unix] Server=[CIFS 3.4.2-ctdb-20]

Sharename Type Comment


--------- ---- -------
IPC$ IPC IPC Service ("IBM SONAS Cluster")
my_redbook Disk
my_redbook2 Disk
Domain=[VIRTUAL] OS=[Unix] Server=[CIFS 3.4.2-ctdb-20]

Server Comment
--------- -------

Workgroup Master
--------- -------
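
For the NFS side, the paths exported by the cluster can be listed with the standard showmount utility, assuming it is installed on the Linux client; the host name is the one used in our example environment:

   showmount -e sonas02.virtual.com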


12.8 Creating and using snapshots


You can create snapshots using either the CLI or the GUI.

12.8.1 Creating snapshots with the GUI


To create a snapshot, connect to the SONAS GUI and navigate to Files → Snapshots and
you will see a screen as shown in Figure 12-37 on page 464:

Figure 12-37 Snapshots list window

To create a snapshot, select the name of an active, that is mounted, file system from the list.
We select the filesystem called redbook and then press the Create a new... button. Accept the
default snapshot name in the panel shown in Figure 12-38 and press the Ok button. By
accepting the default snapshot name, the snapshots will be visible in the Windows previous
versions tab for Windows client systems.

Figure 12-38 Snapshot name window

Figure 12-39 on page 465 shows the current status and list of snapshots for a specific
filesystem, redbook in our case.


Figure 12-39 List of snapshots for filesystem

12.8.2 Creating snapshots with the CLI


To create a snapshot from the CLI using the default snapshot naming convention you use the
mksnapshot command as shown in Example 12-15:

Example 12-15 Create a snapshot with the CLI


[sonas02.virtual.com]$ mksnapshot redbook
EFSSG0019I The snapshot @GMT-2010.04.22-03.14.07 has been successfully created.

To list all available snapshots for the redbook filesystem you use the lssnapshot command as
shown in Example 12-16:

Example 12-16 List all snapshots with the CLI


[sonas02.virtual.com]$ lssnapshot -d redbook
Cluster ID Device name Path Status Creation Used (metadata) Used (data) ID Timestamp
720576040429430977 redbook @GMT-2010.04.22-03.14.07 Valid 22.04.2010 05:14:09.000 256 0 3 20100422051411
720576040429430977 redbook @GMT-2010.04.22-03.06.14 Valid 22.04.2010 05:10:41.000 256 0 2 20100422051411
720576040429430977 redbook @GMT-2010.04.22-02.55.37 Valid 22.04.2010 05:05:56.000 256 0 1 20100422051411

12.8.3 Accessing and using snapshots


Windows offers a previous versions function to view previous versions of a directory. You can
view the previous versions for a mounted share. Open My Computer, right-click a
SONAS network drive or any subdirectory in the network drive, and select Properties from
the pull-down menu; you will see a screen like that shown in Figure 12-40 on page 466.
Then select a version from the list and choose an action such as View, Copy, or Restore.


Figure 12-40 Viewing the Windows previous versions tab

Note: To access and view snapshots in a NFS share you must export the root directory for
the filesystem as snapshots are stored in a hidden directory called .snapshots in the root
directory.

To view snapshots from a Linux client, connect to the Linux client, mount the file system from
a root export, and list the directories as shown in Example 12-17.

Example 12-17 Mount the filesystem and list the snapshots


[root@tsm001st010 sonas02]# mount -t nfs 10.0.1.121:/ibm/redbook /sonas02/my_redbook

[root@tsm001st010 sonas02]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 14877060 3479016 10630140 25% /
tmpfs 540324 0 540324 0% /dev/shm
10.0.1.121:/ibm/redbook
1048576 155904 892672 15% /sonas02/my_redbook

[root@tsm001st010 export1]# ls -la /sonas02/my_redbook/.snapshots/


total 129
dr-xr-xr-x 5 root root 8192 Apr 22 05:14 .
drwxr-xr-x 4 root root 32768 Apr 22 02:32 ..
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-02.55.37
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-03.06.14
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-03.14.07

12.9 Backing up and restoring data with TSM


We illustrate how to configure TSM to perform backup and restore. All actions are
performed from the SONAS command line because the GUI does not offer TSM configuration
panels. We back up a filesystem called tsm0 that has been configured on the SONAS
appliance called humboldt.storage.tucson.ibm.com and we use the TSM server called
slttsm2.storage.tucson.ibm.com. Our SONAS cluster has 6 interface nodes and we
configure TSM backup to run on the first three nodes.

We start by configuring a new set of TSM client nodes on the TSM server. Connect as an
administrator to the TSM server, define a target node for the tsm0 file system called redhum,
and define three agent nodes, one for each interface node that will run backups, each with a
TSM client name matching the node. We then list the defined clients as shown in Example 12-18.

Note: Each TSM client running on an interface node can start up to 8 parallel sessions to
the TSM server, and we start the TSM client in parallel on 3 of the 6 interface nodes in the
cluster, giving us a total of 24 parallel sessions to the TSM server. As each session to a
sequential TSM storage pool, for example file or tape, requires one mount point, you must
configure the proxy target client with a number of mount points equal to or greater than 24.

Example 12-18 Create TSM client nodes


tsm: SLTTSM2>reg n redhum redhum domain=standard maxnummp=24
ANR2060I Node REDHUM registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM defined for OWNER access to node REDHUM.

tsm: SLTTSM2>reg n redhum1 redhum1


ANR2060I Node REDHUM1 registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM1 defined for OWNER access to node REDHUM1.

tsm: SLTTSM2>reg n redhum2 redhum2


ANR2060I Node REDHUM2 registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM2 defined for OWNER access to node REDHUM2.

tsm: SLTTSM2>reg n redhum3 redhum3


ANR2060I Node REDHUM3 registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM3 defined for OWNER access to node REDHUM3.

tsm: SLTTSM2>q node


Node Name Platform Policy Domain Days Since Days Since Locked?
Name Last Acce- Password
ss Set
------------------------- -------- -------------- ---------- ---------- -------
REDHUM (?) STANDARD <1 <1 No
REDHUM1 (?) STANDARD <1 <1 No
REDHUM2 (?) STANDARD <1 <1 No
REDHUM3 (?) STANDARD <1 <1 No

We now associate the three TSM agent nodes to the redhum target node as shown in
Example 12-19:

Example 12-19 Grant TSM proxy node


tsm: SLTTSM2>grant proxy target=redhum agent=redhum1,redhum2,redhum3
ANR0140I GRANT PROXYNODE: success. Node REDHUM1 is granted proxy authority to node REDHUM.
ANR0140I GRANT PROXYNODE: success. Node REDHUM2 is granted proxy authority to node REDHUM.
ANR0140I GRANT PROXYNODE: success. Node REDHUM3 is granted proxy authority to node REDHUM.

Now connect to the SONAS CLI and define the TSM server configuration information to
the TSM client by using the cfgtsmnode command as shown in Example 12-20 on page 467:

Example 12-20 Configure TSM server to SONAS


[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum1 redhum int001st001 redhum1
EFSSG0150I The tsm node was configured successfully.
[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum2 redhum int002st001 redhum2
EFSSG0150I The tsm node was configured successfully.
[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum3 redhum int003st001 redhum3
EFSSG0150I The tsm node was configured successfully.

You can list the TSM server configuration with the lstsmnode command as shown in Example 12-21:


Example 12-21 List TSM client configuration


[Humboldt.storage.tucson.ibm.com]$ lstsmnode
Node name Virtual node name TSM server alias TSM server name TSM node name
int001st001 redhum slttsm2 9.11.136.30 redhum1
int002st001 redhum slttsm2 9.11.136.30 redhum2
int003st001 redhum slttsm2 9.11.136.30 redhum3
int004st001 server_a node.domain.company.COM
int005st001 server_a node.domain.company.COM
int006st001 server_a node.domain.company.COM

We are now ready to configure the filesystem backup using the cfgbackupfs command as
shown in Example 12-22. After configuring the filesystem backup we list the configured
filesystem backups with the lsbackupfs command.

Example 12-22 Configure and list filesystem backup to TSM


[Humboldt.storage.tucson.ibm.com]$ cfgbackupfs tms0 slttsm2 int002st001,int003st001
EFSSG0143I TSM server-file system association successfully added
EFSSG0019I The task StartBackupTSM has been successfully created.

[Humboldt.storage.tucson.ibm.com]$ lsbackupfs -validate


File system TSM server List of nodes Status Start time End time Message Validation Last update
tms0 slttsm2 int002st001,int003st001 NOT_STARTED N/A N/A Node is OK.,Node is OK. 4/23/10 4:51 PM

To start a backup you use the startbackup command and specify a filesystem as shown in
Example 12-23. You can then check the backup status with the lsbackupfs command.

Example 12-23 Start a TSM backup


[Humboldt.storage.tucson.ibm.com]$ startbackup tms0
EFSSG0300I The filesystem tms0 backup started.

[Humboldt.storage.tucson.ibm.com]$ lsbackupfs
File system TSM server List of nodes Status Start time End time Message
Last update
tms0 slttsm2 int002st001,int003st001 BACKUP_RUNNING 4/23/10 4:55 PM N/A
log:/var/log/cnlog/cnbackup/cnbackup_tms0_20100423165524.log, on host: int002st001 4/23/10 4:55 PM


Chapter 13. Hints, tips and how-to information

This chapter contains information we found useful in the development of this book. It includes
our hands-on experiences as well as useful information from whitepapers, developers, and
implementers of SONAS.


13.1 What to do when you receive an error message

13.1.1 EFSSG0026I management service stopped


When you issue a CLI command such as lsfs and receive a message such as the one shown
in Figure 13-1:

[SONAS]$ lsfs
EFSSG0026I Cannot execute commands because Management Service is stopped. Use
startmgtsrv to restart the service
Figure 13-1 Management service stopped message

You can proceed to restart the management service with the startmgtsrv command as shown
in Figure 13-2:

[SONAS]$ startmgtsrv
EFSSG0007I Start of management service initiated by cliuser1
Figure 13-2 Starting the management service

When the management service is stopped, the GUI will not work and, with a few exceptions,
neither will the CLI. Only the following commands can be used when the management service
is not running:
򐂰 initnode
򐂰 startmgtsrv

13.2 Debugging SONAS with Logs

13.2.1 CTDB Health Check


CTDB gives a fair understanding of the health of the cluster. The cluster is healthy if all the
nodes have a CTDB state of OK. To check the status of CTDB, run the following command at
a command prompt:
# ctdb status

This is a root command and not a CLI command. However, as a CLI user you can also check
the status from the Management GUI by viewing the Interface Node details: click the
Interface Node link under the Clusters category. The status column in the Interface Node
table shows the status of CTDB.

CTDB can be unhealthy for many reasons. Because it monitors the services and GPFS, it
goes to the unhealthy state if any of these services is down or has problems.

If CTDB is unhealthy, check the logs. The Management GUI system logs will give a good
idea of what is wrong.

You can also collect the latest logs from all the nodes by running the command:
#cndump


This command currently needs root access. It collects all the logs from the nodes and
creates a compressed zip file. It takes some time to complete and, when done, shows the
path where the file is stored.

When you uncompress the file, you will see a directory for each node: the Management
Node, each Interface Node, and each Storage Node. Inside each one you will find
directories with log information and more from that node.

13.2.2 GPFS Logs:


For GPFS logs you can look into:
~/var/mmfs/gen/mmfslog

where "~" is the per-node directory collected by the cndump command.

Alternatively, log in to each node and check the path:

/var/mmfs/gen/mmfslog

13.2.3 CTDB Logs:


You can check the CTDB logs on each node at the path:
/var/log/messages

The log on the management node contains the consolidated logs of each node. You can check
an individual Interface or Storage Node by checking the /var/log/messages file on that
node or in the directory collected by cndump for that node.
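
As a simple illustration (the filter string is only an example), recent CTDB-related entries can be extracted from the consolidated log with standard tools:

   grep -i ctdb /var/log/messages | tail -20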

13.2.4 Samba/Winbind Logs:


You can also check the Samba or Winbind logs on each node at the path:
/var/log/messages

The log on the management node contains the consolidated logs of each node. You can check
an individual Interface or Storage Node by checking the /var/log/messages file on that
node or in the directory collected by cndump for that node.

13.3 CTDB Unhealthy


CTDB manages the health of the cluster. It monitors the services and file systems. There are
many reasons why CTDB can go unhealthy; some can be rectified by you as the administrator
of the system. Described below are some of the quick checks that you can make.

13.3.1 CTDB manages Services


Here are some of the configurable parameters for services in CTDB. You can change these
values through the GUI:
CTDB_MANAGES_VSFTPD
CTDB_MANAGES_NFS
CTDB_MANAGES_WINBIND
CTDB_MANAGES_HTTPD
CTDB_MANAGES_SCP
CTDB_MANAGES_SAMBA

By default, these variables are set to “yes”. CTDB manages these services in this case.
Whenever the services are down, CTDB goes unhealthy.

If CTDB has gone unhealthy, check on all the Interface Nodes whether these services are up
and running. If a service is not running, start it.

If you do not want CTDB to monitor a service and prefer to leave that service not running,
you can turn off the corresponding variable. In that case you will not be notified in the future
if the service stops; turn off this variable at your own risk.

Use the GUI to change the parameter. Refer to the administration chapter (point 1.h under
Clusters on page 30) for instructions on how to do this.

13.3.2 Master file system umounted


CTDB can go unhealthy if the master filesystem is down. In that case it can no longer
access the "reclock" file it uses to hold a lock on, and CTDB goes unhealthy. Check
the master filesystem and make sure that it is mounted.

13.3.3 CTDB manages GPFS


If all services are up and running, and the master filesystem is mounted, check the
other GPFS file systems that have exports.

By default, the CTDB variable CTDB_MANAGES_GPFS is set to "yes". This means that if there is any file
system on which exports have been created for users to access, and that file system is not
mounted, CTDB goes unhealthy.

Check for the mounts by running the command,


#mmlsmount all -L

This command is a root command and displays mount information for all the filesystems on
each node.

If you see that a filesystem is unmounted and there are exports created on that
filesystem, mount the filesystem so that CTDB becomes healthy. You can check the export
information by running the command:
#lsexport

More on these CLI commands is in the administration chapter.

If you have some exports that were created only for testing and do not want to keep the
filesystem mounted any more, you may prefer that CTDB not monitor the GPFS filesystems.
However, in this case, whenever any of the filesystems is unmounted, needed or not,
CTDB will not notify you by changing its state. Change this value at your own risk.


13.3.4 GPFS unable to mount


CTDB may go unhealthy if any GPFS filesystem with data exports created on it is not
mounted. As mentioned in 13.3.3, "CTDB manages GPFS", you can try to mount the
filesystem. But if the GPFS filesystem refuses to mount, CTDB will remain
unhealthy. To check why GPFS is not mounting, check the file system
attribute "Is DMAPI enabled?" (the "-z" option). You can run the
following command to check:
#mmlsfs gpfs1 -z
flag value description
---- ---------------- -----------------------------------------------------
-z yes Is DMAPI enabled?

In the above example, consider that filesystem gpfs1 is not mounting. Here, the value of the "-z"
option is set to "yes". In this case the GPFS filesystem is waiting for a DMAPI application and will
only mount when one becomes available. If you do not have any DMAPI applications running and
do not want GPFS to wait on a DMAPI application, you need to set the "-z" option to "no".

This value is set to "yes" by default, so DMAPI is enabled. Remember to create the filesystem
with the "--nodmapi" option when you create it using the CLI command mkfs if you do not
want DMAPI enabled.

If it is already set to "yes" you can use the mmchfs command to change the value of the "-z" option.
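
A minimal sketch of the change, assuming the filesystem is named gpfs1 and is unmounted on all nodes; mmchfs and mmlsfs are GPFS root commands, so use them only as appropriate for your SONAS level:

   # mmchfs gpfs1 -z no
   # mmlsfs gpfs1 -z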


Appendix A. Additional component detail


In this appendix we provide more details on the common components, terms, concepts and
products that are referred to throughout this book.


CTDB
Introduction to Samba
Samba is software that can be run on a platform other than Microsoft Windows, for example,
UNIX, Linux, IBM System 390, OpenVMS, and other operating systems. Samba uses the
TCP/IP protocol that is installed on the host server. When correctly configured, it allows that
host to interact with a Microsoft Windows client or server as if it is a Windows file and print
server.

Thus, on a single Linux NAS server, Samba provides the mechanism to access data from a
Windows client. Samba stores its information in small databases called TDBs (trivial databases).

Each TDB holds metadata for the mapping between POSIX and CIFS semantics and vice versa.
The local TDB files also contain the messaging and locking details for files and information about
open files that are accessed by many clients. All of this applies to a single NAS server.

For a clustered file system like SONAS, which provides a clustered NAS environment that allows
many clients to access data from multiple nodes in the cluster, this becomes a bit tricky. The
Samba process running on one node does not know about the locking information held by the
Samba processes running locally on the other nodes.

Taking an example, say a file, file1, has been accessed through two different nodes by two
clients. These two nodes do not know about each other's locks and hence do not know that both
of them have accessed the same file. In this case, if both nodes write to the file, the file
content may be corrupted, or only the last save may be stored.

Traditionally there was no way to coordinate the Samba processes (smbd) that run on different
nodes. To have consistency in data access and writes, there must be a way in which the Samba
processes (smbd) running on each node can communicate with each other and share
information to avoid shared data corruption.

Cluster Implementation Requirements


A clustered file server ideally has the following properties:
򐂰 All clients can connect to any server which appears as a single large system.
򐂰 Distributed filesystem and all servers can serve out the same set of files.
򐂰 Provide Data integrity.
򐂰 A server can fail and clients are transparently reconnected to another server.
򐂰 All file changes are immediately seen on all servers.
򐂰 Minimise the latency of any checks that might require cross-cluster communication.
򐂰 Ability to scale by adding more servers/disk backend.

Clustered TDB
To overcome the shortcomings of traditional Samba and to provide a clustered file server
efficiently, the clustered TDB (CTDB) was implemented. CTDB takes a shared-TDB
approach to distributing locking state: all cluster nodes access the same
TDB files. CTDB provides the same types of functions as TDB, but in a clustered fashion,
providing a TDB-style database that spans multiple physical hosts in a cluster. The clustered
file system takes care of ensuring that the TDB contents are consistent across the cluster. This
implementation includes extensive modifications to Samba internal data representations to make
the information stored in the various TDBs node-independent.


CTDB also provides a failover mechanism to ensure that data access is not lost if a node goes down
while serving data. It does this with the use of virtual IP addresses, or public IP addresses,
as explained in detail later. Figure 13-3 below shows the CTDB implementation.

Figure 13-3 CTDB implementation.

As you can see, the cluster presents a single virtual server that encloses all the nodes: the Samba
processes on each node talk to each other and keep each other updated about the locking and other
information each smbd holds.

CTDB architecture
The design is particularly aimed at the temporary databases in Samba, which are the
databases that get wiped and re-created each time Samba is started. The most important of
those databases are 'brlock.tdb', the byte-range locking database, and 'locking.tdb', the
open-file database. There are a number of other databases that fall into this class, such as
'connections.tdb' and 'sessionid.tdb', but they are of less concern because they are accessed much
less frequently.

Samba also uses a number of persistent databases, such as the password database, which
must be handled in a different manner from the temporary databases.

Some of the databases that CTDB uses are listed below:


򐂰 account_policy.tdb: NT account policy settings, such as password expiration.
򐂰 brlock.tdb: Byte range locks.
򐂰 connections.tdb: Share connections. Used to enforce max connections, etc.
򐂰 gencache.tdb: Generic caching database.


򐂰 group_mapping.tdb: Stores group mapping information. Not used when using LDAP
backend.
򐂰 locking.tdb: Stores share mode and oplock information.
򐂰 registry.tdb: Windows registry skeleton (connect via regedit.exe).
򐂰 sessionid.tdb: Session information to support utmp = yes capabilities.

We mentioned above that the clustered TDB is a shared TDB and that all nodes access the same
TDB files. This means that these databases are shared by all nodes, so each node can access
and update the records, and that the databases must be stored in the shared file system, in
this case GPFS. With a naive implementation, each time a record is updated the smbd daemon
on a node would have to update the shared database and write to the shared disks. Because the
shared disks are accessed over the network, this would be very slow and become a major bottleneck.

To make this faster, each node of the cluster runs the CTDB daemon ctdbd and has a
local, old-style TDB stored in a fast local file system. The daemons negotiate only the metadata
for the TDBs over the network. The actual data reads and writes always happen on the local
copy. Ideally this file system is in memory, such as on a small ramdisk, but a fast local
disk also suffices if that is more administratively convenient. This makes the read/write
approach very fast. The contents of this database on each node are a subset of the
records in the CTDB (clustered TDB).

However, for persistent databases the behavior is different: when a node wants to write to a persistent
CTDB database, it locks the whole database with a transaction, performs its reads and writes,
commits, and finally distributes the changes to all the nodes as well as writing locally. This way the
persistent database stays consistent.

A CTDB record is based on the standard TDB record structure:

typedef struct {
   char *dptr;
   size_t dsize;
} TDB_DATA;

TDB_DATA key, data;

All ctdb operations are finally converted into operations based on these tdb records.

Each of these records is augmented with an additional header. The header contains the
following information:
uint64 RSN (record sequence number)
uint32 DMASTER (VNN of data master)
uint32 LACCESSOR (VNN of last accessor)
uint32 LACOUNT (last accessor count)


Figure 13-4

RSN: Record Sequence Number

The RSN is used to identify which of the nodes in the cluster has the most recent copy of a
particular record in the database during a recovery after one or more nodes have died. It is
incremented whenever a record is updated in a local TDB by the DMASTER node.

DMASTER: Data Master

The DMASTER is the virtual node number of the node that 'owns the data' for a particular
record. It is only authoritative on the node which has the highest RSN for a particular record.
On other nodes it can be considered a hint only.

The node that has the highest RSN for a particular record also has its VNN (virtual node
number) equal to the local DMASTER field of the record, and no other node has its
VNN equal to the DMASTER field. This allows a node to verify that it is the owner of a
particular record by comparing its local copy of the DMASTER field with its VNN. If and only if
they are equal does it know that it is the current owner of that record.

LACCESSOR


The LACCESSOR field holds the VNN of the last node to request a copy of the record. It is
mainly used to determine whether the current data master should hand over ownership of this record
to another node.

LACCOUNT

The LACOUNT field holds a count of the number of consecutive requests by that node.

LMASTER: Location Master

In addition to the above, each record is also associated with an LMASTER (location master).
This is the VNN of the node that is consulted when another node wishes to
contact the current DMASTER for a record. The LMASTER for a particular record is
determined solely by the number of virtual nodes in the cluster and the key of the record.

RECOVERY MASTER

When a node fails, CTDB performs a process called recovery to re-establish a proper state.
The recovery is carried out by the node that holds the role of the RECOVERY MASTER;
it collects the most recent copy of all records from the other nodes.

Only one node can become the RECOVERY MASTER, and this is determined by an election
process. This process involves a lock file, called the recovery lock or "reclock", that is placed
in the MASTER file system of the clustered file system. At the end of the election, the newly
nominated recovery master holds a lock on the recovery lock file. The RECOVERY
MASTER node is also responsible for monitoring the consistency of the cluster and for performing
the actual recovery process when required.

You can check for the reclock path by using the command: (In this example, /ibm/gpfs0 is the
MASTER filesystem)
# ctdb getreclock
Example output:
Reclock file:/ibm/gpfs0/.ctdb/shared

How CTDB works to synchronize access to data


The key insight is that a node does not need to know all records of a database. Most of the
time, it is sufficient for a node to have an up-to-date copy of the records that affect its own
client connections. Even more importantly, when a node goes down, it is acceptable to lose
the data that relates only to the client connections on that node. Therefore, for a normal
(non-persistent) TDB, a node only has in its local TDB those records that it has already accessed. Data is not
automatically propagated to other nodes; it is transferred only upon request.

When a node in the cluster wants to update a record, the ctdb daemon tries to find out which node is
the DMASTER for that record, that is, the node that owns the record. To get the VNN of the
DMASTER, it contacts the LMASTER, which replies with the VNN of the DMASTER. The
requesting node then contacts that DMASTER, but must be prepared to receive a further
redirect, because the value of the DMASTER held by the LMASTER could have changed by
the time the node sends its message.

This step of returning a DMASTER reply from the LMASTER is skipped when the LMASTER
also happens to be the DMASTER for a record. In that case the LMASTER can send a reply
to the requester's query directly, skipping the redirect stage.


The dispatcher daemon listens for CTDB protocol requests from other nodes, and from the
local smbd via a UNIX domain datagram socket. The dispatcher daemon follows an event-driven
approach, executing operations asynchronously.

Figure 13-5 on page 481 below illustrates the sequence described above for the case where the DMASTER
is the same node as the LMASTER. Figure 13-6 on page 481 illustrates the case where the
DMASTER has changed and a further request is made to get the VNN of the new DMASTER.

Figure 13-5 Fetching sequence for CTDB and contacting DMASTER as directed by LMASTER

Figure 13-6 Fetching when DMASTER changes.


Figure 13-7 on page 482 also shows the working of the dispatcher daemon. When
a node wants to read or write data, it gets the VNN of the current DMASTER for the record.
It then contacts the dispatcher daemon on the node corresponding to that VNN, which is listening for
CTDB requests from other nodes, gets the updated copy of the record onto its own node, and
updates it locally.

Figure 13-7

At the time of a node failure, the LMASTER provides the VNN of the node that last updated the record.
If the node that holds the latest information is the one that failed, it is acceptable to lose this
information because it is only connection information for files. For persistent databases,
up-to-date information is always available on all nodes.

Providing High Availability for Node Failure


It is essential for a file system to provide high availability so that data access by
end users is not disrupted by a node failure. Many applications, such as banking,
security, and satellite applications, need continuous, real-time access to data and cannot
afford any interruptions. Hence, high availability is absolutely necessary.

Broadly, there are two kinds of systems: Active-Passive systems and Active-Active systems.

Active-Passive Failover System: In these systems, one of the servers is idle and acts as the
backup server. All data access happens through the active server. When the active server
goes down, the backup server is brought up and starts to service requests. In this case, the
data connections break, because there is a time lag between the active server going down and the
backup server coming up. This is how a traditional NAS system works. Figure 13-8 below
shows an active-passive failover system where Node2 is idle while Node1 is active
and servicing requests. When Node1 fails, Node2 becomes active and services the requests.


Figure 13-8 Active-Passive failover systems

Active-Active Systems: In these systems, all the nodes in the cluster are active. When a
node fails, the other nodes take over: the service requests of the failed node are transferred
to the other nodes, which immediately start servicing them. The application may see a slight
pause in data transfer, but as long as the application can handle a failed TCP connection and
start again, the data transfer does not fail and continues uninterrupted. Figure 13-9 below
shows an active-active failover system: when Node1 fails, all of its
requests are passed on to Node2, which is always active. This is transparent to the users,
and hence data transfer does not stop as long as the applications can fail over a TCP
connection.

Figure 13-9 Active-Active Failover system.

With the help of CTDB, SONAS provides this node failover capability.

CTDB features:
CTDB uniquely identifies each of the nodes in the cluster by its virtual node number (VNN). It
maps the physical addresses to the VNNs.

CTDB works with two IP networks. The first is the internal network on InfiniBand used for CTDB
communication between the nodes; this is the same as the cluster's internal network for
communication between the nodes. The second is the set of public addresses through which
the clients access the nodes for data.

You can check for the public IPs set for the nodes by running the following command on each
node:
# ctdb ip
Example output:


Number of addresses:4
12.1.1.1 0
12.1.1.2 1
12.1.1.3 2
12.1.1.4 3

The configuration of CTDB is stored in /etc/sysconfig/ctdb on all nodes. The list of all the
IP addresses of the nodes in the CTDB cluster is stored in the /etc/ctdb/nodes file; these are
the private IP addresses of the nodes. The public addresses of the clustered system are stored
in the /etc/ctdb/public_addresses file. These addresses are not physically attached to a
specific node; they are managed by CTDB and are attached to and detached from a physical
node at runtime. Each node specifies the public addresses that it can service in its
/etc/ctdb/public_addresses file.
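
As an illustration only (the interface name eth0 and the address values are assumptions), on a four-node cluster the two files might look like this:

/etc/ctdb/nodes (one private address per node):
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4

/etc/ctdb/public_addresses (address/prefix and the interface that can host it):
12.1.1.1/24 eth0
12.1.1.2/24 eth0
12.1.1.3/24 eth0
12.1.1.4/24 eth0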

For example, if a cluster has six nodes and six IP addresses, each node should specify all of
the six IP addresses in order to be able to service any one of them at any point in time in case of a
failure. If a certain IP address is not listed, that IP address will not be serviced by this node. Hence,
it is good practice to specify all the public IP addresses on each node, so that each node can
take over any IP address if required.

Even though a node has all the public IP addresses specified, CTDB assigns each node a unique set of
addresses to service. For example, if we have a six-node cluster and six public IP
addresses, each node is able to hold any of the six IP addresses, but CTDB assigns the
addresses such that, at any point in time, a given IP address is serviced by only one node.
As another example, consider a six-node cluster with twelve IP addresses. In this case, each
node may take any of the twelve IP addresses, but CTDB assigns each node two addresses
to service, and each address is served by only one node at a time.

CTDB uses a round-robin scheme to assign IP addresses to the nodes. CTDB builds a table of all the
VNNs and maps each VNN to an IP address in a round-robin way.

When a node fails, CTDB rebuilds this mapping table. It considers all the nodes, whether or not
they are down, and again assigns IP addresses to each node in a round-robin way. Once this is
done, CTDB picks up the IP addresses that were assigned to the failed node, counts the number
of IP addresses each surviving node is servicing, and redistributes those addresses to the nodes
that are servicing the fewest. If all counts are equal, it falls back to the round-robin mechanism.

CTDB Node Recovery Mechanism


The following are, broadly, the steps CTDB performs during a node recovery:
򐂰 Freeze the cluster.
򐂰 Verify database state.
򐂰 Pull and merge records based on RSN.
򐂰 Push updated records.
򐂰 Cleanup databases.
򐂰 Assign IP takeover nodes.
򐂰 Build and distribute new LMASTER mapping.
򐂰 Create and distribute new GENERATION NUMBER.
򐂰 Set recovery node as the DMASTER for all the records.


IP Failover Mechanism:
When a node leaves the cluster, CTDB moves its public IP addresses to other nodes that
have those addresses listed in their public address pool. But now the clients connected to that
node have to reconnect to the cluster. In order to reduce the delays that come with
these IP switches to a minimum, CTDB makes use of a clever trick called tickle-ACK.

It works as follows: the client does not know that the IP address it is connected to has moved, while
the new CTDB node only knows that the TCP connection has become invalid but does not know
the TCP sequence number. So the new CTDB node sends an invalid TCP packet with the
sequence and ACK numbers set to zero. This "tickles" the client into sending a valid ACK packet
back to the new node. Now CTDB can validly close the connection by sending an RST packet,
forcing the client to re-establish the connection.

CTDB Manages the Cluster


CTDB manages the cluster by monitoring the services and the health of the cluster. CTDB
has configurable parameters such as CTDB_MANAGES_GPFS, CTDB_MANAGES_FTP,
CTDB_MANAGES_NFS, and more, which, when set to true, cause CTDB to manage those services.

By default these variables are set to "yes". CTDB then manages these services, and if a service
on any node is down, CTDB on that node goes into the unhealthy state. If you set a variable
to "no", CTDB no longer manages that service and remains healthy even if the service is down.
If you do not want a service to be monitored, set its variable to "no".

In the case of the SONAS appliance, the Management GUI provides a mechanism to modify
these configurable parameters. You can find all the configurable parameters in the
/etc/sysconfig/ctdb file.
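
As an illustration only, an excerpt from /etc/sysconfig/ctdb might contain entries such as the following (the exact variable set and values on a SONAS node can differ):

CTDB_RECOVERY_LOCK=/ibm/gpfs0/.ctdb/shared
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_NFS=yes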

The ctdb status command displays the status of each node. The node status reflects the current
state of the node. There are six possible states:
OK - This node is fully functional.
DISCONNECTED - This node could not be reached through the network and is
currently not participating in the cluster. If there is a public IP address associated with
this node, it should have been taken over by a different node. No services are running
on this node.
DISABLED - This node has been administratively disabled. The node is still functional
and participates in the CTDB cluster, but its IP addresses have been taken over by a
different node and no services are currently being hosted.
UNHEALTHY - A service provided by this node is malfunctioning and should be
investigated. The CTDB daemon itself is operational and participates in the cluster. The
node's public IP address has been taken over by a different node and no services are
currently being hosted. All unhealthy nodes should be investigated and require
administrative action to rectify.
BANNED - This node failed too many recovery attempts and has been banned from
participating in the cluster for a period of RecoveryBanPeriod seconds. Any public IP
addresses have been taken over by other nodes. This node does not provide any services.
All banned nodes should be investigated and require administrative action to rectify.
The node does not participate in the CTDB cluster, but can still be communicated with;
that is, ctdb commands can still be sent to it.
STOPPED - A node that is stopped does not host any public IP addresses, nor is it part
of the VNNMAP. A stopped node cannot become LVSMASTER, RECMASTER, or
NATGW. The node does not participate in the CTDB cluster, but can still be
communicated with; that is, ctdb commands can still be sent to it.


You can check the status using the command:


# ctdb status

CTDB Tunables

CTDB has many tunable variables that can be modified; however, they rarely need to be
changed.

You can check the variables by running the command:


# ctdb listvars
Example output:
MaxRedirectCount = 3
SeqnumInterval = 1000
ControlTimeout = 60
TraverseTimeout = 20
KeepaliveInterval = 5
KeepaliveLimit = 5
MaxLACount = 7
RecoverTimeout = 20
RecoverInterval = 1
ElectionTimeout = 3
TakeoverTimeout = 5
MonitorInterval = 15
TickleUpdateInterval = 20
EventScriptTimeout = 30
EventScriptBanCount = 10
EventScriptUnhealthyOnTimeout = 0
RecoveryGracePeriod = 120
RecoveryBanPeriod = 300
DatabaseHashSize = 10000
DatabaseMaxDead = 5
RerecoveryTimeout = 10
EnableBans = 1
DeterministicIPs = 1
DisableWhenUnhealthy = 0
ReclockPingPeriod = 60

CTDB Databases
You can list all clustered TDB databases that the CTDB daemon has attached to. Some
databases are flagged as PERSISTENT, which means that the database stores data
persistently and the data will remain across reboots. One example of such a database is
secrets.tdb, where information about how the cluster was joined to the domain is stored.


You can check the databases available by running the command:


# ctdb getdbmap
Example output:
Number of databases:10
dbid:0x435d3410 name:notify.tdb path:/var/ctdb/notify.tdb.0
dbid:0x42fe72c5 name:locking.tdb path:/var/ctdb/locking.tdb.0
dbid:0x1421fb78 name:brlock.tdb path:/var/ctdb/brlock.tdb.0
dbid:0x17055d90 name:connections.tdb path:/var/ctdb/connections.tdb.0
dbid:0xc0bdde6a name:sessionid.tdb path:/var/ctdb/sessionid.tdb.0
dbid:0x122224da name:test.tdb path:/var/ctdb/test.tdb.0
dbid:0x2672a57f name:idmap2.tdb path:/var/ctdb/persistent/idmap2.tdb.0 PERSISTENT
dbid:0xb775fff6 name:secrets.tdb path:/var/ctdb/persistent/secrets.tdb.0 PERSISTENT
dbid:0xe98e08b6 name:group_mapping.tdb path:/var/ctdb/persistent/group_mapping.tdb.0 PERSISTENT
dbid:0x7bbbd26c name:passdb.tdb path:/var/ctdb/persistent/passdb.tdb.0 PERSISTENT

You can also check the details of a database by running the command:

# ctdb getdbstatus <dbname>

Example: ctdb getdbstatus test.tdb.0

Example output:
dbid: 0x122224da
name: test.tdb
path: /var/ctdb/test.tdb.0
PERSISTENT: no
HEALTH: OK

You can get more information about CTDB by viewing the manual page for ctdb as follows:
# man ctdb

File system concepts and access permissions


File systems are a way of organizing and storing files, where files are named sequences of
bytes. A file system arranges the named files into a structure, such as a UNIX tree hierarchy, to
facilitate location, access, and retrieval of individual files by the operating system. File
systems generally store data on an underlying storage device, such as disk or tape, in blocks
or clusters of a defined size. Files are named, and the name is used by the users to locate and
access the files. Files can be organized in directory structures with subdirectories to facilitate
the organization of data.

Other than the actual file data, a file has associated metadata containing attributes
such as the last update time, the type (for example, file or directory), and attributes that control
access, such as the owning user and group and the access permissions or access rights that
determine what use can be made of the file, for example whether it can be executed or only read.
File systems offer functions to create, access, move, and delete files and directories. File systems
can also offer hierarchies between storage devices and quota mechanisms to control the amount of
space used by users and groups.

Permissions and access control lists


The implementation of permissions and access rights differs between file systems. Unix and
POSIX file systems support traditional unix permissions and also generally support POSIX.1e
or NFSv4 access control lists.

Traditional UNIX permissions


Permissions or access rights control the access of users to files in a file system. In UNIX file
systems, permissions are grouped into three classes: user, group, and other. Each file in a
file system is owned by a user; this user, or owner, defines the file's owner class. Each file is
also assigned a group, which defines the group class and can have different permissions
from the owner. The owner need not be a member of the file's group and can belong to a
different group.

There are three types of permissions for each class:

read		Permits reading a file. When set on a directory, it allows listing the
		contents of the directory, but not reading the contents or attributes of
		the individual files.
write		Permits writing to and modifying a file. When set on a directory, it
		allows creating, deleting, and renaming files in that directory.
execute	Permits the user to run an executable file. When set on a directory, it
		allows accessing (traversing) the files and subdirectories in it, but does
		not allow listing them.

If a permission is not set, the access it would allow is denied. Permissions are not inherited
from the upper level directory.
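
As a simple illustration (the path, user, and group names are hypothetical), the following commands set and display traditional UNIX permissions, giving the owner read, write, and execute, the group read and execute, and other users no access:

# chmod 750 /ibm/gpfs0/projects
# ls -ld /ibm/gpfs0/projects
drwxr-x--- 2 user1 team1 4096 Nov  1 09:32 /ibm/gpfs0/projects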

Access control lists


An access control list (ACL) is a list of permissions associated with a given file. The ACL
controls which users and groups in the system are allowed to access a given file. Each entry
in an access control list is called an access control entry (ACE), and the ACE contains two
parts: a user or group that is the subject of the authorization operation, and an operation that
can be performed on the file, such as execute or delete.

Windows uses an ACL model that differs considerably from the POSIX model, and mapping
techniques between the two are not completely satisfactory, so mapping between Windows
and POSIX ACLs should be avoided if possible. NFSv4 introduces an ACL model that is similar
to the Windows ACL model and so simplifies mapping between the two models. IBM GPFS
and SONAS implement NFSv4 ACLs natively.
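
As an illustration only (the file name is hypothetical), GPFS provides the mmgetacl and mmputacl commands to display and change the ACL of a file:

# mmgetacl /ibm/gpfs0/projects/report.doc
# mmgetacl -o /tmp/acl.txt /ibm/gpfs0/projects/report.doc
# mmputacl -i /tmp/acl.txt /ibm/gpfs0/projects/report.doc

The first command displays the ACL, the second saves it to a file that can be edited, and the third applies the edited ACL back to the file.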

Permissions and ACLs in Windows operating systems


There have been two main progressions of file and folder permissions on Windows operating
systems: DOS attributes and NTFS security.

DOS attributes:
There are four DOS attributes which can be assigned to files and folders.
Read Only File cannot be written to
Archive File has been touched since the last backup

System		File is used by the operating system
Hidden		File is relatively invisible to the user

These attributes apply to the FAT, FAT32, and NTFS file systems.

NTFS security
There are 13 basic permissions, which are rolled up into six permission groups. These apply
only to the NTFS file system, not to FAT or FAT32. The six permission groups are:
Full Control Allow all 13 basic permissions
Modify Allow all permissions except Delete subfolders and files, Change
permission, and Take ownership
Read Allow List folder/Read data, Read attributes, Read extended attributes
and Read permissions
Write Allow Create files/Append data, Write attributes, Write extended
attributes, Delete subfolders and files, and Read Permissions
Read and execute Allow all that the Read permission group allows plus Traverse
Folder/Execute File
List folder contents This is for folders only, not files. It is the same as Read and Execute
for files

The 13 basic permissions are the following; some of them differ depending on whether they
apply to folders or files:
򐂰 Traverse folders (for folders only)/Execute file (for files only)
򐂰 List folder/Read data
򐂰 Read attributes
򐂰 Read extended attributes
򐂰 Create files/Append data
򐂰 Write attributes
򐂰 Write extended attributes
򐂰 Delete subfolders and files
򐂰 Delete
򐂰 Read permissions
򐂰 Change permissions
򐂰 Take ownership

To view the permission groups, right-click on any file or folder in Windows explorer, choose
the Properties menu item and then choose the Security tab. More information on Windows file
and folder permissions is available on the Microsoft technet site at:
http://technet.microsoft.com/en-us/library/bb727008.aspx

GPFS overview
Smart Storage Management with IBM General Parallel File System.

Enterprise file data is often touched by multiple processes, applications, and users throughout
the lifecycle of the data. Managing the data workflow is often the highest-cost part of storage
processing and management, in terms of both processing and people time. In the past,
companies have addressed this challenge using different approaches, including clustered
servers and network attached storage. Clustered servers are typically limited in scalability
and often require redundant copies of data. Traditional network attached storage solutions
are restricted in performance, security, and scalability. To effectively address these issues,
you need to look at a new, more effective data management approach. Figure A-1 describes a
typical infrastructure with unstructured data; this is a data storage approach, but not data
management.

Figure A-1 Unstructured data

As shown in Figure A-2 on page 491, GPFS provides a real data management solution thanks to
the following capabilities:
򐂰 File Management.
򐂰 Performance.
򐂰 Enhanced availability.
򐂰 Better automation.
򐂰 Scale-out Growth.

Because GPFS allows you to bring together islands of information, redundant data, and
underutilized segments of storage, it provides a strong file management solution. GPFS is also
a solution that is able to scale and integrate emerging technologies, providing both
performance and protection for your storage investment. By design, GPFS is an
enhanced-availability solution, ensuring data consistency through various mechanisms. These
mechanisms can also be easily automated thanks to the powerful ILM tools integrated into
GPFS.


Figure A-2 Structured data

To deliver the above capabilities, GPFS provides a single global namespace with centralized
management. This allows better storage utilization and better performance for varied workloads,
as described in Figure A-3 on page 492. Database, archive, and application
workloads can all use the single global namespace provided by GPFS. GPFS automatically
handles all your storage subsystems, ensuring homogeneous storage utilization.


Figure A-3 GPFS features

GPFS Architecture
Figure A-4 on page 493 describes the GPFS architecture. A typical GPFS deployment runs
your daily business applications on NSD clients (or GPFS clients). These clients access the
same global namespace through a LAN. Data accessed by the clients is then transferred to
the NSD servers (or GPFS servers) over the LAN. NSD clients and NSD servers are gathered
into a GPFS cluster.
򐂰 The latest GPFS version (3.3) supports AIX, Linux, and Windows as NSD clients or NSD
servers. These operating systems can run on many IBM and even non-IBM hardware platforms.
򐂰 For the LAN, GPFS can use Gigabit Ethernet as well as 10 Gigabit Ethernet or InfiniBand
networks.

The servers then commit I/O operations to the storage subsystems where the LUNs are physically
located. From the GPFS point of view, a LUN is an NSD.
򐂰 GPFS supports a variety of storage subsystems, both IBM and non-IBM. Because the
IBM SAN Volume Controller solution is also supported by GPFS, many storage
subsystem solutions are de facto compatible with GPFS.

To find more details about the supported software and hardware versions, refer to the
following link:

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html

Before describing GPFS features and mechanisms in more detail, it is important to note that,
like any file system, GPFS handles both data and metadata. Although there are two kinds of
nodes (clients and servers) inside a single GPFS cluster, there are no nodes dedicated to
metadata management. As with most mechanisms inside GPFS, every node in the cluster can
take part in metadata handling, and some GPFS mechanisms, such as file system scanning,
run in parallel on all nodes of the cluster.

Figure A-4 GPFS Architecture
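
As an illustration only, the following GPFS commands can be run on a cluster node to inspect the configuration just described; the output depends on the actual cluster:

# mmlscluster        (list the cluster nodes and their roles)
# mmlsnsd            (list the NSDs and the servers that provide them)
# mmlsfs all         (list the attributes of all GPFS file systems)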

GPFS File Management


Here we describe some of the GPFS mechanisms that provide strong file management. As
described above, inside a single GPFS cluster you have NSD clients and NSD servers. The
main difference between these roles is that NSD servers are physically connected to the
storage subsystems where the LUNs are located. From any node inside the GPFS cluster you
can create one or many GPFS file systems on top of these NSDs. During NSD creation, the
operation that allocates LUNs to the GPFS layer, you can choose to host both data and
metadata on the same NSD, or split them and use some LUNs to host metadata only and
others to host data only. Because GPFS is a scalable solution, you will probably attach more
than a single storage subsystem to your SAN and therefore to your NSD servers. Given
different drive technologies, you can choose, for example, to host metadata on SAS or SSD
drives and data on SATA drives, in order to increase metadata performance.

Still inside the same GPFS file system, you can decide to create different failure groups. As
with the metadata/data split, you can specify the failure group during NSD creation or
change it later. Then, during file system creation, you can tell GPFS that you want
replication of your metadata and/or data; this can also be changed after creation. GPFS
then automatically replicates your metadata and/or data across your failure groups. Obviously,
because this is replication, you need twice the capacity to replicate all data.

Another option is to create storage pools. Still assuming that you have several storage
subsystems in your SAN, accessed by your NSD servers, you can decide to create multiple
storage pools: one SATA storage pool, one SAS storage pool, or a tape storage pool. In the
same way, you can also decide to create an IBM DS8300 storage pool and an IBM DS5300
storage pool, or even an IBM storage pool and a third-party storage pool. Here again you can
decide this during NSD creation, or change it later. You can then use these storage pools for
different workloads: for instance, use the SATA pool for multimedia files, the SAS pool for
financial workloads, and the tape pool for archive. Or you can use the SAS pool for daily
business, then move files to the SATA pool at the end of the week, and later to the tape pool.
Whereas failure groups are handled automatically by GPFS, the storage pool mechanism
needs rules in order to be automated by GPFS. With the Information Lifecycle Management
(ILM) tool provided by GPFS, you can create rules that become part of a policy. Basic rules
are placement rules (for example, place multimedia files in the SATA pool and financial
workload files in the SAS pool) and migration rules (for example, move data from the SAS
pool to the SATA pool at the end of the week, and move data from the SATA pool to the tape
pool at the end of the month). These rules can be gathered into GPFS policies, which can
then be automated and scheduled.
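
As an illustration only (the pool names sata and sas, the file patterns, and the seven-day threshold are assumptions), a simple policy file combining placement rules and a migration rule might look like the following:

RULE 'multimedia' SET POOL 'sata' WHERE LOWER(NAME) LIKE '%.mp4' OR LOWER(NAME) LIKE '%.avi'
RULE 'default' SET POOL 'sas'
RULE 'weekly' MIGRATE FROM POOL 'sas' TO POOL 'sata'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) > 7

Such a policy file is typically applied with the mmapplypolicy command, for example: mmapplypolicy gpfs0 -P policy.rules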

You can also use more complex rules and policies to run a command at any given time on the
entire GPFS file system or on a subset of files, for example, delete all files older than two years,
or move all files from the projectA directory to tape. To compare with classical UNIX
commands, a migration rule is similar to a mv command, whereas the last example is more
like a find command combined with an exec.

The ILM tool can be used for each GPFS file system, but you can also create GPFS
filesets inside a single GPFS file system for finer granularity, and then apply policy or
quota rules to these filesets, which are basically directories or GPFS subtrees.

GPFS Performance
GPFS is not only a centralized management solution providing a global namespace; GPFS
has been designed to scale according to your capacity needs and also to provide aggregate
bandwidth if set up appropriately. As explained above, a typical use of GPFS is to run daily
business applications on NSD clients, which access data through the network. Note that,
depending on your requirements, you may have only NSD servers and no NSD clients; with
such a configuration you run your applications directly on the NSD servers, which also have
access to the global namespace. Assuming you have NSD clients running your applications
on the GPFS file system, that file system was created with a key parameter: the GPFS block
size. In a few words, the GPFS block size plays a role similar to the chunk size or segment
size of a RAID controller. The block size can be set from 16 KB to 4 MB.
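
As an illustration only (the mount point, device name, and descriptor file are assumptions, and the exact mmcrfs syntax varies between GPFS releases), a file system with a 1 MB block size could be created and verified as follows:

# mmcrfs /ibm/gpfs1 gpfs1 -F /tmp/nsd.desc -B 1M
# mmlsfs gpfs1 -B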

Assume a GPFS cluster with some NSD clients, four NSD servers, and one storage
subsystem with four RAID arrays. The GPFS file system has been created with a 1 MB block
size. On the storage subsystem, all four arrays are configured as RAID 5 with a 256 KB
segment size. Your application runs on the NSD clients and generates a 4 MB I/O. The 4 MB
is sent through the network in 1 MB pieces to the NSD servers. The NSD servers then forward
the 1 MB packets to the storage subsystem controller, which splits them into 256 KB pieces
(the segment size). This leads to a single 4 MB I/O being written in a single I/O operation at
the disk level, as described in Figure A-5 on page 495, Figure A-6 on page 495, Figure A-7 on
page 496, and Figure A-8 on page 496.

In the figures below, each NSD is a RAID 5 array built with four data disks and one parity disk.
Performing an I/O operation on an NSD is equivalent to performing I/O operations on the
physical disks inside the RAID array.


Figure A-5 Step 1, application is generating IO

Figure A-6 Step 2 data sent to NSD servers


Figure A-7 Step 3 data sent to Storage Subsystem

Figure A-8 Step 4 GPFS block size chop into segment size piece by the controller

The above figures describe GPFS operation with a few NSDs and a single storage subsystem,
but the behavior is exactly the same for larger configurations, with all NSD clients running
applications on the GPFS file system in parallel. Because GPFS has been designed to scale
with your storage infrastructure, if you add more storage subsystems and NSD servers, you
increase your overall bandwidth.

GPFS High Availability solution


GPFS provides great performance and efficient centralized file management tools, but it is
also a strong high availability solution. On the NSD client side, because GPFS provides a
single namespace, you can access data from any node inside the GPFS cluster, from all NSD
clients and also from the NSD servers. Moreover, if your network layer is fully redundant,
there is no single point of failure on the NSD client side.

On the NSD server side you also have a high availability solution. In the figures described
above, the NSD clients send packets to the NSD servers in parallel; each block is sent to each
NSD in a round-robin way. Each NSD has one NSD server as its primary server and a list of
NSD servers as backups. This means that, from the NSD client's point of view, sending data
to the NSDs in a round-robin way is equivalent to sending data to the primary NSD servers in
a round-robin way. Because each NSD has a list of backups, if a network or Fibre Channel
connection (in the case of a SAN) is broken, or even if an NSD server fails, the NSD client
sends data to the backup NSD servers. You can afford to lose many NSD servers and still
access data on the GPFS file system; depending on your GPFS configuration and architecture,
you can still access data as long as one NSD server is up and running. GPFS mechanisms
ensure your data integrity once all the NSD servers are back and ready to be used again.

On the storage side, only the RAID configuration can guarantee the high availability of your data,
but you can also use GPFS features to replicate your data for better protection, as described in
“GPFS File Management” on page 493. You can even replicate your entire GPFS file system
synchronously between two distant sites.

Like other file system solutions, GPFS also provides a snapshot function, with a maximum of 256
snapshots per file system. These snapshots are read-only and are, by design, instantaneous.
The GPFS snapshot feature uses the copy-on-write method at the block level, which
means that the original version of a block (of GPFS block size) is copied elsewhere before
the new version of the block is updated. The snapshot then points to the new location of the
original blocks in the allocation map.
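
As an illustration only (the device and snapshot names are assumptions), snapshots are created, listed, and deleted with the following GPFS commands:

# mmcrsnapshot gpfs0 snap1
# mmlssnapshot gpfs0
# mmdelsnapshot gpfs0 snap1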

Even though there is no metadata server concept in GPFS, there are nevertheless some key roles
among the GPFS nodes inside the cluster. These roles are required to ensure data integrity.
The GPFS special roles are:
򐂰 The GPFS cluster manager
򐂰 The file system manager
򐂰 The metanode

The GPFS cluster manager is responsible for granting disk leases, detecting disk failures, and
selecting the file system manager, for instance. The file system manager is in charge of
several roles, such as file system configuration (configuration changes, for example),
management of disk space allocation (for efficient parallel allocation of space), token
management (for the file locking mechanism), and quota management (if configured).

The first two roles are unique (one cluster manager per cluster and one file system manager
per file system), whereas there are as many metanodes as there are open files. The metanode
is the node inside the GPFS cluster responsible for the metadata of a given open file.


GPFS failure group


A failure group is a group of disks that have a common point of failure. By default, GPFS
assigns a failure group at the node level for each disk in the GPFS file system, because the
node is seen as a single point of failure. You can assign a number to each failure group.

Another type of failure group, seen from more than a single node's point of view, is a
Virtual Shared Disk that is twin-tailed and available from two nodes over an IBM Recoverable
Shared Disk. Although there are two nodes, this represents a single failure group.

The default number that GPFS assigns to a failure group is the node number plus 4000. If you
decide that a disk should ignore failure group consideration, then you can assign it a value of
-1. This generally indicates that a disk has no point of common failure with any other disk.

You will be able to assign a value from -1 to 4000 for your failure groups.

Refer to 2.2.2.4, “Creating Disk Descriptor Files” on page 46 for examples of failure groups.
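
As an illustration only (the disk names, server names, and failure group numbers are assumptions, and the descriptor format depends on the GPFS release), a disk descriptor file might assign two disks to different failure groups as follows:

/dev/sdb:nsdserver1:nsdserver2:dataAndMetadata:4001:nsd1:system
/dev/sdc:nsdserver2:nsdserver1:dataAndMetadata:4002:nsd2:system

The fifth field is the failure group, so that GPFS can place the replicas of a block in different failure groups.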

Other GPFS features


We have described above the basic uses of GPFS, but there are several other features you can
use. These features range from something as simple as a GUI for management purposes to
something as complex as multi-cluster (cross-cluster) access.

Among these interesting features are:


򐂰 The GUI, which is included in the GPFS packages, if you are more familiar with a GUI than with the CLI.
򐂰 The Clustered NFS (CNFS) feature, which allows you to use some nodes inside the GPFS
cluster as NFS servers, so that nodes outside the GPFS cluster can access the GPFS file
system using the NFS protocol. You can even load balance access across many NFS
servers with an appropriate DNS configuration. Similarly, GPFS also supports NFSv4
ACLs and Samba, and so allows Windows and UNIX users to share data.
򐂰 HSM compatibility; you can use GPFS in combination with HSM for better tiering
inside your storage infrastructure.
򐂰 The cross-cluster feature, which, in the case of multi-site data centers, allows you to grant
NSD clients from a remote GPFS cluster access to the local GPFS file system (and
vice versa).

Documentation
For any more detailed documentation on GPFS refer to the IBM website:

http://www-03.ibm.com/systems/software/gpfs/index.html

or the online GPFS documentation:

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfsbooks.html

or GPFS wiki:

http://www.ibm.com/developerworks/wikis/display/hpccentral/General+Parallel+File+System+%28GPFS%29


Tivoli Storage Manager (TSM) overview


In this section we illustrate the IBM Tivoli Storage Manager (TSM) software product and how
it relates to a SONAS environment. TSM is the cornerstone product on which IBM bases data
protection and storage management. We explain how Tivoli Storage Manager provides an
abstraction or virtualization layer between applications and the underlying storage
infrastructure. We discuss the following topics:
򐂰 Tivoli Storage Manager concepts and architecture
򐂰 TSM backup archive client
򐂰 TSM Hierarchical Storage Management (HSM) client
򐂰 TSM generic NDMP support

For additional information, refer to the following IBM Redbooks publications: IBM Tivoli Storage
Management Concepts, SG24-4877, and Tivoli Storage Manager V6.1 Technical Guide, SG24-7718.

Tivoli Storage Manager concepts


IBM Tivoli Storage Manager provides a comprehensive solution focused on the key data
protection and management activities of backup, archive, recovery, space management, and
disaster recovery. Tivoli Storage Manager allows you to separate the backup, archiving, and
retention of data from storage-related aspects of the data, in addition to many other services.
Tivoli Storage Manager offers many data protection and storage management functions
relevant to SONAS:

Data backup with TSM uses progressive (incremental-forever) backup. Progressive backup for
file systems eliminates the need for repeated, redundant full backups. This allows you to back
up less data each night than with other, traditional open systems products, saving network,
server, and storage resources. Backups finish more quickly, and restores are faster, because
the full + incremental restore paradigm is not required with TSM: less file data needs to be
moved, and TSM always has a full backup available in the TSM storage repository.

Data archiving defines how to insert data into the data retention system. Tivoli Storage
Manager offers a command line interface to archive and back up files and a C language
application programming interface (API) for use by content management applications.

Storage defines on which storage device to put the object. Tivoli Storage Manager supports
hundreds of disk and tape storage devices and integrated hierarchical storage management
of stored data. You can choose the most effective storage device for your requirements and
subsequently let the data automatically migrate to different storage tiers. WORM functionality
is offered by System Storage Archive Manager. The Tivoli Storage Manager administrator
cannot accidentally or intentionally delete objects stored in Tivoli Storage Manager.

Storage management services are provided by Tivoli Storage Manager. These additional
storage management services facilitate hardware replacement and disaster recovery. Tivoli
Storage Manager allows for easy migration to new storage devices when the old storage
devices need replacing, and this will likely happen when data is retained for long periods of
time. Tivoli Storage Manager also offers functions to make multiple copies of archived data.

Tivoli Storage Manager offers a strong and comprehensive set of functions that you can
exploit to effectively manage archived data. You can consider Tivoli Storage Manager an
abstraction or virtualization layer between applications requiring data retention or storage
management services and the underlying storage infrastructure.


Tivoli Storage Manager architectural overview


Tivoli Storage Manager is a client server software application that provides services such as
network backup and archive of data to a central server. There are two main functional
components in a Tivoli Storage Manager environment:
򐂰 You install the Tivoli Storage Manager client component on servers, computers, or
machines that require Tivoli Storage Manager services. The Tivoli Storage Manager client
accesses the data to be backed up or archived and is responsible for sending the data to
the server.
򐂰 The Tivoli Storage Manager server is the central repository for storing and managing the
data received from the Tivoli Storage Manager clients. The server receives the data from
the client over the LAN network, inventories the data in its own database, and stores it on
storage media according to predefined policies. Figure 13-10 illustrates the components of
a Tivoli Storage Manager environment. You can see that the core component is the Tivoli
Storage Manager server.

Figure 13-10 Tivoli Storage Manager components: architectural overview

We review and discuss the main components and functions of a Tivoli Storage Manager
environment, emphasizing the components that are most relevant to an ILM-optimized
environment. These components are:
򐂰 Tivoli Storage Manager server
򐂰 Administrative interfaces
򐂰 The server database
򐂰 Storage media management
򐂰 Data management policies
򐂰 Security concepts
򐂰 Backup Archive client interface
򐂰 Client application programming interface (API)
򐂰 Automation
򐂰 The client to server data path

Tip: For a detailed overview of Tivoli Storage Manager and its complementary products, refer
to the IBM Tivoli software information center at the following location:


http://publib.boulder.ibm.com/infocenter/tivihelp

Tivoli Storage Manager server


The Tivoli Storage Manager server consists of a run-time environment and a DB2 relational
database. You can install the server on several operating systems and on diverse hardware
platforms, covering all popular environments. The DB2 database with its recovery log stores
all the information about the current environment and the managed data. The Tivoli Storage
Manager server listens for and communicates with the client systems over the LAN network.

Administrative interfaces
For the central administration of one or more Tivoli Storage Manager server instances, as
well as the whole data management environment, Tivoli Storage Manager provides command
line or Java™-based graphical administrative interfaces, otherwise known as administration
clients. The administrative interface enables administrators to control and monitor server
activities, define management policies for clients, and set up schedules to provide services to
clients at regular intervals.
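
As an illustration only (the administrator ID and password are hypothetical), the administrative command line client can be used as follows:

# dsmadmc -id=admin -password=secret query status
# dsmadmc -id=admin -password=secret query session

The first command displays the general server status and the second lists the currently active client sessions.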

The server database


The Tivoli Storage Manager server database is based on a standard DB2 database that is
integrated into and installed with the Tivoli Storage Manager server itself. The Tivoli Storage
Manager server DB2 database stores all information relative to the Tivoli Storage Manager
environment, such as the client nodes that access the server, storage devices, and policies.
The Tivoli Storage Manager database contains one entry for each object stored in the Tivoli
Storage Manager server, and the entry contains information, such as:
򐂰 Name of the object
򐂰 Tivoli Storage Manager client that sent the object
򐂰 Policy information or Tivoli Storage Manager management class associated with the object
򐂰 Location where the object is stored in the storage hierarchy

The Tivoli Storage Manager database retains information called metadata, which means data
that describes data. The flexibility of the Tivoli Storage Manager database enables you to
define storage management policies around business needs for individual clients or groups of
clients. You can assign client data attributes, such as the storage destination, number of
versions, and retention period at the individual file level and store them in the database.

The Tivoli Storage Manager database also ensures reliable storage management processes.
To maintain data integrity, the database uses a recovery log to roll back any changes made if
a storage transaction is interrupted before it completes. This is known as a two-phase
commit.

Storage media management


Tivoli Storage Manager performs multiple diverse hierarchy and storage media management
functions by moving or copying data between different pools or tiers of storage, as shown in
Figure 13-11 on page 502:


Figure 13-11 Tivoli Storage Manager management of the storage hierarchy

A Tivoli Storage Manager server can write data to more than 400 types of devices, including
hard disk drives, disk arrays and subsystems, standalone tape drives, tape libraries, and
other forms of random and sequential-access storage. The server uses media grouped into
storage pools. You can connect the storage devices directly to the server through SCSI,
through directly attached Fibre Channel, or over a Storage Area Network (SAN). Tivoli
Storage Manager provides sophisticated media management capabilities that enable IT
managers to perform the following tasks:
򐂰 Track multiple versions of files (including the most recent version)
򐂰 Respond to online file queries and recovery requests
򐂰 Move files automatically to the most cost-effective storage media
򐂰 Expire backup files that are no longer necessary
򐂰 Recycle partially filled volumes

Tivoli Storage Manager provides these capabilities for all backup volumes, including on-site
volumes inside tape libraries, volumes that have been checked out of tape libraries, and
on-site and off-site copies of the backups.

Tivoli Storage Manager provides a powerful media management facility to create multiple
copies of all client data stored on the Tivoli Storage Manager server. Enterprises can use this
facility to back up primary client data to two copy pools: One stored in an off-site location, and
the other kept on-site for possible recovery from media failures. If a file in a primary pool is
damaged or resides on a damaged volume, Tivoli Storage Manager automatically accesses
the file from an on-site copy if it is available or indicates which volume needs to be returned
from an off-site copy.

Tivoli Storage Manager also provides a unique capability for reclaiming expired space on
off-site volumes without requiring the off-site volumes to be brought back on-site. Tivoli
Storage Manager tracks the utilization of off-site volumes just as it does for on-site volumes.


When the free space of off-site volumes reaches a determined reclamation threshold, Tivoli
Storage Manager uses the on-site volumes to consolidate the valid files onto new volumes,
then directs the new volumes to be taken off-site. When the new tapes arrive off-site, Tivoli
Storage Manager requests the return of the original off-site volumes, which can be reused as
scratch volumes.

Data management policies


A data storage management environment consists of three basic types of resources: client
systems, rules, and data. The client systems contain the data to manage, and the rules
specify how the management must occur; for example, in the case of backup, how many
versions you keep, where you store them, and so on.

Tivoli Storage Manager policies define the relationships between these three resources.
Depending on your actual needs for managing your enterprise data, these policies can be
simple or complex.

Tivoli Storage Manager has certain logical entities that group and organize the storage
resources and define relationships between them. You group client systems, or nodes in
Tivoli Storage Manager terminology, together with other nodes with common storage
management requirements, into a policy domain.

We discuss these concepts in greater detail in “Policy management” on page xx.
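
As an illustration only (the domain, policy set, management class, and storage pool names are assumptions), a minimal policy structure could be created from the administrative command line as follows:

define domain SONASDOM
define policyset SONASDOM STANDARD
define mgmtclass SONASDOM STANDARD DEFAULT
define copygroup SONASDOM STANDARD DEFAULT type=backup destination=DISKPOOL verexists=3
assign defmgmtclass SONASDOM STANDARD DEFAULT
activate policyset SONASDOM STANDARD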

Security concepts
Because the storage repository of Tivoli Storage Manager is the place where an enterprise
stores and manages all of its data, security is a vital aspect for Tivoli Storage Manager. To
ensure that only the owning client or an authorized party can access the data, Tivoli Storage
Manager implements, for authentication purposes, a mutual suspicion algorithm, which is
similar to the methods used by Kerberos authentication.

Whenever a client (backup/archive or administrative) wants to communicate with the server,
an authentication has to take place. This authentication involves verification on both sides,
which means that the client has to authenticate itself to the server, and the server has to
authenticate itself to the client.

To do this, all clients have a password, which is stored at the server side as well as at the
client side. In the authentication dialog, these passwords are used to encrypt the
communication. The passwords are not sent over the network, to prevent hackers from
intercepting them. A communication session will be established only if both sides are able to
decrypt the dialog. If the communication has ended, or if a time-out period has ended with no
activity, the session will automatically terminate and a new authentication will be necessary.

Tivoli Storage Manager offers encryption of data sent by the client to the server. It supports
both 128-bit AES and 56-bit DES encryption.
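
The following dsm.sys stanza sketches how these features are typically enabled on a UNIX backup-archive client; the server name, node name, address, and file paths are example values only, not taken from this book:

   * Example dsm.sys stanza - all names, addresses, and paths are placeholders
   servername         tsmsrv1
      commmethod         tcpip
      tcpserveraddress   tsmserver.example.com
      nodename           client1
      * store the password locally so that scheduled operations can authenticate
      passwordaccess     generate
      * encrypt selected client data with 128-bit AES before it is sent
      encryptiontype     aes128
      inclexcl           /opt/tivoli/tsm/client/ba/bin/inclexcl.def

Files are selected for encryption with include.encrypt statements in the include-exclude file, for example include.encrypt /secure/.../*.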

Backup Archive client interface


Tivoli Storage Manager is a client-server program. You must install the client product on the
machine you want to back up. The client portion is responsible for sending and receiving data
to and from the Tivoli Storage Manager server. The Backup Archive client has two distinct
features:
򐂰 The backup feature allows users to back up a number of versions of their data onto the
Tivoli Storage Manager server and to restore from these, if the original files are lost or
damaged. Examples of loss or damage are hardware failure, theft of the computer system, or
a virus attack.


򐂰 The archive feature allows users to keep a copy of their data for long term storage and to
retrieve the data if necessary. Examples of this are to meet legal requirements, to return to
a previous working copy if the software development of a program is unsuccessful, or to
archive files that are not currently necessary on a workstation.

These two features are the central functions around which Tivoli Storage Manager is built;
backup and archive store the data so that it can be restored or retrieved later if it is lost or
needed again. You can interact with the Tivoli Storage Manager server to run a backup/restore
or archive/retrieve operation through three different interfaces:
򐂰 Graphical User Interface (GUI)
򐂰 Command Line Interface (CLI)
򐂰 Web Client Interface (Web Client)

The command line interface has a richer set of functions than the GUI. Because the CLI is a
character mode interface, it is well suited for experienced users and for scripting. You may
also consider using it when you cannot access the GUI or when you want to automate a
backup or archive by using a batch processing file.
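
For illustration, the following backup-archive client commands show one typical invocation of each function; the file specifications and description text are examples only:

   dsmc incremental /home
   dsmc restore "/home/user1/report.doc"
   dsmc archive "/project/final/*" -description="Project X final build"
   dsmc retrieve "/project/final/*" -description="Project X final build"

The first command backs up new and changed files in /home, the second restores a single backed-up file, and the last two archive and later retrieve a group of files identified by their description.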

Client application programming interface (API)


Tivoli Storage Manager provides a data management application program interface (API) that
you can use to implement application clients to integrate popular business applications, such
as databases or groupware applications. The API also adheres to an open standard and is
published to enable customers and vendors to implement specialized or custom clients for
particular data management needs or nonstandard computing environments. The Tivoli
Storage Manager API enables an application client to use the Tivoli Storage Manager storage
management functions. The API includes function calls that you can use in an application to
perform the following operations:
򐂰 Start or end a session
򐂰 Assign management classes to objects before they are stored on a server
򐂰 Archive objects to a server
򐂰 Signal retention events, such as activate, hold, or release

Alternatively, some vendor applications exploit the Tivoli Storage Manager data management
API by integrating it into their software product itself to implement new data management
functions or to provide archival functionality on additional system platforms. Some examples
are IBM DB2 Content Manager, IBM DB2 Content Manager OnDemand, IBM CommonStore
for SAP R/3, Lotus® Domino®, and Microsoft Exchange data archival.

The API, including full documentation available on the Internet, is published to enable
customers and vendors to implement their own solutions to meet their requirements.

Automation
Tivoli Storage Manager includes a central scheduler that runs on the Tivoli Storage Manager
server and provides services for use by the server and clients. You can schedule
administrative commands to tune server operations and to start functions that require
significant server or system resources during times of low usage. You can also schedule
client action, but that would be unusual for a data retention-enabled client. Each scheduled
command (administrative or client) action is called an event. The server tracks and records
each scheduled event and its completion status in the Tivoli Storage Manager server
database.
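
As a sketch (the domain, schedule, and node names are hypothetical), a daily incremental backup schedule for clients and an administrative schedule for a full database backup might be defined as follows:

   define schedule standard daily_incr action=incremental starttime=21:00 duration=2 durunits=hours period=1 perunits=days
   define association standard daily_incr client1
   define schedule db_full type=administrative cmd="backup db devclass=ltoclass type=full" active=yes starttime=03:00 period=1 perunits=days

The define association command links the client node to the client schedule; the completion of each scheduled action is then recorded as an event in the server database.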

Client to server data path


Tivoli Storage Manager data can travel from client to server either over the LAN network or
the SAN network when using Tivoli Storage Manager for SAN to enable LAN-free data
transfers. The diagram in Figure 13-12 schematically illustrates the components and data
paths in a Tivoli Storage Manager environment.

Figure 13-12 Backup environment pipeline and data flows

Figure 13-12 shows the data flow or pipeline and potential bottlenecks in a Tivoli Storage
Manager environment. It illustrates the route the data takes through the many components of
the client-server storage environment. For each step in this route, we list causes of potential
performance bottlenecks.

Data is read by the backup/archive client from client disk or transferred in memory to the API
client from a content manager application. The Tivoli Storage Manager client might compress
the data before sending it to the Tivoli Storage Manager server in order to reduce network
utilization.

The client can use either the LAN or the SAN, the latter being called LAN-free, for data
transport. The SAN is optimized for bulk transfers of data and allows writing directly to
the storage media, bypassing the Tivoli Storage Manager server and the network. LAN-free
support requires an additional Tivoli Storage Manager license called Tivoli Storage Manager
for SAN. Archiving data is normally a low volume operation, handling relatively small amounts
of data to be retained for long periods of time. In this case, the LAN is more than adequate for
data transport.
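
When the storage agent that comes with Tivoli Storage Manager for SAN is installed on the client system, LAN-free data movement is enabled with client options similar to the following; the storage agent address and port are example values:

   enablelanfree            yes
   lanfreecommmethod        tcpip
   lanfreetcpserveraddress  127.0.0.1
   lanfreetcpport           1500

With these options, metadata still flows to the Tivoli Storage Manager server over the LAN, while the bulk data is written directly to SAN-attached storage through the storage agent.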

The Tivoli Storage Manager server receives metadata, and data when using LAN transport,
over the LAN network. Tivoli Storage Manager then updates its database. Many small files
potentially can cause a high level of database activity.

When the data is received over the LAN, it generally is stored in a disk storage pool for later
migration to tape as an overflow location.

The maximum performance of data storage or retrieval operations depends on the slowest
“link in the chain”; put another way, performance is constrained by the smallest pipe in the
pipeline, as shown in Figure 13-12 on page 505. In the figure, the LAN is the constraint on
performance.


Tivoli Storage Manager storage management


Tivoli Storage Manager manages client data objects based on information provided in
administrator-defined policies.

Data objects can be subfile components, files, directories, or raw logical volumes that are
archived from client systems. They can be objects such as tables, logs, or records from
database applications, or simply a block of data that an application system archives to the
server. The Tivoli Storage Manager server stores these objects on disk volumes and tape
media that it groups into storage pools.

Tivoli Storage Manager storage pools and storage hierarchy


Tivoli Storage Manager manages data as objects as they exist in Tivoli Storage Manager
storage pools. See Figure 13-13 on page 506:

In the hierarchy shown in the figure, a backup client sends data objects over the LAN, WAN, or
SAN into a primary storage pool on disk (disk device class); data migrates from the disk pool
to a primary storage pool on tape (tape device class), and copies of the data are kept on copy
storage pool volumes.

Figure 13-13 IBM Tivoli Storage Manager storage hierarchy

Each object is “bound” to an associated management policy. The policy defines how long to
keep that object and where the object enters the storage hierarchy. The physical location of
an object within the storage pool hierarchy has no effect on its retention policies. You can
migrate or move an object to another storage pool within a Tivoli Storage Manager storage
hierarchy. This can be useful when freeing up storage space on higher performance devices,
such as disk, or when migrating to new technology. You can and should also copy objects to
copy storage pools. To store these data objects on storage devices and to implement storage
management functions, Tivoli Storage Manager uses logical definitions to classify the
available physical storage resources. Most important is the logical entity called a storage
pool, which describes a storage resource for a single type of media, such as disk volumes,
which are files on a file system, or tape volumes, which are cartridges in a library.
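
As a simplified sketch (all pool and device class names are hypothetical, and a tape library named ltolib is assumed to be already defined to the server), a two-level primary hierarchy with an associated copy storage pool might be created as follows:

   define devclass ltoclass devtype=lto library=ltolib
   define stgpool diskpool disk
   define volume diskpool /tsm/diskpool/vol01.dsm formatsize=10240
   define stgpool tapepool ltoclass maxscratch=100
   define stgpool copypool ltoclass pooltype=copy maxscratch=100
   update stgpool diskpool nextstgpool=tapepool
   backup stgpool diskpool copypool
   backup stgpool tapepool copypool

Data enters the hierarchy in diskpool, migrates to tapepool, and both primary pools are backed up to copypool with the backup stgpool command.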


Native data deduplication


Tivoli Storage Manager provides a built-in data deduplication feature. Deduplication is a
technique that allows more data to be stored on a given amount of media than would
otherwise be possible. It works by removing duplicates in the stored version of your data. In
order to do that, the deduplication system has to process the data into a slightly different
form. When you need the data back, it can be reprocessed into the same form as it was
originally submitted.

Tivoli Storage Manager is capable of deduplicating data at the server. It performs
deduplication out of band, in Tivoli Storage Manager server storage pools. Deduplication is
only performed on data in FILE (sequential disk) devtype storage pools. Tivoli Storage
Manager calculates an MD5 checksum for each object and then slices the object into chunks.
Each chunk has an SHA-1 hash associated with it, which is used for the deduplication. The
MD5 checksums are there to verify that objects retrieved from the deduplication system are
reassembled correctly: the MD5 is recalculated and compared with the saved value to ensure
that the returned data is correct.
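
A minimal sketch of enabling server-side deduplication, assuming a FILE device class that stores its volumes under the hypothetical directory /tsmfile:

   define devclass fileclass devtype=file directory=/tsmfile mountlimit=20 maxcapacity=4G
   define stgpool dedupepool fileclass maxscratch=200 deduplicate=yes
   identify duplicates dedupepool

The identify duplicates command starts the background processes that find the duplicate chunks; the space itself is recovered later when the volumes are reclaimed.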

Deduplication and compression are closely related, and the two often work in similar ways,
but the size of the working set of data for each is different. Deduplication works against large data
sets compared to compression (for example, real-world LZW compression often only has a
working set under 1MB, compared to deduplication which is often implemented to work in the
range of 1TB to 1PB). With deduplication, the larger the quantity being deduplicated, the
more opportunity exists to find similar patterns in the data, and the better the deduplication
ratio can theoretically be, so a single store of 40TB would be better than five separate
datastores of 8TB each.

Deduplication is effective with many, but not all workloads. It requires that there are
similarities in the data being deduplicated: for example if a single file exists more than once in
the same store, this could be reduced down to one copy plus a pointer for each deduplicated
version (this is often referred to as a “Single Instance Store”). Other workloads such as
incompressible and non-repetitive media (JPEG, MPEG, and MP3 files, or specialist data such as
geo-survey data sets) will not produce significant savings in space consumed. This is
because the data is not compressible, has no repeating segments, and has no similar
segments.

To sum up, deduplication typically allows for more unique data to be stored on a given
amount of media, at the cost of the additional processing on the way into the media (during
writes) and the way out (during reads).

Device classes
A storage pool is built up from one or more Tivoli Storage Manager storage pool volumes. For
example, a disk storage pool can consist of several AIX raw logical volumes or multiple AIX
files on a file system. Each AIX raw logical volume or AIX file corresponds to one Tivoli
Storage Manager storage pool volume.

A logical entity called a device class is used to describe how Tivoli Storage Manager can
access those physical volumes to place the data objects on them. Each storage pool is bound
to a single device class.

The storage devices used with Tivoli Storage Manager can vary in their technology and total
cost. To reflect this fact, you can imagine the storage as a pyramid (or triangle), with
high-performance storage in the top (typically disk), normal performance storage in the
middle (typically optical disk or cheaper disk), and low-performance, but high-capacity,
storage at the bottom (typically tape). Figure 4-4 illustrates this tiered storage environment
that Tivoli Storage Manager uses:


򐂰 Disk storage devices are random access media, making them better candidates for
storing frequently accessed data. Disk storage media with Tivoli Storage Manager can
accept multiple parallel data write streams.
򐂰 Tape, however, is an economical, high-capacity, sequential access medium, which you
can easily transport off-site for disaster recovery purposes. Access time is much slower for
tape due to the amount of time necessary to load a tape into a tape drive and locate the
data. However, for many applications, that access time is still acceptable.

Note: Today many people in the industry say that tape is dead and customers should use
disk instead. However, the performance of high-end tape devices is often unmatched by
disk storage subsystems. Current tape drives have a native data rate in the range of
100 MB/sec or more, which with compression can easily exceed 200 MB/sec. Also, you
should consider the cost: the overall power consumption of tape is usually far lower than
that of a continuously spinning disk subsystem.

Disk storage is referred to as online storage, while tape storage has often been referred to as
off-line and also near-line with regard to Hierarchical Storage Management (HSM) in the past.
With Tivoli Storage Manager HSM, tape volumes, located in a tape library, are accessed by
the application that is retrieving data from them (near-line) transparently. Tapes no longer in
the library are off-line, requiring manual intervention. The introduction of lower cost mass
storage devices, such as Serial Advanced Technology Attachment (SATA) disk systems,
offers an alternative to tape for near-line storage. Figure 13-14 illustrates the use of a SATA
disk as near-line storage.

Figure 13-14 Online, near-line, and off-line storage

Device types
Each device defined to Tivoli Storage Manager is associated with one device class. Each
device class specifies a device type. A device type identifies a device as a member of a group
of devices that share similar media characteristics. For example, the LTO device type applies
to LTO tape drives.

The device type also specifies management information, such as how the server gains
access to the physical volumes, recording format, estimated capacity, and labeling prefixes.
Device types include DISK, FILE, and a variety of removable media types for tape and optical
devices. Note that a device class for a tape or optical drive must also specify a library. The
library defines how Tivoli Storage Manager can mount a storage volume onto a storage
device such as a tape drive.
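
For illustration, defining a SCSI-attached library, one of its drives, and the paths that the server uses to reach them might look like the following; the server name tsmsrv1 and the device special file names are platform-dependent examples:

   define library ltolib libtype=scsi
   define path tsmsrv1 ltolib srctype=server desttype=library device=/dev/smc0
   define drive ltolib drive01
   define path tsmsrv1 drive01 srctype=server desttype=drive library=ltolib device=/dev/rmt0
   define devclass ltoclass devtype=lto library=ltolib format=drive

A device class such as ltoclass then refers to this library, and any storage pool that uses the device class mounts its volumes on the drives defined here.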

Device access strategy


The access strategy of a device is either random or sequential. Primary storage pools can
use random devices (such as disk) or sequential devices (such as tape). Copy storage pools
use sequential access devices. Certain Tivoli Storage Manager processes use only
sequential access strategy device types:
򐂰 Copy storage pools
򐂰 Tivoli Storage Manager database backups
򐂰 Export
򐂰 Import

Tape devices
Tivoli Storage Manager supports a wide variety of enterprise class tape drives and libraries.
We recommend that you use tape devices for backing up your primary storage pools to copy
storage pools and for backing up the database. Tape devices are well suited for this, because
the media can be transported off-site for disaster recovery purposes.

Policy management
A data storage management environment consists of three basic types of resources: client
systems, policies, and data. The client systems contain the data to manage, for example, file
systems with multiple files.

The policies are the rules that specify how to manage the objects. For example, for archives,
they define how long to retain an object in Tivoli Storage Manager storage and in which
storage pool to place it; in the case of backup, they define how many versions to keep, where
to store them, and what Tivoli Storage Manager does with the stored object after the data is
no longer on the client file system.

Client systems, or nodes, in Tivoli Storage Manager terminology, are grouped together with
other nodes with common storage management requirements into a policy domain. The
policy domain links the nodes to a policy set, a collection of storage management rules for
different storage management activities.

Note: The term client node refers to the application sending data to the Tivoli Storage
Manager server.

A policy set consists of one or more management classes. A management class contains the
rule descriptions called copy groups and links these to the data objects to manage.

A copy group is the place where you define all the storage management parameters, such as
the number of stored copies, retention period, and storage media. When the data is linked to
particular rules, it is said to be bound to the management class that contains those rules.

Another way to look at the components that make up a policy is to consider them in the
hierarchical fashion in which they are defined; that is, consider the policy domain containing
the policy set, the policy set containing the management classes, and the management
classes containing the copy groups and the storage management parameters, as illustrated
in Figure 13-15.


In the figure, client nodes are assigned to a policy domain; the domain contains several policy
sets, and each management class within a policy set contains copy group rules that are
bound to data.
Figure 13-15 Policy relationships and resources

We explain the relationship between the items in Figure 13-15 in the following pages.

Copy group rules


Copy group rules can define either a backup copy group or an archive copy group. One set of
rules applies to backups and a separate set to archives:

Backup copy group


This copy group controls the backup processing of files associated with the specific
management class. It is uncommon to use backup copy groups for archival or data retention
applications because they are better suited to backup versioning of files. A backup copy
group determines:
򐂰 Where to store the object
򐂰 What to do if the file on the client is in use
򐂰 Whether to back up the file only if it has been modified or changed
򐂰 The minimum frequency of backup, to avoid backing up the file at every operation
򐂰 If the file exists on the client node:
– How many copies to keep
– How long to keep them
򐂰 If the file has been deleted on the client:
– How many copies to keep
– How long to keep the last copy of the file

Archive copy group


This copy group controls the archive processing of files associated with the management
class. An archive copy group determines:
򐂰 How the server handles files that are in use during archive
򐂰 Where the server stores archived copies of files
򐂰 How long the server keeps archived copies of files


Management class
The management class associates client files with copy groups. A management class is, in
effect, a Tivoli Storage Manager policy.

Each individual object stored in TSM is associated with one and only one management class.
A management class is a container for copy groups; it can contain either one backup or
archive copy group, both a backup and an archive copy group, or no copy groups at all. Users
can bind (that is, associate) their files to a management class through the include-exclude list,
a set of statements or rules that associate files to a management class based on file filtering
rules. Alternatively, a user can explicitly request an archive management class.
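
For example, an include-exclude list on a UNIX client might bind database dump files to a specific management class while excluding temporary files; the paths and the class name dbmgmtclass are placeholders:

   * The list is processed from the bottom up; the first matching rule applies
   include      /.../*
   include      /data/db/.../*   dbmgmtclass
   exclude      /tmp/.../*
   exclude.dir  /var/cache

Files under /data/db are bound to dbmgmtclass, files under /tmp are never backed up, and everything else is bound to the default management class of the active policy set.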

Policy set
The policy set specifies the management classes that are available to groups of users. Policy
sets contain one or more management classes. You must identify one management class as
the default management class. Only one policy set, the ACTIVE policy set, controls policies in
a policy domain.

Policy domain
The concept of policy domains enables an administrator to group client nodes by the policies
that govern their files and by the administrators who manage their policies. A policy domain
contains one or more policy sets, but only one policy set (named ACTIVE) can be active at a
time. The server uses only the ACTIVE policy set to manage files for client nodes assigned to
a policy domain.
You can use policy domains to:
򐂰 Group client nodes with similar file management requirements
򐂰 Provide different default policies for different groups of clients
򐂰 Direct files from different groups of clients to different storage hierarchies based on need
򐂰 Restrict the number of management classes to which clients have access

Figure 13-16 summarizes the relationships among the physical device environment, Tivoli
Storage Manager storage and policy objects, and clients. The numbers in the following list
correspond to the numbers in the figure.

Figure 13-16 Basic policy structure for backup

Figure 13-16 shows an outline of the policy structure. These are the steps to create a valid
policy:


1. When clients are registered, they are associated with a policy domain. Within the policy
domain are the policy set, management class, and copy groups.

2. When a client (application) backs up an object, the object is bound to a management class.
A management class and the backup copy group within it specify where files are stored first
(destination), and how they are managed.

3. Storage pools are the destinations for all stored data. A backup copy group specifies a
destination storage pool for backed-up files. Storage pools are mapped to device classes,
which represent devices. The storage pool contains volumes of the type indicated by the
associated device class.

Data stored in disk storage pools can be migrated to tape or optical disk storage pools and
can be backed up to copy storage pools.
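
The following administrative command sequence sketches the policy structure just described, end to end; all names and retention values are hypothetical examples rather than recommendations, and the destination pools are assumed to exist already:

   define domain sonasdom description="Example client nodes"
   define policyset sonasdom standard
   define mgmtclass sonasdom standard mcdefault
   define copygroup sonasdom standard mcdefault type=backup destination=diskpool verexists=3 verdeleted=1 retextra=30 retonly=60
   define copygroup sonasdom standard mcdefault type=archive destination=tapepool retver=365
   assign defmgmtclass sonasdom standard mcdefault
   validate policyset sonasdom standard
   activate policyset sonasdom standard
   register node client1 secretpw domain=sonasdom

Only after the policy set is activated do its management classes govern the data that client1 sends to the server.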

Hierarchical storage management


Hierarchical storage management (HSM) refers to a function of Tivoli Storage Manager that
automatically distributes and manages data on disk, tape, or both by regarding devices of
these types and potentially others as levels in a storage hierarchy. The devices in this storage
hierarchy range from fast, expensive devices to slower, cheaper, and possibly removable
devices. The objectives are to minimize access time to data and maximize available media
capacity.

Hierarchical storage management is implemented in many IBM products, such as Tivoli
Storage Manager, in System i®, and in z/OS in the combination of the storage management
subsystem (SMS), DFSMShsm, DFSMSdss, and DFSMSrmm.

Tivoli Storage Manager HSM solutions are applied to data on storage media, such as disk;
the data is automatically migrated from one level of storage media to the next level based on
some predefined policy. Tivoli Storage Manager offers different kinds of HSM functionality.

HSM in the Tivoli Storage Manager server


One level of HSM is related to how the Tivoli Storage Manager server stores data. The Tivoli
Storage Manager server stores data in storage pools, collections of storage volumes of the
same media type, as discussed in “Tivoli Storage Manager storage management” earlier in
this appendix. You can map different Tivoli Storage Manager storage pools to different device
types, and they can be concatenated together into a hierarchy using the Tivoli Storage
Manager nextstgpool parameter.

Figure 13-17 illustrates a Tivoli Storage Manager server hierarchy with three storage pools.
Storage pools are managed by threshold; each pool has a high threshold and a low threshold.
When the amount of data in the storage pool exceeds the high threshold, Tivoli Storage
Manager initiates a migration process to move the data. The data is moved to a destination
called the next storage pool, which is defined as a parameter of the original storage pool. So,
in the example, poolfast has a next storage pool called poolslow. The migration process
moves data from poolfast to poolslow; the process starts when the amount of data stored in
poolfast exceeds the high migration threshold and stops when it reaches the low threshold.


Figure 13-17 Tivoli Storage Manager server migration processing

Tivoli Storage Manager offers additional parameters to control migration of data from one
storage pool to the next. One of these is migdelay, which specifies the minimum number of days
that a file must remain in a storage pool before the file becomes eligible for migration to the
next storage pool.
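
Using the pool names from this example, the behavior just described corresponds to server settings such as the following; the threshold and delay values are illustrative only:

   update stgpool poolfast nextstgpool=poolslow highmig=80 lowmig=20 migdelay=14
   query stgpool poolfast format=detailed

Migration from poolfast to poolslow starts when poolfast is more than 80% full, stops when it drops to 20%, and skips files that have been in poolfast for fewer than 14 days.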

HSM for file systems


Tivoli Storage Manager offers two separate HSM clients for file systems: one for UNIX and
one for Windows environments.

In both cases, the HSM client resides on the file server where you want to perform space
management. It moves files from the local file system to lower cost storage managed by the
Tivoli Storage Manager server, and this movement is called migration. Tivoli Storage
Manager performs this movement based on criteria such as file size and age.

Moving a file to the Tivoli Storage Manager server implies that the file is removed from the
Tivoli Storage Manager client. The client file system continues to see the file as if it were still
on local disk. When a request to access the file occurs, the HSM client intercepts the file
system requests and, depending on operating system platform, either recalls the file to
primary storage or, in some cases, can redirect the file system request to secondary storage.
These operations are performed transparently to the file system request even though the
request can be slightly delayed because of the tape mount processing.

Figure 13-18 illustrates a sample HSM storage hierarchy built to minimize storage costs.


In the example hierarchy, Pool A (high-end disk) migrates data to Pool B after 14 days of
non-use; Pool B (low-cost SATA disk) migrates data to Pool C (a tape library) when its capacity
utilization exceeds 80%; data is recalled back up the hierarchy when it is accessed.

Figure 13-18 Sample cost-based HSM storage hierarchy

HSM for UNIX clients


The IBM Tivoli Storage Manager for Space Management for UNIX (HSM) client migrates files
from your local file system to storage and recalls them either automatically or selectively.
Migrating files to a distributed storage device frees space for new data on your local file
system.

Your Tivoli Storage Manager administrator defines the management classes that can be
assigned to your files. You, as the root user, can:
򐂰 Select space management options and settings.
򐂰 Assign management classes to your files.
򐂰 Exclude files from space management.
򐂰 Schedule space management services.

These options and settings determine which files are eligible for automatic migration, the
order in which files are migrated, where the migrated files are stored, and how much free
space is maintained on your local file system. You prioritize files for migration by their file
size, or by the number of days since your files were last accessed. Stub files that contain the
necessary information to recall your migrated files remain on your local file system so that the
files appear to reside locally. When you access migrated files, they are recalled automatically
to your local file system. This is different from archiving, which completely removes files from
your local file system.
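
As a brief illustration, the UNIX HSM client provides commands for selective migration and recall; the file system and file names below are examples, and the space management thresholds themselves are normally set when space management is added to the file system:

   dsmmigrate /gpfs/fs1/projects/old_results.dat
   dsmls      /gpfs/fs1/projects/old_results.dat
   dsmrecall  /gpfs/fs1/projects/old_results.dat
   dsmdf      /gpfs/fs1

dsmmigrate replaces the file with a stub and moves its data to the Tivoli Storage Manager server, dsmls shows whether a file is resident, premigrated, or migrated, dsmrecall brings the data back, and dsmdf reports the space usage of the managed file system.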

The HSM client provides space management services for locally mounted file systems, and it
migrates regular files only. It does not migrate character special files, block special files,
named pipe files, or directories.

File migration, unlike file backup, does not protect against accidental file deletion, file
corruption, or disk failure. Continue to back up your files whether they reside on your local file
system or in Tivoli Storage Manager storage. You can use the IBM Tivoli Storage Manager
backup-archive client to back up and restore migrated files in the same manner as you would
back up and restore files that reside on your local file system. If you accidentally delete stub
files from your local file system, or if you lose your local file system, you can restore the stub
files from Tivoli Storage Manager.


For planned processes, such as storing a large group of files in storage and returning them to
your local file system for processing, use the archive and retrieve processes. You can use the
backup-archive client to archive and retrieve copies of migrated files in the same manner as
you would archive and retrieve copies of files that reside on your local file system. HSM
supports various file systems. Currently, these integrations exist:
򐂰 File system proprietary integration
Data can be directly accessed and read from any tier in the storage hierarchy. This is
supported on JFS on AIX.
򐂰 DMAPI standard-based integration
The Data Management Application Programming Interface (DMAPI) standard has been
adopted by several storage management software vendors. File system vendors focus on
the application data management part of the protocol. Storage management vendors
focus on the hierarchical storage management part of the protocol. Tivoli Storage
Manager HSM Client supported platforms currently are: GPFS on AIX, VxFS on Solaris,
GPFS on xLinux, and VxFS on HP.

HSM for Windows clients


HSM for Windows offers automated management features, such as:
򐂰 Policy-based file selection to apply HSM rules to predefined sets of files
򐂰 On-demand scheduling to define when to perform HSM automatic archiving
򐂰 Transparent recall, so that an application can access a migrated file automatically

The policies or rules that HSM for Windows supports allow you to filter files based on
attributes, such as:
򐂰 Directory name
򐂰 File types, based on the extensions
򐂰 Creation, modification, or last access date of file

Automatic archiving performs archiving operations based on inclusion or exclusion of
directories and subdirectories and inclusion or exclusion of file extensions. In addition, you
can configure filter criteria based on creation, modification, and last access date.


Related publications

The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.

IBM Redbooks
For information about ordering these publications, see “How to get Redbooks” on page 517.
Note that some of the documents referenced here may be available in softcopy only.
򐂰 IBM Scale Out Network Attached Storage Concepts, SG24-7874
򐂰 IBM eServer xSeries and BladeCenter Server Management, SG24-6495
򐂰 Configuration and Tuning GPFS for Digital Media Environments, SG24-6700
򐂰 IBM Tivoli Storage Manager Implementation Guide, SG24-5614

Online resources
These Web sites are also relevant as further information sources:
򐂰 SONAS Support Site
http://www.ibm.com/storage/support/
and select:
Product family: Network Attached Storage (NAS)
Product: Scale Out Network Attached Storage
Click Go.
򐂰 Support for IBM System Storage, TotalStorage and Tivoli Storage products
http://www.ibm.com/storage/support/
򐂰 Additional GPFS documentation sources
http://www.ibm.com/systems/gpfs
http://www-03.ibm.com/systems/software/gpfs/resources.html
򐂰 NFS V4 ACL information
http://www.nfsv4.org/

How to get Redbooks


You can search for, view, or download Redbooks, Redpapers, Technotes, draft publications
and Additional materials, as well as order hardcopy Redbooks publications, at this Web site:
ibm.com/redbooks


Help from IBM


IBM Support and downloads
ibm.com/support

IBM Global Services


ibm.com/services


Index
session-oriented 142
Numerics CLI access 337
36 port Infiniband switch CLI credentials 250
storage capacity 225 CLI tasks 341
cluster backup
A TSM 182
access control list 142 cluster configuration information 182
ACL Cluster management 343
modify 299 cluster replication 194
acoustics tests 247 clustered trivial database 257
Active Energy Manager component 61 cnrsscheck command 275
architecture 8 configuration changes 46
async replication 255 configuration information backup 212
attachnw command 297 configuration sizing 227
authentication 294 Console users 336
authentication environment, 252 CTDB 178, 287
authentication method 259 CTDB layer 262
authentication methods 143 CTDB tickle-acks 178
customer-supplied racks 61

B
backup 183 D
bacupmanagement node command 182 data access failover 178
banned node 179 data blocks 256
Base Rack data growth contributors 3
Feature Code 9004 222 Data Management API 188
feature code 9005 220 database application
base rack 62 file access 4
feature code 9003 62 default network group 151
Feature Code 9004 63 direct attached storage 6
Feature Code 9005 63 disaster recovery 211
base rackKFeature Code 9003 221 disaster recovery purposes 255
Baseboard Management Controller 55 disk scrubbing 67
Block I/O 4 DMAPI 188
block size 256 Domain Name Servers 143
bonded IP address 58, 60 drive configurations 67
bonding 148 drive options 66
bonding interface dual-inline-memory modules 218
hardware address 149
bonding mode E
mode 1 149 Ethernet connections
bonding modes 149 six additional 53
mode 6 149 Ethernet network
external ports 67
C Ethernet switch
Call Home feature 249 internal private 48
cfgad command 260, 295 Ethernet switches 47
cfgldap command 261 expansion unit 52
CIFS expansion units 57
access control list 142 Exports 378
authorization 142 external connections 49
export configuration 322 external ethernet connections 49
file lock 142 external network 152
file shares 142


F SAS HDD 43
failover 147 single point of failure 55
NFS 181 Interface nodes 146
protocol behaviors 181 interface nodes 10
failover considerations 180 network group 149
failover failback 146 optional features 44
failure groups 258 Internal IP addresses
FAT file system 140 ranges 49
File I/O 4 internal private management network 56
file level security 142 IP address ranges 49, 152
file restore 185
file shares 142 L
file system LAN free backup 182
concept 140 LDAP server 295
related tasks 319 LDAP software 261
file system concept 140 Lightweight Directory Access Protocol 252
filesets 363 locking 142
Filesystem management 345 lsnwinterface command 180
floor load 244

M
G manage Users 335
General Parallel File System 7 Management ethernet network 59
global namespace 10, 144 management node
GPFS Filesystem 288 NTP server 153
GUI tasks 341 management node connections 56
marketplace requirements 2
H applications 3
hardware architecture 41 master file system 257
hardware overview 42 Master filesystem 290
high availability 178 maximum transmission unit 153
high availability design 11 maxsess parameter 182
Microsoft Active Directory 260
migration 427
I
Infiniband connections 58
InfiniBand switch N
36-port 47 NAS
96-port 47 access 6
Infiniband switch limitations 6
configuration 217 overview 6
InfiniBand switches 47 NAT Gateway 152, 292
Integrated Baseboard Management Controller 49 Network Address Translation 152
Integrated Management Module 49 Network Address Translation gateway 292
integration 259 network attached storage 140
intelligent PDU 246 network bonding 148
Interface Expansion Rack 224 network group 149
interface expansion racks 62 network interface name 151
Interface Node network interface names 151
configuration 218 network router 152
panel 317 network traffic
Interface node separate 151
components 43 networking 139
failover failback 146 NFS
rear view 54 access control 142
interface node locking 142
cache memory 10 stateless service 142
connections 53 NFS clients 323
Intel NIC 44 NFS shares
Qlogic network adapter 44 DNS host names 148


NFS verses CIFS 142 authentication methods 143


NIC 140 base rack 62
nmon tool 238 CLI tasks 341
node cluster backup 182
failover 147 cluster configuration information 182
NTP server 153 component connections 53
configuration changes 46
drive options 66
O Ethernet ports 43
onboard Ethernet ports 56 external ports 67
overview 1 GUI tasks 341
hardware architecture 41
P hardware overview 42
parallel grid architecture 49 Health Center 412
perfmon tool 239 Infiniband connections 58
Policies panel 323 interface expansion rack 63
Policy details 324 interface node 45
port configuration IP address ranges 152
switch migration 225 management node connections 56
power consumption 247 Network Address Translation 152
power distribution units 61 operating system 45
Private Network range 262 overview 1, 7
protocol behaviors 181 raw storage 9
SAS drives 66
scale out capability 10
Q software 10, 43
quorum topology 250 storage controller 51
storage expansion uni 53
storage management 11
R storage node connections 55
rack configurations
switches 47
power distribution units 61
Tivoli Storahe Manager 182
racks
SONAS Base Rack
customer supplied 61
XIV storage 70
raid storage controller 51
SONAS snapshots 188
Redbooks Web site 517
space efficient 188
Contact us xviii
space requirements 244
remote configuration 249
startmgtsrv command 397
replication schedule 117
stateful 142
resolv.conf file 291
stateless service 142
Resume Node command 344
storage 254
rmsnapshot 191
storage controller
root fileset 325
RAID 5 66
rsync transfer 117
rebuilds 52
Storage Expansion Rack 223
S storage expansion unit 53
SAS drive storage expansion units 57
configuration 217 Storage node
SAS drives 66 HDDs 45
SATA drive storage node 45
configuration 218 contents 45
scale out capability 10 storage node connections 55
schedule tasks 408 storage nodes
service maintenance port 55–56 HA pairs 45
sizing 227 maximum 45
software 10 Storage Pod
Software Cluster Manager 146 configuration 217
SONAS 188 Storage pod
addressing 144 expansion 50
architecture 8 storage pod 50


connectivity 57
storage pool 258
user 259
suspendnode command 179, 344
switch configurations 217
switches 47
symbolic host names. 143
synchronous replication 196
system overhead 257

T
Tivoli Storage Manager 182
transparent recall 187
TSM client software 178
TSM database 183
TSM database sizing 183
TSM HSM clients 185
TSM server
maxsess parameter 182

U
unmount file system 354

V
VLAN 151
VLAN tagging 151
VLAN trunking 151

W
weight distribution 244
Windows Active Directory 142
workload analyzer tools 238

X
XIV
Metadata replication 71
SONAS configuration 68
XIV configuration
component considerations 70
XIV storage 68
attachment to SONAS 69

Back cover

IBM Scale Out Network Attached Storage
Architecture, Planning and Implementation Basics

Learn to setup and customize the IBM Scale Out NAS

Details hardware and software architecture

Includes daily administration scenarios

IBM Scale Out Network Attached Storage (SONAS) is a Scale Out NAS offering designed to
manage vast repositories of information in enterprise environments requiring very large
capacities, high levels of performance, and high availability.

The IBM Scale Out Network Attached Storage provides a range of reliable, scalable storage
solutions for a variety of storage requirements. These capabilities are achieved by using
network access protocols such as NFS, CIFS, HTTP, and FTP. Utilizing built-in RAID
technologies, all data is well protected, with options to add additional protection through
mirroring, replication, Snapshots, and backup. These storage systems are also characterized
by simple management interfaces that make installation, administration, and troubleshooting
uncomplicated and straightforward.

This book provides the reader with details of the hardware and software architecture that
make up the SONAS appliance, along with configuration, sizing, and performance
considerations. It provides information on the integration of the SONAS appliance into an
existing network. The administration of the SONAS appliance through the GUI and CLI is
demonstrated, as well as backup and availability scenarios. A quick start scenario takes you
through common SONAS administration tasks to familiarize you with the SONAS system.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization.
Experts from IBM, Customers and Partners from around the world create timely technical
information based on realistic scenarios. Specific recommendations are provided to help you
implement IT solutions more effectively in your environment.

For more information:
ibm.com/redbooks

SG24-7875-00