Anyone who has ever worked on this product knows how quickly things can go bad. This
short TOI is intended to help you locate information within Sun, ask the right questions
before going onsite, and identify potential problems and gather the information needed
to resolve them quickly.
This also assumes you have some familiarity with the product and don't need help
installing it, bringing it up, or working with the GUI.
We'll start with a brief (very brief) overview of what the products are...
All are the same from a firmware and application perspective; they differ only in having
one or two controllers and a varying number of disks. They use a hardware RAID controller
designed by LSI. They all have data cache, and they all have battery backup that retains
data in cache in the event of a power outage; they do not de-stage data to disk until
power is restored.
A1000
Single controller unit made from a single D1000, either the 8-disk or the 12-disk version.
A3500
A dual controller unit that uses D1000 trays.
A3500fc
Same as the A3500, but it is the only Fibre Channel model.
Listed next are some important web sites with valuable information.
http://storage.east/mreid
This is Mark Reid's web page. It has lots of good data and a firmware-to-Raid Manager
application table that is invaluable for finding mismatches.
http://acts.ebay/storage/A3x00
This is Bob De Guc's page. It has a good section on nvsram and lots of other good
information.
http://webhome.ebay/A3x00
This is the Network Storage – Dilbert & Sonoma page. Tons of great resources; a must-see.
http://cpre-emea.uk/cgi-bin/df125853/lsiproc.tcl
http://pts-americas.west/nws/products/A3500/index.html
And of course.....
http://sunsolve.central.sun.com
Always get the latest patch matrix, Infodoc 43483 (formerly Early Notifier 20029), and
your patches here.
1. VERIFY with data which version of Raid Manager is being used, with pkginfo -l SUNWosar.
2. VERIFY with data that the firmware version is appropriate for the application, using
module profile or raidutil -c.
3. Collect a module profile: start the RM6 GUI, go to Configuration > File > Save Module
Profile, and check nvsram. If using 6.22 update1, the nvsram, listed as “Product
Revision”, should be 0003; if not, the upgrade is incomplete and this needs to be fixed.
4. Collect explorer data to confirm patch levels against Infodoc 43483 and check
messages. In the explorer under /disks/sonoma you will find the output of many Raid Manager
commands, which can be used for troubleshooting.
Also, in the disks/sonoma/usr_lib_osa directory you will find the rmlog.log files. THESE
ARE CRUCIAL. With these files you can see the errors that Raid Manager has been reporting,
using a simple tool, rmlogscan. REVIEW THE ERROR MESSAGES. Frequently there
will be more than one of these files, the older ones having been renamed; get those also.
6. Use the information gathered to decide how to proceed. Look at the overall health and
layout of the storage. Always check whether the LUNs have been laid out in a fashion that
gives them the most redundancy. In larger arrays you should be able to survive a tray
failure without data loss. Use the module profile and plot the layout.
This concludes the inband method of data gathering. See examples on following pages.
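The tray-failure check in step 6 can be sketched in a few lines. This is a minimal Python sketch, not an RM6 feature: it assumes you have already transcribed the module profile by hand into a mapping from LUN number to RAID level and (tray, slot) drive positions. The function name, the data layout, and the sample values are all hypothetical. It evaluates only single-parity levels (RAID 3 and RAID 5), where losing more than one drive of a LUN loses data; RAID 0 and RAID 1 LUNs are skipped and need to be judged separately.

```python
from collections import Counter


def tray_failure_risks(luns):
    """Return {lun: [trays]} for RAID 3/5 LUNs with more than one drive in a tray.

    `luns` maps a LUN number to {"raid": level, "drives": [(tray, slot), ...]}.
    This layout is a hypothetical hand transcription of the module profile.
    """
    risks = {}
    for lun, info in luns.items():
        if info["raid"] not in (3, 5):
            continue  # only single-parity levels are evaluated in this sketch
        drives_per_tray = Counter(tray for tray, _slot in info["drives"])
        exposed = sorted(t for t, count in drives_per_tray.items() if count > 1)
        if exposed:
            risks[lun] = exposed
    return risks


# Hypothetical layout: LUN 0 has one drive per tray, LUN 1 has two drives in tray 1.
layout = {
    0: {"raid": 5, "drives": [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]},
    1: {"raid": 5, "drives": [(1, 1), (1, 2), (2, 1), (3, 1), (4, 1)]},
}

print(tray_failure_risks(layout))  # prints {1: [1]}
```

An empty result means no evaluated LUN depends on any single tray for more than one of its drives.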
* NOTE: Both 6.1.1 u1 and 6.1.1 u2 report the same information from the
pkginfo command, BUT if you do a "pkgparam SUNWosafw | grep 106513", only
6.1.1 u2 will come back with anything.
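When the pkgparam output has been saved to a file, for example as part of a data collection, the same test can be applied offline. The following is a minimal Python sketch of the grep described in the note above; the function name and the sample strings are hypothetical, and the real pkgparam output will look different.

```python
PATCH_ID = "106513"  # patch ID from the note above


def looks_like_update2(pkgparam_output):
    """Return True if any line of the saved pkgparam output mentions the patch ID.

    Per the note above, only update 2 returns anything for this grep.
    """
    return any(PATCH_ID in line for line in pkgparam_output.splitlines())


# Made-up sample text, for illustration only.
print(looks_like_update2("some value\n106513-NN\n"))  # prints True
print(looks_like_update2("some value\n"))              # prints False
```

On a live host the text could instead be captured with `subprocess.run(["pkgparam", "SUNWosafw"], capture_output=True, text=True).stdout`.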
2. VERIFY with data that the firmware version is appropriate for the application.
Bad Patch: if you see these firmware levels, read the READMEs and proceed
accordingly.
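Comparing a reported firmware level against a recommended minimum is easy to get wrong by eye, so here is a minimal Python sketch of a numeric comparison. The function names are hypothetical. The values 2.05.06 and 3.00.00 are the ones that appear in the rmlog.log example later in this guide. This does not replace the firmware-to-Raid Manager table; it only compares two dotted version strings.

```python
def parse_version(text):
    """Split a dotted firmware version such as '2.05.06' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def firmware_ok(current, recommended_minimum):
    """Return True if `current` is at or above `recommended_minimum`."""
    return parse_version(current) >= parse_version(recommended_minimum)


print(firmware_ok("2.05.06", "3.00.00"))  # prints False
print(firmware_ok("3.00.00", "3.00.00"))  # prints True
```

Converting each field to an integer avoids the trap of comparing the strings directly, where leading zeros and field widths can give the wrong order.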
3. Collect a module profile: start the RM6 GUI, go to Configuration > File > Save Module
Profile, and check nvsram. If using 6.22 update1, the nvsram, listed as “Product
Revision”, should be 0003; if not, the upgrade is incomplete and this needs to be fixed.
Host: sykes
Controllers:
Number of Drives = 6
Host-side ID (dec): 5
Drives:
LUN   Controller   Capacity (MB)   RAID Level
0     c4t5d0       34389           5
Hot Spare - - 1 - -
1 1 5 5 34389 0
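The nvsram check from step 3 can be run against a saved module profile. This is a minimal Python sketch under a stated assumption: that the saved profile contains lines with the label "Product Revision" followed by the value, as in the hypothetical sample below. The exact layout of a real profile may differ, so adjust the pattern to match what you see. The expected value 0003 is the one this guide gives for 6.22 update1.

```python
import re

EXPECTED_NVSRAM = "0003"  # value this guide gives for RM 6.22 update1


def product_revisions(profile_text):
    """Return every value that follows a 'Product Revision' label."""
    # Assumed line shape: 'Product Revision' + punctuation/space + value.
    return re.findall(r"Product Revision\W*(\w+)", profile_text)


def nvsram_upgrade_complete(profile_text):
    """True only if at least one revision is found and all match the expected value."""
    revisions = product_revisions(profile_text)
    return bool(revisions) and all(r == EXPECTED_NVSRAM for r in revisions)


# Hypothetical two-controller excerpt: one controller was not upgraded.
sample = """\
Product Revision: 0003
Product Revision: 0002
"""

print(product_revisions(sample))        # prints ['0003', '0002']
print(nvsram_upgrade_complete(sample))  # prints False
```

Requiring every controller to match catches the case where only one controller of a dual-controller unit took the nvsram update.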
4. Collect explorer data to confirm patch levels against Infodoc 43483 and check
messages. In the explorer under /disks/sonoma you will find the output of many Raid
Manager commands, which can be used for troubleshooting.
Also, in the disks/sonoma/usr_lib_osa directory you will find the rmlog.log files.
THESE ARE CRUCIAL. With these files you can see the errors that Raid Manager has
been reporting, using a simple tool, rmlogscanmsg. REVIEW THE ERROR
MESSAGES. Frequently there will be more than one of these files, the older ones
having been renamed; get those also.
./rmlogscanmsg
USAGE: @(#)rmlogscan 2.0
rmlogscanmsg -l [-m] [-f logfile] | -f logfile | -m
-l run on live system, defaults to read /usr/lib/osa/rmlog.log
-f logfile - rmlog.log filename
-f is required for offline checks
if -f and -l are used together, the -f logfile takes precedence
-m message log format, default is /var/adm/messages
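Because renamed copies of rmlog.log are easy to miss, a small helper can list every candidate in an unpacked explorer before you run the scan tool. This is a minimal Python sketch; the function name is hypothetical, and the `*rmlog*` pattern is an assumption about how renamed copies are named, so widen it if the site uses a different convention. The directory path is the one given in this guide.

```python
from pathlib import Path


def find_rmlogs(explorer_root):
    """Return all files with 'rmlog' in their name under disks/sonoma/usr_lib_osa."""
    log_dir = Path(explorer_root) / "disks" / "sonoma" / "usr_lib_osa"
    if not log_dir.is_dir():
        return []  # explorer has no Raid Manager log directory
    return sorted(p for p in log_dir.glob("*rmlog*") if p.is_file())


# 'explorer_output' is a hypothetical path to an unpacked explorer.
# Print one offline scan command per log file found (nothing is executed).
for path in find_rmlogs("explorer_output"):
    print(f"./rmlogscanmsg -f {path}")
```

The printed commands use the -f form from the usage text above, which is the one required for offline checks.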
The following example is from an escalation in which multiple controllers had been
replaced. The rmlog.log showed the real problem all along: a failed environmental card in a
D1000. It is VERY IMPORTANT to read the logs.
./rmlogscanmsg -f rmlog.log
==========================================================================
==========================================================================
07/03/2002 17:42:43 c3t5d0s0 1T93600641 Message text: Successfully formatted logical unit 4 at RAID 5
with a capacity of 68938 MB.
11/15/2002 03:22:12 c3t5d3s0 1T93600641, 3, Optimal, Unit Attention
ASC/ASCQ A0 00 Write-Back Cache (With Mirroring) Could Not Be Enabled
11/15/2002 03:22:12 c3t5d4s0 1T93600641, 4, Optimal, Unit Attention
ASC/ASCQ A0 00 Write-Back Cache (With Mirroring) Could Not Be Enabled
11/15/2002 03:22:11 c3t5d2s0 1T93600641, 2, Optimal, Unit Attention
ASC/ASCQ A0 00 Write-Back Cache (With Mirroring) Could Not Be Enabled
11/15/2002 03:22:12 c4t4d0s0 1T10400808, 0, Optimal, Unit Attention
ASC/ASCQ A0 00 Write-Back Cache (With Mirroring) Could Not Be Enabled
11/15/2002 03:22:12 c4t4d1s0 1T10400808, 1, Optimal, Unit Attention
ASC/ASCQ A0 00 Write-Back Cache (With Mirroring) Could Not Be Enabled
02/08/2003 05:37:06 c3t5d2 1T93600641 Message text: The controller firmware version for controller
c3t5d2 (module: aurora_001) is 2.05.06. The recommended firmware version for this application is
3.00.00 or higher.\n
02/08/2003 05:37:06 c4t4d0 1T10400808 Message text: The controller firmware version for controller
c4t4d0 (module: aurora_001) is 2.05.06. The recommended firmware version for this application is
3.00.00 or higher.\n
02/08/2003 05:52:09 c3t5d2 1T93600641, 2, Optimal, Unit Attention
ASC/ASCQ 3F C7 Non-Media Component Failure
Sub-enclosure Group Power Supply , Failed
END
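In long scan output the one event that matters can be buried under repeated messages. The following minimal Python sketch tallies the ASC/ASCQ lines so that rare codes stand out. It assumes the line shape seen in the output above ("ASC/ASCQ", two hex bytes, then the description); the function name is hypothetical, and the sample is a shortened excerpt of that output.

```python
import re
from collections import Counter

# Assumed line shape, taken from the scan output above.
ASC_LINE = re.compile(r"^ASC/ASCQ\s+([0-9A-Fa-f]{2})\s+([0-9A-Fa-f]{2})\s+(.+)$")


def tally_asc(scan_output):
    """Count how often each (ASC/ASCQ code, description) pair appears."""
    counts = Counter()
    for line in scan_output.splitlines():
        match = ASC_LINE.match(line.strip())
        if match:
            asc, ascq, text = match.groups()
            counts[(f"{asc} {ascq}".upper(), text.strip())] += 1
    return counts


sample = """\
11/15/2002 03:22:12 c3t5d3s0 1T93600641, 3, Optimal, Unit Attention
ASC/ASCQ A0 00 Write-Back Cache (With Mirroring) Could Not Be Enabled
11/15/2002 03:22:12 c3t5d4s0 1T93600641, 4, Optimal, Unit Attention
ASC/ASCQ A0 00 Write-Back Cache (With Mirroring) Could Not Be Enabled
02/08/2003 05:52:09 c3t5d2 1T93600641, 2, Optimal, Unit Attention
ASC/ASCQ 3F C7 Non-Media Component Failure
"""

for (code, text), count in tally_asc(sample).most_common():
    print(count, code, text)
```

For this excerpt the tally shows two A0 00 cache messages and a single 3F C7 Non-Media Component Failure, which is the line worth following up.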
Rev. 41a 01/28/2004
Inquiries and especially corrections to chris.coffin@sun.com su: Sonoma Trouble Guide