
Cluster Administration and

Troubleshooting Guide

Version Number (1.0)


January 5, 2004
Author: Fabian SIRACH
Microsoft Services, France

GLOBE ISIT OASIS2


 2001 Nestec Ltd. – GLOBE – Global Business Excellence. http://veviis01.nestec.ch/GLOBE/ GL-GLOBE


Proprietary document not to be divulged outside the Company.
Printed by Nestec Ltd., CH-1800 Vevey, Switzerland.

Document Control

Document Owner: Fabian SIRACH
Review Cycle in months: 12

Date of update (dd/mm/yyyy)   Updated by (author name)   Changes Made (section numbers and description)   Version #   Status of document
05/01/2004                    Fabian SIRACH              Draft Creation                                    1.0         Draft
03/02/2004                    Fabian SIRACH              Revision                                          1.1         Final

Table of Contents

DOCUMENT CONTROL..........................................................................................................2

TABLE OF CONTENTS...........................................................................................................3

1 PREFACE...............................................................................................................................6
1.1 DOCUMENT AUDIENCE............................................................................................................6
1.2 PREREQUISITES AND RELATED DOCUMENTATION............................................................................6
2 CLUSTER ADMINISTRATION...............................................................................................7
2.1 RESOURCES MANAGEMENT......................................................................................................7
2.2 GROUP MANAGEMENT............................................................................................................8
2.3 NODE MANAGEMENT............................................................................................................10
2.4 APPLYING SERVICE PACK AND HOTFIX.....................................................................................13
2.5 CHKDSK AND AUTOCHK.........................................................................................................13
2.6 CLUSTER COMMAND LINE.....................................................................................................14
3 TROUBLESHOOTING.........................................................................................................15
3.1 ONE NODE IS DOWN............................................................................................................15
3.2 ENTIRE CLUSTER IS DOWN....................................................................................................17
3.3 ONE OR MORE SERVERS QUIT RESPONDING.............................................................................17
3.4 CLUSTER SERVICE DOES NOT START........................................................................17
3.5 CLUSTER SERVICE STARTS BUT CLUSTER ADMINISTRATOR WILL NOT CONNECT...................................18
3.6 CLUSTER ADMINISTRATOR STOPS RESPONDING ON FAILOVER.......................................................19
3.7 GROUP/RESOURCES FAILOVER PROBLEMS................................................................................20
3.8 QUORUM RESOURCES FAILURE..............................................................................................23
3.9 NETWORK NAME RESOURCE DOES NOT GO ONLINE..................................................................25
3.10 PHYSICAL DISK RESOURCE PROBLEM....................................................................................25
3.11 CLIENT CONNECTIVITY PROBLEM...........................................................................................26
3.11.1 Clients have intermittent Connectivity Based on Group Ownership......................26
3.11.2 Clients do not Have any Connection with the Cluster............................................27
3.11.3 Clients have Problems Accessing Data Through a File Share..............................27
3.11.4 Clients Experience Intermittent Access....................................................28
4 APPENDIX: MSCS EVENT MESSAGES............................................................................29
4.1 EVENT ID 1000................................................................................................................29
4.2 EVENT ID 1002................................................................................................................29
4.3 EVENT ID 1006................................................................................................................30
4.4 EVENT ID 1007................................................................................................................30
4.5 EVENT ID 1009................................................................................................................30
4.6 EVENT ID 1010................................................................................................................31
4.7 EVENT ID 1011.................................................................................................................31
4.8 EVENT ID 1012................................................................................................................31
4.9 EVENT ID 1015................................................................................31

4.10 EVENT ID 1016..............................................................................................................32


4.11 EVENT ID 1019...............................................................................................................32
4.12 EVENT ID 1021..............................................................................................................32
4.13 EVENT ID 1022..............................................................................................................33
4.14 EVENT ID 1023..............................................................................................................33
4.15 EVENT ID 1024..............................................................................................................33
4.16 EVENT ID 1034..............................................................................................................33
4.17 EVENT ID 1035..............................................................................................................34
4.18 EVENT ID 1036..............................................................................................................34
4.19 EVENT ID 1037..............................................................................................................35
4.20 EVENT ID 1038..............................................................................................................35
4.21 EVENT ID 1040..............................................................................................................35
4.22 EVENT ID 1041..............................................................................................................36
4.23 EVENT ID 1042..............................................................................................................36
4.24 EVENT ID 1043..............................................................................................................36
4.25 EVENT ID 1044..............................................................................................................37
4.26 EVENT ID 1045..............................................................................................................37
4.27 EVENT ID 1046..............................................................................................................37
4.28 EVENT ID 1047..............................................................................................................38
4.29 EVENT ID 1048..............................................................................................................38
4.30 EVENT ID 1049..............................................................................................................38
4.31 EVENT ID 1050..............................................................................................................39
4.32 EVENT ID 1051..............................................................................................................39
4.33 EVENT ID 1052..............................................................................................................39
4.34 EVENT ID 1053..............................................................................................................40
4.35 EVENT ID 1054..............................................................................................................40
4.36 EVENT ID 1055..............................................................................................................40
4.37 EVENT ID 1056..............................................................................................................41
4.38 EVENT ID 1057..............................................................................................................41
4.39 EVENT ID 1058..............................................................................................................41
4.40 EVENT ID 1059..............................................................................................................42
4.41 EVENT ID 1061..............................................................................................................42
4.42 EVENT ID 1062..............................................................................................................42
4.43 EVENT ID 1063..............................................................................................................42
4.44 EVENT ID 1064..............................................................................................................43
4.45 EVENT ID 1065..............................................................................................................43
4.46 EVENT ID 1066..............................................................................................................43
4.47 EVENT ID 1067..............................................................................................................44
4.48 EVENT ID 1068..............................................................................................................44
4.49 EVENT ID 1069..............................................................................................................44
4.50 EVENT ID 1070..............................................................................................................45
4.51 EVENT ID 1071..............................................................................................................45
4.52 EVENT ID 1073..............................................................................................................45
4.53 EVENT ID 1077..............................................................................................................45
4.54 EVENT ID 1080..............................................................................................................45
4.55 EVENT ID 1093..............................................................................46

4.56 EVENT ID 1096..............................................................................................................46


4.57 EVENT ID 1097..............................................................................................................46
4.58 EVENT ID 1098..............................................................................................................46
4.59 EVENT ID 1100...............................................................................................................47
4.60 EVENT ID 1102...............................................................................................................47
4.61 EVENT ID 1104...............................................................................................................47
4.62 EVENT ID 1105...............................................................................................................47
4.63 EVENT ID 1107...............................................................................................................48
4.64 EVENT ID 1109...............................................................................................................48
4.65 EVENT ID 1115...............................................................................................................48
5 APPENDIX: RELATED EVENT MESSAGES......................................................................49
5.1 EVENT ID 9......................................................................................................................49
5.2 EVENT ID 101..................................................................................................................49
5.3 EVENT ID 1004................................................................................................................50
5.4 EVENT ID 1005................................................................................................................50
5.5 EVENT ID 2511.................................................................................................................50
5.6 EVENT ID 4199................................................................................................................50
5.7 EVENT ID 5719................................................................................................................51
5.8 EVENT ID 7000................................................................................................................51
5.9 EVENT ID 7013................................................................................................................51
5.10 EVENT ID 7023..............................................................................................................52
6 APPENDIX: MAINTENANCE TOOLS.................................................................................53
6.1 WINDOWS 2000 TOOLS........................................................................................................53
6.2 WINDOWS 2000 RESOURCE KIT TOOLS...................................................................................54
7 APPENDIX: USING AND READING THE CLUSTER LOGFILE........................................55
7.1 CLUSTERLOG ENVIRONMENT VARIABLE..............................................................................55
7.2 OPERATING SYSTEM VERSION NUMBER AND SERVICE PACK LEVEL..................................................55
7.3 CLUSTER SERVICE STARTUP..................................................................................................55
7.4 LOGFILE ENTRIES FOR COMMON FAILURES................................................................................59
8 APPENDIX: Q258078 CLUSTER SERVICE STARTUP OPTIONS....................................64

1 Preface
This document describes the general operations required for administering and
troubleshooting the Windows 2000 Cluster Service for the NESTLE Enterprise Portal on the
Windows 2000 operating system.

1.1 Document Audience


Windows 2000 System Administrators

1.2 Prerequisites and Related Documentation


Good knowledge of Microsoft Windows 2000 Advanced Server
Good knowledge of Microsoft Windows 2000 Cluster Service
W2K_GEO_SERVICE_CHECK Procedure
W2K_GEO_SERVICE_FAILOVER Procedure
W2K_GEO_SERVICE_SHUTDOWN Procedure
W2K_GEO_SERVICE_TAKEOVER Procedure
W2K_Cluster Server_Manual Takeover Procedure
W2K_Operating System_Restart Service Procedure
W2K_Server Status_Check Patch Level Procedure
W2K_Software_Apply Hotfix Procedure
Operation Runbook W2K GeoCluster Document
Operation Runbook W2K Document


2 Cluster Administration

2.1 Resources Management


Bring a resource online:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, click the Resources folder.
3. In the details pane, click the resource you want.
4. On the File menu, click Bring Online.

Refer to
W2K_GEO_SERVICE_FAILOVER Procedure
W2K_GEO_SERVICE_TAKEOVER Procedure

Take a resource offline:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, click the Resources folder.
3. In the details pane, click the resource you want.
4. On the File menu, click Take Offline.

Remark: taking a resource offline causes all resources that depend on that resource to be
taken offline.

Refer to
W2K_GEO_SERVICE_FAILOVER Procedure
W2K_GEO_SERVICE_TAKEOVER Procedure
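
The same operations can also be performed with the cluster.exe command line described in
section 2.6. A minimal sketch, assuming a cluster named NESTCLUS01 and a resource named
"Disk E:" (both hypothetical names; use the names shown in Cluster Administrator):

    rem Check the current state of the resource
    cluster NESTCLUS01 resource "Disk E:" /status
    rem Bring the resource online
    cluster NESTCLUS01 resource "Disk E:" /online
    rem Take the resource offline (dependent resources are taken offline as well)
    cluster NESTCLUS01 resource "Disk E:" /offline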


2.2 Group Management


Bring a group online:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, double-click the Groups folder.
3. In the details pane, click the group you want.
4. On the File menu, click Bring Online.

Take a group offline:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, double-click the Groups folder.
3. In the details pane, click the group you want.
4. On the File menu, click Take Offline.
Remark: resources in a group go offline in the order of their dependencies.

Move a group to another node:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, double-click the Groups folder.
3. In the details pane, click the group you want.
4. On the File menu, click Move Group.
Remark: after the transfer, the new node owns all resources in the group, as the Owner
column in the details pane should reflect.


Specify preferred owners of a group:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, click the Groups folder.
3. In the details pane, click the group you want.
4. On the File menu, click Properties.
5. On the General tab, next to Preferred owners, click Modify.
6. In the Modify Preferred Owners dialog box, enter any changes you want to make:
1. To add one or more preferred owners, under Available nodes (not Preferred
owners), click the nodes you want to add, and then click the right arrow.
2. To remove a preferred owner, under Preferred owners, click the nodes you
want to remove, and then click the left arrow.
3. To change the priority of a preferred owner, click the node, and then click the up
or down arrow.

Set group failover policy:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, click the Groups folder.
3. In the details pane, click the group you want.
4. On the File menu, click Properties.
5. On the Failover tab, type values for Threshold and Period.
Remark: the failover policy for a group is the maximum number of times (Threshold) that
the group is allowed to fail over in the specified number of hours (Period) before it is
taken completely offline. If a group fails over more often than this, the Cluster service
leaves it offline.


Set group failback policy:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, click the Groups folder.
3. In the details pane, click the group you want.
4. On the File menu, click Properties.
5. On the Failback tab, click Prevent failback or Allow failback.
If you click Allow failback, then either click Immediately, or click Failback between
and set the time interval.
Remark: to set the time interval for Failback between, enter numbers between 0 and 23
for the beginning and end of the interval. If the first number is greater than the second, the
interval will end on the following day. The numbers correspond to the local time of the
cluster group, as read on a 24-hour clock.
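
For reference, the group operations above can also be performed or verified with
cluster.exe (see section 2.6). A sketch, assuming a cluster named NESTCLUS01, a node
named NODE2 and a group named "SAP Group" (hypothetical names); verify the exact
property names with /prop on your own cluster:

    rem List all groups and their current owner node
    cluster NESTCLUS01 group /status
    rem Move a group to the other node
    cluster NESTCLUS01 group "SAP Group" /moveto:NODE2
    rem Display the group properties, including FailoverThreshold, FailoverPeriod
    rem and the failback window settings
    cluster NESTCLUS01 group "SAP Group" /prop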

2.3 Node Management


Stopping the Cluster service:
1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, click the node.
3. On the File menu, click Stop Cluster Service.
Remark: when you stop the Cluster service on a node, you prevent clients from accessing
cluster resources through that node. When you stop the Cluster service on a node, all
groups move to the other node (if the failover policies allow it).
Refer to
W2K_GEO_SERVICE_CHECK Procedure
W2K_GEO_SERVICE_FAILOVER Procedure
W2K_GEO_SERVICE_SHUTDOWN Procedure

Starting the Cluster service:


1. Open Cluster Administrator (click Start, point to Programs, point to Administrative
Tools, and then click Cluster Administrator).
2. In the console tree, click the node.
3. On the File menu, click Start Cluster Service.
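
The Cluster service can also be stopped and started from a command prompt on the node
itself; the service short name is ClusSvc. A sketch (NESTCLUS01 is a hypothetical cluster
name):

    rem Stop the Cluster service on the local node
    net stop clussvc
    rem Start the Cluster service on the local node
    net start clussvc
    rem Verify the state of both nodes afterwards
    cluster NESTCLUS01 node /status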

Installing Cluster Service when the other node is online:

1. Open Add/Remove Programs (click Start, point to Settings, click Control Panel, and
then double-click Add/Remove Programs),
2. Click Add/Remove Windows Components,
3. The Welcome to the Windows Components wizard will begin.
4. In Components, select Cluster Service.
5. Click Next.
6. Cluster Service files are located on the Windows 2000 Advanced Server CD-ROM.
Enter Z:\i386.
7. Click OK.
8. Click Next.
9. Click I Understand to accept the condition that Cluster Service is supported on
hardware from the Hardware Compatibility List only.
10. In the Create or Join a Cluster dialog, select The second or next node in the cluster,
and click Next.
11. Enter the cluster name and click Next.
12. Leave Connect to cluster as unchecked. The Cluster Service Configuration wizard
will automatically supply the name of the user account selected during the installation
of the first node. Always use the same account as you used when setting up the first
cluster node.
13. Enter the password for the account and click Next.
14. At the next dialog box, click Finish to complete configuration.
15. The Cluster Service will start. Click OK.
16. Close Add/Remove Programs.


To validate the cluster installation, do the following:

1. Click Start, click Programs, click Administrative Tools, and click Cluster
Administrator.
2. The presence of two nodes shows that a cluster exists and is in operation.

Removing Cluster Service:

1. Open Add/Remove Programs (click Start, point to Settings, click Control Panel, and
then double-click Add/Remove Programs),
2. Click Add/Remove Windows Components,
3. The Welcome to the Windows Components wizard will begin.
4. Click Next,
5. In Components, click to clear Cluster Service, and then click Next.

Enable diagnostic Logging:

1. Open Control Panel (click Start, point to Settings, and then click Control Panel),
2. Double click System,
3. On the Advanced tab, click Environment Variables.
4. Under System variables, click New.
5. In Variable Name, specify the name of the variable. In Variable Value, specify the
name of the diagnostic log file.
For example, set Variable Name to Clusterlog and Variable Value to
C:\Temp\Cluster.log.
6. Click OK, and then click OK again. Close Control Panel.
7. Stop and restart the Cluster service.
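
To confirm that the variable is set and to restart the service from a command prompt, a
short sketch (the path C:\Temp\Cluster.log is only the example value used above):

    rem Display the value of the Clusterlog variable in a new command prompt
    set clusterlog
    rem Restart the Cluster service so that the variable is picked up;
    rem if the service still does not see it, a reboot may be required
    net stop clussvc
    net start clussvc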

Disable diagnostic Logging:


1. Open Control Panel (click Start, point to Settings, and then click Control Panel),
2. Double click System,
3. On the Advanced tab, click Environment Variables.
4. Under System variables, select Clusterlog.
5. Click Delete, click OK, and then click OK again. Close Control Panel.
6. Stop and restart the Cluster service.

2.4 Applying Service Pack and Hotfix


Refer to document “GLOBE ISIT OASIS2 Applying Hotfixes”

2.5 Chkdsk and Autochk


Disks that are attached to the shared bus interact differently with Chkdsk.Exe than with
Autochk.Exe. Autochk.Exe, the system startup version of Chkdsk.Exe, does not perform file
system checks on shared drives when the system starts, even if the operations are required.
The Cluster service performs a file system integrity test for each drive when it brings a
physical disk online. The cluster automatically starts Chkdsk if it is necessary.
If you have to run Chkdsk on a drive, refer to the following articles in the Microsoft
Knowledge Base:

• 174617 CHKDSK Runs While Running Microsoft Cluster Server Setup


• 176970 How to Run the CHKDSK /F Command on a Shared Cluster Disk


2.6 Cluster Command Line


Using the Cluster tool:

With this tool, you can perform every operation that can be done with Cluster Administrator.

To verify the node states, type: cluster <cluster name> node
To verify the resource states, type: cluster <cluster name> resource
To verify the cluster network states, type: cluster <cluster name> network
To move a group, type: CLUSTER <cluster name> GROUP "<cluster group>" /MOVETO:<node name>
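
A short worked example, assuming a cluster named NESTCLUS01 with nodes NODE1 and NODE2
and a group named "SAP Group" (all hypothetical names):

    rem State of both nodes
    cluster NESTCLUS01 node
    rem State of all resources
    cluster NESTCLUS01 resource
    rem State of the cluster networks
    cluster NESTCLUS01 network
    rem Move the group "SAP Group" to NODE2
    cluster NESTCLUS01 group "SAP Group" /moveto:NODE2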


3 Troubleshooting

3.1 One node is Down


General Information:

Before troubleshooting, if a single node is unavailable, make sure that resources and
groups are available on the other node.

If a node is online, gather information about the failure:

1. Check the event logs on the online node (the meaning of the event log messages is given in
the Appendix).
2. Check the cluster diagnostic logfile.
3. Check for the existence of a recent Memory.dmp file that may have been created from a
recent crash. If necessary, contact Microsoft Product Support Services for assistance with
this file.
4. Go to the paragraph corresponding to the failure.

Symptoms and solutions:

Symptom: Second node cannot join the cluster.

Cause: You may not be using the proper cluster name, node name, or IP address.
Solution: Confirm that you are using the proper cluster name, node name, or IP address.

Cause: The Cluster Name resource may not have started.
Solution: Confirm that the Cluster Name resource started.

Cause: The Cluster service may not be running on the first node.
Solution: Confirm that the Cluster service is running on the first node and that all
resources within the Cluster Group are online before installing the second node.

Cause: Network connectivity may not exist between the two nodes.
Solution: Confirm that network connectivity exists between the two nodes.

Cause: You may not have IP connectivity to the cluster address.
Solution: Confirm that you have IP connectivity to the cluster address and that the IP
address is assigned to the correct network.


Symptom: Second node cannot connect to the cluster drives.

Cause: The same drive letters may not have been assigned to the cluster drives on all nodes.
Solution: Confirm that the cluster drives are assigned the same drive letters on all nodes.

Cause: The SCSI devices may not have unique IDs.
Solution: Verify that each SCSI device has a unique ID. SCSI controller IDs are preset to
seven. Reset one SCSI controller ID to six.

Cause: The second node may not be physically connected to the cluster drive.
Solution: Confirm that the second node is physically connected to the cluster drive. If it
is not, shut down both nodes and the cluster drive. Connect the nodes to the shared SCSI
bus. Then, start the cluster drive and start the first node. After the Cluster service
starts on the first node, start the second node, and attempt to connect to the cluster
drive.

Cause: The SCSI controllers on the shared SCSI bus may not be correctly configured.
Solution: Confirm that the SCSI controllers on the shared SCSI bus are correctly
configured (with both cards configured to transfer data at the same rate).

Cause: The devices and controllers may not match.
Solution: Confirm that your devices and controllers match.


3.2 Entire Cluster is Down


Before troubleshooting, try to bring at least one node online. If you can achieve this
goal, the effect on users may be substantially reduced.

If a node is online or one server could be started, gather information about the failure:

1. Check the event logs on the online node (the meaning of the event log messages is given in
the Appendix).
2. Check the cluster diagnostic logfile.
3. Check for the existence of a recent Memory.dmp file that may have been created from
a recent crash. If necessary, contact Microsoft Product Support Services for assistance
with this file.
4. Go to the paragraph corresponding to the identified failure.

If no server can be restarted, as a last resort, restore both nodes (this procedure is defined
in the Disaster Recovery Guide).

3.3 One or More Servers Quit Responding


If one or more servers are not responding but have not crashed or otherwise failed, the
problem may be related to:

• Domain controller connectivity: check network connectivity with the domain controllers
and for other network problems (use the ping command with the domain controller name and
IP address).
• Configuration
• Software
• Driver issues
• Fiber Channel connectivity
• Connected disk devices

For these problems, go to the corresponding chapter.

3.4 Cluster Service Does Not Start


1. Check the event log messages and look at Appendix.
2. Determine if the issue comes from the service account used for Cluster Service.

Information: failures related to the service account may result in Event ID 7000 or
Event ID 7013 errors in the event log. In addition, you may receive the following error
message:


"Could not start the Cluster Service on \\computername. Error 1069: The
service did not start because of a logon failure."

 Make sure the account is not disabled and that password expiration is not a
factor.
 This domain account needs to be a member of the local administrators group on
each server.
 The account needs the Logon as a service and Lock pages in memory rights.
 Make sure the password specified for the Cluster Service account is correct.
(Retype it and click on the apply button, try to restart the service).

3. Check to make sure the quorum disk is online and that the Fiber Channel has
proper termination and proper function.

Information: if the quorum disk is not accessible during startup, the following error
message may occur:
"Could not start the Cluster Service on \\computername. Error 0021: The device
is not ready."

If the Cluster Service is running on the other cluster node, check the cluster
logfile on that system for indications of whether or not the other node attempted to join
the cluster. If the cluster node did try to join the cluster, and the request was denied,
the logfile may contain details of the event. For example, if you evict a node from the
cluster, but do not remove and reinstall MSCS on that node, when the server attempts
to join the cluster, the request to join will be denied.

To start a cluster node if the Cluster service does not start and no cluster.log file exists,
use the -debug option:

1. Open a command prompt (click Start, click Run, type CMD, and press ENTER).
2. Change to the %systemroot%\cluster directory (cd \winnt\cluster).
3. Type CLUSSVC -debug.

The debug information is then sent to the console.

To stop the service, press CTRL+C.
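
For example, assuming the default %systemroot% of C:\WINNT (as in step 2 above):

    rem Change to the cluster directory and start the service in debug mode
    cd /d C:\WINNT\cluster
    clussvc -debug
    rem Output appears in this console window; press CTRL+C to stop the service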

3.5 Cluster Service Starts but Cluster Administrator will not connect
If the Services utility in Administrative Tools indicates that the service is running,
and you cannot connect with Cluster Administrator to administer the cluster, the
problem may be related to:

• the Cluster Network Name resource,
• the cluster IP address resources, or
• RPC-related problems.

1. Check to make sure the RPC Service is running on both nodes.


2. If it is, try to connect to a known running cluster node by the computer name. If
running Cluster Administrator on the local node, you may specify a period (.) in place
of the name when prompted. This will create a local connection and will not require
name resolution.
3. If you can connect through the computer name or using the period, check the
cluster network name and cluster IP address resources. Make sure that these
and other resources in the cluster group are online. These resources may fail if a
duplicate name or IP address on the network conflicts with either of these resources.
4. A duplicate IP address on the network may cause the network adapter to shut down.
Check the system event log for errors.
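
A few quick command-line checks corresponding to the steps above, run on one of the
cluster nodes (a sketch; cluster.exe defaults to the cluster of the local node when no
cluster name is given):

    rem Confirm that the RPC service and the Cluster service are started
    net start | find /i "Remote Procedure Call"
    net start | find /i "Cluster Service"
    rem Check the state of the cluster IP address and network name resources
    cluster resource /status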

3.6 Cluster Administrator Stops Responding On Failover


The Cluster Administrator application uses RPC communication to connect with the cluster. If
you use the cluster name to establish the connection, Cluster Administrator may appear to
stop responding during a failover of the Cluster group and its resources. This ordinary delay
occurs during the registration of the IP address and network name resources in the group
and the establishment of a new RPC connection. If a problem occurs with the registration of
these resources, the process may take an extended amount of time before these resources
become available.
The first RPC connection must time out before the application tries to establish another
connection. As a result, Cluster Administrator may eventually time out if problems occur when
the IP address or network name resources are brought online in the Cluster group. In this
situation, try to connect by using the computer name of one of the cluster nodes instead of
the cluster name. Doing so typically allows a more real-time display of resource and group
transitions without delay.


3.7 Group/Resources Failover Problems


General Information:

The typical reason that a group does not fail over properly is a problem with one or more
resources within the group. For example, if you elect to move a group from one node to
another, the resources within the group will be taken offline, and ownership of the group will
be transferred to the other node. On receiving ownership, the node will attempt to bring
resources online, according to dependencies defined for the resources. If resources fail to go
online, MSCS attempts again to bring them online. After repeated failures, the failing resource
or resources may affect the group and cause the group to transition back to the previous
node. Eventually, if failures continue, the group or affected resources may be taken offline.
You can configure the number of attempts and allowed failures through resource and group
properties.

1. When you experience problems with group or resource failover, evaluate which
resource or resources may be failing. Determine why the resource won't go
online.
2. Check resource dependencies for proper configuration and make sure they
are available.
3. Also, make sure that the "Possible Owners" list includes both nodes.
4. If resource properties do not appear to be part of the problem, check the event log
or cluster logfile for details.

Information: the "Preferred Owners" list is designed for automatic failback or initial group
placement within the cluster. In a two-node cluster, this list should only contain the name of
the preferred node for the group, and should not contain multiple entries.
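
To check these settings quickly from the command line, a sketch such as the following may
help (NESTCLUS01, "Disk E:" and "SAP Group" are hypothetical names; see also section 2.6):

    rem List the nodes that are possible owners of a resource
    cluster NESTCLUS01 resource "Disk E:" /listowners
    rem Display the resource properties (restart policy, thresholds, and so on)
    cluster NESTCLUS01 resource "Disk E:" /prop
    rem Display the group properties, including the failover and failback settings
    cluster NESTCLUS01 group "SAP Group" /prop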

To move a group from one node to another:

1. You must have administrative rights to run Cluster Administrator.


2. The destination node must be online and the cluster service started.
3. The state of the node must be online and not Paused.
4. Both cluster nodes should be listed in the Possible Owners list for the
resources within the group.
5. Also, to move a group, resources within the group cannot be in a pending state. To
initiate a Move Group request, resources must be in one of the following three
states: online, offline, or failed.


Symptoms and causes:

Symptom: A resource fails, but is not brought back online.

Cause: A resource may depend on another resource that has failed.
Solution: In the resource Properties dialog box, make sure that Do not restart is clear.
If the resource needs another resource in order to function, and if the second resource
fails, confirm that the dependencies are correctly configured.

Symptom: You cannot bring a resource online.

Cause: The resource is not properly installed.
Solution: Make sure the application or service associated with the resource is properly
installed.

Cause: The resource is not properly configured.
Solution: Make sure the properties are set correctly for the resource.

Symptom: You cannot bring the default physical disk resource online in Cluster
Administrator.

Cause: You may not have restarted the servers after installing the Cluster service.
Solution: Make sure that you restarted all servers after installing the Cluster service.

Cause: There may be hardware errors or transport problems.
Solution: Make sure that there are no hardware errors or transport problems. Using Event
Viewer, look in the event log for disk I/O error messages or indications of problems with
the communications transport.

Cause: One or more SCSI adapters on the shared SCSI bus are configured incorrectly.
Solution: Make sure that the SCSI adapters are configured correctly.

Cause: The shared SCSI bus exceeds the maximum cable length.
Solution: Make sure that the shared SCSI bus does not exceed the maximum cable length.

Cause: The shared SCSI bus is improperly terminated.
Solution: Make sure that the shared SCSI bus is properly terminated.

Cause: The disk is not supported.
Solution: Make sure that the disk hardware or firmware revision level is not outdated.

Cause: Duplicate SCSI IDs have been specified on the shared SCSI bus.
Solution: Verify that each SCSI device has a unique ID. SCSI controller IDs are preset to
seven. Reset one SCSI controller ID to six.


Cause: If you move your SCSI bus adapter to another I/O slot, add or remove bus adapters,
or install a new version of the bus adapter driver, the cluster software may not be able
to access disks on your shared SCSI bus.
Solution: In order to accommodate these changes, make sure that your shared SCSI bus
adapter has been properly reconfigured.

Cause: Windows 2000 is incorrectly configured to access the shared SCSI bus.
Solution: Verify that Windows 2000 can detect the shared SCSI bus adapter and that the
SCSI IDs for the adapter and disks are listed (open Control Panel and double-click SCSI
Adapters).

Cause: A resource in the group may be continually failing.
Solution: Determine if a resource in the group is continually failing. If the node can,
it will bring the resource back up without failing over the group. If the resource
continually fails but does not fail over, make sure that the resource property Restart
and affect the group is selected. Also, check the Restart Threshold and Restart Period
settings, which are also in the resource Properties dialog box.

Symptom: A group failed over but did not fail back.

Cause: The failback policies of both the group and the resources may not be properly
configured.
Solution: Make sure that the Prevent failback check box is clear in the group Properties
dialog box. If the Allow failback check box is selected, be sure to wait long enough for
the group to fail back. Check these settings for all affected resources within a group.
Because groups fail over as a whole, one resource that is prevented from failing back
affects the entire group.

Cause: The node to which you want the group to fail back is not configured as the
preferred owner of the group.
Solution: Make sure that the node to which you want the group to fail back is configured
as the preferred owner of the group. If not, the Cluster service leaves the group on the
node to which it failed over.

Symptom: The entire group failed and has not restarted.

Cause: A node is offline.
Solution: Make sure that the node is not offline.


Symptom: The group has failed repeatedly.

Cause: The group may have exceeded its failover threshold or its failover period.
Solution: Try to bring the resources online individually (following the correct sequence
of dependencies) to determine which resource is causing the problem. Or, create a
temporary resource group (for testing purposes) and move the resources to it, one at a
time.

3.8 Quorum Resources Failure


Symptoms and solutions:

Symptom: Quorum resource does not start.

Cause: The resource is not physically connected to the server.
Solution: Make sure that the resource is physically connected to the server.

Cause: The devices are not properly terminated.
Solution: Make sure that the devices are properly terminated.

Cause: The problem is with the hardware configuration.
Solution: Turn off the SCSI devices and check the SCSI IDs of the devices. Make sure that
the IDs are not both set to 7 (the default).

Symptom: Quorum resource fails.

Cause: The disk on the shared bus holding the quorum resource has failed.
Solution: If the disk on the shared bus holding the quorum resource fails and cannot be
brought online, the Cluster service cannot start. To correct this situation, use the
fixquorum option. For more information, read the paragraph below.

Symptom: Quorum log becomes corrupted.

Cause: This may occur for a variety of reasons.
Solution: If the quorum log is corrupted, the Cluster service attempts to correct the
problem by resetting the log file. In this case, the Cluster service writes the following
message in the Windows 2000 system log:
"The log file [name] was found to be corrupt. An attempt will be made to reset it."
If the quorum log cannot be reset, the Cluster service cannot start.
If the Cluster service fails to detect that the quorum log is corrupted, the Cluster
service may fail to start. In this case, there may be an
"ERROR_CLUSTERLOG_CORRUPT" message in the system log. To correct this, you must use the
noquorumlogging option. For more information, read the paragraph below.

If the Cluster service won't start because of a quorum disk failure, check the
corresponding device. A quorum access problem is usually caused by connectivity or
authentication issues. If this is not the case, execute the procedure below.

This operation is complex and has a great impact on the cluster, so it is recommended to
perform it with the assistance of Microsoft Product Support Services.

Recover a failed quorum disk:

You can check the status of the quorum device by starting the service with the -fixquorum
switch, and attempt to bring the quorum disk online, or change the quorum location for the
service.
When you use the fixquorum option to start the Cluster service, only cluster name and
cluster IP resources are set online. To recover a failed quorum resource:
1. Start the Cluster service with the -fixquorum option on a single node:
a. Start the Services snap-in (click Start, point to Programs, click Administrative
Tools, and then click Services).
b. Right-click the Cluster service and select Properties.
c. In the Start Parameters box, type: -fixquorum
d. Then press the Start button.
2. Use Cluster Administrator to configure the Cluster Service to use a different disk on the
shared bus for the quorum resource.
3. To view or change the quorum drive settings, right-click the cluster name at the top of the
tree, listed on the left portion of the Cluster Administrator window, and select Properties.
4. The Cluster Properties window contains three different tabs, one of which is for the
quorum disk,
5. From this tab, you may view or change quorum disk settings. You may also redesignate
the quorum resource.
Recover a failed quorum log:

This operation is complex and has a great impact on the cluster, so it is recommended to
perform it with the assistance of Microsoft Product Support Services.


When this occurs, a message similar to the following is logged in the event log:


The log file D:\MSCS\quolog.log was found to be corrupt. An attempt will be
made to reset it, or you should use the Cluster Administrator utility to
adjust the maximum size.

Remark: If the error message occurs after you restore the system state on a computer
that has lost the quorum log, the quorum information is copied to
%SystemRoot%\Cluster\Cluster_backup. You can then use the Clusrest.exe tool from the
Resource Kit to restore this information to the quorum disk.

If you have a backup of the system state on one of the computers after the last
changes were made to the cluster, you can restore the quorum by restoring this
information.

If you do not have a backup of the quorum log file, create a new quorum log file based
on the cluster configuration information in the local system's cluster hive by starting the
Cluster service with the ResetQuorumLog switch:
1. Start the Services snap-in (click Start, point to Programs, click Administrative Tools,
and then click Services).
2. Right-click and select the properties of the Cluster Service.
3. In the Start Parameters box, type: /ResetQuorumLog
4. Then press the Start button.
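
For reference, the same startup switches can also be passed to clussvc.exe on a console,
as in the -debug procedure in section 3.4; a sketch, to be checked against the startup
options listed in appendix 8 (Q258078) and to be used only with the assistance of
Microsoft Product Support Services:

    rem Start one node with a temporary quorum so that the quorum disk can be changed
    clussvc -debug -fixquorum
    rem Recreate the quorum log from the cluster hive of the local node
    clussvc -debug -resetquorumlog
    rem Start without quorum logging when the quorum log is corrupt
    clussvc -debug -noquorumlogging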

3.9 Network Name Resource Does Not Go Online


There are various causes that can prevent a network name resource from going online. Many
causes may be clearly indicated in the system event log. Potential causes may include:

• A duplicate name on the network from an offending computer.
• Static WINS entries for the network name.
• A malfunctioning switch or router.
• An incorrect TCP/IP configuration for one or more network adapters.
• An incorrect setting for the RequireDNS property.

3.10 Physical Disk Resource Problem


Problems with physical disk resources are usually hardware related. Cables, termination, or
SCSI host adapter configuration may cause problems with failover, or may cause premature
failure of the resource. The system event log may often show events related to physical disk
or controller problems. However, some cable or termination problems may not yield such
helpful information.


1. It is important to verify the configuration of the Fiber Channel and attached devices
whenever you detect trouble with one of these devices.
2. BIOS or firmware problems might also be factors.

3.11 Client Connectivity Problem

3.11.1 Clients have intermittent Connectivity Based on Group Ownership


If clients successfully connect to clustered resources only when a specific node is the owner,
a few possible problems could lead to this condition. To define the problem more precisely:

1. Check the system event log on each server for possible errors.
2. Check to make sure that the group has at least one IP address resource and one
network name resource,
3. Check that clients use one of these to access the resource or resources within
the group. If clients connect with any other network name or IP address, they may not
be accessing the correct server in the event that ownership of the resources changes.
As a result of improper addressing, access to these resources may appear limited to a
particular node.
4. If you are able to confirm that clients use proper addressing for the resource or
resources, check the IP address and network name resources to see that they are
online.
5. Check network connectivity with the server that owns the resources. For example,
try some of the following techniques:
From the server
PING server's primary adapter IP address (on client network)
PING IP address of the group
PING Network Name of the group
PING Router/Gateway between client and server (if any)
PING Client IP address
If the above tests work correctly up to the router/gateway check, the problem may be
elsewhere on the network because you have connectivity with the other server and
local addresses. If tests complete up to the client IP address test, there may be a client
configuration or routing problem.

From the client:


PING Client IP address
PING Router/Gateway between client and server (if any)
PING server's primary adapter IP address (on client network)
PING IP address of the group
PING Network Name of the group


If the tests from the server all pass, but you experience failures performing tests from
the client, there may be client configuration problems. If all tests complete except the
test using the network name of the group, there may be a name resolution problem.
This may be related to client configuration, or it may be a problem with the client's
designated DNS server.
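
The client-side tests above can be collected in a small batch file so that they always
run in the same order; a sketch with placeholder addresses and names (replace them with
the real values for the group being tested):

    rem check_group.cmd - connectivity tests run from the client
    rem 1. Client IP address
    ping 10.1.1.20
    rem 2. Router/gateway between client and server (if any)
    ping 10.1.1.1
    rem 3. Server's primary adapter IP address (on the client network)
    ping 10.1.2.10
    rem 4. IP address of the group
    ping 10.1.2.50
    rem 5. Network name of the group
    ping SAPGRP01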

3.11.2 Clients do not Have any Connection with the Cluster


If clients lose connectivity with both cluster nodes:
1. Check to make sure that the Cluster Service is running on each node.
2. Check the system event log for possible errors.
3. Check network connectivity between cluster nodes, and with other network devices,
by using the procedure in the previous section.
4. If the Cluster Service is running, and there are no apparent connectivity problems
between the two servers, there is likely a network or client configuration problem that
does not directly involve the cluster: check to make sure the client uses the TCP/IP
protocol and has a valid IP address on the network.
5. Make sure that the client is using the correct network name or IP address to access
the cluster.

3.11.3 Clients have Problems Accessing Data Through a File Share


If clients experience problems accessing cluster file shares:
1. Check the resource and make sure it is online, and that any dependent resources
(disks, network names, and so on) are online,
2. Check the system event log for possible errors,
3. Check network connectivity between the client and the server that owns the resource.
4. If the data for the share is on a shared drive (using a physical disk resource), make
sure that the file share resource has a dependency declared for the physical disk
resource.
5. You can reset the file share by toggling the file share resource offline and back online
again.
6. Cluster file shares behave essentially the same as standard file shares. So, make
sure that clients have appropriate access at both the file system level and the share
level.
7. Make sure that the server has the proper number of client access licenses loaded for
the clients connecting, in the event that the client cannot connect because of
insufficient available connections.


3.11.4 Clients Experience Intermittent Access


Network adapter configuration is one possible cause of intermittent access to the cluster,
and of premature failover:

1. Some autosense settings for network speed can spontaneously redetect the network speed.
During the detection, network traffic through the adapter may be compromised. For best
results, set the network speed manually to avoid the recalibration.
2. Make sure to use the correct network adapter drivers. Some adapters may require
special drivers, although they may be detected as a similar device.


4 Appendix: MSCS Event Messages

4.1 Event ID 1000


Source: ClusSvc

Description: Microsoft Cluster Server suffered an unexpected fatal error at line ### of
source module %path%. The error code was 1006.

Problem: Messages similar to this may occur in the event of a fatal error that may cause
the Cluster Service to terminate on the node that experienced the error.

Solution: Check the system event log and the cluster diagnostic logfile for additional
information. It is possible that the cluster service may restart itself after the error.
This event message may indicate serious problems that may be related to hardware or other
causes.

4.2 Event ID 1002


Source: ClusSvc

Description: Microsoft Cluster Server handled an unexpected error at line 528 of source
module G:\Nt\Private\Cluster\Resmon\Rmapi.c. The error code was 5007.

Problem: Messages similar to this may occur after installation of Microsoft Cluster
Server. If the cluster service starts and successfully forms or joins the cluster, they
may be ignored. Otherwise, these errors may indicate a corrupt quorum logfile or other
problem.

Solution: Ignore the error if the cluster appears to be working properly. Otherwise, you
may want to try creating a new quorum logfile using the -noquorumlogging or -fixquorum
parameters as documented in the Microsoft Cluster Server Administrator's Guide.


4.3 Event ID 1006


Source: ClusSvc

Description: Microsoft Cluster Server was halted because of a cluster membership or
communications error. The error code was 4.

Problem: An error may have occurred between communicating cluster nodes that affected
cluster membership. This error may occur if nodes lose the ability to communicate with
each other.

Solution: Check network adapters and connections between nodes. Check the system event
log for errors. There may be a network problem preventing reliable communication between
cluster nodes.

4.4 Event ID 1007


Source: ClusSvc

Description: A new node, "ComputerName", has been added to the cluster.

Information: The Microsoft Cluster Server Setup program ran on an adjacent computer. The
setup process completed, and the node was admitted for cluster membership. No action
required.

4.5 Event ID 1009


Source ClusSvc

Description Microsoft Cluster Server could not join an existing cluster


and could not form a new cluster. Microsoft Cluster Server
has terminated.

Problem The cluster service started and attempted to join a cluster.


The node may not be a member of an existing cluster
because of eviction by an administrator. After a cluster node
has been evicted from the cluster, the cluster software must
be removed and reinstalled if you want it to rejoin the
cluster. And, because a cluster already exists with the same
cluster name, the node could not form a new cluster with
the same name.

Solution Remove MSCS from the affected node, and reinstall MSCS
on that system if desired.


4.6 Event ID 1010


Source ClusSvc

Description Microsoft Cluster Server is shutting down because the


current node is not a member of any cluster. Microsoft
Cluster Server must be reinstalled to make this node a
member of a cluster.

Problem The cluster service attempted to run but found that it is not
a member of an existing cluster. This may be due to eviction
by an administrator or incomplete attempt to join a cluster.
This error indicates a need to remove and reinstall the
cluster software.

Solution Remove MSCS from the affected node, and reinstall MSCS
on that server if desired.

4.7 Event ID 1011


Source ClusSvc

Description Cluster Node "ComputerName" has been evicted from the


cluster.

Information A cluster administrator evicted the specified node from the


cluster.

4.8 Event ID 1012


Source ClusSvc

Description Microsoft Cluster Server did not start because the current
version of Windows is not correct.

4.9 Event ID 1015


Source ClusSvc

Description No checkpoint record was found in the logfile


W:\Mscs\Quolog.log; the checkpoint file is invalid or was
deleted.

Problem The Cluster Service experienced difficulty reading data from


the quorum logfile. The logfile could be corrupted.

Solution If the Cluster Service fails to start because of this problem,


try manually starting the cluster service with the
-noquorumlogging parameter. If you need to adjust the
quorum disk designation, use the -fixquorum startup
parameter when starting the cluster service. Both of these
parameters are covered in the MSCS Administrator's Guide.


4.10 Event ID 1016


Source ClusSvc

Description Microsoft Cluster Server failed to obtain a checkpoint from


the cluster database for log file W:\Mscs\Quolog.log.

Problem The cluster service experienced difficulty establishing a


checkpoint for the quorum logfile. The logfile could be
corrupt, or there may be a disk problem.

Solution You may need to use procedures to recover from a corrupt


quorum logfile. You may also need to run chkdsk on the
volume to ensure against file system corruption.

4.11 Event ID 1019


Source ClusSvc

Description The log file D:\MSCS\Quolog.log was found to be corrupt.


An attempt will be made to reset it, or you should use the
Cluster Administrator utility to adjust the maximum size.

Problem The quorum logfile for the cluster was found to be corrupt.
The system will attempt to resolve the problem.

Solution The system will attempt to resolve this problem. This error
may also be an indication that the cluster property for
maximum size should be increased through the Quorum
tab. You can manually resolve this problem by using the
-noquorumlogging parameter.

4.12 Event ID 1021


Source ClusSvc

Description There is insufficient disk space remaining on the quorum


device. Please free up some space on the quorum device. If
there is no space on the disk for the quorum log files then
changes to the cluster registry will be prevented.

Problem Available disk space is low on the quorum disk and must be
resolved.

Solution Remove data or unnecessary files from the quorum disk so


that sufficient free space exists for the cluster to operate. If
necessary, designate another disk with adequate free space
as the quorum device.


4.13 Event ID 1022


Source ClusSvc

Description There is insufficient space left on the quorum device. The


Microsoft Cluster Server cannot start.

Problem Available disk space is low on the quorum disk and is


preventing the startup of the cluster service.

Solution Remove data or unnecessary files from the quorum disk so


that sufficient free space exists for the cluster to operate. If
necessary, use the -fixquorum startup option to start one
node. Bring the quorum resource online and adjust free
space or designate another disk with adequate free space as
the quorum device.

4.14 Event ID 1023


Source ClusSvc

Description The quorum resource was not found. The Microsoft Cluster
Server has terminated.

Problem The device designated as the quorum resource could not be


found. This could be due to the device having failed at the
hardware level, or that the disk resource corresponding to
the quorum drive letter does not match or no longer exists.

Solution Use the -fixquorum startup option for the cluster service.
Investigate and resolve the problem with the quorum disk.
If necessary, designate another disk as the quorum device
and restart the cluster service before starting other nodes.

4.15 Event ID 1024


Source ClusSvc

Description The registry checkpoint for cluster resource "resourcename"


could not be restored to registry key registrykeyname.
The resource may not function correctly. Make sure that no
other processes have open handles to registry keys in this
registry subkey.

Problem The registry key checkpoint imposed by the cluster service


failed because an application or process has an open handle
to the registry key or subkey.

Solution Close any applications that may have an open handle to the
registry key so that it may be replicated as configured with
the resource properties. If necessary, contact the application
vendor about this problem.

4.16 Event ID 1034


Source ClusSvc

Description The disk associated with cluster disk resource resource


name could not be found. The expected signature of the
disk was signature. If the disk was removed from the
cluster, the resource should be deleted. If the disk was
replaced, the resource must be deleted and created again to
bring the disk online. If the disk has not been removed or
replaced, it may be inaccessible at this time because it is reserved by another cluster node.

Problem The cluster service attempted to mount a physical disk


resource in the cluster. The cluster disk driver could not
locate a disk with this signature. The disk may be offline or
may have failed. This error may also occur if the drive has
been replaced or reformatted. This error may also occur if
another system continues to hold a reservation for the disk.

Solution Determine why the disk is offline or nonoperational. Check


cables, termination, and power for the device. If the drive
has failed, replace the drive and restore the resource to the
same group as the old drive. Remove the old resource.
Restore data from a backup and adjust resource
dependencies within the group to point to the new disk
resource.

4.17 Event ID 1035


Source ClusSvc

Description Cluster disk resource %1 could not be mounted.

Problem The cluster service attempted to mount a disk resource in


the cluster and could not complete the operation. This could
be due to a file system problem, hardware issue, or drive
letter conflict.

Solution Check for drive letter conflicts, evidence of file system


issues in the system event log, and for hardware problems.

4.18 Event ID 1036


Source ClusSvc

Description Cluster disk resource "resourcename" did not respond to a


SCSI inquiry command.

Problem The disk did not respond to the issued SCSI command. This
usually indicates a hardware problem.

Solution Check SCSI bus configuration. Check the configuration of


SCSI adapters and devices. This may indicate a
misconfigured or failing device.


4.19 Event ID 1037


Source ClusSvc

Description Cluster disk resource %1 has failed a filesystem check.


Please check your disk configuration.

Problem The cluster service attempted to mount a disk resource in


the cluster. A filesystem check was necessary and failed
during the process.

Solution Check cables, termination, and device configuration. If the


drive has failed, replace the drive and restore data. This
may also indicate a need to reformat the partition and
restore data from a current backup.

4.20 Event ID 1038


Source ClusSvc

Description Reservation of cluster disk "Disk W:" has been lost. Please
check your system and disk configuration.

Problem The cluster service had exclusive use of the disk, and lost
the reservation of the device on the shared SCSI bus.

Solution The disk may have gone offline or failed. Another node may
have taken control of the disk, or a SCSI bus reset
command was issued on the bus that caused a loss of
reservation.

4.21 Event ID 1040


Source ClusSvc

Description Cluster generic service "ServiceName" could not be found.

Problem The cluster service attempted to bring the specified generic


service resource online. The service could not be located
and could not be managed by the Cluster Service.

Solution Remove the generic service resource if this service is no


longer installed. The parameters for the resource may be
invalid. Check the generic service resource properties and
confirm correct configuration.


4.22 Event ID 1041


Source ClusSvc

Description Cluster generic service "ServiceName" could not be started.

Problem The cluster service attempted to bring the specified generic


service resource online. The service could not be started at
the operating system level.

Solution Remove the generic service resource if this service is no


longer installed. The parameters for the resource may be
invalid. Check the generic service resource properties and
confirm correct configuration. Check to make sure the
service account has not expired, that it has the correct
password, and has necessary rights for the service to start.
Check the system event log for any related errors.

4.23 Event ID 1042


Source ClusSvc

Description Cluster generic service "resourcename" failed.

Problem The service associated with the mentioned generic service


resource failed.

Solution Check the generic service properties and service


configuration for errors. Check system and application event
logs for errors.

4.24 Event ID 1043


Source ClusSvc

Description The NetBIOS interface for "IP Address" resource has failed.

Problem The network adapter for the specified IP address resource


has experienced a failure. As a result, the IP address is
either offline, or the group has moved to a surviving node in
the cluster.

Solution Check the network adapter and network connection for


problems. Resolve the network-related problem.


4.25 Event ID 1044


Source ClusSvc

Description Cluster IP Address resource %1 could not create the


required NetBios interface.

Problem The cluster service attempted to initialize an IP Address


resource and could not establish a context with NetBios.

Solution This could be a network adapter- or network adapter driver-


related issue. Make sure the adapter is using a current
driver and the correct driver for the adapter. If this is an
embedded adapter, check with the OEM to determine if a
specific OEM version of the driver is a requirement. If you
already have many IP Address resources defined, make sure
you have not reached the NetBios limit of 64 addresses. If
you have IP Address resources defined that do not have a
need for NetBios affiliation, use the IP Address private
property to disable NetBios for the address. This option is
available in SP4 and helps to conserve NetBios address
slots.
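
As an illustration only, the NetBios affiliation of an IP Address resource can be inspected and
changed with CLUSTER.EXE; the resource name below is a placeholder, and the EnableNetBIOS property
name is an assumption that should be confirmed by listing the private properties first:

   cluster res "My IP Address" /priv
   cluster res "My IP Address" /priv EnableNetBIOS=0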

4.26 Event ID 1045


Source ClusSvc

Description Cluster IP address "IP address" could not create the


required TCP/IP interface.

Problem The cluster service tried to bring an IP address online. The


resource properties may specify an invalid network or
malfunctioning adapter. This error may occur if you replace
a network adapter with a different model and continue to
use the old or inappropriate driver. As a result, the IP
address resource cannot be bound to the specified network.

Solution Resolve the network adapter problem or change the


properties of the IP address resource to reflect the proper
network for the resource.

4.27 Event ID 1046


Source ClusSvc

Description Cluster IP Address resource %1 cannot be brought online


because the subnet mask parameter is invalid. Please check
your network configuration.

Problem The cluster service tried to bring an IP address resource


online but could not do so. The subnet mask for the
resource is either blank or otherwise invalid.

Solution Correct the subnet mask for the resource.


4.28 Event ID 1047


Source ClusSvc

Description Cluster IP Address resource %1 cannot be brought online


because the IP address parameter is invalid. Please check
your network configuration.

Problem The cluster service tried to bring an IP address resource


online but could not do so. The IP address property contains
an invalid value. This may be caused by incorrectly creating
the resource through an API or the command line interface.

Solution Correct the IP address properties for the resource.

4.29 Event ID 1048


Source ClusSvc

Description Cluster IP address, "IP address," cannot be brought online


because the specified adapter name is invalid.

Problem The cluster service tried to bring an IP address online. The


resource properties may specify an invalid network or a
malfunctioning adapter. This error may occur if you replace
a network adapter with a different model. As a result, the IP
address resource cannot be bound to the specified network.

Solution Resolve the network adapter problem or change the


properties of the IP address resource to reflect the proper
network for the resource.

4.30 Event ID 1049


Source ClusSvc

Description Cluster IP address "IP address" cannot be brought online


because the address IP address is already present on the
network. Please check your network configuration.

Problem The cluster service tried to bring an IP address online. The


address is already in use on the network and cannot be
registered. Therefore, the resource cannot be brought
online.

Solution Resolve the IP address conflict, or choose another address


for the resource.


4.31 Event ID 1050


Source ClusSvc

Description Cluster Network Name resource %1 cannot be brought


online because the name %2 is already present on the
network. Please check your network configuration.

Problem The cluster service tried to bring a Network Name resource


online. The name is already in use on the network and
cannot be registered. Therefore, the resource cannot be
brought online.

Solution Resolve the conflict, or choose another network name.

4.32 Event ID 1051


Source ClusSvc

Description Cluster Network Name resource "resourcename" cannot be


brought online because it does not depend on an IP address
resource. Please add an IP address dependency.

Problem The cluster service attempted to bring the network name


resource online, and found that a required dependency was
missing.

Solution Microsoft Cluster Server requires an IP address dependency


for network name resource types. Cluster Administrator
presents a pop-up message if you attempt to remove this
dependency without specifying another like dependency. To
resolve this error, replace the IP address dependency for
this resource. Because it is difficult to remove this
dependency, Event 1051 may be an indication of problems
within the cluster registry. Check other resources for
possible dependency problems.
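
For example, the missing dependency could be re-created with CLUSTER.EXE (the resource names are
placeholders; take the Network Name resource offline before changing its dependencies):

   cluster res "Cluster Name" /adddep:"Cluster IP Address"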

4.33 Event ID 1052


Source ClusSvc

Description Cluster Network Name resource "resourcename" cannot be


brought online because the name could not be added to the
system.

Problem The cluster service attempted to bring the network name


resource online but the attempt failed.

Solution Check the system event log for errors. Check network
adapter configuration and operation. Check TCP/IP
configuration and name resolution methods. Check DNS
servers for possible database problems or invalid static
mappings.


4.34 Event ID 1053


Source ClusSvc

Description Cluster File Share "resourcename" cannot be brought online


because the share could not be created.

Problem The cluster service attempted to bring the share online, but
the attempt to create the share failed.

Solution Make sure the Server service is started and functioning


properly. Check the path for the share. Check ownership and
permissions on the directory. Check the system event log for
details. Also, if diagnostic logging is enabled, check the log
for an entry related to this failure. Use the net helpmsg
errornumber command with the error code found in the log
entry.
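
For example, if the diagnostic log records status code 5, the following command translates it to
"Access is denied":

   net helpmsg 5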

4.35 Event ID 1054


Source ClusSvc

Description Cluster File Share %1 could not be found.

Problem The share corresponding to the named File Share resource


was deleted using a mechanism other than Cluster
Administrator. This may occur if you select the share with
Explorer and choose 'Not Shared'.

Solution Delete shares or take them offline via Cluster Administrator


or the command line program CLUSTER.EXE.
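
For example, a clustered share can be taken offline from the command line (the resource name is a
placeholder for your File Share resource):

   cluster resource "My File Share" /offline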

4.36 Event ID 1055


Source ClusSvc

Description Cluster File Share "sharename" has failed a status check.

Problem The cluster service (through resource monitors) periodically


monitors the status of cluster resources. In this case, a file
share failed a status check. This could mean that someone
attempted to delete the share through Windows NT Explorer
or Server Manager, instead of through Cluster Administrator.
This event could also indicate a problem with the Server
service, or access to the shared directory.

Solution Check the system event log for errors. Check the cluster
diagnostic log (if it is enabled) for status codes that may be
related to this event. Check the resource properties for
proper configuration. Also, make sure the file share has
proper dependencies defined for related resources.


4.37 Event ID 1056


Source ClusSvc

Description The cluster database on the local node is in an invalid state.


Please start another node before starting this node.

Problem The cluster database on the local node may be in a default


state from the installation process and the node has not
properly joined with an existing node.

Solution Make sure another node of the same cluster is online first
before starting this node. Upon joining with another cluster
node, the node will receive an updated copy of the official
cluster database, which should resolve this error.

4.38 Event ID 1057


Source ClusSvc

Description The cluster service CLUSDB could not be opened.

Problem The Cluster Service tried to open the CLUSDB registry hive
and could not do so. As a result, the cluster service cannot
be brought online.

Solution Check the cluster installation directory for the existence of a


file called CLUSDB. Make sure the registry file is not held
open by any applications, and that permissions on the file
allow the cluster service access to this file and directory.

4.39 Event ID 1058


Source ClusSvc

Description The Cluster Resource Monitor could not load the DLL %1 for
resource type %2.

Problem The Cluster Service tried to bring a resource online that


requires a specific resource DLL for the resource type. The
DLL is either missing, corrupt, or an incompatible version.
As a result, the resource cannot be brought online.

Solution Check the cluster installation directory for the existence of


the named resource DLL. Make sure the DLL exists in the
proper directory on both nodes.


4.40 Event ID 1059


Source ClusSvc

Description The Cluster Resource DLL %1 for resource type %2 failed to


initialize.

Problem The Cluster Service tried to load the named resource DLL
and it failed to initialize. The DLL could be corrupt, or an
incompatible version. As a result, the resource cannot be
brought online.

Solution Check the cluster installation directory for the existence of


the named resource DLL. Make sure the DLL exists in the
proper directory on both nodes and is of proper version. If
the DLL is clusres.dll, this is the default resource DLL that
comes with MSCS. Check to make sure the version/date
stamp is equivalent to or has a later date than the version
contained in the service pack in use.

4.41 Event ID 1061


Source ClusSvc

Description Microsoft Cluster Server successfully formed a cluster on


this node.

Information This informational message indicates that an existing cluster


of the same name was not detected on the network, and
that this node elected to form the cluster and own access to
the quorum disk.

4.42 Event ID 1062


Source ClusSvc

Description Microsoft Cluster Server successfully joined the cluster.

Information When the Cluster Service started, it detected an existing


cluster on the network and was able to successfully join the
cluster. No action needed.

4.43 Event ID 1063


Source ClusSvc

Description Microsoft Cluster Server was successfully stopped.

Information The Cluster Service was stopped manually by the


administrator.


4.44 Event ID 1064


Source ClusSvc

Description The quorum resource was changed. The old quorum


resource could not be marked as obsolete. If there is a
partition in time, you may lose changes to your database,
because the node that is down will not be able to get to the
new quorum resource.

Problem The administrator changed the quorum disk designation


without all cluster nodes present.

Solution When other cluster nodes attempt to join the existing


cluster, they may not be able to connect to the quorum disk
and may not participate in the cluster, because their
configuration indicates a different quorum device. For any
nodes that meet this criterion, you may need to use the
-fixquorum option to start the Cluster Service on these
nodes and make configuration changes.

4.45 Event ID 1065


Source ClusSvc

Description Cluster resource %1 failed to come online.

Problem The cluster service attempted to bring the resource online,


but the resource could not reach an online status. The
resource may have exhausted the timeout period allotted for
the resource to reach an online state.

Solution Check any parameters related to the resource and check the
event log for details.

4.46 Event ID 1066


Source ClusSvc

Description Cluster disk resource resourcename is corrupted. Running


Chkdsk /F to repair problems.

Problem The Cluster Service detected corruption on the indicated


disk resource and started Chkdsk /f on the volume to
repair the structure. The Cluster Service will automatically
perform this operation, but only for cluster-defined disk
resources (not local disks).

Solution Scan the event log for additional errors. The disk corruption
could be indicative of other problems. Check related
hardware and devices on the shared bus and ensure proper
cables and termination. This error may be a symptom of
failing hardware or a deteriorating drive.


4.47 Event ID 1067


Source ClusSvc

Description Cluster disk resource %1 has corrupt files. Running Chkdsk


/F to repair problems.

Problem The Cluster Service detected corruption on the indicated


disk resource and started Chkdsk /f on the volume to
repair the structure. The Cluster Service will automatically
perform this operation, but only for cluster-defined disk
resources (not local disks).

Solution Scan the event log for additional errors. The disk corruption
could be indicative of other problems. Check related
hardware and devices on the shared bus and ensure proper
cables and termination. This error may be a symptom of
failing hardware or a deteriorating drive.

4.48 Event ID 1068


Source ClusSvc

Description The cluster file share resource resourcename failed to start.


Error 5.

Problem The file share cannot be brought online. The problem may
be caused by permissions to the directory or disk in which
the directory resides. This may also be related to permission
problems within the domain.

Solution Check to make sure that the Cluster Service account has
rights to the directory to be shared. Make sure a domain
controller is accessible on the network. Make sure
dependencies for the share and for other resources in the
group are set correctly. Error 5 translates to "Access
Denied."

4.49 Event ID 1069


Source ClusSvc

Description Cluster resource "Disk G:" failed.

Problem The named resource failed and the cluster service logged
the event. In this example, a disk resource failed.

Solution For disk resources, check the device for proper operation.
Check cables, termination, and logfiles on both cluster
nodes. For other resources, check resource properties for
proper configuration, and check to make sure dependencies
are configured correctly. Check the diagnostic log (if it is
enabled) for status codes corresponding to the failure.


4.50 Event ID 1070


Source ClusSvc

Description Cluster node attempted to join the cluster but failed with
error 5052.

Problem The cluster node attempted to join an existing cluster but


was unable to complete the process. This problem may
occur if the node was previously evicted from the cluster.

Solution If the node was previously evicted from the cluster, you
must remove and reinstall MSCS on the affected server.

4.51 Event ID 1071


Source ClusSvc

Description Cluster node 2 attempted to join but was refused. Error


5052.

Problem Another node attempted to join the cluster and this node
refused the request.

Solution If the node was previously evicted from the cluster, you
must remove and reinstall MSCS on the affected server.
Look in Cluster Administrator to see if the other node is
listed as a possible cluster member.

4.52 Event ID 1073


Source ClusSvc

Description Microsoft Cluster Server was halted to prevent an


inconsistency within the cluster. The error code was 5028.

Problem The cluster service on the affected node was halted because
of some kind of inconsistency between cluster nodes.

Solution Check connectivity between systems. This error may be an


indication of configuration or hardware problems.

4.53 Event ID 1077


Source ClusSvc

Description The TCP/IP interface for cluster IP address resourcename


has failed.

Problem The IP address resource depends on the proper operation of


a specific network interface as configured in the resource
properties. The network interface failed.

Solution Check the system event log for errors. Check the network
adapter for proper operation and replace the adapter if
necessary. Check to make sure the proper adapter driver is
loaded for the device and check for newer versions of the
driver.

4.54 Event ID 1080


Source ClusSvc

Description The Microsoft Cluster Server could not write file W:\MSCS\Chk7f5.tmp. The disk may be low on disk space, or some other serious condition exists.

Problem The cluster service attempted to create a temporary file in


the MSCS directory on the quorum disk. Lack of disk space
or other factors prevented successful completion of the
operation.

Solution Check the quorum drive for available disk space. The file
system may be corrupted or the device may be failing.
Check file system permissions to ensure that the cluster
service account has full access to the drive and directory.

4.55 Event ID 1093


Source ClusSvc

Description Node %1 is not a member of cluster %2. If the name of the


node has changed, Microsoft Cluster Server must be
reinstalled.

Problem The cluster service attempted to start but found that it was
not a valid member of the cluster.

Solution Microsoft Cluster Server may need to be reinstalled on this


node. If this is the result of a server name change, be sure
to evict the node from the cluster (from an operational
node) prior to reinstallation.
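
For example, the eviction can be performed from an operational node with CLUSTER.EXE (the cluster
and node names are placeholders):

   cluster MYCLUSTER node OLDNODE /evict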

4.56 Event ID 1096


Source ClusSvc

Description Microsoft Cluster Server cannot use network adapter %1


because it does not have a valid IP address assigned to it.

Problem The network configuration for the adapter has changed and
the cluster service cannot make use of the adapter for the
network that was assigned to it.

Solution Check the network configuration. If a DHCP address was


used for the primary address of the adapter, the address
may have been lost. For best results, use a static address.

4.57 Event ID 1097


Source ClusSvc

Description Microsoft Cluster Server did not find any network adapters
with valid IP addresses installed in the system. The node will
not be able to join a cluster.

Problem The network configuration for the system needs to be


corrected to match the same connected networks as the
other node of the cluster.

Solution Check the network configuration and make sure it agrees


with the working node of the cluster. Make sure the same
networks are accessible from all systems in the cluster.

4.58 Event ID 1098


Source ClusSvc

Description The node is no longer attached to cluster network network_id by adapter adapter. Microsoft Cluster Server will delete network interface interface from the cluster configuration.

Information The Cluster Service observed a change in network


configuration that might be induced by a change of adapter
type or by removal of a network. The network will be
removed from the list of available networks.

4.59 Event ID 1100


Source ClusSvc

Description Microsoft Cluster Server discovered that the node is now


attached to cluster network network_id by adapter adapter.
A new cluster network interface will be added to the cluster
configuration.

Information The Cluster Service noticed a new network accessible by the


cluster nodes, and has added the new network to the list of
accessible networks.

4.60 Event ID 1102


Source ClusSvc

Description Microsoft Cluster Server discovered that the node is


attached to a new network by adapter adapter. A new
network and network interface will be added to the cluster
configuration.

Information The cluster service noticed the addition of a new network.


The network will be added to list of available networks.

4.61 Event ID 1104


Source ClusSvc

Description Microsoft Cluster Server failed to update the configuration


for one of the node's network interfaces. The error code was
errorcode.

Problem The cluster service attempted to update a cluster node and


could not perform the operation.

Solution Use the net helpmsg errorcode command to find an


explanation of the underlying error. For example, error 1393
indicates that a corrupted disk caused the operation to fail.

4.62 Event ID 1105


Source ClusSvc

Description Microsoft Cluster Server failed to initialize the RPC services.


The error code was %1.

Problem The cluster service attempted to utilize required RPC


services and could not successfully perform the operation.

Solution Use the net helpmsg errorcode command to find an


explanation of the underlying error. Check the system event
log for other RPC related errors or performance problems.


4.63 Event ID 1107


Source ClusSvc

Description Cluster node node name failed to make a connection to the


node over network network name. The error code was 1715.

Problem The cluster service attempted to connect to another cluster


node over a specific network and could not establish a
connection. This error is a warning message.

Solution Check to make sure that the specified network is available


and functioning correctly. If the node experiences this
problem, it may try other available networks to establish the
desired connection.

4.64 Event ID 1109


Source ClusSvc

Description The node was unable to secure its connection to cluster


node %1. The error code was %2. Check that both nodes
can communicate with their domain controllers.

Problem The cluster service attempted to connect to another cluster


node and could not establish a secure connection. This could
indicate domain connectivity problems.

Solution Check to make sure that the networks are available and
functioning correctly. This may be a symptom of larger
network problems or domain security issues.

4.65 Event ID 1115


Source ClusSvc

Description An unrecoverable error caused the join of node nodename


to the cluster to be aborted. The error code was errorcode.

Problem A node attempted to join the cluster but was unable to


obtain successful membership.

Solution Use the NET HELPMSG errorcode command to obtain


further description of the error that prevented the join
operation. For example, error code 1393 indicates that a
disk structure is corrupted and nonreadable. An error code
like this could indicate a corrupted quorum disk.


5 Appendix: Related Event Messages

5.1 Event ID 9
Source Disk

Description The device \Device\ScsiPort2 did not respond within the


timeout period.

Problem An I/O request was sent to a SCSI device and was not
serviced within acceptable time. The device timeout was
logged by this event.

Solution You may have a device or controller problem. Check SCSI


cables, termination, and adapter configuration. Excessive
recurrence of this event message may indicate a serious
problem that could indicate potential for data loss or
corruption. If necessary, contact your hardware vendor for
help troubleshooting this problem.

5.2 Event ID 101


Source W3SVC

Description The server was unable to add the virtual root "/" for the
directory "path" because of the following error: The system
cannot find the path specified. The data is the error.

Problem The World Wide Web Publishing service could not create a
virtual root for the IIS Virtual Root resource. The directory
path may have been deleted.

Solution Re-create or restore the directory and contents. Check the


resource properties for the IIS Virtual Root resource and
ensure that the path is correct. This problem may occur if
you had an IIS Virtual Root resource defined and then
uninstalled Microsoft Cluster Server without first deleting the
resource. In this case, you may evaluate and change virtual
root properties by using the Internet Service Manager.


5.3 Event ID 1004


Source DHCP

Description DHCP IP address lease "IP address" for the card with
network address "media access control Address" has been
denied.

Problem This system uses a DHCP-assigned IP address for a network


adapter. The system attempted to renew the leased address
and the DHCP server denied the request. The address may
already be allocated to another system. The DHCP server
may also have a problem. Network connectivity may be
affected by this problem.

Solution Resolve the problem by correcting DHCP server problems or


assigning a static IP address. For best results within a
cluster, use statically assigned IP addresses.
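
For example, to confirm which adapters are currently using DHCP-assigned addresses, run the
following on the node and check the "DHCP Enabled" line reported for each adapter:

   ipconfig /all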

5.4 Event ID 1005


Source DHCP

Description DHCP failed to renew a lease for the card with network
address "MAC Address." The following error occurred: The
semaphore timeout period has expired.

Problem This system uses a DHCP assigned IP address for a network


adapter. The system attempted to renew the leased address
and was unable to renew the lease. Network operations on
this system may be affected.

Solution There may be a connectivity problem preventing access to


the DHCP server that leased the address, or the DHCP
server may be offline. For best results within a cluster, use
statically assigned IP addresses.

5.5 Event ID 2511


Source Server

Description The server service was unable to recreate the share


"Sharename" because the directory "path" no longer exists.

Problem The Server service attempted to create a share using the


specified directory path. This problem may occur if you
create a share (outside of Cluster Administrator) on a
cluster shared device. If the device is not exclusively
available to this computer, the server service cannot create
the share. Also, the directory may no longer exist or there
may be RPC related issues.

Solution Correct the problem by creating a shared resource through


Cluster Administrator, or correct the problem with the
missing directory. Check dates of RPC files in the system32
directory. Make sure they concur with those contained in the
service pack in use, or any hotfixes applied.

5.6 Event ID 4199


Source TCPIP

Description The system detected an address conflict for IP address "IP


address" with the system having network hardware address "media access control address." Network operations on this system may be disrupted as a result.

Problem Another system on the network may be using one of the


addresses configured on this computer.

Solution Resolve the IP address conflict. Check network adapter


configuration and any IP address resources defined within
the cluster.

5.7 Event ID 5719


Source Netlogon

Description No Windows NT Domain controller is available for domain


"domain." (This event is expected and can be ignored when
booting with the "No Net" hardware profile.) The following
error occurred: There are currently no logon servers
available to service the logon request.

Problem A domain controller for the domain could not be contacted.


As a result, proper authentication of accounts could not be
completed. This may occur if the network is disconnected or
disabled through system configuration.

Solution Resolve the connectivity problem with the domain controller


and restart the system.

5.8 Event ID 7000


Source Service Control Manager

Description The Cluster Service failed to start because of the following


error: The service did not start because of a logon failure.

Problem The service control manager attempted to start a service


(possibly ClusSvc). It could not authenticate the service
account. This error may be seen with Event 7013.

Solution The service account could not be authenticated. This may be


because of a failure contacting a domain controller, or
because account credentials are invalid. Check the service
account name and password and ensure that the account is
available and that credentials are correct. You may also try
running the cluster service from a command prompt (if
currently logged on as an administrator) by changing to the
%systemroot%\Cluster directory (or where you installed the
software) and typing ClusSvc -debug. If the service starts
and runs correctly, stop it by pressing CTRL+C and
troubleshoot the service account problem. This error may
also occur if network connectivity is disabled through the
system configuration or hardware profile. Microsoft Cluster
Server requires network connectivity.
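
For example, to run the service interactively as described above (adjust the path if the cluster
software is installed elsewhere):

   cd /d %systemroot%\cluster
   clussvc -debug

Press CTRL+C to stop the service when you have finished troubleshooting.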

5.9 Event ID 7013


Source Service Control Manager

Description Logon attempt with current password failed with the


following error: There are currently no logon servers
available to service the logon request.

More Info The description for this error message may vary somewhat
based on the actual error. For example, another error that
may be listed in the event detail might be: "Logon Failure: unknown username or bad password."

Problem The service control manager attempted to start a service


(possibly ClusSvc). It could not authenticate the service
account with a domain controller.

Solution The service account may be in another domain, or this


system is not a domain controller. It is acceptable for the
node to be a nondomain controller, but the node needs
access to a domain controller within the domain as well as
the domain that the service account belongs to. Inability to
contact the domain controller may occur because of a
problem with the server, network, or other factors. This
problem is not related to the cluster software and must be
resolved before you start the cluster software. This error
may also occur if network connectivity is disabled through
the system configuration or hardware profile. Microsoft
Cluster Server requires network connectivity.

5.10 Event ID 7023


Source Service Control Manager

Description The Cluster Server service terminated with the following


error: The quorum log could not be created or mounted
successfully

Problem The Cluster Service attempted to start but could not gain
access to the quorum log on the quorum disk. This may be
because of problems gaining access to the disk or problems
joining a cluster that has already formed.

Solution Check the disk and quorum log for problems. If necessary,
check the cluster logfile for more information. There may be
other events in the system event log that may give more
information.


6 Appendix: Maintenance Tools

6.1 Windows 2000 tools

The tools presented below can be used for troubleshooting:

Disk Management (compmgmt.msc): Determine whether a disk is available to a particular node. If the disk can be selected under Disk Management, it is online to the local system. If the disk object appears dimmed, it is not available for that node.
Services option in Administrative Tools: Verify that the Cluster Service is running.
Windows 2000 Explorer, My Computer, or the Net View command: Verify that a particular share has been exported from the server you expected.
Event Viewer: View and manage the System, Security, and Application event logs.
Dr. Watson: Detect, log, and diagnose application errors.
Task Manager: Monitor applications, tasks, and key performance metrics, and view detailed information on memory and CPU usage for each application and process.
Performance Monitor: Monitor details of application and system behaviours, and monitor performance.
Network Monitor: Monitor and troubleshoot network connectivity by capturing and analyzing network traffic.
Windows Diagnostics (Winmsd.exe): Examine system information on device drivers, network usage, and system resources such as IRQ, DMA, and I/O addresses.
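
For example, to verify with the Net View command that a clustered file share is being exported
under the cluster's network name (substitute your own virtual server or node name):

   net view \\MYCLUSTERNAME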


6.2 Windows 2000 Resource Kit tools

The tools presented below can be used for troubleshooting:

Diskmap: A command-line utility that produces a detailed report on the configuration of the hard disk that you specify. It provides information from the registry about disk characteristics and geometry, reads and displays data about all of the partitions and logical drives defined on the disk, and also shows disk signatures.
Dumpel (Dump Event Log): A command-line utility that dumps an event log for a local or remote system into a tab-separated text file. It can also be used to filter for, or filter out, certain event types.
Filever: A command-line tool that examines the version resource structure of a file or a directory of files on either a local or remote computer, and displays information on the versions of executable files, such as .exe and .dll files.
Getmac: Provides a quick method for obtaining the MAC (Ethernet) layer address and binding order for a computer running Windows 2000, locally or across a network. This can be useful when you want to enter the address into a sniffer, or if you need to know what protocols are currently in use on a computer.
Netcons: A GUI tool that monitors and displays current network connections, taking the place of the Windows command-line command net use.
Clustool: Permits cluster configuration backup and restore.
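
For example, a possible Dumpel invocation to extract Cluster Service entries from the System log
into a text file (check dumpel -? for the switches supported by your Resource Kit version):

   dumpel -l system -m ClusSvc -f clussvc-events.txt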


7 Appendix: Using and Reading the Cluster Logfile

7.1 CLUSTERLOG Environment Variable

If you set the CLUSTERLOG environment variable, the cluster will create a logfile that
contains diagnostic information using the path specified. Important events during the
operation of the Cluster Service will be logged in this file. Because so many different events
occur, the logfile may be somewhat cryptic or hard to read. This document gives some hints
about how to read the logfile and information about what items to look for.

Note: Each time you attempt to start the Cluster Service, the log will be cleared and a new
logfile started. Each component of MSCS that places an entry in the logfile will indicate itself
by abbreviation in square brackets. For example, the Node Manager component would be
abbreviated [NM]. Logfile entries will vary from one cluster to another. As a result, other
logfiles may vary from excerpts referenced in this document.

Note: Log entry lines in the following sections have been wrapped because of space constraints in this
document. The lines do not normally wrap.
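
As a sketch (the path shown is an example only): define CLUSTERLOG as a system environment
variable, for instance CLUSTERLOG=C:\WINNT\Cluster\cluster.log, under Control Panel > System >
Environment, and then restart the Cluster Service so that it picks up the new value:

   net stop clussvc
   net start clussvc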

7.2 Operating System Version Number and Service Pack Level

Near the beginning of the logfile, notice the build number of MSCS, followed by the operating
system version number and service pack level. If you call for support, engineers may ask for
this information:
082::14-21:29:26.625 Cluster Service started - Cluster Version 1.224.
082::14-21:29:26.625 OS Version 4.0.1381 - Service Pack 3.

7.3 Cluster Service Startup

Following the version information, some initialization steps occur. Those steps are followed by
an attempt to join the cluster, if one node already exists in a running state. If the Cluster
Service could not detect any other cluster members, it will attempt to form the cluster.
Consider the following log entries:
0b5::12-20:15:23.531 We’re initing Ep...
0b5::12-20:15:23.531 [DM]: Initialization
0b5::12-20:15:23.531 [DM] DmpRestartFlusher: Entry
0b5::12-20:15:23.531 [DM] DmpStartFlusher: Entry
0b5::12-20:15:23.531 [DM] DmpStartFlusher: thread created
0b5::12-20:15:23.531 [NMINIT] Initializing the Node Manager...
0b5::12-20:15:23.546 [NMINIT] Local node name = NODEA.
0b5::12-20:15:23.546 [NMINIT] Local node ID = 1.
0b5::12-20:15:23.546 [NM] Creating object for node 1 (NODEA)
0b5::12-20:15:23.546 [NM] node 1 state 1
0b5::12-20:15:23.546 [NM] Initializing networks.
0b5::12-20:15:23.546 [NM] Initializing network interface facilities.
0b5::12-20:15:23.546 [NMINIT] Initialization complete.
0b5::12-20:15:23.546 [FM] Starting worker thread...
0b5::12-20:15:23.546 [API] Initializing
0a9::12-20:15:23.546 [FM] Worker thread running
0b5::12-20:15:23.546 [lm] :LmInitialize Entry.
0b5::12-20:15:23.546 [lm] :TimerActInitialize Entry.
0b5::12-20:15:23.546 [CS] Initializing RPC server.
0b5::12-20:15:23.609 [INIT] Attempting to join cluster MDLCLUSTER
0b5::12-20:15:23.609 [JOIN] Spawning thread to connect to sponsor 192.88.80.114
06c::12-20:15:23.609 [JOIN] Asking 192.88.80.114 to sponsor us.
0b5::12-20:15:23.609 [JOIN] Waiting for all connect threads to terminate.
06c::12-20:15:32.750 [JOIN] Sponsor 192.88.80.114 is not available, status=1722.
0b5::12-20:15:32.750 [JOIN] All connect threads have terminated.
0b5::12-20:15:32.750 [JOIN] Unable to connect to any sponsor node.
0b5::12-20:15:32.750 [INIT] Failed to join cluster, status 53
0b5::12-20:15:32.750 [INIT] Attempting to form cluster MDLCLUSTER
0b5::12-20:15:32.750 [Ep]: EpInitPhase1
0b5::12-20:15:32.750 [API] Online read only
04b::12-20:15:32.765 [RM] Main: Initializing.

Note that the cluster service attempts to join the cluster. If it cannot connect with an existing
member, the software decides to form the cluster. The next series of steps attempts to form
groups and resources necessary to accomplish this task. It is important to note that the
cluster service must arbitrate control of the quorum disk.
0b5::12-20:15:32.781 [FM] Creating group a1a13a86-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:32.781 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 contains a1a13a87-
0eaf-11d1-8427-0000f8034599.
0b5::12-20:15:32.781 [FM] Creating resource a1a13a87-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:32.781 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
0000f8034599 possible node list
0b5::12-20:15:32.781 [FMX] Found the quorum resource a1a13a87-0eaf-11d1-8427-0000f8034599.
0b5::12-20:15:32.781 [FM] All dependencies for a1a13a87-0eaf-11d1-8427-0000f8034599 created
0b5::12-20:15:32.781 [FM] arbitrate for quorum resource id a1a13a87-0eaf-11d1-8427-
0000f8034599.
0b5::12-20:15:32.781 FmpRmCreateResource: creating resource a1a13a87-0eaf-11d1-8427-
0000f8034599 in shared resource monitor
0b5::12-20:15:32.812 FmpRmCreateResource: created resource a1a13a87-0eaf-11d1-8427-
0000f8034599, resid 1363016
0dc::12-20:15:32.828 Physical Disk <Disk D:>: Arbitrate returned status 0.
0b5::12-20:15:32.828 [FM] FmGetQuorumResource successful
0b5::12-20:15:32.828 FmpRmOnlineResource: bringing resource a1a13a87-0eaf-11d1-8427-
0000f8034599 (resid 1363016) online.
0b5::12-20:15:32.843 [CP] CppResourceNotify for resource Disk D:
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker waiting type 0 context 8
0b5::12-20:15:32.843 [GUM] Thread 0xb5 UpdateLock wait on Type 0
0b5::12-20:15:32.843 [GUM] DoLockingUpdate successful, lock granted to 1
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker dispatching seq 388 type 0 context 8
0b5::12-20:15:32.843 [GUM] GumpDoUnlockingUpdate releasing lock ownership
0b5::12-20:15:32.843 [GUM] GumSendUpdate: completed update seq 388 type 0 context 8
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker waiting type 0 context 9
0b5::12-20:15:32.843 [GUM] Thread 0xb5 UpdateLock wait on Type 0
0b5::12-20:15:32.843 [GUM] DoLockingUpdate successful, lock granted to 1
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker dispatching seq 389 type 0 context 9
0b5::12-20:15:32.843 [GUM] GumpDoUnlockingUpdate releasing lock ownership
0b5::12-20:15:32.843 [GUM] GumSendUpdate: completed update seq 389 type 0 context 9
0b5::12-20:15:32.843 FmpRmOnlineResource: Resource a1a13a87-0eaf-11d1-8427-0000f8034599
pending
0e1::12-20:15:33.359 Physical Disk <Disk D:>: Online, created registry watcher thread.
090::12-20:15:33.359 [FM] NotifyCallBackRoutine: enqueuing event
04d::12-20:15:33.359 [FM] WorkerThread, processing transition event for a1a13a87-0eaf-11d1-
8427-0000f8034599, oldState = 129, newState = 2.
04d::12-20:15:33.359 [FM] HandleResourceTransition: Resource Name = a1a13a87-0eaf-11d1-
8427-0000f8034599 old state=129 new state=2
04d::12-20:15:33.359 [DM] DmpQuoObjNotifyCb: Quorum resource is online
04d::12-20:15:33.375 [DM] DmpQuoObjNotifyCb: Own quorum resource, try open the quorum log
04d::12-20:15:33.375 [DM] DmpQuoObjNotifyCb: the name of the quorum file is
D:\MSCS\quolog.log
04d::12-20:15:33.375 [lm] LogCreate : Entry FileName=D:\MSCS\quolog.log
MaxFileSize=0x00010000
04d::12-20:15:33.375 [lm] LogpCreate : Entry

In this case, the node forms the cluster group and quorum disk resource, gains control of the
disk, and opens the quorum logfile. From here, the cluster performs operations with the
logfile, and proceeds to form the cluster. This involves configuring network interfaces and
bringing them online.
0b5::12-20:15:33.718 [NM] Beginning form process.
0b5::12-20:15:33.718 [NM] Synchronizing node information.
0b5::12-20:15:33.718 [NM] Creating node objects.
0b5::12-20:15:33.718 [NM] Configuring networks & interfaces.
0b5::12-20:15:33.718 [NM] Synchronizing network information.
0b5::12-20:15:33.718 [NM] Synchronizing interface information.
0b5::12-20:15:33.718 [dm] DmBeginLocalUpdate Entry
0b5::12-20:15:33.718 [dm] DmBeginLocalUpdate Exit, pLocalXsaction=0x00151c20
dwError=0x00000000
0b5::12-20:15:33.718 [NM] Setting database entry for interface a1a13a7f-0eaf-11d1-8427-
0000f8034599
0b5::12-20:15:33.718 [dm] DmCommitLocalUpdate Entry
0b5::12-20:15:33.718 [dm] DmCommitLocalUpdate Exit, dwError=0x00000000
0b5::12-20:15:33.718 [dm] DmBeginLocalUpdate Entry
0b5::12-20:15:33.875 [dm] DmBeginLocalUpdate Exit, pLocalXsaction=0x00151c20
dwError=0x00000000
0b5::12-20:15:33.875 [NM] Setting database entry for interface a1a13a81-0eaf-11d1-8427-
0000f8034599
0b5::12-20:15:33.875 [dm] DmCommitLocalUpdate Entry
0b5::12-20:15:33.875 [dm] DmCommitLocalUpdate Exit, dwError=0x00000000
0b5::12-20:15:33.875 [NM] Matched 2 networks, created 0 new networks.
0b5::12-20:15:33.875 [NM] Resynchronizing network information.
0b5::12-20:15:33.875 [NM] Resynchronizing interface information.
0b5::12-20:15:33.875 [NM] Creating network objects.
0b5::12-20:15:33.875 [NM] Creating object for network a1a13a7e-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:33.875 [NM] Creating object for network a1a13a80-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:33.875 [NM] Creating interface objects.
0b5::12-20:15:33.875 [NM] Creating object for interface a1a13a7f-0eaf-11d1-8427-
0000f8034599.
0b5::12-20:15:33.875 [NM] Registering network a1a13a7e-0eaf-11d1-8427-0000f8034599 with
cluster transport.
0b5::12-20:15:33.875 [NM] Registering interfaces for network a1a13a7e-0eaf-11d1-8427-
0000f8034599 with cluster transport.
0b5::12-20:15:33.875 [NM] Registering interface a1a13a7f-0eaf-11d1-8427-0000f8034599 with
cluster transport, addr 9.9.9.2, endpoint 3003.
0b5::12-20:15:33.890 [NM] Instructing cluster transport to bring network a1a13a7e-0eaf-
11d1-8427-0000f8034599 online.
0b5::12-20:15:33.890 [NM] Creating object for interface a1a13a81-0eaf-11d1-8427-
0000f8034599.
0b5::12-20:15:33.890 [NM] Registering network a1a13a80-0eaf-11d1-8427-0000f8034599 with
cluster transport.
0b5::12-20:15:33.890 [NM] Registering interfaces for network a1a13a80-0eaf-11d1-8427-
0000f8034599 with cluster transport.
0b5::12-20:15:33.890 [NM] Registering interface a1a13a81-0eaf-11d1-8427-0000f8034599 with
cluster transport, addr 192.88.80.190, endpoint 3003.
0b5::12-20:15:33.890 [NM] Instructing cluster transport to bring network a1a13a80-0eaf-
11d1-8427-0000f8034599 online.

After initializing network interfaces, the cluster will continue formation with the enumeration of
cluster nodes. In this case, as a newly formed cluster, the cluster will contain only one node.
If this session had been joining an existing cluster, the node enumeration would show two
nodes. Next, the cluster will bring the Cluster IP address and Cluster Name resources online.
0b5::12-20:15:34.015 [FM] OnlineGroup: setting group state to Online for f901aa29-0eaf-
11d1-8427-0000f8034599
069::12-20:15:34.015 IP address <Cluster IP address>: Created NBT interface
\Device\NetBt_If6 (instance 355833456).
0b5::12-20:15:34.015 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
0000f8034599 possible node list
0b5::12-20:15:34.015 [FM] FmFormNewClusterPhase2 complete.
.
.
.
0b5::12-20:15:34.281 [INIT] Successfully formed a cluster.
09c::12-20:15:34.281 [lm] :ReSyncTimerHandles Entry.
09c::12-20:15:34.281 [lm] :ReSyncTimerHandles Exit gdwNumHandles=3
0b5::12-20:15:34.281 [INIT] Cluster Started! Original Min WS is 204800, Max WS is 1413120.
08c::12-20:15:34.296 [CPROXY] clussvc initialized
069::12-20:15:40.421 IP address <Cluster IP Address>: IP Address 192.88.80.114 on adapter
DC21X41 online
.
.
.
04d::12-20:15:40.421 [FM] OnlineWaitingTree, a1a13a84-0eaf-11d1-8427-0000f8034599 depends
on a1a13a83-0eaf-11d1-8427-0000f8034599. Start first
04d::12-20:15:40.421 [FM] OnlineWaitingTree, Start resource a1a13a84-0eaf-11d1-8427-
0000f8034599
04d::12-20:15:40.421 [FM] OnlineResource: a1a13a84-0eaf-11d1-8427-0000f8034599 depends on
a1a13a83-0eaf-11d1-8427-0000f8034599. Bring online first.
04d::12-20:15:40.421 FmpRmOnlineResource: bringing resource a1a13a84-0eaf-11d1-8427-
0000f8034599 (resid 1391032) online.
04d::12-20:15:40.421 [CP] CppResourceNotify for resource Cluster Name
04d::12-20:15:40.421 [GUM] GumSendUpdate: Locker waiting type 0 context 8
04d::12-20:15:40.437 [GUM] Thread 0x4d UpdateLock wait on Type 0
04d::12-20:15:40.437 [GUM] DoLockingUpdate successful, lock granted to 1
076::12-20:15:40.437 Network Name <Cluster Name>: Bringing resource online...
04d::12-20:15:40.437 [GUM] GumSendUpdate: Locker dispatching seq 411 type 0 context 8
04d::12-20:15:40.437 [GUM] GumpDoUnlockingUpdate releasing lock ownership
04d::12-20:15:40.437 [GUM] GumSendUpdate: completed update seq 411 type 0 context 8
04d::12-20:15:40.437 [GUM] GumSendUpdate: Locker waiting type 0 context 11
.
.
.
076::12-20:15:43.515 Network Name <Cluster Name>: Registered server name MDLCLUSTER on
transport \Device\NetBt_If6.
076::12-20:15:46.578 Network Name <Cluster Name>: Registered workstation name MDLCLUSTER on
transport \Device\NetBt_If6.
076::12-20:15:46.578 Network Name <Cluster Name>: Network Name MDLCLUSTER is now online

Following these steps, the cluster will attempt to bring other resources and groups online. The
logfile will continue to increase in size as the cluster service runs. Therefore, it may be a good
idea to enable this option when you are having problems, rather than leaving it on for days or
weeks at a time.
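
To confirm that the remaining groups and resources came online after formation, and to keep an
eye on how large the logfile is getting, you can run a quick check from the command prompt. The
following is a minimal sketch that uses the cluster.exe command-line tool; it assumes the cluster
name MDLCLUSTER from the log excerpts above and the default cluster.log location, so substitute
the values for your own environment.

rem List the state of every group and resource in the cluster.
cluster MDLCLUSTER group /status
cluster MDLCLUSTER resource /status

rem Check how large the cluster log has grown (default location assumed;
rem the ClusterLog environment variable may point elsewhere on your nodes).
dir %SystemRoot%\Cluster\cluster.log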

7.4 Logfile Entries for Common Failures

After reviewing a successful startup of the Cluster Service, you may want to examine some
errors that may appear because of various failures. The following examples illustrate possible
log entries for four different failures.

Example 1: Quorum Disk Turned Off


If the cluster attempts to form and cannot connect to the quorum disk, entries similar to the
following may appear in the logfile. Because of the failure, the cluster cannot form, and the
Cluster Service terminates.
0b9::14-20:59:42.921 [RM] Main: Initializing.
08f::14-20:59:42.937 [FM] Creating group a1a13a86-0eaf-11d1-8427-0000f8034599
08f::14-20:59:42.937 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 contains a1a13a87-
0eaf-11d1-8427-0000f8034599.
08f::14-20:59:42.937 [FM] Creating resource a1a13a87-0eaf-11d1-8427-0000f8034599
08f::14-20:59:42.937 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
0000f8034599 possible node list
08f::14-20:59:42.937 [FMX] Found the quorum resource a1a13a87-0eaf-11d1-8427-0000f8034599.
08f::14-20:59:42.937 [FM] All dependencies for a1a13a87-0eaf-11d1-8427-0000f8034599 created
08f::14-20:59:42.937 [FM] arbitrate for quorum resource id a1a13a87-0eaf-11d1-8427-
0000f8034599.
08f::14-20:59:42.937 FmpRmCreateResource: creating resource a1a13a87-0eaf-11d1-8427-
0000f8034599 in shared resource monitor
08f::14-20:59:42.968 FmpRmCreateResource: created resource a1a13a87-0eaf-11d1-8427-
0000f8034599, resid 1362616
0e9::14-20:59:43.765 Physical Disk <Disk D:>: SCSI, error reserving disk, error 21.
0e9::14-20:59:54.125 Physical Disk <Disk D:>: SCSI, error reserving disk, error 21.
0e9::14-20:59:54.140 Physical Disk <Disk D:>: Arbitrate returned status 21.
08f::14-20:59:54.140 [FM] FmGetQuorumResource failed, error 21.
08f::14-20:59:54.140 [INIT] Cleaning up failed form attempt.
08f::14-20:59:54.140 [INIT] Failed to form cluster, status 3213068.
08f::14-20:59:54.140 [CS] ClusterInitialize failed 21
08f::14-20:59:54.140 [INIT] The cluster service is shutting down.
08f::14-20:59:54.140 [evt] EvShutdown
08f::14-20:59:54.140 [FM] Shutdown: Failover Manager requested to shutdown groups.
08f::14-20:59:54.140 [FM] DestroyGroup: destroying a1a13a86-0eaf-11d1-8427-0000f8034599
08f::14-20:59:54.140 [FM] DestroyResource: destroying a1a13a87-0eaf-11d1-8427-0000f8034599
08f::14-20:59:54.140 [OM] Deleting object Physical Disk
08f::14-20:59:54.140 [FM] Resource a1a13a87-0eaf-11d1-8427-0000f8034599 destroyed.
08f::14-20:59:54.140 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 destroyed.
08f::14-20:59:54.140 [Dm] DmShutdown
08f::14-20:59:54.140 [DM] DmpShutdownFlusher: Entry
08f::14-20:59:54.156 [DM] DmpShutdownFlusher: Setting event
062::14-20:59:54.156 [DM] DmpRegistryFlusher: got 0
062::14-20:59:54.156 [DM] DmpRegistryFlusher: exiting
0ca::14-20:59:54.156 [FM] WorkItem, delete resource <Disk D:> status 0
0ca::14-20:59:54.156 [OM] Deleting object Disk Group 1 (a1a13a86-0eaf-11d1-8427-
0000f8034599)
0e7::14-20:59:54.375 [CPROXY] clussvc terminated, error 0.
0e7::14-20:59:54.375 [CPROXY] Service Stopping...
0b9::14-20:59:54.375 [RM] Going away, Status = 1, Shutdown = 0.
02c::14-20:59:54.375 [RM] PollerThread stopping. Shutdown = 1, Status = 0, WaitFailed = 0,
NotifyEvent address = 196.
0e7::14-20:59:54.375 [CPROXY] Cleaning up
0b9::14-20:59:54.375 [RM] RundownResources posting shutdown notification.
0e7::14-20:59:54.375 [CPROXY] Cleanup complete.
0e3::14-20:59:54.375 [RM] NotifyChanges shutting down.
0e7::14-20:59:54.375 [CPROXY] Service Stopped.

Perhaps the most meaningful lines from above are:


0e9::14-20:59:43.765 Physical Disk <Disk D:>: SCSI, error reserving disk, error 21.
0e9::14-20:59:54.125 Physical Disk <Disk D:>: SCSI, error reserving disk, error 21.
0e9::14-20:59:54.140 Physical Disk <Disk D:>: Arbitrate returned status 21.

Note The error code on these logfile entries is 21. You can issue net helpmsg 21 from the
command line to receive an explanation of the error status code. Status code 21 means "The
device is not ready," which indicates a possible problem with the device. In this case, the
device was turned off, and the error status correctly identifies the problem.
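
For example, the following commands translate the status codes discussed in this example and the
next one; the output noted in the comments is the standard Windows message text for these codes.

rem Translate a Win32 status code taken from the cluster logfile.
net helpmsg 21
rem Returns: The device is not ready.
net helpmsg 2
rem Returns: The system cannot find the file specified.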


Example 2: Quorum Disk Failure


In this example, the drive has failed or has been reformatted from the SCSI controller, so the
disk signature has changed. As a result, the Cluster Service cannot locate a drive with the
specific signature it is looking for.
0b8::14-21:11:46.515 [RM] Main: Initializing.
074::14-21:11:46.531 [FM] Creating group a1a13a86-0eaf-11d1-8427-0000f8034599
074::14-21:11:46.531 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 contains a1a13a87-
0eaf-11d1-8427-0000f8034599.
074::14-21:11:46.531 [FM] Creating resource a1a13a87-0eaf-11d1-8427-0000f8034599
074::14-21:11:46.531 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
0000f8034599 possible node list
074::14-21:11:46.531 [FMX] Found the quorum resource a1a13a87-0eaf-11d1-8427-0000f8034599.
074::14-21:11:46.531 [FM] All dependencies for a1a13a87-0eaf-11d1-8427-0000f8034599 created
074::14-21:11:46.531 [FM] arbitrate for quorum resource id a1a13a87-0eaf-11d1-8427-
0000f8034599.
074::14-21:11:46.531 FmpRmCreateResource: creating resource a1a13a87-0eaf-11d1-8427-
0000f8034599 in shared resource monitor
074::14-21:11:46.562 FmpRmCreateResource: created resource a1a13a87-0eaf-11d1-8427-
0000f8034599, resid 1362696
075::14-21:11:46.671 Physical Disk <Disk D:>: SCSI, Performing bus rescan.
075::14-21:11:51.843 Physical Disk <Disk D:>: SCSI, error attaching to signature 71cd0549,
error 2.
075::14-21:11:51.843 Physical Disk <Disk D:>: Unable to attach to signature 71cd0549.
Error: 2.
074::14-21:11:51.859 [FM] FmGetQuorumResource failed, error 2.
074::14-21:11:51.859 [INIT] Cleaning up failed form attempt.

In this case, the most important logfile entries are:


075::14-21:11:51.843 Physical Disk <Disk D:>: SCSI, error attaching to signature 71cd0549,
error 2.
075::14-21:11:51.843 Physical Disk <Disk D:>: Unable to attach to signature 71cd0549.
Error: 2.

Status code 2 means "The system cannot find the file specified." In this case, the error may
mean that the Cluster Service cannot find a disk with the expected signature, or that it cannot
locate the quorum logfile that should be on the disk.


Example 3: Duplicate Cluster IP Address

If another computer on the network has the same IP address as the cluster IP address
resource, the resource will be prevented from coming online. Further, the cluster name will not
be registered on the network, because it depends on the IP address resource. Because this name
is the network name used for cluster administration, you will not be able to administer the
cluster by using this name during this type of failure. However, you may be able to use the
computer name of the cluster node to connect with Cluster Administrator. Additionally, you may
be able to connect locally from the console by using the loopback address. The following sample
entries are from a cluster logfile during this type of failure:
0b9::14-21:32:59.968 IP Address <Cluster IP Address>: The IP address is already in use on
the network, status 5057.
0d2::14-21:32:59.984 [FM] NotifyCallBackRoutine: enqueuing event
03e::14-21:32:59.984 [FM] WorkerThread, processing transition event for a1a13a83-0eaf-11d1-
8427-0000f8034599, oldState = 129, newState = 4.03e
.
.
.
03e::14-21:32:59.984 FmpHandleResourceFailure: taking resource a1a13a83-0eaf-11d1-8427-
0000f8034599 and dependents offline
03e::14-21:32:59.984 [FM] TerminateResource: a1a13a84-0eaf-11d1-8427-0000f8034599 depends
on a1a13a83-0eaf-11d1-8427-0000f8034599. Terminating first
0d3::14-21:32:59.984 Network Name <Cluster Name>: Terminating name MDLCLUSTER...
0d3::14-21:32:59.984 Network Name <Cluster Name>: Name MDLCLUSTER is already offline.
.
.
.
03e::14-21:33:00.000 FmpRmTerminateResource: a1a13a84-0eaf-11d1-8427-0000f8034599 is now
offline
0c7::14-21:33:00.000 IP Address <Cluster IP Address>: Terminating resource...
0c7::14-21:33:00.000 IP Address <Cluster IP Address>: Address 192.88.80.114 on adapter
DC21X41 offline.
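
When this failure occurs, a quick way to locate the conflicting machine is to ping the clustered
IP address while the IP address resource is offline and then inspect the ARP cache: the MAC
address that answers belongs to the computer that is wrongly using the address. This is a generic
sketch; the address 192.88.80.114 is taken from the log above, so substitute your own cluster IP
address.

rem The cluster IP address resource is offline, so any reply comes from the
rem machine that is duplicating the address.
ping 192.88.80.114
rem Display the MAC address that answered, then trace it to the offending computer.
arp -a 192.88.80.114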

Example 4: Evicted Node Attempts to Join Existing Cluster

If you evict a node from a cluster, the cluster software on that node must be reinstalled
before the node can rejoin the cluster. If you start the evicted node and the Cluster Service
attempts to join the cluster, entries similar to the following may appear in the cluster logfile:
032::26-16:11:45.109 [INIT] Attempting to join cluster MDLCLUSTER
032::26-16:11:45.109 [JOIN] Spawning thread to connect to sponsor 192.88.80.115
040::26-16:11:45.109 [JOIN] Asking 192.88.80.115 to sponsor us.
032::26-16:11:45.109 [JOIN] Spawning thread to connect to sponsor 9.9.9.2
032::26-16:11:45.109 [JOIN] Spawning thread to connect to sponsor 192.88.80.190
099::26-16:11:45.109 [JOIN] Asking 9.9.9.2 to sponsor us.
032::26-16:11:45.109 [JOIN] Spawning thread to connect to sponsor NODEA
098::26-16:11:45.109 [JOIN] Asking 192.88.80.190 to sponsor us.
032::26-16:11:45.125 [JOIN] Waiting for all connect threads to terminate.
092::26-16:11:45.125 [JOIN] Asking NODEA to sponsor us.
040::26-16:12:18.640 [JOIN] Sponsor 192.88.80.115 is not available (JoinVersion),
status=1722.
098::26-16:12:18.640 [JOIN] Sponsor 192.88.80.190 is not available (JoinVersion),
status=1722.
099::26-16:12:18.640 [JOIN] Sponsor 9.9.9.2 is not available (JoinVersion), status=1722.
098::26-16:12:18.640 [JOIN] JoinVersion data for sponsor 157.57.224.190 is invalid, status
1722.
099::26-16:12:18.640 [JOIN] JoinVersion data for sponsor 9.9.9.2 is invalid, status 1722.
040::26-16:12:18.640 [JOIN] JoinVersion data for sponsor 157.58.80.115 is invalid, status
1722.
092::26-16:12:18.703 [JOIN] Sponsor NODEA is not available (JoinVersion), status=1722.
092::26-16:12:18.703 [JOIN] JoinVersion data for sponsor NODEA is invalid, status 1722.
032::26-16:12:18.703 [JOIN] All connect threads have terminated.
032::26-16:12:18.703 [JOIN] Unable to connect to any sponsor node.
032::26-16:12:18.703 [INIT] Failed to join cluster, status 0
032::26-16:12:18.703 [INIT] Attempting to form cluster MDLCLUSTER
.
.
.
032::26-16:12:18.734 [FM] arbitrate for quorum resource id 24acc093-1e28-11d1-9e5d-
0000f8034599.
032::26-16:12:18.734 [FM] FmpQueryResourceInfo:initialize the resource with the registry
information
032::26-16:12:18.734 FmpRmCreateResource: creating resource 24acc093-1e28-11d1-9e5d-
0000f8034599 in shared resource monitor
032::26-16:12:18.765 FmpRmCreateResource: created resource 24acc093-1e28-11d1-9e5d-
0000f8034599, resid 1360000
06d::26-16:12:18.812 Physical Disk <Disk G:>: SCSI, error attaching to signature b2320a9b,
error 2.
06d::26-16:12:18.812 Physical Disk <Disk G:>: Unable to attach to signature b2320a9b.
Error: 2.
032::26-16:12:18.812 [FM] FmGetQuorumResource failed, error 2.
032::26-16:12:18.812 [INIT] Cleaning up failed form attempt.
032::26-16:12:18.812 [INIT] Failed to form cluster, status 2.
032::26-16:12:18.828 [CS] ClusterInitialize failed 2

The node attempts to join the existing cluster, but has invalid credentials, because it was
previously evicted. Therefore, the existing node refuses to communicate with it. The node
may attempt to form its own version of the cluster, but cannot gain control of the quorum disk,
because the existing cluster node maintains ownership. Examination of the logfile on the
existing cluster node reveals that the Cluster Service posted entries to reflect the failed
attempt to join:
0c4::29-18:13:31.035 [NMJOIN] Processing request by node 2 to begin joining.
0c4::29-18:13:31.035 [NMJOIN] Node 2 is not a member of this cluster. Cannot join.
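
From the surviving node, you can confirm which nodes the cluster still considers members. The
following sketch uses the cluster.exe command-line tool and the cluster name MDLCLUSTER from the
examples above; an evicted node no longer appears in this list, which is why its join request is
refused until the cluster software is reinstalled on it.

rem List the member nodes and their current state.
cluster MDLCLUSTER node /status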


8 Appendix: Q258078 Cluster Service Startup Options

PSS ID Number: Q258078


Article last modified on 08-08-2001

:2000,4.0

======================================================================
-------------------------------------------------------------------------------
The information in this article applies to:

- Microsoft Windows 2000 Advanced Server
- Microsoft Windows 2000 Datacenter Server
- Microsoft Windows NT Server, Enterprise Edition version 4.0
-------------------------------------------------------------------------------

SUMMARY
=======

This is a list of all the available switches that can be used as startup
parameters to start the Cluster service.

To use one of these switches, go to the properties of the service, type the appropriate switch
in the Start Parameters box, and then click Start.

NOTE: You must include a forward slash (/) at the beginning of the switch.

You can also use the desired switch when starting the Cluster service from the
command line:

net start clussvc /<switch>

NOTE: The Debug switch has special startup parameters; see the Debug section below
for proper usage.

Valid option switches are:

- FixQuorum -- No quorum device, no quorum logging
- DebugResMon -- Enable debugging of resrcmon process
- Debug -- Displays events during the start of Cluster Service. See below for
  special syntax

Windows 2000 and later only switches:


- ResetQuorumLog -- Dynamically re-creates the quorum log and checkpoint files
  (this functionality is automatic in Windows NT 4.0)
- NoRepEvtLogging -- No replication of Event Log entries

MORE INFORMATION
================

Explanation of some of the switches:

- Debug

Function: It is possible that Cluster logging may not contain any helpful
information in diagnosing failures of the Cluster service to start. This is
because the Cluster service may fail prior to the Cluster.log starting.
Starting the Cluster service with this switch displays the initialization of
the Cluster service and can be beneficial in identifying these early
occurring problems.

Requirements: This switch is intended for temporary diagnostic use only. If
the Cluster service fails to start because of a logon error of the service
account, or another system-related error, the service may not have a chance
to run. As a result, a cluster.log file may not be created. This method runs
the service outside of the normal environment given by the Service Control
Manager. To use this switch, you must be logged on locally with
administrative rights and start the service from the command prompt. Do not
use the /debug option for normal use or for any length of time. The service
does not run as efficiently with the option set.

Usage scenarios: This switch must be used only when the Cluster service fails
to start up. This switch will display on the screen the operation of the
Cluster service as it attempts to start. It can be used only when starting
the service from the command prompt, and you must be in the directory that
the Cluster service is installed to; by default this is %SystemRoot%\Cluster.
This is also the only switch for which you do not use the NET START command
to start the service.

Operation: Open a Command Prompt and change your current directory to the
%SystemRoot%\cluster directory. Then type:
"CLUSSVC /debug"
The cluster service will send output to the window similar to what would
normally be seen in the cluster.log. You may also capture this information to
a file by using the following command syntax instead:
"CLUSSVC /debug > c:\debug.log"

Once you are satisfied that the Cluster service is running properly, press CTRL+C
to stop the service.

Note: You may wish to use the ClusterLogLevel environment variable to control
the output level when using the Debug switch; see this article for additional
information:

Q168801 How to Enable Cluster Logging in Microsoft Cluster Server
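
For instance, a diagnostic session that raises the output level before starting the service in
debug mode might look like the following sketch. The variable name ClusterLogLevel is taken from
the note above; the value 3 (most verbose) and the 0-3 range are assumptions here, so check
Q168801 for the levels supported by your version.

rem Raise the debug output level, then start the service in debug mode and
rem capture the output to a file.
set ClusterLogLevel=3
cd /d %SystemRoot%\Cluster
clussvc /debug > c:\debug.log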

- FixQuorum

Function: Lets the cluster service start up despite problems with the quorum
device. The only resources that will be brought online once the service is
started are the Cluster IP Address and the Cluster Name. You can open Cluster
Administrator and bring other resources online manually.

Requirements: This switch MUST be used only in diagnostic mode on a very
temporary basis and not during normal operation. Only one node should be
started with this switch, and you must not attempt to join a second node to
the node that was started with it. Typically, this switch is used alone.

Usage scenarios: If the cluster service is unable to start up in the normal
way due to the failure of the quorum resource, users can start up the cluster
service in this mode and attempt to diagnose the failure.

Operation: After the cluster service is started up, all resources including
the quorum resource remain offline. Users can then manually try to bring the
quorum resource online and monitor the cluster log entries as well as the new
event log entries and attempt to diagnose any problems with the quorum
resource.

NET START ClusSvc /FixQuorum
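
As a sketch of the diagnostic sequence, once the service has been started with this switch you
can try to bring the quorum disk online from the command prompt with cluster.exe and then review
the new cluster log and event log entries. The resource name "Disk D:" is taken from the examples
earlier in this guide; substitute your own quorum resource name.

rem Attempt to bring the quorum resource online manually and watch for errors.
cluster resource "Disk D:" /online
rem Verify the resulting state of the quorum resource.
cluster resource "Disk D:" /status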

- ResetQuorumLog

Function: If the quorum log and checkpoint files are not found or are corrupt,
this switch can be used to re-create them based on the information in the local
node's %SystemRoot%\Cluster\CLUSDB registry hive. If the quorum log file is
found to be in proper order, this switch has no effect.

Requirements: Typically, only one node is started with this switch, and the
switch is used alone. It must be used only by experienced users who understand
the consequences of creating a new quorum log file from information that is
potentially out of date.

Usage scenarios: This switch must be used only when the Cluster service fails
to start up on a Windows 2000 or later machine because the quorum log
(QUOLOG.LOG) and checkpoint (CHKxxx.TMP) files are missing or corrupt. Windows
NT 4.0 automatically re-creates these files if they do not exist; this switch
was added in Windows 2000 to give more control over the start of the Cluster
service.

Operation: The Cluster service automatically re-creates the quorum log file if
it is found to be missing or corrupt, by using the information in the currently
loaded cluster hive (the file %SystemRoot%\Cluster\CLUSDB).

NET START ClusSvc /ResetQuorumLog

- DebugResMon

Function: Helps you to debug the resource monitor process and, therefore, the
resource dynamic-link libraries (DLLs) that are loaded by the resource
monitor. You can use any standard Windows-based debugger.

Requirements: This switch can be used only when the cluster service is started
from the command prompt with the "/debug" option; there is no equivalent
registry setting that can be used when the cluster service runs as a service.
A debugger must be available to attach to the resource monitor when it starts
up. Typically, this switch is used alone.

Usage scenarios: Developers use it to debug the resource monitor process and
resource DLLs. This option is extremely useful if a bug in a resource DLL
causes the resource monitor process to crash soon after it is started up by
the cluster service and before users can manually attach a debugger to the
resource monitor process.

Operation: Just before the resource monitor process is started up, the cluster
service process waits with the message "Waiting for debugger to connect to the
resmon process X", where X is the PID (process ID) of the resource monitor
process. The cluster service waits in this way for every resource monitor
process that it creates. Once the user attaches a debugger to the resource
monitor process, and the resource monitor process starts up, the cluster
service continues with its initialization.
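
A minimal sketch of such a session, assuming the ntsd debugger included with Windows 2000 (any
other Windows-based debugger that can attach to a process ID will do), might look like this; the
PID 1234 is a placeholder for the value shown in the "Waiting for debugger..." message.

rem First command prompt: start the service with both switches.
cd /d %SystemRoot%\Cluster
clussvc /debug /debugresmon

rem Second command prompt: attach to the resource monitor PID reported
rem by the cluster service (1234 is a placeholder).
ntsd -p 1234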

- NoRepEvtLogging

Function: The norepevtlogging command prevents replication of those events
recorded in the event log. This command is useful in reducing the amount of
information displayed in the command window by filtering out events already
recorded in the event log. Event log replication is a new feature added in
Windows 2000.

Usage scenarios: For example, to start the cluster service and log those
events not recorded in the event log to a local file, Debugnorep.log:


clussvc /debug /norepevtlogging > c:\debugnorep.log

Operation: The norepevtlogging command can be set as a start parameter when
starting the cluster service from the Computer Management console.

The command line syntax is:


NET START ClusSvc /NoRepEvtLogging

This will prevent the node that was started with this switch from replicating
its event log information to other nodes, but the node will still receive
information from other nodes that were started normally.

Additional query words: MSCS

======================================================================
Keywords : kbenv kbtool w2000mscs kbClustering
Technology : kbWinNTsearch kbWinNT400search kbwin2000AdvServ
kbwin2000AdvServSearch kbwin2000DataServ kbwin2000DataServSearch
kbWinNTSsearch kbWinNTSEntSearch kbWinNTSEnt400 kbWinNTS400search
kbwin2000Search kbWinAdvServSearch kbWinDataServSearch
Version : :2000,4.0
Issue type : kbinfo
======================================================================
=======
Copyright Microsoft Corporation 2001.
