GLOBE ISIT OASIS2 Cluster Administration and Troubleshooting

Cluster Administration and
Troubleshooting Guide
Version Number (1.0)

January 5, 2004
Author: Fabian SIRACH
Microsoft Services, France
GLOBE ISIT OASIS2
< Cluster Administration and

Troubleshooting Guide>
 2001 Nestec Ltd. – GLOBE – Global Business Excellence. http://veviis01.nestec.ch/GLOBE/ GL-GLOBE

Proprietary document not to be divulged outside the Company.
Printed by Nestec Ltd., CH-1800 Vevey, Switzerland.
Document Control
Document Owner Fabian SIRACH
Review Cycle in months 12
Date of update Updated by Changes Made (section Version Status of

(dd/mm/yyyy) (author name) numbers and description) # document
05/01/2004 Fabian SIRACH Draft Creation 1.0 Draft

03/02/2004 Fabian SIRACH Revision 1.1 Final

Table of Contents
DOCUMENT CONTROL..........................................................................................................2
TABLE OF CONTENTS...........................................................................................................3
1 PREFACE...............................................................................................................................6
1.1 DOCUMENT AUDIENCE............................................................................................................6
1.2 PREREQUISITES AND RELATED DOCUMENTATION............................................................................6
2 CLUSTER ADMINISTRATION...............................................................................................7
2.1 RESOURCES MANAGEMENT......................................................................................................7
2.2 GROUP MANAGEMENT............................................................................................................8
2.3 NODE MANAGEMENT............................................................................................................10
2.4 APPLYING SERVICE PACK AND HOTFIX.....................................................................................13
2.5 CHKDSK AND AUTOCHK.........................................................................................................13
2.6 CLUSTER COMMAND LINE.....................................................................................................14
3 TROUBLESHOOTING.........................................................................................................15
3.1 ONE NODE IS DOWN............................................................................................................15
3.2 ENTIRE CLUSTER IS DOWN....................................................................................................17
3.3 ONE OR MORE SERVERS QUIT RESPONDING.............................................................................17
3.4 CLUSTER SERVICE DO NOT START..........................................................................................17
3.5 CLUSTER SERVICE STARTS BUT CLUSTER ADMINISTRATOR WILL NOT CONNECT...................................18
3.6 CLUSTER ADMINISTRATOR STOPS RESPONDING ON FAILOVER.......................................................19
3.7 GROUP/RESOURCES FAILOVER PROBLEMS................................................................................20
3.8 QUORUM RESOURCES FAILURE..............................................................................................23
3.9 NETWORK NAME RESOURCE DOES NOT GO ONLINE..................................................................25
3.10 PHYSICAL DISK RESOURCE PROBLEM....................................................................................25
3.11 CLIENT CONNECTIVITY PROBLEM...........................................................................................26
3.11.1 Clients have intermittent Connectivity Based on Group Ownership......................26
3.11.2 Clients do not Have any Connection with the Cluster............................................27
3.11.3 Clients have Problems Accessing Data Through a File Share..............................27
3.11.4 Client Experience Intermittent Access....................................................................28
4 APPENDIX: MSCS EVENT MESSAGES............................................................................29
4.1 EVENT ID 1000................................................................................................................29
4.2 EVENT ID 1002................................................................................................................29
4.3 EVENT ID 1006................................................................................................................30
4.4 EVENT ID 1007................................................................................................................30
4.5 EVENT ID 1009................................................................................................................30
4.6 EVENT ID 1010................................................................................................................31
4.7 EVENT ID 1011.................................................................................................................31
4.8 EVENT ID 1012................................................................................................................31
4.9 EVENT ID 1015................................................................................................................31
4.10 EVENT ID 1016..............................................................................................................32

4.11 EVENT ID 1019...............................................................................................................32
4.12 EVENT ID 1021..............................................................................................................32
4.13 EVENT ID 1022..............................................................................................................33
4.14 EVENT ID 1023..............................................................................................................33
4.15 EVENT ID 1024..............................................................................................................33
4.16 EVENT ID 1034..............................................................................................................33
4.17 EVENT ID 1035..............................................................................................................34
4.18 EVENT ID 1036..............................................................................................................34
4.19 EVENT ID 1037..............................................................................................................35
4.20 EVENT ID 1038..............................................................................................................35
4.21 EVENT ID 1040..............................................................................................................35
4.22 EVENT ID 1041..............................................................................................................36
4.23 EVENT ID 1042..............................................................................................................36
4.24 EVENT ID 1043..............................................................................................................36
4.25 EVENT ID 1044..............................................................................................................37
4.26 EVENT ID 1045..............................................................................................................37
4.27 EVENT ID 1046..............................................................................................................37
4.28 EVENT ID 1047..............................................................................................................38
4.29 EVENT ID 1048..............................................................................................................38
4.30 EVENT ID 1049..............................................................................................................38
4.31 EVENT ID 1050..............................................................................................................39
4.32 EVENT ID 1051..............................................................................................................39
4.33 EVENT ID 1052..............................................................................................................39
4.34 EVENT ID 1053..............................................................................................................40
4.35 EVENT ID 1054..............................................................................................................40
4.36 EVENT ID 1055..............................................................................................................40
4.37 EVENT ID 1056..............................................................................................................41
4.38 EVENT ID 1057..............................................................................................................41
4.39 EVENT ID 1058..............................................................................................................41
4.40 EVENT ID 1059..............................................................................................................42
4.41 EVENT ID 1061..............................................................................................................42
4.42 EVENT ID 1062..............................................................................................................42
4.43 EVENT ID 1063..............................................................................................................42
4.44 EVENT ID 1064..............................................................................................................43
4.45 EVENT ID 1065..............................................................................................................43
4.46 EVENT ID 1066..............................................................................................................43
4.47 EVENT ID 1067..............................................................................................................44
4.48 EVENT ID 1068..............................................................................................................44
4.49 EVENT ID 1069..............................................................................................................44
4.50 EVENT ID 1070..............................................................................................................45
4.51 EVENT ID 1071..............................................................................................................45
4.52 EVENT ID 1073..............................................................................................................45
4.53 EVENT ID 1077..............................................................................................................45
4.54 EVENT ID 1080..............................................................................................................45
4.55 EVENT ID 1093..............................................................................................................46
4.56 EVENT ID 1096..............................................................................................................46

4.57 EVENT ID 1097..............................................................................................................46
4.58 EVENT ID 1098..............................................................................................................46
4.59 EVENT ID 1100...............................................................................................................47
4.60 EVENT ID 1102...............................................................................................................47
4.61 EVENT ID 1104...............................................................................................................47
4.62 EVENT ID 1105...............................................................................................................47
4.63 EVENT ID 1107...............................................................................................................48
4.64 EVENT ID 1109...............................................................................................................48
4.65 EVENT ID 1115...............................................................................................................48
5 APPENDIX: RELATED EVENT MESSAGES......................................................................49
5.1 EVENT ID 9......................................................................................................................49
5.2 EVENT ID 101..................................................................................................................49
5.3 EVENT ID 1004................................................................................................................50
5.4 EVENT ID 1005................................................................................................................50
5.5 EVENT ID 2511.................................................................................................................50
5.6 EVENT ID 4199................................................................................................................50
5.7 EVENT ID 5719................................................................................................................51
5.8 EVENT ID 7000................................................................................................................51
5.9 EVENT ID 7013................................................................................................................51
5.10 EVENT ID 7023..............................................................................................................52
6 APPENDIX: MAINTENANCE TOOLS.................................................................................53
6.1 WINDOWS 2000 TOOLS........................................................................................................53
6.2 WINDOWS 2000 RESOURCE KIT TOOLS...................................................................................54
7 APPENDIX: USING AND READING THE CLUSTER LOGFILE........................................55
7.1 CLUSTERLOG ENVIRONMENT VARIABLE..............................................................................55
7.2 OPERATING SYSTEM VERSION NUMBER AND SERVICE PACK LEVE..................................................55
7.3 CLUSTER SERVICE STARTUP..................................................................................................55
7.4 LOGFILE ENTRIES FOR COMMON FAILURES................................................................................59
8 APPENDIX: Q258078 CLUSTER SERVICE STARTUP OPTIONS....................................64

1 Preface
This document intends to provide general operations required for administering and
troubleshooting Windows 2000 Cluster Service for NESTLE Enterprise Portal on Windows
2000 Operating System.
1.1 Document Audience

Windows 2000 System Administrators
1.2 Prerequisites and Related Documentation

Good Knowledge about Microsoft Windows 2000 Advanced Server
Good Knowledge about Microsoft Windows 2000 Cluster Service
W2K_GEO_SERVICE_CHECK Procedure
W2K_GEO_SERVICE_FAILOVER Procedure
W2K_GEO_SERVICE_SHUTDOWN Procedure
W2K_GEO_SERVICE_TAKEOVER Procedure
W2K_Cluster Server_Manual Takeover Procedure
W2K_Operating System_Restart Service Procedure
W2K_Server Status_Check Patch Level Procedure
W2K_Software_Apply Hotfix Procedure
Operation Runbook W2K GeoCluster Document
Operation Runbook W2K Document
8909675.doc Page 6 of 68 Created: 06.01.2004

Author: Fabian SIRACH Printed: 22.08.2002 17:39
2 Cluster Administration
2.1 Resources Management

Bring a resource online:
1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

Tools, and then click Cluster Administrator).
2. In the console tree, click the Resources folder.
3. In the details pane, click the resource you want.
4. On the File menu, click Bring Online.
Refer to
Take a resource offline:

2. In the console tree, click the Resources folder.
3. In the details pane, click the resource you want.
4. On the File menu, click Take Offline.
Remark: taking a resource offline causes all resources that depend on that resource to be
taken offline.
Refer to

2.2 Group Management

Bring a group online:

2. In the console tree, double-click the Groups folder.
3. In the details pane, click the group you want.
4. On the File menu, click Bring Online.
Take a group offline:

4. On the File menu, click Take Offline.
Remark: resources in a group go offline in the order of their dependencies.
Move a group to another node:

4. On the File menu, click Move Group.
Remark: after the transfer, the new node owns all resources in the group, as the Owner
column in the details pane should reflect.

Specify preferred owners of a group:

2. In the console tree, click the Groups folder.
4. On the File menu, click Properties.
5. On the General tab, next to Preferred owners, click Modify.
6. In the Modify Preferred Owners dialog box, enter any changes you want to make:
1. To add one or more preferred owners, under Available nodes, not Preferred
owners, click the nodes you want to add, and then click the right arrow.
2. To remove a preferred owner, under Preferred owners, click the nodes you
want to remove, and then click the left arrow.
3. To change the priority of a preferred owner, click the node, and then click the up
or down arrow.
Set group failover policy:

5. On the Failover tab, type values for Threshold and Period.
Remark: the failover policy for a group is the maximum number of times (Threshold) that
the group is allowed to fail over in the specified number of hours (Period) before it is
taken completely offline. If a group fails over more often than this, the Cluster service
leaves it offline.

Set group failback policy:

5. On the Failback tab, click Prevent failback or Allow failback.
If you click Allow failback, then either click Immediately, or click Failback between
and set the time interval.
Remark: to set the time interval for Failback between, enter numbers between 0 and 23
for the beginning and end of the interval. If the first number is greater than the second, the
interval will end on the following day. The numbers correspond to the local time of the
cluster group, as read on a 24-hour clock.
2.3 Node Management

Stopping the Cluster service:
2. In the console tree, click the node.
3. On the File menu, click Stop Cluster Service.
Remark: when you stop the Cluster service on a node, you prevent clients from accessing
cluster resources through that node. When you stop the Cluster service on a node, all
groups move to the other node (if the failover policies allow it).
Refer to
W2K_GEO_SERVICE_CHECK Procedure
W2K_GEO_SERVICE_SHUTDOWN Procedure
Starting the Cluster service:


2. In the console tree, click the node.
3. On the File menu, click Start Cluster Service.
Installing Cluster Service when the other node is online:
1. Open Add/Remove Programs (click Start, point to Settings, click Control Panel, and
then double-click Add/Remove Programs),
2. Click Add/Remove Windows Components,
3. The Welcome to the Windows Components wizard will begin.
4. In Components, select Cluster Service.
5. Click Next.
6. Cluster Service files are located on the Windows 2000 Advanced Server CD-ROM.
Enter Z:\i386.
7. Click OK.
8. Click Next.
9. Click I Understand to accept the condition that Cluster Service is supported on
hardware from the Hardware Compatibility List only.
10. In the Create or Join a Cluster dialog, select The second or next node in the cluster,
and click Next.
11. Enter the cluster name and click Next.
12. Leave Connect to cluster as unchecked. The Cluster Service Configuration wizard
will automatically supply the name of the user account selected during the installation
of the first node. Always use the same account as you used when setting up the first
cluster node.
13. Enter the password for the account and click Next.
14. At the next dialog box, click Finish to complete configuration.
15. The Cluster Service will start. Click OK.
16. Close Add/Remove Programs.

To validate the cluster installation, do to following:
1. Click Start, click Programs, click Administrative Tools, and click Cluster
Administrator.
2. The presence of two nodes shows that a cluster exists and is in operation.
Removing Cluster Service :
1. Open Add/Remove Programs (click Start, point to Settings, click Control Panel, and
then double-click Add/Remove Programs),
2. Click Add/Remove Windows Components,
3. The Welcome to the Windows Components wizard will begin.
4. Click Next,
5. In Components, click to clear Cluster Service, and then click Next.
Enable diagnostic Logging :
1. Open Control Panel (click Start, point to Settings, and then click Control Panel),
2. Double click System,
3. On the Advanced tab, click Environment Variables.
4. Under System variables, click New.
5. In Variable Name, specify the name of the variable. In Variable Value, specify the
name of the diagnostic log file.
For example, set Variable Name to Clusterlog and Variable Value to
C:\Temp\Cluster.log.
6. Click OK, and then click OK again. Close Control Panel.
7. Stop and restart the Cluster service.
Disable diagnostic Logging :

1. Open Control Panel (click Start, point to Settings, and then click Control Panel),
2. Double click System,
3. On the Advanced tab, click Environment Variables.
4. Under System variables, select Clusterlog.
5. Click Delete, click OK, and then click OK again. Close Control Panel.
6. Stop and restart the Cluster service.
2.4 Applying Service Pack and Hotfix

Refer to document “GLOBE ISIT OASIS2 Applying Hotfixes”
2.5 Chkdsk and Autochk

Disks that are attached to the shared bus interact differently with Chkdsk.Exe than with
Autochk.Exe. Autochk.Exe, the system startup version of Chkdsk.Exe, does not perform file
system checks on shared drives when the system starts, even if the operations are required.
The Cluster service performs a file system integrity test for each drive when it brings a
physical disk online. The cluster automatically starts Chkdsk if it is necessary.
If you have to run Chkdsk on a drive, click the following article numbers to view the articles in
the Microsoft Knowledge Base:
• 174617 CHKDSK Runs While Running Microsoft Cluster Server Setup

• 176970 How to Run the CHKDSK /F Command on a Shared Cluster Disk

2.6 Cluster Command Line

Using Cluster tool:
With this tool, you can do each operation that can be done with Cluster Administrator.
To verify nodes state: type cluster <cluster name> node

To verify resource state: type cluster <ressource name> resource
To verify network adapter state: type cluster <cluster name> network
To move a group: type CLUSTER <cluster name> GROUP <cluster group> /MOVETO :
<Node name>

3 Troubleshooting
3.1 One node is Down

General Information:
Before troubleshooting, if a single node is unavailable, make sure that resources and
groups are available on the other node.
If a node is online, gather information about failure:
1. Check event logs on the online node (event log messages meaning is indicated in
Appendix)
2. Check cluster diagnostic logfile,
3. Check for the existence of a recent Memory.dmp file that may have been created from a
recent crash. If necessary, contact Microsoft Product Support Services for assistance with
this file.
4. Go to the paragraph corresponding to the failure.
Symptoms and solutions:
Symptoms Causes Solutions

Second node cannot join the You may not be using the Confirm that you are using
cluster. proper cluster name, node the proper cluster name,
name, or IP address. node name, or IP address
The Cluster Name resource Confirm that the Cluster
may not have started. Name resource started.
The Cluster service may not Confirm that the Cluster
be running on the first node.service is running on the
first node and that all
resources within the
Cluster Group are online
before installing the second
node.
Network connectivity may not Confirm that network
exist between the two nodes connectivity exists between
the two nodes
You may not have IP Confirm that you have IP
connectivity to the cluster connectivity to the cluster
address address and that the IP
address is assigned to the
correct network.


Second node cannot connect to The same drive letters may Confirm that the cluster
the cluster drives. not have been assigned to drives are assigned the
the cluster drives on all same drive letters on all
nodes nodes.
The SCSI devices may not Verify that each SCSI
have unique IDs. device has a unique ID.
SCSI controller IDs are
preset to seven. Reset one
SCSI controller ID to six.
The second node may not be Confirm that the second
physically connected to the node is physically
cluster drive. connected to the cluster
drive. If it is not, shut down
both nodes and the cluster
drive. Connect the nodes to
the shared SCSI bus.
Then, start the cluster drive
and start the first node.
After the Cluster service
starts on the first node,
start the second node, and
attempt to connect to the
cluster drive
The SCSI controllers on the Confirm that the SCSI
shared SCSI bus may not be controllers on the shared
correctly configured SCSI bus are correctly
configured (with both cards
configured to transfer data
at the same rate).
The devices and controllers Confirm that your devices
may not match. and controllers match.

3.2 Entire Cluster is Down

Before troubleshooting, try to bring at least one node online. If you can achieve this
goal, the affect on users may be substantially reduced.
If a node is online or one server could be started, gather information about failure:
1. Check event logs on the online node (event log messages meaning is indicated in
Appendix)
2. Check cluster diagnostic logfile,
3. Check for the existence of a recent Memory.dmp file that may have been created from
a recent crash. If necessary, contact Microsoft Product Support Services for assistance
with this file.
4. Go to paragraph corresponding to the identified failure.
If no server can be restarted, in last resort, restore both nodes (this procedure is defined
Disaster Recovery Guide).
3.3 One or More Servers Quit Responding

If one or more servers are not responding but have not crashed or otherwise failed, the
problem may be related to:
 Domain controllers’ connectivity: Check network connectivity with domain controllers

and for other network problems (use the ping command with dc name and dc IP
address).
 configuration,
 software,
 driver issues,
 Fiber Channel Connectivity
 Connected disk devices.
For theses problems go to the corresponding chapter.
3.4 Cluster Service Do not Start

1. Check the event log messages and look at Appendix.
2. Determine if the issue comes from the service account used for Cluster Service.
Information: failures related to the service account may result in Event ID 7000 or
Event ID 7013 errors in the event log. In addition, you may receive the following error
message:

"Could not start the Cluster Service on \\computername. Error 1069: The
service did not start because of a logon failure."
 Make sure the account is not disabled and that password expiration is not a
factor.
 This domain account needs to be a member of the local administrators group on
each server.
 The account needs the Logon as a service and Lock pages in memory rights.
 Make sure the password specified for the Cluster Service account is correct.
(Retype it and click on the apply button, try to restart the service).
3. Check to make sure the quorum disk is online and that the Fiber Channel has
proper termination and proper function.
Information: if the quorum disk is not accessible during startup, the following error
message may occur:
"Could not start the Cluster Service on \\computername. Error 0021: The device
is not ready."
If the Cluster Service is running on the other cluster node, check the cluster
logfile on that system for indications of whether or not the other node attempted to join
the cluster. If the cluster node did try to join the cluster, and the request was denied,
the logfile may contain details of the event. For example, if you evict a node from the
cluster, but do not remove and reinstall MSCS on that node, when the server attempts
to join the cluster, the request to join will be denied.
To start a cluster node if the cluster service doesn’t start and no cluster.log file exists,
use the –debug option:
1. Open a console Windows by typing CMD in the Run menu,

2. Go in the %systemroot%\cluster directory ( cd \winnt\cluster)
3. Type CLUSSVC –debug
The debug informations are then send to console.
To stop this service, type CTRL+C.
3.5 Cluster Service Starts but Cluster Administrator will not connect
If the Services utility in Administrative Tools indicates that the service is running,
and you cannot connect with Cluster Administrator to administer the cluster, the
problem may be related to:
 the Cluster Network Name,

 to the cluster IP address resources or
 there may also be RPC-related problems.
1. Check to make sure the RPC Service is running on both nodes.

2. If it is, try to connect to a known running cluster node by the computer name. If
running Cluster Administrator on the local node, you may specify a period (.) in place
of the name when prompted. This will create a local connection and will not require
name resolution.
3. If you can connect through the computer name or using the period, check the
cluster network name and cluster IP address resources. Make sure that these
and other resources in the cluster group are online. These resources may fail if a
duplicate name or IP address on the network conflicts with either of these resources.
4. A duplicate IP address on the network may cause the network adapter to shut down.
Check the system event log for errors.
3.6 Cluster Administrator Stops Responding On Failover

The Cluster Administrator application uses RPC communication to connect with the cluster. If
you use the cluster name to establish the connection, Cluster Administrator may appear to
stop responding during a failover of the Cluster group and its resources. This ordinary delay
occurs during the registration of the IP address and network name resources in the group
and the establishment of a new RPC connection. If a problem occurs with the registration of
these resources, the process may take extended time until these resources become
available.
The first RPC connection must time out before the application tries to establish another
connection. As a result, Cluster Administrator may eventually time out if problems occur when
the IP address or network name resources are brought online in the Cluster group. In this
situation, try to connect by using the computer name of one of the cluster nodes instead of
the cluster name. Doing so typically allows a more real-time display of resource and group
transitions without delay.

3.7 Group/Resources Failover Problems

General Information:
The typical reason that a group may not failover properly is usually because of problems with
resources within the group. For example, if you elect to move a group from one node to
another, the resources within the group will be taken offline, and ownership of the group will
be transferred to the other node. On receiving ownership, the node will attempt to bring
resources online, according to dependencies defined for the resources. If resources fail to go
online, MSCS attempts again to bring them online. After repeated failures, the failing resource
or resources may affect the group and cause the group to transition back to the previous
node. Eventually, if failures continue, the group or affected resources may be taken offline.
You can configure the number of attempts and allowed failures through resource and group
properties.
1. When you experience problems with group or resource failover, evaluate which
resource or resources may be failing. Determine why the resource won't go
online.
2. Check resource dependencies for proper configuration and make sure they
are available.
3. Also, make sure that the "Possible Owners" list includes both nodes.
4. If resource properties do not appear to be part of the problem, check the event log
or cluster logfile for details.
Information: the "Preferred Owners" list is designed for automatic failback or initial group
placement within the cluster. In a two-node cluster, this list should only contain the name of
the preferred node for the group, and should not contain multiple entries.
To move a group from one node to another:
1. You must have administrative rights to run Cluster Administrator.

2. The destination node must be online and the cluster service started.
3. The state of the node must be online and not Paused.
4. Both cluster nodes should be listed in the Possible Owners list for the
resources within the group.
5. Also, to move a group, resources within the group cannot be in a pending state. To
initiate a Move Group request, resources must be in one of the following three
states: online, offline, or failed.

Symptoms and causes:

A resource fails, but is A resource may depend on In the resource Properties dialog
not brought back another resource that has box, make sure that the Do not
online. failed restart is clear. If the resource
needs another resource in order to
function, and if the second resource
fails, confirm that the dependencies
are correctly configured.
You cannot bring a The resource is not properly Make sure the application or service
resource online. installed. associated with the resource is
properly installed
The resource is not properly Make sure the properties are set
configured. correctly for the resource.
You cannot bring the You may not have restarted Make sure that you restarted all
default physical disk the servers after installing servers after installing the Cluster
resource online in the Cluster service service
Cluster Administrator. There may be hardware Make sure that there are no
errors or transport problems hardware errors or transport
problems. Using Event Viewer, look
in the event log for disk I/O error
messages or indications of
problems with the communications
transport.
One or more SCSI adapters Make sure that the SCSI adapters
on the shared SCSI bus are are configured correctly.
configured incorrectly.
The shared SCSI bus Make sure that the shared SCSI bus
exceeds the maximum cable does not exceed the maximum
length. cable length.
The shared SCSI bus is Make sure that the shared SCSI bus
improperly terminated is properly terminated
The disk is not supported. Make sure that the disk hardware or
firmware revision level is not
outdated.
Duplicate SCSI IDs have Verify that each SCSI device has a
been specified on the shared unique ID. SCSI controller IDs are
SCSI bus. preset to seven. Reset one SCSI
controller ID to six.

If you move your SCSI bus In order to accommodate these

adapter to another I/O slot, changes, make sure that your
add or remove bus adapters, shared SCSI bus adapter has been
or install a new version of the properly reconfigured.
bus adapter driver, the
cluster software may not be
able to access disks on your
shared SCSI bus
Windows 2000 is incorrectly Verify that Windows 2000 can
configured to access the detect the shared SCSI bus adapter
shared SCSI bus and that the SCSI IDs for the
adapter and disks are listed. ( open
Control Panel and double-click
SCSI Adapters)
A resource in the group may Determine if a resource in the group
be continually failing. is continually failing. If the node can,
it will bring the resource back up
without failing over the group. If the
resource continually fails but does
not fail over, make sure that the
resource property Restart and
affect the group is selected. Also,
check the Restart Threshold and
Restart Period settings, which are
also in the resource Properties
dialog box.
A group failed over but The failback policies of both Make sure that the Prevent
did not fail back. the group and the resources failback check box is clear in the
may not be properly group Properties dialog box. If the
configured. Allow failback check box is
selected, be sure to wait long
enough for the group to fail back.
Check these settings for all affected
resources within a group. Because
groups fail over as a whole, one
resource that is prevented from
failing back affects the entire group.
The node to which you want Make sure that the node to which
the group to fail back is not you want the group to fail back is
configured as the preferred configured as the preferred owner of
owner of the group. the group. If not, the Cluster service
leaves the group on the node to
which they failed over.
The entire group failed A node is offline. Make sure that the node is not
and has not restarted. offline.


The group has failed The group may have exceeded its
repeatedly failover threshold or its failover
period. Try to bring the resources
online individually (following the
correct sequence of dependencies)
to determine which resource is
causing the problem. Or, create a
temporary resource group (for
testing purposes) and move the
resources to it, one at a time.
3.8 Quorum Resources Failure

Symptoms and solutions:

Quorum resource The resource is not physically Make sure that the resource is
does not start connected to the server. physically connected to the server
The devices are not properly Make sure that the devices are
terminated. properly terminated.
The problem is with the Turn off the SCSI devices and check
hardware configuration. the SCSI IDs of the devices. Make
sure that the IDs are not both set to 7
(the default).
Quorum resource The disk on the shared bus If the disk on the shared bus holding
fails. holding the quorum resource the quorum resource fails and cannot
has failed. be brought online, the Cluster service
cannot start. To correct this situation,
use the fixquorum option. For more
information, read the paragraph below.
Quorum log This may occur for a variety If the quorum log is corrupted, the
becomes corrupted. of reasons. Cluster service attempts to correct the
problem by resetting the log file. In this
case, the Cluster service writes the
following message in the
Windows 2000 system log:
The log file [name] was found to be
corrupt. An attempt will be made to
reset it.
If the quorum log cannot be reset, the
Cluster service cannot start.
If the Cluster service fails to detect that

the quorum log is corrupted, the

Cluster service may fail to start. In this
case, there may be an
"ERROR_CLUSTERLOG_CORRUPT"
message in the system log.
To correct this, you must use the
noquorumlogging option. For more
information read the paragraph below.
If the Cluster Service won't start because of a quorum disk failure, check the
corresponding device. Quorum access problem is usually caused by connectivity or
authentication issues. If this is not the case, execute the procedure below.
This operation is complex and has a great impact on cluster, so it’s recommended to
realize it with Microsoft Product Support Services’ assistance.
Recover a failed quorum disk:
You can check the status of the quorum device by starting the service with the -fixquorum
switch, and attempt to bring the quorum disk online, or change the quorum location for the
service.
When you use the fixquorum option to start the Cluster service, only cluster name and
cluster IP resources are set online. To recover a failed quorum resource:
1. Start Cluster Service with the -fixquorum option on a single node :
a. Start the Services snap-in. Click Start , point to Programs , click Administrative
Tools , and then click Services .
b. Right-click and select the properties of the Cluster Service.
c. In the Start Parameters box, type: /fix quorum
d. Then press the Start button.
2. Use Cluster Administrator to configure the Cluster Service to use a different disk on the
shared bus for the quorum resource.
3. To view or change the quorum drive settings, right-click the cluster name at the top of the
tree, listed on the left portion of the Cluster Administrator window, and select Properties.
4. The Cluster Properties window contains three different tabs, one of which is for the
quorum disk,
5. From this tab, you may view or change quorum disk settings. You may also redesignate
the quorum resource.
Recover a failed quorum log:
This operation is complex and has a great impact on cluster, so it’s recommended to
realize it with Microsoft Product Support Services’ assistance.

When this occurs, a message is logged in the eventlog :

The log file D:\MSCS\quolog.log was found to be corrupt. An attempt will be
made to reset it, or you should use the Cluster Administrator utility to
adjust the maximum size.
Remark : If the error message occurs after you restore the system state on a computer
that has lost the quorum log, the quorum information is copied to
%SystemRoot%\Cluster\Cluster_backup. You can then use the Clusrest.exe tool from the
Resource Kit to restore this information to the quorum disk.
If you have a backup of the system state on one of the computers after the last
changes were made to the cluster, you can restore the quorum by restoring this
information.
If you do not have a backup of the Quorum log file, recreate a new quorum log file based
on the cluster configuration information in the local system's cluster hive by starting the
Cluster Service with the ResetQuorumLog switch :
1. Start the Services snap-in. Click Start, point to Programs , click Administrative Tools , and
then click Services .
2. Right-click and select the properties of the Cluster Service.
3. In the Start Parameters box, type: /ResetQuorumLog
4. Then press the Start button.
3.9 Network Name Resource Does Not Go Online

There are various causes that can prevent a network name resource from going online. Many
causes may be clearly indicated in the system event log. Potential causes may include:
 A duplicate name on the network from an offending computer.

 Static WINS entries for the network name.
 A malfunctioning switch or router.
 An incorrect TCP/IP configuration for one or more network adapters.
 An incorrect setting for the RequireDNS property.
3.10 Physical Disk Resource Problem

Problems with physical disk resources are usually hardware related. Cables, termination, or
SCSI host adapter configuration may cause problems with failover, or may cause premature
failure of the resource. The system event log may often show events related to physical disk
or controller problems. However, some cable or termination problems may not yield such
helpful information.

1. It is important to verify the configuration of the Fiber Channel and attached

devices, whenever you detect trouble with one of these devices.
2. BIOS or firmware problems might also be factors.
3.11 Client Connectivity Problem
3.11.1 Clients have intermittent Connectivity Based on Group Ownership

If clients successfully connect to clustered resources only when a specific node is the owner,
a few possible problems could lead to this condition. To define more precisely the problem:
1. Check the system event log on each server for possible errors.
2. Check to make sure that the group has at least one IP address resource and one
network name resource,
3. Check that clients use one of these to access the resource or resources within
the group. If clients connect with any other network name or IP address, they may not
be accessing the correct server in the event that ownership of the resources changes.
As a result of improper addressing, access to these resources may appear limited to a
particular node.
4. If you are able to confirm that clients use proper addressing for the resource or
resources, check the IP address and network name resources to see that they are
online.
5. Check network connectivity with the server that owns the resources. For example,
try some of the following techniques:
From the server
PING server's primary adapter IP address (on client network)
PING IP address of the group
PING Network Name of the group
PING Router/Gateway between client and server (if any)
PING Client IP address
If the above tests work correctly up to the router/gateway check, the problem may be
elsewhere on the network because you have connectivity with the other server and
local addresses. If tests complete up to the client IP address test, there may be a client
configuration or routing problem.
From the client:

PING Client IP address
PING Router/Gateway between client and server (if any)
PING server's primary adapter IP address (on client network)
PING IP address of the group
PING Network Name of the group

If the tests from the server all pass, but you experience failures performing tests from
the client, there may be client configuration problems. If all tests complete except the
test using the network name of the group, there may be a name resolution problem.
This may be related to client configuration, or it may be a problem with the client's
designated DNS server.
3.11.2 Clients do not Have any Connection with the Cluster

If clients lose connectivity with both cluster nodes:
1. Check to make sure that the Cluster Service is running on each node.
2. Check the system event log for possible errors.
3. Check network connectivity between cluster nodes, and with other network devices,
by using the procedure in the previous section.
4. If the Cluster Service is running, and there are no apparent connectivity problems
between the two servers, there is likely a network or client configuration problem that
does not directly involve the cluster : Check to make sure the client uses the TCP/IP
protocol and has a valid IP address on the network
5. Make sure that the client is using the correct network name or IP address to access
the cluster.
3.11.3 Clients have Problems Accessing Data Through a File Share

If clients experience problems accessing cluster file shares:
1. Check the resource and make sure it is online, and that any dependent resources
(disks, network names, and so on) are online,
2. Check the system event log for possible errors,
3. Check network connectivity between the client and the server that owns the resource.
4. If the data for the share is on a shared drive (using a physical disk resource), make
sure that the file share resource has a dependency declared for the physical disk
resource.
5. You can reset the file share by toggling the file share resource offline and back online
again.
6. Cluster file shares behave essentially the same as standard file shares. So, make
sure that clients have appropriate access at both the file system level and the share
level.
7. Make sure that the server has the proper number of client access licenses loaded for
the clients connecting, in the event that the client cannot connect because of
insufficient available connections.

3.11.4 Client Experience Intermittent Access

Network adapter configuration is one possible cause of intermittent access to the cluster,
and of premature failover:
1. Some autosense settings for network speed can spontaneously redetect network speed.
During the detection, network traffic through the adapter may be compromised. For best
results, set the network speed manually to avoid the recalibration.
8. Make sure to use the correct network adapter drivers. Some adapters may require
special drivers, although they may be detected as a similar device.

4 Appendix: MSCS Event Messages
4.1 Event ID 1000

Source ClusSvc
Description Microsoft Cluster Server suffered an unexpected fatal error

at line ### of source module %path%. The error code was
1006.
Problem Messages similar to this may occur in the event of a fatal

error that may cause the Cluster Service to terminate on the
node that experienced the error.
Solution Check the system event log and the cluster diagnostic logfile
for additional information. It is possible that the cluster
service may restart itself after the error. This event message
may indicate serious problems that may be related to
hardware or other causes.
4.2 Event ID 1002

Source ClusSvc
Description Microsoft Cluster Server handled an unexpected error at line

528 of source module
G:\Nt\Private\Cluster\Resmon\Rmapi.c. The error code was
5007.
Problem Messages similar to this may occur after installation of

Microsoft Cluster Server. If the cluster service starts and
successfully forms or joins the cluster, they may be ignored.
Otherwise, these errors may indicate a corrupt quorum logfile
or other problem.
Solution Ignore the error if the cluster appears to be working properly.

Otherwise, you may want to try creating a new quorum
logfile using the -noquorumlogging or
-fixquorum parameters as documented in the Microsoft
Cluster Server Administrator's Guide.

4.3 Event ID 1006

Source ClusSvc
Description Microsoft Cluster Server was halted because of a cluster

membership or communications error. The error code was 4.
Problem An error may have occurred between communicating cluster

nodes that affected cluster membership. This error may
occur if nodes lose the ability to communicate with each
other.
Solution Check network adapters and connections between nodes.

Check the system event log for errors. There may be a
network problem preventing reliable communication
between cluster nodes.
4.4 Event ID 1007

Source ClusSvc
Description A new node, "ComputerName", has been added to the

cluster.
Information The Microsoft Cluster Server Setup program ran on an

adjacent computer. The setup process completed, and the
node was admitted for cluster membership. No action
required.
4.5 Event ID 1009

Source ClusSvc
Description Microsoft Cluster Server could not join an existing cluster

and could not form a new cluster. Microsoft Cluster Server
has terminated.
Problem The cluster service started and attempted to join a cluster.

The node may not be a member of an existing cluster
because of eviction by an administrator. After a cluster node
has been evicted from the cluster, the cluster software must
be removed and reinstalled if you want it to rejoin the
cluster. And, because a cluster already exists with the same
cluster name, the node could not form a new cluster with
the same name.
Solution Remove MSCS from the affected node, and reinstall MSCS
on that system if desired.

4.6 Event ID 1010

Source ClusSvc
Description Microsoft Cluster Server is shutting down because the

current node is not a member of any cluster. Microsoft
Cluster Server must be reinstalled to make this node a
member of a cluster.
Problem The cluster service attempted to run but found that it is not
a member of an existing cluster. This may be due to eviction
by an administrator or incomplete attempt to join a cluster.
This error indicates a need to remove and reinstall the
cluster software.
Solution Remove MSCS from the affected node, and reinstall MSCS
on that server if desired.
4.7 Event ID 1011

Source ClusSvc
Description Cluster Node "ComputerName" has been evicted from the

cluster.
Information A cluster administrator evicted the specified node from the

cluster.
4.8 Event ID 1012

Source ClusSvc
Description Microsoft Cluster Server did not start because the current
version of Windows is not correct.
4.9 Event ID 1015

Source ClusSvc
Description No checkpoint record was found in the logfile

W:\Mscs\Quolog.log; the checkpoint file is invalid or was
deleted.
Problem The Cluster Service experienced difficulty reading data from

the quorum logfile. The logfile could be corrupted.
Solution If the Cluster Service fails to start because of this problem,

try manually starting the cluster service with the
-noquorumlogging parameter. If you need to adjust the
quorum disk designation, use the -fixquorum startup
parameter when starting the cluster service. Both of these
parameters are covered in the MSCS Administrator's Guide.

4.10 Event ID 1016

Source ClusSvc
Description Microsoft Cluster Server failed to obtain a checkpoint from

the cluster database for log file W:\Mscs\Quolog.log.
Problem The cluster service experienced difficulty establishing a

checkpoint for the quorum logfile. The logfile could be
corrupt, or there may be a disk problem.
Solution You may need to use procedures to recover from a corrupt

quorum logfile. You may also need to run chkdsk on the
volume to ensure against file system corruption.
4.11 Event ID 1019

Source ClusSvc
Description The log file D:\MSCS\Quolog.log was found to be corrupt.

An attempt will be made to reset it, or you should use the
Cluster Administrator utility to adjust the maximum size.
Problem The quorum logfile for the cluster was found to be corrupt.
The system will attempt to resolve the problem.
Solution The system will attempt to resolve this problem. This error
may also be an indication that the cluster property for
maximum size should be increased through the Quorum
tab. You can manually resolve this problem by using the
-noquorumlogging parameter.
4.12 Event ID 1021

Source ClusSvc
Description There is insufficient disk space remaining on the quorum

device. Please free up some space on the quorum device. If
there is no space on the disk for the quorum log files then
changes to the cluster registry will be prevented.
Problem Available disk space is low on the quorum disk and must be
resolved.
Solution Remove data or unnecessary files from the quorum disk so

that sufficient free space exists for the cluster to operate. If
necessary, designate another disk with adequate free space
as the quorum device.

4.13 Event ID 1022

Source ClusSvc
Description There is insufficient space left on the quorum device. The

Microsoft Cluster Server cannot start.
Problem Available disk space is low on the quorum disk and is

preventing the startup of the cluster service.
Solution Remove data or unnecessary files from the quorum disk so

that sufficient free space exists for the cluster to operate. If
necessary, use the -fixquorum startup option to start one
node. Bring the quorum resource online and adjust free
space or designate another disk with adequate free space as
the quorum device.
4.14 Event ID 1023

Source ClusSvc
Description The quorum resource was not found. The Microsoft Cluster
Server has terminated.
Problem The device designated as the quorum resource could not be

found. This could be due to the device having failed at the
hardware level, or that the disk resource corresponding to
the quorum drive letter does not match or no longer exists.
Solution Use the -fixquorum startup option for the cluster service.
Investigate and resolve the problem with the quorum disk.
If necessary, designate another disk as the quorum device
and restart the cluster service before starting other nodes.
4.15 Event ID 1024

Source ClusSvc
Description The registry checkpoint for cluster resource "resourcename"

could not be restored to registry key registrykeyname.
The resource may not function correctly. Make sure that no
other processes have open handles to registry keys in this
registry subkey.
Problem The registry key checkpoint imposed by the cluster service

failed because an application or process has an open handle
to the registry key or subkey.
Solution Close any applications that may have an open handle to the
registry key so that it may be replicated as configured with
the resource properties. If necessary, contact the application
vendor about this problem.
4.16 Event ID 1034

Source ClusSvc
Description The disk associated with cluster disk resource resource

name could not be found. The expected signature of the
disk was signature. If the disk was removed from the
cluster, the resource should be deleted. If the disk was
replaced, the resource must be deleted and created again to
bring the disk online. If the disk has not been removed or
replaced, it may be inaccessible at this time because it is

reserved by another cluster node.
Problem The cluster service attempted to mount a physical disk

resource in the cluster. The cluster disk driver could not
locate a disk with this signature. The disk may be offline or
may have failed. This error may also occur if the drive has
been replaced or reformatted. This error may also occur if
another system continues to hold a reservation for the disk.
Solution Determine why the disk is offline or nonoperational. Check

cables, termination, and power for the device. If the drive
has failed, replace the drive and restore the resource to the
same group as the old drive. Remove the old resource.
Restore data from a backup and adjust resource
dependencies within the group to point to the new disk
resource.
4.17 Event ID 1035

Source ClusSvc
Description Cluster disk resource %1 could not be mounted.
Problem The cluster service attempted to mount a disk resource in

the cluster and could not complete the operation. This could
be due to a file system problem, hardware issue, or drive
letter conflict.
Solution Check for drive letter conflicts, evidence of file system

issues in the system event log, and for hardware problems.
4.18 Event ID 1036

Source ClusSvc
Description Cluster disk resource "resourcename" did not respond to a

SCSI inquiry command.
Problem The disk did not respond to the issued SCSI command. This
usually indicates a hardware problem.
Solution Check SCSI bus configuration. Check the configuration of

SCSI adapters and devices. This may indicate a
misconfigured or failing device.

4.19 Event ID 1037

Source ClusSvc
Description Cluster disk resource %1 has failed a filesystem check.

Please check your disk configuration.
Problem The cluster service attempted to mount a disk resource in

the cluster. A filesystem check was necessary and failed
during the process.
Solution Check cables, termination, and device configuration. If the

drive has failed, replace the drive and restore data. This
may also indicate a need to reformat the partition and
restore data from a current backup.
4.20 Event ID 1038

Source ClusSvc
Description Reservation of cluster disk "Disk W:" has been lost. Please
check your system and disk configuration.
Problem The cluster service had exclusive use of the disk, and lost
the reservation of the device on the shared SCSI bus.
Solution The disk may have gone offline or failed. Another node may
have taken control of the disk, or a SCSI bus reset
command was issued on the bus that caused a loss of
reservation.
4.21 Event ID 1040

Source ClusSvc
Description Cluster generic service "ServiceName" could not be found.
Problem The cluster service attempted to bring the specified generic

service resource online. The service could not be located
and could not be managed by the Cluster Service.
Solution Remove the generic service resource if this service is no

longer installed. The parameters for the resource may be
invalid. Check the generic service resource properties and
confirm correct configuration.

4.22 Event ID 1041

Source ClusSvc
Description Cluster generic service "ServiceName" could not be started.
Problem The cluster service attempted to bring the specified generic

service resource online. The service could not be started at
the operating system level.
Solution Remove the generic service resource if this service is no

longer installed. The parameters for the resource may be
invalid. Check the generic service resource properties and
confirm correct configuration. Check to make sure the
service account has not expired, that it has the correct
password, and has necessary rights for the service to start.
Check the system event log for any related errors.
4.23 Event ID 1042

Source ClusSvc
Description Cluster generic service "resourcename" failed.
Problem The service associated with the mentioned generic service

resource failed.
Solution Check the generic service properties and service

configuration for errors. Check system and application event
logs for errors.
4.24 Event ID 1043

Source ClusSvc
Description The NetBIOS interface for "IP Address" resource has failed.
Problem The network adapter for the specified IP address resource

has experienced a failure. As a result, the IP address is
either offline, or the group has moved to a surviving node in
the cluster.
Solution Check the network adapter and network connection for

problems. Resolve the network-related problem.

4.25 Event ID 1044

Source ClusSvc
Description Cluster IP Address resource %1 could not create the

required NetBios interface.
Problem The cluster service attempted to initialize an IP Address

resource and could not establish a context with NetBios.
Solution This could be a network adapter- or network adapter driver-

related issue. Make sure the adapter is using a current
driver and the correct driver for the adapter. If this is an
embedded adapter, check with the OEM to determine if a
specific OEM version of the driver is a requirement. If you
already have many IP Address resources defined, make sure
you have not reached the NetBios limit of 64 addresses. If
you have IP Address resources defined that do not have a
need for NetBios affiliation, use the IP Address private
property to disable NetBios for the address. This option is
available in SP4 and helps to conserve NetBios address
slots.
4.26 Event ID 1045

Source ClusSvc
Description Cluster IP address "IP address" could not create the

required TCP/IP Interface..
Problem The cluster service tried to bring an IP address online. The

resource properties may specify an invalid network or
malfunctioning adapter. This error may occur if you replace
a network adapter with a different model and continue to
use the old or inappropriate driver. As a result, the IP
address resource cannot be bound to the specified network.
Solution Resolve the network adapter problem or change the

properties of the IP address resource to reflect the proper
network for the resource.
4.27 Event ID 1046

Source ClusSvc
Description Cluster IP Address resource %1 cannot be brought online

because the subnet mask parameter is invalid. Please check
your network configuration.
Problem The cluster service tried to bring an IP address resource

online but could not do so. The subnet mask for the
resource is either blank or otherwise invalid.
Solution Correct the subnet mask for the resource.

4.28 Event ID 1047

Source ClusSvc
Description Cluster IP Address resource %1 cannot be brought online

because the IP address parameter is invalid. Please check
your network configuration.
Problem The cluster service tried to bring an IP address resource

online but could not do so. The IP address property contains
an invalid value. This may be caused by incorrectly creating
the resource through an API or the command line interface.
Solution Correct the IP address properties for the resource.
4.29 Event ID 1048

Source ClusSvc
Description Cluster IP address, "IP address," cannot be brought online

because the specified adapter name is invalid.

resource properties may specify an invalid network or a
malfunctioning adapter. This error may occur if you replace
a network adapter with a different model. As a result, the IP
address resource cannot be bound to the specified network.
Solution Resolve the network adapter problem or change the

properties of the IP address resource to reflect the proper
network for the resource.
4.30 Event ID 1049

Source ClusSvc
Description Cluster IP address "IP address" cannot be brought online

because the address IP address is already present on the
network. Please check your network configuration.

address is already in use on the network and cannot be
registered. Therefore, the resource cannot be brought
online.
Solution Resolve the IP address conflict, or choose another address

for the resource.

4.31 Event ID 1050

Source ClusSvc
Description Cluster Network Name resource %1 cannot be brought

online because the name %2 is already present on the
network. Please check your network configuration.
Problem The cluster service tried to bring a Network Name resource

online. The name is already in use on the network and
cannot be registered. Therefore, the resource cannot be
brought online.
Solution Resolve the conflict, or choose another network name.
4.32 Event ID 1051

Source ClusSvc
Description Cluster Network Name resource "resourcename" cannot be

brought online because it does not depend on an IP address
resource. Please add an IP address dependency.
Problem The cluster service attempted to bring the network name

resource online, and found that a required dependency was
missing.
Solution Microsoft Cluster Server requires an IP address dependency

for network name resource types. Cluster Administrator
presents a pop-up message if you attempt to remove this
dependency without specifying another like dependency. To
resolve this error, replace the IP address dependency for
this resource. Because it is difficult to remove this
dependency, Event 1051 may be an indication of problems
within the cluster registry. Check other resources for
possible dependency problems.
4.33 Event ID 1052

Source ClusSvc
Description Cluster Network Name resource "resourcename" cannot be

brought online because the name could not be added to the
system.
Problem The cluster service attempted to bring the network name

resource online but the attempt failed.
Solution Check the system event log for errors. Check network
adapter configuration and operation. Check TCP/IP
configuration and name resolution methods. Check DNS
servers for possible database problems or invalid static
mappings.

4.34 Event ID 1053

Source ClusSvc
Description Cluster File Share "resourcename" cannot be brought online

because the share could not be created.
Problem The cluster service attempted to bring the share online, but
the attempt to create the share failed.
Solution Make sure the Server service is started and functioning

properly. Check the path for the share. Check ownership and
permissions on the directory. Check the system event log for
details. Also, if diagnostic logging is enabled, check the log
for an entry related to this failure. Use the net helpmsg
errornumber command with the error code found in the log
entry.
4.35 Event ID 1054

Source ClusSvc
Description Cluster File Share %1 could not be found.
Problem The share corresponding to the named File Share resource

was deleted using a mechanism other than Cluster
Administrator. This may occur if you select the share with
Explorer and choose 'Not Shared'.
Solution Delete shares or take them offline via Cluster Administrator

or the command line program CLUSTER.EXE.
4.36 Event ID 1055

Source ClusSvc
Description Cluster File Share "sharename" has failed a status check.
Problem The cluster service (through resource monitors) periodically

monitors the status of cluster resources. In this case, a file
share failed a status check. This could mean that someone
attempted to delete the share through Windows NT Explorer
or Server Manager, instead of through Cluster Administrator.
This event could also indicate a problem with the Server
service, or access to the shared directory.
Solution Check the system event log for errors. Check the cluster
diagnostic log (if it is enabled) for status codes that may be
related to this event. Check the resource properties for
proper configuration. Also, make sure the file share has
proper dependencies defined for related resources.

4.37 Event ID 1056

Source ClusSvc
Description The cluster database on the local node is in an invalid state.

Please start another node before starting this node.
Problem The cluster database on the local node may be in a default

state from the installation process and the node has not
properly joined with an existing node.
Solution Make sure another node of the same cluster is online first
before starting this node. Upon joining with another cluster
node, the node will receive an updated copy of the official
cluster database and should alleviate this error.
4.38 Event ID 1057

Source ClusSvc
Description The cluster service CLUSDB could not be opened.
Problem The Cluster Service tried to open the CLUSDB registry hive
and could not do so. As a result, the cluster service cannot
be brought online.
Solution Check the cluster installation directory for the existence of a

file called CLUSDB. Make sure the registry file is not held
open by any applications, and that permissions on the file
allow the cluster service access to this file and directory.
4.39 Event ID 1058

Source ClusSvc
Description The Cluster Resource Monitor could not load the DLL %1 for
resource type %2.
Problem The Cluster Service tried to bring a resource online that

requires a specific resource DLL for the resource type. The
DLL is either missing, corrupt, or an incompatible version.
As a result, the resource cannot be brought online.
Solution Check the cluster installation directory for the existence of

the named resource DLL. Make sure the DLL exists in the
proper directory on both nodes.

4.40 Event ID 1059

Source ClusSvc
Description The Cluster Resource DLL %1 for resource type %2 failed to

initialize.
Problem The Cluster Service tried to load the named resource DLL
and it failed to initialize. The DLL could be corrupt, or an
incompatible version. As a result, the resource cannot be
brought online.
Solution Check the cluster installation directory for the existence of

the named resource DLL. Make sure the DLL exists in the
proper directory on both nodes and is of proper version. If
the DLL is clusres.dll, this is the default resource DLL that
comes with MSCS. Check to make sure the version/date
stamp is equivalent to or has a later date than the version
contained in the service pack in use.
4.41 Event ID 1061

Source ClusSvc
Description Microsoft Cluster Server successfully formed a cluster on

this node.
Information This informational message indicates that an existing cluster

of the same name was not detected on the network, and
that this node elected to form the cluster and own access to
the quorum disk
4.42 Event ID 1062

Source ClusSvc
Description Microsoft Cluster Server successfully joined the cluster.
Information When the Cluster Service started, it detected an existing

cluster on the network and was able to successfully join the
cluster. No action needed.
4.43 Event ID 1063

Source ClusSvc
Description Microsoft Cluster Server was successfully stopped.
Information The Cluster Service was stopped manually by the

administrator.

4.44 Event ID 1064

Source ClusSvc
Description The quorum resource was changed. The old quorum

resource could not be marked as obsolete. If there is a
partition in time, you may lose changes to your database,
because the node that is down will not be able to get to the
new quorum resource.
Problem The administrator changed the quorum disk designation

without all cluster nodes present.
Solution When other cluster nodes attempt to join the existing

cluster, they may not be able to connect to the quorum disk
and may not participate in the cluster, because their
configuration indicates a different quorum device. For any
nodes that meet this criterion, you may need to use the
-fixquorum option to start the Cluster Service on these
nodes and make configuration changes.
4.45 Event ID 1065

Source ClusSvc
Description Cluster resource %1 failed to come online.
Problem The cluster service attempted to bring the resource online,

but the resource could not reach an online status. The
resource may have exhausted the timeout period allotted for
the resource to reach an online state.
Solution Check any parameters related to the resource and check the
event log for details.
4.46 Event ID 1066

Source ClusSvc
Description Cluster disk resource resourcename is corrupted. Running

Chkdsk /F to repair problems.
Problem The Cluster Service detected corruption on the indicated

disk resource and started Chkdsk /f on the volume to
repair the structure. The Cluster Service will automatically
perform this operation, but only for cluster-defined disk
resources (not local disks).
Solution Scan the event log for additional errors. The disk corruption
could be indicative of other problems. Check related
hardware and devices on the shared bus and ensure proper
cables and termination. This error may be a symptom of
failing hardware or a deteriorating drive.

4.47 Event ID 1067

Source ClusSvc
Description Cluster disk resource %1 has corrupt files. Running Chkdsk

/F to repair problems.
Problem The Cluster Service detected corruption on the indicated

disk resource and started Chkdsk /f on the volume to
repair the structure. The Cluster Service will automatically
perform this operation, but only for cluster-defined disk
resources (not local disks).
Solution Scan the event log for additional errors. The disk corruption
could be indicative of other problems. Check related
hardware and devices on the shared bus and ensure proper
cables and termination. This error may be a symptom of
failing hardware or a deteriorating drive.
4.48 Event ID 1068

Source ClusSvc
Description The cluster file share resource resourcename failed to start.

Error 5.
Problem The file share cannot be brought online. The problem may
be caused by permissions to the directory or disk in which
the directory resides. This may also be related to permission
problems within the domain.
Solution Check to make sure that the Cluster Service account has
rights to the directory to be shared. Make sure a domain
controller is accessible on the network. Make sure
dependencies for the share and for other resource in the
group are set correctly. Error 5 translates to "Access
Denied."
4.49 Event ID 1069

Source ClusSvc
Description Cluster resource "Disk G:" failed.
Problem The named resource failed and the cluster service logged
the event. In this example, a disk resource failed.
Solution For disk resources, check the device for proper operation.
Check cables, termination, and logfiles on both cluster
nodes. For other resources, check resource properties for
proper configuration, and check to make sure dependencies
are configured correctly. Check the diagnostic log (if it is
enabled) for status codes corresponding to the failure.

4.50 Event ID 1070

Source ClusSvc
Description Cluster node attempted to join the cluster but failed with
error 5052.
Problem The cluster node attempted to join an existing cluster but

was unable to complete the process. This problem may
occur if the node was previously evicted from the cluster.
Solution If the node was previously evicted from the cluster, you
must remove and reinstall MSCS on the affected server.
4.51 Event ID 1071

Source ClusSvc
Description Cluster node 2 attempted to join but was refused. Error

5052.
Problem Another node attempted to join the cluster and this node
refused the request.
Solution If the node was previously evicted from the cluster, you
must remove and reinstall MSCS on the affected server.
Look in Cluster Administrator to see if the other node is
listed as a possible cluster member.
4.52 Event ID 1073

Source ClusSvc
Description Microsoft Cluster Server was halted to prevent an

inconsistency within the cluster. The error code was 5028.
Problem The cluster service on the affected node was halted because
of some kind of inconsistency between cluster nodes.
Solution Check connectivity between systems. This error may be an

indication of configuration or hardware problems.
4.53 Event ID 1077

Source ClusSvc
Description The TCP/IP interface for cluster IP address resourcename

has failed.
Problem The IP address resource depends on the proper operation of

a specific network interface as configured in the resource
properties. The network interface failed.
Solution Check the system event log for errors. Check the network
adapter for proper operation and replace the adapter if
necessary. Check to make sure the proper adapter driver is
loaded for the device and check for newer versions of the
driver.
4.54 Event ID 1080

Source ClusSvc
Description The Microsoft Cluster Server could not write file

W:\MSCS\Chk7f5.tmp. The disk may be low on disk space,

or some other serious condition exists.
Problem The cluster service attempted to create a temporary file in

the MSCS directory on the quorum disk. Lack of disk space
or other factors prevented successful completion of the
operation.
Solution Check the quorum drive for available disk space. The file
system may be corrupted or the device may be failing.
Check file system permissions to ensure that the cluster
service account has full access to the drive and directory.
4.55 Event ID 1093

Source ClusSvc
Description Node %1 is not a member of cluster %2. If the name of the

node has changed, Microsoft Cluster Server must be
reinstalled.
Problem The cluster service attempted to start but found that it was
not a valid member of the cluster.
Solution Microsoft Cluster Server may need to be reinstalled on this

node. If this is the result of a server name change, be sure
to evict the node from the cluster (from an operational
node) prior to reinstallation.
4.56 Event ID 1096

Source ClusSvc
Description Microsoft Cluster Server cannot use network adapter %1

because it does not have a valid IP address assigned to it.
Problem The network configuration for the adapter has changed and
the cluster service cannot make use of the adapter for the
network that was assigned to it.
Solution Check the network configuration. If a DHCP address was

used for the primary address of the adapter, the address
may have been lost. For best results, use a static address.
4.57 Event ID 1097

Source ClusSvc
Description Microsoft Cluster Server did not find any network adapters
with valid IP addresses installed in the system. The node will
not be able to join a cluster.
Problem The network configuration for the system needs to be

corrected to match the same connected networks as the
other node of the cluster.
Solution Check the network configuration and make sure it agrees

with the working node of the cluster. Make sure the same
networks are accessible from all systems in the cluster.
4.58 Event ID 1098

Source ClusSvc
Description The node is no longer attached to cluster network

network_id by adapter adapter. Microsoft Cluster Server will

delete network interface interface from the cluster
configuration.
Information The Cluster Service observed a change in network

configuration that might be induced by a change of adapter
type or by removal of a network. The network will be
removed from the list of available networks.
4.59 Event ID 1100

Source ClusSvc
Description Microsoft Cluster Server discovered that the node is now

attached to cluster network network_id by adapter adapter.
A new cluster network interface will be added to the cluster
configuration.
Information The Cluster Service noticed a new network accessible by the

cluster nodes, and has added the new network to the list of
accessible networks.
4.60 Event ID 1102

Source ClusSvc
Description Microsoft Cluster Server discovered that the node is

attached to a new network by adapter adapter. A new
network and network interface will be added to the cluster
configuration.
Information The cluster service noticed the addition of a new network.

The network will be added to list of available networks.
4.61 Event ID 1104

Source ClusSvc
Description Microsoft Cluster Server failed to update the configuration

for one of the nodes Network interfaces. The error code was
errorcode.
Problem The cluster service attempted to update a cluster node and

could not perform the operation.
Solution Use the net helpmsg errorcode command to find an

explanation of the underlying error. For example, error 1393
indicates that a corrupted disk caused the operation to fail.
4.62 Event ID 1105

Source ClusSvc
Description Microsoft Cluster Server failed to initialize the RPC services.

The error code was %1.
Problem The cluster service attempted to utilize required RPC

services and could not successfully perform the operation.
Solution Use the net helpmsg errorcode command to find an

explanation of the underlying error. Check the system event
log for other RPC related errors or performance problems.

4.63 Event ID 1107

Source ClusSvc
Description Cluster node node name failed to make a connection to the

node over network network name. The error code was 1715.
Problem The cluster service attempted to connect to another cluster

node over a specific network and could not establish a
connection. This error is a warning message.
Solution Check to make sure that the specified network is available

and functioning correctly. If the node experiences this
problem, it may try other available networks to establish the
desired connection.
4.64 Event ID 1109

Source ClusSvc
Description The node was unable to secure its connection to cluster

node %1. The error code was %2. Check that both nodes
can communicate with their domain controllers.
Problem The cluster service attempted to connect to another cluster

node and could not establish a secure connection. This could
indicate domain connectivity problems.
Solution Check to make sure that the networks are available and
functioning correctly. This may be a symptom of larger
network problems or domain security issues.
4.65 Event ID 1115

Source ClusSvc
Description An unrecoverable error caused the join of node nodename

to the cluster to be aborted. The error code was errorcode.
Problem A node attempted to join the cluster but was unable to

obtain successful membership.
Solution Use the NET HELPMSG errorcode command to obtain

further description of the error that prevented the join
operation. For example, error code 1393 indicates that a
disk structure is corrupted and nonreadable. An error code
like this could indicate a corrupted quorum disk.

5 Appendix: Related Event Messages
5.1 Event ID 9
Source Disk
Description The device \Device\ScsiPort2 did not respond within the

timeout period.
Problem An I/O request was sent to a SCSI device and was not
serviced within acceptable time. The device timeout was
logged by this event.
Solution You may have a device or controller problem. Check SCSI

cables, termination, and adapter configuration. Excessive
recurrence of this event message may indicate a serious
problem that could indicate potential for data loss or
corruption. If necessary, contact your hardware vendor for
help troubleshooting this problem.
5.2 Event ID 101

Source W3SVC
Description The server was unable to add the virtual root "/" for the
directory "path" because of the following error: The system
cannot find the path specified. The data is the error.
Problem The World Wide Web Publishing service could not create a
virtual root for the IIS Virtual Root resource. The directory
path may have been deleted.
Solution Re-create or restore the directory and contents. Check the

resource properties for the IIS Virtual Root resource and
ensure that the path is correct. This problem may occur if
you had an IIS Virtual Root resource defined and then
uninstalled Microsoft Cluster Server without first deleting the
resource. In this case, you may evaluate and change virtual
root properties by using the Internet Service Manager.

5.3 Event ID 1004

Source DHCP
Description DHCP IP address lease "IP address" for the card with
network address "media access control Address" has been
denied.
Problem This system uses a DHCP-assigned IP address for a network

adapter. The system attempted to renew the leased address
and the DHCP server denied the request. The address may
already be allocated to another system. The DHCP server
may also have a problem. Network connectivity may be
affected by this problem.
Solution Resolve the problem by correcting DHCP server problems or

assigning a static IP address. For best results within a
cluster, use statically assigned IP addresses.
5.4 Event ID 1005

Source DHCP
Description DHCP failed to renew a lease for the card with network
address "MAC Address." The following error occurred: The
semaphore timeout period has expired.
Problem This system uses a DHCP assigned IP address for a network

adapter. The system attempted to renew the leased address
and was unable to renew the lease. Network operations on
this system may be affected.
Solution There may be a connectivity problem preventing access to

the DHCP server that leased the address, or the DHCP
server may be offline. For best results within a cluster, use
statically assigned IP addresses.
5.5 Event ID 2511

Source Server
Description The server service was unable to recreate the share

"Sharename" because the directory "path" no longer exists.
Problem The Server service attempted to create a share using the

specified directory path. This problem may occur if you
create a share (outside of Cluster Administrator) on a
cluster shared device. If the device is not exclusively
available to this computer, the server service cannot create
the share. Also, the directory may no longer exist or there
may be RPC related issues.
Solution Correct the problem by creating a shared resource through

Cluster Administrator, or correct the problem with the
missing directory. Check dates of RPC files in the system32
directory. Make sure they concur with those contained in the
service pack in use, or any hotfixes applied.
5.6 Event ID 4199

Source TCPIP
Description The system detected an address conflict for IP address "IP

address" with the system having network hardware address

"media access control address." Network operations on this

system may be disrupted as a result.
Problem Another system on the network may be using one of the

addresses configured on this computer.
Solution Resolve the IP address conflict. Check network adapter

configuration and any IP address resources defined within
the cluster.
5.7 Event ID 5719

Source Netlogon
Description No Windows NT Domain controller is available for domain

"domain." (This event is expected and can be ignored when
booting with the "No Net" hardware profile.) The following
error occurred: There are currently no logon servers
available to service the logon request.
Problem A domain controller for the domain could not be contacted.

As a result, proper authentication of accounts could not be
completed. This may occur if the network is disconnected or
disabled through system configuration.
Solution Resolve the connectivity problem with the domain controller

and restart the system.
5.8 Event ID 7000

Source Service Control Manager
Description The Cluster Service failed to start because of the following

error: The service did not start because of a logon failure.
Problem The service control manager attempted to start a service

(possibly ClusSvc). It could not authenticate the service
account. This error may be seen with Event 7013.
Solution The service account could not be authenticated. This may be

because of a failure contacting a domain controller, or
because account credentials are invalid. Check the service
account name and password and ensure that the account is
available and that credentials are correct. You may also try
running the cluster service from a command prompt (if
currently logged on as an administrator) by changing to the
%systemroot%\Cluster directory (or where you installed the
software) and typing ClusSvc -debug. If the service starts
and runs correctly, stop it by pressing CTRL+C and
troubleshoot the service account problem. This error may
also occur if network connectivity is disabled through the
system configuration or hardware profile. Microsoft Cluster
Server requires network connectivity.
5.9 Event ID 7013

Description Logon attempt with current password failed with the

following error: There are currently no logon servers
available to service the logon request.
More Info The description for this error message may vary somewhat
based on the actual error. For example, another error that
may be listed in the event detail might be: "Logon Failure:

unknown username or bad password."
Problem The service control manager attempted to start a service

(possibly ClusSvc). It could not authenticate the service
account with a domain controller.
Solution The service account may be in another domain, or this

system is not a domain controller. It is acceptable for the
node to be a nondomain controller, but the node needs
access to a domain controller within the domain as well as
the domain that the service account belongs to. Inability to
contact the domain controller may occur because of a
problem with the server, network, or other factors. This
problem is not related to the cluster software and must be
resolved before you start the cluster software. This error
may also occur if network connectivity is disabled through
the system configuration or hardware profile. Microsoft
Cluster Server requires network connectivity.
5.10 Event ID 7023

Description The Cluster Server service terminated with the following

error: The quorum log could not be created or mounted
successfully
Problem The Cluster Service attempted to start but could not gain
access to the quorum log on the quorum disk. This may be
because of problems gaining access to the disk or problems
joining a cluster that has already formed.
Solution Check the disk and quorum log for problems. If necessary,
check the cluster logfile for more information. There may be
other events in the system event log that may give more
information.

6 Appendix: Maintenance Tools
6.1 Windows 2000 tools
The tools presented below can be used for troubleshooting:
Tools Use to
Disk Management (compmgmt.msc) Determine whether a disk is available to a
particular node
If the disk can be selected under Disk
Management, it is online to the local system. If
the disk object appears dimmed, it is not
available for that node.
Services option in Administrative Tools Verify that the Cluster Service is running
Windows 2000 Explorer, My Computer, or Verify that a particular share has been exported
the Net View command from the server you expected
Event Viewer View and manage System, Security, and
Application event logs
Dr. Watson Detect, log, and diagnose application errors
Task Manager Monitor applications, tasks, and key
performance metrics; and view detailed
information on memory and CPU usage on
each application and process
Performance Monitor Monitor system details of application and
system behaviours, and monitor performance
Network Monitor Monitor and troubleshoot network connectivity
by capturing and analyzing network traffic
Windows Diagnostics (Winmsd.exe) Easily examine your system information on
device drivers, network usage, and system
resources, such as IRQ, DMA, and I/O
addresses

6.2 Windows 2000 Resource Kit tools
The tools presented below can be used for troubleshooting:
Tools Use to
Diskmap This command-line utility produces a detailed report on the configuration of the
hard disk that you specify. It provides information from the registry about disk
characteristics and geometry, and reads and displays data about all of the
partitions and logical drives defined on the disk. It also shows Disk Signatures.
Dumpel Dump Event Log is a command-line utility that dumps an event log for a local or
remote system into a tab-separated text file. This utility can also be used to filter
for or filter out certain event types.
Filever This command-line tool examines the version resource structure of a file or a
directory of files on either a local or remote computer, and displays information on
the versions of executable files, such as .exe and .dll files.
Getmac GetMAC provides a quick method for obtaining the MAC (Ethernet) layer address
and binding order for a computer running Windows 2000, locally or across a
network. This can be useful when you want to enter the address into a sniffer, or if
you need to know what protocols are currently in use on a computer.
Netcons This GUI tool monitors and displays current net connections, taking the place of
the Windows command-line command net use.
Clustool This tool permit cluster configuration backup and restore.

7 Appendix: Using AND Reading the Cluster LogFile
7.1 CLUSTERLOG Environment Variable
If you set the CLUSTERLOG environment variable, the cluster will create a logfile that
contains diagnostic information using the path specified. Important events during the
operation of the Cluster Service will be logged in this file. Because so many different events
occur, the logfile may be somewhat cryptic or hard to read. This document gives some hints
about how to read the logfile and information about what items to look for.
Note: Each time you attempt to start the Cluster Service, the log will be cleared and a new
logfile started. Each component of MSCS that places an entry in the logfile will indicate itself
by abbreviation in square brackets. For example, the Node Manager component would be
abbreviated [NM]. Logfile entries will vary from one cluster to another. As a result, other
logfiles may vary from excerpts referenced in this document.
Note Log entry lines in the following sections have been wrapped for space constraints in this
document. The lines do not normally wrap.
7.2 Operating System Version Number and Service Pack Leve
Near the beginning of the logfile, notice the build number of MSCS, followed by the operating
system version number and service pack level. If you call for support, engineers may ask for
this information:
082::14-21:29:26.625 Cluster Service started - Cluster Version 1.224.
082::14-21:29:26.625 OS Version 4.0.1381 - Service Pack 3.
7.3 Cluster Service Startup
Following the version information, some initialization steps occur. Those steps are followed by
an attempt to join the cluster, if one node already exists in a running state. If the Cluster
Service could not detect any other cluster members, it will attempt to form the cluster.
Consider the following log entries:
0b5::12-20:15:23.531 We’re initing Ep...
0b5::12-20:15:23.531 [DM]: Initialization
0b5::12-20:15:23.531 [DM] DmpRestartFlusher: Entry
0b5::12-20:15:23.531 [DM] DmpStartFlusher: Entry
0b5::12-20:15:23.531 [DM] DmpStartFlusher: thread created
0b5::12-20:15:23.531 [NMINIT] Initializing the Node Manager...
0b5::12-20:15:23.546 [NMINIT] Local node name = NODEA.
0b5::12-20:15:23.546 [NMINIT] Local node ID = 1.
0b5::12-20:15:23.546 [NM] Creating object for node 1 (NODEA)
0b5::12-20:15:23.546 [NM] node 1 state 1

0b5::12-20:15:23.546 [NM] Initializing networks.

0b5::12-20:15:23.546 [NM] Initializing network interface facilities.
0b5::12-20:15:23.546 [NMINIT] Initialization complete.
0b5::12-20:15:23.546 [FM] Starting worker thread...
0b5::12-20:15:23.546 [API] Initializing
0a9::12-20:15:23.546 [FM] Worker thread running
0b5::12-20:15:23.546 [lm] :LmInitialize Entry.
0b5::12-20:15:23.546 [lm] :TimerActInitialize Entry.
0b5::12-20:15:23.546 [CS] Initializing RPC server.
0b5::12-20:15:23.609 [INIT] Attempting to join cluster MDLCLUSTER
0b5::12-20:15:23.609 [JOIN] Spawning thread to connect to sponsor 192.88.80.114
06c::12-20:15:23.609 [JOIN] Asking 192.88.80.114 to sponsor us.
0b5::12-20:15:23.609 [JOIN] Waiting for all connect threads to terminate.
06c::12-20:15:32.750 [JOIN] Sponsor 192.88.80.114 is not available, status=1722.
0b5::12-20:15:32.750 [JOIN] All connect threads have terminated.
0b5::12-20:15:32.750 [JOIN] Unable to connect to any sponsor node.
0b5::12-20:15:32.750 [INIT] Failed to join cluster, status 53
0b5::12-20:15:32.750 [INIT] Attempting to form cluster MDLCLUSTER
0b5::12-20:15:32.750 [Ep]: EpInitPhase1
0b5::12-20:15:32.750 [API] Online read only
04b::12-20:15:32.765 [RM] Main: Initializing.
Note that the cluster service attempts to join the cluster. If it cannot connect with an existing
member, the software decides to form the cluster. The next series of steps attempts to form
groups and resources necessary to accomplish this task. It is important to note that the
cluster service must arbitrate control of the quorum disk.
0b5::12-20:15:32.781 [FM] Creating group a1a13a86-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:32.781 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 contains a1a13a87-
0eaf-11d1-8427-0000f8034599.
0b5::12-20:15:32.781 [FM] Creating resource a1a13a87-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:32.781 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
0000f8034599 possible node list
0b5::12-20:15:32.781 [FMX] Found the quorum resource a1a13a87-0eaf-11d1-8427-0000f8034599.
0b5::12-20:15:32.781 [FM] All dependencies for a1a13a87-0eaf-11d1-8427-0000f8034599 created
0b5::12-20:15:32.781 [FM] arbitrate for quorum resource id a1a13a87-0eaf-11d1-8427-
0000f8034599.
0b5::12-20:15:32.781 FmpRmCreateResource: creating resource a1a13a87-0eaf-11d1-8427-
0000f8034599 in shared resource monitor
0b5::12-20:15:32.812 FmpRmCreateResource: created resource a1a13a87-0eaf-11d1-8427-
0000f8034599, resid 1363016
0dc::12-20:15:32.828 Physical Disk <Disk D:>: Arbitrate returned status 0.
0b5::12-20:15:32.828 [FM] FmGetQuorumResource successful
0b5::12-20:15:32.828 FmpRmOnlineResource: bringing resource a1a13a87-0eaf-11d1-8427-
0000f8034599 (resid 1363016) online.
0b5::12-20:15:32.843 [CP] CppResourceNotify for resource Disk D:
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker waiting type 0 context 8
0b5::12-20:15:32.843 [GUM] Thread 0xb5 UpdateLock wait on Type 0
0b5::12-20:15:32.843 [GUM] DoLockingUpdate successful, lock granted to 1
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker dispatching seq 388 type 0 context 8
0b5::12-20:15:32.843 [GUM] GumpDoUnlockingUpdate releasing lock ownership
0b5::12-20:15:32.843 [GUM] GumSendUpdate: completed update seq 388 type 0 context 8
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker waiting type 0 context 9
0b5::12-20:15:32.843 [GUM] Thread 0xb5 UpdateLock wait on Type 0
0b5::12-20:15:32.843 [GUM] DoLockingUpdate successful, lock granted to 1
0b5::12-20:15:32.843 [GUM] GumSendUpdate: Locker dispatching seq 389 type 0 context 9
0b5::12-20:15:32.843 [GUM] GumpDoUnlockingUpdate releasing lock ownership
0b5::12-20:15:32.843 [GUM] GumSendUpdate: completed update seq 389 type 0 context 9
0b5::12-20:15:32.843 FmpRmOnlineResource: Resource a1a13a87-0eaf-11d1-8427-0000f8034599
pending
0e1::12-20:15:33.359 Physical Disk <Disk D:>: Online, created registry watcher thread.

090::12-20:15:33.359 [FM] NotifyCallBackRoutine: enqueuing event

04d::12-20:15:33.359 [FM] WorkerThread, processing transition event for a1a13a87-0eaf-11d1-
8427-0000f8034599, oldState = 129, newState = 2.
04d::12-20:15:33.359 [FM] HandleResourceTransition: Resource Name = a1a13a87-0eaf-11d1-
8427-0000f8034599 old state=129 new state=2
04d::12-20:15:33.359 [DM] DmpQuoObjNotifyCb: Quorum resource is online
04d::12-20:15:33.375 [DM] DmpQuoObjNotifyCb: Own quorum resource, try open the quorum log
04d::12-20:15:33.375 [DM] DmpQuoObjNotifyCb: the name of the quorum file is
D:\MSCS\quolog.log
04d::12-20:15:33.375 [lm] LogCreate : Entry FileName=D:\MSCS\quolog.log
MaxFileSize=0x00010000
04d::12-20:15:33.375 [lm] LogpCreate : Entry
In this case, the node forms the cluster group and quorum disk resource, gains control of the
disk, and opens the quorum logfile. From here, the cluster performs operations with the
logfile, and proceeds to form the cluster. This involves configuring network interfaces and
bringing them online.
0b5::12-20:15:33.718 [NM] Beginning form process.
0b5::12-20:15:33.718 [NM] Synchronizing node information.
0b5::12-20:15:33.718 [NM] Creating node objects.
0b5::12-20:15:33.718 [NM] Configuring networks & interfaces.
0b5::12-20:15:33.718 [NM] Synchronizing network information.
0b5::12-20:15:33.718 [NM] Synchronizing interface information.
0b5::12-20:15:33.718 [dm] DmBeginLocalUpdate Entry
0b5::12-20:15:33.718 [dm] DmBeginLocalUpdate Exit, pLocalXsaction=0x00151c20
dwError=0x00000000
0b5::12-20:15:33.718 [NM] Setting database entry for interface a1a13a7f-0eaf-11d1-8427-
0000f8034599
0b5::12-20:15:33.718 [dm] DmCommitLocalUpdate Entry
0b5::12-20:15:33.718 [dm] DmCommitLocalUpdate Exit, dwError=0x00000000
0b5::12-20:15:33.718 [dm] DmBeginLocalUpdate Entry
0b5::12-20:15:33.875 [dm] DmBeginLocalUpdate Exit, pLocalXsaction=0x00151c20
dwError=0x00000000
0b5::12-20:15:33.875 [NM] Setting database entry for interface a1a13a81-0eaf-11d1-8427-
0000f8034599
0b5::12-20:15:33.875 [dm] DmCommitLocalUpdate Entry
0b5::12-20:15:33.875 [dm] DmCommitLocalUpdate Exit, dwError=0x00000000
0b5::12-20:15:33.875 [NM] Matched 2 networks, created 0 new networks.
0b5::12-20:15:33.875 [NM] Resynchronizing network information.
0b5::12-20:15:33.875 [NM] Resynchronizing interface information.
0b5::12-20:15:33.875 [NM] Creating network objects.
0b5::12-20:15:33.875 [NM] Creating object for network a1a13a7e-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:33.875 [NM] Creating object for network a1a13a80-0eaf-11d1-8427-0000f8034599
0b5::12-20:15:33.875 [NM] Creating interface objects.
0b5::12-20:15:33.875 [NM] Creating object for interface a1a13a7f-0eaf-11d1-8427-
0000f8034599.
0b5::12-20:15:33.875 [NM] Registering network a1a13a7e-0eaf-11d1-8427-0000f8034599 with
cluster transport.
0b5::12-20:15:33.875 [NM] Registering interfaces for network a1a13a7e-0eaf-11d1-8427-
0000f8034599 with cluster transport.
0b5::12-20:15:33.875 [NM] Registering interface a1a13a7f-0eaf-11d1-8427-0000f8034599 with
cluster transport, addr 9.9.9.2, endpoint 3003.
0b5::12-20:15:33.890 [NM] Instructing cluster transport to bring network a1a13a7e-0eaf-
11d1-8427-0000f8034599 online.
0b5::12-20:15:33.890 [NM] Creating object for interface a1a13a81-0eaf-11d1-8427-
0000f8034599.
0b5::12-20:15:33.890 [NM] Registering network a1a13a80-0eaf-11d1-8427-0000f8034599 with
cluster transport.
0b5::12-20:15:33.890 [NM] Registering interfaces for network a1a13a80-0eaf-11d1-8427-
0000f8034599 with cluster transport.

0b5::12-20:15:33.890 [NM] Registering interface a1a13a81-0eaf-11d1-8427-0000f8034599 with

cluster transport, addr 192.88.80.190, endpoint 3003.
0b5::12-20:15:33.890 [NM] Instructing cluster transport to bring network a1a13a80-0eaf-
11d1-8427-0000f8034599 online.
After initializing network interfaces, the cluster will continue formation with the enumeration of
cluster nodes. In this case, as a newly formed cluster, the cluster will contain only one node.
If this session had been joining an existing cluster, the node enumeration would show two
nodes. Next, the cluster will bring the Cluster IP address and Cluster Name resources online.
0b5::12-20:15:34.015 [FM] OnlineGroup: setting group state to Online for f901aa29-0eaf-
11d1-8427-0000f8034599
069::12-20:15:34.015 IP address <Cluster IP address>: Created NBT interface
\Device\NetBt_If6 (instance 355833456).
0b5::12-20:15:34.015 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
0b5::12-20:15:34.015 [FM] FmFormNewClusterPhase2 complete.
.
.
.
0b5::12-20:15:34.281 [INIT] Successfully formed a cluster.
09c::12-20:15:34.281 [lm] :ReSyncTimerHandles Entry.
09c::12-20:15:34.281 [lm] :ReSyncTimerHandles Exit gdwNumHandles=3
0b5::12-20:15:34.281 [INIT] Cluster Started! Original Min WS is 204800, Max WS is 1413120.
08c::12-20:15:34.296 [CPROXY] clussvc initialized
069::12-20:15:40.421 IP address <Cluster IP Address>: IP Address 192.88.80.114 on adapter
DC21X41 online
.
.
.
04d::12-20:15:40.421 [FM] OnlineWaitingTree, a1a13a84-0eaf-11d1-8427-0000f8034599 depends
on a1a13a83-0eaf-11d1-8427-0000f8034599. Start first
04d::12-20:15:40.421 [FM] OnlineWaitingTree, Start resource a1a13a84-0eaf-11d1-8427-
0000f8034599
04d::12-20:15:40.421 [FM] OnlineResource: a1a13a84-0eaf-11d1-8427-0000f8034599 depends on
a1a13a83-0eaf-11d1-8427-0000f8034599. Bring online first.
04d::12-20:15:40.421 FmpRmOnlineResource: bringing resource a1a13a84-0eaf-11d1-8427-
0000f8034599 (resid 1391032) online.
04d::12-20:15:40.421 [CP] CppResourceNotify for resource Cluster Name
04d::12-20:15:40.421 [GUM] GumSendUpdate: Locker waiting type 0 context 8
04d::12-20:15:40.437 [GUM] Thread 0x4d UpdateLock wait on Type 0
04d::12-20:15:40.437 [GUM] DoLockingUpdate successful, lock granted to 1
076::12-20:15:40.437 Network Name <Cluster Name>: Bringing resource online...
04d::12-20:15:40.437 [GUM] GumSendUpdate: Locker dispatching seq 411 type 0 context 8
04d::12-20:15:40.437 [GUM] GumpDoUnlockingUpdate releasing lock ownership
04d::12-20:15:40.437 [GUM] GumSendUpdate: completed update seq 411 type 0 context 8
04d::12-20:15:40.437 [GUM] GumSendUpdate: Locker waiting type 0 context 11
.
.
.
076::12-20:15:43.515 Network Name <Cluster Name>: Registered server name MDLCLUSTER on
transport \Device\NetBt_If6.
076::12-20:15:46.578 Network Name <Cluster Name>: Registered workstation name MDLCLUSTER on
transport \Device\NetBt_If6.
076::12-20:15:46.578 Network Name <Cluster Name>: Network Name MDLCLUSTER is now online
Following these steps, the cluster will attempt to bring other resources and groups online. The
logfile will continue to increase in size as the cluster service runs. Therefore, it may be a good

idea to enable this option when you are having problems, rather than leaving it on for days or
weeks at a time.
7.4 Logfile Entries for Common Failures
After reviewing a successful startup of the Cluster Service, you may want to examine some
errors that may appear because of various failures. The following examples illustrate possible
log entries for four different failures.
Example 1: Quorum Disk Turned Off

If the cluster attempts to form and cannot connect to the quorum disk, entries similar to the
following may appear in the logfile. Because of the failure, the cluster cannot form, and the
Cluster Service terminates.
0b9::14-20:59:42.921 [RM] Main: Initializing.
08f::14-20:59:42.937 [FM] Creating group a1a13a86-0eaf-11d1-8427-0000f8034599
08f::14-20:59:42.937 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 contains a1a13a87-
0eaf-11d1-8427-0000f8034599.
08f::14-20:59:42.937 [FM] Creating resource a1a13a87-0eaf-11d1-8427-0000f8034599
08f::14-20:59:42.937 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
08f::14-20:59:42.937 [FMX] Found the quorum resource a1a13a87-0eaf-11d1-8427-0000f8034599.
08f::14-20:59:42.937 [FM] All dependencies for a1a13a87-0eaf-11d1-8427-0000f8034599 created
08f::14-20:59:42.937 [FM] arbitrate for quorum resource id a1a13a87-0eaf-11d1-8427-
0000f8034599.
08f::14-20:59:42.937 FmpRmCreateResource: creating resource a1a13a87-0eaf-11d1-8427-
08f::14-20:59:42.968 FmpRmCreateResource: created resource a1a13a87-0eaf-11d1-8427-
0000f8034599, resid 1362616
0e9::14-20:59:43.765 Physical Disk <Disk D:>: SCSI, error reserving disk, error 21.
0e9::14-20:59:54.140 Physical Disk <Disk D:>: Arbitrate returned status 21.
08f::14-20:59:54.140 [FM] FmGetQuorumResource failed, error 21.
08f::14-20:59:54.140 [INIT] Cleaning up failed form attempt.
08f::14-20:59:54.140 [INIT] Failed to form cluster, status 3213068.
08f::14-20:59:54.140 [CS] ClusterInitialize failed 21
08f::14-20:59:54.140 [INIT] The cluster service is shutting down.
08f::14-20:59:54.140 [evt] EvShutdown
08f::14-20:59:54.140 [FM] Shutdown: Failover Manager requested to shutdown groups.
08f::14-20:59:54.140 [FM] DestroyGroup: destroying a1a13a86-0eaf-11d1-8427-0000f8034599
08f::14-20:59:54.140 [FM] DestroyResource: destroying a1a13a87-0eaf-11d1-8427-0000f8034599
08f::14-20:59:54.140 [OM] Deleting object Physical Disk
08f::14-20:59:54.140 [FM] Resource a1a13a87-0eaf-11d1-8427-0000f8034599 destroyed.
08f::14-20:59:54.140 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 destroyed.
08f::14-20:59:54.140 [Dm] DmShutdown
08f::14-20:59:54.140 [DM] DmpShutdownFlusher: Entry
08f::14-20:59:54.156 [DM] DmpShutdownFlusher: Setting event
062::14-20:59:54.156 [DM] DmpRegistryFlusher: got 0
062::14-20:59:54.156 [DM] DmpRegistryFlusher: exiting
0ca::14-20:59:54.156 [FM] WorkItem, delete resource <Disk D:> status 0
0ca::14-20:59:54.156 [OM] Deleting object Disk Group 1 (a1a13a86-0eaf-11d1-8427-
0000f8034599)
0e7::14-20:59:54.375 [CPROXY] clussvc terminated, error 0.
0e7::14-20:59:54.375 [CPROXY] Service Stopping...
0b9::14-20:59:54.375 [RM] Going away, Status = 1, Shutdown = 0.

02c::14-20:59:54.375 [RM] PollerThread stopping. Shutdown = 1, Status = 0, WaitFailed = 0,

NotifyEvent address = 196.
0e7::14-20:59:54.375 [CPROXY] Cleaning up
0b9::14-20:59:54.375 [RM] RundownResources posting shutdown notification.
0e7::14-20:59:54.375 [CPROXY] Cleanup complete.
0e3::14-20:59:54.375 [RM] NotifyChanges shutting down.
0e7::14-20:59:54.375 [CPROXY] Service Stopped.
Perhaps the most meaningful lines from above are:

0e9::14-20:59:54.140 Physical Disk <Disk D:>: Arbitrate returned status 21.
Note The error code on these logfile entries is 21. You can issue net helpmsg 21 from the
command line and receive the explanation of the error status code. Status code 21 means,
"The device is not ready.” This indicates a possible problem with the device. In this case, the
device was turned off, and the error status correctly indicates the problem.

Example 2: Quorum Disk Failure

In this example, the drive has failed or has been reformatted from the SCSI controller. As a
result, the cluster service cannot locate a drive with the specific signature it is looking for.
0b8::14-21:11:46.515 [RM] Main: Initializing.
074::14-21:11:46.531 [FM] Creating group a1a13a86-0eaf-11d1-8427-0000f8034599
074::14-21:11:46.531 [FM] Group a1a13a86-0eaf-11d1-8427-0000f8034599 contains a1a13a87-
0eaf-11d1-8427-0000f8034599.
074::14-21:11:46.531 [FM] Creating resource a1a13a87-0eaf-11d1-8427-0000f8034599
074::14-21:11:46.531 [FM] FmpAddPossibleEntry adding 1 to a1a13a87-0eaf-11d1-8427-
074::14-21:11:46.531 [FMX] Found the quorum resource a1a13a87-0eaf-11d1-8427-0000f8034599.
074::14-21:11:46.531 [FM] All dependencies for a1a13a87-0eaf-11d1-8427-0000f8034599 created
074::14-21:11:46.531 [FM] arbitrate for quorum resource id a1a13a87-0eaf-11d1-8427-
0000f8034599.
074::14-21:11:46.531 FmpRmCreateResource: creating resource a1a13a87-0eaf-11d1-8427-
074::14-21:11:46.562 FmpRmCreateResource: created resource a1a13a87-0eaf-11d1-8427-
0000f8034599, resid 1362696
075::14-21:11:46.671 Physical Disk <Disk D:>: SCSI, Performing bus rescan.
075::14-21:11:51.843 Physical Disk <Disk D:>: SCSI, error attaching to signature 71cd0549,
error 2.
075::14-21:11:51.843 Physical Disk <Disk D:>: Unable to attach to signature 71cd0549.
Error: 2.
074::14-21:11:51.859 [FM] FmGetQuorumResource failed, error 2.
074::14-21:11:51.859 [INIT] Cleaning up failed form attempt.
In this case, the most important logfile entries are:

075::14-21:11:51.843 Physical Disk <Disk D:>: SCSI, error attaching to signature 71cd0549,
error 2.
075::14-21:11:51.843 Physical Disk <Disk D:>: Unable to attach to signature 71cd0549.
Error: 2.
Status code 2 means, "The system cannot find the file specified.” The error in this case may
mean that it cannot find the disk, or that, because of some kind of problem, it cannot locate
the quorum logfile that should be on the disk.

Example 3: Duplicate Cluster IP Address
If another computer on the network has the same IP address as the cluster IP address
resource, the resource will be prevented from going online. Further, the cluster name will not
be registered on the network, as it depends on the IP address resource. Because this name
is the network name used for cluster administration, you will not be able to administer the
cluster using this name, in this type of failure. However, you may be able to use the computer
name of the cluster node to connect with Cluster Administrator. Additionally, you may be able
to connect locally from the console using the loopback address. The following sample entries
are from a cluster logfile during this type of failure:
0b9::14-21:32:59.968 IP Address <Cluster IP Address>: The IP address is already in use on
the network, status 5057.
0d2::14-21:32:59.984 [FM] NotifyCallBackRoutine: enqueuing event
03e::14-21:32:59.984 [FM] WorkerThread, processing transition event for a1a13a83-0eaf-11d1-
8427-0000f8034599, oldState = 129, newState = 4.03e
.
.
.
03e::14-21:32:59.984 FmpHandleResourceFailure: taking resource a1a13a83-0eaf-11d1-8427-
0000f8034599 and dependents offline
03e::14-21:32:59.984 [FM] TerminateResource: a1a13a84-0eaf-11d1-8427-0000f8034599 depends
on a1a13a83-0eaf-11d1-8427-0000f8034599. Terminating first
0d3::14-21:32:59.984 Network Name <Cluster Name>: Terminating name MDLCLUSTER...
0d3::14-21:32:59.984 Network Name <Cluster Name>: Name MDLCLUSTER is already offline.
.
.
.
03e::14-21:33:00.000 FmpRmTerminateResource: a1a13a84-0eaf-11d1-8427-0000f8034599 is now
offline
0c7::14-21:33:00.000 IP Address <Cluster IP Address>: Terminating resource...
0c7::14-21:33:00.000 IP Address <Cluster IP Address>: Address 192.88.80.114 on adapter
DC21X41 offline.
Example 4: Evicted Node Attempts to Join Existing Cluster
If you evict a node from a cluster, the cluster software on that node must be reinstalled to
gain access to the cluster again. If you start the evicted node, and the Cluster Service
attempts to join the cluster, entries similar to the following may appear in the cluster logfile:
032::26-16:11:45.109 [INIT] Attempting to join cluster MDLCLUSTER
032::26-16:11:45.109 [JOIN] Spawning thread to connect to sponsor 192.88.80.115
040::26-16:11:45.109 [JOIN] Asking 192.88.80.115 to sponsor us.
032::26-16:11:45.109 [JOIN] Spawning thread to connect to sponsor NODEA
032::26-16:11:45.125 [JOIN] Waiting for all connect threads to terminate.
092::26-16:11:45.125 [JOIN] Asking NODEA to sponsor us.
040::26-16:12:18.640 [JOIN] Sponsor 192.88.80.115 is not available (JoinVersion),
status=1722.
098::26-16:12:18.640 [JOIN] Sponsor 192.88.80.190 is not available (JoinVersion),
status=1722.
099::26-16:12:18.640 [JOIN] Sponsor 9.9.9.2 is not available (JoinVersion), status=1722.

098::26-16:12:18.640 [JOIN] JoinVersion data for sponsor 157.57.224.190 is invalid, status

1722.
099::26-16:12:18.640 [JOIN] JoinVersion data for sponsor 9.9.9.2 is invalid, status 1722.
040::26-16:12:18.640 [JOIN] JoinVersion data for sponsor 157.58.80.115 is invalid, status
1722.
092::26-16:12:18.703 [JOIN] Sponsor NODEA is not available (JoinVersion), status=1722.
092::26-16:12:18.703 [JOIN] JoinVersion data for sponsor NODEA is invalid, status 1722.
032::26-16:12:18.703 [JOIN] All connect threads have terminated.
032::26-16:12:18.703 [JOIN] Unable to connect to any sponsor node.
032::26-16:12:18.703 [INIT] Failed to join cluster, status 0
032::26-16:12:18.703 [INIT] Attempting to form cluster MDLCLUSTER
.
.
.
032::26-16:12:18.734 [FM] arbitrate for quorum resource id 24acc093-1e28-11d1-9e5d-
0000f8034599.
032::26-16:12:18.734 [FM] FmpQueryResourceInfo:initialize the resource with the registry
information
032::26-16:12:18.734 FmpRmCreateResource: creating resource 24acc093-1e28-11d1-9e5d-
032::26-16:12:18.765 FmpRmCreateResource: created resource 24acc093-1e28-11d1-9e5d-
0000f8034599, resid 1360000
06d::26-16:12:18.812 Physical Disk <Disk G:>: SCSI, error attaching to signature b2320a9b,
error 2.
06d::26-16:12:18.812 Physical Disk <Disk G:>: Unable to attach to signature b2320a9b.
Error: 2.
032::26-16:12:18.812 [FM] FmGetQuorumResource failed, error 2.
032::26-16:12:18.812 [INIT] Cleaning up failed form attempt.
032::26-16:12:18.812 [INIT] Failed to form cluster, status 2.
032::26-16:12:18.828 [CS] ClusterInitialize failed 2
The node attempts to join the existing cluster, but has invalid credentials, because it was
previously evicted. Therefore, the existing node refuses to communicate with it. The node
may attempt to form its own version of the cluster, but cannot gain control of the quorum disk,
because the existing cluster node maintains ownership. Examination of the logfile on the
existing cluster node reveals that the Cluster Service posted entries to reflect the failed
attempt to join:
0c4::29-18:13:31.035 [NMJOIN] Processing request by node 2 to begin joining.
0c4::29-18:13:31.035 [NMJOIN] Node 2 is not a member of this cluster. Cannot join.

8 Appendix: Q258078 Cluster Service Startup Options
PSS ID Number: Q258078

Article last modified on 08-08-2001
:2000,4.0
======================================================================
-------------------------------------------------------------------------------
The information in this article applies to:
- Microsoft Windows 2000 Advanced Server

- Microsoft Windows 2000 Datacenter Server
- Microsoft Windows NT Server, Enterprise Edition version 4.0
-------------------------------------------------------------------------------
SUMMARY
=======
This is a list of all the available switches that can be used as startup
parameters to start the Cluster service.
To do this, go to the properties of the service, and put the appropriate switch
in the Start Parameters box, and then click Start.
NOTE: You must include a forward slash (/) at the beginning of the switch.
You can also use the desired switch when starting the Cluster service from the
command line as well:
net start clussvc.exe /<switch>
NOTE: The Debug command has special startup parameters, please reference the
Debug section below for proper usage.
Valid option switches are:
- FixQuorum -- No quorum device, no quorum logging

- DebugResMon -- Enable debugging of resrcmon process
- Debug -- Displays events during the start of Cluster Service. See below for
special syntax
Windows 2000 and later only switches:

- ResetQuorumLog -- Dynamically re-creates the quorum log and checkpoint files

(this functionality is automatic in Windows NT 4.0)
- NoRepEvtLogging -- No replication of Event Log entries
MORE INFORMATION
================
Explanation of some of the switches:
- Debug
Function: It is possible that Cluster logging may not contain any helpful
information in diagnosing failures of the Cluster service to start. This is
because the Cluster service may fail prior to the Cluster.log starting.
Starting the Cluster service with this switch displays the initialization of
the Cluster service and can be beneficial in identifying these early
occurring problems.
Requirements: This switch is intended for temporary diagnostic use only. If

the Cluster service fails to start because of a logon error of the service
account, or another system-related error, the service may not have a chance
to run. As a result, a cluster.log file may not be created. This method runs
the service outside of the normal environment given by the Service Control
Manager. To use this switch, you must be logged on locally with
administrative rights and launched from the command prompt. Do not use the
/debug option for normal use or for any length of time. The service does not
run as efficiently with the option set.
Usage scenarios: This switch must be used only when the Cluster service fails
to start up. This switch will display on the screen the operation of the
Cluster Service as it attempts to start. This switch can only be used when
starting the service from the command prompt and you must be in the directory
that the Cluster Service is installed to, by default this is
%SystemRoot%\Cluster. This is also the only switch that you do not use the
NET START command to start the service.
Operation: Open a Command Prompt and change your current directory to the
%SystemRoot%\cluster directory. Then type:
"CLUSSVC /debug"
The cluster service will send output to the window similar to what would
normally be seen in the cluster.log. You may also capture this information to
a file by using the following command syntax instead:
"CLUSSVC /debug > c:\debug.log"
At the point that you are satisfied that the Cluster service is running

properly, use CTRL+C from the keyboard to stop the service.
Note: You may wish to use the ClusterLogLevel environment variable to control
the output level when using the Debug switch, see this article for additional
information:
Q168801 How to Enable Cluster Logging in Microsoft Cluster Server
- FixQuorum
Function: Lets the cluster service start up despite problems with the quorum
device. The only resources that will be brought online once the service is
started is the Cluster IP Address and the Cluster Name. You can open Cluster
Administrator and bring other resources online manually.
Requirements: This switch MUST be used only in diagnosis mode on a very

temporary basis and not during normal operation. Only 1 node must be started
up using this switch and a second node must not be attempted to be joined to
the node started up using this switch. Typically, this switch is used alone.
Usage scenarios: If the cluster service is unable to start up in the normal

way due to the failure of the quorum resource, users can start up the cluster
service in this mode and attempt to diagnose the failure.
Operation: After the cluster service is started up, all resources including
the quorum resource remain offline. Users can then manually try to bring the
quorum resource online and monitor the cluster log entries as well as the new
event log entries and attempt to diagnose any problems with the quorum
resource.
NET START ClusSvc /FixQuorum
- ResetQuorumLog
Function: If the quorum log and checkpoint file is not found or is corrupt,
this can be used to create files based from the information in the local
node's %SystemRoot%\Cluster\CLUSDB registry hive. If the quorum log file is
found to be in proper order, this switch has no effect.
Requirements: Typically, only one node is started up using this switch and
this switch is used alone. Must be used only by experienced users who
understand the consequences of using information that is potentially out of
date, to create a new quorum log file.
Usage scenarios: This switch must be used only when the Cluster service fails
to start up on a Windows 2000 or later machine due to a missing/corrupt

quorum log QUOLOG.LOG and CHKxxx.TMP files. Windows NT 4.0 will automatically
recreate these files if they do not exist, this functionality was added in
Windows 2000 to give more control over the start of the Cluster service.
Operation: The Cluster service does an auto-reset of the quorum log file if it
is found missing or corrupt by using the information in the currently loaded
cluster hive using the file %systemroot%\Cluster\CLUSDB.
NET START ClusSvc /ResetQuorumLog
- DebugResMon
Function: Helps you to debug the resource monitor process and, therefore, the
resource dynamic-link libraries (DLLs) that are loaded by the resource
monitor. You can use any standard Windows-based debugger.
Requirements: Can only be used when the cluster service is started from the
command prompt and using the "/debug" option, there is no equivalent registry
setting that could be used when cluster service is run as a service. Debugger
must be available for attaching to the resource monitor when it starts up.
Typically, this switch is used alone.
Usage scenarios: Developers use it to debug the resource monitor process and
resource DLLs. This option is extremely useful if a bug in a resource DLL
causes the resource monitor process to crash soon after it is started up by
the cluster service and before users can manually attach a debugger to the
resource monitor process.
Operation: Just before the resource monitor process is started up, the cluster
service process waits with a message "Waiting for debugger to connect to the
resmon process X" where X is the PID (Process ID) of the resource monitor
process. The cluster service does this waiting for all resource monitor
processes created by it. Once the user attaches a debugger to the resource
monitor process, and the resource monitor process starts up, the cluster
service continues with its initialization.
- NoRepEvtLogging
Function: The norepevtlogging command prevents replication of those events

recorded in the event log. This command is useful in reducing the amount of
information displayed in the command window by filtering out events already
recorded in the event log. Event log replication is a new feature added in
Windows 2000.
Usage scenarios: For example, to start the cluster service and log those
events not recorded in the event log to a local file, Debugnorep.log:

clussvc /debug /norepevtlogging > c:\DEBUGNOREP.log\
Operation: The norepevtlogging command can be set as a start parameter when

starting the cluster service from the Computer Management console.
The command line syntax is:

NET START ClusSvc /NoRepEvtLogging
This will prevent the node that was started with this switch not to replicate
it's information to other nodes, but it will still receive information from
other nodes that were started normally.
Additional query words: MSCS
======================================================================
Keywords : kbenv kbtool w2000mscs kbClustering
Technology : kbWinNTsearch kbWinNT400search kbwin2000AdvServ
kbwin2000AdvServSearch kbwin2000DataServ kbwin2000DataServSearch
kbWinNTSsearch kbWinNTSEntSearch kbWinNTSEnt400 kbWinNTS400search
kbwin2000Search kbWinAdvServSearch kbWinDataServSearch
Version : :2000,4.0
Issue type : kbinfo
======================================================================
=======
Copyright Microsoft Corporation 2001.


GLOBE ISIT OASIS2 Cluster Administration and Troubleshooting

Caricato da

Informazioni sul documento

Descrizione originale:

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

GLOBE ISIT OASIS2 Cluster Administration and Troubleshooting

Caricato da

Copyright:

Formati disponibili

Cluster Administration and

Version Number (1.0)

GLOBE ISIT OASIS2

< Cluster Administration and

 2001 Nestec Ltd. – GLOBE – Global Business Excellence. http://veviis01.nestec.ch/GLOBE/ GL-GLOBE

Date of update Updated by Changes Made (section Version Status of

05/01/2004 Fabian SIRACH Draft Creation 1.0 Draft

 2001 Nestec Ltd. – GLOBE – Global Business Excellence. http://veviis01.nestec.ch/GLOBE/ GL-GLOBE

4.10 EVENT ID 1016..............................................................................................................32

4.56 EVENT ID 1096..............................................................................................................46

 2001 Nestec Ltd. – GLOBE – Global Business Excellence. http://veviis01.nestec.ch/GLOBE/ GL-GLOBE

1.1 Document Audience

1.2 Prerequisites and Related Documentation

8909675.doc Page 6 of 68 Created: 06.01.2004

2.1 Resources Management

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

Take a resource offline:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

8909675.doc Page 7 of 68 Created: 06.01.2004

2.2 Group Management

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

Take a group offline:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

Move a group to another node:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

8909675.doc Page 8 of 68 Created: 06.01.2004

Specify preferred owners of a group:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

Set group failover policy:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

8909675.doc Page 9 of 68 Created: 06.01.2004

Set group failback policy:

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

2.3 Node Management

Starting the Cluster service:

8909675.doc Page 10 of 68 Created: 06.01.2004

1. Open Cluster Administrator (click Start, point to Programs, point to Administrative

Installing Cluster Service when the other node is online:

8909675.doc Page 11 of 68 Created: 06.01.2004

To validate the cluster installation, do to following:

Removing Cluster Service :

Enable diagnostic Logging :

Disable diagnostic Logging :

8909675.doc Page 12 of 68 Created: 06.01.2004

2.4 Applying Service Pack and Hotfix

2.5 Chkdsk and Autochk

• 174617 CHKDSK Runs While Running Microsoft Cluster Server Setup

8909675.doc Page 13 of 68 Created: 06.01.2004

2.6 Cluster Command Line

To verify nodes state: type cluster <cluster name> node

8909675.doc Page 14 of 68 Created: 06.01.2004

3.1 One node is Down

If a node is online, gather information about failure:

Symptoms and solutions:

Symptoms Causes Solutions

8909675.doc Page 15 of 68 Created: 06.01.2004

Symptoms Causes Solutions

8909675.doc Page 16 of 68 Created: 06.01.2004

3.2 Entire Cluster is Down

3.3 One or More Servers Quit Responding

 Domain controllers’ connectivity: Check network connectivity with domain controllers

For theses problems go to the corresponding chapter.

3.4 Cluster Service Do not Start