Abstract
Parallel Concurrent Processing Failover uses two mechanisms to detect a failure: Dead Connection Detection, and detection of a failure by the process monitor for the Concurrent Managers, otherwise known as PMON (note that this is not the PMON from the database), which was introduced with Patch 6495206. Load balancing of the Concurrent Managers is critical if you expect parallel concurrent processing to function after the failover to the remaining node(s).
This paper reviews Concurrent Manager basics before we discuss the topics of failover and load balancing. One of the key
components used by Concurrent Processing is Generic Service Management. The use of GSM with multiple nodes and
seeded GSM services is discussed. Administering Concurrent Managers, managing control across nodes, starting and
stopping the Concurrent Managers, and managing concurrent log files are skills needed to understand the configuration of
Parallel Concurrent Processing failover and load balancing.
There are a number of ways that an E-Business Suite environment might be configured for failover:
• Database
• Fast Connection Failover (FCF)
• Transparent Application Failover (TAF)
• Parallel Concurrent Processing Failover
• Concurrent Manager Failover
This paper will discuss Parallel Concurrent Processing Failover, ICM Failover, CRM Failover, and Concurrent Manager
Failover. We’ll leave the discussion of Database Failover, Fast Connection Failover and Transparent Application Failover for
another time.
The paper concludes with a discussion of load balancing and the issues that must be considered to properly configure an E-Business Suite environment to take advantage of Oracle’s load balancing features.
Concurrent Processing
Most user interactions with Oracle Applications data are conducted via the HTML interface or the Forms interface. However,
reporting and interface programs may need to run periodically or on an ad hoc basis. As these programs may require a large
number of computations, they are run in the background at a time, and with a priority, such that the work of interactive users
is not impeded. Such programs are run on the Concurrent Processing server and run under Concurrent Managers.
When a request is submitted to run a Concurrent Program through an Oracle Applications form or through Oracle Applications Manager (OAM), a row is inserted into the FND_CONCURRENT_REQUESTS table that specifies the program to be run. Concurrent Managers read the requests from the table and start the appropriate Concurrent Programs.
Concurrent Managers may have limits on the Concurrent Programs that can be run and the times that they can be started. Concurrent Requests have priorities, statuses, and log and out files in $APPLCSF.
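As a minimal illustration of how requests are tracked (a sketch assuming the standard APPS schema; REQUEST_ID, PHASE_CODE, and STATUS_CODE are well-known columns of FND_CONCURRENT_REQUESTS), pending and running requests can be listed with:
sqlplus -s apps/<apps_password> <<'EOF'
-- Phase codes: P = Pending, R = Running, C = Completed, I = Inactive
SELECT request_id, phase_code, status_code, requested_start_date
FROM   fnd_concurrent_requests
WHERE  phase_code IN ('P', 'R')
ORDER  BY requested_start_date;
EOF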
Definitions
The following are some acronyms that we will use throughout this paper:
• CP => Concurrent Processing
• DCD => Dead Connection Detection
• ICM => Internal Concurrent Manager
• IM => Internal Monitor
• CRM => Conflict Resolution Manager
• PCP => Parallel Concurrent Processing
• PMON => Process Monitor for ICM
Concurrent Requests
Figure 1 shows an example of the Concurrent Manager Requests screen.
Figure 1
Concurrent Managers
Figure 3 shows the Concurrent Manager Administer screen. Oracle seeds a number of Concurrent Managers and assigns
Concurrent Programs to those managers. Your Applications System Administrator can also define custom managers and
assign Concurrent Programs to those managers.
Figure 3
Figure 4 shows the different types of Concurrent Managers, their Service Instance, and their Program Name. Your
Applications System Administrator can adjust the Concurrent Managers and Transaction Managers, but the other types of
managers must be left alone.
[Diagram: Forms Server, Java/JInitiator interface, and Reports Server]
• The Internal Concurrent Manager (ICM) starts, sets the number of active processes, monitors, and terminates all
other concurrent processes through requests made to the Service Manager, including restarting any failed processes.
• The ICM also starts, stops, and restarts the Service Manager for each node.
• The ICM will perform process migration during an instance or node failure.
• The ICM will be active on a single node. This is also true in a Parallel Concurrent Processing environment, where
the ICM will be active on at least one node at all times.
• The ICM really does not have any scheduling responsibilities. It has NOTHING to do with scheduling requests, or
deciding which manager will run a particular request. The function of the ICM is to run 'queue control' requests;
requests to startup or shutdown other managers.
• The ICM is responsible for startup and shutdown of the whole concurrent processing facility, and it monitors the
other managers periodically, and restarts them if they should go down. It can also take over the Conflict Resolution
Manager's job, and resolve incompatibilities.
• If the ICM itself should go down, requests will continue to run normally, except for 'queue control' requests. Your
Applications System Administrator can restart the ICM by running the 'startmgr' command; there is no need to kill
the other managers first.
Figure 5
In Release 11i, if there is more than one possible Secondary Node and the Primary Node fails, PCP will fail over to any node that is available. Specifying a Secondary Node limits failover to that node only. An available node is any node in the FND_NODES table, except AUTHENTICATION, whose status is set to ‘Y’.
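A quick way to see node availability (a sketch; FND_NODES is the table named above, and SUPPORT_CP marks concurrent processing nodes):
sqlplus -s apps/<apps_password> <<'EOF'
-- 'Y' in STATUS means the node is available to PCP
SELECT node_name, status, support_cp
FROM   fnd_nodes
WHERE  node_name <> 'AUTHENTICATION';
EOF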
Figure 6
In Figure 6, the TCP connection to RH9 has been disconnected and it shows a status of ‘N’.
Service Manager
(FNDSM process) - Communicates with the Internal Concurrent Manager, Concurrent Manager, and non-Manager Service
processes.
• The Service Manager (SM) spawns and terminates manager and service processes (these could be Forms, Apache
Listeners, Metrics or Reports Server, and any other process controlled through Generic Service Management).
• When the ICM terminates, the SM that resides on the same node with the ICM will also terminate.
• The SM is “chained” to the ICM. The SM will only reinitialize after termination when there is a function it needs to
perform (start, or stop a process), so there may be periods of time when the SM is not active, and this would be
normal.
• All processes initialized by the SM inherit the same environment as the SM.
• The SM’s environment is set by the APPSORA.env file, and the gsmstart.sh script.
• The TWO_TASK setting used by the SM to connect to a RAC instance must match the instance_name from
GV$INSTANCE.
• The apps_<sid> listener must be active on each Concurrent Processing node to support the Service
Manager connection to the local instance.
• There should be a Service Manager active on each node where a Concurrent or non-Manager service process will
reside.
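One way to check this on a Concurrent Processing node (a sketch; the comparison of TWO_TASK against GV$INSTANCE follows the rule above):
echo "TWO_TASK is set to: $TWO_TASK"
sqlplus -s apps/<apps_password> <<'EOF'
-- TWO_TASK must match one of these instance names
SELECT inst_id, instance_name, host_name FROM gv$instance;
EOF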
The following Concurrent Manager log excerpt shows what is reported when a Service Manager cannot be contacted:
Could not contact Service Manager FNDSM_RH8_VIS. The TNS alias could not be located, the listener
process on RH3 could not be contacted, or the listener failed to spawn the Service Manager process.
Found dead process: spid=(962754), cpid=(2259578), Service Instance=(1045)
CONC-SM TNS FAIL
Call to PingProcess failed for WFMAILER
CONC-SM TNS FAIL
Call to StopProcess failed for WFMAILER
CONC-SM TNS FAIL
Call to PingProcess failed for FNDCPGSC
CONC-SM TNS FAIL
Call to StopProcess failed for FNDOPP
CONC-SM TNS FAIL
Call to PingProcess failed for OAMGCS
CONC-SM TNS FAIL
Call to StopProcess failed for OAMGCS
Found dead process: spid=(716870), cpid=(2259580), Service Instance=(2009)
Found dead process: spid=(1442020), cpid=(2259579), Service Instance=(2010)
Internal Monitor
(FNDIMON process) - Communicates with the Internal Concurrent Manager.
Standard Manager
• The Standard Manager is a worker process that initiates and executes client requests on behalf of Applications batch and OLTP clients.
Figure 7
You can also see the Concurrent Managers from the OAM web page:
Figure 8
In Figure 9, the Standard Manager is active on RH9, even though no Primary Node is defined:
Figure 9
Since no Secondary Node is defined, the Standard Manager will not failover.
Notice in Figure 8 that the Work Shifts definition now includes Failover Processes, which specify the number of processes that will run when the Standard Manager fails over to the Secondary Node.
Transaction Manager
Transaction Managers communicate with the Service Manager and with any user process initiated on behalf of Forms or a Standard Manager request, as illustrated in Figure 10.
Figure 10
Note that between Release 11i and Release 12, the way that Transaction Managers work has changed:
Transaction Managers allow a client to make a request for a program to be run on the server immediately. The client then waits
for the program to complete and can receive program results from the server. As the client and server are two separate database
sessions, the communication between them for Release 11i has been handled using the DBMS_PIPE package.
Unfortunately the DBMS_PIPE package does not extend to communications between sessions on different RAC instances. On
an Applications instance using RAC, the client and server are very likely to be on different instances, causing transactions to
time out for long periods or fail completely. The current workaround is to manually set up Transaction Managers to connect to
all RAC instances, which not only takes up additional resources, but may also require additional middle-tier hardware or a
complicated configuration that is difficult to maintain.
In Release 12, the Transaction Managers use the AQ mechanism; on RAC, the Transaction Managers can work connected to either instance. This greatly simplifies the configuration and reduces the complexity for RAC administrators. A Profile Option has been introduced to allow users to switch between the two transports, DBMS_PIPE or AQ.
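The current transport can be checked from SQL*Plus (a sketch; CONC_TM_TRANSPORT_TYPE is assumed to be the internal name of the ‘Concurrent: TM Transport Type’ Profile Option):
sqlplus -s apps/<apps_password> <<'EOF'
-- Expected values: PIPE or QUEUE
SELECT fnd_profile.value('CONC_TM_TRANSPORT_TYPE') AS tm_transport FROM dual;
EOF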
[Flow diagram: Client and Server process flows for the AQ-based Transaction Managers — the client places a request message on the AQ and waits (with a timeout) to retrieve results from the return queue; the server process receives requests, checks for shutdown, runs the Concurrent Processor, and places results on the return queue]
Here we see the Client and Server Process flows for the AQ Transaction Managers.
2. Shut down all the database instances cleanly in the RAC environment, using the command:
SQL> shutdown immediate;
Set the following init.ora parameter on each instance before restarting:
_lm_global_posts=TRUE
6. Navigate to Profile > System and change the Profile Option ‘Concurrent: TM Transport Type' to ‘QUEUE', and verify that the Transaction Manager works across the RAC instances. ATG RUP3 (4334965) or higher provides an option to use AQs in place of Pipes. See Note 240818.1.
8. Pipes are more efficient, but require a Transaction Manager to be running on each database instance.
9. Navigate to the Concurrent > Manager > Define screen, and set up the Primary and Secondary Node names for the Transaction Managers.
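A sketch of setting the hidden parameter mentioned above (assumes an spfile is in use; hidden parameters must be double-quoted):
sqlplus "/ as sysdba" <<'EOF'
-- Applies to all RAC instances on the next restart
ALTER SYSTEM SET "_lm_global_posts" = TRUE SCOPE = SPFILE SID = '*';
EOF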
Conflict Resolution Manager
If a program is identified as Run Alone, then the Conflict Resolution Manager prevents the Concurrent Managers from starting other programs in the same conflict domain.
When a program lists other programs as being incompatible with it, the Conflict Resolution Manager prevents the program
from starting until any incompatible programs in the same domain have completed running.
If a Concurrent Program cannot run on any Concurrent Manager, perhaps because it has been assigned to a Concurrent
Manager that is disabled, then the Concurrent Request will stack up in the Conflict Resolution Manager.
When a Concurrent Program is started, Concurrent Managers read the request information from the FND Concurrent Request
tables. The Conflict Resolution Manager checks Concurrent Program definitions for incompatibility rules.
TO ENABLE/DISABLE THE CONFLICT RESOLUTION MANAGER
Scheduler/Prereleaser Manager
This manager is intended to implement Advanced Schedules. Its job is to determine when a scheduled request is ready to run. Advanced Schedules were not fully implemented in Release 11.0. They are implemented in Release 11i, but are not widely used by the various Applications modules.
General Ledger uses FNDSCH for financial schedules based on different calendars and period types. It is then possible to schedule AutoAllocation sets, Recurring Journals, MassAllocations, Budget Formulas, and MassBudgets to run according to the General Ledger schedules that have been defined.
If financial schedules in GL are not being used, then it is not a problem to deactivate this manager.
Internal Concurrent Manager Failover Definition
Release 11i
Define Primary and Secondary Nodes in Release 11i
Figure 11
By not specifying a Secondary Node, the ICM can fail over to any node that is available. Consider a system that has three or more concurrent processing nodes and two nodes go down, including primary node RH3. If the Secondary Node were specified, there would be a chance the Secondary Node would not be available. This capability, to fail over to an un-named Secondary Node, is available for all managers in 11i. In Release 12 this works differently.
Release 12
In Release 12, for failover to function properly, both primary and secondary nodes must be specified. Most managers won’t
start if a primary node is not assigned. However, a few managers, for example, the Internal Concurrent Manager, and the
Conflict Resolution Manager will start on any available node. If a secondary node is not defined, the manager will not
failover.
Figure 12
Figure 13
GENERIC SERVICES
Generic Services include the Internal Concurrent Manager and Conflict Resolution Manager.
Figure 14
REQUEST PROCESSING MANAGERS
Request Processing Managers include the Standard Manager and other Concurrent Managers.
Figure 15
GENERIC SERVICE MANAGEMENT
An E-Business Suite system depends on a variety of services, such as Forms Listeners, HTTP Servers, Concurrent Managers,
and Workflow Mailers. These services are composed of one or more processes. In the past, many of these processes had to
be individually started and monitored by the Applications System Administrator. Management of these processes is
complicated, since these services can be distributed across multiple host machines.
The introduction of Generic Service Management in Release 11i helped simplify the management of these processes by
providing a fault tolerant service framework and a central management console built into Oracle Applications Manager
(OAM).
Service Management is an extension of Concurrent Processing, and provides a framework for managing processes on
multiple host machines. With Service Management, virtually any application tier service can be integrated into this
framework.
Figure 16
Figure 16 shows that beginning with Release 11i, services such as the Oracle Forms Listener, Oracle Reports Server, Apache
Web listener, and Oracle Workflow Mailer can be run under Service Management.
With Service Management, the Internal Concurrent Manager (ICM) manages the various service processes across multiple
hosts. On each host, a Service Manager acts on behalf of the ICM, allowing the ICM to monitor and control service processes
on that host. Applications System Administrators can then configure, monitor, and control services through a management console that communicates with the ICM. Figure 17 shows the Oracle Applications Manager (OAM) screen that an Applications System Administrator can use to manage the Concurrent Managers.
Figure 17
Service Management provides a fault tolerant system. If a service process exits unexpectedly, the ICM will automatically
attempt to restart the process. If a host fails, the ICM may start the affected service processes on a secondary host. The ICM
itself is monitored and kept alive by Internal Monitor processes located on various hosts.
TEST – KILL SERVICES TO SEE IF GSM RESTARTS THEM
In this example, we will kill the FNDSM process and the FNDCRM process to see if Generic Service Management correctly restarts them:
Kill FNDSM
Kill FNDCRM
In each case, both of these services were restarted before I could even enter the grep command to find the corresponding process.
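The test itself can be run with ordinary shell commands (a sketch; the <pid> values come from the first ps listing):
# Locate the Service Manager and Conflict Resolution Manager processes
ps -ef | grep -E 'FNDSM|FNDCRM' | grep -v grep
# Kill each one, then list again; GSM should already have respawned them
kill -9 <FNDSM_pid> <FNDCRM_pid>
ps -ef | grep -E 'FNDSM|FNDCRM' | grep -v grep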
Figure 18 shows that the entire set of system services may be started or stopped with a single action.
Figure 18
• Forms Listener
• Metrics Server
• Metrics Client
• Reports Server
• Apache Listener
• Linux users should not activate the Reports Server under GSM
These services, once seeded, may be managed under GSM and controlled via the Oracle Applications Manager.
FNDSVCRG is an executable introduced as a part of the Seeded GSM Services. It provides improved coordination between
the GSM monitoring of these services and their command-line control scripts.
The $FND_TOP/bin/FNDSVCRG executable is triggered from the concurrent processing control script before and after the
script starts or stops the service. FNDSVCRG connects to the database and validates the configuration of the Seeded GSM
Service.
If a service is not enabled to be managed under GSM, the FNDSVCRG executable does nothing and exits. The script then
continues to perform its normal start/stop actions.
If a service is enabled for GSM management, the FNDSVCRG executable will update the service information in the database
including the environment context, the current service log file location, and the current state of the service.
VERIFY GSM
• To verify that GSM is working, start the Concurrent Managers.
• Once GSM is enabled, the ICM uses Service Managers to start all Concurrent Managers and activated services.
• If the ICM successfully starts the managers, then GSM has been configured properly.
• If managers and/or services fail to start, errors should appear in the ICM log file.
Each Service Manager maintains its own log file named FNDSMxxxx.mgr, located in the same directory as the Concurrent
Manager log files. It is useful to examine these log files when there are problems starting services. If you cannot locate the
Service Manager log file, it is likely that the Service Managers are not starting properly and there is a configuration issue that
needs troubleshooting.
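A sketch for locating and inspecting these files (assumes the usual $APPLCSF/$APPLLOG layout described later in this paper):
# Most recently written Service Manager logs
ls -lt $APPLCSF/$APPLLOG/FNDSM*.mgr | head
# Check the latest one for startup errors
tail -50 `ls -t $APPLCSF/$APPLLOG/FNDSM*.mgr | head -1`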
Parallel Concurrent Processing
APPLDCP Profile Option
Starting with Release 11.5.10 (FND.H), the APPLDCP environment variable is ignored. Release 12 GSM requires the value of APPLDCP to be set to “ON”. The value is hard-coded in afpcsq.lpc version 115.35, thereby ignoring the value of APPLDCP.
According to Oracle’s ATG Development in Note 753678.1:
“As of file "afpcsq.lpc" version 115.35 or higher, APPLDCP is internally hard-coded to "ON" when the Generic Service
Management (GSM) is enabled--"keeping in mind, use of the GSM is required".
In short, at "afpcsq.lpc" version 115.35 or higher with the GSM enabled, the setting of the APPLDCP environment variable is
ignored--this is the "default behavior on all Release 12 releases."
NOTE: As per ARU, "Patch 11i.FND.H" (3262159) and "Oracle Applications Release 11.5.10" (3140000) contains
"afpcsq.lpc" version 115.37.”
Parallel Concurrent Processing
• In a Release 11i or Release 12 environment with Parallel Concurrent Processing enabled, the Primary Node
assignment is optional for the Internal Concurrent Manager.
• The Internal Concurrent Manager can be started from any of the nodes (host machines) identified as concurrent
processing server enabled.
• In the absence of a Primary Node assignment for the Internal Concurrent Manager, the Internal Concurrent Manager
will stay on the node (host machine) where it was started.
• If a Primary Node is assigned, the Internal Concurrent Manager will migrate to that node if it was started on a
different node.
• If the node on which the Internal Concurrent Manager is currently running becomes unavailable, the Internal
Concurrent Manager will be restarted on an alternate concurrent processing node.
• If a Primary Node is not assigned, the Internal Concurrent Manager will continue to operate on the node where it
was restarted.
• If a Primary Node has been assigned to the Internal Concurrent Manager, then it will be migrated back to that node
whenever the node becomes available.
Release 11i Parallel Concurrent Processing
• In releases before Release 11i, there must be an assigned Primary and Secondary Node for each Concurrent Manager.
• In Release 11i, Primary and Secondary Nodes need not be explicitly assigned. However, you can assign Primary and Secondary Nodes for directed load and failover capabilities.
Nodes for directed load and failover capabilities.
• In Release 11i, with three or more nodes in the concurrent processing tier, it is recommended to not specify the
Secondary Node for failover. This is because the specified Secondary Node may not be available when the Primary
Node goes down.
• By not specifying the Secondary Node, GSM can find an available node with Concurrent Processing services that
can be used during failover.
Release 12 Parallel Concurrent Processing
• With Release 12, if a Secondary Node is not specified, the processes will not failover as they do in Release 11i. This
is a critical difference between Release 11i and Release 12.
Parallel Concurrent Processing
Parallel concurrent processing allows distribution of Concurrent Managers across multiple nodes. Benefits are improved
performance, availability and scalability (load balancing).
Parallel Concurrent Processing (PCP) is activated along with Generic Service Management (GSM); it cannot be activated independently of GSM. With parallel concurrent processing implemented with GSM, the Internal Concurrent Manager (ICM) tries to assign valid nodes for Concurrent Managers and other service instances.
There should be only one ICM and one CRM at any given time. However, the ICM and CRM can be configured to run on several of the nodes.
Concurrent Managers migrate to the surviving node when one of the concurrent nodes goes down.
Only one Internal Monitor Process can be active on a single node. You decide which nodes have an Internal Monitor Process
when you configure your system. You can also assign each Internal Monitor Process a Primary and a Secondary Node to
ensure failover protection.
Internal Monitor Processes, like Concurrent Managers, have assigned work shifts, and are activated and deactivated by the
Internal Concurrent Manager. However, automatic activation of PCP does not additionally require that Primary Nodes be
assigned for all Concurrent Managers and other GSM-managed services. If no Primary Node is assigned for a service
instance, the Internal Concurrent Manager (ICM) assigns a valid Concurrent Processing Server Node as the Target Node. In
general, this node will be the same node where the Internal Concurrent Manager is running.
In the case where the ICM is not on a Concurrent Processing Server Node, the ICM chooses an active Concurrent Processing
Server Node in the system. If a Concurrent Processing Server Node is not available, a Target Node will not be assigned.
If a Concurrent Manager does have an assigned Primary Node, it will only try to start up on that node; if the Primary Node is
down, it will look for its assigned Secondary Node, if one exists.
If both the Primary and Secondary Nodes are unavailable, the Concurrent Manager will not start (the ICM will not look for
another node on which to start the Concurrent Manager). This strategy prevents overloading any node in the case of failover.
The Concurrent Managers are aware of many aspects of the system state when they start up. When an ICM successfully starts
up, it checks the TNS listeners and database instances on all remote nodes. If an instance is down, the affected managers and
services switch to their Secondary Nodes.
Processes managed under GSM will only start on nodes that are in Online mode. If a node is changed from Online to Offline,
the processes on that node will be shut down and switched to a Secondary Node if possible.
Concurrent processing provides database instance-sensitive failover capabilities. When an instance is down, all managers
connecting to it switch to a secondary middle-tier node.
However, if you prefer to handle instance failover separately from such middle-tier failover (for example, using the TNS
connection-time failover mechanism instead), use the Profile Option Concurrent:PCP Instance Check. When this Profile
Option is set to OFF, Parallel Concurrent Processing will not provide database instance failover support; however, it will
continue to provide middle-tier node failover support when a node goes down.
For the Internal Concurrent Manager, you assign the Primary Node only.
To Set Up PCP with RAC
The following assumes a two-node RAC cluster, where node1 is known as vip1 and node2 is known as vip2:
1. Check the configuration files tnsnames.ora and listener.ora located under the 8.0.6 ORACLE_HOME at $ORACLE_HOME/network/admin/<context>. Ensure that you have information for all the other concurrent nodes in the FNDSM and FNDFS entries.
3. Log in to Oracle E-Business Suite Release 11i as SYSADMIN and choose the System Administrator Responsibility.
Navigate to the Install > Nodes screen, and ensure that each node in the cluster is registered.
4. Verify that the Internal Monitor for each node is defined properly, with the correct Primary and Secondary Node
specifications and work shift details.
5. Confirm that the Internal Monitor manager is activated from Concurrent > Manager > Administer, activating the manager as required. For example, Internal Monitor: Host2 might have the Primary Node as vip2 and the Secondary Node as vip1.
6. On all Concurrent Processing nodes, set the $APPLCSF environment variable to point to a log directory on a shared
file system.
7. On all Concurrent Processing nodes, set the $APPLPTMP environment variable to the value of the
UTL_FILE_DIR entry in the init.ora file on the database nodes. This value should be a directory on a shared
file system.
8. Do not use a load-balanced TNS entry for the value of s_cp_twotask. The request may hang if the sessions are load balanced: Worker 1, connected to DB Instance 1, places a message in the pipe and expects Worker 2 (which is connected to DB Instance 2) to consume the message. However, Worker 2 never gets the message, since pipes are instance-private. (See Optimizing the E-Business Suite with Real Application Clusters (RAC) by Ahmed Alomari.)
Setting the Profile Option Concurrent: PCP Instance Check to 'ON' means that Concurrent Managers will fail over to a secondary application tier node if the database instance to which they are connected goes down.
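A sketch of the environment settings from steps 6 and 7 above (the directory paths are placeholders; APPLPTMP must match the database's UTL_FILE_DIR):
# On each Concurrent Processing node
export APPLCSF=/shared/applcsf          # shared log/out directory (step 6)
export APPLPTMP=/shared/utl_file_dir    # shared temp directory (step 7)
# Verify the database value that APPLPTMP must match
sqlplus -s apps/<apps_password> <<'EOF'
SELECT value FROM v$parameter WHERE name = 'utl_file_dir';
EOF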
DCD is initiated on the server when a connection is established. At this time SQL*Net reads the SQL*Net parameter files
and sets a timer to generate an alarm. The timer interval is set by providing a non-zero value in minutes for the
SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file.
When the timer expires, SQL*Net on the server sends a "probe" packet to the client. The probe is an empty SQL*Net packet
and does not represent any form of SQL*Net level data, but it creates data traffic on the underlying protocol.
If the client end of the connection is still active, the probe is discarded, and the timer mechanism is reset. If the client has
terminated abnormally, the server will receive an error from the send call issued for the probe, and SQL*Net on the server
will signal the operating system to release the connection's resources.
TCP/IP, for example, is a connection-oriented protocol, and as such, the protocol will implement some level of packet
timeout and retransmission in an effort to guarantee the safe and sequenced order of data packets. If a timely
acknowledgement is not received in response to the probe packet, the TCP/IP stack will retransmit the packet some number
of times before timing out. After TCP/IP gives up, then SQL*Net receives notification that the probe failed.
This is a server feature only. The client may be running any supported
SQL*Net V2 release.
DCD is much more resource-intensive than similar mechanisms at the protocol level.
With DCD enabled, if the connection is idle for the duration of the time interval specified in minutes by the
SQLNET.EXPIRE_TIME parameter, the Server-side process sends a small 10-byte packet to the client. This packet is sent
using TCP/IP.
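For reference, a minimal server-side sqlnet.ora entry (a sketch; the one-minute interval is an illustrative value, not a recommendation):
# sqlnet.ora on the database server
SQLNET.EXPIRE_TIME = 1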
• Both the Internal Concurrent Manager and the Internal Monitor can use the DCD functionality of the network (TCP SQL*Net).
• The ICM is a client process connected to a DCD-enabled DB dedicated server process.
• The ICM holds the named PL/SQL Lock, the “ICM lock”.
• The IM is continuously trying to check whether it can get the same named PL/SQL Lock.
• As soon as the “ICM lock” is released by the DB / DCD, FNDIMON pings the ICM node, and the IM deduces that the ICM has crashed.
o If the ping succeeds, we conclude that the ICM is fine. Obviously, the ICM can be down even if TCP is working, so this is bad logic that can lead to false positives.
o If the ping fails, we further check if it has been over four PMON cycles since the ICM updated the WORK_START column in the FND_CONCURRENT_QUEUES table.
o If it has been more than four PMON cycles, we conclude that the ICM is dead.
• The DCD comes into the picture here after the ICM has crashed and the database needs to identify that the ICM is gone.
• The database needs to clean up the dedicated server process resource corresponding to the ICM client process.
If the client-side connection is still connected and responsive, the client sends a response packet back to the database server, resetting the timer, and another probe will be sent when the next interval expires (assuming no other activity on the connection). After 200 seconds of no response, TCP sends the first of 2 probes, 20 seconds apart. Then TCP notifies SQL*Net of the failure, and SQL*Net removes the offending connection.
tcp_retries1 (default: 3) — The number of times TCP will attempt to retransmit a packet on an established connection normally, without the extra effort of getting the network layers involved.
tcp_retries2 (default: 15) — The maximum number of times a TCP packet is retransmitted in the established state before giving up.
tcp_syn_retries (default: 5) — The maximum number of times initial SYNs for an active TCP connection attempt will be retransmitted. The default value of 5 corresponds to approximately 180 seconds.
Now let’s consider an example where the following TCP parameters are changed from their default values:
tcp_retries1 = 2
tcp_retries2 = 2
tcp_syn_retries = 2
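On Linux these can be inspected and changed with sysctl (a sketch; run as root, and add the settings to /etc/sysctl.conf to make them persistent):
# Inspect the current values
sysctl net.ipv4.tcp_retries1 net.ipv4.tcp_retries2 net.ipv4.tcp_syn_retries
# Apply the test values used above
sysctl -w net.ipv4.tcp_retries1=2
sysctl -w net.ipv4.tcp_retries2=2
sysctl -w net.ipv4.tcp_syn_retries=2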
In this example, the time to initialize the PCP failover was an average of 8 seconds after changing these TCP parameters.
We found the following Linux parameters listed in Metalink Note 249213.1:
net.ipv4.tcp_keepalive_time 3000
net.ipv4.tcp_retries2 5
net.ipv4.tcp_syn_retries 1
By changing some of these parameters, the timeout period was reduced to about 20 seconds.
“Therefore it keeps trying every 3200ms until a magic interval occurs and it stops. On Sun this interval is tcp_ip_abort_cinterval and defaults to 3 minutes (180000ms).” (Note 249213.1)
Six seconds is very close to the time measured during tests with tcp_syn_retries and tcp_retries2 set to 2. The measured
average was 8 seconds.
Multiple measurements at 5 seconds recorded no change in connection status. However, one failover was initiated at a
measured time of 6 seconds.
When configured correctly, Keepalive enables dead connections to be discovered and closed more quickly, freeing resources
used on the server more quickly.
At the time of this writing, client-side SQL*Net connections do not enable keepalive for TCP connections by default. However, it is possible to enable this by adding the ENABLE=BROKEN parameter to the SQL*Net connect descriptor in the tnsnames.ora file.
**WARNING** Keepalive intervals can typically be set to 2 hours or more (i.e., it can take more than 2 hours to notice a dead server even if keepalive is enabled). To make keepalive useful for PCP and TAF, the keepalive interval needs to be reduced to a smaller value (such as 2 minutes).
If there are a lot of IDLE connections on your network, then reducing keepalive can increase network traffic significantly.
Sample TNS alias to enable keepalive (notice the ENABLE=BROKEN clause):
VIS_BALANCE =
  (DESCRIPTION =
    (ENABLE = BROKEN)
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh8)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh6)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = VIS))
  )
3. ICM Process Monitor (PMON) – once TCP fails, this method, introduced with Patch 6495206, takes 2 minutes
• If the “ICM lock” is not available, FNDIMON will now ping the node of the ICM.
• If the ping succeeds, we conclude that the ICM is fine.
• If the ping fails, we further check if it has been over four PMON cycles since the ICM updated the WORK_START
column of the FND_CONCURRENT_QUEUES table.
• If it has been more than four PMON cycles we conclude that the ICM is dead.
Release 11i only uses PMON if patch 6495206 has been applied. The PMON method is included in Release 12.
DEFAULT PMON SETTINGS
Figure 19 shows the Oracle Application Manager screen with the PMON settings for this instance:
Figure 19
• When a database connection is possible, the Reviver will restart Concurrent Processing.
• Concurrent Processing can be started / stopped when the network or database is down.
• This should reduce processing down time, because Concurrent Processing restarts as soon as possible.
• This should reduce the Applications System Administrator’s workload, since there is no longer a need to take the extra step of restarting the Concurrent Managers.
Of the first three methods, in Release 11i, the method that recognizes the failure first depends on the timeout settings of each
method.
• sqlnet.inbound_connect_timeout (server)
• sqlnet.send_timeout (client and/or server)
• sqlnet.recv_timeout (client and/or server)
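These parameters also live in sqlnet.ora; a sketch with illustrative values (the 60-second settings are assumptions, not recommendations):
# sqlnet.ora — illustrative values only
# Server side:
SQLNET.INBOUND_CONNECT_TIMEOUT = 60
# Client and/or server:
SQLNET.SEND_TIMEOUT = 60
SQLNET.RECV_TIMEOUT = 60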
This method should provide automated recovery for Concurrent Managers after network or database failures. When a
network failure occurs on a concurrent processing node, resulting in a loss of database connectivity, all Concurrent Managers
running on that node will eventually be forced to shut down.
In cases where multiple Concurrent Processing nodes are being used, and these other nodes retain their database connection,
the managers will migrate to the working nodes. In the case where only a single Concurrent Processing node is being used, or
when all Concurrent Processing nodes lose their database connection (for example if the database node suffers a network
failure), all running Concurrent Managers on the entire instance will be forced to shut down.
Without this feature, when the network comes back up, the managers must be restarted manually, as there is no automatic
restart facility. This can lead to lost productivity between the time the network is restored and when the managers are
restarted.
With this new feature, the Concurrent Managers will restart automatically as soon as connectivity is restored. To achieve this,
when a connection failure situation arises, a new monitor process, the Reviver, is started. This process will remain alive until
it is able to obtain a database connection and restart Concurrent Processing.
In addition, this allows the Applications System Administrator to maintain control over Concurrent Processing even when a network or database failure has brought down Concurrent Processing. When the connection is down, an administrator can still start CP using the adcmctl.sh script, and by doing so will start a Reviver process.
When Concurrent Processing is down and a Reviver process is actively waiting to restart Concurrent Processing, the
adcmctl.sh script can be used to stop Concurrent Processing, as it will detect the Reviver and shut it down.
There is no additional setup required to use Connection Failure Recovery. If you wish to disable Connection Failure
Recovery you can do so by setting the Concurrent Processing Reviver Process context file variable to “Disabled”.
REVIVER
[Flow diagram: ICM and Reviver process flows — when the ICM loses its database connection it spawns the Reviver; the Reviver loops attempting to get a database connection, sleeping between attempts, then kills the previous database session and does not exit until a new ICM has started]
As part of its shutdown process, the ICM will detect that it is being forced to shut down due to losing its database connection. This is done by looking for the specific error messages ORA-3113, ORA-3114, or ORA-1041. If one of these errors is detected:
• The ICM will assume that it has lost its database connection and will spawn the reviver process.
The ICM will pass the Apps username/password to the script using a secure protocol, along with the Oracle session id of the current ICM process. When the script starts, it will attempt to make a database connection using sqlplus. If unsuccessful, it will sleep for 30 seconds before trying again. It will continue this until it either successfully makes a connection or it receives a signal to shut itself down.
When it successfully makes a connection, it will first kill the old ICM database session to make sure any locks are released,
then start a new ICM using the normal startmgr script. It then checks to make sure an ICM is successfully running; it will
not exit until a new ICM is running.
Once the ICM is restarted, it will start up any other managers that had been shut down and normal processing will resume.
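The retry loop described above can be pictured with a small shell sketch (hypothetical; this is not the reviver script Oracle ships):
while :; do
  if echo 'SELECT 1 FROM dual;' | sqlplus -s apps/<apps_password> > /dev/null 2>&1; then
    break    # connection restored; the real script would now kill the old
  fi         # ICM session and restart the managers via startmgr
  sleep 30   # wait 30 seconds between attempts, as described above
done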
PCP Failover
Failover is the process of migrating the Concurrent Managers from the Primary Node to the Secondary Node because of a
concurrent processing tier failure or listener failure.
Failback is when the Primary Node becomes available again and the Concurrent Managers need to migrate back to their
original Primary Node.
Primary Node = HOST1 – The Managers assigned to the Primary Node are ICM (FNDLIBR-cpmgr), and FNDCRM
Secondary node = HOST2 – The Manager assigned to the Secondary Node is Standard Manager (FNDLIBR)
When HOST1 becomes unavailable (this means TCP is no longer working), both the ICM and FNDCRM are migrated to
HOST2. This can be seen from the Administer Concurrent Manager screen in the System Administrator Responsibility.
The $APPLCSF/log/.mgr logfile shows that HOST1 is being added to the unavailable list.
On HOST2, after the PMON cycle, FNDICM, FNDCRM, and FNDLIBR are now migrated and running.
FNDSM is not a persistent process; FNDIMON is a persistent process local to each node.
Be aware that if a TCP failure is not detected, failover will not occur. The following excerpt from a Concurrent Manager log
shows the case where a failure is detected:
The PingProcess calls at the end of this log continue until the concurrent manager processes resume, or a TCP failure is detected and failover begins.
ICM Failover in Release 11i
• ICM and IM use the DCD functionality of the Network (TCP sqlnet).
• ICM is a client process connected to a DCD enabled DB dedicated server process.
• ICM holds the named PL/SQL Lock, the “ICM lock”.
• IM is continuously trying to check whether it can get the same named PL/SQL Lock.
• As soon as the “ICM lock” is released by the DB / DCD from the ICM crash, FNDIMON pings the ICM node, and
the IM deduces that the ICM has crashed.
• The DCD comes into play after the ICM has crashed and the DB needs to identify that the ICM is gone.
• Then, the DB needs to clean up the dedicated server process resource corresponding to the ICM client process.
• If the “ICM lock” is not available, FNDIMON will now ping the node of the ICM.
• If the ping succeeds, we conclude that the ICM is fine.
o Obviously, the ICM can be down even if TCP is working, so this is bad logic.
• If the ping fails, we further check if it has been over four PMON cycles since the ICM updated the WORK_START column in the FND_CONCURRENT_QUEUES table.
• If it has been more than four PMON cycles, we conclude that the ICM is dead.
• Failover is triggered when the node running the ICM goes down.
• The ICM going down leads to the connected database server process clearing its resources (including the named PL/SQL lock).
• In turn, the database server process cleanup is dependent on the DCD mechanism of the network (SQL*Net).
• That is, SQL*Net determines through the DCD mechanism that the connected client has closed down, and triggers the database server process cleanup.
11i PCP Failure
The following steps occur in the order indicated:
• TCP Failure
• ICM Lock is released, FNDIMON pings ICM node, if ping fails, check PMON
• PMON detects a “dead process”, crashed ICM
• reviver.sh
• DCD
R12 PCP Failure
• TCP Failure
• PMON detects a “dead process”
• ICM Shutdown
o Look for error messages ORA-3113, ORA-3114 or ORA-1041
• reviver.sh
• DCD
Test PCP Failover Components
This test explores the effect of the DCD, PMON, and TCP failover methods.
Variables: sqlnet.expire_time, PMON sleep and number of cycles, and the following TCP keepalive parameters:
• tcp_keepalive_time
• tcp_keepalive_intvl
• tcp_keepalive_probes
• tcp_retries1 (default: 3, new value 2)
• tcp_retries2 (default: 15, new value 2)
• tcp_syn_retries (default: 5, new value 2)
Failover /    Expire_time  PMON     PMON    tcp_KA  tcp_KA  tcp_KA  tcp_      tcp_      tcp_syn_
Failback      (minutes)    Sleep    Cycles  time    intvl   probes  retries1  retries2  retries
(seconds)
241 / -       1            30 secs  4       200     20      2       3         15        5
250 / 50      5            30 secs  4       200     20      2       3         15        5
262 / 100     10           30 secs  4       200     20      2       3         15        5
300 / 75      1            15 secs  2       200     20      2       3         15        5
285 / 35      10           30 secs  4       1000    60      10      3         15        5
8 / 105       1            30 secs  4       1000    60      10      2         2         2
10 / 42       1            30 secs  4       200     20      2       2         2         2
7 / 40        10           30 secs  4       200     20      2       2         2         2
6 / 34        1            15 secs  2       200     20      2       2         2         2
Test the Failover and Failback of Parallel Concurrent Processing
In Figure 20, Oracle Applications Manager (OAM) shows the details of the Internal Concurrent Manager (ICM) activated on RH9:
Figure 20
In Figure 21, the ICM, CRM and Standard Managers all have their Primary Node as RH9.
Figure 21
In Figure 22, we can see that the Standard Manager is configured to failover to the Secondary Node RH7:
Figure 22
The concurrent manager log shows the ICM terminating when RH9 goes down:
Review concurrent manager log file for more detailed information. : 12-JAN-2009 15:22:55 -
The VIS_0112@VIS internal concurrent manager has terminated with status 1 - giving up.
[Diagram: nodes RH7 and RH9, each with a SQL*Net client, connecting to the database and its listener]
In Figure 23, OAM shows that node RH9 is down, as well as all the application services on RH9.
Figure 23
In Figure 24, the Conflict Resolution Manager is shown as down.
Figure 24
The ICM tries to restart the CRM and other failed processes, but cannot.
If we run the command ps -ef | grep applvis, we can see defunct processes:
The CRM and two other FNDLIBR processes are shutting down, but the FNDSM is still running. The ICM is still running in another FNDLIBR process, shown below:
RH9 is shown as down, TCP is disconnected, and the Internal Manager has failed over to RH7, as shown in Figure 25:
Figure 25
Figure 26
In Figure 27, the Concurrent Managers have started processing Concurrent Requests on the Secondary Node, RH7:
Figure 27
Figure 28 shows the Oracle Applications Manager screens with RH7 activated:
Figure 28
It is important to note that, unlike Release 11i, Release 12 doesn’t fail over a manager if there is no Secondary Node defined.
In Figure 29, only the Session History Cleanup, Standard Manager and WMS Task Archiving Manager have Secondary
Nodes defined. In this case, the Primary Node is RH9 and the Secondary Node is RH7.
Figure 29
ICM Failover
Figure 30 shows the Internal Manager process migrating back to the Primary Node, RH9.
Figure 30
In Figure 31, the Internal Manager is up for RH9 and the Conflict Resolution Manager is starting up on RH9:
Figure 31
Figure 32
Failover of the ICM and CRM from RH9 to RH7 is complete. In the next section the TCP connection is restored and the failback from RH7 to RH9 is documented.
Start of Failback
End of Failback
Figure 33
Target Nodes
Using the Services Instances page in Oracle Applications Manager (OAM) or the Administer Concurrent Managers form,
you can view the Target Node for each Concurrent Manager in a parallel concurrent processing environment.
The Target Node is the node on which the processes associated with a Concurrent Manager should run. It can be the node that is explicitly defined as the Concurrent Manager's Primary Node in the Concurrent Managers window, or the node assigned by the Internal Concurrent Manager if no Primary Node is defined.
Figure 34
If you have defined Primary and Secondary Nodes for a manager, then when its Primary Node and ORACLE instance are
available, the Target Node is set to the Primary Node. Otherwise, the Target Node is set to the manager's Secondary Node (if
that node and its ORACLE instance are available). During process migration, processes migrate from their current node to
the Target Node.
Control Across Nodes
Using the Application Services category on the Site Map page in Oracle Applications Manager or the Administer Concurrent
Managers form, it is possible to start, stop, abort, restart, and monitor Concurrent Managers and Internal Monitor Processes
running on multiple nodes from any node in your parallel concurrent processing environment.
Figure 35
Figure 36 shows that it is not necessary to log onto a node to control concurrent processing on it. It is possible to terminate the Internal Concurrent Manager or any other Concurrent Manager from any node in your parallel concurrent processing environment using Oracle Applications Manager:
Figure 36
Figure 37
Start up parallel concurrent processing by running the adcmctl.sh script from the operating system prompt, as shown
below:
The Internal Concurrent Manager starts up on the node where the adcmctl.sh script is run. If it is assigned to a different node, the ICM will migrate to its Primary Node when that node is available.
After the Internal Concurrent Manager starts up, it starts all the Internal Monitor Processes and all the Concurrent Managers.
It attempts to start Internal Monitor Processes and Concurrent Managers on their Primary Nodes, and resorts to a Secondary
Node only if a Primary Node is unavailable.
logfile=/VIS/logs/apps/log/VIS_0815.mgr
PRINTER=noprint
mailto=VIS
restart=N
diag=Y
sleep=15
pmon=4
quesiz=1 (default)
Figure 38
In Figure 39, the defaults for the PMON settings are initially displayed:
Figure 39
Figure 40 shows that you can change the Sleep Interval to 15 seconds and keep the PMON cycles at 4. This should recognize
a failure 1 minute after TCP finds a “dead peer”.
Figure 40
Once you’ve saved your changes, Figure 41 shows a screen that confirms that you made changes:
Figure 41
Make sure the PMON changes are made in the $FND_TOP/bin/batchmgr.sh file.
# FILENAME
# batchmgr
# DESCRIPTION
# fire up Internal Concurrent Manager process
# USAGE
# batchmgr arg1=val1 arg2=val2 ...
#
# Parameters may be sent via the environment.
#
# ARGUMENTS DEFAULT
# [appmgr|sysmgr]=username/password
# [sleep=sleep_seconds] 15
# [mgrname=manager_name] icm
# [logfile=log_filename] $FND_TOP/$APPLLOG/$mgrname.mgr
# [restart=N|mim minutes between restarts] N
# [mailto="user1 user2..."] current user
# [PRINTER=printer_name]
# [pmon=iterations] 4
# [quesiz=pmon_iterations] 1
# [diag=Y|N] N
#
# SYSMGR holds the Oracle user as whom the manager should run
# and its password.
#
# SLEEP holds the number of seconds that the manager should wait
# between checks for new requests.
#
# MGRNAME is the name of the manager for locking and log purposes.
#
# LOGFILE is a filename in which the manager's own log is stored.
#
# RESTART is set to N if the manager should not restart itself after
# a crash. Otherwise, it is an integer number of minutes. The
# manager will attempt a restart after an abnormal termination
# if the past invocation lasted for at least RESTART minutes.
#
# MAILTO is a list of users who should receive mail whenever
# the manager terminates.
##
# PMON is the duration of time between process monitor
# checks (checks for failed workers). The unit of time
# is concurrent manager iterations (request table checks).
#
# QUESIZ is the duration of time between worker quantity
# checks (checks for number of active workers). The unit
# of time is process monitor checks.
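One way to confirm the values the script is actually using (a sketch; the parameter names follow the header above):
grep -nE '(sleep|pmon|quesiz)=' $FND_TOP/bin/batchmgr.sh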
Concurrent Processing is typically started from the command line by using one of these start scripts, startmgr.sh or
adcmctl.sh:
startmgr.sh
• Schema logon is passed using sysmgr parameter
• Apps logon may be passed using appmgr parameter
• Apps user must have System Administrator responsibility
The startmgr.sh script accepts the schema logon when passed as the sysmgr parameter. It will now also accept an Applications user sign-on via the appmgr parameter. Note that the Applications user must have the System Administrator responsibility in order to be able to successfully start Concurrent Processing.
adcmctl.sh
The adcmctl.sh script is more commonly used. It will accept a single username/password combination. There is a context
file variable that determines whether this script expects a schema logon or an Applications logon. By default the schema
logon is expected.
To start using the Application Sign On instead, edit the context file variable Concurrent Processing Password Type and set its
value to AppsUser. Then run autoconfig to regenerate the adcmctl.sh script. The script will then begin to expect an
Applications Username and Password.
For this example we will use the Concurrent Program FNDSCARU, the schema logon apps/appspass, and the Applications User logon User/UserPass. Previously, to submit a request to run FNDSCARU using CONCSUB, you would run the CONCSUB program from the command line as shown here.
Now you can choose to authenticate instead using an Applications username and password. To do so, in place of the schema
logon you should specify Apps:User as shown here. This indicates that an Applications User Sign On will be used. Then for
the Applications username parameter you should append the corresponding password. If you pass the Apps:User parameter
but do not supply a password for your specified Applications username, you will be prompted to enter the password.
Functional Security is enforced for Request Submission. After the Applications username and password is authenticated,
CONCSUB will verify that the user has the appropriate permission to submit the Concurrent Request. If the security check
fails, an error message will be printed to the screen.
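A sketch of the two invocations described above (the responsibility key SYSADMIN / "System Administrator" and the exact argument order are assumptions based on common CONCSUB usage, not taken from the original screenshots):
# Traditional schema logon
CONCSUB apps/appspass SYSADMIN "System Administrator" SYSADMIN \
        WAIT=N CONCURRENT FND FNDSCARU
# Applications User sign-on: Apps:User replaces the schema logon, and the
# password is appended to the Applications username
CONCSUB Apps:User SYSADMIN "System Administrator" User/UserPass \
        WAIT=N CONCURRENT FND FNDSCARU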
Shutting Down Managers
You shut down parallel concurrent processing by issuing a "Stop" command in the OAM Service Instances page or a
"Deactivate" command in the Administer Concurrent Managers form. All Concurrent Managers and Internal Monitor
processes are shut down before the Internal Concurrent Manager shuts down.
After the failover test, sometimes the services would not fail back to RH9. Figure 42 shows the OAM Dashboard and indicates that RH9 and the applications services are unavailable. Remember, the test pulls the TCP cable from the host.
Figure 42
In order to restart the services on RH9, first stop all the services on RH9 with:
adstpall.sh apps/apps (sometimes a kill -9 -1 is necessary as the APPLMGR user)
By stopping the services, GSM is able to restart the services, except for Concurrent Processing, which was stopped.
Figure 43
Figure 44
Then:
• Log files go to: /d01/oracle/VIS/inst/apps/VIS_rh9/logs
• Out files go to: /d01/oracle/VIS/inst/apps/VIS_rh9/out
If $APPLCSF is not set, the files are placed under the product top of the application associated with the request. For example, a PO report would go under $PO_TOP/$APPLLOG and $PO_TOP/$APPLOUT.
All these directories must exist and have the correct permissions.
All concurrent requests produce a log file, but not necessarily an output file.
Concurrent Manager log files follow the same convention, and will be found in the $APPLLOG directory.
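A sketch for finding recent request logs and outputs (assumes the common l<request_id>.req / o<request_id>.out naming convention):
# Most recent request log and output files under the shared $APPLCSF
ls -lt $APPLCSF/$APPLLOG/l*.req | head
ls -lt $APPLCSF/$APPLOUT/o*.out | head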
Concurrent Processing Tables
Major tables that contain information about concurrent processing include FND_CONCURRENT_REQUESTS, FND_CONCURRENT_QUEUES, FND_CONCURRENT_PROGRAMS, and FND_CONCURRENT_PROCESSES.
Load Balancing
Parallel Concurrent Processing has many benefits. Key among these is its capability to provide failover in case of node
failure. When a node fails, the processes that were running on that node are restarted on Secondary Nodes (as defined by the System Administrator). This helps maintain throughput and keep the business running during node failures.
However, a resource-intensive node (one with many processes) may inadvertently overtax the system when it fails over. A Secondary Node may not be able to handle its normal workload plus the additional burden of managers/processes from a failed node.
If too many processes are running on the Secondary Node when the Primary Node fails over, the Secondary Node may not have the capacity to process the requests from the additional Concurrent Managers.
Release 12 introduces Failover Sensitive Workshifts. This enhancement allows the System Administrator to configure how
many processes failover for each workshift. With this added control, Applications System Administrators are able to enjoy
the benefits of PCP failover while reducing the risk of performance issues through overloaded resources.
Figure 45
Processing capabilities during failover may be severely degraded on the remaining hosts, unless failover processes are
restricted.
A host may be considered underutilized if its CPU utilization is less than 70%. A typical production environment may have two application tiers, each running Apache, Forms, and Concurrent Processing. Each node supports half the JSP and Forms users and half the Concurrent Requests, and has 70% average CPU utilization.
Release 11i has no mechanism for decreasing the number of processes a manager can run during a failover. It is clearly not possible to process 140% of the workload on the single remaining apps tier.
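To make the arithmetic concrete (using the utilization figures above):
2 nodes x 70% CPU = 140% of one node's capacity
1 surviving node = 100% capacity, leaving 40% of the workload unserved
To absorb a full failover: 2 nodes x 50% = 100%, or 2 x 35% = 70% with headroom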
Figure 46
Clearly, in order to keep a Release 11i or Release 12 system running during a failover, there are two choices:
• Run the servers at 35% or less utilization
• Reduce the number of processes that are allowed during failover
Figure 47
Conversely, if a failover occurs from node 1 to node 2, we may want to reduce the failover processes; however, this doesn’t work. The “failover processes” setting takes effect only if the node fails.
Figure 48
Figure 49
By defining specialized managers, it’s possible to direct concurrent requests to a specific concurrent processing node by defining the Primary/Secondary Node. Specialization rules allow requests to be excluded from managers and included in the appropriate manager at the Application level.
Related module requests should be directed to a specialized Concurrent Manager. This manager can have a Primary concurrent processing node that will use SQL*Net to direct the database traffic to a related node in the RAC cluster.
Quick note: It seems a little silly to go to all the trouble to create the RAC cluster and then figure out ways to direct traffic to
a specific node. Why not just get a bigger, monolithic, SMP machine for the database server?
For a more complete, serious discussion, please refer to Optimizing the E-Business Suite with Real Application Clusters
(RAC) by Ahmed Alomari.
References
249213.1 - Performance problems with Failover when TCP Network goes down
364171.1 - TAF Session Hangs, Select Fails To Complete W/ Loss Of NIC: Tune TCP Keepalive
211362.1 - Process Monitor Session Cycle Repeats Too Frequently
291201.1 - How To Remove a Dead Connection to the Target Database
362135.1 - Configuring Oracle Applications Release 11i with Oracle10g Release 2 Real Application Clusters and
Automatic Storage Management
Optimizing the E-Business Suite with Real Application Clusters (RAC) - Ahmed Alomari
240818.1 - Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC
Environment
R12 ATG - Concurrent Processing Functional Overview – Aaron Weisberg
210062.1 - Generic Service Management (GSM) in Oracle Applications 11i
271090.1 - Parallel Concurrent Processing Failover/Failback Expectations
241370.1 - Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
602899.1 - Some More Facts On How to Activate Parallel Concurrent Processing