Available drives are not being used. Jobs are waiting in the queue, or staying in the queue, after writing has completed.

Document ID: 237534

http://support.veritas.com/docs/237534

Available drives are not being used. Jobs are waiting in the queue, or staying in the queue, after writing has completed. New jobs are taking an extended time to appear in the queue. Defunct bpsched processes. Exit status 96s and 54s.

Exact Error Message

Exit Status Code 54: timed out connecting to client;

Exit Status Code 96: unable to allocate new media for backup, storage unit has none available

Details:

To identify this problem, examine the /usr/openv/netbackup/logs/bpsched/log.date and /usr/openv/netbackup/logs/bptm/log.date log files.

NOTE: VERBOSE=11 must be present in the /usr/openv/netbackup/bp.conf file prior to the failure in order to identify this issue.

As the workload on the master server increases, the start_bptm -countmedia function call to the volume database can take a long time to return. This delay is typically seen in environments with volume databases containing over 25000 pieces of media and classes configured to use ANY AVAILABLE STORAGE UNIT. The MAIN bpsched process typically falls behind because a large number of user-directed backups are submitted at one time. This is common in environments with clients running VERITAS NetBackup (tm) database extensions such as Oracle, Sybase, etc. Configuring all the classes to use ANY AVAILABLE STORAGE UNIT dramatically increases the load on bpsched, because the start_bptm -countmedia function must count media on all configured storage units for each backup. This increases the probability of seeing these problems.
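As a rough illustration of why this combination hurts, the arithmetic can be sketched in Python. This is a back-of-the-envelope model only: it assumes one serialized countmedia call per storage unit for every backup, which this document does not confirm, and the burst size and storage unit count are hypothetical figures.

```python
def scheduler_blocked_seconds(backups, storage_units, seconds_per_count):
    """Estimate the total time spent waiting on countmedia calls.

    Assumption (not confirmed by this document): one serialized
    countmedia call per storage unit for every submitted backup.
    """
    return backups * storage_units * seconds_per_count


# Hypothetical burst: 200 user-directed backups, 6 storage units.
normal = scheduler_blocked_seconds(200, 6, 1)    # about one second per call
slow = scheduler_blocked_seconds(200, 6, 60)     # one minute per call

print(f"normal: {normal / 60:.0f} min, slow: {slow / 3600:.0f} h")
```

With classes bound to a single storage unit, the same model drops storage_units to 1, which is why that change appears among the workarounds.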

With volume databases containing over 25000 pieces of media, many start_bptm -countmedia requests in a short period of time will cause the MAIN bpsched process to fall behind because of delayed response from VMD. If the MAIN bpsched process falls behind on its work, the waiting bpsched main_empty's child processes will show up as defunct processes in a ps output. Once the MAIN bpsched process catches up however, it will start to clean up those defunct processes. This performance delay causes problems with getting jobs active and can make jobs fail with:

Exit Status Code 54: timed out connecting to client;

and

Exit Status Code 96: unable to allocate new media for backup, storage unit has none available
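The defunct child processes described above can be counted from saved ps output. The following Python sketch is one possible way to do it, not a VERITAS tool. It assumes `ps -ef` style output in which the second and third columns are PID and PPID, and the sample listing is invented for illustration.

```python
def count_defunct_children(ps_output, parent_name="bpsched"):
    """Count <defunct> processes whose parent is a `parent_name` process.

    Assumes `ps -ef` style rows: UID PID PPID ... CMD.
    """
    parents = set()
    defunct_ppids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # header row or malformed line
        pid, ppid = fields[1], fields[2]
        if "<defunct>" in line:
            defunct_ppids.append(ppid)
        elif parent_name in line:
            parents.add(pid)
    return sum(1 for ppid in defunct_ppids if ppid in parents)


# Hypothetical listing, for illustration only.
SAMPLE = """\
UID   PID  PPID  C STIME TTY  TIME CMD
root  410     1  0 00:10 ?    0:05 bpsched
root  455   410  0 00:12 ?    0:01 bpsched
root  501   455  0 00:20 ?    0:00 <defunct>
root  502   455  0 00:21 ?    0:00 <defunct>
root  777     1  0 00:22 ?    0:00 <defunct>
"""

print(count_defunct_children(SAMPLE))  # 2
```

A count that keeps growing between samples suggests the MAIN bpsched process is still behind; a count that shrinks suggests it is catching up and cleaning up.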

Below is an example of a media count (bptm: INITIATING: -countmedia) in the bptm log. Notice the long time to finish (one minute in this example). A normal countmedia takes about one second. A delay blocks the scheduler from doing other processing, keeps jobs from going active, and keeps drives from being used. It also prevents completed jobs from being removed from the queue.

From /usr/openv/netbackup/logs/bptm/log.(date):

00:27:33 [29212] <2> bptm: INITIATING: -countmedia -cmd -rt 1 -rn 0 -stunit 9740-0 -den 14 -p RMAN_pool1 -rl 5
00:27:33 [29212] <2> add_to_vmhost_list: added <masterserver>.domain.com to vmhost list
00:27:33 [29212] <2> add_to_vmhost_list: added <mediaserver>.domain.com to vmhost list
00:27:33 [29212] <2> getsockconnected: host=<masterserver>.domain.com service=vmd address=192.x.x.1 protocol=tcp non-reserved port=13701
00:27:33 [29212] <2> vmdb_get_scratch_list: server returned: Scratch_pool
00:27:33 [29212] <2> vmdb_get_scratch_list: server returned: EXIT_STATUS 0
00:27:33 [29212] <2> getsockconnected: host=<masterserver>.domain.com service=vmd address=192.x.x.1 protocol=tcp non-reserved port=13701
00:28:33 [29212] <2> bptm: EXITING with status 0 <----------
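Scanning a large bptm log for slow countmedia runs by eye is tedious. The Python sketch below pairs each INITIATING: -countmedia line with the EXITING line from the same process ID and reports the elapsed seconds. It assumes the line layout shown in the excerpt above (time, [pid], <level>, message), one entry per line; the function name and regular expression are illustrative, not part of NetBackup.

```python
import re
from datetime import datetime, timedelta

# Matches only the two bptm entries of interest, in the excerpt's layout.
ENTRY = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) \[(\d+)\] <\d+> bptm: "
    r"(INITIATING: -countmedia|EXITING)"
)


def countmedia_durations(log_text):
    """Return a list of (pid, seconds) for each completed countmedia run."""
    started = {}
    results = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        pid = match.group(2)
        if match.group(3).startswith("INITIATING"):
            started[pid] = stamp
        elif pid in started:
            elapsed = stamp - started.pop(pid)
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)  # run crossed midnight
            results.append((pid, int(elapsed.total_seconds())))
    return results


SAMPLE_LOG = """\
00:27:33 [29212] <2> bptm: INITIATING: -countmedia -cmd -rt 1 -rn 0 -stunit 9740-0 -den 14 -p RMAN_pool1 -rl 5
00:27:33 [29212] <2> vmdb_get_scratch_list: server returned: EXIT_STATUS 0
00:28:33 [29212] <2> bptm: EXITING with status 0
"""

print(countmedia_durations(SAMPLE_LOG))  # [('29212', 60)]
```

Runs that take much longer than the normal one second are the ones worth investigating.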

Workaround:

Create the touch file /usr/openv/netbackup/DISABLE_COUNTMEDIA on the master server. This prevents start_bptm -countmedia from being started.
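For sites that script their configuration, the same step can be sketched in Python. This is only an illustration of creating an empty file at the path named above. The helper name and its directory parameter are hypothetical, and the demonstration runs against a temporary directory rather than a real master server.

```python
import tempfile
from pathlib import Path


def disable_countmedia(netbackup_dir="/usr/openv/netbackup"):
    """Create the DISABLE_COUNTMEDIA touch file.

    Returns True if the file was newly created, False if it already existed.
    On a real master server this normally requires root privileges.
    """
    flag = Path(netbackup_dir) / "DISABLE_COUNTMEDIA"
    existed = flag.exists()
    flag.touch()  # creates an empty file, or updates its timestamp
    return not existed


# Demonstration against a scratch directory, not a live installation.
with tempfile.TemporaryDirectory() as scratch:
    print(disable_countmedia(scratch))  # True
    print(disable_countmedia(scratch))  # False
```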

Other possible workarounds are:

- Configure all of your classes for specific storage units rather than ANY AVAILABLE STORAGE UNIT

- Find a more powerful system for hosting vmd

- Reduce/eliminate other applications fighting for system resources on the system where vmd is running

- Ensure that the underlying system is using and has enough cache to handle the volume database

- Ensure that the file system on which the volume database is resident handles disk I/O quickly

- Ensure that the network is fast enough to deliver meta data between vmd and its requesters (bptm)

- Tune the tcp_time_wait_interval to a shorter period of time so the socket resources are more available for the countmedia processes

- Purchase new, higher-capacity tape technology to reduce the number of individual volumes required

- Use multiple, smaller robotic libraries so that storage unit queries don't need to return a large number of volumes on each query

- In the upcoming release of VERITAS NetBackup (tm) 4.5, the use of the storage unit groups will help reduce the number of media servers that need to be contacted during the countmedia function.

NOTE: Disabling countmedia will only cause problems if a storage unit is out of media. Backups could fail with a status 96 (no available media) instead of using another storage unit that has media available. This will only be a problem if there are multiple storage units and the classes and/or schedules are set to use ANY AVAILABLE STORAGE UNIT. Even with ANY AVAILABLE STORAGE UNIT configured, the problem does not occur as long as media is available in all storage units and pools. To avoid this situation, use the scratch pool feature of NetBackup.

NOTE: The job-completion processing code re-enables counting if an EC_no_available_media(96) error occurs; that is, if NetBackup runs out of media, it starts counting again. Once media is added, or becomes available for use, recycling the NetBackup daemons will re-enable the DISABLE_COUNTMEDIA workaround.
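The interaction between the touch file, a status 96, and a daemon recycle can be summarized as a small state model. The Python sketch below is a toy model of the behavior described in the notes above, not NetBackup code; the class and method names are hypothetical.

```python
class CountmediaModel:
    """Toy model of when countmedia is active, per the notes above."""

    EC_NO_AVAILABLE_MEDIA = 96

    def __init__(self, touch_file_present):
        self.touch_file_present = touch_file_present
        # With DISABLE_COUNTMEDIA present, counting starts out disabled.
        self.counting = not touch_file_present

    def job_completed(self, exit_status):
        # A status 96 re-enables counting even if the touch file exists.
        if exit_status == self.EC_NO_AVAILABLE_MEDIA:
            self.counting = True

    def recycle_daemons(self):
        # Restarting the daemons makes the touch file take effect again.
        self.counting = not self.touch_file_present


model = CountmediaModel(touch_file_present=True)
print(model.counting)      # False: workaround active
model.job_completed(96)
print(model.counting)      # True: out of media, counting resumed
model.recycle_daemons()
print(model.counting)      # False: workaround active again
```

In practice this means a single status 96 silently undoes the workaround until the daemons are recycled, so it is worth checking for status 96 failures when the delay reappears.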

NOTE: VERITAS NetBackup engineering is currently exploring ways to improve the performance of the countmedia function.

NOTE: This issue has been resolved in NetBackup 4.5.

Supplemental Material:

System: Ref.# DEFECT: RSVmn15294

Description: Large number of volumes in volDB affects NBU scheduler

Products Applied:

NetBackup DataCenter 3.4, 3.4.1, 4.5 (Fixed)

Last Updated: November 15 2004 06:57 AM GMT

Expires on: 11-15-2005

Subjects:

NetBackup DataCenter

Application: Documentation, Notification, Usability

Database: Configuration, Documentation, Faq, Media Management

Languages:

English (US)

Operating Systems:

AIX 4.3.3

HP-UX 11.0, 11.11

Solaris 2.6, 7.0 (32-bit), 8.0 (32-bit)
