
Documentum Connector Field Guide

Background, Setup, Troubleshooting & Debugging

Copyright Vivisimo, Inc. All rights reserved worldwide

Documentum Background
Design
The Velocity Documentum connector allows you to crawl Documentum Docbases. For each Documentum document, the connector creates one document for the metadata and one for the contents; these two are then merged back together into a single document. Crawling Documentum requires two collections: one to crawl the actual documents in a table, the other to crawl the users table. A form is then added to the document collection's source to pass the user's rights to the search engine.

Prerequisites
EMC Documentum Content Server governs the EMC Documentum content repository. Content Server provides administrators with a set of content management services and a comprehensive infrastructure to manage all content applications.

In order to crawl a Docbase, the Documentum Foundation Classes (DFC) must be installed on the same machine as Vivisimo Velocity, and must be accessible by whatever web service is running. The path to this DFC set must be specified as the Documentum Install Dir. If the Documentum server is on the same machine as Velocity, the Documentum Foundation Classes should already be installed; otherwise, you may have to install them. See the Documentum documentation for more information. (Your customer should be able to obtain the DFC through their customer account on EMC's web site.)

You must ensure that you install the version of the DFC that is intended to be used with the operating system that Vivisimo Velocity is running on. The Windows, Linux, and UNIX versions of the DFC are not interchangeable. Currently, only a 32-bit version of the DFC is available.

What is DFC?
EMC Documentum Foundation Classes is a unified application programming interface (API) that applications can call on to leverage any content service provided by the EMC Documentum platform. Documentum Foundation Classes comprises a Documentum-specific API called Documentum Foundation Classes (DFC) and a set of standards-based APIs including WebDAV, SMB, FTP, ADO.NET, ODBC, JDBC, and ECI. Developers can write any type of content-rich application, including web or portal applications, or custom user interfaces for the desktop. Vivisimo currently utilizes the JDBC API interface and will be evaluating the WebDAV protocol.

Interfaces
EMC Documentum provides multiple end-user interfaces, including integrated applications. Vivisimo's customers primarily use one of the following:
• Webtop, the primary interface, which provides access to the EMC Documentum repository and content management services within a standard browser application (Figure 1)
• Documentum Portlets (Figure 2)
• Lotus Notes
• Microsoft Outlook (DCO)
Note: In July of 2008 EMC announced Documentum Enterprise Content Management (ECM) Suite version 6.5, a family of products that marries the user experience of Web 2.0 with enhanced XML capabilities. This new release introduced repository architecture updates as well as a new end-user interface, a lightweight client that is completely integrated with the desktop. Content Management Interoperability Services (CMIS) is a new set of standards and web services that ensure interoperability among disparate content repositories. EMC, IBM, and Microsoft have jointly drafted the CMIS specification and have submitted it to the Organization for the Advancement of Structured Information Standards (OASIS), in an effort to allow unprecedented interoperability with and between disparate, multivendor ECM solutions.

Figure 1: Documentum Webtop

Figure 2: Documentum Portlets


Setup
Prerequisites
As stated in the Velocity documentation, the Documentum Foundation Classes (DFC) must be installed on the same machine as Vivisimo Velocity, and must be accessible by whatever web service is running. The path to this DFC set must be specified as the Documentum Install Dir seed element. If the Documentum server is on the same machine as Velocity, the Documentum Foundation Classes should already be installed; otherwise, you will have to install them.

You must ensure that you install the version of the DFC that is intended to be used with the operating system that Vivisimo Velocity is running on. The Windows, Linux, and UNIX versions of the DFC are not interchangeable. Currently, only a 32-bit version of the DFC is available. If we are crawling from a 64-bit Velocity environment, we must modify the DFC, jre and lib environment; this is addressed later in this guide.

EMC Documentum Content Server and Documentum Foundation Classes (DFC) are available from the official EMC web site, which requires the customer's registered login: https://emc.subscribenet.com/control/dctm-eval/login

The current versions of the DFC are also available on the Office shared drive, at Y:\connectors\Software\Documentum or /office/connectors/Software/Documentum:
• DFC6
• DFC6.5
• DFC6.5-SP1
Note: Many EMC customers still run Documentum 5.3, and some are in the process of migrating from 5.x to 6.x and have both versions supported and maintained. Documentum supports this environment with features within the 6.x product, including Docobject and metadata schemes, to ease the migration and provide backward compatibility.

DFC Install
If you need to install the DFC, create a directory named DFC that is accessible by Velocity through your Web server. You may want to add a separate directory to the web server config or just add the DFC directory under the installed Velocity web directory. Extract the contents of the DFC archive into this directory. All files must be readable by the Velocity application. You can then set the Documentum Install Dir in the seed to <install dir>/DFC, and set the Shared Directory in the seed to <install dir>/DFC/dfc. You will then need to modify two property files, config/dfc.properties and config/log4j.properties, to identify the location where you have installed the DFC.

Multiple Version Documentum environments


In the field you may find customers that currently maintain multiple versions of Documentum (e.g., versions 5.3 and 6.0). Even though the versions run on different host servers, Velocity may be required to crawl and index docbases from both. The DFC installer allows only one version to be installed on a machine at a time. Because the DFC must reside on our Velocity server, we must install the additional version on a separate server and copy its install directory to the Velocity server. Then we must set the appropriate OS environment variables.


To set the OS-specific variables:

• Windows: the installation program sets the environment variables.
  o The only additional setting you need to make is to add jars to the classpath if you need to refer to DFC classes and interfaces in your Java programs.
  o The installation program uses the shared subdirectory of the program root directory. It attaches the full path of this directory (followed by a separator character) in front of the value of the PATH system environment variable.
  o The installation program asks you for the information that it uses to set these variables. See Table 1 below.
• UNIX/Linux: you set the environment variables.
  o The installation program does not set environment variables. If the installation program does not find the needed environment variables, it aborts the installation. The way to set environment variables depends on the shell that you use. Be sure to set the variables in such a way that a process launched in a different shell has the same values defined. This means using setenv or export (depending on the shell). Do not use set, which defines variables only for the current shell, but not for any child shell.
  o In order to run more than one version of DFC on a UNIX system, you must arrange to run the different DFC versions in different processes. You must install the different versions of DFC in locations that you can distinguish from one another by setting the environment variables.
  o The installation program uses the dfc subdirectory of the program root directory. You must place the full path of this directory onto the library path. The library path environment variable has different names in different versions of UNIX.
  o You must set these variables before you run the installation program. Table 1 below lists these environment variables and summarizes the ways that DFC uses them.
Environment variables can be set on UNIX systems using the setenv script. The script can be found at $<install Dir>/dfc/set_dctm_env.sh (.csh). You can source this file to properly set the environment variables from the table below.

Table 1: Environment Variables that DFC Uses

DOCUMENTUM_SHARED
  How DFC uses it: Determine the full path to the program root directory for UNIX
  Windows value (installation program sets): Not used by Windows systems
  UNIX value (you set): Specify a value before installing DFC

PATH
  How DFC uses it: Find the directory containing DFC shared libraries (DLLs) on Windows
  Windows value (installation program sets): Attach the full path of the shared subdirectory of the Documentum program root (followed by a separator character) in front of the existing value
  UNIX value (you set): Not used by UNIX systems

Library path (the appropriate installation guide lists the different names for this variable on different UNIX systems)
  How DFC uses it: Find the directory containing DFC shared libraries on UNIX
  Windows value (installation program sets): Not used by Windows systems
  UNIX value (you set): Add $DOCUMENTUM_SHARED/dfc

DFC_DATA
  How DFC uses it: Directory for DFC configuration
  Windows and UNIX values: Documentum has deprecated this variable. The appropriate installation guide provides information about what you should do instead of using this variable.

DOCUMENTUM
  How DFC uses it: Determine the full path to the user root directory
  Windows value (installation program sets): Not used by Windows systems
  UNIX value (you set): Specify a value before installing DFC

CLASSPATH
  How DFC uses it: Allow Java runtime to find dctm.jar and the DFC config directory. See the appropriate installation guide for information about making DFC classes available to the javac compiler
  Windows value (installation program sets): Attach (with appropriate separator characters) the full paths of dctm.jar and the config directory (for example, C:\Program Files\Documentum\Shared\dctm.jar and C:\Documentum\config)
  UNIX value (you set): Add $DOCUMENTUM_SHARED/dctm.jar and $DOCUMENTUM_SHARED/config
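The UNIX column of Table 1 can be turned into a quick pre-flight check before running the installer or a crawl. The following is a minimal sketch, not part of the DFC or of Velocity: the function name check_dfc_environment is hypothetical, and LD_LIBRARY_PATH is only an assumed name for the library path variable (it differs between UNIX systems, as the table notes).

```python
import os
import posixpath


def check_dfc_environment(env, library_path_var="LD_LIBRARY_PATH"):
    """Return a list of problems found in a UNIX-style environment mapping.

    library_path_var is an assumption: the variable name differs between
    UNIX systems, so pass the one that applies to your platform.
    """
    problems = []
    # Both root variables must be set before the installer is run.
    for name in ("DOCUMENTUM", "DOCUMENTUM_SHARED"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    shared = env.get("DOCUMENTUM_SHARED")
    if not shared:
        return problems

    # Entries that Table 1 says must be added on UNIX.
    expected = {
        library_path_var: [posixpath.join(shared, "dfc")],
        "CLASSPATH": [
            posixpath.join(shared, "dctm.jar"),
            posixpath.join(shared, "config"),
        ],
    }
    for var, entries in expected.items():
        present = [p.rstrip("/") for p in env.get(var, "").split(":") if p]
        for entry in entries:
            if entry not in present:
                problems.append(f"{var} does not contain {entry}")
    return problems


if __name__ == "__main__":
    # Check the environment of the current process.
    for problem in check_dfc_environment(dict(os.environ)):
        print(problem)
```

Run it from the same shell (and as the same user) that launches the web server, so that the check sees the environment the connector will actually inherit.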

DFC Install on Linux


There are some known issues when installing the current 6.0 DFC on Linux. The following steps have been documented on the Documentum Connector wiki page and may change in the near future.

• Choose an installation directory (Warning: not an NFS mount!).
• Add the following environment variables pointing to the install directory:
  o export DOCUMENTUM_SHARED=/opt/DFC
  o export DOCUMENTUM=/opt/DFC
• Untar the DFC file in $DOCUMENTUM.
• Run the installer and set the following configuration:
  o connection broker: <IP of the Documentum Content Server>
  o port: 1489 (default)
  o username: dm_bof_registry (default)
  o password: <password>
• The last installer screen should specify that the installation was successful.
• Read the install log. If there is an error "Publication of DFC instance with global registry failed", add the following lines to the dfc.properties file:
  o dfc.bof.registry.repository=<docbase name>
  o dfc.bof.registry.repository.username=dm_bof_registry (default)
  o dfc.bof.registry.repository.password=<password> (encrypted version)
• You should find a log file, log4j.log. If it contains an error such as:
  o "IO Exception attempting to acquire interprocess lock.../opt/DFC/config/dbor.properties.lck [...] FileNotFoundException .../opt/DFC/cache/[...]/content.lck"
  then add read and write permissions to dbor.properties.lck and to content.lck for everybody, for example with "chmod -R 777 cache".
• Run the installer again.

Now the install.log and log4j.log should no longer report exceptions and DFC should function properly.


DFC Troubleshooting
There is some straightforward troubleshooting to do if you crawl your Documentum repository and get the following error:

Could not get object: [DFC_BOF_CLASS_CACHE_INIT_ERROR] Failed to initialize class cache

If you see this, you will probably also see the following error in the connector logging file log4j.log in the DFC/logs directory (also see Documentum Connector Debugging below):
  o com.documentum.fc.common.DfNewInterprocessLockImpl - IO Exception attempting to acquire interprocess lock java.io.FileNotFoundException: C:\Documentum\cache\6.5.0.038\bof\inpex_dctm\content.lck (Access is denied)

Add write permissions to content.lck to solve this problem.

The installation program maintains an error log, which it writes to a file called setupError.log in the current working directory. If it cannot write into the working directory, it writes to the home directory of the user who initiated the installation. Reading this file may help you see what went wrong.
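When log4j.log is long, it can help to pull out just the lock files that DFC could not open. The following is a rough sketch that assumes the log lines look like the example quoted in this section; find_lock_failures is a hypothetical helper, not part of the DFC or of Velocity.

```python
import re

# Matches the path named after "FileNotFoundException:" up to the first ".lck".
# The pattern is based only on the sample error quoted in this guide.
LCK_FAILURE = re.compile(r"FileNotFoundException:\s*(.+?\.lck)")


def find_lock_failures(log_text):
    """Return the .lck paths reported as unreadable, in order, without duplicates."""
    paths = []
    for match in LCK_FAILURE.finditer(log_text):
        path = match.group(1)
        if path not in paths:
            paths.append(path)
    return paths
```

Each returned path is a file whose permissions need to be opened up for the user that the web server runs as.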

Documentum Connector on a 64-bit Server


Currently, only a 32-bit version of the DFC is available. If we are crawling from a 64-bit Velocity environment, we currently have two options:

1. Modify the DFC, jre and lib environment by copying these directories from a 32-bit installation to the 64-bit installation.
   a. Crawling from 64-bit Linux/UNIX: copy INSTALL_DIR/jre and INSTALL_DIR/lib/libmisc.so.
   b. Crawling from 64-bit Windows: copy INSTALL_DIR/jre and INSTALL_DIR/lib/misc.dll.
2. Install a 32-bit instance of Velocity on the Documentum host server and maintain the collection and source from that instance. You can then create a source on the 64-bit Velocity server and point it to the 32-bit Velocity source.


Configuring Documentum Seeds (from the online documentation)


Crawling Documentum requires two collections, one to crawl the documents, the other to crawl the users table. The Documentum seed is used to crawl documents within a docbase and consists of the following fields:

• Host - Host to connect to.
• Port - Port on which Documentum is running.
• Username - Username used to connect to the Documentum server.
• Password - Password used to connect to the Documentum server.
• Docbase - Docbase from which to retrieve documents. The name of the Docbase is case-sensitive.
• DQL Statement (optional) - The DQL statement used to query the Documentum Docbase. When doing a partial refresh, the last crawl time must be passed in the DQL statement. To accomplish this, the DQL statement must be edited in xml mode, which can be done by clicking the [xml] link. Once in xml mode, the two variables, date-time and new-date, must be declared and set. After those two variables are set, the condition r_modify_date > date('<value-of select="$new-date" />') must be added to the where clause. The example below enables partial refreshing for the default DQL query:

    <declare name="date-time" />
    <declare name="new-date" />
    <process-xsl>
    <![CDATA[
      <xsl:template match="/">
        <set-var name="date-time">
          <value-of select="viv:seconds-to-local-date-time($live-crawl-date)" />
        </set-var>
      </xsl:template>
    ]]>
    </process-xsl>
    <set-var name="new-date"><value-of select="date:month-in-year($date-time)" />-<value-of select="date:day-in-month($date-time)" />-<value-of select="date:year($date-time)" /> <value-of select="date:time($date-time)" /></set-var>
    select r_object_id, r_modify_date from dm_document
    <if test="$live-crawl-date > 0">
      where r_modify_date > date('<value-of select="$new-date" />')
    </if>

  o Additional custom Documentum fields may be added to the DQL, and contents nodes will be created for them.
  o Use the Documentum converter to map these fields to specific contents by modifying the XPath: viv:choose(@name = 'title', 'title', @name = 'subject', 'description', @name = 'r_modified_date', 'last-modified', @name = 'r_modifier', 'author')
• All Versions (optional) - Crawl all versions of a document. By default just the current version is crawled.
• Virtual Documents (optional) - Crawl documents as virtual documents.
• Documentum Version (optional) - The Documentum version to crawl.
• Documentum Install Dir - The Documentum installation directory. This directory should contain both the Documentum Shared and config directories.
• Shared Directory (optional) - Location of the Documentum Shared directory. If no path is specified, the Shared directory is assumed to be in Documentum Install Dir/Shared.
• URL Root - Root URL of the Documentum web interface. r_object_id will be appended to the URL provided.
• Group/User Prefix (optional) - Prefix added to groups and users to make their names unique.
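To make the partial-refresh logic of the DQL Statement field easier to follow, here is a minimal sketch of the string that the XML example is meant to produce. It is an illustration only and not connector code: partial_refresh_dql is a hypothetical name, the sketch defaults to UTC so that its output is reproducible (the seed uses local time), and the exact text produced by date:time() in Velocity may differ, for example by carrying a time zone suffix.

```python
from datetime import datetime, timezone

BASE_DQL = "select r_object_id, r_modify_date from dm_document"


def partial_refresh_dql(live_crawl_date, tz=timezone.utc):
    """Build the DQL for a crawl, given the last crawl time in epoch seconds.

    A value of 0 (or less) means there was no previous crawl, so the base
    query is returned unchanged, mirroring the <if> test in the seed.
    """
    if live_crawl_date <= 0:
        return BASE_DQL
    dt = datetime.fromtimestamp(live_crawl_date, tz=tz)
    # month-day-year followed by the time, as assembled in the new-date variable
    new_date = f"{dt.month}-{dt.day}-{dt.year} {dt:%H:%M:%S}"
    return f"{BASE_DQL} where r_modify_date > date('{new_date}')"
```

Printing the result for a known timestamp is a convenient way to check the date format against what the Docbase accepts before editing the seed.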


Figure 3: Documentum Seed

The Documentum User seed is used to crawl user rights within a Documentum server and consists of the following fields:

• Host - Host to connect to.
• Port - Port on which Documentum is running.
• Username - Username used to connect to the Documentum server.
• Password - Password used to connect to the Documentum server.
• Docbase - Docbase from which to retrieve documents. The name of the Docbase is case-sensitive.
• All Versions (optional) - Crawl all versions of a document. By default just the current version is crawled.
• Documentum Install Dir - The Documentum installation directory. This directory should contain both the Documentum Shared and config directories.
• Shared Directory (optional) - Location of the Documentum Shared directory. If no path is specified, the Shared directory is assumed to be in Documentum Install Dir/Shared.
• Group/User Prefix (optional) - Prefix added to groups and users to make their names unique.


In the Search Configuration for the Docbase collection (Search tab) you must set Rights Required to true. In the Live Source of the docbase collection, add the form component Documentum Rights with the following information:

• Documentum Users Collection - The name of the user collection that was created.
• User OS Name - For testing purposes, you can pass the username of a known user to return the specific documents that this user is known to have rights to access.
• User OS Domain - Optional.

Restricting Documentum Crawls


The Documentum seed used to crawl your Docbase allows you to change the DQL Statement to restrict the crawl. Examples:

• Return documents from specific authors:
  o SELECT r_object_id, r_modify_date from dm_document where ANY authors = '<author-name>' and ANY authors='<author-name>'
• Crawl a specific cabinet or folder and recurse through all sub-folders:
  o SELECT r_object_id, r_modify_date from dm_document WHERE FOLDER ('/<Cabinet name>',DESCEND)
• Get all files (and versions) under a particular cabinet:
  o SELECT r_object_id, object_name from dm_document(all) where folder('/<Cabinet name>', DESCEND)
• Get only current versions in a cabinet:
  o SELECT * from dm_document where folder('/<Cabinet name>', DESCEND)
• Find whether a document is part of a virtual document:
  o SELECT object_name,r_object_id FROM dm_sysobject WHERE r_object_id IN (SELECT parent_id FROM dmr_containment WHERE component_id = (SELECT i_chronicle_id FROM dm_sysobject WHERE r_object_id = '<child-object-id>'))

The following DQL can be used to debug content issues:

• Find the object type of a document:
  o SELECT r_object_type from dm_document where object_name='ObjectName'
• List objects having duplicate names:
  o SELECT object_name, count(*) FROM dm_document GROUP BY object_name HAVING count(*) > 1 ORDER BY object_name
• Get the total number of documents and folders under a cabinet:
  o SELECT count(*) as cnt, 'Docs' as category FROM dm_document(all) WHERE FOLDER ('/Cabinet Name',DESCEND) UNION SELECT count(*) as cnt, 'Folders' as category FROM dm_folder WHERE FOLDER ('/Cabinet Name',DESCEND)
• Retrieve all required attributes of a particular type:
  o SELECT attr_name FROM dmi_dd_attr_info WHERE type_name='dm_document' AND is_required <> 0
• Limit the number of documents to return:
  o SELECT object_name FROM dm_document ENABLE (RETURN_TOP 10)
• Find the file system path location of a document:
  o SELECT doc.r_object_id, doc.object_name, MFILE_URL('',-1,'') as mypath,doc.i_folder_id from dm_document doc where <condition>
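When the same restriction has to be applied to several cabinets, it can be convenient to generate the DQL Statement rather than hand-edit it. The sketch below assembles the folder-restricted queries shown above. The helper names are hypothetical, and doubling embedded single quotes follows the usual SQL convention, which is an assumption here and should be checked against the DQL reference for your Documentum version.

```python
def dql_literal(value):
    """Quote a string for use in DQL (assumes SQL-style doubling of single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def folder_crawl_dql(folder_path, all_versions=False, top=None):
    """Build a crawl query restricted to one cabinet or folder and its sub-folders.

    all_versions switches to dm_document(all); top, if given, adds the
    RETURN_TOP hint, which is handy for a quick "test it" run.
    """
    source = "dm_document(all)" if all_versions else "dm_document"
    dql = (
        f"SELECT r_object_id, r_modify_date FROM {source} "
        f"WHERE FOLDER({dql_literal(folder_path)}, DESCEND)"
    )
    if top is not None:
        dql += f" ENABLE (RETURN_TOP {int(top)})"
    return dql
```

For example, folder_crawl_dql('/Sales', top=10), where '/Sales' stands in for a real cabinet name, yields a query that returns at most ten documents from that cabinet, which keeps a first test crawl short.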

Documentum Connector Debugging


Connector logging should be used to diagnose connector errors and unexpected results. To add connector logging:

1. Open the collection and click Configuration -> Crawling tab.
2. Scroll down until you see a button called "Add a new condition".
3. Click it and add "Connector Logging".
4. In the Log4j configuration box, copy and paste the default configuration, then set the priority value to debug by entering the following and clicking OK:
   a. <category name="com.vivisimo.connector">
        <priority value="debug" />
      </category>
      <category name="httpclient.wire">
        <priority value="error" />
      </category>
      <category name="com.interwoven">
        <priority value="info" />
      </category>
      <root>
        <priority value="error" />
        <appender-ref ref="FILE" />
      </root>
5. Start the crawl, or do a "test it", and then find the connector log file, which should be in: $VIV_INSTALL/tmp/viv_connector-{COLLECTION_NAME}-{PROTOCOL}-log

The Documentum User seed is used to crawl user rights within a Documentum server and has some compatibility issues between Documentum 5.x and 6.x. With Documentum 6.x a user is uniquely identified either by its 5.3 fields (user_os_name and user_os_domain) or by the new 6.0 fields (user_login_name and user_login_domain). Prior to 6.x, the only field was user_os_name, and it has been maintained for backward compatibility. After Velocity 7.03, the Documentum User connector should detect this change, check which fields have values, and authenticate with the proper user credentials.

If your source/collection is not returning results, you can test with a specific Documentum user that you know has access to documents and field data. To verify the user:

• Search the Documentum Users Collection and verify that the specific user is actually returned in the results and that the user_login_name has the proper scheme and case.
• In the Documentum Rights form component of your docbase collection source, enter the literal string for that user in User OS Name. Alternatively, if you are logged in as that user, force the session value by editing User OS Name in xml mode and entering:
  o <value-of select="$user.name" />
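The connector log location given in the logging steps follows a fixed pattern, so it can be computed when you need to collect logs from several collections. This is only a convenience sketch: connector_log_path is a hypothetical helper, and the example values for the install directory, collection name, and protocol are assumptions to be replaced with your own.

```python
import posixpath


def connector_log_path(viv_install, collection_name, protocol):
    """Return $VIV_INSTALL/tmp/viv_connector-{COLLECTION_NAME}-{PROTOCOL}-log."""
    return posixpath.join(
        viv_install, "tmp", f"viv_connector-{collection_name}-{protocol}-log"
    )


if __name__ == "__main__":
    # Hypothetical values; substitute your own install directory and names.
    print(connector_log_path("/opt/velocity", "dctm-docs", "documentum"))
```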

Documentum Query Tools


Samson is a desktop application that comes packaged with the Content Server installation. It can be found in $DOCUMENTUM/unsupported/win32 folder. It comes with a small instruction document as well.


Delilah is a client application for Documentum, written by Rob de Leeuw. It can be seen as an alternative to Documentum's Desktop Client or the unsupported Samson tool. Delilah is recognized by many Documentum power users and administrators for its performance, its search and navigation features, and the easy way it sends query results to Excel. Using MDI technology, you can open as many windows (e.g. Explorer, DQL, API and Search windows) within Delilah as you need. Delilah is a "lightweight" fat client, just a few MB in size. It is available from http://canservices.nl/cms/ (you will need to register to download).
