version 4.2.x
Administrators Guide
Information in this document is subject to change without notice. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express permission of Autonomy Systems Ltd. Windows is a trademark of Microsoft Corp., UNIX is a trademark of X/OPEN Ltd.
IDOL server and File System Fetch are trademarks of Autonomy Systems Ltd.
Table of Contents
Preface .... i
    Autonomy .... i
    Contact .... ii
    Downloading manual updates from Automater .... iii
    Typographical conventions .... iii
    Related documentation .... iv
1. Autonomy infrastructure .... 1
    IDOL server .... 3
    Connectors .... 3
    Interfaces .... 3
    Distributed systems .... 3
    Administration .... 4
    PODS .... 4
    Data flow and security .... 5
2. Introduction .... 7
    System architecture .... 8
    Controlling internal file import .... 9
3. Installation .... 11
    System requirements .... 11
    Implementation procedure .... 12
    Installing File System Fetch on Windows .... 13
        Directory structure: Windows .... 15
    Installing File System Fetch on UNIX .... 17
        Directory structure: UNIX .... 19
4. Configuring File System Fetch .... 21
    Displaying help on configuration settings .... 21
    Modifying configuration parameter values .... 22
    Configuration file sections .... 23
        [License] section .... 23
        [Service] section .... 24
        [Server] section .... 24
        [Default] section .... 24
        [Configuration] section .... 25
        [<AFetchJob>] section .... 26
        Example configuration file .... 27
5. Importing PST files .... 29
6. Importing individual files .... 31
    Displaying online help .... 31
    Action command syntax .... 32
7. Starting and stopping File System Fetch .... 33
    Starting File System Fetch .... 33
    Stopping File System Fetch .... 34
Appendix A: Service port commands .... 35
    GetConfig .... 36
    GetLogStream .... 36
    GetLogStreamNames .... 37
    GetStatistics .... 37
    GetStatus .... 38
    GetStatusInfo .... 38
    MergeConfig .... 39
    SetConfig .... 41
    Stop .... 41
Glossary .... 43
Index .... 45
Preface
Autonomy
Autonomy employs a fundamentally different and unique combination of technologies to enable computers to form an understanding of a page of text, web pages, emails, voice, documents and people. Autonomy's solution is therefore able to power any application dependent upon unstructured information within every market sector, including: e-commerce, customer relationship management, knowledge management, enterprise information portals and online publishing applications. This is evidenced by the significant penetration of the technology in a diversity of vertical markets and has been achieved principally because every market sector needs to manage and leverage the benefits of unstructured information.
Autonomy was founded in 1996 and has offices in Boston, Chicago, Dallas, San Francisco, New York, and Washington, D.C. in the United States, as well as offices throughout EMEA, including Amsterdam, Brussels, Cambridge, Frankfurt, Milan, Paris, Oslo, and Sydney. In July 1998, the company went public on the EASDAQ exchange (EASDAQ:AUTN). Autonomy floated on The NASDAQ National Market (NASDAQ: AUTN) in May 2000, and on the London Stock Exchange (LSE: AU.) in November 2000.
Contact
To contact Autonomy, please get in touch with your nearest location listed below.
Europe and South Pacific
Autonomy Systems Ltd.
Cambridge Business Park
Cowley Road
Cambridge CB4 0WZ
Help Desk: +44 (0) 800 0 282 858
Switchboard: +44 (0) 1223 448 000
Fax: +44 (0) 1223 448 001
Email for information: autonomy@autonomy.com
Email for support: uksupport@autonomy.com
Website: www.autonomy.com
The Help Desk operates from 9.30 am to 6.00 pm (GMT) Monday to Friday.
USA
Autonomy Inc.
One Market
Spear Street Tower
San Francisco CA 94105
Help Desk: +1 877 333 7744
Switchboard: +1 415 243 9955
Fax: +1 415 243 9984
Email for information: info@us.autonomy.com
Email for support: support@us.autonomy.com
Website: www.autonomy.com
The Help Desk operates from 9.30 am to 6.00 pm (CST) Monday to Friday, toll-free.
Note: the manual's version number (for example, version 4.1.x) corresponds to the product version. The last number of the product version has been replaced with an x for all manuals as this number relates to minor product releases that have no effect on the documentation. If a manual has a revision number (for example revision 5), it indicates that this manual has been revised since it was first released. Automater always contains the latest available revision of all manuals.
Typographical conventions
Autonomy documentation uses the following typographical conventions.

Formatting convention    Type of information
Bold type                References to any of the following: interface options (for example, menus or buttons), actions, parameters.
Courier font             Configuration examples.
<text>                   A string that needs to be replaced with a personal setting. For example, <port> indicates that you have to specify a port number, [<MySection>] indicates that you have to specify a section name, and so on. Note that this only applies where the text does not explicitly refer to XML. Another exception is the instructions for writing ACI templates (an appendix to product manuals where this is applicable), where personal settings are indicated by Italic type.
Related documentation
You should use the File System Fetch manual in connection with the following:
DIH manual
The DIH (Distributed Index Handler) manual contains details on how you can use a DIH to distribute aggregated documents across multiple IDOL servers.
IAS manual
The IAS manual contains details on how you can use Autonomy's Intellectual Asset Protection System (IAS) to ensure secure access through authentication and role permissions.
DiSH manual
The DiSH (Distributed Service Handler) manual contains details on how you can use a DiSH server to administer and control multiple Autonomy services.
Online help
The online help details the actions and configuration settings that are available for File System Fetch. Please refer to Displaying help on configuration settings on page 21 and to Displaying online help on page 31 for details on how to display help.
1. Autonomy infrastructure
"Today, 80% of business is conducted on unstructured information." Gartner Group
"85 per cent of all data stored is held in an unstructured format." Butler Group
"Unstructured data doubles every three months." Gartner Group
Information that you need in order to conduct business successfully comprises the following types:
In the past, companies could only make use of 20% of the information that was relevant to them. In order to deal with this information they used keyword search engines, tagging schemes, collaborative filtering or linguistic methods. These methods were not only costly and time-consuming but also non-scalable and inaccurate, and they diverted attention from core business. The remaining 80% of relevant information could not be utilized.
Autonomy's software infrastructure allows you to utilize 100% of the information that is relevant to you. It automates all the business processes that formerly had to be dealt with manually. By developing a patented combination of Bayesian Inference, Shannon's information theory and pattern matching, Autonomy has enabled computers to understand unstructured, structured and semi-structured information. This means that Autonomy's software infrastructure solves a fundamental problem that affects every industry, and can be used in virtually any application that handles unstructured information:
- E-Commerce
- CRM
- Knowledge Management
- Business Intelligence
- Enterprise Information Portals
- Online Publishing
Autonomy's software infrastructure is fully scalable and allows you to process information:
IDOL server
Using Autonomy connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content, delivering a real time environment in which operations across applications and content are automated, removing all the manual processes involved in getting the right information to the right people at the right time.
Connectors
Connectors enable automatic content aggregation from any type of local or remote repository (for example, a database, a web site, a real-time telephone conversation etc.), forming a unified solution across all information assets within the organization.
Interfaces
Portlets are windows that can be set up in Autonomy's Portal-in-a-Box or third party portals. Each portlet contains an application that allows the portal's end users to benefit from a variety of IDOL server functionality.
Retina is an easy-to-use web interface application that provides a full range of retrieval methods that adjust to the individual user's proficiency.
Autonomy Desktop Suite brings the power of Autonomy to every desktop. Conducting a real-time analysis of the ideas involved in the content of any opened desktop application, Desktop Suite's ActiveKnowledge or Active Windows Extensions module provides real-time links to relevant internal and external information without diverting the user from work in progress to perform a search or retrieval operation.
Distributed systems
Autonomy's distribution solutions facilitate linear scaling of systems through faster command execution and reduction of processing time.
DAH (Distributed Action Handler) enables the distribution of ACI (Autonomy Content Infrastructure) action commands to multiple Autonomy IDOL servers, providing failover and load balancing.
DIH (Distributed Index Handler) enables distributed indexing of documents into multiple Autonomy IDOL servers, providing failover and load balancing.
Administration
DiSH (Distributed Service Handler) provides crucial maintenance, administration, control and monitoring functionality for the Autonomy infrastructure. DiSH delivers a unified way to communicate with all Autonomy services, such as connectors, DIH, DAH and so on, from a centralized location.
Autonomy Service Dashboard is a stand-alone web application that allows administrators to manage all Autonomy modules and services running locally or remotely. The Dashboard communicates with the Distributed Service Handler (DiSH) module, which is the back-end process for monitoring and controlling all the Autonomy child services. Autonomy Service Dashboard provides the administrator with a list of all child services that DiSH is monitoring, together with control buttons and status information.
PODS
Autonomy's Product Orientated Drop-in Solutions (PODS) allow Autonomy solutions to be easily integrated with third party applications and solution providers. PODS enable organizations to make their existing applications compatible with IDOL with minimal configuration and administration requirements. Making IDOL server a part of any solution delivers the direct benefits of content automation and the ability to perform a vast range of IDOL server operations, irrespective of file format or location.
Data flow and security

Aggregation & Distribution
Connectors aggregate content from various repositories and index it into IDOL server or, if the content needs to be distributed across multiple IDOL servers, a DIH (Distributed Index Handler).

Querying & Distribution
User queries are sent from a front end directly to IDOL server or distributed to multiple IDOL servers using the DAH (Distributed Action Handler).

Distributed Administration
The DiSH (Distributed Service Handler) enables administrators to maintain, configure and control multiple Autonomy services via the Autonomy Service Dashboard, a front-end web interface.

Security
The Autonomy IAS (Intellectual Asset Protection System) ensures secure access through authentication and role permissions. When a user logs on to a front end (for example, Retina or a third party portal) his authentication details are sent to IDOL server which returns the user's security details to the front end, where they are stored until the user logs off or his session times out. Every time the user issues a query, his security details are attached to the query string that is sent to IDOL server.
The group servers store the user group information of repositories that store users in groups. This allows the front end to quickly retrieve user security information from the group servers, and send the query and the user's security information to IDOL server in order to check if the user is permitted to view result documents before they are displayed to the user.
When a user queries IDOL server through the front end, his security information is retrieved from the appropriate group server and sent with his query to IDOL server. IDOL server passes the user's security details to the security libraries for the data repositories that contain result documents for the user's query. The security libraries then check the user's security details against the ACLs for the documents that match the query. If the user is entitled to view a document, it is returned as a result to the front end.
2. Introduction
File System Fetch is an Autonomy connector that automatically aggregates documents from file systems on local or network machines, imports them into IDX or XML file format (only IDX or XML files can be indexed in IDOL server) and indexes them into an Autonomy IDOL server. Once IDOL server receives the documents, it automatically processes them, performing a number of intelligent operations in real time, for example:
- Agents
- Alerting
- Categorization
- Channels
- Clustering
- Collaboration
- Dynamic Thesaurus
- Expertise
- Hyperlinking
- Mailing
- Profiling
- Retrieval
- Spelling Correction
- Summarization
- Taxonomy Generation
System architecture
File System Fetch aggregates documents from any type of local or remote repository and indexes them into an IDOL server:
If you want to distribute the documents that File System Fetch aggregates across multiple IDOL servers, you need a DIH (Distributed Index Handler) installation. In this case File System Fetch aggregates documents from any type of local or remote repository and indexes them into the DIH which then distributes the documents between the IDOL servers it connects to, providing Load Balancing and Failover:
Controlling internal file import

Directory Polling
File System Fetch imports and indexes any file that is contained in a specified directory, provided the file meets the criteria that you have specified.
File System Fetch creates a <InstallationName>.dirstatn file (where n is the number of the job) for each of the jobs that it carries out. This file contains a list of all the files that have been processed and is stored in the File System Fetch installation directory.
If you want to stop File System Fetch and restart its process from scratch, you should delete the <InstallationName>.dirstatn and <InstallationName>.dirstatn.bak files. If you don't, File System Fetch will refer to them when it is restarted and carry on its process from where it stopped.
If you want to reprocess the last file that File System Fetch dealt with, you can replace the contents of the <InstallationName>.dirstatn file with the contents of the <InstallationName>.dirstatn.bak file, which is a copy of the <InstallationName>.dirstatn file before the last file was processed.
File System Fetch automatically processes any new files that appear in the directories specified by DirectoryPathCSVs. You should therefore ensure that no application creates temporary files in these directories.
Every time new files are added to the list file or the directory from which File System Fetch is reading, it processes them automatically. Use the <InstallationName>.log file (located in the File System Fetch installation directory) to keep track of all actions that File System Fetch performs.
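The two DIRSTAT maintenance tasks described above can be scripted. The following is a minimal sketch only, assuming that the files are named <InstallationName>.dirstat<n> and <InstallationName>.dirstat<n>.bak and that File System Fetch has been stopped before the files are touched. The function names and the MyFetch installation name are hypothetical and are not part of File System Fetch.

```python
from pathlib import Path
import shutil


def _dirstat_paths(install_dir, installation_name, job_number=0):
    """Return the DIRSTAT file and its backup for one job (naming is assumed)."""
    base = Path(install_dir) / f"{installation_name}.dirstat{job_number}"
    return base, base.with_name(base.name + ".bak")


def reset_job_state(install_dir, installation_name, job_number=0):
    """Delete the DIRSTAT file and its backup so the job restarts from scratch."""
    removed = []
    for path in _dirstat_paths(install_dir, installation_name, job_number):
        if path.exists():
            path.unlink()
            removed.append(path.name)
    return removed


def restore_previous_state(install_dir, installation_name, job_number=0):
    """Copy the backup over the DIRSTAT file so the last file is reprocessed."""
    dirstat, backup = _dirstat_paths(install_dir, installation_name, job_number)
    shutil.copyfile(backup, dirstat)
    return dirstat
```

For example, restore_previous_state(r"C:\Autonomy\FileSystemFetch", "MyFetch") would roll job 0 back by one file, while reset_job_state with the same arguments would make the job start again from the beginning.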
3. Installation
System requirements
File System Fetch should be installed by the system administrator as part of a larger Autonomy system (that is, a system that includes Autonomy IDOL server and an interface for the information stored in IDOL server).
Supported platforms
- Microsoft Windows NT4, 2000 and XP
- Linux
- Solaris
Note: File System Fetch also supports other POSIX UNIX versions on request.
Note: this specification is dependent on the amount of data to be fetched. Due to substantially different disk usage patterns it is beneficial to run fetch and IDOL server processes on separate drives or partitions.
Implementation procedure
You can use the following implementation procedure to test run your File System Fetch installation:
1. Install File System Fetch:
   - Run the installer (see Installing File System Fetch on Windows on page 13).
   - When the IDOL server Details dialog is displayed, enter xxx in the Host field. This stops File System Fetch from indexing files into IDOL server (after they have been imported) and forces it to store them in the main installation directory instead.
   - When the File System Fetch Services dialog is displayed, uncheck the box to ensure that File System Fetch does not start immediately.
2. Open the File System Fetch configuration file in a text editor, and set the PollingPeriod parameter to 0 in order to ensure that File System Fetch cycles only once.
3. Navigate to the data directory in your File System Fetch installation, and place a Word document that you want to index into IDOL server into this directory.
4. Display the Windows Services dialog and start File System Fetch.
5. File System Fetch cycles only once. Wait until it has completed its cycle. You can check the <InstallationName>.log file in the File System Fetch installation directory in order to see when the cycle is finished. (Note that because you have set IP Address to xxx, the <InstallationName>.log file will state that the indexing command failed.)
6. Display the Windows Services dialog, and stop File System Fetch.
7. In the installation directory, open the <MyJob>.tmp.queued.idx file in a text editor and check that it contains all the content that you want to index into IDOL server. If it doesn't, you need to configure File System Fetch to aggregate the content you want. You can do this using specialized File System Fetch and Import Module settings (please refer to your online help for details on available settings).
8. Once you have made changes to the File System Fetch configuration file, delete the <InstallationName>.dirstat0 and <InstallationName>.dirstat0.check files that File System Fetch has created in its installation directory (this allows File System Fetch to repeat the cycle), start File System Fetch and repeat steps 2-6 until you are happy with the content of the <MyJob>.temp.queued.idx file.
9. Finalize your File System Fetch configuration:
   - Open the File System Fetch configuration file in a text editor.
   - Set the PollingPeriod parameter to an appropriate number (for example, 86400000 if you want File System Fetch to run every 24 hours).
   - Set the DREHost parameter to the IP address (or name) of the machine that hosts your IDOL server.
   - Set up the Fetch jobs that you want File System Fetch to execute (see [Configuration] section on page 25).
10. You can now run File System Fetch.
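The example value 86400000 for a 24-hour cycle implies that PollingPeriod is expressed in milliseconds (24 x 60 x 60 x 1000 = 86400000). The sketch below shows the conversion for other intervals; the helper name polling_period_ms is hypothetical, and the millisecond unit is inferred from the example rather than stated in this guide, so check the online help before relying on it.

```python
from datetime import timedelta


def polling_period_ms(**interval):
    """Convert an interval such as hours=24 into a millisecond count.

    Accepts the same keyword arguments as datetime.timedelta
    (days, hours, minutes, seconds, ...).
    """
    return int(timedelta(**interval).total_seconds() * 1000)


print(polling_period_ms(hours=24))    # 86400000
print(polling_period_ms(minutes=30))  # 1800000
```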
Installing File System Fetch on Windows

1. The installation opens with the Welcome dialog. Read the text, and click on Next.
2. The License agreement dialog is displayed. Read the license agreement and click on Next to accept it.
3. The Installation Name dialog is displayed. Enter a unique name for the File System Fetch installation, and click on Next. Note that the unique name must not contain any spaces.
4. The Choose Destination Location dialog is displayed. Select the directory in which you want to install File System Fetch, and click on Next. By default this is C:\Autonomy\FileSystemFetch, but you can use the Browse button to navigate to another location.
5. The Select Program Manager Group dialog is displayed. Select the Program Manager group to which you want to add icons for File System Fetch, and click on Next.
6. The IDOL server Details dialog is displayed. Enter the following information for the IDOL server you want File System Fetch to index into, and click on Next:
   IP Address   The IP address (or name) of the machine on which IDOL server is running.
   Index Port   The port that is used to index documents into IDOL server (this must be the IndexPort or the ExtendedIndexPort that you have specified in the IDOL server configuration file's [Server] section).
   Database     The name of the IDOL server database in which you want to store the documents that File System Fetch aggregates.
7. The File System Fetch Details dialog is displayed. Enter the following for File System Fetch, and click on Next:
   ACI Port       The port File System Fetch listens on for action commands (this sets the Port parameter in the File System Fetch configuration file's [Server] section).
   Service Port   The port File System Fetch uses for service commands (see Appendix A: Service port commands on page 35).
8. The File System Fetch Services dialog is displayed. Leave the box checked if you want to start the File System Fetch service immediately after the installation, and click on Next. Otherwise, uncheck the box to complete the installation without immediately starting the File System Fetch service.
9. The MS Outlook (PST file) processing dialog is displayed. Check the PST file processing box if you want File System Fetch to be able to aggregate Outlook items (appointments, contacts, notes, tasks, messages and attachments) that are contained in PST files, and click on Next.
10. The Start Installation dialog is displayed. Click on Next to confirm the settings you have made and start the installation of File System Fetch. Alternatively, click on Back to return to previous dialogs if you want to make any changes.
11. The Installing dialog is displayed. The progress of the installation process is indicated. If you want to abort the installation process, click on Cancel.
12. The Add Shortcuts dialog is displayed. Select Yes or No to indicate whether you want to add shortcuts to the File System Fetch service to your Start menu, and click on Next.
13. The Installation Complete dialog is displayed. File System Fetch has been installed successfully. Click on Finish to exit the installation. If you selected to start the File System Fetch service immediately after the installation, it will now launch.
Directory structure: Windows

convtables               Folder that contains various text files that are used for language conversion.
data                     Folder from which File System Fetch aggregates data by default.
filters                  Folder that contains executables that are used during the importing process.
binslave.cfg             Configuration file that contains settings for binslave.
binslave.exe             Binslave executable (used during the importing process to extract text from binary files).
importslave.exe          Executable that generates IDX files for IDOL server.
omnislave.cfg            Configuration file that contains settings for omnislave.
omnislave.exe            Omnislave executable that parses files that are not in HTML or PDF format to IDX files.
pdfslave.cfg             Configuration file that contains settings for pdfslave.
pdfslave.exe             Executable that parses PDF files to IDX files.
various DAT files        Files used by binslave.
various DLL files        Filters used by omnislave.
importTemp               Folder for temporary import data.
pstslave                 Folder that contains pstslave files.
redemption.dll           Library file that is used in the processing of PST files.
pstslave.cfg             Configuration file that contains settings for pstslave.
pstslave.exe             Executable that parses PST files to IDX files.
<InstallationName>.cfg   File System Fetch configuration file.
<InstallationName>.exe   File System Fetch executable.
INSTALL.LOG              Installation log file.
Uninstall.exe            Executable to uninstall File System Fetch from your computer.
In addition, the following folders and files are created when you start the File System Fetch service:

queue                              Folder that stores queued action commands and the results of queued actions (if you have set the results to be stored).
uid                                Folder that contains document tracking files.
<installation_name>.dirstat0       Store of which files from the file system have been indexed by File System Fetch.
<installation_name>.dirstat0.bak   A DIRSTAT file and backup are created for each File System Fetch job.
<installation_name>.lck            Internally used lock file for File System Fetch.
<installation_name>.log            File System Fetch log file.
<installation_name>.str            File System Fetch structured configuration file.
<installation_name>cfg.log         File System Fetch configuration log file.
license.log                        License log file.
service.log                        Service commands log file.
11. The Autonomy File System Fetch Installation text is displayed. Check that your settings are correct, and press Enter to confirm your settings and to install File System Fetch. If you want to change a setting, enter the corresponding number, press Enter and then enter a new value for the setting. Alternatively, type X or press Ctrl+C to cancel the installation.
12. The Installation complete dialog is displayed. You have successfully installed File System Fetch. Press Enter to finish.
In addition, the following folders and files are created when you start the File System Fetch service:

queue                              Folder that stores queued action commands and the results of queued actions (if you have set the results to be stored).
uid                                Folder that contains document tracking files.
<installation_name>.dirstat0       Store of which files from the file system have been indexed by File System Fetch.
<installation_name>.dirstat0.bak   A DIRSTAT file and backup are created for each File System Fetch job.
<installation_name>.lck            Internally used lock file for File System Fetch.
<installation_name>.log            File System Fetch log file.
<installation_name>.str            File System Fetch structured configuration file.
<installation_name>.log            File System Fetch configuration file.
4. Configuring File System Fetch

Displaying help on configuration settings

To display the online help
1. Issue the following command from your web browser:
   http://<host>:<port>/action=Help
   <host>   Enter the IP address (or name) of the machine on which File System Fetch is installed.
   <port>   Enter the port number that client machines use to communicate with File System Fetch (this is specified by the Port setting in the File System Fetch configuration file's [Server] section).
2. Click on the config help link in the top right-hand corner to display the configuration parameter help (by default the action command help is displayed).
   Note: the configuration file sections that each configuration parameter can be used in are listed under Allowed in Sections.
Note: You can also generate configuration help without starting File System Fetch. Issue the following command from the command line to generate HTML files in your installation directory:
<FileSystemFetch_installation_directory_path><IDOLserver_installation_name>.exe -help
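The help URL can also be assembled by a script, for example when several File System Fetch installations need to be checked. This is a minimal sketch: the helper name help_url is hypothetical, localhost is an assumed host, and 7000 is simply the Port value used in the example [Server] section of this guide.

```python
def help_url(host, port):
    """Build the online help URL from the host and the [Server] Port setting."""
    return f"http://{host}:{port}/action=Help"


# Assumed values for illustration; substitute your own host and port.
print(help_url("localhost", 7000))  # http://localhost:7000/action=Help
```

The resulting string can be pasted into a browser or passed to Python's webbrowser.open() to display the help pages.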
Entering string values
If the value that you want to enter for a parameter that requires a string contains quotation marks, you must put the value into quotation marks and escape each quotation mark that the string contains by putting a backslash in front of it. For example:
FIELDSTART0="<font face=\"arial\"size=\"+1\"><b>"
Here the beginning and end of the string are indicated by quotation marks, while all quotation marks that are contained in the string are escaped.
If you want to enter a comma-separated list of strings for a parameter, and one of the strings contains a comma, you must indicate the start and the end of this string with quotation marks. For example:
ParameterName=cat,dog,bird,"wing,beak",turtle
If any string within a comma-separated list contains quotation marks, you must put this string into quotation marks and escape the quotation marks in the string by putting a backslash in front of them. For example:
ParameterName="<font face=\"arial\"size=\"+1\"><b>",dog,bird,"wing,beak",turtle
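The quoting rules above can be sketched as a small helper. This is only an illustration of the rules as described here, not part of File System Fetch; the function names quote_value and join_values are hypothetical.

```python
from typing import Sequence


def quote_value(value: str) -> str:
    """Quote a single string if it contains a comma or a quotation mark.

    Embedded quotation marks are escaped with a backslash, as described
    in the text above.
    """
    if '"' in value or "," in value:
        return '"' + value.replace('"', '\\"') + '"'
    return value


def join_values(values: Sequence[str]) -> str:
    """Build the right-hand side of a comma-separated parameter."""
    return ",".join(quote_value(v) for v in values)


# Reproduces the comma-separated list example from the text.
print("ParameterName=" + join_values(["cat", "dog", "bird", "wing,beak", "turtle"]))
```

The helper leaves plain strings untouched, so values such as cat or dog are written without quotation marks.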
Applying modifications to File System Fetch's operation
New configuration settings only take effect once the File System Fetch service is stopped and restarted.
Note: for import parameters that you can specify in the configuration file's [Default] and [<MyJob>] sections, please refer to the Import module manual.
[License] section
The [License] section contains licensing details. You should not edit this section, as that may cause File System Fetch to stop working. For example:
[License]
Holder=My Company
Key=01234567890
[Service] section
The [Service] section contains the details that File System Fetch requires when it is run as a service under Autonomy's Distributed Service Handler (DiSH). For example:
[Service]
ServicePort=10023
ServiceControlClients=127.0.0.1
ServiceStatusClients=127.0.0.1
[Server] section
This section contains general settings for indexing and querying. For example:
[Server]
Port=7000
QueryClients=10.1.1.*,127.0.0.1
AdminClients=10.1.1.10,127.0.0.1
Threads=2
[Default] section
The [Default] section contains default settings that apply to each Fetch job that is set up in the configuration file (in the individual Fetch job sections). If you configure settings in an individual Fetch job's section, they override the default settings for this job.
Note: in addition to File System Fetch configuration settings, you can also specify Import module settings in this section (or in individual Fetch job sections). Please refer to your Import module manual for details on the Import module.
For example:
[Default]
PollingPostAction=0
PollingAction=7
PollingMaxNumber=1000
DreHost=127.0.0.1
QueryPort=9000
IndexPort=9001
Database=database0
PollingMethod=2
PollingPeriod=10000
RemoveLogFileOnStart=on
ImportIDXFilesAction=0
ImportStoreContent=on
ImportTempDir=./importTemp
ImportSummary=on
ImportBreaking=ON
ImportBreakingMinParagraphWords=300
ImportBreakingMaxParagraphWords=500
ImportBreakingMinDocWords=500
ImportIntelligentTitleSummary=0
ImportDefaultSlaveDirectory=./filters
ImportCharsetConvTablesDirectory=./ConvTables
ImportExtractDateFrom=8
ImportExtractDateToField=DREDATE
ImportExtractDateToFormat=EPOCHSECONDS
[Configuration] section
The [Configuration] section lists all individual fetch jobs that you want File System Fetch to carry out. Note that you must list the fetch jobs in consecutive order, starting from 0. For example:
[Configuration]
Number=2
0=MyFirstJob
1=MySecondJob
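The numbering rule can be checked before restarting the service. The following sketch assumes the configuration file can be read with Python's standard configparser module, which the manual does not confirm; the function name list_jobs is hypothetical.

```python
import configparser

SAMPLE = """\
[Configuration]
Number=2
0=MyFirstJob
1=MySecondJob
"""


def list_jobs(config_text: str) -> list:
    """Return the job names listed in [Configuration], in order.

    Raises ValueError if the jobs are not numbered consecutively
    from 0 up to Number - 1.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)
    section = parser["Configuration"]
    count = int(section["Number"])
    jobs = []
    for i in range(count):
        key = str(i)
        if key not in section:
            raise ValueError(
                f"job {i} is missing; jobs must be numbered 0..{count - 1}"
            )
        jobs.append(section[key])
    return jobs


print(list_jobs(SAMPLE))
```

A gap in the numbering (for example 0 and 2 with Number=2) raises ValueError rather than silently skipping a job.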
[<AFetchJob>] section
An individual fetch job's section contains settings that only apply to this job. The settings that are set for an individual job override the default settings (set in the [Default] section) for this job.
Note: in addition to File System Fetch configuration settings, you can also specify Import module settings in this section (or in the [Default] section). Please refer to your Import module manual for details on the Import module.
For example:
[MyFirstJob]
DirectoryPathCSVs=
DirectoryFileMatch=*.txt,*.htm*
DirectoryRecurse=on
[MySecondJob]
DirectoryPathCSVs=./data
DirectoryFileMatch=*.*
DirectoryRecurse=off
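The override behaviour can be pictured as a two-level lookup: the job section is consulted first, then [Default]. The sketch below models this with plain dictionaries. It illustrates the rule stated above and is not how File System Fetch is implemented; the Database=archive override is a hypothetical value chosen for the illustration.

```python
from collections import ChainMap

# Settings taken from the [Default] example in this guide.
default_section = {
    "PollingAction": "7",
    "Database": "database0",
}

# A job section; the Database override is hypothetical.
job_section = {
    "DirectoryPathCSVs": "./data",
    "DirectoryRecurse": "off",
    "Database": "archive",
}

# ChainMap searches the job section first and falls back to [Default].
effective = ChainMap(job_section, default_section)

print(effective["Database"])       # value from the job section
print(effective["PollingAction"])  # value inherited from [Default]
```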
ImportExtractDateFrom=8
ImportExtractDateToField=DREDATE
ImportExtractDateToFormat=EPOCHSECONDS
[Configuration]
Number=1
0=Import
[Import]
DirectoryPathCSVs=
DirectoryFileMatch=*.txt,*.htm?,*.pdf,*.doc,*.xls,*.ppt
DirectoryRecurse=on
To configure File System Fetch to import PST files:
1. Open the pstslave.cfg file in a text editor, and configure appropriate settings in the [Default] section to determine how the pstslave operates.
Note: the settings that you can configure are detailed in the File System Fetch online help (see Displaying help on configuration settings on page 21).
2. Save the changes you have made, and close the configuration file.
3. Find the [Configuration] section, increase the Number setting by 1, and list a new fetch job for PST file importing. For example:
[Configuration]
Number=3
0=MyFirstJob
1=MySecondJob
2=MyPstFileImportingJob
4. Create a new configuration file section for the new PST file importing fetch job you have listed. For example:
[MyPstFileImportingJob]
5. Add appropriate settings for this fetch job to the new section. You must at least specify the following settings:
PollingAction Enter 18 to pass the files that this fetch job aggregates to File System Fetch's pstslave for processing.
PstSlaveName Enter the name of the pstslave executable that you want to use to process Outlook items (appointments, contacts, notes, tasks, messages and attachments) contained in PST files. By default this is pstslave.
PstSlaveDirectory Enter the full path to the directory that contains the pstslave executable that you want to use to process Outlook items (appointments, contacts, notes, tasks, messages and attachments) contained in PST files. By default this is the current working directory.
In addition, you can specify the following settings, as well as any other appropriate File System Fetch settings:
PstBatchSize
PstKeepExtractedFiles
PstRootOutputDir
Please refer to the online help for details on all available parameters (see Displaying help on configuration settings on page 21).
For example:
[MyPstFileImportingJob]
PollingAction=18
DirectoryPathCSVs=C:\Autonomy\FileSystemFetch\data
DirectoryRecurse=off
DirectoryFileMatch=*.pst
http://<host>:<port>/action=Help
<host> Enter the IP address (or name) of the machine on which File System Fetch is installed.
<port> Enter the ACI port by which commands are sent to File System Fetch (this is specified by the Port setting in the File System Fetch configuration file's [Server] section).
Example:
http://12.3.4.56:4000/action=Help
This command uses port 4000 to request help on action commands from File System Fetch, which is located on a machine with the IP address 12.3.4.56.
Note: to display help on configuration settings, click on the config help link in the top right-hand corner (see Displaying help on configuration settings on page 21).
http://<host>:<port>/action=<action>&<mandatory_parameters>&<optional_parameters>
<host> Enter the IP address (or name) of the machine on which File System Fetch is installed.
<port> Enter the ACI port by which commands are sent to File System Fetch (this is set by the Port parameter in the File System Fetch configuration file's [Server] section).
<action> Enter the name of the action that you want File System Fetch to execute (for example, Import).
<mandatory_parameters> Enter the parameters that the action that you have specified requires (not all actions require parameters).
<optional_parameters> You can enter optional parameters for the action that you have specified (optional parameters are not available for all actions).
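The action command syntax above can be assembled programmatically. The sketch below builds the URL string only and does not send it. Percent-encoding the parameter values is an assumption about what the server expects, and the function name action_url is hypothetical.

```python
from typing import Mapping, Optional
from urllib.parse import quote


def action_url(host: str, port: int, action: str,
               params: Optional[Mapping[str, object]] = None) -> str:
    """Build http://<host>:<port>/action=<action>&<name>=<value>...

    Note that the syntax in this guide places action= directly after
    the slash, with no '?' separator.
    """
    url = f"http://{host}:{port}/action={quote(action)}"
    for name, value in (params or {}).items():
        # Assumption: values are percent-encoded.
        url += f"&{quote(name)}={quote(str(value), safe='')}"
    return url


# Reproduces the Help example given earlier in this guide.
print(action_url("12.3.4.56", 4000, "Help"))
```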
using services:
1. Display the Windows Services dialog.
2. Select the <File System Fetch installation name> service, and click on the Start button to start File System Fetch.
3. Click on the Close button to close the Services dialog.
services:
1. Display the Windows Services dialog.
2. Select the <File System Fetch installation name> service, and click on the Stop button to stop File System Fetch.
3. Click on the Close button to close the Services dialog.
the service port:
Send the following command to File System Fetch's service port (you need to have specified a service port in the File System Fetch configuration file):
http://<host>:<Service_Port>/action=stop
<host> The IP address (or name) of the machine on which File System Fetch is running.
<Service_Port> File System Fetch's service port (which is specified in the [Service] section of the File System Fetch configuration file).
GetConfig
The GetConfig command returns the service's configuration file settings.
http://<host>:<port>/action=GetConfig
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
GetLogStream
The GetLogStream command returns a specific log stream for the service.
http://<host>:<port>/action=GetLogStream&Name=<name>&FromDisk=<true/false>&Tail=<number>
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
<name> Enter the name of the log stream that you want to return.
<true/false> Enter true if you want the log stream to be read from disk rather than from memory. By default this is false.
<number> Enter the number of lines that you want to return from the log stream. The lines are read from the top (that is, the most recent lines are returned). Enter -1 to return all entries (this is the default).
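As a sketch, the GetLogStream request can be built with the documented defaults (FromDisk=false, Tail=-1) as function defaults. The stream name "application" and the function name get_log_stream_url are hypothetical. Use GetLogStreamNames to find the stream names that exist on your installation.

```python
from urllib.parse import quote


def get_log_stream_url(host: str, service_port: int, name: str,
                       from_disk: bool = False, tail: int = -1) -> str:
    """Build a GetLogStream URL for the service port.

    from_disk=False and tail=-1 mirror the defaults described above.
    """
    from_disk_text = "true" if from_disk else "false"
    return (
        f"http://{host}:{service_port}/action=GetLogStream"
        f"&Name={quote(name, safe='')}"
        f"&FromDisk={from_disk_text}&Tail={tail}"
    )


# Request the 50 most recent lines of a hypothetical stream.
print(get_log_stream_url("127.0.0.1", 10023, "application", tail=50))
```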
GetLogStreamNames
The GetLogStreamNames command returns the names of the log streams that have been set up for the service.
http://<host>:<port>/action=GetLogStreamNames
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
GetStatistics
The GetStatistics command returns statistics for the service.
http://<host>:<port>/action=GetStatistics
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
GetStatus
The GetStatus command returns the service's status (running or stopped).
http://<host>:<port>/action=GetStatus
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
GetStatusInfo
The GetStatusInfo command returns status information for the service (for example, the service's product name, version number and so on).
http://<host>:<port>/action=GetStatusInfo
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
MergeConfig
The MergeConfig command allows you to merge the File System Fetch configuration file with one or more configuration file sections. Alternatively, you can use it to set or delete individual configuration parameters.
Using MergeConfig to merge a configuration file with one or more configuration file sections
If the File System Fetch configuration file already contains a section that has the same name as the section with which it is going to be merged, any settings that only the new section contains are added to the existing section. If the new section contains settings that are already present in the existing section, the new section's settings overwrite the settings of the old section.
Note: This command requires a POST request method.
action=MergeConfig&Config=<configuration_file_content>
<configuration_file_content> Enter the configuration file content that you want to merge with the content of the File System Fetch configuration file. Note that you must escape the configuration file content.
Using MergeConfig to set individual configuration parameters
The MergeConfig command allows you to set one or more configuration parameters.
http://<host>:<port>/action=MergeConfig&Key<n>=<param>&Value<n>=<value>
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
<n> A unique number that identifies which <param> belongs to which <value>.
<param> The configuration file section that contains the parameter you want to set, and the parameter whose value you want to set. Note that you need to specify this using the format: <config_file_section>/<parameter_name>
<value> The value that you want to set for the corresponding <param>.
For example:
http://1.23.45.6:10000/action=MergeConfig&Key0=Server/QueueCleanSeconds&Value0=30&Key1=Default/DirectoryRecurse&Value1=true
In this example, the MergeConfig command is used to set the value of the QueueCleanSeconds parameter in the configuration file's [Server] section to 30, and to set the value of the DirectoryRecurse parameter in the configuration file's [Default] section to true.
Using MergeConfig to delete individual configuration parameters
The MergeConfig command allows you to delete one or more configuration parameters.
http://<host>:<port>/action=MergeConfig&DeleteKey<n>=<param>
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
<n> A unique number for each <param> you want to delete.
<param> The configuration file section that contains the parameter you want to delete, and the parameter you want to delete. Note that you need to specify this using the format: <config_file_section>/<parameter_name>
For example:
http://1.23.45.6:10000/action=MergeConfig&DeleteKey0=Default/StableCheckMinWaitTime&DeleteKey1=UserEmail/RunMailer
In this example, the MergeConfig command is used to delete the StableCheckMinWaitTime parameter from the configuration file's [Default] section and the RunMailer parameter from the [UserEmail] section.
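The numbered Key<n>/Value<n> and DeleteKey<n> pairs are easy to get out of step when written by hand. The sketch below generates them from Python collections. The function names are hypothetical, and setting and deleting are kept as separate requests because this guide does not say whether both may be combined in one command.

```python
from typing import Mapping, Sequence
from urllib.parse import quote


def merge_config_set_url(host: str, service_port: int,
                         settings: Mapping[str, object]) -> str:
    """Build a MergeConfig URL that sets <section>/<parameter> values."""
    if not settings:
        raise ValueError("at least one parameter is required")
    parts = [
        # quote() leaves '/' unescaped, so Section/Parameter stays readable.
        f"Key{n}={quote(param)}&Value{n}={quote(str(value), safe='')}"
        for n, (param, value) in enumerate(settings.items())
    ]
    return f"http://{host}:{service_port}/action=MergeConfig&" + "&".join(parts)


def merge_config_delete_url(host: str, service_port: int,
                            params: Sequence[str]) -> str:
    """Build a MergeConfig URL that deletes <section>/<parameter> entries."""
    if not params:
        raise ValueError("at least one parameter is required")
    parts = [f"DeleteKey{n}={quote(param)}" for n, param in enumerate(params)]
    return f"http://{host}:{service_port}/action=MergeConfig&" + "&".join(parts)


print(merge_config_set_url(
    "1.23.45.6", 10000,
    {"Server/QueueCleanSeconds": 30, "Default/DirectoryRecurse": "true"},
))
print(merge_config_delete_url("1.23.45.6", 10000, ["Default/StableCheckMinWaitTime"]))
```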
SetConfig
The SetConfig command allows you to set the File System Fetch configuration file.
Note: this command requires a POST request method.
action=SetConfig&Config=<configuration_file_content>
<configuration_file_content> Enter the configuration file content with which you want to overwrite the current content of the File System Fetch configuration file. Note that you must escape the configuration file content.
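A POST request of this shape could be prepared as follows. This sketch only constructs the request object and does not send it. It rests on three assumptions that this guide does not confirm: that "escape" means URL percent-encoding, that the request is sent to the root path of the service port, and that the body uses the action=...&Config=... form shown above. The function name set_config_request is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def set_config_request(host: str, service_port: int, config_text: str) -> Request:
    """Prepare (but do not send) a SetConfig POST request."""
    # Assumption: the configuration content is percent-encoded.
    body = "action=SetConfig&Config=" + quote(config_text, safe="")
    return Request(
        f"http://{host}:{service_port}/",  # assumption: POST to the root path
        data=body.encode("ascii"),
        method="POST",
    )


request = set_config_request("127.0.0.1", 10023, "[Server]\nPort=7000\n")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Because SetConfig overwrites the whole configuration file, it is worth retrieving the current settings with GetConfig and keeping a copy before sending such a request.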
Stop
The Stop command stops the service.
http://<host>:<port>/action=Stop
<host> The IP address (or name) of the machine that hosts the service.
<port> Enter the ServicePort that you have specified in the File System Fetch configuration file's [Service] section.
Glossary
Connector
A Connector is an Autonomy fetching solution (for example HTTPFetch, File System Fetch and so on) that allows you to retrieve information from any type of local or remote repository (for example, a database or a web site). It imports the fetched documents into IDX or XML file format and indexes them into IDOL server from where you can retrieve them (for example by sending queries to IDOL server).
Database
An Autonomy database is a data pool that is contained within IDOL server. You can retrieve information that has been indexed into IDOL server from the database, for example, through submitting a query to IDOL server.
Fetching
The process of downloading documents from the location they are stored in (for example a local folder, a website, a database, a Lotus Domino server and so on), importing them to IDX format and indexing them into IDOL server.
IDOL server
Using Autonomy Connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content, delivering a real time environment in which operations across applications and content are automated, removing all the manual processes involved in getting the right information, to the right people at the right time.
Importing
After a document has been downloaded from the location it is stored in, it is imported to an IDX file format. This process is called "importing".
Indexing
After documents have been imported to IDX file format, their content is stored in IDOL server. This process is called "indexing".
Query
You can submit a natural language query to IDOL server which analyzes the concept of the query and returns documents that are conceptually similar to the query. You can also submit Boolean, bracketed Boolean and keyword searches to IDOL server.
Index
A Action commands Help 21, 31 Syntax 32 Administration 4 [<AFetchJob>] section (configuration file) 26 Automater iii Autonomy Data flow and security 5 Infrastructure 1 B Boolean values 22 C Configuration 21 Entering Boolean values 22 Entering string values 22 Example configuration file 27 File sections 23 Modifying configuration parameter values 22 Configuration file [<AFetchJob>] section 26 [Configuration] section 25 [Default] section 24 Example 27 [License] section 23 [Server] section 24 [Service] section 24 [Configuration] section (configuration file) 25 Connector 3, 43 Controlling internal file import 9 D Database 43 [Default] section (configuration file) 24 Directory Polling 9 Directory structure UNIX 19 Windows 15 DiSH (Distributed Service Handler) 43 Displaying Help on configuration settings 21 Online help 31 Distributed systems 3 E Example configuration file 27 F Fetching 43 File Polling 9 File System Fetch Configuration 21 Directory structure 15, 19 Implementation procedure 12 Importing files 31 Installation 11, 13, 15, 17, 19 Introduction 7 Starting and stopping 33 System architecture 8 System requirements 11 G GetConfig (service port command) 36 GetLogStream (service port command) 36 GetLogStreamNames (service port command) 37 GetStatistics (service port command) 37 GetStatus (service port command) 38 GetStatusInfo (service port command) 38 H Help action 21, 31 I IDOL server 3, 43 Action command syntax 32 Data flow and security 5 Online help 31 System architecture 5 Implementation procedure 12
Index Importing 43 Import action 31 Individual files 31 Outlook items 29 PST files 29 Indexing 44 Installation 11 On UNIX 17 On Windows 13 Interfaces 3 Introduction 7 L [License] section (configuration file) 23 M MergeConfig (service port command) 39 Modifying configuration parameter values 22 O Online help 21, 31 Outlook items Importing 29 P PODS 4 pos files 9 pos.bak files 9 PST files Importing 29 Q Query 6, 44 S [Server] section (configuration file) 24 Service port commands GetConfig 36 GetLogStream 36 GetLogStreamNames 37 GetStatistics 37 GetStatus 38 GetStatusInfo 38 MergeConfig 39 SetConfig 41 Stop 41 [Service] section (configuration file) 24 SetConfig (service port command) 41 Starting and stopping File System Fetch 33 Stop (service port command) 41 String values 22 System Architecture 8 Requirements 11 T Typographical conventions iii