Beyond Search: What to Do When Your Enterprise Search System Doesn't Work
April 2, 2008
by Stephen E. Arnold
Beyond Search: Intelligenx
12. Intelligenx
www.intelligenx.com
Intelligenx is one of those companies with solid technology that is off the radar. But it
was Intelligenx's Discovery Engine that was the secret ingredient for the Carlyle Group
when it sold Dex Media to R.H. Donnelley Corporation for $9.4 billion. Search
technology from Intelligenx also substantively changed how the Office of AIDS
Research (OAR) manages and administers research grants at the U.S. National Institutes
of Health (NIH). And it was the company's Discovery Engine that helped transform the way
D&B licenses data to libraries around the country.
Iqbal Talib, his son, and a cadre of skilled engineers have built technology that lets
users search and interact with highly complex datasets. The core product
offering, Discovery Engine, is unique in that it was built from the ground up to combine
full-text search with categorization. Its display of intuitive refinements (with counts),
derived from the structure in the data, helps users find and ‘discover’ information.
Clients: Publicar, Axesa, MediaTel, OAR at NIH, TDS, iLocal, D&B, WebVisible
Company: Privately held
Contact: +1-703-793-3270
The Carlyle Group purchased Denver, CO-based Dex Media for $7.05 billion. Over the next
26 months, Dex launched a new Internet strategy that harnessed the power of
Discovery Engine. On DexOnline, users could conduct a Google-like full-text search and,
for the first time anywhere, search all the text from all of Dex Media's print
directories. Users could refine the search results to find (or discover) what they
were looking for. The site was responsive, and users took to its interactive search
functionality. During the time Carlyle owned Dex Media, usage of DexOnline
skyrocketed (a 10-fold increase in traffic), propelling Dex from Internet obscurity to the
number 1 traffic position within its 13-state region, ahead of Google Local, Yahoo Local,
and Switchboard.
Since the Dex project, Intelligenx has won a number of highly competitive contracts with
large directory publishers around the world, which use Discovery Engine to provide
interactive access to yellow page information over the Internet. Mr. Talib said:
The company's system allows you to search the content of a print yellow page ad
(including brands, locations and hours of operation, for instance), as well as the
standard name, address, and category fields. A user does not have to specify which
fields to query. Each result set is then presented in “buckets,” or collections of on-target
results, not as a list of results. You can then refine or “drill down” into these buckets to
find particular listings quickly and intuitively. The suggestion of results that may be
related to the initial query allows you to discover information that you may not have
known existed.
The Technology
Discovery Engine is proprietary technology. The approach combines full-text search
with fielded search. The result is a system that provides the benefits and
capabilities of conventional full-text search technology and the search capabilities
of relational database management systems (RDBMSs), combined with
navigation and counts. Discovery Engine helps exploit the underlying structure of the
data for refinements and many other assisted search techniques; it also resolves failed
queries.
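The following minimal sketch shows one way full-text matching, fielded (RDBMS-style) filters, refinement counts, and failed-query handling can fit together. The relaxation order is an assumption for illustration, not Intelligenx's documented behavior: drop the field filters first, then switch from AND to OR. `DOCS`, `search`, and `refinement_counts` are hypothetical names.

```python
from collections import Counter

# Hypothetical records mixing free text with structured fields.
DOCS = [
    {"id": 1, "text": "italian restaurant with outdoor patio", "city": "Denver", "price": 2},
    {"id": 2, "text": "family italian pizza restaurant", "city": "Boulder", "price": 1},
    {"id": 3, "text": "thai restaurant and bar", "city": "Denver", "price": 2},
]


def matches_text(doc, terms, mode):
    """Full-text test: all terms (mode "and") or any term (mode "or")."""
    words = set(doc["text"].split())
    found = [term in words for term in terms]
    return all(found) if mode == "and" else any(found)


def matches_fields(doc, filters):
    """Fielded test, like a SQL WHERE clause made of equality predicates."""
    return all(doc.get(key) == value for key, value in filters.items())


def search(docs, query, filters):
    """Combine full-text and fielded search; relax the query if nothing matches.

    Assumed relaxation order: original query, then without field filters,
    then OR instead of AND. Returns (hits, the attempt that produced them).
    """
    terms = query.lower().split()
    attempts = [(filters, "and"), ({}, "and"), ({}, "or")]
    for attempt_filters, mode in attempts:
        hits = [d for d in docs
                if matches_text(d, terms, mode) and matches_fields(d, attempt_filters)]
        if hits:
            return hits, {"filters": attempt_filters, "mode": mode}
    return [], None


def refinement_counts(hits, field):
    """Counts shown beside each refinement option, e.g. Denver (1)."""
    return dict(Counter(d[field] for d in hits))
```

With these sample records, "italian restaurant" restricted to `city == "Aspen"` matches nothing. The sketch therefore drops the city filter and returns both Italian restaurants, with a refinement count of one each for Denver and Boulder.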
Built on more than a decade of computer science research and development, Discovery
Engine incorporates algorithms for compressing, optimizing, and searching
processed content. The approach required a “ground up” rethinking of content
processing, according to the company. Innovations include algorithms for data
compression and storage, content processing, and distributed parallel processing. A
high-level schematic of Discovery Engine illustrates a number of these
components.
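The company does not disclose its compression algorithms. To illustrate the general technique only, the sketch below applies variable-byte (VByte) coding to the gaps between sorted document IDs. VByte is a standard textbook method for compressing inverted-index posting lists; this is not a description of Discovery Engine's internals, and the function names are hypothetical.

```python
def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers, 7 data bits per byte.

    Bytes are written most significant group first; the high bit is set
    only on the last byte of each number, marking where it ends.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("VByte encodes non-negative integers only")
        groups = [n & 0x7F]
        n >>= 7
        while n:
            groups.append(n & 0x7F)
            n >>= 7
        out.extend(reversed(groups[1:]))
        out.append(groups[0] | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        if b & 0x80:  # final byte of this number
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers


def compress_postings(doc_ids):
    """Encode a posting list as VByte-coded gaps between sorted doc IDs."""
    gaps, prev = [], 0
    for doc_id in sorted(doc_ids):
        gaps.append(doc_id - prev)
        prev = doc_id
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild doc IDs by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

Storing gaps rather than raw IDs keeps most numbers small. As a result, the posting list [3, 7, 130, 100000] fits in 6 bytes instead of the 16 bytes needed for four 32-bit integers.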
The system does not require a third-party database. A licensee can use commodity
servers to scale the system. Like Google, the Intelligenx approach allows additional
storage and servers to be added without complicated configuration and certification
processes.
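Intelligenx does not document its distribution scheme. A common way to scale search across commodity servers is to partition documents across shards, send each query to every shard, and merge the shards' partial top-k lists. The sketch below simulates that scatter-gather pattern in a single process. `Shard` and `Cluster` are hypothetical classes, and term frequency stands in for a real ranking function.

```python
import heapq
import zlib


class Shard:
    """One commodity server holding a slice of the index (simulated in memory)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text.lower().split()

    def search(self, term, k):
        """Return this shard's local top k as (score, doc_id) pairs.

        Score is plain term frequency, a stand-in for real ranking.
        """
        scored = [(words.count(term), doc_id)
                  for doc_id, words in self.docs.items() if term in words]
        return heapq.nlargest(k, scored)


class Cluster:
    def __init__(self, n_shards):
        self.shards = [Shard() for _ in range(n_shards)]

    def _shard_for(self, doc_id):
        # A stable hash sends the same document to the same shard every time.
        return self.shards[zlib.crc32(str(doc_id).encode()) % len(self.shards)]

    def add(self, doc_id, text):
        self._shard_for(doc_id).add(doc_id, text)

    def search(self, term, k=3):
        """Scatter the query to every shard, then merge the partial top-k lists."""
        partial = []
        for shard in self.shards:
            partial.extend(shard.search(term.lower(), k))
        return [doc_id for _, doc_id in heapq.nlargest(k, partial)]
```

One design caveat applies. Placing documents by hash modulo the shard count means that adding a server reassigns most documents. Systems that grow by adding servers typically use consistent hashing or a directory of shard assignments instead.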
Linguistics
The system includes support for linguistic techniques to improve query understanding.
The standard Discovery Engine linguistics toolkit includes spelling checkers, stemmers,
stop word removers, and synonym updating functions. These tools support multiple
languages, including multi-byte languages such as Japanese, Chinese, and Arabic. The
linguistics tools operate within a query transformation infrastructure that can also be
used to extend the capabilities of Discovery Engine. This infrastructure can perform
complex query transformation tasks such as parsing complex Boolean queries
(including Boolean NOTs), translating query operators from different languages,
performing category matches preferentially, and constraining or loosening a query.
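The query transformation steps listed above (stop word removal, stemming, synonym expansion, Boolean NOT) can be chained in a small pipeline. This sketch is not Intelligenx's toolkit. The stop word list, the synonym table, and the deliberately crude suffix-stripping `stem` function are hypothetical stand-ins; a production stemmer such as Porter's is far more careful.

```python
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "near"}
SYNONYMS = {"car": ["auto"], "auto": ["car"], "doctor": ["physician"]}


def stem(word):
    """Crude suffix stripping; real stemmers (e.g. Porter) are far more careful."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def transform(query):
    """Rewrite a raw query into AND-of-OR term groups plus excluded terms."""
    all_of, none_of = [], set()
    tokens = query.lower().split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "not":  # Boolean NOT applies to the next token
            if i + 1 < len(tokens):
                none_of.add(stem(tokens[i + 1]))
            i += 2
            continue
        if tok not in STOP_WORDS:  # stop word removal
            base = stem(tok)
            # Each group is satisfied by the term or any of its synonyms.
            all_of.append({base, *(stem(s) for s in SYNONYMS.get(base, []))})
        i += 1
    return {"all_of": all_of, "none_of": none_of}


def matches(text, q):
    """True if the text satisfies every group and contains no excluded term."""
    words = {stem(w) for w in text.lower().split()}
    return all(group & words for group in q["all_of"]) and not (q["none_of"] & words)
```

Take the query "repair of cars NOT dealers". The pipeline drops "of", stems "cars" to "car", expands it to accept "auto", and excludes any listing that mentions dealers. As a result, "Auto repair shop" matches, while "Car dealers with repair service" does not.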
APIs
along with data, source code and display files that can be used as a starter kit for
developing a customer-specific application.
The Index API provides all of the functions required to construct an Intelligenx index
from a copy of the customer's data feed. The Search API provides all of the functions
required to search an Intelligenx index. Particular strengths of the Search API are its
flexible, customizable ranking and sorting methods; query expansion and linguistic
modifiers; support for complex search logic and search trees; and methods for handling
failed searches. The index and search plug-ins are typically application-specific code
written to process the customer's raw data feed and to satisfy the business
requirements specified by the customer. While accessible through an API or
XML web service, Discovery Engine is also packaged with a presentation layer that
consists of visualization pages (e.g., JSP or ASP) that accept a user's query and present
the relevant results.
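The sketch below illustrates the split between a generic engine and customer-specific plug-ins. An index plug-in parses a raw feed, and a search plug-in supplies the ranking key. The `Engine` class, the pipe-delimited feed format, and the `parse_pipe_record` and `paid_first` plug-ins are hypothetical; they do not reproduce the actual Index or Search API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Engine:
    """Toy engine whose indexing and ranking come from plug-ins (hypothetical)."""
    parse_record: Callable[[str], Record]             # index plug-in
    rank_key: Callable[[Record, List[str]], tuple]    # search plug-in
    records: List[Record] = field(default_factory=list)

    def index(self, raw_feed: str) -> None:
        for line in raw_feed.strip().splitlines():
            self.records.append(self.parse_record(line))

    def search(self, query: str) -> List[str]:
        terms = query.lower().split()
        hits = [r for r in self.records
                if any(t in " ".join(r.values()).lower().split() for t in terms)]
        # Highest rank key first; Python's sort is stable, so ties keep feed order.
        hits.sort(key=lambda r: self.rank_key(r, terms), reverse=True)
        return [r["name"] for r in hits]


def parse_pipe_record(line: str) -> Record:
    """Customer-specific index plug-in for a 'name|category|tier' feed."""
    name, category, tier = line.split("|")
    return {"name": name, "category": category, "tier": tier}


def paid_first(record: Record, terms: List[str]) -> tuple:
    """Customer-specific ranking: premium listings first, then term count."""
    words = " ".join(record.values()).lower().split()
    return (record["tier"] == "premium", sum(words.count(t) for t in terms))


FEED = """
Joe's Plumbing|Plumbers|basic
Ace Plumbing and Heating|Plumbers|premium
Budget Drains|Drain Cleaning|basic
"""

engine = Engine(parse_pipe_record, paid_first)
engine.index(FEED)
```

Swapping `paid_first` for a different ranking function changes the order of results without touching the engine. That separation is the point of keeping business rules in plug-ins.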
Other APIs available include a Crawler API for crawling the web and accumulating a
web index to augment the customer's data, as well as a Reporting API for generating
statistical information about the queries processed by the Search API and a
Management API for administering a deployment.
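The Reporting API's interface is not described here. The sketch below therefore shows only the kind of statistics such a report can compute from a query log: query volume, the most frequent queries, the share of queries that returned no results, and mean latency. The log format `(query, result_count, latency_ms)` and the `query_report` function are assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical log entries: (query, result_count, latency_ms).
LOG = [
    ("pizza", 42, 80),
    ("Pizza ", 40, 95),
    ("plumber denver", 7, 110),
    ("sushi aspen", 0, 60),
    ("pizza", 42, 70),
]


def query_report(log, top_n=2):
    """Summarize a query log the way a reporting tool might."""
    if not log:
        return {"total_queries": 0}
    # Normalize case and whitespace so "Pizza " and "pizza" count together.
    entries = [(q.strip().lower(), n, ms) for q, n, ms in log]
    zero_hits = [q for q, n, _ in entries if n == 0]
    return {
        "total_queries": len(entries),
        "top_queries": Counter(q for q, _, _ in entries).most_common(top_n),
        "zero_result_rate": len(zero_hits) / len(entries),
        "zero_result_queries": sorted(set(zero_hits)),
        "mean_latency_ms": mean(ms for _, _, ms in entries),
    }


report = query_report(LOG)
```

Listing zero-result queries explicitly is useful because they are the failed searches that query transformation and relaxation are meant to rescue.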
Intelligenx Features
The system includes a number of interesting features. For example, processed content
is automatically categorized, and appropriate metadata is generated and linked to the
content. The system can process XML, structured data, or unstructured text.
More recently, Intelligenx has packaged its internal data mining tools into rich log
analysis tools for business intelligence. These add-on products, AdOptimizer and
SiteOptimizer, build on the Discovery Engine architecture to provide deep, interactive
information about usage. AdOptimizer tracks user behavior and generates real-time
reports about those actions. One application of AdOptimizer is real-time inspection
of users' interaction with suggested content. These reports can be syndicated so that
advertisers, users, or licensee staff can adjust certain system components, for
example, content boosting or advertising fees. SiteOptimizer helps identify
relationships and correlations in user behavior and shows how those relationships
can be used to drive improvements to the search application.
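AdOptimizer's internals are not public. As a hypothetical illustration of real-time tracking of suggested content, the class below keeps running impression and click counts per item and ranks items by a smoothed click-through rate. The smoothing prior of 1 click per 20 impressions is an arbitrary assumption, and `SuggestionTracker` is a made-up name. A report like this is the sort of input an operator might use when deciding which content to boost.

```python
from collections import defaultdict


class SuggestionTracker:
    """Running impression and click counts per suggested item (hypothetical)."""

    def __init__(self, prior_clicks=1, prior_impressions=20):
        # A small prior keeps brand-new items from showing 0% or 100% CTR.
        self.prior_clicks = prior_clicks
        self.prior_impressions = prior_impressions
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def record(self, item, event):
        """Update counts as each event arrives."""
        if event == "impression":
            self.impressions[item] += 1
        elif event == "click":
            self.clicks[item] += 1
        else:
            raise ValueError(f"unknown event: {event}")

    def ctr(self, item):
        """Smoothed click-through rate for one item."""
        return (self.clicks[item] + self.prior_clicks) / (
            self.impressions[item] + self.prior_impressions)

    def report(self):
        """Items that have been shown, ordered by smoothed CTR, highest first."""
        return sorted(self.impressions, key=self.ctr, reverse=True)
```
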
Another recent add-on, Content Enhancer, crawls web pages and extracts relevant,
meaningful content and entities from them in order to enhance the original
content repository.
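Content Enhancer's extraction methods are not documented. The sketch below shows one simple way to pull entities from a fetched web page using only the Python standard library. It strips the markup (ignoring script and style blocks), matches phone numbers and e-mail addresses with regular expressions, and attaches them to the original listing. The regular expressions, the North American phone format, and the `enhance` function are assumptions, and crawling itself is omitted.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


# Assumed entity patterns: North American phone numbers and e-mail addresses.
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def enhance(listing, html):
    """Return a copy of the listing enriched with entities found on its web page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    enriched = dict(listing)
    enriched["phones"] = sorted(set(PHONE.findall(text)))
    enriched["emails"] = sorted(set(EMAIL.findall(text)))
    enriched["page_text"] = text
    return enriched
```
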
©2008 Gilbane Group, Inc. 184 http://gilbane.com
Knowledgebase Support: None needed. The system “discovers” entities and categories.
Clients:
DeTelefoongids (Netherlands)
Dun & Bradstreet (USA)
iLocal (Netherlands, Belgium, Luxembourg)
Localeze (USA)
National Institutes of Health (U.S. federal government)
MediaTel (Czech Republic)
WebVisible (USA)
411.ca (Canada)
Upside
The upsides of Discovery Engine pivot on the system's ability to handle very large
volumes of content, even at extremely high loads. Beyond Search's tests revealed
response times in the 100-millisecond range for our test queries. Other upsides include:
Downside
The downside of Intelligenx is the low profile the company has adopted in its 10-year
history. Even though the firm is projected to generate $4 to $6 million in profitable
revenue in 2008, most information professionals are not aware of the company's
high-performance, feature-rich system. And because the company has captured a number
of international customers (mostly directory publishers), Discovery Engine is perceived
as only a local search technology. That's not true.
In reality, Discovery Engine can bolt onto any database or content repository, including
native XML files, and deliver blinding performance, with features equal to or better than
many of those associated with Endeca's or Fast Search & Transfer's systems. If your
applications require scalable full-text search with categorization, then you ought to
know about Intelligenx.
The system performs best when the source content is structured, for example,
content from a database or well-formed XML.
The basic system can be used in its default mode. However, tuning the system or
integrating it with third-party applications requires study of the API
documentation and may involve writing scripts.
The company offers a range of professional services, and some of the work is
performed by senior developers. If you want a large, custom project in a very
short time, you may have to wait until the firm's highly trained technical staff
becomes available.
Net-Net
The truth is that processing so much information so quickly is not easy with
conventional search technology. Using the wrong technology to deliver this sort of
functionality brings limitations, including performance and scalability problems.
Today, Intelligenx's performance over the Internet and its high-speed indexing are
closer to those delivered by Google than to those of most other Web search systems.
The software has also been battle-tested under heavy loads, where it has delivered the
goods.
The system is adept at manipulating structured data. It is even possible to use
Discovery Engine as a database engine, eliminating most of the hassles and processing
bottlenecks associated with traditional relational database architectures. Like Google,
Intelligenx technology runs on commodity-class clustered computing environments, so
scaling is easy and cost-effective.
The product is flexible enough to support custom query transformations that enhance the
user experience. It can also provide fully customized ranking, sorting, and filtering
schemes to control the relevance and ordering of search results. A full set of APIs,
interfaces, and complete documentation supports rapid application development and
easy deployment.
If you want to offer assisted navigation and keyword searching, you will
want to take a long, hard look at the Intelligenx system. With Discovery Engine as the
data management foundation, it is relatively easy to hook in specialized
visualization, statistical, and even additional content processing functionality.