Research Report

Beyond Search:
What to do when
Your Enterprise Search
System Doesn't Work

April 2, 2008

by Stephen E. Arnold
Beyond Search: Intelligenx

12. Intelligenx
www.intelligenx.com
Intelligenx is one of those companies with solid technology that remains off the radar. But it
was Intelligenx’s Discovery Engine that was the secret ingredient for the Carlyle Group
when it sold Dex Media to R.H. Donnelley Corporation for $9.4 billion. Search
technology from Intelligenx also substantively changed how the Office of AIDS
Research manages and administers research grants at the U.S. National Institutes of
Health (NIH). And it was the company’s Discovery Engine that helped to transform the
way in which D&B licenses data to libraries around the country.

Iqbal Talib, his son, and a cadre of skilled engineers have built technology that permits
users to search and interact with incredibly complex datasets. The core product
offering, Discovery Engine, is unique in that it was built from the ground up to enable
full-text search with categorizations. The display of intuitive refinements (with counts)
that are derived from the structure in the data helps users to find and ‘discover’ information.

Item Quick Facts

Product Intelligenx Discovery Engine

Price Starts at $50,000. Custom price quote required.

Key Feature Full-text search with categorizations.

Purpose Provide access to structured information, so that users can interact and discover

Clients Publicar, Axesa, MediaTel, OAR at NIH, TDS, ilocal, D&B, WebVisible

Company Privately-held

Contact +1-703-793-3270

Table 26: Quick Look at Intelligenx


Mr. Talib told Beyond Search, “Our company was one of the first to introduce a
combined full-text search coupled with navigation. What we discovered was that there
are far more effective ways to let users interact with information. We also found that we
could engineer systems to deliver unprecedented search features and functionalities at
far lower costs and without many of the challenges and bottlenecks associated with
other conventional search methods.” The son, Zubair Talib, is the CTO. He attended
MIT and, with some friends from school, developed the first algorithms that are still the
foundation of Discovery Engine.

The Carlyle Group purchased Denver, CO-based Dex Media for $7.05B. Over the next
26 months, Dex launched a new Internet strategy that harnessed the power of
Discovery Engine. On DexOnline, users could conduct a Google-like full-text search and,
for the first time anywhere, they could search all the text from all of Dex Media’s print
directories. Users could refine the search results in order to find (or discover) what they
were looking for. The site was responsive and users took to the interactive search
functionality. During the time Carlyle owned Dex Media, usage of DexOnline
skyrocketed (a 10-fold increase in traffic), propelling Dex from Internet obscurity to the
number 1 traffic position within its 13-state region, ahead of Google Local, Yahoo Local,
and Switchboard.

©2008 Gilbane Group, Inc. 180 http://gilbane.com

Since Dex, Intelligenx has won a number of highly competitive contracts with large
directory publishers around the world who use Discovery Engine to provide interactive
access to yellow page information over the Internet. Mr. Talib said,

We had success with directory publishers because our technology can
easily handle very large traffic volumes, large data sets, and complex
business logic. Directory publishers also face challenges with how to
monetize their traffic and how to scale their business models – a problem
that Discovery Engine solves quite naturally.

The company’s system allows you to search the content of a print yellow page ad
(brands, locations, and hours of operation, for instance) as well as the
standard name, address, and category fields. A user does not have to specify which
fields to query. Each result set is then presented in “buckets,” or collections of on-target
results, rather than as a flat list. You can then refine or “drill down” into these buckets to
find particular listings quickly and intuitively. Because the system also suggests results
that may be related to the initial query, you can discover information that you may not
have known existed.
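
The bucket-and-drill-down behavior can be illustrated with a minimal Python sketch. This is not Intelligenx code: the sample listings, the field names (`category`, `city`, `ad_text`), and the functions `search` and `buckets` are all hypothetical, chosen only to show how a query can run against every field at once and how refinement counts can be derived from the structure of the data.

```python
import re
from collections import Counter

# Hypothetical directory listings; the field names are assumptions for illustration.
LISTINGS = [
    {"name": "Ace Plumbing", "category": "Plumbers", "city": "Denver",
     "ad_text": "24 hour emergency service, Kohler fixtures"},
    {"name": "Mile High Plumbing", "category": "Plumbers", "city": "Boulder",
     "ad_text": "Kohler and Moen dealer, open Saturdays"},
    {"name": "Kitchen Depot", "category": "Kitchen Remodeling", "city": "Denver",
     "ad_text": "Kohler sinks and custom cabinets"},
    {"name": "Front Range Florist", "category": "Florists", "city": "Denver",
     "ad_text": "Same day delivery"},
]


def words(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, listings, refinements=None):
    """Match every query term against all fields; the user names no field.

    `refinements` maps a field to the bucket value the user drilled into.
    """
    terms = words(query)
    refinements = refinements or {}
    hits = []
    for item in listings:
        all_text = words(" ".join(item.values()))
        matches_query = terms <= all_text
        matches_refinements = all(item.get(f) == v for f, v in refinements.items())
        if matches_query and matches_refinements:
            hits.append(item)
    return hits


def buckets(hits, fields=("category", "city")):
    """Count the hits under each value of each structured field."""
    return {field: Counter(hit[field] for hit in hits) for field in fields}


hits = search("kohler", LISTINGS)
facets = buckets(hits)  # refinement counts shown next to the results
refined = search("kohler", LISTINGS, {"category": "Plumbers"})  # one drill-down step
```

Here the single term `kohler` matches text that appears only in the ad copy, and the counts in `facets` tell the user how many listings sit behind each refinement before the user clicks on one.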

Figure 42: Intelligenx Discovery Engine


The Discovery Engine includes separate APIs: one for indexing, one for acquiring content, and
one for data transformation. The system can be integrated into almost any enterprise
environment.


The Technology
Discovery Engine is proprietary technology. The approach combines full-text search
with fielded search. The result is a system that provides all the benefits and
capabilities of conventional full-text search technology and all the search capabilities
that exist in relational database management systems (RDBMSs), combined with
navigation and counts. Discovery Engine exploits the underlying structure of the
data for refinements and many other assisted search techniques; it also resolves failed
queries.

With more than a decade of computer science and development, the Discovery Engine
incorporates innovative algorithms for compressing, optimizing and searching
processed content. The approach required a “ground up” rethinking of content
processing, according to the company. Innovations include algorithms for data
compression and storage, content processing, and distributed parallel processing. A
high-level schematic of the Discovery Engine illustrates a number of incorporated
components.

The system does not require a third-party database. A licensee can use commodity
servers to scale the system. Like Google, the Intelligenx approach allows additional
storage and servers to be added without complicated configuration and certification
processes.
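
The report does not describe how Intelligenx distributes work across commodity servers. As a general illustration only, one common way for a search system to scale out is to partition the index into shards, query the shards in parallel, and merge each shard's best hits. The Python sketch below uses hypothetical in-memory shards of pre-scored hits in place of real servers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for one commodity server's
# (document id, relevance score) hits for a single query.
SHARDS = [
    [("doc-a", 0.91), ("doc-b", 0.40)],
    [("doc-c", 0.75), ("doc-d", 0.10)],
    [("doc-e", 0.88)],
]


def search_shard(shard, k):
    """Return the k best-scoring hits held by one shard."""
    return heapq.nlargest(k, shard, key=lambda hit: hit[1])


def scatter_gather(shards, k):
    """Query every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, k), shards))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged, key=lambda hit: hit[1])


top_hits = scatter_gather(SHARDS, 3)
```

Under this design, adding capacity means adding another shard to the list; the merge step does not change.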

Intelligenx’s founder told Beyond Search:

Typical implementations achieve an 80 percent reduction in hardware,
hosting and enterprise database costs. Our software simply bolts on to an
existing enterprise infrastructure, eliminating expensive integration work.
In fact, many of our customers retrofit our system into their existing data
and maintenance infrastructure.


Figure 43: Paginas Amarillas' use of the Intelligenx Interface


The Intelligenx system makes it possible to display a result set with hot links to other Web
pages and related categories. The two-panel display used in Paginas Amarillas displays
related content in the left-hand panel of the display.

Linguistics

The system includes support for linguistic techniques to improve query understanding.
The standard Discovery Engine linguistics toolkit includes spelling checkers, stemmers,
stop word removers, and synonym updating functions. These tools support multiple
languages including multi-byte languages like Japanese, Chinese and Arabic. The
linguistics tools are used within the query transformation infrastructure that can be
used to extend the capabilities of Discovery Engine. This infrastructure can also be used
to perform complex query transformation tasks such as parsing complex Boolean
queries, including Boolean NOTs, translating query operators from different languages,
performing category matches preferentially, and constraining or loosening a query.
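
A query transformation chain of the kind described above can be sketched in a few lines of Python. The stop word list, the synonym table, and the naive suffix-stripping stemmer are illustrative assumptions, not the Discovery Engine toolkit; a production system would use per-language resources and a real stemming algorithm.

```python
import re

# Illustrative resources; real deployments would load these per language.
STOP_WORDS = {"the", "a", "an", "in", "of", "for", "and"}
SYNONYMS = {"lawyer": {"attorney"}, "car": {"auto", "automobile"}}


def tokenize(query):
    """Lowercase the query and split it into word tokens."""
    return re.findall(r"\w+", query.lower())


def remove_stop_words(tokens):
    """Drop tokens that carry little meaning for retrieval."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Naive suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def expand(tokens):
    """Turn each token into the set of terms that may satisfy it."""
    return [{t} | SYNONYMS.get(t, set()) for t in tokens]


def transform(query):
    """Run the stages in order: tokenize, remove stop words, stem, expand."""
    tokens = remove_stop_words(tokenize(query))
    return expand([stem(t) for t in tokens])


groups = transform("The lawyers in Denver")
```

Each set in `groups` is an OR group and the groups are ANDed together, so this query asks for listings that mention `lawyer` or `attorney` and also mention `denver`.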

APIs

The architecture of the Discovery Engine includes a number of components. The
application programming interfaces make it possible to integrate the Intelligenx system
into other enterprise applications, Web pages, or a portal. The APIs and extensions are
fully documented. The product is typically shipped with a Software Development Kit
(SDK) that contains sample configuration files as well as the entire toolset required to
manage a real application on a real deployment. The SDK contains a sample application
along with data, source code and display files that can be used as a starter kit for
developing a customer-specific application.

The Index API provides all of the functions required to construct an Intelligenx index
from a copy of the customer's data feed. The Search API provides all of the functions
required to search an Intelligenx index. Particular strengths of the Search API are the
very flexible and customizable ranking and sorting methods, query expansion and
linguistic modifiers, inclusion of complex search logic and search trees, and failed
search handling methods. The index and search plug-ins are typically application-
specific code written to process the customer's raw data feed, as well as satisfy the
business requirements specified by the customer. While accessible through an API or
XML web service, Discovery Engine is also packaged with a presentation layer that
consists of visualization pages, e.g., JSP or ASP, to accept a user's query and present the
relevant results.
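
The report does not document how the Search API handles a failed search. One widely used technique, shown here purely as an assumption about what such handling can look like, is to loosen the query: if no document contains every term, fall back to documents that contain any term and rank them by how many terms they match. The documents and the function `search_with_fallback` are hypothetical.

```python
import re

# Hypothetical indexed documents, keyed by id.
DOCS = {
    1: "emergency plumber denver",
    2: "plumber boulder open saturdays",
    3: "florist denver same day delivery",
}


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def search_with_fallback(query, docs):
    """Try a strict all-terms match first; loosen to any-term if it fails."""
    terms = tokens(query)
    if not terms:
        return "none", []
    strict = [doc_id for doc_id, text in docs.items() if terms <= tokens(text)]
    if strict:
        return "all_terms", sorted(strict)
    # Loosened pass: score each document by the number of query terms it contains.
    scored = []
    for doc_id, text in docs.items():
        overlap = len(terms & tokens(text))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    if scored:
        return "any_term", [doc_id for _, doc_id in scored]
    return "none", []


mode, ids = search_with_fallback("florist boulder saturdays", DOCS)
```

Returning the mode alongside the ids lets the presentation layer tell the user that the original query was relaxed.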

Other APIs available include a Crawler API for crawling the web and accumulating a
web index to augment the customer's data, as well as a Reporting API for generating
statistical information about the queries processed by the Search API and a
Management API for administering a deployment.

In addition to the public APIs, Intelligenx provides a number of documented extension
sub-systems that can be used to enhance the capabilities of the basic search engine.
These extensions can be used, among other tasks, to augment the indexing process,
configure the query transformation process and control the results ranking process.
Intelligenx also provides a suite of pre-written implementations of these extensions that
suffice to satisfy the business rules of most customers. However, customer-specific
requirements can be incorporated quickly by writing fresh implementations within this
infrastructure.

Intelligenx Features
The system includes a number of interesting features. For example, content processed
is automatically categorized and appropriate metadata generated and linked to the
content. The system can process XML, structured data, or unstructured text.

More recently, Intelligenx has packaged its internal data mining tools into rich business
intelligence log analysis tools. These add-on products, AdOptimizer and
SiteOptimizer, build on the Discovery Engine architecture to provide deep, interactive
information about usage. AdOptimizer tracks user behavior and generates real-time
reports about those actions. One application of AdOptimizer is to permit real-time
inspection of users’ interaction with suggested content. These reports can be syndicated
to allow advertisers, users, or licensee staff to make adjustments to certain system
components; for example, content boosting or advertising fees. SiteOptimizer helps
determine relationships and correlations in user behavior and how those
relationships can be used to drive improvements to the search application.
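
To make the idea of search log analysis concrete, the following Python sketch computes a few of the usage measures such a tool might report. The log format, the field names, and the function `summarize` are hypothetical and do not reflect the AdOptimizer or SiteOptimizer data model.

```python
from collections import Counter

# Hypothetical query log; each entry records one search and whether the
# user clicked on a result.
LOG = [
    {"query": "plumber", "results": 12, "clicked": True},
    {"query": "Plumber", "results": 12, "clicked": False},
    {"query": "florist denver", "results": 3, "clicked": True},
    {"query": "locksmth", "results": 0, "clicked": False},
    {"query": "plumber", "results": 12, "clicked": True},
]


def normalize(query):
    """Fold case and surrounding whitespace so variants count as one query."""
    return query.strip().lower()


def summarize(log):
    """Compute simple usage statistics from a query log."""
    total = len(log)
    if total == 0:
        return {"top_queries": [], "failed_queries": [],
                "zero_result_rate": 0.0, "click_through_rate": 0.0}
    counts = Counter(normalize(e["query"]) for e in log)
    failed = {normalize(e["query"]) for e in log if e["results"] == 0}
    clicks = sum(1 for e in log if e["clicked"])
    return {
        "top_queries": counts.most_common(3),
        "failed_queries": sorted(failed),
        "zero_result_rate": len([e for e in log if e["results"] == 0]) / total,
        "click_through_rate": clicks / total,
    }


report = summarize(LOG)
```

The list of failed queries is often the most actionable output: a misspelling such as `locksmth` points to a gap that spelling correction or a synonym entry could close.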

Another recent add-on, Content Enhancer, crawls web pages and extracts relevant and
meaningful content and entities from web pages in order to enhance the original
content repository.

Feature Beyond Search Comment

Knowledgebase Support None needed. The system “discovers” entities and categories

Query Types Boolean, free text, and assisted navigation

Visualization Outputs can be displayed as tables or other representations

Entity Extraction Not applicable

Platforms Supported Linux, Windows

Export Content can be generated in XML or user-defined formats

Third-Party Support The Discovery Engine can be integrated with any third-party application

Vertical Support Publishing

Analytic Functions The system includes strong analytic support including various numeric functions. Additional mathematical processes may be integrated via the APIs

Table 27: Technical Highlights for Intelligenx


Other Intelligenx features include:

• Geospatial data support so results can be searched, mapped or manipulated by geo parameters
• Configurable categorization and relevance ranking thresholds
• Key word highlighting in results
• Near real-time index updating
• Multi-threaded architecture to take advantage of multicore processors
• Built-in content transformation tools
• Federated search capability to search across disparate repositories

The system is language-independent and provides a configurable security model based
on the operating system in use. For public access, the system supports hypertext
transport protocol (HTTP) authentication. The system has no limit on the number of
documents or the amount of content it can process and index.
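
As an illustration of the geospatial item in the feature list above, a search by geo parameters can be reduced to a distance filter over listing coordinates. The sketch below uses the standard haversine great-circle formula; the sample listings and the function `within_radius` are hypothetical, and nothing here describes how Discovery Engine implements the feature.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical listings with approximate city-center coordinates.
LISTINGS = [
    {"name": "Denver shop", "lat": 39.7392, "lon": -104.9903},
    {"name": "Boulder shop", "lat": 40.0150, "lon": -105.2705},
    {"name": "Colorado Springs shop", "lat": 38.8339, "lon": -104.8214},
]


def within_radius(listings, lat, lon, radius_km):
    """Return the names of listings inside the radius, nearest first."""
    scored = [(haversine_km(lat, lon, item["lat"], item["lon"]), item) for item in listings]
    nearby = [(dist, item) for dist, item in scored if dist <= radius_km]
    nearby.sort(key=lambda pair: pair[0])
    return [item["name"] for _, item in nearby]


nearby = within_radius(LISTINGS, 39.7392, -104.9903, 50)
```

A 50 km radius around central Denver keeps the Denver and Boulder listings and drops Colorado Springs, which lies roughly 100 km to the south.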

Discovery Engine in Action


You can explore the functionality of the Intelligenx system at Publicar’s Spanish-language
directory portal at http://www.paginasamarillas.com/. Publicar is the largest
directory publisher in South America. Traffic has almost doubled for Publicar since
it deployed Discovery Engine, and the site processes millions of queries per day with high
performance. Publicar will add extensions for wireless search and SMS that will
utilize the core search infrastructure built on Intelligenx technology.

Other Intelligenx current customers include:

• Axesa (Puerto Rico, formerly Verizon Information Systems Puerto Rico)
• Conselho Federal da Justiça (Justice Department Brazil)
• DeTelefoongids (Netherlands)
• Dun & Bradstreet (USA)
• iLocal (Netherlands, Belgium, Luxembourg)
• Localeze (USA)
• National Institutes of Health (US Federal Government)
• MediaTel (Czech Republic)
• WebVisible (USA)
• 411.ca (Canada)

Upside
The upsides of the Discovery Engine pivot on the system’s ability to handle very large
volumes of content even at extremely high loads. Beyond Search’s tests revealed
response times in the 100-millisecond range for our test queries. Other upsides include:

• Support for structured and unstructured information regardless of the source
  document’s language or the physical location of the data.
• A scalable architecture that allows licensees to expand the system’s
  infrastructure with commodity hardware. Note that Intelligenx also offers
  hosted solutions and a suite of web services for merchant-level reporting and
  search analytics.
• Excellent failed-search handling.
• A well-documented and comprehensive suite of APIs with sample code.
  Intelligenx makes integration and extension of its system less painful than some
  of the other companies profiled in this study.

Downside
The downside of Intelligenx is the low profile the company has adopted in its 10-year
history. Even though the firm is projected to generate $4 to $6 million in profitable
revenue in 2008, most information professionals are not aware of the company’s high-
performance, feature-rich system. And because the company has captured a number of
international customers (mostly directory publishers), Discovery Engine is perceived as
only a local search technology. That’s not true.

In reality, Discovery Engine can bolt on to any database or content repository, including
native XML files, and deliver blinding performance with features equal to or better than
many of those associated with Endeca’s or Fast Search & Transfer’s systems. If your
applications require scalable full-text search with categorizations, then you ought to
know about Intelligenx.

Other drawbacks include:

• The system performs best when the source content is structured; for example,
  content from a database or well-formed XML.
• The basic system can be used in its default mode. However, tuning the system or
  integrating it with third-party applications requires study of the API
  documentation and may involve writing scripts.
• The company offers a range of professional services. Some of the work is
  performed by senior developers. If you want a large, custom project in a very
  short time, you may have to wait until the firm’s highly trained technical staff
  becomes available.

Net-Net
The truth is that processing so much information so quickly is not so easy using
conventional search technology. Using the wrong technology to achieve this sort of
functionality has its limitations including challenges with performance and scalability.
Today, Intelligenx’s performance over the Internet and its high-speed indexing is closer
to that delivered by Google than most other Web search systems. The software has also
been battle tested under heavy loads where it has delivered the goods.

The system is adept in its manipulation of structured data. It is even possible to use the
Discovery Engine as a database engine, eliminating most of the hassles and processing
bottlenecks associated with traditional relational database architectures. Like Google,
Intelligenx technology works on commodity class clustered computing environments so
that scaling is easy and cost effective.

The product is flexible enough to support custom query transformations to enhance the
user experience. As well, it can provide totally customized ranking/sorting/filtering
schemes in order to control the relevance and ordering of search results. A full
set of APIs, interfaces and complete documentation enables rapid application
development and easy, rapid deployment.

If you want to make use of assisted navigation and offer key word searching, you will
want to take a long, hard look at the Intelligenx system. With Discovery Engine as the
data management foundation, it is relatively easy to hook in specialized
visualization, statistical, and even additional content processing functionality.
