
Loki+Lire: A Framework to Create Web-Based Multimedia Search Engines

Giuseppe Becchi, Marco Bertini, Lorenzo Cioni, Alberto Del Bimbo, Andrea Ferracani, Daniele Pezzatini
Università degli Studi di Firenze - MICC
Firenze, Italy
[name.surname]@unifi.it

Mathias Lux
Klagenfurt University - ITEC
Klagenfurt, Austria
mlux@itec.aau.at

ABSTRACT
In this paper we present Loki+Lire, a framework for the
creation of web-based interfaces for search, annotation and
presentation of multimedia data. The framework provides
tools to ingest, transcode, present, annotate and index different types of media such as images, videos, audio files and
textual documents. The front-end is compliant with the
latest HTML5 standards, while the back-end allows system
administrators to create processing pipelines that can be
adapted for different tasks and purposes.
The system has been developed in a modular way, aiming
at creating loosely coupled components, so that developers
can use it as a whole or select only the parts that are needed
to develop their own tools and systems.

Categories and Subject Descriptors


H.3.5 [Information Storage and Retrieval]: Online Information Services; H.4 [Information Systems Applications]: Miscellaneous

General Terms
Algorithms, Design, Experimentation

Keywords
Semantic multimedia annotation, retrieval, content-based multimedia retrieval, open source software

1. INTRODUCTION

Two important trends in multimedia production and consumption in recent years are: i) the fact that anyone, from end users to professionals, has become a creator of every type of digital data, producing media documents that may range from textual documents (e.g. presentations) to images, audio and videos; ii) the fact that these media are consumed, distributed and presented through the web. Under these circumstances it is necessary to develop systems and components that are capable of handling diverse media, accounting for their different presentation and for how users interact with them. From the point of view of content managers, it is necessary to create different processing, annotation and indexing pipelines that provide different types of services, such as keyword-based retrieval or content-based multimedia retrieval. The system presented in this paper caters for all these needs: the media presentation components, tailored for each type of media, allow users to browse, search and annotate, while the back-end components make it possible to create processing pipelines that include ingestion, transcoding and indexing.

The framework can be used for different purposes and by different users, such as:
• Researchers who need an interface to demo their own annotation systems, which can be added as processes in the back-end;
• Researchers and practitioners who need to embed just some of the Loki+Lire components in their own interfaces, e.g. to create demo or commercial web services;
• Researchers who need components to create ground truth annotations, in order to build datasets or validate experiments, e.g. using web-based crowd-sourced services such as Amazon Mechanical Turk;
• Content managers or media producers who need to archive and retrieve one or more of the different types of media handled by the framework;
• Lecturers and teachers who need an educational tool for their courses related to multimedia.

Corresponding author
Authors listed in alphabetical order

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
MM'14, November 3-7, 2014, Orlando, Florida, USA.
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.

2. THE SYSTEM

The system has been designed using a loosely coupled approach for all its components, which have been developed as plugins for well-established frameworks like jQuery or tools like Solr/Lucene. This approach lets users deploy it as a whole or select only the components that are needed, either from the front-end or the back-end. In particular, all the interface components of Loki are fully scriptable and can be easily embedded in other systems; similarly, the back-end components are web services that can be substituted or reused. The front-end components provide all the widgets required to create a user interface, while the back-end components manage users, resource metadata, media features and annotations, and media processing pipelines. A full cross-media search engine is provided as an example application (Fig. 1 shows an example of the main search interface).

Figure 1: Cross-media search interface built using the Loki+Lire framework.

2.1 The Front-End

The front-end components have been developed in full compliance with the HTML5 standard, using HTML5 and CSS3 for the presentation and JavaScript for the business logic. Two JavaScript frameworks have been used to create snappier and more interactive widgets: AngularJS and jQuery, including some extensions of the latter, such as jQuery UI. In particular, AngularJS made it possible to develop the interface components following the Model-View-Controller (MVC) design pattern. In MVC terms, the View corresponds to the HTML code, while the operations on the model, which interact with the presentation and are part of the Controller, are written in JavaScript.

The media components, developed as jQuery plugins (thus allowing their integration in other web-based systems that use jQuery), make it possible to visualize media, annotate them (this functionality can be activated depending on the access level of a user) and search within the media being shown. These widgets interact with the back-end using SOAP web services. All the components have a large number of properties that let developers customize their functionality and appearance.

2.1.1 Video

The video player component (Fig. 2) shows videos and their annotations synchronized according to their timecode, also allowing frame-accurate annotations. The plugin offers different levels of functionality and can be turned from a simple video player into a fully-fledged annotation tool by changing a single configuration parameter. Annotations can be either tags/keywords or results of speech transcriptions. When a new annotation is added, a keyframe is extracted and indexed for content-based image retrieval (CBIR). The component also provides search within the same video, showing previews of the result frames (when hovering over the retrieved results), and can be used for keyframe-based CBIR. Clicking on one of the similar frames, shown inside an expandable panel of the widget, makes the player start the video it belongs to at its timecode. Similar frames may come from the same or different videos.

Figure 2: Video component: video with overlaid annotations and frames visually similar to the one currently shown (left); search within the video player: hovering on a result shows the corresponding keyframe in the player (right).

2.1.2 Audio

The audio component (Fig. 3) is similar to the video component, and exposes the same properties to activate the functionalities required by the application in which it is embedded. The main difference is that it shows the audio waveform, which can be used as a cue when browsing and annotating the audio file. Since it is not yet possible to compute this waveform in real time using JavaScript, it is automatically computed by a back-end service during the ingestion and transcoding of the audio files.

Figure 3: Audio component: audio file with overlaid annotations (left); search within the audio player (right).

2.1.3 Image

The image component (Fig. 4) differs from the audio and video components in that annotations are not shown as overlays and there is no need to perform searches within the same media. On the other hand, it makes it possible to annotate the whole image or just a portion of it. When hovering on localized annotations, the related bounding boxes are shown.

Figure 4: Image component: image with annotations: clicking on the + sign it is possible to annotate the whole image (left); localized image annotations (right).
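
As a rough illustration of the distinction between whole-image and localized annotations described for the image component, the sketch below models both kinds and finds which localized annotations contain a pointer position, as a hover handler might. The class and function names (`BoundingBox`, `Annotation`, `annotations_at`) and the sample data are hypothetical and are not taken from the Loki source code, whose components are JavaScript jQuery plugins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in image pixel coordinates (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass(frozen=True)
class Annotation:
    """A tag on an image; box is None when the tag covers the whole image."""
    tag: str
    author: str
    box: Optional[BoundingBox] = None


def annotations_at(annotations: List[Annotation], px: int, py: int) -> List[Annotation]:
    """Return the localized annotations whose bounding box contains the point."""
    return [a for a in annotations
            if a.box is not None and a.box.contains(px, py)]


# Hypothetical sample data: one whole-image tag and two localized tags.
annotations = [
    Annotation("landscape", "alice"),
    Annotation("tower", "bob", BoundingBox(100, 50, 80, 200)),
    Annotation("tree", "alice", BoundingBox(10, 120, 60, 90)),
]
hovered = annotations_at(annotations, 120, 100)
print([a.tag for a in hovered])
```

Whole-image annotations are deliberately excluded from the hit test, since they have no bounding box to display.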

2.1.4 Document

The document component (Fig. 5) shows textual documents using SVG for each page and JPEG for the thumbnails, which are arranged as a scrollable list on one side of the widget main view and allow document navigation. Thanks to SVG it is possible to easily zoom in and out of the page, or to move it while it is zoomed. Page thumbnails are also used when showing the results of the intra-document search, which thus behaves similarly to video search. User annotations can be localized within specific portions of the page, as in the image component. Furthermore, the component supports collaboration: users can choose to visualise only their own annotations or those of other users. Each author's annotations are shown in a distinct color, together with the nickname of the user who added them.

Figure 5: Document component: document with opened and localized annotations (left); searching within the same document, with thumbnail of a result (right).

Figure 6: Cross-media search interface built using the Loki+Lire framework: starting searches; since the user is logged in to the system, they can save the best results to create their own personalized collection of media items.

2.1.5 Loki: a cross-media search engine

A full-fledged cross-media search engine (Fig. 1) is provided, so that it can be used as an application in itself or as a starting point for developers who want to start using the framework. This application uses all the components described above, together with the back-end components and functionalities described in the next section.

The system starts by letting users search with a keyword or by uploading media from their PC (Fig. 6). Users who log in to the system can create collections by selecting the search results that are most interesting for them. Results can be filtered using two facets, media type and concepts (Fig. 7), and are clustered based on their similarity to provide a more diverse and compact result set (Fig. 8). Similarity search is activated by dragging and dropping a result item in the search area; if no content-based similarity is possible (e.g. between an image and an audio file), then keyword similarity is used (e.g. using image tags and audio tags).

Figure 7: Search facets: concepts (left); media type (right).

Figure 8: Results cluster: visualization of a closed cluster of images/keyframes (left); visualization of the content of the cluster: it is possible to inspect and interact with each element of the cluster (right).

2.2 The Back-End

The main components of the back-end are: i) a relational database to manage media, metadata, semantic annotations (i.e. tags and concepts automatically detected or added by users using the front-end components), media collections and users, e.g. to let users create their own sets of media by uploading them to the system or by starring their preferred results when searching media; ii) a Solr server instance to store syntactic annotations (low-level features used for CBIR, obtained using the Lire plugin) and to index all media data and annotations to perform cross-media searches; iii) SOAP web services to perform CRUD operations on the annotations; iv) scripts to ingest and transcode media to the formats that can be handled by the HTML5 presentation components that are part of the front-end. These components have been developed in PHP and Java.
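
To make the role of the relational database more concrete, the following sketch shows one possible way to store media, annotations and per-user collections, and to "star" a search result into a collection. It is only an illustration under stated assumptions: the table and column names are hypothetical and do not reproduce the Loki schema, and SQLite is used in place of a server such as MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the actual Loki tables.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT NOT NULL UNIQUE);
CREATE TABLE media (
    id INTEGER PRIMARY KEY,
    media_type TEXT NOT NULL
        CHECK (media_type IN ('image', 'video', 'audio', 'document')),
    title TEXT NOT NULL
);
CREATE TABLE annotations (
    id INTEGER PRIMARY KEY,
    media_id INTEGER NOT NULL REFERENCES media(id),
    user_id INTEGER REFERENCES users(id),  -- NULL for automatically detected concepts
    tag TEXT NOT NULL
);
CREATE TABLE collection_items (
    user_id INTEGER NOT NULL REFERENCES users(id),
    media_id INTEGER NOT NULL REFERENCES media(id),
    PRIMARY KEY (user_id, media_id)
);
"""


def star(conn: sqlite3.Connection, user_id: int, media_id: int) -> None:
    """Add a media item to the user's collection; starring twice is harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO collection_items (user_id, media_id) VALUES (?, ?)",
        (user_id, media_id),
    )


def collection(conn: sqlite3.Connection, user_id: int) -> list:
    """Return the titles of the media items starred by the user."""
    rows = conn.execute(
        "SELECT m.title FROM collection_items c "
        "JOIN media m ON m.id = c.media_id "
        "WHERE c.user_id = ? ORDER BY m.id",
        (user_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users (id, nickname) VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO media (id, media_type, title) VALUES (?, ?, ?)",
    [(1, "image", "Duomo"), (2, "video", "Lecture 1")],
)
star(conn, 1, 2)
star(conn, 1, 2)  # second call is ignored thanks to the primary key
print(collection(conn, 1))
```

Keeping automatically detected concepts and user tags in the same table, distinguished only by a missing author, is a design choice of this sketch rather than a documented property of the system.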

2.2.1 Lire: a Solr plugin for high performance CBIR

This component is used to index images and keyframes, providing CBIR functionality to the framework. It is also used to create clusters of images and keyframes that are visually similar, thus improving the diversity of retrieved results both when searching with keywords and when searching by similarity.

While Solr is a well-known and well-performing text search engine, the Lire plugin extends it by i) piggy-backing the content-based image features onto indexed text documents, and ii) hashing the features so that the inverted index search can be used for sub-linear retrieval. Basically, the hash values are used to identify a set of n ≥ l candidate results, where l is the number of desired results. Within these n candidates, the l results are found with linear search. The most critical points are the linear search within the candidate set, for which the feature vectors have to be loaded and decoded as fast as possible, and the hashing function, which should provide hashes diverse enough to work well with the index, yet with enough engineered hash collisions to provide a good candidate set.
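
The two-stage retrieval described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Lire implementation: random-hyperplane hashing is swapped in as a stand-in for the hashing function, whose details are not given here, and a Python dictionary plays the role of the Solr inverted index. All names (`HashedIndex`, `hash_terms`, the `tables` and `bits` parameters) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def make_projections(dim: int, tables: int, bits: int, seed: int = 42):
    """Random hyperplanes: `tables` groups of `bits` planes each (stand-in hash)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
            for _ in range(tables)]


def hash_terms(vec: Sequence[float], projections) -> List[str]:
    """Turn a feature vector into one index term per hash table."""
    terms = []
    for table, planes in enumerate(projections):
        bits = "".join(
            "1" if sum(p * v for p, v in zip(plane, vec)) >= 0 else "0"
            for plane in planes
        )
        terms.append(f"{table}:{bits}")
    return terms


class HashedIndex:
    def __init__(self, dim: int, tables: int = 8, bits: int = 6):
        self.projections = make_projections(dim, tables, bits)
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # inverted index
        self.vectors: Dict[str, List[float]] = {}              # stored features

    def add(self, doc_id: str, vec: Sequence[float]) -> None:
        self.vectors[doc_id] = list(vec)
        for term in hash_terms(vec, self.projections):
            self.postings[term].add(doc_id)

    def search(self, query: Sequence[float], n: int, l: int) -> List[Tuple[str, float]]:
        if n < l:
            raise ValueError("the candidate set size n must be at least l")
        # Stage 1: inverted-index lookup, keeping the n documents that share
        # the most hash terms with the query.
        votes: Counter = Counter()
        for term in hash_terms(query, self.projections):
            for doc_id in self.postings.get(term, ()):
                votes[doc_id] += 1
        candidates = [doc_id for doc_id, _ in votes.most_common(n)]
        # Stage 2: linear search over the candidates with the exact distance.
        ranked = sorted((math.dist(query, self.vectors[d]), d) for d in candidates)
        return [(doc_id, dist) for dist, doc_id in ranked[:l]]


# Hypothetical data: 200 random 16-dimensional feature vectors.
rng = random.Random(7)
features = {f"img-{i}": [rng.gauss(0, 1) for _ in range(16)] for i in range(200)}
index = HashedIndex(dim=16)
for doc_id, vec in features.items():
    index.add(doc_id, vec)
results = index.search(features["img-17"], n=20, l=5)
print(results[0])
```

The trade-off mentioned in the text shows up in the `tables` and `bits` parameters of this sketch: more bits per table give more selective hashes and fewer collisions, while more tables increase the chance that a true neighbour lands in the candidate set.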
The main additions to Solr are the request handler for content-based visual search, a sort and re-rank function for text retrieval results based on visual similarity, and an ingest routine and data model for the visual features [?, 1]. Furthermore, a Solr entity processor takes care of automatic ingest from various data sources, including RSS and Atom feeds and databases. Currently five global image features are supported by the Lire Solr plugin: MPEG-7 ColorLayout and EdgeHistogram [?], JCD [?], PHOG [?] and a color histogram in the opponent color space. This provides a broad range of different types of features with (fuzzy) color and texture, as well as joint histograms. Other features can be added easily. SIFT feature indexing and search has already been contributed by the open source community. While the development of the Lire Solr plugin is an ongoing project, its current form is extremely stable and is, for instance, applied for trademark search by the World Intellectual Property Organization (WIPO)1.

2.2.2 Media transcoding

HTML5 has introduced new tags to handle multimedia content: <video>, <audio> and <svg>, which make it possible to embed these media without requiring plugins such as Adobe Flash. However, browsers do not all support the same set of media containers (i.e. file formats) and codecs; thus ingested videos have to be converted to MPEG-4 files with H.264 compression, OGV/Theora and Google WebM, while audio files have to be transcoded to MP3 and OGG/Vorbis formats. Images that are not already PNG or JPEG are converted to JPEG. The most complex operation is required for text documents, like PowerPoint or Word: they are converted to SVG (allowing an arbitrary level of zooming thanks to the vector nature of this format) and to JPEG (to provide thumbnails).
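
The format rules above can be summarized as a small planning function. The sketch below only builds command lines and does not run them; it assumes ffmpeg as the transcoding tool, which is an assumption of this example and is not stated to be what the Loki ingestion scripts use. The function name `transcode_commands` and the output directory layout are hypothetical, and document conversion to SVG is left out because it needs a different tool chain.

```python
from pathlib import PurePosixPath
from typing import List

# ffmpeg codec options per target container (assumption: ffmpeg is the tool).
VIDEO_TARGETS = {
    ".mp4": ["-c:v", "libx264", "-c:a", "aac"],          # MPEG-4 / H.264
    ".ogv": ["-c:v", "libtheora", "-c:a", "libvorbis"],  # OGV / Theora
    ".webm": ["-c:v", "libvpx", "-c:a", "libvorbis"],    # WebM
}
AUDIO_TARGETS = {
    ".mp3": ["-c:a", "libmp3lame"],  # MP3
    ".ogg": ["-c:a", "libvorbis"],   # OGG / Vorbis
}
WEB_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def transcode_commands(source: str, media_type: str, out_dir: str) -> List[List[str]]:
    """Return the ffmpeg command lines needed to make `source` web-ready."""
    src = PurePosixPath(source)
    out = PurePosixPath(out_dir)
    if media_type == "video":
        targets = VIDEO_TARGETS
    elif media_type == "audio":
        targets = AUDIO_TARGETS
    elif media_type == "image":
        if src.suffix.lower() in WEB_IMAGE_SUFFIXES:
            return []  # PNG and JPEG images are served as they are
        return [["ffmpeg", "-i", str(src), str(out / (src.stem + ".jpg"))]]
    else:
        raise ValueError(f"unsupported media type: {media_type}")
    return [
        ["ffmpeg", "-i", str(src), *options, str(out / (src.stem + suffix))]
        for suffix, options in targets.items()
    ]


for command in transcode_commands("uploads/talk.avi", "video", "web"):
    print(" ".join(command))
```

Producing every target format for each upload costs storage and processing time, but it lets the front-end list several sources in a <video> or <audio> tag and leave the choice to the browser.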

3. INSTALLATION AND LICENSES

The project web page is available at http://www.micc.unifi.it/vim/opensource/. Source code, precompiled libraries for a fast installation, and full installation instructions are available at https://github.com/miccunifi/Loki. The Loki license is the Apache License 2.0, which allows its use in commercial products and, in case of private use, does not require the release of the code.

Installing the Loki+Lire framework requires a web server (e.g. Apache), a SQL database server (e.g. MySQL) and an application server (e.g. Apache Tomcat) with Solr. Download and install the latest Solr under Tomcat following the official Apache instructions2. Copy the whole web/ folder into the document root of the Loki+Lire installation on the HTTP server. Create a database by restoring the SQL schema dump in install/db/. Copy all the JAR libs (among which there is the Lire Solr plugin) from install/solr/lib into the Solr installation. Check the Solr configuration files in install/solr/conf/; in particular, check that the <lib /> directives in solr-config.xml link correctly to the JAR files, and check the definition of the SQL data import handler in data-config.xml (change the DB user and password used to access the SQL server, as well as the URLs, all marked with <!-- FILL VALUE xxx -->). Copy the updated files to the Solr configuration folder. Modify the following Loki configuration files with the values of the Tomcat, Solr and database servers: i) app/js/serviceInclude.js, ii) service/config/db-default.php, iii) app/config.php, iv) app/js/config.js.

Lire as well as the Lire Solr plugin are licensed under the GNU General Public License (GPL) v2. Lire Solr is provided as is at https://bitbucket.org/dermotte/liresolr. To install only the Lire Solr plugin (without the Loki components) in a running Solr instance, use the following instructions. After downloading the source code from the repository, a single JAR should be generated using Apache Ant3 (ant task dist). Then, after adding this JAR to the classpath so that Solr can find it, the new request handler and the sort function have to be added to the solrconfig.xml file. Finally, the fields for content-based retrieval (two for each visual feature) are added to the index schema, along with the definition of the custom index field type storing the feature vectors. Detailed installation instructions are given at https://bitbucket.org/dermotte/liresolr.

Acknowledgments. Part of the research leading to these results has received funding from the EU 7th FP managed by REA (FP7/2007-2013) under grant agreement no. 262428.

4. REFERENCES

[1] Mathias Lux. LIRE: open source image retrieval in Java. In Proc. of ACM Multimedia, pages 843-846, 2013.

1 See the image filter in the WIPO Brand DB at http://www.wipo.int/branddb/
2 https://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
3 http://ant.apache.org/
