
Phoenix2

A Tool for Web-Based Annotation of Medieval Texts


COST Workshop Connecting Textual Corpora and Dictionaries

Samuel Läubli 1,2
Martin-Dietrich Glessgen 1

1 Institute of Romance Studies, University of Zurich
2 Institute of Computational Linguistics, University of Zurich

April 26, 2013

Samuel Läubli | 2/27

Contents
1. Background
   Corpus
   Digital Edition
   Tools
2. Phoenix2 in Use
   Import
   Querying
   Annotation
   External Editing
3. Hands-On Session
4. Conclusion


1. Background
Les plus anciens documents linguistiques de la France

Corpus
Les plus anciens documents linguistiques de la France (DocLing)
- Old French charters of the 13th century
- Collection founded by Jacques Monfrin (École Nationale des Chartes)
- Now pursued by Martin-Dietrich Glessgen (University of Zurich)
- Currently comprises over 2000 documents from different regions

Corpus
Row groups: 1. Published Volumes; 2. Revised Volumes; 3. New Volumes in Progress
Départements: Oise; Haute-Marne; Vosges; Aube, S.-et-M., Yonne; Meurthe-et-Moselle; Douai; Jura; Marne; Meuse; Moselle; Nièvre; Haute-Saône; Saône-et-Loire; Chancellerie royale
Editors [Adaptors]: Carolus-Barré [Tock, Grübl]; Gigot [Tock, Kihaï]; Lanher [Trotter]; Coq; Arnod, Glessgen; Mestayer, Brunner; Muller; Kihaï; Matthey; Pitz; Alletsgruber; Muller; Alletsgruber; Videsott
# Doc.: 202; 142; 285; 103; 290; 350; 105; 230; 250; 180; 30; 155; 95; 150 [+350]
Adapted from [Glessgen, 2011]

Digital Edition
- Project lead: Martin-Dietrich Glessgen
- Aimed at editing Old French charters of the 13th century
- Charters are manually transcribed into a machine-readable format
- Double encoding principle:
  a) Original (ancient) view
  b) Modern view
- Use the same data for print and online editions
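The double encoding principle can be pictured as a single transcription in which every token carries both readings, with each view derived from the same data. The sketch below is purely illustrative: the `Token` class, its field names, the `render` helper, and the sample forms are assumptions, not the project's actual encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One transcribed word carrying both readings (hypothetical model)."""
    original: str  # form as written in the charter
    modern: str    # normalized form for the modern view


def render(tokens, view):
    """Join the tokens into one line of text for the requested view."""
    if view not in ("original", "modern"):
        raise ValueError(f"unknown view: {view}")
    return " ".join(getattr(tok, view) for tok in tokens)


# Invented sample data, for illustration only.
line = [Token("ie", "je"), Token("uoil", "voil")]

print(render(line, "original"))
print(render(line, "modern"))
```

Keeping both readings on the same token is one way to let print and online editions draw on identical data.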

Digital Edition Requirements


Functional Requirements:
- Editor for assisting editors in transcribing charters
- Storage and management of transcribed charters
- Querying of transcribed charters
- Annotation
  - Text level (date, genre, regest, ...)
  - Word level (Lemma, PoS, Morphology, ...)
- Export in distinct formats for:
  - Print publication
  - Web publication
  - Research (working formats)
  - Use within other tools
- External Editing
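The two annotation levels named in the requirements suggest a simple data model: text-level fields attached to a charter, and word-level fields attached to each word. This is a minimal sketch; every class name, field name, and sample value is an assumption chosen for illustration, not Phoenix2's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WordAnnotation:
    """Word-level annotation (hypothetical field names)."""
    lemma: Optional[str] = None
    pos: Optional[str] = None
    morphology: Optional[str] = None


@dataclass
class Word:
    form: str
    annotation: WordAnnotation = field(default_factory=WordAnnotation)


@dataclass
class Charter:
    """Text-level annotation plus the transcribed words."""
    date: Optional[str] = None
    genre: Optional[str] = None
    regest: Optional[str] = None
    words: List[Word] = field(default_factory=list)

    def unannotated(self) -> List[Word]:
        """Return the words that still lack a lemma."""
        return [w for w in self.words if w.annotation.lemma is None]


# Invented sample values, for illustration only.
charter = Charter(date="1250", words=[Word("por"), Word("pour")])
charter.words[0].annotation.lemma = "por"
print([w.form for w in charter.unannotated()])
```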

Digital Edition Requirements


Functional Requirements:
[Figure: UML control flow of the working process. Labels: S-1.1, S-1.2. Programs: XML editor, tagging tool, lexicographic tool. Entities/data: charter, xml-charter, mapping entry, enhanced xml-charter.]

Digital Edition Requirements


Quality Requirements:
- Powerful yet easy to use
- Fast querying
- Easily accessible (client-server architecture)
- Use of non-commercial technology

Phoenix2: Architecture
Phoenix2 is a web-based tool for managing, querying, and annotating medieval texts.
[Figure: informal architecture diagram of the PHOENIX2 Web Interface (browser-based). Components: phoenix2-css (CSS framework; XHTML, CSS), jQuery (JavaScript framework; JavaScript), PHP, Apache web server, MySQL RDBMS.]


2. Phoenix2 in Use
Live Demonstration

Live Demonstration

Importing Texts

Machine-Readable Format: XML/XSD


Phoenix2 builds upon texts encoded in an idiosyncratic XML format. We use three schemata:
- entry: Lightweight markup aimed at facilitating the initial transcription of charters (original format). Either tokenized or untokenized.
- storage: Main format for use within Phoenix2. Thoroughly tokenized; all tokens are typed (tok/num/punct).
- edit: Similar to storage, but slightly adapted for use in external XML editors.
  - Extra attributes for word-level annotations
  - Checksums for re-import into Phoenix2 (check-in)
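To make the typed tokens and check-in checksums more concrete, the following sketch tokenizes a short string, types each token as tok, num, or punct, and stores a digest of the token sequence on the root element. Only the three type labels come from the slides. The element and attribute names (`text`, `checksum`), the tokenization rule, the choice of SHA-256, and the sample sentence are all assumptions, and detecting changes made in an external editor is just one plausible use of such a checksum.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: digit runs, letter runs, or single punctuation marks.
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]")


def token_type(token):
    """Map a token to one of the three types named on the slide."""
    if token.isdigit():
        return "num"
    if token.isalpha():
        return "tok"
    return "punct"


def to_storage_xml(text):
    """Build a <text> element whose children are typed tokens."""
    root = ET.Element("text")
    for token in TOKEN_RE.findall(text):
        child = ET.SubElement(root, token_type(token))
        child.text = token
    return root


def digest(root):
    """Compute a SHA-256 digest over the typed token sequence."""
    payload = "\n".join(f"{el.tag}:{el.text}" for el in root)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_checksum(root):
    """Store the digest as an attribute for a later check-in."""
    root.set("checksum", digest(root))
    return root


def is_unchanged(root):
    """Recompute the digest and compare it with the stored one."""
    return root.get("checksum") == digest(root)


# Invented sample sentence, for illustration only.
doc = add_checksum(to_storage_xml("En l'an 1250, por ce."))
print(ET.tostring(doc, encoding="unicode"))
print(is_unchanged(doc))
```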

Indexing Texts in a Relational Database


Why does importing texts take quite a while?
- Texts are indexed into a relational database

We use a relational MySQL database. This allows for
- Fast querying
- Linking additional entities to texts without including them in the XML
- Storing system data (user accounts, settings, ...)
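The benefit of indexing can be illustrated with a toy token table. The sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the table layout, column names, helper functions, and sample rows are assumptions, not Phoenix2's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE charter (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE token (
        id INTEGER PRIMARY KEY,
        charter_id INTEGER NOT NULL REFERENCES charter(id),
        position INTEGER NOT NULL,
        form TEXT NOT NULL,
        token_type TEXT NOT NULL CHECK (token_type IN ('tok', 'num', 'punct'))
    );
    -- An index on the word form lets lookups avoid scanning every token.
    CREATE INDEX token_form ON token(form);
    -- Additional entities can be linked to tokens without touching the XML.
    CREATE TABLE lemma_link (
        token_id INTEGER NOT NULL REFERENCES token(id),
        lemma TEXT NOT NULL
    );
    """
)


def index_text(conn, title, tokens):
    """Insert one text and its (form, type) tokens; return the text id."""
    cur = conn.execute("INSERT INTO charter (title) VALUES (?)", (title,))
    charter_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO token (charter_id, position, form, token_type) "
        "VALUES (?, ?, ?, ?)",
        [(charter_id, i, form, ttype) for i, (form, ttype) in enumerate(tokens)],
    )
    return charter_id


def find_form(conn, form):
    """Return (title, position) for every occurrence of a word form."""
    rows = conn.execute(
        "SELECT charter.title, token.position FROM token "
        "JOIN charter ON charter.id = token.charter_id "
        "WHERE token.form = ? ORDER BY charter.id, token.position",
        (form,),
    )
    return rows.fetchall()


# Invented sample rows, for illustration only.
index_text(conn, "Sample A", [("por", "tok"), ("ce", "tok"), (".", "punct")])
index_text(conn, "Sample B", [("pour", "tok"), ("ce", "tok")])
print(find_form(conn, "ce"))
```

The indexing work happens once at import time, which is consistent with imports being slow and queries being fast.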

Live Demonstration

Querying Texts

Regular Expressions
Queries in Phoenix2 can be formulated using Regular Expressions.
- abbé finds all words that contain the string abbé: abbé, abbés, ...
- ^pou?r$ finds por and pour.
- [aeiou]{3} finds words that contain three consecutive vowels
- ...
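The three patterns can be tried out on a small word list. The sketch uses Python's `re` module, whose syntax agrees with these particular patterns; the regular-expression engine Phoenix2 itself relies on is not stated on the slide, and the word list and the `find_words` helper are invented for illustration.

```python
import re

# Invented word list, for illustration only.
words = ["abbé", "abbés", "por", "pour", "pooir", "beau", "ville"]


def find_words(pattern, words):
    """Return the words in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]


print(find_words("abbé", words))        # substring match
print(find_words("^pou?r$", words))     # whole word, optional "u"
print(find_words("[aeiou]{3}", words))  # three consecutive vowels
```

Note that the character class `[aeiou]` as written does not cover accented vowels such as é, so a query for vowel sequences in medieval spellings may need a wider class.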

Live Demonstration

Annotating Words

Live Demonstration

External Editing


3. Hands-On Session
Try it Yourself


Log In
All you need is
- Any modern internet browser
- Internet connection

Log in via
- URL: tiny.uzh.ch/2A
- User: cost
- Password: action

- Enter login credentials twice
- Feel free to explore and manipulate whatever you want; it's just a copy.


4. Conclusion
Phoenix2: A Tool for Web-Based Annotation of Medieval Texts

Conclusion
Phoenix2 is an implementation based on the most recent computational and philological standards.

It is aimed at
- Transparency of all data and source code (i.e., well-documented open-source technology)
- Connectivity through well-defined interfaces
- Persistence of all data and interfaces
- Usability for both experts and novices

We pursue the stringent and uncompromising synthesis of philology, linguistics, and information technology based on a long-term, intensive cooperation between computational linguistics and special branches of academic knowledge.

Conclusion
Feel free to try it out and get in touch with us. Feedback is very welcome.


Thank You
These slides are available at www.cl.uzh.ch/people/team/laeubli.html


Bibliography
Glessgen, M.-D. (2011). Présentation générale: architecture et méthodologie du projet des Plus anciens documents linguistiques de la France, édition électronique. In Glessgen, M.-D., Kihaï, D., and Videsott, P., editors, L'élaboration philologique et linguistique des Plus anciens documents linguistiques de la France, édition électronique (Bibliothèque de l'École des Chartes 168), pages 83-94.
