[Figure 2: three precision–recall plots comparing WOEparse, WOEpos, and TextRunner.]
Figure 2: WOEpos performs better than TextRunner, especially on precision. WOEparse dramatically improves performance, especially on recall.
runs about 30X faster by only using shallow features. Since high speed can be crucial when processing Web-scale corpora, we additionally learn a CRF extractor WOEpos based on shallow features like POS tags. In both cases, however, we generate training data from Wikipedia by matching sentences with infoboxes, while TextRunner used a small set of hand-written rules to label training examples from the Penn Treebank.

We use the same matching sentence set behind DBp to generate positive examples for WOEpos. Specifically, for each matching sentence, we label the subject and the infobox attribute value as arg1 and arg2 to serve as the ends of a linear CRF chain. Tokens involved in the expandPath are labeled as rel. Negative examples are generated from random noun-phrase pairs in other sentences when their generalized-CorePaths are not in DBp.
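As a purely illustrative sketch of how one such training example might be assembled (class, method, and label names here are hypothetical, not WOE's actual code, and the sentence is invented): each token of a matching sentence receives the label ARG1, ARG2, or REL, with all remaining tokens given a default "other" label.

```java
// Illustrative sketch only: builds the label sequence for one CRF training
// example from a matching sentence. Names are hypothetical, not WOE's code.
import java.util.Arrays;
import java.util.Set;

public class CrfExampleBuilder {

    /**
     * Assigns one label per token: "ARG1" for the sentence subject span,
     * "ARG2" for the infobox attribute value span, "REL" for tokens on the
     * expandPath between them, and "O" for everything else.
     */
    public static String[] labelTokens(String[] tokens,
                                       int arg1Start, int arg1End,
                                       int arg2Start, int arg2End,
                                       Set<Integer> expandPathTokens) {
        String[] labels = new String[tokens.length];
        Arrays.fill(labels, "O");
        for (int i = arg1Start; i <= arg1End; i++) labels[i] = "ARG1";
        for (int i = arg2Start; i <= arg2End; i++) labels[i] = "ARG2";
        for (int i : expandPathTokens) {
            if (labels[i].equals("O")) labels[i] = "REL";
        }
        return labels;
    }

    public static void main(String[] args) {
        String[] tokens = {"Dan", "was", "born", "in", "Berkeley", "."};
        String[] labels = labelTokens(tokens, 0, 0, 4, 4, Set.of(1, 2, 3));
        System.out.println(Arrays.toString(labels));
        // [ARG1, REL, REL, REL, ARG2, O]
    }
}
```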
WOEpos uses the same learning algorithm and selection of features as TextRunner: a second-order CRF chain model is trained with the Mallet package (McCallum, 2002). WOEpos's features include POS tags, regular expressions (e.g., for detecting capitalization, punctuation, etc.), and conjunctions of features occurring in adjacent positions within six words to the left and to the right of the current word.
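A rough sketch of what such shallow features might look like for a single token position follows. The helper below is hypothetical; the actual feature set is built inside Mallet's pipeline, and the regex tests are simplified stand-ins.

```java
// Illustrative sketch of shallow feature generation for one token position.
// Hypothetical helper, not WOEpos's actual Mallet-based feature pipeline.
import java.util.ArrayList;
import java.util.List;

public class ShallowFeatures {

    /** Emits features for position i: its POS tag, simple regex-style surface
     *  tests, and conjunctions of its POS tag with the POS tags of neighbouring
     *  tokens up to six positions to the left and to the right. */
    public static List<String> featuresAt(String[] tokens, String[] posTags, int i) {
        List<String> feats = new ArrayList<>();
        feats.add("POS=" + posTags[i]);
        if (tokens[i].matches("[A-Z].*"))     feats.add("CAPITALIZED");
        if (tokens[i].matches("\\p{Punct}+")) feats.add("PUNCTUATION");

        int lo = Math.max(0, i - 6);
        int hi = Math.min(tokens.length - 1, i + 6);
        for (int j = lo; j <= hi; j++) {
            if (j == i) continue;
            // conjunction of this token's POS with a neighbour's POS at offset j-i
            feats.add("POS=" + posTags[i] + "&POS@" + (j - i) + "=" + posTags[j]);
        }
        return feats;
    }
}
```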
As shown in the experiments, WOEpos achieves an F-measure between 15% and 34% higher than TextRunner's on the three corpora, and this is mainly due to the increase in precision.
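For reference, the F-measure compared here is presumably the standard balanced harmonic mean of precision P and recall R:

$$ F = \frac{2\,P\,R}{P + R} $$

Under this measure, a gain that comes mostly from precision still raises F as long as recall does not fall.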
4 Experiments
We used three corpora for experiments: WSJ from the Penn Treebank, Wikipedia, and the general Web. For each dataset, we randomly selected 300 sentences. Each sentence was examined by two people to label all reasonable triples. These candidate triples are mixed with pseudo-negative ones and submitted to Amazon Mechanical Turk for verification. Each triple was examined by 5 Turkers. We mark a triple's final label as positive when more than 3 Turkers marked it as positive.

4.1 Overall Performance Analysis

In this section, we compare the overall performance of WOEparse, WOEpos and TextRunner (shared by the Turing Center at the University of Washington). In particular, we are going to answer the following questions: 1) How do these systems perform against each other? 2) How does performance vary w.r.t. sentence length? 3) How does extraction speed vary w.r.t. sentence length?

Overall Performance Comparison

The detailed P/R curves are shown in Figure 2. To take a closer look, for each corpus we randomly divided the 300 sentences into 5 groups and compared the F-measures of the three systems in Figure 3. We can see that:

• WOEpos is better than TextRunner, especially on precision. This is due to better training data from Wikipedia via self-supervision. Section 4.2 discusses this in more detail.

• WOEparse achieves the best performance, especially on recall. This is because the parser features help to handle complicated and long-distance relations in difficult sentences. In particular, WOEparse outputs 1.42 triples per sentence on average, while WOEpos outputs 1.05 and TextRunner outputs 0.75.

Note that we measure TextRunner's precision and recall differently than (Banko et al., 2007) did. Specifically, we compute precision and recall based on all extractions, while Banko et al. counted only concrete triples, where arg1 is a proper noun, arg2 is a proper noun or date, and the frequency of rel is over a threshold. Our experiments show that focusing on concrete triples generally improves precision at the expense of recall [1]. Of course, one can apply a concreteness filter to any open extractor in order to trade recall for precision.

[1] For example, consider the Wikipedia corpus. From our 300 test sentences, TextRunner extracted 257 triples (at 72.0% precision) but only extracted 16 concrete triples (with 87.5% precision).
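A minimal sketch of the concreteness filter just described is given below, assuming some predicate for proper-noun and date detection and a precomputed relation-frequency table; Triple, isProperNoun, isDate, and relFrequency are hypothetical names, not part of WOE or TextRunner, and the regex tests are crude placeholders for what would normally be an NER tagger.

```java
// Illustrative sketch of a Banko et al.-style concreteness filter that trades
// recall for precision. All names are hypothetical stand-ins.
import java.util.Map;

public class ConcretenessFilter {

    public record Triple(String arg1, String rel, String arg2) {}

    /** A triple is "concrete" if arg1 is a proper noun, arg2 is a proper noun
     *  or a date, and the relation string occurs frequently enough. */
    public static boolean isConcrete(Triple t,
                                     Map<String, Integer> relFrequency,
                                     int minRelFrequency) {
        return isProperNoun(t.arg1())
            && (isProperNoun(t.arg2()) || isDate(t.arg2()))
            && relFrequency.getOrDefault(t.rel(), 0) >= minRelFrequency;
    }

    // Crude placeholder heuristics; a real system would use an NER tagger.
    private static boolean isProperNoun(String s) { return s.matches("([A-Z][\\w.]*\\s?)+"); }
    private static boolean isDate(String s)       { return s.matches("\\d{4}(-\\d{2}-\\d{2})?"); }
}
```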
Figure 3: WOEpos achieves an F-measure which is between 15% and 34% better than TextRunner's. WOEparse achieves an improvement between 79% and 90% over TextRunner. The error bar indicates one standard deviation.

Figure 4: WOEparse's F-measure decreases more slowly with sentence length than WOEpos and TextRunner, due to its better handling of difficult sentences using parser features.
The extraction errors made by WOEparse can be categorized into four classes. We illustrate them with the WSJ corpus. In total, WOEparse got 85 wrong extractions on WSJ, and they are caused by: 1) incorrect arg1 and/or arg2 from NP-chunking (18.6%); 2) an erroneous dependency parse from the Stanford Parser (11.9%); 3) inaccurate meaning (27.1%) — for example, ⟨she, isNominatedBy, PresidentBush⟩ is wrongly extracted from the sentence "If she is nominated by President Bush ..." [2]; 4) a pattern inapplicable to the test sentence (42.4%).

[2] These kinds of errors might be excluded by monitoring whether sentences contain words such as 'if,' 'suspect,' 'doubt,' etc. We leave this as a topic for future work.

Note that WOEparse is worse than WOEpos in the low-recall region. This is mainly due to parsing errors (especially on long-distance dependencies), which mislead WOEparse into extracting false high-confidence triples. WOEpos does not suffer from such parsing errors, and therefore has better precision on high-confidence extractions.

We noticed that TextRunner has a dip point in the low-recall region. Two typical errors are responsible for this. A sample error of the first type is ⟨Sources, sold, theCompany⟩ extracted from the sentence "Sources said he sold the company", where "Sources" is wrongly treated as the subject of the object clause. A sample error of the second type is ⟨thisYear, willStarIn, theMovie⟩ extracted from the sentence "Coming up this year, Long will star in the new movie.", where "this year" is wrongly treated as part of a compound subject. Taking the WSJ corpus as an example, at the dip point with recall=0.002 and precision=0.059, these two types of errors account for 70% of all errors.

Extraction Performance vs. Sentence Length

We tested how the extractors' performance varies with sentence length; the results are shown in Figure 4. TextRunner and WOEpos perform well on short sentences, but their performance deteriorates quickly as sentences get longer. This is because long sentences tend to contain complicated and long-distance relations which are difficult for shallow features to capture. In contrast, WOEparse's performance decreases more slowly with sentence length. This is mainly because parser features are more useful for handling difficult sentences, and they help WOEparse maintain good recall with only a moderate loss of precision.

Extraction Speed vs. Sentence Length

We also tested the extraction speed of the different extractors. We used Java to implement the extractors and tested them on a Linux platform with a 2.4GHz CPU and 4GB memory. On average, it takes WOEparse 0.679 seconds to process a sentence, while TextRunner and WOEpos take only 0.022 seconds — about 30X faster. The detailed extraction speed vs. sentence length is shown in Figure 5, which indicates that TextRunner's and WOEpos's extraction time grows approximately linearly with sentence length, while WOEparse's extraction time grows
and the Stanford parse on Wikipedia is less accurate than the gold parse on WSJ.
[Figure 6: three precision–recall plots comparing CRF+w−w (= WOEpos), CRF+w−tr, CRF+w−r, CRF+tr−tr, and TextRunner.]
Figure 6: Matching sentences with Wikipedia infoboxes results in better training data than the hand-written rules used by TextRunner.

Figure 7: Filtering prepositional phrase attachments (PPa) shows a strong boost to precision, and we see a smaller boost from enforcing a lexical ordering of relation arguments (1≺2).
[Figure 8: precision–recall plot comparing WOEparse_CJ50 and WOEparse_gold.]

Figure 8: Although today's statistical parsers make errors, they have negligible effect on the accuracy of WOE compared to operation on gold-standard, human-annotated data.
on identifying general relations such as class attributes, while open IE aims to extract relation instances from given sentences. Another seed-based system, StatSnowball (Zhu et al., 2009), can perform both relation-specific and open IE by iteratively generating weighted extraction patterns. Different from WOE, StatSnowball only employs shallow features and uses L1-normalization to weight patterns. Shinyama and Sekine proposed preemptive information extraction using unrestricted relation discovery (Shinyama and Sekine, 2006).

meet the scalability requirement necessary to process the Web. Mintz et al. (Mintz et al., 2009) use Freebase to provide distant supervision for relation extraction. They applied a similar heuristic by matching Freebase tuples with unstructured sentences (Wikipedia articles in their experiments) to create features for learning relation extractors. Using Freebase to match arbitrary sentences instead of matching Wikipedia infoboxes within corresponding articles will potentially increase the number of matched sentences at a cost of accuracy. Also, their learned extractors are relation-specific. Akbik and Broß (2009) annotated 10,000 sentences parsed with LinkGrammar and selected 46 general linkpaths as patterns for relation extraction. In contrast, WOE learns 29,005 general patterns based on an automatically annotated set of 301,962 Wikipedia sentences.

The KNext system (Durme and Schubert, 2008) performs open knowledge extraction via significant heuristics. Its output is knowledge represented as logical statements instead of information represented as segmented text fragments.
Information Extraction with Wikipedia: The YAGO system (Suchanek et al., 2007) extends WordNet using facts extracted from Wikipedia categories. It only targets a limited number of predefined relations. Nakayama et al. (Nakayama and Nishio, 2008) parse selected Wikipedia sentences and perform extraction over the phrase structure trees based on several handcrafted patterns. Wu and Weld proposed the KYLIN system (Wu and Weld, 2007; Wu et al., 2008), which has the same spirit of matching Wikipedia sentences with infoboxes to learn CRF extractors. However, it only works for relations defined in Wikipedia infoboxes.

Shallow or Deep Parsing: Shallow features, like POS tags, enable fast extraction over large-scale corpora (Davidov et al., 2007; Banko et al., 2007). Deep features are derived from parse trees with the hope of training better extractors (Zhang et al., 2006; Zhao and Grishman, 2005; Bunescu and Mooney, 2005; Wang, 2008). Jiang and Zhai (Jiang and Zhai, 2007) did a systematic exploration of the feature space for relation extraction on the ACE corpus. Their results showed limited advantage of parser features over shallow features for IE. However, our results imply that abstracted dependency path features are highly informative for open IE. There might be several reasons for the different observations. First, Jiang and Zhai's results were obtained on traditional IE tasks, where local lexicalized tokens might contain sufficient information to trigger a correct classification. The situation is different when features are completely unlexicalized in open IE. Second, as they noted, many relations defined in the ACE corpus are short-range relations which are easier for shallow features to capture. In practical corpora like the general Web, many sentences contain complicated long-distance relations. As we have shown experimentally, parser features are more powerful in handling such cases.

6 Conclusion

This paper introduces WOE, a new approach to open IE that uses self-supervised learning over unlexicalized features, based on a heuristic match between Wikipedia infoboxes and corresponding text. WOE can run in two modes: a CRF extractor (WOEpos) trained with shallow features like POS tags, and a pattern classifier (WOEparse) learned from dependency path patterns. Compared with TextRunner, WOEpos runs at the same speed but achieves an F-measure which is between 15% and 34% greater on three corpora; WOEparse achieves an F-measure which is between 79% and 90% higher than that of TextRunner, but runs about 30X slower due to the time required for parsing.

Our experiments uncovered two sources of WOE's strong performance: 1) the Wikipedia heuristic is responsible for the bulk of WOE's improved accuracy, but 2) dependency-parse features are highly informative when performing unlexicalized extraction. We note that this second conclusion disagrees with the findings in (Jiang and Zhai, 2007).

In the future, we plan to run WOE over the billion-document CMU ClueWeb09 corpus to compile a giant knowledge base for distribution to the NLP community. There are several ways to further improve WOE's performance. Other data sources, such as Freebase, could be used to create an additional training dataset via self-supervision. For example, Mintz et al. consider all sentences containing both the subject and object of a Freebase record as matching sentences (Mintz et al., 2009); while they use this data to learn relation-specific extractors, one could also learn an open extractor. We are also interested in merging lexicalized and open extraction methods; the use of some domain-specific lexical features might help to improve WOE's practical performance, but the best way to do this is unclear. Finally, we wish to combine WOEparse with WOEpos (e.g., with voting) to produce a system which maximizes precision at low recall.
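One simple form such a combination could take, sketched below purely to illustrate the idea (it is not something evaluated in this paper, and Extraction and its fields are invented names), is to keep only the triples both extractors agree on and raise their confidence.

```java
// Hypothetical sketch of agreement-based voting between the two extractors;
// keeping only agreed-upon triples trades recall for precision.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AgreementVoter {

    public record Extraction(String arg1, String rel, String arg2, double confidence) {}

    /** Keeps WOEparse extractions that WOEpos also produced, raising their
     *  confidence; everything else is discarded. */
    public static List<Extraction> vote(List<Extraction> parseOut, List<Extraction> posOut) {
        Set<String> posKeys = posOut.stream()
                .map(e -> e.arg1() + "|" + e.rel() + "|" + e.arg2())
                .collect(Collectors.toSet());
        List<Extraction> agreed = new ArrayList<>();
        for (Extraction e : parseOut) {
            if (posKeys.contains(e.arg1() + "|" + e.rel() + "|" + e.arg2())) {
                // arbitrary confidence boost, purely for illustration
                agreed.add(new Extraction(e.arg1(), e.rel(), e.arg2(),
                                          Math.min(1.0, e.confidence() + 0.2)));
            }
        }
        return agreed;
    }
}
```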
Acknowledgements

We thank Oren Etzioni and Michele Banko from the Turing Center at the University of Washington for providing the code of their software and for useful discussions. We also thank Alan Ritter, Mausam, Peng Dai, Raphael Hoffmann, Xiao Ling, Stefan Schoenmackers, Andrey Kolobov and Daniel Suskin for valuable comments. This material is based upon work supported by the WRF / TJ Cable Professorship, a gift from Google, and by the Air Force Research Laboratory (AFRL) under prime contract no. FA8750-09-C-0181. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of the Air Force Research Laboratory (AFRL).
References

E. Agichtein and L. Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In ICDL.

Alan Akbik and Jürgen Broß. 2009. Wanderlust: Extracting semantic relations from natural language text using dependency grammar patterns. In WWW Workshop.

Sören Auer and Jens Lehmann. 2007. What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In ESWC.

M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. 2007. Open information extraction from the Web. In IJCAI.

Razvan C. Bunescu and Raymond J. Mooney. 2005. Subsequence kernels for relation extraction. In NIPS.

R. Bunescu and R. Mooney. 2005. A shortest path dependency kernel for relation extraction. In HLT/EMNLP.

Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and maxent discriminative reranking. In ACL.

M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. 1998. Learning to extract symbolic knowledge from the world wide web. In AAAI.

Dmitry Davidov and Ari Rappoport. 2008. Unsupervised discovery of generic relationships using pattern clusters and its evaluation by automatically generated SAT analogy questions. In ACL.

Dmitry Davidov, Ari Rappoport, and Moshe Koppel. 2007. Fully unsupervised discovery of concept-specific relationships by web mining. In ACL.

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. Stanford typed dependencies manual. http://nlp.stanford.edu/downloads/lex-parser.shtml.

Benjamin Van Durme and Lenhart K. Schubert. 2008. Open knowledge extraction using compositional language processing. In STEP.

R. Hoffmann, C. Zhang, and D. Weld. 2010. Learning 5000 relational extractors. In ACL.

Jing Jiang and ChengXiang Zhai. 2007. A systematic exploration of the feature space for relation extraction. In HLT/NAACL.

A. Gangemi and M. Ciaramita. 2005. Unsupervised learning of semantic relations between concepts of a molecular biology ontology. In IJCAI.

Andrew Kachites McCallum. 2002. Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In ACL-IJCNLP.

Kotaro Nakayama, T. Hara, and S. Nishio. 2008. Wikipedia link structure and text mining for semantic relation extraction. In CEUR Workshop.

Dat P.T. Nguyen, Yutaka Matsuo, and Mitsuru Ishizuka. 2007. Exploiting syntactic and semantic information for relation extraction from Wikipedia. In IJCAI07-TextLinkWS.

Marius Pasca. 2008. Turning web text and search queries into factual knowledge: Hierarchical class attribute extraction. In AAAI.

Fuchun Peng and Andrew McCallum. 2004. Accurate information extraction from research papers using conditional random fields. In HLT-NAACL.

Hoifung Poon and Pedro Domingos. 2008. Joint inference in information extraction. In AAAI.

Y. Shinyama and S. Sekine. 2006. Preemptive information extraction using unrestricted relation discovery. In HLT-NAACL.

Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. In NIPS.

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: A core of semantic knowledge unifying WordNet and Wikipedia. In WWW.

Mengqiu Wang. 2008. A re-examination of dependency path kernels for relation extraction. In IJCNLP.

Fei Wu and Daniel Weld. 2007. Autonomously semantifying Wikipedia. In CIKM.

Fei Wu, Raphael Hoffmann, and Daniel S. Weld. 2008. Information extraction from Wikipedia: Moving down the long tail. In KDD.

Min Zhang, Jie Zhang, Jian Su, and Guodong Zhou. 2006. A composite kernel to extract relations between entities with both flat and structured features. In ACL.

Shubin Zhao and Ralph Grishman. 2005. Extracting relations with integrated information using kernel methods. In ACL.

Jun Zhu, Zaiqing Nie, Xiaojiang Liu, Bo Zhang, and Ji-Rong Wen. 2009. StatSnowball: A statistical approach to extracting entity relationships. In WWW.