we show how effectively the human-free system can perform. Our method is built on a two-pass re-ranking process and a density-based scoring function; both ranking passes employ the density-based scoring function. Before the first-pass ranking, we adopt an efficient top-down clustering algorithm; the first-pass ranking model then retrieves several important paragraphs via the scorer. For each retrieved passage, the second-pass ranking model selects the most important sentence and adds it to the summary via the same scoring function. Finally, we re-order the retrieved sentences according to their time-stamps.

This paper is organized as follows. Section 2 gives an overview of our system, and Section 3 describes the density-based scoring function. In Section 4, we present the evaluations and experimental results. In Section 5, we draw conclusions and outline future directions.

2 System Description

The goal of multi-document text summarization is to extract or refine important sentences from different documents that belong to the same topic. As described above, this goal is quite similar to the Q/A task, so we combine several advanced techniques from the Q/A research domain, e.g., the powerful density-based passage retrieval algorithm (Tellex et al., 2003; Lee et al., 2001). Figure 1 shows the overall architecture of our system.

There are four main components in our model: sentence/passage segmentation, passage clustering, passage retrieval, and sentence retrieval. For a given question, we first segment the sentences and passages in the given document set with respect to the question. Some of the passages may describe the same topic, so we apply a clustering algorithm to group similar passages into the same cluster. The two-pass re-ranking models are then used to retrieve useful and informative passages in the first pass. For each retrieved passage, the second pass (the sentence retrieval component) selects the most important sentence in the passage and adds it to the summary. To make the summary more readable, the added sentences are re-ordered according to their time-stamps.

In the following subsections, we introduce the first component in Section 2.1; the two retrievers and the clustering method are introduced in Sections 2.2, 2.3, and 2.4.

2.1 Passage/Sentence Segmentation

In this step, the sentences are first segmented. The words are neither stemmed nor tokenized at this stage; stemming is instead performed in the clustering and ranking steps. Sentence segmentation is carried out with a tool [1] that can successfully identify sentence boundaries without tokenizing words.

The documents in the DUC-provided set had been annotated with passage boundaries, so rather than employing an additional passage segmentation tool, we directly use the provided tags to split paragraphs.

[1] http://l2r.cs.uiuc.edu/~cogcomp/tools.php
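The segmentation step can be illustrated with a minimal sketch. This is our own illustrative stand-in, not the UIUC tool cited in the footnote: it detects sentence boundaries from punctuation patterns alone, so the words themselves are never tokenized or stemmed.

```python
import re

# Illustrative stand-in for a boundary detector: split on sentence-final
# punctuation followed by whitespace and an uppercase letter, leaving the
# words untokenized. Abbreviations are shielded so "Dr. Smith" is not split.
ABBREV = re.compile(r"\b(e\.g|i\.e|Dr|Mr|Ms)\.")

def split_sentences(text):
    protected = ABBREV.sub(r"\1<DOT>", text)            # shield abbreviations
    parts = re.split(r"(?<=[.?!])\s+(?=[A-Z])", protected)
    return [p.replace("<DOT>", ".") for p in parts if p.strip()]
```

A real system would use a trained boundary model; this sketch only shows why no word-level tokenization is required before clustering.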
2.2 Passage/Sentence Clustering

In the multi-document summarization scenario, multiple passages or sentences may describe the same concept. To reduce this redundancy, a conventional clustering technique is applied to group similar passages into the same group. Different from previous studies, we use the top-down bisecting K-means algorithm (Zhao and Karypis, 2002) for clustering. Bisecting K-means is a top-down, step-by-step clustering method that incrementally performs K-means (with K = 2) to split the largest group into two sub-clusters. Zhao and Karypis showed that bisecting K-means outperforms traditional K-means on the document clustering task, and passage clustering is very similar to document clustering. We therefore select the bisecting K-means algorithm, which also avoids the risk of the random initialization in traditional K-means.

We slightly modify the bisecting K-means algorithm and set the number of clusters to 300. There are several criterion functions for evaluating the quality of a clustering result; we use an internal criterion function that measures the similarity inside each cluster. This function has been demonstrated to be very effective for evaluating the clustering result at each splitting step of bisecting K-means. The settings of the algorithm are otherwise almost the same as in the literature (Zhao and Karypis, 2002), except for the example representation: in order to capture more accurate meanings in passages, we use not only the traditional bag-of-words model (with Porter stemming) but also a bag of bigrams, since bigrams are more meaningful than unigrams.

2.3 Passage Retrieval

Both the passage retrieval and sentence retrieval components adopt the same ranking model to extract important passages and sentences. As mentioned above, our method is based on two-pass ranking. We could replace the two-pass framework with one-pass sentence ranking; however, the selected sentences would then be too similar to one another for the reader. In Section 3, we discuss the scoring function, which is devoted to retrieving important words that appear in both the passage and the given question.

For passage retrieval, we simply use the given question as the query to the density-based retrieval algorithm and retrieve the top 10 clustered passages. We then use sentence retrieval to extract sentences from these 10 passages.

2.4 Sentence Retrieval

We treat each retrieved passage as a separate concept, and our summary is mainly derived from these concepts. Therefore, for each passage we extract one important sentence to represent the concept. Unlike most approaches, which estimate the similarity between the centroid and each sentence, we again use the density-based retrieval algorithm to measure the importance of each sentence inside the cluster. At the previous stage, some large clusters may be ranked higher because they contain more question words. Although the centroid sentence shares common words or bigrams with similar sentences, it cannot be used as the summary, since it may not be sufficient to answer the given question.

Moreover, a sentence used to form the summary should be able to answer the question. Thus, we apply the density-based retrieval algorithm to rank the importance of each sentence within the same cluster. From the top 10 retrieved clusters, we derive the 10 most important sentences to form the summary. To make the summary more readable, the sentences are further re-ordered according to their time-stamps in the original documents. If the summary contains more than 250 words, we remove the final part of the summary so that its size does not exceed the limit.

3 Density-based Retrieval Algorithm

Searching for answers in a small dataset is more efficient than searching the whole corpus: finding answers in passages is much easier than searching the whole relevant document set. In this section, we introduce our retrieval model.

3.1 Passage Retrieval

The passage retriever segments each retrieved document into passages and retains the paragraphs that contain at least one of the query terms. We implemented ideas similar to the IBM (Ittycheriah et al., 2001), SiteQ (Lee et al., 2001), and ISI (Hovy et al., 2001) passage retrievers and modified the ranking functions.
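A minimal sketch of this pre-filtering step follows. It is deliberately simplified: real matching also uses the stemming and the W1-W7 weighting levels described below, whereas here a paragraph is kept whenever it shares at least one surface term with the question.

```python
# Sketch of the passage retriever's pre-filter: keep only the paragraphs
# that share at least one (lowercased, punctuation-stripped) term with the
# question. Simplified: no stemming, no W1-W7 match levels.
def filter_passages(paragraphs, question):
    def norm(s):
        return {w.strip(".,?!\"'").lower() for w in s.split()}
    qterms = norm(question)
    return [p for p in paragraphs if qterms & norm(p)]
```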
There are three common features in our algorithm:

(1) Query expansion

Query expansion is useful when a question contains very few informative terms, for example:

"What is an atom?"

The question asks for a definition of the term "atom", and there is no other content word in the question. In this case, both relevant and irrelevant passages will be retrieved, which makes it difficult for the passage retriever to rank these passages. In order to acquire more information for such a short query, we use WordNet to expand the query terms: we derive the synonyms, hyponyms, and hypernyms of all content words in the question. For the example above, without considering the sense of the query term, all hypernyms, hyponyms, and synonyms of "atom" are extracted by querying WordNet.

(2) Keyword weighting

For the second feature, terms are weighted at different levels. This technique aims to highlight important content words, such as named entity terms and content words, in the retrieved passages. In this paper, we define seven degrees for the scoring function:

W1: Named entity match (1.5)
W2: Question first noun phrase match (1.2)
W3: Question term exact match (1)
W4: Stem match (0.7)
W5: Synonym match (0.5)
W6: Hyponym match (0.4)
W7: Hypernym match (0.3)

If a term in the passage has the same named entity type as one in the question, the term is given weight W1 (= 1.5). The second type gives weight W2 (= 1.2) when a term appears in the first noun phrase of the question. The third type gives weight W3 (= 1) to a term that exactly matches any of the question words. The fourth type gives weight W4 (= 0.7) to a term that matches one of the stemmed question words. The remaining types give weights W5-W7 to a term found in the synonym, hypernym, or hyponym sets of the question words. If a term matches more than one type, we select the highest level as its weight; for example, when a term matches both W1 and W2, W1 is chosen.

(3) Density-based ranking

The density-based ranking method differs from traditional similarity scoring functions in that it focuses on the density of the matched keywords. Traditional similarity criteria, such as the Euclidean or Hamming distance, estimate the number of matches between two vectors without considering the geometric relations among the matched words. The density-based ranking method instead calculates the distance between each pair of matched keywords: the closer the matched words appear, the higher the passage is ranked. As reported by (Tellex et al., 2003), density-based ranking can enhance passage retrieval performance. In this paper, we use the density-based scoring function defined by (Lee et al., 2001):

    Score(S_i) = (k / (k - 1)) * sum_{j=1}^{k-1} (W(t_j) + W(t_{j+1})) / dist(j, j+1)^2

where W(t_j) is the weighting score (see the seven match types) of term j, k is the number of matched terms between the question and the passage, and dist(j, j+1) is the number of words between matched terms j and j+1. Figure 2 shows the overall passage retrieval algorithm.

Before starting, we remove unimportant passages (clusters) that contain no question words. In the first step, a named entity tagger identifies the proper nouns in the original question. In this paper, we employ the named entity tagger proposed by (Wu et al., 2006), which was trained on the MUC-7 training and development sets using SVM learning; its performance is 86.40 in F-measure (Wu et al., 2006). Then, we give more weight to the first noun POS (part-of-speech) tag, because the first appearing noun often carries more information. To identify the noun POS tags in the text, we also apply a POS tagger.
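The density-based score above can be sketched directly from the equation. This is an illustration, not the system's implementation: we assume matched-term positions are strictly increasing word offsets (so the gap between adjacent matches is at least 1), and the fallback for fewer than two matches is our own assumption, since a single match has no pairwise distance.

```python
# Sketch of the density-based score from the equation above.
# positions: word offsets of the matched question terms in the passage
#            (strictly increasing, an assumption of this sketch).
# weights:   their W1-W7 scores, aligned with positions.
def density_score(positions, weights):
    k = len(positions)
    if k < 2:
        return float(sum(weights))  # fallback for < 2 matches (assumption)
    total = 0.0
    for j in range(k - 1):
        dist = positions[j + 1] - positions[j]  # gap between matches j, j+1
        total += (weights[j] + weights[j + 1]) / dist ** 2
    return total * k / (k - 1)
```

As the equation intends, two matches one word apart score higher than the same matches spread farther apart.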
We adopt the Brill tagger (Brill, 1995) to recognize most noun words.

The second step extends knowledge from WordNet for the content words. The third step evaluates the weight of each term in the passage according to the seven match types. Step four calculates the density score for each passage. Here, we select the top N (N = 10) passages as the candidates.

Preprocessing: Remove passages that contain no question terms.
Step 1: Identify named entities, question first-noun-phrase terms, and content words in the given question.
Step 2: Extract the synonym, hypernym, and hyponym terms of all content words.
Step 3: Weight all terms in each passage according to the match level (W1-W7).
Step 4: Calculate the density score (see equation (3)) of each passage.
Step 5: Select the top N passages as the answer candidates.

Figure 2: Passage Retrieval Algorithm

4 Evaluation Results

DUC-2006 evaluated summaries in several ways: human evaluation with the pyramid score, responsiveness to the topic, linguistic quality, and automatic ROUGE evaluations. The overall results are shown in Table 1.

Table 1: Overall scores of our system in DUC-2006

System           Ling Quality  Responsiveness  Responsiveness  ROUGE-2  ROUGE-SU4  BE-Score  Pyramid
                 Mean          (Content)       (Overall)       Score    Score                Score
Our Method       3.3           2.4             1.9             0.06     0.11       0.0       0.1374
System (AVG)     3.35          2.56            2.19            0.07     0.13       0.04      0.12
Human (AVG)      4.84          4.75            4.74            0.11     0.17       0.07      0.00
System-baseline  3.38          2.54            2.19            0.07     0.13       0.04      0.12

4.1 Responsiveness

This evaluation gives each automatic summary a responsiveness score between one (lowest) and five (highest). Responsiveness measures how well a summary satisfies the information need expressed in the topic statement. Our summarization system achieved 2.4 (rank 24 out of 34) on the content responsiveness score; the overall responsiveness score was 1.9 (rank 31 out of 34). The low scores are not surprising, since we do not pre-define any templates or rules that pre-assume the summaries: all of the sentences are scored entirely from the set of documents.

4.2 Linguistic Quality

This measurement estimates the linguistic quality of the automatically generated summaries. NIST employed several human experts, who developed the given topics and judged the summaries on the following criteria:

♦ Grammaticality
♦ Non-redundancy
♦ Referential clarity
♦ Focus
♦ Structure and coherence

Each summary is judged on each of the above factors and given a score from one (lowest) to five (highest). As shown in Table 1, our auto-summarizer attained a quite satisfactory score (at the middle rank).

4.3 ROUGE

The recall-oriented understudy for gisting evaluation (ROUGE) (Lin, 2004) is a statistical summarization measurement. ROUGE computes recall-based metrics using N-gram matching between the candidate summary and a set of reference summaries: the longer the N-gram matches, the higher the score the summary achieves. Table 2 shows the evaluation results of our method under the ROUGE measurement.

Table 2: ROUGE Evaluation

System      ROUGE-2  ROUGE-SU4
Our method  0.065    0.115
Human AVG   0.11     0.17
System AVG  0.07     0.13

retrieve important concepts in documents. We start to address the issues of combining question-answering models for text summarization.
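To make the ROUGE computation of Section 4.3 concrete, here is a simplified sketch of ROUGE-N recall: the clipped fraction of reference n-grams that also appear in the candidate summary. The official ROUGE toolkit additionally handles multiple references, stemming, and the skip-bigram SU4 variant, none of which are modeled here.

```python
from collections import Counter

# Simplified ROUGE-N recall sketch: clipped reference n-gram overlap.
def rouge_n(candidate, reference, n=2):
    def grams(text):
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    overlap = sum(min(cnt, cand[g]) for g, cnt in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```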