A Reduction Model For Natural Language Text Retrieval

JOURNAL OF COMPUTER SCIENCE AND ENGINEERING, VOLUME 17, ISSUE 1, JANUARY 2013 23

A.O. Enikuomehin, M.O. Rahman and J.S. Sadiku
Abstract: Many modeling methodologies have been introduced to Information Retrieval processes over the years, all aimed at achieving a suitably high-performance retrieval system. In this paper, we present a new paradigm for a human-usable, knowledge-based information retrieval system that performs its functions in a way close to how a human would behave in the same situation. The artificial information retrieval problem is formulated as a physical problem, with representations made in terms of general algebraic processes. These are used to generate the usefulness of each text term in relation to its position. A textual measure within documents is incorporated into a Natural Language Interface and proposed as a way of representing documents, similar to physical measurement on a quantum state. This concept is then implemented within the framework of a well-defined Local Appropriator (LA). The LA is used because some terms, considered not relevant, will be unwrapped during the extraction of information from the document space. Mathematical formulations are used to generate the needed expressiveness. The models are implemented by providing an executable algorithm for the retrieval process. Results show that the retrieved documents have a high degree of relevance, and experiments confirm that the new model outperforms other existing models for human-usable Information Retrieval interfaces.
Index Terms: Quantum Logic; Textual Wrapper; Local Appropriator; Natural Language Interface; Information Retrieval
A.O. Enikuomehin is with the Computer Department, Lagos State University, Nigeria. M.O. Rahman is with the Department of Computer Science, Lagos State University, Nigeria. J.S. Sadiku is with the Department of Computer Science, University of Ilorin, Nigeria.

1 INTRODUCTION
The information age has brought complex challenges related to the management of information and its sources. Information is useless unless it can be found and used. Searching for information is an event that takes place almost every other minute by every other person; this act, which can be defined both formally and informally, has made searching a human act. The process is characterized by the fact that the user of the information requests documents via a query to satisfy an information need (IN). The problem can therefore be broadly classified into two parts: (1) the appropriate presentation of the user's information need and (2) the relevance of the retrieved documents. The first problem is largely due to the user's knowledge of the domain of discussion. This context forms the basis of the Information Retrieval (IR) problem. It is the problem of how we distinguish what we want from the sea of what we do not want, especially the bad, some of which could be irrelevant, unreliable, inaccurate, outdated or misleading. The "IR problem" is especially problematic as knowledge increases, as the number of media and platforms increases, as the integration of media grows, as the interoperability of platforms increases, and as we face information overload.

Representation of text documents has so far been based on the occurrence of terms in documents. This means that almost all the well-established models are benchmarked on term occurrence. These models include the Boolean Model (BM), the Binary Independence Model (BIM), the Vector Space Model (VSM), the Language Model (LM) and, lately, the Fuzzy Retrieval Model (FRM). They are extensively discussed in [1][2][3]. Statistical analysis, which is based on the occurrence of terms in documents, has been shown to be the most successful recent method of text indexing [4]. The approach in this work is a sharp departure from the approaches of earlier work. Here, documents are considered useful or relevant based on the text they contain. These texts are human-usable texts, which are the basis of language presentation, and it is therefore natural for us to treat them as natural language text. Objects, whether textual or multimedia, will be considered as states of
physical systems, and their features (such as terms) can be viewed as physical observables to be measured in such systems. The relation of Quantum Mechanics to Information Retrieval in this direction was proposed earlier by Van Rijsbergen [5]; since a holistic implementation was not achieved, that proposal did not go further into implementation in natural language systems. We can employ the techniques of Quantum Mechanics as prescribed in [6] to represent information and documents as measures of observables in a physical system. Quantum theory's foundations currently rest on abstract mathematical formulations known as Hilbert spaces and C* algebras. These abstractions work well for calculating the probability of a particular outcome in an experiment, but they lack the intuitive physical meaning that physicists crave: the elegance of Einstein's theory of special relativity, for instance, which says that the speed of light is constant and that the laws of physics do not change from one reference frame to the next [7]. The principle of purification was then developed: a system with uncertain properties (a mixed state) is always part of a larger pure state that can, in principle, be completely known. Consider the pion: this particle, which has a spin of zero, can decay into two spinning photons. Each single photon is in a mixed state, with an equal chance of spinning up or down. The pair of photons together, though, comprises a pure state in which they must always spin in opposite directions. The authors of [8] conclude that there is only one way in which a theory can satisfy the purification postulate: it must contain entangled states. The other option, that the theory contains no mixed states (that is, that the probabilities of outcomes in any measurement are either 0 or 1, as in classical deterministic theory), cannot hold, as one can always prepare mixed states by mixing deterministic ones.
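As a worked illustration of purification, the standard textbook singlet state of two two-level systems can be written out. This is a sketch under assumed notation: the labels A and B, the arrows for the two spin directions and the symbol for the reduced state are not taken from the works cited here.

```latex
% Pure, entangled state of the pair (assumed labels A and B):
|\Psi\rangle_{AB} = \frac{1}{\sqrt{2}}
  \left( |{\uparrow}\rangle_A |{\downarrow}\rangle_B
       - |{\downarrow}\rangle_A |{\uparrow}\rangle_B \right)

% State of one member alone, obtained by tracing out the other:
\rho_A = \mathrm{Tr}_B \, |\Psi\rangle\langle\Psi|
       = \tfrac{1}{2}\, |{\uparrow}\rangle\langle{\uparrow}|
       + \tfrac{1}{2}\, |{\downarrow}\rangle\langle{\downarrow}|
```

The pair is in a pure state with perfectly opposite outcomes, while each member taken alone is an equal mixture of the two directions, which is the sense in which a mixed state is part of a larger pure state.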
The purification postulate alone allows some of the key features of quantum information processing to be derived, such as the no-cloning theorem or teleportation [9]. These presentations were collectively analyzed, and researchers presented additional axioms to support their application in information retrieval. This is not complex, since in document retrieval the essence of reference is determined by terms; this is closely analogous to the process of extracting features of a physical system. The extraction is centrally controlled by a form of object relation. This relation will be referred to as the textual relation, to signify that it is performed on the text terms of retrieved documents. The textual relation is related to the term frequency of documents, which again makes it natural to represent it as a textual relation. The main idea in this work is to present a formulation for representing documents such that the resulting retrieval process can be implemented on a natural language interface. Such a development will assist in the presentation of an interface that can perform information retrieval on natural language text.

Quantum mechanics is an important part of physics, and an underlying logic has been designed in theory. This refers to the widely used quantum theory [10]. Quantum theory offers impressive advantages over classical theory in the estimation of physical parameters, and this has been shown in several works supported by core physics principles [11], [12], [13], [14], [15]. The prototypical example is the estimation of an unknown phase shift, where the variance vanishes as $N^{-2}$ with the number N of accesses to the phase-shifting process, whereas classical statistics over independent copies would give the scaling $N^{-1}$. The user with an information need has been shown to be in an anomalous state of knowledge [16], in which the state of the user changes as more information is obtained. This is closely related to physical phenomena such as wave function collapse: a phenomenon in which a wave function, initially in a superposition of several different possible eigenstates, appears to reduce to a single one of those states after interaction with an observer, that is, the reduction of the physical possibilities to a single possibility as seen by an observer. The state is described as an element of a projective Hilbert space, expressed in Dirac (bra-ket) notation as a vector:
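$$|\psi\rangle = \sum_i c_i \, |\phi_i\rangle \qquad (1)$$

The kets $|\phi_1\rangle, |\phi_2\rangle, |\phi_3\rangle, \ldots$ represent the alternatives (the other available quantum states), which form an orthonormal basis satisfying

$$\langle \phi_i | \phi_j \rangle = \delta_{ij} \qquad (2)$$

This representation is convenient for defining the IR problem in a mathematical framework.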
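The expansion in (1) and the orthonormality condition in (2) can be made concrete with a small numerical sketch. The function names below are hypothetical, and the state is assumed to be given as a list of coefficients over an orthonormal basis; this illustrates the standard Born-rule reading of the coefficients and is not part of the model proposed here.

```python
import math


def normalize(coeffs):
    """Rescale the coefficients c_i so that the squared magnitudes sum to 1."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in coeffs))
    if norm == 0:
        raise ValueError("the zero vector is not a valid state")
    return [c / norm for c in coeffs]


def outcome_probabilities(coeffs):
    """Probability |c_i|^2 of observing each alternative of an orthonormal basis."""
    return [abs(c) ** 2 for c in normalize(coeffs)]


def collapse(coeffs, index):
    """State after the alternative `index` has been observed: a single basis vector."""
    if outcome_probabilities(coeffs)[index] == 0:
        raise ValueError("this alternative has zero probability")
    return [1.0 if i == index else 0.0 for i in range(len(coeffs))]


# An unnormalized superposition of three alternatives (assumed example values).
state = [1, 1j, 1]
probs = outcome_probabilities(state)
print(probs)               # three equal probabilities that sum to 1
print(collapse(state, 1))  # only the observed alternative remains
```

In the retrieval analogy, the alternatives would correspond to the possible readings of a document or user state before an observation (for example, a query) selects one of them.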
2 TEXTUAL RELATION OF DOCUMENTS

Probability has been used in detail to give a clear representation of the state of physical systems. This is inherent in the existing IR models other than the VSM. It defines the probabilities of the possible outcomes of such systems. In these, the state of the physical system can have some of its measurement outcomes determined, but not all of them. This is supported by the Heisenberg uncertainty principle. (In quantum mechanics, the uncertainty principle is any of a variety of mathematical inequalities asserting a fundamental limit to the precision with which certain pairs of physical properties of a particle, such as position x and momentum p, can be simultaneously known. The more precisely the position of a particle is determined, the less precisely its momentum can be known, and vice versa.) One of the two properties can be determined with certainty, while the state of the other becomes uncertain. When the first outcome is determined, the success of the others, based on the textual relation, depends on the pattern in which the measurement is performed, as in the measurement of physical systems. It follows that in document retrieval, when a probability is assigned to a particular term, the probability of relevance of the other terms within the document remains uncertain. This leads to a further definition within the framework of these concepts. The purpose of measurement is to provide information about a quantity of interest. No measurement is exact. When a quantity is measured, the outcome depends on the measuring system, the measurement procedure, the skill of the operator, the environment and other effects. The uncertainty has a probabilistic basis and reflects incomplete knowledge of the quantity. All measurements are subject to uncertainty, and a measured value is only complete if it is accompanied by a statement of the associated uncertainty. Measures can be said to be compatible or incompatible.
The compatible subsets are those whose textual relation outcomes can be predetermined, while the incompatible subsets are those whose outcomes cannot. Obviously, less information will be available for the incompatible textual terms. The information helps to determine the maximum term occurrence for such measures. This is important in formulating the definition of our textual relation for documents so that it matches the concept of the physical system used as an analogy. An important concept in the study of IR processes is the degree of relevance of the results of each model. In this work, a matching algorithm is also developed as a methodology for establishing the relevance level of components obtained from documents with respect to the content of the query.
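For reference, the inequalities behind the uncertainty principle and the notion of compatibility can be written in their standard textbook forms. The symbols below (standard deviations, the commutator, and generic observables A and B) are assumed notation for this sketch and are not defined in this paper.

```latex
% Position-momentum uncertainty relation:
\sigma_x \, \sigma_p \;\ge\; \frac{\hbar}{2}

% General (Robertson) form for two observables A and B:
\sigma_A \, \sigma_B \;\ge\; \frac{1}{2} \left| \langle [A, B] \rangle \right|,
\qquad [A, B] = AB - BA

% Compatible observables commute and can be determined together:
[A, B] = 0
```

Under this reading, compatible textual measures play the role of commuting observables, whose outcomes can be fixed together, while incompatible ones correspond to a non-zero commutator.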
3 APPLICATION OF LOCAL APPROPRIATOR

The concept of the Local Appropriator (LA) is fundamental here and has been discussed at length in [17]. The LA is an implementation engine that offers complete flexibility in implementing the information retrieval model. It is based on transformations that can be applied to natural language text. These transformations use the Textual Wrapper (TW) within the Local Appropriator, denoted TW(t, n), where t is the central term and n is the number of terms on either side. The appropriator in the LA ensures that relevant terms are appropriated (accounted for) and irrelevant terms are immediately discarded to reduce the search space. The appropriated text terms will be referred to as textually wrapped terms. Thus, the total number of terms is given in tokens as

$$X = 2w + 1 \ \text{tokens} \qquad (3)$$

where w is the number of wrapped items on either side of the central term and X is the total number of items in the search space. A token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offsets of the term in the text of the field, and a type string. A token can optionally have metadata. A representation for TW(t, n) within an LA can thus be defined as a transformation of the document D to D1 with some appropriated tokens, such that

$$TW(t, n)\, D = D_1 \qquad (4)$$

This formulation can be presented in our quantum analogy. It also allows operations to be defined on the wrappers and establishes the capabilities that allow them to be likened to a proper implementation of a quantum-like IR system. The second part of the appropriator can also be sub-categorized. This research proposes the use of the complete LA system, but further emphasis is on the IR-usable NLI model which can be incorporated into the system.

The quantum consideration begins with a review of the work of [18], which established that for any quantum measurement to be usable, it must obey the following three properties: (i) idempotency, (ii) ordered structure and (iii) a possibility of non-commutativity. In the LA, each of these properties is well formulated and can be discussed as follows.

(i) Idempotency: multiple applications yield the same result as a single application. That is, applying our text wrapping operation to a given query any number of times gives the same result as applying it once. This can be explained with an example. Let D be a document. D will either exist as $D\, r\, Q$ or $D\, \neg r\, W$, where $r$ and $\neg r$ are read as "is relevant" and "is not relevant" respectively. Then, let D be a natural language text given as D = "for us or not for us, very simple". Applying TW(is, 2) to D appropriates it to D1 = "us, very simple". If TW(is, 2) is then performed again, no further reduction takes place, because all the terms are already contained within the defined space.

(ii) Ordered structure is formulated from the type of relation the appropriators possess. Given a natural language text as above, the same result is achieved on application. In the following algorithm, the probability of the token is not considered, as the first consideration is the determination of relevance by appropriately expressing the terms that were not reduced in the document. A model for this formulation is presented. Meanwhile, it must be noted that the process of generating relevance includes a methodology for relaxing the user's query. An appropriator index does the relaxation. For a universe of documents, this can be applied as a transformation of the proposal of Roelleke [19]:

$$LA(d_j)(t_i) = \frac{f_{i,j}}{\max_k f_{k,j}} \cdot \frac{idf(t_i)}{\max_k idf(t_k)}$$

where $idf(t_i)$ stands for the inverse document frequency of term $t_i$, given as

$$idf(t_i) = \log\left(\frac{N}{n_i}\right)$$

Since we consider any form of repository, we extend this to any type of database, where N represents the total number of documents and $n_i$ is the total number of documents containing the index term $t_i$. Then

$$LA_i(d_j)(t_i) = \left( A_i(t_i)(d_j) \right)^2 \qquad (5)$$

and, using the weak conjunction principle of the Łukasiewicz implicator [20], we have

$$sim(d_j, q) = \min\left(1,\ 1 - LA(d_j)(t_i) + LA(q)(t_i)\right) \qquad (6)$$

The above ensures that a single value is used in representing and evaluating the similarity between document d and query q. It means that a relation exists between the appropriator indices. Such a relation can be generalized as $LA_i < LA_n$, where 1, 2, ..., n represent the positions of the occurrence. In the document D example above, if LA(is, 2) and LA(is, 3) are considered, then on applying them to D, both will appropriate the terms, leaving

( Dt rQ).LA2 (is,2) LAi (is,2) LA2 ( )

as it will also leave the terms wrapped by LAi(i). This can be expressed mathematically as

LAi LA2 D; LA2 [ LAI Di ] = L A2 D; L AI L A2 ...D; LA F ... . L (7)

Incompatibility can also exist and is represented by ≠ (not equal). The results of applying LAs at different t (central terms) are not the same, and thus they do not commute. This formulation compares well with the principles of quantum theory, where some baseline properties, such as measurement, are incompatible with other properties, such as wavelength. This is the situation in wave collapse. States in quantum theory are called quantum states, and they refer to the complete and maximal summary of the characteristics of the quantum system at a moment in time. Gleason [21] maintained that the $\psi$ function is now a means for predicting the probabilities of the results of measurements of physical systems. The states are represented by a unit vector in Hilbert space, that is, the states are such that

$$\|\psi\|^2 = \langle \psi | \psi \rangle = 1 \qquad (8)$$

The performance of this part of the retrieval process takes
place just after the tokens are parsed. In our work, the index term is key for generating expressiveness and relations for the terms. The appropriator makes full use of propositions such as the textual relation between terms, which helps in generating probabilities for terms in the given document. If the proposition is Dr(Q), then the terms are appropriated, leaving those terms that fulfil D rQ. Thus, the appropriators can be thought of as propositions themselves, which allows some logical operations to be defined on them. Similar propositions were initially used in logics for the implementation of retrieval processes, but they can also be extended and used in the discussion of our appropriator. The operations are the complement, the union and the intersection. The complement excludes all terms that are not in the set Dr(Q), and thus they do not form the next search space. Given two appropriator indices on a set of documents, the union returns all terms that satisfy (DrQ) without the ones where D rQ > 1. The intersection defines a similar case but considers the meet. The following holds mathematically:

( L A1 L A2 ) L A1 ( LA1 LA2 ) = (LA1 L AL ) ( L A1 L AC ) (9)

The simplest form of appropriator is one that ignores all terms except one, maintaining that every other term satisfies ( Dt rQ). Such an appropriator can be defined as LA(t, 0). It is easy to represent in the search space as a one-dimensional projector. Applying appropriators of this kind to a natural language text naturally makes the projectors orthogonal to one another, since if one is applied to the document, applying another causes the other term to be ignored, such that

$$LA_i(t_1, 0)\, LA_j(t_2, 0) = 0, \quad \text{which means } t_1 \neq t_2.$$

The LA further explains the reality of the proposition of [22] on the co-occurrence of terms. Here, given terms $t_i$ and $t_j$ within w, where $n(t_i)$ and $n(t_j)$ are determined by the width w, all in a document D, the general space is reduced by applying $LA(t_i, w)$ to D until a term is reached where $LA(t_i, 0)$ can be applied, such that the resulting words are enumerated. Consider a document that contains "Will Smith" together, and let the LA center on "Will" (see section 2) with respect to some relation w called the width, which is a measure of the position of the central term in the document. Thus, if "Will" is the central term $t_i$ with width w, then "Smith" has associated width w - 1, because the two terms appear together. This can be represented as

$$LA(\text{Will}, w) \rightarrow LA(\text{Smith}, w - 1)$$

and this will hold for several related terms that are commonly used together, such as in "Robin Van Persie":

$$LA(\text{Robin}, w) \rightarrow LA(\text{Van}, w - 1) \rightarrow LA(\text{Persie}, w - 2) \qquad (10)$$

This can be complex, depending on the number of terms and the complexity of human natural language. This aspect of the work will be studied more deeply in the area of complex natural language counterfactual statements.

4 TEXTUAL MEASURE
From the above, it can be seen that any term of a document has a textual relationship with the others. The rigorous mathematical foundation of quantum mechanics is generally agreed to be based on von Neumann's formulation, which uses the notion of a state of quantum logic: a probability measure on the collection of quantum events, i.e., on the closed subspaces of the Hilbert space. An observable, discrete or continuous, is defined as a mapping from the subsets of the real line to the collection of states of the quantum logic. These two notions are then used to derive the expected value of an observable, which provides a consistent quantum theory treating discrete and continuous observables in a uniform manner. Gleason's fundamental theorem shows that, for Hilbert spaces of dimension greater than two, the states of the logic are in one-to-one correspondence with density operators, i.e. trace class operators on the Hilbert space with trace one. It thus allows us to work with the more convenient density operators instead of the probability measures on the collection of closed subspaces of the Hilbert space. Gleason's theorem states that any totally additive measure on the closed subspaces, or projections, of a Hilbert space of dimension greater than two is given by a positive operator of trace class. All standard programming languages allow for non-trivial recursion, a fundamental feature of computation which can potentially result in the non-termination of programs [23]. The search space is reduced following the application of
the textual measure, as non-relevant terms are ignored. Thus, a truth value for every token can be obtained using the probability of occurrence of the term. Gleason's theorem explains the natural way of presenting probabilities. It can be generalized as follows. Let $\mu$ be any measure on the closed subspaces of a separable (real or complex) Hilbert space H of dimension at least 3. There exists a positive self-adjoint operator T of trace class such that, for all closed subspaces L of H, we have

$$\mu(L) = tr(T P_L)$$

where $P_L$ is the projection onto L, and T is an operator of trace class provided T is positive and its trace is finite. The interest is in the measure for which $\mu(H) = 1$, that is, with trace

$$tr(T P_H) = tr(T\, \mathbb{1}) = tr(T) = 1 \qquad (11)$$

The above defines Gleason's way of generating the associated probability. The trace is referred to in this work as the pre-probability. In linear algebra, the trace of an n-by-n square matrix A is defined to be the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right) of A, i.e.,

$$tr(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}$$

where $a_{ii}$ represents the entry on the ith row and ith column of A. The trace of a matrix is the sum of its (complex) eigenvalues, and it is invariant with respect to a change of basis. Its properties were also covered in [22]. Literally, let M be a set of objects about Male and F a set of objects about Female. If $M \cap F$ represents the set of objects about both, then we can think of tr(.) as an indexing operation which generates a set of objects for a given set of attributes. If tr(M) generates for male, tr(F) generates for female. Then

tr ( M F ) tr (M L) = {E V a t aM F ) (13)

However, we need to establish that tr(M F) is not an artificial class, which is only achievable by defining an appropriate probability. With probability, a non-Boolean logic has been initialized; thus we can insist that the probability obeys some given conditions.

As an example, consider three documents, d1: "social sciences student with ICT knowledge", d2: "philosophy lecturers do not use ICT" and d3: "ICT staff", and the query q: "student usage of ICT". The set K of index terms is therefore K = (social, science, student, ICT, with, knowledge, philosophy, lecturer, do, use, staff), with t1 = social, t2 = science, ..., t11 = staff, which can be transformed as follows. We assume the above documents are part of a large corpus; we can therefore present the relative frequencies as A(3), B(2), C(1). If we assume that the collection contains 10,000 documents and the document frequencies of these terms are A(50), B(1300), C(250), then:

LA(D1): tf = 3/3; idf = log(10000/50) = 5.3; tf-idf = 5.3
LA(D2): tf = 2/3; idf = log(10000/1300) = 2.0; tf-idf = 1.3
LA(D3): tf = 1/3; idf = log(10000/250) = 3.7; tf-idf = 1.2

This generates the same result when implemented with our transformation:

$$LA(d_1)(t_i) = \frac{f_{i,1}}{\max_k f_{k,1}} \cdot \frac{idf(t_i)}{\max_k idf(t_k)} \qquad (14)$$

Thus, document d1 is the most relevant. After the application of the Tw, the unwrapped text occupies an uncertain position, since information about it is unknown and it is no longer considered. The other non-zero terms, that is, those that are not the central term, have a probability of almost zero but not zero. However, the probability distribution of the central term is zero; the others, except the central term, will have a probability given as

$$\frac{1}{N_k - 1}$$

where $N_k$ is the length of the natural language text. The concept of textual relation is now inherent: the probability distribution is based on length, and the central term determines that distribution.
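The tf-idf example in this section can be re-computed with a short sketch. The function names and the labels A, B and C as dictionary keys are illustrative choices, and the natural logarithm is an assumption, made because it reproduces the idf values 5.3, 2.0 and 3.7 quoted in the text. The second weighting divides both factors by their maxima, in the spirit of the max-normalised transformation described in the paper.

```python
import math


def idf(total_docs, doc_freq):
    """Inverse document frequency log(N / n); natural log is assumed here."""
    return math.log(total_docs / doc_freq)


def weights(freqs, doc_freqs, total_docs):
    """Return (tf-idf, max-normalised weight) for each label in `freqs`."""
    max_f = max(freqs.values())
    idfs = {k: idf(total_docs, doc_freqs[k]) for k in freqs}
    max_idf = max(idfs.values())
    tfidf = {k: (freqs[k] / max_f) * idfs[k] for k in freqs}
    normalised = {k: (freqs[k] / max_f) * (idfs[k] / max_idf) for k in freqs}
    return tfidf, normalised


# Values quoted in the example: relative frequencies and document frequencies.
freqs = {"A": 3, "B": 2, "C": 1}
doc_freqs = {"A": 50, "B": 1300, "C": 250}

tfidf, normalised = weights(freqs, doc_freqs, 10_000)
ranking_tfidf = sorted(tfidf, key=tfidf.get, reverse=True)
ranking_norm = sorted(normalised, key=normalised.get, reverse=True)
print(ranking_tfidf, ranking_norm)
```

Both weightings order the three entries in the same way, with A first. The normalised variant only rescales the scores so that the largest possible weight is 1.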
5 MODELLING THE WRAPPING FRAMEWORK

5.1 Intersection
The intersection of two Tw is the transformation that preserves the information preserved by both Tw:

t , Ot ((W (a, wa ) B W (b, wb )) = Ot ((W (b, wb ).Di) .

The wrapped text is part of the information unit in which the concept represented by the central term is involved in some way. If all the text is considered close enough to the central term, then the Tw can wrap all the text in a given document. As stated earlier, when a Tw is first applied, information is lost, but on further application no further information is lost. This is the property of idempotency. Idempotency is the property of certain operations in mathematics that they can be applied multiple times without changing the result beyond the initial application. The concept arises in a number of places in abstract algebra (especially in the theory of projectors and closure operators) and in functional programming (in which it is connected to the property of referential transparency). Generally, two idempotents a and b are called orthogonal if ab = ba = 0. If a is idempotent in the ring R, then so is b = 1 - a, and a and b are orthogonal. The idempotency can be expressed as

$$\forall D,\ W(t, p)\,[W(t, p)\, D] = W(t, p)\, D \qquad (15)$$

5.2 Union
The union of two Tw exists if the set of natural language text wrapped by the first Tw is the same as the one wrapped by the union of the two Tw when applied to a given document D in a set k:

( w(t i , pi ) w(t 2 , p 2 ) (Dk , [ w(t i , p1 ) B w(t 2 , p 2 )]D = u w(t i , pi ) D) (16)

This transformation can be summarized for the wrapped items as

Tw[D(A)] Tw[D,(A)D(B)] (17)

It should be noted that the same set of text in the same positions is wrapped, so that it can be said to have been preserved. The positions where a term can be determined are assigned one probability, and those with unidentified positions another. Thus, if all the positions of a document are independent, the total information of the document would be a precise length when all terms are determined. It should be noted that when the Tw acts on such documents, some of the terms will be unwrapped, and thus those terms cannot be determined. The unwrapped term can be any term except the central term. Thus, the information p is given in terms of position as:

log( Nv) + t t (is in position s) log( w(t...x)) log( NV ) = I positions (18)

since terms with that probability will have a probability $1/N_v$, where $N_v$ is the size of the natural language text. Therefore, the information after applying the text wrapper cannot be represented like the one above. The complete information after Tw can be given as (19), where $L_D$ is the length of the document, $N_v$ the size of the natural language text and $N_u$ the number of unwrapped tokens:

I (w(t , p) D) = N u + ( LD N u )(1 log( N v1 ) 1 log N v1 log( N v 1) = N u log( N ) + ( LD NU ) log( N ) log ( N v ) v v (19)

log( N v1 ) 1. log( N v ) (20)

6 CONCLUSION
In this paper, we have provided a formalism for retrieving natural language text based on the wrapping techniques of text positioning. This process is well justified, as it supports some benchmark mathematical processes. The justification is similar to the behavior of physical systems when subjected to representation by measurement. This has not only led to further understanding of the IR process but also provides a framework for the improvement of such models. We present a Local Appropriator in which the implementation framework is completed. We introduced an improvement in search results based on reduction of the search space, using appropriate measures to determine relevance. The reduced search space can then be separated, with each document represented as a vector. Since the interest is in implementation in a natural language interface, we extended the work to understanding the relationship between terms. This paper has only discussed the retrieval process extensively; however, a major task is the appropriate formulation of the information need. A further direction from here is therefore to apply the same techniques to the query formulation process.
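The wrapping operation TW(t, n) and two of the properties claimed for it, idempotency and non-commutativity, can be illustrated with a minimal sketch. The function name `text_wrap`, the sample sentence and the choice to mark unwrapped positions with `None` (so that positions are preserved) are assumptions made for illustration; this is not the authors' implementation.

```python
def text_wrap(tokens, term, n):
    """Keep tokens within n positions of any occurrence of `term`.

    Unwrapped positions are replaced by None (an assumed convention), so each
    occurrence of the central term keeps a window of at most 2n + 1 tokens.
    """
    centres = [i for i, tok in enumerate(tokens) if tok == term]
    keep = set()
    for c in centres:
        keep.update(range(max(0, c - n), min(len(tokens), c + n + 1)))
    return [tok if i in keep else None for i, tok in enumerate(tokens)]


doc = "the model is simple and the query is short".split()

# Idempotency: wrapping twice gives the same result as wrapping once.
once = text_wrap(doc, "is", 1)
twice = text_wrap(once, "is", 1)
print(once == twice)

# Non-commutativity: the order of two wrappers can change the result.
a_then_b = text_wrap(text_wrap(doc, "is", 0), "simple", 1)
b_then_a = text_wrap(text_wrap(doc, "simple", 1), "is", 0)
print(a_then_b == b_then_a)
```

In this sketch, applying the width-0 wrapper on "is" first erases "simple", so the second wrapper has no central term left to anchor on, whereas the reverse order keeps "is". This mirrors the statement that appropriators applied at different central terms need not commute.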
REFERENCES
[1] C.J. Van Rijsbergen, Information Retrieval. Butterworths, Chapter 2: Automatic text analysis, 1979.
[2] J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, pp. 68-79.
[3] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.
[4] W.B. Croft, A language modeling approach to information retrieval. In Proc. of SIGIR '98, 1998, pp. 275-281.
[5] C.J. Van Rijsbergen, The Geometry of Information Retrieval, Cambridge University Press, 2004.
[6] A. Huertas-Rosero, L. Azzopardi and C. Van Rijsbergen, Characterizing through erasing: A theoretical framework for representing documents inspired by quantum theory, in P.D. Bruza, W. Lawless, C.J. V.R. (eds.), Proc. 2nd AAAI Quantum Interaction Symposium, Oxford, U.K., College Publications, pp. 160-163, 2008.
[7] A. Einstein, B. Podolsky and N. Rosen, Can quantum-mechanical description of physical reality be considered complete? Physical Review 47(10) (1935) 777-780.
[8] G. Chiribella, G.M. D'Ariano and P. Perinotti, "Probabilistic theories with purification", Phys. Rev. A 81, 062348, 2010.
[9] G. Chiribella, G.M. D'Ariano and M.F. Sacchi, Optimal estimation of group parameters using entanglement, Phys. Rev. A 72, 042338 (2005).
[10] R.P. Feynman, Lectures on Physics: Quantum Mechanics. Volume 3. Addison-Wesley (1963).
[11] G.M. D'Ariano, R. Demkowicz-Dobrzanski, P. Perinotti and M.F. Sacchi, Quantum-state decorrelation, Phys. Rev. A 77, 032344 (2008).
[12] G.M. D'Ariano, Physics as Information Processing, in Advances in Quantum Theory, AIP Conf. Proc. 1327, 7 (2011); also arXiv 1012.0535.
[13] A. Khrennikov (2003), Quantum-like formalism for cognitive measurements. Biosystems 70(3), 211-233.
[14] I. Schmitt (2006), Quantum query processing: unifying database querying and information retrieval. Otto-von-Guericke-Universität Magdeburg, Institut für Technische Informationssysteme.
[15] L. Hardy (2001), Quantum theory from five reasonable axioms.
[16] N.J. Belkin (2005), Anomalous State of Knowledge. In: K.E. Fisher, S. Erdelez and E.F. McKechnie (Eds.), Theories of Information Behavior: A Researcher's Guide. Medford, NJ: Information Today (pp. 44-48).
[17] O. Enikuomehin and J.S. Sadiku, LANLI: A natural language interfacing tool for relational database query, International Journal of Advance Research in Computer Science and Engineering, 2012.
[18] M.A. Nielsen and I.L. Chuang (2000), Quantum Computation and Quantum Information, Cambridge, UK: Cambridge University Press.
[19] T. Roelleke (2003), A frequency-based and a Poisson-based definition of the probability of being informative, in J. Callan, G. Cormack, C. Clarke, D. Hawking and A. Smeaton (Eds.), SIGIR 2003: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, New York, pp. 227-234.
[20] J. Łukasiewicz (1920), O logice trójwartościowej (in Polish). Ruch Filozoficzny 5:170-171. English translation: On three-valued logic, in L. Borkowski (ed.), Selected Works by Jan Łukasiewicz, North-Holland, Amsterdam, 1970, pp. 87-88. ISBN 0-7204-2252-3.
[21] A.M. Gleason, Measures on the closed subspaces of a Hilbert space. Journal of Mathematics and Mechanics, 6:885-894, 1957.
[22] P. Bruza and D. Song (2003), Towards content sensitive information inference. Journal of the American Society for Information Science and Technology, 54(3):321-334, 2003.
[23] A. Edalat (2004), An extension of Gleason's theorem for quantum computation, Imperial College London. International Journal of Theoretical Physics, 43(7):1827-1840.

A.O. Enikuomehin is the Head of Training, ICT Centre, Lagos State University. He is a programmer with expertise in interface generation for natural language processes. He has published in reputable journals. M.O. Rahman is the Head of the Department of Computer Science at Lagos State University. He has published widely in the area of computer science. J.S. Sadiku is the Head of the Department of Computer Science, University of Ilorin, Nigeria.