Mark DeArman
Math 320
12/11/2003
Introduction
Linguistics is the scientific study of language. Key to the scientific study of any
subject is the ability of scientists to represent their findings in the concrete language
of mathematics, as opposed to simple empirical results. In this paper, I will introduce the
mathematical axioms and structures used to model language and grammar in a way such
that mathematical analysis is possible. Three sections divide the content logically so that each
section builds upon the last. The only prerequisite knowledge assumed is a basic
understanding of abstract algebra, topology, and set theory.
The first section describes the algebraic structures that are building blocks for a Formal
Context-Free Linear Grammar. The purpose of this chapter is to refresh prerequisite
knowledge and show applications of those structures to language modeling.
The second section explains the axioms used to define a context-free linear grammar
and shows examples of the flexibility of the structure. The purpose of this chapter is to
show in detail how a CFG is constructed from the building blocks of section one. The
section concludes with a discussion of transformations between grammars, which are
structures important to sentence formation and even translation work in natural language
computing.
The final section describes how two CFGs can be analyzed using topological
techniques to determine their similarity. The purpose of this section is to show further
applications of the previous two sections' content. Though the final proof given in this
section is incomplete, it is included to promote further study in this area, again applicable
to translation work.
Section I:
Algebraic Structures
Key to visualizing and solving a problem in mathematics is a deep knowledge of how the
structure of that problem is set up. In this chapter, I will explain the use of the various algebraic
structures that contribute to the formalization of context-free grammars.
The basic building block of natural language is an alphabet. We define an alphabet as a finite
set of distinguishable elements called characters. For example, we can define an alphabet Σ of six
characters as a set such that Σ = {a, b, c, d, e, f}. It is important to note that Σ has no underlying
structure and is simply an unordered set of elements (Hockett, 55.)
A semigroup is a non-empty set of elements closed under an associative operation. Let S be a semigroup
(S, ·). Then for any x, y, z ∈ S, (x·y)·z = x·(y·z) will hold. In terms of our alphabet Σ, we
are interested in a more specific type of semigroup called a free monoid (Hockett, 52.)
A free monoid is a semigroup with an identity e = ε (the empty string), closed under the associative
operation of concatenation. Let F be a free monoid; then F(Σ) is a free monoid over the alphabet Σ whose
elements are all the finite strings over Σ, such that F(Σ) = { x₁x₂⋯xₙ | n ≥ 0, xᵢ ∈ Σ }.
Since F(Σ) contains all the finite strings over Σ, its order is infinite. For example, F(Σ) must
contain all the individual characters of Σ along with all the strings formed by their concatenation.
If Σ is the empty set, then F(Σ) contains only the empty string ε. Let
Γ = {a, b, c, d, e, f, g, h, i} and let Σ continue to be as defined above. Then it follows that F(Σ)
⊂ F(Γ), since F(Γ) contains an infinite number of elements which differ from those of F(Σ) (Hockett, 56.)
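The free-monoid axioms above can be checked concretely on a finite slice of F(Σ); the following is a minimal Python sketch (the alphabet letters and helper name `strings_up_to` are illustrative choices, not from the source):

```python
from itertools import product

SIGMA = list("abcdef")    # illustrative six-character alphabet
GAMMA = list("abcdefghi")  # illustrative nine-character alphabet containing SIGMA

def strings_up_to(alphabet, n):
    """All strings over `alphabet` of length <= n: a finite slice of F(alphabet)."""
    out = [""]                                    # the identity e is the empty string
    for k in range(1, n + 1):
        out.extend("".join(p) for p in product(alphabet, repeat=k))
    return out

# Monoid axioms on a finite slice: two-sided identity and associativity.
for x in strings_up_to(SIGMA, 2):
    assert x + "" == x == "" + x                  # e is a two-sided identity
x, y, z = "ab", "cd", "ef"
assert (x + y) + z == x + (y + z)                 # concatenation is associative

# A strictly larger alphabet gives a strictly larger free monoid.
assert set(strings_up_to(SIGMA, 2)) < set(strings_up_to(GAMMA, 2))
```

Each length bound gives only a finite slice, of course; the full monoids F(Σ) and F(Γ) are infinite.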
It is important to note from the previous example that, since Σ ⊂ Γ, there must be some
embedding f : F(Σ) → F(Γ). If we let H ⊂ F(Γ) be the inclusion generated by f,
then H is a simple example of a language generated by the free monoid F(Σ) (Spanier, 2.)
Obviously, such a simple embedding does nothing to characterize a natural language. Our goal
in mathematical linguistics is to find maps f₀, f₁, …, fₙ which generate a group H such
that the strings of H mimic natural language. The collection of functions f will be developed in the
next section into a definition of formal context-free grammars as linear generative grammars
(Hockett, 58.)
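The key fact used here, that a map defined on characters extends uniquely to a map on all of F(Σ) respecting concatenation, can be sketched in a few lines of Python (the particular character map is an illustrative assumption):

```python
# A map on characters extends to a monoid homomorphism on finite strings
# by acting character-by-character. The mapping below is illustrative only.
char_map = {"a": "x", "b": "yz", "c": "c"}

def f(s):
    """Extend char_map from characters to all finite strings over {a, b, c}."""
    return "".join(char_map[ch] for ch in s)

# Homomorphism property: f(uv) = f(u)f(v), and f fixes the identity e.
u, v = "ab", "ca"
assert f(u + v) == f(u) + f(v)
assert f("") == ""
```

When the character map is an inclusion of one alphabet into a larger one, the extended map is exactly the embedding F(Σ) → F(Γ) discussed above.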
From the definition above, a trivial grammar G can be constructed as an example to find
H(G) ⊂ F(Σ) such that |H(G)| = n, 0 < n < ∞. Granted that this trivial case in no way
resembles a natural language, it shows all the steps necessary to find some H(G) which
does.
Example One:²

Let A = { I, a } and let T = { a }. Let R contain the single rule

R1(s) = a if s = I, s otherwise.

Then H(G) = { a }, so |H(G)| = 1 < ∞.

² This definition is taken from Hockett, 59-61 and contains only slight modifications to better fit the needs of this paper.
³ This example was derived from Hockett, 62 and contains slight modifications to better fit the needs of the paper.

Example Two, below, generates a language H(G) of a more complex structure, more applicable to the discussion of Section
III.
Example Two:³

Let A = { I, b, B, l, L, p, P } and let T = { b, l, p }.
Let R be the set containing the following rules:

R1(s) = BL if s = I, s otherwise
R2(s) = b if s = B, s otherwise
R3(s) = lP if s = L, s otherwise
R4(s) = p if s = P, s otherwise

Let G(A, I, T, R) be a grammar and let C(I) = s be some permutation of S(I) which
generates the terminal string for the system of rules.
Visually, a certain C(I) for this G would be structured as follows, where each row shows the
string after applying the indicated rule:

        I
R1:   B   L
R2:   b   L
R3:   b  l P
R4:   b  l p
Note that the ordering of the rules of R is not essential, since only permutations of S
which generate valid output strings are acceptable for a choice of C(I). Observe that the
structure of C gives rise to a tree topology (Blackett, 165.)
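The derivation C(I) can also be carried out mechanically; the sketch below applies the four rules of Example Two in sequence, with rule right-hand sides read off the derivation tree (B L, b, l P, p) and helper names of my own choosing:

```python
# Sketch of Example Two's rewriting system. Each rule rewrites one symbol
# and fixes every other symbol; the rule dictionaries are reconstructed
# from the derivation tree, and the helper names are illustrative.
A = set("IbBlLpP")        # full alphabet
T = set("blp")            # terminal symbols

RULES = [
    {"I": "BL"},  # R1: I -> B L
    {"B": "b"},   # R2: B -> b
    {"L": "lP"},  # R3: L -> l P
    {"P": "p"},   # R4: P -> p
]

def apply_rule(rule, s):
    """Rewrite each symbol of s, fixing symbols the rule does not mention."""
    return "".join(rule.get(ch, ch) for ch in s)

def derive(start="I"):
    """Apply the rules of R in order, starting from the initial symbol."""
    s = start
    for rule in RULES:
        s = apply_rule(rule, s)
    return s

result = derive()
print(result)              # blp
print(set(result) <= T)    # True: every symbol is terminal
```

Applying other valid permutations of the rules (e.g. R1, R3, R2, R4) reaches the same terminal string, which is the point made above about rule ordering.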
Let T₀ be a topological network formed by the ordering of C(I) for the above permutation
of S. Then T₀ follows, where each level of the tree represents a transformation by Rₙ
which either fixes elements or replaces them based on a given rule.
I -R1-> B L -R2-> b L -R3-> b l P -R4-> b l p
The larger the sets P and P′, the more likely the homotopy is to exist and be continuous
from K into K′, because P ⊂ B(x) (Kuroda, 183.)
The results given in this section are by no means complete. The information is presented
here as a starting point for further research into the topology of various classes of phrase
structures.
Conclusion
This paper is the culmination of at least a year of research and has done nothing but
open further doors awaiting exploration by the author.
I have tried to give a brief overview of basic algebraic structures and their application to
linguistics. In the third section, the explanation may have gone off the deep end, and it is
obvious that more research needs to be concentrated in this area.
In a final section, which I would have included space permitting, I would have liked to
investigate some of the more geometric representations of phrase structure, applying
results from geometric topology and graph theory to obtain further analysis. This seems
the most promising area of study, since the algebraic homotopy relations do not yield
easy or meaningful solutions. Matrix representations of phrase-structure vertex and edge
equations might be easy to solve, as vector processing gets faster and faster on modern
computers.
My conclusion is of course as stated earlier, that more research needs to be done before
any meaningful results can be derived.
o Hockett, Charles F. 1967. Language, Mathematics, and Linguistics. Mouton and Co,
Paris.
o Spanier, Edwin H. 1996. Algebraic Topology. McGraw-Hill, New York.