Assignment # 1
BY
M. Suhaib Khalid
Roll no: 105
BSCS (4th Semester)
Section: C
Submitted to
Sir Umar Farooq
Language
If Σ is an alphabet, and L ⊆ Σ∗, then L is a (formal) language over Σ.
A (possibly infinite) set of strings all of which are chosen from some Σ∗.
A language over Σ need not include strings with all symbols of Σ. Thus, a language
over Σ is also a language over any alphabet that is a superset of Σ.
A language is a set of strings. One special language is Σ∗, which is the set of all
possible strings generated over the alphabet Σ. For example, if
Σ = {a,b,c} then Σ∗ = {ε,a,b,c,aa,ab,ac,ba,...,aaaaaabbbaababa,...}
Namely, Σ∗ is the “full” language made of characters of Σ. Naturally, any language
over Σ is going to be a subset of Σ∗.
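Since Σ∗ is infinite, a program can only enumerate it up to some bounded length. A minimal sketch (the function name and bound are illustrative, not part of any standard library):

```python
from itertools import product

def sigma_star(sigma, max_len):
    """Enumerate the strings of sigma* up to a given length
    (sigma* itself is infinite, so a bound is needed)."""
    strings = []
    for n in range(max_len + 1):            # n = 0 yields the empty string
        for tup in product(sigma, repeat=n):
            strings.append("".join(tup))
    return strings

print(sigma_star(["a", "b", "c"], 2))
# starts with '' (the empty string), then a, b, c, aa, ab, ...
```

Any language over Σ = {a,b,c} would then be some subset of this (unbounded) enumeration.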
Examples:
1. The programming language C.
2. The legal C programs are a subset of the possible strings that can be formed from
the alphabet of the language (a subset of the ASCII characters).
Other Language Examples:
1. L = {b,ba,baa,baaa,baaaa,...}
2. L = {aa,ab,ba}
3. Σ∗ is a language for any alphabet Σ.
4. ∅, the empty language, is a language over any alphabet.
5. The language of all strings consisting of n 0’s followed by n 1’s (n ≥ 0):
{ε,01,0011,000111,...}
6. The set of strings of 0’s and 1’s with an equal number of each:
{ε,01,10,0011,0101,1001,...}
7. { w | w consists of an equal number of 0’s and 1’s }
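A set-builder definition like the last one translates directly into a membership test. A sketch (function name is illustrative):

```python
def in_language(w):
    """Membership test for L = { w | w has an equal number of 0's and 1's }."""
    return w.count("0") == w.count("1")

assert in_language("")        # epsilon is in L (zero of each)
assert in_language("0101")
assert not in_language("001")
```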
Lexicographic ordering
Lexicographic ordering of a set of strings is an ordering that places shorter
strings first and sorts the strings alphabetically within each length. Naturally, we
assume that we have an order on the given alphabet.
Example:
For Σ = {a,b}, the Lexicographic ordering of Σ∗ is a,b,aa,ab,ba,bb,aaa,aab,....
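This ordering can be generated directly: enumerate length 0, then length 1, and so on, alphabetically within each length. A sketch (the generator name is illustrative; note it yields the empty string first, before the a, b, aa, ... of the example above):

```python
from itertools import product, islice

def lex_order(sigma):
    """Yield sigma* in lexicographic ordering: shorter strings first,
    alphabetical within each length (sigma must be given in sorted order)."""
    n = 0
    while True:
        for tup in product(sigma, repeat=n):   # product respects sigma's order
            yield "".join(tup)
        n += 1

print(list(islice(lex_order("ab"), 9)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab']
```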
Concatenation
The concatenation of languages L and M, denoted L.M or just LM, is the set of
strings that can be formed by taking any string in L and concatenating it with any
string in M.
Examples:
If L = {001,10,111} and M = {ε,001} then,
L.M = {001,10,111,001001,10001,111001}
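For finite languages the definition is a one-line set comprehension; a sketch, with "" standing in for ε:

```python
def concat(L, M):
    """L.M: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}

L = {"001", "10", "111"}
M = {"", "001"}   # "" plays the role of epsilon
print(sorted(concat(L, M)))
# the six strings of the example: 001, 10, 111, 001001, 10001, 111001
```

Note that concatenating with ε leaves each string of L unchanged, which is why L itself appears inside L.M here.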
Closure
The closure of a language L is denoted as L∗ and represents the set of those strings
that can be formed by taking any number of strings from L, possibly with
repetitions (i.e., the same string may be selected more than once) and
concatenating all of them.
Examples:
1. If L = {0,1} then L∗ is the set of all strings of 0’s and 1’s.
2. If L = {0,11} then L∗ consists of the strings of 0’s and 1’s such that the 1’s come
in pairs, e.g., 011, 11110 and ε, but not 01011 or 101.
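L∗ is infinite, but it can be approximated by bounding how many strings of L are concatenated. A sketch (function and parameter names are illustrative):

```python
def closure(L, max_parts):
    """Approximate L* by concatenating up to max_parts strings from L,
    with repetition allowed (L* itself is infinite)."""
    result = {""}            # epsilon: the concatenation of zero strings
    frontier = {""}
    for _ in range(max_parts):
        frontier = {x + y for x in frontier for y in L}  # one more string of L
        result |= frontier
    return result

words = closure({"0", "11"}, 4)
assert "11110" in words      # 11 . 11 . 0
assert "01011" not in words  # a lone 1 can never be produced
```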
Lexical Analyzer
1. The first phase of a compiler.
2. Lexical analysis: process of taking an input string of characters (such as the
source code of a computer program) and producing a sequence of symbols called
lexical tokens, or just tokens, which may be handled more easily by a parser.
3. The lexical analyzer reads the source text and, thus, it may perform certain
secondary tasks:
Eliminate comments and white space in the form of blanks, tab and newline
characters.
Correlate error messages from the compiler with the source program (e.g., keep
track of the number of lines).
4. The interaction with the parser is usually done by making the lexical analyzer be
a sub-routine of the parser.
Tokens, Patterns, Lexemes
1. Token: A token is a group of characters having collective meaning: typically, a
word or punctuation mark, separated by a lexical analyzer and passed to a parser.
2. A lexeme is an actual character sequence forming a specific instance of a token,
such as num.
3. Pattern: A rule that describes the set of strings associated to a token. Expressed
as a regular expression and describing how a particular token can be formed. For
example,
[A-Za-z][A-Za-z_0-9]*
The pattern matches each string in the set.
A lexeme is a sequence of characters in the source text that is matched by the
pattern for a token.
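The token/pattern/lexeme distinction can be made concrete with a small regex-driven scanner. A sketch (the token names and the tiny pattern table are illustrative, not a real compiler's):

```python
import re

# Each token name is paired with the pattern that describes its lexemes.
TOKEN_PATTERNS = [
    ("NUM",  r"\d+"),
    ("ID",   r"[A-Za-z][A-Za-z_0-9]*"),   # the identifier pattern from above
    ("OP",   r"[+\-*/=]"),
    ("SKIP", r"[ \t\n]+"),                # white space: matched but discarded
]

def tokenize(source):
    regex = "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS)
    tokens = []
    for m in re.finditer(regex, source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))  # (token, lexeme) pairs
    return tokens

print(tokenize("count = count + 1"))
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUM', '1')]
```

Here ID is the token, [A-Za-z][A-Za-z_0-9]* is its pattern, and the character sequence count is a lexeme matched by that pattern.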
Lexical Errors
1. Few errors are discernible at the lexical level alone.
2. Lexical analyzer has a very localized view of the source text.
3. It cannot tell whether the string fi is a misspelling of the keyword if or an identifier.
4. The lexical analyzer can detect characters that are not in the alphabet or strings
that have no pattern.
5. In general, when an error is found, the lexical analyzer stops (but other actions
are also possible).
Evaluator
1. Goes over the characters of the lexeme to produce a value.
2. The lexeme’s type combined with its value is what properly constitutes a token,
which can be given to a parser.
3. Some tokens such as parentheses do not really have values, and so the evaluator
function for these can return nothing.
4. The evaluators for integers, identifiers, and strings can be considerably more
complex.
5. Sometimes evaluators can suppress a lexeme entirely, concealing it from the
parser, which is useful for white space and comments.
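The points above can be sketched as a single evaluator function; the token names and the None-means-suppressed convention are illustrative assumptions:

```python
def evaluate(token, lexeme):
    """Turn a (token, lexeme) pair into a finished token with a value;
    returning None suppresses the lexeme entirely."""
    if token == "NUM":
        return ("NUM", int(lexeme))     # the value is the integer itself
    if token == "ID":
        return ("ID", lexeme)           # the value is the identifier's name
    if token in ("SKIP", "COMMENT"):
        return None                     # concealed from the parser
    return (token, None)                # e.g. parentheses carry no value

assert evaluate("NUM", "42") == ("NUM", 42)
assert evaluate("SKIP", "  ") is None
```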
Grammars
1. Precise, easy-to-understand description of syntax.
2. Context-free grammars -> efficient parsers (automatically!).
3. Help in translation and error detection.
4. E.g. Attribute grammars.
5. Easier language evolution.
6. Can add new constructs systematically.
Syntax Errors
1. Many errors are syntactic or exposed by parsing.
2. E.g. Unbalanced ().
3. Error handling goals:
4. Report errors quickly & accurately.
5. Recover quickly (continue parsing after error).
6. Little overhead on parse time.
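The unbalanced-parentheses example can be detected with a simple counter; a sketch (the function name is illustrative):

```python
def balanced(src):
    """Detect the unbalanced-parentheses syntax error with a depth counter."""
    depth = 0
    for ch in src:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' with no matching '('
                return False
    return depth == 0          # any unclosed '(' also fails

assert balanced("f(g(x), y)")
assert not balanced("f(x))")
```

Reporting where the counter first goes wrong is one way to meet the "report errors quickly and accurately" goal.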
Error Recovery
Panic mode
1. Discard tokens until synchronization token found (often ‘;’).
Phrase level
1. Local correction: replace a token by another and continue.
Error productions
1. Encode commonly expected errors in grammar.
Global correction
1. Find closest input string that is in L(G).
2. Too costly in practice.
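Panic mode, the simplest of these strategies, can be sketched over a token list; the helper name and the choice of ';' as the synchronization token are illustrative:

```python
def panic_mode_recover(tokens, i, sync=";"):
    """Panic-mode recovery: discard tokens starting at position i until
    the synchronization token is found, then resume just after it."""
    while i < len(tokens) and tokens[i] != sync:
        i += 1                 # discard the offending token
    return i + 1               # index of the first token after the sync token

tokens = ["x", "=", "@", "3", ";", "y", "=", "1", ";"]
# suppose an error is detected at the stray '@' (position 2)
resume = panic_mode_recover(tokens, 2)
assert tokens[resume] == "y"   # parsing continues with the next statement
```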
Context-free Grammars
1. Precise and easy way to specify the syntactical structure of a programming
language.
2. Efficient recognition methods exist.
3. Natural specification of many “recursive” constructs:
4. expr -> expr + expr | term.
Example:
G = ({+,-,*,(,),<id>}, {E}, E, {E -> E + E, E -> E * E, E -> (E), E -> - E, E -> <id>})
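A recognizer for L(G) can be sketched by recursive descent. Since the productions E -> E + E | E * E are left-recursive (a recursive-descent parser would loop forever on them), the sketch below uses an equivalent, parser-friendly rewrite, E -> T (('+' | '*') T)*; the function names are illustrative:

```python
def accepts(tokens):
    """Recursive-descent recognizer for L(G), with the left-recursive
    productions rewritten as E -> T (('+'|'*') T)*."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def T():
        nonlocal pos
        if peek() == "-":          # E -> - E
            pos += 1
            T()
        elif peek() == "(":        # E -> ( E )
            pos += 1
            E()
            expect(")")
        else:                      # E -> <id>
            expect("<id>")

    def E():
        nonlocal pos
        T()
        while peek() in ("+", "*"):  # E -> E + E | E * E
            pos += 1
            T()

    try:
        E()
        return pos == len(tokens)    # all input must be consumed
    except SyntaxError:
        return False

assert accepts(["<id>", "+", "<id>", "*", "<id>"])
assert accepts(["-", "(", "<id>", ")"])
assert not accepts(["<id>", "+"])
```

Note that G as written is ambiguous ( <id> + <id> * <id> has two parse trees); the rewrite fixes one structure, which is what makes an efficient recognizer possible.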