T1 - Overview
yahave@post.tau.ac.il
http://www.cs.tau.ac.il/~yahave
1
Who
Eran Yahav
Schreiber Open-space
Tel: 6405358
yahave@post.tau.ac.il
Wednesday 14:00-16:00
http://www.cs.tau.ac.il/~yahave
2
What
Compiler: Source text (txt) -> Frontend (analysis) -> Semantic Representation -> Backend (synthesis) -> Executable code (exe)
3
Say What?
Compiler: Source text (txt) -> Frontend (analysis) -> Semantic Representation -> Backend (synthesis) -> Executable code (exe)
4
How
txt -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST -> Symbol Table etc. -> Intermediate Representation (IR) -> Code Generation -> exe
Input: Turkish Coffee source text. Output: executable code.
5
How II
6
Why?
7
Today
txt -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST -> Symbol Table etc. -> Intermediate Representation (IR) -> Code Generation -> exe
Input: Turkish Coffee source text. Output: executable code.
Goals:
•Understand project scope
•Learn how to use JLex
8
Turkish Coffee
(extended) subset of Java
Main features
Object oriented
• Objects, virtual method calls, but no overloading
Strongly typed
• Primitives for int, boolean, string
• Reference types, array types
Dynamic allocation and Garbage Collection
• Heap allocation, automatic deallocation
Run-time checks
• Null references, array bounds, negative array size
Adapted with permission from Cornell course material by Radu Rugina
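The three run-time checks listed above can be observed in plain Java. The sketch below is ordinary Java, not Turkish Coffee: how Turkish Coffee reports these errors is not stated on the slide, and the class name RuntimeChecksDemo and the helper check are made up for illustration.

```java
// Sketch: triggers each run-time check and reports which exception Java raises.
class RuntimeChecksDemo {
    /** Runs an action and returns the simple name of the exception it raised, or "none". */
    static String check(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Null reference: calling a method through a reference that is null.
        System.out.println(check(() -> { Object o = null; o.toString(); }));
        // Array bounds: index 3 is outside an array of length 3.
        System.out.println(check(() -> { int[] a = new int[3]; a[3] = 1; }));
        // Negative array size: allocating an array with a negative length.
        System.out.println(check(() -> { int n = -1; int[] a = new int[n]; }));
    }
}
```

Running it prints NullPointerException, ArrayIndexOutOfBoundsException and NegativeArraySizeException, one per line.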
9
Good News
No “static” modifier
No interfaces
No method overloading
(but still allow overriding)
No exceptions
No packages
No multiple files to handle
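To make the overriding/overloading distinction concrete, here is a small sketch in plain Java. The class names (OverrideDemo, Animal, Dog) are hypothetical, and the static main is only there so the example runs as Java; it is not valid Turkish Coffee.

```java
// Sketch: overriding (kept in the language) versus overloading (dropped).
class OverrideDemo {
    public static void main(String[] args) {
        Animal a = new Dog();
        // Virtual method call: the run-time type (Dog) selects the method.
        System.out.println(a.speak());
    }
}

class Animal {
    String speak() { return "..."; }
}

class Dog extends Animal {
    // Overriding: same name and same parameter list as Animal.speak().
    @Override
    String speak() { return "Woof"; }

    // Overloading would be a second method named speak with a different
    // parameter list, e.g. String speak(int times). Per the list above,
    // the project language does not support that.
}
```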
10
Better News
11
Jumping into the water
/** Sort the array a[] in ascending order
 ** using an insertion sort.
 */
void sort(int a[], int size) {
    for (int i = 1; i < size; i++) {
        // a[0..i-1] is sorted
        // insert a[i] in the proper place
        int x = a[i];
        int j;
        for (j = i-1; j >= 0; --j) {
            if (a[j] <= x)
                break;
            a[j+1] = a[j];
        }
        // now a[0..j] are all <= x
        // and a[j+2..i] are > x
        a[j+1] = x;
    }
} // sort
12
Jumping into the water
class HelloTest {
    public static void main(String[] args) {
        Hello greeter = new Hello();
        greeter.speak();
    }
}
class Hello {
    void speak() {
        System.out.println("I know Java, really!");
    }
}
(see http://www.cs.wisc.edu/~solomon/cs537/java-tutorial.html)
13
Jumping into the water
class Pair { int x, y; }

C++                   Java
Pair origin;          Pair origin = new Pair();
Pair *p, *q, *r;      Pair p, q, r;
origin.x = 0;         origin.x = 0;
p = new Pair;         p = new Pair();
p -> y = 5;           p.y = 5;
q = p;                q = p;
r = &origin;          N/A
(see http://www.cs.wisc.edu/~solomon/cs537/java-tutorial.html)
14
Jumping into the water
p = new Pair();
// ...
q = p;
// ...
delete p;
q -> x = 5; // oops!
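For contrast, a minimal Java counterpart of the C++ fragment above. Java has no delete, so the closest move is dropping the reference p; the object stays alive while q still refers to it. The class name AliasDemo and the method run are made up for illustration.

```java
// Sketch: the same aliasing pattern in Java, where the garbage collector
// reclaims an object only once nothing refers to it any more.
class AliasDemo {
    static int run() {
        Pair p = new Pair();
        Pair q = p;   // q and p now refer to the same object
        p = null;     // drops one reference; there is no delete in Java
        q.x = 5;      // safe: the object is still reachable through q
        return q.x;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 5
    }
}

class Pair { int x, y; }
```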
15
Jumping into the water
16
Lexical Analysis with JLex
spec -> JLex -> .java -> javac -> Lexical analyzer
text -> Lexical analyzer -> tokens
17
JLex Spec File
User code
  Copied directly to Java file
  Possible source of javac errors down the road
%%
JLex directives
  Define macros, state names
  e.g. DIGIT= [0-9]
       LETTER= [a-zA-Z]
       YYINITIAL
%%
Lexical analysis rules
  Optional state, regular expression, action
  How to break input to tokens
  Action when token matched
  e.g. {LETTER}({LETTER}|{DIGIT})*
18
User Code
package TC.Lexer;
import TC.Error.*;
import TC.Parser.sym;
…
any lexer-helper Java code
…
19
JLex Directives
20
Regular Expressions
$        end of a line
.        (dot) any character except the newline
"..."    literal text: characters inside the quotes lose their special meaning
{name}   macro expansion
*        zero or more repetitions
+        one or more repetitions
?        zero or one repetition
(...)    grouping within regular expressions
[...]    class of characters: any one character enclosed in the brackets
a-b      range of characters (inside a class)
[^...]   negated class: any one character not enclosed in the brackets
21
Example Macros
ALPHA=[A-Za-z_]
DIGIT=[0-9]
ALPHA_NUMERIC={ALPHA}|{DIGIT}
IDENT={ALPHA}({ALPHA_NUMERIC})*
NUMBER=({DIGIT})+
WHITE_SPACE=([\ \n\r\t\f])+
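To try the macros out without running JLex, the same patterns can be approximated with java.util.regex. This is only a sketch: java.util.regex syntax is not JLex syntax (macro expansion is emulated here with string concatenation), and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Sketch: the example macros rebuilt as java.util.regex patterns.
class MacroDemo {
    static final String ALPHA = "[A-Za-z_]";
    static final String DIGIT = "[0-9]";
    // Parenthesised so the alternation stays one unit when reused below.
    static final String ALPHA_NUMERIC = "(?:" + ALPHA + "|" + DIGIT + ")";
    static final Pattern IDENT = Pattern.compile(ALPHA + ALPHA_NUMERIC + "*");
    static final Pattern NUMBER = Pattern.compile(DIGIT + "+");
    static final Pattern WHITE_SPACE = Pattern.compile("[ \\n\\r\\t\\f]+");

    /** True if the whole string is an identifier according to IDENT. */
    static boolean isIdent(String s) { return IDENT.matcher(s).matches(); }

    /** True if the whole string is a number according to NUMBER. */
    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isIdent("x_1"));   // true
        System.out.println(isIdent("1x"));    // false: must start with ALPHA
        System.out.println(isNumber("042"));  // true
    }
}
```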
22
Lexical Analysis Rules
Rule structure
[states] regexp { action }
23
Action Body
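Java code
Can use special methods and vars
  yytext(): the text matched by the rule
  yyline, yychar: position of the match (when enabled by the %line and %char directives)
Lexer state transition
  yybegin(state-name)
  YYINITIAL: the initial state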
24
More on Lexer States
Example
“if” is a keyword token when in program text
“if” is part of comment text when inside a comment
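The idea behind the example above can be sketched as a tiny hand-written scanner in Java. It is not JLex output and makes strong simplifications: tokens are whitespace-separated words, and the state name COMMENT, the token names, and the class StateDemo are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the same word ("if") is treated differently depending on lexer state.
class StateDemo {
    enum State { YYINITIAL, COMMENT }

    /** Classifies whitespace-separated words; a simplification of real lexing. */
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        for (String word : input.trim().split("\\s+")) {
            if (state == State.YYINITIAL) {
                if (word.equals("/*")) {
                    state = State.COMMENT;      // plays the role of yybegin(COMMENT)
                } else if (word.equals("if")) {
                    tokens.add("IF");           // keyword token in program text
                } else {
                    tokens.add("IDENT(" + word + ")");
                }
            } else {
                if (word.equals("*/")) {
                    state = State.YYINITIAL;    // plays the role of yybegin(YYINITIAL)
                }
                // Any other word, including "if", is comment text: no token.
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("if x /* if y */ z")); // [IF, IDENT(x), IDENT(z)]
    }
}
```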
25
<YYINITIAL> {NUMBER} {
    return new Symbol(sym.NUMBER, new Token(yytext(), yyline, yychar));
}
<YYINITIAL> {WHITE_SPACE} { }
<YYINITIAL> "+" {
    return new Symbol(sym.PLUS, new Token(yytext(), yyline, yychar));
}
<YYINITIAL> "-" {
    return new Symbol(sym.MINUS, new Token(yytext(), yyline, yychar));
}
<YYINITIAL> "*" {
    return new Symbol(sym.TIMES, new Token(yytext(), yyline, yychar));
}
...
26
Putting it all together: count number of lines
File: lineCount
import java_cup.runtime.*;
%%
%cup
%{
    private int lineCounter = 0;
%}
%eofval{
    System.out.println("line number=" + lineCounter);
    return new Symbol(sym.EOF);
%eofval}
NEWLINE=\n
%%
<YYINITIAL>{NEWLINE} {
    lineCounter++;
}
<YYINITIAL>[^{NEWLINE}] { }
27
Putting it all together: count number of lines
java JLex.Main lineCount    (JLex reads the spec lineCount and writes lineCount.java)
javac *.java                (compiles lineCount.java together with Main.java and sym.java)
Result: a lexical analyzer that turns text into tokens
        } catch (Exception e) {
            throw new RuntimeException("IO Error (brutal exit)");
        }
    }
}
30
Common Pitfalls
Classpath
Path to executable
Define environment variables
JAVA_HOME
CLASSPATH
Note the use of . (dot) as part of package name / directory structure, e.g., JLex.Main
31
Assignment 1
class Token
  At least: id, value, line
  Should extend java_cup.runtime.Symbol
  Numeric token IDs in sym.java (will later be generated by Java CUP)
class Compiler
class LexicalError
32
Token Class
import java_cup.runtime.Symbol;
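One possible shape for such a class, given the Assignment 1 requirements (id, value, line, extending Symbol). This is a sketch under assumptions, not the course's reference solution: the Symbol class below is a local stand-in for java_cup.runtime.Symbol so the example compiles without the CUP jar, and the constructor argument order and getter names are made up for illustration.

```java
// Sketch: a Token that carries id, value and line on top of a Symbol base.
class TokenDemo {
    public static void main(String[] args) {
        Token t = new Token(7, 3, "42"); // id 7 is an arbitrary example value
        System.out.println(t);
    }
}

// Stand-in for java_cup.runtime.Symbol (assumption: only the sym and value
// fields are needed here). With CUP on the classpath, delete this class and
// import java_cup.runtime.Symbol instead.
class Symbol {
    public int sym;       // numeric token id, as listed in sym.java
    public Object value;  // semantic value attached to the token

    Symbol(int id, Object value) {
        this.sym = id;
        this.value = value;
    }
}

class Token extends Symbol {
    private final int line;

    Token(int id, int line, String value) {
        super(id, value);
        this.line = line;
    }

    int getId() { return sym; }
    String getValue() { return (String) value; }
    int getLine() { return line; }

    @Override
    public String toString() {
        return "Token(id=" + sym + ", value=" + value + ", line=" + line + ")";
    }
}
```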
33
(some of the) JLex directives to be used
34
http://www.cs.tau.ac.il/~yahave
35