Structure of the Myrddin Compiler
Aug 2012
Ori Bernstein
TABLE OF CONTENTS:
1. OVERVIEW
2. PARSING
2.1. Lexing
2.2. Parsing
2.3. Type Checking
2.4. Generic Specialization
2.5. Serialization
2.6. Usefiles
3. FLATTENING
3.1. Control Flow
3.2. Complex Expressions
4. OPTIMIZATION
4.1. Constant Folding
5. CODE GENERATION
5.1. Instruction Selection
5.2. Register Allocation
6. TUTORIAL: ADDING A STATEMENT
6.1. Stubbing in the node types
6.2. Parsing
6.3. Flattening
6.4. Optimization
6.5. Instruction Selection
1. OVERVIEW:
The Myrddin compiler suite consists of a set of binaries, written in C,
which translate Myrddin source code to the assembly most appropriate for
the target platform, and subsequently invoke the native assembler on it.
The linker is not invoked by the compiler, and the final output is an
object file for the target platform.
The compilers are named with a single character for the target platform,
followed by a single character for the language being compiled. A table of
the compilers and their names is below:
Compiler    Platform
-------------------------
6m          x86-64
The compilation is divided into a small number of phases. The first phase
is parsing, where the source code is tokenized, parsed, and semantically
checked. The second phase is the machine dependent tree flattening. In this
phase, the tree is decomposed function by function into simple operations
that are relatively close to the machine. Sizes are fixed, and all loops,
if statements, etc. are replaced with gotos. The next phase is a machine
independent optimizer, which currently does nothing other than fold trees.
In the final phase, the instructions are selected and the registers are
allocated.
So, to recap, the phases are as follows:
    parse       Tokenize, parse and analyze the source.
    flatten     Rewrite the complex nodes into simple ones.
    opt         Optimize the flattened source trees.
    gen         Generate the assembly code.
2. PARSING:
This phase takes in a source file, and outputs a tree that is guaranteed
to be valid.
2.1. Lexing:
Lexing occurs in parse/tok.c. Because we desire to use this lexer from
within yacc, the entry point to this code is in 'yylex()'. As required
by yacc, 'yylex()' returns an integer defining the token type, and
sets the 'tok' member of yylval to the token that was taken from the
input stream. In addition, to allow for better error messages, the
global variable 'curtok' is set to the value of 'yylval.tok'. This
allows yyerror to print the last token that was seen.
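As a rough illustration of the contract described above, the sketch
below shows the shape of a 'yylex()' that returns the token type,
stores the token in 'yylval.tok', and mirrors it in 'curtok'. The 'Tok'
layout, the token constants, 'lexinit()' and the toy 'toknext()' are
assumptions made for this example only; they are not the definitions
found in parse/tok.c or parse/gram.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token types; the real ones are generated by yacc
 * into parse/gram.h. */
enum { Teof = 0, Tident = 258, Tintlit = 259, Terror = 260 };

/* Hypothetical token structure. */
typedef struct Tok {
    int type;
    const char *start;  /* first character of the token text */
    size_t len;         /* length of the token text */
} Tok;

/* Stand-in for the %union that yacc would generate. */
union { Tok *tok; } yylval;

Tok *curtok;                    /* last token seen, for error messages */

static const char *input = "";  /* toy input, set by lexinit() */
static Tok tokbuf;              /* single reusable token for the sketch */

/* Hypothetical setup routine: lex from an in-memory string. */
void lexinit(const char *src)
{
    assert(src != NULL);
    input = src;
    curtok = NULL;
}

/* Toy tokenizer: skips blanks, then reads an identifier or integer. */
static Tok *toknext(void)
{
    while (isspace((unsigned char)*input))
        input++;
    tokbuf.start = input;
    if (*input == '\0') {
        tokbuf.type = Teof;
    } else if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input))
            input++;
        tokbuf.type = Tintlit;
    } else if (isalpha((unsigned char)*input) || *input == '_') {
        while (isalnum((unsigned char)*input) || *input == '_')
            input++;
        tokbuf.type = Tident;
    } else {
        input++;
        tokbuf.type = Terror;
    }
    tokbuf.len = (size_t)(input - tokbuf.start);
    return &tokbuf;
}

/* The entry point yacc calls: publish the token, return its type. */
int yylex(void)
{
    yylval.tok = toknext();
    curtok = yylval.tok;
    return curtok->type;
}
```

With this arrangement an error routine can inspect 'curtok' without
any access to the parser's value stack, which is the reason the text
gives for keeping the extra global.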
The tokens that are allowable are generated by Yacc from the '%token'
definitions in parse/gram.y, and are placed into the file
'parse/gram.h'. The lexer and parser code are the only code that
depends on these token constants.
The lexer is initialized through 'tokinit(char *file)'. This function
will open the file passed in, read all the data from it in one go,
and set up the internal data for the tokenizer. The tokenizing is then
done while the whole file is in memory, which means that this code
will work poorly on files that are larger than the address space
available to the compiler. If this is a problem, you deserve all the
pain that is caused.
The file data is stored in the three global variables 'fidx', 'fbuf',
and 'fbufsz'. The actual tokenization happens through 'toknext()' and
its callees, which operate on these data structures character by
character, matching the values read, and shoving them into the 'Tok'
data structure.
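The read-everything-up-front approach can be sketched as follows. The
names 'tokinit', 'fidx', 'fbuf' and 'fbufsz' come from the text; their
types, the return convention of 'tokinit', and the helpers 'peekc()'
and 'nextc()' are assumptions made for this sketch, not the code in
parse/tok.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Globals named as in the text; their exact types are assumed. */
char  *fbuf;    /* entire contents of the source file */
size_t fbufsz;  /* number of bytes stored in fbuf */
size_t fidx;    /* index of the next unread character */

/* Reads the whole file into memory. Returns 0 on success and -1 on
 * failure (an error convention assumed for this sketch). */
int tokinit(char *file)
{
    FILE *fd;
    size_t cap, n;
    char *p;

    assert(file != NULL);
    fd = fopen(file, "rb");
    if (!fd)
        return -1;
    free(fbuf);
    fbufsz = 0;
    fidx = 0;
    cap = 4096;
    fbuf = malloc(cap);
    if (!fbuf) {
        fclose(fd);
        return -1;
    }
    while ((n = fread(fbuf + fbufsz, 1, cap - fbufsz, fd)) > 0) {
        fbufsz += n;
        if (fbufsz == cap) {        /* buffer full: double it */
            cap *= 2;
            p = realloc(fbuf, cap);
            if (!p) {
                free(fbuf);
                fbuf = NULL;
                fbufsz = 0;
                fclose(fd);
                return -1;
            }
            fbuf = p;
        }
    }
    fclose(fd);
    return 0;
}

/* Hypothetical helpers for walking the buffer character by character. */
int peekc(void) { return fidx < fbufsz ? (unsigned char)fbuf[fidx] : EOF; }
int nextc(void) { return fidx < fbufsz ? (unsigned char)fbuf[fidx++] : EOF; }
```

Because the whole file is resident, looking ahead or backing up is
just index arithmetic on 'fidx', at the cost of the memory use noted
above.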
2.2. Parsing:
The parser used is a traditional Yacc based parser. It is generated
from the source in parse/gram.y. The starting production is 'file',
which fills in a global 'file' tree node. This 'file' tree node must
be initialized before yyparse() is called.
2.3. Type Checking:
Type checking is done through unification of types. It's implemented
in parse/infer.c.
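For readers unfamiliar with unification, here is a minimal sketch of
the general technique on a toy type representation. It is not the
algorithm or the data structures of parse/infer.c: the 'Type' struct,
the kinds 'Tyvar', 'Tyint', 'Tybool' and 'Typtr', and the functions
'tyfind()' and 'unify()' are all hypothetical names chosen for the
example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { Tyvar, Tyint, Tybool, Typtr } Tykind;

typedef struct Type Type;
struct Type {
    Tykind kind;
    Type *bound;    /* Tyvar: the type it was unified with, or NULL */
    Type *base;     /* Typtr: the pointed-to type */
};

/* Follows variable bindings to the type that currently stands for t. */
Type *tyfind(Type *t)
{
    while (t->kind == Tyvar && t->bound)
        t = t->bound;
    return t;
}

/* Does variable v occur inside t? Binding it would build a cycle. */
static int occurs(Type *v, Type *t)
{
    t = tyfind(t);
    if (t == v)
        return 1;
    if (t->kind == Typtr)
        return occurs(v, t->base);
    return 0;
}

/* Returns 1 if a and b can be made equal, binding type variables as
 * needed, and 0 on a mismatch. */
int unify(Type *a, Type *b)
{
    assert(a != NULL && b != NULL);
    a = tyfind(a);
    b = tyfind(b);
    if (a == b)
        return 1;
    if (a->kind == Tyvar) {
        if (occurs(a, b))
            return 0;
        a->bound = b;
        return 1;
    }
    if (b->kind == Tyvar)
        return unify(b, a);
    if (a->kind != b->kind)
        return 0;
    if (a->kind == Typtr)
        return unify(a->base, b->base);
    return 1;
}
```

Unifying "pointer to unknown" with "pointer to int" binds the unknown
to int; any later attempt to unify that variable with bool then fails,
which is how a type error would surface in a scheme like this.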
2.4. Generic Specialization:
2.5. Serialization:
2.6. Usefiles:
3. FLATTENING:
This phase is invoked repeatedly on each top level declaration that we
want to generate code for.
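The overview states that flattening replaces loops and if statements
with gotos. The sketch below illustrates that idea in C rather than in
the compiler's own tree representation: 'sumto()' is a structured
function and 'sumto_flat()' is a hand-flattened equivalent. The
function names, the label names and the temporary 't' are invented for
the example.

```c
#include <assert.h>

/* Structured form, as it might appear in the source. */
int sumto(int n)
{
    int i = 0, acc = 0;

    while (i < n) {
        if (i % 2 == 0)
            acc += i;
        i++;
    }
    return acc;
}

/* The same computation after flattening: only labels, conditional
 * jumps, and simple one-operation statements remain. */
int sumto_flat(int n)
{
    int i, acc, t;

    i = 0;
    acc = 0;
Lcond:
    t = i < n;
    if (!t)
        goto Lend;
    t = i % 2;
    if (t != 0)
        goto Lstep;
    acc = acc + i;
Lstep:
    i = i + 1;
    goto Lcond;
Lend:
    return acc;
}

/* Checks that both forms agree for 0 <= n < limit. */
void checkflat(int limit)
{
    int n;

    for (n = 0; n < limit; n++)
        assert(sumto(n) == sumto_flat(n));
}
```

Each statement in the flattened form performs at most one operation,
which is what makes the later instruction selection step simple.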
4. OPTIMIZATION:
4.1. Constant Folding:
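This section has not been written yet; the overview says only that the
optimizer folds trees. As a generic illustration of constant folding,
and not a description of the compiler's own code, the sketch below
folds a toy expression tree bottom up. The 'Node' struct, the 'Op'
values and 'fold()' are hypothetical, and integer overflow is ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { Olit, Ovar, Oadd, Omul } Op;

typedef struct Node Node;
struct Node {
    Op op;
    long val;           /* Olit: the constant value */
    Node *lhs, *rhs;    /* Oadd, Omul: the operands */
};

/* Folds constant subtrees in place, bottom up, and returns n. */
Node *fold(Node *n)
{
    assert(n != NULL);
    if (n->op != Oadd && n->op != Omul)
        return n;       /* leaves are already as simple as possible */
    fold(n->lhs);
    fold(n->rhs);
    if (n->lhs->op == Olit && n->rhs->op == Olit) {
        n->val = (n->op == Oadd) ? n->lhs->val + n->rhs->val
                                 : n->lhs->val * n->rhs->val;
        n->op = Olit;
        n->lhs = NULL;
        n->rhs = NULL;
    }
    return n;
}
```

Given the tree for x + (2 + 3*4), the inner subtree collapses to the
literal 14 while the outer addition is left alone because x is not a
constant.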
5. CODE GENERATION:
5.1. Instruction Selection:
5.2. Register Allocation:
6. TUTORIAL: ADDING A STATEMENT:
6.1. Stubbing in the node types:
6.2. Parsing:
6.3. Flattening:
6.4. Optimization:
6.5. Instruction Selection: