summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorOri Bernstein <ori@eigenstate.org>2012-08-18 20:32:44 -0400
committerOri Bernstein <ori@eigenstate.org>2012-08-18 20:32:44 -0400
commitd2abc97bfa136ff2dcdf970457b1434cfd7cc712 (patch)
treed487fd656ac800da3709b72bfac99709692cb377 /doc
parentb7198a182264b7ef24dd1d03baca4c275361748b (diff)
downloadmc-d2abc97bfa136ff2dcdf970457b1434cfd7cc712.tar.gz
Add start of compiler internals documentation.
Diffstat (limited to 'doc')
-rw-r--r--doc/compiler.txt131
-rw-r--r--doc/lang.txt4
2 files changed, 133 insertions, 2 deletions
diff --git a/doc/compiler.txt b/doc/compiler.txt
new file mode 100644
index 0000000..c2e7094
--- /dev/null
+++ b/doc/compiler.txt
@@ -0,0 +1,131 @@
+ Structure of the Myrddin Compiler
+ Aug 2012
+ Ori Bernstein
+
+TABLE OF CONTENTS:
+
+ 1. OVERVIEW
+ 2. PARSING
+ 2.1. Lexing
+ 2.2. Parsing
+ 2.3. Type checking
+ 3. FLATTENING
+ 4. OPTIMIZATION
+ 5. CODE GENERATION
+ 6. TUTORIAL: ADDING A STATEMENT
+
+1. OVERVIEW:
+
+ The Myrddin compiler suite consists of a set of binaries, written in C,
+ which translate Myrddin source code to the assembly most appropriate for
+ the target platform, and subsequently invoke the native assembler on it.
+ The linker is not invoked by the compiler, and the final output is an
+ object file for the target platform.
+
+ The compilers are named with a single character for the target platform,
+ with a single character for the language being compiled. A table of the
+ compilers and their names is below:
+
+ Compiler Platform
+ -------------------------
+ 6m x86-64
+
+
+ The compilation is divided into a small number of phases. The first phase
+ is parsing. The first phase is parsing, where the source code is first
+ tokenized, parsed, and semantically checked. The second phase is the
+ machine dependent tree flattening. In this phase, the tree is decomposed
+ function by function into simple operations that are relatively close to
+ the machine. Sizes are fixed, and all loops, if statements, etc are
+ replaced with gotos. The next phase is a machine independent optimizer,
+ which currenty does nothing other than simply folding trees. In the final
+ phase, the instructions are selected and the registers are allocated.
+
+ So, to recap, the phases are as follows:
+
+ parse Tokenize, parse and analyze the source.
+ flatten Rewrite the complex nodes into simpe ones
+ opt Optimize the flattened source trees
+ gen Generate the assembly code
+
+
+2. PARSING:
+
+ This phase takes in a source file, and outputs a tree that is guaranteed
+ to be valid.
+
+ 2.1. Lexing:
+
+ Lexing occurs in parse/tok.c. Because we desire to use this lexer from
+ within yacc, the entry point to this code is in 'yylex()'. As required
+ by yacc, 'yylex()' returns an integer defining the token type, and
+ sets the 'tok' member of yylval to the token that was taken from the
+ input stream. In addition, to allow for better error messages, the
+ global variable 'curtok' is set to the value of 'yylval.tok'. This
+ allows yyerror to print the last token that was seen.
+
+ The tokens that are allowable are generated by Yacc from the '%token'
+ definiitions in parse/gram.y, and are placed into the file
+ 'parse/gram.h'. The lexer and parser code is the only code that
+ depends on these token constants.
+
+ The lexer is initalized through 'tokinit(char *file)'. This function
+ will open the file passed in, read all the data from it in one go
+ and set up the internal data for the tokenizer. The tokenizing is then
+ done while the whole file is in memory, which means that this code
+ will work poorly on files that are larger than the address space
+ available to the compiler. If this is a problem, you deserve all the
+ pain that is caused.
+
+ The file data is stored in the three global variables 'fidx', 'fbuf',
+ and 'fbufsz'. The actual tokenization happens through 'toknext()' and
+ its callees, which operate on these data structures character by
+ character, matching the values read, and shoving them into the 'Tok'
+ data structure.
+
+ 2.2. Parsing:
+
+ The parser used is a traditional Yacc based parser. It is generated
+ from the source in parse/gram.y. The starting production is 'file',
+ which fills in a global 'file' tree node. This 'file' tree node must
+ be initialized before yyparse() is called.
+
+
+ 2.3. Type Checking:
+
+ Type checking is done through unification of types. It's implemented
+ in parse/infer.c
+
+ 2.4. Speciaization:
+
+ 2.4. Serialization:
+
+ 2.5. Use Files:
+
+3. FLATTENING:
+
+ This phase is invoked repeatedly on each top level declaration that we
+ want to generate code for.
+
+4. OPTIMIZATION:
+
+ 4.1. Constant Folding:
+
+5. CODE GENERATION:
+
+ 5.1. Instruction Selection:
+ 5.2. Register Allocation:
+
+6: TUTORIAL: ADDING A STATEMENT:
+
+ 6.1. Stubbing in the node types:
+
+ 6.2. Parsing:
+
+ 6.3. Flattening:
+
+ 6.4. Optimization:
+
+ 6.5. Instruction Selection:
+
+
diff --git a/doc/lang.txt b/doc/lang.txt
index 6353275..ea48c08 100644
--- a/doc/lang.txt
+++ b/doc/lang.txt
@@ -4,7 +4,7 @@
TABLE OF CONTENTS:
- 1. OVERVIEW
+ 1. ABOUT
2. LEXICAL CONVENTIONS
3. SYNTAX
3.1. Declarations
@@ -20,7 +20,7 @@ TABLE OF CONTENTS:
8. GRAMMAR
9. FUTURE DIRECTIONS
-1. OVERVIEW:
+1. ABOUT:
Myrddin is designed to be a simple, low level programming
language. It is designed to provide the programmer with