summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorOri Bernstein <ori@eigenstate.org>2014-06-18 01:10:30 -0400
committerOri Bernstein <ori@eigenstate.org>2014-06-18 01:10:30 -0400
commitf6e53709407611de940092c9d73e63926723252f (patch)
tree320c6d9474ba1eb51f2dc88640e9edd0d1672a55 /doc
downloadmc-f6e53709407611de940092c9d73e63926723252f.tar.gz
Merge pull request #1 from akoshibe/master
added sockproto values
Diffstat (limited to 'doc')
-rw-r--r--doc/Makefile21
-rw-r--r--doc/compiler.txt336
-rw-r--r--doc/lang.txt832
-rw-r--r--doc/mc.184
-rw-r--r--doc/muse.195
-rw-r--r--doc/myrbuild.166
6 files changed, 1434 insertions, 0 deletions
diff --git a/doc/Makefile b/doc/Makefile
new file mode 100644
index 0000000..9ee91a7
--- /dev/null
+++ b/doc/Makefile
@@ -0,0 +1,21 @@
+MAN=mc.1 \
+ muse.1 \
+ myrbuild.1 \
+
+include ../config.mk
+
+all:
+
+install:
+ @echo install -m 644 $(MAN) $(abspath $(DESTDIR)/$(INST_ROOT)/share/man/man1); \
+ mkdir -p $(abspath $(DESTDIR)/$(INST_ROOT)/share/man/man1); \
+ install -m 644 $(MAN) $(abspath $(DESTDIR)/$(INST_ROOT)/share/man/man1); \
+
+uninstall: $(MAN)
+ @for i in $^; do \
+ echo rm -f $(abspath $(DESTDIR)/$(INST_ROOT)/share/man/man1/$$i); \
+ rm -f $(abspath $(DESTDIR)/$(INST_ROOT)/share/man/man1/$$i); \
+ done
+
+clean:
+
diff --git a/doc/compiler.txt b/doc/compiler.txt
new file mode 100644
index 0000000..e296c0e
--- /dev/null
+++ b/doc/compiler.txt
@@ -0,0 +1,336 @@
+ Structure of the Myrddin Compiler
+ Aug 2012
+ Ori Bernstein
+
+TABLE OF CONTENTS:
+
+ 1. OVERVIEW
+ 1.1. Tree Structure
+ 2. PARSING
+ 2.1. Lexing
+ 2.2. AST Creation
+ 2.3. Type checking
+ 2.4. Generic Specialization
+ 2.5. Serialization
+ 2.6. Usefiles
+ 3. FLATTENING
+ 3.1. Control Flow
+ 3.2. Complex Expressions
+ 4. OPTIMIZATION
+ 4.1. Constant Folding
+ 5. CODE GENERATION
+ 5.1. Instruction Selection
+ 5.2. Register Allocation
+ 6. TUTORIAL: ADDING A STATEMENT
+ 6.1. Stubbing in the node types
+ 6.2. Parsing
+ 6.3. Flattening
+ 6.4. Optimization
+ 6.5. Instruction Selection
+
+1. OVERVIEW:
+
+ The Myrddin compiler suite consists of a set of binaries, written in C,
+ which translate Myrddin source code to the assembly most appropriate for
+ the target platform, and subsequently invoke the native assembler on it.
+ The linker is not invoked by the compiler, and the final output is an
+ object file for the target platform.
+
+ The compilers are named with a single character for the target platform,
+ with a single character for the language being compiled. A table of the
+ compilers and their names is below:
+
+ Compiler Platform
+ -------------------------
+ 6m x86-64
+
+
+ The compilation is divided into a small number of phases. The first phase
+ is parsing, where the source code is first tokenized, the abstract syntax
+ tree (AST) is generated, and semantically checked. The second phase is the
+ machine-dependent tree flattening. In this phase, the tree is decomposed
+ function by function into simple operations that are relatively close to
+ the machine. Sizes are fixed, and all loops, if statements, etc. are
+ replaced with gotos. The next phase is a machine-independent optimizer,
+ which currenty does nothing other than simply folding trees. In the final
+ phase, the instructions are selected and the registers are allocated.
+
+ So, to recap, the phases are as follows:
+
+ parse Tokenize, parse, and analyze the source
+ flatten Rewrite the complex nodes into simpe ones
+ opt Optimize the flattened source trees
+ gen Generate the assembly code
+
+ 1.1. Tree Structure:
+
+ File nodes (n->type == Nfile) represent the files being compiled. The current
+ node is held in a global variable called, unsurprisingly, 'file'. The
+ global symbol table, export table, uses, and other compilation-specific
+ information is stored in this node. This implies that the compiler can
+ only process one file at a time.
+
+ Name nodes (n->type == Nname) are simply names, possibly with a namespace
+ attached. They are left as leaf nodes in the tree, specifying variable
+ names, union tags, and just about anything else with a name.
+
+ Use nodes (n->type == Nuse) simply tell the compiler that a usefile by the
+ name stored in this node will be loaded.
+
+ Expression nodes (n->type == Nexpr) represent expressions. They consist of
+ an operator, a type, a few flags, possibly a declaration ID, and a list of
+ arguments.
+
+ Operators are defined in parse/ops.def, and start with an 'O' by
+ convention; eg: Oadd, Osub, etc.
+
+ The declaration id (n->expr.did) is only valid on expressions representing
+ a single variable (n->expr.op == Ovar). The DID is a unique identifier
+ representing the declaration node that the variable refers to. This is used
+ for a variety of things, from fast comparisons to allowing us to put the
+ node into a bit set easily.
+
+ Literal nodes (n->type == Nlit) hold a literal value. The type held is
+ stored in littype, which are defined in parse/lits.def.
+
+ The various statement nodes (Nloopstmt, Nifstmt, Nmatchstmt, Nblock,
+ Nlbl) are all statements that may appear within a block node (Nblock).
+
+ Declaration nodes declare a name in a symbol table. TODO: MORE DETAIL.
+
+ Uelt nodes declare a union element. TODO: MORE DETAIL.
+
+ Func nodes declare a function. TODO: MORE DETAIL.
+
+
+
+2. PARSING:
+
+ This phase takes in a source file, and outputs a tree that is guaranteed
+ to be valid. The tree nodes are defined in parse/parse.h in struct Node,
+ and have one of the types defined in parse/nodetypes.def. Node types
+ start with 'N' by convention; eg: Nfile, Nifstmt, etc.
+
+ 2.1. Lexing:
+
+ Lexing occurs in parse/tok.c. Because we want to use this lexer from
+ within yacc, the entry point to this code is in 'yylex()'. As required
+ by yacc, 'yylex()' returns an integer defining the token type, and
+ sets the 'tok' member of yylval to the token that was taken from the
+ input stream. In addition, to allow for better error messages, the
+ global variable 'curtok' is set to the value of 'yylval.tok'. This
+ allows yyerror to print the last token that was seen.
+
+ The tokens that are allowable are generated by Yacc from the '%token'
+ definitions in parse/gram.y, and are placed into the file
+ 'parse/gram.h'. The lexer and parser code is the only code that
+ depends on these token constants.
+
+ The lexer is initalized through 'tokinit(char *file)'. This function
+ will open the file passed in, read all the data from it in one go
+ and set up the internal data for the tokenizer. The tokenizing is then
+ done while the whole file is in memory, which means that this code
+ will work poorly on files that are larger than the address space
+ available to the compiler. If this is a problem, you deserve all the
+ pain that is caused.
+
+ The file data is stored in the three global variables 'fidx', 'fbuf',
+ and 'fbufsz'. The actual tokenization happens through 'toknext()' and
+ its callees, which operate on these data structures character by
+ character, matching the values read, and shoving them into the 'Tok'
+ data structure.
+
+ 2.2. AST Creation:
+
+ The parser used is a traditional Yacc-based parser. It is generated
+ from the source in parse/gram.y. The starting production is 'file',
+ which fills in a global 'file' tree node. This 'file' tree node must
+ be initialized before yyparse() is called.
+
+
+ 2.3. Type Checking:
+
+ Type checking is done through unification of types. It's implemented
+ in parse/infer.c. It proceeds through a simple unification algorithm,
+ which is documented in lang.txt. As a result, only the internal
+ details of this algorithm will be discussed here.
+
+ The first step done is loading and resolving use files. This is
+ deferred to the type checking phase for two reasons. First, we
+ do not want to force tools to have all dependencies compiled if they
+ use this parser, even though type full type checking is impossible
+ until all usefiles are loaded. And second, this is when the
+ information is actually needed.
+
+ Next, the types declared in the package section are merged with the
+ exported types, allowing us to start off with our type information as
+ complete as possible, and making sure that the types of globals
+ actually match up with the exported types.
+
+ The next step is the actual type inference. We do a bottom-up walk of
+ the tree, unifying types as we go. There are subtleties with the
+ member operator, however. Because the '.' operator is used for both
+ member lookups and namespace lookups, before we descend into a node
+ that has operator Omemb, we need to check if it's a namespaced name,
+ or an actual member reference. If it is a namespaced name, we replace
+ the expression with an Ovar expression. This check happens in the
+ 'checkns()' function. Second, because we need to know the LHS of a
+ member expression before we can check if the RHS is valid, and we
+ are not guaranteed to know this at the first time that we see it, the
+ expression is assumed to be valid, and this asumption is verified in
+ a post-processing pass. Casts are validated in a deferred manner
+ similarly.
+
+ Generic variables are added to a list of generic callsites to
+ specialize when they are seen in as a leaf of an Ovar node.
+
+ The type inference, to this point, has only built up a mapping
+ of types. So, for example, if we were to have the inferred types
+ for the following set of statements:
+
+ var a
+ var b
+ var c
+ a = b
+ c = b + 1
+
+ We would have the mappings:
+
+ $t0 -> $t1
+ $t1 -> $t2
+ $t2 -> int
+
+ So, in the 'typesub()' function, we iterate over the entire tree,
+ replacing every instance of a non-concrete type with the final
+ mapped type. If a type does not map to a fully concrete type,
+ this is where we flag an error.
+
+ FIXME: DESCRIBE HOW YOU FIXED GENERICS ONCE YOU FIX GENERICS.
+
+ 2.4. Generic Specialization:
+
+ After type inference (well, technially, as the final step of it),
+ we walk through the list of callsites that need instantiations
+ of generics, and create a specialized generic instance for each of
+ them. This specialization is done, unsurprisingly, in specialize.c,
+ by the simple algorithm of cloning the entire tree that needs to
+ be specialized, and walking over all nodes substituting the types
+ that are replacing the type parameters.
+
+ 2.5. Serialization:
+
+ Trees of all sorts can be serialized and deserialized from files,
+ as long as they are fully typed. Trees containing type variables (ie,
+ uninferred types) cannot be serialized, as type variables cannot be
+ deserialized meaningfully.
+
+ The format for this is only documented in the source, and is a
+ straighforward dump of the trees to memory. It is constantly shifting,
+ and will not reliably work between compiler versions.
+
+ 2.6. Usefiles:
+
+ Usefiles are more or less files that consist of a single character tag
+ that tells us what type of tree to deserialize. Because serialized
+ trees are compiler version dependent, so are usefiles.
+
+3. FLATTENING:
+
+ This phase is invoked repeatedly on each top-level declaration that we
+ want to generate code for. There is a good chance that this flattening
+ phase should be made machine-independent, and passed as a parameter
+ a machine description describing known integer and pointer sizes, among
+ other machine attributes. However, for now, it is machine-dependent,
+ and lives in 6/simp.c.
+
+ The goal of flattening a tree is to take semantically involved constructs
+ such as looping, and simplify things into something that is easy to
+ generate code for, as well as something that is easier to analyze for
+ optimization.
+
+ 3.1. Control Flow:
+
+ All if statements, loops, and other complex constructs are simplified
+ to jumps and conditional jumps. Loops are generally simplified from
+ something that would look like this:
+
+ loop
+ init
+ cond
+ inc
+ body
+
+ To something that would look like this:
+
+ init
+ jmp cond
+ .loop:
+ body
+ inc
+ .cond:
+ cjmp cond .loop .end
+ .end:
+
+ Boolean expressions are simplified to a location to jump to, as
+ described in section 8.4 of the Dragon book[1].
+
+ 3.2. Complex Expressions:
+
+ Complex expressions such as copying types larger than a single machine
+ word, pulling members out of structures, emulating multiplication and
+ division for larger integers sizes, and similar operations are reduced
+ to trees that are expressible in terms of simple machine operations.
+
+ By the end of the simplification pass, the following operators should
+ not be present in the trees:
+
+ Obad Oret Opreinc Opostinc Opredec Opostdec Olor Oland Oaddeq
+ Osubeq Omuleq Odiveq Omodeq Oboreq Obandeq Obxoreq Obsleq
+ Obsreq Omemb Oslice Oidx Osize Numops Oucon Ouget Otup Oarr
+ Oslbase Osllen Ocast
+
+
+4. OPTIMIZATION:
+
+ Currently, there is virtually no optimization done on the trees after
+ flattening. The only optimization that is done is constant folding.
+
+ 4.1. Constant Folding:
+
+ Expressions with constant values are simplified algebraically. For
+ example, the expression 'x*1' is simplified to 'x', '0/n' is
+ simplified to '0', and so on.
+
+
+5. CODE GENERATION:
+
+ 5.1. Instruction Selection:
+
+ Instruction selection is done via a simple handwritten bottom-up pass
+ over the tree. Common patterns such as scaled or offset indexing are
+ recognized by the patterns, but no attempts are made at finding an
+ optimal tiling.
+
+ 5.2. Register Allocation:
+
+ Register allocation is done via the algorithm described in "Iterated
+ Regster Coalescing" by Appel and George. As of the time of this
+ writing, the register allocator does not yet implement overlapping
+ register classes. This will be done as described in "A generalized
+ algorithm for graph-coloring register allocation" by Smith, Ramsey,
+ and Holloway.
+
+6: TUTORIAL: ADDING A STATEMENT:
+
+ 6.1. Stubbing in the node types:
+
+ 6.2. Parsing:
+
+ 6.3. Flattening:
+
+ 6.4. Optimization:
+
+ 6.5. Instruction Selection:
+
+[1] Aho, Sethi, Ullman: Compilers: Principles, Techniques, and Tools, 1988.
+ ISBN 0-201-10088-6
diff --git a/doc/lang.txt b/doc/lang.txt
new file mode 100644
index 0000000..008c95a
--- /dev/null
+++ b/doc/lang.txt
@@ -0,0 +1,832 @@
+ The Myrddin Programming Language
+ Jul 2012
+ Ori Bernstein
+
+TABLE OF CONTENTS:
+
+ 1. ABOUT
+ 2. LEXICAL CONVENTIONS
+ 3. SYNTAX
+ 3.1. Declarations
+ 3.2. Literal Values
+ 3.3. Control Constructs and Blocks
+ 3.4. Expressions
+ 3.5. Data Types
+ 3.6. Packages and Uses
+ 4. TOOLCHAIN
+ 5. EXAMPLES
+ 6. STYLE GUIDE
+ 7. STANDARD LIBRARY
+ 8. GRAMMAR
+ 9. FUTURE DIRECTIONS
+
+1. ABOUT:
+
+ Myrddin is designed to be a simple, low-level programming
+ language. It is designed to provide the programmer with
+ predictable behavior and a transparent compilation model,
+ while at the same time providing the benefits of strong
+ type checking, generics, type inference, and similar.
+ Myrddin is not a language designed to explore the forefront
+ of type theory or compiler technology. It is not a language
+ that is focused on guaranteeing perfect safety. Its focus
+ is on being a practical, small, fairly well defined, and
+ easy to understand language for work that needs to be close
+ to the hardware.
+
+ Myrddin is a computer language influenced strongly by C
+ and ML, with ideas from Rust, Go, C++, and numerous other
+ sources and resources.
+
+
+2. LEXICAL CONVENTIONS:
+
+ The language is composed of several classes of tokens. There
+ are comments, identifiers, keywords, punctuation, and whitespace.
+
+ Comments begin with "/*" and end with "*/". They may nest.
+
+ /* this is a comment /* with another inside */ */
+
+ Identifiers begin with any alphabetic character or underscore,
+ and continue with any number of alphanumeric characters or
+ underscores. Currently the compiler places a limit of 1024
+ bytes on the length of the identifier.
+
+ some_id_234__
+
+ Keywords are a special class of identifier that is reserved
+ by the language and given a special meaning. The set of
+ keywords in Myrddin are as follows:
+
+ castto match
+ const pkg
+ default protect
+ elif sizeof
+ else struct
+ export trait
+ extern true
+ false type
+ for union
+ generic use
+ goto var
+ if while
+
+
+ At the current stage of development, not all of these keywords are
+ implemented within the language.[1]
+
+ Literals are a direct representation of a data object within the source of
+ the program. There are several literals implemented within the language.
+ These are fully described in section 3.2 of this manual.
+
+ In the compiler, single semicolons (';') and newline (\x10)
+ characters are treated identically, and are therefore interchangable.
+ They will both be referred to "endline"s thoughout this manual.
+
+
+3. SYNTAX OVERVIEW:
+
+ 3.1. Declarations:
+
+ A declaration consists of a declaration class (i.e., one
+ of 'const', 'var', or 'generic'), followed by a declaration
+ name, optionally followed by a type and assignment. One thing
+ you may note is that unlike most other languages, there is no
+ special function declaration syntax. Instead, a function is
+ declared like any other value: by assigning its name to a
+ constant or variable.
+
+ const: Declares a constant value, which may not be
+ modified at run time. Constants must have
+ initializers defined.
+ var: Declares a variable value. This value may be
+ assigned to, copied from, and modified.
+ generic: Declares a specializable value. This value
+ has the same restricitions as a const, but
+ taking its address is not defined. The type
+ parameters for a generic must be explicitly
+ named in the declaration in order for their
+ substitution to be allowed.
+
+ In addition, there is one modifier allowed on declarations:
+ 'extern'. Extern declarations are used to declare symbols from
+ another module which cannot be provided via the 'use' mechanism.
+ Typical uses would be to expose a function written in assembly. They
+ can also be used as a workaround for external dependencies.
+
+ Examples:
+
+ Declare a constant with a value 123. The type is not defined,
+ and will be inferred:
+
+ const x = 123
+
+ Declare a variable with no value and no type defined. The
+ value can be assigned later (and must be assigned before use),
+ and the type will be inferred.
+
+ var y
+
+ Declare a generic with type '@a', and assigns it the value
+ 'blah'. Every place that 'z' is used, it will be specialized,
+ and the type parameter '@a' will be substituted.
+
+ generic z : @a = blah
+
+ Declare a function f with and without type inference. Both
+ forms are equivalent. 'f' takes two parameters, both of type
+ int, and returns their sum as an int
+
+ const f = {a, b
+ var c : int = 42
+ -> a + b + c
+ }
+
+ const f : (a : int, b : int -> int) = {a : int, b : int -> int
+ var c : int = 42
+ -> a + b + c
+ }
+
+ 3.2. Literal Values
+
+ Integers literals are a sequence of digits, beginning with a
+ digit and possibly separated by underscores. They are of a
+ generic type, and can be used where any numeric type is
+ expected. They may be prefixed with "0x" to indicate that the
+ following number is a hexadecimal value, or 0b to indicate a
+ binary value. Decimal values are not prefixed, and octal values
+ are not supported.
+
+ eg: 0x123_fff, 0b1111, 1234
+
+ Floating-point literals are also a sequence of digits beginning with
+ a digit and possibly separated by underscores. They are also of a
+ generic type, and may be used whenever a floating-point type is
+ expected. Floating point literals are always in decimal, and
+ as of this writing, exponential notation is not supported[2]
+
+ eg: 123.456
+
+ String literals represent a compact method of representing a byte
+ array. Any byte values are allowed in a string literal, and will be
+ spit out again by the compiler unmodified, with the exception of
+ escape sequences.
+
+ There are a number of escape sequences supported for both character
+ and string literals:
+ \n newline
+ \r carriage return
+ \t tab
+ \b backspace
+ \" double quote
+ \' single quote
+ \v vertical tab
+ \\ single slash
+ \0 nul character
+ \xDD single byte value, where DD are two hex digits.
+
+ String literals begin with a ", and continue to the next
+ unescaped ".
+
+ eg: "foo\"bar"
+
+ Character literals represent a single codepoint in the character
+ set. A character starts with a single quote, contains a single
+ codepoint worth of text, encoded either as an escape sequence
+ or in the input character set for the compiler (generally UTF8).
+ They share the same set of escape sequences as string literals.
+
+ eg: 'א', '\n', '\u1234'[3]
+
+ Boolean literals are either the keyword "true" or the keyword
+ "false".
+
+ eg: true, false
+
+ Funciton literals describe a function. They begin with a '{',
+ followed by a newline-terminated argument list, followed by a
+ body and closing '}'. They will be described in more detail
+ later in this manual.
+
+ eg: {a : int, b
+ -> a + b
+ }
+
+ Sequence literals describe either an array or a structure
+ literal. They begin with a '[', followed by an initializer
+ sequence and closing ']'. For array literals, the initializer
+ sequence is either an indexed initializer sequence[4], or an
+ unindexed initializer sequence. For struct literals, the
+ initializer sequence is always a named initializer sequence.
+
+ An unindexed initializer sequence is simply a comma separated
+ list of values. An indexed initializer sequence contains a
+ '#number=value' comma separated sequence, which indicates the
+ index of the array into which the value is inserted. A named
+ initializer sequence contains a comma separated list of
+ '.name=value' pairs.
+
+ eg: [1,2,3], [#2=3, #1=2, #0=1], [.a = 42, .b="str"]
+
+ A tuple literal is a parentheses separated list of values.
+ A single element tuple contains a trailing comma.
+
+ eg: (1,), (1,'b',"three")
+
+ Finally, while strictly not a literal, it's not a control
+ flow construct either. Labels are identifiers preceded by
+ colons.
+
+ eg: :my_label
+
+ They can be used as targets for gotos, as follows:
+
+ goto my_label
+
+ the ':' is not part of the label name.
+
+ 3.3. Control Constructs and Blocks:
+
+ if for
+ while match
+ goto
+
+ The control statements in Myrddin are similar to those in many other
+ popular languages, and with the exception of 'match', there should
+ be no surprises to a user of any of the Algol derived languages.
+ Where a truth value is required, any type with the builtin trait
+ 'tctest' can be used in all of these.
+
+ Blocks are the "carriers of code" in Myrddin programs. They consist
+ of series of expressions, typically ending with a ';;', although the
+ function-level block ends at the function's '}', and in if
+ statemments, an 'elif' may terminate a block. They can contain any
+ number of declarations, expressions, control constructs, and empty
+ lines. Every control statement example below will (and, in fact,
+ must) have a block attached to the control statement.
+
+ If statements branch one way or the other depending on the truth
+ value of their argument. The truth statement is separated from the
+ block body
+
+ if true
+ std.put("The program always get here")
+ elif elephant != mouse
+ std.put("...eh.")
+ else
+ std.put("The program never gets here")
+ ;;
+
+ For statements begin with an initializer, followed by a test
+ condition, followed by an increment action. For statements run the
+ initializer once before the loop is run, the test each on each
+ iteration through the loop before the body, and the increment on
+ each iteration after the body. If the loop is broken out of early
+ (for example, by a goto), the final increment will not be run. The
+ syntax is as follows:
+
+ for init; test; increment
+ blockbody()
+ ;;
+
+ While loops are equivalent to for loops with empty initializers
+ and increments. They run the test on every iteration of the loop,
+ and exit only if it returns false.
+
+ Match statements do pattern matching on values. They take as an
+ argument a value of type 't', and match it against a list of other
+ values of the same type. The patterns matched against can also contain
+ free names, which will be bound to the sub-value matched against. The
+ patterns are checked in order, and the first matching pattern has its
+ body executed, after which no other patterns will be matched. This
+ implies that if you have specific patterns mixed with by more general
+ ones, the specific patterns must come first.
+
+ Match patterns can be one of the following:
+
+ - Union patterns
+
+ These look like union constructors, only they define
+ a value to match against.
+
+ - Literal patterns
+
+ Any literal value can be matched against.
+
+ - Constant patterns
+
+ Any constant value can be matched against.
+
+ More types of pattern to match will be added over time.
+
+ Match statements consist of the keyord 'match', followed by
+ the expression to match against the patterns, followed by a
+ newline. The body of the match statement consists of a list
+ of pattern clauses. A patterned clause is a pattern, followed
+ by a ':', followed by a block body.
+
+ An example of the syntax follows:
+
+ const Val234 = `Val 234 /* set up a constant value */
+ var v = `Val 123 /* set up variable to match */
+ match v
+ /* pattern clauses */
+ `Val 123:
+ std.put("Matched literal union pat\n");;
+ Val234:
+ std.put("Matched const value pat\n")
+ ;;
+ `Val a:
+ std.put("Matched pattern with capture\n")
+ std.put("Captured value: a = %i\n", a)
+ ;;
+ a
+ std.put("A top level bind matches anything.");;
+ `Val 111
+ std.put("Unreachable block.")
+ ;;
+ ;;
+
+
+ 3.4. Expressions:
+
+ Myrddin expressions are relatively similar to expressions in C. The
+ operators are listed below in order of precedence, and a short
+ summary of what they do is listed given. For the sake of clarity,
+ 'x' will stand in for any expression composed entirely of
+ subexpressions with higher precedence than the current current
+ operator. 'e' will stand in for any expression. Unless marked
+ otherwise, expressions are left associative.
+
+ BUG: There are too many precedence levels.
+
+
+ Precedence 14: (*ok, not really operators)
+ (,,,) Tuple Construction
+ (e) Grouping
+ name Bare names
+ literal Values
+
+ Precedence 13:
+ x.name Member lookup
+ x++ Postincrement
+ x-- Postdecrement
+ x[e] Index
+ x[from,to] Slice
+
+ Precedence 12:
+ ++x Preincrement
+ --x Predecrement
+ *x Dereference
+ &x Address
+ !x Logical negation
+ ~x Bitwise negation
+ +x Positive (no operation)
+ -x Negate x
+
+ Precedence 11:
+ x << x Shift left
+ x >> x Shift right
+
+ Precedence 10:
+ x * x Multiply
+ x / x Divide
+ x % x Modulo
+
+ Precedence 9:
+ x + x Add
+ x - x Subtract
+
+ Precedence 8:
+ x & y Bitwise and
+
+ Precedence 7:
+ x | y Bitwise or
+ x ^ y Bitwise xor
+
+ Precedence 6:
+ `Name x Union construction
+
+ Precedence 5:
+ x castto(type) Cast expression
+
+ Precedence 4:
+ x == x Equality
+ x != x Inequality
+ x > x Greater than
+ x >= x Greater than or equal to
+ x < x Less than
+ x <= x Less than or equal to
+
+ Precedence 3:
+ x && x Logical and
+
+ Precedence 2:
+ x || x Logical or
+
+ Precedence 1:
+ x = x Assign Right assoc
+ x += x Fused add/assign Right assoc
+ x -= x Fused sub/assign Right assoc
+ x *= x Fused mul/assign Right assoc
+ x /= x Fused div/assign Right assoc
+ x %= x Fused mod/assign Right assoc
+ x |= x Fused or/assign Right assoc
+ x ^= x Fused xor/assign Right assoc
+ x &= x Fused and/assign Right assoc
+ x <<= x Fused shl/assign Right assoc
+ x >>= x Fused shr/assign Right assoc
+
+ Precedence 0:
+ -> x Return expression
+
+ All expressions on integers act on two's complement values which wrap
+ on overflow. Right shift expressions fill with the sign bit on
+ signed types, and fill with zeros on unsigned types.
+
+ 3.5. Data Types:
+
+ The language defines a number of built in primitive types. These
+ are not keywords, and in fact live in a separate namespace from
+ the variable names. Yes, this does mean that you could, if you want,
+ define a variable named 'int'.
+
+ There are no implicit conversions within the language. All types
+ must be explicitly cast if you want to convert, and the casts must
+ be of compatible types, as will be described later.
+
+ 3.5.1. Primitive types:
+
+ void
+ bool char
+ int8 uint8
+ int16 uint16
+ int32 uint32
+ int64 uint64
+ int uint
+ long ulong
+ float32 float64
+
+ These types are as you would expect. 'void' represents a
+ lack of type, although for the sake of genericity, you can
+ assign between void types, return values of void, and so on.
+ This allows generics to not have to somehow work around void
+ being a toxic type.
+
+ bool is a type that can only hold true and false. It can be
+ assigned, tested for equality, and used in the various boolean
+ operators.
+
+ char is a 32 bit integer type, and is guaranteed to be able
+ to hold exactly one codepoint. It can be assigned integer
+ literals, tested against, compared, and all the other usual
+ numeric types.
+
+ The various [u]intXX types hold, as expected, signed and
+ unsigned integers of the named sizes respectively.
+ Similarly, floats hold floating point types with the
+ indicated precision.
+
+ var x : int declare x as an int
+ var y : float32 declare y as a 32 bit float
+
+
+ 3.5.2. Composite types:
+
+ pointer
+ slice array
+
+ Pointers are, as expected, values that hold the address of
+ the pointed to value. They are declared by appending a '*'
+ to the type. Pointer arithmetic is not allowed. They are
+ declared by appending a '*' to the base type
+
+ Arrays are a group of N values, where N is part of the type.
+ Arrays of different sizes are incompatible. Arrays in
+ Myrddin, unlike many other languages, are passed by value.
+ They are declared by appending a '[SIZE]' to the base type.
+
+ Slices are similar to arrays in many contemporary languages.
+ They are reference types that store the length of their
+ contents. They are declared by appending a '[,]' to the base
+ type.
+
+ foo* type: pointer to foo
+ foo[123] type: array of 123 foo
+ foo[,] type: slice of foo
+
+ 3.5.3. Aggregate types:
+
+ tuple struct
+ union
+
+ Tuples are the traditional product type. They are declared
+ by putting the comma separated list of types within square
+ brackets.
+
+ Structs are aggregations of types with named members. They
+ are declared by putting the word 'struct' before a block of
+ declaration cores (ie, declarations without the storage type
+ specifier).
+
+ Unions are the traditional sum type. They consist of a tag
+ (a keyword prefixed with a '`' (backtick)) indicating their
+ current contents, and a type to hold. They are declared by
+ placing the keyword 'union' before a list of tag-type pairs.
+ They may also omit the type, in which case, the tag is
+ suficient to determine which option was selected.
+
+ [int, int, char] a tuple of 2 ints and a char
+
+ struct a struct containing an int named
+ a : int 'a', and a char named 'b'.
+ b : char
+ ;;
+
+ union a union containing one of
+ `Thing int int or char. The values are not
+ `Other float32 named, but they are tagged.
+ ;;
+
+
+ 3.5.4. Magic types:
+
+ tyvar typaram
+ tyname
+
+ A tyname is a named type, similar to a typedef in C, however
+ it genuinely creates a new type, and not an alias. There are
+ no implicit conversions, but a tyname will inherit all
+ constraints of its underlying type.
+
+ A typaram is a parametric type. It is used in generics as
+ a placeholder for a type that will be substituted in later.
+ It is an identifier prefixed with '@'. These are only valid
+ within generic contexts, and may not appear elsewhere.
+
+ A tyvar is an internal implementation detail that currently
+ leaks in error messages out during type inference, and is a
+ major cause of confusing error messages. It should not be in
+ this manual, except that the current incarnation of the
+ compiler will make you aware of it. It looks like '@$type',
+ and is a variable that holds an incompletely inferred type.
+
+ type mine = int creates a tyname named
+ 'mine', equivalent to int.
+
+
+ @foo creates a type parameter
+ named '@foo'.
+
+
+ 3.6.
+
+ The myrddin type system is a system similar to the Hindley Milner
+ system, however, types are not implicitly generalized. Instead, type
+ schemes (type parameters, in Myrddin lingo) must be explicitly provided
+ in the declarations. For purposes of brevity, instead of specifying type
+ rules for every operator, we group operators which behave identically
+ from the type system perspective into a small set of classes. and define
+ the constraints that they require.
+
+ Type inference in Myrddin operates as a bottom up tree walk,
+ applying the type equations for the operator to its arguments.
+ It begins by initializing all leaf nodes with the most specific
+ known type for them as follows:
+
+ 3.6.1 Types for leaf nodes:
+
+ Variable Type
+ ----------------------
+ var foo $t
+
+ A type variable is the most specific type for a declaration
+ or function without any specified type
+
+ var foo : t t
+
+ If a type is specified, that type is taken for the
+ declaration.
+
+ "asdf" byte[:]
+
+ String literals are byte arrays.
+
+
+ 'a' char
+
+ Char literals are of type 'char'
+
+ true bool
+ false bool
+
+ true/false are boolean literals
+
+ 123 $t::(tcint,tcnum,tctest)
+
+ Integer literals get a fresh type variable of type with
+ the constraints for int-like types.
+
+ 123.1 $t::(tcfloat,tcnum,tctest)
+
+ Float literals get a fresh type variable of type with
+ the constraints for float-like types.
+
+ {a,b:t; } ($a,t -> $b)
+
+ Function literals get the most specific type that can
+ be determined by their signature.
+
+
+ num-binop:
+
+ + - * / %
+ += -= *= /= %
+
+ Number binops require the constraint 'tcnum' for both the
+
+ num-unary:
+ - +
+ Number binops require the constraint 'tcnum'.
+
+ int-binop:
+ | & ^ << >>
+ |= &= ^= <<= >>
+ int-unary:
+ ~ ++ --
+
+ bool-binop:
+ || && == !=
+ < <= > >=
+
+
+ 3.7. Packages and Uses:
+
+ pkg use
+
+ There are two keywords for module system. 'use' is the simpler
+ of the two, and has two cases:
+
+ use syspkg
+ use "localfile"
+
+ The unquoted form searches all system include paths for 'syspkg'
+ and imports it into the namespace. By convention, the namespace
+ defined by 'syspkg' is 'syspkg', and is unique and unmerged. This
+ is not enforced, however. Typical usage of unquoted names is to
+ import a library that already exists.
+
+ The quoted form searches the local directory for "localpkg". By
+ convention, the package it imports does not match the name
+ "localpkg", but instead is used as partial of the definition of the
+ importer's package. This is a confusing description.
+
+ A typical use of a quoted import is to allow splitting one package
+ into multiple files. In order to support this behavior, if a package
+ is defined in the current file, and a use statements imports a
+ package with the same namespace, the two namespaces are merged.
+
+ The 'pkg' keyword allows you to define a (partial) package by
+ listing the symbols and types for export. For example,
+
+ pkg mypkg =
+ type mytype
+
+ const Myconst : int = 42
+ const myfunc : (v : int -> bool)
+ ;;
+
+ declares a package "mypkg", which defines three exports, "mytype",
+ "Myconst", and "myfunc". The definitions of the values may be
+ defined in the 'pkg' specification, but it is preferred to implement
+ them in the body of the code for readability. Scanning the export
+ list is desirable from a readability perspective.
+
+4. TOOLCHAIN:
+
+ The toolchain used is inspired by the Plan 9 toolchain in name. There
+ is currently one compiler for x64, called '6m'. This compiler outputs
+ standard elf .o files, and supports these options:
+
+ 6m [-h] [-o outfile] [-d[dbgopts]] inputs
+ -I path Add 'path' to use search path
+ -o Output to outfile
+
+5. EXAMPLES:
+
+ 5.1. Hello World:
+
+ use std
+ const main = {
+ std.put("Hello World!\n")
+ -> 0
+ }
+
+ TODO: DESCRIBE CONSTRUCTS.
+
+ 5.2. Conditions
+
+ use std
+ const intmax = {a, b
+ if a > b
+ -> a
+ else
+ -> b
+ ;;
+ }
+
+ const main = {
+ var x = 123
+ var y = 456
+ std.put("The max of %i, %i is %i\n", x, y, max(x, y))
+ }
+
+ TODO: DESCRIBE CONSTRUCTS.
+
+ 5.3. Looping
+
+ use std
+ const innerprod = {a, b
+ var i
+ var sum
+ for i = 0; i < a.len; i++
+ sum += a[i]*b[i]
+ ;;
+ }
+
+ const main = {
+ std.put("The inner product is %i\n", innerprod([1,2,3], [4,5,6]))
+ }
+
+ TODO: DESCRIBE CONSTRUCTS.
+
+6. STYLE GUIDE:
+
+ 6.1. Brevity:
+
+ Myrddin is a simple language which aims to strip away abstraction when
+ possible, and it is not well served by overly abstract or bulky code.
+ The code written should be a readable description of an algorithm,
+ aimed at conveying the essential operations in a linear and
+ straightforward fasion.
+
+ Write for humans, not machines. Write linearly, so that an algorithm
+ can be understood with minimal function-chasing.
+
+ 6.2. Naming:
+
+ Names should be brief and evocative. A good name serves as a reminder
+ to what the function does. For functions, a single verb is ideal. For
+ local variables, a single character might suffice. Compact notation
+ is simpler to read, typographically.
+
+ Variables names should describe the value contained, and function
+ names should describe the value returned.
+
+ Good: spawn(myfunc)
+ Bad: create_new_thread_starting_at_function(myfunc)
+
+ The identifiers used for constant values are put in Initialcase.
+ Functions and types are in singleword style, although underscores are
+ occasionally necessary to specify additional information within
+ functions, due to the lack of overloading.
+
+ Good:
+ type mytype = int
+ var myvar : mytype
+ const Myconst = 42
+ union
+ `Tagone int
+ ;;
+
+ Bad:
+ type MyType = int /* types are 'singleword' */
+ const my_func = {;...} /* function names should avoid _ */
+ const myconst /* constants start with Uppercase */
+ union
+ `sometag /* tags start with uppercase */
+ ;;
+
+ Acceptable:
+ const length_mm = {;...} /* '_' disambiguates returned values. */
+ cosnt length_cm = {;...}
+
+ 6.3. Collections:
+
+
+
+7. STANDARD LIBRARY:
+
+8. GRAMMAR:
+
+9. FUTURE DIRECTIONS:
+
+BUGS:
+
+[1] TODO: trait, default, protect,
+[2] TODO: exponential notation.
+[3] TODO: \uDDDD escape sequences not yet recognized
+[4] TODO: currently the only sequence literal implemented is the
+ unindexed one
+
diff --git a/doc/mc.1 b/doc/mc.1
new file mode 100644
index 0000000..471d013
--- /dev/null
+++ b/doc/mc.1
@@ -0,0 +1,84 @@
+.TH MC 1
+.SH NAME
+6m
+.SH SYNOPSIS
+.B 6m
+.I -[hioS]
+.I [file...]
+.br
+.SH DESCRIPTION
+.PP
+The ?m family of compilers compile Myrddin source into object files
+for the corresponding architecture. There is one compiler for each
+architecture supported, with a unique name. By default, if the input
+file is named
+.I filename.myr
+then the the object file that is generated will be named
+.I filename.o.
+If the filename does not end with the suffix
+.I .myr
+then the suffix
+.I .o
+will simply be appended to it.
+
+.PP
+The following architectures are currently supported:
+.TP
+6m
+x86-64
+
+.PP
+The compiler options are:
+
+.TP
+.B -d [flTri]
+Print debugging dumps. Additional options may be given to give more
+debugging information for specific intermediate states of the compilation.
+
+.TP
+.B -h
+Print a summary of the available options.
+
+.TP
+.B -I path
+Add 'path' to the search path for unquoted use statments. This option
+does not affect the search path for local usefiles, which are always
+searched relative to the compiler's current working directory. Without
+any options, the search path defaults to /usr/include/myr.
+
+.TP
+.B -o output-file
+Specify that the generated code should be placed in
+
+.TP
+.B -S
+Generate assembly code instead of an object file.
+
+.SH EXAMPLE
+.EX
+ 6m foo.myr
+ 6m bar.myr
+ ld -o foobar foo.o bar.o
+.EE
+
+.SH FILES
+The source code for this compiler is available from
+.B git://git.eigenstate.org/git/ori/mc.git
+
+.SH SEE ALSO
+.IR muse(1)
+.IR ld(1)
+.IR as(1)
+
+.SH BUGS
+.PP
+The language is not yet complete, and the compilers often omit useful
+constructs in spite of their desirability.
+.PP
+There are virtually no optimizations done, and the generated source is
+often very poorly performing.
+.PP
+The current calling convention is stack-based and not register-based, even
+on architectures where it should be register-based.
+.PP
+The calling convention is not compatible with C.
diff --git a/doc/muse.1 b/doc/muse.1
new file mode 100644
index 0000000..d56d072
--- /dev/null
+++ b/doc/muse.1
@@ -0,0 +1,95 @@
+.TH MUSE 1
+.SH NAME
+muse
+.SH SYNOPSIS
+.B muse
+.I -[hmidos]
+.I [file...]
+.br
+.SH DESCRIPTION
+.PP
+The 'muse' tool takes as input a Myrddin source file and generates
+a usefile from it. A usefile collects definitions exported from the
+package specifications in Myrddin source code, and makes them available
+for other programs to include with a 'use' statement.
+.PP
+It can also merge together a number of usefiles into one larger usefile
+including all of the exported symbols. If an output file name is not given,
+and we are not merging usefiles, then an input file named
+.I filename.myr
+will generate a usefile named
+.I filename.use
+\&.
+
+If the filename does not end with the suffix
+.I .myr
+then the suffix
+.I .o
+will simply be appended to it.
+
+.PP
+The output of muse is architecture-independent. However, the format of the
+generated file is not stable, and is not guaranteed to work across
+different compiler versions.
+
+.PP
+The muse options are:
+
+.TP
+.B -d [flTri]
+Print debugging dumps. Additional options may be given to give more
+debugging information for specific intermediate states of the compilation.
+
+.TP
+.B -h
+Print a summary of the available options.
+
+.TP
+.B -I path
+Add 'path' to the search path for unquoted use statments. This option
+does not affect the search path for local usefiles, which are always
+searched relative to the compiler's current working directory. Without
+any options, the search path defaults to /usr/include/myr.
+
+.TP
+.B -o output-file
+Specify that the generated usefile should be named
+.I output-file
+
+.TP
+.B -s
+Print a summary of the symbols exported from the usefile that is specified.
+
+.SH EXAMPLE
+.EX
+ muse foo.myr
+ muse -o bar.use bar-system-version.myr
+ muse -mo library foo.use bar.use
+.EE
+
+.SH FILES
+The source for muse is available from
+.B git://git.eigenstate.org/git/ori/mc.git
+and lives in the
+.I muse/
+directory within the source tree.
+
+.SH SEE ALSO
+.IR mc(1)
+.IR ld(1)
+.IR as(1)
+
+.SH BUGS
+.PP
+There is insufficient checking and validation done on usefiles.
+.PP
+The file format is in flux, and in current iterations, it is not at all compact.
+.PP
+There is no versioning on the usefiles as it stands. If you use the wrong
+version with the wrong compiler, a mysterious error or even segfault is
+likely.
+.PP
+This utility should not exist. Instead, the compiler should put the
+exported symbol information into the object file or library directly.
+.PP
+The file format is not closed under concatentation.
diff --git a/doc/myrbuild.1 b/doc/myrbuild.1
new file mode 100644
index 0000000..7eb2693
--- /dev/null
+++ b/doc/myrbuild.1
@@ -0,0 +1,66 @@
+.TH MUSE 1
+.SH NAME
+myrbuild
+.SH SYNOPSIS
+.B myrbuild
+.I -[hblI]
+.I [file...]
+.br
+.SH DESCRIPTION
+.PP
+The 'myrbuild' tool takes as input a list of Myrddin or assembly sources,
+and compiles them in the correct dependency order into either a library or
+an executable. If the source files are myrddin sources, then the appropriate
+usefiles will be created as well. It expects Myrddin source to be in '.myr'
+files.
+
+.PP
+Myrbuild will default to building for the current architecture.
+
+.PP
+The myrbuild options are:
+
+.TP
+.B -h
+Print a summary of the available options.
+
+.TP
+.B -b name
+Compile source into a binary named 'name'. If neither this option nor
+the '-l' option are given, myrbuild will create a binary called 'a.out'.
+
+.TP
+.B -l 'name'
+Compile source given into a library called 'lib<name>.a', and a matching
+usefile called 'name'. Only static libraries are currently supported.
+
+.TP
+.B -I path
+Add 'path' to the search path for unquoted use statments. This option
+does not affect the search path for local usefiles, which are always
+searched relative to the compiler's current working directory. Without
+any options, the search path defaults to /usr/include/myr.
+
+.SH EXAMPLE
+.EX
+ myrbuild -b foo foo.myr
+ myrbuild -l foo bar.myr baz.myr
+ muse -mo library foo.use bar.use
+.EE
+
+.SH FILES
+The source for muse is available from
+.B git://git.eigenstate.org/git/ori/mc.git
+and lives in the
+.I myrbuild/
+directory within the source tree.
+
+.SH SEE ALSO
+.IR mc(1)
+.IR muse(1)
+.IR ld(1)
+.IR as(1)
+
+.SH BUGS
+.PP
+None known.