.TH MYR-REGEX 3
.SH NAME
regex, myr-regex \- compile and evaluate regular expressions
.SH LIBRARY
regex
.SH SYNOPSIS
.B use regex
.br
.I const compile : (re : byte[:] -> std.error(regex#, status))
.br
.I const dbgcompile : (re : byte[:] -> std.error(regex#, status))
.br
.I const free : (re : regex# -> void)
.br
.I const exec : (re : regex#, str : byte[:] -> std.option(byte[:][:]))
.br
.I const search : (re : regex#, str : byte[:] -> std.option(byte[:][:]))
.SH DESCRIPTION
.PP
The regex library provides functions for compiling and evaluating regular
expressions, as described later in this document, or in myr-regex(7).
.PP
.I regex.compile
takes a string describing a regex and attempts to compile it, returning
.I `std.Success regex#
if the regex is valid and no errors were encountered during
compilation. If compilation fails,
.I `std.Failure regex.status
is returned, where regex.status is a failure code.

.PP
.I regex.dbgcompile
is identical to
.IR regex.compile ,
except that it prints debugging information as it compiles, and each
time the regex is evaluated.

.PP
.I regex.exec
evaluates the regex passed to it over the text provided, returning
.I `std.Some matches
if the regex matches, or
.I `std.None
if no match was found. A match must span the whole string.

.PP
.I regex.search
is similar to regex.exec, but it will attempt to find a match somewhere
within the string, instead of attempting to find a match spanning the whole
string.

.SH REGEX SYNTAX
.PP
The grammar used by libregex is below:

.EX
    regex       : altexpr
    altexpr     : catexpr ('|' altexpr)?
    catexpr     : repexpr (catexpr)?
    repexpr     : baseexpr ('*' | '*?' | '+' | '+?' | '?')?
    baseexpr    : literal
                | charclass
                | charrange
                | escaped
                | '.'
                | '^'
                | '$'
                | '(' regex ')'
    charclass   : see below
    charrange   : '[' (literal('-' literal)?)+']'
.EE

The following metacharacters have the meanings listed below:
.TP
\&.
Matches any single Unicode character.
.TP
^
Matches the beginning of a line. Does not consume any characters.
.TP
$
Matches the end of a line. Does not consume any characters.
.TP
*
Matches zero or more repetitions of the preceding regex fragment.
.TP
*?
Reluctantly matches zero or more repetitions of the preceding regex fragment.
.TP
+
Matches one or more repetitions of the preceding regex fragment.
.TP
+?
Reluctantly matches one or more repetitions of the preceding regex fragment.
.TP
?
Matches zero or one occurrence of the preceding regex fragment.

.PP
To match a metacharacter literally, precede it with a '\\' character.

The following character classes are supported:
.TP
\\d
ASCII digits
.TP
\\D
Negation of ASCII digits
.TP
\\x
ASCII Hex digits
.TP
\\X
Negation of ASCII Hex digits
.TP
\\s
ASCII spaces
.TP
\\S
Negation of ASCII spaces
.TP
\\w
ASCII word characters
.TP
\\W
Negation of ASCII word characters
.TP
\\h
ASCII whitespace characters
.TP
\\H
Negation of ASCII whitespace characters
.TP
\\pX, \\p{X}
Characters with Unicode property 'X'
.TP
\\PX, \\P{X}
Negation of characters with Unicode property 'X'

.PP
The supported Unicode properties are listed below:

.TP
L, Letter
Unicode letter property
.TP
Lu, Uppercase_Letter
Unicode uppercase letter property
.TP
Ll, Lowercase_Letter
Unicode lowercase letter property
.TP
Lt, Titlecase_Letter
Unicode titlecase letter property
.TP
N, Number
Unicode number property
.TP
Z, Separator
Unicode separator property (any separator character)
.TP
Zs, Space_Separator
Unicode space separator property
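.PP
As a rough illustration of how the pieces described in this section might
combine, a few sample patterns are sketched below. They are derived only from
the syntax documented here, are not taken from the libregex sources, and have
not been checked against the library.
.EX
    \\d+                one or more ASCII digits
    \\x\\x               exactly two ASCII hex digits
    colou?r            "color" or "colour"
    (foo|bar)+         one or more repetitions of "foo" or "bar"
    [a-z]+             one or more characters in the range a to z
    a\\.b               "a", a literal '.', then "b"
    <.*?>              '<', as few characters as possible, then '>'
    \\p{Lu}\\p{Ll}*      an uppercase letter, then zero or more lowercase letters
.EE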


.SH EXAMPLE
.EX
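        use std
        use regex

        const main = {
            var i
            /* placeholder inputs, chosen only for illustration */
            var pat = "ab*c"
            var text = "abbbc"

            match regex.compile(pat)
            | `std.Success re:
                    match regex.exec(re, text)
                    | `std.Some matches:
                            for i = 0; i < matches.len; i++
                                std.put("Match %i: %s\\n", i, matches[i])
                            ;;
                    | `std.None: std.put("Text did not match\\n")
                    ;;
                    /* release the compiled regex once it is no longer needed */
                    regex.free(re)
            | `std.Failure err:
                    std.put("failed to compile regex\\n")
            ;;
        }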
.EE

.SH FILES
The source code for this library is available from
.B git://git.eigenstate.org/git/ori/libregex.git

.SH SEE ALSO
.IR mc(1)

.SH BUGS
.PP
This code is insufficiently tested.

.PP
This code does not support all of the regex features that one would expect.