Functional programming languages are a neat tool to write parsers, especially when equipped with a pattern matching mechanism.
Here’s the beginning of our SQL parser in memdb:
(define parse_sql (lambda (s) (begin (define identifier (lambda (s) (match s (regex "(?is)^`(.*)`(.*)" _ id rest) '(id rest) (regex "(?is)^([a-zA-Z_][a-zA-Z_0-9]*)(.*)" _ id rest) '(id rest) (error (concat "expected identifier, found " s)) ))) (match s (regex "(?s)^\\s*(?m:--.*?$)(.*)" _ rest) /* comment */ (parse_sql rest) (concat "\n" rest) (parse_sql rest) (regex "(?is)^\\s+(.*)" _ rest) (parse_sql rest) (regex "(?is)^CREATE(?:\\s|\\n)+TABLE(?:\\s|\\n)+(.*)" _ rest) (match (identifier rest) '(id rest) (concat "tablecreate " id ", decl: " rest) (error "expected identifier")) (error (concat "unknown SQL syntax: " s)) ) )))
As you can see, you can nearly write EBNF syntax in scheme despite the exception that you are bound to left linearity.
I extended scheme with the match
instruction which takes the following parameters:
(match value [pattern result] ... [default])
Where pattern
can be one of:
"string"
matches string equalityvariable
matches any value and stores the value into the variable(concat "string" variable)
splits away a fixed prefix(concat variable "string" variable)
splits away a fixed infix(concat variable "string")
splits away a fixed postfix'(pattern pattern pattern ...)
matches only a list with the same amount of items and matching patterns(regex "pattern" variable variable variable)
will parse the value as a regex and store all captured groups into variables
When a value matches the pattern, the result is evaluated. Matched variables can be used in the result
expression. If no match has been found, default
is evaluated.
This way, we can write parsers very intuitively, but also optimizers that traverse tree structure patterns become very easy to implement.
Comments are closed