Functional programming languages are a neat tool to write parsers, especially when equipped with a pattern matching mechanism.
Here’s the beginning of our SQL parser in memdb:
(define parse_sql (lambda (s) (begin (define identifier (lambda (s) (match s (regex "(?is)^`(.*)`(.*)" _ id rest) '(id rest) (regex "(?is)^([a-zA-Z_][a-zA-Z_0-9]*)(.*)" _ id rest) '(id rest) (error (concat "expected identifier, found " s)) ))) (match s (regex "(?s)^\\s*(?m:--.*?$)(.*)" _ rest) /* comment */ (parse_sql rest) (concat "\n" rest) (parse_sql rest) (regex "(?is)^\\s+(.*)" _ rest) (parse_sql rest) (regex "(?is)^CREATE(?:\\s|\\n)+TABLE(?:\\s|\\n)+(.*)" _ rest) (match (identifier rest) '(id rest) (concat "tablecreate " id ", decl: " rest) (error "expected identifier")) (error (concat "unknown SQL syntax: " s)) ) )))
As you can see, you can nearly write EBNF syntax in scheme despite the exception that you are bound to left linearity.
I extended scheme with the
match instruction which takes the following parameters:
(match value [pattern result] ... [default])
pattern can be one of:
"string"matches string equality
variablematches any value and stores the value into the variable
(concat "string" variable)splits away a fixed prefix
(concat variable "string" variable)splits away a fixed infix
(concat variable "string")splits away a fixed postfix
'(pattern pattern pattern ...)matches only a list with the same amount of items and matching patterns
(regex "pattern" variable variable variable)will parse the value as a regex and store all captured groups into variables
When a value matches the pattern, the result is evaluated. Matched variables can be used in the
result expression. If no match has been found,
default is evaluated.
This way, we can write parsers very intuitively, but also optimizers that traverse tree structure patterns become very easy to implement.