Writing a SQL parser in scheme

Functional programming languages are a neat tool to write parsers, especially when equipped with a pattern matching mechanism.

Here’s the beginning of our SQL parser in memdb:

(define parse_sql (lambda (s) (begin
                        
        (define identifier (lambda (s) (match s
                (regex "(?is)^`(.*)`(.*)" _ id rest) '(id rest)
                (regex "(?is)^([a-zA-Z_][a-zA-Z_0-9]*)(.*)" _ id rest) '(id rest)
                (error (concat "expected identifier, found " s))
        )))                             
                                        
        (match s                        
                (regex "(?s)^\\s*(?m:--.*?$)(.*)" _ rest) /* comment */ (parse_sql rest)
                (concat "\n" rest) (parse_sql rest)
                (regex "(?is)^\\s+(.*)" _ rest) (parse_sql rest)
                (regex "(?is)^CREATE(?:\\s|\\n)+TABLE(?:\\s|\\n)+(.*)" _ rest) (match (identifier rest) '(id rest) (concat "tablecreate " id ", decl: " rest) (error "expected identifier"))
                (error (concat "unknown SQL syntax: " s))
        )               
))) 

As you can see, you can nearly write EBNF syntax in scheme despite the exception that you are bound to left linearity.

I extended scheme with the match instruction which takes the following parameters:

(match value [pattern result] ... [default])

Where pattern can be one of:

  • "string" matches string equality
  • variable matches any value and stores the value into the variable
  • (concat "string" variable) splits away a fixed prefix
  • (concat variable "string" variable) splits away a fixed infix
  • (concat variable "string") splits away a fixed postfix
  • '(pattern pattern pattern ...) matches only a list with the same amount of items and matching patterns
  • (regex "pattern" variable variable variable) will parse the value as a regex and store all captured groups into variables

When a value matches the pattern, the result is evaluated. Matched variables can be used in the result expression. If no match has been found, default is evaluated.

This way, we can write parsers very intuitively, but also optimizers that traverse tree structure patterns become very easy to implement.

de_DEGerman

Durch die weitere Nutzung der Seite stimmst du der Verwendung von Cookies zu. Weitere Informationen

Die Cookie-Einstellungen auf dieser Website sind auf "Cookies zulassen" eingestellt, um das beste Surferlebnis zu ermöglichen. Wenn du diese Website ohne Änderung der Cookie-Einstellungen verwendest oder auf "Akzeptieren" klickst, erklärst du sich damit einverstanden.

Schließen