Rapid Programming Language Prototypes with Ruby and Racc

hey rubyconf!

I'm Tom Lee (@tglee)

I work for New Relic

on Mobile & Linux Server Monitoring

We want your braaaains

http://www.newrelic.com/about/jobs

Rapid Programming Language Prototypes with Ruby and Racc

Before we start: racc. not rack.

Racc is Yacc/Bison for Ruby

We are talking about this ...


Rack is web server middleware

... not this.

overview

compilers 101

generalized compiler architecture

about racc

why not yacc/bison?

introducing sucklang®

the end

Compilers 101

What is a compiler?

Translates code in some source language to a target language

(target code is usually semantically equivalent to the source)

Usually via several intermediate steps

Compilers 101

Scanner

aka lexical analyzer or tokenizer

takes raw source code as input

outputs a stream of tokens

Compilers 101

Parser

takes tokens as input

outputs an intermediate representation

(e.g. an abstract syntax tree)

the parser "makes sense" of the token stream

parsers can be automatically generated from a grammar

programs that do this are called parser generators

Compilers 101

Context-Free Grammar

Or simply grammar

describes the complete syntax of a (programming) language

usually expressed in Extended Backus-Naur Form (EBNF)

variants of EBNF often used for parser generator DSLs

Compilers 101

Abstract Syntax Tree

aka AST

Rooted tree data structure

In-memory representation of the input program

"abstract" in that some detail is omitted

Contrast to a concrete syntax tree

Compilers 101

Code Generator

Input: intermediate representation from the parse

Output: code in the target language

(e.g. x86/x64 assembly, C, JVM bytecode, JavaScript...)

For an AST, we will traverse each node in the tree

for each node, emit the corresponding target code

Generalized Compiler Architecture

about racc

racc is a parser generator

http://github.com/tenderlove/racc

(by the one & only @tenderlove)

(by Minero Aoki)

write your grammar using an EBNF-y DSL

run racc my_grammar.racc

this generates Ruby code that can parse your "language"

why not zoidberg yacc/bison?

Compilers are fun, dammit.

Fighting with C/C++, Makefiles, etc.?

Not so much.

Let's make it easier to experiment with our crazy ideas!

introducing sucklang®

grammar

sucklang sucks by design

call ::= ID "(" ")"

(assume ID is /[a-zA-Z_][a-zA-Z_0-9]*/)

introducing sucklang®

example

ohai()

introducing sucklang®

implementing the scanner

racc tokens are two-element sequences

the first element is the token type

the second element is the token value

(e.g. [:ID, "ohai"])

Let's write a scanner!

introducing sucklang®

implementing the parser

now we need to write our racc grammar!

first, make sure you've got racc installed:

gem install racc

invoking racc (I use rbenv):

rbenv exec racc -o lib/sucklang/parser.rb \
                    lib/sucklang/parser.racc

hot tip: add this to a Rakefile :)

introducing sucklang®

building an AST

our parser doesn't output anything yet

let's build an AST!

introducing sucklang®

implementing the code generator

nearly there! the parser now constructs an AST

let's generate C code for kicks.

stuff I didn't cover

or: exercises for the reader/attendee/whatever

More complex grammars!

Symbol tables!

Code generators for different languages!

AST & peephole optimization!

... and all the hard fun stuff :)

the end

New Relic

(my god-like employer)

http://newrelic.com

BRAAAAAINS

http://newrelic.com/about/jobs

Tom Lee

(compiler nerd wannabe extraordinaire)

http://tomlee.co

http://github.com/thomaslee

@tglee