bcg Language

Packages
bcg  
bcg.ast  
bcg.parser  

 

The bcg language is a simple line-oriented language. The following sections
describe some of the language basics.

The grammar for the language (using pj2) is included in this documentation
as grammar.txt (compile and execute via web service).

All of the text in this overview is stored as examples in the
doc/examples/overview/ directory. Any example with “-bad” in the title is
expected not to compile. Any example with “-nyi” in the title shouldn’t
compile, but currently does because of missing compiler features.

Some additional examples are also provided.

Format

Text in monospaced font indicates pieces of code that are not executable on their own.

 
 Indented monospace text in a grey background is executable code.

 Some code will not compile.  These are commented out and labeled with
 # ERROR, and possibly a reason.
 

Basic Structure

A simple statement in bcg is one of:

A statement in bcg is one of:

Additionally, at the top level (not inside another block), a statement can
be a subroutine definition.

Statements in bcg are separated by either newlines or semicolons. The
following two code blocks are equivalent:

 
 var x
 x = 2
 print x
 
 
 var x; x = 2; print x
 

Comments begin with a number sign (#) and extend until the end of line.

 
 # Example comments
 x = x + 1 # Increment x
 

Types and Literals

bcg has three types: boolean, integer and string. They are referred to by
the names bool, int, and string.

Booleans are either true or false. bcg is case sensitive, so TRUE
and FALSE are not valid boolean values.

Integers are expressed in base 10 and can not begin with a 0:

100     valid
-100    valid because bcg understands unary negation
0644    invalid, can’t begin numbers with 0
0xff    invalid, bcg doesn’t accept C style hex constants

Strings are represented by text in double quotes, such as "this example".
bcg does not currently recognize any C-style escapes, but this may change.

Arrays can also be created from any of the above types. An array literal
is a comma separated list of expressions surrounded by curly braces. For
example:

 
 print { 1, 2, 3 }
 print { false, true }
 print { "foo", "bar", "baz" }
 

There is also a void type, which is an absence of type. Void is only
used as the type of a subroutine that does not return a
value. Void subroutines cannot be used as a function in an
expression.

Variable Declaration and Assignment

Variables are declared with a var statement which has the variable name
and possibly a type and/or initializer. Once declared, a variable can be
used in any expression where an expression of its
type is expected. All variables must be declared before they are used.

Variable names are any number of letters, digits or underscores but may
not begin with a digit. Variables can not have the same name as a
subroutine or another variable in the same scope. In addition,
the following names are reserved and may not be used as a variable name:

and, bool, break, const, continue, else, for, func,
if, in, int, not, or, print, read, return, string,
unless, until, var, void, while

The words break, const, continue, for and in have no meaning in
this language, but are reserved for future use.

The declaration may optionally include a type after a colon (:). The
type can be bool, int, or string. If the declaration has no type,
the variable has the same type as its initializer. A variable with no type
and no initializer is an integer (int). Unlike
subroutines, variables can not have a void type.

The declaration may optionally include some initial value for the variable
after an = operator. This value can be any valid
expression. It is an error to give an initializer with a
different type than the variable. Integer variables without an initializer
are given the value zero (0), boolean variables are given false, and string
variables are given the empty string.

 
 # print x # ERROR, x must be declared first
 var x # Is an integer and equal to 0
 var y : int  = 1
 var z = 24 * 60 * 60 # 1 day in seconds

 var b : bool
 var b2 = true # boolean
 # var b3 : bool = 5 # ERROR, initializer must match variable type

 var s = "string"
 var s2 : string
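
As a small sketch of the default values described above (the variable names
are only illustrative), uninitialized variables can be printed directly. The
empty-string default is taken from the Function Calls section, which lists
it as the value of an uninitialized string variable.

 var i           # no type and no initializer, so an int
 var b : bool
 var s : string
 print i         # prints 0
 print b         # prints false
 print s         # prints an empty line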
 

Variables can have new values assigned to them anywhere a statement is
expected. The syntax is similar to C and many other languages: a variable
name, an equal sign (=), and an expression. Multiple assignments can
not be combined as in C. It is an error if the expression and variable
have different types.

 
 var x
 var y
 var z : bool
 var s : string
 x = 1
 y = x + 1
 # x = y = 3 # ERROR
 # z = y + 1 # ERROR
 z = y == 2
 s = "value"
 

Array Variables

Variables can also be declared as arrays of any type. This is done by
adding square brackets to the type. A size must be provided either by a
non-negative integer literal between the brackets or an initializer for the
variable. If an initializer is provided, the variable array length is the
length of the given array. Both an explicit size and a literal array
initializer can be given, but the lengths must match. (The contents of the
initializer do not have to be literals.)

 
 var list : int[3]
 var names : string[] = { "Tom", "Dick", "Harry" }
 var options : bool[2] = { true, false }

 # var list2 : int[3] = list # ERROR: Can not give size and non-literal initializer
 var list2 : int[] = list # OK

 # var options2 : bool[3] = { false, true } # ERROR: sizes must match
 var options2 : bool[2] = { options[1], options[0] }
 

Array variables without initializers become arrays of the given size whose
contents are the same as an uninitialized variable of that type.

Array variables of the same type can be assigned to each other. This does
not check the size of the arrays; instead, the destination becomes a
reference to the original array.

 
 var list : int[3] = { 1, 2, 3 }
 var list2 : int[2]
 list2 = list
 print list2[2] == 3 # prints true, list2 now refers to list
 

As the above examples imply, individual elements can be extracted from
arrays by adding a number between square brackets after the variable name.
Any integer expression can be used, in fact. Arrays are indexed beginning
at zero and ending at one less than their size. An index value outside
that range is an error and will cause the program to immediately terminate.

The size of an array can be determined at runtime by omitting the number in
square brackets. The size of an array is always an integer.

 
 var list : int[3] = { 1, 2, 3 }
 print list[0] # prints 1
 print list[list[] - 1] # prints 3
 # print list[10] # ERROR: index out of bounds
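
Indexing and the size expression combine naturally with a loop. The
following is only a sketch (the names list, i and largest are illustrative)
that walks an array with a while loop, described later in this overview, and
finds its largest element:

 var list : int[] = { 4, 15, 8 }
 var i = 1
 var largest = list[0]
 while i < list[]
   if list[i] > largest
     largest = list[i]
   end
   i = i + 1
 end
 print largest # prints 15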
 

Scoping Rules

Variables may be accessed within the block that declared them and any
nested scope (such as an if block). Variables may not be
re-declared in the same scope. Variables inside a scope may have the same
name as variables outside it, hiding the outer variable. An initializer
hiding a variable in this way may refer to the outer variable, but any
later code will access the inner one.

 
 # x = 0 # ERROR, undeclared variable x
 var x = 1
 if x == 1
   print x       # prints 1
   var x = x + 1 # outer x
   var y = x     # inner x
   print y       # prints 2
 end
 print x # prints 1
 # var x = 2 # ERROR, x already declared in this scope
 # print y # ERROR, y not declared in this scope
 

Subroutines and Functions

Subroutines cannot be defined inside any other statement (i.e. a
subroutine, an if statement or a while loop). Subroutines have a name, a
type, and zero or more arguments. They are declared with the following
syntax:

 
 func <name> ( <argument> : <type>, ... ) : <type>
   <block>
 end
 
 

All subroutines must have names and have the same naming limitations as
variables. Because subroutines can only be defined at the
outermost scope, they can not be hidden by new subroutines. Attempting to
declare a subroutine with the same name as an already declared variable or
subroutine is an error.

There can be any number of arguments and they can have the same types as a
variable. In the block, these arguments act the same as
variables defined at the beginning of the block, meaning that they can be
used whenever an expression is expected and defining a variable with the
same name in the same scope is an error. (Variable names can be reused in
inner scopes as normal.)

Subroutine arguments can not have initializers, but can be declared without
a type. As per normal variable declarations, an argument with no type is
an integer (int) and does not require a colon (:) afterwards.

Array arguments must be declared without a size, as they will have the size
of whatever array is passed.

Multiple arguments are separated by commas, and no comma is required if
there is only a single argument. If a subroutine has no arguments, the
parentheses can also be omitted.

The return type of the subroutine comes after the name, arguments and a
colon. The return type can be the type of any variable
(bool, int, string, or an array) or void. A void type indicates
that the subroutine can only be used as a statement. A subroutine
with any other type can be used wherever an expression of its return type
is expected. A non-void subroutine is also called a
function.
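
To illustrate the declaration rules above, here is a sketch of a function
that takes an unsized array argument and returns an int. The names total,
values and data are made up for this example, and the call uses the
parenthesized function call syntax described in the Function Calls section.

 func total( values : int[] ) : int
   var i = 0
   var sum = 0
   while i < values[]
     sum = sum + values[i]
     i = i + 1
   end
   return sum
 end

 var data : int[] = { 1, 2, 3 }
 print total( data ) # prints 6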

Unlike a variable, a subroutine without a declared return type is of type
void. If the return type is omitted the colon must be as well. The most
minimal kind of subroutine has no return type or arguments:

 
 func hello_world
   print "Hello, world."
 end

 hello_world
 

All non-array values are passed into subroutines by value. Assigning to a
subroutine’s argument does not change the value of anything outside that
subroutine. (For those who know the details of the Java VM, strings are
technically passed by reference but because they are immutable, any changes
create new objects instead of altering the one passed in.)

Array arguments refer to the original array and altering the argument will
change the original array. However, assigning a new array to the argument
will break this connection (as it creates a new copy).
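
Pass by value for non-array arguments can be sketched as follows. The
subroutine name bump is illustrative only; the point is that assigning to
the argument n leaves the caller's variable x unchanged.

 func bump( n : int )
   n = n + 1
   print n
 end

 var x = 1
 bump x    # prints 2
 print x   # prints 1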

Built-Ins

bcg has three built-in subroutines: read, print, and return.

Read must be followed by a single, already declared, integer variable name.
It reads an integer from the standard input and stores it in the variable.

Print must be followed by an expression (of any type) and
will print the result on the standard output followed by a newline.

If given the input “1”, the following code will output “2\ntrue\n”:

 
 var input
 read input
 print input+1
 print input == 1
 

Inside a non-void subroutine, return must be followed by an
expression with the same type as the subroutine itself.
Outside a subroutine or inside one with a void type, return can only be
used with no expression. This will end the subroutine or program early.
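
Both forms of return can be sketched together. The names describe and sign
are illustrative; describe is a void subroutine that returns early with no
expression, while sign is an int function whose every return carries an int
expression.

 func describe( n : int )
   if n < 0
     print "negative"
     return
   end
   print "not negative"
 end

 func sign( n : int ) : int
   if n < 0
     return -1
   end
   if n > 0
     return 1
   end
   return 0
 end

 describe 3        # prints not negative
 print sign( 3 )   # prints 1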

Subroutine calls

Calling a subroutine works like using a built-in. If the subroutine takes
more than one argument, they are separated by commas. Note that unlike C
(and similar languages), there are no parentheses around the arguments. If
the subroutine has a non-void return type, its return value is ignored.
If you wish to use the return value, see the section on function calls.

 
 func hello( name : string )
   print "Hello, " + name + "!"
 end
 hello "world"

 func print_add( a, b ) : int
   print "  " + a
   print "+ " + b
   print "= " + (a + b)
   return a + b
 end
 print_add 2, 2
 # print_add( 2, 2 ) # ERROR: no parentheses around arguments.
 print_add (1+1), 1  # prints 2 + 1 = 3
 

Expressions

bcg has an expression language that includes mathematical operations,
string concatenation, comparisons, boolean logic, and function calls.

All expressions are typed, although the type system is extremely simple.
Mathematical operators take and return integers (+ also concatenates
strings). Comparisons take integers or strings and return booleans; arrays
can be tested for equality only. Logical operators take and return
booleans.

The table below is a summary of all operators, ordered from highest to
lowest precedence:

Operators          Associativity   Kind
- (unary)          None            Mathematics
^                  Right           Mathematics
* / %              Left            Mathematics
+ -                Left            Mathematics
< <= == != >= >    Chained         Comparisons
not                None            Boolean Logic
and                Left            Boolean Logic
or                 Left            Boolean Logic

Expressions are not valid as statements on their own, but many statements
involve expressions.
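
A few short cases, worked out from the precedence table above rather than
taken from the original examples, show how the rules apply:

 print 1 + 2 * 3     # prints 7
 print (1 + 2) * 3   # prints 9
 print 2 ^ 3 ^ 2     # prints 512, same as 2 ^ (3 ^ 2)
 print -2 ^ 2        # prints 4, same as (-2) ^ 2
 print not 1 < 2     # prints false, same as not (1 < 2)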

Mathematical Operations

bcg understands a variety of mathematical operations on any integer
expression: integer constants, variables
and other mathematical operations.

a + b    addition
a - b    subtraction
a * b    multiplication
a / b    division
a % b    modulus (a.k.a. remainder)
a ^ b    exponentiation (a.k.a. power)
-a       negation

These operators behave as in normal algebra, as described in the precedence
table above. For example -2 ^ 2 ^ 2 + 3 - 4 * 5 / 6 % 7 is
evaluated:

  1. -2 ^ (2 ^ 2) + 3 - 4 * 5 / 6 % 7 (^ starts from the right)
  2. ((-2) ^ 4) + 3 - 4 * 5 / 6 % 7 (unary - occurs before ^)
  3. (16 + 3) - 4 * 5 / 6 % 7
  4. 19 - (4 * 5) / 6 % 7 (*, / and % occur before -)
  5. 19 - (20 / 6) % 7
  6. 19 - (3 % 7) (3 is the integer result of 20 / 6)
  7. 19 - 3 (3 / 7 is 0 with 3 remaining)
  8. 16
 
 print -2 ^ 2 ^ 2 + 3 - 4 * 5 / 6 % 7
 print 16
 

The results of all mathematical operations are integers and can be used
anywhere an integer is expected, such as assignment and
comparisons.
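
Integer division and modulus are often used together. As an illustrative
sketch (the names n and sum are arbitrary, and the while loop is described
later in this overview), the following adds up the decimal digits of a
number:

 var n = 1234
 var sum = 0
 while n != 0
   sum = sum + n % 10   # % binds tighter than +
   n = n / 10           # integer division drops the last digit
 end
 print sum # prints 10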

String Concatenation

The + operator is also used for string concatenation. If either side of
it is a string expression, the other side is converted to a string. The
following prints “a1true\n”.

 
 print "a" + 1 + true
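
Because + groups from the left (see the operator table), the position of the
string matters. The lines below are a sketch reasoned from those rules, not
taken from the original examples:

 print "a" + 1 + 2     # prints a12
 print 1 + 2 + "a"     # prints 3a, since 1 + 2 is added first
 print "a" + (1 + 2)   # prints a3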
 

Comparisons

Any integer or string expression can be compared to another expression of
the same type using the following comparison operators:

a < b     is a less than b?
a <= b    is a less than or equal to b?
a == b    is a equal to b?
a != b    is a not equal to b?
a >= b    is a greater than or equal to b?
a > b     is a greater than b?

The result of any comparison is a boolean and can be used wherever a
boolean expression is expected. Integers sort numerically and strings sort
lexicographically. All of the following are true:

 
 print 1 < 2
 print "a" < "b"
 print "aa" < "ab"
 

Using booleans with any comparison operator, or comparing expressions of
differing types, is an error:

 
 # print true > false # ERROR
 # print 1 < "a"      # ERROR
 

Assignments can not be used where comparisons are expected, so
== (equality) can not be confused with = (assignment).

Arrays can be checked for equality with arrays of the same type. Arrays
are equal if their size is the same and their contents are equal. Using
any comparison other than == or != with an array is an error.

 
 var list : int[] = {1, 2, 3}
 print list == {1, 2, 3} # prints true
 print list != {1, 2}    # prints true
 # print list < {1, 2, 2} # ERROR: Arrays can only be tested for equality
 
 var list2 : bool[3]
 # print list != list2 # ERROR: Arrays are of different type
 

Boolean Logic

Boolean expressions can be combined using some basic boolean logic. bcg
understands and, not, and or as operators. These operators group as
usual, so A and not B or C is the same as (A and (not B)) or C.

These logical operators only operate on booleans (such as the results of
comparisons), so the previous example only works if A,
B, and C are replaced with boolean expressions like a < b.

The result of boolean logic is a boolean, so it can be used wherever a
boolean expression is expected.
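
As a brief sketch (the variable names are illustrative), boolean results can
be stored in variables and combined:

 var x = 7
 var in_range = 0 < x and x < 10
 var odd = x % 2 == 1
 print in_range and odd          # prints true
 print not in_range or not odd   # prints false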

Chained Comparisons

Comparisons can be chained together so that expressions can easily be
compared against multiple values. For example 0 < x < 10 is the same as
0 < x and x < 10. Any number and combination of comparisons can be
combined in that fashion and work as follows:

A ? B ?? C

becomes

A ? B and B ?? C

where A, B and C are expressions
and ? and ?? are any comparison operators

Note that this only compares pairs of values. a < b < c > d implies
nothing about the relationship between d and either a or b. It does
imply that a < c, but only due to the transitive property of
comparisons; the language does not perform that comparison.
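
The expansion rule can be checked with a few concrete values. This is only a
sketch using an arbitrary variable x:

 var x = 5
 print 0 < x < 10         # prints true
 print 0 < x and x < 10   # prints true, the expanded form
 print 0 < x < 3          # prints false
 print 1 < 3 > 2          # prints true; 1 and 2 are never compared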

Function Calls

Wherever an expression is expected, a function of the expected type can be
called instead. This means that the result of int functions can be used
in mathematics, bool functions in logic, and string functions in
concatenation. Subroutines of type void can not be used in an
expression.

The syntax for function calls is similar to subroutine calls as
statements, except that it requires parentheses around the arguments.
Functions with no arguments may have the parentheses omitted.

 
 func math : int
   return 1 + 1
 end

 func condition( val : int ) : bool
   return val < 3
 end

 func output( worked : bool, name : string ) : string
   if worked
     return "It worked, " + name
   else
     return "Sorry, but no"
   end
 end

 # Prints: Sorry, but no. (math is 2, so math < 2 is false)
 print output( math < 2 and condition( math() + 1 ), "User" ) + "."
 

The result of calling a function is the value passed to the return
built-in. Functions that never call the return built-in return the same
value as an uninitialized variable of their type (i.e. 0, false, or the
empty string).

 
 func b : bool
   print "b"
 end
 func i : int
   print "i"
 end
 func s : string
   print "s"
 end

 print b  # prints b, then false
 print i  # prints i, then 0
 print s  # prints s, then an empty line
 

If Blocks

bcg supports branching with an if/else construct. The following code
assigns the larger of x and y to the variable max (assuming those
variables have been appropriately declared).

 
 if x < y
   max = y
 else
   max = x
 end
 

The generic form of this is:

 
 if <boolean expression>
   <block1>
 else
   <block2>
 end
 

<block1> and <block2> are any number of statements and create their own
variable scopes separate from each other and their surroundings.
An if will execute the first block if its boolean expression
is true, and the second when it is false.

Just as multiple statements may appear on a single line, these blocks may
be written on the same line as the if, else, or end by separating
them with a semicolon (;). The previous example may be more compactly
written:

 
 if x < y; max = y; else; max = x; end
 

If the only statement after else is another if block, the else and
if may be on the same line without a semicolon:

 
 if x > y and x > z
   max = x
 else if x < y > z
   max = y
 else
   max = z
 end
 

If the else block is empty, it may be omitted:

 
 if x > 10
   x = x / 10
 end
 

Unless Blocks

bcg also has an unless construct analogous to the if. It works
identically except that it executes its first block if the boolean
expression is false and its second otherwise. This can help avoid using
not with long parenthesized expressions.

unless may have an else block and may be written else unless without
a semicolon, and if and unless may be used freely together:

 
 unless a == 0
   # Runs if not a == 0
 else if b == 0
   # Runs if a == 0 and b == 0
 else unless c == 0
   # Runs if a == 0 and not b == 0 and not c == 0
 end
 

Conditional Statements

When an if or unless contains a single simple statement, the condition
can be written as a suffix to the statement. For example, the following
two pieces of code are equivalent:

 
 if x > 10
   x = x / 10
 end
 unless y == 0
   x = x / y
 end
 
 
  x = x / 10 if x > 10
  x = x / y  unless y == 0
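
As a further sketch (the values are arbitrary), suffix conditions make simple
range clamping compact:

 var x = 15
 x = 10 if x > 10       # runs, because 15 > 10
 x = 0 unless x >= 0    # skipped, because 10 >= 0
 print x # prints 10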
 

While and Until Loops

Iteration is accomplished via while loops:

 
 var x = 10
 while x != 0
   x = x - 1
 end
 

Generically, while is written:

 
 while <boolean expression>
   <block>
 end
 

<block> is any number of statements and creates a new variable scope. A
while will execute its block repeatedly as long as its boolean expression
is true.

until is related to while just as unless is related to if. It
executes its block repeatedly as long as its condition is false. The
following example does the same thing as the previous one (notice that one
uses == and the other !=):

 
 var x = 10
 until x == 0
   x = x - 1
 end
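
A slightly larger sketch of an until loop is Euclid's algorithm for the
greatest common divisor. The variable names are illustrative, and t is
declared outside the loop only to keep the example simple.

 var a = 48
 var b = 18
 var t
 until b == 0
   t = a % b
   a = b
   b = t
 end
 print a # prints 6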
 



Written by Brian Gernhardt for Compiler Construction at RIT, Winter 2010