6
Limbo
Overview
Inferno programs are written in Limbo or in Java. Both languages are compiled into Dis code which is interpreted or compiled just-in-time for execution.
Limbo combines concepts from C and Modula with threads (spawn) and channels for synchronous communication from Hoare's CSP. There are dynamic strings and vectors (array) with a slicing operation as well as lists. Characters are manipulated as string using Unicode; conversions from and to array of byte with UTF are possible. Abstract datatypes (adt) are structures with procedure components that can be called C++ style. Finally, there are tuples -- value lists much like anonymous structures -- frequently used as function results. Inferno 2.0 adds an Exception adt.
Limbo is not object oriented; there is no inheritance or dynamic binding. There are no pointers but some types have ref semantics for passing parameters and adt values can be manipulated by reference (ref). Dis manages memory dynamically, normally be reference counting and adt values marked cyclic with garbage collection, i.e., ref can be used to construct self referential, dynamic data structures.
init (nil: ref Context, argv: list of string) {
sys = load Sys Sys->PATH;
if (argv != nil) # hd argv is the command name
while ((argv = tl argv) != nil) {
arg := hd argv;
if (len argv > 1)
sys->print("%s ", arg);
else
sys->print("%s\n", arg);
}
}
{}
A program consists of modules which are dynamically instantiated (
load).
A module contains global constants (PATH), adt definitions (Context), global variables (sys) and procedures with or without a result (init).
A source file starts with
implement and selects the module interface to be implemented (Echo). The module interface contains the exported names; constants (PATH) and adt types (Context)
can be reached through module names, variables and functions (print) only through module references obtained with load (sys).
import is used to avoid explicit references with ->.
include replaces a declaration and includes a file that is searched for in various places. By convention the file contains a module interface with the name of the file, usually for import clauses. implement introduces a module interface that is completely imported; it's contents can then be used without references (init).
init() follows the convention for command line programs (see sh.b).
nil makes a parameter inaccessible.
list has
ref semantics; therefore, argv can have nil as a value.
hd produces the first element,
tl the rest, and
len the length of a list. When defining a variable,
:= can be used to obtain the type from the initializer (arg); basically, the type between : and = is omitted. As usual,
names are visible locally.
The following example demonstrates that modules can be instantiated more than once and their global variables can be accessed quite intentionally:
{06/mymod.m}
MyMod: module { # a module with a string
name: string;
};
{06/mymod.b}
implement MyMod; # to implement it...
include "mymod.m"; # ...the module interface is sufficient
{06/mods.b}
implement Mods;
include "sys.m"; sys: Sys;
include "draw.m";
include "mymod.m";
Mods: module {
init: fn (c: ref Draw->Context, argv: list of string);
};
init (nil: ref Draw->Context, argv: list of string) {
sys = load Sys Sys->PATH;
a := load MyMod "mymod.dis"; # create 2 instances of MyMod
b := load MyMod "mymod.dis";
if (a == nil || b == nil) # did it work?
sys->print("cannot load a or b: %r\n");
else { # deposit two names...
a->name = "a"; b->name = "b";
print(a); print(b); # ...and display by module reference
}
}
print (m: MyMod) {
sys->print("%s\n", m->name);
}
{}
Modules have
ref semantics. MyMod contains a string. The module is instantiated twice, then the two strings are initialized and output with a function that uses a module reference.
%r as a format element produces the last system error message.
Statements mostly follow C:
Statements, definitions, and declarations may appear in any order.
Conditions must use int;
missing conditions produce infinite loops.
break exits from loops, case and alt statements,
continue continues loops; labels can be used to specify the range.
spawn creates a new thread to execute the
function call term(...). Threads begin by sharing everything; they must be
synchronized using channels.
exit terminates it's thread and releases exclusive resources.
cnt, like
wc, counts lines, words, characters, bytes, and characters within a range in files. The implementation is designed to exhibit as many features of Limbo as possible.
Data structure
{06/cnt.b}
implement Cnt; # a baroque version of wc
include "sys.m"; sys: Sys; FD, OREAD: import Sys;
stdin, stderr: ref FD;
include "draw.m"; Context: import Draw;
include "string.m"; str: String;
Cnt: module {
init: fn (ctxt: ref Context, argv: list of string);
};
usage () {
sys->fprint(stderr, "Usage: cnt [-lwcb] [-C class] file...\n");
exit;
}
Count: adt {
count: big;
report: fn (c: self Count);
};
lines, words, chars, bytes, spec, all: con iota;
class: string; # class for spec
{}
The String module contains functions to deal with strings and characters. In particular, it can handle character ranges such as a-z or ^0-9.
Count is an
adt that the Cnt module does not export. The component count is used for counting. Limbo has the
integer types byte (8 bits, positive), int (32 bits) and big (64 bits) as well as real for floating point calculations.
report() is a method; because of
self it is called without an argument for a Count object that is then referred to by the parameter c in the procedure.
adt objects have
value semantics; ref can be used to create
references and pass them.
con defines constants. iota takes the values int 0, 1... for the names in turn. Here, these constants specify in which order the counters will later be output if Count objects exist.
any := 0;
total := array[all] of ref Count;
if (argv != nil)
while ((argv = tl argv) != nil) { # first word is "cnt"
arg := hd argv;
if (arg[0] != '-' || arg == "-")
break; # -: stdin
if (arg == "--") { # --: end of options
argv = tl argv;
break;
}
opt: for (i := 1; i < len arg; i ++)
case arg[i] {
'b' => # -b: bytes
any++; total[bytes] = ref Count(big 0);
'c' => # -c: chars
any++; total[chars] = ref Count(big 0);
'C' => # -C class
class = arg[i+1:]; # rest of arg
if (class == nil) {
argv = tl argv; # next arg
if (argv == nil)
usage();
class = hd argv;
}
any++; total[spec] = ref Count(big 0);
break opt;
'l' => # -l: lines
any++; total[lines] = ref Count(big 0);
'w' => # -w: words
any++; total[words] = ref Count(big 0);
* =>
usage();
}
}
{}
Limbo uses
list and
array for aggregates. Both include the element type but not the number of elements as part of the aggregate type. Both have
ref semantics. Elements may be aggregates.
len produces the number of elements in a list, in a vector, or in a string.
To create a
vector, the number of elements or the elements themselves must be specified.
| x := array[10] of int; | 10 (local) undefined elements | |
| y := array of { 1, 2, 3 }; | 3 defined elements | |
| z := array[7] of { 1, 2, 5 => 5, * => 0 }; | here, four elements are 0 | |
| w := list of { 1.5, 2.8, 3.4 }; |
Global definitions are initialized with zero, objects with ref semantics are always initialized with nil, the rest remains uninitialized.
To initialize a vector, positions can be specified; * initializes all others in undefined order.
Integer constants are specified in decimal or with an explicit base as in 16rabc; if the value is large enough, it is big. Additionally, unicode characters and escape sequences such as \n or \uabcd in single quotes may be used as int constants.
string constants use double quotes.
A string element is a unicode character as an int value. Strings may be
compared.
There are
arithmetic conversions such as big 10, real 3,
string 10,
real "1.23". Additionally, an array of byte
can be converted to string
and back, see below. Finally, a
tuple, i.e., a sequence of values in parentheses, can be converted into an adt or ref adt #; this creates a new objects with it's value componentes filled with the tuple elements one after another. Therefore, if -b is specified to cnt, total[bytes] refers to a Count object initialized with big 0.
cnt has a typical command line with switches like -b and values for options like -C class. As usual, one or two minus signs terminate the options; one additionally refers to standard input. Switches can be specified singly or together. The value of an option is the rest of an argument or the next argument; 'C' demonstrates how this is decoded.
Vectors and strings may be
sliced by specifying the first index and the one past the end; if the latter is missing a slice like arg[i+1:] extends to the end. A vector slice has ref semantics while a string slice is a copy. One can assign an int value to the
end of a string to increase it's length.
Elements in a vector slice can be replaced.
Empty strings "" are equivalent to nil.
if (argv == nil)
cnt(total, "-");
else {
nfile := len argv;
do
cnt(total, hd argv);
while ((argv = tl argv) != nil);
if (nfile > 1)
report("total", total);
}
}
{}
With no options given, most counters are shown; therefore, Count objects are created for most elements in total.
If a range of characters is to be counted the String module must be loaded. If this fails an error message is sent with
fprint() to diagnostic output and the program is terminated.
If there are no arguments left, cnt() should count standard input. Otherwise, a file name is specified to be counted. If more than one file is specified on the command line report() must produce a sum.
Functions like cnt() or report() can be used before they are defined.
if (fnm == "-")
fd = stdin;
else
fd = sys->open(fnm, OREAD);
if (fd == nil) {
sys->fprint(stderr, "cnt: cannot open %s: %r\n", fnm);
return;
}
n := array[all] of ref Count;
for (i := 0; i < all; ++ i)
if (total[i] != nil) n[i] = ref Count(big 0);
unicode := total[chars] != nil || total[spec] != nil
|| total[words] != nil;
content := unicode || total[lines] != nil;
buf := array[1024] of byte;
while ((nb := sys->read(fd, buf, len buf)) > 0) {
if (n[bytes] != nil) n[bytes].count += big nb;
if (content) {
for ((beg, i) := (0, 0); i < nb; ++ i)
if (int buf[i] == '\n') {
if (n[lines] != nil) ++ n[lines].count;
if (unicode) {
wc(n, buf[beg:i+1]);
beg = i+1;
}
}
if (unicode) wc(n, buf[beg:nb]); # rest of buf, if any
}
}
if (nb < 0)
sys->fprint(stderr, "cnt: read error %s: %r\n", fnm);
if (unicode) wc(n, nil); # rest of file, if any
report(fnm, n);
for (i = 0; i < all; ++ i)
if (total[i] != nil) total[i].count += n[i].count;
}
{}
Each file is counted in a copy of total[] containing Count objects initialized with big 0. If one wants to count characters or words one has to deal with unicode, see below. Another gain in efficiency is possible if the number of lines need not be counted.
Following these preparations,
read() produces a buffer with bytes and the length is counted if required. If the contents need to be analyzed the buffer is split at line separators and the lines are counted if required. buf[] contains byte values which have to be
converted before they can be compared with int values.
The
for loop illustrates a
definition and initialization with a tuple value.
++ and -- exist not only as
postfix operators: as in C, they deliver as prefixes the new and as postfixes the previous value.
Taking care of unicode is not quite trivial. Therefore, a function
wc() is used which receives the Count vector n[] and a
buffer slice. In principle, wc() is called once per line, but at the end of the buffer and at the end of a file some bytes may be left over. Therefore, beg tracks the beginning of a line and the buffer slice for wc() can be empty or nil.
At the end, file name and counters are output with report() and the sums are added up in total[].
Starting with an adt object,
ref produces a reference to a copy.
* is used to get from a reference to an object, but there, too, only a copy is reached.
It would have been better to define Count.report() so that a reference suffices as an argument, but here it serves as an example for passing an adt value.
{06/cnt.b}
Count.report (c: self Count) {
sys->print("%7bd", c.count);
}
{}
report() is a global function in the module, Count.report() is a method for Count. The method has no particular privileges.
To output
big values the format element must include b.
Files contain bytes but Limbo processes unicode characters with string. By convention, a file should contain UTF where ASCII represents itself and all bytes with value >= 16r80 represent other unicode characters using two or three bytes.
Given a byte vector with one line it is relatively simple to analyze characters:
{06/cnt.b}
line: array of byte;
wc (c: array of ref Count, buf: array of byte) {
if (buf != nil && len buf > 0) {
l := 0;
if (line != nil)
l = len line;
new := array[l + len buf] of byte; # line + buf
new[0:] = line; # copy line
new[l:] = buf; # append buf
line = new; # yields new line
if (int line[len line-1] != '\n') # ...complete?
return;
}
if (line != nil) {
s := string line; l := len s;
if (c[chars] != nil) c[chars].count += big l;
if (c[words] != nil) {
(n, nil) := sys->tokenize(s, " \t\n");
c[words].count += big n;
}
if (class != nil)
for (n := 0; n < l; ++ n)
c[spec].count += big str->in(s[n], class);
line = nil;
}
}
{}
The line separator \n is an ASCII character, i.e., it represents itself in a byte and can be used to recognize lines as complete UTF sequences. If line[] does not contain a complete line wc() waits for it's next activation. line[] is global and thus persists between calls.
At the latest at end of file wc() is called with nil and an incomplete line does get processed.
When
converting between array of byte and string
UTF is converted into unicode and back. However, the result is undefined if the bytes do not follow UTF conventions. The official implementation of
wc uses a finite automaton rather than string to discover defective sequences. An exact solution could also be based on
sys->byte2char().
len produces the length of a string, i.e., the number of unicode characters. Words are either counted with a finite automaton that reacts appropriately for white space or one uses
sys->tokenize() to split at particular characters. If
nil is used in the target tuple of an assignment the corresponding value is ignored. Unicode contains more white space than mentioned here.