Information for MS Capstone Students
I am happy to supervise the MS capstone activities (projects or
theses) of students interested in pursuing a topic in programming
languages, especially those topics that are closely aligned with my
primary areas of interest. A student pursuing an MS capstone
activity under my supervision will almost certainly need to have
taken one or more of the following courses: 4005-710
(Programming Language Theory), 4005-711
(Compiler Construction), and 4005-714
(Programming Skills: Functional Programming).
If you are considering asking me to chair your MS capstone
activity, then first take the time to investigate one or more of the
suggested topics listed below. Next, schedule a meeting with me to
discuss a potential capstone activity topic in more detail. While I
am especially interested in pursuing the suggested topics listed
below, feel free to propose similar topics.
Be sure to understand the policies and procedures related to MS
capstone activities, as described in the Computer Science
Graduate Advising Handbook. Also, be aware that during the time
that I am supervising your MS capstone activity, I will expect to
meet with you on a weekly basis in order to review progress, discuss
questions and obstacles, and plan future work.
Primary Areas of Interest
- Functional Programming
- Compiler Technology (especially for functional programming languages)
- Type Systems
- Program Semantics
- Language Design and Implementation for Concurrency and Parallelism
Suggested Topics
MLton
MLton is an open-source,
whole-program, optimizing compiler for the Standard ML
programming language.
Begun in the summer of 1997, MLton has grown to over 100,000
lines of code and has become a viable (and, in some instances, the
only viable) option for the compilation and execution of significant
applications written in Standard ML. MLton is widely regarded
as one of the best compilers for any functional programming
language; it is actively used in both industry and academia. As a
mature infrastructure with numerous users, it provides ample
opportunity for students of all levels to investigate topics in
language design and implementation.
My RIT
Computer Science Community Teacha' Talk gives an overview of the
MLton project.
-
Explore Alternate Control-Flow Analyses for Defunctionalization
Defunctionalization is a program transformation that converts a
higher-order program into a first-order program.
Defunctionalization is a key tranformation in MLton's compilation
strategy. Control-flow analysis is a program analysis that
identifies the set of functions that could be invoked at each call
site. Control-flow analysis is used to guide the
defunctionalization transformation. Control-flow analysis has had
a recent renaissance, and this topic aims to investigate if
advances in control-flow analysis can lead to better
defunctionalization.
Background:
-
Explore Alternate Closure Representations
Status: in progress (Daniel Rosenwasser)
A closure is a data structure that represents a first-class
function at runtime. A closure is responsible for providing a
first-class function access to its free variables. MLton's
closure conversion algorithm uses a simple "flat" closure to
represent each function. This topic aims to explore the costs and
benefits of alternate closure representations.
Background:
-
Adapt and Implement Hoopl for MLton's SSA Intermediate Langauge
Status: in progress (Nathan Davis)
Hoopl is a library that makes it easy to implement program
tranformations based on dataflow analyses. The original Hoopl
library was implemented in Haskell for the Glasgow Haskell
Compiler. This topic aims to adapt and implement the library in
Standard ML for the MLton compiler, with a particular focus
on MLton's SSA intermediate language.
Background:
-
Implement Partial Redundancy Elimination
Partial redundancy elimination (PRE) is a program
transformation that removes operations that are redundant on some,
but not necessarily all paths, through the program. PRE can
subsume both common subexpression elimination and loop-invariant
code motion, and is therefore a potentially powerful optimization.
However, a naïve implmentation of PRE on a program in static
single assignment (SSA) form is unlikely to be effective. This
topic aims to adapt and implement the SSAPRE algorithm(s) of
Thomas VanDrunen in MLton's SSA intermediate langauge.
Background:
-
Investigate Equality Saturation
Status: abandoned (Brian Gernhardt)
Equality saturation is a novel approach for structuring the
optimization phase of a compiler. In this approach, rather than
optimizing a program via a sequence of destructive
transformations, analyses add equality information to the program
representation, resulting in a representation that encodes
multiple transformed programs from which a heuristic chooses the
final optimized program. This topic aims to adapt and implement
equality saturation in MLton's SSA intermediate language.
Background:
-
Investigate Supercompilation
Supercompilation is a powerful program transformation technique
that can be used to greatly improve the efficiency of a program;
it is not uncommon for a supercompiled program to be more than
twice as fast as its (non-super)compiled equivalent.
Supercompilation is closely related to partial evaluation, and can
be informally specified as "execute the program (as much as
possible) at compile time." This topic aims to investigate the
suitability of supercompilation in the context of the
Standard ML programming language and the MLton compiler.
Background:
-
Multi-entry and Multi-return Function Calls
Status: in progress (David Larsen)
Function calls and returns are shackled by the procedural
paradigm legacy, where each function has exactly one entry point
and exactly one return point. Shivers and Fisher investigated
"multi-return" function calls, whereby a function could be called
with multiple return points; they demonstrated the utility and
expressiveness of such an extension. Dually, one could imagine
"multi-entry" function calls, whereby a function could be called
with multiple entry points. Such "multi-entry" functions can be
motivated by the control-flow graph representation of functions,
where multiple entry nodes are allowed. A "multi-entry" function
could be used to improve the performance of mutually
tail-recursive functions or to perform call-pattern
specialization. This topic aims to implement multi-entry and
multi-return functions in MLton's SSA intermediate language and to
investigate optimization opportunities admitted by such
functions.
Background:
-
Heap-allocated Activation Records
Activation records (a.k.a., stack frames) are traditionally
allocated on a stack. This naturally corresponds to the
call-return pattern of function invocation. However, there are
some disadvantages to stack-allocated activation records. In a
functional programming language, functions may be deeply
recursive, resulting in call stacks that are much larger than
typically supported by the operating system; hence, a functional
programming language implementation will typically store its stack
in its heap. Furthermore, a functional programming language
implementation must handle and recover from stack overflow, by
allocating a larger stack (again, in its heap) and copying
activation records from the old stack to the new stack. In the
presence of threads, stacks must be allocated in a heap and, in
the presence of a garbage collector, should be garbage collected
when unreachable. While heap-allocated activation records avoid
many of these disadvantages, they have not been widely
implemented. This project aims to implement and evaluate
heap-allocated activation records in the MLton compiler.
Background:
-
Formal Verification of Safe-for-Space Transformations
Are MLton's optimization passes correct and 'good'? Any
compiler transformation should preserve a program's (observable)
input/output and termination behavior. A variety of techniques
can be used to guarantee this semantics preservation: formal
(machine-checked) proofs of compiler correctness, (per program)
translation validation, domain-specific languages for writing
optimizations. However, this semantics preservation does not
guarantee that the transformed program executes in less time
and/or in less space than the original program. The space usage
of a program is especially important in functional languages with
garbage collection, because semantics preserving program
transformations can both break sharing of data structures
(increasing space usage) and change the data retained by the
garbage collector (increasing space usage). A program
transformation that does not change the asymptotic space usage of
a program is called "safe-for-space". This topic aims to explore
extensions of the aforementioned techniques for guaranteeing
semantics preservation to guarantee that program transformations
are safe-for-space. The motivation for this topic was the recent
discovery of a space-safety bug in one of MLton's optimization
passes.
Background:
-
Type-Flow Analysis and Maximal Monomorphisation
How can MLton's compilation techniques be extended to advanced
language features? Currently, MLton's compilation strategy relies
upon a number of program transformations (monomorphisation,
defunctionalization) that appear to be at odds or incompatible
with advanced language features (polymorphic recursion,
first-class polymorphism, existential types) not present in the
Standard ML language. For example, it is known that a
program making use of first-class polymorphism cannot be
completely monomorphised. However, it is not known how to
monomorphise "as much as possible," whereby only those portions of
the program making essential use of first-class polymorphism are
excluded from the transformation. This topic aims to develop a
static analysis that determines the (potentially infinite, but
possibly finite) set of types that instantiate a type variable;
such an analysis may be seen as an extension of control-flow
analysis to types.
Background:
-
Representation Monomorphisation
How can MLton's compilation techniques be extended to advanced
language features? Currently, MLton's compilation strategy relies
upon a number of program transformations (monomorphisation,
defunctionalization) that appear to be at odds or incompatible
with advanced language features (polymorphic recursion,
first-class polymorphism, existential types) not present in the
Standard ML language. Other compilers for other functional
languages support these features by choosing a single
representation for all data. The penalty for choosing this
uniform representation is that primitive data types are "boxed".
This topic aims to investigate a representation monomorphism
transformation and supporting type system, whereby a program is
monomorphised not with respect to the source language types (a
potentially infinite set), but with respect to data
representations (a necessarily finite set).
-
Design and Implement a Polymorphic SSA Intermediate Language with Equality Coercions
Background:
-
Explore an LLVM Backend
Status: in progress (Brian Leibig) ;; abandoned (Gary Reichard)
Although MLton has native backends for x86 and amd64
architectures, these native backends are rather naïve. The
Low Level Virtual Machine (LLVM) project is a collection of
modular and resusable compiler and toolchain technologies. This
topic aims to explore an LLVM backend for the MLton compiler.
Background:
-
Design and Implement a Heap Profiler
A heap profile is a description of the space usage of a
program. A heap profile is concerned with the allocation,
retention, and deallocation (via garbage collection) of heap data
during the execution of a program. A heap profile can be used to
diagnose performance problems in a functional program that arise
from space leaks. This topic aims to design and implement a heap
profiler for MLton compiled programs.
Background:
-
Explore Garabge Collector Improvements
The garbage collector plays a significant role in the
performance of functional languages. Garbage collect too often,
and program performance suffers due to the excessive time spent in
the garbage collector. Garbage collect not often enough, and
program performance suffers due to the excessive space used by the
uncollected garbage. One particular issue is ensuring that a
program utilizing a garbage collector "plays nice" with other
processes on the system, by not using too much or too little
physical memory. While there are some reasonable theoretical
results about garbage collections with heaps of fixed size, there
seems to be insufficient work that really looks carefully at the
question of dynamically resizing the heap in response to the live
data demands of the application and, similarly, in response to the
behavior of the operating system and other processes. This topic
aims to investigate improvments to the memory behavior of MLton
compiled programs through better tuning of the garbage
collector.
Background:
- Isla Vista Heap Sizing: Using Feedback to Avoid Paging; Chris Grzegorczyk, Sunil Soman, Chandra Krintz, and Rich Wolski
- Controlling garbage collection and heap growth to reduce the execution time of Java applications; im Brecht, Eshrat Arjomandi, Chang Li, and Hang Pham
- Garbage collection without paging; Matthew Hertz, Yi Feng, and Emery D. Berger
- Automatic heap sizing: taking real memory into account; Ting Yang, Matthew Hertz, Emery D. Berger, Scott F. Kaplan, and J. Eliot B. Moss
-
Implement Successor ML Language Features
Any programming language, including Standard ML, can be
improved. The community has identified a number of modest
extensions and revisions to the Standard ML programming
language that would likely prove useful in practice. This topic
aims to implement these language features in the MLton
compiler.
Background:
-
Implement the ML Basis System in other Standard ML Compilers
The ML Basis system in an extension of Standard ML to
support programming-in-the-very-large, namespace management at the
module level, separate delivery of library sources, and
more. While Standard ML modules are a sophisticated language
for programming-in-the-large, it is difficult, if not impossible,
to accomplish a number of routine namespace management operations
when a program draws upon multiple libraries provided by different
vendors. The ML Basis system was initially designed and
implemented in the MLton compiler. This project aims to hasten
adoption of the system by implementing it in other
Standard ML compilers.
Background:
Transactional Events
Transactional
events are a novel concurrency abstraction that combines
first-class synchronous message-passing events with all-or-nothing
transactions. This combination enables simple solutions to
interesting problems in concurrent programming. The expressive
power of transactional events arises from a sequencing combinator
whose semantics enforces an all-or-nothing transactional property --
either both of the constituent events synchronize in sequence or
neither of them synchronizes.
-
Build a Medium- to Large-Scale Application using Transactional Events
Although transactional events have been proposed as a novel
concurrency abstraction that can solve interesting problems in
concurrent programming, the abstraction has not been "used in
anger" for a significant application. This topic aims to find a
non-trivial application that could make use of transactional
events in an interesting manner and report on how transactional
events improved the design or implementation of the
application.
Background:
|