Information for MS Capstone Students

I am happy to supervise the MS capstone activities (projects or theses) of students interested in pursuing a topic in programming languages, especially those topics that are closely aligned with my primary areas of interest. A student pursuing an MS capstone activity under my supervision will almost certainly need to have taken one or more of the following courses: CSCI-740 (Programming Language Theory), CSCI-742 (Compiler Construction), and CSCI-641 (Advanced Programming Skills: Functional Programming).

If you are considering asking me to chair your MS capstone activity, then first take the time to investigate one or more of the suggested topics listed below. Next, schedule a meeting with me to discuss a potential capstone activity topic in more detail. While I am especially interested in pursuing the suggested topics listed below, feel free to propose similar topics.

Be sure to understand the policies and procedures related to MS capstone activities, as described in the Computer Science Graduate Advising Handbook. Also, be aware that during the time that I am supervising your MS capstone activity, I will expect to meet with you on a weekly basis in order to review progress, discuss questions and obstacles, and plan future work.

Primary Areas of Interest

  • Functional Programming
  • Compiler Technology (especially for functional programming languages)
  • Type Systems
  • Program Semantics
  • Language Design and Implementation for Concurrency and Parallelism

Suggested Topics


MLton is an open-source, whole-program, optimizing compiler for the Standard ML programming language.

Begun in the summer of 1997, MLton has grown to over 100,000 lines of code and has become a viable (and, in some instances, the only viable) option for the compilation and execution of significant applications written in Standard ML. MLton is widely regarded as one of the best compilers for any functional programming language; it is actively used in both industry and academia. As a mature infrastructure with numerous users, it provides ample opportunity for students of all levels to investigate topics in language design and implementation.

My RIT Computer Science Community Teacha' Talk gives an overview of the MLton project.

  • Explore Alternate Control-Flow Analyses for Defunctionalization

    Defunctionalization is a program transformation that converts a higher-order program into a first-order program. Defunctionalization is a key tranformation in MLton's compilation strategy. Control-flow analysis is a program analysis that identifies the set of functions that could be invoked at each call site. Control-flow analysis is used to guide the defunctionalization transformation. Control-flow analysis has had a recent renaissance, and this topic aims to investigate if advances in control-flow analysis can lead to better defunctionalization.

  • Explore Alternate Closure Representations

    A closure is a data structure that represents a first-class function at runtime. A closure is responsible for providing a first-class function access to its free variables. MLton's closure conversion algorithm uses a simple "flat" closure to represent each function. This topic aims to explore the costs and benefits of alternate closure representations.

  • Adapt and Implement Hoopl for MLton's SSA Intermediate Langauge

    Hoopl is a library that makes it easy to implement program tranformations based on dataflow analyses. The original Hoopl library was implemented in Haskell for the Glasgow Haskell Compiler. This topic aims to adapt and implement the library in Standard ML for the MLton compiler, with a particular focus on MLton's SSA intermediate language.

  • Implement Partial Redundancy Elimination

    Partial redundancy elimination (PRE) is a program transformation that removes operations that are redundant on some, but not necessarily all paths, through the program. PRE can subsume both common subexpression elimination and loop-invariant code motion, and is therefore a potentially powerful optimization. However, a naïve implmentation of PRE on a program in static single assignment (SSA) form is unlikely to be effective. This topic aims to adapt and implement the SSAPRE algorithm(s) of Thomas VanDrunen in MLton's SSA intermediate langauge.

  • Investigate Equality Saturation

    Equality saturation is a novel approach for structuring the optimization phase of a compiler. In this approach, rather than optimizing a program via a sequence of destructive transformations, analyses add equality information to the program representation, resulting in a representation that encodes multiple transformed programs from which a heuristic chooses the final optimized program. This topic aims to adapt and implement equality saturation in MLton's SSA intermediate language.

  • Investigate Supercompilation

    Supercompilation is a powerful program transformation technique that can be used to greatly improve the efficiency of a program; it is not uncommon for a supercompiled program to be more than twice as fast as its (non-super)compiled equivalent. Supercompilation is closely related to partial evaluation, and can be informally specified as "execute the program (as much as possible) at compile time." This topic aims to investigate the suitability of supercompilation in the context of the Standard ML programming language and the MLton compiler.

  • Multi-entry and Multi-return Function Calls

    Status: in progress (David Larsen)

    Function calls and returns are shackled by the procedural paradigm legacy, where each function has exactly one entry point and exactly one return point. Shivers and Fisher investigated "multi-return" function calls, whereby a function could be called with multiple return points; they demonstrated the utility and expressiveness of such an extension. Dually, one could imagine "multi-entry" function calls, whereby a function could be called with multiple entry points. Such "multi-entry" functions can be motivated by the control-flow graph representation of functions, where multiple entry nodes are allowed. A "multi-entry" function could be used to improve the performance of mutually tail-recursive functions or to perform call-pattern specialization. This topic aims to implement multi-entry and multi-return functions in MLton's SSA intermediate language and to investigate optimization opportunities admitted by such functions.

  • Heap-allocated Activation Records

    Activation records (a.k.a., stack frames) are traditionally allocated on a stack. This naturally corresponds to the call-return pattern of function invocation. However, there are some disadvantages to stack-allocated activation records. In a functional programming language, functions may be deeply recursive, resulting in call stacks that are much larger than typically supported by the operating system; hence, a functional programming language implementation will typically store its stack in its heap. Furthermore, a functional programming language implementation must handle and recover from stack overflow, by allocating a larger stack (again, in its heap) and copying activation records from the old stack to the new stack. In the presence of threads, stacks must be allocated in a heap and, in the presence of a garbage collector, should be garbage collected when unreachable. While heap-allocated activation records avoid many of these disadvantages, they have not been widely implemented. This project aims to implement and evaluate heap-allocated activation records in the MLton compiler.

  • Formal Verification of Safe-for-Space Transformations

    Are MLton's optimization passes correct and 'good'? Any compiler transformation should preserve a program's (observable) input/output and termination behavior. A variety of techniques can be used to guarantee this semantics preservation: formal (machine-checked) proofs of compiler correctness, (per program) translation validation, domain-specific languages for writing optimizations. However, this semantics preservation does not guarantee that the transformed program executes in less time and/or in less space than the original program. The space usage of a program is especially important in functional languages with garbage collection, because semantics preserving program transformations can both break sharing of data structures (increasing space usage) and change the data retained by the garbage collector (increasing space usage). A program transformation that does not change the asymptotic space usage of a program is called "safe-for-space". This topic aims to explore extensions of the aforementioned techniques for guaranteeing semantics preservation to guarantee that program transformations are safe-for-space. The motivation for this topic was the recent discovery of a space-safety bug in one of MLton's optimization passes.

  • Type-Flow Analysis and Maximal Monomorphisation

    How can MLton's compilation techniques be extended to advanced language features? Currently, MLton's compilation strategy relies upon a number of program transformations (monomorphisation, defunctionalization) that appear to be at odds or incompatible with advanced language features (polymorphic recursion, first-class polymorphism, existential types) not present in the Standard ML language. For example, it is known that a program making use of first-class polymorphism cannot be completely monomorphised. However, it is not known how to monomorphise "as much as possible," whereby only those portions of the program making essential use of first-class polymorphism are excluded from the transformation. This topic aims to develop a static analysis that determines the (potentially infinite, but possibly finite) set of types that instantiate a type variable; such an analysis may be seen as an extension of control-flow analysis to types.

  • Representation Monomorphisation

    How can MLton's compilation techniques be extended to advanced language features? Currently, MLton's compilation strategy relies upon a number of program transformations (monomorphisation, defunctionalization) that appear to be at odds or incompatible with advanced language features (polymorphic recursion, first-class polymorphism, existential types) not present in the Standard ML language. Other compilers for other functional languages support these features by choosing a single representation for all data. The penalty for choosing this uniform representation is that primitive data types are "boxed". This topic aims to investigate a representation monomorphism transformation and supporting type system, whereby a program is monomorphised not with respect to the source language types (a potentially infinite set), but with respect to data representations (a necessarily finite set).

  • Design and Implement a Polymorphic SSA Intermediate Language with Equality Coercions

  • Design and Implement a Heap Profiler

    A heap profile is a description of the space usage of a program. A heap profile is concerned with the allocation, retention, and deallocation (via garbage collection) of heap data during the execution of a program. A heap profile can be used to diagnose performance problems in a functional program that arise from space leaks. This topic aims to design and implement a heap profiler for MLton compiled programs.

  • Explore Garabge Collector Improvements

    The garbage collector plays a significant role in the performance of functional languages. Garbage collect too often, and program performance suffers due to the excessive time spent in the garbage collector. Garbage collect not often enough, and program performance suffers due to the excessive space used by the uncollected garbage. One particular issue is ensuring that a program utilizing a garbage collector "plays nice" with other processes on the system, by not using too much or too little physical memory. While there are some reasonable theoretical results about garbage collections with heaps of fixed size, there seems to be insufficient work that really looks carefully at the question of dynamically resizing the heap in response to the live data demands of the application and, similarly, in response to the behavior of the operating system and other processes. This topic aims to investigate improvments to the memory behavior of MLton compiled programs through better tuning of the garbage collector.

  • Implement Successor ML Language Features

    Any programming language, including Standard ML, can be improved. The community has identified a number of modest extensions and revisions to the Standard ML programming language that would likely prove useful in practice. This topic aims to implement these language features in the MLton compiler.

  • Implement the ML Basis System in other Standard ML Compilers

    The ML Basis system in an extension of Standard ML to support programming-in-the-very-large, namespace management at the module level, separate delivery of library sources, and more. While Standard ML modules are a sophisticated language for programming-in-the-large, it is difficult, if not impossible, to accomplish a number of routine namespace management operations when a program draws upon multiple libraries provided by different vendors. The ML Basis system was initially designed and implemented in the MLton compiler. This project aims to hasten adoption of the system by implementing it in other Standard ML compilers.


Transactional Events

Transactional events are a novel concurrency abstraction that combines first-class synchronous message-passing events with all-or-nothing transactions. This combination enables simple solutions to interesting problems in concurrent programming. The expressive power of transactional events arises from a sequencing combinator whose semantics enforces an all-or-nothing transactional property -- either both of the constituent events synchronize in sequence or neither of them synchronizes.

  • Build a Medium- to Large-Scale Application using Transactional Events

    Although transactional events have been proposed as a novel concurrency abstraction that can solve interesting problems in concurrent programming, the abstraction has not been "used in anger" for a significant application. This topic aims to find a non-trivial application that could make use of transactional events in an interesting manner and report on how transactional events improved the design or implementation of the application.