This thesis presents the fastest known (approximation) algorithms for two such counting problems: the permanent and binary contingency tables. The permanent counts the (weighted) perfect matchings of a given bipartite graph, and it has applications in statistical physics, statistics, and several areas of computer science. Binary contingency tables count the bipartite graphs with prescribed degree sequences, and they are used in statistical tests in a number of fields, including ecology, education, and psychology.
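To make the first object concrete: the permanent of an $n \times n$ matrix $A$ is $\sum_{\sigma} \prod_i A_{i,\sigma(i)}$, the sum over all permutations $\sigma$; for a 0/1 biadjacency matrix this counts the perfect matchings of the corresponding bipartite graph. A minimal brute-force sketch, exponential time and purely illustrative (this is not the thesis's algorithm):

```python
import math
from itertools import permutations

def permanent(A):
    """Permanent of a square matrix A: the sum over all permutations s
    of prod_i A[i][s[i]]. For a 0/1 biadjacency matrix this counts the
    perfect matchings of the corresponding bipartite graph."""
    n = len(A)
    return sum(
        math.prod(A[i][s[i]] for i in range(n))
        for s in permutations(range(n))
    )

# The complete bipartite graph K_{2,2} has the all-ones biadjacency
# matrix and exactly 2 perfect matchings.
print(permanent([[1, 1], [1, 1]]))  # 2
```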
Our MCMC algorithms use the simulated annealing approach, a popular heuristic in combinatorial optimization. It borrows intuition from the physical annealing process, in which a substance is heated and then slowly cooled so that it moves from a disordered configuration into a crystalline (i.e., ordered) state. In optimization, simulated annealing starts at an easy instance at high temperature and slowly "cools" toward the desired hard instance at low temperature. Simulated annealing is widely used, with much empirical success but little theoretical backing.
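As a generic illustration of the heuristic (not the algorithm developed in this thesis), the sketch below runs a Metropolis-style annealing loop with a geometric cooling schedule on a toy number-partitioning instance; the instance, neighborhood, and schedule parameters are all placeholder choices:

```python
import math
import random

def simulated_annealing(energy, neighbor, x0, t_start=10.0, t_end=1e-3, cooling=0.995):
    """Generic simulated annealing with a geometric cooling schedule.
    Worse states are accepted with probability exp(-delta/T) (Metropolis rule)."""
    x, t, best = x0, t_start, x0
    while t > t_end:
        y = neighbor(x)
        delta = energy(y) - energy(x)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x = y
            if energy(x) < energy(best):
                best = x
        t *= cooling  # geometric schedule: T_{k+1} = cooling * T_k
    return best

# Toy instance: split numbers into two sets with nearly equal sums.
nums = [random.randint(1, 100) for _ in range(30)]
energy = lambda s: abs(sum(v if b else -v for v, b in zip(nums, s)))
def neighbor(s):
    i = random.randrange(len(s))
    return s[:i] + (not s[i],) + s[i + 1:]

print(energy(simulated_annealing(energy, neighbor, tuple(False for _ in nums))))
```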
This work presents a new cooling schedule for simulated annealing, with provable guarantees for several counting problems. We also improve the bounds on the rate of convergence of the Markov chains at the core of the simulated annealing algorithm. As a consequence, we obtain a new algorithm which approximates the permanent of an $n \times n$ 0/1 matrix, to within an arbitrarily small relative error, in time $O^*(n^7)$. We also present a new simulated annealing algorithm for binary contingency tables, which relies on an interesting combinatorial lemma. In fact, we obtain a more general result: we give a polynomial-time algorithm for counting bipartite graphs with a prescribed degree sequence and an arbitrary set of forbidden edges.
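For intuition about what is being counted (not about how the polynomial-time algorithm works): a bipartite graph with prescribed degree sequences corresponds to a 0/1 matrix with prescribed row and column sums, and forbidden edges force certain entries to be 0. A tiny brute-force counter, exponential time and purely illustrative:

```python
from itertools import combinations

def count_tables(rows, cols, forbidden=frozenset()):
    """Count 0/1 matrices with row sums `rows`, column sums `cols`, and
    zeros at the (i, j) positions in `forbidden`; equivalently, bipartite
    graphs with prescribed degrees avoiding the forbidden edges."""
    def rec(i, remaining):
        if i == len(rows):
            return int(all(r == 0 for r in remaining))
        allowed = [j for j, r in enumerate(remaining)
                   if r > 0 and (i, j) not in forbidden]
        total = 0
        for ones in combinations(allowed, rows[i]):
            rem = list(remaining)
            for j in ones:
                rem[j] -= 1
            total += rec(i + 1, rem)
        return total
    return rec(0, list(cols))

# 2x2 tables with all row/column sums 1: exactly two (the identity and
# the anti-diagonal); forbidding the edge (0, 0) leaves exactly one.
print(count_tables([1, 1], [1, 1]))                      # 2
print(count_tables([1, 1], [1, 1], forbidden={(0, 0)}))  # 1
```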
Finally, we discuss sequential importance sampling, a popular technique for binary contingency tables. For any input instance, this technique converges asymptotically to the correct answer, and in many cases it appears to converge fast, making it especially suited for practical purposes. In contrast, we present a negative result: we describe a very simple instance for which, with high probability, the sequential importance sampling estimate after any subexponential number of steps differs from the correct answer by an exponential factor.
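For concreteness, one basic sequential importance sampling scheme fills the table row by row, drawing each row uniformly among the choices consistent with the remaining column sums; the product of the number of choices at each step is then an unbiased estimator of the count. The sketch below is our simplified rendering of this idea, not a specific scheme from the literature:

```python
import random
from itertools import combinations

def sis_estimate(rows, cols, trials=10000):
    """Sequential importance sampling estimate of the number of 0/1
    matrices with row sums `rows` and column sums `cols`. Each trial
    builds a table row by row; its weight is the product of the number
    of feasible choices seen, and the average over trials is unbiased."""
    total = 0.0
    for _ in range(trials):
        remaining = list(cols)
        weight = 1.0
        for r in rows:
            allowed = [j for j, c in enumerate(remaining) if c > 0]
            choices = list(combinations(allowed, r))
            if not choices:
                weight = 0.0  # stuck: this sample contributes zero
                break
            weight *= len(choices)  # 1 / (proposal probability of this row)
            for j in random.choice(choices):
                remaining[j] -= 1
        if weight and any(remaining):
            weight = 0.0  # column sums not met: contributes zero
        total += weight
    return total / trials

print(sis_estimate([1, 1], [1, 1]))  # close to the true count, 2
```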
To complement this result, we show that no truthful mechanism can compute the exact optimum. We also consider the algorithmic perspective when the (true) additive valuation functions are part of the input. We present a simple $1/(m-k+1)$-approximation algorithm which allocates to every player at least a $1/k$ fraction of the value of all but the $k-1$ heaviest items, and we give an algorithm whose additive error against the fractional optimum is bounded by the value of the largest item. The two approximation algorithms are incomparable in the sense that there exist instances on which either one outperforms the other.
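Assuming the objective here is max-min fairness, i.e., maximize the minimum total value any of the $k$ players receives under additive valuations over the $m$ items (a reading consistent with the guarantees above), a brute-force reference computation of that objective, exponential in $m$ and purely illustrative, is:

```python
from itertools import product

def max_min_allocation(values):
    """Brute force over all assignments of items to players: maximize
    the minimum player's total (additive) value. `values[p][j]` is
    player p's value for item j. Exponential in the number of items."""
    k, m = len(values), len(values[0])
    best = -1
    for assignment in product(range(k), repeat=m):
        totals = [0] * k
        for j, p in enumerate(assignment):
            totals[p] += values[p][j]
        best = max(best, min(totals))
    return best

# Two players, three items: the optimum gives player 0 items 0 and 2
# (value 7) and player 1 item 1 (value 5), so the min value is 5.
print(max_min_allocation([[4, 1, 3], [2, 5, 2]]))  # 5
```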
Kannan, Lovász, and Simonovits recently analyzed a natural random walk known as the ball walk. By bounding the so-called conductance, they obtain (via a Cheeger-type inequality) a bound on the Poincaré constant of the random walk. Their bound shows that the walk converges to an approximately random point of the body in time $O^*(n^3 D^2)$, where $n$ is the dimension and $D$ is the diameter of the body. We survey and present a self-contained view of their work. In addition, following an outline proposed by Jerrum, we slightly modify the KLS analysis to bound the Poincaré constant without using a Cheeger-type inequality.
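A minimal sketch of the ball walk, with the convex body given by a membership oracle (the body, step radius, and step count below are placeholder choices): from the current point, propose a uniform random point in the ball of radius $\delta$ around it, and move there only if the proposal stays inside the body.

```python
import math
import random

def ball_walk(inside, x, delta, steps):
    """Ball walk in a convex body given by the membership oracle `inside`:
    propose a uniform point in the delta-ball around x and move iff it
    lies in the body; rejected proposals stay put."""
    n = len(x)
    for _ in range(steps):
        # Uniform point in the unit n-ball: a uniform direction (normalized
        # Gaussian) scaled by U^(1/n), then by the step radius delta.
        g = [random.gauss(0, 1) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        r = delta * random.random() ** (1 / n)
        y = [xi + r * gi / norm for xi, gi in zip(x, g)]
        if inside(y):
            x = y
    return x

# Example body: the unit hypercube [0, 1]^10.
cube = lambda p: all(0 <= v <= 1 for v in p)
print(ball_walk(cube, [0.5] * 10, delta=0.2, steps=1000))
```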
The adjacency matrix is optimal with respect to the adjacency test, but for $n$ vertices its space complexity is always $O(n^2)$ bits, regardless of the number of edges $m$. The adjacency list representation is space-optimal, but the adjacency test takes more time than necessary. The aim of this thesis was to find a representation as close as possible to the optimal space complexity of $O(m \log n)$ bits and to the optimal time complexity of $O(\log n)$ bit-operations for the adjacency test (and the update operations).
We present a representation using the technique of hashing: it takes $O(m \log n)$ bits, and the adjacency test can be performed in $O(\log n \log\log n)$ bit-operations. The "list-of-all-neighbors" procedure takes $O((k + \log\log n) \log n)$ bit-operations, where $k$ is the number of neighbors to be listed, and the expected amortized time complexity of the update operations is $O(\log n \log\log n)$.
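At the level of machine words rather than bits, the hashing idea can be illustrated with a dictionary of neighbor hash sets; this toy version (ours, not the thesis's bit-level structure) gives expected constant word operations per adjacency test and update, ignoring the bit-level space accounting above:

```python
class HashedGraph:
    """Toy adjacency structure via hashing: each vertex maps to a hash
    set of its neighbors. Expected O(1) word operations per adjacency
    test and update; listing neighbors costs O(k) for k neighbors."""
    def __init__(self):
        self.adj = {}

    def insert_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def delete_edge(self, u, v):
        self.adj.get(u, set()).discard(v)
        self.adj.get(v, set()).discard(u)

    def adjacent(self, u, v):
        return v in self.adj.get(u, set())

    def neighbors(self, u):
        return list(self.adj.get(u, ()))

g = HashedGraph()
g.insert_edge(1, 2)
print(g.adjacent(1, 2), g.adjacent(2, 3))  # True False
```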
We also suggest a new representation based on computing the distance from every vertex to some given vertex. We show that this representation is optimal, in both time and space, for the family of planar graphs.