## THEORY CANAL: The Rochester Theory Seminar Series

The THEORY CANAL meeting (the Rochester Theory Seminar) is a joint project of the UR and RIT theory groups, and the focus is all areas of theoretical computer science. THEORY CANAL meets (when classes are in session) on the first and third (and sometimes, when a fifth exists, fifth) Monday of each month. (Due to slot demand, school holidays, and religious holidays, there are sometimes exceptions to that rule: Mondays of that form that we skip and Mondays not of that form that we don't skip. So see the schedule below for the actual dates.) The talks start at 12:30PM and typically take 60 to 90 minutes.

The meetings are held in Room 703, Computer Studies Building, University of Rochester, Rochester, NY 14627.

The meetings are open to the public; all are very welcome.

Chronological list of 12:30PM THEORY CANAL talks for the 2005-2006 academic year:

• September 5, 2005, 12:30pm (No Meeting: Labor Day)

• September 19, 2005, 12:30pm

Speaker: Daniel Stefankovic, URCS
Topic: Locally Testable Cyclic Codes

Cyclic linear codes of block length n over a finite field F_q are the linear subspaces of F_q^n that are invariant under a cyclic shift of their coordinates. A family of codes is good if all the codes in the family have constant rate and constant normalized distance (distance divided by block length). It is a long-standing open problem whether there exists a good family of cyclic linear codes (cf. [MS, p. 270]). A code C is r-testable if there exists a randomized algorithm which, given a word x in F_q^n, adaptively selects r positions, checks the entries of x in the selected positions, and makes a decision (accept or reject x) based on the positions selected and the values found, such that

* if x is in C then x is surely accepted;

* if dist(x,C) >= eps*n then x is probably rejected. ("dist" refers to Hamming distance.)

A family of codes is locally testable if all members of the family are r-testable for some constant r. This concept arose from holographic proofs/PCPs. Goldreich and Sudan [GS] asked whether there exist good, locally testable families of codes; this question remains open. We will show:
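To make the definition concrete, here is a toy local tester (not from the talk) for the simplest cyclic code, the binary repetition code {0^n, 1^n}: sample r = 2 positions and accept iff they agree. Codewords always pass, and a word at normalized distance eps from the code is rejected with probability at least 2*eps*(1-eps). The function name is ours.

```python
import random

def repetition_tester(x, rng):
    """One run of a 2-query tester for the repetition code {0^n, 1^n}:
    read two random positions and accept iff they hold the same symbol."""
    i = rng.randrange(len(x))
    j = rng.randrange(len(x))
    return x[i] == x[j]
```

Repeating the test a constant number of times amplifies the rejection probability for words far from the code, while keeping the query count constant.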

Theorem: There is no family of good locally testable cyclic codes.

Joint work with Laszlo Babai and Amir Shpilka.

• October 3, 2005, 12:30pm

Speaker: Ashwin Lall, URCS
Topic: The Space-Complexity of Estimating the Entropy of a Stream

We can define the entropy of a stream over items {1, \ldots, n} to be the sum: H = -\sum_i (m_i/m)\log{m_i/m}, where m_i is the frequency of the i-th item and m is the total number of items in the stream (by convention, 0\log{0} = 0). In this talk we will discuss the space usage of a one-pass, streaming algorithm for computing H.
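As a baseline (not the talk's algorithm), the exact entropy can be computed in one pass by keeping a counter per distinct item; that per-item state is exactly the Theta(n)-space cost a streaming algorithm must beat. A sketch, using log base 2:

```python
import math
from collections import Counter

def stream_entropy(stream):
    """One-pass exact computation of H = -sum_i (m_i/m) log2(m_i/m).
    Space grows with the number of distinct items: Theta(n) in the worst case."""
    counts = Counter(stream)          # one pass over the stream
    m = sum(counts.values())
    return -sum((c / m) * math.log2(c / m) for c in counts.values())
```

A uniform stream over four items gives H = 2 bits; a constant stream gives H = 0.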

We will first demonstrate that randomization or approximation alone cannot solve this problem with sub-linear space consumption. We then present an (\epsilon, \delta)-approximation algorithm for H that is closely related to the one of Alon, Matias and Szegedy. We give bounds on how much space the algorithm requires, based upon some reasonable assumptions. Finally, we will discuss some open problems and future directions in which this work can be extended.

• (note: special date) October 10, 2005, 12:30pm

Speaker: Ivona Bezakova, Univ. of Chicago
Topic: Sampling Binary Contingency Tables with a Greedy Start

I will talk about the problem of counting and randomly sampling binary contingency tables. A binary contingency table is an n x m binary matrix which satisfies given row and column sums. I will present a simulated annealing algorithm with running time T=O((nm)^2D^3d_{\max}\log^5(n+m)) which counts the binary contingency tables for any row and column sums, where D is the number of non-zero entries and d_{\max} is the maximum row or column sum. More precisely, given any \epsilon and \delta, the algorithm outputs a number within (1\pm\epsilon) of the true value with probability >1-\delta in time O(T\epsilon^{-3}\log(1/\delta)).

This is the first algorithm which solves the problem directly. Previous algorithms either reduced the problem to the permanent (resulting in a worse running time) or restricted the row/column sums to near-regular. An interesting aspect of this algorithm is that the simulated annealing is started at a non-trivial instance obtained by a specific greedy algorithm.
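The paper's specific greedy start is not reproduced here, but a natural Gale-Ryser-style greedy, which places each row's ones into the columns with the largest remaining demand, gives the flavor of constructing one valid table (a sketch; names are ours):

```python
def greedy_table(row_sums, col_sums):
    """Greedily build a 0/1 matrix with the given margins, or return None.
    Rows are processed largest-first; each row's ones go to the columns
    with the largest remaining column sums."""
    n, m = len(row_sums), len(col_sums)
    remaining = list(col_sums)
    table = [[0] * m for _ in range(n)]
    for i in sorted(range(n), key=lambda r: -row_sums[r]):
        # columns with the largest remaining demand first
        cols = sorted(range(m), key=lambda c: -remaining[c])[:row_sums[i]]
        if len(cols) < row_sums[i] or any(remaining[c] == 0 for c in cols):
            return None                      # margins are infeasible
        for c in cols:
            table[i][c] = 1
            remaining[c] -= 1
    return table if all(r == 0 for r in remaining) else None
```

Such a greedily built table is a non-trivial instance at which the simulated annealing can be started.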

Joint work with Nayantara Bhatnagar and Eric Vigoda.

• October 17, 2005, 12:30pm (No Meeting: UR Fall Break)

• October 31, 2005, 12:30pm

Speaker: Oliver Kikic, RIT
Topic: Simultaneous Visual Cryptography

A visual cryptography scheme (VCS), as proposed by M. Naor and A. Shamir in their publication entitled Visual Cryptography, can encode a secret image into n different shares. The scheme ensures that only a single, previously designated combination of shares can recover the original image, whereas other combinations do not yield any information about the secret image. In this paper, we claim that there exist simultaneous visual cryptography schemes (SVCS), which allow for multiple secret images to be encoded across a set of n shares. The essential part of this research endeavor is to design and examine different approaches for establishing valid SVCS constructions, as well as derive a set of formal definitions used to construct a valid SVCS. We describe an SVCS that allows encoding distinct secret images across a set of n shares.
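To illustrate the single-secret mechanism that SVCS constructions build on, here is a sketch of the classic Naor-Shamir 2-out-of-2 scheme with a pixel expansion of two (an illustration, not the talk's construction; names are ours):

```python
import random

PATTERNS = [(0, 1), (1, 0)]   # the two complementary subpixel pairs (1 = black)

def make_shares(secret, rng):
    """secret: list of bits (1 = black). Returns two shares, each pixel
    expanded to two subpixels. Each share alone is uniformly random."""
    s1, s2 = [], []
    for bit in secret:
        p = rng.choice(PATTERNS)
        s1.extend(p)
        # white pixel: identical patterns; black pixel: complementary patterns
        s2.extend(p if bit == 0 else (1 - p[0], 1 - p[1]))
    return s1, s2

def stack(s1, s2):
    """Physically overlaying the transparencies ORs the subpixels."""
    return [a | b for a, b in zip(s1, s2)]
```

In the stacked image a black secret pixel shows two black subpixels and a white one only one, so the secret becomes visible by contrast, while either share by itself carries no information.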

• November 7, 2005, 12:30pm

Speaker: Mitsunori Ogihara, URCS
Topic: Nonuniform Leaf Language Classes

A leaf language is a tool for defining complexity classes systematically by examining the pattern of outputs produced at the leaf level of the computation tree of a nondeterministic Turing machine. Each leaf language A is associated with a nondeterministic Turing machine N that outputs a character along each computation path and with a pattern language K. Then a word x is defined to be in A if and only if the output word of N on input x belongs to the pattern language K. A paper by Bovet, Crescenzi, and Silvestri introduces this beautiful concept and shows that many complexity classes can be defined in terms of leaf languages.

A paper by Ungar in MFCS this summer studies leaf languages defined using an NP-machine with the full complete binary tree as the computation tree and any very sparse K (of poly(log) density, to be more precise). Ungar proves some interesting results in the paper, including the following: if SAT is reducible to a leaf language defined by such a very sparse K, then the polynomial hierarchy collapses to Theta-2. This paper has motivated me to think about the languages defined using very sparse leaf pattern languages. In this talk, I will present a work-in-progress report of my research on this topic as well as introduce this work of Ungar.

Disclaimer: it's a work-in-progress report, so please don't expect to hear big new results.

• November 21, 2005, 12:30pm

Speaker: Proshanto Mukherji, URCS
Topic: Computation with Absolutely No Space Overhead

We study Turing machines that are allowed absolutely no space overhead. The only work space the machines have, beyond the fixed amount of memory implicit in their finite-state control, is that which they can create by cannibalizing the input bits' own space. This model more closely reflects the fixed-size memory of real computers than does the standard complexity-theoretic model of linear space.

Though some context-sensitive languages cannot be accepted by such machines, we show that all context-free languages can be accepted nondeterministically in polynomial time with absolutely no space overhead, and that all deterministic context-free languages can be accepted deterministically in polynomial time with absolutely no space overhead.
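The talk's constructions are machine-level, but the flavor of cannibalizing the input's own space can be sketched on one of the simplest deterministic context-free languages, balanced parentheses: repeatedly overwrite an innermost matched pair in place, using no memory beyond the tape cells themselves (the list indices stand for head positions; a sketch, not the talk's construction):

```python
def balanced_in_place(tape):
    """tape: list of '(' and ')'. Repeatedly cancel an innermost matched pair
    by overwriting both cells with '-'. Only the input cells are rewritten;
    running time is quadratic, i.e., polynomial as in the talk's results."""
    while True:
        last_open = None
        for i, ch in enumerate(tape):
            if ch == '(':
                last_open = i
            elif ch == ')':
                if last_open is None:
                    return False           # unmatched ')'
                tape[last_open] = tape[i] = '-'
                break                      # restart the scan after a cancellation
        else:
            # a full pass found no ')': accept iff no '(' remains either
            return all(ch != '(' for ch in tape)
```

Each pass cancels one pair, so at most n/2 passes of length n are needed.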

• December 5, 2005, 12:30pm (****NOTE: THIS TALK WILL BE IN A NONSTANDARD LOCATION, NAMELY, ROOM 632, Computer Studies Building, University of Rochester****)

Speaker: Christopher Homan, RIT
Topic: Algorithmic Aspects of Planar Graphs and Tree Decompositions

Tree decompositions play many roles in graph theory. Many NP-complete graph problems become easy to solve when restricted to graphs of bounded treewidth. Many NP-hard-to-approximate problems have polynomial-time approximation schemes when restricted to planar graphs, and this is due in part to the fact that the treewidth of a planar graph is bounded by a function of its diameter.

I will discuss the so-called diameter-treewidth property of planar graphs and present recent results that extend the diameter-treewidth property to broader families of graphs.
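As the simplest illustration of why bounded treewidth helps, maximum independent set, NP-complete in general, is linear-time on trees (treewidth 1) via the standard include/exclude dynamic program (a sketch, not from the talk; names are ours):

```python
def max_independent_set(tree, root):
    """tree: adjacency dict of an undirected tree. For each vertex compute
    (best size including it, best size excluding it) bottom-up."""
    def dp(v, parent):
        incl, excl = 1, 0
        for u in tree[v]:
            if u == parent:
                continue
            ci, ce = dp(u, v)
            incl += ce                 # children of an included vertex must be excluded
            excl += max(ci, ce)        # children of an excluded vertex are free
        return incl, excl
    return max(dp(root, None))
```

Tree decompositions generalize exactly this recursion from trees to graphs of bounded treewidth.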

• January 30, 2006, 12:30pm

Speaker: Edith Hemaspaandra, RIT
Topic: Anyone but Him: The Complexity of Precluding an Alternative

Preference aggregation in a multiagent setting is a central issue in both human and computer contexts. We study in terms of complexity the vulnerability of preference aggregation to destructive control. That is, we study the ability of an election's chair to, through such mechanisms as voter/candidate addition/suppression/partition, ensure that a particular candidate (equivalently, alternative) does not win. And we study the extent to which election systems can make it impossible, or computationally costly (NP-complete), for the chair to execute such control. Among the systems we study---plurality, Condorcet, and approval voting---we find cases where systems immune or computationally resistant to a chair choosing the winner nonetheless are vulnerable to the chair blocking a victory. Beyond that, we see that among our studied systems no one system offers the best protection against destructive control. Rather, the choice of a preference aggregation system will depend closely on which types of control one wishes to be protected against. We also find concrete cases where the complexity of or susceptibility to control varies dramatically based on the choice among natural tie-handling rules.
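To make destructive control concrete, here is a brute-force (exponential, purely illustrative) check for one of the studied settings: plurality voting with control by deleting voters. Co-winners count as winners here; as the abstract notes, such tie-handling choices can change the complexity. Names are ours.

```python
from itertools import combinations

def plurality_winners(votes):
    """votes: list of each voter's top choice. Returns the set of plurality winners."""
    tally = {}
    for v in votes:
        tally[v] = tally.get(v, 0) + 1
    top = max(tally.values())
    return {c for c, t in tally.items() if t == top}

def can_block_by_deleting(votes, target, k):
    """Can the chair delete at most k voters so that `target` is not a winner?
    Brute force over all deletion sets; the talk studies when this is easy or NP-complete."""
    for j in range(k + 1):
        for keep in combinations(range(len(votes)), len(votes) - j):
            remaining = [votes[i] for i in keep]
            if remaining and target not in plurality_winners(remaining):
                return True
    return False
```

With votes a, a, b, b, c, deleting a single a-voter already blocks a, while with no deletions a remains a co-winner.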

Joint work with Lane Hemaspaandra and Joerg Rothe.

• February 6, 2006, 12:30pm

Speaker: Piotr Faliszewski, URCS
Topic: Some Properties of Uniformly Hard Languages

In this talk I will discuss basic properties of uniformly hard languages. I will show that EXP contains uniformly hard languages, and that in fact all EXP-complete languages are uniformly hard. I will also show that either there are uniformly hard languages at every level of the polynomial hierarchy, or that there are none.

Uniformly hard languages are languages that do not contain long runs of easy instances. The motivation for them comes from a famous theorem of Ladner, which says that if A is a decidable language not in P, then there is a language B such that

- B not in P

- B many-one reduces to A

- A does not many-one reduce to B

The way Ladner's theorem works is that it takes A and punches big holes in it (that is, long runs of easy instances). It is interesting to know whether one could obtain similar results without destroying uniform hardness, that is, without introducing long runs of easy instances.

• February 20, 2006, 12:30pm

Speaker: Ming Zhong, URCS
Topic: Modeling and Analysis of BitTorrent Systems

As a simple, efficient data distribution mechanism, BitTorrent has drawn millions of users and dominates current Internet traffic. In this talk I will introduce two recent works on modeling and analyzing BitTorrent systems. One (Qiu et al., ACM SIGCOMM 2004) is a simple fluid model that leads to steady-state performance results for BitTorrent systems. The other (SODA 2006) uses a graph-theoretic model to obtain asymptotic bounds on the average downloading time.

• March 6, 2006, 12:30pm

Speaker: Chengliang Zhang, URCS
Topic: Trend Analysis for Large Document Streams

Increasingly powerful computer technology inspires people to investigate the information hidden in huge collections of documents. We are especially interested in documents with a relative time order, which we call document streams: for example, TV news, forums, emails of company projects, call center telephone logs, etc.

To get insight into these document streams, we first need to detect the events within them. We use a time-sensitive Dirichlet process mixture model to find the events in the document streams. A time-sensitive Dirichlet process mixture model is a generative model which allows a potentially infinite number of mixture components and uses a Dirichlet compound multinomial model to model the distribution of words in documents. We consider three different time-sensitive Dirichlet process mixture models: an exponential decay kernel model, a polynomial decay kernel model, and a sliding window kernel model. We also compare their performance with that of the standard Dirichlet process mixture model.
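The three kernels differ only in how a document's weight decays with its age. A sketch of the decay functions (the parameter names lam, p, and width are ours and purely illustrative; the talk's models may parameterize them differently):

```python
import math

def exp_kernel(dt, lam=0.5):
    """Exponential decay: recent documents weigh exponentially more."""
    return math.exp(-lam * dt)

def poly_kernel(dt, p=2.0):
    """Polynomial decay: a heavier tail than exponential decay."""
    return 1.0 / (1.0 + dt) ** p

def window_kernel(dt, width=3):
    """Sliding window: only the last `width` time steps count at all."""
    return 1.0 if dt < width else 0.0

def prior_weights(times, t_now, kernel):
    """Weight each past document by its age; in a time-sensitive DP mixture,
    such weights replace the raw counts in the clustering prior."""
    return [kernel(t_now - t) for t in times]
```

Swapping the kernel changes how quickly an old event stops attracting new documents, which is exactly the axis along which the three models are compared.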

We use a theme-river presentation to provide a visualization of the events along the time axis. With the help of theme rivers, people can easily get an overall picture of how the different events evolve.

• March 20, 2006, 12:30pm

Speaker: Daniel Stefankovic, URCS
Topic: Phylogeny of Mixture Models: Maximum Likelihood, Ambiguity, and Linear Tests

It is well known that phylogenetic trees can vary between genes. Even within regions having the same tree topology, the mutation rates often vary. This motivates the study of phylogenetic reconstruction in heterogeneous settings. We study the (im)possibility of reconstructing the underlying phylogeny when data is generated from a mixture of trees (same topology, different branch lengths). We first show the pitfalls of popular methods, including maximum likelihood and MCMC algorithms. We then determine in which evolutionary models reconstructing the tree topology under a mixture distribution is (im)possible. We prove that every model either has ambiguous distributions, in which case reconstruction is impossible in general, or there exist linear tests which identify the topology. This duality theorem relies on our notion of linear tests and uses ideas from linear programming duality. Linear tests are closely related to linear invariants, which were first introduced by Lake.

• April 3, 2006, 12:30pm

Speaker: Joel Seiferas, URCS
Topic: Are Shared Tapes More Powerful?

Are Turing machines with multiple heads on the same tape more powerful than TM's with ordinary tapes? We will review the sense in which, surprisingly, there is no advantage; and then we will review attempts to capture the senses in which there *is* some demonstrable advantage. The main results are that 4 single-head tapes are as powerful as one two-head tape, but that 2 are not. There are many open questions.

• April 17, 2006, 12:30pm

Speaker: Laura Beth Lincoln, RIT
Topic: Removing Guilt by Association with Private Information Retrieval

Suppose there is a movie you would be interested in watching via pay-per-view, but you refuse to purchase the feed because you believe that the supplier will sell your information to groups paying for the contact information of all the people who purchased that movie, and the association of your name with that purchase could hinder your career or relationships, or increase the amount of time you spend cleaning SPAM out of your mailbox. Private Information Retrieval (PIR) allows you to retrieve a particular feed without the supplier knowing which feed you actually got, and Symmetric Private Information Retrieval (SPIR) assures the supplier, if the feeds are equally priced, that you received only the number of feeds you purchased. Now you can purchase without risking your name being associated with a particular feed, and the supplier has gained the business of a once-paranoid client. The speed of available technology may currently limit such transactions for movies or music, but this talk will present additive homomorphic probabilistic encryption schemes, such as Paillier's, and show how such encryption schemes can be used in a one-round SPIR protocol.

This talk will summarize M.S. thesis work by Laura Beth Lincoln done at RIT under the supervision of Stanislaw Radziszowski.
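As a sketch of the additive homomorphism that drives such protocols, here is a toy Paillier implementation with artificially small primes (real deployments use primes of 512+ bits; the default parameters here are our illustrative choices):

```python
import math
import random

def paillier_keygen(p=173, q=181):
    """Toy key generation; p and q must be primes with gcd(pq, (p-1)(q-1)) = 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                                            # a standard generator choice
    mu = pow(lam, -1, n)                                 # with g = n+1, mu = lam^-1 mod n
    return (n, g), (lam, mu)

def encrypt(pub, m, rng=random):
    """E(m) = g^m * r^n mod n^2 for a fresh random unit r (probabilistic)."""
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n
```

Multiplying ciphertexts adds plaintexts: D(E(m1) * E(m2) mod n^2) = m1 + m2 mod n, which is what lets the server combine encrypted selection values without learning them.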

• May 1, 2006, 12:30pm

Speaker: Henning Schnoor, Univ. of Hannover
Topic: Enumerating all Solutions for Constraint Satisfaction Problems

Constraint satisfaction problems are generalizations of many combinatorial problems. In recent years, there have been many results on the complexity of this problem, and it has proven to be interesting both from a practical and from a theoretical point of view. We study the corresponding enumeration problem, and present new efficient algorithms which enumerate, for a given CSP-formula, the set of its solutions. Such algorithms exist for a broad class of constraint languages on arbitrary finite domains. We also show that the algebraic techniques usually applied in the constraint satisfaction context do not work for the enumeration problem.
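The naive baseline such work improves on is plain backtracking, which enumerates all solutions but can spend exponential time between consecutive outputs; a self-contained sketch (names are ours):

```python
def enumerate_solutions(variables, domains, constraints):
    """Yield every assignment satisfying all constraints.
    constraints: list of (scope, predicate) pairs; a constraint is checked as
    soon as all variables in its scope are assigned, pruning dead branches early."""
    def consistent(assign):
        for scope, pred in constraints:
            if all(v in assign for v in scope):
                if not pred(*(assign[v] for v in scope)):
                    return False
        return True

    def backtrack(i, assign):
        if i == len(variables):
            yield dict(assign)
            return
        v = variables[i]
        for val in domains[v]:
            assign[v] = val
            if consistent(assign):
                yield from backtrack(i + 1, assign)
            del assign[v]

    yield from backtrack(0, {})
```

For tractable constraint languages the goal is to do much better, e.g., to output successive solutions with only polynomial delay.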


The seminar series is open to the public, and is being organized this year by Lane Hemaspaandra and Ming Zhong of UR and Edith Hemaspaandra of RIT. If you have any questions, please send email to "lane" or "zhong" (both in the domain cs.rochester.edu) or "eh" (in the domain cs.rit.edu).