1. M. Jordan (ed.), Learning in Graphical Models, MIT Press, 1998
Overview of this book:
Graphical models, a marriage between probability theory and graph theory,
provide a natural tool for dealing with two problems that occur
throughout applied mathematics and engineering—uncertainty and
complexity. In particular, they play an increasingly important role in
the design and analysis of machine learning algorithms. Fundamental to
the idea of a graphical model is the notion of modularity: a complex
system is built by combining simpler parts. Probability theory serves
as the glue whereby the parts are combined, ensuring that the system
as a whole is consistent and providing ways to interface models to
data. Graph theory provides both an intuitively appealing interface by
which humans can model highly interacting sets of variables and a data
structure that lends itself naturally to the design of efficient
general-purpose algorithms.
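The modularity idea above can be made concrete: a joint distribution over several variables is assembled from small conditional pieces, one per node of a directed graph, and probability theory guarantees the assembled whole is consistent. A minimal sketch (the three-variable chain and its probability values are invented for illustration, not taken from the book):

```python
import itertools

# A toy Bayesian network a -> b -> c over binary variables: the joint
# distribution is assembled from simple local pieces, one per node.
p_a = {0: 0.6, 1: 0.4}                        # p(a)
p_b_given_a = {0: {0: 0.9, 1: 0.1},           # p(b | a)
               1: {0: 0.3, 1: 0.7}}
p_c_given_b = {0: {0: 0.8, 1: 0.2},           # p(c | b)
               1: {0: 0.25, 1: 0.75}}

def joint(a, b, c):
    """Joint probability as the product of the local factors."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Consistency check: because each local piece is a proper conditional
# distribution, the modular product sums to one over all configurations.
total = sum(joint(a, b, c)
            for a, b, c in itertools.product([0, 1], repeat=3))
assert abs(total - 1.0) < 1e-12
```

Each factor can be designed, learned, or swapped out independently, which is the modularity the overview describes.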
This book presents an in-depth exploration of issues related to
learning within the graphical model formalism. Four chapters are
tutorial chapters—Robert Cowell on Inference for Bayesian Networks,
David MacKay on Monte Carlo Methods, Michael I. Jordan et al. on
Variational Methods, and David Heckerman on Learning with Bayesian
Networks. The remaining chapters cover a wide range of topics of
current research interest.
2. M. Mézard and A. Montanari, Information, Physics, and Computation, Oxford University Press, 2009
Overview of this book:
This book presents a unified approach to a rich and rapidly evolving
research domain at the interface between statistical physics,
theoretical computer science/discrete mathematics, and
coding/information theory. It is accessible to graduate students and
researchers without a specific training in any of these fields. The
selected topics, including spin glasses, error-correcting codes, and
satisfiability, are central to each field. The approach focuses on
large random instances and adopts a common probabilistic formulation
in terms of graphical models. It presents message passing algorithms
like belief propagation and survey propagation, and their use in
decoding and constraint satisfaction solving. It also explains
analysis techniques like density evolution and the cavity method, and
uses them to study phase transitions.
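As a minimal illustration of the message passing idea the book develops, here is a sketch of sum-product belief propagation on a toy chain of three binary variables; the pairwise potentials and variable names are invented for this example, not taken from the book.

```python
import numpy as np

# Pairwise potentials for a 3-variable chain x1 - x2 - x3 of binary
# variables (illustrative values only).
psi12 = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # psi12[x1, x2]
psi23 = np.array([[1.0, 0.2],
                  [0.2, 1.0]])   # psi23[x2, x3]

# Sum-product messages along the chain:
# m_{1->2}(x2) = sum_{x1} psi12(x1, x2)
m12 = psi12.sum(axis=0)
# m_{3->2}(x2) = sum_{x3} psi23(x2, x3)
m32 = psi23.sum(axis=1)
# m_{2->1}(x1) = sum_{x2} psi12(x1, x2) * m_{3->2}(x2)
m21 = psi12 @ m32

# Marginals are products of incoming messages, normalized.
p2 = m12 * m32
p2 = p2 / p2.sum()
p1 = m21 / m21.sum()

# Brute-force check over all 8 joint configurations: on a tree,
# belief propagation is exact.
joint = psi12[:, :, None] * psi23[None, :, :]   # joint[x1, x2, x3]
p2_exact = joint.sum(axis=(0, 2)); p2_exact = p2_exact / p2_exact.sum()
p1_exact = joint.sum(axis=(1, 2)); p1_exact = p1_exact / p1_exact.sum()
assert np.allclose(p2, p2_exact) and np.allclose(p1, p1_exact)
```

On loopy graphs the same message updates are iterated without an exactness guarantee, which is where the book's density evolution and cavity analyses come in.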
3. C. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, 2006
Overview of this book:
This is the first textbook on pattern recognition to present the
Bayesian viewpoint. The book presents approximate inference algorithms
that permit fast answers in situations where exact answers are not
feasible. It makes systematic use of graphical models to describe
probability distributions, an approach few other machine learning
textbooks take. No previous knowledge of pattern recognition or machine
learning concepts is assumed. Familiarity with multivariate calculus
and basic linear algebra is required, and some experience in the use
of probabilities would be helpful though not essential as the book
includes a self-contained introduction to basic probability theory.
These books are self-contained, so the prerequisites mentioned above matter mainly if you go on to do research in these areas. Of course, these are just a subset of the many content-rich books out there.
Hope this helps.