Programme

Time	Monday, Nov 26	Tuesday, Nov 27
9:00-9:30	Registration/Coffee	Registration/Coffee
9:30-10:30	Fatemeh Mohammadi	Kaie Kubjas
10:30-11:00	Coffee break	Coffee break
11:00-12:00	Fabio Rapallo	Nina Otter
12:00-13:30	Lunch	Lunch
13:30-14:30	Eva Riccomagno	Paul Breiding
14:30-15:00	Coffee break	Coffee break
15:00-16:00	Henry Wynn	Hugo Maruri-Aguilar
16:00-17:00	Christian Haase

The conference dinner will take place at 19:00 at Culinaria.

Titles and Abstracts

Speaker: Fatemeh Mohammadi

Title: Learning Bayesian Networks Using Generalized Permutohedra

Abstract: Graphical models (Bayesian networks) based on directed acyclic graphs (DAGs) are used to model complex cause-and-effect systems. A graphical model is a family of joint probability distributions over the nodes of a graph which encodes conditional independence relations via the Markov properties. One of the fundamental problems in causality is to learn an unknown graph from a set of observed conditional independence relations. In this talk, I will describe a greedy algorithm for DAG model selection that operates via edge walks on so-called DAG associahedra. For an undirected graph, the set of conditional independence relations is represented by a simple polytope known as the graph associahedron, which can be constructed as a Minkowski sum of standard simplices. For any regular Gaussian model and its associated set of conditional independence relations, we construct the analogous polytope, the DAG associahedron, which can be defined using relative entropy. For DAGs, we construct this polytope as a Minkowski sum of matroid polytopes corresponding to Bayes-ball paths in the graph.

This is joint work with Caroline Uhler, Charles Wang, and Josephine Yu.

Speaker: Fabio Rapallo

Title: Circuits in experimental design

Abstract: In the framework of factorial design, a statistical (linear) model is defined through an appropriate model matrix, which depends on the experimental runs and encodes their geometric structure. In this seminar we discuss some properties of the circuit basis of the model matrix in connection with two well-known properties of designs, namely robustness and D-optimality. Exploiting the identification of a fraction with a binary contingency table, we define a criterion to check whether or not a fraction is saturated with respect to a given model, and we generalize this result to study the robustness of a fraction by inspecting its intersections with the supports of the circuits. Using simulations, we show that the combinatorial description of a fraction with respect to the circuit basis is closely related to the notion of a D-optimal fraction and to other optimality criteria. This talk is based on joint work with Henry Wynn.
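To make the link between circuits and saturation concrete, here is a small brute-force sketch in Python. It relies on a standard linear-algebra fact, assumed here and not taken from the talk: the circuit supports of the model matrix are the minimal sets of design points whose rows are linearly dependent, and a fraction with exactly one point per model term is saturated precisely when it contains no circuit support. The function names and the 2^2 example with model 1 + x1 + x2 are hypothetical illustrations; the enumeration is exponential and only suitable for toy designs.

```python
from fractions import Fraction
from itertools import combinations, product


def rank_q(rows):
    """Rank of a small integer matrix, computed exactly over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][c] != 0:
                f = m[r][c] / m[rank][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def circuit_supports(model_matrix):
    """Minimal sets of design points (row indices) whose rows are dependent."""
    n = len(model_matrix)
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # A dependent set that contains no smaller circuit is itself a circuit.
            if any(set(c) <= set(subset) for c in found):
                continue
            if rank_q([model_matrix[i] for i in subset]) < size:
                found.append(subset)
    return found


def is_saturated(fraction, circuits, n_terms):
    """One point per model term and no circuit support inside the fraction."""
    return len(set(fraction)) == n_terms and not any(
        set(c) <= set(fraction) for c in circuits
    )


# Hypothetical example: 2^2 full factorial, model 1 + x1 + x2.
points = list(product([-1, 1], repeat=2))
X = [[1, a, b] for a, b in points]
circuits = circuit_supports(X)
print(circuits)                               # [(0, 1, 2, 3)]
print(is_saturated((0, 1, 2), circuits, 3))   # True
```

Here the only circuit is the interaction contrast supported on all four runs, so every three-point fraction is saturated for the main-effects model.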

Speaker: Eva Riccomagno

Title: Discovering statistical equivalence classes of discrete statistical models using computer algebra

Abstract: We show that representations of certain polynomials in terms of a nested factorization are intrinsically linked to labelled event trees. We give a recursive formula for the construction of such polynomials from a tree and present an algorithm in the computer algebra software CoCoA to derive all tree graphs from a polynomial. Finally, we apply our results to staged tree models. (Joint work with Anna Bigatti (University of Genova, Italy), Christiane Goergen and Jim Q. Smith (The University of Warwick, UK).)

Speaker: Christian Haase

Title: Tell me, how many modes does the Gaußian mixture have . . .

Abstract: Gaussian mixture models are widely used in statistics. A fundamental aspect of these distributions is the study of the local maxima of the density, or modes. In particular, it is not known how many modes a mixture of k Gaussians in d dimensions can have. We give improved lower bounds and the first upper bound on the maximum number of modes, provided it is finite. (Joint work with Carlos Améndola and Alexander Engström.)
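As a small illustration of what "number of modes" means, the following Python sketch counts the local maxima of a univariate Gaussian mixture density by brute-force evaluation on a grid. This is a numerical check in one dimension only, not the method of the talk; the function names, grid size and example parameters are assumptions for illustration. The two examples use the known fact that an equal-weight mixture of two Gaussians with common standard deviation σ is bimodal exactly when the means are more than 2σ apart.

```python
import math


def mixture_density(x, weights, means, sigmas):
    """Density of a univariate Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    )


def count_modes_1d(weights, means, sigmas, lo, hi, n_grid=20001):
    """Count grid points in [lo, hi] that are local maxima of the density
    (higher than the left neighbour and not lower than the right one)."""
    xs = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    ys = [mixture_density(x, weights, means, sigmas) for x in xs]
    return sum(1 for i in range(1, n_grid - 1) if ys[i - 1] < ys[i] >= ys[i + 1])


# Means 4 standard deviations apart: two modes.
print(count_modes_1d([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0], -8.0, 8.0))   # 2
# Means 1 standard deviation apart: a single mode.
print(count_modes_1d([0.5, 0.5], [-0.5, 0.5], [1.0, 1.0], -8.0, 8.0))   # 1
```

A grid search can miss modes that are very close together or very shallow, so it only gives lower bounds; the questions raised in the talk concern exact counts in d dimensions.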

Speaker: Henry Wynn

Title: On Rational Interpolation

Abstract: This continues work on general multivariate interpolation in [1], based on the work of Becker and Weispfenning [2]. The use of Gröbner bases (G-bases) in polynomial interpolation is well understood and has been used by the authors and co-workers, in particular to understand the identifiability of polynomial regression models over particular designs. The approach taken here is to include both the output variable $y$ and its values $y_i$ on arbitrary algebraic varieties $V_i$, $i = 1,\ldots,k$, respectively, in the formulation of the interpolation problem; that is to say, interpolating between the varieties. By careful specification of the problem, as in [1] and [2], it is possible to obtain rational interpolators of the form:

$$y(x) = \sum_i y_i \frac{P_i(x)}{Q_i(x)}.$$
Under special conditions it is possible to choose the denominators $Q_i(x)$ to be non-zero, thus avoiding poles. It is a challenge to make links to more standard rational interpolation such as NURBS (non-uniform rational basis splines). The application to experimental design and the possibility of a new type of rational Sobolev smoothing are also mentioned.

[1] Maruri-Aguilar, H. & Wynn, H. P. (2008). Generalised design: Interpolation and statistical modelling over varieties. Algebraic and Geometric Methods in Statistics. Cambridge University Press. 159-173.
[2] Becker, T. & Weispfenning, V. (1991). The Chinese remainder problem, multivariate interpolation and Gröbner bases. In Proc. ISSAC '91 (Bonn, Germany), 64-69.
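For intuition about interpolators of the form $y(x) = \sum_i y_i P_i(x)/Q_i(x)$ with pole-free denominators, the Python sketch below uses a deliberately simple, swapped-in construction: inverse-distance (Shepard) weighting with exponent 2 in one variable, with denominators cleared so that every weight is a ratio of polynomials sharing a common denominator $D(x)$. This is not the Gröbner-basis construction of [1] and [2]; the varieties here are single points and all names and data are hypothetical.

```python
def rational_weights(x, nodes):
    """w_i(x) = N_i(x) / D(x), with N_i(x) = prod_{j != i} (x - x_j)^2
    and the common denominator D(x) = sum_k N_k(x)."""
    numerators = []
    for i in range(len(nodes)):
        prod = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                prod *= (x - xj) ** 2
        numerators.append(prod)
    # For two or more distinct nodes, D(x) > 0 everywhere, so there are no poles.
    denominator = sum(numerators)
    return [n / denominator for n in numerators]


def rational_interpolant(x, nodes, values):
    """y(x) = sum_i y_i N_i(x) / D(x)."""
    return sum(y * w for y, w in zip(values, rational_weights(x, nodes)))


nodes = [0.0, 1.0, 3.0]
values = [2.0, -1.0, 4.0]
print(rational_interpolant(1.0, nodes, values))  # -1.0 (reproduces the data)
print(rational_interpolant(2.0, nodes, values))  # a weighted blend between the nodes
```

At a node $x_m$ every numerator except $N_m$ vanishes, so the interpolant reproduces $y_m$ exactly, while the weights always sum to one. A known drawback of this particular weighting is that the interpolant has zero derivative at every node, which is one reason more refined rational schemes are of interest.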

Speaker: Kaie Kubjas

Title: Geometry and maximum likelihood estimation of the binary latent class model

Abstract: The binary latent class model consists of binary tensors of nonnegative rank at most two inside the standard simplex. We characterize its boundary stratification with the goal of using this stratification for exact maximum likelihood estimation in statistics. We explain two different approaches for deriving the boundary stratification: by studying the geometry of the model and by using the fixed points of the Expectation-Maximization algorithm. In the case of 2×2×2 tensors, we obtain closed formulas for the maximum likelihood estimates. This talk is based on joint work with Elizabeth Allman, Hector Banos Cervantes, Robin Evans, Serkan Hosten, Daniel Lemke, John Rhodes and Piotr Zwiernik.

Speaker: Nina Otter

Title: Computable invariants for multiparameter persistent homology and their stability

Abstract: Persistent homology (PH) is arguably one of the best known methods in topological data analysis. PH allows one to study topological features of data across different values of a parameter, which one can think of as scales of resolution, and provides a summary of how long individual features persist across these scales. In many applications, data depend not on one but on several parameters, and to apply PH to such data one therefore needs to study the evolution of qualitative features across several parameters. While the theory of 1-parameter PH is well understood, the theory of multiparameter PH is hard, and it presents one of the biggest challenges of topological data analysis. In this talk I will briefly introduce persistent homology, and then explain how tools from commutative algebra give computable invariants for multiparameter PH, which are able to capture homology classes with large persistence. I will then discuss efficient algorithms for the computation of these invariants, as well as stability questions. This talk is based on joint work with A. M. del Campo, H. Harrington, H. Schenck, U. Tillmann, and L. Waas.

Speaker: Paul Breiding

Title: Monte Carlo Homology

Abstract: Persistent homology is a tool to estimate the homology groups of a topological space from a finite point sample. The underlying idea is as follows: for varying t, put a ball of radius t around each point and compute the homology of the union of those balls. The theoretical foundation is a theorem by Niyogi, Smale and Weinberger: under the assumption that the finite point sample was drawn from the uniform distribution on a manifold, the theorem tells us how to choose the radius of the balls and the size of the sample to get the correct homology with high probability. In practice, however, the assumptions of the theorem are hard to satisfy. This is why persistent homology looks at topological features that persist over long intervals of the parameter t. In this talk, I want to discuss how one could satisfy the assumptions of the Niyogi-Smale-Weinberger Theorem for manifolds that are also algebraic varieties. The algebraic structure opens the path to sampling from the uniform distribution and to computing the appropriate radius of the balls. We get an estimator for the homology of the variety that returns the correct answer with high probability.
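The "union of balls" step can be illustrated for the simplest homology group. The Python sketch below computes the number of connected components (the 0-th Betti number) of the union of closed balls of radius t around the sample points, using the fact that two such balls intersect exactly when their centres are at distance at most 2t, together with a union-find structure. It covers only this one step for H0 and is not the estimator from the talk; higher homology groups would require a nerve complex such as the Čech complex. The function name and sample points are hypothetical.

```python
import math


def components_of_union_of_balls(points, t):
    """Number of connected components (0-th Betti number) of the union of
    closed balls of radius t centred at the sample points."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Closed balls of radius t intersect iff the centres are within 2t.
            if math.dist(points[i], points[j]) <= 2 * t:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return len({find(i) for i in range(len(points))})


sample = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0), (10.5, 0.0)]
for t in (0.1, 0.3, 5.0):
    print(t, components_of_union_of_balls(sample, t))
# 0.1 5
# 0.3 2
# 5.0 1
```

Sweeping t shows why the choice of radius matters: the "two clusters" answer only appears for an intermediate range of radii, which is the kind of choice the Niyogi-Smale-Weinberger theorem makes precise.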

Speaker: Hugo Maruri-Aguilar

Title: Lasso and model complexity

Abstract: The statistical technique of Lasso (Tibshirani, 1996) is built around penalising the least squares criterion with the sum of the absolute values of the coefficients, weighted by a control parameter. As the control parameter increases in value, model coefficients shrink progressively towards zero, thus providing the user with a collection of models that starts from the ordinary least squares regression model and ends with a model with no terms.
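The shrinkage path described above can be reproduced with a few lines of Python. The sketch below minimises $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda \lVert\beta\rVert_1$ by cyclic coordinate descent with soft thresholding, a standard Lasso solver that is assumed here and is not necessarily the implementation used in the talk. It fits no intercept (centred data are assumed), and the function names and the small orthogonal design are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma, returning 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200, tol=1e-10):
    """Minimise (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Hypothetical orthogonal design: least squares gives beta = (2, 1).
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2, 1, -2, -1]
for lam in (0.0, 0.5, 1.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
# 0.0 [2.0, 1.0]
# 0.5 [1.0, 0.0]
# 1.0 [0.0, 0.0]
```

At λ = 0 the fit is ordinary least squares; as λ grows, the weaker coefficient reaches zero first, and at λ ≥ max_j |x_jᵀy|/n (here 1) the model has no terms, matching the path described in the abstract.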

This work focuses on hierarchical squarefree regression models. These models can be seen as simplicial complexes, and thus a measure of complexity is given by the Betti numbers of the "model/complex". We detail our computations and implementation of the methodology, illustrate our proposal with simulation results, and apply the methodology to a dataset from the literature.
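The correspondence between hierarchical squarefree models and simplicial complexes can be sketched in Python: each squarefree monomial becomes a face (for example x1x2 becomes the edge {1, 2}), hierarchy means closure under taking divisors, and Betti numbers come from ranks of the boundary matrices. The input convention (a model given by its maximal terms as tuples of variable indices, with the constant term as the implicit empty face), the function names, and the choice of ordinary rather than reduced Betti numbers over the rationals are assumptions; the abstract does not specify these details.

```python
from fractions import Fraction
from itertools import combinations


def closure(facets):
    """All non-empty faces generated by the maximal terms (hierarchical closure)."""
    faces = set()
    for f in facets:
        f = tuple(sorted(f))
        for k in range(1, len(f) + 1):
            faces.update(combinations(f, k))
    return faces


def rank_q(matrix):
    """Rank over the rationals by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][c] != 0:
                f = m[r][c] / m[rank][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def betti_numbers(facets):
    """Ordinary Betti numbers b_0, ..., b_top of the complex over the rationals."""
    by_dim = {}
    for face in closure(facets):
        by_dim.setdefault(len(face) - 1, []).append(face)
    for faces in by_dim.values():
        faces.sort()
    top = max(by_dim) if by_dim else -1
    ranks = {}  # ranks[d] = rank of the boundary map C_d -> C_{d-1}
    for d in range(1, top + 1):
        index = {f: i for i, f in enumerate(by_dim[d - 1])}
        matrix = [[0] * len(by_dim[d]) for _ in by_dim[d - 1]]
        for j, f in enumerate(by_dim[d]):
            for i in range(len(f)):
                matrix[index[f[:i] + f[i + 1:]]][j] = (-1) ** i
        ranks[d] = rank_q(matrix)
    # b_d = dim C_d - rank(boundary_d) - rank(boundary_{d+1})
    return [len(by_dim[d]) - ranks.get(d, 0) - ranks.get(d + 1, 0) for d in range(top + 1)]


# 1 + x1 + x2 + x3 + x1x2 + x2x3 + x1x3: a hollow triangle.
print(betti_numbers([(1, 2), (2, 3), (1, 3)]))  # [1, 1]
# Adding x1x2x3 fills the triangle in.
print(betti_numbers([(1, 2, 3)]))               # [1, 0, 0]
```

The non-zero b_1 of the first model records that all pairwise interactions are present without the three-way term, which is the kind of structural information a Betti-number complexity measure can pick up along a Lasso path.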

This is joint work with S. Hu (Queen Mary).

Speaker: Emil Horobet (This talk is cancelled)

Title: Multidegrees of the extended likelihood correspondence

Abstract: Maximum likelihood estimation is a fundamental problem in statistics. For discrete statistical models, the EM algorithm aims to solve it. One of the drawbacks of this algorithm is that the optimal solution may lie either in the relative interior of the model or on the model's boundary (which we can consider as a submodel). We want to characterize those data which have at least one critical point on a given submodel (for example, the boundary of the respective model). To do this, we consider the extended likelihood correspondence, which is the graph of the conormal variety under the Hadamard product. We develop bounds on the algebraic complexity (multidegrees) of computing the MLE on these submodels. This talk is based on joint work with J. I. Rodriguez.