Abstract

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.

Main image