Reconstructing Networks with Unknown and Heterogeneous Errors

Open Access

Tiago P. Peixoto

Phys. Rev. X 8, 041011 – Published 16 October 2018

Abstract

The vast majority of network data sets contain errors and omissions, although this fact is rarely incorporated in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network-reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimates of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here, we develop a Bayesian reconstruction approach that lifts these limitations by allowing not only for heterogeneous errors but also for single edge measurements without direct error estimates. Our approach works by coupling the inference with structured generative network models, which enable the correlations between edges to be used as reliable uncertainty estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, for which efficient nonparametric inference can be performed and which yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.

Received 25 June 2018
Revised 23 August 2018
Corrected 22 January 2019

DOI: https://doi.org/10.1103/PhysRevX.8.041011

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.

Physics Subject Headings (PhySH)

Collective behavior in networks; Community structure; Network structure; Patterns in complex systems

Networks & random structures; Random graphs; Stochastic networks

Block models; Data analysis; Metropolis algorithm; Monte Carlo methods; Network Models; Networks Analysis Tools; Statistical methods

Networks; Interdisciplinary Physics; Statistical Physics & Thermodynamics

Corrections

22 January 2019

Correction: The caption to Fig. 3 contained typographical errors and has been fixed.

Authors & Affiliations

Tiago P. Peixoto^*

Department of Mathematical Sciences and Centre for Networks and Collective Behaviour, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom and ISI Foundation, Via Chisola 5, 10126 Torino, Italy

^*t.peixoto@bath.ac.uk

Popular Summary

The past two decades have witnessed a surge of empirical data on large-scale networks such as transportation routes, social contacts, and the Internet. This has driven the development of sophisticated techniques for extracting scientific understanding from this wealth of relational data. Despite these advances, most studies of real-world networks neglect observational errors. This omission makes it difficult to distinguish between real features of the underlying system and those brought on by distortions during measurement. Here, we develop a reconstruction method that incorporates the possibility of measurement errors, and yields an estimate of the underlying network most consistent with empirical evidence.

A central feature of our method is that it is usable even if the network data have no additional information that can be used for error assessment, such as multiple measurements. This is achieved by basing the reconstruction on network models that are capable of extracting the large-scale modular patterns found in the data, thus exploiting the existence of correlations as a proxy for the estimation of uncertainty. The result is an algorithm that can be used on most available data sets that omit error estimates. Furthermore, our approach can provide the missing error estimates for these data, which can then be seamlessly incorporated into any chain of analysis.

The systematic inclusion of uncertainties is a requirement in any data-driven scientific program. Our proposed method enables this to be performed for a wide class of network data, making it possible, in a general way, to accompany any statement based on observations with an assessment of its statistical evidence.
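
The idea of using modular structure as a proxy for uncertainty can be illustrated with a deliberately simplified sketch. It is not the method of the paper, which infers the group structure and the error rates jointly and nonparametrically. Here the group memberships, the block edge probabilities, and the two error rates are all assumed to be known, and the function and parameter names (`edge_posterior`, `reconstruct`, `false_neg`, `false_pos`) are hypothetical. Under those assumptions, Bayes' rule gives the probability that a true edge exists given a single measurement.

```python
def edge_posterior(observed, prior, false_neg, false_pos):
    """Posterior probability that a true edge exists, given one measurement.

    observed  : True if the measurement reported an edge
    prior     : edge probability implied by the (assumed known) block model
    false_neg : assumed probability of missing a true edge
    false_pos : assumed probability of reporting a nonexistent edge
    """
    if observed:
        like_edge, like_no_edge = 1.0 - false_neg, false_pos
    else:
        like_edge, like_no_edge = false_neg, 1.0 - false_pos
    numerator = prior * like_edge
    return numerator / (numerator + (1.0 - prior) * like_no_edge)


def reconstruct(observed_adj, groups, block_prob, false_neg, false_pos):
    """Posterior edge probabilities for every node pair of an undirected graph."""
    n = len(observed_adj)
    posterior = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The prior depends only on the groups of the two endpoints.
            prior = block_prob[groups[i]][groups[j]]
            p = edge_posterior(bool(observed_adj[i][j]), prior,
                               false_neg, false_pos)
            posterior[i][j] = posterior[j][i] = p
    return posterior


# Toy data: two groups of two nodes, dense inside groups, sparse between them.
groups = [0, 0, 1, 1]
block_prob = [[0.5, 0.01],
              [0.01, 0.5]]
observed_adj = [[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]]
posterior = reconstruct(observed_adj, groups, block_prob,
                        false_neg=0.1, false_pos=0.1)
```

In this toy setting, the observed edge between nodes 0 and 1 (same group) receives a much higher posterior probability than the observed edge between nodes 0 and 2 (different groups), even though both were measured exactly once. The structural prior is what separates a plausible edge from a likely measurement error.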

Issue

Vol. 8, Iss. 4 — October - December 2018
