Abstract
The vast majority of network data sets contain errors and omissions, although this fact is rarely incorporated in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network-reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimations of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here, we develop a Bayesian reconstruction approach that lifts these limitations by allowing not only for heterogeneous errors, but also for single edge measurements without direct error estimates. Our approach works by coupling the inference approach with structured generative network models, which enable the correlations between edges to be used as reliable uncertainty estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, from which efficient nonparametric inference can be performed, and which yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.
- Received 25 June 2018
- Revised 23 August 2018
- Corrected 22 January 2019
DOI: https://doi.org/10.1103/PhysRevX.8.041011
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.
Corrections
22 January 2019
Correction: The caption to Fig. 3 contained typographical errors and has been fixed.
Popular Summary
The past two decades have witnessed a surge of empirical data on large-scale networks such as transportation routes, social contacts, and the Internet. This has driven the development of sophisticated techniques for extracting scientific understanding from this wealth of relational data. Despite these advances, most studies of real-world networks neglect observational errors. This omission makes it difficult to distinguish between real features of the underlying system and those brought on by distortions during measurement. Here, we develop a reconstruction method that incorporates the possibility of measurement errors, and yields an estimate of the underlying network most consistent with empirical evidence.
A central feature of our method is that it is usable even if the network data have no additional information that can be used for error assessment, such as multiple measurements. This is achieved by basing the reconstruction on network models that are capable of extracting the large-scale modular patterns found in the data, thus exploiting the existence of correlations as a proxy for the estimation of uncertainty. The result is an algorithm that can be used on most available data sets that omit error estimates. Furthermore, our approach can provide the missing error estimates for these data, which can then be seamlessly incorporated into any chain of analysis.
The systematic inclusion of uncertainties is a requirement in any data-driven scientific program. Our proposed method enables this to be performed for a wide class of network data, making it possible, in a general way, to accompany any statement based on observations with an assessment of its statistical evidence.
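As a toy illustration of how modular structure can stand in for a missing error estimate, the sketch below applies Bayes' rule to a single noisy measurement of one edge. It is not the algorithm of the paper: the group labels, the block edge probabilities in `block_prior`, and the error rates `false_neg` and `false_pos` are hypothetical values fixed by hand, and the error rates are taken as uniform for simplicity, whereas the method described above infers the structure and the errors jointly from the data.

```python
def posterior_edge_probability(observed, prior, false_neg, false_pos):
    """Probability that an edge truly exists, given one noisy measurement.

    observed  -- True if the edge was recorded in the data
    prior     -- prior edge probability implied by the group structure
    false_neg -- probability that a true edge goes unrecorded (assumed)
    false_pos -- probability that a non-edge is recorded (assumed)
    """
    if observed:
        like_edge, like_no_edge = 1.0 - false_neg, false_pos
    else:
        like_edge, like_no_edge = false_neg, 1.0 - false_pos
    numerator = like_edge * prior
    return numerator / (numerator + like_no_edge * (1.0 - prior))


# Hypothetical two-group network: dense within groups, sparse between them.
block_prior = {("a", "a"): 0.6, ("a", "b"): 0.02, ("b", "b"): 0.6}
false_neg, false_pos = 0.1, 0.05  # assumed values, not estimates from the paper

# The same kind of observation is far more credible inside a dense group.
within = posterior_edge_probability(True, block_prior[("a", "a")], false_neg, false_pos)
between = posterior_edge_probability(True, block_prior[("a", "b")], false_neg, false_pos)

# An unobserved pair inside a dense group keeps a non-negligible edge probability.
missing_within = posterior_edge_probability(False, block_prior[("a", "a")], false_neg, false_pos)

print(f"observed, within group:    {within:.3f}")
print(f"observed, between groups:  {between:.3f}")
print(f"unobserved, within group:  {missing_within:.3f}")
```

With these assumed numbers, an observed edge between two sparsely connected groups receives a much lower posterior probability than an observed edge inside a dense group, which is the sense in which correlations between edges can act as a proxy for uncertainty.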