## Theory and Measure of Networks

**Reporter**

Jean-Baptiste Rouquier Contributors

Cyril Bertelle Jean-Philippe Cointet Jean-Pierre Gaudin Damien Olivier Matthieu Latapy Lionel Tabourier

**Keywords**

Complex networks, Metrology, Hierarchical clustering, Communities, Graphs, Dynamics, Robustness, Partial data, Contextual data, Diffusion processes.

**Introduction**

Technology, society and nature produce data at an ever increasing rate, and part of this data can be modeled as graphs. In many fields, the graph point of view sheds light complementary to the specialized analysis proper to each field. The complex network approach is a way to work on the relational structure characterizing most complex systems.

In order to define the boundaries of the complex network domain, we suggest the three following criteria: this domain deals with graphs that are:

built from observation data

sufficiently large to allow statistical analysis

evolving mainly by local mechanisms

Much progress has been done during recent years on this specific area, revealing in particular some features common to many networks from various fields (existence of a giant connected component, short distance between nodes, heterogeneous degree distribution, tendency to share common neighbours etc).

Beyond these characteristics, what are the current challenges of complex networks? Our aim is to identify transverse problems from various fields, or challenges asking for general methods. Networks are considered here not only as a tool to study complex systems, but also as an object of study in itself.

**1.1.1. Metrology and modeling****1.1.1.1. Incomplete, noisy, biased data**

In most cases the available data obtained are the result of an intricate measurement protocol which is by nature incomplete and biased.

Data incompleteness can be due to various reasons: privacy concerns, retention strategies… But beyond that, in many cases a massive collection is too costly to be implemented and sometimes even irrelevant.

Bias and noise are intrinsic features of most available data collection procedures. The example of the Internet is typical: routing links are discovered with very heterogeneous probability and noise is produced by phenomena such as load-balancing which creates fake links.

An important challenge for the field consists in overcoming these issues through innovative methods and tools for data collection. In particular, the barrier of cost or feasibility may be reduced by methods that rely on embedded systems such as mobile phones, devices equipped with RFID chips (like Navigo or Oyster), software offered to the user, etc.

Three main directions seem relevant to deal with such situations:

* Use of artificial models to replace missing data, smooth noise, or detect inconsistencies; * Rely on other sources to fill gaps or cross-check the measurements

* Focus on specific meaningful properties which bias can be estimated and even corrected using the knowledge we have of the collection process.

**1.1.1.2. Formalism enrichment**

The network formalism is based on a strictly relational approach (and homogeneous in terms of nodes) which needs to be driven closer to actual systems.

Need for new contexts

This includes geographical, demography, economics and politics approaches. For example, how to deal with data concerning the involvement of partners in an interacting group? Taking that into account calls for an enriched analysis of relational structures, and vice versa: the network approach sheds light on these contexts.

These issues request important modeling efforts, to control what information is lost in the modeling process which is necessarily simplifying – e.g. by approximating the economy of a city population by some flows such as water and money.

Tools adaptation

We therefore must develop procedures capable of taking into account more detailed information, beyond the already wide-spread case of directed and / or weighted networks:

* The overlay of several network (multi-network)

* The semantic of links (e.g. text content attached to an exchange of emails)

* Heterogeneous nodes (bipartite networks if there are two types of node, but possibly more)

**1.1.2. Analyzing networks: from statics to dynamics**

**1.1.2.1 Brief overview of past and present work**

During the last decade, important efforts have been dedicated to seize the properties of static networks:

* generation models fitting real networks (e.g. Goldenberg, Zheng, Fienberg, Airoldi, 2009) * promising frameworks of visualization software (Gephi, Tulip…)

* many parameters to quantify their topology and efficient algorithms to compute them: at a

local scale (patterns, node importance) as well as at a global one (diameter, centrality…), see da Costa Rodrigues Travieso Villas Boas 2007

* algorithms to discern structure:

– community detection methods to describe their organization (see Fortunato 2010) – small world structure and navigability

– hierarchical clustering (like a dendrogram).

Those concepts help to address the need to classify networks. A classic distinction is between natural and man-made networks. There is a tendency towards assortativity in the former, where links derive from affinity, while in the latter, disassortativity is often observed and external constraints have an influence (like public policy on business networks). But this distinction about the origin of data has limits in its operative strength, other classifications should be looked for. Indeed, we now need discriminating properties to classify networks, and ways to relate networks across fields, to ease the porting of methods, to gain insights and intuition from other fields

**1.2.1.2. Measuring and analysing the dynamics**

Yet there is now a great interest in taking into account the dynamical aspects of these large networks (sometimes also called “diachronic” or “longitudinal” data), as interaction patterns evolve through time. This induces changes both on and of the network itself, revealing events which occur through its life and explaining its structure. In the long term such study would provide control on trajectories of the structure by local or global actions on the network (like a public policy). If those events are incidents with negative consequences, it is important to provide ways to make the network robust against accidents and resilient to quickly recover from them.

Collecting dynamical data induces new metrological problems that have to be tackled: see the example of the Internet infrastructure (network routers and servers) whose dynamics is faster than the measurement process.

A well-spread point of view consists in analyzing the dynamical network as a sequence of static snapshots. This involves adapting the existing tools to a larger amount of data, but beyond that, this method is not sufficient to grasp comprehensively the dynamics of the network: we need to propose intrinsically dynamical notions that would take into account its evolving nature.

**1.2.1.3. Evolution of the structure**

Understanding the multiscale organization (i.e. the hierarchy) is crucial to understand both how large scale emerges from local interactions (bottom-up) and conversely how global institutions and existing structure influence individual entities (top-down).

Community structure evolution is a central issue in characterizing the organization of networks: how to adapt the efficient tools that have been proposed to describe how a community appears, dies, splits or merges? Or is it possible to locate community cores which would be particularly stable through time? To discern steps during the evolution?

One may also cite event detection, link prediction, and others (many of which still being to invent).

**1.2.1.4. Diffusion processes on networks**

Another aspect of the dynamics consists in studying phenomena occurring on the network and in particular spreading processes. From a virus epidemic to a packet of information transferred through peer-to-peer exchanges, these problems can be described as diffusion processes on a complex network.

Still important efforts must be realized in order to propose efficient estimates of real spreading phenomena characteristics:

* the records of diffusion processes are scarce, often noisy and relies on unverifiable hypotheses, but recent data collection campaigns promise imminent progresses in this field

* spreading models are mostly based on intuition and rarely cross-checked with real-world data

Then we will be able to study obvious links with the previous section: an evolving structure affects the diffusion phenomena, while agents on a network can (in some models) affect the network topology (stigmergy).

**References**

S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.U. Hwang. “Complex Networks: Structure and Dynamics”, Physics Reports 424(2006), pp 175-308

Alain Barrat, Mac Barthélemy, Alessandro Vespignani. “Dynamical Proecesses on Complex Networks”, Cambridge University Press, 2008

Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg and Edoardo M. Airoldi, “A Survey of Statistical Network Models”, 2009

Santo Fortunato, “Community detection in graphs”, 2010

L. da F. Costa, F. A. Rodrigues, G. Travieso and P. R. Villas Boas, “Characterization of Complex Networks: A Survey of measurements”, Advances in Physics 57, 2007