# Percolation Centrality


Centrality as a concept is largely subjective because it answers the question of which nodes are most important in a network.  Importance depends heavily on the question you're trying to answer: if you want to know who has the most connections in a network, you only need to count degrees.  If you're trying to understand information transmission, Betweenness is great.  For cases where you want to understand the ability to transmit information over time, given specific nodal properties, you need something like Percolation Centrality (PC) - which is today's topic of discussion!

I want to talk about Betweenness first, as it's the basis for PC.  Betweenness as a concept assumes that information travels over the "path of least resistance" - if your mother-in-law knows a joke, you're more likely to hear it from her than from the Pope - who may have told his friend, who tells his gardener, who tells her cousin, who tells her accountant, who finally tells you.  Accordingly, Betweenness is calculated as the proportion of shortest paths that pass through a given node.  Under the shortest-path assumption above, that proportion describes the relative control a node has over the transmission of information - in this case, the telling of a joke.
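To make the shortest-path idea concrete, here is a minimal pure-Python sketch of Brandes' algorithm for unweighted betweenness - a generic textbook implementation for illustration, not the Aster code discussed later in this post:

```python
from collections import deque

def betweenness(adj):
    """Unweighted betweenness via Brandes' algorithm.

    adj maps each node to an iterable of its neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and
        # recording predecessors along those paths.
        stack, queue = [], deque([s])
        pred = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # w lies on a shortest path via v
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# On a path a-b-c, every a<->c shortest path passes through b:
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(betweenness(adj))   # b scores highest; endpoints score 0
```

Note that this counts each undirected pair twice (once per direction); divide the scores by 2 if you want the conventional undirected values.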

The question then becomes: how does this control change over time and with nodal states?  If your mother-in-law doesn't know the joke, she can't tell you the joke, so her ability to transmit the information (the joke) would be 0.  This ability is what I'll refer to as a nodal state, and it is one of the key additions PC makes over Betweenness Centrality.  At each point in time, PC is computed as the proportion of percolated paths that pass through a given node, where percolated paths are paths that originate from a percolated source.  If you're interested in learning more about the mathematical foundation of Percolation Centrality, please refer to the PLoS One publication by Piraveenan et al. found here

As part of my M.S. thesis, I developed logic to parallelize the computation of Percolation Centrality and extended Aster's Betweenness code to accommodate the necessary changes.  The key files are attached to this post; however, if dependencies are needed, please contact Mike Rote (Mike.Rote@Teradata.com) for the complete software package.

The Percolation Centrality function takes the same arguments as the Betweenness function, with the exception that the "source" input is mandatory and should be a table of "percolated" nodes (node ID is the only required column).  More information regarding the Aster implementation can be found in my thesis document ("Bulk Synchronous Parallel Implementation of Percolation Centrality for Large Scale Graphs"), which will be published through OhioLink.

Percolation Centrality can be used in applications from diverse fields.  Examples include:

• Epidemiology (identifying high-risk contagion points in a social network)
• Marketing (identifying potential recruiters in a network)
• Transportation (issue tracking)

Cheers!