graphframes.examples package

Example Graphs and Code

class graphframes.examples.BeliefPropagation[source]

Example code for Belief Propagation (BP)

This provides a template for building customized BP algorithms for different types of graphical models.

This example:

  • Ising model on a grid
  • Parallel Belief Propagation using colored fields

Ising models are probabilistic graphical models over binary variables (see Graphs.gridIsingModel()).

Belief Propagation (BP) provides marginal probabilities of the values of the variables xi i.e., P(xi) for each i. This allows a user to understand likely values of variables. See Wikipedia for more information on BP.

We use a batch synchronous BP algorithm, where batches of vertices are updated synchronously. We follow the mean field update algorithm in Slide 13 of the talk slides from: Wainwright. “Graphical models, message-passing algorithms, and convex optimization.”

The batches are chosen according to a coloring. For background on graph colorings for inference, see for example: Gonzalez et al. “Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees.” AISTATS, 2011.

The BP algorithm works by:

  • Coloring the graph by assigning a color to each vertex such that no neighboring vertices share the same color.
  • In each step of BP, update all vertices of a single color. Alternate colors.
classmethod runBPwithGraphFrames(g, numIter)[source]

Run Belief Propagation using GraphFrame.

This implementation of BP shows how to use GraphFrame’s aggregateMessages method.

class graphframes.examples.Graphs(spark)[source]

Example GraphFrames for testing the API

Parameters:spark – SparkSession
friends()[source]

A GraphFrame of friends in a (fake) social network.

gridIsingModel(n, vStd=1.0, eStd=1.0)[source]

Grid Ising model with random parameters.

Ising models are probabilistic graphical models over binary variables xi. Each binary variable xi corresponds to one vertex, and it may take values -1 or +1. The probability distribution P(X) (over all xi) is parameterized by vertex factors ai and edge factors bij:

P(X) = (1/Z) * exp[ sum_i a_i x_i + sum_{ij} b_{ij} x_i x_j ]

where Z is the normalization constant (partition function). See Wikipedia for more information on Ising models.

Each vertex is parameterized by a single scalar ai. Each edge is parameterized by a single scalar bij.

Parameters:
  • n – Length of one side of the grid. The grid will be of size n x n.
  • vStd – Standard deviation of normal distribution used to generate vertex factors “a”. Default of 1.0.
  • eStd – Standard deviation of normal distribution used to generate edge factors “b”. Default of 1.0.
Returns:

GraphFrame. Vertices have columns “id” and “a”. Edges have columns “src”, “dst”, and “b”. Edges are directed, but they should be treated as undirected in any algorithms run on this model. Vertex IDs are of the form “i,j”. E.g., vertex “1,3” is in the second row and fourth column of the grid.