graphframes.examples package¶

Example Graphs and Code¶

class graphframes.examples.BeliefPropagation[source]¶

Example code for Belief Propagation (BP)

This provides a template for building customized BP algorithms for different types of graphical models.

This example:

Ising model on a grid
Parallel Belief Propagation using colored fields

Ising models are probabilistic graphical models over binary variables (see Graphs.gridIsingModel()).

Belief Propagation (BP) provides marginal probabilities of the values of the variables x_i i.e., P(x_i) for each i. This allows a user to understand likely values of variables. See Wikipedia for more information on BP.

We use a batch synchronous BP algorithm, where batches of vertices are updated synchronously. We follow the mean field update algorithm in Slide 13 of the talk slides from: Wainwright. “Graphical models, message-passing algorithms, and convex optimization.”

The batches are chosen according to a coloring. For background on graph colorings for inference, see for example: Gonzalez et al. “Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees.” AISTATS, 2011.

The BP algorithm works by:

Coloring the graph by assigning a color to each vertex such that no neighboring vertices share the same color.
In each step of BP, update all vertices of a single color. Alternate colors.

classmethod runBPwithGraphFrames(g, numIter)[source]¶

Run Belief Propagation using GraphFrame.

This implementation of BP shows how to use GraphFrame’s aggregateMessages method.

class graphframes.examples.Graphs(spark)[source]¶

Example GraphFrames for testing the API

Parameters: spark – SparkSession

friends()[source]¶: A GraphFrame of friends in a (fake) social network.

gridIsingModel(n, vStd=1.0, eStd=1.0)[source]¶

Grid Ising model with random parameters.

Ising models are probabilistic graphical models over binary variables x_i. Each binary variable x_i corresponds to one vertex, and it may take values -1 or +1. The probability distribution P(X) (over all x_i) is parameterized by vertex factors a_i and edge factors b_ij:

P(X) = (1/Z) * exp[ sum_i a_i x_i + sum_{ij} b_{ij} x_i x_j ]

where Z is the normalization constant (partition function). See Wikipedia for more information on Ising models.

Each vertex is parameterized by a single scalar a_i. Each edge is parameterized by a single scalar b_ij.

Parameters

n – Length of one side of the grid. The grid will be of size n x n.
vStd – Standard deviation of normal distribution used to generate vertex factors “a”. Default of 1.0.
eStd – Standard deviation of normal distribution used to generate edge factors “b”. Default of 1.0.

Returns

GraphFrame. Vertices have columns “id” and “a”. Edges have columns “src”, “dst”, and “b”. Edges are directed, but they should be treated as undirected in any algorithms run on this model. Vertex IDs are of the form “i,j”. E.g., vertex “1,3” is in the second row and fourth column of the grid.

graphframes.examples package¶

Example Graphs and Code¶

Table of Contents

Previous topic

Next topic

This Page