graphframes.examples package¶
Example Graphs and Code¶
- class graphframes.examples.BeliefPropagation[source]¶
Example code for Belief Propagation (BP)
This provides a template for building customized BP algorithms for different types of graphical models.
This example:
Ising model on a grid
Parallel Belief Propagation using colored fields
Ising models are probabilistic graphical models over binary variables (see
Graphs.gridIsingModel()
).Belief Propagation (BP) provides marginal probabilities of the values of the variables x_{i} i.e., P(x_{i}) for each i. This allows a user to understand likely values of variables. See Wikipedia for more information on BP.
We use a batch synchronous BP algorithm, where batches of vertices are updated synchronously. We follow the mean field update algorithm in Slide 13 of the talk slides from: Wainwright. “Graphical models, message-passing algorithms, and convex optimization.”
The batches are chosen according to a coloring. For background on graph colorings for inference, see for example: Gonzalez et al. “Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees.” AISTATS, 2011.
The BP algorithm works by:
Coloring the graph by assigning a color to each vertex such that no neighboring vertices share the same color.
In each step of BP, update all vertices of a single color. Alternate colors.
- class graphframes.examples.Graphs(spark)[source]¶
Example GraphFrames for testing the API
- Parameters
spark – SparkSession
- gridIsingModel(n, vStd=1.0, eStd=1.0)[source]¶
Grid Ising model with random parameters.
Ising models are probabilistic graphical models over binary variables x_{i}. Each binary variable x_{i} corresponds to one vertex, and it may take values -1 or +1. The probability distribution P(X) (over all x_{i}) is parameterized by vertex factors a_{i} and edge factors b_{ij}:
P(X) = (1/Z) * exp[ sum_i a_i x_i + sum_{ij} b_{ij} x_i x_j ]
where Z is the normalization constant (partition function). See Wikipedia for more information on Ising models.
Each vertex is parameterized by a single scalar a_{i}. Each edge is parameterized by a single scalar b_{ij}.
- Parameters
n – Length of one side of the grid. The grid will be of size n x n.
vStd – Standard deviation of normal distribution used to generate vertex factors “a”. Default of 1.0.
eStd – Standard deviation of normal distribution used to generate edge factors “b”. Default of 1.0.
- Returns
GraphFrame. Vertices have columns “id” and “a”. Edges have columns “src”, “dst”, and “b”. Edges are directed, but they should be treated as undirected in any algorithms run on this model. Vertex IDs are of the form “i,j”. E.g., vertex “1,3” is in the second row and fourth column of the grid.