This is a primitive for implementing graph algorithms. This method aggregates values from the neighboring edges and vertices of each vertex. See AggregateMessages for detailed documentation.
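The aggregation described above can be pictured with a small plain-Python model. This is only a sketch of the idea, not the GraphFrames API: the toy graph, the `aggregate_messages` helper and its parameters are all hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex id -> age, plus directed (src, dst) edges.
ages = {"a": 34, "b": 36, "c": 30}
edges = [("a", "b"), ("b", "c"), ("c", "b")]


def aggregate_messages(edges, to_src, to_dst, combine):
    """Send one message per edge endpoint, then combine the messages per vertex."""
    inbox = defaultdict(list)
    for src, dst in edges:
        if to_src is not None:
            inbox[src].append(to_src(src, dst))
        if to_dst is not None:
            inbox[dst].append(to_dst(src, dst))
    return {vertex: combine(messages) for vertex, messages in inbox.items()}


# Each vertex receives the age of every neighbor, whatever the edge direction,
# and the received ages are summed.
neighbor_age_sum = aggregate_messages(
    edges,
    to_src=lambda src, dst: ages[dst],
    to_dst=lambda src, dst: ages[src],
    combine=sum,
)
print(neighbor_age_sum)
```

Passing `None` for `to_src` or `to_dst` in this sketch restricts the messages to one direction, for example to aggregate only over in-edges.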
Breadth-first search (BFS)
Refer to the documentation of org.graphframes.lib.BFS for the description of the output.
Persist the dataframe representation of vertices and edges of the graph with the default storage level.
Connected component algorithm.
See org.graphframes.lib.ConnectedComponents for more details.
The degree of each vertex in the graph, returned as a DataFrame with two columns: GraphFrame.ID (the ID of the vertex) and "degree" (the degree of the vertex).
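As a rough illustration of what a per-vertex degree table holds, here is a plain-Python sketch over a hypothetical edge list. It assumes that the degree counts both in-edges and out-edges, and that a vertex with no edges produces no row; both are assumptions about the semantics rather than statements taken from the text above.

```python
from collections import Counter

# Hypothetical (src, dst) pairs; vertex "d" exists but has no edges.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "b")]

degree = Counter()
for src, dst in edges:
    degree[src] += 1  # the edge counts once for its source
    degree[dst] += 1  # and once for its destination

# Two "columns" per row: vertex id and degree.
rows = sorted(degree.items())
print(rows)
```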
Version of edgeColumns which maps column names to indices in the Rows.
The column names in the edges DataFrame, in order.
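The relationship between an ordered list of column names and its name-to-index map can be sketched in a few lines of plain Python. The column names and the row below are hypothetical; the sketch only shows how such a map lets a positional row be read by name.

```python
# Hypothetical ordered column names of an edges table.
edge_columns = ["src", "dst", "relationship"]

# Map each column name to its position in a row.
edge_column_map = {name: index for index, name in enumerate(edge_columns)}

# A row is positional, so the map is what turns a name into a lookup.
row = ("a", "b", "friend")
relationship = row[edge_column_map["relationship"]]
print(edge_column_map, relationship)
```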
The dataframe representation of the edges of the graph.
It contains two columns called GraphFrame.SRC and GraphFrame.DST that contain the ids of the source vertex and the destination vertex of each edge, respectively. It may also contain various other columns with user-defined attributes for each edge.
For symmetric graphs, both pairs src -> dst and dst -> src are present with the same attributes for each pair.
The order of the columns is available in edgeColumns.
Motif finding: Searching the graph for structural patterns
Motif finding uses a simple Domain-Specific Language (DSL) for expressing structural queries. For example, graph.find("(a)-[e]->(b); (b)-[e2]->(a)") will search for pairs of vertices a,b connected by edges in both directions. It will return a DataFrame of all such structures in the graph, with columns for each of the named elements (vertices or edges) in the motif. In this case, the returned columns will be in order of the pattern: "a, e, b, e2."
DSL for expressing structural patterns:
"(a)-[e]->(b)" expresses an edge e from vertex a to vertex b. Note that vertices are denoted by parentheses (a), while edges are denoted by square brackets [e].
Edges are joined with semicolons: "(a)-[e]->(b); (b)-[e2]->(c)" specifies two edges from a to b to c.
"(a)-[e]->(b)" has three named elements: vertices a,b and edge e. These names serve two purposes:
First, names identify elements shared between edges: "(a)-[e]->(b); (b)-[e2]->(c)" specifies that the same vertex b is the destination of edge e and source of edge e2.
Second, names are used as column names in the result DataFrame. If a motif contains named vertex a, then the result DataFrame will contain a column "a" which is a StructType with sub-fields equivalent to the schema (columns) of GraphFrame.vertices. Similarly, an edge e in a motif will produce a column "e" in the result DataFrame with sub-fields equivalent to the schema (columns) of GraphFrame.edges.
Names may be omitted: "(a)-[]->(b)" expresses an edge between vertices a,b but does not assign a name to the edge. There will be no column for the anonymous edge in the result DataFrame. Similarly, "(a)-[e]->()" indicates an out-edge of vertex a but does not name the destination vertex.
An edge may be negated with a leading !: "(a)-[]->(b); !(b)-[]->(a)" finds edges from a to b for which there is *no* edge from b to a.
More complex queries, such as queries which operate on vertex or edge attributes, can be expressed by applying filters to the result DataFrame.
Pattern specifying a motif to search for.
DataFrame containing all instances of the motif.
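To make the pattern semantics above concrete, the following plain-Python sketch evaluates three of the example motifs by hand over a hypothetical edge list. It does not parse the DSL or call GraphFrames; each comprehension is a hand-written equivalent of one pattern, and each result row is a dict keyed by the named elements. The sketch assumes that differently named vertices are not required to be distinct.

```python
# Hypothetical toy graph as (src, dst) pairs.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
edge_set = set(edges)

# "(a)-[e]->(b); (b)-[e2]->(a)": pairs of vertices connected in both directions.
# Columns follow the order of the pattern: a, e, b, e2.
both_directions = [
    {"a": a, "e": (a, b), "b": b, "e2": (b2, a2)}
    for (a, b) in edges
    for (b2, a2) in edges
    if b2 == b and a2 == a
]

# "(a)-[]->(b); !(b)-[]->(a)": edges with no reverse edge.
# The anonymous edges produce no column, so only a and b appear.
one_way = [{"a": a, "b": b} for (a, b) in edges if (b, a) not in edge_set]

# "(a)-[e]->(b); (b)-[e2]->(c)": two-edge chains sharing the middle vertex b.
# Assumption: a and c may be the same vertex.
chains = [
    {"a": a, "e": (a, b), "b": b, "e2": (b2, c), "c": c}
    for (a, b) in edges
    for (b2, c) in edges
    if b2 == b
]

print(len(both_directions), one_way, len(chains))
```

Filtering on attributes would then be an ordinary filter over these rows, which mirrors the advice above to apply filters to the result.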
The in-degree of each vertex in the graph, returned as a DataFrame with two columns: GraphFrame.ID (the ID of the vertex) and "inDegree" (the in-degree of the vertex).
Label propagation algorithm.
See org.graphframes.lib.LabelPropagation for more details.
The out-degree of each vertex in the graph, returned as a DataFrame with two columns: GraphFrame.ID (the ID of the vertex) and "outDegree" (the out-degree of the vertex).
PageRank algorithm.
See org.graphframes.lib.PageRank for more details.
Parallel personalized PageRank algorithm.
See org.graphframes.lib.ParallelPersonalizedPageRank for more details.
Persist the dataframe representation of vertices and edges of the graph with the given storage level.
One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
Persist the dataframe representation of vertices and edges of the graph with the default storage level.
Shortest paths algorithm.
See org.graphframes.lib.ShortestPaths for more details.
Strongly connected components algorithm.
See org.graphframes.lib.StronglyConnectedComponents for more details.
SVD++ algorithm.
See org.graphframes.lib.SVDPlusPlus for more details.
Converts this GraphFrame instance to a GraphX Graph.
Vertex and edge attributes are the original rows in vertices and edges, respectively.
Note that vertex (and edge) attributes include vertex IDs (and source, destination IDs) in order to support non-Long vertex IDs. If the vertex IDs are not convertible to Long values, then the values are indexed in order to generate corresponding Long vertex IDs (which is an expensive operation).
The column ordering of the returned Graph vertex and edge attributes is specified by vertexColumns and edgeColumns, respectively.
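The indexing step for vertex IDs that are not convertible to Long values can be pictured as follows. This plain-Python sketch only illustrates the idea of assigning integer IDs while keeping the original IDs in the attributes; how the library actually assigns the indices is not described above, and all names and data here are hypothetical.

```python
# Hypothetical vertex IDs that cannot be converted to integers.
vertex_ids = ["alice", "bob", "carol"]
edges = [("alice", "bob"), ("bob", "carol")]

# Assign each distinct ID an integer index (a stand-in for a Long vertex ID).
long_id = {vid: index for index, vid in enumerate(vertex_ids)}

# Keep the original IDs inside the attributes so they stay recoverable.
indexed_vertices = [(long_id[v], {"id": v}) for v in vertex_ids]
indexed_edges = [
    (long_id[src], long_id[dst], {"src": src, "dst": dst}) for src, dst in edges
]

print(indexed_vertices)
print(indexed_edges)
```

On a distributed dataset, building such a mapping requires touching every distinct ID, which gives some intuition for why the text calls the operation expensive.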
Triangle count algorithm.
See org.graphframes.lib.TriangleCount for more details.
Returns triplets: (source vertex)-[edge]->(destination vertex) for all edges in the graph. The DataFrame returned has 3 columns, with names: GraphFrame.SRC, GraphFrame.EDGE, and GraphFrame.DST. The 2 vertex columns have schema matching GraphFrame.vertices, and the edge column has a schema matching GraphFrame.edges.
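The shape of the triplets result can be sketched in plain Python as a join of each edge with its two endpoint vertices. The data is hypothetical, and the literal keys "src", "edge" and "dst" are assumed stand-ins for GraphFrame.SRC, GraphFrame.EDGE and GraphFrame.DST.

```python
# Hypothetical vertex and edge rows.
vertices = [
    {"id": "a", "name": "Alice"},
    {"id": "b", "name": "Bob"},
]
edges = [{"src": "a", "dst": "b", "relationship": "friend"}]

# Index the vertices by id so each edge endpoint can be looked up.
by_id = {v["id"]: v for v in vertices}

# One row per edge: full source vertex, the edge itself, full destination vertex.
triplets = [
    {"src": by_id[e["src"]], "edge": e, "dst": by_id[e["dst"]]} for e in edges
]
print(triplets)
```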
Mark the dataframe representation of vertices and edges of the graph as non-persistent, and remove all blocks for it from memory and disk.
Whether to block until all blocks are deleted.
Mark the dataframe representation of vertices and edges of the graph as non-persistent, and remove all blocks for it from memory and disk.
Version of vertexColumns which maps column names to indices in the Rows.
The column names in the vertices DataFrame, in order.
The dataframe representation of the vertices of the graph.
It contains a column called GraphFrame.ID with the id of the vertex. It may also contain various other columns with user-defined attributes for each vertex.
The order of the columns is available in vertexColumns.
A representation of a graph using DataFrames.