Breadth-first search (BFS)
This method returns a DataFrame of valid shortest paths from vertices matching fromExpr to vertices matching toExpr. If multiple paths are valid and have the same length, the DataFrame will contain one Row for each path. If no paths are valid, the DataFrame will be empty.
Note: "Shortest" means globally shortest path. I.e., if the shortest path between any two vertices matching fromExpr and toExpr has length 5 (edges), then all paths returned by BFS will have length 5.
The returned DataFrame will have the following columns:
from: start vertex of path
e[i]: edge i in the path, indexed from 0
v[i]: intermediate vertex i in the path, indexed from 1
to: end vertex of path
Each of these columns is a StructType whose fields are the same as the columns of GraphFrame.vertices or GraphFrame.edges.
For example, suppose we have a graph g. Say the vertices DataFrame of g has columns "id" and "job", and the edges DataFrame of g has columns "src", "dst", and "relation".
If we found a path of 3 edges, each row would have columns:
from | e0 | v1 | e1 | v2 | e2 | to
In the above row, each vertex column (from, v1, v2, to) would have fields "id" and "job" (just like g.vertices). Each edge column (e0, e1, e2) would have fields "src", "dst", and "relation".
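The example graph described above can be sketched as follows. The builder calls (fromExpr, toExpr, maxPathLength, edgeFilter, run) are the ones documented on this page; the local SparkSession setup, the sample vertex and edge data, and the names BfsExample and buildGraph are assumptions made only for illustration. Running it requires Spark and GraphFrames on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object BfsExample {
  // Hypothetical sample graph: vertices have "id" and "job",
  // edges have "src", "dst", and "relation".
  def buildGraph(spark: SparkSession): GraphFrame = {
    import spark.implicits._
    val vertices = Seq(
      ("a", "engineer"), ("b", "manager"), ("c", "analyst"), ("d", "director")
    ).toDF("id", "job")
    val edges = Seq(
      ("a", "b", "friend"), ("b", "c", "follow"), ("c", "d", "friend")
    ).toDF("src", "dst", "relation")
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bfs-example")
      .getOrCreate()
    val g = buildGraph(spark)

    // Search from vertex "a" to any vertex whose job is "director".
    // The only such path here is a -> b -> c -> d (3 edges), so each row
    // should carry the vertex columns from, v1, v2, to and the edge
    // columns e0, e1, e2.
    val paths = g.bfs
      .fromExpr("id = 'a'")
      .toExpr("job = 'director'")
      .maxPathLength(5)
      .run()
    paths.printSchema()
    paths.show(truncate = false)

    // Same search, but edges with relation "follow" may not be crossed.
    // The only path uses such an edge, so this result should be empty.
    val filtered = g.bfs
      .fromExpr("id = 'a'")
      .toExpr("job = 'director'")
      .edgeFilter("relation != 'follow'")
      .run()
    filtered.show(truncate = false)

    spark.stop()
  }
}
```

printSchema() is used here because it shows the nested StructType fields of each vertex and edge column directly.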
If there are ties, then each of the equal paths will be returned as a separate Row.
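To make the "globally shortest" and tie semantics concrete, here is a minimal in-memory sketch in plain Scala. It is not the GraphFrames implementation, which operates on DataFrames; the object GlobalShortestSketch, the function shortestPaths, and the sample edge list are hypothetical names used only for illustration, and the sets of start and target vertices stand in for fromExpr and toExpr.

```scala
object GlobalShortestSketch {
  type Edge = (String, String) // (src, dst)

  /** Returns every path (as a list of vertex ids) with the globally minimal
    * number of edges from any vertex in `starts` to any vertex in `targets`,
    * or Nil if no path of at most `maxPathLength` edges exists. */
  def shortestPaths(
      edges: Seq[Edge],
      starts: Set[String],
      targets: Set[String],
      maxPathLength: Int = 10): Seq[List[String]] = {

    // Paths are kept reversed (head = current end vertex) so that
    // extending a path by one edge is a cheap prepend.
    @annotation.tailrec
    def loop(frontier: Seq[List[String]], length: Int): Seq[List[String]] = {
      val hits = frontier.filter(path => targets.contains(path.head))
      if (hits.nonEmpty) hits.map(_.reverse) // stop at the first length with a match
      else if (frontier.isEmpty || length >= maxPathLength) Nil
      else {
        val next = for {
          path <- frontier
          (src, dst) <- edges
          if src == path.head && !path.contains(dst)
        } yield dst :: path
        loop(next, length + 1)
      }
    }

    // At length 0 the frontier is just the start vertices, so a vertex that
    // is both a start and a target yields a 0-hop path.
    loop(starts.toSeq.sorted.map(v => List(v)), 0)
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq(
      "a" -> "b", "a" -> "c", "b" -> "d", "c" -> "d", // two 2-edge paths to d
      "a" -> "e", "e" -> "f", "f" -> "g"              // one 3-edge path to g
    )
    // Both d and g are targets, but only the tied 2-edge paths are returned.
    shortestPaths(edges, Set("a"), Set("d", "g")).foreach(println)
    // List(a, b, d)
    // List(a, c, d)
  }
}
```

In this sketch the target g is reachable, but only through a 3-edge path, so it does not appear in the result: the search stops at the first path length that produces any match.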
If one or more vertices match both the from and to conditions, then there is a 0-hop path. The returned DataFrame will have the "from" and "to" columns (as above); however, the "from" and "to" columns will be exactly the same. There will be one row for each vertex in GraphFrame.vertices matching both fromExpr and toExpr.
Parameters:
fromExpr: Spark SQL expression specifying valid starting vertices for the BFS. This condition will be matched against each vertex's id or attributes. To start from a specific vertex, this could be "id = [start vertex id]". To start from multiple valid vertices, this can operate on vertex attributes.
toExpr: Spark SQL expression specifying valid target vertices for the BFS. This condition will be matched against each vertex's id or attributes.
maxPathLength: Limit on the length of paths. If no valid paths of length <= maxPathLength are found, then the BFS is terminated. (default = 10)
edgeFilter: Spark SQL expression specifying edges which may be used in the search. This allows the user to disallow crossing certain edges. Such filters can be applied post-hoc after BFS, but specifying the filter here is more efficient.
Returns: