Gets the connected components algorithm to use.
Gets the broadcast threshold used in propagating component assignments.
Gets checkpoint interval.
Runs the algorithm.
Sets the connected components algorithm to use (default: "graphframes"). Supported algorithms are:
Sets the broadcast threshold used in propagating component assignments (default: 1000000). If a node's degree is greater than this threshold at some iteration, its component assignment is collected and then broadcast back to propagate the assignment to its neighbors. Otherwise, the assignment is propagated by a normal Spark join. This parameter is only used when the algorithm is set to "graphframes".
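The threshold only decides *how* an assignment travels, not *what* it is. The sketch below illustrates that on a single machine, assuming plain Scala collections stand in for Spark DataFrames; the object and method names are hypothetical, and the "broadcast" is just a small in-memory map rather than a real Spark broadcast variable.

```scala
// Hypothetical single-machine illustration of the broadcast-threshold decision.
// Plain Scala collections stand in for Spark DataFrames; no Spark API is used.
object BroadcastThresholdSketch {

  /** Degree of every vertex, treating the edge list as undirected. */
  def degrees(edges: Seq[(Long, Long)]): Map[Long, Int] =
    edges
      .flatMap { case (a, b) => Seq(a, b) }
      .groupBy(identity)
      .map { case (v, occurrences) => v -> occurrences.size }

  /** Splits vertices into (degree > threshold, degree <= threshold). */
  def splitByThreshold(edges: Seq[(Long, Long)], threshold: Int): (Set[Long], Set[Long]) = {
    val (heavy, light) = degrees(edges).partition { case (_, d) => d > threshold }
    (heavy.keySet, light.keySet)
  }

  /**
   * One propagation step: each vertex adopts the smallest label among itself and its
   * neighbors. Labels of high-degree vertices are read from a small "broadcast" map;
   * all other labels come from a join-like lookup. Both paths return the same labels;
   * in a cluster the split would only change how the data moves.
   * Assumes every edge endpoint has an entry in `labels`.
   */
  def propagate(
      edges: Seq[(Long, Long)],
      labels: Map[Long, Long],
      threshold: Int): Map[Long, Long] = {
    val (heavy, _) = splitByThreshold(edges, threshold)
    val broadcastLabels: Map[Long, Long] = labels.filter { case (v, _) => heavy.contains(v) }
    def labelOf(v: Long): Long = broadcastLabels.getOrElse(v, labels(v))
    val symmetric = edges ++ edges.map(_.swap)
    val fromNeighbors: Map[Long, Long] = symmetric
      .groupBy(_._1)
      .map { case (v, es) => v -> es.map { case (_, n) => labelOf(n) }.min }
    labels.map { case (v, l) => v -> math.min(l, fromNeighbors.getOrElse(v, l)) }
  }
}
```

In a star-shaped graph, the hub is the vertex whose label would otherwise be shuffled to many partitions through a skewed join; sending it once to every executor avoids that hot spot.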
Sets checkpoint interval in terms of number of iterations (default: 2).
Checkpointing regularly helps recover from failures, clean up shuffle files, shorten the
lineage of the computation graph, and reduce the complexity of plan optimization.
As of Spark 2.0, the complexity of plan optimization can grow exponentially without
checkpointing.
Hence, disabling checkpointing or setting longer-than-default checkpoint intervals is not recommended.
Checkpoint data is saved under org.apache.spark.SparkContext.getCheckpointDir with the prefix "connected-components".
If the checkpoint directory is not set, this throws a java.io.IOException.
Set a nonpositive value to disable checkpointing.
This parameter is only used when the algorithm is set to "graphframes".
Its default value might change in the future.
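The interval rule can be read as "checkpoint on every `interval`-th iteration, never when `interval` is nonpositive". The sketch below encodes that reading; the 1-based iteration counting and the helper names are assumptions for illustration, not the library's internals.

```scala
// Hypothetical helper mirroring the documented rule. Iterations are assumed to be
// counted from 1; a nonpositive interval disables checkpointing entirely.
object CheckpointScheduleSketch {

  /** True if the given iteration would be checkpointed under `interval`. */
  def shouldCheckpoint(iteration: Int, interval: Int): Boolean =
    interval > 0 && iteration % interval == 0

  /** All iterations in 1..maxIterations at which a checkpoint would be written. */
  def checkpointedIterations(maxIterations: Int, interval: Int): Seq[Int] =
    (1 to maxIterations).filter(i => shouldCheckpoint(i, interval))
}
```

With the default interval of 2, a seven-iteration run would checkpoint after iterations 2, 4 and 6; an interval of 0 or -1 would never checkpoint.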
See also: org.apache.spark.SparkContext.setCheckpointDir in the Spark API docs.
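The directory requirement can be pictured as a small guard that runs before any checkpoint is written. In this sketch the input plays the role of the `Option[String]` returned by `SparkContext.getCheckpointDir`; the helper name, the error message, and the random UUID suffix after the "connected-components" prefix are assumptions made for illustration.

```scala
import java.io.IOException
import java.util.UUID

// Hypothetical guard: `checkpointDir` stands in for SparkContext.getCheckpointDir,
// which returns None until setCheckpointDir has been called.
object CheckpointDirSketch {

  /** Builds a run-specific directory under the checkpoint dir, or fails if none is set. */
  def runCheckpointDir(checkpointDir: Option[String]): String =
    checkpointDir match {
      case Some(dir) =>
        // The UUID suffix is an assumption used to keep concurrent runs apart.
        s"${dir.stripSuffix("/")}/connected-components-${UUID.randomUUID()}"
      case None =>
        throw new IOException(
          "Checkpoint directory is not set; call SparkContext.setCheckpointDir first.")
    }
}
```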
Connected components algorithm.
Computes the connected component membership of each vertex and returns a DataFrame of vertex information with each vertex assigned a component ID.
The resulting DataFrame contains all the vertex information and one additional column (LongType): a unique ID for the vertex's component.
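To make the output shape concrete, here is a minimal single-machine sketch that assigns every vertex a Long component ID. It deliberately swaps in a simpler technique, iterative minimum-label propagation until no label changes, rather than the library's distributed algorithm, and it assumes (for illustration only) that the component ID is the smallest vertex ID in the component. All names are hypothetical.

```scala
// Hypothetical sketch: minimum-label propagation on plain Scala collections.
// Not the GraphFrames implementation; it only illustrates "one Long ID per component".
object ConnectedComponentsSketch {

  /**
   * Returns vertex -> component ID, where the ID is the smallest vertex ID reachable
   * from that vertex. Edges are treated as undirected; isolated vertices keep their
   * own ID. Assumes every edge endpoint also appears in `vertices`.
   */
  def components(vertices: Seq[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    // Undirected adjacency: each edge contributes a neighbor to both endpoints.
    val neighbors: Map[Long, Seq[Long]] =
      (edges ++ edges.map(_.swap)).groupBy(_._1).map { case (v, es) => v -> es.map(_._2) }

    @annotation.tailrec
    def loop(labels: Map[Long, Long]): Map[Long, Long] = {
      // Each vertex takes the minimum of its own label and its neighbors' labels.
      val next = labels.map { case (v, l) =>
        v -> neighbors.getOrElse(v, Seq.empty).map(labels).foldLeft(l)((a, b) => math.min(a, b))
      }
      // Stop once an iteration changes nothing (a fixed point).
      if (next == labels) labels else loop(next)
    }

    loop(vertices.map(v => v -> v).toMap)
  }
}
```

For vertices 1 to 6 with edges (2, 1), (3, 2) and (5, 4), this yields three components: {1, 2, 3} labeled 1, {4, 5} labeled 4, and the isolated vertex 6 labeled 6.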