class GraphFrame extends Logging with Serializable

A representation of a graph using DataFrames.

Linear Supertypes
Serializable, Serializable, Logging, AnyRef, Any

Instance Constructors

  1. new GraphFrame()

    Default constructor is provided to support serialization

    Attributes
    protected

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. def +(other: String): String
    Implicit
    This member is added by an implicit conversion from GraphFrame to any2stringadd[GraphFrame] performed by method any2stringadd in scala.Predef.
    Definition Classes
    any2stringadd
  4. def ->[B](y: B): (GraphFrame, B)
    Implicit
    This member is added by an implicit conversion from GraphFrame to ArrowAssoc[GraphFrame] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @inline()
  5. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  6. def aggregateMessages: AggregateMessages

    This is a primitive for implementing graph algorithms. This method aggregates values from the neighboring edges and vertices of each vertex. See AggregateMessages for detailed documentation.
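
    A minimal sketch of message aggregation, assuming the GraphFrame(vertices, edges) factory and the sendToSrc, sendToDst and agg builders documented under org.graphframes.lib.AggregateMessages. The local Spark session, the "age" column and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.graphframes.GraphFrame
import org.graphframes.lib.AggregateMessages

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("am-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: three vertices with an "age" attribute, chained a -> b -> c.
val vertices = Seq(("a", 10), ("b", 20), ("c", 30)).toDF("id", "age")
val edges = Seq(("a", "b"), ("b", "c")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

val AM = AggregateMessages
val summedAges = g.aggregateMessages
  .sendToSrc(AM.dst("age"))           // each edge sends the destination's age to its source
  .sendToDst(AM.src("age"))           // and the source's age to its destination
  .agg(sum(AM.msg).as("summedAges"))  // sum the messages received by each vertex

// Collect to the driver as id -> summed neighbor ages.
val result = summedAges.collect()
  .map(r => r.getAs[String]("id") -> r.getAs[Long]("summedAges"))
  .toMap
println(result)
```

    Each vertex ends up with the sum of the ages of its neighbors in both directions.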

  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. def bfs: BFS

    Breadth-first search (BFS)

    Refer to the documentation of org.graphframes.lib.BFS for the description of the output.
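
    A minimal sketch of a BFS query, assuming the fromExpr, toExpr, maxPathLength and run builders described in org.graphframes.lib.BFS. The session, vertex ids and edges are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("bfs-sketch").getOrCreate()
import spark.implicits._

// Hypothetical chain a -> b -> c.
val vertices = Seq("a", "b", "c").toDF("id")
val edges = Seq(("a", "b"), ("b", "c")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

// Search for shortest paths from vertex "a" to vertex "c", at most 3 edges long.
val paths = g.bfs
  .fromExpr("id = 'a'")
  .toExpr("id = 'c'")
  .maxPathLength(3)
  .run()

paths.show(truncate = false)
```

    Each row of the result describes one path; here the only path is a -> b -> c.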

  9. def cache(): GraphFrame.this.type

    Persist the dataframe representation of vertices and edges of the graph with the default storage level.

  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  11. def connectedComponents: ConnectedComponents

    Connected component algorithm.

    See org.graphframes.lib.ConnectedComponents for more details.

  12. lazy val degrees: DataFrame

    The degree of each vertex in the graph, returned as a DataFrame with two columns:

    • GraphFrame.ID: the ID of the vertex
    • "degree" (integer): the degree of the vertex

    Note that vertices with 0 edges are not returned in the result.
    Annotations
    @transient()
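
    A minimal sketch comparing degrees, inDegrees and outDegrees on a small graph, assuming the GraphFrame(vertices, edges) factory. The session and sample data are hypothetical; vertex "d" is deliberately isolated to show that it is omitted.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("degree-sketch").getOrCreate()
import spark.implicits._

// Hypothetical graph: a -> b -> c, plus an isolated vertex d.
val vertices = Seq("a", "b", "c", "d").toDF("id")
val edges = Seq(("a", "b"), ("b", "c")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

g.degrees.show()     // a: 1, b: 2, c: 1 (d is absent)
g.inDegrees.show()   // b: 1, c: 1
g.outDegrees.show()  // a: 1, b: 1
```

    To keep zero-degree vertices, one option is to left-join g.vertices with the degree DataFrame and fill the missing values with 0.
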
  13. def dropIsolatedVertices(): GraphFrame

    Drop isolated vertices, i.e. vertices that are not contained in any edge.

  14. def edgeColumnMap: Map[String, Int]

    Version of edgeColumns which maps column names to indices in the Rows.

  15. def edgeColumns: Array[String]

    The column names in the edges DataFrame, in order.

    Helper method for toGraphX which specifies the schema of edge attributes. The edge attributes of the returned Graph are given as a Row, and this method defines the column ordering in that Row.

  16. def edges: DataFrame

    The dataframe representation of the edges of the graph.

    It contains two columns called GraphFrame.SRC and GraphFrame.DST that contain the ids of the source vertex and the destination vertex of each edge, respectively. It may also contain various other columns with user-defined attributes for each edge.

    For symmetric graphs, both pairs src -> dst and dst -> src are present with the same attributes for each pair.

    The order of the columns is available in edgeColumns.

  17. def ensuring(cond: (GraphFrame) ⇒ Boolean, msg: ⇒ Any): GraphFrame
    Implicit
    This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  18. def ensuring(cond: (GraphFrame) ⇒ Boolean): GraphFrame
    Implicit
    This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  19. def ensuring(cond: Boolean, msg: ⇒ Any): GraphFrame
    Implicit
    This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  20. def ensuring(cond: Boolean): GraphFrame
    Implicit
    This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  21. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  22. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  23. def filterEdges(conditionExpr: String): GraphFrame

    Filter the edges according to a String expression, keeping all vertices.

  24. def filterEdges(condition: Column): GraphFrame

    Filter the edges according to a Column expression, keeping all vertices.

  25. def filterVertices(conditionExpr: String): GraphFrame

    Filter the vertices according to a String expression, removing every edge that contains a dropped vertex.

  26. def filterVertices(condition: Column): GraphFrame

    Filter the vertices according to a Column expression, removing every edge that contains a dropped vertex.
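
    A minimal sketch of subgraph selection with filterVertices, filterEdges and dropIsolatedVertices, assuming the GraphFrame(vertices, edges) factory. The session, the "age" and "relationship" columns and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("subgraph-sketch").getOrCreate()
import spark.implicits._

// Hypothetical social graph.
val vertices = Seq(("a", 34), ("b", 36), ("c", 30), ("d", 29)).toDF("id", "age")
val edges = Seq(
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("a", "d", "friend")
).toDF("src", "dst", "relationship")
val g = GraphFrame(vertices, edges)

// Keep vertices older than 30; edges touching a dropped vertex are removed too.
val older = g.filterVertices("age > 30")

// Keep only "friend" edges, then drop vertices left without any edge.
val friends = g.filterEdges(col("relationship") === "friend").dropIsolatedVertices()

older.edges.show()
friends.vertices.show()
```

    Here older keeps vertices a and b with the single edge a -> b, while friends keeps the two "friend" edges and drops vertex c.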

  27. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  28. def find(pattern: String): DataFrame

    Motif finding: Searching the graph for structural patterns

    Motif finding uses a simple Domain-Specific Language (DSL) for expressing structural queries. For example, graph.find("(a)-[e]->(b); (b)-[e2]->(a)") will search for pairs of vertices a,b connected by edges in both directions. It will return a DataFrame of all such structures in the graph, with columns for each of the named elements (vertices or edges) in the motif. In this case, the returned columns will be in order of the pattern: "a, e, b, e2."

    DSL for expressing structural patterns:

    • The basic unit of a pattern is an edge. For example, "(a)-[e]->(b)" expresses an edge e from vertex a to vertex b. Note that vertices are denoted by parentheses (a), while edges are denoted by square brackets [e].
    • A pattern is expressed as a union of edges. Edge patterns can be joined with semicolons. Motif "(a)-[e]->(b); (b)-[e2]->(c)" specifies two edges from a to b to c.
    • Within a pattern, names can be assigned to vertices and edges. For example, "(a)-[e]->(b)" has three named elements: vertices a,b and edge e. These names serve two purposes:
      • The names can identify common elements among edges. For example, "(a)-[e]->(b); (b)-[e2]->(c)" specifies that the same vertex b is the destination of edge e and source of edge e2.
      • The names are used as column names in the result DataFrame. If a motif contains named vertex a, then the result DataFrame will contain a column "a" which is a StructType with sub-fields equivalent to the schema (columns) of GraphFrame.vertices. Similarly, an edge e in a motif will produce a column "e" in the result DataFrame with sub-fields equivalent to the schema (columns) of GraphFrame.edges.
      • Be aware that names do *not* identify *distinct* elements: two elements with different names may refer to the same graph element. For example, in the motif "(a)-[e]->(b); (b)-[e2]->(c)", the names a and c could refer to the same vertex. To restrict named elements to be distinct vertices or edges, use post-hoc filters such as resultDataframe.filter("a.id != c.id").
    • It is acceptable to omit names for vertices or edges in motifs when not needed. E.g., "(a)-[]->(b)" expresses an edge between vertices a,b but does not assign a name to the edge. There will be no column for the anonymous edge in the result DataFrame. Similarly, "(a)-[e]->()" indicates an out-edge of vertex a but does not name the destination vertex. These are called *anonymous* vertices and edges.
    • An edge can be negated to indicate that the edge should *not* be present in the graph. E.g., "(a)-[]->(b); !(b)-[]->(a)" finds edges from a to b for which there is *no* edge from b to a.

    Restrictions:

    • Motifs are not allowed to contain edges without any named elements: "()-[]->()" and "!()-[]->()" are prohibited terms.
    • Motifs are not allowed to contain named edges within negated terms (since these named edges would never appear within results). E.g., "!(a)-[ab]->(b)" is invalid, but "!(a)-[]->(b)" is valid.

    More complex queries, such as queries which operate on vertex or edge attributes, can be expressed by applying filters to the result DataFrame.

    This can return duplicate rows. E.g., a query "(u)-[]->()" will return a result for each matching edge, even if those edges share the same vertex u.

    pattern

    Pattern specifying a motif to search for.

    returns

    DataFrame containing all instances of the motif.
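
    The DSL rules above can be sketched on a small graph, assuming the GraphFrame(vertices, edges) factory. The session and the vertex ids x, y, z are hypothetical; they are chosen to differ from the motif names a, b, c, which are only labels inside the pattern.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("motif-sketch").getOrCreate()
import spark.implicits._

// Hypothetical graph: x <-> y (both directions) and y -> z.
val vertices = Seq("x", "y", "z").toDF("id")
val edges = Seq(("x", "y"), ("y", "x"), ("y", "z")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

// Pairs of vertices connected in both directions: (x, y) and (y, x).
val mutual = g.find("(a)-[e]->(b); (b)-[e2]->(a)")

// Negated term: edges a -> b with no edge b -> a. Only y -> z qualifies.
val oneWay = g.find("(a)-[]->(b); !(b)-[]->(a)")

// Two-edge chains; the post-hoc filter forces a and c to be distinct vertices.
val chains = g.find("(a)-[e]->(b); (b)-[e2]->(c)").filter("a.id != c.id")

mutual.show(truncate = false)
oneWay.show(truncate = false)
chains.show(truncate = false)
```

    Without the filter, chains would also contain x -> y -> x and y -> x -> y, because different names may match the same vertex.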

  29. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  30. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  31. lazy val inDegrees: DataFrame

    The in-degree of each vertex in the graph, returned as a DataFrame with two columns:

    • GraphFrame.ID: the ID of the vertex
    • "inDegree" (integer): the in-degree of the vertex

    Note that vertices with 0 in-edges are not returned in the result.
    Annotations
    @transient()
  32. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  33. def labelPropagation: LabelPropagation

    Label propagation algorithm.

    See org.graphframes.lib.LabelPropagation for more details.

  34. def logDebug(s: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  35. def logInfo(s: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logTrace(s: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logWarn(s: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  39. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  40. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  41. lazy val outDegrees: DataFrame

    The out-degree of each vertex in the graph, returned as a DataFrame with two columns:

    • GraphFrame.ID: the ID of the vertex
    • "outDegree" (integer): the out-degree of the vertex

    Note that vertices with 0 out-edges are not returned in the result.
    Annotations
    @transient()
  42. def pageRank: PageRank

    PageRank algorithm.

    See org.graphframes.lib.PageRank for more details.
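
    A minimal sketch of running PageRank for a fixed number of iterations, assuming the resetProbability, maxIter and run builders and the "pagerank" and "weight" output columns described in org.graphframes.lib.PageRank. The session and sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("pagerank-sketch").getOrCreate()
import spark.implicits._

// Hypothetical graph: a cycle x -> y -> z -> x, plus w -> x.
val vertices = Seq("x", "y", "z", "w").toDF("id")
val edges = Seq(("x", "y"), ("y", "z"), ("z", "x"), ("w", "x")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

// Run 10 iterations with the conventional reset probability of 0.15.
val ranked = g.pageRank.resetProbability(0.15).maxIter(10).run()

// The result is a new GraphFrame with an extra column on vertices and on edges.
ranked.vertices.select("id", "pagerank").show()
ranked.edges.select("src", "dst", "weight").show()
```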

  43. def parallelPersonalizedPageRank: ParallelPersonalizedPageRank

    Parallel personalized PageRank algorithm.

    See org.graphframes.lib.ParallelPersonalizedPageRank for more details.

  44. def persist(newLevel: StorageLevel): GraphFrame.this.type

    Persist the dataframe representation of vertices and edges of the graph with the given storage level.

    newLevel

    One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.

  45. def persist(): GraphFrame.this.type

    Persist the dataframe representation of vertices and edges of the graph with the default storage level.

  46. def pregel: Pregel

    Pregel algorithm.

    See also

    org.graphframes.lib.Pregel

  47. def shortestPaths: ShortestPaths

    Shortest paths algorithm.

    See org.graphframes.lib.ShortestPaths for more details.
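
    A minimal sketch of a shortest-paths query, assuming the landmarks and run builders and the "distances" output column described in org.graphframes.lib.ShortestPaths. The session, vertex ids and landmark choice are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("sp-sketch").getOrCreate()
import spark.implicits._

// Hypothetical chain a -> b -> c.
val vertices = Seq("a", "b", "c").toDF("id")
val edges = Seq(("a", "b"), ("b", "c")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

// Compute, for every vertex, the hop distance to each landmark vertex.
val distances = g.shortestPaths.landmarks(Seq("c")).run()

// One row per vertex; "distances" maps each reachable landmark to a hop count.
distances.select("id", "distances").show(truncate = false)
```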

  48. def stronglyConnectedComponents: StronglyConnectedComponents

    Strongly connected components algorithm.

    See org.graphframes.lib.StronglyConnectedComponents for more details.

  49. def svdPlusPlus: SVDPlusPlus

    SVD++ algorithm.

    See org.graphframes.lib.SVDPlusPlus for more details.

  50. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  51. def toGraphX: Graph[Row, Row]

    Converts this GraphFrame instance to a GraphX Graph. Vertex and edge attributes are the original rows in vertices and edges, respectively.

    Note that vertex (and edge) attributes include vertex IDs (and source, destination IDs) in order to support non-Long vertex IDs. If the vertex IDs are not convertible to Long values, then the values are indexed in order to generate corresponding Long vertex IDs (which is an expensive operation).

    The column ordering of the returned Graph vertex and edge attributes is specified by vertexColumns and edgeColumns, respectively.
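
    A minimal sketch of converting to GraphX and reading attributes back out of the Rows via vertexColumnMap and edgeColumnMap, assuming the GraphFrame(vertices, edges) factory. The session and sample data are hypothetical; Long ids are used so that no re-indexing should be needed.

```scala
import org.apache.spark.graphx.Graph
import org.apache.spark.sql.{Row, SparkSession}
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("graphx-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data with Long vertex ids.
val vertices = Seq((1L, "Alice"), (2L, "Bob")).toDF("id", "name")
val edges = Seq((1L, 2L, "friend")).toDF("src", "dst", "relationship")
val g = GraphFrame(vertices, edges)

val gx: Graph[Row, Row] = g.toGraphX

// Positions of the attributes inside the vertex and edge Rows.
val nameIdx = g.vertexColumnMap("name")
val relIdx = g.edgeColumnMap("relationship")

// Collect first, then read the Rows on the driver.
val names = gx.vertices.collect()
  .map { case (id, attr) => id -> attr.getString(nameIdx) }
  .toMap
val rels = gx.edges.collect()
  .map(e => (e.srcId, e.dstId, e.attr.getString(relIdx)))
  .toSeq

println(names)
println(rels)
```

    Looking up positions through the column maps avoids hard-coding Row indices, which would break if the DataFrame column order changed.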

  52. def toString(): String
    Definition Classes
    GraphFrame → AnyRef → Any
  53. def triangleCount: TriangleCount

    Triangle count algorithm.

    See org.graphframes.lib.TriangleCount for more details.

  54. lazy val triplets: DataFrame

    Returns triplets: (source vertex)-[edge]->(destination vertex) for all edges in the graph. The DataFrame returned has 3 columns, with names: GraphFrame.SRC, GraphFrame.EDGE, and GraphFrame.DST. The 2 vertex columns have schema matching GraphFrame.vertices, and the edge column has a schema matching GraphFrame.edges.
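
    A minimal sketch of reading nested fields from the triplets DataFrame, assuming the GraphFrame(vertices, edges) factory and that GraphFrame.SRC, GraphFrame.EDGE and GraphFrame.DST resolve to the column names "src", "edge" and "dst". The session and sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("triplets-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data.
val vertices = Seq(("a", "Alice"), ("b", "Bob"), ("c", "Carol")).toDF("id", "name")
val edges = Seq(("a", "b", "friend"), ("b", "c", "follow")).toDF("src", "dst", "relationship")
val g = GraphFrame(vertices, edges)

val t = g.triplets

// Each of the three columns is a struct, so sub-fields use dot notation.
t.select("src.name", "edge.relationship", "dst.name").show()
```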

  55. def unpersist(blocking: Boolean): GraphFrame.this.type

    Mark the dataframe representation of vertices and edges of the graph as non-persistent, and remove all blocks for it from memory and disk.

    blocking

    Whether to block until all blocks are deleted.

  56. def unpersist(): GraphFrame.this.type

    Mark the dataframe representation of vertices and edges of the graph as non-persistent, and remove all blocks for it from memory and disk.

  57. def vertexColumnMap: Map[String, Int]

    Version of vertexColumns which maps column names to indices in the Rows.

  58. def vertexColumns: Array[String]

    The column names in the vertices DataFrame, in order.

    Helper method for toGraphX which specifies the schema of vertex attributes. The vertex attributes of the returned Graph are given as a Row, and this method defines the column ordering in that Row.

  59. def vertices: DataFrame

    The dataframe representation of the vertices of the graph.

    It contains a column called GraphFrame.ID with the id of the vertex, and may also contain other columns with user-defined attributes for each vertex.

    The order of the columns is available in vertexColumns.

  60. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  61. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  62. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  63. def →[B](y: B): (GraphFrame, B)
    Implicit
    This member is added by an implicit conversion from GraphFrame to ArrowAssoc[GraphFrame] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc

Deprecated Value Members

  1. def formatted(fmtstr: String): String
    Implicit
    This member is added by an implicit conversion from GraphFrame to StringFormat[GraphFrame] performed by method StringFormat in scala.Predef.
    Definition Classes
    StringFormat
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.12.16) Use formatString.format(value) instead of value.formatted(formatString), or use the f"" string interpolator. In Java 15 and later, formatted resolves to the new method in String which has reversed parameters.
