GraphFrames Overview
GraphFrames is a package for Apache Spark that provides DataFrame-based graphs. It offers high-level APIs in Scala, Java, and Python, and aims to provide both the functionality of GraphX and extended functionality that takes advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.
What are GraphFrames?
GraphFrames are to DataFrames as GraphX is to RDDs.
GraphFrames represent graphs: vertices (e.g., users) and edges (e.g., relationships between users). If you are familiar with GraphX, then GraphFrames will be easy to learn. The key difference is that GraphFrames are based upon Spark DataFrames, rather than RDDs.
GraphFrames also provide powerful tools for running queries and standard graph algorithms. With GraphFrames, you can easily search for patterns within graphs, find important vertices, and more. Refer to the User Guide for a full list of queries and algorithms.
Will GraphFrames be part of Apache Spark?
The GraphX component of Apache Spark has no DataFrame- or Dataset-based equivalent, so it is natural to ask this question.
The new plan is to merge GraphFrames with Apache Spark to make it a component of core Spark. Spark committer Holden Karau is willing to support this effort by reviewing the relevant pull requests.
Downloading
Get GraphFrames from the Spark Packages website. This documentation is for GraphFrames version 0.8.4. GraphFrames depends on Apache Spark, which is available for download from the Apache Spark website.
GraphFrames should be compatible with any platform which runs Spark. Refer to the Apache Spark documentation for more information.
GraphFrames is compatible with Spark 1.6+. However, later versions of Spark include major improvements to DataFrames, so GraphFrames may be more efficient when running on more recent Spark versions.
GraphFrames is tested with Java 8 and with Python 2 and 3, running against Spark 2.2+ (Scala 2.11).
Applications, the Apache Spark shell, and clusters
See the Apache Spark User Guide for more information about submitting Spark jobs to clusters, running the Spark shell, and launching Spark clusters. The GraphFrames Quick Start guide also shows how to run the Spark shell with GraphFrames supplied as a package.
Where to Go from Here
User Guides:
- Quick Start: a quick introduction to the GraphFrames API; start here!
- GraphFrames User Guide: detailed overview of GraphFrames in all supported languages (Scala, Java, Python)
- Motif Finding Tutorial: learn to perform pattern recognition with GraphFrames using a technique called network motif finding over the knowledge graph for the stackexchange.com subdomain data dump
API Docs:
Community Forums:
- GraphFrames Mailing List: ask questions about GraphFrames here
- #graphframes Discord Channel on GraphGeeks
External Resources: