
Full Stack Graph Machine Learning: Theory, Practice, Tools and Techniques


Graphlet-AI/graphml-class


Full Stack Graph Machine Learning: Theory, Practice, Tools and Techniques v1.1.1

This is a course from Graphlet AI on full-stack graph machine learning taught by Russell Jurney.


Environment Setup

This class uses the Docker image rjurney/graphml-class. To bring it up as the jupyter service along with the neo4j service, run:

# Pull the Docker images BEFORE class starts, or it can take a while on a shared connection
docker compose pull

# Run the Jupyter Notebook container in the background, with all requirements.txt packages installed
docker compose up -d

# Tail the Jupyter logs to find the JupyterLab URL to open in your browser
docker logs jupyter -f --tail 100

To shut down the containers, run this from the repository folder:

docker compose down

docker compose vs docker-compose

You say potato, I say potahto... the docker compose command changed in recent versions :)

NOTE: older versions of Docker use the standalone docker-compose command rather than the two-word docker compose command.
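If you script the class setup, a small helper can pick whichever Compose command your machine has. This is a minimal sketch, not part of the class repository; the function name compose_command is hypothetical, and it assumes only that docker compose version succeeds when the two-word Compose command is available.

```python
import shutil
import subprocess
from typing import List, Optional


def compose_command() -> Optional[List[str]]:
    """Return the Compose invocation available on this machine, or None.

    Hypothetical helper: prefers the two-word `docker compose` and falls
    back to the standalone `docker-compose` binary used by older installs.
    """
    docker = shutil.which("docker")
    if docker is not None:
        # `docker compose version` exits 0 only when the compose subcommand exists
        probe = subprocess.run(
            [docker, "compose", "version"], capture_output=True, text=True
        )
        if probe.returncode == 0:
            return ["docker", "compose"]
    # Older Docker installs ship Compose as a separate `docker-compose` binary
    if shutil.which("docker-compose") is not None:
        return ["docker-compose"]
    return None


if __name__ == "__main__":
    cmd = compose_command()
    if cmd is None:
        print("Docker Compose not found")
    else:
        # The same `up -d` step as above, with whichever command was found
        print("Use:", " ".join(cmd + ["up", "-d"]))
```

Probing the two-word form first matches the note above: it is the newer command, and the standalone binary is only a fallback.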

VSCode Setup

To edit code in VSCode you may want a local Anaconda Python environment with the class's PyPI libraries installed. This lets VSCode parse the code, understand the APIs, and highlight errors.

Note: if you do not use Anaconda, consider using it :) You can use a Python 3 venv in the same way as conda.

Class Anaconda Environment

Create a new Anaconda environment:

conda create -n graphml python=3.10.11 -y

Activate the environment:

conda activate graphml

Install the project's libraries:

poetry install

VSCode Interpreter

To use the Python environment in VSCode, open the Command Palette by typing:

SHIFT-CMD-P

(CTRL-SHIFT-P on Windows and Linux). Type Python or Interpreter and select Python: Select Interpreter. Then choose the path to your conda environment. It will include the name of the environment, such as:

Python 3.10.11 ('graphml') /opt/anaconda3/envs/graphml/bin/python

Note: the Python version is pinned to 3.10.11 because the Jupyter Docker Stacks had not been updated to a newer Python version.
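To confirm that VSCode (or a terminal) is really using the class environment, you can run a quick check from that interpreter. This is a minimal sketch; the helper names interpreter_report and is_class_env are hypothetical, and it assumes the environment is named graphml as created above.

```python
import sys
from pathlib import Path


def interpreter_report() -> dict:
    """Summarize the Python interpreter this code is running under."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        # For a conda env such as /opt/anaconda3/envs/graphml, sys.prefix ends in the env name
        "env_name": Path(sys.prefix).name,
    }


def is_class_env(report: dict, env_name: str = "graphml", version: str = "3.10.11") -> bool:
    """True when the report matches the class's conda environment name and Python version."""
    return report["env_name"] == env_name and report["version"] == version


if __name__ == "__main__":
    report = interpreter_report()
    print(report)
    print("class environment:", is_class_env(report))
```

If the check fails inside VSCode but passes in a terminal where graphml is activated, the editor is still pointed at a different interpreter.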

Knowledge Graph Construction in PySpark

We build a knowledge graph from the Stack Exchange Archive for the network motif section of the course.

Docker Exec Commands

To run a bash shell in the Jupyter container, type:

docker exec -it jupyter bash

Once you're there, you can run the following commands to download and prepare the data for the course.

First, download the data:

graphml_class/stats/download.py stats.meta

Then you will need to convert the data from XML to Parquet:

spark-submit --packages "com.databricks:spark-xml_2.12:0.18.0" graphml_class/stats/xml_to_parquet.py
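Stack Exchange archive files such as Posts.xml store one record per row element, with every field as an XML attribute. The standard-library sketch below illustrates that row-to-record flattening on a tiny hand-written sample (the values are invented). It is an illustration of the data shape only, not the course's xml_to_parquet.py, which uses spark-xml; spark-xml turns attributes into columns, by default prefixed with an underscore.

```python
import xml.etree.ElementTree as ET

# Tiny illustrative sample in the shape of a Stack Exchange Posts.xml file (values are invented)
SAMPLE_POSTS = """
<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="10" Title="What is a p-value?" />
  <row Id="2" PostTypeId="2" ParentId="1" OwnerUserId="11" />
</posts>
""".strip()


def rows_to_records(xml_text: str) -> list:
    """Flatten each <row .../> element into a dict of its attributes (all values are strings)."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iter("row")]


records = rows_to_records(SAMPLE_POSTS)
for record in records:
    print(record)

# Rows carry different attributes (answers have ParentId, questions have Title),
# so a columnar table needs the union of keys as its schema, with gaps left null.
columns = sorted(set().union(*records))
print(columns)
```

A columnar format such as Parquet then stores each of those attributes as its own column, which is what makes the later PySpark steps efficient.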

The course covers knowledge graph construction in PySpark in graphml_class/stats/graph.py. Build the graph with:

spark-submit graphml_class/stats/graph.py
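A minimal sketch of the idea, in plain Python rather than PySpark: Stack Exchange records become typed vertices and relationship edges in the id / src / dst layout that GraphFrames expects. The schema here (User, Question and Answer vertices; Posted and Answers edges) and the function name build_graph are hypothetical and not necessarily what graph.py builds. In the Stack Exchange archive, PostTypeId 1 marks questions and 2 marks answers, and an answer's ParentId points at its question.

```python
from typing import Dict, List, Tuple

POST_TYPES = {"1": "Question", "2": "Answer"}  # Stack Exchange PostTypeId codes


def build_graph(posts: List[Dict[str, str]]) -> Tuple[List[dict], List[dict]]:
    """Hypothetical schema: User/Question/Answer vertices, Posted/Answers edges."""
    vertices: Dict[str, dict] = {}
    edges: List[dict] = []
    for post in posts:
        post_id = "post-" + post["Id"]
        vertices[post_id] = {"id": post_id, "type": POST_TYPES.get(post["PostTypeId"], "Post")}
        owner = post.get("OwnerUserId")
        if owner:
            user_id = "user-" + owner
            # Each user becomes one vertex, however many posts they wrote
            vertices.setdefault(user_id, {"id": user_id, "type": "User"})
            edges.append({"src": user_id, "dst": post_id, "relationship": "Posted"})
        parent = post.get("ParentId")
        if parent:
            # An answer points at the question it answers
            edges.append({"src": post_id, "dst": "post-" + parent, "relationship": "Answers"})
    return list(vertices.values()), edges


posts = [
    {"Id": "1", "PostTypeId": "1", "OwnerUserId": "10"},
    {"Id": "2", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "11"},
    {"Id": "3", "PostTypeId": "2", "ParentId": "1", "OwnerUserId": "10"},
]
vertices, edges = build_graph(posts)
for edge in edges:
    print(edge)
```

In PySpark the same vertex and edge records would become two DataFrames passed to GraphFrame(vertices, edges), which is the input the motif queries below run against.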

Network Motifs with GraphFrames

This course now covers network motifs (frequent patterns of structure) in property graphs using PySpark and GraphFrames (see motif.py; there is no notebook yet). It supports directed motifs, not undirected ones. All the 4-node motifs are outlined below. Note that GraphFrames can also filter the paths returned by its GraphFrame.find() method using any Spark DataFrame filter, enabling temporal and complex property graph motifs.
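To make the pattern-plus-filter idea concrete, here is a brute-force matcher in plain Python. It is not the GraphFrames API; it only mimics what a pattern such as (a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(d) passed to find() matches, with the where predicate standing in for the DataFrame filter applied afterwards. The function name find_motif and the "ts" timestamps are hypothetical. This sketch also requires distinct pattern variables to bind distinct vertices, and it enumerates every edge combination, so it is only suitable for toy graphs.

```python
from itertools import product


def find_motif(edges, pattern, where=None):
    """Brute-force matcher for a directed pattern such as (a)->(b); (b)->(c).

    edges:   list of dicts with "src" and "dst" plus any properties (e.g. "ts").
    pattern: list of (from_var, to_var) pairs, one per pattern edge.
    where:   optional predicate over the matched edges, in pattern order.
    Returns a list of (vertex_binding, matched_edges) tuples.
    """
    results = []
    for combo in product(edges, repeat=len(pattern)):
        binding = {}
        consistent = True
        for (x, y), edge in zip(pattern, combo):
            for var, node in ((x, edge["src"]), (y, edge["dst"])):
                # A pattern variable must bind to the same vertex everywhere it appears
                if binding.setdefault(var, node) != node:
                    consistent = False
        # This sketch also requires distinct variables to bind distinct vertices
        if not consistent or len(set(binding.values())) != len(binding):
            continue
        if where is None or where(combo):
            results.append((binding, combo))
    return results


# Toy directed graph with a hypothetical timestamp "ts" on each edge
edges = [
    {"src": 1, "dst": 2, "ts": 1},
    {"src": 2, "dst": 3, "ts": 2},
    {"src": 3, "dst": 4, "ts": 3},
    {"src": 2, "dst": 4, "ts": 0},
    {"src": 4, "dst": 5, "ts": 0},
]
path = [("a", "b"), ("b", "c"), ("c", "d")]  # a 3-edge directed path

all_paths = find_motif(edges, path)
# Temporal motif: keep only paths whose edges occur in increasing time order
temporal = find_motif(edges, path, where=lambda es: es[0]["ts"] < es[1]["ts"] < es[2]["ts"])
print(len(all_paths), "paths,", len(temporal), "temporal")
for binding, _ in temporal:
    print(binding)
```

The structural pattern alone matches several paths; the time-ordering predicate then narrows them to the temporally consistent one, which is the same two-step shape as running find() and then filtering its result on the edge columns.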

[Figure: All 4-node directed network motifs]
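One way to see how many patterns a complete catalog of 4-node directed motifs contains is to enumerate them. The sketch below brute-forces every directed graph on n labeled nodes (no self-loops), keeps the weakly connected ones, and collapses isomorphic graphs to a canonical form by trying every relabeling. For n = 3 it yields 13 classes and for n = 4 it yields 199, the standard counts of connected directed 3- and 4-node subgraphs. The function names are illustrative; this is not code from the class repository.

```python
from itertools import permutations, product


def canonical(edges, n):
    """Smallest sorted edge tuple over all relabelings: identical for isomorphic digraphs."""
    return min(
        tuple(sorted((p[u], p[v]) for u, v in edges))
        for p in permutations(range(n))
    )


def weakly_connected(edges, n):
    """True if the graph is connected when edge directions are ignored."""
    adjacent = {v: set() for v in range(n)}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacent[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def count_directed_motifs(n):
    """Count isomorphism classes of weakly connected directed graphs on n nodes."""
    slots = [(u, v) for u in range(n) for v in range(n) if u != v]
    classes = set()
    # Each bit mask selects which of the n*(n-1) possible directed edges are present
    for mask in product((0, 1), repeat=len(slots)):
        edges = [e for e, bit in zip(slots, mask) if bit]
        if weakly_connected(edges, n):
            classes.add(canonical(edges, n))
    return len(classes)


if __name__ == "__main__":
    for n in (3, 4):
        print(n, "nodes:", count_directed_motifs(n), "directed motifs")
```

This runs in a moment for n = 4 (4,096 candidate graphs, 24 relabelings each), but both the number of graphs and the number of relabelings grow very quickly with n, which is one reason motif catalogs usually stop at small sizes.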