Commit 81b0c40 (merge of release-0.2 #17)

release: 0.2 polished docs and readme

2 parents efda3b4 + 12a2959

File tree

4 files changed: +229 −176 lines changed


README.md

Lines changed: 107 additions & 174 deletions
@@ -1,230 +1,163 @@
-# NebulaGraph Data Intelligence(ngdi) Suite
-
 ![image](https://user-images.githubusercontent.com/1651790/221876073-61ef4edb-adcd-4f10-b3fc-8ddc24918ea1.png)
 
-[![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm.fming.dev) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE) [![PyPI version](https://badge.fury.io/py/ngdi.svg)](https://badge.fury.io/py/ngdi) [![Python](https://img.shields.io/badge/python-3.6%2B-blue.svg)](https://www.python.org/downloads/release/python-360/)
+<p align="center">
+    <em>Data Intelligence Suite: run Graph Algorithms on NebulaGraph with 4 lines of code</em>
+</p>
 
-NebulaGraph Data Intelligence Suite for Python (ngdi) is a powerful Python library that offers a range of APIs for data scientists to effectively read, write, analyze, and compute data in NebulaGraph. This library allows data scientists to perform these operations on a single machine using NetworkX, or in a distributed computing environment using Spark, in unified and intuitive API. With ngdi, data scientists can easily access and process data in NebulaGraph, enabling them to perform advanced analytics and gain valuable insights.
+<p align="center">
+    <a href="LICENSE" target="_blank">
+        <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License">
+    </a>
 
-```
-┌───────────────────────────────────────────────────┐
-│ Spark Cluster │
-│ .─────. .─────. .─────. .─────. │
-┌─▶│ : ; : ; : ; : ; │
-│ │ `───' `───' `───' `───' │
-Algorithm │
-Spark └───────────────────────────────────────────────────┘
-Engine ┌────────────────────────────────────────────────────────────────┐
-└──┤ │
-│ NebulaGraph Data Intelligence Suite(ngdi) │
-│ ┌────────┐ ┌──────┐ ┌────────┐ ┌─────┐ │
-│ │ Reader │ │ Algo │ │ Writer │ │ GNN │ │
-│ └────────┘ └──────┘ └────────┘ └─────┘ │
-│ ├────────────┴───┬────────┴─────┐ └──────┐ │
-│ ▼ ▼ ▼ ▼ │
-│ ┌─────────────┐ ┌──────────────┐ ┌──────────┐┌───────────┐ │
-┌──┤ │ SparkEngine │ │ NebulaEngine │ │ NetworkX ││ DGLEngine │ │
-│ │ └─────────────┘ └──────────────┘ └──────────┘└───────────┘ │
-│ └──────────┬─────────────────────────────────────────────────────┘
-│ │ Spark
-│ └────────Reader ────────────┐
-Spark Reader Query Mode │
-Scan Mode ▼
-│ ┌───────────────────────────────────────────────────┐
-│ │ NebulaGraph Graph Engine Nebula-GraphD │
-│ ├──────────────────────────────┬────────────────────┤
-│ │ NebulaGraph Storage Engine │ │
-└─▶│ Nebula-StorageD │ Nebula-Metad │
-└──────────────────────────────┴────────────────────┘
-```
+    <a href="https://badge.fury.io/py/ngdi" target="_blank">
+        <img src="https://badge.fury.io/py/ngdi.svg" alt="PyPI version">
+    </a>
+
+    <a href="https://www.python.org/downloads/release/python-360/" target="_blank">
+        <img src="https://img.shields.io/badge/python-3.6%2B-blue.svg" alt="Python">
+    </a>
+
+    <a href="https://pdm.fming.dev" target="_blank">
+        <img src="https://img.shields.io/badge/pdm-managed-blueviolet" alt="pdm-managed">
+    </a>
+
+</p>
+
+---
+
+**Documentation**: <a href="https://github.com/wey-gu/nebulagraph-di#documentation" target="_blank">https://github.com/wey-gu/nebulagraph-di#documentation</a>
+
+**Source Code**: <a href="https://github.com/wey-gu/nebulagraph-di" target="_blank">https://github.com/wey-gu/nebulagraph-di</a>
+
+---
+
+
+NebulaGraph Data Intelligence Suite for Python (ngdi) is a powerful Python library that offers APIs for data scientists to effectively read, write, analyze, and compute data in NebulaGraph.
+
+With support for a single-machine engine (NetworkX) or a distributed computing environment using Spark, we can perform Graph Analysis and Algorithms on top of NebulaGraph in fewer than 10 lines of code, through a unified and intuitive API.
 
 ## Quick Start in 5 Minutes
 
 - Setup env with Nebula-Up following [this guide](https://github.com/wey-gu/nebulagraph-di/blob/main/docs/Environment_Setup.md).
 - Install ngdi with pip from the Jupyter Notebook with http://localhost:8888 (password: `nebula`).
-- Open the demo notebook and run cells with `Shift+Enter` or `Ctrl+Enter`.
+- Open the demo notebook and run cells one by one.
+- Check the [API Reference](https://github.com/wey-gu/nebulagraph-di/blob/main/docs/API.md).
 
 ## Installation
 
 ```bash
 pip install ngdi
 ```
 
-### Spark Engine Prerequisites
-- Spark 2.4, 3.0(not yet tested)
-- [NebulaGraph 3.4+](https://github.com/vesoft-inc/nebula)
-- [NebulaGraph Spark Connector 3.4+](https://repo1.maven.org/maven2/com/vesoft/nebula-spark-connector/)
-- [NebulaGraph Algorithm 3.1+](https://repo1.maven.org/maven2/com/vesoft/nebula-algorithm/)
-
-### NebulaGraph Engine Prerequisites
-- [NebulaGraph 3.4+](https://github.com/vesoft-inc/nebula)
-- [NebulaGraph Python Client 3.4+](https://github.com/vesoft-inc/nebula-python)
-- [NetworkX](https://networkx.org/)
-
-## Run on PySpark Jupyter Notebook(Spark Engine)
-
-Assuming we have put the `nebula-spark-connector.jar` and `nebula-algo.jar` in `/opt/nebulagraph/ngdi/package/`.
+## Usage
 
-```bash
-export PYSPARK_PYTHON=python3
-export PYSPARK_DRIVER_PYTHON=jupyter
-export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0 --port=8888 --no-browser"
-
-pyspark --driver-class-path /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar \
-    --driver-class-path /opt/nebulagraph/ngdi/package/nebula-algo.jar \
-    --jars /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar \
-    --jars /opt/nebulagraph/ngdi/package/nebula-algo.jar
-```
+### Spark Engine Examples
 
-Then we could access Jupyter Notebook with PySpark and refer to [examples/spark_engine.ipynb](https://github.com/wey-gu/nebulagraph-di/examples/spark_engine.ipynb)
+See also: [examples/spark_engine.ipynb](https://github.com/wey-gu/nebulagraph-di/blob/main/examples/spark_engine.ipynb)
 
-## Submit Algorithm job to Spark Cluster(Spark Engine)
+Run Algorithm on top of NebulaGraph:
 
-Assuming we have put the `nebula-spark-connector.jar` and `nebula-algo.jar` in `/opt/nebulagraph/ngdi/package/`;
-We have put the `ngdi-py3-env.zip` in `/opt/nebulagraph/ngdi/package/`.
-And we have the following Algorithm job in `pagerank.py`:
+> Note: there is also a query mode; refer to the [examples](https://github.com/wey-gu/nebulagraph-di/blob/main/examples/spark_engine.ipynb) or [docs](https://github.com/wey-gu/nebulagraph-di/blob/main/docs/API.md) for more details.
 
 ```python
-from ngdi import NebulaGraphConfig
 from ngdi import NebulaReader
 
-# set NebulaGraph config
-config_dict = {
-    "graphd_hosts": "graphd:9669",
-    "metad_hosts": "metad0:9669,metad1:9669,metad2:9669",
-    "user": "root",
-    "password": "nebula",
-    "space": "basketballplayer",
-}
-config = NebulaGraphConfig(**config_dict)
-
-# read data with spark engine, query mode
+# read data with spark engine, scan mode
 reader = NebulaReader(engine="spark")
-query = """
-    MATCH ()-[e:follow]->()
-    RETURN e LIMIT 100000
-"""
-reader.query(query=query, edge="follow", props="degree")
+reader.scan(edge="follow", props="degree")
 df = reader.read()
 
 # run pagerank algorithm
 pr_result = df.algo.pagerank(reset_prob=0.15, max_iter=10)
 ```
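
The example above ends with `df.algo.pagerank(reset_prob=0.15, max_iter=10)`, which delegates to the Spark engine. To make those two parameters concrete, here is a minimal pure-Python power-iteration sketch of PageRank on a made-up `follow` edge list; it is independent of ngdi and Spark, and is an illustration of the algorithm's semantics, not ngdi's implementation:

```python
# Minimal PageRank power iteration, illustrating the meaning of
# reset_prob (teleport probability) and max_iter (iteration count).
# The edge list is invented for illustration only.

def pagerank(edges, reset_prob=0.15, max_iter=10):
    nodes = {n for edge in edges for n in edge}
    out_deg = {n: 0 for n in nodes}
    for src, _ in edges:
        out_deg[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # every node keeps a reset_prob share; the rest flows along edges
        new_rank = {n: reset_prob / len(nodes) for n in nodes}
        for src, dst in edges:
            new_rank[dst] += (1 - reset_prob) * rank[src] / out_deg[src]
        rank = new_rank
    return rank

edges = [("player100", "player101"), ("player101", "player102"),
         ("player102", "player100"), ("player102", "player101")]
pr = pagerank(edges)
print(sorted(pr.items(), key=lambda kv: -kv[1]))
```

The ranks form a probability distribution over vertices, so they always sum to 1.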

-> Note, this could be done by Airflow, or other job scheduler in production.
-
-Then we can submit the job to Spark cluster:
+Write back to NebulaGraph:
 
-```bash
-spark-submit --master spark://master:7077 \
-    --driver-class-path /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar \
-    --driver-class-path /opt/nebulagraph/ngdi/package/nebula-algo.jar \
-    --jars /opt/nebulagraph/ngdi/package/nebula-spark-connector.jar \
-    --jars /opt/nebulagraph/ngdi/package/nebula-algo.jar \
-    --py-files /opt/nebulagraph/ngdi/package/ngdi-py3-env.zip \
-    pagerank.py
-```
+```python
+from ngdi import NebulaWriter
+from ngdi.config import NebulaGraphConfig
 
-## Run ngdi algorithm job from python script(Spark Engine)
+config = NebulaGraphConfig()
 
-We have everything ready as above, including the `pagerank.py`.
+properties = {"louvain": "cluster_id"}
 
-```python
-import subprocess
-
-subprocess.run(["spark-submit", "--master", "spark://master:7077",
-    "--driver-class-path", "/opt/nebulagraph/ngdi/package/nebula-spark-connector.jar",
-    "--driver-class-path", "/opt/nebulagraph/ngdi/package/nebula-algo.jar",
-    "--jars", "/opt/nebulagraph/ngdi/package/nebula-spark-connector.jar",
-    "--jars", "/opt/nebulagraph/ngdi/package/nebula-algo.jar",
-    "--py-files", "/opt/nebulagraph/ngdi/package/ngdi-py3-env.zip",
-    "pagerank.py"])
+writer = NebulaWriter(
+    data=df_result, sink="nebulagraph_vertex", config=config, engine="spark")
+writer.set_options(
+    tag="louvain", vid_field="_id", properties=properties,
+    batch_size=256, write_mode="insert",)
+writer.write()
 ```
 
-## Run on single machine(NebulaGraph Engine)
+Then we could query the result in NebulaGraph:
 
-Assuming we have NebulaGraph cluster up and running, and we have the following Algorithm job in `pagerank_nebula_engine.py`:
+```cypher
+MATCH (v:louvain)
+RETURN id(v), v.louvain.cluster_id LIMIT 10;
+```
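
The writer above is configured with `batch_size=256` and `write_mode="insert"`. To make the batching idea concrete, this standalone sketch groups `(vid, cluster_id)` rows into batched nGQL-style `INSERT VERTEX` statements; the helper name and exact statement shape are illustrative assumptions, not ngdi's actual output:

```python
# Illustrative only: group (vid, cluster_id) rows into batched
# INSERT VERTEX statements, the way an insert-mode writer with
# batch_size=256 conceptually flushes data. Not ngdi's real codegen.

def to_insert_batches(rows, tag="louvain", prop="cluster_id", batch_size=256):
    stmts = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        values = ", ".join(f'"{vid}":({cid})' for vid, cid in chunk)
        stmts.append(f"INSERT VERTEX {tag}({prop}) VALUES {values};")
    return stmts

# 600 made-up rows -> 3 batches (256 + 256 + 88)
rows = [(f"player{i}", i % 7) for i in range(600)]
batches = to_insert_batches(rows)
print(len(batches), batches[0][:60])
```

Batching keeps each statement bounded in size, which is why a `batch_size` knob exists at all.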

-This file is the same as `pagerank.py` except for the following line:
+### NebulaGraph Engine Examples (not yet implemented)
+
+Basically the same as the Spark Engine, but with `engine="nebula"`.
 
 ```diff
 - reader = NebulaReader(engine="spark")
 + reader = NebulaReader(engine="nebula")
 ```
 
-Then we can run the job on single machine:
-
-```bash
-python3 pagerank.py
-```
-
 ## Documentation
 
-[API Reference](https://github.com/wey-gu/nebulagraph-di/docs/API.md)
-
-## Usage
+[Environment Setup](https://github.com/wey-gu/nebulagraph-di/blob/main/docs/Environment_Setup.md)
 
-### Spark Engine Examples
-
-See also: [examples/spark_engine.ipynb](https://github.com/wey-gu/nebulagraph-di/examples/spark_engine.ipynb)
-
-```python
-from ngdi import NebulaReader
-
-# read data with spark engine, query mode
-reader = NebulaReader(engine="spark")
-query = """
-    MATCH ()-[e:follow]->()
-    RETURN e LIMIT 100000
-"""
-reader.query(query=query, edge="follow", props="degree")
-df = reader.read() # this will take some time
-df.show(10)
-
-# read data with spark engine, scan mode
-reader = NebulaReader(engine="spark")
-reader.scan(edge="follow", props="degree")
-df = reader.read() # this will take some time
-df.show(10)
+[API Reference](https://github.com/wey-gu/nebulagraph-di/blob/main/docs/API.md)
 
-# read data with spark engine, load mode (not yet implemented)
-reader = NebulaReader(engine="spark")
-reader.load(source="hdfs://path/to/edge.csv", format="csv", header=True, schema="src: string, dst: string, rank: int")
-df = reader.read() # this will take some time
-df.show(10)
+## How it works
 
-# run pagerank algorithm
-pr_result = df.algo.pagerank(reset_prob=0.15, max_iter=10) # this will take some time
+ngdi is a unified abstraction layer over different engines; the current implementation covers Spark, NetworkX, DGL, and NebulaGraph, and it is easy to extend to other engines such as Flink, GraphScope, and PyG.
 
-# convert dataframe to NebulaGraphObject
-graph = reader.to_graphx() # not yet implemented
+```
+┌───────────────────────────────────────────────────┐
+│ Spark Cluster │
+│ .─────. .─────. .─────. .─────. │
+┌─▶│ : ; : ; : ; : ; │
+│ │ `───' `───' `───' `───' │
+Algorithm │
+Spark └───────────────────────────────────────────────────┘
+Engine ┌────────────────────────────────────────────────────────────────┐
+└──┤ │
+│ NebulaGraph Data Intelligence Suite(ngdi) │
+│ ┌────────┐ ┌──────┐ ┌────────┐ ┌─────┐ │
+│ │ Reader │ │ Algo │ │ Writer │ │ GNN │ │
+│ └────────┘ └──────┘ └────────┘ └─────┘ │
+│ ├────────────┴───┬────────┴─────┐ └──────┐ │
+│ ▼ ▼ ▼ ▼ │
+│ ┌─────────────┐ ┌──────────────┐ ┌──────────┐┌───────────┐ │
+┌──┤ │ SparkEngine │ │ NebulaEngine │ │ NetworkX ││ DGLEngine │ │
+│ │ └─────────────┘ └──────────────┘ └──────────┘└───────────┘ │
+│ └──────────┬─────────────────────────────────────────────────────┘
+│ │ Spark
+│ └────────Reader ────────────┐
+Spark Reader Query Mode │
+Scan Mode ▼
+│ ┌───────────────────────────────────────────────────┐
+│ │ NebulaGraph Graph Engine Nebula-GraphD │
+│ ├──────────────────────────────┬────────────────────┤
+│ │ NebulaGraph Storage Engine │ │
+└─▶│ Nebula-StorageD │ Nebula-Metad │
+└──────────────────────────────┴────────────────────┘
 ```
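
The diagram above shows one API fanning out to several engines. That pattern can be sketched as a small dispatch layer, where `engine="spark"` or `engine="nebula"` selects the backend behind the same reader-style interface; the class and method names below are an illustrative toy, not ngdi's internals:

```python
# Illustrative engine-dispatch sketch of the unified-API idea:
# one reader facade, pluggable engine backends. Not ngdi's actual classes.

class SparkEngine:
    def scan(self, edge, props):
        return f"spark-scan:{edge}:{props}"

class NebulaEngine:
    def scan(self, edge, props):
        return f"nebula-scan:{edge}:{props}"

ENGINES = {"spark": SparkEngine, "nebula": NebulaEngine}

class Reader:
    def __init__(self, engine="spark"):
        # same facade API, different backend chosen by name
        self._engine = ENGINES[engine]()

    def scan(self, edge, props):
        self._plan = self._engine.scan(edge, props)
        return self

    def read(self):
        return self._plan

df = Reader(engine="spark").scan(edge="follow", props="degree").read()
print(df)
```

Switching backends is then the one-word change the `diff` block earlier in the README shows.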

-### NebulaGraph Engine Examples(not yet implemented)
+### Spark Engine Prerequisites
+- Spark 2.4, 3.0 (not yet tested)
+- [NebulaGraph 3.4+](https://github.com/vesoft-inc/nebula)
+- [NebulaGraph Spark Connector 3.4+](https://repo1.maven.org/maven2/com/vesoft/nebula-spark-connector/)
+- [NebulaGraph Algorithm 3.1+](https://repo1.maven.org/maven2/com/vesoft/nebula-algorithm/)
 
-```python
-from ngdi import NebulaReader
+### NebulaGraph Engine Prerequisites
+- [NebulaGraph 3.4+](https://github.com/vesoft-inc/nebula)
+- [NebulaGraph Python Client 3.4+](https://github.com/vesoft-inc/nebula-python)
+- [NetworkX](https://networkx.org/)
 
-# read data with nebula engine, query mode
-reader = NebulaReader(engine="nebula")
-reader.query("""
-    MATCH ()-[e:follow]->()
-    RETURN e.src, e.dst, e.degree LIMIT 100000
-""")
-df = reader.read() # this will take some time
-df.show(10)
-
-# read data with nebula engine, scan mode
-reader = NebulaReader(engine="nebula")
-reader.scan(edge_types=["follow"])
-df = reader.read() # this will take some time
-df.show(10)
-
-# convert dataframe to NebulaGraphObject
-graph = reader.to_graph() # this will take some time
-graph.nodes.show(10)
-graph.edges.show(10)
+## License
 
-# run pagerank algorithm
-pr_result = graph.algo.pagerank(reset_prob=0.15, max_iter=10) # this will take some time
-```
+This project is licensed under the terms of the Apache License 2.0.

docs/API.md

Lines changed: 12 additions & 0 deletions
@@ -69,6 +69,18 @@ reader.query(query=query, edge="follow", props="degree")
 df = reader.read()
 ```
 
+- Load mode
+
+> not yet implemented
+
+```python
+# read data with spark engine, load mode (not yet implemented)
+reader = NebulaReader(engine="spark")
+reader.load(source="hdfs://path/to/edge.csv", format="csv", header=True, schema="src: string, dst: string, rank: int")
+df = reader.read() # this will take some time
+df.show(10)
+```
+
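
The load-mode call above takes `schema="src: string, dst: string, rank: int"`. Since load mode is not yet implemented, here is a hedged sketch of how such a schema string could be parsed and applied to CSV-style rows; the parser and the type mapping are assumptions for illustration, not ngdi's behavior:

```python
# Illustrative: parse a "name: type, ..." schema string and coerce
# CSV-style rows accordingly. Not ngdi's actual load-mode parser.

TYPES = {"string": str, "int": int, "float": float}

def parse_schema(schema):
    cols = []
    for field in schema.split(","):
        name, typ = (part.strip() for part in field.split(":"))
        cols.append((name, TYPES[typ]))
    return cols

def apply_schema(row, cols):
    # zip each raw string value with its (name, cast) column spec
    return {name: cast(value) for (name, cast), value in zip(cols, row)}

cols = parse_schema("src: string, dst: string, rank: int")
record = apply_schema(["player100", "player101", "3"], cols)
print(record)
```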
 ## engines
 
 - `ngdi.engines.SparkEngine` is the Spark Engine for `ngdi.NebulaReader`, `ngdi.NebulaWriter` and `ngdi.NebulaAlgorithm`.
