| 1 | +- Feature Name: Support Nebula Graph |
| 2 | +- Start Date: 2022-04-16 |
| 3 | +- RFC PR: [amundsen-io/rfcs#48](https://github.com/amundsen-io/rfcs/pull/48) |
| 4 | +- Amundsen Issue: [amundsen-io/amundsen#1816](https://github.com/amundsen-io/amundsen/issues/1816) |
| 5 | + |
| 6 | +# Nebula Graph Support |
| 7 | + |
| 8 | +## Summary |
| 9 | + |
| 10 | +This RFC adds Nebula Graph support to Amundsen: new Nebula Graph components in the data builder and a new Nebula Graph proxy in the metadata service. |
| 11 | + |
| 12 | +## Motivation |
| 13 | + |
| 14 | +Metadata can already be published into Nebula Graph from the Amundsen data builder. This RFC makes the metadata retrieval API in the metadata service work with Nebula Graph as well. |
| 15 | + |
| 16 | +## Guide-level Explanation (aka Product Details) |
| 17 | + |
| 18 | +The goal of this RFC is to add loaders, publishers, and serializers to the library suite so that Nebula Graph is supported. |
| 19 | + |
| 20 | +## UI/UX-level Explanation |
| 21 | + |
| 22 | +N/A |
| 23 | + |
| 24 | +## Reference-level Explanation (aka Technical Details) |
| 25 | + |
| 26 | +A new proxy for Nebula Graph is added to support Nebula Graph in the metadata service. |
| 27 | + |
| 28 | +To support Nebula Graph in the data builder, several new components are added: |
| 29 | + |
| 30 | +- Nebula Extractor |
| 31 | +- Nebula Search Data Extractor |
| 32 | +- Nebula CSV Loader |
| 33 | +- Nebula CSV Publisher |
| 34 | +- Nebula Serializer |
| 35 | +- Nebula Sample Data Loader |
| 36 | + |
| 37 | +### Nebula Graph Schema handling |
| 38 | + |
| 39 | +These components work much like their Neo4j counterparts (Nebula Graph even speaks a dialect of openCypher), with one difference: unlike Neo4j, Nebula Graph is not schemaless. A label (called a tag in Nebula Graph) or an edge type must be created before a query can refer to it. |
| 40 | + |
| 41 | +Instead of having the user maintain a versioned schema (which would require extra interfaces), the proposed design parses the schema from the data being published. The Nebula CSV Publisher checks the schema and applies DDL changes automatically when needed, so the schema has a single source of truth: the data model of the data builder. |
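
The schema check described above can be sketched roughly as follows. This is a minimal illustration, not the publisher's actual code: `infer_props`, `schema_ddl`, and the type mapping are hypothetical names, and the sketch only builds nGQL statement strings (`CREATE TAG` / `ALTER TAG ... ADD`) without talking to a Nebula Graph cluster.

```python
from typing import Dict, Iterable, List, Mapping, Optional

# Python value type -> nGQL property type (illustrative subset, an assumption).
NGQL_TYPES = {bool: "bool", int: "int64", float: "double", str: "string"}


def infer_props(rows: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Collect property names and nGQL types from records about to be published."""
    props: Dict[str, str] = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue  # nothing to infer from a missing value
            # Unknown Python types fall back to string in this sketch.
            props.setdefault(name, NGQL_TYPES.get(type(value), "string"))
    return props


def schema_ddl(tag: str,
               wanted: Mapping[str, str],
               existing: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the DDL needed so that `tag` has every property in `wanted`.

    `existing` is the tag's current properties, or None if the tag is absent.
    """
    def cols(props: Mapping[str, str]) -> str:
        return ", ".join(f"`{name}` {ngql_type}" for name, ngql_type in props.items())

    if existing is None:
        return [f"CREATE TAG IF NOT EXISTS `{tag}`({cols(wanted)});"]
    missing = {n: t for n, t in wanted.items() if n not in existing}
    if not missing:
        return []  # schema already covers the data, no DDL change needed
    return [f"ALTER TAG `{tag}` ADD ({cols(missing)});"]


rows = [{"name": "users", "is_view": False}, {"name": "orders", "row_count": 10}]
wanted = infer_props(rows)
for statement in schema_ddl("Table", wanted, existing={"name": "string"}):
    print(statement)
```

Because the properties come from the records themselves, no separate schema file has to be versioned; the trade-off is that a property type is fixed by the first value seen for it.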
| 42 | + |
| 43 | +This requires the user to run the Nebula data builder sample script to initialize the Nebula Graph schema, a requirement that is open to discussion and could be revisited. |
| 44 | + |
| 45 | +### Nebula Graph Index |
| 46 | + |
| 47 | +Another difference between Nebula Graph and Neo4j concerns how an openCypher `MATCH` query seeks its starting point. If no `key` (called a Vertex ID in Nebula Graph) is provided, an index on the label (tag) must be created, or a `LIMIT` clause added. This applies when the query has no conditions at all, as in `MATCH (n: Table) return n`, or when the starting point is filtered only by a property, as in `MATCH (n: Table{category: "foo"}) return n`. [Here](https://siwei.io/en/nebula-index-explained/) is a post where I explain why it is designed this way. |
| 48 | + |
| 49 | +The Nebula Publisher now creates the indexes required by the data models that the metadata service API queries; they are listed in `NEBULA_INDEX_TAG_FIELDS`. |
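
To illustrate what creating such indexes involves, here is a minimal sketch that turns a tag-to-fields mapping into nGQL index statements. The structure of `NEBULA_INDEX_TAG_FIELDS` is not shown in this RFC, so the mapping shape, the `index_statements` helper, the index naming scheme, and the assumption that every indexed field is a string (which needs a prefix length in the index definition) are all hypothetical.

```python
from typing import Dict, List, Sequence


def index_statements(tag_fields: Dict[str, Sequence[str]],
                     str_len: int = 256) -> List[str]:
    """Build CREATE/REBUILD TAG INDEX statements for each tag.

    An empty field list yields an index on the tag itself, which covers
    queries such as `MATCH (n: Table) return n`.
    """
    statements: List[str] = []
    for tag, fields in tag_fields.items():
        # Hypothetical naming scheme: <tag>_<field...>_index
        index_name = "_".join([tag.lower(), *fields, "index"])
        # Assumes string properties, so each one gets a prefix length.
        cols = ", ".join(f"`{field}`({str_len})" for field in fields)
        statements.append(
            f"CREATE TAG INDEX IF NOT EXISTS `{index_name}` ON `{tag}`({cols});"
        )
        # Rebuilding makes the index cover data written before it existed.
        statements.append(f"REBUILD TAG INDEX `{index_name}`;")
    return statements


# Hypothetical mapping for the property-filtered query shown above.
for statement in index_statements({"Table": ["category"]}, str_len=64):
    print(statement)
```

Generating the statements from one mapping keeps the set of indexed tags and fields in a single place next to the data model, in line with the schema handling above.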
| 50 | + |
| 51 | +## Drawbacks |
| 52 | + |
| 53 | +The RFC adds support for another datastore which brings in additional components and increases the code size of the repo. |
| 54 | + |
| 55 | +## Alternatives |
| 56 | + |
| 57 | +For the automatic schema adaptation in the Nebula Publisher, one alternative is to add an extra interface or utility for managing the Nebula Graph schema. |
| 58 | + |
| 59 | +## Prior art |
| 60 | + |
| 61 | +N/A |
| 62 | + |
| 63 | +## Unresolved questions |
| 64 | + |
| 65 | +N/A |
| 66 | + |
| 67 | +## Future possibilities |
| 68 | + |
| 69 | +None so far. |