Gh-534: Additional updates for federated POC (#539)
* basic page layout

* configuration options page

* finish the configuration page

* Add access control docs

* operation handling info

* add additional information

* Apply suggestions from code review

Co-authored-by: cn337131
Co-authored-by: p29876

* address comments

* update federated docs

* Apply suggestions from code review

Co-authored-by: cn337131

* address comments

---------

Co-authored-by: cn337131
Co-authored-by: p29876
Co-authored-by: wb36499
4 people authored Nov 5, 2024
1 parent 752fab8 commit 7eef07b
Showing 3 changed files with 74 additions and 21 deletions.
3 changes: 3 additions & 0 deletions docs/administration-guide/gaffer-stores/federated-store.md
@@ -1,5 +1,8 @@
# Federated Store

!!! warning
    The current version of the federated store is deprecated. It will be replaced by the [simple federated store](./simple-federated/configuration.md#) in v2.4.0.

The Federated Store is a Gaffer store which forwards operations to a collection of sub-graphs and returns a single response as though a single graph were queried.

## Introduction
@@ -16,21 +16,23 @@ operation. These can be used to do things like pick graphs or control the
merging. A full list of the available options is outlined in the following
table:

| Option | Description |
| --- | --- |
| `federated.graphIds` | List of graph IDs to submit the operation to, formatted as a comma separated string e.g. `"graph1,graph2"` |
| `federated.excludedGraphIds` | List of graph IDs to exclude from the query. If this is set any graph IDs on a `federated.graphIds` option are ignored and instead, all graphs are executed on except the ones specified e.g. `"graph1,graph2"` |
| `federated.aggregateElements` | Should the element aggregator be used when merging element results. |
| `federated.forwardChain` | Should the whole operation chain be sent to the sub graph or not. If set to `false` each operation inside the chain will be sent separately, so merging from each graph will happen after each operation instead of at the end of the chain. This will be inherently slower if turned off so is `true` by default. |
| Option | Default | Description |
| --- | --- | --- |
| `federated.graphIds` | None | List of graph IDs to submit the operation to, formatted as a comma separated string e.g. `"graph1,graph2"` |
| `federated.excludedGraphIds` | None | List of graph IDs to exclude from the query, formatted as a comma separated string e.g. `"graph1,graph2"`. If this is set, any graph IDs in a `federated.graphIds` option are ignored and the operation is instead executed on all graphs except the ones specified. |
| `federated.aggregateElements` | See store properties | Should the element aggregator be used when merging element results. |
| `federated.useDefaultGraphIds` | None | Explicitly specifies that the default graph IDs from the store.properties file should be used. By default, if no graph ID options are specified the default graph IDs will still be used where applicable. However, specifying this on an operation chain means the whole chain will be sent to the sub graph, so merging from each graph will happen at the end of the chain instead of after each operation, hopefully increasing performance. |
| `federated.separateResults` | `false` | A boolean option to specify if the results from each graph should be kept separate. If set, this will return a map where each key is a graph ID and each value is that graph's result. |
| `federated.skipGraphOnFail` | `false` | A boolean option to specify if the operation should continue even if it fails on one or more of the sub graphs. |
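
As a rough illustration of how these options fit together, the sketch below assembles an options map using keys from the table above. `FederatedOptionsSketch` and its methods are hypothetical names used only for this example and are not part of Gaffer. It also assumes options are supplied as string key/value pairs, which may not match every way of submitting an operation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Gaffer: it only assembles option keys
// from the table above into a map of the kind placed under "options".
final class FederatedOptionsSketch {

    // Graph IDs are passed as a single comma separated string.
    static String joinGraphIds(final List<String> graphIds) {
        return String.join(",", graphIds);
    }

    // Assumption: every option value is sent as a string.
    static Map<String, String> buildOptions(final List<String> graphIds,
                                            final boolean separateResults,
                                            final boolean skipGraphOnFail) {
        final Map<String, String> options = new LinkedHashMap<>();
        options.put("federated.graphIds", joinGraphIds(graphIds));
        options.put("federated.separateResults", String.valueOf(separateResults));
        options.put("federated.skipGraphOnFail", String.valueOf(skipGraphOnFail));
        return options;
    }

    public static void main(final String[] args) {
        // Query graph1 and graph2, keep each graph's result separate.
        System.out.println(buildOptions(Arrays.asList("graph1", "graph2"), true, false));
    }
}
```

The resulting map could then be attached to an operation in whatever way your client normally sets operation options.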

Along with the options above, all merge classes can be overridden per query
using the same property key as you would via the store properties. Please see
the table [here](./configuration.md#store-properties) for more information.

If you wish to submit different operations to different graphs in the same query
you can do this using the `federated.forwardChain` option. By setting this to
false on the outer operation chain the options on the operations inside it will
be honoured. An example of this can be seen below:
you can do this by omitting any graph ID options on the outer operation chain.
You can then specify the graph IDs on the individual operations in the chain
instead. An example of this can be seen below:

!!! note
    This will turn off any merging of the results at the end of the chain, the
@@ -44,9 +46,6 @@ be honoured. An example of this can be seen below:
```json
{
    "class": "OperationChain",
    "options": {
        "federated.forwardChain": false
    },
    "operations": [
        {
            "class": "GetElements",
@@ -77,21 +76,60 @@ graphs that have been added to the store. This means all features available to
normal caches are also available to the graph storage, allowing the sharing and
persisting of graphs between instances.

The federated store will use the default cache service to store graphs in. It
will also add a standard suffix meaning if you want to share graphs you will
need to set this to something other than the graph ID (see [here](../store-guide.md#cache-service)).
The federated store will use the default cache service to store graphs in. It will
also store graphs in a cache named `"federatedGraphCache_"` followed by the graph
ID of the federated store. You may wish to change this to have common storage
of graphs between stores using the `gaffer.store.federated.graphCache.name`
store property.

### Named Operations and Views

Named Operations and Views can be added to different caches if specified. By
passing graph IDs in the add operation (e.g. `AddNamedOperation`) you can make
the Named Operation or View specific to the graph(s) you specified. However,
this will mean if you try to run it on another graph it will not be available.

If you do not specify any graph IDs in the add operation, any Named
Operations/Views will instead be added to the federated store's cache. Anything
named is then resolved before being forwarded to the sub graphs, so in effect it
is available to all sub graphs.

!!! example ""
    === "Add to a sub graph"
        ```java
        // Passing graph IDs makes the Named Operation specific to those sub graph(s)
        final AddNamedOperation addNamedOp = new AddNamedOperation.Builder()
            .option(FederatedOperationHandler.OPT_GRAPH_IDS, "subGraph")
            .name("NamedOperation")
            .operationChain(new OperationChain.Builder()
                .first(new GetAllElements())
                .build())
            .build();
        ```

    === "Add to a federated store"
        ```java
        // No graph IDs given, so it is added to the federated store's own cache
        final AddNamedOperation addNamedOp = new AddNamedOperation.Builder()
            .name("NamedOperation")
            .operationChain(new OperationChain.Builder()
                .first(new GetAllElements())
                .build())
            .build();
        ```

## Schema Compatibility

When querying multiple graphs, the federated store will attempt to merge each graph's schema together. This means the schemas will need to be
compatible in order to query across them. Generally you will need to ensure
any shared groups can be merged correctly. A few examples of criteria to
consider are:
When querying multiple graphs, the federated store will attempt to merge each
graph's schema together. This means the schemas will need to be compatible in
order to query across them. Generally you will need to ensure any shared groups
can be merged correctly. A few examples of criteria to consider are:

- Any properties in a shared group defined in both schemas need to have the same
type and aggregation function.
- Any visibility properties need to be compatible or they will be removed from the
schema.
- If the visibility property has been defined differently in each schema it will
be removed from the merged schema. This does not affect the actual visibility
of the data as that will still be applied at the sub graph level.
- Groups with different properties in each schema will be merged so the group has
all the properties in the merged schema.
- Any groupBy definitions need to be compatible or they will be removed.
- If the vertex serialiser has been defined differently in each schema it will
be removed from the merged schema.
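
To make the first and fourth criteria more concrete, here is a minimal sketch of merging the property definitions of one shared group. `SchemaMergeSketch` and `PropertyDef` are hypothetical stand-ins invented for this example, not Gaffer's schema classes, and rejecting a conflict with an exception is an assumption about how an incompatibility might surface rather than a description of what the federated store does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative model only, not Gaffer's schema API.
final class SchemaMergeSketch {

    // A property is described here by its type and aggregation function.
    static final class PropertyDef {
        final String type;
        final String aggregateFunction;

        PropertyDef(final String type, final String aggregateFunction) {
            this.type = type;
            this.aggregateFunction = aggregateFunction;
        }

        @Override
        public boolean equals(final Object o) {
            if (!(o instanceof PropertyDef)) {
                return false;
            }
            final PropertyDef other = (PropertyDef) o;
            return type.equals(other.type)
                    && aggregateFunction.equals(other.aggregateFunction);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, aggregateFunction);
        }
    }

    // Merge one shared group: the result has the union of the properties,
    // and a property defined in both schemas must be defined identically.
    static Map<String, PropertyDef> mergeGroup(final Map<String, PropertyDef> first,
                                               final Map<String, PropertyDef> second) {
        final Map<String, PropertyDef> merged = new LinkedHashMap<>(first);
        for (final Map.Entry<String, PropertyDef> entry : second.entrySet()) {
            final PropertyDef existing = merged.putIfAbsent(entry.getKey(), entry.getValue());
            if (existing != null && !existing.equals(entry.getValue())) {
                throw new IllegalArgumentException(
                        "Incompatible definitions for property: " + entry.getKey());
            }
        }
        return merged;
    }
}
```

Running a check like this against your own schemas before adding graphs may help catch groups that cannot be queried together.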
@@ -43,6 +43,7 @@ specific to a federated store and their usage.
| `gaffer.store.federated.default.graphIds` | `""` | The list of default graph IDs to use if a user does not specify which graph(s) to run their query on. Takes a comma separated list of graph IDs e.g. `"graphID1,graphID2"` |
| `gaffer.store.federated.allowPublicGraphs` | `true` | Whether graphs with public access are allowed to be added to this store. |
| `gaffer.store.federated.default.aggregateElements` | `false` | Should queries aggregate returned Gaffer elements together using the binary operator for merging elements. `false` by default as aggregation can be slower, meaning results are just chained into one big list. |
| `gaffer.store.federated.graphCache.name` | `"federatedGraphCache_<graphId>"` | The name of the cache that the federated store will store its graphs in. This allows sharing of graphs between different federated stores if the cache name is the same (and the same default cache implementation is used). |
| `gaffer.store.federated.merge.number.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.Sum` | Default binary operator for merging [`Number`](https://docs.oracle.com/javase/8/docs/api/java/lang/Number.html) results (e.g. from a `Count` operation) from multiple graphs. |
| `gaffer.store.federated.merge.string.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.StringConcat` | Default binary operator for merging [`String`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html) results from multiple graphs. |
| `gaffer.store.federated.merge.boolean.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.And` | Default binary operator for merging [`Boolean`](https://docs.oracle.com/javase/8/docs/api/java/lang/Boolean.html) results from multiple graphs. |
@@ -64,6 +65,11 @@ satisfy Java's [`BinaryOperator`](https://docs.oracle.com/javase/8/docs/api/java
interface, you can then specify it using the property key for the data type you
wish to use it for.
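
As a minimal sketch of what such an operator might look like, the class below merges two `Number` results by keeping the larger one. `MaxNumberMerge` is a hypothetical name made up for this example. It is assumed here that the class would be referenced by its fully qualified name under the `gaffer.store.federated.merge.number.class` key, and that in a real deployment it would need to be public and on the classpath.

```java
import java.util.function.BinaryOperator;

// Hypothetical merge operator for this example only: keeps the larger of
// two numeric results instead of summing them.
final class MaxNumberMerge implements BinaryOperator<Number> {

    @Override
    public Number apply(final Number a, final Number b) {
        // Compare as doubles so mixed Number types can be handled.
        return a.doubleValue() >= b.doubleValue() ? a : b;
    }
}
```

With an operator like this, merging a result of `3` from one graph with `7` from another would give `7` rather than the `10` produced by the default `Sum`.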

!!! note
    Please note you currently can't choose a merge operator for operations that
    return an `Iterable` type; their results will always just be chained together
    (an iterable of `Element`s is the exception, please see below).

### The Default Element Merge Operator

The default operator used to merge Gaffer elements is unique compared to the
@@ -95,6 +101,12 @@ to the individual graph results, this means two results separately will
satisfy the `View` but once aggregated they may not.
- If you wish to write or use your own operator for merging elements the class
must extend the [`ElementAggregateOperator`](https://github.com/gchq/Gaffer/blob/develop/store-implementation/simple-federated-store/src/main/java/uk/gov/gchq/gaffer/federated/simple/merge/operator/ElementAggregateOperator.java).
- If you have chosen in the schema to use a time sensitive aggregation function
(e.g. [`First`](../../../reference/binary-operators-guide/koryphe-operators.md#first))
for a property that is in multiple sub graphs, the aggregator does not know
which sub graph is first or last. This means you may get duplicates of the
same vertex, each with different properties, in the result.
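
The sketch below illustrates only why ordering matters for such a function; it does not reproduce the duplicate behaviour itself. `FirstOrderSketch` and its `FIRST` operator are hypothetical stand-ins for a "first value wins" aggregator and are not the Koryphe implementation.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Illustration only: a "first value wins" operator gives a different
// answer depending on the order in which sub graph results arrive.
final class FirstOrderSketch {

    // Stand-in for a time sensitive aggregator such as First.
    static final BinaryOperator<String> FIRST = (a, b) -> a;

    // Reduce the values in list order, as if each came from one sub graph.
    static String merge(final List<String> values) {
        return values.stream().reduce(FIRST).orElse(null);
    }
}
```

Merging the same two property values in the opposite order returns a different result, and there is no ordering between sub graphs that could settle which one is correct.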

## Adding and Removing Graphs
