
Commit 9bf7e31

Merge pull request #13 from hkuich/playerMatchAttributeCasting
Player match attribute casting
2 parents c3f9e94 + 444d8ff commit 9bf7e31

25 files changed with 488 additions and 74 deletions

README.md

Lines changed: 141 additions & 6 deletions
````diff
@@ -117,7 +117,7 @@ Do above for all entities and their attributes in the schema. GraMi will ensure
 
 For each relation in your schema, define a processor object that specifies
 - each relation attribute, its value type, and whether it is required
-- each relation player entity type, role, identifying attribute in the data file and its value type, as well as whether the player is required
+- each relation player of type entity, its role, identifying attribute in the data file and value type, as well as whether the player is required
 
 For example, given the following relation in your schema:
 
````
````diff
@@ -126,7 +126,8 @@ call sub relation,
     relates caller,
     relates callee,
     has started-at,
-    has duration;
+    has duration,
+    plays past-call;
 ```
 
 Add the following processor object:
````
````diff
@@ -143,8 +144,7 @@ Add the following processor object:
         "uniquePlayerId": "phone-number", // using attribute phone-number as unique identifier for type person
         "idValueType": "string", // of value type string
         "roleType": "caller", // inserts person as player the role caller
-        "required": true // which is a required role for each data record
-
+        "required": true // which is a required role for each data record
       },
       "callee": { // ID of player generator
         "playerType": "person", // matches entity of type person
````
````diff
@@ -172,6 +172,64 @@ Add the following processor object:
 
 Do above for all relations and their players and attributes in the schema. GraMi will ensure that all values in your data files adhere to the value type specified or try to cast them. GraMi will also ensure that no data records enter grakn that are incomplete (missing required attributes/players).
 
+##### Relation-of-Relation Processors
+
+Grakn allows relations to play roles in other relations.
+
+For each relation-of-relation in your schema, define a processor object that specifies
+- each relation attribute, its value type, and whether it is required
+- each relation player of type entity, its role, identifying attribute in the data file and its value type, as well as whether the player is required
+- each relation player of type relation, its role, identifying attribute in the data file and its value type, as well as whether the player is required
+
+For example, given the following types in your schema:
+
+```GraphQL
+person sub entity,
+    ...,
+    plays peer;
+
+call sub relation,
+    relates caller,
+    relates callee,
+    has started-at,
+    has duration,
+    plays past-call;
+
+communication-channel sub relation,
+    relates peer,
+    relates past-call;
+```
+
+Add the following processor object:
+
+```
+{
+  "processor": "communication-channel", // the ID of your processor
+  "processorType": "relation-of-relation", // creates a relation
+  "schemaType": "communication-channel", // of type communication-channel
+  "conceptGenerators": {
+    "players": { // with the following players according to schema
+      "peer": { // ID of player generator
+        "playerType": "person", // matches entity of type person
+        "uniquePlayerId": "phone-number", // using attribute phone-number as unique identifier for type person
+        "idValueType": "string", // of value type string
+        "roleType": "peer", // inserts person as player of the role peer
+        "required": true // which is a required role for each data record
+      },
+      "past-call": { // ID of player generator
+        "playerType": "call", // matches relation of type call
+        "uniquePlayerId": "started-at", // using attribute started-at as unique identifier for type call
+        "idValueType": "date", // of value type date
+        "roleType": "past-call", // inserts call as player of the role past-call
+        "required": true // which is a required role for each data record
+      }
+    }
+  }
+}
+```
+
+Note that relation-of-relations can only be inserted AFTER the relations that act as their players have been migrated. GraMi migrates all relation-of-relations after entities and relations, but keep this in mind as you build your graph: a relation is only inserted as expected when all of its players are already present.
+
 See the [full configuration file for phone-calls here](https://github.com/bayer-science-for-a-better-life/grami/tree/master/src/test/resources/phone-calls/processorConfig.json).
 
 #### Data Configuration
````
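The ordering rule in the note above is that a relation can only be inserted once all of its players exist. That rule amounts to a topological sort over player types. The sketch below is a hypothetical illustration, not GraMi code. `MigrationOrder` and its input map, which maps each relation type to the types that play roles in it, are assumptions. According to the README, GraMi itself simply migrates in fixed phases: entities, then relations, then relation-of-relations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of GraMi): orders schema types so that every
// relation comes after all types that play roles in it.
final class MigrationOrder {

    private MigrationOrder() {}

    // playerTypes maps a relation type to the types playing roles in it;
    // entities map to an empty list (or are simply absent as keys).
    static List<String> order(Map<String, List<String>> playerTypes) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String type : playerTypes.keySet()) {
            visit(type, playerTypes, done, inProgress, ordered);
        }
        return ordered;
    }

    // Depth-first visit: emit all player types before the type itself.
    private static void visit(String type, Map<String, List<String>> playerTypes,
                              Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(type)) {
            return;
        }
        if (!inProgress.add(type)) {
            // a type that (indirectly) needs itself as a player cannot be ordered
            throw new IllegalStateException("cyclic player dependency involving " + type);
        }
        for (String player : playerTypes.getOrDefault(type, Collections.<String>emptyList())) {
            visit(player, playerTypes, done, inProgress, ordered);
        }
        inProgress.remove(type);
        done.add(type);
        ordered.add(type);
    }
}
```

Consider the phone-calls schema, where `communication-channel` maps to `person` and `call`, and `call` maps to `person`. For that input the sketch returns `person`, `call`, `communication-channel`.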
````diff
@@ -257,7 +315,7 @@ The data config entry would be:
   "players": [ // player columns present in the data file
     {
       "columnName": "caller_id", // column name in data file
-      "generator": "caller" // player generator in processor call to be used for the column
+      "generator": "caller" // player generator in processor call to be used for the column
     },
     {
       "columnName": "callee_id", // column name in data file
````
````diff
@@ -267,7 +325,7 @@ The data config entry would be:
   "attributes": [ // attribute columns present in the data file
     {
       "columnName": "started_at", // column name in data file
-      "generator": "started-at" // attribute generator in processor call to be used for the column
+      "generator": "started-at" // attribute generator in processor call to be used for the column
     },
     {
       "columnName": "duration", // column name in data file
````
````diff
@@ -279,6 +337,83 @@ The data config entry would be:
 
 Do above for all data files that need to be migrated.
 
+If a single column holds a list of players, you can also add a listSeparator to that player entry.
+
+Your data might look like:
+
+```
+company_name,person_id
+Unity,+62 999 888 7777###+62 999 888 7778
+```
+
+```
+"contract": {
+  "dataPath": "src/test/resources/phone-calls/contract.csv",
+  "separator": ",",
+  "processor": "contract",
+  "players": [
+    {
+      "columnName": "company_name",
+      "generator": "provider"
+    },
+    {
+      "columnName": "person_id",
+      "generator": "customer",
+      "listSeparator": "###" // like this!
+    }
+  ],
+  "batchSize": 100,
+  "threads": 4
+}
+```
+
+For troubleshooting, consider setting the problematic data configuration entry to a single thread; Grakn's error log messages are more verbose and specific that way.
+
+##### Relation-of-Relation Data Config Entries
+
+Given the data file [communication-channel.csv](https://github.com/bayer-science-for-a-better-life/grami/tree/master/src/test/resources/phone-calls/communication-channel.csv):
+
+```CSV
+peer_1,peer_2,call_started_at
+54 398 559 0423,+48 195 624 2025,2018-09-16T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-17T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-18T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-19T22:24:19
+54 398 559 0423,+48 195 624 2025,2018-09-20T22:24:19
+263 498 495 0617,+33 614 339 0298,2018-09-11T22:10:34###2018-09-12T22:10:34###2018-09-13T22:10:34###2018-09-14T22:10:34###2018-09-15T22:10:34###2018-09-16T22:10:34
+54 398 559 0423,+7 552 196 4096,2018-09-25T20:24:59
+...
+```
+
+The data config entry would be:
+
+```
+"communication-channel": {
+  "dataPath": "/your/absolute/path/to/communication-channel.csv", // the absolute path to your data file
+  "separator": ",", // the separation character used in your data file (alternatives: "\t", ";", etc...)
+  "processor": "communication-channel", // processor from processor config file
+  "batchSize": 100, // batchSize to be used for this data file
+  "threads": 4, // # of threads to be used for this data file
+  "players": [ // player columns present in the data file
+    {
+      "columnName": "peer_1", // column name in data file
+      "generator": "peer" // player generator in processor call to be used for the column
+    },
+    {
+      "columnName": "peer_2", // column name in data file
+      "generator": "peer" // player generator in processor call to be used for the column
+    },
+    {
+      "columnName": "call_started_at", // column name in data file
+      "generator": "past-call", // player generator in processor call to be used for the column
+      "listSeparator": "###"
+    }
+  ]
+}
+```
+
+Do above for all data files that need to be migrated.
+
 For troubleshooting, consider setting the problematic data configuration entry to a single thread; Grakn's error log messages are more verbose and specific that way.
 
 See the [full configuration file for phone-calls here](https://github.com/bayer-science-for-a-better-life/grami/tree/master/src/test/resources/phone-calls/dataConfig.json).
````
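With a listSeparator, one cell can yield several players for the same role. Examples are the two customers in contract.csv and the six past calls in one communication-channel.csv row. Below is a minimal sketch of that expansion. `ListCell` and `explode` are hypothetical names. The `clean` method is an assumed simplification (it trims whitespace and surrounding quotes) standing in for GraMi's `cleanToken`, whose exact behaviour is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not GraMi's implementation): expands one CSV cell into
// the individual player IDs it contains.
final class ListCell {

    private ListCell() {}

    static List<String> explode(String cell, String listSeparator) {
        List<String> values = new ArrayList<>();
        if (cell == null) {
            return values;
        }
        // no listSeparator configured: the whole cell is a single player ID
        String[] parts = listSeparator == null
                ? new String[] {cell}
                : cell.split(Pattern.quote(listSeparator)); // quote: treat separator literally, not as a regex
        for (String part : parts) {
            String cleaned = clean(part);
            if (!cleaned.isEmpty()) { // skip empty entries, e.g. from "a######b"
                values.add(cleaned);
            }
        }
        return values;
    }

    // Assumed stand-in for GraMi's cleanToken: trims whitespace and one pair of surrounding double quotes.
    static String clean(String token) {
        String t = token.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            t = t.substring(1, t.length() - 1).trim();
        }
        return t;
    }
}
```

One design choice is worth noting. The Java diff in this commit passes the separator straight to `String.split`, which interprets its argument as a regular expression. `###` happens to behave literally, but a separator such as `|` would split between every character. `Pattern.quote` avoids that.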

build.gradle

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ plugins {
 }
 
 group 'com.github.bayer-science-for-a-better-life'
-version '0.0.2'
+version '0.0.2-hotfix-1'
 
 repositories {
     mavenCentral()
```

src/main/java/generator/GeneratorUtil.java

Lines changed: 1 addition & 1 deletion
```diff
@@ -68,7 +68,7 @@ public static StatementInstance cleanExplodeAdd(StatementInstance pattern, Strin
         }
     }
 
-    private static StatementInstance addAttributeOfColumnType(StatementInstance pattern, String conceptType, String valueType, String cleanedValue) {
+    public static StatementInstance addAttributeOfColumnType(StatementInstance pattern, String conceptType, String valueType, String cleanedValue) {
         if (valueType.equals("string")) {
             pattern = pattern.has(conceptType, cleanedValue);
         } else if (valueType.equals("long")) {
```
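This commit makes `addAttributeOfColumnType` public so that player match statements can use it. A player's unique ID is therefore added according to its `idValueType` rather than always as a string. This presumably matters for players such as `past-call`, which is identified by a `date`. The following is a hypothetical, string-based sketch of per-type casting. Only the `string` and `long` branches are visible in the hunk. The `double`, `boolean` and `date` handling, the ISO date format and the `ValueTypeCast` name are all assumptions.

```java
import java.time.LocalDateTime;

// Hypothetical sketch (not GraMi's code): renders a cleaned CSV value as a Graql literal
// according to the configured value type, failing fast when the value cannot be cast.
final class ValueTypeCast {

    private ValueTypeCast() {}

    static String toGraqlLiteral(String valueType, String cleanedValue) {
        switch (valueType) {
            case "string":
                // quote, escaping backslashes and double quotes
                return "\"" + cleanedValue.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
            case "long":
                return Long.toString(Long.parseLong(cleanedValue)); // NumberFormatException if not an integer
            case "double":
                return Double.toString(Double.parseDouble(cleanedValue));
            case "boolean":
                if (cleanedValue.equalsIgnoreCase("true") || cleanedValue.equalsIgnoreCase("false")) {
                    return cleanedValue.toLowerCase();
                }
                throw new IllegalArgumentException("not a boolean: " + cleanedValue);
            case "date":
                // assumption: dates are ISO-8601 local date-times such as 2018-09-16T22:24:19
                LocalDateTime.parse(cleanedValue); // validates; throws DateTimeParseException otherwise
                return cleanedValue;
            default:
                throw new IllegalArgumentException("unsupported value type: " + valueType);
        }
    }
}
```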

src/main/java/generator/RelationInsertGenerator.java

Lines changed: 51 additions & 49 deletions
```diff
@@ -3,6 +3,7 @@
 import static generator.GeneratorUtil.idxOf;
 import static generator.GeneratorUtil.cleanToken;
 import static generator.GeneratorUtil.addAttribute;
+import static generator.GeneratorUtil.addAttributeOfColumnType;
 
 import configuration.DataConfigEntry;
 import configuration.ProcessorConfigEntry;
```
```diff
@@ -12,10 +13,7 @@
 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;
 
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.Map;
+import java.util.*;
 
 public class RelationInsertGenerator extends InsertGenerator {
 
```
```diff
@@ -62,12 +60,13 @@ public ArrayList<ArrayList<Statement>> graknRelationshipQueryFromRow(String row,
         appLogger.debug("processing tokenized row: " + Arrays.toString(tokens));
         GeneratorUtil.malformedRow(row, tokens, headerTokens.length);
 
-        ArrayList<Statement> matchStatements = new ArrayList<>(playersMatch(tokens, headerTokens, insertCounter));
+        ArrayList<Statement> miStatements = new ArrayList<>(createPlayerMatchAndInsert(tokens, headerTokens, insertCounter));
+        ArrayList<Statement> matchStatements = new ArrayList<>(miStatements.subList(0, miStatements.size() - 1));
         ArrayList<Statement> insertStatements = new ArrayList<>();
 
         if (!matchStatements.isEmpty()) {
-            StatementInstance insert = playersInsert(matchStatements, insertCounter);
-            insert = relationInsert(insert);
+            StatementInstance playerInsert = (StatementInstance) miStatements.subList(miStatements.size() - 1, miStatements.size()).get(0);
+            StatementInstance insert = relationInsert(playerInsert);
             if (dce.getAttributes() != null) {
                 for (DataConfigEntry.GeneratorSpecification attDataConfigEntry : dce.getAttributes()) {
                     insert = addAttribute(tokens, insert, headerTokens, attDataConfigEntry, gce.getAttributeGenerator(attDataConfigEntry.getGenerator()));
```
```diff
@@ -101,61 +100,61 @@ private String assembleQuery(ArrayList<ArrayList<Statement>> queries) {
         return ret.toString();
     }
 
-    private StatementInstance relationInsert(StatementInstance si) {
-        if (si != null) {
-            si = si.isa(gce.getSchemaType());
-            return si;
-        } else {
-            return null;
-        }
-    }
-
-    private StatementInstance playersInsert(ArrayList<Statement> matchStatements, int insertCounter) {
-        Statement s = Graql.var("rel-" + insertCounter);
-        int playerCounter = 0;
-        for (DataConfigEntry.GeneratorSpecification dataPlayer : dce.getPlayers()) {
-            ProcessorConfigEntry.ConceptGenerator playerGenerator = gce.getPlayerGenerator(dataPlayer.getGenerator());
-            boolean insert = false;
-            for (Statement st :matchStatements) {
-                //need to have player in match statement or cannot insert as player in relation
-                if (st.toString().contains(playerGenerator.getUniquePlayerId())) {
-                    insert = true;
-                }
-            }
-            if (insert) {
-                s = s.rel(playerGenerator.getRoleType(), playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter);
-            }
-            playerCounter++;
-        }
-        if (s.toString().contains("(")) {
-            return (StatementInstance) s;
-        } else {
-            return null;
-        }
-    }
-
-    private Collection<? extends Statement> playersMatch(String[] tokens, String[] headerTokens, int insertCounter) {
+    private Collection<? extends Statement> createPlayerMatchAndInsert(String[] tokens, String[] headerTokens, int insertCounter) {
         ArrayList<Statement> players = new ArrayList<>();
+        Statement playersInsertStatement = Graql.var("rel-" + insertCounter);
         int playerCounter = 0;
         for (DataConfigEntry.GeneratorSpecification playerDataConfigEntry : dce.getPlayers()) {
             ProcessorConfigEntry.ConceptGenerator playerGenerator = gce.getPlayerGenerator(playerDataConfigEntry.getGenerator());
             int playerDataIndex = idxOf(headerTokens, playerDataConfigEntry);
+
             if(playerDataIndex == -1) {
                 appLogger.error("The column header in your dataconfig mapping to the uniquePlayerId [" + playerGenerator.getUniquePlayerId() + "] cannot be found in the file you specified.");
             }
-            if (tokens.length > playerDataIndex &&
-                    !cleanToken(tokens[playerDataIndex]).isEmpty()) {
-                StatementInstance ms = Graql
-                        .var(playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter)
-                        .isa(playerGenerator.getPlayerType()).has(playerGenerator.getUniquePlayerId(),
-                                cleanToken(tokens[playerDataIndex]));
-                players.add(ms);
+
+            if (tokens.length > playerDataIndex && // make sure that there are enough tokens in the row for your column of interest
+                    !cleanToken(tokens[playerDataIndex]).isEmpty()) { // make sure that after cleaning, there is more than an empty string
+                String listSeparator = playerDataConfigEntry.getListSeparator();
+                if(listSeparator != null) {
+                    for (String exploded: tokens[playerDataIndex].split(listSeparator)) {
+                        if(!cleanToken(exploded).isEmpty()) {
+                            String playerVariable = playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter;
+                            players.add(createPlayerMatchStatement(exploded, playerGenerator, playerVariable));
+                            playersInsertStatement = playersInsertStatement.rel(playerGenerator.getRoleType(), playerVariable);
+                            playerCounter++;
+                        }
+                    }
+                } else { // single player, no listSeparator
+                    String playerVariable = playerGenerator.getPlayerType() + "-" + playerCounter + "-" + insertCounter;
+                    players.add(createPlayerMatchStatement(cleanToken(tokens[playerDataIndex]), playerGenerator, playerVariable));
+                    playersInsertStatement = playersInsertStatement.rel(playerGenerator.getRoleType(), playerVariable);
+                    playerCounter++;
+                }
             }
-            playerCounter++;
         }
+        players.add(playersInsertStatement);
         return players;
     }
 
+    private StatementInstance createPlayerMatchStatement(String token, ProcessorConfigEntry.ConceptGenerator playerGenerator, String playerVariable) {
+        String cleanedValue = cleanToken(token);
+        StatementInstance ms = Graql
+                .var(playerVariable)
+                .isa(playerGenerator.getPlayerType());
+        ms = addAttributeOfColumnType(ms, playerGenerator.getUniquePlayerId(), playerGenerator.getIdValueType(), cleanedValue);
+        //.has(playerGenerator.getUniquePlayerId(), cleanedValue);
+        return ms;
+    }
+
+    private StatementInstance relationInsert(StatementInstance si) {
+        if (si != null) {
+            si = si.isa(gce.getSchemaType());
+            return si;
+        } else {
+            return null;
+        }
+    }
+
     private boolean isValid(ArrayList<ArrayList<Statement>> si) {
         ArrayList<Statement> matchStatements = si.get(0);
         ArrayList<Statement> insertStatements = si.get(1);
```
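`createPlayerMatchAndInsert` now returns one match statement per player value, followed by a single insert statement that assigns each matched variable to its role. Every value from a listSeparator column gets its own variable. The sketch below reproduces that shape as plain query text. It is an illustration only. `PlayerQuerySketch`, `Player` and the exact query formatting are assumptions, and GraMi builds these statements with the Graql builder rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, string-based sketch of the shape produced by createPlayerMatchAndInsert:
// one match pattern per player value plus a single insert that assigns each variable to its role.
final class PlayerQuerySketch {

    private PlayerQuerySketch() {}

    // One player value from a data row; the literal is assumed to be already cast to Graql syntax.
    static final class Player {
        final String type;
        final String idAttribute;
        final String role;
        final String literal;

        Player(String type, String idAttribute, String role, String literal) {
            this.type = type;
            this.idAttribute = idAttribute;
            this.role = role;
            this.literal = literal;
        }
    }

    static String build(String relationType, List<Player> players, int insertCounter) {
        if (players.isEmpty()) {
            // the diff likewise only builds an insert when matchStatements is non-empty
            throw new IllegalArgumentException("a relation needs at least one player");
        }
        StringBuilder match = new StringBuilder("match ");
        List<String> rolePlayers = new ArrayList<>();
        int playerCounter = 0; // incremented per player value, as in the new listSeparator loop
        for (Player p : players) {
            String var = "$" + p.type + "-" + playerCounter + "-" + insertCounter;
            match.append(var).append(" isa ").append(p.type)
                 .append(", has ").append(p.idAttribute).append(' ').append(p.literal).append("; ");
            rolePlayers.add(p.role + ": " + var);
            playerCounter++;
        }
        return match + "insert $rel-" + insertCounter + " (" + String.join(", ", rolePlayers)
                + ") isa " + relationType + ";";
    }
}
```

The first row of communication-channel.csv has two `peer` players and one `past-call` player. With `insertCounter` 0, that row yields the following query: `match $person-0-0 isa person, has phone-number "+54 398 559 0423"; $person-1-0 isa person, has phone-number "+48 195 624 2025"; $call-2-0 isa call, has started-at 2018-09-16T22:24:19; insert $rel-0 (peer: $person-0-0, peer: $person-1-0, past-call: $call-2-0) isa communication-channel;`.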
```diff
@@ -169,6 +168,9 @@ private boolean isValid(ArrayList<ArrayList<Statement>> si) {
             if (!matchStatement.toString().contains("isa " + generatorEntry.getValue().getPlayerType())) {
                 return false;
             }
+            if (!insertStatement.contains(generatorEntry.getValue().getRoleType())) {
+                return false;
+            }
         }
         // missing required attribute
         for (Map.Entry<String, ProcessorConfigEntry.ConceptGenerator> generatorEntry: gce.getRequiredAttributes().entrySet()) {
```
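The new check in `isValid` rejects a row when a required role does not appear in the insert statement. It relies on a substring test, so a role name contained in another could pass unintentionally. For example, a hypothetical role `call` would also match the text `past-call`. A stricter alternative is to collect only the names written as `role: $variable`. The sketch below shows that approach and is hypothetical. The `RoleCheck` name and the regular expression are assumptions, and the expression assumes role and variable names made of letters, digits, `_` and `-`.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: accept a required role only when it occurs as a whole
// role name in a "role: $variable" pair of the insert clause.
final class RoleCheck {

    private RoleCheck() {}

    // assumed naming rule: role labels and variables use letters, digits, '_' and '-'
    private static final Pattern ROLE_PLAYER =
            Pattern.compile("([A-Za-z0-9_-]+):\\s*\\$[A-Za-z0-9_-]+");

    static Set<String> rolesIn(String insertClause) {
        Set<String> roles = new LinkedHashSet<>();
        Matcher m = ROLE_PLAYER.matcher(insertClause);
        while (m.find()) {
            roles.add(m.group(1)); // the role label in front of the colon
        }
        return roles;
    }

    static boolean hasRequiredRoles(String insertClause, Set<String> requiredRoles) {
        return rolesIn(insertClause).containsAll(requiredRoles);
    }
}
```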

src/main/java/migrator/GraknMigrator.java

Lines changed: 3 additions & 0 deletions
```diff
@@ -74,6 +74,9 @@ public void migrate(boolean migrateEntities, boolean migrateRelations) throws IO
     }
 
     private void migrateThingsInOrder(GraknClient.Session session, boolean migrateEntities, boolean migrateRelations) throws IOException {
+        if(!migrateEntities && migrateRelations) {
+            migrateEntities = true;
+        }
         if (migrateEntities) {
             appLogger.info("migrating entities...");
             getStatusAndMigrate(session, "entity");
```
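With this change, requesting relations without entities still migrates entities first. Relation players are matched against existing instances, so the entities must already be there. Below is a hypothetical sketch of the resulting phase list. `MigrationPlan` is an illustrative name, and the `"relation"` phase label is assumed by analogy with the `"entity"` call shown in the hunk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the flag rule added to migrateThingsInOrder.
final class MigrationPlan {

    private MigrationPlan() {}

    static List<String> phases(boolean migrateEntities, boolean migrateRelations) {
        if (!migrateEntities && migrateRelations) {
            migrateEntities = true; // relations need their entity players to exist first
        }
        List<String> phases = new ArrayList<>();
        if (migrateEntities) {
            phases.add("entity");
        }
        if (migrateRelations) {
            phases.add("relation"); // assumed label, by analogy with "entity"
        }
        return phases;
    }
}
```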
