[267] Tidy & slim schema transformations #281
@mindrones
I've just noted down a couple of changes I'd like to see, though I'm not really sure what implications they would have.
@@ -1,24 +1,24 @@
{
"mappings":{
"_doc":{
(Just a comment: in Svizzle I'm actually going back to tabs, since the indent width is configurable per editor, so it makes everyone happy :)
Oops, actually I have a new unit test that this would have failed (all JSON in the repo must be clean for it to pass), so this will fail anyway :)
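The unit test mentioned in that comment isn't shown in this thread. A minimal sketch of such a check, assuming "clean" means sorted keys, a fixed indent and a trailing newline (the repo's real rules may differ); canonical_json and find_unclean_json are hypothetical names, not functions from the repo.

```python
import json
from pathlib import Path


def canonical_json(text, indent=4):
    """Re-serialise JSON text with sorted keys, a fixed indent and a trailing newline.

    Assumed definition of "clean". `indent` may be an int (spaces) or a
    string such as a tab character. Invalid JSON raises json.JSONDecodeError.
    """
    data = json.loads(text)
    return json.dumps(data, indent=indent, sort_keys=True) + "\n"


def find_unclean_json(root, indent=4):
    """Return the paths of .json files under `root` that are not in canonical form."""
    unclean = []
    for path in sorted(Path(root).rglob("*.json")):
        text = path.read_text(encoding="utf-8")
        if text != canonical_json(text, indent=indent):
            unclean.append(path)
    return unclean
```

A pytest test could then assert that find_unclean_json(repo_root) returns an empty list. Under this convention a line such as "mappings":{ (no space after the colon) would not count as clean, and passing indent as a tab string would switch the canonical form to tabs, as suggested for Svizzle.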
Good to merge for my part 👍
* changed schema_transformor to use new simpler mapping
* removed to/from keys
* new null syntax mapping implemented
* adding temporary eurito-dev index to avoid conflating es7 compatibility issues
* adding temporary eurito-dev index to avoid conflating es7 compatibility issues
* testing es7 on cordis only
* testing es7 on cordis only
* testing es7 on cordis only
* changes to make cordis es7 run
* eurito-dev iteration
* compatibility issues between arxlive and eurito arxiv
* sorted json
* pycountry change no longer assumes not null country
* needed to split pathstub args
* removed redundant es mappings
* old new index paradigm fix
Co-authored-by: Joel Klinger <[email protected]>

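Several of these commits deal with Elasticsearch 7 compatibility. One change ES 7 imposes on mapping files like the one in the diff above is that mapping types are removed by default, so an index body nested under "_doc" has to be flattened. Whether this is exactly what these commits changed isn't shown here; the sketch assumes a single "_doc" type directly under "mappings", and to_typeless_mapping is a hypothetical helper.

```python
import copy


def to_typeless_mapping(index_body, type_name="_doc"):
    """Return a copy of an ES 6-style index body with the mapping type level removed.

    Turns {"mappings": {"_doc": {...}}} into {"mappings": {...}}.
    Assumes at most one type, named `type_name`. Bodies that are already
    typeless are returned unchanged (as a deep copy).
    """
    body = copy.deepcopy(index_body)
    mappings = body.get("mappings", {})
    if type_name in mappings:
        # Lift the type's contents (properties, dynamic, ...) up one level.
        body["mappings"] = mappings[type_name]
    return body
```

Working on a deep copy keeps the original ES 6 body intact, which is handy while both versions are being tested side by side (as with the temporary eurito-dev index).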
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* using soft alias until a future PR to minimise changes
* cleaned and sorted json
* [267] Tidy & slim schema transformations (#281)
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* harmonised name fieldsofstudy across arxiv
* added novelty back in
* sorted json
* sorted json
* sorted json
Co-authored-by: Joel Klinger <[email protected]>
Co-authored-by: Joel Klinger <[email protected]>

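The "added dynamic strict to settings" commit refers to Elasticsearch's dynamic mapping parameter: with "dynamic": "strict", ES rejects a document that contains a field not declared in the mapping instead of adding that field automatically. Where the repo's defaults file sets it isn't shown here. A rough local analogue, checking top-level fields only; unmapped_fields is a hypothetical helper, not an ES or repo API.

```python
def unmapped_fields(doc, mapping):
    """Return the top-level fields of `doc` that a strict mapping would reject.

    Mimics "dynamic": "strict" for top-level fields only: nested objects and
    per-object overrides of `dynamic` are not handled. Non-strict mappings
    accept any field, so an empty list is returned for them.
    """
    if mapping.get("dynamic") != "strict":
        return []
    declared = mapping.get("properties", {})
    return sorted(key for key in doc if key not in declared)
```

A check like this can run in a consistency test before indexing, so a renamed or forgotten field fails locally rather than as a rejected document at index time.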
* make sure conf dir is empty
* simplified es config
* added orm es config reader
* modified setup_es to pick up new es config
* swapped es_mode for boolean
* aliases now consistent with config
* aliases now automatically located
* added endpoint field to estasks
* added endpoint field to sql2estasks
* [267] Pool ES mappings across datasets (#280)
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* using soft alias until a future PR to minimise changes
* cleaned and sorted json
* [267] Tidy & slim schema transformations (#281)
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* harmonised name fieldsofstudy across arxiv
* added novelty back in
* sorted json
* sorted json
* sorted json
Co-authored-by: Joel Klinger <[email protected]>
Co-authored-by: Joel Klinger <[email protected]>
* patched out es config setup from tests
* removed redundant tests
* fixed json formatting
* none included for testing
* picked up bug in test
Co-authored-by: Joel Klinger <[email protected]>

* make sure conf dir is empty
* simplified es config
* added orm es config reader
* modified setup_es to pick up new es config
* swapped es_mode for boolean
* aliases now consistent with config
* aliases now automatically located
* added endpoint field to estasks
* added endpoint field to sql2estasks
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* harmonised name fieldsofstudy across arxiv
* using soft alias until a future PR to minimise changes
* added novelty back in
* sorted json
* sorted json
* sorted json
* changed schema_transformor to use new simpler mapping
* removed to/from keys
* new null syntax mapping implemented
* cleaned and sorted json
* adding temporary eurito-dev index to avoid conflating es7 compatibility issues
* adding temporary eurito-dev index to avoid conflating es7 compatibility issues
* testing es7 on cordis only
* testing es7 on cordis only
* testing es7 on cordis only
* changes to make cordis es7 run
* eurito-dev iteration
* compatibility issues between arxlive and eurito arxiv
* sorted json
* pycountry change no longer assumes not null country
* needed to split pathstub args
* removed redundant es mappings
* empty gtr transformation
* [267] Pool ES mappings across datasets (#280)
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* using soft alias until a future PR to minimise changes
* cleaned and sorted json
* [267] Tidy & slim schema transformations (#281)
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* harmonised name fieldsofstudy across arxiv
* added novelty back in
* sorted json
* sorted json
* sorted json
Co-authored-by: Joel Klinger <[email protected]>
Co-authored-by: Joel Klinger <[email protected]>
* patched out es config setup from tests
* removed redundant tests
* fixed json formatting
* fixed bad table name (NB table was empty anyway)
* fixed bad table name (NB table was empty anyway)
* gtr ontology
* none included for testing
* added schema transformation
* picked up bug in test
* gtr ontology is self consistent
* added gtr mapping
* added gtr to config
* fixed merge conflicts
* fixed merge conflicts
* changed json field names
* instiutes are now analyzed and text
* sorted and cleaned json
* added geopoint
* fixed bad json
* fixed bad json
Co-authored-by: Joel Klinger <[email protected]>

* make sure conf dir is empty
* simplified es config
* added orm es config reader
* modified setup_es to pick up new es config
* swapped es_mode for boolean
* aliases now consistent with config
* aliases now automatically located
* added endpoint field to estasks
* added endpoint field to sql2estasks
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* harmonised name fieldsofstudy across arxiv
* using soft alias until a future PR to minimise changes
* added novelty back in
* sorted json
* sorted json
* sorted json
* changed schema_transformor to use new simpler mapping
* removed to/from keys
* new null syntax mapping implemented
* cleaned and sorted json
* adding temporary eurito-dev index to avoid conflating es7 compatibility issues
* adding temporary eurito-dev index to avoid conflating es7 compatibility issues
* testing es7 on cordis only
* testing es7 on cordis only
* testing es7 on cordis only
* changes to make cordis es7 run
* eurito-dev iteration
* compatibility issues between arxlive and eurito arxiv
* sorted json
* pycountry change no longer assumes not null country
* needed to split pathstub args
* removed redundant es mappings
* empty gtr transformation
* [267] Pool ES mappings across datasets (#280)
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* using soft alias until a future PR to minimise changes
* cleaned and sorted json
* [267] Tidy & slim schema transformations (#281)
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* harmonised name fieldsofstudy across arxiv
* added novelty back in
* sorted json
* sorted json
* sorted json
Co-authored-by: Joel Klinger <[email protected]>
Co-authored-by: Joel Klinger <[email protected]>
* patched out es config setup from tests
* removed redundant tests
* fixed json formatting
* fixed bad table name (NB table was empty anyway)
* fixed bad table name (NB table was empty anyway)
* gtr ontology
* none included for testing
* added schema transformation
* picked up bug in test
* gtr ontology is self consistent
* added gtr mapping
* added gtr to config
* fixed merge conflicts
* fixed merge conflicts
* changed json field names
* instiutes are now analyzed and text
* sorted and cleaned json
* added gtr batchable
* empty test commit
* couple of tests
* tidied json
* added schema module to reqs, finished tests
* set up root task
* moved to es7 image
* removed standard token filter, as it is deprecated in es6.5 then removed in es7
* removed start/end dates since theyre empty
* misalignment between batchable keys and field names
* fixed mapping and removed outcomes due to mapping explosion
* removed seconds from fund date fields
* tidied json
* added none value edgecase to str truncation
* Update elasticsearchplus.py
Co-authored-by: Joel Klinger <[email protected]>

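One of the commits above removes the standard token filter because it was deprecated in ES 6.5 and removed in ES 7. A minimal sketch of that clean-up applied to index settings, assuming the usual layout settings["analysis"]["analyzer"][name]["filter"] (settings nested under an extra "index" or "settings" key would need one more lookup); drop_token_filter and the analyzer name in the usage are hypothetical.

```python
import copy


def drop_token_filter(settings, name="standard"):
    """Return a copy of index settings with token filter `name` removed
    from every analyzer's "filter" list.

    Only the token filter is dropped: the "standard" tokenizer is a separate
    component and is left untouched.
    """
    new = copy.deepcopy(settings)
    analyzers = new.get("analysis", {}).get("analyzer", {})
    for analyzer in analyzers.values():
        if isinstance(analyzer, dict) and "filter" in analyzer:
            analyzer["filter"] = [f for f in analyzer["filter"] if f != name]
    return new
```

Leaving the tokenizer alone matters here: an analyzer such as {"tokenizer": "standard", "filter": ["standard", "lowercase"]} remains valid in ES 7 once only the filter entry is removed.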
Related to #267, which won't be closed until #280 (the base of this branch) is addressed.
* entity_type to new dataset ontologies
* tier_1.json (renamed ontology.json)
* dev pipeline validation