This repository has been archived by the owner on Aug 13, 2021. It is now read-only.
[266] Refactor and simplify ES configuration #275
Merged
jaklinger added labels on Jun 1, 2020:
dataset: NIH (National Institutes of Health), dataset: Crunchbase (crunchbase.com), dataset: Meetup (meetup.com), dataset:CORDIS, proj: EURITO, tech: ES, dataset: gtr (GtR), proj: HealthMosaic, proj: arxlive, dataset: arxiv (arxiv), dataset: patstat
jaklinger added this to the "Create integrated pipeline for data collections to ES" milestone on Jun 1, 2020
This was referenced Jun 3, 2020
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* using soft alias until a future PR to minimise changes
* cleaned and sorted json
* [267] Tidy & slim schema transformations (#281)
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* harmonised name fieldsofstudy across arxiv
* added novelty back in
* sorted json
* sorted json
* sorted json

Co-authored-by: Joel Klinger <[email protected]>
Co-authored-by: Joel Klinger <[email protected]>
jaklinger added a commit that referenced this pull request on Sep 28, 2020:

* make sure conf dir is empty
* simplified es config
* added orm es config reader
* modified setup_es to pick up new es config
* swapped es_mode for boolean
* aliases now consistent with config
* aliases now automatically located
* added endpoint field to estasks
* added endpoint field to sql2estasks
* [267] Pool ES mappings across datasets (#280)
* changed branch name
* mappings build
* updated docs
* updated docs
* updated docs
* added docstrings
* added dynamic strict to settings
* removed index.json in favour of a single defaults file
* using soft alias until a future PR to minimise changes
* cleaned and sorted json
* [267] Tidy & slim schema transformations (#281)
* pruned deprecated schema transformations
* updated fos fieldname on arxlive
* unified data set schema transformations
* restructured directory
* refactored references to schema_transformation
* refactored references to schema_transformation
* slimmed down transformations, and included entity_type
* pruned ontology
* tidied schemas
* consistency tests
* reverted unrelated json file
* harmonised name fieldsofstudy across arxiv
* added novelty back in
* sorted json
* sorted json
* sorted json

Co-authored-by: Joel Klinger <[email protected]>
Co-authored-by: Joel Klinger <[email protected]>

* patched out es config setup from tests
* removed redundant tests
* fixed json formatting
* none included for testing
* picked up bug in test

Co-authored-by: Joel Klinger <[email protected]>
Closes #266
This will likely require spawning sub-PRs to manage the complexity of reviewing the rewiring of:
a) the base config [this PR]
b) the ES mappings [PR #280]
c) schema transformation tidying and mapping tests [PR #281]
d) the pipeline codebase [need to spawn this PR by rebasing from this branch]
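
For reviewers unfamiliar with the idea behind (a) and (b), here is a minimal sketch of how a single defaults file might be combined with one dataset's field mappings, with `dynamic` set to `strict`. It is not the code in this PR: `build_index_config`, the `defaults` dict, and the example field names are hypothetical. The layout assumes the typeless Elasticsearch 7 mapping format. `"dynamic": "strict"` is a standard Elasticsearch mapping parameter that rejects documents containing unmapped fields.

```python
import copy
import json


def build_index_config(defaults, dataset_properties):
    """Merge shared defaults with one dataset's field mappings.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # Work on a copy so the shared defaults are never mutated.
    config = copy.deepcopy(defaults)
    mappings = config.setdefault("mappings", {})
    # Reject documents that contain fields missing from the mapping.
    mappings["dynamic"] = "strict"
    # Add the dataset-specific fields on top of any shared ones.
    properties = mappings.setdefault("properties", {})
    properties.update(copy.deepcopy(dataset_properties))
    return config


# Assumed contents of a shared defaults file (illustrative values).
defaults = {
    "settings": {"number_of_shards": 1},
    "mappings": {"properties": {}},
}

# Assumed field mappings for one dataset (illustrative names).
arxiv_properties = {
    "title": {"type": "text"},
    "fields_of_study": {"type": "keyword"},
}

config = build_index_config(defaults, arxiv_properties)
print(json.dumps(config, indent=2, sort_keys=True))
```

Copying the defaults before merging keeps one dataset's fields from leaking into the config built for the next dataset.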