Ability to classify tables/fields within a model #92

Open
murphyke opened this issue Sep 23, 2015 · 7 comments

Comments

@murphyke
Member

Sometimes I wish I could distinguish between the PEDSnet vocabulary tables and the core tables. Maybe I want a tags column for tables (and fields?), or maybe I want the vocabulary to be a separate data model that can be composed into the PEDSnet model .... Thoughts?

@bruth
Contributor

bruth commented Sep 24, 2015

In what context is it difficult to distinguish? Other than knowing which tables are the vocabulary tables (which not everyone does), where would this be useful?

@murphyke
Member Author

Yes, knowing which tables are the vocab tables. The use case at hand is wanting to automatically denormalize references from the main tables to concept.concept_name via concept.concept_id. This is a hack but involves finding all columns named *_concept_id that are not in vocabulary tables and creating new *_concept_name columns. There probably aren't enough use cases to justify this, but I thought I'd mention it.
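For readers following along, the renaming step described above can be sketched roughly as follows. This is a minimal illustration, not the code actually written for this issue: the function name `concept_name_columns`, the `(table, column)` input shape, and the sample rows are all assumptions made for the example.

```python
SUFFIX = '_concept_id'


def concept_name_columns(columns, vocab_tables):
    """Return (table, id_column, name_column) triples for *_concept_id
    columns that live outside the vocabulary tables."""
    result = []
    for table, column in columns:
        if table in vocab_tables:
            continue  # skip columns declared in vocabulary tables
        if column.endswith(SUFFIX):
            # Swap the suffix to get the denormalized column name.
            name_column = column[:-len(SUFFIX)] + '_concept_name'
            result.append((table, column, name_column))
    return result


# Illustrative input only, not the full PEDSnet model.
columns = [
    ('person', 'person_id'),
    ('person', 'gender_concept_id'),
    ('concept_ancestor', 'ancestor_concept_id'),
]
print(concept_name_columns(columns, vocab_tables={'concept', 'concept_ancestor'}))
```

Note that the caller still has to supply `vocab_tables` by hand, which is exactly the gap this issue is about.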

@bruth
Contributor

bruth commented Sep 24, 2015

> This is a hack but involves finding all columns named *_concept_id that are not in vocabulary tables and creating new *_concept_name columns.

In practice, reliable conventions appear to be just as good as constraints 😉 Sarcasm aside, the references file could be used to determine which foreign keys are associated with the vocab tables.

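import csv

# Names of the vocabulary tables; the rest of the list is elided here.
vocab_tables = {'concept', ...}
matches = []

with open('references.csv', newline='') as f:
    reader = csv.DictReader(f)

    for row in reader:
        # Keep *_concept_id foreign keys that point at a vocabulary table
        # but are declared outside the vocabulary tables themselves.
        # The 'table' column name is an assumption about references.csv.
        if (row['field'].endswith('_concept_id')
                and row['ref_table'] in vocab_tables
                and row['table'] not in vocab_tables):
            matches.append(row)

for m in matches:
    # create the corresponding `_concept_name` column
    pass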

@murphyke
Member Author

Thanks; I already wrote the code using the dmsa module; I was just slightly resenting the required magic list of vocab tables ....

@bruth
Contributor

bruth commented Sep 24, 2015

> magic list of vocab tables ....

That puts things into perspective. I guess this information isn't recorded anywhere. Tags or labels could be an interesting addition to the spec. As long as we state that they are optional and that their semantics are model-specific, it could work.
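To make the idea concrete, here is a rough sketch of how optional tags could replace the hand-maintained list. Everything in it is hypothetical: the `tags` key, the `vocabulary` tag name, and the shape of the table metadata are assumptions for illustration and are not part of the current spec.

```python
# Hypothetical table metadata; the "tags" key does not exist in the spec today.
tables = [
    {'name': 'concept', 'tags': {'vocabulary'}},
    {'name': 'person', 'tags': set()},
    {'name': 'visit_occurrence'},  # tags are optional, so the key may be absent
]


def tables_with_tag(tables, tag):
    """Return the names of all tables that carry the given tag."""
    return {t['name'] for t in tables if tag in t.get('tags', ())}


# The meaning of "vocabulary" is defined by the model, not by the spec.
vocab_tables = tables_with_tag(tables, 'vocabulary')
```

With something like this, the set of vocabulary tables would come from the model itself rather than from a list kept in client code.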

@gracebrownecodes
Contributor

I support the addition of tags or labels. I agree no attempt should be made to specify the semantics. In other words, it should be entirely up to the data model governance body whether or for what purpose to implement them. In this case, we are the data model governance body.

@gracebrownecodes
Contributor

The addition of the "tags" attribute can be included with the json-table-schema refactor work. I suggest a tag1=val1; tag2=val2 format.
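As a sketch of what reading that format might look like, assuming tags are stored as a single string per table or field: the function name `parse_tags` and the handling of a bare tag without `=` are assumptions, since the format has not been specified beyond the example above.

```python
def parse_tags(raw):
    """Parse a 'tag1=val1; tag2=val2' string into a dict."""
    tags = {}
    for part in raw.split(';'):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and trailing semicolons
        key, sep, value = part.partition('=')
        # Assumption: a bare tag with no '=' is kept with an empty value.
        tags[key.strip()] = value.strip() if sep else ''
    return tags
```

For example, `parse_tags('tag1=val1; tag2=val2')` would yield a dict with keys `tag1` and `tag2`. Values containing `;` or `=` would need an escaping rule, which this sketch does not attempt.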
