📊 Create combined trust dataset #1409
    # Process data.

    # Concatenate the two tables.
    tb = pr.concat([tb_latino, tb_ess[["country", "year", "trust"]]], ignore_index=True, short_name="trust_surveys")
Quick question, @pabloarosado (not to review the entire PR): I am using this `pr.concat` function to keep metadata from both sources. Though it seems that it keeps part of it, when I run the steps the ETL doesn't find sources, so for now I'm adding them in `trust_surveys.meta.yml`, which is not ideal. Is there anything I should change?
Currently, `pr.concat` should ensure all columns keep their original licenses and sources (or origins). If a column is shared by several of the concatenated tables, then that column should have multiple sources. If that is not happening, then there's a bug somewhere (please let me know).
Another topic is how to propagate additional metadata (e.g. the `display` of the variables). Mojmir created a PR so that `pr.concat` automatically propagates all metadata of variables, but it has the issue that, if a column is shared by multiple tables, only the metadata of the last table prevails. In my opinion, we have to decide how to combine metadata in such cases. So, if this is your situation (you want to propagate all metadata fields, beyond licenses/sources/origins), then unfortunately you'll have to do it manually, for now.
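To make the two behaviours concrete, here is a minimal sketch in plain Python. It is not the actual `pr.concat` implementation: `ColumnMeta`, `combine_column_meta` and the "Source A"/"Source B" values are hypothetical names used only for illustration. Sources and licenses are merged as a union, while `display` follows the "last table wins" rule described above.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    # Hypothetical stand-in for a variable's metadata (not the ETL's own class).
    sources: list = field(default_factory=list)
    licenses: list = field(default_factory=list)
    display: dict = field(default_factory=dict)


def _union(items_a, items_b):
    # Order-preserving union without duplicates.
    merged = list(items_a)
    for item in items_b:
        if item not in merged:
            merged.append(item)
    return merged


def combine_column_meta(metas):
    """Combine the metadata of one column shared by several tables."""
    combined = ColumnMeta()
    for meta in metas:
        # Sources and licenses accumulate across all tables.
        combined.sources = _union(combined.sources, meta.sources)
        combined.licenses = _union(combined.licenses, meta.licenses)
        # Other fields: the last non-empty value wins (the open design question).
        if meta.display:
            combined.display = dict(meta.display)
    return combined


meta_a = ColumnMeta(sources=["Source A"], licenses=["License A"], display={"name": "Trust (A)"})
meta_b = ColumnMeta(sources=["Source B"], licenses=["License B"], display={"name": "Trust (B)"})
trust_meta = combine_column_meta([meta_a, meta_b])
print(trust_meta.sources)  # both sources are kept
print(trust_meta.display)  # only the last table's display survives
```

The union rule is unambiguous for list-like fields; for single-valued fields such as `display`, any rule other than "last wins" needs an explicit decision, which is why manual redefinition is the fallback for now.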
You are right that the dataset sources (the ones you see when doing `table.metadata.dataset`) are currently not properly propagated (I have a fix for that in an open PR), but, in the end, what matters the most is that each of the individual variables has the correct sources and licenses.
I'd need to look into the operations done to the trust variable, to see where the metadata is lost. But the `display` metadata is definitely not taken care of (yet), so yes, that's expected to be lost.
In a nutshell, yes, feel free to redefine the metadata manually. We'll need to look into these cases to see how we can improve metadata propagation in the future.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Don't close it
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Create a trust questions dataset from the European Social Survey, rounds 9 and 10. The data is produced by Stata code, which estimates the percentage of respondents selecting scores 6 to 10 on a scale from 0 (no trust) to 10 (complete trust). These percentages are estimated using survey weights and stratification.
Part of this issue: https://github.com/owid/owid-issues/issues/1127
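
For reference, the estimate described above (the weighted share of respondents scoring 6 to 10) can be sketched in plain Python. This is only an illustration of the point estimate, not the Stata code used for the dataset: the function name, the sample scores and the weights are made up, and stratification (which the Stata code accounts for) is ignored here.

```python
def weighted_trust_share(scores, weights, threshold=6):
    """Weighted percentage of respondents whose score is >= threshold.

    scores: trust answers on the 0 (no trust) to 10 (complete trust) scale.
    weights: one survey weight per respondent (hypothetical values below).
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    # Sum the weights of respondents at or above the threshold.
    trusting = sum(w for s, w in zip(scores, weights) if s >= threshold)
    return 100 * trusting / total


# Made-up example: four respondents with unequal weights.
scores = [2, 6, 10, 5]
weights = [1.0, 2.0, 1.0, 4.0]
share = weighted_trust_share(scores, weights)
print(share)
```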