📊 Create combined trust dataset #1409

Closed

@paarriagadap paarriagadap commented Aug 2, 2023

Create a trust questions dataset from the European Social Survey, rounds 9 and 10. The data is extracted with Stata code, which estimates the percentage of respondents selecting scores 6 to 10 on a scale from 0 (no trust) to 10 (complete trust). These percentages are estimated using survey weights and stratification.
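For readers unfamiliar with the estimate, here is a minimal Python sketch of a weighted share of scores 6 to 10. It is not the Stata code used in this PR: the function name `weighted_trust_share` and the sample responses are made up for illustration. It covers only the point estimate, which depends on the weights; stratification matters for the standard errors, which are not computed here.

```python
from typing import Iterable, Optional, Tuple


def weighted_trust_share(
    responses: Iterable[Tuple[Optional[int], float]],
    low: int = 6,
    high: int = 10,
) -> float:
    """Return the weighted percentage of valid responses scoring in [low, high].

    Each response is a (score, weight) pair; a score of None marks a missing
    answer and is excluded from both numerator and denominator.
    """
    total_weight = 0.0
    trusting_weight = 0.0
    for score, weight in responses:
        if score is None:
            continue  # skip missing answers entirely
        total_weight += weight
        if low <= score <= high:
            trusting_weight += weight
    if total_weight == 0:
        raise ValueError("no valid responses to estimate from")
    return 100.0 * trusting_weight / total_weight


# Hypothetical (score, survey weight) pairs, invented for this example.
sample = [(8, 1.5), (3, 0.5), (6, 1.0), (10, 2.0), (None, 1.0), (5, 1.0)]
share = weighted_trust_share(sample)
print(f"{share:.1f}% of weighted respondents report trust (scores 6-10)")
```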

Part of this issue: https://github.com/owid/owid-issues/issues/1127

@paarriagadap paarriagadap marked this pull request as ready for review August 2, 2023 11:09
@paarriagadap paarriagadap requested review from lucasrodes and spoonerf and removed request for lucasrodes and spoonerf August 2, 2023 11:13
@paarriagadap paarriagadap marked this pull request as draft August 4, 2023 10:24
# Process data.

# Concatenate the two tables.
tb = pr.concat([tb_latino, tb_ess[["country", "year", "trust"]]], ignore_index=True, short_name="trust_surveys")
Contributor Author

Quick question, @pabloarosado (no need to review the entire PR): I am using this pr.concat function to keep the metadata from both sources. It seems to keep part of it, but when I run the steps the ETL doesn't find any sources, so for now I'm adding them in trust_surveys.meta.yml, which is not ideal. Is there anything I should change?

Contributor

Currently, pr.concat should ensure that every column keeps its original licenses and sources (or origins). If a column is shared by several of the concatenated tables, that column should end up with multiple sources. If that is not happening, there's a bug somewhere (please let me know).
A separate topic is how to propagate additional metadata (e.g. the display settings of the variables). Mojmir created a PR so that pr.concat automatically propagates all variable metadata, but it has one issue: if a column is shared by multiple tables, only the metadata of the last table prevails. In my opinion, we have to decide how to combine metadata in such cases. So, if this is your situation (you want to propagate all metadata fields, beyond licenses/sources/origins), then unfortunately you'll have to do it manually for now.
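To make the two behaviours concrete, here is a small plain-Python sketch. It does not use or reproduce the actual pr.concat implementation: `combine_sources`, `last_table_wins`, and the dictionaries below are hypothetical stand-ins for per-column metadata. The first function unions the sources of shared columns; the second shows how "last table wins" drops what the earlier table had.

```python
from typing import Dict, List


def combine_sources(tables: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Union the source lists of each column across tables, keeping order."""
    combined: Dict[str, List[str]] = {}
    for column_sources in tables:
        for column, sources in column_sources.items():
            merged = combined.setdefault(column, [])
            for source in sources:
                if source not in merged:  # avoid duplicate sources
                    merged.append(source)
    return combined


def last_table_wins(tables: List[Dict[str, dict]]) -> Dict[str, dict]:
    """Keep, for each column, only the metadata of the last table that has it."""
    combined: Dict[str, dict] = {}
    for column_meta in tables:
        combined.update(column_meta)  # later tables overwrite earlier ones
    return combined


# Hypothetical per-column sources for the two tables being concatenated.
latino_sources = {"country": ["latino"], "year": ["latino"], "trust": ["latino"]}
ess_sources = {"country": ["ess"], "year": ["ess"], "trust": ["ess"]}
sources = combine_sources([latino_sources, ess_sources])

# Hypothetical extra metadata (e.g. descriptions) attached to the same column.
latino_extra = {"trust": {"description": "described in the latino table"}}
ess_extra = {"trust": {"description": "described in the ESS table"}}
extra = last_table_wins([latino_extra, ess_extra])

print(sources["trust"])  # both sources are kept for the shared column
print(extra["trust"])    # only the ESS description survives
```

The open design question in the comment above is what a better rule than `last_table_wins` would look like for fields that, unlike sources, cannot simply be unioned.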

Contributor Author

Thanks. When I leave only the dataset's title and description in the YAML file (commenting out the rest), the sources are not propagated:
[image]

For the trust variable, only a small subset of the metadata survives: I lose the description and the display metadata, but I keep the title and the units (short and long).
[image]

So for now I need to fill in the metadata again to get:
[image]
[image]

Contributor

You are right that the dataset sources (the ones you see with table.metadata.dataset) are currently not propagated properly (I have a fix for that in an open PR). In the end, though, what matters most is that each individual variable has the correct sources and licenses.
I'd need to look into the operations applied to the trust variable to see where its metadata is lost. The display metadata is definitely not handled yet, so yes, losing it is expected.
In a nutshell: yes, feel free to redefine the metadata manually. We'll need to look into these cases to see how we can improve metadata propagation in the future.

@paarriagadap paarriagadap changed the title Create European Social Survey dataset - Trust questions Create trust dataset: ESS + several barometers Aug 4, 2023
@paarriagadap paarriagadap changed the title Create trust dataset: ESS + several barometers Create combined trust dataset Aug 4, 2023
@lucasrodes lucasrodes changed the title Create combined trust dataset 📊 Create combined trust dataset Aug 10, 2023

stale bot commented Oct 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Oct 9, 2023
@paarriagadap
Contributor Author

Don't close it

@stale stale bot removed the wontfix This will not be worked on label Oct 9, 2023

stale bot commented Jan 27, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jan 27, 2024
@paarriagadap paarriagadap deleted the create-european-social-survey-dataset-trust branch January 28, 2024 01:08