Integrative experiment registration #87

Open
markwhiting opened this issue Oct 6, 2023 · 3 comments · Fixed by #121

markwhiting commented Oct 6, 2023

A preregistration based on the AsPredicted template.

Registration

Data collection. Have any data been collected for this study already?

Yes, we already collected the data.
No, no data have been collected for this study yet.
It's complicated. We have already collected some data but explain in Question 8 why readers may consider this a valid pre-registration nevertheless.
(Note: 'Yes' is not an accepted answer.)

Hypothesis. What's the main question being asked or hypothesis being tested in this study?

What types of claims are the most commonsensical, given a taxonomy of claims?
Our hypotheses reflect those in our existing analysis: https://osf.io/9kxt2/

Dependent variable. Describe the key dependent variable(s) specifying how they will be measured.

Metrics defined in https://osf.io/9kxt2/:

  1. Individual and Statement commonsensicality
  2. PQ common sense

With the addition of:
3. The definite integral of PQ common sense
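
One way metric 3 could be computed numerically is the trapezoidal rule. This is only a sketch: it assumes PQ common sense is available as a curve evaluated on a grid of points, and the grid and values below are made up for illustration. The actual definition of PQ common sense is the one in https://osf.io/9kxt2/.

```python
def trapezoid_area(xs, ys):
    """Approximate the definite integral of a curve sampled at points (xs, ys).

    xs must be sorted in increasing order; ys are the curve values at xs.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching (x, y) points")
    area = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        # Area of one trapezoid: width times the mean of the two heights.
        area += width * (ys[i] + ys[i + 1]) / 2
    return area


# Hypothetical curve values on an evenly spaced grid over [0, 1].
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = [1.0, 0.9, 0.6, 0.2, 0.0]
area_under_curve = trapezoid_area(grid, curve)
```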

Conditions. How many and which conditions will participants be assigned to?

Conditions are design points in the space of possible statement types (not all of which will be sampled):

  • 6 binary knowledge properties: Social vs Physical, Everyday vs Abstract, Figure of speech vs Literal language, Normative vs Positive, Opinion vs Factual, and Knowledge vs Reasoning.
  • A knowledge domain question with 13 values: General reference, Culture and the arts, Geography and places, Health and fitness, History and events, Human activities, Mathematics and logic, Natural and physical sciences, People and self, Philosophy and thinking, Religion and belief systems, Society and social sciences, and Technology and applied sciences.
  • 7 sources of claims: Category prompt, Situation prompt, ConceptNet, Atomic, News media, Campaign emails, Aphorisms.

2^6 * 13 * 7 = 5,824 total design points.
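
As a check on the arithmetic, the design space can be enumerated directly. This is a minimal sketch; the variable names are illustrative and are not taken from the project's pipeline.

```python
from itertools import product

# Six binary knowledge properties, each taking one of two values.
BINARY_PROPERTIES = [
    ("Social", "Physical"),
    ("Everyday", "Abstract"),
    ("Figure of speech", "Literal language"),
    ("Normative", "Positive"),
    ("Opinion", "Factual"),
    ("Knowledge", "Reasoning"),
]

KNOWLEDGE_DOMAINS = [
    "General reference", "Culture and the arts", "Geography and places",
    "Health and fitness", "History and events", "Human activities",
    "Mathematics and logic", "Natural and physical sciences",
    "People and self", "Philosophy and thinking",
    "Religion and belief systems", "Society and social sciences",
    "Technology and applied sciences",
]

SOURCES = [
    "Category prompt", "Situation prompt", "ConceptNet", "Atomic",
    "News media", "Campaign emails", "Aphorisms",
]

# Each design point is one value per binary property, one domain, one source.
design_points = list(product(*BINARY_PROPERTIES, KNOWLEDGE_DOMAINS, SOURCES))
print(len(design_points))  # 5824
```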

The stimulus for each design point will be a single set of 15 statements randomly sampled from an updated version of the corpus in https://osf.io/9kxt2/. If a design point doesn't contain enough statements, new statements will be generated with a language model. A date-stamped version of the corpus, design point samples, and acquisition pipeline are available at https://github.com/Watts-Lab/commonsense-statements.
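
The sampling step might look roughly like the following. This is a sketch under assumptions, not the pipeline in the repository: the function name, the seed, and the placeholder statement pool are hypothetical, and the language-model generation step is represented only by a shortfall count.

```python
import random

STATEMENTS_PER_DESIGN_POINT = 15  # set size stated in the registration


def sample_stimulus(statements, rng, k=STATEMENTS_PER_DESIGN_POINT):
    """Return (sample, shortfall) for one design point.

    `statements` is the list of corpus statements matching the design point.
    `shortfall` is how many statements would still have to be generated
    (for example with a language model) to reach k.
    """
    if len(statements) >= k:
        # Sample without replacement so no statement repeats in the set.
        return rng.sample(statements, k), 0
    return list(statements), k - len(statements)


rng = random.Random(87)  # hypothetical fixed seed, for reproducibility
pool = [f"statement {i}" for i in range(40)]  # placeholder corpus slice
sample, shortfall = sample_stimulus(pool, rng)
```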

Analyses. Specify exactly which analyses you will conduct to examine the main question/hypothesis.

  1. We will compare core metrics of commonsensicality across grouping variables with the results in https://osf.io/9kxt2/. We will report 95% CIs to evaluate similarity between results.
  2. We will train a model to predict the area under the PQ common sense curve for each sampled design point and analyze the model's out-of-sample predictive accuracy using $Q^2$ for future design points. We will do this progressively as we sample more design points.
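
For reference, a sketch of how $Q^2$ could be computed on held-out design points. It assumes the common definition $Q^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$, with the total sum of squares taken around the mean of the held-out observations. The registration does not say which variant will be used, so treat this as one plausible reading.

```python
def q_squared(y_true, y_pred):
    """Out-of-sample Q^2 = 1 - PRESS / TSS on held-out observations.

    y_true: observed values for design points not used in training.
    y_pred: the model's predictions for those same design points.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mean_y = sum(y_true) / len(y_true)
    # Prediction error sum of squares on the held-out points.
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares around the held-out mean (an assumption, see above).
    tss = sum((y - mean_y) ** 2 for y in y_true)
    if tss == 0:
        raise ValueError("Q^2 is undefined when all observed values are equal")
    return 1 - press / tss
```

Under this definition a perfect prediction gives 1, predicting the held-out mean everywhere gives 0, and worse predictions give negative values.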

Outliers and Exclusions. Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.

We will exclude data from participants who provide incomplete responses or fail attention checks in the survey tool.
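
A sketch of how this exclusion rule could be applied to exported survey records. The record layout and field names (`responses`, `passed_attention_checks`) are hypothetical; the real survey tool's export format may differ.

```python
def keep_participant(record, required_items):
    """Return True if the participant answered every item and passed checks."""
    answered_all = all(
        record["responses"].get(item) is not None for item in required_items
    )
    return answered_all and record["passed_attention_checks"]


required_items = ["s1", "s2"]  # placeholder item identifiers
records = [
    {"id": "a", "responses": {"s1": 1, "s2": 0}, "passed_attention_checks": True},
    {"id": "b", "responses": {"s1": 1}, "passed_attention_checks": True},
    {"id": "c", "responses": {"s1": 0, "s2": 1}, "passed_attention_checks": False},
]
kept_ids = [r["id"] for r in records if keep_participant(r, required_items)]
```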

Sample Size. How many observations will be collected or what will determine sample size?

No need to justify decision, but be precise about exactly how the number will be determined.

We aim to sample at least 100 participants per design point. We intend to stop sampling when our $Q^2$ stabilizes for new design points, that is, when adding more training data doesn't improve accuracy in an out-of-sample prediction of a new design point.
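
One way to make the stopping rule concrete is sketched below. The `window` and `tolerance` parameters are hypothetical and are not part of the registration; the idea is only that sampling stops once recent batches no longer improve on the best earlier $Q^2$ by a meaningful margin.

```python
def q2_has_stabilized(q2_history, window=3, tolerance=0.01):
    """Return True when the last `window` values add less than `tolerance`.

    q2_history: Q^2 values in the order they were obtained, one per batch
    of newly sampled design points.
    """
    if len(q2_history) < window + 1:
        return False  # not enough batches yet to judge stability
    best_before = max(q2_history[:-window])
    best_recent = max(q2_history[-window:])
    return best_recent - best_before < tolerance


still_improving = q2_has_stabilized([0.1, 0.3, 0.5, 0.6])
stabilized = q2_has_stabilized([0.1, 0.5, 0.60, 0.605, 0.603, 0.604])
```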

Other. Anything else you would like to pre-register?

(e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

We intend to make registrations of predictions for each design point before sampling it.

Name. Give a title for this AsPredicted pre-registration

Suggestion: use the name of the project, followed by study description.

World scale evaluation of common sense

Type of study.

Class project or assignment
Experiment
Survey
Observational/archival study
Other:

Data source

Prolific
MTurk
University lab
Field experiment / RCT
Other:

markwhiting commented Nov 2, 2023

7 sources of claims: Category prompt, Situation prompt, ConceptNet, Atomic, News media, Campaign emails, Aphorisms.

Update this to talk about direct elicitation, in-the-wild use, and corpus (which we will probably deemphasize in the future), and add GPT as an additional construct here.

@markwhiting

We aim to sample at least 100 participants per design point. We intend to stop sampling when our $Q^2$ stabilizes for new design points, that is, when adding more training data doesn't improve accuracy in an out of sample prediction of a new design point.

We should state a goal but also add discussion of future batches, i.e., that we might find a better sample size and adjust accordingly.

@markwhiting

Shift to do all points at start.

Collect more points on the back end.

amirrr linked a pull request Mar 26, 2024 that will close this issue
markwhiting added a commit that referenced this issue Apr 11, 2024
markwhiting reopened this Oct 17, 2024