
Add either a function or a data.frame in the package to get the list of respondent metadata columns #272

Open
chrisumphlett opened this issue Jul 20, 2022 · 12 comments


@chrisumphlett
Contributor

I would like to be able to have the package provide a global list of the 17 standard metadata columns (plus browser meta info which comes through like response data but should be treated like metadata IMHO), rather than me having them hardcoded. Perhaps others have them hardcoded too; if/when Qualtrics adds another, we'll all need to update our list. If the package holds it then we could automagically have it update whatever we're doing in our code.
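For illustration, a bundled object might look something like the sketch below. The names are taken from a standard Qualtrics CSV export and are assumptions on my part, not a definitive list from the package; verify against your own exports:

```r
# Hypothetical sketch: the 17 standard respondent-metadata columns as they
# appear in a default Qualtrics CSV export. Names are assumptions -- check
# them against your own export before relying on this.
qualtrics_metadata_cols <- c(
  "StartDate", "EndDate", "Status", "IPAddress", "Progress",
  "Duration (in seconds)", "Finished", "RecordedDate", "ResponseId",
  "RecipientLastName", "RecipientFirstName", "RecipientEmail",
  "ExternalReference", "LocationLatitude", "LocationLongitude",
  "DistributionChannel", "UserLanguage"
)
```

If the package exported something like this, a user's code could reference it instead of a private hardcoded copy, and a package update would propagate any new columns automatically.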

Why, you might ask? Here's a 3:42 video with context on why I have a hardcoded list today, and why you might want to have one as well.

re: the missing metadata I refer to in the video, I was wrong. Qualtrics labels it as "Response Type" but the column is "Status" and that is included.

I can work on this but am curious to hear how those of you who have managed the package long-term feel about including this at all, and if so, whether you'd want it as a function, data.frame, or something else.

Relevant links from Qualtrics' docs:

@juliasilge
Collaborator

I could see that being useful! I lean toward adding it as a dataframe? Could there be any automated way for us to regularly check (like with CI) that the info is not outdated?

@chrisumphlett
Contributor Author

Maybe from this?
https://api.qualtrics.com/02a178db2ab5b-get-schema-response
[screenshot: response schema excerpt from the API docs]

And perhaps it could instead be a way to get all of the "dataTypes" for a survey rather than just the list of metadata. It even includes the display-order fields. But the browser meta info isn't included.

@chrisumphlett
Contributor Author

chrisumphlett commented Jul 29, 2022

Also response quality columns: https://www.qualtrics.com/support/survey-platform/survey-module/survey-checker/response-quality/

eg, qualtrics_data_quality_cols <- c("q_datapolicyviolations", "q_recaptchascore", "q_relevantidduplicatescore","q_ballotboxstuffing", "q_ambiguoustextpresent", "q_unansweredPercentage", "q_unansweredquestions", "q_straightliningCount", "q_straightlining_Percentage", "q_straightliningquestions")
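With a vector like that available, splitting the quality columns out of a download becomes a one-liner. A sketch, assuming `survey_df` is a data.frame returned by fetch_survey() and using the vector above (`intersect()`/`setdiff()` keep this safe when some columns are absent):

```r
# Pull the response-quality columns into their own data.frame...
quality_df <- survey_df[, intersect(qualtrics_data_quality_cols, names(survey_df)), drop = FALSE]
# ...and drop them from the main response data.
response_df <- survey_df[, setdiff(names(survey_df), qualtrics_data_quality_cols), drop = FALSE]
```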

@jmobrien
Collaborator

Hi @chrisumphlett, just saw your video. You got one thing wrong--you said it was "okay" that requests to exclude the metadata and questions didn't actually exclude them. It wasn't okay--it was my mistake.

Sorry about that! But I think you had the right idea about an approach, and I think it should work now.

(FYI @juliasilge requests for "exclude all types of this variable" specifically were being stripped out of the request body due to a subtlety I didn't catch, and didn't properly write a test for. Fixed now, plus added a suitable test.)

So, @chrisumphlett, if you want just the metadata & embedded data for one table, you should be able to do it this way:

fetch_survey(
  "[surveyid]",
  include_questions = NA
)

Similarly, for just the questions (keeping response ID to pivot/merge on):

fetch_survey(
  "[surveyid]",
  include_metadata = "ResponseId",
  include_embedded = NA
)

Does that look right for what you need?

I still tend to agree with your thought that having something that more automatically stays up-to-date with the API schema would be better, but hopefully this can serve well enough for now.

@chrisumphlett
Contributor Author

Thanks Jim. This is definitely an improvement, though not perfect (through no fault of yours). Qualtrics' data model is not well aligned with the categories available. When I run the call to exclude questions, I get some things I don't want and don't get some things I do want.

Things that are included

  • Calculated columns, such as: the EN translation of an open-ended question; Topics that users have created; and sentiment polarity, score, and label for a response

Things not included

  • The "browser" fields... information on OS, web browser, resolution. We learned that Qualtrics changed the default of how this is added to the survey in the last few years, which had led to some inconsistency in how I was processing them.

This stuff can be programmed around. I'm doing this with the browser fields now, looking for questions that end in _browser. The same could be done to find *Topics and then re-assign those as response data.
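A minimal sketch of that workaround, assuming `survey_df` is the fetched data. The suffix conventions here (`_Browser`, `_Version`, `_Operating System`, `_Resolution`) are taken from how these fields appear in my exports, so verify the casing and wording against your own data:

```r
# Find columns whose names end in "_browser" (case-insensitive) so they can
# be treated as metadata rather than question responses.
browser_cols <- grep("_browser$", names(survey_df), ignore.case = TRUE, value = TRUE)

# The same idea extended to the related browser-meta fields by their
# common suffixes (suffix list is an assumption -- check your export).
browser_meta_cols <- grep("_(browser|version|operating system|resolution)$",
                          names(survey_df), ignore.case = TRUE, value = TRUE)
```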

It's a strange, arbitrary division that Qualtrics makes. NPS_GROUP is another calculated column, and that one comes through with question data. Same for all of the *DO* display order fields. Why should those and topics/sentiment/translations be treated differently?

@chrisumphlett
Contributor Author

Here's a 2.5m video that shows what I ultimately settled on with several hardcoded lists and the metadata() function.

@juliasilge
Collaborator

Let's reopen this and still consider if we should add something like a dataframe of columns.

@juliasilge juliasilge reopened this Aug 19, 2022
@jmobrien
Collaborator

jmobrien commented Aug 19, 2022

@chrisumphlett, I didn't notice the music in the last video until I realized I was grooving a bit by the end. Thanks for that.

You're describing things that closely overlap with what I dealt with in my own big project, so I can sympathize. The trouble with the browser meta items and the timing items, both, is that they are types of questions in the Qualtrics sense--an item that goes in a block on the main page (using the web interface). So I don't think we can do anything about that via tweaks to the API requests. Same for display ordering items, which despite being metadata are closely linked with their questions.

I actually worry about metadata() since it's an old function that looks at the V2 API, and it isn't always accurate. (IIRC one example is that if there's an embedded data field specified in a way that it never receives data, it will be listed in metadata() but won't be present in the response download, at least by default.) metadata() is likely to be deprecated eventually here, and Qualtrics will probably kill that endpoint eventually even if we don't. Unfortunately its replacement fetch_description() doesn't contain the same embedded data element.

We should think about this more, but like you I ended up setting up a programmatic solution. I will say that was why I added extract_colmap(), which ended up being the "dataframe of columns" that gave me an approach that worked consistently across all columns.

Below is one from my API testing survey. The key tool for me was the "ImportId" column, which is non-user-editable and thus far more standardized. I added in browser meta, timing, topics, quality checks & scoring to see:

[screenshot: extract_colmap() output for the testing survey]

  • All "questions" start with "QID".
  • Timing and browser meta also have standard suffixes, but with the "QID" prefix, so they're unambiguous.
  • Question-level randomization gets "_DO" or "_ADO" appended (choice vs. answer randomization).
  • Flow- and block-level randomization are FL_.*_DO and BL_.*_DO, respectively.
  • Metadata names have their own unique set, different from the qnames (which are the customizable names for questions/embeds).
  • Scoring starts with SC_.
  • Text Topics start with the related item's QID, then extra stuff.
  • Everything else (embedded data, quality) just has the same ImportId as its qname.
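Those patterns could be turned into a labeling step over the column map. A rough sketch, assuming a vector of "ImportId" values like the one extract_colmap() returns; the regexes are my guesses from the patterns observed above, not an official Qualtrics spec (note it would label Topics as questions, since they share the QID prefix):

```r
# Rough classifier over ImportId patterns. Regexes are assumptions based on
# the observed conventions, not documented guarantees. Vectorized via ifelse().
classify_import_id <- function(import_id) {
  ifelse(grepl("^QID.*_(DO|ADO)$", import_id), "display_order",
  ifelse(grepl("^(FL|BL)_.*_DO$", import_id), "display_order",
  ifelse(grepl("^SC_", import_id), "score",
  ifelse(grepl("^QID", import_id), "question",
         "metadata_or_embedded"))))
}

classify_import_id(c("QID5", "QID5_DO", "FL_3_DO", "SC_abc", "StartDate"))
```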

Here's how they break out via API params:

Metadata (same as always; looks pretty fixed):
[screenshot]

Questions (all QID, FL_, BL_; almost, see next):
[screenshot]

Embedded (embedded data, quality, scores, & topics; Topics breaks the pattern for QIDs):
[screenshot]

@jmobrien
Collaborator

So, anyway, one option now that I've written this out: we could apply this logic to the code that creates the column mapping and add another column that labels variables by type based on "ImportId", which would make filtering like this a lot more straightforward. It wouldn't be too hard; it's basically something I did before for myself.

I could see edge cases we'd need to address (e.g., I've encountered embedded data fields labeled as "status" so you'd be facing duplicates). But I think it's something we could work around (we already know what metadata columns the user requested for any given download).

Still might be good if there's a way to ensure this stays up-to-date with whatever Qualtrics does for the API, but that seems like a separate question.

@jmobrien
Collaborator

Caveat: we don't know what the user requested if they just call read_survey(). I'm unclear how often that actually happens, though, even though read_survey() is exposed.

@chrisumphlett
Contributor Author

re: the music. Using my company's video editor, Camtasia, you can add an "emphasize audio" effect to your screen recording and it automatically suppresses other audio to a good background level. Glad you enjoyed :)

I asked our customer researcher to not use status as an embedded data field after that caused a headache. It's another weird Qualtrics thing where the description ("Response Type") doesn't match the question name.

I agree w/you that Qualtrics treats the browser fields as questions because they are blocks in the survey flow. My philosophical objection, I guess, is that they shouldn't be added to the survey flow in the first place. Perhaps there's some reason in their view that it needs to be done that way.

I don't think I have a good perspective on balancing usability for the broad user base of the package and these types of power user functionality. For me, the ability to get the embedded data dynamically has made a significant improvement in my process in combination with the static lists for respondent metadata, browser, and timing.

@jmobrien
Collaborator

Thanks. Generally agree; separating out variable types was a really important part of my processes as well.

We might eventually want to think about something better for handling duplicated names, including things like status.
