
DataSet Contents and DataSetManifest Structure

Stephanie Hong edited this page Mar 23, 2023 · 19 revisions

Submitted datasets come in four flavors: PCORnet, i2b2/ACT, TriNetX, and OMOP.

Each dataset should be a zip file whose name contains the data site name and the date of submission: DataSiteNameSubmitDate.zip (for example, jhu_052720.zip).
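As a minimal sketch of the naming rule, the helper below builds a file name that matches the jhu_052720.zip example. The function name `build_dataset_filename` is hypothetical, and the lowercase site name, underscore separator, and MMDDYY date layout are assumptions inferred from that single example rather than a stated N3C requirement.

```python
from datetime import date


def build_dataset_filename(site_abbrev: str, submit_date: date) -> str:
    """Return '<site>_<MMDDYY>.zip', mirroring the jhu_052720.zip example."""
    # Assumption: lowercase site name, underscore separator, MMDDYY date.
    site = site_abbrev.strip().lower()
    return f"{site}_{submit_date.strftime('%m%d%y')}.zip"


print(build_dataset_filename("JHU", date(2020, 5, 27)))  # jhu_052720.zip
```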

The zip file should contain: the data files (*.csv, pipe delimited), Manifest.csv, and dataCount.csv.
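A site could check an archive for the two required files before submitting. The sketch below uses Python's standard `zipfile` module; the helper name `missing_required_files` is hypothetical, and matching file names case-insensitively and ignoring any folder prefix inside the zip are assumptions, not documented N3C behavior.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Names taken from the list above; compared case-insensitively (an assumption).
REQUIRED_FILES = ("Manifest.csv", "dataCount.csv")


def missing_required_files(zip_source) -> list:
    """Return the required file names that are absent from the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        # Keep only the final path component so nested folders are tolerated.
        present = {PurePosixPath(n).name.lower() for n in archive.namelist()}
    return [name for name in REQUIRED_FILES if name.lower() not in present]


# Usage: an in-memory archive that lacks dataCount.csv.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("Manifest.csv", "placeholder")
    demo.writestr("person.csv", "placeholder")
buffer.seek(0)
print(missing_required_files(buffer))  # ['dataCount.csv']
```

`zip_source` can be a path on disk or any file-like object, since `zipfile.ZipFile` accepts both.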

Manifest.csv contains a single row with the following information. The dataset manifest table structure is:

  • SITE_ABBREV_NAME - Abbreviation of the site name
  • SITE_NAME - Name of the site
  • CONTACT_NAME - Full name of the N3C technical contact at your site
  • CONTACT_EMAIL - Email address of the N3C technical contact at your site
  • CDM_NAME - Choose one: OMOP | PCORnet | ACT | TriNetX
  • CDM_VERSION - Numbered version of your chosen CDM
  • N3C_PHENOTYPE_YN - Enter 'Y' if you are using the N3C phenotype code to define your cohort
  • N3C_PHENOTYPE_VERSION - If using the N3C phenotype, the numbered version used for this run
  • RUN_DATE - Date the current extract was run
  • UPDATE_DATE - Date for which the data in this extract is current (i.e., the maximum date present in your dataset)
  • NEXT_SUBMISSION_Date - Next planned data submission date
  • VOCABULARY_VERSION - Vocabulary version
  • DATASET_STATUS - Data ingestion status; leave it blank (used internally)
  • DATA_PARTNER_ID - Internal code; leave it blank (used by N3C to sequence-generate key fields)
  • Resolved - An issue about this structure is found here: #4
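The manifest structure above can be sketched as a one-row file written with Python's standard `csv` module. The column names are copied from the list; the helper name `render_manifest`, the pipe delimiter (borrowed from the data files), the presence of a header row, and all example values are assumptions for illustration only.

```python
import csv
import io

# Column names copied from the field list above, in the order they appear.
MANIFEST_FIELDS = [
    "SITE_ABBREV_NAME", "SITE_NAME", "CONTACT_NAME", "CONTACT_EMAIL",
    "CDM_NAME", "CDM_VERSION", "N3C_PHENOTYPE_YN", "N3C_PHENOTYPE_VERSION",
    "RUN_DATE", "UPDATE_DATE", "NEXT_SUBMISSION_Date", "VOCABULARY_VERSION",
    "DATASET_STATUS", "DATA_PARTNER_ID",
]


def render_manifest(values: dict) -> str:
    """Render a header line plus exactly one data row."""
    unknown = set(values) - set(MANIFEST_FIELDS)
    if unknown:
        raise ValueError(f"unknown manifest fields: {sorted(unknown)}")
    out = io.StringIO()
    # Assumption: the manifest is pipe delimited like the data files.
    writer = csv.DictWriter(out, fieldnames=MANIFEST_FIELDS, delimiter="|",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerow(values)  # fields not supplied are written as blanks
    return out.getvalue()


# Hypothetical values; DATASET_STATUS and DATA_PARTNER_ID are left blank.
print(render_manifest({
    "SITE_ABBREV_NAME": "jhu",
    "CDM_NAME": "OMOP",
    "N3C_PHENOTYPE_YN": "Y",
}))
```

Leaving `DATASET_STATUS` and `DATA_PARTNER_ID` out of the input keeps them empty, which matches the instruction to leave both blank.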
