Skip to content

Commit

Permalink
Merge pull request #60 from gf-dcc/patch/upload-help
Browse files Browse the repository at this point in the history
Revise data sharing help
  • Loading branch information
anngvu authored Sep 27, 2023
2 parents fc4960b + a27c6da commit 8a4368f
Show file tree
Hide file tree
Showing 2 changed files with 90 additions and 21 deletions.
4 changes: 2 additions & 2 deletions pages/onboarding/_meta.json
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,6 @@
"account-setup": "Account Setup",
"dsp-template": "Data Sharing Plan",
"data-processing-organization": "Data Processing & Organization",
"data-sharing": "Data Sharing Implementation",
"data-analysis": "Data Analysis Procedures"
"data-sharing": "Data Sharing",
"data-analysis": "Data Visualization & Analysis"
}
107 changes: 88 additions & 19 deletions pages/onboarding/data-sharing.mdx
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@

## Introduction

The figure below provides an overview of contributing data.

The figure below provides an overview of contributing data (data upload and data tagging).
For more details, refer to each of the steps below.

<iframe allowfullscreen frameborder="0"
src="https://lucid.app/documents/embedded/eb1cfc68-9729-4114-a149-9568b8f92ad5"
Expand All @@ -12,30 +12,99 @@ The figure below provides an overview of contributing data.
>
</iframe>

## Uploading data
## 1. Uploading data

First, make sure you know where your data should go. Most data have a pre-designated folder, and data are organized by assay and then the level of data.
You can often create subfolders if you prefer further organization by batches.

### Using the UI

The UI is only recommended for smaller numbers of files and files under ~100Mb in size.
In the designated folder, go to to the **Folder Tools** menu to see upload options.
General documentation for the UI can be found [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileviatheSynapseUI).


### Programmatic clients

This is suitable for larger and more numerous files and allows for more efficient and streamlined uploading.
Three clients exist: the command line tool, Python script, or R.
For all options refer to the documentation [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileProgrammatically).


#### Most typical workflow

The command-line workflow should be familiar to you if you are a bioinformatician/data engineer/data scientist.
If you need help, please contact the DCC.

**For the most typical and robust workflow, here are the steps with the Python command-line client:**

1. **Install Python package** from PyPI: https://pypi.org/project/synapseclient/. This will also automatically install the command line utility. Test out that the command line utility is working by typing `synapse help` and feel free to review [docs](https://python-docs.synapse.org/build/html/articles/cli.html) for the Python CLI client.
2. For large uploads, it is best to **create an access token**. Go to your **Account profile > Account Settings > Personal Access Tokens > Create new token**.
3. For convenience, copy and paste the token into a `.synapseConfig` text file. This text file simply looks like:
```
[authentication]
authtoken = sometokenstringxxxxxxxxxxxxxxxxxx
```

4. First, you need to create a list of files to transfer (called a manifest). The `parent-id` is the Synapse folder you are trying to upload files to.

`synapse manifest --parent-id syn12345678 --manifest-file manifest.txt PATH_TO_DIR_WITH_FILES`

5. This should create a manifest file `manifest.txt` for the tool to reference. Now check that `.synapseConfig` is present locally to provide authentication. You can now do:

`synapse sync manifest.txt`

There are options for retries if you might have poor connection, e.g.

`synapse sync --retries 3 manifest.txt`

#### One-off uploads

For just a few files (but maybe still very large-size files) where you don't need to create a list of files, a more convenient command might be:

`synapse store my_image.tiff --parentId syn12345678`


### Alternative methods

Under rare unique circumstances, the DCC can explore the following options:

- Receiving data via physical hard drive
- Utilizing Globus transfers (if really needed)
- Transferring from a custom S3 bucket


## 3. Annotating data

### Data Curator App

Once data has been uploaded, data annotation is done through Sage's [Data Curator App](https://dca.app.sagebionetworks.org/).
Note that the application is in beta, so if there is any issue with the below steps please contact the DCC.

### Using the UI or programmatic clients
1. Go to the link above, and on the very first selection page **select your DCC, Gray Foundation**.
2. Next **select your Synapse project**.
3. Next there will be a flattened list of all folders. Select **the folder where you've just uploaded data**.
4. Next, again, you'll see a selection where you can **select the template type**. This should be obvious in most cases.
5. Click **Download template** and go through the steps of populating the template, validating it, and submitting it in the app.

Data can be uploaded using the User Interface (UI) of Synapse. You can find detailed instructions in the documentation [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileviatheSynapseUI).
### Troubleshooting

However, for larger and numerous files, we recommend utilizing programmatic methods such as the command line, Python, or R. This allows for more efficient and streamlined uploading. You can refer to the documentation [here](https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html#UploadingandOrganizingDataIntoProjects,Files,andFolders-UploadingaFileProgrammatically) for instructions on uploading files programmatically.
Some common problems:
- You cannot see your project or folder -- This is usually a permissions issue. Please contact us to get your account added.
- You do not see a template that looks right -- Usually the right template should already exist. Contact us and we'll add the right template.
- The app hangs or crashes -- Contact the DCC and we'll contact the tooling team for troubleshooting.
- You have a validation issue -- Check the validation error messages, and if it is confusing or you disagree, please contact the DCC.

### Help and alternative methods

Under unique circumstances, the DCC can explore the following options:
## 4. Data release

Sending data via physical hard drive
Utilizing Globus transfers
Providing assistance with programmatic methods
Once data has been uploaded and annotated, they can be reviewed for any issues, assigned a DOI (especially to use in a publication), and shared in other apps or made public.
However, this does not usually happen right after data upload because of the embargo/holding period where it has to be confirmed that data can be moved to the next phase.

## Annotating data
### Bring data to cBioPortal, CELLxGENE, etc.

Data annotation is done through Sage's Data Curator App ([GitHub](https://github.com/Sage-Bionetworks/data_curator)).
Data are annotated by their folders and the right templates for that folder/data type should be selected in the interface.
The Gray Foundation currently has its own temporary instance, but we will be migrating to a central app that serves all consortia supported by Sage.
We will provide more instructions when the central app is live.
The DCC will coordinate on this manually once data is available in Synapse.

## Releasing data
### Adding a DOI

Once data has been annotated, they can be reviewed (for any governance issues), made public, and assigned a DOI by the DCC (especially to use in a publication).
Please coordinate with the DCC at this stage.
This can be done to include in a publication by you directly or with help from DCC.

1 comment on commit 8a4368f

@vercel
Copy link

@vercel vercel bot commented on 8a4368f Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sign in to comment.