Datashare: initial S3 support #27

Merged
merged 10 commits into from
Nov 14, 2024

pdiakumis (Member)
Sets up S3 support for umccrise and WTS results.
For now, the following works from R.
I'll add CLI support next.

d <- tibble::tribble(
  ~subject_id, ~library_id, ~type,
  "SBJ05374", "L2401596", "WGS",
  "SBJ05374", "L2401585", "WTS"
)
# log into AWS umccr prod first, then get and validate a portal JWT
token <- rportal::orca_jwt() |> rportal::jwt_validate()
# one row per library: WGS libraries use the umccrise helper, WTS libraries the WTS one
urls <- d |>
  dplyr::rowwise() |>
  dplyr::mutate(
    share = list(
      if (.data$type == "WGS") {
        rportal::datashare_um_s3(.data$library_id, token)
      } else {
        rportal::datashare_wts_s3(.data$library_id, token)
      }
    )
  )
# then unnest the share list-col to get one row per file
urls |> tidyr::unnest(share)

pdiakumis (Member, Author) commented Nov 14, 2024

This should work now @ohofmann (`--subject_id` is optional; the URLs CSV only contains the library ID). Also note:
- I haven't used `--append` in the first run, so that run starts `urls.csv` and every later run appends to it.
- I'm not grabbing FASTQs any more.
- I'm including the BAM md5sums from Dragen.
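The runs below all follow one pattern: the first call omits `--append`, every later call adds it, and WTS libraries get `--wts`. A minimal Bash sketch of that loop, using only the flags that appear in this thread; the function name `datashare_batch`, the whitespace-separated input format and the `DATASHARE` override variable are hypothetical, not part of datashare.R:

```shell
#!/usr/bin/env bash
# Hypothetical batch driver for datashare.R (sketch, not part of this PR).
# Reads "subject_id library_id assay" lines on stdin (assay: WGS or WTS).
# The first run omits --append; every later run appends to the same CSV.
set -euo pipefail

datashare_batch() {
  local out="$1"
  # DATASHARE is a hypothetical override, e.g. DATASHARE=echo for a dry run.
  local cmd="${DATASHARE:-./datashare.R}"
  local first=1 subject lib assay
  while read -r subject lib assay; do
    # Skip blank lines.
    if [[ -z "$subject" ]]; then
      continue
    fi
    local args=(--s3)
    if [[ "$assay" == "WTS" ]]; then
      args+=(--wts)
    fi
    # WTS libraries also go through --library_id_tumor, as in the commands below.
    args+=(--subject_id "$subject" --library_id_tumor "$lib" --csv_output "$out")
    if (( first == 0 )); then
      args+=(--append)
    fi
    first=0
    # Keep the script off our stdin so it cannot swallow the remaining lines.
    "$cmd" "${args[@]}" < /dev/null
  done
}
```

For example, `printf '%s\n' 'SBJ05374 L2401596 WGS' 'SBJ05374 L2401585 WTS' | DATASHARE=echo datashare_batch urls.csv` only prints the two commands, with `--append` on the second one alone.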

./datashare.R --s3 --subject_id SBJ05374 --library_id_tumor L2401596 --csv_output urls.csv
./datashare.R --s3 --subject_id SBJ05857 --library_id_tumor L2401595 --csv_output urls.csv --append
./datashare.R --s3 --subject_id SBJ05854 --library_id_tumor L2401593 --csv_output urls.csv --append
./datashare.R --s3 --subject_id SBJ05853 --library_id_tumor L2401591 --csv_output urls.csv --append
./datashare.R --s3 --subject_id SBJ04893 --library_id_tumor L2401589 --csv_output urls.csv --append
./datashare.R --s3 --wts --subject_id SBJ05374 --library_id_tumor L2401585 --csv_output urls.csv --append
./datashare.R --s3 --wts --subject_id SBJ05857 --library_id_tumor L2401582 --csv_output urls.csv --append
./datashare.R --s3 --wts --subject_id SBJ05853 --library_id_tumor L2401579 --csv_output urls.csv --append
./datashare.R --s3 --wts --subject_id SBJ05854 --library_id_tumor L2401578 --csv_output urls.csv --append
./datashare.R --s3 --wts --subject_id SBJ04893 --library_id_tumor L2401577 --csv_output urls.csv --append
cut -d, -f1-6 urls.csv | head

libid,type,bname,size,lastmodified,filesystem
L2401596,BAM_normal,L2401015_normal.bam,70.61G,2024-11-11 14:11:27,s3
L2401596,BAM_tumor,L2401596_tumor.bam,163.29G,2024-11-11 14:11:29,s3
L2401596,BAMi_normal,L2401015_normal.bam.bai,8.94M,2024-11-11 14:11:27,s3
L2401596,BAMi_tumor,L2401596_tumor.bam.bai,9.56M,2024-11-11 14:11:29,s3
L2401596,BAMmd5sum_tumor,L2401015_normal.bam.md5sum,32,2024-11-11 14:11:27,s3
L2401596,BAMmd5sum_tumor,L2401596_tumor.bam.md5sum,32,2024-11-11 14:11:29,s3
L2401596,HTML_CPSR,SBJ05374__L2401596-normal.cpsr.html,4.06M,2024-11-11 18:02:35,s3
L2401596,HTML_CanRep,SBJ05374__L2401596_cancer_report.html,9.24M,2024-11-11 18:02:35,s3
L2401596,HTML_MultiQC,SBJ05374__L2401596-multiqc_report.html,3M,2024-11-11 18:02:35,s3
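The `BAMmd5sum` rows are 32 bytes each, which is consistent with a bare hex MD5 digest and no filename. Assuming that format, a downloaded BAM could be checked against its md5sum file with a small helper like the sketch below (the name `verify_bam_md5` is hypothetical; it also assumes GNU coreutils `md5sum` and Bash 4+):

```shell
#!/usr/bin/env bash
# Sketch: verify a downloaded BAM against its Dragen .md5sum file.
# Assumes the .md5sum file holds just the 32-character hex digest (its
# 32-byte size in urls.csv suggests so), GNU coreutils md5sum, and bash 4+.
set -euo pipefail

verify_bam_md5() {
  local bam="$1" md5file="$2" expected actual
  # First 32 bytes only, so any trailing newline is ignored.
  expected="$(head -c 32 "$md5file")"
  # Hash from stdin so the filename never appears (or gets escaped) in the output.
  actual="$(md5sum < "$bam" | cut -d' ' -f1)"
  # Compare case-insensitively; md5sum prints lowercase hex.
  if [[ "${expected,,}" == "$actual" ]]; then
    echo "OK: $bam"
  else
    echo "MISMATCH: $bam (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

Usage would look like `verify_bam_md5 L2401596_tumor.bam L2401596_tumor.bam.md5sum`, which returns non-zero on a mismatch so it can gate a download script.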
cut -d, -f1 urls.csv | uniq -c
   1 libid
  43 L2401596
  42 L2401595 # no DBS sigs for this one
  43 L2401593
  43 L2401591
  43 L2401589
   8 L2401585
   8 L2401582
   8 L2401579
   8 L2401578
   8 L2401577
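`uniq -c` only collapses adjacent identical lines, so the tally above holds because each run writes one library's rows as a contiguous block. If `urls.csv` were ever reordered or merged from other files, the same library could show up on several lines. A minimal order-independent check with `awk`, assuming the first column never contains a comma (the function name `count_rows_per_lib` is illustrative):

```shell
#!/usr/bin/env bash
# Sketch: rows per library ID in urls.csv, whatever the row order.
# Assumes the first CSV column (libid) never contains a comma.
set -euo pipefail

count_rows_per_lib() {
  local csv="$1"
  # NR > 1 skips the header; print "libid count", sorted by libid.
  awk -F, 'NR > 1 { n[$1]++ } END { for (id in n) print id, n[id] }' "$csv" | LC_ALL=C sort
}
```

Run against the `urls.csv` above, it should report the same per-library counts as `uniq -c`, without the header line.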

pdiakumis merged commit 35f4b4a into main on Nov 14, 2024
3 checks passed
pdiakumis deleted the datashare_s3 branch on November 14, 2024 04:16

1 participant