Provide exported function to determine BioC version for R version #164

Open
hadley opened this issue Apr 21, 2023 · 8 comments

hadley commented Apr 21, 2023

I know of at least two packages that have had to implement this themselves, so it would be useful for BiocManager to provide an exported interface.

Contributor

LiNk-NY commented Apr 21, 2023

Hi Hadley,

Do you have a specific use case in mind?
My first thought was something like:

> .version_bioc_for_r_version("4.2")
[1] '3.15' '3.16'

but I'm not sure how helpful that is.

We could also just export the .version_map with some annotations (e.g., active <lgl> column) and in tibble format.

Best,
Marcel
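
One possible shape for such an interface, as a minimal sketch only. The table and the function name `bioc_versions_for_r()` are hypothetical and not part of BiocManager; the four rows are a hand-written excerpt covering R 4.1 and 4.2, whereas a real implementation would read the maintained version map.

```r
# Hypothetical, hand-written excerpt of an R-to-Bioconductor release map.
# It is NOT read from BiocManager; a real version would use the maintained map.
version_map <- data.frame(
  bioc = c("3.13", "3.14", "3.15", "3.16"),
  r    = c("4.1",  "4.1",  "4.2",  "4.2"),
  stringsAsFactors = FALSE
)

# Return every Bioconductor release paired with the given R version.
# Only major.minor of the R version is compared, so "4.2.3" matches "4.2".
bioc_versions_for_r <- function(r_version, map = version_map) {
  parts <- unlist(unclass(package_version(r_version)))
  key <- paste(parts[1:2], collapse = ".")
  map$bioc[map$r == key]
}

bioc_versions_for_r("4.2")
```

Returning a vector rather than a single value keeps the one-to-many relationship between R and Bioconductor versions visible to the caller.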

Author

hadley commented Apr 22, 2023

I'm looking into this for the rsconnect package. There we have to generate a package "manifest" that describes the version and source of all packages so we can install the same versions on the server. Sometimes the user has installed a Bioconductor package but there isn't a BioC repo included in getOption("repos"). (I don't know how common this is, but it at least affects me during testing.) So I need to somehow figure out what BioC repo they probably used, based on the version of R they're using.
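
The situation described here could be detected with a check like the one below. This is a rough sketch, not rsconnect's actual code: the helper name `has_bioc_repo()` is hypothetical, and matching on the host name is an assumption that would miss mirrors hosted elsewhere.

```r
# Return TRUE if any configured repository URL points at bioconductor.org.
# Matching on the host name is a heuristic: mirrors hosted elsewhere are missed.
has_bioc_repo <- function(repos = getOption("repos")) {
  any(grepl("bioconductor.org", repos, fixed = TRUE))
}

# Example repository settings, for illustration only.
has_bioc_repo(c(CRAN = "https://cloud.r-project.org"))
has_bioc_repo(c(
  CRAN = "https://cloud.r-project.org",
  BioCsoft = "https://bioconductor.org/packages/3.16/bioc"
))
```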

Collaborator

mtmorgan commented Apr 22, 2023

I am supportive of providing this information, and include the following narrative only to clarify need.

How does rsconnect deal with the situation where a user has installed a package from, e.g., github, where there is no relationship between R and package version? It seems like the 'manifest' needs to record the version of all packages (and dependencies) in use, and then rsconnect needs to map package versions to possible source?

I ask knowing that (a) there is not a 1:1 mapping between R version and Bioconductor version and (b) there is a 'release' and a 'devel' branch of Bioconductor, with the devel branch anticipating the next R / Bioc release combination:

calendar --|--------------------------|---------------------------|--
R        --|--------- 4.1 ------------|---------- 4.2 ------------|--
release  --|---3.13 ---|---- 3.14 ----|--- 3.15 ---|---- 3.16 ----|--
devel    --|---3.14 ---| 3.15 / R 4.2 |--- 3.16 ---| 3.17 / R 4.3 |--

Knowing that the user has R 4.2 tells us that the user could be using Bioc 3.15, 3.16, or perhaps pre-release 3.15. The version of R is not sufficient to know the version of Bioconductor.

It is not uncommon for support questions to reveal Bioc version mismatches. Bioc packages use x.y.z versions in which 'y' is even for release and odd for devel, so a mix of even and odd 'y' versions in sessionInfo() points to an unsupported installation (maybe the user installed some packages from github or conda or ...), and BiocManager::valid() suggests remedies.
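
The even / odd convention can be checked mechanically. The sketch below is illustrative only: both function names are hypothetical, the version strings are made up, and the check is meaningful only when applied to Bioconductor packages (CRAN packages do not follow this convention). BiocManager::valid() remains the supported diagnostic.

```r
# Extract the 'y' component of x.y.z version strings.
minor_component <- function(versions) {
  parts <- strsplit(versions, "[.-]")
  vapply(parts, function(p) as.integer(p[2]), integer(1))
}

# TRUE when the versions mix even ('release') and odd ('devel') y components.
has_mixed_bioc_branches <- function(versions) {
  odd <- minor_component(versions) %% 2L == 1L
  any(odd) && !all(odd)
}

# Hypothetical version strings, for illustration only.
has_mixed_bioc_branches(c(pkgA = "0.42.0", pkgB = "1.24.1"))
has_mixed_bioc_branches(c(pkgA = "0.42.0", pkgB = "1.25.3"))
```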

The 'z' version may no longer be available via BiocManager::install() / install.packages(repos = BiocManager::repositories()) -- for release packages these are archived following the CRAN scheme, but devel package 'z' versions can only be reconstructed via git.bioconductor.org. Even in the 'release' case it would be very difficult to reconstruct the package versions of dependent packages when 'z' was available (as also with CRAN).

Would it be sufficient for BiocManager to provide the output of something like

BiocManager:::.version_map()

?

Author

hadley commented Apr 23, 2023

Packages installed from GitHub are handled separately; we reinstall from the exact repo used.

It doesn't appear that Bioconductor provides enough information to record the exact version used (it would be nice if the repository automatically included that in the DESCRIPTION), but an approximate version should be enough to get us close in most cases, and if not, then the user can provide an exact repository URL by setting options(repos) appropriately.

Collaborator

mtmorgan commented Apr 23, 2023

This is what I see for a package:

> packageDescription("BiocGenerics")
Package: BiocGenerics
... usual stuff
git_url: https://git.bioconductor.org/packages/BiocGenerics
git_branch: RELEASE_3_15
git_last_commit: 3582d47
git_last_commit_date: 2022-04-26
Date/Publication: 2022-04-26
NeedsCompilation: no
Packaged: 2022-04-26 21:08:47 UTC; biocbuild
Built: R 4.2.0; ; 2022-04-27 13:31:40 UTC; unix

The git_branch tells us that the package was installed (if following 'best practice') from BiocManager::repositories(version = "3.15"). It also provides enough information to build from an exact checkout via git_url / git_branch / git_last_commit.
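
As a sketch of how the git_branch field could be turned into a Bioconductor version: the snippet parses a few of the fields shown above with read.dcf() so that it runs without BiocGenerics installed. The helper name `bioc_version_from_branch()` is hypothetical, and it assumes release branches always follow the RELEASE_x_y pattern seen in this output.

```r
# A few fields copied from the packageDescription() output above.
description_text <- c(
  "Package: BiocGenerics",
  "git_url: https://git.bioconductor.org/packages/BiocGenerics",
  "git_branch: RELEASE_3_15",
  "git_last_commit: 3582d47"
)

con <- textConnection(description_text)
fields <- read.dcf(con)[1, ]
close(con)

# Convert "RELEASE_3_15" to "3.15"; return NA for any other branch name
# (for example a devel branch), because it does not encode a version.
bioc_version_from_branch <- function(branch) {
  if (is.null(branch) || is.na(branch)) return(NA_character_)
  m <- regmatches(branch, regexec("^RELEASE_([0-9]+)_([0-9]+)$", branch))[[1]]
  if (length(m) == 0) return(NA_character_)
  paste(m[2], m[3], sep = ".")
}

bioc_version_from_branch(fields[["git_branch"]])
```

For an installed package, the same field should be reachable through packageDescription(), as in the output above.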

Author

hadley commented Apr 24, 2023

@mtmorgan ok, if we can rely on these fields, that makes life easier. I don't think we'll want to install every package from Git (since that means no binaries), but I can certainly use the git_branch to guess at the repository url.
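
Guessing the repository URL from a version might look like the sketch below. The function name `bioc_repo_urls()` is hypothetical, and the path layout is an assumption meant to mirror what BiocManager::repositories() reports; it should be checked against that function's output rather than taken as authoritative.

```r
# Build candidate repository URLs for one Bioconductor version.
# ASSUMPTION: the path layout below mirrors what BiocManager::repositories()
# reports; verify it before relying on it.
bioc_repo_urls <- function(version, base = "https://bioconductor.org/packages") {
  stopifnot(grepl("^[0-9]+\\.[0-9]+$", version))
  paths <- c(
    BioCsoft = "bioc",
    BioCann  = "data/annotation",
    BioCexp  = "data/experiment"
  )
  urls <- paste(base, version, paths, sep = "/")
  names(urls) <- names(paths)
  urls
}

bioc_repo_urls("3.15")
```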

Collaborator

mtmorgan commented Apr 24, 2023

I spoke a little more with people who are more intimately familiar with this than I am.

There are three types of Bioconductor packages -- software, 'annotation data', and 'experiment data'. The good news is that the software and experiment data packages are under version control and include the git timestamps etc. The bad news is that the annotation packages are not all under version control and so do not contain the timestamps etc. Some (e.g., GO.db) follow a version number convention mirroring the Bioconductor version, but others (e.g., DO.db) are more-or-less arbitrary, e.g., the same package version distributed across Bioconductor and R versions.
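
Since some annotation packages lack the git fields, any consumer needs a fallback. The sketch below is one way to express that, under stated assumptions: `guess_bioc_version()` is a hypothetical name, the caller supplies the candidate versions compatible with the user's R version, and picking the newest candidate is an arbitrary choice rather than a Bioconductor rule.

```r
# Guess a Bioconductor version from DESCRIPTION fields.
# `fields` is a named character vector; `candidates` are the versions
# compatible with the user's R version, oldest first (supplied by the caller).
guess_bioc_version <- function(fields, candidates) {
  branch <- if ("git_branch" %in% names(fields)) {
    fields[["git_branch"]]
  } else {
    NA_character_
  }
  if (!is.na(branch) && grepl("^RELEASE_[0-9]+_[0-9]+$", branch)) {
    # Exact: the branch name encodes the release.
    version <- gsub("_", ".", sub("^RELEASE_", "", branch), fixed = TRUE)
    return(list(version = version, exact = TRUE))
  }
  # Approximate: no usable git metadata, so take the newest candidate
  # (an arbitrary choice made for this sketch).
  list(version = candidates[length(candidates)], exact = FALSE)
}

# Hypothetical inputs: one package with git metadata, one without.
guess_bioc_version(c(Package = "SomeSoftware", git_branch = "RELEASE_3_15"),
                   c("3.15", "3.16"))
guess_bioc_version(c(Package = "SomeAnnotation.db"), c("3.15", "3.16"))
```

Returning the `exact` flag alongside the version lets the caller decide whether to warn the user or ask for an explicit options(repos) setting.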

I notice in the issue that you link to that accessing 'Archive' directories of the annotation and experiment data repositories fails. I do not know whether this is because there are no annotation or experiment data archives for the six month duration of a Bioconductor release (could be -- the primary value of these packages is that they don't change) or whether in fact these types of packages are not archived. I'll check on this.

UPDATE: The annotation- and experiment-data packages never have an 'Archive' location associated with them.

Author

hadley commented Apr 25, 2023

Thanks for looking into this further 😄 — I'm pretty sure the Archive urls generated in the linked issue are either a bug or some fallback getting activated when it shouldn't be.
