jgo managed packages in conda #1

Closed
jakirkham opened this issue Nov 20, 2018 · 17 comments
Labels
question Further information is requested

Comments

@jakirkham
Member

As other packages are looking to use jgo particularly in a conda setting, have a question about its use. Namely where should artifacts downloaded by jgo live? According to the README IIUC these live in the user's home directory. However in many cases conda users have gotten used to package content living within the conda environment. For instance, with julia we install all packages into a special path within the environment. The advantage being these are isolated from other environments with different julia installs and possibly other non-conda installs of julia. Would this strategy make sense for jgo and Maven installed packages? Or are there use cases where this would be problematic?

@ctrueden
Member

ctrueden commented Nov 20, 2018

Good question. jgo calls mvn to do the actual artifact resolution and download. Maven's local repository cache is in ~/.m2/repository by default. The jgo code then links files from there into ~/.jgo under subfolders named after the endpoints. By default, hard links are used, but jgo can be configured to do soft/symlinks or none/copying instead as the user desires. Currently this is specified in a ~/.jgorc file; see the docs. Here is where the default configuration is computed.

To make things more conda-environment-friendly, we could enhance these defaults. It could be as simple as wrapping all the pathlib.Path.home() calls in a function that uses $CONDA_PREFIX if that variable is set, or pathlib.Path.home() otherwise. What do you think, @jakirkham?
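A minimal sketch of that wrapper idea, for illustration only. The function names `base_dir` and `default_cache_dir` are hypothetical and are not part of jgo; the `environ` parameter exists only so the behavior can be exercised without touching the real environment.

```python
import os
import pathlib


def base_dir(environ=None):
    """Return $CONDA_PREFIX if it is set and non-empty, else the home directory.

    Hypothetical helper: stands in for the pathlib.Path.home() calls
    mentioned above.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    return pathlib.Path(prefix) if prefix else pathlib.Path.home()


def default_cache_dir(environ=None):
    """Default endpoint cache location under the chosen base directory."""
    return base_dir(environ) / ".jgo"
```

With this shape, running outside any conda environment keeps the existing `~/.jgo` behavior, since `CONDA_PREFIX` is simply absent.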

Tricky things and downsides:

  1. Java release artifacts are immutable, and it is rather wasteful to cache them separately per environment. But I guess the environment isolation is preferred to the space savings here?

  2. Users may still expect ~/.jgorc to work. Maybe we should check for $CONDA_PREFIX/etc/jgorc and union that with ~/.jgorc? I don't know. @hanslovsky what do you think?

  3. From the code, I don't see anywhere that mvn is told the path to use for its local repository cache. The original purpose of the m2Repo setting was to align it with Maven's behavior—not to actually override the repository cache directory that mvn will use. We should probably change this. One way or another, we'll need to teach Maven to use $CONDA_PREFIX/.m2/repository or similar when conda is in play. (Probably we should pass -Dmaven.repo.local=<path> to the mvn execution.)
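As a rough illustration of point 3, the Maven command line could be assembled along these lines. `maven.repo.local` is a standard Maven property; the helper name `mvn_command` and the example goal are assumptions made for this sketch, not jgo's actual code.

```python
import os


def mvn_command(goal_args, repo_local=None):
    """Build an mvn argument list, optionally overriding the local repository.

    Hypothetical helper: when repo_local is None, Maven keeps its own default
    (~/.m2/repository unless settings.xml says otherwise).
    """
    cmd = ["mvn"]
    if repo_local is not None:
        # Tell Maven which local repository cache to use for this run.
        cmd.append("-Dmaven.repo.local=" + os.fspath(repo_local))
    cmd.extend(goal_args)
    return cmd
```

For example, `mvn_command(["dependency:resolve"], "/opt/env/.m2/repository")` yields a list that can be handed to `subprocess.run`.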

A middle ground could be to leave Maven's local repository cache alone, but keep the .jgo cache folder under $CONDA_PREFIX. The danger there is it increases the chances of the Maven local repo cache and jgo cache folders being on two different partitions, preventing hard links from being used.

@hanslovsky
Contributor

Generally, I would like to use $HOME/.m2/repository with jgo, regardless if it's from within a conda environment or not. My workflow to run development versions outside an IDE looks like this:

mvn clean install
conda activate # if not yet activated
jgo groupId:artifactId:version-SNAPSHOT

This would be made unnecessarily complicated if the default ~/.m2/repository cache location were to be changed. If there is a strong desire to make the location of the local maven repository configurable, I feel that it should be through an environment variable (empty or ~/.m2/repository by default). Maybe MAVEN_OPTS is the right choice, and jgo could parse the contents to check for the location of that repo (not sure that that is even necessary). In addition, the maven package could set MAVEN_OPTS appropriately. If I use non-conda maven, I would prefer not to set it by default, though.

Users may still expect ~/.jgorc to work. Maybe we should check for $CONDA_PREFIX/etc/jgorc and union that with ~/.jgorc? I don't know. @hanslovsky what do you think?

Using two rc files seems odd to me. Which one would take precedence in case of clashes? I feel that only one rc file should be read. We could make jgo configurable to use an arbitrary rc file. I personally would still expect it to use ~/.jgorc by default, even from within a conda environment, but that might be handled differently in conda, in general. How are rc files treated in conda environments, @jakirkham? It seems to me that, for example, ipython reads the config from my ~/.ipython directory.
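On the precedence question, here is a sketch of what a union of two rc files would look like, assuming the configuration is held in a standard `configparser.ConfigParser` (the `config.add_section` / `config.set` calls in jgo's `default_config()` suggest this, but it is an assumption here). `ConfigParser.read` accepts a list of paths, silently skips missing files, and lets later files override earlier ones. The function name `read_rc_files` is hypothetical.

```python
import configparser


def read_rc_files(paths):
    """Read rc files in order; values from later files win on clashes.

    Hypothetical helper. Files that do not exist are skipped by
    ConfigParser.read, so either file may be absent.
    """
    config = configparser.ConfigParser()
    config.read(paths)
    return config
```

Under this scheme, listing the environment-level file first and `~/.jgorc` second would make the user's file take precedence; reversing the list reverses the precedence.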

@jakirkham
Member Author

These are all good points.

Conda has a package cache that is actually hard linked per environment. So saving space should be possible with either approach.

The RC file location is a really good point. With julia we only use the file in the environment. However in other cases like keras, we use the default RC file location if found and fallback to our preincluded one otherwise. We have also played with solutions where a user can configure this with an environment variable. In other cases we do nothing. This depends a lot on what functionality the RC file has (or will have in the future). Also these use cases rely a lot on user feedback.

Certainly understand the motivation for using the default maven repository path. That said, installing packages could prove tricky for cases where users lack internet connection for whatever reason or are relying on old conda packages from conda's cache to reproduce a result.

Interesting. I'm not familiar with how the internals of jgo work. What is in jgo's package cache?

@ctrueden
Member

ctrueden commented Nov 20, 2018

If there is a strong desire to make the location of the local maven repository configurable, I feel that it should be through an environment variable (empty or ~/.m2/repository by default).

The Maven Way™ to control it is via ~/.m2/settings.xml—see here. I thought it used to work to set the M2_REPO environment variable, but I just tested it and that does not work on my system with Maven 3.5.0. I don't think it's necessary to complexify jgo here, but willing to do so if others disagree.

What is in jgo's package cache?

Here is an example from a fresh install, @jakirkham:

$ jgo org.scijava:parsington
> ^D
$ tree .jgo
.jgo
└── org
    └── scijava
        └── parsington
            └── RELEASE
                ├── mainClass
                ├── parsington-1.0.3.jar
                └── pom.xml

4 directories, 3 files
$ test .jgo/org/scijava/parsington/RELEASE/parsington-1.0.3.jar -ef .m2/repository/org/scijava/parsington/1.0.3/parsington-1.0.3.jar && echo same
same
$ cat .jgo/org/scijava/parsington/RELEASE/mainClass
org.scijava.parse.Main
$ cat .jgo/org/scijava/parsington/RELEASE/pom.xml
<project>
	<modelVersion>4.0.0</modelVersion>
	<groupId>org.scijava-BOOTSTRAPPER</groupId>
	<artifactId>parsington-BOOTSTRAPPER</artifactId>
	<version>0</version>
	<dependencyManagement>
		<dependencies></dependencies>
	</dependencyManagement>
	<dependencies><dependency><groupId>org.scijava</groupId><artifactId>parsington</artifactId><version>RELEASE</version></dependency></dependencies>
	<repositories><repository><id>imagej.public</id><url>https://maven.imagej.net/content/groups/public</url></repository><repository><id>jitpack</id><url>https://jitpack.io</url></repository></repositories>
</project>

(The <repository> elements are only there because of my personal .jgorc configuration.)

In the example above, I requested to launch the org.scijava:parsington artifact at the newest release version, which it discerned as 1.0.3. The main class was extracted from the Main-Class attribute of the META-INF/MANIFEST.MF file of the parsington-1.0.3.jar file.
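For readers unfamiliar with JAR manifests, the lookup described above can be approximated as follows. This is a sketch under stated assumptions, not jgo's implementation: the function name `main_class_of` is hypothetical, and only the main section of the manifest is inspected.

```python
import zipfile


def main_class_of(jar_path):
    """Return the Main-Class attribute of a JAR's manifest, or None if absent."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF")
        except KeyError:
            # The archive has no manifest at all.
            return None
    text = raw.decode("utf-8").replace("\r\n", "\n").replace("\r", "\n")
    # Long manifest values wrap onto continuation lines starting with one space.
    text = text.replace("\n ", "")
    for line in text.split("\n"):
        if not line:
            # A blank line ends the main section of the manifest.
            break
        name, sep, value = line.partition(":")
        # Manifest attribute names are case-insensitive.
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None
```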

Bundling all the stuff into a directory like this makes launching simpler: the actual invocation of java ends up being something like this:

java -cp ~/.jgo/org/scijava/parsington/RELEASE/'*' "$(cat ~/.jgo/org/scijava/parsington/RELEASE/mainClass)"

So as you can see, the endpoint cache is really minimal, particularly when hard or soft links are used. I don't feel strongly whether we use $CONDA_PREFIX/.jgo or ~/.jgo as the cacheDir when run from a conda environment. I see both sides of it: on one hand, I want the hard-linking to work in as many situations as possible, but on the other I see the value in environmental isolation of the endpoint cache.
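The java invocation shown above could be assembled from an endpoint directory roughly like this. It is a sketch only: the name `launch_command` is hypothetical, and it assumes the directory contains a `mainClass` file as in the `tree` listing.

```python
import os


def launch_command(workspace, app_args=()):
    """Build the java argument list for an endpoint directory.

    Hypothetical helper. The classpath wildcard is passed to java
    literally; java itself expands dir/* to the JARs in that directory.
    """
    with open(os.path.join(workspace, "mainClass")) as f:
        main_class = f.read().strip()
    classpath = os.path.join(workspace, "*")
    return ["java", "-cp", classpath, main_class, *app_args]
```

Because the result is an argument list rather than a shell string, no quoting of the `*` is needed when it is passed to `subprocess.run`.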

installing packages could prove tricky for cases where users lack internet connection for whatever reason or are relying on old conda packages from conda's cache to reproduce a result

Note that once an endpoint exists in the cacheDir, jgo will never rebuild it unless you feed the -u or -U flag. So things should stay reproducible unless the user requests an update. And they should be reproducible regardless if the user gave an explicit release version in the endpoint.

I'm pretty pressed for time at the moment preparing for the I2K conference, but willing to revisit this after the new year if work needs to be done.

@jakirkham
Member Author

Ok, so the contents of .jgo are just hard linked to ~/.m2/repository, correct? If so, your proposal of moving .jgo to $CONDA_PREFIX sounds good. Though we can probably dispense with the . prefix. 😉

As to the hard linking concerns, while it may be possible for these issues to come up, am not sure how often they will show up in practice. Most users install conda in their home directories anyways. After all that is the beauty of having relocatable packages that don't require sudo (particularly in administered systems scenarios 😄).

In the cases where conda is installed in some root owned directory and locked down, users are still free to install packages, which will be placed in a cache in their home directory (unless they override it). Same story with the environments they create. It may be possible that, as packages become more stable, system administrators will begin including them on the locked down conda side. That may cause hard linking to fail depending on how these different locations are set up, but this starts to become a system administrator issue as opposed to a packaging issue.

No worries. This can wait I think. Just wanted to raise this point before we had 100s of packages like this. 😄

@ctrueden
Member

Ok, so the contents of .jgo are just hard linked to ~/.m2/repository, correct?

Yes, by default. If you specify links = soft in your .jgorc then it uses symlinks. That used to be the default, but I changed it for some reason I already forgot (so things would work better on Windows, maybe?).

Most users install conda in their home directories anyways.

On macOS, brew cask install miniconda installs to /usr/local/miniconda3 by default.

That may cause hard linking to fail depending on how these different locations are set up

Probably our best bet long-term would be to enhance the linking logic of jgo to be something like links = auto by default, which auto-senses whether hard and/or symlinks will work as desired, and only uses copying as a last resort. I filed scijava/jgo#22 to remember this idea for later.
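One possible shape for such an auto mode, sketched with the standard library only. The function name `link_file` and the returned strategy labels are assumptions for illustration; this is not the implementation tracked in scijava/jgo#22.

```python
import os
import shutil


def link_file(source, target):
    """Link source to target, preferring hard links, then symlinks, then a copy.

    Hypothetical helper. Returns the strategy that succeeded:
    "hard", "soft", or "copy".
    """
    if os.path.lexists(target):
        raise FileExistsError(target)
    try:
        # Fails with OSError when source and target are on different filesystems.
        os.link(source, target)
        return "hard"
    except OSError:
        pass
    try:
        # May fail with OSError where symlinks need extra privileges.
        os.symlink(source, target)
        return "soft"
    except OSError:
        pass
    # Last resort: duplicate the file contents.
    shutil.copyfile(source, target)
    return "copy"
```

Trying the operation and falling back on failure avoids having to predict in advance whether two paths share a filesystem.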

@ctrueden
Member

your proposal of moving .jgo to $CONDA_PREFIX sounds good. Though we can probably dispense with the . prefix.

Something like this?

diff --git a/jgo/jgo.py b/jgo/jgo.py
index 2c1d612..80c26f5 100644
--- a/jgo/jgo.py
+++ b/jgo/jgo.py
@@ -264,7 +264,7 @@ def default_config():
     # settings
     config.add_section('settings')
     config.set('settings', 'm2Repo', os.path.join(str(pathlib.Path.home()), '.m2', 'repository'))
-    config.set('settings', 'cacheDir', os.path.join(str(pathlib.Path.home()), '.jgo'))
+    config.set('settings', 'cacheDir', os.path.join(os.getenv('CONDA_PREFIX', str(pathlib.Path.home())), '.jgo'))
     config.set('settings', 'links', 'hard')

     # repositories

That keeps the . prefix, though... why would we dispense with it? Where under $CONDA_PREFIX would it make best sense to put this cache directory? Just directly under there?

@hanslovsky
Contributor

I think that jgo should not check for conda environment but a variable like JGO_CACHE_DIR. The conda package should then set this variable appropriately.
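A sketch of how such a variable could be resolved. Only the name `JGO_CACHE_DIR` comes from the comment above; the function `resolve_cache_dir` and the precedence order (environment variable, then a configured value, then `~/.jgo`) are assumptions for illustration.

```python
import os
import pathlib


def resolve_cache_dir(configured=None, environ=None):
    """Pick the cache directory: JGO_CACHE_DIR, then a configured value, then ~/.jgo.

    Hypothetical helper; the precedence order is an assumption.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("JGO_CACHE_DIR")
    if override:
        return pathlib.Path(override)
    if configured:
        return pathlib.Path(configured)
    return pathlib.Path.home() / ".jgo"
```

This keeps jgo itself unaware of conda: a conda package only needs to export the variable on activation.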

Apologies for brevity, sent from phone.

@ctrueden
Member

@hanslovsky I agree about JGO_CACHE_DIR. Do you have time to file a PR? I am swamped with I2K through mid December.

@hanslovsky
Contributor

@ctrueden Once scijava/jgo#23 is merged we can go on here.

@jakirkham
Member Author

Sorry, catching up here.

Most users install conda in their home directories anyways.

On macOS, brew cask install miniconda installs to /usr/local/miniconda3 by default.

Interesting. Was not aware of that. Curious how well these two package managers behave when installed like that.

AFAIK the recommended way to download Miniconda or Anaconda remains to get them from Anaconda directly.

That said, a sysadmin could install Conda somewhere where users could not modify it. Users of this install could still create new environments and download/install packages (insomuch as the admin has not blocked these). These environments and packages would live in the user's home directory in this case. Though these locations are configurable.

That keeps the . prefix, though... why would we dispense with it?

Shouldn't be any need for hidden directories within the Conda environment.

I think that jgo should not check for conda environment but a variable like JGO_CACHE_DIR. The conda package should then set this variable appropriately.

SGTM

Can backup the old value for this variable as well. That way it can be easily restored after deactivating.

@ctrueden
Member

Curious how well these two package managers behave when installed like that.

It works well because Homebrew installations are owned by the active user, rather than root. So the miniconda installation is mutable as usual.

Once scijava/jgo#23 is merged we can go on here.

Merged!

@hanslovsky
Contributor

It works well because Homebrew installations are owned by the active user,

If multiple users install miniconda through brew, does that not clash with the install location at /usr/local/miniconda3?

I used to install python-conda from the Arch Linux User Repository but started installing into my home directory through the miniconda installer because it is more straightforward to follow conda examples if the base environment is not immutable (without root).

Merged!

Thank you!

ctrueden added the question label Nov 27, 2018
@ctrueden
Member

ctrueden commented Nov 27, 2018

If multiple users install miniconda through brew

Multi-user brew installations are very rare in my experience. Some people do it. See this SO post for some related issues. Not really Conda's problem! 😉

I used to install python-conda from the Arch Linux User Repository but started installing into my home directory through the miniconda installer because it is more straightforward to follow conda examples if the base environment is not immutable (without root).

Yeah, I'm not a fan of Linux package managers for inherently extensible things like Conda, Eclipse and Fiji. Too much brokenness.

@hanslovsky
Contributor

Is this resolved with #4?
The conda package now defines JGO_CACHE_DIR to reside within conda environment.

Or should we wait for the next jgo release that will contain 'auto' link type (and potentially tests)?

Not really Conda's problem!

Agreed.

Yeah, I'm not a fan of Linux package managers for inherently extensible things like Conda, Eclipse and Fiji. Too much brokenness.

I feel the issue is the clash of paradigms (global settings/behavior that are locally overwritten by user config files/extension (what Linux expects) vs user-space only local installations). Apart from that, I never had issues extending eclipse or intellij (or conda) installed through package manager. (Eclipse settings is a whole other topic, but as we say in German "Das Kind ist in den Brunnen gefallen", literally "the child has fallen into the well", i.e. the damage is already done).

@jakirkham
Member Author

At least from my perspective, yes, it is solved.

On these other details, will defer to others.

Not really Conda's problem!

Yeah, this sums up my feelings on these more unusual installs. Only wanted to point out in many cases like this Conda still tries to do something reasonable and provide as many options to the end user as possible.

@hanslovsky
Contributor

At least from my perspective, yes, it is solved.

Will close then (interpreting @ctrueden's thumbs up as agreement).

Solved in #4
