Install nbgitpuller in the base image #2000
[nbgitpuller](https://github.com/jupyterhub/nbgitpuller/) is a very popular way to distribute content to users, both on mybinder.org as well as on JupyterHubs. On Binder, this allows for fast launches where the environment is separated from the content (see https://discourse.jupyter.org/t/tip-speed-up-binder-launches-by-pulling-github-content-in-a-binder-link-with-nbgitpuller/922). So by putting nbgitpuller in these images, users can simply pick an image to launch and then just pull in the content they need, instead of having to rebuild their own image. It's also extremely popular when used with JupyterHub, and is included by default in the [TLJH](https://tljh.jupyter.org) distribution.
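For readers unfamiliar with the mechanism described above, here is a minimal sketch of how such a link could be assembled. It assumes the commonly documented `git-pull` endpoint with `repo`, `branch`, and `urlpath` query parameters; the hub URL, repository, and helper name `build_nbgitpuller_link` are hypothetical and only for illustration.

```python
from urllib.parse import urlencode


def build_nbgitpuller_link(hub_url, repo, branch="main", urlpath=None):
    """Return a link asking nbgitpuller on `hub_url` to pull `repo`.

    Hypothetical helper: the endpoint path and parameter names follow the
    link format described in the nbgitpuller documentation, as far as we
    know; check them against the version you deploy.
    """
    params = {"repo": repo, "branch": branch}
    if urlpath is not None:
        # Where to send the user once the content has been pulled.
        params["urlpath"] = urlpath
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


# Example values below are placeholders, not real deployments.
link = build_nbgitpuller_link(
    "https://hub.example.org",
    "https://github.com/example-org/course-materials",
    urlpath="lab/tree/course-materials",
)
print(link)
```

The point of the design is that the image (environment) stays fixed while the link alone decides which content is pulled in.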
FYI a previous discussion #751 |
Ah I see! I would really love to see it included in the base image though. As evidence of its usefulness, I can present:
|
Since I think this project doesn't yet have clear acceptance criteria for introducing new software into the images, I'd like to make a thorough effort to establish some precedent for what is worth considering.
My conclusion
Initially I wasn't confident what I thought even though I value
Action points
References
|
I'd add a 6th point to those criteria: Impact on security: E.g. Does the package open additional ports, or add new web endpoints, that could be exploited?
Based on that I'd be reluctant to automatically push it out to every JupyterHub. |
I will take a look at this next week. I'm not familiar with |
Thanks @mathbunnyru! Let me know if I can help in any way :) |
I agree with almost all the points made by @consideRatio (and thank you for sharing these thoughts, they are extremely useful) and I do agree that impact on security is important as well. But I have some doubts about this one.
This assumes that the teacher doesn't want to or can't change the single-user image; otherwise, the teacher could have built an image derived from one of our images, installed this package on top, and then shared a link. I think "our images work perfectly in 95% of use cases" is not feasible. There will always be things/features missing and you only need this one more package. That being said, I believe our images should move forward, and when new things become widespread (or some things get deprecated), we should adapt our images accordingly. Still, drawing a line between what to include and what not is difficult. Maybe we should have some kind of vote on what to include/exclude. (and the same for the new images like Please, tell me what you think. Sorry that I haven't simply merged the PR 😆 |
@yuvipanda could you please do things suggested in |
@mathbunnyru I don't think we should ask for the PR to finalize before voting, as I don't think it gives us more information of relevance to the decision on the intent to approve a PR to add I suggest that we try to decide if |
Ok, makes sense to me, thanks. |
My take is that With I think the redirection part isn't a notable concern; that can be done without nbgitpuller by directly providing a link that does something first. I think the download is a concern though. One exploit I can think of is crafting a link that downloads a folder that will be read by other software, such as configuration folders Any package installed in the image being malicious would be trouble, so installing anything is a risk. Installing I think I'll abstain from voting favorably for now, but I think a resolution to jupyterhub/nbgitpuller#330 could make me rethink it. |
Ok, I think your point is valid and the upstream issue is relevant to solving it, nice work 👍 I'm currently not convinced people aren't able to install this package to benefit from it, so I'm not in favor of merging this package currently. |
I think @manics has expressed his opinion here. Please, correct me if your opinion changed. |
After considering jupyterhub/nbgitpuller#330 a bit more I'm even more against it. A potential mitigation (in addition to the suggestions in jupyterhub/nbgitpuller#330) is to install nbgitpuller but make its activation conditional on an environment variable. That way anyone running a JupyterHub can include something like |
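One way the environment-variable gate suggested in this comment might look is sketched below. This is an assumption-laden illustration, not something proposed verbatim in the thread: the variable name `NBGITPULLER_ENABLED` and the helper are hypothetical, and the sketch relies on Jupyter Server's `ServerApp.jpserver_extensions` setting, which maps extension module names to on/off flags.

```python
import os

# Hypothetical variable name, chosen for illustration only.
ENABLE_VAR = "NBGITPULLER_ENABLED"


def nbgitpuller_extension_flags(environ=os.environ):
    """Map the server-extension module name to an on/off flag.

    The extension stays off unless the variable is set to a truthy value,
    so an image can ship the package without activating it by default.
    """
    value = environ.get(ENABLE_VAR, "").strip().lower()
    return {"nbgitpuller": value in {"1", "true", "yes"}}


# In a jupyter_server_config.py inside the image, this could be applied as:
#   c.ServerApp.jpserver_extensions = nbgitpuller_extension_flags()
print(nbgitpuller_extension_flags({ENABLE_VAR: "true"}))
```

With something along these lines, a hub operator would opt in by setting the variable in the single-user environment, while everyone else would get the package installed but inactive. Whether this interacts cleanly with extension auto-enabling via packaged config files would need checking.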
Another option: repositories cloned by nbgitpuller most likely need other libraries than what's available in the base/minimal images. The https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-datascience-notebook image incorporates a lot of libraries, so a And one more thought (probably for a different issue?), what can we do to make it easier for people to publish their own images with extensions? For example, could we have a template repository with a GitHub workflow or action where the workflow/action takes the template parameters:
The workflow could be written to push to |
I definitely don't want to add another image just for nbgitpuller. In my opinion that's a bit too much for this one package: it not only increases the maintenance burden, but users would also have to spend more time choosing an image.
We actually already have this (in slightly different implementations, though). I also made an attempt not to use cookiecutter and just create a simple example repo (which can be forked and changed), but it didn't get enough attention, so I didn't push it. |
+1 for not adding another image. My drive comes from these two facts:
That said, I agree it's important that people not get docker images that are insecure because of features they don't use at all, and non-educational cases often don't have any need for nbgitpuller! So how about we implement the following changes in nbgitpuller:
With these, nbgitpuller links would only create directories (it already fails if the directory is not empty) based on the name of the repo being cloned. If that sounds ok, I'm happy to work on implementing these in nbgitpuller. |
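The behaviour described here (a target directory derived only from the repository name, and a refusal to reuse a non-empty directory) can be sketched as follows. This is not nbgitpuller's actual implementation; the function name `target_dir_for` and the rule rejecting dot-prefixed names (to keep links from landing in hidden configuration folders) are assumptions made for illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def target_dir_for(repo_url, base_dir):
    """Derive the clone directory from the repo name alone (hypothetical helper)."""
    # Last path component of the URL, e.g. "course-materials.git".
    name = PurePosixPath(urlparse(repo_url).path).name
    if name.endswith(".git"):
        name = name[: -len(".git")]
    # Assumption: refuse empty, "." / ".." and hidden names so a link
    # cannot target something like a configuration folder.
    if not name or name.startswith("."):
        raise ValueError(f"refusing to derive a directory from {repo_url!r}")
    target = Path(base_dir) / name
    # Mirror the "fails if the directory is not empty" behaviour.
    if target.exists() and (not target.is_dir() or any(target.iterdir())):
        raise FileExistsError(f"{target} already exists and is not empty")
    return target


# Placeholder URL and base directory, for illustration only.
print(target_dir_for("https://github.com/example-org/course-materials.git", "/srv/home"))
```

Because the link can no longer choose the path, the worst a crafted link can do under this sketch is create one new, visibly named directory under the base directory.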
I started jupyterhub/nbgitpuller#332 as a draft to show how we could disable targetPath by default. |
Thanks @yuvipanda. I am not as optimistic about 95% without external packages as you are. Still, I see your point about education, and JupyterHub + these docker images are an essential part of many educational programs. And I'm pretty sure you're right about the use of That being said, if you fix the 2 action points above, I think I would be ok with merging
I have no idea if it's easy, fits JupyterHub well, or something like that, but I think it's a nice option to have, and this way we'll be a bit more secure. But for me, the 2 action points above are a must, and this one is nice to have. |
Both the images in use at UCMerced don't have nbgitpuller installed in them, while the previous combined image did. I've been working on getting nbgitpuller into upstream jupyter docker-stacks (jupyter/docker-stacks#2000) for a while, but it looks like it'll take a bit longer. So I've temporarily created a repo (https://github.com/2i2c-org/scipy-notebook-with-nbgitpuller/) to use here. It's based on the same tag as before, but with nbgitpuller installed. We can get rid of this once nbgitpuller lands upstream. I have created https://github.com/2i2c-org/rocker-with-nbgitpuller/ to do the same for use with rocker. This too is a temporary repo. I hope to have a better longer-term solution specced out next week. This unblocks UCMerced, which is currently teaching. Ref https://2i2c.freshdesk.com/a/tickets/1089
Marked this as a draft, since we won't be merging this until |
Closing this PR, as there is no activity here |