Container for building sphinx-based documentation #7
Comments
I think this is a great idea and would really help streamline adding new
documentation.
On Fri, Oct 23, 2020 at 11:24 AM Bill Sacks wrote:
This is not at all high priority, but I wanted to mention this as
something to think about. It just occurred to me that it could be very
helpful if we had a Docker container with the software prerequisites needed
for building the sphinx-based documentation that we use throughout CESM. At
least in CTSM, we are asking scientists both inside NCAR and externally to
build the documentation themselves, and getting all the tools installed can
be a major barrier for them.
The installation requirements are documented here
https://github.com/ESCOMP/CTSM/wiki/Directions-for-editing-CLM-documentation-on-github-and-sphinx.
Essential prerequisites are sphinx, our sphinx theme, latexmk and maybe
rst2pdf, and git-lfs. I don't understand containers well enough to know if
there would be issues with any of this (I'm especially thinking about
whether there are issues with making git-lfs in a container play nicely
with your system's git and any git repositories that exist outside the
container), but we could hopefully at least make it much easier to get an
environment set up to build the documentation, even if we can't get 100% of
the way there.
Another benefit of this is that we could ensure that we all use the same
version of sphinx, which would avoid the annoying thing that happens now,
where building with a different sphinx version leads to changes in all of
the generated html pages.
@briandobbins <https://github.com/briandobbins> @mvertens
<https://github.com/mvertens> @mnlevy1981 <https://github.com/mnlevy1981>
@negin513 <https://github.com/negin513> @wwieder
<https://github.com/wwieder>
--
Mariana Vertenstein
CESM Software Engineering Group Head
National Center for Atmospheric Research
Boulder, Colorado
Office 303-497-1349
|
I also have no idea what's involved here, but the benefits seem significant!
Will
|
Hi all,
Installing these is pretty trivial; I just did a quick trial and it already works. It does add just under 400MB of packages (fonts, largely) to the image, so you could have separate containers: one for 'using' CESM (no documentation tools) and another for 'development' (including those tools). Frankly, though, I think a simple one-stop-shop is better, especially given the community nature of the model. We could even add a tutorial on adding documentation if people feel it helps.
In my opinion, the primary question in terms of installation is whether we want this in the 'base' image (which all the ESCOMP images build off of; it includes MPI, HDF, NetCDF, etc.) or in the CESM image. If other ESCOMP models use Sphinx, we probably want it in the base. If not, we probably want it in a CESM image. I'd like to get Ben's ( @bekozi ) take on this. What does ESMF do for documentation?
As for git issues, I don't expect any, unless we get severe version mismatches between the containerized and native versions of git and people swap back and forth between the two. I'll set up a test container and walk through the documentation instructions myself as a quick test. Maybe one or two of us can get together next week and try it out too? |
Cool, thanks @briandobbins ! Good point that a one-stop-shop may be better here.
For CTSM purposes (and I'm guessing similarly for other components), it doesn't really make sense to get the whole of CESM, but I also guess there isn't a huge downside to that if it is easier to maintain or use that way. Many of the ESCOMP models do use sphinx, so my opinion is that it would make sense to put this in the base image, but I don't know enough about this to have a very good sense. I'm curious if there's a third option of having an image that extends the base ESCOMP image and adds this documentation stuff, and then the CESM image could extend that further (like multiple levels of inheritance)? But I could also see that maybe being too complex.
I should note that I don't think most components need latexmk and rst2pdf: CTSM needs these because of the way we handle equations and also because we generate pdf as well as html documentation.
I'd be happy to spend some time with you next week trying it out, if you don't mind holding my hand through the process. |
Just sent you an invite for Monday, Bill. I just walked through the
process, and it 'succeeded' but there were some warnings. Let's establish
if this will indeed work well, and if so, then we'll decide how best to
include the packages then. (An image that extends the base is definitely
possible, but probably not necessary - I'm leaning towards the base image,
especially if other ESCOMP models use this. We try to put all the 'common'
stuff in there.)
Thanks!
|
ESMPy uses sphinx. The base image recipe is here: https://github.com/ESCOMP/ESCOMP-Containers/blob/master/ESMF/doc/esmpy-doc-base/Dockerfile The actual documentation is built in this recipe: https://github.com/ESCOMP/ESCOMP-Containers/blob/master/ESMF/doc/esmpy-doc/Dockerfile It's a little strange because the ESMPy documentation requires an ESMF build. This could be hacked by just providing an For reference, the ESMF CI builds the above containers and publishes the docs:
This approach allows for Docker layer caching in the base build. The ESMF doc recipes, which do not use sphinx, are located in the same folder. Ping @rsdunlapiv @him-28 |
A brief update -- I've added the Sphinx (and related) utilities to the base-centos8 image, used to build CESM, CESM-Lab and ESMF. The current ESMPy images seem to use an Ubuntu base instead, so they're unaffected. This seemed like a good approach if we expect future ESCOMP containers will also use Sphinx, since it ensures the 'base' image has all the common libraries / tools (now, MPI, compilers, NetCDF, HDF5, PNetCDF, Sphinx, etc), and the 'application' images are simpler. That said, I did also build an 'escomp/sphinx' image that just has the tools needed for documentation, and thus is a fair bit smaller. I think it'll take some feedback from potential users to see what the best approach is - eg, if people are using the CESM container (for example) to run / develop AND do documentation, maybe the 'sphinx' container is superfluous. But if they're using larger systems like Cheyenne for runs, but doing documentation on their laptops, then perhaps the Sphinx one is useful. As such, I'm going to leave this issue open for now, but did want to provide that update. I'll revisit this in the coming weeks. The CTSM wiki page on using the containers for documentation is here, for anyone interested: |
Thanks so much @briandobbins ! I'm a little confused about this (maybe just because I'm a noob when it comes to the use of containers): If sphinx is part of the base image, then what's the need for the 'escomp/sphinx' image? Could someone just get 'escomp/base' and get everything they need? Or is there some reason why that doesn't work (or isn't a preferred way of working)? If just getting 'escomp/base' doesn't make sense, then I also don't have a great sense right now of the best way to go here – suggesting escomp/sphinx or the one-stop-shopping escomp/cesm. I think the relevant questions are: (1) how hard it is to maintain the separate 'escomp/sphinx' image, and (2) what fraction of people who want the sphinx image are also going to want to run CESM on the same machine where they are building the documentation. I don't have any sense of either of these. I'm happy to defer to your intuition, or to leave things unsettled for now and return to this question later, once we have a better sense of what users are doing. |
Good question - the basic difference is that, broadly speaking, it's like asking for a hammer. I can give you just the hammer (escomp/sphinx), or I can give you a toolbox with a hammer, a screwdriver, and some nails (escomp/cesm.. or maybe escomp/base-centos8, but more on that in a minute). If you KNOW all you need is the hammer, maybe that's a better solution, since it's smaller. But if you might need other tools, the 'toolbox' version is better. In this case, the 'base' and 'cesm' images contain things people don't need for documentation, like GNU compilers, MPI, NetCDF, HDF5, PNetCDF, etc. Does this hurt in any way? Not really, it just adds some size to the image, but it goes against the principle of simple, focused tools. As for escomp/base-centos8 (there is no straight 'base', in case we later do bases on other distros... but maybe that should be reassessed too), the (minor) issue there is we aren't yet creating the 'user' account, so the mapping of directories upon running is slightly different. I'd left that off originally because inside the CESM container (for example), we make the 'user' account a member of the 'ncar' group, and figured other applications might do their own. But now, I'm thinking maybe we make an 'escomp' group, add the user during the 'base' install, and make the 'base' image into escomp/base, and tag versions (eg, 'escomp/base:centos8'). This would give us more consistency ('escomp' for the user's group, paths are always /home/user), a simpler naming convention ('escomp/base') for the base container, and eliminate the need to maintain an extra container (sphinx). It does still give you the whole 'toolbox', at a cost of a few hundred extra megabytes, but that seems relatively minor. Ultimately I expect most users will download the application containers, as opposed to the 'base', but at this point it's a moot point since both are maintained anyway. Thanks for the feedback - I hope that made sense. 
I'll start updating things, and will send you fixes to the documentation soon, too. |
I'm not following all of the details, but it sounds like you have a good path forward. I had been confused, I guess: I was thinking that the base image was the minimal set, and then everything else just added to that. I'm still somewhat confused, but since it sounds like you have a path forward that you're happy with, I'm happy to go along with that. (I really don't know enough to have opinions of my own at this point.) |
Basically, we had a 'tree' originally - the 'base' image, on which CESM, CESM-Lab and ESMF containers were built. The 'Sphinx' container was a whole new tree, with no relation to the base image, since the 'minimal set' for building the documentation doesn't include the compilers, NetCDF libraries, etc. I think doing away with that, and just including things in the base (your original understanding, I think), is a better path forward after seeing your feedback. So from now on, the 'base' image will indeed include Sphinx, and we'll have a single hierarchy again. Easier to maintain and understand, at the minor expense of a slightly larger download. |
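The "two trees versus one hierarchy" point can be made concrete with a small sketch. This is only an illustration: the image names are taken from the thread where they appear ('escomp/base-centos8', 'escomp/sphinx', 'escomp/cesm', and the proposed 'escomp/base'), but the exact set of images and their parent relationships are assumptions, not a description of the published containers.

```python
from typing import Dict, List, Optional

# Maps each image to the image it is built on (None marks the root of a tree).
# Names and relationships are illustrative assumptions based on the discussion.
BEFORE: Dict[str, Optional[str]] = {
    "escomp/base-centos8": None,
    "escomp/cesm": "escomp/base-centos8",
    "escomp/sphinx": None,  # a separate tree, unrelated to the base image
}

AFTER: Dict[str, Optional[str]] = {
    "escomp/base": None,  # now includes Sphinx and related tools
    "escomp/cesm": "escomp/base",
}


def roots(parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the images that are not built on any other image."""
    return sorted(image for image, parent in parents.items() if parent is None)


def lineage(image: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from an image up to the root it is built on."""
    chain = [image]
    while parents[chain[-1]] is not None:
        chain.append(parents[chain[-1]])
    return chain


if __name__ == "__main__":
    print("roots before:", roots(BEFORE))
    print("roots after: ", roots(AFTER))
    print("cesm lineage:", lineage("escomp/cesm", AFTER))
```

With a single root, every application image picks up the documentation tools through its lineage, which is the maintenance simplification described above.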
Ah, got it, now I understand. Yes, your proposed path forward is what I had understood all along, and this makes sense to me. |
@briandobbins The cime and cesm documentation have an extra requirement:
pip install sphinxcontrib-programoutput
Can you please add that to the base image when you get a chance? |
No problem, I'll have this set shortly, but I'm working on one other fix
before I push the new images.
If you want to test it in your running image, and haven't already, you can
just do:
sudo /usr/bin/pip3 install sphinxcontrib-programoutput
... and that should do it.
- Brian
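One way to confirm that a running image (or a native environment) has the documentation prerequisites mentioned in this thread is a short check script. The sketch below is an assumption about how such a check could look, not part of the ESCOMP tooling: it uses only the Python standard library, the package and tool lists are taken from the names mentioned in this issue, and the sphinx theme is left out because its package name is not given here.

```python
import shutil
from importlib import metadata
from typing import Dict, Optional

# Names taken from the thread; adjust to what a given component actually needs.
PYTHON_PACKAGES = ["sphinx", "sphinxcontrib-programoutput", "rst2pdf"]
COMMAND_LINE_TOOLS = ["latexmk", "git-lfs"]


def check_doc_prerequisites() -> Dict[str, Optional[str]]:
    """Map each prerequisite to its version or path, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for package in PYTHON_PACKAGES:
        try:
            # Installed version string of the Python package
            found[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            found[package] = None
    for tool in COMMAND_LINE_TOOLS:
        # Full path of the executable if it is on PATH, else None
        found[tool] = shutil.which(tool)
    return found


if __name__ == "__main__":
    for name, value in check_doc_prerequisites().items():
        print(f"{name:32s} {value if value is not None else 'MISSING'}")
```

Printing the sphinx version this way also makes it easy to spot the version mismatches that lead to spurious changes in the generated html.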
|
Thanks @briandobbins . No rush. I ran into another issue, though it may not impact many people beyond me: I use git worktrees heavily in my development workflow. This becomes a problem in the CTSM documentation build, because this build invokes a git lfs pull. In a git worktree, the .git directory is replaced with a text file giving the absolute path to the parent git repository, e.g.:
So when trying to execute a git command from within the Docker image, I get a message like this:
This is because in Docker-land, this path doesn't exist. From browsing some StackOverflow suggestions, I came up with the solution of doing this from inside my Docker shell:
This seems to work, but I wanted to check with you:
By the way, I am going down the path of building the docker run command into a python script to do the build, rather than suggesting use of an interactive docker terminal session. I am able to leverage a python tool I had already built to wrap the documentation build process, and I think this will make things easier for users. And issues like the above can be handled programmatically rather than needing to run those commands manually all the time. |
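A wrapper of this kind might look roughly like the sketch below. It is not the script described in the comment above, and it uses a different technique from the mkdir and ln workaround: it detects a git worktree by reading the `.git` file and then bind-mounts both the checkout and the parent repository's `.git` directory at their host paths, so the absolute path stored in the worktree still resolves inside the container. The image name, the `make html` build command, and all function names are assumptions for illustration; only the `docker run` options `--rm`, `-v`, and `-w` are used.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def worktree_gitdir(checkout: Path) -> Optional[Path]:
    """Return the gitdir recorded in a worktree's .git file, or None for a normal clone."""
    dot_git = checkout / ".git"
    if not dot_git.is_file():
        return None  # .git is a directory (ordinary clone) or does not exist
    text = dot_git.read_text().strip()
    if not text.startswith("gitdir:"):
        return None
    return Path(text[len("gitdir:"):].strip())


def common_git_dir(gitdir: Path) -> Path:
    """Locate the parent repository's .git directory for a worktree gitdir."""
    commondir = gitdir / "commondir"
    if commondir.is_file():
        # git stores this path relative to the worktree gitdir
        return (gitdir / commondir.read_text().strip()).resolve()
    # Fallback: <repo>/.git/worktrees/<name> -> <repo>/.git
    return gitdir.parent.parent


def docker_build_command(checkout: Path, image: str, build_cmd: List[str]) -> List[str]:
    """Assemble a docker run command that mounts the checkout at its host path."""
    checkout = checkout.resolve()
    cmd = ["docker", "run", "--rm",
           "-v", f"{checkout}:{checkout}",
           "-w", str(checkout)]
    gitdir = worktree_gitdir(checkout)
    if gitdir is not None:
        # Worktree: also expose the parent repository's .git at the same path
        common = common_git_dir(gitdir)
        cmd += ["-v", f"{common}:{common}"]
    return cmd + [image] + build_cmd


if __name__ == "__main__":
    # Hypothetical image name and build command; prints the command without running it
    print(shlex.join(docker_build_command(Path.cwd(), "escomp/base", ["make", "html"])))
```

Mounting at the host path departs from the convention of always mapping the working directory to /home/user, so whether it is acceptable depends on how much that convention matters for a given image.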
Ah, this is a good question. Before looking at potential ways to make it easier, I'd love to get a sense from you of how common that use case is. In short, it seems like there are two approaches:
If you're unsure of the container ID, you can get it while it's running via 'docker ps'. Note that the above line would save the modified image to a new Docker image called 'cesm', as opposed to the 'escomp/cesm-2.2' one you're running, so in the future you'd want to just run your local one... otherwise those changes won't persist! I definitely don't see that causing any issues, no. Conceivably, if you had symlinks or references outside that 'ctsm' directory, that would be a problem, but I don't think that's likely?
In this case, instead of mapping whatever directory you provide to /home/user, we allow you to specify what to map it to - in this case, /Users/sacks/ctsm. I feel like this is more dangerous because it removes the commonality of always knowing what a container user's home directory is (/home/user), and may run into issues if someone specifies something weird, like /var. Can we do it? Probably, but unless this is a very common use case, I think the slight customization of the Docker environment by the user is better. We can even post documentation on how to do this for expert users. Chances are there are other approaches too, but that's the best off the top of my head. When I get some time, I'll try to duplicate this and look into ways of dealing with it. |
Thanks a lot for your thoughts @briandobbins . I think there's no need for you to spend more time on this git worktree-related question for now. My takeaway is that it's safe to put the mkdir & ln commands in the docker run command in my python wrapper script for now, and we can revisit this if it's causing more general issues down the line. |
@briandobbins - Okay, I have added capability for using docker in ESMCI/doc-builder#3 . No need to look through that unless you're interested, but I thought I'd point out a few key elements in case it helps for future tooling:
|