Campaign to get people Publishing Wheels #25

Open
dstufft opened this issue Nov 3, 2013 · 133 comments

@dstufft
Member

dstufft commented Nov 3, 2013

How can we get more people to publish Wheels, especially for Windows? Christoph Gohlke publishes Windows installers, but that won't work for Wheels because he won't have the rights to upload them.

Perhaps the Build farm I've wanted to do can be used here?

@alex
Member

alex commented Jan 2, 2014

http://pythonwheels.com/ is an attempt at this. Now that pip 1.5 installs them by default, this should be easier.

I think one part of this would be to make the setup.py ... package uploading process more streamlined and do the right thing.

@kura

kura commented Jan 5, 2014

I was thinking: surely it makes sense to have a simple "build" command that makes wheels, eggs and an sdist by default, rather than having to specify each one separately?

Am I wrong in thinking that you still need to install another package just to create wheels?

@alex
Member

alex commented Jan 5, 2014

Yes, you need to pip install wheel before setup.py bdist_wheel works. Also, you really shouldn't be making eggs ;)
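As a minimal sketch of that two-step process, assuming a setuptools project with a `setup.py` in the current directory, a script could check whether `wheel` is importable and install it first if needed, before building both an sdist and a wheel. The helper names `wheel_is_installed` and `build_commands` are hypothetical.

```python
import importlib.util
import sys


def wheel_is_installed() -> bool:
    # find_spec returns None when the package cannot be imported.
    return importlib.util.find_spec("wheel") is not None


def build_commands(have_wheel: bool) -> list:
    """Return the commands needed to build an sdist and a wheel.

    Hypothetical helper: `setup.py bdist_wheel` only works once the
    `wheel` package is installed, so install it first if it is missing.
    """
    commands = []
    if not have_wheel:
        commands.append([sys.executable, "-m", "pip", "install", "wheel"])
    # One setup.py invocation can run several commands in sequence.
    commands.append([sys.executable, "setup.py", "sdist", "bdist_wheel"])
    return commands
```

Running each entry of `build_commands(wheel_is_installed())` with `subprocess.run(cmd, check=True)` would then perform the build.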

@hickford
Contributor

As of 2015, Christoph Gohlke publishes wheels rather than MSI installers: http://www.lfd.uci.edu/~gohlke/pythonlibs/

@brainwane
Contributor

@scopatz is this something you could comment on?

@scopatz

scopatz commented Jul 3, 2019

Thanks for roping me into this issue @brainwane.

I am speaking on behalf of conda-forge here. But basically, we'd love it if conda-forge could be used to build & publish wheels. To that end, it might be more useful to think of conda-forge as just "The Forge."

We have the infrastructure for building binary packages across Linux, OS X, Windows, ARM, and Power8 already. We have a tool called conda-smithy that we develop and maintain that helps us keep all of the packages / recipes / CIs configured and up-to-date.

I see two major hurdles to building and deploying wheels from conda-forge. These could be worked on in parallel.

Building: conda-smithy would need to be updated so that packages that are configured to do so would generate the appropriate CI scripts (from Jinja templates) to build wheels. This would be CI-provider and architecture specific. Probably the easiest place to start is building from manylinux on Azure. We would probably need at least one configuration variable in conda-forge.yml that actively enables wheel building (enable_wheels: true? enable_wheels: {linux-64: true}?). Conda-smithy reads this file when it rerenders a feedstock (a git repo with a specific structure for building packages). There are probably some subtleties and difficulties here in working out which compiler toolchains should be used on different platforms (there is really only the manylinux standard for Linux). But this is the basic idea.
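To make the two candidate spellings of the option concrete, here is a minimal sketch of how a rerender step might interpret an `enable_wheels` value once conda-forge.yml has been parsed (for example with `yaml.safe_load`). This is not conda-smithy's code: the option name is the one floated above, and the function name and platform list are hypothetical.

```python
# Sketch only: `enable_wheels` is the option floated in the comment, and the
# platform names below are an illustrative subset, not conda-forge's full list.
KNOWN_PLATFORMS = ("linux-64", "osx-64", "win-64")


def wheel_platforms(enable_wheels, platforms=KNOWN_PLATFORMS) -> set:
    """Return the platforms for which wheel-building CI scripts would be rendered.

    `enable_wheels` is the already-parsed YAML value: a bool, or a
    mapping such as {"linux-64": True}.
    """
    if enable_wheels is True:
        return set(platforms)
    if not enable_wheels:  # False, None, or an empty mapping
        return set()
    if isinstance(enable_wheels, dict):
        unknown = set(enable_wheels) - set(platforms)
        if unknown:
            raise ValueError(f"unknown platforms: {sorted(unknown)}")
        return {name for name, on in enable_wheels.items() if on}
    raise TypeError(f"unsupported enable_wheels value: {enable_wheels!r}")
```

Accepting both a bool and a per-platform mapping lets a feedstock start with, say, Linux only and widen coverage later without changing the option's name.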

The challenge with building is that most of the conda-forge people are not used to building wheels. I am happy to help work on the conda-forge infrastructure side, but I think we need someone who is an expert on the wheels side who is also willing to jump in and help scale this out with me.

Deploying: Once we can build wheels, we need a place to put them. Nominally, this would be PyPI. But we need to be able to do this from a CI service, and we are happy to use an authentication token for that. There isn't much that I see conda-forge can really do about this (which has prevented us from working on this issue previously). However, I think that PyPI is working on this.
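A minimal sketch of token-based uploading from CI, assuming PyPI API tokens (still being worked on when this comment was written) and the `twine` upload tool, which reads its credentials from the `TWINE_USERNAME` and `TWINE_PASSWORD` environment variables. The helper name `twine_env` is hypothetical.

```python
import os


def twine_env(token: str, base=None) -> dict:
    """Environment for a non-interactive `twine upload` from CI.

    PyPI API tokens are sent with the literal username "__token__" and
    the token itself (prefixed "pypi-") as the password.
    """
    if not token.startswith("pypi-"):
        raise ValueError("expected a PyPI API token starting with 'pypi-'")
    env = dict(os.environ if base is None else base)
    env["TWINE_USERNAME"] = "__token__"
    env["TWINE_PASSWORD"] = token
    return env
```

A CI job could then run `subprocess.run(["twine", "upload", *glob.glob("dist/*")], env=twine_env(os.environ["PYPI_TOKEN"]), check=True)`, where `PYPI_TOKEN` is a hypothetical name for the secret stored in the CI service.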

I am super excited about this; the fundamental premise of conda-forge is to be open source, cross-platform, community build infrastructure. If there are other folks out there who are enthusiastic about getting this working, please reach out to me or put me in touch!

@brainwane
Contributor

Thanks @scopatz! @waveform80 and @bennuttall would you like to speak from the piwheels perspective? And @jwodder, from what you have learned via Wheelodex? (Found out about you via this thread.)

@astrojuanlu

astrojuanlu commented Jul 9, 2019

Perhaps the work that @matthew-brett did at MacPython to build wheels of key packages of the Scientific Python stack will be helpful as well. Also, I discovered cibuildwheel by @joerick recently. (Edit: wrong Matthew Brett)

@bennuttall

bennuttall commented Jul 9, 2019

For the piwheels project, we build ARM platform wheels for the Raspberry Pi, built natively on Raspberry Pi hardware. On piwheels.org we don't try to bundle dependencies à la manylinux2010; instead, we target what's stable in the distro (Raspbian) and make no promises elsewhere. The project source itself is open, so others could run their own repos targeting other platforms.

I don't recommend that maintainers upload ARM wheels; instead, let us build them, knowing they work on the Pi.

We also attempt to show library dependencies on our project pages (e.g. https://www.piwheels.org/project/numpy/) rather than make people work them out themselves (e.g. https://blog.piwheels.org/how-to-work-out-the-missing-dependencies-for-a-python-package/).

@mingwandroid

Hi @scopatz, what do you propose to do about shared libraries that have no natural place in a wheel (to me, most shared libraries have no natural place in a wheel).

We cannot stick our heads in the sand on that. Using shared libraries heavily in conda is one of our most compelling advantages, and because we use the same ones across languages, putting those shared libraries in a wheel would be a bad thing to do.

I'm not coming with a solution here. I wish I were, I really do.

@matthew-brett

It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.

I can see the problem is more urgent for Conda, because y'all are building a multi-language software distribution.

A few years ago, @njsmith wrote a spec for pip-installable libraries: pypa/wheel-builders#2

It isn't merged, and it looks like 'the current setup works for me' has meant that no-one thus far has had the time or energy to work further on that. I suspect something on that line is the proper solution, if we could muster the time.

@matthew-brett

By the way - @scopatz - I'm happy to help integrating the wheel builds into conda forge - but I'm crazy busy these days, so I won't have much time for heavy lifting.

@mingwandroid

It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.

Well, the software needs to work of course and I'm not being facetious!

We end up discussing where the line is between the thing itself and the system libraries that support it, and that's not clear cut. Take xgboost as an example. It has a C/C++ library and bindings for Python and R. xgboost itself builds static libs for each, so they sidestepped that issue, while we're much more efficient (in many dimensions). Now libxgboost is clearly a part of the xgboost stack, but what about ncurses? Is it system or not? In conda-forge, we provide it, and in all honesty that line is organic and something we move as and when we find we need to.

@pradyunsg
Member

@brainwane @scopatz if there's a better title for this issue today, could you change it, or comment so that someone else who can make the change does so?

@snakescott

I can offer mild packaging familiarity, reasonable Python / CI / cloud experience, and say 10-20 hours a week for the next month if it would be helpful. I think I would be a good fit if there's a rough consensus on direction and pypa/conda experts are available for consulting, but the work is bottlenecked on elbow grease.

cc @brettcannon @dstufft @asottile

@mikofski

@matthew-brett I thought Carl Kleffner did something similar: a pip-installable toolchain with OpenBLAS for NumPy, though my memory might be foggy.

@matthew-brett

@mikofski - right - Carl was working on Mingwpy, which was (still is) a pip-installable gcc compiler chain to build Python extensions that link against the Python.org Microsoft Visual C++ runtime library.

Work has stalled on that, for a variety of reasons, although I still think it would be enormously useful. I can go into more details - or - @carlkl - do you want to give an update here?

@mattip - because we were discussing this a couple of weeks ago.

@teoliphant

It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS.

I can see the problem is more urgent for Conda, because y'all are building a multi-language software distribution.

A few years ago, @njsmith wrote a spec for pip-installable libraries: pypa/wheel-builders#2

It isn't merged, and it looks like 'the current setup works for me' has meant that no-one thus far has had the time or energy to work further on that. I suspect something on that line is the proper solution, if we could muster the time.

I don't know if we have a clear answer that pip should be used as a general-purpose packaging solution. My view, which seems to be shared by several others in the recent Discourse discussion about it, is that pip should not try to "reinvent the wheel" or replace general-purpose packaging solutions (like conda, yum, apt-get, nix, brew, spack, etc.). pip has a clear use as a packaging tool for developers and "self-integrators".

For that use case, statically linking dependencies into a wheel (vendoring native dependencies) can be a stop-gap measure, but it becomes very difficult for distributors, as evidenced by the pytorch, rapids, arrow, and other communities. It is definitely not ideal, and in fact a growing problem with promoting the use of wheels for all Python users.

Using pip to package native libraries is conceivably possible, but a bigger challenge than it seems at first. It is hard to understand the motivation for this considerable work when this problem is already solved by several other open-source and more general-purpose packaging systems.

A better approach in my view is to enable native-library requirements to be satisfied by external packaging systems. In this way, pip can allow other package managers to install native requirements and only install wheels with native requirements if they are already present.
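A minimal sketch of the check this approach implies: before installing a wheel, confirm that its native requirements are already provided by some other package manager. Wheels have no standard field for declaring native-library requirements, so the `required` list here is hypothetical metadata, and the function names are illustrative; the lookup uses the standard library's `ctypes.util.find_library`.

```python
import ctypes.util


def missing_native_libs(required, find=ctypes.util.find_library):
    """Return the required native libraries that cannot be located.

    `required` holds bare library names such as "ssl" or "z";
    ctypes.util.find_library tries to locate them much as the linker would
    and returns None when a library is not found.
    """
    return [name for name in required if find(name) is None]


def can_install_wheel(required, find=ctypes.util.find_library) -> bool:
    # Policy sketch: install only if every native requirement is already present.
    return not missing_native_libs(required, find)
```

The `find` parameter is there so an installer (or a test) can swap in a different lookup, for example one that queries an external package manager instead of the system linker paths.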

Non-developer end users who use Python integrated with many other libraries (such as the PyData and SciPy users) should also be encouraged to use their distribution package manager to get their software. These distributions (such as conda-forge) already robustly satisfy the need for one-command installation. This is a better user experience than encouraging these particular users to "pip install".

In sum: conda-forge infrastructure producing wheels is a good idea, conda-build recipes producing wheels that allow for conda-packages to satisfy native-library dependencies is an even better idea.

@pfmoore
Member

pfmoore commented Jul 14, 2019

@teoliphant While theoretically a reasonable idea, this ignores the fact that a significant number of users are asking for pip-installable versions of these packages. Ignoring those users, or suggesting that they should "just" switch to another packaging solution, is dismissing a genuine use case without sufficient investigation.

I know from personal experience that there are people who do need such packages but who can't or won't switch to Conda (for example). And on Windows there is no OS-level distribution package manager. How do we serve such users?

@msarahan

msarahan commented Jul 14, 2019 via email

@mingwandroid

mingwandroid commented Jul 15, 2019 via email

@matthew-brett

matthew-brett commented Jul 15, 2019 via email

@astrojuanlu

What difficulties are pytorch, rapids, arrow having? I'm happy to advise.

For arrow, I think it's best summarized here:

https://twitter.com/wesmckinn/status/1149319821273784323

  • many C++ dependencies
  • several bundled shared libraries
  • some libraries statically linked
  • privately namespaced, bundled version of Boost

@matthew-brett

@wesm - I'm happy to help with this - let me know if I can. Did you already contact the scikit-build folks? I have the impression they are best for C++ chains. (Sorry, I can't reply on Twitter, have no account).

@wesm

wesm commented Jul 15, 2019

I believe we have one of the most complex package builds in the whole Python ecosystem. I think TensorFlow or PyTorch might have us beat, but it's close (it's obviously not a competition =D).

I haven't contacted the scikit-build folks yet; if that could help us simplify our Python build, I'm quite interested. I'm personally all out of budget for this after I lost a third or more of my June to build and package-related issues, so maybe someone else can look into it.

cc @pitrou @xhochy @kszucs @nealrichardson

@matthew-brett

Thanks - that sounds very tiring. I bet we can use this as a stimulus to improve the tooling. Would you mind making an issue in some sensible place in the Arrow repositories for us to continue the discussion?

@pitrou

pitrou commented Jul 15, 2019

I'll echo what @wesm said here. I spent a lot of time as well trying to cope with wheel packaging issues on PyArrow. I'd be much happier if people agreed to settle on conda for distribution and installation of compiled Python packages.

(disclaimer: I used to work for Anaconda but don't anymore. Also I own a very small amount of company shares)

@matthew-brett

@pitrou - I hear the hope, but I really doubt that's going to happen in the short term. So I still think the best way, for now, is for those of us with some interest and time, to try and improve the wheel building machinery to the point where they are a minimal drain on your development resources.

@wesm

wesm commented Jul 15, 2019

Just to drop some statistics to indicate the seriousness of this problem, our download numbers are growing to the same magnitude as NumPy and pandas:

$ pypistats overall pyarrow
|    category     | percent | downloads  |
|-----------------|--------:|-----------:|
| with_mirrors    |  50.18% |  9,700,974 |
| without_mirrors |  49.82% |  9,630,781 |
| Total           |         | 19,331,755 |

$ pypistats overall numpy
|    category     | percent |  downloads  |
|-----------------|--------:|------------:|
| with_mirrors    |  50.15% | 114,356,740 |
| without_mirrors |  49.85% | 113,661,813 |
| Total           |         | 228,018,553 |

$ pypistats overall pandas
|    category     | percent |  downloads  |
|-----------------|--------:|------------:|
| with_mirrors    |  50.12% |  67,694,077 |
| without_mirrors |  49.88% |  67,358,042 |
| Total           |         | 135,052,119 |

One of the reasons for our complex build environment is that we're solving problems that are very difficult or impossible to solve without a deep dependency stack. So there is no end in sight to our suffering with the current state of wheels.

@matthew-brett

@isuruf - a tip of the hat for your enterprise sir! How easy would this be to generalize?

@scopatz

scopatz commented Aug 9, 2019

I have released the first version of conda-press (v0.0.1) https://github.com/regro/conda-press. It would be awesome if you all could play around with it & help improve it!

Next steps are to:

  • Add to conda-smithy (off by default) so that we can build these wheels on conda-forge along with the original package
  • Figure out how to deploy to PyPI (probably using the new tokens)

@pfmoore
Member

pfmoore commented Aug 9, 2019

I have released the first version of conda-press (v0.0.1) https://github.com/regro/conda-press

Excellent news! Are the resulting wheels compatible with "normal" wheels from PyPI? (Specifically on Windows - I'm assuming that the Linux ones won't be manylinux-compatible...)

To put it another way, the README talks about pip install into a conda managed environment, but doesn't say anything about pip installing wheels built with conda-press into a non-conda Python installation. Is that a use case you're intending to support?

@scopatz

scopatz commented Aug 9, 2019

@pfmoore - Great questions! These are intended as general purpose wheels. I'll update the README to make that more clear.

The Windows wheels should be compatible with non-conda installations because conda-forge goes to great lengths to be compatible with externally built wheels. Also, on the Linux side, they still should be compatible, even though they don't adhere to the manylinux specs (yet). This is because the wheels built here should still be ABI compatible. If there are compatibility issues, we should look into fixing them.
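One quick way to see whether a given Linux wheel claims manylinux compatibility is to look at the platform tag in its filename, which follows the PEP 427 layout `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. This is a minimal sketch with hypothetical function names; it only inspects what the filename claims, whereas `auditwheel show` inspects the binaries themselves.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    # Dashes inside names and versions are escaped to "_", so "-" only separates fields.
    parts = filename[:-4].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename}")
    name, version = parts[0], parts[1]
    build = parts[2] if len(parts) == 6 else None
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": name, "version": version, "build": build,
        "python": python_tag, "abi": abi_tag, "platform": platform_tag,
    }


def is_manylinux(filename: str) -> bool:
    # A platform field may hold several tags joined by "." (a compressed tag set).
    platforms = parse_wheel_filename(filename)["platform"].split(".")
    return any(tag.startswith("manylinux") for tag in platforms)
```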

@pfmoore
Member

pfmoore commented Aug 9, 2019

Cool - that's really great to know.

@isuruf

isuruf commented Aug 9, 2019

@scopatz, how are the shared libraries handled?

@scopatz

scopatz commented Aug 9, 2019

@isuruf - Depends what you mean by handled and where they live. Normal ones live in site-packages/lib/ dir. Extension modules live in their normal place. CLIs live in site-packages/bin/ and get proxy scripts that set $LD_LIBRARY_PATH as needed. Everything gets relinked to the proper dir.
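A minimal sketch of what such a proxy script could do on Linux, assuming the layout described here (shared libraries in `site-packages/lib/`, executables in `site-packages/bin/`). conda-press's actual generated scripts may differ; `proxy_env` and `run_proxied` are hypothetical names.

```python
import os


def proxy_env(site_packages: str, environ=None) -> dict:
    """Copy of the environment with site-packages/lib prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if environ is None else environ)
    lib_dir = os.path.join(site_packages, "lib")
    existing = env.get("LD_LIBRARY_PATH")
    # Put the wheel-provided libraries first so the dynamic loader finds them first.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env


def run_proxied(site_packages: str, exe_name: str, argv: list) -> None:
    """Replace the current process with site-packages/bin/<exe_name>."""
    exe = os.path.join(site_packages, "bin", exe_name)
    os.execve(exe, [exe, *argv], proxy_env(site_packages))
```

Using `os.execve` means the proxy does not linger as a parent process; the real CLI takes over with the adjusted environment.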

@isuruf

isuruf commented Aug 9, 2019

What happens if A and B were compiled with libfoo=1.x and they talk to each other using libfoo objects and you update A compiled with libfoo=2.x and libfoo is ABI incompatible with major version?

Note that pip dependency resolution is different than conda's and doesn't ensure a consistent environment.

@scopatz

scopatz commented Aug 9, 2019

It depends on how you build the wheel with conda-press. If you are building "fat" wheels (with all non-python deps included in a single wheel), then the second package installed would clobber the first.
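Since pip does not warn when one wheel overwrites another's files, a cautious user could compare two fat wheels before installing the second. This is a minimal sketch with hypothetical function names: it treats archive paths as install paths and ignores `.data` directories, which get relocated at install time.

```python
import io
import zipfile


def wheel_files(wheel) -> set:
    """Paths a wheel would write, excluding directories and its .dist-info metadata."""
    with zipfile.ZipFile(wheel) as zf:
        return {
            name for name in zf.namelist()
            if not name.endswith("/") and ".dist-info/" not in name
        }


def clobbered_paths(wheel_a, wheel_b) -> set:
    """Paths that installing wheel_b after wheel_a would silently overwrite."""
    return wheel_files(wheel_a) & wheel_files(wheel_b)


def demo_wheel(names) -> io.BytesIO:
    """Build a tiny in-memory zip with empty files, for trying out the check."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf
```

For example, two fat wheels that both ship `lib/libfoo.so.1` report exactly that path as clobbered.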

If "thin" wheels were built, then libfoo would get its own wheel and be managed by pip and the version pins would kick in saying packages are incompatible.
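To make the thin-wheel case concrete, here is a minimal sketch of the consistency check that version pins are meant to enforce, using the scenario from the question above: A and B built against libfoo 1.x, then A rebuilt against 2.x. It assumes libfoo only breaks ABI across major versions, and the function names are hypothetical.

```python
def major(version: str) -> int:
    """Major component of a dotted version string, e.g. "2.0.3" -> 2."""
    return int(version.split(".")[0])


def pin_conflicts(built_against: dict, installed_libfoo: str) -> list:
    """Packages whose libfoo major-version pin disagrees with the installed libfoo.

    `built_against` maps a package name to the libfoo major version it was
    compiled with.
    """
    have = major(installed_libfoo)
    return sorted(pkg for pkg, want in built_against.items() if want != have)
```

With `{"A": 2, "B": 1}` and libfoo 2.0.3 installed, B is flagged; a resolver that honours the pins should refuse that combination rather than install it.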

I am very open to having an option where the shared lib/ goes into a subdirectory with the package name. I think that this could work on platforms that support multiple RPATHs. PRs welcome!

@msarahan

msarahan commented Aug 10, 2019 via email

@pradyunsg
Member

pradyunsg commented Aug 10, 2019

It depends on how you build the wheel with conda-press. If you are building "fat" wheels (with all non-python deps included in a single wheel), then the second package installed would clobber the first.

This is a pip bug worth fixing pypa/pip#4625. sigh

If "thin" wheels were built, then libfoo would get its own wheel and be managed by pip and the version pins would kick in saying packages are incompatible.

s/would/should/ because... pypa/pip#988. sigh

@msarahan

This is a pip bug worth fixing pypa/pip#4625. sigh

Even if you fix the bug, this is still a situation where the only truly correct behavior is at most to offer a user override, but more likely to error out. I really think the better answer for these fat wheels is to dodge the issue by avoiding conflicts, especially since auditwheel has that mostly figured out (except for Windows, as I understand it - but machomachomangler was promising?)
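The collision-avoidance idea works roughly like this: when `auditwheel repair` vendors an external shared library, it gives the copy a name that includes a short hash of its contents, so two wheels carrying different builds of libfoo load distinct files. The sketch below only illustrates that renaming in the same spirit; auditwheel also rewrites the extension modules' library references (via patchelf) to point at the renamed copy, which is not shown, and `mangled_soname` is a hypothetical name.

```python
import hashlib
import os


def mangled_soname(lib_path: str, content: bytes) -> str:
    """Give a vendored shared library a content-hashed, collision-resistant name.

    e.g. libfoo.so.1 -> libfoo-<first 8 hex chars of sha256>.so.1
    """
    name = os.path.basename(lib_path)
    stem, sep, suffix = name.partition(".so")
    short_hash = hashlib.sha256(content).hexdigest()[:8]
    return f"{stem}-{short_hash}{sep}{suffix}"
```

Because the hash depends on the file's bytes, identical copies share a name while different builds of the same library never collide.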

s/would/should/ because... pypa/pip#988. sigh

And remember, even with a solver, the quality of the job it can do is limited by the quality of the metadata available to it. The solver in pip will be a big step forward, but it will take a long time to work out the kinks in the metadata. It's an ongoing exercise for conda and our package ecosystem, and it always will be, because our understanding of compatibility is always improving, and new breaks in compatibility are always appearing.

@pradyunsg
Member

even with a solver, the quality of the job it can do is limited by the quality of the metadata available to it.

Yea, I agree.

For now though, doing a better job with the metadata we do have is, as you said, a big step forward. :)

@msarahan

I hope you're bracing for lots of people being upset about the solver. We hear a lot of people complain about conda that it is too rigid - it won't let them install things that "should work." They like that pip will just do what they want, presumably because any damage that results is something they either don't see, or don't correlate with pip ignoring some other constraint. Being technically correct isn't always a comfortable place to be, unfortunately. Before you put the solver into place as the default mechanism, it would probably be a good idea to make a blog post or docs page explaining what the solver does and why it is really a good thing, even though it may present some new issues that people aren't used to thinking about.

@brainwane
Contributor

@msarahan Thanks for bringing this up! Mind putting that comment in the pip resolver rollout planning issue pypa/pip#6536 ? Thanks.

@scopatz

scopatz commented Aug 10, 2019

@msarahan - PRs very welcome!

@pradyunsg
Member

I hope you're bracing for lots of people being upset about the solver.

Yep yep. @brainwane helpfully linked to the issue where, I am hoping, we can figure out what's the best way to deal with potential breakage for users that doing proper resolution can cause and how to handle the communication around it.

@msarahan

@msarahan - PRs very welcome!

@scopatz, so the answer to

Why not put needed shared libraries alongside their extension modules and
use auditwheel to avoid collisions at load time? It'll be less efficient
but much safer

is "that would be better, yes, but I can't or don't want to spend the time to do that?" or what? I didn't say you should do anything. I asked about a glaring technical flaw in your implementation, and wondered why you wouldn't try to address that. Saying PR's welcome here is dodging the question and trying to pawn off responsibility for the issue that your tool creates. You should label your wheels as "great in isolation, but exercise caution when using more than one." And by caution, I mean judicious use of backups before every operation, since pypa/pip#4625 indicates that there's currently no real way to know when damage is going to happen until it happens.

@scopatz

scopatz commented Aug 10, 2019

Sorry @msarahan - not trying to dodge the question, per se. I am on vacation and was answering from my phone. I also don't want to get sucked into a big this-vs-that debate.

We all have our opinions about the technical issues here. (And I agree with you on the technical merits.) The problem conda-press is solving is that it is difficult to easily create wheels on a variety of platforms. What is out & implemented right now is a first step (v0.0.1). I agree that there are a lot of ways that the wheels it builds could be improved. But I (and conda-press?) am not necessarily going to take a lot of sides in the various discussions, e.g. thin vs fat wheels, passing auditwheel vs not. I believe that there are different use cases out there and conda-press is a place where we can come together to build some tooling around these. I think that there are a good number of people out there who just want any wheel, even if that comes with the "only safe in isolation" caveat. People use a lot of virtual environments these days, so isolation & known-to-work-well-together packages are reasonable.

It is probably a good idea to have a tracking issue for use cases.

My hope with conda-press is that we can together build a tool that helps us publish better wheels that cover more use cases over time. I know that this is going to require more eyes than just my own, and I want to make it clear that the project is a kind and welcoming place to contribute.

@scopatz

scopatz commented Aug 10, 2019

I also totally want to have these kinds of conversations about what should conda-press do, but over at conda-press. Maybe I should have said "please open an issue" instead of a PR above. Sorry about any confusion.

@di
Member

di commented Jun 27, 2022

For folks following this issue, there is some related discussion happening over here as well: psf/fundable-packaging-improvements#19
