
Update notebook header #95

Merged — 3 commits merged into dask:master from update-header on Sep 6, 2019

Conversation

@jrbourbeau (Member)

This PR highlights our existing header (with binder and GitHub links) using the Sphinx .. admonition:: directive.

[screenshot: the rendered notebook header, Sep 4, 2019]

While we're thinking about this, we could also try:

  • Further customizing the title of the blue box (I went with "Live Notebook", but am totally open to other suggestions)

  • Using a different binder button that makes more sense if you don't know what binder is (e.g. something like badge)

xref #94
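For reference, a header of this kind can be sketched with the admonition directive roughly as follows (the title, link targets, and repo path here are illustrative placeholders, not the PR's actual markup):

```rst
.. admonition:: Live Notebook

   You can run this notebook in a live session on Binder_,
   or view it `on GitHub <https://github.com/dask/dask-examples>`_.

.. _Binder: https://mybinder.org/v2/gh/dask/dask-examples/master
```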

@TomAugspurger (Member)

Looks nice, thanks.

Not sure what's up with CI. Do you have time to look into it?

@jrbourbeau (Member, Author)

Hmm, I'm able to reproduce the slow behavior when running json-data-on-the-web.ipynb locally. It looks like there are Dask bag from_delayed calls that aren't failing, but are taking a few minutes each to complete. Not sure what the root of the problem is yet. The notebook pulls data from https://archive.analytics.mybinder.org/, but manually downloading a file and loading it with pandas works fine (takes around a second).
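For anyone reproducing this locally, the notebook's pattern is essentially "read JSON lines into a Dask bag". A minimal local stand-in (the file contents here are made up; the real notebook pulls events files from https://archive.analytics.mybinder.org/ instead):

```python
import json
import os
import tempfile

import dask.bag as db

# Write a tiny JSON-lines file to stand in for a mybinder events file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "events.jsonl")
with open(path, "w") as f:
    for i in range(3):
        f.write(json.dumps({"spec": f"repo-{i}", "status": "success"}) + "\n")

# Read line-by-line into a bag and parse each line as JSON.
records = db.read_text(path).map(json.loads)
specs = records.pluck("spec").compute()
print(specs)
```

When the same pattern reads over HTTP, each readline can translate into a remote range request, which is where the slowdown discussed below comes in.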

@jrbourbeau (Member, Author)

Running the corresponding notebook on the example's binder also works fine, so it's not an issue with https://archive.analytics.mybinder.org/

@jrbourbeau (Member, Author)

Downgrading fsspec locally from the current release (0.4.4) to the version in the already-built binder image (0.4.1) fixed the problem for me.

@TomAugspurger (Member)

Hmm, which snippet is failing? (cc @martindurant)

@martindurant (Member)

OK, so in readuntil() (used by readline / iteration), the code extends the current read out to a full block, whose end is then always a line beyond the current block, so we end up requesting a full 5 MB for every line. The following change greatly improves the case where the data is smaller than one blocksize (as it is here):

--- a/fsspec/implementations/http.py
+++ b/fsspec/implementations/http.py
@@ -234,11 +234,12 @@ class HTTPFile(AbstractBufferedFile):
         This is only called when position is still at zero,
         and read() is called without a byte-count.
         """
-        r = self.session.get(self.url, **self.kwargs)
-        r.raise_for_status()
-        out = r.content
-        self.cache = AllBytes(out)
-        self.size = len(out)
+        if not isinstance(self.cache, AllBytes):
+            r = self.session.get(self.url, **self.kwargs)
+            r.raise_for_status()
+            out = r.content
+            self.cache = AllBytes(out)
+            self.size = len(out)

I am open to suggestions for something more general. I am thinking that the current BytesCache should just be simplified to be read-ahead only, instead of the complex logic that has caused so many problems and situations like this.
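The read-ahead-only idea could be sketched roughly like this (the class and fetcher names are mine, not fsspec's actual implementation: every cache miss fetches the requested range plus one blocksize beyond it, so a run of small sequential reads issues one request instead of one per line):

```python
class ReadAheadCache:
    """Minimal read-ahead cache sketch (illustrative, not fsspec's code).

    On a miss, fetch the requested range plus one blocksize beyond it,
    so sequential small reads (e.g. readline) hit the cache instead of
    triggering a remote request per line.
    """

    def __init__(self, blocksize, fetcher, size):
        self.blocksize = blocksize
        self.fetcher = fetcher  # callable: (start, end) -> bytes
        self.size = size
        self.start = 0
        self.cache = b""

    def read(self, start, end):
        end = min(end, self.size)
        cache_end = self.start + len(self.cache)
        if self.start <= start and end <= cache_end:
            # Fully cached: slice out of the buffer, no remote call.
            return self.cache[start - self.start : end - self.start]
        # Miss: fetch the requested range plus read-ahead.
        self.start = start
        self.cache = self.fetcher(start, min(end + self.blocksize, self.size))
        return self.cache[: end - start]


# Usage with a fake fetcher that counts remote calls.
data = b"a" * 100
calls = []

def fetch(s, e):
    calls.append((s, e))
    return data[s:e]

c = ReadAheadCache(blocksize=50, fetcher=fetch, size=len(data))
first = c.read(0, 10)    # miss: fetches bytes 0-60
second = c.read(10, 20)  # served from cache, no new fetch
```

With the fake fetcher, the two reads above cost a single remote call, which is the behavior readline-heavy workloads want.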

@TomAugspurger (Member) commented Sep 5, 2019 via email

@martindurant (Member)

@TomAugspurger , can you please have a careful look at the code in fsspec/filesystem_spec#128 ?

@quasiben (Member) commented Sep 5, 2019

@TomAugspurger and I were chatting and we came to the conclusion that perhaps ignoring the CI failure for now would be ok. The improvement is really great -- downgrading seems like the wrong course given other recent improvements to fsspec. Thoughts?

Note: @martindurant has already added a PR with the suggested code changes: fsspec/filesystem_spec#128

@jrbourbeau (Member, Author)

I'm not sure we can ignore the CI failure since that's where the sphinx docs are built / deployed. Is there a way to tell nbsphinx to skip a specific notebook?

I don't have a full understanding of the changes in fsspec/filesystem_spec#128, but FWIW I can confirm it fixes the problem we're seeing in json-data-on-the-web.ipynb
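On skipping a specific notebook: one option (my assumption, not something settled in this thread) is Sphinx's exclude_patterns setting in conf.py, which nbsphinx honors:

```python
# conf.py -- hypothetical sketch: exclude one notebook from the build
exclude_patterns = ["json-data-on-the-web.ipynb"]
```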

@quasiben (Member) commented Sep 5, 2019

You can ignore errors (https://nbsphinx.readthedocs.io/en/0.2.7/allow-errors.html), but that might not be what you want.
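The global switch that page describes is a one-line conf.py setting (sketched below; the nbsphinx docs also describe per-notebook overrides via notebook metadata):

```python
# conf.py -- let nbsphinx keep building when a notebook cell raises
nbsphinx_allow_errors = True
```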

@mrocklin (Member) commented Sep 5, 2019 via email

@martindurant (Member)

@jrbourbeau, @quasiben, @mrocklin: you're all very welcome to look at fsspec/filesystem_spec#128. I have not yet decided whether the simpler caching should be the default (it makes no difference for this HTTP case, because the data is less than one blocksize).

@martindurant (Member)

If you are installing fsspec from master, then this can be closed.

@quasiben (Member) commented Sep 6, 2019

All checks passed! Hooray. @jrbourbeau, should we merge?

@quasiben (Member) commented Sep 6, 2019

If the merge happens, we should open another issue to replace the dev fsspec with the released version.

@jrbourbeau jrbourbeau merged commit 3fa7fe8 into dask:master Sep 6, 2019
@jrbourbeau jrbourbeau mentioned this pull request Sep 6, 2019
@jrbourbeau jrbourbeau deleted the update-header branch September 6, 2019 20:10
@jrbourbeau (Member, Author)

> we should open up another issue to replace the dev fsspec with the released version

Great idea : ) xref #96

@jrbourbeau (Member, Author)

Also, thanks for fixing upstream @martindurant!
