Bulk download satellite data concurrently via asyncio #2

Open
weiji14 opened this issue Jun 30, 2017 · 1 comment

Comments

@weiji14
Owner

weiji14 commented Jun 30, 2017

Use Python 3.5's built-in asyncio module to bulk download satellite data concurrently from HTTP/FTP servers.

See:
Hackernoon blog post
asyncio
aioftp docs
aiohttp docs
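A minimal sketch of the basic concurrent pattern, assuming Python 3.7+ for `asyncio.run`. `fetch` is a hypothetical stand-in for a real aiohttp/aioftp download: it only sleeps via `asyncio.sleep` to simulate network latency, so the example runs with the standard library alone. The URLs are placeholders, not real data servers.

```python
import asyncio

async def fetch(url, delay=0.1):
    """Hypothetical stand-in for a real aiohttp/aioftp download.

    It only sleeps to simulate network latency, then returns a fake payload.
    """
    await asyncio.sleep(delay)
    return url, ("fake payload for " + url).encode()

async def bulk_download(urls):
    # One coroutine per file; gather() runs them concurrently and returns
    # the results in the same order as `urls`, not in completion order.
    return await asyncio.gather(*(fetch(u) for u in urls))

if __name__ == "__main__":
    # Placeholder URLs, not real satellite data servers.
    urls = ["ftp://example.org/data/file_%02d.dat" % i for i in range(10)]
    results = asyncio.run(bulk_download(urls))
    for url, payload in results:
        print(url, len(payload))
```

Because the ten simulated transfers overlap, the whole batch takes roughly as long as one transfer rather than ten.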

weiji14 referenced this issue Jun 30, 2017
Very experimental and buggy get_cryosat.py at the moment. Heavily reliant on the aiohttp framework, following https://hackernoon.com/asyncio-for-the-working-python-developer-5c468e6e2e8e. TODO: port to aioftp instead...

Temporarily roll back to Atom stable 1.18.0 instead of the 1.19 beta due to the Hydrogen truncated output bubble issue nteract/hydrogen#898
@weiji14
Owner Author

weiji14 commented Jul 3, 2017

To be a bit nicer on other people's server infrastructure, and to prevent FTP error 421 "Too many connections", use semaphores to limit the number of simultaneous FTP connections.

Helpful examples of Python 3 implementations of semaphores in asyncio:

Official Python 3 API docs on the implementation:
https://docs.python.org/3/library/asyncio-sync.html#semaphores

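Note the async with sem: syntax, which works from Python 3.5 onward. Earlier versions (e.g. Python 3.4) use with (yield from sem): instead, and with (await sem): was also accepted for a while; both of these older forms were deprecated in Python 3.7 and removed in Python 3.9.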
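A minimal sketch of the semaphore approach using the async with sem: form, assuming Python 3.7+ for `asyncio.run`. `fetch`, `ConnectionTracker` and `MAX_CONNECTIONS = 7` are hypothetical (7 mirrors the limit mentioned in a later commit on this issue), and each transfer is simulated with `asyncio.sleep` rather than a real aioftp call.

```python
import asyncio

MAX_CONNECTIONS = 7  # assumed per-server limit; tune for each server

class ConnectionTracker:
    """Counts simulated open connections so the limit can be checked."""
    def __init__(self):
        self.active = 0
        self.peak = 0

async def fetch(sem, tracker, name):
    """Hypothetical download that holds one semaphore slot while running."""
    async with sem:  # waits here while all MAX_CONNECTIONS slots are taken
        tracker.active += 1
        tracker.peak = max(tracker.peak, tracker.active)
        try:
            await asyncio.sleep(0.05)  # simulate transfer time
        finally:
            tracker.active -= 1
    return name

async def main(n_files, tracker):
    # Create the semaphore inside the coroutine so it belongs to the loop
    # that asyncio.run() starts (this matters on older Python versions).
    sem = asyncio.Semaphore(MAX_CONNECTIONS)
    return await asyncio.gather(
        *(fetch(sem, tracker, "file_%d" % i) for i in range(n_files)))

if __name__ == "__main__":
    tracker = ConnectionTracker()
    names = asyncio.run(main(30, tracker))
    print(len(names), "downloads, peak simultaneous connections:", tracker.peak)
```

All 30 tasks are scheduled at once, but the semaphore lets at most seven of them hold a "connection" at any moment; the rest queue inside async with sem: until a slot frees up.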

weiji14 referenced this issue Jul 3, 2017
…emaphore limit

Better code execution workflow using asyncio.new_event_loop(). This creates a new async event loop so we don't have to shut down Hydrogen kernels every time we run the script, i.e. you can re-run the script over and over again in the same IPython console.
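A minimal sketch of the fresh-event-loop pattern this commit message describes; `main` and `run_in_fresh_loop` are hypothetical names standing in for the script's own code. Each call builds a new loop with `asyncio.new_event_loop()`, runs the coroutine to completion and closes that loop afterwards.

```python
import asyncio

async def main():
    """Hypothetical stand-in for the download script's top-level coroutine."""
    await asyncio.sleep(0.01)
    return "done"

def run_in_fresh_loop(coro):
    # Build a brand-new event loop for this run instead of reusing the
    # default one, so a loop closed by a previous run is never touched.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()

if __name__ == "__main__":
    # Re-running in the same interpreter session works because each call
    # gets its own loop.
    print(run_in_fresh_loop(main()))
    print(run_in_fresh_loop(main()))
```

On Python 3.7+, asyncio.run() performs the same create, run and close cycle in one call. Both approaches raise a RuntimeError if another event loop is already running in the same thread, which can happen inside newer Jupyter kernels.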

Switch from Python's os module to the pathlib module for higher-level, more elegant filename parsing (may break Windows compatibility).
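A short illustration of pathlib-based filename parsing. The remote path and its date suffix are hypothetical, not an actual CryoSat file name. Treating server paths as `PurePosixPath` (always forward slashes) and using `Path` only for local files is one possible way to limit the Windows concern; it is a suggested design choice, not something the commit states.

```python
from pathlib import Path, PurePosixPath

# Hypothetical remote path on an FTP/HTTP server (illustration only).
remote = PurePosixPath("/pub/satellite/2017/06/SAMPLE_FILE_20170630.ZIP")

print(remote.name)        # SAMPLE_FILE_20170630.ZIP
print(remote.stem)        # SAMPLE_FILE_20170630
print(remote.suffix)      # .ZIP
print(remote.parent)      # /pub/satellite/2017/06

# Pull the (assumed) trailing date field out of the file stem.
date_str = remote.stem.rsplit("_", 1)[-1]
print(date_str)           # 20170630

# Local destination: Path uses the native flavour of the current OS
# (PosixPath or WindowsPath), so the separator is chosen automatically.
local_file = Path("downloads") / remote.parent.name / remote.name
print(local_file)
```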

Migrate the asynchronous function's previous dependency on ftplib to aioftp, so that the synchronous and asynchronous code paths are now fully independent. The synchronous code block is intended to be deprecated in the next commit; both are kept here for record-keeping and benchmark comparison.
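For the kind of benchmark comparison mentioned here, a minimal sketch that times a blocking loop against an asyncio version. Both download functions are hypothetical stand-ins: `time.sleep` imitates a blocking ftplib transfer and `asyncio.sleep` imitates a non-blocking aioftp one, with an assumed 0.1 s per file. Real timings depend on bandwidth and on how many connections the server allows.

```python
import asyncio
import time

DELAY = 0.1  # assumed simulated transfer time per file, in seconds

def download_sync(name):
    """Hypothetical blocking download (stand-in for an ftplib transfer)."""
    time.sleep(DELAY)
    return name

async def download_async(name):
    """Hypothetical non-blocking download (stand-in for an aioftp transfer)."""
    await asyncio.sleep(DELAY)
    return name

def benchmark(names):
    t0 = time.perf_counter()
    sync_results = [download_sync(n) for n in names]  # one after another
    t_sync = time.perf_counter() - t0

    async def run_all():
        return await asyncio.gather(*(download_async(n) for n in names))

    t0 = time.perf_counter()
    async_results = asyncio.run(run_all())  # transfers overlap in time
    t_async = time.perf_counter() - t0
    return sync_results, list(async_results), t_sync, t_async

if __name__ == "__main__":
    names = ["file_%d" % i for i in range(5)]
    _, _, t_sync, t_async = benchmark(names)
    print("sync: %.2f s, async: %.2f s" % (t_sync, t_async))
```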

Implement asyncio.Semaphore to prevent the FTP 421 "Too many connections ..." error. Possibly soft-code the current limit of 7 connections using a smart check loop in the future?

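One possible reading of the "smart check loop" idea, sketched under assumptions: instead of hard-coding 7, each task retries after a 421-style refusal using exponential backoff, so the effective concurrency adapts to whatever the server allows. `FakeFTPServer`, `TooManyConnections` and `fetch_with_retry` are hypothetical; with a real client you would catch whatever exception it raises for a 421 reply.

```python
import asyncio

class TooManyConnections(Exception):
    """Hypothetical stand-in for an FTP 421 "Too many connections" reply."""

class FakeFTPServer:
    """Hypothetical server that refuses connections above a hidden limit."""
    def __init__(self, limit):
        self.limit = limit
        self.active = 0
        self.peak = 0

    async def transfer(self, name):
        if self.active >= self.limit:
            raise TooManyConnections("421 Too many connections")
        self.active += 1
        self.peak = max(self.peak, self.active)
        try:
            await asyncio.sleep(0.02)  # simulate transfer time
            return name
        finally:
            self.active -= 1

async def fetch_with_retry(server, name, retries=20, backoff=0.01):
    # On a 421-style refusal, wait and try again, doubling the wait each
    # time (capped at 0.5 s) instead of relying on a fixed client-side limit.
    delay = backoff
    for _ in range(retries):
        try:
            return await server.transfer(name)
        except TooManyConnections:
            await asyncio.sleep(delay)
            delay = min(delay * 2, 0.5)
    raise RuntimeError("gave up on %s after %d attempts" % (name, retries))

async def main(server, n_files):
    return await asyncio.gather(
        *(fetch_with_retry(server, "file_%d" % i) for i in range(n_files)))

if __name__ == "__main__":
    server = FakeFTPServer(limit=7)
    done = asyncio.run(main(server, 30))
    print(len(done), "files, server peak connections:", server.peak)
```

A fixed semaphore is simpler and gentler on the server; retry with backoff is more adaptive but does generate extra refused connection attempts, so the two could also be combined (a generous semaphore plus retries).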
weiji14 referenced this issue Jul 14, 2017
Get a working copy of the new asyncio-based get_cryosat.py, and an up-to-date copy of the atom-hydrogen-beta Dockerfile (with the fallback atom-stable code lines commented out)