Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Concurrent HTTP Request Count Configurable #1

Open
hallahan opened this issue Jan 2, 2015 · 8 comments
Open

Make Concurrent HTTP Request Count Configurable #1

hallahan opened this issue Jan 2, 2015 · 8 comments

Comments

@hallahan
Copy link

hallahan commented Jan 2, 2015

Hi Seth,

I'm using your tool to scrape an endpoint and populate an mbtiles database. I'm noticing that the requests are being done in series rather than there being a set of parallel requests. I think it's ok to hit my endpoint harder and get what I want faster...

Could you give me a clue on how to set maybe 10 parallel HTTP requests? I'm thinking I need to tweak some parameter to the async module?

Nick

@mojodna
Copy link
Owner

mojodna commented Jan 2, 2015

Right, that. I'd mostly been using it to convert stuff locally where concurrency mattered less. I've been meaning to add configurable support to this for a while.

It's possible that you'll get the behavior you're looking for using the tilelive command line utilities.

(Stream-of-consciousness exploration follows...)

Here's the block responsible for setting up the copy, which points at tilelive-streaming being the place to add support:

https://github.com/mojodna/tl/blob/master/lib/commands/copy.js#L78-L90

Readable is the source stream.

highWaterMark: 32 should tell it to buffer up to 32 tiles at a time, but as you've seen, that doesn't seem to work in practice. I don't see anything there that artificially limits it, so I'm going to look at createWriteStream, which is specific for mbtiles: lib/mbtiles.js

@mojodna
Copy link
Owner

mojodna commented Jan 2, 2015

createWriteStream leads me back to Writable and Collector, as it primarily adds logic to handle opening and closing SQLite files.

The general approach with streaming here is that createReadStream returns a single stream that emits byte streams containing tile data. The underlying "collector" is a non-objectMode stream that emits tile events once all data has been gathered. (I took this approach with the intent of later supporting getTile methods that also return streams, preventing them from needing to fully buffer data from their underlying source; not relevant for Mapnik (or SQLite?), but sensible for HTTP and file sources and useful for pipeing to HTTP responses, etc.)

I'm still not seeing anything that imposes a bottleneck from a read-through, so I think I'll try doing what you're doing and adding some logging to tilelive-http to see what the request / [stream] pulling behavior looks like.

@mojodna
Copy link
Owner

mojodna commented Jan 2, 2015

Here's the copy command I'm using:

tl copy -z 5 -Z 5 tilejson+http://tile.stamen.com/toner/index.json mbtiles://./toner.mbtiles

@mojodna
Copy link
Owner

mojodna commented Jan 2, 2015

Oh, right. That uses tilejson under the hood; I should be using:

tl copy -z 5 -Z 5 http://tile.stamen.com/toner/{z}/{x}/{y}.png mbtiles://./toner.mbtiles

@mojodna
Copy link
Owner

mojodna commented Jan 2, 2015

Figured it out. I was ignoring readable.push's return value, so the internal buffer was never being filled, so it was only making one request per _read call.

This is the relevant commit:
mojodna/tilelive-streaming@57acb82

It's part of [email protected] and tl is now 0.3.1, so reinstall / upgrade and you should start seeing better performance. Let me know how it goes.

It's not configurable yet (it defaults to 32, which is Readable's current highWaterMark), so I'm leaving this open. Setting that according to the constructor's options, creating an option in the copy command's args, and passing it through would do the trick.

@hallahan
Copy link
Author

hallahan commented Jan 7, 2015

Thanks for the fix Seth! Your tool is extremely useful. It will be saving us a lot of time!

@mojodna mojodna changed the title Set Number of Concurrent HTTP Requests Make Concurrent HTTP Request Count Configurable Apr 15, 2015
@mojodna
Copy link
Owner

mojodna commented Apr 30, 2015

Refs mojodna/tilelive-streaming#2

@mojodna
Copy link
Owner

mojodna commented Apr 30, 2015

mojodna/tilelive-streaming@a90f328 and mojodna/tilelive-http@9dffd42 now make this possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants