Group tasks from a generator #5
Comments
Thanks for the suggestion. I'm not sure I see the benefit of building an API like that, though. I think you can already do the same thing using the native API. With an array of data, you can use `Array#map`:

```js
const data = ['a', 'b', 'c'];

await task.group(
	task => data.map(
		item => task(`Working on ${item}`, async ({ setTitle }) => {
			await someAsyncTask();
		})
	),
	{
		concurrency: 2
	}
);
```

In the case of generators, since the generator doesn't send all the values at once, I don't think a task group can be used. Task groups are better for when you know all the tasks but they're queued. With generators, you can do:

```js
const data = function* () {
	yield 'a';
	yield 'b';
	yield 'c';
};

for (const item of data()) {
	task(`Working on ${item}`, async ({ setTitle }) => {
		await someAsyncTask();
	});
}
```
@privatenumber what you propose is the problem I explained at the beginning of the issue: I have to create a new callback for each task, instead of reusing the same one for each record. Additionally, I don't know in advance how many records I will generate; in my case I have thousands of them, and they all run through the same function.

If we can rely on the task system's concurrency to spawn the function as slots are freed, then we can write simple code that performs well, without building our own concurrency system, since that's something tasuku already manages. Do you get the idea?

My current code with a big array is a bit slow, and that causes issues in the terminal, where tasuku spawns thousands of pending lines (one per task). What I want is for it to spawn one line per task currently running in the map function; when a task completes, it invokes the next one, and so on.
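The behaviour described here (a fixed number of in-flight tasks, with the next record started only when a slot frees up) can be sketched without tasuku. This is a minimal, hypothetical helper, not part of tasuku's API; `mapConcurrent` and `records` are names invented for the example:

```javascript
// Minimal sketch: run `fn` over an iterable with at most `limit` calls
// in flight, pulling the next record only when a slot frees.
// Hypothetical helper, not part of tasuku's API.
async function mapConcurrent(iterable, fn, limit) {
	const iterator = iterable[Symbol.iterator]();
	const results = [];
	let index = 0;

	// Each worker pulls the next item as soon as its previous one settles.
	async function worker() {
		for (;;) {
			const { value, done } = iterator.next();
			if (done) {
				return;
			}
			const i = index++;
			results[i] = await fn(value);
		}
	}

	await Promise.all(Array.from({ length: limit }, () => worker()));
	return results;
}

// Usage: records come from a generator, processed two at a time.
function* records() {
	yield 1;
	yield 2;
	yield 3;
	yield 4;
}

mapConcurrent(records(), async n => n * 2, 2)
	.then(doubled => console.log(doubled)); // [ 2, 4, 6, 8 ]
```

Because each worker pulls from the shared iterator lazily, a generator of thousands of records never has more than `limit` items in flight at once.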
This is just a performance optimization, right? I'm curious whether this is actually a measurable difference. Behind the scenes, tasuku uses React via Ink, and I know they are quite memory-inefficient (high allocation/de-allocation rates) by design.
I prefer not to add async/concurrency control, to keep tasuku minimal and focused on rendering the tasks. I think that can be done with another module:

```js
import PQueue from 'p-queue';

const queue = new PQueue({ concurrency: 2 });

for (const item of generator()) {
	queue.add(
		() => task(`Working on ${item}`, async ({ setTitle }) => {
			await someAsyncTask();
		})
	);
}
```

Does that meet your needs?
And you want these tasks to run with a limited concurrency too, correct? It might make more sense to simply add a […]. Alternatively, you can pass your mapper directly into `p-map`:

```js
import pMap from 'p-map';

const sites = [
	getWebsiteFromUsername('sindresorhus'), //=> Promise
	'https://avajs.dev',
	'https://github.com'
];

const mapper = site => task(`Working on ${site}`, async ({ setTitle }) => {
	await someAsyncTask();
});

const result = await pMap(sites, mapper, { concurrency: 2 });

console.log(result);
```

It seems `p-map` also accepts an iterable, which might address your first problem with the generator.
@privatenumber thanks for the suggestions. I'll check them out shortly. I just think it would be nice to be able to reuse the […]

One more thing, just so you know: when I run a dataset of 500 records with

```js
task.group(task => data.map(item => task('title', async () => await doSomething(item))));
```

it takes around 50 seconds (including around 4~5 seconds of overhead from the rest of my code). Then I run the exact same code, with the only exception that I remove tasuku and map directly:

```js
await Promise.all(data.map(doSomething));
```

It takes less than 7 seconds (including the 4~5 seconds of prior overhead). So there is definitely a bottleneck somewhere here. I only did quick tests for the moment, as I have been short on time; I'll try to come back with a reproducible case as soon as I find a moment.
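For what it's worth, the baseline side of that comparison can be reproduced as a standalone harness. `doSomething` below is a hypothetical stand-in for the real per-record work (which is not shown in this thread), and no tasuku rendering is involved:

```javascript
// Hypothetical stand-in for the per-record async work; the real
// doSomething from the report is not shown in this thread.
const doSomething = item => new Promise(resolve => {
	setTimeout(() => resolve(item * 2), 1);
});

async function main() {
	const data = Array.from({ length: 500 }, (_, i) => i);

	// Baseline: plain Promise.all with no rendering layer, to compare
	// against the task.group version of the same workload.
	const start = Date.now();
	const results = await Promise.all(data.map(doSomething));
	console.log(`${results.length} records in ${Date.now() - start}ms`);

	return results;
}

main();
```

Swapping `data.map(doSomething)` for the `task.group` version while keeping `doSomething` fixed would isolate the rendering overhead from the work itself.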
The primary feature of […]

Interesting. I haven't had a case where I needed to run that many tasks, but that's definitely a problem. Thanks, looking forward to the reproduction!
**Is your feature request related to a problem?**

I have a large dataset that I want to process as a grouped set of tasks. `task.group` forces me to create one function per task to run; it is slow and not memory-efficient.

**Describe the solution you'd like**

It would be more efficient to have something like `task.map` that takes a generator or an array of data, as well as a function to execute for each record. Example: […]

`task.map` could also manage concurrency, invoking the next function only when a slot is freed: […]

I'm happy to send a PR for this.