
Add Parallelism (again) #9

Open · wants to merge 5 commits into master

Conversation

seonghobae

Sorry for my misunderstanding. I have fixed the code so that it works properly, following the review comments I received in #8.

  • Removed the lines related to installed.packages().
  • Added requireNamespace('future.apply') (see the sketch after this list).
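
For reference, a minimal sketch of the guard pattern this refers to, assuming a hypothetical helper name run_parallel_or_serial (not part of this PR) and that the parallel path should follow whatever future::plan() the user has set:

  run_parallel_or_serial <- function(x, FUN) {
    if (requireNamespace('future.apply', quietly = TRUE)) {
      # parallel path: future_lapply() follows the topology set via future::plan()
      future.apply::future_lapply(x, FUN)
    } else {
      # serial fallback when future.apply is not installed
      lapply(x, FUN)
    }
  }
  run_parallel_or_serial(1:4, sqrt)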

@jwijffels
Collaborator

Thanks. Looks fine

@seonghobae
Author

Thanks. I'm testing these commits on my real research project; it runs faster when I add multiple cluster nodes over SSH connections using the plan setup below.

  future::plan(list(
    # first level: remote workers reached over SSH
    # (the worker account/host string below was masked in the original comment)
    future::tweak(
      'cluster',
      workers = paste0('[email protected].', 179:180),
      homogeneous = FALSE
    ),
    # second level: local multiprocessing on each machine,
    # using half of the physical cores (at least one)
    future::tweak('multiprocess', workers = max(c(
      1, round(parallel::detectCores(logical = FALSE) * .5)
    )))
  ))
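
With a plan like this in place, any future.apply call picks up the topology automatically; no worker arguments are passed to the apply call itself. A small, hedged check (not part of the PR) could look like:

  # number of workers at the first (cluster) level of the plan
  future::nbrOfWorkers()
  # each chunk runs on one of the remote workers, so the node names should differ
  out <- future.apply::future_lapply(1:8, function(i) Sys.info()[['nodename']])
  unique(unlist(out))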

@jwijffels
Collaborator

Can you also compare the speed with pull request #7?

@seonghobae
Author

seonghobae commented Jan 31, 2020

> Can you also compare the speed with pull request #7?

Pull request #7 should give some speed improvement in theory, thanks to its use of the data.table library with primary keys, and it has a clean interface. However, I cannot find where the number of parallel cores can be set in #7. It uses pbapply to display progress information, but as far as I know the pbapply API does not support multi-machine environments (only single-machine parallelism). In other words, future.apply can support supercomputing workloads through the 'future' API, while pbapply cannot.

I need a multi-machine parallel environment to get real speed improvements for large-scale scientific language research on heterogeneous hardware. I have ten machines, including my VPS and workstations; with #9 they give roughly an eight-fold speedup even though I am only using 1 Gbps links. Without multicore or multi-machine parallelized apply functions, such as those in the future.apply library, I do not see how any approach can improve the calculation speed. data.table speeds up data processing as a temporary in-memory database; it does not speed up the calculations themselves.

Pull requests #8 and #9 introduce nested parallel structures by replacing all of the existing *apply functions, not only in textrank_sentence but in all of the textrank:: functions (a sketch of such a nested call follows below). pbapply does accept cl objects from parallel::makeCluster(), but it is hard to get nested parallelism that way. With nested plans, the speedup scales with the number of threads and machines. The main issue is not the data.table library; the core point is that parallelized apply functions across machines are what solves the problem raised in #7.
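
A hedged sketch of what a nested call looks like under the two-level plan shown above (the documents object and the nchar work are placeholders, not textrank code): the outer future_lapply fans out over the remote cluster workers, and the inner one uses the second plan level (local cores) on whichever machine received that document.

  library(future.apply)
  documents <- list(
    list(sentences = c('First sentence.', 'Second one.')),
    list(sentences = c('Another document.', 'With two sentences.'))
  )
  results <- future_lapply(documents, function(doc) {
    # inner future_lapply runs on the local cores of the machine handling this document
    future_lapply(doc$sentences, nchar)
  })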
