Efficiency problem with multi-core parallel computing


#1

A simple example: divide every element of array a by 3. The code is as follows:

function foo_2(n)
    a = collect(1:n)    # with n = 1.0e8 this is a Vector{Float64};
                        # an integer n would make a[i] = a[i]/3 throw an InexactError
    for i = 1:size(a, 1)
        a[i] = a[i] / 3
    end
end

julia> @benchmark foo_2(100000000.0)
BenchmarkTools.Trial: 
  memory estimate:  762.94 MiB
  allocs estimate:  2
  --------------
  minimum time:     1.264 s (0.90% GC)
  median time:      1.399 s (8.76% GC)
  mean time:        1.376 s (7.50% GC)
  maximum time:     1.444 s (10.73% GC)
  --------------
  samples:          4
  evals/sample:     1
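For reference, the same loop can be written as a single in-place broadcast, which is a useful single-core baseline before reaching for parallelism. This is a sketch (`foo_broadcast` is a hypothetical name, not from the thread); it assumes we only care about the division itself, not the allocation of `a`:

```julia
# In-place broadcast baseline: divides every element of a Float64 vector by 3.
function foo_broadcast(n)
    a = collect(1.0:n)   # Float64 array, matching the benchmark's float input
    a ./= 3              # fused, allocation-free elementwise division
    return a
end
```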

After I rewrote it as a parallel version:

using Distributed
addprocs(2)
@everywhere using SharedArrays   # the workers also need SharedArrays loaded

function foo_1(n)
    a = collect(1:n)     # n is a Float64 here, so a is a Vector{Float64}
    a = SharedArray(a)   # note: this allocation and copy is inside the timed function
    @sync @distributed for i = 1:size(a, 1)
        a[i] = a[i] / 3
    end
end

julia> @benchmark foo_1(100000000.0)
BenchmarkTools.Trial: 
  memory estimate:  762.97 MiB
  allocs estimate:  809
  --------------
  minimum time:     10.795 s (0.10% GC)
  median time:      10.795 s (0.10% GC)
  mean time:        10.795 s (0.10% GC)
  maximum time:     10.795 s (0.10% GC)
  --------------
  samples:          1
  evals/sample:     1

I don't quite understand why it got so much slower after parallelizing :joy: — where is the problem?


#2

This comparison isn't quite fair: the multi-process version includes the allocation of the shared array (and the copy into it) in the timing. Also, @distributed is a poor fit here — it ships the loop body off to the worker processes, and when each iteration does only a trivial computation like this, the scheduling and inter-process communication overhead dominates, so it ends up very slow.

You might want to study the three examples in the documentation:

https://docs.juliacn.com/latest/manual/parallel-computing/#man-shared-arrays-1
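The pattern those docs demonstrate can be sketched roughly as follows (`foo_chunked` is a hypothetical name; this assumes workers have already been added with `addprocs`): each worker loops over its own contiguous slice of the shared array via `localindices`, so there is one remote call per worker instead of fine-grained scheduling of tiny tasks.

```julia
using Distributed
addprocs(2)
@everywhere using SharedArrays

function foo_chunked(n)
    a = SharedArray{Float64}(n)
    a .= 1:n                           # fill the shared array with 1.0 .. n
    @sync for p in procs(a)            # the worker pids that map this array
        @async remotecall_wait(p, a) do s
            for i in localindices(s)   # this worker's own contiguous slice
                s[i] /= 3
            end
        end
    end
    return a
end
```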


#3

For a problem like this, multithreading is enough, and its overhead is much smaller than multiprocessing :thinking:
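A minimal multithreaded sketch of the same loop (`foo_threaded` is a hypothetical name; this assumes Julia was started with multiple threads, e.g. `julia -t 4`). Threads share memory, so a plain Vector works and there is no shared-array setup or inter-process serialization cost:

```julia
# Multithreaded elementwise division: @threads splits the index range into
# one contiguous chunk per thread, all operating on the same array.
function foo_threaded(n)
    a = collect(1.0:n)   # Float64 array, same as the serial version
    Threads.@threads for i in eachindex(a)
        a[i] /= 3
    end
    return a
end
```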


#4

Thanks, I get it now :grin:


#5

Right, this really is better suited to multithreading. I had read the pmap documentation before, and I probably misunderstood it:

Julia’s pmap is designed for the case where each function call does a large amount of work. In contrast, @distributed for can handle situations where each iteration is tiny, perhaps merely summing two numbers. Only worker processes are used by both pmap and @distributed for for the parallel computation. In case of @distributed for, the final reduction is done on the calling process.

So I had always assumed @distributed was the right fit for simple computations like this one :joy:
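To illustrate the distinction that quote is drawing, here is a hedged sketch (the function name `heavy` and the worker count are illustrative, not from the thread): pmap issues one remote call per element, so it only pays off when each call does substantial work, while @distributed splits the whole range into one chunk per worker, so tiny per-iteration work is fine as long as the reduction, done on the caller, is what you want back:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # add workers only if none exist yet

# pmap: one remote call per element; the function must also exist on workers.
@everywhere heavy(x) = sum(sqrt(i) for i in 1:100_000) + x
results = pmap(heavy, 1:4)     # worthwhile because each call is expensive

# @distributed: the range is split into one chunk per worker, and the (+)
# reduction of the per-chunk results runs on the calling process.
total = @distributed (+) for i = 1:1_000_000
    i % 3 == 0 ? 1 : 0         # count multiples of 3
end
```

Both still run on worker processes, though, so neither escapes inter-process overhead the way threads do.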