# 多核并行计算效率问题

``````julia
function foo_2(n)
    a = collect(1:n)
    for i = 1:size(a, 1)
        a[i] = a[i] / 3
    end
end

julia> @benchmark foo_2(100000000.0)
BenchmarkTools.Trial:
  memory estimate:  762.94 MiB
  allocs estimate:  2
  --------------
  minimum time:     1.264 s (0.90% GC)
  median time:      1.399 s (8.76% GC)
  mean time:        1.376 s (7.50% GC)
  maximum time:     1.444 s (10.73% GC)
  --------------
  samples:          4
  evals/sample:     1
``````

``````julia
using SharedArrays
using Distributed

function foo_1(n)
    a = collect(1:n)
    a = SharedArray(a)
    @sync @distributed for i = 1:size(a, 1)
        a[i] = a[i] / 3
    end
end

julia> @benchmark foo_1(100000000.0)
BenchmarkTools.Trial:
  memory estimate:  762.97 MiB
  allocs estimate:  809
  --------------
  minimum time:     10.795 s (0.10% GC)
  median time:      10.795 s (0.10% GC)
  mean time:        10.795 s (0.10% GC)
  maximum time:     10.795 s (0.10% GC)
  --------------
  samples:          1
  evals/sample:     1
``````

#### I don't quite understand why it got so much slower after parallelizing. Where is the problem?

https://docs.juliacn.com/latest/manual/parallel-computing/#man-shared-arrays-1


Julia’s pmap is designed for the case where each function call does a large amount of work. In contrast, @distributed for can handle situations where each iteration is tiny, perhaps merely summing two numbers. Only worker processes are used by both pmap and @distributed for for the parallel computation. In case of @distributed for, the final reduction is done on the calling process.
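A minimal sketch of the quoted distinction (the worker count and the chunk split are made up for illustration): give `pmap` coarse chunks of work, while `@distributed` with a reduction operator can handle per-element iterations, with the final `(+)` reduction performed on the calling process.

``````julia
using Distributed
addprocs(2)  # hypothetical worker count

# pmap suits coarse-grained work: each call processes a whole range.
chunks = [1:5_000_000, 5_000_001:10_000_000]
partial = pmap(r -> sum(i / 3 for i in r), chunks)

# @distributed tolerates tiny iterations; the (+) reduction runs on the
# calling process, as the quoted passage notes.
total = @distributed (+) for i = 1:10_000_000
    i / 3
end

sum(partial) ≈ total  # both strategies compute the same result
``````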


``````julia
a ./ 3
``````
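Spelled out as a full replacement for the loop in `foo_2` (a sketch; `foo_bcast` is a made-up name), the broadcast performs the whole update in one fused, SIMD-friendly pass:

``````julia
function foo_bcast(n)
    a = collect(1.0:n)  # Float64 so the result of the division fits in place
    a ./= 3             # in-place broadcast: one pass, no extra allocation
    return a
end
``````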

• Use MKL's built-in parallelism
• Use a CuArray (GPU)
• The chunking approach mentioned above
That said, the broadcast division `a ./ 3` has quite good performance in most situations, so unless this spot is a genuine hotspot, I don't think it's worth squeezing out this time prematurely.
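A minimal sketch of the chunking idea, assuming 4 workers (`addprocs(4)` and `foo_chunked` are illustrative names, not from this thread): each worker loops only over the part of the `SharedArray` stored locally, so distribution overhead is paid once per worker rather than once per element.

``````julia
using Distributed
addprocs(4)  # hypothetical worker count
@everywhere using SharedArrays

function foo_chunked(n)
    a = SharedArray(collect(1.0:n))
    @sync for p in procs(a)
        # Run one task per worker; each processes only its local chunk.
        @async remotecall_wait(p, a) do arr
            for i in localindices(arr)
                @inbounds arr[i] = arr[i] / 3
            end
        end
    end
    return a
end
``````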