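I want to build a thread-parallel sample for a pattern that is common when solving time-evolution problems (see main_serial in the code below).
A matrix arr_task is generated as time advances. It is updated on every loop iteration (time step), and the arr_task produced in the current iteration depends on the arr_task from the previous one, so arr_task has to be generated serially. arr_task is then passed to process!(), and the result is stored in arr_stored. This yields the result of processing arr_task at each time step.
main_serial in the code is the serial version. To parallelize it, I first tried passing a copy of arr_task directly to process!() (main_parallel_copy), but the result differs from the serial program, and judging from the output there seems to be a data race. With help from members of the Julia Chinese chat group, I changed it to main_parallel_assign, which first assigns a copy of arr_task to arr_temp (arr_temp = copy(arr_task)) and then passes arr_temp to process!(). Only with that change did I get a parallel run that is both correct and faster.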
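To make the difference between the two variants concrete, here is a minimal sketch that reproduces it without threads. It assumes that Threads.@spawn, like @task, wraps the whole expression (including the call to copy) in a closure that only runs once the task starts. The function demo_deferred_copy and its variable names are made up for illustration and are not part of par.jl.

```julia
function demo_deferred_copy()
    x = [1]
    # @task only wraps the expression in a closure; copy(x) has not run yet.
    lazy = @task copy(x)[1]
    # Copy first: the snapshot is taken here, before any later mutation.
    snapshot = copy(x)
    eager = @task snapshot[1]
    # Stands in for the update done by the next loop iteration.
    x[1] = 99
    schedule(lazy)
    schedule(eager)
    return fetch(lazy), fetch(eager)
end

println(demo_deferred_copy())  # prints (99, 1)
```

The task that calls copy inside its body sees the updated value, while the task that received a snapshot made beforehand sees the old one. This appears consistent with the outputs of main_parallel_copy and main_parallel_assign.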
Environment:
- win10: Microsoft Windows [Version 10.0.17763.4252]
- julia Version 1.8.5
- How to run:
julia -t4 par.jl
This is the code in the script par.jl:
function main_serial()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    for time in 1:4
        # get a task, should be serial
        arr_task[1] = time + arr_task[1]
        sleep(1)
        # handle the task
        process!(arr_stored, arr_task, time)
    end
    println("serial : $arr_stored")
end
function main_parallel_copy()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    @sync for time in 1:4
        # get a task, should be serial
        # println("copy!")
        arr_task[1] = time + arr_task[1]
        sleep(1)
        # handle the task, but parallel on threads
        Threads.@spawn process!(arr_stored, copy(arr_task), time)
    end
    println("parallel copy : $arr_stored")
end
function main_parallel_assign()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    @sync for time in 1:4
        # get a task, should be serial
        arr_task[1] = time + arr_task[1]
        sleep(1)
        arr_temp = copy(arr_task)
        # handle the task, but parallel on threads
        Threads.@spawn process!(arr_stored, arr_temp, time)
    end
    println("parallel assign : $arr_stored")
end
function process!(stored, task, t)
    # time of processing
    @time begin
        a = rand(100,100)
        [exp(a) for i in 1:100]
    end
    stored[t] = task[1]
end
@time main_serial()
println()
@time main_parallel_copy()
println()
@time main_parallel_assign()
This is the output:
2.040268 seconds (1.50 k allocations: 46.069 MiB, 2.20% gc time)
2.136096 seconds (1.50 k allocations: 46.069 MiB, 1.82% gc time)
2.072491 seconds (1.50 k allocations: 46.069 MiB, 1.06% gc time)
2.073747 seconds (1.50 k allocations: 46.069 MiB, 0.64% gc time)
serial : [1, 3, 6, 10]
12.372814 seconds (6.39 k allocations: 184.297 MiB, 0.96% gc time, 0.32% compilation time)

2.692697 seconds (2.70 k allocations: 82.098 MiB, 1.35% gc time)
2.884239 seconds (3.36 k allocations: 101.799 MiB, 1.26% gc time)
2.676654 seconds (2.80 k allocations: 84.476 MiB, 1.36% gc time)
2.124010 seconds (1.52 k allocations: 46.069 MiB, 1.87% gc time)
parallel copy : [3, 6, 10, 10]
7.806365 seconds (7.28 k allocations: 184.345 MiB, 0.98% gc time, 0.26% compilation time)

2.763133 seconds (2.64 k allocations: 80.335 MiB, 2.10% gc time)
3.105436 seconds (3.35 k allocations: 101.645 MiB, 1.87% gc time)
2.889880 seconds (2.78 k allocations: 83.787 MiB, 3.14% gc time)
2.118461 seconds (1.52 k allocations: 46.069 MiB, 0.87% gc time)
parallel assign : [1, 3, 6, 10]
8.015358 seconds (6.95 k allocations: 184.329 MiB, 1.36% gc time, 0.12% compilation time)
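My questions are:
- Why can the array still be updated from outside after I pass copy(arr_task) into a function?
- How else can this kind of loop be parallelized with threads?
- Does Julia's thread parallelism offer a variable attribute similar to firstprivate in OpenMP?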
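On the last question, one candidate worth checking is the `$` interpolation that Threads.@spawn supports (Julia 1.4 and later). According to the macro's documentation, it copies the interpolated value into the task's closure when the task is created, which looks similar in spirit to firstprivate. Below is a minimal sketch under that assumption. store_first! is a hypothetical stand-in for process! without the timing workload, and main_parallel_interp is an illustrative name, not something from par.jl.

```julia
# Hypothetical stand-in for process!: store the first element of `task` at index `t`.
function store_first!(stored, task, t)
    stored[t] = task[1]
    return nothing
end

function main_parallel_interp()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    @sync for time in 1:4
        # serial update, as in the original loop
        arr_task[1] = time + arr_task[1]
        # $(...) is evaluated here, in the spawning task, before the loop continues
        Threads.@spawn store_first!(arr_stored, $(copy(arr_task)), time)
    end
    return arr_stored
end

println(main_parallel_interp())  # prints [1, 3, 6, 10]
```

Note that `$arr_task` on its own would capture the same array object rather than a copy of its contents, so the explicit copy is still needed for a mutable array.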