Problem with copying an array in thread-parallel code

I want to implement a thread-parallel sample of a pattern that is common when solving time-evolution problems (see main_serial in the code below).

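A matrix arr_task is generated as time advances. It is updated on every loop iteration (time step), and the arr_task produced in the current iteration depends on the arr_task from the previous one, so arr_task has to be generated serially. Each arr_task is then passed to process!() and the result is written to arr_stored. In the end, arr_stored holds the result of processing arr_task at every time step.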

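main_serial in the code is the serial version. To parallelize it, I tried passing copy(arr_task) to process!() inside Threads.@spawn (main_parallel_copy), but the result differs from the serial program. Judging from the output, there appears to be a data race. With help from members of the Julia Chinese chat group, I changed it to main_parallel_assign: first assign a copy to a temporary, arr_temp = copy(arr_task), then pass arr_temp to process!(). Only with that change did I get a correct result together with the parallel speedup.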
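A possible explanation, offered as an assumption rather than a confirmed diagnosis: a task macro wraps its whole expression in a closure, so an argument such as copy(arr_task) would only be evaluated once the task actually runs, by which time the loop may already have updated the array. The sketch below uses @task plus schedule instead of Threads.@spawn so that the ordering is deterministic; lazy_snapshot and eager_snapshot are hypothetical names that do not appear in par.jl.

```julia
# Hypothetical minimal reproduction of deferred argument evaluation.
# `lazy_snapshot` and `eager_snapshot` are illustrative names only.

function lazy_snapshot()
    arr = [1]
    # The whole expression, including copy(arr), is stored in a closure
    # and is not evaluated until the task runs.
    t = @task copy(arr)[1]
    arr[1] = 2          # the array is mutated before the task starts
    schedule(t)
    return fetch(t)     # the copy was taken after the mutation
end

function eager_snapshot()
    arr = [1]
    tmp = copy(arr)     # the copy is taken now, on the calling task
    t = @task tmp[1]
    arr[1] = 2          # mutating arr no longer affects tmp
    schedule(t)
    return fetch(t)
end

println(lazy_snapshot())   # prints 2
println(eager_snapshot())  # prints 1
```

If this reading is right, main_parallel_copy corresponds to lazy_snapshot and main_parallel_assign to eager_snapshot, except that with Threads.@spawn the outcome depends on timing instead of being fixed.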

Environment:

  1. win10: Microsoft Windows [Version 10.0.17763.4252]
  2. julia Version 1.8.5
  • How to run: julia -t4 par.jl

This is the code in the par.jl script:

function main_serial()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    for time in 1:4
        # get a task, should be serial
        arr_task[1] = time + arr_task[1]
        sleep(1)

        # handle the task
        process!(arr_stored, arr_task, time)
    end
    println("serial : $arr_stored")
end

function main_parallel_copy()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    @sync for time in 1:4
        # get a task, should be serial
        # println("copy!")
        arr_task[1] = time + arr_task[1]
        sleep(1)

        # handle the task, but parallel on threads
        Threads.@spawn process!(arr_stored, copy(arr_task), time)
    end
    println("parallel copy : $arr_stored")
end

function main_parallel_assign()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    @sync for time in 1:4
        # get a task, should be serial
        arr_task[1] = time + arr_task[1]
        sleep(1)
        arr_temp = copy(arr_task)

        # handle the task, but parallel on threads
        Threads.@spawn process!(arr_stored, arr_temp, time)
    end
    println("parallel assign : $arr_stored")
end

function process!(stored, task, t)
    # time of processing
    @time begin
        a = rand(100, 100)
        [exp(a) for i in 1:100]
    end
    stored[t] = task[1]
end

@time main_serial()
println()
@time main_parallel_copy()
println()
@time main_parallel_assign()

This is the output:

  2.040268 seconds (1.50 k allocations: 46.069 MiB, 2.20% gc time)
  2.136096 seconds (1.50 k allocations: 46.069 MiB, 1.82% gc time)
  2.072491 seconds (1.50 k allocations: 46.069 MiB, 1.06% gc time)
  2.073747 seconds (1.50 k allocations: 46.069 MiB, 0.64% gc time)
serial : [1, 3, 6, 10]
 12.372814 seconds (6.39 k allocations: 184.297 MiB, 0.96% gc time, 0.32% compilation time)

  2.692697 seconds (2.70 k allocations: 82.098 MiB, 1.35% gc time)
  2.884239 seconds (3.36 k allocations: 101.799 MiB, 1.26% gc time)
  2.676654 seconds (2.80 k allocations: 84.476 MiB, 1.36% gc time)
  2.124010 seconds (1.52 k allocations: 46.069 MiB, 1.87% gc time)
parallel copy : [3, 6, 10, 10]
  7.806365 seconds (7.28 k allocations: 184.345 MiB, 0.98% gc time, 0.26% compilation time)

  2.763133 seconds (2.64 k allocations: 80.335 MiB, 2.10% gc time)
  3.105436 seconds (3.35 k allocations: 101.645 MiB, 1.87% gc time)
  2.889880 seconds (2.78 k allocations: 83.787 MiB, 3.14% gc time)
  2.118461 seconds (1.52 k allocations: 46.069 MiB, 0.87% gc time)
parallel assign : [1, 3, 6, 10]
  8.015358 seconds (6.95 k allocations: 184.329 MiB, 1.36% gc time, 0.12% compilation time)

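My questions are:

  1. Why can the array still be updated from outside after it has been copied into a function?
  2. How else can this kind of loop be thread-parallelized?
  3. Does Julia's thread parallelism have anything like the firstprivate variable attribute in OpenMP?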
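
Regarding question 3, one candidate worth checking is the `$` interpolation that Threads.@spawn documents: an interpolated value is evaluated in the spawning task and copied into the task's closure. The sketch below is a guess at how that could be applied here, not a confirmed equivalent of firstprivate. process_sketch! and main_parallel_interp are hypothetical names, and the expensive computation and sleep(1) from par.jl are left out to keep the example short.

```julia
# Hypothetical sketch: snapshot the array at spawn time via `$` interpolation.
# `process_sketch!` and `main_parallel_interp` are illustrative names only.

function process_sketch!(stored, task, t)
    # Stand-in for process!() without the heavy computation.
    stored[t] = task[1]
    return nothing
end

function main_parallel_interp(n = 4)
    arr_stored = zeros(Int, n)
    arr_task = zeros(Int, 1)
    @sync for time in 1:n
        # get a task, should be serial
        arr_task[1] = time + arr_task[1]

        # $(copy(arr_task)) is evaluated here, in the spawning task,
        # before the loop can update arr_task again.
        Threads.@spawn process_sketch!(arr_stored, $(copy(arr_task)), time)
    end
    return arr_stored
end

println("parallel interp : ", main_parallel_interp())
```

Each spawned task writes to a different index of arr_stored, so the tasks do not contend over the same element. Whether this is preferable to the explicit arr_temp = copy(arr_task) form is mostly a matter of taste, since both take the snapshot before the task is scheduled.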