最近我遇到了一些Julia数组循环的速度问题。一开始采用的是dot fusion的写法,发现可能有内存分配的速度影响,于是查询文档后决定改用loop的写法,结果却更慢了…
以下是我截取的函数例子:
x = range(0.0, 1.0, length=10000)
y = 0.5:0.5:1.5
z = 0.5:0.5:1.5
var = zeros(10000,3,3,3)
function foo(hx::StepRangeLen,hy::StepRangeLen,hz::StepRangeLen,
var::Array{Float64,4})::Array{Float64,3}
siz = size(var)::NTuple{4,Int64}
px = zeros(Float64,siz[1:3]...)
qy = zeros(Float64,siz[1:3]...)
rz = zeros(Float64,siz[1:3]...)
n = size(hx,1)
# Right now do nothing for the ghost cells maybe needed later!
# Take central differences on interior points
if n > 2
@time for k=1:siz[3], j=1:siz[2], i=1:n-2
px[i+1,j,k] = (var[i+2,j,k,1] - var[i,j,k,1])/(hx[i+2] - hx[i])
end
@time @. px[2:n-1,:,:] = (var[3:n,:,:,1] - var[1:n-2,:,:,1])/(hx[3:n] - hx[1:n-2])
end
px
end
px = foo(x,y,z,var);
速度上显示,dot fusion写法比 explicit loop快了5倍,内存分配也更少:
px = foo(x,y,z,var);
0.006217 seconds (175.37 k allocations: 2.676 MiB)
0.001222 seconds (18 allocations: 1.374 MiB)
我尝试了用@code_warntype 查看生成的type stability,有红色部分,但是我不太能看懂。
我的实际代码中有大量类似的数组操作。请问我哪里做得不对?