Hi everyone, when searching through a large dataset, how can I speed things up? I originally suspected that `findall` had a performance problem similar to `findnext`'s, but after searching the issues I found nothing of the sort. So I rewrote a version that avoids `findall`, yet the benchmark shows hardly any improvement. After further testing, it seems to be related to the data size: `dov_all` has about 3 million rows, and cutting it down to 10,000 rows gives an obvious speedup. I suspect this is not a problem with the Julia language itself, but I'd still appreciate any advice.
# Return all rows of all_data whose first two columns lie within ±limit
# of (lon, lat). Note: Float64 is a concrete type, so `where {T<:Float64}`
# only ever matches T == Float64; the signature can say so directly.
function search_point(lon::Float64, lat::Float64, limit::Float64, all_data::Matrix{Float64})
    flag1 = findall(x -> lon + limit >= x >= lon - limit, all_data[:, 1])  # all_data[:, 1] copies the column
    flag2 = findall(x -> lat + limit >= x >= lat - limit, all_data[:, 2])
    return all_data[intersect(flag1, flag2), :]
end
julia> @benchmark point_list=search_point(110.0,10.0,2/60/2,$dov_all)
BenchmarkTools.Trial:
memory estimate: 30.75 MiB
allocs estimate: 75
--------------
minimum time: 11.407 ms (0.00% GC)
median time: 11.823 ms (0.00% GC)
mean time: 13.111 ms (10.63% GC)
maximum time: 20.530 ms (43.83% GC)
--------------
samples: 381
evals/sample: 1
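Part of the 30.75 MiB above comes from `all_data[:, 1]` and `all_data[:, 2]` each copying a full column before `findall` even runs. A minimal sketch of the same function using `@views` so the two scans read the columns in place (the function name `search_point_views` is mine; any speedup depends on how much of the cost is those copies):

```julia
# Sketch: same logic as search_point above, but the column slices become
# views, so findall scans the matrix columns without copying them first.
function search_point_views(lon::Float64, lat::Float64, limit::Float64,
                            all_data::Matrix{Float64})
    @views begin
        flag1 = findall(x -> lon - limit <= x <= lon + limit, all_data[:, 1])
        flag2 = findall(x -> lat - limit <= x <= lat + limit, all_data[:, 2])
    end
    return all_data[intersect(flag1, flag2), :]
end
```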
# Same search with boolean masks instead of index vectors. Note the strict
# > comparisons here versus the >= in the findall version, so points
# exactly on the boundary are treated differently.
function search_point(lon::Float64, lat::Float64, limit::Float64, all_data::Matrix{Float64})
    flag1 = @. lon + limit > all_data[:, 1] > lon - limit
    temp = all_data[flag1, :]
    flag2 = @. lat + limit > temp[:, 2] > lat - limit
    return temp[flag2, :]
end
julia> @benchmark point_list=search_point(110.0,10.0,2/60/2,$dov_all)
BenchmarkTools.Trial:
memory estimate: 15.60 MiB
allocs estimate: 29
--------------
minimum time: 11.108 ms (0.00% GC)
median time: 11.679 ms (0.00% GC)
mean time: 12.380 ms (5.61% GC)
maximum time: 21.287 ms (19.34% GC)
--------------
samples: 404
evals/sample: 1
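Both versions still sweep the full matrix column by column and build intermediate index vectors or masks. A hedged sketch of a single-pass variant (the name `search_point_loop` is mine) that checks both coordinates per row and allocates only the index list and the result:

```julia
# Sketch: one pass over the rows, testing both coordinates at once.
# Collect matching row indices, then materialize the result in one slice.
function search_point_loop(lon::Float64, lat::Float64, limit::Float64,
                           all_data::Matrix{Float64})
    keep = Int[]
    @inbounds for i in axes(all_data, 1)
        if abs(all_data[i, 1] - lon) <= limit && abs(all_data[i, 2] - lat) <= limit
            push!(keep, i)
        end
    end
    return all_data[keep, :]
end
```

This uses closed intervals (`<=`) like the `findall` version; it will not change the asymptotics, since every row is still visited once.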
julia> @benchmark point_list=search_point(110.0,10.0,2/60/2,$dov_all[1:10000,:])
BenchmarkTools.Trial:
memory estimate: 419.38 KiB
allocs estimate: 25
--------------
minimum time: 103.835 μs (0.00% GC)
median time: 110.987 μs (0.00% GC)
mean time: 140.061 μs (17.53% GC)
maximum time: 12.806 ms (98.27% GC)
--------------
samples: 10000
evals/sample: 1
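The time scaling with row count is expected: every version above is a linear scan, so 3 million rows cost roughly 300× what 10,000 rows do. If many queries hit the same `dov_all`, one common approach (a sketch under my own assumptions, not tested on your data) is to sort the matrix by longitude once up front, then use `searchsortedfirst`/`searchsortedlast` to binary-search the longitude band for each query and filter only that band on latitude:

```julia
# Sketch: presort rows by longitude once, then binary-search the lon band
# per query and linearly filter the (much smaller) band on latitude.
function sort_by_lon(all_data::Matrix{Float64})
    return all_data[sortperm(all_data[:, 1]), :]
end

function search_point_sorted(lon::Float64, lat::Float64, limit::Float64,
                             sorted::Matrix{Float64})
    col1 = @view sorted[:, 1]
    lo = searchsortedfirst(col1, lon - limit)  # first row with col1 >= lon - limit
    hi = searchsortedlast(col1, lon + limit)   # last row with col1 <= lon + limit
    band = @view sorted[lo:hi, :]
    flag = [lat - limit <= band[i, 2] <= lat + limit for i in axes(band, 1)]
    return band[flag, :]
end
```

The one-time sort is O(n log n), but each query then costs O(log n) plus the size of the longitude band, instead of a full 3-million-row scan.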