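I want to compute a metric for each human chromosome, parallelize over chromosomes, and write the results to a single CSV file at the end. As soon as I add the Threads.@threads macro the loop throws an error; without the macro the same code runs fine.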
using CSV, DataFrames, Dates
......
Threads.@threads for number in 1:chr.ngroups  # one group per chromosome
    chromosome = chr[number]
    start = chromosome[1, 3]; final = chromosome[end, 3]
    chromosomename = chromosome[1, 1]
    println('[', Dates.format(now(), "YYYY-m-d HH:MM:SS"), ']', " Start calculating ", chromosomename, "'s pdrs")
    chdf = generatechdf(reader, chromosomename, start, final)  # my own function
    chdf = lastchdf(chdf)
    pdrs = repeat([(NaN, 0, 0, 0)], nrow(chromosome))
    pdrspos = calculatepospdrs(chdf, chromosome)  # my own function
    pdrsneg = calculatenegpdrs(chdf, chromosome)  # my own function
    pdrs[findall(==("C"), chromosome.Column2)] = pdrspos
    pdrs[findall(==("G"), chromosome.Column2)] = pdrsneg
    pd = vcat(pd, rename!(DataFrame(pdrs), [:pdr, :discordant, :sum, :allsum]))  # this vcat appends each chromosome's result
    println('[', Dates.format(now(), "YYYY-m-d HH:MM:SS"), ']', " Finish calculating ", chromosomename, "'s pdrs")
end
[2022-10-18 14:40:23] Start calculating chr16's pdrs
[2022-10-18 14:40:23] Start calculating chr21's pdrs
[2022-10-18 14:40:23] Start calculating chr7's pdrs
[2022-10-18 14:40:23] Start calculating chr1's pdrs
ERROR: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:334 [inlined]
[2] threading_run(func::Function)
@ Base.Threads ./threadingconstructs.jl:38
[3] top-level scope
@ ./threadingconstructs.jl:97
nested task error: BGZFStreams.BGZFDataError("invalid gzip identifier")
Stacktrace:
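I have a second question. Inside the for loop I vcat each chromosome's result, and I want the output ordered as chromosome 1, 2, 3, and so on. With threads, though, the chromosomes are not processed in that order. Also, if two chromosomes finish at the same time, could the concurrent vcat calls corrupt the result?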
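To make the ordering concern concrete, here is a minimal sketch of one common pattern: each iteration writes into its own preallocated slot, and the concatenation happens once after the loop. It is only an illustration, not the actual pipeline. `compute_block` is a hypothetical stand-in for the per-chromosome calculation, and plain vectors are used instead of DataFrames so the snippet runs without any packages.

```julia
# Hypothetical stand-in for the per-chromosome work: returns a small block of rows.
compute_block(i) = [(i, j) for j in 1:3]

n = 5  # assumed number of chromosomes for this sketch
results = Vector{Vector{Tuple{Int,Int}}}(undef, n)

Threads.@threads for i in 1:n
    # Each iteration assigns only to its own slot, so no two tasks
    # write to the same element and no shared variable is rebound.
    results[i] = compute_block(i)
end

# Concatenate once, after every task has finished. The order follows
# the slot index, not the order in which the tasks completed.
combined = reduce(vcat, results)
```

With this layout the final order is determined by the index `i`, so it should not matter which chromosome finishes first.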
I then commented out the vcat call and ran it again:
Threads.@threads for number in 1:chr.ngroups
    chromosome = chr[number]
    start = chromosome[1, 3]; final = chromosome[end, 3]
    chromosomename = chromosome[1, 1]
    println('[', Dates.format(now(), "YYYY-m-d HH:MM:SS"), ']', " Start calculating ", chromosomename, "'s pdrs")
    chdf = generatechdf(reader, chromosomename, start, final)
    chdf = lastchdf(chdf)
    pdrs = repeat([(NaN, 0, 0, 0)], nrow(chromosome))
    pdrspos = calculatepospdrs(chdf, chromosome)
    pdrsneg = calculatenegpdrs(chdf, chromosome)
    pdrs[findall(==("C"), chromosome.Column2)] = pdrspos
    pdrs[findall(==("G"), chromosome.Column2)] = pdrsneg
    #pd = vcat(pd, rename!(DataFrame(pdrs), [:pdr, :discordant, :sum, :allsum]))
    println('[', Dates.format(now(), "YYYY-m-d HH:MM:SS"), ']', " Finish calculating ", chromosomename, "'s pdrs")
end
[2022-10-18 20:17:49] Start calculating chr21's pdrs
[2022-10-18 20:17:49] Start calculating chr1's pdrs
[2022-10-18 20:17:49] Start calculating chr7's pdrs
[2022-10-18 20:17:49] Start calculating chr16's pdrs
ERROR: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:334 [inlined]
[2] threading_run(func::Function)
@ Base.Threads ./threadingconstructs.jl:38
[3] top-level scope
@ ./threadingconstructs.jl:97
nested task error: BGZFStreams.BGZFDataError("invalid gzip identifier")
Stacktrace:
[1] bgzferror(message::String)
@ BGZFStreams ~/anaconda3/envs/julia/share/julia/packages/BGZFStreams/bsx6S/src/bgzfstream.jl:350
[2] read_bgzf_block!(input::IOStream, block::Vector{UInt8})
@ BGZFStreams ~/anaconda3/envs/julia/share/julia/packages/BGZFStreams/bsx6S/src/bgzfstream.jl:415
[3] read_blocks!(stream::BGZFStreams.BGZFStream{IOStream})
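It still fails. Do I need to wrap the loop body in a separate function? If anyone with experience in parallel Julia could point me in the right direction, I would be very grateful.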
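One thing worth checking, although it is only a guess from the stack trace: every iteration passes the same `reader` to `generatechdf`, and the failure is raised while reading a BGZF block. If several tasks seek and read on one underlying stream at the same time, a task can end up reading from a position another task moved to, which might explain an "invalid gzip identifier". The sketch below shows the general idea of serializing access to a shared stream with a lock. It uses an in-memory `IOBuffer` as a stand-in for the real reader, and `read_range` is a hypothetical helper, not part of BGZFStreams.

```julia
# In-memory stand-in for a shared file handle (assumption for this sketch).
data = collect(UInt8, 0:99)
io = IOBuffer(data)
lk = ReentrantLock()

# Hypothetical helper: seek and read as one step while holding the lock,
# so another task cannot move the stream position in between.
function read_range(io, lk, offset, len)
    lock(lk) do
        seek(io, offset)   # offset is a 0-based byte position
        read(io, len)      # returns a Vector{UInt8} of up to len bytes
    end
end

chunks = Vector{Vector{UInt8}}(undef, 10)
Threads.@threads for i in 1:10
    chunks[i] = read_range(io, lk, (i - 1) * 10, 10)
end
```

A lock serializes the reads, so only the computation after the read runs in parallel. Opening a separate reader inside each iteration would be another option, assuming the file format and library allow several independent readers on the same file.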