我正在写一个量子多体计算中的精确对角化的程序,因为Python过慢所以转向Julia,以下是我的两个程序,其中Julia是按照Python写的。
结果发现在Windows系统上当sparse_hamiltonian()和Hamiltonian()第二个参数传入14时,python用时4s,限制OpenBLAS线程后5s,Julia用时5s,当传入16时,python用时70s,限制OpenBLAS线程后120s,Julia用时166s,并且第二次运行158s,第三次150s。我的代码耗时主要是写矩阵的部分以及对矩阵进行对角化两部分。详细看了一下用时其中写矩阵python限制OpenBLAS线程是24s, julia是26s,那么np.linalg.eigvalsh和eigvals函数分别是100s, 140s。我也尝试将Julia的矩阵写成Symmetric(Hamiltonian(1,16))或者是Hermitan(Hamiltonian(1,16))的形式,但反而更慢了。当用Mac OS系统跑完全相同的代码的时候,情况正好反过来,python用时8min(限制OpenBLAS线程后),不限制跑用了7min09s,Julia用时60s。
我的Windows系统是Windows10,AMD Ryzen5 4600H, 1650, python 3.7.12, NumPy 1.21.6,Julia 1.8.5; Mac OS是Ventura 13.2.1,M1 Max,numpy 1.24.2, python 3.10.9, julia 1.8.5。目前我的Linux系统机器正在升级,暂时跑不了代码。
请问为什么:
- 我的程序Windows系统上Julia比Python慢?不仅慢在对于数组的遍历与读写上,而且在矩阵的对角化上。如果我想优化,还可以怎么优化,我理解的Julia应该是默认可以跑多线程,但是似乎在Windows上为单核。(很抱歉,我还没有学习Julia性能部分。是需要提前声明变量吗?)
- 为什么在Mac OSJulia比Python快?
1.21.6的numpy.show_config()如下,
blas_info:
libraries = ['cblas', 'blas', 'cblas', 'blas', 'cblas', 'blas']
library_dirs = ['D:/Anaconda3/envs/quantum\\Library\\lib']
include_dirs = ['D:/Anaconda3/envs/quantum\\Library\\include']
language = f77
define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
libraries = ['cblas', 'blas', 'cblas', 'blas', 'cblas', 'blas']
language = f77
lapack_info:
libraries = ['lapack', 'blas', 'lapack', 'blas']
language = f77
lapack_opt_info:
libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas', 'cblas', 'blas']
language = f77
define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
not found = AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX,AVX512_CNL
LinearAlgebra.BLAS.get_config()如下:
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.dll
哈密顿量形式为:
where on-site potentials \omega_i are independent Gaussian random numbers with mean zero and variance W^2
import os
import numpy as np
import scipy.sparse as sp
import datetime
os.environ["OMP_NUM_THREADS"] = "1" # export OMP_NUM_THREADS=4
os.environ["OPENBLAS_NUM_THREADS"] = "1" # export OPENBLAS_NUM_THREADS=4
os.environ["NUMEXPR_NUM_THREADS"] = "1" # export NUMEXPR_NUM_THREADS=6
t0 = datetime.datetime.now()
#这一步是生成所有占据数为N/2的基态,比如N=4的话就是即0011,0101,0110,1001,1010,1100一系列的态。最终返回一个列表,里面都是二次量子化基矢,以列表的形式存储。
def half_filling(N):
basis = []
for x in range(0, 2**N):
xb = bin(x)[2:].zfill(N)
if xb.count('1') == N//2:
basis.append(list(map(int, xb)))
return basis
#这一部分是写哈密顿量,具体来说是用前文生成的半填充的基矢来展开哈密顿量,
def sparse_hamiltonian(W,N,t=1,tprime=1,V=2):
basis=half_filling(N)
basis=[list(i) for i in basis]
L=len(basis)
H=sp.dok_matrix((L,L))
omegalis=np.ones(N) #这里应为np.normal(0,W**2)
for row in range(L):
rowstate=basis[row]
temp=0
#先写对角元素
for i in range(N):
if rowstate[i]==1:
temp+=omegalis[i]
if rowstate[(i+1)%N]==1:
temp+=V/4
else:
temp-=V/4
else:
if rowstate[(i+1)%N]==1:
temp-=V/4
else:
temp+=V/4
H[row,row]=temp
#再写非对角元素,因为c_i^{\dagger}c_{i+1}保持粒子数守恒所以作用后的基仍在我们已经写好的基矢空间内。遍历每一个基矢作为矩阵的横坐标,如果是比如0110态,就找对应的1010态作为纵坐标,以此类推。如果你不熟悉,可以参见Sandvik的精确对角化教程。
for j in range(N):
if rowstate[j] == 0 and rowstate[(j+1)%N] == 1:
newcolumnstate = rowstate.copy()
newcolumnstate[j] = 1
newcolumnstate[(j+1)%N] = 0
column = basis.index(newcolumnstate)
H[row, column] += t
elif rowstate[j] == 0 and rowstate[(j+2)%N] == 1:
newcolumnstate = rowstate.copy()
newcolumnstate[j] = 1
newcolumnstate[(j+2)%N] = 0
column = basis.index(newcolumnstate)
H[row, column] += tprime
H = H + H.T
for i in range(L):
H[i, i] /= 2
return H.toarray()
energy=np.linalg.eigvalsh(sparse_hamiltonian(1,16))
print(datetime.datetime.now()-t0)
using SparseArrays
using LinearAlgebra
function half_filling(N)
basis = []
for x in range(0, 2^N-1)
xb = string(x, base=2, pad=N)
if count('1', xb) == N÷2
push!(basis, map(x -> parse(Int64, x), split(xb, "")))
end
end
return basis
end
function Hamiltonian(W,N,t=1,tprime=1,V=2)
basis=half_filling(N)
L=length(basis)
H=spzeros(L,L)
omegalis=ones(N)
for row in 1:L
rowstate=basis[row]
temp=0
for i in 1:N
if rowstate[i]==1
temp+=omegalis[i]
if rowstate[mod1(i+1,N)]==1
temp+=V/4
else
temp-=V/4
end
else
if rowstate[mod1(i+1,N)]==1
temp-=V/4
else
temp+=V/4
end
end
H[row,row]=temp
end
for j in 1:N
if rowstate[j]==0 && rowstate[mod1(j+1,N)]==1
newcolumnstate=copy(rowstate)
newcolumnstate[j]=1
newcolumnstate[mod1(j+1,N)]=0
column = findfirst(x -> isequal(x, newcolumnstate), basis)
H[row,column]+=t
elseif rowstate[j] == 0 && rowstate[mod1(j+2,N)] == 1
newcolumnstate = copy(rowstate)
newcolumnstate[j] = 1
newcolumnstate[mod1(j+2,N)] = 0
column = findfirst(x -> isequal(x, newcolumnstate), basis)
H[row, column] += tprime
end
end
end
H=H+transpose(H)
for k in 1:L
H[k,k]/=2
end
return Matrix(H)
end
@time eigvals(Hamiltonian(1,16))