如何以最快的速度计算第n个fibonacci数

yzqiu0 · 2018 年9 月 27 日 16:28

去年在光学课上推导出了快速计算fibonacci数列的一个算法，当时以Python来计算，速度大概是幂矩阵算法的4倍，前个星期五再阮一峰老师的博客里面提到了Julia这门语言，于是了解了一下，现在正在学Julia，下面是我用Julia实现的快速计算fibonacci(n)的代码，计算第1亿个fibonacci数在我电脑上只需花费1.3秒，跟我以前在Linux用tcc调用GMP大数算法库的速度基本上是一样的，当然这得益于Julia封装了GMP库。如果您能写出更快的计算fibonacci函数的代码，请告知我，谢谢。

function fibonacci(n::Int)
	n < 2 && return n
	x, y, bits = big(0), big(1), []
	while n > 0
	    pushfirst!(bits, Bool(n&0x1))
	    n >>= 1
	end
	for bit ∈ bits[1:end-1]
		u, v = x * (x + 2y), (x^2 + y^2)
		x, y = bit ? (u + v, u) : (u, v)
	end
	u = x * (x + 2y)
	bits[end] ? u + x^2 + y^2 : u
end


map(x->println(x, "\t=>\t", fibonacci(x)), 0:10)
@time fib = fibonacci(100000000)
@time len = ceil(Int, log10(fib))
println(len)

我也用Julia写了一个简单的埃拉托斯特尼素数筛选函数，比我用C语言实现的大概要快1/3.
用Julia实现的代码很优雅。

function primeCount(maxn::Int)
	mark_array = trues(maxn)
	cnt = 1  # count for 2
	for i in 3:2:maxn
		if mark_array[i]
			cnt += 1
			j, delta = i^2, 2i
			while j <= maxn
				mark_array[j] = false
				j += delta
			end
		end
	end; cnt
end


@time println(primeCount(10^8))

Scheme · 2018 年9 月 27 日 17:33

用map的话会返回nothing，推荐直接写foreach(x->println(x, "\t=>\t", fibonacci(x)), 0:10)。

julia> using Combinatorics

julia> @time fibonaccinum(100000000);
  0.597373 seconds (683 allocations: 131.589 MiB, 1.73% gc time)

julia> @time fib = fibonacci(100000000);
  0.844977 seconds (1.46 k allocations: 210.346 MiB, 1.34% gc time)

Scheme · 2018 年9 月 27 日 17:40

用 @inbounds 可以加快速度

julia> function primeCount2(maxn::Int)
               mark_array = trues(maxn)
               cnt = 1  # count for 2
               @inbounds for i in 3:2:maxn
                       if mark_array[i]
                               cnt += 1
                               j, delta = i^2, 2i
                               while j <= maxn
                                       mark_array[j] = false
                                       j += delta
                               end
                       end
               end; cnt
       end
primeCount2 (generic function with 1 method)

运行速度最好用 BenchmarkTools 来测量

julia> using BenchmarkTools

julia> @btime primeCount(10^8)
  488.427 ms (3 allocations: 11.92 MiB)
5761455

julia> @btime primeCount2(10^8)
  446.679 ms (3 allocations: 11.92 MiB)
5761455

Scheme · 2018 年9 月 27 日 17:51

一个非常非常小的问题。因为Julia是个编译的语言，去信任编译器做这些基本的优化就好了。不怎么需要做这种超级优化。

julia> foo(n) = Bool(n&0x1)
foo (generic function with 1 method)

julia> goo(n) = n % 2 != 0
goo (generic function with 1 method)

julia> @code_native foo(1)
        .text
; Function foo {
; Location: REPL[84]:1
; Function Type; {
; Location: REPL[84]:1
        andb    $1, %dil
;}
        movl    %edi, %eax
        retq
        nopw    (%rax,%rax)
;}

julia> @code_native goo(1)
        .text
; Function goo {
; Location: REPL[85]:1
; Function !=; {
; Location: operators.jl:185
; Function ==; {
; Location: REPL[85]:1
        andb    $1, %dil
;}}
        movl    %edi, %eax
        retq
        nopw    (%rax,%rax)
;}

可见两者完全一样。顺便一提，Julia里也有 isodd 函数。

yzqiu0 · 2018 年9 月 27 日 17:55

主要是为了让代码看起来优雅一些。

Scheme · 2018 年9 月 27 日 17:56

容易看懂的代码更优雅

yzqiu0 · 2018 年9 月 27 日 18:03

谢谢啦，我才开始入门Julia，相信有社区的帮助，进步会很快。

yzqiu0 · 2018 年9 月 27 日 18:05

这个是直接调用GMP中的fib函数计算直接出结果的，哈
谢谢，见识不多，又了解了新东西。

Scheme · 2018 年9 月 27 日 18:08

我工作基本上不和整数接触，所以对算法没什么能说，或者有资格说的。有些小地方可以改进，但不会影响速度了。毕竟这个函数最慢的地方是用GMP。

这样写的话Julia不能推导 bits 的数据类型，所以因为你已经知道了 bits 的数据类型一定是 Vector{Bool} 或者 BitArray{1} 的话，可以写 Bool[] 或者 falses(0)。

Julia进行类似于 x[a:b] 这种操作的时候用的是拷贝，所以对速度有影响。

julia> a = zeros(3); b = a[1:2]; b[1] = 1
1

julia> a
3-element Array{Float64,1}:
 0.0
 0.0
 0.0

这里你可以写

for i in 1:lastindex(bits)-1
    bit = bits[i]
    ...
end

这样。如果可以保证不出界的话，那么可以加上 @inbounds 。

yzqiu0 · 2018 年9 月 27 日 18:15

问您一个问题，在Julia中，Bool类型只占用一个bit吗？

Scheme · 2018 年9 月 27 日 18:16

julia> a = falses(64);

julia> sizeof(a)
8

julia> sizeof(1) # Int64
8

yzqiu0 · 2018 年9 月 27 日 18:19

谢谢了。好像是动态计算的，sizeof(falses(8)) 也是8.

Scheme · 2018 年9 月 27 日 18:27

因为你的系统是64位的。最小也得64位，再小的话也没有速度优势。Julia的BitArray是用整数（或者一列整数）来实现的。

我说的大概是这么个道理

julia> for i in 1:10000
           v = falses(i)
           @assert sizeof(v) == 8cld(i, 8sizeof(UInt))
       end

julia>

Scheme · 2018 年9 月 27 日 18:48

啊，才发现我看错问题了。。。你问的是Bool类型不是Vector{Bool}或者BitArray。讨论Bool类型的大小没有太大意义，毕竟编译器很可能会优化掉Bool。现代CPU上不存在只有一个bit的数据。比如说，很多情况下Int16什么的也是用Int64模拟出来的。单独的Bool也是用8个bits。

https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/

Scheme · 2018 年9 月 27 日 19:54

值得注意的是，Vector{Bool}和BitArray{1}的储存方式并不一样。Vector{Bool}是字面意义上的一列Bool。但是BitArray{1}用的是一列整数，如第一个整数的第一个bit代表第一个元素。

julia> for i in 1:10000
           v1 = falses(i)
           v2 = Vector{Bool}(undef, i)
           @assert sizeof(v1) == 8cld(i, 8sizeof(UInt))
           @assert sizeof(v2) == i
       end

julia> typeof(falses(2))
BitArray{1}

julia> typeof(Vector{Bool}(undef, 2))
Array{Bool,1}

yzqiu0 · 2018 年9 月 28 日 00:58

我之所以这样问是因为上面的@time primeCount(10^8)显示占用内存11.92M左右，正是以每个Bool 1bit计算得到的结果。那么按你所说，这里用的就是BitArray{1}了。

freeman · 2018 年9 月 28 日 09:19

function primes(n::Int64)
nlen=ceil(n/2);
nlen=convert(Int64,nlen)
p=trues(nlen);
q = length(p);
temp=sqrt(convert(Float64,n));
ua = floor(sqrt(n));
ub=convert(Int64,ua)
for k = 3:2:ub
if p[div(k+1,2)]
for i=div(k*k+1,2):k:q
p[i] = false;
end
end
end
s=0
for px in p
if px
s += 1
end
end
return s
end
这样更快

yzqiu0 · 2018 年9 月 28 日 11:39

请问下q = length§是什么意思，这句编译会出错。

freeman · 2018 年9 月 28 日 11:42

TIM%E5%9B%BE%E7%89%8720180928194257
不知道为什么复制粘贴竟然出错

freeman · 2018 年9 月 28 日 12:01

bool是占用8个字节，和Int8，uint8一样的占用