Deploying Julia tools

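xgdgsc · June 24, 2019, 12:58

I recently wrote a Julia program that processes a batch of CSV text files, computes some statistics, and generates tables. My impression is that Julia's biggest problem right now is that loading packages is very slow: `using DataFrames`, CSV, and a few other packages takes about 10 s, and even after compiling with PackageCompiler it still takes 3 s. The first CSV read into a DataFrame also takes 5 s, while every later read takes under 0.001 s, and the computation part takes under 0.1 s on its second run (the total run time can exceed 30 s). So for now I use Genie to write a web service that keeps a single julia process running, and I hacked together a command-line program in golang that sends HTTP requests to call it. Does anyone have a better approach?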
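The Go side of a setup like this can be sketched as follows. This is a minimal sketch under stated assumptions, not the poster's actual program: the `/stats` endpoint, the `path` query parameter, the base URL `http://127.0.0.1:8000`, and the helper names `buildRequest`, `callService`, and `newClient` are all hypothetical and would need to match whatever routes the Genie service really exposes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// newClient returns an HTTP client with a generous timeout, since the very
// first request may still hit first-call compilation inside the Julia process.
func newClient() *http.Client {
	return &http.Client{Timeout: 60 * time.Second}
}

// buildRequest constructs a GET request for the hypothetical /stats endpoint.
// The "path" query parameter (an assumption) names the CSV file to process.
func buildRequest(baseURL, csvPath string) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, fmt.Errorf("parse base URL: %w", err)
	}
	u.Path = "/stats"
	q := url.Values{}
	q.Set("path", csvPath)
	u.RawQuery = q.Encode()
	return http.NewRequest(http.MethodGet, u.String(), nil)
}

// callService sends the request to the long-running Julia service and
// returns the response body as text.
func callService(client *http.Client, baseURL, csvPath string) (string, error) {
	req, err := buildRequest(baseURL, csvPath)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", fmt.Errorf("send request: %w", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("read response: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("service returned %s: %s", resp.Status, body)
	}
	return string(body), nil
}
```

A `main` function would call `callService(newClient(), "http://127.0.0.1:8000", os.Args[1])` and print the result. Because the julia process stays alive between calls, each invocation pays only for the HTTP round trip, not for package loading and first-call compilation.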

June-6th · June 25, 2019, 10:42

I currently keep a julia process alive with Jupyter plus IJulia. It can display images, but automating it or embedding it in other programs may be more trouble.

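Roger · June 26, 2019, 01:12

The root cause of your problem is that compilation results that cut across multiple modules are not cached, and there is no particularly good solution for this at the moment. In the future, a global cache will probably be provided per environment, which should improve the situation. There are still workarounds, though.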

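In principle, we want to compile the combinations of the generic functions we use and their specific types, and then store the result. So we can trace a finished application to obtain the type and method information that needs to be compiled. This relies on SnoopCompile.

SnoopCompile can automatically generate precompile directives, so that most of the functions that are actually used get statically precompiled ahead of time.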

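Next, we want to build on this approach and compile a static library that stores these compilation results. The tool for this is Fezzik: it runs these functions once to trace them, collects the type information and precompile directives, and finally compiles a sysimg library for you.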

For an application, though, ApplicationBuilder may be the better choice; you will need to try it against your own situation.

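I also expect that a static compilation feature built into the compiler will arrive in the next version, at the end of this year or next year: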

github.com/JuliaLang/julia

[Julep/WIP] Standalone AOT compilation mode

JuliaLang:jn/codegen-norecursion ← tshort:standalone-mode-new · opened by tshort, 12:12AM - 10 Jun 19 UTC · +1348 -26

This mode of compilation aims to statically compile Julia code to libraries or executables that do not need a system image. This will allow Julia to support more use cases:

* Smaller standalone executables with faster startup.
* Compilation to standalone libraries. For example, R or Python packages could link to Julia binary libraries.
* Cross compilation to more limited systems. This could be an embedded system or WebAssembly for web apps.

To support these modes, the following compilation targets could be supported:

* A shared library that links to the `libjulia` shared library.
* An executable that links to the `libjulia` shared library.
* An object file meant to dynamically link to the `libjulia` shared library.

In addition to these, we'd also like to support these same targets, but statically link to `libjulia.a` for smaller standalone executables or libraries.

My main interest is compilation to WebAssembly (see [this issue](https://github.com/Keno/julia-wasm/issues/5)). See [here](https://tshort.github.io/wasm-playing/example/) for a simple web app compiled using this branch of Julia.

## Approach

This is based on @vtjnash's work on [jn/codegen-norecursion](https://github.com/JuliaLang/julia/pull/25984). That capability will be great to have for codegen work. Hopefully, that can be merged soon.

This approach works by introducing a `standalone-aot-mode` into Julia's code generation process. This is similar to the `imaging-mode`. The main differences are:

* `ccall` -- `foreigncall`'s normally are converted to calls to function pointers. In `standalone-aot-mode`, these are compiled to normal external function calls to be resolved at link time.
* `cglobal` -- As with `ccall`'s, these are compiled to normal external references.
* *Global variables* -- This is a tricky part. Global variables (symbols, strings, and Julia global variables) are serialized to a "mini image" (a binary array). An initialization function is provided to restore the global variables upon startup. The serialization code reuses the machinery in "src/dump.c". Some non-core structs and types are converted to tuples or other types that have the same memory layout.
* *Initialization* -- This is another tricky part. Initialization includes a simplified version of `jl_init` that does not load the standard library. It initializes many types, including some defined in `base/boot.jl`.

## Miscellaneous notes

* Generic code that uses `jl_invoke()` or `jl_apply_generic()` isn't supported. A warning is currently issued for code that is compiled with either of these. This often includes error-handling code.
* `cfunction` isn't supported. I'm not sure how to handle that.
* The tests target Linux. The tests currently use `julia-debug`.
* There's a garbage-collection bug lurking somewhere. For at least the `rand()` test, it crashes unless GC is disabled.

## Feedback / next Steps

I'm looking forward to guidance on steps needed to get this into Julia as an experimental feature. This includes tests and code cleanups. If anyone one sees any big gotcha's or problems with the approach, that discussion would help, too.

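That should improve the situation considerably. In addition, version 1.2 brings many compile-latency improvements and will be available in early August. You can also try the current release candidate now.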

xgdgsc · June 26, 2019, 01:42

Aren't ApplicationBuilder and PackageCompiler more or less the same? Is the speed of the compiled result the same as well?

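Roger · June 26, 2019, 01:46

PackageCompiler is a static compiler. Whether its output is fast on the first run depends on whether your precompile directives give complete coverage: Julia is fundamentally a dynamic language, so type information generally cannot be determined until run time. Fezzik also depends on PackageCompiler; the difference is that Fezzik traces automatically, that is, it runs the program once and then generates the compile directives, which tells the compiler exactly which concrete types need to be compiled. Without that, ahead-of-time compilation is impossible in theory, because a generic function can have infinitely many possible specializations.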

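shilu1984 · July 5, 2019, 05:19

It would be great if Julia could cache compilation results. Then functions would not have to be compiled again the first time they run; commonly used functions such as zeros have already been executed who knows how many times.
The practical consequence is that debugging is unpleasant: change a little code and you have to include it again and recompile.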

woclass · July 5, 2019, 11:47

Try timholy/Revise.jl: Automatically update function definitions in a running Julia session

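Roger · July 15, 2019, 04:29

An environment cache is under discussion. The main problem right now is that when a function from module A is compiled for a type from module B, it is unclear whose cache the result belongs to (it is neither A's nor B's), so the plan is to cache it globally in the environment. I expect this to improve things a lot. That said, compile speed in 1.2 and 1.3 has already improved a great deal, so in some cases the latency is not that noticeable even without caching.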

yym0924 · June 16, 2020, 03:26

So what is the current status of the static compilation feature built into the compiler?