Why is the size of my struct not equal to the sum of its field sizes?

I defined a struct as follows:

struct CodeInfo
    sec_type::Int32
    sec_name::NTuple{24,UInt8}
    date::Int32
    high_limited::UInt32
    low_limited::UInt32
    multiplier::Int32
    margin_ratio::Int32
    price_tick::Int32
    capital::Int64
    cap_change_date::UInt32
    trade_date_in::UInt32
    trade_date_out::UInt32
    is_halt::UInt8
    margin_unit::UInt32
    margin_ratio_param1::Int32
    margin_ratio_param2::Int32
end

Then calling

sizeof(CodeInfo)

returns 96.
But when I add up the sizes of the individual fields like this:

sum(sizeof.(CodeInfo.types))

I get 85. (Adding up the field sizes by hand gives the same total.)
Why is that?
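The 11-byte difference is alignment padding. One way to see exactly where it goes is to print each field's offset with `fieldoffset`. The sketch below repeats the same `CodeInfo` definition so it runs on its own; `show_layout` is a hypothetical helper name, and the numbers in the note assume a typical 64-bit platform where `Int64` is 8-byte aligned.

```julia
# Same fields as CodeInfo above; Julia lays them out with C-compatible (natural) alignment
struct CodeInfo
    sec_type::Int32
    sec_name::NTuple{24,UInt8}
    date::Int32
    high_limited::UInt32
    low_limited::UInt32
    multiplier::Int32
    margin_ratio::Int32
    price_tick::Int32
    capital::Int64
    cap_change_date::UInt32
    trade_date_in::UInt32
    trade_date_out::UInt32
    is_halt::UInt8
    margin_unit::UInt32
    margin_ratio_param1::Int32
    margin_ratio_param2::Int32
end

# Hypothetical helper: print offset, size and any padding inserted before each field
function show_layout(T::DataType)
    next_free = 0  # first byte after the previous field
    for i in 1:fieldcount(T)
        off = Int(fieldoffset(T, i))
        sz = sizeof(fieldtype(T, i))
        pad = off - next_free
        note = pad > 0 ? "  <- $pad byte(s) of padding before this field" : ""
        println(rpad(string(fieldname(T, i)), 20), " offset=", off, " size=", sz, note)
        next_free = off + sz
    end
    println("tail padding: ", sizeof(T) - next_free, " byte(s); sizeof = ", sizeof(T))
end

show_layout(CodeInfo)
```

On x86-64 this should report 4 padding bytes before `capital` (offset 56 instead of 52, so the `Int64` is 8-byte aligned), 3 before `margin_unit` (offset 80 instead of 77), and 4 bytes of tail padding so the total is a multiple of 8: 85 + 4 + 3 + 4 = 96.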

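I define this struct in Julia mainly to parse a pointer to a struct returned by a C interface. With the struct above defined, I parse the pointed-to data like this: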

## itemdata is a pointer returned by the C interface, e.g. Ptr{UInt8} @0x000000007c1da090
itemdata = convert(Ptr{CodeInfo}, itemdata)
res = unsafe_load(itemdata)

Alternatively, I can decode each field at its own byte offset. For example, to decode the second field, sec_name:

itemdata = convert(Ptr{NTuple{24,UInt8}}, itemdata)
sec_name = unsafe_load(itemdata+4)
## Decoding every field at its offset this way recovers all of the fields correctly

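The two approaches give different results. The first is faster, but several fields come out wrong; only the second yields the correct values. I couldn't figure out why, so I checked the byte size of CodeInfo with sizeof and found that it is larger than the sum of its fields. I suspect this is the cause, but why does it happen?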
Any help from the experts here would be greatly appreciated.
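The mismatch can be reproduced without the C library. The sketch below assumes the C side is packed (no padding between fields), writes a few values into a byte buffer at the packed offsets, and then reads them back with both methods. `HaltUnit`, `buf` and the sample values are hypothetical; `HaltUnit` is a cut-down struct with the same `UInt32`/`UInt8`/`UInt32` pattern as `trade_date_out`/`is_halt`/`margin_unit` in `CodeInfo`.

```julia
# Cut-down struct with the same shape as trade_date_out/is_halt/margin_unit in CodeInfo
struct HaltUnit
    trade_date_out::UInt32   # offset 0
    is_halt::UInt8           # offset 4
    margin_unit::UInt32      # offset 8 in Julia (3 padding bytes before it)
end

# Pretend these bytes came from a C library built with #pragma pack(1):
# trade_date_out at 0, is_halt at 4, margin_unit at 5 (no padding).
# The buffer has sizeof(HaltUnit) = 12 bytes so the padded read below stays in bounds.
buf = zeros(UInt8, sizeof(HaltUnit))

whole, by_offset = GC.@preserve buf begin
    p = pointer(buf)
    unsafe_store!(Ptr{UInt32}(p + 0), UInt32(20240101))
    unsafe_store!(Ptr{UInt8}(p + 4), 0x01)
    unsafe_store!(Ptr{UInt32}(p + 5), UInt32(100))  # raw-pointer stores/loads do not require alignment

    # Method 1: reinterpret the block as the padded Julia struct (like convert + unsafe_load)
    w = unsafe_load(Ptr{HaltUnit}(p))
    # Method 2: load every field at its packed byte offset
    b = (unsafe_load(Ptr{UInt32}(p + 0)), unsafe_load(Ptr{UInt8}(p + 4)), unsafe_load(Ptr{UInt32}(p + 5)))
    (w, b)
end

println("padded struct: margin_unit = ", Int(whole.margin_unit))  # read from bytes 8..11, not 5..8
println("by offset:     margin_unit = ", Int(by_offset[3]))       # 100
```

The fields before the first padding gap agree; every field after a gap is read from the wrong bytes in method 1, which matches the symptom of "a few fields decoded incorrectly".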

If you use this, you don't need to redefine the struct in Julia.


I used the second approach from the link and defined the struct following its example:

c"""
       typedef struct t_HMDCodeInfo {
         int32_t sec_type;
         char sec_name[24];
         uint32_t date;
         uint32_t high_limited;
         uint32_t low_limited;
         int32_t multiplier;
         int32_t margin_ratio;
         int32_t price_tick;
         int64_t capital;
         uint32_t cap_change_date;
         uint32_t trade_date_in;
         uint32_t trade_date_out;
         uint8_t is_halt;
         uint32_t margin_unit;
         int32_t margin_ratio_param1;
         int32_t margin_ratio_param2;
       } HMDCodeInfo;
       """

It fails immediately with: error: unknown type name ‘int32_t’
I also tried defining the types first, like this:

julia> const c"int32_t" = Int32
Int32

julia> const c"uint32_t" = UInt32
UInt32

julia> const c"int64_t" = Int64
Int64

julia> const c"uint64_t" = UInt64
UInt64

julia> const c"uint8_t" = UInt8

Defining these aliases first and then the struct still gives the same error: error: unknown type name ‘int32_t’

using CBinding
const c"int32_t" = Int32
const c"uint32_t" = UInt32
const c"int64_t" = Int64
const c"uint64_t" = UInt64
const c"uint8_t" = UInt8

c``
c"#include <stdint.h>"s
c"""
    typedef struct t_HMDCodeInfo {
        int32_t sec_type;
        char sec_name[24];
        uint32_t date;
        uint32_t high_limited;
        uint32_t low_limited;
        int32_t multiplier;
        int32_t margin_ratio;
        int32_t price_tick;
        int64_t capital;
        uint32_t cap_change_date;
        uint32_t trade_date_in;
        uint32_t trade_date_out;
        uint8_t is_halt;
        uint32_t margin_unit;
        int32_t margin_ratio_param1;
        int32_t margin_ratio_param2;
    } HMDCodeInfo;
    """

I typed exactly this and it works fine.

By the way, you can use

c"""
#pragma pack(push, 1)
typedef struct t_HMDCodeInfo {
        int32_t sec_type;
        char sec_name[24];
        uint32_t date;
        uint32_t high_limited;
        uint32_t low_limited;
        int32_t multiplier;
        int32_t margin_ratio;
        int32_t price_tick;
        int64_t capital;
        uint32_t cap_change_date;
        uint32_t trade_date_in;
        uint32_t trade_date_out;
        uint8_t is_halt;
        uint32_t margin_unit;
        int32_t margin_ratio_param1;
        int32_t margin_ratio_param2;
    } HMDCodeInfo;
#pragma pack(pop)
"""

to get an unaligned (packed) struct.


So it looks like the error came from my leaving out this line:

c"#include <stdint.h>"s

Sorry to butt in, but I just came across a very illustrative example:

It would still be possible for an e.g. 7-byte object to be misaligned in an array. In an array of 7-byte objects, the 10th object would be placed at byte offset 7×(10−1)=63, and the object would straddle the cache line. However, the compiler usually does not allow struct with a nonstandard size for this reason. If we define a 7-byte struct:

struct AlignmentTest
    a::UInt32 # 4 bytes +
    b::UInt16 # 2 bytes +
    c::UInt8  # 1 byte = 7 bytes?
end

Then we can use Julia’s introspection to get the relative position of each of the three integers in an AlignmentTest object in memory:

Size of AlignmentTest: 8 bytes
Name: a	Size: 4	Offset: 0
Name: b	Size: 2	Offset: 4
Name: c	Size: 1	Offset: 6

We can see that, despite an AlignmentTest only having 4 + 2 + 1 = 7 bytes of actual data, it takes up 8 bytes of memory, and accessing an AlignmentTest object from an array will always be aligned.

Link:
https://github.com/jakobnissen/hardware_introduction#what-scientists-must-know-about-hardware-to-write-fast-code
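The array claim in the quoted passage can be checked directly: consecutive elements of a `Vector{AlignmentTest}` are `sizeof(AlignmentTest)` bytes apart, padding included. A small sketch using `pointer(v, i)`, which is valid here because `AlignmentTest` is an isbits type; the vector contents and variable names are arbitrary.

```julia
struct AlignmentTest
    a::UInt32 # 4 bytes
    b::UInt16 # 2 bytes
    c::UInt8  # 1 byte, followed by 1 byte of tail padding
end

v = [AlignmentTest(i, i, i) for i in 1:10]

# Only addresses are compared here; nothing is dereferenced
elem_stride = Int(pointer(v, 2) - pointer(v, 1))   # distance between consecutive elements
tenth       = Int(pointer(v, 10) - pointer(v, 1))  # byte offset of the 10th element

println("sizeof = ", sizeof(AlignmentTest), ", stride = ", elem_stride, ", 10th element at byte ", tenth)
```

This should print a size and stride of 8 and place the 10th element at byte 72 (8 × 9) rather than the 63 of a hypothetical 7-byte layout, so every `a` field stays 4-byte aligned.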


C structs behave the same way.


In the end the problem still isn't solved. After successfully defining the struct var"c\"struct t_HMDCodeInfo\"", I convert the pointer type like this:

## codesinfodata is a pointer like Ptr{UInt8} @0x000000007aa51fd0
codesinfodata = convert(Cptr{var"c\"struct t_HMDCodeInfo\""}, codesinfodata)
## after the conversion we get a pointer like Cptr{var"c\"struct t_HMDCodeInfo\""}(0x000000007aa51fd0)

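Then I inspected every field with expressions like codesinfodata.margin_unit[], and they are still decoded incorrectly. The result is the same as with the first approach I described (defining a CodeInfo struct in Julia and converting the pointer type): neither gives the correct values.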

Is it the same in C? Why does a type cast in C decode the data correctly, but not in Julia?

Julia's struct layout is compatible with C. You should post the C struct definition so that others can diagnose further.

       typedef struct t_HMDCodeInfo {
         int32_t sec_type;
         char sec_name[24];
         uint32_t date;
         uint32_t high_limited;
         uint32_t low_limited;
         int32_t multiplier;
         int32_t margin_ratio;
         int32_t price_tick;
         int64_t capital;
         uint32_t cap_change_date;
         uint32_t trade_date_in;
         uint32_t trade_date_out;
         uint8_t is_halt;
         uint32_t margin_unit;
         int32_t margin_ratio_param1;
         int32_t margin_ratio_param2;
       } HMDCodeInfo;

This is what the struct looks like in C.

Did you use the one I posted,

c"""
#pragma pack(push, 1)
typedef struct t_HMDCodeInfo {
        int32_t sec_type;
        char sec_name[24];
        uint32_t date;
        uint32_t high_limited;
        uint32_t low_limited;
        int32_t multiplier;
        int32_t margin_ratio;
        int32_t price_tick;
        int64_t capital;
        uint32_t cap_change_date;
        uint32_t trade_date_in;
        uint32_t trade_date_out;
        uint8_t is_halt;
        uint32_t margin_unit;
        int32_t margin_ratio_param1;
        int32_t margin_ratio_param2;
    } HMDCodeInfo;
#pragma pack(pop)
"""

i.e. this pack version? With this definition I simply do

a = read(io, HMDCodeInfo)
a.margin_unit

and can access the fields directly. I have never tried converting pointers.
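Along the same lines as reading from an `io`, a pure-Julia alternative that needs no struct declaration at all is to read the fields one by one with `read(io, T)`. A byte stream has no padding, so consecutive reads follow the packed layout automatically. This is a sketch assuming the data is in native byte order; `read_hmd_codeinfo` is a hypothetical name, and it returns a NamedTuple rather than a struct.

```julia
# Read one packed (#pragma pack(1)) t_HMDCodeInfo record, field by field.
# Each read consumes exactly sizeof(T) bytes, so no padding is ever skipped.
function read_hmd_codeinfo(io::IO)
    return (
        sec_type            = read(io, Int32),
        sec_name            = ntuple(_ -> read(io, UInt8), 24),
        date                = read(io, UInt32),
        high_limited        = read(io, UInt32),
        low_limited         = read(io, UInt32),
        multiplier          = read(io, Int32),
        margin_ratio        = read(io, Int32),
        price_tick          = read(io, Int32),
        capital             = read(io, Int64),
        cap_change_date     = read(io, UInt32),
        trade_date_in       = read(io, UInt32),
        trade_date_out      = read(io, UInt32),
        is_halt             = read(io, UInt8),
        margin_unit         = read(io, UInt32),
        margin_ratio_param1 = read(io, Int32),
        margin_ratio_param2 = read(io, Int32),
    )
end
```

For data behind a C pointer, the 85 bytes could be wrapped first, e.g. `read_hmd_codeinfo(IOBuffer(unsafe_wrap(Array, Ptr{UInt8}(ptr), 85)))`, as long as the C memory stays valid while reading. It trades some speed for not having to keep a Julia struct in sync with the C layout.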


Thanks @xgdgsc! After adding pragma pack, the data is parsed correctly. With that directive the struct is 85 bytes, and converting the pointer now decodes every field correctly.

A native (pure Julia) version:

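# 85 raw bytes: the packed (#pragma pack(1)) layout has no padding,
# so fields are read at hand-computed byte offsets below
struct t_HMDCodeInfo
    data::NTuple{85, UInt8}
end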

function Base.getproperty(x::Ptr{t_HMDCodeInfo}, f::Symbol)
    f === :sec_type && return Ptr{Int32}(x + 0)
    f === :sec_name && return Ptr{NTuple{24, Cchar}}(x + 4)
    f === :date && return Ptr{UInt32}(x + 28)
    f === :high_limited && return Ptr{UInt32}(x + 32)
    f === :low_limited && return Ptr{UInt32}(x + 36)
    f === :multiplier && return Ptr{Int32}(x + 40)
    f === :margin_ratio && return Ptr{Int32}(x + 44)
    f === :price_tick && return Ptr{Int32}(x + 48)
    f === :capital && return Ptr{Int64}(x + 52)
    f === :cap_change_date && return Ptr{UInt32}(x + 60)
    f === :trade_date_in && return Ptr{UInt32}(x + 64)
    f === :trade_date_out && return Ptr{UInt32}(x + 68)
    f === :is_halt && return Ptr{UInt8}(x + 72)
    f === :margin_unit && return Ptr{UInt32}(x + 73)
    f === :margin_ratio_param1 && return Ptr{Int32}(x + 77)
    f === :margin_ratio_param2 && return Ptr{Int32}(x + 81)
    return getfield(x, f)
end

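function Base.getproperty(x::t_HMDCodeInfo, f::Symbol)
    f === :data && return getfield(x, :data)  # keep access to the raw bytes
    r = Ref{t_HMDCodeInfo}(x)
    # keep r rooted while its address is computed and dereferenced
    GC.@preserve r begin
        ptr = Base.unsafe_convert(Ptr{t_HMDCodeInfo}, r)
        unsafe_load(getproperty(ptr, f))
    end
end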

function Base.setproperty!(x::Ptr{t_HMDCodeInfo}, f::Symbol, v)
    unsafe_store!(getproperty(x, f), v)
end

const HMDCodeInfo = t_HMDCodeInfo
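The offsets hard-coded in the native version are simply the running sum of the field sizes, because pack(1) inserts no padding. A sketch that derives them instead of writing them by hand; the field list is copied from the C definition, and `HMD_FIELDS` and `packed_offsets` are hypothetical names.

```julia
# Field names and C-equivalent Julia types of t_HMDCodeInfo, in declaration order
const HMD_FIELDS = [
    :sec_type => Int32, :sec_name => NTuple{24,Cchar}, :date => UInt32,
    :high_limited => UInt32, :low_limited => UInt32, :multiplier => Int32,
    :margin_ratio => Int32, :price_tick => Int32, :capital => Int64,
    :cap_change_date => UInt32, :trade_date_in => UInt32, :trade_date_out => UInt32,
    :is_halt => UInt8, :margin_unit => UInt32,
    :margin_ratio_param1 => Int32, :margin_ratio_param2 => Int32,
]

# With #pragma pack(1) each field starts exactly where the previous one ends
function packed_offsets(fields)
    offsets = Dict{Symbol,Int}()
    pos = 0
    for (name, T) in fields
        offsets[name] = pos
        pos += sizeof(T)
    end
    return offsets, pos  # pos is the total packed size
end

offsets, total = packed_offsets(HMD_FIELDS)
println("capital at ", offsets[:capital], ", margin_unit at ", offsets[:margin_unit], ", total size ", total)
```

The results line up with the table above (`capital` at 52, `margin_unit` at 73, 85 bytes in total), which is a cheap sanity check if the C header ever changes.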


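Right, parsing field by field at byte offsets like this works. It's just that in my benchmark it was slower than a direct cast of the whole pointer. From the discussion above I now know the problem was "pragma pack": every struct definition in the C++ interface I'm calling uses "pragma pack", so none of the fields are padded for alignment, whereas structs defined in Julia are aligned by default, which is why the data came out wrong. Using the CBinding package, I defined the struct as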

c"""
#pragma pack(push, 1)
typedef struct t_HMDCodeInfo {
        int32_t sec_type;
        char sec_name[24];
        uint32_t date;
        uint32_t high_limited;
        uint32_t low_limited;
        int32_t multiplier;
        int32_t margin_ratio;
        int32_t price_tick;
        int64_t capital;
        uint32_t cap_change_date;
        uint32_t trade_date_in;
        uint32_t trade_date_out;
        uint8_t is_halt;
        uint32_t margin_unit;
        int32_t margin_ratio_param1;
        int32_t margin_ratio_param2;
    } HMDCodeInfo;
#pragma pack(pop)
"""

and now convert works on the pointer and everything is parsed correctly.

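Both end up calling getfield, so there shouldn't be any performance difference: after the optimization passes, both become GEPs (LLVM getelementptr instructions).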
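One way to check this on your own machine is to look at the optimized LLVM IR. In this sketch, `load_margin_unit` is a hypothetical accessor that reads `margin_unit` at its packed offset 73; `code_llvm` from the InteractiveUtils standard library prints the IR, which should show a single 4-byte load (with `align 1`, since `unsafe_load` on a raw `Ptr` does not assume alignment).

```julia
using InteractiveUtils  # provides code_llvm (already loaded in the REPL)

# Hypothetical accessor: read margin_unit at its packed byte offset
load_margin_unit(p::Ptr{UInt8}) = unsafe_load(Ptr{UInt32}(p + 73))

# Capture the optimized IR as a string, without debug-info comments
ir = sprint(io -> code_llvm(io, load_margin_unit, (Ptr{UInt8},); debuginfo=:none))
println(ir)
```

The same check can be run on the CBinding accessor or the `getproperty` version to compare the instructions each one compiles to.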