Stream: t-lang/wg-unsafe-code-guidelines

Topic: Linker-generated arrays


Amanieu (Jan 15 2020 at 19:46, on Zulip):

Many linkers have a feature that generates symbols at the start/end of a section. This means that if you define a bunch of statics to be located in a particular ELF section, you can get an array of all these statics:

extern "C" {
    static __start_my_list: [MyStruct; 0];
    static __stop_my_list: [MyStruct; 0];
}

let my_list: &[MyStruct] = unsafe {
    let start = __start_my_list.as_ptr();
    let stop = __stop_my_list.as_ptr();
    let len = stop.offset_from(start) as usize;
    slice::from_raw_parts(start, len)
};

Amanieu (Jan 15 2020 at 19:46, on Zulip):

Is this allowed by the unsafe code guidelines? If not, could we find a way to allow this?

RalfJ (Jan 15 2020 at 19:53, on Zulip):

I think the main concern is that start.offset(n) looks like it'll be out-of-bounds of that static

RalfJ (Jan 15 2020 at 19:54, on Zulip):

so either for extern static we declare that the actual size might be bigger than the type says, or we have an attribute for that

RalfJ (Jan 15 2020 at 19:55, on Zulip):

(also you probably want static mut unless you want the static to be immutable)

Lokathor (Jan 16 2020 at 06:31, on Zulip):

If possible, it would be a lot easier to declare one large array static and then define offset values for reaching into the array.
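
A minimal sketch of this suggestion, assuming the list can be defined in one place on the Rust side rather than assembled by the linker. `MyStruct`'s field, the `MY_LIST` contents, and the `GROUP_A`/`GROUP_B` ranges are hypothetical and only serve the illustration:

```rust
use std::ops::Range;

/// Hypothetical element type standing in for `MyStruct` from the thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MyStruct {
    pub id: u32,
}

/// One array defined by Rust itself, so its size is known to the compiler.
pub static MY_LIST: [MyStruct; 5] = [
    MyStruct { id: 10 },
    MyStruct { id: 11 },
    MyStruct { id: 20 },
    MyStruct { id: 21 },
    MyStruct { id: 22 },
];

/// Hypothetical offset ranges used to reach into the array.
pub const GROUP_A: Range<usize> = 0..2;
pub const GROUP_B: Range<usize> = 2..5;

/// Returns one group as a bounds-checked slice; no `unsafe` is needed.
pub fn group(range: Range<usize>) -> &'static [MyStruct] {
    &MY_LIST[range]
}
```

With this layout `group(GROUP_B)` is an ordinary slice of `MY_LIST`, at the cost of losing the "every crate contributes its own entries" property that the linker-section approach provides.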

centril (Jan 16 2020 at 12:15, on Zulip):

This doesn't seem like something we should add special cases for in the language semantics at any rate.

RalfJ (Jan 16 2020 at 12:42, on Zulip):

well, if it's a sufficiently widely used pattern we should try our best to support it somehow

Elichai Turkel (Jan 16 2020 at 13:36, on Zulip):

well, if it's a sufficiently widely used pattern we should try our best to support it somehow

Shouldn't that work if you cast to usize and do arithmetic on that?

RalfJ (Jan 16 2020 at 17:46, on Zulip):

hm, no

RalfJ (Jan 16 2020 at 17:46, on Zulip):

not if we say that statics are allocations that the language "understands", and whose size is given by the type

RalfJ (Jan 16 2020 at 17:46, on Zulip):

then any attempt to access things outside their bounds is just use of a dangling pointer

RalfJ (Jan 16 2020 at 17:47, on Zulip):

I mean I guess you are proposing to go through a usize to "conceal" the provenance, but that's not really a "solution" I'd say...
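
For illustration, the integer-arithmetic idea might look like the sketch below. The helper name `len_between` is hypothetical. It only computes a length from two addresses; as the replies above point out, that alone does not make reads beyond the declared size of a static well-defined.

```rust
use std::mem::size_of;

/// Computes how many `T`s fit between two addresses using integer arithmetic only.
///
/// Returns `None` if `T` is zero-sized, if `stop` lies before `start`, or if the
/// distance is not a whole multiple of `size_of::<T>()`.
pub fn len_between<T>(start: *const T, stop: *const T) -> Option<usize> {
    let size = size_of::<T>();
    if size == 0 {
        return None;
    }
    // Casting to `usize` discards everything but the address.
    let bytes = (stop as usize).checked_sub(start as usize)?;
    if bytes % size != 0 {
        return None;
    }
    Some(bytes / size)
}
```

Applied to the start and end pointers of an ordinary `[u32; 4]`, this returns `Some(4)`.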

Amanieu (Jan 16 2020 at 18:28, on Zulip):

We do something similar in our unwinding implementation for SEH: https://github.com/rust-lang/rust/blob/master/src/libpanic_unwind/seh.rs#L67

Amanieu (Jan 16 2020 at 18:29, on Zulip):

Basically some static data used by the unwinder requires statics represented by offsets from the __ImageBase symbol rather than their actual address.
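
To make "offsets from a base symbol" concrete, here is a small sketch of the arithmetic only. It is not the code from `seh.rs`; the function names are hypothetical, and an ordinary byte buffer would stand in for the loaded image when trying it out.

```rust
use std::convert::TryFrom;

/// Returns the offset of `target` from `base` as a 32-bit value, or `None` if
/// `target` lies before `base` or the distance does not fit in 32 bits.
pub fn image_relative(base: *const u8, target: *const u8) -> Option<u32> {
    let delta = (target as usize).checked_sub(base as usize)?;
    u32::try_from(delta).ok()
}

/// Turns a base-relative offset back into an address.
/// `wrapping_add` is used so that merely computing the pointer is always allowed;
/// dereferencing it is a separate question.
pub fn resolve(base: *const u8, offset: u32) -> *const u8 {
    base.wrapping_add(offset as usize)
}
```

For a buffer `image`, `image_relative(image.as_ptr(), &image[40])` yields `Some(40)`, and `resolve` maps that offset back to the same address.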

RalfJ (Jan 16 2020 at 19:58, on Zulip):

wow

RalfJ (Jan 16 2020 at 19:58, on Zulip):

I wonder if/how LLVM sanctions (permits) this

RalfJ (Jan 16 2020 at 19:59, on Zulip):

wtf... I just realized "sanction" has its own opposite as meaning. such a useless word... sometimes I really wonder how English can ever work at all^^

RalfJ (Jan 16 2020 at 20:00, on Zulip):

like, is there some wording in LLVM that says "statics may be bigger than their type says", or is that somehow implicitly understood, or what?

RalfJ (Jan 16 2020 at 20:01, on Zulip):

we could probably account for all of that by saying that the size of an extern static is at least what the type says.
however, this still leaves the problem (for the begin-end-style) that taking the difference between two statics is pointer subtraction between pointers from different allocations, which is... a gray area.
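
For contrast, the uncontroversial version of the begin/end pattern is the one where both pointers come from the same allocation. A sketch under that assumption (the helper name `slice_from_range` is hypothetical):

```rust
use std::ops::Range;
use std::slice;

/// Rebuilds a slice from a begin/end pointer pair.
///
/// # Safety
/// `T` must not be zero-sized. Both pointers must be derived from the same live
/// allocation of initialized `T`s, with `range.start <= range.end`, and the
/// memory must not be mutated for the lifetime `'a`.
pub unsafe fn slice_from_range<'a, T>(range: Range<*const T>) -> &'a [T] {
    // SAFETY: the caller upholds the contract documented above.
    unsafe {
        // Pointer subtraction within a single allocation is well-defined.
        let len = range.end.offset_from(range.start) as usize;
        slice::from_raw_parts(range.start, len)
    }
}
```

Calling it with `data.as_ptr_range()` for some array `data` gives back a slice equal to `&data[..]`. The open question in the thread is whether two separate linker symbols can ever satisfy the "same allocation" requirement.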

Amanieu (Jan 16 2020 at 20:06, on Zulip):

Well, LLVM seems to support it since my code works...

Lokathor (Jan 16 2020 at 20:52, on Zulip):

that's what all the UB users say ;P

simulacrum (Jan 16 2020 at 21:15, on Zulip):

I think we pulled the SEH impl out of clang's standard library

simulacrum (Jan 16 2020 at 21:15, on Zulip):

but I could be wrong

Amanieu (Jan 16 2020 at 22:19, on Zulip):

We pulled the SEH impl out of the LLVM IR that clang generates for a try {} catch {}.

simulacrum (Jan 16 2020 at 22:37, on Zulip):

ah okay

gnzlbg (Jan 17 2020 at 12:39, on Zulip):

Is this allowed by the unsafe code guidelines? If not, could we find a way to allow this?

Why can't you use two pointers?

gnzlbg (Jan 17 2020 at 12:40, on Zulip):

e.g.

extern "C" {
    static BEGIN: *const c_int;
    static END: *const c_int;
}

?

gnzlbg (Jan 17 2020 at 12:41, on Zulip):

That way you can access the array as follows:

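/*safe*/ fn get(idx: usize) -> c_int {
    unsafe {
        // Raw pointers do not implement `-`; `offset_from` gives the element count.
        let len = END.offset_from(BEGIN) as usize;
        // Valid indices are 0..len, so `idx == len` must be rejected as well.
        assert!(idx < len);
        // Read the requested element, not the one past the end.
        BEGIN.add(idx).read()
    }
}

gnzlbg (Jan 17 2020 at 12:42, on Zulip):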

I think we pulled the SEH impl out of clang's standard library

@simulacrum @Amanieu In C and C++, an array is a pointer, so if a C array was converted to a Rust array, that conversion was incorrect.

gnzlbg (Jan 17 2020 at 12:44, on Zulip):

I.e., the call ABI of extern "C" fn foo(x: [c_int; 4]) and C's void foo(int c[4]); are not the same
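
A sketch of the difference: in C, a parameter declared as `int c[4]` is adjusted to `int *c`, so the matching Rust signature takes a pointer rather than `[c_int; 4]` by value. The function `sum4` below is a hypothetical example, defined in Rust so that it can be called without a C compiler.

```rust
use std::os::raw::c_int;

/// Rust-side counterpart of a C function declared as `int sum4(const int c[4]);`.
///
/// # Safety
/// `c` must point to at least four readable, initialized `c_int` values.
pub unsafe extern "C" fn sum4(c: *const c_int) -> c_int {
    let mut total = 0;
    for i in 0..4 {
        // SAFETY: the caller guarantees four readable elements starting at `c`.
        total += unsafe { c.add(i).read() };
    }
    total
}
```

A caller passes `values.as_ptr()` for some `values: [c_int; 4]`, exactly as a C caller would pass the decayed array.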

simulacrum (Jan 17 2020 at 12:44, on Zulip):

I think it was correct, and I may have misremembered, not sure.

gnzlbg (Jan 17 2020 at 12:44, on Zulip):

I'm not sure how that applies to extern statics

Amanieu (Jan 17 2020 at 12:46, on Zulip):

@gnzlbg That's not how linker symbols work though. The symbol doesn't point to a pointer to the start/end of the section. The symbol points directly to the start/end of the section.

gnzlbg (Jan 17 2020 at 12:46, on Zulip):

That's what I meant by "I'm not sure how that applies to a static"

Amanieu (Jan 17 2020 at 12:47, on Zulip):

extern unsigned char __EH_FRAME_BEGIN__[];

Amanieu (Jan 17 2020 at 12:48, on Zulip):

Here's an example in C.

gnzlbg (Jan 17 2020 at 12:48, on Zulip):

I expect __EH_FRAME_BEGIN__ to be the address of where the array is

gnzlbg (Jan 17 2020 at 12:48, on Zulip):

so that &__EH_FRAME_BEGIN__[0] desugars to &(*(__EH_FRAME_BEGIN__ + 0)) and gives the address of the first element

gnzlbg (Jan 17 2020 at 12:49, on Zulip):

(although &array[i] does not literally desugar to that in the C spec)

gnzlbg (Jan 17 2020 at 12:50, on Zulip):

In Rust, with extern { static __EH_FRAME_BEGIN__: [c_uchar; 0]; }, &__EH_FRAME_BEGIN__[0] should return the same address, so for a static, that looks fine

gnzlbg (Jan 17 2020 at 12:51, on Zulip):

and the only problem you might get is that of pointer arithmetic out of bounds

gnzlbg (Jan 17 2020 at 12:51, on Zulip):

(which is avoided by using two raw pointers)

gnzlbg (Jan 17 2020 at 12:52, on Zulip):

You can probably fix the Rust code to use &raw, but if it is using &T as *const T to produce the pointers, those pointers have a provenance from the [T; 0] allocation, and doing arithmetic out of bounds looks like UB
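
A sketch of obtaining the pointer without going through a reference, using `std::ptr::addr_of!` (the macro form of `&raw const`). The static `TABLE` and both helpers are hypothetical, and the example deliberately stays in bounds; it does not settle whether going past the declared size of an extern static is allowed.

```rust
use std::ptr;

static TABLE: [u8; 4] = [10, 20, 30, 40];

/// Returns a raw pointer to the first element without creating an intermediate reference.
pub fn table_start() -> *const u8 {
    // `addr_of!` yields `*const [u8; 4]`; the cast keeps the same address.
    ptr::addr_of!(TABLE).cast::<u8>()
}

/// Bounds-checked read through the raw pointer.
pub fn table_get(idx: usize) -> Option<u8> {
    if idx < TABLE.len() {
        // SAFETY: `idx` was just checked to be inside `TABLE`.
        Some(unsafe { table_start().add(idx).read() })
    } else {
        None
    }
}
```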

Amanieu (Jan 17 2020 at 12:52, on Zulip):

But extern { static __EH_FRAME_BEGIN__: *const c_char; } has an entirely different meaning. It says that there is a pointer in the data section of the executable, which is not true.
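
The distinction can be seen with two ordinary statics. This is a sketch with hypothetical names: `DATA` plays the role of the section contents, and `DATA_PTR` is the kind of object that a pointer-typed extern static would claim exists.

```rust
/// The bytes themselves live at the address of `DATA`.
pub static DATA: [u8; 4] = [1, 2, 3, 4];

/// A separate, pointer-sized object whose contents are the address of `DATA`.
pub static DATA_PTR: &[u8; 4] = &DATA;

/// Where the array is stored.
pub fn address_of_data() -> usize {
    &DATA as *const [u8; 4] as usize
}

/// Where the pointer variable is stored.
pub fn address_of_pointer() -> usize {
    &DATA_PTR as *const &[u8; 4] as usize
}

/// The address held inside the pointer variable.
pub fn address_stored_in_pointer() -> usize {
    DATA_PTR as *const [u8; 4] as usize
}
```

`address_stored_in_pointer()` equals `address_of_data()`, while `address_of_pointer()` is a different location. A linker-generated start symbol corresponds to the `DATA` case: the symbol's address is the data, and no pointer object is stored anywhere.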

gnzlbg (Jan 17 2020 at 12:53, on Zulip):

technically, what you want is an opaque extern

gnzlbg (Jan 17 2020 at 12:53, on Zulip):

but we don't have those yet I think

gnzlbg (Jan 17 2020 at 12:55, on Zulip):

i.e. extern { type __EH_FRAME_BEGIN__; type __EH_FRAME_END__; }

nagisa (Jan 17 2020 at 14:25, on Zulip):

But extern { static __EH_FRAME_BEGIN__: *const c_char; } has an entirely different meaning. It says that there is a pointer in the data section of the executable, which is not true.

for most uses of putting arrays into executable, there is data at that location though

nagisa (Jan 17 2020 at 14:26, on Zulip):

e.g. ARM’s vector table is an array of addresses

RalfJ (Jan 17 2020 at 18:51, on Zulip):

Well, LLVM seems to support it since my code works...

@Amanieu I'd really rather have something explicit in the LLVM spec or at least a comment in the code -- as you know and others pointed out, plenty of UB code seems to work^^

Last update: Jun 07 2020 at 10:40UTC