Stream: t-lang/wg-unsafe-code-guidelines

Topic: Using two statics to "mark a region"


RalfJ (Aug 01 2019 at 09:05, on Zulip):

I think I saw someone once propose to use two statics to mark a region of memory, i.e. the linker would later be instructed to set one static to the beginning and one to the end.
Seems like this caused miscompilations before: https://stefansf.de/post/pointers-are-more-abstract-than-you-might-expect/ (GCC changed behavior in that case but I don't know what LLVM does)

RalfJ (Aug 01 2019 at 09:06, on Zulip):

does anyone remember where that came up? @eddyb ?

RalfJ (Aug 01 2019 at 09:09, on Zulip):

(also WTF, that post was published the exact same day as my own https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html...)

gnzlbg (Aug 01 2019 at 09:09, on Zulip):

I've used asm! to mark regions of code before, and that did not cause miscompilations

gnzlbg (Aug 01 2019 at 09:10, on Zulip):

Either way, is that a use case that the language supports?

gnzlbg (Aug 01 2019 at 09:11, on Zulip):

Also, why does the example use two statics, instead of a static array and a size_t in C?

gnzlbg (Aug 01 2019 at 09:12, on Zulip):

That would have avoided the problem

gnzlbg (Aug 01 2019 at 09:12, on Zulip):

(to obtain the end pointer, you need to offset the start pointer with the size_t, and that returns a pointer with the right provenance)
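
A minimal Rust sketch of the array-plus-length approach described above. `REGION` and `REGION_LEN` are hypothetical stand-ins (an ordinary static and a constant) for values a linker script would otherwise supply; neither name comes from the discussion. The end pointer is derived from the start pointer, so both belong to the same allocated object.

```rust
/// Hypothetical stand-in for a region whose start the linker would place.
/// It is an ordinary static here so the sketch builds anywhere.
static REGION: [u8; 16] = [0; 16];

/// Hypothetical stand-in for a length that the linker would provide.
const REGION_LEN: usize = 16;

/// Returns `(begin, end)` for the region. `end` is computed from `begin`,
/// so it carries the same provenance as `begin`.
pub fn region_bounds() -> (*const u8, *const u8) {
    let begin = REGION.as_ptr();
    // SAFETY: REGION_LEN equals the array length, so the result is the
    // one-past-the-end pointer of the same allocation.
    let end = unsafe { begin.add(REGION_LEN) };
    (begin, end)
}

/// Number of bytes between the two bounds, computed with pointer arithmetic.
pub fn region_len() -> usize {
    let (begin, end) = region_bounds();
    // SAFETY: both pointers are derived from REGION, and end is not below begin.
    unsafe { end.offset_from(begin) as usize }
}
```

Because `end` is never a separate symbol, `offset_from` and pointer comparisons between the two bounds stay within one object.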

RalfJ (Aug 01 2019 at 09:20, on Zulip):

no idea why this was done the way it was

RalfJ (Aug 01 2019 at 09:20, on Zulip):

but it's the same thing I saw someone ask/propose for use in Rust

gnzlbg (Aug 01 2019 at 09:20, on Zulip):

It seems that it was always done that way in the Linux kernel

gnzlbg (Aug 01 2019 at 09:20, on Zulip):

and it was always broken, but people started seeing issues as the GCC version was updated

gnzlbg (Aug 01 2019 at 09:20, on Zulip):

they had linker scripts to do that

gnzlbg (Aug 01 2019 at 09:21, on Zulip):

the linux kernel mailing list never considered changing the linker scripts, so I don't know whether that would have been possible

gnzlbg (Aug 01 2019 at 09:21, on Zulip):

I don't see why it wouldn't

gnzlbg (Aug 01 2019 at 09:22, on Zulip):

they ended up using black_box instead to make the arrays opaque to the optimizer

gnzlbg (Aug 01 2019 at 09:23, on Zulip):

it's duct-tape all the way down

RalfJ (Aug 01 2019 at 09:25, on Zulip):

it's duct-tape all the way down

indeed :/

Lokathor (Aug 08 2019 at 02:11, on Zulip):

I've needed to mark the end of the statics in my binary before

Lokathor (Aug 08 2019 at 02:12, on Zulip):

well, in my ROM, it was for GBA, so execution and the whole address space are very well defined with no OS meddling in your way

RalfJ (Aug 08 2019 at 08:14, on Zulip):

right but the compiler is still meddling ;)

RalfJ (Aug 08 2019 at 08:14, on Zulip):

@Lokathor so why didn't it work to mark begin + length instead?

Lokathor (Aug 08 2019 at 08:24, on Zulip):

So the linker does a start and end thing for the .data, and for .bss it just marks the end. Then crt0 copies necessary data at boot before entering main, and then inside rust we just declare an extern static with the right name and let the linker figure it out.

And, well, it seems to work out.

RalfJ (Aug 08 2019 at 08:29, on Zulip):

it works out until LLVM sees you are doing cross-object arithmetic...

RalfJ (Aug 08 2019 at 08:30, on Zulip):

LLVM considers each static its own "allocated object"

RalfJ (Aug 08 2019 at 08:30, on Zulip):

not sure what the best solution here is. maybe as a start we should have an issue tracking this somewhere.

gnzlbg (Aug 08 2019 at 09:08, on Zulip):

@Lokathor is there a way for the linker to write the length of the region to a static ?

rkruppe (Aug 08 2019 at 09:16, on Zulip):

I don't know about the linker used for that target but some linkers definitely support that

Lokathor (Aug 08 2019 at 09:21, on Zulip):

I did not write that linker script, so i don't know

Lokathor (Aug 08 2019 at 09:22, on Zulip):

@RalfJ it's just so that you can know the base address for where your own allocator should start. it's not intended for cross-object pointer jumping or anything like that

rkruppe (Aug 08 2019 at 09:23, on Zulip):

Is this GNU ld? Then see the example for SIZEOF near the end of this page https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#Builtin-Functions

Lokathor (Aug 08 2019 at 09:23, on Zulip):

uhm

Lokathor (Aug 08 2019 at 09:23, on Zulip):

yeah, gnu binutils for arm

RalfJ (Aug 08 2019 at 09:48, on Zulip):

@RalfJ it's just so that you can know the base address for where your own allocator should start. it's not intended for cross-object pointer jumping or anything like that

the thing is, BEGIN and END are different objects for LLVM

RalfJ (Aug 08 2019 at 09:48, on Zulip):

so if you subtract them as pointers (offset_from), you won't get well-defined results

RalfJ (Aug 08 2019 at 09:48, on Zulip):

if you compare BEGIN.add(n) with END as pointers, that can be optimized to false without even looking at n

RalfJ (Aug 08 2019 at 09:49, on Zulip):

you can try comparing as integers instead, that should be well-defined but LLVM has bugs

RalfJ (Aug 08 2019 at 09:50, on Zulip):

I think if you use END.wrapping_offset_from(BEGIN), and then never use END again, that should work
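
A hedged sketch of the "compare as integers" idea from the messages above. It uses plain integer subtraction on addresses rather than the `wrapping_offset_from` method named in the thread. `BEGIN` and `END` here are two unrelated local statics standing in for linker-placed markers, and the helper names `distance`, `contains` and `marker_distance` are made up for illustration. The thread's caveat still applies: this should be well-defined, but LLVM bugs in this area were mentioned.

```rust
use core::ptr::addr_of;

/// Distance in bytes between two marker addresses, computed on integers.
/// No pointer arithmetic across objects is performed.
pub fn distance(begin: *const u8, end: *const u8) -> usize {
    (end as usize).wrapping_sub(begin as usize)
}

/// True if `addr` lies in the half-open range `[begin, end)`.
/// The comparison is done on integer addresses, not on pointers.
pub fn contains(begin: *const u8, end: *const u8, addr: usize) -> bool {
    (begin as usize) <= addr && addr < (end as usize)
}

// Stand-ins for linker-provided markers: two separate statics, which the
// compiler treats as two distinct allocated objects.
static BEGIN: u8 = 0;
static END: u8 = 0;

/// Integer distance between the two stand-in markers. The value depends on
/// where the statics end up, so it is only meaningful with a real layout.
pub fn marker_distance() -> usize {
    distance(addr_of!(BEGIN), addr_of!(END))
}
```

With real linker-placed symbols, the addresses would be taken the same way and the pointers themselves would not be offset or compared afterwards.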

gnzlbg (Aug 08 2019 at 11:32, on Zulip):

writing a linker script that does array+size feels simpler than writing one that generates two arrays appropriately

gnzlbg (Aug 08 2019 at 11:32, on Zulip):

doing the simplest thing avoids UB, so why do the complicated thing that's often UB ?

nagisa (Aug 08 2019 at 18:45, on Zulip):

A perfect reason to use 2 symbols to mark beginning and the end is that it saves space.

nagisa (Aug 08 2019 at 18:45, on Zulip):

symbols, being a link-time concept, take 0 bytes in the loaded binary.

nagisa (Aug 08 2019 at 18:48, on Zulip):

(and obviously these symbols should end up in your code as *const u8 pointers, not statics of T)

RalfJ (Aug 09 2019 at 07:28, on Zulip):

(and obviously these symbols should end up in your code as *const u8 pointers, not statics of T)

how would one do that? seems like a static is the usual way to get an address from the linker to the program

gnzlbg (Aug 09 2019 at 08:47, on Zulip):

@nagisa when one uses two symbols, the linker has to somehow insert two addresses there

gnzlbg (Aug 09 2019 at 08:47, on Zulip):

why can't the linker insert the address in one symbol, and (end - begin) in the other ?

nagisa (Aug 09 2019 at 16:20, on Zulip):

You don’t need to store the addresses, referring to a symbol will cause linker to replace it with an address of this symbol in-place.

nagisa (Aug 09 2019 at 16:21, on Zulip):

the same way call foo instruction does not store the address of foo "somewhere" first, but rather calls the address of foo directly.

nagisa (Aug 09 2019 at 16:22, on Zulip):

(and obviously these symbols should end up in your code as *const u8 pointers, not statics of T)

how would one do that? seems like a static is the usual way to get an address from the linker to the program

A good question. &symbol as *const u8?

nagisa (Aug 09 2019 at 16:23, on Zulip):

I guess that’s the thing where it is not clear whether symbol is an object of u8 or an object of unknown dimensions.

RalfJ (Aug 09 2019 at 18:00, on Zulip):

it better be a ZST if there's not definitely some memory there

Lokathor (Aug 09 2019 at 18:04, on Zulip):

I don't know why it would matter? The best plan either way is to cast the address of the marker to usize and avoid all of LLVM's crazytown by doing all of your address operations on usize values.

RalfJ (Aug 09 2019 at 18:04, on Zulip):

static FOO: u8 promises the compiler there is a byte of memory there

Lokathor (Aug 09 2019 at 18:05, on Zulip):

oh i see, sorry, i misread. Yes, you also want a valid byte at wherever it gets put
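
A small sketch of the "do all address operations on usize" plan, applied to the allocator-base use case mentioned earlier in the thread. `BSS_END_STAND_IN`, `align_up` and `heap_base` are hypothetical names. The static is a local placeholder for a linker-placed marker, so the address it yields is not a real end-of-.bss address.

```rust
use core::ptr::addr_of;

/// Rounds `addr` up to the next multiple of `align`.
/// Returns `None` if `align` is not a power of two or the result overflows.
pub fn align_up(addr: usize, align: usize) -> Option<usize> {
    if !align.is_power_of_two() {
        return None;
    }
    let mask = align - 1;
    addr.checked_add(mask).map(|a| a & !mask)
}

// Hypothetical stand-in for a marker that a linker script would place at the
// end of the statics. A real build would declare an extern static instead.
static BSS_END_STAND_IN: u8 = 0;

/// First suitably aligned address after the marker, as an integer.
/// Only the marker's address is used; the marker itself is never read.
pub fn heap_base(align: usize) -> Option<usize> {
    let marker = addr_of!(BSS_END_STAND_IN) as usize;
    align_up(marker, align)
}
```

For example, `align_up(0x0300_0001, 8)` yields `Some(0x0300_0008)`, and an address that is already aligned is returned unchanged.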

gnzlbg (Aug 11 2019 at 13:54, on Zulip):

Can't one write static FOO: *const u8; ?

rkruppe (Aug 11 2019 at 14:02, on Zulip):

That still implies the symbol refers to valid memory, only now it has to be pointer sized instead of just one byte

gnzlbg (Aug 12 2019 at 11:19, on Zulip):

Ah, gotcha. Wouldn't one byte also be sub-optimal?

gnzlbg (Aug 12 2019 at 11:20, on Zulip):

e.g. what we want is static FOO: Zst; and for &FOO to have an address chosen by the linker

gnzlbg (Aug 12 2019 at 11:20, on Zulip):

such that if then the other "end" pointer, or the size are zero, no memory needs to be allocated for this

gnzlbg (Aug 12 2019 at 11:21, on Zulip):

otherwise when the size is zero, static FOO: u8; would need to be dereferenceable for one byte, but this wouldn't be the case

rkruppe (Aug 12 2019 at 11:23, on Zulip):

Yeah, a ZST would be fine as Ralf said earlier. Although I'm not so sure we guarantee to actually give you the address of the symbol rather than e.g. align_of::<TheZst>() as *const

gnzlbg (Aug 12 2019 at 12:05, on Zulip):

I recall that the implementation currently does align_of::<Zst>() as *_, but I don't think we guarantee anything beyond "the address will be suitably aligned", to avoid UB when creating a reference
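
To make the ZST idea concrete, here is a sketch under stated assumptions. `Marker`, `REGION_START` and `region_start_addr` are hypothetical names, and the extern declaration appears only as a comment because it needs a linker script to resolve. The local static is a stand-in, and as noted above, what address a zero-sized static gets is not something the thread considers guaranteed.

```rust
use core::mem::size_of;
use core::ptr::addr_of;

/// Zero-sized marker type: a static of this type promises no bytes of memory,
/// unlike `static FOO: u8`, which promises one readable byte.
#[repr(C)]
pub struct Marker([u8; 0]);

// Compile-time check that the marker really occupies zero bytes.
const _: () = assert!(size_of::<Marker>() == 0);

// With a linker script, the declaration might look roughly like this
// (`__region_start` is a hypothetical symbol name):
//
//     extern "C" { static __region_start: Marker; }
//
// Local stand-in so that the sketch builds without a linker script:
static REGION_START: Marker = Marker([]);

/// Address of the marker as an integer, taken without creating a reference.
pub fn region_start_addr() -> usize {
    addr_of!(REGION_START) as usize
}
```

The design point is that only the address is ever used, so the type never claims that the region behind it is non-empty.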

Lokathor (Aug 12 2019 at 20:41, on Zulip):

well, the symbol is placed by the linker script

Lokathor (Aug 12 2019 at 20:42, on Zulip):

in rust you just use an extern

Lokathor (Aug 12 2019 at 20:42, on Zulip):

if i still follow what we're talking about

Last update: Nov 20 2019 at 11:45UTC