Stream: t-cargo/PubGrub

Topic: Test data format


Alex Tokarev (Oct 06 2020 at 20:05):

We need to choose a format to keep our test data in. The data is mostly going to consist of 1) a list of all packages with their published versions 2) dependency trees for those packages; the format should accommodate those. It should be human readable, so binary formats can probably be safely skipped altogether.

Some choices that come to mind:
1) TOML, as it is used in Cargo. Not sure if it's a proper choice due to its flat structure.
2) JSON. A common choice with performant parsers available. Its downsides, no comments (major) and no trailing commas (minor), are reasons to look into other options.
3) RON. Solves the JSON downsides listed above. As it was created specifically with Rust interoperability in mind, it should provide natural mapping to the actual structs in the code.

Feel free to comment on the options from the list and bring up new suggestions. If someone actually used RON and has some experience to share, I'd be glad to hear.

Matthieu Pizenberg (Oct 06 2020 at 20:41):

I've never used RON myself. It seems like a good choice, but @Eh2406 probably has the best insight on this.

Eh2406 (Oct 06 2020 at 21:15):

I thought I would like TOML, but the way it stored a crate with 2 dependents was a no-go. Comments are needed, so I am working with RON for now. Thanks for the suggestion!

Alex Tokarev (Oct 06 2020 at 21:32):

You are welcome.

If someone else sees this thread and has something to add I'd be glad to hear from anyone. :)

Eh2406 (Oct 06 2020 at 22:26):

Here is what I have so far. For example, fn no_conflict()'s solver is encoded as:

{
    "root": {
        "1.0.0": {
            "foo": (
                segments: [
                    ("1.0.0", Some("2.0.0")),
                ],
            ),
        },
    },
    "foo": {
        "1.0.0": {
            "bar": (
                segments: [
                    ("1.0.0", Some("2.0.0")),
                ],
            ),
        },
    },
    "bar": {
        "2.0.0": {},
        "1.0.0": {},
    },
}
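To make the nesting above concrete, here is a minimal Rust sketch of the same no_conflict data built by hand from standard-library maps. It assumes that a segment (low, Some(high)) means low <= v < high and that None means "no upper bound", which is how the ranges are read later in this thread; versions are modelled as (major, minor, patch) tuples rather than strings. The names Version, Range, Registry and no_conflict are hypothetical stand-ins, not PubGrub's actual types.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a version: (major, minor, patch), compared component by component.
type Version = (u32, u32, u32);

// Hypothetical mirror of `(segments: [...])`: each segment (low, high) is read as
// low <= v < high, and a `None` upper bound means "no upper limit".
#[derive(Clone)]
struct Range {
    segments: Vec<(Version, Option<Version>)>,
}

impl Range {
    /// A version matches if it falls inside any of the segments.
    fn contains(&self, v: &Version) -> bool {
        self.segments
            .iter()
            .any(|(low, high)| low <= v && high.map_or(true, |h| *v < h))
    }
}

// package -> published version -> (dependency -> allowed range)
type Registry = BTreeMap<&'static str, BTreeMap<Version, BTreeMap<&'static str, Range>>>;

/// Hand-built copy of the RON data above.
fn no_conflict() -> Registry {
    let one_to_two = Range { segments: vec![((1, 0, 0), Some((2, 0, 0)))] };
    let mut reg = Registry::new();
    reg.entry("root")
        .or_default()
        .insert((1, 0, 0), BTreeMap::from([("foo", one_to_two.clone())]));
    reg.entry("foo")
        .or_default()
        .insert((1, 0, 0), BTreeMap::from([("bar", one_to_two)]));
    let bar = reg.entry("bar").or_default();
    bar.insert((1, 0, 0), BTreeMap::new());
    bar.insert((2, 0, 0), BTreeMap::new());
    reg
}

fn main() {
    let reg = no_conflict();
    let foo_needs_bar = &reg["foo"][&(1, 0, 0)]["bar"];
    println!("{}", foo_needs_bar.contains(&(1, 0, 0))); // true
    println!("{}", foo_needs_bar.contains(&(2, 0, 0))); // false: the upper bound is exclusive
}
```

The three levels of maps correspond directly to the three levels of braces in the RON: package name, published version, then dependency name with its range.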

Alex Tokarev (Oct 07 2020 at 06:19):

Looks pretty clean

Matthieu Pizenberg (Oct 07 2020 at 10:43):

The (segments: [...]) part is a rather literal mirror of the Range implementation. Since dependency constraints are more restricted (they are not the result of intersections and unions of ranges), would it make more sense to have a special codec that generates something closer to what is usable in Cargo.toml? Or maybe something more systematic, since every correct dependency can be expressed in a form like 1.0.0 <= v < 1.4.5?
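A codec for the `1.0.0 <= v < 1.4.5` form suggested above could be quite small. The following is a minimal sketch, assuming plain major.minor.patch versions (no pre-release tags) and constraints written exactly as `low <= v < high`, or `low <= v` when there is no upper bound; parse_version, parse_constraint and format_constraint are hypothetical names.

```rust
// Hypothetical version type: (major, minor, patch), ordered component by component.
type Version = (u32, u32, u32);

/// Parse "x.y.z"; returns None for anything else.
fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.').map(|p| p.parse::<u32>().ok());
    let v = (parts.next()??, parts.next()??, parts.next()??);
    // Reject extra components such as "1.2.3.4".
    if parts.next().is_some() {
        return None;
    }
    Some(v)
}

/// Parse "low <= v < high" (or "low <= v" for no upper bound) into one segment.
fn parse_constraint(s: &str) -> Option<(Version, Option<Version>)> {
    let (low, rest) = s.split_once("<= v")?;
    let low = parse_version(low)?;
    match rest.trim() {
        "" => Some((low, None)),
        upper => Some((low, Some(parse_version(upper.strip_prefix('<')?)?))),
    }
}

/// Write a segment back out in the same notation.
fn format_constraint(&(low, high): &(Version, Option<Version>)) -> String {
    let show = |(a, b, c): Version| format!("{a}.{b}.{c}");
    match high {
        Some(h) => format!("{} <= v < {}", show(low), show(h)),
        None => format!("{} <= v", show(low)),
    }
}

fn main() {
    let seg = parse_constraint("1.0.0 <= v < 1.4.5").unwrap();
    println!("{:?}", seg); // ((1, 0, 0), Some((1, 4, 5)))
    println!("{}", format_constraint(&seg)); // 1.0.0 <= v < 1.4.5
}
```

A constraint with no lower bound would still need one here (for example 0.0.0), since the sketch always expects a `low <= v` prefix.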

Alex Tokarev (Oct 07 2020 at 10:46):

I agree it would be great to have it more general to allow us to change the representation in the future.

Alex Tokarev (Oct 07 2020 at 10:54):

Regarding 1.0.0 <= v < 1.4.5, I'm a bit cautious: are we about to reinvent our own semver that uses explicit ranges only? If so, does that make sense, or are we better off using the semver spec itself?

Eh2406 (Oct 07 2020 at 13:21):

Some minor improvements:

(
    dependencies: {
        "bar": {
            "1.0.0": {},
            "2.0.0": {},
        },
        "foo": {
            "1.0.0": {
                "bar": [
                    ("1.0.0", Some("2.0.0")),
                ],
            },
        },
        "root": {
            "1.0.0": {
                "foo": [
                    ("1.0.0", Some("2.0.0")),
                ],
            },
        },
    },
)
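The revised layout wraps everything in a top-level `dependencies` field and drops the `segments:` wrapper. As a sketch only, a Rust struct mirroring it might look like the following, with a small sanity check that every dependency name is also listed as a package. TestCase, unknown_dependencies and the version helper are hypothetical names, and versions are kept as strings exactly as they appear in the RON.

```rust
use std::collections::BTreeMap;

// Hypothetical type names mirroring the revised RON layout; versions stay strings.
type Segment = (String, Option<String>); // ("1.0.0", Some("2.0.0"))
type Deps = BTreeMap<String, Vec<Segment>>; // dependency name -> segments
type Versions = BTreeMap<String, Deps>; // version -> its dependencies

struct TestCase {
    dependencies: BTreeMap<String, Versions>, // package name -> versions
}

impl TestCase {
    /// Names used as dependencies that are never listed as packages themselves.
    fn unknown_dependencies(&self) -> Vec<String> {
        let mut missing: Vec<String> = Vec::new();
        for versions in self.dependencies.values() {
            for deps in versions.values() {
                for name in deps.keys() {
                    if !self.dependencies.contains_key(name) && !missing.contains(name) {
                        missing.push(name.clone());
                    }
                }
            }
        }
        missing
    }
}

/// Build one version entry with at most one dependency holding a single segment.
fn version(v: &str, dep: Option<(&str, &str, &str)>) -> (String, Deps) {
    let mut deps = Deps::new();
    if let Some((name, low, high)) = dep {
        deps.insert(name.to_string(), vec![(low.to_string(), Some(high.to_string()))]);
    }
    (v.to_string(), deps)
}

fn no_conflict() -> TestCase {
    let mut dependencies = BTreeMap::new();
    let root = version("1.0.0", Some(("foo", "1.0.0", "2.0.0")));
    let foo = version("1.0.0", Some(("bar", "1.0.0", "2.0.0")));
    dependencies.insert("root".to_string(), Versions::from([root]));
    dependencies.insert("foo".to_string(), Versions::from([foo]));
    let bar = Versions::from([version("2.0.0", None), version("1.0.0", None)]);
    dependencies.insert("bar".to_string(), bar);
    TestCase { dependencies }
}

fn main() {
    let case = no_conflict();
    let names: Vec<&str> = case.dependencies.keys().map(String::as_str).collect();
    println!("{:?}", names); // ["bar", "foo", "root"]
    println!("{:?}", case.unknown_dependencies()); // []
}
```

The packages come out sorted (bar, foo, root) because BTreeMap iterates in key order, which is consistent with the ordering in the revised dump.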

Eh2406 (Oct 07 2020 at 13:34):

And I am feeling "perfect is the enemy of good" vibes. I'd love to improve the range representation, get rid of the [ if there is only one item, but I don't want to add a lot of potentially buggy parsing and deparsing code while still working on adding other needed testing.
Then again writing it to a file was easier than I expected, so it may be worth a quick try.
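Dropping the `[` for single-item ranges mostly affects the writer. A minimal sketch of that idea, assuming segments are (low, Option<high>) string pairs written in the same tuple style as the dumps above; ron_segment and ron_range are hypothetical helpers, not part of the ron crate.

```rust
type Segment = (String, Option<String>);

/// Render one segment in the RON tuple style used above, e.g. ("1.0.0", Some("2.0.0")).
fn ron_segment((low, high): &Segment) -> String {
    match high {
        Some(h) => format!("({:?}, Some({:?}))", low, h),
        None => format!("({:?}, None)", low),
    }
}

/// A lone segment is written bare; zero or several segments are wrapped in [ ].
fn ron_range(segments: &[Segment]) -> String {
    match segments {
        [only] => ron_segment(only),
        many => {
            let items: Vec<String> = many.iter().map(ron_segment).collect();
            format!("[{}]", items.join(", "))
        }
    }
}

fn main() {
    let one = vec![("1.0.0".to_string(), Some("2.0.0".to_string()))];
    println!("{}", ron_range(&one)); // ("1.0.0", Some("2.0.0"))
    let two = vec![
        ("1.0.0".to_string(), Some("2.0.0".to_string())),
        ("3.0.0".to_string(), None),
    ];
    println!("{}", ron_range(&two)); // [("1.0.0", Some("2.0.0")), ("3.0.0", None)]
}
```

The reading side then has to accept both shapes; with serde, an untagged enum is one common way to do that, at the cost of less precise error messages.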

Alex Tokarev (Oct 07 2020 at 13:43):

99% of the time it's going to be 1 item, so it might be worth hiding the array if it's not too much of a hassle.
Cargo doesn't even support disjoint ranges, does it?

Alex Tokarev (Oct 07 2020 at 13:43):

Let me know if you need some help with this btw

Eh2406 (Oct 07 2020 at 13:48):

I know there is talk of adding an "or" syntax to semver. Do you know what it is? As you said, we may as well be valid semver.

Matthieu Pizenberg (Oct 07 2020 at 14:14):

My personal impression (which you may have deduced already ^^) is that I don't much like all the range syntaxes for semver that we see in the wild. I don't think the semver spec itself specifies anything more than what versions mean. To my understanding, those range syntaxes are mostly there to make it easier to write configuration files in the npm ecosystem, which popularized them. But the exact meaning of caret ^ or tilde ~, for example, is not clear at all until you read somewhere what they mean. What I like about things like v1 <= v < v2 is that it works whatever versioning system you are using (numbers, semantic versions, strings, whatever) as long as the versions have an order. That's why I proposed it as an example. It's not so much reinventing semver as staying as close as possible to the meaning of our version trait. At least that's how I see it, but I'm not the one doing the work ^^
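The point that v1 <= v < v2 only needs an ordering can be sketched as a generic function bounded by Rust's `Ord`. This is an illustration under that assumption, not PubGrub's actual version trait; in_range is a hypothetical name and `None` stands for "no upper bound".

```rust
/// Half-open check that only relies on an ordering of versions:
/// low <= v < high, where `None` means there is no upper bound.
fn in_range<V: Ord>(low: &V, high: Option<&V>, v: &V) -> bool {
    low <= v && high.map_or(true, |h| v < h)
}

fn main() {
    // Plain integers as versions.
    assert!(in_range(&1, Some(&3), &2));
    // Semantic versions modelled as (major, minor, patch) tuples; the upper bound is excluded.
    assert!(!in_range(&(1, 0, 0), Some(&(1, 4, 5)), &(1, 4, 5)));
    // Strings use lexicographic order, so "1.10" sorts between "1.0" and "1.9":
    // the check stays well defined, but it follows whatever order the type provides.
    assert!(in_range(&"1.0", Some(&"1.9"), &"1.10"));
    println!("all checks passed");
}
```

The last case shows the trade-off: the notation is independent of the versioning scheme, so the meaning of a range is exactly as good as the ordering chosen for the version type.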

Matthieu Pizenberg (Oct 07 2020 at 14:18):

Eh2406 said:

And I am feeling "perfect is the enemy of good" vibes. I'd love to improve the range representation, get rid of the [ if there is only one item, but I don't want to add a lot of potentially buggy parsing and deparsing code while still working on adding other needed testing.

That makes perfect sense!

Alex Tokarev (Oct 07 2020 at 14:25):

I like math notation for specifying ranges, e.g. "[1, 2)" for 1 inclusive, 2 exclusive. It gets rid of some arbitrary symbol in the middle ("v") and is quite established.
Not sure how it fits in RON file though, where [] are for arrays and () are for tuples. Might make it confusing.
What do you think, guys?

Eh2406 (Oct 07 2020 at 15:38):

Then again writing it to a file was easier than I expected, so it may be worth a quick try.

Poked at it this morning, and did not get a solution I was happy with. So I am leaning toward making these improvements in some other PR.

Matthieu Pizenberg (Oct 07 2020 at 21:10):

Alex Tokarev said:

I like math notation for specifying ranges, e.g. "[1, 2)" for 1 inclusive, 2 exclusive. It gets rid of some arbitrary symbol in the middle ("v") and is quite established.

Did you mean "[1, 2["? The ")" is rather for directions, so more or less infinite here. I like it too :) And the None case, which corresponds to infinity, could be something like "[1,[" with an empty second slot.

Poked at it this morning, and did not get a solution I was happy with

Makes sense to improve this later then

Alex Tokarev (Oct 07 2020 at 22:29):

Where I studied, we used round brackets to denote "exclusive". This is the first time I've seen the reversed square bracket.
It might just be country-specific.

Alex Tokarev (Oct 07 2020 at 22:31):

[1, ∞)
That's how infinity would be indicated
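Both interval notations discussed above can be produced from the same (low, Option<high>) pair. A minimal sketch, assuming the interval would be stored as a string; the Notation enum and the interval function are hypothetical names.

```rust
use std::fmt::Display;

/// The two notations from the discussion (names are hypothetical).
#[derive(Clone, Copy)]
enum Notation {
    Parenthesis,     // [1, 2) and [1, ∞)
    ReversedBracket, // [1, 2[ and [1,[
}

/// Format low <= v < high, where `None` means no upper bound.
fn interval<V: Display>(low: &V, high: Option<&V>, style: Notation) -> String {
    match (style, high) {
        (Notation::Parenthesis, Some(h)) => format!("[{low}, {h})"),
        (Notation::Parenthesis, None) => format!("[{low}, ∞)"),
        (Notation::ReversedBracket, Some(h)) => format!("[{low}, {h}["),
        (Notation::ReversedBracket, None) => format!("[{low},["),
    }
}

fn main() {
    println!("{}", interval(&"1.0.0", Some(&"2.0.0"), Notation::Parenthesis)); // [1.0.0, 2.0.0)
    println!("{}", interval(&"1.0.0", None, Notation::Parenthesis)); // [1.0.0, ∞)
    println!("{}", interval(&"1.0.0", Some(&"2.0.0"), Notation::ReversedBracket)); // [1.0.0, 2.0.0[
    println!("{}", interval(&"1.0.0", None, Notation::ReversedBracket)); // [1.0.0,[
}
```

Inside a RON string these brackets and parentheses are plain characters, so they do not collide with RON's own [ ] arrays or ( ) tuples at the parser level, although they may still be confusing for a human reading the file.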

Eh2406 (Oct 11 2020 at 15:08):

Btw, I just found https://github.com/afilip1/ronfmt and it works great for benchmark files!

Alex Tokarev (Oct 12 2020 at 19:10):

Nice find! I was looking for an IntelliJ plugin to do that, but the one I found was released a few versions ago and I couldn't use it. I thought of maybe trying to upgrade it, but your find solves the problem right away :smile:


Last updated: Oct 21 2021 at 20:03 UTC