Stream: general

Topic: davidtwco's memory issue


davidtwco (Jun 25 2018 at 10:17, on Zulip):

Not sure if this is the best place to ask, it's unrelated to any Rust contributions I'm working on - I've got a project where the memory usage keeps growing when run with multiple threads. The same code without threads uses a constant amount of memory, running that code on multiple threads - even though they don't interact or share any data - causes the memory to keep increasing.

I've made a small-ish example (https://gist.github.com/davidtwco/46b9a212654c9b8d8061df056467a20d) but it requires you have multiple Jenkins instances running with a bunch of jobs in each - so it's not really all that useful. I've checked it on three different machines with two different OS' and I've included a massif output file since it might be hard to run it. It seems to be all serde_json making the allocations, but I can't see why it wouldn't be getting dropped - it's entirely possible (and probably likely) that I've just made a mistake somewhere.

Would appreciate any thoughts anyone has - been staring at this for a while with no luck.

Jake Goulding (Jun 25 2018 at 13:12, on Zulip):

Are more and more threads spinning up?

Jake Goulding (Jun 25 2018 at 13:14, on Zulip):

My first thought is memory fragmentation; memory usage isn't really going up, but thread 1 allocates memory, thread 2 allocates, thread 1 frees, but it cannot be coalesced because thread 2's allocation sits in between

Jake Goulding (Jun 25 2018 at 13:14, on Zulip):

so the total allocation from the OS gets bigger and bigger, but the actual amount used doesn't

Jake Goulding (Jun 25 2018 at 13:14, on Zulip):

heh, I was going to suggest trying a different allocator. Do you see this with both jemalloc and the system allocator?

davidtwco (Jun 25 2018 at 13:15, on Zulip):

There's only five (one for each Jenkins instance) and then however many hyper/reqwest uses. That stays constant.

davidtwco (Jun 25 2018 at 13:15, on Zulip):

Happens with system and jemalloc.

davidtwco (Jun 25 2018 at 13:15, on Zulip):

I was just using system because I read jemalloc doesn't like valgrind.

Jake Goulding (Jun 25 2018 at 13:15, on Zulip):

it is true
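
For reference, "system" here means swapping the global allocator from jemalloc (still the Rust default for binaries at the time) to the OS allocator. A minimal sketch of how that looks; on the mid-2018 nightlies used in this thread it still needed a feature gate, but the shape is the same:

    use std::alloc::System;

    // Route all heap allocations through malloc/free instead of jemalloc,
    // which also keeps valgrind/massif happy.
    #[global_allocator]
    static GLOBAL: System = System;

    fn main() {
        let v: Vec<u8> = Vec::with_capacity(1024);
        drop(v);
    }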

davidtwco (Jun 25 2018 at 13:16, on Zulip):

If it were memory fragmentation, how would I go about resolving that?

davidtwco (Jun 25 2018 at 13:17, on Zulip):

Been banging my head against a wall for a good week, looking through the stacktraces for the allocations and hunting for something dodgy.

Jake Goulding (Jun 25 2018 at 13:21, on Zulip):

I honestly don't know. One answer is "use a compacting garbage collector" :cry:

Jake Goulding (Jun 25 2018 at 13:22, on Zulip):

I don't suppose you can make an even smaller repro? Using just serde in threads?

davidtwco (Jun 25 2018 at 13:22, on Zulip):

I'll see what I can do about getting the repro smaller.

davidtwco (Jun 25 2018 at 13:22, on Zulip):

Thanks.

Jake Goulding (Jun 25 2018 at 13:23, on Zulip):

I don't even know how to validate that it is fragmentation

Jake Goulding (Jun 25 2018 at 13:24, on Zulip):

causes the memory to keep increasing.

How are you seeing this manifest? Are you getting errors of some kind?

davidtwco (Jun 25 2018 at 13:25, on Zulip):

On machines with low memory, it would get killed. On my own working machine I can just watch htop or some other utility and see the process's memory increasing gradually. It can reach around 40% or so of the system memory (~7 GB), and it increases at around 20MB/s.

Jake Goulding (Jun 25 2018 at 13:27, on Zulip):

That's a pretty quick rate

Jake Goulding (Jun 25 2018 at 13:27, on Zulip):

https://github.com/mockersf/jenkins-api.rs/blob/ba7e33540cd159c20d13bab4311406e4a2d31293/src/job/common.rs#L71-L83

I see two relatively interesting features there — the flatten and the PhantomData

davidtwco (Jun 25 2018 at 13:27, on Zulip):

Yeah, luckily it often terminates before that but on hosts with less memory it won't. With a single thread it would use around 0.1-0.5% of system memory.

Jake Goulding (Jun 25 2018 at 13:33, on Zulip):

_IMPL_DESERIALIZE_FOR_CommonAction, _IMPL_DESERIALIZE_FOR_ShortView, and _IMPL_DESERIALIZE_FOR_ShortJob all make use of flatten

Jake Goulding (Jun 25 2018 at 13:34, on Zulip):

I know that's a newer feature of Serde, but I cannot think how it would have threading differences at all

davidtwco (Jun 25 2018 at 13:35, on Zulip):

This may be inconsequential, but this code came up when I was running massif the first few times; on recent runs it hasn't shown up, and I'm not sure why. It struck me as potentially the issue, though unlikely.

davidtwco (Jun 25 2018 at 13:53, on Zulip):

I've tried pasting (and anonymizing) a dump from the API and just doing a loop where I parse it a bunch of times into that CommonJob struct, and that doesn't seem to reproduce it.

davidtwco (Jun 25 2018 at 13:55, on Zulip):

That makes me think it's something the API wrapper is doing.

simulacrum (Jun 25 2018 at 14:05, on Zulip):

You can try cloning the jenkins crate locally and adding drop impls which print 'dropping' to the struct you are deserializing as a spot-check, but that does seem unlikely to indicate problems...
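
A minimal sketch of that spot-check, using a stand-in struct rather than the real crate types: a Drop impl that prints whenever a value is freed.

    struct CommonJob {
        display_name: String,
    }

    impl Drop for CommonJob {
        fn drop(&mut self) {
            // Fires every time a CommonJob is deallocated.
            eprintln!("dropping CommonJob {}", self.display_name);
        }
    }

    fn main() {
        let job = CommonJob { display_name: "example".into() };
        drop(job); // prints "dropping CommonJob example"
    }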

davidtwco (Jun 25 2018 at 14:05, on Zulip):

I've thrown in mem::drop(..) manually and I can't see anywhere that they override or mess with the default Drop.

Jake Goulding (Jun 25 2018 at 14:06, on Zulip):

n1: 2799128 0x15E065: serde_json::de::from_reader (de.rs:2178)
 n1: 2799128 0x1A3CC8: reqwest::response::Response::json (response.rs:167)
  n1: 2799128 0x1908E8: jenkins_api::home::<impl jenkins_api::client::Jenkins>::get_home (home.rs:51)
   n1: 2799128 0x132A4A: poller_repro::buggy_code (main.rs:72)
    n1: 2799128 0x133977: poller_repro::run_threads::{{closure}} (dfa.rs:650)

There's not a lot of "API" around the serde code though

Jake Goulding (Jun 25 2018 at 14:07, on Zulip):

Instead of printing, you may want something like an atomic variable that counts creations and one that counts drops, then print them at a point where they should be equal
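
A sketch of that counting approach, assuming construction can be funnelled through one function: bump one counter on creation and one in Drop, then compare them where they should match.

    use std::sync::atomic::{AtomicUsize, Ordering};

    static CREATED: AtomicUsize = AtomicUsize::new(0);
    static DROPPED: AtomicUsize = AtomicUsize::new(0);

    struct Tracked;

    impl Tracked {
        fn new() -> Tracked {
            CREATED.fetch_add(1, Ordering::SeqCst);
            Tracked
        }
    }

    impl Drop for Tracked {
        fn drop(&mut self) {
            DROPPED.fetch_add(1, Ordering::SeqCst);
        }
    }

    fn main() {
        for _ in 0..1000 {
            let _t = Tracked::new();
        }
        // Every Tracked from the loop is gone by now, so the counts should be equal.
        println!(
            "created = {}, dropped = {}",
            CREATED.load(Ordering::SeqCst),
            DROPPED.load(Ordering::SeqCst)
        );
    }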

davidtwco (Jun 25 2018 at 14:07, on Zulip):

That specifically isn't the big allocation; the call to get_home is fine. The massif data lists allocation backtraces by time, so if you go to the bottom of that dump you'll see the big allocation - at the start that allocation may be the biggest, but it doesn't grow and quickly gets overtaken.

simulacrum (Jun 25 2018 at 14:07, on Zulip):

valgrind's memcheck might tell you where you're missing memory, too

simulacrum (Jun 25 2018 at 14:07, on Zulip):

It prints out "leaks"

davidtwco (Jun 25 2018 at 14:08, on Zulip):

It's the get_full_job call that seems to allocate a lot. I've run heaptrack on it and observed that it's lots of small allocations rather than one large one.

davidtwco (Jun 25 2018 at 14:08, on Zulip):

I've not yet run normal valgrind on the repro code, I'll do that.

Jake Goulding (Jun 25 2018 at 14:08, on Zulip):

The last thing in the file also doesn't have much:

n1: 11111824 0x13B4B5: serde_json::de::from_reader (de.rs:2178)
 n1: 11111824 0x128528: reqwest::response::Response::json (response.rs:167)
  n1: 11111824 0x151D72: <jenkins_api::job::common::ShortJob<T>>::get_full_job (common.rs:93)
   n1: 11111824 0x132F9E: poller_repro::buggy_code2 (main.rs:81)
    n1: 11111824 0x132E38: poller_repro::buggy_code (dfa.rs:1417)

Jake Goulding (Jun 25 2018 at 14:10, on Zulip):

I reinstalled my OS recently, so now I'm reinstalling Xcode to get those profiling tools

Jake Goulding (Jun 25 2018 at 14:10, on Zulip):

just to see if I can do anything

davidtwco (Jun 25 2018 at 14:10, on Zulip):

I recommend using massif_visualizer or ms_print to read it rather than just looking at the raw dump.

davidtwco (Jun 25 2018 at 14:12, on Zulip):

I've updated the gist with some public Jenkins servers (this doesn't poll too hard, so it's probably alright to run it for a minute or two to see the memory tick up, sorry Ubuntu people!) so that you'll be able to see it happening.

Jake Goulding (Jun 25 2018 at 14:13, on Zulip):

A very small-scale DDoS, nice

davidtwco (Jun 25 2018 at 14:35, on Zulip):

I've added Valgrind data, I had to CTRL+C because it would have taken a few hours on my local test Jenkii with the Valgrind performance hit, but you can probably see the largest allocation at the bottom as the one that is problematic.

Jake Goulding (Jun 25 2018 at 15:03, on Zulip):

@David Wood this does exit normally after a while, yeah?

davidtwco (Jun 25 2018 at 15:03, on Zulip):

Once it has looped over every job, yes. There's a lot on the Ubuntu Jenkins, so it will take a while. Normally a minute or so is enough to demonstrate the effect in htop/etc.

Jake Goulding (Jun 25 2018 at 15:04, on Zulip):

pasted image

davidtwco (Jun 25 2018 at 15:04, on Zulip):

It will go back down once some threads finish.

Jake Goulding (Jun 25 2018 at 15:04, on Zulip):

no leaks reported

Jake Goulding (Jun 25 2018 at 15:05, on Zulip):

That peak is ~750MB

davidtwco (Jun 25 2018 at 15:06, on Zulip):

Do you see the memory just keep increasing when it is running? I don't know if it leaks memory necessarily. But if you run it with the run_no_threads, it hovers around 0.1-0.5% system usage on my machine, whereas with run_threads it will go to 20+, 40+% eventually (though sometimes less too).

davidtwco (Jun 25 2018 at 15:07, on Zulip):

pasted image

Jake Goulding (Jun 25 2018 at 15:09, on Zulip):

That was the whole run for the Ubuntu repos: it went up, was stable, then went down

davidtwco (Jun 25 2018 at 15:11, on Zulip):

I mean, I guess it could be related to the specific Jenkii I'm testing with. But it seems odd that the memory usage just keeps going up for me when running with multiple threads - I'd imagine there'd be more memory used, sure, but as a constant factor over the single-threaded version.

Jake Goulding (Jun 25 2018 at 15:12, on Zulip):

did you graph your ubuntu version?

simulacrum (Jun 25 2018 at 15:14, on Zulip):

@David Wood It's worth pointing out that it looks like the memory is in the Error case, since at least for me I don't see the println in the Ok arm at all

davidtwco (Jun 25 2018 at 15:14, on Zulip):

I haven't with the Ubuntu repos, since I've got local ones to test with. I'll kick that off now. Unless my Jenkii are returning some strange malformed stuff that is causing the memory to just keep increasing...? Seems strange.

simulacrum (Jun 25 2018 at 15:15, on Zulip):

Specifically I don't see https://gist.github.com/davidtwco/46b9a212654c9b8d8061df056467a20d#file-main-rs-L77 at all with the gist as is

Jake Goulding (Jun 25 2018 at 15:16, on Zulip):

thread 'ubuntu3' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Json(Error("Unexpected eof during chunk size line", line: 1, column: 19333120)), url: None }', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread 'ubuntu2' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Json(Error("Unexpected eof during chunk size line", line: 1, column: 23142400)), url: None }', libcore/result.rs:945:5
thread 'ubuntu1' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Json(Error("Unexpected eof during chunk size line", line: 1, column: 26378240)), url: None }', libcore/result.rs:945:5

davidtwco (Jun 25 2018 at 15:16, on Zulip):

I didn't make it particularly robust, but I see job names being printed regularly.

simulacrum (Jun 25 2018 at 15:18, on Zulip):

@David Wood It looks to me like the main allocations that keep growing come from the jenkins.get_home() or something -- not from get_full_job

davidtwco (Jun 25 2018 at 15:18, on Zulip):

It's possible. I only say get_full_job because in all the various massif and heaptrack runs I've done, I always see get_full_job as the largest contributor.

Jake Goulding (Jun 25 2018 at 15:19, on Zulip):

@David Wood Are you running in dev or release?

simulacrum (Jun 25 2018 at 15:19, on Zulip):

(I'm using a debug build)

davidtwco (Jun 25 2018 at 15:19, on Zulip):

Currently dev, but I've seen it in release too.

davidtwco (Jun 25 2018 at 15:19, on Zulip):

I originally didn't notice it in the normal application because I ran it in short bursts to test - it was only when I deployed it with a release build did I see it get killed and started investigating.

davidtwco (Jun 25 2018 at 15:23, on Zulip):

@simulacrum On the Ubuntu Jenkins, all I get is Err(e) too. But on my own Jenkins instances, I see Ok(..).

davidtwco (Jun 25 2018 at 15:23, on Zulip):

Hadn't noticed that before.

davidtwco (Jun 25 2018 at 15:23, on Zulip):

I still see the steady increase though.

davidtwco (Jun 25 2018 at 15:24, on Zulip):

It might be that the increase is not as dramatic if the data returned from the API calls is smaller - on my own Jenkins instances, they return quite a lot of data, probably more than the individual Ubuntu instance API calls are returning - which is probably why I see more memory used.

davidtwco (Jun 25 2018 at 15:24, on Zulip):

Though the same increasing effect is observed.

davidtwco (Jun 25 2018 at 15:26, on Zulip):

Have you seen that the no-threads function does not have a steady increase but stays pretty much the same? That's mostly where I'm confused - if that increased gradually, then at least it wouldn't be the threads. But introducing threads - which don't interact - suddenly produces a steady increase rather than just a constant factor more memory usage.

simulacrum (Jun 25 2018 at 15:29, on Zulip):

Let me look

davidtwco (Jun 25 2018 at 15:30, on Zulip):

pasted image
My graph for the Ubuntu ones - there's still an increase before it crashes, though not as sharp, as I think it is returning less information in the API calls.

Jake Goulding (Jun 25 2018 at 15:30, on Zulip):

How big are your instances? e.g. how many threads?

davidtwco (Jun 25 2018 at 15:31, on Zulip):

Yeah - I don't think there's a memory leak, it is freed when the thread terminates at the end (if allowed to finish naturally), but I'd expect it to be freed before that (and it seemingly is when threads are not involved).

davidtwco (Jun 25 2018 at 15:31, on Zulip):

Same as the example I gave for ubuntu, five.

davidtwco (Jun 25 2018 at 15:31, on Zulip):

Though I have reproduced it with fewer (I forget exactly how few I've tried - I was checking whether any one of my Jenkins instances was causing the issue, but alas, no)

simulacrum (Jun 25 2018 at 15:32, on Zulip):

Is it possible that, while the majority of the allocations come from the get_full_job call, the majority of the memory use is the jobs list (which lives until thread end)?

davidtwco (Jun 25 2018 at 15:33, on Zulip):

I had considered that, but looking through the source of jenkins-api, get_full_job just takes the url of the job (a String) from the ShortJob in home.jobs and does a separate query on that (or slightly modified) URL - returning a new struct that we own. I'd understand it if it was a reference that had the same lifetime as the Jenkins instance, or as the ShortJob, but it isn't?

Jake Goulding (Jun 25 2018 at 15:34, on Zulip):

Yeah, I think you can see it with just one thread (that isn't main thread)

davidtwco (Jun 25 2018 at 15:34, on Zulip):

And even if that were the case, why wouldn't we see an, albeit smaller, but steady increase with the one thread?

simulacrum (Jun 25 2018 at 15:34, on Zulip):

Yes but ShortJob seems to be a fairly large struct in theory -- it has that #[serde(flatten)] other field...

In other words, ShortJob lives until thread end
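
A minimal illustration of why that #[serde(flatten)] field matters (the real ShortJob's other_fields: Option<serde_json::Value> is quoted later in the thread): every key that doesn't match a named field is kept in memory as a serde_json::Value, so a "small" struct can end up owning most of the response.

    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    struct ShortJobLike {
        name: String,
        url: String,
        // Anything else in the JSON lands here instead of being discarded.
        #[serde(flatten)]
        other_fields: Option<serde_json::Value>,
    }

    fn main() {
        let json = r#"{
            "name": "my-job",
            "url": "http://jenkins.example/job/my-job/",
            "color": "blue",
            "builds": [{ "number": 1 }, { "number": 2 }]
        }"#;
        let job: ShortJobLike = serde_json::from_str(json).unwrap();
        // "color" and "builds" are still alive, held in other_fields.
        println!("{:#?}", job.other_fields);
    }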

davidtwco (Jun 25 2018 at 15:35, on Zulip):

Ah, yeah, good point.

simulacrum (Jun 25 2018 at 15:35, on Zulip):

So maybe we're basically seeing the same thing it's just less noticeable in the non-threaded case...

davidtwco (Jun 25 2018 at 15:36, on Zulip):

Yeah, I think you can see it with just one thread (that isn't main thread)

Is this referring to the 'how many instances' question?

simulacrum (Jun 25 2018 at 15:37, on Zulip):

It's also possibly worth noting that it looks like these responses are fairly huge -- >45 MB for the one that crashes at least... and serde is likely storing effectively the whole response in memory via flatten etc

davidtwco (Jun 25 2018 at 15:37, on Zulip):

So maybe we're basically seeing the same thing it's just less noticeable in the non-threaded case...

Perhaps. Looking at an API response from Jenkins (just append /api/json to the URL of a job), a ShortJob normally is only like three fields.

simulacrum (Jun 25 2018 at 15:37, on Zulip):

hm, okay

davidtwco (Jun 25 2018 at 15:38, on Zulip):

Here's an example. It has two builds whereas in my internal Jenkins instances, I have potentially hundreds, so the response is huge.

simulacrum (Jun 25 2018 at 15:39, on Zulip):

Threaded case for me:

GB
1.760^                                                          #
     |                                                       @@@#         :@:
     |                                                    @@@@@@#      ::::@:
     |                                                 @@@@@@@@@#:  ::@::::@:
     |                                             ::::@@@@@@@@@#:::::@::::@:
     |                                            :::: @@@@@@@@@#:::::@::::@:
     |                                        :@:::::: @@@@@@@@@#:::::@::::@:
     |                                     @:@:@: :::: @@@@@@@@@#:::::@::::@:
     |                                  ::@@:@:@: :::: @@@@@@@@@#:::::@::::@:
     |                               ::@::@@:@:@: :::: @@@@@@@@@#:::::@::::@:
     |                             @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |                         :@@@@@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |                       @@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |                    @@@@@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |                 @@@@ @@@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |              @@@@@@@ @@@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |           @@:@@ @@@@ @@@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |        :@@@ :@@ @@@@ @@@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |     :@@@@@@ :@@ @@@@ @@@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
     |  @@@:@ @@@@ :@@ @@@@ @@@:@@ @@: @::@@:@:@: :::: @@@@@@@@@#:::::@::::@::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   68.91

Single:

    MB
458.3^              #
     |              #                                          :             :
     |              #                           :             ::             @
     |            @@#            :@            ::             ::            :@
     |            @ #           ::@           :::            :::           ::@
     |           :@ #          :::@           :::           ::::          :::@
     |           :@ #          :::@          ::::           ::::          :::@
     |         :::@ #         ::::@         :::::         ::::::         ::::@
     |         : :@ #        :::::@         :::::         : ::::        :::::@
     |        @: :@ #        :::::@       :::::::        :: ::::       ::::::@
     |        @: :@ #      :::::::@       : :::::       ::: ::::      :::::::@
     |      @@@: :@ #      : :::::@      :: :::::      :::: ::::      :::::::@
     |      @ @: :@ #     @: :::::@     @:: :::::      :::: ::::      :::::::@
     |     @@ @: :@ #     @: :::::@     @:: :::::     ::::: ::::     @:::::::@
     |    @@@ @: :@ #   ::@: :::::@    :@:: :::::     ::::: ::::    :@:::::::@
     |    @@@ @: :@ #   : @: :::::@    :@:: :::::   ::::::: ::::   ::@:::::::@
     |  @@@@@ @: :@ #  :: @: :::::@   ::@:: :::::   : ::::: ::::   ::@:::::::@
     |  @ @@@ @: :@ #  :: @: :::::@   ::@:: :::::   : ::::: ::::  :::@:::::::@
     | @@ @@@ @: :@ # ::: @: :::::@ ::::@:: :::::  :: ::::: :::: ::::@:::::::@
     | @@ @@@ @: :@ #:::: @: :::::@:: ::@:: :::::  :: ::::: :::: ::::@:::::::@
   0 +----------------------------------------------------------------------->Gi
     0                                                                   71.92

davidtwco (Jun 25 2018 at 15:39, on Zulip):

Each entry in the build list from the response is a ShortJob. get_full_job takes the url from it, appends /api/json, and parses that into a CommonJob.

davidtwco (Jun 25 2018 at 15:40, on Zulip):

I think that is what I would expect: in the single-threaded case, it frees each CommonJob and then fetches another. In the multi-threaded case, it just keeps growing (as if those CommonJobs, allocated by get_full_job, are not being freed).
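
Roughly the shape of each worker, pieced together from the backtraces and snippets in this thread (get_home, home.jobs, get_full_job, display_name); the exact jenkins_api signatures, error handling, client construction, and threading are assumed and omitted, so treat this as a sketch rather than the gist itself.

    fn poll_instance(jenkins: &jenkins_api::Jenkins) {
        // One big allocation up front: the home page, including every ShortJob
        // (and whatever their flattened other_fields captured), stays alive for
        // the rest of this function.
        let home = jenkins.get_home().unwrap();
        for job in &home.jobs {
            // Each CommonJob is an owned value, so it should be freed at the end
            // of every iteration; memory should plateau once get_home is done.
            let full = job.get_full_job(jenkins).unwrap();
            println!("{:?}", full.display_name);
            // `full` dropped here
        }
    }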

simulacrum (Jun 25 2018 at 15:40, on Zulip):

We're freeing common jobs -- just not short jobs

simulacrum (Jun 25 2018 at 15:41, on Zulip):

or at least, we should be

davidtwco (Jun 25 2018 at 15:43, on Zulip):

Yeah, that's what I'd expect. But we only fetch the ShortJobs once - in the get_home call - so it shouldn't be increasing over time. (I realise that the API example I gave previously was slightly incorrect, as we get the short job from the home query and then the CommonJob from the job page query - the builds key we were looking at is another thing entirely).

davidtwco (Jun 25 2018 at 15:43, on Zulip):

Only the CommonJobs are fetched over time, and therefore could contribute to the increasing memory (at least as far as I can work out).

simulacrum (Jun 25 2018 at 15:44, on Zulip):

I agree - what I would expect is that as we fetch ShortJobs we should see an increase (faster in the threaded case), and then the CommonJobs should be an up/down (effectively averaging out to a flat line)

davidtwco (Jun 25 2018 at 15:44, on Zulip):

Yeah, and the ShortJobs should live the duration of the execution.

davidtwco (Jun 25 2018 at 15:45, on Zulip):

(This is where we get the ShortJob from in the API - link (ala get_home) - and then something like this - link (ala get_full_job) - is a CommonJob - I got this mixed up earlier).

simulacrum (Jun 25 2018 at 15:46, on Zulip):

CommonJobs do look fairly small

davidtwco (Jun 25 2018 at 15:46, on Zulip):

I'd expect an increase in memory usage at the start, up to a sort-of base level as all the ShortJobs are loaded, then up-and-down spikes as each CommonJob is loaded and unloaded as the iteration proceeds.

davidtwco (Jun 25 2018 at 15:47, on Zulip):

That response is slightly smaller as it doesn't use ?depth=1 like the example code does (that just returns more info - I don't think it makes a difference in terms of the behaviour we're seeing), and the job I linked has very few builds.

davidtwco (Jun 25 2018 at 15:48, on Zulip):

I apologize if I'm doing an awful job at explaining and clarifying what I mean.

simulacrum (Jun 25 2018 at 15:49, on Zulip):

massif is reporting that ~60% of the allocated bytes come from the home call

simulacrum (Jun 25 2018 at 15:49, on Zulip):

Or at least that's how I understand the output

simulacrum (Jun 25 2018 at 15:50, on Zulip):

So maybe in both cases we're seeing the same behavior effectively -- steady growth, and then the get_full_job calls are fast and as such almost unnoticeable for ms_print... let me try switching it to a time-based graph

davidtwco (Jun 25 2018 at 15:51, on Zulip):

Are you using massif visualizer to view the data in a nice way?

simulacrum (Jun 25 2018 at 15:51, on Zulip):

ms_print

davidtwco (Jun 25 2018 at 15:52, on Zulip):

Here's a link to some other massif data I've collected: link with my own Jenkins instances.

Temporary Jenkins Memory Profiling

Jake Goulding (Jun 25 2018 at 16:00, on Zulip):

line: 1, column: 35758080

35,758,080 that's 35MB

davidtwco (Jun 25 2018 at 16:00, on Zulip):

What's that from?

Jake Goulding (Jun 25 2018 at 16:00, on Zulip):

That seems odd. Are any of these responses supposed to be that big?

Jake Goulding (Jun 25 2018 at 16:01, on Zulip):

thread 'ubuntu2' panicked at 'called Result::unwrap() on an Err value: Error { kind: Json(Error("Unexpected eof during chunk size line", line: 1, column: 35758080)), url: None }', libcore/result.rs:945:5

simulacrum (Jun 25 2018 at 16:01, on Zulip):

Yeah, not sure. There doesn't seem to be a difference between the two profiles really, as far as I can tell

davidtwco (Jun 25 2018 at 16:01, on Zulip):

Yeah, they certainly could be. Jenkins can return lots of data.

Jake Goulding (Jun 25 2018 at 16:01, on Zulip):

jeez jenkins

davidtwco (Jun 25 2018 at 16:01, on Zulip):

I have jobs that just time out because we can't get the data fast enough.

davidtwco (Jun 25 2018 at 16:02, on Zulip):

Do either of you agree that there is some strange behaviour here or am I just going mad?

Jake Goulding (Jun 25 2018 at 16:02, on Zulip):

RIIR

simulacrum (Jun 25 2018 at 16:03, on Zulip):

If the common jobs are actually large, then yes, but if they're very small, then the results I get are expected

davidtwco (Jun 25 2018 at 16:04, on Zulip):

How is it explained if they are small?

simulacrum (Jun 25 2018 at 16:04, on Zulip):

Well, mostly explained -- we see growth only as the get_home call runs and then growth "stops" in theory if I'm right about where this comes from

simulacrum (Jun 25 2018 at 16:07, on Zulip):

I see the get_home call complete in ~2-4 seconds and then the iteration through it take <1 second

davidtwco (Jun 25 2018 at 16:07, on Zulip):

pasted image Here's an example of a large CommonJob response I picked at random from my own instances.

simulacrum (Jun 25 2018 at 16:08, on Zulip):

i.e., sure, we have large CommonJobs but they're nearly immediately deallocated

davidtwco (Jun 25 2018 at 16:09, on Zulip):

If the CommonJob allocations are small (or being correctly deallocated), then do you propose the get_home call is allocating the growing memory over time or something else?

simulacrum (Jun 25 2018 at 16:09, on Zulip):

right, yeah, that's where I think the memory growth comes from

davidtwco (Jun 25 2018 at 16:10, on Zulip):

Wouldn't that only allocate as the get_home call is happening - and afterwards during iteration, be constant?

simulacrum (Jun 25 2018 at 16:10, on Zulip):

right but I see iteration happen instantaneously in a release build at least

davidtwco (Jun 25 2018 at 16:10, on Zulip):

I'm open to the idea, whatever it is that causes this, I'd just want to figure out why it's happening. I've not looked into get_home much because my massif profiles have always pointed at get_full_job.

simulacrum (Jun 25 2018 at 16:11, on Zulip):

I don't see get_full_job in profiles at all...

davidtwco (Jun 25 2018 at 16:13, on Zulip):

pasted image
This is from the massif visualizer for data I've shared (particularly 19141) - if I go to the later snapshots, and dig down into the largest allocation location, I see get_full_job.

simulacrum (Jun 25 2018 at 16:13, on Zulip):

Let me try loading that file

davidtwco (Jun 25 2018 at 16:14, on Zulip):

(the image isn't great because I run massif visualizer in a VM - not supported on Windows - and the VM can't be larger than 1920x1080 and I need to resize the window partially offscreen to fit it all in because there's no horizontal scrollbar)

davidtwco (Jun 25 2018 at 16:14, on Zulip):

I see the same result in all my profiles.

davidtwco (Jun 25 2018 at 16:14, on Zulip):

The massif data lists the backtraces of memory allocations, ordered by % of total allocated memory which came from that location, for each snapshot in time, the most recent snapshot at the bottom.

davidtwco (Jun 25 2018 at 16:15, on Zulip):

So it ends up being a little bit up from the bottom of the file where you see the big "89% of allocated memory from X" in "snapshot 47" (for example) - that's the one we want.

simulacrum (Jun 25 2018 at 16:16, on Zulip):

hm, yeah, that might be debug vs. release build difference

simulacrum (Jun 25 2018 at 16:16, on Zulip):

let me try a debug build...

davidtwco (Jun 25 2018 at 16:18, on Zulip):

It's interesting that there is a difference - I've not profiled the release builds, only observed the issue on them.

simulacrum (Jun 25 2018 at 16:20, on Zulip):

well, we're not reading anything but display_name from the full job, which is tiny, so in theory LLVM could skip a bunch of allocation

davidtwco (Jun 25 2018 at 16:21, on Zulip):

Ah. In the main application that I encountered this on, I take much more from the struct, so in release builds of that (which is what I've checked), it would still be pretty bad.

davidtwco (Jun 25 2018 at 16:21, on Zulip):

Printing the name was all that was included here as it was sufficient to demonstrate it.

simulacrum (Jun 25 2018 at 16:22, on Zulip):

that could explain it

davidtwco (Jun 25 2018 at 16:30, on Zulip):

Would it be possible to send your massif output @simulacrum so I can see what it looks like? Curious about the increased allocations from get_home.

simulacrum (Jun 25 2018 at 16:31, on Zulip):

Sure, yeah

simulacrum (Jun 25 2018 at 16:32, on Zulip):

massif.out.single.bytes massif.out.single massif.out.threaded.bytes

davidtwco (Jun 25 2018 at 16:32, on Zulip):

What's different about the *.bytes profiles?

simulacrum (Jun 25 2018 at 16:32, on Zulip):

massif.out.threaded massif.debug.multi

simulacrum (Jun 25 2018 at 16:33, on Zulip):

They aren't all that interesting -- --time-unit=B vs. --time-unit=I

davidtwco (Jun 25 2018 at 16:33, on Zulip):

Ah, cool.

davidtwco (Jun 25 2018 at 16:36, on Zulip):

That is interesting. They are very different from my own debug profiles.

simulacrum (Jun 25 2018 at 16:36, on Zulip):

all but .debug.multi I think are release profiles

simulacrum (Jun 25 2018 at 16:36, on Zulip):

but you should be able to see that in the profile (based on the command run)

davidtwco (Jun 25 2018 at 16:37, on Zulip):

Yeah. I think in .debug.multi you need --depth=100 (or similar) in order to see what call it originates from as the backtrace is much larger on debug builds.

simulacrum (Jun 25 2018 at 16:37, on Zulip):

yeah, possible

davidtwco (Jun 25 2018 at 16:38, on Zulip):

Well, I can say that I'm no closer to understanding this problem. If anything, I've gotten more confused.

simulacrum (Jun 25 2018 at 16:39, on Zulip):

Mostly agreed -- it looks like there's somewhat of a "leak" in that memory use constantly increases but I can't see how that could be thread-related

davidtwco (Jun 25 2018 at 16:43, on Zulip):

Yeah, it's the threading that has been confusing me. I'd expect it to be a problem regardless of threads, or not at all.

simulacrum (Jun 25 2018 at 16:47, on Zulip):

However, I'm not completely certain that I see different behavior

davidtwco (Jun 25 2018 at 16:48, on Zulip):

I think it's the same thing, just manifesting itself differently in the profile - the end result of a gradually increasing memory footprint, only when threaded, is the same.

davidtwco (Jun 25 2018 at 16:50, on Zulip):

My theory, as much as it makes no sense, is that the CommonJobs from get_full_job are where the allocations are accumulating, and for some reason, when multi-threaded, they are dropped at the end of the thread, while when single-threaded, they are dropped at the end of the function. It makes _zero_ sense, but that lines up with what I think I'm observing.

davidtwco (Jun 25 2018 at 16:50, on Zulip):

And that lines up with the graphs you posted earlier, if we assume that the get_home allocation is just a constant size that both include.

simulacrum (Jun 25 2018 at 16:50, on Zulip):

I don't think I see a difference threaded vs. not threaded -- both profiles have steady increase for thread lifetime

simulacrum (Jun 25 2018 at 16:51, on Zulip):

i.e. there is no period in them when the bytes allocated is constant

davidtwco (Jun 25 2018 at 16:51, on Zulip):

I would expect that in both, we would see spikes of allocations and de-allocations, but the spikes when multi-threaded would be larger.

davidtwco (Jun 25 2018 at 16:51, on Zulip):

(in the ideal case)

davidtwco (Jun 25 2018 at 16:52, on Zulip):

But then again, I have no idea.

simulacrum (Jun 25 2018 at 16:53, on Zulip):

I'd expect that the thread upon start would show steady growth as get_home runs and then a spikey flat line for the iteration, allocating/deallocating commonjobs

davidtwco (Jun 25 2018 at 16:54, on Zulip):

Yeah, that's what I mean. But instead we see a steady increase instead of the spikes.

simulacrum (Jun 25 2018 at 16:54, on Zulip):

well, instead of the spikey flat line, right?

davidtwco (Jun 25 2018 at 16:54, on Zulip):

Yeah.

simulacrum (Jun 25 2018 at 16:55, on Zulip):

Yeah, I think so -- and we don't know why

davidtwco (Jun 25 2018 at 16:55, on Zulip):

I think we've now agreed on what the problem is.

simulacrum (Jun 25 2018 at 16:56, on Zulip):

Yes, though we're no closer to finding out why :)

davidtwco (Jun 25 2018 at 16:57, on Zulip):

Yes. In fact, I've regressed in my understanding of this issue since I discovered it last week.

Jake Goulding (Jun 25 2018 at 17:03, on Zulip):

That's usually a good thing, while not feeling like it ;-)

Jake Goulding (Jun 25 2018 at 17:13, on Zulip):

You know, I wonder if it's a bug with Hyper / Tokio

pasted image

I added some prints:

    for (i, job) in home.jobs.into_iter().enumerate() {
        eprintln!("Mark generation {} now", i);
        use std::io::Write;
        std::io::stderr().flush().unwrap();
        thread::sleep_ms(5_000);
        buggy_code2(&jenkins, job);
    }

but those never showed up. It used 800+ MB without ever running that eprintln...

simulacrum (Jun 25 2018 at 17:20, on Zulip):

If that's from the initial population of home then that would be reasonable...

davidtwco (Jun 25 2018 at 17:21, on Zulip):

What do you mean by initial population? Does home not only get populated once (per thread)?

davidtwco (Jun 25 2018 at 17:21, on Zulip):

(may be confused with terminology)

davidtwco (Jun 25 2018 at 17:23, on Zulip):

This may be inconsequential, but this code came up when I was running massif the first few times, but on recent runs it hasn't shown up, not sure why. It struck me as potentially the issue, though unlikely.

I sent this earlier; some other information: when I was originally debugging this particular part of the code (when it was still coming up in backtraces - I've not looked since, or investigated why it no longer is), I placed a conditional breakpoint in that code that never got triggered, but the problem went away when I did so, IIRC (this was a week ago).

Jake Goulding (Jun 25 2018 at 17:25, on Zulip):

impl<'a> ToString for Path<'a> {

That's a PR waiting to happen

simulacrum (Jun 25 2018 at 17:25, on Zulip):

Yes, that initial and only population

davidtwco (Jun 25 2018 at 17:25, on Zulip):

Ah, I assumed when you said initial that you were implying there were further populations. My bad.

simulacrum (Jun 25 2018 at 17:27, on Zulip):

That code surface-level looks wrong to me (not panic safe)... but I don't obviously see how it could be related

davidtwco (Jun 25 2018 at 17:28, on Zulip):

I don't think it is related, but on an initial glance I thought it might make the buffer too large if there was a data race or something like that due to the unsafe block.

simulacrum (Jun 25 2018 at 17:29, on Zulip):

hm, I don't think so

Jake Goulding (Jun 25 2018 at 17:33, on Zulip):

Now I'm installing Wireshark to see what the network data is

davidtwco (Jun 25 2018 at 17:38, on Zulip):

So, that code is still being called. The old_len size does keep increasing (I kept setting increasingly large conditional breakpoints on line 213), and with a conditional breakpoint that will never be hit (checking for old_len < 0), the memory usage does not increase (it is a lot slower, so I might just not have observed it yet).

simulacrum (Jun 25 2018 at 17:39, on Zulip):

Isn't that expected behavior though? i.e., that we want to grow the buffer?

simulacrum (Jun 25 2018 at 17:39, on Zulip):

At least I understood that code as intentionally growing the buffer

davidtwco (Jun 25 2018 at 17:40, on Zulip):

It is, yeah.

davidtwco (Jun 25 2018 at 17:41, on Zulip):

I think I may be jumping the gun, I think it's just increasing slowly.

davidtwco (Jun 25 2018 at 17:41, on Zulip):

And I know that the buffer isn't shared between threads after having dug into that library a bit.

davidtwco (Jun 25 2018 at 17:41, on Zulip):

(which makes sense, there's no reason for it to be shared)

Jake Goulding (Jun 25 2018 at 17:49, on Zulip):

@David Wood are you using https locally too? Can you try not using it?

davidtwco (Jun 25 2018 at 17:49, on Zulip):

I'm only using http locally. It's not TLS by the looks of it.

Jake Goulding (Jun 25 2018 at 17:51, on Zulip):

ah. I think my errors were coming from https (at least switching to http gets further)

davidtwco (Jun 25 2018 at 17:52, on Zulip):

Yeah, normally mine will execute until the end (for my local Jenkins instances, it's around 27 mins).

Jake Goulding (Jun 25 2018 at 17:54, on Zulip):

pasted image

Seems reasonably stable

Jake Goulding (Jun 25 2018 at 17:54, on Zulip):

clearly you need to switch to macOS

davidtwco (Jun 25 2018 at 17:56, on Zulip):

That's interesting - I wonder if it is proportional to how many threads there are? I've had it happen on Linux and Windows (well, technically, inside WSL, but close enough).

davidtwco (Jun 25 2018 at 17:57, on Zulip):

In fact, that might make sense - when I ran the original program to completion with heaptrack (it took 2 days and 18 hours..), it started to plateau at around 7GB.

davidtwco (Jun 25 2018 at 17:58, on Zulip):

If a relationship between number of threads and maximum memory (and size of responses from Jenkins) holds, then I guess the question is, why does increasing the number of threads have a large (non-linear?) impact on memory usage of the threads?

davidtwco (Jun 25 2018 at 18:00, on Zulip):

Running a single-instance, single-threaded process for each Jenkins instance would result in a great deal less memory being used (which is how I'm working around this for now, I'm really just curious why it is happening)

Jake Goulding (Jun 25 2018 at 18:12, on Zulip):

I mean, I still think there's something truly wrong, because there was only like a few MB of data transferred. Where are the 100MB coming from?

simulacrum (Jun 25 2018 at 18:21, on Zulip):

Right, it's not clear where the memory use is coming from

davidtwco (Jun 25 2018 at 18:25, on Zulip):

When I run heaptrack, it shows the size of allocations as a bar chart. The vast majority of allocations are very small. For that reason I've always considered it a "why isn't it being dropped" rather than "why is it being allocated".

davidtwco (Jun 25 2018 at 18:25, on Zulip):

Since I expected the allocations to be small, like the responses.

davidtwco (Jun 25 2018 at 18:26, on Zulip):

Though, it occurs to me it could just be a few large allocations alongside a lot of small ones.

simulacrum (Jun 25 2018 at 18:26, on Zulip):

BTreeMap is inherently lots of small allocations, IIRC

simulacrum (Jun 25 2018 at 18:27, on Zulip):

And BTreeMap is where the majority of memory is spent according to massif
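
A small illustration of that: deserializing into serde_json::Value turns every JSON object into a Map (BTreeMap-backed by default) and every string, array, and nested object into its own heap allocation, so a large response becomes a tree of many small allocations. The node-counting helper below is purely illustrative.

    // Count how many Value nodes a parsed document contains; each object and
    // array node carries its own allocation(s).
    fn count_nodes(v: &serde_json::Value) -> usize {
        match v {
            serde_json::Value::Object(map) => 1 + map.values().map(count_nodes).sum::<usize>(),
            serde_json::Value::Array(items) => 1 + items.iter().map(count_nodes).sum::<usize>(),
            _ => 1,
        }
    }

    fn main() {
        let v: serde_json::Value =
            serde_json::from_str(r#"{"jobs": [{"name": "a"}, {"name": "b"}]}"#).unwrap();
        println!("{} value nodes", count_nodes(&v));
    }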

Jake Goulding (Jun 25 2018 at 19:41, on Zulip):

I see each thread allocating ~400MB at "boot":

pasted image

It's all coming from get_home:

pasted image

There are 20997 entries in the jobs vector...

davidtwco (Jun 25 2018 at 19:41, on Zulip):

Is that the same amount allocated at boot with no threads?

Jake Goulding (Jun 25 2018 at 19:45, on Zulip):

pasted image

Looks like ~400 MB twice

davidtwco (Jun 25 2018 at 19:45, on Zulip):

Huh, I wonder why it is twice.

davidtwco (Jun 25 2018 at 19:46, on Zulip):

I guess that contributes to a bunch of the memory usage. Does it explain the memory increasing with threads?

Jake Goulding (Jun 25 2018 at 19:46, on Zulip):

each instance is sequential w/o threads, yeah?

davidtwco (Jun 25 2018 at 19:46, on Zulip):

Yeah.

davidtwco (Jun 25 2018 at 19:46, on Zulip):

I guess it would be that.

Jake Goulding (Jun 25 2018 at 19:51, on Zulip):

You could start scattering https://crates.io/crates/heapsize annotations everywhere...

Jake Goulding (Jun 25 2018 at 20:02, on Zulip):

@David Wood can you use nightly? I wonder if https://github.com/rust-lang/rust/pull/50352 would help you out a lot.

davidtwco (Jun 25 2018 at 20:03, on Zulip):

If that was merged on 12th May, wouldn't it be in nightly and helping right now? Since we need to use nightly to use the system allocator in our profiles?

Jake Goulding (Jun 25 2018 at 20:04, on Zulip):

Uhhhhh
shhh.

Jake Goulding (Jun 25 2018 at 20:05, on Zulip):

This explains why cargo +nightly build had nothing to do...

davidtwco (Jun 25 2018 at 20:06, on Zulip):

I wonder if there's a way to improve the 56MB response from get_home turning into ~400MB of data structure.

davidtwco (Jun 25 2018 at 20:07, on Zulip):

serde_json taking up more space than the responses certainly explains the magnitude of the memory usage, just not the increasing behaviour?

Jake Goulding (Jun 25 2018 at 20:08, on Zulip):

right, but I can't yet see the increasing behavior. It is interesting that you saw it on multiple OSes, though.

You said WSL, right? I wonder what "system allocator" means there

davidtwco (Jun 25 2018 at 20:09, on Zulip):

Yeah, all the data I've shown has been from WSL - that's where I primarily work. The Linux machine it ran on was the one where I discovered the issue, when the program got killed.

Jake Goulding (Jun 25 2018 at 20:11, on Zulip):

Would it be non-trivial to try Windows Real?

Jake Goulding (Jun 25 2018 at 20:12, on Zulip):

but I suppose you did jemalloc and "system"...

davidtwco (Jun 25 2018 at 20:12, on Zulip):

https://github.com/mockersf/jenkins-api.rs/issues/19 could certainly help reduce the size of the requests - but ?depth=1 is required to get the data I need, and that's incompatible with tree (oh, Jenkins...). Though we don't need ?depth=1 for the get_home call - that makes it _way_ bigger, and the library parses it as ShortJob anyway (throwing away a bunch of the data). So perhaps that is an improvement for reducing the usage. Again, it doesn't explain the increasing behaviour.

davidtwco (Jun 25 2018 at 20:13, on Zulip):

I can give it a go.

Jake Goulding (Jun 25 2018 at 20:23, on Zulip):

(throwing away a bunch of the data)

But it doesn't, right?

    #[serde(flatten)]
    pub(crate) other_fields: Option<serde_json::Value>,

davidtwco (Jun 25 2018 at 20:24, on Zulip):

Technically no, but there's no API (unlike CommonJob) to convert it to a type that will take advantage of all the fields.

Jake Goulding (Jun 25 2018 at 20:28, on Zulip):

Then it seems like a terrible idea to have the other_fields field...

davidtwco (Jun 25 2018 at 20:29, on Zulip):

There's no reason the author couldn't add a function that does it - but it would only have sufficient information to do it if the Jenkins instance was made with ?depth=1.

davidtwco (Jun 25 2018 at 20:30, on Zulip):

It seems like this still happens on Windows.

Jake Goulding (Jun 25 2018 at 20:30, on Zulip):

huh. I can't fathom why I wouldn't see it on macos

davidtwco (Jun 25 2018 at 20:31, on Zulip):

VMMap is actually pretty interesting - it shows each individual allocation and its size. I should be able to get traces too. But saving the data to a file makes it stop responding, so, there's that.

davidtwco (Jun 25 2018 at 20:31, on Zulip):

I think it does happen on MacOS based on what you sent previously - it's just that when running with two instances, the maximum that it increases to is lower?

Jake Goulding (Jun 25 2018 at 20:32, on Zulip):

I removed

    #[serde(flatten)]
    pub(crate) other_fields: Option<serde_json::Value>,

And it still compiles. So that might be a thing to try.

davidtwco (Jun 25 2018 at 20:34, on Zulip):

Yeah, I've got a few different ways to lower the overall memory usage now, I think.

Jake Goulding (Jun 25 2018 at 20:36, on Zulip):

I do think there's a bug in Hyper / Tokio. It pretty reliably dies with

thread 'ubuntu1' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Json(Error("Unexpected eof during chunk size line", line: 1, column: 58261504)), url: None }', libcore/result.rs:945:5

davidtwco (Jun 25 2018 at 20:37, on Zulip):

Isn't that only because I call unwrap() everywhere in the repro?

Jake Goulding (Jun 25 2018 at 20:39, on Zulip):

I mean, how is the data being corrupted?

davidtwco (Jun 25 2018 at 20:40, on Zulip):

It wouldn't surprise me if the Jenkins instance was sending messed up data when asked for jobs that have _lots_ of data rather than it being a bug in Tokio/Hyper.

Jake Goulding (Jun 25 2018 at 20:40, on Zulip):

heh, if thread2 gets the panic, thread1 can keep on going

Jake Goulding (Jun 25 2018 at 20:40, on Zulip):

I spose I'd have to figure out how to use wireshark with HTTPS to find out for sure

davidtwco (Jun 25 2018 at 20:43, on Zulip):

We're still not sure why it increases gradually though. There's an expected increase as all the get_home calls return, and then it should be a straight line with spikes for the CommonJob allocations and frees - but it isn't; it just keeps going. I don't know how much memory gets allocated for the get_home call on my Jenkins instances, but over time it increases up to 7GB of memory, so it can't be from get_home - you can observe that all the threads have passed that point.

Jake Goulding (Jun 25 2018 at 20:47, on Zulip):

I'm trying with 2x threads. One thread died right away, but the other kept going. Over 50 jobs (~5 min), it allocated a net of +17.42 MB. Is that the rate you were seeing?

davidtwco (Jun 25 2018 at 20:47, on Zulip):

I believe heaptrack said 20.4MB/s - but yeah, thereabouts.

davidtwco (Jun 25 2018 at 20:49, on Zulip):

Unless you meant it allocated a total of 17.42MB. That's about what I'd see each second.

Jake Goulding (Jun 25 2018 at 20:52, on Zulip):

a total, yes

Jake Goulding (Jun 25 2018 at 20:53, on Zulip):

Which I'd actually expect, because my range doesn't include the next deallocation

Jake Goulding (Jun 25 2018 at 20:58, on Zulip):

pasted image

generations are roughly each iteration of the jobs for loop

davidtwco (Jun 25 2018 at 20:59, on Zulip):

You're not seeing any allocations from the later loop iterations?

Jake Goulding (Jun 25 2018 at 21:03, on Zulip):

Nope
pasted image

davidtwco (Jun 25 2018 at 21:03, on Zulip):

Maybe I do need to switch to macOS.

Jake Goulding (Jun 25 2018 at 21:05, on Zulip):

well, I can't get both threads to run at the same time, so maybe it's not all milk and honey

Jake Goulding (Jun 25 2018 at 21:15, on Zulip):

Now I have a Mutex and a Barrier. Exciting

Jake Goulding (Jun 25 2018 at 21:18, on Zulip):

haha. you'll love this — the memory usage is now going down over the iterations

Jake Goulding (Jun 25 2018 at 21:24, on Zulip):

pasted image

2 threads, 44 iterations. no memory increase

davidtwco (Jun 25 2018 at 21:25, on Zulip):

It seems a lot like my local Jenkins data is causing a quite different effect.

davidtwco (Jun 25 2018 at 21:26, on Zulip):

pasted image This is my most recent profiling run with the ubuntu Jenkins.

Jake Goulding (Jun 25 2018 at 21:26, on Zulip):

The first flag is 861MB , the second is 856MB

davidtwco (Jun 25 2018 at 21:26, on Zulip):

That's with five copies of the instance.

davidtwco (Jun 25 2018 at 21:27, on Zulip):

It definitely terminates quicker and doesn't go up as much as with my local Jenkins.

davidtwco (Jun 25 2018 at 21:27, on Zulip):

In particular, the profiling data said get_home for that run.

davidtwco (Jun 25 2018 at 21:27, on Zulip):

But my local Jenkins massif data is all get_full_job.

Jake Goulding (Jun 25 2018 at 21:28, on Zulip):

huh. 5 threads ~ 800 MB. Mine is 2 threads ~ 800 MB.

davidtwco (Jun 25 2018 at 21:29, on Zulip):

Well, I have no clue what's going on. Pretty sure there's a bug here, but I could tell you nothing about it.

davidtwco (Jun 25 2018 at 22:28, on Zulip):

Currently running memory profile for my Jenkins instances in single-threaded and multi-threaded. For multi-threaded, I see the memory usage creep up over time, currently at 14.6% of my total memory. For the single-threaded, it doesn't grow, it got to a constant 0.5-0.7% and it stays around there. Just in writing this, multi-threaded has increased to 15.7% now. I'll post the massif data when it finishes. But this is enough to convince me that there is a problem and I'm not imagining things.

davidtwco (Jun 25 2018 at 22:56, on Zulip):

massif-singlethreaded.out.18768 massif-multithreaded.out.18042

davidtwco (Jun 26 2018 at 14:09, on Zulip):

So, I just got a chance today to look at the data from those profiles yesterday. Sometime after my comment here, the single-threaded memory usage spiked.

davidtwco (Jun 26 2018 at 14:10, on Zulip):

I'm thinking that this isn't a threading issue at all. It's just that the threads highlight the problem quicker/more obviously.

davidtwco (Jun 26 2018 at 14:11, on Zulip):

Or there might not be a problem at all, and this application just uses a bunch of memory in a strange-looking way. If that's the case, I'll have wasted your time - sorry about that.

davidtwco (Jun 26 2018 at 16:13, on Zulip):

Having done a bunch more profiling, I'm now pretty confident that threads aren't an issue at all - I've got a Jenkins response that is giant (hundreds of megabytes), and that makes the memory usage spike something awful. I've seen it range from 1.9GB to 7GB total, and I'm not sure why there's a range, but I've confirmed that everything looks about normal without that Jenkins instance.

Jake Goulding (Jun 26 2018 at 16:14, on Zulip):

FWIW, I did the digging I did because I wanted to learn a bit more about using Instruments for memory-related stuff, so no harm no foul

davidtwco (Jun 26 2018 at 16:31, on Zulip):

(Just as an FYI, it occurs to me the 7GB usage is explained by that job being huge too - in the actual application it goes further and fetches the builds of that job, which are also going to be giant, so the usage increases then too - to 7GB. In the smaller repro, it just uses 1.9GB because it is only the job.)

Jake Goulding (Jun 26 2018 at 16:42, on Zulip):

I don't think any API should be sending back 10MB of JSON, let alone 100 or 1000MB

davidtwco (Jun 26 2018 at 16:46, on Zulip):

I think it should be solvable with some PRs (though the author has started work on this functionality) to the jenkins-api crate so that it can use the tree query parameter instead of only depth.

Jake Goulding (Jun 26 2018 at 16:49, on Zulip):

it's also possible that this could be a case where you may want to scatter some drop calls about, if you have "longer" methods
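
A sketch of what "scattering some drop calls" might look like: in a longer method, drop the big deserialized value as soon as the fields you actually need have been pulled out, instead of letting it live to the end of the scope. Names here are illustrative, not the real crate API.

    fn process(huge_response: serde_json::Value) {
        let display_name = huge_response["displayName"]
            .as_str()
            .unwrap_or("<unknown>")
            .to_owned();
        // Free the big JSON tree before the slow work below, rather than at
        // the end of the function.
        drop(huge_response);

        // ... long-running work that only needs `display_name` ...
        println!("processing {}", display_name);
    }

    fn main() {
        let v: serde_json::Value =
            serde_json::from_str(r#"{"displayName": "my-job"}"#).unwrap();
        process(v);
    }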

davidtwco (Jun 26 2018 at 16:50, on Zulip):

Perhaps, yeah. Regardless, thanks to both you (@Jake Goulding ) and @simulacrum for spending some time looking into this.

Last update: Nov 22 2019 at 00:05UTC