Stream: t-compiler/wg-rls-2.0

Topic: #2835


Edwin Cheng (Jan 15 2020 at 12:34, on Zulip):

I am trying to use Currency Visualizer to analyse this bug, It still in progress, but I found out that a lot of times in main-loop thread is wasting one waiting the salsa inner RWLock.

Edwin Cheng (Jan 15 2020 at 12:39, on Zulip):

thread.png

Edwin Cheng (Jan 15 2020 at 12:42, on Zulip):

Green one is Premptted State, Nepal Blue on is Sleep State

Edwin Cheng (Jan 15 2020 at 12:44, on Zulip):

The left side of the frame I added is the time the Send<Enter> request is sent in the test thread.

Edwin Cheng (Jan 15 2020 at 12:45, on Zulip):

And the actual lock is in https://github.com/salsa-rs/salsa/blob/d1e9471bee517e66b8af2b82e263878d02353547/src/runtime.rs#L313

Laurențiu Nicola (Jan 15 2020 at 13:12, on Zulip):

Wild guess: the comment on that function says "Acquires the global query write lock (ensuring that no queries are executing)". Maybe some readers are starving the main thread which is trying make a change?

Laurențiu Nicola (Jan 15 2020 at 13:13, on Zulip):

(I'm not familiar with the code, nor can I see the colors you mention)

Edwin Cheng (Jan 15 2020 at 13:14, on Zulip):

Yes, and this is what the test is doing.. https://github.com/rust-analyzer/rust-analyzer/blob/4444192b05c107a40a5a05ea3c9091ad8f8cbbcc/crates/ra_lsp_server/tests/heavy_tests/main.rs#L463

Edwin Cheng (Jan 15 2020 at 13:14, on Zulip):

Oh, I am color weakness, so I just grap the RGB value and use https://www.color-blindness.com/color-name-hue/ to tell the color

Laurențiu Nicola (Jan 15 2020 at 13:15, on Zulip):

:D

Jeremy Kolb (Jan 15 2020 at 13:16, on Zulip):

That's a useful site. I was in a work meeting about 6 months ago and a coworker realized that he was color blind when someone had green and red stoplights in an excel spreadsheet...

Laurențiu Nicola (Jan 15 2020 at 13:17, on Zulip):

On the other hand, in that issue matklad mentioned stuff being blocked on clippy, not RA diagnostics. I'm not sure how that works.

Jeremy Kolb (Jan 15 2020 at 13:18, on Zulip):

The scheduler on windows isn't very fair and it could be exposing the issue.

Edwin Cheng (Jan 15 2020 at 13:18, on Zulip):

color_table.png

Laurențiu Nicola (Jan 15 2020 at 13:19, on Zulip):

It could also be something different, like slower disk access

Jeremy Kolb (Jan 15 2020 at 13:20, on Zulip):

antivirus...

Edwin Cheng (Jan 15 2020 at 13:20, on Zulip):

Clippy just make the case worse, OTOH github VM is only have 2 core, so that's why bors always failed :smiley:

Laurențiu Nicola (Jan 15 2020 at 13:22, on Zulip):

But why is clippy running during that test? Aren't the files in-memory?

Edwin Cheng (Jan 15 2020 at 13:30, on Zulip):

And one of the lock is aquired is, when we create a world snapshot (e.g. update_file_notifications_on_threadpool) , a read lock is held inside SalsaDB.

Laurențiu Nicola (Jan 15 2020 at 13:44, on Zulip):

I tried the CPU affinity trick on Linux, that request takes ~80 ms on a single core (and 7 ms without setting the affinity)

matklad (Jan 15 2020 at 13:58, on Zulip):

I think I've noticed at leas something suspicious!

cc @Emil Lauridsen

In WorldSnapshot, we store an Arc<RwLock<CheckWatcherSharedState>>. We read lock this lock in handle_diagnostics.

Additionally, we .write this lock from the watcher thread in CheckWatcherState::run.

I think in general this is less then ideal, b/c diagnostics request can be blocked on another thread. I think it makes sense to architect this in a way which does not block.

For that, we stop sharing the state between ServerWorld and CheckWatcherState. Instead, the watcher thread sends new diagnostics via a channel, and we accomodate thouse diagnostics intot he server state in the main loop.

So, instead of

struct Server {
    diagnostics: Arc<Mutex<Vec<Diagnostics>>>,
}

struct Watcher {
    diagnostics: Arc<Mutex<Vec<Diagnostics>>>,
}

we'll have something like this:

struct Server {
    // this bit now *owns* diagnostics
    diagnostisc: Vec<Diagnostics>
}

struct Watcher {
    diagnostics_sink: Sender<Vec<Diagnostics>>,
}

I am not sure this is the cuprit of slowness on widnows, but I think we should fix it, because it's very useful when all changes to the server's state can occur only via the main loop.

Note how VFS is set up in a similar way: instead of modifing some global hash map with files, VFS sends a message to the main looop that hey, I have these new files for you. The main loop than incorporates the changes itself.

matklad (Jan 15 2020 at 13:59, on Zulip):

Note that I think we'll still need some locks here, to share the state between ServerWorld and WorldSnapshot, but we won't actually be changing anyting mid-snapshot

Emil Lauridsen (Jan 15 2020 at 14:07, on Zulip):

That seems possible, and a reasonable re-factoring. I had considered that the locking could be problematic, but haven't actually looked into it yet. Give me 45 minutes or so and I'll try to have a draft PR up for testing

matklad (Jan 15 2020 at 14:16, on Zulip):

In general, those RwLocks in ServerWorld are sort-of fake.

What we really have, conceptually, is a single RwLock across the whole state.

The only thing that .writes to this lock is the main loop.

All handlers are run in the background and .read from the lock.

When a main loop wants to lock the state, but there's some background processing, it cancells background task, which die and release the look for the loop to catch it.

The reason why this is not literarly strutcutre that way is that the most important lock is actually hidden in salsa, and we can't piggy back on it unfortunately :(

Emil Lauridsen (Jan 15 2020 at 14:58, on Zulip):

PR is up: https://github.com/rust-analyzer/rust-analyzer/pull/2853

Edwin Cheng (Jan 16 2020 at 04:02, on Zulip):

Okay, I finally conclude a theory what happens in diagnostics_dont_block_typing case.

Some facts:

  1. DidOpenTextDocument notification will be translate to Event::Vfs in main-loopand it will trigger the RWLock write lock in main-loop.
  2. A write lock will be waited until all other WorldSnapshot cancelled.
  3. We handle event one by one in main-loop, basically is a read, check, apply-change cycle.

What diagnostics_dont_block_typing do is, pushing 10 DidOpenTextDocument notification, and then push a OnEnter request and check whether that OnEnter event can handle in a short period. Actually all these events are pushed to main-loop almost instantly.

However, in the main-loop, (e.g let say we named these events as [1d,2d,3d,4d,5d... , OnEnter]), when we handle 2d, because it is a Event::Vfs, it have to wait for 1d to cancel. Such that all these events will be handled sequentially. And depends on how fast the main-loop is processing, and if the 1d is already in very deep phase (e.g some parsing phase), it will be quite slow.

OTOH, if we raise the mainloop priority, the main-loop will be processing very fast such that all these events will be in a very shallow phase, the cancellation will be quite fast and that's why it is fast in that case.

Do I miss something important I miss to consider ?

matklad (Jan 16 2020 at 10:59, on Zulip):

Hm, I think 2 is not true. Snapshots don't really hold the read lock all the time, they only acquire readlock for a very brief moment in time to copy data out of the VFS.

matklad (Jan 16 2020 at 11:02, on Zulip):

It might be best to look at logging/profiling output out of the main_loop.

    let _p = profile("main_loop_inner/loop-turn");
    log::info!("loop turn = {:?}", event);

These two lines should show us exactly which event takes long time to process

Edwin Cheng (Jan 16 2020 at 11:07, on Zulip):

I mean salsa snapshot rwlock

Edwin Cheng (Jan 16 2020 at 11:09, on Zulip):

This one : https://github.com/salsa-rs/salsa/blob/d1e9471bee517e66b8af2b82e263878d02353547/src/runtime.rs#L35

matklad (Jan 16 2020 at 11:15, on Zulip):

I think this should not be a problem, by itself.

  1. main loop gets DidOpen from the editor
  2. main loop write-locks vfs and pushes a new pending change. This should be fast (assuming lock fairness), b/c we never hold vfs lock for long
  3. main loop calls world_state.process_changes(), which actually intorporates the change
  4. process_changes calls self.analysis_host.apply_change(change), which advances salsa's revision (thus cancelling background tasks) and waits for the salsa lock
  5. now, all background tasks should be cancelled as soon as they do a next salsa query. They can spend some CPU time finish parsing or something like that, but it shoulnd take more than a dozen of millis
Edwin Cheng (Jan 16 2020 at 11:22, on Zulip):

in step 4, it have to wait for the current task to cancel, so that means as in diagnostics_dont_block_typing, all events will be handle sequentially, right ? if these cancelling take 80 mills, it will take 800 mills.

matklad (Jan 16 2020 at 11:25, on Zulip):

Hmm, I see.

So your hypothesis is that we have 10 DidOpen events, and every one of them takes some time to process, but together it ends up to a substantional delay?

Edwin Cheng (Jan 16 2020 at 11:26, on Zulip):

yes !

matklad (Jan 16 2020 at 11:26, on Zulip):

That is possible I think, though I'd still expect that we spend <200ms per event

matklad (Jan 16 2020 at 11:27, on Zulip):

We really need to look at the traces and profiling info...

Edwin Cheng (Jan 16 2020 at 11:27, on Zulip):

But even we spend only 80ms per event, 10 events add up will be 800ms

Edwin Cheng (Jan 16 2020 at 11:36, on Zulip):

One of the problem here for logging is, IO will disturb how thread behavior. But I have some event graphs captured by Concurrency Visualizer which show these behavior, I will post it here when Im back home.

Edwin Cheng (Jan 16 2020 at 13:02, on Zulip):

prove2.png

Edwin Cheng (Jan 16 2020 at 13:03, on Zulip):

I hope it could be proved my hypothesis is correct :)

Jeremy Kolb (Jan 16 2020 at 13:47, on Zulip):

off topic but where do you get concurrency visualizer from?

Edwin Cheng (Jan 16 2020 at 14:30, on Zulip):

@Jeremy Kolb https://marketplace.visualstudio.com/items?itemName=Diagnostics.ConcurrencyVisualizer2017

Installed as Visual Studio 2017 Extension

Last update: Feb 25 2020 at 04:25UTC