Stream: t-compiler/rust-analyzer

Topic: rust-analyzer vfs

Lucien Greathouse (Aug 20 2019 at 20:14, on Zulip):

Hello all!

I've been working on a project with its own VFS implementation that looks really similar to what ra_vfs is doing. Is this kind of crate a good candidate for generalizing for the rest of the Rust ecosystem?

My project's VFS strategy was pretty similar to ra_vfs, with tracking of multiple roots and reading in contents pretty aggressively. I've been rewriting it recently to ditch the notion of roots and also do I/O lazily on-demand, which obviates the need for ignores in my application and also reduces incidents of my application trying to read files while they're still being saved.

If our use cases line up, I'm interested in factoring out a VFS crate with all the combined wisdom!

matklad (Aug 20 2019 at 20:36, on Zulip):

@Lucien Greathouse that's interesting! What do you use for watching files for changes?

Lucien Greathouse (Aug 20 2019 at 20:36, on Zulip):

notify 4.x, considering porting over to 5.x since it looks like it uses crossbeam-channel now. Not being able to select over std mpsc channels is troublesome.

matklad (Aug 20 2019 at 20:38, on Zulip):

Yeah, so the file watching bit is the thing that interests me most, as it turns out to be the most complicated thing. Issues like this one are a pretty common occurence:

Lucien Greathouse (Aug 20 2019 at 20:38, on Zulip):

Yep, we get bug reports like that from MacOS users too. D:

matklad (Aug 20 2019 at 20:39, on Zulip):

I'd love to be able to abstract that part. Specifically, having a library on top of notify to which I can tell "watch this direcotry", and which will manage initial walkdir + managing notfications transparently. Like, I want walkdir, which also does watching

matklad (Aug 20 2019 at 20:40, on Zulip):

It'd be cool to extract this behind the API, as I am thinking that maybe using facebook's watchman over notify is a better thing to do...

matklad (Aug 20 2019 at 20:40, on Zulip):

Re lazy vs. eager file loading, that's an interesting design question! Rust-analyzer is eager by design, but I am not entirely sure that it's the right design....

matklad (Aug 20 2019 at 20:42, on Zulip):

The hard invariant I want to maintain is that no IO happens during analysis. With eager file loading, I do all the IO upfront, handle all the errors, and then the rest of rust-analyzer's compiler literally never has to deal with errors.

matklad (Aug 20 2019 at 20:43, on Zulip):

I wonder if it is possible to marry lazy loading and absence of errors though.... Like, on the error you can return an empty file, and just signal the error via a side channel

Lucien Greathouse (Aug 20 2019 at 20:43, on Zulip):

That was an invariant we had for ours too. The base of the application is basically webpack, so user-configurable transformation of files into whatever else, and we model those as pure functional snapshot functions in terms of the in-memory files

matklad (Aug 20 2019 at 20:43, on Zulip):

@Lucien Greathouse is your code published somewhere? I am definitelly interested in at least taking a look.

Also, rls 1.0 also has a vfs, which is lazy, so that might be interesting, code-sharing-wise, as well

matklad (Aug 20 2019 at 20:45, on Zulip):

Another reason why we want eager loading is to be abdle to crawl all files in the current package, to be able to index symbols for "go to symbol" even if a particular file is not included into any crate as a module

matklad (Aug 20 2019 at 20:45, on Zulip):

But, presumably, directory walking can be implemented on top of lazy loading as well?

Lucien Greathouse (Aug 20 2019 at 20:46, on Zulip):

Yep! New implementation tracks for directories whether the children have been enumerated yet, and files just store an Option<Vec<u8>> depending on whether the contents have been loaded.

matklad (Aug 20 2019 at 20:47, on Zulip):

Aha, interesting! And how do you setup watching, if you don't have explicit roots? Something like "track which directories/files are actually accessed, and watch those"?

Lucien Greathouse (Aug 20 2019 at 20:48, on Zulip):

Yeah, that's the current chunk I'm working on this week

Lucien Greathouse (Aug 20 2019 at 20:48, on Zulip):

We were having issues where users had their Git indexes get read into the VFS, but we didn't want to explicitly list ignores as globs or whatever

matklad (Aug 20 2019 at 20:52, on Zulip):

Aha, that makes sense, ignores are a pain to get right and convenient to use.

How your VFS incorporates changes? In rust-analyzer, we have this elaborate VfsTask things, which allow us to separate noticing that something's changed from actually applying the changes, which makes changes transactional. I like the property that vfs can't change under your feet (which is not the case with rls's vfs), but I wonder if this can be achieved easier...

Lucien Greathouse (Aug 20 2019 at 20:52, on Zulip):

The (pretty sloppy, oof) implementation we ship right now hangs out here:

It tracks roots like ra_vfs does, with the file watching/application-specific change tracking rolled into one spot here:

Current work that is definitely _not_ ready is in this part of this branch:

I want to steal the integer file handle idea from ra_vfs, since we just clone PathBuf objects everywhere right now (ahh!), considering using something like generational-arena to get a little better guarantees on top of that.

We've also got an abstraction of a hierarchical path map that underpins this. It's alright, but a little funky:

matklad (Aug 20 2019 at 20:53, on Zulip):

sweet, reading all that right now!

Lucien Greathouse (Aug 20 2019 at 20:53, on Zulip):

We apply them as the changes happen, everything is held by a top-level mutex since file changes always send us into a big snapshot function that wants exclusive access.

Lucien Greathouse (Aug 20 2019 at 20:55, on Zulip):

Dealing with diffing+applying changes is something that's core to this project, Rojo, as well, since we turn file change events into snapshots, then turn those into diff objects that we apply to a DOM, then send over the wire to another application via IPC. :D

Lucien Greathouse (Aug 20 2019 at 20:57, on Zulip):

Oh, the other motivation for lazy I/O for us is that we don't know what the roots of the project are until we explore the project, since we want to support nested project manifests automagically.

matklad (Aug 20 2019 at 20:57, on Zulip):

Yeah, I feel like your's vfs is father along than rust-analyzer's

matklad (Aug 20 2019 at 20:58, on Zulip):

Do you deal with symlinks already?

matklad (Aug 20 2019 at 20:59, on Zulip):

A significant motivation behind the roots system was to make sure that vfs's file system is a real tree, and not an arbitrary graph like the real file system is.

Lucien Greathouse (Aug 20 2019 at 21:01, on Zulip):

We don't do anything explicitly with symlinks right now, we might even have duplicate file contents bugs? I'm sure we crash for cyclical symlinks right now

Lucien Greathouse (Aug 20 2019 at 21:02, on Zulip):

I haven't had good luck with path canonicalization, especially when racing with other applications deleting files and causing it to fail.

matklad (Aug 20 2019 at 21:05, on Zulip):

Yeah, that's kind of a tough design choice:

For "cli" apps, the wisdom says that you should never change paths you get from user/os.

For something like rust-analyzer though, I'd love to be able to "walk the subdiretory tree", without feaing of infinite loops and such, even if that means showing an error to the user with "sorry, you have an extremely weird file system setup, can't handle it fully"

Lucien Greathouse (Aug 20 2019 at 21:06, on Zulip):

Does notify handle those cases gracefully today?

matklad (Aug 20 2019 at 21:06, on Zulip):

I have no idea, but I think nothing in notify itself walks the directories recursively, so it's not their problem

Lucien Greathouse (Aug 20 2019 at 21:08, on Zulip):

It does! The second arg to Watcher::watch has a RecursiveMode argument, we use it and it seems to work alright:

matklad (Aug 20 2019 at 21:08, on Zulip):

I wonder if there's a middle ground here...

I imagine a good thing to have would be "as true to OS as possible" VFS, which doesn't have roots and uses std::Path everywhere. On top of this thing, I might build a roots system with RelativePath paths. I need to think If I really need roots though... But it seems I might: "walk all .rs files in this dir" is something that an IDE needs

Lucien Greathouse (Aug 20 2019 at 21:09, on Zulip):

We have a handful of bugs about dropped changes but I don't know whether they get dropped by the OS, by notify, or by our code, and it's a crapshoot trying to debug them.

matklad (Aug 20 2019 at 21:12, on Zulip):

That is the code that incorporates changes, right?

Lucien Greathouse (Aug 20 2019 at 21:13, on Zulip):

That's what ties notify to the VFS and our project "session", yeah

Lucien Greathouse (Aug 20 2019 at 21:14, on Zulip):

We still crawl all kinds of files in the project directories, so that doesn't feel tied to roots to me

matklad (Aug 20 2019 at 21:14, on Zulip):

Is there API to get "the list of changed files since I've last looked"?

Lucien Greathouse (Aug 20 2019 at 21:15, on Zulip):

Not in master, but I want to start using an unbounded channel for that in reconciler-mk5 with the new vfs

matklad (Aug 20 2019 at 21:17, on Zulip):

Cool! That would be crucial for rust-analyzer, because we need to eagarly invalidate the "caches". So far, I must say I extremely like everything about your approach besides the eager update via locked mutex!

matklad (Aug 20 2019 at 21:18, on Zulip):

unbounded channel f

I am not sure I understand, but "unbounded channel" sounds like a think which can make coalescing changes harder. LIke, if the user saves files ten times, ideally I would prefer to get only one changed event

Lucien Greathouse (Aug 20 2019 at 21:21, on Zulip):

This is something I haven't thought through enough to implement yet. The idea with using an unbounded queue would be that you'd try_recv it (without blocking) until it's empty, merge the events together, and then process them.

Lucien Greathouse (Aug 20 2019 at 21:22, on Zulip):

I guess you could do that on the side before the changes are sent, but that feels like it's getting into the same problems that notify's debounced event impl has

matklad (Aug 20 2019 at 21:24, on Zulip):

In ra_vfs, we have an explicit commit method that advances the version and gives you the list of changed files. It doens't do coalescing though, but it should be able to rather trivially:

matklad (Aug 20 2019 at 21:26, on Zulip):

Recalled why we don't use recursive mode of notify, and do it ourselves:

matklad (Aug 20 2019 at 21:26, on Zulip):

with recursive, you can't ignore stuff

Lucien Greathouse (Aug 20 2019 at 21:27, on Zulip):

Ahhh, I wonder why we haven't run into any significant bug reports based on that same issue.

matklad (Aug 20 2019 at 21:29, on Zulip):

So, I got to go, but I am extremely interested in all this! I think I'll try to whip up an API for vfs that rust-analyzer needs, trying to make it rootless and lazy, and then it makes sense to compare the two.

It would be cool if you can take a closer look at the commit thing: I feel whole-fs snapshottng is pretty crucial for ra, and, from the looks of it, imfs doesn't provide it (just like the real fs doesn't provide it).

Lucien Greathouse (Aug 20 2019 at 21:30, on Zulip):

Definitely, I think that approach is also super neat from a robustness perspective.

Lucien Greathouse (Aug 21 2019 at 00:43, on Zulip):

The join-on-drop ScopedThread type is neat, that feels like it should be a crate since Crossbeam's scoped thread is for a different purpose (borrowing things from your stack)

Lucien Greathouse (Aug 21 2019 at 00:58, on Zulip):

In ra_vfs, who is doing work with the changes returned from commit? In imfs, it's the same thread that receives the notify change through a channel that ultimately does I/O and updates the imfs.

Jane Lusby (Aug 21 2019 at 01:00, on Zulip):

uh oh, join on drop scoped thread. sounds like the leakpocalypse again

Lucien Greathouse (Aug 21 2019 at 01:02, on Zulip):

luckily it's still safe to leak it since it doesn't hold stack references! :big_smile:

Jane Lusby (Aug 21 2019 at 01:41, on Zulip):

oooo got it got it got it. That kind of scoped thread, not the other kind of scoped thread, that's what you meant earlier. All making sense now

ylck (Aug 21 2019 at 07:21, on Zulip):


matklad (Aug 21 2019 at 07:33, on Zulip):

published scoped thread as

matklad (Aug 21 2019 at 08:11, on Zulip):

@Jane Lusby yeah, the purpose of jod-thread is not to express more lifetime patterns, but to make software more robust. Not joining a thread is bad because, for example, you might get interference between zombie threads of various tests. Additionally, if you don't join a thread and exit the main, then thread's destructors won't run

matklad (Aug 21 2019 at 08:22, on Zulip):

In ra_vfs, who is doing work with the changes returned from commit? In imfs, it's the same thread that receives the notify change through a channel that ultimately does I/O and updates the imfs.

@Lucien Greathouse that's a little wonky at the moment. For IO specifically, I try (but not 100% succesfully) to maintain constraint than only one thread does IO. This helps to make sure that the history of file versions always goes forward in time: if one thread eagarly loads the file while crawling a directory, and nother thread eagarly loads a file due to notification from notify, they can race against each other and you might end up with older file contents in the VFS. I think that with lazy loading you don't have this problem.

Incorporation of the new file content, read from disk, into the VSF happens in the app's main event loop.

Specifically, here we get a bunch of files from IO thread and add them to vfs as "pending changes":

After that, we commit those pending changes and get the new state of vfs:

The problem here is that we don't actually really maintain a list of pending changes, rather, we update in-place but record what was updated, and then the commit message only returns what has changed, while ideally it also should be the method that applies the changes.

matklad (Aug 21 2019 at 09:11, on Zulip):

@Lucien Greathouse so, I think this is roughly the API I want to see:

matklad (Aug 21 2019 at 09:12, on Zulip):

Just realized that @Igor Matuszewski might also be interested in this ^^ VFS discussion :D

Lucien Greathouse (Aug 21 2019 at 16:40, on Zulip):

Our project just made a similar transition for change tracking in the DOM diffing phase. We used to apply changes and log what changed as it happened, but now we generate + apply patch objects!

Lucien Greathouse (Aug 21 2019 at 17:42, on Zulip):

That API looks very doable, especially if you're a little flexible on &self vs &mut self

Another use we have for our VFS is dropping file change notifications that come from our own program writing the file, to avoid us getting in a nasty loop.

matklad (Aug 21 2019 at 17:46, on Zulip):

You write to files via VFS as well, right? It's interesting that in rust-analyzer we don't have this problem: when we want to write to a file, we ask the editor to do this, and then we see a change via usual vfs notification mechanism

matklad (Aug 21 2019 at 17:46, on Zulip):

That is, we don't directly mutate VFS

matklad (Aug 21 2019 at 17:47, on Zulip):

That API looks very doable, especially if you're a little flexible on &self vs &mut self

That depends on what "flexible" mean. I use & and &mut to enforce consistent shapshots, but I am fine with any other way of enforcement

Lucien Greathouse (Aug 21 2019 at 17:51, on Zulip):

With lazy I/O, VFile::contents, VDir::list and VDir::walk would probably need to accept &mut Vfs since they can change the state of the VFS. Do you think that's at odds with consistent snapshots, since it opens up a potential for a race?

Lucien Greathouse (Aug 21 2019 at 17:51, on Zulip):

We write to files through the VFS, yeah, and then when we get a change notification we can compare the new contents with our existing and drop the change on the ground

matklad (Aug 21 2019 at 17:54, on Zulip):

Hehe, I bet &self would work, and would guantee consistent snapshots...

Let me draft a little bit of code...

matklad (Aug 21 2019 at 18:02, on Zulip):

@Lucien Greathouse so, something like this is possible:

use std::fs;
use std::path::PathBuf;

use once_cell::sync::OnceCell;

struct FileData {
    path: PathBuf,
    contents: OnceCell<Option<Vec<u8>>>,

impl FileData {
    fn reset(&mut self) {
        self.contents = OnceCell::new()

    fn get(&self) -> Option<&[u8]> {
            .get_or_init(|| fs::read(self.path.as_path()).ok())
            .map(|it| it.as_slice())
matklad (Aug 21 2019 at 18:03, on Zulip):

That's file you can lazy read with & and reset with &mut. I am not sure if it transers easitly to hash map though, but I imagine it should

Lucien Greathouse (Aug 21 2019 at 18:12, on Zulip):

That's a sweet way to encode that relationship for files, nice. It might not work for directory listing, which needs mutable access to the whole VFS to be lazy, though :exhausted:

Daniel Mcnab (Aug 21 2019 at 18:19, on Zulip):

Can we learn from Noria's approach? That is maintaining two maps, one read only and one being written to, then swapping them, resetting the writing one once the final reader has been dropped

Daniel Mcnab (Aug 21 2019 at 18:19, on Zulip):

Of course I can't find the actual code for that - I just remember watching the talk.

matklad (Aug 21 2019 at 18:24, on Zulip):

ok, i've spend 10 minutes looking for a just right arena crate, and haven't found one, but I think dirs should be doable in theory as well...

matklad (Aug 21 2019 at 18:25, on Zulip):

What we need is an arena with fn push(&self /* sic! */, T) -> usize, fn get(&self, usize) -> &T and fn get_mut(&mut self, usize) -> &mut T methods I think

matklad (Aug 21 2019 at 18:26, on Zulip):

that is, we basically allocate files as never-moved slots

matklad (Aug 21 2019 at 18:29, on Zulip):

Hm, even

enum Entry { Dir(Dir), File(File)}

struct Dir { entries: OnceCell<HashMap<String, Entry>> }

struct File { contents: OnceCell<Vec<u8>>}

works, though you don't get a flat hash-map view of the file-system....

matklad (Aug 21 2019 at 18:45, on Zulip):

maybe I am overthinking this, and a single RWLock around the vfs would be enough? I definitelly want the ability to use VFS from diffrent threads though...

Lucien Greathouse (Aug 21 2019 at 19:52, on Zulip):

It seems like a trade-off between lazy I/O and concurrent read-only access from multiple threads

Lucien Greathouse (Aug 21 2019 at 19:53, on Zulip):

If you go down the road of needing to break your path into pieces and pointer-hop through nested hashmaps, that approach works, but I bounced off that idea awhile ago.

Lucien Greathouse (Aug 22 2019 at 00:39, on Zulip):

I thought I had a proof-of-concept of the file-watching implementation working, but now everything is deadlocking on drop, oof.

Lucien Greathouse (Aug 22 2019 at 01:21, on Zulip):

notify is holding onto its mpsc::Sender after the watcher is dropped, which is causing the receiver half to never return, lovely!
Switching from RecommendedWatcher to NullWatcher makes the deadlock go away with no other changes.

matklad (Aug 22 2019 at 06:46, on Zulip):

@Lucien Greathouse , I am pretty sure that in case of ra_fvs, dropping the notify::watchercloses the notify channel. But I did get a deadlock around that area with my channel though.

matklad (Aug 22 2019 at 06:48, on Zulip):

To ensure that the thread and channels are closed in the right order, I had to use a qute, aweful trick:

let thread; // <- affects drop order
let (tx, rx) = channel();
thread = spawn(|| /*use rx here*/);

Lucien Greathouse (Aug 22 2019 at 06:50, on Zulip):

I'm ensuring drop order with struct order, and also tried doing the same thing with ManuallyDrop -- I drop the notify watcher first and then drop the thread, joining it.

Lucien Greathouse (Aug 22 2019 at 06:50, on Zulip):

I'm pretty sure it's a notify bug because it only happens on Windows -- the null watcher and the default on Linux do not deadlock in this way :/

matklad (Aug 22 2019 at 06:51, on Zulip):

Interesting! I haven't tested this on windows a lot, so it might be the case.

matklad (Aug 22 2019 at 06:54, on Zulip):

(but yeah, I must say that hitting such platform specific bugs in platform-agnostic-ish notify is not pleasant. And I am not sure that it even moves in the right direction, given super grand plans for 5.0. For something where correctness is criticla, I'd seriously consider trying watchman)

Lucien Greathouse (Aug 22 2019 at 06:56, on Zulip):

The behavior around watching things across renames is definitely pretty spooky across different platforms.

Lucien Greathouse (Aug 22 2019 at 06:57, on Zulip):

Is there a watchman bindings crate?

matklad (Aug 22 2019 at 06:57, on Zulip):

I don't think so. And given that it's an external process, there should be a special can of worms to it as well...

Lucien Greathouse (Aug 28 2019 at 00:21, on Zulip):

One complication we discovered this week (in our VFS project!) is that our ad-hoc roots can be removed and we need to handle it gracefully, which is starting to complicate watching.

Lucien Greathouse (Aug 28 2019 at 00:23, on Zulip):

For example, the user can configure their project to watch src/foo and src/bar. We need to handle either of them being removed and re-added, but also need to handle (to some degree) the user deciding that they don't care about either of those folders anymore. It feels almost like we need to reference count our watches, but that's tricky because we aren't tracking roots at the VFS level anymore :(

matklad (Aug 29 2019 at 12:23, on Zulip):

Interesting! It seems like changing the set of roots is a rare operation, so perhaps a solution is just a linear traversal that takes the current set of roots and the current set of watches and reconsiles them?

matklad (Oct 09 2019 at 14:32, on Zulip):

@Lucien Greathouse I am starting to move to the new VFS in

Lucien Greathouse (Oct 09 2019 at 17:22, on Zulip):

Nice, this looks good. I haven't moved on our VFS implementation and have focused on other things since it mostly-kinda works.

Lucien Greathouse (Oct 11 2019 at 17:50, on Zulip):

I see that ra_vfs is internally locked with an RwLock, and all the file contents are kept as Arc<String>. I've been struggling pretty badly with synchronization in our implementation, so this looks really sound.

Lucien Greathouse (Oct 11 2019 at 17:51, on Zulip):

Notably, we found that once you start making every file operation lazy, it's way too hard to hold onto data and it increased our data copying a lot. Refcounting the buffers (since they're copy-on-write anyways) is awesome.

matklad (Oct 11 2019 at 18:05, on Zulip):

Yeah. Another nice thing I’ve realized is by using this invalidator thing
one can make VFS completely oblivious of he actual watcher implementation.
One not so nice thing that I”ve realized is that I still need roots for
incrementality. Basically, if I notice that a file is created, I
need to invalidate some thing, and, as I don’t want to invalidate
everything, I need a root concept

Lucien Greathouse (Oct 11 2019 at 18:12, on Zulip):

Oh! We modeled that in our "snapshot" subsystem, where for each output artifact we track all paths that could possibly contribute to that artifact.

So we have init.lua as the equivalent, and when we snapshot a folder, we say that the relevant paths are foo/ and foo/init.lua (along with other stuff, like foo/init.meta.json)

matklad (Oct 11 2019 at 18:19, on Zulip):

Yeah, perhaps I can try to track the precise deps as well? Might be slightly awkward to do with salsa, but should be possible...

Lucien Greathouse (Oct 11 2019 at 22:31, on Zulip):

Has there been any further investigation in file watching stuff? We're decoupled from Notify now through an interface called ImfsWatcher now. We consume coarser events than what notify gave us. Curious if starting a crate that would start as just a notify wrapper with a smaller API could be worthwhile.

matklad (Oct 12 2019 at 17:48, on Zulip):

Sort-of. I took a lot of time to read the docs for watchman. The docs a very thorough when it comes to details, but getting a high-level picture from them seems hard. It seems like it makes sense to model high-level API on top of watchman. Important bits I've got from the docs:

I feel that roughtly the following interface should work for watchers:

struct Subscription {
    root: PathBuf,
    include_glob_pattern: Option<String>,
    exclude_glob_pattern: Option<String>,

struct Watcher {
     fn new(event_callback: Box<dyn Fn(&Path)>) -> Watcher { ... }

     fn set_subscriptions(&mut self, subscription: Vec<Subscription>) {... }

Interesting points:

matklad (Oct 12 2019 at 17:59, on Zulip):

Oh, another high-order bit I've indetifid about separation between VFS and watching per se. At least in rust-analyzer use-cases, it seems like the set of files in VFS and the set of watched files actually should be different things. Specifically, VFS should contain files from, but we probably don't want to watch them. We only want to watch files inside the project. Now, we could watch everything, but that'll require more resources.

That's why in a design I have in mind you specify watched roots completely orthogonaly to the actual VFS impl

matklad (Oct 12 2019 at 18:10, on Zulip):

Opened an issue on the watchman repo about my struggle with the docs:

Last update: Jul 28 2021 at 05:00UTC