Stream: t-compiler/wg-rls-2.0

Topic: ignoring in VFS


matklad (Mar 07 2019 at 15:06, on Zulip):

@vipentti you've recently tweaked ignoring logic in the VFS, would you be interested in making it more generic?

Specifically, I think VFS can accept paris of (PathBuf, Box<dyn Fn(&RelativePath)>) as source roots (i.e, source root + exclusion pattern), and use that instead of hardcoded logic.

I think it makes sense to say that the filtering function is only called for files and change the implementation inside VFS such that if path.is_dir() is an implementation detail. We probaly want to use a struct with good names instead of a pair as well.

With that interface in place, we can move hard-coded logic about .rs and ndoe_modules out of the VFS and into ra_lsp_server.

vipentti (Mar 07 2019 at 15:07, on Zulip):

Sounds good, Yeah I was thinking about that too. That way ra_vfs could be used in a more generic way and not just with rust files

matklad (Mar 07 2019 at 15:13, on Zulip):

yeah... We really should have done this when extracting VFS to a separate repo. Now we have to coordinate updates to both repos, or use the [patch] section of Cargo.toml

vipentti (Mar 07 2019 at 15:17, on Zulip):

Yeah, that's unfortunate but hopefully should not be too bad to do now

vipentti (Mar 07 2019 at 15:56, on Zulip):

The reason for the (PathBuf, Box<dyn Fn(&RelativePath)>) is so that we can have different exclusions for different paths ? like ignoring test|benchmark|example from external crates

matklad (Mar 07 2019 at 16:00, on Zulip):

Yeah

matklad (Mar 07 2019 at 16:01, on Zulip):

I imagine long-term users should be able to override exclusion manually as well

vipentti (Mar 08 2019 at 11:16, on Zulip):

Should the exclusion happen in Roots still ? e.g. should the callback be passed and stored together with the RootData ?

vipentti (Mar 08 2019 at 11:47, on Zulip):

Currently the VFS is provided with a list of root directories right ? and then the individual files are read and the folders watched in io.rs ?

matklad (Mar 08 2019 at 12:24, on Zulip):

Yeah, the callback should be stored as a part of RootData

matklad (Mar 08 2019 at 12:24, on Zulip):

Currently the VFS is provided with a list of root directories right ? and then the individual files are read and the folders watched in io.rs ?

Yeap

matklad (Mar 08 2019 at 12:25, on Zulip):

Specifically, roots is what answers the question "to which root does this path belong"

matklad (Mar 08 2019 at 12:25, on Zulip):

and io is the thing which actually reads the contents of files from disk

vipentti (Mar 08 2019 at 13:03, on Zulip):

I guess since the current method of filtering files works, there should not be performance concerns regarding filtering ? I mean currently you could technically attempt to load files from node_modules but then the files would be ignored based in is_excluded

matklad (Mar 08 2019 at 13:05, on Zulip):

it's important to avoid scanning the node_modules folder though

vipentti (Mar 08 2019 at 13:09, on Zulip):

Yeah, I imagine being able to exclude whole directories from scanning would be useful in general

vipentti (Mar 08 2019 at 13:34, on Zulip):

What do you think about instead of returning bool from the callback, we'll return some enum with two variants e.g.

pub enum MatchResult {
    /// Keep the file
    Keep,
    /// Exclude the file
    Exclude
}

Naming tbd but that way they callback can be used to exclude files by default or to keep files by default, I find bool to be ambiguous in this case.

matklad (Mar 08 2019 at 13:49, on Zulip):

excellent idea!

matklad (Mar 08 2019 at 13:49, on Zulip):

enum Inclusion { Include, Exclude }

vipentti (Mar 08 2019 at 13:52, on Zulip):

hmm, still not quite sure what to do regarding folders, I mean it doesn't make sense to start watching folders that will be ignored, meaning if you add a/{nested|node_modules} as root but you want to ignore node_modules it would make sense to only watch a/nested and skip further processing of content in node_modules ?

matklad (Mar 08 2019 at 13:54, on Zulip):

yeah, I guess my original intention "make exclusion care only about files" was wrong

matklad (Mar 08 2019 at 13:54, on Zulip):

we do need to care about folders somehow

matklad (Mar 08 2019 at 13:55, on Zulip):

Can we say that a path to folder must include trailing / to distinguish two cases without making a syscall?

matklad (Mar 08 2019 at 13:55, on Zulip):

Alternatively, we can make something like { IncludeIfFile, IncludeIfDir, Exclude }

vipentti (Mar 08 2019 at 13:57, on Zulip):

hmm I wonder if we could run some folder_based filtering already in the initial Roots creation, meaning if you have a root a/node_modules and it matches an ignore, we'll not process it further and drop it from the roots

vipentti (Mar 08 2019 at 14:00, on Zulip):

Although I guess we'd still need to do some further filtreing in io since that one walks the directory tree ?

matklad (Mar 08 2019 at 14:01, on Zulip):

I think filtering during creating is isomorphic to filtering during initial scan

vipentti (Mar 08 2019 at 14:05, on Zulip):

True, though during initial scan you'll recursively go down the roots

matklad (Mar 08 2019 at 14:06, on Zulip):

yeah, it's more correct to say that we do strictly more work during initial scan anyway

vipentti (Mar 08 2019 at 14:11, on Zulip):

hmm actually it might not be a problem, if the roots don't contain the a/node_modules exclude_callback will only be called once when the WalkDir happens to attempt reading it, and then the exclusion should only occur once without further processing of the folders / files inside node_modules

vipentti (Mar 08 2019 at 14:32, on Zulip):

Can we say that a path to folder must include trailing / to distinguish two cases without making a syscall?

hmm I'm not sure, I think we'd need WalkDir to return paths with a trailing slash ?

vipentti (Mar 12 2019 at 14:04, on Zulip):

I have been working on this and experimented with various tweaks to the API. Currently the callback is called in Roots::contains where previously is_excluded and rel_path were called with some additional parameters.

An example of a callback:

fn include_rust_files(path: RootRelativePath) -> Inclusion {
    const IGNORED_FOLDERS: &[&str] = &[
        "node_modules",
        ".git",
    ];

    let is_ignored = path.components()
        .any(|c| IGNORED_FOLDERS.contains(&c.as_str()))
        ;

    if is_ignored {
        return Inclusion::Exclude;
    }

    // Allow rust files
    if let Some("rs") = path.extension() {
        return Inclusion::IncludeIfFile;
    }

    // TODO: This should be the default unless otherwise specified
    // because if you exclude here, then you won't get the nested
    // files under the root processed, even if they are not directly ignored
    Inclusion::IncludeIfDir
}

RootRelativePath is a container struct to allow access to the various parts of the path.

/// RootRelativePath identifies a file or a folder inside the root
pub struct RootRelativePath<'a> {
    /// The root path
    root: &'a Path,
    /// The full path to the file / folder, includes the root
    full: &'a Path,
    /// Relative path to root, e.g. full_path without the root as prefix
    relative: &'a RelativePath,
}

One issue I have is that the callbacks return value is important in ways that may not be obvious, basically if you by default return Inclusion::Exclude, nested folders under the particular root will not be processed (if they were not roots themselves).

So if you have a folder structure like

"a/nested"
"a/other"
"a/.git"

And the a is used as a root, if you return Inclusion::Exclude from the callback, all files under the nested, other folders will not be processed even if you did not explicitly ignore them.

matklad (Mar 12 2019 at 14:08, on Zulip):

Perhaps we could use a trait with two methods instead:

trait Filter {
    fn contains_path() -> bool;
    fn contains_file() -> bool;
}

?

matklad (Mar 12 2019 at 14:09, on Zulip):
/// The full path to the file / folder, includes the root

I hope we could avoid exposing Path/PathBuf in the API, b/c they are OS-dependent

vipentti (Mar 12 2019 at 14:11, on Zulip):

Yeah currently they are private but used to perform checks

impl<'a> RootRelativePath<'a> {
    /// Returns whether the `full_path` exists on disk and is pointing at a regular file.
    pub fn is_file(&self) -> bool {
        self.full.is_file()
    }

    /// Returns whether the `full_path` exists on disk and is pointing at a directory.
    pub fn is_dir(&self) -> bool {
        self.full.is_dir()
    }
}
vipentti (Mar 12 2019 at 14:13, on Zulip):

Perhaps we could use a trait with two methods instead:

trait Filter {
fn contains_path() -> bool;
fn contains_file() -> bool;
}

Use the trait instead of a function callback ?

matklad (Mar 12 2019 at 14:22, on Zulip):

Yeah

vipentti (Mar 12 2019 at 14:24, on Zulip):

That might work, I'll experiment a bit

vipentti (Mar 15 2019 at 15:00, on Zulip):

An interesting "problem" came up, if Vfs::add_file_overlay is called for a file, and the file does not exist, we can't properly filter it using file filtering, since when we are filtering, we can't really know if path points to a file or a directory, if it doesn't actually exist on disk yet. So

vfs.add_file_overlay(&dir.path().join("a/LICENSE2"), "text".to_string());

Would be filtered as it was a directory

matklad (Mar 15 2019 at 21:10, on Zulip):

hm, if you call add_file, the contract is that it is a file

matklad (Mar 15 2019 at 21:10, on Zulip):

we don't need to care about the state of the file system at all I think?

vipentti (Mar 18 2019 at 07:40, on Zulip):

Had some time to work on this again, I think I'm getting somewhere now. Basically we can avoid doing any extra IO when doing the filtering. By adding a new crate-specific enum for FileType, which is then used as an additional parameter to Roots::find(&self, path: &Path, expected: FileType) and Roots::contains(&self, root: VfsRoot, path: &Path, expected: FileType). Basically Vfs is always expecting a File and io then provides the argument based on the actual type in the filesystem. This way we can run the correct Filter function based on the expectation.

This means that when one of the _file_overlay functions is called, we won't add them to the VFS unless they pass filtering, even if they don't actually exist in the filesystem yet.

NOTE The FileType enum is purely for convenience.

pub(crate) enum FileType {
    File,
    Dir,
}

Example of a filter:

struct IncludeRustFiles;

impl Filter for IncludeRustFiles {
    fn include_folder(&self, folder_path: &RelativePath) -> bool {
        const IGNORED_FOLDERS: &[&str] = &[
            "node_modules",
            ".git",
        ];

        let is_ignored = folder_path.components()
            .any(|c| IGNORED_FOLDERS.contains(&c.as_str()))
            ;

        !is_ignored
    }

    fn include_file(&self, file_path: &RelativePath) -> bool {
        file_path.extension() == Some("rs")
    }
}
matklad (Mar 18 2019 at 08:29, on Zulip):

Looks reasonable, I don't see any problems with this approach!

vipentti (Mar 18 2019 at 09:39, on Zulip):

I'm not sure what is the appropriate bound for the Filter when creating a RootEntry.

Currently Filter and RootEntry look like

pub trait Filter: Send + Sync {
    fn include_folder(&self, folder_path: &RelativePath) -> bool;
    fn include_file(&self, file_path: &RelativePath) -> bool;
}

#[derive(Clone)]
pub struct RootEntry {
    path: PathBuf,
    filter: Arc<dyn Filter>,
}

impl RootEntry {
    pub fn new<T>(path: PathBuf, filter: T) -> Self
    where
        T: Filter + 'static,
    {
        RootEntry { path, filter: Arc::new(filter) }
    }
}

This means that each RootEntry::new will create a new Arc. But I'm not sure adding a function that takes an Arc<dyn Filter> is appropriate?

matklad (Mar 18 2019 at 09:42, on Zulip):

Send + Sync bounds on Filter sounds right to me.

API-wise, we should accept an Box<dyn Filter> and convert it to Arc<dyn Filter> internally (via Into). Though, I am not sure if we need Arc at all here: I think Root could store just Box, iirc, we already put roots inside arcs?

vipentti (Mar 18 2019 at 09:44, on Zulip):

Roots are inside arcs but the way Roots is currently initialized, we clone the entry

vipentti (Mar 18 2019 at 09:45, on Zulip):

Although I suppose the clone could be avoided by restructuring the Roots::new a bit

vipentti (Mar 18 2019 at 13:13, on Zulip):

Now the RootEntry contains Box<dyn Filter> with no extra Arc or cloning happening. Is there a reason to accept only Box<dyn Filter> vs adding a trait bound ?

impl RootEntry {
    pub fn new(path: PathBuf, filter: Box<dyn Filter>) -> Self {
        RootEntry { path, filter }
    }
    pub fn new_alt(path: PathBuf, filter: impl Filter + 'static) -> Self {
        RootEntry { path, filter: Box::new(filter) }
    }
}
vipentti (Mar 18 2019 at 14:25, on Zulip):

PR Up in https://github.com/rust-analyzer/ra_vfs/pull/4

vipentti (Mar 19 2019 at 10:52, on Zulip):

I was thinking about filtering examples etc from external crates, the best place to do this is probably ra_project_model ? Instead of ProjectWorkspace::to_roots returning a Vec<PathBuf>, we could return Vec<ra_vfs::RootEntry> where the filter has knowledge on if its an external crate or a member of the current project. This would mean adding ra_vfs dependency to ra_project_model though

matklad (Mar 19 2019 at 18:14, on Zulip):

argh........

I've just remembered the reason why I wanted to stick with Fn(&RelativePathBuf) interafact for filtering, instead of introducing a nominal trait

matklad (Mar 19 2019 at 18:15, on Zulip):

:)

I think it would be nice to avoid depndency on vfs though. Can project module supply something which could be converted to ra_vfs::RootEntry easily by a third crate?

vipentti (Mar 19 2019 at 18:17, on Zulip):

Hmm should be doable

vipentti (Mar 19 2019 at 18:18, on Zulip):

argh........

I've just remembered the reason why I wanted to stick with Fn(&RelativePathBuf) interafact for filtering, instead of introducing a nominal trait

what's the reason ?

matklad (Mar 19 2019 at 18:21, on Zulip):

to not depend on vfs :)

matklad (Mar 19 2019 at 18:21, on Zulip):

If the type is non-nominal (like a tuple of Fns or something), two crates can agree on it without depending on each other.

vipentti (Mar 19 2019 at 18:22, on Zulip):

Ah, yes of course

vipentti (Mar 19 2019 at 18:23, on Zulip):

I think the ra_project_mode can return just some tuple for example with (PathBuf, is_member) that can then be used outside of it

matklad (Mar 19 2019 at 18:24, on Zulip):

yeah, something like that should work

vipentti (Mar 19 2019 at 18:24, on Zulip):

I'd still like to combine the Filter somewhere, since both ra_batch and the lsp server are using the roots

vipentti (Mar 20 2019 at 07:11, on Zulip):

hmm, do we have more common functionality that could be shared between crates ? I was thinking about creating a simple crate vfs_filter or ra_filter or some such that would depend on Vfs and then both ra_lsp_server and ra_batch could depend on it. The new crate would contain the filters and such.
But if we have more common functionality, maybe creating ra_common could be useful ?

matklad (Mar 20 2019 at 07:24, on Zulip):

My gut feeling is that we don't actually want to share anything between these filters. It might turn out not to be true, we'll see, but, at this point, I think it's easier to keep the logic inside project_model, and provide a thin vfs glue twice, in ra_batch and ra_lsp_server

vipentti (Mar 20 2019 at 07:27, on Zulip):

Actually that sounds reasonable, so project_model returns something that allows us to tell if a root is part of the current workspace or not, then both ra_lsp_server and ra_batch implement their own Filters, which initially can be similar but may end up being different ?

vipentti (Mar 20 2019 at 09:59, on Zulip):

Updated the pr https://github.com/rust-analyzer/rust-analyzer/pull/997

Last update: Nov 19 2019 at 17:40UTC