Skip to content

proposal to refactor with traits and trees #162

@ExpandingMan

Description

@ExpandingMan

I started #159 a while ago and since could mostly not decide what it should look like.

I've recently had some much better thoughts about all this stuff, and here's my proposal.

AbstractPath is a tree, always

The central problem we've had with S3 is that S3 does not understand that it is supposed to be a tree. In retrospect, this seems like a poor design choice on the part of S3. That's not to say there's something wrong with pure key-value stores, there isn't, but people are trying to copy S3 to and from trees all the time. This means that everything that interacts with S3 has to re-implement a tree on top of it.

Obviously S3 is not going to change, but I think this discussion supports the idea that there is no getting away from treating S3Path like a tree.

Therefore, FilePathsBase should NOT try to represent objects that are not trees, meaning that function such as isdir which distinguish leaves from non-leaf nodes on trees are mandatory and must return logically consistent results regardless of the AbstractPath type (though we should be able to mitigate their cost in the case of key-value stores).

The difference between DirectoriesExplicit and DirectoriesImplicit

In my proposed trait system paths either have directoriestype(::PathType) of DirectoriesExplicit() or DirectoriesImplicit(). From the above discussion we see that

  • DirectoriesExplicit: children (getting immediate child nodes) is cheap.
  • DirectoriesImplicit: leaves (getting descendant leaf nodes) is cheap.

Therefore FilePathsBase should not attempt to call children (readdir) directly on DirectoriesImplicit paths.

Implementation

The DirectoriesExplicit code can mostly stay as is, however, it needs to be re-optimized. Tree functions such as parent need to be made more efficient (I have done this already, but I may not have made a commit as of writing).

For DirectoriesImplicit, the missing piece is a function which constructs an explicit tree from leaves by parsing the path names of the leaves. Functions such as walkpath would then have to traverse the explicitly constructed tree.

I strongly suggest this package adopt AbstractTrees.jl so that it can take advantage of the generalizations and optimizations therein. This should eliminate the vast majority of non-parsing logic from FilePathsBase itself (i.e. the most fiddly functions in src/paths.jl).

I expect that doing this properly should also solve problems such as #145 .

Next steps

I'm willing (and eager, considering the current rather broken state of the ecosystem and the frustrating importance of AWSS3.jl) to carry this through to completion, I'm posting here because I'd like some feedback as I am not a maintainer. I would like to see this PR by Keno to AbstractTrees merged, tagged and fully documented first, so my next step is to help out with that.

Afterwards, I can turn my attention to FilePathsBase. I'm hopeful that everyone will be pretty enthusiastic about the changes I wind up making because at this point I'm pretty confident they will lead to drastic simplifications in both FilePathsBase.jl and AWSS3.jl.

cc @ericphanson

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions