-
Notifications
You must be signed in to change notification settings - Fork 19
Description
I started #159 a while ago and since could mostly not decide what it should look like.
I've recently had some much better thoughts about all this stuff, and here's my proposal.
AbstractPath
is a tree, always
The central problem we've had with S3 is that S3 does not understand that it is supposed to be a tree. In retrospect, this seems like a poor design choice on the part of S3. That's not to say there's something wrong with pure key-value stores, there isn't, but people are trying to copy S3 to and from trees all the time. This means that everything that interacts with S3 has to re-implement a tree on top of it.
Obviously S3 is not going to change, but I think this discussion supports the idea that there is no getting away from treating S3Path
like a tree.
Therefore, FilePathsBase
should NOT try to represent objects that are not trees, meaning that function such as isdir
which distinguish leaves from non-leaf nodes on trees are mandatory and must return logically consistent results regardless of the AbstractPath
type (though we should be able to mitigate their cost in the case of key-value stores).
The difference between DirectoriesExplicit
and DirectoriesImplicit
In my proposed trait system paths either have directoriestype(::PathType)
of DirectoriesExplicit()
or DirectoriesImplicit()
. From the above discussion we see that
DirectoriesExplicit
:children
(getting immediate child nodes) is cheap.DirectoriesImplicit
:leaves
(getting descendant leaf nodes) is cheap.
Therefore FilePathsBase
should not attempt to call children
(readdir
) directly on DirectoriesImplicit
paths.
Implementation
The DirectoriesExplicit
code can mostly stay as is, however, it needs to be re-optimized. Tree functions such as parent
need to be made more efficient (I have done this already, but I may not have made a commit as of writing).
For DirectoriesImplicit
, the missing piece is a function which constructs an explicit tree from leaves
by parsing the path names of the leaves
. Functions such as walkpath
would then have to traverse the explicitly constructed tree.
I strongly suggest this package adopt AbstractTrees.jl so that it can take advantage of the generalizations and optimizations therein. This should eliminate the vast majority of non-parsing logic from FilePathsBase itself (i.e. the most fiddly functions in src/paths.jl
).
I expect that doing this properly should also solve problems such as #145 .
Next steps
I'm willing (and eager, considering the current rather broken state of the ecosystem and the frustrating importance of AWSS3.jl) to carry this through to completion, I'm posting here because I'd like some feedback as I am not a maintainer. I would like to see this PR by Keno to AbstractTrees
merged, tagged and fully documented first, so my next step is to help out with that.
Afterwards, I can turn my attention to FilePathsBase. I'm hopeful that everyone will be pretty enthusiastic about the changes I wind up making because at this point I'm pretty confident they will lead to drastic simplifications in both FilePathsBase.jl and AWSS3.jl.
cc @ericphanson