
Conversation

@benwtrent (Member)

This adds prefetching to DirectIO. The idea is pretty simple:

  • Configure a number of "prefetch buffers" that are the same size as the directIO buffer.
  • Calling prefetch will start a virtual thread to fill an available buffer.
  • On read, DirectIO will attempt to fill from any prefetched buffer that matches the position before falling back to doing direct IO itself.

When doing many prefetches and handling reads in batches, this can significantly improve throughput.
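The scheme described above can be sketched roughly as follows. This is a hypothetical simplification, not the PR's actual code: `PrefetchSketch` and `BlockReader` are illustrative names, the block-backed reader stands in for the real `FileChannel` opened with `O_DIRECT`, and buffer recycling is elided.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the prefetch scheme: blocks the same size as the
// direct IO buffer are filled asynchronously on virtual threads, keyed by
// aligned file position, and consulted before a real read.
class PrefetchSketch {
  // Stands in for the FileChannel + O_DIRECT read in the real implementation.
  interface BlockReader {
    byte[] readBlock(long alignedPos);
  }

  private final int blockSize;
  private final int maxPrefetchBuffers;
  private final BlockReader reader;
  private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
  private final Map<Long, Future<byte[]>> pending = new ConcurrentHashMap<>();

  PrefetchSketch(int blockSize, int maxPrefetchBuffers, BlockReader reader) {
    this.blockSize = blockSize;
    this.maxPrefetchBuffers = maxPrefetchBuffers;
    this.reader = reader;
  }

  /** Start filling a buffer for the block containing {@code pos}, if a buffer is available. */
  void prefetch(long pos) {
    long aligned = pos - (pos % blockSize);
    if (pending.size() >= maxPrefetchBuffers) {
      return; // no free buffer: skip, read() will fall back to a direct read
    }
    pending.computeIfAbsent(aligned, p -> executor.submit(() -> reader.readBlock(p)));
  }

  /** Serve the block containing {@code pos} from a prefetched buffer if one matches, else read directly. */
  byte[] read(long pos) throws Exception {
    long aligned = pos - (pos % blockSize);
    Future<byte[]> f = pending.remove(aligned);
    return f != null ? f.get() : reader.readBlock(aligned);
  }
}
```

The key point is that `read()` only blocks on a prefetch that is still in flight; anything already completed is served immediately, which is where the batched-read throughput win comes from.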

Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions github-actions bot added this to the 11.0.0 milestone Sep 23, 2025
private final int prefetchBytesSize; // size of each prefetch buffer, matching the direct IO buffer size
private final Deque<Long> pendingPrefetches = new ArrayDeque<>(); // positions with a prefetch in flight
private final FileChannel channel;
private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor(); // one virtual thread per prefetch
Contributor
Is the executor something you would want to share within a directory, or potentially even across directories? I can't find any documentation that indicates that this pattern would be a problem.

@dungba88 (Contributor)

Do you have some numbers on the throughput change?

@mikemccand (Member)

This is neat -- if Lucene implements enough top-down prefetch hinting, it might eventually be that DirectIO, alone, is sufficient for good query latency/throughput? I.e. we could stop relying entirely on the OS to do its prefetching/caching (buffer cache), maybe, in very cold indices?

Isn't DirectIODirectory today only inserting itself for merge context reading & writing?

@benwtrent (Member, Author)

Isn't DirectIODirectory today only inserting itself for merge context reading & writing?

Correct, it's only used in certain scenarios. We are experimenting with using it in more areas (e.g. vector rescoring, to keep from polluting the off-heap cache with rescoring vectors).

it might eventually be that DirectIO, alone, is sufficient for good query latency/throughput?

It's not quite there yet. I have seen this improve throughput by more than 2x, though, depending on the read patterns. MMAP still has TONS of advantages (direct memory segment access being a HUGE one for vectors).

Virtual threads make this VERY easy, but I am sure there is a lot of headroom for improvements.

@benwtrent (Member, Author)

I also think that NIOFS could benefit from a prefetch implementation as well.

@mccullocht (Contributor)

If you used direct IO for everything, you would want to introduce an explicit disk cache somewhere; even with prefetching, I don't think performance would meet expectations for a lot of workloads if most reads resulted in a syscall.
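The explicit disk cache mentioned here could be as simple as an LRU map from aligned file position to block bytes, so repeated reads of a hot block don't each become a syscall. A minimal sketch, assuming a single-level cache and illustrative names (this is not a Lucene API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical LRU block cache: access-ordered LinkedHashMap that evicts the
// least recently used block once maxBlocks is exceeded.
class LruBlockCache {
  private final Map<Long, byte[]> cache;

  LruBlockCache(int maxBlocks) {
    // accessOrder=true makes iteration order reflect recency of access
    this.cache = new LinkedHashMap<>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxBlocks;
      }
    };
  }

  /** Return the cached block, or load it via {@code loader} (e.g. a direct read) and cache it. */
  synchronized byte[] get(long alignedPos, LongFunction<byte[]> loader) {
    byte[] block = cache.get(alignedPos);
    if (block == null) {
      block = loader.apply(alignedPos);
      cache.put(alignedPos, block); // put() runs the LRU eviction check
    }
    return block;
  }

  synchronized int size() {
    return cache.size();
  }
}
```

A real cache would also need to bound memory by bytes rather than block count and handle concurrent loads of the same block, but the shape of the idea is the same.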

@benwtrent (Member, Author)

If you used direct IO for everything, you would want to introduce an explicit disk cache somewhere; even with prefetching, I don't think performance would meet expectations for a lot of workloads if most reads resulted in a syscall.

100% agreed. I think we are a long way from making IO super cheap.

Again, MMAP has many benefits still. But virtual threads do make this way easier than it would have been before!

