Description
Is your feature request related to a problem? Please describe.
I have to make my chunk sizes really large, or switch to full context with big page sizes, because our pages have PNGs embedded in them. In the search results I can see strings like "(data:image/png;base64,iVBORw0KGgoAAAANS" etc.
These strings eat up a ton of context and push useful results out. It would be nice if we could filter them somehow.
Describe the solution you'd like
I'm not sure what the best approach would be:
Maybe there's a Confluence API that will return just text? That would likely be ideal.
If not, maybe there's a way to strip embeds as the pages get pulled in?
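For the strip-on-ingest option, here is a rough sketch of what I have in mind, assuming the page body is available as a plain string before it gets chunked. The function name, the placeholder, and the regex are all hypothetical and not taken from this project's code; the pattern only covers base64-encoded `data:image/...` URIs like the one quoted above.

```python
import re

# Hypothetical pattern: matches inline base64 image data URIs such as
# data:image/png;base64,iVBORw0KGgo... (image/* MIME types only).
DATA_URI_RE = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def strip_image_embeds(text: str, placeholder: str = "[image]") -> str:
    """Replace inline base64 image data URIs with a short placeholder."""
    return DATA_URI_RE.sub(placeholder, text)


if __name__ == "__main__":
    sample = "See diagram (data:image/png;base64,iVBORw0KGgoAAAANS) for details."
    print(strip_image_embeds(sample))
```

Keeping a short placeholder rather than deleting the embed outright would preserve the hint that an image was there, while costing only a few tokens per image.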
Describe alternatives you've considered
The workaround is full context with a big page size. RAG with really big chunk sizes and a fair amount of overlap sometimes works, too. Either way, the context window of the response generation model also needs to be big.
Additional context
I'm happy to help / answer any questions as needed.