[feature request] Somehow exclude images/attachments when retrieving pages / chunking

**Is your feature request related to a problem? Please describe.**
I have to make my chunk sizes very large, or switch to full context with large page sizes, because our pages contain embedded PNGs. In the search results I can see strings such as "(data:image/png;base64,iVBORw0KGgoAAAANS" and so on.

These strings consume a lot of context and push useful results out. It would be nice to be able to filter them out.

**Describe the solution you'd like**
I'm not sure what the best approach would be.
Maybe there's a Confluence API that returns just the text? That would likely be ideal.
If not, maybe embeds could be stripped as the pages are pulled in?
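
To illustrate the second option, here is a minimal sketch of stripping inline base64 images from page text before it is chunked. It assumes the page content arrives as Markdown-like text in which images appear as `data:image/...;base64,...` URIs, as in the search result quoted above. The function name `strip_inline_images` and the `placeholder` parameter are hypothetical and not part of this project or of any Confluence API.

```python
import re

# Markdown image whose target is an inline data URI, e.g.
# ![alt](data:image/png;base64,iVBORw0...)
MD_DATA_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*data:image/[^)]*\)")

# Any remaining bare data URI with a base64 payload (for example inside
# an HTML src attribute). Whitespace is deliberately not matched, so the
# pattern stops at the end of the payload.
BARE_DATA_URI = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def strip_inline_images(text: str, placeholder: str = "") -> str:
    """Remove inline base64 images from text (hypothetical helper)."""
    text = MD_DATA_IMAGE.sub(placeholder, text)
    text = BARE_DATA_URI.sub(placeholder, text)
    return text


if __name__ == "__main__":
    page = "Intro ![diagram](data:image/png;base64,iVBORw0KGgoAAAANS) outro"
    print(strip_inline_images(page, placeholder="[image]"))
```

Passing a short placeholder such as `[image]` keeps a hint that an image was present while costing only a few tokens. Running this before chunking, rather than on the search results, would keep the base64 payload out of the embeddings as well.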

**Describe alternatives you've considered**
The workaround is full context with a large page size. RAG with very large chunk sizes and a fair amount of overlap sometimes works too. Either way, the context window of the response-generation model also needs to be large.

**Additional context**
I'm happy to help / answer any questions as needed. 

