-
Notifications
You must be signed in to change notification settings - Fork 13
Open
Description
Description
Create a RAG-based system that retrieves relevant text and images in response to a user query. The model should support multi-modal retrieval and generation.
Features
- Process both text and images for retrieval.
- Use separate embeddings for text and images.
- Implement multi-modal query handling.
- Generate responses combining both modalities.
Tech Stack (Suggestions, not limited to)
- Image Processing: CLIP, OpenAI Vision, Google Vision API
- Text Embedding Models: OpenAI, SBERT, Cohere
- Vector Database: FAISS, Weaviate, Pinecone
- Backend: Python (FastAPI, Flask)