GitHub - kelvinweijun/AI-Powered-Search-Engine: AI-powered search engine that uses FAISS and ResNet-50 for both text search and reverse image search. Comes with an asynchronous web crawler

Search Engine and Web Crawler

This project combines a depth-first web crawler with an AI-powered search engine that supports reverse image search. The crawler indexes the contents of web pages, including images, into a relational database. The search engine then lets users query that data with an image and returns similar images along with the associated page content.

Features

Web Crawler (`web_crawler.py`)

Implements a tree-based, depth-first traversal.
Uses BeautifulSoup for HTML parsing and scraping.
Extracts:
- Text snippets from each webpage.
- All downloadable images.
Stores data in a MariaDB database using SQLAlchemy ORM.
Avoids duplicate indexing and revisiting previously crawled links.
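
The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code in `web_crawler.py`: it uses the standard library's `HTMLParser` in place of BeautifulSoup so that it runs without extra dependencies, it returns plain dictionaries instead of writing to MariaDB, and the names `PageParser`, `crawl`, `fetch`, and the `SITE` dictionary are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class PageParser(HTMLParser):
    """Collects link targets, image sources, and text from one page."""

    def __init__(self):
        super().__init__()
        self.links, self.images, self.text = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(seed, fetch, max_pages=50):
    """Depth-first crawl from `seed`; `fetch(url)` returns HTML or None."""
    visited, records = set(), []
    stack = [seed]  # a LIFO stack gives depth-first order
    while stack and len(records) < max_pages:
        url = stack.pop()
        if url in visited:  # never revisit a page that was already crawled
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        records.append({
            "url": url,
            "snippet": " ".join(parser.text)[:200],
            "images": [urljoin(url, src) for src in parser.images],
        })
        # Push children in reverse so the first link on the page is explored first.
        for href in reversed(parser.links):
            child = urldefrag(urljoin(url, href)).url
            if child not in visited:
                stack.append(child)
    return records


# Hypothetical in-memory "site" standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<p>Home</p><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<p>Page A</p><img src="cat.png"><a href="/c">C</a>',
    "http://example.test/b": '<p>Page B</p><a href="/">back</a>',
    "http://example.test/c": "<p>Page C</p>",
}

pages = crawl("http://example.test/", SITE.get)
order = [page["url"] for page in pages]
```

In this sketch the crawler follows `/a` down to `/c` before it returns to `/b`, which is what makes it depth-first, and the link from `/b` back to the home page is skipped because that URL is already in `visited`.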

Search Engine (`search_engine.py`)

Performs both text-based search and reverse image search:
- Encodes images with ResNet-50 (via Torchvision).
- Uses FAISS for high-performance similarity search over the embedding vectors.
Returns the most visually similar images from the indexed set.
Provides metadata like the source webpage and content snippet for each match.
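
To make the retrieval step concrete, here is a small pure-Python sketch of exact nearest-neighbour search by squared L2 distance. This is the computation that FAISS's simplest index type, `IndexFlatL2`, performs; whether `search_engine.py` uses that index type is an assumption. The toy three-dimensional vectors stand in for ResNet-50 features (which have 2048 dimensions), and the `search` function, file names, and URLs are hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(index, query, k=3):
    """Return the k entries closest to `query` as (distance, metadata) pairs."""
    scored = ((squared_l2(vec, query), meta) for vec, meta in index)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


# Each entry pairs an embedding with the metadata returned for a match:
# the image itself, its source page, and a content snippet.
index = [
    ([0.9, 0.1, 0.0], {"image": "cat1.jpg", "page": "http://example.test/cats", "snippet": "A tabby cat"}),
    ([0.8, 0.2, 0.1], {"image": "cat2.jpg", "page": "http://example.test/cats", "snippet": "A black cat"}),
    ([0.1, 0.9, 0.0], {"image": "dog.jpg", "page": "http://example.test/dogs", "snippet": "A beagle"}),
    ([0.0, 0.1, 0.9], {"image": "car.jpg", "page": "http://example.test/cars", "snippet": "A red car"}),
]

query = [1.0, 0.0, 0.0]  # embedding of the uploaded query image
matches = search(index, query, k=2)
match_names = [meta["image"] for _, meta in matches]
```

A real index replaces the linear scan with FAISS's optimized routines, but the contract is the same: embed the query, find the closest stored vectors, and look up the metadata attached to each hit.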

Setup Instructions

1. Clone the Repository

git clone https://github.com/kelvinweijun/AI-Powered-Search-Engine.git
cd AI-Powered-Search-Engine

2. Install Dependencies

pip install -r requirements.txt

3. Set Up MariaDB

Make sure MariaDB is installed, along with a database management tool (preferably HeidiSQL or TablePlus). Set up a database connection with the following credentials:

host: 127.0.0.1

user: root

password: root

Then create a new database named "search_engine".
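
For reference, these settings map onto a SQLAlchemy connection URL roughly as shown below. This is a sketch under assumptions: the `mysql+pymysql` driver prefix is a guess at what the project uses (check `requirements.txt` and `web_crawler.py` for the actual driver), and `database_url` is a hypothetical helper.

```python
from urllib.parse import quote_plus

# Credentials from the setup step above.
DB_USER = "root"
DB_PASSWORD = "root"
DB_HOST = "127.0.0.1"
DB_NAME = "search_engine"

# Assumption: the PyMySQL driver; the project may use a different one.
DRIVER = "mysql+pymysql"


def database_url(user=DB_USER, password=DB_PASSWORD, host=DB_HOST,
                 name=DB_NAME, driver=DRIVER):
    """Build a SQLAlchemy-style URL: driver://user:password@host/database."""
    # quote_plus escapes characters such as "@" or "/" in the credentials.
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}/{name}"


url = database_url()
```

The resulting string is what would be passed to `sqlalchemy.create_engine`. If you change the credentials in MariaDB, the matching values in the project's scripts need to change as well.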

4. Set seed link in `web_crawler.py`

Open web_crawler.py and set the seed link to the URL where the crawl should start.

5. Run `web_crawler.py`

Run web_crawler.py to begin crawling the pages.

6. Run `search_engine.py`

Run search_engine.py to start the search engine. Startup may take a while because the indexed content is being embedded. Once it finishes, open the localhost address in your browser and the search page should load.

