This project performs sentiment analysis on tweets using a Logistic Regression classifier. It classifies tweets into Positive, Negative, or Neutral sentiments based on their text content.
The model is trained using Python in Google Colab and saved as a .sav file using pickle.
The project uses a labeled Twitter dataset containing tweets and their corresponding sentiment classes. Preprocessing steps include:
- Lowercasing text
- Removing URLs, mentions, and punctuation
- Removing stopwords
- Tokenization and vectorization using TfidfVectorizer
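
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `clean_tweet` helper, the regular expressions, the sample tweets, and the choice of scikit-learn's built-in English stopword list are all assumptions made for the example.

```python
import re
import string

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

# Assumed patterns for URLs and @mentions; the project may use different ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Hypothetical helper: lowercase, strip URLs/mentions/punctuation, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in ENGLISH_STOP_WORDS]
    return " ".join(tokens)


# Made-up sample tweets, for illustration only.
tweets = [
    "Loving the phone! @Apple https://t.co/abc",
    "Worst service ever, never again...",
    "The package arrived on Tuesday.",
]
cleaned = [clean_tweet(t) for t in tweets]

# TfidfVectorizer tokenizes the cleaned text and builds the TF-IDF matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)  # sparse matrix, one row per tweet
```

URLs and mentions are removed before punctuation here, since stripping punctuation first would break the patterns that detect them.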
- Algorithm: Logistic Regression (scikit-learn)
- Features: TF-IDF vectors from tweet text
- Target Labels: Positive, Negative, Neutral
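
As a rough sketch of how these pieces might fit together, the example below trains a Logistic Regression classifier on TF-IDF features and round-trips it through pickle as a .sav file. The toy tweets, the file name `sentiment_model.sav`, the `max_iter` value, and the decision to bundle the vectorizer and classifier into one scikit-learn pipeline are assumptions for illustration; the project may store them separately.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy data; the real project uses a labeled Twitter dataset.
texts = [
    "love this phone so much",
    "great service and friendly staff",
    "terrible battery and awful screen",
    "worst customer support ever",
    "the package arrived on tuesday",
    "the store opens at nine",
]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

# Pipeline: raw text -> TF-IDF vectors -> Logistic Regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Save the fitted pipeline with pickle (hypothetical file name).
model_path = os.path.join(tempfile.gettempdir(), "sentiment_model.sav")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back and classify a new tweet.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

prediction = loaded.predict(["great phone"])[0]
print(prediction)
```

Saving the whole pipeline keeps the fitted vocabulary together with the classifier, so new tweets are vectorized the same way as the training data. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.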
- NumPy: For numerical computing and array operations
- Pandas: For data manipulation and analysis
- Scikit-learn: For machine learning and statistical modeling
- Clone the repository
- Install the required dependencies:
pip install -r requirements.txt
- Python 3.8+
- See requirements.txt for specific package versions
import numpy as np
import pandas as pd
from sklearn import model_selection