Debargha Mitra Roy, Arpan Pramanick, Rounak Koner, Unnati Mishra
National Institute of Technology, Durgapur
Please refer to the Problem Statement to gain a clear and comprehensive understanding of the problem. This problem was featured in the Amazon ML Challenge 2025.
Our solution uses a multimodal machine learning model that combines textual and visual product data to estimate prices. With SentenceTransformer-based text embeddings, ResNet50 visual representations, and a LightGBM regression model, we capture the semantic and aesthetic indicators that affect price. This hybrid approach is intended to provide better interpretability, scalability, and predictive accuracy across different product categories.
The problem aimed to predict optimal product prices using multimodal data - combining textual catalog descriptions and product images. The dataset contained missing text fields, which were filled with empty strings, and all text was standardized to lowercase. Exploratory analysis revealed that textual content encoded product category, brand, and quality cues, while visual data offered complementary insights like color, texture, and design. Together, these modalities provide a holistic representation of the product, making a multimodal strategy ideal.
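The text cleaning described above (missing fields become empty strings, everything is lowercased) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `clean_text`, the whitespace stripping, and the sample entries are assumptions made for the example.

```python
import math


def clean_text(value):
    """Return a lowercase string; missing values become an empty string."""
    # Treat None and float NaN (the usual marker for a missing cell) as missing.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    # Stripping surrounding whitespace is an assumption, not stated in the text.
    return str(value).strip().lower()


# Hypothetical catalog entries, including two missing ones.
raw = ["Organic Green Tea, 100 Bags", None, float("nan"), "  STAINLESS Steel Bottle "]
cleaned = [clean_text(v) for v in raw]
```

Every entry in `cleaned` is a plain string, so the downstream encoder never receives a missing value.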
Key Observations:
The fusion of textual and visual features significantly improved price prediction accuracy compared to single-modality models. Text embeddings captured semantic cues like brand and quality, while image features added visual context, resulting in a balanced and robust multimodal pricing model.
We propose a multimodal machine learning model that combines textual and visual product data to estimate prices. SentenceTransformer text embeddings and ResNet50 visual representations are fed to a LightGBM regression model, so that both semantic and aesthetic price indicators are used. The hybrid strategy is intended to offer better interpretability, scalability, and predictive accuracy across product categories.
Approach Type: Hybrid Multimodal Model
Core Innovation: Fusion of SentenceTransformer text embeddings and ResNet50 visual features, trained with a LightGBM Regressor for accurate price prediction.
Workflow Summary:
- Text preprocessing and embedding generation using SentenceTransformer.
- Image feature extraction using pre-trained ResNet50.
- Fusion of multimodal embeddings.
- Training LightGBM on combined features.
- Model evaluation using MAE, RMSE, and R² metrics.
The model comprises three major components:
- Text Processing Pipeline: Encodes product descriptions into semantic vectors.
- Image Processing Pipeline: Extracts high-level visual embeddings.
- Fusion + Regression Layer: Concatenates both embeddings and passes them to LightGBM for final price prediction.
Text Processing Pipeline:
- Preprocessing steps: Lowercasing, missing value handling, cleaning.
- Model type: SentenceTransformer(all-MiniLM-L6-v2) ($384$-dim embeddings).
- Key parameters: Dense vector per sentence capturing context and brand-specific meaning.
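A minimal sketch of the text pipeline, assuming the `sentence-transformers` package is installed. The helper names `load_encoder` and `embed_texts` and the batch size are illustrative choices, not taken from the original code. The library is imported lazily, so the helpers can be defined without it.

```python
def load_encoder(model_name="all-MiniLM-L6-v2"):
    """Load the sentence encoder (requires the sentence-transformers package)."""
    from sentence_transformers import SentenceTransformer  # lazy import

    return SentenceTransformer(model_name)


def embed_texts(texts, encoder, batch_size=64):
    """Lowercase the texts, map missing entries to "", and encode them.

    `encoder` is any object with a SentenceTransformer-style `encode` method.
    With all-MiniLM-L6-v2 the result has one 384-dimensional row per text.
    """
    # Anything that is not a string (None, NaN) is treated as missing.
    cleaned = [t.lower() if isinstance(t, str) else "" for t in texts]
    return encoder.encode(cleaned, batch_size=batch_size, show_progress_bar=False)
```

A typical call would be `embed_texts(descriptions, load_encoder())`, where `descriptions` is a hypothetical list of catalog strings.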
Image Processing Pipeline:
- Preprocessing steps: Resize $(224×224)$, normalization via preprocess_input().
- Model type: Pre-trained ResNet50 (without top layers).
- Key parameters: $2048$-dim feature vector from global average pooling.
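The image pipeline could look roughly like the following. It assumes the Keras implementation of ResNet50 bundled with TensorFlow, which the reference to preprocess_input() suggests but the text does not confirm. The helper names and constants are illustrative.

```python
TARGET_SIZE = (224, 224)  # input resolution stated above
FEATURE_DIM = 2048        # length of the globally average-pooled ResNet50 output


def build_extractor():
    """ResNet50 without its classification head, with global average pooling."""
    import tensorflow as tf  # lazy import; requires TensorFlow

    return tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, pooling="avg"
    )


def extract_features(image_paths, extractor):
    """Return an array of shape (len(image_paths), FEATURE_DIM)."""
    import numpy as np
    import tensorflow as tf

    # Load each image as RGB, resized to the network's input resolution.
    batch = np.stack([
        tf.keras.utils.img_to_array(
            tf.keras.utils.load_img(path, target_size=TARGET_SIZE)
        )
        for path in image_paths
    ])
    # Apply the ResNet50-specific input normalization.
    batch = tf.keras.applications.resnet50.preprocess_input(batch)
    return extractor.predict(batch, verbose=0)
```

For a large catalog the paths would normally be processed in chunks so that a whole batch fits in memory. The chunk size is a tuning choice the text does not specify.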
Fusion + Regression:
- Fusion: Features concatenated using scipy.sparse.hstack().
- Model type: LGBMRegressor (n_estimators=1000, learning_rate=0.05).
- Objective: Regression on the price variable.
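A hedged sketch of the fusion and regression step, assuming `scipy` and `lightgbm` are installed. Only the two hyperparameters listed above come from the text. Everything else is left at library defaults, and the helper names are illustrative.

```python
# Hyperparameters stated in the text; all others stay at LightGBM defaults.
LGBM_PARAMS = {"n_estimators": 1000, "learning_rate": 0.05}


def fuse_features(text_emb, image_emb):
    """Concatenate per-product text and image features column-wise."""
    from scipy.sparse import csr_matrix, hstack  # lazy import

    # Both inputs must have one row per product, in the same order.
    return hstack([csr_matrix(text_emb), csr_matrix(image_emb)], format="csr")


def train_regressor(features, prices):
    """Fit a LightGBM regressor on the fused features."""
    from lightgbm import LGBMRegressor  # lazy import

    model = LGBMRegressor(**LGBM_PARAMS)
    model.fit(features, prices)
    return model
```

With the dimensions given above, each fused row has $384 + 2048 = 2432$ columns.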
- SMAPE Score: $22.85\%$
- Other Metrics: Calculated $R^{2}$, MAE, RMSE.
  - $R^{2}$: $0.58$
  - MAE: $0.58$
  - RMSE: $0.58$
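For reference, the four metrics can be computed as follows. The SMAPE formula here is the common symmetric definition, with the mean of the absolute true and predicted values in the denominator. That it matches the challenge's exact scoring is an assumption. The sample prices are made up.

```python
import math


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        # A pair of exact zeros contributes no error.
        terms.append(0.0 if denom == 0 else abs(p - t) / denom)
    return 100 * sum(terms) / len(terms)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices, for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
scores = {
    "smape": smape(actual, predicted),
    "mae": mae(actual, predicted),
    "rmse": rmse(actual, predicted),
    "r2": r2(actual, predicted),
}
```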
The proposed Smart Product Pricing System combines language and vision models to predict product prices. By joining SentenceTransformer-based textual understanding, ResNet50-based visual representation, and LightGBM regression, the model is expected to generalize well and remain interpretable.
This hybrid methodology is scalable and adaptable, and offers a solid foundation for automated pricing across e-commerce platforms.
Code Artefacts | Links |
---|---|
Kaggle Dataset Download Link | |
Uploaded Code Google Drive Link | |