This project is designed to validate Indian PAN numbers using MySQL Workbench.
It includes SQL scripts to clean, validate, and categorize PAN numbers as Valid or Invalid based on multiple business rules and regex patterns.
The Permanent Account Number (PAN) is a 10-character alphanumeric identifier issued by the Indian Income Tax Department.
To ensure correctness of data, this project:
- Cleans raw PAN number datasets
- Identifies missing, duplicate, or incorrectly formatted PANs
- Uses custom MySQL functions to detect invalid patterns (e.g., sequential/repeated characters)
- Classifies PAN numbers into Valid and Invalid
- Generates a summary report for quick insights
- Data cleaning (trimming, uppercase conversion, duplicate removal)
- Validation rules (regex-based format checks)
- Custom MySQL functions for:
- Detecting adjacent character repetition
- Detecting sequential characters
- Creation of a validation view for quick results
- Summary report with counts of valid, invalid, and missing PANs
The project assumes an input dataset with one column:
| Column Name | Description |
|---|---|
pan_number |
Raw PAN numbers (string) |
Example:
| pan_number |
|---|
| ABCDE1234F |
| xyz 1234p |
| AA1111AA1A |
| (NULL) |
- Handle missing data → Remove NULL or empty PAN numbers
- Check duplicates → Identify duplicate PANs
- Trim spaces → Remove leading/trailing spaces
- Correct case → Convert to UPPERCASE
- Create cleaned table → Store cleaned dataset
PAN must match:
^[A-Z]{5}[0-9]{4}[A-Z]$
- fn_check_adjacent_repetition() → Ensures no two adjacent characters are the same
- fn_check_sequence() → Detects sequential patterns like
ABCDE,1234
- Valid PAN → Matches regex + no repetition + no sequences
- Invalid PAN → Fails any rule
| pan_number | status |
|---|---|
| ABCDE1234F | Valid PAN |
| AAAAA1111A | Invalid PAN |
| XYZ1234P | Invalid PAN |
| total_processed_records | total_valid_pans | total_invalid_pans | missing_incomplete_pans |
|---|---|---|---|
| 10 | 7 | 2 | 1 |
- Clone this repository:
git clone https://github.com/MasterMindRomii/PAN-Number-Validation-Project-using-MySQL.git
🤝 Contribution
Pull requests are welcome! If you find issues or want to add new validation rules, feel free to fork and contribute.
📜 License
This project is licensed under the MIT License.
👨💻 Author
Romi Gupta 💼 Data Analyst | SQL | Power BI | Python | Excel 📧 romigupta1875@gmail.com