Skip to content

This project is designed to validate Indian PAN numbers using MySQL Workbench. It includes SQL scripts to clean, validate, and categorize PAN numbers as Valid or Invalid based on multiple business rules and regex patterns.

License

Notifications You must be signed in to change notification settings

MasterMindRomii/PAN-Number-Validation-Project-using-MySQL

Repository files navigation

🪪 PAN Number Validation Project using MySQL

This project is designed to validate Indian PAN numbers using MySQL Workbench.
It includes SQL scripts to clean, validate, and categorize PAN numbers as Valid or Invalid based on multiple business rules and regex patterns.


📌 Project Overview

The Permanent Account Number (PAN) is a 10-character alphanumeric identifier issued by the Indian Income Tax Department.
To ensure correctness of data, this project:

  • Cleans raw PAN number datasets
  • Identifies missing, duplicate, or incorrectly formatted PANs
  • Uses custom MySQL functions to detect invalid patterns (e.g., sequential/repeated characters)
  • Classifies PAN numbers into Valid and Invalid
  • Generates a summary report for quick insights

🛠️ Features

  • Data cleaning (trimming, uppercase conversion, duplicate removal)
  • Validation rules (regex-based format checks)
  • Custom MySQL functions for:
  • Detecting adjacent character repetition
  • Detecting sequential characters
  • Creation of a validation view for quick results
  • Summary report with counts of valid, invalid, and missing PANs

📂 Dataset

The project assumes an input dataset with one column:

Column Name Description
pan_number Raw PAN numbers (string)

Example:

pan_number
ABCDE1234F
xyz 1234p
AA1111AA1A
(NULL)

🧹 Data Cleaning Steps

  1. Handle missing data → Remove NULL or empty PAN numbers
  2. Check duplicates → Identify duplicate PANs
  3. Trim spaces → Remove leading/trailing spaces
  4. Correct case → Convert to UPPERCASE
  5. Create cleaned table → Store cleaned dataset

🔎 Validation Logic

1️⃣ Regex Format Rule

PAN must match:

^[A-Z]{5}[0-9]{4}[A-Z]$

2️⃣ Custom Functions

  • fn_check_adjacent_repetition() → Ensures no two adjacent characters are the same
  • fn_check_sequence() → Detects sequential patterns like ABCDE, 1234

3️⃣ Final Categorization

  • Valid PAN → Matches regex + no repetition + no sequences
  • Invalid PAN → Fails any rule

📊 Outputs

View: vw_valid_invalid_pans

pan_number status
ABCDE1234F Valid PAN
AAAAA1111A Invalid PAN
XYZ1234P Invalid PAN

📈 Summary Report Example

total_processed_records total_valid_pans total_invalid_pans missing_incomplete_pans
10 7 2 1

🚀 Usage Guide

  1. Clone this repository:
    git clone https://github.com/MasterMindRomii/PAN-Number-Validation-Project-using-MySQL.git
    

🤝 Contribution

Pull requests are welcome! If you find issues or want to add new validation rules, feel free to fork and contribute.

📜 License

This project is licensed under the MIT License.

👨‍💻 Author

Romi Gupta 💼 Data Analyst | SQL | Power BI | Python | Excel 📧 romigupta1875@gmail.com

About

This project is designed to validate Indian PAN numbers using MySQL Workbench. It includes SQL scripts to clean, validate, and categorize PAN numbers as Valid or Invalid based on multiple business rules and regex patterns.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published