Data Engineer with 2+ years of experience building scalable, secure, and high-performance data pipelines using Azure Data Factory, Databricks, PySpark, and SQL.
Experienced in healthcare and retail domains, delivering production-grade ETL pipelines and cloud-native data solutions.
Strong foundation in data engineering, cloud architecture, and large-scale data processing with hands-on experience in both Azure and AWS ecosystems.
Nov 2023 – Present | Pune, India
- Designed and deployed 5+ scalable ETL pipelines using Azure Data Factory and Databricks.
- Processed and managed 10M+ healthcare records, ensuring HIPAA-compliant data handling.
- Implemented data validation, audit logging, and secure ingestion pipelines.
- Optimized SQL transformations for large-scale datasets to improve query performance.
- Collaborated with cross-functional teams to automate scheduling and monitoring workflows.
Tech Stack: AWS S3, Glue, Athena, Lambda, IAM, Python
- Built a production-grade data lake using AWS free-tier services.
- Implemented raw → processed → curated data zones.
- Automated ETL using AWS Glue and optimized querying via Athena.
- Applied partitioning, schema evolution, and cost-optimized design.
- Designed for scalability, security, and real-world data engineering use cases.
Databricks | PySpark | SQL | Delta Lake
- Built an end-to-end ETL pipeline processing 10K+ retail records.
- Applied Delta Lake OPTIMIZE with ZORDER BY to improve query performance.
- Created analytical dashboards for sales and discount trends.
- Implemented CI/CD using GitHub and Databricks Jobs.
OpenCV | Python | SIFT
- Developed an image registration pipeline for large-scale medical images (10GB+).
- Implemented feature detection and alignment for improved visualization accuracy.
- Supported research-level workflows for medical imaging analysis.
Cloud & Big Data:
Azure Data Factory, Azure Databricks, Delta Lake, Azure Blob Storage, AWS S3, Glue, Athena
Programming & Tools:
Python, SQL, PySpark, Pandas, NumPy, Git, Linux
Concepts:
Data Engineering, Data Warehousing, ETL, Data Modeling, Cloud Architecture, DSA
- Databricks Certified Data Engineer – Associate
- Databricks Certified Data Engineer – Professional
- Python & Data Science Certifications (Kaggle, Coursera)
- AI for Medicine, Algorithms, Git & GitHub
Focused on building scalable, production-grade data systems.
