
Hi πŸ‘‹, I'm Vipul Anand

Data Engineer | Cloud & Big Data | Azure | AWS


πŸ‘¨β€πŸ’» About Me

Data Engineer with 2+ years of experience building scalable, secure, and high-performance data pipelines using Azure Data Factory, Databricks, PySpark, and SQL.
Experienced in healthcare and retail domains, delivering production-grade ETL pipelines and cloud-native data solutions.

Strong foundation in data engineering, cloud architecture, and large-scale data processing with hands-on experience in both Azure and AWS ecosystems.


πŸ’Ό Professional Experience

Accenture β€” Data Engineer

Nov 2023 – Present | Pune, India

  • Designed and deployed 5+ scalable ETL pipelines using Azure Data Factory and Databricks.
  • Processed and managed 10M+ healthcare records, ensuring HIPAA-compliant data handling.
  • Implemented data validation, audit logging, and secure ingestion pipelines.
  • Optimized SQL transformations for large-scale datasets to improve query performance.
  • Collaborated with cross-functional teams to automate scheduling and monitoring workflows.
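The validation-and-audit pattern from the bullets above can be sketched in plain Python. This is an illustrative toy only, not the actual pipeline code: the field names (`record_id`, `patient_age`), the age bounds, and the audit-entry shape are all hypothetical stand-ins.

```python
from datetime import datetime, timezone

# Hypothetical schema for the sketch; the real pipelines use different fields.
REQUIRED_FIELDS = {"record_id", "patient_age"}


def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    age = record.get("patient_age")
    if isinstance(age, int) and not (0 <= age <= 130):
        errors.append(f"patient_age out of range: {age}")
    return errors


def ingest(records: list[dict], audit_log: list[dict]) -> list[dict]:
    """Keep valid records; append one audit entry per input record."""
    accepted = []
    for rec in records:
        errors = validate_record(rec)
        audit_log.append({
            "record_id": rec.get("record_id"),
            "ts": datetime.now(timezone.utc).isoformat(),
            "status": "accepted" if not errors else "rejected",
            "errors": errors,
        })
        if not errors:
            accepted.append(rec)
    return accepted
```

Keeping the audit entry for rejected records as well as accepted ones is what makes the ingestion traceable end to end.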

πŸš€ Key Projects

AWS Production Data Lake (Latest Project)

Tech Stack: AWS S3, Glue, Athena, Lambda, IAM, Python

  • Built a production-grade data lake using AWS free-tier services.
  • Implemented raw β†’ processed β†’ curated data zones.
  • Automated ETL using AWS Glue and optimized querying via Athena.
  • Applied partitioning, schema evolution, and cost-optimized design.
  • Designed for scalability, security, and real-world data engineering use cases.
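The zone-plus-partitioning layout above can be sketched as a key-building helper. This is a minimal illustration under assumed names: the bucket, dataset, and file names are hypothetical, and only the Hive-style `year=/month=/day=` convention (which Glue crawlers and Athena recognize as partitions) is the real point.

```python
from datetime import date

ZONES = ("raw", "processed", "curated")  # the three data-lake zones


def partition_key(zone: str, dataset: str, d: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key, e.g.
    processed/claims/year=2024/month=03/day=05/part-000.parquet.
    Athena can then prune partitions on year/month/day filters."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (f"{zone}/{dataset}/year={d.year}/month={d.month:02d}/"
            f"day={d.day:02d}/{filename}")
```

Zero-padding the month and day keeps lexicographic ordering of keys consistent with chronological ordering, which makes prefix listings predictable.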

Retail Data Engineering Pipeline

Databricks | PySpark | SQL | Delta Lake

  • Built an end-to-end ETL pipeline processing 10K+ retail records.
  • Applied Delta Lake OPTIMIZE with Z-ORDER clustering to improve query performance.
  • Created analytical dashboards for sales and discount trends.
  • Implemented CI/CD using GitHub and Databricks Jobs.
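Delta Lake's `OPTIMIZE ... ZORDER BY (col1, col2)` clusters files along a Z-order (Morton) curve. The toy below shows the underlying idea in plain Python by interleaving the bits of two values; Delta's actual implementation works on range-partitioned column values rather than raw integers, and the column names here (`store_id`, `product_id`) are hypothetical.

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one sort key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x supplies the even bits
        key |= ((y >> i) & 1) << (2 * i + 1)  # y supplies the odd bits
    return key


# Sorting rows by the interleaved key groups rows that are close in *both*
# dimensions, so per-file min/max statistics can skip more files when a
# query filters on both columns.
rows = [(3, 1), (0, 0), (1, 1), (2, 3)]  # (store_id, product_id)
rows.sort(key=lambda r: morton_key(*r))
```

This is why Z-ordering helps multi-column filters where a plain sort on one column would not.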

Medical Image Registration System

OpenCV | Python | SIFT

  • Developed an image registration pipeline for large-scale medical images (10GB+).
  • Implemented feature detection and alignment for improved visualization accuracy.
  • Supported research-level workflows for medical imaging analysis.
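Registration means estimating the transform that best aligns one image to a reference; the project above did this in 2-D with OpenCV SIFT keypoints. As a self-contained stand-in, the 1-D toy below estimates a pure translation by maximizing cross-correlation over candidate shifts — the same "search for the best-fitting transform" idea, with all signal data invented for the example.

```python
def estimate_shift(reference: list[float], moving: list[float],
                   max_shift: int) -> int:
    """Return the shift s that best aligns moving[i + s] with reference[i]."""
    def score(s: int) -> float:
        # Cross-correlation of the overlapping samples at shift s.
        return sum(reference[i] * moving[i + s]
                   for i in range(len(reference))
                   if 0 <= i + s < len(moving))
    return max(range(-max_shift, max_shift + 1), key=score)


# A spike at index 5 in the reference appears at index 7 in the moving
# signal, so the best alignment shifts the moving signal by +2.
ref = [0.0] * 10
ref[5] = 1.0
mov = [0.0] * 10
mov[7] = 1.0
shift = estimate_shift(ref, mov, max_shift=4)
```

In the 2-D SIFT case the "score" is replaced by keypoint matches and the single shift by a geometric transform (e.g. a homography), but the optimization structure is the same.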

πŸ› οΈ Technical Skills

Cloud & Big Data:
Azure Data Factory, Azure Databricks, Delta Lake, Azure Blob Storage, AWS S3, Glue, Athena

Programming & Tools:
Python, SQL, PySpark, Pandas, NumPy, Git, Linux

Concepts:
Data Engineering, Data Warehousing, ETL, Data Modeling, Cloud Architecture, DSA


πŸ“œ Certifications

  • Databricks Certified Data Engineer – Associate
  • Databricks Certified Data Engineer – Professional
  • Python & Data Science Certifications (Kaggle, Coursera)
  • AI for Medicine, Algorithms, Git & GitHub

🀝 Connect With Me


⭐ Focused on building scalable, production-grade data systems.
