
Hi πŸ‘‹, I'm Vipul Anand

Data Engineer | Cloud & Big Data | Azure | AWS


πŸ‘¨β€πŸ’» About Me

Data Engineer with 2+ years of experience building scalable, secure, and high-performance data pipelines using Azure Data Factory, Databricks, PySpark, and SQL.
Experienced in healthcare and retail domains, delivering production-grade ETL pipelines and cloud-native data solutions.

Strong foundation in data engineering, cloud architecture, and large-scale data processing with hands-on experience in both Azure and AWS ecosystems.


πŸ’Ό Professional Experience

Accenture β€” Data Engineer

Nov 2023 – Present | Pune, India

  • Designed and deployed 5+ scalable ETL pipelines using Azure Data Factory and Databricks.
  • Processed and managed 10M+ healthcare records, ensuring HIPAA-compliant data handling.
  • Implemented data validation, audit logging, and secure ingestion pipelines.
  • Optimized SQL transformations for large-scale datasets to improve query performance.
  • Collaborated with cross-functional teams to automate scheduling and monitoring workflows.
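The validation-and-audit pattern from the bullets above can be sketched in plain Python. This is an illustrative toy only, not the actual pipeline code: the field names (`record_id`, `patient_age`), the age bounds, and the audit-entry shape are all hypothetical stand-ins.

```python
from datetime import datetime, timezone

# Hypothetical schema for the sketch; the real pipelines use different fields.
REQUIRED_FIELDS = {"record_id", "patient_age"}


def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    age = record.get("patient_age")
    if isinstance(age, int) and not (0 <= age <= 130):
        errors.append(f"patient_age out of range: {age}")
    return errors


def ingest(records: list[dict], audit_log: list[dict]) -> list[dict]:
    """Keep valid records; append one audit entry per input record."""
    accepted = []
    for rec in records:
        errors = validate_record(rec)
        audit_log.append({
            "record_id": rec.get("record_id"),
            "ts": datetime.now(timezone.utc).isoformat(),
            "status": "accepted" if not errors else "rejected",
            "errors": errors,
        })
        if not errors:
            accepted.append(rec)
    return accepted
```

Keeping the audit entry for rejected records as well as accepted ones is what makes the ingestion traceable end to end.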

πŸš€ Key Projects

AWS Production Data Lake (Latest Project)

Tech Stack: AWS S3, Glue, Athena, Lambda, IAM, Python

  • Built a production-grade data lake using AWS free-tier services.
  • Implemented raw β†’ processed β†’ curated data zones.
  • Automated ETL using AWS Glue and optimized querying via Athena.
  • Applied partitioning, schema evolution, and cost-optimized design.
  • Designed for scalability, security, and real-world data engineering use cases.
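The zone-plus-partitioning layout above can be sketched as a key-building helper. This is a minimal illustration under assumed names: the bucket, dataset, and file names are hypothetical, and only the Hive-style `year=/month=/day=` convention (which Glue crawlers and Athena recognize as partitions) is the real point.

```python
from datetime import date

ZONES = ("raw", "processed", "curated")  # the three data-lake zones


def partition_key(zone: str, dataset: str, d: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key, e.g.
    processed/claims/year=2024/month=03/day=05/part-000.parquet.
    Athena can then prune partitions on year/month/day filters."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (f"{zone}/{dataset}/year={d.year}/month={d.month:02d}/"
            f"day={d.day:02d}/{filename}")
```

Zero-padding the month and day keeps lexicographic ordering of keys consistent with chronological ordering, which makes prefix listings predictable.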

Retail Data Engineering Pipeline

Databricks | PySpark | SQL | Delta Lake

  • Built an end-to-end ETL pipeline processing 10K+ retail records.
  • Applied Delta Lake OPTIMIZE with Z-ORDER clustering to improve query performance.
  • Created analytical dashboards for sales and discount trends.
  • Implemented CI/CD using GitHub and Databricks Jobs.
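Delta Lake's `OPTIMIZE ... ZORDER BY (col1, col2)` clusters files along a Z-order (Morton) curve. The toy below shows the underlying idea in plain Python by interleaving the bits of two values; Delta's actual implementation works on range-partitioned column values rather than raw integers, and the column names here (`store_id`, `product_id`) are hypothetical.

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one sort key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x supplies the even bits
        key |= ((y >> i) & 1) << (2 * i + 1)  # y supplies the odd bits
    return key


# Sorting rows by the interleaved key groups rows that are close in *both*
# dimensions, so per-file min/max statistics can skip more files when a
# query filters on both columns.
rows = [(3, 1), (0, 0), (1, 1), (2, 3)]  # (store_id, product_id)
rows.sort(key=lambda r: morton_key(*r))
```

This is why Z-ordering helps multi-column filters where a plain sort on one column would not.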

Medical Image Registration System

OpenCV | Python | SIFT

  • Developed an image registration pipeline for large-scale medical images (10GB+).
  • Implemented feature detection and alignment for improved visualization accuracy.
  • Supported research-level workflows for medical imaging analysis.
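Registration means estimating the transform that best aligns one image to a reference; the project above did this in 2-D with OpenCV SIFT keypoints. As a self-contained stand-in, the 1-D toy below estimates a pure translation by maximizing cross-correlation over candidate shifts — the same "search for the best-fitting transform" idea, with all signal data invented for the example.

```python
def estimate_shift(reference: list[float], moving: list[float],
                   max_shift: int) -> int:
    """Return the shift s that best aligns moving[i + s] with reference[i]."""
    def score(s: int) -> float:
        # Cross-correlation of the overlapping samples at shift s.
        return sum(reference[i] * moving[i + s]
                   for i in range(len(reference))
                   if 0 <= i + s < len(moving))
    return max(range(-max_shift, max_shift + 1), key=score)


# A spike at index 5 in the reference appears at index 7 in the moving
# signal, so the best alignment shifts the moving signal by +2.
ref = [0.0] * 10
ref[5] = 1.0
mov = [0.0] * 10
mov[7] = 1.0
shift = estimate_shift(ref, mov, max_shift=4)
```

In the 2-D SIFT case the "score" is replaced by keypoint matches and the single shift by a geometric transform (e.g. a homography), but the optimization structure is the same.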

πŸ› οΈ Technical Skills

Cloud & Big Data:
Azure Data Factory, Azure Databricks, Delta Lake, Azure Blob Storage, AWS S3, Glue, Athena

Programming & Tools:
Python, SQL, PySpark, Pandas, NumPy, Git, Linux

Concepts:
Data Engineering, Data Warehousing, ETL, Data Modeling, Cloud Architecture, DSA


πŸ“œ Certifications

  • Databricks Certified Data Engineer – Associate
  • Databricks Certified Data Engineer – Professional
  • Python & Data Science Certifications (Kaggle, Coursera)
  • AI for Medicine, Algorithms, Git & GitHub

🀝 Connect With Me


⭐ Focused on building scalable, production-grade data systems.
