- Designing Data-Intensive Applications
- Fundamentals of Data Engineering
- The Data Warehouse Toolkit
- Cracking the Data Engineering Interview
- Data Engineering with Python
- Data Pipelines with Apache Airflow
- Big Data: Principles and Best Practices of Scalable Real-Time Data Systems
- Basic Skills: Linux, Git & GitHub, Computer Networking, Cloud Computing, Network & Security, Agile Development
- Advanced Skills (Good to Know): Data Lake & Data Warehouse Concepts, REST APIs, Databases (SQL & NoSQL)
- Programming Languages: Python, SQL, Java, Scala
- Databases: PostgreSQL, MongoDB, Neo4j, Redis, Cassandra, Apache HBase, Snowflake, InfluxDB
- Data Ingestion: Apache Kafka, Flume, Logstash, Airbyte, Apache Spark, Talend, Informatica
- Data Transformation: Python, Pandas, SQL, Apache Spark, Hive, dbt, Matillion, Pig
- Data Preprocessing: Apache Spark, Apache Hadoop, Apache Flink
- Data Orchestration: Apache Airflow, Luigi
- Data Storage:
  - Data Lake: AWS S3, Azure Blob Storage, Google Cloud Storage
  - Data Warehouse: Snowflake, Google BigQuery, Amazon Redshift, Apache Hive
- Data Visualization: Tableau, Power BI, Looker
- DataOps: Docker, Kubernetes, Jenkins
- 🐍 Python
- 📊 SQL
- 🛠️ MySQL
- 🌳 MongoDB
- 🔥 PySpark
- 🎈 Bash
- 🌬️ Airflow
- ☕ Apache Kafka
- 🐙 Git
- 🐈 GitHub
- ⚙️ CI/CD basics
- 🏬 Data Warehousing
- 🛠️ dbt
- 🌊 Data Lakes
- 📘 Databricks
- ☁️ Azure Databricks
- ❄️ Snowflake
- 🌪️ Apache NiFi
- 🌐 Debezium
- Master Python: https://lnkd.in/d-pZPyf5
- Learn SQL: https://lnkd.in/dzAiRF-x
- Get hands-on with MySQL: https://lnkd.in/ddpSkUhc
- Dive into MongoDB: https://lnkd.in/dHQ4VC2E
- Master PySpark: https://lnkd.in/d7fgs7dE
- Discover Bash, Airflow & Kafka: https://lnkd.in/dDhuEqQE
- Master Git & GitHub: https://lnkd.in/dqJ7J3kN
- Understand CI/CD basics: https://lnkd.in/dcfKBmCa
- Decode Data Warehousing: https://lnkd.in/dPVRDJT5
- Learn dbt: https://lnkd.in/eG9eaEuE
- Understand Data Lakes: https://lnkd.in/dtZKJ4d6
- Explore Databricks: https://lnkd.in/dCBiQXPR
- Learn Azure Databricks: https://lnkd.in/dzmwBs4Y
- Master Snowflake: https://lnkd.in/dDBeddVy
- Explore Apache NiFi: https://lnkd.in/de7bvnSt
| Tools | Link | Used for | Official Docs | YouTube |
|---|---|---|---|---|
| DBMS | MySQL, MongoDB | | | |
| SQL | https://lnkd.in/dzAiRF-x | | | |
| Python | https://lnkd.in/d-pZPyf5 | | | |
| Linux | | | | |
| Data Warehouse & Lake Concepts | Data Warehouse, Data Lakes | | | |
| Data Pipelines | | | | |
| dbt | https://lnkd.in/eG9eaEuE | | | |
| PySpark | https://lnkd.in/d7fgs7dE | | | |
| Kafka | | | | |
| Apache NiFi | https://lnkd.in/de7bvnSt | | | |
| Airflow | | | | |
| Databricks | https://lnkd.in/dCBiQXPR | | | |
| Snowflake | https://lnkd.in/dDBeddVy | | | |
| Cloud Computing Concepts | | | | |
| Distributed Systems Fundamentals | | | | |
| AWS | | | | |
| Azure | | | | |
| GCP | | | | |
| Git & GitHub | https://lnkd.in/dqJ7J3kN | | | |
| CI/CD | https://lnkd.in/dcfKBmCa | | | |
| Jenkins | | | | |
| GitHub Actions | | | | |
| Terraform | | | | |
| SonarQube | | | | |
| Docker | | | | |
| Kubernetes | | | | |
| Power BI | | | | |
| Tableau | | | | |
| Apache Superset | | | | |
| Prometheus | | | | |
| Grafana | | | | |
| Datadog | | | | |
- Netflix - https://netflixtechblog.medium.com/
- AWS - https://aws.amazon.com/solutions/case-studies/
- GCP - https://cloud.google.com/customers
- Azure - https://azure.microsoft.com/en-us/resources/customer-stories/
- Spotify - https://engineering.atspotify.com/category/data/
- MongoDB - https://www.mongodb.com/blog/all
- Swiggy
  - https://bytes.swiggy.com/the-swiggy-delivery-challenge-part-one-6a2abb4f82f6
  - https://bytes.swiggy.com/swiggy-distance-service-9868dcf613f4
  - https://bytes.swiggy.com/the-tech-that-brings-you-your-food-1a7926229886
- Zomato - https://blog.zomato.com/