Full Transcript
In this video, you will build a real-time data streaming pipeline, covering each phase from data ingestion to processing and finally storage. We'll use a stack of tools and technologies including Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra, all containerized using Docker.

MORE FREE COURSES: https://datamasterylab.com

📚 What You'll Learn:
👉 Setting up a data pipeline with Apache Airflow
👉 Streaming data with Kafka and Kafka Connect
👉 Using Zookeeper for distributed synchronization
👉 Data processing with Apache Spark
👉 Data storage solutions with Cassandra and PostgreSQL
👉 Containerizing your data engineering environment with Docker

✨ Timestamps: ✨
0:00 Introduction
0:53 System architecture
3:47 Getting data from API with Airflow
17:10 Docker Compose for the architecture
26:09 Streaming data into Kafka
44:29 Apache Spark and Cassandra setup
49:33 Streaming data into Cassandra
1:27:05 Outro

👦🏻 My LinkedIn: https://www.linkedin.com/in/yusuf-ganiyu-b90140107/
🚀 Twitter: https://twitter.com/YusufOGaniyu
📝 Medium: https://medium.com/@yusuf.ganiyu

🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
Like this video? Buy me a coffee ❤️ https://www.buymeacoffee.com/yusuf.ganiyu/

🔗 Useful Links and Resources:
✅ Code: https://github.com/airscholar/e2e-data-engineering.git
✅ Medium Article: https://medium.com/@yusuf.ganiyu/realtime-data-engineering-project-with-airflow-kafka-spark-cassandra-and-postgres-804bcd963974
✅ Docker Compose Documentation: https://docs.docker.com/compose/
✅ Apache Kafka Official Site: https://kafka.apache.org/
✅ Apache Spark Official Site: https://spark.apache.org/
✅ Apache Airflow Official Site: https://airflow.apache.org/
✅ Cassandra: https://cassandra.apache.org/
✅ Confluent Docs: https://docs.confluent.io/home/overview.html

✨ Tags ✨
Data Engineering, Apache Airflow, Kafka, Apache Spark, Cassandra, PostgreSQL, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming

✨ Hashtags ✨
#confluent #DataEngineering #ApacheAirflow #Kafka #ApacheSpark #Cassandra #PostgreSQL #Docker #ETLPipeline #DataPipeline #StreamingData #RealTimeAnalytics
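
The three phases named in the description (ingestion, streaming, storage) can be illustrated with a small Python sketch. This is not the code from the video or the linked repository: the record shape, the function names, and the use of an in-memory queue and list as stand-ins for a Kafka topic and a Cassandra table are all assumptions made so the example runs without any services.

```python
import json
import queue
import uuid


def format_record(raw: dict) -> dict:
    """Flatten a raw API response (hypothetical shape) into a flat record."""
    return {
        "id": str(uuid.uuid4()),
        "first_name": raw["name"]["first"],
        "last_name": raw["name"]["last"],
        "email": raw["email"],
    }


def produce(topic: queue.Queue, record: dict) -> None:
    """Serialize the record to JSON bytes, the form a producer would send."""
    topic.put(json.dumps(record).encode("utf-8"))


def consume_and_store(topic: queue.Queue, table: list) -> int:
    """Drain the topic, decode each message, and append it to the table."""
    stored = 0
    while not topic.empty():
        row = json.loads(topic.get().decode("utf-8"))
        table.append(row)
        stored += 1
    return stored


topic = queue.Queue()  # stand-in for a Kafka topic (assumption)
table = []             # stand-in for a Cassandra table (assumption)

# Made-up sample payload; a real run would fetch this from an API.
raw = {"name": {"first": "Ada", "last": "Lovelace"}, "email": "ada@example.com"}

produce(topic, format_record(raw))
count = consume_and_store(topic, table)
print(count, table[0]["email"])
```

In the architecture the video describes, Airflow would schedule the ingestion step, Kafka would replace the queue, and Spark would take the place of the consumer loop before writing to Cassandra.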
