Course Hive

Building Self-Healing Data Pipeline - End to End Data Engineering Project

Google Cloud End to End Data Engineering Projects - Building Self-Healing Data Pipeline - End to End Data Engineering Project

Master End-to-End Data Engineering: Real-Time Streaming, AI Integrations, and High-Performance Systems! Dive into hands-on projects, expert-guided tutorials, and cutting-edge technologies for a standout career in data engineering.

4.0 (2)
25 learners

What you'll learn

Understand and implement real-time streaming with Google Cloud for data engineering projects
Learn how to perform real-time socket streaming using Apache Spark
Master the use of Apache Airflow alongside Spark, Pyspark, Java, and Scala for data engineering
Develop skills to build and optimize high-performance, real-time analytics databases

This course includes

  • 47.5 hours of video
  • Certificate of completion
  • Access on mobile and TV

Full Transcript

In this video, I'll show you how to build a production-ready, AI-powered data pipeline that automatically detects and heals data quality issues in real-time. No more failed pipelines because of bad data!

We'll combine the power of Apache Airflow 3.0 with Ollama (running LLaMA 3.2 locally) to create an intelligent pipeline that:

✅ Automatically diagnoses data quality issues (missing values, wrong types, malformed text)
✅ Self-heals problematic records without manual intervention
✅ Performs sentiment analysis on millions of Yelp reviews using local LLM
✅ Generates comprehensive health reports and metrics
✅ Gracefully degrades when things go wrong

This is the future of data engineering - pipelines that think for themselves and fix problems before they become failures.

What You'll Learn:

✅ How to build agentic workflows in Apache Airflow
✅ Integrating local LLMs (Ollama) into your data pipelines
✅ Implementing self-healing patterns for data quality
✅ Batch processing strategies for large datasets
✅ Building health monitoring and observability into pipelines

Like this video? Support us: https://www.youtube.com/@CodeWithYu/join

Timestamps:

0:00 Introduction
1:43 System Architecture and background
5:49 Setting up the project
13:27 The Agentic Self Healing Pipeline
17:00 Embedding AI Agents in Airflow
40:44 Diagnosing and Healing Pipelines
1:11:44 Generating Health Reports
1:16:12 Results and Review
1:30:00 Outro

Resources:

Read more: https://open.substack.com/pub/datainproduction/p/why-agentic-workflows-change-everything
Full Code+Video: https://buymeacoffee.com/yusuf.ganiyu/source-code-self-healing-agentic-data-pipeline
Full Source Code: https://github.com/airscholar/SelfHealingPipeline
Ollama Download: https://ollama.com/download
Apache Airflow: https://airflow.apache.org/

Connect With Me:

LinkedIn: https://linkedin.com/in/yusuf-ganiyu
GitHub: https://github.com/airscholar
Twitter/X: https://x.com/yusufOGaniyu

#dataengineering #airflow #python #llm #ollama #datapipeline #machinelearning #ai #selfhealing #apacheairflow #dataengineer #etl #dataquality
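
To make the diagnose, heal, and degrade-gracefully pattern from the description above more concrete, here is a minimal Python sketch. It is not the code from the video or the linked repository. The record fields (`review_id`, `text`, `stars`), the function names, the healing rules, and the `"unknown"` fallback label are all assumptions made for illustration. The only external call is Ollama's local `/api/generate` REST endpoint with the `llama3.2` model tag, and it is passed in as a parameter so the rest can run without a model server.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical schema for a Yelp-style review record (an assumption, not from the source).
REQUIRED_FIELDS = {"review_id": str, "text": str, "stars": int}


def diagnose(record: dict) -> list[str]:
    """Return labels for missing values, wrong types, and malformed text."""
    issues = []
    for field, expected in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            issues.append(f"missing:{field}")
        elif not isinstance(value, expected):
            issues.append(f"wrong_type:{field}")
    text = record.get("text")
    # "Malformed" here just means irregular whitespace; a real check would do more.
    if isinstance(text, str) and text != " ".join(text.split()):
        issues.append("malformed:text")
    return issues


def heal(record: dict) -> dict:
    """Repair what can be repaired safely; leave everything else untouched."""
    healed = dict(record)
    text = healed.get("text")
    if isinstance(text, str):
        healed["text"] = " ".join(text.split())  # normalize whitespace
    stars = healed.get("stars")
    if stars is not None and not isinstance(stars, int):
        try:
            healed["stars"] = int(float(stars))  # e.g. "2" or 2.0 becomes 2
        except (TypeError, ValueError):
            pass  # cannot coerce; the record will be quarantined later
    return healed


def ollama_generate(prompt: str, model: str = "llama3.2",
                    url: str = "http://localhost:11434/api/generate") -> str:
    """Call a locally running Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["response"]


def sentiment(text: str, generate: Callable[[str], str] = ollama_generate) -> str:
    """Classify sentiment; fall back to "unknown" if the model call fails."""
    prompt = ("Classify the sentiment of this review as positive, negative, "
              "or neutral. Reply with one word.\n\n" + text)
    try:
        answer = generate(prompt).strip().lower()
    except Exception:
        return "unknown"  # graceful degradation: do not fail the whole batch
    return answer if answer in {"positive", "negative", "neutral"} else "unknown"


def process_batch(records: list[dict],
                  generate: Callable[[str], str] = ollama_generate):
    """Diagnose, heal, re-check, then score each record; return rows and a health report."""
    report = {"total": 0, "healed": 0, "quarantined": 0}
    clean_rows = []
    for record in records:
        report["total"] += 1
        if diagnose(record):
            record = heal(record)
            if diagnose(record):  # still broken after healing
                report["quarantined"] += 1
                continue
            report["healed"] += 1
        clean_rows.append({**record, "sentiment": sentiment(record["text"], generate)})
    return clean_rows, report
```

In an Airflow DAG, functions like `diagnose`, `heal`, and `process_batch` could each be wrapped as a task, with the report dictionary feeding a health-report step. Re-running `diagnose` after `heal` is the design choice that matters here: a record counts as healed only if it passes the same checks that flagged it, and anything else is quarantined rather than passed downstream.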
