PySpark - Zero to Hero | PySpark Tutorial 2025 | Spark Tutorial 2025 | Learn from Basics to Advanced Performance Optimization
4.0 (1) • 18 learners
This course includes
- 9 hours of video
- Certificate of completion
- Access on mobile and TV
Course content
1 module • 35 lessons • 9 hours of video
PySpark - Zero to Hero | PySpark Tutorial 2025 | Spark Tutorial 2025 | Learn from Basics to Advanced Performance Optimization
35 lessons • 9 hours
- 01 PySpark Tutorial | PySpark Training | Learn from Basics to Advanced Performance Optimization 02:55
- 02 How Spark Works - Driver & Executors | How Spark Divides a Job into Stages | What is Shuffle in Spark 04:47
- 03 Spark Transformations & Actions | Why Spark prefers Lazy Evaluation | What are Partitions in Spark 05:45
- 04 Spark DataFrames & Execution Plans | Spark Logical and Physical Execution Planning | What is a DAG 03:58
- 04_2 Set up PySpark on a Local Machine with Jupyter Lab | PySpark Local Machine Setup 20:19
- 05 Understand Spark Session & Create your First DataFrame | Create SparkSession object | Spark UI 11:15
- 06 Basic Structured Transformation - Part 1 | Write Spark DataFrame Schema | Ways to write DF Columns 13:06
- 07 Basic Structured Transformation - Part 2 | Cast Column | Add Column | Static Column Value | Rename 12:15
- 08 Working with Strings, Dates and Null | Regex Replace | Convert string to date | Transform NULL 16:15
- 09 Sorting data, Union and Aggregation in Spark | Difference between Union and UnionAll | Having Clause 10:10
- 10 Window Functions, Unique Data & Databricks Community Cloud | Second Highest Salary | Spark expr 10:28
- 11 Data Repartitioning & PySpark Joins | Coalesce vs Repartition | Spark Data Partition | Joins 13:23
- 12 Understand Spark UI, Read CSV Files and Read Modes | Spark InferSchema Option | Drop Malformed 17:08
- 13 Read Complex File Formats | Parquet | ORC | Performance benefit of Parquet | Recursive File Lookup 10:33
- 14 Read, Parse or Flatten JSON data | JSON file with Schema | from_json | to_json | Multiline JSON 17:50
- 15 How Spark Writes data | Write modes in Spark | Write data with Partition | Default Parallelism 14:08
- 16 Understand Spark Execution on Cluster | Cluster Manager | Cluster Deployment Modes | Spark Submit 12:37
- 17 User Defined Function (UDF) | How Spark works with UDF | How to register Python UDF 09:42
- 18 Understand DAG, Explain Plans & Spark Shuffle with Tasks | Skipped Stage | Benefit of Shuffle Write 16:47
- 19 Understand and Optimize Shuffle in Spark 15:14
- 20 Data Caching in Spark | Cache vs Persist | Spark Storage Level with Persist | Partial Data Caching 13:19
- 21 Broadcast Variable and Accumulators in Spark | How to use Spark Broadcast Variables 12:35
- 22 Optimize Joins in Spark & Understand Bucketing for Faster Joins | Sort Merge Join | Broadcast Join 28:17
- 23 Static vs Dynamic Resource Allocation in Spark | Dynamic Allocation vs Databricks Scale up 10:30
- 24 Fix Skewness and Spillage with Salting in Spark | Salting Technique | How to identify Skewness 21:17
- 25 AQE aka Adaptive Query Execution in Spark | Coalesce Shuffle Partitions | Skew Partitions Fix 11:52
- 26 Spark SQL, Hints, Spark Catalog and Metastore | Hints in Spark SQL Query | SQL functions & Joins 19:20
- 27 Read and Write from Azure Cosmos DB using Spark | E2E Cosmos DB setup | NoSQL vs SQL Databases 21:17
- 28 Get Started with Delta Lake using Databricks | Benefits and Features of Delta Lake | Time Travel 34:15
- 29 Optimize Data Scanning with Partitioning in Spark | How Partitioning data works | Optimize Jobs 13:42
- 30 Data Skipping and Z-Ordering in Delta Lake Tables | Optimize & Data Compaction Delta Lake Tables 18:45
- 31 Delta Tables - Deletion Vectors and Liquid Clustering | Optimize Delta Tables | Delta Clustering 12:53
- 32 Spark Memory Management | Why OOM Errors in Spark | Spark Unified Memory | Storage/Execution Mem 48:25
- 33 What is Spark Connect? | Spark Connect vs Spark Session | Setup Spark Connect Server with Cluster 23:17
- 34 Write PySpark Unit Test Cases using PyTest module | Setup PyTest with PySpark 22:26
