Apache Spark Fundamentals
- Core Big Data concepts.
- Apache Spark modules.
- DataFrame API.
- PySpark Functions.
- Workshop: Building an ETL in PySpark.
Apache Spark for Tuning and Performance
- Reading Spark Query Plans.
- Reading Spark DAGs.
- Memory Management (Spark UI).
- Executor Tuning – Cores & Memory.
- Shuffle Partition Optimization.
- Data Partitioning.
- Bucketing.
- Using Caching.
- Solving Data Skew (Salting Method, Broadcast Joins).
- Dynamic Partition Pruning.
- Workshop: Optimizing an ETL process with PySpark.