Weekend Learning Sprint: Apache Airflow - Basics to Advanced
Sat Jan 24, 11:00 - Sun Jan 25, 23:00
Event is online
ABOUT
Apache Airflow was originally created by the nice folks at Airbnb. Airbnb was growing rapidly and, as they grew, so did their task and data pipeline orchestration needs. They created Airflow to solve their own urgent needs. And then they open-sourced it.
Airflow is a kind of task scheduler, but with a lot of superpowers. Here are a few of the things it's good at:
- Complicated Task Interdependencies: You can build complicated task structures to handle many different needs. In Airflow these are called DAGs - Directed Acyclic Graphs. You can do some pretty hardcore things with these
- Logging and Monitoring: You can see what tasks ran and when, and exactly what happened
- Retries: You can set tasks up so that they retry themselves, and you can rerun tasks and entire workflows whenever you want to
- Secret Management: Automated tasks often need credentials, and those need to be kept safe
- Scale: Airflow lets you run as many workers as you need, and those workers can be spread across many machines, VMs, or pods. Airflow itself doesn't impose a fixed upper bound
- Usability: It has a really nice UI that you can interact with to view and control things
- Extensible: Airflow is designed for flexibility, so you can extend it with your own custom operators and plugins
One of the really cool things about Airflow is that DAGs are authored using Python code. A DAG is a graph of tasks and their interdependencies. Since DAGs are written with normal Python code (instead of some kind of configuration language), you can be quite creative about how you author them.
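To make the idea of a DAG concrete, here is a minimal plain-Python sketch that uses only the standard library's `graphlib` module. It is not Airflow's API, and the task names (`extract`, `clean`, `load`, `report`) are made up for illustration. It only shows the underlying idea: tasks are ordinary functions, dependencies are a mapping, and a valid run order is derived from the graph.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each one just records that it ran.
ran = []

def extract():
    ran.append("extract")

def clean():
    ran.append("clean")

def load():
    ran.append("load")

def report():
    ran.append("report")

tasks = {"extract": extract, "clean": clean, "load": load, "report": report}

# Map each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "clean": {"extract"},
    "load": {"clean"},
    "report": {"clean"},
}

def run_all():
    # static_order() yields every task after all of its upstream tasks.
    # It raises graphlib.CycleError if the graph contains a cycle.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order

order = run_all()
```

Here `extract` always runs first and `clean` second, while `load` and `report` may run in either order because neither depends on the other. Airflow works with the same kind of graph, and adds scheduling, retries, logging, and distributed execution on top.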
What We Will Cover
This workshop will take you from the basics to advanced concepts, and help you get to grips with some of Airflow's weirder parts along the way. We'll show you how to build, schedule, and manage workflows at any scale: how to create reliable, maintainable data pipelines, automate tasks, and troubleshoot complex workflows effectively.
- Introduction to Workflow Orchestration and Apache Airflow: Explore the role of Airflow in data engineering and why it’s a popular choice for building and automating complex workflows.
- Installing and Configuring Airflow: Get started with Airflow by setting up a development environment and learning the essentials of configuring the platform.
- Working with DAGs (Directed Acyclic Graphs): Learn how to define workflows using DAGs and explore their structure to create clear, maintainable workflows.
- Creating and Scheduling Tasks: Discover the basics of creating tasks with Python operators, declaring dependencies, and setting up scheduling rules.
- Managing Data Pipelines: Build modular and reusable pipelines by breaking workflows into smaller tasks and organizing code efficiently. Explore the importance of idempotent tasks.
- Error Handling and Retries: Learn to manage failures and set up retry logic to make workflows more resilient and fault-tolerant.
- Working with Airflow Operators: Explore the range of built-in operators and use custom operators to tailor workflows to your needs.
- Managing Dependencies and Task Concurrency: Set up task dependencies and manage parallel execution to optimize workflow performance.
- Advanced Scheduling with Cron and Timetables: Deepen your understanding of scheduling and configure custom schedules to fit specific data needs.
- Managing Secrets and Credentials: Securely handle credentials and API keys using Airflow’s connection management and environment variables.
- Workflow Monitoring and Logging: Track and troubleshoot workflows with Airflow’s logging and monitoring tools, helping you spot issues before they become problems.
- Dynamic DAGs: Implement dynamic DAG generation for flexible workflows by writing Python code that generates DAGs for you.
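Two of the topics above, retries and idempotent tasks, go hand in hand: a task that may be retried must be safe to run more than once. The plain-Python sketch below illustrates the idea without using Airflow. The names `run_with_retries`, `flaky_load`, and the in-memory `warehouse` dict are hypothetical, chosen only for this example.

```python
import time

# Stand-in for a destination table, keyed by run date (hypothetical).
warehouse = {}
attempts = {"count": 0}

def flaky_load(run_date, rows):
    """Write rows for one run date; fails on the first attempt."""
    attempts["count"] += 1
    # Idempotent write: overwrite the partition instead of appending,
    # so re-running the task for the same date cannot duplicate rows.
    warehouse[run_date] = list(rows)
    if attempts["count"] == 1:
        raise ConnectionError("simulated transient failure")

def run_with_retries(task, retries, delay_seconds, *args):
    """Call task(*args), retrying up to `retries` extra times on failure."""
    for attempt in range(retries + 1):
        try:
            return task(*args)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the error
            time.sleep(delay_seconds)

run_with_retries(flaky_load, 2, 0.01, "2024-01-01", ["a", "b"])
```

Because the write overwrites rather than appends, the second attempt leaves exactly one copy of the rows even though the first attempt had already written them before failing. In Airflow itself you would not hand-write the retry loop; retry behaviour is configured on the task, for example through the `retries` and `retry_delay` arguments.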
How the Workshop Will Work
You will be introduced to concepts in a hands-on way. Every concept will be practised and implemented.
You will also be given DAG challenges to solve along the way to build up and solidify your skills.
Prerequisite Knowledge
Participants need to be comfortable writing Python code.
Prerequisite Software
Please be aware that Airflow only works on certain operating systems. Here is a link to the official docs:
Location
This is an online event hosted on Discord. You will receive an invitation to the Discord server and joining information closer to the time.
Questions?
If you have any questions, please email [email protected]