- execution_date <= timezone.utcnow() # execution_date
- period_end <= timezone.utcnow() # period_end = follow_schedule(execution_date)
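To make the two conditions concrete, here is a minimal sketch for a daily schedule. It assumes both conditions must hold before a run is created. The helper names `follow_daily_schedule` and `is_run_due` are hypothetical stand-ins, not Airflow APIs, and the standard-library `timezone` is used in place of Airflow's `timezone.utcnow()`.

```python
from datetime import datetime, timedelta, timezone


def follow_daily_schedule(execution_date):
    # Hypothetical stand-in for follow_schedule(execution_date),
    # assuming a daily schedule interval.
    return execution_date + timedelta(days=1)


def is_run_due(execution_date, now):
    # Both conditions from the list above: the execution date and the
    # end of its period must already be in the past (or exactly now).
    period_end = follow_daily_schedule(execution_date)
    return execution_date <= now and period_end <= now


now = datetime(2018, 7, 6, tzinfo=timezone.utc)
yesterday = datetime(2018, 7, 5, tzinfo=timezone.utc)

print(is_run_due(yesterday, now))  # the period for July 5 has ended
print(is_run_due(now, now))        # the period for July 6 is still open
```

Under these assumptions, a run for a given execution_date only becomes eligible once the whole period it covers has elapsed.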
1. Run the jobs every day
Schedule the job every day but have 'expected failures' on the weekends. While this solves the issue, it is annoying to get failure alerts every weekend, and you may become desensitized to actual errors.
2. Calculate the real execution date using PythonOperator
Offset the schedule and have a custom python operator at the beginning of the DAG which sets the 'real' execution date. The downstream tasks will have to ignore the built-in execution_date and use the date established by this task. For our case, we change the schedule to 0 0 * * 2-6 and create a python operator at the beginning which returns execution_date - 1 day. On Sunday and on Monday at midnight, nothing will run because there is no data on Sunday. From Tuesday to Saturday at midnight, the previous day's data will be ingested. This is exactly what we are looking for! While this works, it may be confusing for developers who are adding new tasks or attempting to trigger the DAG manually. Because of the date logic, anyone who triggers the DAG manually needs to be aware of the custom date that we are using. Fairly dangerous, right?
3. Use a ShortCircuitOperator
We want to have the advantage of Solution 1, where we don't introduce our own execution_date, but we want the schedule of Solution 2 so that we don't have 'expected failures'. Such a solution does exist: we can achieve the best of both by leveraging the ShortCircuitOperator. To apply this technique to your situation, schedule the job every day: schedule_interval='0 0 * * *'. At the beginning of your DAG, create a ShortCircuitOperator which takes in a python callable that may look like the following:

    from datetime import datetime
    from croniter import croniter

    def filter_execution_date(*args, **kwargs):
        execution_date = kwargs['execution_date']
        cron = croniter('0 0 * * 1-5', execution_date)
        # Step forward to the next match, then back by one match: we land on
        # execution_date again only if it matches the cron expression itself.
        cron.get_next(datetime)
        return execution_date == cron.get_prev(datetime)

    weekdays_only = ShortCircuitOperator(
        dag=dag,
        task_id='weekdays_only',
        python_callable=filter_execution_date
        ...
    )

The weekdays_only task filters for execution dates that match '0 0 * * 1-5', so effectively Monday to Friday. For execution dates that fall on Saturday or Sunday, the downstream tasks will be skipped. This technique prevents unnecessary alarms from going off. Also, since there is no custom date arithmetic, a DAG that is manually triggered with an execution date will run as expected (assuming that the execution date matches the cron expression specified in filter_execution_date).
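To see which execution dates such a filter lets through, here is a small sketch that walks over one week. To stay dependency-free it does not use croniter. The hypothetical helper `passes_weekday_filter` is a standard-library stand-in that assumes the cron expression is '0 0 * * 1-5' (midnight, Monday to Friday).

```python
from datetime import datetime, timedelta


def passes_weekday_filter(execution_date):
    # Stand-in for matching '0 0 * * 1-5': midnight on Monday..Friday.
    # datetime.weekday() returns 0 for Monday and 6 for Sunday.
    is_midnight = (execution_date.hour, execution_date.minute) == (0, 0)
    return is_midnight and execution_date.weekday() < 5


monday = datetime(2018, 7, 2)  # July 2, 2018 was a Monday
week = [monday + timedelta(days=offset) for offset in range(7)]
outcomes = [passes_weekday_filter(day) for day in week]

for day, outcome in zip(week, outcomes):
    action = 'run downstream tasks' if outcome else 'skip downstream tasks'
    print(day.date(), action)
```

The five weekday dates return a truthy value, so the ShortCircuitOperator would let the downstream tasks run. The two weekend dates return a falsy value, so the downstream tasks would be skipped rather than failed.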
Next steps
There are ways to augment this technique, like creating a new subclass of the ShortCircuitOperator, or creating a function that returns a python callable so that you don't need to declare the callable yourself each time. However, I will leave that up to your imagination and creativity.
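As one possible starting point for the second idea, here is a minimal sketch of a function that returns a python callable. The name `make_weekday_filter` is hypothetical. To keep the sketch runnable without extra packages it checks the weekday directly instead of using croniter, so you would swap in the croniter comparison if you need arbitrary cron expressions.

```python
from datetime import datetime


def make_weekday_filter(allowed_weekdays):
    """Build a callable for python_callable that keeps only some weekdays.

    allowed_weekdays uses datetime.weekday() numbering: 0 = Monday, 6 = Sunday.
    """
    allowed = frozenset(allowed_weekdays)

    def _filter(*args, **kwargs):
        # Same contract as filter_execution_date: read the execution date
        # from the task context and return True to continue downstream.
        execution_date = kwargs['execution_date']
        return execution_date.weekday() in allowed

    return _filter


# One filter per DAG, without declaring a new function each time.
weekdays_only_filter = make_weekday_filter(range(0, 5))
weekends_only_filter = make_weekday_filter([5, 6])

print(weekdays_only_filter(execution_date=datetime(2018, 7, 6)))  # a Friday
print(weekends_only_filter(execution_date=datetime(2018, 7, 6)))
```

The returned function can be passed as python_callable to a ShortCircuitOperator in the same way as filter_execution_date.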