Efficient way to deploy dag files on airflow


Airflow Problem Overview


Are there any established best practices for deploying new DAGs to Airflow?

I saw a couple of comments on the Google forum stating that the DAGs are stored in a Git repository, which is periodically synced to a local directory on the Airflow cluster.
Regarding this approach, I have a couple of questions:

  • Do we maintain separate DAG files for separate environments (testing, production)?
  • How do we roll back an ETL to an older version if the new version has a bug?

Any help here is highly appreciated. Let me know if you need any further details.

Airflow Solutions


    Solution 1 - Airflow

    Here is how we manage it for our team.

    First, in terms of naming convention, each DAG file name matches the DAG ID defined inside the file (including the DAG version). This is useful because the DAG ID is what you ultimately see in the Airflow UI, so you know exactly which file is behind each DAG.

    For example, for a DAG like this:

    from datetime import datetime, timedelta

    from airflow import DAG

    default_args = {
      'owner': 'airflow',
      'depends_on_past': False,
      'start_date': datetime(2017, 12, 5, 23, 59),  # no leading zeros: 05 is a SyntaxError in Python 3
      'email': ['airflow@example.com'],  # placeholder address
      'email_on_failure': True
    }

    dag = DAG(
      'my_nice_dag-v1.0.9',  # update the version whenever you change something
      default_args=default_args,
      schedule_interval="0,15,30,45 * * * *",  # every 15 minutes
      dagrun_timeout=timedelta(hours=24),
      max_active_runs=1)

    # [...] task definitions
    

    The name of the DAG file would be: my_nice_dag-v1.0.9.py
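The naming convention above only helps if file name and DAG ID never drift apart, so a CI build could check it before packaging. This is a minimal sketch, not part of the original answer: the helper names `find_dag_ids` and `filename_matches_dag_id` are hypothetical, and the check only recognizes a DAG ID passed as a string literal to a direct `DAG(...)` call. It uses Python's `ast` module, so the DAG file is parsed but never executed.

```python
import ast
from pathlib import Path


def find_dag_ids(source: str) -> list[str]:
    """Return string literals passed as the first argument (or dag_id=) to DAG(...) calls."""
    ids = []
    for node in ast.walk(ast.parse(source)):
        # Only direct calls like DAG(...); models.DAG(...) is not handled in this sketch.
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "DAG":
            if node.args and isinstance(node.args[0], ast.Constant):
                ids.append(node.args[0].value)
            for kw in node.keywords:
                if kw.arg == "dag_id" and isinstance(kw.value, ast.Constant):
                    ids.append(kw.value.value)
    return ids


def filename_matches_dag_id(path: str, source: str) -> bool:
    """True if the file name without '.py' equals the single DAG ID defined in the file."""
    stem = Path(path).name.removesuffix(".py")
    return find_dag_ids(source) == [stem]
```

A CI job could loop over the DAG folder, call `filename_matches_dag_id(str(p), p.read_text())` for each file, and fail the build on the first mismatch.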

    • All our DAG files are stored in a Git repository (among other things)
    • Every time a merge request is merged into our master branch, our Continuous Integration pipeline starts a new build and packages our DAG files into a zip file (we use Atlassian Bamboo, but there are other solutions such as Jenkins, CircleCI, Travis CI...).
    • In Bamboo we configured a deployment script (shell) that unzips the package and places the DAG files in the /dags folder on the Airflow server.
    • We usually deploy the DAGs to DEV for testing, then to UAT, and finally to PROD. The deployment is done with the click of a button in the Bamboo UI thanks to the shell script mentioned above.
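The package-and-deploy steps above can be sketched in Python as follows. This is an illustration under assumptions, not the team's actual Bamboo script: the function names are hypothetical, DAG files are assumed to sit at the top level of the repository folder, and the staging-then-rename step is an added precaution (not mentioned in the answer) so the scheduler never parses a half-extracted file. Existing files in the dags folder are left in place, which is what keeps older versions available for rollback.

```python
import shutil
import zipfile
from pathlib import Path


def package_dags(src_dir: Path, zip_path: Path) -> list[str]:
    """CI step: zip every top-level .py file in src_dir and return the archived names."""
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for py_file in sorted(src_dir.glob("*.py")):
            zf.write(py_file, arcname=py_file.name)
            names.append(py_file.name)
    return names


def deploy_dags(zip_path: Path, dags_folder: Path) -> list[str]:
    """Deployment step: unzip the package into the Airflow dags folder.

    Files are extracted to a sibling staging directory first and then renamed
    into place (assumes staging and dags_folder share a filesystem, so each
    rename is atomic). Files already in dags_folder are not deleted.
    """
    staging = dags_folder.parent / (dags_folder.name + ".staging")
    if staging.exists():
        shutil.rmtree(staging)
    staging.mkdir(parents=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(staging)
    dags_folder.mkdir(parents=True, exist_ok=True)
    deployed = []
    for f in sorted(staging.glob("*.py")):
        f.replace(dags_folder / f.name)  # overwrites only a file with the same name
        deployed.append(f.name)
    shutil.rmtree(staging)
    return deployed
```

The same `deploy_dags` call can target the DEV, UAT, and PROD dags folders in turn, which mirrors the one-button promotion described above.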

    Benefits

    1. Because the DAG version is included in the file name, the previous version of your DAG file is not overwritten in the DAG folder, so you can easily go back to it.
    2. When your new DAG file is loaded in Airflow, you can recognize it in the UI by its version number.
    3. Because the DAG file name equals the DAG ID, you could even extend the deployment script with Airflow command-line calls that automatically switch ON (unpause) your new DAGs once they are deployed.
    4. Because every version of the DAGs is tracked in Git, we can always go back to previous versions if needed.
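Benefits 1 and 3 combine naturally: since old versions stay in the DAG folder, a rollback is just pausing the newest version and unpausing the previous one. The sketch below is an assumption-laden illustration: it assumes file names of the form `<dag_name>-v<major>.<minor>.<patch>.py`, the helper names are hypothetical, and it builds Airflow 2.x `airflow dags pause` / `airflow dags unpause` commands without running them. Versions are compared numerically, so `v1.0.10` sorts after `v1.0.9`.

```python
import re

# Assumed naming convention: <dag_name>-v<major>.<minor>.<patch>.py
VERSIONED = re.compile(r"^(?P<name>.+)-v(?P<ver>\d+(?:\.\d+)*)\.py$")


def versions_of(dag_name: str, file_names: list[str]) -> list[str]:
    """Return the DAG IDs for dag_name found in file_names, oldest version first."""
    found = []
    for fname in file_names:
        m = VERSIONED.match(fname)
        if m and m.group("name") == dag_name:
            # Numeric tuple so that (1, 0, 10) sorts after (1, 0, 9).
            version = tuple(int(part) for part in m.group("ver").split("."))
            found.append((version, fname.removesuffix(".py")))
    return [dag_id for _, dag_id in sorted(found)]


def switch_commands(pause_id: str, unpause_id: str) -> list[list[str]]:
    """Airflow 2.x CLI calls that switch one DAG version OFF and another ON."""
    return [
        ["airflow", "dags", "pause", pause_id],
        ["airflow", "dags", "unpause", unpause_id],
    ]
```

After a deployment you would call `switch_commands(previous, latest)`; for a rollback, swap the arguments. Each command can be executed with `subprocess.run(cmd, check=True)`. On Airflow 1.10 the equivalent commands were `airflow pause` and `airflow unpause`.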

    Solution 2 - Airflow

    As of now, Airflow doesn't have its own functionality for versioning workflows. However, you can manage that yourself by keeping the DAGs in their own Git repository and pulling its state into the Airflow repository as a submodule. This way you always have a single version of the Airflow repository that pins each set of DAGs to a specific version.
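With this submodule layout, deploying or rolling back a DAG repository amounts to checking out a different tag or commit in the submodule and committing the new pointer in the Airflow repository. The sketch below is hypothetical: the submodule path and tag in the usage note are made-up examples, and it only assembles standard `git` commands (run from the root of the Airflow repository) rather than being taken from the original answer.

```python
import subprocess


def pin_dag_repo_commands(submodule_path: str, ref: str) -> list[list[str]]:
    """Git commands that pin a DAG submodule to `ref` (tag or commit) and record it."""
    return [
        ["git", "submodule", "update", "--init", submodule_path],
        ["git", "-C", submodule_path, "fetch", "--tags"],
        ["git", "-C", submodule_path, "checkout", ref],
        # Staging the submodule path records the new commit pointer in the parent repo.
        ["git", "add", submodule_path],
        ["git", "commit", "-m", f"Pin {submodule_path} to {ref}"],
    ]


def run_all(commands: list[list[str]], cwd: str) -> None:
    """Run each command in the Airflow repository, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

For example, `pin_dag_repo_commands("dags/etl", "v1.0.8")` would roll a hypothetical `dags/etl` submodule back to its `v1.0.8` tag; the previous pin remains in the Airflow repository's Git history.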

    Solution 3 - Airflow

    One relevant best practice is stated in the Airflow documentation:

    > Deleting a task
    >
    > Never delete a task from a DAG. In case of deletion, the historical information of the task disappears from the Airflow UI. It is advised to create a new DAG in case the tasks need to be deleted.

    I believe this is why versioning is not easy to solve yet, and why we have to plan some workarounds.

    https://airflow.apache.org/docs/apache-airflow/2.0.0/best-practices.html#deleting-a-task
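One way to enforce this rule is a CI check that compares the old and new versions of a DAG file and fails if a task disappeared while the DAG ID stayed the same. This is a minimal sketch under stated assumptions: the function names are hypothetical, and tasks are detected only where `task_id` is passed as a string literal keyword argument, so dynamically generated task IDs are not covered.

```python
import ast


def _string_kwargs(source: str, key: str) -> set[str]:
    """Collect string literals passed as `key=...` to any call in source."""
    values = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg == key and isinstance(kw.value, ast.Constant)
                        and isinstance(kw.value.value, str)):
                    values.add(kw.value.value)
    return values


def removed_tasks(old_source: str, new_source: str) -> set[str]:
    """task_ids present in the old DAG file but missing from the new one."""
    return _string_kwargs(old_source, "task_id") - _string_kwargs(new_source, "task_id")


def check_no_task_deleted(old_source: str, new_source: str,
                          old_dag_id: str, new_dag_id: str) -> None:
    """Raise if tasks were removed without creating a new DAG (i.e. a new DAG ID)."""
    gone = removed_tasks(old_source, new_source)
    if gone and old_dag_id == new_dag_id:
        raise ValueError(f"tasks {sorted(gone)} were deleted; create a new DAG ID instead")
```

Combined with a version in the DAG ID, deleting a task then forces a version bump, so the old DAG and its task history stay visible in the UI.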

    Attributions

    All content for this solution is sourced from the original question on Stackoverflow.

    The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

    Content Type | Original Author | Original Content on Stackoverflow
    Question | Sreenath Kamath | View Question on Stackoverflow
    Solution 1 - Airflow | Alexis.Rolland | View Answer on Stackoverflow
    Solution 2 - Airflow | Anum Sheraz | View Answer on Stackoverflow
    Solution 3 - Airflow | Abimael Domínguez | View Answer on Stackoverflow