Apache Airflow installation on Linux

Installation

The easiest way to install the latest stable version of Airflow is with pip:

pip install apache-airflow

You can also install Airflow with support for extra features like gcp or postgres:

pip install 'apache-airflow[postgres,gcp]' 

Extra Packages

The apache-airflow PyPI basic package only installs what’s needed to get started. Subpackages can be installed depending on what will be useful in your environment. For instance, if you don’t need connectivity with Postgres, you won’t have to go through the trouble of installing the postgres-devel yum package, or whatever equivalent applies on the distribution you are using.

Behind the scenes, Airflow does conditional imports of operators that require these extra dependencies.

Here’s the list of the subpackages and what they enable:

pip install 'apache-airflow[all]'

All Airflow features known to man

pip install 'apache-airflow[all_dbs]'

All database integrations

pip install 'apache-airflow[atlas]'

Apache Atlas to use Data Lineage feature

pip install 'apache-airflow[async]'

Async worker classes for Gunicorn

pip install 'apache-airflow[aws]'

pip install 'apache-airflow[azure]'

pip install 'apache-airflow[cassandra]'

Cassandra related operators & hooks

pip install 'apache-airflow[celery]'

pip install 'apache-airflow[cgroups]'

Needed to use CgroupTaskRunner

pip install 'apache-airflow[cloudant]'

pip install 'apache-airflow[crypto]'

Encrypt connection passwords in metadata db

pip install 'apache-airflow[dask]'

pip install 'apache-airflow[databricks]'

Databricks hooks and operators

pip install 'apache-airflow[datadog]'

Datadog hooks and sensors

pip install 'apache-airflow[devel]'

Minimum dev tools requirements

pip install 'apache-airflow[devel_hadoop]'

Airflow + dependencies on the Hadoop stack

pip install 'apache-airflow[doc]'

Packages needed to build docs

pip install 'apache-airflow[docker]'

Docker hooks and operators

pip install 'apache-airflow[druid]'

Druid related operators & hooks

pip install 'apache-airflow[elasticsearch]'

pip install 'apache-airflow[gcp]'

pip install 'apache-airflow[github_enterprise]'

GitHub Enterprise auth backend

pip install 'apache-airflow[google_auth]'

pip install 'apache-airflow[grpc]'

pip install 'apache-airflow[hdfs]'

pip install 'apache-airflow[hive]'

All Hive related operators

pip install 'apache-airflow[jdbc]'

pip install 'apache-airflow[jira]'

pip install 'apache-airflow[kerberos]'

Kerberos integration for Kerberized Hadoop

pip install 'apache-airflow[kubernetes]'

Kubernetes Executor and operator

pip install 'apache-airflow[ldap]'

LDAP authentication for users

pip install 'apache-airflow[mongo]'

Mongo hooks and operators

pip install 'apache-airflow[mssql]'

Microsoft SQL Server operators and hook, support as an Airflow backend

pip install 'apache-airflow[mysql]'

MySQL operators and hook, support as an Airflow backend. The MySQL server version has to be 5.6.4+. The exact version upper bound depends on the version of the mysqlclient package. For example, mysqlclient 1.3.12 can only be used with MySQL server 5.6.4 through 5.7.

pip install 'apache-airflow[oracle]'

Oracle hooks and operators

pip install 'apache-airflow[papermill]'

Papermill hooks and operators

pip install 'apache-airflow[password]'

Password authentication for users

pip install 'apache-airflow[pinot]'

pip install 'apache-airflow[postgres]'

PostgreSQL operators and hook, support as an Airflow backend

pip install 'apache-airflow[qds]'

Enable QDS (Qubole Data Service) support

pip install 'apache-airflow[rabbitmq]'

RabbitMQ support as a Celery backend

pip install 'apache-airflow[redis]'

pip install 'apache-airflow[salesforce]'

pip install 'apache-airflow[samba]'

pip install 'apache-airflow[sendgrid]'

Send email using SendGrid

pip install 'apache-airflow[segment]'

Segment hooks and sensors

pip install 'apache-airflow[slack]'

pip install 'apache-airflow[snowflake]'

Snowflake hooks and operators

pip install 'apache-airflow[ssh]'

pip install 'apache-airflow[statsd]'

pip install 'apache-airflow[vertica]'

Vertica hook support as an Airflow backend

pip install 'apache-airflow[webhdfs]'

pip install 'apache-airflow[winrm]'

WinRM hooks and operators
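
Extras can be combined in a single install. As an illustrative sketch (the selection of extras below is arbitrary), this adds Postgres, Celery, and Redis support in one command and then lists the Airflow-related distributions that were pulled in:

pip install 'apache-airflow[postgres,celery,redis]'
# see which Airflow-related packages ended up installed
pip freeze | grep -i airflow

Because the operators and hooks behind each extra are imported conditionally, installing only the extras you actually need keeps the environment small.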

Initializing the Airflow Database

Airflow requires a database to be initialized before you can run tasks. If you’re just experimenting and learning Airflow, you can stick with the default SQLite option. If you don’t want to use SQLite, take a look at Initializing a Database Backend to set up a different database.
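
As a sketch of what such a configuration can look like (the user, password, host, and database name below are placeholders), you can point Airflow at a PostgreSQL database by setting the connection string before initializing:

# hypothetical Postgres connection string; adjust user, password, host, and database name
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow"

On Airflow releases older than 2.3 the same setting lives in the [core] section, i.e. AIRFLOW__CORE__SQL_ALCHEMY_CONN.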

After configuration, you’ll need to initialize the database before you can run tasks:
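
On Airflow 2.x this is done with:

airflow db init

(Older 1.10.x releases used airflow initdb for the same purpose.)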

Quick Start

This quick start guide will help you bootstrap an Airflow standalone instance on your local machine.

Successful installation requires a Python 3 environment. Starting with Airflow 2.3.0, Airflow is tested with Python 3.7, 3.8, 3.9, 3.10. Note that Python 3.11 is not yet supported.

Only pip installation is currently officially supported.

While there have been successes using other tools like Poetry or pip-tools, they do not share the same workflow as pip, especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported.

If you wish to install Airflow using those tools, you should use the constraint files and convert them to the format and workflow that your tool requires.
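
As a rough sketch, you could download the constraint file for your Airflow and Python versions (the versions below are only examples) and feed it to whatever tool you use:

# example: constraint file for Airflow 2.6.3 on Python 3.8
curl -LO https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.8.txt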

The installation of Airflow is straightforward if you follow the instructions below. Airflow uses constraint files to enable reproducible installation, so using pip and constraint files is recommended.

    Set Airflow Home (optional): Airflow requires a home directory, and uses ~/airflow by default, but you can set a different location if you prefer. The AIRFLOW_HOME environment variable is used to inform Airflow of the desired location. This step of setting the environment variable should be done before installing Airflow so that the installation process knows where to store the necessary files.

export AIRFLOW_HOME=~/airflow

AIRFLOW_VERSION=2.6.3

# Extract the version of Python you have installed. If you're currently using Python 3.11,
# you may want to set this manually as noted above; Python 3.11 is not yet supported.
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"

CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example, this would install 2.6.3 with Python 3.7:
# https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.7.txt

pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
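
A quick way to verify the installation is to print the version that was installed:

airflow version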

Upon running these commands, Airflow will create the $AIRFLOW_HOME folder and an "airflow.cfg" file with defaults that will get you going fast. You can override defaults using environment variables; see Configuration Reference. You can inspect the file either in $AIRFLOW_HOME/airflow.cfg or through the UI in the Admin->Configuration menu. The PID file for the webserver will be stored in $AIRFLOW_HOME/airflow-webserver.pid, or in /run/airflow/webserver.pid if started by systemd.
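
As a sketch of that convention (the settings chosen here are arbitrary examples), configuration keys map to environment variables named AIRFLOW__<SECTION>__<KEY>:

# disable the bundled example DAGs and pin the default timezone
export AIRFLOW__CORE__LOAD_EXAMPLES=False
export AIRFLOW__CORE__DEFAULT_TIMEZONE=utc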

Out of the box, Airflow uses a SQLite database, which you should outgrow fairly quickly since no parallelization is possible using this database backend. It works in conjunction with the SequentialExecutor which will only run task instances sequentially. While this is very limiting, it allows you to get up and running quickly and take a tour of the UI and the command line utilities.
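
For this kind of local tour, Airflow 2.2 and later provide an all-in-one command that initializes the database, creates an admin user, and starts all of the core components together; this is the standalone command referred to below:

airflow standalone

When it is up, the web UI is served at http://localhost:8080 and the generated admin credentials are printed to the terminal.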

As you grow and deploy Airflow to production, you will also want to move away from the standalone command we use here to running the components separately. You can read more in Production Deployment .

Here are a few commands that will trigger a few task instances. You should be able to see the status of the jobs change in the example_bash_operator DAG as you run the commands below.

# run your first task instance
airflow tasks test example_bash_operator runme_0 2015-01-01

# run a backfill over 2 days
airflow dags backfill example_bash_operator \
    --start-date 2015-01-01 \
    --end-date 2015-01-02

If you want to run the individual parts of Airflow manually rather than using the all-in-one standalone command, you can instead run:

airflow db init

airflow users create \
    --username admin \
    --firstname Peter \
    --lastname Parker \
    --role Admin \
    --email spiderman@superhero.org

airflow webserver --port 8080

airflow scheduler
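
The webserver and scheduler are long-running processes, so run them in separate terminals (or as services). As a rough liveness check once the webserver is up (assuming the default port used above), you can query its health endpoint:

curl http://localhost:8080/health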

What's Next?

From this point, you can head to the Tutorials section for further examples or the How-to Guides section if you’re ready to get your hands dirty.
