This is a practice on Airflow, which is building virtual env, installing Airflow and constructing data pipeline (DAGs)

Overview

airflow-test

This is a practice on Airflow, which is

First of all, build virtual env using virtualbox

This is optional but highly recommended

  • It is very important to make sure that development env is independent and has no conflict with other env setting

Next, make python virtual env on proper directory

python3 -m venv sandbox
source ~/sandbox/bin/activate

Let start to install Airflow and initial setting

Install Airflow

pip3 install apache-airflow==2.1.0 \ ## airflow version depends on when you install and constraint 
  --constraint 'git address' ## constraint is ???. It is optional. refer to https://gist.github.com/marclamberti

Airflow Inital Setting

airflow db init ## airflow metadata initializing
airflow users create -u admin -p admin -f jaeyoung -l jang -r Admin -e [email protected] ## create admin user

airflow webserver ## launch web server. after that we can access airflow web server through localhost:8080
airflow scheduler ## activate airfow scheduler to run DAGs

Construct data pipeline with Airflow

One example of data pipeline (one DAGs) - let follow below steps and make data pipeline

  • creating table -> is-api-available -> extending -> processing-use ->
  • what is it? when new user comes in, sense that and process and store user info to destination (?)
  1. create table
  • make python script user_processing.py in dags folder
  • let create table using sqlite operator in user_processing.py
from airflow.models import DAG
from airflow.providers.sqlite.operators.sqlite import SqliteOperator

from datetime import datetime

default_args = {
    'start_date': datetime(2020, 1, 1)
}

with DAG('user_processing', schedule_interval='@daily', default_args=default_args, catchup=False) as dag:
    # Define tasks/operators

    creating_talbe = SqliteOperator(
        task_id='creating_table',
        sqlite_conn_id='db_sqlite',
        sql='''
            CREATE TABLE users (
                firstname TEXT NOT NULL,
                lastname TEXT NOT NULL,
                country TEXT NOT NULL,
                username TEXT NOT NULL,
                password TEXT NOT NULL,
                email TEXT NOT NULL PRIMARY KEY
            );
            '''
    )
  • set sqlite connection on Airflow Web UI
pip install apache-airflow-providers-sqlite ## install airflow provider of sqlite. refer airflow doc. may already be installed from intial installing Airflow..
# next, make a new sqltie connection named 'db_sqlite' in Airflow
  • after that, do test airflow tasks!

airflow tasks test user_processing creating_table 2020-01-01

  • one more things to do just for check again
sqlite3 airflow.db ## connect to sqlite3 in airflow.db
sqlite>.tables; ## see the tables in airflow.dbb
sqlite>select * form users; ## test query
Owner
Jaeyoung
Jaeyoung
Transform a Google Drive server into a VFX pipeline ready server

Google Drive VFX Server VFX Pipeline About The Project Quick tutorial to setup a Google Drive Server for multiple machines access, and VFX Pipeline on

Valentin Beaumont 17 Jun 27, 2022
The code for 2021 MGTV AI Challenge Anti Stealing Link, and the online result ranks 10th.

赛题介绍 芒果TV-第二届“马栏山杯”国际音视频算法大赛-防盗链 随着业务的发展,芒果的视频内容也深受网友的喜欢,不少视频网站和应用开始盗播芒果的视频内容,盗链网站不经过芒果TV的前端系统,跳过广告播放,且消耗大量的服务器、带宽资源,直接给公司带来了巨大的经济损失,因此防盗链在日常运营中显得尤为重要

tongji40 16 Jun 17, 2022
token vesting escrow with cliff and clawback

Yearn Vesting Escrow A modified version of Curve Vesting Escrow contracts with added functionality: An escrow can have a start_date in the past.

62 Dec 08, 2022
An app about keyboards, originating from the design of u/Sonnenschirm

keebapp-backend An app about keyboards, originating from the design of u/Sonnenschirm Setup Firstly, ensure that the environment for python is install

8 Sep 04, 2022
Fastest Semantle solver this side of the Mississippi

semantle Fastest Semantle solver this side of the Mississippi. Roughly 3 average turns to win Measured against (part of) the word2vec-google-news-300

Frank Odom 8 Dec 26, 2022
Information about a signed UEFI Shell that can be used when Secure Boot is enabled.

SignedUEFIShell During our research of the BootHole vulnerability last year, we tried to find as many signed bootloaders as we could. We searched all

Mickey 61 Jan 03, 2023
Find virtual hosts (vhosts) from IP addresses and hostnames

Features Enumerate vhosts from a list of IP addresses and domain names. Virtual Hosts are enumerated using the following process: Supplied domains are

3 Jul 09, 2022
This repository contains all the data analytics projects that I've worked on in python.

93_Python_Data_Analytics_Projects This repository contains all the data analytics projects that I've worked on in python. No. Name 01 001_Cervical_Can

Milaan Parmar / Милан пармар / _米兰 帕尔马 267 Jan 06, 2023
This Open-Source project is great for sensor capture and storage solutions.

Phase 1 This project helps developers in the creation of extended realities that communicate with Arduino and require the security of blockchain stora

Wolfberry, LLC 10 Dec 28, 2022
Rock-paper-scissors basic game in terminal with Python

piedra-papel-tijera Juego básico de piedra, papel o tijera en terminal con Python. El juego incluye: Nombre de jugador Número de veces a jugar Resulta

Isaías Flores 1 Dec 14, 2021
This repository requires you to solve a problem by writing some basic python code.

Can You Solve a Problem? A beginner friendly repository that requires you to solve familiar problems with python. This could be as simple as implement

Precious Kolawole 11 Nov 30, 2022
[arXiv 2020] Video Representation Learning with Visual Tempo Consistency

Video Representation Learning with Visual Tempo Consistency [Paper] [Project Page] News Full codebae is coming soon Pretained Models For now, we provi

DeciForce: Crossroads of Machine Perception and Autonomy 24 Nov 23, 2022
A fast Python in-process signal/event dispatching system.

Blinker Blinker provides a fast dispatching system that allows any number of interested parties to subscribe to events, or "signals". Signal receivers

jason kirtland 1.4k Dec 31, 2022
Excel cell checker with python

excel-cell-checker Description This tool checks a given .xlsx file has the struc

Paul Aumann 1 Jan 04, 2022
Runtime profiler for Streamlit, powered by pyinstrument

streamlit-profiler 🏄🏼 Runtime profiler for Streamlit, powered by pyinstrument. streamlit-profiler is a Streamlit component that helps you find out w

Johannes Rieke 23 Nov 30, 2022
Python client library for the Databento API

Databento Python Library The Databento Python client library provides access to the Databento API for both live and historical data, from applications

Databento, Inc. 35 Dec 24, 2022
Coffeematcher is a python library to randomly match participants for coffee meetings.

coffeematcher coffeematcher is a python library to randomly match participants for coffee meetings. Installation Clone the repository: git clone https

Thomas Wesselink 3 May 06, 2022
The little-endian version of MessagePack

MessagePackEL This is the little-endian version of MessagePack, except the endianness is different, the rest is exactly the same as MessagePack. C lib

dukelec 9 May 13, 2022
A lightweight and unlocked launcher for Lunar Client made in Python.

LCLPy LCL's Python Port of Lunar Client Lite. Releases: https://github.com/Aetopia/LCLPy/releases Build Install PyInstaller. pip install PyInstaller

21 Aug 03, 2022
A Lynx that manages a group that puts the federation first.

Lynx Super Federation Management Group Lynx was created to manage your groups on telegram and focuses on the Lynx Federation. I made this to root out

Unknown 2 Nov 01, 2022