Data pipelines, 2021.

Last update: May 19, 2022

Overview

This repo illustrates a simple process for automating data transformation and modeling with a pipeline built on Luigi.

Main stack

Python 3.7+
Streamlit
Scikit-learn
Pandas
Luigi

Idea

The full process is described in an interactive app, which you can find in the script app.py. For details on how to launch the app, see the section on running the scripts.

Setup

Create a virtual environment (I recommend using conda):
```
conda create --name data-pipes python=3.7
```
Activate the virtual environment:
```
conda activate data-pipes
```
Install requirements:
```
pip install -r requirements.txt
```

Run the scripts

Interactive app

To run the interactive app, run the Streamlit command with the virtual environment activated:

(data-pipes) streamlit run app.py

This starts a local server at http://localhost:8501.

Data pipeline

To run a specific task, say TareaX defined in the script tareas.py, run the command:

PYTHONPATH=. luigi --module tareas TareaX --local-scheduler

You can extend the code and add any tasks you like!

Owner

Rodolfo Ferro