Python3 to Crystal Translation using Python AST Walker

Related tags

Text Data & NLPpy2cr
Overview

py2cr.py

A code translator using AST from Python to Crystal. This is basically a NodeVisitor with Crystal output. See AST documentation (https://docs.python.org/3/library/ast.html) for more information.

Status

Currently more than 80% of the relevant tests are passing. See more information below.

Installation

Execute the following:

pip install py2cr

or

git clone git://github.com/nanobowers/py2cr.git

Versions

  • Python 3.6 .. 3.9
  • Crystal 1.1+

Dependencies

Python

pip install pyyaml

# Probably not needed for much longer since py2 support is going to be removed.
pip install six 

# Probably not really needed since there is no crystal equivalent
pip install numpy

Crystal

currently there are no external dependencies

Methodology

In addition to walking and writing the AST tree and writing a Crystal syntax output, this tool either:

  • Monkey-patches some common Crystal stdlib Structs/Classes in order to emulate the Python equivalent functionality.
  • Calls equivalent Crystal methods to the Python equivalent
  • Calls wrapped Crystal methods that provide Python equivalent functionality

Usage

Generally, py2cr.py somefile.py > somefile.cr

There is a Crystal shim/wrapper library in src/py2cr (and linked into lib/py2cr) that is also referenced in the generated script. You may need to copy that as needed, though eventually it may be appropriate to convert it to a shard if that is more appropriate.

Example

TODO

Tests

$ ./run_tests.py

Will run all tests that are supposed to work. If any test fails, its a bug. (Currently there are a lot of failing tests!!)

$ ./run_tests.py -a

Will run all tests including those that are known to fail (currently). It should be understandable from the output.

$ ./run_tests.py basic

Will run all tests matching basic. Useful because running the entire test-suite can take a while.

$ ./run_tests.py -x or $ ./run_tests.py --no-error

Will run tests but ignore if an error is raised by the test. This is not affecting the error generated by the test files in the tests directory.

For additional information on flags, run:

./run_tests.py -h

Writing new tests

Adding tests for most new or existing functionality involves adding additional python files at tests/ .py .

The test-runner scripts will automatically run py2cr to produce a Crystal script, then run both the Python and Crystal scripts, then compare stdout/stderr and check return codes.

For special test-cases, it is possible to provide a configuration YAML file on a per test basis named tests/ / .config.yaml which overrides defaults for testing. The following keys/values are supported:

min_python_version: [int, int] # minimum major/minor version
max_python_version: [int, int] # maximum major/minor version
expected_exit_status: int      # exit status for py/cr test script
argument_list: [str, ... str]  # list of strings as extra args for argv

Typing

Some amount of typing support in Python is translated to Crystal. Completely untyped Python code in many cases will not be translatable to compilable Crystal. Rudimentary for python Optional and Union should convert appropriately to Crystal typing.

Some inference of bare list/dict types can now convert to [] of X and {} of X, however set and tuple may not work properly.

Status

This is incomplete and many of the tests brought forward from py2rb do not pass. Some of them may never pass as-is due to significant language / compilation differences (even moreso than Python vs. Ruby)

To some extent, it will always be incomplete. The goal is to cover common cases and reduce the additional work to minimum-viable-program.

Limitations

  • Many Python run-time exceptions are not translatable into Crystal as these issues manifest in Crystal as compile-time errors.
  • A significant portion of python code is untyped and may not translate properly in places where Crystal demands type information.
    • e.g. Crystal Lambda function parameters require typing and this is very uncommon in Python, though may be possible with Callable[] on the python side.
  • Python importing is significantly different than Crystal and thus may not ever map well.
  • Numpy and Unittest which are common in Python don't have equivalents in Crystal. With some significant additional work, converting tests into Spec format may be possible via https://github.com/jaredbeck/minitest_to_rspec as a guide

To-do

  • Remove python2/six dependencies to reduce clutter. Py2 has been end-of-lifed for a while now.
  • Remove numpy dependencies unless/until a suitable target for Crystal can be identified
  • Add additional Crystal shim methods to translate common python3 stdlib methods. Consider a mode that just maps to a close Crystal method rather than using a shim-method to reduce the python-ness.
  • Refactor the code-base. Most of it is in the __init__.py
  • Add additional unit-tests
  • Multi-thread the test-suite so it can run faster.

Contribute

Free to submit an issue. This is very much a work in progress, contributions or constructive feedback is welcome.

If you'd like to hack on py2cr, start by forking the repo on GitHub:

https://github.com/nanobowers/py2cr

Contributing

The best way to get your changes merged back into core is as follows:

  1. Fork it (https://github.com/nanobowers/py2cr/fork)
  2. Create a thoughtfully named topic branch to contain your change (git checkout -b my-new-feature)
  3. Hack away
  4. Add tests and make sure everything still passes by running crystal spec
  5. If you are adding new functionality, document it in the README
  6. If necessary, rebase your commits into logical chunks, without errors
  7. Commit your changes (git commit -am 'Add some feature')
  8. Push to the branch (git push origin my-new-feature)
  9. Create a new Pull Request

License

MIT, see the LICENSE file for exact details.

Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Chenhe Dong 28 Nov 10, 2022
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

背景 安装教程 快速上手 (一)预训练模型 (二)机器翻译 (三)文本分类 TenTrans 进阶 1. 多语言机器翻译 2. 跨语言预训练 背景 TrenTrans是一个统一的端到端的多语言多任务预训练平台,支持多种预训练方式,以及序列生成和自然语言理解任务。 安装教程 git clone git

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器

ja-timex 自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器 概要 ja-timex は、現代日本語で書かれた自然文に含まれる時間情報表現を抽出しTIMEX3と呼ばれるアノテーション仕様に変換することで、プログラムが利用できるような形に規格化するルールベースの解析器です。

Yuki Okuda 116 Nov 09, 2022
This converter will create the exact measure for your cappuccino recipe from the grandiose Rafaella Ballerini!

About CappuccinoJs This converter will create the exact measure for your cappuccino recipe from the grandiose Rafaella Ballerini! Este conversor criar

Arthur Ottoni Ribeiro 48 Nov 15, 2022
Pretrain CPM - 大规模预训练语言模型的预训练代码

CPM-Pretrain 版本更新记录 为了促进中文自然语言处理研究的发展,本项目提供了大规模预训练语言模型的预训练代码。项目主要基于DeepSpeed、Megatron实现,可以支持数据并行、模型加速、流水并行的代码。 安装 1、首先安装pytorch等基础依赖,再安装APEX以支持fp16。 p

Tsinghua AI 37 Dec 06, 2022
An Explainable Leaderboard for NLP

ExplainaBoard: An Explainable Leaderboard for NLP Introduction | Website | Download | Backend | Paper | Video | Bib Introduction ExplainaBoard is an i

NeuLab 319 Dec 20, 2022
Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any language

Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any

Little Endian 1 Apr 28, 2022
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Tsukinousag1 3 Apr 02, 2022
Diaformer: Automatic Diagnosis via Symptoms Sequence Generation

Diaformer Diaformer: Automatic Diagnosis via Symptoms Sequence Generation (AAAI 2022) Diaformer is an efficient model for automatic diagnosis via symp

Junying Chen 20 Dec 13, 2022
Twitter-NLP-Analysis - Twitter Natural Language Processing Analysis

Twitter-NLP-Analysis Business Problem I got last @turk_politika 3000 tweets with

Çağrı Karadeniz 7 Mar 12, 2022
Persian Bert For Long-Range Sequences

ParsBigBird: Persian Bert For Long-Range Sequences The Bert and ParsBert algorithms can handle texts with token lengths of up to 512, however, many ta

Sajjad Ayoubi 63 Dec 14, 2022
Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

BP-Transformer This repo contains the code for our paper BP-Transformer: Modeling Long-Range Context via Binary Partition Zihao Ye, Qipeng Guo, Quan G

Zihao Ye 119 Nov 14, 2022
A natural language processing model for sequential sentence classification in medical abstracts.

NLP PubMed Medical Research Paper Abstract (Randomized Controlled Trial) A natural language processing model for sequential sentence classification in

Hemanth Chandran 1 Jan 17, 2022
Code for Emergent Translation in Multi-Agent Communication

Emergent Translation in Multi-Agent Communication PyTorch implementation of the models described in the paper Emergent Translation in Multi-Agent Comm

Facebook Research 75 Jul 15, 2022
keras implement of transformers for humans

keras implement of transformers for humans

苏剑林(Jianlin Su) 4.8k Jan 03, 2023
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Moment-DETR QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei, Tamara L. Berg, Mohit Bansal For dataset de

Jie Lei 雷杰 133 Dec 22, 2022
Awesome-NLP-Research (ANLP)

Awesome-NLP-Research (ANLP)

Language, Information, and Learning at Yale 72 Dec 19, 2022
Script to download some free japanese lessons in portuguse from NHK

Nihongo_nhk This is a script to download some free japanese lessons in portuguese from NHK. It can be executed by installing the packages with: pip in

Matheus Alves 2 Jan 06, 2022
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

English | 中文 Features 🌍 Chinese supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, and etc. ?

Vega 25.6k Dec 31, 2022
Transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

[UPDATED] A TensorFlow Implementation of Attention Is All You Need When I opened this repository in 2017, there was no official code yet. I tried to i

Kyubyong Park 3.8k Dec 26, 2022