Taking the fight to the establishment.

Overview

Throwdown

Taking the fight to the establishment.

Wat?

I wanted a simple markdown interpreter in python and/or javascript to output html for my website. Python does not have a bug-free official distribution, javascript only has things you install through npm and I don't want to have anything to do with node and the 100 MB of dependencies you end up uploading to your FTP server in order to do the most basic tasks.

So writing my own parser it is then eh? I tried to trudge through the commonmark markdown spec and had a heart attack at the complexity. 24722 words and 181 pages of complicated language explaining features I absolutely don't need.

I just want minimal, well defined, syntactical elements with maximum payoff, so here is throwdown. Taking the fight to the establishment to have a stupidly minimal markup language in both definition and capability.

Goals

Keep it a subset for markdown so we can use existing IDEs & plugins. Support HTML tags in line with text. Write a well defined language spec in the manual to ease creating new interpreters for it.

The spec

Tokenization

Given a piece of text we tokenize the following concepts (these are regular expressions using DOTALL and MULTILINE modifiers):

blank_line(s): (\r | \n | \r\n){2,}
html tag: <.*?>
code: ```.*?```
unescaped italic: ^_|(?

any characters inbetween matching tokens are flagged content.

Content itself gets an additional treatment where we replace this regex

\\(?)

for escaped characters, with whatever was in the matched group. I currently do this in the generation stage but it could move to any stage.

I am not sure if in the UTF8 2/3/4 byte characters any of these elements may match, so make sure to perform these single-characetr checks per unicode char, not per byte.

Parsing

We then have a parsing pass that tries to group matching tokens:

In this example:

This *word* is bold but this* is wrong.

We have the following tokens:

content, bold, content, bold, content, bold, content

The parser simply finds any content block surrounded by matching code|italic|bold neighbours, and then 'consumes' these neighbours so they can not be picked up more than once. Reading from left to right this means we get (note we search outwards from content recursively to support *_content_* notations, instead of holding on to the boundary tokens as soon as we encounter them):

content group content bold content

Then, any token outside of a group gets merged into it's content, any consecutive content gets merged into 1 content. The first step reduced the bold into it's left neighbour:

content group content content

The next step reduces the two content blocks into one:

content group content

The above step should include html tags.

A final step is to remove the blank line tokens, but first we must make sure to merge consecutive group and content blocks, because after this any consecutive content and/or group tokens are known unique paragraphs (or headers) so the blank lines are no longer necessary to imply this separation.

Generation

Then there is the generation step. We simply walk the resulting tokens and output a html document.

  • If a content group is preceded by a heading, the node gets wrapped into tags where n is the number of #.
  • Every other content node gets wrapped into

    tags.

  • Every group gets wrapped based on the first and last tokens (which are identical).
    • italic becomes In this case the wrapping is recursive, a bold group in an intalic group may exist.
    • bold becomes In this case the wrapping is recursive, an italic group in a bold group may exist.
    • code becomes

Write
to insert single line breaks manually.

TODO:

Consider bullet points and numbered lists, though the html is not super invasive.

Owner
Trevor van Hoof
I write tools & shaders TropicalTrevor in the Demoscene
Trevor van Hoof
A Powerful Tool For Making Combo List(All possible modes)

ComboMaker A Powerful Tool For Making Combo List Introduction Check out all possible Combo list build modes with this tool =) How to Install Open the

MasterBurnt 7 Jan 07, 2023
Python with braces. Because Python is awesome, but whitespace is awful.

Bython Python with braces. Because Python is awesome, but whitespace is awful. Bython is a Python preprosessor which translates curly brackets into in

1 Nov 04, 2021
Tool to audit and fix Python project requirements.

Requirement Auditor Utility to revise and updated python requirement files.

Luis Carlos Berrocal 1 Nov 07, 2021
SEH-Helper - Binary Ninja plugin for exploring Structured Exception Handlers

SEH Helper Author: EliseZeroTwo A Binary Ninja helper for exploring structured e

Elise 74 Dec 26, 2022
Resources for the 2021 offering of COMP 598

comp598-2021 Resources for the 2021 offering of COMP 598 General submission instructions Important Please read these instructions located in the corre

Derek Ruths 23 May 18, 2022
This application demonstrates IoTVAS device discovery and security assessment API integration with the Rapid7 InsightVM.

Introduction This repository hosts a sample application that demonstrates integrating Firmalyzer's IoTVAS API with the Rapid7 InsightVM platform. This

Firmalyzer BV 4 Nov 09, 2022
Utility to play with ADCS, allows to request tickets and collect information about related objects

certi Utility to play with ADCS, allows to request tickets and collect information about related objects. Basically, it's the impacket copy of Certify

Eloy 185 Dec 29, 2022
Python interface to ISLEX, an English IPA pronunciation dictionary with syllable and stress marking.

pysle Questions? Comments? Feedback? Pronounced like 'p' + 'isle'. An interface to a pronunciation dictionary with stress markings (ISLEX - the intern

Tim 38 Dec 14, 2022
Python 100daysofcode

#python #100daysofcode Python is a simple, general purpose ,high level & object-oriented programming language even it's is interpreted scripting langu

Tara 1 Feb 10, 2022
Cairo-bloom - A naive bloom filter implementation in Cairo

🥀 cairo-bloom A naive bloom filter implementation in Cairo. A Bloom filter is a

Sam Barnes 37 Oct 01, 2022
Load dependent libraries dynamically.

dypend dypend Load dependent libraries dynamically. A few days ago, I encountered many users feedback in an open source project. The Problem is they c

Louis 5 Mar 02, 2022
Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.

What is Craxk ? Craxk is a UNIQUE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reprod

5 Jun 19, 2021
An Embedded Linux Project Build and Compile Tool -- An Bitbake UI Extension

Dianshao - An Embedded Linux Project Build and Compile Tool

0 Mar 27, 2022
Mannaggia is a python application to praise or more likely to curse the saints

Mannaggia-py 👼 Remember Mannaggia? This is a Python remake of it, with new features. mannaggia is a python application to praise or more likely to cu

Christian Visintin 9 Aug 12, 2022
When should you berserk in lichess arena tournament games?

When should you berserk in a lichess arena tournament game? 1+0 arena tournament 3+0 arena tournament Explanation For details on how I arrived at the

18 Aug 03, 2022
An example of Connecting a MySQL Database with Python Code

An example of Connecting And Query Data a MySQL Database with Python Code And How to install Table of contents General info Technologies Setup General

Mohammad Hosseinzadeh 1 Nov 23, 2021
A micro-service that can be extended to help in monitoring systems

A micro-service that can be extended to help in monitoring systems. Be extensible to be incorporated in any of the systems to facilitate timely interventions.

Peter Kagwe 1 Feb 06, 2022
A pure-Python codified rant aspiring to a world where numbers and types can work together.

Copyright and other protections apply. Please see the accompanying LICENSE file for rights and restrictions governing use of this software. All rights

Matt Bogosian 28 Sep 04, 2022
Library for managing git hooks

Autohooks Library for managing and writing git hooks in Python. Looking for automatic formatting or linting, e.g., with black and pylint, while creati

Greenbone 165 Dec 16, 2022
sawa (ꦱꦮ) is an open source programming language, an interpreter to be precise, where you can write python code using javanese character.

ꦱꦮ sawa (ꦱꦮ) is an open source programming language, an interpreter to be precise, where you can write python code using javanese character. sawa iku

Rony Lantip 307 Jan 07, 2023