a small library for extracting rich content from urls

Last update: Dec 27, 2022

Overview

A small library for extracting rich content from urls.

what does it do?

micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. It also provides functions for parsing blocks of text and HTML and replacing links to videos with rich embedded content.
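To make the link-replacement idea concrete, here is a minimal standard-library sketch of the general technique. It is not micawber's implementation: the names `YOUTUBE_RE`, `URL_RE`, `youtube_embed` and `replace_links` are hypothetical, and the embed markup is simplified. A real provider would obtain the HTML from the remote service rather than build it locally.

```python
import re

# Hypothetical patterns for illustration only; micawber ships its own provider rules.
YOUTUBE_RE = re.compile(r'https?://(?:www\.)?youtube\.com/watch\?v=([\w-]+)')
URL_RE = re.compile(r'https?://\S+')


def youtube_embed(url):
    """Return simplified iframe HTML for a YouTube watch URL, or None if it does not match."""
    match = YOUTUBE_RE.fullmatch(url)
    if match is None:
        return None
    return '<iframe src="https://www.youtube.com/embed/%s"></iframe>' % match.group(1)


def replace_links(text):
    """Replace every recognized URL in text with embed HTML; leave other URLs untouched."""
    def substitute(match):
        url = match.group(0)
        return youtube_embed(url) or url
    return URL_RE.sub(substitute, text)
```

Calling `replace_links` on a block of text swaps a recognized video link for its embed markup and leaves any URL it does not recognize unchanged.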

examples

here is a quick example:

import micawber

# load up rules for some default providers, such as youtube and flickr
providers = micawber.bootstrap_basic()

providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following dictionary:
{
    'author_name': 'pascalbrax',
    'author_url': u'http://www.youtube.com/user/pascalbrax',
    'height': 344,
    'html': u'<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'provider_name': 'YouTube',
    'provider_url': 'http://www.youtube.com/',
    'title': 'Future Crew - Second Reality demo - HD',
    'type': u'video',
    'thumbnail_height': 360,
    'thumbnail_url': u'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'thumbnail_width': 480,
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
    'width': 459,
    'version': '1.0',
}

providers.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following string:
this is a test:
<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>

providers.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>')

# returns the following html:
<p><iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&amp;feature=oembed" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
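When a URL has no matching provider, or the response carries no 'html' key, calling code may want to fall back to an ordinary link. The following is a minimal sketch of such a wrapper. It assumes only that `providers` has a `request(url)` method returning a dictionary like the one shown above, and that a failed lookup raises an exception. `embed_or_link` is a hypothetical helper, not part of micawber, and it catches `Exception` broadly so the sketch does not depend on the library's own exception classes.

```python
from html import escape


def embed_or_link(providers, url):
    """Return embed HTML for url, or a plain anchor tag if no embed is available."""
    try:
        data = providers.request(url)
    except Exception:
        # Assumption: any failure (unknown provider, network error) means "no embed".
        data = {}
    html = data.get('html')
    if html:
        return html
    # Escape the URL so it is safe inside both the attribute and the link text.
    safe = escape(url, quote=True)
    return '<a href="%s">%s</a>' % (safe, safe)
```

With the registry from the example above, `embed_or_link(providers, url)` can be used wherever `providers.request(url)['html']` would otherwise be read directly.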

Owner

Charles Leifer

CRI Scrape is a tool for getting general info about the Italian Red Cross on the GAIA platform

Python script that reads Aliexpress offer URLs from an Excel file (.csv) and posts them in a Telegram channel using a bot

Ebay Webscraper for Getting Average Product Price

An experiment to deploy a serverless infrastructure for a scrapy project.

A crawler of doubamovie

A Python web scraper to scrape the latest posts from the official Coinbase blog.

Scrapy-soccer-games - Scraping information about soccer games from a few websites

🐞 Douban Movie / Douban Book Scarpy

A Python library for automating interaction with websites.

Demonstration on how to use async python to control multiple playwright browsers for web-scraping

Examine.com supplement research scraper!

Dictionary - Application focused on word search through web scraping

Async Python 3.6+ web scraping micro-framework based on asyncio

Deep Web Miner Python | Spyder Crawler

Genshin Impact crawler: scrapes artifact information from the Genshin Impact interface

Crawl the information of a given keyword on Google search engine

HappyScrapper - Google News web scraper with Python

CreamySoup - a helper script for automated SourceMod plugin updates management.

Scraping followers of an instagram account

Works very well and you can ask for the type of image you want the scraper to collect.
