Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

This guide shows how to integrate Oxylabs' Residential Proxies with aiohttp, an asynchronous HTTP client/server library for Python and asyncio.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to pass the authentication credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check that a proxy is working, send a request to https://ip.oxylabs.io through it. If everything is set up correctly, the response will contain the IP address of the proxy you're currently using.
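As a quick sanity check, here is a minimal sketch (reusing the placeholder credentials from above; check_proxy is our own helper name) that requests ip.oxylabs.io both directly and through the proxy. If the proxy is active, the two printed addresses should differ:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # First, request the IP-echo endpoint directly...
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Your IP: ", await resp.text())
        # ...then request it again through the proxy.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", await resp.text())

asyncio.run(check_proxy())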

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Because the proxies rotate, we can send many requests at once with little risk of hitting CAPTCHAs or getting blocked. This makes the scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the leading "../.." from the relative link.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The rating is the second CSS class, e.g. "star-rating Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    # Limit the crawl to four requests in flight at any one time.
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be set on Windows (Python 3.8+).
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and run pip:

pip install -r requirements.txt
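The actual requirements.txt ships with the project; judging by the imports above, a minimal equivalent would list at least the following packages (this list is an assumption, not the project's actual file):

aiohttp
beautifulsoup4
lxml
pandas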

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
