Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed yet, you can install it with the pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials together with the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to embed the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
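
You can also run the same check from the command line; here's a quick sketch using curl (substitute your own credentials):

curl -x "http://user:[email protected]:7777" https://ip.oxylabs.io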

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with little risk of hitting CAPTCHAs or getting blocked, which makes the scraping process fast and efficient: data from thousands of products can be extracted in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the "../.." relative prefix, keeping "/<book>/index.html".
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The first <p> is e.g. <p class="star-rating Three">; the second
            # class name is the rating.
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    # Limit the crawl to four concurrent requests at a time.
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be set on Windows with Python 3.8+.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and use the pip command:

pip install -r requirements.txt
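
For reference, the script only imports four third-party packages, so a hand-written requirements.txt would look roughly like this (package names only; any version pins in the original file are unknown to us):

aiohttp
beautifulsoup4
lxml
pandas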

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
