Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work you'll need to install aiohttp library, use Python 3.6 version or higher and Residential Proxies.
If you don't have aiohttp library, you can install it by using pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are 2 ways to authenticate proxies with aiohttp.
The first way is to authorize and pass credentials along with the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
 
async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io", 
                proxy="http://pr.oxylabs.io:7777", 
                proxy_auth=proxy_auth ,
        ) as resp:
            print(await resp.text())

The second one is by passing authentication credentials in proxy URL:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io", 
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp: 
            print(await resp.text())

In order to use your own proxies, adjust user and pass fields with your Oxylabs account credentials.

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is working correctly, it will return an IP address of a proxy that you're currently using.

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be utilized for asynchronous data extracting operations, we wrote a sample project to scrape product listing data and save the output to a CSV file. The proxy rotation allows us to send multiple requests at once risk-free – meaning that we don't need to worry about CAPTCHA or getting blocked. This makes the web scraping process extremely fast and efficient – now you can extract data from thousands of products in a matter of seconds!

li > article.product_pod"): data = { "title": product_data.select_one("h3 > a")["title"], "url": product_data.select_one("h3 > a").get("href")[5:], "product_price": product_data.select_one("p.price_color").text, "stars": product_data.select_one("p")["class"][1], } results_list.append(data) # Fill results_list by reference. print(f"Extracted data for a book: {data['title']}") async def fetch(session, sem, url, results_list): async with sem: async with session.get( url, proxy=f"http://{USER}:{PASSWORD}@{END_POINT}", ) as response: await parse_data(await response.text(), results_list) async def create_jobs(results_list): sem = asyncio.Semaphore(4) async with aiohttp.ClientSession() as session: await asyncio.gather( *[fetch(session, sem, url, results_list) for url in url_list] ) if __name__ == "__main__": results = [] start = time.perf_counter() # Different EventLoopPolicy must be loaded if you're using Windows OS. # This helps to avoid "Event Loop is closed" error. if sys.platform.startswith("win") and sys.version_info.minor >= 8: asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) try: asyncio.run(create_jobs(results)) except Exception as e: print(e) print("We broke, but there might still be some results") print( f"\nTotal of {len(results)} products from {len(url_list)} pages " f"gathered in {time.perf_counter() - start:.2f} seconds.", ) df = pd.DataFrame(results) df["url"] = df["url"].map( lambda x: "".join(["https://books.toscrape.com/catalogue", x]) ) filename = "scraped-books.csv" df.to_csv(filename, encoding="utf-8-sig", index=False) print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}") ">
import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # Different EventLoopPolicy must be loaded if you're using Windows OS.
    # This helps to avoid "Event Loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info.minor >= 8:
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script by yourself, you'll need to install some additional packages. To do that, simply download requirements.txt file and use pip command:

pip install -r requirements.txt

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you - feel free to contact Oxylabs customer support at [email protected].

Owner
Oxylabs.io
Oxylabs.io
A live streaming chatroom involving multiple modalities, such as voice, gesture, and facial expression

HiLive A live streaming chatroom involving multiple modalities, such as voice, gesture, and facial expression. Introduction We focus on demonstrating

Ryan Yen 2 Dec 02, 2021
GlokyPortScannar is a really fast tool to scan TCP ports implemented in Python.

GlokyPortScannar is a really fast tool to scan TCP ports implemented in Python. Installation: This program requires Python 3.9. Linux

gl0ky 5 Jun 25, 2022
Wifi-jammer - Continuously perform deauthentication attacks on all detectable stations

wifi-jammer Continuously perform deauthentication attacks on all detectable stat

Leonardo de Araujo 14 Nov 03, 2022
Display ip2.network active live streams.

Display ip2.network active live streams.

Daeshon Jones 0 Oct 31, 2021
openPortScanner is a port scanner made with Python!

Port Scanner made with python • Installation • Usage • Commands Installation Run this to install: $ git clone https://github.com/Miguel-Galdin0/openPo

Miguel Galdino 7 Jan 09, 2022
Wifijammer - Continuously jam all wifi clients/routers

wifijammer Continuously jam all wifi clients and access points within range. The effectiveness of this script is constrained by your wireless card. Al

Dan McInerney 3.5k Dec 31, 2022
BLE parser for passive BLE advertisements

This pypi package is parsing BLE advertisements to readable data for several sensors and can be used for device tracking, as long as the MAC address is static. The parser was originally developed as

Ernst Klamer 19 Dec 26, 2022
Quickly fetch your WiFi password and if needed, generate a QR code of your WiFi to allow phones to easily connect

wifi-password Quickly fetch your WiFi password and if needed, generate a QR code of your WiFi to allow phones to easily connect. Works on macOS and Li

Siddharth Dushantha 2.6k Jan 05, 2023
IP Pinger - This tool allows you to enter an IP and check if its currently connected to a host

IP Pinger - This tool allows you to enter an IP and check if its currently connected to a host

invasion 3 Feb 18, 2022
Minimal, self-hosted, 0-config alternative to ngrok. Caddy+OpenSSH+50 lines of Python.

If you have a webserver running on one computer (say your development laptop), and you want to expose it securely (ie HTTPS) via a public URL, SirTunnel allows you to easily do that.

Anders Pitman 423 Jan 02, 2023
Roadster - Distance to Closest Road Feature Server

Roadster: Distance to Closest Road Feature Server Milliarium Aerum, the zero of

Textualization Software Ltd. 4 May 23, 2022
Simple Python Script to Parse Apache Log, Get all Unique IPs and Urls visited by that IP

Parse_Apache_Log Simple Python Script to Parse Apache Log, Get all Unique IPs and Urls visited by that IP. It will create 3 different files. allIP.txt

Kathan Patel 2 Mar 29, 2022
AV Evasion, a Red Team Tool - Fiber, APC, PNG and UUID

AV Evasion, a Red Team Tool - Fiber, APC, PNG and UUID

9 Mar 07, 2022
A simple python script to send cute messages to my boyfriend.

Morning Messages A simple python script to send cute messages to my boyfriend. It gives him the weather and news currently. Installation git clone htt

Sabrina Medwinter 3 Oct 12, 2022
Domain To Api [ PYTHON ]

Domain To IP Usage You Open Terminal For Run The Program python ip.py Input & Output Input Your List e.g domain.txt Output ( For Save Output File )

It's Me Jafar 0 Dec 12, 2021
A simple software which can use to make a server in local network

home-nas it is simple software which can use to make a server in local network, it has a web site on it which can use by multipale system, i use nginx

R ansh joseph 1 Nov 10, 2021
基于多线程快速端口扫描脚本,支持目标批量导入、结果导出。

JWS_portscan 基于多线程快速端口扫描脚本,支持目标批量导入、结果导出。如果扫描公网资产,为了提升扫描的精准性,建议放到服务器运行。 用法 依赖安装:pip3 install -r requriement.txt 支持参数:python3 JWS_portscan.py --help 脚本

jammny 5 Apr 12, 2022
MS Iot Device Can Platform

Kavo MS IoT Platform Version: 2.0 Author: Luke Garceau Requirements Read CAN messages in real-time Convert the given variables to engineering useful v

Luke Garceau 1 Oct 13, 2021
GNS3 Graphical Network Simulator

GNS3-gui GNS3 GUI repository.

GNS3 1.7k Dec 29, 2022
Ipscanner - A simple threaded IP-Scanner written in python3 that can monitor local IP's in your network

IPScanner 🔬 A simple threaded IP-Scanner written in python3 that can monitor lo

4 Dec 12, 2022