How to Run Web Drivers with Proxies in Python

Often, when scraping web data, accessing APIs, or doing any other automated web activity through Python, you will want to use a web driver and/or a proxy as a layer of protection for your script and personal IP address. Using combinations of different web drivers and proxies can help ensure your process and personal IP are not blocked by whatever website or app you are trying to access. Luckily, there are easy-to-use resources that make all of this possible. Here I will outline some methods for doing so.

This post assumes that you already have these installed:

  • Python 2.6, Python 2.7, or Python 3.3+
  • pip

The basic process will work like this:

  1. We will have a Python script that will use a free* proxy API service (GimmeProxy) to request an open IP to use as a proxy (*free within usage limits).
  2. We will store the IP in a separate text file.
  3. Then, whenever we have any other Python script that could use the protection of a proxy, we can read the usable IP from that text file. If the proxy IP ever gets blocked or is no longer available, we can rerun our proxy script to request a new proxy.
  4. Furthermore, if our process needs to automate any web browsing activity (such as scraping web data), we can use Selenium (a Python package) with web drivers to imitate a browser (such as Chrome, Firefox, PhantomJS, etc…).
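
Step 3 can be sketched as a small retry loop. This is only an illustration: the three callables (`fetch`, `read_proxy`, `refresh_proxy`) are hypothetical names standing in for your own request code, the text-file lookup, and the proxy-requesting script.

```python
def fetch_with_proxy_refresh(fetch, read_proxy, refresh_proxy, attempts=3):
    """Call fetch(proxy); if it fails, request a new proxy and try again.

    fetch, read_proxy and refresh_proxy are hypothetical callables supplied
    by the caller; nothing here talks to the network by itself.
    """
    last_error = None
    for _ in range(attempts):
        proxy = read_proxy()  # e.g. read "ip:port" from the text file
        try:
            return fetch(proxy)
        except Exception as error:  # broad on purpose: any failure triggers a refresh
            last_error = error
            refresh_proxy()  # e.g. rerun the proxy-requesting script
    raise RuntimeError(
        "no working proxy after {} attempts: {}".format(attempts, last_error)
    )
```

Passing the three steps in as functions keeps the retry logic independent of where the proxy comes from, so the same loop works whichever proxy service or file location you end up using.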

First you’ll need to pip install requests and selenium:

sudo pip install requests selenium

Next, you’ll need at least one of the supported web drivers:

I personally prefer PhantomJS. The examples that follow will use PhantomJS.
Now, we can first create our proxy requesting script:
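
A minimal sketch of such a script, assuming the GimmeProxy endpoint http://gimmeproxy.com/api/getProxy returns JSON with `ip` and `port` fields. The file name `proxy.txt` and the function names are placeholders, not anything prescribed by GimmeProxy, and `requests` is imported inside `request_proxy` only so that the helper functions can be used without it.

```python
API_URL = "http://gimmeproxy.com/api/getProxy"  # assumed endpoint
PROXY_FILE = "proxy.txt"  # hypothetical file name


def format_proxy(payload):
    """Build an 'ip:port' string from the decoded JSON payload."""
    return "{}:{}".format(payload["ip"], payload["port"])


def save_proxy(proxy, path=PROXY_FILE):
    """Write the 'ip:port' string to the shared text file."""
    with open(path, "w") as handle:
        handle.write(proxy)


def request_proxy(url=API_URL, path=PROXY_FILE):
    """Ask the API for a proxy, store it in the text file, and return it."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors or rate limits
    proxy = format_proxy(response.json())
    save_proxy(proxy, path)
    return proxy

# To refresh the stored proxy (performs a live HTTP request):
#     request_proxy()
```
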
Save that script and call it something like “proxy_request.py”, for example. This script will grab a proxy and port for us, and save them to a text file that can be read by any other script that needs a proxy. Now you can include something like this in any Python scripts that need a proxy:
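
One way this could look, assuming the text file holds a single `ip:port` string and that the proxy speaks plain HTTP. The names `read_proxy` and `as_requests_proxies` are hypothetical; the `proxies` dictionary is the format the requests library accepts.

```python
def read_proxy(path="proxy.txt"):
    """Return the 'ip:port' string stored by the proxy-requesting script."""
    with open(path) as handle:
        proxy = handle.read().strip()
    if ":" not in proxy:
        # The file is empty or holds a placeholder, so no usable proxy is stored.
        raise ValueError("no usable proxy in {}".format(path))
    return proxy


def as_requests_proxies(proxy):
    """Build the mapping that requests expects for its proxies argument."""
    address = "http://" + proxy  # assumes an HTTP proxy
    return {"http": address, "https": address}

# Usage (performs a live HTTP request through the proxy):
#     import requests
#     proxies = as_requests_proxies(read_proxy())
#     print(requests.get("http://icanhazip.com/", proxies=proxies, timeout=10).text)
```
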
Finally, if you want to automate any type of web browsing activity (such as data scraping), then you can combine using a web driver with Selenium and a proxy:
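
A rough sketch of that combination, assuming Selenium 3.x or earlier (the PhantomJS driver was removed in Selenium 4) and a `phantomjs` binary on your PATH. `--proxy` and `--proxy-type` are PhantomJS command-line options; the function names and the default proxy type of `http` are assumptions made for illustration.

```python
def phantomjs_proxy_args(proxy, proxy_type="http"):
    """Build PhantomJS command-line arguments for an 'ip:port' proxy."""
    return ["--proxy=" + proxy, "--proxy-type=" + proxy_type]


def open_browser(proxy, proxy_type="http"):
    """Start a PhantomJS driver that routes its traffic through the proxy."""
    from selenium import webdriver  # third-party: pip install "selenium<4"

    return webdriver.PhantomJS(service_args=phantomjs_proxy_args(proxy, proxy_type))

# Usage (starts a real browser process):
#     driver = open_browser("1.2.3.4:8080")   # placeholder address
#     driver.get("http://icanhazip.com/")     # should report the proxy's IP
#     print(driver.page_source)
#     print(driver.execute_script("return document.title"))
```
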
As you’ll see, you can do things like open URLs and even execute scripts used by the page, fully automated, all through the protection of a proxy, so you never have to worry about your personal IP address getting blocked.
NOTE: When you’re finished using your driver in your script, you should quit your driver with this command:
driver.quit()
Otherwise your driver will be left running indefinitely.
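
One way to make sure the driver is always shut down, even when the page load raises an error, is a `try`/`finally` block. In this sketch `make_driver` is a hypothetical zero-argument function that returns a Selenium driver.

```python
def fetch_page_source(make_driver, url):
    """Load url with a fresh driver and always quit the driver afterwards."""
    driver = make_driver()  # hypothetical factory returning a Selenium driver
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # runs on success and on error alike
```
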
As always, thank you for reading and stay tuned for more future ramblings!
-JPR

9 thoughts on “How to Run Web Drivers with Proxies in Python”

  1. June says:

    # -*- coding: utf-8 -*-
    import pandas
    import datetime
    from datetime import timedelta
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    from pyvirtualdisplay import Display
    from selenium.webdriver.common.proxy import Proxy, ProxyType
    from Utility import *

    dcap = dict(DesiredCapabilities.PHANTOMJS)
    proxy = Proxy({'proxyType': ProxyType.MANUAL, 'httpProxy': '175.155.24.55:808'})
    proxy.add_to_capabilities(dcap)
    driver = webdriver.PhantomJS(desired_capabilities=dcap, executable_path=r"D:\phantomjs\bin\phantomjs.exe")
    wait = WebDriverWait(driver, 5)
    driver.get("http://icanhazip.com/")
    html = driver.page_source
    print(html)
    cap_dict = driver.desired_capabilities
    for key in cap_dict:
        print('%s: %s' % (key, cap_dict[key]))
    print(driver.current_url)
    for cookie in driver.get_cookies():
        print(cookie)
    driver.quit()

    ###########################################
    How come I get a blank page when using the proxy, but the full page when not using it?
    Did you not have that problem?

  2. Etienne says:

    Nice tutorial! Thank you very much! I just have one question: does the proxy_ip in the last screenshots contain IP:PORT or just the IP?

      • Etienne says:

        OK, thank you. I have a problem with the code. I tried to build this function:

        def open_browser():
            headers = {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'Accept-Language': 'en-US,en;q=0.8', 'Cache-Control': 'max-age=0', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.517 Safari/537.36'}
            for key, value in headers.items():
                webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.{}'.format(key)] = value
            service_args = ['--proxy=' + proxy_ip + ':9999', '--proxy-type=socks5']
            driver = webdriver.PhantomJS(service_args=service_args)
            driver.implicitly_wait(10)
            driver.set_page_load_timeout(300)
            return driver

        It runs and gives me a webdriver, but when I try to scrape a website like 'https://whatismyipaddress.com/' with this driver, it shows my real IP and not the proxy's.

        (I used proxy_ip in this format: xxxx.xxxx.xxxx.xxxx:xxxx)

        Could you please help me figure out what’s wrong?

      • John Patrick Roach says:

        Looks like it could be an issue with the proxy-type definition. I’ll actually need to build in functionality to pull the proxy type when the new proxy is requested and use that. But for now, try testing using http instead of socks5 for the proxy-type and see if that fixes it. I’ll get back to you once I have my updates too.

  3. Vladimir Strycek says:

    You can shorten the proxy request a bit with the JSON reader built into requests 🙂 No need to split and replace stuff.

    import requests
    try:
        r = requests.get("http://gimmeproxy.com/api/getProxy?country=US")
        t = r.json()
        string_proxy = t['ip'] + ":" + str(t['port'])
        print('PROXY: ' + string_proxy)

        f = open(r"C:\proxy.txt", "w", encoding="utf-8")
        f.write(string_proxy)
        f.close()
    except Exception:
        print("Error requesting new proxy")
        f = open(r"C:\proxy.txt", "w", encoding="utf-8")
        f.write('0')
        f.close()
