Often, when scraping web data, accessing APIs, or doing any other automated web activity through Python, you will want to use a web driver and/or a proxy as a layer of protection for your script and personal IP address. Combining different web drivers and proxies can help keep your process and personal IP from being blocked by whatever website or app you are trying to access. Luckily, there are easy-to-use resources that make all of this possible. Here I will outline some methods for doing so.
This post assumes that you already have these installed:
- Python 2.6, Python 2.7, or Python 3.3+
- pip
The basic process will work like this:
- We will have a Python script that will use a free* proxy API service (GimmeProxy) to request an open IP to use as a proxy (*free within usage limits).
- We will store the IP in a separate text file.
- Then whenever we have any other Python script that could use the protection of a proxy, we can reference that text file for the useable IP. If the proxy IP ever gets blocked or is no longer available, then we can rerun our proxy script to request a new proxy.
- Furthermore, if our process needs to automate any web browsing activity (such as scraping web data), we can use Selenium (a Python package) with web drivers to imitate a browser (such as Chrome, Firefox, PhantomJS, etc…).
First you’ll need to pip install requests and selenium:
sudo pip install requests selenium
Next, you’ll need at least one of the supported web drivers:
I personally prefer PhantomJS. The examples that follow will use PhantomJS.
Now, we can first create our proxy requesting script:
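Here is a minimal sketch of what such a script could look like. It assumes the GimmeProxy endpoint is `https://gimmeproxy.com/api/getProxy` and that its JSON response carries `ip` and `port` fields; check the service's documentation before relying on either. The file name `proxy.txt` and the function names are placeholders of my own choosing.

```python
# Assumed endpoint and output file name; adjust both to your setup.
API_URL = "https://gimmeproxy.com/api/getProxy"
PROXY_FILE = "proxy.txt"


def parse_proxy(payload):
    """Return "ip:port" from a decoded GimmeProxy-style JSON response."""
    # Assumption: the response contains "ip" and "port" fields.
    return "{0}:{1}".format(payload["ip"], payload["port"])


def save_proxy(address, path=PROXY_FILE):
    """Write the proxy address to a text file for other scripts to read."""
    with open(path, "w") as handle:
        handle.write(address)


def request_new_proxy(path=PROXY_FILE):
    """Ask the API for a proxy, store it in the text file, and return it."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()  # fail loudly if the usage limit was hit
    address = parse_proxy(response.json())
    save_proxy(address, path)
    return address
```

To make the file do its job when run directly, end it with a call to `request_new_proxy()`.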
Save that script and call it something like “proxy_request.py”, for example. This script will grab a proxy and port for us, and save them to a text file that can be read by any other script that needs a proxy. Now you can include something like this in any Python scripts that need a proxy:
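As a rough sketch, reading the stored proxy and handing it to requests might look like the following. It assumes the text file is named `proxy.txt`, holds a single `ip:port` string, and that the proxy speaks plain HTTP; the helper names are hypothetical.

```python
PROXY_FILE = "proxy.txt"  # assumed file name, written by the proxy request script


def load_proxy(path=PROXY_FILE):
    """Read the stored "ip:port" string from the text file."""
    with open(path) as handle:
        return handle.read().strip()


def build_proxies(address):
    """Build the mapping that requests accepts in its `proxies` argument."""
    # Assumption: an HTTP proxy, used for both http and https traffic.
    proxy_url = "http://{0}".format(address)
    return {"http": proxy_url, "https": proxy_url}


def fetch_through_proxy(url, path=PROXY_FILE):
    """GET a URL through the stored proxy and return the response."""
    import requests  # imported here so the helpers above work without it

    proxies = build_proxies(load_proxy(path))
    return requests.get(url, proxies=proxies, timeout=10)
```

If `fetch_through_proxy` raises a connection error or times out, the proxy has probably gone stale, which is the cue to rerun the proxy request script.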
Finally, if you want to automate any type of web browsing activity (such as data scraping), then you can combine using a web driver with Selenium and a proxy:
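A sketch of that combination with PhantomJS is below. It relies on PhantomJS's `--proxy` and `--proxy-type` command-line options and on `webdriver.PhantomJS`, which only exists in older Selenium releases (it was removed in Selenium 4), so treat this as an illustration for a setup that still has both. The function names are placeholders.

```python
def phantomjs_proxy_args(address, proxy_type="http"):
    """Build the PhantomJS command-line arguments for an "ip:port" proxy."""
    return [
        "--proxy={0}".format(address),
        "--proxy-type={0}".format(proxy_type),
    ]


def make_driver(address):
    """Start a PhantomJS driver that routes its traffic through the proxy."""
    # Assumption: a Selenium release that still ships the PhantomJS driver,
    # and a phantomjs executable on the PATH.
    from selenium import webdriver

    return webdriver.PhantomJS(service_args=phantomjs_proxy_args(address))
```

With a driver in hand, `driver.get("http://example.com")` opens a page and `driver.execute_script("return document.title;")` runs JavaScript in it. The same idea carries over to Chrome or Firefox, although those drivers take their proxy settings through their own options objects rather than `service_args`.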
As you’ll see, you can do things like open URLs and even execute scripts used by the page, fully automated, all through a proxy, so your personal IP address is far less likely to get blocked.
NOTE: When you’re finished using your driver in your script, you should quit your driver with this command:
driver.quit()
Otherwise your driver will be left running indefinitely.
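One way to make sure the quit call is never skipped, even when the script crashes halfway through, is to wrap the driver in a small context manager. This is only a sketch; `quitting` is a hypothetical helper name, not something Selenium provides.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and call driver.quit() on exit, even after an error."""
    try:
        yield driver
    finally:
        driver.quit()
```

Used as `with quitting(webdriver.PhantomJS()) as driver:`, everything inside the `with` block can use the driver, and the browser process is shut down as soon as the block ends.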
As always, thank you for reading and stay tuned for more future ramblings!
-JPR