Often, when scraping web data, accessing APIs, or doing any other automated web activity through Python, you will want to use a web driver and/or a proxy as a layer of protection for your script and personal IP address. Using combinations of different web drivers and proxies can help ensure your process and personal IP are not blocked by whatever website or app you are trying to access. Luckily, there are easy-to-use resources that make all of this possible. Here I will outline some methods for doing so.
This post assumes that you already have these installed:
- Python 2.6, Python 2.7, or Python 3.3+
The basic process will work like this:
- We will have a Python script that will use a free* proxy API service (GimmeProxy) to request an open IP to use as a proxy (*free within usage limits).
- We will store the IP in a separate text file.
- Then, whenever we have any other Python script that could use the protection of a proxy, we can read the usable IP from that text file. If the proxy IP ever gets blocked or is no longer available, we can rerun our proxy script to request a new one.
- Furthermore, if our process needs to automate any web browsing activity (such as scraping web data), we can use Selenium (a Python package) with web drivers to imitate a browser (such as Chrome, Firefox, or PhantomJS).
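To make the first three steps concrete, here is a minimal sketch of how the pieces could fit together once requests is installed. It rests on a few assumptions of mine: the GimmeProxy endpoint URL and the "ipPort" field in its JSON response are how I understand the service to behave, so check them against its documentation, and the file name proxy.txt and all of the function names are hypothetical placeholders.

```python
# Assumed endpoint and response shape: GimmeProxy is taken to return JSON
# with an "ipPort" field such as "203.0.113.7:8080". Verify before relying on it.
GIMMEPROXY_URL = "https://gimmeproxy.com/api/getProxy"
PROXY_FILE = "proxy.txt"  # hypothetical file name


def fetch_proxy(url=GIMMEPROXY_URL, timeout=10):
    """Ask the proxy API for one open proxy and return it as 'ip:port'."""
    # Imported here so the file helpers below work even without requests.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly, e.g. when over the usage limit
    return response.json()["ipPort"]


def save_proxy(ip_port, path=PROXY_FILE):
    """Store the proxy address in a text file for other scripts to read."""
    with open(path, "w") as handle:
        handle.write(ip_port.strip() + "\n")


def load_proxy(path=PROXY_FILE):
    """Read the stored proxy address back as 'ip:port'."""
    with open(path) as handle:
        return handle.read().strip()


def as_requests_proxies(ip_port):
    """Build the proxies mapping that requests accepts (assumes an HTTP proxy)."""
    address = "http://" + ip_port
    return {"http": address, "https": address}
```

The proxy script would then call `save_proxy(fetch_proxy())`, and any other script could pass `proxies=as_requests_proxies(load_proxy())` to `requests.get`. Keeping the fetch step separate from the file helpers means a blocked proxy only requires rerunning the first script.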
First you’ll need to pip install requests and selenium:
sudo pip install requests selenium
Next, you’ll need at least one of the supported web drivers: