I'm trying to get the content of App Store > Business:

import requests
from lxml import html

page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)
flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)

When I try the range with (0, 2) it works, but when I increase the range into the 100s it shows this error:

Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

Best Answer


Just use requests features:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

session.get(url)

This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. backoff_factor applies an increasing delay between attempts, which helps avoid failing again when the server enforces a request quota.

Take a look at urllib3.util.retry.Retry, it has many options to simplify retries.
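For instance, a slightly fuller sketch of the same pattern (the total, status codes, and backoff value below are illustrative choices, not values taken from the answer above):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 5 times in total, with exponential backoff, and also
# retry on these HTTP status codes (not only on connection errors).
retry = Retry(
    total=5,
    backoff_factor=0.5,
    status_forcelist=[429, 500, 502, 503, 504],
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

# session.get(url) would now retry automatically on those failures.
print(adapter.max_retries.total)  # → 5
```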


What happened here is that the iTunes server refused your connection (you're sending too many requests from the same IP address in a short period of time).

Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8

The error trace is misleading; it should read something like "No connection could be made because the target machine actively refused it".

There is an open issue about this in the python-requests library on GitHub; check it out here.

To overcome this issue (not so much an issue as a misleading debug trace), you should catch connection-related exceptions like so:

try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    print("Connection refused")

Another way to overcome this problem is to leave enough of a time gap between requests to the server. This can be achieved with the sleep(timeinsec) function in Python (don't forget to import sleep):

from time import sleep

All in all, requests is an awesome Python lib; hope that solves your problem.

Just do this:

Paste the following code in place of page = requests.get(url):

import time

page = ''
while page == '':
    try:
        page = requests.get(url)
        break
    except:
        print("Connection refused by the server..")
        print("Let me sleep for 5 seconds")
        print("ZZzzzz...")
        time.sleep(5)
        print("Was a nice sleep, now let me continue...")
        continue

You're welcome :)

I got a similar problem, but the following code worked for me.

url = <some REST url>
page = requests.get(url, verify=False)

"verify=False" disables SSL verification. Try and catch can be added as usual.

pip install pyopenssl seemed to solve it for me.

https://github.com/requests/requests/issues/4246

Specifying the proxy in a corporate environment solved it for me.

page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})

The full error is:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

It is always good to implement exception handling. Not only does it help avoid an unexpected exit of the script, it can also help log errors and notifications. When using Python requests I prefer to catch exceptions like this:

try:
    res = requests.get(adress, timeout=30)
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
    print(str(e))
    renewIPadress()
    continue
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
    renewIPadress()
    continue
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
    renewIPadress()
    continue
except KeyboardInterrupt:
    print("Someone closed the program")

Here renewIPadress() is a user-defined function which can change the IP address if it gets blocked. You can go without this function.

Adding my own experience for those who are experiencing this in the future. My specific error was

Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'

It turned out that this was actually because I had reached the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
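For anyone hitting the same thing: on Unix-like systems you can inspect (and, within the hard limit, raise) the open-file limit from Python with the standard resource module; getrlimit/setrlimit are real stdlib calls, but the limit values themselves vary by system:

```python
import resource

# Query the current soft and hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# Raise the soft limit up to the hard limit; no special privileges are
# needed as long as we stay at or below the hard limit.
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

If you need more than the hard limit allows, that has to be changed at the OS level (e.g. ulimit / limits.conf), not from Python.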

When I was writing a Selenium browser test script, I encountered this error when calling driver.quit() before a JS API call. Remember that quitting the webdriver is the last thing to do!

I wasn't able to make it work on Windows even after installing pyopenssl and trying various Python versions (while it worked fine on Mac), so I switched to urllib and it works on Python 3.6 (from python.org) and 3.7 (Anaconda):

import urllib
from urllib.request import urlopen

html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()
print(contents)

Just import time and add:

time.sleep(6)

somewhere in the for loop, to avoid sending too many requests to the server in a short time. The number 6 means 6 seconds. Keep testing numbers starting from 1 until you reach the minimum number of seconds that avoids the problem.
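The same idea can be packaged as a small helper that enforces a minimum interval between consecutive requests; throttle and MIN_INTERVAL are made-up names for this sketch:

```python
import time

MIN_INTERVAL = 6.0   # seconds between requests; tune downward as suggested
_last_call = 0.0     # monotonic timestamp of the previous request


def throttle():
    """Sleep just long enough so calls are at least MIN_INTERVAL apart."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()

# Usage inside the loop:
# for url in urls:
#     throttle()
#     page = requests.get(url)
```

This keeps the pacing logic in one place instead of scattering sleep() calls through the loop.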

It could also be a network configuration issue, in which case you need to re-check your network configuration.

For Ubuntu:

sudo vim /etc/network/interfaces

Add 8.8.8.8 as a dns-nameservers entry and save the file.

Restart your network: /etc/init.d/networking restart

Now try again.

In my case, I am deploying some Docker containers inside a Python script and then calling one of the deployed services. The error was fixed when I added some delay before calling the service; I think it needs time to get ready to accept connections.

from time import sleep

# deploy containers
# get URL of the container
sleep(5)
response = requests.get(url, verify=False)
print(response.json())
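A fixed sleep works, but polling until the service actually answers is more robust. A generic sketch, where wait_until_ready and the probe argument are made-up names for illustration:

```python
import time


def wait_until_ready(probe, timeout=30.0, interval=0.5):
    """Call probe() until it returns True or timeout seconds elapse.

    probe should return True once the service is ready, and return
    False (or raise) while it is still starting up.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except Exception:
            pass  # service not accepting connections yet
        time.sleep(interval)
    return False

# Example probe for an HTTP service:
# ready = wait_until_ready(lambda: requests.get(url, verify=False).ok)
```

This way the script waits exactly as long as the container needs, instead of a guessed number of seconds.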

Adding my own experience:

r = requests.get(download_url)

when I tried to download a file specified in the url.

The error was

HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

I corrected it by adding verify = False in the function as follows :

r = requests.get(download_url + filename, verify=False)
open(filename, 'wb').write(r.content)

Check your network connection. I had this and the VM did not have a proper network connection.

I had the same error when I ran the route in the browser, but it worked fine in Postman. The issue in my case was an extra / after the route, before the query string.

127.0.0.1:5000/api/v1/search/?location=Madina raised the error; removing the / after search worked for me.
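One way to avoid hand-assembling the path and query string is to let requests build the URL itself. requests.Request(...).prepare() is part of the real requests API; the host and endpoint here are just the example from above:

```python
import requests

# Let requests append the query string; no trailing-slash surprises.
req = requests.Request(
    'GET',
    'http://127.0.0.1:5000/api/v1/search',
    params={'location': 'Madina'},
).prepare()

print(req.url)  # → http://127.0.0.1:5000/api/v1/search?location=Madina
```

In a normal call you would pass params=... straight to requests.get(); prepare() is only used here to show the final URL without sending anything.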

This happens when you send too many requests to the public IP address of https://itunes.apple.com. As you can see, it is caused by something that blocks or disallows access to the public IP address mapped to https://itunes.apple.com. One better solution is the following Python script, which resolves the public IP address of any domain and writes that mapping to the /etc/hosts file.

import re
import socket
import subprocess
from typing import Tuple

ENDPOINT = 'https://anydomainname.example.com/'
ENDPOINT = 'https://itunes.apple.com/'


def get_public_ip() -> Tuple[str, str, str]:
    """Command to get public_ip address of host machine and endpoint domain

    Returns
    -------
    my_public_ip : str
        Ip address string of host machine.
    end_point_ip_address : str
        Ip address of endpoint domain host.
    end_point_domain : str
        domain name of endpoint.
    """
    # bash_command = """host myip.opendns.com resolver1.opendns.com | \
    # grep "myip.opendns.com has" | awk '{print $4}'"""
    # bash_command = """curl ifconfig.co"""
    # bash_command = """curl ifconfig.me"""
    bash_command = """curl icanhazip.com"""
    my_public_ip = subprocess.getoutput(bash_command)
    my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
    end_point_domain = (
        ENDPOINT.replace("https://", "").replace("http://", "").replace("/", "")
    )
    end_point_ip_address = socket.gethostbyname(end_point_domain)
    return my_public_ip, end_point_ip_address, end_point_domain


def set_etc_host(ip_address: str, domain: str) -> str:
    """A function to write mapping of ip_address and domain name in /etc/hosts.

    Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build

    Parameters
    ----------
    ip_address : str
        IP address of the domain.
    domain : str
        domain name of endpoint.

    Returns
    -------
    str
        Message to identify success or failure of the operation.
    """
    bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
    output = subprocess.getoutput(bash_command)
    return output


if __name__ == "__main__":
    my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
    output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
    print("My public IP address:", my_public_ip)
    print("ENDPOINT public IP address:", end_point_ip_address)
    print("ENDPOINT Domain Name:", end_point_domain)
    print("Command output:", output)

You can call the above script before running your desired function :)

My situation is rather special. I tried the answers above, none of them worked. I suddenly thought whether it has something to do with my Internet proxy? You know, I'm in mainland China, and I can't access sites like google without an internet proxy. Then I turned off my Internet proxy and the problem was solved.

Add headers for this request.

headers = {
    'Referer': 'https://itunes.apple.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
}
requests.get(ap, headers=headers)