
Python HTTPConnectionPool Failed to establish a new connection: [Errno 11004] getaddrinfo failed

I was wondering whether my requests are being blocked by the website and whether I need to set a proxy. I first tried closing the HTTP connection, but that failed. I also tried testing my code, but now there seems to be no output. Maybe if I use a proxy everything will be OK? Here is the code.

import requests
from urllib.parse import urlencode
import json
from bs4 import BeautifulSoup
import re
from html.parser import HTMLParser
from multiprocessing import Pool
from requests.exceptions import RequestException
import time


def get_page_index(offset, keyword):
    #headers = {'User-Agent':'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'}
    data = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': 20,
        'cur_tab': 1
    }
    url = 'http://www.toutiao.com/search_content/?' + urlencode(data)
    try:
        response = requests.get(url, headers={'Connection': 'close'})
        response.encoding = 'utf-8'
        if response.status_code == 200:
            return response.text
        return None
    except RequestException as e:
        print(e)

def parse_page_index(html):
    data = json.loads(html)
    if data and 'data' in data.keys():
        for item in data.get('data'):
            url = item.get('article_url')
            if url and len(url) < 100:
                yield url

def get_page_detail(url):
    try:
        response = requests.get(url, headers={'Connection': 'close'})
        response.encoding = 'utf-8'
        if response.status_code == 200:
            return response.text
        return None
    except RequestException as e:
        print(e)

def parse_page_detail(html):
    soup = BeautifulSoup(html, 'lxml')
    title = soup.select('title')[0].get_text()
    pattern = re.compile(r'articleInfo: (.*?)},', re.S)
    pattern_abstract = re.compile(r'abstract: (.*?)\.', re.S)
    res = re.search(pattern, html)
    res_abstract = re.search(pattern_abstract, html)
    if res and res_abstract:
        data = res.group(1).replace(r".replace(/<br \/>|\n|\r/ig, '')", "") + '}'
        abstract = res_abstract.group(1).replace(r"'", "")
        content = re.search(r'content: (.*?),', data).group(1)
        source = re.search(r'source: (.*?),', data).group(1)
        time_pattern = re.compile(r'time: (.*?)}', re.S)
        date = re.search(time_pattern, data).group(1)
        date_today = time.strftime('%Y-%m-%d')
        img = re.findall(r'src=&quot;(.*?)&quot', content)
        if date[1:11] == date_today and len(content) > 50 and img:
            return {
                'title': title,
                'content': content,
                'source': source,
                'date': date,
                'abstract': abstract,
                'img': img[0]
            }

def main(offset):
    flag = 1
    html = get_page_index(offset, '光伏')
    for url in parse_page_index(html):
        html = get_page_detail(url)
        if html:
            data = parse_page_detail(html)
            if data:
                html_parser = HTMLParser()
                cwl = html_parser.unescape(data.get('content'))
                data['content'] = cwl
                print(data)
                print(data.get('img'))
                flag += 1
                if flag == 5:
                    break



if __name__ == '__main__':
    pool = Pool()
    pool.map(main, [i*20 for i in range(10)])

and here is the error:

HTTPConnectionPool(host='tech.jinghua.cn', port=80): Max retries exceeded with url: /zixun/20160720/f191549.shtml (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x00000000048523C8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

By the way, when I first tested the code everything was OK! Thanks in advance!

Comments

It seems to me you're hitting the connection limit of the HTTPConnectionPool, since you start 10 workers at the same time.

Try one of the following:

  1. Increase the request timeout (seconds): requests.get('url', timeout=5)
  2. Close the response: response.close(). Instead of returning response.text directly, assign the text to a variable, close the response, and then return the variable (see the sketch below).
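
For illustration, here is a minimal sketch of get_page_detail from the question with both suggestions applied; the 5-second timeout is an arbitrary value chosen here, not something from the original code:

import requests
from requests.exceptions import RequestException

def get_page_detail(url):
    try:
        # Add a timeout so a stalled connection fails fast instead of hanging.
        response = requests.get(url, headers={'Connection': 'close'}, timeout=5)
        response.encoding = 'utf-8'
        if response.status_code == 200:
            text = response.text      # copy the body before closing
            response.close()          # release the underlying connection
            return text
        response.close()
        return None
    except RequestException as e:
        print(e)
        return None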

When I faced this issue, I had the following problems:

  • The requests Python module was unable to get information from any URL, even though I could browse the same site in a browser and download the page with wget or curl.
  • pip install was also failing, with the following error:

Failed to establish a new connection: [Errno 11004] getaddrinfo failed

A certain site had blocked me, so I tried forcebindip to make my Python modules use another network interface, and then I removed it. That probably messed up my network configuration; afterwards the requests module, and even the low-level socket module, were stuck and unable to fetch any URL.

So I followed the network configuration reset described at the link below, and now I am good.

network configuration reset
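
Since [Errno 11004] is raised by getaddrinfo, i.e. by DNS name resolution, a quick way to check whether resolution itself is broken, independently of requests, is a snippet like the one below; the host name is just the one from the error message in the question and can be swapped for whatever host you are trying to reach:

import socket

host = 'tech.jinghua.cn'  # example host, taken from the error in the question
try:
    # socket.getaddrinfo is exactly the call that fails with [Errno 11004];
    # if this raises, the problem is DNS/network configuration, not requests.
    print(socket.getaddrinfo(host, 80))
except socket.gaierror as e:
    print('DNS lookup failed:', e)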

In case it helps someone else, I faced this same error message:

Client-Request-ID=long-string Retry policy did not allow for a retry: , HTTP status code=Unknown, Exception=HTTPSConnectionPool(host='table.table.core.windows.net', port=443): Max retries exceeded with url: /service(PartitionKey='requests',RowKey='9999') (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001D920ADA970>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')).

...when trying to retrieve a record from Azure Table Storage using

table_service.get_entity(table_name, partition_key, row_key).

My issue:

  • I had the table_name incorrectly defined.
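
For anyone hitting the same thing, here is a minimal sketch of how the table name can be sanity-checked before calling get_entity. It assumes the legacy azure-cosmosdb-table TableService client that the call above appears to use; the connection string and names are placeholders:

from azure.cosmosdb.table.tableservice import TableService

# Placeholders only; substitute your own connection string and names.
table_service = TableService(connection_string='<connection-string>')
table_name, partition_key, row_key = '<table-name>', '<partition-key>', '<row-key>'

# Listing the tables that actually exist in the account makes a typo in
# table_name obvious before get_entity ever issues the failing request.
existing = [t.name for t in table_service.list_tables()]
print(existing)

if table_name in existing:
    print(table_service.get_entity(table_name, partition_key, row_key))
else:
    print(table_name, 'was not found in this storage account')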
