
How do I specify URL resolution in Python's requests library in a similar fashion to curl's --resolve flag?

Question

I am writing some Python client code and, due to some environmental constraints, I want to specify a URL and also control how its hostname is resolved. I can accomplish this with curl by using the --resolve flag. Is there a way to do something similar with Python's requests library?

Ideally this would work in Python 2.7 but I can make a 3.x solution work as well.

Answer1

After doing a bit of digging, I (unsurprisingly) found that Requests resolves hostnames by asking Python to do it (which in turn asks your operating system to do it). First I found some sample code to hijack DNS resolution (in "Tell urllib2 to use custom DNS"), and then I figured out a few more details about how Python resolves hostnames from the socket documentation. Then it was just a matter of wiring everything together:

import socket
import requests

def is_ipv4(s):
    # Feel free to improve this: https://stackoverflow.com/questions/11827961/checking-for-ip-addresses
    return ':' not in s

dns_cache = {}

def add_custom_dns(domain, port, ip):
    key = (domain, port)
    # Strange parameters explained at:
    # https://docs.python.org/2/library/socket.html#socket.getaddrinfo
    # Values mirror the output of `socket.getaddrinfo(...)` for a TCP lookup.
    # socket.AF_INET / socket.AF_INET6 are used directly because the
    # socket.AddressFamily enum does not exist on Python 2.
    if is_ipv4(ip):
        value = (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port))
    else: # ipv6
        value = (socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port, 0, 0))
    dns_cache[key] = [value]

# Inspired by: https://stackoverflow.com/a/15065711/868533
prv_getaddrinfo = socket.getaddrinfo
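def new_getaddrinfo(*args, **kwargs):
    # Uncomment to see what calls to `getaddrinfo` look like.
    # print(args, kwargs)
    try:
        return dns_cache[args[:2]] # hostname and port
    except KeyError:
        # Not overridden: fall back to the real resolver, keyword arguments included.
        return prv_getaddrinfo(*args, **kwargs)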

socket.getaddrinfo = new_getaddrinfo

# Redirect example.com to the IP of test.domain.com (completely unrelated).
add_custom_dns('example.com', 80, '66.96.162.92')
res = requests.get('http://example.com')
print(res.text) # Prints out the HTML of test.domain.com.

Some caveats I ran into while writing this:

  • This works poorly for HTTPS. The code itself runs fine (just use https:// and 443 instead of http:// and 80). However, SSL certificates are tied to domain names, and Requests will still validate the certificate against the original domain you tried connecting to, so the server at the overriding IP must present a certificate that is valid for that name.
  • getaddrinfo returns slightly different info for IPv4 and IPv6 addresses. My implementation for is_ipv4 feels hacky to me and I strongly recommend a better version if you're using this in a real application.
  • The code has been tested on Python 3 but I see no reason why it wouldn't work as-is on Python 2.
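
As a follow-up to the second caveat, here is a minimal sketch of a stricter replacement for is_ipv4, assuming Python 3 where the ipaddress module is in the standard library. The names address_family and make_addrinfo are hypothetical helpers, not part of the answer above; they only illustrate how the address family and the getaddrinfo-style tuple could be derived from the IP literal itself.

```python
import ipaddress
import socket


def address_family(ip):
    """Return socket.AF_INET or socket.AF_INET6 for an IP literal.

    Raises ValueError if `ip` is not a valid IPv4 or IPv6 address.
    """
    version = ipaddress.ip_address(ip).version
    return socket.AF_INET if version == 4 else socket.AF_INET6


def make_addrinfo(ip, port):
    """Build one getaddrinfo-style 5-tuple for a TCP connection to ip:port."""
    family = address_family(ip)
    if family == socket.AF_INET:
        sockaddr = (ip, port)
    else:
        # IPv6 socket addresses also carry flowinfo and scope_id.
        sockaddr = (ip, port, 0, 0)
    return (family, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', sockaddr)
```

With something like this, add_custom_dns could store [make_addrinfo(ip, port)] in dns_cache, and a typo such as passing a hostname instead of an IP would fail early with a ValueError rather than being treated as IPv4.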

Answer2

I have been trying to figure out a solution for a while now and finally stumbled on this post. The solution provided by @supersam654 did not work for me right away (I was using HTTPS and Python 3.8), but a few days of sleeping on it got me this solution, which should work regardless of version (I have not tested many versions, but naively hope that to be the case).

It should also work for IPv6, though I have not tested that either.

The key to the solution was to use the default getaddrinfo() for all calls (no assumptions on its output) - simply replace the hostname with the ip address to override it with! Hence my grandiose statement on how well it works ;-)

import socket

dns_cache = {}
# Capture a dict of hostname and their IPs to override with
def override_dns(domain, ip):
    dns_cache[domain] = ip


prv_getaddrinfo = socket.getaddrinfo
# Override default socket.getaddrinfo() and pass ip instead of host
# if override is detected
def new_getaddrinfo(*args, **kwargs):
    if args and args[0] in dns_cache:
        print("Forcing FQDN: {} to IP: {}".format(args[0], dns_cache[args[0]]))
        return prv_getaddrinfo(dns_cache[args[0]], *args[1:], **kwargs)
    else:
        return prv_getaddrinfo(*args, **kwargs)


socket.getaddrinfo = new_getaddrinfo

To use the above logic - simply call the function like so before making requests (you can override with IP Address or another FQDN!):

override_dns('www.example.com', '192.168.1.100')

I believe this is a better solution than the ForcedIPHTTPSAdapter that I had used earlier.
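
One possible refinement, sketched here as an assumption rather than as part of the answer above: because replacing socket.getaddrinfo affects the whole process, the override can be wrapped in a context manager so that the original resolver is restored afterwards. The name resolve_override is hypothetical.

```python
import contextlib
import socket


@contextlib.contextmanager
def resolve_override(hostname, target):
    """Temporarily resolve `hostname` as if it were `target` (an IP or FQDN)."""
    original = socket.getaddrinfo

    def patched(host, *args, **kwargs):
        # Swap the hostname only; port, family, etc. pass through untouched.
        if host == hostname:
            host = target
        return original(host, *args, **kwargs)

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # Always restore the real resolver, even if the request raised.
        socket.getaddrinfo = original


# Example usage (requires the requests package):
#
#     with resolve_override('www.example.com', '192.168.1.100'):
#         res = requests.get('https://www.example.com')
```

Note that the patch is still global while the with block is active, so other threads resolving the same hostname during that window would see the override too.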

Answer3

Late answer, but there's a module called forcediphttpsadapter that does exactly this:

Install:

pip3 install forcediphttpsadapter

Usage:

import requests
from forcediphttpsadapter.adapters import ForcedIPHTTPSAdapter

url = 'https://domain.tld/path'
session = requests.Session()
session.mount(url, ForcedIPHTTPSAdapter(dest_ip='x.x.x.x')) # type the desired ip
r = session.get(url, verify=False) # note: verify=False disables TLS certificate verification
print(r.text)
...

Sources:

  • Forcing Python Requests to connect to a specific IP address
  • Github repo: Roadmaster/forcediphttpsadapter

Answer4

If Kerberos authentication is involved, the easiest route looks like this package: https://github.com/requests/requests-kerberos

Use the routable name in the URL and set the hostname_override value to the name that Kerberos expects.
