Каков наилучший способ получить код ответа HTTP из URL?

Мне нужен быстрый способ получить код ответа HTTP из URL (т.е. 200, 404 и т.д.). Я не уверен, какую библиотеку использовать.

Ответ 1

Обновление с использованием замечательной библиотеки запросов. Обратите внимание, что мы используем запрос HEAD, который должен выполняться быстрее, чем полный запрос GET или POST.

import requests
try:
    r = requests.head("https://stackoverflow.com")
    print(r.status_code)
    # prints the int of the status code. Find more at httpstatusrappers.com :)
except requests.ConnectionError:
    print("failed to connect")

Ответ 2

Здесь используется решение, использующее httplib.

import httplib

def get_status_code(host, path="/"):
    """ This function retreives the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


print get_status_code("stackoverflow.com") # prints 200
print get_status_code("stackoverflow.com", "/nonexistant") # prints 404

Ответ 3

Вы должны использовать urllib2, например:

import urllib2
for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints:
# 200 [from the try block]
# 404 [from the except block]

Ответ 4

В будущем для тех, кто использует python3 и более поздние версии, здесь еще один код, чтобы найти код ответа.

import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()

Ответ 5

Исключение urllib2.HTTPError не содержит метода getcode(). Вместо этого используйте атрибут code.

Ответ 6

Здесь httplib решение, которое ведет себя как urllib2. Вы можете просто указать ему URL-адрес, и он просто работает. Не нужно путаться с разбиением URL-адресов на имя хоста и путь. Эта функция уже делает это.

import httplib
import socket
def get_link_status(url):
  """
    Gets the HTTP status of the url or returns an error associated with it.  Always returns a string.
  """
  https=False
  url=re.sub(r'(.*)#.*$',r'\1',url)
  url=url.split('/',3)
  if len(url) > 3:
    path='/'+url[3]
  else:
    path='/'
  if url[0] == 'http:':
    port=80
  elif url[0] == 'https:':
    port=443
    https=True
  if ':' in url[2]:
    host=url[2].split(':')[0]
    port=url[2].split(':')[1]
  else:
    host=url[2]
  try:
    headers={'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
             'Host':host
             }
    if https:
      conn=httplib.HTTPSConnection(host=host,port=port,timeout=10)
    else:
      conn=httplib.HTTPConnection(host=host,port=port,timeout=10)
    conn.request(method="HEAD",url=path,headers=headers)
    response=str(conn.getresponse().status)
    conn.close()
  except socket.gaierror,e:
    response="Socket Error (%d): %s" % (e[0],e[1])
  except StandardError,e:
    if hasattr(e,'getcode') and len(e.getcode()) > 0:
      response=str(e.getcode())
    if hasattr(e, 'message') and len(e.message) > 0:
      response=str(e.message)
    elif hasattr(e, 'msg') and len(e.msg) > 0:
      response=str(e.msg)
    elif type('') == type(e):
      response=e
    else:
      response="Exception occurred without a good error message.  Manually check the URL to see the status.  If it is believed this URL is 100% good then file a issue for a potential bug."
  return response

Ответ 7

Направляя комментарий @Niklas R к ответу @nickanor:

from urllib.error import HTTPError
import urllib.request

def getResponseCode(url):
    try:
        conn = urllib.request.urlopen(url)
        return conn.getcode()
    except HTTPError as e:
        return e.code