How can I check if a URL exists via PHP?

188

How can I check whether a URL exists (i.e. does not return a 404) in PHP?

php url

— X10nD

3

possible duplicate of How can one check to see if a remote file exists using PHP?

— viam0Zah

297

Here:

$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
}
else {
    $exists = true;
}

From here, and right below the above post, there's a curl solution:

function url_exists($url) {
    if (!$fp = curl_init($url)) return false;
    return true;
}

— karim79

18

I'm afraid the cURL way won't work like this. Check this out: stackoverflow.com/questions/981954/...

— viam0Zah

4

Some websites return a different $file_headers[0] for their error page. For example, youtube.com: its error page has that value as HTTP/1.0 404 Not Found (the difference is 1.0 versus 1.1). What to do then?

— Krishna Raj K

21

Maybe using strpos($headers[0], '404 Not Found') might do the trick

— alexandru.topliceanu

12

@Mark agreed! To clarify, strpos($headers[0], '404') is better!

— alexandru.topliceanu
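The strpos() suggestions in the comments above can be made independent of the protocol version by reading the three-digit code out of the status line. This is only a minimal sketch; the helper name statusCodeFromLine is an assumption and does not appear in any of the answers.

```php
<?php
// Hypothetical helper (not from the answers): extract the numeric status
// code from a status line such as "HTTP/1.0 404 Not Found".
function statusCodeFromLine(string $statusLine): ?int
{
    // Accept any protocol version ("1.0", "1.1", "2", ...), then capture
    // the three digits that follow the whitespace.
    if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null; // the line is not an HTTP status line
}
```

Comparing the returned integer (for example `statusCodeFromLine($headers[0]) === 404`) avoids depending on the exact reason phrase or on whether the server answers with HTTP/1.0 or HTTP/1.1.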

1

@karim79, be careful of SSRF and XSPA attacks

— M Rostami
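To illustrate the SSRF concern raised in the comment above: when the URL comes from user input, one common precaution is to refuse targets that resolve to private or reserved addresses before making any request. The sketch below is an assumption-laden illustration, not a complete defence (for instance, it does not protect against DNS answers changing between this check and the real request, and gethostbynamel() only returns IPv4 addresses). The function name isPublicHttpTarget is hypothetical.

```php
<?php
// Hypothetical guard (not from the answers): allow only http/https URLs
// whose host is, or resolves to, public IP addresses.
function isPublicHttpTarget(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }

    // parse_url() keeps the brackets around IPv6 literals; strip them.
    $host = trim($parts['host'], '[]');

    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        $ips = [$host];                      // host is already an IP literal
    } else {
        $ips = gethostbynamel($host) ?: [];  // IPv4 lookup; false on failure
    }
    if ($ips === []) {
        return false;
    }

    foreach ($ips as $ip) {
        // Reject private (10/8, 192.168/16, ...) and reserved (127/8, ...) ranges.
        $flags = FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
        if (filter_var($ip, FILTER_VALIDATE_IP, $flags) === false) {
            return false;
        }
    }
    return true;
}
```

A check like this would run before get_headers() or curl_exec() is called on the URL.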

55

When figuring out from PHP whether a URL exists, there are a few things to pay attention to:

Is the URL itself valid (a string, not empty, good syntax)? This is quick to check server-side.
Waiting for a response might take time and block code execution.
Not all headers returned by get_headers() are well formed.
Use curl (if you can).
Prevent fetching the entire body/content; request only the headers.
Consider redirecting URLs:
- Do you want the first code returned?
- Or follow all redirects and return the last code?
- You might end up with a 200, but the page could still redirect using meta tags or JavaScript. Figuring out what happens after that is tough.

Keep in mind that whatever method you use, it takes time to wait for a response.
All code might (and probably will) halt until you either know the result or the requests have timed out.

For example: the code below could take a long time to display the page if the URLs are invalid or unreachable:

<?php
$urls = getUrls(); // some function getting say 10 or more external links

foreach($urls as $k=>$url){
  // this could potentially take 0-30 seconds each
  // (more or less depending on connection, target site, timeout settings...)
  if( ! isValidUrl($url) ){
    unset($urls[$k]);
  }
}

echo "yay all done! now show my site";
foreach($urls as $url){
  echo "<a href=\"{$url}\">{$url}</a><br/>";
}

The functions below could be helpful; you will probably want to modify them to suit your needs:

    function isValidUrl($url){
        // first do some quick sanity checks:
        if(!$url || !is_string($url)){
            return false;
        }
        // quick check url is roughly a valid http request: ( http://blah/... ) 
        if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
            return false;
        }
        // the next bit could be slow:
        if(getHttpResponseCode_using_curl($url) != 200){
//      if(getHttpResponseCode_using_getheaders($url) != 200){  // use this one if you cant use curl
            return false;
        }
        // all good!
        return true;
    }

    function getHttpResponseCode_using_curl($url, $followredirects = true){
        // returns int responsecode, or false (if url does not exist or connection timeout occurs)
        // NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $ch = @curl_init($url);
        if($ch === false){
            return false;
        }
        @curl_setopt($ch, CURLOPT_HEADER         ,true);    // we want headers
        @curl_setopt($ch, CURLOPT_NOBODY         ,true);    // dont need body
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);    // catch output (do NOT print!)
        if($followredirects){
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
            @curl_setopt($ch, CURLOPT_MAXREDIRS      ,10);  // fairly random number, but could prevent unwanted endless redirects with followlocation=true
        }else{
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
        }
//      @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_TIMEOUT        ,6);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_USERAGENT      ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1");   // pretend we're a regular browser
        @curl_exec($ch);
        if(@curl_errno($ch)){   // should be 0
            @curl_close($ch);
            return false;
        }
        $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
        @curl_close($ch);
        return $code;
    }

    function getHttpResponseCode_using_getheaders($url, $followredirects = true){
        // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
        // NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $headers = @get_headers($url);
        if($headers && is_array($headers)){
            if($followredirects){
                // we want the last status code, so reverse the array and start at the end:
                $headers = array_reverse($headers);
            }
            foreach($headers as $hline){
                // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 404 Not Found" , etc.
                // note that the exact syntax/version/output differs, so there is some string magic involved here
                if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
                    $code = $matches[1];
                    return $code;
                }
            }
            // no HTTP/xxx found in headers:
            return false;
        }
        // no headers :
        return false;
    }
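The timeouts that are commented out in the cURL function above have a counterpart for get_headers(): a stream context. The following is a minimal sketch under the assumption of PHP 7.1 or later (where get_headers() accepts a context as its third argument); the helper names headRequestContext and headersWithTimeout are hypothetical.

```php
<?php
// Hypothetical helper: build a stream context for a HEAD request that
// gives up after $timeoutSeconds and does not follow redirects.
function headRequestContext(float $timeoutSeconds)
{
    return stream_context_create([
        'http' => [
            'method'          => 'HEAD',          // headers only, no body
            'timeout'         => $timeoutSeconds, // read timeout in seconds
            'follow_location' => 0,               // report the first response only
        ],
    ]);
}

// Hypothetical wrapper: get_headers() with the context above.
// Returns the header array, or false when the request fails or times out.
function headersWithTimeout(string $url, float $timeoutSeconds = 5.0)
{
    return @get_headers($url, false, headRequestContext($timeoutSeconds));
}
```

Passing the context per call, rather than calling stream_context_set_default(), keeps the setting from leaking into other stream operations in the same script.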

— Moonlite

For some reason getHttpResponseCode_using_curl() always returns 200 in my case.

— TD_Nijboer

2

If anyone has the same problem, check dns-nameservers.. I use OpenDNS without followRedirects stackoverflow.com/a/11072947/1829460

— TD_Nijboer

+1 for being the only answer to handle redirects. I changed the return $code to if($code == 200){return true;} return false; to sort out only the successes

— Birrel

@PKHunter: No. My quick preg_match regex was a simple example and will not match all the URLs listed there. See this test URL: regex101.com/r/EpyDDc/2 If you want a better one, replace it with the one listed at your link ( mathiasbynens.be/demo/url-regex ) by diegoperini; it seems to match all of them, see this test link: regex101.com/r/qMQp23/1

— MoonLite

46

$headers = @get_headers($this->_value);
if(strpos($headers[0],'200')===false)return false;

So any time you contact a website and get something other than 200 OK, this check returns false.

— lunarnet76

13

What if it's a redirect? The domain is still valid, but it will be left out.

— Eric Leroy
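The redirect objection above comes down to a policy decision: which status codes should count as "exists". A minimal sketch of making that policy explicit follows; the function name statusMeansExists and its $acceptRedirects parameter are assumptions for illustration, not part of the answer.

```php
<?php
// Hypothetical policy helper: decide whether a status code counts as
// "the URL exists". 2xx always counts; 3xx counts only when redirects
// are accepted by the caller.
function statusMeansExists(int $code, bool $acceptRedirects = true): bool
{
    if ($code >= 200 && $code < 300) {
        return true;
    }
    return $acceptRedirects && $code >= 300 && $code < 400;
}
```

With this in place a 301 or 302 is no longer rejected just because the status line does not contain "200".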

4

The above in one line: return strpos(@get_headers($url)[0],'200') === false ? false : true. Might be useful.

— Dejv

$this in PHP is a reference to the current object. Reference: php.net/manual/en/language.oop5.basic.php Primer: phpro.org/tutorials/Object-Oriented-Programming-with-PHP.html Most likely the code snippet was taken from a class and not adjusted accordingly.

— Marc Witteveen

18

You can't use curl on some servers; in that case you can use this code:

<?php
$url = 'http://www.example.com';
$array = get_headers($url);
$string = $array[0];
if(strpos($string,"200"))
  {
    echo 'url exists';
  }
  else
  {
    echo 'url does not exist';
  }
?>

— Minhaz

This might not work for a 302 or 303 redirect or, for example, a 304 Not Modified.

— Zippp

8

$url = 'http://google.com';
$not_url = 'stp://google.com';

if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;
if (@file_get_contents($not_url)): echo "Found '$not_url'!";
else: echo "Can't find '$not_url'.";
endif;

// Found 'http://google.com'!Can't find 'stp://google.com'.

— Randy Skretka

2

This will not work if allow_url_fopen is turned off. - php.net/manual/it/…

— Daniel Paul Searles
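Since the stream-based approaches depend on allow_url_fopen and the cURL approaches depend on the curl extension, a script can check what is available before choosing one. A minimal sketch, assuming a hypothetical helper name (availableUrlCheckMethod) and assuming that cURL is preferred when both are present:

```php
<?php
// Hypothetical helper: report which transport this PHP installation
// offers for remote URL checks.
function availableUrlCheckMethod(): string
{
    if (function_exists('curl_init')) {
        return 'curl';     // curl extension is loaded
    }
    // ini_get() returns a string such as "1", "On" or ""; normalise it.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'streams';  // file_get_contents() / get_headers() on URLs
    }
    return 'none';
}
```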

2

I would suggest reading only the first byte: if (@file_get_contents($url, false, NULL, 0, 1))

— Daniel Valland

8

function URLIsValid($URL)
{
    $exists = true;
    $file_headers = @get_headers($URL);
    $InvalidHeaders = array('404', '403', '500');
    foreach($InvalidHeaders as $HeaderVal)
    {
            if(strstr($file_headers[0], $HeaderVal))
            {
                    $exists = false;
                    break;
            }
    }
    return $exists;
}

— leela

8

I use this function:

/**
 * @param $url
 * @param array $options
 * @return string
 * @throws Exception
 */
function checkURL($url, array $options = array()) {
    if (empty($url)) {
        throw new Exception('URL is empty');
    }

    // list of HTTP status codes
    $httpStatusCodes = array(
        100 => 'Continue',
        101 => 'Switching Protocols',
        102 => 'Processing',
        200 => 'OK',
        201 => 'Created',
        202 => 'Accepted',
        203 => 'Non-Authoritative Information',
        204 => 'No Content',
        205 => 'Reset Content',
        206 => 'Partial Content',
        207 => 'Multi-Status',
        208 => 'Already Reported',
        226 => 'IM Used',
        300 => 'Multiple Choices',
        301 => 'Moved Permanently',
        302 => 'Found',
        303 => 'See Other',
        304 => 'Not Modified',
        305 => 'Use Proxy',
        306 => 'Switch Proxy',
        307 => 'Temporary Redirect',
        308 => 'Permanent Redirect',
        400 => 'Bad Request',
        401 => 'Unauthorized',
        402 => 'Payment Required',
        403 => 'Forbidden',
        404 => 'Not Found',
        405 => 'Method Not Allowed',
        406 => 'Not Acceptable',
        407 => 'Proxy Authentication Required',
        408 => 'Request Timeout',
        409 => 'Conflict',
        410 => 'Gone',
        411 => 'Length Required',
        412 => 'Precondition Failed',
        413 => 'Payload Too Large',
        414 => 'Request-URI Too Long',
        415 => 'Unsupported Media Type',
        416 => 'Requested Range Not Satisfiable',
        417 => 'Expectation Failed',
        418 => 'I\'m a teapot',
        422 => 'Unprocessable Entity',
        423 => 'Locked',
        424 => 'Failed Dependency',
        425 => 'Unordered Collection',
        426 => 'Upgrade Required',
        428 => 'Precondition Required',
        429 => 'Too Many Requests',
        431 => 'Request Header Fields Too Large',
        449 => 'Retry With',
        450 => 'Blocked by Windows Parental Controls',
        500 => 'Internal Server Error',
        501 => 'Not Implemented',
        502 => 'Bad Gateway',
        503 => 'Service Unavailable',
        504 => 'Gateway Timeout',
        505 => 'HTTP Version Not Supported',
        506 => 'Variant Also Negotiates',
        507 => 'Insufficient Storage',
        508 => 'Loop Detected',
        509 => 'Bandwidth Limit Exceeded',
        510 => 'Not Extended',
        511 => 'Network Authentication Required',
        599 => 'Network Connect Timeout Error'
    );

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    if (isset($options['timeout'])) {
        $timeout = (int) $options['timeout'];
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    }

    curl_exec($ch);
    $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
        return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
    } else {
        return "'{$url}' does not exist";
    }
}

— Ehsan

5

karim79's get_headers() solution didn't work for me, as I got crazy results with Pinterest.

get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(): Failed to enable crypto

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

Anyway, this developer demonstrates that cURL is way faster than get_headers():

http://php.net/manual/fr/function.get-headers.php#104723

Since many people asked karim79 to fix his cURL solution, here's the solution I built today.

/**
* Send an HTTP request to the $url and check the header posted back.
*
* @param $url String url to which we must send the request.
* @param $failCodeList Int array list of code for which the page is considered invalid.
*
* @return Boolean
*/
public static function isUrlExists($url, array $failCodeList = array(404)){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $handle = curl_init($url);


        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);

        curl_setopt($handle, CURLOPT_HEADER, true);

        curl_setopt($handle, CURLOPT_NOBODY, true);

        curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; UrlChecker)'); // must be a string, not true; placeholder value


        $headers = curl_exec($handle);

        curl_close($handle);


        if (empty($failCodeList) or !is_array($failCodeList)){

            $failCodeList = array(404); 
        }

        if (!empty($headers)){

            $exists = true;

            $headers = explode(PHP_EOL, $headers);

            foreach($failCodeList as $code){

                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){

                    $exists = false;

                    break;  
                }
            }
        }
    }

    return $exists;
}

Let me explain the curl options:

CURLOPT_RETURNTRANSFER: return a string instead of displaying the called page on the screen.

CURLOPT_SSL_VERIFYPEER: cURL won't check the certificate.

CURLOPT_HEADER: include the header in the string.

CURLOPT_NOBODY: don't include the body in the string.

CURLOPT_USERAGENT: some sites need it to function properly (for example: https://plus.google.com )

Additional note: in this function I'm using Diego Perini's regex to validate the URL before sending the request:

const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini

Additional note 2: I explode the header string and use headers[0] to be sure to validate only the return code and message (example: 200, 404, 405, etc.)
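On the header-splitting step described in note 2: HTTP header lines are separated by CRLF regardless of the server's platform, while PHP_EOL depends on the operating system PHP runs on, so on Linux the exploded lines keep a trailing "\r". The strpos() check in the function still works with that, but an exact comparison would not. A minimal sketch of a platform-independent split follows; the helper name firstHeaderLine is an assumption.

```php
<?php
// Hypothetical helper: return the first line (the status line) of a raw
// header block as produced by curl_exec() with CURLOPT_HEADER enabled.
function firstHeaderLine(string $rawHeaders): string
{
    // Split on CRLF, lone CR, or lone LF so the result does not depend
    // on the value of PHP_EOL.
    $lines = preg_split('/\r\n|\r|\n/', $rawHeaders);
    return trim($lines[0]);
}
```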

Additional note 3: sometimes validating only the 404 code is not enough (see the unit test), so there's an optional $failCodeList parameter to supply the full list of codes to reject.

And, of course, here's the unit test (including all the popular social networks) to legitimize my coding:

public function testIsUrlExists(){

//invalid
$this->assertFalse(ToolManager::isUrlExists("woot"));

$this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));

$this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));

$this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));

$this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));

$this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));

$this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));


//valid
$this->assertTrue(ToolManager::isUrlExists("www.google.ca"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));

$this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));

$this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));

$this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}

Great success to all,

Jonathan Parent-Lévesque from Montreal

— Jonathan Parent Lévesque

4

function urlIsOk($url)
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return false; // request failed, so there is no status line to inspect
    }
    $httpStatus = intval(substr($headers[0], 9, 3));
    if ($httpStatus<400)
    {
        return true;
    }
    return false;
}

— spir

3

Pretty fast:

function http_response($url){
    $resURL = curl_init(); 
    curl_setopt($resURL, CURLOPT_URL, $url); 
    curl_setopt($resURL, CURLOPT_BINARYTRANSFER, 1); 
    curl_setopt($resURL, CURLOPT_HEADERFUNCTION, 'curlHeaderCallback'); 
    curl_setopt($resURL, CURLOPT_FAILONERROR, 1); 
    curl_exec ($resURL); 
    $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE); 
    curl_close ($resURL); 
    if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) { return 0; } else return 1;
}

echo 'google:';
echo http_response('http://www.google.com');
echo '/ ogogle:';
echo http_response('http://www.ogogle.com');

— Sebastian Lasse

Way too complicated :) stackoverflow.com/questions/981954/...

— Jack

I get this exception when the URL exists: could not call the CURLOPT_HEADERFUNCTION

— safiot

3

All the above solutions + extra sugar. (Ultimate AIO solution)

/**
 * Check that given URL is valid and exists.
 * @param string $url URL to check
 * @return bool TRUE when valid | FALSE anyway
 */
function urlExists ( $url ) {
    // Remove all illegal characters from a url
    $url = filter_var($url, FILTER_SANITIZE_URL);

    // Validate URI
    if (filter_var($url, FILTER_VALIDATE_URL) === FALSE
        // check only for http/https schemes.
        || !in_array(strtolower(parse_url($url, PHP_URL_SCHEME)), ['http','https'], true )
    ) {
        return false;
    }

    // Check that URL exists
    $file_headers = @get_headers($url);
    return !(!$file_headers || $file_headers[0] === 'HTTP/1.1 404 Not Found');
}

Example:

var_dump ( urlExists('http://stackoverflow.com/') );
// Output: true;

— Junaid Atari

3

To check whether a URL is online or offline ---

function get_http_response_code($theURL) {
    $headers = @get_headers($theURL);
    return substr($headers[0], 9, 3);
}

— Hosam Elzagh

3

function url_exists($url) {
    $headers = @get_headers($url);
    return (strpos($headers[0],'200')===false)? false:true;
}

— Krishna Guragai

2

Here is a solution that reads only the first byte of the source code, returning false if file_get_contents fails. This will also work for remote files such as images.

 function urlExists($url)
{
    if (@file_get_contents($url,false,NULL,0,1))
    {
        return true;
    }
    return false;
}

— Daniel Valland

0

The simple way is curl (and it is faster too)

<?php
$mylinks="http://site.com/page.html";
$handlerr = curl_init($mylinks);
curl_setopt($handlerr,  CURLOPT_RETURNTRANSFER, TRUE);
$resp = curl_exec($handlerr);
$ht = curl_getinfo($handlerr, CURLINFO_HTTP_CODE);


if ($ht == '404')
     { echo 'NO';}
else { echo 'OK';}

?>

— T.Todua

0

Another way to check whether a URL is valid or not can be:

<?php

  if (isValidURL("http://www.gimepix.com")) {
      echo "URL is valid...";
  } else {
      echo "URL is not valid...";
  }

  function isValidURL($url) {
      $file_headers = @get_headers($url);
      if (strpos($file_headers[0], "200 OK") > 0) {
         return true;
      } else {
        return false;
      }
  }
?>

— Antonio Carlos Barbosa

0

get_headers() returns an array with the headers sent by the server in response to an HTTP request.

$image_path = 'https://your-domain.com/assets/img/image.jpg';

$file_headers = @get_headers($image_path);
//Prints the response out in an array
//print_r($file_headers); 

if($file_headers[0] == 'HTTP/1.1 404 Not Found'){
   echo 'Failed because path does not exist.<br>';
}else{
   echo "It works. You're good to go!<br>";
}

— Jeacovy Gayle

0

cURL can return the HTTP code. I don't think all that extra code is necessary?

function urlExists($url=NULL)
    {
        if($url == NULL) return false;
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $data = curl_exec($ch);
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch); 
        if($httpcode>=200 && $httpcode<300){
            return true;
        } else {
            return false;
        }
    }

— Arun Vitto

0

One thing to take into consideration when checking the header for a 404 is the case where a site does not generate a 404 immediately.

A lot of sites check whether a page exists in the PHP/ASP (etc.) source and forward you to a 404 page. In those cases the header is basically extended by the header of the 404 that is generated. The 404 error is then not in the first line of the header, but in the tenth.

$array = get_headers($url);
print_r($array); // would generate:

Array ( 
[0] => HTTP/1.0 301 Moved Permanently 
[1] => Date: Fri, 09 Nov 2018 16:12:29 GMT 
[2] => Server: Apache/2.4.34 (FreeBSD) LibreSSL/2.7.4 PHP/7.0.31 
[3] => X-Powered-By: PHP/7.0.31 
[4] => Set-Cookie: landing=%2Freed-diffuser-fig-pudding-50; path=/; HttpOnly 
[5] => Location: /reed-diffuser-fig-pudding-50/ 
[6] => Content-Length: 0 
[7] => Connection: close 
[8] => Content-Type: text/html; charset=utf-8 
[9] => HTTP/1.0 404 Not Found 
[10] => Date: Fri, 09 Nov 2018 16:12:29 GMT 
[11] => Server: Apache/2.4.34 (FreeBSD) LibreSSL/2.7.4 PHP/7.0.31 
[12] => X-Powered-By: PHP/7.0.31 
[13] => Set-Cookie: landing=%2Freed-diffuser-fig-pudding-50%2F; path=/; HttpOnly 
[14] => Connection: close 
[15] => Content-Type: text/html; charset=utf-8 
)
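Given a header array like the one above, the status of the final response can be found by scanning every line rather than only index 0. The following is a minimal sketch; the function name finalStatusCode is hypothetical, and the sample array is a shortened copy of the dump above.

```php
<?php
// Hypothetical helper: status code of the LAST response contained in a
// get_headers() result that spans several redirected responses.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (is_string($line)
            && preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m) === 1) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code; // null when no status line was found
}

// Shortened copy of the header dump shown above.
$sampleHeaders = [
    'HTTP/1.0 301 Moved Permanently',
    'Location: /reed-diffuser-fig-pudding-50/',
    'HTTP/1.0 404 Not Found',
    'Content-Type: text/html; charset=utf-8',
];
$finalCode = finalStatusCode($sampleHeaders); // 404
```

Checking only `$array[0]` would report the 301 here, so the page would be treated as existing even though the redirect target answers with a 404.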

— Lexib0y

0

I run some tests to see whether the links on my site are valid; they alert me when third parties change their links. I was having an issue with a site that had a poorly configured certificate, which meant that PHP's get_headers didn't work.

So, I read that curl was faster and decided to give that a go. Then I had an issue with LinkedIn, which gave me a 999 error; that turned out to be a user agent issue.

I didn't care if the certificate was not valid for this test, and I didn't care if the response was a redirect.

Then I figured I would use get_headers anyway if curl was failing...

Give it a go...

/**
 * returns true/false if the $url is present.
 *
 * @param string $url assumes this is a valid url.
 *
 * @return bool
 */
private function url_exists (string $url): bool
{
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_NOBODY, TRUE);             // this does a head request to make it faster.
  curl_setopt($ch, CURLOPT_HEADER, TRUE);             // just the headers
  curl_setopt($ch, CURLOPT_SSL_VERIFYSTATUS, FALSE);  // turn off that pesky ssl stuff - some sys admins can't get it right.
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
  // set a real user agent to stop linkedin getting upset.
  curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36');
  curl_exec($ch);
  $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
  if (($http_code >= 200 && $http_code < 400) || $http_code === 999)  // 200 = OK, 400 = Bad Request
  {
    curl_close($ch);
    return TRUE;
  }
  $error = curl_error($ch); // used for debugging.
  curl_close($ch);
  // just try the get_headers - it might work!
  stream_context_set_default(array('http' => array('method' => 'HEAD')));
  $file_headers = @get_headers($url);
  if ($file_headers)
  {
    $response_code = substr($file_headers[0], 9, 3);
    return $response_code >= 200 && $response_code < 400;
  }
  return FALSE;
}

— pgee70

-2

Kind of an old thread, but.. I do this:

$file = 'http://www.google.com';
$file_headers = @get_headers($file);
if ($file_headers) {
    $exists = true;
} else {
    $exists = false;
}

— hackdotslashdotkill

Sorta.. But not exactly.

— hackdotslashdotkill

How is your answer better?

— Jah

@Jah obviously it isn't, at -2. I probably posted this late at night when I was half asleep after staring at screens all day..

— hackdotslashdotkill