Header only retrieval in php via curl

PhpCurlHeader

Php Problem Overview


Actually I have two questions.

(1) Is there any reduction in processing power or bandwidth used on remote server if I retrieve only headers as opposed to full page retrieval using php and curl?

(2) Since I think, and I might be wrong, that answer to first questions is YES, I am trying to get last modified date or If-Modified-Since header of remote file only in order to compare it with time-date of locally stored data, so I can, in case it has been changed, store it locally. However, my script seems unable to fetch that piece of info, I get NULL, when I run this:

class last_change {

 public last_change;

 function set_last_change() {
  $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, "http://url/file.xml");
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_FILETIME, true);
    curl_setopt($curl, CURLOPT_NOBODY, true);
  // $header = curl_exec($curl);
  $this -> last_change = curl_getinfo($header);
  curl_close($curl);
 }

 function get_last_change() {
  return $this -> last_change['datetime']; // I have tested with Last-Modified & If-Modified-Since to no avail
 }

}

In case $header = curl_exec($curl) is uncomented, header data is displayed, even if I haven't requested it and is as follows:

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 12:15:51 GMT
Server: Apache/2.2.8 (Linux/SUSE)
Last-Modified: Thu, 03 Sep 2009 12:46:54 GMT
ETag: "198054-118c-472abc735ab80"
Accept-Ranges: bytes
Content-Length: 4492
Content-Type: text/xml

Based on that, 'Last-Modified' is returned.

So, what am I doing wrong?

Php Solutions


Solution 1 - Php

You are passing $header to curl_getinfo(). It should be $curl (the curl handle). You can get just the filetime by passing CURLINFO_FILETIME as the second parameter to curl_getinfo(). (Often the filetime is unavailable, in which case it will be reported as -1).

Your class seems to be wasteful, though, throwing away a lot of information that could be useful. Here's another way it might be done:

class URIInfo 
{
    public $info;
    public $header;
    private $url;
    
    public function __construct($url)
    {
        $this->url = $url;
        $this->setData();
    }
    
    public function setData() 
    {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $this->url);
        curl_setopt($curl, CURLOPT_FILETIME, true);
        curl_setopt($curl, CURLOPT_NOBODY, true);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_HEADER, true);
        $this->header = curl_exec($curl);
        $this->info = curl_getinfo($curl);
        curl_close($curl);
    }
    
    public function getFiletime() 
    {
        return $this->info['filetime'];
    }
    
    // Other functions can be added to retrieve other information.
}

$uri_info = new URIInfo('http://www.codinghorror.com/blog/');
$filetime = $uri_info->getFiletime();
if ($filetime != -1) {
    echo date('Y-m-d H:i:s', $filetime);
} else {
    echo 'filetime not available';
}

Yes, the load will be lighter on the server, since it's only returning only the HTTP header (responding, after all, to a HEAD request). How much lighter will vary greatly.

Solution 2 - Php

Why use CURL for this? There is a PHP-function for that:

$headers=get_headers("http://www.amazingjokes.com/img/2014/530c9613d29bd_CountvonCount.jpg");
print_r($headers);

returns the following:

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Tue, 11 Mar 2014 22:44:38 GMT
    [2] => Server: Apache
    [3] => Last-Modified: Tue, 25 Feb 2014 14:08:40 GMT
    [4] => ETag: "54e35e8-8873-4f33ba00673f4"
    [5] => Accept-Ranges: bytes
    [6] => Content-Length: 34931
    [7] => Connection: close
    [8] => Content-Type: image/jpeg
)

Should be easy to get the content-type after this.

You could also add the format=1 to get_headers:

$headers=get_headers("http://www.amazingjokes.com/img/2014/530c9613d29bd_CountvonCount.jpg",1);
    print_r($headers);

This will return the following:

Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Tue, 11 Mar 2014 22:44:38 GMT
    [Server] => Apache
    [Last-Modified] => Tue, 25 Feb 2014 14:08:40 GMT
    [ETag] => "54e35e8-8873-4f33ba00673f4"
    [Accept-Ranges] => bytes
    [Content-Length] => 34931
    [Connection] => close
    [Content-Type] => image/jpeg
)

More reading here (PHP.NET)

Solution 3 - Php

(1) Yes. A HEAD request (as you're issuing in this case) is far lighter on the server because it only returns the HTTP headers, as opposed to the headers and content like a standard GET request.

(2) You need to set the CURLOPT_RETURNTRANSFER option to true before you call curl_exec() to have the content returned, as opposed to printed:

curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

That should also make your class work correctly.

Solution 4 - Php

You can set the default stream context:

stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);

Then use:

$headers = get_headers($url,1);

get_headers seems to be more efficient than cURL once get_headers skip steps like trigger authentication routines such as log in prompts or cookies.

Solution 5 - Php

Here is my implementation using CURLOPT_HEADER, then parsing the output string into a map:

function http_headers($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    $headers = curl_exec($ch);

    curl_close($ch);

    $data = [];
    $headers = explode(PHP_EOL, $headers);
    foreach ($headers as $row) {
        $parts = explode(':', $row);
        if (count($parts) === 2) {
            $data[trim($parts[0])] = trim($parts[1]);
        }
    }

    return $data;
};

Sample usage:

$headers = http_headers('https://i.ytimg.com/vi_webp/g-dKXOlsf98/hqdefault.webp');
print_r($headers);

Array
(
    ['Content-Type'] => 'image/webp'
    ['ETag'] => '1453807629'
    ['X-Content-Type-Options'] => 'nosniff'
    ['Server'] => 'sffe'
    ['Content-Length'] => 32958
    ['X-XSS-Protection'] => '1; mode=block'
    ['Age'] => 11
    ['Cache-Control'] => 'public, max-age=7200'
)

Solution 6 - Php

You need to add

curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

to return the header instead of printing it.

Whether returning only the headers is lighter on the server depends on the script that's running, but usually it will be.

I think you also want "filetime" instead of "datetime".

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionKruleView Question on Stackoverflow
Solution 1 - PhpGZippView Answer on Stackoverflow
Solution 2 - PhppatrickView Answer on Stackoverflow
Solution 3 - PhpIan KempView Answer on Stackoverflow
Solution 4 - PhplipinfView Answer on Stackoverflow
Solution 5 - Phprodrigo-silveiraView Answer on Stackoverflow
Solution 6 - PhpGregView Answer on Stackoverflow