PHP script to log in to Drupal and save a web page using cURL

The following PHP script logs in to Drupal 6 and saves a web page using cURL. It can be run from the command line (provided PHP is at /usr/bin/php), e.g. via a cron job.

#!/usr/bin/php
<?php
/**
* Log in to Drupal 6 as USERNAME:PASSWORD and save URL as FILENAME.
*/

define('USERNAME', 'username');
define('PASSWORD', 'password');
define('DOMAIN', 'http://example.com'); // This must not have trailing slash.
define('URL', DOMAIN.'/webpage.html');
define('FILENAME', dirname(__FILE__).'/webpage.html'); // Note the leading slash: dirname() returns no trailing slash.
//define('FILENAME', dirname(__FILE__).'/webpage-'.date("Y-m-d",time()).'.html'); // Add date in yyyy-mm-dd format.
//define('FILENAME', dirname(__FILE__).'/webpage-'.date("c",time()).'.html'); // Add date/time in ISO 8601 format.

function logout($crl) {
  curl_setopt($crl, CURLOPT_URL, DOMAIN.'/logout');
  curl_setopt($crl, CURLOPT_POST, 0);
  curl_exec($crl);
}

// Show all errors.
ini_set('display_errors',true);

// Set up cURL.
$crl = curl_init();
curl_setopt($crl, CURLOPT_COOKIEFILE, "/tmp/cookie.txt");
curl_setopt($crl, CURLOPT_COOKIEJAR, "/tmp/cookie.txt");
curl_setopt($crl, CURLOPT_FOLLOWLOCATION, 1); // Follow 'Location' headers.
curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1); // Return query results as a string rather than printing them.

// Make sure we're logged out.
echo "Making sure we're logged out\n";
logout($crl);

// Log in.
echo "Logging in\n";
$login_url = DOMAIN.'/user/login';
curl_setopt($crl, CURLOPT_URL, $login_url);
curl_setopt($crl, CURLOPT_POST, 1);
$postdata = array(
  "name" => USERNAME,
  "pass" => PASSWORD,
  "form_id" => "user_login",
  "op" => "Log in",
);
curl_setopt($crl, CURLOPT_POSTFIELDS, $postdata);
$result = curl_exec($crl);
$headers = curl_getinfo($crl);
if ($headers['url'] == $login_url) {
  logout($crl);
  exit("Could not log in\n"); // Or already logged in - simply no way of knowing.
}

// Get URL and save as FILENAME.
echo "Downloading file\n";
set_time_limit(0); // Useful when downloading big files, as it prevents timeout of the script.
$fp = fopen(FILENAME, 'w+'); // Open file.
if ($fp === false) {
  logout($crl);
  exit("Could not open ".FILENAME." for writing\n");
}
curl_setopt($crl, CURLOPT_URL, URL);
curl_setopt($crl, CURLOPT_POST, 0);
curl_setopt($crl, CURLOPT_TIMEOUT, 60); // 60 seconds.
curl_setopt($crl, CURLOPT_FILE, $fp);
curl_exec($crl);
fclose($fp); // Close file (cURL is still set to write to it, so the output method is reset below).
curl_setopt($crl, CURLOPT_RETURNTRANSFER, 0); // Switch output away from the file so that the logout response is not appended to it.
curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1); // Return query results as a string again rather than printing them.

// Log out (this must be done even though we close the cURL session afterwards, otherwise we won't be able to log in again later).
echo "Logging out\n";
logout($crl);

// Close cURL session.
echo "Done\n";
curl_close($crl);

exit();
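
The FILENAME definitions near the top of the script build the output path by joining dirname(__FILE__), a base name and an optional date. As a rough sketch only, the same idea can be wrapped in a small function; dated_filename() and its parameters are hypothetical names and not part of the script above.

```php
<?php
// Hypothetical helper (an assumption, not part of the original script):
// build "<dir>/<base>-<date>.<ext>" from a directory, base name and timestamp.
function dated_filename($dir, $base, $ext, $timestamp, $format = 'Y-m-d') {
  // rtrim() avoids a doubled slash if $dir already ends in one;
  // dirname(__FILE__) normally does not, so the separator is added here.
  return rtrim($dir, '/') . '/' . $base . '-' . date($format, $timestamp) . '.' . $ext;
}

// Usage sketch: define('FILENAME', dated_filename(dirname(__FILE__), 'webpage', 'html', time()));
```

Passing "c" as $format gives an ISO 8601 date/time instead. Note that this format contains colons, which not every file system accepts in file names.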

You can download it here: save-url-as-file.
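
The script does not check whether the download itself succeeded. A minimal sketch of such a check is shown here, assuming that any HTTP status outside the 2xx range should count as a failure; transfer_problem() is a hypothetical name, and it only inspects values that curl_exec(), curl_getinfo() and curl_error() already provide.

```php
<?php
// Hypothetical helper (an assumption, not part of the original script):
// return a description of what went wrong with a transfer, or null if
// nothing looks wrong.
//   $result: return value of curl_exec()
//   $info:   array returned by curl_getinfo()
//   $error:  message returned by curl_error()
function transfer_problem($result, array $info, $error) {
  if ($result === false) {
    return 'cURL error: ' . $error;
  }
  // With CURLOPT_FOLLOWLOCATION set, http_code is that of the last response.
  $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
  if ($code < 200 || $code >= 300) {
    return 'Unexpected HTTP status ' . $code;
  }
  return null; // No problem detected.
}

// Usage sketch, directly after the download's curl_exec() call:
// $result = curl_exec($crl);
// $problem = transfer_problem($result, curl_getinfo($crl), curl_error($crl));
// if ($problem !== null) { logout($crl); exit($problem . "\n"); }
```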

Last modified: 04/10/2009

This website is a personal resource. Nothing here is guaranteed correct or complete, so use at your own risk and try not to delete the Internet. -Stephan
