Skip to content

metarefresh fails when source URL responds without HTTP version and response code #42

@aaronhark

Description

@aaronhark

Describe the bug
The metarefresh module can fetch metadata from a list of provided URLs and compile it in to the SimpleSAMLphp flatfile format. If one of those URLs responds but does not include a header with its HTTP version and response code (e.g. HTTP/1.1 200 OK), the entire metarefresh process fails due to an uncaught exception. The exception occurs because the logic in MetaLoader.php that handles the response does not account for the possibility of a null value for this header.

Specifics of Environment
Using SimpleSAMLphp 2.3.5 as an SP on Linux with PHP 8.3.10

To Reproduce

  1. Install simplesamlphp ^2.3.5
  2. Install module metarefresh ^1.2
  3. Configure module_metarefresh.php to use these three example URL-based sources and output as flatfile:
  1. Trigger metarefresh via the cron module

Expected Behavior
Fetch the XML metadata from each specified URL, and compile it into the SimpleSAMLphp flatfile format.

Actual Behavior
Because the web server of the second link (https://amidp.drew.edu/nidp/saml2/metadata) returns the XML without an HTTP version and response code, nothing gets loaded into the [0] key of the response array and an exception is subsequently thrown. We can debate whether the web server owner ought to fix its response behavior, but the fact of the matter is there's nothing wrong with the actual XML metadata it serves up. The only reason I know it's not returning an HTTP version and response code is from debugging this issue.

PHP Warning:  Undefined array key 0 in /vendor/simplesamlphp/simplesamlphp/modules/metarefresh/src/MetaLoader.php on line 125
PHP Fatal error:  Uncaught TypeError: preg_match(): Argument #2 ($subject) must be of type string, null given in /vendor/simplesamlphp/simplesamlphp/modules/metarefresh/src/MetaLoader.php:125
Stack trace:
#0 /vendor/simplesamlphp/simplesamlphp/modules/metarefresh/src/MetaLoader.php(125): preg_match('@^HTTP/1\\.[01]\\...', NULL)
#1 /vendor/simplesamlphp/simplesamlphp/modules/metarefresh/src/MetaRefresh.php(129): SimpleSAML\Module\metarefresh\MetaLoader->loadSource(Array)
#2 /vendor/simplesamlphp/simplesamlphp/modules/metarefresh/hooks/hook_cron.php(27): SimpleSAML\Module\metarefresh\MetaRefresh->runRefresh('hourly')
#3 /vendor/simplesamlphp/simplesamlphp/src/SimpleSAML/Module.php(597): metarefresh_hook_cron(Array)
#4 /vendor/simplesamlphp/simplesamlphp/modules/cron/src/Cron.php(55): SimpleSAML\Module::callHooks('cron', Array)
#5 /vendor/simplesamlphp/simplesamlphp/modules/cron/bin/cron.php(45): SimpleSAML\Module\cron\Cron->runTag('hourly')
#6 {main}
  thrown in /vendor/simplesamlphp/simplesamlphp/modules/metarefresh/src/MetaLoader.php on line 125

Solution
There are several ways to resolve this; I'm just not sure what the preferred approach would be for the maintainers of this project. Therefore, I'm presenting two potential solutions inline rather than doing a PR. If one of these is preferred, I'm happy to fork and submit a PR. The most straightforward place to effect this change is at lines 118-134 of src/MetaLoader.php.

Option 1: If for some reason it is critical to this module that the HTTP version and response code header be present, more gracefully handle the lack of one by modifying the code to read as follows. With this option, the reproduction steps I provided above would result in the metadata being loaded successfully from URLs 1 and 3, and URL 2 throwing a warning to the logs.

// We have response headers, so the request succeeded
if (!isset($responseHeaders)) {
    // No response headers, this means the request failed in some way, so re-use old data
    Logger::warning('No response from ' . $source['src'] . ' - attempting to re-use cached metadata');
    $this->addCachedMetadata($source);
    return;
} elseif (!isset($responseHeaders[0])) {
    // Failed to receive an HTTP version and response code in the [0] array value. Treat as an invalid file.
    Logger::warning('Received no HTTP header in response - attempting to re-use cached metadata');
    $this->addCachedMetadata($source);
    return;
} elseif (preg_match('@^HTTP/1\.[01]\s304\s@', $responseHeaders[0])) {
    // 304 response
    Logger::debug('Received HTTP 304 (Not Modified) - attempting to re-use cached metadata');
    $this->addCachedMetadata($source);
    return;
} elseif (!preg_match('@^HTTP/1\.[01]\s200\s@', $responseHeaders[0])) {
    // Other error
    Logger::info('Error from ' . $source['src'] . ' - attempting to re-use cached metadata');
    $this->addCachedMetadata($source);
    return;
}

Option 2: Rather than testing for the HTTP version and response code -- which is irrelevant to the validity of the content served up -- it seems it might make more sense to look at the content-type header to ensure we actually received text/xml. With this option, the reproduction steps I provided above would result in the metadata being loaded successfully from all three URLs.

// We have response headers, so the request succeeded
if (!isset($responseHeaders['content-type']) || substr($responseHeaders['content-type'], 0, 8) != "text/xml") {
    // Invalid response headers. This means the request failed in some way, so re-use old data
    Logger::info('Invalid response headers from ' . $source['src'] . ' - attempting to re-use cached metadata');
    $this->addCachedMetadata($source);
    return;
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions