Security warning: Please don't use any version older than 1.11!

The processing logic of gzip_cnc

Order of the main program's operation

  1. Do we have a value for the path name of the file to be served?
    (in the Environment variable PATH_TRANSLATED - this one is set by the web server for each Apache handler.)
    • no => perform the self-test function.
      (We cannot have been invoked as a handler, because in that case we would know which file to serve.)
  2. Do the Environment variables PATH_INFO and REDIRECT_URL contain the same value?
    • no => reject the request with the HTTP status 403.
      (This looks like an attempt to invoke the script directly and thus bypass existing Apache access control mechanisms; this test was added in version 1.11.)
  3. Do we have a URL value for the file to be served?
    (in the Environment variable PATH_INFO - this one is set by the web server for each Apache handler.)
  4. Compute the name of the corresponding cache file from the values of these two Environment variables.
    (PATH_INFO provides the path name and PATH_TRANSLATED the file name. The file name may be the result of a Content Negotiation already performed by the Apache server, whereas PATH_INFO still contains the original URL of the request.)
  5. Read the attributes of the original file.
    (using the system function stat())
    Did this work?
    • no => perform a processing for a missing page.
      (We cannot access this file - it makes no difference for the visitor whether this file doesn't exist or whether any kind of system error has occurred.)
  6. Did the UserAgent allow the serving of compressed data?
    (i. e. supplied the value gzip within the Accept-Encoding HTTP header)
  7. Read the attributes of the compressed file in the cache.
    (using the system function stat())
    Did this work?
  8. Is the last modification date for the cache file newer than the one for the original file?
  9. Is the content of the cache file smaller than the one for the original file?
    • no => serve the content of the original file.
      (The compressed version is useless - but it doesn't make sense to remove it, because it would then have to be created again during the next access, only to find out once more that it is worthless.)
  10. Serve the content of the cache file.
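
The order of these checks can be sketched as a single decision function. This is only an illustration in Python, not the script's actual code: the function names, the returned action labels and the (mtime, size) tuples are hypothetical, and the outcomes chosen for steps 6 to 8 (serve the original, update the cache) are assumptions, since the list above does not spell them out. Steps 3 and 4 are left out for the same reason. Reading the Accept-Encoding header from HTTP_ACCEPT_ENCODING follows the usual CGI convention and is likewise an assumption about this program.

```python
def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip (step 6).

    Simplification: quality values such as "gzip;q=0" are ignored.
    """
    for item in accept_encoding.split(","):
        token = item.split(";", 1)[0].strip().lower()
        if token == "gzip":
            return True
    return False


def choose_action(env, original, cache):
    """Decide what to do for one request (hypothetical helper).

    env      -- mapping of CGI environment variables
    original -- (mtime, size) of the original file, or None if stat() failed
    cache    -- (mtime, size) of the cache file, or None if stat() failed
    """
    if not env.get("PATH_TRANSLATED"):
        return "self-test"                    # step 1
    if env.get("PATH_INFO") != env.get("REDIRECT_URL"):
        return "reject-403"                   # step 2
    if original is None:
        return "missing-page"                 # step 5
    if not accepts_gzip(env.get("HTTP_ACCEPT_ENCODING", "")):
        return "serve-original"               # step 6 (assumed outcome)
    if cache is None or cache[0] <= original[0]:
        return "update-cache"                 # steps 7 and 8 (assumed outcome)
    if cache[1] >= original[1]:
        return "serve-original"               # step 9
    return "serve-cache"                      # step 10
```

Keeping the decision separate from the file access makes each branch easy to check with made-up attribute values.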

Creating (or updating the content of) a cache file

It makes no difference whether

- in both cases it is necessary to fill this cache file with a compressed form of the current content of the original file.

  1. Split the path name of the cache file into directory part and file part.
    Did this work?
  2. Compute the name of the cache directory for the compressed version of the requested file from the directory part.
  3. If this directory doesn't exist already, try to create it.
    Did this work?
  4. Create a random but unique name of a temporary file inside the cache directory.
  5. Compress the content of the original file into this temporary file.
    Did this work?
  6. Rename the temporary file to the name of the cache file.
    Did this work?
    • no => remove the temporary file and serve the content of the original file.
      (We might possibly have been able to serve the compressed version, but something strange has gone wrong here, so we would rather play it safe.)
  7. Read the attributes of the compressed file from the cache
    (using the system function stat()).
    Did this work?
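
The "write to a temporary file, then rename" pattern from steps 4 to 6 can be sketched as follows. This is a minimal Python illustration, not the program's own code: the function name update_cache is hypothetical, the directory part of the cache path is used directly as the cache directory (how step 2 really derives it is not shown here), and every failure is reduced to a False return value so that the caller can fall back to the original file.

```python
import gzip
import os
import shutil
import tempfile


def update_cache(original_path, cache_path):
    """Compress original_path into cache_path; return True on success."""
    cache_dir, cache_name = os.path.split(cache_path)        # step 1
    if not cache_dir or not cache_name:
        return False
    tmp_path = None
    try:
        os.makedirs(cache_dir, exist_ok=True)                 # step 3
        # Step 4: mkstemp() creates a uniquely named file in the cache directory.
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=cache_name + ".")
        with os.fdopen(fd, "wb") as raw, \
                open(original_path, "rb") as src, \
                gzip.GzipFile(fileobj=raw, mode="wb") as dst:
            shutil.copyfileobj(src, dst)                      # step 5
        os.replace(tmp_path, cache_path)                      # step 6
        tmp_path = None                                       # nothing left to clean up
        os.stat(cache_path)                                   # step 7
        return True
    except OSError:
        # Any failure: remove the temporary file; the caller serves the original.
        if tmp_path is not None and os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False
```

Because the rename happens only after the compressed data has been written completely, a concurrent request never sees a half-written cache file.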

Serving a file content

Whether the content of a file is served in compressed or uncompressed form makes relatively little difference, so both cases are handled by the same function.

  1. Open the file to be sent.
    If this worked, then
    1. Send the HTTP status 200 showing the successful processing.
    2. Send the HTTP header Date describing the current server time.
    3. Send the HTTP header Vary describing the Content Negotiation performed.
      (This marks the response as depending on the content of the received HTTP header Accept-Encoding, so that a proxy server knows it must not blindly serve a cached copy of such a page in response to subsequent requests for the same URL.)
    4. Send the HTTP header Last-Modified describing the last modification date of the content.
      (The browser will send this value back to the server if it wants to check the validity of its cache content.)
    5. Send the HTTP header Content-Type describing the data type of the served file content.
    6. Send the HTTP header Content-Length describing the length of the served file content.
    7. On request send the HTTP headers
      • Expires describing the guaranteed validity of the content inside the browser cache of an HTTP/1.0 client and
      • Cache-Control describing the guaranteed validity of the content inside the browser cache of an HTTP/1.1 client.
      (Neither of these has anything to do with compression - but requests that the browser never sends at all are the cheapest ones in terms of bandwidth.)
    8. Are we about to serve compressed data?
      • Send the HTTP header Content-Encoding describing the encoding of the served file content.
      • On request send our own HTTP headers:
        1. X-Gzipcnc-Original-File-Size describing the size of the original file,
        2. X-Gzipcnc-Version describing the program version used,
        3. X-Gzipcnc-Path-Info describing the URL of the requested document (in active self test mode only) and
        4. X-Gzipcnc-Path-Translated describing the file name of the requested document (in active self test mode only).
    9. Send the file content.
    10. Close the file.
    11. Perform the termination processing.
  2. If we were about to serve a compressed cache file (but failed doing so)
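
The header sequence from step 1 can be illustrated with a small Python sketch. It is not taken from the program: the function name build_headers and its parameters are hypothetical, the Vary value "Accept-Encoding" and the max-age form of Cache-Control are assumptions about what is sent, and the self-test headers are omitted.

```python
from email.utils import formatdate


def build_headers(content_type, sent_size, mtime, now, compressed,
                  original_size=None, max_age=None):
    """Return the response headers as a list of (name, value) pairs.

    mtime and now are Unix timestamps; max_age (seconds) stands for the
    optional "guaranteed validity" and is an assumed parameter.
    """
    headers = [
        ("Date", formatdate(now, usegmt=True)),
        ("Vary", "Accept-Encoding"),
        ("Last-Modified", formatdate(mtime, usegmt=True)),
        ("Content-Type", content_type),
        ("Content-Length", str(sent_size)),
    ]
    if max_age is not None:
        # Expires for HTTP/1.0 clients, Cache-Control for HTTP/1.1 clients.
        headers.append(("Expires", formatdate(now + max_age, usegmt=True)))
        headers.append(("Cache-Control", "max-age=%d" % max_age))
    if compressed:
        headers.append(("Content-Encoding", "gzip"))
        if original_size is not None:
            headers.append(("X-Gzipcnc-Original-File-Size", str(original_size)))
    return headers
```

Content-Length always refers to the bytes actually sent, so for a compressed response it is the size of the cache file, not of the original.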

Processing for a missing page

  1. If a user specific error document has been defined in gzip_cnc
    • as a URL then send a Location HTTP header to redirect to this URL.
    • as a path name then serve this file's content.
  2. Otherwise display our own error document.
  3. Perform the termination processing.
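
As a rough Python sketch of this choice: the function name, the "://" test used to tell a URL from a path name, the status lines and the built-in page text are all assumptions made for illustration; the description above does not state how the program distinguishes the two cases or which status codes it sends.

```python
def missing_page_response(error_document=None):
    """Return (status, headers, body_source) for a missing page.

    body_source is None, ("file", path) or ("builtin", html).
    """
    if error_document:
        if "://" in error_document:           # assumed test for "is a URL"
            return ("302 Found", [("Location", error_document)], None)
        return ("404 Not Found", [], ("file", error_document))
    builtin = "<html><body><h1>Document not found</h1></body></html>"
    return ("404 Not Found", [("Content-Type", "text/html")], ("builtin", builtin))
```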

Termination processing

  1. If a log file has been defined then
    1. Compute a formatted representation of the current date and time.
    2. Compute the used CPU time.
    3. Compute the data volume saved.
    4. Write a message into the log file.
  2. Terminate the program's processing.
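
The values computed in step 1 could be combined into a log message roughly like this. The Python sketch below is illustrative only: the function name, its parameters and the layout of the log line are hypothetical and do not reproduce the program's real log format.

```python
import time


def format_log_line(url, original_size, sent_size, now=None, cpu_seconds=None):
    """Build one log line from the values listed in step 1."""
    if now is None:
        now = time.time()
    if cpu_seconds is None:
        cpu_seconds = time.process_time()     # CPU time used by this process
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    saved = original_size - sent_size         # data volume saved, in bytes
    return "%s %s cpu=%.3fs saved=%d bytes" % (stamp, url, cpu_seconds, saved)
```

If the original file was served, the saved volume is simply zero, because the sent size equals the original size.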

(Michael Schröpl, 2002-09-08)