Security warning: Please don't use any version older than 1.11!

gzip_cnc - an Apache handler for serving compressed content

What is this all about?

gzip_cnc is a CGI script written in Perl. It has to be embedded into the Apache web server as a handler and is then able to serve the requested static page content in gzip-compressed form to (sufficiently capable) web browsers.
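To illustrate what "sufficiently capable" means in practice: a browser announces the encodings it understands in the Accept-Encoding request header, which a CGI script sees as the environment variable HTTP_ACCEPT_ENCODING. The following is a minimal sketch in Python (not gzip_cnc's actual Perl code); the function name accepts_gzip is made up for this example, and how gzip_cnc itself evaluates the header is not described here.

```python
def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value permits gzip."""
    if not accept_encoding:
        return False
    wildcard_q = None
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0  # default quality value when none is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if coding in ("gzip", "x-gzip"):
            return q > 0  # an explicit gzip entry decides the question
        if coding == "*":
            wildcard_q = q
    # No explicit gzip entry: fall back to the wildcard, if there was one.
    return wildcard_q is not None and wildcard_q > 0
```

In a CGI context the function would be called with os.environ.get("HTTP_ACCEPT_ENCODING"); a browser that sends "gzip;q=0" explicitly refuses gzip and would get the uncompressed page.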

The program

From all these attributes the program's name is derived.
(A cryptic name may be neither easy to remember nor easy to pronounce - but as a unique term for search engines it still has its advantages ...)

Why gzip_cnc when mod_gzip already exists?

Right. If you want your Apache web server to deliver page contents - especially dynamically created ones - in compressed form, then the Apache module mod_gzip is the best solution (known to me). So if you

then you are best taken care of there.

But what if you have a normal home page that mainly consists of static pages and just use a normal web space for this, with a handful of add-ons like CGI and .htaccess - but your provider isn't willing to install mod_gzip on this server? This is the scenario that gzip_cnc is made for, typical for many smaller web sites.

Which requests can be compressed by gzip_cnc?

gzip_cnc can only compress requests for static content.

This is a fundamental restriction of the program itself: to compress dynamic content it would need to get its hands on this content after its creation but before the web server serves it. mod_gzip can do this because it is an Apache module; gzip_cnc is 'only' an Apache handler, and it doesn't make a lot of sense to teach this handler to partially emulate the functions of other handlers (like CGI or SSI), which the Apache server itself can handle a lot more efficiently.

Which file types can be handled by gzip_cnc?

The current version of gzip_cnc compresses only one (configurable) document type, HTML by default - HTML documents make up the largest share of the files available on the Web.

The reason any restriction exists here at all is that gzip_cnc itself needs to know exactly what it is compressing (because it has to send this information to the browser as well). Unfortunately the Apache server does not tell it which content type its own configuration assigns to the file in question, as gzip_cnc is 'only' a handler and not a module and therefore cannot directly access the Apache configuration.

It is easy to change this document type, and it would not be too difficult to implement more document types - this would end up using a mapping table similar to the one the Apache server itself already uses. But you are free to use several instances of gzip_cnc simultaneously and configure each one of them to handle a MIME type of your choice. And when configuring via environment variables this can even be done with just a single gzip_cnc instance.
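The mapping table mentioned above could look roughly like the following Python sketch. It is not part of gzip_cnc: the table entries, the name MIME_TYPES and the function content_type_for are assumptions chosen for illustration, in the spirit of Apache's own extension-to-type mapping.

```python
import os

# Hypothetical mapping table (file extension -> MIME type).
# The entries are examples only, not gzip_cnc's configuration.
MIME_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".css": "text/css",
    ".txt": "text/plain",
    ".xml": "text/xml",
}

def content_type_for(path, table=MIME_TYPES):
    """Return the MIME type for a file path, or None if it is not listed."""
    ext = os.path.splitext(path)[1].lower()
    return table.get(ext)
```

A handler built this way would compress only files for which the lookup returns a type and leave everything else to the web server.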

Am I able to use gzip_cnc for my web space?

Using gzip_cnc calls for a number of prerequisites:

  1. availability of a Perl5 interpreter - as gzip_cnc is a Perl script.
  2. ability to execute your individual CGI applications - as gzip_cnc is one of them.
  3. some compression tool. There are several possibilities for this:
    1. a Perl module Compress::Zlib installed on the server or
    2. a gzip compression program which
      • for UNIX-type operating systems
        • often is already available or
        • may be created from the source code
        as system command gzip and
      • for Windows-type operating systems
    gzip_cnc tests for both alternatives and uses whichever it finds - preferring the Perl module (invoking a system command requires starting an additional process every time).
  4. an Apache web server - as the way gzip_cnc competes for handling the page requests to be served in compressed form is Apache specific.
  5. the ability to extend the web server configuration for your own URL tree, probably by using .htaccess files - as gzip_cnc has to be included in the Apache configuration as the program responsible for handling the page requests to be compressed.
  6. the ability to use directives of the FileInfo class within .htaccess files - as the directive that registers gzip_cnc as the program responsible for handling page requests to be compressed belongs to this class. The directive AllowOverride FileInfo must have been specified for the corresponding web space in the central Apache configuration to enable this.
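The selection of a compression tool described in item 3 (prefer an in-process library, fall back to an external gzip command) can be sketched as follows. This is a Python analogy, with Python's gzip module standing in for Compress::Zlib; the function name pick_compressor is made up for this example and the real test in gzip_cnc may work differently.

```python
import shutil
import subprocess

def pick_compressor():
    """Return (name, function) for the first available gzip backend."""
    try:
        import gzip  # in-process library: the preferred choice
        return "library", lambda data: gzip.compress(data)
    except ImportError:
        pass
    program = shutil.which("gzip")  # fall back to an external command
    if program:
        def run_gzip(data):
            # Every call starts a new process, which is the extra
            # cost of this alternative.
            result = subprocess.run([program, "-c"], input=data,
                                    stdout=subprocess.PIPE, check=True)
            return result.stdout
        return "command", run_gzip
    raise RuntimeError("no gzip backend available")
```

Both backends produce a gzip stream, so the rest of the program does not need to know which one was picked.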

At first sight all this may sound awfully complicated, and this combination of requirements may look rather improbable. But UNIX (including Linux variants of any kind), Apache and Perl are a reliable team and available at most providers; permission to run individual CGI scripts and to use .htaccess files (the latter often used for access control to protected areas of the web space) are common add-on features, rarely available for free web space but normally included even in the smallest package offered by commercial providers.
The most likely problem seems to be a restriction on the directives allowed within .htaccess - this should be clarified first, either by asking your provider or simply by testing.

How much additional load will the use of gzip_cnc put on the server?

You wouldn't want to use it on a commercial high-traffic server - simply because in that case you would use mod_gzip anyway.

Basically, you replace each static page access by a call to a CGI script that does little more than read a file's content and serve it to the browser, just like the web server would normally do.
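What "reading a file content and serving it" amounts to for a CGI script can be sketched like this. The code is a Python illustration, not gzip_cnc's Perl source; the function name build_response and the exact set of headers are assumptions (Content-Encoding, Content-Length and Vary are standard HTTP headers, but which ones gzip_cnc sends is not stated here).

```python
import gzip

def build_response(body, content_type="text/html", use_gzip=True):
    """Return the raw bytes a CGI script would write to standard output."""
    headers = [("Content-Type", content_type),
               # Tell caches that the response depends on Accept-Encoding.
               ("Vary", "Accept-Encoding")]
    if use_gzip:
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    headers.append(("Content-Length", str(len(body))))
    head = "".join(f"{name}: {value}\r\n" for name, value in headers)
    # An empty line separates the header block from the body.
    return head.encode("ascii") + b"\r\n" + body
```

A real handler would read the body from the requested file and write the result to standard output in binary mode.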

The trick is that

On the other hand, you pay the price of one CGI invocation for every page request - so you want a fast CGI execution model on your server. On my web space the Apache server is using mod_fastcgi, and even though the machine running this server is merely a Pentium 400 it takes

The logging function of gzip_cnc tells you the values that hold true for your machine.

But in my experience, even though I changed some of the most frequently accessed pages on my web space about once a day during the development phase of gzip_cnc

- and with time this rate should probably decrease (depending on how often you change your documents' content).

So the costs for building up the cache are close to negligible, and only the costs for serving pages, i.e. for invoking the CGI script, really count.
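Why the cache costs so little can be seen from a sketch of the idea: a page is compressed once and the stored result is reused until the original changes. The Python code below is an illustration under an assumption, namely that a cache entry counts as fresh when its modification time is not older than that of the original; gzip_cnc's actual freshness rule and cache layout are not described here, and cached_gzip is a made-up name.

```python
import gzip
import os

def cached_gzip(source_path, cache_path):
    """Return gzip bytes for source_path, recompressing only when stale."""
    try:
        # Assumption: the entry is fresh if it is at least as new as the source.
        fresh = os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    except OSError:
        fresh = False  # no usable cache entry yet
    if fresh:
        with open(cache_path, "rb") as f:
            return f.read()
    with open(source_path, "rb") as f:
        data = gzip.compress(f.read())
    # Create the cache directory on demand, then store the compressed copy.
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        f.write(data)
    return data
```

With this scheme the compression cost is paid once per change of a document, while every further request only reads the stored file.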

How much additional disk space will gzip_cnc's cache use on the server?

A good estimate for the disk space occupied by the cache tree is one third of the disk space used for all the uncompressed original HTML files.

The cache tree will contain two types of objects:

So the exact amount of space your cache tree will use depends slightly on the number of directories you have but mainly on the degree of redundancy inside your documents.

And as a rule of thumb: If your cache needs one third of the original web space (including the overhead for the directories) then your visitors will be served about one third of the original traffic volume (including the overhead for the HTTP headers) and thus experience transfer times roughly three times shorter than normal.

Your mileage may of course vary if your web space offers a large percentage of file types other than static HTML.
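Whether the one-third rule of thumb holds for your own pages can be checked with a few lines of code. The following Python sketch (not part of gzip_cnc; compressed_fraction is a made-up name) computes which fraction of the original size remains after gzip compression. It ignores the overhead for directories and HTTP headers mentioned above.

```python
import gzip

def compressed_fraction(documents):
    """Fraction of the original size that remains after gzip.

    `documents` is an iterable of byte strings, one per page.
    """
    original = 0
    compressed = 0
    for data in documents:
        original += len(data)
        compressed += len(gzip.compress(data))
    return compressed / original if original else 0.0
```

A result near 0.33 matches the rule of thumb; pages with a lot of repeated markup will come out lower, pages with little redundancy higher.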

(Michael Schröpl, 2004-01-11)