Fossil

Artifact [1213785a]
Login

Artifact [1213785a]

Artifact 1213785a5b889c2eaf6bfbe05836f2e253a6fdd7a4de14b36dbdf248c178ca9d:


<title>How CGI Works In Fossil</title>
<h2>Introduction</h2><blockquote>
<p>CGI or "Common Gateway Interface" is a venerable yet reliable technique for
generating dynamic web content.  This article gives a quick background on how
CGI works and describes how Fossil can act as a CGI service.
<p>This is a "how it works" guide.  If you just want to set up Fossil
as a CGI server, see the [./server/ | Fossil Server Setup] page.
</blockquote>
<h2>A Quick Review Of CGI</h2><blockquote>
<p>
An HTTP request is a block of text that is sent by a client application
(usually a web browser) and arrives at the web server over a network
connection.  The HTTP request contains a URL that describes the information
being requested.  The URL in the HTTP request is typically the same URL
that appears in the URL bar at the top of the web browser that is making
the request.  The URL might contain a "?" character followed
query parameters.  The HTTP will usually also contain other information
such as the name of the application that made the request, whether or
not the requesting application can except a compressed reply, POST
parameters from forms, and so forth.
<p>
The job of the web server is to interpret the HTTP request and formulate
an appropriate reply.
The web server is free to interpret the HTTP request in any way it wants.
But most web servers follow a similar pattern, described below.
(Note: details may vary from one web server to another.)
<p>
Suppose the URL in the HTTP request looks like this:
<blockquote><b>/one/two/timeline/four</b></blockquote>
Most web servers will search their content area for files that match
some prefix of the URL.  The search starts with <b>/one</b>, then goes to
<b>/one/two</b>, then <b>/one/two/timeline</b>, and finally
<b>/one/two/timeline/four</b> is checked.  The search stops at the first
match.
<p>
Suppose the first match is <b>/one/two</b>.  If <b>/one/two</b> is an
ordinary file in the content area, then that file is returned as static
content.  The "<b>/timeline/four</b>" suffix is silently ignored.
<p>
If <b>/one/two</b> is a CGI script (or program), then the web server
executes the <b>/one/two</b> script.  The output generated by
the script is collected and repackaged as the HTTP reply.
<p>
Before executing the CGI script, the web server will set up various
environment variables with information useful to the CGI script:
<table border=1 cellpadding=5>
<tr><th>Environment<br>Variable<th>Meaning
<tr><td>GATEWAY_INTERFACE<td>Always set to "CGI/1.0"
<tr><td>REQUEST_URI
    <td>The input URL from the HTTP request.
<tr><td>SCRIPT_NAME
    <td>The prefix of the input URL that matches the CGI script name.
    In this example: "/one/two".
<tr><td>PATH_INFO
    <td>The suffix of the URL beyond the name of the CGI script.
    In this example: "timeline/four".
<tr><td>QUERY_STRING
    <td>The query string that follows the "?" in the URL, if there is one.
</table>
<p>
There are other CGI environment variables beyond those listed above.
Many Fossil servers implement the
[https://www.fossil-scm.org/fossil/test_env/two/three?abc=xyz|test_env]
webpage that shows some of the CGI environment
variables that Fossil pays attention to.
<p>
In addition to setting various CGI environment variables, if the HTTP
request contains POST content, then the web server relays the POST content
to standard input of the CGI script.
<p>
In summary, the task of the
CGI script is to read the various CGI environment variables and
the POST content on standard input (if any), figure out an appropriate
reply, then write that reply on standard output.
The web server will read the output from the CGI script, reformat it
into an appropriate HTTP reply, and relay the result back to the
requesting application.
The CGI script exits as soon as it generates a single reply.
The web server will (usually) persist and handle multiple HTTP requests,
but a CGI script handles just one HTTP request and then exits.
<p>
The above is a rough outline of how CGI works.
There are many details omitted from this brief discussion.
See other on-line CGI tutorials for further information.
</blockquote>
<h2>How Fossil Acts As A CGI Program</h2>
<blockquote>
An appropriate CGI script for running Fossil will look something
like the following:
<blockquote><pre>
#!/usr/bin/fossil
repository: /home/www/repos/project.fossil
</pre></blockquote>
The first line of the script is a
"[https://en.wikipedia.org/wiki/Shebang_%28Unix%29|shebang]"
that tells the operating system what program to use as the interpreter
for this script.  On unix, when you execute a script that starts with
a shebang, the operating system runs the program identified by the
shebang with a single argument that is the full pathname of the script
itself.
In our example, the interpreter is Fossil, and the argument might
be something like "/var/www/cgi-bin/one/two" (depending on how your
particular web server is configured).
<p>
The Fossil program that is run as the script interpreter
is the same Fossil that runs when
you type ordinary Fossil commands like "fossil sync" or "fossil commit".
But in this case, as soon as it launches, the Fossil program
recognizes that the GATEWAY_INTERFACE environment variable is
set to "CGI/1.0" and it therefore knows that it is being used as
CGI rather than as an ordinary command-line tool, and behaves accordingly.
<p>
When Fossil recognizes that it is being run as CGI, it opens and reads
the file identified by its sole argument (the file named by
<code>argv&#91;1&#93;</code>).  In our example, the second line of that file
tells Fossil the location of the repository it will be serving.
Fossil then starts looking at the CGI environment variables to figure
out what web page is being requested, generates that one web page,
then exits.
<p>
Usually, the webpage being requested is the first term of the
PATH_INFO environment variable.  (Exceptions to this rule are noted
in the sequel.)  For our example, the first term of PATH_INFO
is "timeline", which means that Fossil will generate
the [/help?cmd=/timeline|/timeline] webpage.
<p>
With Fossil, terms of PATH_INFO beyond the webpage name are converted into
the "name" query parameter.  Hence, the following two URLs mean
exactly the same thing to Fossil:
<ol type='A'>
<li> [https://www.fossil-scm.org/fossil/info/c14ecc43]
<li> [https://www.fossil-scm.org/fossil/info?name=c14ecc43]
</ol>
In both cases, the CGI script is called "/fossil".  For case (A),
the PATH_INFO variable will be "info/c14ecc43" and so the
"[/help?cmd=/info|/info]" webpage will be generated and the suffix of
PATH_INFO will be converted into the "name" query parameter, which
identifies the artifact about which information is requested.
In case (B), the PATH_INFO is just "info", but the same "name"
query parameter is set explicitly by the URL itself.
</blockquote>
<h2>Serving Multiple Fossil Repositories From One CGI Script</h2>
<blockquote>
The previous example showed how to serve a single Fossil repository
using a single CGI script.
On a website that wants to serve multiple repositories, one could
simply create multiple CGI scripts, one script for each repository.
But it is also possible to serve multiple Fossil repositories from
a single CGI script.
<p>
If the CGI script for Fossil contains a "directory:" line instead of
a "repository:" line, then the argument to "directory:" is the name
of a directory that contains multiple repository files, each ending
with ".fossil".  For example:
<blockquote><pre>
#!/usr/bin/fossil
directory: /home/www/repos
</pre></blockquote>
Suppose the /home/www/repos directory contains files named
<b>one.fossil</b>, <b>two.fossil</b>, and <b>subdir/three.fossil</b>.
Further suppose that the name of the CGI script (relative to the root
of the webserver document area) is "cgis/example2".  Then to
see the timeline for the "three.fossil" repository, the URL would be:
<blockquote>
<b>http://example.com/cgis/example2/subdir/three/timeline</b>
</blockquote>
Here is what happens:
<ol>
<li> The input URI on the HTTP request is
     <b>/cgis/example2/subdir/three/timeline</b>
<li> The web server searches prefixes of the input URI until it finds
     the "cgis/example2" script.  The web server then sets
     PATH_INFO to the "subdir/three/timeline" suffix and invokes the
     "cgis/example2" script.
<li> Fossil runs and sees the "directory:" line pointing to
     "/home/www/repos".  Fossil then starts pulling terms off the
     front of the PATH_INFO looking for a repository.  It first looks
     at "/home/www/resps/subdir.fossil" but there is no such repository.
     So then it looks at "/home/www/repos/subdir/three.fossil" and finds
     a repository.  The PATH_INFO is shortened by removing
     "subdir/three/" leaving it at just "timeline".
<li> Fossil looks at the rest of PATH_INFO to see that the webpage
     requested is "timeline".
</ol>
</blockquote>
<h2>Additional CGI Script Options</h2>
<blockquote>
<p>
The CGI script can have additional options used to fine-tune
Fossil's behavior.  See the [./cgi.wiki|CGI script documentation]
for details.
</p>
</blockquote>
<h2>Additional Observations</h2>
<blockquote><ol type="I">
<li><p>
Fossil does not distinguish between the various HTTP methods (GET, PUT,
DELETE, etc).  Fossil figures out what it needs to do purely from the
webpage term of the URI.
<li><p>
Fossil does not distinguish between query parameters that are part of the
URI, application/x-www-form-urlencoded or multipart/form-data encoded
parameter that are part of the POST content, and cookies.  Each information
source is seen as a space of key/value pairs which are loaded into an
internal property hash table.  The code that runs to generate the reply
can then reference various properties values.
Fossil does not care where the value of each property comes from (POST
content, cookies, or query parameters) only that the property exists
and has a value.
<li><p>
The "[/help?cmd=ui|fossil ui]" and "[/help?cmd=server|fossil server]" commands
are implemented using a simple built-in web server that accepts incoming HTTP
requests, translates each request into a CGI invocation, then creates a
separate child Fossil process to handle each request.  In other words, CGI
is used internally to implement "fossil ui/server".
<p>
SCGI is processed using the same built-in web server, just modified
to parse SCGI requests instead of HTTP requests.  Each SCGI request is
converted into CGI, then Fossil creates a separate child Fossil
process to handle each CGI request.
<li><p>
Fossil is itself often launched using CGI.  But Fossil can also then
turn around and launch [./serverext.wiki|sub-CGI scripts to implement
extensions].
</ol>
</blockquote>