Fossil: Artifact [32bbb5df]

Artifact 32bbb5df33013ccfcfe02fb5a678520701084014a2bcf0978326e548743769d4:

File www/stats.wiki — part of check-in [b46141f4] at 2018-06-09 16:37:44 on branch trunk — Update secondary mention of SQLite compression ratio to match table (user: andygoth size: 5113) [more...]
<title>Fossil Performance</title>
<h1 align="center">Performance Statistics</h1>

The questions will inevitably arise:  How does Fossil perform?
Does it use a lot of disk space or bandwidth?  Is it scalable?

In an attempt to answers these questions, this report looks at several
projects that use fossil for configuration management and examines how
well they are working.  The following table is a summary of the results.
(Last updated on 2018-06-04.)
Explanation and analysis follows the table.

<table border=1>
<tr>
<th>Project</th>
<th>Number Of Artifacts</th>
<th>Number Of Check-ins</th>
<th>Project&nbsp;Duration<br>(as of 2018-06-04)</th>
<th>Uncompressed Size</th>
<th>Repository Size</th>
<th>Compression Ratio</th>
<th>Clone Bandwidth</th>
</tr>

<tr align="center">
<td>[http://www.sqlite.org/src/timeline | SQLite]
<td>77492
<td>20686
<td>6580&nbsp;days<br>18.02&nbsp;years
<td>5.6&nbsp;GB
<td>70.0&nbsp;MB
<td>80:1
<td>51.1&nbsp;MB
</tr>

<tr align="center">
<td>[http://core.tcl.tk/tcl/timeline | TCL]
<td>161991
<td>23146
<td>7375&nbsp;days<br>20.19&nbsp;years
<td>8.0&nbsp;GB
<td>222.0&nbsp;MB
<td>36:1
<td>150.5&nbsp;MB
</tr>

<tr align="center">
<td>[/timeline | Fossil]
<td>39148
<td>11266
<td>3971&nbsp;days<br>10.87&nbsp;years
<td>3.8&nbsp;GB
<td>42.0&nbsp;MB
<td>90:1
<td>27.4&nbsp;MB
</tr>

<tr align="center">
<td>[http://www.sqlite.org/slt/timeline | SLT]
<td>2384
<td>169
<td>3474&nbsp;days<br>9.51&nbsp;years
<td>2.1&nbsp;GB
<td>145.9&nbsp;MB
<td>14:1
<td>143.4&nbsp;MB
</tr>

<tr align="center">
<td>[http://www.sqlite.org/th3.html | TH3]
<td>12406
<td>3718
<td>3539&nbsp;days<br>9.69&nbsp;years
<td>544&nbsp;MB
<td>18.0&nbsp;MB
<td>30:1
<td>14.7&nbsp;MB
</tr>

<tr align="center">
<td>[http://www.sqlite.org/docsrc/timeline | SQLite Docs]
<td>8752
<td>2783
<td>3857&nbsp;days<br>10.56&nbsp;years
<td>349.9&nbsp;MB
<td>16.3&nbsp;MB
<td>21:1
<td>13.57&nbsp;MB
</tr>

</table>

<h2>Measured Attributes</h2>

In Fossil, every version of every file, every wiki page, every change to
every ticket, and every check-in is a separate "artifact".  One way to
think of a Fossil project is as a bag of artifacts.  Of course, there is
a lot more than this going on in Fossil.  Many of the artifacts have meaning
and are related to other artifacts.  But at a low level (for example when
synchronizing two instances of the same project) the only thing that matters
is the unordered collection of artifacts.  In fact, one of the key
characteristics of Fossil is that the entire project history can be
reconstructed simply by scanning the artifacts in an arbitrary order.

The number of check-ins is the number of times that the "commit" command
has been run.  A single check-in might change a 3 or 4 files, or it might
change dozens or hundreds of files.  Regardless of the number of files
changed, it still only counts as one check-in.

The "Uncompressed Size" is the total size of all the artifacts within
the repository assuming they were all uncompressed and stored
separately on the disk.  Fossil makes use of delta compression between related
versions of the same file, and then uses zlib compression on the resulting
deltas.  The total resulting repository size is shown after the uncompressed
size.

On the right end of the table, we show the "Clone Bandwidth".  This is the
total number of bytes sent from server back to the client.  The number of
bytes sent from client to server is negligible in comparison.
These byte counts include HTTP protocol overhead.

In the table and throughout this article,
"GB" means gigabytes (10<sup><small>9</small></sup> bytes)
not <a href="http://en.wikipedia.org/wiki/Gibibyte">gibibytes</a>
(2<sup><small>30</small></sup> bytes).  Similarly, "MB" and "KB"
means megabytes and kilobytes, not mebibytes and kibibytes.

<h2>Analysis And Supplemental Data</h2>

Perhaps the two most interesting datapoints in the above table are SQLite
and SLT.  SQLite is a long-running project with long revision chains.
Some of the files in SQLite have been edited over a thousand times.
Each of these edits is stored as a delta, and hence the SQLite project
gets excellent 80:1 compression.  SLT, on the other hand, consists of
many large (megabyte-sized) SQL scripts that have one or maybe two
edits each.  There is very little delta compression occurring and so the
overall repository compression ratio is much lower.  Note also that
quite a bit more bandwidth is required to clone SLT than SQLite.

For the first nine years of its development, SQLite was versioned by CVS.
The resulting CVS repository measured over 320MB in size.  So, the
developers were surprised to see that the equivalent Fossil project (the
first nine years on SQLite) would clone with only 13MB of bandwidth.
The "sync" protocol
used by fossil has turned out to be surprisingly efficient.  A typical
check-in on SQLite might use 3 or 4KB of network bandwidth.
For example, the [04eef9522386a59e] check-in used a single HTTP request
of 2099 bytes and got back a reply of 1116 bytes.
The sync protocol is efficient enough that, once cloned,
Fossil can easily be used over a dial-up connection.