Fossil

Check-in [796db898]

Overview
Comment: Documentation updates - remove references to SHA1 as the only available hash algorithm.
SHA1: 796db898c7c5b2ccedf5bf3c36912ac56608baa1
User & Date: drh 2017-03-02 00:10:41
Context
2017-03-02
01:28
Fix harmless compiler warnings in the new hardened SHA1 implementation. check-in: ed30b3d6 user: drh tags: trunk
00:10
Documentation updates - remove references to SHA1 as the only available hash algorithm. check-in: 796db898 user: drh tags: trunk
2017-03-01
22:26
Fix a redundant #include in sha1hard.c. check-in: 0f5dc21e user: drh tags: trunk
Changes

Changes to www/branching.wiki.

<tr><td align="center">
<img src="branch01.gif" width=280 height=68><br>
Figure 1
</td></tr></table>

Each circle represents a check-in.  For the sake of clarity, the check-ins
are given small consecutive numbers.  In a real system, of course, the
check-in numbers would be 40-character SHA1 hashes since it is not possible
to allocate collision-free sequential numbers in a distributed system.
But as sequential numbers are easier to read, we will substitute them for
the 40-character SHA1 hashes in this document.

The arrows in figure 1 show the evolution of a project.  The initial
check-in is 1.  Check-in 2 is derived from 1.  In other words, check-in 2
was created by making edits to check-in 1 and then committing those edits.
We say that 2 is a <i>child</i> of 1
and that 1 is a <i>parent</i> of 2.
Check-in 3 is derived from check-in 2, making
................................................................................

The initial check-in of every repository has two propagating tags.  In
figure 5, that initial check-in is check-in 1.  The <b>branch</b> tag
tells (by its value)  what branch the check-in is a member of.
The default branch is called "trunk."  All tags that begin with "<b>sym-</b>"
are symbolic name tags.  When a symbolic name tag is attached to a
check-in, that allows you to refer to that check-in by its symbolic
name rather than by its 40-character SHA1 hash name.  When a symbolic name
tag propagates (as does the <b>sym-trunk</b> tag) then referring to that
name is the same as referring to the most recent check-in with that name.
Thus the two tags on check-in 1 cause all descendants to be in the
"trunk" branch and to have the symbolic name "trunk."

Check-in 4 has a <b>branch</b> tag which changes the name of the branch
to "test."  The branch tag on check-in 4 propagates to check-ins 6 and 9.







<tr><td align="center">
<img src="branch01.gif" width=280 height=68><br>
Figure 1
</td></tr></table>

Each circle represents a check-in.  For the sake of clarity, the check-ins
are given small consecutive numbers.  In a real system, of course, the
check-in numbers would be long hexadecimal hashes since it is not possible
to allocate collision-free sequential numbers in a distributed system.
But as sequential numbers are easier to read, we will substitute them for
the long hashes in this document.

The arrows in figure 1 show the evolution of a project.  The initial
check-in is 1.  Check-in 2 is derived from 1.  In other words, check-in 2
was created by making edits to check-in 1 and then committing those edits.
We say that 2 is a <i>child</i> of 1
and that 1 is a <i>parent</i> of 2.
Check-in 3 is derived from check-in 2, making
................................................................................

The initial check-in of every repository has two propagating tags.  In
figure 5, that initial check-in is check-in 1.  The <b>branch</b> tag
tells (by its value)  what branch the check-in is a member of.
The default branch is called "trunk."  All tags that begin with "<b>sym-</b>"
are symbolic name tags.  When a symbolic name tag is attached to a
check-in, that allows you to refer to that check-in by its symbolic
name rather than by its hexadecimal hash name.  When a symbolic name
tag propagates (as does the <b>sym-trunk</b> tag) then referring to that
name is the same as referring to the most recent check-in with that name.
Thus the two tags on check-in 1 cause all descendants to be in the
"trunk" branch and to have the symbolic name "trunk."

Check-in 4 has a <b>branch</b> tag which changes the name of the branch
to "test."  The branch tag on check-in 4 propagates to check-ins 6 and 9.

Changes to www/checkin_names.wiki.


<table align="right" border="1" width="33%" cellpadding="10">
<tr><td>
<h3>Executive Summary</h3>
<p>A check-in can be identified using any of the following
names:
<ul>
<li> SHA1 hash prefix
<li> Tag or branchname
<li> Timestamp:  <i>YYYY-MM-DD HH:MM:SS</i>
<li> <i>tag-name</i> <big><b>:</b></big> <i>timestamp</i>
<li> <b>root :</b> <i>branchname</i>
<li> Special names:
<ul>
<li> <b>tip</b>
................................................................................
determines which version of the documentation to display.

Fossil provides a variety of ways to specify a check-in.  This
document describes the various methods.

<h2>Canonical Check-in Name</h2>

The canonical name of a check-in is the SHA1 hash of its
[./fileformat.wiki#manifest | manifest] expressed as a 40-character
lowercase hexadecimal number.  For example:

<blockquote><pre>
fossil info e5a734a19a9826973e1d073b49dc2a16aa2308f9
</pre></blockquote>

The full 40-character SHA1 hash is unwieldy to remember and type, though,
so Fossil also accepts a unique prefix of the hash, using any combination
of upper and lower case letters, as long as the prefix is at least 4
characters long.  Hence the following commands all
accomplish the same thing as the above:

<blockquote><pre>
fossil info e5a734a19a9








<table align="right" border="1" width="33%" cellpadding="10">
<tr><td>
<h3>Executive Summary</h3>
<p>A check-in can be identified using any of the following
names:
<ul>
<li> Cryptographic hash prefix
<li> Tag or branchname
<li> Timestamp:  <i>YYYY-MM-DD HH:MM:SS</i>
<li> <i>tag-name</i> <big><b>:</b></big> <i>timestamp</i>
<li> <b>root :</b> <i>branchname</i>
<li> Special names:
<ul>
<li> <b>tip</b>
................................................................................
determines which version of the documentation to display.

Fossil provides a variety of ways to specify a check-in.  This
document describes the various methods.

<h2>Canonical Check-in Name</h2>

The canonical name of a check-in is the hash of its
[./fileformat.wiki#manifest | manifest] expressed as a 40-or-more character
lowercase hexadecimal number.  For example:

<blockquote><pre>
fossil info e5a734a19a9826973e1d073b49dc2a16aa2308f9
</pre></blockquote>

The full 40+ character hash is unwieldy to remember and type, though,
so Fossil also accepts a unique prefix of the hash, using any combination
of upper and lower case letters, as long as the prefix is at least 4
characters long.  Hence the following commands all
accomplish the same thing as the above:

<blockquote><pre>
fossil info e5a734a19a9

Changes to www/customskin.md.

       without the leading "/" and without query parameters.
       Examples:  "timeline", "doc/trunk/README.txt", "wiki".

   *   **csrf_token** - A token used to prevent cross-site request forgery.

   *   **release_version** - The release version of Fossil.  Ex: "1.31"

   *   **manifest_version** - A prefix on the SHA1 check-in hash of the
       specific version of fossil that is running.  Ex: "\[47bb6432a1\]"

   *   **manifest_date** - The date of the source-code check-in for the
       version of fossil that is running.

   *   **compiler_name** - The name and version of the compiler used to
       build the fossil executable.







       without the leading "/" and without query parameters.
       Examples:  "timeline", "doc/trunk/README.txt", "wiki".

   *   **csrf_token** - A token used to prevent cross-site request forgery.

   *   **release_version** - The release version of Fossil.  Ex: "1.31"

   *   **manifest_version** - A prefix on the check-in hash of the
       specific version of fossil that is running.  Ex: "\[47bb6432a1\]"

   *   **manifest_date** - The date of the source-code check-in for the
       version of fossil that is running.

   *   **compiler_name** - The name and version of the compiler used to
       build the fossil executable.

Changes to www/fileformat.wiki.

<h2>1.0 Artifact Names</h2>

Each artifact in the repository is named by a hash of its content.
No prefixes, suffixes, or other information is added to an artifact before
the hash is computed.  The artifact name is just the (lower-case
hexadecimal) hash of the raw artifact.

Fossil supports multiple hash algorithms including SHA1 and various
lengths of SHA3.  Because an artifact can be hashed using multiple algorithms,
a single artifact can have multiple names.  Usually, Fossil knows
each artifact by just a single name called the "display name".  But it is
possible for Fossil to know an artifact by multiple names from different
hashes.  In that case, Fossil uses the display name for output, but continues
to accept the alternative names as command-line arguments or as parameters to
webpage URLs.

When referring to artifacts in tty commands or webpage URLs, it is
sufficient to specify a unique prefix for the artifact name.  If the input
prefix is not unique, Fossil will show an error.  Within a structural
artifact, however, all references to other artifacts must be the complete
hash.

................................................................................
Prior to Fossil version 2.0, all names were formed from the SHA1 hash of
the artifact.  The key innovation in Fossil 2.0 was adding support for
alternative hash algorithms.

<a name="structural"></a>
<h2>2.0 Structural Artifacts</h2>

A structural artifact is an artifact that has a particular format and
that is used to define the relationships between other artifacts in the
repository.
Fossil recognizes the following kinds of structural
artifacts:

<ul>
<li> [#manifest | Manifests] </li>







<h2>1.0 Artifact Names</h2>

Each artifact in the repository is named by a hash of its content.
No prefixes, suffixes, or other information is added to an artifact before
the hash is computed.  The artifact name is just the (lower-case
hexadecimal) hash of the raw artifact.

Fossil currently computes artifact names using either SHA1 or SHA3-256.  It
is relatively easy to add new algorithms in the future, but there are no
plans to do so at this time.

When referring to artifacts in tty commands or webpage URLs, it is
sufficient to specify a unique prefix for the artifact name.  If the input
prefix is not unique, Fossil will show an error.  Within a structural
artifact, however, all references to other artifacts must be the complete
hash.
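
The naming rule and the prefix rule above can be sketched in a few lines of Python. This is an illustration only, not Fossil's implementation: the function names `artifact_names` and `resolve_prefix` are hypothetical, and folding the prefix to lower case is an assumption taken from the check-in naming text elsewhere in this change.

```python
import hashlib

def artifact_names(content: bytes) -> dict:
    # Hash the raw bytes exactly as stored: no prefix, suffix, or header is added.
    return {
        "sha1": hashlib.sha1(content).hexdigest(),          # 40 hex digits
        "sha3-256": hashlib.sha3_256(content).hexdigest(),  # 64 hex digits
    }

def resolve_prefix(prefix: str, known_names: list) -> str:
    # Accept only a prefix that selects exactly one known name.
    # Lower-casing the input is an assumption of this sketch.
    wanted = prefix.lower()
    matches = [name for name in known_names if name.startswith(wanted)]
    if len(matches) != 1:
        raise LookupError(f"prefix {prefix!r} matches {len(matches)} artifacts")
    return matches[0]
```

A caller would pass the full list of names it knows about. A prefix that matches zero names or more than one name raises `LookupError`, which mirrors the "not unique" error described above.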

................................................................................
Prior to Fossil version 2.0, all names were formed from the SHA1 hash of
the artifact.  The key innovation in Fossil 2.0 was adding support for
alternative hash algorithms.

<a name="structural"></a>
<h2>2.0 Structural Artifacts</h2>

A structural artifact is an artifact with a particular format
that is used to define the relationships between other artifacts in the
repository.
Fossil recognizes the following kinds of structural
artifacts:

<ul>
<li> [#manifest | Manifests] </li>

Changes to www/fossil-v-git.wiki.

website using Fossil can be done in minutes, whereas doing the same using
Git requires hours or days.

<h3>3.2 Database</h3>

The baseline data structures for Fossil and Git are the same (modulo
formatting details).  Both systems store check-ins as immutable
objects referencing their immediate ancestors and named by their SHA1 hash.


The difference is that Git stores its objects as individual files
in the ".git" folder or compressed into
bespoke "pack-files", whereas Fossil stores its objects in a
relational ([https://www.sqlite.org/|SQLite]) database file.  To put it
another way, Git uses an ad-hoc pile-of-files key/value database whereas
Fossil uses a proven, general-purpose SQL database.  This







website using Fossil can be done in minutes, whereas doing the same using
Git requires hours or days.

<h3>3.2 Database</h3>

The baseline data structures for Fossil and Git are the same (modulo
formatting details).  Both systems store check-ins as immutable
objects referencing their immediate ancestors and named by a
cryptographic hash of the check-in content.

The difference is that Git stores its objects as individual files
in the ".git" folder or compressed into
bespoke "pack-files", whereas Fossil stores its objects in a
relational ([https://www.sqlite.org/|SQLite]) database file.  To put it
another way, Git uses an ad-hoc pile-of-files key/value database whereas
Fossil uses a proven, general-purpose SQL database.  This

Changes to www/pop.wiki.

repositories are fully synchronized).  The local state
for each repository is private to that repository.
The global state represents the content of the project.
The local state identifies the authorized users and
access policies for a particular repository.</p></li>

<li><p>The global state of a repository is an unordered
collection of artifacts.  Each artifact is named by
its SHA1 hash encoded in lowercase hexadecimal.
In many contexts, the name can be
abbreviated to a unique prefix.  A five- or six-character
prefix usually suffices to uniquely identify a file.</p></li>

<li><p>Because artifacts are named by their SHA1 hash, all artifacts
are immutable.  Any change to the content of an artifact also
changes the hash that forms the artifact's name, thus
creating a new artifact.  Both the old original version of the
artifact and the new change are preserved under different names.</p></li>

<li><p>It is theoretically possible for two artifacts with different
content to share the same hash.  But finding two such
artifacts is so incredibly difficult and unlikely that we
consider it to be an impossibility.</p></li>

<li><p>The signature of an artifact is the SHA1 hash of the
artifact itself, exactly as it would appear in a disk file.  No prefix
or meta-information about the artifact is added before computing
the hash.  So you can
always find the SHA1 signature of a file by using the
"sha1sum" command-line utility.</p></li>

<li><p>The artifacts that comprise the global state of a repository
are the complete global state of that repository.  The SQLite
database that holds the repository contains additional information
about linkages between artifacts, but all of that added information
can be discarded and reconstructed by rescanning the content
artifacts.</p></li>







repositories are fully synchronized).  The local state
for each repository is private to that repository.
The global state represents the content of the project.
The local state identifies the authorized users and
access policies for a particular repository.</p></li>

<li><p>The global state of a repository is an unordered
collection of artifacts.  Each artifact is named by a
cryptographic hash (SHA1 or SHA3-256) encoded in
lowercase hexadecimal.
In many contexts, the name can be
abbreviated to a unique prefix.  A five- or six-character
prefix usually suffices to uniquely identify a file.</p></li>

<li><p>Because artifacts are named by a cryptographic hash, all artifacts
are immutable.  Any change to the content of an artifact also
changes the hash that forms the artifact's name, thus
creating a new artifact.  Both the old original version of the
artifact and the new change are preserved under different names.</p></li>

<li><p>It is theoretically possible for two artifacts with different
content to share the same hash.  But finding two such
artifacts is so incredibly difficult and unlikely that we
consider it to be an impossibility.</p></li>

<li><p>The signature of an artifact is the cryptographic hash of the
artifact itself, exactly as it would appear in a disk file.  No prefix
or meta-information about the artifact is added before computing
the hash.  So you can
always find the signature of a file by using the
"sha1sum" or "sha3sum" or similar command-line utilities.</p></li>

<li><p>The artifacts that comprise the global state of a repository
are the complete global state of that repository.  The SQLite
database that holds the repository contains additional information
about linkages between artifacts, but all of that added information
can be discarded and reconstructed by rescanning the content
artifacts.</p></li>

Changes to www/selfcheck.wiki.

recoverable, fossil makes sure it can extract an exact replica
of every content file that it changes just prior to transaction
commit.  So during the course of check-in (or other repository
operation) many different files
in the repository might be modified.  Some files are simply
compressed.  Other files are delta encoded and then compressed.
While all this is going on, fossil makes a record of every file
that is encoded and the SHA1 hash of the original content of that
file.  Then just before transaction commit, fossil re-extracts
the original content of all files that were written, computes
the SHA1 checksum again, and verifies that the checksums match.
If anything does not match up, an error
message is printed and the transaction rolls back.

So, in other words, fossil always checks to make sure it can
re-extract a file before it commits a change to that file.
Hence bugs in fossil are unlikely to corrupt the repository in
a way that prevents us from extracting historical versions of
................................................................................
Manifest artifacts that define a check-in have two fields (the
R-card and Z-card) that record MD5 hashes of the manifest itself
and of all other files in the manifest.  Prior to any check-in
commit, these checksums are verified to ensure that the check-in
agrees exactly with what is on disk.  Similarly,
the repository checksum is verified after a checkout to make
sure that the entire repository was checked out correctly.
Note that these added checks use a different hash (MD5 instead
of SHA1) in order to avoid common-mode failures in the hash
algorithm implementation.


<h2>Checksums On Control Artifacts And Deltas</h2>

Every [./fileformat.wiki | control artifact] in a fossil repository
contains a "Z-card" bearing an MD5 checksum over the rest of the







recoverable, fossil makes sure it can extract an exact replica
of every content file that it changes just prior to transaction
commit.  So during the course of check-in (or other repository
operation) many different files
in the repository might be modified.  Some files are simply
compressed.  Other files are delta encoded and then compressed.
While all this is going on, fossil makes a record of every file
and the SHA1 or SHA3-256 hash of the original content of that
file.  Then just before transaction commit, fossil re-extracts
the original content of all files that were written, recomputes
the hash, and verifies that the recomputed hash still matches.
If anything does not match up, an error
message is printed and the transaction rolls back.
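
The check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Fossil's code: `store_with_selfcheck` is a hypothetical name, zlib stands in for Fossil's compression and delta encoding, SHA3-256 is used as the recorded hash, and raising `ValueError` stands in for rolling back the transaction.

```python
import hashlib
import zlib

def store_with_selfcheck(files: dict, compress=zlib.compress) -> dict:
    # files maps a file name to its original content (bytes).
    recorded = {}  # name -> hash of the original content
    pending = {}   # name -> encoded bytes, not yet committed
    for name, content in files.items():
        recorded[name] = hashlib.sha3_256(content).hexdigest()
        pending[name] = compress(content)

    # Just before "commit": re-extract every file and recompute its hash.
    for name, blob in pending.items():
        extracted = zlib.decompress(blob)
        if hashlib.sha3_256(extracted).hexdigest() != recorded[name]:
            # Stand-in for printing an error and rolling back.
            raise ValueError(f"self-check failed for {name}")
    return pending
```

The `compress` parameter exists only so that a deliberately faulty encoder can be passed in to show the check firing. With the default encoder the function returns the encoded store unchanged.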

So, in other words, fossil always checks to make sure it can
re-extract a file before it commits a change to that file.
Hence bugs in fossil are unlikely to corrupt the repository in
a way that prevents us from extracting historical versions of
................................................................................
Manifest artifacts that define a check-in have two fields (the
R-card and Z-card) that record MD5 hashes of the manifest itself
and of all other files in the manifest.  Prior to any check-in
commit, these checksums are verified to ensure that the check-in
agrees exactly with what is on disk.  Similarly,
the repository checksum is verified after a checkout to make
sure that the entire repository was checked out correctly.
Note that these added checks use a different hash algorithm (MD5)
in order to avoid common-mode failures in the hash
algorithm implementation.


<h2>Checksums On Control Artifacts And Deltas</h2>

Every [./fileformat.wiki | control artifact] in a fossil repository
contains a "Z-card" bearing an MD5 checksum over the rest of the

Changes to www/shunning.wiki.

     disrupting the operation of Fossil.

<h2>Shunning</h2>

Fossil provides a mechanism called "shunning" for removing content from
a repository.

Every Fossil repository maintains a list of the SHA1 hash names of
"shunned" artifacts.
Fossil will refuse to push or pull any shunned artifact.
Furthermore, all shunned artifacts (but not the shunning list
itself) are removed from the
repository whenever the repository is reconstructed using the
"rebuild" command.








     disrupting the operation of Fossil.

<h2>Shunning</h2>

Fossil provides a mechanism called "shunning" for removing content from
a repository.

Every Fossil repository maintains a list of the hash names of
"shunned" artifacts.
Fossil will refuse to push or pull any shunned artifact.
Furthermore, all shunned artifacts (but not the shunning list
itself) are removed from the
repository whenever the repository is reconstructed using the
"rebuild" command.

Changes to www/sync.wiki.


<p>This document describes the wire protocol used to synchronize
content between two Fossil repositories.</p>

<h2>1.0 Overview</h2>

<p>The global state of a fossil repository consists of an unordered
collection of artifacts.  Each artifact is identified by its SHA1 hash
expressed as a 40-character lower-case hexadecimal string.
Synchronization is the process of sharing artifacts between
servers so that all servers have copies of all artifacts.  Because
artifacts are unordered, the order in which artifacts are received
at a server is inconsequential.  It is assumed that the SHA1 hashes
of artifacts are unique - that every artifact has a different SHA1 hash.
To a first approximation, synchronization proceeds by sharing lists of
SHA1 hashes of available artifacts, then sharing those artifacts that
are not found on one side or the other of the connection.  In practice,
a repository might contain millions of artifacts.  The list of
SHA1 hashes for this many artifacts can be large.  So optimizations are
employed that usually reduce the number of SHA1 hashes that need to be
shared to a few hundred.</p>

<p>Each repository also has local state.  The local state determines
the web-page formatting preferences, authorized users, ticket formats,
and similar information that varies from one repository to another.
The local state is not usually transferred during a sync.  Except,
some local state is transferred during a [/help?cmd=clone|clone]
................................................................................
or the artifact delta is the first <i>size</i> bytes of the
x-fossil content that immediately follow the newline that
terminates the file card.
</p>

<p>The first argument of a file card is the ID of the artifact that
is being transferred.  The artifact ID is the lower-case hexadecimal
representation of the SHA1 hash of the artifact.
The last argument of the file card is the number of bytes of
payload that immediately follow the file card.  If the file
card has only two arguments, that means the payload is the
complete content of the artifact.  If the file card has three
arguments, then the payload is a delta and the second argument is
the ID of another artifact that is the source of the delta.</p>

................................................................................
<blockquote>
<b>cfile</b> <i>artifact-id usize csize</i> <b>\n</b> <i>content</i><br>
<b>cfile</b> <i>artifact-id delta-artifact-id usize csize</i> <b>\n</b> <i>content</i><br>
</blockquote>

<p>The first argument of the cfile card is the ID of the artifact that
is being transferred.  The artifact ID is the lower-case hexadecimal
representation of the SHA1 hash of the artifact.  The second argument of
the cfile card is the original size in bytes of the artifact.  The last
argument of the cfile card is the number of compressed bytes of payload
that immediately follow the cfile card.  If the cfile card has only
three arguments, that means the payload is the complete content of the
artifact.  If the cfile card has four arguments, then the payload is a
delta and the second argument is the ID of another artifact that is the
source of the delta and the third argument is the original size of the
................................................................................

<blockquote>
<b>uvfile</b> <i>name mtime hash size flags</i> <b>\n</b> <i>content</i>
</blockquote>

<p>The <i>name</i> field is the name of the unversioned file.  The
<i>mtime</i> is the last modification time of the file in seconds
since 1970.  The <i>hash</i> field is the SHA1 hash of the content
for the unversioned file, or "<b>-</b>" for deleted content.
The <i>size</i> field is the (uncompressed) size of the content
in bytes.  The <i>flags</i> field is an integer which is interpreted
as an array of bits.  The 0x0004 bit of <i>flags</i> indicates that
the <i>content</i> is to be omitted.  The content might be omitted if
it is too large to transmit, or if the sender merely wants to update the
modification time of the file without changing the file's content.
................................................................................
<blockquote>
<b>uvigot</b> <i>name mtime hash size</i>
</blockquote>

<p>The <i>name</i> argument is the name of an unversioned file.
The <i>mtime</i> is the last modification time of the unversioned file
in seconds since 1970.
The <i>hash</i> is the SHA1 hash of the unversioned file content, or
"<b>-</b>" if the file has been deleted.
The <i>size</i> is the uncompressed size of the file in bytes.

<p>When the server sees a "pragma uv-hash" card for which the hash
does not match, it sends uvigot cards for every unversioned file that it
holds.  The client will use this information to figure out which
unversioned files need to be synchronized.
The server might also send a uvigot card when it receives a uvgimme card








<p>This document describes the wire protocol used to synchronize
content between two Fossil repositories.</p>

<h2>1.0 Overview</h2>

<p>The global state of a fossil repository consists of an unordered
collection of artifacts.  Each artifact is identified by a cryptographic
hash of its content, expressed as a lower-case hexadecimal string.
Synchronization is the process of sharing artifacts between
servers so that all servers have copies of all artifacts.  Because
artifacts are unordered, the order in which artifacts are received
at a server is inconsequential.  It is assumed that the hash names
of artifacts are unique - that every artifact has a different hash.
To a first approximation, synchronization proceeds by sharing lists of
hash values for available artifacts, then sharing the content of artifacts
whose names are missing from one side or the other of the connection.  
In practice, a repository might contain millions of artifacts.  The list of
hash names for this many artifacts can be large.  So optimizations are
employed that usually reduce the number of hashes that need to be
shared to a few hundred.</p>
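
The "first approximation" above amounts to a set difference on artifact names. The sketch below assumes each repository is a plain dictionary mapping hash name to content. The names `name_of` and `naive_sync` are hypothetical, SHA3-256 is chosen arbitrarily as the naming hash, and none of the optimizations mentioned above are modeled.

```python
import hashlib

def name_of(content: bytes) -> str:
    # The artifact name is the hash of the raw content.
    return hashlib.sha3_256(content).hexdigest()

def naive_sync(repo_a: dict, repo_b: dict) -> None:
    # Exchange the lists of names, then copy whatever each side is missing.
    names_a, names_b = set(repo_a), set(repo_b)
    for name in names_a - names_b:
        repo_b[name] = repo_a[name]
    for name in names_b - names_a:
        repo_a[name] = repo_b[name]
```

Because artifacts are unordered and each name is assumed to be unique, the order of the two copy loops does not matter. Both repositories end up holding the union of the two collections.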

<p>Each repository also has local state.  The local state determines
the web-page formatting preferences, authorized users, ticket formats,
and similar information that varies from one repository to another.
The local state is not usually transferred during a sync.  Except,
some local state is transferred during a [/help?cmd=clone|clone]
................................................................................
or the artifact delta is the first <i>size</i> bytes of the
x-fossil content that immediately follow the newline that
terminates the file card.
</p>

<p>The first argument of a file card is the ID of the artifact that
is being transferred.  The artifact ID is the lower-case hexadecimal
representation of the name hash for the artifact.
The last argument of the file card is the number of bytes of
payload that immediately follow the file card.  If the file
card has only two arguments, that means the payload is the
complete content of the artifact.  If the file card has three
arguments, then the payload is a delta and the second argument is
the ID of another artifact that is the source of the delta.</p>
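
The two-argument and three-argument forms can be told apart with a small parser. This is a sketch only: it assumes the card line starts with the keyword `file` (by analogy with the cfile lines shown in this document), that arguments are separated by whitespace, and that the size is a decimal integer. `FileCard` and `parse_file_card` are hypothetical names.

```python
from typing import NamedTuple, Optional

class FileCard(NamedTuple):
    artifact_id: str
    delta_source_id: Optional[str]  # None when the payload is the full content
    size: int                       # number of payload bytes that follow

def parse_file_card(line: str) -> FileCard:
    parts = line.split()
    if not parts or parts[0] != "file":
        raise ValueError("not a file card")
    args = parts[1:]
    if len(args) == 2:
        # Two arguments: the payload is the complete artifact.
        return FileCard(args[0], None, int(args[1]))
    if len(args) == 3:
        # Three arguments: the payload is a delta against args[1].
        return FileCard(args[0], args[1], int(args[2]))
    raise ValueError(f"file card takes 2 or 3 arguments, got {len(args)}")
```

In both forms the last argument is the payload size, so the caller knows how many bytes to read after the newline that terminates the card.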

................................................................................
<blockquote>
<b>cfile</b> <i>artifact-id usize csize</i> <b>\n</b> <i>content</i><br>
<b>cfile</b> <i>artifact-id delta-artifact-id usize csize</i> <b>\n</b> <i>content</i><br>
</blockquote>

<p>The first argument of the cfile card is the ID of the artifact that
is being transferred.  The artifact ID is the lower-case hexadecimal
representation of the name hash for the artifact.  The second argument of
the cfile card is the original size in bytes of the artifact.  The last
argument of the cfile card is the number of compressed bytes of payload
that immediately follow the cfile card.  If the cfile card has only
three arguments, that means the payload is the complete content of the
artifact.  If the cfile card has four arguments, then the payload is a
delta and the second argument is the ID of another artifact that is the
source of the delta and the third argument is the original size of the
................................................................................

<blockquote>
<b>uvfile</b> <i>name mtime hash size flags</i> <b>\n</b> <i>content</i>
</blockquote>

<p>The <i>name</i> field is the name of the unversioned file.  The
<i>mtime</i> is the last modification time of the file in seconds
since 1970.  The <i>hash</i> field is the hash of the content
for the unversioned file, or "<b>-</b>" for deleted content.
The <i>size</i> field is the (uncompressed) size of the content
in bytes.  The <i>flags</i> field is an integer which is interpreted
as an array of bits.  The 0x0004 bit of <i>flags</i> indicates that
the <i>content</i> is to be omitted.  The content might be omitted if
it is too large to transmit, or if the sender merely wants to update the
modification time of the file without changing the file's content.
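
<p>The field layout above can be sketched as a small parser.  This is only an
illustration: the class name <tt>UvFileHeader</tt>, the constant name
<tt>UV_CONTENT_OMITTED</tt>, and the function name are hypothetical, and the
sketch assumes that fields are separated by whitespace, that the filename
contains no whitespace, and that <i>flags</i> is written as a decimal
integer.</p>

```python
from dataclasses import dataclass
from typing import Optional

# The 0x0004 bit is named in the text; the constant name is hypothetical.
UV_CONTENT_OMITTED = 0x0004


@dataclass
class UvFileHeader:
    name: str
    mtime: int            # seconds since 1970
    hash: Optional[str]   # None when the card carries "-" (deleted content)
    size: int             # uncompressed size in bytes
    flags: int            # interpreted as an array of bits

    @property
    def deleted(self) -> bool:
        return self.hash is None

    @property
    def content_omitted(self) -> bool:
        return bool(self.flags & UV_CONTENT_OMITTED)


def parse_uvfile_header(line: str) -> UvFileHeader:
    """Parse the header line of a uvfile card (assumed whitespace-separated)."""
    tokens = line.split()
    if len(tokens) != 6 or tokens[0] != "uvfile":
        raise ValueError("expected: uvfile name mtime hash size flags")
    _, name, mtime, hash_, size, flags = tokens
    return UvFileHeader(
        name=name,
        mtime=int(mtime),
        hash=None if hash_ == "-" else hash_,
        size=int(size),
        flags=int(flags),  # assumption: decimal encoding
    )
```

<p>A receiver built this way would check <tt>content_omitted</tt> before
trying to read any payload after the header.</p>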
................................................................................
<blockquote>
<b>uvigot</b> <i>name mtime hash size</i>
</blockquote>

<p>The <i>name</i> argument is the name of an unversioned file.
The <i>mtime</i> is the last modification time of the unversioned file
in seconds since 1970.
The <i>hash</i> is the SHA1 or SHA3-256 hash of the unversioned file 
content, or "<b>-</b>" if the file has been deleted.
The <i>size</i> is the uncompressed size of the file in bytes.

<p>When the server sees a "pragma uv-hash" card for which the hash
does not match, it sends uvigot cards for every unversioned file that it
holds.  The client will use this information to figure out which
unversioned files need to be synchronized.
The server might also send a uvigot card when it receives a uvgimme card

Changes to www/tech_overview.wiki.

All of the original uncompressed and undeltaed artifacts can be extracted
from a Fossil repository database using
the [/help/deconstruct | fossil deconstruct]
command. Individual artifacts can be extracted using the
[/help/artifact | fossil artifact] command.
When accessing the repository database using raw SQL and the
[/help/sqlite3 | fossil sql] command, the extension function
"<tt>content()</tt>" with a single argument which is the SHA1 hash

of an artifact will return the complete undeltaed and uncompressed
content of that artifact.

Going the other way, the [/help/reconstruct | fossil reconstruct]
command will scan a directory hierarchy and add all files found to
a new repository database.  The [/help/import | fossil import] command
works by reading the input git-fast-export stream and using it to construct
................................................................................

The set of canonical artifacts for a project - the global state for the
project - is intended to be an append-only database.  In other words,
new artifacts can be added but artifacts can never be removed.  But
it sometimes happens that inappropriate content is mistakenly or
maliciously added to a repository.  The only way to get rid of
the undesired content is to [./shunning.wiki | "shun"] it.
The "shun" table in the repository database records the SHA1 hash of
all shunned artifacts.

The shun table can be pushed or pulled using
the [/help/config | fossil config] command with the "shun" AREA argument.
The shun table is also copied during a [/help/clone | clone].

<a name='localdb'></a>







All of the original uncompressed and undeltaed artifacts can be extracted
from a Fossil repository database using
the [/help/deconstruct | fossil deconstruct]
command. Individual artifacts can be extracted using the
[/help/artifact | fossil artifact] command.
When accessing the repository database using raw SQL and the
[/help/sqlite3 | fossil sql] command, the extension function
"<tt>content()</tt>" with a single argument which is the SHA1 or
SHA3-256 hash
of an artifact will return the complete undeltaed and uncompressed
content of that artifact.

Going the other way, the [/help/reconstruct | fossil reconstruct]
command will scan a directory hierarchy and add all files found to
a new repository database.  The [/help/import | fossil import] command
works by reading the input git-fast-export stream and using it to construct
................................................................................

The set of canonical artifacts for a project - the global state for the
project - is intended to be an append-only database.  In other words,
new artifacts can be added but artifacts can never be removed.  But
it sometimes happens that inappropriate content is mistakenly or
maliciously added to a repository.  The only way to get rid of
the undesired content is to [./shunning.wiki | "shun"] it.
The "shun" table in the repository database records the hash values for
all shunned artifacts.

The shun table can be pushed or pulled using
the [/help/config | fossil config] command with the "shun" AREA argument.
The shun table is also copied during a [/help/clone | clone].

<a name='localdb'></a>

Changes to www/theory1.wiki.

An artifact is a list of bytes - a "file" in the usual manner of thinking.
Many artifacts are simply the content of source files that have
been checked into the Fossil repository.  Call these "content artifacts".
Other artifacts, known as
"control artifacts", contain ASCII text in a particular format that
defines relationships between other artifacts, such as which
content artifacts go together to form a particular version of the
project.  Each artifact is named by its SHA1 hash and is thus immutable.

Artifacts can be added to the database but not removed (if we ignore
the exceptional case of [./shunning.wiki | shunning].)  Repositories
synchronize by computing the union of their artifact sets.  SQL and
relation theory play no role in any of this.

SQL enters the picture only in the implementation details.  The current
implementation of Fossil stores each artifact as a BLOB in an SQLite







An artifact is a list of bytes - a "file" in the usual manner of thinking.
Many artifacts are simply the content of source files that have
been checked into the Fossil repository.  Call these "content artifacts".
Other artifacts, known as
"control artifacts", contain ASCII text in a particular format that
defines relationships between other artifacts, such as which
content artifacts go together to form a particular version of the
project.  Each artifact is named by its SHA1 or SHA3-256 hash and is
thus immutable.
Artifacts can be added to the database but not removed (if we ignore
the exceptional case of [./shunning.wiki | shunning].)  Repositories
synchronize by computing the union of their artifact sets.  SQL and
relation theory play no role in any of this.
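
Naming an artifact by its hash can be sketched in a few lines of Python.
This is an illustration only, not Fossil's implementation: the function
names are hypothetical, and choosing the algorithm from the length of the
name (40 hex digits for SHA1, 64 for SHA3-256) is an assumption made for
this sketch.

```python
import hashlib


def artifact_names(content: bytes) -> dict:
    """Return the candidate names of an artifact under each hash algorithm."""
    return {
        "sha1": hashlib.sha1(content).hexdigest(),          # 40 hex digits
        "sha3-256": hashlib.sha3_256(content).hexdigest(),  # 64 hex digits
    }


def matches_name(content: bytes, name: str) -> bool:
    """Recompute the hash and compare it with the artifact's name.

    Assumption for this sketch: the algorithm is inferred from the
    length of the lower-case hexadecimal name.
    """
    if len(name) == 40:
        digest = hashlib.sha1(content).hexdigest()
    elif len(name) == 64:
        digest = hashlib.sha3_256(content).hexdigest()
    else:
        raise ValueError("unrecognized hash length")
    return digest == name
```

Because the name is derived from the bytes, changing even one byte of the
content yields a different name, which is why a named artifact is
effectively immutable.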

SQL enters the picture only in the implementation details.  The current
implementation of Fossil stores each artifact as a BLOB in an SQLite

Changes to www/webpage-ex.md.


  *  <a target='_blank' class='exbtn'
     href='$ROOT/bigbloblist'>Example</a>
     The largest objects in the repository.

  *  <a target='_blank' class='exbtn'
     href='$ROOT/hash-collisions'>Example</a>
     SHA1 prefix collisions

  *  <a target='_blank' class='exbtn'
     href='$ROOT/sitemap'>Example</a>
     The "sitemap" containing links to many other pages








  *  <a target='_blank' class='exbtn'
     href='$ROOT/bigbloblist'>Example</a>
     The largest objects in the repository.

  *  <a target='_blank' class='exbtn'
     href='$ROOT/hash-collisions'>Example</a>
     Hash prefix collisions

  *  <a target='_blank' class='exbtn'
     href='$ROOT/sitemap'>Example</a>
     The "sitemap" containing links to many other pages

Changes to www/whyusefossil.wiki.

          to the first check-in of a branch.  The name assigned by this
          special tag automatically propagates to all direct children.
       </ul>
  </ul>
<li><p><b>Why version control is important (reprise)</b>
  <ol type="A">
  <li><p>Every check-in and every individual file has a unique name - its
      SHA1 hash.   Team members can unambiguously identify any specific

      version of the overall project or any specific version of an
      individual file.
  <li><p>Any historical version of the whole project or of any individual
      file can be easily recreated at any time and by any team member.
  <li><p>Accidental changes to files can be detected by recomputing their
      SHA1 hash.
  <li><p>Files of unknown origin can be identified using their SHA1 hash.
  <li><p>Developers are able to work in parallel, review each other's work,
      and easily merge their changes together.  External revisions to
      the baseline can be easily incorporated into the latest changes.
  <li><p>Developers can follow experimental lines of development,  then
      revert back to an earlier stable version if the experiment does
      not work out.  Creativity is enhanced by allowing crazy ideas to
      be investigated without destabilizing the project.







          to the first check-in of a branch.  The name assigned by this
          special tag automatically propagates to all direct children.
       </ul>
  </ul>
<li><p><b>Why version control is important (reprise)</b>
  <ol type="A">
  <li><p>Every check-in and every individual file has a unique name - its
      SHA1 or SHA3-256 hash.  Team members can unambiguously identify
      any specific
      version of the overall project or any specific version of an
      individual file.
  <li><p>Any historical version of the whole project or of any individual
      file can be easily recreated at any time and by any team member.
  <li><p>Accidental changes to files can be detected by recomputing their
      cryptographic hash.
  <li><p>Files of unknown origin can be identified using their hash.
  <li><p>Developers are able to work in parallel, review each other's work,
      and easily merge their changes together.  External revisions to
      the baseline can be easily incorporated into the latest changes.
  <li><p>Developers can follow experimental lines of development,  then
      revert back to an earlier stable version if the experiment does
      not work out.  Creativity is enhanced by allowing crazy ideas to
      be investigated without destabilizing the project.