Fossil

Check-in [81e61d78]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Improvements to the document that describes the delta format.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256:81e61d78fdfdd2ee471217054b6de89e26d244ebf9671e6a842c47d252ba187d
User & Date: drh 2019-03-02 20:33:24
Context
2019-03-04
13:58
Improvements to Append Wiki privilege suggested by jakesfr. check-in: 3790dbbd user: drh tags: trunk
2019-03-02
20:33
Improvements to the document that describes the delta format. check-in: 81e61d78 user: drh tags: trunk
18:18
Enable make install without first calling make workflow by adjusting the install target prerequisites. This allows make install to be called on a fresh clone/checkout of Fossil because otherwise OBJDIR is missing and make install fails. check-in: 904eb8a5 user: andybradford tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to www/delta_format.wiki.

1
2
3
4
5
6
7
8
9
10
11
12
13
14

15
16
17
18
19
20

21
22

23
24
25



26




































27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
..
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
...
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
...
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
<title>Fossil Delta Format</title>
<nowiki>
<h2>Abstract</h2>

<p>Fossil achieves efficient storage and low-bandwidth synchronization
through the use of delta-compression.  Instead of storing
or transmitting the complete content of an artifact, fossil stores or
transmits only the changes relative to a related artifact.
</p>

<p>This document describes the delta-encoding format used by fossil.
The intended audience is developers working on either
<a href="index.wiki">fossil</a> itself, or on tools compatible with
fossil.</p>


<p>Note that the delta-encoding is not a fundamental element of the
state of a fossil repository.  A state of a fossil repository is
defined by the uncompressed and undeltaed content of all artifacts.
The fact the artifacts
are stored on disk using this delta-encoding format is merely an

optimization.  One could, in theory, create an entirely new and
compatible implementation of fossil that used a different delta-encoding

or did no delta-encoding at all.  However, experience has shown that
the delta-encoding described here is both efficient to compute and
results in very small deltas, so its continued use is recommended.</p>








































<a name="structure"></a><h2>1.0 Structure</h2>
<img src="delta1.gif" align="left" hspace="10">

<p>A delta consists of three parts, a "header", a "trailer", and a
"segment-list" between them.</p>

<p>Both header and trailer provide information about the target
helping the decoder, and the segment-list describes how the target can
be constructed from the original.</p>

<a name="header"></a><h3>1.1 Header</h3>
<img src="delta6.gif" align="left" hspace="10">

<p>The header consists of a single number followed by a newline
character (ASCII 0x0a). The number is the length of the target in
bytes.</p>

<p>This means that, given a delta, the decoder can compute the size of
the target (and allocate any necessary memory based on that) by simply
reading the first line of the delta and decoding the number found
there. In other words, before it has to decode everything else.</p>

<a name="trailer"></a><h3>1.2 Trailer</h3>
<img src="delta5.gif" align="left" hspace="10">

<p>The trailer consists of a single number followed by a semicolon (ASCII
0x3b). This number is a checksum of the target and can be used by a
decoder to verify that the delta applied correctly, reconstructing the
target from the original.</p>

................................................................................
2^32-1. A target whose length is not a multiple of 4 is padded with
0-bytes (ASCII 0x00) at the end.</p>

<p>By putting this information at the end of the delta a decoder has
it available immediately after the target has been reconstructed
fully.</p>

<a name="slist"></a><h3>1.3 Segment-List</h3>
<img src="delta2.gif" align="left" hspace="10">

<p>The segment-list of a delta describes how to create the target from
the original by a combination of inserting literal byte-sequences and
copying ranges of bytes from the original. This is where the
compression takes place, by encoding the large common parts of
original and target in small copy instructions.</p>

<p>The target is constructed from beginning to end, with the data
generated by each instruction appended after the data of all previous
instructions, with no gaps.</p>

<a name="insertlit"></a><h4>1.3.1 Insert Literal</h4>

<p>A literal is specified by two elements, the size of the literal in
bytes, and the bytes of the literal itself.</p>

<img src="delta4.gif" align="left" hspace="10">
<p>The length is written first, followed by a colon character (ASCII
0x3a), followed by the bytes of the literal.</p>

<a name="copyrange"></a><h4>1.3.2 Copy Range</h4>

<p>A range to copy is specified by two numbers, the offset of the
first byte in the original to copy, and the size of the range, in
bytes. The size zero is special, its usage indicates that the range
extends to the end of the original.</p>

<img src="delta3.gif" align="left" hspace="10">
<p>The length is written first, followed by an "at" character (ASCII
0x40), then the offset, followed by a comma (ASCII 0x2c).</p>

<a name="intcoding"></a><h2>2.0 Encoding of integers</h2>

<p>
The format currently handles only 32 bit integer numbers. They are
written base-64 encoded, MSB first, and without leading
"0"-characters, except if they are significant (i.e. 0 => "0").
</p>

<p>
The base-64 coding is described in
<a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
</p>

<a name="examples"></a><h2>3.0 Examples</h2>

<a name="examplesint"></a><h3>3.1 Integer encoding</h3>

<table border=1>
<tr>
<th>Value</th>
<th>Encoding</th>
</tr>
<tr>
................................................................................
</tr>
<tr>
<td>-1101438770</td>
<td>2zMM3E</td>
</tr>
</table>

<a name="examplesdelta"></a><h3>3.2 Delta encoding</h3>

<p>An example of a delta using the specified encoding is:</p>

<table border=1><tr><td><pre>
1Xb
4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre>
</td></tr></table>
................................................................................

  *  Ticketing interface (expand this bullet)

</pre></td></tr></table>



<a name="notes"></a><h2>Notes</h2>

<ul>
<li>Pure text files generate a pure text delta.
</li>
<li>Binary files generate a delta that may contain some binary data.
</li>
<li>The delta encoding does not attempt to compress the content.
It was considered to be much
more sensible to do compression using a separate general-purpose
compression library, like <a href="http://www.zlib.net">zlib</a>.
</li>
</ul>

|
<



|



|

|
|
>

<
|
|
|
<
>
|
<
>
|
|
|
>
>
>

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|









|











|







 







|












|








|










|












|

|







 







|







 







|












1
2

3
4
5
6
7
8
9
10
11
12
13
14
15

16
17
18

19
20

21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
..
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
...
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
...
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
<title>Fossil Delta Format</title>
<h1>1.0 Overview</h1>


<p>Fossil achieves efficient storage and low-bandwidth synchronization
through the use of delta-compression.  Instead of storing
or transmitting the complete content of an artifact, Fossil stores or
transmits only the changes relative to a related artifact.
</p>

<p>This document describes the delta-encoding format used by Fossil.
The intended audience is developers working on either
<a href="index.wiki">Fossil</a> itself, or on tools compatible with
Fossil. Understanding of this document is <em>not</em> required for
ordinary users of Fossil.  This document is an implementation detail.</p>


<p>This document only describes the delta file format.  A 
[./delta_encoder_algorithm.wiki|separate document] describes one possible
algorithm for generating deltas in this format.</p>


<h2>1.1 Sample Software And Analysis Tools</h2>


<p>The core routines used to generate and apply deltas in this format
are contained in the [../src/delta.c|delta.c] source file.  Interface
logic, including "test-*" commands to directly manipulate deltas are
contained in the [../src/deltacmd.c|deltacmd.c] source file.  SQL functions
to create, apply, and analyze deltas are implemented by code in the
[../src/deltafunc.c|deltafunc.c] source file.

<p>The following command-line tools are available to create and apply
deltas and to test the delta logic:

   *   [/help?cmd=test-delta|fossil test-delta] &rarr; Run self-tests of
       the delta logic

   *   [/help?cmd=test-delta-create|fossil test-delta-create X Y] &rarr; compute
       a delta that converts file X into file Y.  Output that delta.

   *   [/help?cmd=test-delta-apply|fossil test-delta-apply X D] &rarr; apply
       delta D to input file X and output the result.

   *   [/help?cmd=test-delta-analyze|fossil test-delta-analyze X Y] &rarr; compute
       and delta that converts file X into file Y but instead of writing the
       delta to output, write performance information about the delta.

<p>When running the [/help?cmd=sqlite3|fossil sql] command to get an
interactive SQL session connected to the repository, the following
additional SQL functions are provided:

   *   <b>delta_create(</b><i>X</i><b>,</b><i>Y</i><b>)</b> &rarr;
       Compute a data that carries blob X into blob Y and return that delta
       as a blob.

   *   <b>delta_apply(</b><i>X</i><b>,</b><i>D</i><b>)</b> &rarr;
       Apply delta D to input blob X return a new blob which is the result.


   *   <b>delta_output_size(</b><i>D</i>)</b> &rarr;
       Return the size of the output that would result from applying delta D.

   *   <b>delta_parse(</b><i>D</i>)</b> &rarr; This is a table-valued function
       that returns one row for the header, for the trailer, and for each segment
       in delta D.


<a name="structure"></a><h1>2.0 Structure</h1>
<img src="delta1.gif" align="left" hspace="10">

<p>A delta consists of three parts, a "header", a "trailer", and a
"segment-list" between them.</p>

<p>Both header and trailer provide information about the target
helping the decoder, and the segment-list describes how the target can
be constructed from the original.</p>

<a name="header"></a><h2>2.1 Header</h2>
<img src="delta6.gif" align="left" hspace="10">

<p>The header consists of a single number followed by a newline
character (ASCII 0x0a). The number is the length of the target in
bytes.</p>

<p>This means that, given a delta, the decoder can compute the size of
the target (and allocate any necessary memory based on that) by simply
reading the first line of the delta and decoding the number found
there. In other words, before it has to decode everything else.</p>

<a name="trailer"></a><h2>2.2 Trailer</h2>
<img src="delta5.gif" align="left" hspace="10">

<p>The trailer consists of a single number followed by a semicolon (ASCII
0x3b). This number is a checksum of the target and can be used by a
decoder to verify that the delta applied correctly, reconstructing the
target from the original.</p>

................................................................................
2^32-1. A target whose length is not a multiple of 4 is padded with
0-bytes (ASCII 0x00) at the end.</p>

<p>By putting this information at the end of the delta a decoder has
it available immediately after the target has been reconstructed
fully.</p>

<a name="slist"></a><h2>2.3 Segment-List</h2>
<img src="delta2.gif" align="left" hspace="10">

<p>The segment-list of a delta describes how to create the target from
the original by a combination of inserting literal byte-sequences and
copying ranges of bytes from the original. This is where the
compression takes place, by encoding the large common parts of
original and target in small copy instructions.</p>

<p>The target is constructed from beginning to end, with the data
generated by each instruction appended after the data of all previous
instructions, with no gaps.</p>

<a name="insertlit"></a><h3>2.3.1 Insert Literal</h3>

<p>A literal is specified by two elements, the size of the literal in
bytes, and the bytes of the literal itself.</p>

<img src="delta4.gif" align="left" hspace="10">
<p>The length is written first, followed by a colon character (ASCII
0x3a), followed by the bytes of the literal.</p>

<a name="copyrange"></a><h3>2.3.2 Copy Range</h3>

<p>A range to copy is specified by two numbers, the offset of the
first byte in the original to copy, and the size of the range, in
bytes. The size zero is special, its usage indicates that the range
extends to the end of the original.</p>

<img src="delta3.gif" align="left" hspace="10">
<p>The length is written first, followed by an "at" character (ASCII
0x40), then the offset, followed by a comma (ASCII 0x2c).</p>

<a name="intcoding"></a><h1>3.0 Encoding of integers</h1>

<p>
The format currently handles only 32 bit integer numbers. They are
written base-64 encoded, MSB first, and without leading
"0"-characters, except if they are significant (i.e. 0 => "0").
</p>

<p>
The base-64 coding is described in
<a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
</p>

<a name="examples"></a><h1>4.0 Examples</h1>

<a name="examplesint"></a><h2>4.1 Integer encoding</h2>

<table border=1>
<tr>
<th>Value</th>
<th>Encoding</th>
</tr>
<tr>
................................................................................
</tr>
<tr>
<td>-1101438770</td>
<td>2zMM3E</td>
</tr>
</table>

<a name="examplesdelta"></a><h2>4.2 Delta encoding</h2>

<p>An example of a delta using the specified encoding is:</p>

<table border=1><tr><td><pre>
1Xb
4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre>
</td></tr></table>
................................................................................

  *  Ticketing interface (expand this bullet)

</pre></td></tr></table>



<a name="notes"></a><h1>Notes</h1>

<ul>
<li>Pure text files generate a pure text delta.
</li>
<li>Binary files generate a delta that may contain some binary data.
</li>
<li>The delta encoding does not attempt to compress the content.
It was considered to be much
more sensible to do compression using a separate general-purpose
compression library, like <a href="http://www.zlib.net">zlib</a>.
</li>
</ul>