Fossil

Check-in [7de2410f]
Login

Check-in [7de2410f]

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Added named anchors to the "Image Format vs Fossil Repo Size" doc so I can refer to one in particular.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 7de2410f742af79d1122264fc58e0490b30f4a5626969cfc534f19feeb9303e0
User & Date: wyoung 2023-01-03 20:13:24
Context
2023-01-05
17:21
Add the "-f|--file" flag to the "whatis" command which consist to search for any other files in the repo with the exact same content as the given file. ... (check-in: a821cbf5 user: mgagnon tags: trunk)
2023-01-03
20:13
Added named anchors to the "Image Format vs Fossil Repo Size" doc so I can refer to one in particular. ... (check-in: 7de2410f user: wyoung tags: trunk)
2023-01-02
16:12
When applying a patch, if the file rename fails, make that just a warning not a fatal error, as the warning might be due to file renames on a prior merge. Fix for ticket [21037bfc1296dabc]. ... (check-in: f0133846 user: drh tags: trunk)
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to www/image-format-vs-repo-size.md.

36
37
38
39
40
41
42
43

44
45
46
47
48
49
50
36
37
38
39
40
41
42

43
44
45
46
47
48
49
50







-
+








[dc]:  ./delta_format.wiki
[hf]:  https://en.wikipedia.org/wiki/Hash_function
[prn]: https://en.wikipedia.org/wiki/Pseudorandomness
[zl]:  http://www.zlib.net/


## Affected File Formats
## <a id="formats"></a>Affected File Formats

In this article’s core experiment, we use 2D image file formats, but
this article’s advice also applies to many other file types. For just a
few examples out of what must be thousands:

*   **Microsoft Office**: The [OOXML document format][oox] used from
    Office 2003 onward (`.docx`, `.xlsx`, `.pptx`, etc.) are Zip files
71
72
73
74
75
76
77
78

79
80
81
82
83
84
85
71
72
73
74
75
76
77

78
79
80
81
82
83
84
85







-
+







[jcl]: https://en.wikipedia.org/wiki/Java_(programming_language)
[odf]: https://en.wikipedia.org/wiki/OpenDocument
[oox]: https://en.wikipedia.org/wiki/Office_Open_XML
[wi]:  https://en.wikipedia.org/wiki/Windows_Installer



## Demonstration
## <a id="demo"></a>Demonstration

The companion `image-format-vs-repo-size.ipynb` file ([download][nbd],
[preview][nbp]) is a [JupyterLab][jl] notebook implementing the following
experiment:

1.  Create a new minimum-size Fossil repository. Save this initial size.

111
112
113
114
115
116
117
118

119
120
121
122
123
124
125
111
112
113
114
115
116
117

118
119
120
121
122
123
124
125







-
+







[im]:  https://www.imagemagick.org/
[jl]:  https://jupyter.org/
[nbd]: ./image-format-vs-repo-size.ipynb
[nbp]: https://nbviewer.jupyter.org/urls/fossil-scm.org/fossil/doc/trunk/www/image-format-vs-repo-size.ipynb
[wp]:  http://wand-py.org/


## Results
## <a id="results"></a>Results

Running the notebook gives a bar chart something like⁴ this:

![results bar chart](./image-format-vs-repo-size.svg)

There are a few key things we want to draw your attention to in that
chart:
149
150
151
152
153
154
155
156

157
158
159
160
161
162
163
149
150
151
152
153
154
155

156
157
158
159
160
161
162
163







-
+







    tradeoff is to choose JPEG for repositories where each image will
    change fewer than that number of times.


[mce]: https://en.wikipedia.org/wiki/Monte_Carlo_method


## Automated Recompression
## <a id="makefile"></a>Automated Recompression

Since programs that produce and consume binary-compressed data files
often make it either difficult or impossible to work with the
uncompressed form, we want an automated method for producing the
uncompressed form to make Fossil happy while still having the compressed
form to keep our content creation applications happy.  This `Makefile`
should⁶ do that for BMP, PNG, SVG, and XLSX files:
240
241
242
243
244
245
246
247

248
249
250
251
252
253
254
240
241
242
243
244
245
246

247
248
249
250
251
252
253
254







-
+







    with the Excel case, there is no simple “file base name to directory
    name” mapping, so we just created the `-big` to `-small` name scheme
    here.

----


## Footnotes and Digressions
## <a id="notes"></a>Footnotes and Digressions

1.  This problem is not Fossil-specific.  Several other programs also do
    delta compression, so they’ll also be affected by this problem:
    [rsync][rs], [Unison][us], [Git][git], etc. You should take this
    article’s advice when using all such programs, not just Fossil.

    When using file copying and synchronization programs *without* delta