fossil database gets corrupted by logging
(1) By anonymous on 2019-07-11 10:04:29
Sometimes, rarely but every now and then, my fossil database gets corrupted after I have used it through fossil cgi (in a web browser). The file no longer starts with the usual SQLite header ("SQLite format 3 ..."); instead it starts with logging output (like "no-match [REQUEST_URI] env-match [SCRIPT_NAME] = [/fossil.cgi] env-match [PATH_INFO] = [/] no-match [HTTP_COOKIE] no-match [QUERY_STRING]"). The file size is unchanged; only the beginning is overwritten by log content.

This seems to happen when I enable logging, i.e. when I set these options in my fossil.cfg for cgi:

    debug: FILE       Causing debugging information to be written into FILE.
    errorlog: FILE    Warnings, errors, and panics written to FILE.

It seems that some content which should be written to "debug: FILE" ends up at the beginning of the fossil file itself. I suspect this has something to do with backoffice, as I can see in the logfile around that time that the cgi is waiting for backoffice to finish its job before it gets access to the database.

Restoring from backup has always been my solution; the backups ran often enough that I never lost anything.

Is this a known bug? For now I have just disabled logging, hoping that this will fix it. Maybe I can also just disable backoffice. Nevertheless, it would probably be a good idea to track down that bug.

Thank you.
(2) By Stephan Beal (stephan) on 2019-07-11 10:12:52 in reply to 1 [link]
> Is this a known bug? Definitely not. My suspicion is that your repo is being hosted from a USB stick, SD card, or SMB network share, all of which are known to be problematic from time to time (or more often). What fossil version are you using? Sidebar/trivia: there was an ancient problem where any `assert()` triggered in C code could indeed overwrite part of the database file(!!!), but that was fixed ages ago (5+ years). You mention backoffice, which is a new feature, so that assert problem is not what's affecting you.
(3) By anonymous on 2019-07-11 10:23:27 in reply to 2 [link]
- The version is quite new: fossil version 2.9 [5b6be64760] 2019-06-14 00:24:04 UTC
- Repo is on ext4, locally.
(4) By Richard Hipp (drh) on 2019-07-11 10:44:38 in reply to 1 [link]
> It seems that some content which should be written to "debug: FILE" is now at the beginning of the fossil file itself. Perhaps the backoffice is writing to a file descriptor that has been closed, but then later reopened by SQLite. I have to be away from the office for a couple of hours, but I will look into this when I get back.
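The mechanism suspected here can be reproduced in isolation. POSIX `open()` returns the lowest unused descriptor number, so once a descriptor has been closed, the next `open()` can hand out the same number again, and any code that still remembers the old number will write into the new file. The following is only a minimal sketch of that effect, not Fossil's or SQLite's actual code; `demo_fd_reuse`, the stand-in log line, and the file path are hypothetical names chosen for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo, not Fossil code. Creates a stand-in "database" at
** zDbPath, lets a stale descriptor number write a log line, and checks
** the file header afterwards.
** Returns 1 if the header was overwritten, 0 if it survived, -1 on error. */
int demo_fd_reuse(const char *zDbPath){
  static const char zHeader[] = "SQLite format 3";
  static const char zLog[] = "no-match [X]";   /* stand-in log output */
  const size_t nHeader = sizeof(zHeader)-1;
  char zBuf[sizeof(zHeader)];
  int staleFd, dbFd, rc = -1;
  ssize_t n;

  /* Create the stand-in database with a recognizable header. */
  dbFd = open(zDbPath, O_WRONLY|O_CREAT|O_TRUNC, 0600);
  if( dbFd<0 ) return -1;
  n = write(dbFd, zHeader, nHeader);
  close(dbFd);
  if( n!=(ssize_t)nHeader ) goto done;

  /* A "log" descriptor that some code remembers by number, and that is
  ** then closed without that code noticing. */
  staleFd = open("/dev/null", O_WRONLY);
  if( staleFd<0 ) goto done;
  close(staleFd);

  /* open() hands out the lowest free number, so the database now gets
  ** the number the logger still believes is its log file. */
  dbFd = open(zDbPath, O_RDWR);
  if( dbFd<0 ) goto done;

  /* The stale logger writes: the bytes land at offset 0 of the database. */
  if( dbFd==staleFd ){
    n = write(staleFd, zLog, sizeof(zLog)-1);
    (void)n;
  }
  close(dbFd);

  /* Read the header back and compare it with what was written first. */
  dbFd = open(zDbPath, O_RDONLY);
  if( dbFd<0 ) goto done;
  n = read(dbFd, zBuf, nHeader);
  close(dbFd);
  if( n==(ssize_t)nHeader ) rc = memcmp(zBuf, zHeader, nHeader)!=0;

done:
  unlink(zDbPath);
  return rc;
}
```

In a single-threaded process, calling `demo_fd_reuse("/tmp/fd_reuse_demo.db")` should report the header as overwritten while the file length stays the same, which matches the symptom described in the first post.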
(5) By Richard Hipp (drh) on 2019-07-11 12:20:54 in reply to 4 [link]
Please try patch <https://www.fossil-scm.org/fossil/info/458ced35354314b1> and report back whether or not this seems to clear the problem. Thanks.
(6) By anonymous on 2019-07-11 14:16:32 in reply to 5 [link]
Thank you, Richard. This looks like you found it! I have recompiled and am testing now. (As I wrote earlier, it only corrupts rarely for me, so it might take some time until I can reliably report "success".)

BTW: Shouldn't fossil rather do a "double-fork"?

- [http://thelinuxjedi.blogspot.com/2014/02/why-use-double-fork-to-daemonize.html](http://thelinuxjedi.blogspot.com/2014/02/why-use-double-fork-to-daemonize.html)
- [http://thinkiii.blogspot.com/2009/12/double-fork-to-avoid-zombie-process.html](http://thinkiii.blogspot.com/2009/12/double-fork-to-avoid-zombie-process.html)
(7) By Richard Hipp (drh) on 2019-07-11 14:24:24 in reply to 6 [link]
I'd never heard of a "double-fork" before. Sounds like something that needs to be added to the backoffice implementation. This might clear some of the problems that (for example) OpenBSD was having. But, since that is a potentially destabilizing change, I'll wait to do that *after* the 2.9 release, which will happen soon (perhaps this weekend).
(8) By Andy Bradford (andybradford) on 2019-07-12 01:57:01 in reply to 6 [link]
What problem would actually be solved in Fossil by using the double-fork as suggested in these articles? Thanks, Andy
(9) By Andy Bradford (andybradford) on 2019-07-12 02:08:23 in reply to 6 [link]
Also, I believe that using a double-fork in a daemon breaks the ability for daemon monitoring services to function properly. For example, the double-fork breaks things like runit, daemontools, and anything else that is built on a similar model.

If Fossil does need a double-fork to daemonize, perhaps it should be optional? For example, I have in a daemontools run script the following:

    #!/bin/sh
    exec 2>&1
    exec envdir ./env setuidgid _fossil fossil server --repolist /repos

If "fossil server" were to employ the double-fork mechanism to daemonize, this would break my ability to effectively manage this service.

Or have I misunderstood the suggestion?

Thanks,
Andy
(10) By Warren Young (wyoung) on 2019-07-12 02:36:03 in reply to 9 [link]
> I believe that using a double-fork in a daemon breaks the ability for daemon monitoring services to function properly. systemd can handle double-forked daemons. :) > have I misunderstood the suggestion? Is this proposal not specific to the backoffice, and nothing to do with daemonization of the Fossil server at all? Fossil already does the right thing with regards to daemonization: it doesn't fork itself into the background on `fossil server`; it stays in the foreground if that's how its caller started it. The thing to avoid is automatic and unconditional forking of the process into the background, because that takes flexibility away from the caller. If a process's caller wants it double-forked, it should do the double forking itself! The backoffice can't depend on someone else — e.g. your daemontools run script — to do that on its behalf, so it has to arrange to do it itself. I also don't see that double-forking solves this file handle confusion. My understanding of double forking is that it's just about avoiding zombie processes. It's easier to double-fork than to ensure that you do all of the `wait()` and `SIGCHLD` stuff properly.
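For comparison, one piece of the "`wait()` and `SIGCHLD` stuff" can be sketched on its own. Under POSIX.1-2001, setting the disposition of `SIGCHLD` to `SIG_IGN` tells the kernel not to keep an exit status for terminated children, so they never become zombies. This is only an illustrative sketch, not something Fossil is known to do; `child_leaves_no_zombie` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict -std modes */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not Fossil code. Forks one child with SIGCHLD
** ignored. Returns 1 if the terminated child left no exit status behind
** (no zombie), 0 if its status could still be collected, -1 on error. */
int child_leaves_no_zombie(void){
  struct sigaction sa, saOld;
  pid_t pid, r;
  int savedErrno;

  /* Ignore SIGCHLD, remembering the previous disposition. */
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  sigemptyset(&sa.sa_mask);
  if( sigaction(SIGCHLD, &sa, &saOld)<0 ) return -1;

  pid = fork();
  if( pid<0 ){
    sigaction(SIGCHLD, &saOld, NULL);
    return -1;
  }
  if( pid==0 ) _exit(0);          /* child: terminate immediately */

  /* With SIGCHLD ignored, waitpid() blocks until the child is gone and
  ** then fails with ECHILD, because no exit status was retained. */
  do{
    r = waitpid(pid, NULL, 0);
  }while( r<0 && errno==EINTR );
  savedErrno = errno;

  sigaction(SIGCHLD, &saOld, NULL);  /* restore the old disposition */
  if( r==pid ) return 0;
  return (r<0 && savedErrno==ECHILD) ? 1 : -1;
}
```

The trade-off is that the parent can no longer learn how its children exited, which is why a long-running server that cares about exit codes would more likely install a `SIGCHLD` handler that calls `waitpid()` instead.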
(11) By anonymous on 2019-07-12 05:50:48 in reply to 9 [link]
The suggestion applies only to the one place in the code that drh has touched in this thread, where 'backoffice' is started so that it can go away, do its job, and be forgotten about. At the moment there is only a *single* fork() followed by setsid() there. Other places where fork() is used would have to be checked separately, independently, and carefully (if at all). AFAIK `fossil server` doesn't involve forking at all at the moment, so there is nothing to fix there (nothing to take from one fork to two). The suggestion is not to introduce new forking.
(12) By anonymous on 2019-07-12 05:53:08 in reply to 10 [link]
Correct, wyoung. This is just an improvement for backoffice found as by-catch. No new daemon features. And it is independent of the file handle bug/fix.
(13) By Richard Hipp (drh) on 2019-07-16 12:24:08 in reply to 6 [link]
Investigating further, I find that Fossil probably does not need a double-fork. A double-fork is useful when the parent process continues running but does not invoke wait() to harvest dead children. The double-fork causes the daemon process (the child process) to disconnect from the parent, so that it does not become a zombie when it dies but the parent is still running. But in fossil, the backoffice is only started as the parent process is shutting down. The parent will not continue running, but will itself die very shortly after launching the backoffice child. Hence, it seems the double-fork is superfluous and would accomplish nothing beyond consuming CPU cycles.
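For readers who have not met the pattern, the double-fork described above can be sketched roughly as follows. This is a generic illustration under stated assumptions, not the backoffice implementation: `spawn_double_fork` and `sample_work` are hypothetical names, and real code would also deal with open file descriptors, the working directory, and so on.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real background job. */
void sample_work(void){
  /* a real program would do its long-running work here */
}

/* Hypothetical helper, not Fossil code. Runs xWork() in a grandchild
** process that the caller never has to wait() for.
** Returns 0 once the grandchild has been launched, -1 on error. */
int spawn_double_fork(void (*xWork)(void)){
  pid_t pid, pid2;
  int status;

  pid = fork();
  if( pid<0 ) return -1;
  if( pid==0 ){
    /* First child: start a new session, fork again, exit at once. */
    if( setsid()<0 ) _exit(1);
    pid2 = fork();
    if( pid2<0 ) _exit(1);
    if( pid2>0 ) _exit(0);
    /* Grandchild: its parent has exited, so it is re-parented to init
    ** (or a subreaper), which collects its exit status later. */
    xWork();
    _exit(0);
  }

  /* Parent: reap the short-lived first child so that it does not linger
  ** as a zombie. This returns almost immediately. */
  while( waitpid(pid, &status, 0)<0 ){
    if( errno!=EINTR ) return -1;
  }
  return (WIFEXITED(status) && WEXITSTATUS(status)==0) ? 0 : -1;
}
```

A caller would use it as `spawn_double_fork(sample_work)`. The only `wait()` the long-lived parent needs is the one for the first child; when the parent itself exits right after launching the job, as described above, the extra fork gains nothing.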
(14) By anonymous on 2019-07-16 12:45:50 in reply to 13 [link]
OK! ( BTW: no more corruption so far since https://www.fossil-scm.org/fossil/info/458ced35354314b1 )
(15) By Warren Young (wyoung) on 2019-07-16 15:30:48 in reply to 13 [link]
That's plausible. I think what we'd want to see next is `ps` output showing a lot of zombies on someone's Fossil server. No zombies, no problem.
(16) By Andy Bradford (andybradford) on 2019-07-17 01:25:27 in reply to 10 [link]
> My understanding of double forking is that it's just about avoiding
> zombie processes. It's easier to double-fork than to ensure that you
> do all of the wait() and SIGCHLD stuff properly.

Among other things, yes, that's what it's about. I didn't think Fossil had a zombie problem, because it uses wait() when appropriate, which is why I asked the question. It's actually fork() and setsid() combined that make it so the process cannot get a controlling terminal.

At any rate, I would like to know what the actual problem is that Richard hinted at earlier in this thread when he said:

> This might clear some of the problems that (for example) OpenBSD was
> having.

Thanks,
Andy
(17) By Andy Bradford (andybradford) on 2019-07-17 01:27:59 in reply to 10 [link]
> systemd can handle double-forked daemons. :)

Yeah, and some people think that storing PIDs in files in the filesystem and then killing whatever you find in that file later on is a good way to manage daemons too. :-)

Thanks,
Andy