Import above an initial artifact

(1) By Alex P (aplsimple) on 2019-09-16 19:32:52 [link] [source]

How cool it would be if Fossil were able to import data above the initial empty artifact. I.e. to populate the empty repository.

Or is it already available somehow?

See e.g. this post

(2) By Stephan Beal (stephan) on 2019-09-16 20:23:16 in reply to 1 [link] [source]

There used to be an option to create a new repo without the "initial empty commit", but i have no idea why it was removed.

Hypothetically you could simply delete the initial checkin artifact from the blob table of a freshly-created repo, but i've never tried it.

After creating the repo:

fossil sql -R thenewrepo
delete from blob where rid=1;

There should only be one blob entry at that point, so the "where" clause is unnecessary.

Untested. "Use at your own risk", as well as all the other usual disclaimers, apply.

(3) By Stephan Beal (stephan) on 2019-09-16 20:30:27 in reply to 2 [link] [source]

After doing so, run "fossil rebuild" so that the timeline doesn't refer to a missing blob.

(4) By Andy Bradford (andybradford) on 2019-09-17 03:29:11 in reply to 2 [link] [source]

> There used to be  an option to create a new  repo without the "initial
> empty commit", but i have no idea why it was removed.

There were a number of lurking  bugs that were uncovered that were based
on the  assumption that rid=1  would always be  of type=ci which  led to
some unfortunate events:

https://marc.info/?l=fossil-users&m=144565852443811&w=2

Some effort was put into finding all the remaining bugs and fixing them,
however, eventually  the ability was  entirely removed because  it broke
backwards compatibility (e.g.  old clients could no  longer clone against
newer servers).

Of course  it's still possible to  generate a repository with  no initial
empty checkin as you have demonstrated. Here's a simpler method:

$ fossil recon /tmp/newrepo.fossil `mktemp -d`        
Reading files from directory "/tmp/tmp.xjJccqCjWM"...

Building the Fossil repository...
  100.0% complete...
project-id: 9165ff8ec2e2d700d631d27e10ad045116da2629
server-id: de62a41a1fd6c2d781efb05a9d4c380717e5a12c
admin-user: amb (initial password is "Den84vcX7M")

Andy

(5.4) By Alex P (aplsimple) on 2019-09-28 10:52:29 edited from 5.3 in reply to 1 [source]

The main idea of this proposed feature is the following:

When you import/push a non-empty repository to an EMPTY one (with the initial check-in only), you rightly hope that the non-empty trunk would be a descendant of that empty initial check-in.

For now, nothing of the sort happens: fossil import --incremental creates two unrelated trunks. Having no common ancestor, they aren't forks and you cannot merge them, so you have to shun one of them (the empty initial check-in).

That's not good, because you cannot properly push this shunned repo to a remote EMPTY one that is still waiting for its proper trunk root. Thus you have to shun again and again, here and there.

Perhaps the --incremental option could be complemented by a --parent (or --ancestor) option, meaning "a trunk check-in to be the parent of the imported trunk root".

(6) By Stephan Beal (stephan) on 2019-09-28 12:09:46 in reply to 5.4 [link] [source]

> When you import/push a non-empty repository to an EMPTY one (with the initial check-in only), you rightly hope that the non-empty trunk would be a descendant of that empty initial check-in.

"Rightly" hoping that is an indication of a misunderstanding of parentage in fossil (whether what follows globally applies to DSCMs, i don't know).

That empty initial checkin is a full-fledged artifact with its own unique hash code. There is nothing magical about that commit as far as fossil is concerned and there's no direct indication that it is "the first commit" (that can be inferred indirectly by the fact that that checkin has no parents, but that condition only makes it unique so long as nobody has injected unrelated artifacts into the repo). When a new checkin is created, the hash of its parent is stored in that checkin and when the new checkin's own hash is generated, the hash of the parent is part of the content which is hashed. Thus the relationship becomes an immutable part of that new checkin.
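To make the point about immutability concrete, here is a minimal sketch. It is not Fossil's real manifest format: the `C`/`P` lines, the `make_checkin` helper and the use of SHA-256 are simplifying assumptions chosen only for illustration. It shows that the same change committed on top of two different initial check-ins yields two different artifacts, because the parent's hash is part of what gets hashed.

```python
import hashlib


def make_checkin(comment, parent_hash=None):
    """Build a toy check-in and return (hash, content).

    Hypothetical format: one "C <comment>" line, plus a "P <parent>" line
    when the check-in has a parent. Not Fossil's actual manifest syntax.
    """
    lines = [f"C {comment}"]
    if parent_hash is not None:
        lines.append(f"P {parent_hash}")
    content = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(content).hexdigest(), content


# Two freshly created repos: their initial check-ins differ in content,
# so they have different hashes.
root_a, _ = make_checkin("initial empty check-in (repo A)")
root_b, _ = make_checkin("initial empty check-in (repo B)")

# The "same" first real commit, once on top of each root.
child_of_a, _ = make_checkin("add README", parent_hash=root_a)
child_of_b, _ = make_checkin("add README", parent_hash=root_b)

# The parent hash is part of the hashed content, so the children differ.
print(child_of_a != child_of_b)
```

In this toy model, "moving" a check-in under another parent is not an edit; it produces a different artifact with a different name.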

Fossil's data model is an "unordered set of artifacts", in which it enforces no inherent lineage between those artifacts - all such information has to come from those artifacts themselves. In the case of pushing artifacts from one repo into another repo (empty or otherwise), Fossil has no basis whatsoever from which to assume that any given artifact from the new data is in any way related to any of the artifacts it already knows about. All such information has to be encoded into the artifacts. (Remember that fossil places no special significance on that initial commit record, nor on the order of the new artifacts: their ordering is implied via their dependencies upon each other (as recorded via hash codes).)
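As a rough illustration of "all such information has to come from the artifacts themselves", the sketch below keeps artifacts in a plain dictionary and derives the roots purely from what each artifact says about its parent. The text format and the `artifact` and `find_roots` helpers are hypothetical, not Fossil code. Pushing a project with its own root into a repo that already holds an empty initial check-in leaves two parentless check-ins, i.e. two unrelated trunks.

```python
import hashlib


def artifact(comment, parent=None):
    """Return (hash, text) for a toy check-in; the format is an assumption."""
    text = f"C {comment}\n"
    if parent is not None:
        text += f"P {parent}\n"
    return hashlib.sha256(text.encode("utf-8")).hexdigest(), text


def find_roots(store):
    """Hashes of check-ins that name no parent, derived from content only."""
    return sorted(
        name
        for name, text in store.items()
        if not any(line.startswith("P ") for line in text.splitlines())
    )


# Target repo: just the auto-generated initial check-in.
empty_root, empty_text = artifact("initial empty check-in")

# Source repo: its own root and one child.
src_root, src_root_text = artifact("imported project root")
src_child, src_child_text = artifact("imported work", parent=src_root)

# The store is an unordered bag; insertion order carries no meaning.
store = {
    src_child: src_child_text,
    empty_root: empty_text,
    src_root: src_root_text,
}

roots = find_roots(store)
print(len(roots))
```

Nothing in the received artifacts mentions `empty_root`, so the model has no basis for attaching the imported root to it.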

That said: there is a command called "reparent" which "patches" the parentage of a checkin, but it comes with this warning:

> This is an experts-only command. It is used to patch up a repository that has been damaged by a shun or that has been pieced together from two or more separate repositories. You should never need to reparent during normal operations.

Reparent does not modify the child, but effectively amends the blockchain with a record saying, "by the way, Amy has adopted Joe as her parent" (and the relationship is one-way: Joe doesn't know about it). Like all tags in fossil, that amendment can be revoked by a later tag which cancels the older one. (Which can then be re-instated, revoked, reinstated... ad nauseam.) None of those tags are ever removed from the system, but each new one (as implied by the order of their dependencies) supersedes the previous one.
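The "amend, revoke, reinstate" behaviour can be pictured as an append-only log in which the latest entry wins. The sketch below is only a model of that idea: the `REVOKE` marker, the `effective_parent` function, and the assumption that a revocation falls back to the parent recorded in the check-in itself are all hypothetical, not taken from Fossil's implementation.

```python
from typing import List, Optional

REVOKE = None  # hypothetical marker meaning "cancel the earlier amendment"


def effective_parent(recorded: Optional[str],
                     log: List[Optional[str]]) -> Optional[str]:
    """Resolve the parent of one check-in from an append-only amendment log.

    `recorded` is the parent stored inside the check-in (never modified).
    Each log entry either names an adopted parent or is REVOKE. Entries are
    never deleted; a later entry simply supersedes the earlier ones.
    """
    current = recorded
    for entry in log:
        current = recorded if entry is REVOKE else entry
    return current


log: List[Optional[str]] = []
history = [effective_parent(None, log)]  # Amy starts without a parent

log.append("joe")                        # "Amy has adopted Joe as her parent"
history.append(effective_parent(None, log))

log.append(REVOKE)                       # a later tag cancels the adoption
history.append(effective_parent(None, log))

log.append("joe")                        # ...and it can be reinstated
history.append(effective_parent(None, log))

print(history)
```

The log only ever grows; what changes is which entry is the most recent.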

Sidebar: "shun" is something which really should never be done unless it's an emergency (Truly Undesirable data, e.g., a password or illegal/dangerous data, has landed in the repo). Shunning is considered "the nuclear option," not something intended to be done willy nilly.

(7.1) By Alex P (aplsimple) on 2019-09-28 16:57:20 edited from 7.0 in reply to 6 [link] [source]

Thanks, Stephan, for the detailed post. The parent's hash being a part of its child's hash explains all.

It was only a feature proposal. I only hoped, not demanded, that it would be implemented.

That said, let's separate names and contents.

Hashes are names of artifacts. What is imported are the contents of the underlying artifacts, while their names (as we might rightly hope) might be rebuilt new in their new repo.

It is the import of contents, not of hashes, that is interesting.

ALSO, the issue of two unrelated trunks at --incremental importing still remains seemingly unsolved. See e.g. the book by Jim Schimpf, the Chisel section.

(8) By anonymous on 2019-09-30 11:00:53 in reply to 7.1 [link] [source]

> let's separate names and contents.

Actually, you can't do that. The name of an artifact is the hash of its content. It serves both as a way to specify exactly what you are expecting, and as a way to verify the content of the artifact. In Fossil, the specification is most important: If you don't have an artifact whose content hash matches the expected hash, you don't have the expected artifact.

Granted, there are other ways to do this, but, given that each artifact needs a unique name, using the hash of the content as the name is the simplest way to name artifacts.

> while their names (as we might rightly hope) might be rebuilt new in their new repo.

I'm not sure what you mean.

Fossil can "retrieve" the name of an artifact by hashing its content. Fossil does not rename artifacts.
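A minimal content-addressed store shows why the name cannot be separated from the content. This is a sketch under assumptions: the `ArtifactStore` class is hypothetical and SHA-256 is an arbitrary choice of hash, not a statement about which hash a given Fossil repository uses.

```python
import hashlib


class ArtifactStore:
    """Toy store in which an artifact's name is the hash of its content."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The name is computed from the content, never chosen by the caller.
        name = hashlib.sha256(content).hexdigest()
        self._blobs[name] = content
        return name

    def get(self, name: str) -> bytes:
        content = self._blobs[name]
        # The same hash doubles as an integrity check on retrieval.
        if hashlib.sha256(content).hexdigest() != name:
            raise ValueError("content does not match its name")
        return content


store = ArtifactStore()
name = store.put(b"hello fossil\n")
same_name = store.put(b"hello fossil\n")    # identical content, identical name
other_name = store.put(b"hello fossil!\n")  # any change yields a new name

print(name == same_name, name != other_name)
```

In such a store, "renaming" an artifact would mean changing its content, which by definition makes it a different artifact.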

You can attach a symbolic name to an artifact using a control artifact. This symbolic name will be displayed, but Fossil otherwise doesn't use it.

(9) By Alex P (aplsimple) on 2019-09-30 17:31:46 in reply to 8 [link] [source]

> Actually, you can't do that.

Really? I can, no doubt. It's Fossil that can't.

Well, you defend the details of implementation. I, being a simple user, defend the essence of this. I don't care about the details of implementation.

For me, Fossil is a black box that behaves inadequately in this specific case, namely: it gets some imported FILES (everything is a file, and not only in UNIX, huh) and cannot put them into a target repo, without the crutch of shun. Hashes are the reason, as Stephan and you rightly noted, but this detail doesn't make us happy.

> I'm not sure what you mean.

What I mean is the following:

Let's imagine we make changes in a working directory. Then we commit them, working hard.

How does this way differ from the easy way of automatically committing the imported/pushed FILES?

(10) By anonymous on 2019-09-30 19:36:22 in reply to 9 [link] [source]

> cannot put them into a target repo, without the crutch of shun.

Incorrect.

"fossil reparent" can be used to "connect" the "imported" artifacts. I think this would be the preferred "crutch".[1] [2]

> Hashes are the reason

Actually, RID 1 is the reason. See [1]. Yes, it is a bug, but one that is too entrenched to safely fix. If there were no initial, empty commit, this import problem would not happen (unless you were combining 2 or more repos into a single repo).

This does suggest a potential solution: Omit the P card for the first real commit and ignore the initial, empty commit. This would be equivalent to when Fossil allowed not having an initial, empty commit. [3]

Granted, this would not help with "importing" from repos created before this might be implemented, but would help going forward.

[1] As mentioned in another post, some parts of Fossil assume that RID 1 is always a commit. If the empty commit is shunned, what becomes RID 1? Would there be a RID 1?

[2] Arguably, Fossil could have an option to automatically reparent the oldest incoming commit to a freshly created repo. The result of this would be semantically equivalent to not having an empty commit as RID 1.

[3] Alternately, could the oldest incoming commit be renumbered to be RID 1, thus replacing the initial, empty commit?

(11) By Stephan Beal (stephan) on 2019-09-30 19:50:54 in reply to 10 [link] [source]

If RID 1 is shunned, the next blob gets the next number in the sequence: RID 2. That said... if a rebuild is run immediately after shunning, that counter might (i am not certain) get reset.

The next commit would not actually get RID 2, though: the files which are part of a commit are necessarily imported into the blob table before the commit record is created, so one of its files would get RID 2 (unless the commit was empty (had no files)).

Note that RIDs are transient and mutable. Multiple clones of a repo may have different RID values for the same blobs. RIDs are only for fossil's own internal use, essentially as cache keys.
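The following sketch illustrates why RIDs are local to one repository file. It uses a deliberately simplified, hypothetical two-column `blob` table (integer `rid`, content hash `uuid`) in an in-memory SQLite database rather than Fossil's real schema; `new_repo`, `receive` and `rid_map` are made-up helper names. Two "clones" receive the same artifacts in a different order and end up with the same hashes but different integer keys.

```python
import hashlib
import sqlite3


def new_repo():
    """Create a toy repo with a simplified blob table (not Fossil's schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL)"
    )
    return db


def receive(db, content):
    """Store one artifact; the rid is whatever SQLite hands out next."""
    uuid = hashlib.sha256(content).hexdigest()
    db.execute("INSERT OR IGNORE INTO blob(uuid) VALUES (?)", (uuid,))
    return uuid


def rid_map(db):
    """Map each artifact hash to the rid it got in this particular repo."""
    return dict(db.execute("SELECT uuid, rid FROM blob"))


artifacts = [b"file one", b"file two", b"check-in manifest"]

clone_a, clone_b = new_repo(), new_repo()
for content in artifacts:
    receive(clone_a, content)
for content in reversed(artifacts):  # same artifacts, different arrival order
    receive(clone_b, content)

map_a, map_b = rid_map(clone_a), rid_map(clone_b)
print(set(map_a) == set(map_b), map_a == map_b)
```

The hashes identify the artifacts across repos; the integers only identify rows inside one database file.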

(12) By Andy Bradford (andybradford) on 2019-09-30 19:57:27 in reply to 11 [link] [source]

I don't  think the  order of RIDs  is guaranteed on  a rebuild.  In fact
there's no  guarantee that the "initial  empty checkin" will have  RID 1
(see mailing list link mentioned earlier).

Andy

(13) By Stephan Beal (stephan) on 2019-09-30 20:05:01 in reply to 12 [link] [source]

It's definitely true that RID 1 itself is not guaranteed to exist or be a specific blob. A reconstruct can certainly remap RIDs, and a rebuild might (i don't recall for certain). Back about 5 or 6 years ago, in the context of porting libfossil, Jan and i discovered that some fossil ops assumed that the db always had at least one blob, and crashed if that weren't the case. Jan cleaned those up and added the ability to create an empty repo, but that was later backed out (from what i understand, it introduced backwards-compatibility issues).

(14) By Andy Bradford (andybradford) on 2019-09-30 20:24:02 in reply to 13 [link] [source]

Yep, I was one of those who found bugs with the changes. :-)

Regarding the history of changes, that's about what I said here, though without as many details:

https://www.fossil-scm.org/forum/forumpost/8aac350c36

Thanks,

Andy

(15) By anonymous on 2019-09-30 23:33:21 in reply to 13 [link] [source]

> A reconstruct can certainly remap RIDs, and a rebuild might

What did the older versions of Fossil do? (Just curious.)

Moving on, other than backward compatibility with very old versions of Fossil, is there some other need for RID 1 to have any particular type or to even exist?

If not, or depending on just what assumes RID 1 is present and/or a commit, maybe the simplest workaround to this problem is to bring back the option to create a repo with no initial commit. Perhaps require that overriding the project code also be done, as the main purpose for this would be to facilitate pushing an existing repo into the new repo.

Another option might be, as I previously suggested, when receiving a push from an existing repo into a freshly created repo, auto-delete the initial commit and allow one of the received artifacts to take its place.

Yet another option might be to create the initial commit with a special date and "null" user, so initial commits will always have the same hash. This, of course, would negate using the initial commit as a record of the repo's creation event. If there's really interest in keeping that info, maybe auto-create a technote to keep that info. (The technote would be RID 2, while the initial commit would be RID 1.)
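A small sketch of the "same hash for every initial commit" idea, under stated assumptions: the `C`/`D`/`U` text format, the `initial_checkin` helper and the placeholder date string are hypothetical, and SHA-256 is an arbitrary choice. With real dates and users the initial check-ins of two repos differ; with a fixed date and an empty user they collapse into one and the same artifact.

```python
import hashlib

FIXED_DATE = "0000-00-00T00:00:00"  # hypothetical "special date" placeholder
NULL_USER = ""                      # hypothetical "null" user


def initial_checkin(date, user):
    """Hash of a toy initial check-in; the format is an assumption."""
    text = f"C initial empty check-in\nD {date}\nU {user}\n"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Today: creation time and user make every initial check-in unique.
repo_a = initial_checkin("2019-09-16T19:32:52", "alice")
repo_b = initial_checkin("2019-09-30T23:33:21", "bob")

# Proposed: identical content everywhere, hence an identical hash.
fixed_a = initial_checkin(FIXED_DATE, NULL_USER)
fixed_b = initial_checkin(FIXED_DATE, NULL_USER)

print(repo_a != repo_b, fixed_a == fixed_b)
```

In this model a push into a fresh repo would find that the receiver already holds the sender's root, because both sides computed the same artifact.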

(18) By Stephan Beal (stephan) on 2019-10-01 00:06:26 in reply to 15 [link] [source]

> What did the older versions of Fossil do? (Just curious.)

Fossil's behaviour hasn't ever changed in this regard. RIDs have always been volatile and internal-only. They're never exposed by any web pages or CLI commands and have always been subject to change at any time. In short, RIDs are simply the integer IDs/aliases of blob table entries. Those entries are formally known/referenced by their hash values, but internally fossil uses integers as the primary key for efficiency's sake (it saves a ton of memory over using full hash values everywhere a foreign key reference is needed, and is faster as well).

The deconstruct command can be used to rip a database apart (exporting all of its blobs), then the reconstruct command can repopulate it from those ripped-apart pieces, and the RIDs will (in all likelihood) be completely different (for a populated repo, anyway). Fossil never, ever shares RIDs between repos, nor exposes them to users. They're strictly internal keys.

> Moving on, other than backward compatibility with very old versions of Fossil, is there some other need for RID 1 to have any particular type or to even exist?

RID 1, per se, was never magical. Because fossil, early on, adopted the policy of "seeding" the database with a single empty commit, its code came to implicitly rely on the fact that the database would never be empty and that there would be a parent checkin for the user's first "real" commit to inherit from. Which RID that item had was actually never relevant, but the way the SQL sequence works is that the first blob will always get RID 1 (and the first blob is always the empty initial commit). "And thus it came to pass" that the initial empty commit is sometimes colloquially known as "RID 1", even though that's only true for freshly-created repos and only if we ignore the fact that the SQL sequence's starting point is an internal implementation detail which could change at any moment (thereby giving the initial commit a different RID).
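How an SQLite integer key sequence behaves once the only row is gone depends on how the table is declared, which is one reason not to attach meaning to a particular RID. The sketch below uses a hypothetical, simplified `blob` table and a made-up helper, `first_rid_after_wipe`; it demonstrates general SQLite behaviour and makes no claim about which declaration Fossil's own schema uses.

```python
import sqlite3


def first_rid_after_wipe(ddl):
    """Insert one row, delete everything, insert again; return both rids."""
    db = sqlite3.connect(":memory:")
    db.execute(ddl)
    first = db.execute("INSERT INTO blob(uuid) VALUES ('initial')").lastrowid
    db.execute("DELETE FROM blob")
    second = db.execute("INSERT INTO blob(uuid) VALUES ('next')").lastrowid
    return first, second


# Plain INTEGER PRIMARY KEY: the next id is max(existing)+1, so an emptied
# table starts over at 1.
plain = first_rid_after_wipe(
    "CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT)"
)

# With AUTOINCREMENT, SQLite remembers the highest id ever used and never
# hands it out again.
auto = first_rid_after_wipe(
    "CREATE TABLE blob(rid INTEGER PRIMARY KEY AUTOINCREMENT, uuid TEXT)"
)

print(plain, auto)
```

Either way, the first row of a fresh table gets 1 only because that is where the sequence happens to start.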

> Another option might be, as I previously suggested, when receiving a push from an existing repo into a freshly created repo, auto-delete the initial commit and allow one of the received artifacts to take its place.

That would violate fossil's most golden of core-most rules: "don't lose data." It's true that the initial empty commit is auto-generated and "could" be sacrificed without any harm being done, provided no checkin inherits it, but, even so... the mere thought of fossil deleting a piece of data gives me the shivers, just out of principle. (That said: that's just my own opinion and i'm sure there are many others who don't share it (and that's okay).)

> Yet another option might be to create the initial commit with a special date and "null" user, so initial commits will always have the same hash.

The timeline would then start on November 24, 4714 BCE (Fossil internally uses Julian Day timestamps, not Unix Epoch times).
It would break for people who store multiple disjointed trunks in their repo, as all such trunks would derive from the same checkin (same hash == same content == same checkin). (Why some people insist on doing that is beyond me, but they do and Fossil's data model allows it, so doing so isn't fundamentally broken (just really weird).)
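For readers unfamiliar with Julian Day numbers, the conversion to and from Unix time is a fixed offset. The sketch below is general calendar arithmetic, not Fossil code; the function names are made up. Python's `datetime` cannot represent years before 1 CE, so Julian Day 0 is shown only as a Unix timestamp.

```python
from datetime import datetime, timezone

UNIX_EPOCH_JD = 2440587.5  # Julian Day number of 1970-01-01T00:00:00Z
SECONDS_PER_DAY = 86400.0


def unix_to_julian_day(seconds):
    """Convert Unix time (seconds since 1970-01-01 UTC) to a Julian Day."""
    return seconds / SECONDS_PER_DAY + UNIX_EPOCH_JD


def julian_day_to_unix(jd):
    """Convert a Julian Day number back to Unix time in seconds."""
    return (jd - UNIX_EPOCH_JD) * SECONDS_PER_DAY


# A date from this thread, at noon UTC (Julian Days start at noon).
noon = datetime(2019, 9, 30, 12, 0, tzinfo=timezone.utc)
jd_2019 = unix_to_julian_day(noon.timestamp())

# Julian Day 0, i.e. a "zero" timestamp, lies far before the Unix epoch.
jd_zero_unix = julian_day_to_unix(0.0)

print(jd_2019, jd_zero_unix)
```

A check-in stamped with Julian Day 0 would therefore sort before every real check-in, which is why the timeline would start in 4714 BCE.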

> This, of course, would negate using the initial commit as a record of the repo's creation event. If there's really interest in keeping that info, maybe auto-create a technote to keep that info. (The technote would be RID 2, while the initial commit would be RID 1.)

That's an interesting idea, but fossil might (i haven't checked since 2014 or so) still internally rely on having a checkin for its initial checkout state and as a parent for the first "real" commit.

(16) By Richard Hipp (drh) on 2019-09-30 23:52:31 in reply to 1 [link] [source]

> How cool it would be if Fossil were able to import data above the initial empty artifact.

I can't think of a practical reason to want to do this. Please help me out. What problem are you trying to solve that this capability would help with?

(17) By anonymous on 2019-09-30 23:58:59 in reply to 16 [link] [source]

> What problem are you trying to solve that this capability would help with?

One example I can think of: When setting up a Fossil server on a hosting service like ChiselApp. This would allow cleanly pushing from an existing repo to the newly created one.

(19) By Richard Hipp (drh) on 2019-10-01 00:23:33 in reply to 17 [link] [source]

Wouldn't it be easier to simply hand ChiselApp your initial repository as part of the setup process?

We could add special-case logic to Fossil such that if you push to an empty repository it overwrites the empty repository with the content of whatever is being pushed. That seems doable. But it is a lot of extra code to test and maintain and potentially have bugs. Is this really a problem that needs that much attention?

(20) By Roy Keene (rkeene) on 2019-10-01 00:59:11 in reply to 19 [link] [source]

For what it's worth, ChiselApp already lets you upload a repository as one of the options for creating a new repository on ChiselApp.

(21) By Alex P (aplsimple) on 2019-10-01 07:00:15 in reply to 19 [link] [source]

Please look at Jim Schimpf's book, which, by the way, is referred to in the Documentation Index:

Fossil book by Jim Schimpf (version on 15 November 2016)

Pay special attention to

Chapter 6 ChiselApp

6.2.3 Fixing Data

Figure 114: Time Line view

And you ask after that:

> Is this really a problem that needs that much attention?

Also, visit

Public Repositories of ChiselApp

and try 10, 20 or 100 repositories at random to gather some statistics. You would discover that 4 of 5, maybe 6 of 7, repositories are "stubs" containing nothing but the "initial commit".

You might point to similar statistics for GitHub, but there it's not so evident and blatant.

As for ChiselApp, aka the GitHub of Fossil, not the least reason for its pitiful state with "stubs" is the issue described by Jim (and by me, as far as I could deliver this message to your team).

(22) By Stephan Beal (stephan) on 2019-10-01 12:15:39 in reply to 21 [link] [source]

FWIW, i suspect that ChiselApp represents a minority of Fossil users. Most people, i suspect, run it either as a CGI (the simplest/most widely-deployable approach) or run it only "internally" with a server in their local network. (i'm not sure how we could reasonably gather usable statistics on which approach(es) people are using.)

(23) By Chris (crustyoz) on 2019-10-01 12:56:45 in reply to 22 [link] [source]

> (i'm not sure how we could reasonably gather usable statistics on which approach(es) people are using.)

For at least a sizable subset of the population who are members of this forum, you could simply ask.

I have a server half-way around the world that acts as my primary repository and backup for a bunch of repositories. Running it as a CGI is a trivial extension of a local network.

Chris

(25) By ckennedy on 2019-10-01 22:18:24 in reply to 23 [link] [source]

I have a private W2k12 server running at work, and my own semi-public server running under Apache as CGI on a web hosting account.

(24) By ckennedy on 2019-10-01 22:16:28 in reply to 21 [link] [source]

So the Fossil Book is somewhat old, was last updated when Fossil was at v1.36, and well, I wouldn't take anything in it as gospel.

(26) By Alex P (aplsimple) on 2019-10-02 09:37:00 in reply to 24 [link] [source]

Really? Can you be more specific about this particular issue?

I tried it in Fossil v2.8 and found the issue the book describes is still here.

(27) By ckennedy on 2019-10-03 00:49:17 in reply to 26 [link] [source]

So the book is several years old, Fossil has moved on, and so has ChiselApp. You can now not just create a new repository in ChiselApp, but also clone or upload a repository. So for the example in the book you should upload the existing repository and thus circumvent the entire issue.

If you are creating a new repository to use ChiselApp hosting, you should create a new repository in ChiselApp first, then clone it, then enter your code and sync.

The book really needs to be updated, but I'm unclear on how it is copyrighted and whether the community can just go ahead and update it or not.

Thanks.