Transformation between ‘flat’ DAG of commits, i.e., linear, and hierarchical. Might be cleanest to have (empty) commit at each ‘start of section’. Then could-be-ff commit at ‘end of section’.
Will need test repo; or at least test branch within this main repo.
How to signal section start/end within flat commits? Maybe add ‘notes’ to a commit. Maybe adopt convention for content of commit message. E.g., <s> at start of ‘subject’ line of message of a ‘start of section’ commit; </s> in ‘end of section’ commit. Allows extension to giving a name to a commit, via ‘<s id=”add documentation”>’ or more abbreviated ‘<s Add documentation>’. Could include some redundancy to ensure matching end/start but probably not necessary.
Expts with pygit2 — seems straightforward enough so far.
Seems that just using repo.walk() will do what we want — list of commits reachable from that point.
- Check it’s linear — i.e., all commits except one have exactly one parent; the one exception should have no parents (it’s the root).
- Assuming it is linear, put into list by following parent-ids, then reverse that list. Now have commits in order from root to tip.
- Create new empty tree and ‘base’ commit pointing to it. Seems might be least clunky to have a global such ‘base’.
- Create new branch pointing at the base commit. Generate unique name for this branch? Force existing branch?
- Maintain idea of ‘current tip’, starting with that base commit. For each commit from root to tip: If <s> or normal, create new commit with a single parent (the tip), updating the dendrified branch. The message, committer, author, tree taken as-is from linear commit.
- If <s>, bit more to it: Might need equivalent of ‘allow empty’. Push ID of just-created commit onto stack of ‘section starts’.
- If </s> create (empty) commit with two parents: tip, and result of popping from section-start stack.
- For both <s> and </s>, strip it from start of commit message.
- Where do we check that we don’t try to pop from an empty stack? A preparatory check would allow friendlier error messages.
On trying it: seems the idea of a ‘base’ commit isn’t required after all. No reason not to have a multi-rooted repo (although this is apparently unusual). As long as there is a branch whose target is the tip, we’re OK. Might well end up deleting that stuff.
Anyway, in general we’ll have an ‘upstream’ type reference which we’ll go back to anyway.
Again, walk from tip of dendrified branch. How to verify structure? Each commit should have either one parent or two.
From tip, what when we hit first two-parented commit?
It will become a </s> commit in the linear history.
It should have an empty diff to exactly one of its parents. That parent is the non-section-start commit. It should be the second parent that has the empty diff.
The other (the parent to which this commit has a non-empty diff) is the matching <s>.
If we can iterate through from tip to root in topological order, should also be able to tell by: peek ahead in iterator; that one is the non-section-start commit.
Either way, when we have noted the matching <s>, push that onto a stack. Each time we have a single-parent commit, check whether it’s the top of the <s>-stack. Could we do this just for commits with empty diffs? Seems like that ought to be valid.
Then we should have the commits in a linear sequence, and having noted whether each one is a <s> or </s> or neither. From there, straightforward to create a chain of commits from the ‘base’.
Is this going to cause trouble? Normal git tools are wary of such commits. Would be annoying to have to artificially cause a diff, so hope nothing too bad happens.
What if the linear history has some real empty commits in it? Is this likely? As long as they don’t have the <s> or </s> markers in the message, think it doesn’t matter.
How to describe entire linear/flat history for tests? As patch series? Use ‘git format-patch’ to get bunch of files, which can then be in the main git repo as test data. Bit perverse but might be OK.
Annoyance (of unknown magnitude) caused by ‘git am’ handling of empty commits. Can create them fine via the (undocumented?) ‘–always’ flag to ‘git format-patch’, but web suggests (and brief experiments confirm) that ‘git am’ will baulk at them. Might have to do them semi-manually. Come back to that.
All that matters for the test data sets is that there are real changes to the tree to simulate a ‘real’ commit, and the topology of the resulting sectioned history.
So to describe a linear history, sequence of <s>, ‘normal’, </s>. When creating a repo to instantiate that, can (eg) make a tree with a single blob whose contents are the text representation of the index of that commit in the linear history.
Call it ‘[’ for a <s>, ‘.’ for a ‘normal commit’ and ‘]’ for </s>.
Current code takes all commits back to the parentless root commit for transformation. We actually want to be able to specify a range of commits, e.g., start with the most recent ‘develop’ and transform everything reachable from ‘linear’ but not from ‘develop’ into a new branch ‘dendrified’.
In manual linear->dendrified->linear round-trip inspection, the commits get the same OIDs. Skimming the source of libgit2, and also observing behaviour, it does not seem to be an error to attempt to write a pre-existing object to the ODB. You just get the same oid back, as expected.
Tried it with re-structuring flat list of commits in python-xml-serdes. Under new magit, was pretty easy to add appropriate empty commits to start sections, and to reword existing labelled ‘merge point’ empty commits. Then the dendrify code worked nicely and quickly: c120ms (felt instantaneous) for c.70 commits.
Result slightly clunky in one regard: empty ‘start section’ commits feel redundant and not what would be done manually. See next section.
Normally, the next feature branch just launches straight off the merge commit which brought in the previous feature branch. That commit, therefore, is both a section-end and a section-start commit.
Design so far is that the commit message needs to start with the special </s> sequence, and then end with <s>, to say a new section starts straight away.
This is clunky for multi-line commit messages though.
Try </s><s> at the start instead? Not convinced: That might give the impression that the commit messages belongs to the <s> tag (i.e., describes the section that’s about to start), whereas in fact the message describes the work done in the section just closing. E.g.:
</s><s>Error-reporting improvements
Add a ’’, aimimg for some vague analogy with the meaning of ’’ in the
‘mode’ argument of an fopen() call?
<s+>Error-reporting improvements
or
</s+>Error-reporting improvements
Make the <s> be at the end of the first line (i.e., the ‘subject’) of the commit message?
</s>Error-reporting improvements<s> Include more context in error messages. [...]
Does anything actually rely on the usual emptiness of the section-start commit? If not, could just tag any commit. But it would be nice to tag the first commit of a section (feature branch), not the last one of the previous run of standalone commits. So it is not the case that the tagged commit is the multi-children’d one.
Would the following work? Showing tags as part of a dendrified history, which wouldn’t normally be the case:
* </s>Add printing feature |\ | * Allow configuration | * </s>Implement driver | |\ | | * Check parity | | * Set up flow control | | * <s>Define serial port | |/ | * <s>Emit control codes |/ * Fix typo * This is version 1.2.3 * (Minor code style clean-ups) * </s>Add colour-picker |\ | * Translate colour names | * <s>Set up defaults |/ * </s>Allow choice of language |\ | * Add French, German | * Move existing strings to xln framework | * <s>Set up translations framework |/ * --BASE-- This is version 1.2.2 : :
Which commit gets labelled with which type? Maybe the ‘EndAndStart’ type goes away, if we’re calling the <s> commits the ‘Start’ ones? I.e., a ‘Start’ commit is a single-parent’d commit whose (sole) parent has another child besides the commit under consideration.