Skip to content

use 3-way diff algorithm to render merged.u (file output when UCM_MERGETOOL is set)#6160

Merged
aryairani merged 6 commits intotrunkfrom
26-02-02-merged-file
Feb 5, 2026
Merged

use 3-way diff algorithm to render merged.u (file output when UCM_MERGETOOL is set)#6160
aryairani merged 6 commits intotrunkfrom
26-02-02-merged-file

Conversation

@mitchellwrosen
Copy link
Member

Overview

This PR changes the internal logic we use to render the merged.u file when UCM_MERGETOOL is set. merged.u is the user-editable buffer they're given to resolve changes, given other context in $BASE, $LOCAL, and $REMOTE.

Previously, we'd simply render all of the conflicted definitions and their dependents for each branch, and perform a two-way merge between the two, inserting conflict markers as appropriate. This would leave unconflicted dependents alone, and put merge markers in the definitions that were actually conflicted.

However, it had the unfortunate and annoying property that it'd highlight, for the user to resolve, linewise differences in rendered definitions that were only touched by one party.

In this PR, I implemented a simple diff3 algorithm that finds sync points (untouched lines), and classifies the code between them as either conflicted or not, depending on how it compares to the other side and the LCA. This results in many fewer fences that require user intervention, sometimes even none! (which is kinda weird - the merge fails due to conflicts, but when the conflicted definitions are rendered with a linewise 3-way diff, they end up being compatible with each other as they affected different parts of the pretty-printed definition).

Test coverage

This change only affects merged.u, which isn't really capturable by our transcripts at the moment, as it's only written by the external UCM_MERGETOOL when configured.

I did unit test the diff3 algorithm though, see lib/unison-util-diff3/test/testcases/

@mitchellwrosen
Copy link
Member Author

weeder is registering a million unused functions in CI again. Unlike last time when this happened, I can't try just merging main in again because this branch is already up-to-date :(

@mitchellwrosen mitchellwrosen marked this pull request as ready for review February 4, 2026 19:17
@aryairani
Copy link
Contributor

aryairani commented Feb 4, 2026

This change only affects merged.u, which isn't really capturable by our transcripts at the moment, as it's only written by the external UCM_MERGETOOL when configured.

In this case you can always throw in a screenshot of the sanity check that you're using to verify it's good to go, or attach a file if desired; not saying you have to here at this point

I did unit test the diff3 algorithm though, see lib/unison-util-diff3/test/testcases/

👍

weeder is registering a million unused functions in CI again. Unlike last time when this happened, I can't try just merging main in again because this branch is already up-to-date :(

Do the local checks complete ok? (scripts/check.sh)

Does weeder need to know about some roots in the new package? Or no, because it should flow from the root in the cli package?

@aryairani
Copy link
Contributor

I guess let us know if the weeder thing is a false alarm and it's safe to merge.

Alternatively we can merge after #6046, which we should probably do anyway because it'll be easier to deal with conflicts here than on the other PR

@mitchellwrosen
Copy link
Member Author

@aryairani Yeah, scripts/check.sh passes locally. Weeder only failing on one CI build. And sure, we can wait for that PR to merge first.

@ceedubs
Copy link
Contributor

ceedubs commented Feb 4, 2026

Awesome!

I think that this resolves #6096 .

One thing that I noticed: when running the merge from ^ this issue it tells me that it can't automatically merge and opens up my merge tool. But the merged.u file doesn't actually have any conflicts and compiles fine. Is that expected behavior? What is considered something that it can merge automatically vs not?

@mitchellwrosen
Copy link
Member Author

@ceedubs I noticed that too, it's what I referred to here:

which is kinda weird - the merge fails due to conflicts, but when the conflicted definitions are rendered with a linewise 3-way diff, they end up being compatible with each other as they affected different parts of the pretty-printed definition

I think it reveals an odd property of our merge algorithm, which we could consider improving.

Basically, merge sees that two people touched the same definition, and that's enough for it to conclude "there's a conflict here". We will end up on a merge branch for the user to resolve no matter what. And without UCM_MERGETOOL, the user will just get both copies of the conflicted thing in their buffer, to manually untangle.

One improvement we should probably make is using a 3-way diff with conflict markers regardless of whether UCM_MERGETOOL is present, i.e. put merged.u in the buffer rather than whatever we currently put.

Anyway, it does seem like we shouldn't have a system that says "there are conflicts", and then not render any conflicts, but this is still a step in the right direction even though it reveals that weirdness, I think.

@aryairani
Copy link
Contributor

aryairani commented Feb 4, 2026

Weeder only failing on one CI build.

@mitchellwrosen it's only actually attempted on that one CI build

Anyway, it does seem like we shouldn't have a system that says "there are conflicts", and then not render any conflicts, but this is still a step in the right direction even though it reveals that weirdness, I think.

If it doesn't render any conflicts, and the result type-checks, then there's nothing the user can do, so we should just complete the merge automatically.

@aryairani
Copy link
Contributor

If it doesn't render any conflicts, and the result type-checks, then there's nothing the user can do, so we should just complete the merge automatically.

Would that be an easy tweak to make?

@mitchellwrosen
Copy link
Member Author

@aryairani I think that's pretty much right, though there could be semantic conflicts that aren't surfaced by the type checker, e.g. two people put the same logging statements into the same function in two different spots, or something. If we just auto-merge these things, I think that would do what the user wants like 99+% of the time, and it's what git would do, but it is somewhat interesting that we can at least detect the difference between these two kinds of conflicts, and handle them differently, if we want to.

@mitchellwrosen
Copy link
Member Author

Would that be an easy tweak to make?

An easy tweak to this PR, you mean? No, I don't think so. That'd be a substantial surgery to the merge algorithm code.

@aryairani
Copy link
Contributor

aryairani commented Feb 4, 2026

@aryairani I think that's pretty much right, though there could be semantic conflicts that aren't surfaced by the type checker, e.g. two people put the same logging statements into the same function in two different spots, or something. If we just auto-merge these things, I think that would do what the user wants like 99+% of the time, and it's what git would do, but it is somewhat interesting that we can at least detect the difference between these two kinds of conflicts, and handle them differently, if we want to.

Yeah I was thinking the same. For example we could flag a few functions and just say "hey maybe you should stare at these". But right now, if I understood correctly, we tell them there's a problem, we don't give any clues as to what they should do with a potentially huge file, and also tell them there's no problem, so that's not great.

Would that be an easy tweak to make?

An easy tweak to this PR, you mean? No, I don't think so. That'd be a substantial surgery to the merge algorithm code.

I don't get it, it's not the same as load plus update?

@mitchellwrosen
Copy link
Member Author

I don't get it, it's not the same as load plus update?

I think that would work, but kind of bolted-on. Better would be to not go down the code path of putting together a merge branch to resolve conflicts on in the first place, only to perform an update to merge it back into its parent and delete.

@pchiusano
Copy link
Member

Awesome!!

@aryairani
Copy link
Contributor

I don't get it, it's not the same as load plus update?

I think that would work, but kind of bolted-on. Better would be to not go down the code path of putting together a merge branch to resolve conflicts on in the first place, only to perform an update to merge it back into its parent and delete.

Oh yeah, I totally agree.

@ceedubs
Copy link
Contributor

ceedubs commented Feb 5, 2026

Oops thanks @mitchellwrosen I didn't read your initial description thoroughly enough and missed that parenthetical.

I think that this PR is a massive improvement and that it should be reviewed/merged before any potential future endeavor to make more merges automatic.

🎉 🎉 excited about this one!

Also I think that @mitchellwrosen might be on to something about using the merged.u form when the user doesn't have a merge tool configured.

@aryairani
Copy link
Contributor

aryairani commented Feb 5, 2026

I think that this PR is a massive improvement and that it should be reviewed/merged before any potential future endeavor to make more merges automatic.

Yeah I see that the PR takes us from "sometimes give up" to "sometimes confusing", and plan to merge it as soon as we figure out how to pass CI, but want to normalize:

If there's something weird with the workflow that can be fixed with a know but hacky but easy-to-add and easy-to-reverse option, vs a conceptually nicer but costly to implement option, that we prioritize the workflow.

If the hacky option is hard to implement or hard to reverse, then that's less obvious, maybe we have to sacrifice the DX on a case-by-case basis.

Also I think that @mitchellwrosen might be on to something about using the merged.u form when the user doesn't have a merge tool configured.

We could certainly try it out on a branch!

@aryairani
Copy link
Contributor

@mitchellwrosen Could you take a look at the conflicts with the ghc 9.10 merge

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

Some tests with 'continue-on-error: true' have failed:

  • Cabal / smoke-test

Created by continue-on-error-comment

@aryairani aryairani merged commit 08b7c12 into trunk Feb 5, 2026
32 checks passed
@aryairani aryairani deleted the 26-02-02-merged-file branch February 5, 2026 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants