Why Pijul?

Pijul is the first distributed version control system to be based on a sound mathematical theory of changes. It is inspired by Darcs, but aims at solving the soundness and performance issues of Darcs.

Pijul has a number of features that allow it to scale to very large repositories and fast-paced workflows. In particular, change commutation means that changes written independently can be applied in any order, without changing the result. This property simplifies workflows, allowing Pijul to clone sub-parts of repositories, to solve conflicts reliably, to easily combine different versions.

Change commutation

In Pijul, for any two changes A and B, either A and B commute, (in other words, A and B can be applied in any order), or A depends on B, or B depends on A.

  • [Use case: early stage of a project] Change commutation makes Pijul a highly forgiving system, as you can “unapply” (or “unrecord”) changes made in the past, without having to change the identity of new changes. A reader familiar with Git will understand “rebasing”, in git terms).

    This tends to happen pretty often in the early stages of a project, when most things are still uncertain. With Pijul, exploring new features and new implementations comes at no extra cost in time.

  • [Use case: the project becomes stable] As your project grows, change commutation saves even more time: imagine a project with two main branches, a stable one containing only the original product, and bugfixes, and an unstable one, where new features are constantly added.

    The team working on the unstable branch is likely to discover old bugs, and fix them in the stable branch too.

    In Pijul, maintainers of the stable branch can simply pull only the changes they are interested in. Those changes do not change when imported, which means that pulling new changes in will work just as expected.

Associativity

In Pijul, change application is an associative operation, meaning that applying some change A, and then a set of changes (BC) at once, yields the same result as applying (AB) first, and then C.

With branches, the first scenario looks like this: Bob creates A, while Alice creates B, C, and Bob finally merges both B and C at once.

The second scenario would look like the following, with Bob creating commit A, and then pulling B. At that moment, Bob has both A and B on his branch, and wants to pull C from Alice.

Note that this is different from change reordering: here, we apply A, then B, then C, in the same order in both scenarios.

Using math words such as “associative” for such a simple operation may sound like nitpicking, because intuition suggests that it should always be the case. However, Git doesn’t guarantee that property, even if A, B, and C do not conflict. Concretely, this means that Git (and relatives) can sometimes shuffle lines around, because these systems only track versions, rather than the changes that happen between the versions. And even though one can reconstruct one from the other, the following example (taken from here) shows that tracking versions only does not yield the expected result.

Git merge (which A is which?)
Pijul merge

In this diagram, Alice and Bob start from a common file with the lines A and B. Alice adds G above everything, and then another instance of A and B above that (her new lines are shown in green). Meanwhile, Bob adds a line X between the original A and B.

This example will be merged by Git, SVN, Mercurial… into the file shown on the left, with the relative positions of G and X swapped, where as Pijul (and Darcs) yield the file on the right, preserving the order between the lines. Note that this example has nothing to do with a conflict, since the edits happen in different parts of the file. And in fact neither Git nor Pijul will report a conflict in this case.

The reason for the counter-intuitive behaviour in Git is that Git runs a heuristic algorithm called three-way merge or diff3, which extends diff to two “new” versions instead of one. Note, however, that diff has multiple optimal solutions, and the same change can be described equivalently by different diffs. While this is fine for diff (since the patch resulting from diff has a unique interpretation), it is ambiguous in the case of diff3 and might lead to arbitrary reshuffling of files.

Obviously, this does not mean that the merge will have the intended semantics: code should be still reviewed and tests should still be run. But at least a review of the change will not be made useless by a reshuffling of lines by the version control tool.

Modelling conflicts

Conflicts are a normal thing in the internal representation of a Pijul repository. Actually, after applying new changes, we even have to do extra work to find where the conflicts are.

In particular, changes editing sides of a conflict can be applied without resolving the conflict. This guarantees that no information ever gets lost.

This is different from both Git and Darcs:

  • In Git, conflicts are not really handled after they are output to files. For instance, if one commits just after a conflict occurs, git will commit the entire conflict (including markers).

  • In Darcs, conflicts can lead to the exponential merge problem, which might cause it to take several hours to merge even a two-lines change.

Comparisons with other version control systems

Pijul for Git/Mercurial/SVN/… users

The main difference between Pijul and Git (and related systems) is that Pijul deals with changes (or patches), whereas Git deals only with snapshots (or versions).

There are many advantages to using changes. First, changes are the intuitive atomic unit of work. Moreover, changes can be merged according to formal axioms that guarantee correctness in 100% of cases, whereas commits have to be /stitched together based on their contents, rather than on the edits that took place/. This is why in these systems, conflicts are often painful, as there is no real way to solve a conflict once and for all (for example, Git has the rerere command to try and simulate that in some cases).

Pijul for Darcs users

Pijul is mostly a formally correct version of Darcs’ theory of changes, as well as a new algorithm for merging changes. Its main innovation compared to Darcs is to use a better data structure for its pristine, allowing for:

  • A sane representation of conflicts: Pijul’s pristine is stored in a “conflict-tolerant” data structure. Many changes can be applied to it, and the presence or absence of conflicts are only computed afterwards, by looking at the pristine.

  • Conflicting changes always commute in Pijul, and never commute in Darcs.

  • Fast algorithms: Pijul’s pristine can be seen as a “cache” of applied changes, to which new changes can be applied directly, without having to compute anything on the repository’s history.

However, Pijul’s pristine format was designed to comply with axioms on a specific set of operations only. As a result, some of darcs’ features, such as darcs replace, are not (yet) available.