git aware tools: Part 1

jcbellido December 07, 2024 [Code] #Rust #rstest #docker #shuttle.rs #git #LFS

In this article series I want to describe how what looked like a common feature request from an assets team:

we want a button that runs "git push" in two repos, at the same time

spiraled out of proportion, to great amusement and joy. I'll cover how I began by failing to make libgit2 deal with LFS in any reasonable way, introduced the rstest library to the team, modified the default profile of nextest, and brought docker-based integration tests into our CI toolkit.

This piece is a nice follow-up to Rust Macros and Teams, where I manually created a test fixture. It reads even better when combined with shuttle's (recommended blog, btw) entry on testing.

This text is ordered with the technical notes first. For a breakdown of the request and its context, please check the Context section. You'll notice that the following text describes a very manual way of implementing all the steps: docker is interfaced through the CLI, to name only one detail. What follows are "field notes on a first implementation". When applicable I'm adding references to libraries that could simplify the implementation with more resilient and tested code.

Tooling git

Perhaps after reading the request one could be tempted to deploy a script-ish thingy that takes care of the piping with git. Instead, I decided to go ahead and build proper surfaces for this particular request, anticipating future features that could benefit from having a window available. This bet actually paid off, but that's material for another entry.

Back to the request: the idea is to do a git push on two branches. But as you probably know very well, there's a universe of details there. To name just a few:

And with that target in mind, off I went with my lovely git2 library at hand.

libgit2

One of my first goals was to determine the active branch when given the path of a file or directory inside a locally cloned git repository. One reasonable first candidate is git2, see also its page on docs.rs. This is a big boi library that requires a pinch of reading when used.

Let's see an example on how to get the name of a branch:

use std::path::Path;

fn get_repo_shorthand(path: &Path) -> String {
    // Find the enclosing repository, starting the search at `path`.
    let repo = git2::Repository::discover(path).unwrap();
    if repo.is_bare() {
        panic!("Bare git repositories not supported");
    }
    let head = repo.head().unwrap();
    let Some(head_shorthand) = head.shorthand() else {
        panic!("shorthand non UTF-8 or invalid");
    };
    head_shorthand.to_string()
}

or perhaps you're interested in the upstream name?

use git2::Repository;

fn get_upstream_name(repo: &Repository) -> String {
    let head = repo.head().unwrap();

    let Some(head_name) = head.name() else {
        panic!("head name non UTF-8 or invalid");
    };

    // Full reference name of the upstream, e.g. "refs/remotes/origin/main".
    let ups_name = repo.branch_upstream_name(head_name).unwrap();
    let Some(un) = ups_name.as_str() else {
        panic!("upstream name non UTF-8 or invalid");
    };
    un.to_string()
}

The examples directory in the repository is a good starting point when approaching this library. Those samples truly helped me get started.

It was time to implement something beefier, let's say, the commit part.

git LFS is a killjoy

There I was, happily trotting along with git2 and Rust. Until I started testing the implementation of a git add operation through the library. A simple one: let's add these 2 files under LFS.

I wasn't able to make it work through git2. Or, better said, I was able to add + commit, but no amount of tweaking made the files register properly inside LFS. I was consistently committing the files into the repo as plain binaries instead of LFS pointers.

In retrospect, I should've never tried. Sometimes it's good to take a look around when the first signs of weird stuff start appearing around you. It seems this issue has been a source of entertainment for many good folks for a while.

If you're curious, the spec of Git LFS is around here. The core idea is that it extends Git through a clever use of Git Hooks and Git Filters (look for "clean" and "smudge").
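For a rough idea of what that looks like on disk: git lfs track writes entries like this to .gitattributes (the *.anim pattern is just an example):

*.anim filter=lfs diff=lfs merge=lfs -text

while git lfs install registers the filters in your git config, roughly like so:

[filter "lfs"]
    clean = git-lfs clean -- %f
    smudge = git-lfs smudge -- %f
    process = git-lfs filter-process
    required = true

Any tooling that bypasses those filters, like a plain libgit2 add, ends up writing the binary itself into the object database instead of the tiny LFS pointer file.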

On one hand it felt like defeat. On the other, it seems as if every other tool dev facing this issue decided it wasn't worth the time and fell back to invoking the git CLI as a subprocess. And very soon the code started to look like this:

let _cmd = Command::new("git")
    .arg("rev-list")
    .arg("--count")
    .arg("HEAD...@{u}")
    .headless()
    .output()
    .unwrap();

or like this:

let _cmd = Command::new("git")
    .arg("diff")
    .arg("--name-only")
    .arg("--staged")
    .headless()
    .output()
    .unwrap();

Where the builder extension headless takes care of a Windows detail: we try to avoid opening a new console window for the spawned process.
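I'm not reproducing our exact helper here, but a minimal sketch of such an extension trait could look like the following, assuming the usual CREATE_NO_WINDOW flag on Windows:

use std::process::Command;

// Hypothetical extension trait: ask Windows not to pop up a console window
// for the spawned git process; a no-op everywhere else.
trait Headless {
    fn headless(&mut self) -> &mut Self;
}

impl Headless for Command {
    #[cfg(windows)]
    fn headless(&mut self) -> &mut Command {
        use std::os::windows::process::CommandExt;
        const CREATE_NO_WINDOW: u32 = 0x0800_0000;
        self.creation_flags(CREATE_NO_WINDOW)
    }

    #[cfg(not(windows))]
    fn headless(&mut self) -> &mut Command {
        self
    }
}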

Your own git server: gogs

As you can imagine from the list of features above, the code started to grow, and very soon testing by hand was an absolute nightmare. Ideally I wanted my own personal git server: do whatever I wanted with it and then nuke it. Preferably in a repeatable way, and all driven by code. In other words: I wanted to run a git server in docker.

A quick visit to r/selfhosted will give you plenty of options to choose from. Personally, I'm using gitea at home and it's working (for my tiny needs) flawlessly, so of course it was my first option. Then I tested a simpler project, gogs, and noticed that the memory footprint of the running containers was significantly lower.

This link will give you more details on how to run gogs on Docker.

Ok, the server was selected and it was possible to start a couple of gogs instances, configure them, do things and then nuke them. That also grew tiresome pretty quickly. So I decided to write a tiny crate to take care of this idea: start a gogs server with a configuration I know and, also, clean up any remaining poop after I'm done.
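Stripped of all error handling, the heart of that crate is little more than shelling out to the docker CLI. A sketch of the two ends of the lifecycle (the container name is illustrative):

use std::process::Command;

// Start a throwaway gogs container. Publishing port 3000 without a fixed
// host port lets docker pick a free one (more on that below).
fn start_gogs(name: &str) -> std::io::Result<()> {
    let status = Command::new("docker")
        .args(["run", "-d", "--name", name, "-p", "3000", "gogs/gogs"])
        .status()?;
    assert!(status.success(), "docker run failed");
    Ok(())
}

// Nuke the container, and its anonymous volumes, once we're done with it.
fn nuke_gogs(name: &str) -> std::io::Result<()> {
    Command::new("docker")
        .args(["rm", "--force", "--volumes", name])
        .status()
        .map(|_| ())
}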

Let's go bit by bit.

Configuring gogs server

If you start gogs through docker as described in the manual and navigate to the exposed port on your host, you'll see a configuration page. The server needs the database details to complete booting. If you go ahead, select sqlite3 and let it finish, you'll notice an interesting new file created by the container: gogs/conf/app.ini.

[... continues ...]

A handy initial configuration lives in this gist with app.ini.

Dynamic ports in the host
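This is the one moving piece of the docker setup: since the container publishes its web port to a random host port, the helper crate needs to ask docker which port it actually got. A rough sketch of that lookup, assuming the container was started as above:

use std::process::Command;

// Ask docker which host port ended up mapped to the container's port 3000.
// `docker port` prints lines like "0.0.0.0:49154".
fn host_port(name: &str) -> Option<u16> {
    let out = Command::new("docker")
        .args(["port", name, "3000"])
        .output()
        .ok()?;
    let text = String::from_utf8(out.stdout).ok()?;
    text.lines()
        .filter_map(|line| line.rsplit(':').next())
        .find_map(|port| port.trim().parse().ok())
}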

Is it alive?

How to check if the server is ready to accept connections.
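A crude sketch of such a check: keep retrying a TCP connect against the mapped port until it succeeds or we give up. A proper HTTP probe against the gogs web UI would be stricter, but the idea is the same:

use std::net::TcpStream;
use std::thread::sleep;
use std::time::Duration;

// Poll the mapped host port until the server accepts a connection.
fn wait_until_alive(port: u16, attempts: u32) -> bool {
    for _ in 0..attempts {
        if TcpStream::connect(("127.0.0.1", port)).is_ok() {
            return true;
        }
        sleep(Duration::from_millis(250));
    }
    false
}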

Create a user

Where a small call to the service is done and a new super user is created.

Writing the first test

What if we transform all this docker stuff into a fixture?

rstest

a new library makes an entrance
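If you haven't met rstest before, the core mechanic is that a #[fixture] function gets injected into any #[rstest] test that names it as an argument. A minimal, hypothetical sketch of what a gogs fixture could look like (the real one would call the container helpers sketched in the previous sections):

use rstest::{fixture, rstest};

// Hypothetical fixture: every test that asks for `gogs` gets a freshly
// started container and the host port it ended up on.
struct GogsServer {
    port: u16,
}

#[fixture]
fn gogs() -> GogsServer {
    // Start the container, discover the mapped port, wait until it answers,
    // create the super user... then hand the result over to the test.
    GogsServer { port: 3000 } // placeholder value for the sketch
}

#[rstest]
fn server_is_reachable(gogs: GogsServer) {
    assert!(gogs.port > 0);
}

Teardown fits naturally in a Drop impl on the fixture type, so the container gets nuked even when the test fails.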

Note: testcontainers arrived late

It's probably good to read the entirety of an article before starting to implement like a madman. SHRUG

Note: bollard never made an entrance

Even though this is just a guess, it's probably way cleaner to talk to the docker daemon directly instead of going through the CLI. Perhaps for the next loop of the tool.

CI: Introducing nextest profiles

All this testing was based on having docker at hand. That isn't a requirement for our local developers, so I wanted an "opt-in" mechanism. Let's pretend we're running on the CI machines.
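I won't reproduce our exact configuration, but the shape of it is a .config/nextest.toml along these lines (the default-filter option assumes a reasonably recent nextest, and filtering on "docker" in the test name is just an illustration):

[profile.default]
default-filter = 'not test(docker)'

[profile.ci]
default-filter = 'all()'
fail-fast = false

With something like that in place, a plain cargo nextest run skips the docker-backed tests, while cargo nextest run --profile ci opts back into everything and keeps going after a failure.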

rstest VS. nextest: no #once

Where some links and references are mentioned. (The short version: nextest runs every test in its own process, so rstest's #[once] fixtures end up initialized once per process rather than once per run.)


Context

The request in detail

In late 2024 I received a request from an asset team1 in the company. For reasons they were working with several git repositories simultaneously, by hand. If you have worked in a team where "raw assets" and "cooked assets" are both version controlled in different repos, you might be familiar with the scenario.

Reminder: the assets in question, either raw or cooked, are not diff friendly. They're a mix of binary and text files that can't be merged and are "big", in the range of KB to MB.

Let's break it down a bit:

source repo: A git + LFS repository containing raw assets. Sometimes originals, sometimes assets using lossless compression. Not used by any build system. Owned by the asset team.
target repo: A git repository containing cooked assets. Packaged assets, compressed. Input to build flows. Shared with several other teams.
authoring tool: 3rd party product used by the asset team. Used to generate raw assets.
wiring tool: Internally developed metadata editor used by the asset team to publish the assets and expose them to the runtime.
git client: 3rd party git client. The asset team has agreed on which one.2
cooking tool: A mix of scripts, 3rd party and internal tools that transform source assets into target assets.

And perhaps particularly relevant for our case:

source branch: A git branch in the source repo. In this context we assume this branch has been shared with the team. It's pushed.
target branch: A git branch in the target repo. As before, this branch has been shared with the team. It's pushed.

These asset teams are doing a lot of PR work. They're used to working in branches and then merging, and that's what's expected of them.

Using the concepts above we can imagine Alice's work day thus:

  1. Alice opens git client and fetches both source repo and target repo.
  2. Magically3 she knows which source branch and target branch to use.
  3. Using the authoring tool she produces the raw assets into source branch.
  4. Using the wiring tool she connects the raw assets, defines the metadata and iterates.
  5. Using the cooking tool she produces the cooked assets in the target branch.
  6. Through git client pointing at source repo she adds, commits and pushes to source branch.
  7. Again, using git client in target branch, she adds, commits and pushes to target branch.
  8. Hopefully, profit.

That's the context for the request. At surface level, reasonably simple. Just a bit of surface, a minimal UI, and the day is done, right?

Approach: A new module in wiring tool

It seemed as if the request was aiming to reduce the friction at the end of "Alice's work day": let's simplify that jumping back and forth between apps, repositories and branches. What if we simply had a button that "pushed" to both branches at the same time? Perhaps we could have a big fat text box to jot a message for the commit? Or could we even add it automatically? And since we have both the modified files and knowledge about the domain, why not show a list of modified files and assets?

This is what I pitched to the team:

  1. Alice, using magic, prepares the source and target branches.
  2. She iterates between: authoring - wiring - cooking steps.
  3. In the wiring tool she opens the session control window.
  4. She reviews a summary of the work done:
    1. Files affected.
    2. Entities modified.
    3. Warnings or errors in either source or target.
  5. Alice adds a description to her work in a session control text input.
  6. A submit button in session control will (on both source branch and target branch):
    1. Add the relevant files changed.
    2. Commit with the provided comment.
    3. Push to repository.

In other words, the idea was to make the wiring tool somewhat aware that its intent was to share material with other parts of the production: an extension that introduces some git awareness and knows how the target repo expects the materials to look.

Not another git client

I didn't want to write anything close to a git client. We seem to have more than enough: gh desktop, lazygit, gitx, GitUI, fork, etc. There's this old joke about having a "new JavaScript framework per week". We probably get a new git client with it as a bonus. I have nothing against having choices. It's actually quite amazing. It's just that sometimes I miss the old Perforce client, with its silly animated icons, where the team has exactly one option. That makes tool devs' lives significantly easier.

The goal was to display the changes done in a way that was relevant to the particular assets. I wanted to go from:

File walk1.anim is modified.

to

Game entity boom-boom-man has changed the payload of component animation-loops (that happens to live in the file walk1.anim) and it's shared by entities creepy-boi and stompy-dude.

This way the user, before sharing any changes with the rest of the team, has an overview of the changes in the context of the project.

On git for asset teams

If you're lucky4 you're using perforce and you have other issues.

Chances are that your organization has adopted git. With a pinch of luck you haven't heard about git submodules. Assuming you're working only with code, your repo doesn't use git LFS because it doesn't need it.

That's not our case.

The asset team requesting this feature is exposed to both submodules and LFS. And that's a double issue:

  1. Submodules: confusing for non-technical peeps. Their usage sometimes requires users to force a sync. Every git client represents them in a slightly different way.
  2. LFS: bolted on top of git. At the end of the day it's a plugin. This makes the use of any library, for example git2, borderline impossible5. Even simple things, such as a call to add, won't properly invoke the LFS hooks and filters. It's easy to end up with binaries committed to the repository outside of LFS control.

A note on "Magic"

You have probably noticed the word "magic" in the sections above. In the context of this text it's shorthand for: "this branch matching is done by hand, by humans, using a combination of Slack messages, JIRA-ish systems, emails and over-the-screen conversations". It's also the most important issue that Alice and her asset team are facing. In this production, the branch an asset lives in has meaning. If it has reached main it's considered shipped: from that point in time, any shipping of the title that includes it won't be leaking anything. This rule applies to both repositories listed above, source and target. With this in mind we can revisit the breakdown of Alice's tasks and notice that we're missing a critical step: deciding when to merge back to main on both repositories.

Taking this "magic" into account, the introduction of the session control window in the wiring tool solves a small part of the production issue. It's the programmery part of all this. In defense of this line of action I could argue that any implementation of a system that tracks the lifetimes of git branches will probably use one or more of the components that went into implementing session control.

And hopefully this will be the topic of a future entry in this series.

jcb out, good hunt out there!


1

These guys are not programmers, they are artists. They build assets for a living: chonky binary files that will be included in the final build more or less as they are produced.

2

Depending on how lax your internal policies are, you might discover yourself constantly asking: which git client are you using? And that makes any documentation, automation or issue tracking significantly more challenging.

3

With a bit of luck, in a following entry I might be able to tackle this magic a little.

4

Always understanding that P4 is its own purgatory.

5

If you happen to know of a rust-friendly library that's able to deal with git + LFS, please, send me a message.