git aware tools: Part 1
jcbellido · December 07, 2024 · [Code] #Rust #rstest #docker #shuttle.rs #git #LFS

In this article series I want to describe how what looked like a common feature request from an assets team:

> we want a button that runs "git push" in two repos, at the same time

spiraled out of proportion, to my great amusement and joy: how I began by failing to make libgit2 deal with LFS in any reasonable way, introduced the rstest library into the team, modified the default profile of nextest and brought docker-based integration tests into our CI toolkit.
This piece is a nice follow-up to Rust Macros and Teams, where I manually created a test fixture. It reads even better when combined with shuttle's (recommended blog, btw) entry on testing.
This text is ordered with the technical notes first; for a breakdown of the request and its context, please check the Context section. You'll notice that the following text describes a very manual way of implementing every step. For example, docker is interfaced through the CLI, to name only one detail. What follows are "field notes on a first implementation". Where applicable I add references to libraries that could simplify the implementation with more resilient and tested code.
Tooling git
Perhaps after reading the request one could be tempted to deploy a script-ish thingy that takes care of the piping with git. Instead, I decided to go ahead and build proper surfaces for this particular request, anticipating future features that could benefit from having a window available. This bet actually paid off, but that's material for another entry.
Back to the request: the idea is to do a `git push` on two branches. But as you probably know, there's a universe of details there. To name just a few:

- A user needs to know: repo, branch, working directory and probably more.
- `git push` in this context is shorthand for:
  - `git add`: only what's relevant for the particular branch.
  - `git commit`: including a comment provided by the user or generated automatically.
  - `git push`: since the usual work of this team lives on a remote too.
- `git lfs` is in the mix.
- By design, this tool should avoid introducing conflicts and make sure that the user has the local branches synchronized to the remote's latest.
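Per repository, the shorthand above expands to roughly this sequence. Here it's sketched against a throwaway local bare repo so it's self-contained; in the tool the remote and branch come from the user's working copy:

```shell
# Sketch: what the single "push" action expands to for one repository.
# A local bare repo stands in for the real remote.
set -e
tmp=$(mktemp -d)
git init --bare -q "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"
git config user.email "alice@example.com"
git config user.name "Alice"
echo "payload" > asset.bin
git add -- asset.bin                  # add: only what's relevant for the branch
git commit -q -m "generated message"  # commit: user-provided or generated comment
git push -q origin HEAD               # push: the team's work lives on the remote
git ls-remote --heads origin          # the branch is now published
```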
And with that target in mind, off I went with my lovely git2 library at hand.
libgit2
One of my first goals was to determine the active branch given the path of a file or directory inside a locally cloned git repository. A reasonable first candidate is git2; alternatively, see their page on docs.rs. This is a big boi library that requires a pinch of reading when used.
Let's see an example of how to get the name of a branch:
or perhaps you're interested in the upstream name?
The examples directory in the repository is a good starting point when approaching this library. They truly helped me get started.
It was time to implement something beefier, let's say, the `commit` part.
git LFS is a killjoy
There I was, happily trotting along with git2 and Rust. Until I started testing the implementation of a `git add` operation through the library. A simple: let's add these 2 files under LFS.

I wasn't able to make it work through `git2`. Or, better said, I was able to `add` + `commit`, but no amount of tweaking made the files register properly inside LFS. I was consistently committing the files into the repo as plain binaries.
In retrospect, I should never have tried. Sometimes it's good to take a look around when the first signs of weird stuff start to appear around you. It seems this issue has been a source of entertainment for many good folks for a while.
If you're curious, the spec of Git LFS is around here. The core idea is that it's extending Git through a clever use of Git Hooks and Git Filters (look for "clean" and "smudge").
On one hand it felt like defeat. On the other, it seems as if every other tool dev facing this issue has decided that it wasn't worth the time and reverted to invoking `git` through a subprocess. And very soon the code started to look like this:
```rust
let _cmd = Command::new("git")
    .arg(/* … */)
    .arg(/* … */)
    .arg(/* … */)
    .headless()
    .output()
    .unwrap();
```
or like this:
```rust
let _cmd = Command::new("git")
    .arg(/* … */)
    .arg(/* … */)
    .arg(/* … */)
    .headless()
    .output()
    .unwrap();
```
Where the builder extension `headless` takes care of a detail on Windows, where we try to avoid opening a new console.
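I won't reproduce the exact extension, but the usual shape is a small trait over `std::process::Command` that sets `CREATE_NO_WINDOW` on Windows; a sketch:

```rust
use std::process::Command;

/// Sketch of a `headless` builder extension: on Windows it sets
/// CREATE_NO_WINDOW so spawning git doesn't flash a console window.
trait Headless {
    fn headless(&mut self) -> &mut Self;
}

impl Headless for Command {
    #[cfg(windows)]
    fn headless(&mut self) -> &mut Command {
        use std::os::windows::process::CommandExt;
        const CREATE_NO_WINDOW: u32 = 0x0800_0000;
        self.creation_flags(CREATE_NO_WINDOW)
    }

    #[cfg(not(windows))]
    fn headless(&mut self) -> &mut Command {
        self // nothing to do outside Windows
    }
}
```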
Your own git server: gogs
As you can imagine from the list of features above, the code started to grow, and very soon testing by hand was an absolute nightmare. Ideally I wanted my own personal git server, to do whatever I wanted with it and then nuke it. Preferably in a repeatable way, and ideally all driven by code. In other words: I wanted to run a git server in docker.
Any quick visit to r/selfhosted will give you many options to choose from. Personally, I'm using gitea at home and it's working flawlessly (for my tiny needs). And, of course, this was my first option. Then I tested a simpler project, gogs, and noticed that the memory footprint of the running containers was significantly lower.
This link will give you more details on how to run gogs on Docker.
Ok, the server was selected and it was possible to start a couple of gogs instances, configure them, do things and then nuke them. That also grew tiresome pretty quickly. So I decided to write a tiny crate to take care of this idea: start a gogs server with a configuration I know and, also, clean any remaining poop after I was done.
Let's go bit by bit.
Configuring gogs server
If you start gogs through docker as described in the manual and navigate to the exposed port on your host, you'll see a configuration page. The server needs the database details to complete booting. If you go ahead, select `sqlite3` and let it finish, you'll notice an interesting new file created from the container: `gogs/conf/app.ini`.
[... continues ...]
A handy initial configuration lives in this gist with app.ini.
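For flavor, a minimal fragment in the spirit of that gist. The key names follow the gogs configuration cheat sheet, and `INSTALL_LOCK` is the one that skips the web installer entirely:

```ini
; Minimal sketch of gogs/conf/app.ini for a throwaway test server.
[database]
DB_TYPE = sqlite3
PATH    = data/gogs.db

[server]
HTTP_PORT = 3000
ROOT_URL  = http://localhost:3000/

[security]
; Skip the web-based installer on first boot.
INSTALL_LOCK = true
```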
Dynamic ports in the host
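The trick I reach for here: bind port 0 on the host so the OS hands out a currently-free port, release it, and pass it to `docker run -p`. There's a tiny race window between releasing and docker binding it, which is acceptable for throwaway test servers. A sketch:

```rust
use std::net::TcpListener;

/// Ask the OS for a currently-free TCP port on the host.
fn free_host_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: OS picks one
    let port = listener.local_addr()?.port();
    drop(listener); // release it so `docker run -p {port}:3000 …` can bind it
    Ok(port)
}
```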
Is it alive?
How to check if the server is ready to accept connections.
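A simple version: poll the mapped port until something accepts a TCP connection, with a deadline. An HTTP GET against a known endpoint would be the stricter variant. A sketch:

```rust
use std::net::TcpStream;
use std::time::{Duration, Instant};

/// Poll until something accepts TCP connections on `addr`, or give up.
fn wait_until_alive(addr: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if TcpStream::connect(addr).is_ok() {
            return true; // the container is up and listening
        }
        std::thread::sleep(Duration::from_millis(200));
    }
    false
}
```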
Create a user
Where a small call to the service is done and a new super user is created.
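One way to script that step, going through the gogs CLI inside the container rather than the HTTP service. The container name and credentials are illustrative, and the flags are from memory; check `gogs admin create-user --help`:

```rust
use std::process::Command;

/// Build (but don't run) the call that creates the first admin user on a
/// fresh gogs container. Name, password and email are illustrative.
fn create_admin_cmd(container: &str) -> Command {
    let mut cmd = Command::new("docker");
    cmd.args([
        "exec", container,
        "/app/gogs/gogs", "admin", "create-user",
        "--name", "tool-admin",
        "--password", "not-a-real-password",
        "--email", "tool-admin@example.com",
        "--admin",
    ]);
    cmd
}
```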
Writing the first test
What if we transform all this `docker` stuff into a fixture?
rstest: a new library makes an entrance
Note: testcontainers arrived late
It's probably good to read an article in its entirety before starting to implement like a madman. SHRUG
Note: bollard never made an entrance
Even though this is just a guess, it's probably way cleaner to talk with the daemon directly instead of using the CLI. Perhaps for the next loop of the tool.
CI: Introducing nextest profiles
All this testing was based on having docker at hand. That isn't a requirement for our local developers, so I wanted an "opt-in" mechanism. Let's pretend we're running on the CI machines.
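With nextest this can be expressed as profiles whose default filter skips the docker-backed tests unless you opt in. The filter syntax is from the nextest book; tagging the tests by name with `docker` is my own convention here:

```toml
# .config/nextest.toml — sketch
[profile.default]
# Locally: skip anything with "docker" in the test name.
default-filter = "not test(docker)"

[profile.ci]
# The CI machines have docker available, run everything.
default-filter = "all()"
```

On the CI machines you'd then run `cargo nextest run --profile ci`.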
rstest vs. nextest: no #[once]
Where some links and references are mentioned. In short: nextest runs every test in its own process, while rstest's `#[once]` fixtures rely on in-process statics, so a `#[once]` fixture won't behave as a shared singleton under nextest.
Context
The request in detail
In late 2024 I received a request from an asset team[^1] in the company. For reasons, they were working with several git repositories simultaneously, by hand. If you have worked in a team where "raw assets" and "cooked assets" are both version controlled in different repos, you might be familiar with the scenario.
Reminder: the assets in question, either raw or cooked, are not diff friendly. They're a mix of binary and text files, can't be merged, and are "big", in the range of KB - MB.
Let's break it down a bit:

| Name | Description |
|---|---|
| source repo | A git + LFS repository containing raw assets. Sometimes originals, sometimes assets using lossless compression. Not used by any build system. Owned by the asset team. |
| target repo | A git repository containing cooked assets. Packaged assets, compressed. Input to build flows. Shared with several other teams. |
| authoring tool | 3rd party product used by the asset team. Used to generate raw assets. |
| wiring tool | Internally developed metadata editor used by the asset team to publish the assets and expose them to the runtime. |
| git client | 3rd party git client. The asset team has agreed on which one.[^2] |
| cooking tool | A mix of scripts, 3rd party and internal tools that transform source assets into target assets. |

And perhaps particularly relevant for our case:

| Name | Description |
|---|---|
| source branch | A git branch in the source repo. In this context we assume this branch has been shared with the team. It's pushed. |
| target branch | A git branch in the target repo. As before, this branch has been shared with the team. It's pushed. |
These asset teams are doing a lot of PR work. They're used to, and expected to, work in branches and then merge.
Using the concepts above we can imagine Alice's work day thus:
1. Alice opens `git client` and fetches both `source repo` and `target repo`.
2. Magically[^3] she knows which `source branch` and `target branch` to use.
3. Using the `authoring tool` she produces the raw assets into `source branch`.
4. Using the `wiring tool` she connects the raw assets, defines the metadata and iterates.
5. Using the `cooking tool` she produces the cooked assets in the `target branch`.
6. Through `git client` pointing at `source repo` she adds, commits and pushes to `source branch`.
7. Again, using `git client` in `target repo` she adds, commits and pushes to `target branch`.
8. Hopefully, profit.
That's the context for the request. At surface level, reasonably simple. Just a bit of surface, minimal UI and day done, right?
Approach: A new module in wiring tool
It seemed as if the request was aiming to reduce the friction at the end of "Alice's work day": let's simplify that jumping back and forth between apps, repositories and branches. What if we simply had a button that "pushed" to both branches at the same time? Perhaps we could have a big fat text box to jot down a message for the commit? Or we could even generate it automatically? And since we have both the modified files and knowledge about the domain, why not show a list of modified files and assets?
This is what I pitched to the team:
1. Alice, using magic, prepares the `source` and `target` branches.
2. She iterates between the `authoring` - `wiring` - `cooking` steps.
3. In the `wiring tool` she opens the `session control` window.
4. She reviews a summary of the work done:
   - Files affected.
   - Entities modified.
   - Warnings or errors in either `source` or `target`.
5. Alice adds a description of her work in a `session control` text input.
6. A submit button in `session control` will (on both `source branch` and `target branch`):
   - Add the relevant files changed.
   - Commit with the provided comment.
   - Push to the repository.
In other words, the idea was to make the `wiring tool` somewhat aware that its intent is to share material with other parts of the production. An extension that introduces some git awareness and details on how the `target repo` expects the materials to look.
Not another git client
I didn't want to write anything close to a git client. We seem to have more than enough: gh desktop, lazygit, gitx, GitUI, fork, etc. There's this old joke about a "new JavaScript framework per week"; we probably get a new git client with it as a bonus. I have nothing against having choices, it's actually quite amazing. It's just that sometimes I miss the old Perforce client, with its silly animated icons, where the team had exactly one option. That makes tool devs' lives significantly easier.
The goal was to display the changes done in a way that's relevant to the particular assets. I wanted to go from:

> File `walk1.anim` is modified.

to

> Game entity `boom-boom-man` has changed the payload of component `animation-loops` (that happens to live in the file `walk1.anim`), which is shared by entities `creepy-boi` and `stompy-dude`.
This way the user, before sharing any changes with the rest of the team, has an overview of the changes in the context of the project.
On git for asset teams
If you're lucky[^4] you're using perforce and you have other issues.
Chances are that your organization has adopted git. With a pinch of luck you haven't heard about git submodules. Assuming you're working only with code, your repo doesn't use git LFS because it doesn't need it.
That's not our case.
The asset team requesting this feature is exposed to both submodules and LFS. And that's a double issue:

- Submodules: are confusing for non-technical peeps. Their usage sometimes requires users to `force` sync. Every git client represents them in a slightly different way.
- LFS: is bolted on top of git; at the end of the day it's a plugin. This makes using any library, for example git2, borderline impossible[^5]. Even simple things, such as a call to `add`, won't properly invoke the `lfs` hooks tooling. It's easy to end up with your binaries committed into the repository outside of `lfs` control.
A note on "Magic"
You've probably noticed the word "magic" in the sections above. In the context of this text it's shorthand for: "this branch matching is done by hand, by humans, using a combination of Slack messages, JIRA-ish systems, emails and over-the-screen conversations". It's also the most important issue that Alice and her asset team are facing. In this production, the branch an asset lives in has meaning. If it has reached `main` it's considered shipped: from that point in time, any shipping of the title that includes it won't be leaking anything. This rule applies to both repositories listed above: `source` and `target`. With this in mind we can revisit the breakdown of tasks done by Alice and notice that we're missing a critical step: deciding when to merge back to `main` on both repositories.

Taking this "magic" into account, the introduction of the `session control` window in the `wiring tool` solves only a small part of the production issue. It's the programmery part of all this. In defense of this line of action, I could argue that probably any implementation of a system that takes care of tracking the lifetimes of git branches will use one or more of the components that went into implementing `session control`.
And hopefully this will be the topic of a future entry in this series.
jcb out, good hunt out there!
[^1]: These guys are not programmers, they are artists. They build assets for a living: chonky binary files that will be included in the final build more or less as they are produced.
[^2]: Depending on how lax your internal policies are, you might discover yourself constantly asking: which git client are you using? And that makes any documentation, automation or issue tracking significantly more challenging.
[^3]: With a bit of luck, in a following entry I might be able to tackle this magic a little.
[^4]: Always understanding that P4 is its own purgatory.
[^5]: If you happen to know of a Rust-friendly library that's able to deal with git + LFS, please, send me a message.