git aware tools: Part 1
jcbellido · December 07, 2024 · [Code] #Rust #rstest #docker #shuttle.rs #git #LFS

In this article series I want to describe how what looked like a common feature request from an assets team:

> we want a button that runs "git push" in two repos, at the same time

spiraled out of proportion, to my great amusement and joy: how I began by failing to make libgit2 deal with LFS in any reasonable way, introduced the rstest library into the team, modified the default profile of nextest and brought docker-based integration tests into our CI toolkit.
This piece is a nice follow-up to Rust Macros and Teams, where I manually created a test fixture. It reads even better when combined with shuttle's (recommended blog, btw) entry on testing.
This text is ordered with the technical notes first; for a breakdown of the request and its context, please check the Context section. You'll notice that the following text describes a very manual way of implementing every step. For example, docker is interfaced through the CLI, to name only one detail. What follows are "field notes on a first implementation". Where applicable I add references to libraries that could simplify the implementation with more resilient and tested code.
Tooling git
Perhaps after reading the request one could be tempted to deploy a script-ish thingy that takes care of the piping with git. Instead, I decided to go ahead and build proper surfaces for this particular request, anticipating future features that could benefit from having a window available. This bet actually paid off, but that's material for another entry.
Back to the request: the idea is to do a `git push` on two branches. But as you probably know, there's a universe of details there. To name just a few:

- A user needs to know: repo, branch, working directory and probably more.
- `git push` in this context is shorthand for:
  - `git add`: only what's relevant for the particular branch.
  - `git commit`: including a comment provided by the user or generated automatically.
  - `git push`: since the usual work of this team lives on a remote too.
- `git lfs` is in the mix.
- By design, this tool should avoid introducing conflicts and make sure that the user has the local branches synchronized to the remote's latest.
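Per repository, the shorthand above expands to roughly this sequence. Here it's sketched against a throwaway local bare repo so it's self-contained; in the tool the remote and branch come from the user's working copy:

```shell
# Sketch: what the single "push" action expands to for one repository.
# A local bare repo stands in for the real remote.
set -e
tmp=$(mktemp -d)
git init --bare -q "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"
git config user.email "alice@example.com"
git config user.name "Alice"
echo "payload" > asset.bin
git add -- asset.bin                  # add: only what's relevant for the branch
git commit -q -m "generated message"  # commit: user-provided or generated comment
git push -q origin HEAD               # push: the team's work lives on the remote
git ls-remote --heads origin          # the branch is now published
```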
And with that target in mind, off I went with my lovely git2 library at hand.
libgit2
One of my first goals was to determine the active branch given the path of a file or directory inside a locally cloned git repository. A reasonable first candidate is git2; alternatively, see their page on docs.rs. This is a big boi library that requires a pinch of reading when used.
Let's see an example of how to get the name of a branch:
or perhaps you're interested in the upstream name?
The examples directory in the repository is a good starting point when approaching this library. They truly helped me get started.
It was time to implement something beefier, let's say, the `commit` part.
git LFS is a killjoy
There I was, happily trotting along with git2 and Rust. Until I started testing the implementation of a `git add` operation through the library. A simple: let's add these 2 files under LFS.

I wasn't able to make it work through `git2`. Or, better said, I was able to `add` + `commit`, but no amount of tweaking made the files register properly inside LFS. I was consistently committing the files into the repo as plain binaries.
In retrospect, I should never have tried. Sometimes it's good to take a look around when the first signs of weird stuff start to appear around you. It seems this issue has been a source of entertainment for many good folks for a while.
If you're curious, the spec of Git LFS is around here. The core idea is that it's extending Git through a clever use of Git Hooks and Git Filters (look for "clean" and "smudge").
On one hand it felt like defeat. On the other, it seems as if every other tool dev facing this issue has decided that it wasn't worth the time and reverted to invoking `git` through a subprocess. And very soon the code started to look like this:
```rust
let _cmd = Command::new("git")
    .arg(/* … */)
    .arg(/* … */)
    .arg(/* … */)
    .headless()
    .output()
    .unwrap();
```
or like this:
```rust
let _cmd = Command::new("git")
    .arg(/* … */)
    .arg(/* … */)
    .arg(/* … */)
    .headless()
    .output()
    .unwrap();
```
Where the builder extension `headless` takes care of a detail on Windows, where we try to avoid opening a new console.
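I won't reproduce the exact extension, but the usual shape is a small trait over `std::process::Command` that sets `CREATE_NO_WINDOW` on Windows; a sketch:

```rust
use std::process::Command;

/// Sketch of a `headless` builder extension: on Windows it sets
/// CREATE_NO_WINDOW so spawning git doesn't flash a console window.
trait Headless {
    fn headless(&mut self) -> &mut Self;
}

impl Headless for Command {
    #[cfg(windows)]
    fn headless(&mut self) -> &mut Command {
        use std::os::windows::process::CommandExt;
        const CREATE_NO_WINDOW: u32 = 0x0800_0000;
        self.creation_flags(CREATE_NO_WINDOW)
    }

    #[cfg(not(windows))]
    fn headless(&mut self) -> &mut Command {
        self // nothing to do outside Windows
    }
}
```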
Your own git server: gogs
As you can imagine from the list of features above, the code started to grow, and very soon testing by hand was an absolute nightmare. Ideally I wanted my own personal git server, to do whatever I wanted with it and then nuke it. Preferably in a repeatable way, and ideally all driven by code. In other words: I wanted to run a git server in docker.
Any quick visit to r/selfhosted will give you many options to choose from. Personally, I'm using gitea at home and it's working flawlessly (for my tiny needs). And, of course, this was my first option. Then I tested a simpler project, gogs, and noticed that the memory footprint of the running containers was significantly lower.
This link will give you more details on how to run gogs on Docker.
Ok, the server was selected and it was possible to start a couple of gogs instances, configure them, do things and then nuke them. That also grew tiresome pretty quickly. So I decided to write a tiny crate to take care of this idea: start a gogs server with a configuration I know and, also, clean any remaining poop after I was done.
Let's go bit by bit.
Configuring gogs server
If you start gogs through docker as described in the manual and navigate to the exposed port on your host, you'll see a configuration page. The server needs the database details to complete booting. If you go ahead, select `sqlite3` and let it finish, you'll notice an interesting new file created from the container: `gogs/conf/app.ini`.
[... continues ...]
A handy initial configuration lives in this gist with app.ini.
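For flavor, a minimal fragment in the spirit of that gist. The key names follow the gogs configuration cheat sheet, and `INSTALL_LOCK` is the one that skips the web installer entirely:

```ini
; Minimal sketch of gogs/conf/app.ini for a throwaway test server.
[database]
DB_TYPE = sqlite3
PATH    = data/gogs.db

[server]
HTTP_PORT = 3000
ROOT_URL  = http://localhost:3000/

[security]
; Skip the web-based installer on first boot.
INSTALL_LOCK = true
```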
Dynamic ports in the host
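The trick I reach for here: bind port 0 on the host so the OS hands out a currently-free port, release it, and pass it to `docker run -p`. There's a tiny race window between releasing and docker binding it, which is acceptable for throwaway test servers. A sketch:

```rust
use std::net::TcpListener;

/// Ask the OS for a currently-free TCP port on the host.
fn free_host_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: OS picks one
    let port = listener.local_addr()?.port();
    drop(listener); // release it so `docker run -p {port}:3000 …` can bind it
    Ok(port)
}
```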
Is it alive?
How to check if the server is ready to accept connections.
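A simple version: poll the mapped port until something accepts a TCP connection, with a deadline. An HTTP GET against a known endpoint would be the stricter variant. A sketch:

```rust
use std::net::TcpStream;
use std::time::{Duration, Instant};

/// Poll until something accepts TCP connections on `addr`, or give up.
fn wait_until_alive(addr: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if TcpStream::connect(addr).is_ok() {
            return true; // the container is up and listening
        }
        std::thread::sleep(Duration::from_millis(200));
    }
    false
}
```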
Create a user
Where a small call to the service is done and a new super user is created.
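One way to script that step, going through the gogs CLI inside the container rather than the HTTP service. The container name and credentials are illustrative, and the flags are from memory; check `gogs admin create-user --help`:

```rust
use std::process::Command;

/// Build (but don't run) the call that creates the first admin user on a
/// fresh gogs container. Name, password and email are illustrative.
fn create_admin_cmd(container: &str) -> Command {
    let mut cmd = Command::new("docker");
    cmd.args([
        "exec", container,
        "/app/gogs/gogs", "admin", "create-user",
        "--name", "tool-admin",
        "--password", "not-a-real-password",
        "--email", "tool-admin@example.com",
        "--admin",
    ]);
    cmd
}
```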
Writing the first test
What if we transform all this `docker` stuff into a fixture?
rstest: a new library makes an entrance
Note: testcontainers arrived late
It's probably good to read an article in its entirety before starting to implement like a madman. SHRUG
Note: bollard never made an entrance
Even though this is just a guess, it's probably way cleaner to talk with the daemon directly instead of using the CLI. Perhaps for the next loop of the tool.
CI: Introducing nextest profiles
All this testing was based on having docker at hand. That isn't a requirement for our local developers, so I wanted an "opt-in" mechanism. Let's pretend we're running on the CI machines.
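With nextest this can be expressed as profiles whose default filter skips the docker-backed tests unless you opt in. The filter syntax is from the nextest book; tagging the tests by name with `docker` is my own convention here:

```toml
# .config/nextest.toml — sketch
[profile.default]
# Locally: skip anything with "docker" in the test name.
default-filter = "not test(docker)"

[profile.ci]
# The CI machines have docker available, run everything.
default-filter = "all()"
```

On the CI machines you'd then run `cargo nextest run --profile ci`.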
rstest vs. nextest: no #[once]
Where some links and references are mentioned. In short: nextest runs every test in its own process, while rstest's `#[once]` fixtures rely on in-process statics, so a `#[once]` fixture won't behave as a shared singleton under nextest.
Context
The request in detail
In late 2024 I received a request from an asset team[^1] in the company. For reasons, they were working with several git repositories simultaneously, by hand. If you have worked in a team where "raw assets" and "cooked assets" are both version controlled in different repos, you might be familiar with the scenario.
Reminder: the assets in question, either raw or cooked, are not diff friendly. They're a mix of binary and text files, can't be merged, and are "big", in the range of KB - MB.
Let's break it down a bit:

| Name | Description |
|---|---|
| source repo | A git + LFS repository containing raw assets. Sometimes originals, sometimes assets using lossless compression. Not used by any build system. Owned by the asset team. |
| target repo | A git repository containing cooked assets. Packaged assets, compressed. Input to build flows. Shared with several other teams. |
| authoring tool | 3rd party product used by the asset team. Used to generate raw assets. |
| wiring tool | Internally developed metadata editor used by the asset team to publish the assets and expose them to the runtime. |
| git client | 3rd party git client. The asset team has agreed on which one.[^2] |
| cooking tool | A mix of scripts, 3rd party and internal tools that transform source assets into target assets. |

And perhaps particularly relevant for our case:

| Name | Description |
|---|---|
| source branch | A git branch in the source repo. In this context we assume this branch has been shared with the team. It's pushed. |
| target branch | A git branch in the target repo. As before, this branch has been shared with the team. It's pushed. |
These asset teams are doing a lot of PR work. They're used to, and expected to, work in branches and then merge.
Using the concepts above we can imagine Alice's work day thus:
1. Alice opens `git client` and fetches both `source repo` and `target repo`.
2. Magically[^3] she knows which `source branch` and `target branch` to use.
3. Using the `authoring tool` she produces the raw assets into `source branch`.
4. Using the `wiring tool` she connects the raw assets, defines the metadata and iterates.
5. Using the `cooking tool` she produces the cooked assets in the `target branch`.
6. Through `git client` pointing at `source repo` she adds, commits and pushes to `source branch`.
7. Again, using `git client` in `target repo` she adds, commits and pushes to `target branch`.
8. Hopefully, profit.
That's the context for the request. At surface level, reasonably simple. Just a bit of surface, minimal UI and day done, right?
Approach: A new module in wiring tool
It seemed as if the request was aiming to reduce the friction at the end of "Alice's work day": let's simplify that jumping back and forth between apps, repositories and branches. What if we simply had a button that "pushed" to both branches at the same time? Perhaps we could have a big fat text box to jot down a message for the commit? Or we could even generate it automatically? And since we have both the modified files and knowledge about the domain, why not show a list of modified files and assets?
This is what I pitched to the team:
1. Alice, using magic, prepares the `source` and `target` branches.
2. She iterates between the `authoring` - `wiring` - `cooking` steps.
3. In the `wiring tool` she opens the `session control` window.
4. She reviews a summary of the work done:
   - Files affected.
   - Entities modified.
   - Warnings or errors in either `source` or `target`.
5. Alice adds a description of her work in a `session control` text input.
6. A submit button in `session control` will (on both `source branch` and `target branch`):
   - Add the relevant files changed.
   - Commit with the provided comment.
   - Push to the repository.
In other words, the idea was to make the `wiring tool` somewhat aware that its intent is to share material with other parts of the production. An extension that introduces some git awareness and details on how the `target repo` expects the materials to look.
Not another git client
I didn't want to write anything close to a git client. We seem to have more than enough: gh desktop, lazygit, gitx, GitUI, fork, etc. There's this old joke about a "new JavaScript framework per week"; we probably get a new git client with it as a bonus. I have nothing against having choices, it's actually quite amazing. It's just that sometimes I miss the old Perforce client, with its silly animated icons, where the team had exactly one option. That makes tool devs' lives significantly easier.
The goal was to display the changes done in a way that's relevant to the particular assets. I wanted to go from:

> File `walk1.anim` is modified.

to

> Game entity `boom-boom-man` has changed the payload of component `animation-loops` (that happens to live in the file `walk1.anim`), which is shared by entities `creepy-boi` and `stompy-dude`.
This way the user, before sharing any changes with the rest of the team, has an overview of the changes in the context of the project.
On git for asset teams
If you're lucky[^4] you're using perforce and you have other issues.
Chances are that your organization has adopted git. With a pinch of luck you haven't heard about git submodules. Assuming you're working only with code, your repo doesn't use git LFS because it doesn't need it.
That's not our case.
The asset team requesting this feature is exposed to both submodules and LFS. And that's a double issue:

- Submodules: are confusing for non-technical peeps. Their usage sometimes requires users to `force` sync. Every git client represents them in a slightly different way.
- LFS: is bolted on top of git; at the end of the day it's a plugin. This makes using any library, for example git2, borderline impossible[^5]. Even simple things, such as a call to `add`, won't properly invoke the `lfs` hooks tooling. It's easy to end up with your binaries committed into the repository outside of `lfs` control.
A note on "Magic"
You've probably noticed the word "magic" in the sections above. In the context of this text it's shorthand for: "this branch matching is done by hand, by humans, using a combination of Slack messages, JIRA-ish systems, emails and over-the-screen conversations". It's also the most important issue that Alice and her asset team are facing. In this production, the branch an asset lives in has meaning. If it has reached `main` it's considered shipped: from that point in time, any shipping of the title that includes it won't be leaking anything. This rule applies to both repositories listed above: `source` and `target`. With this in mind we can revisit the breakdown of tasks done by Alice and notice that we're missing a critical step: deciding when to merge back to `main` on both repositories.

Taking this "magic" into account, the introduction of the `session control` window in the `wiring tool` solves only a small part of the production issue. It's the programmery part of all this. In defense of this line of action, I could argue that probably any implementation of a system that takes care of tracking the lifetimes of git branches will use one or more of the components that went into implementing `session control`.
And hopefully this will be the topic of a future entry in this series.
jcb out, good hunt out there!
[^1]: These guys are not programmers, they are artists. They build assets for a living: chonky binary files that will be included in the final build more or less as they are produced.
[^2]: Depending on how lax your internal policies are, you might discover yourself constantly asking: which git client are you using? And that makes any documentation, automation or issue tracking significantly more challenging.
[^3]: With a bit of luck, in a following entry I might be able to tackle this magic a little.
[^4]: Always understanding that P4 is its own purgatory.
[^5]: If you happen to know of a Rust-friendly library that's able to deal with git + LFS, please, send me a message.