Using GPT to write Git commit messages without making your teammates hate you
If I had a dollar for every time I thought “this is clean and ready” and then realized the Git commit message made zero sense six months later… I’d have enough to buy yet another GPT-4 add-on token package.
I’ve been using GPT to write commit messages for a while now — originally just dumping `git diff` into ChatGPT and asking for a quick summary. That mostly worked, until the day I blindly pasted in a 4000-line unformatted diff and the model essentially sighed at me. Since then, I’ve tried to streamline the process with some actual Git-based inputs and automation.
Let’s be real: GPT can write grammatically perfect commit messages that *still* don’t tell you what a change actually does. The trick is to feed the model more useful context and trim the noise before asking for message suggestions. And—and this is important—you have to live-test your prompts with *several* types of diffs. More than once, I got excellent summaries for simple file renames but completely garbage commits for dependency bumps.
Here’s what that journey looked like: half functional Bash scripts, multiple hallucinating GPTs, and a few PR reviews that hurt my feelings 🙂
## 1. Prompting GPT with diffs often returns word salad
So the naive approach I started with looks like this:
```bash
git diff HEAD^ HEAD > changes.diff
cat changes.diff | pbcopy
```
And then I’d paste that into ChatGPT with something like:
> “Write a Git commit message based on these code changes.”
What I forgot to consider was how hard GPT struggles when I don’t give *any* sense of purpose or structure. Instead of something useful like:
> Fix broken redirect on login page
I’d get this gem:
> Updated various JavaScript and JSON files to change values and improve application response behavior for better performance and user experience.
¯\_(ツ)_/¯
Cool, thanks. Which files? What user experience? Did anything break? Nobody knows.
What worked better was including something like this before the diff:
> “This change is part of the login auth refactor. Focus on describing user-facing login behavior and security changes. Use conventional commit format.”
That actually got me 70% closer. But even then, I had to watch out for GPT inserting imaginary file names or describing logic that wasn’t in the diff at all.
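Wiring that context line into the clipboard pipeline is a two-minute shell job. Here's a minimal sketch (the helper name `prepend_context` and the example context sentence are mine, not part of any tool):

```shell
# prepend_context: put a one-line context hint above whatever diff
# arrives on stdin, so the model sees purpose before patch.
prepend_context() {
  printf '%s\n\n' "$1"   # context line, then a blank separator
  cat                    # then the diff itself, unchanged
}

# usage (macOS clipboard):
# git diff HEAD^ HEAD \
#   | prepend_context "Part of the login auth refactor. Use conventional commit format." \
#   | pbcopy
```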
## 2. Use git diff with flags or prep scripts to avoid noise
One massive friction point: just running `git diff` will include huge chunks of noise, like version bumps in `package-lock.json` or a total re-indent of an HTML file. GPT eats up the entire context window trying to parse this, and then its summary will lean heavily on irrelevant garbage.
Some quick fixes:
- Use `git diff --cached` if you’ve staged your changes selectively
- Exclude lockfiles and large autogenerated files with a `.gptignore` list (I literally keep a shell alias for this)
- Trim unchanged context lines with a smaller `--unified` value (the default is 3 lines of context; `--unified=1` keeps things tight), or lean on smarter syntax awareness (e.g. `diff-so-fancy`)
One prompt that gives me semi-reliable results:
> “Summarize the functional changes in this code diff. Ignore formatting and whitespace-only lines. Follow the format: type(scope): action.”
I capped this off with a fuzzy post-filter that drops GPT suggestions that are longer than 120 characters and rewrites anything that starts with “updated” or “modified” because… yeah. Not helpful.
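That post-filter is nothing fancy. As a shell sketch (the 120-character cap and the banned prefixes are my own taste, not a standard):

```shell
# filter_suggestion MSG: return 0 if a GPT suggestion passes,
# 1 if it should be dropped or rewritten.
filter_suggestion() {
  msg="$1"
  [ "${#msg}" -le 120 ] || return 1            # too long for a subject line
  lower=$(printf '%s' "$msg" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    updated*|modified*) return 1 ;;            # tells you nothing
  esac
  return 0
}
```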
## 3. Writing a CLI wrapper around GPT makes this usable
Eventually, I duct-taped together a little Node.js tool (honestly, mostly copied from `openai-node`) that:
- Detects staged changes
- Builds a diff excluding noisy files
- Appends a context message like “This is part of feature X…”
- Sends it all into the GPT API via a temp file
- Outputs only the first line of the response into `git commit -m`
And it *works*, like half the time.
The remaining 50% either:
- Time out if the diff is too large
- Return maybe-too-clever messages like “strip rust from old migrations” (??? I was deleting old schema files)
- Hallucinate file names and functions
But what’s wild is how quickly you start trusting these outputs until someone on the team asks: “What does `chore(login): reframe legacy state` even mean??”
I had no idea. I had to re-read the diff to figure out what GPT was talking about. Turns out it noticed a renamed variable from `legacyMode = true` to `mode = 'fallback'`, which I guess is a vibe shift?
Here’s how I caught problems earlier:
- Automatically print out the full GPT response before committing
- Offer a `--dry-run` flag that just logs GPT vs human message side-by-side
- Hard fail any messages over 100 characters
- Show file count and biggest hunk sizes before sending to GPT
Once I did that, I could quickly spot when GPT glossed over critical features or went off the rails. Doesn’t fix everything, but it avoids embarrassing PR titles like “Fix things”.
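The file-count and hunk-size preview is just an awk pass over the diff. A sketch, with an output format of my own invention:

```shell
# diff_stats: read a unified diff on stdin, print the file count and
# the largest hunk (in changed lines). Rough numbers, but good enough
# as a pre-flight check before shipping the diff to GPT.
diff_stats() {
  awk '
    /^diff --git/        { files++; next }
    /^\+\+\+/ || /^---/  { next }                 # file headers, not changes
    /^@@/                { if (hunk > max) max = hunk; hunk = 0; next }
    /^[+-]/              { hunk++ }
    END { if (hunk > max) max = hunk
          printf "files=%d max_hunk=%d\n", files, max }
  '
}
```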
## 4. Real prompt examples that actually work decently
I saved a few prompt templates in my scratchpad.txt (yes, that’s its name) that consistently give decent commit messages across at least 80% of diffs:
### For new features:
> “You are a senior backend engineer. Write a concise Git commit message describing the key impact of these changes. Focus on what new behavior is enabled, and avoid discussing structure or formatting. Format as `feat(area): useful message`”
### For tests or infra:
> “Summarize the following changes to CI and test logic. Skip file names in the summary. Use the `test(scope): message` format.”
### For bugs:
> “Assume this diff fixes a user-facing problem. Summarize only the fix behavior, not the cause or file details. Format: `fix(scope): brief change`”
But sometimes those still fall apart, e.g. if the commit mixes concerns or touches both frontend and backend pieces. GPT just picks whichever caught its eye first, not necessarily the most important one.
Also, a weird one: if I left a TODO or unfinished change in the diff (`// TODO - add retry logic later`), the GPT commit message would sometimes include “adds retry logic for stability” even though no such thing was actually implemented. Huge trust-destroyer 🙁
## 5. When it’s worse than nothing and how to fix it
Here’s the thing: GPT-generated commit messages *can* be better than the lazy stuff I used to write. But only if I treat it more like a fussy assistant than a Git-integrated brain.
The worst commit message I let slip in was generated off a 600-line refactor:
> “Refactor application”
Yeah, no. I could feel my past self getting mad.
So now I’ve got a checklist I try to run through before committing with GPT help:
- Is the diff under ~1000 lines? (Otherwise chunk it manually)
- Does it touch more than one type of file? (Don’t trust GPT to choose which to describe)
- Did I include a clear scope in the context message? (Front-end? Auth? Logging?)
- Are there any lingering comments, TODOs, or commented-out code that could mislead GPT?
- Did it hallucinate new features or imagined bug fixes?
When all else fails, I fall back to:
```bash
git diff --name-only
```
And then just read those filenames out loud to myself until something human clicks. No shame in writing your own commit message when the robot starts to improv jazz on your code changes 😛