git grab bag

Time to flush my buffers again before the new year. Happy new year, everyone!

In no particular order, here are some useful git things I've run across recently.

git stash --keep-index

git's index (or staging area) is useful, but it comes with a liability: you risk committing a state that doesn't work (doesn't compile, fails tests, etc.) because you never actually "saw" the stuff that was going to be committed, in isolation, on your disk.

That's what git stash --keep-index is for: it stashes your changes but leaves everything you have staged in place, so your working tree holds only what will get committed. Do your builds/testing/verification, then git stash pop to continue working.
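The workflow can be sketched end to end in a throwaway repository. The file names (staged.txt, unstaged.txt) and the temporary directory are made up for illustration; the point is only that the unstaged edit disappears from disk while you test and comes back on pop.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'v1\n' > staged.txt
printf 'v1\n' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

# One change is staged for the next commit, the other is still in progress.
printf 'v2\n' > staged.txt
printf 'wip\n' > unstaged.txt
git add staged.txt

# Stash, but leave whatever is in the index on disk.
git stash --keep-index > /dev/null
during=$(cat unstaged.txt)    # the unstaged edit is off disk for now

# ... run builds/tests here, against exactly what would be committed ...

# Bring the in-progress work back.
git stash pop > /dev/null
after=$(cat unstaged.txt)

echo "while stashed: $during, after pop: $after"
```

In this simple case the pop applies cleanly. If the same file has both staged and unstaged edits, the pop is a merge and may need the usual conflict resolution.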

git clone --reference

If you're trying to clone a large remote repo, it's a waste of time to re-download objects that are already present in another repository on your local disk or local network. You can use the --reference option to obtain objects from this local repository when possible (any objects not found are pulled as usual from the real source):

git clone --reference LOCAL.git REMOTE.git my_new_clone

(This is sort of like cloning LOCAL.git, changing its remote to point to REMOTE.git, and fetching again, but it is much easier and doesn't pollute your clone with branches and commits that are present only in LOCAL.git and not in REMOTE.git. From a content perspective it is exactly a clone of REMOTE.git.)

Note: if LOCAL.git is on the same filesystem, git sets up the alternates file so that object IDs can be resolved just using the object files in LOCAL.git. This may be undesirable because you won't actually have the objects in your new repo(!) and references in your new repo alone will not(!) be sufficient to keep the objects from being GC'd if LOCAL.git changes. To break this dependency, forcing the required objects to actually be copied into your new clone, run the following from inside the new clone:

git repack -a && rm .git/objects/info/alternates
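To see the alternates dependency and the fix in action without touching the network, here is a sketch that builds local stand-ins for REMOTE.git and LOCAL.git in a temporary directory. The seed repository and all paths are made up for the example. It chains the two commands with && so the alternates file is removed only if the repack succeeded, and adds -d, which deletes any packs the new one makes redundant.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-ins: a tiny repo, "published" as REMOTE.git and
# already mirrored on this machine as LOCAL.git.
git init -q seed
git -C seed config user.email "demo@example.com"
git -C seed config user.name "Demo"
printf 'hello\n' > seed/README
git -C seed add README
git -C seed commit -q -m "initial"
git clone -q --bare seed REMOTE.git
git clone -q --bare seed LOCAL.git

# file:// makes git use its normal fetch path, as it would for a real remote.
git clone -q --reference "$work/LOCAL.git" "file://$work/REMOTE.git" my_new_clone
cd my_new_clone

# The new clone borrows objects from LOCAL.git through this file.
borrowed_from=$(cat .git/objects/info/alternates)
echo "borrowing objects from: $borrowed_from"

# Copy the borrowed objects into our own pack, then cut the link.
git repack -a -d -q && rm .git/objects/info/alternates

# Everything should still resolve using only our own object store.
git fsck > /dev/null
git log --oneline
```

Recent versions of git also provide a --dissociate option for git clone, which is documented to stop borrowing from the reference repository once the clone completes. Check that your git has it before relying on it.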

gfind

To make git print out all the files it is tracking:

git ls-tree -r --name-only HEAD

This is a useful base for searching your codebase, since it automatically lets you search all of your actual code and avoid looking at compiled code, generated code, downloaded libraries and documentation, etc.

I have aliased this to gfind and do things like this to search within a repo:

gfind | xargs grep FOO
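One caveat with piping into plain xargs is that it splits on whitespace, so a tracked file with a space in its name gets mangled. A possible variant, sketched here with a made-up helper name (gsearch) and a throwaway demo repository, asks git for NUL-terminated names with -z and has xargs read them with -0. GNU and BSD xargs support -0, though POSIX does not require it.

```shell
set -eu

# The ls-tree listing, wrapped as a shell function.
gfind() {
    git ls-tree -r --name-only HEAD
}

# Hypothetical helper: grep only tracked files, NUL-delimited so that
# names containing spaces survive the trip through xargs.
gsearch() {
    git ls-tree -r --name-only -z HEAD | xargs -0 grep -l "$@"
}

# Demo repository (all file names are made up).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'int FOO = 1;\n' > 'my notes.c'
printf 'nothing here\n' > other.c
git add 'my notes.c' other.c
git commit -q -m "initial"
printf 'FOO\n' > build.log    # untracked, so it is never searched

# List the tracked files that contain FOO.
matches=$(gsearch FOO)
echo "$matches"
```

Because the helper passes its arguments straight to grep, any grep options (for example -i) can go before the pattern.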

git-filter-branch

git-filter-branch is your general-purpose tool for rewriting repositories wholesale, for example, to extract a single subdirectory while retaining all its history, or to excise a subdirectory, file, secret key, password, etc. that you've put in the repository.
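As a rough illustration of the "excise a file" case, the sketch below builds a throwaway repository containing a made-up password.txt and removes it from every commit, using the --index-filter form that appears in the git-filter-branch documentation. The repository layout and file names are assumptions for the example. FILTER_BRANCH_SQUELCH_WARNING only matters on newer gits, which otherwise print a warning and pause before running.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir lib
printf 'code\n' > lib/a.c
printf 'hunter2\n' > password.txt
git add lib/a.c password.txt
git commit -q -m "add lib (and, oops, a password)"
printf 'more code\n' >> lib/a.c
git commit -q -a -m "update lib"

# Rewrite every commit on every ref, dropping password.txt from each tree.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch password.txt' -- --all > /dev/null

# The rewritten history no longer mentions the file...
remaining=$(git log --oneline -- password.txt)
# ...but the pre-rewrite commits are still reachable from refs/original/.
backups=$(git for-each-ref --format='%(refname)' refs/original/)

echo "commits touching password.txt: ${remaining:-none}"
echo "backup refs: $backups"

# To extract lib/ as the new project root instead, the documented form is:
#   git filter-branch --subdirectory-filter lib -- --all
```

Note that the old commits stay reachable through refs/original/ (and the reflog), so a leaked secret is not really gone from the repository until those are removed and the unreferenced objects are pruned.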

Matthew McCullough has put together a good starting point with examples of how to use this tool. (Unfortunately, some of the links are broken; I will post alternate links once I track down copies.)

git diff --color-words

Word-based diffs in git! Great for LaTeX, Markdown, etc. More info here.
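For a quick feel of what a word-level diff gives you, here is a small sketch on a throwaway repository (the file name and sentence are made up). The git documentation describes --color-words as equivalent to --word-diff=color. The sketch uses the related --word-diff=plain mode, which marks changes with [-removed-] and {+added+} instead of colors, so the result stays readable when pasted as plain text.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'The quick brown fox jumps over the lazy dog.\n' > notes.md
git add notes.md
git commit -q -m "initial"

# Change a single word in a long line.
printf 'The quick red fox jumps over the lazy dog.\n' > notes.md

# Same word-level comparison as --color-words, but with textual markers.
out=$(git diff --no-color --word-diff=plain)
echo "$out"

# In a terminal you would normally just run:
#   git diff --color-words
```

A line-based diff would show the whole sentence as removed and re-added; here only the changed word is marked.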