Chapter 3. Making Commits

This chapter explains how to make changes to your repository content: add, edit, and remove files; manipulate the index; and commit changes.

Changing the Index

When you run git commit, without arguments or options, Git adds the contents of the index as the latest commit on the current branch. So before committing, you add to the index those changes you want to commit. This can skip some changes you’ve made to your working files, if you’re not ready to commit those yet.

git commit filename

Giving a specific filename to git commit works differently: it ignores the index, and commits just the changes to that file.
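The difference between the two forms of git commit can be seen in a throwaway repository. The following is a minimal sketch; the file names a.txt and b.txt, the commit messages, and the identity settings are hypothetical and chosen only for illustration.

```shell
# Work in a scratch repository so nothing real is touched.
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo one > a.txt; echo one > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

echo two >> a.txt
echo two >> b.txt
git add a.txt                              # only a.txt is staged

# Naming b.txt commits that file's changes and leaves the staged a.txt alone.
git commit -q -m "Update b.txt only" b.txt

committed=$(git diff --name-only HEAD~1 HEAD)
still_staged=$(git diff --cached --name-only)
echo "committed:    $committed"
echo "still staged: $still_staged"
```

The change to a.txt stays in the index, ready for a later commit.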

Adding a New File

$ git add filename

This is suitably mnemonic, but note the next command.

Adding the Changes to an Existing File

$ git add filename

Yes, this is the same command. In both cases, Git adds the current working file contents to the object database as a new blob-type object (assuming it’s not already there), and notes the change in the index. If the file is new, then this will be a new index entry; if not, just an updated one pointing to the new object (or with changed attributes, such as permissions)—but it’s essentially the same operation to Git. A file is “new” if its pathname is not in the index, usually meaning it was not part of the last commit; this is what causes git status to note a file as “untracked” prior to your adding it (files in the index are called “tracked,” and they are the ones Git cares about, generally speaking).

The filename can be a directory, in which case Git adds all new files and all changes to tracked files under that directory.
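As a rough illustration of the same command handling both cases, the sketch below stages one edited tracked file and one untracked file, and compares git status before and after. The file names are hypothetical, and --porcelain is used here only to get a compact, stable status format.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo v1 > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo v2 >> tracked.txt      # edit a file that is already in the index
echo hello > fresh.txt      # create a file Git has never seen

before=$(git status --porcelain)
git add tracked.txt fresh.txt
after=$(git status --porcelain)

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

Before the add, tracked.txt is marked as modified but unstaged and fresh.txt as untracked (??); afterward both appear in the first status column, as a staged modification and a staged addition respectively.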

$ git add -p

You can also add only some of the changes you’ve made to a file, using git add --patch (-p). This starts an interactive loop in which you can select portions of the changes you’ve made and skip others. When you’re done, Git adds to the index versions of the relevant files with only those changes applied to them. git status reports this situation by listing the same file under both “changes not staged for commit” and “changes to be committed,” since the file in fact has a mix of both.

This is a very important feature, since it helps you to make well-factored commits. When you’re done with some editing and ready to commit, you may realize that you’ve made changes that ought to be represented by more than one commit; perhaps you’ve fixed two bugs in the same file, or tidied up some unrelated comments while you were at it. git add -p allows you to conveniently split the work up into separate commits.

The interactive loop has a number of options with integrated help (use “?”), but note particularly the s command to split a hunk into smaller changes (if Git’s initial analysis glues together pieces you want separated), and the e command, which allows you to edit hunks yourself. If you set the interactive.singlekey Git configuration variable, you can use single keystrokes for these commands and skip typing return after each.

Just running git add -p with no arguments will let you examine all files with unstaged changes (unlike just git add, which requires an argument or option to tell it what to add). You can also specify particular files to consider as arguments.

git add -p is actually a special case of git add --interactive (-i). The latter starts at a higher level, allowing you to view status, add untracked files, revert to the HEAD version, select files to patch, etc.; git add -p just jumps straight to the “patch” subcommand of git add -i.

Shortcuts

git add -u
    Include all files in the current index; this includes changed and deleted files, but not new ones.

git add -A
    Include all filenames in the index and in the working tree; this stages file additions as well. This is useful if you are importing a new version of code from another source not in Git, traditionally called a “vendor branch.” You would replace your working tree with the unpacked new code, then use git add -A to stage all changes, additions, and deletions necessary to commit the new version.

Removing a File

$ git rm filename

This does two things:

1. Deletes the file’s entry from the index, scheduling it for removal in the next commit
2. Deletes the file from disk as well, as with rm filename

If you happen to delete the file yourself first, that’s no problem; Git won’t care. Removing it from the index is what matters; deleting the working copy afterward is just being tidy. In both cases, git status will show the file as deleted; the difference will be whether it is listed under “changes not staged for commit” (if you just deleted the working file), or “changes to be committed” (if you used git rm).

git rm on a file not yet under version control won’t work, though; just use rm.
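The two kinds of deletion described above can be compared side by side. This is a small sketch with hypothetical file names; the first status column reflects the index and the second reflects the working tree.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo a > a.txt; echo b > b.txt
git add a.txt b.txt
git commit -q -m "Add two files"

rm a.txt             # removes only the working file
git rm -q b.txt      # removes the working file and the index entry

state=$(git status --porcelain)
echo "$state"
```

Here a.txt shows up as deleted but not staged, while b.txt shows up as a deletion that is ready to be committed.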

Renaming a File

Renaming a file or moving a directory in Git is simple using the git mv command:

$ git mv foo bar

This is actually just a shortcut for renaming the working file outside Git, then staging the removal of the old name and the addition of the new one:

$ mv foo bar
$ git rm foo
$ git add bar

Renaming is a thorny topic in version control generally. Renaming a file is in a sense equivalent to deleting that file and creating a new one with a different name and the same contents, but that might also occur without your meaning to rename anything, if the new file just happens to coincide with the old one. The distinction is one of intent, and so must be represented separately by the system if it is to be captured at all. And it can be quite important to do so, because people generally want the history of a renamed file to be preserved; by even calling what we’ve done “renaming,” we are implicitly saying that this is really “the same file, just with a different name.” We don’t want to lose the history just because we changed the name.

Which raises the question: just what is a “file,” anyway? Is it just the content? No, because we track changes to content to the same file over time. Is it just the name? No, because sometimes we want to “rename” the file, which considers the content to be primary and the name secondary. The truth is that there is no single answer to this question, since it depends on the user’s wishes in a particular situation, and so it is hard to design a single system to accommodate it, and systems vary in how they do so.

CVS does not handle renaming at all. Subversion has explicit renaming: it represents a rename operation separately from a delete/create pair. This has some advantages, but also engenders considerable complexity in the system to support it. Git’s approach is to not track renaming explicitly, but rather to infer it from combinations of name and content changes; content-based addressing makes this particularly easy and attractive as a matter of implementation.

Git doesn’t have a “rename” function internally at all; as indicated, git mv is just a shortcut. If you run git status after the mv command above, you’ll see what you’d expect: Git shows foo as deleted, and the new file bar as untracked. If you do it after the git rm and git add, though, you see just one annotation: renamed: foo -> bar. Git sees that the file for a particular index entry has been removed from disk, while a new entry has appeared with a different filename but the same object ID, and hence the same contents. It can also consider renaming relative to a less strict notion of file equivalence; that is, if a new file is sufficiently similar to one that’s been deleted rather than 100% identical (see the options for renaming and copy detection in Chapter 9).

This approach is very simple, but it requires that you sometimes be aware of the mechanics. For example: because this analysis is expensive, older versions of Git turn it off by default when examining history with git log, and you have to remember to enable it with -M if you want to see renaming (since Git 2.9, the diff.renames setting enables it by default). Also, if you edit a file substantially and rename it in a single commit, it may not show up as a rename at all; you’re better off editing, committing, then doing the rename in a separate commit to make sure it shows up as such.

Unstaging Changes

If you want to start over with this process, it’s easy: just use git reset. This resets the index to match the current commit, undoing any changes you’ve made with git add. git reset reports the files with outstanding changes after its action:

$ git reset
Unstaged changes after reset:
M       old-and-busted.c
M       new-hotness.hs

You can also give specific files or directories to reset, leaving staged changes in other files alone. With git reset --patch you can be even more specific, interactively selecting portions of your staged changes to unstage; it is the reverse of git add -p. See “Discarding Any Number of Commits” for other options.

Making a Commit

When you’ve prepared the index you want, use git commit to store it as a new commit. Use git status first to check the files involved, and git diff --cached to check the actual changes you’re applying. git diff alone shows any remaining unstaged changes (the difference between your working tree and the index); adding --cached (or the synonym --staged) shows the difference between the index and the last commit instead (i.e., the changes you’re about to make with this commit).
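A quick way to see the two comparisons is to stage one change and leave another unstaged. The sketch below uses hypothetical file names and --name-only to keep the output short.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo base > staged.txt; echo base > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Add two files"

echo change >> staged.txt
git add staged.txt                          # this change goes into the index
echo change >> unstaged.txt                 # this one stays in the working tree

to_commit=$(git diff --cached --name-only)  # index vs. last commit
same_thing=$(git diff --staged --name-only) # --staged is a synonym
left_over=$(git diff --name-only)           # working tree vs. index

echo "about to commit: $to_commit"
echo "not yet staged:  $left_over"
```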

Commit Messages

Each commit has an associated “commit message”: some free-form text used to describe the changes introduced by that commit. You can give the message on the command line as:

$ git commit -m "an interesting commit message"

If you don’t, Git will start a text editor to allow you to enter your message; “Text Editor” describes how the editor is chosen. Although the text is free-form, the usual practice is to make the first line no longer than 50–60 characters or so. If you need further lines, then separate them from the first one with a blank line, and wrap the remaining paragraphs to 72 characters. The first line should serve as a subject line for the commit, as with an email. The intention is to allow listings that include the commit message to usefully abbreviate the message with its first line, still leaving space for some other information on the line (e.g., git log --oneline).

It’s actually rather important to follow this convention, since lots of Git-related software as well as various parts of Git itself assume it. The subject line of a commit is addressable as a separate entity when writing commit formats and extracting commit information, and programs that display commits in various contexts assume that the subject will make sense on its own and not be too long. GitHub and gitweb both do this visually, for example, displaying the subject as a separate item in bold at the top, with the rest of the message (the “body”), if any, set in smaller text below. You’ll get odd-looking results that are difficult to read if the first line is just a sentence fragment and/or too long to fit in the allotted space.
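To see the subject and body treated as separate entities, the following sketch writes a two-part message and reads each part back with the %s and %b format placeholders. The file name and message text are hypothetical; giving -m twice is simply one convenient way to produce a subject followed by a blank line and a body.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "int main(void) { return 0; }" > pager.c
git add pager.c

# Each -m becomes its own paragraph: a subject, then a body.
git commit -q -m "Fix off-by-one error in pager" \
              -m "The loop visited one line past the end of the buffer."

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)

echo "subject: $subject"
echo "body:    $body"
git log --oneline                           # abbreviated ID plus the subject only
```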

Following this convention can also help you make better commits: if you find it difficult to summarize the changes, consider whether they might better be split into separate commits. Which brings up the topic of the next section.

What Makes a Good Commit?

This depends on how you intend to use your repository and Git in general; there’s no single right answer to this question. Some people use the convention (if the content is software) that every commit must be buildable, which means that commits will generally be larger since they must contain everything required to advance the code from one coherent stage to another. Another approach is to structure your commits primarily to take advantage of Git’s ability to transmit and reuse them. When preparing a commit, ask yourself: does it contain entirely and only the changes necessary to do what the commit message says it does? If the commit says it implements a feature, does someone using git cherry-pick to try out the feature have a decent chance of that succeeding, or does the commit also contain unrelated changes that will complicate things? Think also about later using git revert to undo a change, or about merging this branch into other branches to incorporate the new feature. In this style, each commit might not produce functional software, since it could make sense to represent a large overall change as a series of commits in order to better reuse its parts. You can use other methods to indicate larger project checkpoints like buildable intermediate versions, including Git tags or unique strings in commit messages which you can find using git log --grep.
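As one possible way to mark checkpoints with a unique string, the sketch below puts a marker in the body of a commit message and finds it again with git log --grep. The marker text CHECKPOINT, the file name, and the messages are all hypothetical conventions invented for this example.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo 1 > parser.txt
git add parser.txt
git commit -q -m "Add parser skeleton"

echo 2 >> parser.txt
git add parser.txt
git commit -q -m "Wire parser into build" -m "CHECKPOINT: builds cleanly"

echo 3 >> parser.txt
git add parser.txt
git commit -q -m "Start tokenizer rewrite"

# List the subjects of commits whose message contains the marker.
found=$(git log --grep='CHECKPOINT' --format=%s)
echo "$found"
```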

Be careful too with the timing of your commits, as well as with their content. If you are going to make wide-ranging, disruptive changes such as adjusting whitespace, renaming functions or variables, or changing indentation, you should do that at a time when others can conveniently take your changes as given, since automatic merge is likely to fail miserably in such cases. Doing these things while others are doing lots of work on related branches—say when a big merge is coming up—will make that merge a nightmare.

There are other issues about which version control users in general can argue endlessly: for example, how should commits be phrased grammatically? Some like the imperative mood (“fix a bug”), while others favor the past tense (“fixed a bug”). It is common in the Git source code itself to refer to adding a feature as “teaching Git” to do something. Obviously there is no strict guideline to be had here, though consistency at least makes it easier to search for specific changes.

Shortcuts

git commit -a adds all tracked, modified files to the index before committing. This commits changed and deleted files, but not new ones; it is equivalent to git add -u followed by git commit. Be careful, though; if you get too accustomed to using this command, you may accidentally commit some changes you didn’t intend to—though that’s easy to undo; see the next chapter.
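The following sketch shows what git commit -a picks up and what it leaves behind. File names are hypothetical.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo a > changed.txt; echo b > doomed.txt
git add changed.txt doomed.txt
git commit -q -m "Add two files"

echo more >> changed.txt     # modify a tracked file
rm doomed.txt                # delete a tracked file
echo new > brand-new.txt     # create an untracked file

git commit -q -a -m "Update changed.txt, remove doomed.txt"

in_commit=$(git diff --name-status HEAD~1 HEAD)
left_behind=$(git status --porcelain)

echo "$in_commit"
echo "$left_behind"
```

The commit records the modification and the deletion, while brand-new.txt remains untracked until you add it explicitly.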

Empty Directories

Git does not track directories as separate entities; rather, it creates directories in the working tree as needed to create the paths to files it checks out, and removes directories if there are no longer any files in them. This implies that you can’t represent an empty directory directly to Git; you have to put at least one placeholder file within the directory to get Git to create it.
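A short sketch of the placeholder approach follows. The directory name logs and the placeholder name .gitkeep are assumptions: .gitkeep is a common convention with no special meaning to Git, and any file would serve.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

mkdir logs
empty_dir=$(git status --porcelain)         # empty: Git sees nothing to track

touch logs/.gitkeep                         # placeholder; the name is only a convention
with_placeholder=$(git status --porcelain)  # "?? logs/"

git add logs
git commit -q -m "Keep logs directory"
tracked=$(git ls-files)                     # "logs/.gitkeep"

echo "$with_placeholder"; echo "$tracked"
```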

A Commit Workflow

Here’s a procedure for making multiple commits from a single set of edits to your working files, while making sure each commit is good:

1. Use git add (with various options) to stage a subset of your changes.
2. Run git stash --keep-index. This saves and undoes your outstanding, unstaged changes while leaving your staged changes in the index alone, resetting your working tree to match the index.
3. Examine this working tree state to make sure your selection of changes makes sense; build and test your software, for example.
4. Run git commit.
5. Now, use git stash pop to restore your remaining unstaged changes, and go back to step 1. Continue this process until you’ve committed all your changes, as confirmed by git status reporting “nothing to commit, working directory clean.”
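One pass through the steps above can be sketched as follows. The two files and their edits are hypothetical, and the build-and-test step is reduced to inspecting git status; in real use you would run your own checks there.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo base > a.txt; echo base > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

echo "fix A" >> a.txt        # two unrelated edits made in one sitting
echo "fix B" >> b.txt

git add a.txt                               # step 1: stage a subset
git stash --keep-index >/dev/null           # step 2: set the rest aside
during=$(git status --porcelain)            # step 3: only the staged change is present
git commit -q -m "Fix A"                    # step 4
git stash pop >/dev/null                    # step 5: bring back the remaining edits
remaining=$(git status --porcelain)

echo "while stashed: $during"
echo "after pop:     $remaining"
```

At the point marked step 3 the working tree contains only the change to a.txt, so whatever you build or test there is exactly what the commit will contain. After the pop, the edit to b.txt is back and ready for the next round.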

See “git stash” for more on the useful git stash command.