A Concise Guide to Git
© David Matuszek, 2014

Purpose

Git is normally used along with a cloud service such as GitHub or Bitbucket, which are free for educational and open source projects. Git can be used by a single individual to maintain multiple versions of a project, both "snapshots" taken during development, and differing final versions for different purposes. More importantly, Git provides superb support for groups working on the same project. The cloud service serves as backup.

Organization

As a programmer, you do your work in a workspace, which is a local directory on your own computer. In your workspace you will have a hidden directory (named .git) known as the local repository. The local repository keeps track of the important files in your workspace; since you are likely to have a number of additional files (scratch files, output of runs, etc.), an index in the local repository specifies which files are important. Finally, the remote repository backs up your local repository.

All repositories are equal. Any repository can serve as a remote repository to any computer that has access to it.

Command line vs. GUIs

I'm a person who almost always prefers working in a GUI to working on the command line. That said, I consider Git to be an exception. You need to know the Git commands in any case, and GUIs can leave you uncertain as to which commands are being executed. When you are first learning Git, it's easier to use the command line; later, you may wish to switch to using some GUI.

Git assumes a Unix-based operating system. If your operating system is macOS or Linux, just open the Terminal application. For Windows, Git comes with a (partial) Unix emulator, Git Bash. Either way, you need to be comfortable with the most common Unix commands (see, for example, this page).

Setting up

First, download and install Git on your computer. Second, get an account on GitHub (or somewhere similar). Finally, identify yourself to Git. This last step can be done globally, or on a per-repository basis. To do it globally, issue the following commands:

git config --global user.name "Your name"
git config --global user.email "Your email address"

To do this for a particular repository, cd to the local repository and issue the above commands, omitting the --global flag.
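
As a minimal sketch of the per-repository form, the commands below set an identity in a throwaway repository and read it back. The temporary directory and the "Demo User" name and address are placeholders I made up for illustration; substitute your own.

```shell
# Scratch repository in a temporary directory (placeholder, safe to delete later).
demo=$(mktemp -d)
cd "$demo"
git init -q

# Without --global, the values are stored in this repository's .git/config.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Read the values back. A per-repository value takes precedence over a global one.
local_name=$(git config --get user.name)
local_email=$(git config --get user.email)
echo "$local_name <$local_email>"
```

Commits made in this repository would carry that identity, whatever the global setting says.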

Optionally, you can (maybe) configure git to use a particular editor, and to use syntax coloring. These commands are system-dependent and don't always work.

git config --global core.editor "Path to editor"
git config --global color.ui auto

Starting a project

Starting with a remote repository

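First, cd to the directory that should contain your new workspace. You do not need git init here: git clone creates a new subdirectory, named after the repository, with the local repository (.git) already inside it. Running git init first would leave you with one repository nested inside another.

Next, copy the existing remote repository into that new workspace: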

git clone Path_to_the_remote_repository   # Notice path is not quoted

Now you should have your own complete copy of the code and other files, to do with as you will.
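
Here is a small sketch of cloning that you can try without a GitHub account. It assumes an ordinary local repository named upstream as a stand-in for the remote (any repository can play that role); the directory names, file name, and identity are all made up for the example.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the remote: an ordinary repository with one commit.
git init -q upstream
cd upstream
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "Add README.txt"
cd ..

# Clone it. The directory "project" and its .git are created by the clone itself.
git clone -q upstream project
cd project

# The clone remembers where it came from under the name "origin".
git remote -v
ls
```

The clone contains README.txt and its full history, and origin points back at the stand-in remote.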

Starting with a local project

This is more complicated. Follow these steps:

  1. Create a new repository on GitHub or Bitbucket. Don't initialize the new repository with a README file; the repository needs to be empty. Notice the URL of this new repository.
  2. cd your_project_directory
  3. git init         # creates the local repository
  4. git add .        # tells Git which files to track (. means all)
  5. git commit -m "First commit"    # Tells Git to save the current
                                 # version of all tracked files
  6. git remote add origin URL_of_remote_repository
                                 # Connects the two repositories
  7. git remote -v                # verifies the remote URL
  8. git push origin master       # uploads your files (use main instead
                                 # if that is your branch's name)
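
The steps above can be rehearsed locally. In this sketch an empty bare repository on disk stands in for the new, empty repository on GitHub or Bitbucket; that substitution, the file hello.py, and the demo identity are my assumptions, not part of the original recipe. The branch name is looked up rather than assumed to be master.

```shell
work=$(mktemp -d)
cd "$work"

# Step 1 stand-in: an empty bare repository plays the hosted repository.
git init -q --bare remote.git

# Steps 2-5: turn an existing project directory into a repository.
mkdir your_project_directory
cd your_project_directory
echo 'print("hello")' > hello.py
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git add .
git commit -q -m "First commit"

# Steps 6-7: connect the two repositories and verify the URL.
git remote add origin "$work/remote.git"
git remote -v

# Step 8: push the current branch, whatever it is called.
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Check: the remote branch now points at the same commit as the local one.
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
```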

Working alone

When you are the only person working on a project, and you use only a single computer, you can assume that the code in your local repository is the most up-to-date version. In this case, the main value of using Git (or any other version control system) is to keep a record of all the changes that have been made in the code. When something goes wrong, it's easy to find out when and where the problem occurred, and to back out of the erroneous changes.
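
To make "find out when and where" and "back out" concrete, here is a minimal sketch using git log and git revert in a scratch repository. The file calc.txt, its contents, and the identity are invented for the example; revert is only one of several ways to undo a change.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "total = 1 + 1" > calc.txt
git add calc.txt
git commit -q -m "Add calc.txt with the correct sum"

echo "total = 1 + 2" > calc.txt
git add calc.txt
git commit -q -m "Change the sum (this turns out to be a mistake)"

# When and where: one line per commit, newest first.
git log --oneline

# Back out: revert adds a new commit that undoes the most recent one.
# Nothing is erased; the mistake and its reversal both stay in the history.
git revert --no-edit HEAD

restored=$(cat calc.txt)
commit_count=$(git rev-list --count HEAD)
echo "$restored ($commit_count commits)"
```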

The (strongly) recommended way of working with Git is this: Fix one small bug, or add one small feature, to your project, then commit what you have just done. Every commit must include a comment that tells what you have done. The comment should be as specific as you can make it, and ideally should be only a single sentence.

An example of a "bad" (basically useless) comment would be: Fixed some bugs. A much better comment would be: Fixed crash when required foo.dat file is missing. Each comment should help you find your way back to exactly where you did something, because you may need that later.

Because the comments are so important, Git requires that every commit have an associated comment. If the comment is short, you can use the -m (message) flag on the commit command. If you just say git commit, Git will open an editor for you to enter the comment; when you save and quit the editor, the Git command finishes. If you have not specified your preferred editor, the default is usually vi. If this happens and you don't know vi, press Esc, then type :q! and press Enter to quit without saving; Git sees an empty message and abandons the commit, so you can try again with -m.

When you add a new file to your project, use git add filename to "stage it" (tell Git about it), or it won't be tracked. You can even use git add directory to start tracking all the files in a directory.

The command git status is good for reminding you what you have done and what you still need to do. It can be issued anytime. Use it often.
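
A quick sketch of what git status tells you before and after staging a new file. It uses the compact --short form; the file name notes.txt is just an example.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

echo "notes" > notes.txt

# A brand-new file is untracked: Git sees it but is not following it.
before=$(git status --short)

# Stage it, then look again.
git add notes.txt
after=$(git status --short)

echo "before add: $before"
echo "after add:  $after"
```

In the short form, ?? marks an untracked file and A marks a file that has been added and will be part of the next commit.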

Occasionally--at least at the end of every session working on your project--issue the command git push. This will upload all your work to the remote server. At the very least, you will have a backup if your computer fails.

Summary workflow

  1. git pull # Not needed if you always use the same computer to do your work.
  2. while there are changes to be made:
    1. edit a file
    2. git add filename # stages the change; an unstaged edit is not committed
    3. git commit -m "specific message about what you did"
  3. git push

Working on a team

If other people are also working on this project, your local repository may not be up to date. This is a bad start, and will cause problems later on, which you, not your teammates, will have to fix. Obviously you don't want this to happen. So whenever you return to working on a project, issue the command git pull URL_of_remote_repository. If your current files match the current or some earlier version in the remote repository, your files will be updated. If they don't match (because you never pushed your version), there will be problems which you will have to resolve.

Rule 1: Always start your work session with a pull command.

If you and your co-workers are working on different parts of the project, or on the same part but at different times, problems seldom arise. Everything goes smoothly.

Now suppose you and a team member are working on the same code, but making conflicting changes. (Git is pretty good, but not perfect, at figuring out when changes conflict, and I can't give you a simple rule.) And suppose she pushes her code to the remote repository first. When you go to push your code, Git will reject it. You now have to pull her code and figure out how to merge it with your own, before you can push it. So you want to push your changes first, making any conflicts her problem. How?

The best approach is to work in small increments: Fix or add one thing, commit it, and push it.

This seems mean and selfish, but it isn't. Small conflicts are much easier to deal with than large, multi-file conflicts. If she works the same way (frequent commits and pushes), conflicts will almost always be small and easy to deal with. And if she doesn't already work the same way, she will probably soon learn to.

Rule 2: Make small changes that you can commit and push frequently. (Or be sure that you are working on something that no one else is working on.)
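
The rejected push described above is easy to reproduce on one machine. This sketch assumes two local clones, named her and you, sharing a bare repository that stands in for the remote; all names, files, and identities are invented. The two changes touch different files, so the pull merges cleanly.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Her repository seeds the shared remote.
git init -q her
cd her
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "line 1" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
git remote add origin "$work/remote.git"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"
cd ..

# Your clone starts from the same state.
git clone -q remote.git you
cd you
git config user.name "You"
git config user.email "you@example.com"
cd ..

# She commits and pushes first.
cd her
echo "her work" > her.txt
git add her.txt
git commit -q -m "Add her.txt"
git push -q origin "$branch"
cd ..

# You commit and try to push: Git rejects it, because the remote
# has a commit that your repository has not seen yet.
cd you
echo "your work" > you.txt
git add you.txt
git commit -q -m "Add you.txt"
if git push -q origin "$branch"; then first_push="accepted"; else first_push="rejected"; fi

# Pull her work (merging it with yours), then push again.
git pull -q --no-rebase --no-edit origin "$branch"
if git push -q origin "$branch"; then second_push="accepted"; else second_push="rejected"; fi

echo "first push: $first_push, second push: $second_push"
```

The error text printed by the first push is Git's own explanation of the rejection, and is worth reading once.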

Here's the next scenario. You start your work session with a pull (as you should), and very shortly discover that nothing works. Maybe it won't even compile. Now what? You can revert to the code you had just before the pull, thus guaranteeing headaches in the future. Or you can work through the dozen or so versions that were pushed to the remote server, trying to find the last one that wasn't broken. Or you can try to fix someone else's bug. You have good reason to be annoyed at somebody; and Git will tell you exactly who to be annoyed at.
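
For the "who" part, git log shows the author of each commit and git blame shows who last changed each line. A minimal sketch, with two made-up authors (Alice and Bob) and a made-up file config.txt:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# Two commits by two different (hypothetical) people.
echo "x = 1" > config.txt
git add config.txt
git -c user.name="Alice" -c user.email="alice@example.com" commit -q -m "Add config.txt"

echo "y = oops" >> config.txt
git add config.txt
git -c user.name="Bob" -c user.email="bob@example.com" commit -q -m "Add y setting"

# Who made each commit, newest first.
git log --format='%h %an: %s'

# Who last touched each line of the file.
git blame config.txt

# The same information for line 2 only, in a form a script can read.
line2_author=$(git blame --porcelain -L 2,2 config.txt | sed -n 's/^author //p')
echo "line 2 was last changed by: $line2_author"
```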

Now look at it in reverse. What if you were the culprit? How would the team feel about you?

Rule 3: Never push broken code. Implementing a new feature usually can't be done all at once, but it's okay to push incomplete implementations, so long as they compile and no one else is depending on them to work. Just don't do anything that interferes with other people getting their jobs done.

Summary workflow

  1. git fetch # To download new commits from the remote without changing your files
  2. git pull # To merge in any changes
  3. if there are any conflicts that are not automatically resolved:
    1. Resolve the conflicts by editing the affected file(s)
    2. git add filename # marks the conflict in that file as resolved
    3. git commit -m "specific message about what you did"
  4. while there are changes to be made:
    1. edit a file
    2. git add filename # stages the change; an unstaged edit is not committed
    3. git commit -m "specific message about what you did"
  5. git push
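
Step 3 is the part people find scary, so here is a sketch of a conflict and its resolution. To keep it on one machine, it merges a local branch named teammate instead of pulling from a remote; the merge step is the same one a pull performs after fetching. The file, branch name, and identity are invented for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "greeting = hello" > settings.txt
git add settings.txt
git commit -q -m "Add settings.txt"
main_branch=$(git symbolic-ref --short HEAD)

# Your teammate's change, on a branch standing in for the remote.
git checkout -q -b teammate
echo "greeting = howdy" > settings.txt
git commit -q -a -m "Use howdy as the greeting"

# Your own change to the same line.
git checkout -q "$main_branch"
echo "greeting = hi" > settings.txt
git commit -q -a -m "Use hi as the greeting"

# The merge cannot pick a winner, so it stops and reports a conflict.
if git merge teammate; then merge_result="clean"; else merge_result="conflict"; fi

# Git has written both versions into the file, between conflict markers.
cat settings.txt
markers=$(grep -c '^<<<<<<<' settings.txt)

# Resolve by editing the file to what it should say, then add and commit.
echo "greeting = hi and howdy" > settings.txt
git add settings.txt
git commit -q -m "Merge teammate's greeting change, keeping both greetings"

leftover=$(git status --short)
echo "merge: $merge_result, markers found: $markers, leftover: '$leftover'"
```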

Git commands

git mv from to
    Used to move or rename a file. Works like the Unix mv command, but also updates git. Should be followed by a commit with a message saying that the move/rename has occurred.

git rm file
    Used to delete a file. Works like the Unix rm command, but also updates git. Should be followed by a commit with a message saying that the deletion has occurred. The file is not "lost," but can be recovered from the project history.
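
A short sketch of both commands, ending with one way to recover a deleted file from the project history (git checkout with a commit and a path). The file names and identity are made up, and HEAD~1 works here only because the deletion was the most recent commit.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "draft" > old_name.txt
echo "temp" > scratch.txt
git add .
git commit -q -m "Add old_name.txt and scratch.txt"

# Rename, then commit the rename.
git mv old_name.txt new_name.txt
git commit -q -m "Rename old_name.txt to new_name.txt"

# Delete, then commit the deletion.
git rm -q scratch.txt
git commit -q -m "Delete scratch.txt"

# Recover the deleted file as it was one commit ago (HEAD~1).
git checkout HEAD~1 -- scratch.txt
recovered=$(cat scratch.txt)
echo "recovered contents: $recovered"
```

The recovered file comes back staged, so a commit afterwards would restore it to the project.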