An initial note: I first learned Git from the book Pro Git, which I'm amazed is freely available online, so if you want a more comprehensive overview, check that out.
Also, I'd like to amend this post by saying, for change/commit descriptions, it's best practice to use the imperative style. (e.g. "Add file A", rather than "Added file A"). It's too late for me to update all these screenshots.
Table of Contents
- Setup
- Clone a Repository
- Folder Structure
- git log
- The .gitignore File, and "git status"
- Staging Changes You Want to Commit
- Committing Changes to Your Repository
- The Origin
- Branches
- Pushing to the Remote Repository
- Pulling Changes from the Remote Repository
- Merges and Conflicts
- Rebasing
- Pull Requests
What is a Version Control System?
A version control system allows you to store different versions of your code as you make changes. This helps you avoid losing your changes, and lets you access your code from a convenient location on the internet. You can keep some changes separate in their own branch, and merge those changes into the master branch when you're ready, so the main version of your code is unaffected until you know your changes function properly.
In case you're unfamiliar with the term "branch", picture it as a particular version of your entire project. Let's say you're working on a book, and you have a published version, and a "work in progress" version. You are rewriting a chapter of your book in the "work in progress" version, but you still have the published version separate, so if you want to print out a copy of your book without the partially modified chapter, you can. When you're done modifying that chapter, you can merge your changes into the published version, deleting the modification branch afterward. You could also have a permanent "student edition" branch that has more detail, and perhaps exercises at the end of each chapter. You can easily merge the same changes into that branch to keep it up to date, but maintain a separate copy.
Overview of This Article
I haven't been particularly inspired by any version control system until I dove into Git, created by the founder of Linux, Linus Torvalds. If you're already familiar with Git, feel free to skip the tutorial sections. I'm going to go over the following:
- Summarize what sets Git apart from other alternatives
- The basics of Git, so someone new to Git can be productive with it
- How it's set up at New York Magazine, where I work
- My preferred workflow using Git
Key Benefits of Git
- When you create a new branch, Git doesn't copy all your files over. A branch is just a lightweight pointer to a commit, and only the changes (commits) specific to that branch are recorded. This makes branching blazingly fast, as opposed to its main competitor in terms of market share, Subversion, where a branch is a copy of a directory tree in the repository.
- Git lets you work on your own copy of a project, merging your commits into the central repository, often on github.com, when you want your commits to be available to others. Github.com, by the way, will host your project, for free as long as it's open source. This means you can reliably access your code from anywhere with an internet connection. If you lose that internet connection, you can continue to work locally and sync up your changes when you're able to reconnect.
- When you screw up, you can usually undo your changes, somehow. You might need to call in an expert in serious cases, but there's always hope. This is the best "key benefit" a version control system can have.
- Git also lets you keep your commit history very organized. If you have lots of little changes, it lets you easily rewrite history so you see it as one big change (via something called rebasing). You can add/remove files in each commit, and certainly change the descriptions of each. This definitely forces me to consciously repress OCD tendencies.
- It's open source, fast, and very flexible, so it's widely adopted and therefore well supported.
Other Notable Features
- With Git, you can create "hooks", which let things happen automatically when you work with your code. A common usage is to create a hook to check the description submitted with each commit, making sure it conforms to a particular format. Perhaps you have your bugs described in a bug tracking system and each bug has an ID #. Git can ensure each message has "Bug: SomeNumber".
- Another under-appreciated feature is how Git tracks files. It uses an algorithm called SHA-1 to take the contents of files and produce a large hexadecimal number (hash code). The same file will always produce the same hash code. This way, if you move a file to a different folder, it can detect that the file moved, and not think that you deleted one file and added another. This allows Git to not have to keep two copies of the same file.
- While Git is not necessarily the most intuitive version control system out there, once you get used to it, you're able to browse through its internal directories and it all makes complete sense. Wondering where the file with the hash code "d482acb1302c49af36d5dabe0bccea04546496f7" is? Check out this file: "<your project>/.git/objects/d4/82acb1302c49af36d5dabe0bccea04546496f7" See how where the file is stored is determined by the hash code? There are lots of lower-level commands that let you build the operations you want, in case, for instance, git's merge command doesn't work how you'd like it to.
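Here's a minimal sketch of the hashing and storage layout described above, run in a throwaway repository so nothing touches a real project. The file name notes.txt and the temporary folder are made up for the demo, and it assumes Git's default SHA-1 object format.

```shell
set -eu
# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# notes.txt is a made-up file name for this demo.
echo "Potayto potahto" > notes.txt

# -w stores the content under .git/objects and prints its hash.
hash=$(git hash-object -w notes.txt)
echo "hash: $hash"

# The first two hex digits name the folder, the rest name the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
ls ".git/objects/$dir/$file"

# A copy in another folder has the same content, so it gets the same hash.
mkdir moved
cp notes.txt moved/renamed.txt
copy_hash=$(git hash-object moved/renamed.txt)
echo "copy: $copy_hash"
```

Both hashes printed should be identical, which is how Git can tell a moved file from a brand new one.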
Tutorial
So you're going to start a new project, in whatever programming language, and you want to use version control? To demonstrate, I'm going to create a silly sample application in Scala that's very easy to understand. I'm going to assume you're familiar with your operating system's command-line interface, and that you're able to write something in the language of your choice.
Setup
Github is one of the go-to places to get your code hosted for free. It gives you a home for your code, that you can access from anywhere. Initial steps:
- Go to http://github.com and "Sign up for Github"
- You'll need Git. Follow this page step by step: http://help.github.com/articles/set-up-git
- This explains how to create a new repository: https://help.github.com/articles/create-a-repo
- Lastly, you're going to want to get used to viewing files that start with a "." These files are hidden by default, so at the command line, when you're listing contents of a directory, you need to include an "a" option. That's "ls -a" in OSX and Linux, and "dir /a" for Windows. In your folder options, you can turn on "Show hidden files and folders" as well.
Once you've gotten that far, there's nothing stopping you, outside of setting aside some play time, from using everything git has to offer.
Clone a Repository
Cloning a repository lets you grab the source code from an existing project that you own, or someone else's project that you have access to (usually public). Unless it's your project, you won't be able to make changes, so you're going to "fork" my potayto project, which means to create your own copy of it under your own account, then you can modify that to your heart's content. I keep all of my projects locally (on my computer) in a "projects" folder in my home directory, "/Users/sdanzig/projects", so I'm going to use "projects" for this demo.
First, fork my repository...
I created a sample project on github, as you now should know how to do.
First, log into your github account, then go to my repository at https://github.com/sdanzig/potayto ... Click "Fork":
Fig. 1 |
Then select your user account, to copy it to. At this point, it's as though it were your own repository, and you can actually make changes in the code on Github. We're going to copy the repository onto our local hard drive, so we can both edit and compile the code there. From your "projects" folder, type:
git clone https://github.com/<your github username>/potayto.git
Fig. 2 |
Folder Structure
There are a few key things to know about what git is doing with your files. Type:
cd potayto
There are a couple of things to see here. List the contents of the potayto folder, being careful to show the hidden files and folders:
Fig. 3 |
The src folder has the source code, and its structure conforms to the Maven standard directory structure. You'll also see a .git folder, which contains a complete record of all the changes that were made to the potayto repository, and also a .gitignore text file. We're not going to dive into the contents of .git in this tutorial, but it's easier to understand than you think. If you're curious, please refer to the online book I mentioned earlier.
git log
A "commit" is a change recorded in your repository. Type "git log", and you might have to press your space bar to scroll and type "q" at the end, to quit displaying the log:
Fig. 4 |
Git's log shows the potayto project has 3 commits so far, from the oldest (the first commit) on the bottom to the most recent on top. You see the big hexadecimal numbers preceded by the word "commit"? Those are the SHA codes I was referring to. Git also uses these SHA codes to identify commits. They're big and scary, but you can just copy and paste them. Also, you only need to type enough letters and numbers for it to be uniquely identified. Five is usually enough. For this project, you can get away with 4, the minimum.
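If you'd like to see abbreviated SHA codes in action without touching the potayto repository, here's a small sketch in a throwaway repository. The file name and the "Demo User" identity are placeholders I made up for the demo.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"

full=$(git rev-parse HEAD)             # the full SHA code
short=$(git rev-parse --short HEAD)    # an abbreviated, still unambiguous form
echo "$full"
echo "$short"

# Four characters is the minimum; ^{commit} tells git we mean a commit.
prefix=$(printf '%s' "$full" | cut -c1-4)
resolved=$(git rev-parse --verify "${prefix}^{commit}")
echo "$prefix -> $resolved"

# One line per commit, each with an abbreviated SHA code.
git log --oneline
```

In a repository this small, the four-character prefix resolves to the same commit as the full code.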
Let's see how my first commit started. To see the details of the first commit, type:
git show bfaa
Fig. 5 |
Initially I checked in my Scala application as something that merely printed out "Tomayto tomahto," "Potayto potahto!" You can see that near the bottom. The "main" method of the "Potayto" object gets executed, and there are those two "print lines". Earlier in the commit you can see the addition of the .gitignore I provided. I'm making git ignore my Eclipse-specific dot-something files (e.g. .project) and also the target directory, where my source code gets compiled to. Git's show command is showing the changes in the file, not the entire files. Those +'s before each line mean the lines were added. In this case, they were added because the file was previously nonexistent. That's why you see the /dev/null there.
Now type:
git show 963e
Fig. 6 |
Here you see my informative commit message about what changed, which should be concise but comprehensive, so you're able to find the change when you need it. After that, you see that I did exactly what the message says. I changed the order of the lyrics. You see two lines beginning with "-", preceding the lines removed, and two lines beginning with "+", preceding the lines added. You get the idea.
The .gitignore File, and "git status"
View the .gitignore file:
.cache
.settings
.classpath
.project
target
This is a manually created file that tells git what to ignore. If you don't want files tracked, you include them here. I use software called Eclipse to write my code, and it creates hidden project files which git will see and want to add in to the project. Why should you be confined to using not only the same software as me to mess with my code, but also the same settings? Some teams might want to conform to the same development environments and checking in the project files might be a time saver, but these days there are tools that let you easily generate such project files for popular IDEs. Therefore, I have git ignore all the Eclipse-specific files, which all happen to start with a "."
There's also a "target" folder. I've configured Eclipse to write my compiled code into that folder. We don't want git tracking the files generated upon compilation. Let those grabbing your source code compile it themselves after they make what modifications they wish. You're going to want to create a .gitignore for your own projects. This .gitignore file gets checked in along with your project, so people who modify your code don't accidentally check in their generated code as well. Others might be using IntelliJ, which writes .idea folders and .ipr and .iws files, so they may append these to the .gitignore, which is completely fine.
Let's try this. Type:
git status
Fig. 7 |
You'll see you're on the main branch of your project (a version of your code), "master". Being "on a branch" means your commits are appended to that branch. Now create a text file named "deleteme.txt" using whatever editor you want in that potayto folder and type "git status" again:
Fig. 8 |
Use that same text editor to add "deleteme.txt" as the last line of .gitignore and check this out:
Fig. 9 |
See, you modified .gitignore, so git doesn't see your deleteme.txt file. However, other than this nifty feature, .gitignore is a file just like any other file in your repository, so if you want this "ignoring" saved, you have to commit the change, just like you would commit a change to your code.
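The same experiment can be sketched as a script in a throwaway repository, in case you want to replay it. The temporary folder and the "Demo User" identity are placeholders; "git status --porcelain" is just a terser, script-friendly form of "git status".

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

# Same entries as the potayto .gitignore.
printf '.cache\n.settings\n.classpath\n.project\ntarget\n' > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"

# A new file shows up as untracked.
echo "scratch" > deleteme.txt
before=$(git status --porcelain)
echo "$before"     # ?? deleteme.txt

# Once it is listed in .gitignore, only the .gitignore change is reported.
echo "deleteme.txt" >> .gitignore
after=$(git status --porcelain)
echo "$after"      #  M .gitignore
```

The "??" marks an untracked file, and " M" marks a tracked file with an unstaged modification.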
Staging Changes You Want to Commit
Here's one of the fun things with git. You can "stage" the modified files that you want to commit. Other version control systems ominously await your one command before your files are instantly changed in the repository, perhaps the remote repository for the entire team.
Let's say you wanted to make a change involving files A and B. You changed file A. You then remembered something you'd like to do with file Z, on an unrelated note, and modified that before you forgot about it. Then you completed your initial change, modifying file B. Git allows you to "add" files A and B to staging, while leaving file Z "unstaged". Then you can "commit" only the staged files to your repository. But you don't! You realize you need to make a change to file C as well. You "add" it. Now files A, B, and C are staged, and Z is still unstaged. You commit the staged changes only.
Read that last paragraph repeatedly if you didn't follow it fully. It's important. See how Git lets you prepare your commit beforehand? With a version control system such as Subversion, you'd have to remember to make your change to file Z later, and your "commit history" would show that you changed files A and B, then, in another entry, that you changed file C later.
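Here is the A, B, C and Z story as a rough script, in case seeing it run helps. The files are one-line placeholders in a throwaway repository, and the "Demo User" identity is made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

# Start with four committed files.
for f in A B C Z; do echo "v1" > "$f"; done
git add .
git commit -q -m "Add files A, B, C and Z"

echo "v2" >> A       # first half of the real change
echo "v2" >> Z       # unrelated tweak made along the way
echo "v2" >> B       # second half of the real change
git add A B          # stage A and B, leave Z alone

echo "v2" >> C       # realize C belongs in this change too
git add C

git status --short   # A, B and C are staged, Z is not
git commit -q -m "Change A, B and C"

committed=$(git diff --name-only HEAD~1 HEAD)
remaining=$(git status --porcelain)
echo "$committed"
echo "$remaining"    #  M Z
```

The commit contains A, B and C only, and Z is still sitting there modified, waiting for its own commit.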
We won't be as intricate. Let's just stage our one file for now. Look at Figure 9. Git gives you instructions for what you can do while in the repository's current state. Git is not known for having intuitive commands, but it is known for helping you out. "git checkout -- .gitignore" to undo your change? It's strange, but at least it tells you exactly what to do.
To promote .gitignore to "staged" status, type
git add .gitignore
Fig. 10 |
The important thing to note here is that now your file change is listed under "Changes to be committed" and git is spoon-feeding you what exactly you need to type if you want to undo this staging. Don't type this:
git reset HEAD .gitignore
You should strive to understand what's going on there (check out the Pro Git book I mentioned for those details) but for now, in this situation, you are just given a means to an end for when you might need it (changing your mind about what to stage).
By the way, it's often more convenient to just type "git add <folder name>" to add all modifications of files in a folder (and subfolders of that folder). Also very common to type is "git add .", a shortcut to stage all the modified files in your repository. This is fine as long as you're sure you're not accidentally adding a file such as Z that you don't want to be grouped into this change in your commit history.
It's also useful to know how to stage the deletion of a file. Use "git rm <file>" for that.
Committing Changes to Your Repository
Guess what? We get to do our first commit! Time to make that .gitignore change official. Type:
git commit -m "Added deleteme.txt to .gitignore"
Fig. 11 |
You could just type "git commit", but then git would load up a text editor, and you'd be required to type a commit message anyway. In OSX and Linux, "vim" would load up, and in Windows, you'd get an error. If you prefer a full screen editor in Windows, you can type this to configure it:
git config --global core.editor "notepad"
Mug available on thinkgeek.com sporting some vi quick reference. Vim supports all vi commands listed. |
The full screen editor is necessary if you want a commit message with multiple lines, or in other situations, so if you hate vim, configure git to one you do like.
Enough with this babble. Fill that VI mug with champagne -- you just made your first commit! If you can contain your excitement, type:
git log
Fig. 12 |
Fig. 13 |
See the +deleteme.txt there? That was you! The way this "diff" works is that git tries to show you three lines before and after each of your changes. Here, there were no lines below your addition. The -3,3 and +3,4 are ranges. - precedes the old file's range, and + is for the new file. The first number in each range is a starting line number. The second number is how many lines the displayed sample spans in that version of the file. The 4 lines displayed here totaled only 3 before your change.
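To see where those two ranges come from, here's a small sketch that recreates a five-line .gitignore, appends one line, and prints only the range header of the diff. It runs in a throwaway repository with a placeholder identity; -U3 just spells out git's usual three lines of context.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

# Five lines before the change.
printf '.cache\n.settings\n.classpath\n.project\ntarget\n' > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"

# A sixth line is appended at the end.
echo "deleteme.txt" >> .gitignore
git commit -q -am "Add deleteme.txt to .gitignore"

# Print only the range header of the diff between the two commits.
header=$(git diff -U3 HEAD~1 HEAD | grep '^@@')
echo "$header"     # @@ -3,3 +3,4 @@
```

The old sample starts at line 3 and spans 3 lines; the new sample starts at line 3 and spans 4, because of the added line.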
If you want to revert changes you made, the safest way is to use "git revert", which automatically creates a new commit that undoes the changes in another commit. Don't do this, but if you wanted to undo that "deleteme.txt ignoring" commit which has the SHA starting with 0c22, you can type: "git revert 0c22"
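If you'd rather try "git revert" somewhere harmless first, here's a sketch in a throwaway repository. The commits and the "Demo User" identity are stand-ins for the demo; --no-edit just accepts the default revert message instead of opening an editor.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

printf '.cache\ntarget\n' > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"

echo "deleteme.txt" >> .gitignore
git commit -q -am "Add deleteme.txt to .gitignore"
unwanted=$(git rev-parse --short HEAD)

# Create a new commit that undoes the unwanted one; history is not rewritten.
git revert --no-edit "$unwanted"

git log --oneline
count=$(git rev-list --count HEAD)
echo "commits: $count"
```

The log ends up with three commits: the original, the unwanted change, and the revert that undoes it.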
The Origin
You cloned your repository from your github account. Unless something went horribly wrong, this should be:
https://github.com/<your github username>/potayto.git
Git automatically labels where you cloned a repository from as "origin". Remember when I said the internals of a git repository were easily accessible in that .git folder in your project? Look at the text file .git/config:
Fig. 14 |
It's as simple as this.
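You can also ask git for the same information instead of opening the file. In this sketch, a bare repository on the local disk stands in for your Github fork (that is an assumption to keep the demo offline), so the URL printed is a local path rather than a github.com address.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for the Github fork: an empty bare repository on disk.
git init -q --bare "$work/remote.git"

# Cloning records where the repository came from.
git clone -q "$work/remote.git" potayto
cd potayto

remotes=$(git remote)
url=$(git config --get remote.origin.url)
echo "$remotes -> $url"

# The same information, as stored in the plain text config file.
cat .git/config
```

The [remote "origin"] section in .git/config is all there is to it.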
Branches
Before I explain how to make your changes on the version of your code stored on Github, I should first explain more about branches. I already explained how a branch is a separate version of your code. A change made to one branch does not affect the version of your repository represented by the other branch, unless you explicitly merge the change into it. By default, git will put your code on a "master" branch. When you clone a project from a remote repository (remote in this case means hosted by Github), it will automatically create a local branch that "tracks" a remote branch. Tracking a branch means that git will help you easily:
- See the differences between commits made to the tracking branch (the local one) and the tracked branch (remote)
- Add your new local commits to the remote branch
- Put the new remote commits on your local branch
If you didn't have your local branch track the remote branch, you could still move changes from one to another, but it becomes more of a manual process. Hey, guess what? I can easily demonstrate all this in action! First, type:
git status
Fig. 15 |
That deleteme.txt ignoring change you made in your local master branch is not yet on Github! You have one commit that Github's (the origin) remote master branch (denoted as origin/master) does not yet have.
Don't do this now, but if you don't want to make changes directly in your local master branch, you can create a new local branch, perhaps named "testing" by typing "git branch testing". Then you can switch to that branch by typing "git checkout testing". Then make whatever changes you want, stage and commit them, then switch back to the master branch with "git checkout master". You could also create and switch to a new local branch in one command, "git checkout -b testing".
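For when you do want to try it, here's the branch workflow from the paragraph above as a sketch in a throwaway repository. The file notes.txt, the "experiment" branch and the "Demo User" identity are made up for the demo.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "readme" > README.md
git add README.md
git commit -q -m "Add README"

# Create the branch, then switch to it.
git branch testing
git checkout -q testing

echo "scratch" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Back on master, the testing-only commit is nowhere to be seen.
git checkout -q master
ahead=$(git rev-list --count master..testing)
echo "testing is ahead of master by $ahead commit(s)"

# Create and switch in one command.
git checkout -q -b experiment
current=$(git rev-parse --abbrev-ref HEAD)
git branch
```

Since "experiment" was created from master, it doesn't contain notes.txt either.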
Pushing to the Remote Repository
Let's put your change on Github, then we'll make a change directly on Github and pull it. In older versions of Git (before 2.0), the push command, if you don't provide arguments, will just push all the changes in your local branches to the remote branches of the same name. Since Git 2.0, the default is to push only the current branch. Pushing everything can be dangerous, if you have commits in another local branch and you're not quite ready to push those out also. (I once accidentally erased the last week of changes in New York Magazine's main repository doing this. We did manage to recover them, but, don't ask.) It's better to be explicit. Type:
git push origin master
Fig. 16 |
You don't really need to concern yourself with the details of how Git does the upload. But as for the command you just typed, git push lets you specify the "remote" that you're pushing to, and the branch. By specifying the branch, you tell git to take that particular branch ("master", in this case) and update the remote branch, on the origin (your Github potayto repository), with the same name (it will create a new remote "master" branch if it doesn't exist). If you didn't specify "master", older versions of Git (before 2.0) would try to push the changes in all your branches to branches of the same names on the origin (if they exist there; it won't create new remote branches in this case). Since Git 2.0, the default is to push only the branch you're on.
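Here's an offline sketch of an explicit push. A bare repository on the local disk plays the part of your Github fork, which is an assumption made so the demo needs no network or account; the identity is a placeholder too.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for the Github fork.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

git clone -q "$work/remote.git" potayto
cd potayto
git symbolic-ref HEAD refs/heads/master   # work on a branch named "master"
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "deleteme.txt" > .gitignore
git add .gitignore
git commit -q -m "Add deleteme.txt to .gitignore"

# Name both the remote and the branch, so nothing else gets pushed.
git push -q origin master

local_tip=$(git rev-parse master)
tracking_tip=$(git rev-parse origin/master)
remote_tip=$(git --git-dir="$work/remote.git" rev-parse master)
echo "$local_tip"
echo "$tracking_tip"
echo "$remote_tip"
```

All three SHA codes match: your local branch, your local copy of the remote branch, and the branch in the remote repository itself.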
Anyway, if you type "git status" again, you'll see your branch now matches the remote repository's copy of it. I'd show you, but I can only do so many screen captures, okay? Also what you can do is type:
git log origin/master
Fig. 17 |
This is the syntax to see a log of the commits in the master branch on your "origin" remote. You can see the change is there. You can also see this list of commits by logging into Github, viewing your Potayto repository, and clicking on this link:
Fig. 18 |
Pulling Changes from the Remote Repository
While we're browsing the Github interface, let's use it to create a change that you can fetch (or pull). This will emulate someone else accessing the remote repository and making a change. If you want your local copy of the repository to reflect what's stored in the remote repository, you need to keep yours up to date by intermittently fetching new changes. First, let's create a README.md file which Github will automatically use to describe your project. Github provides a button labeled "Add a README" for this, but let's do it the more generic way. Click the encircled "Add a file" button:
Fig. 19 |
The "md" in the filename stands for "Markdown", which is a "markup language" that lets you augment your text with different things just like HTML does. If you want to learn how pretty you can make your README file, you can learn more about Markdown here, but just realize Github uses a slightly modified version of Markdown.
Click the "Commit New File" button:
Fig. 20 |
git status
Fig. 21 |
Wait a sec... Why's it saying that your local branch is up to date? It's because the git "status" command does not do any network communication. Even typing "git log origin/master" won't show the change. Only Git's "push", "pull", and "fetch" do anything over the network. Let's talk about "fetch", as "pull" is just a shortcut of functionality that "fetch" can do.
When you track a remote branch, you do get a copy of that remote branch in your local repository. However, aside from those three aforementioned commands that talk over the network, git treats these remote branches just like any other branches. You can even have one local branch track another local branch. (Probably won't need to do that.)
So, how do we update our local copies of the remote branches? "git fetch" will update all the local copies of the remote branches listed in your .git/config file. Here, I'll start adding more shadows to my screenshots, in case you actually aren't as excited about all this niftiness as I am. Please type:
git fetch
Fig. 22 |
Fig. 23 |
git log origin/master
Fig. 24 |
git status
See, this is more like it, but what does "fast-forwarded" mean? Fast-forwarding is a version of "merging". It means there's no potential conflict. It means you took all the changes in a branch, such as the remote master branch, and made changes from there, while no new changes were made in the remote branch. I'll explain more later, in the section on "rebasing", but for now, we're going to pull these changes in. Type:
git merge origin/master
Fig. 25 |
Fig. 26 |
git pull origin master
That would have not only fetched the commits from the remote repository, but would also have done the merge. And if your current branch tracks a remote branch, you can fetch from that remote and merge the tracked branch in by just typing:
git pull
You can be as trigger happy as you want with that for now, but when you start dealing with more than one branch, you might update some branches you weren't yet ready to update.
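The fetch-then-merge sequence above can be sketched offline like this. A bare repository on disk stands in for Github, and a second clone named "theirs" plays the person who adds the README; both are assumptions for the demo, as is the placeholder identity.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# Your clone, with one commit pushed.
git clone -q "$work/remote.git" mine
cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"
echo "target" > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"
git push -q -u origin master
cd ..

# Someone else clones, adds a README and pushes it.
git clone -q "$work/remote.git" theirs
cd theirs
git config user.name "Other User"
git config user.email "other@example.com"
echo "A test repository" > README.md
git add README.md
git commit -q -m "Add README"
git push -q origin master
cd ../mine

# Without a fetch, your copy of origin/master is stale.
stale=$(git rev-list --count master..origin/master)
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
echo "before fetch: $stale, after fetch: $behind"

# Nothing diverged, so the merge is a plain fast-forward.
git merge -q --ff-only origin/master
ls
```

"git pull origin master" would have done the fetch and the merge in one step.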
Merges and Conflicts
Now for the purposes of learning about merges, we're going to undo that last merge. Very carefully, type:
git reset HEAD~1 --hard
Fig. 27 |
Don't worry -- we don't have to mess with remote repositories for a while. Your new README.md file is also safely committed to your local repository's cached version of the remote master branch, "origin/master". You could type "git merge origin/master" to re-merge your changes, but don't do it right now.
Let's say someone else added that README.md, and you were unaware. You start to create a README.md in your local repository, with the intention of pushing it to the remote repository later. Because we undid our change, there is no longer a README.md file in your current directory.
Normally you'd use a text editor, but for now, type this to create a new README.md file:
echo A test repository for learning git > README.md
Fig. 28 |
git add README.md
then type:
git commit -m "Created a simple readme file"
and finally:
git status
Fig. 29 |
git merge origin/master
Fig. 30 |
git status
Fig. 31 |
Can't be any clearer, except for one detail. Git is telling us to type "git add/rm whatever" to "mark resolution". That means, in order to fix this, you could take one of two routes. DON'T DO THIS! ... You could go into README.md, fix it up, then stage it with git add. Edit the README.md file. I'll use vim, but you use whatever editor you want:
Fig. 32 |
Fig. 33 |
However, the reason I told you not to do this is because it's the hard way, especially for complicated conflicts. Instead, while still in your project directory, having just experienced a failed merge command, type:
git mergetool
Fig. 34 |
Mergetool will guide you through each conflicted file, letting you choose which version of each conflicted line you'd like to use for the committed file. You can see, by default, it will use "opendiff". Press enter to see what "opendiff" looks like:
Fig. 35 |
If this were more than one line, you'd be able to say "use the left version for this conflict line". Or "use the right version for this line". Or "I don't want to use either line." In this case, we only have one conflicted line to choose from, so make it count! The one conflicted line is selected. Click on the "Actions" pull down menu and choose "Choose right". You'll see nothing changed. That was because that arrow in the middle was already pointing to the right. Try selecting "Choose left", then "Choose right" again. You'll see what I mean. Opendiff doesn't give you the opportunity to put in your own custom line. You can do that later if you wish.
At the pull down menu at the top of the screen, select "File" then "Save Merge":
Fig. 36 |
Go back to the menu and select "Quit FileMerge". Now again type:
git status
Let's stage the new version of the readme file. Type:
git add README.md
Fig. 37 |
git commit -m "Merged remote version of readme with local version."
and then:
git status
Fig. 38 |
Back to the merge. Look! Your branch is "ahead" of "origin/master" by 2 commits. Let's see what those commits are. To show just the last two commits, type:
git log -n 2
Fig. 39 |
git push origin master
Fig. 40 |
Fig. 41 |
Fig. 42 |
It's good to pull in remote changes frequently, to minimize the complexity of conflicts.
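If you want to replay the whole conflict from start to finish, here's a sketch in a throwaway repository. A local branch named "theirs" stands in for origin/master, and the conflict is resolved the manual way (overwrite the file, then stage it) because mergetool needs a person at the keyboard. The names and the identity are made up for the demo.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "target" > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"

# "Someone else" adds README.md on the stand-in branch.
git checkout -q -b theirs
echo "potayto" > README.md
git add README.md
git commit -q -m "Add README"

# Meanwhile you add a different README.md on master.
git checkout -q master
echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Create a simple readme file"

# Both sides added the same file with different content, so the merge stops.
git merge theirs || true
conflict=$(git status --porcelain)
echo "$conflict"     # AA README.md
cat README.md        # shows the conflict markers git wrote into the file

# Resolve by hand: keep your line, stage the file, and commit the merge.
echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Merge remote version of readme with local version"

# A merge commit has two parents: the hash itself plus two parent hashes.
parents=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "$parents"
```

The final commit is a merge commit, which is why your branch ends up two commits ahead after resolving.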
Rebasing
This is as advanced as this tutorial is going to get, and you're in the homestretch! Rebasing is meant to give you that clean, fresh feeling when committing your changes. With it, you can shape your commits how you prefer before merging them to another branch. But wait, you might think... You can already do that when you're staging your files. You can stage and unstage files repeatedly, getting a commit exactly how you want. There are two main things that rebasing lets you do in addition to that.
Let's say you were working on branch A and you created branch B. Branch B is nothing more than a series of changes made to a specific version of branch A (starting with a specific commit in branch A). Let's say you were able to take those changes and reapply them to the last commit in branch A. It's as though you checked out branch A and you made the same changes. Read this paragraph as many times as you need to before you move on.
Remember when I mentioned fast-forward merges? When you viewed the commit history on Github, did you like seeing commits on other branches being merged in? Or would you have preferred one commit after another? Most prefer the latter. Merging can get quite messy in a worst-case scenario, but even if it's not so bad, it's not preferable. You can use rebasing to allow your merges to be "fast-forward", so when you merge your changes into another branch, there's no "merge commit". Your changes are simply added as the next commits in the target branch, and the new latest commit of that branch is your last change.
Let's demonstrate before I talk about the next benefit. I explained how to create and switch to local branches at the end of the "Branches" section. Type:
git branch testing
We're still in the master branch. Now let's make another change to that awful readme file again. Load up your editor and add the line: "Inspired by the Gershwin brothers" then save:
Fig. 43 |
git commit -am "Added something to the readme file"
then view the log with:
git log -n 2
Fig. 44 |
git checkout testing
and then view the README.md file:
Fig. 45 |
Edit the file "src/main/scala/scottdanzig/potayto/Potayto.scala" and add the printing of "Ding!" as shown. Please, just humor me...
Fig. 46 |
git commit -am "Added the printing of Ding"
then show the last two changes for both the current "testing" branch and the "master" branch with:
git log -n 2 <branch>
Fig. 47 |
git rebase master
Fig. 48 |
Fig. 49 |
Fig. 50 |
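As a recap of this first use of rebasing, here's a sketch in a throwaway repository: master and testing each get one commit of their own, testing is rebased onto master, and master is then fast-forwarded. Potayto.scala here is a one-line stand-in for the real source file, and the identity is a placeholder.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "A test repository for learning git" > README.md
git add README.md
git commit -q -m "Add README"

git branch testing

# master moves ahead by one commit.
echo "Inspired by the Gershwin brothers" >> README.md
git commit -q -am "Add something to the readme file"

# testing gets its own commit, based on the older master.
git checkout -q testing
echo 'println("Ding!")' > Potayto.scala   # stand-in for the real source file
git add Potayto.scala
git commit -q -m "Add the printing of Ding"

# Replay testing's commit on top of the latest master.
git rebase -q master

# master can now simply move forward; no merge commit is needed.
git checkout -q master
git merge -q --ff-only testing

git log --oneline
merges=$(git rev-list --merges --count HEAD)
total=$(git rev-list --count HEAD)
echo "merge commits: $merges, total commits: $total"
```

The history is three commits in a straight line, with no merge commit.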
I mentioned there are two things rebasing lets you do that you can't do just with staging. There's this notion of "interactive rebasing" that I think is the coolest part of git. This is the last part of the tutorial where you have to do anything, so this is the homestretch of the homestretch. Now we're going back to our testing branch (currently the same as master) and create two new files, A and B. I'm going to keep this simple. Type:
git checkout testing
then:
echo test > A
and stage and commit that change. File "A" is new/untracked, so you can't use the "-am" shortcut:
git add A
and then:
git commit -m "Added A"
Fig. 51 |
echo test > B
and stage then commit as well:
git add B
git commit -m "Added B"
Fig. 52 |
Fig. 53 |
- You can do this in the same branch, if you just want to reorganize a branch while you're working with it.
- You can also combine commits when you're rebasing (reapplying/replaying) them onto another branch.
If you don't think this is the bee's knees, you're nuts. We're going to do the rebasing the second way, while rebasing onto master. The latest change on master is contained in the testing branch, so rebasing just to avoid merge commits would be unnecessary. Merging testing into master would be a fast-forward merge. However, we're also going to use this opportunity to combine the two commits. Rebasing can be multi-purpose that way. Type:
git rebase -i master
Fig. 54 |
Let's change the second "pick" to a "squash". It's possible to change your default editor from "vim" if you want, but if you prefer vim like me or just haven't got around to it yet, just heed my instructions:
- Use the arrow keys to move the cursor to the "p" of the second "pick".
- Type "cw" to change the word.
- Type "s" then press the escape button.
- Type ":x" to exit and save.
Now you should see a screen allowing you to create the new commit message:
Fig. 55 |
Fig. 56 |
Fig. 57 |
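If you ever want to do the same squash without opening an editor, here's one possible sketch. It relies on the GIT_SEQUENCE_EDITOR environment variable, which names the program git uses to edit the rebase to-do list, and it uses sed as a scripted stand-in for the vim keystrokes above. Setting GIT_EDITOR to "true" simply accepts the combined commit message git proposes. The repository and the identity are throwaways for the demo.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "readme" > README.md
git add README.md
git commit -q -m "Add README"

git checkout -q -b testing
echo test > A
git add A
git commit -q -m "Add A"
echo test > B
git add B
git commit -q -m "Add B"

# Change the second "pick" in the to-do list to "squash", then accept the
# proposed combined commit message as-is.
GIT_SEQUENCE_EDITOR='sed -i.bak "2s/^pick/squash/"' GIT_EDITOR=true git rebase -i master

git log --oneline
ahead=$(git rev-list --count master..testing)
echo "testing is ahead of master by $ahead commit(s)"
```

The two commits on testing are now a single commit that adds both A and B.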
Pull Requests
Commits are often grouped into "feature branches", each representing all the changes needed for a feature. How projects with designated maintainer(s) often operate is as follows:
- You push your "feature branch" to a remote repository, often your fork of the main repository.
- You create a "pull request" on Github for that branch, which tells the project maintainer that you want your branch merged into the master branch.
- If the branch starts from the most recent commit on the project's master branch, or it can be rebased onto master without any conflicts, the maintainer can easily merge in your changes.
- If there are conflicts, then it's up to the maintainer to do the merge, or to reject the pull request and let you rebase and deconflict the commits in your branch yourself.
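Whether a branch "starts from the most recent commit on master" can be checked locally before you open a pull request. The sketch below is one way to do it, under assumptions: the scratch repository, the branch name feature and the file name are invented for the example. It uses git merge-base --is-ancestor, which succeeds when the first commit is an ancestor of the second, meaning a fast-forward merge is possible.

```shell
#!/bin/sh
# Sketch only: check whether a feature branch could be fast-forward merged.
set -eu

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/master

echo one > file.txt
git add file.txt
git commit -q -m "Add file"

# Hypothetical feature branch with one extra commit on top of master.
git checkout -q -b feature
echo two >> file.txt
git commit -q -am "Extend file"

# If master's tip is an ancestor of feature's tip, the merge is a fast-forward.
if git merge-base --is-ancestor master feature; then
    merge_kind="fast-forward"
else
    merge_kind="needs a rebase or a merge commit"
fi
echo "$merge_kind"
```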
New York Magazine Development Environment
At New York Magazine, where I work, we generally have 4 main branches of each project entitled dev, qa, stg, prod. We have software called Jenkins that monitors each branch, and when any change is made, the project is redeployed to a computer/server dedicated to that environment.
- dev branch - While developers first test their code on their own computers, eventually they need to test changes on a server with shared resources. This often exposes a bunch of integration issues, so a change often requires multiple commits (multiple attempts to get it right) before it is complete. It's a necessary evil that developers simultaneously make changes in this environment for their own features. Hopefully, someone else's changes don't affect testing of your own changes.
- qa branch - This branch is for QA (quality assurance) testing to be done on a new change. The branch is cleaner, only having completed changes, and, although everything isn't necessarily optimized (maybe you do have debugging information being recorded to the log, for instance), it's much more controlled than dev.
- stg branch - Changes approved by QA go to the "staging" environment. This environment is meant to be fully optimized, as if it were the production environment. There could be more issues that are exposed by testing in a fully optimized environment, but usually not. This is not to be confused with the much lower-level staging in git, but the concept is the same: you're preparing a set of features that are slated to go public, rather than a bunch of file changes that are about to be committed.
- prod branch - What your clients/customers/users ultimately see is deployed directly from this branch.
To manage the environment-specific configuration, including enabling optimizations and altering logging levels, we use Puppet. We also use Git to maintain our internal documentation, written as text files using the Github variety of Markdown, to allow ease of collaboration and code-friendly formatting. Hosting a project on Github is free unless it's to be private. Most New York Magazine repositories are private.
Each commit message at New York Magazine, optimally, should have a "story number". A "story" is a description of a desired modification. If something should be changed in code, someone describes how the change works in a web interface provided by a story-tracking application such as Atlassian's JIRA, which we use. A developer can modify the "status" of the story to reflect progress being made toward its resolution.
We use something called Crucible for "peer code reviews". This lets a developer send a series of commits out to fellow developers to have a look at. It tracks who has had a chance to review your code, and gives them the opportunity to make comments.
My Preferred Workflow with Git
I'm typically tasked with a modification I must make to a shared project hosted as a Github repository as I described. On Github, I have a separate user, "scottdanzig" for my job-related Github activity, which allows clear separation of my personal projects from what I've done that is New York Magazine property. For my examples, I'll refer to a web application created with Scala and the Play Framework, that provides restaurant listings for your mobile device. Let's say we realized that the listings load very fast, and we can afford to display larger pictures. Here is my preferred workflow:
Changing the Code
- First thing I do is change the status of the JIRA story I'm going to work on to "In Progress".
- If I don't yet have the project cloned onto my machine, I'll do that first: git clone https://github.com/nymag/listings.git
- I checkout the dev branch: git checkout dev
- I update my dev branch with the latest from the remote repository: git pull origin dev
- I create and checkout a branch off dev: git checkout -b larger-pics
- I make my modifications and test as much as I can, staging and committing my changes after successfully testing each piece of the new functionality.
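- I'll then update my dev branch again, so when I merge back, hopefully it's a fast-forward merge. Since git pull merges into whichever branch is checked out, I switch to dev first and then come back: git checkout dev then git pull origin dev then git checkout larger-pics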
- I'll interactively rebase my larger-pics branch onto my dev branch. This gives me an opportunity to change all my commits to one big commit, to be applied to the latest commit on the dev branch: git rebase -i dev then I change all "picks" but the top one to a squash. I write one comprehensive commit message detailing my changes so far, making sure to start with the JIRA story number so people can review the motivation behind the change. It's possible I might want to not combine all my commits yet. If I'm not sure if one of the incremental changes is necessary, I may decide to keep it as a separate commit. This is possible if you leave it as a separate "pick" during the interactive rebasing. Git will give you an opportunity to rewrite the commit description for that commit separately.
- I checkout the dev branch: git checkout dev
- Then I merge in my one commit: git merge larger-pics
- Then I push it to Github: git push origin dev
- If the push is rejected because it isn't a fast-forward, someone else has pushed to dev in the meantime. I fetch their commits and rebase my dev branch onto origin/dev, then try again. We're not going to combine any commits, so it doesn't need to be interactive: git fetch origin then git rebase origin/dev then again: git push origin dev
- Jenkins will detect the commit and kick off a new build. I can log into the Jenkins web interface and watch the progress of the build. It's possible the build will fail, and other developers will grumble at me until I fix the now broken dev environment. Let's say I did just that.
- If I think it might be a while before I'm able to fix my change, I'll use "git revert <SHA code>" to undo the commit then quickly push that to dev. Either way, I'll again checkout my larger-pics branch, git rebase dev, then make changes, git pull origin dev, git rebase dev, git checkout dev, git merge larger-pics, git push origin dev. Let's say Jenkins gives me the thumbs up now.
- Next stage is the code review. I'll log into Crucible and advertise my list of commits in the dev branch for others to review. I can make modifications based on their feedback if necessary.
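The command sequence in the steps above can be tried end to end without a Github account. This is a rough sketch rather than the exact New York Magazine setup: a local bare repository stands in for the shared remote, the file listing.txt and the demo identity are invented, the interactive rebase is driven by a sed command instead of an editor, and the merge uses --ff-only so the script stops if the merge would not be a fast-forward.

```shell
#!/bin/sh
# Sketch only: the dev-branch workflow against a local stand-in for the remote.
set -eu

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

scratch=$(mktemp -d)
cd "$scratch"

# Build a seed repo with a dev branch, then clone it as a bare "remote".
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/dev
echo "small pictures" > listing.txt
git add listing.txt
git commit -q -m "Initial listing"
cd ..
git clone -q --bare seed origin.git

# Clone the project and start a feature branch off an up-to-date dev.
git clone -q origin.git listings
cd listings
git checkout -q dev
git pull -q origin dev
git checkout -q -b larger-pics

# Two incremental commits, as if testing each piece along the way.
echo "medium pictures" > listing.txt
git commit -q -am "Try medium pictures"
echo "larger pictures" > listing.txt
git commit -q -am "Use larger pictures"

# Refresh dev, then squash everything on larger-pics into one commit.
git checkout -q dev
git pull -q origin dev
git checkout -q larger-pics
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -q -i dev

# Merge the single commit into dev and publish it.
git checkout -q dev
git merge -q --ff-only larger-pics
git push -q origin dev

dev_commit_count=$(git rev-list --count dev)
local_tip=$(git rev-parse dev)
remote_tip=$(git rev-parse origin/dev)
echo "dev has $dev_commit_count commits"
```

In a real session you would write a proper combined commit message (starting with the story number) when the editor opens, rather than accepting the default.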
Submitting to QA
Let's say both Jenkins and my fellow developers are happy. It's time to submit my code to QA. The QA branch is automatically deployed by Jenkins to the QA servers, a pristine environment meant to better reflect what actually is accessed by New York Magazine's readers. We have some dedicated QA experts to systematically test my functionality to make sure I didn't unintentionally break something. If there are no QA experts available, QA might be done by another developer if the feature is sufficiently urgent.
- I need to update my local QA branch so I can rebase my changes onto it, pushing fast-forward commits. I first switch to it and pull: git checkout qa then git pull origin qa
- Then I change to my larger-pics branch: git checkout larger-pics
- It's time to rebase my commits onto the qa branch, rather than dev, which can be polluted by the works in progress of other developers. I type: git rebase -i qa, creating a combined commit message describing my entire set of changes. I now have a branch that is the same as QA, plus one fast-forwardable commit that reflects all of my changes.
- I add my branch to the remote repository: git push -u origin larger-pics
- I go to the repository on Github and create a pull request, requesting my larger-pics branch be merged into the qa branch.
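Here is a small sketch of the rebase-then-push part of these steps, again using a local bare repository in place of Github. The file names and the "someone else's change" on qa are invented for the example, and because the feature branch holds only one commit, the sketch uses a plain git rebase rather than the interactive form. Creating the pull request itself happens in Github's web interface and is not shown.

```shell
#!/bin/sh
# Sketch only: rebase a feature branch onto qa and publish it for a pull request.
set -eu

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

scratch=$(mktemp -d)
cd "$scratch"

# Seed repo with a qa branch, cloned as a bare stand-in for the remote.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/qa
echo "listing page" > page.txt
git add page.txt
git commit -q -m "Initial page"
cd ..
git clone -q --bare seed origin.git

# My clone, with one commit on a feature branch.
git clone -q origin.git listings
cd listings
git checkout -q -b larger-pics
echo "larger pictures" > pics.txt
git add pics.txt
git commit -q -m "Use larger pictures"

# Meanwhile, someone else's approved change lands on the remote qa branch.
cd ../seed
echo "footer" > footer.txt
git add footer.txt
git commit -q -m "Add footer"
git push -q ../origin.git qa
cd ../listings

# Update local qa, replay my commit on top of it, then publish the branch.
git checkout -q qa
git pull -q origin qa
git checkout -q larger-pics
git rebase -q qa
git push -q -u origin larger-pics

commits_ahead_of_qa=$(git rev-list --count qa..larger-pics)
upstream=$(git rev-parse --abbrev-ref "larger-pics@{upstream}")
echo "larger-pics is $commits_ahead_of_qa commit(s) ahead of qa, tracking $upstream"
```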
The Project Maintainer
At this point, it's out of my hands, for the time being. However, the project has a "maintainer" assigned.
- The maintainer can first use the Github interface to see the changes. The maintainer can give a last check for the code.
- If approved, the maintainer must merge the branch targeted by the pull request into the qa branch. If the commit will have no conflicts, Github's interface is sufficient to merge in the change. If there are conflicts, the maintainer can either reject the change, asking the original developer to rebase the branch again and resolve the conflict before creating a new pull request, or check out the branch locally and resolve the merge in place of the original developer.
- The maintainer commits the merged change and updates the JIRA story to "Submitted to QA".
- If QA finds a bug, they will change the JIRA status to "Failed QA". The maintainer will checkout the QA branch and use "git revert" to roll back the change, then will reassign the JIRA ticket back to the original developer.
- If QA approves the change however, they will change the JIRA status to "Passed QA".
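The rollback in the "Failed QA" case relies on git revert, which adds a new commit undoing an earlier one rather than deleting history. A minimal sketch, assuming a scratch repository with an invented listing.txt file and a single bad commit on qa:

```shell
#!/bin/sh
# Sketch only: roll back a change on qa with git revert.
set -eu

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/qa

echo "small pictures" > listing.txt
git add listing.txt
git commit -q -m "Initial listing"

# The change that QA later rejects.
echo "larger pictures" > listing.txt
git commit -q -am "Use larger pictures"
failed_commit=$(git rev-parse HEAD)

# Undo it with a new commit; --no-edit keeps the default revert message.
git revert --no-edit "$failed_commit" > /dev/null

restored_content=$(cat listing.txt)
history_length=$(git rev-list --count HEAD)
echo "listing.txt now says: $restored_content ($history_length commits in history)"
```

The rejected commit stays in the history, so the original developer can still inspect it while working on a fix.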
Release Day
At regular intervals, a development team will release a set of features that are ready and desired. A release consists of:
- A developer merging QA-approved changes from the QA branch to the staging branch.
- Members of the team having a last look at the change's functionality in the staging environment.
- The developer of a change, after confirming that it works correctly in staging, merges the change into the prod branch before a designated release cutoff time.
- The developer changes the status of the JIRA story to "Resolved"
- The system administrators will deploy a build including the last commit before the cutoff time. For New York Magazine, this entails a brief period of down-time, so the release is coordinated with the editors and others who potentially will be affected.
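The two merges in the release steps can be pictured with a short sketch. It assumes a scratch repository where stg and prod start at the same commit as qa, and it merges stg into prod (the text above does not say which branch prod is merged from, so that is a guess). --ff-only makes each merge fail loudly if the target branch has diverged.

```shell
#!/bin/sh
# Sketch only: promote a QA-approved change through stg to prod.
set -eu

export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/qa

echo "small pictures" > listing.txt
git add listing.txt
git commit -q -m "Initial listing"
git branch stg
git branch prod

# The QA-approved change, sitting on qa.
echo "larger pictures" > listing.txt
git commit -q -am "Use larger pictures"

# Release day: qa into stg, then stg into prod.
git checkout -q stg
git merge -q --ff-only qa
git checkout -q prod
git merge -q --ff-only stg

qa_tip=$(git rev-parse qa)
prod_tip=$(git rev-parse prod)
echo "prod is at $prod_tip"
```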
What's Not Set in Stone
That's a summary of how I work, and although everything is "sensible", it's a bit in flux. These are things which could be changed:
- We can get rid of the staging environment, and merge directly from QA. I see the value in this extra level of testing, but I believe four stages is a bit too cumbersome.
- A project does not necessarily need a maintainer, and if we use Crucible, perhaps not even pull requests. A developer can merge his change directly into the QA branch and submit the story to QA on his/her own. I prefer to have a project maintainer.
- We can get rid of Crucible, and just use the code review system in Github. It might not be as feature-filled, but if we use pull requests, it's readily available and could streamline the process. I like Crucible, although it might be worth exploring eliminating this redundancy.
Conclusion
After years of using many other version control systems, Git has proven to be the one that makes the most sense. It's certainly not dependent on a reliable internet connection. It's fast. It's very flexible. After over 20 years of professional software development, I conclude Git is an absolutely indispensable tool.
Thank you, this is a very nice post and helped me with Git workflow. Now I just need to figure out how to do these steps with GitLab instead of GitHub and learn about Jenkins. Once again, very nice post.
Thanks! You're quite welcome. Happy to know writing this huge thing is going to help some people. :)
Are you doing the Crucible review from the local or the remote dev branch?
My point is that 2 developers (let's say John and Bob) could work on a shared dev branch. As you are always rebasing, it could easily happen that the changes of one of them end up on top of the other's in the dev branch (say John's changes are on top of Bob's changes). Now both change sets (John's and Bob's) are sent to Crucible. It can happen that John did a perfect job so his changesets are accepted, but Bob did something really bad, and it needs to be modified or rolled back. What can be even worse is that Bob's changes are much more invasive to the system, so reviewing them would require more time. In the workflow you described above John has a problem: he needs to be aware he cannot merge his work to QA, since his changes are based on Bob's changes, which are: a) not ready to be promoted to QA or b) not reviewed yet.