Lesson preamble

Lesson objectives:

  • Understand basic bash commands
  • Understand the logic of version control
  • Practice using Git commands at the command line
  • Learn how and when to use branches
  • Learn how to effectively use Pull Requests
  • Learn how to work with forks

Lesson outline:

  • Intro to Git (50 min) - Bash Primer (15 min) - Getting started with Git (5 min) - Version control with Git (30 min)
  • Collaborating with GitHub (50 min) - Creating a fork + GitHub setup (5 min) - Branches and pull requests (40 min) - Issues (5 min)

Intro to Git

Git is a version control system that tracks changes in files. Although it is primarily used for software development, Git can be used to track changes in any files, and is an indispensable tool for collaborative projects. Using Git, we effectively create different versions of our files and track who has made what changes. The complete history of the changes for a particular project and their metadata make up a repository.

to sync the repository across different computers and to collaborate with others, Git is often used via GitHub, a web-based service that hosts Git repositories. In the spirit of ‘working open’ and ensuring scientific reproducibility, it has also become increasingly common for scientists to upload scripts and related files to GitHub for others to use.

In this lesson, we’ll be covering some basic Git commands, before moving on to using GitHub for the upcoming group projects. All members of the group will be given admin access to a pre-made repository.

Command line Git

Setup and Installation

Git is primarily used on the command line. The implementation of the command line that we’ll be using is known as a ‘bash interpreter’, or ‘shell’. While bash interpreters are natively available on Mac and Linux operating systems, Windows users may have to externally install a bash shell.

To install Git on a Mac, you will have to install Xcode, which can be downloaded from the Mac App store. Be warned: Xcode is a very large download (approximately 6 GB)! Install Xcode and Git should be ready for use. (Note: most students will have already installed Xcode already for some R packages we used earlier in the course. If you’re not sure whether this is the case, run the git --version command described below)

To install Git (and Bash) on Windows, download Git for Windows from its website. At the time of writing, 2.19.1 is the most recent version. Download the exe file to get started. Git for Windows provides a program called Git Bash, which provides a bash interpreter that can also run Git commands.

To install Git for Linux, use the preferred package manager for your distribution and follow the instructions listed here.

To test whether Git has successfully been installed, open a bash interpreter, type

git --version

and hit enter. If the interpreter returns a version number, Git has been installed.

Bash Primer

Before we can use Git in the bash interpreter, it’s worth learning a few basic bash commands. All bash commands are run simply by typing them into the interpreter and pressing enter.

pwd

pwd, or ‘print working directory’, will return the folder bash is currently in. This is similar to the concept of the working directory in R that we discussed earlier in the course.

cd [folder]

cd changes the current working directory. This is useful in tandem with ~, which is shorthand for ‘home directory’ in bash (usually /Users/[yourname]). To navigate to your Desktop, the command would be cd ~/Desktop.

cd .. will let you move back up one folder. If your current working directory is /Users/[yourname]/Desktop, for instance, cd .. will change it to /Users/[yourname].

ls

ls will return what files/folders are within the current working directory. This is useful when you have just used cd to change your current directory and need a quick reminder of what’s in said directory.

mkdir [foldername]

mkdir, or ‘make directory’, will create a new directory with the specified name. For instance, mkdir thesis will create a folder called thesis, which can then be entered with cd thesis.

nano [filename]

nano is a text editor for unix-like systems that opens right within the bash window. If the filename provided doesn’t refer to an existing file, nano will create a new file with that name. Once you’re done with nano, press Control + X to exit - nano will ask whether you want to save your work, which can be answered by pressing either y or n. Alternatively, Control + O followed by Enter will save your current work, after which Control + X will exit without prompting.

You can use any other text editor based on your system, for example, Notepad on Windows.

Getting started with Git

First, we have to tell Git who we are. This is especially important when collaborating with Git and keeping track of who did what! Keep in mind that this only has to be done the first time we’re using Git.

This is done with the git config command. When using Git, all commands begin with git, followed by more specific subcommands.

git config --global user.name "Your Name"
git config --global user.email "youremail@email.com"

Next, we have to set a default text editor for Git. Here, we’ll use nano, as mentioned above:

git config --global core.editor "nano"

here is a link to instructions on how to config other text editors.

use the following to review and your Git configurations:

git config --list

Version control with Git

Note: Material for the following section of the class was taught from Software Carpentry’s openly available Intro to Git lesson, which is available in its entirety here.

Material covered in class, in order:

Summary: Basic Git commands
  • git init (or git clone)
  • git status
  • git log
  • git add
  • git commit
  • git checkout

Branching

Git commit history is a directed acyclic graph (or DAG), which means that every single commit always has at least one ‘parent’ commit (the previous commit(s) in the history), and any individual commit can have multiple ‘children’. This history can be traced back through the ‘lineage’ or ‘ancestry’.

What are branches?

Branches are simply a named pointer to a commit in the DAG, which automatically move forward as commits are made. Divergent commits (two commits with the same parent) could be considered “virtual” branches. Since they are simply pointers, branches in Git are very lightweight.

Examples of branching

Examples of branching

Why use them?
  • To keep experimental work separate
  • To separate trials
  • To ease collaboration
  • To separate bug fixes from development

Since you will be using Github, we will learn how to make branches in the second section of the lesson when we learn about GitHub.

Collaborating with GitHub

Although command line Git is very useful, GitHub allows for easy use of Git for collaborative purposes using a primarily point-and-click interface, in addition to providing a web-based hosting service for Git repositories (or ‘repos’). If you have not already made a GitHub account, do so now here.

All groups have been provided with existing repositories, but a new repository can be made by clicking on the + in the top right of the page and selecting ‘New repository’. For now, however, navigate to your provided group repo.

Creating a fork

The repos that have already been created can be thought of as ‘main repos’, which will contain the ‘primary’ version of the repo at any given point in time. However, instead of directly uploading and editing files right within this main repo itself, we will begin by forking the repo. When a given user forks a repo, GitHub creates a user-specific copy of the repo and all its files.

To fork your group’s repo, navigate to the repo’s page and click on the ‘Fork’ button at the top right of the page. Following a brief load screen, GitHub will redirect you to your new, forked repo.

You’ll notice that on the top left of this repo page, the repo’s name will be ‘[your username] / [repo name]’, as opposed to ‘EEB313 / [repo name]’. Furthermore, GitHub will indicate that this is a fork right underneath said repo name (‘forked from eeb313-2018/ [repo name]’).

Branches and Pull Requests

Currently, your fork is identical to what the original repo looked like at the moment of forking, and so GitHub will indicate that the fork is currently ‘even’ with its original. Any edits that are made in the fork will be unique to the fork, leaving the original unchanged.

But how do you add these changes to the main repo? Doing so is a two step process:

  1. First, create a branch on your fork (not the main repo). This is yet another copy of the fork (a copy of a copy!) that will store changes separately from ‘master’, which is the default branch.

  2. Once you have made our changes in this branch, submit what’s known as a pull request from this edited branch to the main repo. This allows your changes to be reviewed by the other members of your group before they are merged into the main repo.

Although this process may seem a bit laborious, using this method (also known as the ‘GitHub flow’) minimizes chances of error and ensures that all code is reviewed by at least one other person. Understanding how and why this process works is key to collaborative work in software development and the like, and is used by all sorts of open source projects on GitHub (including dplyr itself!)

Creating a new branch

In a given repo, there is a ‘Branch: master’ button just above the list of files. Clicking on this will bring up a dropdown menu containing a list of branches (which currently just contains master) and a text field. A new branch can be created by typing in the text field. Start by making a new branch called initial-edit using said text field. GitHub will then refresh, and the button will now say ‘Branch: initial-edit’ instead.

Pictured: the branch dropdown menu

Pictured: the branch dropdown menu

The dropdown can now be used to switch back to master, but let’s stay in initial-edit for now. This is the place to make new changes – either by editing existing files or creating new ones. Let’s create a simple Rmd file with the ‘Create new file’ button on the right side of the repo page.

---
title: "Test R Notebook"
author: "Your Name"
output: pdf_document
---

Hello! This is my R notebook.

```{r}
print('Hello, GitHub!')
```
Pictured: editing a file on GitHub

Pictured: editing a file on GitHub

Once done, scroll to the bottom of the page and add a commit message using the text file – something like ‘initial Rmd file creation’ – and then click on ‘Commit new file’.

Pull requests

After a commit has been made, GitHub takes you back to the main repo page, but now with an attention-grabbing yellow prompt about these new edits. GitHub has noticed that your fork has a new branch with edits, and over on the right side of the prompt, GitHub presents the option of creating a pull request.

Pictured: GitHub’s pull request prompt

Pictured: GitHub’s pull request prompt

If you don’t see the yellow prompt, head to your fork and switch to the branch containing your edits. Underneath the green ‘Clone or download’ button, there is a ‘Pull request’ option – this can be used to open a pull request instead. You’ll also notice that the left side of this bar lists how many commits ahead of master your branch currently is.

Following either method of opening a pull request, GitHub takes you to the ‘Pull requests’ tab of the main repo and prompts you to write about your pull request. Here, you can (and should) explain the changes you’ve made for your group members, so that they know what to look for and review. Be specific and detailed to save your group members’ time – it’s a good idea to start off your pull request message with an overall summary (‘adding dplyr code to clean dataset’) followed by a point-form list of what changes have been made, if necessary.

Once the pull request has been made, GitHub will list both your message and your commit messages below. You also have the option of merging the pull request yourself – but don’t do this! When collaborating, always have someone else review and merge your pull request.

If all does not look good, your team members can add messages below, and tag others using the @ symbol, similarly to most social networks. If more changes are needed before the pull request is ready to merge, any new commits you make to the branch on your fork (initial-edit in our case here) will automatically be added on to the pull request. This way, you can incorporate any changes or fixes suggested by your team members simply by continuing to work in your fork’s new branch until your changes are ready to merge.

Once a pull request has been merged into the main repo, the branch on your fork isn’t needed anymore. Because of this, GitHub will immediately prompt you to delete the initial-edit branch as soon as the merge has been completed.

Pictured: GitHub’s branch deletion prompt

Pictured: GitHub’s branch deletion prompt

Great work – your edits are now on the main repo!

Syncing your fork

There’s one issue with the above, though – now, the master branch on your fork is out of date with the current state of the main repo. The fork has to be synced to remedy this issue.

First, head to your fork on GitHub, and click on the green ‘Clone or download’ button on the right side of the page. This yields a link to your fork. Copy this link to your clipboard.

Next, open a bash shell, and navigate over to whichever folder you would like a copy of your repo to be saved in. Then, run:

git clone [repo link]

This process, known as cloning, will create a new folder in your current working directory that contains the contents of your fork. Enter this new folder with cd and type git status to make sure the repo has been cloned properly. git status should output that the branch is even with origin/master, indicating that it is currently the same as the current state of your fork.

To get your fork up to date with the main repo, you next have to add a remote linking to the main repo. Head to your group’s repo and once again click on ‘Clone or download’ to grab its link. Then, using the main repo link, run:

git remote add upstream [repo link]

Once this is done, run:

git remote -v

to get a list of existing remotes. This should return four links, two of which are labelled origin and two of which are labelled upstream.

Next, you have to fetch the new changes that are in the main repo.

git fetch upstream

Once the edits have been downloaded, merge them into your local repo:

git merge upstream/master

Your local copy is now even with the main repo! Finally, push these changes to the GitHub version of your fork:

git push origin master

and now the GitHub version of the fork is all synced up, ready for your next batch of edits, and eventually another pull request!

Keeping your fork up to date

When other group members add to the main repo through their own pull requests, you have to make sure those edits have been incorporated into your repo before making new changes of your own. This ensures that there aren’t any conflicts within files, wherein your edits clash with someone else’s if one of you is working with an earlier version of the file.

To remain up to date, navigate to the local copy of your repo using bash, and once again repeat the above steps:

git fetch upstream
git merge upstream/master
git push origin master

Note: The git pull command combines two other commands, git fetch and git merge.

This will download any changes your group members may have made and update both the local and GitHub versions of your fork accordingly. Before starting any edits of your own, it’s usually a good idea to start off by checking to see whether anything’s been added to the main repo and, if needed, then syncing your fork to account for that.

Issues

Finally, each repo on GitHub also has an Issues tab at the top of the page. Here, you and your group can create posts regarding the content of the repo that highlight issues with code or serve as to-do lists to manage outstanding tasks with.

Although issues aren’t needed for any of the steps we discussed above, it can be useful to create a roadmap of your project with them and
assign group members to specific tasks if need be.

To sum up

The ‘GitHub flow’:

  1. Make sure your fork is up to date
  2. Create a new branch on your fork
  3. Make your edits (with informative commit messages along the way)
  4. Submit a pull request
  5. Have your group members review your pull request
  6. If your group requests further edits prior to merging, make more commits in the new branch
  7. Once your edits have been merged, delete your new edits-containing branch
  8. Sync your fork to be up to date with the new changes

Additional resources:


This work is licensed under a Creative Commons Attribution 4.0 International License. See the licensing page for more details about copyright information.