Using bash to manage your git/GitHub repositories
Background
Currently I have over 80 project directories in a folder that I use for most of my coding. Some of these are active projects, some are project stubs that I started and then stopped working on, and some reflect work I’ve done on collaborative projects that have finished. The challenge for me is that it’s not always clear where each project stands. As part of my own personal goal to reduce the degree of clutter I’m facing, I decided to try to take control of the problem.
What you’ll learn
Hopefully in this resource you’ll:
- get a sense of how to manage the parts of project management that relate to a set of projects rather than to one particular project
- learn a bit about flow control (conditionals and loops) in bash
- learn how to print in color to the terminal
- learn how messy a person’s project folders can get
This all sprang out of my own experience, so I hope it can be helpful to you. The latest version of the bash script I will be talking about will be posted on my GitHub account in a gist for managing projects with git.
Project Management
I borrowed my project management file and directory structure largely from a post by Jenny Bryan that I am unable to find. The structure is mirrored in a paper by Noble on managing projects in computational biology. I choose to have a fairly standard project structure because I rely on R for much of my programming (although I’ve been using more Python and node.js lately), but also because for most analytic workflows it makes a lot of intuitive sense.
Within each directory there is usually a data folder (with input and output folders), a README.md, LICENSE and code_of_conduct.md file (for most projects), a folder for auxiliary code (like SQL or CQL code if needed), and a folder for figures. Because I am usually working on multiple projects at the same time, and have other constraints on my time, folders are often in various states of disrepair (or completion). This leads to a situation where I have a large number of projects, many with uncommitted changes, that can’t easily be cleaned out of my working folders, nor pushed up to GitHub, because they are only partially complete.
To help make a shift in the way I work, to focus more on completing projects than on starting new things, I decided to start a new project: a bash script that checks each directory and reports its status, so that I can slowly check each of these off my TODO list, push them up to GitHub, and get them off my laptop.
Using bash
bash is a command-line shell that ships with most Linux distributions and many other Unix-like systems. It provides command-line control, can set and modify variables, do basic flow control (if/else and for loops), and execute other programs.
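To make those capabilities concrete, here is a minimal sketch that touches each one: a variable, a for loop, an if/else, and a call out to other programs. The names and values are made up for illustration and have nothing to do with the project checker itself.

```bash
#!/bin/bash
# Hypothetical values, chosen only for illustration.
name="projects"
count=0

# A for loop over a fixed list of words.
for item in alpha beta gamma; do
  count=$((count+1))          # arithmetic expansion updates the variable
done

# An if/else conditional comparing integers.
if [ "$count" -gt 2 ]; then
  message="$name has more than two items"
else
  message="$name has two items or fewer"
fi
echo "$message"

# Run other programs (sort and head) and capture their output.
first=$(printf '%s\n' gamma alpha beta | sort | head -n 1)
echo "first item alphabetically: $first"
```

Everything the checker does later is a combination of these four pieces.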
I want the bash file to start running in a directory (D0) and then check each sub-directory (dn) to see whether there is a .git directory (condition1: .git is absent: 0; .git is present: 1). I should have git initialized for each directory because it is good practice, and tells me that I’m interested in managing this project sensibly. If there is a .git directory, the next thing I want to do is test whether all files are committed (condition2: uncommitted files: 0; all files committed: 1). This leads to three possible outcomes:
c1 | c2 | Result |
---|---|---|
0 | - | There’s no .git folder! |
1 | 0 | Some files are uncommitted. |
1 | 1 | All files are committed. |
Given the size of the parent project folder I also want summary statistics, basically a running tally of each of these possible outcomes. So, I want to report conditions [0,-] and [1,0] to the screen, and provide a tally of each class.
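The table can be read as a small decision function. As a sketch (outcome is a hypothetical helper I am introducing here, not something in the script), the two flags map to the three results like this:

```bash
#!/bin/bash
# outcome C1 C2: map the two condition flags from the table to a result.
# (outcome is a hypothetical helper, not part of the original script.)
outcome() {
  local c1=$1 c2=$2
  if [ "$c1" -eq 0 ]; then
    echo "There's no .git folder!"       # c2 is never looked at here
  elif [ "$c2" -eq 0 ]; then
    echo "Some files are uncommitted."
  else
    echo "All files are committed."
  fi
}

outcome 0 -
outcome 1 0
outcome 1 1
```

Because condition2 is only checked once condition1 is 1, the "-" in the first row never has to be a valid value.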
Program workflow
I’ll break down the code I wrote, but first, here’s the program in its entirety:
#!/bin/bash
# Report the git status of every sub-directory of the current directory.
curdir=$PWD
RED='\033[0;31m'
NC='\033[0m' # No Color
# Counters: all folders, clean, not under git, and with uncommitted work.
cnt=0
good=0
untr=0
uncom=0
for D in */; do
  cnt=$((cnt+1))
  cd "$D" || continue
  # condition1: is there a .git directory at the top level of this folder?
  find -maxdepth 1 -name '\.git' -type d -print -quit | grep '\.git' &> /dev/null
  if [ $? == 0 ]; then
    # condition2: empty --porcelain output means everything is committed.
    if [ -z "$(git status --porcelain)" ]; then
      good=$((good+1))
    else
      printf "${RED}%-30s${NC} contains untracked (or uncommitted) files\n" "$D"
      uncom=$((uncom+1))
    fi
  else
    printf "${RED}%-30s${NC} is not currently tracked with git\n" "$D"
    untr=$((untr+1))
  fi
  cd "$curdir"
done
printf "You currently have:\n * %s clean project folders\n * %s tracked folders awaiting commits\n * %s untracked folders\n" $good $uncom $untr
Setting variables
I set a number of variables at the top of the script. These store the parent directory (the directory from which the bash script is called), assign color codes for printing later (the variables RED and NC), and initialize our counters.
Once these variables are assigned values we move through each directory using the for D in */; do statement. The for loop is built into bash, and the */ pattern is a glob that expands to every sub-directory of the current directory, so D holds each directory name (with its trailing slash) in turn.
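To see what the */ pattern actually matches, here is a small sketch that builds a throwaway folder and loops over it. The folder and file names are invented for the example, and turning on nullglob is my own addition rather than something the script does.

```bash
#!/bin/bash
# Build a throwaway parent folder (names are made up for the example).
parent=$(mktemp -d)
mkdir "$parent/alpha" "$parent/beta dir"
touch "$parent/notes.txt"          # a plain file: */ should skip it

cd "$parent" || exit 1
shopt -s nullglob                  # with no sub-directories, loop zero times
cnt=0
names=""
for D in */; do
  cnt=$((cnt+1))
  names="$names[$D]"
done
shopt -u nullglob
cd - > /dev/null || exit 1

echo "$cnt directories: $names"
rm -rf "$parent"
```

Only the two directories are visited, each with its trailing slash, and the name containing a space arrives intact as long as $D is quoted when used. Hidden directories (names starting with a dot) are not matched by */ under bash's default settings.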
Inside the loop
Inside the loop the script increments cnt, the counter that keeps track of how many directories we currently have, and moves into directory D. In the new directory I test condition1 using find. The statement:
find -maxdepth 1 -name '\.git' -type d -print -quit | grep '\.git' &> /dev/null
looks for a folder called .git in the current directory. It doesn’t look in any sub-folders because we set -maxdepth to 1, and it only looks at folders since we’ve set -type d. The -print flag tells find to print the folder name if it finds the folder, and the -quit flag tells it to stop searching after the first match. If nothing matches, find prints nothing. Since find exits with status 0 whether or not it found anything, we pipe (|) the output to grep and check grep’s exit status instead: $? will be 0 only if the output contained .git. The &> /dev/null at the end throws away anything grep prints. A lot happens in that little line, but basically it checks to see if .git is a folder.
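The reason the pipe to grep is needed can be shown in isolation. This sketch uses an empty .git folder as a stand-in for a real repository, and also shows the shell's built-in [ -d ] test as a shorter alternative to the find and grep pair (has_git_dir is a hypothetical helper name, not part of the script).

```bash
#!/bin/bash
# has_git_dir DIR: succeed if DIR contains a .git directory.
# (hypothetical helper; [ -d ] is the shell's built-in directory test)
has_git_dir() {
  [ -d "$1/.git" ]
}

work=$(mktemp -d)
# An empty .git folder stands in for a real repository here.
mkdir -p "$work/tracked/.git" "$work/plain"

# find alone exits 0 even when it matches nothing...
find "$work/plain" -maxdepth 1 -name .git -type d -print > /dev/null
find_status=$?

# ...so the pipeline's exit status has to come from grep.
find "$work/plain" -maxdepth 1 -name .git -type d -print | grep -q '\.git'
grep_status=$?
echo "find: $find_status, grep: $grep_status"

for d in tracked plain; do
  if has_git_dir "$work/$d"; then
    echo "$d: .git found"
  else
    echo "$d: no .git"
  fi
done
rm -rf "$work"
```

Both approaches only look for a .git directory, so neither says anything about whether the repository inside is healthy.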
If it is a folder (and $? is 0) then we run git status. The -z test in the if statement is true when the string that follows it is empty, which here is the captured output of the command. The git status --porcelain command returns nothing if all files are committed, otherwise it returns the status of untracked and uncommitted files:
$ git status --porcelain
M Baconizing_paper.Rmd
M R/lead_ages.R
?? Baconizing_paper_cache/
?? R/deprecated/
?? R/recalibrate_actual.R
?? figures/lead_binford_mod.svg
?? installLib.sh
?? short_bash.sh
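The script only asks whether this output is empty, but the same text can be split into modified and untracked entries. The sketch below works on a hard-coded sample modelled on the listing above, so it runs without a repository; summarize_porcelain is a hypothetical helper, and I am assuming the short format where untracked paths start with ?? and unstaged modifications with a space followed by M.

```bash
#!/bin/bash
# summarize_porcelain: read `git status --porcelain` text on stdin and
# print "<changed> <untracked>". (hypothetical helper)
summarize_porcelain() {
  local line changed=0 untracked=0
  while IFS= read -r line; do
    [ -z "$line" ] && continue             # skip blank lines
    case "$line" in
      '??'*) untracked=$((untracked+1)) ;; # untracked path
      *)     changed=$((changed+1)) ;;     # any other status code
    esac
  done
  echo "$changed $untracked"
}

# Sample text standing in for real git output.
sample=' M Baconizing_paper.Rmd
 M R/lead_ages.R
?? R/deprecated/
?? installLib.sh'

# The same -z test the script uses: an empty string means a clean folder.
if [ -z "$sample" ]; then
  echo "clean"
else
  read -r changed untracked <<< "$(summarize_porcelain <<< "$sample")"
  echo "$changed changed, $untracked untracked"
fi
```

In a real repository the sample would be replaced by "$(git status --porcelain)".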
If there was no output we increment the good counter (good=$((good+1))), otherwise we move to the else, print output to the screen (printf "${RED}%-30s${NC} contains untracked (or uncommitted) files\n" "$D"), and increment the counter for uncommitted changes (uncom=$((uncom+1))).
printf has some advantages over the simple echo statement. For one, we can be more explicit about how we format the output string. Using the ${RED} and ${NC} variables defined at the top of the script we can color code our output. %s is the standard placeholder for a string in the printf format, and each placeholder is filled in by the arguments that follow the quoted text. We see this again at the bottom of the script. Here we add some more formatting instructions to the %s call. By adding -30 to make %-30s we are telling printf to make the character block occupied by the variable ($D, our directory name) at least 30 characters wide, and to pad with whitespace on the right. This ensures that we get consistent alignment of both our directories (on the one side) and their associated messages.
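A few one-liners show the padding and color behavior on their own. The width of 12 is picked only to keep the output short; the color variables are the same escape codes the script defines.

```bash
#!/bin/bash
RED='\033[0;31m'
NC='\033[0m'

# %-12s left-justifies and pads to 12 characters; %12s right-justifies.
left=$(printf '%-12s|' 'tilia-api/')
right=$(printf '%12s|' 'tilia-api/')
echo "$left"
echo "$right"

# Extra arguments re-use the format string, one pass per argument.
printf '[%s]\n' one two

# The color codes sit in the format string, where printf interprets \033.
printf "${RED}%-12s${NC} needs attention\n" 'tilia-api/'
```

The width is a minimum, not a limit: a directory name longer than the field is printed in full and simply pushes the message to the right.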
Now that we’ve dealt with directories that contain .git folders, we close the inner if/else/fi conditional and, in the else branch of the outer conditional, report on the directories that don’t have .git folders. There’s no need to test the git status, since we know that without a .git folder there’s no git tracking.
Once we close out the outer if/else/fi conditional we are still in the directory we’re checking, so we need to hop back out to our parent directory and get ready to move into the next project directory $D as we go back through the loop, until we finally reach done.
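An alternative to changing back by hand is to do the per-directory work in a subshell, so the cd never affects the loop itself. This is a sketch of that pattern with made-up folder names, not what the script does:

```bash
#!/bin/bash
parent=$(mktemp -d)
mkdir "$parent/one" "$parent/two"
cd "$parent" || exit 1
start=$PWD

visited=0
for D in */; do
  # The parentheses start a subshell; its cd is forgotten when it ends.
  (
    cd "$D" || exit 1
    pwd > /dev/null      # per-directory checks would go here
  )
  visited=$((visited+1))
done

[ "$PWD" = "$start" ] && echo "still in the parent after $visited folders"
cd / && rm -rf "$parent"
```

The trade-off is that variables changed inside the parentheses are lost as well, so counters like good and uncom would have to be updated outside the subshell, for example based on its exit status.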
At the end
Once we’re done we’ll get something printed to the screen that looks a bit like this:
tac_earthcube/                 contains untracked (or uncommitted) files
teststaninstall/               is not currently tracked with git
throughputdb/                  contains untracked (or uncommitted) files
throughput-ec.github.io/       contains untracked (or uncommitted) files
tilia-api/                     contains untracked (or uncommitted) files
Workbooks/                     contains untracked (or uncommitted) files
workflow-paper/                is not currently tracked with git
This is nice, but I’d like to be able to get a quick summary of the results of this analysis, so I want to print out the final status of each counter:
printf "You currently have:\n * %s clean project folders\n * %s tracked folders awaiting commits\n * %s untracked folders\n" $good $uncom $untr
And there you go:
You currently have:
* 21 clean project folders
* 43 tracked folders awaiting commits
* 17 untracked folders
What you’ve learned
Hopefully in this resource you’ve:
- obtained a handy tool to move through a set of project directories to check which directories need some attention to get them cleaned up and committed.
- learned a bit about flow control in bash
- learned how to print in color to the terminal
- learned how messy a person’s project folders can get
You’re welcome to help improve or update this bash file by commenting on a live version of the file on the GitHub gist I’ve made for the checker. Thanks for reading!