Scenario: Your supervisor has asked you to take over a project from a former lab member. It’s a bit of a mess and needs to be cleaned up before publication. Here’s the request you received by email:

Dear NewLabMember,

Thank you for agreeing to pick up the pieces on this project. My postdoc, John Snow, was working on this, but he had to drop it abruptly to deal with some cholera issue in London.

I’m providing everything he gave me in the attached tgz file, but it’s a bit of a mess. It’s a basic case fatality ratio analysis for COVID-19 in South Africa. It looks like there’s a shell script to download the data, but the data are also included, so you may or may not need the script. There are also a few versions of the R code to make the figures. I’ve marked the most complete one as “good”, but maybe that’s not the best adjective for it, since it doesn’t actually run.

I think John was doing this in RStudio and running chunks of code piecemeal, so the likely issue is that everything is there, but not necessarily in the right order. It looks to me like line 20, where the date column is cast as dates, needs to be moved up before any plotting is attempted, but please verify. We’re going to be including this in a paper we’re submitting, so it needs to be in a public repository and organized well enough that reviewers can understand it without much effort. So please do whatever needs to be done to clean this up.

Best wishes, Tom

The file attached to the email can also be downloaded here:

Required steps:

  1. Create a new repository and decide what belongs in it: some of the files? all of them? Set up the repo locally and on a remote service (e.g., GitHub or GitLab).
  2. What problems do you see? Create one or more issues describing them on your remote repository.
  3. Begin solving the organizational problems, committing after each fix and closing the corresponding issue.

Things to consider doing:

  • Add/remove files from repo as appropriate
  • Create a directory hierarchy that reflects the relationships between the files in the projects
  • Set up shared directories with your collaborators for input/output files using e.g. Dropbox
  • Improve documentation
  • Refactor the source code so that it has clearer structure, is more versatile, is easier to modify, etc.
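As one illustration of a directory hierarchy that reflects how the files relate, here is a minimal Python sketch. It assumes a hypothetical layout: the directory names (`data/raw`, `data/processed`, `scripts`, `R`, `figures`) and the `scaffold` function are illustrative choices, not part of the original tarball, and should be adapted to the files you actually find.

```python
"""Sketch: scaffold one possible directory layout for the cleaned-up project.

All directory names below are hypothetical choices, not taken from the
original tarball; adapt them to the files you actually find.
"""
from pathlib import Path

# Hypothetical layout: each key is a directory, each value describes its contents.
LAYOUT = {
    "data/raw": "downloaded input data, never edited by hand",
    "data/processed": "intermediate files generated by the analysis",
    "scripts": "the shell script that downloads the data",
    "R": "the single, working version of the analysis code",
    "figures": "plots produced by the R code",
}


def scaffold(root: Path) -> list:
    """Create every directory in LAYOUT under `root` and return their paths.

    Safe to run more than once: existing directories are left alone and an
    existing README.md is never overwritten.
    """
    created = []
    for rel, _purpose in LAYOUT.items():
        d = root / rel
        d.mkdir(parents=True, exist_ok=True)
        created.append(d)
    # A top-level README tells reviewers what each directory holds.
    readme = root / "README.md"
    if not readme.exists():
        lines = ["# Project layout", ""]
        lines += [f"- `{rel}/`: {purpose}" for rel, purpose in LAYOUT.items()]
        readme.write_text("\n".join(lines) + "\n")
    return created
```

Calling `scaffold(Path("."))` from the repository root creates the directories plus a README listing them; the files from the tarball can then be moved into place and committed.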