SEAMS is about
- Software: useful computer abstractions for transforming inputs into outputs, created by…
- Engineering: an organized, systematic approach to the design and delivery of reliable, flexible, practical, and human-usable systems for…
- Applied: considering empirical data…
- Mathematical: within quantitative, rules-based representations…
- Sciences: that make testable predictions.
These concepts can guide the way you approach a new research problem. Put another way, they can help you understand the problem requirements, use those requirements to create a design that addresses them, and plan the work to execute and validate that design.
The Requirements, Design, and Planning Work Loop
The basis of science is observation, prediction, and experiment. Software might be used in any or all of these steps, and the best practices for those activities translate naturally to any software project. Collecting field data? You need to make unambiguous observations, deal with incomplete measurements, and perhaps even carefully censor certain observations. These translate directly into implementation issues for your project: how to represent measurements (with or without units? standardizing categorical data?), differentiate between nothing observed and not observed, and transform raw data into curated, ready-to-analyze data.
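The Python below is a minimal sketch of those representation choices and is illustrative only. The `Observation` class, its fields, and the helper functions are all hypothetical. The sketch makes three choices:

- Units go in the field name.
- Categorical labels are standardized.
- `None` marks a site that was not surveyed, which keeps it distinct from a survey that recorded zero.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """One field record. Hypothetical schema for illustration only."""
    site: str
    species: str
    # None = site was NOT surveyed (not observed);
    # 0 = site was surveyed and nothing was seen (nothing observed).
    count: Optional[int]
    # Units are made explicit in the name instead of being implied.
    water_temp_celsius: Optional[float] = None


def normalize_species(name: str) -> str:
    """Standardize a categorical label so 'Heron ' and 'heron' match."""
    return name.strip().lower()


def mean_count(observations: List[Observation]) -> Tuple[float, int]:
    """Mean count over surveyed sites only; unsurveyed sites are excluded, not treated as 0."""
    surveyed = [o.count for o in observations if o.count is not None]
    if not surveyed:
        raise ValueError("no surveyed sites: mean is undefined")
    return sum(surveyed) / len(surveyed), len(surveyed)


records = [
    Observation("A", normalize_species("Heron "), 3, 12.5),
    Observation("B", normalize_species("heron"), 0, 11.0),  # surveyed, none seen
    Observation("C", normalize_species("HERON"), None),     # not surveyed
]
mean, n_surveyed = mean_count(records)
print(mean, n_surveyed)  # 1.5 2
```

Treating site C as a zero count would give a mean of 1.0 instead of 1.5. That is the kind of silent error the distinction is meant to prevent.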
A deliberate approach to Design is one way researchers can ensure that their work meets the high standards of Science. Taking the time to make a design does not directly produce the code that solves your problem, so people often ask, “Why bother?” For exploratory work, skipping the design can be fine. But just as a scientist cannot undertake experimental, publishable work without first establishing a protocol (and for clinical trials, this is a legal requirement), approaching a serious programming project without laying it out first will doom the result.
For this topic, we recommend you think of a loop of activities: determining requirements, setting a design that will verifiably satisfy those requirements, and planning how to execute that design.
Requirements
There are a variety of good formal descriptive approaches used for thinking about requirements. None of them are perfect for every combination of people and project! However, usually all of them can provide some insight into thinking about your problem.
In general, developing requirements is about having sufficient constraints (1) to start working on your problem and, later, (2) to tell whether you have actually solved it. You can usually start before (2) is completely finished (at least for a new project; extending a well-established project is another matter), but you aren’t finished until (2) is complete.
For both (1) and (2), you can often identify constraints by posing questions and thinking about their answers. The mnemonic 5WH gives a good general set of questions to start with:
- What: what are the input(s)? the analysis? the output(s)?
- Why: why do the work? to answer a specific question now? a generic question when data becomes available? to simplify or standardize future work?
- Who: who is going to provide input? use the software? see the outputs?
- Where: where does your program run? a personal computer, a supercomputer, over the web, …?
- When: when does the program run? once, so you can publish results? whenever new data becomes available?
- How: how does the software provide the needed capabilities? in a particular language? using a particular library?
These questions can be answered loosely at first (e.g., the input is a csv file). You can then refine the answers as you develop the project and learn more about the detailed requirements (e.g., the input is a filename, provided via a command line argument, that identifies a semicolon-delimited csv file).
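The refined requirement in the second example translates fairly directly into code. Below is a minimal sketch using Python's standard `argparse` and `csv` modules. It assumes the file has a header row, and the function names are hypothetical.

```python
import argparse
import csv
from pathlib import Path


def parse_args(argv=None):
    """Requirement: the input is a filename, provided via a command line argument."""
    parser = argparse.ArgumentParser(description="Analyze a semicolon-delimited csv file.")
    parser.add_argument("input_csv", type=Path, help="path to a semicolon-delimited csv file")
    return parser.parse_args(argv)


def parse_rows(lines):
    """Requirement: the csv is semicolon-delimited (a header row is assumed)."""
    return list(csv.DictReader(lines, delimiter=";"))


def main(argv=None):
    args = parse_args(argv)
    with open(args.input_csv, newline="", encoding="utf-8") as handle:
        rows = parse_rows(handle)
    print(f"read {len(rows)} rows from {args.input_csv}")
    return rows

# In a script, call main() under an `if __name__ == "__main__":` guard,
# e.g. `python analyze.py survey.csv` (file names are hypothetical).
```

Keeping parsing separate from opening the file means the delimiter requirement can be checked on a few lines of text, without touching the filesystem.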
Requirements References
- Wikipedia: Software Requirements, and most particularly the specification section. There’s some business jargon, but the high level concepts (e.g., user stories) may be applicable to your project.
- Assorted other requirements related items: here, here, and here (though this one has annoying pop-ups)
As always: if you find an interesting / useful link or book, please feel free to suggest it to the site by forking, editing this file, and requesting a merge!
Design
With a starting set of requirements, you can record a design for how information will flow (and be transformed) from input(s) to output(s). You can write this flow down using something between prose (often called plain or natural language) and pseudo-code. The key here is communication: you are providing a document that helps three audiences think about what your software does:

- other developers (including future you!);
- users (again, future you!);
- scientists reviewing your work (still including future you!).

This high-level description should be roughly consistent with what you might put in the “Materials & Methods” section of a scientific publication.
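As a hypothetical illustration, a design with four steps (load, clean, analyze, report) might be recorded as a Python skeleton whose top-level function reads much like the prose description. The step names and the `site,count` input format are assumptions made for this sketch.

```python
def load(raw_lines):
    """Step 1: split raw 'site,count' lines into fields."""
    return [line.split(",") for line in raw_lines]


def clean(records):
    """Step 2: drop malformed records (wrong field count or non-numeric count)."""
    return [
        (fields[0].strip(), int(fields[1]))
        for fields in records
        if len(fields) == 2 and fields[1].strip().isdecimal()
    ]


def analyze(cleaned):
    """Step 3: compute the quantity of interest, here the total count per site."""
    totals = {}
    for site, count in cleaned:
        totals[site] = totals.get(site, 0) + count
    return totals


def report(totals):
    """Step 4: format results for people to read."""
    return [f"{site}: {total}" for site, total in sorted(totals.items())]


def pipeline(raw_lines):
    """The top level mirrors the design: one call per high-level step."""
    return report(analyze(clean(load(raw_lines))))


print(pipeline(["A,3", "B,2", "A,1", "bad line", "C,x"]))  # ['A: 4', 'B: 2']
```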
Circling back to requirements, you should have a clear idea which steps are associated with which needs. If there’s a big imbalance (say, most requirements link to a single step), then perhaps you need to revisit your steps. You may instead simply have a high level of detail in part of your requirements and a lot of work left on the rest. Alternatively, if a particular requirement is associated with too many steps, either the requirement lacks detail or your steps aren’t dividing up your problem cleanly. This is more than shuffling words around: it will help you size your code into intellectually and practically digestible chunks.
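One way to make this check concrete is a small traceability table that maps each requirement to the design steps that address it. The sketch below is a hypothetical example: the requirement IDs, step names, and thresholds are all illustrative, not rules.

```python
from collections import Counter

# Hypothetical requirements, each mapped to the design steps that address it.
TRACE = {
    "R1: read a semicolon-delimited csv file": ["load"],
    "R2: skip malformed rows": ["clean"],
    "R3: report total counts per site": ["analyze", "report"],
    "R4: results are reproducible": ["load", "clean", "analyze", "report"],
    "R5: keep unsurveyed sites distinct from zero counts": ["clean"],
}


def requirements_per_step(trace):
    """Count how many requirements link to each step."""
    return Counter(step for steps in trace.values() for step in steps)


def flag_imbalance(trace, max_steps=2, max_share=0.5):
    """Flag requirements spread over too many steps, and steps carrying too many requirements.

    The default thresholds are arbitrary illustrations, not rules.
    """
    broad = [req for req, steps in trace.items() if len(steps) > max_steps]
    overloaded = [
        step for step, n in requirements_per_step(trace).items()
        if n / len(trace) > max_share
    ]
    return broad, overloaded


broad, overloaded = flag_imbalance(TRACE)
print(broad)       # ['R4: results are reproducible']
print(overloaded)  # ['clean']
```

Here R4 touches every step, which suggests it needs more detail: what exactly must be reproducible? Meanwhile, `clean` carries more than half the requirements, which may mean it should be split.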
Your architecture can be used as a skeleton for your project. For example, if you identify seven high-level steps, you can probably organize your code into seven high-level pieces (e.g., seven scripts, seven folders collecting related scripts, or seven top-level make targets). Suppose you try to organize it that way and find that four of those pieces make more sense (whatever the right instantiation is). Then perhaps your high-level process has only four steps. At a minimum, these steps mark the boundaries where you might want to test outputs or store intermediate results. They also represent chunks that might be reusable (or largely replaceable with external libraries).
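Those boundaries can be made explicit in code. The runner below is a hypothetical sketch, not a particular workflow tool. It executes named steps in order and applies an optional check to each step's output. It can also save each intermediate result as JSON so that it can be inspected later.

```python
import json
from pathlib import Path


def run_pipeline(data, steps, checks=None, out_dir=None):
    """Run named steps in order, checking and optionally saving each intermediate result.

    steps:   list of (name, function) pairs, one per high-level design step
    checks:  optional {name: predicate} applied to that step's output
    out_dir: if given, each intermediate is written as JSON (assumes JSON-serializable data)
    """
    checks = checks or {}
    intermediates = {}
    for index, (name, step) in enumerate(steps, start=1):
        data = step(data)
        check = checks.get(name)
        if check is not None and not check(data):
            raise ValueError(f"step {index} ({name}) produced unexpected output: {data!r}")
        intermediates[name] = data
        if out_dir is not None:
            path = Path(out_dir) / f"{index:02d}_{name}.json"
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(data, indent=2))
    return data, intermediates


# Hypothetical three-step process; each step is a boundary we can test.
steps = [
    ("parse", lambda lines: [int(x) for x in lines]),
    ("filter", lambda xs: [x for x in xs if x >= 0]),
    ("summarize", lambda xs: {"n": len(xs), "total": sum(xs)}),
]
checks = {"filter": lambda xs: all(x >= 0 for x in xs)}
result, saved = run_pipeline(["3", "-1", "4"], steps, checks)
print(result)  # {'n': 2, 'total': 7}
```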
Finally, this approach to design is iterative. You can refine your high-level steps, and you can also dig inside each step. Your initial steps correspond to high-level processes, and each refinement adds lower-level detail. Expect at least a few passes of refinement between the highest-level description of your approach and a level that is nearly actual code.
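For example, a single high-level step such as “clean the raw survey records” might go through three passes, ending as small functions. The sub-steps and record fields below are hypothetical.

```python
# Pass 1 (highest level, in the design document):
#   "Clean the raw survey records."
#
# Pass 2 (refined into sub-steps, still nearly prose):
#   standardize site names -> drop records with no site -> mark unsurveyed counts as missing
#
# Pass 3 (close to code): each sub-step becomes a small function.


def standardize_site(record):
    """Trim whitespace and upper-case the site label."""
    return {**record, "site": record["site"].strip().upper()}


def has_site(record):
    """True if the record names a site."""
    return record["site"] != ""


def mark_missing(record):
    """An empty count field means "not observed", kept distinct from a count of 0."""
    count = record["count"].strip()
    return {**record, "count": int(count) if count else None}


def clean(records):
    """The high-level step, now written in terms of its refined sub-steps."""
    standardized = [standardize_site(r) for r in records]
    return [mark_missing(r) for r in standardized if has_site(r)]


raw = [
    {"site": " a ", "count": "3"},
    {"site": "", "count": "1"},   # no site: dropped
    {"site": "b", "count": ""},   # not surveyed: count becomes None
    {"site": "c", "count": "0"},  # surveyed, none seen
]
print(clean(raw))
```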
Design References
- Wikihow: Pseudocode
- Stack Overflow: Coding 102
- LaTeX Packages for Pseudocode: check out the package documentation for examples of various styles
- Wikipedia: Flow Diagrams (check out the for loop)
- (Youtube) Flowcharts and Pseudocode
- Smartdraw (commercial product)
- the Wikipedia page is a thorough overview, though a bit technical. Start with the History section.
- Some introductory pages on design patterns.
Planning
With a project design in mind, researchers can make a plan to accomplish that work. Important scientific work is increasingly collaborative, and modern communication technology enables those collaborations to occur across continents. Similar technology makes unprecedented amounts of computational resources available for research.
However, effective use of these advantages requires careful planning. Software development companies will often adopt formal planning and tracking processes once they reach sufficient size and project complexity. At SEAMS, we are not covering any of those formal processes directly, but rather focusing on the more fundamental concepts such approaches are intended to address.
Part of planning is writing down requirements and architecture. Next is ordering the implementation, scheduling, and if the work is collaborative, figuring out who is covering what part of the project. When executing your design, plan to implement a piece at a time, and to use that piece to produce results. That is, implement the smallest practical bit, verify the code behaves as you intend, and then move on to the next piece. This process of isolation also creates an organization where multiple people can attend to different pieces.
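In Python, the standard `unittest` module is one way to verify a small piece before moving on. The sketch below tests a single hypothetical helper, `parse_count`, which is small enough to finish and check in one sitting.

```python
import unittest


def parse_count(text):
    """Turn one raw count field into an int, or None if not observed.

    Hypothetical helper: a blank field means "not observed",
    while "0" means "observed, none seen".
    """
    text = text.strip()
    if text == "":
        return None
    value = int(text)  # raises ValueError for non-numeric text
    if value < 0:
        raise ValueError(f"count cannot be negative: {value}")
    return value


class TestParseCount(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_count("7"), 7)

    def test_zero_is_not_missing(self):
        self.assertEqual(parse_count("0"), 0)

    def test_blank_is_missing(self):
        self.assertIsNone(parse_count("  "))

    def test_negative_rejected(self):
        with self.assertRaises(ValueError):
            parse_count("-2")

    def test_non_numeric_rejected(self):
        with self.assertRaises(ValueError):
            parse_count("abc")
```

Saved in a file such as `test_counts.py` (a hypothetical name), these tests run with `python -m unittest test_counts`. Once they pass, the next piece can build on `parse_count`, and a collaborator can work on a different piece in parallel.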
Other References
- Wikipedia: Software Design; ignore the arcane vocabulary, and focus on the high level concepts
- An introduction to software testing
- A general guide to testing in Python
- A guide to unittest (PyUnit), …and another
- Using the R package testthat
- Tests for randomness
- Thinking a bit about what testing means.