Wednesday, April 23, 2014

Reflections on Scientific Programming

I am really starting to identify with the curse, "May you live in interesting times." I have spent more than a decade of my life in science, particularly supporting genetics labs' principal investigators (PIs) with computer programming, and I have seen major changes. This blog entry was prompted by a thought about upgrading my R language idioms. Many social media stories describe Hadley Wickham's prodigious output of R-related packages that augment, and in some cases replace, base R language constructs.

At the moment I am trying to finish a study that involves simulated as well as real data. As is typical of scientific programming, I only realize what I really need to do at about the time I finish generating data from analytical experiments and plotting the results. By then it is time to publish, and from there we move on to the next study. We rarely find the time to "sharpen our saws" when we want to start sawing down a new problem and chopping it up. I do not put the blame on PIs, who trust their programmers and technicians to do the right thing.

The problem, like so many in our society, is one of information. As science becomes more transparent, this issue will arise again and again. Transparency in programming is difficult, even when all sources of information are available. How many projects are actually based on reuse rather than cut-and-paste efforts (code surgery)? Code reuse should probably be a subfield of archaeology. Tools and systems are being created to address this issue, but I do not have a great deal of optimism: the systems and their solutions can be just as complex as the problems they are trying to address. Maybe I'm just getting old, but my opinion is that we need to slow the fuck down.

A topic for a different blog post: what are we getting for all this frenetic effort? I have had the feeling for some time that a great deal of science will be exposed as (mostly innocent or accidental) fraud. There is too much previous work to check, while output increases exponentially. I don't want to end with pessimism, so I will say that I do believe most of science is done by responsible and ethical people. The perception in many cases is that science is a cut-and-dried field of exacting work. My experience is that it is mostly trial and error supported by statistics. Again, this is a problem of information, a misunderstanding of what science attempts to do. I love it, but I have serious doubts about whether its efficacy lives up to the common perception.