From Mapping Scientific Software to Automating Science

Last summer, while working on a grant titled Enhancing Scientific Research Productivity with Foundation Models (funded by the Alfred P. Sloan Foundation), I attempted to map the space of scientific software. After a summer of working with a research assistant, with my co-investigator on the grant and his graduate assistant, and after talking with numerous others about the topic, I concluded that this was too ambitious a task.

For one, scientific software is so broad that it really falls into two different classes: specialized scientific software and generalizable scientific software. The former is what we most often associate with scientific software, or at least that seemed to be the case until recently. The latter has traditionally been less commonly associated with scientific software, although it may in fact play a much larger role than is commonly attributed to it.

Think of spreadsheets as a tool for scientific research. Without spreadsheets in the 1980s and 1990s, a great deal of scientific research might have progressed more slowly. These decision-support and data-analysis tools let us analyze numeric data of any type, and some have called them ‘the killer app’ given the breadth and significance of their role in advancing so many different disciplines during those decades.

As noted, the attempt to map the space of scientific software ultimately proved futile because the scope of the objective was too great. Yet this work did lead to some useful insights, which are documented here.


We are now seeing the emergence of new types of generalizable scientific software built on large language models and other foundation models. Examples include applications like Elicit, which uses GPT-4 to accelerate literature review. ChatGPT itself could also be thought of as a very powerful piece of generalizable scientific software.

Really, there are few good examples of generalizable scientific software. The electronic spreadsheet, mentioned earlier, can certainly be thought of as one. I know that I still frequently use electronic spreadsheets for quickly viewing new data for exploratory analysis. Electronic spreadsheets are also still very useful for quickly generating visualizations. However, examples of other easy-to-use general purpose scientific software tools do not easily come to mind.

Possibly one could think of MATLAB or Python (high-level or scripting programming languages with broad suites of packages or libraries that can be applied to a wide range of scientific applications) as generalizable scientific software. Of course, the packages and libraries developed for Python and MATLAB users could also be thought of as specialized scientific software, but collectively they enable much more powerful analysis than the electronic spreadsheet. Similarly, Jupyter notebooks or the data science platform Anaconda might be thought of as generalizable scientific software.

As may be apparent by now, the lines between generalizable and specialized scientific software are, with a few exceptions, very blurred. This is why it was unreasonable to try to create a map of scientific software.


What if we were to start thinking about AI scientific software in terms of the dichotomy of specialized and generalizable scientific software? It might be easy to identify examples of specialized AI scientific software like AlphaFold or BioBERT as well as examples of generalizable AI scientific software like Elicit, ChatGPT, or even GPT-4 (i.e., directly, via API access). There are many more examples of each, but we can focus on a subclass of generalizable AI software, that of agentic wrappers for foundation models.

In case it’s not apparent what I mean by agentic wrappers for foundation models, I am talking about software like BabyAGI or AutoGPT. Tools like these are not only generalizable in being able to enhance the performance of other generalizable AI software, but they are also able to combine specialized scientific software with generalizable scientific software.

At a very simplified and fundamental level, science is a process involving hypotheses and experiments. If generalizable AI scientific software like GPT-4 is able to accurately generate hypotheses, and specialized scientific software like AlphaFold is able to conduct experiments (or to drive other specialized software that conducts them), then agentic wrappers like BabyAGI or AutoGPT could be used to combine the hypothesis generation with the conducting of experiments.
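To make the idea of a wrapper tying hypothesis generation to experimentation more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `run_loop`, `Trial`, `toy_propose`, and `toy_experiment` are hypothetical names I made up, and the two toy functions are stand-ins for a foundation model and a specialized tool. Nothing here calls GPT-4, AlphaFold, BabyAGI, or AutoGPT, and it does not reflect how those tools are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trial:
    """One round of the loop: a hypothesis and whether the experiment supported it."""
    hypothesis: str
    supported: bool


def run_loop(
    propose: Callable[[List[Trial]], Optional[str]],
    experiment: Callable[[str], bool],
    max_rounds: int = 5,
) -> List[Trial]:
    """Hypothetical wrapper: alternate between proposing and testing hypotheses.

    `propose` stands in for a generalizable model (it sees the history so far),
    and `experiment` stands in for a specialized tool that tests one hypothesis.
    """
    history: List[Trial] = []
    for _ in range(max_rounds):
        hypothesis = propose(history)
        if hypothesis is None:  # the proposer has nothing new to suggest
            break
        supported = experiment(hypothesis)
        history.append(Trial(hypothesis, supported))
        if supported:  # stop at the first supported hypothesis
            break
    return history


# Toy stand-ins (assumptions for illustration only, not real tools or data).
CANDIDATES = ["H1", "H2", "H3"]


def toy_propose(history: List[Trial]) -> Optional[str]:
    """Return the first candidate that has not been tested yet, or None."""
    tested = {trial.hypothesis for trial in history}
    for candidate in CANDIDATES:
        if candidate not in tested:
            return candidate
    return None


def toy_experiment(hypothesis: str) -> bool:
    """Pretend experiment: only "H3" is supported in this toy setup."""
    return hypothesis == "H3"


if __name__ == "__main__":
    for trial in run_loop(toy_propose, toy_experiment):
        print(trial.hypothesis, "supported" if trial.supported else "not supported")
```

In this sketch a human could play the role of `run_loop` just as well, reading each result and deciding what to try next, which is the same point made below about wrappers not being strictly necessary.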

While AI alone might not be able to generate new major scientific discoveries in the near future, it is plausible to expect that AI will be able to help humans to automate much of the hypothesis and experimentation process. And agentic wrappers for foundation models are not absolutely necessary. Humans can fill the role of these wrappers, so foundation models alone, with an interactive user interface like ChatGPT, are able to act as productivity tools for scientists. 


Based on the reasoning laid out above, I expect the use of GPT-4 and of the even more powerful foundation models to come, whether through API access or through interactive user interfaces, to become increasingly common as a way of enhancing scientific productivity. The degree to which productivity might be enhanced will vary by discipline, sub-discipline, problem class, etc. These powerful new tools might simply be useful as tools, or they might be able to automate the scientific process entirely.


This research was funded by the Alfred P. Sloan Foundation as part of the Better Software for Science Program.
