Why Big Data Matters to Researchers Now More Than Ever

The following is an adapted excerpt from the report “Top 10 Trends Driving Science,” a look at the social, political, and economic forces affecting researchers in 2017.

Research has always meant data, but never quite like this. Modern research produces experimental data not just from in vitro and in vivo studies, but also from simulation-driven in silico work. The ability to interpret, compare, and contrast data sets is essential.

As the amount of scientific data increases, it becomes imperative to determine how to store it, archive it, and make it accessible to other researchers and the general public. This openness can have enormous benefits. For example, in medicinal chemistry, data analysis can aid in decision-making for drug discovery research.

Three pressures drive this need: parallel synthesis, the increasingly large and complex analytical and biological data sets associated with each new chemical entity, and the requirement to integrate publicly available information, including patent literature, into the design process. If other researchers can easily access this information, it can speed up future research and help avoid duplication.

Managing the data is a complex process, however, and it can require specialized tools. Dealing with large data sets isn’t just about file size. It may also mean keeping up with evolving data sets or handling a combination of structured and unstructured data. Researchers also need the means to detect errors hidden in rows and rows of tables. These skills may fall outside the traditional training of many scientists. Even so, in 2016 every researcher needs at least some of the skills of a data scientist.
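As a rough illustration of what detecting errors hidden in rows of tables can look like in practice, here is a minimal Python sketch. It is not taken from the report: the column names (compound_id, ic50_nm), the sample values, and the helper find_suspect_rows are all hypothetical, and the three checks shown (missing values, non-numeric entries, exact duplicate rows) are only a starting point.

```python
import csv
import io


def find_suspect_rows(csv_text, numeric_columns):
    """Return (line_number, reason) pairs for rows that look wrong.

    Assumes one record per line, no blank lines, and a header on line 1.
    """
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 holds the header.
    for line_number, row in enumerate(reader, start=2):
        # Flag rows that repeat an earlier row exactly.
        key = tuple(str(value) for value in row.values())
        if key in seen:
            problems.append((line_number, "duplicate row"))
        seen.add(key)
        # Flag empty or non-numeric entries in columns that should hold numbers.
        for column in numeric_columns:
            value = (row.get(column) or "").strip()
            if value == "":
                problems.append((line_number, f"missing {column}"))
                continue
            try:
                float(value)
            except ValueError:
                problems.append((line_number, f"non-numeric {column}: {value!r}"))
    return problems


# Hypothetical sample data: one blank value, one typo (letter O instead of
# zero), and one repeated row.
SAMPLE = """compound_id,ic50_nm
C-001,12.5
C-002,
C-003,1O.2
C-001,12.5
"""

for line_number, reason in find_suspect_rows(SAMPLE, ["ic50_nm"]):
    print(f"line {line_number}: {reason}")
```

Checks like these only catch mechanical errors. Values that are numeric but implausible still need domain knowledge, or range checks chosen by someone who knows the assay.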

Using existing data sets makes sense. It can save researchers time and money and, when applied to their own work, can potentially aid in research decision-making. “Potentially” is the key word here, because data is useless if the research community cannot learn how to harness it effectively. While researchers are great at producing data, they need better data management infrastructure, systems, and training. Organizations also need to find standardized ways of structuring data to allow researchers to use it more efficiently. According to one estimate, 40% of all research and design experiments are duplicated effort brought on by inefficient design or a lack of information technology resources.

“Conversion of massive amounts of chemical and biological data into cogent insights is becoming a significant area of opportunity. With ever-advancing sensor capabilities and the expanding power of computers to generate computational data, the need to understand massive data sets will drive a lot of scientific endeavors and will offer exciting opportunities to advance the chemical and biological sciences,” says Kenneth Merz, Editor-in-Chief of the Journal of Chemical Information and Modeling.

This optimism is shared by many others, including Jonathan Sweedler, Editor-in-Chief of Analytical Chemistry. “This is a golden age of measurement science. From the evolving challenges of environmental monitoring to following the chemical intricacies occurring in our brains that give rise to consciousness, analytical chemistry-related grand challenges are capturing national attention and becoming national research priorities.”

This rise in measurement science extends to all industries. And chemists are at the heart of many of these advancements. “Chemists have been central to these developments and will exploit genetic information in new and unexpected ways going forward. Links between genetic changes and their resultant human diseases increasingly will be understood in molecular terms, and new treatments and preventive strategies will emerge,” says Carolyn Bertozzi, Editor-in-Chief of ACS Central Science.

How can we make it easier for researchers to keep making data-fueled discoveries? One answer is education. Scientists of all types need a solid grounding in data management. They should be able to recognize sources of relevant information, prepare raw data, use statistical tools, extract meaningful information, interpret results, recognize potential problems, and make visualizations to convey their findings. That kind of training is especially important for reviewers and editors, who need to be able to spot false positives and statistical manipulation to prevent spurious studies from gaining traction.

Scientists have more information than ever at their disposal. But to do any good, they have to know how to use it.


If you have comments or questions for the author of this post, please e-mail: Axial@acs.org.