Lora Fleming, Anthony Kessel, Virginia Murray, Michael Depledge, Sabina Leonelli, Niccolò Tempini, Harriet Gordon-Brown, Gordon Nichols, Christophe Sarran, Paolo Vineis, Giovanni Leonardi, Brian Golding, and Andy Haines
This is an advance summary of a forthcoming article in the Oxford Research Encyclopedia of Environmental Science. Please check back later for the full article.
Over the past few decades, thanks to the rapid expansion of computer technology, there has been a growing appreciation for the potential of big data in environment and health research. Big data refers to large, complex, potentially linkable data from diverse sources, ranging from the genome and social media, to individual health information and the contributions of citizen science monitoring, to large-scale, long-term oceanographic and climate modeling, all brought together in innovative and integrated data mashups. The promise of big data mashups in environment and human health includes the ability to truly explore and understand the “wicked environment and health problems” of the 21st century, from tracking the global spread of the Zika and Ebola virus epidemics to modeling future adaptation to climate change at the city or national level. Other opportunities include identifying environment and health hot spots, where innovative interventions can be evaluated to prevent, mitigate, or adapt to climate and other environmental changes over the long term, and locating and filling gaps in existing knowledge of relevant factors. At the individual level, there is the potential for individual control of personal data, benefits to health and the environment from smart homes and cities, and opportunities to contribute to citizen science research and to share information locally and globally.
At the same time, there are challenges inherent in data mashups, particularly in the environment and human health arena. Environment and health represent very diverse disciplinary areas with different research cultures, ethos, languages, and expertise. Equally diverse are the types of data involved (including time and space scales, and different types of modeled data). Because data types are mostly shaped by the needs of the communities where they originated and have been used the most, there is often as yet no standardization that would allow easy linkage beyond time and space variables. Research communities in the health and environmental sciences also differ widely in how they approach analysis and statistical and mathematical modeling, and there is a lack of trained personnel who can span these interdisciplinary divides and who have the necessary expertise in the techniques that make such bridging possible, such as software development, big data management and storage, and data analysis. Health data in particular pose unique challenges due to the need to maintain confidentiality for the individuals being studied, to evaluate the implications of shared information for the communities affected by research, and to resolve the long-standing issues of intellectual property and ownership occurring throughout the environment and health fields. As with other areas of big data, a new “digital divide” is growing, in which some researchers and research groups have access to data and computing resources while others do not, although crowdsourcing, citizen science, and communities of science offer some alternative approaches.
Finally, funding, especially funding aimed at encouraging the sustainability and accessibility of big data resources (from personnel to hardware), is currently inadequate. There is widespread disagreement over which business models can support long-term maintenance of data infrastructures, and existing models are often inadequate to deal with the complexity and resource intensity of infrastructure development.
Nevertheless, researchers, policy makers, funders, governments, the media, and even the general public recognize the potential of big data for innovation and creativity in environment, health, and many other areas. There is an urgent need to explore the challenges, opportunities, and potential of big data mashup applications for environment and human health research.