Lora Fleming, Anthony Kessel, Virginia Murray, Michael Depledge, Sabina Leonelli, Niccolò Tempini, Harriet Gordon-Brown, Gordon Nichols, Christophe Sarran, Paolo Vineis, Giovanni Leonardi, Brian Golding, and Andy Haines
Big data refers to large, complex, potentially linkable data from diverse sources, ranging from the genome and social media, through individual health information and citizen science monitoring, to large-scale, long-term oceanographic and climate modeling, together with the processing of these data in innovative, integrated “data mashups.” Over the past few decades, thanks to the rapid expansion of computer technology, there has been a growing appreciation of the potential of big data in environment and human health research.
The promise of big data mashups in environment and human health includes the ability to explore and understand the “wicked environment and health problems” of the 21st century, from tracking the global spread of the Zika and Ebola virus epidemics to modeling future climate change impacts and adaptation at the city or national level. Other opportunities include identifying environment and health hot spots (i.e., locations where people and/or places are at particular risk), where innovative interventions can be designed and evaluated to prevent or adapt to climate and other environmental change over the long term, with potential (co-)benefits for health; and locating and filling gaps in existing knowledge of the linkages between environmental change and human health. There is also the potential for individuals to gain greater control over their personal data (both access to these data and their generation), for benefits to health and the environment (e.g., from smart homes and cities), and for people to contribute to research through citizen science and to share information locally and globally.
At the same time, there are challenges inherent in big data and data mashups, particularly in the environment and human health arena. Environment and health represent very diverse scientific areas with different research cultures, ethos, languages, and expertise. Equally diverse are the types of data involved (including their temporal and spatial scales, and different types of modeled data), often with no standardization that would allow easy linkage beyond time and space variables, because data types are mostly shaped by the needs of the communities in which they originated and have been used. Furthermore, these “secondary data” (i.e., data re-used in research) were often not originally collected for research at all, a distinction that is particularly relevant to the re-use of routine health data. In addition, the health and environmental science communities approach data analysis and synthesis, as well as statistical and mathematical modeling, in widely different ways.
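Because time and place are often the only variables that health and environmental datasets share, a data mashup frequently reduces to joining records on those two keys. The sketch below shows such a join in plain Python. The record types, field names (`day`, `region`, `cases`, `mean_temp_c`), and the toy values are hypothetical and purely illustrative; real linkage usually also has to reconcile mismatched spatial units and temporal resolutions, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record types: one from a health source, one from an
# environmental source. Only the (day, region) pair is shared.
@dataclass(frozen=True)
class HealthRecord:
    day: date
    region: str
    cases: int

@dataclass(frozen=True)
class EnvRecord:
    day: date
    region: str
    mean_temp_c: float

def link_on_time_and_space(health, env):
    """Inner join of health and environmental records on (day, region).

    Health records with no matching environmental observation are dropped.
    If the environmental data contain duplicate keys, the last one wins.
    """
    env_index = {(e.day, e.region): e for e in env}
    linked = []
    for h in health:
        e = env_index.get((h.day, h.region))
        if e is not None:
            linked.append((h.day, h.region, h.cases, e.mean_temp_c))
    return linked

# Toy, made-up records for two hypothetical regions, A and B.
health = [
    HealthRecord(date(2016, 7, 1), "A", 12),
    HealthRecord(date(2016, 7, 1), "B", 3),
    HealthRecord(date(2016, 7, 2), "A", 15),
]
env = [
    EnvRecord(date(2016, 7, 1), "A", 24.5),
    EnvRecord(date(2016, 7, 2), "A", 26.0),
    EnvRecord(date(2016, 7, 2), "B", 22.1),
]
linked = link_on_time_and_space(health, env)
for row in linked:
    print(row)
```

Here the region B health record for 1 July is silently lost because no temperature was recorded for that day and place, a small instance of the linkage gaps described above.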
There is a lack of trained personnel who can span these interdisciplinary divides or who have expertise in the techniques that make such bridging possible, such as software development, big data management and storage, and data analysis. Moreover, health data pose unique challenges: the need to maintain confidentiality and data privacy for the individuals or groups being studied, to evaluate the implications of shared information for the communities affected by research and big data, and to resolve the long-standing issues of intellectual property and data ownership that run throughout the environment and health fields. As in other areas of big data, a new “digital data divide” is growing, in which some researchers and research groups, or corporations and governments, have access to data and computing resources while others do not, even as citizen participation in research initiatives is increasing. Finally, with the exception of some business-related activities, funding is currently inadequate, especially funding aimed at ensuring the sustainability and accessibility of big data resources (from personnel to hardware); there is widespread disagreement over which business models can support the long-term maintenance of data infrastructures, and those that exist now are often unable to cope with the complex and resource-intensive work of maintaining and updating these tools.
Nevertheless, researchers, policy makers, funders, governments, the media, and members of the general public increasingly recognize the potential of big data for innovation and creativity in environment and health and in many other areas. This recognition is reflected in the way the relatively new but powerful Open Data movement is being crystallized into science policy and funding guidelines. Some of the challenges and opportunities of big data and big data mashups for environment and human health research are discussed, along with salient examples of their application.
Giovanni Lo Iacono and Gordon L. Nichols
The introduction of pasteurization, antibiotics, and vaccination, together with improved sanitation, hygiene, and education, was critical in reducing the burden of infectious diseases and associated mortality during the 19th and 20th centuries, and was driven by an improved understanding of disease transmission. These advances led to longer average lifespans and to the expectation that, at least in the developed world, infectious diseases were a problem of the past. Unfortunately, this is not the case: infectious diseases still have a significant impact on morbidity and mortality worldwide. Moreover, the world is witnessing the emergence of new pathogens, the reemergence of old ones, and the spread of antibiotic resistance. Effective control of infectious diseases is further challenged by many factors, including natural disasters, extreme weather, poverty, international trade and travel, mass and seasonal migration, rural–urban encroachment, human demographics and behavior, the conversion of forest to farmland, and climate change.
The importance of environmental factors as drivers of disease has been hypothesized since ancient times, and until the late 19th century miasma theory (i.e., the belief that diseases were caused by noxious exhalations from unhealthy environments, originating from decaying organic matter) was a dominant scientific paradigm. This thinking changed with the advent of microbiology, when scientists correctly identified microscopic living organisms as the pathogenic agents and developed evidence for their routes of transmission. Still, many complex disease patterns cannot be explained by microbiology alone, and it is becoming increasingly clear that an understanding of the ecology of the pathogen, the host, and any vectors is required.
There is increasing evidence that the environment, including climate, can affect pathogen abundance, survival, and virulence, as well as host susceptibility to infection. Measuring and predicting the impact of the environment on infectious diseases, however, can be extremely challenging. Mathematical modeling is a powerful tool for elucidating the mechanisms that link environmental factors and infectious diseases, and for disentangling their individual effects. A common approach in epidemiology consists of partitioning the population of interest into relevant epidemiological compartments, typically individuals who have not been infected but can be (susceptible), infected individuals, and individuals who have cleared the infection and become immune (recovered). The typical task is to model the transitions from one compartment to another and to estimate how the sizes of these compartments change over time.
There are different ways to incorporate the impact of the environment into this class of models; water-borne and vector-borne diseases are two instructive examples. For water-borne diseases, the environment can be represented by an additional compartment describing the dynamics of the pathogen population in the environment, for example by modeling the concentration of bacteria in a water reservoir (with potential dependence on temperature, pH, etc.). For vector-borne diseases, the impact of the environment can be incorporated through explicit relationships between temperature and key vector parameters, such as mortality, development rate, and biting rate, as well as the time required for the pathogen to develop within the vector (the extrinsic incubation period).
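The compartmental approach described above can be sketched in a few dozen lines of Python. The sketch couples a susceptible-infected-recovered (SIR) model to an environmental compartment B, the pathogen concentration in a water reservoir: infected people shed pathogens into the reservoir, and susceptible people are infected both directly and through contact with contaminated water. Several choices here are assumptions for illustration, not the authors' method: the saturating exposure term B/(kappa + B) (a form often used for cholera-like water-borne pathogens), a Q10-type rule making pathogen decay temperature dependent, and a Briere thermal-response curve as one commonly used shape for a vector trait such as biting rate. All names (`Params`, `decay_rate`, `simulate`, `briere`) and all parameter values are hypothetical and uncalibrated.

```python
import math
from dataclasses import dataclass

@dataclass
class Params:
    # All values are hypothetical placeholders, not fitted to any pathogen.
    beta_i: float = 0.2      # direct (person-to-person) transmission rate, per day
    beta_b: float = 0.3      # environmental (water-borne) transmission rate, per day
    kappa: float = 1e5       # half-saturation pathogen concentration
    gamma: float = 0.1       # recovery rate, per day
    xi: float = 10.0         # shedding into the reservoir per infected person per day
    delta_ref: float = 0.33  # pathogen decay rate at the reference temperature, per day
    t_ref: float = 20.0      # reference temperature, deg C
    q10: float = 2.0         # factor by which decay changes per 10 deg C of warming

def decay_rate(p, temp_c):
    """Assumed Q10-type temperature dependence of pathogen decay in water."""
    return p.delta_ref * p.q10 ** ((temp_c - p.t_ref) / 10.0)

def derivatives(state, p, temp_c):
    """Right-hand side of the SIR model with an environmental compartment B."""
    s, i, r, b = state
    n = s + i + r
    force = p.beta_i * i / n + p.beta_b * b / (p.kappa + b)  # force of infection
    new_inf = force * s
    return (
        -new_inf,                                  # dS/dt
        new_inf - p.gamma * i,                     # dI/dt
        p.gamma * i,                               # dR/dt
        p.xi * i - decay_rate(p, temp_c) * b,      # dB/dt
    )

def rk4_step(state, p, temp_c, dt):
    """One classical fourth-order Runge-Kutta step, temperature held fixed."""
    def shifted(base, k, h):
        return tuple(x + h * dx for x, dx in zip(base, k))
    k1 = derivatives(state, p, temp_c)
    k2 = derivatives(shifted(state, k1, dt / 2), p, temp_c)
    k3 = derivatives(shifted(state, k2, dt / 2), p, temp_c)
    k4 = derivatives(shifted(state, k3, dt), p, temp_c)
    return tuple(
        x + dt / 6 * (a + 2 * b2 + 2 * c + d)
        for x, a, b2, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(temp_of_day, days=200, dt=0.1, n=10_000, i0=10, p=None):
    """Return daily (S, I, R, B) states; temp_of_day maps day index to deg C."""
    if p is None:
        p = Params()
    state = (float(n - i0), float(i0), 0.0, 0.0)
    steps_per_day = round(1 / dt)
    out = [state]
    for day in range(days):
        temp = temp_of_day(day)
        for _ in range(steps_per_day):
            state = rk4_step(state, p, temp, dt)
        out.append(state)
    return out

def briere(temp_c, c, t_min, t_max):
    """Briere thermal-response curve, zero outside (t_min, t_max).

    One common functional form for temperature-dependent vector traits
    such as biting rate; c, t_min, and t_max must be estimated per vector.
    """
    if temp_c <= t_min or temp_c >= t_max:
        return 0.0
    return c * temp_c * (temp_c - t_min) * math.sqrt(t_max - temp_c)
```

For example, `simulate(lambda d: 20 + 8 * math.sin(2 * math.pi * d / 365))` drives pathogen survival with a seasonal temperature cycle, and comparing runs at constant cold and warm temperatures shows how faster decay at higher temperature lowers the peak pathogen load in this toy model. In a vector-borne model, `briere` would replace a constant biting rate with a temperature-dependent one in the same way.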
Despite tremendous advances, understanding and mapping the impact of the environment on infectious diseases is still a work in progress. Some fundamental questions, for instance the impact of biodiversity on disease prevalence, are still a matter of (occasionally fierce) debate. Research on the connections between infectious diseases and the environment faces other important challenges, including studying the evolution of pathogens in response to climate and other environmental changes; disentangling multiple transmission pathways and the associated temporal lags; developing quantitative frameworks to study the potential effects of anthropogenic climate change on infectious diseases; and investigating the effect of seasonality. Ultimately, there is an increasing need to develop models for a truly “One Health” approach, that is, an integrated, holistic approach to understanding the intersections between disease dynamics, environmental drivers, economic systems, and veterinary, ecological, and public health responses.