Lessons learned from working with real-life data in resource-constrained settings with limited domain knowledge

Each day, 1.8 billion people drink water contaminated with faeces. Better methods for detecting faecal matter are crucial for identifying the causes of water contamination, communicating risks to users more effectively, and developing adequate solutions, especially for sanitation infrastructure.

Over two months in summer 2018, as part of the AfriWatSan project, I collected water samples in the Greater Dakar region and tested them for faecal contamination using two methods: traditional culture-based methods, and a new real-time, fluorescence-based method, tryptophan-like fluorescence (TLF). TLF offers several key advantages over traditional methods: it is portable, gives results in real time and is easy to use. The initial goal of my research was to follow up on previous field studies in which TLF produced false positives, and I hoped to use geostatistical methods to identify their causes.

In total, 97 samples were collected, each with 48 parameters recorded, including hydrochemical parameters and environmental risk factors. But we quickly realised that TLF was producing very strange results, and that no correlation could be observed between TLF readings and the actual levels of contamination shown by the culture-based method.

In this talk, I’d like to walk the audience through some of the challenges of working with real-life data in resource-constrained settings and with limited prior domain knowledge. I’ll share ups and downs, lessons learnt, and some reflections on what it means to maintain intellectual honesty under strong pressure to produce the kind of results that get published and recognised in academia.

In terms of methods, I’ll cover the many interesting insights derived from exploratory data analysis and from simple correlation matrices. I’ll explain why Dakar’s groundwater pollution was so extreme that TLF was not “working” there. I’ll then explain the geospatial methods I used and what I could learn from a simple hierarchical clustering analysis.
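As a rough illustration of the "simple correlation matrix" step, the sketch below computes a pairwise Spearman rank correlation matrix in plain Python. It is an assumption-laden example, not the analysis used in the study: the choice of Spearman (rank-based, more robust to the heavily skewed counts typical of culture-based results) is mine, and the column names and values in the usage example are hypothetical toy data, not field measurements.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant column")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(columns):
    """Pairwise Spearman correlations as a dict of dicts keyed by column name."""
    names = list(columns)
    return {a: {b: spearman(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical toy values for illustration only (not study data).
    samples = {
        "tlf": [1.2, 3.4, 2.2, 5.0, 4.1],
        "culture_count": [0, 120, 15, 30, 900],
        "conductivity": [400, 950, 610, 1200, 1100],
    }
    for name, row in correlation_matrix(samples).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

With real data, a matrix like this makes it quick to check whether TLF tracks the culture-based counts at all, or whether it instead follows other hydrochemical parameters.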

Finally, I’ll discuss the implications of this study, how its results compare with samples collected in rural Uganda and Kenya, and what next steps are needed to better understand Dakar’s groundwater pollution (in particular, reconciling vertical flow models with our understanding of horizontal groundwater movement).