Jorre’s Profile

Reproducing a Study

2025-05-20T00:00:00+00:00

I recently finished the final project for my Open GIScience class, an attempt to reproduce the work by Charles W. Sterling III, Connections Between Present-Day Water Access and Historical Redlining (doi:10.1089/env.2022.0115). The study used redlining polygons from the Mapping Inequality database to determine whether historic redlining correlates with modern-day access to water, measured through census plumbing data. The study found positive relationships between lower Home Owners’ Loan Corporation (HOLC) scores and higher rates of incomplete plumbing. While this is certainly the case across the United States, the study has many problems stemming from the modifiable areal unit problem (MAUP). The HOLC polygons cover only the cities as they existed in 1920, which largely correspond to just the downtown areas of today. Additionally, the demographics of each polygon have changed over time. Using the HOLC zone as the spatial unit means that tract data can be matched to a zone in a way that does not reflect the actual demographics of each neighborhood.
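To make the matching problem concrete, here is a minimal Python sketch of areal weighting, one common way to move tract counts onto a zone. It is an illustration only: I am not claiming this is the exact procedure in the original study, and every tract id, area, and count below is made up.

```python
# Hypothetical overlaps between one HOLC zone and the census tracts it touches.
# Each entry: (tract id, tract area, area of the tract inside the zone,
#              households in the tract, households lacking complete plumbing).
# All values are invented for illustration.
overlaps = [
    ("tract_A", 4.0, 1.0, 1200, 24),
    ("tract_B", 2.0, 2.0, 800, 40),
    ("tract_C", 10.0, 0.5, 3000, 15),
]


def areal_weighted_rate(overlaps):
    """Estimate a zone's incomplete-plumbing rate by areal weighting.

    Each tract contributes households in proportion to the share of its
    area that falls inside the zone. This assumes households are spread
    evenly across the tract, which is rarely true.
    """
    households = 0.0
    lacking = 0.0
    for _tract, tract_area, overlap_area, hh, hh_lacking in overlaps:
        share = overlap_area / tract_area
        households += share * hh
        lacking += share * hh_lacking
    return lacking / households


zone_rate = areal_weighted_rate(overlaps)
print(f"Estimated incomplete plumbing rate: {zone_rate:.4f}")
```

The even-spread assumption is where the mismatch comes from: a large tract that barely touches a zone still hands over a slice of its households, whether or not anyone in that slice actually lives inside the zone.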

Our reproduction ran into many issues along the way, including handling the large datasets the project required and following procedures that the original study did not describe clearly. Instead of switching between multiple programs (R, Python, ArcGIS), we decided to carry out the whole analysis plan in R, which led to our major problems. The foremost of these involved the geometry of the HOLC polygons, which produced many errors in the sf package. As a result, some HOLC zones were assigned drastically different demographic statistics in some areas. Ultimately, our reproduction reached the same conclusion relating lower HOLC scores to higher rates of incomplete plumbing, but other statistics remained different.
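As a rough idea of what a geometry check looks for, the sketch below tests a single polygon ring for a few basic problems. It is written in plain Python for illustration and is not the code from our project; in R, the sf package provides st_is_valid() and st_make_valid() for this kind of work. The function name and the sample rings are hypothetical, and the check does not catch self-intersections, which need a proper geometry library.

```python
def ring_problems(ring):
    """Return a list of basic problems found in one polygon ring.

    `ring` is a list of (x, y) tuples. Only simple checks are made:
    point count, closure, and zero area.
    """
    problems = []
    if len(ring) < 4:
        problems.append("fewer than 4 points")
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed")
    # Shoelace formula: this sum is twice the signed area of the ring.
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    if twice_area == 0:
        problems.append("zero area")
    return problems


# Made-up rings: a valid square, a ring missing its closing point,
# and a ring whose points all lie on one line.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
unclosed = [(0, 0), (1, 0), (1, 1), (0, 1)]
sliver = [(0, 0), (1, 0), (2, 0), (0, 0)]

for name, ring in [("square", square), ("unclosed", unclosed), ("sliver", sliver)]:
    print(name, ring_problems(ring))
```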

If you want to see an html summary of this project, click this link.

Reproducing a Study

2025-04-07T00:00:00+00:00

I recently started a project in my Open GIScience class that attempts to reproduce the work by Jayajit Chakraborty, Social inequities in the distribution of COVID-19: An intra-categorical analysis of people with disabilities in the U.S. (doi:10.1016/j.dhjo.2020.101007). The study used early COVID-19 data to look for correlation between rates of disability and COVID-19 infection rates in US counties during the early stages of the pandemic. In my reproduction, I found similar but not identical results. My reproduction found the same significant relationships: COVID-19 rates were higher in areas with high rates of disability among marginalized groups (Black, Hispanic, below poverty level, and female). However, there were discrepancies in the final results. Notably, in the categories of Black, non-Hispanic non-white, below-poverty, and female disability rates, the coefficient values were higher in my study than in the original.

This reproduction study was difficult for many reasons. For one, I could not use SaTScan, the proprietary software used by Chakraborty, because I wanted the project to be fully open source. Although some of the key formulas in my R script come from SpatialEpi instead, the overall GEE clustering method should be the same. However, because of differences in parameters between the two functions, I ended up with different results.
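For readers curious about what the cluster detection computes, SaTScan and the kulldorff() function in SpatialEpi both implement Kulldorff's spatial scan statistic. The Python sketch below shows the Poisson log-likelihood ratio for a single candidate cluster. It assumes the Poisson model and made-up counts, and it leaves out the search over candidate zones and the Monte Carlo significance test, so it is an illustration of the core formula, not the code from my R script.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Poisson log-likelihood ratio for one candidate cluster.

    cases_in    : observed cases inside the candidate zone
    expected_in : cases expected inside the zone under the null hypothesis,
                  scaled so expected counts over the whole study area sum
                  to total_cases (assumed to be > 0 and < total_cases)
    total_cases : observed cases in the whole study area
    """
    # Only zones with more cases than expected count as high-risk clusters.
    if cases_in <= expected_in:
        return 0.0
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr


# Hypothetical numbers: a zone holding 20% of the population at risk
# but 30 of the 100 observed cases.
total_cases = 100
expected_in = total_cases * 0.20
print(poisson_llr(30, expected_in, total_cases))
```

The zone with the largest ratio is the most likely cluster. Settings such as the maximum share of the population a zone may contain change which zones are even considered, which is one way two implementations can disagree.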

If you want to see an html summary of this project, click this link.

Starting Projects with a Compendium and Analysis Plan

2025-02-24T00:00:00+00:00

I recently started a project in my Open GIScience class studying the fairness of the new congressional districts drawn in 2023 for Alabama. In this project I am using the HEGSRR-Template for my GitHub repository to document the work accurately for open-source publishing and reproducibility. Currently, many of the challenges come from the tedious nature of documenting every step of the project. Instead of just posting a workflow, I write metadata and descriptions for each data source, as well as an explanation of the biases in each step of the process. This makes it easy to miss certain parts of the documentation or to write too sparsely about them, and I’m still learning the process.

On the other hand, this method will let me approach the actual coding with a clearer understanding of what needs to be done. In this way, time spent earlier is time saved later on. I think that planning my code beforehand will get me to publishable results more quickly. Additionally, this method will make it easier for others to see my work and note where I have written too little or nothing at all, and where my code may be making a mistake.

If you want to see an html summary of this project, click this link.

Is GIScience Reproducible?

2025-02-16T00:00:00+00:00

Based on my personal experience with Geography and Spatial Data Science, Reproducibility and Replicability have been incredibly important. In GIScience, as in any other science, knowledge is built on the work of others in the field. When writing new scripts, workflows, or tests, it’s important that published work is both open source and reproducible by any other reader. I think that making a workflow that works under a variety of conditions is critical both to verifying one’s own work and to inspiring and starting new studies. Reproducible work can become the basis of new tools in GIScience. That being said, there are conditions inherent to GIScience that I think may not work well with replicability. For image processing on any satellite data, location is extremely important. In this sense, the world could be considered capricious. Still, there are many explanations for why image processing for mountains in Siberia may not yield the same results as image processing for the Andes. For cases like this, localized training of image processing models and scripts is necessary, as image processing without it might gloss over the complexities of physical geography.

I think that what challenges Reproducibility and Replicability in GIScience most is a lack of open-sourcing of one’s knowledge. A proprietary hold over methods in spatial data analysis leads to a lack of trust in any given results, and also slows the progress that can be made in the discipline. According to the opinions of researchers in GIScience, over 28% do not think that reproducibility is important to them. Additionally, over 46% of researchers do not feel encouraged to write replication studies themselves because of pressure to write original research. These pressures can lead researchers to believe that Reproducibility and Replicability are not important, leading to their decline.

References

NASEM (National Academies of Sciences, Engineering, and Medicine). 2019. Reproducibility and Replicability in Science. Washington, D.C.: National Academies Press. doi:10.17226/25303.
Holler, Joseph, Yifei Luo, Peter Kedron, and Sarah Bardin. 2023. “Reproducibility Survey Data Visualization.” OSF. August 15. doi:10.17605/OSF.IO/B47XU.
Holler, Joseph, Yifei Luo, Peter Kedron, and Sarah Bardin. 2023. “Replicability Survey Data Visualization.” OSF. August 15. doi:10.17605/OSF.IO/KUCHA.