Lucas Nerbonne’s Portfolio

Sterling Reproduction Study

2025-05-20T00:00:00+00:00

For my third and final project of this semester’s OpenGIScience course, my classmate Jorre Dahl and I undertook a reproduction of Charles Sterling’s 2023 study titled “Connections between present-day water access and historical redlining”. Over the course of this reproduction we wrangled huge census datasets, dealt with broken geometries throughout datasets, and reproduced Sterling’s regression analysis on water access in redlined neighborhoods. I learned a lot about both the application of and structure behind geographic methods and am extremely proud of our result.

View the final report here

Or view the GitHub repository here

This project contributes a well-documented, open-source series of research methods that conceivably could be applied to many different questions pertaining to the historical effects of redlining. The workflow is made accessible through full data and code transparency and by limiting implementation to one language/program: we found ways to conduct in just R an analysis that Sterling did in R, ArcGIS, and Python.

References

Sterling III, Charles W., et al. “Connections between present-day water access and historical redlining.” Environmental Justice (2023). DOI:10.1089/env.2022.0115

Jay Chakraborty Reproduction Project

2025-05-11T00:00:00+00:00

For my next project for OpenGIScience, I undertook a partial reproduction of Jayajit Chakraborty’s 2021 study Social inequities in the distribution of COVID-19: An intra-categorical analysis of people with disabilities in the U.S., which sought to investigate the connection between county-level Covid 19 incidence rates and the prevalence of a variety of demographic subgroups. The author was specifically interested in the correlation between disability rate and Covid cases, something that would be interesting for both public health officials and policymakers.

This project gave me the opportunity to dive into the nuts and bolts of epidemiological clustering functions through the author’s choice of using clusters of high incidence counties as a spatial control, with interesting results that I’m still working through.

If you want to check out the project results they’re here: https://lnerbonne/Covid_19_Clustering_Original/blob/main/docs/report/Final_Analysis_and_Report.html

The GitHub repo for the project can be found here: https://github.com/lnerbonne/Covid_19_Clustering_Original

References

Chakraborty, J. 2021. Social inequities in the distribution of COVID-19: An intra-categorical analysis of people with disabilities in the U.S. Disability and Health Journal 14:1-5. DOI:10.1016/j.dhjo.2020.101007

Gerrymander Analysis Results

2025-03-14T00:00:00+00:00

Excited to publish the results of my Alabama Gerrymandering analysis! The new population difference metric did a pretty good job at benchmarking gerrymandering - check out the final product down below!

View my Alabama Gerrymandering Analysis

I was really intrigued by the full-circle completion of my first round of this pre-planning-execution-results cycle. I especially found it useful to force myself to interrogate the metadata of the data sources I was using; too often I assume I know what’s in a file and don’t take the time to ensure that I understand it well.

I’m excited to get to work on my next project: a clustering and demographics analysis reproduction from a study on county-level Covid 19 data from August of 2020.

On procedure documentation: how is it changing the way that I work?

2025-02-23T00:00:00+00:00

As a researcher it’s sometimes interesting to step back and ask myself “how did I get to this place in my project?” Oftentimes in the middle of the research process I can give you a general idea: first I tried x, then I pivoted to y, and now I’m working on z, etc. Whatever path I’ve taken through the research process oftentimes feels like it makes perfect sense to me, even if I can barely tell you why I decided to head down each road in the first place.

This process was tested this week as I set out to write my pre-research plan for an analysis of congressional gerrymandering in Alabama. Instead of my usual let-it-rip process, I used HEGSRR’s open source template for reproducible geographic research to pre-plan my research approach. This included documenting data source metadata, recording processing environment and package metadata, and detailing data transformations before I ever hit ‘run’ on any code. This forced me to really think intentionally about what I wanted to get out of each piece of data. Additionally, I spent time dictating what different results would mean in the context of the study in an attempt to discourage cherry-picking significant results.

This process was especially interesting to me because it forced me to put to paper (or vsc, in this case) what my thought process was. In a lot of ways this is what I’ve spent the most time developing over my four years at Middlebury; hard skills come and go, but what doesn’t is your ability to look over a dataset and make decisions about how to treat data. I’ll be curious to see how my workflow does during implementation, which should get done this week.
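For readers curious what recording processing environment and package metadata can look like in practice, here is a minimal sketch in Python. It is an illustration only, not the HEGSRR template's own tooling: the function name `capture_environment` and the package list are hypothetical, and it uses only the standard library.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            # Look up the installed version of each named distribution.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = "not installed"
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; replace with whatever the analysis imports.
    print(json.dumps(capture_environment(["geopandas", "pandas"]), indent=2))
```

Saving that JSON alongside a pre-analysis plan gives a later reader a record of which versions the plan was written against.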

View my Pre-Planning documentation

Is GIScience Reproducible?

2025-02-10T00:00:00+00:00

As the scientific community grows and publication rates increase, it’s paramount for the applicability and trustworthiness of this increasing amount of data to scale with the sheer volume of new information being presented yearly, irrespective of the discipline. In tandem with the wider accessibility of the sciences worldwide, this has resulted in the number of scientific papers published annually more than tripling since 2000. This trend is especially true for geography, a discipline that could be said to be having somewhat of a ‘kid in a candy store’ boom of research potential with exponential increases in recorded image volume every year. For even just two satellites, ESA’s Sentinel 1 and 2, downlinked data volume is over 3 petabytes annually. If printed, this data would be enough to fill 1.5 trillion sheets of 8.5 x 11 paper. As the sheer amount of data and its accessibility increases year-to-year, so too do the opportunities for scientific breakthroughs that warrant dissemination through publication.
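As a rough check on the printed-paper comparison, the two figures quoted above imply a particular amount of data per sheet. The sketch below works that out under two assumptions that are not from the source data: a petabyte is taken as 10^15 bytes, and each byte is treated as one printed character.

```python
# Assumptions (not from the source data): 1 petabyte = 10**15 bytes,
# and each byte prints as one character.
PETABYTE = 10**15

annual_bytes = 3 * PETABYTE  # downlinked volume quoted in the post
sheets = 1.5e12              # sheet count quoted in the post

# Data each sheet would have to hold for the two figures to agree.
bytes_per_sheet = annual_bytes / sheets
print(f"{bytes_per_sheet:,.0f} bytes per sheet")
```

Under those assumptions the comparison works out to roughly 2,000 characters per page.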

As the number of papers published annually has increased, so have concerns that many of the findings presented within are not sufficiently reproducible. This lack of reproducibility goes against one of the central tenets of the Scientific Method; the communality of science allows for work that builds off of the findings of others, trusting in the validity of their work to make further conclusions. This field-wide lack of reproducibility has been frequently publicized in the fields of psychology and medicine as a replicability crisis that threatens to call many things previously believed to be scientific fact into question. A lack of reproducibility can stem from various factors, including failure to share research data, withholding statistical analysis code, or inconsistencies in methodology that lead to others not being able to replicate your work. These lapses represent failures of the scientific process because they mean that other researchers aren’t necessarily able to rely on the validity of your results.

GIScience, being a subdiscipline of geography focused mainly on solving problems through spatial data analysis, is especially well suited to both effectively take advantage of the scientific community’s newfound glut of spatial information and to pave the way for new standards of reproducibility. This could look like several things: sharing specific information on how data was acquired (so that others can get the same data), sharing code from data manipulation (so others can check for mistakes), and documenting your process of analysis (so that others can question what priors you may have introduced to the work). GIScience is well-positioned to do this for a couple of reasons. Geographic data sources are sometimes publicly available and unlike many other ‘hard’ sciences often don’t involve field data collection, something that can’t necessarily be repeated the same way twice. Additionally, computational geographic techniques allow researchers to make their entire workflow public, allowing for the exact duplication of work by others to verify results and learn from techniques used.

If this is so possible, are we doing it? Broadly speaking, no. When surveyed, more than half of geographers said they were either only sometimes or rarely/never using open-source software to communicate their research findings. Similarly, more than 3/4 of the respondents said they only sometimes/rarely/never attempt to share the code used in their research. While this usage rate appears poor, the motivation to share work is there; approximately 75% of respondents said that reproducibility was either very or somewhat important within their subfield. This raises the question: where is the disconnect? If researchers know that reproducibility is important but still aren’t following through with implementing these practices in their own work, how can we get them to yes and ‘bake in’ the expectation of open-source research into the GIScience field? Likely the answers lie in incentivising the more diligent and time-consuming work that it takes to truly make a project open source and convincing researchers that it’s in their best interest to work together, along with giving researchers resources to learn the skills that are necessary to adequately share their work in a reproducible way.

I’ll be thinking more about this in the weeks ahead, so stay tuned for my definitive and all-encompassing findings (ha ha). I might even share my thought process (so open-source of me)!

References

Research Publication Output Over Time

Sentinel 1/2 Data

Geographer Open-Source Opinions/Practice