This repository (also available at https://doi.org/10.5281/zenodo.15611161) contains the artifact (scripts and data) for our meta paper "Confusing Value with Enumeration: Studying the Use of CVEs in Academia", accepted to USENIX Security 2025. Our work captures pitfalls of using CVEs as a proxy metric for real-world impact and provides actionable recommendations our community can adopt to avoid them. We hope that this work and our recommendations contribute to the debate on how to use CVEs safely (if at all).
Our paper consists of three separate analyses, as detailed in the following.
We conduct a large-scale quantitative analysis of CVE use in papers published at the "top four" A* conferences in the field of systems security (IEEE S&P, USENIX Security, ACM CCS, ISOC NDSS). Our results show an increasing prevalence of CVEs over time.
Important
For legal reasons, we do not share the paper PDFs used during our quantitative (or qualitative) analysis, nor any representation directly derived from them.
Next, we qualitatively study the outcome of all CVEs claimed by papers at the "top four" conferences in the past five years (2020-2024). More precisely, we track the underlying bug reports for the 1,803 CVEs claimed by these papers, and we analyze how the maintainers of the affected projects have reacted to them.
Important
As part of this work, we critically assess the CVE assignments obtained by fellow researchers. Among these, we identified papers that claim CVEs with questionable or uncertain impact. After internal discussion and consultation with our reviewers and USENIX's Research Ethics Committee, we decided not to disclose individual author names or paper titles. We release a subset of our data containing a mapping of CVEs to outcomes, as well as a list of the analyzed titles.
Finally, we survey 103 members of our community, most of whom have served as reviewers on the 2023 program committee of one of the "top four" conferences. This way, we capture how CVEs are perceived within our community, study misconceptions, and sample opinions on how CVEs should be used (if at all).
Important
We release the raw survey data without any identifying information, including free-text fields (such as feedback on the survey).
Most likely, you will be interested in the different datasets published in the respective sections.
If, however, you want to reproduce every step of our work, you will need to invest additional effort to (1) retrieve all paper PDFs published at the respective conferences and (2) identify which CVE IDs are claimed by any given paper (for the qualitative analysis). Even though we do not provide this data for legal and ethical reasons, we stress that our analyses are based on publicly available data. Once you have obtained this data, you can use our scripts for processing, evaluation, or verification purposes.
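As a starting point for step (2), CVE IDs follow a fixed syntax (CVE-YYYY-NNNN, with four or more digits in the sequence number), so a simple pattern match over text extracted from a paper PDF can recover candidate IDs. The sketch below is illustrative only and not part of our scripts; the function name and de-duplication behavior are our own assumptions:

```python
import re

# CVE IDs: "CVE-" + 4-digit year + "-" + sequence number of 4 or more digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)

def extract_cve_ids(text: str) -> list[str]:
    """Return normalized (uppercase), de-duplicated CVE IDs found in `text`,
    in order of first appearance."""
    seen: list[str] = []
    for match in CVE_PATTERN.findall(text):
        cve = match.upper()  # normalize e.g. "cve-2021-12345"
        if cve not in seen:
            seen.append(cve)
    return seen

print(extract_cve_ids(
    "We reported CVE-2021-12345 and cve-2021-12345; see also CVE-2023-0001."
))
# → ['CVE-2021-12345', 'CVE-2023-0001']
```

Note that such a match only yields CVE IDs *mentioned* by a paper; deciding whether an ID is actually *claimed* by the authors (rather than merely cited) still requires manual inspection.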
We list the necessary dependencies in the respective directories; install them via pip install -r requirements.txt. Any recent Linux distribution (e.g., Ubuntu 22.04) with a recent Python version (e.g., 3.12) should suffice to run our scripts.