Bayesian Reanalysis of Photographic Plate Transients

A (Bayesian) reanalysis of correlations between nuclear tests, UAP (Unidentified Aerial Phenomena) sightings, and photographic plate transients during the Cold War era. This project extends the work originally published in Scientific Reports (DOI 10.1038/s41598-025-21620-3).

Overview

The original study found:

  • Transients 45% more likely within ±1 day of nuclear tests (p = .008)
  • Each additional UAP report correlates with an 8.5% rise in transients on transient days (p = .015)
  • Nuclear tests show small but significant links to UAP counts (p = .008)

This reanalysis uses Bayesian hierarchical models to:

  1. Model actual transient counts (not just binary occurrence)
  2. Incorporate temporal autocorrelation via latent random walks
  3. Use improved center-of-plate data with edge artifacts excluded

In addition, the counts of UAP sightings appear in these models as linear covariates after the dampening (concave) transformation $\log(n(\textrm{UAP}) + 1)$.
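As a minimal illustration of why this transformation dampens large counts, here is a short sketch. It is written in Python for illustration only (the repository's own code is R and Stan), and the function name `dampen_uap` is hypothetical.

```python
import math

def dampen_uap(n_reports: int) -> float:
    """Return log(n + 1), the concave transform applied to UAP counts."""
    if n_reports < 0:
        raise ValueError("report counts must be non-negative")
    return math.log1p(n_reports)

# The first report moves the covariate more than the eleventh does.
step_low = dampen_uap(1) - dampen_uap(0)     # log(2) - log(1)
step_high = dampen_uap(11) - dampen_uap(10)  # log(12) - log(11)
```

The `+ 1` keeps days with zero reports at a covariate value of exactly zero.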

The latent random walk is an effort to control for temporal clustering, which would artificially inflate the significance (that is, deflate the p-values) of the original analysis. This mechanism does not control for more complex, nonlocal forms of temporal dependence, such as calendar-related effects, although initial analyses did not find obvious weekday effects.
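To give a feel for what a latent random walk contributes, the sketch below simulates one and measures how strongly neighbouring days are correlated. It is an illustration in Python, not the Stan implementation: Gaussian increments, the starting level of zero, and both function names are assumptions. The scale 0.8 and the length 2718 are simply borrowed from the figures quoted elsewhere in this README.

```python
import random

def simulate_random_walk(n_days, sigma, seed=0):
    """Simulate L_t = L_{t-1} + sigma * eps_t with eps_t ~ N(0, 1) (assumed form)."""
    rng = random.Random(seed)
    level = 0.0
    path = []
    for _ in range(n_days):
        level += rng.gauss(0.0, sigma)
        path.append(level)
    return path

def lag1_autocorrelation(xs):
    """Sample autocorrelation between consecutive values of xs."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den

walk = simulate_random_walk(2718, sigma=0.8)
rho = lag1_autocorrelation(walk)  # close to 1 for a random walk
```

Because consecutive values of such a process are nearly identical, it can absorb slow drifts in the transient rate that would otherwise be attributed to the covariates.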

Data

File names are given as they appear in the scripts. Note that the data are not included in this repository:

  • Transient_CENTER_of_PLATE_FULL_DATASET_DETAILED.xlsx: "New data", used to augment the data set from the original paper and used in the main model reported below. It contains transient counts from plate centers only, with corner/edge artifacts excluded for improved reliability. Covers 306 days with transients recorded (non-zero days only).

  • Transient_Nuclear_Analyzed_Dataset_ScientificReports.xlsx: Data from the original publication, used in some earlier models. Includes all days (with zeros). Contains nuclear test and UAP predictor data. Date range: 1949-11-19 to 1957-04-28.

  • counts-data.parquet: Processed, merged dataset with ±3 day lagged predictors, generated by convert_counts.R. 2718 days total, ~89% zeros.
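The idea of ±3 day lagged predictors can be sketched as follows. This is a Python illustration and not the logic of convert_counts.R: the function name `lagged_predictors`, the use of `None` for days outside the series, and the toy input are all assumptions. The sign convention follows the results reported in this README, where lag -1 refers to the predictor one day before the transient day.

```python
from datetime import date, timedelta

def lagged_predictors(series, max_lag=3):
    """Map each day to {lag: predictor value on day + lag} for lag in -max_lag..max_lag.

    Days outside the series get None (an assumption made for this sketch).
    """
    rows = {}
    for day in series:
        rows[day] = {
            lag: series.get(day + timedelta(days=lag))
            for lag in range(-max_lag, max_lag + 1)
        }
    return rows

# Toy input, for illustration only: one hypothetical event on the third day.
start = date(1950, 1, 1)
toy = {start + timedelta(days=i): 0 for i in range(7)}
toy[start + timedelta(days=2)] = 1

rows = lagged_predictors(toy)
# rows[start + timedelta(days=3)][-1] is the event seen from the day after it.
```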

Main Model: Hurdle Negative Binomial with Shared Latent Structure

The primary analysis uses a hurdle model where both the probability of any transients and the count magnitude share a common latent predictor. See hurdle_model.md for full mathematical specification.

Key features:

  • Shared latent predictor: Covariate effects (nuclear tests, UAP reports at ±3 day lags) influence both occurrence probability and count magnitude through a single mechanism
  • Random walk component: Captures temporal autocorrelation in underlying "activity state" beyond covariate effects
  • Zero-truncated negative binomial: Handles the substantial overdispersion in non-zero counts (variance/mean ≈ 57)
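A rough sketch of how these pieces could fit together in a single likelihood term is given below, in Python for illustration. The authoritative specification is in hurdle_model.md; the link functions (logit for occurrence, log for the count mean), the intercept names `alpha_h` and `alpha_c`, and the mean/dispersion form of the negative binomial are assumptions made here.

```python
import math

def neg_binomial_logpmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and dispersion phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (mu + phi))
            + y * math.log(mu / (mu + phi)))

def hurdle_logpmf(y, eta, alpha_h, alpha_c, a, phi):
    """Log pmf of a hurdle model driven by one shared latent predictor eta.

    Assumed form: logit P(y > 0) = alpha_h + eta and log(mu) = alpha_c + a * eta,
    with positive counts following a zero-truncated negative binomial.
    """
    p_any = 1.0 / (1.0 + math.exp(-(alpha_h + eta)))
    if y == 0:
        return math.log1p(-p_any)
    mu = math.exp(alpha_c + a * eta)
    log_p0 = neg_binomial_logpmf(0, mu, phi)
    log_truncation = math.log1p(-math.exp(log_p0))  # log(1 - P(0))
    return math.log(p_any) + neg_binomial_logpmf(y, mu, phi) - log_truncation
```

With a small coupling `a`, changes in `eta` shift the probability of crossing the hurdle much more than they shift the expected count once it is crossed.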

Files

| File | Description |
| --- | --- |
| hurdle_negbin_shared_latent.stan | Main Stan model with random walk |
| hurdle_negbin_shared.stan | Simpler variant without random walk |
| count_model.R | Data preparation, model fitting, diagnostics |
| convert_counts.R | Creates counts-data.parquet with lagged predictors |
| hurdle_model.md | Mathematical specification |
| notes.md | Data properties and modeling notes |

Results

Model outputs are saved to results/:

| File | Description |
| --- | --- |
| parameter_pvalues.md | Posterior summaries with Bayesian p-values for all lag coefficients |
| hurdle_latent_coefficients.png | Coefficient intervals for nuclear/UAP effects at each lag |
| hurdle_latent_structural.png | Structural parameters (coupling, dispersion, RW scale) |
| latent_rw_trajectory.png | Estimated latent random walk over time |
| hurdle_latent_trace.png | MCMC trace plots for convergence diagnostics |

Additional plots for the simpler (no random walk) model and distribution diagnostics are also saved.

Key Findings

Significant Lagged Effects

From the hurdle model with latent random walk (full table):

| Predictor | Lag | Mean Effect | Bayesian p-value |
| --- | --- | --- | --- |
| Nuclear test | -1 day | +0.72 | 0.037 |
| UAP reports | -2 days | +0.59 | <0.001 |
| UAP reports | 0 days | +0.32 | 0.048 |

  • Nuclear tests 1 day before show a significant positive association with transient occurrence (p = 0.037). Same-day and +1 day effects are in the expected direction but not significant.
  • UAP reports 2 days before show a strong positive association (p < 0.001, posterior probability 100% positive across 2000 draws).
  • Same-day UAP shows a marginally significant effect (p = 0.048).
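For readers unfamiliar with p-values computed from posterior draws, one common convention is sketched below in Python. This is an assumption about the general idea, not necessarily the definition used to produce parameter_pvalues.md, and the function name `bayesian_p_value` is hypothetical: the value is twice the smaller of the posterior fractions of draws on either side of zero.

```python
def bayesian_p_value(draws):
    """Two-sided tail probability of zero under the posterior draws.

    Returns 2 * min(P(beta > 0), P(beta < 0)), estimated from the draws.
    """
    n = len(draws)
    if n == 0:
        raise ValueError("need at least one posterior draw")
    frac_positive = sum(d > 0 for d in draws) / n
    frac_negative = sum(d < 0 for d in draws) / n
    return 2.0 * min(frac_positive, frac_negative)
```

Under this convention, a coefficient whose draws are all positive gets an estimate of zero, which can only be reported as lying below the resolution of the sample (0.001 for 2000 draws).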

Structural Parameters

  • Coupling (a ≈ 0.1): The coupling between the shared latent predictor and count magnitude is small, with a credible interval that barely excludes zero. This suggests that covariates primarily influence whether transients occur, not how many: the count magnitude on transient days is largely independent of nuclear/UAP activity.
  • Random walk scale (σ_L ≈ 0.8): Substantial temporal autocorrelation exists beyond what covariates explain, justifying the latent random walk component.
  • Dispersion (φ ≈ 1.5): Confirms overdispersion in counts relative to Poisson.
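As a reminder of how φ relates to overdispersion, assume the mean/dispersion parameterization of the negative binomial (as in Stan's `neg_binomial_2`; whether the model uses exactly this form is an assumption here). Before zero-truncation, a count $Y$ with mean $\mu$ then has

$$\mathrm{Var}(Y) = \mu + \frac{\mu^2}{\phi}, \qquad \frac{\mathrm{Var}(Y)}{\mathrm{E}(Y)} = 1 + \frac{\mu}{\phi}.$$

The Poisson case is recovered as $\phi \to \infty$, so a small φ means the variance grows much faster than the mean.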

Interpretation, and hunches

The reanalysis broadly supports the original findings. The effects appear to operate on occurrence probability rather than intensity (counts). When transients do occur, their count is driven by other factors (captured partly by the random walk) rather than nuclear/UAP activity. If the effects themselves are real, the irrelevance of counts could, for example, result naturally from noise on the plates or in the transient detection algorithm.

My informal, general feeling after all these analyses is that the p-values are not easily erased by model details; obviously, when one adds parameters to a model, its power gradually becomes weaker at separating individual coefficients from zero, and significances wane.

In the particular model reported here, the UAP coefficient comes with a strong p-value, while the nuclear test coefficients are harder to distinguish from zero. This is, however, opposite to my overall impression. Nuclear at T-1 appears consistently and often with a decent p-value, while UAP coefficients are somewhat flaky and raise suspicion of complex, unknown confounding.

Earlier Work

Logistic Models (logistic_models/)

Earlier analyses modeled binary transient occurrence rather than counts, using the original dataset:

| File | Description |
| --- | --- |
| model.R | Exploratory brms logistic regression |
| latent1.stan, latent.R | Latent state-space model with impulse responses |
| lagged_transient_model.stan | Hierarchical Student-t lag model |
| lagged_transient_model_rhs.stan | Regularized horseshoe variant |
| latent_model.md | Mathematical specification |

These models use uap-data-small.parquet generated by convert.R.

Earlier Count Model Efforts (earlier_count_model_efforts/)

Exploratory count models before settling on the hurdle approach:

  • hurdle_negbin.stan — Hurdle model with separate (non-shared) predictors for hurdle and count components
  • poisson_lognormal.stan — Poisson-lognormal mixture
  • poisson_lognormal_sp.stan — Poisson-lognormal with spatial/temporal structure

Technical Requirements

  • R packages: tidyverse, arrow, cmdstanr, brms, bayesplot, ggplot2, posterior, knitr
  • Stan: Models compiled via cmdstanr; requires CmdStan installed (typically at ~/cmdstan)

Running the Analysis

# Generate count data with ±3 day lag window
source("convert_counts.R")

# Run interactively in count_model.R for:
# - Distribution exploration
# - Model fitting (takes several minutes per model)
# - Diagnostics and plots

See count_model.R for detailed fitting code.

License

This work is licensed under CC BY 4.0. You are free to share and adapt the material for any purpose, provided you give appropriate attribution.
