
rabbit

The {rabbit} package provides functions to standardise raw tri-axial accelerometer data and extract movement dynamics metrics from it. Accelerometer data is often collected in a raw format that may not be immediately suitable for analysis.

  • standardise_data() reads in raw accelerometer data, standardises column names, and converts the timestamp to a consistent format.
  • extract_movement_dynamics() takes the standardised data and calculates various movement dynamics metrics using rolling window calculations, including mean acceleration, variance, covariance, Overall Dynamic Body Acceleration (ODBA), and Vectorial Dynamic Body Acceleration (VDBA).
  • crop_ends(), crop_ends_diurnal(), and crop_ends_nocturnal() are useful for removing records outside the desired time window, such as records collected before the device was fitted or after it was removed.
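
ODBA and VDBA are both computed from the dynamic component of acceleration (raw acceleration minus a rolling mean, which estimates the static, gravitational component). As a minimal sketch in base R of the standard definitions (the actual {rabbit} implementation uses {RcppRoll} and may differ in detail):

```r
# Toy tri-axial dynamic acceleration values (already static-corrected)
dx <- c(0.1, -0.2, 0.05)
dy <- c(0.0,  0.1, -0.1)
dz <- c(-0.3, 0.2, 0.15)

# ODBA: sum of absolute dynamic accelerations across the three axes
odba <- abs(dx) + abs(dy) + abs(dz)

# VDBA: Euclidean norm (vector magnitude) of the dynamic accelerations
vdba <- sqrt(dx^2 + dy^2 + dz^2)
```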

Speed

Accelerometer data is often collected at high frequencies, usually 10 to 50 Hz, resulting in large datasets. Compared to the original code inherited by the team, {rabbit} is 1000–2000 times faster: a file that previously took over a day to process can now be done in under a minute. The speed gains come from efficient design combined with the {RcppRoll} package, which provides optimised C++ implementations of rolling window functions.
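
The core operation here is a centred rolling window applied to each axis. A minimal base R sketch of a centred rolling mean, shown only to illustrate the idea ({rabbit} itself relies on {RcppRoll} for this):

```r
# Centred rolling mean using base R's stats::filter;
# the ends are NA where the window is incomplete
x <- c(1, 2, 3, 4, 5)
k <- 3  # window size
roll_mean_base <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
# roll_mean_base: NA, 2, 3, 4, NA
```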

Installation

You can install the development version of rabbit from GitHub using the remotes package:

# install.packages("remotes")
remotes::install_github("traitecoevo/rabbit")

Example

This basic example uses less than one hour of data from a bilby called Piccolo:

library(rabbit)
library(dplyr)

file_in <- system.file("extdata", "raw_Pic2Jan_50000.parquet", package = "rabbit")

df <- standardise_data(file_in = file_in, vars = c("Timestamp", "accX", "accY", "accZ")) |>
  extract_movement_dynamics()

# First rows are NA due to rolling window calculations
df |> filter(!is.na(time))

Alternatively, you can read in the data first and then standardise it:

library(arrow)

df <- arrow::read_parquet(file_in)

df_mvt <- standardise_data(df, vars = c("Timestamp", "accX", "accY", "accZ")) |>
  extract_movement_dynamics()

df_mvt |> filter(!is.na(time))

The following code loops through all files within a folder and extracts the animal ID from the file name using the {stringr} package. For ID extraction to work, files should be labelled with the animal ID followed by an underscore (_).

library(stringr)
library(arrow)

csv_files <- list.files(path = "/Raw", pattern = "\\.csv$", full.names = TRUE)

# Create output folder if it doesn't exist
dir.create("/Processed", showWarnings = FALSE)

# Loop through each .csv file, calculate, and save the results
for (csv_file in csv_files) {
  cat("Processing:", csv_file, "\n")

  # Extract ID from filename
  file_name <- tools::file_path_sans_ext(basename(csv_file))
  animal_id <- stringr::str_split(file_name, "_")[[1]][1]

  result <- standardise_data(file_in = csv_file,
                             vars = c("Timestamp", "X", "Y", "Z"),
                             time_function = lubridate::dmy_hms) |>
    extract_movement_dynamics(window_size = 25)

  result$ID <- animal_id
  result <- dplyr::relocate(result, ID)

  output_file <- sub("Raw", "Processed", csv_file) |>
    sub(pattern = "\\.csv$", replacement = ".parquet")

  arrow::write_parquet(result, output_file)

  cat("Saved:", output_file, "\n\n")
  gc()
}

Files can then be cropped to remove records at the start and end of each file. This is useful for removing records collected before the device was attached or after it was removed.

  • crop_ends() removes all records before the first instance of a specified time and after the last instance of a specified time
  • crop_ends_diurnal() removes all records before the first instance of 01:00:00 and after the last instance of 01:00:00
  • crop_ends_nocturnal() removes all records before the first instance of 12:00:00 and after the last instance of 12:00:00
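
The underlying idea can be sketched in base R (a simplified illustration, not the package's implementation): keep only the rows between the first and last occurrence of a target time of day.

```r
# Toy timestamps at hourly resolution spanning two days
time <- seq(as.POSIXct("2024-01-01 08:00:00", tz = "UTC"),
            by = "1 hour", length.out = 48)
df <- data.frame(time = time, value = seq_along(time))

# Indices where the clock time matches the crop boundary (e.g. 12:00:00)
hits <- which(format(df$time, "%H:%M:%S") == "12:00:00")

# Keep rows from the first to the last matching time
cropped <- df[seq(min(hits), max(hits)), ]
```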

library(dplyr)

# List all .parquet files in folder
parquet_files <- list.files(path = "/Processed", pattern = "\\.parquet$", full.names = TRUE)

# Create output folder if it doesn't exist
dir.create("/Cropped", showWarnings = FALSE)

# Create an empty list to collect summary rows
summary_list <- vector("list", length(parquet_files))

for (i in seq_along(parquet_files)) {

  parquet_file <- parquet_files[i]
  cat("Processing:", parquet_file, "\n")

  # Load and remove incomplete time rows
  df <- arrow::read_parquet(parquet_file) |>
    filter(!is.na(time))

  # Crop each file with desired crop_ends() function
  result2 <- crop_ends_nocturnal(df)
  rm(df)
  gc()

  # Store first and last rows in list
  summary_list[[i]] <- bind_rows(result2[1, ], result2[nrow(result2), ])

  # Save cropped file
  new_filename <- sub("\\.parquet$", "_crop.parquet", basename(parquet_file))
  arrow::write_parquet(result2, file.path("/Cropped", new_filename))
  rm(result2)
  gc()

  cat("Saved:", new_filename, "\n\n")
}

# Combine all summary rows at the end
summary_tibble <- bind_rows(summary_list)
print(summary_tibble)

Analysis and classifying behaviour

The output is now ready for analysis or classification of behaviour using your own analytical methods or behavioural classifier.

Identifying high activity periods

sumVDBA (the rolling sum of VDBA) is the best measure of heat-generating movement or activity:

library(ggplot2)

df_mvt |>
  filter(!is.na(time)) |>
  ggplot(aes(x = time, y = sumVDBA)) +
  geom_point(size = 0.2) +
  theme_classic()

Benchmarking

We can compare performance of the fast (RcppRoll), base (pure R), and orig (original code) implementations on a real accelerometer file (50,000 rows):

bench <-
  arrow::read_parquet(system.file("extdata", "raw_Pic2Jan_50000.parquet", package = "rabbit")) |>
  standardise_data(vars = c("Timestamp", "accX", "accY", "accZ"),
                   time_function = lubridate::dmy_hms) |>
  dplyr::select(time, x, y, z) |>
  benchmark_movement_dynamics()

bench |>
  dplyr::group_by(task, method) |>
  dplyr::summarise(median_sec = median(elapsed_sec))
  method   time (seconds)          relative speed
  fast     0.146                   1,835x faster than orig
  base     0.983                   272x faster than orig
  orig     267.848 (~4.5 minutes)  baseline
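
The relative speeds follow directly from the median timings:

```r
# Ratio of the original implementation's time to each alternative
fast_speedup <- round(267.848 / 0.146)  # ~1835x
base_speedup <- round(267.848 / 0.983)  # ~272x
```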

The fast method is ~7x faster than the pure R base implementation and ~1,835x faster than the original code. On a full day's recording at 25 Hz the difference becomes even more pronounced: a file that previously took over a day to process can now be done in seconds.

About

An R package to "rabbitly" process data outputs from triaxial accelerometers

License

MIT (LICENSE.md); a second LICENSE file contains an unrecognised license.
