Refactor `scripts/benchmarking.py` to use a dataset instead of randomly generated data