Scripts to ensure file synchronization across multiple sources with different directory structures and file names
Mounting the NAS volumes:
sudo su
mount -t nfs -o nfsvers=3 192.168.0.10:/volume1/photos-tmp /media/photos-tmp/
Verify that all files are readable before hashing (chmod if not):
find . ! -readable
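If the check above lists files, the permissions can be repaired in the same pass. This is a minimal sketch (not part of the repo's scripts), assuming GNU find and that granting the owner read permission is sufficient:

```shell
# List all files the current user cannot read and grant the owner
# read permission on each; -exec ... {} + handles paths with spaces.
find . ! -readable -exec chmod u+r {} +
```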
Rename image files according to their creation date:
exiftool '-filename<CreateDate' -d IMG_%Y%m%d_%H%M%S%%-c.%%le -r -ext JPG -ext jpg .
Create hashes recursively:
hashdeep -c md5 -v -r -l -W hashdeep_out.txt .
Validate a hash file without re-hashing:
./hashing/hash_file_validation.sh
The script checks
- whether the sum of all file byte sizes according to the file system matches the sum recorded in the hash file, and
- whether the list of files in the current directory matches the file list in the hash file.
Input: Expects a hashdeep_out.txt in the current directory.
Output: The result of the file size check and a list of files that are unique to either the file system or the hash file (only file names and relative paths are considered).
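The size check can be sketched as follows. This is an assumed reconstruction of the script's logic, not the script itself: hashdeep entry lines have the form size,md5,path, and header/comment lines start with %% or ##, so they can be filtered with a single grep. The hash file itself is excluded from the on-disk total:

```shell
#!/bin/sh
# Sum the size column (field 1) of the hashdeep entry lines,
# skipping the header/comment lines that start with % or #.
hash_total=$(grep -v '^[%#]' hashdeep_out.txt | awk -F, '{s += $1} END {printf "%d", s}')
# Sum the on-disk sizes of all files, excluding the hash file itself.
fs_total=$(find . -type f ! -name 'hashdeep_out.txt' -printf '%s\n' | awk '{s += $1} END {printf "%d", s}')
if [ "$hash_total" -eq "$fs_total" ]; then
  echo "size check OK: $hash_total bytes"
else
  echo "size mismatch: hash file says $hash_total, file system says $fs_total"
fi
```

Note that -printf '%s\n' is a GNU find extension; on BSD/macOS, stat -f %z would be needed instead.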
Merge all hashdeep_out.txt files found in the immediate subdirectories of the current working directory (one level deep only, not recursively). The output is printed to stdout and contains a hashdeep-like header.
./hashing/merge_hash_files.sh > hashdeep_out.txt
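The merge logic can be sketched as below. This is an assumed reconstruction, not the actual merge_hash_files.sh: it emits a hashdeep-like header, then rewrites each subdirectory's entry lines so that the relative paths are prefixed with the subdirectory name. File names containing commas are not handled by this sketch:

```shell
#!/bin/sh
# Print a hashdeep-like two-line header.
printf '%%%%%%%% HASHDEEP-1.0\n%%%%%%%% size,md5,filename\n'
# For each immediate subdirectory's hash file, strip its header and
# prefix the subdirectory name to each entry's relative path.
for f in */hashdeep_out.txt; do
  [ -f "$f" ] || continue          # no subdirectory hash files at all
  dir=${f%/hashdeep_out.txt}
  grep -v '^[%#]' "$f" | awk -F, -v d="$dir" '{sub(/^\.\//, "", $3); print $1 "," $2 "," d "/" $3}'
done
```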
Run an audit of the files in the current directory against the files listed in the hashes file provided:
hashdeep -vvv -a -r -k /media/photos-tmp/2017/photos_new/hashdeep_out.txt . > hashdeep_audit.txt 2>&1
Search the audit file for entries without a match: grep -oP '.*No\smatch' hashdeep_audit.txt
The audit script checks all hashes in the current directory against the hashes of a reference directory. It lists files that are unique to the current directory, i.e. files that do not yet exist in the reference directory. The script expects a hashdeep_out.txt file in the current directory; the hash file of the reference directory is provided as an argument. Example:
/path/sync-utils/auditing/audit.sh /media/photos-arch/2020/hashdeep_out.txt
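The comparison can be sketched as below. This is an assumed reconstruction of audit.sh's logic, not the script itself: it prints the paths from the local hashdeep_out.txt whose MD5 (field 2) does not appear in the reference hash file passed as the first argument:

```shell
#!/bin/sh
# $1: hash file of the reference directory.
ref="$1"
# Collect the reference MD5 hashes, one per line.
ref_hashes=$(grep -v '^[%#]' "$ref" | cut -d, -f2 | sort -u)
# Print the path of every local entry whose hash is not in the reference.
grep -v '^[%#]' hashdeep_out.txt | while IFS=, read -r size hash path; do
  printf '%s\n' "$ref_hashes" | grep -qxF "$hash" || printf '%s\n' "$path"
done
```

The while-loop lookup is quadratic; for large hash files a sort/join-based approach would be faster, but this form keeps the logic obvious.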
Find file duplicates within a particular directory. Out of a set of identical files, the first one listed in the input is kept and all others are listed as duplicates.
Input: Expects a hashdeep_out.txt in the current directory.
Output: List of duplicates (file path only)
find_duplicates_in_hash_file.sh
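The duplicate detection can be sketched in one awk pass. This is an assumed reconstruction, not the actual find_duplicates_in_hash_file.sh: seen[$2]++ is false (0) on the first occurrence of an MD5 and true afterwards, so only later occurrences are printed:

```shell
#!/bin/sh
# Print the path (field 3) of every entry whose MD5 (field 2) has
# already been seen, i.e. every duplicate after the first occurrence.
grep -v '^[%#]' hashdeep_out.txt | awk -F, 'seen[$2]++ {print $3}'
```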
Move duplicates to a separate folder. The directory structure will not be preserved in the destination folder ../photos_duplicates/. However, existing destination files are backed up (numbered) rather than overwritten. Hint: If some files have already been moved (so moving them again would fail), append 2>&1 >/dev/null | grep -v 'No such file or directory' to the command in order to ignore non-existing files.
find_duplicates_in_hash_file.sh | xargs -d '\n' mv --backup=t -t ../photos_duplicates/
Directories can be encrypted and prepared for archiving (e.g. on cloud storage) using the archiving scripts.