I wanted to confirm that two video files were identical apart from their metadata, so I wrote a script to help. The script creates a hash from the audio and video data, leaving out the header data.
Caveats:
- Hash collisions are possible, so double-check the results manually before deleting anything.
- The script doesn't flag videos as duplicates if they show the same video but differ in resolution, bitrate, audio data, etc.
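One way to do the manual double check from the first caveat is to compare the remuxed output of both files byte for byte instead of trusting the hash alone. The following is a minimal sketch, not part of the original scripts: the function names `read_exact`, `streams_equal` and `remuxed_equal` are hypothetical, and it assumes `ffmpeg` is on the PATH and is called with the same flags as the scripts in this post.

```python
import subprocess

# Same remux flags as the scripts in this post; the output goes to stdout ("-").
FFMPEG_ARGS = ["-fflags", "+bitexact", "-flags:v", "+bitexact",
               "-flags:a", "+bitexact", "-c", "copy", "-f", "matroska",
               "-loglevel", "error", "-"]


def read_exact(stream, size):
    """Read up to `size` bytes, looping because a pipe may return short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def streams_equal(a, b, chunk_size=1 << 20):
    """Return True only if both binary streams yield exactly the same bytes."""
    while True:
        block_a = read_exact(a, chunk_size)
        block_b = read_exact(b, chunk_size)
        if block_a != block_b:
            return False
        if not block_a:
            return True  # both streams ended at the same point


def remuxed_equal(path1, path2):
    """Compare the remuxed output of two files (hypothetical helper)."""
    procs = [subprocess.Popen(["ffmpeg", "-i", path, *FFMPEG_ARGS],
                              stdout=subprocess.PIPE)
             for path in (path1, path2)]
    try:
        same = streams_equal(procs[0].stdout, procs[1].stdout)
        if not same:
            # Stop early; there is no need to remux the rest of either file.
            for proc in procs:
                proc.kill()
    finally:
        for proc in procs:
            proc.stdout.close()
            proc.wait()
    # Two failed runs both produce empty output, so also require success.
    return same and all(proc.returncode == 0 for proc in procs)
```

Calling `remuxed_equal("file1.m4v", "file2.m4v")` on a pair that the hash reported as duplicates should return `True` only when the remuxed data really is identical. It reads both streams in chunks, so neither file has to fit in memory.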
Requirements
- ffmpeg
- md5sum
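A quick way to confirm that both requirements are installed is to look them up on the PATH before running anything. This is a small sketch using Python's standard `shutil.which`; the helper name `missing_tools` is made up for illustration.

```python
import shutil

# The two external programs the scripts in this post rely on.
REQUIRED_TOOLS = ("ffmpeg", "md5sum")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing required tools: " + ", ".join(missing))
else:
    print("All required tools found.")
```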
Shell Version
Initial shell script to compare two files:
#!/bin/sh
# Print a hash of the audio and video data for each of two files.
file1="$1"
file2="$2"
# Remux without re-encoding; bitexact makes ffmpeg write only
# platform-, build- and time-independent data.
flags="-fflags +bitexact -flags:v +bitexact -flags:a +bitexact -c copy -f matroska"
file1hash=$(ffmpeg -i "$file1" $flags -loglevel error - | md5sum | cut -f1 -d" ")
file2hash=$(ffmpeg -i "$file2" $flags -loglevel error - | md5sum | cut -f1 -d" ")
echo "$file1hash $file1"
echo "$file2hash $file2"
Example Shell
$ ./diffvideo.sh file1.m4v file2.m4v
cb31XXXXXXXXXXXXXXXXXXXXXXXXXXXX file1.m4v
cb31XXXXXXXXXXXXXXXXXXXXXXXXXXXX file2.m4v
Python Version
Expanded Python script to compare more than two files:
#!/usr/bin/env python3
import argparse
import subprocess


def setup_cli():
    # Accept any number of file names on the command line.
    parser = argparse.ArgumentParser(
        description='Group files by a hash of their audio and video data.',
    )
    parser.add_argument('filenames', nargs='*')
    return parser


def check_files(filenames):
    # Map each hash to the files that produced it; files that ffmpeg
    # cannot process are collected under 'not_a_video'.
    hashes = {'not_a_video': []}
    flags = ['-fflags', '+bitexact', '-flags:v', '+bitexact', '-flags:a', '+bitexact']
    for file in filenames:
        # Pass the arguments as a list so that spaces or quotes in file
        # names need no escaping.
        cmd = ['ffmpeg', '-i', file, *flags, '-c', 'copy', '-f', 'matroska', '-']
        ff_proc = subprocess.run(cmd, capture_output=True)
        if ff_proc.returncode != 0:
            hashes['not_a_video'].append(file)
            continue
        hash_proc = subprocess.run('md5sum', capture_output=True, input=ff_proc.stdout)
        filehash = hash_proc.stdout.decode().split()[0]
        if filehash not in hashes:
            hashes[filehash] = []
        hashes[filehash].append(file)
    return hashes


def print_results(hashes):
    not_a_video = hashes.pop('not_a_video')
    # Split the hashes into those shared by several files and those seen once.
    singles = {}
    dupes = {}
    for key, value in hashes.items():
        if len(value) > 1:
            dupes[key] = value
        else:
            singles[key] = value
    if not_a_video:
        print('\nNot a video:')
        for each in not_a_video:
            print(f'  {each}')
    if dupes:
        print('\nDuplicates found:')
        for key, value in dupes.items():
            print(f'  {key}')
            for v in value:
                print(f'    {v}')
    if singles:
        print('\nNo Duplicates for these files:')
        for key, value in singles.items():
            print(f'  {value[0]}')


if __name__ == "__main__":
    parser = setup_cli()
    args = parser.parse_args()
    hashes = check_files(args.filenames)
    print_results(hashes)
Example Python
$ ./diffvideo.py *

Not a video:
  test.txt
  file.docx

Duplicates found:
  cb31XXXXXXXXXXXXXXXXXXXXXXXXXXXX
    file1.m4v
    file2.m4v
  ab76XXXXXXXXXXXXXXXXXXXXXXXXXXXX
    file5.m4v
    file6.m4v

No Duplicates for these files:
  file3.m4v
  file4.m4v
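The Python version captures ffmpeg's entire output in memory before handing it to md5sum, which can get heavy for large videos. A possible variation, sketched here as an assumption rather than as part of the original script, reads ffmpeg's output in chunks and hashes it with Python's `hashlib`, which also removes the need for md5sum. The names `hash_stream` and `hash_media` are hypothetical.

```python
import hashlib
import subprocess

# Same remux flags as the script above; the output goes to stdout ("-").
FFMPEG_ARGS = ["-fflags", "+bitexact", "-flags:v", "+bitexact",
               "-flags:a", "+bitexact", "-c", "copy", "-f", "matroska", "-"]


def hash_stream(stream, algorithm="md5", chunk_size=1 << 20):
    """Hash a binary stream chunk by chunk and return the hex digest."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_media(path, algorithm="md5"):
    """Return the hash of the remuxed data, or None if ffmpeg fails."""
    cmd = ["ffmpeg", "-i", path, *FFMPEG_ARGS]
    # Leaving the with block closes the pipe and waits for ffmpeg to exit.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL) as proc:
        filehash = hash_stream(proc.stdout, algorithm)
    if proc.returncode != 0:
        return None  # treat as "not a video", like the script above
    return filehash
```

With the default `algorithm="md5"` the digest matches what md5sum prints for the same bytes. Passing `algorithm="sha256"` is an option if the collision caveat is a concern.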