Abstract:
Audio fingerprinting systems have many real-world use cases, such as digital rights management and copyright detection, duplicate audio detection, labelling of untagged audio, and identify/query-by-example recognition. Popular online platforms now offer identify/query-by-example music recognition services, where users query with snippets of recorded audio to retrieve the metadata of the matched song. A compact, robust, and fast-to-retrieve fingerprint design is the cornerstone of these systems. Although the short-time Fourier transform and Mel-spectral representations are the feature extraction tools that first come to mind, they suffer from instability and somewhat limited resolution. The scattering wavelet transform (SWT) overcomes these limitations by recovering the lost information while ensuring translation invariance and stability. In this study, a two-stage audio fingerprint feature extraction framework is introduced that integrates the SWT with a Siamese neural network hashing model for musical audio identification. The similarity-preserving hashes produced by the Siamese model serve as audio fingerprints and can be compared with a similarity distance metric in the embedded hashing space. The model was trained on two-layer scattering wavelet transform coefficients, using roughly aligned segments of the same music file as similar pairs and segments of different music files as dissimilar pairs. The proposed system achieves strong performance under environmental noise, modeling the conditions in which music and audio detection may be encountered in everyday life. With very compact storage, it achieves high ROC-AUC scores both in one-to-one comparison and when locality-sensitive hashing (LSH) is used for content indexing.
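To make the retrieval idea concrete, the following is a minimal sketch of how similarity-preserving embeddings can be bucketed with random-hyperplane LSH and then matched. All names, dimensions, and the noise level are illustrative assumptions for this sketch, not details taken from the paper; the paper's actual fingerprints come from its Siamese hashing model, whereas here random vectors stand in for them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the paper's embedding dimension is not assumed here.
EMB_DIM, N_BITS = 128, 16

# Random hyperplanes: the sign of each projection yields one hash bit, so
# embeddings separated by a small angle tend to share the same bucket key.
planes = rng.standard_normal((N_BITS, EMB_DIM))

def lsh_key(emb: np.ndarray) -> int:
    """Pack the sign pattern of the hyperplane projections into a bucket key."""
    bits = planes @ emb > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

# Build an index of stand-in reference fingerprints (one per "song").
refs = rng.standard_normal((1000, EMB_DIM))
index: dict[int, list[int]] = {}
for song_id, emb in enumerate(refs):
    index.setdefault(lsh_key(emb), []).append(song_id)

# Query with a slightly perturbed copy of one reference (mimicking noise):
# probe the query's bucket first, fall back to a full scan if it is empty,
# then rank the candidates by cosine similarity.
query = refs[42] + 0.01 * rng.standard_normal(EMB_DIM)
candidates = index.get(lsh_key(query)) or range(len(refs))
best = max(
    candidates,
    key=lambda i: refs[i] @ query
    / (np.linalg.norm(refs[i]) * np.linalg.norm(query)),
)
print(best)  # the matching song id is recovered
```

The bucket lookup replaces a linear scan over all stored fingerprints with a scan over a small candidate set, which is what makes LSH-based content storage fast at scale.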