dc.description.abstract |
Deepfakes allow users to manipulate the identity of a person in a video or an image.
Previously, special hardware and skill were required to create such fake videos and images,
but with improvements in GAN-based techniques, generating realistic and hard-to-detect
manipulated faces has become easier. This threatens individuals and decreases trust in
social media platforms. In this work, our goal is to report the learning ability of eight
different models on, by far, the largest fake face dataset, DFDC, and to test the
generalization ability of these models on Celeb-DF-v2.
Because the training dataset consists of high-quality videos, we first detected and
extracted faces from them. Next, we sampled the data to obtain balanced classes and a
feasible amount of data for training with limited resources.
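As an illustration only (the face detector is not named in this abstract), a minimal sketch of per-frame face detection and cropping with facenet-pytorch's MTCNN; the frame stride and crop size are assumed values:

import cv2
import torch
from facenet_pytorch import MTCNN

# Assumed detector and settings; a different face detector may have been used.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
detector = MTCNN(image_size=224, margin=20, keep_all=True, device=device)

def extract_faces(video_path, every_nth=10):
    """Read every n-th frame of a video and return cropped face tensors."""
    faces, idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_nth == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            crops = detector(rgb)  # tensor of face crops, or None if no face is found
            if crops is not None:
                faces.extend(crops)
        idx += 1
    cap.release()
    return faces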
We started training with no extra augmentation because the dataset was large enough and
the faces were already modified. Next, we added our default augmentation chain, inspired
by other works, and increased its strength with Coarse Dropout and Grid Mask augmentations.
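A minimal sketch of such an augmentation chain using the albumentations library; the individual transforms and probabilities are assumptions, and GridDropout stands in for the Grid Mask augmentation:

import albumentations as A

# Assumed default chain; CoarseDropout and GridDropout (a Grid Mask-style
# transform) are appended to increase augmentation strength.
train_aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.ImageCompression(p=0.5),
    A.GaussNoise(p=0.3),
    A.RandomBrightnessContrast(p=0.3),
    A.CoarseDropout(p=0.5),
    A.GridDropout(ratio=0.3, p=0.3),
])

# Usage: train_aug(image=face_crop)["image"], where face_crop is an HxWxC uint8 array.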
A separate test set from the DFDC dataset, which contains unseen augmentations and
distractors, and the completely different Celeb-DF-v2 dataset were used to evaluate the
results. Unlike the training set, we followed a different face extraction flow for the
test sets: we performed face tracking using simple Intersection over Union and sampled
only faces that were tracked over a certain number of consecutive frames.
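A minimal sketch of the Intersection over Union computation used to link face detections across consecutive frames; the overlap threshold and the (x1, y1, x2, y2) box format are assumptions:

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# Assumed rule: a detection continues a track if it overlaps the previous frame's box enough.
def same_track(prev_box, cur_box, threshold=0.5):
    return iou(prev_box, cur_box) >= threshold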
For each video in the test set, the confidences of the sampled faces were averaged to
produce a single confidence value per video, and these video-level confidences were used
to calculate video-based log loss.
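A sketch of the per-video aggregation and the video-based log loss, assuming each sampled face yields a fake-probability from the model; the clipping epsilon is an assumed implementation detail:

import numpy as np

def video_confidence(face_probs):
    """Average per-face fake probabilities into one per-video confidence."""
    return float(np.mean(face_probs))

def log_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy over videos (y_true: 0 = real, 1 = fake)."""
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)
    y_true = np.asarray(y_true, dtype=np.float64)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))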
For the Celeb-DF-v2 dataset, we also calculated Sensitivity and Specificity; for these
metrics, the optimal threshold was determined using the Equal Error Rate.
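A sketch of selecting the operating threshold at the Equal Error Rate from an ROC curve with scikit-learn, then computing Sensitivity and Specificity; this follows the standard recipe and is not necessarily the exact implementation:

import numpy as np
from sklearn.metrics import roc_curve

def eer_threshold(y_true, y_score):
    """Return the threshold where the false positive rate equals the false negative rate."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    fnr = 1 - tpr
    idx = int(np.argmin(np.abs(fpr - fnr)))
    return thresholds[idx]

# Sensitivity / Specificity at that threshold (assumed convention: 1 = fake).
def sensitivity_specificity(y_true, y_score, thr):
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= thr).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fn + 1e-9), tn / (tn + fp + 1e-9)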
We concluded that despite its relatively smaller input size, the EfficientNet-B4 model has
the best learning and generalization ability. Training models with half precision may
speed up training by up to 2 times with a negligible loss in performance.
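As an illustration of half-precision training (the framework is not stated here; PyTorch automatic mixed precision is assumed):

import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, images, labels, optimizer, loss_fn):
    optimizer.zero_grad()
    # Forward pass runs in float16 where safe; the loss is scaled to avoid gradient underflow.
    with torch.cuda.amp.autocast():
        outputs = model(images)
        loss = loss_fn(outputs, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()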
Finally, Coarse Dropout helped models to generalize better. |
en_US |