Deepfake Analysis - Amount of Images, Lighting and Angles

Andrea Hauser
Offense Department, scip AG
anha@scip.ch
https://www.scip.ch

Marc Ruef (Editor)
Research Department, scip AG
maru@scip.ch
https://www.scip.ch

Abstract: Roughly 500 images of a source are needed to create a solid deepfake. Lighting is very important when selecting source material. Furthermore, the source material must cover an extended range of angles to produce solid results.

Keywords: Deepfake, Interview, YouTube

1. Preface

This paper was written in 2018 as part of a research project at scip AG, Switzerland. It was initially published online at https://www.scip.ch/en/?labs.20181122 and is available in English and German. Providing our clients with innovative research for the information technology of the future is an essential part of our company culture.

2. Introduction

As announced in an earlier article [1], we are going to determine the requirements that image material must meet to create a successful deepfake. The source material for these tests consists of 720p YouTube videos showing George W. Bush and George Clooney. We take the face of Clooney (source) and put it on the face of Bush (destination). The calculation of the source material is based on a model trained with Donald Trump and Nicolas Cage.

The tests are divided into three categories:

- Amount of images
- Lighting
- Angles of source material

Each category contains multiple test cases, and each test case yields two videos as results. The first video shows the result generated with the default values, and the second shows a manually tweaked version to ensure the best possible result.

3. Amount of Images

The goal of this category is to determine the minimum number of images required to create a successful deepfake. The test cases are divided into 500, 2,000, and 5,000 images.
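To put these image counts into perspective, the following minimal sketch converts them into seconds of video footage. It assumes a frame rate of 24 frames per second, inferred from the 7-second, 168-image destination clip used in the test cases of this section, and it assumes by default that every frame yields a usable face image, which will rarely hold in practice. The function name `seconds_of_footage` and the parameter `usable_ratio` are hypothetical and used for illustration only.

```python
# Assumption: 24 frames per second, inferred from the destination clip
# used in the test cases (168 images in 7 seconds).
FPS = 168 / 7


def seconds_of_footage(images, fps=FPS, usable_ratio=1.0):
    """Return the seconds of video needed to obtain `images` face images.

    `usable_ratio` is a hypothetical parameter: the fraction of frames
    that contain a usable face (1.0 means every frame is usable).
    """
    if images < 0:
        raise ValueError("images must not be negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not 0 < usable_ratio <= 1:
        raise ValueError("usable_ratio must be in the range (0, 1]")
    return images / (fps * usable_ratio)


if __name__ == "__main__":
    for count in (500, 2000, 5000):
        seconds = seconds_of_footage(count)
        print(f"{count} images: about {seconds:.1f} s of footage")
```

Under these assumptions, 500 images correspond to roughly 21 seconds of footage, while 5,000 images correspond to about three and a half minutes.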
We would also like to determine whether a large number of images from the target video is necessary, or whether multiple target videos are required. The basic target video of George W. Bush is a YouTube video with the title Bush's Best Speech. George Clooney is represented by a mix of four different videos.

3.1. Source 500 images, Destination 7 seconds (168 images)

After 24 hours of computing, the results are the ones shown below. You can clearly see that manually tweaking the parameters increases the quality of the result.

3.2. Source 2,000 images, Destination 7 seconds (168 images)

After another 24 hours of computing, the result consists of the two videos shown below. Once again, manually tweaking the merge parameters produces much better results.

3.3. Source 5,000 images, Destination 7 seconds (168 images)

After another 24 hours, the following results can be shown. Once again, the default video is of lower quality.

3.4. Source 5,000 images, Destination 5,000 images

After training on 5,000 source images and 5,000 destination images for a day, the resulting model was applied in a one-minute conversion to the 7-second video used in the other test cases. As usual, the default values generate less convincing results.

However, it is not possible to determine any difference in quality between the 168-image version of Bush and the 5,000-image version of Bush. There is also no visible difference between 500 and 5,000 images of Clooney. This leads us to the conclusion that, in this setup, roughly 500 source images are sufficient to create a convincing deepfake.

4. Lighting

We shall now determine how lighting and shadows influence the quality of deepfakes. We have chosen material of George Clooney in which his face is partially in shadow, or in which the lighting of the source and destination videos differs. The video of George W. Bush remains the same YouTube video as used before.
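The two lighting conditions named above, a face that is partially in shadow and a lighting mismatch between source and destination, can be expressed as simple brightness measures. The paper does not describe how the material was selected, so the following is only an illustrative sketch: it assumes face images are given as rows of (R, G, B) tuples with values from 0 to 255 and uses the Rec. 601 luma weights as a brightness estimate. All function names are hypothetical.

```python
def luma(pixel):
    """Approximate brightness of an (R, G, B) pixel with Rec. 601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_luma(face):
    """Mean brightness of a face image given as rows of (R, G, B) tuples."""
    values = [luma(pixel) for row in face for pixel in row]
    if not values:
        raise ValueError("face image must contain at least one pixel")
    return sum(values) / len(values)


def half_shadow_gap(face):
    """Brightness gap between the left and right half of a face image.

    A large value hints at a face that is partially in shadow.
    """
    middle = len(face[0]) // 2
    left = [row[:middle] for row in face]
    right = [row[middle:] for row in face]
    return abs(mean_luma(left) - mean_luma(right))


def lighting_mismatch(source_face, destination_face):
    """Difference in mean brightness between a source and a destination face."""
    return abs(mean_luma(source_face) - mean_luma(destination_face))


if __name__ == "__main__":
    # Tiny synthetic 2x4 "images" for illustration, not real measurements.
    bright, dark = (200, 200, 200), (40, 40, 40)
    evenly_lit = [[bright] * 4 for _ in range(2)]
    half_shadow = [[bright, bright, dark, dark] for _ in range(2)]
    print("shadow gap:", half_shadow_gap(half_shadow))
    print("mismatch:", lighting_mismatch(half_shadow, evenly_lit))
```

Any threshold for what counts as too large a gap would have to be determined empirically; the sketch only shows how the two conditions could be quantified.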