Pose and Attribute Consistent Person Image Synthesis
CHENG XU, South China University of Technology, China
ZEJUN CHEN, South China University of Technology, China
JIAJIE MAI, King’s College, UK
XUEMIAO XU∗, South China University of Technology, China, State Key Laboratory of Subtropical Building Science, China, Ministry of Education Key Laboratory of Big Data and Intelligent Robot, China, and Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, China
SHENGFENG HE∗, South China University of Technology, China

Person Image Synthesis aims to transfer the appearance of the person in a source image to a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: 1) synthesis distortion caused by the entanglement of pose and appearance information among different body components; and 2) failure to preserve the original semantics (e.g., the same outfit). In this paper, we explicitly address these two problems by proposing a Pose and Attribute Consistent Person Image Synthesis Network (PAC-GAN). First, to reduce the ambiguity in matching pose and appearance, we propose a two-stage component-wise transferring model. The first stage focuses only on synthesizing the target pose, while the second renders the target appearance by explicitly transferring appearance information from the source image to the target image component by component. Because pose and appearance synthesis are disentangled per component, source-target matching ambiguity is eliminated. Second, to maintain attribute consistency, we represent the input image as an attribute vector and use this vector to impose a high-level semantic constraint that regularizes the target synthesis. Extensive experiments on the DeepFashion dataset demonstrate the superiority of our method over state-of-the-art methods, especially in maintaining pose and attribute consistency under large pose variations.
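The component-wise appearance transfer and the attribute constraint described above can be illustrated with a minimal NumPy sketch. This is not the PAC-GAN implementation. It assumes that an appearance encoder yields a source feature map of shape (C, H, W), that the K body components are given as disjoint binary masks for both the source and the target layout (e.g., from a human parser), that each component's appearance is summarized by masked average pooling (a common region-wise pooling scheme swapped in here for illustration), and that the attribute constraint is a binary cross-entropy between a hypothetical attribute classifier's logits on the synthesized image and the source attribute vector. All function names are hypothetical.

```python
import numpy as np

def component_codes(features, masks):
    """Masked average pooling: one appearance vector per body component.

    features: (C, H, W) source feature map (hypothetical encoder output).
    masks:    (K, H, W) binary masks of the K components in the source image.
    Returns:  (K, C) codes; a component absent from the image gets a zero vector.
    """
    masks = masks.astype(float)
    area = masks.sum(axis=(1, 2))                       # pixels per component, (K,)
    summed = np.einsum("khw,chw->kc", masks, features)  # feature sum per component, (K, C)
    return summed / np.maximum(area, 1.0)[:, None]      # avoid division by zero

def paint_components(codes, target_masks):
    """Write each component's code into that component's region in the target layout.

    codes:        (K, C) appearance codes pooled from the source image.
    target_masks: (K, H, W) disjoint binary masks of the same K components in the target pose.
    Returns:      (C, H, W) appearance map; pixels covered by no mask stay zero.
    """
    return np.einsum("kc,khw->chw", codes, target_masks.astype(float))

def attribute_consistency_loss(pred_logits, source_attrs):
    """Numerically stable binary cross-entropy with logits, averaged over attributes.

    pred_logits:  (A,) logits of a hypothetical attribute classifier on the synthesized image.
    source_attrs: (A,) binary attribute vector of the source image.
    """
    x = np.asarray(pred_logits, dtype=float)
    y = np.asarray(source_attrs, dtype=float)
    per_attr = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(per_attr.mean())

# Toy usage: two components occupy one pixel each; the target pose swaps their positions.
feats = np.array([[[1.0, 3.0]]])                      # C=1, H=1, W=2
src_masks = np.array([[[1, 0]], [[0, 1]]])            # K=2 components
codes = component_codes(feats, src_masks)             # [[1.], [3.]]
painted = paint_components(codes, src_masks[::-1])    # [[[3., 1.]]]
```

Because each code is pooled only over its own component mask, moving a component to a new location in the target layout carries its appearance along without mixing it with neighboring components, which is the kind of disentanglement the two-stage design relies on.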
CCS Concepts: • Computing methodologies → Computer vision; Artificial intelligence; Image processing.
Additional Key Words and Phrases: Image synthesis, Image editing, Pose transfer, Generative adversarial network

1 INTRODUCTION
Person Image Synthesis is a challenging task that transfers the person in a source image to a novel target pose. It is important because of its many potential applications, e.g., image/video editing [45, 49], virtual try-on [4, 15], and person re-identification (Re-ID) [51]. Existing methods formulate this problem as an image-pose

∗ Corresponding authors.
Authors’ addresses: Cheng Xu, South China University of Technology, Guangzhou, Guangdong, China, cschengxu@gmail.com; Zejun Chen, South China University of Technology, Guangzhou, Guangdong, China, darkhorsezzz@163.com; Jiajie Mai, King’s College, London, UK, k20035517@kcl.ac.uk; Xuemiao Xu, xuemx@scut.edu.cn, South China University of Technology, Guangzhou, Guangdong, China and State Key Laboratory of Subtropical Building Science, Guangzhou, Guangdong, China and Ministry of Education Key Laboratory of Big Data and Intelligent Robot, Guangzhou, Guangdong, China and Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, Guangzhou, Guangdong, China; Shengfeng He, South China University of Technology, Guangzhou, Guangdong, China, hesfe@scut.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2022 Association for Computing Machinery. 1551-6857/2022/8-ART111 $15.00 https://doi.org/10.1145/3554739 ACM Trans. Multimedia Comput. Commun. Appl.