Enhancing the Virtual Jewelry Try-On Experience with Computer Vision

Hrutika Patel, School of Engineering and Applied Science, Ahmedabad University, Ahmedabad, India, hrutika.p@ahduni.edu.in
Jap Purohit, School of Engineering and Applied Science, Ahmedabad University, Ahmedabad, India, jap.p@ahduni.edu.in
Sanket Patel, School of Engineering and Applied Science, Ahmedabad University, Ahmedabad, India, sanket.patel@ahduni.edu.in

Abstract—This study proposes a novel computer vision-based virtual try-on system for rings, earrings, and bracelets, offering a practical alternative to traditional jewelry buying. By integrating Google's Mediapipe library, it allows customers to try on jewelry from the comfort of their own homes, easing sizing concerns and providing an engaging, immersive experience with accurate placement and realism. This cost-effective method, built on Python tools such as rembg, OpenCV, and PIL, benefits both small jewelers and customers, and future work aims to support additional jewelry categories while further improving the virtual jewelry shopping experience.

Keywords—Computer Vision, Virtual Try-On, Mediapipe, Image Processing

I. INTRODUCTION

The popularity of augmented reality technology has led to a growing preference among consumers for online buying over physical store visits. Online jewelry shopping is convenient, but it also demands considerable time and effort because there are no in-person trials and sizing issues arise. Virtual try-on solutions resolve these problems, improve the effectiveness of the jewelry buying process, and help modernize and transform the industry [1]. With the help of these technologies, consumers can now confidently envision jewelry from home, increasing their global options.
However, present virtual try-on systems have two disadvantages. First, they cannot show how a given piece of jewelry would look using photographs from the internet, because virtual try-on requires 3D modeling and predefined options [2]. Second, managing an architecture in which different models can exist at the same time and be created immediately based on customer needs is a challenge in itself and is not cost-efficient given the technology involved [3] [4]. Therefore, to overcome these hurdles, a user-friendly and cost-effective system has been developed that serves customers and sellers without requiring them to manage high-level architectural systems. In the context of simplicity and resource efficiency, virtual try-on using computer vision is more cost-effective than 3D modeling and augmented reality try-on: it reduces production costs by removing the need for physical prototypes and offers a realistic experience without the expense associated with photorealistic graphics. It is superior to 3D modeling and does not demand substantial resources. This method is designed specifically for rings, bracelets, and earrings.

To address the growing demand for virtual jewelry try-ons, various computer vision methods, including Google's Mediapipe library, have been explored. This article describes the phases required to put this system in place in depth, offering a thorough grasp of the technique.

II. LITERATURE REVIEW

Computer vision methods have seen a revolution in recent years, notably in the context of virtual try-on systems, facilitating customers' discovery and purchase of fashion goods. This section provides an overview of present technologies and methodologies in this field, as well as notable developments and approaches.

A. Traditional Virtual Try-On Methods

Early efforts included 2D picture overlay and early 3D modeling [5] [6].
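The placement step described above can be sketched with plain NumPy. This is a minimal illustration, not the paper's implementation: it assumes a target landmark (x, y) has already been obtained (e.g. from Mediapipe's hand tracker) and alpha-blends an RGBA jewelry cutout, such as one produced by rembg, onto an RGB frame at that point. The function name `overlay_rgba` is ours, and boundary clipping is omitted for brevity.

```python
import numpy as np

def overlay_rgba(frame, jewel, x, y):
    """Alpha-blend an RGBA jewelry cutout onto an RGB frame,
    centered at pixel (x, y) -- e.g. a detected landmark.
    Illustrative sketch only: no edge-of-frame clipping."""
    jh, jw = jewel.shape[:2]
    top, left = y - jh // 2, x - jw // 2
    region = frame[top:top + jh, left:left + jw].astype(float)
    alpha = jewel[..., 3:4].astype(float) / 255.0   # per-pixel opacity in [0, 1]
    blended = alpha * jewel[..., :3] + (1.0 - alpha) * region
    frame[top:top + jh, left:left + jw] = blended.astype(np.uint8)
    return frame

# Toy example: 100x100 gray frame, 10x10 fully opaque red "ring" patch.
frame = np.full((100, 100, 3), 128, dtype=np.uint8)
jewel = np.zeros((10, 10, 4), dtype=np.uint8)
jewel[..., 0] = 255   # red channel
jewel[..., 3] = 255   # fully opaque alpha
out = overlay_rgba(frame, jewel, x=50, y=50)
print(out[50, 50])    # center pixel is now red: [255 0 0]
```

In a real pipeline the (x, y) coordinate would come from a per-frame landmark detector, and the transparent cutout from background removal, so the same blend runs live on webcam frames.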
While these developments enabled viewers to see digital representations of apparel and jewelry superimposed over static photographs, they only provided a brief glimpse of the products. Their acceptance was hampered by a lack of realism and user integration.

B. 3D Body Scanning and Modeling

Integration strategies dramatically enhanced the accuracy of virtual try-ons. They started with recording precise 3D reconstructions of consumers' bodies, which resulted in improved fit and visual appeal [7] [8]. For 3D body reconstruction, these approaches frequently rely on the following equation [9]:

P = M · Q    (1)

where P represents the observed 2D points on the user's body, M is the camera projection matrix, and Q denotes the 3D points of the user's body.

2024 IEEE Applied Sensing Conference (APSCON) | 979-8-3503-1727-5/24/$31.00 ©2024 IEEE | DOI: 10.1109/APSCON60364.2024.10465992
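The projection relation P = M · Q in Eq. (1) can be illustrated numerically. The sketch below is an illustration under assumed values, not the cited reconstruction method: M is a 3×4 pinhole projection matrix (intrinsics K times an identity pose), Q holds 3D body points in homogeneous coordinates, and the homogeneous result is divided by its last row to recover 2D pixel coordinates.

```python
import numpy as np

# M: 3x4 camera projection matrix, M = K [R | t] (assumed pinhole intrinsics).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])   # camera at the origin
M = K @ Rt

# Q: two 3D body points in homogeneous coordinates (4 x N).
Q = np.array([[0.1, -0.1],
              [0.2,  0.0],
              [2.0,  4.0],
              [1.0,  1.0]])

# P = M . Q gives homogeneous 2D points; dehomogenize to pixel coordinates.
P_h = M @ Q
P = P_h[:2] / P_h[2]
print(P.T)   # one (u, v) pixel per 3D point: [[360. 320.] [300. 240.]]
```

Note that recovering Q from observed P (the reconstruction direction the cited work needs) requires multiple views or priors, since a single projection discards depth.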