Enhancing the Virtual Jewelry Try-On Experience
with Computer Vision
Hrutika Patel
School of Engineering
and Applied Science
Ahmedabad University
Ahmedabad, India
hrutika.p@ahduni.edu.in
Jap Purohit
School of Engineering
and Applied Science
Ahmedabad University
Ahmedabad, India
jap.p@ahduni.edu.in
Sanket Patel
School of Engineering
and Applied Science
Ahmedabad University
Ahmedabad, India
sanket.patel@ahduni.edu.in
Abstract—This study proposes a novel computer vision-based
virtual try-on system for rings, earrings, and bracelets, offering
an alternative to traditional jewelry shopping. By integrating
Google’s Mediapipe library, it allows customers to try on jewelry
from the comfort of their own homes, alleviating sizing concerns
and providing an engaging, immersive experience with accurate
placement and realism. This cost-effective method, built on
Python tools such as rembg, OpenCV, and PIL, benefits both
small jewelers and customers; future work will extend support
to additional jewelry categories and further improve the virtual
jewelry shopping experience.
Keywords—Computer Vision, Virtual Try-On, Mediapipe, Im-
age Processing
I. INTRODUCTION
The popularity of augmented reality technology has led to
a growing consumer preference for online shopping over
physical store visits. Online jewelry shopping is convenient,
yet it still demands considerable time and effort because
in-person trials are unavailable and sizing is uncertain. Virtual
try-on solutions address these problems, improving the
efficiency of the jewelry buying process and helping to
modernize and transform the industry [1]. With these
technologies, consumers can confidently envision jewelry
from home, widening their global options.
However, present virtual try-on systems have two
disadvantages. First, they cannot show how an arbitrary piece
of jewelry found online would appear, because virtual try-on
typically requires 3D modeling and a set of predefined
options [2]. Second, maintaining an architecture in which
different models coexist at the same time and are created on
demand to match customer needs is a challenge in itself and
is not cost-efficient given rising technology costs [3] [4].
To overcome these hurdles, a user-friendly and cost-effective
system has been developed that serves customers and sellers
without the burden of managing high-level architectural
systems. In terms of simplicity and resource efficiency,
computer vision-based virtual try-on is more cost-effective
than 3D modeling and augmented reality try-on: it eliminates
the need for physical prototypes and offers a realistic
experience without the expense associated with photorealistic
graphics, while requiring far fewer resources than 3D
modeling. This method is designed specifically for rings,
bracelets, and earrings.
To address the growing demand for virtual jewelry try-ons,
various computer vision methods, including Google’s
Mediapipe library, have been explored. This article describes
in detail the phases required to implement such a system,
offering a thorough understanding of the technique.
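Mediapipe’s landmark detectors report points as normalized coordinates in [0, 1] relative to the frame, so placing a jewelry overlay first requires mapping a landmark back to pixels. The following is a minimal sketch of that step; the function name and example values are illustrative, not taken from the paper:

```python
def landmark_to_pixel(norm_x, norm_y, frame_width, frame_height):
    """Map a Mediapipe-style normalized landmark to pixel coordinates.

    Coordinates are clamped to the frame so that a jewelry overlay
    anchored at this point never indexes outside the image.
    """
    x = int(norm_x * frame_width)
    y = int(norm_y * frame_height)
    x = max(0, min(frame_width - 1, x))
    y = max(0, min(frame_height - 1, y))
    return x, y

# Example: a ring-finger base landmark at (0.52, 0.61) on a 640x480 frame.
print(landmark_to_pixel(0.52, 0.61, 640, 480))  # -> (332, 292)
```

In a full pipeline, the returned pixel coordinates would serve as the anchor point at which the jewelry image is composited onto the camera frame.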
II. LITERATURE REVIEW
Computer vision methods have seen a revolution in recent
years, notably in the context of virtual try-on systems, facil-
itating customers’ discovery and purchase of fashion goods.
This section includes an overview of present technology and
methodologies in this subject, as well as notable developments
and approaches.
A. Traditional Virtual Try-On Methods
Early efforts included 2D image overlay and early 3D
modeling [5] [6]. While these techniques let viewers see
digital representations of apparel and jewelry superimposed
on static photographs, they offered only a limited impression
of the products, and their adoption was hampered by a lack
of realism and user integration.
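At their core, 2D overlay methods reduce to alpha compositing: each jewelry pixel is blended over the photo pixel according to its transparency. A minimal per-pixel sketch (values and names are illustrative only):

```python
def alpha_blend(fg, bg, alpha):
    """Composite one RGB foreground (jewelry) pixel over a background
    (photo) pixel.

    alpha is in [0, 1]: 1.0 keeps the jewelry pixel entirely,
    0.0 keeps the underlying photo pixel.
    """
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))

# A gold-toned jewelry pixel composited half-transparently
# over a skin-toned background pixel.
print(alpha_blend((212, 175, 55), (224, 172, 105), 0.5))
```

Libraries such as PIL and OpenCV perform this blend over whole images (e.g. via the alpha channel of a PNG), but the arithmetic per pixel is exactly this weighted average.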
B. 3D Body Scanning and Modeling
3D body scanning and modeling dramatically improved the
accuracy of virtual try-ons by capturing precise 3D
reconstructions of consumers’ bodies, yielding better fit and
visual appeal [7] [8]. For 3D body reconstruction, these
approaches frequently rely on the following equation [9]:
P = M · Q (1)
where P represents the observed 2D points on the user’s body,
M is the camera projection matrix, and Q denotes the 3D
points of the user’s body.
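To make Eq. (1) concrete, the sketch below projects one homogeneous 3D point Q through a toy pinhole camera matrix M; the focal length and principal point are assumed values for illustration, not parameters from the cited work:

```python
def project(M, Q):
    """Apply Eq. (1), P = M . Q: project a homogeneous 3D point Q
    through the 3x4 camera projection matrix M, then divide by the
    homogeneous coordinate to obtain 2D pixel coordinates."""
    p = [sum(m * q for m, q in zip(row, Q)) for row in M]
    return (p[0] / p[2], p[1] / p[2])

# Toy pinhole camera: focal length 500 px, principal point (320, 240).
M = [
    [500,   0, 320, 0],
    [  0, 500, 240, 0],
    [  0,   0,   1, 0],
]
Q = [0.2, -0.1, 2.0, 1.0]  # a 3D point 2 m in front of the camera

print(project(M, Q))  # -> (370.0, 215.0)
```

Fitting a body model then amounts to choosing the 3D points Q (and possibly M) so that the projected points match the observed 2D points P as closely as possible.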
2024 IEEE Applied Sensing Conference (APSCON) | 979-8-3503-1727-5/24/$31.00 ©2024 IEEE | DOI: 10.1109/APSCON60364.2024.10465992