An Approach for Recommending Similar Fashion Products.

Table of Contents:

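1. Business Problem
2. Mapping Real-World Problem As Machine Learning Problem
3. Dataset Overview
4. Exploratory Data Analysis and Preprocessing
5. Module 1: Pose Detection
6. Module 2: Fashion Article Detection
7. Module 3: Similar Image Recommendation
8. Future Work
9. Conclusion
10. References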

1. Business Problem:

E-fashion influencers regularly post images of their outfits, and users often want to mimic those looks. Users are generally interested in buying the entire look rather than a single item. Given a full-pose image of a model, the goal of our method is to recommend similar fashion products for every fashion article the model is wearing in the full-shot image.

2. Mapping Real-World Problem As Machine Learning Problem:

This is challenging because, in contrast to recommending products for a single primary article in the query, we need to recommend similar products for each fashion article worn by the model. The main problem can be divided into the following stages: pose detection (Module 1), fashion article detection (Module 2), and similar image recommendation (Module 3).

3. Dataset Overview:

For this task, we used the publicly available Street2Shop dataset. It contains 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, with a total of 39,479 clothing item matches between street and shop photos. Each street photo is annotated with the bounding box of the clothing item and is paired with a matching shop image. We chose this dataset because it provides both bounding boxes and matching images, so a single dataset can be used in all stages.

4. Exploratory Data Analysis and Preprocessing

The Street2Shop dataset consists of 20,357 street images and 404,683 shop images. It covers 11 clothing categories: bags, belts, dresses, eyewear, footwear, hats, leggings, outerwear, pants, skirts, and tops.
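As an illustration of the preprocessing needed before training a detector, the sketch below converts one pixel-space bounding box into a YOLO-style label line (class index followed by the normalized box center, width, and height). It assumes the annotation stores the box as `left`, `top`, `width`, and `height` in pixels; those field names, the alphabetical category order, and the example values are assumptions made for this sketch, not details taken from the dataset files.

```python
# Category order is an assumption; any fixed order works as long as it is used consistently.
CATEGORIES = ["bags", "belts", "dresses", "eyewear", "footwear", "hats",
              "leggings", "outerwear", "pants", "skirts", "tops"]


def bbox_to_yolo(bbox, img_w, img_h, class_id):
    """Convert a pixel-space box to a YOLO label line.

    bbox is assumed to be a dict with 'left', 'top', 'width', 'height' in pixels.
    The result is "class x_center y_center width height", all normalized to [0, 1].
    """
    x_center = (bbox["left"] + bbox["width"] / 2) / img_w
    y_center = (bbox["top"] + bbox["height"] / 2) / img_h
    width = bbox["width"] / img_w
    height = bbox["height"] / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical annotation for a 400x600 street photo containing a dress.
example_bbox = {"left": 100, "top": 50, "width": 200, "height": 300}
label_line = bbox_to_yolo(example_bbox, img_w=400, img_h=600,
                          class_id=CATEGORIES.index("dresses"))
print(label_line)  # 2 0.500000 0.333333 0.500000 0.500000
```

Each street image would get one such line per annotated item in its label file.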

5. Module 1: Pose Detection

Our problem is to recommend similar products for the whole set of fashion articles in an image rather than for one particular article. We therefore first need to ensure that the input is a full-pose image, and only then process it to detect the individual fashion articles. This requires estimating the positions of key body parts: for a given image of a person, pose estimation maps the positions of the elbows, shoulders, knees, ankles, and so on. The same information could also be used to predict whether the person is standing, walking, or dancing. For this task, we used a pre-trained PoseNet model to check that the given image is a full-pose image.
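One way to turn keypoints into a full-pose decision is sketched below. It assumes the pose model returns a list of keypoints, each with a `part` name and a confidence `score`, which mirrors the output of the TensorFlow.js PoseNet model; a Python port may use a different structure. The rule itself (head, shoulders, hips, knees, and ankles must all be detected above a threshold) and the threshold value are assumptions for illustration, not necessarily the rule used in this project.

```python
# Body parts that must be visible for the image to count as a full pose (assumed rule).
REQUIRED_PARTS = ("nose", "leftShoulder", "rightShoulder", "leftHip", "rightHip",
                  "leftKnee", "rightKnee", "leftAnkle", "rightAnkle")


def is_full_pose(keypoints, min_score=0.5, required=REQUIRED_PARTS):
    """Return True if every required body part was detected with enough confidence."""
    scores = {kp["part"]: kp["score"] for kp in keypoints}
    # A part missing from the output is treated as not detected (score 0).
    return all(scores.get(part, 0.0) >= min_score for part in required)


# Hypothetical detections: a full-shot image and one cropped at the knees.
full_shot = [{"part": part, "score": 0.9} for part in REQUIRED_PARTS]
cropped = [{"part": part, "score": 0.1 if "Ankle" in part else 0.9}
           for part in REQUIRED_PARTS]

print(is_full_pose(full_shot))  # True
print(is_full_pose(cropped))    # False
```

Images that fail the check can be rejected before the detection stage runs.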

6. Module 2: Fashion Article Detection

Next, we need to detect the different fashion articles present in the image. Many object detection algorithms are available for this, such as R-CNN, Faster R-CNN, and YOLO. We used YOLO-based models for fashion article detection. To train on custom classes, the model's configuration file is edited as follows:

  • set subdivisions=16 for training and subdivisions=1 for testing
  • set the network size to width=608 height=608, or any other multiple of 32
  • change max_batches to classes*2000, but not less than the number of training images and not less than 6000, e.g. max_batches=6000 if you train for 3 classes
  • change steps to 80% and 90% of max_batches, e.g. steps=4800,5400
  • change classes=80 to your number of object classes in each of the 3 [yolo] layers
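The arithmetic behind these settings can be sketched as a small helper. The function name and the example image count are hypothetical; only the rules themselves come from the list above.

```python
def yolo_cfg_values(num_classes, num_train_images):
    """Compute max_batches and steps from the rules listed above."""
    # classes*2000, but not less than the number of training images or 6000.
    max_batches = max(num_classes * 2000, num_train_images, 6000)
    # Steps are 80% and 90% of max_batches (integer arithmetic avoids rounding surprises).
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return {"classes": num_classes, "max_batches": max_batches, "steps": steps}


# The 3-class example from the list, with a hypothetical 2,000 training images.
print(yolo_cfg_values(3, 2000))
# {'classes': 3, 'max_batches': 6000, 'steps': (4800, 5400)}

# The 11 Street2Shop categories, with a hypothetical 15,000 training images.
print(yolo_cfg_values(11, 15000))
# {'classes': 11, 'max_batches': 22000, 'steps': (17600, 19800)}
```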

7. Module 3: Similar Image Recommendation

Having extracted the relevant fashion articles from the full-shot look image, we now need to retrieve similar fashion products from the image database. To do this, we need a common embedding model that places similar articles close together and pushes dissimilar ones apart. We use a triplet-based network architecture to learn these embeddings.
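The idea behind triplet training and embedding-based retrieval can be sketched in a few lines. The loss below is the standard hinge form, max(0, d(anchor, positive) - d(anchor, negative) + margin), with Euclidean distance; the margin value, the use of unsquared distance, and the toy 2-D embeddings are assumptions for illustration. In practice the embeddings would come from the trained network rather than being written by hand.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query, catalog, k=2):
    """Return the ids of the k catalog items closest to the query embedding."""
    return sorted(catalog, key=lambda item_id: euclidean(query, catalog[item_id]))[:k]


# Toy embeddings (hypothetical): a well-separated triplet and a hard one.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative is far away
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # negative as close as positive

catalog = {"shirt_a": [0.0, 1.0], "shirt_b": [0.1, 0.9], "shoe_a": [5.0, 5.0]}
nearest = retrieve([0.0, 1.0], catalog, k=2)
print(easy, hard, nearest)
```

The easy triplet contributes no loss, while the hard one is penalized by the margin, which is what pulls matching street and shop crops together during training. At query time, each detected article is embedded and its nearest catalog items are returned as recommendations.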

8. Future Work

  1. In the future, we can experiment with a conditional similarity network to improve the performance of the triplet network.
  2. We can also implement a locality-sensitive hashing technique to retrieve similar images much faster.
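As a rough sketch of the second item, the code below implements one common form of locality-sensitive hashing, random hyperplane hashing: each embedding is reduced to a short bit signature, and only items in the same bucket as the query are compared exactly. The function names, number of bits, and toy embeddings are all assumptions for illustration; this is not an implementation from this project.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplanes (as normal vectors) in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
                 for plane in planes)


def build_index(embeddings, planes):
    """Group item ids by signature so similar vectors tend to share a bucket."""
    buckets = defaultdict(list)
    for item_id, vec in embeddings.items():
        buckets[signature(vec, planes)].append(item_id)
    return buckets


def candidates(query, planes, buckets):
    """Return only the items in the query's bucket, instead of scanning the whole catalog."""
    return buckets.get(signature(query, planes), [])


# Toy 3-D embeddings (hypothetical values).
planes = make_hyperplanes(dim=3, num_bits=8)
embeddings = {"dress_1": [1.0, 0.2, 0.1], "dress_2": [2.0, 0.4, 0.2],
              "bag_1": [-1.0, -0.2, -0.1]}
index = build_index(embeddings, planes)
print(candidates([1.0, 0.2, 0.1], planes, index))
```

Vectors pointing in the same direction always share a signature, so `dress_1` and `dress_2` land in the same bucket. Nearby but not identical directions collide only with some probability, which is why practical systems usually combine several hash tables.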

9. Conclusion

10. References

  1. https://arxiv.org/pdf/2008.11638.pdf