NaiLIA: Multimodal Nail Design Retrieval
Based on Dense Intent Descriptions

Under Review
Anonymous EMNLP submission

Links will be added as soon as possible.
For now, our dataset, code, and additional report are provided as supplementary materials.

Abstract

We focus on the task of retrieving nail design images based on dense intent descriptions, which represent long and multi-layered user intent for nail designs. This is challenging because such descriptions specify flexibly created paintings and pre-manufactured embellishments, as well as visual characteristics, spatial relationships, themes, and overall impressions. Existing vision-and-language foundation models often struggle to incorporate such multi-layered intent descriptions.

To address this, we propose NaiLIA, a multimodal retrieval method for nail design images, which comprehensively aligns with dense intent descriptions. Our approach introduces a relaxed loss based on confidence scores for unlabeled images that can align with the descriptions.

To evaluate NaiLIA, we constructed a benchmark consisting of 10,625 images collected from people with diverse cultural backgrounds. The images were annotated with long and dense intent descriptions given by over 200 annotators. Experimental results demonstrate that the proposed method outperforms standard methods.

NAIL-STAR Task

I want nails with a mermaid theme, using light blue as the base color. Please draw a mermaid fin on the middle finger and a seashell on the ring finger. Add pearl nail accessories to the seashell. I'd like a fresh, glossy, and shiny look.

Retrieved images: Rank 1, Rank 2, Rank 3, Rank 4, Rank 5

In this study, we define the Nail design Semantic Text-image Aligned Retrieval (NAIL-STAR) task as follows: given a dense intent description for a nail design, the goal is to rank the nail design images that align with the description near the top of the output image list. This is difficult because a nail design generally consists of a painted portion, which allows for creative flexibility, and a decorative portion, which can only be modified through the selection and arrangement of pre-manufactured embellishments. Furthermore, the descriptions often include the themes and spatial relationships of the designs, in addition to visual characteristics.

Given the description provided above, the model should retrieve the image enclosed in a green frame. The design is painted with expressive creativity ("fins and shells with a light blue appearance"), and the shells are decorated with pre-manufactured embellishments ("pearl nail accessories"). These elements symbolize a theme ("mermaid"), forming an overall impression ("a fresh and sparkling look").
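The ranking step in the task definition can be illustrated with a minimal sketch. It assumes the description and each image have already been embedded into a shared vector space and that images are ranked by cosine similarity. The function names, the toy embeddings, and the recall@K check are illustrative assumptions, not the NaiLIA implementation or its reported metrics.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_images(text_emb, image_embs):
    """Return image ids sorted from most to least similar to the description."""
    scores = {img_id: cosine(text_emb, emb) for img_id, emb in image_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def recall_at_k(ranked_ids, target_id, k):
    """1.0 if the labeled target image appears in the top k, else 0.0."""
    return 1.0 if target_id in ranked_ids[:k] else 0.0

# Toy, made-up embeddings (hypothetical; not taken from the benchmark).
text_emb = [1.0, 0.0]
image_embs = {"img_a": [0.9, 0.1], "img_b": [0.0, 1.0], "img_c": [0.5, 0.5]}
ranked = rank_images(text_emb, image_embs)
```

In this toy setup, an image that matches the description but is not the labeled target (an unlabeled positive) could still appear above the target, which is one reason a rank-based score alone may understate retrieval quality.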

Please put long fake nails on my nails and make them pink only at the base and the rest should be a fancy design with strawberries.

Target nail design image

Unlabeled positive

The terminology used in this study is defined as follows: a target nail design image is a nail design image that aligns with the dense intent description and is explicitly labeled as a positive. An unlabeled positive is a nail design image that could be considered a target nail design image but lacks explicit labeling.

NaiLIA


We propose NaiLIA, a multimodal retrieval method for nail design images based on dense intent descriptions. It differs from existing approaches in the following aspects. First, NaiLIA estimates confidence scores of unlabeled positives and incorporates these scores into a loss function. This can make training more efficient by avoiding undesired anti-correlation between descriptions and images that should be correlated. Second, NaiLIA decomposes the dense intent descriptions and structures natural language descriptions of nail design images to align them in a multi-layered manner.
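One way to picture a confidence-relaxed loss is the sketch below. It is not the loss used by NaiLIA, whose exact form is not given on this page. It assumes an InfoNCE-style contrastive objective in which each non-target image's term in the denominator is scaled by one minus its confidence of being an unlabeled positive. The function name, the weighting scheme, and the temperature value are all hypothetical.

```python
import math

def relaxed_contrastive_loss(sims, target_idx, confidences, temperature=0.1):
    """InfoNCE-style loss for one description (illustrative assumption only).

    sims[j]        : similarity between the description and image j
    target_idx     : index of the labeled target image
    confidences[j] : estimated confidence in [0, 1] that image j is an
                     unlabeled positive (ignored for the target itself)
    """
    pos = math.exp(sims[target_idx] / temperature)
    neg = 0.0
    for j, s in enumerate(sims):
        if j == target_idx:
            continue
        # A likely unlabeled positive is pushed away less strongly.
        neg += (1.0 - confidences[j]) * math.exp(s / temperature)
    return -math.log(pos / (pos + neg))

# Toy similarities: image 1 is close to the target and may be an unlabeled positive.
sims = [0.8, 0.7, 0.1]
loss_standard = relaxed_contrastive_loss(sims, 0, [0.0, 0.0, 0.0])
loss_relaxed = relaxed_contrastive_loss(sims, 0, [0.0, 0.9, 0.0])
```

With all confidences at zero this reduces to the standard contrastive loss. Raising the confidence for image 1 lowers its penalty, so the model is not forced to separate the description from an image that plausibly matches it.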

Qualitative Results

I'd like a colorful and flashy nail design. Please add a large flower nail stone to the ring finger. The tips of the nail tips should be square-shaped.
I'd like my nails to have a cute, teenage vibe. I'd love a pink base with floral patterns and maybe some character accessories. Can we do a long nail shape?
I would like nails that are transparent at the bottom and white with a square shape at the top, with pink hearts drawn on them.
I'd like to use colors that transition from blue to purple, creating a space-like design. For the thumbs and ring fingers, I'd like star-shaped glitter added.
I would like Halloween-themed nails with a solid black color, a white and orange gingham check, and a design with gray nails featuring gingham check pumpkins. Please finish all the nails with a matte top coat.
I would like purple nails, but one of the fingers should be transparent with icons of butterflies. Add glitters on one finger.

Quantitative Results


BibTeX

Coming soon...