Blog(1)
Zero-shot sketch-based image retrieval (ZS-SBIR) is a central problem in sketch understanding [6]. This paper aims to tackle all problems associated with the current status quo for ZS-SBIR, including the category-level (standard) [4], fine-grained [1], and cross-dataset [3] settings.
Research Areas(0)
Publications(20)
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Author: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Published: European Conference on Computer Vision (ECCV)
Date: 2024-09-30
Modularized Multilingual NMT with Fine-grained Interlingua
Author: Sungjun Lim, Yoonjung Choi, Sangha Kim
Published: North American Chapter of the Association for Computational Linguistics (NAACL)
Date: 2024-06-20
Enabling Device Control Planning Capabilities of Small Language Model
Author: Sudipta Paul, Lingyu Zhang, Yilin Shen, Hongxia Jin
Published: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Date: 2024-04-14
News(4)
Recently, personalized AI systems have gained significant attention. In the TTS field, zero-shot text-to-speech (ZS-TTS) systems [1-7] enable users to create their own TTS systems that replicate their voices with just one utterance, without further training.
Large Language Models (LLMs) have showcased impressive capabilities in text generation, translation, and code synthesis. Recent efforts focus on integrating LLMs, notably ChatGPT, into robotics for tasks like zero-shot system planning [1].
In recent years, text-to-speech (TTS) has improved remarkably with the emergence of various end-to-end TTS models [1, 2, 3]. With these advanced models, TTS has expanded from models built with a professional voice actor to personalized TTS.
Others(0)