Abstract: This paper aims to improve the speaker similarity of converted voice in a zero-shot voice conversion (VC) framework. To this end, an approach of utilizing pitch information into Perturbation ...
Abstract: This paper proposes a novel speech emotion recognition (SER) method that fully leverages the architecture of Whisper, a large-scale automatic speech recognition (ASR) model. The conventional ...
This repository contains code and models for vision transformers that generate representations which not only do well for standard recognition tasks (classification, segmentation), but also support ...