Abstract: Vision Transformers process all image patches uniformly, which often leads to inefficiency because the model learns unnecessary background information. To address this, this work proposes a preprocessing ...