Official implementation of the paper Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos, published at ICML 2024.
[project page] [paper]
The current standard of unsupervised domain adaptation lacks a mechanism for incorporating text guidance. We propose a novel framework called LaGTrAn, which leverages natural language to guide the transfer of discriminative knowledge from labeled source to weakly labeled target domains in image and video classification tasks. Despite its simplicity, LaGTrAn is highly effective on a variety of benchmarks including GeoNet and DomainNet. We also introduce a new benchmark called Ego2Exo to facilitate robustness studies across viewpoint variations in videos, and show LaGTrAn's efficiency in this novel transfer setting. This repository contains the original source code used to train the language and image classifier models in LaGTrAn, as well as the trained models.
You can use the requirements.txt file to create a new environment or install the required packages into an existing environment (see the setup sketch after this list). The following versions are recommended:
- PyTorch>=2.0
- torchvision>=0.14.1
- timm==0.9.10
- transformers>=4.30.1
- tokenizers==0.19.1
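For reference, a minimal setup sketch; the environment name and Python version here are illustrative assumptions, not requirements of the repository:

```bash
# Create and activate a fresh environment (conda shown; a plain venv works as well).
conda create -n lagtran python=3.10 -y
conda activate lagtran

# Install the dependencies listed in the repository.
pip install -r requirements.txt
```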
You can access the textual captions and metadata used in our work via the links below.
Download the metadata and place the files inside a folder named metadata. You can download the original images from the respective webpages: GeoNet and DomainNet.
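As an illustration, the expected layout could look like the sketch below; the downloaded file names depend on the archives behind the links above and are not shown:

```bash
# Illustrative only: keep the downloaded metadata next to the code in a folder named metadata.
mkdir -p metadata
# mv <downloaded_metadata_files> metadata/   # placeholder; use the actual downloaded files
```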
We leverage the recently proposed Ego-Exo4D dataset to create a new benchmark called Ego2Exo to study ego-exo transfer in videos. Ego2Exo contains videos from both egocentric and exocentric viewpoints, and is designed to facilitate robustness studies across viewpoint variations in videos. Please refer to this page for the dataset and metadata for the Ego2Exo benchmark.
Training LaGTrAn proceeds in two phases: we first train the text classifier module, and then use the pseudo-labels derived from it to train the image classification module. Note that the metadata downloaded above is required for both phases.
Text Classification on
- GeoNet:
python3 text_classification.py --dataset [GeoPlaces|GeoImnet] --source usa --target asia --root_dir <data_dir>
- DomainNet:
python3 text_classification.py --dataset DomainNet --source real --target clipart --root_dir <data_dir>
The trained BERT checkpoints along with the pseudo-labels should be downloaded into bert_checkpoints and pseudo_labels, respectively. The pseudo-labels can then be used to train the downstream adaptation network as follows.
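As a small sketch of the assumed layout (the folder names are taken from the text above; the file names are placeholders):

```bash
# Create the folders the downstream training expects and move the downloaded artifacts into them.
mkdir -p bert_checkpoints pseudo_labels
# mv <downloaded_bert_checkpoint> bert_checkpoints/    # placeholder path
# mv <downloaded_pseudo_label_file> pseudo_labels/     # placeholder path
```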
Domain Adaptation on GeoNet:
python3 train.py --config configs/lagtran.yml --source usa --target asia --dataset [GeoPlaces|GeoImnet] --data_root <data_dir> --exp_name <exp_name> --trainer lagtran
Domain Adaptation on DomainNet:
python3 train.py --config configs/lagtran.yml --source real --target clipart --dataset DomainNet --data_root <data_dir> --exp_name <exp_name> --trainer lagtran
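Putting the two phases together, a minimal end-to-end sketch for a single transfer pair (GeoPlaces, usa -> asia); <data_dir> and <exp_name> are placeholders exactly as in the commands above:

```bash
# Phase 1: train the text classifier module and derive pseudo-labels for the target domain.
python3 text_classification.py --dataset GeoPlaces --source usa --target asia --root_dir <data_dir>

# Phase 2: use the derived pseudo-labels to train the image classification module.
python3 train.py --config configs/lagtran.yml --source usa --target asia --dataset GeoPlaces \
    --data_root <data_dir> --exp_name <exp_name> --trainer lagtran
```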
You can directly download the target-adapted models (along with the training logs) for the GeoNet dataset at the following links. All models use a ViT-B/16 backbone pre-trained on ImageNet and trained using LaGTrAn.
| | USA -> Asia | Asia -> USA |
|---|---|---|
| GeoPlaces | 56.14 (Link) | 57.02 (Link) |
| GeoImnet | 63.67 (Link) | 64.16 (Link) |
If you just want to compute the accuracy using the pre-trained models, you may download the models and use the following command.
python3 test.py --config configs/test.yml --target asia --data_root <data_dir> --saved_model <checkpoint_dir>/best_model.pkl --dataset [GeoPlaces|GeoImnet]
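For example, to evaluate a downloaded GeoPlaces USA -> Asia checkpoint (the checkpoint directory name below is illustrative):

```bash
python3 test.py --config configs/test.yml --target asia --data_root <data_dir> \
    --saved_model checkpoints/geoplaces_usa2asia/best_model.pkl --dataset GeoPlaces
```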
If this code or our work is useful in your research, please consider citing us:
@article{kalluri2024lagtran,
  author  = {Kalluri, Tarun and Majumder, Bodhisattwa and Chandraker, Manmohan},
  title   = {Tell, Don't Show! Language Guidance Eases Transfer Across Domains in Images and Videos},
  journal = {ICML},
  year    = {2024},
  url     = {https://arxiv.org/abs/2403.05535},
}
If you have any questions about this project, please contact Tarun Kalluri.

