Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos (ICML 2024)

Official implementation of the paper Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos, published at ICML 2024.


Abstract

The current standard of unsupervised domain adaptation lacks a mechanism for incorporating text guidance. We propose a novel framework called LaGTran, which leverages natural language to guide the transfer of discriminative knowledge from labeled source to weakly labeled target domains in image and video classification tasks. Despite its simplicity, LaGTran is highly effective on a variety of benchmarks including GeoNet and DomainNet. We also introduce a new benchmark called Ego2Exo to facilitate robustness studies across viewpoint variations in videos, and show LaGTran's efficiency in this novel transfer setting. This repository contains the original source code used to train the language and image classifier models in LaGTran, as well as the trained models.


Requirements

You can use the requirements.txt file to create a new environment or to install the required packages into an existing environment; a minimal setup is sketched after the list below. The following versions are recommended:

  1. PyTorch>=2.0
  2. torchvision>=0.14.1
  3. timm==0.9.10
  4. transformers>=4.30.1
  5. tokenizers==0.19.1
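
A minimal setup with conda could look like the following; the environment name and Python version are illustrative choices, not requirements of this repository:

conda create -n lagtran python=3.10
conda activate lagtran
pip install -r requirements.txt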

Datasets and metadata

You can access the textual captions and metadata used in our work via the following links.

  1. GeoNet: GeoPlaces | GeoImnet | GeoUniDA
  2. DomainNet: Metadata.

Download the metadata and place it inside a folder named metadata. You can download the original images from the respective webpages: GeoNet and DomainNet.
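
Presumably, the scripts read the captions from the metadata folder and the images from the directory passed via --root_dir / --data_root. A rough sketch of this layout (organize the images as provided by the official dataset downloads):

metadata/      textual captions and metadata downloaded above
<data_dir>/    original GeoNet / DomainNet images, passed as --root_dir / --data_root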

Ego2Exo: A new video adaptation benchmark.

We leverage the recently proposed Ego-Exo4D dataset to create a new benchmark called Ego2Exo to study ego-exo transfer in videos. Ego2Exo contains videos from both egocentric and exocentric viewpoints, and is designed to facilitate robustness studies across viewpoint variations in videos. Please refer to this page for the dataset and metadata for the Ego2Exo benchmark.

Training.

Training LaGTran proceeds in two phases: we first train the text classifier module, and then use the pseudo-labels derived from it to train the image classification module. Note that the metadata downloaded above provides the captions used to train the text classifier.

Text Classification on:

  1. GeoNet:
python3 text_classification.py --dataset [GeoPlaces|GeoImnet] --source usa --target asia --root_dir <data_dir>
  2. DomainNet:
python3 text_classification.py --dataset DomainNet --source real --target clipart --root_dir <data_dir>

The trained BERT checkpoints along with the pseudo-labels should be downloaded into bert_checkpoints and pseudo_labels respectively. The pseudo-labels can then be used to train the downstream adaptation network as follows.

Domain Adaptation on GeoPlaces:

python3 train.py --config configs/lagtran.yml --source usa --target asia --dataset [GeoPlaces|GeoImnet] --data_root <data_dir> --exp_name <exp_name> --trainer lagtran

Domain Adaptation on DomainNet:

python3 train.py --config configs/lagtran.yml --source real --target clipart --dataset DomainNet --data_root <data_dir> --exp_name <exp_name> --trainer lagtran
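
Putting the two stages together for a single transfer task (GeoPlaces, USA -> Asia), the pipeline is roughly the following, assuming the pseudo-labels produced by the first command end up under pseudo_labels as described above; the experiment name is only an example:

python3 text_classification.py --dataset GeoPlaces --source usa --target asia --root_dir <data_dir>
python3 train.py --config configs/lagtran.yml --source usa --target asia --dataset GeoPlaces --data_root <data_dir> --exp_name geoplaces_usa2asia --trainer lagtran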

Trained Models.

You can directly download the target-adapted models (along with the training logs) for the GeoNet dataset at the following links. All models use a ViT-B/16 backbone pre-trained on ImageNet and trained using LaGTran.

|           | USA -> Asia  | Asia -> USA  |
|-----------|--------------|--------------|
| GeoPlaces | 56.14 (Link) | 57.02 (Link) |
| GeoImnet  | 63.67 (Link) | 64.16 (Link) |

Testing.

If you just want to compute the accuracy using the pre-trained models, you may download the models and use the following command.

python3 test.py --config configs/test.yml --target asia --data_root <data_dir> --saved_model <checkpoint_dir>/best_model.pkl  --dataset [GeoPlaces|GeoImnet]
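
For instance, to evaluate the GeoPlaces USA -> Asia checkpoint from the table above, you could place the downloaded file in a local folder (the folder name below is arbitrary) and point --saved_model at it:

mkdir -p trained_models/geoplaces_usa2asia
python3 test.py --config configs/test.yml --target asia --data_root <data_dir> --saved_model trained_models/geoplaces_usa2asia/best_model.pkl --dataset GeoPlaces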

Citation

If this code or our work is useful in your research, please consider citing us.

@article{kalluri2024lagtran,
  author  = {Kalluri, Tarun and Majumder, Bodhisattwa and Chandraker, Manmohan},
  title   = {Tell, Don't Show! Language Guidance Eases Transfer Across Domains in Images and Videos},
  journal = {ICML},
  year    = {2024},
  url     = {https://arxiv.org/abs/2403.05535},
}

Contact

If you have any questions about this project, please contact Tarun Kalluri.
