Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastià V. Amengual, Calvin Murdock, Ishwarya Ananthabhotla, Philip Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, Ruohan Gao
(If you find this project helpful, please give us a star ⭐ on this GitHub repository to support us.)
- 📋 Table of Contents
- 📝 Overview
- 🛠️ Installation
- 📊 Dataset
- ✅ Evaluation
- Sim to Real
- 📧 Contact
- 📑 Citation
xRIR is a novel and generalizable framework for cross-room RIR prediction. The approach not only demonstrates strong performance on a large-scale synthetic dataset but also achieves decent performance when adapted to real acoustic scenes. This repository contains the unofficial implementation of the CVPR 2025 paper.
Clone the repository, create a conda environment, and install PyTorch:

```bash
git clone https://github.com/DragonLiu1995/xRIR_code.git
conda create -n xRIR python=3.8
conda activate xRIR
pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2+cu117 \
    -f https://download.pytorch.org/whl/torch_stable.html
```

Install the remaining dependencies:

```bash
pip install -r requirements.txt
```
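Optionally, a quick sanity check (a minimal sketch, not part of the repository) to confirm the CUDA-enabled builds installed as pinned above:

```python
# Optional sanity check for the environment created above (not part of the repo).
import torch
import torchaudio

print(torch.__version__)          # expected: 2.0.1+cu117
print(torchaudio.__version__)     # expected: 2.0.2+cu117
print(torch.cuda.is_available())  # True if the cu117 build can see a GPU
```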
Check out the official dataset repository for details: https://github.com/facebookresearch/AcousticRooms. Download all *.zip files and unzip them into a data folder.
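If helpful, a minimal sketch for extracting the archives, assuming they were downloaded to a local `downloads/` folder and should land in a `data/` folder (both names are placeholders):

```python
# Hedged sketch: extract every downloaded AcousticRooms *.zip into a data folder.
# "downloads" and "data" are placeholder paths; adjust to your setup.
import zipfile
from pathlib import Path

download_dir = Path("downloads")
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

for archive in sorted(download_dir.glob("*.zip")):
    print(f"Extracting {archive.name} ...")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)
```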
Here we provide checkpoints for xRIR under the 8-shot scenario for both the seen and unseen splits of the AcousticRooms dataset. Download our pretrained model checkpoints from here. To evaluate the model:

- Add the repository to your `PYTHONPATH`:

```bash
export PYTHONPATH=$PYTHONPATH:[repo_directory]
```

- Run `python eval_unseen.py` for the unseen test split, and `python eval_seen.py` for the seen test split.
Check the sim_to_real folder for more info. Sim-to-real transfer uses two-stage fine-tuning to achieve the best results: (1) fine-tune on the training splits of 3 different rooms (12 × 3 samples as illustrated in this code, with the dampened room excluded); (2) starting from the stage-1 checkpoint, fine-tune on the target room only, using just the 12 training samples from that room. For the reference RIRs, we preprocess all RIRs in a room by dividing each waveform by the largest magnitude in that room's training set (its 12 samples) and then resampling to 22050 Hz. We provide the rendered depth map at the source location in each room under sim_to_real/depth_map. All other inputs, including raw RIRs and xyz locations, can be obtained from the original HearingAnythingAnywhere dataset.
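For reference, a minimal sketch of that preprocessing step (file lists and output paths are placeholders, and it assumes librosa and soundfile are available; the scripts in sim_to_real are authoritative):

```python
# Hedged sketch: normalize a room's RIRs by the training set's largest magnitude,
# then resample to 22050 Hz. `train_rir_paths` is a placeholder list of the room's
# 12 training WAV files.
import numpy as np
import librosa
import soundfile as sf

TARGET_SR = 22050

def room_peak(train_rir_paths):
    """Largest absolute sample value over the room's training RIRs."""
    peak = 0.0
    for path in train_rir_paths:
        wav, _ = librosa.load(path, sr=None)  # keep the original sample rate
        peak = max(peak, float(np.max(np.abs(wav))))
    return peak

def preprocess_rir(path, peak, out_path):
    wav, sr = librosa.load(path, sr=None)
    wav = wav / peak                                              # divide by training-set max magnitude
    wav = librosa.resample(wav, orig_sr=sr, target_sr=TARGET_SR)  # resample to 22050 Hz
    sf.write(out_path, wav, TARGET_SR)
```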
If you have any questions or need further assistance, feel free to reach out to us:
- Xiulong Liu: liuxiulong1995@gmail.com
If you use this code for your research, please cite our work:
```bibtex
@inproceedings{liu2025hearing,
  title={Hearing Anywhere in Any Environment},
  author={Liu, Xiulong and Kumar, Anurag and Calamia, Paul and Amengual, Sebastia V and Murdock, Calvin and Ananthabhotla, Ishwarya and Robinson, Philip and Shlizerman, Eli and Ithapu, Vamsi Krishna and Gao, Ruohan},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={5732--5741},
  year={2025}
}
```