Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms
This is the official implementation of the paper "Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms", which you can download here.
Our implemented network, "SPVCNN with Point Transformer in the voxel branch", achieves state-of-the-art results on the Street3D dataset.
The baseline models are implemented according to Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution.
All the code is tested in the following environment:
- Linux (tested on Ubuntu 18.04)
- Python 3.9.7
- PyTorch 1.10
- CUDA 11.4
- Construct an anaconda environment with Python 3.9.7
- Install PyTorch 1.10:
  ```
  conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
  ```
- Install torchsparse:
  ```
  pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
  ```
- For the k-NN, we use the operations as implemented in PointTransformer. Execute the `lib/pointops/setup.py` file, downloaded from PointTransformer, with:
  ```
  python3.9 setup.py install
  ```
- Install h5py:
  ```
  conda install h5py
  ```
- Install tqdm:
  ```
  pip install tqdm
  ```
- Install ignite:
  ```
  pip install pytorch-ignite
  ```
- Install numba:
  ```
  pip install numba
  ```
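After installation, you can optionally sanity-check that the dependencies are importable. The helper below is not part of the repository, just a minimal sketch; note that pytorch-ignite is imported as `ignite`, so the list uses import names rather than pip package names:

```python
import importlib.util

# Import names (not pip names) of the packages installed above.
REQUIRED = ["torch", "torchsparse", "h5py", "tqdm", "ignite", "numba"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```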
- Please follow the instructions from here to download the SemanticKITTI dataset (both the KITTI Odometry dataset and the SemanticKITTI labels) and extract all the files in the sequences folder to `data/SemanticKITTI`. You should see 22 folders. Folders 00-10 should have subfolders named `velodyne` and `labels`. The remaining folders 11-21 are used for online testing and should contain only the `velodyne` folder, not a `labels` folder.
- Please follow the instructions from here to download the Street3D dataset. It comes in `.txt` form. Place it in the `data/Street3D/txt` folder, where you should have two folders, `train` and `test`, with 60 and 20 `.txt` files, respectively.
- Next, execute the pre-processing scripts as follows:
 
python scripts/Streed3D/street3d_txt_to_h5.py
python scripts/Streed3D/street3d_partition_train.py
python scripts/Streed3D/street3d_partition_test.py
The first script converts the dataset to h5 format and places it in the `data/Street3D/h5` folder.
The following scripts split each scene into subscenes of around 80k points and save them in `.bin` format into the `train_part_80k` and `test_part_80k` folders, respectively. The `train_part_80k` folder should contain 2458 files and the `test_part_80k` folder should contain 845 files. Training and testing are performed on these split subscenes of 80k points.
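The splitting idea above can be sketched as follows. This is only an illustrative outline with a hypothetical `partition_scene` helper; the actual `street3d_partition_*.py` scripts may order, group, and store the points differently:

```python
import numpy as np

def partition_scene(points, chunk_size=80_000):
    """Split an (N, C) point array into consecutive chunks of at most
    `chunk_size` points; the last chunk may be smaller."""
    n_chunks = int(np.ceil(len(points) / chunk_size))
    return [points[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]

if __name__ == "__main__":
    # Synthetic scene: 200k points with x, y, z and one feature channel.
    cloud = np.random.rand(200_000, 4).astype(np.float32)
    parts = partition_scene(cloud)
    # Each subscene could then be saved with parts[k].tofile(f"scene{k}.bin").
    print([len(p) for p in parts])
```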
The final structure for both datasets should look like this:
```
data/
├── SemanticKITTI/
│   └── sequences/
│       ├── 00/
│       │   ├── poses.txt
│       │   ├── labels/
│       │   │   ├── 000000.label
│       │   │   └── ...
│       │   └── velodyne/
│       │       ├── 000000.bin
│       │       └── ...
│       ├── ...
│       └── 21/
│           ├── poses.txt
│           └── velodyne/
│               ├── 000000.bin
│               └── ...
└── Street3D/
    ├── txt/
    │   ├── train/
    │   │   ├── 5D4KVPBP.txt
    │   │   └── ...
    │   └── test/
    │       ├── 5D4KVPG4.txt
    │       └── ...
    ├── h5/
    │   ├── train/
    │   │   ├── 5D4KVPBP.h5
    │   │   └── ...
    │   └── test/
    │       ├── 5D4KVPG4.h5
    │       └── ...
    ├── train_part_80k/
    │   ├── 5D4KVPBP0.bin
    │   └── ...
    └── test_part_80k/
        ├── 5D4KVPG40.bin
        └── ...
```
To train the networks, check the following scripts for each dataset:
python scripts/SemanticKITTI/kitti_train_all.py
python scripts/Street3D/street3d_train_all.py
Inside each file, you can select the network to train, as well as the training parameters.
To test the networks on the SemanticKITTI validation set or the Street3D test set, check the following scripts for each dataset:
python scripts/SemanticKITTI/kitti_inference_all.py
python scripts/Street3D/street3d_inference_all.py
Inside each file, you can select the network to run inference with, as well as load the proper weights.
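For reference, SemanticKITTI frames follow the standard KITTI Odometry binary layout: each scan is a sequence of float32 (x, y, z, remission) tuples, and each `.label` entry is a uint32 whose lower 16 bits hold the semantic label and upper 16 bits the instance id. A minimal reading sketch (the helper names are illustrative, not functions from this repository):

```python
import numpy as np

def read_scan(path):
    """Load a velodyne .bin file as an (N, 4) float32 array of
    (x, y, z, remission) rows."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

def read_labels(path):
    """Load a .label file and split each uint32 into its semantic label
    (lower 16 bits) and instance id (upper 16 bits)."""
    raw = np.fromfile(path, dtype=np.uint32)
    semantic = raw & 0xFFFF
    instance = raw >> 16
    return semantic, instance
```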
The pretrained weights used in our paper are provided here.
The weights for all networks total around 4.8 GB.
Next, unzip the `pretrained_weights.zip` file in the main folder of the repository.
If you find this work useful in your research, please consider citing:
@article{vanian2022improving,
  title={Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms},
  author={Vanian, Vazgen and Zamanakos, Georgios and Pratikakis, Ioannis},
  journal={Computers \& Graphics},
  year={2022},
  publisher={Elsevier}
}

