Into-the-Unknown

Welcome! This is the repository for our paper, "Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals’ Mobility Beyond Visited Places" (preprint: https://arxiv.org/abs/2506.14070).

Data

Currently, only the FSQ-NYC data is included. More will be added in the future.

The raw Foursquare check-in data can be downloaded from Dingqi Yang's Homepage. The preprocessing steps for the raw data are described in the MHSA repository. For convenience, we include the preprocessed data, along with the train, validation, and test sets, in the ./data folder.

Note that:

The notebook data/fsq_nyc/fsq_nyc_process.ipynb contains the code for constructing the train, validation, and test sets under the inductive setting (as described in Section 4.3 of the paper).
The file data/fsq_nyc/fsq_nyc_pois.csv contains the training data used for CaLLiPer. This POI data is extracted from the original Foursquare NYC dataset.
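As a rough illustration of what an inductive split involves, the sketch below holds out a percentage of distinct locations as a "new" set, mirroring the inductive_pct parameter described later in this README. It is not the notebook's code: the function name, the seed, and the integer rounding rule are assumptions made for illustration, and the notebook remains the reference for how the actual sets are built.

```python
import random


def sample_new_locations(location_ids, pct, seed=0):
    """Hold out about pct% of the distinct locations as the 'new' set.

    Hypothetical helper: the rounding rule (floor) and the seed handling
    are assumptions, not taken from fsq_nyc_process.ipynb.
    """
    unique_ids = sorted(set(location_ids))      # de-duplicate, fix the order
    n_new = len(unique_ids) * pct // 100        # floor of pct% of locations
    rng = random.Random(seed)                   # local RNG for repeatability
    return set(rng.sample(unique_ids, n_new))


if __name__ == "__main__":
    all_locations = range(100)                  # toy location IDs
    new_locations = sample_new_locations(all_locations, pct=10)
    print(len(new_locations), "locations held out as new")
```

With pct=0 the function returns an empty set, which corresponds to the conventional setting where no locations are held out.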

Executing the "pre-train -> apply in downstream task" pipeline

The whole framework consists of two main stages:

Pre-training location embeddings
Applying the pre-trained embeddings in downstream tasks

1. Pre-training location embeddings

1.1 Baseline methods

Specify the parameters in the configuration file: configs/{dataset_name}/pretrain_{city_name}_{model}.yaml. For example, the config file for pre-training POI2Vec on FSQ-NYC is configs/fsq_nyc/pretrain_nyc_poi2vec.yaml.

Then run the following command to start pre-training:

python run_pretrain.py --config {path_to_config_file}.yaml
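The config path pattern and the command above can be assembled programmatically, which may help when scripting several pre-training runs. The following is a minimal sketch; the helper names pretrain_config_path and pretrain_command are hypothetical and are not part of this repository.

```python
def pretrain_config_path(dataset_name: str, city_name: str, model: str) -> str:
    """Fill in the config path pattern given in this README."""
    return f"configs/{dataset_name}/pretrain_{city_name}_{model}.yaml"


def pretrain_command(config_path: str) -> list[str]:
    """Build the argument list for the pre-training command."""
    return ["python", "run_pretrain.py", "--config", config_path]


if __name__ == "__main__":
    # The POI2Vec / FSQ-NYC example from the text above.
    path = pretrain_config_path("fsq_nyc", "nyc", "poi2vec")
    print(path)
    print(" ".join(pretrain_command(path)))
```

The returned list can be passed to subprocess.run from the repository root if the runs are to be launched from Python rather than the shell.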

The pre-training results will be saved in the ./pretrained directory, including the embed_mat file and a log file.

The folder name for the results depends on the inductive_pct parameter:

If inductive_pct = 0, results will be saved as {dataset_name}_{method_name}_{time}/
If inductive_pct = 10 (10% of the locations sampled to form $\mathcal{L}^{new}$), the folder will be {dataset_name}_{method_name}_inductive10pct_{time}/
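The naming rule above can be sketched as a small function, for example to locate a results folder from a script. This is an illustration only: the function name result_folder_name is hypothetical, the exact format of the {time} part is not stated here and is simply passed in as a string, and treating every non-zero inductive_pct like the 10% case is an assumption.

```python
def result_folder_name(dataset_name, method_name, inductive_pct, time_tag):
    """Return the results folder name following the README's naming rule.

    time_tag stands in for the {time} placeholder; its real format is
    not documented here, so the caller supplies it (assumption).
    """
    if inductive_pct == 0:
        # Conventional setting: no inductive suffix.
        return f"{dataset_name}_{method_name}_{time_tag}/"
    # Inductive setting: include the percentage of held-out locations.
    return f"{dataset_name}_{method_name}_inductive{inductive_pct}pct_{time_tag}/"


if __name__ == "__main__":
    # "poi2vec" and "TIME" are placeholder values for illustration.
    print(result_folder_name("fsq_nyc", "poi2vec", 0, "TIME"))
    print(result_folder_name("fsq_nyc", "poi2vec", 10, "TIME"))
```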

1.2 CaLLiPer

To pre-train using CaLLiPer, use the POI coordinate-description pairs data/fsq_nyc/fsq_nyc_pois.csv.

Refer to the original CaLLiPer Repo for detailed instructions.

We recommend running CaLLiPer pre-training separately by cloning the original repository, placing the training data there, and then copying or moving the resulting checkpoint into this project’s ./pretrained folder.
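The final copy step can be done by hand or with a few lines of Python. The sketch below assumes the CaLLiPer checkpoint is a single file; the helper name copy_checkpoint and the example source path are hypothetical, and the actual checkpoint name depends on the CaLLiPer repository's output.

```python
import shutil
from pathlib import Path


def copy_checkpoint(src_ckpt, pretrained_dir="./pretrained"):
    """Copy a checkpoint file into the project's pretrained folder.

    Creates the destination folder if it does not exist and returns
    the path of the copied file.
    """
    src = Path(src_ckpt)
    dest_dir = Path(pretrained_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as modification time.
    return Path(shutil.copy2(src, dest_dir / src.name))


if __name__ == "__main__":
    # Hypothetical source path; replace with the real checkpoint location.
    example_src = Path("../CaLLiPer/checkpoints/model.ckpt")
    if example_src.exists():
        print("Copied to", copy_checkpoint(example_src))
    else:
        print("No checkpoint found at", example_src)
```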

2. Next location prediction (downstream task)

After obtaining the pre-trained location embeddings, configure the parameters in configs/{dataset_name}/next_loc_pred.yml and run the following command:

python main.py --config configs/{dataset_name}/next_loc_pred.yml

Key parameters inside the config file:

loc_embed_method: the pre-training method used to produce the location embeddings.
loc_enc_ckpt_path: the path to the saved location embedding results.
inductive_pct: set to 0 (conventional) or 10 (inductive) to switch between task settings.

The downstream results will be saved in either ./outputs_conventional or ./outputs_inductive.
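To make the relationship between the key parameters and the output location concrete, here is a minimal sketch that checks a config dictionary for the three keys listed above and picks the output folder from inductive_pct. The function names and the example values are hypothetical, and mapping every non-zero inductive_pct to ./outputs_inductive is an assumption; main.py may handle this differently.

```python
REQUIRED_KEYS = ("loc_embed_method", "loc_enc_ckpt_path", "inductive_pct")


def missing_keys(config: dict) -> list[str]:
    """Return the required keys that are absent from the config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def output_dir(inductive_pct: int) -> str:
    """Pick the results folder for the given task setting (assumed rule)."""
    if inductive_pct == 0:
        return "./outputs_conventional"
    return "./outputs_inductive"


if __name__ == "__main__":
    # Placeholder values; the checkpoint path is left for the user to fill in.
    config = {
        "loc_embed_method": "poi2vec",
        "loc_enc_ckpt_path": "./pretrained/<your_results_folder>",
        "inductive_pct": 10,
    }
    print("Missing keys:", missing_keys(config))
    print("Results go to:", output_dir(config["inductive_pct"]))
```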

Reproducing results on FSQ-NYC

We provide some pre-trained location embeddings in ./pretrained, which can be used to reproduce the results in Table 2 of the paper.

To do so, specify the parameters loc_embed_method, loc_enc_ckpt_path, and inductive_pct in the config file ./configs/fsq_nyc/next_loc_pred.yml and run the command described above.

For now, only experiments on the FSQ-NYC dataset are supported, but we plan to support more datasets in the future.

Acknowledgements

This project builds on the excellent work from CTLE, CaLLiPer, and location-prediction.

We thank the authors for their inspiring work and for promoting open and reproducible research!

TODOs

We plan to add more content soon:

🔲 Script for the UMAP visualisation of embeddings.

🔲 Support for other datasets, e.g., FSQ-TKY, etc.

Citation

@article{wang2025into,
  title={Into the Unknown: Applying Inductive Spatial-Semantic Location Embeddings for Predicting Individuals' Mobility Beyond Visited Places},
  author={Wang, Xinglei and Cheng, Tao and Law, Stephen and Zeng, Zichao and Ilyankou, Ilya and Liu, Junyuan and Yin, Lu and Huang, Weiming and Jongwiriyanurak, Natchapon},
  journal={arXiv preprint arXiv:2506.14070},
  year={2025}
}

