This repository demonstrates our solution to the Amazon Last Mile Routing Research Challenge, which aims to integrate the real-life experience of Amazon drivers into optimal route planning. The solution is based on our paper Learning from Drivers to Tackle the Amazon Last Mile Routing Research Challenge, and can be deployed as an Amazon SageMaker Processing job.

The diagram below shows two example sequences generated from the same Amazon route with 100 packages. The sequence on the left (driver friendly) is produced by our solution, and the one on the right (cost optimal) is generated by a conventional Traveling Salesperson Problem (TSP) solver. While the cost-optimal sequence on the right requires less travel time in theory, it contains several narrow, sharp V-turns and leaps, which may be unappealing to drivers who are reluctant to reverse course. The sequence on the left, in comparison, roughly traces a convex hull that avoids those V-turns altogether. For example, the first stop, which enters the delivery cluster from the south-east, is far away from the last stop, which leaves the cluster in the south-west. This sequence can thus be selected to improve the driving experience.
| Driver friendly sequence | Time optimal sequence |
|---|---|
| ![]() | ![]() |
| travel time: 2.01 hours | travel time: 1.80 hours |
Image caption: blue solid circles are the locations of Amazon delivery packages. Red arrows show the direction of each transit. The actual depot location is far below (out of sight) and is shown as red solid circles for visualization purposes. The travel time reported here does not include the service time spent at each delivery stop. Road networks are overlaid as gray lines and curves. Note that the road-network-mapped driving routes that result from these two sequences are not shown here.
Our solution hierarchically integrates Markov model training, online policy search (i.e. Rollout), and off-the-shelf TSP solvers to produce driver-friendly routes for last-mile planning. The choice of the underlying TSP solver is flexible. For example, our paper reported an evaluation score of 0.0374 using LKH. This repository uses OR-tools for simpler integration and obtains a nearly identical score of 0.0372. These results are comparable to what the top three teams achieved on the public leaderboard. More importantly, our method is entirely learning-based and data-driven. The training process is lightweight and fast: it takes 4 to 5 seconds to train the PPM model on 6,000 route sequences using a single CPU instance. Since it does not require any hand-crafted heuristics, our method has the potential to solve large-scale route sequence planning problems in a continuously changing operating environment.
Our method involves a training phase and an inference phase. The training phase produces a Prediction by Partial Matching (PPM) model that extracts sequential patterns from travel zones. These patterns encode driver preferences and behaviour, and are used to produce driver-friendly routes. Training is generally fast; for example, it takes less than 5 seconds on an AWS ml.m5.4xlarge instance. The inference phase has three steps. Step 1: the trained PPM model auto-regressively generates a new zone sequence guided by the Rollout method. Step 2: for each zone in the generated zone sequence, we use OR-tools to solve a small TSP instance to produce a stop (i.e. package delivery) sequence. Step 3: we join the stop sequences from all zones, in zone sequence order, to form the global stop sequence as the final solution. The following pseudocode captures the main idea of our method.
```python
# Training
ppm_model = train_ppm_model(ground_truth)

# Inference
for route in all_routes:
    all_stops = []
    zone_set = get_zone_set(route)                              # Step 0
    sorted_zones = ppm_rollout_sort_zone(zone_set, ppm_model)   # Step 1
    for zone in sorted_zones:
        sorted_stops = TSP_solve(zone)                          # Step 2
        all_stops += sorted_stops                               # Step 3
```
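At the heart of Step 1 is the PPM model, which assigns a probability to the next zone given the zones visited so far; Rollout uses these probabilities to score candidate next zones. The actual implementation lives in `aro/ppm.py` and `aro/zone_utils.py`. The snippet below is only a minimal, illustrative sketch of an order-k context model with a fallback to shorter contexts; the class and method names (`PPMSketch`, `train`, `next_zone_probs`) are hypothetical and not part of this code base.

```python
from collections import defaultdict, Counter

class PPMSketch:
    """Order-k context model with a simple fallback to shorter contexts.
    Illustrative only; not the implementation used in aro/ppm.py."""

    def __init__(self, order=3):
        self.order = order
        # counts[context][next_zone] = how often next_zone followed context
        self.counts = defaultdict(Counter)

    def train(self, zone_sequences):
        for seq in zone_sequences:
            for i, zone in enumerate(seq):
                for k in range(self.order + 1):   # contexts of length 0..order
                    if i - k < 0:
                        break
                    context = tuple(seq[i - k:i])
                    self.counts[context][zone] += 1

    def next_zone_probs(self, history, candidates):
        """Score candidate next zones given the zones visited so far."""
        for k in range(min(self.order, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            counter = self.counts.get(context)
            if counter and any(counter[c] for c in candidates):
                total = sum(counter[c] for c in candidates)
                return {c: counter[c] / total for c in candidates}
        # No context matched any candidate: fall back to a uniform distribution
        return {c: 1.0 / len(candidates) for c in candidates}

# Toy usage: train on two ground-truth zone sequences, then score candidates
model = PPMSketch(order=2)
model.train([["A1", "A2", "A3", "B1"], ["A1", "A2", "B1", "B2"]])
print(model.next_zone_probs(["A1", "A2"], candidates=["A3", "B1", "B2"]))
```

During Rollout, such per-step probabilities would be combined with lookahead simulation to choose each next zone; see `zone_utils.py` for the actual procedure used in this repository.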
Python source files and shell scripts are structured as follows:

```
${repo_root_directory}            # e.g. amazon-sagemaker-amazon-routing-challenge-sol
├── aro/                          # Python module of our solution
│   ├── model/                    # model files
│   ├── __init__.py               # empty module file
│   ├── ortools_helper.py         # interface to invoke the OR-tools TSP solver
│   ├── ppm.py                    # implementation of the PPM model
│   └── zone_utils.py             # implementation of Step 1 and Step 2
├── example_inference_job.sh      # submit a SageMaker processing job
├── example_inference.sh          # test route sequence generation locally
├── inference_job.py              # wraps inference.py as a SageMaker job
├── inference.py                  # glue code that combines Steps 0 to 3
├── preprocessing.py              # generates input files for training and inference
├── requirements_dev.txt          # external libraries used locally
├── requirements.txt              # external libraries required to run SageMaker jobs
├── setup.py                      # installation file
└── train.py                      # training script for the PPM model
```

The Quick Start instructions below assume macOS or Linux with the AWS CLI installed.
```bash
# Get the source code
git clone https://github.com/aws-samples/amazon-sagemaker-amazon-routing-challenge-sol.git
cd amazon-sagemaker-amazon-routing-challenge-sol

# Setup Python Environment
conda create --name aro python=3.8
conda activate aro  # `aro` is the name of the python virtual environment, feel free to change it
# if on a SageMaker notebook instance, uncomment and execute the following command instead
# source activate aro

# Install the current version of the package
pip install -r requirements_dev.txt
pip install -e .
```

The dataset associated with the Amazon Last Mile Routing Research Challenge is publicly available at Open Data on AWS. You can run the following command to download it:
```bash
aws s3 sync --no-sign-request s3://amazon-last-mile-challenges/almrrc2021/ ./data/
```

The total size of the dataset is about 3.1 GB. If the command hangs for some reason, terminate and re-run it; it will resume downloading from where it left off.
The dataset contains package (aka stop) information, such as destination locations, parcel specifications, customer-preferred time windows, expected service times, and zone identifiers. The preprocessing step converts this information from JSON to Parquet and NumPy array formats for easy access.
```bash
train_data_dir=almrrc2021-data-training
eval_data_dir=almrrc2021-data-evaluation

# generate package information in a Parquet file (training)
python preprocessing.py --act gen_route --data_dir data/${train_data_dir}

# generate the travel time matrix for all stops in a route (training)
python preprocessing.py --act gen_dist_mat --data_dir data/${train_data_dir}

# generate zone information for each stop (training)
python preprocessing.py --act gen_zone_list --data_dir data/${train_data_dir}

# generate the ground-truth zone sequence for each route using the Parquet file generated by the `gen_route` action
python preprocessing.py --act gen_actual_zone --data_dir data/${train_data_dir} --pkfn ${parquet_file}

# generate package information in Parquet (evaluation)
python preprocessing.py --act gen_route --data_dir data/${eval_data_dir} --mode eval

# generate the travel time matrix for all stops in a route (evaluation)
python preprocessing.py --act gen_dist_mat --data_dir data/${eval_data_dir} --mode eval

# generate zone information for each stop (evaluation)
python preprocessing.py --act gen_zone_list --data_dir data/${eval_data_dir} --mode eval
```
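To sanity-check the preprocessed artifacts before uploading them, you can peek at a processed Parquet file and a distance-matrix `.npy` file. The sketch below is illustrative only; the exact file names under `data/${train_data_dir}/processed/` and `data/${train_data_dir}/distance_matrix/` depend on your run, so the glob patterns here are assumptions.

```python
# Illustrative sanity check of preprocessing outputs; adjust the paths to
# files that actually exist under your local data directory.
from pathlib import Path

import numpy as np
import pandas as pd

data_dir = Path("data/almrrc2021-data-training")

# Peek at the first processed Parquet file (package / stop information)
parquet_files = sorted((data_dir / "processed").glob("*.parquet"))
if parquet_files:
    df = pd.read_parquet(parquet_files[0])
    print(parquet_files[0].name, df.shape)
    print(df.head())

# Peek at the travel time matrix of one route (a square stops-by-stops matrix)
matrix_files = sorted((data_dir / "distance_matrix").glob("*_raw_w_st.npy"))
if matrix_files:
    mat = np.load(matrix_files[0])
    print(matrix_files[0].name, mat.shape)
```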
We upload the pre-processed data to Amazon S3 in order to run SageMaker processing jobs during the inference phase.

```bash
export bucket_name=my-route-bucket  # set `${bucket_name}` to your own S3 bucket name
export s3_data_prefix=lmc           # set `${s3_data_prefix}` to your own S3 data prefix
aws s3 sync data/${train_data_dir}/ s3://${bucket_name}/data/${s3_data_prefix}/${train_data_dir}/
aws s3 sync data/${eval_data_dir}/ s3://${bucket_name}/data/${s3_data_prefix}/${eval_data_dir}/
```

After completing this step, the prefix structure in your S3 bucket appears as follows:
```
${train_or_eval_data_dir}               # e.g. `Final_March_15_Data` OR `Final_June_18_Data`
├── distance_matrix/                    # a directory with all distance matrix files
│   ├── ${route_id_0}_raw_w_st.npy     # distance matrix file produced by preprocessing.py
│   ├── ...                             # more distance matrix files
│   └── ${route_id_N}_raw_w_st.npy     # the last distance matrix file
├── model_apply_inputs/                 # challenge input files
├── model_apply_outputs/                # output JSON results go here
├── model_build_inputs/                 # challenge input files
├── model_score_inputs/                 # challenge input files
├── model_score_outputs/                # output JSON scores go here
├── processed/                          # processed Parquet files go here
└── zone_list/                          # a directory with all zone files
    ├── ${route_id_0}_zone_w_st.joblib  # zone file produced by preprocessing.py
    ├── ...                             # more zone files
    ├── ${route_id_N}_zone_w_st.joblib  # the last zone file
    └── actual_zone-{mode}.csv          # ground-truth zone sequence file produced by preprocessing.py
```

We train the Prediction by Partial Matching (PPM) model. In our work, the PPM model is used as a sequential probability model for generating synthetic zone sequences.
```bash
# train the PPM model
python train.py --train_zdf_fn data/${train_data_dir}/zone_list/actual_zone-train.csv
```
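Optionally, confirm that the training artifact was written and loads cleanly before uploading it. The sketch below assumes the artifact is named `aro_ppm_train_model.joblib` in the current directory (the same file uploaded in the next step); since the internals of the loaded object are not documented here, the check stops at loading it.

```python
# Optional local sanity check of the trained PPM artifact (illustrative only)
from pathlib import Path

import joblib

model_path = Path("aro_ppm_train_model.joblib")
assert model_path.exists(), f"{model_path} not found - did train.py finish successfully?"

ppm_model = joblib.load(model_path)
print(type(ppm_model))  # inspect the object type before uploading it to S3
```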
We upload the PPM model to S3 so that the subsequent SageMaker processing job can access it when generating zone sequences.

```bash
# optional - set `${s3_model_prefix}` to your own S3 model prefix
export s3_model_prefix=almrc
aws s3 cp aro_ppm_train_model.joblib s3://${bucket_name}/models/${s3_model_prefix}/
```

We can run the inference for route generation locally (e.g. on a laptop or desktop) for debugging or to understand the inner workings of our approach.
```bash
# test inference locally
./example_inference.sh
```

Adjust the script according to any debugging information revealed in the output or error messages.
Now we are ready to generate routes by submitting an Amazon SageMaker Processing job that runs on an ml.m5.4xlarge instance.
```bash
# please set environment variables (e.g. ${bucket_name})
# in `example_inference_job.sh` before running it
./example_inference_job.sh
```
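Under the hood, `example_inference_job.sh` relies on `inference_job.py` to wrap `inference.py` as a SageMaker Processing job. The snippet below is only an illustrative sketch of how a comparable job could be submitted with the SageMaker Python SDK; the processor type, entry point, container paths, and S3 locations are placeholders and do not necessarily match what `inference_job.py` actually does.

```python
# Illustrative sketch of submitting a SageMaker Processing job with the Python SDK.
# The entry point, container paths, and S3 locations below are placeholders only.
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

bucket_name = "my-route-bucket"           # same bucket used in the steps above
s3_data_prefix = "lmc"
s3_model_prefix = "almrc"
eval_data_dir = "almrrc2021-data-evaluation"

processor = SKLearnProcessor(
    framework_version="1.0-1",
    role=sagemaker.get_execution_role(),  # or an explicit IAM role ARN
    instance_type="ml.m5.4xlarge",
    instance_count=1,
    base_job_name="ppm-rollout",
)

processor.run(
    code="inference.py",                  # placeholder entry point
    inputs=[
        ProcessingInput(
            source=f"s3://{bucket_name}/data/{s3_data_prefix}/{eval_data_dir}/",
            destination="/opt/ml/processing/input/data",
        ),
        ProcessingInput(
            source=f"s3://{bucket_name}/models/{s3_model_prefix}/",
            destination="/opt/ml/processing/input/model",
        ),
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",
            destination=f"s3://{bucket_name}/data/{s3_data_prefix}/{eval_data_dir}/model_apply_outputs/",
        )
    ],
)
```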
Once the submission succeeds, navigate your browser to the Amazon SageMaker Processing jobs console to check that a job named ppm-rollout-2022-xxx is running. It generally takes less than an hour to generate the 3,052 route sequences on an ml.m5.4xlarge instance. Once the job status becomes Completed, we can check the generated sequences for all routes by running the following command.
```bash
aws s3 ls \
    s3://${bucket_name}/data/${s3_data_prefix}/${eval_data_dir}/model_apply_outputs/eval-ppm-rollout
```

Once the submission file is downloaded, follow the evaluation instructions at https://github.com/MIT-CAVE/rc-cli to calculate the evaluation score, which should be around 0.0372 ~ 0.0376.
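The instructions above list the generated objects but do not show a download step. A hedged boto3 sketch for fetching the submission file(s) is shown below; the object key prefix is taken from the listing command above, and the exact file names produced by the job may differ.

```python
# Illustrative download of the generated submission file(s) with boto3.
# Adjust the bucket, prefix, and local paths to match your own setup.
import os

import boto3

bucket_name = "my-route-bucket"
prefix = "data/lmc/almrrc2021-data-evaluation/model_apply_outputs/eval-ppm-rollout"

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)

os.makedirs("submissions", exist_ok=True)
for obj in response.get("Contents", []):
    key = obj["Key"]
    local_path = os.path.join("submissions", os.path.basename(key))
    s3.download_file(bucket_name, key, local_path)
    print(f"downloaded {key} -> {local_path}")
```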
If you are interested in potential applications of this example for your last mile planning, please feel free to contact the authors.
See CONTRIBUTING for more information.
This code is licensed under the Apache-2.0 License. See the LICENSE file.
- This code uses OR-tools, which is distributed under the Apache-2.0 License.
- This code uses the 2021 Amazon Last Mile Routing Research Challenge Dataset. Merchan, Daniel; Pachon, Julian; Arora, Jatin; Konduri, Karthik; Winkenbach, Matthias; Parks, Steven; Noszek, Joseph. (2022). Seattle: Amazon.com.


