Welcome to the repository for the paper "Secure and Efficient Logistic Regression with Secret-Sharing MPC and Differential Privacy". This repository contains the implementation of our privacy-preserving logistic regression using the CECILIA framework.
The project combines Secret-Sharing-based Multi-Party Computation (MPC) and Differential Privacy (DP) to develop a secure and efficient logistic regression algorithm. This implementation leverages CECILIA, a robust three-party computational framework written in C++ that offers building blocks for privacy-preserving algorithms.
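To give a feel for the secret-sharing idea, here is a minimal sketch of two-party additive sharing. It is not CECILIA's API: the 64-bit ring and the names `share`, `reconstruct` and `add_shared` are assumptions made for illustration only.

```python
import secrets

RING_BITS = 64                  # assumed ring size, chosen for illustration
MODULUS = 1 << RING_BITS


def share(value: int) -> tuple[int, int]:
    """Split `value` into two additive shares modulo 2**64."""
    share0 = secrets.randbelow(MODULUS)    # uniformly random mask
    share1 = (value - share0) % MODULUS    # chosen so both shares sum to value
    return share0, share1


def reconstruct(share0: int, share1: int) -> int:
    """Recombine two additive shares into the original ring element."""
    return (share0 + share1) % MODULUS


def add_shared(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Add two shared values; each party only adds the shares it holds."""
    return (x[0] + y[0]) % MODULUS, (x[1] + y[1]) % MODULUS


if __name__ == "__main__":
    x_shares = share(20)
    y_shares = share(22)
    print(reconstruct(*add_shared(x_shares, y_shares)))  # prints 42
```

Each share on its own is a uniformly random ring element, so a single party learns nothing about the value it holds a share of. Addition needs no communication; operations such as multiplication need extra protocol support, which in a three-party setting is typically where a helper comes in.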
- Privacy-Preserving Logistic Regression: Implements secure logistic regression using secret sharing.
- Differential Privacy: Ensures data privacy by adding noise to model updates.
- CECILIA Framework: Builds upon the efficient primitives and protocols provided by CECILIA.
- Scalability and Security: Designed to work efficiently with large datasets while maintaining high security.
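The differential privacy feature above is described as adding noise to model updates. The sketch below shows one common way to do that, the Laplace mechanism, in the clear. The choice of mechanism, the `sensitivity` parameter and the function names are assumptions for illustration; the mechanism used in the paper may differ.

```python
import random


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_update(gradient, sensitivity, epsilon, rng=None):
    """Return the gradient with independent Laplace noise on each coordinate.

    `sensitivity` is an assumed bound on how much one record can change the
    gradient in L1 norm; the caller must supply it. `epsilon` is the budget.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon    # smaller epsilon means more noise
    return [g + laplace_sample(scale, rng) for g in gradient]


if __name__ == "__main__":
    print(noisy_update([0.5, -1.0, 2.0], sensitivity=1.0, epsilon=3.0))
```

The noise scale is `sensitivity / epsilon`, so a smaller privacy budget gives stronger privacy at the cost of noisier updates.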
No installation is required.
Make sure to clone the repository using the "--recurse-submodules" or "--recurse" flag to initialise the submodules as well.
git clone --recurse-submodules https://github.com/MDPPML/DPLR.git
cd DPLR
If you already have a local version of this repository without submodules, use the command "git submodule update --init --recursive" to initialise the submodules.
After cloning the repository into the directory DPLR, you can build the DPLR library by executing the following commands.
mkdir build
cd build
cmake -S ../ -DCMAKE_BUILD_TYPE=Release
make
After the build completes, the output binaries can be found in the DPLR/build/ directory.
./helper <helper_ip> <helper_port>
./dplr <role> <proxy1_port> <proxy1_ip> <helper_port> <helper_ip> <epsilon> <dataset> <size>
role: Specifies the role of the entity running the program. In the local example further down, proxy1 is started with role 0 and proxy2 with role 1.
- proxy1: First proxy server.
- proxy2: Second proxy server.
- helper: Helper server.
helper_ip: IP address of the helper server.
helper_port: Port number of the helper server.
proxy1_port: Port number of the first proxy server.
proxy1_ip: IP address of the first proxy server.
epsilon: ε, the privacy budget of differential privacy.
dataset: Dataset to be used in the experiments:
- [1] : Diabetes dataset
- [2] : Adult dataset
- [3] : BC-TCGA dataset
- [4] : GSE2034 dataset
- [5] : Sample size scalability
- [6] : Feature size scalability
size: Only needed in the scalability experiments. Depending on the test, it is either the sample size or the feature size.
For local testing, you can run the helper and both proxies on the same machine:
# Start the helper
./helper 127.0.0.1 7777
# Start proxy1 for the Adult dataset (dataset 2) with epsilon 3
./dplr 0 8888 "127.0.0.1" 7777 "127.0.0.1" 3 2
# Start proxy2
./dplr 1 8888 "127.0.0.1" 7777 "127.0.0.1" 3 2
This setup demonstrates how to run the protocol locally. Due to their sizes, the BC-TCGA and GSE2034 datasets are not included here. Their test cases use random numbers instead, which can easily be changed in a local copy. The current tests can be used to get an idea of the timing performance.
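If you prefer to start all three parties from one script, the following sketch assembles the positional arguments in the order given in the usage lines and launches the processes. The function names, the one-second startup delay and the cleanup of the helper process are assumptions for illustration, not part of this repository; run it from the build directory.

```python
import subprocess
import time

SCALABILITY_DATASETS = {5, 6}   # dataset ids that need the <size> argument


def helper_command(helper_ip, helper_port, binary="./helper"):
    """Build the argument list for the helper binary."""
    return [binary, helper_ip, str(helper_port)]


def dplr_command(role, proxy1_port, proxy1_ip, helper_port, helper_ip,
                 epsilon, dataset, size=None, binary="./dplr"):
    """Build the argument list for one proxy, in the documented order."""
    if dataset not in range(1, 7):
        raise ValueError("dataset must be between 1 and 6")
    if dataset in SCALABILITY_DATASETS and size is None:
        raise ValueError("size is required for the scalability experiments")
    cmd = [binary, str(role), str(proxy1_port), proxy1_ip,
           str(helper_port), helper_ip, str(epsilon), str(dataset)]
    if size is not None:
        cmd.append(str(size))
    return cmd


def run_local(epsilon, dataset, size=None):
    """Start the helper, then proxy1 (role 0) and proxy2 (role 1) on localhost."""
    helper = subprocess.Popen(helper_command("127.0.0.1", 7777))
    time.sleep(1)   # assumed: crude wait until the helper is listening
    proxies = [
        subprocess.Popen(dplr_command(role, 8888, "127.0.0.1", 7777,
                                      "127.0.0.1", epsilon, dataset, size))
        for role in (0, 1)
    ]
    exit_codes = [p.wait() for p in proxies]
    if helper.poll() is None:   # assumed: helper may not exit on its own
        helper.terminate()
    helper.wait()
    return exit_codes


# Example (requires the built binaries in the current directory):
# run_local(epsilon=3, dataset=2)
```

Passing argument lists instead of a shell string avoids quoting issues, and the check on `size` mirrors the rule that it is only needed for the two scalability experiments.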
Apart from this work, "Secure and Efficient Logistic Regression with Secret-Sharing MPC and Differential Privacy", see also other works utilizing the same framework: