Commit aa439e6

Merge branch 'master' of github.com:DoubleML/doubleml-serverless into 0.0.X
2 parents 5a9b372 + d861a3a commit aa439e6

4 files changed (+100 / -28 lines changed)

README.md

Lines changed: 54 additions & 11 deletions
@@ -1,13 +1,17 @@
-# DoubleML-Serverless - Distributed Double Machine Learning with a Serverless Architecture
+# DoubleML-Serverless - Distributed Double Machine Learning with a Serverless Architecture <a href="https://docs.doubleml.org"><img src="https://raw.githubusercontent.com/DoubleML/doubleml-for-py/master/doc/logo.png" align="right" width = "120" /></a>
 
 This repo contains a prototype implementation **DoubleML-Serverless** of distributed double machine learning with a serverless infrastructure
 using [AWS Lambda](https://aws.amazon.com/lambda).
-A detailed discussion of this prototype can be found in the paper "Distributed Double Machine Learning with a Serverless Architecture" (Kurz, 2021).
+A detailed discussion of this prototype can be found in the paper ["Distributed Double Machine Learning with a Serverless Architecture" (Kurz, 2021)](https://doi.org/10.1145/3447545.3451181).
 DoubleML-Serverless is an extension for serverless cloud computing of the Python package **DoubleML**.
 DoubleML is available via PyPI [https://pypi.org/project/DoubleML](https://pypi.org/project/DoubleML) and on GitHub [https://github.com/DoubleML/doubleml-for-py](https://github.com/DoubleML/doubleml-for-py).
-Also see [https://docs.doubleml.org](https://docs.doubleml.org) for a detailed documentation and user guide for the DoubleML package.
+The Python package DoubleML was introduced in
+"DoubleML - An Object-Oriented Implementation of Double Machine Learning in Python"
+([Bach et al., 2021](https://arxiv.org/abs/2104.03220))
+and a detailed documentation \& user guide for the package is available at
+[https://docs.doubleml.org](https://docs.doubleml.org).
 
-## Getting started
+## Getting Started
 
 ### Installation of DoubleML-Serverless
 
@@ -30,7 +34,7 @@ After downloading the wheel, the package can be installed with pip (replace `XXX
 pip install -U DoubleML-Serverless-XXX-py3-none-any.whl
 ```
 
-### Deploy the corresponding serverless app to AWS Lambda using AWS SAM
+### Deploy the Corresponding Serverless App to AWS Lambda using AWS SAM
 
 To use AWS Lambda for estimating double machine learning models, a deployment in your AWS account is necessary.
 The corresponding serverless application consists of the following components:
@@ -56,11 +60,11 @@ There are two options for deployment:
 sam deploy --guided
 ```
 
-### Estimating a partially linear regression model with double machine learning and serverless scaling using AWS Lambda
+### Estimating a Partially Linear Regression Model with Double Machine Learning and Serverless Scaling Using AWS Lambda
 
 To demonstrate the functionality of DoubleML-Serverless we revisit the Pennsylvania Reemployment Bonus experiment
-and estimate the effect of provisioning a cash bonus on the unemployment duration as studied in Chernozhukov et al. (2018).
-This example is also discussed in the accompanying paper to the DoubleML-Serverless package (Kurz, 2021).
+and estimate the effect of provisioning a cash bonus on the unemployment duration as studied in [Chernozhukov et al. (2018)](https://doi.org/10.1111/ectj.12097).
+This example is also discussed in the accompanying paper to the DoubleML-Serverless package ([Kurz, 2021](https://doi.org/10.1145/3447545.3451181)).
 
 We first load the data using functionalities from the DoubleML package.
 ```python
@@ -112,9 +116,48 @@ dml_lambda_plr_bonus.fit_aws_lambda()
 A summary of the estimation result is available via the property `dml_lambda_plr_bonus.summary`.
 Some metrics about the estimation on AWS Lambda can be obtained via the property `dml_lambda_plr_bonus.aws_lambda_metrics`.
 
+## Citation
+
+If you use the DoubleML-Serverless package a citation is highly appreciated:
+
+Kurz, M. S. (2021). Distributed Double Machine Learning with a Serverless Architecture.
+In Companion of the ACM/SPEC International Conference on Performance Engineering (ICPE '21).
+Association for Computing Machinery, New York, NY, USA, 27–33.
+doi:[10.1145/3447545.3451181](https://doi.org/10.1145/3447545.3451181).
+
+Bibtex-entry:
+
+```
+@inproceedings{kurz2021DoublemlServerless,
+author = {Kurz, Malte S.},
+title = {Distributed Double Machine Learning with a Serverless Architecture},
+year = {2021},
+isbn = {9781450383318},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+url = {https://doi.org/10.1145/3447545.3451181},
+doi = {10.1145/3447545.3451181},
+abstract = {This paper explores serverless cloud computing for double machine learning. Being based on repeated cross-fitting, double machine learning is particularly well suited to exploit the high level of parallelism achievable with serverless computing. It allows to get fast on-demand estimations without additional cloud maintenance effort. We provide a prototype Python implementation DoubleML-Serverless for the estimation of double machine learning models with the serverless computing platform AWS Lambda and demonstrate its utility with a case study analyzing estimation times and costs.},
+booktitle = {Companion of the ACM/SPEC International Conference on Performance Engineering},
+pages = {27--33},
+numpages = {7},
+keywords = {machine learning, causal machine learning, serverless computing, distributed computing, AWS Lambda, function-as-a-service (FAAS)},
+location = {Virtual Event, France},
+series = {ICPE '21}
+}
+```
+
 ## References
 
-Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018),
-Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68. doi:[10.1111/ectj.12097](https://doi.org/10.1111/ectj.12097).
+Bach, P., Chernozhukov, V., Kurz, M. S., and Spindler, M. (2021).
+DoubleML - An Object-Oriented Implementation of Double Machine Learning in Python.
+arXiv:[2104.03220](https://arxiv.org/abs/2104.03220).
+
+Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018).
+Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68.
+doi:[10.1111/ectj.12097](https://doi.org/10.1111/ectj.12097).
 
-Kurz, M.S. 2020. "Distributed Double Machine Learning with a Serverless Architecture". Unpublished Working Paper.
+Kurz, M. S. (2021). Distributed Double Machine Learning with a Serverless Architecture.
+In Companion of the ACM/SPEC International Conference on Performance Engineering (ICPE '21).
+Association for Computing Machinery, New York, NY, USA, 27–33.
+doi:[10.1145/3447545.3451181](https://doi.org/10.1145/3447545.3451181).
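The README example fits a partially linear regression (PLR) model, Y = θ·D + g(X) + ζ with D = m(X) + V, by double machine learning. The paper's abstract names repeated cross-fitting as the step that parallelizes well on AWS Lambda. The numpy-only sketch below shows cross-fitted partialling-out for a single repetition. It is an illustration under stated assumptions, not the DoubleML or DoubleML-Serverless implementation. The data are simulated rather than the Bonus data, and ordinary least squares (via `np.linalg.lstsq`) stands in for the machine learners. Fold scores are pooled in the spirit of `dml_procedure='dml2'`. The names `plr_partialling_out` and `_ols_fit_predict` are hypothetical.

```python
import numpy as np


def _ols_fit_predict(x_train, y_train, x_test):
    """Hypothetical stand-in learner: OLS with intercept, fit on train, predict on test."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return a_test @ coef


def plr_partialling_out(y, d, x, n_folds=5, seed=42):
    """Hypothetical sketch: cross-fitted DML for the PLR model (partialling-out score)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)

    res_y = np.empty(n)  # out-of-fold residuals of y given x
    res_d = np.empty(n)  # out-of-fold residuals of d given x
    for test_idx in folds:
        # Each fold only needs its own training data: an independent unit of work.
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        res_y[test_idx] = y[test_idx] - _ols_fit_predict(x[train_idx], y[train_idx], x[test_idx])
        res_d[test_idx] = d[test_idx] - _ols_fit_predict(x[train_idx], d[train_idx], x[test_idx])

    # Solve the pooled moment condition mean((res_y - theta * res_d) * res_d) = 0.
    theta = np.sum(res_d * res_y) / np.sum(res_d ** 2)

    # Plug-in standard error from the same score.
    psi = (res_y - theta * res_d) * res_d
    j = -np.mean(res_d ** 2)
    se = np.sqrt(np.mean(psi ** 2) / j ** 2 / n)
    return theta, se


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, theta_true = 2000, 5, 0.5
    x = rng.normal(size=(n, p))
    d = x @ np.linspace(1.0, 0.2, p) + rng.normal(size=n)
    y = theta_true * d + x @ np.linspace(0.5, -0.5, p) + rng.normal(size=n)
    theta_hat, se = plr_partialling_out(y, d, x)
    print(f"theta_hat = {theta_hat:.3f} (se {se:.3f})")
```

The loop body touches no shared state besides writing its own test indices. The nuisance fits for different folds, and with repeated cross-fitting different repetitions, can therefore run as separate function invocations.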

doubleml_serverless/double_ml_iivm_aws_lambda.py

Lines changed: 33 additions & 14 deletions
@@ -19,6 +19,7 @@ def __init__(self,
 n_folds=5,
 n_rep=1,
 score='ATE',
+subgroups=None,
 dml_procedure='dml2',
 trimming_rule='truncate',
 trimming_threshold=1e-12,
@@ -32,6 +33,7 @@ def __init__(self,
 n_folds,
 n_rep,
 score,
+subgroups,
 dml_procedure,
 trimming_rule,
 trimming_threshold,
@@ -72,32 +74,49 @@ def _ml_nuisance_aws_lambda(self, cv_params):
 self._dml_data.z_cols[0], self._dml_data.x_cols,
 method='predict_proba')
 
-_attach_learner(payload_ml_r0,
-'ml_r0', self.learner['ml_r'],
-self._dml_data.d_cols[0], self._dml_data.x_cols,
-method='predict_proba')
-
-_attach_learner(payload_ml_r1,
-'ml_r1', self.learner['ml_r'],
-self._dml_data.d_cols[0], self._dml_data.x_cols,
-method='predict_proba')
-
-all_payloads = [payload_ml_g0, payload_ml_g1, payload_ml_m, payload_ml_r0, payload_ml_r1]
-all_smpls = [smpls_z0, smpls_z1, self.smpls, smpls_z0, smpls_z1]
+all_payloads = [payload_ml_g0, payload_ml_g1, payload_ml_m]
+all_smpls = [smpls_z0, smpls_z1, self.smpls]
+send_train_ids = [True, True, False]
+params_names = ['ml_g0', 'ml_g1', 'ml_m']
+
+if self.subgroups['always_takers']:
+_attach_learner(payload_ml_r0,
+'ml_r0', self.learner['ml_r'],
+self._dml_data.d_cols[0], self._dml_data.x_cols,
+method='predict_proba')
+all_payloads.append(payload_ml_r0)
+all_smpls.append(smpls_z0)
+send_train_ids.append(True)
+params_names.append('ml_r0')
+
+if self.subgroups['never_takers']:
+_attach_learner(payload_ml_r1,
+'ml_r1', self.learner['ml_r'],
+self._dml_data.d_cols[0], self._dml_data.x_cols,
+method='predict_proba')
+all_payloads.append(payload_ml_r1)
+all_smpls.append(smpls_z1)
+send_train_ids.append(True)
+params_names.append('ml_r1')
 
 payloads = _attach_smpls(all_payloads,
 all_smpls,
 self.n_folds,
 self.n_rep,
 self._dml_data.n_obs,
 cv_params['n_lambdas_cv'],
-[True, True, False, True, True],
+send_train_ids,
 cv_params['seed'])
 
-preds = self.invoke_lambdas(payloads, self.smpls, self.params_names,
+preds = self.invoke_lambdas(payloads, self.smpls, params_names,
 self._dml_data.n_obs, self.n_rep,
 cv_params['n_lambdas_cv'])
 
+if not self.subgroups['always_takers']:
+preds['ml_r0'] = np.zeros_like(preds['ml_g0'])
+if not self.subgroups['never_takers']:
+preds['ml_r1'] = np.ones_like(preds['ml_g1'])
+
 for i_rep in range(self.n_rep):
 # compute score elements
 
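The new `subgroups` option decides which conditional treatment probabilities are sent to Lambda. `ml_r0` (E[D | X, Z=0], trained on the Z=0 samples) is only requested when always-takers are assumed to exist. `ml_r1` (E[D | X, Z=1], trained on the Z=1 samples) is only requested when never-takers exist. Otherwise the code above sets them to constant 0 and 1. The sketch below mirrors that bookkeeping outside the class and plugs the predictions into the LATE score for the interactive IV model from Chernozhukov et al. (2018). It assumes a binary instrument Z, a binary treatment D, one repetition with predictions as plain numpy arrays, and no trimming of `ml_m`. The helper names `nuisance_jobs`, `fill_missing_r` and `late_from_preds` are hypothetical and not part of DoubleML-Serverless.

```python
import numpy as np


def nuisance_jobs(subgroups):
    """Hypothetical helper: nuisance learners to request, mirroring the diff above."""
    names = ['ml_g0', 'ml_g1', 'ml_m']
    if subgroups['always_takers']:
        names.append('ml_r0')  # E[D | X, Z=0] is non-trivial only with always-takers
    if subgroups['never_takers']:
        names.append('ml_r1')  # E[D | X, Z=1] differs from 1 only with never-takers
    return names


def fill_missing_r(preds, subgroups):
    """Hypothetical helper: constant predictions for the learners that were skipped."""
    preds = dict(preds)  # do not mutate the caller's dict
    if not subgroups['always_takers']:
        preds['ml_r0'] = np.zeros_like(preds['ml_g0'])  # no one is treated when Z=0
    if not subgroups['never_takers']:
        preds['ml_r1'] = np.ones_like(preds['ml_g1'])  # everyone is treated when Z=1
    return preds


def late_from_preds(y, d, z, preds):
    """LATE from the IIVM orthogonal score (one repetition, no propensity trimming)."""
    g0, g1, m = preds['ml_g0'], preds['ml_g1'], preds['ml_m']
    r0, r1 = preds['ml_r0'], preds['ml_r1']
    psi_b = g1 - g0 + z * (y - g1) / m - (1 - z) * (y - g0) / (1 - m)
    psi_a = -(r1 - r0 + z * (d - r1) / m - (1 - z) * (d - r0) / (1 - m))
    return -np.mean(psi_b) / np.mean(psi_a)


# Toy usage: no always-takers, so ml_r0 is never requested and is filled with 0.
subgroups = {'always_takers': False, 'never_takers': True}
print(nuisance_jobs(subgroups))  # ['ml_g0', 'ml_g1', 'ml_m', 'ml_r1']

y = np.array([3.0, 1.0, 0.0, 2.0])
d = np.array([1.0, 1.0, 0.0, 0.0])
z = np.array([1.0, 1.0, 0.0, 0.0])
lambda_preds = {'ml_g0': np.zeros(4), 'ml_g1': np.zeros(4),
                'ml_m': np.full(4, 0.5), 'ml_r1': np.ones(4)}
print(late_from_preds(y, d, z, fill_missing_r(lambda_preds, subgroups)))  # 1.0
```

Filling the skipped learners with constants, rather than branching inside the score, keeps the downstream score computation identical for every `subgroups` setting.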

doubleml_serverless/tests/test_iivm.py

Lines changed: 11 additions & 1 deletion
@@ -51,8 +51,16 @@ def trimming_threshold(request):
 return request.param
 
 
+@pytest.fixture(scope='module',
+params=[{'always_takers': True, 'never_takers': True},
+{'always_takers': False, 'never_takers': True},
+{'always_takers': True, 'never_takers': False}])
+def subgroups(request):
+return request.param
+
+
 @pytest.fixture(scope="module")
-def dml_iivm_fixture(generate_data_iivm, idx, learner, score, dml_procedure, trimming_threshold):
+def dml_iivm_fixture(generate_data_iivm, idx, learner, score, dml_procedure, trimming_threshold, subgroups):
 boot_methods = ['normal']
 n_folds = 4
 n_rep_boot = 502
@@ -77,6 +85,7 @@ def dml_iivm_fixture(generate_data_iivm, idx, learner, score, dml_procedure, tri
 ml_g, ml_m, ml_r,
 n_folds,
 score=score,
+subgroups=subgroups,
 dml_procedure=dml_procedure)
 
 dml_iivm_lambda.fit_aws_lambda()
@@ -87,6 +96,7 @@ def dml_iivm_fixture(generate_data_iivm, idx, learner, score, dml_procedure, tri
 ml_g, ml_m, ml_r,
 n_folds,
 score=score,
+subgroups=subgroups,
 dml_procedure=dml_procedure)
 
 dml_iivm.fit()

requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
-DoubleML>=0.1.2
+DoubleML>=0.2.2
 joblib
 numpy
 pandas
 scipy
-sklearn
+scikit-learn==0.23.2
 statsmodels
 aiobotocore==1.1.2
 boto3==1.14.44
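The requirements now demand `DoubleML>=0.2.2` and replace the `sklearn` alias package with a pinned `scikit-learn==0.23.2`. Because pip only enforces these bounds at install time, an environment assembled by other means can still contain an older DoubleML. The sketch below is a hypothetical import-time guard, not part of the repository. `meets_minimum` only handles plain numeric versions such as `0.2.2`; for pre-release or local version tags, `packaging.version.Version` is the more robust choice. `importlib.metadata.version` is available in the standard library from Python 3.8.

```python
from importlib.metadata import PackageNotFoundError, version


def _numeric_parts(ver):
    """Turn '0.2.2' into (0, 2, 2); assumes purely numeric dot-separated versions."""
    return tuple(int(part) for part in ver.split('.'))


def meets_minimum(installed, minimum):
    """Hypothetical helper: True if `installed` >= `minimum` (numeric versions only)."""
    a, b = _numeric_parts(installed), _numeric_parts(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.2' compares equal to '0.2.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b  # tuple comparison is element-wise, so 0.10.0 > 0.2.2


def check_doubleml(minimum='0.2.2'):
    """Hypothetical guard: raise ImportError if DoubleML is missing or too old."""
    try:
        installed = version('DoubleML')
    except PackageNotFoundError as exc:
        raise ImportError(f'DoubleML>={minimum} is required but not installed') from exc
    if not meets_minimum(installed, minimum):
        raise ImportError(f'DoubleML>={minimum} is required, found {installed}')
    return installed
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `'0.10.0' < '0.2.2'` lexicographically.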
