Update README; experiments now use 2000 samples per node

Joseph Shenouda · Joseph Shenouda · commit e445e680c1be · 2020-12-29T22:35:44.000-05:00
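
This commit moves the ByRDiE result files to the `result_{num_nodes}_nodes_{con_rate}%_b{b}...` naming pattern, so `dec_ByRDiE.py` and `plot.py` must agree on it. A minimal sketch of that pattern as one shared helper follows. The name `result_path` and its parameters are hypothetical and do not exist in the repository, which inlines the f-strings instead; only the pattern itself is taken from the diff.

```python
def result_path(method, num_nodes, con_rate, b, monte_trial, faultless=False):
    """Build a result filename following the pattern used in this commit.

    Hypothetical helper for illustration; the repository writes these
    f-strings directly in the driver scripts and in plot.py.
    """
    # Faultless runs carry a "_faultless" tag between b and the trial index.
    suffix = "_faultless" if faultless else ""
    return (f"./result/{method}/result_{num_nodes}_nodes_{con_rate}%"
            f"_b{b}{suffix}_{monte_trial}.pickle")


if __name__ == "__main__":
    print(result_path("ByRDiE", 20, 50, 2, 0))
    print(result_path("ByRDiE", 20, 50, 2, 0, faultless=True))
```

Building the name in one place would mean that a change to the pattern only has to be made once, rather than separately in the writer and the reader.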
diff --git a/README.md b/README.md
@@ -34,6 +34,8 @@ The codebase uses implementations of ByRDiE, BRIDGE, and BRIDGE variants to gene
 
 For experiments in both the faultless and the faulty setting, we ran ten Monte Carlo trials in parallel and averaged the classification accuracy before plotting.
 
+**Note:** The experiments that produce Figure 3 of (Yang et al., 2020) can be reproduced by changing the `local_samples` argument passed to the `DecLearning` constructor in `dec_BRIDGE.py` and `dec_ByRDiE.py` from 2000 to 100 samples per node. However, due to the loss of the original implementation of the decentralized Krum and Bulyan screening methods, the experiments with these screening methods will not perfectly reproduce the results found in Figure 3 of (Yang et al., 2020). Nonetheless, the results from the implementations in this codebase are consistent with the discussions and conclusions made in the paper. Additionally, the original experiments and this codebase use the ADAM optimizer for all methods to train the neural network, but we have provided an option to use vanilla gradient descent when constructing the `linear_classifier` object.
+
 ## Summary of Code
 The `dec_BRIDGE.py` and `dec_ByRDiE.py` serve as the "driver" or "main" files where we set up the experiments and call the necessary functions to learn the machine learning model in a decentralized manner. The actual implementations of the various screenings methods (ByRDiE, BRIDGE, and variants of BRIDGE) are carried out in the `DecLearning.py` module. While these specific implementations are written for the particular case of training with a single-layer neural network using TensorFlow, the core of these implementations can be easily adapted for other machine learning problems.
 
@@ -49,7 +51,7 @@ Lenovo NextScale nx360 servers:
 However, we only allocated 4GB of RAM when submitting each of our jobs. 
 
 ## Requirements and Dependencies
-This code is written in Python and uses TensforFlow.  To reproduce the environment with necessary dependencies needed for running of the code in this repo, we recommend that the users create a `conda` environment using the `environment.yml` YAML file that is provided in the repo. Assuming the conda management system is installed on the user's system, this can be done using the following:
+This code is written in Python and uses TensorFlow.  To reproduce the environment with necessary dependencies needed for running of the code in this repo, we recommend that the users create a `conda` environment using the `environment.yml` YAML file that is provided in the repo. Assuming the conda management system is installed on the user's system, this can be done using the following:
 
 ```
 $ conda env create -f environment.yml
@@ -103,7 +105,7 @@ The user can run each of the possible screening methods ten times in parallel by
 
 <a name="byrdie"></a>
 # ByRDiE Experiments
-We performed decentralized learning using ByRDiE, both in the faultless setting and in the presence of actual Byzantine nodes. To train the one layer neural network on MNIST with ByRDiE, run the `dec_ByRDiE.py` script. Each Monte Carlo trial for ByRDiE ran in about two days on our machines.
+We performed decentralized learning using ByRDiE, both in the faultless setting and in the presence of actual Byzantine nodes. To train the one layer neural network on MNIST with ByRDiE, run the `dec_ByRDiE.py` script. Each Monte Carlo trial for ByRDiE ran in about three days on our machines.
 
 ```
 usage: dec_ByRDiE.py [-h] [-b BYZANTINE] [-gb GOBYZANTINE] monte_trial
@@ -141,15 +143,13 @@ The user can run ByRDiE ten times in parallel by varying `monte_trial` between 0
 # Plotting
 All results generated by `dec_BRIDGE.py` and `dec_ByRDiE.py` get saved in `./result` folder. After running ten independent trials for each Byzantine-resilient decentralized method as described above, run the `plot.py` script to generate the plots similar to Figure 3 in the paper (Yang et al., 2020).
 
-**Note:** Due to a loss in the original implementation of the decentralized Krum and Bulyan screening methods, the experiments with these screening methods will not perfectly reproduce the results found in Figure 3 of (Yang et al., 2020). Nonetheless, the results from the implementations in this codebase are consistent with the discussions and conclusions made in the paper.
-
 # Contributors
 The algorithmic implementations and experiments were originally developed by the authors of the papers listed above:
 
 - [Zhixiong Yang](https://www.linkedin.com/in/zhixiong-yang-67139152/)
-- [Arpita Gang](https://www.linkedin.com/in/arpita-gang-41444930/)
+- [Arpita Gang](https://arpitagang.github.io/)
 - [Waheed U. Bajwa](http://www.inspirelab.us/)
 
 The reproducibility of this codebase and publicizing of it was made possible by:
 
-- [Joseph Shenouda](https://github.com/joeshenouda)
+- [Joseph Shenouda](https://joeshenouda.github.io/)
diff --git a/dec_BRIDGE.py b/dec_BRIDGE.py
@@ -7,8 +7,8 @@
 
 import numpy as np
 from dist_data import data_prep
-from linear_classifier import linear_classifier
 import tensorflow as tf
+from linear_classifier import linear_classifier
 import time
 import pickle
 import random
@@ -59,7 +59,7 @@
 random.seed(a=30+monte_trial)
 
 num_nodes = 20
-para = DecLearning(dataset = 'MNIST', nodes=num_nodes, byzantine=b, local_samples=100)
+para = DecLearning(dataset = 'MNIST', nodes=num_nodes, byzantine=b, local_samples=2000)
 
 #Generate the graph
 con_rate = 50
diff --git a/dec_ByRDiE.py b/dec_ByRDiE.py
@@ -48,12 +48,12 @@
 random.seed(a=30+monte_trial)
 
 
-
-para = DecLearning(dataset = 'MNIST', nodes=20, byzantine = b, local_samples=2000)
+num_nodes = 20
+para = DecLearning(dataset = 'MNIST', nodes=num_nodes, byzantine = b, local_samples=2000)
 loaded = False
 #Generate the graph
 para.gen_graph(min_neigh = min_neighbor)
-local_set, test_data, test_label = data_prep(para.dataset, para.M, para.N, one_hot=True)
+local_set, test_data, test_label = data_prep(para.dataset, para.M, para.M*para.N, one_hot=True)
 neighbors = para.get_neighbor()
 save = []
 
@@ -102,9 +102,9 @@
 
 
 if b!=0 and goByzantine:
-    filename = f'./result/ByRDiE/result_ByRDiE_b{b}_{monte_trial}.pickle'
+    filename = f'./result/ByRDiE/result_{num_nodes}_nodes_{con_rate}%_b{b}_{monte_trial}.pickle'
 else:
-    filename = f'./result/ByRDiE/result_ByRDiE_b{b}_faultless_{monte_trial}.pickle'        
+    filename = f'./result/ByRDiE/result_{num_nodes}_nodes_{con_rate}%_b{b}_faultless_{monte_trial}.pickle'        
 
 end = time.time()
 print(f'Monte Carlo {monte_trial} Done!\n Time elapsed {end-start} seconds\n')
diff --git a/plot.py b/plot.py
@@ -37,9 +37,9 @@
     with open(f'./result/DGD/result_20_nodes_50%_b2_{monte}.pickle', 'rb') as handle:
         dgd_b2.append(pickle.load(handle))
     
-    with open(f'./result/ByRDiE/result_ByRDiE_b2_faultless_{monte}.pickle', 'rb') as handle:
+    with open(f'./result/ByRDiE/result_20_nodes_50%_b0_faultless_{monte}.pickle', 'rb') as handle:
         byrdie_b2_faultless.append(pickle.load(handle))
-    with open(f'./result/ByRDiE/result_ByRDiE_b2_{monte}.pickle', 'rb') as handle:
+    with open(f'./result/ByRDiE/result_20_nodes_50%_b2_{monte}.pickle', 'rb') as handle:
         byrdie_b2.append(pickle.load(handle))
     
     with open(f'./result/BRIDGE/result_20_nodes_50%_b2_faultless_{monte}.pickle','rb') as handle: