From 5c8ea06e60ee0c1308a1d71190564f40e7cec398 Mon Sep 17 00:00:00 2001
From: Ryan Gabbard
Date: Mon, 5 Nov 2018 12:53:31 -0800
Subject: [PATCH 1/4] Adds requirements.txt for necessary dependencies

---
 README.md        |  7 +++++++
 requirements.txt | 11 +++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 requirements.txt

diff --git a/README.md b/README.md
index f2b3948..8757f35 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,13 @@
 # finetune-transformer-lm
 Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
 
+Before running this code, you need to:
+```
+# if you lack a GPU, see requirements.txt for necessary modifications
+pip install -r requirements.txt
+python -m spacy download en
+```
+
 Currently this code implements the ROCStories Cloze Test result reported in the paper by running:
 `python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]`
 
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..694363e
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,11 @@
+# older versions of these dependencies may work
+# these are simply the newest versions at the time this file was made
+joblib>=0.12.5
+numpy>=1.15.4
+# change to just tensorflow if you don't have a GPU
+tensorflow-gpu>=1.11.0
+tqdm>=4.28.1
+scikit-learn>=0.19.2
+pandas>=0.23.4
+ftfy>=5.5.0
+spacy>=2.0.16

From d12b808389da0916d2aaffb7b15c1dc9af44348e Mon Sep 17 00:00:00 2001
From: Ryan Gabbard
Date: Mon, 5 Nov 2018 12:54:40 -0800
Subject: [PATCH 2/4] Ignore log directory

---
 .gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.gitignore b/.gitignore
index 894a44c..ff9fefd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -102,3 +102,5 @@ venv.bak/
 
 # mypy
 .mypy_cache/
+
+log/

From 3a2019d4e024078e857a32b7d27aab2d3fbda705 Mon Sep 17 00:00:00 2001
From: Ryan Gabbard
Date: Mon, 5 Nov 2018 13:13:42 -0800
Subject: [PATCH 3/4] Add directions about data download

---
 README.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 8757f35..cdbf58d 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,10 @@
 Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
 
 Before running this code, you need to:
-```
-# if you lack a GPU, see requirements.txt for necessary modifications
-pip install -r requirements.txt
-python -m spacy download en
-```
+1. `pip install -r requirements.txt`. If you lack a GPU, see `requirements.txt` for necessary modifications
+2. `python -m spacy download en`
+3. Export the "val set" and "test set" from the ROC stories corpus (see below) as CSV files
+ with the default filenames and place them in a `data` subdirectory under htis repository.
 
 Currently this code implements the ROCStories Cloze Test result reported in the paper by running:
 `python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]`

From c96a81d84ed6e0f1612d7b60adb2a643e7d8926c Mon Sep 17 00:00:00 2001
From: Ryan Gabbard
Date: Mon, 5 Nov 2018 16:14:23 -0500
Subject: [PATCH 4/4] Fix typo in README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index cdbf58d..1b504f2 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ Before running this code, you need to:
 1. `pip install -r requirements.txt`. If you lack a GPU, see `requirements.txt` for necessary modifications
 2. `python -m spacy download en`
 3. Export the "val set" and "test set" from the ROC stories corpus (see below) as CSV files
- with the default filenames and place them in a `data` subdirectory under htis repository.
+ with the default filenames and place them in a `data` subdirectory under this repository.
 
 Currently this code implements the ROCStories Cloze Test result reported in the paper by running:
 `python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]`