|
1 | | -This is the improvered rewriter model described in Hu et al. 2019b. |
| 1 | +# Improved Monolingual Rewriter |
2 | 2 |
|
3 | | -Required packages are listed in `requirement.txt`. Please download `params.best` from the project website and place it under this directory. |
| 3 | +This is the rewriter described in our paper: |
4 | 4 |
|
5 | | -Usage: `echo -e "This is a test.\tis|test\twas|exam" | ./paraphrase.sh`, where "is" and "test" are negative constraints, and "was" and "exam" are positive constraints. |
| 5 | +> @inproceedings{N18-1007, |
| 6 | +> title = "Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting", |
| 7 | +> author = "Hu, J. Edward and Khayrallah, Huda and Culkin, Ryan and Xia, Patrick and Chen, Tongfei and Post, Matt and Van Durme, Benjamin", |
| 8 | +> booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)", |
| 9 | +> month = jun, |
| 10 | +> year = "2019", |
| 11 | +> address = "Minneapolis, Minnesota", |
| 12 | +> publisher = "Association for Computational Linguistics", |
6 | 13 |
|
7 | | -Please cite the following papers if you would like to use this rewriter in your work: |
| 14 | +It is available from |
8 | 15 |
|
9 | | -> Hu, J. E., R. Rudinger, M. Post, & B. Van Durme. 2019a. [ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation](http://aaai.org/Papers/AAAI/2019/AAAI-HuJ.4052.pdf). Proceedings of AAAI 2019, Honolulu, Hawaii, January 26 – Feb 1, 2019. |
10 | | -> Hu, J. E., H. Khayrallah, R. Culkin, P. Xia, T. Chen, M. Post, & B. Van Durme. 2019b. [Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting](TBD). Proceedings of NAACL 2019, Minneapolis, Minnesota, June 2 – 7, 2019. |
| 16 | + https://github.com/decompositional-semantics-initiative/improved-ParaBank-rewriter/releases |
11 | 17 |
|
| 18 | +You can also interact with it using [our online demo](http://cs.jhu.edu/~vandurme/pbr-1b-demo). |
12 | 19 |
|
13 | | -To interact with the improved monolingual rewriter online, [please check out this live demo](http://cs.jhu.edu/~vandurme/pbr-1b-demo). |
| 20 | +## Installation |
| 21 | + |
| 22 | +After downloading the release, unpacking it, and changing to that directory, run the following command: |
| 23 | + |
| 24 | + pip3 install -r requirements.txt |
| 25 | + |
| 26 | +You may also have to install spaCy models:
| 27 | + |
| 28 | + python3 -m spacy download en |
| 29 | + python3 -m spacy download en_core_web_lg |
| 30 | + |
| 31 | +## Usage |
| 32 | + |
| 33 | +The rewriter takes raw, unprocessed input and returns the same. |
| 34 | +It applies all the pre-processing automatically, and undoes it afterwards. |
| 35 | +To run the pipeline, use the `paraphrase.sh` script: |
| 36 | + |
| 37 | +Usage: |
| 38 | + |
| 39 | + cat sentences.txt | /path/to/rewriter/paraphrase.sh > paraphrases.txt |
| 40 | + |
| 41 | +It uses Sockeye internally, so it accepts many of Sockeye's options.
| 42 | +By default, Sockeye looks for a GPU. |
| 43 | +You can use a CPU instead by passing |
| 44 | + |
| 45 | + paraphrase.sh --use-cpu |
| 46 | + |
| 47 | +To change the beam or batch sizes, use the `--beam-size X` and `--batch-size Y` options. |
| 48 | +You can set the GPU device with `--device-ids N`, where `N` is a 0-indexed CUDA device ID. |
| 49 | + |
| 50 | +To get n-best output, use |
| 51 | + |
| 52 | + paraphrase.sh --nbest-size K |
| 53 | + |
| 54 | +Finally, you can experiment with negative and positive constraints by passing them in as the second and third tab-delimited fields, respectively.
| 55 | +Multiple constraints within a field are separated by a vertical bar.
| 56 | +For example, |
| 57 | + |
| 58 | +```bash |
| 59 | +echo -e "In times like this, one takes one’s happiness where one can find it.\tfortune|happiness|chances" | ./paraphrase.sh |
| 60 | +``` |
| 61 | + |
| 62 | +will use *fortune*, *happiness*, and *chances* as negative constraints. |
| 63 | +Positive constraints can be added in a third tab-delimited field.
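
The three-field input format described above can be sketched with a small helper. This is only an illustration: `make_input` is a hypothetical function, not part of the release, and the sentence and constraints are the ones from the earlier version of this README (negative constraints *is* and *test*, positive constraints *was* and *exam*).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the rewriter): print one input line
# in the tab-delimited format that paraphrase.sh reads.
#   field 1: the raw sentence
#   field 2: negative constraints, joined with "|"
#   field 3: positive constraints, joined with "|"
make_input() {
    local sentence="$1" negative="$2" positive="$3"
    printf '%s\t%s\t%s\n' "$sentence" "$negative" "$positive"
}

# Build the example line: avoid "is" and "test", require "was" and "exam".
make_input "This is a test." "is|test" "was|exam"

# To rewrite it, pipe the line into the script (assumes the release is
# unpacked in the current directory):
# make_input "This is a test." "is|test" "was|exam" | ./paraphrase.sh
```

Using `printf` with `\t` in the format string avoids relying on `echo -e`, whose handling of escape sequences differs between shells.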