Guide to Scripts
Nina Gial edited this page Mar 1, 2024
This guide covers only the contents of the scripts/ directory.
| File | Explanation |
|---|---|
| (Folder) conversion/ | |
| multiple_xml_files.py | Handles multiple XML files nested in a directory, a layout common in OPUS |
| pickle_to_sql.py | Converts a pickle file to SQLite (words); use for the tokenizer |
| pickle_to_sql_sentences.py | Converts a pickle file to SQLite (sentences); use for RoBERTa, etc. |
| (Folder) lda/ | |
| lda_post.py | Run this to prepare LDA |
| lda_pre.py | Run this to visualize LDA (consider using Jupyter notebooks instead) |
| (Folder) training/ | |
| Dockerfile | Used by `docker build` to package training as a container |
| requirements.txt | Python requirements for training |
| script.py | Command-line script; run `python script.py --help` to list its parameters |
| (Folder) tokenizer/ | |
| train_bpe.py | Train tokenizer with Byte Pair Encoding |
| train_bpe_thread.py | Train tokenizer with Byte Pair Encoding |
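
The pickle-to-SQLite conversion listed under conversion/ can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `pickle_to_sqlite`, the table name `sentences`, the column layout, and the assumption that the pickle holds a plain list of strings are all hypothetical.

```python
import pickle
import sqlite3


def pickle_to_sqlite(pickle_path, db_path, table="sentences"):
    """Load a pickled list of strings and store one string per row.

    Hypothetical helper: the real scripts may use a different schema.
    Only unpickle files you trust, since pickle can execute arbitrary code.
    """
    with open(pickle_path, "rb") as f:
        items = pickle.load(f)  # assumed to be a list of str

    conn = sqlite3.connect(db_path)
    try:
        # "with conn" commits on success and rolls back on error
        with conn:
            conn.execute(
                f'CREATE TABLE IF NOT EXISTS "{table}" '
                "(id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
            )
            conn.executemany(
                f'INSERT INTO "{table}" (text) VALUES (?)',
                ((s,) for s in items),
            )
    finally:
        conn.close()
    return len(items)
```

Storing the rows in SQLite rather than keeping the pickle lets a training job read sentences in batches with ordinary `SELECT` queries instead of loading the whole list into memory.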