# Release Notes
- Ability to install through `pip`.
- Advanced layers are now organized into subfolders.
- New basic layers: Convolution over sequence, MaxMargin.
- New attention layers: Co-attention, multi-head attention, hierarchical attention.
- New encoders: Arbitrary sequence-of-vectors encoder, BiLSTMp speech feature encoder.
- New decoders: Multi-source decoder, switching decoder, vector decoder.
- New datasets: Kaldi dataset (`.ark`/`.scp` reader), Shelve dataset, Numpy sequence dataset.
- Added learning rate annealing: see the `lr_decay*` options in `config.py` (a sketch of the mechanism follows this list).
- Removed subword-nmt and METEOR files from the repository. We now depend on the `pip` package for subword-nmt; for METEOR, `nmtpy-install-extra` should be run after installation.
- More multi-task and multi-input/output `translate` and `training` regimes.
- New early-stopping metrics: character and word error rate (`cer`, `wer`) and ROUGE (`rouge`).
- Curriculum learning option for the `BucketBatchSampler`, i.e. length-ordered batches.
- New models:
  - ASR: Listen-attend-and-spell-like automatic speech recognition.
  - Multitask*: Experimental multi-tasking & scheduling between many inputs/outputs.
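The exact annealing knobs are the `lr_decay*` options in `config.py`; as a hedged sketch of the underlying mechanism, plateau-based decay in plain PyTorch looks roughly like this (all names and values below are illustrative, not nmtpytorch's actual code):

```python
from torch import nn, optim

# Stand-ins for a real model and training loop.
model = nn.Linear(512, 512)
opt = optim.Adam(model.parameters(), lr=4e-4)

# Halve the learning rate when the dev score stops improving;
# factor/patience/min_lr play the role of typical lr_decay* knobs.
sched = optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode='min', factor=0.5, patience=2, min_lr=1e-6)

for epoch in range(10):
    dev_loss = 1.0 / (epoch + 1)  # placeholder for a real validation loss
    sched.step(dev_loss)          # anneal based on the early-stopping metric
```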
- Add `environment.yml` for easy installation using `conda`. You can now create a ready-to-use `conda` environment by simply calling `conda env create -f environment.yml`.
- Make `NumpyDataset` memory-efficient by keeping `float16` arrays as they are until batch creation time.
- Rename `Multi30kRawDataset` to `Multi30kDataset`, which now supports both raw image files and pre-extracted visual features stored as `.npy`.
- Add CNN feature extraction script under `scripts/`.
- Add doubly stochastic attention to `ShowAttendAndTell` and multimodal NMT (see the sketch after this list).
- New model `MNMTDecinit` to initialize the decoder with auxiliary features.
- New model `AMNMTFeatures`, which is the attentive MMT but fed with a features file instead of the end-to-end feature extraction, which was memory-hungry.
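Doubly stochastic attention is the regularizer from "Show, Attend and Tell" (Xu et al., 2015) that pushes each image region's attention mass, summed over decoding timesteps, towards 1. A minimal sketch, with tensor shapes assumed for illustration rather than taken from nmtpytorch's internals:

```python
import torch

def doubly_stochastic_penalty(alphas: torch.Tensor, lamb: float = 0.5) -> torch.Tensor:
    """alphas: attention weights of shape (tsteps, batch, n_regions),
    softmax-normalized over n_regions at every timestep."""
    total_att = alphas.sum(dim=0)  # attention each region received: (batch, n_regions)
    return lamb * ((1.0 - total_att) ** 2).sum(-1).mean()

# Typical use: loss = nll_loss + doubly_stochastic_penalty(alphas)
```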
- Updates to the `ShowAttendAndTell` model.

- Removed old `Multi30kDataset`.
- Sort batches by source sequence length instead of target.

- Fix `ShowAttendAndTell` model. It should now work.
- Added `Multi30kRawDataset` for training end-to-end systems from raw images as input.
- Added `NumpyDataset` to read `.npy`/`.npz` tensor files as input features.
- You can now pass `-S` to `nmtpy train` to produce shorter experiment files that do not embed all the hyperparameters in the file name.
- New post-processing filter option `de-spm` for Google SentencePiece (SPM) processed files.
- `sacrebleu` is now a dependency, as it is now accepted as an early-stopping metric. It only makes sense to use it with SPM-processed files, since those are detokenized once post-processed.
- Added `sklearn` as a dependency for some metrics.
- Added `momentum` and `nesterov` parameters to the `[train]` section for SGD.
- `ImageEncoder` layer is improved in many ways. Please see the code for further details.
- Added unmerged upstream PR for `ModuleDict()` support.
- METEOR will now fall back to English if the language cannot be detected from file suffixes.
- `-f` now produces a separate numpy file for token frequencies when building vocabulary files with `nmtpy-build-vocab`.
- Added new command `nmtpy test` for non-beam-search inference modes.
- Removed the `nmtpy resume` command and added a `pretrained_file` option for `[train]` to initialize model weights from a checkpoint.
- Added `freeze_layers` option for `[train]` to give a comma-separated list of layer name prefixes to freeze (a sketch of prefix-based freezing follows this list).
- Improved seeding: the seed is now printed so that results can be reproduced.
- Added IPython notebook for attention visualization.
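For `freeze_layers`, a hedged sketch of what prefix-based freezing amounts to (the helper below is hypothetical; the real parsing lives inside nmtpytorch):

```python
from torch import nn

def freeze_by_prefix(model: nn.Module, prefixes: str) -> None:
    """Disable gradients for parameters whose names start with any of
    the comma-separated prefixes, e.g. 'enc,dec.emb'."""
    prefs = tuple(p.strip() for p in prefixes.split(','))
    for name, param in model.named_parameters():
        if name.startswith(prefs):
            param.requires_grad = False
```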
- Layers:
  - New shallow `SimpleGRUDecoder` layer.
  - `TextEncoder`: ability to set `maxnorm` and `gradscale` of embeddings and to work with or without sorted-length batches.
  - `ConditionalDecoder`: make it work with GRU/LSTM, allow setting `maxnorm`/`gradscale` for embeddings.
  - `ConditionalMMDecoder`: same as above.
- `nmtpy translate`:
  - `--avoid-double` and `--avoid-unk` removed for now.
  - Added Google's length penalty normalization switch `--lp-alpha` (see the sketch after this list).
  - Added ensembling, which is enabled automatically if you give more than one model checkpoint.
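The length penalty follows Wu et al. (2016, "Google's Neural Machine Translation System"); the snippet below is an illustrative reimplementation, not nmtpytorch's code:

```python
def length_penalty(length: int, alpha: float) -> float:
    """Google NMT length normalization: ((5 + |Y|) ** alpha) / (6 ** alpha).
    With alpha == 0 the factor is 1 for every length, i.e. no normalization."""
    return ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)

# Hypotheses are then ranked by sum_logprob / length_penalty(len(hyp), alpha)
```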
- New machine learning metric wrappers in `utils/ml_metrics.py` (a rough `sklearn`-based equivalent follows this list):
  - Label-ranking average precision (`lrap`)
  - Coverage error
  - Mean reciprocal rank
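Since `sklearn` is already a dependency (see above), roughly equivalent computations look like the following; the actual wrappers in `utils/ml_metrics.py` may differ in interface:

```python
import numpy as np
from sklearn.metrics import coverage_error, label_ranking_average_precision_score

y_true = np.array([[1, 0, 0], [0, 0, 1]])                # binary relevance labels
y_score = np.array([[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])  # model scores

lrap = label_ranking_average_precision_score(y_true, y_score)
cov = coverage_error(y_true, y_score)

def mean_reciprocal_rank(y_true, y_score):
    """Average of 1/rank of the first relevant label, ranking best-first."""
    rr = []
    for t, s in zip(y_true, y_score):
        order = np.argsort(-s)                  # label indices, best-first
        first_hit = np.nonzero(t[order])[0][0]  # position of first relevant label
        rr.append(1.0 / (first_hit + 1))
    return float(np.mean(rr))

mrr = mean_reciprocal_rank(y_true, y_score)
```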
- You can now use `$HOME` and `$USER` in your configuration files.
- Fixed an overflow error that would cause NMT with more than 255 tokens to fail.
- METEOR worker process is now correctly killed after validations.
- Multiple runs of an experiment are now suffixed with a unique random string instead of incremental integers, to avoid race conditions in cluster setups.
- Replaced `utils.nn.get_network_topology()` with a new `Topology` class that parses the `direction` string of the model in a smarter way.
- If `CUDA_VISIBLE_DEVICES` is set, the `GPUManager` will always honor it.
- Dropped creation of temporary/advisory lock files under `/tmp` for GPU reservation.
- Time measurements during training are now structured into batch overhead, training and evaluation timings.
- Datasets:
  - Added `TextDataset` for standalone text file reading.
  - Added `OneHotDataset`, a variant of `TextDataset` where the sequences are not prefixed/suffixed with `<bos>` and `<eos>` respectively.
  - Added experimental `MultiParallelDataset` that merges an arbitrary number of parallel datasets together.
- `nmtpy translate`:
  - `.nodbl` and `.nounk` suffixes are now added to output files for the `--avoid-double` and `--avoid-unk` arguments respectively.
  - A reasonably model-agnostic `beam_search()` is now separated out into its own file, `nmtpytorch/search.py` (a toy sketch of such an interface follows this list).
  - `max_len` default is increased to 200.
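As a toy illustration of what a model-agnostic interface can look like (the real `nmtpytorch/search.py` is batched and runs on GPU; everything below, including `step_fn`, is hypothetical):

```python
def beam_search(step_fn, bos, eos, beam_size=5, max_len=200):
    """step_fn(prefix) -> iterable of (token, logprob) continuations."""
    beams = [([bos], 0.0)]  # (token sequence, cumulative logprob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:  # every surviving hypothesis ended with eos
            break
    return max(finished + beams, key=lambda c: c[1])
```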
- New experimental `Multi30kDataset` and `ImageFolderDataset` classes.
- `torchvision` dependency added for CNN support.
- `nmtpy-coco-metrics` now computes one METEOR without `norm=True`.
- Mainloop mechanism is completely refactored, with backward-incompatible configuration option changes for the `[train]` section:
  - `patience_delta` option is removed.
  - Added `eval_batch_size` to define the batch size for GPU beam-search during training.
  - `eval_freq` default is now `3000`, meaning every `3000` minibatches.
  - `eval_metrics` now defaults to `loss`. As before, you can provide a list of metrics like `bleu,meteor,loss` to compute all of them and early-stop based on the first.
  - Added `eval_zero` (default: `False`) which evaluates the model on the dev set once, right before training starts. Useful as a sanity check when fine-tuning a model initialized with pre-trained weights.
  - Removed `save_best_n`: we no longer save the best `N` models on the dev set w.r.t. the early-stopping metric.
  - Added `save_best_metrics` (default: `True`) which saves the best models on the dev set w.r.t. each metric provided in `eval_metrics`. This partly remedies the removal of `save_best_n`.
  - `checkpoint_freq` now defaults to `5000`, meaning every `5000` minibatches.
  - Added `n_checkpoints` (default: `5`) to define the number of last checkpoints to keep if `checkpoint_freq > 0`, i.e. checkpointing is enabled.
- Added `ExtendedInterpolation` support to configuration files:
  - You can now define intermediate variables in `.conf` files to avoid typing the same paths again and again. A variable can be referenced from within its own section using the `tensorboard_dir: ${save_path}/tb` notation. Cross-section references are also possible: `${data:root}` will be replaced by the value of the `root` variable defined in the `[data]` section. (A self-contained demo follows this list.)
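`ExtendedInterpolation` is part of Python's standard `configparser`, so the substitution rules can be demonstrated directly; the section and variable values below are made up:

```python
from configparser import ConfigParser, ExtendedInterpolation

cfg = ConfigParser(interpolation=ExtendedInterpolation())
cfg.read_string("""
[data]
root = /data/multi30k

[train]
save_path = /models/exp1
tensorboard_dir = ${save_path}/tb
src = ${data:root}/train.en
""")

assert cfg['train']['tensorboard_dir'] == '/models/exp1/tb'  # same-section reference
assert cfg['train']['src'] == '/data/multi30k/train.en'      # cross-section reference
```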
- Added `-p/--pretrained` to `nmtpy train` to initialize the weights of the model using another checkpoint `.ckpt` file.
- Improved input/output handling for `nmtpy translate`:
  - `-s` accepts comma-separated test set names defined in the configuration file of the experiment, to translate them all at once. Example: `-s val,newstest2016,newstest2017`.
  - The mutually exclusive counterpart of `-s` is `-S`, which receives a single input file of source sentences.
  - For both cases, an output prefix should now be provided with `-o`. In the case of multiple test sets, the output prefix will be appended with the name of the test set and the beam size. If you just provide a single file with `-S`, the final output name will only reflect the beam size information.
- Two new arguments for `nmtpy-build-vocab`:
  - `-f`: also stores frequency counts inside the final `json` vocabulary.
  - `-x`: does not add the special markers `<eos>`, `<bos>`, `<unk>`, `<pad>` into the vocabulary.
- Added `Fusion()` layer to `concat`, `sum` or `mul` an arbitrary number of inputs (a hypothetical minimal sketch follows this list).
- Added experimental `ImageEncoder()` layer to seamlessly plug a VGG or ResNet CNN using `torchvision` pretrained models.
- `Attention` layer arguments improved. You can now select the bottleneck dimensionality for MLP attention with `att_bottleneck`. The `dot` attention is still untested and probably broken.
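A hypothetical minimal version of the `Fusion()` idea, just to make the three modes concrete (the real layer's API in nmtpytorch may differ):

```python
import torch
from torch import nn

class Fusion(nn.Module):
    """Merge an arbitrary number of same-shaped inputs by
    'concat', 'sum' or 'mul'."""
    def __init__(self, fusion_type: str = 'concat'):
        super().__init__()
        self.fusion_type = fusion_type

    def forward(self, *inputs):
        if self.fusion_type == 'concat':
            return torch.cat(inputs, dim=-1)
        if self.fusion_type == 'sum':
            return torch.stack(inputs).sum(dim=0)
        if self.fusion_type == 'mul':
            out = inputs[0]
            for x in inputs[1:]:
                out = out * x
            return out
        raise ValueError(f'Unknown fusion type: {self.fusion_type}')
```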
New layers/architectures:
- Added `AttentiveMNMT`, which implements modality-specific multimodal attention from the paper "Multimodal Attention for Neural Machine Translation".
- Added `ShowAttendAndTell` model.

Changes in NMT:
- `dec_init` defaults to `mean_ctx`, i.e. the decoder will be initialized with the mean context computed from the source encoder (see the sketch after this list).
- `enc_lnorm`, which was just a placeholder, is now removed since we do not provide layer normalization for now.
- Beam search is completely moved to GPU.
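For `dec_init: mean_ctx`, the initialization is conceptually a projection of the time-averaged encoder states; a sketch with assumed shapes and names:

```python
import torch
from torch import nn

def mean_ctx_init(ctx: torch.Tensor, ctx_mask: torch.Tensor,
                  w_init: nn.Linear) -> torch.Tensor:
    """ctx: encoder states (tsteps, batch, dim); ctx_mask: (tsteps, batch)
    with 1s over real tokens. Returns the decoder's initial state."""
    masked_sum = (ctx * ctx_mask.unsqueeze(-1)).sum(dim=0)  # (batch, dim)
    mean = masked_sum / ctx_mask.sum(dim=0).unsqueeze(-1)   # masked average
    return torch.tanh(w_init(mean))                         # (batch, dec_dim)
```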
nmtpytorch is developed at the Informatics Lab of Le Mans University, France.