
DSSM tutorial cannot run successfully #158

@surzia

Description


Describe the bug

I cloned the repo and installed it with:
git clone https://github.com/NTMC-Community/MatchZoo-py.git
cd MatchZoo-py
python setup.py install
Then I changed into the tutorials directory:
cd tutorials/ranking
To run the .ipynb files from the shell, I installed runipy:
pip install runipy
But when I ran dssm.ipynb with
runipy dssm.ipynb
it failed with the errors below.

To Reproduce

02/07/2021 05:40:49 PM INFO: Reading notebook dssm.ipynb
02/07/2021 05:40:50 PM INFO: Running cell:
%run init.ipynb

02/07/2021 05:40:53 PM INFO: Cell returned
02/07/2021 05:40:53 PM INFO: Running cell:
ranking_task = mz.tasks.Ranking(losses=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.NormalizedDiscountedCumulativeGain(k=5),
    mz.metrics.MeanAveragePrecision()
]

02/07/2021 05:40:53 PM INFO: Cell returned
02/07/2021 05:40:53 PM INFO: Running cell:
preprocessor = mz.models.DSSM.get_default_preprocessor(ngram_size=3)
train_pack_processed = preprocessor.fit_transform(train_pack_raw)
valid_pack_processed = preprocessor.transform(dev_pack_raw)
test_pack_processed = preprocessor.transform(test_pack_raw)

02/07/2021 05:41:12 PM INFO: Cell returned
02/07/2021 05:41:13 PM INFO: Running cell:
preprocessor.context

02/07/2021 05:41:13 PM INFO: Cell returned
02/07/2021 05:41:13 PM INFO: Running cell:
triletter_callback = mz.dataloader.callbacks.Ngram(
    preprocessor, mode='aggregate')

trainset = mz.dataloader.Dataset(
    data_pack=train_pack_processed,
    mode='pair',
    num_dup=1,
    num_neg=4,
    callbacks=[triletter_callback]
)
testset = mz.dataloader.Dataset(
    data_pack=test_pack_processed,
    callbacks=[triletter_callback]
)

02/07/2021 05:41:22 PM INFO: Cell returned
02/07/2021 05:41:22 PM INFO: Running cell:
padding_callback = mz.models.DSSM.get_default_padding_callback()

trainloader = mz.dataloader.DataLoader(
    dataset=trainset,
    batch_size=32,
    stage='train',
    resample=True,
    callback=padding_callback
)
testloader = mz.dataloader.DataLoader(
    dataset=testset,
    batch_size=32,
    stage='dev',
    callback=padding_callback
)

02/07/2021 05:41:22 PM INFO: Cell raised uncaught exception:

TypeError Traceback (most recent call last)
in
6 stage='train',
7 resample=True,
----> 8 callback=padding_callback
9 )
10 testloader = mz.dataloader.DataLoader(

TypeError: __init__() got an unexpected keyword argument 'batch_size'
02/07/2021 05:41:22 PM INFO: Shutdown kernel
02/07/2021 05:41:22 PM WARNING: Exiting with nonzero exit status
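One way to confirm which keyword arguments from the failing cell the installed DataLoader rejects is to compare them against the constructor's signature with Python's inspect module. The sketch below is a generic helper; unexpected_kwargs and HypotheticalDataLoader are hypothetical names, not part of MatchZoo. The stand-in class only assumes, for illustration, a constructor without batch_size or resample. It does not claim to reproduce the real 1.1.1 signature.

```python
import inspect
from typing import Any, Callable, Dict, List


def unexpected_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> List[str]:
    """Return the keyword names that `func` would reject with a TypeError."""
    sig = inspect.signature(func)
    params = sig.parameters.values()
    # A **kwargs parameter accepts any keyword, so nothing is "unexpected".
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return []
    # Only these parameter kinds can be passed by keyword.
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return [name for name in kwargs if name not in accepted]


class HypotheticalDataLoader:
    """Hypothetical stand-in whose __init__ has no batch_size or resample."""

    def __init__(self, dataset, stage='train', callback=None):
        self.dataset = dataset
        self.stage = stage
        self.callback = callback


# The same keyword arguments the failing trainloader cell passes.
call_kwargs = dict(dataset=None, batch_size=32, stage='train',
                   resample=True, callback=None)
print(unexpected_kwargs(HypotheticalDataLoader, call_kwargs))
# prints ['batch_size', 'resample']
```

With MatchZoo installed, the same check could be run as unexpected_kwargs(mz.dataloader.DataLoader, call_kwargs). A non-empty result would indicate that the notebook was written against a different DataLoader signature than the one in the installed version.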

Describe your attempts

I haven't modified anything in this file.

Context

  • OS: Linux omnisky 4.15.0-132-generic #136-Ubuntu SMP Tue Jan 12 14:58:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Hardware: GeForce RTX 2080 Ti

In addition, running import matchzoo; matchzoo.__version__ returns 1.1.1.
