KeyError when running model.predict(X_test) in 02-1-train-and-test-sherlock.ipynb #48

@KentonParton

Hello!

I am trying to use the pre-built 'sherlock' model to make predictions. As the README suggests, I ran some of the cells in 02-1-train-and-test-sherlock.ipynb, but I get a KeyError when model.predict(X_test) runs.

Code to Reproduce:

model_id = 'sherlock'

from ast import literal_eval
from collections import Counter
from datetime import datetime

import numpy as np
import pandas as pd

from sklearn.metrics import f1_score, classification_report

from sherlock.deploy.model import SherlockModel

start = datetime.now()
print(f'Started at {start}')

X_test = pd.read_parquet('../data/processed/X_test.parquet')
y_test = pd.read_parquet('../data/raw/test_labels.parquet').values.flatten()

y_test = np.array([x.lower() for x in y_test])

print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')

start = datetime.now()
print(f'Started at {start}')

model = SherlockModel();
model.initialize_model_from_json(with_weights=True, model_id="sherlock");

print('Initialized model.')
print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')

predicted_labels = model.predict(X_test)
predicted_labels = np.array([x.lower() for x in predicted_labels])

Running model.predict(X_test) raises the following KeyError:

KeyError                                  Traceback (most recent call last)
/var/folders/66/cbb21km104n7d7t9qf61q8rmrsjdc8/T/ipykernel_21846/2316637303.py in <module>
----> 1 predicted_labels = model.predict(X_test)
      2 predicted_labels = np.array([x.lower() for x in predicted_labels])

~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict(self, X, model_id)
    118         Array with predictions for X.
    119         """
--> 120         y_pred = self.predict_proba(X, model_id)
    121         y_pred_classes = helpers._proba_to_classes(y_pred, model_id)
    122 

~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict_proba(self, X, model_id)
    141         y_pred = self.model.predict(
    142             [
--> 143                 X[feature_cols_dict["char"]].values,
    144                 X[feature_cols_dict["word"]].values,
    145                 X[feature_cols_dict["par"]].values,

KeyError: "['n_[^]-agg-sum', 'n_[^]-agg-max', 'n_[\\\\]-agg-kurtosis', 'n_[^]-agg-var', 'n_[\\\\]-agg-median', 'n_[^]-agg-kurtosis', 'n_[\\\\]-agg-mean', 'n_[\\\\]-agg-all', 'n_[^]-agg-min', 'n_[\\\\]-agg-sum', 'n_[^]-agg-median', 'n_[^]-agg-mean', 'n_[^]-agg-all', 'n_[\\\\]-agg-min', 'n_[\\\\]-agg-max', 'n_[^]-agg-any', 'n_[\\\\]-agg-var', 'n_[\\\\]-agg-any', 'n_[^]-agg-skewness', 'n_[\\\\]-agg-skewness'] not in index"
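The error says that pandas could not find a set of expected feature columns in X_test. One way to narrow this down is to compare a list of expected names against the columns that are actually present. The following is only a minimal sketch using plain pandas. The helper missing_columns, the frame X_demo, and the list expected_demo are hypothetical stand-ins for illustration and are not part of the Sherlock code base.

```python
import pandas as pd


def missing_columns(df, expected):
    """Return the expected column names that are absent from df, in order."""
    present = set(df.columns)
    return [col for col in expected if col not in present]


# Hypothetical stand-in for X_test: it lacks the '[^]' and '[\]' features.
X_demo = pd.DataFrame({"n_[a]-agg-sum": [1.0], "n_[b]-agg-sum": [2.0]})

# Hypothetical list of feature names that a model might request.
expected_demo = ["n_[a]-agg-sum", "n_[^]-agg-sum", "n_[\\]-agg-sum"]

print(missing_columns(X_demo, expected_demo))
```

With the real data, the same helper could be called with X_test and the column names listed in the KeyError to confirm whether they are missing entirely or stored under slightly different names.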

Is there something that I am missing or need to do prior to running the above code?

Appreciate the help!
