Hello!
I am trying to use the pre-built 'sherlock' model to make predictions. As suggested in the README, I ran some of the cells in 02-1-train-and-test-sherlock.ipynb, but model.predict(X_test) raises a KeyError.
Code to Reproduce:
model_id = 'sherlock'
from ast import literal_eval
from collections import Counter
from datetime import datetime
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, classification_report
from sherlock.deploy.model import SherlockModel
start = datetime.now()
print(f'Started at {start}')
X_test = pd.read_parquet('../data/processed/X_test.parquet')
y_test = pd.read_parquet('../data/raw/test_labels.parquet').values.flatten()
y_test = np.array([x.lower() for x in y_test])
print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')
start = datetime.now()
print(f'Started at {start}')
model = SherlockModel();
model.initialize_model_from_json(with_weights=True, model_id="sherlock");
print('Initialized model.')
print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')
predicted_labels = model.predict(X_test)
predicted_labels = np.array([x.lower() for x in predicted_labels])
When model.predict(X_test) is run, the following KeyError occurs:
KeyError Traceback (most recent call last)
/var/folders/66/cbb21km104n7d7t9qf61q8rmrsjdc8/T/ipykernel_21846/2316637303.py in <module>
----> 1 predicted_labels = model.predict(X_test)
2 predicted_labels = np.array([x.lower() for x in predicted_labels])
~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict(self, X, model_id)
118 Array with predictions for X.
119 """
--> 120 y_pred = self.predict_proba(X, model_id)
121 y_pred_classes = helpers._proba_to_classes(y_pred, model_id)
122
~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict_proba(self, X, model_id)
141 y_pred = self.model.predict(
142 [
--> 143 X[feature_cols_dict["char"]].values,
144 X[feature_cols_dict["word"]].values,
145 X[feature_cols_dict["par"]].values,
KeyError: "['n_[^]-agg-sum', 'n_[^]-agg-max', 'n_[\\\\]-agg-kurtosis', 'n_[^]-agg-var', 'n_[\\\\]-agg-median', 'n_[^]-agg-kurtosis', 'n_[\\\\]-agg-mean', 'n_[\\\\]-agg-all', 'n_[^]-agg-min', 'n_[\\\\]-agg-sum', 'n_[^]-agg-median', 'n_[^]-agg-mean', 'n_[^]-agg-all', 'n_[\\\\]-agg-min', 'n_[\\\\]-agg-max', 'n_[^]-agg-any', 'n_[\\\\]-agg-var', 'n_[\\\\]-agg-any', 'n_[^]-agg-skewness', 'n_[\\\\]-agg-skewness'] not in index"
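All 20 columns reported as missing are character-level aggregate features for just two characters, ^ and backslash. Is there a step I am missing, or something I need to do before running the code above?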
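In case it helps with triage, here is a minimal sketch of how the mismatch between the expected feature names and the columns actually present could be listed. Note that missing_columns is a hypothetical helper, not part of sherlock, and the two lists are toy stand-ins: with the real data, available would be X_test.columns, and expected would be whatever list of feature names the model looks up (I have not checked how that list is loaded).

```python
def missing_columns(expected, available):
    """Return the names in `expected` that are absent from `available`, keeping order."""
    available = set(available)
    return [name for name in expected if name not in available]


# Toy stand-ins (assumptions for illustration only). With the real data,
# `available` would be X_test.columns and `expected` the feature names
# that the model tries to select.
expected = ["n_[^]-agg-sum", "n_[\\]-agg-sum", "n_[a]-agg-sum"]
available = ["n_[a]-agg-sum", "n_[b]-agg-sum"]

missing = missing_columns(expected, available)
print(f"{len(missing)} expected column(s) not found: {missing}")
```

If the result on the real data contains only the ^ and backslash features, that would suggest X_test.parquet was produced with different column names for those two characters than the ones the model expects, but that is only a guess on my side.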
Appreciate the help!