Commit 7dc515c

dengdifan, NHML23117 and Deng Difan authored
Time series forecasting (#434)
* new target scaler, allow NoNorm for MLP Encoder
* allow sampling full sequences
* integrate SeqBuilder to SequenceCollector
* restore SequenceBuilder to reduce memory usage
* move scaler to network
* lag sequence
* merge encoder and decoder as a single pipeline
* faster lag_seq builder
* maint
* new init, faster DeepAR inference in trainer
* more loss types
* maint
* new Transformer models, allow RNN to do deepAR inference
* maint
* maint
* maint
* maint
* reduced search space for Transformer
* reduced init design
* maint
* maint
* maint
* maint
* faster forecasting
* maint
* allow single fidelity
* maint
* fix budget num_seq
* faster sampler and lagger
* maint
* maint
* maint deepAR
* maint
* maint
* cross validation
* allow holdout for smaller datasets
* smac4ac to smac4hpo
* maint
* maint
* allow to change decoder search space
* more resampling strategy, more options for MLP
* reduced NBEATS
* subsampler for val loader
* rng for dataloader sampler
* maint
* remove generator as it cannot be pickled
* allow lower fidelity to evaluate less test instances
* fix dummy forecaster issues
* maint
* add gluonts as requirement
* more data for val set for larger dataset
* maint
* maint
* fix nbeats decoder
* new dataset interface
* resolve conflict
* maint
* allow encoder to receive input from different sources
* multi blocks hp design
* maint
* correct hp updates
* first trial on nested conjunction
* maint
* fit for deep AR model (needs to be reverted when the issue in ConfigSpace is fixed!!!)
* adjust backbones to fit new structure
* further API changes
* tft temporal fusion decoder
* construct network
* cells for networks
* forecasting backbones
* maint
* maint
* move tft layer to backbone
* maint
* quantile loss
* maint
* maint
* maint
* maint
* maint
* maint
* forecasting init configs
* add forbidden
* maint
* maint
* maint
* remove shift data
* maint
* maint
* copy dataset_properties for each refit iteration
* maint and new init
* Tft forecasting with features (#6)
* time feature transform
* tft with time-varying features
* transform features allowed for all architecture
* repair mask for temporal fusion layer
* maint
* fix loss computation in QuantileLoss
* fixed scaler computation
* maint
* fix dataset
* adjust window_size to seasonality
* maint scaling
* fix incorrect Seq2Seq scaling
* fix sampling for seq2seq
* maint
* fix scaling in NBEATS
* move time feature computation to dataset
* maint
* fix feature computation
* maint
* multi-variant feature validator
* maint
* validator for multi-variant series
* feature validator
* multi-variant datasets
* observed targets
* structure adjustment
* refactor ts tasks and preprocessing
* allow nan in targets
* preprocessing for time series
* maint
* forecasting pipeline
* maint
* embedding and maint
* move targets to the tail of the features
* maint
* static features
* adjust scaler to static features
* remove static features from forward dict
* test transform
* maint
* test sets
* adjust dataset to allow future known features
* maint
* maint
* flake8
* synchronise with development
* recover timeseries
* maint
* maint
* limit memory usage tae
* revert test api
* test for targets
* not allow sparse forecasting target
* test for data validator
* test for validations
* test on TimeSeriesSequence
* maint
* test for resampling
* test for dataset 1
* test for datasets
* test on tae
* maint
* all evaluator to evaluate test sets
* tests on losses
* test for metrics
* forecasting preprocessing
* maint
* finish test for preprocessing
* test for data loader
* tests for dataloader
* maint
* test for target scaling 1
* test for target scaler
* test for training loss
* maint
* test for network backbone
* test for backbone base
* test for flat encoder
* test for seq encoder
* test for seqencoder
* maint
* test for recurrent decoders
* test for network
* maint
* test for architecture
* test for pipelines
* fixed sampler
* maint sampler
* resolve conflict between embedding and net encoder
* fix scaling
* allow transform for test dataloader
* maint dataloader
* fix updates
* fix dataset
* tests on api, initial design on multi-variant
* maint
* fix dataloader
* move test with for loop to unittest.subtest
* flake 8 and update requirement
* mypy
* validator for pd dataframe
* allow series idx for api
* maint
* examples for forecasting
* fix mypy
* properly memory limitation for forecasting example
* fix pre-commit
* maint dataloader
* remove unused auto-regressive arguments
* fix pre-commit
* maint
* maint mypy
* mypy!!!
* pre-commit
* mypyyyyyyyyyyyyyyyyyyyyyyyy
* maint
* move forecasting requirements to extras_require
* bring eval_test to tae
* make rh2epm consistent with SMAC4HPO
* remove smac4ac from smbo
* revert changes in network
* revert changes in trainer
* revert format changes
* move constant_forecasting to constant
* additional annotate for base pipeline
* move forecasting check to tae
* maint time series refit dataset
* fix test
* workflow for extra requirements
* docs for time series dataset
* fix pre-commit
* docs for dataset
* maint docstring
* merge target scaler to one file
* fix forecasting init cfgs
* remove redundant pipeline configs
* maint
* SMAC4HPO instead of SMAC4AC in smbo (will be reverted further if study shows that SMAC4HPO is superior to SMAC4AC)
* fixed docstring for RNN and Transformer Decoder
* uniformed docstrings for smbo and base task
* correct encoder to decoder in decoder.init
* fix doc strings
* add license and docstrings for NBEATS heads
* allow memory limit to be None
* relax test load for forecasting
* fix docs
* fix pre-commit
* make test compatible with py37
* maint docstring
* split forecasting_eval_train_function from eval_train_function
* fix namespace for test_api from train_evaluator to tae
* maint test api for forecasting
* decrease number of ensemble size of test_time_series_forecasting to reduce test time
* flatten all the prediction for forecasting pipelines
* pre-commit fix
* fix docstrings and typing
* maint time series dataset docstrings
* maint warning message in time_series_forecasting_train_evaluator
* fix lines that are overlength

Co-authored-by: NHML23117 <nhmldeng@login03.css.lan>
Co-authored-by: Deng Difan <deng@p200300cd070f1f50dabbc1fffe9c6aa9.dip0.t-ipconnect.de>
1 parent 71c2665 commit 7dc515c

144 files changed: +20795 additions, -633 deletions

.github/workflows/docs.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -33,7 +33,7 @@ jobs:
 
     - name: Install dependencies
       run: |
-        pip install -e .[docs,examples]
+        pip install -e .[docs,examples,forecasting]
 
     - name: Make docs
       run: |
```

.github/workflows/long_regression_test.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -30,7 +30,7 @@ jobs:
     - name: Install test dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install -e .[test]
+        pip install -e .[forecasting,test]
 
     - name: Run tests
       run: |
```

.github/workflows/pytest.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -89,7 +89,7 @@ jobs:
       run: |
         git submodule update --init --recursive
         python -m pip install --upgrade pip
-        pip install -e .[test]
+        pip install -e .[forecasting,test]
 
     - name: Dist install
       if: matrix.kind == 'dist'
@@ -98,7 +98,7 @@ jobs:
 
         python setup.py sdist
         last_dist=$(ls -t dist/autoPyTorch-*.tar.gz | head -n 1)
-        pip install $last_dist[test]
+        pip install $last_dist[forecasting,test]
 
     - name: Store repository status
       id: status-before
```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -165,4 +165,4 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyT
 
 ## Contact
 
-Auto-PyTorch is developed by the [AutoML Group of the University of Freiburg](http://www.automl.org/).
+Auto-PyTorch is developed by the [AutoML Groups of the University of Freiburg and Hannover](http://www.automl.org/).
```

autoPyTorch/api/base_task.py

Lines changed: 55 additions & 24 deletions
```diff
@@ -34,9 +34,12 @@
 from autoPyTorch import metrics
 from autoPyTorch.automl_common.common.utils.backend import Backend, create
 from autoPyTorch.constants import (
+    FORECASTING_BUDGET_TYPE,
+    FORECASTING_TASKS,
     REGRESSION_TASKS,
     STRING_TO_OUTPUT_TYPES,
     STRING_TO_TASK_TYPES,
+    TIMESERIES_FORECASTING,
 )
 from autoPyTorch.data.base_validator import BaseInputValidator
 from autoPyTorch.data.utils import DatasetCompressionSpec
```
```diff
@@ -77,7 +80,8 @@ def _pipeline_predict(pipeline: BasePipeline,
                       X: Union[np.ndarray, pd.DataFrame],
                       batch_size: int,
                       logger: PicklableClientLogger,
-                      task: int) -> np.ndarray:
+                      task: int,
+                      task_type: str = "") -> np.ndarray:
     @typing.no_type_check
     def send_warnings_to_log(
             message, category, filename, lineno, file=None, line=None):
```
```diff
@@ -87,7 +91,7 @@ def send_warnings_to_log(
     X_ = X.copy()
     with warnings.catch_warnings():
         warnings.showwarning = send_warnings_to_log
-        if task in REGRESSION_TASKS:
+        if task in REGRESSION_TASKS or task in FORECASTING_TASKS:
             # Voting regressor does not support batch size
             prediction = pipeline.predict(X_)
         else:
@@ -101,13 +105,13 @@ def send_warnings_to_log(
                         prediction,
                         np.sum(prediction, axis=1)
                     ))
-
-    if len(prediction.shape) < 1 or len(X_.shape) < 1 or \
-            X_.shape[0] < 1 or prediction.shape[0] != X_.shape[0]:
-        logger.warning(
-            "Prediction shape for model %s is %s while X_.shape is %s",
-            pipeline, str(prediction.shape), str(X_.shape)
-        )
+    if STRING_TO_TASK_TYPES.get(task_type, -1) != TIMESERIES_FORECASTING:
+        if len(prediction.shape) < 1 or len(X_.shape) < 1 or \
+                X_.shape[0] < 1 or prediction.shape[0] != X_.shape[0]:
+            logger.warning(
+                "Prediction shape for model %s is %s while X_.shape is %s",
+                pipeline, str(prediction.shape), str(X_.shape)
+            )
     return prediction
 
 
```
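The reworked shape check in `_pipeline_predict` can be tried out in isolation. The sketch below assumes, for illustration only, that a forecasting pipeline returns flattened predictions with one row per series and forecast step, while `X` holds one row per series, so the old row-count comparison would warn on every forecasting call. `TASK_IDS`, its integer values and `shape_mismatch` are hypothetical stand-ins, not autoPyTorch's `STRING_TO_TASK_TYPES` or its code. Note how the `task_type=""` default maps to `-1` through `dict.get`, so callers that do not pass a task type keep the original check.

```python
import numpy as np

# Hypothetical stand-in for STRING_TO_TASK_TYPES; real ids may differ.
TASK_IDS = {"tabular_regression": 1, "time_series_forecasting": 7}
TIMESERIES_FORECASTING = TASK_IDS["time_series_forecasting"]


def shape_mismatch(prediction: np.ndarray, X: np.ndarray, task_type: str = "") -> bool:
    """Return True when the sanity check would log a shape warning."""
    # Unknown or empty task types map to -1, so the check still runs for them.
    if TASK_IDS.get(task_type, -1) == TIMESERIES_FORECASTING:
        return False  # forecasting output is not row-aligned with X
    return (len(prediction.shape) < 1 or len(X.shape) < 1
            or X.shape[0] < 1 or prediction.shape[0] != X.shape[0])


# Three series with a forecast horizon of 4 give 12 flattened predictions.
X = np.zeros((3, 5))
flat_forecast = np.zeros(3 * 4)
print(shape_mismatch(flat_forecast, X, "tabular_regression"))       # True
print(shape_mismatch(flat_forecast, X, "time_series_forecasting"))  # False
print(shape_mismatch(np.zeros(3), X))                               # False
```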

```diff
@@ -218,6 +222,8 @@ def __init__(
         self.search_space: Optional[ConfigurationSpace] = None
         self._dataset_requirements: Optional[List[FitRequirement]] = None
         self._metric: Optional[autoPyTorchMetric] = None
+        self._metrics_kwargs: Dict = {}
+
         self._scoring_functions: Optional[List[autoPyTorchMetric]] = None
         self._logger: Optional[PicklableClientLogger] = None
         self.dataset_name: Optional[str] = None
```
```diff
@@ -737,7 +743,7 @@ def _do_dummy_prediction(self) -> None:
             stats=stats,
             memory_limit=memory_limit,
             disable_file_output=self._disable_file_output,
-            all_supported_metrics=self._all_supported_metrics
+            all_supported_metrics=self._all_supported_metrics,
         )
 
         status, _, _, additional_info = ta.run(num_run, cutoff=self._time_for_task)
```
```diff
@@ -822,7 +828,7 @@ def _do_traditional_prediction(self, time_left: int, func_eval_time_limit_secs:
                     stats=stats,
                     memory_limit=memory_limit,
                     disable_file_output=self._disable_file_output,
-                    all_supported_metrics=self._all_supported_metrics
+                    all_supported_metrics=self._all_supported_metrics,
                 )
                 dask_futures.append([
                     classifier,
```
```diff
@@ -906,8 +912,8 @@ def _search(
         optimize_metric: str,
         dataset: BaseDataset,
         budget_type: str = 'epochs',
-        min_budget: int = 5,
-        max_budget: int = 50,
+        min_budget: Union[int, float] = 5,
+        max_budget: Union[int, float] = 50,
         total_walltime_limit: int = 100,
         func_eval_time_limit_secs: Optional[int] = None,
         enable_traditional_pipeline: bool = True,
```
```diff
@@ -920,7 +926,8 @@ def _search(
         disable_file_output: Optional[List[Union[str, DisableFileOutputParameters]]] = None,
         load_models: bool = True,
         portfolio_selection: Optional[str] = None,
-        dask_client: Optional[dask.distributed.Client] = None
+        dask_client: Optional[dask.distributed.Client] = None,
+        **kwargs: Any
     ) -> 'BaseTask':
         """
         Search for the best pipeline configuration for the given dataset.
```
```diff
@@ -1048,7 +1055,14 @@ def _search(
                 Additionally, the keyword 'greedy' is supported,
                 which would use the default portfolio from
                 `AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`_
-
+            kwargs: Any
+                additional arguments that are customed by some specific task.
+                For instance, forecasting tasks require:
+                    min_num_test_instances (int): minimal number of instances used to initialize a proxy validation set
+                    suggested_init_models (List[str]): A set of initial models suggested by the users. Their
+                        hyperparameters are determined by the default configurations
+                    custom_init_setting_path (str): The path to the initial hyperparameter configurations set by
+                        the users
         Returns:
             self
 
```
```diff
@@ -1110,7 +1124,10 @@ def _search(
         self.search_space = self.get_search_space(dataset)
 
         # Incorporate budget to pipeline config
-        if budget_type not in ('epochs', 'runtime'):
+        if budget_type not in ('epochs', 'runtime') and (
+                budget_type in FORECASTING_BUDGET_TYPE
+                and STRING_TO_TASK_TYPES[self.task_type] != TIMESERIES_FORECASTING
+        ):
             raise ValueError("Budget type must be one ('epochs', 'runtime')"
                              f" yet {budget_type} was provided")
         self.pipeline_options['budget_type'] = budget_type
```
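The new budget-type guard is easiest to reason about as a plain boolean expression. The sketch below reproduces its structure with a placeholder `FORECASTING_BUDGET_TYPE` tuple (an assumption: the real tuple is defined in `autoPyTorch.constants` and is not part of this diff) and a simple flag in place of the task-type lookup. Evaluated this way, a forecasting-only budget type is rejected for non-forecasting tasks as intended, while a budget type that appears in neither tuple, such as `"foo"`, no longer triggers the `ValueError` at this point.

```python
from typing import Tuple

# Placeholder values; the real tuple lives in autoPyTorch.constants.
FORECASTING_BUDGET_TYPE: Tuple[str, ...] = ("resolution", "num_seq")


def rejects_budget(budget_type: str, is_forecasting_task: bool) -> bool:
    """Evaluate the guard from `_search`; True means ValueError is raised."""
    return budget_type not in ("epochs", "runtime") and (
        budget_type in FORECASTING_BUDGET_TYPE
        and not is_forecasting_task
    )


for budget, forecasting in [("epochs", False), ("num_seq", True),
                            ("num_seq", False), ("foo", False)]:
    print(budget, forecasting, rejects_budget(budget, forecasting))
# epochs False False
# num_seq True False
# num_seq False True
# foo False False
```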
```diff
@@ -1216,6 +1233,7 @@ def _search(
                 precision=precision,
                 logger_port=self._logger_port,
                 pynisher_context=self._multiprocessing_context,
+                metrics_kwargs=self._metrics_kwargs,
             )
             self._stopwatch.stop_task(ensemble_task_name)
 
```
```diff
@@ -1229,7 +1247,6 @@ def _search(
         if time_left_for_smac <= 0:
             self._logger.warning(" Not starting SMAC because there is no time left")
         else:
-
             _proc_smac = AutoMLSMBO(
                 config_space=self.search_space,
                 dataset_name=str(dataset.dataset_name),
```
```diff
@@ -1259,6 +1276,8 @@ def _search(
                 search_space_updates=self.search_space_updates,
                 portfolio_selection=portfolio_selection,
                 pynisher_context=self._multiprocessing_context,
+                task_type=self.task_type,
+                **kwargs,
             )
             try:
                 run_history, self._results_manager.trajectory, budget_type = \
```
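According to the `_search` docstring, task-specific options such as `min_num_test_instances`, `suggested_init_models` and `custom_init_setting_path` arrive through `**kwargs`, and this hunk forwards them unchanged to `AutoMLSMBO` alongside `task_type`. The sketch below shows that pass-through pattern with a hypothetical `StubSMBO` whose signature is an assumption, not the real `AutoMLSMBO` constructor; the model name `"MLP"` is illustrative only. One consequence of the design: a misspelled option is not rejected by `_search` itself but surfaces as a `TypeError` when the optimizer is constructed, assuming the receiving class does not accept `**kwargs` of its own.

```python
from typing import Any, Dict, List, Optional


class StubSMBO:
    """Hypothetical stand-in for AutoMLSMBO that records what it receives."""

    def __init__(self, task_type: str,
                 min_num_test_instances: Optional[int] = None,
                 suggested_init_models: Optional[List[str]] = None,
                 custom_init_setting_path: Optional[str] = None) -> None:
        self.received: Dict[str, Any] = {
            "task_type": task_type,
            "min_num_test_instances": min_num_test_instances,
            "suggested_init_models": suggested_init_models,
            "custom_init_setting_path": custom_init_setting_path,
        }


def search(task_type: str, budget_type: str = "epochs", **kwargs: Any) -> StubSMBO:
    # Task-specific options are not inspected here; like `**kwargs` in
    # `_search`, they are handed straight to the optimizer.
    return StubSMBO(task_type=task_type, **kwargs)


smbo = search("time_series_forecasting", min_num_test_instances=100,
              suggested_init_models=["MLP"])
print(smbo.received["min_num_test_instances"])  # 100
```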
```diff
@@ -1323,19 +1342,30 @@ def _get_fit_dictionary(
         dataset: BaseDataset,
         split_id: int = 0
     ) -> Dict[str, Any]:
-        X_test = dataset.test_tensors[0].copy() if dataset.test_tensors is not None else None
-        y_test = dataset.test_tensors[1].copy() if dataset.test_tensors is not None else None
+        if dataset.test_tensors is not None:
+            X_test = dataset.test_tensors[0].copy() if dataset.test_tensors[0] is not None else None
+            y_test = dataset.test_tensors[1].copy() if dataset.test_tensors[1] is not None else None
+        else:
+            X_test = None
+            y_test = None
+
+        X_train = dataset.train_tensors[0].copy() if dataset.train_tensors[0] is not None else None
+        y_train = dataset.train_tensors[1].copy()
         X: Dict[str, Any] = dict({'dataset_properties': dataset_properties,
                                   'backend': self._backend,
-                                  'X_train': dataset.train_tensors[0].copy(),
-                                  'y_train': dataset.train_tensors[1].copy(),
+                                  'X_train': X_train,
+                                  'y_train': y_train,
                                   'X_test': X_test,
                                   'y_test': y_test,
                                   'train_indices': dataset.splits[split_id][0],
                                   'val_indices': dataset.splits[split_id][1],
                                   'split_id': split_id,
                                   'num_run': self._backend.get_next_num_run(),
                                   })
+        if STRING_TO_TASK_TYPES[self.task_type] == TIMESERIES_FORECASTING:
+            warnings.warn("Currently Time Series Forecasting tasks do not allow computing metrics "
+                          "during training. It will be automatically set as False")
+            self.pipeline_options["metrics_during_training"] = False
         X.update(self.pipeline_options)
         return X
 
```
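The reworked `_get_fit_dictionary` copies each tensor only when it exists, since a forecasting dataset may have no feature matrix at all, and it switches `metrics_during_training` off for forecasting. The sketch below is a hypothetical, trimmed-down analogue (`copy_or_none` and `build_fit_dict` are not autoPyTorch functions) that reproduces both behaviors with NumPy and the standard `warnings` module. Like the diff, it mutates the caller's options dict in place, so the override persists for later calls rather than applying only to the returned dictionary.

```python
import warnings
from typing import Any, Dict, Optional, Tuple

import numpy as np


def copy_or_none(array: Optional[np.ndarray]) -> Optional[np.ndarray]:
    """Return a copy of `array`, or None when there is nothing to copy."""
    return array.copy() if array is not None else None


def build_fit_dict(train: Tuple[Optional[np.ndarray], np.ndarray],
                   test: Optional[Tuple[Optional[np.ndarray], Optional[np.ndarray]]],
                   pipeline_options: Dict[str, Any],
                   is_forecasting: bool) -> Dict[str, Any]:
    """Hypothetical analogue of `_get_fit_dictionary` (tensors only)."""
    if test is not None:
        X_test, y_test = copy_or_none(test[0]), copy_or_none(test[1])
    else:
        X_test, y_test = None, None
    fit_dict: Dict[str, Any] = {
        "X_train": copy_or_none(train[0]),  # may be None without features
        "y_train": train[1].copy(),         # targets are always required
        "X_test": X_test,
        "y_test": y_test,
    }
    if is_forecasting:
        warnings.warn("metrics_during_training is not supported for "
                      "forecasting and is set to False")
        pipeline_options["metrics_during_training"] = False  # in-place, as in the diff
    fit_dict.update(pipeline_options)
    return fit_dict


options = {"metrics_during_training": True}
y = np.arange(10.0)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fit_dict = build_fit_dict((None, y), None, options, is_forecasting=True)
print(fit_dict["X_train"], fit_dict["metrics_during_training"], len(caught))
# None False 1
```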

```diff
@@ -1398,7 +1428,7 @@ def refit(
             # could alleviate the problem in algorithms that depend on
             # the ordering of the data.
             X = self._get_fit_dictionary(
-                dataset_properties=dataset_properties,
+                dataset_properties=copy.copy(dataset_properties),
                 dataset=dataset,
                 split_id=split_id)
             fit_and_suppress_warnings(self._logger, model, X, y=None)
```
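The commit message lists "copy dataset_properties for each refit iteration", and this hunk does it with `copy.copy`. The standard-library sketch below shows what a shallow copy isolates and what it does not; the keys are hypothetical. Presumably the intent is that top-level keys one refit adds or overwrites do not leak into the next iteration. Anything mutated inside a nested container is still shared with the original, which would require `copy.deepcopy`.

```python
import copy

# Hypothetical dataset_properties; the keys are illustrative only.
dataset_properties = {"numerical_columns": [0, 1], "n_prediction_steps": 4}

props = copy.copy(dataset_properties)

# Top-level keys written to the copy do not reach the original ...
props["n_prediction_steps"] = 8
props["added_by_fit"] = True
print(dataset_properties)  # {'numerical_columns': [0, 1], 'n_prediction_steps': 4}

# ... but nested containers are still shared by reference.
props["numerical_columns"].append(2)
print(dataset_properties["numerical_columns"])  # [0, 1, 2]
```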
```diff
@@ -1630,7 +1660,7 @@ def fit_pipeline(
             exclude=exclude_components,
             search_space_updates=search_space_updates,
             pipeline_config=pipeline_options,
-            pynisher_context=self._multiprocessing_context
+            pynisher_context=self._multiprocessing_context,
         )
 
         run_info, run_value = tae.run_wrapper(
```
```diff
@@ -1722,7 +1752,8 @@ def predict(
 
         all_predictions = joblib.Parallel(n_jobs=n_jobs)(
             joblib.delayed(_pipeline_predict)(
-                models[identifier], X_test, batch_size, self._logger, STRING_TO_TASK_TYPES[self.task_type]
+                models[identifier], X_test, batch_size, self._logger, STRING_TO_TASK_TYPES[self.task_type],
+                self.task_type
             )
             for identifier in self.ensemble_.get_selected_model_identifiers()
         )
```
