
4. How It Works

Steven Paul Sanderson II, MPH edited this page Nov 16, 2023 · 10 revisions


The regression and classification functions work in the same way, so we will focus only on the regression function and ask: how does fast_regression() work? While it works for many models, it fails for some, which means the design is flawed, and at this point it is possible that the flaw is fundamental (which I think it is). First I will post the code, and then we will walk through what each step does.

The Function

Here is the full function:

fast_regression <- function(.data, .rec_obj, .parsnip_fns = "all",
                            .parsnip_eng = "all", .split_type = "initial_split",
                            .split_args = NULL){

  # Tidy Eval ----
  call <- list(.parsnip_fns) %>%
    purrr::flatten_chr()
  engine <- list(.parsnip_eng) %>%
    purrr::flatten_chr()

  rec_obj <- .rec_obj
  split_type <- .split_type
  split_args <- .split_args

  # Checks ----

  # Get data splits
  df <- dplyr::as_tibble(.data)
  splits_obj <- create_splits(
    .data = df,
    .split_type = split_type,
    .split_args = split_args
  )

  # Generate Model Spec Tbl
  mod_spec_tbl <- fast_regression_parsnip_spec_tbl(
    .parsnip_fns = call,
    .parsnip_eng = engine
  )

  # Generate Workflow object
  mod_tbl <- mod_spec_tbl %>%
    dplyr::mutate(
      wflw = internal_make_wflw(mod_spec_tbl, .rec_obj = rec_obj)
    )

  mod_fitted_tbl <- mod_tbl %>%
    dplyr::mutate(
      fitted_wflw = internal_make_fitted_wflw(mod_tbl, splits_obj)
    )

  mod_pred_tbl <- mod_fitted_tbl %>%
    dplyr::mutate(
      pred_wflw = internal_make_wflw_predictions(mod_fitted_tbl, splits_obj)
    )


  # Return ----
  # Tag the returned table with its class and the arguments used to build it
  class(mod_pred_tbl) <- c("fst_reg_tbl", class(mod_pred_tbl))
  attr(mod_pred_tbl, ".parsnip_engines") <- .parsnip_eng
  attr(mod_pred_tbl, ".parsnip_functions") <- .parsnip_fns
  attr(mod_pred_tbl, ".split_type") <- .split_type
  attr(mod_pred_tbl, ".split_args") <- .split_args

  return(mod_pred_tbl)
}

The function is broken into steps, each building on the output of the previous one. So what is happening at each step?
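A minimal base-R sketch of this design, assuming nothing about tidyAML's internals: lm() and glm() stand in for parsnip specifications, a random 75/25 row sample stands in for create_splits(), and each later step adds a list-column computed from the table produced by the step before, much as the dplyr::mutate() calls in the listing do. The object and column names (spec_tbl, fit_fn, fitted, preds, rmse) are hypothetical.

```r
# Hypothetical sketch of the step-by-step table design; base R only.
# lm()/glm() stand in for parsnip model specifications.
set.seed(123)

# Data split (stand-in for create_splits()): 75% train, 25% test
train_idx <- sample(nrow(mtcars), size = floor(0.75 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Model "spec" table: one row per model, with a fitting function per row
spec_tbl <- data.frame(model = c("lm", "glm_gaussian"))
spec_tbl$fit_fn <- list(
  function(d) lm(mpg ~ wt + hp, data = d),
  function(d) glm(mpg ~ wt + hp, family = gaussian(), data = d)
)

# Fit step: a new list-column built from the previous column
spec_tbl$fitted <- lapply(spec_tbl$fit_fn, function(f) f(train))

# Prediction step: a new list-column of test-set predictions
spec_tbl$preds <- lapply(spec_tbl$fitted, predict, newdata = test)

# Any later step can keep building on the same table, e.g. a test-set RMSE
spec_tbl$rmse <- vapply(
  spec_tbl$preds,
  function(p) sqrt(mean((test$mpg - p)^2)),
  numeric(1)
)
```

Keeping every intermediate result in the same table means any row can be inspected at any stage, for example spec_tbl$fitted[[1]] or spec_tbl$preds[[2]].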

What's going on?

This function, fast_regression(), generates regression model specifications with the parsnip package in R, pairs each one with the supplied recipe in a workflow, fits the workflows, and generates predictions. Let's break down the key components of the code:

  1. Function Signature:

    fast_regression <- function(.data, .rec_obj, .parsnip_fns = "all",
                                .parsnip_eng = "all", .split_type = "initial_split",
                                .split_args = NULL)

    This function takes several parameters:

    • .data: The data for the regression problem; it is converted to a tibble.
    • .rec_obj: A recipe object (from the recipes package) describing the outcome, predictors, and preprocessing.
    • .parsnip_fns: The parsnip model functions to use. Default is "all".
    • .parsnip_eng: The parsnip model engines to use. Default is "all".
    • .split_type: The type of data split to create. Default is "initial_split".
    • .split_args: Additional arguments for data splitting, passed on to create_splits(). Default is NULL.
  2. Tidy Eval and Parameter Handling:

    call <- list(.parsnip_fns) %>%
      purrr::flatten_chr()
    engine <- list(.parsnip_eng) %>%
      purrr::flatten_chr()
    
    rec_obj <- .rec_obj
    split_type <- .split_type
    split_args <- .split_args

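    Despite the "Tidy Eval" label in the code, this step does not use tidy evaluation (quoting or quasiquotation). Wrapping each parsnip argument in list() and passing it to purrr::flatten_chr() coerces it to a flat character vector, so a single string such as "all" and a vector such as c("lm", "glm") are handled the same way. The recipe, split type, and split arguments are simply copied into local variables.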

  3. Data Splitting:

    df <- dplyr::as_tibble(.data)
    splits_obj <- create_splits(
      .data = df,
      .split_type = split_type,
      .split_args = split_args
    )

    The function converts the input data to a tibble and then uses the create_splits function to generate data splits based on the specified split type and arguments.

  4. Model Specification Table Generation:

    mod_spec_tbl <- fast_regression_parsnip_spec_tbl(
      .parsnip_fns = call,
      .parsnip_eng = engine
    )

    The fast_regression_parsnip_spec_tbl function is called to generate a table of parsnip model specifications based on the specified parsnip functions and engines.

  5. Workflow and Fitted Model Generation:

    mod_tbl <- mod_spec_tbl %>%
      dplyr::mutate(
        wflw = internal_make_wflw(mod_spec_tbl, .rec_obj = rec_obj)
      )
    
    mod_fitted_tbl <- mod_tbl %>%
      dplyr::mutate(
        fitted_wflw = internal_make_fitted_wflw(mod_tbl, splits_obj)
      )

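    The function first adds a wflw column, in which internal_make_wflw() combines each model specification with the recipe into a workflow, and then a fitted_wflw column, in which internal_make_fitted_wflw() fits each of those workflows using the data splits. Each helper receives the whole table rather than a single row and returns one result per row, which dplyr::mutate() stores as a new column.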

  6. Prediction Generation:

    mod_pred_tbl <- mod_fitted_tbl %>%
      dplyr::mutate(
        pred_wflw = internal_make_wflw_predictions(mod_fitted_tbl, splits_obj)
      )

    Finally, internal_make_wflw_predictions() generates predictions from each fitted workflow using the data splits and stores them in a new pred_wflw column.

  7. Return:

    return(mod_pred_tbl)

    The function returns mod_pred_tbl: the model specification table with the workflows, fitted workflows, and predictions added as columns, one row per model specification.
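The bookkeeping at both ends of the function can also be sketched in base R. This is a hypothetical illustration, not tidyAML's code: as_chr_vec() uses unlist() in place of purrr::flatten_chr() to normalise the parsnip arguments (step 2), and tag_result() attaches the fst_reg_tbl class and the argument attributes from the "# Return" section of the listing to the table it returns (step 7).

```r
# Hypothetical helpers; base R stand-ins, not tidyAML functions.

# Step 2 analogue: turn a single string or a character vector into a flat
# character vector (unlist() stands in for purrr::flatten_chr())
as_chr_vec <- function(x) unlist(list(x), use.names = FALSE)

# Step 7 analogue: tag the table that is returned with a class and the
# arguments used to build it
tag_result <- function(tbl, parsnip_fns, parsnip_eng, split_type, split_args) {
  class(tbl) <- c("fst_reg_tbl", class(tbl))
  attr(tbl, ".parsnip_engines") <- parsnip_eng
  attr(tbl, ".parsnip_functions") <- parsnip_fns
  attr(tbl, ".split_type") <- split_type
  attr(tbl, ".split_args") <- split_args  # assigning NULL sets no attribute
  tbl
}

fns <- as_chr_vec("linear_reg")
eng <- as_chr_vec(c("lm", "glm"))

pred_tbl <- data.frame(model = c("lm", "glm"))
out <- tag_result(pred_tbl, fns, eng, "initial_split", NULL)

class(out)
# [1] "fst_reg_tbl" "data.frame"
```

Because R copies on modify, tag_result() tags its own copy of the table and returns it, so the caller must use the result; this keeps the class and attributes on the object that is actually handed back.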
