|
94 | 94 | "cell_type": "markdown", |
95 | 95 | "metadata": {}, |
96 | 96 | "source": [ |
97 | | - "First let's use some built-in functions from EvalML to convert the data to a Woodwork data structure and then cast its dtypes to something we'd rather work with. Then we're going to take a look at some of the unique, non-numeric values in the features. Sure enough, `Education_Level`, `Marital_Status`, and `Income_Category` have `Unknown` as a value. This is something we'll have to remember before we get to the model training, since `Unknown` isn't an acceptable value for any of the features." |
98 | | - ] |
99 | | - }, |
100 | | - { |
101 | | - "cell_type": "code", |
102 | | - "execution_count": null, |
103 | | - "metadata": {}, |
104 | | - "outputs": [], |
105 | | - "source": [ |
106 | | - "from evalml.utils.gen_utils import _convert_to_woodwork_structure, _convert_woodwork_types_wrapper\n", |
107 | | - "data = _convert_to_woodwork_structure(data)\n", |
108 | | - "data = _convert_woodwork_types_wrapper(data.to_dataframe())" |
| 97 | + "We're going to take a look at some of the unique, non-numeric values in the features. Sure enough, `Education_Level`, `Marital_Status`, and `Income_Category` have `Unknown` as a value. This is something we'll have to remember before we get to the model training, since `Unknown` isn't an acceptable value for any of the features." |
109 | 98 | ] |
110 | 99 | }, |
111 | 100 | { |
|
183 | 172 | "outputs": [], |
184 | 173 | "source": [ |
185 | 174 | "X = data.copy()\n", |
186 | | - "data = data.drop(['Credit_Limit'], axis=1)\n", |
| 175 | + "X = X.drop(['Credit_Limit'], axis=1)\n", |
187 | 176 | "y = X.pop('Attrition_Flag')\n", |
188 | 177 | "\n", |
189 | 178 | "X['Income_Category'] = X['Income_Category'].replace({'Less than $40K':0,\n", |
|
230 | 219 | "X = preprocessing(X, y)" |
231 | 220 | ] |
232 | 221 | }, |
| 222 | + { |
| 223 | + "cell_type": "markdown", |
| 224 | + "metadata": {}, |
| 225 | + "source": [ |
| 226 | + "Using `infer_feature_types`, we can convert our dataset into a [Woodwork](https://github.com/alteryx/woodwork) data structure and even [specify what types](https://evalml.alteryx.com/en/stable/user_guide/automl.html) certain features should be. For example, we want to cast `Income_Category` as a categorical type rather than the natural language type it was inferred as." |
| 227 | + ] |
| 228 | + }, |
| 229 | + { |
| 230 | + "cell_type": "code", |
| 231 | + "execution_count": null, |
| 232 | + "metadata": {}, |
| 233 | + "outputs": [], |
| 234 | + "source": [ |
| 235 | + "from evalml.utils.gen_utils import infer_feature_types\n", |
| 236 | + "X = infer_feature_types(X, feature_types={'Income_Category': 'categorical',\n", |
| 237 | + " 'Education_Level': 'categorical'})\n", |
| 238 | + "X" |
| 239 | + ] |
| 240 | + }, |
233 | 241 | { |
234 | 242 | "cell_type": "markdown", |
235 | 243 | "metadata": {}, |
|