|
57 | 57 | ## Deprecations
|
58 | 58 |
|
59 | 59 | * Deprecating Py2 support.
|
60 |
| - |
61 |
| -# Release 0.21.5 |
62 |
| - |
63 |
| -## Major Features and Improvements |
64 |
| - |
65 |
| -* Add `label_feature` to `StatsOptions` and enable `LiftStatsGenerator` when |
66 |
| - `label_feature` and `schema` are provided. |
67 |
| -* Add JSON serialization support for StatsOptions. |
68 |
| - |
69 |
| -## Bug Fixes and Other Changes |
70 |
| -* Only requires `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` on Python 3.5 + MacOS |
71 |
| - |
72 |
| -## Breaking Changes |
73 |
| - |
74 |
| -## Deprecations |
75 |
| - |
76 |
| -# Release 0.21.4 |
77 |
| - |
78 |
| -## Major Features and Improvements |
79 |
| - |
80 |
| -* Support visualizing feature value lift in facets visualization. |
81 |
| - |
82 |
| -## Bug Fixes and Other Changes |
83 |
| - |
84 |
| -* Fix issue writing out string feature values in LiftStatsGenerator. |
85 |
| -* Requires 'apache-beam[gcp]>=2.17,<3'. |
86 |
| -* Requires 'tensorflow-transform>=0.21.1,<0.22'. |
87 |
| -* Requires 'tfx-bsl>=0.21.3,<0.22'. |
88 |
| - |
89 |
| -## Breaking Changes |
90 |
| - |
91 |
| -## Deprecations |
92 |
| - |
93 |
| -# Release 0.21.2 |
94 |
| - |
95 |
| -## Major Features and Improvements |
96 |
| - |
97 |
| -## Bug Fixes and Other Changes |
98 |
| - |
99 |
| -* Fix facets visualization. |
100 |
| -* Optimize LiftStatsGenerator for string features. |
101 |
| -* Make `_WeightedCounter` serializable. |
102 |
| -* Add support for weighted examples in LiftStatsGenerator. |
103 |
| - |
104 |
| -## Breaking Changes |
105 |
| - |
106 |
| -## Deprecations |
107 |
| - |
108 |
| -* `tfdv.TFExampleDecoder` has been removed. This legacy decoder converts |
109 |
| - serialized `tf.Example` to a dict of numpy arrays, which is the legacy |
110 |
| - input format (prior to Apache Arrow). TFDV has stopped accepting that format |
111 |
| - since 0.14. Use `tfdv.DecodeTFExample` instead. |
112 |
| - |
113 |
| -# Release 0.21.1 |
114 |
| - |
115 |
| -## Major Features and Improvements |
116 |
| - |
117 |
| -## Bug Fixes and Other Changes |
118 |
| -* Do validation on weighted feature stats. |
119 |
| -* During schema inference, skip features which are missing common stats. This |
120 |
| - makes schema inference work when the input stats are generated from some |
121 |
| - pre-existing, unknown schema. |
122 |
| -* Fix facets visualization in Chrome >=M80. |
123 |
| - |
124 |
| -## Known Issues |
125 |
| - |
126 |
| -* Running TFDV with Apache Beam 2.18 or 2.19 does not work on Windows. If you |
127 |
| - are using TFDV on Windows, use Apache Beam 2.17. |
128 |
| - |
129 |
| -## Breaking Changes |
130 |
| - |
131 |
| -## Deprecations |
132 |
| - |
133 |
| -# Release 0.21.0 |
134 |
| - |
135 |
| -## Major Features and Improvements |
136 |
| - |
137 |
| -* Started depending on the CSV parsing / type inferring utilities provided |
138 |
| - by `tfx-bsl` (since tfx-bsl 0.15.2). This also brings performance improvements |
139 |
| - to the CSV decoder (~2x faster in decoding. Type inferring performance is not |
140 |
| - affected). |
141 |
| -* Compute bytes statistics for features of BYTES type. Avoid computing topk and |
142 |
| - uniques for such features. |
143 |
| -* Added LiftStatsGenerator which computes lift between one feature (typically a |
144 |
| - label) and all other categorical features. |
145 |
| - |
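The lift statistic mentioned in the 0.21.0 notes above can be illustrated with a small, library-free sketch. It assumes the conventional definition lift(x, y) = P(x, y) / (P(x) P(y)) and unweighted examples. The function `lift_table` and the sample rows are hypothetical and are not TFDV's implementation of `LiftStatsGenerator`.

```python
from collections import Counter


def lift_table(rows, feature, label):
    """Return {(x, y): lift} for each co-occurring (feature value, label value) pair.

    Hypothetical helper for illustration only; not part of TFDV.
    """
    n = len(rows)
    x_counts = Counter(r[feature] for r in rows)
    y_counts = Counter(r[label] for r in rows)
    xy_counts = Counter((r[feature], r[label]) for r in rows)
    # lift(x, y) = P(x, y) / (P(x) * P(y))
    #            = n * count(x, y) / (count(x) * count(y))
    return {
        (x, y): n * c / (x_counts[x] * y_counts[y])
        for (x, y), c in xy_counts.items()
    }


# Made-up data: one categorical feature ("color") and one label ("clicked").
rows = [
    {"color": "red", "clicked": 1},
    {"color": "red", "clicked": 1},
    {"color": "blue", "clicked": 0},
    {"color": "blue", "clicked": 1},
]
lifts = lift_table(rows, "color", "clicked")
```

A lift above 1 means the feature value and the label value co-occur more often than they would if they were independent; a lift below 1 means they co-occur less often.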
146 |
| -## Bug Fixes and Other Changes |
147 |
| - |
148 |
| -* Exclude examples in which the entire sparse feature is missing when |
149 |
| - calculating sparse feature statistics. |
150 |
| -* Validate min_examples_count dataset constraint. |
151 |
| -* Document the schema fields, statistics fields, and detection condition for |
152 |
| - each anomaly type that TFDV detects. |
153 |
| -* Handle null array in cross feature stats generator, top-k & uniques combiner |
154 |
| - stats generator, and sklearn mutual information generator. |
155 |
| -* Handle infinity in basic stats generator. |
156 |
| -* Set num_missing and num_examples correctly in the presence of sparse |
157 |
| - features. |
158 |
| -* Compute weighted feature stats for all weighted features declared in schema. |
159 |
| -* Enforce that mutual information is non-negative. |
160 |
| -* Depends on `tensorflow-metadata>=0.21.0,<0.22`. |
161 |
| -* Depends on `pyarrow>=0.15` (removed the upper bound as it is determined by |
162 |
| - `tfx-bsl`). |
163 |
| -* Depends on `tfx-bsl>=0.21.0,<0.22` |
164 |
| -* Depends on `apache-beam>=2.17,<3` |
165 |
| -* Validate that float feature does not contain NaNs (if disallow_nan is True). |
166 |
| - |
167 |
| -## Breaking Changes |
168 |
| - |
169 |
| -* Changed the behavior regarding statistics over CSV data: |
170 |
| - |
171 |
| - - Previously, if a CSV column contained a mix of integers and empty strings, |
172
| - FLOAT statistics were collected for that column. INT statistics are now |
173
| - collected instead. |
174 |
| - |
175 |
| -* Removed `csv_decoder.DecodeCSVToDict`, as `Dict[str, np.ndarray]` has not |
176
| - been the internal data representation since 0.14. |
177 |
| - |
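The CSV behavior change described in the 0.21.0 breaking changes above can be pictured with a minimal sketch of column type inference in which empty cells count as missing values and do not widen the column type. The function `infer_column_type` is hypothetical and is not the `tfx-bsl` implementation; it only mirrors the rule stated in the note.

```python
def infer_column_type(cells):
    """Infer "INT", "FLOAT", or "STRING" for a CSV column given as strings.

    Hypothetical helper for illustration only. Empty cells are treated as
    missing, so a column of integers and empty strings stays INT.
    """
    inferred = None  # stays None if every cell is empty
    for cell in cells:
        if cell == "":
            continue  # a missing value does not widen the type
        try:
            int(cell)
            candidate = "INT"
        except ValueError:
            try:
                float(cell)
                candidate = "FLOAT"
            except ValueError:
                return "STRING"  # any non-numeric cell makes the column STRING
        if inferred is None or candidate == "FLOAT":
            inferred = candidate
    return inferred


mixed_ints = infer_column_type(["1", "", "3"])
mixed_floats = infer_column_type(["1", "2.5", ""])
```

Under the earlier behavior described in the note, the first column would have produced FLOAT statistics; under the rule sketched here it is inferred as INT.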
178 |
| -## Deprecations |
179 |
| - |
180 |
| -# Release 0.15.0 |
181 |
| - |
182 |
| -## Major Features and Improvements |
183 |
| - |
184 |
| -* Generate statistics for sparse features. |
185 |
| -* Directly convert a batch of tf.Examples to Arrow tables. Avoids conversion of |
186 |
| - tf.Example to intermediate Dict representation. |
187 |
| - |
188 |
| -## Bug Fixes and Other Changes |
189 |
| - |
190 |
| -* Generate statistics for the weight feature. |
191 |
| -* Support validation and schema inference from sliced statistics that include |
192 |
| - the default slice (validation/inference will be done using the default slice |
193 |
| - statistics). |
194 |
| -* Avoid flattening null arrays. |
195 |
| -* Set `weighted_num_examples` field in the statistics proto if a weight |
196 |
| - feature is specified. |
197 |
| -* Replace DecodedExamplesToTable with a Python implementation. |
198 |
| -* Building TFDV from source does not need pyarrow anymore. |
199 |
| -* Depends on `apache-beam[gcp]>=2.16,<3`. |
200 |
| -* Depends on `six>=1.12,<2`. |
201 |
| -* Depends on `scikit-learn>=0.18,<0.22`. |
202 |
| -* Depends on `tfx-bsl>=0.15,<0.16`. |
203 |
| -* Depends on `tensorflow-metadata>=0.15,<0.16`. |
204 |
| -* Depends on `tensorflow-transform>=0.15,<0.16`. |
205 |
| -* Depends on `tensorflow>=1.15,<3`. |
206 |
| - * Starting from 1.15, package |
207 |
| - `tensorflow` comes with GPU support. Users won't need to choose between |
208 |
| - `tensorflow` and `tensorflow-gpu`. |
209 |
| - * Caveat: `tensorflow` 2.0.0 is an exception and does not have GPU |
210 |
| - support. If `tensorflow-gpu` 2.0.0 is installed before installing |
211 |
| - `tensorflow-data-validation`, it will be replaced with `tensorflow` 2.0.0. |
212 |
| - Re-install `tensorflow-gpu` 2.0.0 if needed. |
213 |
| - |
214 |
| -## Breaking Changes |
215 |
| - |
216 |
| -## Deprecations |
217 |
| - |
218 |
| -# Release 0.14.1 |
219 |
| - |
220 |
| -## Major Features and Improvements |
221 |
| - |
222 |
| -* Add support for custom schema transformations when inferring schema. |
223 |
| - |
224 |
| -## Bug Fixes and Other Changes |
225 |
| - |
226 |
| -* Fix incorrect file hashes in the TFDV wheel. |
227 |
| -* Fix DOMException when embedding visualization in iframe. |
228 |
| - |
229 |
| -## Breaking Changes |
230 |
| - |
231 |
| -## Deprecations |
232 |
| - |
233 |
| -# Release 0.14.0 |
234 |
| - |
235 |
| -## Major Features and Improvements |
236 |
| - |
237 |
| -* Performance improvement due to optimizing inner loops. |
238 |
| -* Add support for time semantic domain related statistics. |
239 |
| -* Performance improvement due to batching accumulators before merging. |
240 |
| -* Add utility method `validate_examples_in_tfrecord`, which identifies anomalous |
241 |
| - examples in TFRecord files containing TFExamples and generates statistics for |
242 |
| - those anomalous examples. |
243 |
| -* Add utility method `validate_examples_in_csv`, which identifies anomalous |
244 |
| - examples in CSV files and generates statistics for those anomalous examples. |
245 |
| -* Add fast TF example decoder written in C++. |
246 |
| -* Make `BasicStatsGenerator` take an Arrow table as input. Example batches are |
247 |
| - converted to Apache Arrow tables internally and we are able to make use of |
248 |
| - vectorized numpy functions. Improved performance of BasicStatsGenerator |
249 |
| - by ~40x. |
250 |
| -* Make `TopKUniquesStatsGenerator` and `TopKUniquesCombinerStatsGenerator` |
251
| - take an Arrow table as input. |
252 |
| -* Add `update_schema` API which updates the schema to conform to statistics. |
253 |
| -* Add support for validating changes in the number of examples between the |
254 |
| - current and previous spans of data (using the existing `validate_statistics` |
255 |
| - function). |
256 |
| -* Support building a manylinux2010 compliant wheel in docker. |
257 |
| -* Add support for cross feature statistics. |
258 |
| - |
259 |
| -## Bug Fixes and Other Changes |
260 |
| - |
261 |
| -* Expand unit test coverage. |
262 |
| -* Update natural language stats generator to generate stats if actual ratio |
263 |
| - equals `match_ratio`. |
264 |
| -* Use `__slots__` in accumulators. |
265 |
| -* Fix overflow warning when generating numeric stats for large integers. |
266 |
| -* Set max value count in schema when the feature has same valency, thereby |
267 |
| - inferring shape for multivalent required features. |
268 |
| -* Fix divide by zero error in natural language stats generator. |
269 |
| -* Add `load_anomalies_text` and `write_anomalies_text` utility functions. |
270 |
| -* Define ReasonFeatureNeeded proto. |
271 |
| -* Add support for Windows OS. |
272 |
| -* Make semantic domain stats generators take an Arrow column as input. |
273 |
| -* Fix error in number of missing examples and total number of examples |
274 |
| - computation. |
275 |
| -* Make FeaturesNeeded serializable. |
276 |
| -* Fix memory leak in fast example decoder. |
277 |
| -* Add `semantic_domain_stats_sample_rate` option to compute semantic domain |
278 |
| - statistics over a sample. |
279 |
| -* Increment refcount of None in fast example decoder. |
280 |
| -* Add `compression_type` option to `generate_statistics_from_*` methods. |
281 |
| -* Add link to SysML paper describing some technical details behind TFDV. |
282 |
| -* Add Python types to the source code. |
283 |
| -* Make `GenerateStatistics` generate a DatasetFeatureStatisticsList containing a |
284 |
| - dataset with num_examples == 0 instead of an empty proto if there are no |
285 |
| - examples in the input. |
286 |
| -* Depends on `absl-py>=0.7,<1` |
287 |
| -* Depends on `apache-beam[gcp]>=2.14,<3` |
288 |
| -* Depends on `numpy>=1.16,<2`. |
289 |
| -* Depends on `pandas>=0.24,<1`. |
290 |
| -* Depends on `pyarrow>=0.14.0,<0.15.0`. |
291 |
| -* Depends on `scikit-learn>=0.18,<0.21`. |
292 |
| -* Depends on `tensorflow-metadata>=0.14,<0.15`. |
293 |
| -* Depends on `tensorflow-transform>=0.14,<0.15`. |
294 |
| - |
295 |
| -## Breaking Changes |
296 |
| - |
297 |
| -* Change `examples_threshold` to `values_threshold` and update documentation to |
298 |
| - clarify that counts are of values in semantic domain stats generators. |
299 |
| -* Refactor IdentifyAnomalousExamples to remove sampling and output |
300 |
| - (anomaly reason, example) tuples. |
301 |
| -* Rename `anomaly_proto` parameter in anomalies utilities to `anomalies` to |
302 |
| - make it more consistent with proto and schema utilities. |
303 |
| -* `FeatureNameStatistics` produced by `GenerateStatistics` is now identified |
304 |
| - by its `.path` field instead of the `.name` field. For example: |
305 |
| - |
306 |
| - ``` |
307 |
| - feature { |
308 |
| - name: "my_feature" |
309 |
| - } |
310 |
| - ``` |
311 |
| - becomes: |
312 |
| - |
313 |
| - ``` |
314 |
| - feature { |
315 |
| - path { |
316 |
| - step: "my_feature" |
317 |
| - } |
318 |
| - } |
319 |
| - ``` |
320 |
| -* Change `validate_instance` API to accept an Arrow table instead of a Dict. |
321 |
| -* Change `GenerateStatistics` API to accept Arrow tables as input. |
322 |
| - |
323 |
| -## Deprecations |
324 |
| - |
325 |
| -# Release 0.13.1 |
326 |
| - |
327 |
| -## Major Features and Improvements |
328 |
| - |
329 |
| -## Bug Fixes and Other Changes |
330 |
| - |
331 |
| -* Modify validation logic to raise `SCHEMA_MISSING_COLUMN` anomaly when |
332 |
| - observing a feature with no stats (was still broken, now fixed). |
333 |
| - |
334 |
| -## Breaking Changes |
335 |
| - |
336 |
| -## Deprecations |
337 |
| - |
338 |
| -# Release 0.13.0 |
339 |
| - |
340 |
| -## Major Features and Improvements |
341 |
| - |
342 |
| -* Use joblib to exploit multiprocessing when computing statistics over a pandas |
343 |
| - dataframe. |
344 |
| -* Add support for semantic domain related statistics (natural language, image), |
345 |
| - enabled by `StatsOptions.enable_semantic_domain_stats`. |
346 |
| -* Python 3.5 is supported. |
347 |
| - |
348 |
| -## Bug Fixes and Other Changes |
349 |
| - |
350 |
| -* Expand unit test coverage. |
351 |
| -* Modify validation logic to raise `SCHEMA_MISSING_COLUMN` anomaly when |
352 |
| - observing a feature with no stats. |
353 |
| -* Add utility functions `write_stats_text` and `load_stats_text` to write and |
354 |
| - load DatasetFeatureStatisticsList protos. |
355 |
| -* Avoid using multiprocessing by default when generating statistics over a |
356 |
| - dataframe. |
357 |
| -* Depends on `joblib>=0.12,<1`. |
358 |
| -* Depends on `tensorflow-transform>=0.13,<0.14`. |
359 |
| -* Depends on `tensorflow-metadata>=0.12.1,<0.14`. |
360 |
| -* Requires pre-installed `tensorflow>=1.13.1,<2`. |
361 |
| -* Depends on `apache-beam[gcp]>=2.11,<3`. |
362 |
| -* Depends on `absl>=0.1.6,<1`. |
363 |
| - |
364 |
| -## Breaking Changes |
365 |
| - |
366 |
| -## Deprecations |
367 |
| - |
368 |
| -# Release 0.12.0 |
369 |
| - |
370 |
| -## Major Features and Improvements |
371 |
| - |
372 |
| -* Add support for computing statistics over slices of data. |
373 |
| -* Performance improvement due to optimizing inner loops. |
374 |
| -* Add support for generating statistics from a pandas dataframe. |
375 |
| -* Performance improvement due to pre-allocating tf.Example in |
376 |
| - TFExampleDecoder. |
377 |
| -* Performance improvement due to merging common stats generator, numeric stats |
378 |
| - generator and string stats generator as a single basic stats generator. |
379 |
| -* Performance improvement due to merging top-k and uniques generators. |
380 |
| -* Add a `validate_instance` function, which checks a single example for |
381 |
| - anomalies. |
382 |
| -* Add a utility method `get_statistics_html`, which returns HTML that can be |
383 |
| - used for Facets visualization outside of a notebook. |
384 |
| -* Add support for schema inference of semantic domains. |
385 |
| -* Performance improvement on statistics computation over a pandas dataframe. |
386 |
| - |
387 |
| -## Bug Fixes and Other Changes |
388 |
| - |
389 |
| -* Use constant '__BYTES_VALUE__' in the statistics proto to represent a bytes |
390 |
| - value which cannot be decoded as a utf-8 string. |
391 |
| -* Introduced CombinerFeatureStatsGenerator, a specialized interface for |
392 |
| - combiners that do not require cross-feature computations. |
393 |
| -* Expand unit test coverage. |
394 |
| -* Add optional frequency threshold that allows keeping only the most frequent |
395 |
| - values that are present in a minimum number of examples. |
396 |
| -* Add optional desired batch size that allows specification of the number of |
397 |
| - examples to include in each batch. |
398 |
| -* Depends on `numpy>=1.14.5,<2`. |
399 |
| -* Depends on `protobuf>=3.6.1,<4`. |
400 |
| -* Depends on `apache-beam[gcp]>=2.10,<3`. |
401 |
| -* Depends on `tensorflow-metadata>=0.12.1,<0.13`. |
402 |
| -* Depends on `scikit-learn>=0.18,<1`. |
403 |
| -* Depends on `IPython>=5.0`. |
404 |
| -* Requires pre-installed `tensorflow>=1.12,<2`. |
405 |
| -* Revise example notebook and update it to be able to run in Colab and Jupyter. |
406 |
| - |
407 |
| -## Breaking Changes |
408 |
| -* Represent batch as a list of ndarrays instead of ndarrays of ndarrays. |
409 |
| -* Modify decoders to return ndarrays of type numpy.float32 for FLOAT features. |
410 |
| - |
411 |
| -## Deprecations |
412 |
| - |
413 |
| -# Release 0.11.0 |
414 |
| - |
415 |
| -## Major Features and Improvements |
416 |
| - |
417 |
| -* Add option to infer feature types from schema when generating statistics over |
418 |
| - CSV data. |
419 |
| -* Add utility method `set_domain` to set the domain of a feature in the schema. |
420 |
| -* Add option to compute weighted statistics by providing a weight feature. |
421 |
| -* Add a PTransform for decoding TF examples. |
422 |
| -* Add utility methods `write_schema_text` and `load_schema_text` to write and |
423 |
| - load the schema protocol buffer. |
424 |
| -* Add option to compute statistics over a sample. |
425 |
| -* Optimize performance of statistics computation (~2x improvement on benchmark |
426 |
| - datasets). |
427 |
| - |
428 |
| -## Bug Fixes and Other Changes |
429 |
| - |
430 |
| -* Depends on `apache-beam[gcp]>=2.8,<3`. |
431 |
| -* Depends on `tensorflow-transform>=0.11,<0.12`. |
432 |
| -* Depends on `tensorflow-metadata>=0.9,<0.10`. |
433 |
| -* Fix bug in clearing oneof domain\_info field in Feature proto. |
434 |
| -* Fix overflow error for large integers by casting them to STRING type. |
435 |
| -* Added API docs. |
436 |
| - |
437 |
| -## Breaking Changes |
438 |
| - |
439 |
| -* Requires pre-installed `tensorflow>=1.11,<2`. |
440 |
| -* Make the tf.Example decoder represent a feature with no value list as a |
441 |
| - missing value (None). |
442 |
| -* Make StatsOptions a class. |
443 |
| - |
444 |
| -## Deprecations |
445 |
| - |
446 |
| -# Release 0.9.0 |
447 |
| - |
448 |
| -* Initial release of TensorFlow Data Validation. |