Commit 9e29f13

Update RELEASE.md
1 parent 442dacd commit 9e29f13

1 file changed

+0
-389
lines changed

RELEASE.md

Lines changed: 0 additions & 389 deletions
Original file line number | Diff line number | Diff line change
@@ -57,392 +57,3 @@
57 57
## Deprecations
58 58

59 59
* Deprecating Py2 support.
60-
61-
# Release 0.21.5
62-
63-
## Major Features and Improvements
64-
65-
* Add `label_feature` to `StatsOptions` and enable `LiftStatsGenerator` when
66-
`label_feature` and `schema` are provided.
67-
* Add JSON serialization support for StatsOptions.
68-
69-
## Bug Fixes and Other Changes
70-
* Require `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` only on Python 3.5 + macOS.
71-
72-
## Breaking Changes
73-
74-
## Deprecations
75-
76-
# Release 0.21.4
77-
78-
## Major Features and Improvements
79-
80-
* Support visualizing feature value lift in facets visualization.
81-
82-
## Bug Fixes and Other Changes
83-
84-
* Fix issue writing out string feature values in LiftStatsGenerator.
85-
* Requires 'apache-beam[gcp]>=2.17,<3'.
86-
* Requires 'tensorflow-transform>=0.21.1,<0.22'.
87-
* Requires 'tfx-bsl>=0.21.3,<0.22'.
88-
89-
## Breaking Changes
90-
91-
## Deprecations
92-
93-
# Release 0.21.2
94-
95-
## Major Features and Improvements
96-
97-
## Bug Fixes and Other Changes
98-
99-
* Fix facets visualization.
100-
* Optimize LiftStatsGenerator for string features.
101-
* Make `_WeightedCounter` serializable.
102-
* Add support for weighted examples in LiftStatsGenerator.
103-
104-
## Breaking Changes
105-
106-
## Deprecations
107-
108-
* `tfdv.TFExampleDecoder` has been removed. This legacy decoder converts
109-
serialized `tf.Example` to a dict of numpy arrays, which is the legacy
110-
input format (prior to Apache Arrow). TFDV has stopped accepting that format
111-
since 0.14. Use `tfdv.DecodeTFExample` instead.
112-
113-
# Release 0.21.1
114-
115-
## Major Features and Improvements
116-
117-
## Bug Fixes and Other Changes
118-
* Do validation on weighted feature stats.
119-
* During schema inference, skip features which are missing common stats. This
120-
makes schema inference work when the input stats are generated from some
121-
pre-existing, unknown schema.
122-
* Fix facets visualization in Chrome >=M80.
123-
124-
## Known Issues
125-
126-
* Running TFDV with Apache Beam 2.18 or 2.19 does not work on Windows. If you
127-
are using TFDV on Windows, use Apache Beam 2.17.
128-
129-
## Breaking Changes
130-
131-
## Deprecations
132-
133-
# Release 0.21.0
134-
135-
## Major Features and Improvements
136-
137-
* Started depending on the CSV parsing / type inferring utilities provided
138-
by `tfx-bsl` (since tfx-bsl 0.15.2). This also brings performance improvements
139-
to the CSV decoder (~2x faster in decoding. Type inferring performance is not
140-
affected).
141-
* Compute bytes statistics for features of BYTES type. Avoid computing topk and
142-
uniques for such features.
143-
* Added LiftStatsGenerator which computes lift between one feature (typically a
144-
label) and all other categorical features.
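The lift mentioned in the last item can be illustrated with a minimal pure-Python sketch. It assumes the standard definition, lift(x, y) = P(x, y) / (P(x) * P(y)), and is not TFDV's `LiftStatsGenerator` implementation; the function name `lift_table` is hypothetical.

```python
from collections import Counter


def lift_table(feature_values, labels):
    """Return {(x, y): lift} for each observed (feature value, label) pair.

    Hypothetical helper using the standard definition
    lift(x, y) = P(x, y) / (P(x) * P(y)); not TFDV's own code.
    """
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have equal length")
    n = len(labels)
    x_counts = Counter(feature_values)
    y_counts = Counter(labels)
    xy_counts = Counter(zip(feature_values, labels))
    # (c / n) / ((cx / n) * (cy / n)) simplifies to c * n / (cx * cy).
    return {
        (x, y): c * n / (x_counts[x] * y_counts[y])
        for (x, y), c in xy_counts.items()
    }
```

A lift above 1 means the feature value and the label co-occur more often than they would if they were independent; a lift of exactly 1 means no association under this measure.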
145-
146-
## Bug Fixes and Other Changes
147-
148-
* Exclude examples in which the entire sparse feature is missing when
149-
calculating sparse feature statistics.
150-
* Validate min_examples_count dataset constraint.
151-
* Document the schema fields, statistics fields, and detection condition for
152-
each anomaly type that TFDV detects.
153-
* Handle null array in cross feature stats generator, top-k & uniques combiner
154-
stats generator, and sklearn mutual information generator.
155-
* Handle infinity in basic stats generator.
156-
* Set num_missing and num_examples correctly in the presence of sparse
157-
features.
158-
* Compute weighted feature stats for all weighted features declared in schema.
159-
* Enforce that mutual information is non-negative.
160-
* Depends on `tensorflow-metadata>=0.21.0,<0.22`.
161-
* Depends on `pyarrow>=0.15` (removed the upper bound as it is determined by
162-
`tfx-bsl`).
163-
* Depends on `tfx-bsl>=0.21.0,<0.22`
164-
* Depends on `apache-beam>=2.17,<3`
165-
* Validate that float feature does not contain NaNs (if disallow_nan is True).
166-
167-
## Breaking Changes
168-
169-
* Changed the behavior of statistics over CSV data:
170-
171-
- Previously, if a CSV column contained a mix of integers and empty strings,
172-
FLOAT statistics were collected for that column. INT statistics are now
173-
collected instead.
174-
175-
* Removed `csv_decoder.DecodeCSVToDict`, as `Dict[str, np.ndarray]` has not
176-
been the internal data representation since 0.14.
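The CSV change above can be pictured with a small sketch of column type inference in which empty cells count as missing rather than forcing a wider type. This is an illustration only, not the `tfx-bsl` implementation; `infer_column_type` and its return values are hypothetical.

```python
def infer_column_type(cells):
    """Infer "INT", "FLOAT", or "STRING" from raw CSV cell strings.

    Hypothetical helper. Empty cells are treated as missing and do not
    affect the result; a column with no values at all yields None.
    """
    inferred = None
    for cell in cells:
        if cell == "":
            continue  # Missing value: does not widen the type.
        try:
            int(cell)
            if inferred is None:
                inferred = "INT"
            continue
        except ValueError:
            pass
        try:
            float(cell)
            inferred = "FLOAT"  # Widen INT (or unset) to FLOAT.
        except ValueError:
            return "STRING"  # Any non-numeric cell makes the column STRING.
    return inferred
```

With this rule, a column such as `["1", "", "3"]` is inferred as INT, which matches the new behavior described in the note; the earlier behavior corresponds to widening to FLOAT whenever an empty cell is present.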
177-
178-
## Deprecations
179-
180-
# Release 0.15.0
181-
182-
## Major Features and Improvements
183-
184-
* Generate statistics for sparse features.
185-
* Directly convert a batch of tf.Examples to Arrow tables. Avoids conversion of
186-
tf.Example to intermediate Dict representation.
187-
188-
## Bug Fixes and Other Changes
189-
190-
* Generate statistics for the weight feature.
191-
* Support validation and schema inference from sliced statistics that include
192-
the default slice (validation/inference will be done using the default slice
193-
statistics).
194-
* Avoid flattening null arrays.
195-
* Set `weighted_num_examples` field in the statistics proto if a weight
196-
feature is specified.
197-
* Replace DecodedExamplesToTable with a Python implementation.
198-
* Building TFDV from source does not need pyarrow anymore.
199-
* Depends on `apache-beam[gcp]>=2.16,<3`.
200-
* Depends on `six>=1.12,<2`.
201-
* Depends on `scikit-learn>=0.18,<0.22`.
202-
* Depends on `tfx-bsl>=0.15,<0.16`.
203-
* Depends on `tensorflow-metadata>=0.15,<0.16`.
204-
* Depends on `tensorflow-transform>=0.15,<0.16`.
205-
* Depends on `tensorflow>=1.15,<3`.
206-
* Starting from 1.15, package
207-
`tensorflow` comes with GPU support. Users won't need to choose between
208-
`tensorflow` and `tensorflow-gpu`.
209-
* Caveat: `tensorflow` 2.0.0 is an exception and does not have GPU
210-
support. If `tensorflow-gpu` 2.0.0 is installed before installing
211-
`tensorflow-data-validation`, it will be replaced with `tensorflow` 2.0.0.
212-
Re-install `tensorflow-gpu` 2.0.0 if needed.
213-
214-
## Breaking Changes
215-
216-
## Deprecations
217-
218-
# Release 0.14.1
219-
220-
## Major Features and Improvements
221-
222-
* Add support for custom schema transformations when inferring schema.
223-
224-
## Bug Fixes and Other Changes
225-
226-
* Fix incorrect file hashes in the TFDV wheel.
227-
* Fix DOMException when embedding visualization in iframe.
228-
229-
## Breaking Changes
230-
231-
## Deprecations
232-
233-
# Release 0.14.0
234-
235-
## Major Features and Improvements
236-
237-
* Performance improvement due to optimizing inner loops.
238-
* Add support for time semantic domain related statistics.
239-
* Performance improvement due to batching accumulators before merging.
240-
* Add utility method `validate_examples_in_tfrecord`, which identifies anomalous
241-
examples in TFRecord files containing TFExamples and generates statistics for
242-
those anomalous examples.
243-
* Add utility method `validate_examples_in_csv`, which identifies anomalous
244-
examples in CSV files and generates statistics for those anomalous examples.
245-
* Add fast TF example decoder written in C++.
246-
* Make `BasicStatsGenerator` take an Arrow table as input. Example batches are
247-
converted to Apache Arrow tables internally and we are able to make use of
248-
vectorized numpy functions. Improved performance of BasicStatsGenerator
249-
by ~40x.
250-
* Make `TopKUniquesStatsGenerator` and `TopKUniquesCombinerStatsGenerator`
251-
take arrow table as input.
252-
* Add `update_schema` API which updates the schema to conform to statistics.
253-
* Add support for validating changes in the number of examples between the
254-
current and previous spans of data (using the existing `validate_statistics`
255-
function).
256-
* Support building a manylinux2010 compliant wheel in docker.
257-
* Add support for cross feature statistics.
258-
259-
## Bug Fixes and Other Changes
260-
261-
* Expand unit test coverage.
262-
* Update natural language stats generator to generate stats if actual ratio
263-
equals `match_ratio`.
264-
* Use `__slots__` in accumulators.
265-
* Fix overflow warning when generating numeric stats for large integers.
266-
* Set max value count in schema when the feature has same valency, thereby
267-
inferring shape for multivalent required features.
268-
* Fix divide by zero error in natural language stats generator.
269-
* Add `load_anomalies_text` and `write_anomalies_text` utility functions.
270-
* Define ReasonFeatureNeeded proto.
271-
* Add support for Windows OS.
272-
* Make semantic domain stats generators take an Arrow column as input.
273-
* Fix error in number of missing examples and total number of examples
274-
computation.
275-
* Make FeaturesNeeded serializable.
276-
* Fix memory leak in fast example decoder.
277-
* Add `semantic_domain_stats_sample_rate` option to compute semantic domain
278-
statistics over a sample.
279-
* Increment refcount of None in fast example decoder.
280-
* Add `compression_type` option to `generate_statistics_from_*` methods.
281-
* Add link to SysML paper describing some technical details behind TFDV.
282-
* Add Python types to the source code.
283-
* Make `GenerateStatistics` generate a DatasetFeatureStatisticsList containing a
284-
dataset with num_examples == 0 instead of an empty proto if there are no
285-
examples in the input.
286-
* Depends on `absl-py>=0.7,<1`
287-
* Depends on `apache-beam[gcp]>=2.14,<3`
288-
* Depends on `numpy>=1.16,<2`.
289-
* Depends on `pandas>=0.24,<1`.
290-
* Depends on `pyarrow>=0.14.0,<0.15.0`.
291-
* Depends on `scikit-learn>=0.18,<0.21`.
292-
* Depends on `tensorflow-metadata>=0.14,<0.15`.
293-
* Depends on `tensorflow-transform>=0.14,<0.15`.
294-
295-
## Breaking Changes
296-
297-
* Change `examples_threshold` to `values_threshold` and update documentation to
298-
clarify that counts are of values in semantic domain stats generators.
299-
* Refactor IdentifyAnomalousExamples to remove sampling and output
300-
(anomaly reason, example) tuples.
301-
* Rename `anomaly_proto` parameter in anomalies utilities to `anomalies` to
302-
make it more consistent with proto and schema utilities.
303-
* `FeatureNameStatistics` produced by `GenerateStatistics` is now identified
304-
by its `.path` field instead of the `.name` field. For example:
305-
306-
```
307-
feature {
308-
name: "my_feature"
309-
}
310-
```
311-
becomes:
312-
313-
```
314-
feature {
315-
path {
316-
step: "my_feature"
317-
}
318-
}
319-
```
320-
* Change `validate_instance` API to accept an Arrow table instead of a Dict.
321-
* Change `GenerateStatistics` API to accept Arrow tables as input.
322-
323-
## Deprecations
324-
325-
# Release 0.13.1
326-
327-
## Major Features and Improvements
328-
329-
## Bug Fixes and Other Changes
330-
331-
* Modify validation logic to raise `SCHEMA_MISSING_COLUMN` anomaly when
332-
observing a feature with no stats (was still broken, now fixed).
333-
334-
## Breaking Changes
335-
336-
## Deprecations
337-
338-
# Release 0.13.0
339-
340-
## Major Features and Improvements
341-
342-
* Use joblib to exploit multiprocessing when computing statistics over a pandas
343-
dataframe.
344-
* Add support for semantic domain related statistics (natural language, image),
345-
enabled by `StatsOptions.enable_semantic_domain_stats`.
346-
* Python 3.5 is supported.
347-
348-
## Bug Fixes and Other Changes
349-
350-
* Expand unit test coverage.
351-
* Modify validation logic to raise `SCHEMA_MISSING_COLUMN` anomaly when
352-
observing a feature with no stats.
353-
* Add utility functions `write_stats_text` and `load_stats_text` to write and
354-
load DatasetFeatureStatisticsList protos.
355-
* Avoid using multiprocessing by default when generating statistics over a
356-
dataframe.
357-
* Depends on `joblib>=0.12,<1`.
358-
* Depends on `tensorflow-transform>=0.13,<0.14`.
359-
* Depends on `tensorflow-metadata>=0.12.1,<0.14`.
360-
* Requires pre-installed `tensorflow>=1.13.1,<2`.
361-
* Depends on `apache-beam[gcp]>=2.11,<3`.
362-
* Depends on `absl>=0.1.6,<1`.
363-
364-
## Breaking Changes
365-
366-
## Deprecations
367-
368-
# Release 0.12.0
369-
370-
## Major Features and Improvements
371-
372-
* Add support for computing statistics over slices of data.
373-
* Performance improvement due to optimizing inner loops.
374-
* Add support for generating statistics from a pandas dataframe.
375-
* Performance improvement due to pre-allocating tf.Example in
376-
TFExampleDecoder.
377-
* Performance improvement due to merging common stats generator, numeric stats
378-
generator and string stats generator as a single basic stats generator.
379-
* Performance improvement due to merging top-k and uniques generators.
380-
* Add a `validate_instance` function, which checks a single example for
381-
anomalies.
382-
* Add a utility method `get_statistics_html`, which returns HTML that can be
383-
used for Facets visualization outside of a notebook.
384-
* Add support for schema inference of semantic domains.
385-
* Performance improvement on statistics computation over a pandas dataframe.
386-
387-
## Bug Fixes and Other Changes
388-
389-
* Use constant '__BYTES_VALUE__' in the statistics proto to represent a bytes
390-
value which cannot be decoded as a utf-8 string.
391-
* Introduced CombinerFeatureStatsGenerator, a specialized interface for
392-
combiners that do not require cross-feature computations.
393-
* Expand unit test coverage.
394-
* Add optional frequency threshold that allows keeping only the most frequent
395-
values that are present in a minimum number of examples.
396-
* Add optional desired batch size that allows specification of the number of
397-
examples to include in each batch.
398-
* Depends on `numpy>=1.14.5,<2`.
399-
* Depends on `protobuf>=3.6.1,<4`.
400-
* Depends on `apache-beam[gcp]>=2.10,<3`.
401-
* Depends on `tensorflow-metadata>=0.12.1,<0.13`.
402-
* Depends on `scikit-learn>=0.18,<1`.
403-
* Depends on `IPython>=5.0`.
404-
* Requires pre-installed `tensorflow>=1.12,<2`.
405-
* Revise example notebook and update it to be able to run in Colab and Jupyter.
406-
407-
## Breaking changes
408-
* Represent batch as a list of ndarrays instead of ndarrays of ndarrays.
409-
* Modify decoders to return ndarrays of type numpy.float32 for FLOAT features.
410-
411-
## Deprecations
412-
413-
# Release 0.11.0
414-
415-
## Major Features and Improvements
416-
417-
* Add option to infer feature types from schema when generating statistics over
418-
CSV data.
419-
* Add utility method `set_domain` to set the domain of a feature in the schema.
420-
* Add option to compute weighted statistics by providing a weight feature.
421-
* Add a PTransform for decoding TF examples.
422-
* Add utility methods `write_schema_text` and `load_schema_text` to write and
423-
load the schema protocol buffer.
424-
* Add option to compute statistics over a sample.
425-
* Optimize performance of statistics computation (~2x improvement on benchmark
426-
datasets).
427-
428-
## Bug Fixes and Other Changes
429-
430-
* Depends on `apache-beam[gcp]>=2.8,<3`.
431-
* Depends on `tensorflow-transform>=0.11,<0.12`.
432-
* Depends on `tensorflow-metadata>=0.9,<0.10`.
433-
* Fix bug in clearing oneof domain\_info field in Feature proto.
434-
* Fix overflow error for large integers by casting them to STRING type.
435-
* Added API docs.
436-
437-
## Breaking changes
438-
439-
* Requires pre-installed `tensorflow>=1.11,<2`.
440-
* Make the tf.Example decoder represent a feature with no value list as a
441-
missing value (None).
442-
* Make StatsOptions a class.
443-
444-
## Deprecations
445-
446-
# Release 0.9.0
447-
448-
* Initial release of TensorFlow Data Validation.
