Commit c7843d2

paulgc17 authored and tf-data-validation-team committed
Update API docs.
PiperOrigin-RevId: 236202555
1 parent 1b9acba commit c7843d2

9 files changed (+20, -14 lines)

g3doc/api_docs/python/tfdv/GenerateStatistics.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ Initializes the transform.
 
 #### Args:
 
-* <b>`options`</b>: Options for generating data statistics.
+* <b>`options`</b>: <a href="../tfdv/StatsOptions.md"><code>tfdv.StatsOptions</code></a> for generating data statistics.
 
 
 #### Raises:

g3doc/api_docs/python/tfdv/StatsOptions.md

Lines changed: 4 additions & 1 deletion
@@ -42,7 +42,8 @@ __init__(
 num_quantiles_histogram_buckets=10,
 epsilon=0.01,
 infer_type_from_schema=False,
-desired_batch_size=None
+desired_batch_size=None,
+enable_semantic_domain_stats=False
 )
 ```
 
@@ -95,6 +96,8 @@ Initializes statistics options.
 on CSV data.
 * <b>`desired_batch_size`</b>: An optional number of examples to include in each
 batch that is passed to the statistics generators.
+* <b>`enable_semantic_domain_stats`</b>: If True statistics for semantic domains are
+generated (e.g: image, text domains).
 
 
 

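The hunk above appends `enable_semantic_domain_stats=False` after `desired_batch_size` in the `StatsOptions.__init__` signature. Because the new parameter has a default, existing call sites keep working unchanged; with the real library one would opt in by passing `enable_semantic_domain_stats=True` to `tfdv.StatsOptions`. The sketch below illustrates that backward-compatibility property with a hypothetical stand-in dataclass, `StatsOptionsSketch`, which mirrors only the parameters visible in this diff and is not the TFDV class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsOptionsSketch:
    """Hypothetical stand-in mirroring only the StatsOptions parameters shown in this diff."""
    num_quantiles_histogram_buckets: int = 10
    epsilon: float = 0.01
    infer_type_from_schema: bool = False
    desired_batch_size: Optional[int] = None
    # Newly documented flag: opt-in statistics for semantic domains (e.g. image, text).
    enable_semantic_domain_stats: bool = False


# A call written before the flag existed still constructs cleanly
# and gets the default, so the feature stays disabled.
legacy = StatsOptionsSketch(desired_batch_size=1000)

# New callers opt in explicitly by keyword.
semantic = StatsOptionsSketch(enable_semantic_domain_stats=True)
```

Appending a defaulted keyword at the end of the signature is what keeps older positional and keyword calls valid.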
g3doc/api_docs/python/tfdv/generate_statistics_from_csv.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ PTransform API directly instead.
 * <b>`output_path`</b>: The file path to output data statistics result to. If None, we
 use a temporary directory. It will be a TFRecord file containing a single
 data statistics proto, and can be read with the 'load_statistics' API.
-* <b>`stats_options`</b>: Options for generating data statistics.
+* <b>`stats_options`</b>: <a href="../tfdv/StatsOptions.md"><code>tfdv.StatsOptions</code></a> for generating data statistics.
 * <b>`pipeline_options`</b>: Optional beam pipeline options. This allows users to
 specify various beam pipeline execution parameters like pipeline runner
 (DirectRunner or DataflowRunner), cloud dataflow service project id, etc.
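
The `output_path` entry above (repeated for `generate_statistics_from_tfrecord`) documents a fallback: when the path is None, the result is written under a temporary directory. A minimal standard-library sketch of that fallback follows. The helper name `resolve_output_path` and the file name `data_stats.tfrecord` are assumptions for illustration, not taken from this diff.

```python
import os
import tempfile
from typing import Optional


def resolve_output_path(output_path: Optional[str] = None) -> str:
    """Hypothetical helper: return output_path, or a file inside a fresh temp directory."""
    if output_path is None:
        # mkdtemp creates a new, uniquely named directory and returns its path.
        output_path = os.path.join(tempfile.mkdtemp(), "data_stats.tfrecord")
    return output_path
```

In the real API, the file at that location is what the docstring says can then be read back with the `load_statistics` API.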

g3doc/api_docs/python/tfdv/generate_statistics_from_dataframe.md

Lines changed: 5 additions & 2 deletions
@@ -8,7 +8,8 @@
 ``` python
 tfdv.generate_statistics_from_dataframe(
 dataframe,
-stats_options=options.StatsOptions()
+stats_options=options.StatsOptions(),
+n_jobs=1
 )
 ```
 
@@ -20,7 +21,9 @@ as a pandas DataFrame.
 #### Args:
 
 * <b>`dataframe`</b>: Input pandas DataFrame.
-* <b>`stats_options`</b>: Options for generating data statistics.
+* <b>`stats_options`</b>: <a href="../tfdv/StatsOptions.md"><code>tfdv.StatsOptions</code></a> for generating data statistics.
+* <b>`n_jobs`</b>: Number of processes to run (defaults to 1). If -1 is provided,
+uses the same number of processes as the number of CPU cores.
 
 
 #### Returns:
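
The newly documented `n_jobs` argument follows a common convention: a positive value is a process count, and -1 means one process per CPU core. The sketch below shows one way to resolve that convention and divide a DataFrame's rows among the processes. `resolve_n_jobs` and `split_rows` are hypothetical helpers, and rejecting 0 or values below -1 is an assumption; the diff does not say how TFDV handles them.

```python
import multiprocessing
from typing import List, Tuple


def resolve_n_jobs(n_jobs: int = 1) -> int:
    """Maps the documented n_jobs convention to a concrete process count."""
    if n_jobs == -1:
        # -1 means "as many processes as CPU cores".
        return multiprocessing.cpu_count()
    if n_jobs < 1:
        # Assumption of this sketch: 0 and values below -1 are invalid.
        raise ValueError(f"n_jobs must be -1 or >= 1, got {n_jobs}")
    return n_jobs


def split_rows(num_rows: int, n_jobs: int) -> List[Tuple[int, int]]:
    """Splits row indices [0, num_rows) into at most n_jobs contiguous ranges."""
    # Never start more workers than there are rows (but always at least one).
    workers = min(resolve_n_jobs(n_jobs), max(num_rows, 1))
    base, extra = divmod(num_rows, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` ranges each take one additional row.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

Each `(start, end)` pair could then select a shard with `dataframe.iloc[start:end]` for one worker process.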

g3doc/api_docs/python/tfdv/generate_statistics_from_tfrecord.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ PTransform API directly instead.
 * <b>`output_path`</b>: The file path to output data statistics result to. If None, we
 use a temporary directory. It will be a TFRecord file containing a single
 data statistics proto, and can be read with the 'load_statistics' API.
-* <b>`stats_options`</b>: Options for generating data statistics.
+* <b>`stats_options`</b>: <a href="../tfdv/StatsOptions.md"><code>tfdv.StatsOptions</code></a> for generating data statistics.
 * <b>`pipeline_options`</b>: Optional beam pipeline options. This allows users to
 specify various beam pipeline execution parameters like pipeline runner
 (DirectRunner or DataflowRunner), cloud dataflow service project id, etc.

g3doc/api_docs/python/tfdv/validate_instance.md

Lines changed: 2 additions & 2 deletions
@@ -22,8 +22,8 @@ If an optional `environment` is specified, the schema is filtered using the
 
 * <b>`instance`</b>: A single example in the form of a dict mapping a feature name to a
 numpy array.
-* <b>`options`</b>: Options for generating data statistics. This must contain a
-schema.
+* <b>`options`</b>: <a href="../tfdv/StatsOptions.md"><code>tfdv.StatsOptions</code></a> for generating data statistics. This must
+contain a schema.
 * <b>`environment`</b>: An optional string denoting the validation environment. Must be
 one of the default environments specified in the schema. In some cases
 introducing slight schema variations is necessary, for instance features

tensorflow_data_validation/api/stats_api.py

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ def __init__(
 """Initializes the transform.
 
 Args:
-options: Options for generating data statistics.
+options: `tfdv.StatsOptions` for generating data statistics.
 
 Raises:
 TypeError: If options is not of the expected type.
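
The `Raises:` entry above says the transform's constructor rejects an `options` argument of the wrong type. A minimal sketch of such a guard follows, assuming a plain `isinstance` check. `StatsOptions` here is a hypothetical empty stand-in, and `GenerateStatisticsSketch` is not the real `tfdv.GenerateStatistics` PTransform.

```python
class StatsOptions:
    """Hypothetical stand-in for tfdv.StatsOptions; fields omitted."""


class GenerateStatisticsSketch:
    """Hypothetical transform that validates its options argument up front."""

    def __init__(self, options=None):
        # Fall back to default options when none are supplied (assumption of this sketch).
        if options is None:
            options = StatsOptions()
        # Fail fast at construction time instead of deep inside pipeline execution.
        if not isinstance(options, StatsOptions):
            raise TypeError(
                "options is of type %s, should be a StatsOptions."
                % type(options).__name__
            )
        self._options = options
```

Checking the type in the constructor surfaces a misconfigured transform when the pipeline is built, before any data is read.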

tensorflow_data_validation/api/validation_api.py

Lines changed: 2 additions & 2 deletions
@@ -276,8 +276,8 @@ def validate_instance(
 Args:
 instance: A single example in the form of a dict mapping a feature name to a
 numpy array.
-options: Options for generating data statistics. This must contain a
-schema.
+options: `tfdv.StatsOptions` for generating data statistics. This must
+contain a schema.
 environment: An optional string denoting the validation environment. Must be
 one of the default environments specified in the schema. In some cases
 introducing slight schema variations is necessary, for instance features
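
The `validate_instance` docstring above states preconditions on its arguments: `instance` maps feature names to arrays, `options` must carry a schema, and an optional `environment` must be one of the schema's default environments. The sketch below checks those preconditions before any validation work. `SchemaSketch`, `OptionsSketch`, and `check_validate_instance_args` are hypothetical, and the exception types are assumptions; the diff does not say which exceptions the library raises.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaSketch:
    """Hypothetical schema stand-in that records only its default environments."""
    default_environment: List[str] = field(default_factory=list)


@dataclass
class OptionsSketch:
    """Hypothetical options stand-in; only the attached schema matters here."""
    schema: Optional[SchemaSketch] = None


def check_validate_instance_args(
    instance: Mapping,
    options: OptionsSketch,
    environment: Optional[str] = None,
) -> None:
    """Checks the documented preconditions; raises TypeError or ValueError."""
    # The instance is documented as a dict mapping feature names to arrays.
    if not isinstance(instance, Mapping):
        raise TypeError("instance must map feature names to arrays.")
    # The options must carry a schema to validate against.
    if options.schema is None:
        raise ValueError("options must contain a schema.")
    # An explicit environment must be one that the schema declares.
    if environment is not None and environment not in options.schema.default_environment:
        raise ValueError(
            "environment %r is not one of %r."
            % (environment, options.schema.default_environment)
        )
```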

tensorflow_data_validation/utils/stats_gen_lib.py

Lines changed: 3 additions & 3 deletions
@@ -65,7 +65,7 @@ def generate_statistics_from_tfrecord(
 output_path: The file path to output data statistics result to. If None, we
 use a temporary directory. It will be a TFRecord file containing a single
 data statistics proto, and can be read with the 'load_statistics' API.
-stats_options: Options for generating data statistics.
+stats_options: `tfdv.StatsOptions` for generating data statistics.
 pipeline_options: Optional beam pipeline options. This allows users to
 specify various beam pipeline execution parameters like pipeline runner
 (DirectRunner or DataflowRunner), cloud dataflow service project id, etc.
@@ -128,7 +128,7 @@ def generate_statistics_from_csv(
 output_path: The file path to output data statistics result to. If None, we
 use a temporary directory. It will be a TFRecord file containing a single
 data statistics proto, and can be read with the 'load_statistics' API.
-stats_options: Options for generating data statistics.
+stats_options: `tfdv.StatsOptions` for generating data statistics.
 pipeline_options: Optional beam pipeline options. This allows users to
 specify various beam pipeline execution parameters like pipeline runner
 (DirectRunner or DataflowRunner), cloud dataflow service project id, etc.
@@ -182,7 +182,7 @@ def generate_statistics_from_dataframe(
 
 Args:
 dataframe: Input pandas DataFrame.
-stats_options: Options for generating data statistics.
+stats_options: `tfdv.StatsOptions` for generating data statistics.
 n_jobs: Number of processes to run (defaults to 1). If -1 is provided,
 uses the same number of processes as the number of CPU cores.
 