Commit 8aa9773

[SDP] FlowAnalysis.readGraphInput
1 parent 8e29070 commit 8aa9773

3 files changed, +28 -2 lines changed

docs/declarative-pipelines/FlowAnalysis.md

Lines changed: 21 additions & 2 deletions
@@ -104,7 +104,11 @@ readStreamInput(
   streamingReadOptions: StreamingReadOptions): DataFrame
 ```

-`readStreamInput`...FIXME
+`readStreamInput` resolves the given `name` (in the given [FlowAnalysisContext](FlowAnalysisContext.md)).
+
+For an `InternalDatasetIdentifier` (that is defined by the current pipeline), `readStreamInput` uses [readGraphInput](#readGraphInput).
+
+For an `ExternalDatasetIdentifier` (that is external to the current pipeline), `readStreamInput` uses [readExternalStreamInput](#readExternalStreamInput).

 ---
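
Reviewer sketch (not part of the commit): the two branches that the new `readStreamInput` text describes can be pictured as a pattern match over the resolved identifier. This is a toy model under stated assumptions. The case classes only borrow the names `InternalDatasetIdentifier` and `ExternalDatasetIdentifier`, `resolve` and `definedInPipeline` are hypothetical, and a `String` stands in for a `DataFrame`. It does not reproduce Spark's actual code.

```scala
// Toy stand-ins that only borrow the names used in the docs; not Spark's classes.
sealed trait DatasetIdentifier { def name: String }
final case class InternalDatasetIdentifier(name: String) extends DatasetIdentifier
final case class ExternalDatasetIdentifier(name: String) extends DatasetIdentifier

object StreamInputDispatch {
  // Hypothetical resolver: a name is internal when the current pipeline defines it.
  def resolve(name: String, definedInPipeline: Set[String]): DatasetIdentifier =
    if (definedInPipeline.contains(name)) InternalDatasetIdentifier(name)
    else ExternalDatasetIdentifier(name)

  // One branch per identifier kind; the returned String names the reader
  // that would be used and stands in for the resulting DataFrame.
  def readStreamInput(name: String, definedInPipeline: Set[String]): String =
    resolve(name, definedInPipeline) match {
      case InternalDatasetIdentifier(n) => s"readGraphInput($n)"
      case ExternalDatasetIdentifier(n) => s"readExternalStreamInput($n)"
    }
}
```

With `definedInPipeline = Set("bronze")`, the name `bronze` takes the `readGraphInput` branch and any other name takes the `readExternalStreamInput` branch.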

@@ -133,7 +137,22 @@ readGraphInput(
   readOptions: InputReadOptions): DataFrame
 ```

-`readGraphInput`...FIXME
+!!! note "Load DataFrame"
+    It is up to the [Input](Input.md) (for the given `InternalDatasetIdentifier`) to [load a DataFrame](Input.md#load) that may either be batch or streaming.
+
+`readGraphInput` records the input dataset identifier in the given [FlowAnalysisContext](FlowAnalysisContext.md#requestedInputs).
+
+??? note "SparkException"
+    For a dataset not defined in the dataflow graph (the given `InternalDatasetIdentifier` not being available in the [FlowAnalysisContext](FlowAnalysisContext.md#allInputs)),
+    `readGraphInput` reports a `SparkException`.
+
+`readGraphInput` finds the [Input](Input.md) for the given `InternalDatasetIdentifier` (in the [FlowAnalysisContext](FlowAnalysisContext.md#availableInput)).
+
+`readGraphInput` requests the `Input` to [load a DataFrame](Input.md#load) (with the given [InputReadOptions](InputReadOptions.md)).
+
+`readGraphInput` records a `ResolvedInput` in the [FlowAnalysisContext](FlowAnalysisContext.md) (in [streamingInputs](FlowAnalysisContext.md#streamingInputs) or [batchInputs](FlowAnalysisContext.md#batchInputs) for `StreamingReadOptions` or `BatchReadOptions`, respectively).
+
+In the end, `readGraphInput` creates a (streaming or batch) `Dataset`.

 ---
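
Reviewer sketch (not part of the commit): the `readGraphInput` steps listed in this hunk can be walked through with a small in-memory model. Everything here is an assumption made for illustration. The classes are toy stand-ins that reuse the documented names, a `String` stands in for a `DataFrame`, an `IllegalArgumentException` stands in for the `SparkException`, and the order of the steps simply follows the order of the paragraphs above rather than verified Spark source.

```scala
import scala.collection.mutable

// Toy stand-ins that reuse the documented names; not Spark's real classes.
sealed trait InputReadOptions
case object StreamingReadOptions extends InputReadOptions
case object BatchReadOptions extends InputReadOptions

// An input loads itself; the returned String stands in for a DataFrame.
final case class Input(name: String) {
  def load(readOptions: InputReadOptions): String = s"$name loaded with $readOptions"
}

// Minimal context: the inputs the graph defines plus the three registries.
final class FlowAnalysisContext(val allInputs: Map[String, Input]) {
  val requestedInputs = mutable.Set.empty[String]
  val streamingInputs = mutable.Set.empty[String]
  val batchInputs = mutable.Set.empty[String]
}

object GraphInputSketch {
  def readGraphInput(
      ctx: FlowAnalysisContext,
      id: String,
      readOptions: InputReadOptions): String = {
    // 1. Record that the flow requested this dataset.
    ctx.requestedInputs += id
    // 2. Fail for a dataset the graph does not define
    //    (the docs say a SparkException; a plain exception is used here).
    val input = ctx.allInputs.getOrElse(
      id,
      throw new IllegalArgumentException(s"Dataset $id is not defined in the graph"))
    // 3. Ask the input to load itself with the given read options.
    val loaded = input.load(readOptions)
    // 4. Record the resolved input under the registry matching the read mode.
    readOptions match {
      case StreamingReadOptions => ctx.streamingInputs += id
      case BatchReadOptions     => ctx.batchInputs += id
    }
    loaded
  }
}
```

In this model a streaming read of a defined dataset ends up in both `requestedInputs` and `streamingInputs`, while a read of an undefined dataset is still recorded as requested before the call fails.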

docs/declarative-pipelines/Input.md

Lines changed: 4 additions & 0 deletions
@@ -11,9 +11,13 @@ load(
   readOptions: InputReadOptions): DataFrame
 ```

+Loads a `DataFrame` with the given [InputReadOptions](InputReadOptions.md).
+
 See:

+* [ResolvedFlow](ResolvedFlow.md#load)
 * [Table](Table.md#load)
+* [VirtualTableInput](VirtualTableInput.md#load)

 Used when:

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# VirtualTableInput
+
+`VirtualTableInput` is...FIXME
