|
1 | | -# Hdfs output plugin for Embulk |
| 1 | +# Hdfs file output plugin for Embulk |
2 | 2 |
|
3 | 3 | A File Output Plugin for Embulk to write to HDFS. |
4 | 4 |
|
5 | 5 | ## Overview |
6 | 6 |
|
7 | 7 | * **Plugin type**: file output |
8 | | -* **Load all or nothing**: no |
| 8 | +* **Load all or nothing**: yes |
9 | 9 | * **Resume supported**: no |
10 | 10 | * **Cleanup supported**: no |
11 | 11 |
|
12 | 12 | ## Configuration |
13 | 13 |
|
14 | 14 | - **config_files** list of paths to Hadoop's configuration files (array of strings, default: `[]`) |
15 | 15 | - **config** overwrites configuration parameters (hash, default: `{}`) |
16 | | -- **output_path** the path finally stored files. (string, default: `"/tmp/embulk.output.hdfs_output.%Y%m%d_%s"`) |
17 | | -- **working_path** the path temporary stored files. (string, default: `"/tmp/embulk.working.hdfs_output.%Y%m%d_%s"`) |
| 16 | +- **path_prefix** prefix of target files (string, required) |
| 17 | +- **file_ext** suffix of target files (string, required) |
| 18 | +- **sequence_format** format for sequence part of target files (string, default: `'.%03d.%02d'`) |
| 19 | +- **rewind_seconds** when `path_prefix` contains a date format (such as `/tmp/embulk/%Y-%m-%d/out`), the format is evaluated against the current time minus this many seconds (int, default: `0`) |
| 20 | +- **overwrite** overwrite files when files with the same names already exist (boolean, default: `false`) |
| 21 | + - *caution*: setting this property to `true` does not by itself guarantee idempotence. To guarantee idempotence, remove the output files before or after each run. |
18 | 22 |
|
19 | 23 | ## Example |
20 | 24 |
|
|
24 | 28 | config_files: |
25 | 29 | - /etc/hadoop/conf/core-site.xml |
26 | 30 | - /etc/hadoop/conf/hdfs-site.xml |
27 | | - - /etc/hadoop/conf/mapred-site.xml |
28 | | - - /etc/hadoop/conf/yarn-site.xml |
29 | 31 | config: |
30 | 32 | fs.defaultFS: 'hdfs://hdp-nn1:8020' |
31 | | - dfs.replication: 1 |
32 | | - mapreduce.client.submit.file.replication: 1 |
33 | 33 | fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem' |
34 | 34 | fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem' |
| 35 | + path_prefix: '/tmp/embulk/hdfs_output/%Y-%m-%d/out' |
| 36 | + file_ext: 'txt' |
| 37 | + overwrite: true |
35 | 38 | formatter: |
36 | 39 | type: csv |
37 | 40 | encoding: UTF-8 |
|
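Note on the new options: the README describes `path_prefix`, `sequence_format`, `file_ext`, and `rewind_seconds` separately, so the following is a minimal sketch of how they might combine into one output file name. It is not the plugin's code. The function name `resolve_output_path`, the `task_index` and `file_index` arguments, the concatenation order, and the automatic leading dot on `file_ext` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def resolve_output_path(path_prefix, file_ext, sequence_format=".%03d.%02d",
                        rewind_seconds=0, task_index=0, file_index=0, now=None):
    """Hypothetical helper: build one output path from the README's options."""
    now = now or datetime.now()
    # rewind_seconds shifts the time used to expand the date format in path_prefix.
    base_time = now - timedelta(seconds=rewind_seconds)
    prefix = base_time.strftime(path_prefix)
    # Assumption: the two placeholders are filled with a task index and a file index.
    sequence = sequence_format % (task_index, file_index)
    # Assumption: a leading dot is added to file_ext when it is missing.
    suffix = file_ext if file_ext.startswith(".") else "." + file_ext
    return prefix + sequence + suffix


if __name__ == "__main__":
    print(resolve_output_path("/tmp/embulk/hdfs_output/%Y-%m-%d/out", "txt",
                              rewind_seconds=60,
                              now=datetime(2015, 6, 1, 0, 0, 30)))
```

Under these assumptions, a run that starts 30 seconds after midnight with `rewind_seconds: 60` resolves the date portion to the previous day, which is the case the option appears intended for.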