Format string and dictionary options are incompatible #124

@sebastic

Describe the bug
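Stetl fails when the configuration file contains an option with a dictionary value and config arguments are substituted into the file. The substitution uses str.format, which treats the braces of the dictionary literal as replacement fields.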

Example config from: https://github.com/geopython/stetl/blob/master/examples/basics/11_formatconvert/etl.cfg#L46

# The GML must be a simple features collection
[convert_to_geojson]
class = stetl.filters.formatconverter.FormatConverter
input_format = etree_doc
output_format = geojson_collection
converter_args = {
    'root_tag': 'FeatureCollection',
    'feature_tag': 'featureMember',
    'feature_id_attr': 'fid'
    }

To Reproduce

$ PYTHONPATH=. python3 bin/stetl -c examples/basics/11_formatconvert/etl.cfg -a foo=bar
2021-11-29 14:49:25,134 util INFO Found lxml.etree, native XML parsing, fabulous!
2021-11-29 14:49:25,188 util INFO Found GDAL/OGR Python bindings, super!!
2021-11-29 14:49:25,190 main INFO Stetl version = 2.1.dev0
2021-11-29 14:49:25,191 ETL INFO INIT - Stetl version is 2.1.dev0
2021-11-29 14:49:25,191 ETL INFO Config/working dir = /home/bas/git/nlextract/nlextract/externals/stetl/examples/basics/11_formatconvert
2021-11-29 14:49:25,191 ETL INFO Reading config_file = examples/basics/11_formatconvert/etl.cfg
2021-11-29 14:49:25,191 ETL INFO Substituting 0 args in config file from args_dict: []
2021-11-29 14:49:25,191 ETL ERROR Error substituting config arguments: err="\n    'root_tag'"
Traceback (most recent call last):
  File "/home/bas/git/nlextract/nlextract/externals/stetl/bin/stetl", line 43, in <module>
    main()
  File "/home/bas/git/nlextract/nlextract/externals/stetl/bin/stetl", line 35, in main
    etl = ETL(vars(args), args.config_args)
  File "/home/bas/git/nlextract/nlextract/externals/stetl/stetl/etl.py", line 97, in __init__
    raise e
  File "/home/bas/git/nlextract/nlextract/externals/stetl/stetl/etl.py", line 91, in __init__
    config_str = config_str.format(**args_dict)
KeyError: "\n    'root_tag'"
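
The KeyError in the traceback above can be reproduced outside Stetl. `str.format` treats every `{...}` in the config text as a replacement field, so the text between the opening brace and the first `:` of the dictionary literal becomes the field name that is looked up. A minimal sketch, using a shortened config string and a made-up `args_dict` (both are illustrative, not taken from Stetl):

```python
# Shortened stand-in for the config file contents (not the full etl.cfg).
config_str = (
    "[convert_to_geojson]\n"
    "converter_args = {\n"
    "    'root_tag': 'FeatureCollection'\n"
    "    }\n"
)

# Hypothetical substitution arguments.
args_dict = {"foo": "bar"}

try:
    config_str.format(**args_dict)
    failed_key = None
except KeyError as err:
    # The "field name" is everything between '{' and the first ':'.
    failed_key = err.args[0]

print(repr(failed_key))
```

The key that fails is the newline, the indentation, and the quoted `'root_tag'`, which matches the error message logged by Stetl.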

Expected Behavior
The configuration is loaded successfully, including argument substitution.
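
One way to get this behavior would be to substitute only placeholders whose name is a known argument and leave every other brace pair alone. The following is a sketch of that idea, not Stetl's implementation; `substitute_args` and the `input_dir` argument are hypothetical names chosen for illustration.

```python
import re


def substitute_args(config_str, args_dict):
    """Replace {name} placeholders whose name is a key in args_dict.

    Hypothetical helper, not part of Stetl. Any other brace pair, such as
    a dictionary literal, is left as it is.
    """
    def replace(match):
        name = match.group(1)
        if name in args_dict:
            return str(args_dict[name])
        return match.group(0)

    # Only a single identifier directly between braces counts as a placeholder.
    return re.sub(r"\{(\w+)\}", replace, config_str)


config_str = (
    "file_path = {input_dir}/data.gml\n"
    "converter_args = {\n"
    "    'root_tag': 'FeatureCollection'\n"
    "    }\n"
)
result = substitute_args(config_str, {"input_dir": "/tmp/in"})
print(result)
```

Unlike `str.format`, this sketch silently keeps placeholders it does not know. A real fix might prefer to report them, since a missing argument is currently an error.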

Context (please complete one or more from the following information):

  • OS: Debian unstable
  • Python Version: 3.9.9
  • Stetl Version: 2.1.dev0
  • Stetl Input/Output/Filter Component: stetl/etl.py
  • Stetl Config file: examples/basics/11_formatconvert/etl.cfg

Additional context
A string2record converter was implemented:

--- a/stetl/filters/formatconverter.py
+++ b/stetl/filters/formatconverter.py
@@ -338,6 +338,29 @@ class FormatConverter(Filter):
         packet.data = etree.fromstring(packet.data)
         return packet
 
+    @staticmethod
+    def string2record(packet, converter_args=None):
+        if(
+            converter_args is not None and
+            'value_column' in converter_args
+        ):
+            key = converter_args['value_column']
+        else:
+            key = 'value'
+
+        record = dict({key: packet.data})
+
+        if(
+            converter_args is not None and
+            'column_data' in converter_args
+        ):
+            for key in converter_args['column_data']:
+                record[key] = converter_args['column_data'][key]
+
+        packet.data = record
+
+        return packet
+
     @staticmethod
     def struct2string(packet):
         packet.data = packet.to_string()
@@ -406,6 +429,7 @@ FORMAT_CONVERTERS = {
     },
     FORMAT.string: {
         FORMAT.etree_doc: FormatConverter.string2etree_doc,
+        FORMAT.record: FormatConverter.string2record,
         FORMAT.xml_doc_as_string: FormatConverter.no_op
     },
     FORMAT.struct: {

This converter requires configuration like this:

# convert string to record
[convert_string_to_record]
class = stetl.filters.formatconverter.FormatConverter
input_format = string
output_format = record
converter_args = {
        'value_column': 'waarde',
        'column_data': {
            'sleutel': 'levering_xml',
        },
    }
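
To show what the converter from the patch would produce for this configuration, here is a standalone sketch. `Packet` is a hypothetical stand-in that only models the `data` attribute, and `string2record` is a condensed plain-function version of the patched method, so it runs without Stetl installed.

```python
class Packet:
    """Hypothetical stand-in for Stetl's packet; only `data` is modelled."""

    def __init__(self, data):
        self.data = data


def string2record(packet, converter_args=None):
    # Condensed version of the logic in the patch above.
    args = converter_args or {}
    # The string payload goes under 'value_column' (default: 'value').
    record = {args.get('value_column', 'value'): packet.data}
    # Fixed extra columns are copied in from 'column_data'.
    record.update(args.get('column_data', {}))
    packet.data = record
    return packet


converter_args = {
    'value_column': 'waarde',
    'column_data': {
        'sleutel': 'levering_xml',
    },
}
packet = string2record(Packet('<levering/>'), converter_args)
print(packet.data)
```

With these arguments the record has the input string under `waarde` and the constant `levering_xml` under `sleutel`. Without `converter_args` the string ends up under `value`.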

Due to this issue, converters that require converter_args cannot be used in the NLExtract BAGv2 configuration, because it sets arguments via options/<hostname>.args.
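
A possible workaround until this is fixed is to double the braces of the dictionary value, because `str.format` turns `{{` and `}}` into literal braces. The sketch below assumes the config text always passes through `str.format`. If Stetl skips the substitution when no arguments are given, the doubled braces would stay in the file, which this sketch does not cover. The `schema` argument is a made-up example, and `ast.literal_eval` is used here only to show that the formatted text is a valid dictionary literal again.

```python
import ast

# Dictionary value with every brace doubled to escape it for str.format.
escaped_value = (
    "{{\n"
    "    'value_column': 'waarde',\n"
    "    'column_data': {{\n"
    "        'sleutel': 'levering_xml',\n"
    "    }},\n"
    "}}"
)
config_str = "schema = {schema}\nconverter_args = " + escaped_value

# Hypothetical argument; only {schema} is a real placeholder here.
formatted = config_str.format(schema="bagv2")

# After formatting, the value is a plain dictionary literal again.
value_text = formatted.split("converter_args = ", 1)[1]
converter_args = ast.literal_eval(value_text)
print(converter_args)
```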
