@wenzeslaus
Member

@wenzeslaus wenzeslaus commented Apr 18, 2023

This adds a Tools class which allows GRASS tools (modules) to be accessed using methods. Once an instance is created, calling a tool is calling a function (method), similarly to grass.jupyter.Map. Unlike grass.script, this does not require a general function to be called with a tool name as a parameter, and unlike grass.pygrass module shortcuts, this does not require special objects to mimic the module families just to get tool names into Python syntax.

Outputs are handled through a returned object which is the result of automatic capture of outputs and can do conversions from known formats using properties.

The code is included under a new grass.tools package and is marked as experimental which allows merging the code while doing breaking changes after a release.

Features

Function-like calling of tools:

  • The return code, standard output, and error output are all part of one result object similar to what subprocess.run returns.
  • Access and post-processing of the text result (standard output) is done by the result object (its properties or methods).
  • Additionally, the result object can be directly indexed to access a JSON parsed as a Python list or dictionary.
  • Standard input is passed as parameter_name=io.StringIO which takes care of input="-" and piping the text into the subprocess.
  • Tool failure causes an exception with an error message being part of the exception (traceback).
  • A session or env is accepted when creating an object (Tools(session=session)).
  • Code completion for function names (through __dir__) and help() calls work, even outside of a session.
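
The result object described in the bullets above can be pictured with a small stand-in. This is a minimal sketch, not the code from this PR: the class name ToolResultSketch and its attribute names are assumptions chosen for illustration, and only the general shape (return code, captured text, JSON access by indexing) follows the description.

```python
import json
from dataclasses import dataclass


@dataclass
class ToolResultSketch:
    """Hypothetical stand-in for the result object (names are assumptions)."""

    returncode: int
    stdout: str
    stderr: str

    @property
    def json(self):
        # Parse the captured standard output as JSON on demand.
        return json.loads(self.stdout)

    def __getitem__(self, key):
        # Indexing the result indexes the parsed JSON (dictionary or list).
        return self.json[key]


result = ToolResultSketch(0, '{"rows": 100, "cols": 100}', "")
assert result["rows"] == 100
assert result.json["cols"] == 100
```

Keeping the text and the parsed form on one object is what allows both the one-liner style (indexing the call directly) and the store-then-inspect style shown in the examples.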

Other functionality:

  • High-level and low-level usage is possible with four different functions: taking Python function parameters as tool parameters or taking a list of strings forming a command, and processing the parameters before the subprocess call or leaving the parameters as is (2x2 matrix).
  • Handling of inputs and outputs can be customized (e.g., not capturing stdout and stderr).
  • The env parameter is also accepted by individual functions (as with run_command).
  • The Tools object is a context manager, so it can be created with other context managers within one with statement and can include cleanup code in the future.
  • Tools-object-level overwrite and verbosity setting including limited support for the reversed (False) setting which is not possible with the standard flags at the tool level (there is no --overwrite=False in CLI).

Examples

Run a tool:

from grass.tools import Tools

tools = Tools()
tools.r_random_surface(output="surface", seed=42)

Create a project, start an isolated session, and run tools (XY project for the example):

project = tmpdir / "project"
gs.create_project(project)
with (
    gs.setup.init(project, env=os.environ.copy()) as session,
    Tools(session=session) as tools,
):
    tools.g_region(rows=100, cols=100)
    tools.r_random_surface(output="surface", seed=42)

Work with return values (tool with JSON output):

# accessing a single value from the result:
assert tools.r_info(map="surface", format="json")["rows"] == 100
data = tools.r_info(map="surface", format="json")
# accessing more than one value from the result:
assert data["rows"] == 100
assert data["cols"] == 100

Text input as standard input:

tools.v_in_ascii(
    input=io.StringIO("13.45,29.96,200\n"),
    output="point",
    separator=",",
)

Work with RegionManager and MaskManager (test code):

project = tmpdir / "project"
gs.create_project(project)
with (
    gs.setup.init(project, env=os.environ.copy()) as session,
    gs.RegionManager(rows=100, cols=100, env=session.env),
    gs.MaskManager(env=session.env),
    Tools(session=session) as tools,
):
    # The tools here respect the local environment,
    # but it does not need to be passed explicitly.
    tools.r_random_surface(output="surface", seed=42)
    tools.r_mask(raster="surface")
    assert tools.r_mask_status(format="json")["present"]
    assert tools.g_region(flags="p", format="json")["rows"] == 100

Commit message

This adds a Tools class which allows GRASS tools (modules) to be accessed using methods. Once an instance is created, calling a tool is calling a function (method), similarly to grass.jupyter.Map. Unlike grass.script, this does not require a general function to be called with a tool name as a parameter, and unlike grass.pygrass module shortcuts, this does not require special objects to mimic the module families just to get tool names into Python syntax.

Outputs are handled through a returned object which is the result of automatic capture of outputs and can do conversions from known formats using properties. The capture is opt-out rather than opt-in as with the other interfaces. The result object is, in a way, similar to what the subprocess.run function returns, but focused on GRASS text formats.

The code is included under a new grass.tools package and is marked as experimental which allows merging the code while doing breaking changes after a release.

The implementation maximizes sharing of the low-level code with grass.script, so handle_errors, Popen, and parameter processing are reused, but with different defaults (exceptions and output capturing). It relies on grass.script.Popen which newly defaults to text=True. Both stdout and stderr are passed as is to the result (so as strings by default and bytes when text=False).

Also adds a run subcommand to have a CLI use case for the tools. It runs one tool in an XY project, so it is useful only for things like g.extension or m.proj, but, with a workaround for argparse --help, it can do --help for a tool. Processing of special flags is more robust in grass.tools to accommodate the CLI usage.

Uses the no-copy approach for env, also taken elsewhere (e.g., in init), which allows changes in the session to be reflected in the tools. This requires the overwrite and verbosity settings to be handled in an internal, ad-hoc copy, which is avoided if not needed. There is no special treatment of the computational region.
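
The copy-only-when-needed idea from the paragraph above can be sketched as follows. This is an illustration under assumptions, not the PR code: the function name effective_env is hypothetical, and the sketch assumes the settings travel through the documented GRASS_OVERWRITE and GRASS_VERBOSE environment variables. The GISRC path is a made-up placeholder.

```python
def effective_env(env, overwrite=None, verbose=None):
    """Return env itself unless a setting requires a modified copy."""
    if overwrite is None and verbose is None:
        # No-copy path: later changes to the session env stay visible.
        return env
    env = dict(env)  # ad-hoc copy; the caller's mapping is not modified
    if overwrite is not None:
        # Assumption: overwrite is expressed through GRASS_OVERWRITE.
        env["GRASS_OVERWRITE"] = "1" if overwrite else "0"
    if verbose is not None:
        # Assumption: verbosity is expressed through GRASS_VERBOSE.
        env["GRASS_VERBOSE"] = str(verbose)
    return env


session_env = {"GISRC": "/tmp/example_rc"}  # placeholder path
assert effective_env(session_env) is session_env
modified = effective_env(session_env, overwrite=False)
assert modified["GRASS_OVERWRITE"] == "0"
assert "GRASS_OVERWRITE" not in session_env
```

An environment variable also makes the reversed (False) setting expressible, which a command line flag alone cannot do.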

Dynamic module attributes are used instead of imports, anticipating standalone tools to live in the same package and users picking one or the other but not both. (This requires silencing a Pylint warning, but we also could not use the __all__ attribute at all.) The use of lazy imports enables code like from grass.tools import Tools without forcing that import on everyone importing anything from the other modules (like standalone tools). The file layout itself should accommodate future additions such as standalone tools.
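
For readers unfamiliar with dynamic module attributes, here is a minimal sketch of the general Python mechanism (a module-level __getattr__ function), not the actual grass.tools code. To keep it runnable as a single snippet, a module object is created in memory and a standard library class stands in for the lazily imported one; all names here are illustrative.

```python
import collections
import importlib
import types

# Stand-in for a package __init__ module; in a real package, the function
# below would be defined at module level under the name __getattr__.
package = types.ModuleType("package_sketch")

# Maps a public name to the module which actually defines it (illustrative).
_LAZY_NAMES = {"OrderedDict": "collections"}


def _lazy_getattr(name):
    if name in _LAZY_NAMES:
        # The import happens only when the attribute is first requested.
        module = importlib.import_module(_LAZY_NAMES[name])
        return getattr(module, name)
    raise AttributeError(f"module 'package_sketch' has no attribute {name!r}")


package.__getattr__ = _lazy_getattr

assert package.OrderedDict is collections.OrderedDict
assert not hasattr(package, "missing_name")
```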

Using io.StringIO for stdin assumes that any tool which will have stdin will have something like input=- or file=- because there is no other way of getting stdin data to the tool at this point. The cmd functions taking a command as a list take an input parameter for stdin aligning with subprocess.run and Popen.communicate (the name stdin is already used for a pipe flag parameter). Allows for stderr to be captured even without capturing stdout to allow for exceptions with error message while simply printing stdout.
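
A minimal sketch of the io.StringIO handling described above, assuming a hypothetical helper named split_stdin (the real implementation may be structured differently): the parameter value is replaced by "-" and the text is set aside so that it can be piped to the subprocess.

```python
import io


def split_stdin(parameters):
    """Replace an io.StringIO value with "-" and return its text separately."""
    processed = {}
    stdin_text = None
    for name, value in parameters.items():
        if isinstance(value, io.StringIO):
            processed[name] = "-"
            stdin_text = value.getvalue()
        else:
            processed[name] = value
    return processed, stdin_text


params, text = split_stdin(
    {"input": io.StringIO("13.45,29.96,200\n"), "output": "point", "separator": ","}
)
assert params == {"input": "-", "output": "point", "separator": ","}
assert text == "13.45,29.96,200\n"
```

The separated text is what would then be handed to the subprocess, in the same way as the input parameter of subprocess.run or Popen.communicate.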

The Tools class can also behave like a context manager which is useful when used with other context managers and also will be useful when more functionality is added (like pack file IO) or for aligning (feature parity) with other Tools-like implementations which will require resource handling (like the standalone tools).

The four different tool calling functions (when tool names are not used as attributes) use short names to always keep the focus on the tool, distinguish smart and basic behavior by run and call, and parameter-based (kwargs) and list (command) by no suffix and _cmd suffix.
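
The two axes of the 2x2 matrix (keyword parameters versus a list of strings) are related by a simple conversion. The sketch below uses hypothetical helper names and a simplified conversion; it assumes that underscores in attribute names map to dots in tool names and that flags become a single dash-prefixed item, as in grass.script.

```python
def tool_name_from_attribute(attribute_name):
    """Turn a Python-friendly name like r_random_surface into r.random.surface."""
    return attribute_name.replace("_", ".")


def command_from_parameters(tool_name, **kwargs):
    """Build a list-of-strings command from keyword parameters (simplified)."""
    command = [tool_name]
    for key, value in kwargs.items():
        if key == "flags":
            command.append(f"-{value}")
        else:
            command.append(f"{key}={value}")
    return command


name = tool_name_from_attribute("r_random_surface")
assert name == "r.random.surface"
assert command_from_parameters(name, output="surface", seed=42) == [
    "r.random.surface",
    "output=surface",
    "seed=42",
]
```

With the naming described above, the keyword form would presumably go to run or call and the list form to run_cmd or call_cmd, with run and run_cmd adding the extra processing.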

Function-like calling of tools:

  • Tool failure causes an exception with an error message being part of the exception (traceback).
  • The return code, standard output, and error output are all part of one result object similar to what subprocess.run returns.
  • Access and post-processing of the text result (standard output) is done by the result object (its properties or methods).
  • Additionally, the result object can be directly indexed to access a JSON parsed as a Python list or dictionary.
  • Standard input is passed as parameter_name=io.StringIO which takes care of input="-" and piping the text into the subprocess.
  • A session or env is accepted when creating an object (Tools(session=session)). No need to pass env later.
  • Code completion for function names (through __dir__) and help() calls work, even outside of a session.
  • The keyval format processing converts strings to ints and floats.
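
The keyval conversion mentioned in the last bullet can be sketched as below. The function names are hypothetical and the parsing is simplified (one key=value pair per line); the point is only the order of attempts: int first, then float, then the original string.

```python
def convert_value(text):
    """Convert a string to int or float when possible, else keep the string."""
    for converter in (int, float):
        try:
            return converter(text)
        except ValueError:
            continue
    return text


def parse_keyval(text, separator="="):
    """Parse lines of key=value pairs into a dictionary with converted values."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(separator)
        result[key.strip()] = convert_value(value.strip())
    return result


data = parse_keyval("rows=100\nnsres=0.5\ndatatype=CELL\n")
assert data == {"rows": 100, "nsres": 0.5, "datatype": "CELL"}
```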

Other functionality:

  • High-level and low-level usage is possible with four different functions: taking Python function parameters as tool parameters or taking a list of strings forming a command, and processing the parameters before the subprocess call or leaving the parameters as is (2x2 matrix).
  • Handling of inputs and outputs can be customized (e.g., not capturing stdout and stderr).
  • The env parameter is also accepted by individual functions (as with run_command).
  • The Tools object is a context manager, so it can be created with other context managers within one with statement and can include cleanup code in the future.
  • Tools-object-level overwrite and verbosity setting including limited support for the reversed (False) setting which is not possible with the standard flags at the tool level (there is no --overwrite=False in CLI).

Features originally implemented, but removed in the end:

  • Region freezing when Tools object is created.
  • Specifying stdin and using a new instance of Tools itself to execute with that stdin through a new function which returns that instance (feed_input_to method and stdin in the constructor).
  • A special function to ignore errors returning a new instance.
  • Overwriting by default like in grass.jupyter.
  • Wrappers with names from the run_command family.
  • Processing of command line call into JSON which is useful only for the pack file IO.
  • Tools in grass.experimental as opposed to grass.tools only marked as experimental.

@landam
Member

landam commented Apr 20, 2023

It seems to be a useful addition. On the other hand, we already have two APIs to run GRASS modules: grass.script.*_command() and grass.pygrass.modules, which is already confusing for the user. What is the benefit of a third one? It would be useful to merge the existing APIs into a single one instead of introducing another one.

@wenzeslaus
Member Author

wenzeslaus commented Apr 21, 2023

It seems to be a useful addition.

I still need to provide more context for this, but do you see some benefits already?

On the other hand we have already two APIs to run GRASS modules: grass.script.*_command() and grass.pygrass.modules which is already confusing for the user.

The intro to this is obviously xkcd Standards.

I'm not happy with the two competing interfaces. It's almost three, because we have Module and then also shortcuts.

As far as I understand, grass.script.*_command() was written to closely mimic the Bash experience with minimal involvement of Python. The Python layer mostly just avoids the need to pass all parameters as strings.

grass.pygrass.modules was written to mimic the grass.script.*_command() API and to manipulate the module calls themselves.

What is a benefit of the third one?

The design idea is 1) to make the module (tool) calls as close to Python function calls as possible and 2) to access the results conveniently. To access the (text) results, it tries to mimic subprocess.run.

Additionally, it tries to 1) provide consistent access to all modules and 2) allow for extensibility, e.g., associating session parameters or computational region with a Tools object rather than passing it to every method.

The existing APIs are more general in some ways, especially because they make no assumptions about the output or its size. This API makes the assumption that you want the text output in Python or that it is something small and you can just ignore it. If not, you need to use a more general API. After all, Tools itself is using pipe_command to do the job.

It would be useful to merge the existing APIs into a single one instead of introducing another one.

Given the different goals of the two APIs, I was not able to figure out how these can be merged. For example, the Module class from grass.pygrass was supposed to be a drop-in replacement for run_command, but it was not used that way much (maybe because it forces you to use a class as a function). Any suggestions? What would be the features and aspects of each API worth keeping? For example, the Tools object might be able to create instances of the Module class.

I can also see that some parts of the new API could be part of the old ones, like output-parsing related properties for the Module class, but there are some existing issues which the new API is trying to fix, such as the r.slope_aspect spelling in PyGRASS shortcuts and the Python function name plus tool name as a string in grass.script.

Finally, the subprocess changed too over the years, introducing new functions with run being the latest addition, so reevaluation of our APIs seems prudent even if it involves adding functions as subprocess did.

Anyway, I think some unification would be an ideal scenario.

@wenzeslaus wenzeslaus force-pushed the add-session-tools-object branch from 96b1d0c to 0c21f1a Compare April 22, 2023 18:34
@wenzeslaus
Member Author

wenzeslaus commented Apr 22, 2023

This is what exceptions currently look like in this PR: The error (whole stderr) is part of the exception, i.e., always printed with the traceback, not elsewhere, and it is under the traceback, not above like now (or even somewhere else in case of notebooks and GUI).

Traceback (most recent call last):
  File "experimental/tools.py", line 252, in <module>
    _test()
  File "experimental/tools.py", line 241, in _test
    tools_pro.feed_input_to("13.45,29.96,200").v_in_ascii(
  File "experimental/tools.py", line 185, in wrapper
    return self.run(grass_module, **kwargs)
  File "experimental/tools.py", line 148, in run
    raise gs.CalledModuleError(
grass.exceptions.CalledModuleError: Module run `v.in.ascii input=- output=point format=xstandard` ended with an error.
The subprocess ended with a non-zero return code: 1. See the following errors:

ERROR: Value <xstandard> out of range for parameter <format>
	Legal range: point,standard

Traceback (most recent call last):
  File "experimental/tools.py", line 252, in <module>
    _test()
  File "experimental/tools.py", line 241, in _test
    tools_pro.feed_input_to("13.45,29.96,200").v_in_ascii(
  File "experimental/tools.py", line 185, in wrapper
    return self.run(grass_module, **kwargs)
  File "experimental/tools.py", line 148, in run
    raise gs.CalledModuleError(
grass.exceptions.CalledModuleError: Module run `v.in.ascii input=- output=point format=standard` ended with an error.
The subprocess ended with a non-zero return code: 1. See the following errors:
WARNING: Vector map <point> already exists and will be overwritten
WARNING: Unexpected data in vector header:
         [13.45,29.96,200]
ERROR: Import failed
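
How the captured stderr ends up in the exception message can be sketched with a stand-in exception class. The names below are hypothetical; the real code raises grass.exceptions.CalledModuleError, and only the message layout mirrors the tracebacks above.

```python
class ToolErrorSketch(Exception):
    """Hypothetical stand-in for grass.exceptions.CalledModuleError."""


def raise_on_failure(command, returncode, stderr):
    """Raise an exception which carries the captured stderr in its message."""
    if returncode != 0:
        raise ToolErrorSketch(
            f"Module run `{' '.join(command)}` ended with an error.\n"
            f"The subprocess ended with a non-zero return code: {returncode}. "
            f"See the following errors:\n{stderr}"
        )


try:
    raise_on_failure(
        ["v.in.ascii", "input=-", "output=point", "format=standard"],
        returncode=1,
        stderr="ERROR: Import failed",
    )
except ToolErrorSketch as error:
    message = str(error)

assert "ERROR: Import failed" in message
assert "non-zero return code: 1" in message
```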

wenzeslaus added 10 commits June 3, 2023 23:57
This adds a Tools class which allows GRASS tools (modules) to be accessed using methods. Once an instance is created, calling a tool is calling a function (method), similarly to grass.jupyter.Map. Unlike grass.script, this does not require a generic function name, and unlike grass.pygrass module shortcuts, this does not require special objects to mimic the module families.

Outputs are handled through a returned object which is the result of automatic capture of outputs and can do conversions from known formats using properties.

Usage example is in the _test() function in the file.

The code is included under new grass.experimental package which allows merging the code even when further breaking changes are anticipated.
@wenzeslaus wenzeslaus force-pushed the add-session-tools-object branch from 7996926 to 24c27e6 Compare June 3, 2023 22:56
@neteler neteler added this to the 8.4.0 milestone Aug 16, 2023
@landam landam added enhancement New feature or request Python Related code is in Python labels Nov 20, 2023
@wenzeslaus wenzeslaus modified the milestones: 8.4.0, Future Apr 26, 2024
@echoix echoix added the conflicts/needs rebase Rebase to or merge with the latest base branch is needed label Nov 7, 2024
@echoix echoix removed the conflicts/needs rebase Rebase to or merge with the latest base branch is needed label Nov 11, 2024
@echoix
Member

echoix commented Nov 11, 2024

Solved conflicts

wenzeslaus and others added 2 commits July 21, 2025 09:50
Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
Member

@echoix echoix left a comment

Since this is code that uses pytest tests, code coverage is representative. The tests cover most of the changes, except 4 places.

@wenzeslaus
Member Author

Thanks for looking at the coverage @echoix. I still struggle with evaluating it. This should be pretty well covered and my goal indeed is to be at 100% coverage or very close to it.

@echoix
Member

echoix commented Jul 25, 2025

Thanks for looking at the coverage @echoix. I still struggle with evaluating it. This should be pretty well covered and my goal indeed is to be at 100% coverage or very close to it.

Indeed, 100% is not always desirable. I just took a quick look, and indeed, almost 100% patch coverage, except two cases, one I think is more easily testable. I’ll write a little comment in these places

echoix
echoix previously approved these changes Jul 25, 2025
Member

@echoix echoix left a comment

I'll still approve if you think you don't need to add anything, as it's the cherry on top

@wenzeslaus
Member Author

Not only is this ready, but #6111 (replacing grass.script by grass.tools in written documentation) and #6015 (adding grass.tools to generated tools documentation) are close to going in after this.

@echoix
Member

echoix commented Jul 25, 2025

Not only is this ready, but #6111 (replacing grass.script by grass.tools in written documentation) and #6015 (adding grass.tools to generated tools documentation) are close to going in after this.

I'll make sure this gets merged for the weekend

Member

@echoix echoix left a comment

I'll approve in the meantime, but coverage shouldn't be worse than now. So it would be good to go!

@echoix
Member

echoix commented Jul 25, 2025

You made it! 100% patch coverage:
[Image: IMG_4345]

@echoix
Member

echoix commented Jul 25, 2025

@wenzeslaus Do you want to prepare the commit message?

Contributor

@petrasovaa petrasovaa left a comment

I look forward to using the new API.

@wenzeslaus wenzeslaus merged commit bca6f43 into OSGeo:main Jul 27, 2025
28 checks passed
@wenzeslaus wenzeslaus deleted the add-session-tools-object branch July 27, 2025 08:03
@wenzeslaus wenzeslaus mentioned this pull request Jul 28, 2025
wenzeslaus added a commit that referenced this pull request Jul 28, 2025
This adds grass.tools.Tools from #2923 to the generated documentation of individual tools as a new tab similar to the grass.script tab. The documentation and treatment of parameters is the same at this point in grass.tools as in grass.script except for _io.StringIO_.

The _io.StringIO_ type is added based on the gisprompt on a best-guess basis. The current parser metadata don't contain explicit machine-readable information about stdin being allowed. Parser then does not know whether or not a specific tool supports stdin for one of its parameters. The stdin is resolved while opening the file by a library function call at the level of individual tools, but during command line parsing, the parser merely allows for dash being provided instead of a filename. The resulting documentation now contains `"'-' for standard input"` or the like based on what is in each tool. This is slightly misleading for the Tools API because `"-"` won't do anything useful because the only way to supply stdin at this point is _io.StringIO_. However, without extending the parser metadata, I don't see any way how to make it better.

The spacing and comma for the first parameter in the short doc now need special handling because, before, we always had a first parameter, namely the name of the tool.

Care was given to the wording of the experimental part to make clear which part of our API is the experimental one.

This also fixes and clarifies the comment about parse_command versus read_command and run_command.
wenzeslaus added a commit that referenced this pull request Sep 5, 2025
This is adding NumPy array as input and output to tools when called through the Tools object.

My focus with this PR was to create a good API which can be used in various contexts and is useful as is. However, the specifics of the implementation, especially low performance compared to native data, are secondary issues for me in this addition as long as there is no performance hit for the cases when NumPy arrays are not used, which is the case. Even with the performance hits, it works great as a replacement of explicit grass.script.array conversions (same code, just in the background) and in tests (replacing custom test asserts and data conversions).

While the interface for inputs is clear (the array with data), the interface for outputs was a pick among many choices (type used as a flag over strings, booleans, empty objects, flags). Strict adherence to NumPy universal function was left out as well as control over the actual output array type (a generic array is documented; grass.script.array.array is used now).

The NumPy import dependency is optional so that the imports and Tools objects work without NumPy installed. While the tests would fail, GRASS build should work without NumPy as of now.

This combines well with the dynamic return value with control over consistency implemented in #6278 as the arrays are one of the possible return types, but can be also made as part of a consistent return type. This lends itself to single array, tuple of arrays, or object with named arrays as possible return types.

Overall, this is building on top of Tools class addition in #2923. The big picture is also discussed in #5830.
wenzeslaus added a commit that referenced this pull request Oct 7, 2025
This is adding r.pack files (aka native GRASS raster files) as input and output to tools when called through the Tools object. Tool calls such as r_grow can take r.pack files as input or output. The format is distinguished by the file extension.

Notably, tool calls such as r_mapcalc don't pass input or output data as separate parameters (expressions or base names), so they can be used like that only when a wrapper exists (r_mapcalc_simple) or, in the future, when more information is included in the interface or passed between the tool and the Tools class Python code. Similarly, tools with multiple inputs or outputs in a single parameter are currently not supported.  

The code is using --json with the tool to get the information on what is input and what is output, because all are files which may or may not exist (this is different from NumPy arrays where the user-provided parameters clearly say what is input (object) and what is output (class)). Consequently, the whole import-export machinery is only started when there are files in the parameters as identified by the parameter converter class.

Currently, the in-project raster names are driven by the file names. This will break for parallel usage and will not work for vector as is. While it is good for guessing the right (and nice) name, e.g., for r.mapcalc expression, ultimately, unique names retrieved with an API function are likely the way to go.

When caching is enabled (either through use of a context manager or explicitly), import of inputs is skipped when they were already imported or when they are known outputs. Without the cache, data is deleted after every tool (function) call. Caching keeps the in-project data in the project (as opposed to a hidden cache or deleting them). The parameter to explicitly drive this is called use_cache (originally keep_data).

The objects track what is imported and also track import and cleaning tasks at function call versus object level. The data is cleaned even in case of exceptions. The interface was clarified by creating a private/protected version of run_cmd which has the internal-only parameters. This function uses a single try-finally block to trigger the cleaning in case of exceptions. 
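
The interplay of caching and cleanup described in the two paragraphs above can be sketched without GRASS. Everything here is hypothetical (class name, method name, and the placeholder file name); the sketch only shows skipping repeated imports when the cache is on and cleaning in a single try-finally block when it is off.

```python
class ImportTrackerSketch:
    """Hypothetical tracker of rasters imported from pack files."""

    def __init__(self, use_cache=False):
        self.use_cache = use_cache
        self.imported = set()
        self.log = []  # records import and cleanup actions for this demo

    def run(self, inputs, action):
        try:
            for path in inputs:
                if path in self.imported:
                    continue  # already in the project, skip the re-import
                self.log.append(f"import {path}")
                self.imported.add(path)
            return action()
        finally:
            # Runs even when action() raises an exception.
            if not self.use_cache:
                for path in sorted(self.imported):
                    self.log.append(f"remove {path}")
                self.imported.clear()


def failing_tool():
    raise RuntimeError("tool failed")


cached = ImportTrackerSketch(use_cache=True)
cached.run(["elevation_file"], lambda: None)
cached.run(["elevation_file"], lambda: None)
assert cached.log == ["import elevation_file"]

uncached = ImportTrackerSketch()
try:
    uncached.run(["elevation_file"], failing_tool)
except RuntimeError:
    pass
assert uncached.log == ["import elevation_file", "remove elevation_file"]
```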

While generally the code supports paths as both strings and Path objects, the actual decisions about import are made from the list of strings form of the command.

From caller perspective, overwrite is supported in the same way as for in-project GRASS rasters.

The tests use module scope to reduce fixture setup by a couple of seconds. Changes include a minor cleanup of comments in tests related to testing the result without format=json and with, e.g., the --json option.

The class documentation discusses overhead and parallelization because the calls are more costly and there is a significant state of the object now with the cache and the rasters created in the background. This includes discussion of the NumPy arrays, too, and slightly improves the wording in part discussing arrays.

This is building on top of #2923 (Tools API), and it is parallel with #5878 (NumPy array IO), although it runs at a different stage than NumPy array conversions and uses a cache for the imported data (this may be connected more with the arrays in the future). This can be used efficiently in Python with Tools (caching, assuming project) and in a limited way also with the experimental run subcommand in CLI (no caching, still needs an explicit project). There is more potential use of this with the standalone tools concept (#5843). The big picture is also discussed in #5830.

Labels

CMake, enhancement (New feature or request), libraries, Python (Related code is in Python), tests (Related to Test Suite)

5 participants