From e5bbf8eb57d8664499631a067f705346ddcafdc4 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 27 Dec 2024 19:28:54 -0800 Subject: [PATCH 01/13] Add generic guidance on recording errors on spans and metrics --- docs/cli/cli-spans.md | 3 +- docs/database/cassandra.md | 3 +- docs/database/cosmosdb.md | 3 +- docs/database/couchdb.md | 5 +- docs/database/database-spans.md | 56 ++-------------------- docs/database/elasticsearch.md | 3 +- docs/database/mariadb.md | 36 +------------- docs/database/mongodb.md | 3 +- docs/database/mssql.md | 1 + docs/database/mysql.md | 36 +------------- docs/database/postgresql.md | 36 +------------- docs/database/redis.md | 3 +- docs/exceptions/README.md | 17 +++++-- docs/faas/faas-spans.md | 2 + docs/gen-ai/gen-ai-spans.md | 5 ++ docs/general/recording-errors.md | 80 +++++++++++++++++++++++++++++++ docs/http/http-spans.md | 5 +- docs/messaging/messaging-spans.md | 5 ++ docs/rpc/rpc-spans.md | 5 ++ model/database/spans.yaml | 26 +++++++++- 20 files changed, 157 insertions(+), 176 deletions(-) create mode 100644 docs/general/recording-errors.md diff --git a/docs/cli/cli-spans.md b/docs/cli/cli-spans.md index be7e2a61b8..0540a9641b 100644 --- a/docs/cli/cli-spans.md +++ b/docs/cli/cli-spans.md @@ -13,7 +13,8 @@ Span kind SHOULD be `INTERNAL` when the traced program is the callee or `CLIENT` The span name SHOULD be set to `{process.executable.name}`. Instrumentations that have additional context about executed commands MAY use a different low-cardinality span name format and SHOULD document it. -Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0. +Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0. Refer to the [Recording Errors](/docs/general/recording-errors.md) document for +additional details on how to record span status. 
diff --git a/docs/database/cassandra.md b/docs/database/cassandra.md index 7ec690b232..cf3d2b3b68 100644 --- a/docs/database/cassandra.md +++ b/docs/database/cassandra.md @@ -69,8 +69,7 @@ system specific term if more applicable. **[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query. -**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. -Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. +**[6] `db.response.status_code`:** All Cassandra protocol error codes SHOULD be considered errors. **[7] `db.response.status_code`:** If the operation failed and status code is available. diff --git a/docs/database/cosmosdb.md b/docs/database/cosmosdb.md index 810967a907..679e1fde6c 100644 --- a/docs/database/cosmosdb.md +++ b/docs/database/cosmosdb.md @@ -193,8 +193,7 @@ additional values when introducing new operations. **[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query. -**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. -Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. +**[6] `db.response.status_code`:** Response codes in the 4xx and 5xx range SHOULD be considered errors. 
**[7] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/database/couchdb.md b/docs/database/couchdb.md index b4eb826f75..3b94489e5a 100644 --- a/docs/database/couchdb.md +++ b/docs/database/couchdb.md @@ -23,7 +23,7 @@ The Semantic Conventions for [CouchDB](https://couchdb.apache.org/) extend and o |---|---|---|---|---|---| | [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the database, fully qualified within the server address and port. | `customers`; `test.users` | `Conditionally Required` If available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`db.operation.name`](/docs/attributes-registry/db.md) | string | The HTTP method + the target REST route. [1] | `GET /{db}/{docid}` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | -| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB recorded as string. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. 
[4] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [5] | `80`; `8080`; `443` | `Conditionally Required` [6] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [7] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | @@ -31,8 +31,7 @@ The Semantic Conventions for [CouchDB](https://couchdb.apache.org/) extend and o **[1] `db.operation.name`:** In **CouchDB**, `db.operation.name` should be set to the HTTP method + the target REST route according to the API reference documentation. For example, when retrieving a document, `db.operation.name` would be set to (literally, i.e., without replacing the placeholders with concrete values): [`GET /{db}/{docid}`](https://docs.couchdb.org/en/stable/api/document/common.html#get--db-docid). -**[2] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. -Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. +**[2] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors. **[3] `db.response.status_code`:** If response was received and the HTTP response code is available. 
diff --git a/docs/database/database-spans.md b/docs/database/database-spans.md index dcd4ec2202..772de3e5a3 100644 --- a/docs/database/database-spans.md +++ b/docs/database/database-spans.md @@ -89,59 +89,11 @@ For example, for an operation describing SQL query on an anonymous table like `S ## Status -[Span Status Code][SpanStatus] MUST be left unset if the operation has ended without any errors. +Refer to the [Recording Errors](/docs/general/recording-errors.md) document for +details on how to record span status. -Instrumentation SHOULD consider the operation as failed if any of the following is true: - -- the `db.response.status_code` value indicates an error - - > [!NOTE] - > - > The classification of status code as an error depends on the context. - > For example, a SQL STATE `02000` (`no_data`) indicates an error when the application - > expected the data to be available. However, it is not an error when the - > application is simply checking whether the data exists. - > - > Instrumentations that have additional context about a specific operation MAY use - > this context to set the span status more precisely. - > Instrumentations that don't have any additional context MUST follow the - > guidelines in this section. - -- an exception is thrown by the instrumented method call -- the instrumented method returns an error in another way - -When the operation ends with an error, instrumentation: - -- SHOULD set the span status code to `Error` -- SHOULD set the `error.type` attribute -- SHOULD set the span status description when it has additional information - about the error which is not expected to contain sensitive details and aligns - with [Span Status Description][SpanStatus] definition. - - It's NOT RECOMMENDED to duplicate `db.response.status_code` or `error.type` - in span status description. - - When the operation fails with an exception, the span status description SHOULD be set to - the exception message. 
- -### Recording exception events - -**Status**: [Experimental][DocumentStatus] - -When the operation fails with an exception, instrumentation SHOULD record -an [exception event](../exceptions/exceptions-spans.md) by default if, and only if, -the span being recorded is a local root span (does not have a local parent). - -> [!NOTE] -> -> Exception stack traces could be very long and are expensive to capture and store. -> Exceptions which are not handled by instrumented libraries are likely to be handled -> and logged by the caller. -> Exceptions that are not handled will be recorded by the outermost (local root) -> instrumentation such as HTTP or gRPC server. - -Instrumentation MAY provide a configuration option to record exceptions that -escape the surface of the instrumented API. +Semantic conventions for individual systems SHOULD specify which values of `db.response.status_code` +classify as errors. ## Common attributes diff --git a/docs/database/elasticsearch.md b/docs/database/elasticsearch.md index 175cf43fab..da64ed62fa 100644 --- a/docs/database/elasticsearch.md +++ b/docs/database/elasticsearch.md @@ -82,8 +82,7 @@ When a query string value is redacted, the query string key SHOULD still be pres **[4] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.`, where `` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names. -**[5] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. 
-Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. +**[5] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors. **[6] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/database/mariadb.md b/docs/database/mariadb.md index 58285321b5..ad9772fb91 100644 --- a/docs/database/mariadb.md +++ b/docs/database/mariadb.md @@ -42,41 +42,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. -**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database -return code which is adopted by some database systems like PostgreSQL. -See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html) -for the details. - -Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific -error codes. Database SQL drivers usually provide access to both properties. -For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html) -class reports them with `getSQLState()` and `getErrorCode()` methods. - -Instrumentations SHOULD populate the `db.response.status_code` with the -the most specific code available to them. 
- -Here's a non-exhaustive list of databases that report vendor-specific -codes with granularity higher than SQLSTATE (or don't report SQLSTATE -at all): - -- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql). -- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/) -- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors) -- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) -- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm) -- [SQLite result codes](https://www.sqlite.org/rescode.html) - -These systems SHOULD set the `db.response.status_code` to a -known vendor-specific error code. If only SQLSTATE is available, -it SHOULD be used. - -When multiple error codes are available and specificity is unclear, -instrumentation SHOULD set the `db.response.status_code` to the -concatenated string of all codes with '/' used as a separator. - -For example, generic DB instrumentation that detected an error and has -SQLSTATE `"42000"` and vendor-specific `1071` should set -`db.response.status_code` to `"42000/1071"`." +**[2] `db.response.status_code`:** When [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) is available, SQLSTATE of "Class 02" or higher SHOULD be considered errors. When SQLSTATE is not available, all Maria DB error codes SHOULD be considered errors. **[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. 
diff --git a/docs/database/mongodb.md b/docs/database/mongodb.md index 26f33c9afb..0c8dbe46ba 100644 --- a/docs/database/mongodb.md +++ b/docs/database/mongodb.md @@ -40,8 +40,7 @@ then that collection name SHOULD be used. **[2] `db.operation.name`:** See [MongoDB database commands](https://www.mongodb.com/docs/manual/reference/command/). -**[3] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. -Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. +**[3] `db.response.status_code`:** All MongoDB error codes SHOULD be considered errors. **[4] `db.response.status_code`:** If the operation failed and error code is available. diff --git a/docs/database/mssql.md b/docs/database/mssql.md index e6147cae1d..5b30b833fb 100644 --- a/docs/database/mssql.md +++ b/docs/database/mssql.md @@ -47,6 +47,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. **[2] `db.response.status_code`:** Microsoft SQL Server does not report SQLSTATE. +Instrumentations SHOULD use [error severity](https://learn.microsoft.com/sql/relational-databases/errors-events/database-engine-error-severities) returned along with the status code to determine the status of the span. Response codes with severity 11 or higher SHOULD be considered errors. **[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. 
For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/database/mysql.md b/docs/database/mysql.md index 99bb820e83..26807d169e 100644 --- a/docs/database/mysql.md +++ b/docs/database/mysql.md @@ -42,41 +42,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. -**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database -return code which is adopted by some database systems like PostgreSQL. -See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html) -for the details. - -Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific -error codes. Database SQL drivers usually provide access to both properties. -For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html) -class reports them with `getSQLState()` and `getErrorCode()` methods. - -Instrumentations SHOULD populate the `db.response.status_code` with the -the most specific code available to them. - -Here's a non-exhaustive list of databases that report vendor-specific -codes with granularity higher than SQLSTATE (or don't report SQLSTATE -at all): - -- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql). 
-- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/) -- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors) -- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) -- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm) -- [SQLite result codes](https://www.sqlite.org/rescode.html) - -These systems SHOULD set the `db.response.status_code` to a -known vendor-specific error code. If only SQLSTATE is available, -it SHOULD be used. - -When multiple error codes are available and specificity is unclear, -instrumentation SHOULD set the `db.response.status_code` to the -concatenated string of all codes with '/' used as a separator. - -For example, generic DB instrumentation that detected an error and has -SQLSTATE `"42000"` and vendor-specific `1071` should set -`db.response.status_code` to `"42000/1071"`." +**[2] `db.response.status_code`:** All MySQL error codes SHOULD be considered errors. **[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/database/postgresql.md b/docs/database/postgresql.md index 545af7ae55..30410828c4 100644 --- a/docs/database/postgresql.md +++ b/docs/database/postgresql.md @@ -49,41 +49,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the user provided whe It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. 
-**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database -return code which is adopted by some database systems like PostgreSQL. -See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html) -for the details. - -Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific -error codes. Database SQL drivers usually provide access to both properties. -For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html) -class reports them with `getSQLState()` and `getErrorCode()` methods. - -Instrumentations SHOULD populate the `db.response.status_code` with the -the most specific code available to them. - -Here's a non-exhaustive list of databases that report vendor-specific -codes with granularity higher than SQLSTATE (or don't report SQLSTATE -at all): - -- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql). -- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/) -- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors) -- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) -- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm) -- [SQLite result codes](https://www.sqlite.org/rescode.html) - -These systems SHOULD set the `db.response.status_code` to a -known vendor-specific error code. If only SQLSTATE is available, -it SHOULD be used. - -When multiple error codes are available and specificity is unclear, -instrumentation SHOULD set the `db.response.status_code` to the -concatenated string of all codes with '/' used as a separator. - -For example, generic DB instrumentation that detected an error and has -SQLSTATE `"42000"` and vendor-specific `1071` should set -`db.response.status_code` to `"42000/1071"`." 
+**[2] `db.response.status_code`:** Response codes of "Class 02" or higher SHOULD be considered errors. **[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/database/redis.md b/docs/database/redis.md index cd826850ff..8ef933e7a4 100644 --- a/docs/database/redis.md +++ b/docs/database/redis.md @@ -60,8 +60,7 @@ system specific term if more applicable. **[3] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query. -**[4] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes. -Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system. +**[4] `db.response.status_code`:** All Redis error prefixes SHOULD be considered errors. **[5] `db.response.status_code`:** If the operation failed and status code is available. diff --git a/docs/exceptions/README.md b/docs/exceptions/README.md index bee2851df8..a9cc9c6817 100644 --- a/docs/exceptions/README.md +++ b/docs/exceptions/README.md @@ -7,13 +7,24 @@ path_base_for_github_subdir: # Semantic Conventions for Exceptions -**Status**: [Stable][DocumentStatus] - -This document defines semantic conventions for Exceptions. 
+**Status**: [Mixed][DocumentStatus] Semantic conventions for Exceptions are defined for the following signals: * [Exceptions on spans](exceptions-spans.md): Semantic Conventions for Exceptions associated with *spans*. * [Exceptions in logs](exceptions-logs.md): Semantic Conventions for Exceptions recorded in *logs*. +## Reporting errors in instrumentation code + +**Status**: [Development][DocumentStatus] + +When instrumented operation fails with an exception, instrumentation SHOULD record +this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). + +It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library. + +It's RECOMMENDED to use `Span.recordException` API or logging library API that takes exception instance +instead of providing individual attributes. This enables the OpenTelemetry SDK to +control what information is recorded based on user configuration. + [DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/faas/faas-spans.md b/docs/faas/faas-spans.md index b49ee5766c..89b01428ba 100644 --- a/docs/faas/faas-spans.md +++ b/docs/faas/faas-spans.md @@ -36,6 +36,8 @@ See also the [additional instructions for instrumenting AWS Lambda](aws-lambda.m Span `name` should be set to the function name being executed. Depending on the value of the `faas.trigger` attribute, additional attributes MUST be set. For example, an `http` trigger SHOULD follow the [HTTP Server semantic conventions](/docs/http/http-spans.md#http-server-semantic-conventions). For more information, refer to the [Function Trigger Type](#function-trigger-type) section. +Refer to the [Recording Errors](/docs/general/recording-errors.md) document for details on how to record span status. + If Spans following this convention are produced, a Resource of type `faas` MUST exist following the [Resource semantic convention](../resource/faas.md). 
diff --git a/docs/gen-ai/gen-ai-spans.md b/docs/gen-ai/gen-ai-spans.md index 99cfb96f8b..515dc3b265 100644 --- a/docs/gen-ai/gen-ai-spans.md +++ b/docs/gen-ai/gen-ai-spans.md @@ -30,6 +30,11 @@ GenAI spans MUST follow the overall [guidelines for span names](https://github.c The **span name** SHOULD be `{gen_ai.operation.name} {gen_ai.request.model}`. Semantic conventions for individual GenAI systems and frameworks MAY specify different span name format. +### Status + +Refer to the [Recording Errors](/docs/general/recording-errors.md) document for +details on how to record span status. + ## GenAI attributes These attributes track input data and metadata for a request to a GenAI model. Each attribute represents a concept that is common to most Generative AI clients. diff --git a/docs/general/recording-errors.md b/docs/general/recording-errors.md new file mode 100644 index 0000000000..97093e01d1 --- /dev/null +++ b/docs/general/recording-errors.md @@ -0,0 +1,80 @@ + + +# Recording errors + +**Status**: [Development][DocumentStatus]. + +This document provides recommendations to semantic conventions and instrumentation authors +on how to record errors on spans and metrics. + +Individual semantic conventions are encouraged to provide additional guidance. + +## What constitutes an error + +Operation SHOULD be considered as failed if any of the following is true: + +- an exception is thrown by the instrumented method (API, block of code, or another instrumented unit) +- the instrumented method returns an error in another way, for example, via an error code + + Semantic conventions that define domain-specific status codes SHOULD specify + which status codes should be reported as errors by a general-purpose instrumentation. + + > [!NOTE] + > + > The classification of a status code as an error depends on the context. + > For example, an HTTP 404 "Not Found" status code indicates an error if the application + > expected the resource to be available. 
However, it is not an error when the + > application is simply checking whether the resource exists. + > + > Instrumentations that have additional context about a specific request MAY use + > this context to set the span status more precisely. + +Errors that were retried or handled allowing operation to complete gracefully SHOULD NOT +be recorded on spans or metrics that describe this operation. + +## How to record errors on spans + +[Span Status Code][SpanStatus] MUST be left unset if the instrumented operation has +ended without any errors. + +When the operation ends with an error, instrumentation: + +- SHOULD set the span status code to `Error` +- SHOULD set the [`error.type`](/docs/attributes-registry/error.md#error-type) attribute +- SHOULD set the span status description when it has additional information + about the error which is not expected to contain sensitive details and aligns + with [Span Status Description][SpanStatus] definition. + + It's NOT RECOMMENDED to duplicate status code or `error.type` in span status description. + + When the operation fails with an exception, the span status description SHOULD be set to + the exception message. + +Refer to the [general exception guidance](/docs/exceptions/README.md) on capturing exception +details. + +## How to record errors on metrics + +Semantic conventions for operations usually define an operation duration histogram +metric. It SHOULD include the `error.type` attribute. This enables users to derive +throughput and error rates. + +Operations that complete successfully SHOULD NOT include the `error.type` attribute, +allowing users to filter out errors. + +Semantic conventions SHOULD include `error.type` on other metrics when it's applicable. +For example, `messaging.client.sent.messages` metric measures message throughput (one +messaging operation may involve sending multiple messages) and includes `error.type`. 
+ +It's RECOMMENDED to report one metric that includes successes and failures as opposed +to reporting two (or more) metrics depending on the operation status. + +Instrumentation SHOULD ensure `error.type` is applied consistently across spans +and metrics when both are reported. A span and its corresponding metric for a single +operation SHOULD have the same `error.type` value if the operation failed and SHOULD NOT +include it if the operation succeeded. + +[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status +[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#set-status diff --git a/docs/http/http-spans.md b/docs/http/http-spans.md index 6d54263fcd..34274862d8 100644 --- a/docs/http/http-spans.md +++ b/docs/http/http-spans.md @@ -91,7 +91,7 @@ Instrumentation MUST NOT default to using URI path as a `{target}`. the response body; or 3xx codes with max redirects exceeded), in which case status MUST be set to `Error`. -> **Note:** +> [!NOTE] > > The classification of an HTTP status code as an error depends on the context. > For example, a 404 "Not Found" status code indicates an error if the application @@ -117,6 +117,9 @@ the client or server from sending/receiving the request/response fully. When instrumentation detects such errors it SHOULD set span status to `Error` and SHOULD set the `error.type` attribute. +Refer to the [Recording Errors](/docs/general/recording-errors.md) document for +details on how to record span status. + ## HTTP client This span type represents an outbound HTTP request. 
There are two ways this can be achieved in an instrumentation: diff --git a/docs/messaging/messaging-spans.md b/docs/messaging/messaging-spans.md index 0d13af08bd..7e75a9f685 100644 --- a/docs/messaging/messaging-spans.md +++ b/docs/messaging/messaging-spans.md @@ -247,6 +247,11 @@ Span kind SHOULD be set according to the following table, based on the operation Setting span kinds according to this table allows analysis tools to interpret spans and relationships between them without the need for additional semantic hints. +### Span status + +Refer to the [Recording Errors](/docs/general/recording-errors.md) document for +details on how to record span status. + ### Trace structure #### Producer spans diff --git a/docs/rpc/rpc-spans.md b/docs/rpc/rpc-spans.md index 9401826e9b..eff45f9cf2 100644 --- a/docs/rpc/rpc-spans.md +++ b/docs/rpc/rpc-spans.md @@ -79,6 +79,11 @@ Examples of span names: `MyServiceReference.ICalculator/Add` reported by the client for .NET WCF calls - `MyServiceWithNoPackage/theMethod` +### Span status + +Refer to the [Recording Errors](/docs/general/recording-errors.md) document for +details on how to record span status. + ### Service name On the server process receiving and handling the remote procedure call, the service name provided in `rpc.service` does not necessarily have to match the [`service.name`][] resource attribute. diff --git a/model/database/spans.yaml b/model/database/spans.yaml index 476127d09c..57f1346a0d 100644 --- a/model/database/spans.yaml +++ b/model/database/spans.yaml @@ -148,6 +148,9 @@ groups: represented as a string. note: > Microsoft SQL Server does not report SQLSTATE. + + Instrumentations SHOULD use [error severity](https://learn.microsoft.com/sql/relational-databases/errors-events/database-engine-error-severities) + returned along with the status code to determine the status of the span. Response codes with severity 11 or higher SHOULD be considered errors. 
examples: ["102", "40020"] - id: span.db.postgresql.client @@ -183,6 +186,8 @@ groups: - ref: db.response.status_code brief: > [PostgreSQL error code](https://www.postgresql.org/docs/current/errcodes-appendix.html). + note: > + Response codes of "Class 02" or higher SHOULD be considered errors. examples: ["08000", "08P01"] - id: span.db.mysql.client @@ -210,6 +215,8 @@ groups: - ref: db.response.status_code brief: > [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html). + note: > + All MySQL error codes SHOULD be considered errors. examples: ["1005", "MY-010016"] - id: span.db.mariadb.client @@ -238,6 +245,11 @@ groups: brief: > [Maria DB error code](https://mariadb.com/kb/en/mariadb-error-code-reference/) represented as a string. + note: > + When [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) is available, SQLSTATE of + "Class 02" or higher SHOULD be considered errors.). When SQLSTATE is not available, + all Maria DB error codes SHOULD be considered errors. + examples: ["1008", "3058"] - id: span.db.cassandra.client @@ -274,6 +286,8 @@ groups: - ref: db.response.status_code brief: > [Cassandra protocol error code](https://github.com/apache/cassandra/blob/cassandra-5.0/doc/native_protocol_v5.spec) represented as a string. + note: > + All Cassandra protocol error codes SHOULD be considered errors. examples: ["102", "40020"] - id: span.db.hbase.client type: span @@ -334,7 +348,9 @@ groups: note: "" # overriding the base note - ref: db.response.status_code brief: > - The HTTP response code returned by the Couch DB. + The HTTP response code returned by the Couch DB recorded as string. + note: > + HTTP response codes in the 4xx and 5xx range SHOULD be considered errors. examples: ["200", "201", "429"] requirement_level: conditionally_required: If response was received and the HTTP response code is available. 
@@ -395,6 +411,8 @@ groups: brief: > The Redis [simple error](https://redis.io/docs/latest/develop/reference/protocol-spec/#simple-errors) prefix. examples: ["ERR", "WRONGTYPE", "CLUSTERDOWN"] + note: > + All Redis error prefixes SHOULD be considered errors. - ref: db.operation.batch.size - ref: db.operation.parameter requirement_level: opt_in @@ -434,6 +452,8 @@ groups: - ref: db.response.status_code brief: > [MongoDB error code](https://www.mongodb.com/docs/manual/reference/error-codes/) represented as a string. + note: > + All MongoDB error codes SHOULD be considered errors. requirement_level: conditionally_required: If the operation failed and error code is available. examples: ["36", "11602"] @@ -492,6 +512,8 @@ groups: brief: > The HTTP response code returned by the Elasticsearch cluster. examples: ["200", "201", "429"] + note: > + HTTP response codes in the 4xx and 5xx range SHOULD be considered errors. requirement_level: conditionally_required: If response was received. - id: span.db.sql.client @@ -607,6 +629,8 @@ groups: brief: > Cosmos DB status code. examples: ["200", "201"] + note: > + Response codes in the 4xx and 5xx range SHOULD be considered errors. 
requirement_level: conditionally_required: if response was received - ref: db.response.returned_rows From e7013cf2258fdac05ba3df2fef12992b1a5b9aa1 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Sat, 28 Dec 2024 10:15:46 -0800 Subject: [PATCH 02/13] provide system-specific db status code notes --- docs/database/mariadb.md | 4 +++- docs/database/mysql.md | 4 ++-- docs/database/postgresql.md | 2 +- docs/exceptions/README.md | 2 +- docs/http/http-spans.md | 2 +- model/database/spans.yaml | 20 +++++++++++++------- 6 files changed, 21 insertions(+), 13 deletions(-) diff --git a/docs/database/mariadb.md b/docs/database/mariadb.md index ad9772fb91..e34fc231da 100644 --- a/docs/database/mariadb.md +++ b/docs/database/mariadb.md @@ -42,7 +42,9 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. -**[2] `db.response.status_code`:** When [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) is available, SQLSTATE of "Class 02" or higher SHOULD be considered errors.). When SQLSTATE is not available, all Maria DB error codes SHOULD be considered errors. +**[2] `db.response.status_code`:** MariaDB uses vendor-specific error codes on all errors and reports [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) in some cases. +MariaDB error codes are more granular than SQLSTATE, so MariaDB instrumentations SHOULD set the `db.response.status_code` to this known error code. +When SQLSTATE is available, SQLSTATE of "Class 02" or higher SHOULD be considered errors. When SQLSTATE is not available, all MariaDB error codes SHOULD be considered errors. **[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. 
For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/database/mysql.md b/docs/database/mysql.md index 26807d169e..19b42520c0 100644 --- a/docs/database/mysql.md +++ b/docs/database/mysql.md @@ -22,7 +22,7 @@ The Semantic Conventions for *MySQL* extend and override the [Database Semantic | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection. [1] | `products`; `customers` | `Conditionally Required` If available without an additional network call. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | -| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html). [2] | `1005`; `MY-010016` | `Conditionally Required` If response has ended with warning or an error. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) recorded as a string. [2] | `1005`; `MY-010016` | `Conditionally Required` If response has ended with warning or an error. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [3] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. 
[4] | `80`; `8080`; `443` | `Conditionally Required` [5] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [6] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | @@ -42,7 +42,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. -**[2] `db.response.status_code`:** All MySQL error codes SHOULD be considered errors. +**[2] `db.response.status_code`:** MySQL error codes are vendor specific error codes and don't follow [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) conventions. All MySQL error codes SHOULD be considered errors. **[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/database/postgresql.md b/docs/database/postgresql.md index 30410828c4..27c269ea39 100644 --- a/docs/database/postgresql.md +++ b/docs/database/postgresql.md @@ -49,7 +49,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the user provided whe It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. -**[2] `db.response.status_code`:** Response codes of "Class 02" or higher SHOULD be considered errors. +**[2] `db.response.status_code`:** PostgreSQL follows SQL standard conventions for [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE). Response codes of "Class 02" or higher SHOULD be considered errors. 
**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred. When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred. diff --git a/docs/exceptions/README.md b/docs/exceptions/README.md index a9cc9c6817..eabed52389 100644 --- a/docs/exceptions/README.md +++ b/docs/exceptions/README.md @@ -7,7 +7,7 @@ path_base_for_github_subdir: # Semantic Conventions for Exceptions -**Status**: [Mixed][DocumentStatus] +**Status**: [Stable][DocumentStatus], Unless otherwise specified. Semantic conventions for Exceptions are defined for the following signals: diff --git a/docs/http/http-spans.md b/docs/http/http-spans.md index 34274862d8..ecf3c7c99f 100644 --- a/docs/http/http-spans.md +++ b/docs/http/http-spans.md @@ -118,7 +118,7 @@ When instrumentation detects such errors it SHOULD set span status to `Error` and SHOULD set the `error.type` attribute. Refer to the [Recording Errors](/docs/general/recording-errors.md) document for -details on how to record span status. +general considerations on how to record span status. ## HTTP client diff --git a/model/database/spans.yaml b/model/database/spans.yaml index 57f1346a0d..53dd3ae343 100644 --- a/model/database/spans.yaml +++ b/model/database/spans.yaml @@ -187,9 +187,9 @@ groups: brief: > [PostgreSQL error code](https://www.postgresql.org/docs/current/errcodes-appendix.html). note: > + PostgreSQL follows SQL standard conventions for [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE). Response codes of "Class 02" or higher SHOULD be considered errors. 
examples: ["08000", "08P01"] - - id: span.db.mysql.client type: span stability: experimental @@ -214,9 +214,10 @@ groups: examples: ["products", "customers"] - ref: db.response.status_code brief: > - [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html). + [MySQL error number](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html) recorded as a string. note: > - All MySQL error codes SHOULD be considered errors. + MySQL error codes are vendor specific error codes and don't follow [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) + conventions. All MySQL error codes SHOULD be considered errors. examples: ["1005", "MY-010016"] - id: span.db.mariadb.client @@ -246,12 +247,17 @@ groups: [Maria DB error code](https://mariadb.com/kb/en/mariadb-error-code-reference/) represented as a string. note: > - When [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) is available, SQLSTATE of - "Class 02" or higher SHOULD be considered errors.). When SQLSTATE is not available, - all Maria DB error codes SHOULD be considered errors. + MariaDB uses vendor-specific error codes on all errors and reports + [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) in some cases. - examples: ["1008", "3058"] + MariaDB error codes are more granular than SQLSTATE, so MariaDB instrumentations + SHOULD set the `db.response.status_code` to this known error code. + When SQLSTATE is available, SQLSTATE of "Class 02" or higher SHOULD be + considered errors. When SQLSTATE is not available, all MariaDB error + codes SHOULD be considered errors. 
+ + examples: ["1008", "3058"] - id: span.db.cassandra.client type: span span_kind: client From 2f76eac9bfbaf5bc4fe329d709de4117b45217aa Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Sat, 28 Dec 2024 10:30:02 -0800 Subject: [PATCH 03/13] changelog and lint --- .chloggen/1716.yaml | 4 ++++ docs/database/database-spans.md | 2 -- docs/gen-ai/gen-ai-spans.md | 1 + docs/messaging/messaging-spans.md | 1 + docs/rpc/rpc-spans.md | 1 + 5 files changed, 7 insertions(+), 2 deletions(-) create mode 100644 .chloggen/1716.yaml diff --git a/.chloggen/1716.yaml b/.chloggen/1716.yaml new file mode 100644 index 0000000000..1044d68c14 --- /dev/null +++ b/.chloggen/1716.yaml @@ -0,0 +1,4 @@ +change_type: enhancement +component: docs, db +note: Add common guidance on recording errors on spans and metrics, clarify DB conventions. +issues: [1536, 1716] diff --git a/docs/database/database-spans.md b/docs/database/database-spans.md index 772de3e5a3..5e1b17d4a5 100644 --- a/docs/database/database-spans.md +++ b/docs/database/database-spans.md @@ -12,7 +12,6 @@ linkTitle: Client Calls - [Name](#name) - [Status](#status) - - [Recording exception events](#recording-exception-events) - [Common attributes](#common-attributes) - [Notes and well-known identifiers for `db.system`](#notes-and-well-known-identifiers-for-dbsystem) - [Sanitization of `db.query.text`](#sanitization-of-dbquerytext) @@ -418,4 +417,3 @@ More specific Semantic Conventions are defined for the following database techno * [SQL](sql.md): Semantic Conventions for *SQL* databases. 
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status -[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#set-status diff --git a/docs/gen-ai/gen-ai-spans.md b/docs/gen-ai/gen-ai-spans.md index 515dc3b265..54ebc54f37 100644 --- a/docs/gen-ai/gen-ai-spans.md +++ b/docs/gen-ai/gen-ai-spans.md @@ -11,6 +11,7 @@ linkTitle: Generative AI traces - [Name](#name) + - [Status](#status) - [GenAI attributes](#genai-attributes) - [Capturing inputs and outputs](#capturing-inputs-and-outputs) diff --git a/docs/messaging/messaging-spans.md b/docs/messaging/messaging-spans.md index 7e75a9f685..02cbbc2691 100644 --- a/docs/messaging/messaging-spans.md +++ b/docs/messaging/messaging-spans.md @@ -22,6 +22,7 @@ - [Span name](#span-name) - [Operation types](#operation-types) - [Span kind](#span-kind) + - [Span status](#span-status) - [Trace structure](#trace-structure) - [Producer spans](#producer-spans) - [Consumer spans](#consumer-spans) diff --git a/docs/rpc/rpc-spans.md b/docs/rpc/rpc-spans.md index eff45f9cf2..d73492be49 100644 --- a/docs/rpc/rpc-spans.md +++ b/docs/rpc/rpc-spans.md @@ -15,6 +15,7 @@ This document defines how to describe remote procedure calls - [Common remote procedure call conventions](#common-remote-procedure-call-conventions) - [Span name](#span-name) + - [Span status](#span-status) - [Service name](#service-name) - [Client attributes](#client-attributes) - [Server attributes](#server-attributes) From ef9019436355fbb41d6aec8cdf0010137cf48561 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Thu, 2 Jan 2025 17:06:58 -0800 Subject: [PATCH 04/13] feedback and some clarifications --- docs/exceptions/README.md | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/docs/exceptions/README.md b/docs/exceptions/README.md index eabed52389..ddaa419c85 100644 --- a/docs/exceptions/README.md +++ b/docs/exceptions/README.md @@ -21,10 +21,36 @@ 
Semantic conventions for Exceptions are defined for the following signals: When instrumented operation fails with an exception, instrumentation SHOULD record this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). -It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library. - It's RECOMMENDED to use `Span.recordException` API or logging library API that takes exception instance instead of providing individual attributes. This enables the OpenTelemetry SDK to control what information is recorded based on user configuration. +It's NOT RECOMMENDED to record the same exception more than once. +It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library. + +For example, in this code-snippet, `ResourceNotFoundException` is handled and corresponding +native instrumentation should not record it. Other exceptions, that are propagated +to the caller, should be recorded (or logged) once. + +```java +public boolean createIfNotExists(String resourceId) throws IOException { + Span span = startSpan(); + try { + create(id); + return true; + } catch (ResourceNotFoundException e) { + // not recording exception and not setting span status to error - exception is handled + return false; + } catch (IOException e) { + // recording exception here (assuming it was not recorded inside `create` method) + span.recordException(e); + // or + // logger.atWarning().setCause(e).log(); + + span.setStatus(StatusCode.ERROR); + throw e; + } +} +``` + [DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status From ead60d7112d3f215bba5fad29180a46d784f10f7 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 8 Jan 2025 17:40:40 -0800 Subject: [PATCH 05/13] Remove exception.escaped, update examples, remove old example --- docs/attributes-registry/exception.md | 27 ++++------- docs/exceptions/README.md | 22 ++++++--- docs/exceptions/exceptions-spans.md | 46 +------------------ 
.../deprecated/registry-deprecated.yaml | 14 ++++++ model/exceptions/registry.yaml | 23 ---------- 5 files changed, 41 insertions(+), 91 deletions(-) create mode 100644 model/exceptions/deprecated/registry-deprecated.yaml diff --git a/docs/attributes-registry/exception.md b/docs/attributes-registry/exception.md index be9b732e15..201dc3bd75 100644 --- a/docs/attributes-registry/exception.md +++ b/docs/attributes-registry/exception.md @@ -6,30 +6,23 @@ # Exception +- [Exception Attributes](#exception-attributes) +- [Deprecated Exception Attributes](#deprecated-exception-attributes) + ## Exception Attributes This document defines the shared attributes used to report a single exception associated with a span or log. | Attribute | Type | Description | Examples | Stability | |---|---|---|---|---| -| `exception.escaped` | boolean | SHOULD be set to true if the exception event is recorded at a point where it is known that the exception is escaping the scope of the span. [1] | | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | `exception.message` | string | The exception message. | `Division by zero`; `Can't convert 'int' object to str implicitly` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | `exception.stacktrace` | string | A stacktrace as a string in the natural representation for the language runtime. The representation is to be determined and documented by each language SIG. | `Exception in thread "main" java.lang.RuntimeException: Test exception\n at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n at com.example.GenerateTrace.main(GenerateTrace.java:5)` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | `exception.type` | string | The type of the exception (its fully-qualified class name, if applicable). The dynamic type of the exception should be preferred over the static type in languages that support it. 
| `java.net.ConnectException`; `OSError` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -**[1] `exception.escaped`:** An exception is considered to have escaped (or left) the scope of a span, -if that span is ended while the exception is still logically "in flight". -This may be actually "in flight" in some languages (e.g. if the exception -is passed to a Context manager's `__exit__` method in Python) but will -usually be caught at the point of recording the exception in most languages. - -It is usually not possible to determine at the point where an exception is thrown -whether it will escape the scope of a span. -However, it is trivial to know that an exception -will escape, if one checks for an active exception just before ending the span, -as done in the [example for recording span exceptions](https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#recording-an-exception). - -It follows that an exception may still escape the scope of the span -even if the `exception.escaped` attribute was not set or set to false, -since the event might have been recorded at a time where it was not -clear whether the exception will escape. +## Deprecated Exception Attributes + +Deprecated exception attributes. + +| Attribute | Type | Description | Examples | Stability | +|---|---|---|---|---| +| `exception.escaped` | boolean | Indicates that the exception is escaping the scope of the span. | | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
It's no longer recommended to record exceptions that are handled and do not escape the scope of a span. | diff --git a/docs/exceptions/README.md b/docs/exceptions/README.md index ddaa419c85..36b32db52c 100644 --- a/docs/exceptions/README.md +++ b/docs/exceptions/README.md @@ -14,16 +14,23 @@ Semantic conventions for Exceptions are defined for the following signals: * [Exceptions on spans](exceptions-spans.md): Semantic Conventions for Exceptions associated with *spans*. * [Exceptions in logs](exceptions-logs.md): Semantic Conventions for Exceptions recorded in *logs*. -## Reporting errors in instrumentation code +## Reporting exceptions in instrumentation code **Status**: [Development][DocumentStatus] When instrumented operation fails with an exception, instrumentation SHOULD record this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). +Recording exceptions on spans SHOULD be accompanied by +- setting span status to `ERROR` +- setting [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/error.md#error-type) + +Refer to the [Recording errors](/docs/general/recording-errors.md) document for additional +details on how to record errors across different signals. + It's RECOMMENDED to use `Span.recordException` API or logging library API that takes exception instance instead of providing individual attributes. This enables the OpenTelemetry SDK to -control what information is recorded based on user configuration. +control what information is recorded based on application configuration. It's NOT RECOMMENDED to record the same exception more than once. It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library. @@ -36,18 +43,21 @@ to the caller, should be recorded (or logged) once. 
public boolean createIfNotExists(String resourceId) throws IOException { Span span = startSpan(); try { - create(id); + create(resourceId); return true; - } catch (ResourceNotFoundException e) { + } catch (ResourceAlreadyExistsException e) { // not recording exception and not setting span status to error - exception is handled + // but we can set attributes that capture additional details + span.setAttribute(AttributeKey.stringKey("acme.resource.create.status"), "already_exists"); return false; } catch (IOException e) { // recording exception here (assuming it was not recorded inside `create` method) span.recordException(e); // or - // logger.atWarning().setCause(e).log(); + // logger.warn(e); - span.setStatus(StatusCode.ERROR); + span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName()); + span.setStatus(StatusCode.ERROR, e.getMessage()); throw e; } } diff --git a/docs/exceptions/exceptions-spans.md b/docs/exceptions/exceptions-spans.md index afcb5a50c3..96ca6f49cd 100644 --- a/docs/exceptions/exceptions-spans.md +++ b/docs/exceptions/exceptions-spans.md @@ -11,33 +11,6 @@ exceptions associated with spans. -- [Recording an Exception](#recording-an-exception) -- [Exception event](#exception-event) - - [Stacktrace Representation](#stacktrace-representation) - - - -## Recording an Exception - -An exception SHOULD be recorded as an `Event` on the span during which it occurred. -The name of the event MUST be `"exception"`. 
- -A typical template for an auto-instrumentation implementing this semantic convention -using an [API-provided `recordException` method](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#record-exception) -could look like this (pseudo-Java): - -```java -Span span = myTracer.startSpan(/*...*/); -try { - // Code that does the actual work which the Span represents -} catch (Throwable e) { - span.recordException(e, Attributes.of("exception.escaped", true)); - throw e; -} finally { - span.end(); -} -``` - ## Exception event @@ -57,30 +30,13 @@ This event describes a single exception. |---|---|---|---|---|---| | [`exception.message`](/docs/attributes-registry/exception.md) | string | The exception message. | `Division by zero`; `Can't convert 'int' object to str implicitly` | `Conditionally Required` [1] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`exception.type`](/docs/attributes-registry/exception.md) | string | The type of the exception (its fully-qualified class name, if applicable). The dynamic type of the exception should be preferred over the static type in languages that support it. | `java.net.ConnectException`; `OSError` | `Conditionally Required` [2] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`exception.escaped`](/docs/attributes-registry/exception.md) | boolean | SHOULD be set to true if the exception event is recorded at a point where it is known that the exception is escaping the scope of the span. [3] | | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`exception.escaped`](/docs/attributes-registry/exception.md) | boolean | Indicates that the exception is escaping the scope of the span. | | `Recommended` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
It's no longer recommended to record exceptions that are handled and do not escape the scope of a span. | | [`exception.stacktrace`](/docs/attributes-registry/exception.md) | string | A stacktrace as a string in the natural representation for the language runtime. The representation is to be determined and documented by each language SIG. | `Exception in thread "main" java.lang.RuntimeException: Test exception\n at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n at com.example.GenerateTrace.main(GenerateTrace.java:5)` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1] `exception.message`:** Required if `exception.type` is not set, recommended otherwise. **[2] `exception.type`:** Required if `exception.message` is not set, recommended otherwise. -**[3] `exception.escaped`:** An exception is considered to have escaped (or left) the scope of a span, -if that span is ended while the exception is still logically "in flight". -This may be actually "in flight" in some languages (e.g. if the exception -is passed to a Context manager's `__exit__` method in Python) but will -usually be caught at the point of recording the exception in most languages. - -It is usually not possible to determine at the point where an exception is thrown -whether it will escape the scope of a span. -However, it is trivial to know that an exception -will escape, if one checks for an active exception just before ending the span, -as done in the [example for recording span exceptions](https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#recording-an-exception). - -It follows that an exception may still escape the scope of the span -even if the `exception.escaped` attribute was not set or set to false, -since the event might have been recorded at a time where it was not -clear whether the exception will escape. 
- diff --git a/model/exceptions/deprecated/registry-deprecated.yaml b/model/exceptions/deprecated/registry-deprecated.yaml new file mode 100644 index 0000000000..ed9e57ac93 --- /dev/null +++ b/model/exceptions/deprecated/registry-deprecated.yaml @@ -0,0 +1,14 @@ +groups: + - id: registry.exception.deprecated + type: attribute_group + display_name: Deprecated Exception Attributes + brief: > + Deprecated exception attributes. + attributes: + - id: exception.escaped + type: boolean + stability: stable + deprecated: "It's no longer recommended to record exceptions that are handled + and do not escape the scope of a span." + brief: > + Indicates that the exception is escaping the scope of the span. diff --git a/model/exceptions/registry.yaml b/model/exceptions/registry.yaml index 1ebc90d854..7231a394de 100644 --- a/model/exceptions/registry.yaml +++ b/model/exceptions/registry.yaml @@ -31,26 +31,3 @@ groups: at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n at com.example.GenerateTrace.main(GenerateTrace.java:5) - - id: exception.escaped - type: boolean - stability: stable - brief: > - SHOULD be set to true if the exception event is recorded at a point where - it is known that the exception is escaping the scope of the span. - note: |- - An exception is considered to have escaped (or left) the scope of a span, - if that span is ended while the exception is still logically "in flight". - This may be actually "in flight" in some languages (e.g. if the exception - is passed to a Context manager's `__exit__` method in Python) but will - usually be caught at the point of recording the exception in most languages. - - It is usually not possible to determine at the point where an exception is thrown - whether it will escape the scope of a span. 
- However, it is trivial to know that an exception - will escape, if one checks for an active exception just before ending the span, - as done in the [example for recording span exceptions](https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#recording-an-exception). - - It follows that an exception may still escape the scope of the span - even if the `exception.escaped` attribute was not set or set to false, - since the event might have been recorded at a time where it was not - clear whether the exception will escape. From 6edc7cab345dc19363083af16f1e83a78dec803a Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Thu, 9 Jan 2025 20:12:41 -0800 Subject: [PATCH 06/13] Apply suggestions from code review Co-authored-by: Trask Stalnaker --- .chloggen/1716.yaml | 2 +- docs/exceptions/README.md | 10 +++++----- docs/general/recording-errors.md | 6 +++--- docs/http/http-spans.md | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/.chloggen/1716.yaml b/.chloggen/1716.yaml index 1044d68c14..63c6aab06c 100644 --- a/.chloggen/1716.yaml +++ b/.chloggen/1716.yaml @@ -1,4 +1,4 @@ change_type: enhancement component: docs, db -note: Add common guidance on recording errors on spans and metrics, clarify DB conventions. +note: Add common guidance for recording errors on spans and metrics, clarify DB conventions. issues: [1536, 1716] diff --git a/docs/exceptions/README.md b/docs/exceptions/README.md index 36b32db52c..cd88e566cd 100644 --- a/docs/exceptions/README.md +++ b/docs/exceptions/README.md @@ -18,7 +18,7 @@ Semantic conventions for Exceptions are defined for the following signals: **Status**: [Development][DocumentStatus] -When instrumented operation fails with an exception, instrumentation SHOULD record +When an instrumented operation fails with an exception, instrumentation SHOULD record this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). 
Recording exceptions on spans SHOULD be accompanied by @@ -28,16 +28,16 @@ Recording exceptions on spans SHOULD be accompanied by Refer to the [Recording errors](/docs/general/recording-errors.md) document for additional details on how to record errors across different signals. -It's RECOMMENDED to use `Span.recordException` API or logging library API that takes exception instance +It's RECOMMENDED to use the `Span.recordException` API or logging library API that takes exception instance instead of providing individual attributes. This enables the OpenTelemetry SDK to control what information is recorded based on application configuration. It's NOT RECOMMENDED to record the same exception more than once. It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library. -For example, in this code-snippet, `ResourceNotFoundException` is handled and corresponding -native instrumentation should not record it. Other exceptions, that are propagated -to the caller, should be recorded (or logged) once. +For example, in this code-snippet, `ResourceAlreadyExistsException` is handled and the corresponding +native instrumentation should not record it. Exceptions which are propagated +to the caller should be recorded (or logged) once. ```java public boolean createIfNotExists(String resourceId) throws IOException { diff --git a/docs/general/recording-errors.md b/docs/general/recording-errors.md index 97093e01d1..84e8aaa916 100644 --- a/docs/general/recording-errors.md +++ b/docs/general/recording-errors.md @@ -6,14 +6,14 @@ linkTitle: Recording errors **Status**: [Development][DocumentStatus]. -This document provides recommendations to semantic conventions and instrumentation authors +This document provides recommendations to semantic convention and instrumentation authors on how to record errors on spans and metrics. Individual semantic conventions are encouraged to provide additional guidance. 
## What constitutes an error -Operation SHOULD be considered as failed if any of the following is true: +An operation SHOULD be considered as failed if any of the following is true: - an exception is thrown by the instrumented method (API, block of code, or another instrumented unit) - the instrumented method returns an error in another way, for example, via an error code @@ -31,7 +31,7 @@ Operation SHOULD be considered as failed if any of the following is true: > Instrumentations that have additional context about a specific request MAY use > this context to set the span status more precisely. -Errors that were retried or handled allowing operation to complete gracefully SHOULD NOT +Errors that were retried or handled (allowing an operation to complete gracefully) SHOULD NOT be recorded on spans or metrics that describe this operation. ## How to record errors on spans diff --git a/docs/http/http-spans.md b/docs/http/http-spans.md index ecf3c7c99f..2ac016a237 100644 --- a/docs/http/http-spans.md +++ b/docs/http/http-spans.md @@ -117,7 +117,7 @@ the client or server from sending/receiving the request/response fully. When instrumentation detects such errors it SHOULD set span status to `Error` and SHOULD set the `error.type` attribute. -Refer to the [Recording Errors](/docs/general/recording-errors.md) document for +**Status**: [Development][DocumentStatus] - Refer to the [Recording Errors](/docs/general/recording-errors.md) document for general considerations on how to record span status. 
## HTTP client From 7ad4c4580f63b2eccb2ad1cdb5b3216113a69ad3 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 10 Jan 2025 08:28:23 -0800 Subject: [PATCH 07/13] Update docs/general/recording-errors.md Co-authored-by: Trask Stalnaker --- docs/general/recording-errors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/general/recording-errors.md b/docs/general/recording-errors.md index 84e8aaa916..58929aec07 100644 --- a/docs/general/recording-errors.md +++ b/docs/general/recording-errors.md @@ -58,7 +58,7 @@ details. ## How to record errors on metrics Semantic conventions for operations usually define an operation duration histogram -metric. It SHOULD include the `error.type` attribute. This enables users to derive +metric. This metric SHOULD include the `error.type` attribute. This enables users to derive throughput and error rates. Operations that complete successfully SHOULD NOT include the `error.type` attribute, From 926585b4461458f797603d8087385b0a3bf23466 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 10 Jan 2025 08:30:02 -0800 Subject: [PATCH 08/13] nit --- docs/general/recording-errors.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/general/recording-errors.md b/docs/general/recording-errors.md index 58929aec07..01c91541ae 100644 --- a/docs/general/recording-errors.md +++ b/docs/general/recording-errors.md @@ -21,15 +21,15 @@ An operation SHOULD be considered as failed if any of the following is true: Semantic conventions that define domain-specific status codes SHOULD specify which status codes should be reported as errors by a general-purpose instrumentation. - > [!NOTE] - > - > The classification of a status code as an error depends on the context. - > For example, an HTTP 404 "Not Found" status code indicates an error if the application - > expected the resource to be available. 
However, it is not an error when the - > application is simply checking whether the resource exists. - > - > Instrumentations that have additional context about a specific request MAY use - > this context to set the span status more precisely. +> [!NOTE] +> +> The classification of a status code as an error depends on the context. +> For example, an HTTP 404 "Not Found" status code indicates an error if the application +> expected the resource to be available. However, it is not an error when the +> application is simply checking whether the resource exists. +> +> Instrumentations that have additional context about a specific request MAY use +> this context to set the span status more precisely. Errors that were retried or handled (allowing an operation to complete gracefully) SHOULD NOT be recorded on spans or metrics that describe this operation. From 7c037d74e2263fde627cbc5b95f7390f7f8331a1 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Fri, 10 Jan 2025 10:32:19 -0800 Subject: [PATCH 09/13] lint --- docs/exceptions/README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/exceptions/README.md b/docs/exceptions/README.md index cd88e566cd..a1dcf2e495 100644 --- a/docs/exceptions/README.md +++ b/docs/exceptions/README.md @@ -21,9 +21,10 @@ Semantic conventions for Exceptions are defined for the following signals: When an instrumented operation fails with an exception, instrumentation SHOULD record this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). 
-Recording exceptions on spans SHOULD be accompanied by +Recording exceptions on spans SHOULD be accompanied by: + - setting span status to `ERROR` -- setting [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/error.md#error-type) +- setting [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/error.md#error-type) attribute Refer to the [Recording errors](/docs/general/recording-errors.md) document for additional details on how to record errors across different signals. From 1f37350ab20c0af4670c93fc50563f8678144f15 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 15 Jan 2025 17:15:33 -0800 Subject: [PATCH 10/13] move 'recording exceptions' to recording_errors.md doc --- docs/exceptions/README.md | 52 +------------------------------- docs/general/recording-errors.md | 42 +++++++++++++++++++++++++- 2 files changed, 42 insertions(+), 52 deletions(-) diff --git a/docs/exceptions/README.md b/docs/exceptions/README.md index a1dcf2e495..6c1c976e10 100644 --- a/docs/exceptions/README.md +++ b/docs/exceptions/README.md @@ -7,61 +7,11 @@ path_base_for_github_subdir: # Semantic Conventions for Exceptions -**Status**: [Stable][DocumentStatus], Unless otherwise specified. +**Status**: [Stable][DocumentStatus] Semantic conventions for Exceptions are defined for the following signals: * [Exceptions on spans](exceptions-spans.md): Semantic Conventions for Exceptions associated with *spans*. * [Exceptions in logs](exceptions-logs.md): Semantic Conventions for Exceptions recorded in *logs*. -## Reporting exceptions in instrumentation code - -**Status**: [Development][DocumentStatus] - -When an instrumented operation fails with an exception, instrumentation SHOULD record -this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). 
- -Recording exceptions on spans SHOULD be accompanied by: - -- setting span status to `ERROR` -- setting [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/error.md#error-type) attribute - -Refer to the [Recording errors](/docs/general/recording-errors.md) document for additional -details on how to record errors across different signals. - -It's RECOMMENDED to use the `Span.recordException` API or logging library API that takes exception instance -instead of providing individual attributes. This enables the OpenTelemetry SDK to -control what information is recorded based on application configuration. - -It's NOT RECOMMENDED to record the same exception more than once. -It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library. - -For example, in this code-snippet, `ResourceAlreadyExistsException` is handled and the corresponding -native instrumentation should not record it. Exceptions which are propagated -to the caller should be recorded (or logged) once. 
- -```java -public boolean createIfNotExists(String resourceId) throws IOException { - Span span = startSpan(); - try { - create(resourceId); - return true; - } catch (ResourceAlreadyExistsException e) { - // not recording exception and not setting span status to error - exception is handled - // but we can set attributes that capture additional details - span.setAttribute(AttributeKey.stringKey("acme.resource.create.status"), "already_exists"); - return false; - } catch (IOException e) { - // recording exception here (assuming it was not recorded inside `create` method) - span.recordException(e); - // or - // logger.warn(e); - - span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName()) - span.setStatus(StatusCode.ERROR, e.getMessage()); - throw e; - } -} -``` - [DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/general/recording-errors.md b/docs/general/recording-errors.md index 01c91541ae..af118a155d 100644 --- a/docs/general/recording-errors.md +++ b/docs/general/recording-errors.md @@ -52,7 +52,7 @@ When the operation ends with an error, instrumentation: When the operation fails with an exception, the span status description SHOULD be set to the exception message. -Refer to the [general exception guidance](/docs/exceptions/README.md) on capturing exception +Refer to the [recording exceptions](#recording-errors) on capturing exception details. ## How to record errors on metrics @@ -76,5 +76,45 @@ and metrics when both are reported. A span and its corresponding metric for a si operation SHOULD have the same `error.type` value if the operation failed and SHOULD NOT include it if the operation succeeded. +## Recording exceptions + +When an instrumented operation fails with an exception, instrumentation SHOULD record +this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). 
+ +It's RECOMMENDED to use the `Span.recordException` API or logging library API that takes exception instance +instead of providing individual attributes. This enables the OpenTelemetry SDK to +control what information is recorded based on application configuration. + +It's NOT RECOMMENDED to record the same exception more than once. +It's NOT RECOMMENDED to record exceptions that are handled by the instrumented library. + +For example, in this code-snippet, `ResourceAlreadyExistsException` is handled and the corresponding +native instrumentation should not record it. Exceptions which are propagated +to the caller should be recorded (or logged) once. + +```java +public boolean createIfNotExists(String resourceId) throws IOException { + Span span = startSpan(); + try { + create(resourceId); + return true; + } catch (ResourceAlreadyExistsException e) { + // not recording exception and not setting span status to error - exception is handled + // but we can set attributes that capture additional details + span.setAttribute(AttributeKey.stringKey("acme.resource.create.status"), "already_exists"); + return false; + } catch (IOException e) { + // recording exception here (assuming it was not recorded inside `create` method) + span.recordException(e); + // or + // logger.warn(e); + + span.setAttribute(AttributeKey.stringKey("error.type"), e.getClass().getCanonicalName()); + span.setStatus(StatusCode.ERROR, e.getMessage()); + throw e; + } +} +``` + [DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status [SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#set-status From fb0ac8afa4695021439d22b692966bfe6949217f Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 15 Jan 2025 17:25:27 -0800 Subject: [PATCH 11/13] a string --- docs/database/couchdb.md | 2 +- docs/database/hbase.md | 2 +- docs/database/sql.md | 2 +- model/database/spans.yaml | 6 +++--- 4 files changed, 6 insertions(+), 6
deletions(-) diff --git a/docs/database/couchdb.md b/docs/database/couchdb.md index 3b94489e5a..873bd36565 100644 --- a/docs/database/couchdb.md +++ b/docs/database/couchdb.md @@ -23,7 +23,7 @@ The Semantic Conventions for [CouchDB](https://couchdb.apache.org/) extend and o |---|---|---|---|---|---| | [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the database, fully qualified within the server address and port. | `customers`; `test.users` | `Conditionally Required` If available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`db.operation.name`](/docs/attributes-registry/db.md) | string | The HTTP method + the target REST route. [1] | `GET /{db}/{docid}` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | -| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB recorded as string. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB recorded as a string. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [4] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. 
[5] | `80`; `8080`; `443` | `Conditionally Required` [6] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [7] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | diff --git a/docs/database/hbase.md b/docs/database/hbase.md index 6e8e2b03b9..37d4d29163 100644 --- a/docs/database/hbase.md +++ b/docs/database/hbase.md @@ -24,7 +24,7 @@ The Semantic Conventions for [HBase](https://hbase.apache.org/) extend and overr | [`db.collection.name`](/docs/attributes-registry/db.md) | string | The HBase table name. [1] | `mytable`; `ns:table` | `Conditionally Required` If applicable. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`db.namespace`](/docs/attributes-registry/db.md) | string | The HBase namespace. [2] | `mynamespace` | `Conditionally Required` If applicable. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [3] | `findAndModify`; `HMSET`; `SELECT` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | -| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Protocol-specific response code recorded as string. [4] | `200`; `409`; `14` | `Conditionally Required` If response was received. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Protocol-specific response code recorded as a string. [4] | `200`; `409`; `14` | `Conditionally Required` If response was received. 
| ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [5] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [6] | `80`; `8080`; `443` | `Conditionally Required` [7] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [8] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | diff --git a/docs/database/sql.md b/docs/database/sql.md index a66b348723..b935d68ee7 100644 --- a/docs/database/sql.md +++ b/docs/database/sql.md @@ -46,7 +46,7 @@ Instrumentations applied to generic SQL drivers SHOULD adhere to SQL semantic co | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`db.namespace`](/docs/attributes-registry/db.md) | string | The database associated with the connection, fully qualified within the server address and port. [1] | `customers`; `test.users` | `Conditionally Required` If available without an additional network call. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | -| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Database response code recorded as string. [2] | `ORA-17027`; `1052`; `2201B` | `Conditionally Required` If response has ended with warning or an error. 
| ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | +| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Database response code recorded as a string. [2] | `ORA-17027`; `1052`; `2201B` | `Conditionally Required` If response has ended with warning or an error. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [3] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [4] | `80`; `8080`; `443` | `Conditionally Required` [5] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [6] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) | diff --git a/model/database/spans.yaml b/model/database/spans.yaml index 53dd3ae343..b03a9cf9a1 100644 --- a/model/database/spans.yaml +++ b/model/database/spans.yaml @@ -322,7 +322,7 @@ groups: conditionally_required: If readily available. - ref: db.response.status_code brief: > - Protocol-specific response code recorded as string. + Protocol-specific response code recorded as a string. examples: ["200", "409", "14"] requirement_level: conditionally_required: If response was received. @@ -354,7 +354,7 @@ groups: note: "" # overriding the base note - ref: db.response.status_code brief: > - The HTTP response code returned by the Couch DB recorded as string. + The HTTP response code returned by the Couch DB recorded as a string. note: > HTTP response codes in the 4xx and 5xx range SHOULD be considered errors. 
examples: ["200", "201", "429"] @@ -556,7 +556,7 @@ groups: It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization. - ref: db.response.status_code brief: > - Database response code recorded as string. + Database response code recorded as a string. note: | SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database return code which is adopted by some database systems like PostgreSQL. From c77c7d7866c943b357d1d26ffa2fa89b092f2b9f Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 15 Jan 2025 18:31:22 -0800 Subject: [PATCH 12/13] nits --- docs/general/recording-errors.md | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/docs/general/recording-errors.md b/docs/general/recording-errors.md index af118a155d..b6ed152711 100644 --- a/docs/general/recording-errors.md +++ b/docs/general/recording-errors.md @@ -6,6 +6,15 @@ linkTitle: Recording errors **Status**: [Development][DocumentStatus]. + + +- [What constitutes an error](#what-constitutes-an-error) +- [Recording errors on spans](#recording-errors-on-spans) +- [Recording errors on metrics](#recording-errors-on-metrics) +- [Recording exceptions](#recording-exceptions) + + + This document provides recommendations to semantic convention and instrumentation authors on how to record errors on spans and metrics. @@ -34,7 +43,7 @@ An operation SHOULD be considered as failed if any of the following is true: Errors that were retried or handled (allowing an operation to complete gracefully) SHOULD NOT be recorded on spans or metrics that describe this operation. -## How to record errors on spans +## Recording errors on spans [Span Status Code][SpanStatus] MUST be left unset if the instrumented operation has ended without any errors. @@ -55,7 +64,7 @@ When the operation ends with an error, instrumentation: Refer to the [recording exceptions](#recording-errors) on capturing exception details. 
-## How to record errors on metrics +## Recording errors on metrics Semantic conventions for operations usually define an operation duration histogram metric. This metric SHOULD include the `error.type` attribute. This enables users to derive @@ -79,7 +88,7 @@ include it if the operation succeeded. ## Recording exceptions When an instrumented operation fails with an exception, instrumentation SHOULD record -this exception as a [span event](exceptions-spans.md) or a [log record](exceptions-logs.md). +this exception as a [span event](/docs/exceptions/exceptions-spans.md) or a [log record](/docs/exceptions/exceptions-logs.md). It's RECOMMENDED to use the `Span.recordException` API or logging library API that takes exception instance instead of providing individual attributes. This enables the OpenTelemetry SDK to From 1b04f54e73a9117ad9ff2cbb7d0e066e78f3243d Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 15 Jan 2025 19:14:59 -0800 Subject: [PATCH 13/13] Update .chloggen/1716.yaml --- .chloggen/1716.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.chloggen/1716.yaml b/.chloggen/1716.yaml index 63c6aab06c..1c16bf56d1 100644 --- a/.chloggen/1716.yaml +++ b/.chloggen/1716.yaml @@ -1,4 +1,4 @@ change_type: enhancement component: docs, db note: Add common guidance for recording errors on spans and metrics, clarify DB conventions. -issues: [1536, 1716] +issues: [1516, 1536, 1716]