Commit be7a417

Liudmila Molkova and trask committed
Add common guidance on recording errors on spans and metrics, clarify DB conventions (open-telemetry#1716)
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
1 parent 16ef5c8 commit be7a417

27 files changed, +240 -270 lines changed

.chloggen/1716.yaml

Lines changed: 4 additions & 0 deletions
Original file line number / Diff line number / Diff line change
@@ -0,0 +1,4 @@
1+
change_type: enhancement
2+
component: docs, db
3+
note: Add common guidance for recording errors on spans and metrics, clarify DB conventions.
4+
issues: [1516, 1536, 1716]

docs/attributes-registry/exception.md

Lines changed: 10 additions & 17 deletions
@@ -6,30 +6,23 @@
66

77
# Exception
88

9+
- [Exception Attributes](#exception-attributes)
10+
- [Deprecated Exception Attributes](#deprecated-exception-attributes)
11+
912
## Exception Attributes
1013

1114
This document defines the shared attributes used to report a single exception associated with a span or log.
1215

1316
| Attribute | Type | Description | Examples | Stability |
1417
|---|---|---|---|---|
15-
| <a id="exception-escaped" href="#exception-escaped">`exception.escaped`</a> | boolean | SHOULD be set to true if the exception event is recorded at a point where it is known that the exception is escaping the scope of the span. [1] | | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
1618
| <a id="exception-message" href="#exception-message">`exception.message`</a> | string | The exception message. | `Division by zero`; `Can't convert 'int' object to str implicitly` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
1719
| <a id="exception-stacktrace" href="#exception-stacktrace">`exception.stacktrace`</a> | string | A stacktrace as a string in the natural representation for the language runtime. The representation is to be determined and documented by each language SIG. | `Exception in thread "main" java.lang.RuntimeException: Test exception\n at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n at com.example.GenerateTrace.main(GenerateTrace.java:5)` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
1820
| <a id="exception-type" href="#exception-type">`exception.type`</a> | string | The type of the exception (its fully-qualified class name, if applicable). The dynamic type of the exception should be preferred over the static type in languages that support it. | `java.net.ConnectException`; `OSError` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
1921

20-
**[1] `exception.escaped`:** An exception is considered to have escaped (or left) the scope of a span,
21-
if that span is ended while the exception is still logically "in flight".
22-
This may be actually "in flight" in some languages (e.g. if the exception
23-
is passed to a Context manager's `__exit__` method in Python) but will
24-
usually be caught at the point of recording the exception in most languages.
25-
26-
It is usually not possible to determine at the point where an exception is thrown
27-
whether it will escape the scope of a span.
28-
However, it is trivial to know that an exception
29-
will escape, if one checks for an active exception just before ending the span,
30-
as done in the [example for recording span exceptions](https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/#recording-an-exception).
31-
32-
It follows that an exception may still escape the scope of the span
33-
even if the `exception.escaped` attribute was not set or set to false,
34-
since the event might have been recorded at a time where it was not
35-
clear whether the exception will escape.
22+
## Deprecated Exception Attributes
23+
24+
Deprecated exception attributes.
25+
26+
| Attribute | Type | Description | Examples | Stability |
27+
|---|---|---|---|---|
28+
| <a id="exception-escaped" href="#exception-escaped">`exception.escaped`</a> | boolean | Indicates that the exception is escaping the scope of the span. | | ![Deprecated](https://img.shields.io/badge/-deprecated-red)<br>It's no longer recommended to record exceptions that are handled and do not escape the scope of a span. |

docs/cli/cli-spans.md

Lines changed: 2 additions & 1 deletion
@@ -13,7 +13,8 @@ Span kind SHOULD be `INTERNAL` when the traced program is the callee or `CLIENT`
1313
The span name SHOULD be set to `{process.executable.name}`.
1414
Instrumentations that have additional context about executed commands MAY use a different low-cardinality span name format and SHOULD document it.
1515

16-
Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0.
16+
Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0. Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
17+
additional details on how to record span status.
1718

1819
<!-- TODO: context propagation https://github.com/open-telemetry/semantic-conventions/issues/1612 -->
1920
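
As a rough illustration of the span status rule changed in `docs/cli/cli-spans.md`, the sketch below maps a process exit code to a span status. The helper names `span_status_for_exit_code` and `run_and_classify` are hypothetical, the status is modelled as a plain string rather than an OpenTelemetry API call, and leaving the status unset for exit code 0 is an assumption based on the default span status.

```python
import subprocess
from typing import Dict, List, Union


def span_status_for_exit_code(exit_code: int) -> str:
    """Return "Error" for any non-zero exit code, otherwise "Unset".

    "Unset" for a zero exit code is an assumption: the convention only
    states when the status SHOULD be "Error".
    """
    return "Error" if exit_code != 0 else "Unset"


def run_and_classify(argv: List[str]) -> Dict[str, Union[int, str]]:
    """Run a command and return illustrative span data for it.

    The dictionary layout is hypothetical; only `process.exit.code`
    is a name taken from the convention text.
    """
    completed = subprocess.run(argv, check=False)
    return {
        "process.exit.code": completed.returncode,
        "span.status": span_status_for_exit_code(completed.returncode),
    }
```

A real instrumentation would set the status on the span object provided by its tracing API instead of returning a dictionary.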

docs/database/cassandra.md

Lines changed: 1 addition & 2 deletions
@@ -69,8 +69,7 @@ system specific term if more applicable.
6969

7070
**[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.
7171

72-
**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
73-
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
72+
**[6] `db.response.status_code`:** All Cassandra protocol error codes SHOULD be considered errors.
7473

7574
**[7] `db.response.status_code`:** If the operation failed and status code is available.
7675

docs/database/cosmosdb.md

Lines changed: 1 addition & 2 deletions
@@ -193,8 +193,7 @@ additional values when introducing new operations.
193193

194194
**[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.
195195

196-
**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
197-
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
196+
**[6] `db.response.status_code`:** Response codes in the 4xx and 5xx range SHOULD be considered errors.
198197

199198
**[7] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
200199
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.
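
The 4xx/5xx rule introduced for `db.response.status_code` here (and repeated for CouchDB and Elasticsearch) can be sketched as follows. This is a minimal illustration, not part of the commit: the function names `is_error_status` and `span_attributes` are hypothetical, and treating a non-numeric status code as "not an error" is an assumption the conventions do not address.

```python
from typing import Dict


def is_error_status(status_code: str) -> bool:
    """Return True when the string status code is in the 4xx or 5xx range."""
    try:
        code = int(status_code)
    except ValueError:
        # Assumption: a code that is not numeric is not classified as an error.
        return False
    return 400 <= code <= 599


def span_attributes(status_code: str) -> Dict[str, str]:
    """Build illustrative attributes for a response with the given status code.

    `error.type` is only added for errors and mirrors the status code,
    following the note that it SHOULD match `db.response.status_code`.
    """
    attributes = {"db.response.status_code": status_code}
    if is_error_status(status_code):
        attributes["error.type"] = status_code
    return attributes
```

For example, `span_attributes("429")` carries both attributes, while `span_attributes("201")` carries only the status code.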

docs/database/couchdb.md

Lines changed: 2 additions & 3 deletions
@@ -23,16 +23,15 @@ The Semantic Conventions for [CouchDB](https://couchdb.apache.org/) extend and o
2323
|---|---|---|---|---|---|
2424
| [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the database, fully qualified within the server address and port. | `customers`; `test.users` | `Conditionally Required` If available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
2525
| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The HTTP method + the target REST route. [1] | `GET /{db}/{docid}` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
26-
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
26+
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB recorded as a string. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
2727
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [4] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
2828
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [5] | `80`; `8080`; `443` | `Conditionally Required` [6] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
2929
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [7] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
3030
| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [8] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
3131

3232
**[1] `db.operation.name`:** In **CouchDB**, `db.operation.name` should be set to the HTTP method + the target REST route according to the API reference documentation. For example, when retrieving a document, `db.operation.name` would be set to (literally, i.e., without replacing the placeholders with concrete values): [`GET /{db}/{docid}`](https://docs.couchdb.org/en/stable/api/document/common.html#get--db-docid).
3333

34-
**[2] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
35-
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
34+
**[2] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.
3635

3736
**[3] `db.response.status_code`:** If response was received and the HTTP response code is available.
3837

docs/database/database-spans.md

Lines changed: 4 additions & 54 deletions
@@ -12,7 +12,6 @@ linkTitle: Client Calls
1212

1313
- [Name](#name)
1414
- [Status](#status)
15-
- [Recording exception events](#recording-exception-events)
1615
- [Common attributes](#common-attributes)
1716
- [Notes and well-known identifiers for `db.system`](#notes-and-well-known-identifiers-for-dbsystem)
1817
- [Sanitization of `db.query.text`](#sanitization-of-dbquerytext)
@@ -89,59 +88,11 @@ For example, for an operation describing SQL query on an anonymous table like `S
8988

9089
## Status
9190

92-
[Span Status Code][SpanStatus] MUST be left unset if the operation has ended without any errors.
91+
Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
92+
details on how to record span status.
9393

94-
Instrumentation SHOULD consider the operation as failed if any of the following is true:
95-
96-
- the `db.response.status_code` value indicates an error
97-
98-
> [!NOTE]
99-
>
100-
> The classification of status code as an error depends on the context.
101-
> For example, a SQL STATE `02000` (`no_data`) indicates an error when the application
102-
> expected the data to be available. However, it is not an error when the
103-
> application is simply checking whether the data exists.
104-
>
105-
> Instrumentations that have additional context about a specific operation MAY use
106-
> this context to set the span status more precisely.
107-
> Instrumentations that don't have any additional context MUST follow the
108-
> guidelines in this section.
109-
110-
- an exception is thrown by the instrumented method call
111-
- the instrumented method returns an error in another way
112-
113-
When the operation ends with an error, instrumentation:
114-
115-
- SHOULD set the span status code to `Error`
116-
- SHOULD set the `error.type` attribute
117-
- SHOULD set the span status description when it has additional information
118-
about the error which is not expected to contain sensitive details and aligns
119-
with [Span Status Description][SpanStatus] definition.
120-
121-
It's NOT RECOMMENDED to duplicate `db.response.status_code` or `error.type`
122-
in span status description.
123-
124-
When the operation fails with an exception, the span status description SHOULD be set to
125-
the exception message.
126-
127-
### Recording exception events
128-
129-
**Status**: [Experimental][DocumentStatus]
130-
131-
When the operation fails with an exception, instrumentation SHOULD record
132-
an [exception event](../exceptions/exceptions-spans.md) by default if, and only if,
133-
the span being recorded is a local root span (does not have a local parent).
134-
135-
> [!NOTE]
136-
>
137-
> Exception stack traces could be very long and are expensive to capture and store.
138-
> Exceptions which are not handled by instrumented libraries are likely to be handled
139-
> and logged by the caller.
140-
> Exceptions that are not handled will be recorded by the outermost (local root)
141-
> instrumentation such as HTTP or gRPC server.
142-
143-
Instrumentation MAY provide a configuration option to record exceptions that
144-
escape the surface of the instrumented API.
94+
Semantic conventions for individual systems SHOULD specify which values of `db.response.status_code`
95+
classify as errors.
14596

14697
## Common attributes
14798

@@ -466,4 +417,3 @@ More specific Semantic Conventions are defined for the following database techno
466417
* [SQL](sql.md): Semantic Conventions for *SQL* databases.
467418

468419
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
469-
[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#set-status

docs/database/elasticsearch.md

Lines changed: 1 addition & 2 deletions
@@ -82,8 +82,7 @@ When a query string value is redacted, the query string key SHOULD still be pres
8282

8383
**[4] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.<key>`, where `<key>` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names.
8484

85-
**[5] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
86-
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
85+
**[5] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.
8786

8887
**[6] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
8988
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.

docs/database/hbase.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ The Semantic Conventions for [HBase](https://hbase.apache.org/) extend and overr
2424
| [`db.collection.name`](/docs/attributes-registry/db.md) | string | The HBase table name. [1] | `mytable`; `ns:table` | `Conditionally Required` If applicable. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
2525
| [`db.namespace`](/docs/attributes-registry/db.md) | string | The HBase namespace. [2] | `mynamespace` | `Conditionally Required` If applicable. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
2626
| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The name of the operation or command being executed. [3] | `findAndModify`; `HMSET`; `SELECT` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
27-
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Protocol-specific response code recorded as string. [4] | `200`; `409`; `14` | `Conditionally Required` If response was received. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
27+
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | Protocol-specific response code recorded as a string. [4] | `200`; `409`; `14` | `Conditionally Required` If response was received. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
2828
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [5] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
2929
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [6] | `80`; `8080`; `443` | `Conditionally Required` [7] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
3030
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [8] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |

docs/database/mariadb.md

Lines changed: 3 additions & 35 deletions
@@ -42,41 +42,9 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided
4242

4343
It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
4444

45-
**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database
46-
return code which is adopted by some database systems like PostgreSQL.
47-
See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html)
48-
for the details.
49-
50-
Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific
51-
error codes. Database SQL drivers usually provide access to both properties.
52-
For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html)
53-
class reports them with `getSQLState()` and `getErrorCode()` methods.
54-
55-
Instrumentations SHOULD populate the `db.response.status_code` with the
56-
the most specific code available to them.
57-
58-
Here's a non-exhaustive list of databases that report vendor-specific
59-
codes with granularity higher than SQLSTATE (or don't report SQLSTATE
60-
at all):
61-
62-
- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql).
63-
- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/)
64-
- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors)
65-
- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html)
66-
- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm)
67-
- [SQLite result codes](https://www.sqlite.org/rescode.html)
68-
69-
These systems SHOULD set the `db.response.status_code` to a
70-
known vendor-specific error code. If only SQLSTATE is available,
71-
it SHOULD be used.
72-
73-
When multiple error codes are available and specificity is unclear,
74-
instrumentation SHOULD set the `db.response.status_code` to the
75-
concatenated string of all codes with '/' used as a separator.
76-
77-
For example, generic DB instrumentation that detected an error and has
78-
SQLSTATE `"42000"` and vendor-specific `1071` should set
79-
`db.response.status_code` to `"42000/1071"`."
45+
**[2] `db.response.status_code`:** MariaDB uses vendor-specific error codes on all errors and reports [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) in some cases.
46+
MariaDB error codes are more granular than SQLSTATE, so MariaDB instrumentations SHOULD set the `db.response.status_code` to this known error code.
47+
When SQLSTATE is available, SQLSTATE of "Class 02" or higher SHOULD be considered errors. When SQLSTATE is not available, all MariaDB error codes SHOULD be considered errors.
8048

8149
**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
8250
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.
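
The new MariaDB note on `db.response.status_code` can be read as the classification sketched below. This is an illustration under stated assumptions, not the instrumentation's actual logic: the function names are hypothetical, and "Class 02 or higher" is interpreted here as a plain string comparison on the first two characters of SQLSTATE.

```python
from typing import Optional


def mariadb_status_code(error_code: int) -> str:
    """Record the vendor-specific MariaDB error code as a string."""
    return str(error_code)


def is_mariadb_error(error_code: Optional[int], sqlstate: Optional[str]) -> bool:
    """Classify a MariaDB response as an error.

    When SQLSTATE is available, class "02" or higher counts as an error
    (assumption: classes are compared as two-character strings, so "00"
    and "01" are not errors while "02", "42" and "HY" are). When SQLSTATE
    is not available, any reported error code counts as an error.
    """
    if sqlstate:
        return sqlstate[:2] >= "02"
    return error_code is not None
```

With these helpers, a response carrying error code `1071` and SQLSTATE `42000` is classified as an error and recorded as `"1071"`, while SQLSTATE `01000` (a warning class) is not classified as an error.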
