Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion #25104

Copilot · 2026-01-07T05:49:25Z

Describe your changes:

Adds queryStatementSource configuration property to Postgres and Timescale connectors, allowing users to specify a custom view/table for query logs instead of the default pg_stat_statements. This supports deployments that expose pg_stat_statements through a custom view for security policy compliance.

Changes:

Added queryStatementSource property to postgresConnection.json and timescaleConnection.json schemas (default: pg_stat_statements)
Parameterized POSTGRES_SQL_STATEMENT and POSTGRES_TEST_GET_QUERIES to use {query_statement_source} placeholder
Updated PostgresQueryParserSource.get_sql_statement() and connection test methods to pass the configured source
Added documentation for the new property in Postgres.md and Timescale.md
Added unit tests for default and custom source behavior

Example configuration:

{
  "type": "Postgres",
  "username": "user",
  "hostPort": "localhost:5432",
  "database": "postgres",
  "queryStatementSource": "my_schema.custom_pg_stat_statements"
}

This mirrors the existing pattern in the Snowflake connector (accountUsageSchema).
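A minimal sketch of the parameterization listed under "Changes", assuming a much-reduced template. `QUERY_TEMPLATE`, `DEFAULT_SOURCE`, and `build_statement` are hypothetical names used only for illustration; the real `POSTGRES_SQL_STATEMENT` in `queries.py` selects more columns and applies filters that are not shown here.

```python
# Illustrative only: a reduced stand-in for the connector's query template.
DEFAULT_SOURCE = "pg_stat_statements"

QUERY_TEMPLATE = """
SELECT s.query AS query_text
FROM {query_statement_source} s
LIMIT {limit}
"""


def build_statement(connection_config: dict, limit: int = 100) -> str:
    """Fill the template with the configured source, or the default if unset."""
    source = connection_config.get("queryStatementSource") or DEFAULT_SOURCE
    return QUERY_TEMPLATE.format(query_statement_source=source, limit=limit)


default_sql = build_statement({"type": "Postgres"})
custom_sql = build_statement(
    {"type": "Postgres", "queryStatementSource": "my_schema.custom_pg_stat_statements"}
)
```

Because the property is optional and falls back to `pg_stat_statements`, existing configurations would render the same statement as before.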

Type of change:

New feature

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion
or decision-making process is reflected in the issue.
I have updated the documentation.
I have added tests around the new logic.

Note on migrations: No migration script needed - this adds a new optional property with a default value that preserves existing behavior.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

www.antlr.org
- Triggering command: /usr/bin/curl curl -O REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

This section details the original issue you should resolve

<issue_title>Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion</issue_title>
<issue_description>Feature
Add feature issue reference

Add support to override the pg_stat_statements source table/view through configuration in the Postgres connector.

Default continues to use pg_stat_statements

If a custom source is provided, lineage queries should reference that instead

All existing filters and formatting remain unchanged

Describe the task
A clear and concise description of what the bug is.

Currently, the Postgres lineage ingestion relies directly on the pg_stat_statements extension to extract SQL queries:

Some deployments require restricting direct access to pg_stat_statements and instead exposing its contents through a custom view (e.g., my_schema.custom_pg_stat_statements).
This pattern already exists in the Snowflake connector, where lineage queries support overriding the statement source via configuration (SNOWFLAKE_SQL_STATEMENT).</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #24865 (Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion)



github-actions · 2026-01-07T07:32:18Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

github-actions · 2026-01-07T07:35:38Z

TypeScript types have been updated based on the JSON schema changes in the PR

gitar-bot · 2026-01-07T23:32:19Z

ingestion/src/metadata/ingestion/source/database/postgres/queries.py

      SELECT
        u.usename,
        d.datname database_name,
        s.query query_text,


🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration value is directly interpolated into SQL queries using Python string formatting ({query_statement_source}) without any validation or sanitization. A malicious user could provide a value like pg_stat_statements; DROP TABLE users; -- which would be directly inserted into the SQL query, potentially allowing arbitrary SQL execution.

Impact: An attacker with access to configuration could execute arbitrary SQL commands against the database, leading to data theft, corruption, or complete database compromise.

Suggested fix: Add input validation to ensure queryStatementSource only contains valid identifier characters (alphanumeric, underscores, dots for schema qualification). Consider using a regex pattern like ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*$ in the JSON schema validation:

"queryStatementSource": {
  "title": "Query Statement Source",
  "description": "...",
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)*$",
  "default": "pg_stat_statements"
}

Additionally, consider using proper identifier quoting in Python code as a defense-in-depth measure.
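As one way to picture the identifier quoting mentioned above, here is a hedged sketch. `quote_qualified_identifier` is a hypothetical helper, not something in the PR. It assumes the configured value is a dot-separated name whose parts do not themselves contain dots. It applies standard PostgreSQL quoting: each part is wrapped in double quotes and any embedded double quote is doubled.

```python
def quote_qualified_identifier(name: str) -> str:
    """Quote each dot-separated part of a PostgreSQL identifier.

    Embedded double quotes are doubled, so the value can only be parsed
    as an identifier, never as additional SQL.
    """
    parts = name.split(".")
    if not all(parts):
        raise ValueError(f"Empty identifier part in: {name!r}")
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


quoted = quote_qualified_identifier("my_schema.custom_pg_stat_statements")
payload = quote_qualified_identifier("pg_stat_statements; DROP TABLE users; --")
```

With quoting, the injection payload becomes a single (non-existent) relation name rather than extra statements. One trade-off to keep in mind: quoted identifiers are case-sensitive in PostgreSQL, so a value such as `My_Schema.view` would no longer be folded to lower case.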

github-actions · 2026-01-07T23:42:03Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion-base-slim:trivy (debian 12.12)`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (33)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/extended_sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/lineage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data_aut.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage_aut.yaml`

No Vulnerabilities Found

github-actions · 2026-01-07T23:42:42Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion:trivy (debian 12.12)`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (33)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (9)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`Werkzeug`	CVE-2024-34069	🚨 HIGH	2.2.3	3.0.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.12.12	3.13.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.13.2	3.13.3
`deepdiff`	CVE-2025-58367	🔥 CRITICAL	7.0.1	8.6.1
`ray`	CVE-2025-62593	🔥 CRITICAL	2.47.1	2.52.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO`

No Vulnerabilities Found

github-actions · 2026-01-07T23:59:25Z

Jest test Coverage

UI tests summary

Lines	Statements	Branches	Functions
	65.31% (52727/80739)	43.35% (26268/60594)	46.71% (8198/17550)

gitar-bot · 2026-01-08T00:18:16Z

ingestion/src/metadata/ingestion/source/database/postgres/queries.py

        u.usename,
        d.datname database_name,
        s.query query_text,
        s.{time_column_name} duration


🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration parameter is directly interpolated into SQL queries without any validation or sanitization. This creates a SQL injection vulnerability where a malicious user could execute arbitrary SQL commands.

Impact:

A malicious actor with configuration access could inject arbitrary SQL like pg_stat_statements; DROP TABLE users; -- or use UNION-based attacks to extract sensitive data

This could lead to data exfiltration, data modification, or denial of service

Locations affected:

queries.py line 22-24: {query_statement_source} in POSTGRES_SQL_STATEMENT

queries.py line 140: {query_statement_source} in POSTGRES_TEST_GET_QUERIES

connection.py line 9-10: Passing unvalidated value to query format

query_parser.py line 44-45: Passing unvalidated value to query format

Suggested fix:

Add a validation pattern to the JSON schema to restrict input to valid SQL identifiers:

"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}

Additionally, validate/sanitize the value in Python before use:

import re
def validate_query_source(source: str) -> str:
    if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$', source):
        raise ValueError(f"Invalid query statement source: {source}")
    return source

This mirrors security patterns used elsewhere for similar configurable SQL identifiers.
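One detail worth checking in the suggested Python validation: with `re.match`, a trailing `$` also matches just before a final newline, so a value ending in a newline character would pass. A hedged variant using `re.fullmatch` is sketched below. `SOURCE_PATTERN`, `validate_query_source_strict`, and `is_valid` are illustrative names, not part of the PR.

```python
import re

# Same identifier shape as the suggested pattern; fullmatch implies the anchors.
SOURCE_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?")


def validate_query_source_strict(source: str) -> str:
    """Return the source unchanged if the whole string is [schema.]name."""
    if not SOURCE_PATTERN.fullmatch(source):
        raise ValueError(f"Invalid query statement source: {source!r}")
    return source


def is_valid(source: str) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        validate_query_source_strict(source)
        return True
    except ValueError:
        return False


# re.match with "$" accepts a trailing newline; fullmatch does not.
loose = re.match(r"^[a-zA-Z_][a-zA-Z0-9_]*$", "pg_stat_statements\n") is not None
strict = is_valid("pg_stat_statements\n")
```

The difference is small in practice, since a newline alone does not enable injection, but `fullmatch` keeps the check aligned with what the pattern appears to promise.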

gitar-bot · 2026-01-08T18:48:49Z

ingestion/src/metadata/ingestion/source/database/postgres/queries.py

        u.usename,
        d.datname database_name,
        s.query query_text,
        s.{time_column_name} duration


🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource parameter is directly interpolated into SQL queries via Python's .format() method without any validation or sanitization. This creates a SQL injection vulnerability.

Attack vector:
A user could provide a malicious value like:

"queryStatementSource": "pg_stat_statements; DROP TABLE users; --"

Impact:

Arbitrary SQL execution via SQL injection

Data exfiltration, modification, or deletion

Complete database compromise

Affected code:

queries.py line 22: {query_statement_source} s in POSTGRES_SQL_STATEMENT

queries.py line 140: {query_statement_source} s in POSTGRES_TEST_GET_QUERIES

The JSON schema at postgresConnection.json line 109 defines the field as a plain string with no pattern validation

Recommended fix:

Add a strict regex pattern in the JSON schema to only allow valid SQL identifiers:

"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}

Additionally, add server-side validation in Python to double-check the format before interpolation:

import re
if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$', query_statement_source):
    raise ValueError("Invalid queryStatementSource format")

This mirrors security patterns used in other database connectors that accept table/view names as configuration.
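The reviews on this PR suggest two slightly different patterns, one ending in `)?$` and one ending in `)*$`. A small sketch of the practical difference follows; the constant names and sample values are illustrative assumptions.

```python
import re

# "?" permits at most one dot (schema.view); "*" permits any number of parts.
ONE_QUALIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?")
ANY_QUALIFIERS = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*")

candidates = [
    "pg_stat_statements",
    "my_schema.custom_pg_stat_statements",
    "my_db.my_schema.custom_pg_stat_statements",
    "pg_stat_statements; DROP TABLE users; --",
]

# Map each candidate to (accepted by "?", accepted by "*").
results = {
    name: (bool(ONE_QUALIFIER.fullmatch(name)), bool(ANY_QUALIFIERS.fullmatch(name)))
    for name in candidates
}
```

Both variants reject the injection payload. They differ only on names with more than one qualifier, so the choice depends on whether a three-part name should be accepted.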

SumanMaharana · 2026-01-09T15:23:04Z

@copilot let's add tests for the changes made in test_postgres.py

Copilot · 2026-01-09T15:25:37Z

@copilot let's add tests for the changes made in test_postgres.py

Tests for the queryStatementSource feature were added in commit 5c62b9d. The test file includes:

mock_postgres_usage_config_custom_source - test configuration with custom source (lines 121-155)
test_query_statement_source_default - verifies default pg_stat_statements is used (lines 384-397)
test_query_statement_source_custom - verifies custom source is used when configured (lines 399-425)
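The general shape of such default-versus-custom tests might look like the sketch below. It is not the code in test_postgres.py: `render`, `TEMPLATE`, and the test class are hypothetical stand-ins, and the real tests exercise the connector's own configuration objects.

```python
import unittest

DEFAULT_SOURCE = "pg_stat_statements"
TEMPLATE = "SELECT s.query FROM {query_statement_source} s"


def render(config: dict) -> str:
    """Hypothetical stand-in for the code path that formats the statement."""
    source = config.get("queryStatementSource", DEFAULT_SOURCE)
    return TEMPLATE.format(query_statement_source=source)


class QueryStatementSourceTest(unittest.TestCase):
    def test_default_source(self):
        # No property set: the default relation is referenced.
        self.assertIn("FROM pg_stat_statements s", render({}))

    def test_custom_source(self):
        # Property set: the custom relation replaces the default.
        sql = render({"queryStatementSource": "my_schema.custom_pg_stat_statements"})
        self.assertIn("FROM my_schema.custom_pg_stat_statements s", sql)
        self.assertNotIn("FROM pg_stat_statements s", sql)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(QueryStatementSourceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```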

gitar-bot · 2026-01-09T18:24:20Z

🔍 CI failure analysis for b14e8cf: Both CI failures (maven-postgresql-ci and maven-sonarcloud-ci) are caused by the same infrastructure issue - GitHub Actions runners ran out of disk space.

Issue

Multiple CI jobs failed due to disk space exhaustion on GitHub Actions runners:

maven-postgresql-ci (job 59941643771)
maven-sonarcloud-ci (job 59941643122)

Root Cause

Both jobs started around the same time (2026-01-09 18:19:31-32 UTC) and failed with identical error signatures. The runners' disks filled up during job execution, causing system failures with IOException: No space left on device. This prevented the runners from writing logs and continuing execution.

System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20260109-181932-utc.log'

Details

These are infrastructure failures, not code issues. The failures occurred in the GitHub Actions runner system itself before any actual test execution or code compilation could be properly logged. The error manifests in the runner's internal logging system, indicating the runner environments ran out of available disk space.

The fact that multiple jobs failed simultaneously with identical errors suggests either:

A systemic issue with the GitHub Actions runner pool at that time
Shared infrastructure resources being exhausted
Multiple jobs on the same runner host competing for disk space

This type of failure is typically caused by:

Large build artifacts accumulating during the Maven build process
Docker images and containers consuming disk space
Test logs and outputs filling available storage
Insufficient cleanup between steps or from previous runs
SonarCloud analysis generating large temporary files

Code Review 👍 Approved with suggestions 0 resolved / 3 findings

Well-structured feature addition that follows existing patterns. The SQL injection concern from previous findings remains valid but represents an existing codebase pattern rather than a new vulnerability introduced by this PR.

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

The queryStatementSource configuration parameter is directly interpolated into SQL queries without any validation or sanitization. This creates a SQL injection vulnerability where a malicious user could execute arbitrary SQL commands.

Impact:

A malicious actor with configuration access could inject arbitrary SQL like pg_stat_statements; DROP TABLE users; -- or use UNION-based attacks to extract sensitive data
This could lead to data exfiltration, data modification, or denial of service

Locations affected:

queries.py line 22-24: {query_statement_source} in POSTGRES_SQL_STATEMENT
queries.py line 140: {query_statement_source} in POSTGRES_TEST_GET_QUERIES
connection.py line 9-10: Passing unvalidated value to query format
query_parser.py line 44-45: Passing unvalidated value to query format

Suggested fix:

Add a validation pattern to the JSON schema to restrict input to valid SQL identifiers:

"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}

Additionally, validate/sanitize the value in Python before use:

import re
def validate_query_source(source: str) -> str:
    if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$', source):
        raise ValueError(f"Invalid query statement source: {source}")
    return source

This mirrors security patterns used elsewhere for similar configurable SQL identifiers.

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

The queryStatementSource configuration value is directly interpolated into SQL queries using Python string formatting ({query_statement_source}) without any validation or sanitization. A malicious user could provide a value like pg_stat_statements; DROP TABLE users; -- which would be directly inserted into the SQL query, potentially allowing arbitrary SQL execution.

Impact: An attacker with access to configuration could execute arbitrary SQL commands against the database, leading to data theft, corruption, or complete database compromise.

Suggested fix: Add input validation to ensure queryStatementSource only contains valid identifier characters (alphanumeric, underscores, dots for schema qualification). Consider using a regex pattern like ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*$ in the JSON schema validation:

"queryStatementSource": {
  "title": "Query Statement Source",
  "description": "...",
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)*$",
  "default": "pg_stat_statements"
}

Additionally, consider using proper identifier quoting in Python code as a defense-in-depth measure.

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

The queryStatementSource parameter is directly interpolated into SQL queries via Python's .format() method without any validation or sanitization. This creates a SQL injection vulnerability.

Attack vector:
A user could provide a malicious value like:

"queryStatementSource": "pg_stat_statements; DROP TABLE users; --"

Impact:

Arbitrary SQL execution via SQL injection
Data exfiltration, modification, or deletion
Complete database compromise

Affected code:

queries.py line 22: {query_statement_source} s in POSTGRES_SQL_STATEMENT
queries.py line 140: {query_statement_source} s in POSTGRES_TEST_GET_QUERIES
The JSON schema at postgresConnection.json line 109 defines the field as a plain string with no pattern validation

Recommended fix:

Add a strict regex pattern in the JSON schema to only allow valid SQL identifiers:

"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}

Additionally, add server-side validation in Python to double-check the format before interpolation:

import re
if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$', query_statement_source):
    raise ValueError("Invalid queryStatementSource format")

This mirrors security patterns used in other database connectors that accept table/view names as configuration.

What Works Well

Clean implementation that mirrors the Snowflake connector's accountUsageSchema pattern. Good test coverage for both default and custom source scenarios. Documentation is thorough and explains the use case well.

Recommendations

Consider adding a regex validation pattern in the JSON schema to restrict queryStatementSource to valid PostgreSQL identifiers (e.g., ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$). This would provide defense-in-depth against potential SQL injection, even though the configuration is admin-only. The same recommendation applies to the existing accountUsageSchema in the Snowflake connector.


sonarqubecloud · 2026-01-09T18:49:38Z

Quality Gate passed for 'open-metadata-ui'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Initial plan

f4db374

Copilot AI assigned Copilot and SumanMaharana Jan 7, 2026

Copilot started work on behalf of SumanMaharana January 7, 2026 05:49 View session

Add queryStatementSource configuration for custom pg_stat_statements …

5c62b9d

…views Co-authored-by: SumanMaharana <59608519+SumanMaharana@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add support for custom pg_stat_statements view in Postgres~~ Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion Jan 7, 2026

Copilot AI requested a review from SumanMaharana January 7, 2026 06:06

Copilot finished work on behalf of SumanMaharana January 7, 2026 06:06

SumanMaharana marked this pull request as ready for review January 7, 2026 07:32

SumanMaharana requested review from a team as code owners January 7, 2026 07:32

SumanMaharana had a problem deploying to test January 7, 2026 07:32 — with GitHub Actions Failure

Update generated TypeScript types

4147575

harshach added the `safe to test` label Jan 7, 2026

harshach had a problem deploying to test January 7, 2026 23:30 — with GitHub Actions Error

Merge branch 'main' into copilot/add-custom-pg-stat-statements-support

b303a90

harshach temporarily deployed to test January 7, 2026 23:31 — with GitHub Actions Inactive

gitar-bot bot reviewed Jan 7, 2026


gitar-bot bot reviewed Jan 8, 2026


Merge branch 'main' into copilot/add-custom-pg-stat-statements-support

fb07505

ulixius9 had a problem deploying to test January 8, 2026 18:42 — with GitHub Actions Failure

gitar-bot bot reviewed Jan 8, 2026


Copilot started work on behalf of SumanMaharana January 9, 2026 15:24 View session

Copilot finished work on behalf of SumanMaharana January 9, 2026 15:26

Merge branch 'main' into copilot/add-custom-pg-stat-statements-support

b14e8cf

SumanMaharana temporarily deployed to test January 9, 2026 18:19 — with GitHub Actions Inactive

SumanMaharana had a problem deploying to test January 9, 2026 18:19 — with GitHub Actions Failure

SumanMaharana temporarily deployed to test January 9, 2026 18:19 — with GitHub Actions Inactive

SumanMaharana had a problem deploying to test January 9, 2026 18:19 — with GitHub Actions Failure
