
Copilot AI commented Jan 7, 2026

Describe your changes:

Adds queryStatementSource configuration property to Postgres and Timescale connectors, allowing users to specify a custom view/table for query logs instead of the default pg_stat_statements. This supports deployments that expose pg_stat_statements through a custom view for security policy compliance.

Changes:

  • Added queryStatementSource property to postgresConnection.json and timescaleConnection.json schemas (default: pg_stat_statements)
  • Parameterized POSTGRES_SQL_STATEMENT and POSTGRES_TEST_GET_QUERIES to use a {query_statement_source} placeholder
  • Updated PostgresQueryParserSource.get_sql_statement() and connection test methods to pass the configured source
  • Added documentation for the new property in Postgres.md and Timescale.md
  • Added unit tests for default and custom source behavior
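The placeholder substitution described above can be sketched roughly as follows. This is a hypothetical, heavily simplified stand-in: the real POSTGRES_SQL_STATEMENT selects more columns and applies filters, and `SIMPLIFIED_SQL_STATEMENT`, `build_statement`, and the plain-dict config are illustrative assumptions rather than the connector's actual code.

```python
# Hypothetical, simplified stand-in for POSTGRES_SQL_STATEMENT.
# Only the FROM source is parameterized; the real template has more columns and filters.
SIMPLIFIED_SQL_STATEMENT = """
SELECT
  u.usename,
  d.datname database_name,
  s.query query_text
FROM {query_statement_source} s
JOIN pg_catalog.pg_database d ON s.dbid = d.oid
JOIN pg_catalog.pg_user u ON s.userid = u.usesysid
"""

# Mirrors the schema default for queryStatementSource.
DEFAULT_QUERY_STATEMENT_SOURCE = "pg_stat_statements"


def build_statement(connection_config: dict) -> str:
    """Fill the placeholder from config.

    A missing, None, or empty queryStatementSource falls back to the default.
    """
    source = connection_config.get("queryStatementSource") or DEFAULT_QUERY_STATEMENT_SOURCE
    return SIMPLIFIED_SQL_STATEMENT.format(query_statement_source=source)
```

Comparing the two rendered statements shows that only the FROM source changes, in line with the issue's requirement that existing filters and formatting stay the same. Because `str.format` is used, the configured string is inserted verbatim.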

Example configuration:

{
  "type": "Postgres",
  "username": "user",
  "hostPort": "localhost:5432",
  "database": "postgres",
  "queryStatementSource": "my_schema.custom_pg_stat_statements"
}

This mirrors the existing pattern in the Snowflake connector (accountUsageSchema).

Type of change:

  • New feature

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion
    or decision-making process is reflected in the issue.
  • I have updated the documentation.
  • I have added tests around the new logic.

Note on migrations: No migration script is needed; this adds a new optional property whose default value preserves existing behavior.

Warning

Firewall rules blocked me from connecting to one or more addresses:

I tried to connect to the following addresses, but was blocked by firewall rules:

  • www.antlr.org
    • Triggering command: /usr/bin/curl curl -O REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve.

<issue_title>Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion</issue_title>
<issue_description>Feature

Add support to override the pg_stat_statements source table/view through configuration in the Postgres connector.

  • Default continues to use pg_stat_statements
  • If a custom source is provided, lineage queries should reference that instead
  • All existing filters and formatting remain unchanged

Describe the task

Currently, the Postgres lineage ingestion relies directly on the pg_stat_statements extension to extract SQL queries:

  • Some deployments require restricting direct access to pg_stat_statements and instead exposing its contents through a custom view (e.g., my_schema.custom_pg_stat_statements).
    This pattern already exists in the Snowflake connector, where lineage queries support overriding the statement source via configuration (SNOWFLAKE_SQL_STATEMENT).</issue_description>

Comments on the Issue (you are @copilot in this section)



…views

Co-authored-by: SumanMaharana <59608519+SumanMaharana@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for custom pg_stat_statements view in Postgres" to "Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion" on Jan 7, 2026
Copilot AI requested a review from SumanMaharana January 7, 2026 06:06
@SumanMaharana SumanMaharana marked this pull request as ready for review January 7, 2026 07:32
@SumanMaharana SumanMaharana requested review from a team as code owners January 7, 2026 07:32
@github-actions

github-actions bot commented Jan 7, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions

github-actions bot commented Jan 7, 2026

TypeScript types have been updated based on the JSON schema changes in the PR

SELECT
u.usename,
d.datname database_name,
s.query query_text,

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration value is directly interpolated into SQL queries using Python string formatting ({query_statement_source}) without any validation or sanitization. A malicious user could provide a value like pg_stat_statements; DROP TABLE users; -- which would be directly inserted into the SQL query, potentially allowing arbitrary SQL execution.

Impact: An attacker with access to configuration could execute arbitrary SQL commands against the database, leading to data theft, corruption, or complete database compromise.

Suggested fix: Add input validation to ensure queryStatementSource only contains valid identifier characters (alphanumeric, underscores, dots for schema qualification). Consider using a regex pattern like ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*$ in the JSON schema validation:

"queryStatementSource": {
  "title": "Query Statement Source",
  "description": "...",
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)*$",
  "default": "pg_stat_statements"
}

Additionally, consider using proper identifier quoting in Python code as a defense-in-depth measure.
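The identifier-quoting idea can be sketched with a small hypothetical helper, `quote_qualified_identifier`. It follows PostgreSQL's quoted-identifier rule (wrap the name in double quotes and write an embedded double quote as two) and assumes the value has already passed a pattern check, since splitting on "." would mis-handle a quoted name that itself contains a dot. One caveat: quoted names are case-sensitive, so a view created unquoted as My_View (stored as my_view) must be configured as my_view. psycopg2's `sql.Identifier` offers the same protection if a library helper is preferred.

```python
def quote_qualified_identifier(name: str) -> str:
    """Quote each dot-separated part of a (pre-validated) qualified name.

    Hypothetical helper. PostgreSQL quoted identifiers are wrapped in double
    quotes, and an embedded double quote is written as two double quotes.
    """
    parts = name.split(".")
    # Reject inputs like "schema." or "a..b" that would yield an empty part.
    if any(part == "" for part in parts):
        raise ValueError(f"Empty identifier part in {name!r}")
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)
```

With this applied, the injection payload from the comment would be rendered as one (nonexistent) identifier rather than as additional statements.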

@github-actions

github-actions bot commented Jan 7, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.12)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions

github-actions bot commented Jan 7, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
deepdiff CVE-2025-58367 🔥 CRITICAL 7.0.1 8.6.1
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@github-actions

github-actions bot commented Jan 7, 2026

Jest test Coverage

UI tests summary

Lines: 65%
Statements: 65.31% (52727/80739)
Branches: 43.35% (26268/60594)
Functions: 46.71% (8198/17550)

u.usename,
d.datname database_name,
s.query query_text,
s.{time_column_name} duration

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration parameter is directly interpolated into SQL queries without any validation or sanitization. This creates a SQL injection vulnerability where a malicious user could execute arbitrary SQL commands.

Impact:

  • A malicious actor with configuration access could inject arbitrary SQL like pg_stat_statements; DROP TABLE users; -- or use UNION-based attacks to extract sensitive data
  • This could lead to data exfiltration, data modification, or denial of service

Locations affected:

  • queries.py lines 22-24: {query_statement_source} in POSTGRES_SQL_STATEMENT
  • queries.py line 140: {query_statement_source} in POSTGRES_TEST_GET_QUERIES
  • connection.py lines 9-10: passes the unvalidated value to the query format call
  • query_parser.py lines 44-45: passes the unvalidated value to the query format call

Suggested fix:

  1. Add a validation pattern to the JSON schema to restrict input to valid SQL identifiers:
"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}
  2. Additionally, validate/sanitize the value in Python before use:
import re


def validate_query_source(source: str) -> str:
    # re.fullmatch requires the whole string to match; re.match with "$"
    # would still accept a value that ends in a newline.
    if not re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?', source):
        raise ValueError(f"Invalid query statement source: {source}")
    return source

This mirrors security patterns used elsewhere for similar configurable SQL identifiers.
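A hypothetical helper that performs the check right where the value reaches `.format()` (the connection.py and query_parser.py call sites listed above) might look like this. `format_with_validated_source` and the one-line `TEMPLATE` are illustrative only. The sketch uses `re.fullmatch` because `re.match` with a trailing `$` still accepts a value ending in a newline, and it keeps this comment's single optional schema qualifier; the first comment's `*` variant would also allow three-part names.

```python
import re

# One identifier, optionally schema-qualified (schema.view), as in the suggested pattern.
_SOURCE_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")

# Hypothetical one-line template standing in for the real queries.
TEMPLATE = "SELECT s.query FROM {query_statement_source} s"


def format_with_validated_source(template: str, source: str) -> str:
    """Reject anything that is not a plain (optionally qualified) identifier, then format."""
    if _SOURCE_PATTERN.fullmatch(source) is None:
        raise ValueError(f"Invalid queryStatementSource: {source!r}")
    return template.format(query_statement_source=source)


if __name__ == "__main__":
    print(format_with_validated_source(TEMPLATE, "my_schema.custom_pg_stat_statements"))
    for bad in ("pg_stat_statements; DROP TABLE users; --", "pg_stat_statements\n"):
        try:
            format_with_validated_source(TEMPLATE, bad)
        except ValueError as err:
            print("rejected:", err)
```

Validating at the format call rather than only in the JSON schema also covers configurations that reach the ingestion code without passing through schema validation.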

u.usename,
d.datname database_name,
s.query query_text,
s.{time_column_name} duration

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource parameter is directly interpolated into SQL queries via Python's .format() method without any validation or sanitization. This creates a SQL injection vulnerability.

Attack vector:
A user could provide a malicious value like:

"queryStatementSource": "pg_stat_statements; DROP TABLE users; --"

Impact:

  • Remote code execution via SQL injection
  • Data exfiltration, modification, or deletion
  • Complete database compromise

Affected code:

  • queries.py line 22: {query_statement_source} s in POSTGRES_SQL_STATEMENT
  • queries.py line 140: {query_statement_source} s in POSTGRES_TEST_GET_QUERIES
  • The JSON schema at postgresConnection.json line 109 defines the field as a plain string with no pattern validation

Recommended fix:

  1. Add a strict regex pattern in the JSON schema to only allow valid SQL identifiers:
"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}
  2. Additionally, add server-side validation in Python to double-check the format before interpolation:
import re

# query_statement_source holds the configured value; fullmatch also rejects a trailing newline
if not re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?', query_statement_source):
    raise ValueError("Invalid queryStatementSource format")

This mirrors security patterns used in other database connectors that accept table/view names as configuration.

@SumanMaharana

@copilot let's add tests for the changes made in test_postgres.py


Copilot AI commented Jan 9, 2026

Tests for the queryStatementSource feature were added in commit 5c62b9d. The test file includes:

  • mock_postgres_usage_config_custom_source - test configuration with custom source (lines 121-155)
  • test_query_statement_source_default - verifies default pg_stat_statements is used (lines 384-397)
  • test_query_statement_source_custom - verifies custom source is used when configured (lines 399-425)

@gitar-bot
gitar-bot bot commented Jan 9, 2026

🔍 CI failure analysis for b14e8cf: Both CI failures (maven-postgresql-ci and maven-sonarcloud-ci) are caused by the same infrastructure issue: the GitHub Actions runners ran out of disk space.

Issue

Multiple CI jobs failed due to disk space exhaustion on GitHub Actions runners:

  • maven-postgresql-ci (job 59941643771)
  • maven-sonarcloud-ci (job 59941643122)

Root Cause

Both jobs started around the same time (2026-01-09 18:19:31-32 UTC) and failed with identical error signatures. The runners' disks filled up during job execution, causing system failures with IOException: No space left on device. This prevented the runners from writing logs and continuing execution.

System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/_diag/Worker_20260109-181932-utc.log'

Details

These are infrastructure failures, not code issues. The failures occurred in the GitHub Actions runner system itself before any actual test execution or code compilation could be properly logged. The error manifests in the runner's internal logging system, indicating the runner environments ran out of available disk space.

The fact that multiple jobs failed simultaneously with identical errors suggests one of the following:

  • A systemic issue with the GitHub Actions runner pool at that time
  • Shared infrastructure resources being exhausted
  • Multiple jobs on the same runner host competing for disk space

This type of failure is typically caused by:

  • Large build artifacts accumulating during the Maven build process
  • Docker images and containers consuming disk space
  • Test logs and outputs filling available storage
  • Insufficient cleanup between steps or from previous runs
  • SonarCloud analysis generating large temporary files

Code Review 👍 Approved with suggestions 0 resolved / 3 findings

Well-structured feature addition that follows existing patterns. The SQL injection concern from previous findings remains valid but represents an existing codebase pattern rather than a new vulnerability introduced by this PR.

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

What Works Well

Clean implementation that mirrors the Snowflake connector's accountUsageSchema pattern. Good test coverage for both default and custom source scenarios. Documentation is thorough and explains the use case well.

Recommendations

Consider adding a regex validation pattern in the JSON schema to restrict queryStatementSource to valid PostgreSQL identifiers (e.g., ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$). This would provide defense-in-depth against potential SQL injection, even though the configuration is admin-only. The same recommendation applies to the existing accountUsageSchema in the Snowflake connector.

@sonarqubecloud
sonarqubecloud bot commented Jan 9, 2026

Labels

safe to test Add this label to run secure Github workflows on PRs

Development

Successfully merging this pull request may close these issues.

Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion

4 participants