[Import-Document-AI] STIX 2.1 model support gaps #4907

# STIX 2.1 model support gaps in ImportDocumentAI connector

## Description

The ImportDocumentAI connector intermittently fails to emit valid inter-entity STIX 2.1 relationships and omits support for several STIX object types and workflows. In practice this shows up as:

* LLM-predicted relationships between SDOs (e.g., `uses`, `attributed-to`) do not appear in OpenCTI; only item→container links are created.
* Some invalid relationships being attempted against the API (e.g., wrong type pairs) instead of being pre-validated and skipped.
* Limited support for STIX objects (SDOs/SCOs) commonly extracted from reports, resulting in missing coverage or dropped objects.
* In span-based extraction mode, relationships are lost when temporary span IDs are not resolved to final STIX IDs before bundling.

Collectively, these gaps reduce fidelity of the imported knowledge graph and create noisy import behavior.

## Environment

1. OS (where OpenCTI server runs): {e.g., Ubuntu 22.04}
2. OpenCTI version: {e.g., 6.x.y}
3. OpenCTI client: Connector – ImportDocumentAI
4. Other environment details: {DB/Queue, Python version, connector image tag}

## Reproducible Steps

Smallest reproducible scenario using a short report that mentions a known intrusion set, malware family, and TTPs:

1. Ingest a PDF or text document with content such as:

   * “APT Foo (aka Bar) uses MalwareX to deliver payloads via Phishing (T1566). They also leverage Web Protocols (T1071.001). Activity is attributed to Threat Group Z.”
2. Configure ImportDocumentAI to use the LLM extraction path (OpenAI/Azure OpenAI) with span-based output enabled (default in many deployments).
3. Run the connector so it extracts entities and predicted relationships.
4. Inspect the resulting Report in OpenCTI and the import job logs.

## Expected Output

* Entities/observables are created for the intrusion set (or threat-actor), malware, and ATT&CK attack-patterns (`T1566`, `T1071.001`), plus extracted observables.
* Inter-entity relationships are emitted and visible in the graph, e.g.:

  * `intrusion-set --(uses)--> malware`
  * `intrusion-set --(uses)--> attack-pattern`
  * `intrusion-set --(attributed-to)--> threat-actor`
* Invalid relationships (e.g., `domain-name --(located-at)--> country`) are pre-validated against the OpenCTI schema and **not** sent to the API.
* Container (Report / Grouping / Case) includes `object_refs` to all created objects, with correct “related-to” back-links where appropriate.
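
The pre-validation described above could look roughly like the sketch below. It assumes relationships are plain dicts with STIX-style `source_ref`, `target_ref`, and `relationship_type` keys. `ALLOWED_RELATIONS` and `partition_relations` are hypothetical names, and the allow-list is a small illustrative subset, not the actual OpenCTI relation matrix.

```python
# Hypothetical allow-list: (source type, relationship type) -> allowed target types.
# Illustrative subset only; a real connector would load the OpenCTI relation matrix.
ALLOWED_RELATIONS = {
    ("intrusion-set", "uses"): {"malware", "attack-pattern", "tool"},
    ("intrusion-set", "attributed-to"): {"threat-actor"},
    ("malware", "uses"): {"attack-pattern", "tool", "malware"},
}


def stix_type(stix_id: str) -> str:
    """Return the type prefix of a STIX ID such as 'malware--<uuid>'."""
    return stix_id.split("--", 1)[0]


def partition_relations(relations):
    """Split predicted relations into (valid, skipped) using the allow-list."""
    valid, skipped = [], []
    for rel in relations:
        key = (stix_type(rel["source_ref"]), rel["relationship_type"])
        if stix_type(rel["target_ref"]) in ALLOWED_RELATIONS.get(key, set()):
            valid.append(rel)
        else:
            # Skipped relations are returned so they can be logged, not sent to the API.
            skipped.append(rel)
    return valid, skipped
```

Returning the skipped relations, rather than dropping them, lets the connector log one concise line per skipped edge instead of producing an API error.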

## Actual Output

* The connector often creates only item→container links (e.g., report-level `related-to`) and **drops** valid predicted relationships between SDOs.
* Some invalid relationships are attempted and rejected by the API with errors similar to:
  `{'name': 'FUNCTIONAL_ERROR', 'error_message': 'Only stix-core-relationship can be created through this method.'}`
* In span-based mode, relationships are silently lost when temporary span IDs (e.g., `from_id`/`to_id` tokens) are not mapped to the final STIX IDs in the bundle.
* ATT&CK IDs or names may not resolve to existing `attack-pattern` SDOs, leading to missing TTP relationships or duplicated custom objects.
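
The span-ID problem above can be illustrated with a minimal remapping sketch. It assumes predicted relations carry `from_id`/`to_id` span tokens as described, plus a `label` field for the relation type (the `label` key is an assumption), and that the connector keeps a mapping from span ID to final STIX ID. `remap_relations` is a hypothetical helper, not a function in the connector.

```python
def remap_relations(predicted, span_to_stix):
    """Rewrite temporary span IDs to final STIX IDs.

    Returns (resolved, unresolved) so that unmapped relations are surfaced
    to the caller instead of being silently lost.
    """
    resolved, unresolved = [], []
    for rel in predicted:
        source = span_to_stix.get(rel["from_id"])
        target = span_to_stix.get(rel["to_id"])
        if source is None or target is None:
            unresolved.append(rel)
            continue
        resolved.append(
            {
                "relationship_type": rel["label"],
                "source_ref": source,
                "target_ref": target,
            }
        )
    return resolved, unresolved
```

Logging the `unresolved` list at warning level would make this failure mode visible in the import job logs.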

## Additional information

### Observable symptoms from logs (illustrative)

* Predicted relationships recognized by the extraction step (e.g., `INTRUSION-SET -[USES]-> MALWARE`, `INTRUSION-SET -[USES]-> ATTACK-PATTERN`) but not present in the final graph.
* Invalid relations (e.g., `DOMAIN-NAME -[LOCATED-AT]-> COUNTRY`) reach the API and are rejected there; the connector should filter them client-side using the OpenCTI relation matrix.

### Likely root causes in the current connector

* **No ID remap before bundling (span mode):** predicted relations reference temporary span UUIDs; if these aren’t remapped to the final STIX IDs, the edges are dropped.
* **No pre-validation of relation types:** the connector does not consistently consult the OpenCTI schema relation mapping, so it may both discard valid relations (over-filtering) and attempt invalid ones (under-filtering).
* **Limited STIX coverage:** several commonly-seen SDOs/SCOs are not emitted or are emitted inconsistently, reducing graph completeness.
* **ATT&CK resolution not robust:** lack of a cache or name/ID dual lookup leads to missed `attack-pattern` reuse and broken `uses` links.
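
For the ATT&CK point, one possible shape for a cached dual lookup is sketched below. It assumes the existing `attack-pattern` objects have already been fetched once into a list of dicts with hypothetical keys `stix_id`, `external_id`, and `name`. `AttackPatternIndex` is an illustrative name, and how the records are fetched is left out.

```python
import re

# Matches technique IDs such as T1566 or sub-techniques such as T1071.001.
ATTACK_ID_RE = re.compile(r"\bT\d{4}(?:\.\d{3})?\b", re.IGNORECASE)


class AttackPatternIndex:
    """In-memory cache that resolves a mention by ATT&CK ID first, then by name."""

    def __init__(self, patterns):
        self._by_id = {}
        self._by_name = {}
        for pattern in patterns:
            if pattern.get("external_id"):
                self._by_id[pattern["external_id"].upper()] = pattern["stix_id"]
            self._by_name[pattern["name"].strip().casefold()] = pattern["stix_id"]

    def resolve(self, mention):
        """Return the STIX ID for a mention like 'Phishing (T1566)', or None."""
        match = ATTACK_ID_RE.search(mention)
        if match:
            found = self._by_id.get(match.group(0).upper())
            if found:
                return found
        # Fall back to a case-insensitive exact name match.
        return self._by_name.get(mention.strip().casefold())
```

A `None` result signals that no existing object matched, so the caller can decide whether to skip the `uses` link or create a new object, rather than duplicating one by default.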

### Impact

* Relationship loss results in incomplete knowledge graphs, weaker analytics, and user confusion (entities appear, but inter-entity context is missing).
* API errors add noise to import logs and slow down bulk processing.
