
[CI Failure]: mi325_1: Entrypoints Integration Test (API Server) #29541

@AndreasKaratzas

Description

Name of failing test

pytest -v -s entrypoints/openai/test_collective_rpc.py; pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/test_collective_rpc.py --ignore=entrypoints/openai/tool_parsers/; pytest -v -s entrypoints/test_chat_utils.py

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

Failing Tests Summary:

test_abort_metrics_reset in test_metrics.py
Tests: Metrics reset after request abort with frontend multiprocessing disabled
Failure: AssertionError
Configuration: --disable-frontend-multiprocessing flag
Likely cause: Metrics tracking not properly resetting abort counts when frontend multiprocessing is disabled, possible state management issue in metrics collection
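For reference, a minimal sketch (not the actual test) of how this behavior can be probed against a locally running server: stream a completion, drop the connection mid-stream to abort it, then check that the request gauges on /metrics fall back to zero. The model name is a placeholder; the gauge names are the standard vLLM Prometheus metrics.

```python
import time

import requests

BASE = "http://localhost:8000"

def gauge(name: str) -> float:
    # Parse a single gauge value out of the Prometheus text exposition.
    for line in requests.get(f"{BASE}/metrics").text.splitlines():
        if line.startswith(name):
            return float(line.rsplit(" ", 1)[-1])
    return 0.0

# Start a streaming request and abort it by closing the connection early.
resp = requests.post(
    f"{BASE}/v1/completions",
    json={"model": "my-model", "prompt": "Hello", "max_tokens": 512, "stream": True},
    stream=True,
)
next(resp.iter_lines())  # read one chunk so the request is actually running
resp.close()             # client-side abort

time.sleep(1.0)  # give the server a moment to process the abort
assert gauge("vllm:num_requests_running") == 0.0
assert gauge("vllm:num_requests_waiting") == 0.0
```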

test_openapi_stateless[POST /tokenize] in test_openai_schema.py
Tests: OpenAPI schema validation for tokenize endpoint using schemathesis
Failure: SUBFAIL during schema validation
Configuration: Stateless endpoint validation with generated test cases
Likely cause: Schema mismatch between OpenAPI spec and actual tokenize endpoint behavior, possibly incorrect request/response format or missing field validation
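A quick way to narrow this down without schemathesis is to send the documented request shape by hand and compare the response fields against /openapi.json. A sketch, assuming a local server; the model name is a placeholder and the expected response fields reflect my reading of vLLM's tokenize schema:

```python
import requests

BASE = "http://localhost:8000"

# Pull the spec the schemathesis run validates against.
spec = requests.get(f"{BASE}/openapi.json").json()
tokenize_schema = spec["paths"]["/tokenize"]["post"]
print(sorted(tokenize_schema.get("responses", {})))  # e.g. ['200', '422']

resp = requests.post(
    f"{BASE}/tokenize",
    json={"model": "my-model", "prompt": "Hello, world!"},
)
resp.raise_for_status()
body = resp.json()
# Expected fields per the spec: token ids, their count, and the model length.
print(body.get("count"), body.get("tokens"), body.get("max_model_len"))
```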

test_mcp_tool_env_flag_enabled in test_response_api_mcp_tools.py
Tests: MCP (Model Context Protocol) tool functionality with environment flag
Failure: Test failure for openai/gpt-oss-20b model
Configuration: model=openai/gpt-oss-20b with MCP tools enabled
Likely cause: MCP tool integration not functioning correctly for gpt-oss-20b, possibly missing tool server initialization or incorrect tool call format

test_empty_file, test_embeddings, test_score in test_run_batch.py
Tests: Batch processing for empty files, embeddings generation, and score/rerank endpoints
Failure: AssertionError across multiple batch API endpoints
Configuration: Batch API with /score, /rerank, /v1/score, /v2/rerank endpoints
Likely cause: Batch processing API implementation issues with request formatting, response handling, or endpoint routing for score/rerank operations
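The batch runner consumes OpenAI-style batch JSONL (one request per line with custom_id/method/url/body). A sketch of an input file for the score endpoint, for anyone reproducing locally; the model name and texts are placeholders, and the body uses vLLM's text_1/text_2 score fields:

```python
import json

lines = [
    {
        "custom_id": f"score-{i}",
        "method": "POST",
        "url": "/v1/score",
        "body": {
            "model": "BAAI/bge-reranker-v2-m3",
            "text_1": "What is the capital of France?",
            "text_2": candidate,
        },
    }
    for i, candidate in enumerate(
        ["Paris is the capital of France.", "Berlin is in Germany."]
    )
]

with open("batch_input.jsonl", "w") as f:
    for line in lines:
        f.write(json.dumps(line) + "\n")

# Then run, e.g.:
#   python -m vllm.entrypoints.openai.run_batch \
#       -i batch_input.jsonl -o batch_output.jsonl --model BAAI/bge-reranker-v2-m3
```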

test_same_response_as_chat_completions in test_serving_tokens.py
Tests: Token serving consistency with chat completions API
Failure: Response mismatch between token serving and chat completions
Configuration: Comparing token-based and chat-based API responses
Likely cause: Token serving endpoint producing different output format or content than chat completions, inconsistent tokenization or response formatting

test_basic_audio_with_lora in test_transcription_validation.py and test_translation_validation.py
Tests: Audio transcription/translation with LoRA adapter loading
Failure: LoRA integration failure for speech models
Configuration: model=ibm-granite/granite-speech-3.3-2b with speech LoRA adapter
Likely cause: LoRA adapter loading failing for audio models, possibly incompatible adapter format or missing LoRA runtime initialization for audio modalities
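These tests go through the OpenAI-compatible /v1/audio/transcriptions route, selecting the LoRA adapter by its registered name. A sketch of the request shape; the adapter name "speech", adapter path, and audio file are placeholders, and it assumes the server was started with LoRA enabled:

```python
# Assumes a server started with something like:
#   vllm serve ibm-granite/granite-speech-3.3-2b \
#       --enable-lora --lora-modules speech=<adapter-path>
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="speech",  # LoRA adapter name, not the base model
        file=audio,
    )
print(result.text)
```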

test_single_chat_session_image_base64encoded_beamsearch in test_vision.py
Tests: Vision model inference with base64 encoded images using beam search
Failure: Beam search with vision inputs for Phi-3.5-vision-instruct
Configuration: n=2, beam_search=True, model=microsoft/Phi-3.5-vision-instruct, image_idx=3
Likely cause: Beam search implementation not correctly handling multimodal (vision) inputs, possible image tensor duplication issue or incorrect beam state management
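A sketch of the failing request shape: a chat completion carrying a base64 data-URL image with n=2 and beam search requested through extra_body (my understanding of how the vLLM sampling extension is passed; the image path and prompt are placeholders):

```python
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("image.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="microsoft/Phi-3.5-vision-instruct",
    n=2,
    max_tokens=64,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
    extra_body={"use_beam_search": True},  # vLLM-specific sampling option
)
for choice in completion.choices:
    print(choice.message.content)
```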

test_single_request in test_vision_embeds.py
Tests: Vision embedding generation for geospatial model with custom inputs
Failure: Embedding pooling for Prithvi model with pixel_values and location_coords
Configuration: model=ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11, runner=pooling, enable-mm-embeds
Likely cause: Custom vision embedding format not properly handled; possible TerraTorch implementation issue with pixel_values/location_coords tensor processing

ERROR tests in test_optional_middleware.py (7 tests)
Tests: API middleware for authentication and request ID headers
Failure: RuntimeError during test execution
Configuration: Various --api-key and --enable-request-id-headers configurations
Likely cause: Server fixture initialization failing (timeout or server startup failure), preventing all parameterized middleware tests from executing
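If the server itself starts fine, the two middlewares are easy to probe directly. A sketch, assuming the server was started with --api-key secret and --enable-request-id-headers; the X-Request-Id header name is an assumption on my part:

```python
import requests

BASE = "http://localhost:8000"

# Auth middleware: a protected route should reject a missing key ...
assert requests.get(f"{BASE}/v1/models").status_code == 401
# ... and accept the configured one.
ok = requests.get(f"{BASE}/v1/models",
                  headers={"Authorization": "Bearer secret"})
assert ok.status_code == 200

# Request-ID middleware: a caller-supplied ID should be echoed back.
rid = requests.get(f"{BASE}/v1/models",
                   headers={"Authorization": "Bearer secret",
                            "X-Request-Id": "test-123"})
assert rid.headers.get("X-Request-Id") == "test-123"
```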

ERROR tests in test_response_api_with_harmony.py (26 tests)
Tests: Harmony API integration for stateful responses with tools, streaming, code interpreter
Failure: RuntimeError during server initialization for all test variants
Configuration: model=openai/gpt-oss-20b with various Harmony API features
Likely cause: Server failing to start for the gpt-oss-20b model with Harmony features enabled; possibly missing dependencies, a model-loading timeout, or a Harmony API initialization failure that prevents the entire test module from executing

📝 History of failing test

AMD CI Buildkite build references:

  • 1041
  • 1077
  • 1088
  • 1109
  • 1111

CC List

No response
