Merged
Changes from 14 commits
10 changes: 10 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -89,6 +89,16 @@ DataHub is a **schema-first, event-driven metadata platform** with three core la
- Frontend: Tests in `__tests__/` or `.test.tsx` files
- Smoke tests go in the `smoke-test/` directory

#### Security Testing: Configuration Property Classification

**Critical test**: `metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java`

This test prevents sensitive data leaks by requiring explicit classification of all configuration properties as either sensitive (redacted) or non-sensitive (visible in system info).

**When adding new configuration properties**: The test will fail with clear instructions on which classification list to add your property to. Refer to the test file's comprehensive documentation for template syntax and examples.

This is a mandatory security guardrail - never disable or skip this test.

### Commits

- Follow Conventional Commits format for commit messages
22 changes: 22 additions & 0 deletions docs/developers.md
Original file line number Diff line number Diff line change
@@ -261,3 +261,25 @@ git reset --hard
```

See also [here](https://stackoverflow.com/questions/5917249/git-symbolic-links-in-windows/59761201#59761201) for more information on how to enable symbolic links on Windows 10/11 and Git.

## Security Testing

### Configuration Property Classification Test

**Location**: `metadata-io/src/test/java/com/linkedin/metadata/system_info/collectors/PropertiesCollectorConfigurationTest.java`

This test ensures all configuration properties are explicitly classified as either sensitive (redacted) or non-sensitive (visible in system info). It prevents accidental exposure of secrets through DataHub's system information endpoints.

**When you add new configuration properties:**

1. The test will fail if your property is unclassified
2. Follow the test failure message to add your property to the appropriate classification list
3. When in doubt, classify as sensitive - it's safer to over-redact than to expose secrets
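
The workflow above amounts to a set-difference check: any property key that appears in neither classification list is reported. The following is a minimal sketch of that idea, not the actual test. The class `ClassificationCheck`, the method `findUnclassified`, and the sample property keys are hypothetical, and template matching is left out for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; the real logic lives in PropertiesCollectorConfigurationTest.
final class ClassificationCheck {

  private ClassificationCheck() {}

  /** Returns every key that appears in neither classification set, sorted for stable output. */
  static Set<String> findUnclassified(
      Set<String> allKeys, Set<String> sensitive, Set<String> nonSensitive) {
    Set<String> unclassified = new TreeSet<>(allKeys);
    unclassified.removeAll(sensitive);
    unclassified.removeAll(nonSensitive);
    return unclassified;
  }

  public static void main(String[] args) {
    // Invented sample keys; "new.feature.token" stands in for a freshly added property.
    Set<String> all =
        new HashSet<>(Arrays.asList("db.password", "server.port", "new.feature.token"));
    Set<String> sensitive = new HashSet<>(Arrays.asList("db.password"));
    Set<String> nonSensitive = new HashSet<>(Arrays.asList("server.port"));

    Set<String> unclassified = findUnclassified(all, sensitive, nonSensitive);
    if (!unclassified.isEmpty()) {
      // A real test would fail here and name the list the developer should update.
      System.out.println("Unclassified properties: " + unclassified);
    }
  }
}
```

Failing on unclassified keys, rather than defaulting them to visible, is what makes the check fail closed.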

**Run the test:**

```bash
./gradlew :metadata-io:test --tests "*.PropertiesCollectorConfigurationTest"
```

Refer to the test file itself for comprehensive documentation on classification lists, template syntax, and examples. This is a mandatory security guardrail that protects against credential leaks.
23 changes: 23 additions & 0 deletions metadata-io/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Metadata IO Module

This module contains the core metadata I/O services for DataHub, including system information collection and property management.

## Security: Configuration Property Classification

**Critical Test**: `PropertiesCollectorConfigurationTest` enforces that all configuration properties are explicitly classified as either sensitive (redacted) or non-sensitive (visible in system info).

**Why**: Prevents accidental exposure of secrets through DataHub's system information endpoints.

**When adding new properties**: The test will fail with instructions on which classification list to add your property to. The test file contains comprehensive documentation on:

- The four classification lists (sensitive/non-sensitive, exact/template)
- Template syntax for dynamic properties (`[*]` for indices, `*` for segments)
- Security guidelines and examples
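
To make the template syntax concrete, here is a minimal sketch of how such templates could be matched against property keys. It is not the matcher used by the test. It assumes `[*]` stands for a bracketed numeric index and `*` for exactly one dot-free segment; the class `PropertyTemplate` and the sample keys are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical matcher for templates such as "kafka.brokers[*].password" or "cache.*.ttl".
final class PropertyTemplate {

  private PropertyTemplate() {}

  /** Translates a template into a regular expression under the assumptions stated above. */
  static Pattern compile(String template) {
    StringBuilder regex = new StringBuilder();
    int i = 0;
    while (i < template.length()) {
      if (template.startsWith("[*]", i)) {
        regex.append("\\[\\d+\\]"); // assumed: a numeric index in square brackets
        i += 3;
      } else if (template.charAt(i) == '*') {
        regex.append("[^.]+"); // assumed: one non-empty segment without dots
        i += 1;
      } else {
        // Every other character is matched literally, including dots.
        regex.append(Pattern.quote(String.valueOf(template.charAt(i))));
        i += 1;
      }
    }
    return Pattern.compile(regex.toString());
  }

  /** Returns true when the whole key matches the template. */
  static boolean matches(String template, String key) {
    return compile(template).matcher(key).matches();
  }

  public static void main(String[] args) {
    System.out.println(matches("kafka.brokers[*].password", "kafka.brokers[0].password"));
    System.out.println(matches("cache.*.ttl", "cache.entity.ttl"));
    System.out.println(matches("cache.*.ttl", "cache.a.b.ttl"));
  }
}
```

Quoting literal characters keeps a `.` in a template from matching arbitrary characters, which would otherwise let one template classify unrelated keys.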

**Test Command**:

```bash
./gradlew :metadata-io:test --tests "*.PropertiesCollectorConfigurationTest"
```

**Security Rule**: When in doubt, classify as sensitive. This test is a mandatory security guardrail - never disable it.
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
package com.linkedin.metadata.system_info;

import com.fasterxml.jackson.annotation.JsonInclude;
import java.util.Map;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class ComponentInfo {
  private String name;
  private ComponentStatus status;
  private String version;
  private Map<String, Object> properties;
  private String errorMessage;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
package com.linkedin.metadata.system_info;

public enum ComponentStatus {
  AVAILABLE,
  UNAVAILABLE,
  ERROR
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
package com.linkedin.metadata.system_info;

import com.fasterxml.jackson.annotation.JsonInclude;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class PropertyInfo {
  private String key;
  private Object value;
  private String source;
  private String sourceType;
  private String resolvedValue;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
package com.linkedin.metadata.system_info;

import com.fasterxml.jackson.annotation.JsonInclude;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class PropertySourceInfo {
  private String name;
  private String type;
  private int propertyCount;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
package com.linkedin.metadata.system_info;

import com.fasterxml.jackson.annotation.JsonInclude;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class SpringComponentsInfo {
  private ComponentInfo gms;
  private ComponentInfo maeConsumer;
  private ComponentInfo mceConsumer;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
package com.linkedin.metadata.system_info;

/** Constants for system information components */
public class SystemInfoConstants {

  // Component names
  public static final String GMS_COMPONENT_NAME = "GMS";
  public static final String MAE_COMPONENT_NAME = "MAE Consumer";
  public static final String MCE_COMPONENT_NAME = "MCE Consumer";

  // Component keys for remote fetching
  public static final String GMS_COMPONENT_KEY = "gms";
  public static final String MAE_COMPONENT_KEY = "maeConsumer";
  public static final String MCE_COMPONENT_KEY = "mceConsumer";

  private SystemInfoConstants() {
    // Utility class - no instantiation
  }
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
package com.linkedin.metadata.system_info;

/** Exception thrown when system information collection fails */
public class SystemInfoException extends RuntimeException {

  public SystemInfoException(String message) {
    super(message);
  }

  public SystemInfoException(String message, Throwable cause) {
    super(message, cause);
  }
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
package com.linkedin.metadata.system_info;

import com.fasterxml.jackson.annotation.JsonInclude;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class SystemInfoResponse {
  private SpringComponentsInfo springComponents;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
package com.linkedin.metadata.system_info;

import com.linkedin.metadata.system_info.collectors.PropertiesCollector;
import com.linkedin.metadata.system_info.collectors.SpringComponentsCollector;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.annotation.PreDestroy;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

/**
 * Service for collecting and providing system information.
 *
 * <p>This service orchestrates the collection of system information from various sources including:
 *
 * <ul>
 *   <li>Spring component status (GMS, MAE Consumer, MCE Consumer)
 *   <li>Spring application configuration properties (available via separate methods)
 *   <li>System properties and environment variables (available via separate methods)
 * </ul>
 *
 * <p><strong>API Design:</strong>
 *
 * <ul>
 *   <li>The main getSystemInfo() method returns only Spring component information
 *   <li>Detailed system properties are available via separate getSystemPropertiesInfo() method
 *   <li>This separation avoids duplication and improves response clarity
 * </ul>
 *
 * <p><strong>Security Considerations:</strong>
 *
 * <ul>
 *   <li>This service exposes sensitive system configuration data
 *   <li>Access should be restricted to administrators with MANAGE_SYSTEM_OPERATIONS_PRIVILEGE
 *   <li>Sensitive properties (passwords, secrets, keys) are automatically redacted
 *   <li>Property filtering is applied to prevent accidental exposure of credentials
 * </ul>
 *
 * <p><strong>Performance:</strong>
 *
 * <ul>
 *   <li>Uses parallel execution for improved performance
 *   <li>Includes timeouts for remote component fetching
 *   <li>Graceful degradation when components are unavailable
 * </ul>
 *
 * @see SystemInfoController for REST API endpoints
 * @see PropertiesCollector for configuration property collection
 * @see SpringComponentsCollector for component status collection
 */
@Slf4j
@Service
@RequiredArgsConstructor
public class SystemInfoService {

  // Thread pool for parallel execution
  private final ExecutorService executorService = Executors.newFixedThreadPool(10);

  // Collectors
  private final SpringComponentsCollector springComponentsCollector;
  private final PropertiesCollector propertiesCollector;

  /**
   * Get Spring components information in parallel.
   *
   * @return SpringComponentsInfo containing status of GMS, MAE Consumer, and MCE Consumer
   */
  public SpringComponentsInfo getSpringComponentsInfo() {
    return springComponentsCollector.collect(executorService);
  }

  /**
   * Get all system properties with detailed metadata.
   *
   * <p>Returns comprehensive property information including:
   *
   * <ul>
   *   <li>Individual property details with sources and resolution
   *   <li>Property source metadata
   *   <li>Filtering and redaction statistics
   * </ul>
   *
   * @return SystemPropertiesInfo with detailed property metadata
   */
  public SystemPropertiesInfo getSystemPropertiesInfo() {
    return propertiesCollector.collect();
  }

  /**
   * Get only configuration properties as a simple map (for backward compatibility).
   *
   * <p>This method provides a simplified view of system properties without metadata, suitable for
   * legacy integrations or simple configuration debugging.
   *
   * @return Map of property keys to resolved values
   */
  public Map<String, Object> getPropertiesAsMap() {
    return propertiesCollector.getPropertiesAsMap();
  }

  /**
   * Get system information - spring components only.
   *
   * <p>This method retrieves Spring component information including GMS, MAE Consumer, and MCE
   * Consumer status. For detailed system properties information, use the separate
   * getSystemPropertiesInfo() method or call the /properties endpoint directly.
   *
   * @return SystemInfoResponse containing component information
   * @throws SystemInfoException if collection fails or times out
   */
  public SystemInfoResponse getSystemInfo() {
    try {
      SpringComponentsInfo springComponents = getSpringComponentsInfo();
      return SystemInfoResponse.builder().springComponents(springComponents).build();
    } catch (Exception e) {
      log.error("Error collecting system info", e);
      throw new SystemInfoException("Failed to collect system information", e);
    }
  }

  @PreDestroy
  public void shutdown() {
    executorService.shutdown();
    try {
      if (!executorService.awaitTermination(5, TimeUnit.SECONDS)) {
        executorService.shutdownNow();
      }
    } catch (InterruptedException e) {
      executorService.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }
}
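
The Javadoc for `SystemInfoService` mentions parallel execution, timeouts, and graceful degradation, but `SpringComponentsCollector` is not part of this excerpt. The sketch below shows one way those three properties could fit together using only `java.util.concurrent`. It is an illustration under assumptions, not the collector's implementation: the class `ParallelCollectSketch`, its methods, and the choice to map a timeout to `UNAVAILABLE` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch; not the SpringComponentsCollector from this change.
final class ParallelCollectSketch {

  // Mirrors the ComponentStatus values introduced in this change.
  enum Status { AVAILABLE, UNAVAILABLE, ERROR }

  private ParallelCollectSketch() {}

  /** Waits for one fetch and maps its outcome to a status instead of propagating the failure. */
  static Status await(CompletableFuture<String> future, long timeoutMillis) {
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return Status.AVAILABLE;
    } catch (TimeoutException e) {
      // Marks the future as cancelled; this does not interrupt the task that is still running.
      future.cancel(true);
      return Status.UNAVAILABLE; // assumption: a timeout is reported as "unavailable"
    } catch (ExecutionException e) {
      return Status.ERROR;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Status.ERROR;
    }
  }

  /** Starts every fetch first so they run concurrently, then waits for each in turn. */
  static List<Status> collect(
      ExecutorService executor, long timeoutMillis, List<Supplier<String>> fetchers) {
    List<CompletableFuture<String>> futures = new ArrayList<>();
    for (Supplier<String> fetcher : fetchers) {
      futures.add(CompletableFuture.supplyAsync(fetcher, executor));
    }
    List<Status> statuses = new ArrayList<>();
    for (CompletableFuture<String> future : futures) {
      statuses.add(await(future, timeoutMillis)); // each wait gets its own timeout
    }
    return statuses;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      List<Supplier<String>> fetchers = new ArrayList<>();
      fetchers.add(() -> "gms ok");
      fetchers.add(() -> { throw new IllegalStateException("consumer down"); });
      System.out.println(collect(executor, 500, fetchers));
    } finally {
      executor.shutdown();
    }
  }
}
```

Because a failed or slow fetch becomes a status value rather than an exception, one unhealthy component does not prevent the others from being reported.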
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
package com.linkedin.metadata.system_info;

import com.fasterxml.jackson.annotation.JsonInclude;
import java.util.List;
import java.util.Map;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class SystemPropertiesInfo {
private Map<String, PropertyInfo> properties;
private List<PropertySourceInfo> propertySources;
private int totalProperties;
private int redactedProperties;
}
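
The `totalProperties` and `redactedProperties` fields suggest that the response summarizes how many values were masked. The `PropertiesCollector` that fills them is not shown in this excerpt, so the following is only a sketch of how such counts could be derived. The class `RedactionSketch`, the placeholder string, and the exact-match lookup against a set of sensitive keys are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the PropertiesCollector from this change.
final class RedactionSketch {

  static final String REDACTED = "***REDACTED***"; // hypothetical placeholder text

  /** Redacted view plus the two counters, analogous to SystemPropertiesInfo. */
  static final class Result {
    final Map<String, String> view;
    final int total;
    final int redacted;

    Result(Map<String, String> view, int total, int redacted) {
      this.view = view;
      this.total = total;
      this.redacted = redacted;
    }
  }

  private RedactionSketch() {}

  /** Replaces values of sensitive keys with the placeholder and counts how many were replaced. */
  static Result redact(Map<String, String> raw, Set<String> sensitiveKeys) {
    Map<String, String> view = new LinkedHashMap<>();
    int redacted = 0;
    for (Map.Entry<String, String> entry : raw.entrySet()) {
      if (sensitiveKeys.contains(entry.getKey())) {
        view.put(entry.getKey(), REDACTED);
        redacted++;
      } else {
        view.put(entry.getKey(), entry.getValue());
      }
    }
    return new Result(Collections.unmodifiableMap(view), raw.size(), redacted);
  }

  public static void main(String[] args) {
    // Invented sample configuration.
    Map<String, String> raw = new LinkedHashMap<>();
    raw.put("db.password", "hunter2");
    raw.put("server.port", "8080");

    Result result = redact(raw, Collections.singleton("db.password"));
    System.out.println(result.view + " total=" + result.total + " redacted=" + result.redacted);
  }
}
```

Counting at the point of redaction, rather than scanning for the placeholder afterwards, avoids miscounting a genuine value that happens to equal the placeholder.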