Wrong prefix parsing when JSON-LD root is an array #3335

@ziodave

Version

5.5.0

What happened?

The following JSON-LD causes Jena to register description as a prefix whose namespace is the literal string value "A value that ends with a colon:":

[
            {
                "@context": "http://schema.org",
                "@id": "https://data.example.org/dataset/entity",
                "@type": "Thing",
                "description": "A value that ends with a colon:"
            }
]

The JSON-LD is valid and is expanded correctly by the JSON-LD Playground, https://json-ld.org/playground/

This is the test to reproduce the issue:

https://github.com/ziodave/jena/blob/main/jena-arq/src/test/java/org/apache/jena/riot/lang/TestPrefixes.java

    @Test
    public void prefixes_01() {

        String json = """
                [
                             {
                                 "@context": "http://schema.org",
                                 "@id": "https://data.example.org/dataset/entity",
                                 "@type": "Thing",
                                 "description": "A value that ends with a colon:"
                             }
                 ]
                """;

        Model model = ModelFactory.createDefaultModel();
        StringReader sr = new StringReader(json);
        RDFDataMgr.read(model, sr, null, Lang.JSONLD11);
        Assertions.assertFalse(model.isEmpty());
        Assertions.assertTrue(model.getNsPrefixMap().isEmpty(), () -> {
            StringBuilder sb = new StringBuilder("Found the following namespaces, expecting none:\n");
            model.getNsPrefixMap().forEach((s, n) -> {
                sb.append(s)
                        .append(": ")
                        .append(n)
                        .append("\n");
            });

            return sb.toString();
        });
    }

It fails with:

Failures: 
TestPrefixes.prefixes_01:50 Found the following namespaces:
description: A value that ends with a colon:
 ==> expected: <true> but was: <false>

The prefix map is expected to be empty because the JSON-LD declares no prefixes.

This happens because of a bug in the way prefixes are calculated when the JSON-LD starts with an array:

    private static void extractPrefixes(JsonValue jsonValue, BiConsumer<String, String> action) {
        if (jsonValue == null)
            return;
        // JSON-LD 1.1 section 9.4
        switch (jsonValue.getValueType()) {
        case ARRAY:
            jsonValue.asJsonArray().forEach(jv -> extractPrefixes(jv, action));
            break;
        case OBJECT:
            extractPrefixesCxtDefn(jsonValue.asJsonObject(), action);
            break;

In the ARRAY case, extractPrefixes is called on each element directly, so the whole JSON object is handed to extractPrefixesCxtDefn as is. If the JSON-LD root were an object rather than an array, the calling function would have preprocessed it first:

    private static void extractPrefixes(Document document, BiConsumer<String, String> action) {
        try {
            JsonStructure js = document.getJsonContent().orElseThrow();
            switch (js.getValueType()) {
            case ARRAY:
                extractPrefixes(js, action);
                break;
            case OBJECT:
                JsonValue jv = js.asJsonObject().get(Keywords.CONTEXT);
                extractPrefixes(jv, action);
                break;
            default:
                break;

In the OBJECT case, this extracts the @context and passes only that value on.

A suggested fix is therefore to move JsonValue jv = js.asJsonObject().get(Keywords.CONTEXT); from extractPrefixes(Document document, BiConsumer<String, String> action) to extractPrefixes(JsonValue jsonValue, BiConsumer<String, String> action).
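One way to picture the suggested fix is the minimal sketch below. It is not Jena's code: plain java.util Map and List values stand in for the Jakarta JSON types, and the names PrefixExtractionSketch, extractFromDocument, extractFromContext, looksLikeNamespace and prefixesOf are hypothetical. The sketch keeps the @context lookup at the document level for both root shapes, so that a context definition object is never itself searched for @context. It only handles term definitions that are plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class PrefixExtractionSketch {

    // Document level: a root object, or every object inside a root array,
    // contributes only the value of its "@context" member.
    static void extractFromDocument(Object root, BiConsumer<String, String> action) {
        if (root instanceof List) {
            for (Object item : (List<?>) root)
                extractFromDocument(item, action);
        } else if (root instanceof Map) {
            extractFromContext(((Map<?, ?>) root).get("@context"), action);
        }
    }

    // Context level: a context value may be an array of contexts or a
    // context definition object. Strings (remote contexts) and null are skipped.
    static void extractFromContext(Object context, BiConsumer<String, String> action) {
        if (context instanceof List) {
            for (Object item : (List<?>) context)
                extractFromContext(item, action);
        } else if (context instanceof Map) {
            for (Map.Entry<?, ?> term : ((Map<?, ?>) context).entrySet()) {
                Object value = term.getValue();
                if (value instanceof String && looksLikeNamespace((String) value))
                    action.accept(String.valueOf(term.getKey()), (String) value);
            }
        }
    }

    // Same end-of-string test as the check quoted from LangJSONLD11.java.
    static boolean looksLikeNamespace(String uri) {
        return uri.endsWith("#") || uri.endsWith("/") || uri.endsWith(":");
    }

    // Convenience wrapper that collects the reported prefixes into a map.
    static Map<String, String> prefixesOf(Object root) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        extractFromDocument(root, prefixes::put);
        return prefixes;
    }

    public static void main(String[] args) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("@context", "http://schema.org");
        node.put("@id", "https://data.example.org/dataset/entity");
        node.put("@type", "Thing");
        node.put("description", "A value that ends with a colon:");

        // The root array from this report: no prefixes are found.
        System.out.println(prefixesOf(List.of(node))); // prints {}
    }
}
```

With this shape, the node's own members such as description are never inspected; only entries of an inline context definition can become prefixes. An actual patch to Jena may of course be structured differently.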

With JSON-LD parsing, this only shows up when a string value ends with a colon (or, going by the check itself, with # or /), because of the following processing of values in org/apache/jena/riot/lang/LangJSONLD11.java:

            if (uri.endsWith("#") || uri.endsWith("/") || uri.endsWith(":")) {
                action.accept(prefix, uri);
                return;
            }
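To see why only description is reported in the failing test, the same end-of-string test can be applied to each string value of the example object. This is a small standalone sketch; the class and method names are made up for illustration and nothing here calls Jena.

```java
import java.util.List;

class NamespaceHeuristicDemo {

    // Mirrors the condition quoted above from LangJSONLD11.java.
    static boolean looksLikeNamespace(String value) {
        return value.endsWith("#") || value.endsWith("/") || value.endsWith(":");
    }

    public static void main(String[] args) {
        // The four string values of the object in this report.
        List<String> values = List.of(
                "http://schema.org",
                "https://data.example.org/dataset/entity",
                "Thing",
                "A value that ends with a colon:");

        for (String value : values)
            System.out.println(looksLikeNamespace(value) + "  " + value);
    }
}
```

Only the last value passes the test, which matches the single description entry in the reported prefix map. Note that "http://schema.org" has no trailing slash, so it is not picked up either.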

Are you interested in making a pull request?

Maybe
