Skip to content

Closing JanusGraph intermittently gets into an endless loop if there is an FDB error #59

@davisdk

Description

@davisdk

We have been using this adapter with JanusGraph for quite a while and have not seen issues, until we started encountering this. We have a job that usually succeeds and terminates, but occasionally runs forever with this error output over and over:

[pool-37-thread-1] WARN org.janusgraph.diskstorage.log.kcvs.KCVSLog - Could not read messages for timestamp [2021-05-11T02:21:21.872Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:158)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:725)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Permanent exception while executing backend operation messageReading@0:0
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:81)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
    ... 9 more
Caused by: java.util.concurrent.CompletionException: com.apple.foundationdb.FDBException: Broken promise
    at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
    at com.apple.foundationdb.RangeQuery$AsyncRangeIterator.hasNext(RangeQuery.java:250)
    at org.janusgraph.diskstorage.foundationdb.FoundationDBRecordIterator.fetchNext(FoundationDBRecordIterator.java:64)
    at org.janusgraph.diskstorage.foundationdb.FoundationDBRecordAsyncIterator.fetchNext(FoundationDBRecordAsyncIterator.java:66)
    at org.janusgraph.diskstorage.foundationdb.FoundationDBRecordIterator.hasNext(FoundationDBRecordIterator.java:47)
    at org.janusgraph.diskstorage.util.StaticArrayEntryList.of(StaticArrayEntryList.java:390)
    at org.janusgraph.diskstorage.util.StaticArrayEntryList.ofStaticBuffer(StaticArrayEntryList.java:345)
    at org.janusgraph.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreAdapter.convert(OrderedKeyValueStoreAdapter.java:145)
    at org.janusgraph.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreAdapter.getSlice(OrderedKeyValueStoreAdapter.java:69)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:791)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:788)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
    at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
    ... 10 more
Caused by: com.apple.foundationdb.FDBException: Broken promise
    at com.apple.foundationdb.NativeFuture.Future_getError(Native Method)
    at com.apple.foundationdb.FutureResults.getIfDone_internal(FutureResults.java:38)
    at com.apple.foundationdb.FutureResults.getIfDone_internal(FutureResults.java:25)
    at com.apple.foundationdb.NativeFuture.marshalWhenDone(NativeFuture.java:63)
    ... 3 more

This happens after our class performs a System.exit. In succeeded runs, the StandardJanusGraph closes all the instances of the adapter successfully, and in this case it closes all but one.

We have tried quite a few things with no success, so wanted to reach out here to see if anyone else has seen it or has any insights. With isolation level read_committed_with_write, the adapter retries only three times as expected, but then this error is thrown and it starts again.

Our current Janus storage settings are:

storage.backend=org.janusgraph.diskstorage.foundationdb.FoundationDBStoreManager
storage.lock.retries=5
storage.lock.wait-time=250
storage.fdb.cluster-file-path=/etc/foundationdb/fdb.cluster
storage.fdb.isolation-level=serializable
storage.fdb.get-range-mode=iterator

and we are using JanusGraph 0.5.3 and FDB 6.2.22. We also tried isolation-level read_committed_with_write and range mode list.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions