Replies: 2 comments
- CC #2088
- We discussed this a bit more on Discord, but I prefer the approach in #2554 - in general, we use signing for "generate a message" logic, and most of that logic is already replay-able, as we need to replay if we disconnect from the peer and reconnect to discover that the original message never made it to the peer.
Motivation
We run an LDK-based multi-tenant container; i.e., a single process that runs several independent Lightning nodes concurrently. For some of these nodes, we'd like to delegate the signing operations to a remote service.
A remote signing service could be a secure enclave running in the same datacenter, a service operated by a third-party that maintains the signing keys, or even an individual device.
In each of these cases, we'd make a remote request to the signing service to perform the actual signing operation, or to provide LDK with a necessary secret (e.g., the per-commitment secret).
For this to work, each of the signing operations that LDK requests must be fallible; i.e., an operation should be allowed to fail transiently, and then be resumed later when the result becomes available.
Currently (as of 0.0.116), LDK's signing interfaces (e.g., `ChannelSigner`) are not fully fallible:

- Some methods do not return a `Result` type. For example, `get_per_commitment_point` returns a `PublicKey` and admits no possibility that the commitment point is not immediately available.
- Some methods return a `Result` type, but are fallible "in signature only": actually returning an error will crash LDK.
- Some methods return a `Result` type, but any error result will cause an immediate channel force-closure.

We'd like to work through the various signing interfaces and improve LDK's implementation to support the above use case. In particular, each method should admit an implementation that may not immediately have a result but has not failed permanently.
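To make the desired shape concrete, here is a minimal sketch of such a fallible signer interface. This is not LDK's actual API: the trait name, the `[u8; N]` stand-ins for `PublicKey`/`SecretKey`, and the placeholder values are all invented for illustration.

```rust
// Hypothetical stand-ins for secp256k1 types, for illustration only.
type PublicKey = [u8; 33];
type SecretKey = [u8; 32];

// Sketch of a signer trait whose methods can report "not available yet"
// instead of crashing LDK or force-closing the channel.
trait FallibleChannelSigner {
    /// Err(()) means the result is not available (yet); the caller must
    /// park the operation and retry later.
    fn get_per_commitment_point(&self, idx: u64) -> Result<PublicKey, ()>;
    fn release_commitment_secret(&self, idx: u64) -> Result<SecretKey, ()>;
}

/// An in-memory signer always has its material at hand, so it only needs
/// to wrap its existing results in Ok(...).
struct InMemorySigner;

impl FallibleChannelSigner for InMemorySigner {
    fn get_per_commitment_point(&self, _idx: u64) -> Result<PublicKey, ()> {
        Ok([2u8; 33]) // placeholder point
    }
    fn release_commitment_secret(&self, _idx: u64) -> Result<SecretKey, ()> {
        Ok([1u8; 32]) // placeholder secret
    }
}

fn main() {
    let signer = InMemorySigner;
    assert!(signer.get_per_commitment_point(0).is_ok());
    assert!(signer.release_commitment_secret(0).is_ok());
    println!("in-memory signer: always ready");
}
```

A remote-backed implementation of the same trait would instead return `Err(())` whenever the remote result has not yet arrived.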
As a motivating example, consider an implementation of the `ChannelSigner` interface using webhooks. In a canonical webhook-based design, a request is sent via HTTP `POST` to a remote server. The response is typically a short `200 OK`, followed later by the remote server issuing an HTTP `POST` back to the requester with the results.

Proof-of-concept
A proof-of-concept implementation is in progress in #2487, which specifically addresses `ChannelSigner::get_per_commitment_point` and `ChannelSigner::release_commitment_secret`. Our goal here is to explore how we might rework LDK's internals to support 1) these methods returning a `Result` type (so they can fail), and 2) resuming the channel state machine appropriately when a result is returned.

After some initial prototyping and discussion, we opted for the following approach:
- Change `get_per_commitment_point` and `release_commitment_secret` to return an error that is simply the unit type, `()`. Current in-memory implementations of the signer never return an error, and so they need to change only inasmuch as to wrap their results in `Ok(...)`.
- The interpretation of an `Err` result is user-defined, as follows. If the signing failure is permanent, then the user must handle force-closing the channel themselves after returning the `Err` result. On the other hand, if the signing failure is temporary (e.g., it requires a response from a remote party), then the user can explicitly retry the operation when the results are available.
- When LDK calls `get_per_commitment_point` or `release_commitment_secret` and receives an `Err` result, it unwinds and stores a retry state associated with the channel in the per-peer state.
- Add a new `ChannelManager` method, `retry_channel`. This method accepts the remote peer's public key and the channel ID, and restarts the operation that previously failed. The assumption is that by now the request to `get_per_commitment_point` or `release_commitment_secret` will succeed because the signer implementation has the required material.

As an example, consider the following (somewhat simplified) flow that occurs during
`commitment_signed` (sequence diagram omitted):

Here the `WebhookSignerImpl` is an implementation of the `ChannelSigner` interface provided by a user, and `SigningService` is the service to which that implementation delegates the signature operations.
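A webhook-backed signer of this shape can be sketched as follows. This is illustrative only: `WebhookSignerImpl` mirrors the name used above, but the `HashMap` cache, `on_webhook_callback`, and the `[u8; 33]` point stand-in are assumptions, not LDK or PoC code.

```rust
use std::collections::HashMap;

// Hypothetical webhook-backed signer: the initial request to the remote
// SigningService returns only "accepted", so locally the method reports
// Err(()) until the service POSTs the result back.
#[derive(Default)]
struct WebhookSignerImpl {
    // Results that have already arrived via the remote server's callback.
    completed: HashMap<u64, [u8; 33]>,
}

impl WebhookSignerImpl {
    // Models LDK asking for the point: Err(()) means "not ready yet".
    fn get_per_commitment_point(&self, idx: u64) -> Result<[u8; 33], ()> {
        self.completed.get(&idx).copied().ok_or(())
    }

    // Models the remote server's later HTTP POST delivering the result.
    fn on_webhook_callback(&mut self, idx: u64, point: [u8; 33]) {
        self.completed.insert(idx, point);
    }
}

fn main() {
    let mut signer = WebhookSignerImpl::default();
    // First attempt: the request is still in flight, so no result yet.
    assert!(signer.get_per_commitment_point(0).is_err());
    // Later, the signing service posts the result back...
    signer.on_webhook_callback(0, [2u8; 33]);
    // ...and a retried call now succeeds.
    assert_eq!(signer.get_per_commitment_point(0), Ok([2u8; 33]));
    println!("webhook result delivered; retry succeeds");
}
```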
commitment_signedhandler in theChannelpropagates the error out to theChannelManagerthat then notes that the channel is pending retry forcommitment_signed. (Specifically, it does so by adding an entry in a new per-peer state table keyed by channel ID whose value is anenumwith sufficient side-information to restart the operation.)Later, when the user's
Later, when the user's `WebhookSignerImpl` has been provided with information sufficient to proceed, it invokes `retry_channel` on the `ChannelManager`, passing in the peer and channel ID. From this, the `ChannelManager` can recover the retry state and restart the `commitment_signed` processing.

Overview of changes
- Change `get_per_commitment_point` and `release_commitment_secret` to return a `Result` type. The error type is the unit type, and is interpreted to mean either a) the channel is not ready, and the user will later attempt to resume processing by calling `retry_channel`, or b) the channel signer has permanently failed and the user will eventually force-close (or abandon) the channel.
- Cache the current and next per-commitment points in the channel `Context`. These are both `Option` values, with `None` for a newly constructed channel awaiting its first commitment point (or revocation); a `None` is interpreted by the caller to mean "the information is not ready".
- Update the `Channel` and `ChannelManager` handlers that initialize or modify the per-commitment point to cache the values correctly:
  - `Channel::funding_signed`
  - `Channel::commitment_signed`
  - `Channel::channel_reestablish`
  - `Channel::funding_created`
  - `ChannelManager::do_accept_inbound_channel`
  - `ChannelManager::create_channel` (still to do)
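The `Option`-based caching those handlers would maintain can be sketched as below. The struct and field names are invented for illustration (LDK's channel `Context` fields differ), and the `[u8; 33]` point is a stand-in.

```rust
// Hypothetical stand-in for a secp256k1 public key.
type PublicKey = [u8; 33];

// Sketch of per-commitment-point caching in a channel context: both
// slots start as None on a newly constructed channel, meaning "the
// signer has not yet produced this value".
#[derive(Default)]
struct ChannelContext {
    cur_holder_commitment_point: Option<PublicKey>,
    next_holder_commitment_point: Option<PublicKey>,
}

impl ChannelContext {
    // Called from handlers such as funding_signed / commitment_signed
    // once get_per_commitment_point returns Ok: rotate the cache forward.
    fn advance(&mut self, next: PublicKey) {
        self.cur_holder_commitment_point = self.next_holder_commitment_point.take();
        self.next_holder_commitment_point = Some(next);
    }
}

fn main() {
    let mut ctx = ChannelContext::default();
    assert!(ctx.cur_holder_commitment_point.is_none()); // newly constructed
    ctx.advance([2u8; 33]);
    ctx.advance([3u8; 33]);
    assert_eq!(ctx.cur_holder_commitment_point, Some([2u8; 33]));
    assert_eq!(ctx.next_holder_commitment_point, Some([3u8; 33]));
    println!("per-commitment points cached");
}
```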