feat(auto-scale-up): add fake online, server status cache, and seamless connection during scale-up #406
base: main
Conversation
Regarding the Version Response question: it would be ideal to ping the backend once at startup and retain its version/protocol; however, in the spirit of incremental PRs I'm fine with it being hardcoded at first. I'll leave it up to you for this initial PR.
Thanks for your input @itzg. It seems that
Yeah, it now looks like it would be good to bundle up the config parameters to avoid the argument/field growth. Since
I'm pretty new to golang. Sorry if the code is a bit messy.
That's totally fine. I'll point out comments where I can, and I can push changes to the PR if there are any that are too hard to explain.
Sorry for the delay getting back to your PR. I'm focusing on #405 and then, after that's merged, can shift to yours.
No problem 😃
```go
	sc.cache[serverAddress] = status
}

func (sc *StatusCache) Delete(serverAddress string) {
```
Do we need to delete the status cache entries for servers that are no longer in the route mappings?
Ideally, yes. Otherwise it would be a (very, very small) memory leak, right?
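One way that cleanup could look is a prune pass driven by the current route mappings. This is a minimal, self-contained sketch; the type and field names are assumptions modeled on the discussion, not the PR's actual API:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// CachedStatus is an illustrative stand-in for the per-backend status data.
type CachedStatus struct {
	Description string
	LastUpdated time.Time
}

type StatusCache struct {
	mu    sync.RWMutex
	cache map[string]*CachedStatus
}

func NewStatusCache() *StatusCache {
	return &StatusCache{cache: make(map[string]*CachedStatus)}
}

func (sc *StatusCache) Set(serverAddress string, status *CachedStatus) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	sc.cache[serverAddress] = status
}

func (sc *StatusCache) Delete(serverAddress string) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	delete(sc.cache, serverAddress)
}

func (sc *StatusCache) Has(serverAddress string) bool {
	sc.mu.RLock()
	defer sc.mu.RUnlock()
	_, ok := sc.cache[serverAddress]
	return ok
}

// Prune drops entries whose server address no longer appears in the
// route mappings, so removed routes cannot leave stale cached statuses.
func (sc *StatusCache) Prune(mappings map[string]string) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	for addr := range sc.cache {
		if _, ok := mappings[addr]; !ok {
			delete(sc.cache, addr)
		}
	}
}

func main() {
	sc := NewStatusCache()
	sc.Set("a.example.com", &CachedStatus{Description: "A server"})
	sc.Set("b.example.com", &CachedStatus{Description: "B server"})
	// Only a.example.com is still mapped, so b.example.com gets pruned.
	sc.Prune(map[string]string{"a.example.com": "backend-a:25565"})
	fmt.Println(sc.Has("a.example.com"), sc.Has("b.example.com"))
}
```

Calling Prune whenever mappings change (or periodically from the updater) would keep the leak bounded to zero rather than "very, very small".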
```go
	LastUpdated time.Time
}

type StatusCache struct {
```
Heavily inspired by the route mappings; maybe it can be simplified a bit by removing the named mu field, like routesImpl?
Yeah, that would make it consistent. I go back and forth about the unnamed/inheritance style like I used in routesImpl 😄
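For comparison, the unnamed-embedding style mentioned here would look roughly like this; the type and fields are illustrative stand-ins, not the PR's code. Embedding sync.RWMutex promotes Lock/RLock onto the type itself, so there is no separate mu field to name:

```go
package main

import (
	"fmt"
	"sync"
)

// statusCacheImpl mirrors the routesImpl style: the RWMutex is embedded
// unnamed, so the cache type itself can be locked directly.
type statusCacheImpl struct {
	sync.RWMutex
	cache map[string]string
}

func (sc *statusCacheImpl) Get(addr string) (string, bool) {
	sc.RLock()
	defer sc.RUnlock()
	v, ok := sc.cache[addr]
	return v, ok
}

func (sc *statusCacheImpl) Set(addr, status string) {
	sc.Lock()
	defer sc.Unlock()
	sc.cache[addr] = status
}

func main() {
	sc := &statusCacheImpl{cache: make(map[string]string)}
	sc.Set("mc.example.com", "online")
	v, _ := sc.Get("mc.example.com")
	fmt.Println(v)
}
```

One trade-off of the embedded style: Lock/Unlock become part of the type's exported method set, so callers can lock it from outside, which a private mu field prevents.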
Previous PR is now merged, so now I can focus on yours after you get merge conflicts caught up.
Force-pushed 35e42c7 to 9f3262c
Conflicts should be fixed @itzg
Thanks @alexfrs69. @Sammcb, since you've been in some of this code recently, I was wondering if you could help look at this PR also?
@alexfrs69 just noticed the PR is still in draft. Want to mark it ready for review?
server/connector.go (outdated):

```go
logrus.WithField("cachedStatus", cachedStatus).Debug("Using cached status")
status = &mcproto.StatusResponse{
	Players:     cachedStatus.Players,
	Description: cachedStatus.Description,
```
Currently FakeOnlineMOTD is displayed only if we don't have a cached status for the backend.
Would it be nice to add an option to join the Description with FakeOnlineMOTD?
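A hypothetical helper for that option might look like the following; the function name, parameters, and separator are assumptions for illustration, not the PR's actual API:

```go
package main

import "fmt"

// joinMOTD combines the backend's cached description with the configured
// FakeOnlineMOTD when the (assumed) joinWithFake option is enabled.
// With no cached description, it falls back to the fake MOTD alone.
func joinMOTD(cachedDescription, fakeOnlineMOTD string, joinWithFake bool) string {
	if cachedDescription == "" {
		return fakeOnlineMOTD
	}
	if joinWithFake && fakeOnlineMOTD != "" {
		// MOTDs allow two lines, so a newline is a natural separator.
		return cachedDescription + "\n" + fakeOnlineMOTD
	}
	return cachedDescription
}

func main() {
	fmt.Println(joinMOTD("A Minecraft Server", "Server is starting...", true))
}
```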
server/connector.go (outdated):

```go
	}
}

writeStatusErr := mcproto.WriteStatusResponse(
```
Just curious, if this occurs does this logic need to be changed (amount, err := io.Copy(backendConn, preReadContent))? I'm wondering how this would work if the router has already returned a response to the client.
If we write a status to the client, we return just after; I don't think io.Copy(...) will be reached.
Thanks for the reviews @itzg @Sammcb! I've addressed your requested changes, please take another look 🙂

If a client repeatedly refreshes the server status page, each status request is handled sequentially, causing a backlog and significant response delays. This could potentially lead to a DoS situation. How do you suggest we address this?

On the cache side, I think it would be better to:

This should reduce backend overhead. What do you think?
@alexfrs69 So to understand the high-level goals: the fake online (and associated caching) and fake online MOTD really only come into play in the case where a mapping for a backend exists, but that backend is not reachable and the user has auto scale up enabled?

If that is the case, my understanding is the final goal is to capture information about a backend server's state to display in status requests to the player when that server is down. One case I don't think is solved by this is if all backend servers are scaled down when

I'm wondering what your thoughts would be on, instead of trying to get all this info from the server, simply adding a config file the user could supply with fallback status info? Obviously, this could drift from the running server status too and would be up to the user running the servers/router to keep things synced.

Thank you for doing so much extensive testing and research on these changes so far! Just wanted to suggest the config option route as I recently ran into this issue when wanting to set up allow/deny lists for server autoscalers. I realized it would be very difficult to get the information from the servers directly, so opted to simply allow users to enter it as a config for the
Exactly! I like the idea! I definitely considered the limitation that mc-router can't get the status if the backend is stopped. Relying only on configuration for status at scale could be a bit heavy to manage, though. We could accept this limitation for now, or, as you suggested, add support for config files so users can provide fallback status info for backends that aren't in the cache. This way, if a status isn't cached, we can return the configured fallback. It offers flexibility for those who need it, without requiring config management for every backend by default.
Yeah, I think either accepting that as a limitation or adding the config file input could work! In terms of getting the data dynamically, I really like this approach (made some slight modifications to your initial suggestion just to clarify):
I can work on implementing the config files tomorrow 🙂 It would also be great to automatically detect any new backends added via k8s labels and cache their status right away.
If you hook that into the router CreateMapping, etc., then that would be totally doable! (I did something very similar for the down scaler in #405)
```go
var cacheInterval time.Duration
if config.AutoScale.CacheStatus {
	cacheInterval, err = time.ParseDuration(config.AutoScale.CacheStatusInterval)
```
It seems like this config processing, and ultimately cacheInterval, should be fed into server.ConnectorConfig.
```go
connector.StatusCache.StartUpdater(connector, cacheInterval, func() map[string]string {
	mappings := server.Routes.GetMappings()
	logrus.WithField("mappings", mappings).Debug("Updating status cache with mappings")
	return mappings
})
```
...and the starting of the cache updater should be encapsulated inside the connector.WaitForConnections method.
As it is, it leaks a bit too much runtime behavior into the main method.
```go
Players: mcproto.StatusPlayers{
	Max:    0,
	Online: 0,
	Sample: []mcproto.PlayerEntry{},
```
Seems to be a compile error here. Probably meant StatusPlayerEntry?
```go
const (
	handshakeTimeout = 5 * time.Second
	backendTimeout   = 30 * time.Second
```
Can you double check this is less than the timeout that the Minecraft client seems to run when attempting to connect?
```go
Name:     "UNKNOWN",
Protocol: 0,
```
Have you been able to see what the Minecraft client thinks of these version and protocol values?
```go
	delete(sc.cache, serverAddress)
}

func (sc *StatusCache) updateAll(getBackends func() map[string]string) {
```
Nice encapsulation of passing in the getter function! Since it gets passed in a couple of places, I wonder if it could be provided once to NewStatusCache?
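A sketch of that variant, with the getter supplied once at construction so updateAll no longer takes it as a parameter. The types are simplified stand-ins for the PR's actual ones:

```go
package main

import (
	"fmt"
	"sync"
)

type StatusCache struct {
	mu          sync.RWMutex
	cache       map[string]string
	getBackends func() map[string]string
}

// NewStatusCache captures the backends getter once, so every later call
// site (updater loop, manual refresh) uses the same source of mappings.
func NewStatusCache(getBackends func() map[string]string) *StatusCache {
	return &StatusCache{
		cache:       make(map[string]string),
		getBackends: getBackends,
	}
}

func (sc *StatusCache) updateAll() {
	backends := sc.getBackends()
	sc.mu.Lock()
	defer sc.mu.Unlock()
	for addr, backend := range backends {
		sc.cache[addr] = backend // stand-in for the status fetched from the backend
	}
}

func (sc *StatusCache) Get(addr string) (string, bool) {
	sc.mu.RLock()
	defer sc.mu.RUnlock()
	v, ok := sc.cache[addr]
	return v, ok
}

func main() {
	sc := NewStatusCache(func() map[string]string {
		return map[string]string{"mc.example.com": "backend:25565"}
	})
	sc.updateAll()
	v, _ := sc.Get("mc.example.com")
	fmt.Println(v)
}
```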
That's somewhat by design since
Hey guys, just letting you know that I don't have time to invest in this PR this week.
Hi everyone. I hate to be that guy, but I was wondering about the state of this PR? Do you still intend to finish this feature @alexfrs69?
Perhaps we need to scale back the features and behavior to simply what @L-Wehmschulte said.
I had forgotten a lot of this enhancement, and from the comments it looks like we were getting into more complex scenarios, caching backend status info. I'll look more this weekend myself to refresh my memory.
That would be awesome. Thank you so much!
Hi everyone, sorry I haven’t been able to finish this PR yet. I’ll try to get back to it whenever I have some free time, maybe this weekend. Most of the features are working fine, but it still needs some polishing, especially around implementing the backend status config, a status cache according to our plan, and thoroughly testing everything (Minecraft timeouts, status protocol, etc.).
This PR introduces a feature to display a configurable "fake online" MOTD when a backend server is offline and auto-scale-up is enabled. When a status ping is received and the backend is not available, the router responds with a custom MOTD instead of appearing offline.
It also "hangs" the client while retrying to connect to the backend, until it reaches a defined timeout and then closes the connection.
Overall, this improves the user experience during backend startup.
Related: #36 #21
Key Changes
For fake online:
For status requests:
For connect requests:
Questions
Version Response
Currently, the status response uses:
Should the version and protocol be configurable, or is it fine to hardcode the latest supported version? What version/protocol should we default to for best compatibility?
Next Steps