When setting up a cluster environment, I want to run a deduplication task on a large dataset (1 TB, stored locally), but how should I load the data? Should I put all of the data on the supervisor node and load it from there? Or should I split the data evenly across the nodes and then run `xorbits.init(address="http://supervisor_ip:web_port")` on the supervisor node to load the data from all nodes for deduplication? Please advise, thank you~
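For reference, here is a minimal sketch of the second approach described above: connecting to a running cluster from the supervisor node and letting Xorbits distribute the deduplication. The address, file paths, and data format are placeholders, and it assumes the input path is readable from the worker nodes (e.g. on shared storage), since each worker otherwise only sees its own local files.

```python
# Minimal sketch: connect to an existing Xorbits cluster and deduplicate.
# "http://supervisor_ip:web_port" and the paths below are placeholders.
import xorbits
import xorbits.pandas as pd

# Attach this client to the running cluster via the supervisor's web address.
xorbits.init(address="http://supervisor_ip:web_port")

# Load the dataset; this path must be visible to the workers (e.g. a shared
# filesystem), otherwise workers cannot read the partitions assigned to them.
df = pd.read_parquet("/shared/data/mydataset/")

# Distributed deduplication across the full DataFrame, then write the result.
deduped = df.drop_duplicates()
deduped.to_parquet("/shared/data/mydataset_deduped/")
```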