-
Notifications
You must be signed in to change notification settings - Fork 118
perf: increase interpolator3 speed and remove large minestom generator allocations #512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev/7.0-2
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
import org.gradle.api.Project | ||
import org.gradle.kotlin.dsl.apply | ||
|
||
fun Project.configureBenchmarking() { | ||
apply(plugin = "me.champeau.jmh") | ||
} | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
package com.dfsek.terra.addons.chunkgenerator.generation.math.interpolation; | ||
|
||
import org.openjdk.jmh.annotations.Benchmark; | ||
import org.openjdk.jmh.annotations.Fork; | ||
import org.openjdk.jmh.annotations.Measurement; | ||
import org.openjdk.jmh.annotations.Scope; | ||
import org.openjdk.jmh.annotations.State; | ||
import org.openjdk.jmh.annotations.Warmup; | ||
import org.openjdk.jmh.infra.Blackhole; | ||
|
||
@State(Scope.Benchmark) | ||
@Fork(1) | ||
@Warmup(iterations = 2, time = 1) | ||
@Measurement(iterations = 2, time = 5) | ||
Comment on lines
+13
to
+14
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. probably good to do more iterations for a bit longer than that. I usually see 1 warmup iteration for 5 seconds & 5 measurement iterations for 5 seconds, for a total of 60 seconds. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 |
||
public class Interpolator3Benchmark { | ||
private final Interpolator3 interpolator = new Interpolator3(0, 1, 0, 1, 0, 1, 0, 1); | ||
|
||
@Benchmark | ||
public void benchmarkInterpolator3(Blackhole blackhole) { | ||
blackhole.consume(interpolator.trilerp(0.5, 0.75, 0.5)); | ||
} | ||
} | ||
Comment on lines
+15
to
+22
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this benchmark should probably use random values to stop the jvm from perhaps pre-populate an array with random values and step through that for each invocation of the benchmark to avoid the overhead of the random number generator, since this is rather low level. which architectures & jvms has this been tested on? low level optimizations like this are extremely finicky. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,16 +7,21 @@ | |
|
||
package com.dfsek.terra.addons.chunkgenerator.generation.math.interpolation; | ||
|
||
|
||
import com.dfsek.seismic.math.numericanalysis.interpolation.InterpolationFunctions; | ||
import com.dfsek.seismic.math.arithmetic.ArithmeticFunctions; | ||
|
||
|
||
/** | ||
* Class for bilinear interpolation of values arranged on a unit square. | ||
*/ | ||
public class Interpolator3 { | ||
private final Interpolator bottom; | ||
private final Interpolator top; | ||
private final double _000; | ||
private final double _001; | ||
private final double _010; | ||
private final double _011; | ||
private final double _100; | ||
private final double _101; | ||
private final double _110; | ||
private final double _111; | ||
|
||
/** | ||
* Constructs an interpolator with given values as vertices of a unit cube. | ||
|
@@ -33,12 +38,25 @@ public Interpolator3(double _000, double _100, | |
double _010, double _110, | ||
double _001, double _101, | ||
double _011, double _111) { | ||
this.top = new Interpolator(_000, _010, _001, _011); | ||
this.bottom = new Interpolator(_100, _110, _101, _111); | ||
this._000 = _000; | ||
this._001 = _001; | ||
this._010 = _010; | ||
this._011 = _011; | ||
this._100 = _100; | ||
this._101 = _101; | ||
this._110 = _110; | ||
this._111 = _111; | ||
} | ||
|
||
//TODO this system is not very good, replace it wholesale | ||
public double trilerp(double x, double y, double z) { | ||
return InterpolationFunctions.lerp(top.bilerp(y, z), bottom.bilerp(y, z), x); | ||
double a = ArithmeticFunctions.fma(2, x, -1); | ||
double b = ArithmeticFunctions.fma(2, y, -1); | ||
double g = ArithmeticFunctions.fma(2, z, -1); | ||
|
||
// using explicit fma here somehow makes this slower | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you share the benchmarks for this? in fact, it would be good if you could share all the benchmarks you did for different changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1, that honestly should not be the case as the jvm should just optimize fma into a*b + c on non supported platforms |
||
return ((1 - g) * (1 - b) * (1 - a) * _000 + (1 - g) * (1 - b) * (1 + a) * _100 + | ||
(1 - g) * (1 + b) * (1 - a) * _010 + (1 - g) * (1 + b) * (1 + a) * _110 + | ||
(1 + g) * (1 - b) * (1 - a) * _001 + (1 + g) * (1 - b) * (1 + a) * _101 + | ||
(1 + g) * (1 + b) * (1 - a) * _011 + (1 + g) * (1 + b) * (1 + a) * _111) / 8; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,7 @@ | |
|
||
public class GeneratedChunkCache { | ||
private static final Logger log = LoggerFactory.getLogger(GeneratedChunkCache.class); | ||
private final LoadingCache<Pair<Integer, Integer>, CachedChunk> cache; | ||
private final LoadingCache<Long, CachedChunk> cache; | ||
private final DimensionType dimensionType; | ||
private final ChunkGenerator generator; | ||
private final ServerWorld world; | ||
|
@@ -29,7 +29,7 @@ public GeneratedChunkCache(DimensionType dimensionType, ChunkGenerator generator | |
this.cache = Caffeine.newBuilder() | ||
.maximumSize(128) | ||
.recordStats() | ||
.build((Pair<Integer, Integer> key) -> generateChunk(key.getLeft(), key.getRight())); | ||
.build((Long key) -> generateChunk(unpackX(key), unpackZ(key))); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if it would be worthwhile to have our own kind of cache which uses is backed by a fastutil There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have something in the works for this for layered, out of scope for this PR but good idea |
||
} | ||
|
||
private CachedChunk generateChunk(int x, int z) { | ||
|
@@ -50,6 +50,18 @@ public void displayStats() { | |
} | ||
|
||
public CachedChunk at(int x, int z) { | ||
return cache.get(Pair.of(x, z)); | ||
return cache.get(pack(x, z)); | ||
} | ||
|
||
private long pack(final int x, final int z) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe we already have a function somewhere which packs two integers into a long There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is from like the 3.0 days, I wrote it. |
||
return ((long) x & 0xFFFFFFFFL) << 32 | (z & 0xFFFFFFFFL); | ||
} | ||
|
||
private int unpackX(long packed) { | ||
return (int) (packed >> 32); | ||
} | ||
|
||
private int unpackZ(long packed) { | ||
return (int) packed; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned it a while ago that it might be good to add some jmh benchmarks, I made a gradle config for that, I forget if it was just identical to this or if I had smth else as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1