- Add warmup to all benchmarks and make sure each one reaches a steady state.
- Check whether they run on PyPy 3.5; if not, report the issues (I remember there was an lxml issue that was resolved and then broke again).
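For the warmup item, one possible approach is sketched below: keep calling the benchmark until the most recent timings stop changing much, and only then start measuring. This is a rough sketch, not how the existing benchmarks are written. The name `warm_up`, its parameters (`window`, `tolerance`, `max_iterations`, `timer`), and the 5% threshold are all assumptions made for illustration.

```python
import statistics
import time


def warm_up(func, window=5, tolerance=0.05, max_iterations=1000,
            timer=time.perf_counter):
    """Call func repeatedly until its timings look stable.

    Hypothetical helper: the run is treated as "steady" once the spread
    (max - min) of the last `window` timings is at most `tolerance`
    times their mean. Returns (reached_steady_state, iterations_run).
    """
    timings = []
    for iteration in range(1, max_iterations + 1):
        start = timer()
        func()
        timings.append(timer() - start)

        recent = timings[-window:]
        if len(recent) < window:
            continue  # not enough samples yet to judge stability

        mean = statistics.mean(recent)
        spread = max(recent) - min(recent)
        if mean > 0 and spread / mean <= tolerance:
            return True, iteration

    # Gave up: the benchmark never settled within max_iterations.
    return False, max_iterations
```

A benchmark could call `warm_up(run_once)` before its timed loop and log a warning when the first element of the result is `False`, since that means the steady state was not reached. The `timer` argument exists only so the stopping rule can be checked with a fake clock.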