My apologies in advance for a question that might seem trivial - I am a mostly solo developer in an academic environment, and a lot of industry best practices don't necessarily make it here.
Several of my projects run high-performance numerical computation loops. On its own, a single iteration of the loop is rather fast (~1 s), but there are a lot of them (10k-100k per run). Because of that, the loop dominates the performance of the whole application, and seemingly minor modifications to it (such as unnecessary array type/shape conversions) can slow it down dramatically; my latest optimization pass sped it up by a factor of roughly 20.
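To give a toy illustration of the kind of issue I mean (simplified, not my actual code): an accidental list-to-array conversion inside the hot path is exactly the sort of change I want to catch before it lands.

```python
import numpy as np

def step_with_conversion(values, weights):
    # accidental overhead: the Python list is re-converted to an ndarray
    # on every iteration of the outer loop
    arr = np.asarray(values, dtype=np.float64)
    return float(arr @ weights)

def step_without_conversion(arr, weights):
    # same arithmetic, but the caller keeps the data as an ndarray between iterations
    return float(arr @ weights)
```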
As such, it is critical for me to monitor whether changes I make to the code have a performance impact on the core loop, either immediately or through a gradual accumulation of regressions over time.
I am using a CI suite and a number of tools to run unit tests, measure test coverage, and automatically build the API docs. However, so far I have not found anything that runs performance tests as easily or combines their output into a graph over time. Looking around, I realized that comparing performance across builds is non-trivial, since results can be affected by the hardware running the performance tests as well as by the software providing code isolation and results collection.
Is there a recommended way to perform performance tests that minimizes the effects of hardware, or at least ensures the values are comparable? Is there a standard way of outputting/ingesting performance test results in Python, for instance something along the lines of pytest --durations=N?
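To make the question concrete, here is roughly the shape of test I would like CI to run and track over time (a simplified sketch; run_core_loop is a hypothetical stand-in for my real inner loop):

```python
import json
import time

import numpy as np

def run_core_loop(data):
    # stand-in for one iteration of the real numerical loop (~1 s in practice)
    return (data @ data.T).sum()

def test_core_loop_timing():
    data = np.random.default_rng(0).random((1000, 1000))
    start = time.perf_counter()
    result = run_core_loop(data)
    elapsed = time.perf_counter() - start
    # right now I just print and eyeball numbers like this;
    # ideally the CI would store them per build and plot the trend
    print(json.dumps({"core_loop_seconds": elapsed}))
    assert np.isfinite(result)
```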