Up to a few years ago I was of the breed that spawns code fated to wait for requests for most of its life, so really bothering to optimize it never seemed cost effective. Then I stumbled into one of those one-in-a-million situations that warranted a lecture about how many of my monthly wages a server upgrade is worth. It was revealed to me that adding more memory to that particular server would cost as much as what was spent on me over more than six months, and I suddenly became interested in optimization. The story ended with that particular call going from finishing in more than 4 hours to finishing in less than 5 minutes, and with me starting to have real doubts about the meme regarding the hardware/developer-time cost ratio.
That is the context. The meat of the issue is that Dlang has a very easy-to-use way to let the compiler decide how operations on arrays should be carried out, based on what it knows at compile time: Array Operations. Testing the performance gains was done according to Other Dev tools: Valgrind Helper. Screenshots are of KCachegrind.
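To illustrate the syntax before the full test code, here is a minimal sketch of my own (not part of the benchmark below): the slice syntax a[] applies an operation element-wise, and the compiler is free to vectorize it when the target supports it.

// Minimal sketch of D array operations, element-wise over whole slices
void sketch() {
    int[4] a = [1, 2, 3, 4];
    int[4] b;
    b[] = a[] * 2 + 1; // element-wise: b[i] = a[i] * 2 + 1
}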
The code:
module tests.vector_operations;
void main() {
    test_loop();
    test_vector();
}
void test_vector() {
    int[1000000] test;
    int[1000000] result; // sized to match test so the slice lengths agree
    result[] = test[] + 1; // array operation: the compiler picks the strategy
}
void test_loop() {
    int[1000000] test;
    int[1000000] result;
    for (size_t i = 0; i < test.length; i++) {
        result[i] = test[i] + 1;
    }
}
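One caveat: each int[1000000] static array occupies roughly 4 MB of stack, so two of them in one function can get close to the default stack size on some systems. A variant using heap-allocated dynamic arrays (my own sketch, not part of the measured code) sidesteps that while keeping the same array operation:

// Sketch only: dynamic arrays live on the GC heap instead of the stack.
void test_vector_dynamic() {
    auto test = new int[](1000000);
    auto result = new int[](1000000);
    result[] = test[] + 1; // lengths match, so the array operation is valid
}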
Compiling, running valgrind, and demangling:
$ dmd -g vector_operations.d
$ valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --callgrind-out-file=callgrind_out ./vector_operations
$ ddemangle callgrind_out > callgrind_out.demangled
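The binary above is built without optimizations; for a more release-like comparison one could also compile with dmd's optimization flags (my assumption, not how the original measurements were produced):

$ dmd -O -release -inline -g vector_operations.d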
Inspecting the results in KCachegrind: