I spent so long wondering whether to do things this way or that way that I finally decided to settle it once and for all and measure the real performance differences. These are by no means scientific tests from a lab you see on TV. Just an easy application anyone can run – I was interested in the results, so I thought someone might find them useful too.
The tests were done on Meridian/P, using a release build, with a debugger attached. The code of the console application was as follows:
using System;
using Microsoft.SPOT;

namespace MFConsoleApplication1
{
    public class Program
    {
        public static void Main()
        {
            DateTime start;
            TimeSpan end1, end2, end3;

            start = DateTime.Now;
            Test1();
            end1 = DateTime.Now - start;

            start = DateTime.Now;
            Test1();
            end2 = DateTime.Now - start;

            start = DateTime.Now;
            Test1();
            end3 = DateTime.Now - start;

            Debug.Print("Test1: " + end1 + "," + end2 + "," + end3);

            start = DateTime.Now;
            Test2();
            end1 = DateTime.Now - start;

            start = DateTime.Now;
            Test2();
            end2 = DateTime.Now - start;

            start = DateTime.Now;
            Test2();
            end3 = DateTime.Now - start;

            Debug.Print("Test2: " + end1 + "," + end2 + "," + end3);
        }

        ...
    }
}
The differences we are interested in are really small, microseconds or less. Such small intervals are difficult to measure, so we have to repeat the operations many times to get some consistent numbers. I chose to repeat the instructions 100,000 times, so that the results are around one second. This is long enough to hide the CLR's scheduling, interrupts and other noise, and short enough not to fall asleep during testing. Moreover, as you can see, I ran every test three times to verify consistency between the results.
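The test bodies themselves are omitted in the listing above. As a sketch of the shape each one presumably had (my reconstruction, not the original code), only the statement inside the loop changes from test to test:

private static void Test1()
{
    bool b = false;
    for (int i = 0; i < 100000; i++)
    {
        b = !b;    // the operation being measured
    }
}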
I've sorted the cases in ascending order by the difference observed.
This one is actually the only one which surprised me, although the difference is really negligible. Negation seems to be more costly than xor. I wonder if !b would be more costly than b == false...
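The compared loop bodies are not listed, but the comparison presumably boiled down to these two statements inside the loop shape sketched above:

b = !b;       // logical negation
b ^= true;    // xor with true - measured marginally faster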
A classical C++ lesson. The post-increment is expected to be slower because it needs to allocate and make a temporary copy of the value, increment the value and return the copy. The pre-increment operation just increments the value and returns it directly. Note that the compilation was optimized and the result is not being used, so in practice the difference may usually be bigger. Interestingly enough, a tiny one still remains.
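As a sketch, the two loop bodies being compared (assuming the incremented value is a local and its result is discarded):

n++;    // post-increment: conceptually returns a copy of the old value
++n;    // pre-increment: increments and returns the value directly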
The processors we are running the .NET Micro Framework on are 32-bit, which means the registers are 32-bit. So naturally, there is extra work when dealing with long (and ulong and double) types, which are 64-bit. But what about smaller types? Does it cost anything to pass e.g. bytes around?
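A sketch of how such a test might look (the method names are mine), comparing a byte parameter against the register-wide int:

private static void TakeByte(byte value) { }
private static void TakeInt(int value) { }

byte b = 0;
int n = 0;

// inside the measurement loop:
TakeByte(b);    // 8-bit type on a 32-bit register
TakeInt(n);     // matches the register width directly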
Shifting is very easy to implement in hardware (using flip-flops), while division is much more difficult to realize. Thus it is not surprising that the former is faster.
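That is, inside the loop shape above, something like:

n = n >> 1;    // shift right by one bit
n = n / 2;     // integer division by two - measured slower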
The remainder of a division is a bit faster to get than the quotient, but masking is way faster than shifting. So examining the last bit has a clear favorite: masking.
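A sketch of the two candidates, assuming an int n:

int last;
last = n % 2;    // remainder
last = n & 1;    // mask - the clear winner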
So here it is. How much slower is foreach than for? For those who don't know why: foreach needs to create an instance of an enumerator, and on every iteration its MoveNext() method is called and the Current property is read.
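A sketch of the two loops being compared – I am using an ArrayList (System.Collections) here, since foreach over it really goes through the enumerator (the C# compiler optimizes foreach over plain arrays into an indexed loop):

ArrayList items = new ArrayList();
// ... fill the list ...

// for: plain indexed access
for (int i = 0; i < items.Count; i++)
{
    object item = items[i];
}

// foreach: creates an enumerator, then calls MoveNext()
// and reads Current on every single iteration
foreach (object item in items)
{
}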
Until recently I thought that swapping variables using xor is cool. Then, at university, I was told that there are people who think that swapping variables using xor is cool, though it is usually slower. Damn – indeed. (I still reserve the right to think that swapping pointers to large objects using xor is cool.)
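For completeness, the two swaps (a sketch):

int a = 1, b = 2;

// classic swap through a temporary
int temp = a;
a = b;
b = temp;

// xor swap - no temporary, but three dependent operations
a ^= b;
b ^= a;
a ^= b;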
The second obvious thing after the foreach stuff is to prefer fields to properties (properties are compiled into get_ and set_ methods respectively), but I hadn't realized the difference is that significant. It's like 12 μs per single call!
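A sketch of what the compared accesses look like (the names are mine):

public class Sample
{
    public int Field;

    private int _value;
    public int Property
    {
        get { return _value; }     // every read is a get_Property() call
        set { _value = value; }    // every write is a set_Property() call
    }
}

Sample sample = new Sample();
int x;

// inside the measurement loop:
x = sample.Field;       // direct field access
x = sample.Property;    // method call - roughly 12 μs more per access here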
Apart from the differences in the test cases, we also gained some idea of how costly the operations are in relation to each other, and that every test repetition always resulted in a faster overall time. Anyway, everything here is for informative purposes only. Although embedded devices should be designed with performance in mind, think twice whether sacrificing code readability is worth the effect. If your application's usability is affected by saving a few microseconds, then something else is probably already wrong.
How fast can you toggle an output pin? Or, more practically, how short a pulse can you generate? And most importantly, if it is not fast enough, will the porting kit help you?
The code to test this is pretty easy:
OutputPort port = new OutputPort(pin, false);
bool value = false;

while (true)
{
    port.Write(value);
    value ^= true;
}
And here are the results (release build, debugger detached):
So basically, with a 100 MHz processor, you can usually generate pulses around 20 μs wide using managed code (about 25 kHz).
However, here comes the fact that the .NET Micro Framework is not a real-time system: nobody can guarantee such timing. Nothing prevents the runtime from switching threads in between, or from running the garbage collector. Even if you tried to ensure that, a native interrupt can still come in during the native part of the Write method. See, the cycle above, in a console application containing no other code, produced not only the nice picture we have already seen, but also these ones (and quite often):
Okay, so you need faster pulses, for example to implement the 1-Wire® bus, which represents a 1 by a pulse at most 15 μs wide. What now? Obviously you can't do that in managed code on this hardware. So you can either get a faster processor, or try to move the code to the native side. But how fast can you work there?
Actually, there are several layers in the porting kit which you can use for this purpose, and it is always a trade-off between speed and abstraction, i.e. hardware independence. The shortest pulse possible would require you to find out which processor the hardware platform uses, get its datasheet, find out which register contains the state of the pin you are interested in, and use assembly instructions to toggle it. But hey, this is .NET Micro Framework, let's keep it hardware independent even on the native side:
while (TRUE)
{
    ::CPU_GPIO_SetPinState(portId, TRUE);
    ::CPU_GPIO_SetPinState(portId, FALSE);
}
And here we go:
2.16 μs (about 463 kHz) is pretty good, isn't it? Just out of curiosity: because the above snippet did not disable the processor's interrupts either, you again don't get a 100% regular signal:
(I have observed pulses from 2.12 μs to 2.56 μs, still a pretty good variance) – but this is expected; there is the PWM intended for generating regular signals, not this fancy code!