I’m back after some time spent working hard, doing overtime, double shifts and all that, since my daytime project was heading for a rough landing. On the upside, it gave me a lot to think and write about. Here’s the first topic – queue implementations and thread sleeps.

First of all, what motivated this – quite often (too often) I see a piece of code that looks like this:
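A minimal sketch of the pattern, assuming a shared ConcurrentQueue&lt;T&gt; (the class, the item type and Process are purely illustrative stand-ins, not from any particular codebase):

```csharp
using System.Collections.Concurrent;

class SpinningConsumer
{
    // Shared between producers and this consumer.
    private readonly ConcurrentQueue<string> queue = new ConcurrentQueue<string>();
    private volatile bool stopped;

    public void ConsumeLoop()
    {
        while (!stopped)
        {
            string item;
            while (queue.TryDequeue(out item))
            {
                Process(item);
            }
            // Queue is empty: loop around and poll again immediately,
            // burning a whole CPU core while there is nothing to do.
        }
    }

    private void Process(string item) { /* handle the item */ }
}
```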

Where the queue is shared among producers and consumers. The problem with this code is that if you run it, it’s going to spin whenever the queue is more often empty than not (and it most likely is, unless you have enormous amounts of data flowing through it). A very frequent modification of this ‘pattern’ is one with Thread.Yield() or Thread.Sleep(0) (they’re nearly equivalent) after the inner while loop.

I used to think that it’s not all that bad – the Yield would give back the scheduled time slice without spinning much, the waste would only be on context switching, and it wouldn’t be that CPU consuming. Wrong! It does perform slightly better under overall CPU-heavy load, but in ‘idle’ mode it still consumes a whole CPU core (if there is one consumer).

Another variation of this is something like Thread.Sleep(1) or Thread.Sleep(5). While it helps in idle mode, it has a couple of problems: it still does a lot of completely unnecessary context switching, and it also delays your processing (the bigger the sleep value, the larger the delay – and the less context switching).

So how can we do better? I used to think that some form of Monitor.Wait and Monitor.Pulse/PulseAll was a good solution, and while it works fine, it’s usually a bit difficult to get completely right, and the consequences of not getting it right are dire. If you miss one Pulse or mishandle an exception, you can stop processing altogether, or keep waiting while there are messages in the queue, or a whole bunch of other unwanted stuff can happen.

Happily, .NET 4.0 introduced BlockingCollection&lt;T&gt;. This code:
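(reconstructed as a minimal sketch – the collection setup and the single produced item are illustrative:)

```csharp
using System;
using System.Collections.Concurrent;

class BlockingConsumer
{
    static void Main()
    {
        var collection = new BlockingCollection<string>();

        // Producers on other threads call collection.Add(item),
        // and collection.CompleteAdding() when they are done.
        collection.Add("hello");
        collection.CompleteAdding();

        // Blocks while the collection is empty instead of spinning.
        foreach (var item in collection.GetConsumingEnumerable())
        {
            Console.WriteLine(item);
        }
    }
}
```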

can completely replace what we’ve had above (minus the cancellation, we’ll get there in a moment).

The foreach loop will block until there are elements in the collection and will consume them in a nice thread-safe manner. When a producer is done adding, it can call collection.CompleteAdding() and the loop will end for all consumers.

The BlockingCollection contains a bunch of static methods that allow different behaviors when dealing with multiple collections at once (AddToAny/TakeFromAny); these can also take cancellation tokens and bail out nicely if you decide to quit processing. One word of caution – it’s not guaranteed to behave as if there were any priorities. If you have two blocking collections and you do TakeFromAny, you’ll most likely receive elements from the first one (if there are any) and then from the second one, but that behavior is not guaranteed (at least I have not found it documented anywhere). In .NET 4 and 4.5, ILSpy can tell us that it will always behave this way, but in 5.0+ – who knows.
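For illustration, a consumer draining two collections with a cancellation token might look roughly like this (the queue contents and the timeout are made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class TakeFromAnyDemo
{
    static void Main()
    {
        var queues = new[]
        {
            new BlockingCollection<string>(),
            new BlockingCollection<string>()
        };
        queues[1].Add("from the second queue");
        queues[0].Add("from the first queue");

        var cts = new CancellationTokenSource();
        cts.CancelAfter(100); // stop waiting once the queues stay empty

        try
        {
            while (true)
            {
                string item;
                // Blocks until an element is available in any collection,
                // or until the token is cancelled.
                int index = BlockingCollection<string>.TakeFromAny(queues, out item, cts.Token);
                Console.WriteLine("Got '{0}' from queue {1}", item, index);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation requested - bail out nicely.
        }
    }
}
```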

One last thing. This behavior brings to mind another cool library for .NET – Reactive Extensions (parts of it – some interfaces – are actually part of the framework). In short, it’s a way of treating your collections (or any data sources) as sort of ‘data available event producers’ (and more). It’s pretty robust, and it plays nicely with LINQ. Check it out here: http://msdn.microsoft.com/en-us/data/gg577609.aspx.
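Just a taste of it (this sketch assumes the Rx NuGet package is referenced; the sequence and the filter are arbitrary):

```csharp
using System;
using System.Reactive.Linq;

class RxDemo
{
    static void Main()
    {
        // Treat a plain collection as a push-based sequence of
        // 'data available' events.
        new[] { 1, 2, 3, 4, 5 }
            .ToObservable()
            .Where(n => n % 2 == 0) // standard LINQ-style operators work
            .Subscribe(n => Console.WriteLine("Got {0}", n));
    }
}
```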

 

Some time ago, while working on some performance-critical serialization code for a C# project, I was looking deeply into all flavors of available serialization benchmarks. One thing that struck me at the time was that there weren’t that many of them. The most comprehensive I found was this one: http://www.servicestack.net/benchmarks/NorthwindDatabaseRowsSerialization.100000-times.2010-08-17.html – published by ServiceStack, so maybe a little biased, but pretty neat nonetheless.

The tests published were pretty much in line with what I had observed – protobuf was a clear winner, but if you didn’t want to (or couldn’t) use open source, DataContractSerializer was the best available to you.

After some more testing, I started finding discrepancies between this report and my tests, and they were pretty big – it seemed that DataContractSerializer was better than reported, and the overall order was sometimes shuffled. It all boiled down to two major differences that I’ll explain here.

 

Data Contract Serializer performance

In the quoted tests, DataContractSerializer was almost 7 times slower than protobuf. That’s a lot. In my tests it was 3 times slower. Big difference, huh? When I ran the tests on my own machine (the thing I love about the people at servicestack.net is that they published the source of their tests on Git, so you can reproduce it at home), I found that DataContractSerializer was 7 times slower. So obviously, there had to be a difference in the way we used it! And indeed there was – when I added my version, it was 2,8 times slower than protobuf running on the same test set. Here’s the difference:

Original:
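(a sketch from memory, not the exact benchmark source – the relevant part is that it serializes to plain textual XML:)

```csharp
using System.IO;
using System.Runtime.Serialization;

static class XmlVariant
{
    public static byte[] Serialize<T>(T dto)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            // Writes the object out as textual XML.
            serializer.WriteObject(stream, dto);
            return stream.ToArray();
        }
    }
}
```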

My version:
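(again a sketch – the key change is routing the same serializer through the binary XML writer:)

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

static class BinaryVariant
{
    public static byte[] Serialize<T>(T dto)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream))
        {
            // Same serializer, but the output goes through the
            // binary XML writer instead of textual XML.
            serializer.WriteObject(writer, dto);
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```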

See the difference?

Sure, it’s binary, so you can’t always use it, but in most cases it will do, and it’s way faster. I also noticed that the payload size is about 1/3 more than with JsonDataContractSerializer, which is actually pretty good, so if you’re worried about bandwidth usage, it’s an additional plus. Here are the results of running the Northwind test data with just one additional serialization option (the binary one) on my machine:

Serializer                       Larger than best  Serialization  Deserialization  Slower than best
MS DataContractSerializer        5,89x             10,24x         5,38x            7,24x
MS DataContractSerializerBinary  4x                3,01x          2,75x            2,85x
MS JsonDataContractSerializer    2,22x             8,07x          9,59x            9,01x
MS BinaryFormatter               6,44x             10,35x         6,23x            7,81x
ProtoBuf.net                     1x                1x             1x               1x
NewtonSoft.Json                  2,22x             2,83x          3,00x            2,94x
ServiceStack Json                2,11x             3,43x          2,48x            2,84x
ServiceStack TypeSerializer      1,67x             2,44x          2,05x            2,19x

 

Leaving this difference aside, the results were more consistent with my tests, but there were still big differences in some cases, which leads us to the second important point.

 

It’s all about data

There is nothing that will be best for all cases (not even protobuf), so you should carefully consider your use case. In my tests, the data was larger and usually more complex than in the Northwind data set used for benchmarking by servicestack.net. Even if you look at their results, the smaller the data set, the bigger the difference – for RegionDto, which has only one int and one string, DataContractSerializer is 14 times slower (10 times slower on my machine; binary XML for DataContractSerializer is 3,71 times slower).

So, does DataContractSerializer perform better (especially in the binary version) when larger objects are involved? It does indeed.

I created an EmployeeComplexDto class that inherits from EmployeeDto and adds some data as follows:
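(a sketch of the shape – EmployeeDto, OrderDto and CustomerDto are the Northwind benchmark model types; the exact properties of my class may have differed, and the Friends element type is my guess:)

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public class EmployeeComplexDto : EmployeeDto
{
    // Collections of other Northwind DTOs to make the payload
    // larger and more nested than the flat benchmark rows.
    [DataMember] public List<OrderDto> Orders { get; set; }
    [DataMember] public List<CustomerDto> Customers { get; set; }
    [DataMember] public List<EmployeeDto> Friends { get; set; }
}
```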

The object being serialized consisted of no more than 10 Orders, 10 Customers and 10 Friends (the same amount and the same data for each test), and here’s what I got serializing and deserializing it 100 000 times:

Serializer                       Payload size  Larger than best  Serialization  Deserialization  Total     Avg per iteration  Slower than best
MS DataContractSerializer        1155          1,88x             6407245        6123807          12531052  125,3105           4,11x
MS DataContractSerializerBinary  920           1,50x             2380570        3452640          5833210   58,3321            1,91x
MS JsonDataContractSerializer    865           1,41x             7386162        14391807         21777969  217,7797           7,14x
MS BinaryFormatter               1449          2,36x             9734509        7949369          17683878  176,8388           5,80x
ProtoBuf.net                     613           1x                1099788        1948251          3048039   30,4804            1x
NewtonSoft.Json                  844           1,38x             2844681        4272574          7117255   71,1726            2,34x
ServiceStack Json                844           1,38x             4904168        5747964          10652132  106,5213           3,49x
ServiceStack TypeSerializer      753           1,23x             3055495        4606597          7662092   76,6209            2,51x

So – are you really ‘stuck’ with DataContractSerializer, or is it quite good? The answer is, of course – it depends. Surely, whenever you can, use the binary writer, as it is way faster, but even then, whether it’s the best choice or not can only be answered with a specific use case in mind – think about what data you will be serializing, how often, and so on.

As always when reasoning about performance, the best option is to test with your own data, in scenarios close to real ones, and (preferably) on hardware similar to what the application will run on in production. The closer you get to this, the more accurate your tests will be.

 

Welcome to String[] blog!

Let me start with a brief word of introduction. I work as a software engineer in one of the big multinational corporations, and through the years I have worked on multiple projects and with multiple technologies (starting with Java / JSP / Oracle, and now hovering around the .NET ecosystem). In those years I’ve learned a lot about programming and have often encountered stuff (problems, solutions, quirky behaviors, mysteries) that I found interesting and worth sharing. This is what this blog is about: my medium to share interesting technical ‘stuff’, mostly related to programming, and hopefully to discuss and extend my understanding.