Sunday, October 26, 2014

Introduction to Multithreaded and Parallel Development - Part 3

Locks and Synchronizations

Thread synchronization is the act of notifying all executing code of the latest information regarding the resources they need, locks are there just to make it easy.

Why locking is bad?

- Priority, when a low priority process holds a lock that a high priority process needs there is no easy way of telling it to let go.
- Convoying, when a thread that holds the lock is paused, all other threads are stuck waiting.
- Deadlocks, when one lock is blocking another lock from being released.
 
We have several types of locks in the .NET framework, each one is designed to address a different problem but all locks are designed to give us a way to update a shared resource or to read a stateful resource, such as a file stream that requires you to seek to a specific location before reading a value.

Monitor - good general purpose locking and synchronization mechanism, it provides a way to lock and notify where Monitor.Enter and Monitor.Exit provide the locking and Monitor.Wait and Monitor.Pulse or Monitor.PulseAll provide the notification part, used for relatively long term locking (more than a few commands or for IO).

lock - a syntactic sugar for Monitor which is a type of scoped lock if you're familiar with C++ locking.

Interlocked - easy access to locked bus atomic operations, mostly used for Add, CompareExchange / Exchange and Increment/Decrement.

Mutex - Mostly used for synchronization across processes, since this lock is serviced by the Kernel, its relatively slow.

SpinLock - Very fast lock but CPU intensive as it uses a busy wait to lock, use for only a very short duration such as a few commands like data structure modifications, variable value updates, etc'.

ReaderWriterLock and ReaderWriterLockSlim are locks for managing read/write resources, use ReaderWriterLock for long term locking and ReaderWriterLockSlim for short term locking, slim locks use the SpinLock internally.

Semaphore and SemaphoreSlim can be used as locks, but are best used to limit the amount of concurrency a certain resource should have, for example, you would like only 5 open files at the same time, you can use semaphores to block the 6th request.
 

Multithreading Implementations

.NET strives to provides us with all the tools for whichever multithreaded job we'd like, a few of them would be:

Thread - a thread is a single execution unit, you can create a thread with new Thread(new ThreadStart(..)), Start it and Abort it, which can be caught inside the thread as ThreadAbortException and then cancelled with ResetAbort, then you can wait for it to end with Join, threads can be prevented from taking CPU with Sleep or give up the rest of their time slice with Yield.

Threadpool - A threadpool is useful if you want to QueueUserWorkItems for execution, then you can be notified when they are done if you'd like.

BackgroundWorker - A wrapper for common threading concepts, it creates a thread, you can pass the code to execute with DoWork and have events for RunWorkerCompleted and ProgressChanged, you can also CancelAsync the job. You start the background worker with RunWorkerAsync and get results and exceptions inside the RunWorkerCompleted.

Threading.Timer/Timers.Timer - Both provide a way to execute a piece of code at certain intervals, while Timers.Timer execute in the UI's thread, the Threading.Timer executed in the threadpool, so their usage depends on your use case.

Parallel LINQ (PLINQ) - An extension to LINQ which can parallelize LINQ queries with a simple AsParallel method.

The Task Parallel library (TPL) - Task can be created (or StartNew)and chained (ContinueWith), Started and Waited for. Tasks use the TaskScheduler, and by default execute on the threadpool. The Parallel class is also a part of TPL and contain Parallel.For which can do a parallel for loop, a Parallel.ForEach which can execute a parallel foreach and Parallel.Invoke which can execute multiple actions in parallel.

Data Containers and Types

ConcurrentDictionary - A threadsafe implementation of a Dictionary, the main differences is that its preferred to use its Try methods, TryAdd, TryGetValue, TryRemove,TryUpdate, the implementation is using Monitor locks for write/count operations and no locks for read operations.

ConcurrentQueue - A threadsafe implementation of a Queue (FIFO).

ConcurrentStack - A threadsafe implementation of a Stack (LIFO).

ConcurrentBag - A threadsafe implementation of an unordered list of items.

BlockingCollection - a Threadsafe implementation of  a Producer-Consumer collection, it will block on Add and Take when the collection is full/empty.

ThreadStaticAttribute - An attribute that works on static fields, it will create a separate variable for each thread, almost like an opposite of volatile, so instead of synchronizing across threads, it will be completely separate.

ThreadLocal - A helper for creating a local copy of a variable for each thread that uses it.

Lazy<T> with isThreadSafe true - A helper for lazy initialization of resources, when used with IsThreadSafe=true parameter, it will initialize the value in a threadsafe manner and create a single instance.

Lock-Free algorithms - Lock free algorithms are all about implicit locking, they use Interlocked instructions for synchronization, CompareExchange (which has implicit memory barrier) is used so it maintains state by trying and retrying to change a value atomically. Lock-Free algorithms are usually slightly faster in low contention situations and can gain more performance over locking algorithms in high contention situations.

John Hanna implemented a nice collection of lock-free algorithms - https://hackcraft.github.io/Ariadne/

[MethodImpl(MethodImplOptions.Synchronized)] - The lazy person locking scheme, it locks the instance on each member function the attribute is on. It's like making a class threadsafe without the hassle of knowing what and how multithreading works, performance is equivalent to executing serially on a single thread plus the time it takes to lock on each method.

Browser Multithreading

In the old days (286 anyone?), DOS multi tasking such as DESQView used to provide a basic experience of multi tasking by saving the current state and switching to a different task. These days software is a lot more complex but the same principle of context switching remains, this idea can serve us well when trying to multitask with browsers.
 
While all Browsers provide setTimeout, setInterval which is just scheduling execution on the browser event loop, HTML5 standard provides us with web workers which can use multithreading in the browsers.
 
Based on the same principles of DESQView a developer can provide a rich experience to the user with async programming with the help of setTimeout/setInterval and if a richer, more CPU intensive processing is required, to start a new web worker and give it a task to process in the background.
 
For example, jQuery can do animations thanks to setInterval, it splits the animation to multiple steps which are executed each time setInterval is called (Neeraj Singh wrote an article about it), this can give the illusion that multiple things are happening on the same time.

Exceptions

Thread exceptions are handled silently, the starting thread doesn't know if a certain thread failed, this creates a unique situation in which an application needs to handle thread failures explicitly, to make matters worse, if a thread fails while holding a lock, that lock may stay locked forever, on top of that, the default thread abort mechanism works by throwing exception inside the thread, so it might make the unlocked locks problem even worse.
 
You should always handle thread exceptions, a thread body should always be inside a try catch with ThreadAbortException at the very least and release all locks in finally section.

try
{
}catch (ThreadAbortException)
{
}
 
An exception to this is if you're only using the lock syntactic sugar as its released automatically, but good habits prevent future problems in case you decide to use a different lock.

Debugging Threads

Debugging threads is not a trivial task, Visual Studio Performance Profiler can do Concurrency profiling, which helps to understand what threads are doing and how locks are affecting them.

Visual Studio Profiler



Moreover, you should name your threads so when looking at the Threads window in Visual Studio, you can locate the threads you're interested in easily and you can select which thread you would like to step in next.

Threads Window


Summary

Threads are not a magic bullet for performance problems, their use should be heavily considered against the cost, complexity, development and testing time making sure the extra effort is worth the potential benefits, POC and profile your design so you won't be disappointed.

Many algorithms have faster alternatives which should be investigated first, proper multithreaded development requires both care and experience and if not careful, can introduce concurrency based bugs (or Heisenbugs as some like to call them), always look in the documentation if a certain piece of code is instance threadsafe and method threadsafe before making decisions and using locks, when you do use locks, use the most lightweight lock you can for each one you use and use the proper one.

I've came to the conclusion that solving multithreaded issues is a tedious job and its best to avoid these problems in both of concurrency thought process and performance, in each lock there is a performance penalty while using lock-free collections is both safer and quicker the implementation is more prone to memory model based error and can provide a challenge when facing with random errors. IMHO the best multithreaded application is one that simplicity is balanced with performance, which can be achieved mostly with Messaging, Tasks and separation of concerns.

Avoid modifying sturcts, arrays with elements smaller than the memory alignment (32/64 bits usually), small elements could be written to memory as 'get whole element, modify only 1 byte, push back the modified element', you can see the problem in this, same thing with arrays, in general I would recommend splitting the arrays/structs to the data needed to be modified separately and join the modifications back to the struct when all threads are done.

For small tasks the more common scenarios can be solved using a task library or dividing the tasks to smaller tasks and executing them in threadpools.

For larger tasks/collections, like matrix calculations, it is better to divide the tasks to ranges and let each thread handle a separate range to avoid the mandatory locks and CPU cache issues.

Starting/stopping threads to perform tasks is usually more costly as the price of starting threads is significant and therefore a threadpool of some sort should be used in most cases.
In addition for tight loops, trying to avoid code branches might pay off as branch predication is not 100% and each miss incurs penalty.

Lastly, locks are more of a statistic probability kind rather than the perfect first thread which waited gets the lock, welcome to the era of statistical probabilities programming.

Further Reading

Great presentation slides from Gaƫl Fraiteur - http://www.slideshare.net/sharpcrafters/multithreading-fundamentals
Article about locks - http://locklessinc.com/articles/locks/
Threading in C# by Joseph Albahari - http://www.albahari.com/threading/
https://stackoverflow.com/questions/3342941/kill-child-process-when-parent-process-is-killed
PLINQ Partitioning - http://blogs.msdn.com/b/pfxteam/archive/2009/05/28/9648672.aspx
MSDN Overview of Synchronization Primitives  - http://msdn.microsoft.com/en-us/library/ms228964(v=vs.110).aspx
Microsoft CHESS tool for testing concurrency issues, Microsoft released the source code but the project seems abandoned - https://research.microsoft.com/en-us/projects/chess/
 
Problems and Solutions
- Producer Consumer/Bounded Buffers, easy to implement with BlockingCollection or ConcurrentQueue.
- Readers / Writers, easy to implement with ReaderWriterLock and ReaderWriterLockSlim.
- Dining Philosophers, an interesting implementation from Microsoft.
 
 
Synchronization Algorithms - Some are more or less relevant to .NET, but are interesting read altogether.
- Thundering herd problem - https://en.wikipedia.org/wiki/Thundering_herd_problem
- Lock Convoy https://en.wikipedia.org/wiki/Lock_convoy
- Sleeping barber problem - https://en.wikipedia.org/wiki/Sleeping_barber_problem
- Cigarette smokers problem - https://en.wikipedia.org/wiki/Cigarette_smokers_problem
- Readers-writers problem - https://en.wikipedia.org/wiki/Readers-writers_problem
- ABA problem - https://en.wikipedia.org/wiki/ABA_problem
- Readers-writer lock - https://en.wikipedia.org/wiki/Read/write_lock_pattern
- Lamport's bakery algorithm - https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
- Dekker's algorithm - https://en.wikipedia.org/wiki/Dekker%27s_algorithm
- Eisenberg & McGuire algorithm - https://en.wikipedia.org/wiki/Eisenberg_%26_McGuire_algorithm
- Peterson's algorithm - https://en.wikipedia.org/wiki/Peterson%27s_algorithm
- Szymanski's Algorithm - https://en.wikipedia.org/wiki/Szymanski%27s_Algorithm
- Spinlock - https://en.wikipedia.org/wiki/Spinlock
- Seqlock / frlock / ticket lock
 


No comments:

Post a Comment