Sunday, October 26, 2014

Introduction to Multithreaded and Parallel Development - Part 1

In recent years, as limitations arise on CPU clock speeds, manufacturers have been adding cores to continue delivering faster and faster processors, as a result, multithreaded development is more and more necessary to fully utilize the machines' capabilities. 

Back when computers had a single core CPU, normal, procedural and even object oriented programming was pretty straight forward, while multithreaded development requires a different thought process it is simple and similar enough to single core programming that you can get started very quickly. In addition, Microsoft has been keen on adding simplified ways of using threads with BackgroundWorker, Task and PLINQ libraries

There are two distinctive concepts with regard to multithreading:

Concurrency - is when two distinctive processes are executed at the same time, sharing time on the CPU, taking turns.

Parallelism - is when two distinctive processes are executed at the same time without sharing time, meaning, each one is executed in a separate core.
On a single core, there is no way to execute parallel tasks, as tasks need to share the CPU time to execute concurrently. On a multicore CPU both concurrency and parallelism takes place as cores can execute parallel tasks but they also share time as most systems have more threads than CPUs.

There are great number of resources on the web for learning how to write exact code, I thought I'll save both of our time by not rewriting all the examples again but by explaining the underlying decisions that led to how and what is in these examples, IMHO understanding is more important.

Let's start with the basics.

What are the basic building blocks?

A Thread - a single execution unit, a thread might execute or sleep. a thread can also wait for a wakeup from another thread or process, can give up the rest of its allotted time slot and can access the whole of the process memory and resources.
A Process - a process is a collection of threads, it can have 1 or more threads.
An Application - an application is a collection of processes, it can have 1 or more processes
A Job - a job allows grouping of processes, it's possible to manage a few processes with jobs. an example would be chrome browser, if you kill the main process, all the other processes are also killed, while not implemented directly in .NET, it is advantageous to know and use in the right circumstances (an example by Matt Howells)

For a sense of completeness, there are also fibers and UMS, but both are not implemented in the .NET frameworks, fibers are basically light threads and UMS is highly specialized for high performance highly multithreaded applications, the work involved in using it makes it cost prohibitive to most developers interesting in pursuing this adventure.
Lets imagine a system we would like to use as an example:
- It's a document management system.
- Documents are coming in via multiple ways, e.g. email, ftp, web uploads.
- Documents needs to be converted to a common format
- Documents needs to be scanned for information
- Both converted documents and scanned information needs to be saved for later processing
We have the following options for architecture:
- create a process for all tasks, execute sequentially.
- create a process for each task.
- create a process for all tasks and use threads for each task.
What ways do we have to communicate between these tasks?
- File system
- Shared Memory
- IPC (named pipes)
- Messaging (like MSMQ)
While Databases can also be used to communicate state between systems, IMHO its less than ideal solution, while it can provide as good experience as Messaging, it is not as simple as a Messaging mechanism. However, Microsoft SQL does support Queuing mechanisms since 2005, if your requirements are that the Queues be backed up regularly with SQL and that the messages be a part of a database workflow and/or transactions, it might be beneficial to look into Microsoft SQL Service Broker.
  Scalability Safety Speed Disadvantages
File System A typical directory can hold many files. Parallel writes are not easily implemented, file system crashes are relatively rare with Mirrored/RAIDed systems. Limited by disk speed, with today's SSDs, we're talking typically on 100's of MB/s or 10's of thousands of IOPS. Relatively slow
Shared Memory Limited to an individual system memory. One error can take the whole application with it, if data is not persisted in a correct way, data loss will result. Limited by system's memory bus speed and application architecture. Requires understanding on how memory is used by compiler/processor and multithreaded programming knowledge.
IPC/Named pipes Limited to an individual machine, between machines a better solution would be socket/WCF. Considered to be safe. Less than Shared memory but more than messaging. Limited to same machine or slow in case of remoting.
Messaging Accepted around the software industry as the scalability choice. Multiple subscribers/publishers can go down, as long as the queue is still alive, the system will be resilient to crashes. Typically its limited by hardware, if multiple queues and multiple servers, can scale horizontally beyond current needs. Requires different though process, in terms of 'packets' rather than 'functions'.
What do these mean in the context of our example application?

File System

File system can be used to communicate between different applications or different parts of our application, for example, all incoming documents can be saved to one folder as soon as they arrive before any processing, then they can be converted, again, saving the converted files to another folder, then they can be scanned and the scanning results can be saved to a third file.

This is simple, but will probably grind our disks on a high load system.

- The biggest advantage in my opinion is that if the application/server crashes, we have the last state saved to disk and we can pick up where we left off.
- The biggest disadvantage is that file system operations are relatively slow.
- We need to implement a file locking and retry mechanism.
- While two processes can read from the same file it is hard to implement a scalable solution, file locks can help, but .NET requires exceptions to know that and it is a slow process.
- A lot of CPU is wasted on the file system, try catch and file saving and loading, so for a low profile application it might do the trick, but if every cycle of your CPU is important, avoid this solution.
Implementation orientation:
- Use proper locking, if you only want one process to access one file, use exclusive locks (FileShare.None).
- Use try catch on all file accesses and locks are throwing exceptions.
- Use FileSystemWatcher instead of constantly monitoring for changes, also, FileSystemWatcher is known to be not 100% reliable when its internal buffer is full, so implement a scanner and execute it in timed intervals to make sure you didn't miss a change or increase FileSystemWatcher.InternalBufferSize.

Shared Memory

Different parts of our application can do everything in memory, making things faster, for example, incoming document contents (or pointer to content) can be passed around to multiple functions, conversion and scanning can be done in memory and very quickly.
- Speed
- Need to synchronize memory accesses, multiple readers can read the same memory location but multiple writes should be synchronized.
- Application/server crashes leads to loss of data, so a program might need to retry uncompleted documents and still save state when a batch of documents is completed to save time in case of a crash.
Implementation orientation:
- Shared memory across processes can be implemented using Memory Mapped Files.
- Shared memory in the same process could be implemented using variables and concurrent collections with threads, threadpools and Tasks.
- Since multiple writes should be synchronized and for most cases read should be prevented or stale data can be used instead while write is in progress, a ReaderWriterLock should be considered.
- It is the purpose of this article to introduce the ideas behind this solution, so read on.

IPC/Named Pipes

Multiple parts of our application can receive the information they need from named pipes, each one is doing their task and pushing the results to the next pipe.
- Relative speed and separation of concerns.
- IPC and Socket programming could be tricky, implementing a retry, knowing when to listen and when to open/close connections to achieve optimum performance is not trivial, while WCF provides a simplified way of doing things, it still requires much attention to security and retry mechanisms.
- Using Named Pipes on the same machine is fast, but using sockets between different machines is relatively slow and network speed will affect the performance of the application greatly.
Implementation orientation:
- Named pipes could be implemented using NamedPipeServerStream and NamedPipeClientStream.
- I recommend using WCF facilities for this, although it adds overhead, the configuration options, flexibility and optional security and encryption will save a lot of time in the future in case the application becomes a large scale operation.


Multiple parts of our application are listening to different queues, these parts are also responsible for pushing the next message to the next queue, this goes on and on until the incoming documents are handled to completion.
- Horizontally scalable (depending on the queuing mechanism and architecture of curse)
- Simple programming and thought processes.
- Adding publishers/subscribers to our queue is relatively simple, more listeners, meaning easier concurrent processing, so if we have two listeners to the same queue, two of them can work on the same task but with a different data at the same time.
- While providing high bandwidth processing it also provides a relatively high latency, meaning, it takes time to pass the document through all the queues.
Implementation orientation:
- For in-app implementation, a simple ConcurrentQueue with a waiting thread could do the trick (and is very fast), add waiting threads if your application demands it AND you have available cores.
- For more robust solution use MSMQ, ActiveMQ, ZeroMQ etc', note that I have used only MSMQ so I can't recommend any specific solution, one size doesn't fit all and you should do your own research on which one is the best for you, many cloud providers also provide a messaging solution, so it might be beneficial to take a look at them.
- Passing only a document id is much preferred to passing the whole document.

So what are Concurrency Models?

Concurrency models are the basic building blocks for adding concurrency to your application, consider the following:

- Shared Memory, which relies on the basic premise that all code in the program can access all memory, in a shared memory model, we need to synchronize access to shared resources as conflicts might cause a program to misbehave due to the result being non-deterministic.

- Immutable Data Structures, which relies on all data being immutable (e.g. unchangeable), so if you want to add value 1 to an array containing 1,2,3, the value is not simply added, but a copy of the array is created, 1 is added and the new array is being returned. multiple writes will each return a different array with different values and will be valid only for that calling function, to sync all of these arrays, another one is created by combining all results. this is an easier model to understand but with huge memory structures it could be a problem, the biggest advantage of immutable data structures is that they don't need to be synchronized or locked for shared access.

- Message Passing, which relies on each process to be responsible for its own actions, want to tell it to do something? send a message. JavaScript multithreading (Web Worker) is based on that.


1 comment:

  1. I'm glad to be reading this article, I simply want to offer you a huge thumbs up for your great information.
    Tableau Guru