This article is common sense sprinkled with personal opinion and experience. Don’t follow everything to the letter, and remember:
There is an exception to every rule
In my opinion, optimization is worth doing in one of two cases: the customer demands it, or you’re doing it for fun. We’ll look at most of these components and attitudes in this article. Some optimizations are more of a challenge than a real source of income: the performance gains could be great, but they benefit no one beyond giving us a pat on the back. I enjoy both attitudes, as some of my past articles show. I’ll optimize string formatting for fun, and I’ll check whether I can accelerate an entire processing job if my customer demands the lowest latency a computer can possibly achieve. In that case I give them the information on how many developer resources each change will take and the maximum benefit that can be gained, so they can decide whether that second is worth 2 days of development time.
Everyone knows the value of tests, but in performance tuning they carry special weight: they show not only that performance improved but also that the outcome is still correct. Performance tuning makes things more complicated and it is sometimes hard to keep everything together, so an outside inspector (tests) helps you keep your mistakes to a minimum.
What is performance analysis?
- From Wikipedia: profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or frequency and duration of function calls. The most common use of profiling information is to aid program optimization.
What is software optimization?
- From Wikipedia: software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or is capable of operating with less memory storage or other resources, or draw less power.
Let us assume
- The more work you do, the more resources it will take, where resources are any combination of CPU time, IO (network/disk) and memory.
- For the same amount of data, a few big chunks load faster than many small chunks, where chunks can be files on disk or on the network. The two major reasons are latency and overhead.
- Don’t optimize unless you have to! Optimization is often idealized in the developer’s eyes: write this optimized because it’s more efficient, write that optimized because it uses fewer resources. Most of the time optimizations make things more complicated and less maintainable, and you don’t want that overhead when a real bug comes in and you need to figure out what went wrong. Optimizing without profiling is often time wasted. Think like a businessman: if you don’t have the extra cash, don’t spend it, and if you’re considering a loan, be prepared to pay the interest (delays).
- For many server applications it’s cheaper to throw more hardware at the problem than to optimize.
- For many client applications the customer’s time is more expensive than developer time! If you’re offering a free application, a slow one might encourage your users to abandon it. If it’s a paid application, think of your customers: their time is not free, and if their productivity is lowered they won’t want to spend much time or money on your application. When thinking about customer time, multiply the number of users by the time wasted: is it worth more than the developer time and the potential income?
- Understand before optimization. Always understand what the application is trying to accomplish before you try to optimize. The big picture may show you that the whole process isn’t even necessary to achieve the end result, and then you’ve cut a significant amount of time and resources. Algorithms are bottlenecks: is there a more efficient one?
- Your hardware has finite resources.
Donald Knuth wrote in “Structured Programming with go to Statements” - “premature optimization is the root of all evil”.
- It is hard to be certain where a bottleneck will be in a production system. You might assume that a certain piece of code slows things down, but in reality so many variables affect the system that the piece you thought was the problem may be negligible in the grand scheme of things. It’s also inefficient to optimize tasks whose slowness you don’t care about and tasks that execute rarely, unless those tasks need to provide an answer in real time.
- Design with performance in mind. It’s much cheaper to develop something that will withstand, or almost withstand, the planned performance requirements and then optimize it than to develop a cheap solution that will need much more work later to reach the designed performance. Find your balance between time to market and performance. I know this looks like a contradiction to premature optimization (...), but really it’s not. Assume you have a web application that needs to display 1000’s of records in a grid: planning to display 10 and optimizing later will waste time, while infinite scroll or another common practice is much more suitable.
- Do a preliminary profiling as early as possible for the busiest parts so you’ll have a big picture of what smells and needs another peek. Profile again later, when the system goes beta/production, so you’ll know where the real bottlenecks are.
- Working, Correctly, Fast. Your development flow should be to first make things work, then make them work correctly, and only then look for optimizations. It’s easier and more accurate to do your performance optimization after everything works correctly.
- Optimization usually complicates rather than simplifies. Optimizing code for performance usually requires caches, dictionaries, arrays and other less readable constructs. That’s why it should be done at the end, so that the first working code actually works correctly. Complexity usually defies readability and ease of maintenance.
- Waste not, want not. Don’t waste resources, don’t over-calculate stuff, don’t store things you don’t need, don’t retrieve data you’re not going to use. This will leave you with more resources for the stuff that really needs it. When you load more data than you need, IO needs to work more, depending on latency and throughput, in turn this uses more CPU and sometimes also memory, in the end, you paid a high price for doing unnecessary work.
- Don’t waste memory resources. Don’t cache what you don’t need to cache or might rarely use, and remember that a cache needs to be maintained so it won’t become stale. Overusing memory leads to swapping and leaves less memory for the file system cache, which can make the whole system slow. One of the first things I check when I have a slow server is how much memory is available and whether SQL Server has been left without a hard memory limit. If you suspect your program is misbehaving, profile it!
- Profile, Optimize and Validate, and repeat as needed. Don’t assume anything is faster or slower, and don’t assume your improvement actually improves anything: under the right circumstances slow things can be fast and vice versa. Validate, I can’t stress that enough.
- Don’t keep your code if it’s a minuscule improvement. Most of us are sentimental about the code we write and feel that something we spent an hour or more on shouldn’t go to waste. But remember that optimizations usually impair readability and maintainability, and keeping the code will do more harm than good. If you’re really attached to it, open a graveyard blog and put it on display; this way others can learn and you get to keep it.
- Why code metrics are not enough. Code metrics can be great for detecting complexity and maintainability hotspots in your code, but they can’t measure performance: the fact that your code is complex or simple doesn’t mean it’s slow or fast. A lot of performance bottlenecks can only be detected from the data that goes through the program’s pipeline, which is why it’s even more important to check performance with real-world data rather than a mockup.
- Understand Big O notation for algorithmic use. Big O notation describes how the running time or memory use of a function grows with the size of its input. O(1) means the cost stays constant regardless of input size, O(n) means it grows linearly with the input, and O(n²) means that doubling the input roughly quadruples the cost.
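To make the Big O point concrete, here is a minimal sketch that counts the steps a linear search, O(n), and a binary search, O(log n), need to find the last element of a sorted array. The function names (`linearSearchSteps`, `binarySearchSteps`, `makeSorted`) are made up for this illustration.

```javascript
// Counts the comparisons a linear search makes: O(n).
function linearSearchSteps(sorted, target) {
  var steps = 0;
  for (var i = 0; i < sorted.length; i++) {
    steps++;
    if (sorted[i] === target) break;
  }
  return steps;
}

// Counts the iterations a binary search makes: O(log n).
function binarySearchSteps(sorted, target) {
  var steps = 0, lo = 0, hi = sorted.length - 1;
  while (lo <= hi) {
    steps++;
    var mid = (lo + hi) >> 1;
    if (sorted[mid] === target) break;
    if (sorted[mid] < target) lo = mid + 1; else hi = mid - 1;
  }
  return steps;
}

// Builds the sorted array [0, 1, ..., n - 1].
function makeSorted(n) {
  var a = [];
  for (var i = 0; i < n; i++) a.push(i);
  return a;
}

[1000, 1000000].forEach(function (n) {
  var data = makeSorted(n);
  var target = n - 1; // last element: the worst case for the linear search
  console.log("n=" + n +
    " linear=" + linearSearchSteps(data, target) +
    " binary=" + binarySearchSteps(data, target));
});
```

Growing the input a thousandfold grows the linear count a thousandfold, while the binary count only grows by a handful of steps.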
How to get performance improvements?
- Tweaking. The gains are usually minuscule. Reordering commands might improve CPU cache performance, and SIMD commands can even double the performance of certain actions, but unless these commands execute in big loops or on very big data, such tweaks will probably save a second here and there. If you see a significant amount of work being done on blocks of data, like matrix/vector calculations, you should look into SIMD programming. This is completely outside the scope of this article and you’ll probably need to go native (C++), but there is also a .NET implementation.
- Memory access - Understanding reference/value. All references are 32 or 64 bits (depending on architecture). Primitives and structs are value types, and calling a method with a parameter passed by value copies its contents. This is a micro-optimization, but in large loops it can affect performance. In case you didn’t know, strings are reference types but they are also immutable, which means that any manipulation of a string creates a new copy and discards the old reference, so strings are a nice meal for the garbage collector. In any case, the time spent copying structs is directly related to their size and, worst case, you can always pass them by reference with ref.
- Algorithmic. Usually big improvements, in memory and/or time. For example, different sorting algorithms, search trees, etc.
- Object reuse. Object pools were created because allocating memory and creating objects is time consuming (not to mention destroying them). For objects that are created and discarded frequently it’s probably going to be more efficient to use an object pool: you request an object and, when you’re done with it, return it to the pool.
- Exceptions. Entering a try block is usually cheap, but throwing and catching an exception forces the runtime to unwind the stack and collect diagnostic information, which is expensive in terms of CPU resources, and that is why try-catch is advertised as a low performance mechanism. In extreme cases where performance is critical, it might be wrong in a design sense but right in a performance sense to use other error handling methods. Then again, if you have such significant performance losses due to exceptions, you should reconsider your design anyway.
- Caching. Processing power costs money, memory costs money, time costs money. Caching can be a balance between the three, but it can also take your application in the wrong direction. Don’t cache everything: maintaining a dirty cache is as bad as no cache. Distributed cache can be a scale-out solution but it can also be a design pitfall. With cache, design for failure: never assume something is already in the cache or fresh enough, and keep a timestamp or other marks to make sure your cache is not stale.
- Deferred execution. If you don’t need real time handling, don’t do real time. Processing things now is costly and most servers are not busy 24/7. If you can postpone your reporting until the server is idle, you’ll look more professional than if you let your users wait for an undetermined amount of time. Don’t promise that the report will run in real time and then freeze your system until it finishes, and don’t display a due time and regularly miss it. It’s unprofessional.
- Serial execution/Queues. If your application cannot run concurrently without using too many locks and creating random deadlocks (other than the fact that it’s misbehaving), it’s sometimes more beneficial to use queued execution. Sometimes it’s even faster to execute jobs serially, as none of them attempts any locks.
- Push vs Polling. Both have their pros and cons. Push is usually faster and in some sense less resource intensive, since nothing is being queried every certain amount of time, but it also keeps a connection open all the time. You should consider both options and decide based on how much delay you’re willing to accept and how many resources you’re willing to invest. Push can be more complicated to implement, but solutions such as SignalR/Socket.IO/WCF callbacks provide enough infrastructure to never use that excuse again.
- Casts. Casts take time, not a lot, but in some cases can affect performance. You should use Generics or Interfaces to make the program more readable and not worry about casts too much.
- Accuracy. If you only need float, don’t use double. This has varying results on different architectures: sometimes double is faster, sometimes it’s slower. I’ve contemplated whether I should add it as an optimization at all, but if your code is heavy on floating point operations you should at least look at it.
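The object reuse idea from the list above can be sketched in a few lines. This is a minimal illustration, not a production pool: `ObjectPool`, `acquire` and `release` are hypothetical names, and it assumes the caller never touches an object after releasing it.

```javascript
// Minimal object pool: `create` builds a new object, `reset` clears one for reuse.
function ObjectPool(create, reset) {
  this.create = create;
  this.reset = reset;
  this.free = [];    // objects waiting to be reused
  this.created = 0;  // how many objects were actually allocated
}

// Hands out a pooled object, allocating only when the pool is empty.
ObjectPool.prototype.acquire = function () {
  if (this.free.length > 0) return this.free.pop();
  this.created++;
  return this.create();
};

// Resets the object and puts it back for the next caller.
ObjectPool.prototype.release = function (obj) {
  this.reset(obj);
  this.free.push(obj);
};

var pool = new ObjectPool(
  function () { return { x: 0, y: 0 }; },
  function (p) { p.x = 0; p.y = 0; }
);

for (var i = 0; i < 1000; i++) {
  var p = pool.acquire();
  p.x = i;
  p.y = i * 2;
  pool.release(p);
}
console.log("objects allocated: " + pool.created);
```

Because every object is returned before the next one is requested, the loop reuses a single allocation. Whether that is actually faster than letting the garbage collector do its job is something to measure, not assume.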
Where to optimize?
This is Visual Studio Profiler. I’ve made a little test program that compares several sorting algorithms. We can clearly see which ones take more CPU, while the faster ones don’t even show up. This is just to show you how easy it is to look for performance bottlenecks with this tool.
Note the views at the top: each view helps you determine the most probable sources of bottlenecks, and each one has a different purpose, so explore all of them.
|IE 10 Profiler on http://demos.dojotoolkit.org/demos/mobileCharting/demo.html|
- setAttribute is executed 51k times and takes 329ms. Perhaps it’s affecting the UI’s performance? Is there something we can optimize in that function or the calling functions?
- elementFromPoint is not executed too many times but still takes 316ms. What does it do? Does it have internal loops?
- hideChartView is executed only twice but takes 192ms. What else does it do besides hiding a chart view?
At first glance it doesn’t look like this web application is taking too much CPU, so unless our customer demands it, we will probably not optimize anything.
But wait, we should also look at the call tree:
Here we can actually see which function is calling which, and that helps us understand the call tree behind some of the hotspots. We see that the most significant slowdown happens when the mouse is moving: something is rendering, creating a rectangle and setting stroke and fill. With some more digging, perhaps it’s even possible to speed up the execution.
Chrome can also show you how CSS selectors slow down your application, it is part of the style recalculation.
|Timeline - Events Profiling|
Yahoo’s YSlow is a nice plugin. It gives you a general performance overview: whether you loaded too many files, where you link your styles and other common issues. It doesn’t give you thorough knowledge of the application, but it’s still needed information for optimizing your application/website.
There are many more tools that can help you diagnose and optimize your websites and applications, but I've come to like Chrome and Visual Studio and for now they are exactly what I needed.
Methods of profiling:
- Sampling - Sampling takes snapshots of the currently executing threads’ stacks. By statistically analyzing which functions are caught most often in these snapshots, you can see which methods take more CPU resources.
- Instrumentation - Instrumentation inserts interception points in the code so that it’s measured whenever a function is called. This is more accurate than statistical sampling, but it also slows down execution significantly. It should be used in environments where the system being profiled is under load and it’s not known how many resources the application actually has at its disposal. The reason is that if the application has unstable CPU resources, results will be skewed whenever the load on the system changes, and fast methods will seem like slow methods. It’s also useful in multithreaded applications where some threads affect other threads’ performance, though concurrency profiling might be more suitable.
- Performance Counters - Application performance can also be monitored with Windows Performance Monitor. The user can also assign triggers to values you know have a bad effect on performance, and this way you leave it to the administrator instead of implementing your own. It’s very useful in server environments where these performance counters can be collected and analyzed later.
- Memory Profiling - Performance issues may arise as more and more memory is allocated and the system goes into swapping, but that’s not the only concern. Each allocation incurs an overhead, so it is usually better to allocate more than is needed right now than to allocate many very small chunks. Like everything, the key is balance. Visual Studio Profiler allows you to see which functions allocated memory, how many allocations they made and their total size.
- Resource Contention (concurrency) Profiling - When developing multithreaded applications, sometimes it’s not clear why a certain function is slow. You may see that a lot of time is wasted on locks, but to understand how the interaction between the threads is hampering the application you can use Resource Contention (concurrency) profiling, which visualizes how the application behaves.
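To give a feel for what instrumentation means, here is a hand-rolled sketch that wraps a function so every call is counted and timed. Real profilers inject such interception points automatically; the `instrument` helper and the `stats` object are assumptions made up for this example.

```javascript
// Wraps `fn` so that each call is counted and timed; results go to stats[name].
function instrument(name, fn, stats) {
  stats[name] = { calls: 0, totalMs: 0 };
  return function () {
    var start = Date.now();
    try {
      return fn.apply(this, arguments);
    } finally {
      // Recorded even if `fn` throws.
      stats[name].calls++;
      stats[name].totalMs += Date.now() - start;
    }
  };
}

var stats = {};
var sum = instrument("sum", function (n) {
  var total = 0;
  for (var i = 0; i < n; i++) total += i;
  return total;
}, stats);

sum(10);
sum(1000000);
console.log(stats);
```

The wrapper itself adds work to every call, which is the slowdown mentioned above: the more often a function is called, the more the measurement costs.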
Understanding trade-offs
There is no “best” solution, only trade-offs between memory efficiency, high performance, and readability and maintainability. It is rare to have more than two, so in cost-effective planning you should decide which is more important to you for each module.
- List vs Dictionary. While a list is usually more memory efficient, pulling non-sequential information from it is usually slower, not to mention a search, which is O(n). An ideal dictionary, on the other hand, is close to O(1), though nothing is really ideal. Then again, adding a value to a list is quicker than adding it to a dictionary, as a dictionary has to maintain internal state for its keys.
- Buffering. Buffering data can do two things: it can prepare a larger block for an IO operation, or it can collect incoming blocks so that the work is done in one large operation instead of multiple small ones. Buffering is usually done to reduce overhead at the cost of delay and memory. It can be used in many ways: you can buffer a number of messages you want to pass to the client and then push them as one big bulk, or you can buffer incoming information so that you execute a parser only once to show a few of them.
- Filtering/Grouping. If you do store a dataset in the cache, consider breaking it down to what your queries are going to use, storing multiple groups can be more efficient for later processing, but remember that storing multiple groups can add more latency so then again, it can also be less efficient.
- Precompute. Precomputing results can save time if these results will be used multiple times, partial precomputing can also help, again, take into account the problems stale data might create.
- Caching. Caching can help with static or partially static data especially when using a slow storage medium, this might give you a rough idea of what to expect when you think about caching:
Read 1 MB sequentially from SSD* .... 1,000,000 ns = 1 ms
Disk seek ........................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk .... 20,000,000 ns = 20 ms
Send packet CA->Netherlands->CA ..... 150,000,000 ns = 150 ms
Note that the last line talks about a single packet, not 1 MB.
- Hashing. Let’s think of this scenario: you want to find duplicate files in a file system based on content. You could compare each file’s content to every other file’s content, which will give you the best results. Now let’s think of optimizations for that program. First, we’ll remove all the 0 size files from the comparison, because you can’t really compare 0 to 0. Then we only compare files of the same size: if the file sizes don’t match, the files are not the same anyway. That leaves us with groups of files of the same size. But what do we do with these 100 MB files? We have 20 of them, which is 190 pairs, so comparing them all is the same as comparing 19 GB of data! We can save some time by hashing all the files, which reads only 2 GB, then comparing the hashes, and only if any of the hashes match, comparing the file contents. So now we’re reading a minimum of 2 GB and already know whether there’s a chance anything will match. We can also decide whether the hash is enough for us or we want to compare the actual files. Now, a bigger question: should we optimize the hash function? Well, does it hash slower than your disk drive can read? How many comparisons like that are we going to execute each day? 1? Not worth it. All day? Perhaps worth it. Is it critical that we compare everything in the shortest amount of time? Could be worth it. Who is paying for that? You get the point.
- Parallelism. For a single user it’s probably more efficient to use the entire CPU with all its cores. But consider a server where you optimize a certain operation to use all the cores, and then a 2nd user comes along and also gets access to all the cores. The performance effects are unpredictable, especially if the application is designed to create more and more threads; context switching is a real issue when using too many threads. I would advise against using multiple threads for the same request in web applications. Instead, you can use a queuing mechanism and keep the heavy lifting on a different server.
- Inlining. Inlining has a minuscule performance gain. With that in mind, inlining is a method the JIT compiler uses to speed things up for very small methods. When a program calls a method, the stack has to be filled with all the parameters and the processor jumps to the method’s body. Inlining saves some of these steps by placing the method’s body directly into the calling method. For very small methods it can actually speed things up; for medium to large methods it’s negligible. .NET 4.5 introduced a flag to request inlining (AggressiveInlining), while previous frameworks only supplied a flag to avoid inlining (NoInlining).
- Lambda/LINQ limitations. Lambdas and LINQ (which can use lambdas) are great, I love them, especially for RAD. But a lambda is invoked through a delegate, and the JIT compiler generally can’t inline delegate calls, so don’t count on inlining there. Take that into account if you’re processing many records. You should also consider using CompiledQuery for your LINQ.
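The duplicate-file scenario from the hashing item can be sketched as follows. This is a minimal illustration that assumes Node.js (for the built-in `crypto` module and `Buffer`) and works on in-memory "files" so it stays self-contained; `findDuplicateCandidates` and the sample data are made up for the example, and SHA-256 is just one possible choice of hash.

```javascript
var crypto = require("crypto");

// files: array of { name: string, content: Buffer }
// Returns groups of file names whose size and hash both match.
function findDuplicateCandidates(files) {
  // Step 1: skip empty files and group the rest by size.
  var bySize = {};
  files.forEach(function (f) {
    if (f.content.length === 0) return;
    var size = f.content.length;
    (bySize[size] = bySize[size] || []).push(f);
  });

  // Step 2: hash only files that share their size with at least one other file.
  var groups = [];
  Object.keys(bySize).forEach(function (size) {
    var sameSize = bySize[size];
    if (sameSize.length < 2) return;
    var byHash = {};
    sameSize.forEach(function (f) {
      var h = crypto.createHash("sha256").update(f.content).digest("hex");
      (byHash[h] = byHash[h] || []).push(f.name);
    });
    Object.keys(byHash).forEach(function (h) {
      if (byHash[h].length > 1) groups.push(byHash[h]);
    });
  });
  return groups;
}

var sample = [
  { name: "a.txt", content: Buffer.from("hello") },
  { name: "b.txt", content: Buffer.from("hello") },
  { name: "c.txt", content: Buffer.from("world") },
  { name: "d.txt", content: Buffer.from("hi") },
  { name: "e.txt", content: Buffer.alloc(0) }
];
console.log(findDuplicateCandidates(sample));
```

Only a.txt, b.txt and c.txt get hashed here, since d.txt has a unique size and e.txt is empty. A final byte-by-byte comparison of each group is left out and can be added if the hash alone is not enough for you.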
Most probable performance hotspots:
- Loops/For/for each/recursion. Any loop has the potential to occupy the CPU and use a lot of memory. You should check how many objects are being processed and how much time each loop takes.
- IO (Disk/Network/Database). Disk/Network/Database IO is many times slower than memory access. See if you can load the data in the background or load less data.
- UI. UI updates take time: each layout has to be calculated and each control has to draw itself.
- Program flow. Make sure a 2nd command is not executed if it depends on the first one being successful, otherwise you are just throwing resources away. When retrying something, don’t discard the work that was already done if you can avoid it.
Most probable memory hogs
- Indexes, Hash-tables, Lists, Arrays. Containers of any sort hold data by definition. Check whether each one is really needed, and offload it to a cache if it gives you performance benefits but you still want it out of the application’s memory. Be careful with distributed cache/out of process cache: network latency and throughput take away the benefit of storing large collections, as they need to be serialized and deserialized to go over the network. This is also important when using the ASP.NET state service.
- RAW data, file contents (audio, video, text files). Most probably they don’t need to be stored in memory, the file system cache does a great job of using the unused memory.
Basic SQL Optimizations
- IO is expensive! You can minimize the amount of data being transferred inside the engine by using index includes and using the smallest possible temporary tables. Table variables are suitable for 1000 records or less in most cases.
- Functions are expensive! Especially when they run on each row, reconsider every function! Also, in that context, avoid joining on expressions, filtering by expressions and selecting with expressions, if you do, check the execution plans and make sure these queries are not too heavy on the CPU and not falling back to table scans.
- Avoid comparing different data types, some of them will cause a scan, it might be better to have an indexed computed column.
- Consider using Persisted Computed Columns with indexes and consider executing your queries on them instead of expressions, persisted computed columns execute the function on row update and also update their associated indexes like regular columns which makes them very efficient for a data-warehouse application.
- Execution plans are expensive! Use query parameters or stored procedures.
- Cross joins are expensive! Cartesian products are expensive: each row in table ‘a’ is joined with each row in table ‘b’. That’s a times b rows!
- Avoid Cursors! Cursors are slow, sometimes they are the only way to go but for many problems there are faster options than cursors.
- Indexes can make your life either heaven or hell depending on your understanding of them. Indexes can help you quickly retrieve data, but each index adds overhead for each update, insert and delete. Use index includes (include columns in each index for quicker retrieval, which avoids a Key Lookup) when the benefits are greater than the downsides: index includes duplicate the data, which means more memory and IO. You can even add indexes on persisted computed columns and have indexed views (schemabinding).
- Partitioning. Partitioning is an old trick: RAID0 is a type of partitioning (striping) without redundancy, and RAID5 stripes data with parity across at least 3 disks. While SATA3/SAS removed most of the bottlenecks controller-wise, common disk drives are still relatively slow. By splitting your database files across multiple disks you can have more IO/s available for your database. Datacenters are also starting to use SSDs more and more, which speeds things up even further.
- Memory. SQL Server assumes all the server’s memory belongs to it. It might free some memory when the server is in a really bad situation, but from my experience that’s usually too little too late and can bring a server to its knees. It’s best to tell it how much memory to use: reserve about 1 GB for Windows, and even more if the server is not a dedicated database server. Ideally you should have at least enough memory for all the active database tables plus enough memory for their most common queries. These days memory is cheap enough that saving money on it doesn’t pay off but actually causes you to waste money on a slow server.
- Avoid negative comparisons. Negative comparisons will most likely cause a scan, since they can’t use an index. Consider your negative comparisons and the table sizes, and check your execution plans!
- Retrieve only what you need, join only what you have to. Retrieving more adds overhead, which is usually CPU/memory/network/disk waste. Save these resources for operations that need them.
- Avoid Dynamic SQL, use prepared statements if you have to. All queries are compiled to execution plans, if these plans contain changing values then a new plan has to be created each time the query executes instead of reusing the previous query.
- Avoid Like comparison, especially with wildcards. Like with a leading wildcard causes a scan, because the index can’t be used to seek to the matching rows. If you must search for text, use full text search or avoid leading wildcards.
- Learn to read execution plans! I can’t stress that enough!
- Once your database has been running for a while, get the top 10 most expensive queries to get a clue about what’s taking most of your resources.
- SQL Profiler is a great tool for knowing what’s going on in real time. If I have performance issues, I set it up to show everything above 100ms. Bear in mind that running a profiler on a server is not without risks: I’ve had to restart the SQL Server service once after a profiler froze and got the database to misbehave.
Basic HTML/JavaScript Optimizations
- Most expensive is IO and UI! Every time you show/hide/create/delete an element, the browser recalculates where everything should be. It’s called a reflow.
- Big/Multiple loops are slow, use associative arrays if you need a dictionary and regular arrays if you want fast iteration over the elements (for.. in is slower).
- On old browsers, DOM updates/parsing is very slow, innerHTML on IE is faster than on Chrome and Firefox.
- Avoid updating elements in ways that cause reflows. Padding on IE6-IE8 has the highest performance penalty (x4). DOM manipulations generally trigger reflows, and multiple DOM updates that each create a reflow range from slow to terrible. Absolutely positioned objects incur only their own performance penalty rather than affecting the whole document.
- Updating elements is faster before they are part of the DOM for the same reason - reflows.
- CSS wildcards are slower, use the most selective selectors
- Avoid IFRAMEs. A browser within a browser has a penalty/overhead similar to having a new browser window open. Also, IFRAMEs block onload until they are done.
- Nested DIVs are slower in certain circumstances, deeper elements to reflow.
- Mind your scope! local scope is faster than global scope, prefix local variables with var.
- Avoid using nested properties in loops, they are not optimized. Make local variables and use them instead.
- Always measure! Don’t assume anything in the browser, there are too many variables!
var start = new Date();
// ... the code being measured goes here ...
console.log("executed " + (new Date() - start) + "ms");
- Use CDNs where needed. Browsers have a limitation for how many concurrent open connections they have, loading resources from multiple domains can overcome that limitation, in addition, CDNs can provide a closer to browser server, which can serve resources faster.
- Avoid eval(s), not only for performance issues but also security.
- Know when to use setInterval and recursive setTimeout. setInterval raises an event every x ms, but if your browser is a bit slow it will become unresponsive while it’s executing the interval events. Another option is to use setTimeout and, as part of the function, call setTimeout again. This way you tell the browser how much time to rest between events instead.
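The recursive setTimeout pattern from the last item can be sketched like this. `repeatWithRest` is a hypothetical helper name; the important part is that the next run is scheduled only after the current one has finished, so the rest period is guaranteed no matter how long the work takes.

```javascript
// Runs `work` repeatedly, resting `restMs` after each run has finished.
// Returns a function that stops the loop.
function repeatWithRest(work, restMs) {
  var stopped = false;
  var timer = null;

  function tick() {
    if (stopped) return;
    work();
    // Schedule the next run only now, after the work is done.
    if (!stopped) timer = setTimeout(tick, restMs);
  }

  timer = setTimeout(tick, restMs);
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

var runs = 0;
var stop = repeatWithRest(function () {
  runs++;
  if (runs === 3) stop(); // stop after three runs
}, 10);
```

With setInterval a slow callback can leave events queued back to back; here a slow callback simply pushes the next run further out.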
Basic C# Optimizations
- Most expensive is IO and UI!
- C# is very fast, in some conditions competitive with C++
- Use the data types you need, avoid structs as method parameters or use ref if applicable, structs are considered values, a copy is created for every function call.
- Avoid dynamic. While it’s not directly related to performance, and dynamic is technically a static type, it’s not checked at compile time. Dynamic types make your code less readable and can introduce problems.
- Avoid Reflection. Accessing object types at runtime is slow. If you have to, cache accessors and learn about dynamic proxies.
- Avoid casting. Generics are type-safe and avoid unneeded boxing/unboxing.
- Avoid COM, prefer managed code.
- Use Dictionaries for fast retrieval, avoid searching/linq on lists in critical sections, compare with CompiledQuery(ies).
- StringBuilder. Use StringBuilder if you have long list of manipulations, this can save both CPU time and memory as strings are immutable.
- If using multithreading, understand different locking mechanisms, use busy waits for very quick operations and wait locks for long locks.
- If creating/destroying many/large objects, understand garbage collection, generations, Dispose/Finalize, SuppressFinalize and the Large Object Heap. While garbage collection provides simple memory management for .NET applications, it’s important to know that reference rich objects and large objects carry a certain penalty: reference rich objects need more work to unreference, and large objects are stored in the Large Object Heap, which is unable to relocate objects and therefore doesn’t free memory as often. While garbage collection can be a performance hit, it should only be optimized if collection times are affecting performance significantly.
- Like everything else, measure! (Stopwatch)
- Defer IO execution until the system is either idle or you must. IO takes CPU time, but it also uses system interrupts, which sometimes lock the whole system for the duration of the interrupt, more IO means less CPU for other tasks.
- Virtual methods/Interfaces. They have a performance penalty of about 10%, I would not recommend avoiding them as most of the time, the readability and maintainability they provide well exceeds their penalty, on the other hand, a property override might have a higher penalty (using 'new' keyword).
- Use System.Diagnostics.Debug if you want the compiler to remove the calls when compiling in Release, and System.Diagnostics.Trace if you want to keep them.
Guidelines for designing a new application/feature
- Ask/Decide on minimum requirements. Don’t optimize prematurely but design for requirements!
- Prefer library/framework methods, most of them have been already optimized but don’t follow blindly, when you profile, also look at the libraries.
- Avoid unnecessary function calls, where “unnecessary” also covers calls that are not necessary right now.
- Prefer positive checks rather than negative in SQL.
- Learn to use threading mechanisms, Tasks, Actions, Async.
- Don’t waste!
- Is the execution order optimal?
- Does it have to run now?
- Am I discarding data?
- Is the application doing too much work at this point in time? How can I minimize it?
- Does the code trigger too many actions? Can I combine them to a single execution?
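The last question, combining many triggered actions into a single execution, can be sketched with a small batcher. `createBatcher`, `add` and `flushNow` are hypothetical names for this example, and the `flush` callback stands in for whatever expensive action (a network call, a DOM update, a database write) you want to run less often.

```javascript
// Collects items and passes them to `flush` in groups of up to `maxSize`.
function createBatcher(maxSize, flush) {
  var pending = [];
  return {
    // Queues an item and flushes automatically when the batch is full.
    add: function (item) {
      pending.push(item);
      if (pending.length >= maxSize) this.flushNow();
    },
    // Sends whatever is pending, if anything.
    flushNow: function () {
      if (pending.length === 0) return;
      var batch = pending;
      pending = [];
      flush(batch);
    }
  };
}

var sent = [];
var batcher = createBatcher(3, function (batch) { sent.push(batch); });

for (var i = 1; i <= 7; i++) batcher.add(i);
batcher.flushNow(); // send the leftover items

console.log(sent.length + " executions instead of 7");
```

The trade-off is the one described under buffering: fewer executions at the cost of some delay and memory, so pick the batch size based on how much delay you can accept.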
CLR Profiler - http://msdn.microsoft.com/en-us/library/ff650691.aspx
NGEN - Compiling Managed Code to Native - Avoid JIT
PageSpeed - https://developers.google.com/speed/pagespeed/