I must thank Jeffrey Richter for writing his wonderful book, CLR via C#, as it cleared up the confusion I had about the volatile keyword.
How does the CPU access variables in memory
We have all heard that CPUs have cache memory, and that more cache (L1, L2, L3) makes processing faster, so when we buy a computer we try to get a CPU with a large cache. Why? Because, relative to the speed of the CPU, the RAM on the motherboard is very slow. The CPU's own cache memory is far faster than main memory.
So on the first access the CPU reads the value at the memory address and stores it in the cache. When the variable is accessed a second time, it is returned from the cache, and all subsequent reads are served from the cache as well. The same thing happens for writes: when the variable is changed, it is changed in the cache, and later reads and writes go to the cache. The writes are eventually flushed to memory when the cache entry is cleared or replaced with other data. One of the clever things the CPU does is that, when fetching the value of a variable (a few bytes) into the cache, it also fetches the values around it, since the next variables to be used are likely to be close by.
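The "fetch the neighbours too" behaviour can be sketched with a small example. This is only an illustration: `LocalityDemo` and its method names are made up for this post. Both methods compute the same sum over a rectangular array, which C# lays out row by row in memory. The first walks the elements in the order they sit in memory, so it tends to hit data already pulled into the cache; the second jumps between rows on every step. How big the speed difference is depends on the array size and the machine, so no timings are claimed here.

```csharp
using System;

public static class LocalityDemo
{
    // Walks the array in memory order: neighbouring elements are read one
    // after another, so values fetched alongside a previous read get used.
    public static long SumRowMajor(int[,] grid)
    {
        long sum = 0;
        for (int row = 0; row < grid.GetLength(0); row++)
        {
            for (int col = 0; col < grid.GetLength(1); col++)
            {
                sum += grid[row, col];
            }
        }
        return sum;
    }

    // Same result, but consecutive reads are a whole row apart in memory,
    // so the neighbours fetched with each read are less likely to be reused.
    public static long SumColumnMajor(int[,] grid)
    {
        long sum = 0;
        for (int col = 0; col < grid.GetLength(1); col++)
        {
            for (int row = 0; row < grid.GetLength(0); row++)
            {
                sum += grid[row, col];
            }
        }
        return sum;
    }
}
```

Timing both methods with a Stopwatch on a large array (a few thousand elements per side) is an easy way to see the effect on your own machine.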
This works fine when you have only one CPU, but most PCs and laptops today have multiple CPU cores. With multiple processors, this caching can become problematic. On a multi-core CPU, threads truly run in parallel: two machine instructions can execute physically at the same time. Since two processors can have different caches, each holding its own copy of data from RAM, they can see different values for the same variable.
According to Jeffrey's book, x86 and x64 processors are designed to keep the caches of different processors in sync, so we may not see the problem there. The IA64 processor, however, takes advantage of the fact that each processor has its own cache and does not synchronize them as rigorously. So threads running on different processors may have put different values into their caches.
Note that one CPU may write to its own cache, and that value will only eventually be transferred to RAM. Meanwhile the other CPU may read from RAM, which still contains the old value because the first CPU has not written its update back yet. See the image below.
So this creates an obvious concurrency problem. That is where the volatile keyword comes in. If you declare a field volatile, it is in effect always read from memory and written to memory immediately. However, it must be noted (again thanks to Jeffrey) that the thread synchronization constructs such as lock, Monitor, Mutex, Semaphore etc. rely on interlocked operations, which synchronize the caches. So the volatile keyword is not needed for fields that are only accessed under those.
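Here is a minimal sketch of both cases. The class and member names (`VolatileFlagDemo`, `LockedCounter`, `s_stop` and so on) are invented for illustration and are not from Jeffrey's book. The first class uses a volatile stop flag that one thread sets and another polls. The second uses an ordinary, non-volatile field that is only ever touched inside a lock.

```csharp
using System;
using System.Threading;

public static class VolatileFlagDemo
{
    // volatile: every read in the loop below really re-reads the field,
    // and the write in Run becomes visible to the worker thread.
    private static volatile bool s_stop;

    // Starts a worker that spins until s_stop is set. Returns true if the
    // worker saw the flag and finished within timeoutMs milliseconds.
    public static bool Run(int timeoutMs)
    {
        s_stop = false;
        var worker = new Thread(() =>
        {
            while (!s_stop)
            {
                // spin; a real worker would do a unit of work here
            }
        });
        worker.Start();
        Thread.Sleep(50);   // let the worker spin for a moment
        s_stop = true;      // volatile write
        return worker.Join(timeoutMs);
    }
}

public static class LockedCounter
{
    private static readonly object s_gate = new object();
    private static int s_count;   // not volatile: only accessed under the lock

    // Increments the counter from several threads and returns the total.
    public static int CountWithThreads(int threadCount, int incrementsPerThread)
    {
        lock (s_gate) { s_count = 0; }
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < incrementsPerThread; n++)
                {
                    lock (s_gate) { s_count++; }
                }
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        lock (s_gate) { return s_count; }
    }
}
```

Calling `VolatileFlagDemo.Run(2000)` should return true, and `LockedCounter.CountWithThreads(4, 1000)` returns 4000 because every increment happens under the same lock.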
But volatile variables are slower, because every read and every write pays the cost. It would have been great if a field could be declared volatile for writes only.
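Something close to this is possible without the keyword. The `System.Threading.Volatile` class (available since .NET Framework 4.5) lets you leave the field non-volatile and pick, access by access, which reads and writes get volatile treatment. The sketch below is an assumption-laden illustration, not a recommended pattern: `Publisher`, `s_value` and `s_ready` are hypothetical names, and it assumes a single writer thread.

```csharp
using System.Threading;

public static class Publisher
{
    private static int s_value;    // ordinary field
    private static bool s_ready;   // ordinary field, accessed through Volatile

    // Writer side: store the data first, then raise the flag with a
    // volatile write so the data cannot appear to be written after it.
    public static void Publish(int value)
    {
        s_value = value;
        Volatile.Write(ref s_ready, true);
    }

    // Reader side: a volatile read of the flag, then a plain read of the
    // data, which is only attempted once the flag has been seen as set.
    public static bool TryRead(out int value)
    {
        if (Volatile.Read(ref s_ready))
        {
            value = s_value;
            return true;
        }
        value = 0;
        return false;
    }
}
```

Before `Publish` is called, `TryRead` returns false; after `Publisher.Publish(42)`, it returns true and hands back 42. Any other code that touches `s_value` or `s_ready` directly would need the same care.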