2003.09.23

Why Does Setup Make Windows Reboot So Much?

The short answer is: because memory-mapped files are locked by the system and can't be changed while they are in use.

For the long answer, I want to go way back to how programs are run on a computer and begin at the beginning...

Running a Program

A computer program consists of a bunch of instructions to the CPU (for instance, the 1.3 GHz Pentium III) of the computer it is run on, and some additional data used by those instructions.  These instructions are very detailed, taking the form of:

Take the number four and the value of address 4000 in RAM and add them together, put the result in address 5000 of RAM.

When you run your program, those instructions need to be read into RAM (the memory of the computer) so they can be available to be loaded into the CPU to tell it what to do.  There are a few options on how to load those instructions into RAM.  One of the most straightforward is to read the whole file (say, program.exe) in and keep it there, but it has a few drawbacks:

    • It uses a lot of RAM.  Most programs use shared libraries (in Windows they're called Dynamic Link Libraries, or DLLs) to re-use functionality already in either the operating system or component pieces the program writers created themselves.  In other words, you not only have to load your program's file into memory, but also all the files of all the DLLs that the program uses.
    • It is slow.  Reading from disk is one of the slower things you can do (relatively speaking) with modern computers.  Anything that can be done to minimize disk usage will make the overall system and the program you're running faster.
      • This is especially true if the program you're running, and/or one or more of the DLLs it is loading, is being accessed across a network instead of local to the computer.
    • It is wasteful.  There are parts of programs that are not used most of the time.  Some users run Microsoft Word, as an example, and are very comfortable and skilled with a particular set of its functionality, but the Office suite in general has enormous functionality and very few people use a large portion of it.  Why take up a lot of RAM and make everything slow to load instructions for the CPU that will never be needed?

    Another option, the one used by Microsoft Windows, is 'memory-mapped files'.

    Memory-Mapped Files

    Let's say we have our file program.exe, and it is 5,000 bytes long.  If we memory-map it, we associate a RAM address with the beginning of it, and the addresses from that start through the size of the file are now associated with the data in the file.  If RAM address 5000 was the start of program.exe's memory mapping, then RAM addresses 5000 through 9999 would be the data in the file itself.

    The operating system keeps track of this association, so if you access RAM address 6000, it knows to get the data from the file if it doesn't have it in memory already.  This is important, because now that there is this association, the operating system only needs to read from the disk when someone asks for a value.  In practice it does this a page at a time, where a page is a chunk of memory of a predefined size.  If you ask for any information whose address is within a page that isn't in memory yet, the system will fault the page in, which essentially means stop everything and read the data into memory.  Any later requests for other information within the same page will find it already in memory, and no reading from the disk is necessary.
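To make the address arithmetic concrete, here is a minimal sketch using Python's standard mmap module. It is only an illustration: the base address of 5000 is the made-up number from the example above (real mappings get whatever address the operating system picks), and the helper names address_to_offset, page_of and read_mapped_byte are mine, not anything Windows provides.

```python
import mmap
import os
import tempfile

BASE_ADDRESS = 5000   # illustrative start address from the article, not a real pointer
FILE_SIZE = 5000      # program.exe is 5,000 bytes long in the article's example


def address_to_offset(address, base=BASE_ADDRESS, size=FILE_SIZE):
    """Translate an illustrative 'RAM address' into an offset within the mapped file."""
    offset = address - base
    if not 0 <= offset < size:
        raise ValueError(f"address {address} is outside the mapping")
    return offset


def page_of(offset, page_size=mmap.PAGESIZE):
    """Return the index of the page that contains the given file offset."""
    return offset // page_size


def read_mapped_byte(path, address):
    """Map the file read-only and read the single byte behind 'address'."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Indexing the view is what may trigger a page fault behind the scenes.
            return view[address_to_offset(address)]


# Build a stand-in for program.exe: 5,000 bytes whose values repeat 0..255.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(bytes(i % 256 for i in range(FILE_SIZE)))

value_at_6000 = read_mapped_byte(demo_path, 6000)  # address 6000 is offset 1000
os.remove(demo_path)
```

Nothing in the program asks for a disk read explicitly; the read happens (or doesn't) depending on whether the page holding offset 1000 is already in memory.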

    The operating system can be quite clever with this scheme: it can keep track of which pages haven't been used in a long time, free up the space being used by them, and fault them back in later if someone wants information out of them again.
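A toy model of that bookkeeping might look like the following. I'm assuming a simple "evict the least recently used page" rule here, which is a textbook policy and not a description of what Windows actually does; PageCache and count_faults are hypothetical names for illustration.

```python
from collections import OrderedDict


class PageCache:
    """Toy model of physical memory that can hold a fixed number of pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> True, least recently used first
        self.faults = 0

    def touch(self, page):
        """Access a page, faulting it in (and evicting the oldest) if needed."""
        if page in self.pages:
            self.pages.move_to_end(page)    # already in memory: just mark as recent
            return
        self.faults += 1                    # not in memory: this is a page fault
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # free the least recently used page
        self.pages[page] = True


def count_faults(accesses, capacity):
    """Replay a sequence of page accesses and return how many faulted."""
    cache = PageCache(capacity)
    for page in accesses:
        cache.touch(page)
    return cache.faults


few_faults = count_faults([0, 1, 0, 1, 0, 1], capacity=2)   # working set fits
many_faults = count_faults([0, 1, 2, 0, 1, 2], capacity=2)  # working set doesn't fit
```

With room for two pages, bouncing between two pages costs only the first two faults, while cycling through three pages faults on every single access.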

    Some people reading this will say to themselves "Isn't this how virtual memory works?"  The answer is yes: virtual memory is implemented with a paging file, which is just a memory-mapped file the system uses to store more information in virtual RAM than there is physical RAM on the computer.  In fact, memory-mapped files are used for all sorts of things, including letting programs running on the same computer communicate with each other (inter-process communication).

    Sharing Memories

    So far so good, but we quickly run into a snag with both of the options discussed above: what happens when more than one program wants to use the same information at the same time?

    Let's say I've got a DLL called mydll.dll (believe it or not, such DLL naming conventions do exist), and it is being used by myprogram.exe and yourprogram.exe at the same time.  If you are going with the 'load the whole file into RAM' approach, you have to decide whether to load mydll.dll into memory a second time or share the already loaded version with the new process.  The former is wasteful and uses a lot of RAM as well as being slow (see the drawbacks listed above), but it cleanly takes care of the problem of what happens when mydll.dll has actually changed between when the first program started and when the second one did.

    The drawbacks to having every program keep its own full copy:

    • It uses a lot of RAM.  Look familiar?  I'm wasting RAM by keeping duplicate copies of the same information most of the time.  If I've faulted in the same page for the same DLL in two different programs, I would have those pages in physically different memory taking up twice the space they otherwise would.
    • It is slow. Keep in mind that every time I fault in a page I didn't have to, or read into memory a file I otherwise wouldn't, I'm accessing the hard drive (at best).  This slows down your computer in a huge way; don't underestimate it.  In addition, if by faulting in pages from a memory-mapped file I use up all the physical memory, I might have to page out stuff that is currently in memory, which means writing it out to disk to make more room in the computer's physical memory.  If that is information a program is going to need soon, it will have to be faulted in again soon, which creates a cycle of slowing down the computer. This kind of thing can lead to page thrashing, where a computer is spending so much time reading and writing pages to files that almost nothing useful is being done.
    • It is wasteful. Except in very specific cases, most of the time for most programs the files we're talking about will not change between one program running and another running.  This means that the system would be optimizing for the exception, rather than the normal case.  Not good design if you want programs to run quickly and efficiently on a day to day basis.

    The same basic choice exists with memory-mapped files as well.  Do I share the memory-mapping between the processes or keep them completely separate in order to deal with this versioning problem?  However, this doesn't work at all with memory-mapped files because the actual information is stored on the hard disk, not in memory.  If, halfway through running my program, the information gets changed and new pages of instructions and data from the new version of the file start getting faulted in, the program will begin executing instructions it never expected and doesn't know anything about.  This is bad.  In fact, the operating system needs to make sure this can't happen.

    Locked Files

    It does so by 'locking' the memory-mapped files.  This prevents anyone from changing the file out from underneath the programs using it, so we don't need to worry about getting unexpected changes to the instructions and data while running a program.  This is good because it makes the system stable, but it has one major drawback.  To show it, I'll try to bring everything I've said together into what I hope is a coherent whole:

    1. For efficiency and performance, executable files and DLLs are accessed as memory-mapped files by programs running on Windows.
    2. To allow multiple programs to share the memory-mapping of those files and to protect against the contents of those files changing underneath running programs, executables and DLLs are locked against changes while they are in use by any programs.
    3. Windows itself is written as executable files and DLLs and runs as programs which lock those executables and DLLs.
    4. Installation programs and programs that update Windows itself are written as executable files and DLLs that run as programs which lock executables and DLLs they use.
    5. If the installation program needs, for whatever reason, to update a locked file, it cannot do so.
    6. If the installation program for an update to Windows, for instance, needs to update a portion of Windows that it is itself using, it cannot possibly succeed.
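The six steps above can be played out in a few lines. This is a toy simulation, not Windows code: ToySystem, FileInUseError, the file names and the version strings are all made up for illustration, and "locking" is reduced to "some running program is using this file".

```python
class FileInUseError(Exception):
    """Raised when something tries to replace a file a running program is using."""


class ToySystem:
    def __init__(self, files):
        self.files = dict(files)  # file name -> version string
        self.users = {}           # file name -> set of running programs using it

    def run(self, program, uses):
        """Start a program; every file it uses becomes locked (steps 1 and 2)."""
        for name in uses:
            self.users.setdefault(name, set()).add(program)

    def exit(self, program):
        """Stop a program, releasing its hold on every file it was using."""
        for holders in self.users.values():
            holders.discard(program)

    def is_locked(self, name):
        return bool(self.users.get(name))

    def replace(self, name, new_version):
        """Update a file in place, which is refused while it is locked (step 5)."""
        if self.is_locked(name):
            raise FileInUseError(name)
        self.files[name] = new_version


system = ToySystem({"mydll.dll": "1.0", "setup.exe": "1.0"})
system.run("myprogram.exe", ["mydll.dll"])
system.run("setup.exe", ["setup.exe", "mydll.dll"])  # the installer uses the DLL too

try:
    system.replace("mydll.dll", "2.0")
    replaced = True
except FileInUseError:
    replaced = False  # step 6: the installer is blocked by its own lock
```

Even if myprogram.exe exits, the installer itself still holds mydll.dll, so the replacement can never go through while the installer is the one running.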

    Uh oh.  Through the joy of logic we've managed to paint ourselves into a corner and can't get out.  But of course most people who own or use a Windows machine have installed an update, a service pack, or a program that updates executables or DLLs that are in use (in fact, this is why most installers 'recommend highly' that you exit every program you possibly can before continuing installation, to reduce the number of locked files they're likely to encounter).  What's the solution to this problem?

    Reboot

    The solution is to put the new version of the file into a temporary place and add an instruction to the Windows boot process telling it, the next time it runs (that is, the next time the Windows system boots up), to copy this new file over the old file.  The old file will not be locked because nothing except the Windows boot process is running, and it doesn't depend on anything but itself.  In other words, you break the deadlock by putting the computer into the only state it has where nothing can be locking a file, then copy the new version onto the old one.  Then encourage the user to reboot the computer, one way or another.
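Here is a rough sketch of that "park it now, copy it at boot" idea. Everything in it (ToyMachine, the ".new" suffix, the pending list) is a hypothetical model for illustration. On real Windows, the Win32 function MoveFileEx with the MOVEFILE_DELAY_UNTIL_REBOOT flag is the documented way to request this kind of deferred replacement; the sketch does not call it and only imitates the behavior.

```python
class ToyMachine:
    def __init__(self, files):
        self.files = dict(files)  # file name -> version string
        self.locked = set()       # files currently in use by running programs
        self.pending = []         # (temporary name, target name), applied at boot

    def install(self, target, new_version):
        """Install a file; return True if a reboot is needed to finish."""
        if target not in self.locked:
            self.files[target] = new_version  # not in use: replace right away
            return False
        temp = target + ".new"                # in use: park the new copy elsewhere
        self.files[temp] = new_version
        self.pending.append((temp, target))   # and leave a note for the next boot
        return True

    def reboot(self):
        """Nothing is running early in boot, so no file is locked."""
        self.locked.clear()
        for temp, target in self.pending:
            self.files[target] = self.files.pop(temp)  # copy new over old
        self.pending.clear()


machine = ToyMachine({"mydll.dll": "1.0", "other.dll": "1.0"})
machine.locked.add("mydll.dll")  # some running program is using mydll.dll

needs_reboot_other = machine.install("other.dll", "2.0")  # free file
needs_reboot_mydll = machine.install("mydll.dll", "2.0")  # locked file
version_before_reboot = machine.files["mydll.dll"]
machine.reboot()
version_after_reboot = machine.files["mydll.dll"]
```

Until the reboot happens, every program on the machine keeps seeing the old mydll.dll, which is exactly why the installer nags you to restart.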

    The Past

    This was a much bigger problem in the past, and in previous versions of Windows (prior to Windows XP).  New programs would often install DLLs that they used but that weren't available in the earliest version of the operating system they supported, and would do so to the system directory.  Reboots have become quite a problem, however, so most companies now go to great lengths in order to avoid them by either having private versions of the DLLs in their own directories (which may waste hard disk space, but that's cheap enough these days) or by doing sophisticated detection during runtime to tell a user that, although their program is running fine, certain functionality won't be available until they reboot.

    Having said that, the less adept developers may create programs that still cause this problem.  Types of installations that can still reasonably (more or less) require a reboot:

    • Operating system updates.  Note that even here Microsoft goes to great lengths to try to avoid reboots in patches whenever possible.
    • Driver updates.  Chances are if you're updating your video drivers the old ones are being used and their files are locked.
    • ... Not much else.  If you install a normal piece of software and it tells you that you need to reboot for installation to complete, the odds are that blaming lazy developers is a winning bet.

    The Present

    There are a couple new operating system features that have been created to address these issues, and have the potential to eliminate most of the causes for reboots:

    • System File Protection (SFP)  This is a feature to help security for the Windows user as well as help the reboot problem.  In sum, most of the files that Windows installs are locked down from the start, so nobody except Windows updates can change them.  This helps prevent malicious programs from copying themselves over a Windows executable, for instance, and also reduces the chance of program installations requiring a reboot since they can't even attempt to change those Windows system files.
    • Side-by-Side Installation (WinSxS)  Up above I said "this doesn't work at all with memory-mapped files" when talking about having two different versions of the same DLL coexist at the same time and be used by different programs, but of course there is a way.  Have two different DLLs.  Then you have to solve the problem that when a program says it needs functionality in mydll.dll, the operating system needs to know which one is the right one.  Side-by-side solves this problem by keeping a bunch of information about the program, which versions of those DLLs it depends on, where the different DLLs are located, and which version each one is.  When the program loads, it tells the operating system to load the DLL, and the OS knows the right one to load.  Since the different versions are physically different files, there's no problem.  If you install a new version, it is put into a new file, the old file remains, and there is no need for a reboot.
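The side-by-side lookup can be sketched as a pair of tables. This is only my guess at the shape of the idea, not how WinSxS is actually implemented: SideBySideStore, its methods and the "sxs/..." paths are all invented for the example.

```python
class SideBySideStore:
    """Toy registry of DLL versions and of which version each program wants."""

    def __init__(self):
        self.assemblies = {}  # (dll name, version) -> path of that copy
        self.manifests = {}   # program name -> {dll name: required version}

    def install(self, dll, version):
        """Each version lands in its own file, so nothing existing is overwritten."""
        path = f"sxs/{dll}-{version}/{dll}"  # hypothetical layout
        self.assemblies[(dll, version)] = path
        return path

    def register(self, program, dependencies):
        """Record which version of each DLL a program depends on."""
        self.manifests[program] = dict(dependencies)

    def resolve(self, program, dll):
        """Answer 'which mydll.dll does this program mean?'"""
        version = self.manifests[program][dll]
        return self.assemblies[(dll, version)]


store = SideBySideStore()
store.install("mydll.dll", "1.0")
store.register("myprogram.exe", {"mydll.dll": "1.0"})

# A later installer adds version 2.0 without touching the 1.0 file.
store.install("mydll.dll", "2.0")
store.register("yourprogram.exe", {"mydll.dll": "2.0"})

path_for_mine = store.resolve("myprogram.exe", "mydll.dll")
path_for_yours = store.resolve("yourprogram.exe", "mydll.dll")
```

Because installing 2.0 never writes to the file that myprogram.exe has mapped, there is no lock to fight with and nothing to postpone until boot.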

    The Future

    The ideal, of course, is a reboot-free computing experience.  Expect more features like SFP and WinSxS in future versions of Windows, and even enhancements to those.  I'm not personally aware of what work is being done in this area, but I have no doubt there is some.  This is a well-understood although difficult problem area, and a lot of work has gone into, and will no doubt continue going into, making the experience better and hopefully eventually solving the issue once and for all.
