How a program deals with problems is often the best measure of whether it is truly world class. Failure isn't an aspect of the application that most users deal with on a regular basis (at least, you hope not), but when it does happen it is memorable. How many users have horror stories about the hours of work they lost because of application X? Could that work have been salvaged? Would their view be different if, instead of losing hours, the code had been smart enough to reduce the loss to minutes?
Data Integrity
Reading this heading, I bet most people are thinking about catastrophic failure cases: power loss, processes hanging, processes being killed, processes killing themselves, and so on. How a program handles these situations is very important, but it is only a subset of the topic.
How does your application deal with user mistakes? Take that further: how does it deal with a user who did exactly what they wanted to do, only to find out later that it wasn't the right approach?
Failure Cases
The axiom is: Always fail with more data. When designing the system, you should be figuring out which line of code would be the very worst place for the plug to be pulled on the computer, and making sure you can recover from that spot.
If you're in the middle of an operation, you should be able to recover to the point before it (make the operation atomic). At some point you'll have enough data that you can finish the operation even if it is interrupted. Knowing and designing for these thresholds is critical for a robust application.
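One common way to make a file save atomic is to write the new contents to a temporary file and then swap it into place. The sketch below is only an illustration of that idea, not a complete durability story; the function name `atomic_write` is hypothetical, and it assumes the temporary file and the target live on the same filesystem so the final swap is a single rename.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data`, or leave the old file untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the swap below
    # never has to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the bytes to disk
        # The threshold: before this line the old file is intact,
        # after it the new file is complete.
        os.replace(tmp_path, path)
    except BaseException:
        # Interrupted before the swap: discard the partial temporary file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the plug is pulled anywhere before `os.replace`, the user still has the previous version of the file; the worst line to be interrupted on costs them only the current save, not the document.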
Undo/Redo
A critical aspect of data integrity that is often overlooked is the user's ability to protect their own data, and the application's support of that. I've used programs where I make a few changes, hit Ctrl+Z to undo the last one, and the program undoes a bunch of the changes all at once. I no longer trust that program's undo, which means I don't trust it with my data. If there's a redo, the over-eager undo becomes a recoverable operation, and my data is much safer again.
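To make the "one undo reverts exactly one change" expectation concrete, here is a minimal sketch of an undo/redo history. The names `Change` and `History` are hypothetical, and it assumes every change can describe how to apply and how to revert itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Change:
    """One user-visible change, with how to apply it and how to revert it."""
    apply: Callable[[], object]
    revert: Callable[[], object]


@dataclass
class History:
    undo_stack: List[Change] = field(default_factory=list)
    redo_stack: List[Change] = field(default_factory=list)

    def do(self, change: Change) -> None:
        change.apply()
        self.undo_stack.append(change)
        # A fresh change starts a new path, so the old redo path is dropped.
        self.redo_stack.clear()

    def undo(self) -> bool:
        """Revert exactly one change; return False if there is nothing to undo."""
        if not self.undo_stack:
            return False
        change = self.undo_stack.pop()
        change.revert()
        self.redo_stack.append(change)
        return True

    def redo(self) -> bool:
        """Re-apply exactly one undone change; return False if there is none."""
        if not self.redo_stack:
            return False
        change = self.redo_stack.pop()
        change.apply()
        self.undo_stack.append(change)
        return True
```

Neither stack has a size limit here, which is the 'infinite' part; a real application would have to decide how much history it can afford to keep.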
With 'infinite undo and redo' that actually works, I could try a train of thought and get back to the original point without worrying. Let's go a step further: what if I were able to save those undo/redo paths? Then we get into the wild world of...
Versioning
Proper versioning support is conceptually undo/redo support at a much bigger scope. Instead of being able to undo changes across only the current session of using the application, version history allows users to undo/redo entire sessions. As it becomes more sophisticated, a user can find out not only what change was made, but who made it and when. With change descriptions, the user can also find out why.
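As a rough sketch of what "what, who, when, and why" might look like in data, the hypothetical `VersionHistory` below records each saved version with its author, timestamp, and change description. It assumes whole-document snapshots for simplicity; a real system would more likely store differences.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Version:
    content: str         # what changed (a full snapshot, for simplicity)
    author: str          # who made the change
    timestamp: datetime  # when it was made
    description: str     # why it was made


class VersionHistory:
    def __init__(self) -> None:
        self._versions: List[Version] = []

    def commit(self, content: str, author: str, description: str) -> int:
        """Record a new version and return its index."""
        now = datetime.now(timezone.utc)
        self._versions.append(Version(content, author, now, description))
        return len(self._versions) - 1

    def get(self, index: int) -> Version:
        return self._versions[index]

    def revert_to(self, index: int, author: str) -> int:
        """Undo back to an old version by recording it as a new one.

        Nothing is deleted, so the revert itself can be undone later.
        """
        old = self._versions[index]
        return self.commit(old.content, author, f"Revert to version {index}")
```

Recording a revert as a new version, rather than discarding the later ones, keeps the same guarantee as redo: the user can always get back to where they were.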
Power Management
I think about the role applications have in power management in two ways: how they interact with OS notifications of changes in power state, and what they do to help or hurt battery life and similar power management issues.
Power Management Awareness
If the default operating system mechanisms for supporting power management on your application's behalf aren't enough (and you should know exactly why they aren't before going to any trouble here, see the last section in my article on Future-Proofing for more details on why), the first level of support is to get notified by the OS that power management levels are changing and do the right thing.
Be very judicious in what you do here, however. Users want standby (for instance) to happen as quickly as possible, and they want to come back from it as quickly as possible. If your code is getting between the user and the experience they want (no matter how important you think your application is), you are a burden, not a boon. Do the minimum possible and get out. Heck, this applies to code running during start up, shut down, log in, log off, etc. Get out of the way of the user.
Active Power Management Applications
At this level, the application is paying attention to the power state of the computer in a deeper way and going out of its way to help. The best example of this is the program that notices the user is on battery power and attempts to minimize disk access and heavy computation. Can you imagine a game that has a 'battery power' mode? I like to dream about such things.
Network Management
How well does the application deal with losing a network connection, the machine losing its IP address, computers with multiple network interface cards (and thus multiple IP addresses), wireless roaming where a laptop's IP and subnet can change as the user walks around, or lossy networks? The correct behavior for a networking application is usually so specific to its domain that I won't add any more advice than to think about all of these issues during design, implementation, and testing.
Storing Information
Wherever possible, do not store hardcoded paths. Use relative paths, especially when you can be relative to well known locations (such as Program Files, My Documents, etc.).
Store network locations by name rather than IP address or similar.
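The relative-path advice above can be sketched as storing a location name plus a relative path, and only resolving it against the real folder when the path is needed. The function names and the `"documents"` key below are hypothetical, and how the application discovers each well known folder is deliberately left to the caller.

```python
from pathlib import PurePath
from typing import Dict, Tuple


def to_stored(path: PurePath, bases: Dict[str, PurePath]) -> Tuple[str, str]:
    """Convert an absolute path into (location name, relative path)."""
    for name, base in bases.items():
        try:
            relative = path.relative_to(base)
        except ValueError:
            continue  # the path is not under this location, try the next one
        return name, relative.as_posix()
    raise ValueError(f"{path} is not under any known location")


def resolve(stored: Tuple[str, str], bases: Dict[str, PurePath]) -> PurePath:
    """Rebuild the full path using wherever the location lives right now."""
    name, relative = stored
    return bases[name] / relative
```

Because only the pair is stored, the same saved value keeps working when the well known folder is somewhere else on another machine or after the user relocates it.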
If you have user configuration information, you've got a few decisions to make. File or registry? Explicitly list the default values during installation, or have them programmatically enforced if they aren't specified?
File or Registry
I categorize settings as follows:
- Settings for the current computer - I like these to go into the registry under HKEY_LOCAL_MACHINE (unless the user is not an administrator, in which case you should have a place for these under HKEY_CURRENT_USER so you don't require administrative privileges).
- Settings for the current user on this computer - Since these are specific to the current computer, I believe the right place is HKEY_CURRENT_USER in the registry.
- Settings for the current user, independent of the computer - These can go either in the registry or in a file (XML makes some sense here as long as you know how to write and parse it efficiently) under My Documents. Why there? My Documents is a great folder for users to roam and use with Offline Files support, which lets the user have one preference file for all the computers using that same account. Note that you would then have to implement for yourself the things the registry gives you for free (including fault tolerance support), so weigh your options carefully.
Explicit or Implicit
I'm a big fan of only listing settings that have either been explicitly set by the user or administrator or are different from the defaults. This allows flexibility for future releases of the product, since the question of whether or not to change the user's setting based on a change in the application is pretty straightforward (don't touch the settings, just change the default behavior if there has been nothing explicitly set).
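Here is a minimal sketch of that approach, assuming settings are simple key/value pairs. The `Settings` class and the setting names used with it are hypothetical, and JSON is used only to keep the example short; the same idea applies to the registry or an XML file.

```python
import json
from typing import Any, Dict, Optional


class Settings:
    """Persist only explicitly set values; everything else follows the defaults."""

    def __init__(self, defaults: Dict[str, Any],
                 explicit: Optional[Dict[str, Any]] = None) -> None:
        self._defaults = defaults
        self._explicit = dict(explicit or {})

    def get(self, key: str) -> Any:
        if key in self._explicit:
            return self._explicit[key]
        return self._defaults[key]

    def set(self, key: str, value: Any) -> None:
        self._explicit[key] = value

    def reset(self, key: str) -> None:
        """Forget the explicit value so the setting follows the default again."""
        self._explicit.pop(key, None)

    def to_json(self) -> str:
        # Defaults are never written out, so a later release can change them.
        return json.dumps(self._explicit, sort_keys=True)

    @classmethod
    def from_json(cls, defaults: Dict[str, Any], text: str) -> "Settings":
        return cls(defaults, json.loads(text))
```

When a new release ships different defaults, loading the old file with `Settings.from_json(new_defaults, text)` leaves every explicit choice alone and lets everything else pick up the new behavior.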
Debuggability
Bear with me for a moment, but let's assume briefly that your application is released with bugs. I know, that's crazy talk, but let's just suppose that were the case. One mark of a world class program is the ability to handle this situation. Let's look at a few approaches.
Windows Events
The first question you need to answer when you're thinking about storing information about an error or potential problem is who your audience is. Who can actually take action about the issue? If that audience is the system and/or network administrator, the rich data about the problem should go into Windows Events. This is especially true if you ever have an error message shown to the user that tells them to contact their Network or Computer Administrator.
Logs
I'm not a huge fan of log files, although for some situations they can be incredibly useful. Too many developers feel comfortable with logs and use them as the hammer for which all issues become nails. Use them appropriately: log information that has high value for the effort, and keep an eye on the log to keep it from being abused. Structure your data well, maybe in XML, so that it can easily be parsed and viewed in reasonable ways. You can put all the information you want in these log files, but if it is difficult to find and work with the information needed to figure out what went wrong, it was all pointless.
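As one possible shape for a structured log, the sketch below writes each entry as a single XML element and reads entries back as plain dictionaries. The element and attribute names (`entry`, `field`, `level`, `component`) are assumptions made up for illustration, not an established log format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def format_entry(level: str, component: str, message: str, **fields: str) -> str:
    """Build one log entry as a single XML element string."""
    entry = ET.Element("entry", {
        "time": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "component": component,
        "message": message,
    })
    # Extra details go in child elements so any name is allowed.
    for name, value in fields.items():
        ET.SubElement(entry, "field", {"name": name, "value": str(value)})
    return ET.tostring(entry, encoding="unicode")


def parse_entries(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read entries back as flat dictionaries for filtering or display."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the log file
        element = ET.fromstring(line)
        record = dict(element.attrib)
        for child in element.findall("field"):
            record[child.get("name")] = child.get("value")
        records.append(record)
    return records
```

With entries in this form, pulling out only the errors from one component is a one-line filter over `parse_entries(...)` rather than a text search through free-form messages.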
Useful Error Messages
Most error messages that make it in front of end users have no more value to them than if a dialog came up that only said: "Crap. Good luck." Many are not even that nice, and are worded such that they blame the user for the failure (such as any message that starts with "You have failed to...").
Whenever you show UI to a user there should be a purpose. What action can they take to recover from the problem? And while we're talking about that, if you know enough to tell them how to recover from the problem...
Automatic Error Recovery
Why trouble them at all? Fix it. Make it right. If needed, you can explain to the user what you've done and allow them to undo it, but the whole point of computer applications is productivity. Don't put barriers in front of users who are trying to get things done. Break the barriers down and get out of their way.
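A small sketch of the idea, assuming the failure is a save into a folder that no longer exists: instead of showing an error, the hypothetical `save_with_recovery` below repairs the problem, finishes the save, and returns notes the application could show afterwards.

```python
import os
from typing import List


def save_with_recovery(path: str, text: str) -> List[str]:
    """Save `text` to `path`, fixing known problems instead of reporting them.

    Returns a list of notes describing anything that was repaired.
    """
    notes: List[str] = []
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    except FileNotFoundError:
        # The target folder is missing. We know how to fix that, so fix it
        # rather than asking the user to do it.
        folder = os.path.dirname(os.path.abspath(path))
        os.makedirs(folder, exist_ok=True)
        notes.append(f"Created missing folder {folder}")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    return notes
```

Only the failure the code knows how to repair is handled; anything else still surfaces as an error, which is where a useful error message has to take over.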