Operations Support: The Feature Not On Your Product Roadmap

It never fails. Your company starts a new software project and there’s a laundry list of functionality the business wants to see. There are more features than time, money and resources available, but IT folds under pressure and says yes anyway in the hope that maybe the team can actually pull it off. Under these conditions developers sling code like it’s the wild west in order to meet milestones and deadlines. No one wants to sit through those painful meetings where the business says the project is behind schedule, but it happens anyway. The development team sits there in the status meeting getting grilled, praying for the next bathroom break or lunch (because those meetings typically happen first thing in the morning). They sit there silently steaming about how they were given unrealistic expectations from the start and it’s not fair. Yet they somehow force a sheepish grin, promising to work faster, and they do. On the surface everything initially looks great and stakeholders begin to feel confident that they actually didn’t ask for too many features after all. Chuckling, they tell themselves they knew it wasn’t that hard and how these IT guys always exaggerate the level of effort to get things done…

The Deployment

Fast forward towards the planned go-live and timelines have slipped, deadlines have passed, but the application is finally “ready”. Stakeholders who themselves had uncomfortable meetings with their management about why the project isn’t “on schedule” are now feeling like they will finally get this thing out there and claim victory. Early on some issues happen during deployment, but the team finally overcomes them. The users slowly begin using the system in greater numbers and everything is looking good. Then one day it happens…

Around 10:00 AM you suddenly get an unusual spike in user activity and the servers start screaming, performance hits the floor and users are calling in like it’s a PBS fundraiser hotline. Managers are running around asking people “How come we didn’t see this in testing?” Everyone shrugs their shoulders, but they know the answer. Testing was never the priority because they had to meet the deadlines. IT promises the business they’ll get to the bottom of the issues and sets up their version of a situation room. The email blast goes out for an emergency all-hands-on-deck meeting. All work stops and everyone is corralled into the room working the phones, trying to help users and figure out what is wrong. In the heat of battle the development and operations teams are ripping through code looking for issues. Some bugs are easy to fix while others appear to be happening at random. Fueled by the ongoing issues, the users who fought the hardest against the project begin to hurl fireballs at both management and IT, calling it a disaster. Management announces to the team that these issues need to get fixed ASAP or really bad things are going to happen. The developers pull out the Red Bull and Hot Pockets and go to work…

Postmortem Reality

With all hands on deck the developers are ripping through the code like homicide detectives. Unfortunately they find a severe lack of information about the issues. They see the exceptions, but tracing the code paths that lead to the exceptions is challenging. It’s at this point everyone starts feeling the pain of the development phase of the project. To the dismay of the team there is not a great deal of telemetry and logging information coming from the system. You see, there simply wasn’t enough time to worry about operations support because everyone was so busy making deadlines. Under pressure the first two aspects of software development that go out the window are code quality and operations supportability. No one sitting in a room without a code editor will likely know this is happening, but it just so happens that the two most valuable things that would have helped in this end-of-the-world situation are code quality and logging.
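To make the logging point concrete, here is a minimal sketch in Python of what a little up-front effort can look like. Everything in it is hypothetical and invented for illustration: the "orders" logger name, the `build_logger` and `process_order` functions, and the order fields are assumptions, not part of any real system described here. The idea is simply to record the inputs on the way in and, when something blows up, to log the exception with its stack trace and the same identifying context so the path to the failure can be reconstructed later.

```python
import logging
import sys


def build_logger(stream=sys.stderr):
    """Return a logger that writes timestamped, leveled messages to `stream`.

    The logger name "orders" is a made-up example.
    """
    logger = logging.getLogger("orders")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep output on our handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def process_order(logger, order_id, quantity):
    """Hypothetical business operation that logs its context as it runs."""
    # Record the inputs up front so there is a trail leading to any failure.
    logger.info("processing order_id=%s quantity=%s", order_id, quantity)
    try:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        return quantity
    except ValueError:
        # logger.exception logs at ERROR level and appends the stack trace.
        logger.exception(
            "order failed order_id=%s quantity=%s", order_id, quantity
        )
        raise


if __name__ == "__main__":
    log = build_logger()
    process_order(log, order_id=7, quantity=3)
    try:
        process_order(log, order_id=42, quantity=0)
    except ValueError:
        pass  # the failure is already in the log, with context
```

It is a handful of lines per operation, which is exactly why it gets skipped under deadline pressure and exactly why it is missed in the situation room.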

This is not to oversimplify all the things that need to happen on a project, but just to point out that operations is typically not even a discussion during development. If operations teams were able to teleport back from the future, how many of them would be smiling versus crying? The thing that you have to consider is that the initial development phase of most projects is a mere fraction of the total cost of ownership of a system. If you build for speed of deployment as opposed to maintainability then you’ve just shot yourself with an invisible bullet that you don’t feel yet. You may be developing in agile sprints, but operations is a marathon. Short-term thinking can (will) cause long-term pain. You’ve deployed your system and it’s not feasible to simply do a rewrite, so you keep running with a big hole in your foot.

Operational support is a feature because it requires time and effort to implement. You can think of operations support as an insurance policy that you know you’ll have to use one day. This insurance policy may not pay your entire claim when something goes wrong, but it may save someone’s life/job. It will certainly feel worth it when things do go wrong, and when it comes to IT, something going wrong is a certainty.

Categories: Blog, Project Delivery

