This article does not pretend to be a universal guide on legacy code replacement; its aim is just to showcase how a software development team can solve some legacy code issues without going mad.
Once upon a time, we at 8allocate took over an outsourced project that three different teams had worked on before us. Here’s a brief project description:
Many physicians in UK clinics use voice recorders to dictate patient records. Our Client (under NDA) created an effective voice-to-text solution that lets users create pre-built templates and monitor the entire workflow, from a voice-to-text entry through to task completion. Using this software, physicians can follow the whole business logic and easily integrate their converted records with the hospital EHR and other digital systems.
Some workflow functions can be performed on the iPad, while the desktop app still provides the best experience. A critical requirement is that no patient data can ever be lost, so all data is stored in a MySQL database on the user’s device and then synced with the central server.
This software kit is currently deployed in more than 10 clinics across the United Kingdom, each with its own specifics. The databases contain confidential patient information, so each clinic runs its own central server.
Upgrading the software is tough and cumbersome: we have to connect each clinic to a test server first, test for 3 weeks, and only then deploy to the production server.
What We Got From 3 Previous Software Teams
About 400,000 lines of C# code and a lot of previously used external libraries. Part of the source code was lost. There was a lack of documentation, a lack of proper QA and, as a result, a lack of test data.
Also, there was nobody on the client’s side who could onboard our team, share project insights, or transfer at least some knowledge for our developers to take over the project. The code was written poorly: it didn’t comply with any coding standards and was, all in all, pretty messy.
Other legacy issues we’d inherited included:
- The database had over 120 tables;
- Performance issues (slow load speed, freezing windows);
- Incorrect use of ADO.NET, Dapper, and LINQ.
The app’s architecture looked as follows:
- Desktop Client is the working app for physicians; most of the system’s functions are performed on it;
- Desktop Client Service is a service that all Desktop Clients use;
- Task Scheduler is a system module responsible for moving tasks to the next stage and for other periodic operations;
- iPad App is an iOS app for physicians;
- Web Admin is the web part for setting up workflows, access rights, etc.;
- Web Client is a web app for physicians with a somewhat limited set of features.
Initially, the app could process no more than 5,000 tasks a week (from voice-to-text all the way through synchronization with a hospital server to printing out records for patients).
Current State and Lessons Learned After 18 Months of Coping With Legacy Code
The application’s performance increased 10-12 times, the number of simultaneous users grew 5 times, and the number of tasks processed per week exceeded 30,000.
Now let’s see how we solved major issues while upgrading the client’s legacy code and what lessons we learned from the endeavor.
Lesson #1: how to prove to the client that their application sucks
As you can imagine, the client only sees the tip of the iceberg: the app works somehow, it processes tasks as intended, etc. The client rarely cares about “under the hood” issues that may cost an arm and a leg in the long run.
To better explain the roots of the problem and convince the client that the app left a lot to be desired and needed a major overhaul to stay competitive, we decided to visualize the whole situation along with its issues and bottlenecks. For this purpose, we chose NDepend, a static code analysis tool that checked the code against 150 standard coding rules and found dead code: classes and methods that were never used anywhere.
We then provided a comprehensive speed analysis report, which really impressed the Client and convinced them that the solution had to be revamped for the company to keep its head above water.
Bottom line: NDepend has very good capabilities, such as visually clear and attractive charts and graphs that identify the solution’s bottlenecks and demonstrate them to project stakeholders in a compelling way.
Lesson #2: how to ensure sensitive data confidentiality
The clinics work with real patients, and their data must be kept confidential under UK and EU law. Moreover, there’s a strict requirement that all EHRs be located on the hospital’s central server. Our Client’s voice-to-text solution had to comply with EHR maintenance rules to be used within the hospital. However, we needed a copy of the patient records database for better QA and testing. Due to its confidential nature, the hospitals refused to share the database with us, which came as no surprise.
To solve the issue, we had to build from scratch a dedicated data anonymization app that replaced every letter in the patients’ records with the digit 1, replaced all audio content with arrays of zeros, and replaced all names with generic placeholders like Name 1, Name 2, etc.
It took us 3 days to build and deploy the app, which allowed us to create a live database with anonymized records and avoid breaking any personal data sharing rules and policies.
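The anonymization logic described above can be sketched in a few lines. This is an illustrative Python sketch, not the actual C# tool, and the record field names (`notes`, `audio`, `name`) are hypothetical:

```python
import re

def anonymize_record(record: dict, name_index: int) -> dict:
    """Anonymize one patient record (field names are hypothetical)."""
    return {
        # every letter becomes the digit 1, preserving text length and shape
        "notes": re.sub(r"[A-Za-z]", "1", record["notes"]),
        # the audio payload becomes an array of zeros of the same length
        "audio": bytes(len(record["audio"])),
        # real names become generic placeholders
        "name": f"Name {name_index}",
    }

record = {"name": "John Smith",
          "notes": "Patient reports mild pain.",
          "audio": b"\x12\x34"}
print(anonymize_record(record, 1))
# → {'notes': '1111111 1111111 1111 1111.', 'audio': b'\x00\x00', 'name': 'Name 1'}
```

Keeping the lengths and punctuation of the original fields intact is what makes such a database useful for testing: queries, indexes, and UI layouts behave roughly as they would with real data.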
Bottom line: confidential data can pose a huge impediment for software development teams. Implementing database anonymization can provide short- to mid-term advantages and better predictability.
Lesson #3: how to cope with the code not optimized for unit testing
As the quality of the inherited code was very poor and it had clearly been written with no testing in mind (it used an obsolete version of Entity Framework that was not suited for unit testing at all), we decided to implement integration testing and cover all new features with unit tests.
For integration tests, we created an additional add-on and a database recovery function. The developer launched the add-on before a test run and then ran the data-saving utility. The utility simply walked through all the tables in the database and saved the data as an XML file. After that, the recovery function dynamically created a new database, restored the data, and changed the connection strings so that the tests would point at this new database. For the data restore we generated INSERT statements and executed them via ADO.NET, which made us independent of the rather limited Entity Framework.
As our experience shows, with a properly prepared database, data recovery literally takes just a couple of seconds.
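The dump-and-restore cycle above can be illustrated with a minimal sketch. The team’s actual tooling was C#/ADO.NET against MS SQL; this is a simplified Python/sqlite3 stand-in with a hypothetical `patients` table, showing the same idea of serializing every table to XML and restoring it with plain INSERTs:

```python
import sqlite3
import xml.etree.ElementTree as ET

def dump_table_to_xml(conn, table):
    """Walk a table and serialize its rows as XML (simplified sketch)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    root = ET.Element("table", name=table)
    for row in cur:
        el = ET.SubElement(root, "row")
        for col, val in zip(cols, row):
            el.set(col, str(val))
    return ET.tostring(root, encoding="unicode")

def restore_table_from_xml(conn, xml_text):
    """Re-create rows with plain INSERT statements, bypassing any ORM."""
    root = ET.fromstring(xml_text)
    table = root.get("name")
    for row in root:
        cols = list(row.keys())
        placeholders = ", ".join("?" for _ in cols)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
            [row.get(c) for c in cols],
        )

# demo: dump the data, wipe the table, then restore it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id TEXT, name TEXT)")
conn.execute("INSERT INTO patients VALUES ('1', 'Name 1')")
xml_text = dump_table_to_xml(conn, "patients")
conn.execute("DELETE FROM patients")
restore_table_from_xml(conn, xml_text)
print(conn.execute("SELECT * FROM patients").fetchall())
# → [('1', 'Name 1')]
```

Because the restore path is raw SQL rather than the ORM, it stays fast and immune to whatever mapping quirks the legacy Entity Framework setup had.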
Since the Continuous Integration (CI) pipeline we deployed across the project ran nightly tests, successful database recovery became one of our acceptance criteria.
Besides, our testers used Ranorex, which let them pass command-line arguments and reuse the same database restore mechanism.
Bottom line: with legacy code, “reinventing the wheel” or resorting to kludges is almost inevitable. Restoring a database can actually be very convenient. There’s no need to keep the data dump inside the test project itself; it’s better to archive the data, upload it to Dropbox, and store a link to the archive in the project.
Lesson #4: how to recover the lost part of the iOS code
In addition to the web app, the Client’s solution was meant to be used on the iPad. As it turned out, the iOS source code was lost. After comprehensive research, we managed to restore the code using decompilation (we used the RedGate Developer Bundle, which includes ANTS Profiler). In general, the quality of the decompilation was very good, and the research let us verify the iOS-facing code on the WCF server and restore it in the latest version of the application.
Bottom line: the RedGate Developer Bundle really helps recover lost or damaged code.
Lesson #5: how to identify profiler issues in just 3 minutes
We also encountered a critical system performance problem: after about 40 minutes of use, input delays began. We checked everything we could on the server side, but we didn’t find any issues there.
Fortunately, at that time we decided to try ANTS Profiler, and it turned out to be the right choice. It is really easy to use, although it requires administrative rights to install on a local device. After running the client for 40 minutes under the profiler, finding the problem took just 3 minutes: in one place, a list of abbreviations kept accumulating and was re-checked on every keystroke, which caused the delays.
Other advantages are outstanding visual reporting and the ability to log SQL queries, disk access, network calls, etc.
The disadvantage is that, at the highest level of detail, profiling does not work well with COM objects. After some experimenting, though, we found that procedure-level detail did not change the results, and we have been using this tool for more than a year now.
Lesson #6: how to accelerate database performance
As mentioned above, because the Client works in healthcare, we have a rather lengthy and complex system upgrade process. About half a year ago, when the number of tasks processed grew to 20,000 per week, the Client faced another issue: the system was too slow to process the task queue, which prevented users from following the workflow.
As such, we had to optimize the application without redeploying it on the server. It is logical to assume that if you need to optimize the app without changing the code, you need to optimize the database, and in this case the necessary indexes had already been created.
After a week of thorough analysis, we decided to switch the database to snapshot isolation mode. Unexpectedly, this gave surprisingly good results: the task queue shrank from 400 tasks to just 12.
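For reference, enabling snapshot-based isolation on MS SQL Server is a pair of database-level settings; the database name below is a placeholder:

```sql
-- Allow transactions to request SNAPSHOT isolation explicitly.
ALTER DATABASE ClinicDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- Make plain READ COMMITTED use row versioning instead of shared locks,
-- so readers no longer block writers (and vice versa) on the task queue.
ALTER DATABASE ClinicDb SET READ_COMMITTED_SNAPSHOT ON;
```

The second option is what helps without any code changes: existing READ COMMITTED queries transparently read row versions from tempdb instead of waiting on locks, which is exactly the kind of server-side-only tuning this situation required.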
Bottom line: in addition to creating indexes, don’t forget about MS SQL Server’s other tuning options, such as snapshot isolation.
These are just six of the many lessons we learned from working with a legacy project inherited from 3 previous teams. Our predecessors left the code in a very poor condition and failed to test it, optimize its performance, or follow any coding standards, which eventually made the voice-to-text app for physicians glitchy, slow, and unable to meet the sophisticated requirements of modern data protection and EHR management policies.
Through kludges and team brainstorming, we managed to significantly speed up the app, access the EHR database via anonymization, recover the lost iOS code, bring the app back to the iPad, and greatly increase the number of simultaneous sessions within the app.