Rewriting Software







by David Barth
written May 31, 2007

The Case for Rewriting Software in the Original Language


A Less Expensive Approach to Software Renewal

Many of the software systems that were written more than twenty years ago, often referred to as "legacy systems," are in need of being replaced. This paper discusses some of the justification factors for rewriting a system, and then discusses possible methodologies, in particular, the case for rewriting the software system using its existing language instead of converting to another language.


JUSTIFICATION FACTORS FOR REWRITES


Eliminate obsolete, unused code
Migrate to a modern language
Increase system functionality
Increase system maintainability
Achieve a higher level of Capability Maturity Model (CMM)
Interface with newer technologies
Provide Web-enabled functionality
Eliminate software architectures originally needed to address hardware restrictions

Each of these reasons may be important enough, by itself, to justify a rewrite. A general summary of each factor follows.


Justification to rewrite: Eliminate obsolete, unused logic


As a software system evolves, there are usually functions that are no longer needed, and the system has to be modified to remove them. Programmers usually leave the code for these unused functions in the programs, in case a function is resurrected in the future, and simply route the logic path around the unneeded program instructions. The result is what is known as "dead code."

Leaving the unused, dead code in place is a "safety net" for the programming staff because there are times when a function or requirement is reinstated shortly after being deemed unnecessary. If the code has been removed, it requires more time to re-invent and re-deploy it than if it were simply left in place and the logic routed around it.

The problem with this approach is that this code is rarely removed in the future, and the aggregate knowledge of the programmers about exactly what code became obsolete, and why, is lost. As time goes by, programmers become less willing to remove unused code because doing so may upset the operation of the system.

Most languages have a tool that can identify unused code, but such tools are rarely used to clean up a system because the code doesn't create an acute problem and management rarely sees the cost of acquiring and using the tool as justified by the benefit. Unused code related to a deleted function may often span several programs. The location of dead code is rarely documented well enough to allow it to be safely removed later.
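To make the idea of an unused-code tool concrete, here is a minimal sketch in Python. It is not how any particular commercial tool works; it is a simple name-based heuristic written for illustration. The function name find_unreferenced_functions and the sample billing routines are hypothetical. It only examines one program at a time, so it would miss exactly the cross-program dead code described above, and it cannot see calls made dynamically.

```python
import ast


def find_unreferenced_functions(source: str) -> list[str]:
    """Return names of functions defined in `source` but never referenced in it.

    Heuristic only: a function counts as "referenced" if its name appears
    anywhere as a plain name or an attribute access in the same source text.
    """
    tree = ast.parse(source)
    defined = set()
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)
    return sorted(defined - referenced)


# Hypothetical program in which one routine has been "routed around".
SAMPLE = '''
def compute_tax(amount):
    return amount * 0.07

def compute_legacy_surcharge(amount):
    # Nothing calls this any more, but it was left in place "just in case".
    return amount * 0.02

def invoice_total(amount):
    return amount + compute_tax(amount)

print(invoice_total(100))
'''

unused = find_unreferenced_functions(SAMPLE)
print(unused)
```

A report like this is only a starting point. Someone who knows the system still has to confirm that each reported routine is truly obsolete before it is removed.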

Dead code exists because the methods that programmers use to modify a software system are usually left up to them, and management ignores the long-term effects of how they solve a problem at the code level. The code itself is rarely viewed by anyone besides the programmers who write and maintain it.

As a system evolves, it often becomes more difficult to maintain, and unused code is one of the culprits. Unused code can increase the size of a system by 20 percent or more, depending upon the rate of change to system functionality and the longevity of the system.

The decision to rewrite usually isn't based solely on the existence of unused code. More often it is made because maintenance has passed some threshold of difficulty, increasing the cost of supporting the system. Often, the solution that management chooses is to rewrite the system from scratch, using a newer language that management perceives to be more capable.


Justification to rewrite: Migrate to a modern language


Legacy systems are usually written in an older language that has lost favor with management, is no longer taught by schools, and is not understood by recent programming school graduates. Examples of older languages include Cobol, Fortran, Pascal, Jovial, and Basic. Examples of newer languages are C, C++, Java, and various derivatives of them including C#, JavaScript, and JavaBeans. (For a list of languages and links to additional information on them, see Appendix A.)

The problem with older languages is not that the language has lost the capabilities it had when it was new, and it is not that it cannot be made to be as capable as a newer language. Two problems that haunt older languages are: 1) the lack of a support organization that ensures the language is improved, and 2) the existence of new languages that are touted to be "better" or more "modern" than the old languages.

For these two reasons, educational institutions may elect not to teach older languages. Creators of a new language tout it as though it were much better than existing languages, and corporate decision-makers, who may not possess sufficient technical expertise, grab on to the new language and its associated "buzz words" as the solution to existing problems with the current software.

The outcome is that the corporate decision-makers may decide that the company's software applications should be rewritten in the new language. Usually, the rewrite is started with a "clean slate" approach in which subject matter experts (SMEs) in the existing software system are recruited to join the rewrite effort. This pulls knowledgeable staff away from the existing system, which exacerbates the problems and costs of maintaining it.

Choosing a language should be based on two factors: 1) that its instruction set can provide the necessary functionality; and 2) that it is easy to learn and write. Ease of writing matters because, throughout the life of a system, from cradle to grave, the cost of the system is directly attributable to the ease of maintaining the software. A language that is easy to learn will never lack programmers, because it can be easily taught to anyone with programming aptitude and desire.

That different languages provide different computer capabilities is, basically, false. All computers operate using a specific, proprietary, "low-level" machine language. The only thing that a "high-level" compiled or interpreted language does is to convert the code written by a programmer into the language that the machine can understand. The one variable is that a high-level language might not be capable of generating the appropriate machine language code. This problem can be solved by introducing new constructs to the high-level language so that it can accomplish everything that the computer is capable of doing at the machine language level.

Adding functionality to a language usually requires an organization that oversees and improves the language. A case in point is Cobol, created in the 1960s and promoted by the U.S. Navy through the efforts of Commander Grace Hopper. Since its inception, Cobol has gone through several major improvements, including the addition of object-oriented constructs, string manipulation capabilities, and graphical user interface (GUI) features. It has also been made compatible with Microsoft .NET, which means that it can be used to create Web-based applications. In essence, the .NET version of Cobol is every bit as functionally capable as newer languages such as C++ and Java.


Justification to rewrite: Increase system functionality


Sometimes the decision to rewrite a system is made to meet the requirement of increasing its capabilities. This may be a logical approach if the desired capabilities are much different from the existing capabilities. However, most system requirements don't change catastrophically over time. Most new requirements are adjuncts to the existing functionality.


Justification to rewrite: Increase system maintainability


Older systems that have undergone many changes and have not been well documented throughout their life cycle are candidates for a rewrite on maintainability grounds. A rewrite that uses CMM techniques, for example (see Appendix B for a CMM overview), can improve system documentation and maintenance to the extent that the system's longevity is extended. Better documentation can reduce maintenance overhead because it makes it easy to find out how the software currently operates. This can translate into easier, faster maintenance of the system.


Justification to rewrite: Achieve a higher level of Capability Maturity Model (CMM)


When a system has become badly fractured by ad hoc changes and maintenance procedures, a rewrite using CMM procedures may produce a system that is easier to write and maintain. However, an organization should develop and refine its CMM expertise before it begins a major project.


Justification to rewrite: Interface with newer technologies


The advent of new technologies has created the need for many software systems to interface with them. Examples include Global Positioning Systems (GPS), cellular telephones, satellite telephones, cable television, satellite television, satellite radio, operation monitors in vehicles, airport security systems, etc.


Justification to rewrite: Provide Web-enabled functionality


The advent of the World Wide Web has created the need for software systems that can use it. The reason to rewrite a software system so that it is web-enabled can be valid if the existing language and system cannot be altered to provide that capability.


Justification to rewrite: Eliminate software architectures originally needed to address hardware restrictions


Some legacy systems had to be designed around hardware restrictions such as a limit on the amount of memory that could be installed, a limit on the amount of Direct Access Storage Device (DASD) capacity that could be installed on a hardware system, a size limit for programs, and limited processor speed.

Computers have leaped ahead in their capabilities, and if the work-arounds that accommodated these limitations are no longer necessary, a rewrite could be justified. For example, if memory was too limited, the software might have been divided into smaller programs. In at least one case, a "traffic control" program was written to manage multiple calls of subprograms. In other cases, large files or tables had to be subdivided and portions off-loaded to auxiliary storage due to storage capacity limitations. Very large programs had to be split into smaller functional segments, but on today's hardware, program size is rarely an issue.


REWRITE IN THE EXISTING LANGUAGE OR A NEW ONE?


This question is often answered by non-technical executives who have latched onto a buzz word they have heard or read in the media. For example, "object-oriented language," "Java," and "C++" are considered by some to be the methodologies that will solve IT software problems.

In the year 2000, nearly a quarter of all commercial projects in the U.S. were canceled, at an estimated cost of $67 billion. The failure of large projects and the significant cost overruns for many of those that are completed indicate that new ways of programming are needed.

An example of the cancellation of a large commercial software project is the rewrite of Qwest's customer billing system, canceled in October 2000. On the government side, the Bureau of Land Management's ALMRS software project was canceled around 1999. These are just two of many high-dollar projects that were terminated after more than $100 million had been spent on them.


THE PROBLEM


The complexity of large software applications has made new development difficult to complete. Maintenance of large applications is challenging. A Carnegie Mellon University study showed that programmers make 100 to 150 mistakes per 1,000 lines of code that they write.
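To give a sense of scale, the mistake rate quoted above can be applied to a system of a given size. The short sketch below does the arithmetic. The 500,000-line system is a hypothetical figure chosen for illustration, not one taken from the study, and the result counts mistakes made while writing the code, not defects that remain after testing.

```python
def expected_mistakes(lines_of_code: int,
                      low_per_1000: int = 100,
                      high_per_1000: int = 150) -> tuple[int, int]:
    """Apply the quoted range of 100 to 150 mistakes per 1,000 lines written."""
    thousands = lines_of_code / 1000
    return round(thousands * low_per_1000), round(thousands * high_per_1000)


# Hypothetical system size, for illustration only.
low, high = expected_mistakes(500_000)
print(low, high)  # 50000 75000
```

At that rate, a rewrite of a system of this size would require tens of thousands of mistakes to be found and corrected before the new system matched the behavior of the old one.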


PROPOSED SOLUTIONS


Research is under way to overcome the problems involved in creating and maintaining complicated programs. The approaches addressed here involve modeling very large applications. One researcher is attempting to develop a way to automatically generate program code from a model.


UNIFIED MODELING LANGUAGE


One approach to improving the creation of program code is the use of the Unified Modeling Language (UML), which originated at Rational Software (now part of IBM) and is maintained as a standard by the Object Management Group. This modeling language is used by programmers to model an application based on the user's needs. After the model has been finalized, programmers can write code to reflect the model. This approach still requires the programmers to code the programs, but it helps programmers and users to understand and agree on the requirements and functionality of the proposed system.

Another idea is a proposal to create a model of the large application before programming is started, then use intelligent software tools to convert the model directly to program code.


MODELING EXISTING PROGRAMS


James Gosling, a research fellow at Sun Labs and the inventor of Java, is working on a project called Jackpot that will provide the ability to feed an existing program into a modeling tool to create a graphic representation of it.


TOOLS TO CONVERT A MODEL TO A PROGRAM


Intentional Software is a company run by Charles Simonyi, a former researcher at the Xerox Palo Alto Research Center (PARC) during the 1970s who later became chief architect at Microsoft. The company is trying to automate the code-writing portion of the software development process in order to reduce bugs.

Simonyi's proposed solution is to create programming tools to assist writing bug-free programs and make the code look like the design. The idea is for a programmer to convert application requirements into a chart format from which powerful tools create the code. After the code has been created, the user community could modify the code without the help of a programmer. Programmers would become program designers. The detailed programming work would be accomplished by the software tools.


AUTONOMIC COMPUTING


Autonomic research is looking for ways to create "self-healing" software that fixes itself when a problem occurs. For example, when some poorly written programs are run several times, memory leaks can cause memory to be used up so that nothing can run. The usual solution is to reboot the computer. A goal of self-healing software would be to detect the memory problem and reboot the system automatically.
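The memory-leak example above can be sketched as a small watchdog. This is an illustration of the idea only, not a description of any real autonomic product. The class name MemoryWatchdog, the 90 percent threshold, and the rule of three consecutive high readings are all assumptions made for the sketch. The functions that read memory usage and restart the system are supplied by the caller, so the example uses simulated readings instead of touching a real machine.

```python
from typing import Callable


class MemoryWatchdog:
    """Trigger a restart after several consecutive high memory readings."""

    def __init__(self,
                 read_used_fraction: Callable[[], float],
                 restart: Callable[[], None],
                 threshold: float = 0.9,
                 consecutive: int = 3) -> None:
        self._read = read_used_fraction
        self._restart = restart
        self._threshold = threshold
        self._consecutive = consecutive
        self._strikes = 0

    def check(self) -> bool:
        """Take one reading; return True if a restart was triggered."""
        if self._read() >= self._threshold:
            self._strikes += 1
        else:
            # A healthy reading clears the count, so brief spikes are ignored.
            self._strikes = 0
        if self._strikes >= self._consecutive:
            self._restart()
            self._strikes = 0
            return True
        return False


# Simulated readings: memory use climbs as a leaky program keeps running.
readings = iter([0.50, 0.92, 0.95, 0.97])
restarts = []
watchdog = MemoryWatchdog(lambda: next(readings),
                          lambda: restarts.append("restart"))
results = [watchdog.check() for _ in range(4)]
print(results)   # [False, False, False, True]
print(restarts)  # ['restart']
```

Requiring several high readings in a row is a design choice that keeps a momentary spike from causing an unnecessary restart.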

Microsoft's Windows XP already has some autonomic concepts in it. It stores models of its original configuration so that if a program becomes corrupted, it can be restored without having to reboot the computer.


EXTREME PROGRAMMING


This concept is designed to eliminate problems that programmers face when developing code and to shorten development cycles. The rules for extreme programming follow.


Shorten development cycles


Break long development cycles into shorter time spans such as two or three weeks. This gives the programmer the satisfaction of meeting multiple short-term goals instead of having to wait more than a year or two to reach the final goal. Shorter cycles also allow clients to provide changes to the program design at the beginning of each cycle. The changes are then made part of the goal for the new cycle.


Do the simplest thing that could possibly work


This rule says that if a program is so complicated that it would be difficult to modify later, it is too complex and should be broken down into two or more smaller programs.


Avoid the hype of buzzwords


This caution applies more to non-technical executives who make software decisions than to technical experts, who are not bowled over by the hype of buzzwords such as "Object Oriented."


Implement new features only when you need them


Programmers sometimes add code for features that might be needed in the future. However, because this code may not be needed, because it requires additional time to write and test, and because it obfuscates the rest of the code, it should not be written until it is actually needed.


Programmers work in pairs


By working in pairs, sharing one workstation, programmers can catch each other's mistakes, review each other's code, and exchange knowledge during the coding process.


Customer interaction


Being able to get information from the customer is essential during code writing. A customer who can answer questions and prioritize work should be on site and available.


Sustainable work pace


The schedule should be designed to permit programmers to work regular hours. Overtime should not be allowed unless it is absolutely necessary. "Death march" projects do not produce quality software.


JUST-IN-TIME PROGRAMMING


This concept allows a user of the software to make a change to the software model, which triggers the regeneration of the code from the model, with the new changes in place for immediate use.
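A toy version of this idea, and of the model-to-code tools discussed earlier, can be sketched in a few lines. This is only an illustration under simplifying assumptions, not how Intentional Software or any real tool works. The model here is a plain dictionary, the functions generate_class_source and build_class are hypothetical names, and the field defaults are assumed to be simple literal values.

```python
def generate_class_source(model: dict) -> str:
    """Render a model {'name': str, 'fields': {field: default}} as class source."""
    params = ["self"] + [f"{field}={default!r}"
                         for field, default in model["fields"].items()]
    lines = [f"class {model['name']}:",
             f"    def __init__({', '.join(params)}):"]
    if model["fields"]:
        lines += [f"        self.{field} = {field}" for field in model["fields"]]
    else:
        lines.append("        pass")
    return "\n".join(lines) + "\n"


def build_class(model: dict) -> type:
    """Regenerate the code from the model and load it for immediate use."""
    namespace: dict = {}
    exec(generate_class_source(model), namespace)
    return namespace[model["name"]]


# Hypothetical model of a customer record.
model = {"name": "Customer", "fields": {"name": "", "balance": 0.0}}
Customer = build_class(model)
before = hasattr(Customer(), "region")

# A user changes the model; the code is regenerated from it.
model["fields"]["region"] = "US"
Customer = build_class(model)
after = Customer().region

print(before, after)  # False US
print(generate_class_source(model))
```

The point of the sketch is that the model, not the generated code, is the thing being edited. The code is thrown away and rebuilt every time the model changes.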


INFORMATION SOURCE


The source of this information is the November 2003 issue of "Technology Review," personal experience, and discussions with peers.


List of some computer languages:


ABC
Ada
ADL
Aleph
Algol 60
Algol 68
APL
AppleScript
ASP
Assembly
Awk
BASIC
Befunge
BETA
Bigwig
Bistro
Blue
Brainfuck
C
C++
C-sharp
Caml
Cecil
CHILL
Clarion
Clean
Clipper
CLU
Cobol
CobolScript
Cocoa
Comparison and Review
Compiled
Component Pascal
Concurrent
Constraint
Curl
D
Database
Dataflow
Declarative
Delphi
Directories
DOS Batch
Dylan
E
Education
Eiffel
ElastiC
Erlang
Euphoria
Forth
Fortran
FP
Frontier
Functional
Garbage Collected
Goedel
Hardware Description
Haskell
History
HTML
HTMLScript
HyperCard
ICI
Icon
IDL
Imperative
Intercal
Interface
Interpreted
Io
Java
JavaScript
LabVIEW
Lagoona
Language-OS Hybrids
Leda
Limbo
Lisp
Logic-based
Logo
Lua
m4
Markup
MATLAB
Mercury
Miranda
Miva
ML
Modula-2
Modula-3
Moto
Multiparadigm
Mumps
NET
Oberon
Obfuscated
Object-Oriented
Objective-C
Objective Caml
Obliq
Occam
Open Source
Oz
Parallel
Pascal
Perl
PHP
Pike
PL
PL-SQL
Pliant
POP-11
Postscript
PowerBuilder
Procedural
Prograph
Prolog
Proteus
Prototype-based
Python
REBOL
Reflective
Regular Expressions
Rexx
Rigal
RPG
Ruby
S-Lang
SAS
Sather
Scheme
Scripting
Self
SETL
SGML
Simkin
Simula
Sisal
Smalltalk
Snobol
Specification
SQL
Squeak
T3X
Tcl-Tk
Tempo
TOM
TRAC
Turing
UML
VBA
VBScript
Visual
Visual Basic
Visual DialogScript
Visual FoxPro
Water
Wirth
XML
XOTcl
YAFL
Yorick
Z