Project Management



Figure D-8. CCPDS-R first demonstration activities and schedule

IPDR Demonstration Scope

The basic scope of the IPDR demonstration was defined in the CCPDS-R statement of work:

The contractor shall demonstrate the following capabilities at the NORAD Demo 1: system services, system initialization, system failover and recovery, system reconfiguration, test message injection, and data logging.

These capabilities were fairly well understood by both the customer and TRW. They represented the key components and use cases necessary to meet the demonstration's objectives.

1. System services were the NAS software components of general utility to be reused across all three subsystems. These components were the foundation of the architectural infrastructure. They included the interprocess communications services, generic applications control (generic task and process executives), NAS utilities (list routines, name services, string services), and common error reporting and monitoring services. These components were all building blocks needed to demonstrate any executable thread.

2. Data logging (SSV CSCI) was a capability needed to instrument some of the results of the demonstration and was a performance concern.

3. Test message injection (TAS CSCI) components permitted messages to be injected into any object in the system so that there was a general test driver capability.

4. System initialization was the fundamental use case (called phase 1 in Figure D-8) that would illustrate the existence of a consistent software architecture skeleton and error-free operation of a substantial set of the system services. One of the perceived performance risks was the requirement to initialize a large distributed software architecture, including both custom and commercial components, within a given time.

5. The second scenario (phase 2) was to inject the peak message traffic load into the architecture and cause all the internal message traffic to cascade through the system in a realistic way. Executing this scenario required that all the software objects be "modeled" with smart but simple message processing stubs. These simple Ada programs completed the thread with dummy message traffic by reading and writing messages as expected under a peak load. Prototype message processing software was constructed to accept incoming messages and forward them through the strings of components that made up the SAS. This included all significant expected traffic, from receipt of external sensor messages through missile warning display updates, across both primary and backup threads. It also included all overhead traffic associated with status monitoring, error reporting, performance monitoring, and data logging.

6. System failover and recovery (phase 3) was one of the riskiest scenarios, because it required a very sophisticated set of state management and state transition control interfaces to be executed across a logical network of hundreds of software objects. The basic operation of this use case was to inject a simulated fault into a primary thread operational object to exercise the following sequence of events: fault detection, fault notification, orchestrated state transition from primary thread to backup thread, shutdown of primary thread. All these network state transitions needed to occur without interruption of service to the missile warning operators. Reconfiguration, in this specific case, meant recovering from a degraded mode. Following the system failover defined above, a new backup thread would be initialized so that there was minimum exposure to single-point failures. In the delivered system, repair immediately followed failover.
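The failover and recovery sequence in item 6 can be pictured as a small state machine: detect the fault, notify, switch the backup thread to primary, shut down the failed thread, and later reinitialize it as a new backup. The sketch below is an illustrative Python model under assumed names (Role, Thread, FailoverController are all hypothetical); the actual CCPDS-R mechanism was implemented in Ada across hundreds of distributed objects.

```python
from enum import Enum

class Role(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"
    DOWN = "down"

class Thread:
    """A redundant processing thread (a chain of software objects)."""
    def __init__(self, name, role):
        self.name = name
        self.role = role

class FailoverController:
    """Orchestrates fault detection, notification, switchover, and repair."""
    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.events = []  # audit trail of network state transitions

    def inject_fault(self):
        # 1. Fault detection and notification
        self.events.append(("fault_detected", self.primary.name))
        # 2. Orchestrated state transition: backup thread assumes primary role,
        #    so service to the operators is never interrupted
        self.backup.role = Role.PRIMARY
        self.events.append(("switchover", self.backup.name))
        # 3. Shutdown of the failed primary thread
        self.primary.role = Role.DOWN
        self.events.append(("shutdown", self.primary.name))

    def repair(self):
        # Reinitialize the failed thread as a new backup, minimizing
        # exposure to single-point failures (repair followed failover
        # immediately in the delivered system).
        self.primary.role = Role.BACKUP
        self.events.append(("reinitialized_as_backup", self.primary.name))

# Usage: simulate a fault in the primary thread, then repair
a = Thread("thread-A", Role.PRIMARY)
b = Thread("thread-B", Role.BACKUP)
ctl = FailoverController(a, b)
ctl.inject_fault()
ctl.repair()
assert b.role is Role.PRIMARY and a.role is Role.BACKUP
```

The ordered event trail mirrors the evaluation criteria below: the data logs had to show exactly the expected state transitions, with no errors other than the injected fault.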

IPDR Demonstration Evaluation Criteria

The essential IPDR evaluation criteria were derived from the requirements, the risk assessments, and the evolving design trade-offs:

• No critical errors shall occur.

• The system shall initialize itself in less than 10 minutes.

• The system shall be initialized from a single terminal.

• After initialization is complete, the number of processes, tasks, and sockets shall match exactly the expected numbers in the then-current SAS baseline.

• Averaged over the worst-case minute of the 20-minute peak scenario, the total processor utilization for each node shall be less than 30%.

• There shall be no error reports of duplicate or lost messages.

• All displayed data shall be received within 1 second from its injection time.

• The message injection process shall maintain an injection rate matching the intended scenario rate.

• The data logs shall show no unexpected state transitions or error reports and shall log all injected messages.

• The operator shall be capable of injecting a fault into any object.

• An error report shall be received within 2 seconds of the injection of a fault.

• The switchover from the primary to backup thread shall be completed within 2 seconds of the fault injection with no loss of data.

• The shutdown of the failed primary thread and reinitialization as a new backup thread shall be completed in less than 5 minutes from failure.

• The data logs shall match the expected state transitions with no fatal errors reported other than the injected fault.

There were 23 other evaluation criteria that provided visibility into less important detailed capabilities and intermediate results. They are not listed here because they would require much more explanation.
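Several of the essential criteria above are directly checkable against demonstration logs. The sketch below shows, in Python with a hypothetical log format (the `demo` dictionary and its keys are assumptions, not CCPDS-R artifacts), how three of them might be expressed as objective pass/fail checks: exact process counts against the SAS baseline, worst-case-minute processor utilization under 30%, and display latency under 1 second.

```python
def worst_minute_utilization(samples):
    """Average CPU utilization (%) over the worst 60-second window.

    Assumes one sample per second and at least 60 samples
    (the peak scenario ran for 20 minutes).
    """
    worst = 0.0
    for i in range(len(samples) - 59):
        window = samples[i:i + 60]
        worst = max(worst, sum(window) / 60.0)
    return worst

def check_criteria(demo):
    """Return a list of criterion failures; empty list means all passed."""
    failures = []
    # After initialization, counts must exactly match the SAS baseline.
    if demo["process_count"] != demo["baseline_process_count"]:
        failures.append("process count mismatch")
    # Worst-case-minute utilization must be under 30% on every node.
    for node, samples in demo["cpu_samples"].items():
        if worst_minute_utilization(samples) >= 30.0:
            failures.append(f"{node}: utilization over 30%")
    # All displayed data received within 1 second of injection.
    if any(lat > 1.0 for lat in demo["display_latencies"]):
        failures.append("display latency over 1 second")
    return failures
```

Framing the criteria this way keeps them explicit and observable, which is exactly the property TRW later defended when the government asked that they be tied more closely to the requirements documents.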

IPDR Demonstration Results

The results of the IPDR demonstration were fruitful. Of the 37 evaluation criteria, 31 were considered satisfactory. Six criteria were not met, including three of the essential criteria just discussed. These were considered very serious issues that required immediate redesign and re-demonstration. Of most concern was excessive processor utilization during the peak load scenario. While the threshold was 30%, actual utilization was 54%. This corresponded to the essential overhead of the architectural infrastructure, operating system, and networking software. Because this was always a perceived risk of the CCPDS-R reusable middleware design, it received extensive attention. Five distinct action items for performance analysis were created, as well as an action item to demonstrate the performance improvement at the next project management review after the five action items were resolved.

Greatly simplified, the five action items were as follows:

1. Update the scenario. The actual test scenario used as the peak load was in fact about 33% worse than the real peak load. The internal message traffic threads were worse than the true worst case (for example, each message caused an "alarm" that resulted in redundant and unnecessary message traffic). The IPDR demonstration forced TRW, the customer, and the user to converge on a better understanding of the real worst-case mission scenario in tangible and objective terms. It also forced the architecture team to understand better the message traffic patterns and the optimization tradeoffs. The return on investment realized from this activity was never quantified, but it was certainly enormous.

2. Tune the interprocess communications (IPC) buffering parameters. The NAS components had many options for optimizing performance. Even though numerous local optimizations were made over the final month of integration activities, there was a definite need for a more global analysis to take advantage of lessons learned in exploiting the patterns of message traffic.

3. Enhance network transactions. The node-to-node message traffic was an obvious bottleneck because the current version of the operating system (DEC VMS 4.7) did not exploit the symmetric multiprocessing capability of the VAX processors. The pending upgrade to VMS 5.0 would provide a substantial increase to this component of the overall performance.

4. Improve performance of the IPC component. An obvious bottleneck in the NAS interprocess communications component had an impact on one of the performance optimization features. The demonstration team identified this as a design flaw that needed resolution. (A prototype solution was already in progress.)

5. Improve reliability in the IPC component. The IPDR demonstration exposed another serious design flaw: Erroneous behavior could occur under a very intense burst of messages. The overly stressful scenario made this flaw obvious. In a system with the stringent reliability requirements of CCPDS-R, it had to be fixed, even though it might never occur in operation. Although fixing this sort of problem was mildly painful at the time, it could have caused malignant breakage and immense scrap and rework if the flaw had gone undetected until late in the project.

The five action items accurately represented the critical issues that were still unresolved at the time of the demonstration. There was tremendous anxiety on the part of TRW management and the customer; both had expected the demonstration to conclude with no open issues. Nevertheless, both parties were pleased with the demonstration process and the unprecedented insight they had achieved into the true design progress, design trade-offs, requirements understanding, and risk assessment. The overall anxiety of the stakeholders was significantly relieved after the closure of the action items and the re-demonstration that occurred about one month after the IPDR demonstration. While the original objective of 30% processor utilization still had not been achieved, the team had demonstrated the flexibility of the architecture and the opportunities for optimization, and succeeded in reducing the overall utilization from 54% to 35%. This positive trend was sufficient for everyone to feel comfortable that the performance requirement would ultimately be met through straightforward engineering optimizations and operating system upgrades.

These were the visible and formal results of the IPDR demonstration. As the responsible manager for the process, the architecture, and this demonstration, I also observed many intangible results. Over a period of 8 weeks of late-night integration and debug sessions—during which priorities were coordinated, design issues were resolved, workarounds were brainstormed, stakeholders were placated with ongoing status reports, and the engineering teams were motivated toward an ambitious objective—many lessons were learned:

1. Very effective design review was occurring throughout the period. The demonstration was the result of the engineering team's review, presented to the stakeholders as tangible evidence of progress. Although we ended up with only five open issues, 50 or more design issues had been opened, resolved, and closed during the 8-week integration activity. This early resolution of defects—in the requirements specification, the process, the tools, and the design—had undocumented but extensive return on investment by avoiding a tremendous amount of late downstream breakage that could have occurred had we not resolved these issues in this early demonstration.

2. Through day-to-day participation in this activity, I gained detailed insight into where the design was weak, where it was robust, and why. For example, when we uncovered issues in some components, the responsible designer delivered a resolution within hours. In other components, there was recurring resistance and resolutions frequently took days. By the time the demonstration activity concluded, I knew very well where change was easy (usually indicating well-designed components) and where it was difficult (for numerous reasons). These lessons helped in structuring the risk profile for future planning, personnel allocation, and test priorities.

3. The demonstration served as a strong team-building exercise in which there was a very tangible goal and the engineers were working in the forum they preferred: getting stuff to work.

4. The detailed technical understanding and objective discussions of design trade-offs proved invaluable to developing a trustworthy relationship with all stakeholders, including the customer, the user, and TRW management. We were armed with facts and figures, not subjective speculation.

Government Response to the IPDR Demonstration

The formal IPDR demonstration represented a major paradigm shift from conventional design reviews. Consequently, there was a fair amount of tension and anxiety between TRW and the Air Force in converging on detailed evaluation criteria for the demonstration. The following paragraphs, with quotations presented in italics, were extracted verbatim from the final plan TRW submitted. This is a good summary of some of the concerns likely to show up when an organization takes on this process for the first time. It also provides insight into the spirit of the demonstration.

After careful evaluation of the Government's Preliminary Demo 1 Plan comments, the following observations summarize this submittal of the Demo 1 Plan and the modifications that have been made from the previous version:

1. This submittal has eliminated all requirements references to avoid making any allusion to an intent of satisfying, proving, or demonstrating any requirements. These requirements verification activities are performed by the test organization in a very rigorous and traceable fashion. The demonstration activity is intended to be an engineering-intensive activity, streamlined through minimal documentation, to provide early insight into the design feasibility and progress. TRW intends to maximize the usefulness of the demonstration as an engineering activity and to avoid turning it into a less useful documentation-intensive effort.

2. Several government comments requested further details on requirements, designs, etc. This information is not necessary in the Demo Plan. It is redundant with other documents (SRS, SDD, design walkthrough packages) or it is provided in the informal test procedures delivered 2 weeks prior to the demonstration. Providing more information in a single document (and in every document) may make the reviewer's job easier but it would also be excessive, more time-consuming, and counterproductive to produce, thereby reducing the technical content of the engineering product being reviewed.

3. In light of the government's concern over the relationship of the demonstration to the requirements, the evaluation criteria provided in this plan should be carefully scrutinized. We feel that the evaluation criteria are explicit, observable, and insightful with respect to determining design feasibility, especially at such an early point in the life cycle. Although we are open to constructive modification of these evaluation criteria, we feel that modifying them to relate more closely to the System Specification or SRS requirements would be inappropriate. The requirements perspective and our demonstration perspective are different and difficult to relate.

4. The source code for the components being demonstrated has not been delivered with the plan as required in the statement of work. The total volume for the demonstrated components is roughly 1 to 2 feet thick, and it is still changing at a rapid rate. Instead of delivering all the source code, interested reviewers may request specific components for review. All source code will be browseable at the contractor facility during the demonstration.

As mentioned before, the government's overall response to the IPDR demonstration was very positive, although the five critical action items were an unexpected outcome and initially caused intense concern. After TRW demonstrated resolution of these action items one month later, the government response was overwhelmingly positive. The objective insight, open discussion of trade-offs, and understandability of the design issues, requirements issues, and performance issues resulted in exceptional relationships among the stakeholders. The customer and the user representatives requested encore demonstrations to their upper management, and there was a sense of success among stakeholders in which they could all take ownership. This event proved to be very important: From this point on, everyone wanted to maintain the project's reputation as a flagship example of how to do software right.
