SAIC Advice to PCO—Technical Review—Feb 2007

A PDF version of this document is also available (PDF version, 269KB).

Timothy Arnold-Moore, Legislation Management Consultant
SAIC Pty Ltd

© SAIC Pty Ltd, 2007. All rights reserved.
The TeraText® trademark and logo are owned by Science Applications International Corporation.

Table of Contents

1. Background
2. Scope
     2.1 Review of the individual components
     2.2 Review of the deployment approach
          2.2.1 PCO IT support and PAL SLA
          2.2.2 Packaging of releases
          2.2.3 Maintenance of stylesheets
          2.2.4 Outstanding defects and issues
          2.2.5 Schedule slippage
          2.2.6 Performance issues
          2.2.7 Graphics
3. Approach
4. Review of the individual PAL components
     4.1 Authoring environment
          4.1.1 Training
          4.1.2 Usage patterns
          4.1.3 Usability concerns
          4.1.4 Outstanding authoring environment issues
     4.2 Print rendering
          4.2.1 Print rendering quality and accuracy
          4.2.2 Print rendering performance
          4.2.3 Outstanding print rendering issues
     4.3 CMS
     4.4 Link management
          4.4.1 Role of the link management system
          4.4.2 Current status
          4.4.3 Vendor support
          4.4.4 Outstanding issues
          4.4.5 Recommendation on link management
     4.5 UUI
          4.5.1 Interface to CMS
          4.5.2 Reports
          4.5.3 Editorial diary
          4.5.4 Recommendation on UUI
     4.6 Public website
5. Deployment approach
     5.1 Administration
     5.2 Packaging of releases
     5.3 Maintenance of stylesheets
     5.4 Outstanding defects, change requests and issues
     5.5 Schedule slippage
     5.6 Change management
     5.7 Performance issues
          5.7.1 Screen rendering performance
          5.7.2 Print rendering performance
          5.7.3 Performance testing strategy
     5.8 Graphics format
          5.8.1 Background
          5.8.2 PAL and graphic formats
          5.8.3 Storage issues
          5.8.4 Recommendation on graphic formats
6. Summary of recommendations
     6.1 Required functionality
     6.2 Data integrity
     6.3 Usability
     6.4 Release packaging
     6.5 Schedule
     6.6 Conclusion

1. Background

In 2003, InQuirion Pty Ltd provided a technical review of the Public Access to Legislation (PAL) project to the New Zealand Parliamentary Counsel Office (PCO) to enable PCO to reassure the New Zealand Government, as sponsors of the PAL project, that the PAL system, when implemented, will be operationally stable, maintainable, and capable of supporting future enhancement and development. Since that review, the project has recommenced and PCO seeks continued advice on various matters related to the project. Other offices that will use the PAL system in an authoring capacity include the Inland Revenue Department (IRD), primarily for drafting taxation legislation, and the Office of the Clerk (OC), for commentaries and a variety of other interactions with Bills.

In October 2005, SAIC Pty Ltd, the Australian subsidiary of Science Applications International Corporation, purchased InQuirion Pty Ltd. SAIC is a US research and engineering company with annual revenues of over US$7.8 billion and over 43,000 employees in offices in over 150 cities worldwide. SAIC engineers and scientists solve complex technical challenges requiring innovative solutions for customers’ mission-critical functions. InQuirion now trades under the SAIC name. Dr Arnold-Moore relocated from Melbourne, Australia to Annapolis, Maryland in early 2006.

PCO has requested that SAIC provide them with a further technical review to be performed early in 2007 to review the progress to date and provide additional advice with respect to the technical aspects of the project and how the solution can and should be deployed. This document captures the results of that technical review.

2. Scope

This review examines the current state of the project and the progress since the last review. The aim of the review is to assess the actual progress of the project against the project plan and the current state of the proposed PAL deliverables.

It identifies outstanding issues in the project that may affect system implementation and how PCO should address these issues. SAIC has revisited known issues arising from previous technical reviews as well as identifying and discussing some new issues raised by the current implementation including issues raised by changes in the architecture from earlier versions of the PAL system. This document includes advice on how to mitigate any risk these issues create for project deployment.

This document contains an opinion on the system's readiness and appropriateness for deployment and suggests actions that should be taken by PCO that can assist a successful and timely system deployment.

2.1 Review of the individual components

After examining the current PAL architecture, this report addresses SAIC's opinion of the readiness of each of the components that make up the complete PAL system. These components are the authoring environment, print rendering, the CMS, link management, the UUI, and the public website.

2.2 Review of the deployment approach

2.2.1 PCO IT support and PAL SLA

PCO already has an IT support team with considerable knowledge of the business requirements of the PAL user base and experience in meeting the very high responsiveness requirements of the PCO. In addition to managing customization of the existing Documentum deployment in the PCO and trouble-shooting and correcting problems with the repository, the IT team is also accustomed to managing the configuration and deployment of the PCO workstations and creating and managing access to user accounts in the PCO.

The PAL system servers are all remotely located at a Unisys data centre, and their management and upkeep are the subject of a Service Level Agreement (SLA) between Unisys and the PCO that is currently being negotiated.

This review will also cover the extent to which PCO IT staff will be able to continue this function on the new system.

2.2.2 Packaging of releases

Previous reviews have identified concerns about the packaging of releases and the need to create different releases for deployment to the three different environments currently planned: production, UAT, and system testing. This report addresses this issue.

2.2.3 Maintenance of stylesheets

The Authoring environment, print rendering tool, and web site (HTML publish process) need stylesheets to map the underlying XML to the appropriate presentation format. PTC/ArborText provide a tool (Styler) that generates multiple stylesheets including XSLT for generating XSL-FO (one possible alternative path to creating PDF), FOSI (the chosen path for creating stylesheets for the authoring environment and for print rendering), and XSLT for HTML (the original path to create HTML pages for the web).

PCO has expressed concern in the past at the need for manual intervention after creating stylesheets and the additional maintenance burden that this would create. For performance reasons and size restrictions, screen and print rendering FOSIs were manually split in the past, reunited to avoid the maintenance issues, then split again (this time using an automated process), once more for performance and size reasons. SAIC will comment on the current stylesheet creation, maintenance, and deployment process.

2.2.4 Outstanding defects and issues

The PAL project is a large and complex system with complex interactions between components that are not always easily predictable. Like common desktop software products, such systems are usually deployed with known (and unknown) defects. While it is unrealistic to expect to eliminate all defects in such complex systems, it is still possible to construct them in such a way that the majority of interactions are unaffected. Once the defects have been eliminated for common or regular business processes, it is usually better to deploy such systems to handle the majority of cases. If the system can provide manual work-arounds or strategies to address any known and anticipated system limitations, the system can be deployed into production without undue risk. SAIC’s report addresses the deployment strategy and provides advice to the PAL stakeholders on acceptable deployment approaches.

2.2.5 Schedule slippage

There has been some obvious schedule slippage on this project. This is not uncommon for a large, complex system such as the PAL system (for similar reasons to schedule slippage on large, public building projects). This report includes recommendations to avoid further project slippage.

2.2.6 Performance issues
2.2.6.1 Screen rendering performance

More recently, the interactive performance of the editing solution has caused concern to the PCO. Since this is the environment in which most PAL users will perform the majority of their work, any poor performance would severely impact the productivity of PAL users. This also relates to the integration of the link management tool and its impact on the performance of the system, particularly when documents are registered for cross-referencing.

This report addresses the likely user experience (including for drafters, secretarial support, Pre-Publication Unit (PPU) and Reprints Unit (RU)) for interacting with different size documents to perform common drafting and print preparation tasks for documents of a few pages (extremely common), documents of 1-200 pages (relatively common), and extremely large documents greater than 1000 pages (uncommon but likely to occur at least once in the near future). Note that, as discussed in the more general section on the authoring environment, RU usage does not follow this general pattern: RU more commonly works on larger documents.

2.2.6.2 Print rendering performance

At various times throughout the project, PCO and other stakeholders have expressed concern over the throughput of print rendering tasks. This report addresses the likely user experience (including drafters, PPU and RU) for printing documents of a few pages (extremely common except in RU), documents of 1-200 pages (relatively common), and extremely large documents greater than 1000 pages (uncommon except in RU but likely to occur at least once in the near future). This is covered in the more general section on the print rendering solution.

2.2.6.3 Performance testing strategy

The current PAL testing strategy includes a pre-UAT environment in the PCO offices with a number of workstations connected to back-end systems at the Unisys data centre. At any given time during normal office hours, there are typically multiple testers currently interacting with the PAL system (although far fewer than will interact with the production system once deployed). However, many of the testing scripts do not involve interactive use of the authoring environment (which is essentially stand-alone) and testers often copy documents rather than create them from scratch to test back-end system functions. While the number of users interacting with the back-end systems is lower than would be in a production system, each user is interacting more frequently with the back-end system than would be normal in a production environment.

The system testing repository only has a small selection of legacy data and the documents created in the last few months of system testing available for interactive use. A complete legacy data conversion will contain many more documents and a much larger collection.

While stand-alone performance of the authoring environment and the print preview capability is relatively independent of these server issues, these factors make extrapolation of the back-end system performance from current behaviour to the production system speculative at best.

This report examines the testing strategy to ensure that performance is acceptable before deployment.

2.2.7 Graphics

There is still some uncertainty about suitable graphics formats for PAL and approaches for managing graphics when they occur in legislative documents. Legislative drafting and production systems obviously need to handle graphics for diagrams, maps, road signs, and similar objects that are clearly graphical. They also often use graphics to manage other items that appear in the legislative collection for which the formats might otherwise be difficult to produce consistently in multiple output formats, such as formulas, flow charts, and (in other jurisdictions) complex tables and included documents such as forms, treaties, deeds of contract, and other agreements. The PAL system has avoided graphics for these included documents and contains a sufficiently sophisticated table model to avoid the need to use graphics for complex tables (and many forms). However, there are still likely to be a number of documents that contain large numbers of graphics (such as the numerous formulas in tax legislation or sensitive tables of customs tariffs and possibly appropriations).

3. Approach

Reviews of such large projects can be extended indefinitely. Because of SAIC/InQuirion’s ongoing involvement in reviewing the project and intimate knowledge of its progress, SAIC has provided a fairly broad scope review within a reasonable time frame minimizing disruption to the project.

SAIC's lead legislation consultant, Dr Timothy Arnold-Moore, spent a week in Wellington to conduct the review beginning 5 February 2007. During this time, he was on site at PCO to view the proposed PAL system in its pre-UAT environment, interview PCO staff and other stakeholders, examine code and documentation (including user documentation, system documentation, quality assurance and test plans, and change request documentation), be briefed by the development team, and collect material for more detailed review. He presented a brief summary of his findings to the PAL Steering Committee and to the PAL staff.

The onsite visit included:

SAIC then prepared this written report based on the information gathered during the visit and the documents provided.

4. Review of the individual PAL components

The reaction to the PAL system currently under system testing has been encouragingly positive. While there are some concerns and issues that still remain to be addressed, the future PAL users interviewed are generally optimistic and eager to see the system deployed.

While the Public Access to Legislation project is about making the law available to the public more effectively in print and electronic form, a key factor in this capability was regaining custody and control of the electronic statute book and providing an infrastructure to maintain and enhance that resource from which the public products are created. The PAL project always envisaged an XML repository stored in a CMS, a website to deliver HTML and other renditions of documents stored and managed by the CMS, a rendering tool to generate print products (and their electronic equivalents), and a structured authoring environment to allow PAL users to maintain and extend the content: supporting drafters and administrative officers to create content, PPU to refine and prepare content for publication, and RU to consolidate amendments and officialize legacy content.

4.1 Authoring environment

SAIC has reported in the past that the PAL team has reacted more positively to a new structured authoring environment than other drafting offices. This has not changed and the comments continue to be encouraging.

Most regular PAL users will spend the majority of their time in the authoring environment so this is a key factor to the success of the project. SAIC had an opportunity to use the environment over a number of days and the authoring environment supports the regular drafting tasks extremely well. SAIC’s exposure to the system and discussions with users of the system testing environment led to the observations below.

4.1.1 Training

The PAL team has experimented with a training approach for pre-UAT that appears to be very successful. By using peers who have had exposure to the systems testing (and eventually pre-UAT and UAT) environment, PCO and IRD (and to a lesser extent, OC) are creating a community of PAL "experts" who can support other users in production. These experts are spread across the different PAL user groups including PCO drafters and secretarial support, PPU, and RU. These experts could provide an initial intensive one-on-one training session to familiarize users with the basic legislative authoring tasks—covering about 90% of drafted material.1 Users may be able to achieve this basic level in as little as an hour or two. Many users may not need more formal training than this.

The plan is to have one or more experts follow up the basic training with short, focussed group or individual sessions aimed at training users to perform a few selected tasks rather than overwhelming users with the entire system capability in an extended training session over many days. Alternatively, the team of experts could augment the basic training with occasional side-by-side sessions where users learn one or two new tasks as they arise in their working environment.

While IRD can make use of the PCO experts, there are some differences in drafting style between the two offices and hence some different techniques or interactions needed. Given the different office location, it is also more practical for IRD to continue to maintain some level of independence and self-determination on these issues. System testing has exposed some IRD staff to the system but this exposure has been more limited than for PCO. There is still plenty of time before deployment for selected IRD staff to become sufficiently familiar with the PAL environment that they will need little support from PCO. Regardless, PCO appears to be more than willing to provide any support in the unlikely event that it is necessary.

4.1.2 Usage patterns

All users interviewed reported a preference for working with whole documents unless they are working in a team drafting environment.

This raises issues in usability, link management, and working with large documents as described below.

Drafters, secretarial support, and PPU will be working primarily with Bills and new Statutory Instruments. While such documents often extend to 100-200 pages (and one to 2700 pages), most documents are only a few pages. The average size for new documents is less than 10 pages. New Zealand often uses a drafting team to create new documents of more than a few hundred pages, with each user primarily working on one Part (or Subpart) at a time. Therefore, drafters and secretarial staff rarely need to work in documents of more than a few hundred pages for any length of time. PPU are more likely to work with whole documents regardless of their size. PPU (and, to a lesser extent, secretarial users) are also more likely to be managing revision tracking markup and working with link markup.

While the average size of new documents is low, the majority of amendments are applied to longer documents. Smaller documents are typically not frequently amended and, when they are, the amendments are most often small, simple and quickly applied. RU will spend more time in the most frequently amended documents. The average size of these documents is much larger (probably above 100 pages), and applying these amendments typically takes longer because there is more document to search to find the target of the amendment and more text to amend in the first place.

RU will also be inserting and managing link markup (that results from amendments to the existing legislation). This means that RU will generally be working in the authoring environment and the link management utilities with larger documents than the rest of the user community. However, RU knows ahead of time which portions of the document they need to access, so can usually access the documents one fragment at a time.

4.1.3 Usability concerns

The PAL project has included significant customization of the Epic tool. However, PAL users are typically comparing the environment with a WordPerfect environment that has been in production for many years or structured authoring environments in other jurisdictions that have been in production for a number of years. Both have received continual addition and refinement of macros and other tools to assist users to be more effective in their tasks.

While it is unlikely that all of the usability concerns currently recorded in the bug tracking system will be addressed before deployment, the PAL team should expect to continue addressing these issues throughout the lifetime of the system.

SAIC summarizes the most prominent usability issues that emerged during the onsite review below.

4.1.3.1 Density of tags on the screen hiding the text

While the following usage observations are not directly relevant to deployment concerns, some users expressed concern about the busyness of the screen display in the authoring environment when displaying all the tags.

The authoring environment provides the ability to edit documents with all tags visible, partial tags (small triangular icons in place of start and end tags), and no tags (effectively a WYSIWYG view),2 in addition to the structure view in the left window. There were quite differing views between different users as to which mode they preferred. One user interviewed preferred full tags always. One user preferred using no tags 90% of the time switching to full tags only for copying and pasting (it can be difficult to work out exactly where to paste something unless a user can see where the tags are). A number of users switched between full-tags and partial tags but never used no-tags.

SAIC experimented using all three modes and observed that basic authoring tasks could be performed directly from the keyboard in all three modes equally easily. Any operations that use the mouse or precise location of the cursor (e.g. cutting and pasting, going back and inserting additional markup) were difficult in no tags mode, less difficult in partial tags, and most effective in full tags mode. The revision tracking and link markup (both markup that is likely to be inserted later hence more likely to be inserted in full tags mode) was quite verbose and viewing either of these in full tags mode was quite distracting. While this is not an issue preventing deployment, usability studies suggest that the amount of text visible on a screen has a direct correlation to user productivity for knowledge workers manipulating text. Any mechanism that increases the amount of visible substantive text on the screen should improve productivity.

Despite technical document writers frequently using the structured view in other environments, no PAL users said that they made significant use of this view. SAIC was unable to investigate whether this is a tool familiarity issue, a training issue, or a cultural issue. We simply observe that it is possible to draft in the no tags view and use the structured view for pasting without switching modes. Alternatively, if users do not benefit from the structured view, the PAL team should apply that screen real estate for other uses.

SAIC observes that, while some minor refinements to limit the number of tags visible especially with revision tracking and link markup are possible, there are a number of strategies to support users that can limit the screen real estate devoted to displaying tags. SAIC suspects that this may be best dealt with primarily as a training issue.

4.1.3.2 Revision tracking markup

The revision-tracking markup entered by drafters and secretarial support is typically for presentation to the select committee. While RU (and, to a lesser extent, PPU) create markup with a wide variety of attribute values, most users do not. Drafters and secretarial support only have control of documents when a small subset of the revision tracking markup possibilities is in use.

Displaying the attribute values in full tags mode for this markup uses a large amount of screen real estate. Inserting this markup often requires a number of clicks. It may be possible to select default values for these attributes in the DTD that mean that only the RU and PPU need to enter and see attribute values in revision tracking markup. This might involve a (very minor) DTD change, which is best performed before deploying the system. If it can only be addressed by adjusting the screen rendering stylesheets, there is less urgency.
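As an illustration only, the following minimal sketch shows how default attribute values declared in a DTD let most users omit the attributes while RU and PPU can still override them. The element name `change` and the attributes `stage` and `action` are hypothetical and are not taken from the PAL DTD; the sketch relies on Python's standard-library XML parser, which supplies defaults declared in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical revision-tracking element and attributes (assumptions,
# not the real PAL DTD). The ATTLIST declares a default for each attribute.
DOC = """<!DOCTYPE provision [
<!ELEMENT provision (#PCDATA | change)*>
<!ELEMENT change (#PCDATA)>
<!ATTLIST change
    stage  CDATA "select-committee"
    action CDATA "insert">
]>
<provision>The Minister <change>must</change> <change action="delete">may</change> act.</provision>"""

root = ET.fromstring(DOC)

# The first <change> carries no attributes in the document, so the parser
# fills in both defaults; the second overrides only "action".
changes = [dict(element.attrib) for element in root.iter("change")]
for attributes in changes:
    print(attributes)
```

With defaults of this kind, the attribute values need only be typed (and could be hidden by the screen stylesheet) when they differ from the common case.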

4.1.3.3 Element in context list

The authoring environment allows a user to press the "Enter" key while drafting to get a menu of elements available in or near the current context. The Unisys team customized this behaviour considerably. The authoring environment currently lists the contexts in two parts—an alphabetical ordering of all elements possible in the context immediately following the "para" ("box") element and, above the line, a few selected additional contexts. The description of these contexts is most likely a hard-coded string in the configuration. These descriptions tend to be related to the markup rather than using the language of the user community.

SAIC recommends that the system testing team revise the wording of these descriptions to make them more accessible. While this is not a critical issue, it will improve immediate usability, reduce training requirements, and should be relatively trivial to change.
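A minimal sketch of the kind of change involved, assuming the menu descriptions can be driven from a simple lookup table. The element names, the labels, and the `build_menu` helper are all hypothetical; the actual Epic configuration mechanism is not described in this report.

```python
# Hypothetical mapping from markup names to wording familiar to drafters.
USER_LABELS = {
    "prov": "New section",
    "subprov": "Subsection",
    "label-para": "Paragraph (a), (b), ...",
    "def-para": "Definition",
}


def build_menu(allowed_here, extra_contexts, labels):
    """Return menu lines: selected extra contexts above a separator line,
    then every element allowed in the current context in alphabetical order."""
    def label(name):
        # Fall back to the raw element name when no friendly label exists.
        return labels.get(name, name)

    above_line = [label(name) for name in extra_contexts]
    below_line = sorted(label(name) for name in allowed_here)
    return above_line + ["----"] + below_line


menu = build_menu(["subprov", "def-para", "label-para"], ["prov"], USER_LABELS)
for line in menu:
    print(line)
```

Keeping the wording in one table, rather than scattered through the configuration, would also make later refinements cheap.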

4.1.3.4 Cross-reference automation

The link management tool, DLM, is designed to manage and maintain links within and between legislative documents. There are currently numerous issues with this tool, described below in section 4.4. Of particular relevance to the perceived usability of the authoring environment, the link markup is not currently being used to auto-update the reference numbers of the targets of the links (like the Insert >> Reference >> Cross-reference >> Numbered item functionality in MS Word).

A principle of change management is that it is acceptable to create an additional burden providing that there is also a perceived benefit. While often this benefit may not be to the user performing the additional task, it is always preferable to provide some related or compensating benefit to the user or group of users performing the task. Entering link markup will impose an additional burden on the PAL user community that they do not currently have. The major benefit was always intended to be to support links on the website. However, an additional anticipated benefit was that, if inserted during the drafting of Bills, it would significantly reduce the need to check cross-references in Bills when PAL users renumber the provisions. The insertion of link markup and the checking of cross-references are currently primarily tasks that will be undertaken by PPU (although both may be performed by other PAL users). The PAL system was specified to automatically update the references using the cross-reference markup. Failing to automate the update of cross-references doesn't undermine the primary benefit but it does remove the additional anticipated benefit.

While the system could be deployed without automating the update of cross-reference numbering, the new work processes are considerably less attractive without it.3 This is an obvious benefit of a structured authoring environment that is not currently being realized in the delivered PAL system.
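The benefit at stake can be sketched as follows. This is an illustrative model only, not the DLM design: it assumes each provision carries a stable identifier, that link markup refers to that identifier, and that the visible number is generated at render time. The `{ref:...}` notation and all names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical inline link notation: {ref:<target identifier>}
LINK = re.compile(r"\{ref:([^}]+)\}")


@dataclass
class Provision:
    target_id: str  # stable identifier that survives renumbering
    text: str       # may contain link markup pointing at another identifier


def render(provisions):
    """Number provisions by position and resolve each link to the
    current number of its target."""
    numbers = {p.target_id: n for n, p in enumerate(provisions, start=1)}
    lines = []
    for p in provisions:
        body = LINK.sub(lambda m: f"section {numbers[m.group(1)]}", p.text)
        lines.append(f"{numbers[p.target_id]} {body}")
    return lines


bill = [
    Provision("interp", "Interpretation"),
    Provision("offence", "Offence, subject to {ref:defence}"),
    Provision("defence", "Defence"),
]
before = render(bill)

# Inserting a provision renumbers the defence; the reference text follows
# automatically, so nobody has to re-check it by hand.
bill.insert(2, Provision("penalty", "Penalty"))
after = render(bill)
print(before[1])
print(after[1])
```

Without this step, every renumbering leaves the cross-reference text to be checked and corrected manually, which is the burden the link markup was expected to remove.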

Deployment without linking markup on the web site for enacted legislation is not an option because it would deliver a reduction in functionality from the existing web site.

4.1.3.5 Keyboard shortcuts and users who need to avoid the mouse

Potential users of legislative drafting systems often express concern that moving from a keyboard-heavy editing environment to one that involves more graphical interfaces (typically mouse use) will result in lost productivity (caused by switching between the keyboard and the mouse) and occupational health issues (created by heavy mouse usage). SAIC has heard these concerns in a number of legislative drafting environments, particularly those moving from older versions of WordPerfect to a structured editing environment.

While the Unisys team has expended significant effort in producing the current environment to speed data entry of the common structural components of legislation (with keyboard- and mouse-driven tools), the task of reordering and re-organizing provisions is more difficult to perform without using a mouse. It also typically requires displaying more tag information right at the time when viewing as much context as possible is most valuable. Interviews with users of the pre-UAT environment reveal that many users switch between two common modes—linearly creating or entering material with relatively simple structure, and revising and reorganizing the material, either under instructions from the client, or as part of a process of refinement to ensure that the content accurately conveys the intended meaning and gives effect to the policy.

While it is unlikely that there is sufficient time to refine the interface before entering UAT, the PAL user community should plan to examine the ways users interact with the system after deployment to identify potential improvements in this area to increase productivity and reduce the risk of occupational health issues.

4.1.4 Outstanding authoring environment issues

There are relatively few outstanding issues in the authoring environment.

Multiple users have reported issues with the Epic application disappearing,4 often many times a day. SAIC did not experience this but worked primarily with newly created documents. SAIC wonders whether this might be caused by opening older versions of documents that satisfy older versions of DTDs. Any complex interactive software environment will experience the occasional unexplained exit. Modern office tools with user-bases of millions still experience such events. Their occurrence is no cause for great alarm (providing they are relatively infrequent). However, users have come to expect more graceful recovery from such crashes, with regular auto-save of content and preservation of the save location of the documents within the authoring environment.

The only other issues of any real concern in the authoring environment are character translation issues. These mostly occur in formulas, where a number of technically different characters are being mapped to the same character (making the mapping impossible to reverse automatically). These include mapping different width spacing characters to a space, mapping both the period and decimal point character to a period, and mapping various horizontal rules (mdash, ndash, soft and hard hyphen) to a regular hyphen. This is a minor issue but, because it affects the integrity of the data, it is much better fixed before deployment. If not resolved before deployment, it may require manual checking and correction of large sections of data. SAIC believes that the Unisys team has created a fix for this (modifications to the entity definitions in the DTD) but it was not deployed in the system that SAIC examined.
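The reason such a mapping cannot be undone automatically is that it is many-to-one. The sketch below illustrates this with an assumed translation table; the specific code points are chosen for illustration (for example, the non-breaking hyphen stands in for the "hard hyphen") and are not the actual PAL entity mappings.

```python
from collections import defaultdict

# Illustrative many-to-one translation table (an assumption, not the
# mapping used in the PAL system).
LOSSY = {
    "\u2014": "-",  # em dash
    "\u2013": "-",  # en dash
    "\u00ad": "-",  # soft hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2009": " ",  # thin space
    "\u2003": " ",  # em space
}


def translate(text, table):
    """Replace each character that has an entry in the table."""
    return "".join(table.get(ch, ch) for ch in text)


def preimages(table):
    """For each output character, collect every input character that
    produces it, keeping only the ambiguous (many-to-one) cases."""
    groups = defaultdict(set)
    for source, target in table.items():
        groups[target].add(source)
        groups[target].add(target)  # the plain character maps to itself
    return {t: s for t, s in groups.items() if len(s) > 1}


with_en_dash = translate("x \u2013 y", LOSSY)
with_em_dash = translate("x \u2014 y", LOSSY)
collisions = preimages(LOSSY)

# Two different inputs give the same stored text, so the original
# character cannot be recovered from the stored text alone.
print(with_en_dash == with_em_dash)
print({target: len(sources) for target, sources in collisions.items()})
```

Once data has been saved through such a table, only a person reading the context can decide which original character was intended, which is why the correction becomes manual after deployment.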

4.2 Print rendering

Many users clearly understand the immediate benefits of single-source publishing that XML brings. They are also eager to deploy the PAL rendering environment so that they can create final copy whenever they wish without having to wait for copy to come back from an external printer. This realization, together with the obvious benefits of a reliable, authoritative public web site, means that the benefits of the PAL system are already well-understood by the user community.

When SAIC (then InQuirion) was originally engaged to review the PAL project, there were numerous concerns about the output and the performance of the print rendering subsystem. Each of these issues is addressed separately below. The Unisys team have clearly worked hard to address both of these issues and the current release is a considerable improvement.

4.2.1 Print rendering quality and accuracy

User comments about the quality and accuracy of the output of the current rendering subsystem are generally very positive. Formatting is correct and repeatable, the outstanding issues with print rendering are relatively minor, and the testing team is confident that the Unisys team will be able to fix issues without undue delay.

The rate of rendering errors fixed, tested, and then reappearing is much lower although there is still the occasional occurrence. SAIC investigated this with Unisys and it appears that there still is not an appropriate regression-testing regime in place for ensuring that the Unisys team test stylesheets against documents known to have caused previous defects before shipping.5 The Unisys test manager suggested that the output and stylesheets were only just reaching the stability required for such a testing regime to be sensible.

SAIC would be much more comfortable about deploying the system operationally if such a regression testing regime were in place. The alternative is for PCO to establish such a regime in their own testing environment before deploying fixes shipped by Unisys. SAIC understands that Unisys undertook to provide this infrastructure for print rendering when the project recommenced in 2005.

4.2.2 Print rendering performance

The rendering performance has improved significantly. The Income Tax Bill 2006 (approximately 2700 pages) now renders in between 25 and 40 minutes. A typical 100 page document takes less than 1 minute to render. It appears that the KPIs for print rendering in the PAL system will be met by the delivered system. This should be verified by the performance testing regime proposed.
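As a rough cross-check of the figures quoted above, the throughput implied by each measurement can be worked out directly. This is arithmetic on the numbers in this section only; it is not a substitute for the proposed performance testing regime.

```python
def pages_per_minute(pages, minutes):
    """Average rendering throughput for a single job."""
    return pages / minutes


# Income Tax Bill 2006: approximately 2700 pages in 25 to 40 minutes.
large_best = pages_per_minute(2700, 25)   # fastest reported run
large_worst = pages_per_minute(2700, 40)  # slowest reported run

# Typical document: 100 pages in less than 1 minute, so the
# throughput is at least this value.
typical_floor = pages_per_minute(100, 1)
```

On these figures the large Bill renders at roughly 68 to 108 pages per minute, which is in the same range as the rate implied for a typical 100 page document.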

4.2.3 Outstanding print rendering issues

4.2.3.1 Revision tracking

There are still some issues with the rendering of revision tracked markup. The number of combinations of revision tracking markup makes this difficult to test exhaustively and it is to be expected that issues will be discovered during system testing and even potentially UAT. Provided that these issues are isolated to rare markup combinations,6 the system could be deployed without resolving these issues although the PAL team (especially PPU and RU) would be much more comfortable if they were resolved before deployment.

4.2.3.2 Complete production publication process

While the print rendering infrastructure appears to be robust, fast and accurate, some users observed that the publication process is not currently complete. The system testing and pre-UAT environments do not close the loop with SecuraCopy demonstrating that the system can deliver paper documents to the user community. This should be addressed at the very latest in UAT.

4.2.3.3 Handling of "changeable"

IRD has expressed some concern that the font for the "changeable" element is significantly different from the font in current use. It is not as easily distinguishable from the text around it. This makes it harder to spot on the paper and more likely that errors will be made by proof readers. This concern was not shared by Office of the Clerk users, although they remained concerned that the 'changeable' elements remain distinct from other text.

However, the process of removing and updating changeable text will be largely an electronic process and can be entirely dealt with within the authoring environment (in fact, it could be largely automated which would remove most concerns—see above section 4.1.3.4). Users can search for occurrences of particular tags in the authoring environment which also reduces the need to rely on page rendering characteristics to identify marked content.

This issue appears to be more settled than first revealed and can be addressed after deployment if there are still concerns about the formatting.

4.2.3.4 Hyphenation

The PTC/ArborText product suite allows hyphenation in the output to be controlled through a number of configuration methods. These include:

The PTC/ArborText product suite defaults to a 2-3 hyphenation rule, not the 3-3 rule specified in the Legislation Output Specification. This can be managed by creating a larger than normal exceptions table but PPU is concerned that this may affect print rendering performance and that the turnaround to get changes made to this table will be slower. The remaining alternatives would require each occurrence of a word to be marked to ensure that correct behaviour is achieved. Proprietary processing instructions are not desirable as migrating to another rendering tool will force changes to the underlying markup and these will require special processing for delivery to the web. Whether processing instruction markup or special Unicode characters are used for local overrides, care will need to be taken to ensure that search and web display both behave correctly.
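The difference between the two rules can be sketched as follows. This assumes that an "m-n" rule means at least m characters must precede a hyphenation point and at least n must follow it; that reading is an assumption, as the Legislation Output Specification is not reproduced here. The candidate break positions are hypothetical and are not taken from any PTC/ArborText dictionary.

```python
def allowed_breaks(word, candidate_breaks, min_before, min_after):
    """Keep only the candidate break positions that leave at least
    min_before characters before the hyphen and min_after after it.
    A break position i means the word splits as word[:i] + '-' + word[i:]."""
    return [
        i for i in candidate_breaks
        if i >= min_before and len(word) - i >= min_after
    ]


# Hypothetical candidate break: "re-pealed" (position 2).
word = "repealed"
candidates = [2]

under_2_3 = allowed_breaks(word, candidates, 2, 3)  # default rule
under_3_3 = allowed_breaks(word, candidates, 3, 3)  # specified rule
```

Under this reading, every word with a dictionary break after its second letter would need either an exceptions table entry or a local override to satisfy the 3-3 rule, which is consistent with the concern about the size of the exceptions table.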

While the system could be deployed without further resolving this issue (the system clearly supports a number of work-arounds), there are long-term data integrity issues potentially arising. PCO should ensure that the long-term and any short-term strategies for managing hyphenation are clearly documented before deployment including any issues for web searching and display and how the PAL system will manage these. This documentation should clearly describe any markup that the web publication system cannot manage and describe the supported alternative. It should also describe any required data clean-up procedures to remove unnecessary local overrides once a long-term solution is in place.

In the long-term, these documents are amongst the most important in the nation and the former government printer employees are highly skilled and see themselves as experts in this area. The technology exists to automate this process to a very high standard and it is perfectly reasonable to expect the technology to be configured to do so, particularly if the specification is clear and undisputed.

4.3 CMS

The CMS is virtually invisible to users now. The only view that they see is through the UUI (see below section 4.5.1).

Throughout the review process, the CMS appeared to be stable and robust. Users reported no stability issues and acceptable performance. SAIC considered the interactive check-in and check-out performance to be slow but acceptable. While this is a process that, for many users, will only happen a few times a day per user, some users will be checking in and out more frequently (particularly RU and possibly PPU). For instance, when working on amendments arising out of a Statutes Amendment Bill or consequential amendments that amend multiple Principal documents, RU will open dozens of documents one at a time to apply only a few amendments to each, checking in each document as they go.

There appeared to be a number of issues experienced by the system testing team that related to stale data in the repository (including both the CMS and the DLM repositories). SAIC suspects that the CMS only checks documents against the DTD when they are checked-in and does not ensure that all documents continue to be valid against a DTD when that DTD is updated. Unisys need to ensure that there is a strategy to manage the evolution of the PAL DTDs and that the CMS continues to contain data valid against the DTDs over time. Unlike database schemas, XML documents are only semi-structured and DTDs are rarely created that never change. Any seriously complex XML document system should anticipate and accommodate DTD evolution (typically extending rather than restricting the permitted content) and have in place strategies for validating legacy data against a new DTD set to ensure that the system continues to behave correctly.

SAIC also observes that there are still issues with the display titles for fragments. Part and Subpart fragment display titles do not contain a Part or Subpart number until numbering is fixed (at least in the Part or Subpart). Changing the title of a Part or Subpart can therefore radically affect where it appears in the list of fragments. Fixing then unfixing the numbering can have side effects on the settings affecting interactive performance on large documents so this is not a sufficient solution (given that documents with Parts and Subparts are likely to be larger than average to begin with). This was identified as an issue before the project restarted in 2005 and it appears that there has been frustratingly little progress on this issue.

SAIC notes that changing //part/title correctly updates the display title. However, changing //billdetail/title does not update the display title for the whole document; only changing //cover/title does. SAIC is concerned about the data duplication inherent here and recommends that the PCO request from Unisys some description of the tools or procedures that ensure that this duplicated content is maintained consistently.

4.4 Link management

4.4.1 Role of the link management system

The link management functionality to be deployed in the PAL system is crucial for four major areas of functionality:

  1. to support cross-reference hypertext links within the web site and potentially PDFs,
  2. to support the editorial diary functionality including:
    1. managing links in amending documents to the amended target, and
    2. finding links to repealed documents and provisions to remove potential dangling links and discover potential consequential amendments,
  3. to validate existing cross-references and links in the legacy data, and
  4. to automate cross-reference wording updates when Bills (and Subsidiary legislation drafts) are renumbered.

4.4.2 Current status

Unisys selected DLM, a link management solution from PTC/ArborText, because it was designed to work with the other PTC/ArborText tools selected (E3 and Epic Editor). This is a relatively new tool in the PTC/ArborText offering and there have been some technical and performance issues associated with its usage. This is the first time anybody has applied this tool to a collection of legislative documents. Other sites using Epic Editor for drafting or publishing legislation have not used DLM for link management.

Legislation is much more densely structured than typical documents. Structural elements in legislation occur about 7 for every 100 words whereas newspaper articles are more like 2-3 for every 100 words and general prose or technical documents tend to be even less dense. In a well-designed DTD, virtually all of these elements represent potential reference targets. Legislation is also rich with references within the collection. Each provision in an amending Act or Statutory Instrument (about 70-80% of all legislation contains some amendments) will refer to at least one target in another Act or Statutory Instrument. Substantive legislation is not quite as densely referenced but it is not unusual for a single provision to contain half a dozen references. A large document like the Income Tax Act 2004 (ITA) will contain tens of thousands of potential link targets and thousands of links.
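The density measure quoted above (structural elements per 100 words) is straightforward to compute for any XML document. The following is a minimal sketch using only the Python standard library; the element names in the sample are invented for illustration and are not the PAL DTD, and the sketch counts every element, whereas the figures above refer to structural elements only.

```python
import xml.etree.ElementTree as ET


def elements_per_100_words(xml_text):
    """Count all elements and whitespace-separated words in the
    document and return the number of elements per 100 words."""
    root = ET.fromstring(xml_text)
    element_count = sum(1 for _ in root.iter())  # includes the root
    # Join with a space so that text in adjacent elements is not
    # fused into one word (words split by inline markup count twice).
    word_count = len(" ".join(root.itertext()).split())
    if word_count == 0:
        return 0.0
    return 100.0 * element_count / word_count


# Hypothetical markup, NOT the PAL DTD.
SAMPLE = (
    "<section><heading>Short title</heading>"
    "<para>This Act is the Example Act 2007.</para></section>"
)

sample_density = elements_per_100_words(SAMPLE)
```

Running the same measure over a large Act and over a typical technical manual would give PCO a simple, repeatable way to show a vendor how much denser legislative markup is than the documents a tool was designed for.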

This means that solutions designed for technical documents will not necessarily translate easily to manage legislative documents, particularly large documents like the Income Tax Act 2004. This raises performance issues for registering documents (each document needs to be registered so that the links to targets in that document can be created), registering new links, and resolving links. Unisys is currently working with PTC/ArborText to address these issues. PCO will need to ensure that the solution delivered balances the need for responsiveness with the need to manage links between and within legislative documents appropriately.

Unisys has created an infrastructure that allows a single document to be registered and updated by updating each of the individual components in turn. At the time of the review, Unisys were waiting on changes and documentation from PTC/ArborText but appear to be confident that they can address this issue and will deliver a working link management solution to PCO within a few weeks of the initial consultation leading to this report (week beginning 5 February 2007).

4.4.3 Vendor support

As a relatively new product, it is unsurprising that there are some issues with documentation and functionality. Some examples of poor documentation in the DLM product include:

These documentation issues are easily addressed and PTC/ArborText is responding to Unisys requests for clarification and further documentation in a timely manner.

A more fundamental issue is the way the product suite manages and embeds the link data in the documents. DLM creates and manages element and attribute markup that was designed to be validated using XML Schema. This markup is embedded in the XML instance. If the DLM markup is included in a DTD, then some of the automatic management in the authoring environment is not available. If not included in the DTD, there will be issues in validating the XML documents later. It appears that the Epic Editor parses DLM markup as though it is a processing instruction rather than valid element markup (which raises the question as to why PTC/ArborText didn’t use processing instructions in the first place). PTC/ArborText is providing excellent factory level support to Unisys to resolve outstanding issues.

The current implementation priority is to complete the required functionality, then optimize for performance, and then address any usability concerns. SAIC endorses this strategy although care should be taken to keep in mind performance and usability concerns that can be rolled into functionality changes.

4.4.4 Outstanding issues

The link management functionality currently delivered is clearly causing concern amongst the PAL team. A credible working version of the link management system was yet to be delivered at the time SAIC performed its review. This makes addressing user comments and any meaningful comment from SAIC about performance and functionality premature. Together with the Editorial diary (which is intimately connected to the link management infrastructure), this is the subsystem that causes SAIC most concern about the PAL system and its readiness for deployment.

Unisys are confident that they have seen all the components of a link management system work in isolation and, once they have information from PTC/ArborText, will be able to put all of these components together to deliver a working link management system that addresses many of these concerns. SAIC and the PAL team are eager to see them deliver on this confidence.

The following issues will remain in the proposed deliverables.

4.4.4.1 Link management and versions

The current PAL link management solution as designed does not allow for maintenance of links on previous versions of documents. Internal links are resolved within each version of the document (so a link in the original Assent version of an Act will go to the listed target in the Assent version despite the fact that the target may have been removed by a subsequent amendment and not be available in the current consolidation). External links only go to the latest version (typically the current consolidation) of the target document, so an old version of a document will always point to the latest version of the target. As a result, even though a link is validated at the time it is created, many links in the collection can be invalid at any given time. This is most likely to happen when a document or a provision to which there is a link is repealed or revoked. Current drafting procedure is to perform a search on a repealed or revoked provision to find any references to that provision and amend those references at the same time. The PAL system includes a search capability to support that, thus reducing but not eliminating the risk of this happening in future.
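The resolution behaviour described above can be modelled in a few lines. This is a simplified sketch of the rules as SAIC understands them, not the DLM implementation; the data structure, document names and target IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_doc: str
    source_version: int
    target_doc: str
    target_id: str


def resolve(link, collection):
    """Return (version_consulted, target_found).

    collection maps document name -> {version number: set of target IDs}.
    Internal links resolve within the source version; external links
    always resolve against the latest version of the target document."""
    versions = collection[link.target_doc]
    if link.target_doc == link.source_doc:
        version = link.source_version
    else:
        version = max(versions)
    return version, link.target_id in versions[version]


# Hypothetical collection: section s2 of Act A was repealed in version 2.
COLLECTION = {
    "Act A": {1: {"s1", "s2"}, 2: {"s1"}},
    "Act B": {1: {"s5"}},
}

internal = Link("Act A", 1, "Act A", "s2")  # inside the Assent version
external = Link("Act B", 1, "Act A", "s2")  # from another Act

internal_result = resolve(internal, COLLECTION)
external_result = resolve(external, COLLECTION)
```

The internal link still resolves because it stays within version 1, while the external link from Act B dangles as soon as Act A is consolidated without section s2, even though it was valid when created.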

This also means that the timing of the link check is crucial. A check is currently performed as part of the drafting process before a Bill is introduced. Intervening amendments can insert references to provisions between the drafting of their repeal and its commencement (and subsequent consolidation). This means that additional checks should continue to be performed as a Bill progresses through the enactment process (while it is still possible to insert additional consequential amendments to related reference wording). A link management solution that also incorporated time (including prospective times and uncommenced amendments) and checked links in Bills before Parliament could further reduce the risk of creating dangling links.

The current PAL system as specified by PCO only requires link markup to be present once a Bill becomes an Act. Performing a reference lookup on Bills would only pick up changes where somebody had chosen to insert this markup earlier in the process so a reference lookup in Bills would not guarantee detection of all potential dangling links.

PCO should be aware that, while the PAL solution as currently designed is an improvement on the capabilities available in the current non-PAL system, it does not guarantee link integrity. It is still possible, although less likely, to create new dangling links without realizing.

4.4.4.2 Link management and copying and pasting

Copying and pasting text that contains managed links or targets is also potentially dangerous. Unless care is taken, multiple occurrences of a target could be created when a target is copied and pasted into another document. DLM has been designed to manage this process and assign new IDs (although probably not in a version-aware fashion). RU users need to be aware of this exact behaviour when applying amendments to ensure that the desired link behaviour is achieved and existing links remain valid when required.

Although less dangerous, copying and pasting link markup might also cause issues particularly if the markup is copied (referring to one target) but the content of the cross-reference text is altered to implicitly refer to another target without updating the link markup.

So that this issue can be properly addressed in training, the PCO should insist that Unisys document the supported copy and paste behaviour when the copied item contains a referenceable target or reference markup and describe any procedures to "clean" the pasted item to ensure correct system behaviour.
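One check that could form part of any "clean" procedure is a scan for target identifiers that occur more than once after a paste. The sketch below assumes targets carry a plain `id` attribute; that attribute name and the sample markup are hypothetical, since the actual DLM markup is not documented in this review.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text, id_attr="id"):
    """Return the sorted list of identifier values that appear on more
    than one element. id_attr is an assumed attribute name."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical result of pasting a copied section without new IDs.
PASTED = (
    "<act>"
    "<section id='s3'>Original provision</section>"
    "<section id='s4'>Another provision</section>"
    "<section id='s3'>Pasted copy of the provision</section>"
    "</act>"
)

clashes = duplicate_ids(PASTED)
```

A check of this kind only detects duplicated targets. It does not detect the second case described above, where link markup is copied intact but the visible reference wording is changed to refer to a different provision.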

4.4.4.3 Performance of link management

The DLM system was designed to support authoring large documents by editing small components of the document rather than the whole document. As described above (section 4.1.2), PAL users (like most legislative drafting users) tend to prefer to interact with whole documents rather than fragments of documents. This has led to a number of performance issues with registering large numbers of link targets at once (a large document contains potentially hundreds, even thousands, of potential link targets, all of which are registered when a document is registered for link management) and large numbers of links (large documents typically contain significant numbers of links as well).

Unisys has spent considerable resources investigating the cause and potential remedies for these performance issues. They are reasonably confident that it is not the database as database transactions are a relatively small proportion of the time spent registering targets and links. The registration process has a number of steps that are all potential bottlenecks for larger documents:

The solution Unisys has chosen is to manage the registration of large documents by breaking them up on the client end and registering each fragment one by one, so that resources are not overtaxed and performance does not degrade. This should address many of the performance issues although it may produce a few of its own. SAIC will wait eagerly with PCO for a link management solution to be delivered that addresses these concerns, which Unisys assures us will occur within weeks of the initial investigation leading to this report.

4.4.4.4 Link maintenance and print rendering

A few issues have been raised with respect to links and print rendering artefacts. It is quite likely that these are caused by changes in the markup scheme and toolsets working on a mixture of older and newer markup. PCO should ensure that the delivered link management solution produces markup that renders correctly as specified (or that any changes to the markup are also propagated to the Legislation Output Specification and the resulting stylesheets).

4.4.5 Recommendation on link management

Since Unisys are planning to release entirely revamped link management functionality in the near future, any comments and recommendations about the link management functionality as currently delivered are of little value in advancing the project. SAIC’s recommendation is to await the delivery of the new, improved version of the functionality and assess it against the requirements including KPIs.

4.5 UUI

In the first delivery of the PAL system in 2002/2003, Unisys performed a great deal of customization of the default Documentum interface. This made migrating to more recent versions of Documentum problematic and tightly coupled the system to Documentum as a back end.

When the project was recommenced in 2005, Unisys presented the PAL team with a proposed architecture where a UUI (Uniform User Interface) was hosted within the authoring environment and provided a user interface to the required back end functionality through a Service Oriented Architecture (SOA), commonly called web services.

The UUI currently delivered does not quite match this description. It was written in Java and, together with the PTC/ArborText Epic configuration, provides the primary interface for PAL users to the system functionality. However, it is a completely separate application that initiates the Epic environment, and its functionality is not currently accessible from within Epic. A user cannot check in a document that they are working on from the authoring environment. This is not really consistent with the architecture that was agreed to by the PAL team.

Many of the usability issues described below could be dealt with relatively simply by providing a check-in capability for the currently active document directly from the authoring environment.

The UUI provides an interface to the CMS functionality as well as to reports and the Editorial diary. These are each described in more detail below.

4.5.1 Interface to CMS

The interface to the CMS supports searching for documents in the collection, retrieval of different versions and fragments of those documents, check-out of those versions and check-in of new versions.

User reactions to this functionality varied from reasonably warm to frustrated. PCO users are quite familiar with Documentum and know what functionality it can deliver. A Documentum CMS has been in use there for many years. The small subset of this functionality available to them in the UUI is likely to disappoint some.

4.5.1.1 Search functionality

The current search functionality only supports job number or title searches and the title search only searches for words beginning the title. Words occurring later in the title are not used to find the document. Search within the content of the document is not currently delivered (although is specified and designed). This practically means that you either have to know the exact title of your document or know the PCO job number—possibly reasonable in a test system containing only a few hundred jobs but not very practical in a production system managing many thousands of documents.
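The practical effect of matching only the beginning of a title can be seen in a small sketch. This is not the UUI implementation; the function names are hypothetical and, apart from the Income Tax Act 2004, the sample titles are illustrative only.

```python
def prefix_search(titles, query):
    """Behaviour as described: match only titles that begin with the query."""
    q = query.lower()
    return [t for t in titles if t.lower().startswith(q)]


def word_prefix_search(titles, query):
    """One possible alternative: match if every query word begins
    some word anywhere in the title."""
    query_words = query.lower().split()
    return [
        t for t in titles
        if all(
            any(title_word.startswith(w) for title_word in t.lower().split())
            for w in query_words
        )
    ]


# Illustrative titles only.
SAMPLE_TITLES = [
    "Income Tax Act 2004",
    "Taxation (Annual Rates) Bill",
    "Statutes Amendment Bill",
]

by_prefix = prefix_search(SAMPLE_TITLES, "tax")
by_word = word_prefix_search(SAMPLE_TITLES, "tax")
```

With a title-prefix search, a user looking for "tax" never sees the Income Tax Act 2004, which is the kind of miss that becomes serious once the collection holds many thousands of documents.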

4.5.1.2 Search result display

If a user is successfully able to recall the beginning of a title or a job number, the result display lists all the fragments of documents that matched the search request including the whole document. These are sorted alphabetically by display title. This means that, if the document title begins with anything between "Par" and "Sub", it is likely to be somewhere in the middle of a list containing a number of entries beginning with "Part", "Schedule" or "Subpart". Most users want the whole document most of the time. The utility of this search result list would be vastly improved by ensuring that the whole document was always first. A less desirable alternative is to identify it with some presentation artefact: boldface, italic, colour, or an icon.
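Placing the whole document first requires only a change to the sort key, as the following sketch suggests. The tuple layout and the sample entries are hypothetical; how the UUI actually represents result entries is not known to SAIC.

```python
def result_sort_key(entry):
    """Sort whole documents before fragments, then alphabetically.
    entry is a (display_title, is_whole_document) pair."""
    display_title, is_whole_document = entry
    # False sorts before True, so negate the flag to put whole
    # documents first.
    return (not is_whole_document, display_title.lower())


# Hypothetical search results for a single Act.
RESULTS = [
    ("Part 1 Preliminary", False),
    ("Schedule 1", False),
    ("Resource Example Act", True),
    ("Subpart 2", False),
]

alphabetical = [title for title, _ in sorted(RESULTS, key=lambda e: e[0].lower())]
whole_first = [title for title, _ in sorted(RESULTS, key=result_sort_key)]
```

In the purely alphabetical ordering the whole document sits between the "Part" and "Schedule" entries; with the two-part key it is always the first row, and the fragments keep their familiar alphabetical order beneath it.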

Once a user successfully identifies the fragment or whole document they wish to access, they can click on it and see a display of the available versions of that document and any supporting documents associated with the document. This functionality is generally warmly received with the following notable exception.

4.5.1.3 Unmet reasonable expectations

The UUI contains a number of potentially useful features as specified that would mitigate the limitations of the search interface which include a "My Recent Documents" screen (which contains the 10 most recently accessed documents by that user) and a "My Worklist" screen (to which users can add their own documents or document fragments).

Both Documentum and Windows support "My Recent Documents" functionality. Under both systems, you can simply click on the entry for that document to open it for further editing (in Documentum, it checks-out the document if required). SAIC questions what utility there is for such a screen if you can’t navigate from the list of documents to the Authoring environment either directly or indirectly.

The search result set list is similar to a file browse capability and mirrors the Documentum explorer plug-in to some extent. Both of these environments allow a user to press F5 to refresh the list. The current UUI requires the user to re-enter the search in order to re-execute it.

The current display does not provide the user with any information about the document ID or the file location in which the UUI places a checked-out version of a file. This makes it virtually impossible to find that document if the authoring environment is exited before the edit on the current version is completed. The UUI relies on a magic file location (not revealed to the user) from which to upload the document for check-in rather than allowing a user to select a different file location to upload.

4.5.2 Reports

The UUI also contains a set of screens for creating and viewing reports. The reports screens require a full set of data to produce meaningful results and this will not be available until the system enters UAT. PCO should ensure that the reports are fully exercised as part of UAT to validate that the system delivered meets the needs for reporting as specified.

4.5.3 Editorial diary

The editorial diary was never delivered in the initial delivery of the PAL project in 2002/2003, and well into the system testing phase of the current project an editorial diary is still not delivered. A prototype was scheduled for delivery 6 months ago and has not been realized. Virtually every other component of the PAL system is an implementation of functionality that is available in another system in use by the PAL users and most PAL users have seen a previous version of the interfaces in which they will be spending the majority of their working life. However, the editorial diary is an electronic system for managing a process that has no electronic precedent within the PCO. No previous model exists for this functionality within the PCO (and the system used internally in Brookers is only of partial relevance). A manual precedent involving literally cutting and pasting paper is not a strong model for an integrated, electronically managed process.

This has led to concerns independently expressed by multiple members of the PAL team (including 2 reprints staff) about the short time frame in which to respond to the functionality and usability of the delivered subsystem. Given that this is the first attempt at a user interface to support this task, it is likely to need substantial refinement.

The proposed implementation of the Editorial diary is reliant on link management system changes yet to be delivered (primarily managing the document ID in amending legislation of the amendment target or Principal). Once the link management infrastructure is in place, Unisys claim that the editorial diary can be delivered relatively quickly.

While it is possible to deploy the PAL system without an editorial diary in place, there are a number of factors that might make this difficult:

The reliability and integrity of the editorial diary repository (not necessarily the user interface) is essential to managing the reprints process. It is vital to know what is NOT complete. Changes and refinements to the user interface are of lesser concern.

SAIC recommends that PCO insist, as they are entitled to, that the system only be deployed with a functioning editorial diary that satisfies the stated requirements. If PCO are to accept any lag between deployment of the PAL system and deployment of the Editorial diary, this should be kept to only a month or two to ensure that the backlog of reprints and consolidation work is not overwhelming.

4.5.4 Recommendation on UUI

Recommendations on the Editorial diary component of the UUI appear immediately above.

The remaining UUI functionality provides an interface to the required functionality but suffers from a number of more minor usability issues. While it could be deployed without addressing any of these issues, the desire to deploy sooner rather than later should be weighed against possible user backlash of deploying without fixing some of the relatively simple issues. In particular, the lack of a click-through on the My Worklist and My Recent Documents renders these screens completely useless.

PCO should ensure that the reports are fully exercised as part of UAT to validate that the system delivered meets the needs for reporting as specified. These were not available to SAIC to examine in any detail.

4.6 Public website

In some ways, the public website is the primary driver for the PAL project. Public Access to Legislation is ultimately the goal of this project and the PAL website delivers access to documents at every stage of the legislative process in HTML and PDF format.

The HTML produced by the current web publication process is attractive and sufficiently close to the print display to be usable in the absence of PDF documents. The website has been tested from Bar-1 to Bar-5 through to Act delivering each version to the web without issues including repealed Acts and split Bills. There are a few outstanding format and search issues that appear relatively simple to resolve. Providing that the link maintenance capability is delivered, hypertext links for cross-references and structural navigation will be available in the website (subject to the limitations described above in section 4.4.4.1).

While caching has been removed from the current website, this is not of any great concern. The architecture uses what is commonly referred to as a "baked" approach, where the HTML versions (both whole document and section/schedule fragments) are created at the time of publication rather than generated on demand from the XML (commonly referred to as the "fried" approach). This effectively means that the HTML versions are already cached in the web application file system. Additional caching infrastructure is therefore redundant.
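The distinction between the two approaches can be sketched as follows. This is an illustration of the general "baked" pattern only, not the PAL publication code; the function names, the file layout and the stand-in transform are all assumptions.

```python
import html
from pathlib import Path


def render_html(xml_text):
    """Stand-in for the real XML-to-HTML transform (hypothetical)."""
    return "<html><body><pre>" + html.escape(xml_text) + "</pre></body></html>"


def publish(doc_id, xml_text, web_root):
    """Baked approach: run the transform once, at publication time,
    and store the result in the web application file system."""
    path = Path(web_root) / f"{doc_id}.html"
    path.write_text(render_html(xml_text), encoding="utf-8")
    return path


def serve(doc_id, web_root):
    """Serving a request is a plain file read; no transform runs."""
    return (Path(web_root) / f"{doc_id}.html").read_text(encoding="utf-8")
```

Because `serve` never calls `render_html`, the stored file already plays the role a cache would play in a "fried" architecture, which is why a separate caching layer adds little.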

5. Deployment approach

The above section summarizes the PAL system components and areas of functionality and their readiness for deployment. This section focuses on system-wide issues affecting deployment. These issues include:

5.1 Administration

Unisys has proposed using email to report on the requests made to the PAL server infrastructure. An email message is sent for every transform, publish and print request. Access to this email list will allow Unisys and PCO IT to better diagnose problems and what component of the system is involved.

In order to maintain independence of the PAL system from other IT infrastructure and to avoid issues with the Parliamentary Service email architecture, Unisys is proposing to provide a POP3 server within the PCO network to which email messages will be delivered. Virtually every email client can interact with POP3, so this solution is largely independent of the chosen email infrastructure within the client organizations. Centralized logging and management of transaction events is essential for proper maintenance of such a complex system, and exposing this information to PCO IT staff will provide a corporate memory within the PCO that will help future-proof the PAL infrastructure.

The PAL team (including PCO IT staff) and Unisys maintenance staff will be able to maintain a permanent log of the transactions by connecting to the POP3 server using their preferred email client and maintaining a local copy of the email. They can also use filters to classify the emails for statistics gathering and other uses.
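As an illustration of the kind of classification described above, the following sketch tallies messages by request type. The format of the notification emails was not available, so the assumption that the request type appears as a word in the Subject header is hypothetical, as are the function names.

```python
import re
from collections import Counter
from email import message_from_string

# The three request types named in the text; matching them in the
# Subject header is an assumption about the (unseen) message format.
REQUEST_TYPES = ("transform", "publish", "print")


def classify(raw_message: str) -> str:
    """Return the request type named in the Subject header, or 'other'."""
    subject = message_from_string(raw_message).get("Subject", "").lower()
    words = re.findall(r"[a-z]+", subject)
    for kind in REQUEST_TYPES:
        if kind in words:
            return kind
    return "other"


def tally(raw_messages) -> Counter:
    """Count messages per request type for simple statistics gathering."""
    return Counter(classify(message) for message in raw_messages)
```

The raw messages could be retrieved with any POP3 client library (Python's standard `poplib` module is one option); that step is omitted here because it depends on the server details.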

Steve Thomsen’s technical review recommended augmenting the existing UUI with administration screens to avoid having a separate piece of infrastructure. SAIC believes that a separate POP3 server is sufficiently independent and that having the reporting mechanism available from standard email infrastructure provides other benefits that a custom user interface would not provide. The huge variety of tools available to manage and redirect email (including sorting and classifying, forwarding to selected clients including mobile devices, etc.) provides a great deal of flexibility for reporting of various events to those who have to manage the PAL infrastructure.

PCO has a team of three internal IT staff who provide an extremely responsive level of support. They can currently create user accounts and modify access privileges relatively quickly and simply. SAIC understands that the SLA for ongoing maintenance is currently under negotiation. If PCO IT is to continue to provide this level of service, they should be provided with the facility (access and procedure documentation) to create and modify accounts and access privileges on the PAL servers.

5.2 Packaging of releases

The servers that support the system testing, user acceptance testing (UAT), and production environments are all located at the Unisys data centre. In the event of a Wellington disaster (e.g. terrorist attack or earthquake) compromising the primary PAL sites, a laptop installed with the drafting environment could be physically taken to the server site to extract the relevant documents and the laptop could be used to generate PDFs for printing for any emergency sessions of Parliament. Any connections to the server from the authoring environment could fail over to a mode supporting saving directly to the file system (as is done in many jurisdictions). This provides a modest but reasonable level of disaster recovery without unnecessary expense.

Releases for the client seem to have the server location and ports for the relevant server hard-coded into the release. This means that a release for system testing cannot be the same release passed to UAT and production. Normal practice is to separate the configuration of server locations and ports from the release packaging to allow a single release to proceed untouched from system testing to UAT and then into production. The configuration of server locations and ports is typically contained in a separate text or XML file. This allows the test team to be sure that the exact release that they tested is what is being deployed into production. Steve Thomsen’s review identified this as an issue and SAIC agrees that this practice is undesirable and that Unisys should create a release with separate configuration that allows it to be pointed at the system testing, UAT and production servers independently of the release packaging.
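A minimal sketch of this separation is shown below, assuming a simple INI-style text file. The file layout, the `[server]` section, and the key names are hypothetical and are not taken from the PAL client; the point is that the release itself contains no server details and reads them from a file supplied by each environment.

```python
import configparser


def load_server_config(path: str) -> tuple[str, int]:
    """Read the server host and port from a file kept outside the release."""
    parser = configparser.ConfigParser()
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    section = parser["server"]  # hypothetical section name
    return section["host"], section.getint("port")
```

With this arrangement the same packaged release can be installed in system testing, UAT and production, and only the small configuration file differs between them, for example:

```
[server]
host = uat.example.invalid
port = 8443
```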

The PCO has had a number of pre-UAT releases delivered into the environment. These have involved DTD changes and changes in the DLM repository. Partial and incomplete registration of documents to the DLM has also resulted in stale or incomplete data being present in the repository. The result is that the system testing repositories cannot be trusted to provide the user interfaces with reliable, clean data and some reported errors are almost certainly due to the existence of stale data within the servers. Unisys should ensure that a completely clean repository is delivered to UAT and that any releases delivered through the UAT period that might invalidate server data be coordinated with cleansing of the server repositories to ensure that the tests are a valid and fair assessment of the delivered functionality.

The IRD client is a slightly different configuration that relies on the presence of various shared drives that are not present in the PCO environment. The system testing is yet to be deployed physically in the IRD network. Such a deployment needs to be tested as part of the system testing and UAT to ensure that the PAL environment can be successfully deployed in both PCO and IRD environments.

5.3 Maintenance of stylesheets

Stylesheets are crucial to the PAL system for interactive viewing of the underlying XML documents and for publishing the documents to PDF and HTML formats. The complexity of the XML DTDs (and the underlying documents) and the exacting rendering requirements results in quite complex stylesheets. Concern has been expressed in the past about the ongoing maintainability of the stylesheets. The latest release of the PAL project has attempted to address this by utilizing available tools.

PTC/ArborText provide a Styler tool that delivers a graphical user interface for creating stylesheets that can generate FOSI (an SGML-based US Department of Defense standard for describing the format of structured documents), XSLT transforms to generate XSL-FO (a W3C standard for describing the layout of a physical page), and XSLT and matching CSS to generate HTML for web viewing of the documents. Earlier stages of the project correctly identified that XSL-FO was not sufficiently powerful to represent the entire document formatting requirements of New Zealand legislation. This means that FOSI was required in order to utilize the full formatting capabilities of the PrintComposer and E3 rendering engines.

SAIC notes that the FOSI produced by Styler augments the basic MILSPEC standard by using XPath (a W3C standard for identifying locations within XML documents) to select formatting rules. The XPath implementation in E3 and PrintComposer is only partial and appears not to be optimized.

Styler is used to create both screen and print rendering rules. In order to reduce the size of the resulting stylesheets and to improve the interactive performance of the screen rendering, the stylesheets generated by Styler are split into screen and print rendering versions using a small script (presumably XSLT but SAIC was unable to verify this).

The FOSI generated by Styler contains markup for every value even where the rule takes the default value. This results in extremely large stylesheets. It is quite feasible to programmatically strip out these default values. Given that a script is already being applied to the Styler-generated FOSI, this script could be augmented to strip out these default values without changing the current infrastructure. Removing these default values will almost certainly reduce the time it takes to load the stylesheet (which is almost certainly cached in the E3 server but almost certainly not cached in the PrintComposer environments). The effect on processing performance is less predictable: stripping the defaults could improve it, significantly degrade it (reading an explicit local value may be faster than looking for a local value, failing, and then falling back to the default value), or have no discernible impact. The Unisys team should investigate whether stripping these default values out has a positive, negative or neutral effect on rendering performance in both the screen environment and the print rendering environment. This should be done sooner rather than later so that any unintended changes to print rendering can be detected before deployment (alternatively, a reliable regression testing regime for the print rendering would reduce the risk of deploying such stylesheet simplifications).
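The default-stripping step could look something like the sketch below. It assumes the Styler output can be parsed as well-formed XML and that a table of default values is available; the element names, attribute names and defaults shown are hypothetical placeholders, not the real FOSI vocabulary, and the sketch does not model FOSI inheritance, so whether a given removal is safe would still need to be checked against the rendering results.

```python
from xml.etree import ElementTree as ET

# Hypothetical defaults table: (element name, attribute name) -> default value.
# The real table would have to come from the FOSI specification or from PTC.
DEFAULTS = {
    ("font", "weight"): "medium",
    ("font", "posture"): "upright",
    ("indent", "leftind"): "0pt",
}


def strip_defaults(stylesheet_xml: str, defaults=DEFAULTS) -> tuple[str, int]:
    """Remove attributes that merely restate a default value.

    Returns the rewritten XML and the number of attributes removed.
    """
    root = ET.fromstring(stylesheet_xml)
    removed = 0
    for element in root.iter():
        for name in list(element.attrib):
            if defaults.get((element.tag, name)) == element.attrib[name]:
                del element.attrib[name]
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Returning the count of removed attributes gives a quick measure of how much smaller the stylesheet becomes, which can then be set against measured load and rendering times.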

The HTML generated by the Styler XSLT had conformance issues with the New Zealand government web guidelines (including not producing valid HTML 4.0 and other usability issues) and produced extremely large CSS files (500-600kb). For these reasons, Unisys decided to manually generate XSLT for generating HTML for the web. The manually-generated stylesheets had very few formatting and web guideline issues and these were easily resolved. The CSS was much smaller (25kb if the extra code for Internet Explorer 5 anomalies is removed). They also ran much faster.

SAIC considers the decision to manually maintain the XSLT that creates HTML to be an acceptable compromise. At some point, the burden of fighting with a tool designed for simpler tasks becomes higher than that of manual processes, and it appears that the Unisys team well and truly reached this point with Styler and generating HTML.

5.4 Outstanding defects, change requests and issues

At the time of the review, there were 5 critical defects open with 3 of those resolved and awaiting testing, 1 probably mislabelled as critical (#345803—incorrect display of //extRef), and the remaining issue (#30749—problems printing ITA) looks to be largely resolved (the ITA can and has been printed from the system) with some configuration issues still to be finalized. Of the 114 urgent defects still open, approximately 31 are resolved awaiting testing. Another 4 are categorized as "resolved—as designed". Even if this means there will be some negotiation over the design specifications and possibly moving these to change requests, if they are urgent, the work still needs to be done.

There appears to be some duplication in the lists of reported defects. It is likely that many of the reported defects can be fixed by the same solution. As noted above, a number of outstanding issues are resolved and not tested. Both the PAL team and the Unisys team are currently managing change requests to address only critical changes. Many of these have been implemented but not formally delivered for testing. This means that the graphs and numbers relating to defects and change requests are actually in a considerably better position than a superficial glance might suggest.

There are also a number of raised issues that might be resolved by resolving other issues first. For instance, the issue with the "changeable" font could be completely obviated by supporting automatic updating of cross-reference wording within Bills and draft subordinate legislation.

Both change requests and defects are being categorized for urgency and scheduled for addressing before UAT, during UAT, and after deployment. SAIC recommends that the PAL Steering Committee continue to maintain a report on outstanding issues and change requests and monitor the scheduling and prioritization of these to ensure a timely deployment.

5.5 Schedule slippage

SAIC’s consultant attended the PAL Project Steering Committee meeting on 9 February 2007. At that meeting, it was suggested that the project was nearing a point where a realistic date for deployment could be determined and that, rather than managing schedule slippage, the approach could shift to managing the functionality and issues to be resolved by that date: a "line in the sand" approach. SAIC believes that the most crucial components of the PAL system are ready for this approach, although it has some concerns about the as yet undelivered editorial diary functionality and some of the link management functionality on which it depends.

In the week beginning 5 February 2007, Unisys were confident that this functionality was only a few weeks away from delivery to system testing. Once this functionality has been delivered, SAIC believes that the Steering Committee could make such a decision in an informed manner and choose a deployment date with confidence.

To this end, SAIC recommends that the PAL team continue to prioritize all outstanding issues, whether defects or change requests in order of preference so that the development team steadily works towards fixing these in a defined order so that the most crucial to deployment are more likely to be delivered to User Acceptance Testing and to the first deployment. It may be helpful to identify which development teams would work on which defect category to manage priority lists for all teams to ensure that the available development time is fully utilized.

The advantage of this approach is that it moves away from discussing schedule slippage to considering what is possible to achieve in the available period. This makes planning for both the PAL team and the Unisys team much easier. Ongoing discussions will also then focus on selecting and prioritizing development and deployment priorities. This will help to further isolate arguments about whether a required change is a defect or change request to the level of management that deals with cost and payment rather than unnecessarily involving the user community.

5.6 Change management

While SAIC has been engaged as a technical consultant on this project, the system is a huge change to the operating environment of the PAL users. The system will provide the primary work environment for most PAL users, particularly PCO and IRD drafters, secretarial staff, PPU and the reprints team. This involves huge changes to the working environment of all of these users and attendant change management issues primarily for the PCO and IRD management. While both PCO and IRD are aware of many of the change management issues, SAIC will highlight a few that may not have achieved as much attention.

PCO and IRD expect secretarial staff will play a considerable role in assisting drafters initially to create and edit documents. To this end, many of the secretarial staff have been given advance exposure by involving them in system and user acceptance testing. Others have been encouraged to use the system in the pre-UAT environment without formal testing roles.

SAIC believes that the new PAL system will encourage drafters to seek help from secretarial (and possibly PPU) staff in choosing the correct tags early, rather than the current practice of getting secretarial staff to fix bad styling in the WordPerfect documents later in the process. This cultural change should help reduce the last-minute pressure on secretarial and PPU staff to get the document right before it goes out of the door. However, it is a change in current practice that will require explanation and management to ensure that the transition is smooth.

There may be an initial and ongoing temptation to use the secretarial staff to create any markup that is at all difficult. While the drafters are employed primarily to write words and not as typesetters, structuring the legislative material is also an important part of drafting and should involve substantial input from the legal officer. PCO and IRD will need to monitor carefully once the system is deployed to ensure that drafters are not placing too much additional burden on the secretarial staff and that they are transitioning successfully to drafting in the provided environment.

5.7 Performance issues

5.7.1 Screen rendering performance

The complexity of the stylesheets created for viewing legislative documents is such that screen views in interim deliverables were refreshing extremely (and unacceptably) slowly. Unisys has experimented with a number of setting changes to address this limitation including only refreshing the display of generated text on request (instead of on every key stroke) and by eliminating space before and after rules in the screen view.

SAIC experienced good interactive performance with the current deliverables for common drafting tasks for small to medium length documents (clearly, there were unresolved issues at the time of the review to do with DLM performance which have been addressed separately).

While the setting changes address this problem (and have been documented), some additional changes still need to be made manually in the editing environment. Some of the tools in the authoring environment for managing numbering and print preview can undo some of these settings. Care should be taken by these tools to restore the settings that were in place before the operation (to the extent that that is not contradictory to the operation of course).
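The save-and-restore behaviour recommended here can be sketched as follows. The settings dictionary and the setting names are hypothetical (the actual editor settings are not listed in this review); the sketch only shows the pattern of taking a snapshot before an operation and restoring everything except the settings the operation is meant to change.

```python
from contextlib import contextmanager


@contextmanager
def preserve_settings(settings: dict, intended_changes=()):
    """Restore settings after an operation, except those it is meant to change."""
    snapshot = dict(settings)
    try:
        yield settings
    finally:
        # Drop any settings the operation added as a side effect.
        for key in list(settings):
            if key not in intended_changes and key not in snapshot:
                del settings[key]
        # Put back the values that were in place before the operation.
        for key, value in snapshot.items():
            if key not in intended_changes:
                settings[key] = value
```

A numbering or print preview tool would wrap its work in `with preserve_settings(settings, intended_changes=(...)):` so that the performance-related settings are back in place when it finishes, even if the operation fails part-way through.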

5.7.2 Print rendering performance

See section 4.2 above.

5.7.3 Performance testing strategy

Unisys described to SAIC that they had subcontracted the performance testing of the system including both interactive performance of the authoring environment and performance of the print rendering solution to a specialized contract tester. However, until the legacy data has been deployed in the system (planned for beginning UAT), it is impossible to reproduce a meaningful facsimile of the production environment. Four weeks has been set aside in the schedule for performance testing. SAIC recommends that PCO ensure that a realistic hyphenation exception table be used for these performance tests (see above section 4.2.3.4).

5.8 Graphics format

There is still considerable concern about the choice of graphic formats for dealing with images, formulas, and similar artefacts in the legislation collection.

5.8.1 Background

This is a complex issue that requires a little background. Different image formats are better for different types of image. The two main types are raster and vector image formats. Raster image formats represent the images as a set of dots and, like a newspaper image or digital photograph, if you zoom in far enough, you will see jagged edges (typically referred to as pixelation). Vector images represent the images as a set of shapes. The image is only rendered as dots when displayed on a screen or page, so it can be arbitrarily zoomed without loss of sharpness.

The size of a raster image representation is dependent on the size of the image, the number of colours represented, the resolution of the image (how many dots per inch), and any compression used. Making appropriate selections to produce optimal parameters to maximize the quality and minimize the size of each raster image for each image is difficult and requires some expertise although tools exist to assist. Common raster image formats include TIFF (allows a wide variety of resolutions, colours, and compression approaches), GIF (lossless compression), JPEG (lossy compression better suited to photographs), and BMP (no compression at all). Raster formats are best used to represent photographs or scanned images.

The size of a vector image representation is dependent on the complexity of the shapes in the image and any compression employed. Display size, display resolution, and to a lesser extent, number of colours, are not typically factors. Vector formats are best suited to diagrams, maps, signs, collections of symbols, etc—the type of images more often found in legislation.

Virtually every web browser supports the GIF and JPEG formats. TIFF is only supported in some very common browsers via a plug-in using non-conformant HTML. SVG is not supported natively by the leading browsers Internet Explorer, Mozilla Firefox, Safari, or Netscape. All require a separate plug-in. Two free SVG plug-ins are widely used—Adobe’s SVG Viewer and the Open Source Squiggle from the Apache Batik project. The Adobe SVG Viewer is typically installed with the free Adobe Acrobat Reader (and automatically registered with Internet Explorer and Safari but not Mozilla/Firefox). Given that the PAL web site delivers PDF as one of its output formats (typically viewable only if the Adobe Acrobat Reader or similar plug-in is installed), relying on a plug-in for SVG places the user in no worse position than for PDF (or TIFF).

There are a number of commercial and free tools for converting between formats. Converting from raster formats to vector formats is unreliable and different tools produce wildly different results. Converting between raster formats or rasterizing vector formats is much easier.

5.8.2 PAL and graphic formats

Unisys originally intended to use SVG as the storage format for all images on the current release of PAL. This is consistent with the nature of graphics in legislation (most typically suited to a vector format) and SVG is part of the XML standards family. Visio was the proposed authoring and transformation tool (to convert other formats to SVG). However, SVG images created by Visio with arrow heads (SVG version 1.1) are not correctly displayed in Epic or rendered by E3 (presumably implementing only 1.0) even though they are visible in the Adobe SVG Viewer.

Unisys proposed TIFF as a replacement format, proposing 600 dpi resolution to ensure high quality print results. The problem with this is that TIFF is a raster format so zooming will eventually lead to pixelation. It is also not really well supported by any of the widely-used browsers.

It appears that E3 converts SVG into TIFF for rendering anyway, potentially creating huge PDF files. However, ArborText rendering products produce PostScript as a by-product of producing a PDF, so are extremely unlikely to rasterize an EPS as part of the rendering process.

5.8.3 Storage issues

To put the discussion of image formats in context, the Income Tax Bill 2006 contains approximately 400 formulas (many of which would be represented as graphics). The Customs and Excise Act 1996 contains a number of tables that are drafted outside of PCO (using InDesign) and are currently provided in PDF format only. Representing these tables as images would involve representing more than 100 separate pages (full page-sized images) and require on average about half of the pages in a particular table to be updated if any changes were made to a row in that table. If care is not taken about colour (assume 24-bit) and compression (assume none), a single A4-sized image could be as large as 28MB. In the Customs and Excise Act 1996 with 100 full page images that means about 2.7GB! Excise legislation is amended reasonably often but, regardless of the size of the amendment, each commencement date would create the need for an entire new PDF to be stored along with all of the images that are used to construct it. Since the performance of the web site, the CMS, and the interactive performance of the network will all be affected by document size, it is clear that the image issue needs to be well-managed.

Note that the numbers above are a worst-case scenario. The PAL system can achieve much better document sizes by selecting colour only when needed (only rarely required and not for customs or tax legislation) and by using appropriate compression if a raster format (TIFF or GIF) is used.
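The way such worst-case figures arise, and how much colour depth and resolution matter, can be seen from a short calculation. The sketch below is illustrative only: the review does not state the resolution behind its figures, so the 300 dpi value used in the usage note is an assumption, and the function names are invented for the example.

```python
# A4 is 210 mm x 297 mm, which is roughly 8.27 x 11.69 inches.
A4_INCHES = (8.27, 11.69)


def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Size of an uncompressed raster image: pixel count times colour depth."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def megabytes(byte_count: int) -> float:
    """Convert bytes to decimal megabytes."""
    return byte_count / 1_000_000
```

For example, `megabytes(uncompressed_bytes(*A4_INCHES, dpi=300, bits_per_pixel=24))` gives roughly 26 MB per page, the same order of magnitude as the worst case quoted above, and 100 such pages come to roughly 2.6 GB. Dropping to 1-bit (black and white) at the same resolution brings a page down to about 1 MB before any compression is applied, which is why selecting colour only when needed makes such a difference.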

5.8.4 Recommendation on graphic formats

The lack of support for SVG arrowheads in E3 and Epic seems to be a bug or at least a limitation of an earlier version of SVG (that has been remedied in a number of open source products recently). The most desirable solution is to continue to use SVG (or preferably compressed SVG—.svgz) and address the limitations of the SVG rendering in Epic and E3. Unisys and PCO should consult further with PTC/ArborText on the SVG rendering issues described above (primarily arrowheads and rasterizing SVG images). In the worst case, the rendering process could convert SVG to EPS using third party tools.

For web viewing, PCO should consider generating a 72 dpi GIF: many browsers do not support SVG or fully support PNG and TIFF, and JPEG is simply not appropriate for the types of images appearing in legislation. (The alternative is to include instructions on how to download and install SVG Viewer or a similar plug-in if the browser cannot display the image.) The GIF could be generated automatically as part of the "publish-to-web" process unless manual intervention is required to size the images. The website could allow users to click on the GIF image to download the higher fidelity image underneath (whether TIFF, SVG or some other format).

Regardless, the source image formats (preferably vector) and any generated image formats should be stored and managed in the CMS to ensure that documents can be reproduced as initially rendered as required.

6. Summary of recommendations

The PAL system is tantalizingly close to being ready to deploy in production. The most critical components for deployment, the authoring environment for creating documents and the publication mechanisms including the print rendering subsystem for creating paper documents for delivery to the legislative process and the website infrastructure are almost deployable as is, and with very minor refinements, will provide a productive environment that will better support the democratic process and rule of law in New Zealand.

Other system components are at varying stages of readiness but are not as crucial to successful deployment. As stated above, it is quite normal for a system of this complexity to be deployed with known outstanding issues providing that viable workarounds are in place, the integrity of the data is not compromised particularly in a way that requires considerable additional effort in data conversion, and that long-term, sustainable capabilities are pending.

6.1 Required functionality

SAIC considers the editorial diary together with the other link management issues to be the largest risk area in the PAL project. Maintaining a reliable public web site of legislation is a major outcome of the PAL project and the inability to provide timely consolidations of current legislation will severely degrade the utility of any other delivered functionality. The current website contains hypertext links on cross-references within the legislation collection and any PAL web site must provide an improvement on the functionality of the existing web site given the public’s investment in this project.

Another obvious risk is the graphics format. Graphics can be used as a last-ditch solution where the markup cannot adequately represent the intended meaning, but a workable graphics format must be in place for this fallback to be available.

It is extremely late in the project cycle for decisions to be made about graphic formats. This decision will affect how the images are collected from the instructing agencies and will result in data integrity issues long term. Whatever format is selected, PCO should always ensure that they gather the images in a vector format where possible (the native format in which they were created or PDF unless they were scanned). While a high resolution raster image format may be the lowest risk solution for reproducing print versions, it creates risks for data storage requirements and performance and should only be used as a last resort. It is also likely to result in very large PDFs of documents that contain many images such as the Income Tax Bill 2006 and the Customs and Excise Act 1996.

Unlike many data issues, images can be programmatically migrated from one format to another. While it is possible to create vector images from raster formats, the results vary considerably. It is much more reliable to create raster images from vector images. Whatever decision is made, vector formats should be stored where available.

6.2 Data integrity

While there are a number of issues that look relatively trivial and appear to be easily dealt with after deployment, a number of these issues impinge on the integrity of the underlying XML data that is created. They include issues to do with minor DTD changes or the management of non-ASCII characters. Changing these after deployment could create considerable extra work cleansing any data created between deployment of the PAL system and deployment of the data solutions.

In particular, Unisys should provide the PAL team with documentation on how to manage hyphenation issues in such a way to ensure correct management and delivery to the web as well as print rendering to avoid any down-stream issues.

SAIC recommends that Unisys investigate what is necessary to remove the display of attribute values in the Bar-2 revision tracking markup. If a DTD-fix is the most effective solution, SAIC recommends implementing this before deployment.

While it is possible that the character mapping issues have been corrected, because this affects the integrity of the data, it is important that these issues be fixed before the system is deployed; otherwise the PAL users will need to track any occurrences of these characters and manually check or re-enter these characters when any fix is delivered. It is much more practical and effective to fix these before deployment so no production data is ever affected.

Data integrity is also the most important factor in the editorial diary. While it is possible to deploy the editorial diary while awaiting user interface refinements, it is not sensible to deploy the editorial diary unless the underlying data management capabilities are stable and reliable. Providing that the reprints team can be confident that any work that they do between deployment and refinement of the editorial diary is correctly captured and managed, even if the interface is not ideal, these issues can be managed.

6.3 Usability

There is a whole spectrum of usability issues surrounding the PAL project ranging from trivial "nice-to-have" improvements requested by one or two users to serious issues that could contribute to data loss and loss of productivity.

Simply classifying an issue as "usability" is not sufficient to make a decision about implementation priorities. SAIC considers that usability issues (whether change requests or defects) should be prioritized together with other outstanding issues, defects and change requests to provide a complete, integrated implementation priority list. Any partitioning of this list should be along the lines of the development personnel responsible for creating the fix and the schedule priority rather than any other categorization.

6.4 Release packaging

As mentioned above, there are some shortcomings in the way releases have been packaged by Unisys. SAIC considers that a release should be delivered to system testing, user acceptance testing, and finally production unchanged. Any new release should go through each of these processes. By embedding configuration parameters specific to the system and user acceptance testing servers in the release, Unisys is preventing sound release and deployment practice.

SAIC endorses the recommendation made by Steve Thomsen that the configuration parameters be separated from the rest of the releases so that a single (client) release can be deployed to any of system testing, UAT, or production. Ideally, the configuration of the servers will be transparent in this manner as well.

6.5 Schedule

SAIC endorses the suggestion made by the PAL Project Steering Committee that a realistic date for deployment be determined shortly—a line in the sand. As discussed above, this requires essential functionality—the link management subsystem and the editorial diary—to be demonstrable if not deployable. In the meantime, the PAL team should prioritize all outstanding issues, whether defects or change requests in order of preference so that the development team steadily works towards fixing these in a defined order so that the most important are more likely to be delivered to User Acceptance Testing and to the first deployment. This prioritization should pay attention to the priorities outlined above. This list should also record where possible dependencies and relationships exist between issues and proposed fixes.

Once a realistic and reliable deployment date has been selected, discussions about deployment will then switch from considering when deployment will occur to determining what functionality will make it into the initial deployment.

6.6 Conclusion

SAIC is extremely encouraged by the progress made on the PAL system since the last review. There is a buzz of eager expectation within both the user community and the development team that the system is very nearly ready for production deployment. Key components including the document authoring environment, the print rendering subsystem, and the website infrastructure, while always capable of improvement, could be deployed without undue risk now. Provided that Unisys can deliver link management and editorial diary functionality to system testing in the near future, and the most severe limitations of the UUI can be addressed, the remaining functionality should be deployable, and a complete, integrated management system for the entire legislative document lifecycle can be deployed with confidence.

 

Footnotes

1 This refers to the main body material such as sections, subsections, definitions, paragraphs, subparagraphs, etc., as well as container elements such as Parts and Subparts and simpler, more common forms of Schedules. Drafters and secretaries know the basic structure of legislation and really only need to be taught how to enter that structure to begin drafting effectively.
2 Note that MS Word is described as a WYSIWYG editor but still provides a print preview capability. "What You See Is What You Get" was always an approximation rather than an actuality.
3 Delivering this functionality might also diminish issues with the "changeable" markup and font choice (see section 4.2.3.3 below).
4 Defect No 341052.
5 Note that regression testing is not just about testing the system against a set of pre-agreed example documents. It also involves adding documents that exhibit defects to the user test suite, so that subsequent changes to the code (in this case, the stylesheets) can be checked, before the code is delivered to system testing, to confirm that they do not reintroduce bugs that were previously fixed.
6 It is always possible, although undesirable, to display only a single level of change tracking, which should remove the untested corner cases.