Working it out in Hazard Logs

Railway safety management is a complex subject that involves a significant amount of manual intervention in the assessment, analysis and control of risk. Supporting documentation is usually worked on by multiple parties with different system viewpoints and writing styles, so maintaining high-quality safety documentation is a real challenge for the industry. Hazard logs, for example, play a central role in both system engineering and risk assessment activity.

The role of the log is to record the risks related to the system under consideration. Its content relies on input from a variety of sources and on collaborative activities involving teams with varying expertise and knowledge. In our experience, the quality of this information can vary greatly both within and between projects. This is particularly so for larger projects, where problems can arise as the amount of textual data to be processed increases. The volume and variety of the data, together with the need for collaboration, create a significant challenge: managing the content while maintaining its readability, format and consistency.

We are currently working on a tool that automatically assesses the ‘quality’ of a hazard log. The intention is that the tool can be used to monitor the quality of a hazard log in ‘real’ time, or at least at regular intervals during a project, or to check the output from critical risk workshop sessions. The tool uses Natural Language Processing and machine learning to assess the quality of a hazard log based solely on its textual content. The method combines text classification with term frequency-inverse document frequency (TF-IDF) weighting, which identifies important keywords in the different textual elements of the log; these keywords serve as quality indicators.
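To make the keyword-weighting idea concrete, the following is a minimal sketch of term frequency weighting in its common TF-IDF form (term frequency-inverse document frequency), written in plain Python. It is not the tool's implementation: the function names `tokenize`, `tf_idf` and `top_terms`, the tokenisation rule and the three sample entries are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(entries):
    """Return one {term: weight} dict per entry using plain TF-IDF."""
    tokenized = [tokenize(entry) for entry in entries]
    n = len(tokenized)
    # Document frequency: the number of entries that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty entries
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def top_terms(entry_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(entry_weights, key=lambda t: (-entry_weights[t], t))[:k]


# Hypothetical hazard log entries, invented for this example.
entries = [
    "Train passes signal at danger due to driver error",
    "Train derails due to broken rail",
    "Passenger falls from platform due to overcrowding",
]
scores = tf_idf(entries)
```

Words that occur in every entry, such as "due" and "to", receive a weight of zero, while terms specific to one entry, such as "signal" or "derails", score highest. A real system would more likely rely on an established library and feed the weights into a trained classifier.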

The intention is not to replace a human expert, but rather to support assessments by providing an early indication of the quality of the textual data in the log. This involves checking for signs of imprecise and unclear writing and identifying issues that may make it hard for readers to fully interpret accident sequences. The tool has been built around the CENELEC standards to aid compliance with both the standards and risk management best practice in general.
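As a rough illustration of what checking for imprecise writing could look like, the sketch below applies a few simple rules to a single entry. It is deliberately naive and is not the tool's method, which uses trained classifiers: the word lists, the thresholds and the function name `flag_entry` are hypothetical choices made only for this example.

```python
import re

# Hypothetical word lists, chosen only for illustration.
VAGUE_TERMS = {"may", "might", "possibly", "some", "various", "etc",
               "appropriate", "adequate"}
CONJUNCTIONS = {"and", "or"}


def flag_entry(text, max_words=30):
    """Return a list of human-readable warnings for one hazard log entry."""
    words = re.findall(r"[a-z]+", text.lower())
    flags = []
    # Rule 1: vague or hedging vocabulary.
    vague = sorted(set(words) & VAGUE_TERMS)
    if vague:
        flags.append("vague terms: " + ", ".join(vague))
    # Rule 2: several conjunctions may mean several hazards in one entry.
    if sum(w in CONJUNCTIONS for w in words) >= 2:
        flags.append("possible multi-content entry")
    # Rule 3: very long entries are harder to interpret.
    if len(words) > max_words:
        flags.append("long entry")
    return flags
```

For example, `flag_entry("Door may fail and train might move or brakes fail etc")` reports the vague terms "etc", "may" and "might" and marks the entry as possibly multi-content, whereas `flag_entry("Train derails due to broken rail")` returns an empty list. Flags of this kind are prompts for a human reviewer, not verdicts.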

A preliminary study in collaboration with Lancaster University has been undertaken to prove the method. Results from this study have demonstrated the power of using textual analysis in this arena. We have identified a number of hazard log quality indicators and developed demonstrator software which performed well against a manual evaluation of a sample data set. In general, the tool can save users time and effort when reviewing entries in the log. It can also help clarify thinking around accident sequences by highlighting ambiguous or multi-content entries.