RADIX is focused on the asset integrity management (AIM) data mining problem and uses a combination of subject matter expertise (SME), records classification and attribution technologies, custom search tools, database management, and data synchronization robots. We target the records and legacy systems and prepare a cost- and risk-stratified work plan in alignment with client objectives and budgets. Our goal is to produce the engineering data sets needed for pipeline integrity management and facility pressure vessels at a fraction of the cost typically incurred using standard methods.
Our goal is to improve engineering efficiency by using our advanced methods to establish the data baseline that feeds a number of enterprise applications. Client project data is retained on our servers, and the desired data is populated into a custom SQL database that can feed other software, including PODS and ECM's RBI and PMS applications.
RadixData’s clients are operators of liquids and gas pipelines, gas plants, refineries and chemical plants, as well as the engineering services companies and enterprise software companies involved in the projects. RadixData supports projects such as:
RadixData’s job is to rationalize the data to the objectives of the engagement and deliver a clean data set which can feed other applications.
Define Records and Data in Scope
Asset Integrity Management requires data from multiple sources, some of which are vintage paper records and antiquated or proprietary systems. RadixData has a process for identifying those sources, analyzing the content, and developing a process for mining data from those sources. The process is complete when all data is gathered and a complete data set exists to import into an Asset Integrity Management System. Our clients leverage our expertise in addressing the problems of gathering vintage records and data to fill the gaps left by missing data. The process of defining the records and data in scope and gathering the data is iterative and continues until all gaps in the data are filled.
Data Investigation and Collection
RadixData will organize interviews with key stakeholders in the operator’s organization to fully understand record descriptions, search criteria, and the physical and virtual locations of data and records of interest. The interviews yield information held by long-time employees and client subject matter experts, providing a history of the types, locations, and nomenclature associated with legacy records. This input feeds the collection process, where collection teams are fielded to defensibly query and gather both physical and electronic records. Collected records and data are processed in our facilities to populate the project database.
Content Analytics
RadixData performs “Records Viability Analytics” on current metadata to assess the availability of critical record types. Although generally sparse and incomplete, the metadata hits point to the “low hanging fruit” records which appear to be readily available. This metadata typically includes digital and physical file inventory descriptions and searchable index fields in content management systems. Performing analytics on the metadata creates a profile of the available records, which helps the project team make decisions about intelligent processing.
RadixData What’s in the Box (WIB) is complementary to content management systems. WIB can create a detailed inventory that expands the current content management system inventory to broaden the scope of records for processing. This can be done at a low cost with high returns for AIM projects.
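As a rough illustration of how a profile of available records might be built from inventory metadata, the sketch below counts keyword hits per record type. The keyword lists, record type names, and the `profile_metadata` function are hypothetical and are not RadixData's actual analytics; naive substring matching like this would need tuning against real inventory descriptions.

```python
from collections import Counter

# Hypothetical keywords per record type; a real project would derive these
# from stakeholder interviews and the client's own nomenclature.
KEYWORDS = {
    "U-1A": ("u-1a", "u1a", "manufacturer's data report"),
    "Mill test report": ("mtr", "mill test"),
}


def profile_metadata(descriptions):
    """Count how many inventory descriptions mention each record type."""
    counts = Counter()
    for description in descriptions:
        text = description.lower()
        for record_type, keywords in KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                counts[record_type] += 1
    return counts


if __name__ == "__main__":
    sample = ["Box 12: U-1A forms 1978", "MTR binder", "Misc invoices"]
    for record_type, hits in profile_metadata(sample).items():
        print(record_type, hits)
```

Record types with many hits would be candidates for early processing; types with no hits flag where deeper inventory work may be needed.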
Data Mining Objectives
The data mining objectives associated with preparing a highly curated engineering data set for an AIM program typically consist of the following components:
Our observations of the difficulties associated with current data gathering processes include the following:
Collaboration to develop the business rules begins at the onset of the engagement and continues throughout the process. Initial business rule definitions are based on the known available data and determine the primary record source for each data point. As data gaps are realized, secondary and tertiary records are identified to fill the gaps. Business Rules allow for the standardization of attributes by defining the unit of measure, the numerical format, the degree of accuracy (decimals), and, if the value is calculated, how it is derived. Business Rules are the foundation for creating complete and defensible data. Business Rules clearly define the process for collecting, validating and standardizing data, making the process reliable. The process is repeated for each attribute and is therefore traceable.
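To make the idea of a Business Rule concrete, here is a minimal sketch of one rule that fixes an attribute's unit of measure, decimal precision, and source priority, and then standardizes a raw value. The `BusinessRule` class, the `standardize` function, the attribute name, and the record types are illustrative assumptions, not RadixData's actual rule format.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

# Units expressed as "how many of this unit make one inch".
UNITS_PER_INCH = {"in": Decimal("1"), "mm": Decimal("25.4")}


@dataclass(frozen=True)
class BusinessRule:
    attribute: str   # data point the rule governs
    unit: str        # standardized unit of measure (inches in this sketch)
    decimals: int    # degree of accuracy
    sources: tuple   # record types in priority order (primary first)


def standardize(rule, raw_value, raw_unit):
    """Convert a raw value to inches and round it to the rule's precision."""
    value = Decimal(raw_value) / UNITS_PER_INCH[raw_unit]
    quantum = Decimal(1).scaleb(-rule.decimals)  # e.g. 0.001 for 3 decimals
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


# Hypothetical rule for illustration only.
WALL_THICKNESS = BusinessRule(
    attribute="nominal_wall_thickness",
    unit="in",
    decimals=3,
    sources=("U-1A", "Mill test report", "Inspection report"),
)

if __name__ == "__main__":
    print(standardize(WALL_THICKNESS, "9.525", "mm"))
    print(standardize(WALL_THICKNESS, "0.5", "in"))
```

Using `Decimal` rather than floating point keeps the stored value exactly at the precision the rule specifies, which supports the traceability the rules are meant to provide.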
Source data from shared drives containing PDFs, Excel spreadsheets, databases and/or digital images is processed through our Data Inventory and Transformation (DIT) process, in which hash values are generated for chain of custody, zip file content is extracted and flattened so that each file can be inventoried, blacklisted file types are removed, and de-duplication is performed. The inventory allows for tracking the files and their pages throughout the entire processing cycle. RadixData provides a DIT report listing the inventory database containing the batch information. The DIT inventory is performed for each drive or collection of data received and can be reported on as a whole for the entire project and by each source.
RadixData performs discrete records classification after the DIT process, where we assign each document to a classification based on title and/or the structure of the content on the document. Where a title is available and can be matched, the document is identified as belonging to a particular classification. While your organization may contain many unstructured sources, it’s unlikely that all the information is valuable to your organization. By using our services and the power of computer processing, the attribute extraction platform can sort through and find only the information you need. Think of it as an automatic filter for any unstructured data that you are managing.
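The inventory steps described above (hashing, zip flattening, blacklist removal, de-duplication) can be sketched roughly as follows. This is an assumption-laden illustration: the choice of SHA-256, the blacklist contents, and the `flatten` and `build_inventory` functions are hypothetical and do not describe the actual DIT implementation.

```python
import hashlib
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical blacklist of file types that carry no records value.
BLACKLIST = {".exe", ".dll", ".tmp"}


def flatten(name, data):
    """Yield (path, bytes) for a file, expanding zip archives recursively."""
    if name.lower().endswith(".zip") and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.infolist():
                if not member.is_dir():
                    inner = f"{name}/{member.filename}"
                    yield from flatten(inner, archive.read(member))
    else:
        yield name, data


def build_inventory(files):
    """Build inventory rows from a mapping of file name to file bytes."""
    inventory, seen = [], set()
    for name, data in files.items():
        for path, content in flatten(name, data):
            if PurePosixPath(path).suffix.lower() in BLACKLIST:
                continue  # drop blacklisted file types
            digest = hashlib.sha256(content).hexdigest()
            inventory.append(
                {"path": path, "sha256": digest, "duplicate": digest in seen}
            )
            seen.add(digest)
    return inventory
```

Duplicates are flagged rather than discarded in this sketch so that every received file still appears in the inventory for chain-of-custody reporting.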
For example, a U-1A is a form managed by Regulators and has a clearly defined title: FORM U-1A MANUFACTURER'S DATA REPORT FOR PRESSURE VESSELS. In some cases the title is cut off or not legible, but the document can still be classified based on its content and how that content is structured. For example, the following U-1A contains information in a specific order with a label for each item. When compared to the FORM U-2A MANUFACTURER’S PARTIAL DATA REPORT (ALTERNATE FORM), most of the same information is provided, but the order and structure of the information and labels differ, allowing each document to be classified as its respective type.
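A simplified sketch of this two-step classification, title match first and label structure as a fallback, might look like the following. The label sequences are illustrative placeholders rather than field labels copied from the actual forms, and `classify` is a hypothetical function, not the production classifier.

```python
import re

# Title patterns; tolerant of a missing hyphen from OCR.
TITLE_PATTERNS = {
    "U-1A": re.compile(r"FORM\s+U-?1A\b", re.IGNORECASE),
    "U-2A": re.compile(r"FORM\s+U-?2A\b", re.IGNORECASE),
}

# Placeholder label sequences (assumed for illustration only).
LABEL_ORDER = {
    "U-1A": ["Manufactured and certified by", "Manufactured for",
             "Location of installation"],
    "U-2A": ["Manufactured and certified by", "Manufactured for", "Type"],
}


def labels_in_order(text, labels):
    """Return True if every label occurs in the text in the given order."""
    lowered, position = text.lower(), 0
    for label in labels:
        found = lowered.find(label.lower(), position)
        if found < 0:
            return False
        position = found + len(label)
    return True


def classify(text):
    """Return (classification, basis); classification is None if unresolved."""
    for doc_type, pattern in TITLE_PATTERNS.items():
        if pattern.search(text):
            return doc_type, "title"
    matches = [doc_type for doc_type, labels in LABEL_ORDER.items()
               if labels_in_order(text, labels)]
    if len(matches) == 1:
        return matches[0], "structure"
    return None, "unclassified"
```

Documents that match no title and no single label structure are returned as unclassified so they can be routed to human review instead of being guessed at.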
RadixData performs a Gap Analysis to determine the presence or absence of key documents. This is a key step in the process. Often, we find that organizations feel their data is complete when in fact they are missing data. This can occur when records are labeled incorrectly or misfiled. The gap analysis occurs in tandem with the data scraping and quality check processes and is repeated until all desired records and data points are present. In some instances, the primary record may not be present, but the gap analysis is satisfied according to the business rules by secondary and tertiary documents. The process is iterative, and our client determines when it is complete. Upon completion of the project a final gap analysis is performed.
RadixData’s intelligent algorithm is made up of rules written with the experience of processing millions of documents with incomplete, old, and in some instances unstructured data, similar to what you struggle with today. The platform can easily be enhanced, via a customer-specific layer, taking advantage of all the existing knowledge and configuring new rules specific to your organization's needs. A powerful feature is the ability to pre-validate information found in your data. Our platform can match extracted values against existing information within your database. The system can also pre-validate against standard logic or criteria defined by your business rules. Our automated data extraction software, coupled with human verification, helps your organization achieve the highest level of accuracy.
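The way secondary and tertiary documents can satisfy a gap may be easier to see in a small sketch. The function below, a hypothetical `gap_analysis`, takes the source priority from the business rules and the set of record types actually found, and reports which tier satisfies each attribute. The attribute names and record types are assumed examples.

```python
TIERS = ("primary", "secondary", "tertiary")


def gap_analysis(rules, available):
    """Map each attribute to (tier, record type) or None if it is a gap.

    rules: attribute -> record types in priority order (primary first).
    available: set of record types present for the asset.
    """
    report = {}
    for attribute, sources in rules.items():
        report[attribute] = next(
            ((tier, source) for tier, source in zip(TIERS, sources)
             if source in available),
            None,
        )
    return report


if __name__ == "__main__":
    rules = {
        "MAWP": ["U-1A", "Nameplate photo", "Inspection report"],
        "Wall thickness": ["U-1A", "Mill test report"],
    }
    for attribute, result in gap_analysis(rules, {"Nameplate photo"}).items():
        print(attribute, result)
```

Attributes that map to `None` are the remaining gaps, so rerunning the function as new records arrive reflects the iterative nature of the process.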
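As a hedged illustration of pre-validation, the sketch below checks an extracted numeric value against a range criterion and against a value already held in the client database. The `prevalidate` function, its flag names, and the example limits are assumptions made for illustration and are not the platform's API.

```python
def prevalidate(extracted, existing=None, valid_range=None, tolerance=0.0):
    """Return a list of flags for an extracted value; empty means it passed.

    existing: value already in the client database, if any.
    valid_range: (low, high) criterion taken from the business rules, if any.
    tolerance: allowed absolute difference from the existing value.
    """
    flags = []
    if valid_range is not None:
        low, high = valid_range
        if not low <= extracted <= high:
            flags.append("out_of_range")
    if existing is not None and abs(extracted - existing) > tolerance:
        flags.append("mismatch_with_database")
    return flags


if __name__ == "__main__":
    print(prevalidate(150.0, existing=150.0, valid_range=(0, 3000)))
    print(prevalidate(5000.0, existing=150.0, valid_range=(0, 3000)))
```

Values that come back with flags would be the natural candidates for the human verification step.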
Import Data into AIM System
RadixData is technology agnostic when delivering the results of an Asset Integrity Project. All information is formatted to meet the specifications of the AIM System into which the data is imported. The Business Rules also define what data is delivered to the AIM System and how it is delivered. Since Business Rules define the standardization of data and how it is extracted, calculated, formatted and delivered, the results are traceable, verifiable and reliable.
The following are examples of standardized data definitions: