Looking Behind Big Data in Manufacturing – 8 steps to ML readiness

August 11, 2020

Collecting data from the manufacturing process is the key to the future success of smart manufacturing. However, there are some rules of thumb to keep in mind.

Machine learning (ML) technologies are used for a wide range of applications, from image recognition to factory automation. No matter the application, the common feature of ML software is the reliance on big data.

As ML depends on data, the benefits quickly disappear when you supply inaccurate or incomplete data. In a survey of executives across various industries, 75% of respondents were not confident in the quality of their data.  

Unlocking the full potential of artificial intelligence (AI) and ML requires a marriage of process and communication. Here are eight essential rules to remember for improving your data collection methods.

1. Ensure Your Data Is Clean

Duplicate entries and spelling errors can impact the accuracy of your data. Your datasets may also contain corrupted records, incorrect formatting, or incomplete data. The quality of the data that you use directly influences the quality of the analysis provided by ML technologies.

For example, when using ML solutions to monitor quality control tools and processes, you may obtain data related to corrective action requests (CARs), overall equipment effectiveness (OEE), and return material authorisations (RMAs). If this information is improperly formatted or incomplete, the ML software may not offer insight for reducing defects.

Data cleaning provides a way to scrub your datasets of corrupt or inaccurate records. This process often involves an audit of your data to detect anomalies and contradictions. The main criteria for evaluating the quality of data are:

1. Validity

2. Accuracy

3. Consistency

4. Uniformity

5. Completeness

Validity means that the data conforms to the constraints of your production technology. For example, certain machinery may only report a fixed set of values or values within a specific range, so any reading outside those limits is invalid.

Accuracy, consistency, and uniformity are often measured by comparing datasets against existing data. Detecting outliers and contradictions helps uncover potentially inaccurate or inconsistent data.

The completeness of the data depends on access to a wide range of inputs. This is one area that data cleaning may not address.  
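
As a rough illustration, the audit described above could be sketched in Python as follows. The field names, value ranges, and sample records are assumptions made for this example, not values from any real production line. The sketch only covers completeness, validity, and duplicate checks.

```python
from collections import Counter

# Hypothetical constraints: allowed range per numeric field (assumed values).
VALID_RANGES = {"temperature_c": (0.0, 250.0), "pressure_bar": (0.0, 16.0)}
# Hypothetical list of fields every record is expected to contain.
REQUIRED_FIELDS = ("unit_id", "temperature_c", "pressure_bar")


def audit_records(records):
    """Return lists of incomplete, invalid, and duplicate records by index."""
    issues = {"incomplete": [], "invalid": [], "duplicate": []}
    seen = Counter()
    for index, record in enumerate(records):
        # Completeness: every required field must be present and not None.
        missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
        if missing:
            issues["incomplete"].append((index, missing))
        # Validity: numeric values must fall inside the allowed range.
        for field, (low, high) in VALID_RANGES.items():
            value = record.get(field)
            if value is not None and not (low <= value <= high):
                issues["invalid"].append((index, field, value))
        # Duplicates: identical record content seen more than once.
        key = tuple(sorted(record.items()))
        seen[key] += 1
        if seen[key] > 1:
            issues["duplicate"].append(index)
    return issues


sample = [
    {"unit_id": "U-001", "temperature_c": 180.5, "pressure_bar": 5.2},
    {"unit_id": "U-002", "temperature_c": 999.0, "pressure_bar": 5.1},
    {"unit_id": "U-003", "temperature_c": None, "pressure_bar": 4.9},
    {"unit_id": "U-001", "temperature_c": 180.5, "pressure_bar": 5.2},
]
report = audit_records(sample)
```

An audit like this only flags suspect records. Deciding whether to correct, drop, or re-collect them remains a process decision.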

2. Use Clear Naming Conventions

Some inputs require manual data entry, which creates the risk of typing errors. Using a clear naming convention ensures that anyone responsible for manual data entry knows what to enter.  

Adopting naming conventions also improves the quality of the data by ensuring that data remains consistent across multiple systems. For example, engineers from multiple locations may enter data that is pooled in a central database and analysed by ML software.  

A naming convention makes it easier for engineers to use the same format. Components of a naming convention may include:

1. Asset type – boiler, motor, HVAC unit

2. Characteristics – make, model, manufacturer

3. Location – site, building, floor, factory line number

4. Serial numbers or VINs

You may also assign value ranges and formats to maintain the consistency of your data. Unfortunately, even with naming conventions in place, manual data entry may still result in discrepancies.

One solution is the creation of a data entry interface. Giving users a simple interface for entering common values or selecting from preset entries limits the room for errors.  
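
To make this concrete, here is a minimal sketch of how a naming convention could be enforced in code behind such an interface. The tag format (asset type, three-letter site code, line number, serial number) and the list of asset types are hypothetical choices for this example. Your own convention will differ.

```python
import re

# Hypothetical convention: <ASSET>-<SITE>-L<two-digit line>-<four-digit serial>
ASSET_TYPES = {"BOILER", "MOTOR", "HVAC"}
TAG_PATTERN = re.compile(
    r"^(?P<asset>[A-Z]+)-(?P<site>[A-Z]{3})-L(?P<line>\d{2})-(?P<serial>\d{4})$"
)


def build_tag(asset, site, line, serial):
    """Build a tag from its parts, normalising case and padding numbers."""
    asset = asset.strip().upper()
    if asset not in ASSET_TYPES:
        raise ValueError(f"unknown asset type: {asset!r}")
    tag = f"{asset}-{site.strip().upper()}-L{line:02d}-{serial:04d}"
    if TAG_PATTERN.match(tag) is None:
        raise ValueError(f"parts do not fit the convention: {tag!r}")
    return tag


def parse_tag(tag):
    """Return the tag's components as a dict, or None if it breaks the convention."""
    match = TAG_PATTERN.match(tag.strip().upper())
    if match is None or match["asset"] not in ASSET_TYPES:
        return None
    return match.groupdict()
```

For example, `build_tag(" motor ", "hel", 3, 42)` produces `MOTOR-HEL-L03-0042`, while `parse_tag("Motor 3, Helsinki")` returns `None` because free-text entries do not follow the convention.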

3. Digitise Manual Data and Inputs

Inputs of the manufacturing process that are not digitally recorded limit your datasets. As mentioned, the completeness of the data impacts its overall quality. An incomplete set of data reduces the ability of ML software to detect patterns.

For example, an engineer may make manual notes on the factory floor. Without these notes, your data does not provide a complete picture.  

As mentioned in the previous suggestion, developing a data entry interface may help address this problem. Making it easier for engineers to enter manual notes allows for a more complete record.

Many manufacturers also rely on legacy equipment. AI technologies cannot automatically collect data from disconnected equipment. This requires additional manual data entry.  

Manual data needs to be captured to properly track your production technology. Legacy equipment may be upgraded or adapted to allow communication with AI solutions.  

4. Be Aware of Changes in the Manufacturing Setup

The insight provided by data science in manufacturing helps manufacturers improve their processes. However, changes to the manufacturing setup made during or after data collection can severely limit the usability of the data.

For example, suppose you plan to implement an ML solution that predicts end-product quality based on parameters from the materials and machinery used during production. If you upgrade the production line with new machinery, multiple years' worth of big data may become unusable.

5. Emphasise Data Connectivity

When combining several sources of data, connectivity becomes key. A useful rule is to ensure that all data is linked through a unique ID. For example, the RFID code of a single production unit moving through the line could bring together equipment, environmental, and quality data from each manufacturing step.
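
A minimal sketch of this idea, assuming each source delivers rows as dictionaries that share a `unit_id` field (a hypothetical name standing in for the RFID code):

```python
from collections import defaultdict


def combine_by_unit(*sources):
    """Merge rows from several named sources into one record per unit ID."""
    combined = defaultdict(dict)
    for source_name, rows in sources:
        for row in rows:
            unit_id = row["unit_id"]
            for field, value in row.items():
                if field != "unit_id":
                    # Prefix each field with its source to avoid name clashes.
                    combined[unit_id][f"{source_name}.{field}"] = value
    return dict(combined)


# Illustrative rows only; the field names and values are made up.
equipment = [{"unit_id": "RFID-1001", "spindle_rpm": 1200}]
environment = [{"unit_id": "RFID-1001", "humidity_pct": 41.0}]
quality = [{"unit_id": "RFID-1001", "passed": True}]

units = combine_by_unit(
    ("equipment", equipment),
    ("environment", environment),
    ("quality", quality),
)
```

Here `units["RFID-1001"]` holds the equipment, environmental, and quality values for that one production unit in a single record.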

Often the most difficult connections are those where data can only be associated through timestamps. In these cases, you need to be able to trace back through the production line and identify which manufactured part was in a given production step at a specific time.
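
One way to sketch this lookup is with a sorted log of entry times and a binary search. The example assumes that only one unit occupies the step at a time and that a unit stays there until the next one enters. The log contents and unit IDs are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical log of when each unit entered one production step, sorted by time.
entries = [
    (datetime(2020, 8, 11, 8, 0, 0), "U-001"),
    (datetime(2020, 8, 11, 8, 5, 0), "U-002"),
    (datetime(2020, 8, 11, 8, 10, 0), "U-003"),
]
entry_times = [entered_at for entered_at, _ in entries]


def unit_at(timestamp):
    """Return the unit that most recently entered the step at or before timestamp."""
    position = bisect_right(entry_times, timestamp)
    if position == 0:
        return None  # The timestamp is earlier than the first logged entry.
    return entries[position - 1][1]
```

A sensor reading taken at 08:07 would be attributed to `U-002` under these assumptions. Lines where several units are processed in parallel need a richer model, such as explicit entry and exit times per unit.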

6. Select the Right Data Storage Solutions

How do you plan on collecting and storing data? Big data needs to be made accessible and well-structured. There are several main data storage solutions, each with separate pros and cons:

1. On-premises data storage

2. Colocation data storage

3. Public cloud storage providers

4. Private cloud storage providers

On-premises data storage is a major investment when it comes to big data. Manufacturers may need to build an entire data centre just to manage communication between equipment and ML software.  

Colocation provides a more efficient option compared to on-premises storage. Companies can manage their data using off-site data centres that collect data from multiple locations. However, this is still a costly process that requires additional IT resources.

Many companies use cloud storage to make data more accessible. With public cloud storage, data is collected and stored through a cloud service provider. With private cloud storage, companies maintain their own virtualised data centre, which provides more control.  

7. Implement a Data Warehouse

How do you collate your data? Implementing a data warehouse is an important step in the data collection process. It allows you to obtain data from a variety of sources and store it in a central database.

Many data warehousing solutions also include a staging area. Once collected, the data is stored in a temporary database where it can be cleansed and validated. This provides a way to maintain data quality before adding more information to your existing datasets.
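
The staging idea can be sketched with Python's built-in `sqlite3` module standing in for a real warehouse. The table names, columns, and the 0 to 250 temperature range are assumptions chosen for illustration.

```python
import sqlite3

# An in-memory database stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (unit_id TEXT, temperature_c REAL)")
conn.execute("CREATE TABLE warehouse (unit_id TEXT, temperature_c REAL)")

# Newly collected rows land in staging first, errors and all.
incoming = [
    ("U-001", 180.5),
    ("U-002", None),    # incomplete reading
    ("U-001", 180.5),   # exact duplicate
    ("U-003", 999.0),   # outside the assumed valid range
    ("U-004", 175.0),
]
conn.executemany("INSERT INTO staging VALUES (?, ?)", incoming)

# Promote only rows that pass validation; DISTINCT drops exact duplicates.
# A NULL temperature fails the BETWEEN test and is therefore excluded.
conn.execute(
    """
    INSERT INTO warehouse (unit_id, temperature_c)
    SELECT DISTINCT unit_id, temperature_c
    FROM staging
    WHERE unit_id IS NOT NULL
      AND temperature_c BETWEEN 0 AND 250
    """
)
conn.execute("DELETE FROM staging")  # Clear staging for the next batch.
conn.commit()

loaded = conn.execute(
    "SELECT unit_id, temperature_c FROM warehouse ORDER BY unit_id"
).fetchall()
```

Only the two clean readings reach the warehouse table in this example. In practice you would likely log the rejected rows for review before clearing the staging table.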

8. Use an Up-to-Date Data Dictionary

After establishing naming conventions, rules, and constraints for data collection, you need to maintain an updated data dictionary. A data dictionary provides engineers with detailed information about the contents of datasets.  
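
As a small sketch, a data dictionary can also be kept in a machine-readable form so that incoming records are checked against it. The `FieldSpec` structure, the field names, and the descriptions below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry: what a field means and how it is stored."""
    description: str
    dtype: type
    unit: str = ""


# Hypothetical entries; a real dictionary would cover every collected field.
DATA_DICTIONARY = {
    "unit_id": FieldSpec("RFID code of the production unit", str),
    "temperature_c": FieldSpec("Oven temperature at the curing step", float, "°C"),
    "line_number": FieldSpec("Factory line number", int),
}


def undocumented_fields(record):
    """Return field names in the record that the dictionary does not describe."""
    return sorted(set(record) - set(DATA_DICTIONARY))


def type_mismatches(record):
    """Return documented fields whose value has the wrong type."""
    return sorted(
        name
        for name, value in record.items()
        if name in DATA_DICTIONARY
        and not isinstance(value, DATA_DICTIONARY[name].dtype)
    )


record = {
    "unit_id": "U-001",
    "temperature_c": "180.5",   # stored as text instead of a number
    "line_number": 3,
    "operator": "A. Smith",     # not described in the dictionary
}
```

For this record, `undocumented_fields` reports `operator` and `type_mismatches` reports `temperature_c`, which signals either a data entry problem or a dictionary that needs updating.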

Following these rules can ensure that you supply machine learning (ML) software with consistently accurate data. Manual data entry errors, incomplete data sources, and improper storage can keep you from enjoying the full benefits of ML and AI in manufacturing.

The bottom line is that smart manufacturing depends on data readiness. If you want to increase product quality, track daily production, or improve predictive maintenance processes, focus on your data collection methods.
