How To Identify Unknown Features In Machine Learning

What is a feature in machine learning?

A feature is a measurable property or characteristic of an event you want to predict.

But what happens if you have missing or unknown features for the event you want to predict? What if these features are crucial for making accurate predictions?

Let’s look at a concrete example:

Suppose your goal is to predict whether a pipe will break/collapse due to erosion.

In this example, erosion can be defined as the process by which the internal surface of a pipe deteriorates due to the abrasive action of moving solid particles in the fluid.

This is an important prediction problem because you can anticipate structural problems before they occur, and take action accordingly. There are a number of variables (or features) that contribute to this pipe erosion process, depending on the specific scenario.

For this example we will use a structured dataset (rows + columns = database table). One possible structured dataset to address this problem could include the following columns:

  • Type of fluid, such as water or oil
  • Speed of the fluid in the pipe
  • Pressure of the fluid in the pipe
  • Temperature of the fluid in the pipe
  • Impurity content, such as minerals and other compounds
  • Erosion level (this could be the label: the quantity your ML model will predict)

Each row in the table could represent the state of a given pipe at a given point in time. Predicting future erosion levels from such rows is an example of time series forecasting.
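To make the table concrete, here is a minimal sketch of how one row could be represented and split into features and a label. The field names, the units, and the sample values are all assumptions made up for illustration; they are not real measurements.

```python
from dataclasses import dataclass, asdict


@dataclass
class PipeReading:
    """One row of the table: the state of a pipe at one point in time.

    Field names and units are hypothetical, chosen only for this sketch.
    """
    fluid_type: str        # e.g. "water" or "oil"
    speed_m_s: float       # speed of the fluid in the pipe
    pressure_kpa: float    # pressure of the fluid in the pipe
    temperature_c: float   # temperature of the fluid in the pipe
    impurity_ppm: float    # impurity content (minerals and other compounds)
    erosion_level: float   # the label: the quantity the model will predict


LABEL = "erosion_level"


def split_features_and_label(row):
    """Return (features, label) for one row, with the label column removed."""
    record = asdict(row)
    label = record.pop(LABEL)
    return record, label


# Made-up sample rows for the same pipe at two points in time.
rows = [
    PipeReading("water", 2.1, 310.0, 18.5, 120.0, 0.04),
    PipeReading("water", 2.4, 325.0, 19.0, 135.0, 0.06),
]

features, label = split_features_and_label(rows[0])
print(sorted(features))  # the five feature columns
print(label)             # the erosion level for that row
```

Every column except the label is a feature, so a column that was never collected simply does not exist in this structure.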

A feature is any (relevant) characteristic of the pipe erosion process, represented by a column in a database table, as seen above.

In supervised machine learning, we always have both features and labels. A label is the specific property/feature you want to predict using a machine learning model.

For our example, we can choose a measurable property of the pipe erosion process as the variable we want to predict. Since our goal is to predict pipe erosion levels, this is a regression problem (regression = prediction of a numerical value).

Once we are able to accurately predict the level of erosion in a pipe at any given moment in time, we can create a process or protocol to anticipate any issues. For example, if the erosion level reaches a certain value, we may trigger repairs or replacements, or simply bake the solution into the original pipe from the beginning (perhaps by using different materials that withstand erosion better).
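Such a protocol could be as simple as mapping a predicted erosion level to an action. The sketch below is only an illustration: the function name, the threshold values, and the "monitor" fallback are assumptions, and real thresholds would have to come from the domain experts.

```python
def recommend_action(predicted_erosion, repair_at=0.5, replace_at=0.8):
    """Map a predicted erosion level to an action.

    The thresholds are hypothetical placeholders, not engineering guidance.
    """
    if predicted_erosion >= replace_at:
        return "replace"
    if predicted_erosion >= repair_at:
        return "repair"
    return "monitor"


# Made-up predictions for three pipes.
for level in (0.2, 0.6, 0.9):
    print(level, recommend_action(level))
```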

In my previous article, “The Most Important Element Of AI Adoption,” we covered why domain knowledge is the most important factor in the success of any AI/ML project.

Here, the domain knowledge required to properly address this business problem is a subset of structural engineering.

But what if you don’t have this domain knowledge? What could go wrong?

Suppose a given pipe erosion process involves more features than you thought. For example, what if there is an additional sediment, chemical, or compound present in the fluid that contributes significantly to the erosion process?

What happens if the software/data engineers never realized this and never collected that data?

This means you have a structured dataset with missing/unknown features that may be critical for solving the business problem correctly. This is another example of the crucial importance of domain knowledge when developing machine learning solutions.

One of the best solutions is to create cross-functional AI/ML teams involving technical experts AND non-technical subject matter experts who have a deep understanding of the business/problem domain.

If they have a strong understanding of AI/ML fundamentals, even without knowing how to write code, subject matter experts can even tell if you have the right dataset for the right problem. This adds immense value to AI initiatives.

Ask the domain experts if you seem to be missing any important features/characteristics of the events you are trying to predict. Explore the datasets together and give mutual feedback.
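One lightweight way to structure that conversation is to compare the list of features the domain experts consider relevant against the columns actually present in the dataset. The sketch below assumes both are available as simple lists of names; the function name and the example feature names are hypothetical.

```python
def normalize(name):
    """Lowercase a name and replace spaces with underscores for comparison."""
    return name.strip().lower().replace(" ", "_")


def find_missing_features(expert_features, dataset_columns):
    """Return the expert-listed features that have no matching dataset column."""
    available = {normalize(column) for column in dataset_columns}
    return [f for f in expert_features if normalize(f) not in available]


# Hypothetical inputs: what the experts listed vs. what was collected.
expert_features = [
    "Fluid type", "Speed", "Pressure", "Temperature",
    "Impurity content", "Sediment concentration",
]
dataset_columns = [
    "fluid_type", "speed", "pressure", "temperature", "impurity_content",
]

print(find_missing_features(expert_features, dataset_columns))
```

A check like this only catches features the experts can name. It does not replace exploring the data together, but it gives the team a concrete list to discuss.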

Communication between the technical and non-technical team members is paramount to ensure you are solving the right problem with the right dataset. Have them learn from each other: have AI/ML engineers ask the business leaders the right questions, and similarly have business leaders increase their AI/ML knowledge by learning from the ML engineers.

One of the most important roles in any AI/ML team is a project/product manager working at the intersection of the technical and business domains to make sure this communication/interaction is effective.

Having a cross-functional, high-performance AI/ML team is essential to successful AI adoption in the enterprise.

If you need help to accelerate your company’s machine learning efforts, or if you need help getting started with enterprise AI adoption, send me a LinkedIn message or email me at info@carloslaraai.com and I will be happy to help you.

Subscribe to this blog to get the latest tactics and strategies to thrive in this new era of AI and machine learning.

Subscribe to my YouTube channel for business AI video tutorials and technical hands-on tutorials.

Client case studies and testimonials: https://gradientgroup.ai/enterprise-case-studies/

Follow me on LinkedIn for more content: linkedin.com/in/CarlosLaraAI
