Tuesday, April 6, 2021

Data As An Inhibitor When Building AI Models

This article emphasizes the importance of having good quality data and strong practices around your data, like precision and timeliness, before starting to build efficient and accurate AI models. Although I don't fully disagree with what the author is saying, I believe a few key points need to be added. First, not only can AI help companies improve customer experience and automate business processes, but the outcomes from AI also inform the business and guide stakeholders in their decision making and overall strategy. Examples of these decisions include hiring, supplier sourcing, and more. Next, it is not flawed data itself that creates the possibility for biased outcomes from AI models; it is the types of data being used to train the models, and the ways those features influence the models, that lead to bias. The issues flawed data does create are extra cost, increased time and effort, and reduced efficiency for data scientists and engineers, since it takes time, resources, and money to access and clean up the data.

Additionally, there are more practices around making sure you have good data to feed into your AI model than the article mentions. You must first be able to access the data, which is a big part of the overall challenge, and you have to know how to interpret it. There are several access methods one could use, for example database APIs or some kind of data feed such as an ETL pipeline. For interpretation, you could reverse engineer the structure through discovery, though you will of course need to identify the database API or feed type, or at least make a reasonable assumption, before you can access and interpret the data. When it comes to data quality, Natural Language Processing (NLP) can be used to structure the data so it is easier to consume.
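
As a rough illustration of the access and quality steps described above, here is a minimal Python sketch, assuming an in-memory SQLite table standing in for a real source system; the table name, columns, and sample rows are hypothetical.

    # Access data through a database API (sqlite3) and flag quality
    # problems before the data reaches an AI model.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, signup_date TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            (1, "Acme Corp", "2021-03-01"),
            (2, None, "2021-03-15"),   # missing name
            (3, "Widget Co", None),    # missing signup date
        ],
    )

    # Access step: pull rows through the database API.
    rows = conn.execute("SELECT id, name, signup_date FROM customers").fetchall()

    # Quality step: flag rows with missing fields before any training happens.
    flagged = [row for row in rows if any(value is None for value in row)]
    print(f"{len(flagged)} of {len(rows)} rows have missing fields: {flagged}")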


Other inhibitors to performance when working with the data that goes into AI models, which the article does not mention, include dependencies on other teams, varying institutional knowledge, the speed of change, and where the data sits. For these, you need strong practices around governance, knowledge transfer, flexibility, and data harmony. It is very common for data to be spread across various data sources, or even across regions. This can create complexity and slow users down when trying to access and harmonize it, not to mention the regulations they need to comply with.
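
To make the harmonization point concrete, here is a small hypothetical sketch (my own illustration, not from the article) of reconciling the same customer record held in two regional systems that use different field names and date formats.

    from datetime import datetime

    # Hypothetical records for the same customer held in two regional
    # systems, each with its own field names and date format.
    us_record = {"cust_id": 42, "full_name": "Acme Corp", "joined": "03/01/2021"}
    eu_record = {"customer_id": 42, "name": "Acme Corp", "signup_date": "2021-03-01"}

    def to_canonical(record, id_key, name_key, date_key, date_fmt):
        # Map a source-specific schema onto one canonical shape and
        # normalize the region-specific date format to ISO 8601.
        return {
            "id": record[id_key],
            "name": record[name_key],
            "signup_date": datetime.strptime(record[date_key], date_fmt).date().isoformat(),
        }

    canonical = [
        to_canonical(us_record, "cust_id", "full_name", "joined", "%m/%d/%Y"),
        to_canonical(eu_record, "customer_id", "name", "signup_date", "%Y-%m-%d"),
    ]
    print(canonical)  # both records now share one consistent view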


For these reasons, it is critical to be well equipped to manage the constant fast-paced changes, the lack of harmony, and the varying quality of the data going into your AI model. No human will be able to keep up without the right tools and experts in place, and if you're going to be successful, the right software investments must be made. It is not as simple as understanding how the data is gathered and cleaned, as the author suggests; first and foremost you need visibility, and the right tools can help get you there.


https://insidebigdata.com/2021/04/02/new-to-ai-adoption-dont-let-data-be-your-achilles-heel/

1 comment:

  1. We saw the consequences of AI firsthand on a very small scale when we were organizing our data for the web advisor project. If we had not checked our work, we would have had tables full of missing and incorrect data. This would have been detrimental to our objectives had we continued and not corrected the mistakes the AI made. This relates to the article by showing that bad data can make or break a project or even an entire company.
    One of the most important points the author makes is that companies should not hesitate to spend extra time gathering and organizing data. Its importance cannot be overstated. The article gave an example of how, when data changed during COVID, AI was not as effective because the data was inaccurate due to changes in preferences. Obviously the pandemic wasn't in any company's forecast and they weren't expecting data changes, but there are other examples where companies could have had enough data to predict a changing environment but chose not to. One example of this led to the demise of Blockbuster. Blockbuster had the opportunity to buy Netflix, but declined. Their data showed that people enjoyed the social aspect of going to the store, looking through the aisles, and finding a movie. This assumption proved to be horribly wrong as convenience became much more important to customers than any social factor. They didn't put enough effort into data collection when determining whether people would be happy with a new way of getting their movies. This had devastating consequences for Blockbuster because they failed to collect accurate data and did not predict changes in customer preference. Every company should put extra emphasis on data, because business environments can change quickly and if you are not prepared it could mean the end of your company.
    In the spirit of predicting future events, I agree with you and the author about what steps need to be taken. Blockbuster missed the mark on transparency, timeliness, precision, size, and the other attributes you mention. Given the examples in the text, I think size and timeliness are the most important. While transparency and precision are almost just as important, if you don't have the right size data set at the right time, then transparency and precision don't matter because the data will already be outdated.
    Accurate data collection is necessary for AI to be effective. I have seen other blog posts discuss the same kind of bias from AI trained on bad data that this article describes. It is a common problem, and many times the resulting bias leads to inaccurate outcomes. Companies need to anticipate the degree to which data will change when the pandemic ends. They need to figure out whether things will go back to the way they were before or whether preferences will stay closer to how they were during the pandemic. This will allow them to have accurate AI if they acquire the proper data and interpret it precisely.


