However, as more services become automated, “computer says no” could mean being turned down for a job, a mortgage, or even healthcare treatment without any explanation. No laughing matter. It’s the responsibility of all data scientists, myself included, to ensure that the datasets used to train AI/ML models are accurate, complete, and unbiased. It’s just as important that the factors algorithms use to reach a decision are transparent to the people affected.
What’s coming next?
As more government, healthcare, and judicial systems become automated, I predict that we’ll see a shift towards explainable AI, driven by consumer wariness around the algorithms that underpin automated decision-making. We have already seen an example of this with the challenge to the UK government’s visa application algorithm mounted by Foxglove, a lobbying group that promotes ethics in the technology sector. I foresee organizations introducing AI governance, or a standardized level of transparency, that outlines the factors and the pipeline an AI algorithm used to derive a decision or prediction. As automation becomes the norm, organizations that shroud their AI in secrecy risk losing customers to those that offer transparency. However, consumers may not demand the same level of transparency from a product recommendation system as from a medical diagnosis system or the logic driving real-time decisions in autonomous vehicles.
How enterprises will respond to demands for explainable AI in 2020
Summarizing the findings from CCS Insight’s IT Decision-Maker Workplace Technology Survey 2019, analyst Nick McQuire wrote, “The ability of AI systems to ensure data security and privacy, and the level of transparency of how systems work and are trained, are now the two most important requirements when investing in machine learning technology.” For certain applications, such as medical diagnosis, I believe that the ability to explain the logic behind an algorithm will be a differentiator. For other verticals, the differentiators will be the run-time of the application, how often the model is fine-tuned and retrained on new data, and success metrics such as accuracy, true positives, and false negatives.
How can we build transparency into AI models?
Bias and variance in a dataset are common challenges that reflect real-world applications. As an example, car insurance companies have been challenged over their application of policy/premium automation in African countries, where differences in road infrastructure may result in good drivers being penalized for poor road surfaces. This fits into the scope of categorical bias, which, for a data scientist, is a design-related challenge. To overcome bias in data models, my first piece of advice is to learn using a classical machine learning algorithm or a statistical learning approach, rather than deep learning. Classical approaches usually work from extracted features and are well suited to datasets that are small or categorically biased, whereas deep learning algorithms require a larger dataset that is categorically uniform. My second piece of advice is to augment the dataset, artificially creating examples so that the categories become uniform. However, from experience, augmentation only works well for certain types of use cases, and it may negatively impact the learning process if it isn’t designed well.
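As a rough illustration of the second piece of advice, the sketch below balances categories by random oversampling, which is only the simplest form of augmentation and not necessarily the one used in practice. The function name, the field names, and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_to_uniform(rows, label_key, seed=0):
    """Duplicate randomly chosen rows of under-represented categories
    until every category has as many rows as the largest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up smaller categories with randomly repeated rows.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical toy data: drivers on unpaved roads are under-represented.
rows = (
    [{"road": "paved", "claim": 0}] * 8
    + [{"road": "unpaved", "claim": 0}] * 2
)
balanced = oversample_to_uniform(rows, label_key="road")
counts = Counter(row["road"] for row in balanced)
print(counts)
```

Because the added rows are exact duplicates, a model can overfit to them, which is one way a poorly designed augmentation step can hurt learning.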
Clean data aids trust and transparency
I foresee automated data cleansing, data analytics, and predictive analytics being required to enable enterprise users and consumers to trust the data that is feeding AI models. Using AI to master or cleanse data requires a subject-matter expert to validate the predictive decision. The best way to provide transparency into the predictive result is, in my opinion, to show the AI pipeline and the influential data points or data attributes that contributed to the predictive end-state. This could be shown as a visual key performance indicator, in the form of a graph or image. Deep learning algorithms don’t necessarily provide truth; rather, they produce predictions with a certain level of accuracy. Each layer extracts an abstraction of the input data, so every layer in a deep learning algorithm has its own function that influences the final predictive decision. Algorithms are designed to “weigh” the data points and come up with a flexible function (if you will) that can adapt to newer data points. The resulting prediction is a likelihood that is derived from combining and factoring different data points. Clean, recent, relevant, high-quality data is more likely to generate valid predictions.
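To make the idea of “weighing” data points concrete, here is a minimal sketch that assumes a simple logistic model rather than a deep network. It combines weighted attributes into a likelihood and ranks the attributes by how much each one contributed. The weights, attribute names, and applicant values are invented for illustration.

```python
import math

# Hypothetical weights for an illustrative logistic model (not a real system).
WEIGHTS = {"income": 0.8, "missed_payments": -1.5, "years_at_address": 0.3}
BIAS = -0.2


def predict_with_contributions(features):
    """Return the predicted likelihood and each attribute's contribution,
    ranked from most to least influential."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    likelihood = 1.0 / (1.0 + math.exp(-score))  # logistic function
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return likelihood, ranked


# Hypothetical, already-normalized applicant attributes.
applicant = {"income": 1.2, "missed_payments": 2.0, "years_at_address": 0.5}
likelihood, ranked = predict_with_contributions(applicant)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
print(f"likelihood: {likelihood:.2f}")
```

The ranked contributions are the kind of data a bar chart or similar visual indicator could present alongside the prediction.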
Explaining AI, without giving away IP
What we observe in the AI domain is that organizations are investing much of their time and money in making AI more accessible to different personas in their consumer base. As we continue to increase the stickiness and ROI of AI, we will start to get more visibility into the automated and proactive suggestions that AI puts forward (i.e. prescriptive analytics). Explainable AI will then follow, providing a level of transparency into how a predictive decision was derived. There are strategies that organizations can put in place to show how they arrived at a predictive end-state without exposing their intellectual property. The majority of consumers wouldn’t necessarily need visibility into the design of the AI algorithm (which is the organization’s intellectual property) but rather a foundational understanding of how that predictive decision was derived (like a link analysis). I believe it’ll be more beneficial for consumers to understand, first, which input parameters influenced the predicted decision, and second, which combination of data points or attributes yielded a higher likelihood for that predicted decision.
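One possible strategy, sketched here under my own assumptions, is to treat the model as a black box and measure how the prediction changes when each input is swapped for a baseline value. The explainer only calls the model, so the algorithm’s design stays hidden. The stand-in model, the attribute names, and the baseline values are all hypothetical.

```python
def opaque_model(applicant):
    """Stand-in for a proprietary model; the explainer never looks inside."""
    score = 0.5
    score += 0.3 if applicant["income"] > 40000 else -0.2
    score -= 0.1 * applicant["missed_payments"]
    return max(0.0, min(1.0, score))


def influence_by_perturbation(model, applicant, baseline):
    """Replace one input at a time with its baseline value and record
    how far the prediction moves, most influential input first."""
    original = model(applicant)
    influence = {}
    for name in applicant:
        probe = dict(applicant)
        probe[name] = baseline[name]
        influence[name] = original - model(probe)
    return sorted(influence.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical applicant and a hypothetical "typical" baseline profile.
applicant = {"income": 30000, "missed_payments": 1, "postcode_band": 2}
baseline = {"income": 50000, "missed_payments": 0, "postcode_band": 1}
ranking = influence_by_perturbation(opaque_model, applicant, baseline)
for name, delta in ranking:
    print(f"{name}: {delta:+.2f}")
```

A negative value means the applicant’s own value lowered the predicted likelihood relative to the baseline. This approach changes one input at a time, so it does not capture how combinations of attributes interact.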
AI needs a human guide
Today, ‘black box logic’ threatens to become one of the dominant challenges for organizations as they grapple with automating processes using AI and machine learning. Automation is beneficial to any organization, but I believe that, given the current maturity of AI models, AI needs hand-holding, and the subject-matter expert must be kept at the center of the equation at all times. My advice is to use AI for suggestions and recommendations, and to have a subject-matter expert validate the outcome of the AI algorithms. The benefit of this strategy is that the algorithm is exposed to a larger dataset, which solidifies its knowledge base; over time it will learn to mimic the subject-matter experts’ responses, yielding higher accuracy.
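The workflow described above could be sketched roughly as follows: the AI proposes, the expert decides, and every validated decision is kept as a new labeled example for later retraining. The function name, record identifiers, and labels are hypothetical, and the retraining step itself is left out.

```python
def validate_suggestions(suggestions, expert_decisions):
    """Pair each AI suggestion with the expert's final decision.

    Returns the expert-validated examples (usable as future training data)
    and the share of suggestions the expert agreed with."""
    training_examples = []
    agreements = 0
    for item_id, suggested in suggestions.items():
        final = expert_decisions[item_id]  # the expert always has the last word
        if final == suggested:
            agreements += 1
        training_examples.append((item_id, final))
    agreement_rate = agreements / len(suggestions) if suggestions else 0.0
    return training_examples, agreement_rate


# Hypothetical data-cleansing suggestions and the expert's rulings on them.
suggestions = {"rec-1": "merge", "rec-2": "keep", "rec-3": "merge", "rec-4": "keep"}
expert_decisions = {"rec-1": "merge", "rec-2": "merge", "rec-3": "merge", "rec-4": "keep"}
examples, agreement_rate = validate_suggestions(suggestions, expert_decisions)
print(f"expert agreed with {agreement_rate:.0%} of suggestions")
```

Tracking the agreement rate over time gives a simple signal of whether the model is getting closer to the experts’ responses.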