As I’ve explained in previous posts, many organizations are rushing to enter the AI fray, hiring teams of data scientists, coders, and project managers in their urgency to get into the game. But what these teams have in dedication and enthusiasm for AI projects, they often lack in understanding of the art of applied AI: the comprehensive discipline I described earlier.
By way of example, I recently spoke at an AI conference to an audience of senior executives responsible for AI projects. My presentation included many of the crucial questions I cover below. At the Q&A, the moderator asked the audience: “How many of you have asked yourselves the questions Barney has been talking about?” Not one person raised a hand. This experience has played out, in one way or another, in many of my conversations over the years: there is wide consensus that even the most basic applied questions are being ignored.
And, time and again, I’ve seen this gap lead to very basic mistakes. For example, I have seen many AI teams take pride in achieving 95% accuracy in certain medical classifications on their prototype system, and say they have a target of 99% accuracy with further work. But why is that the right accuracy target? Is your team building a model that predicts a rare disease?[mfn]Suppose a rare disease affects 1 in 1 million people. If a model simply says “no disease” to all patients, then, other things being equal, it will only be wrong 1 in 1 million times. Even if symptoms or other data make a patient 10,000 times more likely than others to have the disease, the chance that this patient actually has it is still only about 1%. So always saying “no disease” still gives roughly 99% accuracy.[/mfn] Well, then you might be able to get 99% accuracy from a model that always answers “no”. Your metric for success needs to be informed by the characteristics of the problem and the outcome the model is meant to achieve.
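To make the footnote’s arithmetic concrete, here is a minimal sketch (the cohort is invented for illustration) of why raw accuracy is a poor success metric for a rare condition: a “model” that always answers “no disease” scores about 99% accuracy on a group where roughly 1 in 100 patients is actually sick, while catching none of them.

```python
import random

random.seed(0)

# Hypothetical "high-risk" cohort from the footnote:
# about 1 in 100 patients actually has the disease.
labels = [1 if random.random() < 0.01 else 0 for _ in range(100_000)]

# A trivial "model" that always answers "no disease".
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

print(f"accuracy: {correct / len(labels):.1%}")                     # ~99% -- looks impressive
print(f"sick patients caught: {true_positives} of {sum(labels)}")   # 0 of ~1,000
```

The lesson is not the specific numbers; it is that the right metric (recall, precision, or a cost-weighted error rate) has to come from the base rates of the problem and the consequences of each kind of mistake, not from whatever the model happens to score.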
Why do we make these kinds of mistakes? I believe it’s because of a widespread assumption that understanding the technology, plus a high-level business need, is enough, together with a general lack of appreciation for the subtleties and complexities of applying a new technology successfully. Many don’t know what they don’t know: a kind of Dunning-Kruger effect of AI.
So a key mission of this blog is to provide a practical and comprehensive framework for this art of applied AI. Towards that end, here’s a list of the categories of questions I’ll be covering in future posts:
- Customers: As I introduced earlier, when teams start to develop an AI application, contextual questions like “who’s the customer” and “what’s the addressable market” often get short shrift, if they are considered at all. It’s important to consider the market conditions, segmentation, and end-user targets for any AI application. Have you identified the core requirements for success? What are the use cases? Finally, for any AI project, you’ll need to understand the value that AI brings to customers.
- Capabilities: Be clear-eyed about what your AI solution’s capabilities are and what it delivers to the customer. Do you have the right mix of data, knowledge, skills, team, and technology to succeed? Do these resources align with the project’s core requirements? If not, then how do you acquire them?
- Alternatives and Competition: Also as introduced earlier, AI solutions need to be clearly superior to the obvious existing alternatives, both to competing offerings and to what the customer can already do themselves.
- Quality: We tend to think that AI systems will either succeed or fail, but in practice there’s a whole spectrum of outcomes, so a system’s value depends on its overall performance as well as its behavior in very specific cases. This includes an overall accuracy threshold, and also how, and how well, a wide variety of error cases and “corner” (aka “edge”) cases are handled (see the short evaluation sketch after this list). So it’s critical to understand what’s good enough, because that question will actually drive half of the product design, and the true success or failure of the project. As is often the case in machine learning, you can probably reach some lower accuracy level fairly quickly. But what happens when the task requires 95% accuracy? Do you know how much additional data it would take to reach your desired accuracy, or whether more data will get you there at all? It’s important to have a plan to close this gap. In addition, just defining and measuring the right quality attributes is hard, and critically important for all future success.
- Usability: When we focus on core technology, usability can be overlooked. Ask: do the features and operations of the AI system make sense to the user? Does it deliver on their expectations? For instance, conversational interfaces and chatbots are often cited as AI technology that enhances a system’s usability. But like many AI applications, they present usability challenges of their own. How does a user know what kinds of questions they can ask the system and expect good answers? In my work with several companies, I’ve found that if the answer is hit-or-miss and users can’t figure out the system’s boundaries, that’s a huge problem. And for the inevitable cases where the system gets it wrong, how do the system and the user work together to detect and recover from errors?
- Trust: This is a major gating factor in the success or failure of AI systems. If users can’t trust an AI-based system, they’re less likely to use it. When we introduce AI into products like an autonomous car, trust becomes a crucial element. Can you take your hands off the wheel and trust the car to do what it’s supposed to do? What would a user have to see to feel safe, and what would the developer need to see to feel safe deploying the system? Without a plan to build trust, one that includes all stakeholders, your system won’t just go unused; it may never be funded, built, tested, or supported in the first place.
- Bias: Bias is closely related to trust, because it can erode it. A machine learning system is only as good as the data it uses for analysis (and notably, not every field in that data matters equally: some are far more influential than others). If that data embodies biases, or is otherwise not representative of the domain the system will be used in, the consequences after deployment can be devastating. A good resource here is Weapons of Math Destruction, which surveys the potentially dangerous consequences of algorithmic bias for people, organizations, and even entire countries. I’m also seeing a number of AI-and-ethics initiatives emerging; for instance, I participate in the annual Responsible AI/DI summit, which explores how thought leaders are overcoming these ever-shifting challenges.
- Centrality: How central is AI to the overall solution being provided to customers? Is the AI component so compelling that it can transform the business and open entirely new categories (what we call AI-first)? Or is AI one feature among many? A best practice here is to ask how large the non-AI elements of the solution are and how long they will take to build before the AI starts to deliver value. It often turns out that the non-AI elements dominate the development and deployment cycle, greatly reducing the chance for AI to make a difference.
An AI deployment—like any project—can be thought of as a chain: every element must be in place to create a path to market. A classic mistake here is to over-focus on the AI or the data (single links in the chain) and to ignore others; conscious attention to the entire chain is a best practice.
- Human/machine division: Organizations implement AI applications to cut costs, increase efficiency and reliability, boost profitability, and improve customer service. But, as I addressed earlier, AI system designers need to ask what the optimal mix of human and machine is. Many project developers assume the answer is full automation, but that’s often the wrong decision. Different levels of autonomy and divisions of labor lead to very different systems, and very different odds of success. Once you think you know what the right mix should be, go further and ask: what requirements does this division of labor place on the humans, the machines, and the interface between them? What are the options for degree of automation, and what are the most effective and viable human insertion points? (A minimal sketch of one common pattern follows this list.)
AI use cases often represent a unique and new kind of partnership between us and our technological offspring. It is important to take the relationship seriously, and to design the interaction well.
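On the human/machine division, one common pattern is worth sketching: automate only the predictions the model is confident about, and route everything else to a person. The code below is a minimal, hypothetical illustration (the threshold value and names are stand-ins, not a prescription), but it shows how the human insertion point becomes an explicit, tunable design decision rather than an afterthought.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    label: str         # the model's proposed answer
    confidence: float   # model's estimated probability of being right
    route: str          # "automated" or "human_review"

# Hypothetical threshold: set from the cost of errors and the capacity
# of the human review team, not from what the model happens to achieve.
CONFIDENCE_THRESHOLD = 0.90

def route_prediction(label: str, confidence: float) -> Decision:
    """Automate confident predictions; send uncertain ones to a person."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return Decision(label, confidence, route="automated")
    return Decision(label, confidence, route="human_review")

# Example: the low-confidence case lands in the human queue.
print(route_prediction("approve_claim", 0.97))  # automated
print(route_prediction("approve_claim", 0.62))  # human_review
```

Lowering the threshold automates more work but lets more errors through; raising it protects quality but loads the review team. That trade-off is the division-of-labor question, made concrete.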
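On the quality question raised earlier in the list: a single headline accuracy number hides exactly the corner cases that sink deployments. Here is a minimal sketch (the toy records and segment names are invented for illustration) of reporting accuracy per segment alongside the overall figure, so that a system that looks fine in aggregate but fails badly on one important slice is visible before launch.

```python
from collections import defaultdict

# Toy evaluation records: (segment, prediction, true_label).
results = [
    ("common_case", 1, 1), ("common_case", 0, 0), ("common_case", 1, 1),
    ("common_case", 0, 0), ("common_case", 1, 1), ("common_case", 0, 0),
    ("rare_edge_case", 0, 1), ("rare_edge_case", 0, 1), ("rare_edge_case", 1, 1),
]

# Group correctness flags by segment so each slice gets its own score.
by_segment = defaultdict(list)
for segment, pred, truth in results:
    by_segment[segment].append(pred == truth)

overall = [pred == truth for _, pred, truth in results]
print(f"overall accuracy: {sum(overall) / len(overall):.0%}")
for segment, hits in by_segment.items():
    print(f"  {segment}: {sum(hits) / len(hits):.0%} ({len(hits)} examples)")
```

On this toy data the overall number (roughly 78%) looks passable, while the per-segment view shows the rare edge case failing two times out of three: the kind of gap that decides whether the system is actually good enough.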
I’ll have more to say on all of the above as we move forward. I have found that answering these questions correctly is often a gating factor for success: they represent the leap we must take as an industry to cross the chasm into the mass market, treating application knowledge as seriously as we treat technology and algorithms. Stay tuned!