“As a rule of thumb, you’ll spend a few months getting to 80% and something between a few years and eternity getting the last 20%.”
Chris Dixon in “The idea maze for AI startups”
It may seem obvious to some—but bears repeating for others—that our AI product needs to satisfy our customers’ needs to be successful. As I described in this previous post, it is essential to define required accuracy via a well-defined and measurable objective function aligned with customer needs. Internal measurements like ROC curves often require translation into units that are more relevant and understandable to the business. Measuring accuracy in business terms is the foundation for ensuring that the system is accurate enough to be successful, for understanding the consequences of different kinds of inaccuracy, and for exploring how to increase the accuracy or cope with the accuracy we have. Here are six best practices:
- Understand precision vs. recall tradeoffs and their impact on solution design.
- Understand the payoff matrix for quality outcomes. This is, simply put, the cost or benefit of a false positive versus a false negative.
- Use a mix of humans and automation to increase quality.
- Lower expectations (if possible) to reduce demands on quality beyond what is achievable.
- Find a target use case or market segment that enables higher quality through focus.
- Find a target use case or market segment that tolerates lower quality.
To understand these strategies, let’s start by understanding how an AI system can be right and wrong. Consider a system that predicts something like “is a medical scan normal?” or “will a customer churn?” (i.e., leave you for a competitor). Along with “true positives” (cases where the AI and reality agree that the answer is “yes”) and “true negatives” (where they agree on “no”), there are two ways the AI can be wrong. A “false positive” occurs when the AI says “yes” and reality says “no”. A “false negative” is when the AI says “no” and reality says “yes”.
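To make the four outcomes concrete, here is a minimal sketch in plain Python; the labels are invented for illustration:

```python
# Tally the four outcomes for a binary "yes"/"no" predictor.
# Hypothetical example data (1 = "yes", 0 = "no").
actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # what reality says
predicted = [1, 0, 0, 1, 1, 0, 1, 0]  # what the AI says

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # false negatives

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```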
These errors are not equivalent; they have different costs, and the cost differences depend on your situation. Consider, for example, a churn management system. The cost of missing the opportunity to retain a high-paying at-risk customer (a false negative) might be considerably greater than the cost of applying a churn prevention strategy to a customer incorrectly identified as at risk (a false positive). This is even more stark in medical diagnosis, where most patients would rather tolerate an unnecessary test (for a false positive) than miss a fatal disease (a false negative).
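One way to put a payoff matrix to work is to price each of the four outcomes and compare models by total business cost rather than raw accuracy. The sketch below uses the churn example; all of the dollar figures and outcome counts are assumptions for illustration:

```python
# Hypothetical payoff matrix for churn management (costs in dollars).
# A false negative (missed at-risk customer) loses their future revenue;
# a false positive merely wastes a retention offer.
COST_FN = 1200   # assumed revenue lost when a high-value customer churns
COST_FP = 50     # assumed cost of an unneeded retention discount
COST_TP = 50     # retention offer that actually saves the customer
COST_TN = 0      # correctly left alone

def expected_cost(tp, tn, fp, fn):
    """Total business cost of a classifier's outcomes under the payoff matrix."""
    return tp * COST_TP + tn * COST_TN + fp * COST_FP + fn * COST_FN

# Two candidate models evaluated on the same 1,060 customers:
print(expected_cost(tp=80, tn=900, fp=60, fn=20))  # model A: $31,000
print(expected_cost(tp=60, tn=940, fp=20, fn=40))  # model B: $52,000
```

Note that model B is the more accurate of the two in raw counts (1,000 correct versus 980), yet it costs more, because its extra false negatives are the expensive kind of error.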
Since all errors are not equal, it is often useful to design or tune a system to reflect this differential cost—in the churn and diagnosis examples above, leaning towards more false positives than false negatives. Here’s a deeper dive into the terminology used to express one common tuning strategy: we choose a threshold on the system’s output above which it answers “yes” (sometimes called “triggering”), set so that precision (the percentage of triggers that are true positives) is high, even though recall may be low (there are many false negatives). This is called precision-based triggering: the system triggers rarely (low incidence), but each trigger is reliable enough to be valuable. Often, we use human intervention downstream of the AI system to take a second look at flagged situations and weed out the remaining false positives; the mammogram triage system described in a previous post about combining data and knowledge is an example.
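As a sketch of how such a threshold might be chosen in practice, the following uses scikit-learn’s precision_recall_curve to find the lowest threshold that meets a target precision; the labels, scores, and the 0.8 target are all hypothetical:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_precision(y_true, scores, target_precision):
    """Pick the lowest score threshold whose precision meets the target.

    The system "triggers" (answers yes) only when the model's score clears
    this threshold, trading recall away for trustworthy triggers.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds; align them.
    for p, r, t in zip(precision[:-1], recall[:-1], thresholds):
        if p >= target_precision:
            return t, p, r
    return None  # no threshold achieves the target precision

# Hypothetical validation labels and model scores:
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.65, 0.8, 0.85, 0.9])
result = threshold_for_precision(y_true, scores, target_precision=0.8)
if result:
    t, p, r = result
    print(f"trigger when score >= {t:.2f}: precision={p:.2f}, recall={r:.2f}")
```

Everything scoring below the chosen threshold becomes a false negative the business has decided it can live with; the human reviewers downstream then handle the much rarer false positives among the triggers.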
The AI in self-driving cars illustrates how to manage imperfect models
The quest to develop a self-driving car provides a wealth of illustrations of ways to lower expectations, improve quality by selecting the right segment, and find segments where lower quality is acceptable. Here’s an example. Seeking to discover such pockets of business value in the face of imperfect quality, SAE International defined Levels of Driving Automation™ as follows: the human drives at Levels 0-2, with increasing amounts of driver support; at Level 3, the vehicle drives itself under limited conditions, but the human must take over when the vehicle requests it; Levels 4 and 5 denote fully automated driving, with Level 4 restricted to limited conditions (such as a geofenced area) and Level 5 unrestricted. SAE’s considerations include what driver support or automated driving features are available, and under what conditions the vehicle can safely operate.
There is not a widespread understanding of these tradeoffs. I was once part of a product strategy session around self-driving cars, working to identify the most valuable use cases. Unfortunately, the organizers lacked the above distinctions: they asked everyone to assume that the cars work perfectly, so that people wouldn’t be overly focused on today’s engineering problems. However, this simplification made the exercise largely futile. Of course the best scenario for a self-driving car is when it works perfectly: a car that drives itself! What’s not to love about that? The market opportunities are endless. But that’s not a realistic goal. We need to be creative when systems aren’t perfect, and to build systems that provide value despite their limitations.
It’s worth going into some detail here, because there’s been considerable thinking about how to organize the systems (technical, people, regulatory, and more) around the imperfect AIs within self-driving cars. As such, this use case acts as a model for other systems that embody imperfect AI models.
Here’s a quick review of where autonomous systems stand at this time:
Only self-driving in a geo-fenced area: On November 1, 2021, Cruise founder Kyle Vogt was the first passenger in a Level 4 driverless robotaxi in San Francisco. The passenger sits in the back seat, and there is no provision for the vehicle to ask the passenger to take control. Cruise’s service is currently geofenced to certain areas of San Francisco, permitted to operate only between 10 pm and 6 am, and cannot yet charge for rides, although it has applied for a permit to do so.
Only self-driving on freeways: “Cars with Autopilot in 2021” reviews a number of Level 2 systems that provide adaptive cruise control and lane centering on mild turns, mostly on freeways. Tesla’s system will even navigate from on-ramp to off-ramp. These systems all lower expectations by requiring an attentive human driver, and include ways to alert an inattentive one.
Only self-driving at low speeds: In March of 2021, Honda announced a Level 3 Traffic Jam Pilot. It’s designed for slow, stop-and-go traffic and allows the human to relax and read a book or watch a movie until alerted to take back control as the car moves out of the jam. Honda is proceeding slowly; the announcement limited this feature to 100 cars.
Only self-driving on private property: Self-driving vehicles have found sweet spots in several applications on private property, where there are fewer regulations, fewer or no human drivers, and any humans who need to be near the vehicles can be given appropriate safety training. According to Forbes, Monarch will begin delivering “driver optional” electric tractors later this year. In March, the Association of Equipment Manufacturers (AEM) reported that “Although the equipment sectors that AEM represents have historically been hesitant to adopt innovative technologies, they are set for a significant amount of change, as autonomous vehicles (AV) dramatically alter the nature of the work.” Examples from the article include “the world’s first fully-autonomous mine operation. It operates with fully automated trucks, loaders, and drills.” And in April, Spectra concluded that “automated forklifts have become a crucial part of modern warehousing and logistics operations. Working safely alongside humans, these robot helpers can take over the heavy lifting, leaving people to focus on more complex tasks.”
Only self-driving with special markings or in a special corridor: China has autonomous trackless trams operating in three cities. Qatar and several Australian cities have trackless tram plans in place. Providing a ride that feels more like light rail than a bus, the technology requires special road markings as well as LIDAR and GPS, and in some cases construction of a dedicated right of way or lane(s).

The “hands-off” car won’t be in most garages this year, but driver support systems are making driving easier, and autonomous vehicles are coming into their own in restricted environments. Of course, the AI in self-driving vehicles is part of a larger system, and the interactions of the AI with the rest of the system are another source of errors and quality issues. I’ll look at these in our next post.