The robotics team had a great idea: why not use self-driving car technology for a task that was essentially easier, and that could make money on the way to autonomous nirvana? And, as often happens, they were pitching me for investment. They were a solid and practical team, and I liked them for their clever concept: self-driving ("robot") industrial floor scrubbers, those machines you see at midnight trolling around clean airports, office buildings, shopping malls, and the like.

Yet the chief scientist said they weren't quite there. You see, he hadn't solved a critical problem: what happens if the scrubber approaches an obstacle it can't recognize, such as a doorway, a child, or a puppy? It's important that the auto-scrubber turn around in these circumstances.

This was going to need extra research and new software development. And I don't like to invest when there are unknowns this big.

But I recognized a pattern: an over-focus on full autonomy. So I said, "You know, as you've framed it, this doesn't sound like a great investment. But let me ask a question: do you have cameras on this device? What if a remote operator could handle the occasional situation where the scrubber approaches an unknown obstacle, and steer it through?"

Surprised, they paused for a moment, then answered, "Yes, of course," and went on to implement the idea.
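To make the pattern concrete, here's a minimal sketch in Python of the kind of human-in-the-loop fallback I was suggesting. Everything in it (the classifier stub, the confidence threshold, the teleoperation call) is an illustrative assumption, not the system the team actually built; the point is simply that the robot stays autonomous on the happy path and escalates to a remote human when it isn't sure.

```python
# Illustrative sketch only: names, thresholds, and the random "classifier"
# below are assumptions for this example, not the team's actual system.
import random

KNOWN_OBSTACLES = {"wall", "doorway", "cart"}   # obstacles the planner can handle
CONFIDENCE_THRESHOLD = 0.90                     # below this, ask a human for help


def classify_obstacle(camera_frame):
    """Stand-in for the on-board vision model: returns (label, confidence)."""
    return random.choice(["wall", "doorway", "cart", "unknown"]), random.random()


def request_remote_operator(camera_frame):
    """Stand-in for handing the camera feed to a remote human who steers."""
    return "remote_human_steering"


def next_action(camera_frame):
    """One control-loop decision: stay autonomous or escalate to a person."""
    label, confidence = classify_obstacle(camera_frame)
    if label in KNOWN_OBSTACLES and confidence >= CONFIDENCE_THRESHOLD:
        return f"autonomous_avoid:{label}"      # the happy path
    # Unknown obstacle or low confidence: stop and let a human take over.
    return request_remote_operator(camera_frame)


if __name__ == "__main__":
    for frame_id in range(5):
        print(frame_id, next_action(camera_frame=frame_id))
```

The key design choice is the escalation path: rather than trying to enumerate every way the happy path can fail, the system only needs to recognize when it doesn't know.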

Why hadn't the team thought of it already? Because they were caught in something I've seen more often than not in the companies that pitch to me: a way of thinking that comes from an AI culture that hasn't yet made the transition from an academic mindset to an "art of the applied" mindset. Academia doesn't have to go all the way to market, its reward structure is based on publications and grants, and "applied" can be a dirty word there; as a result, this academic culture lacks the tempering that comes when Kaggle-competition ideals run up against the realities of production implementation and the startup market.

Since my early days at NASA, and up to the present time when I have the privilege of advising AI innovation teams worldwide, I’ve seen this pattern arise time and again.

Why you might want to only partially automate an AI system

As I learned at NASA, the problem with automating everything is that by taking humans out of the loop you lose the opportunity to catch the "unknown unknowns": those situations that are not on the "happy path." We repeatedly found that when we attempted to fully automate a subsystem, the complexity and risk exploded. Why? Because we had to program in all the ways that things could go wrong, which is no easy task in something as complex as a space vehicle! And, "Artificial General Intelligence (AGI)" clickbait notwithstanding, understanding a complex system, with its constraints and cause-and-effect relationships, is not a strength of AI systems, then or now, compared to many human experts.

One way to understand this: AI systems are based only on data, and that data comes only from the past behavior of systems, whether they're human systems, spaceships, or self-driving floor buffers. Handling future situations, which requires an understanding of how a system operates in novel conditions, is not the natural domain of AI, unless you've built a great simulator that captures all possible failure scenarios.

I'm reminded of Jack Swigert, command module pilot aboard Apollo 13, whose last-minute swap onto the crew turned out to be a very lucky accident. The reason: Swigert had deep knowledge of the very systems within Apollo 13 that ended up failing, and he proved to be a critical resource in resolving those problems and bringing the Apollo 13 astronauts home.

So at NASA we ultimately learned to be judicious about what makes sense to automate and what does not. This is a lesson that AI teams must take on board.

It's easy to understand why teams consider only fully automated systems: the most visible AI successes around us are exactly such "dark room" systems, whether it's Netflix recommending a movie, Pandora suggesting a song, or Facebook or Google showing an ad. But we've been a bit fooled by these early successes. As we go forward, the vast majority of AI systems will ultimately include a human in the loop: a substantial departure from what we've seen to date.

From internal to external measures of success

Yet the question of autonomy versus human-in-the-loop is just one of many questions that arise only when we actually try to take products to market. Here, we're optimizing a fundamentally different function: not the number of publications or grants but, usually, return on investment (ROI). And don't be fooled: this leads to substantially different optimal behaviors when we build AI systems, based on fundamentally different benefits, constraints, and limitations.

Luckily, a handful of my colleagues and I have been encountering these constraints for decades: we were the pioneers who chose to jump the "applied" hurdle.

Another example comes from a company I advised: a medical diagnostics company that built an AI to detect cancer in pathology slide images. I asked them one of my applied questions: "What performance do you think you need to achieve in order to be successful and go to market?" They said, "Well, we think 90%." I asked, "Why 90%?" And they told me that this was a benchmark set by an industry group, as opposed to a number they had market-tested.

I said, "Well, how are you going to measure that 90%?" And after they scratched their heads for a while, I realized that they hadn't thought clearly about the mixture of false positives ("detecting" a cancer when there isn't one) versus false negatives (failing to detect a cancer when there is one) within that 90%. For this kind of task, the error budget needs to be shifted away from false negatives and toward false positives, and they hadn't thought this through.

Why? Because missing a cancer cell on a slide is a big problem, and saying a cancer cell exists when it’s not actually there is a much smaller problem. The latter might require some extra medical tests, but it’s not going to lead to a fatal result. 
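A quick back-of-the-envelope calculation shows why the mix matters. The numbers below, a 10% cancer prevalence and the per-error costs, are made up for illustration; the point is that two models with the same "90%" headline can differ enormously once the errors are priced asymmetrically.

```python
# Hypothetical per-error costs (arbitrary units): a missed cancer (false
# negative) is assumed to be far more costly than an unnecessary follow-up
# test (false positive). All figures here are illustrative, not clinical data.
COST_FALSE_NEGATIVE = 100_000
COST_FALSE_POSITIVE = 500


def total_error_cost(false_negatives, false_positives):
    """Price a confusion-matrix error mix with asymmetric costs."""
    return (false_negatives * COST_FALSE_NEGATIVE
            + false_positives * COST_FALSE_POSITIVE)


# On 10,000 slides with 10% prevalence, both models below make 1,000 errors,
# so both are "90% accurate", but they split those errors very differently.
model_a = total_error_cost(false_negatives=500, false_positives=500)
model_b = total_error_cost(false_negatives=50, false_positives=950)

print(f"Model A (misses 500 cancers): {model_a:>12,}")   # 50,250,000
print(f"Model B (misses  50 cancers): {model_b:>12,}")   #  5,475,000
```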

This illustrates another widespread pattern within applied AI: data science courses don't usually teach us how to translate from internal measures like AUC (the area under the receiver operating characteristic, or ROC, curve) to the numbers that actually represent value to a customer, profits to a company, or ROI to an investor. Failure to understand problem-specific costs and constraints like these is another source of classic AI mistakes.
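As a sketch of what that translation can look like, the snippet below picks an operating threshold by minimizing expected error cost rather than by maximizing an internal score such as AUC. The scores are synthetic and the cost figures are the same illustrative assumptions as above; in a real engagement those costs would come from the customer, not from the data science team.

```python
# Sketch: choose the score threshold that minimizes expected cost to the
# customer, instead of reporting only an internal metric like AUC.
# Scores and costs are synthetic assumptions for illustration.
import random

random.seed(0)
COST_FN, COST_FP = 100_000, 500   # same illustrative costs as above

# Synthetic model scores: cancerous slides tend to score higher than clean ones.
positive_scores = [random.gauss(0.7, 0.15) for _ in range(1_000)]
negative_scores = [random.gauss(0.4, 0.15) for _ in range(9_000)]


def expected_cost(threshold):
    """Total error cost if we flag 'cancer' whenever score >= threshold."""
    fn = sum(s < threshold for s in positive_scores)    # missed cancers
    fp = sum(s >= threshold for s in negative_scores)   # unnecessary follow-ups
    return fn * COST_FN + fp * COST_FP


# Sweep candidate thresholds and keep the one the customer would actually want.
thresholds = [i / 100 for i in range(101)]
best = min(thresholds, key=expected_cost)
print(f"cost-optimal threshold: {best:.2f}")
print(f"expected error cost at that threshold: {expected_cost(best):,}")
```

Notice that the chosen threshold sits well below the one that would maximize raw accuracy: with these assumed costs, the system should happily trade extra false alarms for fewer missed cancers.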

I'll have much more to say about these and related points as we go forward with these posts. I'll talk about lessons learned both from my successes and from the situations where I was tripped up or outright fooled (I started in academics myself!). These lessons were not obvious at the beginning; they come from practical experience running into the constraints that arise when AI systems are applied to important, real-world problems.

In the next post I'll begin to lay out a systematic way to think about the art of applied AI, describing how to assess whether an AI project falls within the Goldilocks zone of likely success. Then I'll introduce the Crucial Questions framework, which forms the outline of the remainder of this book.