AI’s Hidden Humans
An Interview with Scale’s Alex Wang
Artificial intelligence might seem like magic, but it takes an army of human data labelers to do the trick. Around the world, distributed workers do things like identify what’s visible in images, helping train machine learning algorithms to navigate the physical world.
Their tasks may appear monotonous, but Scale CEO Alex Wang says that for many, it’s welcome work. Even as machine learning matures, there will still be a vital role for people, Alex believes.
Here are edited highlights of my podcast interview with Alex about teaching cars to drive, why software is not static, and how data labeling offers opportunity to people around the world.
Driverless ed
As I understand it, Scale provides labeled data for artificial intelligence and machine learning. If I’m building a self-driving car, I need to teach it to recognize things. This is a stop sign, this is a truck, this is a bush. The car’s AI requires labeled images to learn. I go to Scale because your company has people and technology to do that for me. Correct?
That’s exactly right. Our mission is to accelerate the development of AI applications. Many companies hit bottlenecks when building machine learning and AI products. Getting large, cleanly labeled data sets is a big one. It’s one of the dirty secrets of machine learning: we need massive amounts of data to train state-of-the-art machine learning algorithms to perform as well as, or better than, humans.
For example, the data set that Google used to train Google Maps to recognize house numbers was about 10 billion labeled images. That’s the piece that we solve for our customers today.
How is this work structured and — pardon the pun — what’s the scale of people who work on this?
We work with a lot of the leading self-driving companies, folks like Lyft and Zoox and Toyota. Our platform itself has already scaled up to tens of thousands of people who work on it frequently. Globally, in the whole self-driving industry, there are hundreds of thousands of people employed to help these algorithms learn.
But we’re in the early stages of the technology. For that growth to be sustainable — and for it to even be possible — as the technology grows, we need to automate as much of the labeling process as possible.
If we’re going to have any chance of deploying true self-driving cars, we need to be as efficient as possible while not sacrificing quality. These cars will make life-or-death decisions. It’s so crucial that they’re trained on high-quality data.
Getting better all the time
At this interim stage of the development of AI, are human data labelers needed only because machine learning isn’t good enough to do the labeling automatically? Will there be a point when there are just a few humans doing spot checks here and there? Or will hundreds of thousands of people do these jobs for the foreseeable future?
That’s a really great question. There are misconceptions about it, even within the tech industry. Some of the oldest machine learning efforts in the Valley — like Google search, Facebook ranking and Facebook moderation — still require a significant amount of human input on an ongoing basis to ensure that the models are performing better and better.
When companies deploy self-driving cars, it will still be really critical to have human input, tens of thousands or hundreds of thousands of people, to ensure that these algorithms are behaving correctly and continuing to get better.
I draw this analogy: back in the ’80s, people had the misconception that software engineers needed to build just a few apps: notes, email, calendar. And then we wouldn’t need any more engineering. If they were built well, people would use them.
As we now know, that wasn’t the case. There are many areas where custom software makes a big difference. And you can always make software better.
We’re at the same early stage with machine learning. We’ll always need to make self-driving algorithms better. We’ll always want to improve automated checkout models like Amazon Go.
What’s more, the number of applications will keep growing. Right now, even though it’s hyped up, people really underestimate the full impact of machine learning: both the number of applications and the true amount of work that will be required for each.
Engaging a truly global workforce
Let’s talk about the people who label this data. They often hail from underdeveloped countries or other places that lack economic opportunity. There’s a narrative that these employees are better off doing eight or 12 hours a day of rote, tedious, monotonous data labeling, rather than manual labor, for instance.
But I wonder what you think about that. The monotony affects performance and the quality of the data. From both an ethical and a practical perspective, do you build things into your technology that help people do this job in a more engaging way?
Definitely. The first thing is making data labeling into an engaging occupation. One of our inspirations is actually video games. Words with Friends or Candy Crush seem pretty monotonous at first. Yet people play them for hours and hours. That’s because they’ve been built to be enjoyable experiences where you can always get better. There are always ways for you to be creative.
We try to do the same thing with data labeling. From an engagement perspective, we’re constantly trying to show people the areas where they can improve, or share tricks for working more efficiently. We try to inspire people to get excited about the work and make it rewarding.
Then there’s the other piece: providing occupational opportunities. It’s hard for us in America to grasp this, but there’s very little industry in many places around the world. Globally — and this probably will continue to be true for quite some time — there’s really a dearth of opportunities for most people in the world.
Usually, there has to be some physical presence (a factory built near a village, for example) for people to have jobs. Now, anyone who has access to the Internet can access work-from-home opportunities like Scale. It allows these areas to progress much more quickly.
We have stories about people who have worked on our platform who’ve gotten medical attention for themselves or their parents that they couldn’t otherwise have gotten. Or they were able to build their dream home. As our platform scales and as the Internet scales in the world, many people are getting access to jobs and employment opportunities that they otherwise simply wouldn’t have.
Your relative youth is part of your story. You were a tech lead at Quora as a teenager. Forbes included you in its 30 Under 30 for enterprise tech. You’re still in your early twenties. How does that affect your work?
It doesn’t affect the company much. We have our mission: to accelerate the development of AI applications. We have customers we work very closely with. We have a job to do. We’re very thankful for the opportunity to work on such an important problem. That’s really the core of our work, day in and day out.