Wei Wen, PhD student
Researching deep learning models with Hai "Helen" Li, Co-Director of the Duke Center for Evolutionary Intelligence
B.S. and M.S., Electronic and Information Engineering, Beihang University, Beijing, China
Why is deep learning so popular right now?
The straight answer is that deep learning achieves human-level or even super-human prediction accuracy in very challenging applications across computer vision, speech recognition, natural language processing and games.
In recent years, we’ve had a lot of breakthroughs in deep learning, for three main reasons. First, we have far more labeled data than we did decades ago, for instance, lots of images labeled with object categories. Second, hardware is more powerful. In the past, training a deep neural network model took forever; it wasn’t efficient. Then GPUs came along and gave us the computing power to train these models. Finally, we better understand the model we use now, the neural network, which is more powerful than other machine learning models and flexible enough to approximate almost any mathematical function.
What are the practical applications of research in deep learning?
One application is computer vision, which involves detecting and classifying objects in images and videos. In self-driving cars, this means detecting red lights, green lights or pedestrians. Another application is recommendation systems. Say you are shopping online: the system can analyze your shopping history and interests and recommend products you might be interested in. There is also natural language processing and speech recognition: when you speak one language, the system can recognize what you say and translate it into another language.
This is a broad field, it seems. What specific projects are you working on?
The deep learning model has two phases: training and inference. We train the model with labeled data, and then run inference, that is, use the trained model to perform the task. I have projects in both phases.
First, there’s the training phase. We have a lot of data to train with, and the model is very large, so we split the data among servers in the Cloud and train in parallel. The problem is that the servers have to synchronize, averaging their models to get the final model. Synchronizing means communicating over the network, and the network is slow, so communication becomes the bottleneck and training can take hours. My first project is to reduce that communication time, cutting it from hours to minutes.
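To make the synchronization bottleneck concrete, here is a minimal sketch under stated assumptions; it is not the method from this project. It assumes synchronous data-parallel training, where each server computes a gradient on its share of the data and the servers average those gradients (equivalent to averaging the models after each step). It swaps in one well-known way to shrink what goes over the network, stochastic ternary gradient quantization: each gradient element becomes one of {-s, 0, +s}, so it can be sent with about 2 bits instead of a 32-bit float, plus a single scale s. The function names `ternarize` and `synchronize` are hypothetical.

```python
import numpy as np


def ternarize(grad, rng):
    """Hypothetical helper: stochastically quantize a gradient to {-s, 0, +s}.

    s = max|grad|. Each element keeps its sign (scaled to s) with probability
    |g| / s and becomes 0 otherwise, so the expected value of the output
    equals the original gradient (an unbiased estimate).
    """
    s = np.max(np.abs(grad))
    if s == 0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < np.abs(grad) / s
    return s * np.sign(grad) * keep


def synchronize(worker_grads, rng, quantize=True):
    """Hypothetical helper: average per-worker gradients, as a parameter
    server or all-reduce step would, optionally ternarizing them first to
    reduce the bytes each worker sends."""
    if quantize:
        worker_grads = [ternarize(g, rng) for g in worker_grads]
    return np.mean(worker_grads, axis=0)


# Usage: four simulated workers, each with a 1000-element gradient.
rng = np.random.default_rng(0)
grads = [rng.normal(size=1000) for _ in range(4)]
avg = synchronize(grads, rng)
print(avg.shape)
```

The design choice here is unbiasedness: because each quantized gradient equals the true gradient on average, stochastic gradient descent can still converge, trading some extra noise for much less communication.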
Second, if we deploy a powerful model to mobile devices, computation becomes very slow because the model is so large. It can’t do real-time image classification or speech recognition, which is one problem. Computation also uses a lot of energy, which is another problem, because mobile devices have limited energy and their batteries drain quickly. The problem is not limited to mobile devices: any device on the Edge, like a drone or a self-driving car, has to carry its own power supply and has limited resources, and it cannot simply request service from the Cloud. It must operate independently of the Cloud for privacy and reliability. So, how can we efficiently deploy big models to small devices? One of my projects in this area is to compress the large model so that it executes tasks faster and consumes less energy, without losing intelligence.
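Model compression can take many forms; a minimal sketch of one common form, structured pruning, may help make the idea concrete. It assumes a hypothetical two-layer fully connected network written in NumPy and removes the hidden neurons whose weights have the smallest norms. Because whole rows and columns disappear, the pruned layers are smaller dense matrices that need fewer multiply-adds, which is what makes inference faster and cheaper in energy. The scoring rule and the names `prune_neurons` and `forward` are illustrative assumptions, not the specific method from this project; in practice the network is usually fine-tuned after pruning to recover accuracy.

```python
import numpy as np


def prune_neurons(w1, b1, w2, keep_ratio):
    """Hypothetical helper: structured pruning of a 2-layer MLP.

    Shapes: w1 (hidden, in), b1 (hidden,), w2 (out, hidden).
    Removing hidden unit j deletes row j of w1, entry j of b1 and column j
    of w2, so the result is a smaller, still-dense network.
    """
    hidden = w1.shape[0]
    n_keep = max(1, int(round(keep_ratio * hidden)))
    # Assumed heuristic: score each neuron by the product of the L2 norms of
    # its incoming and outgoing weights; low scores contribute little.
    scores = np.linalg.norm(w1, axis=1) * np.linalg.norm(w2, axis=0)
    # Keep the n_keep highest-scoring neurons, preserving their original order.
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return w1[keep], b1[keep], w2[:, keep]


def forward(x, w1, b1, w2):
    """Forward pass: ReLU hidden layer followed by a linear output layer."""
    return w2 @ np.maximum(w1 @ x + b1, 0.0)


# Usage: prune a random 8-neuron hidden layer down to 6 neurons.
rng = np.random.default_rng(1)
w1, b1, w2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8))
w1p, b1p, w2p = prune_neurons(w1, b1, w2, keep_ratio=0.75)
print(w1p.shape, w2p.shape)
```

Removing whole neurons rather than scattered individual weights is a deliberate choice: irregular sparsity saves memory but often does not speed up dense matrix hardware, whereas a smaller dense matrix runs faster on ordinary CPUs and GPUs.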
What’s a typical day for you?
It’s pretty simple! I go to my desk, read papers, write code and implement my ideas. We have group meetings, one-on-one meetings and discussions with professors and students.
I’m a TA for “Introduction to Deep Learning,” so I also have office hours. Teaching is different from research: you have to express ideas that you know well, very clearly and concisely. Sometimes I think I already know the basics very well, but when I try to explain them, I find there are things I’m not 100 percent sure about, and I have to find the answers and learn new things.
You had a couple of interesting internships while you were here. Can you talk about those?
I interned at Microsoft Research in Seattle a year ago, working on efficiency for deep learning inference. Microsoft has many powerful machines in the Cloud, but it handles trillions of AI requests. At that volume, it still has efficiency problems and needs to compute faster.
And this year, I interned at Facebook Research in Menlo Park. Facebook trains models on its data and is interested in training efficiently in the Cloud. I also got the chance to contribute to open-source AI frameworks at Facebook.
At both organizations, I incorporated my research into industrial AI products.
What are your future plans?
I’m struggling to decide. I could go to industry or academia. In academia, I could have more freedom to work on what I’d like. Here at Duke, I have a lot of freedom. My advisor allows me to do any kind of research I want to do, and that’s been a very good experience.
I’d continue my two lines of work, but I’d also like to work on understanding how machine learning systems work. One big problem in deep learning is that we keep feeding data to the big model, but we don’t know what is happening inside the model. We’re human, so we want to understand what the model really learns.
For example, in medical imaging, a model can identify that a patient might have cancer, but right now the model cannot explain very well how it came to that conclusion. The model has very high accuracy, but the doctor involved has to know why the prediction was made, and the model cannot tell us why. The data inside the model is so high-dimensional that it’s hard to parse; it’s just floating-point numbers. It’s a very interesting research question, and we have to solve it.
What do you see as the strengths of your PhD program at Duke?
The most competitive thing about Duke is that we have the top experts in the world. And we have proximity to the Research Triangle; we are not isolated in our research. I attended Triangle Machine Learning Day, where there were experts from three research universities as well as researchers from companies like IBM. We mingled, presented research… there is a Triangle research culture here that is awesome.