Justin Chen received his PhD in Physics from Rice University before deciding to make the jump from academia to machine learning. He spent the last few years as a Machine Learning Engineer at Manifold, building and deploying end-to-end ML and data pipelines from the ground up. He recently transitioned to Google, where he is currently on the hotword (“Ok Google”) team developing technology behind speaker identification and audio speech processing. Here, he talks about use cases, best practices, and what he’s learned along his journey into the field of ML.
How did you first make the transition from academia to machine learning?
It wasn’t easy. I think the challenge was understanding how best to showcase what I’ve done in a way that’s useful to the industry. So I shifted my resume from focusing on the research problems I solved in my PhD program to the mathematical methods I had to implement, the coding work I did, and the groups I had to organize to do it. This approach had much better traction and my success rate with landing job interviews was a lot higher after that.
What are some of the key skills you needed to develop to be a machine learning engineer?
I’d say being able to write code and then communicate about it with other people is one of the most crucial skills for succeeding as a machine learning engineer. Anyone with a computational PhD has the mathematical and programming background to thrive, but it’s more about navigating the different ways of working in a company. For me, the hardest thing to learn was having to produce code in between meetings, and more importantly, having to produce code with other people. This was a huge struggle for me, and I think a lot of people from academia might struggle with this too. I was very accustomed to coding by myself. I had never been through a code review, and certainly not one where someone cared about readability. At first it seemed like a lot of nitpicky suggestions, but I quickly realized that working in this industry is a very collaborative process. You can’t just do all the coding by yourself.
Your first role in machine learning was at Manifold AI. Tell us a little about your responsibilities there and some of the most interesting cases you worked on.
Manifold does AI consulting, partnering with other companies in various domains. They implement the entire AI workflow, all the way from data prep and data ingestion to deploying and monitoring models. For my job, I wore many hats and was everything from a data engineer to a front-end engineer doing dashboards (which I was not very good at) to a machine learning engineer.
The most interesting problems I worked on at Manifold were the ones I got to solve from end to end, because it’s actually pretty rare to do that now with all the frameworks out there. It was great to be able to work with data that wasn’t already organized in a way that was easily ingested into models and then design a system that could ingest the data, process it, create features, train the model, and then learn how to monitor it.
It was also nice to work in the healthcare space where there are interesting challenges around keeping data safe. You routinely deal with PII (personally identifiable information) and PHI (protected health information), and if you can’t even dive into the data to see it for yourself, how can you build the model? Manifold did a great job of siloing, protecting, and working around those challenges.
What’s your advice for starting to build a model once you’ve finished the experimentation phase and how do you make sure it’s ready for production?
People outside the field are often eager to jump to this really cool thing they’ve heard of, like image recognition or neural networks, and they want to know if you can build a brain that does everything for them. And it might not even be the right place to start, because if you’ve never done basic ML models—linear regression, SVMs (support-vector machines), or random forest algorithms—you don’t know if neural networks will even help.
Oftentimes it’s better to get the data, fill the overall pipeline from end to end, and then start with a basic model and see what the baseline is. See what you can get with pure linear regression or pure decision trees. And then you can start to figure out where the model is underperforming and how to address it before you work your way up towards more complicated models.
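The baseline-first idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data and the helper names `fit_line` and `mean_absolute_error` are hypothetical and do not come from the interview. The sketch fits a one-feature linear regression and compares it with an even simpler predictor that always returns the training mean.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, made up for this sketch.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
test_x = [6.0, 7.0]
test_y = [12.0, 14.1]

# Baseline 1: plain linear regression.
slope, intercept = fit_line(train_x, train_y)
linear_pred = [slope * x + intercept for x in test_x]

# Baseline 0: always predict the training mean.
train_mean = sum(train_y) / len(train_y)
mean_pred = [train_mean] * len(test_x)

linear_mae = mean_absolute_error(test_y, linear_pred)
naive_mae = mean_absolute_error(test_y, mean_pred)
print(f"linear MAE: {linear_mae:.3f}, mean-only MAE: {naive_mae:.3f}")
```

Any more complicated model would then have to beat the linear error on the same held-out data to justify its extra complexity.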
It’s tough to know when something is truly ready for production. I think the biggest mistake I’ve made is spending too much time trying to perfect, refine, and retrain models. What I’ve found to be a good strategy is to identify one or two metrics that are relevant to your problem, and when you launch that model, compare its performance on those metrics against a baseline model. If it doesn’t do better on those one or two metrics, you should go back to the drawing board. I’ve been in situations where we tried to track five or ten metrics, but we soon realized that it’s impossible to evaluate a model when you have too many metrics to look at. It will always do better on some and worse on others. So once you’ve found the most relevant metrics and the model beats the baseline on them, that’s probably when you’re ready to launch.
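One way to picture this launch check is as a small gate that compares a candidate model with the baseline on a deliberately short list of metrics. The sketch below is an assumption-laden illustration: the function `ready_to_launch`, the metric names, and all the numbers are hypothetical, and a real team would choose its own metrics and thresholds.

```python
def ready_to_launch(baseline, candidate, higher_is_better):
    """Return True only if the candidate beats the baseline on every chosen metric.

    `higher_is_better` maps each metric name to True (larger is better)
    or False (smaller is better). Keep this mapping to one or two entries.
    """
    for name, better_high in higher_is_better.items():
        if better_high and candidate[name] <= baseline[name]:
            return False
        if not better_high and candidate[name] >= baseline[name]:
            return False
    return True


# Hypothetical metrics and values, chosen only for illustration.
metric_direction = {"recall": True, "latency_ms": False}
baseline_metrics = {"recall": 0.81, "latency_ms": 40.0}
candidate_metrics = {"recall": 0.86, "latency_ms": 35.0}

launch = ready_to_launch(baseline_metrics, candidate_metrics, metric_direction)
print("launch" if launch else "back to the drawing board")
```

Keeping the metric list short is the point of the design: with five or ten entries, a strict gate like this would almost never pass.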
Can you tell us about your new position at Google and the differences in responsibilities from your job at Manifold?
My role now at Google is so much more focused on working on the ML part of things. Manifold was constantly building models for different partners and clients and I had to wear many hats, whereas at Google, I’m working on one particular problem and there’s a well-established path for a lot of the basic DevOps and setup.
What are you most excited about in your new role and what kind of problems are you hoping to solve?
Without going into too much detail, I’m working on identifying speech actively and solving the interesting problem of doing it efficiently on small devices. For me, the most exciting thing with this particular problem is learning NLP (natural language processing) and audio speech processing and being able to work on these more advanced methods in an environment where I really get to focus on this one problem.
Can you give us an example of a challenge you’ve faced in your ML experience and how you went about solving it?
A general problem is knowing where to start when you’re handed so much data. You can’t possibly test every one of a hundred features, and building a model this way is very hard. What works better is finding the subject matter expert in the company who already has some idea about what keeps users happy and what makes them unhappy. Find out what patterns they’ve noticed about when people do and don’t want to buy. Once you have those subject matter experts, you can narrow down the focus to some set of features, and then from there, the problem becomes more tractable.
We’ve talked about putting models into production and not being able to predict what happens. How have you tackled any situations where the inference data or model started to drift?
My biggest lesson from putting models in production is to put them in production on Mondays—because there’s no way to predict what kind of data’s going to be thrown at your model. The best part about testing a model in a test environment that replicates reality as closely as possible is that downstream consumers are not affected if something goes wrong, and if some weird upstream data comes in, you can catch it. But then eventually you put the model into production and monitor it for a while. Inevitably, some assumptions are going to be wrong and then your model could break. So the key thing here is monitoring regularly and having performance metrics on the data so that any aberrations raise an alarm.
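The kind of alarm described above can be sketched as a simple check on one input feature: compare the mean of recent production values with a reference sample from training and raise a flag when the gap is implausibly large. This is a minimal sketch, not the monitoring setup used in the interview. The function name `drift_alarm`, the z-score approach, the threshold of 3.0, and the sample values are all assumptions made for illustration.

```python
import statistics


def drift_alarm(reference, live, z_threshold=3.0):
    """Flag drift when the live mean is far from the reference mean.

    The distance is measured in standard errors of the live sample mean,
    estimated from the reference standard deviation. Assumes the reference
    sample has at least two values and is not constant.
    """
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    standard_error = ref_std / len(live) ** 0.5
    z_score = abs(statistics.mean(live) - ref_mean) / standard_error
    return z_score > z_threshold


# Hypothetical feature values seen during training.
reference_values = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1, 10.3, 9.7]

# Two hypothetical batches of production data.
stable_batch = [10.1, 9.9, 10.2, 10.0]
shifted_batch = [12.0, 12.4, 11.8, 12.1]

stable_alarm = drift_alarm(reference_values, stable_batch)
shifted_alarm = drift_alarm(reference_values, shifted_batch)
print(stable_alarm, shifted_alarm)
```

In practice a check like this would run on a schedule for each monitored feature and for the model's output metrics, with the threshold tuned to how noisy the data is.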
Can you talk about your approach to AI fairness and bias tracing?
I don’t think there’s a well-established path to preventing bias in your model, and this idea of throwing more data at it doesn’t always work. I think you need to focus on explainability. I worked on a project where I implemented SHAP (SHapley Additive exPlanations) values, but no model is going to be perfect. Some are at least useful for finding big biases and slowly working them down.
In order to really tackle it, you have to have a human in the loop who’s actively looking for possible biases and actively trying methods to monitor these things, because bias is something that absolutely fails silently. You will have no idea that it’s happening unless you’re trying to find it, and even then, you’re going to miss some. And because bias itself is so subjective, it’s important to have a diverse group of humans in the loop, because each person will see the problem differently and notice things the others miss. Having that kind of environment will always perform better than having just one person or a small effort trying to identify bias.
What’s your take on working for a startup versus a bigger, more established company?
There’s a certain excitement that comes from working at a startup and having to do things like code by the seat of your pants that you just don’t get at a bigger company. I did want to try that out, but I also wanted to grow in other directions. For me now, it’s been great to be able to focus on just one problem that is particularly interesting rather than trying to do tons of other things.