Getting a foot in the door as a young data scientist

sibmike
4 min read · Nov 27, 2020

As an international student, I expected that finding a Data Science internship during the COVID-19 lockdown would be extremely hard. Yet I was lucky enough to get one. A Physics Ph.D. sent me a message: “Hey Mike, I’m looking for data scientists.” I replied that I wanted to hear more. We talked over the phone for 15 minutes, and I got an offer. That sounds too good to be true, but such things have happened to me before.

The first time it happened was when I got a Research Assistant position while working towards my bachelor’s degree. An econometrics professor asked us to collect data for our regression projects. Mine was on real estate pricing. I tried to collect data from Zillow manually, but homes have dozens of attributes, so my best speed was around 30 homes per hour. Automating the process seemed like a good time investment. With Python, private proxies, and Selenium, I increased the data collection speed to 30,000 homes per hour. I also built a pipeline to clean the data, remove outliers, fill in missing values, and calculate a few powerful features, and I submitted a “data-acquisition-report” to the professor. In the next class, she offered me a research assistant position to build an ETL pipeline for her research project. I had full freedom in creating the pipeline, from choosing data sources to database architecture to ETL tools, but a very modest budget. To meet the budget constraint, our team had to be creative and savvy. In the end, instead of paying tens of thousands of dollars to third-party data providers each month, we built a streaming data pipeline that worked 24/7 and cost us just $375/mo. As graduation approached, I had to start looking for my next place.

The second time, I got lucky after graduation. I knew the job search would be challenging, so I tried to prepare ahead of time. My plan A was to get a job. I grinded LeetCode, studied SQL, and started competing on Kaggle. I sent hundreds of personalized resumes and went to dozens of interviews without any success. My plan B was to create a real-estate startup based on my research project and become self-employed. As part of plan B, I went to Y Combinator Startup School and networked as much as I could, trying to validate or invalidate my startup idea. Once, I pitched my startup idea to a real estate investor; he was not impressed with my pitch but was impressed with my Business Analytics background and offered me a job on the spot. Over the next year, I learned a lot about business, negotiations, and real estate investments. I also helped identify and remove the company’s bottlenecks and double its revenue in less than 12 months. However, my goal was to become a Data Scientist, so I had to move on.

I was lucky to be admitted to the SFSU MSBA program, which has a spot-on curriculum and great professors at a reasonable cost. I was unsure if I would get lucky one more time, so I decided to start preparing for a job search from day one. Throughout my first semester, I focused on building up my network and refreshing my LeetCode and DS skills. My LinkedIn went from 6 connections with family and friends to 2,500+ connections with Kaggle masters and data science professionals. To prep for coding interviews, I joined a LeetCode prep group where Data Scientists from Lyft, Uber, and other local startups spent Saturday mornings doing 8-hour code marathons. I took time series and demand forecasting classes to refresh my Data Science skills and joined the M5 Kaggle competition to make sure my DS muscles were still there. The competition ended in June; we placed 32nd out of 8,000+ in Accuracy, and I got a solo silver in Uncertainty. The plan was to join a deep learning competition next, but then I got a message: “Hey Mike, I’m looking for data scientists.”

During the interview with my future boss, we mostly talked about data solutions, my work experience with the research team, and my recent Kaggle competitions. I had code examples on Kaggle and GitHub, so I did not have a coding interview. (LeetCode grinding is yet to pay off.) Despite my previous experience, I was not ready for the new tasks. I had to learn very quickly: Git, Docker, CI/CD, FastAPI, unit and integration testing, async load testing, microservice architectures, Google Cloud and AWS best practices and architectures, adversarial feature selection, DNN concept drift monitoring, domain transfer problems, as well as quite a bit of physics and optics. In my day-to-day work, I spend 80% of my time building well-documented and tested data pipelines, 10% on data cleaning and feature engineering, and 10% on modeling and fast prototyping. I also present and explain technical decisions to the CEO, CTO, and VP of AI, and communicate with software engineers, scientists, and biostatisticians. A fast-paced startup is probably the best place to learn and apply new technologies and best practices. During the internship, I do not have time for Kaggle anymore, but I practice LeetCode whenever I can, as I feel that it will help me get lucky next time.

It took me over three years of constant effort to get a foot in the door of the Data Science industry. That may be considered slow, but I still feel lucky, as luck has helped me a lot. I simply tried to maximize the probability of getting a chance by trying every way of finding a job, and to maximize the likelihood of seizing the opportunity by maxing out relevant skills. Where there’s a will, there’s a way.

As graduation approaches, my plan A is to get a full-time job; my plan B is to start a demand forecasting or real-estate investment startup and become self-employed :)
