How my practicum prepared me to kick off my career as a data scientist
After a six-month practicum as a data scientist at Public Storage, the world’s largest self-storage operator, where I focused on e-commerce web analytics, I grew quickly and gained several new perspectives on my analytics career.
One year ago, I obtained my bachelor’s degree in law from Tsinghua University. However, I moved from a liberal arts path to a more quantitative role after I read Hans Rosling’s book Factfulness: Ten Reasons We’re Wrong About the World — and Why Things Are Better Than You Think. Reading it, I was shocked by how much my instincts were preventing me from drawing sound conclusions about the real world. I came to believe that the root cause was a lack of the analytical knowledge needed to understand the world more objectively.
So I decided to equip myself with the power of analytics to help me explore the world. I joined the M.S. in Business Analytics program at UC Davis and got good grades in all my statistics and programming courses, which gave me the illusion that I was already well equipped for a data scientist role. It was only when I dived into my practicum that I realized the job is far more complicated than applying statistical theorems or mastering programming languages and algorithms. If you are not careful with data, your analysis can lead you to results that are even more biased than raw instinct. That can be fatal, especially when you are providing key business recommendations to your boss.
To become a competent data scientist, I would say you must get your hands dirty from the very beginning, starting with collecting and cleansing data, make some mistakes, and learn from them. Here are the three most important lessons I took away from my practicum experience.
1. Focus on a primary business objective and set a proper project scope
It’s understandable that at the very beginning of any project, people are ambitious and tend to plan a long list of deliverables. That’s what I did at the start of my practicum project. Our main goal was to create a net-new customer segmentation for Public Storage. However, we set up too many side tasks that did not directly contribute to the segmentation, and we did not allocate time to tasks according to their priority. As a result, we were very tight on time in the initial phase. It is therefore well worth the time and effort to decide what your primary business objective is, and what falls within the scope of the project, before you start planning the details.
2. Exploratory Data Analysis (EDA) is more valuable than we thought
Exploratory Data Analysis is often regarded as an elementary step in analytics. It usually demands little statistical knowledge and only light coding. As a result, people tend to think it is not cool to spend much time on EDA, since it does not showcase advanced or fancy analytics skills. I used to think that way as well. In reality, EDA can be extremely useful and powerful throughout the whole process of data analysis.
At the initial stage of an analytics project, EDA helps us check data availability and data quality. Only by knowing what kind of data we have can we determine which analytics methods are appropriate and what kind of preprocessing is required. EDA also helps us spot interesting patterns in the data distribution through intuitive visualization, which lets us form meaningful hypotheses for our next steps. Overall, EDA familiarizes us with the dataset and lets us dive into it with clear hypotheses in mind.
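As a rough illustration of the kind of first-pass quality check described above, here is a minimal sketch in plain Python. The column names (`unit_size`, `zip_code`, `reserved`) and the toy rows are hypothetical and are not taken from the actual Public Storage data; treating `None` and empty strings as "missing" is also an assumption.

```python
from collections import Counter

def missing_rate_by_column(rows):
    """Return the share of rows in which each column is missing.

    A value counts as missing if it is None or an empty string
    (an assumption made for this sketch).
    """
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) in (None, "")) / total
        for col in columns
    }

# Hypothetical toy records standing in for web-analytics data.
rows = [
    {"unit_size": "10x10", "zip_code": "95616", "reserved": 1},
    {"unit_size": None, "zip_code": "95618", "reserved": 0},
    {"unit_size": "5x5", "zip_code": "", "reserved": 0},
    {"unit_size": None, "zip_code": "95616", "reserved": 1},
]

missing_rates = missing_rate_by_column(rows)

# Class balance of the target tells us whether one outcome dominates.
class_balance = Counter(row["reserved"] for row in rows)

for column, rate in missing_rates.items():
    print(f"{column}: {rate:.0%} missing")
print("target balance:", dict(class_balance))
```

A column with a high missing rate might need imputation or might have to be dropped, and a heavily imbalanced target would influence which modeling approach and evaluation metric make sense.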
When we apply more complicated analysis to the dataset, we can also use EDA to improve our modeling. For instance, when I was predicting whether a customer would reserve a Public Storage property, I found that the recall of my model was quite low, which means that there were many false negatives. I suspected this was because some customers who intended to make a reservation had feature patterns very similar to those of customers who would not. So I ran an EDA to compare the distributions of the two groups across different features, and it confirmed my hypothesis. I then excluded the features that contributed least to predictive power (the figure below shows one such feature, where the two groups have very similar patterns) and tried regrouping and adding more meaningful features to my model. This improved my model’s predictive performance by 10%.
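The diagnosis described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method I actually used in the practicum: the feature names, the toy values, the use of a standardized mean difference as the similarity measure, and the 0.1 cutoff are all hypothetical choices.

```python
from statistics import mean, pvariance

def recall(y_true, y_pred):
    """Recall = TP / (TP + FN): the share of actual positives the model caught."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def standardized_mean_difference(group_a, group_b):
    """Absolute gap between group means, in units of the pooled standard deviation.

    Values near 0 suggest the feature barely separates the two groups.
    """
    pooled_sd = ((pvariance(group_a) + pvariance(group_b)) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0
    return abs(mean(group_a) - mean(group_b)) / pooled_sd

# Toy labels: 1 = made a reservation, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
model_recall = recall(y_true, y_pred)  # three of four reservers were missed

# Hypothetical feature values, split by the true outcome.
reserved = {"pages_viewed": [3, 4, 5, 4], "minutes_on_site": [12, 15, 14, 13]}
not_reserved = {"pages_viewed": [3, 5, 4, 4], "minutes_on_site": [2, 3, 4, 3]}

SIMILARITY_CUTOFF = 0.1  # assumed threshold, chosen only for illustration

weak_features = [
    name
    for name in reserved
    if standardized_mean_difference(reserved[name], not_reserved[name]) < SIMILARITY_CUTOFF
]

print("recall:", model_recall)
print("features that do not separate the groups:", weak_features)
```

In practice I would look at the full distributions (histograms or density plots) rather than a single summary number, because two groups can share a mean and still differ in shape.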
3. Accurately defining business metrics is key to meaningful recommendations
Peter Ferdinand Drucker is often credited with the saying “if you cannot measure it, you cannot improve it”. The American statistician W. Edwards Deming is likewise quoted as saying, “In God we trust, all others must bring data.” Both point to the same idea: we cannot make informed decisions unless our metrics are accurately defined, quantified, and measured in a tangible way that is supported by data.
Defining business metrics requires a deep understanding of the business, industry domain knowledge and acumen, and effective communication with stakeholders. The same thing can be quantified and presented in different ways. For example, to measure revenue growth, we could calculate the absolute increase in revenue over a given period, or we could calculate customer lifetime value. Using different measurements, we will probably reach different conclusions when analyzing the same dataset. Hence, we need to clarify the business metrics and specify their calculation formulas before we measure any business outcome or make recommendations to management. Doing so also lets us adjust our analytic process to achieve the desired deliverables.
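To make the point concrete, here is a small sketch of how two definitions of "revenue growth" can disagree on the same data. All numbers are invented for illustration. The customer lifetime value formula used here (average monthly revenue per customer multiplied by average customer lifetime in months) is a deliberately simplified assumption; real definitions often account for margins, discounting, and acquisition cost.

```python
from dataclasses import dataclass

@dataclass
class Period:
    customers: int
    avg_monthly_revenue: int   # per customer, in dollars (hypothetical)
    avg_lifetime_months: int   # how long a typical customer stays (hypothetical)

    def total_monthly_revenue(self) -> int:
        """Metric 1: total revenue collected per month."""
        return self.customers * self.avg_monthly_revenue

    def customer_lifetime_value(self) -> int:
        """Metric 2: simplified CLV = monthly revenue per customer x lifetime."""
        return self.avg_monthly_revenue * self.avg_lifetime_months

# Invented figures: more customers, but cheaper and shorter rentals.
before = Period(customers=100, avg_monthly_revenue=100, avg_lifetime_months=10)
after = Period(customers=120, avg_monthly_revenue=95, avg_lifetime_months=8)

revenue_change = after.total_monthly_revenue() - before.total_monthly_revenue()
clv_change = after.customer_lifetime_value() - before.customer_lifetime_value()

print("change in total monthly revenue:", revenue_change)
print("change in customer lifetime value:", clv_change)
```

With these made-up numbers, total monthly revenue rises while lifetime value per customer falls, so the same period looks like growth under one metric and decline under the other. That is why the formula has to be agreed on with stakeholders before anyone reads the result.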
The journey to becoming a competent data scientist requires relentless hands-on experience and constant reflection. I look forward to sharing more of what I learn as my career progresses.