Meet the Data Experts
Have you ever wondered what all those “data people” do each day?
It could be your daughter, husband, colleague, or mentor, but every time they discuss their job, your eyes just glaze over. Data science, data analytics, and all those “data” terms have ambiguous and overlapping definitions. On top of that, job postings from many organizations lump half a dozen data skills together for any and all job titles, which is often unrealistic. This post is my attempt to sort through the BS and assign proper categories for each “data expert”. The categories are primarily influenced by job postings at Google, Amazon, Apple, and Facebook. If anyone should know the proper definitions for these roles, it’s the tech giants.
These categorizations will help you understand the strengths and weaknesses of each individual and which questions to ask each person. They will also help aspiring data analytics professionals understand which job path makes the most sense for them. These positions frequently overlap and each individual is unique, but this should serve as a directionally correct guide to defining these roles.
Defining the Data Experts
There are dozens of different job roles for people with data expertise; however, most positions tend to match up closely with the following ten:
Role | Summary | Key Phrases | Primary Software | Secondary Software |
---|---|---|---|---|
Data Engineer | They design, develop, support, and connect systems that store or report data. Data engineers get the data to you in a clean format before any analysis can be done. You want a perfectionist in this role; otherwise, the data might end up dirty. | Data modeling; ETL; Data warehouse; Data pipelines | Excel, Python, SQL, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP | JavaScript, Java, C/C++ |
Business Intelligence Engineer | BI engineers are data engineers who are slightly weaker at programming but more adept at gathering business requirements. They design, develop, document, and manage scalable solutions for new and ongoing metrics, reports, analyses, and dashboards to support business needs. | Requirements gathering; Reports, metrics, dashboards; Data warehouse; ETL; Databases | Excel, Python, SQL, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP | C/C++, Hive, Spark, Hadoop, Pig |
Analyst | Combine domain knowledge with some decent programming skills and you can get some quick, useful insights from the data. This individual loves exploring data to find interesting subsets. Their technical skills favor breadth over depth. Speed takes priority over perfect, clean code. | Insights and analysis; Data visualization; Macros; Automation | Excel, SQL, Python, Tableau, pandas, dplyr, Alteryx, Power BI | AWS, Azure, GCP |
Senior Analyst | Same as the analyst, but with more experience. They tend to understand the business a little more and are faster at analyzing the data. | Insights and analysis; Data visualization; Macros; Automation | Excel, SQL, Python, Tableau, pandas, dplyr, Alteryx, Power BI | AWS, Azure, GCP |
Analytics/BI Manager | A manager whose direct reports include data engineers and analysts. They manage various technical projects aimed at building business intelligence tools. This individual communicates with senior-level stakeholders on a semi-regular basis. They must be a generalist who grasps a wide array of technical concepts. | Requirements gathering; Reports, metrics, dashboards; Data warehouse; ETL; Communication; Insights; Leadership | Excel, PowerPoint, Word, SQL, Tableau, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP | Python, C/C++, Java, JavaScript |
Applied Machine Learning Engineer | Applied machine learning engineers possess a strong understanding of how to use algorithms to churn through large data sets and glean useful insights. This individual understands the assumptions and assessments needed to build a useful model. They must be able to cope with failure because finding a useful model requires a lot of tinkering and testing. | Machine learning; Data mining; AI; Experimentation; Statistical models; Algorithms | Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL | AWS, Azure, GCP, Hive, Spark, Pig, Hadoop, Java, C/C++ |
Statistician | Statisticians are similar to Applied Machine Learning Engineers; however, they have stronger statistics expertise but are typically weaker at programming. Statisticians help decision makers come to safe decisions under uncertainty. They identify conclusions that can be made beyond the data. | Machine learning; Data mining; AI; Experimentation; Statistical models; Algorithms | Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL, Stata, SAS | AWS, Azure, GCP, Hive, Spark, Pig, Hadoop |
Data Scientist | The Data Scientist is a combination of the Senior Analyst, Applied Machine Learning Engineer, and Statistician. The best Data Scientists possess the skill sets of all three roles; however, these are called unicorns because they are hard to find. Most data scientists have only a subset of these skills. | Machine learning; Data mining; AI; Experimentation; Statistical models; Algorithms; Databases | Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL | AWS, Azure, GCP, Hive, Spark, Pig, Hadoop, Java, C/C++ |
Data Science Lead | The Data Science Lead manages the data science team and ensures that they add value to the business. They manage projects aimed at extracting value from data using statistical and machine learning models. Direct reports include analysts, engineers, statisticians, and data scientists. This individual communicates with senior-level stakeholders on a semi-regular basis. | Machine learning; Communication; AI; Leadership; Cloud; Experimentation | Excel, PowerPoint, Word, Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL, AWS, Azure, GCP | Hive, Spark, Pig, Hadoop, Java, C/C++ |
Decision Maker | The Decision Maker, typically a director, understands the science AND art of decision making. They are responsible for identifying areas where data can provide value, framing the appropriate use case, and ensuring their team executes on the project. They must understand the analytics as well as the potential impact on the business. This individual communicates with senior-level stakeholders on a regular basis. | Machine learning; Communication; AI; Leadership; Vision; Cloud | Excel, PowerPoint, Word, SQL, Tableau, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP | Java, Python, C/C++ |
Scoring the Data Experts
Each skill is rated on a scale of 1-5 (5 being the best). A 1 does not mean this person is unskilled in this area; the scores are relative to the other data analytics roles within the organization. Let’s establish the six skill sets on which we graded each role:
1.) Domain Expertise – Understanding how insights and analyses affect the business.
2.) Data Manipulation – The ability to write clean code or configure a tool that can ingest, manipulate, analyze, model, and visualize data. Depth matters more than breadth because technical skills easily translate between tools.
3.) Communication Skills – The ability to explain complicated technical concepts to a non-technical audience.
4.) Managerial Competence – Requires emotional intelligence, strong social skills, and the ability to communicate clear responsibilities to each member of the team. Must be a leader who can manage direct reports.
5.) Databases – Understanding systems that store data, including data model designs, databases, data warehouses, and data pipelines. This includes knowledge of SQL and NoSQL databases as well as the cloud.
6.) Statistics – Knowledge of statistical and machine learning models, including the appropriate assumptions, use cases, and conclusions that can be made beyond the data set. This includes knowledge of statistics and machine learning software tools such as R, Python, and SAS.
Visualizing the Data Experts
Assumptions
– The analysts are positioned close to the business (e.g. Business Analyst, Policy Analyst, etc.).
– The Data Scientist chart displays an upper bound of their abilities (i.e. a “unicorn”).
Interpreting the Charts
The above charts are called spider (or radar) charts. Each category has its own axis running from the center of the “web” to its edge; the farther a point sits from the center along that axis, the higher the value for that category.
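For readers curious about the geometry, here is a minimal sketch in Python of how a score on each of the six skill axes could be turned into a point on a radar chart. It is not the code used to draw the charts in this post (those were made in R). The function name `radar_vertices`, the choice to start at 12 o'clock and move clockwise, the mapping of a score of 5 to the outer edge, and the example scores are all assumptions made for illustration.

```python
import math

# The six skill categories named in the scoring section.
SKILLS = [
    "Domain Expertise",
    "Data Manipulation",
    "Communication Skills",
    "Managerial Competence",
    "Databases",
    "Statistics",
]

MIN_SCORE, MAX_SCORE = 1, 5


def radar_vertices(scores):
    """Return one (x, y) point per skill for a radar chart of radius 1."""
    if len(scores) != len(SKILLS):
        raise ValueError("expected one score per skill")
    points = []
    for i, score in enumerate(scores):
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"score {score} is outside {MIN_SCORE}-{MAX_SCORE}")
        # Assumption: axes are evenly spaced, starting at 12 o'clock
        # and moving clockwise.
        angle = math.pi / 2 - 2 * math.pi * i / len(SKILLS)
        # Assumption: distance from the center is proportional to the
        # score, so a 5 reaches the outer edge of the web.
        radius = score / MAX_SCORE
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical scores for illustration only; NOT the values from the charts.
example_scores = [3, 4, 2, 1, 5, 2]
for skill, (x, y) in zip(SKILLS, radar_vertices(example_scores)):
    print(f"{skill:22s} x={x:+.2f} y={y:+.2f}")
```

Connecting the resulting points in order gives the polygon you see on a spider chart: a role with high scores everywhere fills most of the web, while a specialist produces a spiky shape.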
From the table of data roles and the spider charts, you could conclude the following:
– Analysts can produce some quick insights with a wide range of tools, but could struggle with handling big data.
– Data engineers tend to have the strongest computer science backgrounds.
– Unsurprisingly, statisticians have the best understanding of machine learning and statistical models (i.e. statistics).
– Statisticians and applied machine learning engineers are extremely similar.
– The “perfect” data scientists can do it all. However, most data scientists are simply applied machine learning engineers or statisticians with a different title.
– The higher you move in the organizational hierarchy, the better your grasp of concepts tends to be, but your skills might slip at the granular level (e.g. data wrangling using a new library).
Sources
The primary inspiration for this piece was a blog post written by Cassie Kozyrkov, Chief Decision Scientist at Google. The definitions of the data analytics roles and scoring were primarily influenced by open job posts at leading technology companies, including Google, Amazon, Facebook, and Apple.
The following websites helped me create the spider charts in R:
https://www.rdocumentation.org/packages/fmsb/versions/0.7.0/topics/radarchart
https://www.r-graph-gallery.com/142-basic-radar-chart.html