Meet the Data Experts

Have you ever wondered what all those “data people” do each day?
It could be your daughter, husband, colleague, or mentor, but every time they discuss their job, your eyes just glaze over. Data science, data analytics, and all those “data” terms have ambiguous and overlapping definitions. On top of that, job postings from many organizations lump a half dozen data skills together for any and all job titles. These postings are often unrealistic. This post is my attempt to sort through the BS and assign proper categories for each “data expert”. The categories are primarily influenced by job postings at Google, Amazon, Apple, and Facebook. If anyone should know the proper definitions for these roles, it’s the tech giants.

These categorizations will help you understand the strengths and weaknesses of each role and which questions to ask each person. They will also help aspiring data analytics professionals decide which job path makes the most sense for them. These positions frequently overlap and each individual is unique, but this should serve as a directionally correct guide to defining these roles.

Defining the Data Experts

There are dozens of different job roles for people with data expertise; however, most positions match up closely with the following ten:

Role: Data Engineer
Summary: They design, develop, support, and connect systems that store or report data. Data engineers get the data to you in a clean format before any analysis can be done. You want a perfectionist in this role; otherwise, the data might end up dirty.
Key Phrases: Data modeling, ETL, data warehouse, data pipelines
Primary Software: Excel, Python, SQL, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP
Secondary Software: JavaScript, Java, C/C++

Role: Business Intelligence Engineer
Summary: BI engineers are data engineers who are slightly weaker at programming but more adept at gathering business requirements. They design, develop, document, and manage scalable solutions for new and ongoing metrics, reports, analyses, and dashboards to support business needs.
Key Phrases: Requirements gathering; reports, metrics, dashboards; data warehouse; ETL; databases
Primary Software: Excel, Python, SQL, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP
Secondary Software: C/C++, Hive, Spark, Hadoop, Pig

Role: Analyst
Summary: Combine domain knowledge with some decent programming skills and you can get some quick, useful insights on the data. This individual loves exploring data to find interesting subsets. Their technical skills favor breadth over depth. Speed takes priority over perfect, clean code.
Key Phrases: Insights and analysis, data visualization, macros, automation
Primary Software: Excel, SQL, Python, Tableau, pandas, dplyr, Alteryx, Power BI
Secondary Software: AWS, Azure, GCP

Role: Senior Analyst
Summary: Same as the analyst, but with more experience. They tend to understand the business a little more and are faster at analyzing the data.
Key Phrases: Insights and analysis, data visualization, macros, automation
Primary Software: Excel, SQL, Python, Tableau, pandas, dplyr, Alteryx, Power BI
Secondary Software: AWS, Azure, GCP

Role: Analytics/BI Manager
Summary: A manager whose direct reports include data engineers and analysts. They manage various technical projects aimed at building business intelligence tools. This individual communicates with senior-level stakeholders on a semi-regular basis. They must be a generalist who grasps a wide array of technical concepts.
Key Phrases: Requirements gathering; reports, metrics, dashboards; data warehouse; ETL; communication; insights; leadership
Primary Software: Excel, PowerPoint, Word, SQL, Tableau, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP
Secondary Software: Python, C/C++, Java, JavaScript

Role: Applied Machine Learning Engineer
Summary: Applied machine learning engineers possess a strong understanding of how to use algorithms to churn through large data sets and glean useful insights. This individual understands the required assumptions and assessments needed to build a useful model. They must be able to cope with failure because finding a useful model requires a lot of tinkering and testing.
Key Phrases: Machine learning, data mining, AI, experimentation, statistical models, algorithms
Primary Software: Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL
Secondary Software: AWS, Azure, GCP, Hive, Spark, Pig, Hadoop, Java, C/C++

Role: Statistician
Summary: Statisticians are similar to Applied Machine Learning Engineers; however, they have stronger statistics expertise but are typically weaker at programming. Statisticians help decision makers come to safe decisions under uncertainty. They identify conclusions that can be made beyond the data.
Key Phrases: Machine learning, data mining, AI, experimentation, statistical models, algorithms
Primary Software: Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL, Stata, SAS
Secondary Software: AWS, Azure, GCP, Hive, Spark, Pig, Hadoop

Role: Data Scientist
Summary: The Data Scientist is a combination of the Senior Analyst, Applied Machine Learning Engineer, and Statistician. The best Data Scientists possess the skill sets of all three roles; however, these are called unicorns because they are hard to find. Most data scientists only have a subset of these skills.
Key Phrases: Machine learning, data mining, AI, experimentation, statistical models, algorithms, databases
Primary Software: Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL
Secondary Software: AWS, Azure, GCP, Hive, Spark, Pig, Hadoop, Java, C/C++

Role: Data Science Lead
Summary: The Data Science Lead manages the data science team and ensures that they add value to the business. They manage projects aimed at extracting value from data using statistical and machine learning models. Direct reports include analysts, engineers, statisticians, and data scientists. This individual communicates with senior-level stakeholders on a semi-regular basis.
Key Phrases: Machine learning, communication, AI, leadership, cloud, experimentation
Primary Software: Excel, PowerPoint, Word, Python, R, SQL, scikit-learn, TensorFlow, PyTorch, H2O, NoSQL, AWS, Azure, GCP
Secondary Software: Hive, Spark, Pig, Hadoop, Java, C/C++

Role: Decision Maker
Summary: The Decision Maker, typically a director, understands the science AND art of decision making. They are responsible for identifying areas where data can provide value, framing the appropriate use case, and ensuring their team executes on the project. They must understand the analytics as well as the potential impact on the business. This individual communicates with senior-level stakeholders on a regular basis.
Key Phrases: Machine learning, communication, AI, leadership, vision, cloud
Primary Software: Excel, PowerPoint, Word, SQL, Tableau, NoSQL, Oracle, SAP, Teradata, MongoDB, Business Objects, AWS, Azure, GCP
Secondary Software: Java, Python, C/C++

Scoring the Data Experts

Each skill is rated on a scale of 1-5 (5 being the best). A 1 does not mean this person is unskilled in this area. These scores are relative to the other data analytics roles within the organization. Let’s establish the six skill sets for which we graded each role:

1.) Domain Expertise – Understanding the impact of insights and analyses on the business.

2.) Data Manipulation – The ability to write clean code or configure a tool that can ingest, manipulate, analyze, model, and visualize data. Depth matters more than breadth because technical skills easily translate between tools.

3.) Communication Skills – The ability to explain complicated technical concepts to a non-technical audience.

4.) Managerial Competence – Requires emotional intelligence, strong social skills, and the ability to communicate clear responsibilities to each member of the team. Must be a leader who can manage direct reports.

5.) Databases – Understanding systems that store data, including data model designs, databases, data warehouses, and data pipelines. This includes knowledge of SQL and NoSQL databases as well as the cloud.

6.) Statistics – Knowledge of statistical and machine learning models, including the appropriate assumptions, use cases, and conclusions that can be made beyond the data set. This includes knowledge of statistics and machine learning software tools such as R, Python, and SAS.
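
As a rough illustration of the scoring scheme, here is a minimal Python sketch of how one role's scores could be stored and checked. The names `SKILLS`, `validate_scores`, and `example_role` are hypothetical, the whole-number restriction is an assumption (the post only says the scale runs from 1 to 5), and the example values are placeholders rather than the scores behind the charts.

```python
# The six skill sets, in the order they are listed in the post.
SKILLS = (
    "Domain Expertise",
    "Data Manipulation",
    "Communication Skills",
    "Managerial Competence",
    "Databases",
    "Statistics",
)


def validate_scores(scores):
    """Return the scores in SKILLS order, or raise ValueError if malformed."""
    missing = [s for s in SKILLS if s not in scores]
    extra = [s for s in scores if s not in SKILLS]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    for skill, value in scores.items():
        # Assumption: scores are whole numbers on the 1-5 scale.
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{skill}: expected an integer from 1 to 5, got {value!r}")
    return [scores[s] for s in SKILLS]


# Placeholder values only; these are NOT the scores from the post's charts.
example_role = dict(zip(SKILLS, [3, 4, 2, 1, 5, 2]))
print(validate_scores(example_role))
```

Because the scores are relative to the other roles, a structure like this is only meaningful when every role is graded against the same six skills.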

Visualizing the Data Experts

Assumptions
– The analysts are positioned close to the business (e.g. Business Analyst, Policy Analyst, etc.).
– The Data Scientist chart displays the upper bound of their abilities (i.e. a “unicorn”).

Interpreting the Charts

The charts above are called spider (or radar) charts. The farther a point sits from the center of the “web” along a category’s axis, the higher the value for that category.
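
To make the geometry concrete, here is a small Python sketch of how a radar chart could place its points. It is not the R code used for the charts in this post. The function name `radar_vertices`, the axis layout (first axis pointing up, then clockwise), and the placeholder scores are all assumptions for illustration.

```python
import math


def radar_vertices(values, max_value=5):
    """Map one score per axis to (x, y) points on a unit-radius radar chart."""
    n = len(values)
    points = []
    for i, value in enumerate(values):
        # Axes are evenly spaced; the first points straight up, the rest go clockwise.
        angle = math.pi / 2 - 2 * math.pi * i / n
        # Distance from the center is proportional to the score.
        radius = value / max_value
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical scores for six categories; not taken from the post's charts.
placeholder = [5, 3, 1, 4, 2, 5]
for score, (x, y) in zip(placeholder, radar_vertices(placeholder)):
    print(score, round(math.hypot(x, y), 2))
```

Each printed distance is simply the score divided by the maximum, which is why a score of 5 reaches the outer edge of the web and a score of 1 stays near the center.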

From the table of data roles and the spider charts, you could conclude the following:

– Analysts can produce some quick insights with a wide range of tools, but could struggle with handling big data.
– Data engineers tend to have the strongest computer science backgrounds.
– Unsurprisingly, statisticians have the best understanding of machine learning and statistical models (i.e. statistics).
– Statisticians and applied machine learning engineers are extremely similar.
– The “perfect” data scientists can do it all. However, most data scientists are simply applied machine learning engineers or statisticians with a different title.
– The higher you move in the organizational hierarchy, the better your grasp of concepts tends to be, but your skills might slip at the granular level (e.g. data wrangling using a new library).

~ The Data Generalist


Sources

The primary inspiration for this piece was a blog post written by Cassie Kozyrkov, Chief Decision Scientist at Google. The definitions of the data analytics roles and the scoring were primarily influenced by open job posts at leading technology companies, including Google, Amazon, Facebook, and Apple.

The following websites helped me create the spider charts in R:
https://www.rdocumentation.org/packages/fmsb/versions/0.7.0/topics/radarchart
https://www.r-graph-gallery.com/142-basic-radar-chart.html

