How to screen Data Science skills

Data science. A modern-day buzzword. In our present-day digital world, it’s common to discover titles assigned to roles and disciplines that are not yet universally defined and accepted. None is more prolific than data science and the skills attributed to data scientists.

In this article, we’re going to break down the meaning of data science and the skills a data scientist needs, and give you our advice on how best to screen candidates for a data science position.

The lowdown on data science

According to market research company Forrester, insight-driven businesses were forecast to be collectively worth $1.8 trillion by 2021, up from $333 billion in 2015. These ‘insights’ are derived from data, which plays a pivotal role in helping the world’s most successful companies become more profitable. The same report found that data-driven organizations are growing eight times faster than global GDP. Food for thought.

The ability to interpret data and harness its usefulness is clearly serious business. Yet if there is one thing people agree on, it is that there is no agreed definition of data science.

The field’s difficulty in defining itself hasn’t slowed the creation of new graduate programs with “data science” in their names. A recent survey analysis by KDnuggets showed that graduate degrees named ‘data science’ began to emerge in 2007, with an enormous spike in enrolments in 2012.

Data science positions are clearly at a critical point in their trajectory. Because the field scales so well, it is receiving the attention it demands. But if we can’t properly understand what it is, how are we supposed to hire for it?

DevSkiller’s got you covered on both fronts.

What is Data Science?

In its simplest form, data science is the discipline of making data useful. The concept of data science is ‘to unify statistics, data analysis, machine learning, and their related methods’ in order to ‘understand and analyze actual phenomena’ with data.

Traditionally, the data we could evaluate was mostly structured and small in size, so it could be analyzed with simple BI tools. Today, by contrast, most data is unstructured or semi-structured, and the demand for people who can make sense of it has accelerated the rise of the data scientist.
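To make the structured vs. semi-structured distinction concrete, here is a minimal sketch using only the Python standard library. It flattens nested, JSON-style records (where fields are nested and not every record has the same keys) into flat rows of the kind a simple BI tool or spreadsheet expects. The record fields (`id`, `customer`, `tags`) and the `flatten`/`to_table` helpers are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical semi-structured records: fields are nested and
# the second record is missing some keys.
RAW = """
[
  {"id": 1, "customer": {"name": "Ana", "city": "Lisbon"}, "tags": ["new"]},
  {"id": 2, "customer": {"name": "Ben"}, "tags": []}
]
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys; lists are joined into strings."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = ";".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


def to_table(records):
    """Return (columns, rows) with one column per key seen in any record."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({key for r in flat_records for key in r})
    # Keys missing from a record become None, i.e. an empty cell.
    rows = [[r.get(col) for col in columns] for r in flat_records]
    return columns, rows


columns, rows = to_table(json.loads(RAW))
print(columns)
for row in rows:
    print(row)
```

In practice, pandas offers `pandas.json_normalize` for the same job; the hand-written version just shows what "making semi-structured data tabular" involves.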

1.1 What is the role of a data scientist?

A data scientist should set the company’s data strategy, which involves everything from the engineering and infrastructure for collecting and logging data to privacy concerns. They decide what data will be user-facing, how data is going to be used to make decisions, and how it’s going to be built back into the product. Their basic responsibilities include:

  • Synthesizing all available information, statistics, and data of an organization,
  • Compiling information about the AI needs in an organization,
  • Analyzing data and finding potential uses for AI (sometimes called Exploratory Data Analysis),
  • Explaining data patterns to business-oriented colleagues and clients (a process known as data storytelling),
  • Designing and preparing machine learning models,
  • Evaluating models’ efficacy in the production environment.

In case you didn’t know, a machine learning model is a program that has been trained to recognize certain types of patterns. You train a model on a set of data, giving it an algorithm that it can use to reason over and learn from that data.
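As a minimal sketch of that idea, assuming scikit-learn is installed, the snippet below trains a logistic regression model on the small Iris dataset that ships with scikit-learn, then asks it to recognize patterns in flowers it has not seen during training. The dataset and model choice are illustrative assumptions, not a recommended setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A small labelled dataset: flower measurements (X) and species labels (y).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the data so the model can be checked on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# "Training": the algorithm fits the model's parameters to the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# The trained model can now predict labels for data it has never seen.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Accuracy on unseen data: {accuracy:.2f}")
```

The held-back data matters: a model that only performs well on the examples it was trained on has memorized them rather than learned a pattern.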

A chief data scientist should manage a team of engineers, scientists, and analysts and should communicate with leadership across the company, including the CEO, CTO, and product leadership. She’ll also be concerned with patenting innovative solutions and setting research goals.

A popular Twitter definition has described a data scientist as ‘someone who is better at statistics than any software engineer and better at software engineering than any statistician’.

1.2 Is a data scientist similar to any other positions?

Many different kinds of analysts, from data engineers all the way to qualitative experts, are able to ‘make data useful’. While all of these roles participate in data science, someone should have expertise in all three areas (analytics, statistics, and ML/AI) before we refer to them as a data scientist.

To offer an example, a machine learning developer does a subset of the data scientist’s tasks but focuses only on machine learning models. Data scientist really is an umbrella term, although job titles have never been an accurate reflection of one’s responsibilities.

What is important for an IT recruiter to know about Data Science?

2.1 How often do the environment and the challenges change?

One thing an IT recruiter should note is that the landscape is changing constantly. Data keeps getting bigger and problems keep getting harder, so new techniques are developed and new frameworks are sure to follow.

2.2 Are there many resources/tools/technologies (libraries, frameworks, etc.) available?

Being familiar with certain resources and tools will certainly be a big advantage. Currently, a lot of tools are available in Python, while far fewer are available for R (another programming language). Some deep learning frameworks are available in C++, as it’s faster and more memory-efficient than Python. In Python, some of the most popular libraries include pandas, Seaborn, Plotly, scikit-learn, PyTorch, and TensorFlow.

2.3 What should a data scientist know about and what are the most important data scientist skills?

Data scientists are expected to know a lot — machine learning, computer science, statistics, mathematics, data visualization, communication, and deep learning. Within those areas, there are dozens of languages, frameworks, and technologies data scientists can learn.

Data science requires statistics and computer science skills — no surprise there. It is interesting that communication is mentioned in nearly half of the data science job listings these days. Data scientists need to be able to communicate insights and work with others. A basic list of what makes a good data scientist is below:

  • Data analysis capability
  • Skilled at machine learning
  • Has good communication skills
  • Has mastered a deep learning framework
  • Is fluent in Python or R

2.4 What type of experience is important to look for in a data scientist (commercial, open-source, scientific, academic)?

For research-only projects, academic or scientific experience will be the most relevant. But for creating production models, previous experience with models running in production will give you the best insight.

How to verify data scientist skills in the screening phase?

Growing data means growing opportunities, but it all needs good management. Verifying skills in the screening phase is tricky, but focusing on a candidate’s soft skills can also help you identify the strongest talent in a unique way. Finding data scientists who are already great decision-makers can save your business a lot of hassle.

3.1 What to take into account when screening a CV?

The most important thing to consider is whether the candidate has a detailed background in the most relevant areas. A history of exposure to mathematics, statistics, computer science, programming, and machine learning libraries is absolutely key here. Previous experience with data science analytics and programming is vital too.

What will separate a good data scientist from a great one are interpersonal communication skills, i.e., the ability to converse and cooperate with a wide variety of people. The candidate should also have good business acumen, meaning a well-rounded understanding of business fundamentals and principles.

Be sure to check whether the candidate has indicated how their work increased sales, ROI, etc. Top candidates should include quantitative evidence of their achievements.

If the candidate you’re looking for is a recent graduate, focus on their skills and relevant coursework or internships they may have done to assess their breadth of knowledge.

3.2 What glossary terms are important to know?

  • Exploratory data analysis – this consists of data cleanup, exploration of the data, and the manual discovery of patterns in it
  • Data storytelling – this refers to the description and visualization of data patterns for people without technical knowledge
  • Classical Machine Learning – solving tasks using models like linear or logistic regression, decision trees, random forests, boosting, support vector machines, non-negative matrix factorization, K-means, k-nearest neighbors
  • Deep Learning – solving tasks using neural networks. Some types of neural networks include Convolutional Neural Networks and Recurrent Neural Networks
  • Data analysis and manipulation libraries – in Python: NumPy, pandas; in R: dplyr, tidyr
  • Distributed data analysis and manipulation libraries – in Python: Dask; in Scala, Java, and Python: Spark
  • Data visualization libraries – in Python: Seaborn, Plotly, Matplotlib; in R: ggplot2
  • General Machine Learning libraries – in Python: scikit-learn; in R: caret, e1071
  • Deep Learning libraries – in Python: Keras, TensorFlow, PyTorch; in R: nnet; in C++: Caffe
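As a rough illustration of the first glossary term, here is a minimal exploratory data analysis sketch with pandas. It summarizes a small, made-up orders table, counts missing values, duplicate rows and implausible values, and then cleans the data. The column names and the cleaning rules (for example, dropping rows with a missing or negative amount) are hypothetical choices for illustration; real EDA decisions depend on the business context.

```python
import numpy as np
import pandas as pd

# Hypothetical, deliberately messy data of the kind collected inside a company.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "amount": [25.0, 40.0, 40.0, np.nan, -5.0],
        "country": ["PL", "DE", "DE", "pl", None],
    }
)

# 1. Summarize the data: count, mean, min/max, quartiles per numeric column.
print(orders.describe())

# 2. Look for common quality problems.
missing_per_column = orders.isna().sum()
duplicate_rows = orders.duplicated().sum()
negative_amounts = (orders["amount"] < 0).sum()
print(missing_per_column.to_dict(), duplicate_rows, negative_amounts)

# 3. Clean up: drop exact duplicates, normalize country codes,
#    and keep only non-negative amounts (NaN >= 0 is False, so missing amounts go too).
clean = (
    orders.drop_duplicates()
    .assign(country=lambda df: df["country"].str.upper())
    .query("amount >= 0")
)
print(clean)
```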

3.3 Which certifications are available and respected? How useful are they in determining data scientist skills?

Let’s get one thing clear upfront: you do not need any kind of data science certificate to get a job in data science. It helps, but recruiters aren’t overly fussed.

However, around half of machine learning knowledge is theoretical, so certifications in this area are highly applicable. The other half comes from experience, such as building production models or taking part in Kaggle competitions. Certifications usually don’t check for business analysis skills or general people skills. The top certifications we have found are below.

  • Certified Analytics Professional (CAP)
  • Cloudera Certified Associate: Data Analyst
  • Cloudera Certified Professional: CCP Data Engineer
  • Data Science Council of America (DASCA) Senior Data Scientist (SDS)
  • Data Science Council of America (DASCA) Principal Data Scientist (PDS)
  • Dell EMC Data Science Track
  • Google Certified Professional Data Engineer
  • Google Data and Machine Learning
  • IBM Data Science Professional Certificate
  • Microsoft MCSE: Data Management and Analytics
  • Microsoft Certified Azure Data Scientist Associate
  • Open Certified Data Scientist (Open CDS)
  • SAS Certified Advanced Analytics Professional
  • SAS Certified Big Data Professional
  • SAS Certified Data Scientist

Certifications obtained from Coursera, edX, or Udacity are also highly respected.

3.4 What other lines on a CV can show data scientist skills?

A candidate who has spoken at conferences has likely demonstrated storytelling skills, an important requirement in data science. It is obviously imperative to be an expert on the technical side of things, but the ability to explain your findings to those without your technical knowledge is just as crucial.

Taking part in machine learning competitions can also be a great advantage.  Platforms such as Kaggle.com, topcoder.com, crowdai.org, and knowledgepit.ml all offer the chance to compete for awards in the space.

In today’s world, a good resume alone might not be enough to land that coveted interview call, especially for a data scientist role. As we are living and thriving in the midst of a digital revolution, it stands to reason that the recruiting process would incorporate that as well.

Browsing a candidate’s LinkedIn and GitHub accounts can help you gauge the candidate’s overall profile as well as their proficiency in open-source projects. You can decide whether the projects are relevant to the current role. This helps you visualize the candidate’s profile so you can structure your questions accordingly. You will also be able to determine whether the data scientist skills mentioned in their resume are reflected in their GitHub profile.

Technical screening of data science skills during a phone/video technical interview

It’s difficult to rely on just the words of a resume, so it’s important to challenge the candidate to determine whether they really have the skills they claim to have. Even a phone interview can help you understand how the candidate thinks and goes about solving problems related to their craft.

4.1 Questions that you should ask about a data scientist’s experience. Why should you ask each of those questions?

  • What kind of DS projects did you do, and what was the extent of your engagement in the projects?
    Reason: As data science is an extremely broad position, often with differing responsibilities, some candidates may only have worked in data analysis and storytelling, while others may only have gathered requirements and created machine learning models. The candidate’s experience should match the responsibilities of the position you’re recruiting for. This question is really aimed at checking the extent of the candidate’s skills.
  • How did your work on the projects you played a part in have a positive financial impact on the organization?
    Reason: The data scientist role is a position that requires a good understanding of business requirements and conditions. Look for answers that show specific measurements, such as ‘the marketing team was able to cut costs by 10% due to our results’, or ‘we have lowered customer turnover by 5% due to our new retention capabilities’.
  • What kinds of libraries and programming techniques did you use?
    Reason: Data scientists can use a wide variety of tools to achieve the same results. These can depend on the programming language one chooses, the internal company infrastructure, and the size of the dataset the candidate has worked with. The candidate will likely perform best with tools they have previous experience with.

4.2 Questions that you should ask about a data scientist’s knowledge and opinions. Why should you ask each of those questions?

  • How would you check that a model is functioning properly?
    Reason: The ideal methodology is to split the dataset into three sections: a training set, a validation set, and a test set. The training set is the only one the model learns from and is the basis of the training process. The model’s hyperparameters (settings such as the number of trees) are tuned using the validation set, and the final model’s performance is measured on the test set, which is used for neither training nor tuning.
  • How would you check if the data in the dataset is of good quality?
    Reason: A data scientist will most likely have to work with a dataset collected within the company that might contain missing values, errors, or inconsistencies, which are the signs of messy data. To find such problems, a data scientist should perform Exploratory Data Analysis to summarize the dataset’s main characteristics.
  • What is boosting and what are the benefits of using it?
    Reason: Boosting models are ensembles, most commonly of decision trees, in which the trees are trained sequentially, each one correcting the errors of the previous ones. Boosting models are currently among the most effective models for tabular data, offering great accuracy, relatively short training times, reduced memory usage, and the need for only medium-sized training datasets (in comparison to deep learning techniques).
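The validation methodology and the boosting question can be illustrated together. The sketch below, assuming scikit-learn is available, splits a synthetic dataset into training, validation and test sets (60/20/20 here, an arbitrary choice), uses the validation set to pick one hyperparameter of scikit-learn's `GradientBoostingClassifier` (the number of trees), and only then measures the chosen model once on the untouched test set. The synthetic data and the candidate values are illustrative, not a tuning recipe.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data stands in for a real company dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Carve off 20% as the test set, then split the remaining 80% into
# training (75% of it) and validation (25% of it): 60/20/20 overall.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

# Boosting: trees are added one after another, each fitted to correct
# the errors of the ensemble built so far.
best_model, best_n, best_val_score = None, None, -1.0
for n_estimators in (25, 50, 100):  # assumed candidate values for the hyperparameter
    model = GradientBoostingClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)              # parameters are learned on training data only
    val_score = model.score(X_val, y_val)    # settings are compared on validation data
    if val_score > best_val_score:
        best_model, best_n, best_val_score = model, n_estimators, val_score

# The test set is used exactly once, for the final performance estimate.
test_score = best_model.score(X_test, y_test)
print(f"n_estimators={best_n}, validation={best_val_score:.2f}, test={test_score:.2f}")
```

A candidate who tunes on the test set, or reports validation accuracy as the final result, is giving an overly optimistic estimate; that is exactly what this interview question is meant to surface.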

A tip from our expert is to ask questions that are related to business problems you’re currently recruiting for. Like anyone, data scientists will work best in areas they’re familiar with.

For example, not every candidate may have a “feel” for (or be interested in, or willing to learn) the inner-workings of factory equipment (problems of predictive maintenance), medical terms (creating AI for the medical industry), or client preferences (recommender systems for e-commerce).

4.3 Behavioral questions that you should ask a data scientist. Why should you ask each of those questions?

  • How do you deal with differences of opinion with colleagues?
    Reason: A data scientist must have good communication and interpersonal skills (e.g., empathy), as their role is based on compiling data from colleagues and finding areas for improvement within their organization or society.
  • Where do you find information about new data science techniques or cases?
    Reason: As the data science field is constantly evolving and growing, the role requires constant research to stay up to date and to solve problems in the most efficient manner. Any of these sources are worthy: conference papers, workshop papers, MOOCs, blogs of companies working in data science, data science community meetups, Facebook or mailing groups with a data science theme, or learning from a mentor.
  • What do you consider to be your greatest success and biggest failure in the DS field?
    Reason: This is a pretty generic question but it shows the self-recognition and self-reflection skills of the candidate. Both are necessary in the learning process which is a major part of being a great data scientist.

Technical screening of a data scientist’s skills using an online coding test

Hiring a data scientist can be a tricky process. The actual definition of a data scientist is vague, and the day-to-day job of someone with ‘data scientist’ in their job title varies dramatically between organizations. Also, people come to the field from a wide variety of backgrounds. Examining the past of a data scientist candidate is a science in itself, one worthy of a blog post of its own. We’re going to stick to showing you how best to screen for a data scientist!

5.1 Which online test for data scientist skills should you choose?

When looking for the right data science skills test you should make sure it matches the following criteria:

  • The test reflects the quality of professional work being carried out
  • The duration is not too long, one to two hours max
  • The test can be sent automatically and is straightforward in nature
  • The difficulty level matches the candidate’s abilities
  • The test goes beyond checking whether the solution works – it checks the quality of the code and how well it works in edge cases
  • It’s as close to the natural programming environment as possible and allows the candidate to access relevant resources
  • It provides the candidate the opportunity to use all the libraries, frameworks, and other tools they regularly come across
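To illustrate what "checking edge cases" can mean in practice, here is a hypothetical example of the kind of small function a data science test might ask for, together with checks an automated grader could run beyond the happy path. The function name `average_order_value`, its behavior on empty input, and the checks themselves are assumptions for illustration; they do not describe how DevSkiller or any other platform actually grades its tests.

```python
import math


def average_order_value(amounts):
    """Return the mean of valid order amounts.

    Hypothetical spec: ignore missing values (None or NaN) and return 0.0
    when there is nothing valid to average, instead of raising an error.
    """
    valid = [a for a in amounts if a is not None and not math.isnan(a)]
    if not valid:
        return 0.0
    return sum(valid) / len(valid)


# Happy path: even a naive solution would pass this check.
assert average_order_value([10.0, 20.0, 30.0]) == 20.0

# Edge cases that separate a careful solution from one that merely works.
assert average_order_value([]) == 0.0                    # empty input
assert average_order_value([None, float("nan")]) == 0.0  # only missing values
assert average_order_value([10.0, None, 30.0]) == 20.0   # mix of valid and missing
print("all checks passed")
```

A naive `sum(amounts) / len(amounts)` passes the first check but fails every edge case, which is the kind of difference a good test should reveal.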

5.2 DevSkiller ready-to-use online data science skills tests

DevSkiller coding tests use our RealLifeTesting™ methodology to mirror the actual coding environment that your candidate works in. Rather than using obscure algorithms, DevSkiller tests require candidates to build applications or features. They are graded completely automatically and can be taken anywhere in the world. At the same time, the candidate has access to all of the resources that they would normally use, including libraries, frameworks, Stack Overflow, and even Google.

Companies use DevSkiller to test candidates using their own codebase from anywhere in the world. To make it easy, DevSkiller also offers a number of pre-made data science skills tests like the ones here:

Python, MIDDLE level. Duration: 70 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Python and Spark
  • Programming task (Level: Medium): Python | PySpark | Customer Preference Model - Implement a data engineering application for preprocessing marketing data.

Python, JUNIOR level. Duration: 65 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Python
  • Programming task (Level: Easy): Python | PySpark | ML Logs Transformer - Complete the implementation of the logs transformation pipeline.

Scala, JUNIOR level. Duration: 66 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Scala
  • Programming task (Level: Easy): Scala | Spark | ML Logs Transformer - Complete the implementation of the logs transformation pipeline.

Data Science, JUNIOR level. Duration: 45 minutes max. Evaluation: automatic.
  • Task (Level: Easy): SQL | Stamps catalogue | The three highest prices - Select the three stamps (price and name) with the highest prices.
  • Programming task (Level: Easy): Python | Pandas | HTML table parser - Implement a function to convert an HTML table into a CSV-format file.

Python, JUNIOR level. Duration: 35 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Python
  • Programming task (Level: Easy): Python | Pandas | HTML table parser - Implement a function to convert an HTML table into a CSV-format file.

Python, MIDDLE level. Duration: 120 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Python
  • Programming task (Level: Medium): Python | Vehicle sales report - Implement an application to create reports based on the vehicle sales data warehouse.

Python, MIDDLE level. Duration: 96 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Python
  • Programming task (Level: Medium): Python | Pandas | A food delivery startup - Transform a database of orders by reducing its dimensionality and creating an additional analytical table.

Python, JUNIOR level. Duration: 45 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Python
  • Programming task (Level: Easy): Python | Client Base Creator - Implement the application to retrieve customers’ contact data from the chat messages.

Python, MIDDLE level. Duration: 70 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Machine Learning and Python
  • Programming task (Level: Medium): Python | DNA Analyzer | Create and clean DNA strands - Implement 2 methods in Python that create and clean DNA strands.

Python, JUNIOR level. Duration: 49 minutes max. Evaluation: automatic.
  • Choice questions assessing knowledge of Machine Learning
  • Programming task (Level: Easy): Python | DNA Analyzer - Implement a method in Python that generates a DNA statistical report.

Test Data Science skills with our built-in PyCharm IDE

You can now assess your candidates’ Data Science skills with the use of our built-in PyCharm IDE.

Given how hard it is to attract skilled data scientists, creating the most candidate-friendly assessment environment possible is a huge asset. Letting data scientists work exactly the way they normally do during the recruitment process is a game-changer.

What this means for you and your candidates:

  • Your candidates can now work directly in the browser, without having to download any components or wait for the program to load,
  • They no longer have to clone the code, wait for the dependencies to install or indexes to build,
  • Instead, they can literally start coding as soon as they open the test invitation. This speeds up the process, resulting in lower candidate drop-off and a more positive candidate experience overall. Our PyCharm IDE is hosted on our own servers in the cloud, and candidates can run tests, preview their solutions, and run their code.

We aim to make the screening process as close to a data scientist’s normal working environment as possible.

This is the second in-browser IDE from JetBrains that we’ve added to our platform, following the addition of IntelliJ IDEA for all Java tests earlier this year.

We’ll soon be rolling out more IDEs to the platform to make the testing environment universally enjoyable to candidates across all tech stacks.
