How to screen data science skills


Data science. A modern-day buzzword. In today’s digital world, it’s common to come across titles for roles and disciplines that are not yet universally defined or accepted. Few are more prevalent than data science and the skills attributed to data scientists.

In this article, we’ll break down the meaning of data science, outline the key data scientist skills, and share our advice on how best to screen candidates for a data science position.

The lowdown on data science

According to market research company Forrester, by 2021, insight-driven businesses will be collectively worth $1.8 trillion, which is up from $333 billion in the year 2015. These ‘insights’ are derived from data, which plays a pivotal role in helping the world’s most successful companies become more profitable. The same report found that data-driven organizations are growing 8x faster than the global GDP. Food for thought.

The ability to interpret data and harness its usefulness is clearly a pretty serious job. But there is more or less a consensus about the lack of consensus regarding a clear definition of data science.

Despite the field’s difficulty in defining itself, new graduate programs with “data science” in their names keep appearing. A recent survey analysis by KDnuggets showed that graduate degrees named ‘data science’ began to emerge in 2007, with an enormous spike in enrolments in 2012.

It’s evident that data science positions are at a critical point in their trajectory. Thanks to the field’s scalability, it’s receiving the attention it demands. But if we can’t properly define what it is, how are we supposed to hire for it?

DevSkiller’s got you covered on both fronts.


What is Data Science?

In its simplest form, data science is the discipline of making data useful. The concept of data science is ‘to unify statistics, data analysis, machine learning, and their related methods’ in order to ‘understand and analyze actual phenomena’ with data.

Traditionally, the data we could evaluate was mostly structured and small in size, so it could be analyzed with simple BI tools. Today, most data is unstructured or semi-structured. The demand for people who can make sense of such data has accelerated the rise of the data scientist role.

1.1 What is the role of a data scientist?

A data scientist should set the company’s data strategy. That covers everything from the engineering and infrastructure for collecting and logging data to privacy concerns. They decide what data will be user-facing, how data will be used to make decisions, and how it will be built back into the product. They will also be concerned with patenting innovative solutions and setting research goals. Their basic responsibilities include:

  • Synthesizing all of an organization’s available information, statistics, and data,
  • Compiling information about the organization’s AI needs,
  • Analyzing data and finding potential uses for AI (the analysis step is sometimes called Exploratory Data Analysis),
  • Explaining data patterns to business-oriented colleagues and clients (a process known as data storytelling),
  • Designing and preparing machine learning models,
  • Evaluating models’ efficacy in the production environment.

In case you didn’t know, a machine learning model is a program that has been trained to recognize certain types of patterns. It’s possible to train a model over a set of data, providing it an algorithm that it can use to reason over and learn from those data.
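
As a rough illustration of what ‘training’ means, the sketch below fits a straight line to a handful of points using ordinary least squares in plain Python. The data and the function names (`train_linear_model`, `predict`) are invented for this example; in practice a data scientist would more likely reach for a library such as scikit-learn.

```python
from statistics import mean


def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data; the pattern hidden in it is y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

model = train_linear_model(xs, ys)
print(model)
print(predict(model, 6))  # an input the model has never seen
```

Once trained, this model is nothing more than the two numbers it learned from the data, which is why it can be applied to inputs that were not part of the training set.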

A chief data scientist should manage a team of engineers, scientists, and analysts and should communicate with leadership across the company, including the CEO, CTO, and product leadership. She’ll also be concerned with patenting innovative solutions and setting research goals.

A popular Twitter definition has described a data scientist as ‘someone who is better at statistics than any software engineer and better at software engineering than any statistician’.

1.2 Is a data scientist similar to any other positions?

Many different kinds of analysts are able to ‘make data useful’, from data engineers all the way to qualitative experts. While all these roles participate in data science, someone referred to as a data scientist should have expertise in all three areas (analytics, statistics, and ML/AI).

To offer an example, a machine learning developer performs a subset of the data scientist’s tasks but focuses only on machine learning models. The position of data scientist really is an umbrella term, although job titles have never been an accurate reflection of one’s responsibilities.


What is important for an IT recruiter to know about Data Science?

2.1 How often does the environment/challenges faced change?

One thing an IT recruiter should note is that the landscape is changing constantly. Datasets keep getting bigger and problems keep getting harder, so new techniques are developed and new frameworks are sure to follow.

2.2 Are there many resources/tools/technologies (libraries, frameworks, etc.) available?

Being familiar with certain resources and tools is a big advantage. Currently, a lot of tools are available for Python, while far fewer are available for R (another programming language). Some deep-learning frameworks are available in C++, as it’s faster and more memory-efficient than Python. In Python, some of the most popular libraries are pandas, Seaborn, Plotly, scikit-learn, PyTorch, and TensorFlow.

2.3 What should a data scientist know about and what are the most important data scientist skills?

Data scientists are expected to know a lot — machine learning, computer science, statistics, mathematics, data visualization, communication, and deep learning. Within those areas, there are dozens of languages, frameworks, and technologies data scientists can learn.

Data science requires statistics and computer science skills — no surprise there. It is interesting that communication is mentioned in nearly half of the data science job listings these days. Data scientists need to be able to communicate insights and work with others. A basic list of what makes a good data scientist is below:

  • Is capable of data analysis
  • Is skilled at machine learning
  • Has good communication skills
  • Has mastered a deep learning framework
  • Is fluent in Python or R

2.4 What type of experience is important to look for in a data scientist (commercial, open-source, scientific, academic)?

For research-only projects, academic or scientific experience is the most relevant. For creating production models, previous experience with other models in production is the most valuable.


How to verify data scientist skills in the screening phase?

Growing data means growing opportunities, as long as it is managed well. Verifying skills in the screening phase is tricky, but paying attention to a candidate’s soft skills can also help you single out talent. Finding data scientists who are already great decision-makers can save your business a lot of hassle.

3.1 What to take into account when screening a CV?

The most important thing to consider is whether the candidate has a detailed background in the most relevant areas. A history of exposure to mathematics, statistics, computer science, programming, and machine learning libraries is absolutely key here. Previous experience with data science analytics and programming is vital too.

What separates a good data scientist from a great one is interpersonal communication skills, i.e., the ability to converse and cooperate with a wide variety of people. The candidate should also have good business acumen, meaning a well-rounded understanding of business fundamentals and principles.

Be sure to check whether the candidate has indicated how their work contributed to an increase in sales, ROI, etc. Top candidates should include quantitative evidence of their achievements.

If the candidate you’re looking for is a recent graduate, focus on their skills and relevant coursework or internships they may have done to assess their breadth of knowledge.

3.2 What glossary terms are important to know?

  • Exploratory data analysis – this consists of data cleanup, exploration of data patterns, and the manual discovery of patterns in data
  • Data storytelling – this refers to the description and visualization of data patterns for persons without the technical knowledge
  • Classical Machine Learning – solving tasks using models like linear or logistic regression, decision trees, random forests, boosting, support vector machines, non-negative matrix factorization, K-means, k-nearest neighbors
  • Deep Learning – solving tasks using neural networks. Some types of neural networks include Convolutional Neural Networks and Recurrent Neural Networks
  • Data analysis and manipulation libraries – in Python: NumPy, pandas; in R: dplyr, tidyr
  • Distributed data analysis and manipulation libraries – in Python: Dask; in Scala, Java, and Python: Spark
  • Data visualization libraries – in Python: Seaborn, Plotly, Matplotlib; in R: ggplot2
  • General Machine Learning libraries – in Python: scikit-learn; in R: caret, e1071
  • Deep Learning libraries – in Python: Keras, TensorFlow, PyTorch; in R: nnet; in C++: Caffe
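
To give a feel for the first glossary entry, exploratory data analysis, here is a minimal sketch of a few typical checks in plain Python. The records, column names, and helper functions are all hypothetical; real projects would usually run the same checks with pandas.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 34, "country": "Denmark", "monthly_spend": 120.0},
    {"age": None, "country": "denmark", "monthly_spend": 95.5},
    {"age": 29, "country": "Poland", "monthly_spend": None},
    {"age": 41, "country": "Poland", "monthly_spend": 4000.0},
    {"age": 38, "country": "Denmark ", "monthly_spend": 110.0},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def numeric_summary(rows, column):
    """Basic statistics for a numeric column, ignoring missing values."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


def category_counts(rows, column):
    """Frequency of each category after trimming whitespace and lowercasing."""
    return Counter(
        row[column].strip().lower() for row in rows if row[column] is not None
    )


print(missing_counts(records))
print(numeric_summary(records, "monthly_spend"))
print(category_counts(records, "country"))
```

In this made-up sample, a mean spend far above the median hints at an outlier, and the country counts only line up once whitespace and casing are normalized. Both are the kind of pattern that manual exploration is meant to surface.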

3.3 Which certifications are available and respected? How useful are they in determining data scientist skills?

Let’s get one thing clear upfront: you do not need any kind of data science certificate to get a job in data science. It helps, but recruiters aren’t overly fussed.

However, around half of machine learning knowledge is theoretical, so certifications in this area are highly applicable. The other 50% comes from experience, such as building a production model or taking part in Kaggle competitions. Certifications usually don’t check for business analysis skills or general people skills. The top certifications we have found are below.

  • Certified Analytics Professional (CAP)
  • Cloudera Certified Associate: Data Analyst
  • Cloudera Certified Professional: CCP Data Engineer
  • Data Science Council of America (DASCA) Senior Data Scientist (SDS)
  • Data Science Council of America (DASCA) Principal Data Scientist (PDS)
  • Dell EMC Data Science Track
  • Google Certified Professional Data Engineer
  • Google Data and Machine Learning
  • IBM Data Science Professional Certificate
  • Microsoft MCSE: Data Management and Analytics
  • Microsoft Certified Azure Data Scientist Associate
  • Open Certified Data Scientist (Open CDS)
  • SAS Certified Advanced Analytics Professional
  • SAS Certified Big Data Professional
  • SAS Certified Data Scientist

Certifications obtained from Coursera, edX, or Udacity are also highly respected.

3.4 What other lines on a CV can show data scientist skills?

A candidate’s participation in conferences as a speaker can indicate storytelling ability, an important requirement in data science. Technical expertise is obviously imperative, but the ability to explain findings to people without that technical knowledge is just as crucial.

Taking part in machine learning competitions can also be a great advantage. Platforms such as Kaggle.com, topcoder.com, crowdai.org, and knowledgepit.ml all offer the chance to compete for awards in the space.

In today’s world, a good resume alone might not be enough to land that coveted interview call, especially for a data scientist role. As we are living in the midst of a digital revolution, it stands to reason that the recruiting process should reflect that as well.

Browsing a candidate’s LinkedIn and GitHub accounts is useful both for getting an overview of the candidate and for viewing their work on open-source projects. You can decide whether the projects are relevant to the current role. This helps you build a picture of the candidate’s profile so that you can structure your questions accordingly. You will also be able to determine whether the data scientist skills listed on the candidate’s resume are reflected in their GitHub profile.

Technical screening of data science skills during a phone/video technical interview

It’s difficult to rely on just the words of a resume. After all, it’s important to challenge the candidate to determine whether they really have the skills they claim to have. Even if it’s just a phone interview, it can help you understand how the candidate thinks and goes about solving problems related to their craft.

4.1 Questions that you should ask about a data scientist’s experience. Why should you ask each of these questions?

  • What kind of DS projects did you do, and what was the extent of your engagement in the projects?
    Reason: Data science is an extremely broad field, and responsibilities often differ between positions. Some candidates may have worked only in data analysis and storytelling, others only on gathering requirements and creating machine learning models. The candidate’s experience should match the responsibilities of the position you’re recruiting for. This question is aimed at checking the extent of the candidate’s skills.
  • How did your work have a positive financial impact on the organization in the projects you played a part in?
    Reason: The data scientist role requires a good understanding of business requirements and conditions. Look for answers that cite specific measurements, such as ‘the marketing team was able to cut costs by 10% due to our results’ or ‘we lowered customer turnover by 5% due to our new retention capabilities’.
  • What kinds of libraries and programming techniques did you use?
    Reason: Data scientists can use a wide variety of tools to achieve the same results. These can depend on the programming language one chooses, the internal company infrastructure, and the size of the dataset the candidate has worked with. The candidate will likely perform best with tools they have previous experience with.

4.2 Questions that you should ask about a data scientist’s knowledge and opinions. Why should you ask each of these questions?

  • How would you check that a model is functioning properly?
    Reason: The ideal methodology is to split the dataset into three parts: a training set, a validation set, and a test set. The training set is the only one available to the model during training and is the basis of the training process. The model’s hyperparameters are tuned using the validation set, and the final model’s performance is measured on the test set.
  • How would you check if the data in the dataset is of good quality?
    Reason: A data scientist will most likely have to work with a dataset collected within the company that might contain missing values, errors, or inconsistencies, all of which are signs of messy data. To find such problems, a data scientist should perform Exploratory Data Analysis to summarize the dataset’s main characteristics.
  • What is boosting and what are the benefits of using it?
    Reason: Boosting models are typically ensembles of decision trees that are trained sequentially, with each new tree correcting the errors of the previous ones. They are currently among the most efficient models, offering great accuracy, relatively short training times, reduced memory usage, and medium-sized training data requirements (in comparison to deep learning techniques).
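
The training, validation, and test split described in the first answer above can be sketched as follows. This is a minimal illustration in plain Python, not a recommended production setup: the synthetic dataset, the 60/20/20 proportions, and the choice of a k-nearest-neighbors classifier are all assumptions made for the example.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    rows = rows[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def knn_predict(train_rows, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train_rows, eval_rows, k):
    """Share of eval_rows whose label is predicted correctly."""
    hits = sum(knn_predict(train_rows, x, k) == label for x, label in eval_rows)
    return hits / len(eval_rows)


# Hypothetical one-feature dataset: the label is 1 when the feature exceeds 50.
rng = random.Random(42)
data = [(x, int(x > 50)) for x in (rng.uniform(0, 100) for _ in range(200))]

train_set, val_set, test_set = split_dataset(data)

# The hyperparameter k is chosen using the validation set only.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train_set, val_set, k))

# The test set is used once, at the very end, to report performance.
test_accuracy = accuracy(train_set, test_set, best_k)
print(best_k, test_accuracy)
```

A candidate who explains why the test set is held back until the end, rather than reused while tuning, is showing exactly the understanding this question is probing for.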

A tip from our expert is to ask questions that are related to business problems you’re currently recruiting for. Like anyone, data scientists will work best in areas they’re familiar with.

For example, not every candidate may have a “feel” for (or be interested in, or willing to learn) the inner-workings of factory equipment (problems of predictive maintenance), medical terms (creating AI for the medical industry), or client preferences (recommender systems for e-commerce).

4.3 Behavioral questions that you should ask a data scientist. Why should you ask each of these questions?

  • How do you deal with differences of opinion with colleagues?
    Reason: A data scientist must have good communication and interpersonal skills (e.g., empathy), as the role is based on compiling data from colleagues and finding areas for improvement within the organization or society.
  • Where do you find information about new data science techniques or cases?
    Reason: As the data science field is constantly evolving and growing, the role requires constant research to stay current and to solve problems in the most efficient manner. Any of these sources is worthwhile: conference papers, workshop papers, MOOCs, blogs of companies working in data science, DS community meetups, DS-themed Facebook or mailing groups, or learning from a mentor.
  • What do you consider to be your greatest success and biggest failure in the DS field?
    Reason: This is a pretty generic question but it shows the self-recognition and self-reflection skills of the candidate. Both are necessary in the learning process which is a major part of being a great data scientist.

Technical screening of a data scientist’s skills using an online coding test

Hiring a data scientist can be a tricky process. The actual definition of a data scientist is vague, and the day-to-day job of someone with ‘data scientist’ in their job title varies dramatically between organizations. Also, people come to the field from a wide variety of backgrounds. Examining the past of a data scientist candidate is a science in itself, one worthy of a blog post of its own. We’re going to stick to showing you how best to screen for a data scientist!

5.1 Which online test for data scientist skills should you choose?

When looking for the right data science skills test, make sure it meets the following criteria:

  • The test reflects the quality of the professional work being performed
  • The duration is not too long, one to two hours at most
  • The test can be sent automatically and is straightforward in nature
  • The difficulty level matches the candidate’s abilities
  • The test goes beyond checking whether the solution works: it also checks the quality of the code and how well it handles edge cases
  • It is as close to the natural programming environment as possible and gives the candidate access to relevant resources
  • It lets the candidate use all the libraries, frameworks, and other tools they regularly encounter

5.2 DevSkiller ready-to-use online data science skills tests

DevSkiller coding tests use our RealLifeTesting™ methodology to mirror the actual coding environment your candidate works in. Rather than relying on obscure algorithms, DevSkiller tests require candidates to build applications or features. They are graded completely automatically and can be taken anywhere in the world. At the same time, the candidate has access to all the resources they would normally use, including libraries, frameworks, StackOverflow, and even Google.

Companies use DevSkiller to test candidates using their own codebase from anywhere in the world. To make it easy, DevSkiller also offers a number of pre-made data science skills tests, such as the ones here:

Python | MIDDLE
Duration: 70 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python, Spark
  • Programming task, level Medium: Python | PySpark | Customer Preference Model - Implement a data engineering application for preprocessing marketing data.

Python | JUNIOR
Duration: 65 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task: Python | PySpark | ML Logs Transformer - Complete the implementation of the logs’ transformation pipeline.

Scala | JUNIOR
Duration: 66 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Scala
  • Programming task: Scala | Spark | ML Logs Transformer - Complete the implementation of the logs’ transformation pipeline.

Data Science | JUNIOR
Duration: 45 minutes max.
Evaluation: Automatic
Test overview:
  • Task, level Easy: SQL | Stamps catalogue | Three highest prices - Select the three stamps (price and name) with the highest price.
  • Programming task: Python | Pandas | HTML table parser - Implement a function to convert an HTML table into a CSV file.

Python | JUNIOR
Duration: 35 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task: Python | Pandas | HTML table parser - Implement a function to convert an HTML table into a CSV file.

Python | MIDDLE
Duration: 120 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task, level Medium: Python | Vehicle sales report - Implement an application that creates reports based on the vehicle sales data warehouse.

Python | MIDDLE
Duration: 96 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task, level Medium: Python | Pandas | Food delivery startup - Transform a database of orders by reducing its dimensionality and creating an additional analytical table.

Python | JUNIOR
Duration: 45 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task: Python | Client Base Creator - Implement a program that retrieves customers’ contact data from chat messages.

Python | MIDDLE
Duration: 70 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Machine Learning, Python
  • Programming task, level Medium: Python | DNA Analyzer | Create and clean DNA strings - Implement 2 methods in Python that create and clean DNA strings.

Python | JUNIOR
Duration: 49 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Machine Learning
  • Programming task: Python | DNA Analyzer - Implement a method in Python that generates a statistical DNA report.
