How to screen data science skills

Data science. A modern buzzword. In today's digital world, it is common to find titles for roles and disciplines that have not yet been defined and accepted by everyone. None is more prevalent than data science and the data science skills attributed to it.

In this article, we will go through the meaning of data science and data science skills, and give you our advice on how best to screen for a data science position.

The bottom line on data science

According to market research firm Forrester, by 2021 insight-driven businesses will be worth a combined $1.8 trillion, up from $333 billion in 2015. These "insights" come from data, which plays a central role in helping the world's most successful businesses become more profitable. The same report found that data-driven organizations grow 8 times faster than global GDP. Food for thought.

The ability to interpret data and exploit its usefulness is clearly a pretty serious job. Yet about the only thing people agree on is that there is no consensus on a clear definition of data science.

The field's difficulty in defining itself has not slowed the creation of new master's programs with "data science" in their name. To confirm this, a recent analysis of a KDNuggets survey showed that master's programs named "data science" began to appear in 2007, with a huge spike in enrollments in 2012.

It is clear that data science positions are at a critical point in their trajectory. Because of the field's scalability, it is getting the attention it demands. But without properly understanding what it is, how are we supposed to hire for it?

DevSkiller has you covered on both fronts.

What is data science?

In its simplest form, data science is the discipline of making data useful. It is a concept "to unify statistics, data analysis, machine learning, and their related methods" in order to "understand and analyze actual phenomena" with data.

Traditionally, the data we could evaluate was mostly structured and small in size, and could be analyzed using simple BI tools. Today, by contrast, most data is unstructured or semi-structured. This shift has accelerated the role of data scientists.

1.1 What is the role of a data scientist?

A data scientist should set the company's data strategy, which involves establishing everything from engineering and infrastructure for collecting data and logging to privacy protection. They decide which data will be user-facing, how data should be used to make decisions, and how it should be built into the product. They will also deal with patenting innovative solutions and setting research goals. A list of their basic responsibilities includes:

  • Compiling all available information, statistics, and data about an organization,
  • Gathering information about the AI needs of an organization,
  • Analyzing data and finding potential use cases for AI (sometimes called Exploratory Data Analysis),
  • Explaining data patterns to business-oriented colleagues and clients (a process called data storytelling),
  • Designing and preparing machine learning models,
  • Evaluating the effectiveness of the models in the production environment.

In case you didn't know, a machine learning model is a program that has been trained to recognize certain types of patterns. You train a model over a set of data, providing it with an algorithm that it can use to reason over and learn from that data.
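To make the idea of "training a model" a little more tangible, here is a minimal sketch in plain Python. It uses a nearest-centroid classifier, picked only because it is short, and it is not meant to represent what a data scientist would ship. The customer data and the function names `train` and `predict` are invented for illustration.

```python
# A toy "model": learn one centroid (average point) per class, then label
# new points by whichever centroid is closest. All data below is made up.

def train(samples, labels):
    """Return {label: centroid} learned from the training data."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is nearest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical training data: (hours of use per week, support tickets filed)
samples = [(1, 5), (2, 4), (1, 6), (9, 0), (8, 1), (10, 1)]
labels = ["churns", "churns", "churns", "stays", "stays", "stays"]

model = train(samples, labels)
print(predict(model, (2, 5)))  # resembles the first group: churns
print(predict(model, (9, 1)))  # resembles the second group: stays
```

The "pattern" the model has learned is simply where each group tends to sit; real models learn far richer patterns, but the train-then-predict shape is the same.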

A chief data scientist should lead a team of engineers, scientists, and analysts, and should communicate with leadership across the company, including the CEO, CTO, and product leadership. They will also deal with patenting innovative solutions and setting research goals.

A popular Twitter definition describes a data scientist as "a person who is better at statistics than any software engineer and better at software engineering than any statistician".

1.2 Is a data scientist similar to any other positions?

Many different types of analysts can "make data useful", from a data engineer to a data analyst and all the way to a qualitative expert. All of these roles are part of data science, but to call someone a data scientist, they should have expertise in all three areas (analytics, statistics, and ML/AI).

For example, a machine learning engineer performs a subset of a data scientist's duties but focuses only on machine learning models. The data scientist position is really an umbrella term, although job titles have never been a precise reflection of one's responsibilities.

What is important for an IT recruiter to know about data science?

2.1 How often do the landscape and challenges change?

One thing an IT recruiter should note is that the landscape is changing constantly. The data is always getting bigger, and problems are getting harder; so new techniques are developed and new frameworks are sure to follow.

2.2 Are there many resources/tools/technologies (libraries, frameworks, etc.) available?

Being familiar with certain resources and tools will certainly be a big advantage. Currently, a lot of tools are available in Python, while far fewer are available for R (another programming language). Some deep-learning frameworks are available in C++, as it's faster and more memory-efficient than Python. In Python, some of the most popular libraries include pandas, Seaborn, plotly, scikit-learn, PyTorch, and TensorFlow.

2.3 What should a data scientist know about and what are the most important data scientist skills?

Data scientists are expected to know a lot — machine learning, computer science, statistics, mathematics, data visualization, communication, and deep learning. Within those areas, there are dozens of languages, frameworks, and technologies data scientists can learn.

Data science requires statistics and computer science skills — no surprise there. It is interesting that communication is mentioned in nearly half of the data science job listings these days. Data scientists need to be able to communicate insights and work with others. A basic list of what makes a good data scientist is below:

  • Data analysis capability
  • Skilled at machine learning
  • Has good communication skills
  • Has mastered a deep learning framework
  • Is fluent in Python or R

2.4 What type of experience is important to look for in a data scientist (commercial, open-source, scientific, academic)?

For research-only projects, academic or scientific experience will be the most crucial and well-rounded. But when it comes to creating production models, previous experience working with other production models will give you the best insight.

How to verify data scientist skills in the screening phase?

Growing data means growing opportunities; it all just needs good management. Verifying skills in the screening phase is tricky, but focusing on a candidate's soft skills can also help single out talent in a unique way. Finding data scientists who are already great decision-makers can save a lot of hassle for your business.

3.1 What to take into account when screening a CV?

The most important thing to consider is whether the candidate has a detailed background in the most relevant areas. A history of exposure to mathematics, statistics, computer science, programming, and machine learning libraries is absolutely key here. Previous experience with data science analytics and programming is vital too.

What will separate a good data scientist from a great one are interpersonal communication skills, i.e., the ability to converse and cooperate with a wide variety of people. The candidate should also have good business acumen or a well-rounded understanding of business fundamentals and principles.

Be sure to check whether the candidate has indicated how their work contributed to an increase in sales, ROI, etc. It's essential for top candidates to include quantitative evidence of their achievements.

If the candidate you’re looking for is a recent graduate, focus on their skills and relevant coursework or internships they may have done to assess their breadth of knowledge.

3.2 What glossary terms are important to know?

  • Exploratory data analysis – this consists of data cleanup and the manual exploration and discovery of patterns in the data
  • Data storytelling – this refers to the description and visualization of data patterns for persons without the technical knowledge
  • Classical Machine Learning – solving tasks using models like linear or logistic regression, decision trees, random forests, boosting, support vector machines, non-negative matrix factorization, K-means, k-nearest neighbors
  • Deep Learning – solving tasks using neural networks. Some types of neural networks include Convolutional Neural Networks and Recurrent Neural Networks
  • Data analysis and manipulation libraries – in Python: NumPy, pandas; in R: dplyr, tidyr
  • Distributed data analysis and manipulation libraries – in Python: Dask; in Scala, Java, and Python: Spark
  • Data visualization libraries – in Python: Seaborn, Plotly, Matplotlib; in R: ggplot2
  • General Machine Learning libraries – in Python: scikit-learn; in R: caret, e1071
  • Deep Learning libraries – in Python: Keras, TensorFlow, PyTorch; in R: nnet; in C++: Caffe
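
For readers who want a feel for what the first glossary term, exploratory data analysis, looks like in practice, the following is a small sketch of a few typical first checks. It is written in plain Python so it runs without installing anything; a data scientist would more likely reach for pandas. The order records and the helper names are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical order records; None marks a missing value.
rows = [
    {"customer": "A", "country": "DK", "amount": 120.0},
    {"customer": "B", "country": "dk", "amount": 95.0},
    {"customer": "C", "country": "PL", "amount": None},
    {"customer": "D", "country": "PL", "amount": 110.0},
    {"customer": "D", "country": "PL", "amount": 110.0},  # exact duplicate
    {"customer": "E", "country": "DK", "amount": -40.0},  # suspicious negative amount
]

def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)

def duplicate_count(rows):
    """Count rows that repeat an earlier row exactly."""
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in seen.values())

def inconsistent_categories(rows, column):
    """Find values that differ only by letter case, e.g. 'DK' vs 'dk'."""
    variants = {}
    for row in rows:
        value = row[column]
        if value is not None:
            variants.setdefault(value.upper(), set()).add(value)
    return {key: sorted(found) for key, found in variants.items() if len(found) > 1}

amounts = [row["amount"] for row in rows if row["amount"] is not None]

print("missing:", missing_counts(rows))
print("duplicates:", duplicate_count(rows))
print("inconsistent:", inconsistent_categories(rows, "country"))
print("amount mean/median:", mean(amounts), median(amounts))
print("negative amounts:", [a for a in amounts if a < 0])
```

A gap between mean and median, as here, is one quick hint that outliers or errors are present and worth a closer look.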

3.3 Which certifications are available and respected? How useful are they in determining data scientist skills?

Let’s get one thing clear upfront: you do not need any kind of data science certificate to get a job in data science. It helps, but recruiters aren’t overly fussed.

However, around half of machine learning knowledge is theoretical, so certifications in this area are highly applicable. The other half comes from experience, such as production models the candidate has created or Kaggle competitions they have entered. Certifications usually don't check for business analysis skills or general people skills. The top certifications we have found are below.

  • Certified Analytics Professional (CAP)
  • Cloudera Certified Associate: Data Analyst
  • Cloudera Certified Professional: CCP Data Engineer
  • Data Science Council of America (DASCA) Senior Data Scientist (SDS)
  • Data Science Council of America (DASCA) Principal Data Scientist (PDS)
  • Dell EMC Data Science Track
  • Google Certified Professional Data Engineer
  • Google Data and Machine Learning
  • IBM Data Science Professional Certificate
  • Microsoft MCSE: Data Management and Analytics
  • Microsoft Certified Azure Data Scientist Associate
  • Open Certified Data Scientist (Open CDS)
  • SAS Certified Advanced Analytics Professional
  • SAS Certified Big Data Professional
  • SAS Certified Data Scientist

Certifications obtained from Coursera, edX, or Udacity are also highly respected.

3.4 What other lines on a CV can show data scientist skills?

A candidate's participation in conferences as a speaker can indicate storytelling ability, an important requirement in data science. It is obviously imperative to be an expert on the technical side of things, but the ability to explain your findings to those without your technical knowledge is just as crucial.

Taking part in machine learning competitions can also be a great advantage. Platforms such as Kaggle.com, topcoder.com, crowdai.org, and knowledgepit.ml all offer the chance to compete for awards in the space.

In today's world, a good resume alone might not be enough to land that coveted interview call, especially for a data scientist role. As we are living and thriving in the midst of a digital revolution, it stands to reason that the recruiting process would incorporate that as well.

Browsing a candidate's LinkedIn and GitHub accounts can be useful to get an outline of the candidate as well as to view their proficiency in open-source projects. You can decide whether the projects are relevant to the current role. This helps you to visualize the candidate's profile so you are able to structure your questions accordingly. You will also be able to determine whether the data scientist skills mentioned in the candidate's resume are reflected in their GitHub profile.

Technical screening of data science skills during a phone/video technical interview

It's hard to trust the words in a CV. After all, it is important to challenge the candidate to determine whether they really have the skills they claim to have. Even if it is just a phone interview, it can help you understand how the candidate thinks and approaches problems in their craft.

4.1 Questions that you should ask about a data scientist's experience. Why should you ask each of those questions?

  • What kind of DS projects did you do, and what was the extent of your engagement in the projects?
    Reason: Data science is an extremely broad position, often with differing responsibilities: some candidates may only work in data analysis and storytelling, while others only gather requirements and create machine learning models. The candidate's experience should match the responsibilities of the position you're recruiting for. This question is really aimed at checking the extent of the candidate's skills.
  • How did your work have a positive financial impact on the organization with the projects you played a part in?
    Reason: The data scientist role is a position that requires a good understanding of business requirements and conditions. Look for answers that show specific measurements, such as ‘the marketing team was able to cut costs by 10% due to our results’, or ‘we have lowered customer turnover by 5% due to our new retention capabilities’.
  • What kinds of libraries and programming techniques did you use?
    Reason: Data scientists can use a wide variety of tools to achieve the same results. These can depend on the programming language one chooses, the internal company infrastructure, and the size of the dataset the candidate has worked with. The candidate will likely perform best with tools they have previous experience with.

4.2 Questions that you should ask about a data scientist's knowledge and opinions. Why should you ask each of those questions?

  • How can you check that a model is working correctly?
    Reason: The ideal methodology is to split the dataset into three sections: training set, validation set, and test set. The training set is the only one available to the model and is the basis of the training process. The model's hyperparameters are tuned using the validation set, and the final model's performance is measured on the test set.
  • How would you check if the data in the dataset is of good quality?
    Reason: A data scientist will most likely have to work with a dataset collected within the company that might contain missing values, errors, or inconsistencies. These are the signs of messy data. To find such problems, a data scientist should perform Exploratory Data Analysis to summarize the dataset's main characteristics.
  • What is boosting and what are the benefits of using it?
    Reason: Boosting models are typically tree-based models consisting of groups of trees that are trained sequentially, each one correcting the errors of the previous ones. Boosting models are currently among the most efficient ones, with great accuracy, relatively short training times, reduced memory usage, and medium-sized required training datasets (in comparison to deep learning techniques).
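
If you want to see how the first and third answers above fit together, here is a minimal sketch in plain Python. It splits a dataset into training, validation, and test sets, trains a small boosted model of one-split trees ("stumps") sequentially on the residual errors, picks the number of trees on the validation set, and reports the result on the test set. The synthetic data, the 60/20/20 split, the learning rate, and all function names are assumptions made for illustration; production work would normally use a library such as scikit-learn or XGBoost.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the rows and cut it into train/validation/test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def fit_stump(xs, residuals):
    """Find the one-split tree (threshold, left value, right value) that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:  # the largest x leaves nothing on the right side
            continue
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict(base, stumps, x, learning_rate):
    """Sum the base value and the scaled correction of every stump."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total

def mse(base, stumps, rows, learning_rate):
    """Mean squared error of the model on the given rows."""
    return sum((y - predict(base, stumps, x, learning_rate)) ** 2 for x, y in rows) / len(rows)

def boost(train, rounds, learning_rate):
    """Train stumps sequentially, each fitted to the errors left by the previous ones."""
    xs = [x for x, _ in train]
    base = sum(y for _, y in train) / len(train)
    stumps = []
    for _ in range(rounds):
        residuals = [y - predict(base, stumps, x, learning_rate) for x, y in train]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps

# Synthetic data (an assumption for illustration): y = x^2 plus noise.
rng = random.Random(42)
data = [(x, x * x + rng.gauss(0, 0.5)) for x in (rng.uniform(-3, 3) for _ in range(200))]

train, val, test = split_dataset(data)
LEARNING_RATE = 0.3
base, stumps = boost(train, rounds=40, learning_rate=LEARNING_RATE)

# Tune the number of trees on the validation set only.
val_errors = [mse(base, stumps[:k], val, LEARNING_RATE) for k in range(len(stumps) + 1)]
best_k = min(range(len(val_errors)), key=val_errors.__getitem__)

# Report final quality on the untouched test set.
print("trees kept:", best_k)
print("test MSE, mean-only baseline:", round(mse(base, [], test, LEARNING_RATE), 3))
print("test MSE, boosted model:", round(mse(base, stumps[:best_k], test, LEARNING_RATE), 3))
```

The point a strong candidate should be able to articulate is the separation of duties: the model only ever learns from the training rows, choices such as the number of trees are made on the validation rows, and the test rows are looked at once, at the end.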

A tip from our expert is to ask questions that are related to business problems you’re currently recruiting for. Like anyone, data scientists will work best in areas they’re familiar with.

For example, not every candidate may have a “feel” for (or be interested in, or willing to learn) the inner workings of factory equipment (problems of predictive maintenance), medical terms (creating AI for the medical industry), or client preferences (recommender systems for e-commerce).

4.3 Behavioral questions that you should ask a data scientist. Why should you ask each of those questions?

  • How do you deal with differences of opinion with colleagues?
    Reason: A data scientist must have good communication and interpersonal skills (e.g., empathy), as their role is based on compiling data from colleagues and finding areas for improvement within their organization or society.
  • Where do you find information about new data science techniques or cases?
    Reason: As the data science field is constantly evolving and growing, the role requires constant research to stay up to date with the latest developments and to solve problems in the most efficient manner. Any of these sources are worthy: conference papers, workshop papers, MOOCs, blogs of companies working in data science, meetups of the DS community, Facebook or mailing groups with a DS theme, or learning from a mentor.
  • What do you consider to be your greatest success and biggest failure in the DS field?
    Reason: This is a pretty generic question but it shows the self-recognition and self-reflection skills of the candidate. Both are necessary in the learning process which is a major part of being a great data scientist.

Technical screening of a data scientist's skills using an online coding test

Hiring a data scientist can be a tricky process. The actual definition of a data scientist is vague, and the day-to-day job of someone with ‘data scientist’ in their job title varies dramatically between organizations. Also, people come to the field from a wide variety of backgrounds. Examining the past of a data scientist candidate is a science in itself, one worthy of a blog post of its own. We’re going to stick to showing you how best to screen for a data scientist!

5.1 Which online test for data scientist skills should you choose?

When looking for the right data science skills test, you should make sure it meets the following criteria:

  • The test reflects the quality of the professional work being carried out
  • The duration is not too long, one to two hours at most
  • The test can be sent automatically and is straightforward in nature
  • The difficulty level matches the candidate's abilities
  • The test goes beyond checking whether the solution works: it checks the quality of the code and how well it works in edge cases
  • It is as close to the natural programming environment as possible and gives the candidate access to relevant resources
  • It gives the candidate the opportunity to use all the libraries, frameworks, and other tools they regularly encounter

5.2 DevSkiller ready-to-use online data science skills tests

DevSkiller coding tests use our RealLifeTesting™ methodology to mirror the actual coding environment that your candidate works in. Rather than using obscure algorithms, DevSkiller tests require candidates to build applications or features. They are graded completely automatically and can be taken anywhere in the world. At the same time, the candidate has access to all of the resources they would normally use, including libraries, frameworks, StackOverflow, and even Google.

Companies use DevSkiller to test candidates using their own codebase from anywhere in the world. To make it easy, DevSkiller also offers a number of pre-made data science skills tests, such as these:

Python | MIDDLE
Duration: 70 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python, Spark
  • Programming task (level: Medium): Python | PySpark | Customer Preference Model - Implement a data engineering application for preprocessing marketing data.

Python | JUNIOR
Duration: 65 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task: Python | PySpark | ML Logs Transformer - Complete the implementation of the logs' transformation pipeline.

Scala | JUNIOR
Duration: 66 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Scala
  • Programming task: Scala | Spark | ML Logs Transformer - Complete the implementation of the logs' transformation pipeline.

Data science | JUNIOR
Duration: 45 minutes max.
Evaluation: Automatic
Test overview:
  • Task (level: Easy): SQL | Postage stamp catalog | Three highest prices - Select the three stamps (price and name) with the highest price.
  • Programming task: Python | Pandas | HTML table parser - Implement a function to convert an HTML table to a CSV file.

Python | JUNIOR
Duration: 35 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task: Python | Pandas | HTML table parser - Implement a function to convert an HTML table to a CSV file.

Python | MIDDLE
Duration: 120 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task (level: Medium): Python | Vehicle sales report - Implement an application to create reports based on the vehicle sales data store.

Python | MIDDLE
Duration: 96 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task (level: Medium): Python | Pandas | A food delivery startup - Transform a database of orders by reducing its dimensionality and creating an additional analytical table.

Python | JUNIOR
Duration: 45 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Python
  • Programming task: Python | Client Base Creator - Implement the program to retrieve customers' contact data from chat messages.

Python | MIDDLE
Duration: 70 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Machine Learning, Python
  • Programming task (level: Medium): Python | DNA Analyzer | Create and clean DNA strings - Implement 2 methods in Python that create and clean DNA strings.

Python | JUNIOR
Duration: 49 minutes max.
Evaluation: Automatic
Test overview:
  • Choice questions: assessing knowledge of Machine Learning
  • Programming task: Python | DNA Analyzer - Implement a method in Python that generates a statistical DNA report.
