The Data Science Skills That Truly Matter Aren’t Always Technical

UNLIMITED DATA | BY JAMES KULICH | 5 MIN READ


Recently, I came across two distinct views of the future of data science. I found these dueling KDnuggets posts by Mikhail Mew and Ahmar Shah interesting and a bit amusing.

A closer read reveals that Mew, in the first post, is not really predicting the demise of data science. Rather, he claims that our understanding of the field is evolving as tools for handling the more technical aspects of our work become increasingly available, accessible, and automated. Mew writes:

The pressing hiring demand has shifted to problem solvers and critical thinkers who understand the business, the respective industry as well as its stakeholders. No longer will the ability to navigate a couple of software packages or regurgitate a few lines of code suffice, nor will a data science practitioner be defined by the ability to code.

I completely agree! This has been our philosophy from the start of the Elmhurst University Master’s in Data Science and Analytics program—and is a primary reason why our graduates have been so successful.

Data Science Skills for the Next 10 Years

Shah makes some points in his more optimistic post that also fit well with our program’s approach.

Shah reminds us that data science is indeed science and has, in various forms, been around for centuries. Science is all about using information to create theories that are then applied to solve problems.

As Shah says, that’s exactly what we do in data science. Our language is a little different. We call information “data” and theories “models.” With the recent explosion of cheap computing power and available data, we can use data science to address an unprecedented range of meaningful problems.

Shah also offers advice on data science projects. He points out that developing models is a very small part of an actual project and that real-world data science projects require iterative development.

Once again, these ideas fit well with Elmhurst’s project-based curriculum and its focus on creating stakeholder value.

| Skills of Present Value | Skills of Future Value |
| --- | --- |
| Business Understanding | Business Value Mapping |
| Data Preparation | Data Ops (Data Pipelines) |
| Model Creation (Coding) | ML Ops (Automation with Focused Coding) |
| Model Validation | Model Curation (Domain Expert Coordination) |
| Model Deployment | Change Management (Model Maintenance) |

Case Study: How Students Applied Modern Data Science Skills

Here’s one example of these ideas in action from a recently completed student capstone project.

The setting is an organization that provides software platforms for designing surveys and reporting on the results. It is interested in finding ways to offer a better user experience. How so?

Developing a new survey on the platform often involves a high degree of customization, which in turn requires specialized configuration settings. More customization means more settings, and more settings create more opportunities for user error. If better default settings can be determined dynamically, unintended user error can be reduced.

These surveys use a large number of open-ended questions. The sentiment of these questions, meaning the degree to which their tone is more positive or more negative, turns out to be tied to the configuration settings a survey needs. So, if sentiment can be predicted from the text of the questions, better default settings become possible.

The project goal became clear: Predict the sentiment of survey questions with at least 80% accuracy.

Notice the focus on getting the problem statement right and the iterative process needed to do so. None of this is technical. It’s all about good use of domain knowledge and the scientific method.

Case Study: Conclusions

The next big step was understanding the data and getting it right. This is no easy task when working with text data. Techniques as simple as determining word frequencies and as sophisticated as Latent Dirichlet Allocation and Short Text Topic Modeling were used to identify and quantify useful patterns in the text. These patterns became the ingredients used by candidate models to predict survey question sentiment.
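For readers curious what the simplest and one of the more sophisticated of these techniques look like, here is a minimal, dependency-free Python sketch: word frequencies counted with `collections.Counter`, and a small Latent Dirichlet Allocation (LDA) model fit by collapsed Gibbs sampling. It is an illustration under stated assumptions, not the students’ code: the tokenizer is deliberately naive, the parameters (two topics, `alpha=0.1`, `beta=0.01`, 200 iterations) and the sample questions are hypothetical, and Short Text Topic Modeling is not shown. A real project would more likely use an established library such as gensim or scikit-learn.

```python
import random
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in tokenized]       # tokens in doc d assigned to topic k
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                             # total tokens assigned to topic k
    z = []                                          # current topic of every token
    for d, doc in enumerate(tokenized):             # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(tokenized):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token from the counts, then resample its topic with
                # probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # Smoothed estimate of each document's topic mixture; each row sums to 1.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(tokenized)]

# Hypothetical survey questions, invented for illustration.
questions = [
    "How satisfied were you with our support team?",
    "What did you dislike most about the checkout process?",
    "How likely are you to recommend our support team?",
    "What problems did you have during checkout?",
]
print(word_frequencies(questions).most_common(3))
print(lda_gibbs(questions))
```

With four toy questions the topic proportions mean little; the point is the mechanics. Short texts give LDA very few words per document to work with, which is the problem short-text topic models are designed to address.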

While this may appear to be neat and clean, several iterations were necessary to get to this stage. Some ideas simply didn’t work. Others showed promise and pointed toward useful refinements in later rounds.

The same was true for the modeling phase. Many models were developed, tested, and refined. Each had strengths and weaknesses. Powerful techniques such as SHAP plots yielded insights into how the models made their choices, highlighting features ranging from basic word counts to the number of comparative adjectives used.
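SHAP plots are built on Shapley values, which divide a single prediction among the input features. The Python sketch below computes exact Shapley values by brute force for a hypothetical two-feature model (word count and number of comparative adjectives, echoing the features mentioned above). The model weights, the question, and the baseline are all invented, and this is not the `shap` library’s API, which relies on faster estimators. Brute-force enumeration grows exponentially with the number of features, so it is only practical for a handful of them.

```python
from itertools import combinations
from math import exp, factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values: features outside a coalition take their baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(features):
    # Hypothetical logistic model: probability that a question reads as negative.
    # The weights are invented for illustration.
    word_count, comparatives = features
    return 1.0 / (1.0 + exp(-(-2.0 + 0.1 * word_count + 0.8 * comparatives)))

question = [14, 2]   # 14 words, 2 comparative adjectives (hypothetical)
baseline = [10, 0]   # a "typical" question to compare against (hypothetical)
phi = shapley_values(toy_model, question, baseline)
print(phi)
print(sum(phi), toy_model(question) - toy_model(baseline))
```

The two printed numbers on the last line agree: Shapley contributions always add up to the gap between this prediction and the baseline prediction, which is what lets a SHAP plot show at a glance how much each feature pushed a given question up or down.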

In the end, decisions were made on both technical and business grounds, balancing accuracy with interpretability, ease of deployment, and capability for long-term maintenance.

This was great work, but only 70% accuracy was achieved, falling a bit short of the 80% threshold. The results were still useful. More important, the careful and systematic approaches used pointed to areas for improvement that may ultimately allow for the 80% goal to be met.
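For concreteness, here is one way to read those two numbers, assuming accuracy means the share of survey questions whose predicted sentiment matches the true label. The Python sketch uses hypothetical string labels (`"pos"`/`"neg"`); the project’s actual label scheme and evaluation setup are not described, and the sample labels below are invented, not data from the capstone.

```python
from typing import Sequence

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of questions whose predicted sentiment matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def meets_goal(y_true: Sequence[str], y_pred: Sequence[str], threshold: float = 0.80) -> bool:
    """True if accuracy reaches the project threshold (80% by default)."""
    return accuracy(y_true, y_pred) >= threshold

# Invented labels for illustration only: 7 of 10 predictions are correct.
truth = ["pos", "neg", "pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
preds = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg", "pos"]
print(accuracy(truth, preds))    # 0.7
print(meets_goal(truth, preds))  # False
```

Accuracy is the simplest yardstick; if one sentiment class were much more common than the other, per-class measures such as precision and recall would tell a fuller story.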

This is real and enduring data science.

As we begin this eighth year of our program in data science, I am very impressed by the success our students and alumni are experiencing. Recent graduates are beginning new careers and alumni are advancing in their professions. They are making a difference in an amazingly wide range of settings. These are the outcomes that make us proud!

Refine Your Data Science Skills at Elmhurst

Elmhurst University’s Data Science and Analytics program helps professionals excel in business. Meanwhile, our flexible online format allows you to earn a master’s degree on your terms. Ready to learn more? Complete the form below.

About the Author

Jim Kulich is a professor in the Department of Computer Science and Information Systems at Elmhurst University. Jim directs Elmhurst’s master’s program in data science and analytics and teaches courses to graduate students who come to the program from a wide range of professional backgrounds.

Posted Oct. 5, 2021
