Grounding Dreams in Reality: Data Science Expert on How Businesses Can Benefit from AI and ML

Denis Pinchuk

Today, nearly every company sees significant potential in leveraging neural networks for its business. According to Statista, the share of businesses implementing artificial intelligence (AI) in at least one business function grew to 72% in 2024, up from 33% in 2023. McKinsey reports that 92% of organizations plan to increase their investments in AI over the next three years. BCG adds that this year, one in three companies worldwide will allocate more than $25 million to AI technology.

Data Science, Machine Learning (ML), and Computer Vision expert Denis Pinchuk helps businesses derive real value from algorithms. After completing his master's degree in applied mathematics at the University of Central Florida, he optimized operations for several startups before becoming a Senior Data Science Engineer at The Walt Disney Company. He shared insights into his career development, the challenges companies face when implementing data science projects, and what to consider when selecting AI models.

Which companies did you collaborate with before joining The Walt Disney Company?

— After completing my master's degree, I worked with startups across various industries. One of them offered cybersecurity expertise to clients. As a data scientist, I identified inefficiencies in their business processes and then optimized them using modern data engineering and machine learning techniques.

For example, before I joined, vulnerabilities in clients' products were identified manually. An analyst would review manuals and other documents to compile a list, which was then passed to a supervisor for further action. I automated this process with an algorithm based on Google's BERT model, which understands context and can extract from documents the information needed to build a cybersecurity strategy. This alone saved the company at least $15,000 annually.
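For illustration, here is a minimal sketch of the kind of BERT-based extraction described above, using the Hugging Face transformers question-answering pipeline with a publicly available BERT checkpoint fine-tuned on SQuAD. The interview does not specify the actual architecture or data, so the model choice, document text, and questions are assumptions.

```python
# Sketch only: extractive question answering over a security document with a
# BERT model. The checkpoint, document text, and questions are illustrative.
from transformers import pipeline

# Hypothetical excerpt from a product manual.
manual_text = (
    "The appliance exposes an administrative interface on port 8443. "
    "Firmware versions prior to 2.4.1 are affected by a buffer overflow "
    "in the login handler and should be upgraded immediately."
)

extractor = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

questions = [
    "Which firmware versions are vulnerable?",
    "What kind of vulnerability is described?",
]

for question in questions:
    answer = extractor(question=question, context=manual_text)
    print(f"{question} -> {answer['answer']} (score={answer['score']:.2f})")
```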

In another project, I developed a model that tracked publicly available data about our company's products from official sources using keywords. This solution saved the startup over $20,000 per year.

What other industries have you worked in?

— In logistics, for example: another startup I worked with specialized in delivering perishable goods. My main task was to optimize the amount of dry ice in shipment boxes to reduce delivery costs while ensuring product quality for customers.

Initially, the startup only considered data on shipment and delivery locations and the approximate number of hours required for transit. The algorithm was straightforward: if an order would take more than two days, it was sent by plane; if less, by truck. The company realized this approach was far from optimal.

I developed and implemented a predictive tree-based algorithm from scratch, handling everything from data collection and labeling to algorithm implementation. The algorithm incorporated numerous parameters, including temperature data from locations along the truck's route, which required integration with a meteorological service.
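A minimal sketch of what such a tree-based predictor might look like, using scikit-learn's gradient-boosted trees. The feature names, the tiny training set, and the target (pounds of dry ice per box) are hypothetical; the interview only states that transit time and route temperatures from a weather service fed the model.

```python
# Sketch: tree-based regression predicting dry ice per shipment box.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical historical shipments with the dry ice actually used.
history = pd.DataFrame({
    "transit_hours":     [18, 36, 52, 24, 60, 30],
    "route_max_temp_f":  [71, 88, 95, 64, 102, 75],
    "route_mean_temp_f": [60, 79, 84, 55, 91, 66],
    "box_volume_cu_in":  [1200, 1200, 2000, 800, 2000, 1200],
    "dry_ice_lbs":       [4.0, 9.5, 14.0, 3.0, 18.5, 6.0],
})

model = GradientBoostingRegressor(random_state=0)
model.fit(history.drop(columns="dry_ice_lbs"), history["dry_ice_lbs"])

# Predict for a new shipment; in practice the result would be rounded up
# so the product stays frozen even if the truck runs late.
new_box = pd.DataFrame([{
    "transit_hours": 40,
    "route_max_temp_f": 92,
    "route_mean_temp_f": 81,
    "box_volume_cu_in": 1200,
}])
print(model.predict(new_box))
```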

I had to deeply research where to obtain accurate temperature data. This is not a trivial task, as even a small difference matters when dealing with hundreds of thousands of shipments annually. Ultimately, I saved the company's clients $270,000 in delivery costs in the first year, even though the startup only had three major clients at the time.

What exactly do you do in your current role as a Senior Data Science Engineer?

— My first project involved solving an identity resolution problem. The Walt Disney Company is a massive corporation encompassing various businesses: theme parks, hotels, television networks, cable channels, movie studios, streaming services, and more. Each subsidiary generates vast amounts of consumer data.

The corporation decided to consolidate this data into a single database, process it, link disparate transactions, and build analytics to offer personalized services to customers. For example, if a consumer visits a theme park and a year later stays at a Disney hotel, these transactions would initially appear with different IDs in the database. However, with well-configured algorithms, the company can recognize that these transactions belong to the same customer and, for instance, offer them a discount on services.

What was your specific role in this project?

— I was responsible for organizing user data and writing algorithms to correlate over 100 million rows of data daily. This is a massive scale. Specifically, I worked with my team to migrate data from DynamoDB to a graph database, Neptune DB, which was better suited for identity resolution tasks. In the new structure, a person becomes the root element, and each transaction "searches" for the user it needs to "join." This sped up computations significantly and improved identification accuracy by 20%. Ultimately, the migration enabled predictive analytics to personalize guest services.
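A rough sketch of the person-rooted graph pattern described here, written with the gremlinpython client against a Neptune Gremlin endpoint. The endpoint URL, vertex labels, and property names are placeholders, not the company's actual schema.

```python
# Sketch: upsert a "person" root vertex and attach a transaction to it.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection(
    "wss://your-neptune-endpoint:8182/gremlin", "g"  # placeholder endpoint
)
g = traversal().withRemote(conn)

# Upsert the person vertex, the root element for identity resolution.
person = (
    g.V().has("person", "profile_key", "hash-1234")
    .fold()
    .coalesce(
        __.unfold(),
        __.addV("person").property("profile_key", "hash-1234"),
    )
    .next()
)

# Attach a new transaction to that person; later traversals can walk
# person -> transactions to link activity from different business units.
(
    g.addV("transaction")
    .property("source", "theme_park")
    .property("amount", 129.99)
    .addE("made_by").to(person)
    .iterate()
)

conn.close()
```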

Additionally, I developed an optimized data modeling strategy for Neptune, configured ETL processes using AWS Glue and Lambda, automated SQL query generation in Snowflake, and improved CI/CD processes with Docker. These efforts optimized model development and deployment, accelerated data-driven decision-making, enhanced marketing strategies (particularly by enabling real-time customer segmentation), and reduced manual labor by at least 10 hours per week. My team and I also wrote an algorithm to normalize user addresses and names, which increased individual identification accuracy by another 15%.
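A minimal, standard-library-only sketch of the kind of address and name normalization mentioned above. The production pipeline is not described in the interview and would be considerably more involved (Unicode folding, nicknames, postal-standard abbreviations, and so on).

```python
# Sketch: normalize raw address/name strings into a comparable key.
import re
import unicodedata

STREET_ABBREVIATIONS = {
    "st": "street", "ave": "avenue", "blvd": "boulevard",
    "rd": "road", "dr": "drive", "apt": "apartment",
}

def normalize(text: str) -> str:
    # Strip accents, lowercase, and collapse punctuation and whitespace.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    text = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = [STREET_ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

# Two raw records that refer to the same address resolve to the same key.
print(normalize("123 Main St., Apt 4B"))    # -> "123 main street apartment 4b"
print(normalize("123 MAIN STREET APT 4B"))  # -> "123 main street apartment 4b"
```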

What other projects have you worked on at the corporation?

— On the previous project, I significantly expanded my knowledge base and gained experience working in a team of over 100 professionals across dozens of departments, from engineers to lawyers and top executives. I had to explain our solutions in terms they could understand.

These skills proved invaluable in my current project, where our team uses computer vision algorithms to enhance guest safety in theme parks. If cameras detect dangerous behavior—such as suspicious activity or ride malfunctions—the neural network alerts the team so they can take immediate action.

In this project, I serve in a managerial role, leading a team of data scientists. Essentially, I ensure we achieve our goals in the most efficient way possible. This includes assigning tasks based on team members' strengths and weaknesses. Sometimes, I handle tasks myself because it's quicker and easier than delegating. While it's too early to discuss results, no guest has been harmed during our work.

What challenges do businesses most often face when implementing data science initiatives? Is it a lack of data?

— There's never enough data, just as there's never enough money. But I think the main issue is the gap between expectations and reality regarding data science. The field is surrounded by hype right now, so top management often has unrealistic expectations of AI and ML. They don't see the boundary between a solvable problem and an unsolvable one. This is understandable, as data science is outside their area of expertise.

Our role as data scientists is to clearly and transparently demonstrate to management what tools are available and what we can realistically implement, given limited resources. I believe it's better to do a little, but quickly and well, than to build castles in the air.

Implementation is always challenging. If you develop an AI model that makes good predictions, it doesn't mean the business can use it. You also need to deploy it to production, which requires automating data collection and ensuring it's error-free. As long as some of these processes are manual, implementing an ML project will be impossible—especially when dealing with hundreds of millions of transactions daily, as in the case of The Walt Disney Company.

How do you choose the right model for a specific project?

— I'd say it's not much different from choosing accounting software. You simply consider your needs and test different solutions. For some tasks, accuracy is more important; for others, speed. Sometimes, cost is the main factor. Other times, companies choose a less efficient neural network because it's easier to integrate into their existing infrastructure.

For example, in my first startup, I needed to develop a model for the Florida Department of Transportation that would use surveillance camera footage to monitor the number and type of passing vehicles. The challenge was that, for legal reasons, the video couldn't be stored.

For this project, I chose the YOLO neural network, which is known for its high speed. While I sacrificed some accuracy, it was able to process camera data streams in real time. The accuracy still reached 97%, better than the 90% achieved by the older, more expensive method of laying cables on roads for counting.
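A sketch of real-time vehicle counting in the spirit of that project, using the modern ultralytics YOLO package as a stand-in for whichever YOLO release was actually used. The stream URL and counting logic are illustrative; a production counter would add object tracking (for example, counting line crossings) so the same vehicle is not counted in consecutive frames.

```python
# Sketch: count vehicle detections on a live stream without storing video.
import cv2
from ultralytics import YOLO

VEHICLE_CLASSES = {"car", "truck", "bus", "motorcycle"}  # COCO class names

model = YOLO("yolov8n.pt")
stream = cv2.VideoCapture("rtsp://camera.example/stream")  # placeholder URL

counts = {name: 0 for name in VEHICLE_CLASSES}
while True:
    ok, frame = stream.read()
    if not ok:
        break
    # Run detection on the in-memory frame; nothing is written to disk.
    result = model(frame, verbose=False)[0]
    for box in result.boxes:
        name = model.names[int(box.cls)]
        if name in VEHICLE_CLASSES:
            counts[name] += 1

print(counts)
```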

So, you can't choose a model without understanding the problem you're solving. First, you need to grasp all the nuances of the project, then select a few algorithm options and evaluate which one performs best. For instance, if the task is predicting customer churn, you might consider classical models like logistic regression, which offer solid accuracy and strong interpretability. For natural language processing tasks, transformers are worth considering.
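As an illustration of the churn example, here is a minimal logistic-regression sketch with hypothetical features and data; the coefficients are what make such a model easy to interpret.

```python
# Sketch: interpretable churn classifier on hypothetical customer data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

customers = pd.DataFrame({
    "months_subscribed": [2, 24, 36, 1, 12, 48, 3, 18],
    "support_tickets":   [5, 0, 1, 4, 2, 0, 6, 1],
    "monthly_spend":     [10, 45, 60, 12, 30, 80, 9, 40],
    "churned":           [1, 0, 0, 1, 0, 0, 1, 0],
})

X = customers.drop(columns="churned")
y = customers["churned"]

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Coefficients (on standardized features) show the direction and relative
# strength of each factor, which is what makes the model easy to explain.
coefs = model.named_steps["logisticregression"].coef_[0]
for feature, coef in zip(X.columns, coefs):
    print(f"{feature}: {coef:+.2f}")
```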

What else should be considered when implementing AI models?

— One key factor is explainability. There's a clear demand for this from businesses. AI often works like a black box: we can understand the general principles of a model's operation but can't always explain exactly which features and mechanisms led to a specific decision. In some tasks—like traffic monitoring—explainability may not be crucial. However, in others, such as detecting anomalies and explaining their causes, it's critical. For example, in banking, it's essential to understand why an AI recommends denying someone a loan. Additionally, regulations like the EU AI Act are emerging, emphasizing the importance of explainability in AI tools.

There are various ways to enhance explainability. You can use classical regression models, which banks have used for decades. These not only predict outcomes but also explain which factors influenced them. If a neural network produces unexpected results, you can run a linear regression or decision tree analysis to examine the impact of individual factors. This won't provide precise predictions but will help better understand the significance of different features and improve the model. Another method is SHAP (SHapley Additive exPlanations), based on game theory, which calculates the "fair" contribution of each feature by comparing different input combinations.
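A small sketch of SHAP in practice: a gradient-boosted classifier stands in for the "black box," and TreeExplainer reports each feature's contribution to a single prediction. The dataset and feature names are hypothetical.

```python
# Sketch: per-prediction explanations with SHAP on a toy lending dataset.
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

loans = pd.DataFrame({
    "income":        [30, 85, 40, 120, 25, 60],   # thousands per year
    "debt_ratio":    [0.6, 0.2, 0.5, 0.1, 0.7, 0.3],
    "late_payments": [3, 0, 1, 0, 4, 1],
    "defaulted":     [1, 0, 0, 0, 1, 0],
})

X = loans.drop(columns="defaulted")
y = loans["defaulted"]

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer computes each feature's contribution to one prediction,
# so one can say why the model flagged a particular applicant.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

applicant = 0  # first row
for feature, contribution in zip(X.columns, shap_values[applicant]):
    print(f"{feature}: {contribution:+.3f}")
```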

The higher a model's explainability, the easier it is to use in critical processes, especially where decision justification is required, such as credit scoring. However, in tasks like autonomous vehicle control or disease diagnosis, accuracy is generally more important than interpretability.
