Making Sense of Data: From Statistics to AI


I first encountered statistics as a graduate student. Like most of my classmates, I saw it as formulas to memorize and regurgitate on exams. Another hoop to jump through.

That changed when I started working with a pharmaceutical company. I watched seasoned statisticians design clinical trials for a new cancer treatment. Suddenly, those formulas determined whether real patients would have access to potentially life-saving medication.

One senior statistician pulled me aside after I’d made a careless error in my analysis. “Every record in our trials represents a real person,” she said. “Someone’s parent, child, or spouse. We owe it to them to get this right.”

That conversation changed how I thought about data. Statistics wasn’t about calculating perfect answers. It was about making the best possible decisions with incomplete information, and being honest about what the data actually tells you versus what you wish it would say.

Statistics, Data Science, Machine Learning, AI

People use these terms interchangeably. They shouldn’t. Each field has its own purpose.

Statistics is the science of collecting, analyzing, and drawing conclusions from data while accounting for uncertainty.

Data science combines statistical analysis, programming, and domain expertise to extract insights from data. It’s the bridge between raw data and business decisions.

Machine learning is the study of algorithms that improve their performance through experience. Instead of following rigid rules, these systems learn patterns from data.

AI (artificial intelligence) is the broader field focused on creating systems that can perform tasks typically requiring human intelligence. Machine learning is a subset of AI.

I learned the differences between these fields the hard way, through years of projects where picking the wrong approach cost time and credibility.

Data Science: Where Theory Meets Reality

My transition to data science came through failure. I was working on a retail forecasting project, armed with solid statistical knowledge. My model could predict how many products customers would buy each month with impressive accuracy.

One problem: we couldn’t get clean data in time to make the predictions useful.

The sales data lived in three different systems that didn’t talk to each other. Product returns were recorded manually and often weeks late. Promotional pricing wasn’t tracked consistently. My model was useless without reliable, timely data.

This forced me to learn new skills fast. Writing code to pull data from multiple sources. Building systems to detect and fix data quality issues. Working backward from business decisions to figure out what analysis would actually be useful.

Data science taught me that 80% of the work happens before any analysis begins. It’s the unglamorous work of data pipelines, quality checks, and automation that makes everything else possible.
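As a rough illustration of what those quality checks can look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the record fields, the sample rows, and the seven-day lateness threshold are hypothetical, not taken from the retail project.

```python
from datetime import date

# Hypothetical records merged from several source systems (made-up values).
records = [
    {"sku": "A1", "sold_on": date(2024, 3, 1), "recorded_on": date(2024, 3, 2), "promo_price": 9.99},
    {"sku": "B2", "sold_on": date(2024, 3, 1), "recorded_on": date(2024, 3, 29), "promo_price": None},
]

MAX_LAG_DAYS = 7  # assumed threshold for "recorded too late to be useful"


def quality_issues(rows, max_lag_days=MAX_LAG_DAYS):
    """Return (row_index, issue) pairs for late or incomplete records."""
    issues = []
    for i, row in enumerate(rows):
        # How long after the sale did the record reach the system?
        lag = (row["recorded_on"] - row["sold_on"]).days
        if lag > max_lag_days:
            issues.append((i, f"recorded {lag} days late"))
        # Promotional pricing that was never captured.
        if row["promo_price"] is None:
            issues.append((i, "missing promo price"))
    return issues


for index, issue in quality_issues(records):
    print(f"row {index}: {issue}")
```

Running checks like these before any modeling step makes late or incomplete data visible early, instead of letting it quietly degrade a forecast.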

Machine Learning: Complexity Isn’t Always the Answer

Three weeks into building a deep learning model to predict customer churn for a telecom company, I had to face an uncomfortable truth. My sophisticated neural network, using hundreds of variables and state-of-the-art techniques, was being outperformed by a simple calculation anyone could do in a spreadsheet.

The simple approach not only worked better, it revealed why customers were leaving. Billing surprises. Unresolved support tickets lasting over a week. Service interruptions in specific neighborhoods. The business could act on these insights immediately.

My complex model produced accurate predictions but no explanations. When executives asked why Customer X was likely to leave, all I could say was “the model thinks so based on patterns in the data.” Not actionable.

The goal isn’t building the most sophisticated model possible. It’s solving real problems effectively. Sometimes that means cutting-edge deep learning. Often it means finding a simple solution people can understand and trust.
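One habit that follows from this: score a simple, explainable rule first, so any complex model has a baseline to beat. The Python sketch below is only an illustration. The customer fields, the thresholds, and the five sample rows are invented for the example and are not the telecom data or the actual calculation from that project.

```python
# Made-up customers; field names and values are illustrative only.
customers = [
    {"days_ticket_open": 10, "bill_jump_pct": 0, "churned": True},
    {"days_ticket_open": 0, "bill_jump_pct": 40, "churned": True},
    {"days_ticket_open": 2, "bill_jump_pct": 5, "churned": False},
    {"days_ticket_open": 0, "bill_jump_pct": 0, "churned": False},
    {"days_ticket_open": 8, "bill_jump_pct": 0, "churned": False},
]


def churn_reasons(c, max_ticket_days=7, max_bill_jump_pct=25):
    """Return human-readable reasons a customer looks at risk (assumed thresholds)."""
    reasons = []
    if c["days_ticket_open"] > max_ticket_days:
        reasons.append("support ticket unresolved for over a week")
    if c["bill_jump_pct"] > max_bill_jump_pct:
        reasons.append("billing surprise")
    return reasons


def rule_predicts_churn(c):
    """Predict churn whenever at least one reason applies."""
    return bool(churn_reasons(c))


def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the recorded outcome."""
    hits = sum(predict(r) == r["churned"] for r in rows)
    return hits / len(rows)


baseline_accuracy = accuracy(rule_predicts_churn, customers)
print(f"rule accuracy: {baseline_accuracy:.2f}")
```

The same `accuracy` function can score any other predictor on the same rows, which makes the comparison direct. Unlike a black-box score, `churn_reasons` also answers the executive’s question about why a given customer was flagged.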

AI: Enhancing, Not Replacing

A large hospital system hired us to help doctors diagnose rare diseases using AI. The vision was to process thousands of research papers and patient histories to spot patterns humans might miss.

Our first pilot produced mixed results. The AI found genuinely interesting patterns, flagging potential diagnoses doctors hadn’t considered. But it also made mistakes any first-year resident would catch.

One case: the AI flagged a possible rare autoimmune condition based on a patient’s lab results and symptoms. The attending physician glanced at the medication list and immediately recognized the symptoms as common side effects of a blood pressure medication. The AI had never been trained to check symptoms against the side effects of a patient’s current medications.

We changed our approach. Instead of trying to replace doctors’ judgment, we built a system that enhanced their workflow. It surfaced relevant research papers, similar cases from the hospital’s history, and potential diagnoses to consider, while making it clear these were suggestions, not conclusions.

Since then, AI has become significantly more capable. Agentic systems can chain together multiple tools and handle complex, multi-step tasks with less oversight. But the principle from that hospital project holds: the most successful AI implementations enhance human judgment rather than replace it, especially when the stakes are high.

What Actually Matters

Years in this field have taught me that technical sophistication isn’t everything. Four other things matter at least as much.

Curiosity about root causes. The best data scientists I’ve worked with can’t stop asking “why?” They’re never satisfied with correlation alone.

Comfort with failure. Most analyses don’t work the first time. Models break. Pipelines crash. The key is learning from it quickly.

Focus on real problems. It’s easy to get drawn to interesting techniques. But if you’re not solving a problem someone cares about, you’re doing expensive math.

Communication. The best analysis in the world is worthless if you can’t explain it to the people who need to use it. This might be the most underrated skill in the field.

Where This Leads

These tools are becoming more accessible every day. You don’t need a PhD to use them effectively. But you do need to understand their strengths and limitations.

The future belongs to people who can combine these disciplines thoughtfully: statistics for rigor, data science for scalable solutions, machine learning for complex patterns, and AI for systems that enhance what humans can do.
