There’s an expression from the early days of computing: “garbage in, garbage out.” If you give a computer “garbage” data to compute, it will come back with “garbage” results. In other words, computers can only do as they’re told. Information from computers is only as reliable as the data you give them. Though this expression dates back all the way to 1957, it’s just as relevant today. With the rise of analytics, data plays an important role in almost every modern business. Decision makers go to their dashboards, their spreadsheets, and their data feeds, ready to make a call on the fly.
The thing is, what if that data isn’t as accurate as it seems? Furthermore, if you can’t trust the accuracy of your data, how valuable can it really be? In truth, any number of things can go wrong and create errors in the data. Perhaps an employee with sticky fingers and an old keyboard hit a 9 instead of a 6. Maybe a machine captured the data with a broken sensor.
Whatever the reason, inaccurate data can have big consequences down the road. The first step towards preparing for analytics is a hard look at your data. If it’s not reliable, you have some work to do. Here are some tips for identifying and fixing problems in your data flow.
The first step to accurate data is capturing it as well as you can. Though you can always correct your data later, poorly captured data can have a ripple effect on other metrics. Furthermore, correcting data after the fact can be a pain, and you have to be aware of a mistake before you can correct it. You need to do everything you can, within reason, to ensure you capture data accurately at the source.
First of all, take stock of the different data sources within your organization. What data is captured by machines? By humans? Is any of your data outside your control (social media data, for example)? There’s a big difference between customer satisfaction data from a survey versus customer data from usage statistics. For some companies, it’s a shock to learn that certain data comes from a person filling out a paper form.
Next, if possible, take a look at known errors in your data. Sniffing out the source of a single error often reveals larger problems in your data capture methods. With a better idea of where your data comes from, and where errors happen the most, you can take steps to prevent them.
Reducing human error
It’s important to remember that every method of capturing data can make mistakes, even machines. That said, we find that most errors originate with people. After all, humans are only human. Fortunately, there are a few ways to minimize human error.
One method to reduce human error is to replace humans with a machine. For example, instead of relying on a person on an assembly line to count the number of boxes going out, you could install a sensor. Then again, this isn’t always possible or cost effective.
Another method is to capture data at the source, or as close to the source as you can manage. Imagine you’re a detective looking for a description of a robber. An eyewitness comes forward, ready for an interview. If you speak to them that day, their description might have some inaccuracies, but the longer you wait, the hazier their memory gets. The same is true of your employees: if they wait too long to enter data, it starts getting inaccurate. Take steps that make it quick and easy for people to input data while the iron is hot.
Another way to improve human data capture is to give people the time to do it. As you can imagine, people make more mistakes when they’re in a rush. It seems obvious, but many organizations don’t make data a priority at the employee level. It’s handled after the fact, while people are rushed, or not at all. Your employees should take data capture as seriously as you do.
Verify your data
Capturing data accurately is crucial, but even the best methods make mistakes. Without validation, you can’t be sure your data is accurate. Verification processes may seem like unnecessary overhead, but they’re essential to your data operations: inaccurate data can be just as bad as no data at all, if not worse. Make sure you take the time to verify your data.
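As a minimal sketch of what a basic validation pass can look like, the snippet below flags values that fall outside an expected range before they reach reporting. The field names and thresholds are hypothetical examples, not anything from a specific system.

```python
# Minimal data-validation sketch: split incoming records into
# rows that pass a range check and rows flagged for human review.
# Field names and ranges are made up for illustration.

def validate_records(records, field, low, high):
    """Return (valid, flagged) lists based on a simple range check."""
    valid, flagged = [], []
    for row in records:
        value = row.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(row)  # missing or out-of-range: review it
        else:
            valid.append(row)
    return valid, flagged

readings = [
    {"sensor": "A", "temp_c": 21.5},
    {"sensor": "B", "temp_c": 96.0},  # a 9 typed instead of a 6?
    {"sensor": "C", "temp_c": None},  # broken sensor, no reading
]
ok, suspect = validate_records(readings, "temp_c", -10, 50)
print(len(ok), len(suspect))  # 1 valid, 2 flagged for review
```

Even a check this simple catches the “9 instead of a 6” class of error before it pollutes a dashboard; the flagged rows go back to a person for correction rather than silently into your metrics.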
One method to verify data is to use redundant data capture processes. Another is third-party data verification. You can also perform data audits every month or quarter. Whatever method, or methods, you choose, take them seriously. If inaccuracies aren’t identified and corrected swiftly, they can linger and affect your data operations as a whole. Many organizations find that the more verification processes they use, the clearer it is which numbers are right.
Consider a supermarket. The average grocery store runs very tight profit margins on most of its products. Because of this, inventory is a crucial part of their data. With this in mind, they have multiple ways of capturing and verifying inventory data. It starts with invoices from their suppliers, a list of everything they’ve received. From there, a stocker puts the items on the shelf, taking some care to make sure everything’s there. When a customer buys something, the cashier scans the item, and inventory numbers adjust. In addition, most stores check their inventory manually every month or so, going through the store to see if their actual stock numbers line up with what the data says. Without these checks, it’d be easy for stock numbers to become inaccurate. Whether it’s theft, something sliding under a shelf, or something breaking that no one reports, there are a number of ways an item can go missing.
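The core of that monthly check can be sketched in a few lines: compare what the system says is on hand against what a physical count actually finds. The item names and quantities below are invented for illustration.

```python
# Reconcile recorded inventory against a physical count.
# Item names and quantities are hypothetical.

def reconcile(system_counts, physical_counts):
    """Return {item: difference} for items where the shelf count
    disagrees with the system (positive = more on shelf)."""
    discrepancies = {}
    for item in set(system_counts) | set(physical_counts):
        recorded = system_counts.get(item, 0)
        counted = physical_counts.get(item, 0)
        if recorded != counted:
            discrepancies[item] = counted - recorded
    return discrepancies

system = {"cereal": 40, "milk": 25, "eggs": 30}
shelf = {"cereal": 38, "milk": 25, "eggs": 31}
print(sorted(reconcile(system, shelf).items()))
# [('cereal', -2), ('eggs', 1)]
```

The output doesn’t tell you *why* cereal is short (theft, breakage, a missed scan), but it tells you where to look, which is exactly what a periodic audit is for.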
While there’s not always a way to prevent inaccurate data from showing up in the first place, verifying your data at multiple stages can help you account for it.
Single source of truth
So you have your data, and you’ve done everything you can to ensure it’s accurate. The only problem is, it’s all over the place. Inventory data is in the warehouse’s system, and finance data is in the finance system. Even worse, some of your data might be stuck in a machine on the other side of the country. Before you can really make use of your data, you need to get it in one place. That might mean consolidating your data, or simply integrating different systems, but as we’ve learned, your BI tool is only as good as the data it can access.
For years, it was impractical to store data in one location. Part of the reason for this was logistical. If a company had multiple locations, it made sense to store that location’s data on site. They might upload high-level information to a central source, but much of their data remained in silos. However, as information technology has evolved, particularly the cloud, it’s become fairly easy to store all of your data in one place. That said, not every organization is up to date.
These days, there isn’t much stopping you from maintaining a single source of truth, and tradition is no excuse. While many analytics tools can mash up data from multiple sources, fewer sources generally means better, faster analytics. You don’t need to build a new network from scratch, but a more centralized approach to data will improve and simplify your analytics, not to mention other aspects of your organization.
When consolidating your data isn’t feasible, you should at least agree on a data source of record. You can’t get everyone on the same page when everyone is looking at a different page. To avoid the problem of everyone having different versions of the same data, you need to agree on an “official” source.
With your data sources in order, you’re ready to start implementing analytics. In an upcoming post, we’ll explain how you can figure out what metrics you should measure. In the meantime, check out our resources page to learn more about preparing for analytics.