Do you use a spreadsheet? Of course you do; who doesn’t? Spreadsheets, along with word processors, were the two “killer apps” that drove the rise of the Personal Computer (PC) as processing power moved out of the server room and arrived on desktops across the world.
Before computers, spreadsheets were sheets of paper – usually quite large – with a printed grid pattern that was used to set out accounting or other data in rows and columns. Spreadsheets as we know them today arrived with VisiCalc which was launched for the Apple II computer in 1979 and arrived on the Microsoft MS-DOS platform two years later.
The power and sophistication of spreadsheets grew rapidly with the leaps in processing capabilities and with the demand for richer functionality and capacity. The first release of VisiCalc in 1979 allowed the creation of a spreadsheet that contained 16,000 cells, if your computer had 256Kb of memory. Microsoft Excel now takes that figure up to 17,179,869,184 cells on every worksheet and an unlimited number of worksheets in a work book – again memory allowing.
This vast increase in capacity – around one million times over four decades – means that spreadsheets can now create data systems of a massive scale and complexity. Data processing that would have been a major project 30 years ago can now run on a laptop computer. And while we should celebrate spreadsheets as the great enablers and democratisers of this data revolution, in reality spreadsheets are being used to undertake major data processes without the sort of discipline and professionalism that you would see in a major systems project.
Research suggests that 91% [previously 88%] of spreadsheets contain some sort of material error – so if you’re using a spreadsheet for anything that matters, you’re probably doing it wrong.
Why do spreadsheets go wrong?
I think there are six broad reasons why spreadsheets go wrong.
1. Spreadsheets are badly structured
In systems development we often refer to three layers – data, business logic/algorithms and presentation. Each of these layers utilises different design paradigms, different structures and different approaches to testing. In a spreadsheet these three aspects are brought together with the presentation of the spreadsheet normally dictating how the data and algorithms are structured. Data and algorithms get thrown together in a way that pleases the eye and this makes any sort of systematic quality assurance almost impossible.
2. Spreadsheets evolve and have no overarching design
Spreadsheets are rarely designed. They have often start small and grow on a piecemeal basis over time. Data and algorithms are added and modified with no overarching design principles or standards and inconsistent approaches lead to failures. Spreadsheets are often shared with different people taking their own approach to creating formulae and entering data. Worse than that, in systems like Sharepoint, multiple users (in different locations) can edit the same spreadsheet at the same time. Just because we can, it doesn’t mean we should….
3. Formulae are very difficult to quality assure
Spreadsheet algorithms are incredibly difficult to quality assure; they are normally expressed in terms of sheet and cell references that have no inherent meaning making it impossible to perform any kind of visual check for errors. Should this formula be referring to cell Sheet3!AB45 or Sheet3!AC48?
4. Spreadsheets behave in unpredictable ways
Spreadsheets can behave in unpredictable ways when you insert or delete rows or columns or move data cells; sometimes formulae will automatically update to account for these structural changes, sometimes they do not. There is a logic to these behaviours but it is very complex and subtle and is rarely understood by users. Formulae that were correct can become incorrect, as was the case in the Reinhart-Rogoff analysis that underpinned the austerity policies of governments across the globe over the past decade and which was subsequently shown to be flawed, because of a spreadsheet error.
5. Spreadsheets change your data
Unlike database systems, where you are forced to specify the type of data to be stored, spreadsheets make assumptions about the nature of the data you are entering and often these assumptions are not made apparent to the user. A major gene study failed when Excel assumed that many of the gene identifiers were actually dates and then treated them differently in algorithms due to their different data type.
6. A culture that doesn’t take spreadsheets seriously
When a university found a £100 million hole in its finances due to a spreadsheet error, the response from the official university spokesman was “it’s just a spreadsheet error.”
What can we do?
We need to re-calibrate our relationship with the spreadsheet application and understand that there is a lot more to creating high-quality data processes and analyses than just being able to “drive Excel”. The use of spreadsheets need to be underpinned by the sort of skills and mindset that underpins professional systems development. The spreadsheet-jockeys are running amok and we need to take control of our data.
There is more on spreadsheets here.