Assumptions and Their Impact on Data

In today’s data-driven business environment, information is collected to reflect real-world activities so that performance can be judged and better decisions made.  Metrics that capture actual counts of a specific action offer a factual portrayal of what occurred on a page, with a particular button, or around a given customer activity: the count is a true record of how many times that action was completed.

Of course, the majority of metrics employed in analysis are derived from calculations, averages, or aggregates rather than straight counts.  Data like these carry underlying assumptions, and those assumptions present a danger when they are allowed to stand in for what is perceived as a truth.

Recognizing when a “truth” is actually just another assumption is critical.  A good example comes from a long-running debate over the time spent on site metric.  For many years it was viewed as an important measure of user engagement across a website, the thinking being that longer time spent indicated a higher level of interaction with a site’s content.

The time spent metric is an average calculated across all visitors; it never offered an absolute, verifiable number that could be assigned to an individual user.  In other words, the metric represented an assumption about time spent, derived from an average of all visitors to a given page.

This, however, was largely understood by experienced analytics practitioners, who employed the metric more as an indicator or guide.  Individual activity might have been traceable, but doing so at scale was a challenge that made little sense to tackle.  The needed insight (an indication of engagement) could still be garnered, provided the underlying assumption was kept top of mind and the metric was not treated as an absolute truth.

At its inception, the total amount of time spent on a particular page could be tracked and documented.  Dividing that total by the number of verifiable users then provided an estimate of the time users were spending on pages and their content.  As web analysis grew more sophisticated, the measurement and its use evolved to better reflect its real value, always acknowledging the underlying assumption from which its meaning was derived.
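As a rough illustration, the sketch below uses entirely hypothetical visit durations to show the basic arithmetic behind the metric: the total recorded time for a page divided by the number of recorded visits.  The numbers and variable names are assumptions for illustration only, not a description of any particular analytics tool.

```python
# Minimal sketch (hypothetical data) of how an average time-on-page
# metric is derived: total recorded time divided by the number of visits.
visit_durations_seconds = [45, 120, 30, 300, 60]  # per-visit time recorded on one page

total_time = sum(visit_durations_seconds)
average_time_on_page = total_time / len(visit_durations_seconds)

print(f"Total recorded time: {total_time} s")
print(f"Average time on page: {average_time_on_page:.1f} s")
# Note: the average says nothing verifiable about any single visitor;
# it is an assumption-laden indicator, not an individual measurement.
```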

In the last several years, support for the time spent metric has dropped considerably.  Experienced analysts now point out that users opening pages and sites in separate tabs have negated any value that can be derived from the old metric.  Tabbed browsing has existed for some time now, but it did not exist when the original metric was introduced.

The argument is that users often leave pages open in unused tabs, extending the recorded time spent when they are not actively on the page interacting with content.  Time spent, the argument goes, can no longer be viewed as useful: users are likely not even on the page, let alone “engaged” for as long as the number indicates, because of the artificial bump in time the open tab provides.

An objective listener will agree that the time spent metric would see an increase in its reported averages for a given page because of this “open tab” behavior.  Further support for the viewpoint can likely be found in one’s own habits: keeping multiple tabs open while searching and browsing online is fairly common.
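To make the inflation argument concrete, the sketch below works through hypothetical numbers only; the split between “engaged” and “idle tab” visits is assumed for illustration and is not drawn from any real dataset.

```python
# Hypothetical illustration of the "open tab" argument: a few visits left
# idle in background tabs can noticeably inflate the reported average.
active_visits = [45, 120, 30, 300, 60]   # genuinely engaged visits (seconds)
idle_tab_visits = [1800, 2400]           # pages left open in unused tabs

without_idle = sum(active_visits) / len(active_visits)
all_visits = active_visits + idle_tab_visits
with_idle = sum(all_visits) / len(all_visits)

print(f"Average without idle tabs: {without_idle:.0f} s")   # 111 s
print(f"Average including idle tabs: {with_idle:.0f} s")    # ~679 s
# The inflated figure rests on its own assumption: that idle-tab behavior
# is widespread enough to dominate the average.
```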

However, continuing with a purely objective view, it can be pointed out that this entire argument is, yet again, based entirely on an assumption.  While there is anecdotal evidence that the tab behavior occurs and may even be widespread, it cannot be stated absolutely that all users do this, or that it skews the time spent metric to the point of having no value.

Remember, the original practitioners recognized that the metric’s value was as an indicator rather than an actual measurement of individual activity.  Significant changes in the average signaled that a closer look was warranted at factors affecting user engagement.  Widespread use of open tabs would likely raise that baseline average initially, but the change in “behavior” would ultimately be buried in higher averages that would not show noticeable increases or decreases on their own over time.

What we come back to is the original assumption on which the metric’s value was based: that time spent on a page or site is an indicator of engagement, not a true measure of individual user activity.  It can still provide a signal of issues or success, but not necessarily a confirmation of such results.  The newer belief that the metric is useless reflects yet another set of assumptions, presented under the guise of an absolute truth and supported by more subjective “facts” drawn from personal experience.

So what is the lesson here?  Is the time spent metric of such great import that its value and use must be defended and rectified?  No.  The lesson is the role that assumption plays in how a piece of information, and its value, is perceived and utilized.

Recognition upfront that a particular metric or piece of data rests in part on an assumption can direct its proper use.  Calculated metrics (those that go beyond simple counts of individual activities) always contain assumptions on some level.  That fact does not negate their value, but it must be understood upfront in order to derive the true meaning of what the data provides.

Conversely, challenges to the usefulness of a given piece of data may also be laden with their own set of assumptions disguised as “truths”.  These must also be understood upfront to avoid missing out on valuable insights or, more importantly, to avoid being led down an entirely wrong path.

Assumptions are made every day.  They are an integral part of the functioning human brain and can act as a useful shortcut when properly recognized and employed.  Assumptions can also serve as stand-ins for truth, creating a bias that, when followed, can blind one to facts and lead further and further from actual reality.  The time spent metric is hardly a profound truth with lives hanging in the balance, but arguments over its perceived value and usefulness do provide a window into the existence of assumptions in data and their impact, both positive and negative.

Assumptions and their employment exist as a fundamental part of human behavior.  Knowing where they exist and how they shape one’s view is critically important, not just in the analysis of data, but in life.  They have their uses and can provide invaluable shortcuts to needed insights and information.  Assumptions also create bias and can be equally damaging when their presence is not fully understood or recognized.
