
Are you getting as much value from your Big Data or IoT analyses as you can? There’s a very good chance you’re not, and it might not be for lack of trying. Three big contributors are likely preventing you from extracting as much value as you could from your data:

  1. You dive right into the data without first creating a roadmap;
  2. You don’t understand the context and limitations of your data; and/or
  3. Your analyses are too complex.

Create a Roadmap Before Diving into the Data

Creating a roadmap for your analyses before you dive into the data will help you increase the efficiency and accuracy of your analyses in two important ways.

First, defining where you are and where you’d like to go gives you a view of the big picture, which helps you guide and structure your analyses. With a good understanding of the big picture, you are also much less likely to get lost in your analyses and wander down stray alleys. For both these reasons, a clear roadmap will help you move through your analyses much more quickly and efficiently.

Second, thinking about what you would like to accomplish with your data helps you better understand what your ideal data look like. This is crucial, because you must understand how the data you have differ from the data you would like to have, so that you can adjust your analyses and interpret your findings accordingly. Understanding that gap is essential for increasing the accuracy of your analyses, and it is discussed in more detail in the next section.

 

Understand the Context and Limitations of Your Data

Data are very contextual: they are collected in specific situations, under specific conditions, with specific intentions. If you lose sight of the context in which the data were collected, then you're very likely to misinterpret what your data represent, and therefore to be misinformed by any analyses you perform. Any of the following factors could contribute to your data misrepresenting what you’re trying to gauge.

Your data are bad proxies for what you’re using them for.

A classic example of a bad proxy is the ubiquitous use of GNP as a measure of a country’s well-being. Pundits generally take the view that if GNP is growing fast enough, then the country is booming, and otherwise it’s lagging. On a related note, there is also increasing concern that continued growth in global GNP will over-deplete the world’s resources and is thus unsustainable (see, for example, here). However, what both these views overlook is the fact that GNP is not, in fact, a good measure of people’s well-being. GNP does not capture, for example, a population’s job satisfaction or its access to education, healthcare, or clean air. If a better proxy for well-being were found, then pundits would have more accurate measures of the state of citizens’ well-being in both good and bad times, and people would recognize that growth in global well-being is, in fact, sustainable.

Another good example of a bad proxy is using a person’s credit score to gauge whether or not he’d make a good employee. Credit scores are measures of a person’s likelihood of repaying a debt. Employers use credit scores to assess job-worthiness because it’s an easily accessible metric. But is it an appropriate measure of job worthiness? Maybe, but maybe not.

Your data have significant errors, inaccuracies, or omissions in them.

Analyses performed using inaccurate data can provide misleading results. 

Patient health data, for example, contain a high incidence of errors, including inaccurate diagnoses, inaccurate medication and dosing information, and missing information. In big data analyses of patient medical records, these inaccuracies can lead to inaccurate conclusions about which treatments or courses of action are most effective for patients.

In the case of data omissions, analyses can lead to spurious, or false, conclusions. There is a website run by Tyler Vigen that provides fantastic examples of spurious correlations, such as that “US spending on science, space, and technology correlates with suicides by hanging, strangulation and suffocation.”

In some cases, spurious results are due to chance. Specifically, if you look at a large enough number of low-probability events, you’ll eventually find one that happens (this is sometimes called the law of truly large numbers). Alternatively, spurious correlations between two data series can be due to omitted variable bias. For example, an analysis might indicate that a lot of people in the South who go to the local shopping mall buy ice cream. This might lead you to conclude that people who shop at the mall have a particular preference for ice cream. However, the true relationship might actually be that people go to the mall to avoid the heat, and when it’s hot outside, people eat more ice cream. An owner of shopping malls located throughout the country might use the mistaken interpretation of this correlation to make sure all his malls contain plenty of ice cream shops. Having plenty of ice cream shops in all his shopping malls could then cause him to lose money in malls located in cold-weather climates.
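Both mechanisms described above can be reproduced with made-up numbers. The following is a minimal sketch, not an analysis of real data: the series, the coefficients, and the variable names (temps, mall_visits, ice_cream) are all assumptions chosen for illustration. The first part searches many unrelated random series for the one that best matches a target. The second part simulates temperature driving both mall visits and ice cream sales, then compares the raw correlation with the partial correlation that controls for temperature.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# (1) Chance: compare one short random series against 2,000 unrelated ones
# and keep the strongest correlation found.
target = [rng.gauss(0, 1) for _ in range(10)]
candidates = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(2000)]
best_by_chance = max(abs(pearson(target, c)) for c in candidates)

# (2) Omitted variable: temperature (hypothetical units) drives both series.
temps = [rng.uniform(0, 40) for _ in range(5000)]
mall_visits = [2 * t + rng.gauss(0, 10) for t in temps]
ice_cream = [3 * t + rng.gauss(0, 15) for t in temps]
raw = pearson(mall_visits, ice_cream)
controlled = partial_corr(mall_visits, ice_cream, temps)

print(f"strongest |r| among 2000 unrelated series: {best_by_chance:.2f}")
print(f"mall visits vs. ice cream, raw correlation: {raw:.2f}")
print(f"same, controlling for temperature: {controlled:.2f}")
```

In this setup the raw correlation between mall visits and ice cream is strong, while the correlation that controls for temperature sits near zero, which is what you would expect if temperature is the real driver.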

Your data are out of date.

In dynamic environments, the reliance on untimely data can lead to inappropriate conclusions. As the saying goes, “Generals always fight the last war.” Whenever I travel, I wonder how much money the TSA is spending trying to prevent the next underwear or shoe bomber.

Untimely data also lead to inappropriate conclusions when the data are perishable. Airline tickets and hotel rooms are famously priced using revenue management methods. If information on inventories of unsold seats or rooms is not kept up to date, then pricing algorithms won’t be able to maximize revenues while minimizing the numbers of seats and rooms that go unsold.

Your data are mixed and matched from different sources.

When different data elements are taken from different data sources and then used together in analyses, there is a good chance that the analysis may lead to inaccurate results. In particular, mix-and-match data are often internally inconsistent.

Suppose, for example, that you have a data source that says that in Ancient Persia the price of a sack of wheat was one-tenth of a siglos, while the price of a bushel of apples was one-twentieth of a siglos. In this case, the information for the price of wheat is internally consistent with the price of apples. With this information, we wouldn’t necessarily know how many dollars that sack of wheat cost, but we would be relatively confident that a sack of wheat was twice as valuable as a bushel of apples at the time and place the data were taken from.

However, suppose we had one source of information that said that a sack of wheat used to cost one-tenth of a siglos, and a different source that said that a bushel of apples used to cost one-twentieth of a siglos. In this case, the data are not necessarily internally consistent, and we would be much less certain about the relative values of wheat and apples. What if the two measurements came from different time periods? Or different cities?
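One way to keep this distinction visible in practice is to carry provenance along with each value, so that a ratio computed across sources is flagged rather than silently trusted. The sketch below is only an illustration: the PriceRecord class, its field names, and the labels "source A" and "source B" are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class PriceRecord:
    good: str
    price_sigloi: Fraction  # price expressed in sigloi
    source: str             # hypothetical provenance label

def relative_value(a, b):
    """Return (price ratio a/b, True if both records share a source)."""
    return a.price_sigloi / b.price_sigloi, a.source == b.source

wheat = PriceRecord("sack of wheat", Fraction(1, 10), "source A")
apples_same = PriceRecord("bushel of apples", Fraction(1, 20), "source A")
apples_other = PriceRecord("bushel of apples", Fraction(1, 20), "source B")

ratio, comparable = relative_value(wheat, apples_same)
ratio_mixed, comparable_mixed = relative_value(wheat, apples_other)

print(f"same source:  wheat/apples = {ratio}, comparable = {comparable}")
print(f"mixed source: wheat/apples = {ratio_mixed}, comparable = {comparable_mixed}")
```

The arithmetic is identical in both cases. Only the provenance flag tells you whether the resulting ratio deserves your confidence.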

Your data are biased.

Perhaps the most insidious problems with data analysis occur when data are biased. Biases in data are especially likely to occur if the data have been collected from sources that exclude specific chunks of the underlying population that contribute to what you’re trying to analyze.

One of the easiest ways to determine if your data may be biased is to ask, “What criteria were used to determine if observations were either included in or excluded from my data?” If there are certain factors that cause certain types of observations to be either over-represented or under-represented in the data, then your data may very well be biased.

More obvious biases occur, for example, when data come from

Less obvious biases occur, for example, when data come from

  • People who are asked to self-report information about themselves. Self-reported data are notoriously inaccurate; or
  • People who are successful at completing some task. These data may suffer from attrition bias because they exclude information from people who tried, but failed, to complete the task.
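The effect of observing only those who complete a task can be illustrated with a small simulation. This is a sketch under invented assumptions: the skill scores, the completion threshold of 55, and the amount of luck involved are arbitrary numbers, not estimates from any real dataset.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: each person has a skill score, and whether they
# finish the task depends on skill plus some luck.
population = [rng.gauss(50, 10) for _ in range(10_000)]
completers = [skill for skill in population if skill + rng.gauss(0, 5) > 55]

mean_all = mean(population)
mean_completers = mean(completers)
share_observed = len(completers) / len(population)

print(f"share of population observed: {share_observed:.0%}")
print(f"mean skill, everyone:         {mean_all:.1f}")
print(f"mean skill, completers only:  {mean_completers:.1f}")
```

Because the people who dropped out never appear in the completers-only data, any average computed from those data overstates the skill of the population you actually care about.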

 

Simplify Your Analyses

When faced with a large set of data, there is a tendency to throw everything into the mix to see what works. There are two good reasons, however, to simplify your analyses as much as possible.

First, as analyses become more elaborate – that is, when they include more interrelationships among the different variables – you generally end up with complex, unintuitive outcomes. It then becomes difficult to navigate the relationships and associations in order to uncover the true insights.

Conversely, you can gain a much clearer understanding of the underlying dynamics of your situation by first examining simple relationships among the variables. Once you’ve nailed down the basics, you may then try to further elaborate on those basics in order to hone your results.

The second reason you should simplify your analyses as much as possible is that strong correlations among your explanatory variables (i.e., multicollinearity) can end up clouding your results. When two variables move together, an analysis cannot cleanly separate the effect of one from the effect of the other, which makes it difficult to understand the real underlying dynamics. Again, you’re much better off starting out with simple analyses to understand the basics, and then further elaborating thereafter to better understand the nuances.
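To see how multicollinearity clouds results, the sketch below fits the same kind of simulated data two ways: a regression on two nearly identical predictors, and a simple regression on just one of them. Everything here is an assumption made for illustration, including the sample size, the noise levels, and the helper names fit_two and fit_one. The fits use the ordinary least-squares formulas written out by hand so the example needs only the standard library.

```python
import random
from statistics import mean, stdev

def fit_two(x1, x2, y):
    """Least-squares slopes for y ~ x1 + x2 with an intercept."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def fit_one(x, y):
    """Least-squares slope for y ~ x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
joint_slopes, simple_slopes = [], []
for _ in range(200):  # 200 simulated datasets of 50 observations each
    x1 = [rng.gauss(0, 1) for _ in range(50)]
    x2 = [a + rng.gauss(0, 0.05) for a in x1]  # nearly a copy of x1
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    joint_slopes.append(fit_two(x1, x2, y)[0])  # slope on x1, with x2 included
    simple_slopes.append(fit_one(x1, y))        # slope on x1 alone

print(f"x1 slope with x2 included: spread across datasets = {stdev(joint_slopes):.2f}")
print(f"x1 slope on its own:       spread across datasets = {stdev(simple_slopes):.2f}")
```

The slope on x1 swings widely from one simulated dataset to the next when its near-duplicate x2 is included, while the simple regression gives a stable estimate of their combined effect. That stability is the practical argument for starting simple.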

 

Getting Better Value from Your Data

In order to perform the most effective and efficient analyses and generate the most value from your data you should plan ahead. Before you jump into your data, you should first think about what you’re looking for, which outcomes you think you might find, and what information you have to try to get you there.

By creating a model based on theory, you will have a better understanding of where you’re going, how you plan to get there, and any adjustments you might have to make along the way, either to your data or to your analyses.

Next, you must understand the context of the data in your datasets so you will be aware of any limitations of your data. Only by understanding what information your data capture will you be able either (i) to adapt your analyses to account for the limitations or (ii) to view your results through the appropriate lenses.

Finally, simplify your analyses, at least initially, until you have a clear understanding of the basic dynamics underlying your system. Only after you’ve nailed down the basics should you try to further elaborate on those basics in order to gain a better understanding of nuances or otherwise hone your results.

By using foresight to guide your analyses, understanding the context and limitations of your data, and using simple analyses to uncover the basics, you will not only generate greater value from your data, but you will also do so more quickly and efficiently.
