Monthly Archive: September 2014

Statistics and Probability

If you have ever used a probability or a percentage to describe some value, then you know that it is very hard to predict how often an event will occur, or whether it will occur at all. In a completely deterministic universe, if you could somehow know the exact initial state of every object, you could predict how any process would unfold and thus predict its outcome. In the real world, however, it is almost impossible to measure every single aspect of a situation, so expressing the likelihood of an outcome as a probability is more useful. Take a card game: normally you never know the order of the cards in a shuffled deck. But imagine that after a round you could watch the dealer gather and shuffle the cards, follow where each card goes, and in the end know what the next card will be. That would give you a clear edge over the other players. Now imagine you could see only half of the shuffling clearly and the other half only partially. In that situation you could assign probabilities to how likely one card is to come before another, or to where the strongest cards are, but you could never be sure, because you do not have complete information.
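To make the idea of partial information concrete, here is a minimal sketch, assuming a standard 52-card deck in which we only know which cards have already been seen and treat every unseen card as equally likely to come next. The function name `next_card_probability` is hypothetical and used only for illustration.

```python
from fractions import Fraction

# Ranks in a standard 52-card deck (four cards of each rank).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]


def next_card_probability(rank, seen):
    """Probability that the next card has `rank`, given the cards already seen.

    Assumption: we know nothing about the order of the unseen cards,
    so each remaining card is equally likely to be dealt next.
    """
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    remaining_total = 52 - len(seen)
    remaining_of_rank = 4 - seen.count(rank)
    return Fraction(remaining_of_rank, remaining_total)


# Before any card is seen: 4 aces among 52 cards.
print(next_card_probability("A", []))               # 1/13
# After seeing two aces and a king: 2 aces left among 49 cards.
print(next_card_probability("A", ["A", "A", "K"]))  # 2/49
```

The more cards you observe, the sharper the estimate becomes, but as long as some cards remain unseen the answer stays a probability rather than a certainty.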

If we look at how statistics relates to probability theory, we can see that every statistical prediction carries only a certain probability of being true, and you can never state anything with complete confidence. This is because statistics takes a lot of data, organizes it, finds an underlying model and analyzes it, but this method cannot tell you exactly what the next event will be; it can only estimate how likely it is. This means that statistical data, analyses and predictions are all informed probabilities, and they will never be 100% accurate.
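Here is a small sketch of what "estimating the likelihood" looks like in practice. It assumes we simply count how often an event happened in past observations; the numbers are made up, and the interval uses the common normal-approximation (Wald) formula, which is reasonable only when the number of trials is large.

```python
import math


def estimate_probability(successes, trials, z=1.96):
    """Estimate an event's probability from observed counts.

    Returns the observed frequency and an approximate 95% confidence
    interval (normal approximation; z = 1.96 for 95%).
    """
    p_hat = successes / trials
    standard_error = math.sqrt(p_hat * (1 - p_hat) / trials)
    low = max(0.0, p_hat - z * standard_error)
    high = min(1.0, p_hat + z * standard_error)
    return p_hat, (low, high)


# Hypothetical data: the event occurred 230 times in 1000 observations.
p, (low, high) = estimate_probability(230, 1000)
print(f"estimate {p:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

Even with a thousand observations, the result is a range of plausible values rather than a single certain answer, which is exactly the point made above.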

If we delve deeper into this subject, there is even more unpredictability. In quantum physics, for example, outcomes are described by probabilities: a given result may or may not happen. This means that statistical measurements will never be completely accurate, because even with a large sample there will be occurrences where something goes out of control and is essentially unpredictable. The same is true for humans: we are unpredictable creatures, and we can even change the odds of something occurring in our favor, but that is a completely different story.

Statistics in Marketing

Marketing is one of those fields where everything is closely monitored and data-driven approaches are the norm. Because of this, marketers use statistical methods to gather and analyze data, predict trends, measure the results of their campaigns, and collect information about their focus groups and apply it to real-world scenarios.

Statistics can help marketers fill gaps in their understanding of demographic groups and of people in the specific areas where they want to promote a product or service. If you have ever seen a targeted advertisement, then you know that marketers are becoming more and more precise in showing their products to the people who would actually buy them. With the help of modern tools and the internet, they are collecting more and more data on everyone and using it to predict what, where and how you will buy. And it doesn't stop there: lately, using statistical data analysis, marketers have started to promote products that are related to what you do but that you would never even have thought of buying.

To see how far statistical data analysis has come, look at the Target retail chain and its efforts to increase profits by analyzing its customers. The story goes that through loyalty cards and online purchases Target collects huge amounts of data on every customer: their buying habits, where they live and even what they might buy. Data analysts then use statistical analysis to find each customer's buying patterns so that the company can step in and sell them even more. This targeting includes mailing coupons to regular customers' homes for items they might like and buy. I once read a story that Target sent coupons for baby clothes and other pregnancy items to a 16-year-old girl, and her father angrily demanded that they stop, saying his daughter was just 16 and couldn't be pregnant. Later the father came back and apologized, because his daughter was in fact pregnant and the targeting system had known it before her own father did. Of course this is an extreme case, but imagine how much money a store like this could make if its system knew everything about every customer and their buying patterns. And it is not science fiction: if a store can tell that a shopper is pregnant only from the items they buy, then soon enough statistical analysis will reveal much more intricate details about you and your buying habits.
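Target's actual models are not public, so the following is only a sketch of one standard technique for finding buying patterns: market basket analysis with association rules. The baskets and item names below are made up for illustration. For each pair of items bought together often enough, it computes the confidence (how often B is bought when A is bought) and the lift (how much more often than chance they appear together).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; real retailers work with millions of these.
baskets = [
    {"lotion", "vitamins", "cotton balls"},
    {"lotion", "vitamins"},
    {"bread", "milk"},
    {"lotion", "cotton balls"},
    {"bread", "milk", "vitamins"},
    {"milk", "cotton balls"},
]


def association_rules(baskets, min_support=2):
    """Return (A, B, confidence, lift) for item pairs bought together at least min_support times.

    confidence = P(B | A): share of baskets containing A that also contain B
    lift = P(B | A) / P(B): above 1 means A and B co-occur more often than chance
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), together in pair_counts.items():
        if together < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, confidence, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for a, b, conf, lift in association_rules(baskets):
    print(f"{a} -> {b}: confidence {conf:.2f}, lift {lift:.2f}")
```

A retailer could use rules like these to decide which coupons to send: a high lift suggests that a customer who bought A is a good candidate for an offer on B.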

Bell Curve aka Gaussian Function

The bell curve, more formally known as the Gaussian or normal distribution, describes how values spread around an average: most values fall close to the mean, fewer and fewer fall farther away from it, and the standard deviation measures how wide that spread is. Usually drawn as a two-dimensional graph, it forms a smooth, symmetric line that rises from a long, low tail, reaches a single peak at the mean and then falls off on the other side just as quickly as it rose. The bell curve is a very widely used graph for showing environmental, statistical and other phenomena, studying how quickly they change and where their peak value could be. If you search for this graph you will start noticing it more and more, and you will see bell-shaped curves in almost every kind of process, from sales and new product introductions to population growth and the availability and peak output of oil and other non-renewable energy sources. As I understand it, a bell-shaped curve of this kind appears in any situation where there is a limited amount of something but an entity (most likely humans) does not acknowledge this and acts as if growth will continue forever. You can see this in peak-oil graphs, where some statisticians and data scientists have predicted that because oil is a non-renewable resource and we as a species use it as if it will never run out, we will soon reach the point where production peaks and can no longer keep up with demand. These resource curves, such as the Hubbert peak-oil curve, are bell-shaped rather than strictly normal distributions, but the pattern of a rise, a peak and a decline is the same. When that peak is reached, there will be little point in searching for new oil fields or new extraction methods, because each new barrel will be harder and harder to get.
The normal distribution has a formula with two parameters: the mean, which sets where the top of the curve sits, and the standard deviation, which sets how wide the curve is. The whole curve is drawn from these two values, visualizing the distribution for anyone and showing more insight than a simple data set could. Bell curves are used in a wide range of applications, and they show up in almost any statistical data gathered from a real-world experiment. Very often the data follows this curve closely, which lets the statistician summarize it with its mean and standard deviation and make predictions about the data set as a whole. No single application explains why bell curves appear so often, but the central limit theorem gives the main mathematical reason: a quantity that is the sum of many small, independent effects tends to follow a normal distribution. In that sense, underneath many data sets and experiments there are general truths that we can extract.
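The formula for the height of the bell curve at a point x, with mean μ and standard deviation σ, is f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). The sketch below implements it and adds a small central limit theorem demonstration. The choice of summing 12 uniform random numbers is an arbitrary illustration, not a method from the text. No single uniform number is bell-shaped, but their sums are, and roughly 68% of them land within one standard deviation of the mean, as a normal distribution predicts.

```python
import math
import random
import statistics


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the bell curve at x: mu sets the peak position, sigma sets the width."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


# Central limit theorem demo: each value is the sum of 12 independent
# uniform(0, 1) numbers. In theory the sums have mean 6 and standard deviation 1.
random.seed(1)
sums = [sum(random.random() for _ in range(12)) for _ in range(10_000)]

mean = statistics.fmean(sums)
stdev = statistics.stdev(sums)
within_one_sigma = sum(abs(s - mean) <= stdev for s in sums) / len(sums)

print(f"mean {mean:.2f}, standard deviation {stdev:.2f}")
print(f"share within one standard deviation: {within_one_sigma:.1%}")
print(f"peak height of the curve with mean 6, sd 1: {normal_pdf(6, 6, 1):.4f}")
```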


What is statistics?

Statistics is a branch of science that deals primarily with data. As a science it covers every aspect of working with data, including collection, organization, analysis, interpretation and drawing conclusions. This means that wherever there are large or even small bodies of data, statistical methods can be applied to display, transform and interpret that data and to find patterns within it. Some say that statistics is a mathematical science, because almost everything you do with data involves math in one way or another, and thus statistics without math does not exist. My view on this subject is a bit different: I think statistics is a scientific field of its own, separate from math, because when you collect, organize and derive meaning from data you get patterns and conclusions that are not pure math and can be interpreted in different ways.

Statistics is well known in computer science, because wherever there are computers there is a lot of data, and many of the methods used to collect and process any kind of data come from this field. Many programs have been created for analyzing data, and perhaps the best known is Microsoft Excel: this spreadsheet program can represent large data sets as charts and apply mathematical functions to the data to derive values such as the average, the largest and smallest values, and much more. Because of programs like these, anyone can take part in statistical data analysis and start to derive meaning and conclusions from large bodies of information.
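The same basic summaries that a spreadsheet provides through functions like AVERAGE, MAX, MIN and COUNT can be computed in a few lines of Python. This is a minimal sketch using Python's standard `statistics` module; the sales figures are made up for illustration.

```python
import statistics

# Hypothetical monthly sales figures, like a column in a spreadsheet.
sales = [120, 95, 143, 110, 128, 101, 137]

summary = {
    "average": statistics.mean(sales),  # like Excel's AVERAGE
    "largest": max(sales),              # like MAX
    "smallest": min(sales),             # like MIN
    "count": len(sales),                # like COUNT
}

for name, value in summary.items():
    print(f"{name}: {value:g}")
```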

The best way to see statistics in action is to look at infographics and other data visualizations such as charts and diagrams, because the human brain is quite bad with raw numbers but excellent at reading pictures and extracting abstract meaning. If you take a lot of data and organize it in a chart, you can see how the data is distributed and notice underlying aspects that you could not have seen just by looking at the raw figures. But visualizing data has its limits, and you cannot draw solid conclusions just by creating and eyeballing charts and diagrams. To fully interpret the data and see its underlying patterns you need to apply statistical measures such as the mean (average) and the standard deviation, and then you can start to understand the data set with more accuracy.
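A quick way to see why the average alone is not enough is to compare two data sets that share the same mean. The values below are hypothetical. Both sets average 50, but their standard deviations differ by a factor of about 20, which tells you one set is steady and the other swings wildly.

```python
import statistics

# Two hypothetical data sets with the same average but very different spread.
steady = [48, 50, 52, 49, 51]
volatile = [10, 90, 30, 70, 50]

for name, data in (("steady", steady), ("volatile", volatile)):
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation
    print(f"{name}: mean {mean}, standard deviation {stdev:.1f}")
```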

As you can see, statistics is a lot more than just visualizing data. To fully understand this science you must start from the very beginning: collect the data, then process it, and only then start to interpret that information, but that is a whole new topic.