Statistics 2015

Population and families

Globally, there are about 62 million more men than women. Interestingly, men outnumber women in younger age groups, while women outnumber men in older age groups.

Both men and women are now marrying later: on average, men marry at age 29 and women at age 25. In developing regions, more than 1 in 4 women aged 20–24 were married before they turned 18, and the proportion is higher still in sub-Saharan Africa and Southern Asia. Informal unions are on the rise all over the world, exceeding 30% in many countries.

In most countries, among divorced or separated people aged 45–49 there are at least 5 women for every 4 men. Three in four one-parent households consist of a lone mother with children. Older women (aged 60 and over) are more likely than men to live in one-person households, a tendency that is more pronounced in developed regions.


Globally, life expectancy has risen over the last 20 years for both sexes. In 1995 women lived for about 64 years on average; now they live for about 72 years. Men lived for about 60 years in 1995 and now live for about 68 years. Maternal health has also improved considerably: worldwide there were 380 maternal deaths per 100 000 live births in 1990, compared with 210 in 2013.

Women are less likely to exercise than men, which may partly explain why obesity is more prevalent among women than among men. Tobacco smoking is more prevalent among men than among women, and men are also more likely than women to engage in heavy episodic drinking.

HIV/AIDS and maternal conditions are the leading causes of death among young women. The top three causes of death among young men are road injuries, interpersonal violence and self-harm. For both women and men aged 60 and over, cardiovascular disease is the leading cause of death, but men are at higher risk of dying from ischemic heart disease. Women are more likely to be affected by dementia: 25–50% of women older than 85 have dementia.

Breast and cervical cancer are the most common cancers which affect women. Lung cancer is the most common type of cancer among men over 60.

Probability in poker

Poker may be one of the most popular card games: it is played all around the world by people of all ages. Some play it for fun, some play for money, and some make it their source of income. But because poker is a card game, and cards are countable, poker might be beatable with math. No wonder the big casinos in Las Vegas do not tolerate card counting.

The most useful branch of math for poker is probability, which measures how likely each possible outcome is. The simplest example is a coin toss. If you toss a fair coin there are two possible outcomes, heads or tails, and they are equally likely. So the odds are 50/50: a 50% chance of tails and a 50% chance of heads.
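The 50/50 claim is easy to check with a quick simulation. The sketch below is only an illustration: the function name `heads_fraction` and the fixed seed are my own choices, and it assumes a perfectly fair coin.

```python
import random


def heads_fraction(flips, seed=0):
    """Simulate `flips` tosses of a fair coin and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


# The more tosses we simulate, the closer the share of heads gets to 0.5.
for n in (10, 1_000, 100_000):
    print(n, "tosses:", heads_fraction(n))
```

With only a handful of tosses the share of heads can be far from one half; it settles near 50% only over many tosses.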

If it sounds easy with a coin toss, probability in any card game, let alone poker, is much more complicated, because there are more than two possible outcomes. Let's calculate a little. A regular poker deck has 52 cards, and each card has one of four suits (clubs, diamonds, hearts or spades) and one of 13 ranks (two through ten, Jack, Queen, King and Ace). So the odds of getting a King as your first card are 4 in 52, or 1 in 13 (7.7%), and the odds of getting a heart as your first card are 13 in 52, or 1 in 4 (25%). But with cards you also have to remember that once a card is drawn, there are fewer cards left in the deck. For example, once a King is drawn only three Kings remain among the 51 cards left, so the probability that the next card is a King falls to 3 in 51 (about 5.9%). You therefore need to recalculate your probabilities constantly.
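These card odds can be worked out as exact fractions. A minimal sketch, assuming a standard 52-card deck with no jokers; the variable names are only for illustration. After one King has been drawn, three Kings remain among 51 cards.

```python
from fractions import Fraction

DECK_SIZE = 52
KINGS = 4    # one King per suit
HEARTS = 13  # one heart per rank

# First card drawn from a full deck.
p_king_first = Fraction(KINGS, DECK_SIZE)    # 4/52 = 1/13
p_heart_first = Fraction(HEARTS, DECK_SIZE)  # 13/52 = 1/4

# Second card, after one King has already been drawn:
# three Kings are left among the 51 remaining cards.
p_king_after_king = Fraction(KINGS - 1, DECK_SIZE - 1)  # 3/51 = 1/17

for label, p in [
    ("King first", p_king_first),
    ("Heart first", p_heart_first),
    ("King after a King", p_king_after_king),
]:
    print(f"{label}: {p} = {float(p):.1%}")
```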

What can be calculated in poker is the probability of each type of 5-card hand, such as a pair, a flush or a full house. It is found by calculating the proportion of hands of that type among all possible 5-card hands.
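As a rough sketch of that proportion idea, the snippet below counts a few hand types with standard combinatorics and divides by the number of all 5-card hands. It assumes a 52-card deck and five cards dealt at once; the selection of hand types and the dictionary name `HAND_COUNTS` are just for illustration.

```python
from math import comb

# Number of different 5-card hands that can be dealt from 52 cards.
TOTAL_HANDS = comb(52, 5)

HAND_COUNTS = {
    # 13 ranks for the four equal cards, then any of the other 48 cards
    "four of a kind": 13 * 48,
    # rank and suits of the triple, then rank and suits of the pair
    "full house": 13 * comb(4, 3) * 12 * comb(4, 2),
    # 4 suits, any 5 of the 13 ranks (this count includes straight flushes)
    "flush (incl. straight flushes)": 4 * comb(13, 5),
    # rank and suits of the pair, then three other ranks in any suit
    "one pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
}

print("all hands:", TOTAL_HANDS)
for name, count in HAND_COUNTS.items():
    print(f"{name}: {count} hands, probability {count / TOTAL_HANDS:.4%}")
```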

This is why math is actually one of the most important things in poker: if you can do it quickly and discreetly enough, it can really help you at the table, and with the help of math and probability you may actually beat the game.

Null hypothesis



Statistics is the collection of data in order to organize, analyse, interpret and present it in a way that gives insight into a problem and into probable solutions in the area being studied. It is used in many spheres, from science to social and industrial fields. One of the most prominent concepts in statistics is the null hypothesis, which in many cases is assumed to be true until evidence indicates otherwise.

The null hypothesis is, in general, a statement or default position that there is no relationship between two measured phenomena. A researcher therefore has to use statistics to show that there is a relationship between the two phenomena in order to reject the null hypothesis.

The null hypothesis, denoted H0, is used in two quite different statistical approaches. In the first approach, called significance testing and developed by Ronald Fisher, the null hypothesis is rejected if the observed data would be very unlikely under the assumption that it is true. In the second approach, hypothesis testing, introduced by Jerzy Neyman and Egon Pearson, an alternative hypothesis is set against the null hypothesis and the two are distinguished on the basis of the data, keeping error rates in mind.

The null hypothesis was established in 1925 by Ronald Fisher, although even before then there was talk of similar concepts in the statistical research and testing community. Fisher's significance test became the main way of analysing almost all experimental science at the time. Later, in 1933, Neyman and Pearson came up with what was meant as an improvement to Fisher's test, but it turned into an alternative to it rather than an enhancement. And so these two ways of using the null hypothesis came to be, and both are still used all over the world.

One field in which the null hypothesis is used is testing the significance of differences between treatment and control groups. At the beginning of the test it is assumed that there is no difference between the control and the experimental group, or between any other two groups for that matter. For example, with this method you can test whether two groups differ even if you only have one random sample of test scores from men and another from women. First you assume that the mean test scores of the two groups are the same, which can be written as H0: μ1 = μ2, where H0 is the null hypothesis and μ1 and μ2 are the mean test scores of each group. If the data turn out to be very unlikely under this assumption, you reject the null hypothesis.
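One simple way to put H0: μ1 = μ2 to the test is a permutation test. It is only one of several possible methods and not necessarily the one the approaches above would use. The sketch below shuffles the group labels many times and counts how often a difference in means at least as large as the observed one appears by chance. The scores are made up for illustration, and the function name `permutation_p_value` is my own.

```python
import random


def permutation_p_value(group_a, group_b, rounds=10_000, seed=0):
    """Estimate how often random relabelling gives a difference in means
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # under H0 the group labels are interchangeable
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / rounds


# Hypothetical test scores, invented for this example.
men = [72, 85, 78, 90, 66, 81]
women = [75, 88, 79, 92, 70, 84]

p = permutation_p_value(men, women)
print("estimated p-value:", p)
print("reject H0 at the 5% level:", p < 0.05)
```

A small p-value means the observed difference would be rare if the two groups really had the same mean, which is the evidence needed to reject H0. A large p-value does not prove that the means are equal; it only means the data do not contradict H0.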

Statistics and Probability

If you have ever used a probability or a percentage to measure something, then you know that it is very hard to predict how often an event will occur, or whether it will occur at all. In a completely deterministic universe, if you could somehow know the exact starting state of an object, you could predict how the process will unfold and thus predict the outcome. But in the real world it is almost impossible to measure every single aspect of a situation, so it is more useful to give probabilities, that is, the likelihood that a thing will happen. Think of a card game. You never know the order of the cards in a shuffled deck. But imagine that after a game you could watch the dealer gather the cards and shuffle them, and see where each card goes: in the end you would know what the next card will be and so have an edge over the other players. Now imagine that you could see only half of the shuffling clearly and the other half partially. In that situation you could assign probabilities to how likely it is that one card will come before another, and to where the strongest cards are, but you could never be sure, because you do not have complete information.

If we look at how statistics relates to probability theory, we see that every statistical prediction is true only with a certain probability, and you can never say anything with complete confidence. This is because statistics takes a lot of data, organizes it, finds the underlying model and analyses it, but it cannot tell you exactly what the next event will be; it can only estimate how likely each outcome is. This means that statistical analyses and predictions are all informed probabilities, and they will never be 100% accurate.

And if we delve deeper into this subject, there is even more unpredictability. In quantum physics, for example, everything is described by likelihoods that something will happen, while in reality the thing may or may not happen. That means that statistical measurements will never be completely accurate: even with a large data sample there will be occurrences when something goes out of control and is basically unpredictable. This is also true for humans, as we are unpredictable creatures and we can change the odds of something occurring in our favor, but that is a completely different story.

Statistics in Marketing

Marketing is one of those fields where everything is closely monitored and where data-driven approaches are the norm. Because of this, marketers use statistical methods to gather and analyze data, to predict trends and the results of their campaigns, and to collect information about their focus groups and apply it to real-world scenarios.

Statistics can help marketers fill gaps in their understanding of demographic groups and of people in the specific areas where they want to promote a product or service. If you have ever seen a targeted advertisement, then you know that marketers are becoming more and more precise at aiming their products at the people who would actually buy them. With the help of modern tools and the internet they are collecting more and more data on everyone and then using this data to predict what, where and how you will buy. And it doesn’t stop there: lately marketers have been using statistical data analysis to promote products that are related to what you do but that you wouldn’t even have thought of buying.

To see how far statistical data analysis has come, look at the Target retail chain and its efforts to increase profits by analyzing its customers. The story goes that Target, through its loyalty cards and online purchases, collects huge amounts of data on every customer: their buying habits, where they live and even what they might buy. Data analysts then use statistical analysis to find every customer's buying patterns, so that the company can step in and get even more out of each of them. This customer targeting even includes sending coupons to regular customers' homes for items that they might like and buy. I once read a story that Target sent coupons for baby clothes and other pregnancy items to a 16-year-old girl, and her father furiously demanded that they stop, since his daughter was just 16 and couldn’t be pregnant. But after a couple of months the father came back and apologized, because his daughter was in fact pregnant and the customer targeting system had known it before he did. Of course this is an extreme case, but imagine how much money a store like this could make if its system knew everything about every customer and their buying patterns. And it is not science fiction: if you can tell that a shopper is pregnant only from the items bought in the store, then soon enough statistical analysis will reveal much more intricate details about you and your buying habits.

Bell Curve aka Gaussian Function

The bell curve, more formally known as the Gaussian function or normal distribution, describes how values spread around their average. Its two- or three-dimensional graph is a smooth line that rises roughly exponentially at first, reaches a single top point and then falls away in the same fashion on the other side. The bell curve is very widely used to show environmental, statistical and other phenomena and to study how fast they change and where the peak value could be. If you search around for this graph you start noticing it more and more, and bell-shaped curves turn up in descriptions of many processes, from sales and new product introductions to population growth and the availability and peak values of oil and other non-renewable energy sources. As I understand this phenomenon, a bell-shaped curve can appear in any situation where there is a limited amount of something but an entity (most likely humans) does not acknowledge this and acts as if there will be continuous growth all the time. This can be seen in peak-oil graphs, where some statisticians and data scientists have predicted that, because oil is a non-renewable resource and we as a species use it as if it will never end, we will soon reach the point where demand exceeds supply, which is the peak of the curve. When this happens there will be little point in trying to find new oil fields or new extraction methods, because doing so will become harder and harder.

The normal distribution curve has a formula with two parameters: the mean says where the top point sits and the standard deviation says how wide the curve is. The curve is drawn from these two values, and visualizing the distribution this way can give more insight than a bare data set. Bell curves are used in a wide range of applications, and they show up in much of the statistical data gathered in real-world experiments. Very often the data follow a bell-shaped curve that reveals some underlying truths about them and allows the statistician to make predictions about the spread and the data set as a whole. Quite frankly, there is no single application or single reason why these bell curves appear in data, but they show that underneath many datasets and experiments there are some overall truths that we can extract.
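For readers who want to see that formula at work, here is a small sketch. It writes out the standard Gaussian density by hand and compares it with `NormalDist` from Python's `statistics` module. The mean of 100 and standard deviation of 15 are arbitrary example values, not figures from this post.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mean, std_dev):
    """Height of the bell curve at x: the mean sets where the peak is,
    the standard deviation sets how wide the curve is."""
    z = (x - mean) / std_dev
    return exp(-0.5 * z * z) / (std_dev * sqrt(2 * pi))


MEAN, STD_DEV = 100, 15  # arbitrary example values
curve = NormalDist(mu=MEAN, sigma=STD_DEV)

# The hand-written formula and the library agree, and the peak is at the mean.
print("peak height, by formula:", normal_pdf(MEAN, MEAN, STD_DEV))
print("peak height, by library:", curve.pdf(MEAN))

# Share of values that fall within one standard deviation of the mean.
within_one_sigma = curve.cdf(MEAN + STD_DEV) - curve.cdf(MEAN - STD_DEV)
print("share within one standard deviation:", within_one_sigma)
```

Changing `STD_DEV` makes the curve wider or narrower and changes the height of the peak, while the share of values within one standard deviation of the mean stays at roughly 68%.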


What is statistics?

Statistics is a branch of science that deals primarily with data, in all its aspects: collection, organization, interpretation and drawing conclusions. This means that wherever there are bodies of data, large or small, statistical methods can be applied to display, interpret and even transform the data and to see patterns within it. Some say that statistics is a mathematical science, because basically everything you do with data involves math in one way or another, and so statistics does not exist without math. My view is a bit different: I think that statistics is a whole scientific field separate from math, because when you collect, organise and derive meaning from data you get patterns and conclusions that are not pure math and can be interpreted in more than one way.

This scientific field is well known in computer science, because wherever there are computers there is a lot of data, and the methods you use to collect and process any kind of data come from this field. Many programs have been created for analysing data. Perhaps the best known is Microsoft Excel, a spreadsheet program that can show large datasets as charts and apply mathematical functions to the data to derive values such as the average, the largest, the smallest and a lot more. Because of programs like these, anyone can take part in statistical data analysis and start to derive meaning and conclusions from large bodies of information.

The best way to see statistics in action is to look at infographics and other data visualizations such as charts and diagrams, because the human brain is very bad with numbers but excellent at pictures and at extracting abstract meaning. If you take a lot of data and organise it in a chart, you can see how the data is distributed and notice underlying aspects that you could not have seen by just looking at the raw numbers. But visualizing data has its limits, and you cannot draw real conclusions just by creating and measuring some charts and diagrams. To fully interpret the data and see the underlying patterns you need to apply measures such as the mean, the median and the standard deviation; then you can start to understand the dataset with more accuracy.
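Summary measures like these take only a few lines to compute with Python's built-in `statistics` module. The numbers below are made up for illustration.

```python
from statistics import mean, median, stdev

# A small, made-up dataset with one unusually large value.
values = [4, 8, 15, 16, 23, 42]

average = mean(values)   # sum of the values divided by how many there are
middle = median(values)  # the middle value once the data is sorted
spread = stdev(values)   # sample standard deviation: typical distance from the mean

print("mean:", average)
print("median:", middle)
print("standard deviation:", round(spread, 2))
```

The mean is pulled upwards by the one large value, while the median is not, so comparing the two already says something about the shape of the data that a single number would hide.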

As you can see, statistics is a lot more than just visualizing data. To fully understand this science you must start from the very beginning: collect data, then process it, and only then start to interpret it. But that is a whole new topic.