Statistics

Hypothesis Testing

Hypothesis testing is a statistical method used to decide whether the data support a hypothesis about a population. It involves collecting data, analyzing it, and making a decision based on the evidence.

Steps

Step 1: State your null and alternate hypothesis

The null hypothesis is a prediction of no relationship between variables you are interested in. The alternate hypothesis on the other hand is your hypothesis that predicts a relationship between variables.

Examples
  1. You want to test whether there is a relationship between gender and height. Based on your knowledge of human physiology, you hypothesize that men are, on average, taller than women. To test this hypothesis you restate it as:

     H0: Men are, on average, not taller than women.

     Ha: Men are, on average, taller than women.

Some guidelines when using mathematical symbols

| H0 | Ha |
| --- | --- |
| Equal (=) | Not equal (\(\neq\)) |
| Greater than or equal to (\(\geq\)) | Less than (\(\lt\)) |
| Less than or equal to (\(\leq\)) | Greater than (\(\gt\)) |

Examples

We want to test whether the mean GPA of students in American colleges is different from 2.0.

The null and alternative hypotheses are:

H0: \(\mu\) = 2.0

Ha: \(\mu\) \(\ne\) 2.0

Step 2: Perform an appropriate statistical test

For this step we perform something known as the t-test. A t-test is any statistical hypothesis test in which the test statistic follows a t-distribution under the null hypothesis.

A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.

The t-test can be used to determine if the means of two sets of data are significantly different from each other.

An independent samples t-test compares the means of two groups.

A paired samples t-test compares means from the same group at different times.

A one-sample t-test tests the mean of a single group against a known mean.

The t-score is the ratio of the difference between two groups to the difference within the groups.

The larger the t score, the more difference there is between groups.

The smaller the t score, the more similarity there is between groups.
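To make the ratio concrete, here is a minimal sketch of a two-sample t-score. It uses Welch's form of the statistic (difference in means divided by the combined standard error), which is one common choice and not necessarily the variant these notes have in mind. The function name `welch_t` and all height values are hypothetical, chosen only for illustration.

```python
import math
import statistics


def welch_t(a, b):
    """t = (mean(a) - mean(b)) / sqrt(var(a)/len(a) + var(b)/len(b))."""
    # Numerator: difference between the groups.
    diff = statistics.mean(a) - statistics.mean(b)
    # Denominator: variability within the groups (sample variances).
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return diff / se


# Hypothetical heights in cm (illustrative only, not real measurements).
men = [178, 182, 175, 180, 185, 177]
women = [165, 170, 162, 168, 172, 166]
similar_group = [176, 181, 174, 179, 183, 178]

t_far = welch_t(men, women)            # groups with clearly different means
t_near = welch_t(men, similar_group)   # groups with nearly equal means

print(f"clearly different groups: t = {t_far:.2f}")
print(f"similar groups:           t = {t_near:.2f}")
```

The first comparison produces a much larger t-score than the second, because the gap between the group means is large relative to the spread inside each group.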

Every t-score has a p-value to go with it.

A p-value is the probability of obtaining results at least as extreme as your sample data if the null hypothesis were true, that is, by chance alone.

P-values range from 0% to 100%.

Low p-values are good. They indicate that your data are unlikely to have occurred by chance.
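As an illustration of the one-sample case, the GPA hypotheses from Step 1 (H0: \(\mu\) = 2.0) could be tested with a t-score like the sketch below. The sample of GPAs is made up for the example, and `one_sample_t` is a hypothetical helper name, not part of any library.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


# Hypothetical GPAs of 8 students (illustrative only).
gpas = [2.1, 2.5, 1.8, 2.9, 2.4, 2.2, 2.7, 2.0]

t_stat = one_sample_t(gpas, mu0=2.0)
print(f"sample mean = {statistics.mean(gpas):.3f}, t = {t_stat:.3f}")
```

The resulting t-score would then be compared against a t-distribution with n - 1 degrees of freedom to obtain the p-value.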

Step 3: Decide whether to reject or fail to reject your null hypothesis.

To understand this step let us solve a problem:

Suppose a sample of n students were given a diagnostic test before studying a particular module and then again after completing the module. We want to find out if, in general, teaching leads to improvements in students' knowledge/skills. We can use the results from our sample of students to draw conclusions about the impact of this module in general.

Since we are comparing the means of the same sample at two different points in time, we will use the paired t-test.

Null hypothesis - There is no difference after completing the module.

Alternate hypothesis - There is a difference after completing the module.

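Calculate the difference between the two observations for each student, i.e. \(d_i = y_i - x_i\), where \(x_i\) and \(y_i\) are the before and after scores of student \(i\).

Calculate the mean difference, \(\bar{d}\).

Calculate the standard deviation of the differences, \(s_d\), and use this to calculate the standard error of the mean difference, \(SE(\bar{d}) = \frac{s_d}{\sqrt{n}}\).

Calculate the t value:

\(t = \frac{\bar{d}}{SE(\bar{d})}\)

Under the null hypothesis, this statistic follows a t-distribution with \(n - 1\) degrees of freedom. Use a table of t values to look up the p-value for the paired t-test.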
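The steps above can be sketched in Python using only the standard library. The before/after scores are invented for illustration, the significance level of 0.05 is a conventional assumption rather than something stated in these notes, and instead of a printed t table the sketch integrates the t-distribution density numerically (Simpson's rule). The helper names `t_pdf` and `two_sided_p_value` are hypothetical.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=10_000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3              # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)     # probability in both tails


# Hypothetical diagnostic scores for n = 8 students (illustrative only).
before = [18, 21, 16, 22, 19, 24, 17, 21]
after = [22, 25, 17, 24, 16, 29, 20, 23]

diffs = [y - x for x, y in zip(before, after)]   # d_i = y_i - x_i
n = len(diffs)
d_bar = statistics.mean(diffs)                   # mean difference
s_d = statistics.stdev(diffs)                    # sample std of the differences
se = s_d / math.sqrt(n)                          # standard error of the mean difference
t_stat = d_bar / se
p_value = two_sided_p_value(t_stat, df=n - 1)

alpha = 0.05  # assumed significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f}: {decision}")
```

The numerical integration is only there to keep the example dependency-free; a statistics library's t-distribution function or a printed table would serve the same purpose.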

Gaussian Distribution Problems in Interviews

Overview

When going for Machine Learning or Data Science interviews, interviewers like to check whether a candidate can model a problem with a distribution, the favourite being the Gaussian distribution. I'm sure everyone is familiar with how to do this, but to refresh everyone's memory on the subject, let's look at a question.

Interview Question

On any given day, the average number of customers visiting a store is 500 and the standard deviation is 20. What is the probability that the number of customers on any day is in the range 480-520?

Solution

So let's note down the details of the problem.

We know that the mean \(\mu\) is 500 and the standard deviation \(\sigma\) is 20. Assuming a Gaussian distribution, our model will look something like this:

[Figure: Gaussian Distribution]

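The range 480-520 is \(\mu \pm \sigma\), i.e. within one standard deviation of the mean. For a Gaussian distribution, about 68.27% of the probability lies in this interval, so the probability is approximately 68.27%.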
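The answer can be double-checked numerically. This short sketch uses `statistics.NormalDist` from the Python standard library and assumes, as the solution does, that daily customer counts are normally distributed with mean 500 and standard deviation 20.

```python
from statistics import NormalDist

# Model from the problem statement: mean 500, standard deviation 20.
customers = NormalDist(mu=500, sigma=20)

# P(480 <= X <= 520) is the difference of the CDF at the two endpoints.
p = customers.cdf(520) - customers.cdf(480)
print(f"P(480 <= X <= 520) = {p:.4f}")  # approximately 0.6827
```

The same pattern works for any other range: evaluate the CDF at both endpoints and subtract.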

Transforming Data to New Mean and Variance

Overview

Sometimes we want to transform a dataset to a new mean or variance for different purposes. For instance, for a normal distribution we might want to shift the mean and rescale the variance to correct an error made during data collection. These operations are very common in data wrangling.

How do you do it?

It's quite simple really. Let us understand the math behind it.

Step 1: Standardize your dataset

To standardise your dataset you need to perform the following transformation first:

\[ z = \frac{x - \mu}{\sigma} \]

Where \(x\) is the input, \(\mu\) is the mean, and \(\sigma\) is the standard deviation (\(\sigma^2\) is the variance). This transformation is similar to what sklearn's StandardScaler implements in Python.

Step 2: Calculating new inputs

To calculate the new inputs we simply need to use the following formula:

\[ x' = z*\sigma' + \mu' \]

Where \(x'\) is the new input value, \(\sigma'\) is the new standard deviation, and \(\mu'\) is the new mean.
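To see why this works, recall that after Step 1 the standardized values \(z\) have mean 0 and variance 1. A short derivation, assuming \(\sigma' > 0\) and treating the mean and variance as linear-transformation properties:

\[ \mathrm{E}[x'] = \sigma' \, \mathrm{E}[z] + \mu' = \sigma' \cdot 0 + \mu' = \mu' \]

\[ \mathrm{Var}(x') = \sigma'^2 \, \mathrm{Var}(z) = \sigma'^2 \cdot 1 = \sigma'^2 \]

So the transformed values have mean \(\mu'\) and standard deviation \(\sigma'\), as intended.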

Python Code

import numpy as np

# Draw 1000 samples from a normal distribution with mean 25 and std 10
input_vals = np.random.normal(loc=25, scale=10, size=1000)

# Step 1: standardize using the sample's own mean and standard deviation
mean = input_vals.mean()
std = input_vals.std()
transformed_vals = (input_vals - mean) / std

# Step 2: rescale to the new standard deviation, then shift to the new mean
new_mean = 50
new_std = 20
new_input_vals = transformed_vals * new_std + new_mean
new_input_vals
array([ 49.49601559,  81.95221737,  44.10701235,  -4.43684172,
        43.12316393,  22.37556816,  50.8232128 ,  50.82211642,
        54.34191641,  44.22147494,  47.6687756 ,  77.44097655,
        10.03763848,  47.29017838,  32.11374724,  -1.23982667,
        69.48691703,  48.91739126,  86.26436   ,  54.24775184,
        -0.62913152,  46.80564143,  34.39202535,  42.49736382,
        11.11665713,  67.84445118,  34.59465903,  61.88864806,
        40.33390889,  51.23802672,  36.43257266,  67.12160189,
        34.66234269,  36.87892063,   7.19051749,  32.21378979,
        43.02836817,  43.15184828,  47.1522313 ,  42.8129976 ,
        76.41399973,  72.86920209,  49.40598658,  20.55679545,
        39.21964224,  41.66889385,  75.12315452,  40.55983653,
        35.84141976,  31.58795549,  55.91554017,   2.05367405,
        68.68371294,  89.64689468,  35.3533851 ,  65.48295692,
        91.77368542,  66.95120604,  22.30549826,  37.0382882 ,
        62.95677903,  52.15918057,  43.90733809,  44.98067858,
        56.7238287 ,  32.07208405,  71.82333845,  83.68577369,
        42.82555785,  75.28336789,  26.21175009,  96.55165242,
        38.51350607,  70.11426895,  74.27860112,  65.36207749,
        81.35233833,  50.18617206,  55.93360283,  77.53365166,
        69.63520905,  37.59198146,  46.79127037,  44.60258121,
        28.17262707,  66.3659093 ,  27.81046051,  32.12126147,
        74.81794259,  40.02111546,  58.35644463,  72.6791059 ,
        79.23500871,  18.61984409,  73.44601248,  62.16258075,
        71.36050328,  72.18368902,  62.11357691,  60.01150657,
        12.93929019,  33.25076732,  58.92842542,  85.16931205,
        35.14074635,  30.94234057,  56.63988325,  39.54632178,
        50.98317019,  39.01565464,  44.35046655,  42.8802156 ,
        67.70448826,  69.68267456,  67.88673202,  47.03415948,
        35.70957455,  62.0271697 ,  29.95210341,  85.6417443 ,
        86.41651521,  13.35185704,  73.3740623 ,  46.06879338,
        33.06050445,  64.62951854,  21.38971394,  36.92346644,
        60.46378494,  56.42159419,  46.88107641,  59.5814996 ,
        80.12756863,  44.68597813,  65.62078714,  90.01413935,
        45.31101182,  54.69107064,  38.97076947,  32.8735822 ,
        85.46860426,  33.63271914,  77.7963452 ,  36.70177956,
        56.22516901,  49.61501512,  48.62277677,  34.71191839,
        70.58431327,  47.215934  ,  86.01083296,  10.01352701,
        47.27567932,  33.96263808,  48.70166808,  27.40983011,
        52.55928308,  48.01592305,  61.45921988,  45.88339219,
        79.73244761,  43.8490769 ,  63.3760365 , 108.63398782,
        52.43253858,  44.70183207,  23.75090548,  44.57956294,
        52.82850474,  52.25872225,  45.52727945,  52.99699568,
        63.45042797,  74.26054273,  74.8948219 ,  57.09896442,
        37.91847292,  89.77592224,  60.1371484 ,  24.00906182,
        77.56627315,  81.01670238,  62.80787995,  43.67763844,
        26.54282655,  62.64378931,  61.53516981,  56.66331484,
        40.82505139,  35.73135155,  49.38087751,  35.38622963,
        54.57180493,  68.1206563 ,  58.30386778,  23.94199085,
        40.79241966,  66.08012181,  51.79244687,  22.75451046,
        26.07074966,  33.11550218,  75.85995492,  56.79386647,
        55.3297153 ,  30.92124425,  56.02856793,  33.96853867,
        26.26417803,  75.88605475,  62.39167368,  34.75454549,
        74.46107655,  37.20314954,  38.38163625,  74.66845515,
        75.66322402,  24.28010982,  43.94214999,  40.62395058,
        48.0843374 ,  51.98710602,  66.69022463,  83.16548506,
        48.1024695 ,  26.16834048,  29.89578215,  60.10091581,
        27.11641347,  53.81504542,  40.37371576,  62.53805758,
        18.60222102,  25.46361251,  57.12387699,  20.34342658,
        34.85790115,  29.45391819,  51.56221642,  36.93229206,
        69.65466631,  61.23242145,  34.13827241,  46.82041056,
        53.53046763,  34.23611329,  56.34852414,  44.12829082,
        83.13023373,  60.70016783,  83.19769711,  32.20696478,
        24.44920165,  61.80384099,  16.75989065,  74.25626731,
        68.36131381,  56.16197942,  32.74899458,  59.23027229,
        50.65085652,  46.34568657,  55.1050968 ,  71.59045805,
        45.04037161,  48.84160109,   9.73137866,  67.13901612,
        70.18102173,  10.97512729,  44.47815867,  58.10020807,
        61.3599267 ,  56.59742826,  13.56102442,   3.92188211,
        63.93193683,  74.89339799,  83.21784375,  53.03416625,
        19.53061296,  17.03327139,  82.19633235,  55.8401601 ,
        73.41605689,  57.66720166,   5.3398253 ,  15.02813978,
        34.17079683,  54.57416008,  51.08909237,  59.24314076,
         3.61556204,  48.62684436,  22.98114859,  54.50169801,
        85.54156458,  87.04567889,  65.312347  ,  35.79250589,
        32.26966647,  33.17354654,  47.02984857,  44.145624  ,
        59.4074548 ,  36.55781391,  55.18682041,  58.34930214,
        40.286385  ,  58.81879478,  12.40276423,  41.00470755,
        53.61278918,  51.33011874,  33.00966439,  48.86525779,
        58.35053152,  41.92292913,  38.45732378,  54.4998052 ,
        37.50265526,  41.60417335,  38.18343955,  52.14381858,
        51.25683047,  59.68082105,  60.96865136,  43.61793798,
        53.45929256,  55.96906224,  23.29333053,  63.63911366,
        23.97048848,  75.71691226,  57.50459948,  52.18680612,
        25.88142988,  40.25965288,  18.71230929,  60.27017089,
        37.20885774,  64.33217068, -15.80580728,  83.1248649 ,
        23.79554506,  76.58716818,  26.2192708 ,  40.83835812,
        23.03772042,  68.98720606,  57.57873014,  68.52695854,
         5.14758133,  45.1179288 ,  64.97623344,  63.57160003,
        77.08161055,  52.1084482 ,  61.68888811,  27.16821272,
        46.0101753 ,  65.28611687,  64.46894115,  42.26620475,
        64.90930088,  38.81286685,  67.72977263,  42.98501795,
        44.88827085,  54.93992946,  25.47413835,  52.27312205,
        85.69922305,  61.61658672,   7.83273776,  59.8997075 ,
         6.73665383,  27.96350816,  70.43863418,  32.58226973,
        48.39457149,  61.43033804,  54.35377947,  83.34789135,
        65.9375121 ,  59.34051355,  67.24611435, 102.54055678,
        39.02170513,  24.99161933,  43.5885292 ,  39.43727887,
        59.12757792,  83.05789748,  47.38251863,  46.00796688,
        65.44530126,  50.26898215,  17.59570274,  86.65436661,
        33.2437534 ,  63.84817761,  26.69816948,  63.43081244,
        19.51755875,  49.07459211,  41.58803545,  59.22852463,
        63.00270275,  35.83958966,  50.24515112,  87.57937072,
        40.74589485,  59.7137842 ,  91.7339808 ,  55.82675139,
        65.38667423,  45.43300605,  63.62709081,  41.57023071,
        99.50174099,  49.94879407,  85.28333911,  38.98283286,
        22.95127191,  28.48640215,  74.40190731,  56.39152034,
        87.11789373,  49.96592323,  47.91872821,  69.60728571,
        51.23902791,  71.64063386,  79.6016418 ,  71.95821828,
        77.68705163,  54.71355569,  68.26163132,   9.70179471,
        64.29988672,  27.08808979,  73.68217633,  31.30879697,
        44.96018459,  31.74515341,  55.90425015,  57.80316169,
        21.13254216,  47.48109419,  54.97651116,  17.85206471,
        48.17671051,  46.9278787 ,  31.70650285,  41.04334338,
        54.58315842,  61.66939818,  49.1901462 ,  39.92231791,
        62.12917698,  73.11726575,  68.64225752,  26.54410158,
        51.51106242,  12.74469986,  39.00122347,  56.2995436 ,
        37.5701076 ,  44.42675697,  23.99295103,  32.05722478,
        22.74339285,  49.68311884,  59.52754907,  15.94074257,
        50.36358847,  43.60625194,  59.88294185,  50.14999084,
        62.31356872,  68.53996927,  38.9083538 ,  69.91116662,
        60.54048083,  56.99174173,  46.15787547,  36.1217023 ,
        70.14227673,  72.09617741,  83.53838605,  16.36291389,
        63.21899147,  42.39208864,  42.11577349,  56.77338503,
        43.29132042,  49.74369988,  11.02817352,  40.57930242,
        78.03595312,  61.20053244,  44.25454593,  73.46078844,
        39.19593048,  55.75033887,  82.09623613,  38.84314227,
        40.85204792,  50.86187538,  10.72616764,  18.28888217,
        46.55790238,  55.43780903,  38.20987205,  42.50837861,
        79.26291754,  72.30333049,  24.20580798,  37.02655925,
        30.89209407,  54.4846919 ,  82.50612511,  58.67607329,
        58.23064539,  47.88389914,  18.12263489,  52.5182857 ,
        43.41903826,  41.14214894,  68.01724688,  25.3949731 ,
        44.7449377 ,  27.16228127,  58.73924531,  56.86747413,
        84.61124395,  45.05992081,  71.1624996 ,  37.48260365,
        46.48870978,  23.69901148,  51.15919288,  18.96718992,
        56.79575079,  54.31872957,  28.83104669,  71.49908403,
        61.81759572,  45.09595638,  85.44426864,  30.9801629 ,
        38.49977853,  71.87824812,  59.38518561,  14.70401097,
        23.84136401,  36.41924606,  51.85082062,  24.42709582,
        46.96733474,  59.95369635,  87.22312558,  46.45375422,
        79.63697279,  34.48244042,  49.61857759,  58.24109826,
        69.28433602,  12.28545874,  36.08342907,  74.61247067,
        44.54569835,  29.59746389,  42.73948595,  17.7863682 ,
        34.32873818,  30.87491666,  29.31222089,  49.94580299,
        51.12224323,  68.13000142,  55.25825597,  70.91162641,
        39.3351263 ,  43.15723031,  48.69918162,  35.77614902,
        48.20871546,  41.7321273 ,  56.46405943,  29.60390014,
        51.87946251,  56.76641015,  49.74083861,  53.47655012,
        38.35481504,  59.32787208,  64.8190974 ,   4.59899069,
        24.65197628,  76.13535529,  31.87519699,  49.87142476,
        55.66396191,  33.03398098,  39.32249353,  55.15723999,
        31.75098199,  61.82868185,  67.98436359,  66.45111173,
        68.09088384,  57.42580141,   7.90566222,  93.05117015,
        44.21880902,  65.5422945 ,  57.44372298,  52.08799937,
        63.14130041,  68.53298467,  48.00403173,  58.55683871,
        59.14957095,  63.00563394,  60.46529197,  79.58424895,
        36.21158367,  32.58760247,  55.02829998,  54.40420964,
        58.85174303,  71.32178925,  77.24437475,  28.3589688 ,
        32.16694145,  33.19155973,  54.26637119,  41.16327586,
        50.58551412,  64.76908479,  38.28325466,  57.01256249,
        26.4025322 ,  51.71212334,  58.58063136,  36.49654396,
        45.05438892,  64.52502789,  24.11713391,  70.08197726,
        33.83088736,  16.29646357,  57.63483978,  61.79741407,
        59.56417407,  32.03595271,  48.93540253,  50.41790003,
        82.64721788,  45.3544536 ,  54.81711147,  77.28636658,
        40.19796987,  52.54404744,  32.56437429,  47.85691129,
        58.62257712,  55.36607257,  59.41562041,  77.16077912,
        21.38747626,  19.38870772,  85.40004097,  26.49009319,
         8.0640901 ,  36.55693585,  14.9592918 ,  32.44420826,
        57.87655584,  47.88871025,  80.95017661,  43.29675305,
        54.95892607,  35.60244226,  41.27100482,  59.14702036,
        44.85299406,  45.88970679,  64.90173338,  16.68215149,
        60.83374325,  54.51335594,  66.12371104,  43.71407551,
        76.35766673,  47.0578887 ,  44.48930216,  58.67214604,
         6.93099036,  44.0784275 ,  87.26743732,  33.74051501,
        31.61433847,  16.18837785,  71.41725892,   7.56165002,
        74.25763798,  53.00725407,  21.21984948,  27.00235844,
        43.27854072,  62.58751425,  72.62697027,  84.83697849,
        39.29949741,  27.74751162,  84.54174836,  53.60907931,
        23.51072372,  52.30666938,  44.9227771 ,  62.00104535,
        36.76673651,  67.79146944,  76.40024822,  72.37617406,
        32.15380912,  38.59583061,  55.06194823,  86.37161605,
        45.8461077 ,  44.76196931,  53.67580888,  55.94391549,
        40.54273035,  72.81303502,  80.22448102,  83.59834747,
        42.68294477,  56.22101165,  42.50225088,  74.24983962,
        74.41151259,  71.16334458,  41.85641042,  57.63886762,
        53.31873729,  18.19712412,  46.10496567,  65.21976216,
        28.71065915,  97.38021009,  32.86679772,  19.66351098,
        38.16941953,  30.58965124, 104.90914337,  25.40905709,
        48.50673348,  40.18356751,  40.01859325,  53.28474473,
        15.10319505,  41.45816069,  60.98240438,  72.87696864,
        17.83039328,  57.73843473,  69.49674981,  12.1607061 ,
        33.19392956,  32.87282521,  22.95524749,  41.49371038,
        21.74403892,  70.68449793,  32.62075174,  41.58195433,
        81.18567803,  33.58930942,  34.64131154,  44.54289417,
        64.36337003,  40.37054446,  22.78226617,  58.91188644,
        67.67729368,  26.23094686,  62.93973918,  55.49175452,
        51.4648093 ,  49.28916451,  47.06318444,  42.58523182,
        56.10613689,  52.40085237,  23.38133769,  55.67721105,
        31.49967033,  72.31546648,  59.25186389,  56.84445298,
        66.88545026,  66.95470607,  20.38200418,  99.7359384 ,
        46.07265114,  69.3898383 ,  78.10224603,  44.37805652,
         8.37255714,  60.67226014,  53.11479668,  64.04126634,
        18.04627226,  38.58411279,  56.02370156,  39.15303342,
        34.63214408,  88.21876653,  67.44644253,  16.43229522,
        83.90872194,  35.93661211,  60.19263761,  42.15495887,
        58.93682819,  33.10147147,  58.65792064,  24.18783816,
        88.72537889,  82.56698502,  49.25789954,  29.74450617,
        58.47459962,  28.70369747,  49.89468339,  78.39481375,
        79.47880766,  82.48991671,  58.24369182,  32.75084164,
        54.21499264,  40.47349446,  66.56791093,  56.32661118,
        37.41345055,  54.07731934,  91.19557085,  71.43299712,
        34.64809948,  90.94826767,  38.70820119,  49.93275932,
        71.52085894,  21.64971732,  67.19714009,  62.88390642,
        11.12847965,  20.41944811,  55.73799815,  17.06247734,
        54.17605223,  86.01987149,  53.79450487,  43.47483199,
        67.10418026,  35.99357907,  22.21169736,  56.93503159,
        43.87585123,  63.02749859,  33.49599091,  54.36035587,
        39.97066494,  35.73469642,  32.5950815 ,  62.69388504,
        55.72244848,  27.56650818,  41.4477204 ,  44.53380104,
        86.10294567,  64.49073862,  50.72484679,  52.70817822,
        43.01421884,  92.43937255,  70.28028032,  40.51593511,
        34.45853813,  22.83929454,  58.21433989,  74.31262301,
        93.80795957,   5.87029511,  53.21195362,  55.25038156,
        72.02442023,  64.24480159,  86.85869318,  44.43041239,
        69.01396061,  50.55803314,  56.43512514,  51.3159804 ,
        54.75580638,  54.56525244,  55.682404  ,  74.54055696,
        51.85127714,  54.69033416,  67.01174536,  19.75335069,
        68.69085763,  34.59123781,  54.60158215,  25.75319982,
        18.98062956,  56.99697019,  28.78845467,  39.70492392,
        49.91829085,  23.6065864 ,  65.61351123,  26.89794478,
        39.59783814,  70.30614995,  29.79739873,  44.79248053,
        34.23500871,  56.96573679,  29.35327205,  15.07479586,
        49.39795207,  37.89516878,  55.27399095,  70.5287793 ,
        34.96631774,  54.9593308 ,  55.06654355,  54.91532766,
        77.45651561, 101.26344763,  40.35671192,  47.72859174,
        84.29509919,  82.24451983,  29.43647485,  49.62186862,
        44.52559469,  19.67293782,  28.19906789,  53.3988459 ,
        76.98822075,  70.76100192,  78.58102673,  42.48809645,
        49.50163486,  60.34105469,  97.2933164 , -11.63648322,
        37.99545145,  22.63833621,  32.4570644 ,  54.51626858,
        12.8345893 ,  52.88724172,  34.79923337,  53.49351012,
        67.21403336,  44.4836404 ,  63.52270485,  61.64884962,
        58.22986623,  44.88083416,  13.91709381,  48.76284166,
        40.26437942,  86.22442121,  57.68535584,  68.62851517,
        71.3348765 ,  36.94344385,  63.38509275,  53.05445266,
        28.86990277,  44.70024564,  75.33577266,  57.81937316,
        26.07899087,  42.28069837,  45.52787923,  55.90744697,
        -3.9831072 ,  83.63294764,  59.18088379,  67.89096616])
new_input_vals.mean()
50.00000000000001
new_input_vals.std()
20.0
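If the transformation is needed in several places, the two steps could be wrapped in a small helper. The sketch below is one possible version using only the standard library; the name `rescale` is hypothetical, and it uses the population standard deviation (`statistics.pstdev`), which matches the default behaviour of NumPy's `std()`.

```python
import random
import statistics


def rescale(values, new_mean, new_std):
    """Standardize values, then map them to the requested mean and std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std, like NumPy's default
    if std == 0:
        raise ValueError("cannot rescale constant data (standard deviation is 0)")
    return [(x - mean) / std * new_std + new_mean for x in values]


# Illustrative data: 1000 draws from a normal distribution (mean 25, std 10).
random.seed(0)
data = [random.gauss(25, 10) for _ in range(1000)]

rescaled = rescale(data, new_mean=50, new_std=20)
print(round(statistics.fmean(rescaled), 6), round(statistics.pstdev(rescaled), 6))
```

The guard against a zero standard deviation is a design choice of this sketch: constant data cannot be standardized, so failing loudly seems safer than dividing by zero.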