Blog

Transforming Data to New Mean and Variance

Overview

Sometimes we want to transform a dataset so that it has a new mean or variance. For instance, given normally distributed data, we might want to shift the mean and rescale the variance to correct an error introduced during data collection. These operations are very common in data wrangling.

How do you do it?

It's quite simple really. Let us understand the math behind it.

Step 1: Standardize your dataset

To standardize your dataset, first apply the following transformation:

\[ z = \frac{x - \mu}{\sigma} \]

Where \(x\) is the input, \(\mu\) is the mean, \(\sigma\) is the standard deviation, and \(\sigma^2\) is the variance. This is the same transformation that scikit-learn's StandardScaler applies in Python.
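As a small worked example of this step, the sketch below standardizes a hand-picked array (illustrative values, not data from this post) whose mean is 5 and whose population standard deviation is 2. Note that NumPy's `std()` uses the population formula (`ddof=0`) by default, which is also what StandardScaler uses.

```python
import numpy as np

# Illustrative values chosen so that the mean is 5 and the population std is 2
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()      # 5.0
sigma = x.std()    # 2.0 (population standard deviation, ddof=0)

# z-scores: how many standard deviations each value lies from the mean
z = (x - mu) / sigma
print(z)           # values: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
```

After this step the z-scores have mean 0 and standard deviation 1, whatever the original scale was.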

Step 2: Calculating new inputs

To calculate the new inputs we simply need to use the following formula:

\[ x' = z \cdot \sigma' + \mu' \]

Where \(x'\) is the new input value, \(\sigma'\) is the new standard deviation, and \(\mu'\) is the new mean.
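To see why this gives the desired mean and variance, a short derivation may help. It relies only on the linearity of the mean and the scaling rule for variance, together with the fact that \(z\) has mean 0 and variance 1 by construction:

\[ \mathrm{E}[x'] = \sigma' \, \mathrm{E}[z] + \mu' = \sigma' \cdot 0 + \mu' = \mu' \]

\[ \mathrm{Var}[x'] = {\sigma'}^2 \, \mathrm{Var}[z] = {\sigma'}^2 \cdot 1 = {\sigma'}^2 \]

Substituting the definition of \(z\) also gives both steps in a single formula:

\[ x' = \mu' + \frac{\sigma'}{\sigma} (x - \mu) \]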

Python Code

import numpy as np

# Sample 1000 values from a normal distribution with mean 25 and std 10
input_vals = np.random.normal(loc=25, scale=10, size=1000)

# Step 1: standardize using the sample's own mean and standard deviation
mean = input_vals.mean()
std = input_vals.std()
transformed_vals = (input_vals - mean) / std

# Step 2: rescale to the new standard deviation, then shift to the new mean
new_mean = 50
new_std = 20
new_input_vals = transformed_vals * new_std + new_mean
new_input_vals
array([ 49.49601559,  81.95221737,  44.10701235,  -4.43684172,
        ...,
        -3.9831072 ,  83.63294764,  59.18088379,  67.89096616])
new_input_vals.mean()
50.00000000000001
new_input_vals.std()
20.0
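The two steps can also be wrapped in a small reusable function. The sketch below is one possible way to do it; `rescale` is a hypothetical helper name (not part of NumPy or any other library), and it assumes the population standard deviation (`ddof=0`) is the one you want to match.

```python
import numpy as np

def rescale(values, new_mean, new_std):
    """Shift and scale `values` to the given mean and standard deviation.

    Hypothetical helper for illustration. Uses the population standard
    deviation (ddof=0), which is NumPy's default.
    """
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # All values are identical, so z-scores are undefined
        raise ValueError("cannot rescale data with zero standard deviation")
    z = (values - values.mean()) / std   # Step 1: standardize
    return z * new_std + new_mean        # Step 2: rescale and shift

rng = np.random.default_rng(seed=0)      # seeded so the run is repeatable
data = rng.normal(loc=25, scale=10, size=1000)
rescaled = rescale(data, new_mean=50, new_std=20)
print(rescaled.mean(), rescaled.std())   # approximately 50 and 20
```

The zero-variance check is a design choice: constant data cannot be standardized, so raising an error seems safer than silently returning NaN values.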


Why go for RAG?

Overview

In this project we choose a foundational model, e.g. GPT or BERT, and create an API that makes it easy to interact with the LLM.

Foundational model

We want a foundational model that can converse in a medical context. The models considered here are:

  • Medical Llama-8b: optimized to address health-related inquiries and trained on a comprehensive medical chatbot dataset (Apache License 2.0). The foundational model used is Meta-Llama-3-8b.
  • Llama3-OpenBioLLM-8B: fine-tuned on a corpus of high-quality biomedical data, with 8 billion parameters. Incorporates the DPO dataset.

Approaches

To create a chatbot we have two approaches:

  • Fine-tune an existing foundational model on a medical dataset
  • Create a retrieval-augmented generation (RAG) framework, which retrieves facts from an external knowledge base

Comparisons

Fine-tuning existing foundational models on a medical dataset

  • Incorporates the additional knowledge into the model itself
  • Offers precise, succinct output
  • High initial cost
  • Minimal input size

Retrieval Augmented Generation

  • Augments the prompt with external data
  • Provides additional context during question answering
  • Possible collisions among similar snippets during the retrieval process
  • Larger input size due to the inclusion of context information; output tends to be more verbose and harder to steer
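To make the RAG bullets concrete, here is a toy sketch of the retrieve-then-augment flow using only the Python standard library. Everything in it is an assumption for illustration: the three snippets are placeholder documents, `embed` is a bag-of-words stand-in for a real embedding model, and no LLM is actually called. It only shows how retrieved context ends up in the prompt.

```python
import math
import re
from collections import Counter

# Toy knowledge base; these snippets are illustrative placeholders
DOCUMENTS = [
    "Paracetamol is commonly used to reduce fever and relieve mild pain.",
    "Insulin is a hormone that regulates blood sugar levels.",
    "Regular exercise can help lower blood pressure.",
]

def embed(text):
    """Stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, k=1):
    """Return the k snippets most similar to the question."""
    q = embed(question)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, k=1):
    """Augment the question with retrieved context before sending it to an LLM."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Which hormone regulates blood sugar?")
print(prompt)
```

Raising `k` pulls in more snippets, which illustrates the trade-off listed above: more context for the model, but a larger input and a higher chance that similar snippets collide.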

Experiment Conclusion

In the referenced experiment, GPT learned 47% of the new knowledge with fine-tuning; with RAG this number goes up to 72% and 74%.

Preferred approach

What do we want?

  • A fast deployment option

Choice of Approach

RAG makes it easy to create embeddings and allows for fast deployment.

Architecture

[Figure: Architecture]

References

  • https://arxiv.org/pdf/2401.08406

LangChain

Overview

LangChain consists of three main parts:

  • Components:
      • LLM wrappers
      • Prompt templates
      • Indexes for information retrieval
  • Chains: assemble components to solve a specific task
  • Agents: allow LLMs to interact with their environment
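To illustrate how these pieces fit together, here is a plain-Python sketch of the idea. The class names `SimplePromptTemplate`, `FakeLLM`, and `SimpleChain` are made up for this example and are not LangChain's actual API; a real LLM wrapper would call a model instead of echoing the prompt.

```python
class SimplePromptTemplate:
    """Hypothetical prompt template: text with named placeholders."""

    def __init__(self, template):
        self.template = template

    def format(self, **kwargs):
        return self.template.format(**kwargs)


class FakeLLM:
    """Hypothetical LLM wrapper; echoes the prompt instead of calling a model."""

    def __call__(self, prompt):
        return f"[model reply to: {prompt}]"


class SimpleChain:
    """Assembles a prompt template and an LLM wrapper to solve one task."""

    def __init__(self, prompt, llm):
        self.prompt = prompt
        self.llm = llm

    def run(self, **kwargs):
        # Fill in the template, then pass the finished prompt to the LLM
        return self.llm(self.prompt.format(**kwargs))


chain = SimpleChain(SimplePromptTemplate("Explain {topic} in one sentence."), FakeLLM())
result = chain.run(topic="standardization")
print(result)
```

The point of the chain is that callers only supply the variable parts (here `topic`), while the template and the model wrapper can be swapped independently.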

Installation

  • Use PyCharm as your IDE, since it makes things easier and is user friendly
  • Create a new project in PyCharm, which looks as follows:

[Figure: Create Project window]

References

  1. YouTube