MARCH 2021 - DISTRIBUTION FITTING
Hi! Today's newsletter contains two items. First, the Topic of Today discusses distribution fitting: why it is important and what often goes wrong in practice. Second, the API Update explains the reasoning behind developing this project API first. This format can easily be expanded over time.
TOPIC OF TODAY
Most readers are probably familiar with some statistics; perhaps you did some number crunching in high school or at university. The probability density function (PDF) is probably the best-known concept in statistical computations. Unfortunately, it is often applied and implemented incorrectly. Two very common mistakes when fitting a distribution are:
- Automatically assuming a variable is normally distributed
- Fitting a distribution to only a few measurements without checking whether it actually fits
You can compare this to going to a shopping centre for swimming trunks and blindly picking the first pair of shorts you see. There can only be one logical consequence: an uncomfortable holiday. Therefore, always ask yourself two questions:
- Am I actually looking at swimming trunks?
- Do the swimming trunks actually fit me properly?
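When fitting a distribution, the first question can be answered by comparing how well different distribution types fit the data. This can be assessed with Quantile-Quantile (Q-Q) plots, which plot the quantiles of the data against the quantiles of a candidate distribution: the closer the points lie to a straight line, the better the fit. The figure below shows that, for the dataset considered here, the Normal distribution fits best.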
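The comparison of distribution types can be sketched in a few lines. This is a minimal illustration using only Python's standard library, not the API's own implementation: it computes Q-Q points with the plotting position (i - 0.5)/n (an assumption; other conventions exist) and summarises how straight each Q-Q plot is with the Pearson correlation coefficient of the points (the probability plot correlation coefficient). The helper names, the small `Uniform` class and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


class Uniform:
    """Hypothetical minimal uniform distribution on [low, high]; only inv_cdf is needed."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def inv_cdf(self, p):
        return self.low + p * (self.high - self.low)


def qq_points(data, dist):
    """Return (theoretical, observed) quantile pairs for a Q-Q plot."""
    observed = sorted(data)
    n = len(observed)
    # Plotting position (i - 0.5) / n; other conventions exist.
    theoretical = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def qq_straightness(data, dist):
    """Correlation of the Q-Q points: the closer to 1, the straighter the plot."""
    return pearson(*qq_points(data, dist))


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.0, 9.3, 10.6, 10.2, 9.9]
candidates = {
    "Normal": NormalDist(fmean(sample), stdev(sample)),
    "Uniform": Uniform(min(sample), max(sample)),
}
for name, dist in candidates.items():
    print(f"{name}: Q-Q correlation = {qq_straightness(sample, dist):.3f}")
```

Because the correlation is unaffected by shifting or scaling the theoretical quantiles, this number compares the *shape* of each candidate distribution with the data; the location and scale parameters are only needed afterwards, for the goodness-of-fit test.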
After determining the most suitable distribution type, you check whether the selected distribution actually fits the data. This is done with a goodness-of-fit test; the API employs the Kolmogorov-Smirnov test. This test takes the largest deviation between the cumulative distribution of the data and that of the fitted distribution, and compares it with a maximum allowed deviation, which follows from a fixed significance level and the number of data points. If the largest deviation is sufficiently small, the test is successful. Based on a successful test, you can be reasonably sure that the distribution fits the data.
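The test described above can be sketched as follows. This is a minimal standard-library illustration, not the API's implementation: it assumes a significance level of 0.05 and uses the large-sample approximation c(alpha)/sqrt(n) for the critical value, with c(0.05) ≈ 1.358. The helper names and sample values are hypothetical. Note that when the distribution's parameters are estimated from the same data, these standard critical values make the test conservative (it rejects less often than the nominal significance level); the Lilliefors variant corrects for this in the normal case.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def ks_statistic(data, dist):
    """Largest absolute gap D between the empirical CDF and dist.cdf."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, value in enumerate(x, start=1):
        f = dist.cdf(value)
        # The empirical CDF jumps from (i - 1)/n to i/n at each data point,
        # so check the gap on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_value(n, c_alpha=1.358):
    """Approximate maximum allowed deviation c(alpha)/sqrt(n).

    c_alpha = 1.358 corresponds to alpha = 0.05 (assumed); the
    approximation is intended for larger samples.
    """
    return c_alpha / sqrt(n)


def ks_test(data, dist, c_alpha=1.358):
    """Return (D, critical value, passed) for a one-sample K-S test."""
    d = ks_statistic(data, dist)
    critical = ks_critical_value(len(data), c_alpha)
    return d, critical, d <= critical


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.0, 9.3, 10.6, 10.2, 9.9]
fitted = NormalDist(fmean(sample), stdev(sample))
d, critical, passed = ks_test(sample, fitted)
print(f"D = {d:.3f}, critical value = {critical:.3f}, passed = {passed}")
```

With only a handful of measurements the critical value is large, so almost any reasonable distribution will pass; this is one reason why fitting to few measurements, as mentioned above, is risky.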
API UPDATE
The entire project was developed API first, meaning it was designed from an automation point of view. This allows statistical analyses to be served quickly, eases integration into your own workflow and simplifies usage. Although API-first development is common in many fields, engineering lags behind: many engineering automation innovations are built around graphical user interfaces. While a graphical interface can be an advantage, it often limits how a tool can be applied. Application programming interfaces (APIs) are a common way to execute small, self-contained pieces of functionality from other programs. This helps not only to automate but also to improve your calculation processes.
FOOTNOTE
Please note that I run this service alongside my job at Temporary Works Design. My ambition is to continuously improve this project and to publish newsletters on its latest innovations. In busy times they may appear less often, in quiet times more often. Any ideas? Let me know!