We are considering adding an Outliers transform and would like some feedback.
This would allow you to look for outlier values in a selected column. It would only work for numeric values.
A value would be identified as an outlier if it was above or below a threshold, where the threshold is:
- N times the Inter Quartile Range above or below the column mean; or
- N times the Standard Deviation above or below the column mean
The user chooses Inter Quartile Range or Standard Deviation and the value of N.
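To make the rule concrete, here is a rough Python sketch of the threshold calculation described above. It is only an illustration, not how the transform would be implemented. The function names `outlier_thresholds` and `find_outliers` are made up for this example, and two details are assumptions because they are not specified above: it uses the sample standard deviation, and it takes quartiles from Python's `statistics.quantiles` with its default method.

```python
import statistics


def outlier_thresholds(values, n, method="stdev"):
    """Return (low, high) thresholds placed n spreads either side of the mean."""
    mean = statistics.fmean(values)
    if method == "stdev":
        # Assumption: sample standard deviation (not population).
        spread = statistics.stdev(values)
    elif method == "iqr":
        # Assumption: quartiles computed with the library's default method.
        q1, _median, q3 = statistics.quantiles(values, n=4)
        spread = q3 - q1
    else:
        raise ValueError(f"unknown method: {method!r}")
    return mean - n * spread, mean + n * spread


def find_outliers(values, n, method="stdev"):
    """Return the values lying strictly outside the thresholds."""
    low, high = outlier_thresholds(values, n, method)
    return [v for v in values if v < low or v > high]


column = [10, 12, 11, 13, 12, 11, 100]
print(find_outliers(column, n=2, method="stdev"))
```

Both options measure the distance from the column mean, as proposed; only the unit of spread changes.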
The user would also choose what to do with outliers, selecting one of:
- remove outlier rows
- keep only outlier rows
- change outlier values to column mean
- change outlier values to the threshold
- change outlier values to a value provided by the user
- add a new column marking outliers
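As a rough illustration of the six options above, here is a small Python sketch that applies each one to a list of rows, given thresholds that have already been computed. The function name `apply_outlier_action`, the action names, and the `is_outlier` column name are all hypothetical. Two behaviours are assumptions: the column mean is taken over all values (outliers included), and "change to the threshold" is read as moving each outlier to the nearer of the two thresholds.

```python
from statistics import fmean


def apply_outlier_action(rows, column, low, high, action, replacement=None):
    """Return a new list of rows (dicts) with the chosen action applied."""

    def is_outlier(value):
        return value < low or value > high

    if action == "remove":
        return [r for r in rows if not is_outlier(r[column])]
    if action == "keep":
        return [r for r in rows if is_outlier(r[column])]
    if action == "mark":
        # Hypothetical name for the new marker column.
        return [{**r, "is_outlier": is_outlier(r[column])} for r in rows]

    # Assumption: the mean includes the outliers themselves.
    mean = fmean([r[column] for r in rows])

    def new_value(value):
        if not is_outlier(value):
            return value
        if action == "mean":
            return mean
        if action == "threshold":
            # Assumption: move the value to the nearer threshold.
            return low if value < low else high
        if action == "value":
            return replacement
        raise ValueError(f"unknown action: {action!r}")

    return [{**r, column: new_value(r[column])} for r in rows]


rows = [{"x": 1}, {"x": 2}, {"x": 50}]
print(apply_outlier_action(rows, "x", low=0, high=10, action="threshold"))
```

Whether the mean should include or exclude the outliers it replaces is one of the things it would be useful to get feedback on.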
Would this be useful to anyone? How do you handle outliers currently? Is this something you deal with much, @DanFeliciano?