Outliers Explained - By Saurabh

 




What are Outliers?

Outliers are samples that differ significantly from the rest of the data. They are points that lie outside the overall pattern of the distribution. Statistical measures such as the mean, variance, and correlation are very sensitive to outliers.
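To make the sensitivity claim concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration; the point is how much the mean and variance move when one extreme value is added, while the median barely changes.

```python
import statistics

# Hypothetical sample values, chosen only for illustration
data = [10, 12, 11, 13, 12, 11, 10, 12]
with_outlier = data + [100]  # same sample plus one extreme value

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

var_before = statistics.pvariance(data)
var_after = statistics.pvariance(with_outlier)

print("mean:    ", mean_before, "->", mean_after)
print("median:  ", median_before, "->", median_after)
print("variance:", var_before, "->", var_after)
```

In this sketch a single extreme value pulls the mean and inflates the variance, whereas the median shifts only slightly.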

A simple example of an outlier is a single point that deviates from the overall pattern of the rest of the data.


Outliers can occur in a dataset for one of the following reasons:
1. Genuine extreme high or low values in the dataset.
2. Values introduced by human or mechanical error.
3. Values introduced by replacing missing values.


In some cases, the presence of outliers is informative and requires further study. For example, outliers are important in use cases related to transaction management, where an outlier might be used to identify potentially fraudulent transactions.



How to Detect Outliers?

1. Extreme value analysis with a box plot
2. K-means clustering-based approach
3. Visualizing the data
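As an illustration of the first method, the sketch below applies the rule that a box plot conventionally uses for its whiskers: anything below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is flagged. The function name `iqr_outliers`, the multiplier `k=1.5`, and the sample values are assumptions made for this example, not something prescribed by the article.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional box plot whisker multiplier (an assumption here).
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample values, chosen only for illustration
sample = [10, 12, 11, 13, 12, 11, 10, 12, 100]
flagged = iqr_outliers(sample)
print(flagged)
```

A larger `k` makes the rule more tolerant of extreme values; which setting is appropriate depends on the data and the problem at hand.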


How to Treat Outliers?

1. Mean/median or random imputation
2. Trimming
3. Discretization
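The three treatments listed above can be sketched roughly as follows. Everything here is an illustrative assumption: the helper names, the IQR-based bounds used to decide what counts as an outlier, the choice to impute with the median of the remaining values, and the bin edges.

```python
import bisect
import statistics

def iqr_bounds(values, k=1.5):
    """Lower and upper fences from the box plot rule (assumed detection rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def impute_median(values, lower, upper):
    """Imputation: replace out-of-bounds values with the median of the rest."""
    inliers = [v for v in values if lower <= v <= upper]
    med = statistics.median(inliers)
    return [v if lower <= v <= upper else med for v in values]

def trim(values, lower, upper):
    """Trimming: drop out-of-bounds values entirely."""
    return [v for v in values if lower <= v <= upper]

def discretize(values, edges):
    """Discretization: map each value to a bin index, so extreme values
    simply fall into the first or last bin."""
    return [bisect.bisect_right(edges, v) for v in values]

# Hypothetical sample values, chosen only for illustration
sample = [10, 12, 11, 13, 12, 11, 10, 12, 100]
lower, upper = iqr_bounds(sample)

imputed = impute_median(sample, lower, upper)
trimmed = trim(sample, lower, upper)
binned = discretize(sample, edges=[11, 13])  # bins: <11, 11 to <13, >=13

print(imputed)
print(trimmed)
print(binned)
```

Trimming shrinks the dataset, imputation keeps its size but alters values, and discretization keeps every observation while limiting how much an extreme value can influence later analysis.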

However, none of these methods delivers an objective truth about which observations are outliers. There is no rigid mathematical definition of what constitutes an outlier; deciding whether an observation is an outlier is ultimately a subjective exercise that depends heavily on the business problem. The methods discussed in this article are therefore a starting point for identifying points in your data that may need to be treated as outliers.

Real-life Examples of Outliers

Outliers are often treated as a bad component of the data, but that is not always the case. Suppose you are handling a dataset of cricketers on the Indian cricket team. In the total-runs variable, Sachin Tendulkar and Virat Kohli would be flagged as outliers, since their total runs scored are much higher than those of the other cricketers. This does not mean that the data entered for them is wrong; they are simply that much better than the other cricketers.





THANKS FOR READING!!!


