# Measure of central tendency and dispersion

Let’s say you have a dataset in which one column is numeric and contains 1000 data points (rows). Going through each and every data point is hard and time-consuming. To overcome this problem, we use descriptive statistics, which describe the data and make the task much simpler. We also use visualizations such as histograms and box plots to understand the distribution of the data.

# What are summary statistics?

Rather than reading through all 1000 rows, a summary statistic is a single number that gives an idea of the whole dataset.

There are basically two types of summarizing techniques:

1) Measure of central tendency.

• Mean
• Median
• Mode

2) Measure of dispersion.

• Range
• Variance
• Standard deviation
• IQR (Interquartile range)

# Measure of central tendency

1. Mean

• Sum of all the values divided by the number of values.

2. Median

• Sort the data and take the middle value; that is the median. With an even number of data points, take the average of the two middle values.

3. Mode

• Most frequent value in the data.

# Let’s take an example: calculating mean, median and mode
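
The original code for this step is not available, so the following is only a minimal sketch using Python's standard `statistics` module. The list `x` and the variable names are assumptions made for illustration, not the article's original data.

```python
import statistics

# Hypothetical sample data; the article's original values for x are not shown.
x = [1, 2, 3, 3, 3, 4, 5]

mean_x = statistics.mean(x)      # sum of the values divided by their count
median_x = statistics.median(x)  # middle value of the sorted data
mode_x = statistics.mode(x)      # most frequent value

print("mean:", mean_x, "median:", median_x, "mode:", mode_x)
```

For this small, symmetric sample all three measures land on the same value.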

Now let’s experiment: we will append extreme values to x and see how the mean, median and mode change.

• Appending a big positive integer (effect of the outlier on mean, median and mode)
• Appending a big negative integer (effect of the outlier on mean, median and mode)

If we append a big positive integer, the mean is pulled towards it, and if we append a big negative integer, the mean is pulled towards that value instead. Notice that the median and mode remained unchanged.
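
The experiment can be sketched as follows. The data, the outlier values `1000` and `-1000`, and the helper `summarize` are all hypothetical choices for illustration.

```python
import statistics

def summarize(values):
    """Return (mean, median, mode) for a list of numbers (hypothetical helper)."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

# Hypothetical sample data, not the article's original x.
x = [1, 2, 3, 3, 3, 4, 5]

with_high = x + [1000]   # append a big positive integer
with_low = x + [-1000]   # append a big negative integer

base = summarize(x)
high = summarize(with_high)
low = summarize(with_low)

print("original:      ", base)
print("with +1000:    ", high)
print("with -1000:    ", low)
```

In this sample a single extreme value moves the mean by more than a hundred units, while the median and mode stay at 3. With other data the median can shift slightly, but far less than the mean.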

# Effect of skewness on mean, median and mode.

• Normal distribution [figure: generated using Python by the author]
• Right skewed [figure: generated using Python by the author]
• Left skewed [figure: generated using Python by the author]

[Figure: conclusion based on observations from the above graphs]
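
The original plots are not reproduced here. The sketch below generates three samples with Python's `random` module to illustrate the commonly cited pattern: for a symmetric distribution the mean and median roughly coincide, for a right-skewed one the mean sits above the median, and for a left-skewed one it sits below. The sample size, seed and choice of distributions are assumptions. The mode is left out because continuous samples rarely contain repeated values.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000               # hypothetical sample size

# Symmetric: standard normal. Right skewed: exponential. Left skewed: its mirror image.
symmetric = [rng.gauss(0, 1) for _ in range(n)]
right_skewed = [rng.expovariate(1.0) for _ in range(n)]
left_skewed = [-v for v in right_skewed]

results = {}
for name, data in [
    ("symmetric", symmetric),
    ("right skewed", right_skewed),
    ("left skewed", left_skewed),
]:
    results[name] = (statistics.mean(data), statistics.median(data))
    mean_v, median_v = results[name]
    print(f"{name:>12}: mean={mean_v:.3f} median={median_v:.3f}")
```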

# Measure of dispersion

[Figure: comparing two distributions, X1 and X2, which we will use to draw conclusions]

• Both datasets, X1 and X2, have the same mean, median and mode; however, when we visualize the data, their distributions differ.
• X2, shown in red, is more spread out than X1, shown in green.

So, for a complete understanding of the data, the measure of central tendency alone is not enough. To understand the data more precisely, we also need to calculate a measure of spread, that is, a measure of dispersion.
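
To make this concrete, here is a small sketch with two made-up lists, `x1` and `x2`, chosen so that their mean, median and mode are identical even though `x2` is visibly more spread out. These values are assumptions, not the data behind the original plot.

```python
import statistics

# Hypothetical data: same centre, different spread.
x1 = [3, 4, 5, 5, 5, 6, 7]
x2 = [1, 2, 5, 5, 5, 8, 9]

central_x1 = (statistics.mean(x1), statistics.median(x1), statistics.mode(x1))
central_x2 = (statistics.mean(x2), statistics.median(x2), statistics.mode(x2))

print("x1 (mean, median, mode):", central_x1)
print("x2 (mean, median, mode):", central_x2)
```

Both lines print the same triple, so these three numbers alone cannot tell the two datasets apart.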

# Range
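
The range is the difference between the largest and smallest values. A minimal sketch, assuming a hypothetical list `x`:

```python
# Hypothetical sample data, not the article's original x.
x = [1, 2, 3, 3, 3, 4, 5]

# Range = maximum value - minimum value
data_range = max(x) - min(x)

print("range:", data_range)
```

Because it depends only on the two extreme values, the range is very sensitive to outliers.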

# Variance
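
The variance is the average squared deviation from the mean. The sketch below computes the population variance by hand (dividing by n) and compares it with `statistics.pvariance`. It also shows `statistics.variance`, which divides by n - 1 and is meant for samples. The list `x` is hypothetical, and which of the two variants the original article used is not known.

```python
import statistics

# Hypothetical sample data, not the article's original x.
x = [1, 2, 3, 3, 3, 4, 5]

mean_x = statistics.mean(x)

# Population variance by hand: mean of squared deviations from the mean.
manual_pvar = sum((v - mean_x) ** 2 for v in x) / len(x)

population_var = statistics.pvariance(x)  # divides by n
sample_var = statistics.variance(x)       # divides by n - 1

print("manual population variance:", manual_pvar)
print("statistics.pvariance:      ", population_var)
print("statistics.variance:       ", sample_var)
```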

# Standard deviation
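
The standard deviation is the square root of the variance, which brings the measure back to the same units as the data. A minimal sketch on a hypothetical list `x`:

```python
import math
import statistics

# Hypothetical sample data, not the article's original x.
x = [1, 2, 3, 3, 3, 4, 5]

population_std = statistics.pstdev(x)  # square root of the population variance
sample_std = statistics.stdev(x)       # square root of the sample variance

# Cross-check against the definition.
std_from_variance = math.sqrt(statistics.pvariance(x))

print("population std:", population_std)
print("sample std:    ", sample_std)
```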

# Interquartile Range
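
The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1), so it describes the spread of the middle 50% of the data. The sketch below uses `statistics.quantiles` (Python 3.8+) with `method="inclusive"`. Quartile conventions differ between tools, so the choice of method here is an assumption, as is the list `x`.

```python
import statistics

# Hypothetical sample data, not the article's original x.
x = [1, 2, 3, 3, 3, 4, 5]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")

iqr = q3 - q1

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Unlike the range, the IQR ignores the extreme values at both ends, which makes it more robust to outliers.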

# Concluding on x1 and x2
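
The comparison can be sketched by computing every measure of dispersion for both datasets. The lists `x1` and `x2` and the helper `dispersion` are hypothetical; they stand in for the article's original data, which is not shown.

```python
import statistics

def dispersion(values):
    """Return the four dispersion measures for a list of numbers (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std": statistics.pstdev(values),
        "iqr": q3 - q1,
    }

# Hypothetical data with identical mean, median and mode (all equal to 5).
x1 = [3, 4, 5, 5, 5, 6, 7]
x2 = [1, 2, 5, 5, 5, 8, 9]

d1 = dispersion(x1)
d2 = dispersion(x2)

for key in d1:
    print(f"{key:>8}: x1={d1[key]:.3f}  x2={d2[key]:.3f}")
```

Every measure comes out larger for `x2`, which matches the visual impression that it is more spread out.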

# Conclusion:

• Although X1 and X2 have the same mean, median and mode, they differ in their measures of dispersion.