Measures of Central Tendency: Median

The median is one of the most robust measures of the central tendency of a distribution. It is the middle value of the data once sorted, which makes it valuable when outliers skew the distribution. The median is also called the 50th percentile: it divides the data into two halves, with half of the values at or below the median and half at or above it.

Pros and Cons of the Median

As with any statistical tool, the median has distinct advantages and some disadvantages to consider.

Advantages
  • Resistance to Outliers (Robustness): Its primary strength is that it is not distorted by a small number of extremely high or low values.
  • Easy to Understand and Calculate: The concept of a “middle value” is intuitive, and its calculation is straightforward.
  • Works with Ordinal Data: Unlike the mean, the median can be used with data that is ranked but not truly numerical.
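As a small illustration of robustness, the Python sketch below compares the mean and median of a made-up list of salaries before and after one extreme value is added. It uses only the standard `statistics` module; the numbers are invented for the example.

```python
from statistics import mean, median

# Hypothetical monthly salaries (in thousands), invented for illustration
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]  # add one extreme value

print(mean(salaries), median(salaries))          # without the outlier
print(mean(with_outlier), median(with_outlier))  # the mean jumps; the median barely moves
```

Here the mean jumps from 35 to about 95.8 when the outlier is added, while the median only moves from 35 to 36.5.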
Disadvantages
  • Does Not Use All Information: The median is determined only by the middle value(s) and ignores the precise value of every other observation. This means it can be less sensitive to the full range of the data.
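  • High Sampling Variability: For data from a normal (bell-shaped) distribution, the sample median varies more from sample to sample than the sample mean; in large samples its variance is roughly π/2 ≈ 1.57 times that of the mean. As a result, the mean is generally considered a more stable and efficient estimator for symmetric distributions.
  • Not Amenable to Further Calculation: Unlike means, medians are not easily combined. You cannot compute the median of the combined data from the medians of its groups alone, whereas the combined mean follows from the group means and group sizes.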
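To see why medians do not combine, consider two pairs of made-up groups whose group medians are identical but whose combined medians differ. This Python sketch (group values invented; `combined_mean` is a hypothetical helper) also shows that the combined mean, by contrast, follows from the group means and sizes.

```python
import math
from statistics import mean, median

# Two hypothetical pairs of groups, invented for illustration.
# In both pairs, group A has median 2 and group B has median 10.
a1, b1 = [1, 2, 3], [9, 10, 11]
a2, b2 = [1, 2, 3], [3, 10, 11]

print(median(a1), median(b1), median(a2), median(b2))  # identical group medians
print(median(a1 + b1), median(a2 + b2))                # different combined medians

def combined_mean(groups):
    """Combined mean computed only from each group's mean and size."""
    total = sum(mean(g) * len(g) for g in groups)
    return total / sum(len(g) for g in groups)

# The combined mean needs no access to the raw values
assert math.isclose(combined_mean([a1, b1]), mean(a1 + b1))
```

Both pairs have group medians 2 and 10, yet the combined medians are 6 and 3, so no rule based on group medians alone can recover the combined median.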
The Median and the Shape of a Distribution

The relationship between the mean, median, and mode reveals the shape of a distribution:

  • Symmetric Distribution: In a normal or bell-shaped distribution, the mean, median, and mode are all equal or very close to each other.
  • Skewed Distribution: The mean is usually pulled toward the longer tail (this is a rule of thumb with exceptions, especially for discrete data).
    In a positively skewed distribution (tail on the right), the mean is usually greater than the median.
    In a negatively skewed distribution (tail on the left), the mean is usually less than the median.
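The rule of thumb can be checked on small made-up datasets. In this Python sketch, the right-skewed list has one large value in its right tail and the left-skewed list is its mirror image; the data are invented purely to illustrate the usual pattern, not to prove it.

```python
from statistics import mean, median

# Made-up data with a long right tail (one large value)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]
# Mirror image: a long left tail
left_skewed = [-x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean:", mean(data), "median:", median(data))
```

For the right-skewed list the mean is 4.125 and the median is 3.0; mirroring the data flips both signs, so the mean (-4.125) falls below the median (-3.0).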

In the rest of this article, we will look at methods for finding the median and work through some examples.

1. Let us find the median for an ungrouped dataset.
a. If the number of data points N is odd, the median is the ((N + 1)/2)th value after arranging the data in ascending order. For example, with 11 values, the median is the 6th value.
b. If N is even, the median is the average of the (N/2)th and (N/2 + 1)th values after arranging the data in ascending order. For example, with 10 values, the median is the average of the 5th and 6th values.
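Rules (a) and (b) translate directly into code. The Python function below is a minimal sketch (the name `median_ungrouped` is hypothetical); in practice, the standard library's `statistics.median` does the same job.

```python
def median_ungrouped(values):
    """Median of a list of numbers, following rules (a) and (b)."""
    if not values:
        raise ValueError("the median of empty data is undefined")
    data = sorted(values)      # arrange in ascending order
    n = len(data)
    mid = n // 2               # 0-based index of the upper middle value
    if n % 2 == 1:
        return data[mid]       # odd N: the ((N + 1)/2)th value
    # even N: average of the (N/2)th and (N/2 + 1)th values
    return (data[mid - 1] + data[mid]) / 2

print(median_ungrouped([5, 4, 6, 7, 3, 9, 10]))
print(median_ungrouped([9, 4, 8, 5, 7, 6, 2, 10]))
```

Called on the data of the two examples that follow, it returns 6 and 6.5.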

Example (odd N)
Find the median for the data: 5, 4, 6, 7, 3, 9, 10.
Solution:- Sort the data in ascending order: 3, 4, 5, 6, 7, 9, 10.
Find the middle value: with N = 7, the median is the (7 + 1)/2 = 4th value, which is 6.
Meaning of the answer: 6 is the 50th percentile. If these are the scores of 7 students on an assignment marked out of 10 points, then 3 students scored less than 6 and 3 students scored more than 6.

Example (even N)
Find the median for the data: 9, 4, 8, 5, 7, 6, 2, 10.
Solution:- Sort the data in ascending order: 2, 4, 5, 6, 7, 8, 9, 10.
Find the middle values: with N = 8, these are the 4th and 5th values, 6 and 7.
$$ Median = \frac{6+7}{2} = 6.5 $$
Meaning of the answer: 6.5 is the 50th percentile. If these are the scores of 8 students on an assignment marked out of 10 points, then 4 students scored less than 6.5 and 4 students scored more than 6.5.

2. Let’s find the median for discrete data (with a frequency table)
When data is summarized in a table (value x and frequency f), listing every value is impractical, so you use cumulative frequency instead.

Method: Cumulative Frequency (cf)
Formula: $$ Position = \frac{N+1}{2} $$
Where $$ N = \sum f $$
Procedure:

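  1. Calculate N and the position using the formula above.
  2. Create a cumulative frequency column.
  3. Find the value where the cumulative frequency first reaches or exceeds the position.
  4. That value is the median. If N is even and the (N/2)th and (N/2 + 1)th positions fall on different values, the median is the average of those two values.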

| Books (x) | Frequency (f): people who have read x books |
|---|---|
| 2 | 3 |
| 5 | 8 |
| 7 | 15 |
| 9 | 12 |
| 12 | 7 |
| 15 | 5 |

Find the median value for this data:
Solution:- Find the sum of all frequencies:
3 + 8 + 15 + 12 + 7 + 5 = 50
Find (N + 1)/2 = 51/2 = 25.5
Since the data has an even number of values, the average of the 25th and 26th positions will be the median.
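The cumulative frequencies are 3, 11, 26, 38, 45, 50. The cumulative frequency first reaches 25.5 at x = 7 (cf = 26), and since the cumulative frequency before it is 11, positions 12 through 26 all have the value 7. This covers both the 25th and 26th positions.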
For both positions, the number of books is 7.
Therefore, the median is (7 + 7)/2 = 7 books.
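The cumulative-frequency procedure can be sketched in Python as follows. The function name `median_from_frequency_table` is hypothetical; it assumes every frequency is a non-negative integer and sorts the (value, frequency) pairs by value first.

```python
from itertools import accumulate

def median_from_frequency_table(values, freqs):
    """Median of discrete data given as parallel lists of values and frequencies."""
    pairs = sorted(zip(values, freqs))             # values in ascending order
    xs = [x for x, _ in pairs]
    cum = list(accumulate(f for _, f in pairs))    # cumulative frequency column
    n = cum[-1] if cum else 0                      # N = sum of all frequencies
    if n == 0:
        raise ValueError("total frequency must be positive")

    def value_at(position):
        # Value at a 1-based position in the sorted data: the first value
        # whose cumulative frequency reaches or exceeds that position.
        for x, cf in zip(xs, cum):
            if cf >= position:
                return x
        raise ValueError("position out of range")

    if n % 2 == 1:
        return value_at((n + 1) // 2)                      # odd N
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2   # even N: average the two middle values

books = [2, 5, 7, 9, 12, 15]
people = [3, 8, 15, 12, 7, 5]
print(median_from_frequency_table(books, people))
```

With the books table, positions 25 and 26 both land on x = 7, so it returns 7.0.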

3. Median for grouped data
The following table shows the distribution of monthly electricity bills (in USD) for 80 households in a city.

| Bill (in $) | Number of households | Cumulative frequency |
|---|---|---|
| 20 to 40 | 6 | 6 |
| 40 to 60 | 10 | 6 + 10 = 16 |
| 60 to 80 | 20 | 16 + 20 = 36 |
| 80 to 100 | 25 | 36 + 25 = 61 |
| 100 to 120 | 12 | 61 + 12 = 73 |
| 120 to 140 | 7 | 73 + 7 = 80 |

Find the median of the data (the median monthly electricity bill per household).
Solution:- Here the cumulative frequency column is already included in the table. Usually this column is not given, and you have to compute it by adding each class's frequency to the running total, as shown in the table.
The total is N = 80,
so N/2 = 40.
40 lies between the cumulative frequencies 36 and 61, so the median class is $80 to $100.
Width of the median class = 100 - 80 = 20 (w)
Lower limit of the median class = 80 (L)
Cumulative frequency before the median class (cf_prev) = 36
Frequency of the median class (f_median) = 25
$$ Median = L + \left(\frac{\frac{N}{2} - cf_{prev}}{f_{median}}\right) \times w $$
$$ Median = 80 + \left(\frac{\frac{80}{2} - 36}{25}\right) \times 20 $$
$$ Median = 80 + \left(\frac{40 - 36}{25}\right) \times 20 $$
$$ Median = 80 + \left(\frac{4}{25}\right) \times 20 $$
$$ Median = 80 + \frac{80}{25} $$
$$ Median = 80 + 3.2 = \$83.2 $$

Interpretation: About 50% of households have a monthly electricity bill less than or equal to $83.20, and about 50% have a bill greater than or equal to $83.20. This is an estimate, since the formula interpolates within the median class.

Important note: Even if the class widths are not all equal, you can still find the median with this formula. It assumes that the values are spread evenly within the median class, and it needs only the width of the median class.

4. Median by Graphical Method (Ogive / Cumulative Frequency Curve)

This method finds the median visually without a formula.

Procedure:

  1. Plot a less-than ogive (upper limits vs. cumulative frequency).
  2. Locate N/2 on the y-axis.
  3. Draw a horizontal line from N/2 to the curve.
  4. From that intersection, draw a vertical line down to the x-axis.
  5. The point where it hits the x-axis is the median.

Take the household electricity bill example.
The (x, y) points for the less-than ogive are as follows:

| Bill (in $): upper limit | Cumulative frequency (c.f.) |
|---|---|
| 40 | 6 |
| 60 | 16 |
| 80 | 36 |
| 100 | 61 |
| 120 | 73 |
| 140 | 80 |

Similarly, the points for the more-than ogive are as follows:

| Bill (in $): lower limit | Reverse cumulative frequency (c.f.) |
|---|---|
| 20 | 80 |
| 40 | 80 - 6 = 74 |
| 60 | 74 - 10 = 64 |
| 80 | 64 - 20 = 44 |
| 100 | 44 - 25 = 19 |
| 120 | 19 - 12 = 7 |

The procedure for reading the median from the more-than ogive is the same (locate N/2 = 40 on the y-axis, move across to the curve, and drop down to the x-axis), and it gives the same answer.
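If the ogive points are joined with straight line segments, reading off the median is just linear interpolation. The Python sketch below does this for both ogives of the electricity-bill example. Two assumptions are worth flagging: the helper `read_ogive` is hypothetical, and the points (20, 0) for the less-than ogive and (140, 0) for the more-than ogive are added as the usual start and end points of each curve; they are not in the tables.

```python
def read_ogive(xs, ys, target):
    """Return the x at which a straight-line ogive through the points (xs, ys)
    reaches y = target. Works for rising (less-than) and falling (more-than) curves."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 != y1 and min(y0, y1) <= target <= max(y0, y1):
            # Linear interpolation along this segment
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("the ogive never reaches the target value")

n = 80
# Less-than ogive; (20, 0) added as the starting point (assumption)
less_x = [20, 40, 60, 80, 100, 120, 140]
less_y = [0, 6, 16, 36, 61, 73, 80]
# More-than ogive; (140, 0) added as the end point (assumption)
more_x = [20, 40, 60, 80, 100, 120, 140]
more_y = [80, 74, 64, 44, 19, 7, 0]

print(read_ogive(less_x, less_y, n / 2))
print(read_ogive(more_x, more_y, n / 2))
```

Both calls give 83.2 (up to floating-point rounding), matching the formula result. Because each more-than value equals N minus the less-than value at the same bill amount, the two curves cross exactly where each equals N/2, which is why the intersection method agrees as well.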

Summary of Graphical Methods to Find the Median
| Method | Type of curve | How it works |
|---|---|---|
| 1. Less-than ogive | Cumulative frequency (upper limits) | Find N/2 on the y-axis → drop down |
| 2. More-than ogive | Reverse cumulative frequency (lower limits) | Find N/2 on the y-axis → drop down (gives the same point) |
| 3. Both ogives together | Two curves on the same graph | Intersection point of both curves → drop down |
| 4. Frequency polygon | Line connecting class midpoints | Find the vertical line that splits the area under the curve into two equal halves |
| 5. Histogram | Bar chart | Draw a vertical line that splits the total bar area equally |
| 6. Less-than percentage ogive | Cumulative percentage (0-100%) | Find 50% on the y-axis → drop down |

5. Finding the median of a continuous distribution by integration

For a continuous random variable with probability density function f(x), the median m is the value that splits the total probability in half:
$$ \int_{-{\infty}}^{m}{f(x)} dx = 0.5 $$
For example,
$$ Let \space f(x) = 2x \space where \space 0 \leq x \leq 1 $$
This is a valid density, because its total area is 1:
$$ \int_{0}^{1}{2x} \, dx = 1 $$
$$ Hence, \space \int_{0}^{m}{f(x)} dx = 0.5 $$
$$ \int_{0}^{m}{2x} \, dx = m^2 = 0.5 $$
$$ m = \sqrt{0.5} \approx 0.7071 $$

The median stands as one of the most robust and intuitive measures of central tendency, offering distinct advantages over the mean when dealing with skewed distributions, outliers, or ordinal data. Its ability to represent the center of a dataset without being pulled by extreme values makes it a standard choice in fields ranging from income analysis and real estate pricing to medical research and quality control.

Throughout this article, we have explored:

  • When and why the median is preferred over the mean and mode.
  • Calculation methods for both discrete data (using positional formulas and cumulative frequency) and grouped continuous data (using the standard interpolation formula, with or without equal class widths).
  • The theoretical definition of the median via integration for continuous probability distributions.
  • Multiple graphical approaches, including the less-than ogive, the more-than ogive, their intersection, histogram area splitting, and the percentage ogive, all of which visually confirm the median’s value without requiring algebraic formulas.

A key takeaway is that while integration gives the exact theoretical median for a known probability density function, real-world grouped data relies on the linear interpolation formula, which assumes that values are spread uniformly within the median class. This approximation is usually close to the true median when the class intervals are reasonably narrow. The formula also works when class widths are unequal, as long as the actual width of the median class is used.

That’s it for this article on the median. Until next time, stay curious and keep learning!


Discover more from universeunlocks.in
