So if you’re watching this video you

probably know how helpful pivot tables are for aggregating data and grouping

data by different categories. In this case part numbers. And you probably also

realize that sometimes you run into outliers and these can actually skew the

mean value. So what I’m gonna do in this video is kind of explain some different

ways that we can filter out those outliers in order to give more accurate

averages! 🙂 Hello everyone! And welcome to The Engineering Toolbox Channel, where I

give you the tools you need to solve real-world engineering problems. So what

I’ve done is I’ve created an example where I’m looking at production data for

given part numbers and we’re trying to find the mean value for how long it

takes to actually make a certain part so if I look at the raw data we can see

that we have multiple jobs where we created a certain quantity of a part and

we can see how long it took and then we just divide that out to see how long per

part it took for that particular run and then we can make a pivot table from that

to find the mean and just as an example and I’ll go through the step-by-step

afterwards but we can see that the outliers if I take out the outliers so I

just removed the outlier values our average for this particular part number

drops down to 0.125 hours per part which is this green line here and that’s a

more accurate representation of the average amount of time that it takes to

complete a part versus if I add this one outlier back in the value or the mean

value jumps back up to point 137 which is actually over here and you can see

just visually when we look at the density distribution that that mean

value is actually incorrect because this outlier is skewing the the average quick

side note about outliers I’m not advocating or saying that all Liars

should always be removed all lawyers often tell us a lot about the data we’re

analyzing but in some cases it’s necessary and actually more accurate to

remove them especially in cases like this where extreme values are more

likely due to human error than not method we’re going to use for finding

our outliers we’ll be using different quantiles and the interquartile range so

if you’ve ever made like a box and whisker plot you’re probably familiar

with this but in order to do it we kind of have to trick Excel a little bit or

use some special functions so let’s just go ahead and add some columns

here for a different quartiles so we’re gonna add q1 q2 and q3 and Excel has

some functions built-in called quantile and we’ll use inclusive and this is

going to be an array function so if you’re not familiar with that I’ll try

to break this down and we’re gonna say if our part range so all of our part

values equals the specific part number then we will feed the hours per part

value back into the quartile function and we don’t need a value there and then

we need to tell it what quartile we’re gonna be solving so we have to use ctrl

+ shift to end hit enter so hold ctrl + shift hit enter for an array function

and that should be our quartile 1 value and then we can just take this and copy

it over to here and change this to quartile 2 by the way that’s the same as

the median value and then for a quartile 3 would be the same thing again base

that over change it to 3 for the 3rd quartile and then control shift enter

for that so just to kind of break this down we use the quartile inclusive

function which means that it includes the values that are at the line of each

quartile kinda hard to explain do a quick Google search if you need more

information on that and then the reason we did this array function is we’re

actually aggregating the data for each part number right here in our table so

what we did is we said look at each of the part numbers or look at all the part

numbers and if that matches the specific role or the part number in this specific

role and we’re gonna feed all the values where that matches back into the core

title function so when we hit control shift enter notice that it adds these

brackets so that we know it’s an array function alright it’s all downhill from

here so once we have our quartiles we can calculate the interquartile range

and that’s just simply the third quartile value minus the first quartile

value and that gives us the range between those two quartiles

and then in order to find our outliers we need to determine the upper and lower

bounds so let’s go lower bound I’ll be four lower bound and you be four upper

bound so the lower bound is going to be quartile one minus one point five times

the interquartile range and then you can just copy that over and we’ll go switch

that Plus and switch this to quartile three for our upper bound and now that

we have our bounds determined we just need to write an if statement to

determine if the values are outside of those bounds so we’ll just add a column

for outlier and then we’ll say if our hours per part value is less than or

equal to our lower bound and we’ll have to make this or as well because we need

to do both bounds so if it’s less than or equal to the lower bound or this

value is greater than or equal to the upper bound close off the or value of

true we’ll say it true it is an outlier and if not we’ll say false

is not enough letter so if we just filter that real quick we’ll see that

there aren’t too many outliers but there are a few and as we saw before that is

enough to see our data so if we select our data and insert a pivot table and

then group by the part number and take our hours per part as the value select

average from our values and this will give us the mean I mean hours per part

and if we insert a slicer and select outliers there now we can actually

filter this to either include or exclude outliers so we can see a few of the

values are affected by outliers this one in particular is pretty significant the

other ones aren’t too significant this is a this is the worst one so that is

how you remove outliers from aggregated data and pivot tables alright I hope you

all enjoyed that one to get more videos as they released make sure to hit that

subscribe button if you liked this video make sure to like and share and if you

didn’t like it well then leave me a comment and let me know how I can

improve last but not least this channel is for you so if you have any

suggestions for topics you want me to cover make sure to let me know thanks

for watching we’ll see you next time

## Reader Comments

Hey everyone! Just to clarify, I was using the terms "quartile" and "quantile" in the video without explaining… A quantile is a GENERAL term for the equal division of a distribution. A quartile is a SPECIFIC type of quantile that breaks the distribution into quarters. The function in excel is for quartiles (i accidentally said quantile). Hope this clearifies! Let me know if there are other questions! 🙂

Click Here: [1:43] to skip explanations and background and get straight to the function! 🙂

I'm using Excel 2016 but I can't seem to build the array like you are. What are your keystrokes at 2:36 through 2:50. I can't seem to display the ([Part #]=[@[Part #]],[HRS/Part]] to select my columns and build my array. Is this a VBA function? I'm a relatively intermediate user of Excel and need to impose your displayed calculation on a million rows.