In this tutorial, we’ll learn how to build and use a pandas series as well as explore the different measures of central tendency.
Pandas is a library used in the Python programming language for data analysis. One (of the many) things that this library can do is to create a series. A series is similar to a list, but it is formatted like a single column of a data table. Let’s check out the code that can be used to create a series.
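A minimal sketch of what such code could look like is shown here. The variable name `pies` matches the walkthrough that follows, but the pie flavors in the list are hypothetical values chosen only for illustration.

```python
import pandas as pd

# Hypothetical list of pie flavors; any list of values would work here.
pies = pd.Series(["apple", "cherry", "pumpkin"])

# Printing the series shows each item next to its index label.
print(pies)
```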
Let’s go over the syntax. First, we import the pandas library. Pandas is often abbreviated as `pd`. This way, later in the program, we can just use `pd` instead of typing out the full word `pandas`. The variable name `pies` is used to store our Series. We use `pd` (which calls on the pandas library) followed by a period and the word `Series`. The S should be capitalized in `Series`. Then, you can see the list that will be used to populate the series.
Run the code to view the series printed in the editor.
When a series is created, each entry, or item, is assigned a labeled index starting with 0. Also, notice that the data type of the items in the series is printed out as well. The list used contains strings, which pandas reports as the data type `object` (newer pandas releases may report a dedicated string type instead).
Change the elements in the list and see how the series changes as well!
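To make the index labels and data type concrete, here is a small sketch that inspects them directly. The list contents are again hypothetical, and the exact data type name printed may depend on the pandas version installed.

```python
import pandas as pd

# Hypothetical values, used only for illustration.
pies = pd.Series(["apple", "cherry", "pumpkin"])

print(list(pies.index))  # the default index labels: [0, 1, 2]
print(pies.dtype)        # the data type of the items in the series
print(pies[1])           # look up an item by its index label: cherry
```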
Now that we have our data in a series, we can use methods that are built into the pandas library to find the mean, the median, and the mode. These are three measures of central tendency: summary statistics that each attempt to describe the center of a data set. You may be familiar with some of these measures. Here is a list of methods that can be called on a series.
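`series.mean()` - adds up all of the numbers and divides by the number of items in the set
`series.median()` - determines the middle number when the set is arranged in order (with an even number of items, it is the average of the two middle numbers)
`series.mode()` - determines the value that appears the most often; the result comes back as a series, because more than one value can tie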
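As a quick worked example of the three methods, the sketch below uses a small made-up series of numbers so that each result can be checked by hand.

```python
import pandas as pd

# A small, made-up data set that is easy to verify by hand.
numbers = pd.Series([2, 4, 4, 5, 10])

print(numbers.mean())    # (2 + 4 + 4 + 5 + 10) / 5 = 5.0
print(numbers.median())  # middle value of 2, 4, 4, 5, 10 is 4.0
print(numbers.mode())    # a series containing the most frequent value, 4
```

Because `mode()` returns a series rather than a single number, its printed output includes an index label next to the value.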
This activity contains a list of the weights of cats at a shelter. Use the methods just learned along with the series called `cats` to determine and print the mean, median, and mode of the dataset. Note: You must wrap each call in `print()` to view the result in the editor.
Notice that in the last coding activity we were able to change the indices! The cats were given names instead of the default index numbering. We can add a parameter to the `Series` constructor by including a comma and then `index=`. Here we can pass another list that will act as the index instead of the default numbering. This list must have the same number of items as the data.
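A minimal sketch of the `index=` parameter follows. The cat names and weights are hypothetical stand-ins, not the values from the activity.

```python
import pandas as pd

# Hypothetical cat names and weights (in pounds).
cats = pd.Series(
    [8.5, 10.0, 12.5, 10.0],
    index=["Milo", "Luna", "Oscar", "Bella"],
)

print(cats)          # each weight is now labeled with a name
print(cats["Luna"])  # look up a weight by name: 10.0

print(cats.mean())    # 10.25
print(cats.median())  # 10.0
print(cats.mode())    # a series containing 10.0
```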
Here is your chance to put everything we learned in this tutorial together. The following table lists the winners of the last few years’ hot dog eating contests and the number of hot dogs that they consumed. Use this information to create a pandas Series. Use the names as the indices of the Series.
Name | Number of Hot Dogs |
---|---|
Joey Chestnut | 72.0 |
Miki Sudo | 41.0 |
Joey Chestnut | 70.0 |
Miki Sudo | 38.0 |
Matthew Stonie | 62.0 |
Miki Sudo | 38.0 |
Joey Chestnut | 61.0 |
Miki Sudo | 34.0 |
Joey Chestnut | 69.0 |
Sonya Thomas | 36.8 |
Then, find the mean, the median, and the mode of the dataset.
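One possible solution sketch is shown here, using the values from the table. The variable names `names`, `counts`, and `hot_dogs` are arbitrary choices. Note that index labels are allowed to repeat, so the same winner can appear more than once.

```python
import pandas as pd

# Names and hot dog counts, in the same order as the table rows.
names = [
    "Joey Chestnut", "Miki Sudo", "Joey Chestnut", "Miki Sudo", "Matthew Stonie",
    "Miki Sudo", "Joey Chestnut", "Miki Sudo", "Joey Chestnut", "Sonya Thomas",
]
counts = [72.0, 41.0, 70.0, 38.0, 62.0, 38.0, 61.0, 34.0, 69.0, 36.8]

# Use the names as the index of the series.
hot_dogs = pd.Series(counts, index=names)

print(hot_dogs.mean())    # 521.8 / 10, about 52.18
print(hot_dogs.median())  # average of the two middle values, 41.0 and 61.0: 51.0
print(hot_dogs.mode())    # a series containing 38.0, which appears twice
```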