To give you a brief overview of sorting, filtering, slicing, and looping in Jupyter with Pandas, I will start with some definitions and then move on to simple code examples that perform each task.
Jupyter is an open-source web application used to create and share documents that contain live code and visualizations. Pandas, on the other hand, is a Python library that provides data structures, such as the DataFrame, for efficiently storing and manipulating large datasets.
Sorting
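As the name implies, sorting arranges the rows of a dataset according to the values in one or more columns, in either ascending or descending order. In Pandas this is done with the sort_values() method, which sorts in ascending order by default. Below is an example of sorting a Pandas DataFrame: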
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})
# Sort the DataFrame by Age in ascending order
df_sorted = df.sort_values(by='Age')
# Print the sorted DataFrame
print(df_sorted)
# Output
#    Name  Age  Salary
# 2  Jane   21   45000
# 1  Mary   28   60000
# 0  John   35   50000
# 3  Mark   42   70000
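For descending order, a minimal sketch using the same example data is shown below. It passes ascending=False to sort_values() and then calls reset_index(drop=True) to renumber the rows. The variable names df_desc and df_desc_reset are only illustrative.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Sort by Age from oldest to youngest
df_desc = df.sort_values(by='Age', ascending=False)
print(df_desc)

# Sorting keeps the original index labels; reset_index(drop=True)
# renumbers the rows 0..n-1 and discards the old labels
df_desc_reset = df_desc.reset_index(drop=True)
print(df_desc_reset)
```

Note that sort_values() returns a new DataFrame and leaves df unchanged.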
Filtering
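Filtering keeps only the rows that satisfy a condition. One way to filter a Pandas DataFrame is boolean indexing, where a comparison such as df['Age'] > 30 produces a True/False value per row and only the True rows are kept, as shown: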
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})
# Filter the DataFrame by Age greater than 30
df_filtered = df[df['Age'] > 30]
# Print the filtered DataFrame
print(df_filtered)
# Output
#    Name  Age  Salary
# 0  John   35   50000
# 3  Mark   42   70000
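Boolean indexing also works with more than one condition. The sketch below, again on the same example data, combines conditions with & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The thresholds and the names df_both and df_either are chosen only for illustration.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Rows where BOTH conditions hold: older than 30 and salary above 60000
df_both = df[(df['Age'] > 30) & (df['Salary'] > 60000)]
print(df_both)

# Rows where AT LEAST ONE condition holds: younger than 25 or salary of 70000 or more
df_either = df[(df['Age'] < 25) | (df['Salary'] >= 70000)]
print(df_either)
```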
Slicing
Slicing is extracting a subset of data from a larger set of data. To slice a Pandas DataFrame in Jupyter, you can use the loc[] indexer, which selects by label, or the iloc[] indexer, which selects by integer position. Below is an example:
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})
# Slice the DataFrame to include the first two rows
df_sliced = df.iloc[:2]
# Print the sliced DataFrame
print(df_sliced)
# Output
#    Name  Age  Salary
# 0  John   35   50000
# 1  Mary   28   60000
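One difference between the two indexers is easy to miss: a loc[] slice includes its end label, while an iloc[] slice excludes its end position, as ordinary Python slicing does. The following sketch on the same example data shows both, and also selects columns at the same time. The names df_loc and df_iloc are illustrative.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Label-based: rows labelled 0 through 2 (end label INCLUDED), two named columns
df_loc = df.loc[0:2, ['Name', 'Age']]
print(df_loc)   # three rows: John, Mary, Jane

# Position-based: rows at positions 0 and 1 (end position EXCLUDED), first two columns
df_iloc = df.iloc[0:2, 0:2]
print(df_iloc)  # two rows: John, Mary
```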
Looping
Looping is a way to go through a dataset and perform an operation on each row. To loop through a Pandas DataFrame in Jupyter, you can use the iterrows() method, which yields each row as a pair of its index label and a Series of its values, as shown below:
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})
# Loop through the DataFrame and print each row
for index, row in df.iterrows():
    print(row['Name'], row['Age'], row['Salary'])
# Output
# John 35 50000
# Mary 28 60000
# Jane 21 45000
# Mark 42 70000
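On larger datasets, row-by-row loops tend to be slow. Two common alternatives are sketched below on the same example data: itertuples(), which yields each row as a lightweight named tuple, and a column-wide (vectorized) operation that avoids the loop altogether. The Salary_K column and the names list are hypothetical and added only to illustrate the idea.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# itertuples() yields one named tuple per row; columns become attributes
names = []
for row in df.itertuples(index=False):
    print(row.Name, row.Age, row.Salary)
    names.append(row.Name)

# Vectorized alternative: operate on the whole column at once, no loop needed
# (Salary_K is a hypothetical column: salary in whole thousands)
df['Salary_K'] = df['Salary'] // 1000
print(df)
```

When the operation can be expressed on whole columns, the vectorized form is usually the shorter and faster choice.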
Exercises
The following charts are built from data extracted using the methods described above.
You may see my code in my GitHub repo here.