To give you a brief overview of sorting, filtering, slicing, and looping in Jupyter with Pandas, I will start with some definitions and then move on to simple code examples that perform those tasks.

Jupyter is an open-source web application for creating and sharing documents that contain live code and visualizations. Pandas, on the other hand, is a Python library that provides data structures for efficiently storing and manipulating large datasets.

Sorting

As the name implies, sorting arranges the rows of a dataset in order. The sort_values() method sorts by one or more columns, in ascending order by default or in descending order with ascending=False. Below is an example of sorting a Pandas DataFrame:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Sort the DataFrame by Age in ascending order
df_sorted = df.sort_values(by='Age')

# Print the sorted DataFrame
print(df_sorted)

# Output
#    Name  Age  Salary
# 2  Jane   21   45000
# 1  Mary   28   60000
# 0  John   35   50000
# 3  Mark   42   70000
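
As a small extension of the example above, the sketch below sorts in descending order and renumbers the rows afterwards. The variable names df_desc and df_reset are only illustrative.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Sort by Age from oldest to youngest
df_desc = df.sort_values(by='Age', ascending=False)
print(df_desc['Name'].tolist())
# ['Mark', 'John', 'Mary', 'Jane']

# Sort by Salary in descending order and renumber the rows from 0
df_reset = df.sort_values(by='Salary', ascending=False).reset_index(drop=True)
print(df_reset['Name'].tolist())
# ['Mark', 'Mary', 'John', 'Jane']
```

Note that sort_values() returns a new DataFrame and leaves df unchanged, which is why the result is assigned to a new variable.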

Filtering

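One way to filter a Pandas DataFrame is boolean indexing: a comparison on a column produces a True/False mask, and indexing the DataFrame with that mask keeps only the rows where it is True: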

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Filter the DataFrame by Age greater than 30
df_filtered = df[df['Age'] > 30]

# Print the filtered DataFrame
print(df_filtered)

# Output
#    Name  Age  Salary
# 0  John   35   50000
# 3  Mark   42   70000
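
Boolean masks can also be combined. The sketch below, using illustrative thresholds and variable names that are not part of the original example, joins two conditions with & and matches a list of values with isin(). Each condition needs its own parentheses because & binds more tightly than the comparison operators.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Keep rows where Age is over 25 AND Salary is under 65000
df_both = df[(df['Age'] > 25) & (df['Salary'] < 65000)]
print(df_both['Name'].tolist())
# ['John', 'Mary']

# Keep rows whose Name appears in a given list
df_named = df[df['Name'].isin(['Jane', 'Mark'])]
print(df_named['Name'].tolist())
# ['Jane', 'Mark']
```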

Slicing

Slicing extracts a subset of data from a larger dataset. To slice a Pandas DataFrame in Jupyter, you can use the loc[] indexer, which selects by label, or the iloc[] indexer, which selects by integer position. Below is an example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Slice the DataFrame to include the first two rows
df_sliced = df.iloc[:2]

# Print the sliced DataFrame
print(df_sliced)

# Output
#    Name  Age  Salary
# 0  John   35   50000
# 1  Mary   28   60000
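
The difference between the two indexers is easy to miss, so here is a minimal comparison on the same data. With iloc[] the end of a slice is excluded, while with loc[] the end label is included. The variable names are only illustrative.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# iloc: rows at positions 1 and 2, columns at positions 0 and 1 (end excluded)
by_position = df.iloc[1:3, 0:2]

# loc: rows labelled 1 through 2, columns selected by name (end included)
by_label = df.loc[1:2, ['Name', 'Age']]

print(by_position.equals(by_label))
# True

# The same slice bounds give different row counts with each indexer
print(len(df.iloc[:2]), len(df.loc[:2]))
# 2 3
```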

Looping

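Looping goes through a dataset and performs an operation on each row. To loop through a Pandas DataFrame in Jupyter, you can use the iterrows() method, which yields each row as an (index, Series) pair, as shown below. Row-by-row loops are slow on large datasets, so prefer a column-wise operation where one exists.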

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Loop through the DataFrame and print each row
for index, row in df.iterrows():
    print(row['Name'], row['Age'], row['Salary'])

# Output
# John 35 50000
# Mary 28 60000
# Jane 21 45000
# Mark 42 70000
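
Two common alternatives to iterrows() are sketched below. itertuples() yields lightweight named tuples and is usually faster, and a column-wise operation avoids the explicit loop altogether. The Bonus column is a hypothetical example, not part of the original data.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Loop with itertuples(): each row is a named tuple with one field per column
names = []
for row in df.itertuples(index=False):
    names.append(row.Name)
    print(row.Name, row.Age, row.Salary)

# Column-wise alternative: compute a value for every row without a loop
# (hypothetical Bonus column: one tenth of Salary, using integer division)
df['Bonus'] = df['Salary'] // 10
print(df['Bonus'].tolist())
# [5000, 6000, 4500, 7000]
```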

Exercises

The data for the following charts were extracted using the methods described above.

You may see my code in my GitHub repo here.
