Sorting, Filtering, Slicing, and Looping over Large Datasets with Pandas in Jupyter

This is a brief overview of how to sort, filter, slice, and loop over data in Jupyter with Pandas. I will start with some definitions and then move on to simple code examples that perform each task.

Jupyter Notebook is an open-source web application for creating and sharing documents that combine live code and visualizations. Pandas is a Python library whose data structures, chiefly the DataFrame and the Series, are designed to store and manipulate large datasets efficiently.

Sorting

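Sorting reorders the rows of a dataset by the values in one or more columns. The sort_values() method returns a new, sorted DataFrame and leaves the original unchanged. The order is ascending by default; pass ascending=False for descending order. Below is an example of sorting a Pandas DataFrame: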

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Sort the DataFrame by Age in ascending order
df_sorted = df.sort_values(by='Age')

# Print the sorted DataFrame
print(df_sorted)

# Output
#    Name  Age  Salary
# 2  Jane   21   45000
# 1  Mary   28   60000
# 0  John   35   50000
# 3  Mark   42   70000
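
For descending order, the same method takes ascending=False. The following is a minimal sketch that reuses the sample DataFrame from above; the variable name df_desc is only illustrative.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Sort by Salary from highest to lowest
df_desc = df.sort_values(by='Salary', ascending=False)

print(df_desc)

# Output
#    Name  Age  Salary
# 3  Mark   42   70000
# 1  Mary   28   60000
# 0  John   35   50000
# 2  Jane   21   45000
```

Note that the original index labels travel with their rows. If you want a fresh 0, 1, 2, ... index after sorting, chaining reset_index(drop=True) is one way to get it.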

Filtering

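Filtering keeps only the rows that satisfy a condition. A common way to filter a Pandas DataFrame is boolean indexing: a comparison such as df['Age'] > 30 produces a Series of True/False values, and passing that Series to df[...] selects the rows marked True, as shown: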

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Filter the DataFrame by Age greater than 30
df_filtered = df[df['Age'] > 30]

# Print the filtered DataFrame
print(df_filtered)

# Output
#    Name  Age  Salary
# 0  John   35   50000
# 3  Mark   42   70000
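
Conditions can also be combined. The sketch below assumes you want rows that meet two criteria at once; the thresholds and the names mask and df_both are made up for illustration. Pandas uses & for "and" and | for "or", and each comparison needs its own parentheses because these operators bind more tightly than > and <.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Build a boolean mask: Age above 25 AND Salary below 65000
mask = (df['Age'] > 25) & (df['Salary'] < 65000)

# Keep only the rows where the mask is True
df_both = df[mask]

print(df_both)

# Output
#    Name  Age  Salary
# 0  John   35   50000
# 1  Mary   28   60000
```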

Slicing

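Slicing is extracting a subset of data from a larger set of data. To slice a Pandas DataFrame in Jupyter, you can use the loc[] or iloc[] indexers. loc[] selects by label, while iloc[] selects by integer position and, like ordinary Python slicing, excludes the end position. Below is an example: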

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Slice the DataFrame to include the first two rows
df_sliced = df.iloc[:2]

# Print the sliced DataFrame
print(df_sliced)

# Output
#    Name  Age  Salary
# 0  John   35   50000
# 1  Mary   28   60000
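
For comparison, here is a small sketch of label-based slicing with loc[], which can select rows and columns in one step. The name df_loc is illustrative. Unlike iloc[], a loc[] slice includes its end label, so 1:2 returns the rows labeled 1 and 2.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Rows labeled 1 through 2 (inclusive), columns Name and Salary only
df_loc = df.loc[1:2, ['Name', 'Salary']]

print(df_loc)

# Output
#    Name  Salary
# 1  Mary   60000
# 2  Jane   45000
```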

Looping

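Looping is a way to go through a dataset and perform an operation on each row. To loop through a Pandas DataFrame in Jupyter, you can use the iterrows() method, which yields an (index, row) pair for every row, with row being a Series. It is easy to read but slow on large datasets, because a new Series is built for each row. An example is shown below: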

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# Loop through the DataFrame and print each row
for index, row in df.iterrows():
    print(row['Name'], row['Age'], row['Salary'])

# Output
# John 35 50000
# Mary 28 60000
# Jane 21 45000
# Mark 42 70000
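
When the per-row overhead matters, two common alternatives are itertuples(), which yields lightweight namedtuples, and vectorized column operations, which avoid the Python loop altogether. The sketch below shows both on the same sample data; the NewSalary column and the flat raise of 1000 are hypothetical and exist only to illustrate the idea.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'Name': ['John', 'Mary', 'Jane', 'Mark'],
                   'Age': [35, 28, 21, 42],
                   'Salary': [50000, 60000, 45000, 70000]})

# itertuples() yields one namedtuple per row; fields are read as attributes
for row in df.itertuples(index=False):
    print(row.Name, row.Age, row.Salary)

# Output
# John 35 50000
# Mary 28 60000
# Jane 21 45000
# Mark 42 70000

# Vectorized alternative: apply the operation to the whole column at once.
# The raise of 1000 is a made-up figure for illustration.
df['NewSalary'] = df['Salary'] + 1000

print(df['NewSalary'].tolist())

# Output
# [51000, 61000, 46000, 71000]
```

As a rule of thumb, reach for a vectorized expression first and fall back to itertuples() only when the per-row logic cannot be expressed on whole columns.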