reading-notes

Purpose and Basic Functionality of Pandas

Pandas is a Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is widely used for data manipulation and analysis in a variety of domains, including finance, healthcare, and marketing.

Some common operations that can be performed on data using Pandas include loading data from files, cleaning and transforming data, and analysing data.

Each of these operations supports data analysis and manipulation in a different way. Loading data from files lets you work with data from a variety of sources. Cleaning and transforming data helps ensure that your data is consistent and accurate. Analysing data helps you identify patterns and trends.
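As a rough illustration of the loading, cleaning, transforming, and analysing stages, the sketch below builds a small DataFrame in memory instead of reading a file. The column names (`species`, `petal_length`) and the values are invented for this note.

```python
import pandas as pd

# Hypothetical sample data; in practice this would usually come from a file.
raw = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "virginica", None],
        "petal_length": [1.4, 1.3, 6.0, 5.1],
    }
)

# Clean: drop rows with a missing species label.
clean = raw.dropna(subset=["species"])

# Transform: add a derived column (millimetres instead of centimetres).
clean = clean.assign(petal_length_mm=clean["petal_length"] * 10)

# Analyse: mean petal length per species.
summary = clean.groupby("species")["petal_length"].mean()
print(summary)
```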

Primary Data Structures in Pandas

The primary data structures in Pandas are the Series and the DataFrame.

Both are powerful data structures, but they suit different use cases. A Series holds one-dimensional data, such as a single column of values with an index. A DataFrame holds two-dimensional data arranged in rows and named columns.
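A minimal sketch of the difference, using made-up flower measurements. The variable names are only for illustration.

```python
import pandas as pd

# A Series: one-dimensional, labelled values.
petal_lengths = pd.Series([1.4, 4.7, 6.0], name="petal_length")

# A DataFrame: two-dimensional, with rows and named columns.
flowers = pd.DataFrame(
    {
        "species": ["setosa", "versicolor", "virginica"],
        "petal_length": [1.4, 4.7, 6.0],
    }
)

print(petal_lengths.shape)  # (3,)
print(flowers.shape)        # (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
column = flowers["petal_length"]
print(type(column).__name__)  # Series
```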

Loading Data into a Pandas DataFrame

To load a dataset into a Pandas DataFrame, you can use the read_csv() function. The read_csv() function takes a filename as input and returns a DataFrame object.

For example, the following code loads the iris dataset into a DataFrame:

```python
import pandas as pd

# Load the CSV file into a DataFrame (assumes iris.csv is in the working directory).
df = pd.read_csv('iris.csv')
```

The iris dataset is a well-known dataset that contains measurements of sepal length, sepal width, petal length, and petal width for 150 flowers from three species of iris.

The read_csv() function only reads delimited text files. For other file formats, Pandas provides separate functions, such as read_excel() for Excel and read_json() for JSON.

Common File Formats that can be Used

Common file formats that can be loaded into a Pandas DataFrame include CSV, Excel, and JSON.

The read_csv() function supports a number of options that allow you to customize the way that data is loaded from a file. For example, you can specify the delimiter that is used to separate the columns in a CSV file, or you can specify the header row that contains the column names.
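The delimiter and header options correspond to the `sep`, `header`, and `names` parameters of read_csv(). The sketch below reads from an in-memory string so that it runs without a file on disk. The sample text and column names are invented.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data with a header row.
with_header = "name;score\nada;90\nlin;85\n"

# sep sets the delimiter; header=0 says the first line holds the column names.
df = pd.read_csv(io.StringIO(with_header), sep=";", header=0)
print(df)

# The same data without a header row: header=None, with names supplied by hand.
without_header = "ada;90\nlin;85\n"
df_no_header = pd.read_csv(
    io.StringIO(without_header), sep=";", header=None, names=["name", "score"]
)
print(df_no_header)
```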

Pandas Functions utilized to read these formats

The Pandas functions used to read these formats include read_csv() for CSV files, read_excel() for Excel files, and read_json() for JSON files.
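As a small sketch, the example below loads the same made-up records once with read_csv() and once with read_json(), reading from in-memory strings so that no files are needed. read_excel() is only mentioned in a comment because it requires a real workbook and an installed Excel engine.

```python
import io

import pandas as pd

# Hypothetical sample records in two formats.
csv_text = "name,score\nada,90\nlin,85\n"
json_text = '[{"name": "ada", "score": 90}, {"name": "lin", "score": 85}]'

from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)

# pd.read_excel("data.xlsx") follows the same pattern for Excel files
# ("data.xlsx" is a placeholder name) but needs an engine such as openpyxl.
```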

all hail the mighty Bard…

Class Practice / Warm Up

Given a nested list of integers, write some code that will add 2 to each integer.

Input:

[[1, 2, 3], [4, 5, 6]]

Output:

[[3, 4, 5], [6, 7, 8]]

This came from ChatGPT:

    def add_two(nested_list):
        # Walk every position by index and add 2 in place.
        for i in range(len(nested_list)):
            for j in range(len(nested_list[i])):
                nested_list[i][j] += 2
        return nested_list

    input_list = [[1, 2, 3], [4, 5, 6]]
    output_list = add_two(input_list)
    print(output_list)

Roger:

    def add_2(lst):
        for items in lst:
            # enumerate gives the position so the updated value can be written back.
            for i, num in enumerate(items):
                num += 2
                items[i] = num
        return lst

    if __name__ == '__main__':
        lst = [[1, 2, 3], [4, 5, 6]]
        print(add_2(lst))
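Both versions above modify the input list in place. A non-mutating alternative, sketched here with a nested list comprehension, returns a new list and leaves the original untouched. The name `add_two_copy` is made up for this note.

```python
def add_two_copy(nested_list):
    """Return a new nested list with 2 added to every integer."""
    return [[num + 2 for num in inner] for inner in nested_list]


original = [[1, 2, 3], [4, 5, 6]]
result = add_two_copy(original)
print(result)    # [[3, 4, 5], [6, 7, 8]]
print(original)  # [[1, 2, 3], [4, 5, 6]]
```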

Next set: use enumerate for the following. Given a list of months, print each month and its numerical value.

    months = ['January', 'February', 'March', 'April', 'May', 'June',
              'July', 'August', 'September', 'October', 'November', 'December']

    # start=1 makes the count begin at 1 so January maps to 1.
    for index, month in enumerate(months, start=1):
        print(f"Month: {month}, Numerical Value: {index}")