reading-notes

Purpose and Basic Functionality of Pandas

Pandas is a Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is widely used for data manipulation and analysis in a variety of domains, including finance, healthcare, and marketing.

Some common operations that can be performed on data using Pandas include:

Loading data from files: Pandas can read data from a variety of file formats, including CSV, Excel, and JSON.
Cleaning and transforming data: Pandas provides a number of tools for cleaning and transforming data, such as removing duplicates, handling missing values, and converting data types.
Analysing data: Pandas provides a number of tools for analysing data, such as grouping data, calculating statistics, and plotting graphs.

These operations can be used to contribute to data analysis and manipulation in a number of ways. For example, loading data from files allows you to access and work with data from a variety of sources. Cleaning and transforming data can help to ensure that your data is consistent and accurate. And analysing data can help you to identify patterns and trends in your data.

Primary Data Structures in Pandas

The primary data structures in Pandas are the Series and the DataFrame.

Series: A Series is a one-dimensional array of data. It can be indexed by a label or an integer.
DataFrame: A DataFrame is a two-dimensional array of data. It is indexed by rows and columns.

The Series and the DataFrame are both powerful data structures that can be used for a variety of tasks. However, they differ in terms of their use cases. The Series is typically used for storing and manipulating one-dimensional data, while the DataFrame is typically used for storing and manipulating two-dimensional data.

Loading Data into a Pandas DataFrame

To load a dataset into a Pandas DataFrame, you can use the read_csv() function. The read_csv() function takes a filename as input and returns a DataFrame object.

For example, the following code loads the iris dataset into a DataFrame:

```python import pandas as pd

df = pd.read_csv(‘iris.csv’)

The iris dataset is a well-known dataset that contains measurements of sepal length, sepal width, petal length, and petal width for 150 flowers from three species of iris.

The read_csv() function can also be used to read data from other file formats, such as Excel and JSON.

Common File Formats that can be Used

The following are some common file formats that can be used to load data into a Pandas DataFrame:

CSV: Comma-separated values
Excel: Microsoft Excel
JSON: JavaScript Object Notation
HTML: HyperText Markup Language
XML: Extensible Markup Language

The read_csv() function supports a number of options that allow you to customize the way that data is loaded from a file. For example, you can specify the delimiter that is used to separate the columns in a CSV file, or you can specify the header row that contains the column names.

Pandas Functions utilized to read these formats

The following are some of the Pandas functions that are utilized to read these formats:

read_csv(): Reads data from a CSV file.
read_excel(): Reads data from an Excel file.
read_json(): Reads data from a JSON file.
read_html(): Reads data from an HTML file.
read_xml(): Reads data from an XML file.

all hail the mighty Bard…

Class Practice / Warm Up

Given nested lists of integers, write some code that will add 2 to each one.

Input:

[[1, 2, 3] , [4, 5, 6]]

Output:

[[3, 4, 5], [6, 7, 8]]

This came from Chat GPT def add_two(nested_list): for i in range(len(nested_list)): for j in range(len(nested_list[i])): nested_list[i][j] += 2 return nested_list

input_list = [[1, 2, 3], [4, 5, 6]] output_list = add_two(input_list) print(output_list)

Roger def add_2(lst): for items in lst: for i, num in enumerate(items): num += 2 items[i] = num return lst

if name = ‘main’:

lst = [[1, 2, 3], [4, 5, 6]] print(add_2(lst))

Next Set use enumerate for the following Given a list of months, print the month and the numerical value of the month

months = [‘January’, ‘February’, ‘March’, ‘April’, ‘May’, ‘June’, ‘July’, ‘August’, ‘September’, ‘October’, ‘November’, ‘December’]

for index, month in enumerate(months, start=1): print(f”Month: {month}, Numerical Value: {index}”)