Pandas

Pandas Full Course 2025 | Digital E-Filing Coach

🔵 1. INTRODUCTION TO PANDAS

What is Pandas?

Pandas is like a super-powered Excel inside Python. It lets you load, clean, analyse, and transform data using short Python commands, often one or two lines per task.

Key Points

Open Source Python Library — Free to use. Just install it with one command: pip install pandas. No payment needed.
Built on NumPy — Pandas is built on top of NumPy (another Python library), making it very fast when working with large numbers and datasets.
Panel Data Derivation — The name "Pandas" comes from "Panel Data", a term from statistics and econometrics for structured data that tracks the same subjects over multiple time periods.
Data Analysis and Manipulation — You can filter rows, add columns, calculate averages, sort data, and much more — all with just 1–2 lines of code.

Simple Example

import pandas as pd   # Step 1: Import Pandas
data = {'Name': ['Aman', 'Sara', 'Ravi'],
        'Score': [85, 92, 78]}
df = pd.DataFrame(data)   # Create a table
print(df)

📤 Output:
   Name  Score
0  Aman     85
1  Sara     92
2  Ravi     78

Feature	What it Means	Simple Analogy
Open Source	Free, public code	Like a free public library
Built on NumPy	Uses NumPy arrays inside	Building on a strong foundation
Panel Data	Multi-dimensional structured data	Big Excel sheet with rows & columns
Data Manipulation	Filter, sort, merge, clean data	Editing a spreadsheet with superpowers

🟢 2. DATA STRUCTURES

2A. Pandas Series — 1D Labeled Array

A Series is like a single column of an Excel sheet — one list of values, each with a label (index).

1D Labeled Array — One-dimensional, like one column of a table.
Holds Any Data Type — Numbers, text, dates, booleans — anything goes in a Series.
Labeled Indices — Each value has a label: default 0,1,2 or custom labels like 'a','b','c'.
Creation via Lists or Dictionaries — Create from a Python list or dictionary easily.

s = pd.Series([10, 20, 30], index=['a','b','c'])
print(s)   # a→10, b→20, c→30
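The point above about creating a Series from a dictionary can be sketched as follows. The marks data is made up for illustration; the dictionary keys become the index labels.

```python
import pandas as pd

# Hypothetical marks data, used only for illustration
marks = {'Aman': 85, 'Sara': 92, 'Ravi': 78}

# Dictionary keys become the index labels, values become the data
s = pd.Series(marks)

print(s['Sara'])       # access a value by its label
print(s.iloc[0])       # access a value by its position
print(list(s.index))   # the labels, in insertion order
```

Label access (s['Sara']) and position access (s.iloc[0]) are kept separate here because mixing them up is a common source of confusion once the labels are themselves numbers.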

2B. Pandas DataFrame — 2D Labeled Table

A DataFrame is like a full Excel table — many rows and columns together. Each column is actually a Series.

2D Labeled Table — Has rows and columns, just like a spreadsheet.
Collection of Series — Every column in a DataFrame is a Series.
Rows (Observations) — Each row = one data record (e.g., one student, one transaction).
Columns (Features) — Each column = one attribute (e.g., Name, Age, Score).

df = pd.DataFrame({
    'Name'  : ['Aman', 'Sara'],
    'Age'   : [25, 22],
    'Score' : [88, 95]
})

Structure	Dimensions	Analogy	Creation Method
Series	1D (one column)	Single Excel column	pd.Series([1,2,3])
DataFrame	2D (rows + columns)	Full Excel sheet	pd.DataFrame({...})
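To see that each DataFrame column really is a Series, a small sketch along these lines may help. It reuses the two-student table from above and inspects its shape and column types.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara'],
    'Age': [25, 22],
    'Score': [88, 95],
})

score_col = df['Score']            # selecting one column returns a Series
print(type(score_col).__name__)    # Series
print(df.shape)                    # (number of rows, number of columns)
print(df.dtypes)                   # data type of each column
```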

🟠 3. DATA ACCESS AND INDEXING

3A. Indexers — Selecting Data

Indexers are used to pick specific rows and columns — like clicking specific cells in Excel.

▸ loc — Label-based Selection

Use loc when you know the row/column name (label).

df.loc[0, 'Name']          # Row 0, column 'Name' → 'Aman'
df.loc[0:1, ['Name','Score']]  # Rows labeled 0 to 1 (end label is included), two columns

▸ iloc — Integer Position Selection

Use iloc when you know the position number (0, 1, 2...).

df.iloc[0, 1]     # Row 0, Column 1 (integer) → 25
df.iloc[0:2, 0:2] # First 2 rows, first 2 columns
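The difference between loc and iloc is easiest to see when the row labels are not 0, 1, 2. The sketch below assumes hypothetical labels 's1' to 's3'; note that a loc slice includes its end label while an iloc slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Aman', 'Sara', 'Ravi'], 'Score': [85, 92, 78]},
    index=['s1', 's2', 's3'],   # hypothetical custom row labels
)

by_label = df.loc['s1':'s2', 'Score']   # label slice: end label 's2' is included
by_position = df.iloc[0:2, 1]           # position slice: end position 2 is excluded

print(by_label.tolist())       # [85, 92]
print(by_position.tolist())    # [85, 92]
print(len(df.loc['s1':'s3']))  # 3 rows, because 's3' is included
print(len(df.iloc[0:2]))       # 2 rows, because position 2 is excluded
```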

▸ at / iat — Single Cell Access (Fastest)

df.at[0, 'Name']  # By label → 'Aman'
df.iat[0, 0]      # By position → 'Aman'

3B. Methods

▸ Head and Tail

df.head(3)   # First 3 rows
df.tail(3)   # Last 3 rows

▸ Boolean Indexing — Filter by Condition

df[df['Score'] > 80] # Only rows where Score is greater than 80
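Boolean indexing also works with several conditions at once. A minimal sketch, using an invented City column: combine conditions with & (and), | (or), and ~ (not), and wrap each condition in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi'],
    'City': ['Delhi', 'Mumbai', 'Delhi'],
    'Score': [85, 92, 78],
})

# Each condition needs its own parentheses when combined
high_delhi = df[(df['Score'] > 80) & (df['City'] == 'Delhi')]
high_or_mumbai = df[(df['Score'] > 90) | (df['City'] == 'Mumbai')]
not_delhi = df[~(df['City'] == 'Delhi')]

print(high_delhi['Name'].tolist())      # ['Aman']
print(high_or_mumbai['Name'].tolist())  # ['Sara']
print(not_delhi['Name'].tolist())       # ['Sara']
```

Python's plain and / or keywords do not work on whole columns, which is why the symbol operators are used here.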

▸ Select Dtypes

df.select_dtypes(include=['int64'])  # Only int64 columns; use include='number' for all numeric columns
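As a rough illustration of select_dtypes, the sketch below splits a small table into numeric and non-numeric columns. The Height column is invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara'],
    'Age': [25, 22],             # integer column
    'Height': [1.72, 1.65],      # float column (hypothetical data)
})

numeric = df.select_dtypes(include='number')      # integers and floats together
non_numeric = df.select_dtypes(exclude='number')  # everything else, here the text column

print(list(numeric.columns))      # ['Age', 'Height']
print(list(non_numeric.columns))  # ['Name']
```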

Method	Used For	Example
loc	Label-based row/col access	df.loc[0,'Name']
iloc	Integer position access	df.iloc[0,1]
at	Single label-based cell	df.at[0,'Age']
iat	Single integer-based cell	df.iat[0,1]
head(n)	First n rows preview	df.head(5)
tail(n)	Last n rows preview	df.tail(5)
Boolean	Filter rows by condition	df[df['A']>10]
select_dtypes	Filter columns by data type	df.select_dtypes('float')

🟣 4. KEY OPERATIONS

4A. Arithmetic Operations

Broadcasting — Apply one value to all rows automatically. Example: add 5 to every student's score.
Inter-column Operations — Do math between two columns. Example: Total = Maths + Science.
add, sub, mul, div Methods — Named versions of + - * / that accept a fill_value argument, so a missing value (NaN) can be treated as a default such as 0 instead of turning the result into NaN.

df['Score'] + 5           # Broadcasting: adds 5 to every row
df['Total'] = df['Math'] + df['Science']  # Inter-column
df['Score'].add(10, fill_value=0)  # Named method
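The benefit of the named methods shows up when two Series do not share all their labels. The sketch below uses made-up Maths and Science marks in which each student is missing from one of the two Series.

```python
import pandas as pd

# Hypothetical marks: Sara has no Science mark, Ravi has no Maths mark
maths = pd.Series({'Aman': 40, 'Sara': 45})
science = pd.Series({'Aman': 38, 'Ravi': 41})

plain = maths + science                     # a label missing on either side gives NaN
filled = maths.add(science, fill_value=0)   # the missing side is treated as 0

print(plain)
print(filled)
```

Whether treating a missing mark as 0 is appropriate depends on the data; fill_value only controls the arithmetic, it does not judge whether the default is sensible.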

4B. Statistical Operations

mean, median, mode — Average, middle value, and most repeated value of a column.
sum, min, max — Total, lowest, and highest values in a column.
idxmin, idxmax — Return the row index (label) of the minimum or maximum value. The method names are all lowercase.
describe() — Gives a complete statistical summary in one shot.

df['Score'].mean()    # Average score
df['Score'].idxmax()  # Which row has the highest score?
df.describe()          # Full statistical summary
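A small worked example of these statistics, with invented scores and student names as the index. Note that mode() returns a Series (there can be more than one most-frequent value), and idxmax() returns the label of the first maximum.

```python
import pandas as pd

scores = pd.Series([85, 92, 78, 92], index=['Aman', 'Sara', 'Ravi', 'Neha'])

print(scores.mean())           # 86.75
print(scores.median())         # 88.5
print(scores.mode().tolist())  # [92]
print(scores.idxmax())         # 'Sara' (first of the two 92s)
print(scores.idxmin())         # 'Ravi'
```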

4C. Data Manipulation

Sort Values and Index — Arrange rows by a column's value or by the row index.
AsType Conversion — Convert a column's data type (e.g., text → number).
Value Counts — Count how many times each unique value appears in a column.
Unique Values — See only the distinct (non-repeating) values in a column.
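Inplace Parameter — inplace=True changes the DataFrame directly and returns None, so do not assign the result back. Without it, the method returns a new DataFrame and leaves the original unchanged.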

df.sort_values('Score', ascending=False)   # Descending sort
df['Age'] = df['Age'].astype(int)          # Convert to integer
df['City'].value_counts()                  # Count each city
df['City'].unique()                        # Unique cities only
df.drop_duplicates(inplace=True)            # Remove duplicates directly
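The sketch below, on a small invented table, shows that methods such as sort_values() and drop_duplicates() return a new DataFrame by default, which is the behaviour that inplace=True changes.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Delhi'],
    'Score': [85, 92, 85, 70],
})

sorted_df = df.sort_values('Score', ascending=False)  # new DataFrame, df is unchanged
deduped = df.drop_duplicates()                        # rows 0 and 2 are identical, one is dropped
counts = df['City'].value_counts()                    # how often each city appears
cities = list(df['City'].unique())                    # distinct cities in order of appearance

print(sorted_df['Score'].tolist())   # [92, 85, 85, 70]
print(df['Score'].tolist())          # [85, 92, 85, 70], original order kept
print(len(deduped))                  # 3
print(counts['Delhi'])               # 3
print(cities)                        # ['Delhi', 'Mumbai']
```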

Operation	Method	Simple Meaning
Mean	.mean()	Average of all values
Median	.median()	Middle value when sorted
Mode	.mode()	Most frequently occurring value
Sum / Min / Max	.sum() .min() .max()	Total / Smallest / Largest
idxmin / idxmax	.idxmin() .idxmax()	Row index of min/max value
Describe	.describe()	Full statistics summary report
Sort Values	.sort_values()	Arrange rows by column value
AsType	.astype()	Change column data type
Value Counts	.value_counts()	Count each unique value
Unique	.unique()	List of all distinct values
Inplace	inplace=True	Save change to same object

🟡 5. ADVANCED FUNCTIONS

5A. Group By

groupby() splits your data into groups and lets you analyse each group separately, much like a Pivot Table in Excel.

df.groupby('City')['Score'].mean()
# Average score for each city separately

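📍 Real Example: If the data has students from Delhi and Mumbai, df.groupby('City')['Score'].mean() gives the average score of the Delhi students and of the Mumbai students separately, in one line. Selecting the 'Score' column first avoids errors from text columns that cannot be averaged.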
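The Delhi and Mumbai example can be sketched with a few invented rows. The result is a Series indexed by city.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi', 'Neha'],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Score': [80, 90, 70, 100],
})

# Split by City, pick the Score column, then average within each group
city_mean = df.groupby('City')['Score'].mean()

print(city_mean['Delhi'])    # 75.0
print(city_mean['Mumbai'])   # 95.0
```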

5B. Aggregate (agg)

agg() lets you apply multiple statistics at once on grouped data — saves you running many commands separately.

df.groupby('City')['Score'].agg(['mean','max','min','count'])
# Shows mean, max, min, count for each city — all at once
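agg() also accepts named aggregations, where each output column gets a name of your choice. The sketch below assumes the same kind of invented student table; the output names avg_score, top_score, and students are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi', 'Neha'],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Score': [80, 90, 70, 100],
})

# Each keyword is an output column: (source column, statistic)
summary = df.groupby('City').agg(
    avg_score=('Score', 'mean'),
    top_score=('Score', 'max'),
    students=('Name', 'count'),
)

print(summary)
```

Naming the outputs keeps the result readable when several columns are summarised in one call.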

5C. Apply and Lambda

apply() runs your own custom function on each value of a column (Series), or on each row or column of a DataFrame. Lambda is a one-line shortcut function, so there is no need to write a full def block.

df['Grade'] = df['Score'].apply(lambda x: 'Pass' if x>=50 else 'Fail')
# Adds a Grade column: 'Pass' if score ≥ 50, otherwise 'Fail'
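When the logic has more than two outcomes, a named function is often easier to read than a lambda. The sketch below uses a hypothetical grade() helper with made-up cut-offs, and shows apply() running row by row with axis=1.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi'],
    'Math': [45, 30, 20],
    'Science': [47, 40, 25],
})

def grade(total):
    """Hypothetical grading rule, for illustration only."""
    if total >= 90:
        return 'A'
    if total >= 50:
        return 'B'
    return 'Fail'

# axis=1 passes one row at a time to the function
df['Total'] = df.apply(lambda row: row['Math'] + row['Science'], axis=1)

# On a single column, apply() passes one value at a time
df['Grade'] = df['Total'].apply(grade)

print(df)
```

For simple arithmetic such as the Total column, df['Math'] + df['Science'] gives the same result and is usually faster; the row-wise apply() is shown only to illustrate axis=1.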

5D. ApplyMap

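applymap() applies a function to every single cell of a DataFrame, which is useful for formatting or transforming all values. From pandas 2.1 onward, applymap() is deprecated and the same operation is available as DataFrame.map(). The function must be valid for every cell, so select the numeric columns before rounding.

df.select_dtypes('number').applymap(lambda x: round(x, 2))  # Round every numeric cell to 2 decimals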
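A cell-by-cell transformation that runs on both older and newer pandas versions could look roughly like this. The Height and Weight values are invented, and the hasattr check is just one possible way to pick between DataFrame.map (pandas 2.1+) and applymap.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara'],
    'Height': [1.7234, 1.6589],   # hypothetical data
    'Weight': [65.456, 54.321],   # hypothetical data
})

# Rounding only makes sense for numbers, so leave the text column out
numeric = df.select_dtypes(include='number')

# Use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = numeric.map if hasattr(numeric, 'map') else numeric.applymap
rounded = elementwise(lambda x: round(x, 2))

# For plain rounding, the built-in DataFrame.round does the same job
rounded_builtin = numeric.round(2)

print(rounded)
```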

Function	Works On	Purpose
groupby()	DataFrame	Split data into groups for separate analysis
agg()	GroupBy result	Multiple statistics applied at once
apply()	Column or Row	Apply any custom function
lambda	Inline in apply()	One-line function without def
applymap()	Entire DataFrame	Function applied to every individual cell

🔷 6. DATA LOADING

6A. read_csv — Load a CSV File

Loads data from a CSV file (Comma Separated Values — a plain text file where data is separated by commas, like when you export from Excel).

df = pd.read_csv('students.csv')
print(df.head())    # Preview first 5 rows
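To try read_csv() without a file on disk, the CSV text can be wrapped in io.StringIO. The sketch below uses invented semicolon-separated data to show the sep, nrows, and dtype parameters.

```python
import io
import pandas as pd

# Hypothetical CSV content that uses ';' instead of ',' as the separator
csv_text = "Name;Score;City\nAman;85;Delhi\nSara;92;Mumbai\nRavi;78;Delhi\n"

df = pd.read_csv(
    io.StringIO(csv_text),      # behaves like an open file
    sep=';',                    # column separator used in the text
    nrows=2,                    # read only the first 2 data rows
    dtype={'Score': 'float64'}, # force Score to be read as a float
)

print(df)
```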

6B. read_excel — Load an Excel File

Loads data from an Excel file (.xlsx or .xls). You can also choose which specific sheet to load. Reading .xlsx files needs the openpyxl package installed (pip install openpyxl).

df = pd.read_excel('marks.xlsx', sheet_name='Sheet1')

6C. index_col — Set a Column as Row Index

Tells Pandas which column should become the row label (index) instead of the default 0, 1, 2, 3...

df = pd.read_csv('data.csv', index_col='RollNo')
# Now RollNo column becomes the row index label

📍 Simple Example: If your CSV has a "StudentID" column, using index_col='StudentID' makes each student's ID the row label, so you can look up a student directly with df.loc['S001'].
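The StudentID lookup can be tried end to end with in-memory CSV text. The IDs and marks below are made up for the example.

```python
import io
import pandas as pd

# Hypothetical CSV content with a StudentID column
csv_text = "StudentID,Name,Score\nS001,Aman,85\nS002,Sara,92\n"

df = pd.read_csv(io.StringIO(csv_text), index_col='StudentID')

print(df.loc['S001'])            # the whole row for student S001
print(df.loc['S001', 'Name'])    # a single cell
print(list(df.columns))          # StudentID is now the index, not a column

# reset_index() turns the index back into an ordinary column
restored = df.reset_index()
```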

Function	File Type	Key Parameters	Quick Example
read_csv()	CSV (.csv)	sep, header, dtype, nrows	pd.read_csv('file.csv')
read_excel()	Excel (.xlsx/.xls)	sheet_name, usecols, skiprows	pd.read_excel('file.xlsx')
index_col	Both (CSV & Excel)	Column name or column number	index_col='RollNo'

📊 PANDAS LEARNING FLOWCHART

Step-by-step learning path for Pandas — from installation to professional data analysis.

🐍 STEP 1 — Install Python & Pandas
pip install pandas numpy

📦 STEP 2 — Import Pandas
import pandas as pd

📂 STEP 3 — Load Your Data
read_csv() / read_excel() / index_col

🗂️ STEP 4 — Understand Data Structures
Series (1D) & DataFrame (2D)

🔍 STEP 5 — Access & Filter Data
loc / iloc / at / iat / Boolean Indexing

📐 STEP 6 — Arithmetic Operations
Broadcasting / add / sub / mul / div

📊 STEP 7 — Statistical Analysis
mean / median / mode / describe / idxmax

🔧 STEP 8 — Data Manipulation
sort / astype / value_counts / unique / inplace

⚙️ STEP 9 — Advanced Functions
groupby / agg / apply / lambda / applymap

📈 STEP 10 — Visualise & Export
df.plot() / to_csv() / to_excel()

✅ DATA ANALYSIS COMPLETE!
Insights ready for decisions 🎯
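The export mentioned in Step 10 can be sketched as a round trip: write a DataFrame with to_csv() and read it back. An in-memory buffer stands in for a real file here; passing a filename such as 'results.csv' (a hypothetical name) works the same way.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Aman', 'Sara'], 'Score': [85, 92]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the 0, 1, 2 row labels
csv_text = buffer.getvalue()

buffer.seek(0)                   # rewind before reading the text back
restored = pd.read_csv(buffer)

print(csv_text)
print(restored)
```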

🧠 PANDAS MIND MAP

[Image: mind map giving a complete visual overview of all Pandas topics.]

🗺️ PANDAS LEARNING ROADMAP 2025

Structured 6-Phase Roadmap — follow week by week to master Pandas from zero to professional level.

🔵 PHASE 1 — Foundation (Week 1)

Install Python 3.x and Pandas using pip
Understand what Pandas is and why data analysts use it
Learn NumPy basics — the foundation beneath Pandas
Create your first Series and DataFrame from lists and dictionaries
Understand index, columns, values, shape, and dtypes

🟢 PHASE 2 — Data Loading (Week 2)

Load CSV files using read_csv() with all key parameters
Load Excel files using read_excel() — choose sheets and columns
Use index_col, header, usecols, dtype, nrows parameters
Explore loaded data: head(), tail(), info(), shape, describe()
Handle missing values: isnull(), dropna(), fillna()
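The missing-value methods listed in Phase 2 can be tried on a tiny invented table. This is only a sketch; whether to drop or fill a missing value depends on the dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi'],
    'Score': [85.0, np.nan, 78.0],   # Sara's score is missing
})

missing_mask = df['Score'].isnull()                 # True where the value is missing
dropped = df.dropna(subset=['Score'])               # remove rows with a missing Score
filled = df.fillna({'Score': df['Score'].mean()})   # replace missing Score with the column mean

print(missing_mask.tolist())     # [False, True, False]
print(len(dropped))              # 2
print(filled.loc[1, 'Score'])    # 81.5
```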

🟠 PHASE 3 — Data Access & Indexing (Week 3)

Master loc (label-based) and iloc (integer-based) selection
Use at and iat for fast single-cell access
Filter rows using Boolean Indexing with conditions
Select specific column types with select_dtypes()
Understand head(), tail(), and slicing techniques

🟣 PHASE 4 — Operations & Statistics (Week 4)

Arithmetic: Broadcasting and inter-column math operations
Use add(), sub(), mul(), div() named methods safely
Calculate mean, median, mode, sum, min, max statistics
Use idxmin(), idxmax(), describe() for quick summaries
Sort with sort_values() and sort_index()
Convert data types with astype() properly
Analyse with value_counts() and unique()

🔴 PHASE 5 — Advanced Functions (Week 5)

Master groupby() for grouped data analysis (like Pivot Table)
Use agg() for multiple statistics on groups at once
Write custom logic with apply() and lambda functions
Apply cell-level transformations with applymap()
Merge, join, and concatenate multiple DataFrames
Create Pivot Tables and cross-tabulations
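As a preview of the merging and pivot table topics in Phase 5, here is a minimal sketch with two invented tables joined on a shared RollNo column.

```python
import pandas as pd

# Hypothetical tables: one row per student, and one row per student per subject
students = pd.DataFrame({'RollNo': [1, 2, 3], 'Name': ['Aman', 'Sara', 'Ravi']})
marks = pd.DataFrame({
    'RollNo': [1, 1, 2, 2],
    'Subject': ['Math', 'Science', 'Math', 'Science'],
    'Score': [80, 90, 70, 60],
})

# Inner merge keeps only roll numbers present in both tables (Ravi has no marks)
merged = students.merge(marks, on='RollNo', how='inner')

# Pivot: one row per student, one column per subject
pivot = merged.pivot_table(index='Name', columns='Subject', values='Score', aggfunc='mean')

print(merged)
print(pivot)
```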

🔷 PHASE 6 — Real Projects & Export (Week 6)

Work on real-world datasets — Sales, Marks, Finance, GST data
Visualise data with df.plot() and matplotlib integration
Export clean results to CSV with to_csv() and Excel with to_excel()
Build an end-to-end data analysis mini-project from scratch
Present and share analysis reports professionally

⚠️ Educational Disclaimer: This resource is for educational purposes only and does not constitute legal or professional advice. All code examples are for learning purposes only. Refer to official Pandas documentation at pandas.pydata.org for authoritative guidance.
© 2025 Digital E-Filing Coach – Amanuddin Education. All Rights Reserved.
