Pandas

Pandas Full Course 2025 | Digital E-Filing Coach
🔵 1. INTRODUCTION TO PANDAS

What is Pandas?

Pandas is like a super-powered Excel inside Python. It lets you load, clean, analyse, and transform data using simple Python commands, without complicated code.

Key Points

  • Open Source Python Library: Free to use. Install it with one command: pip install pandas.
  • Built on NumPy: Pandas is built on top of NumPy (another Python library), which makes numeric work on large datasets fast.
  • Panel Data Derivation: The name "Pandas" comes from "Panel Data", a term used in statistics for multi-dimensional structured data.
  • Data Analysis and Manipulation: You can filter rows, add columns, calculate averages, sort data, and much more, often in just 1–2 lines of code.

Simple Example

import pandas as pd  # Step 1: Import Pandas

data = {'Name': ['Aman', 'Sara', 'Ravi'], 'Score': [85, 92, 78]}
df = pd.DataFrame(data)  # Create a table
print(df)

Output:
   Name  Score
0  Aman     85
1  Sara     92
2  Ravi     78

| Feature | What it Means | Simple Analogy |
|---|---|---|
| Open Source | Free, public code | Like a free public library |
| Built on NumPy | Uses NumPy arrays inside | Building on a strong foundation |
| Panel Data | Multi-dimensional structured data | Big Excel sheet with rows & columns |
| Data Manipulation | Filter, sort, merge, clean data | Editing a spreadsheet with superpowers |

🟢 2. DATA STRUCTURES

2A. Pandas Series: 1D Labeled Array

A Series is like a single column of an Excel sheet: one list of values, each with a label (index).

  • 1D Labeled Array: One-dimensional, like one column of a table.
  • Holds Any Data Type: Numbers, text, dates, or booleans can all go in a Series.
  • Labeled Indices: Each value has a label, either the default 0, 1, 2 or custom labels like 'a', 'b', 'c'.
  • Creation via Lists or Dictionaries: A Series can be created from a Python list or a dictionary.

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)  # a→10, b→20, c→30
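
The two creation routes listed above can be sketched as follows. This is a minimal illustration with made-up values; the variable names from_list and from_dict are chosen only for this example.

```python
import pandas as pd

# From a list: pandas assigns the default labels 0, 1, 2
from_list = pd.Series([10, 20, 30])

# From a dictionary: the keys become the index labels
from_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

print(from_list.index.tolist())   # [0, 1, 2]
print(from_dict.index.tolist())   # ['a', 'b', 'c']
print(from_dict['b'])             # 20
```

A value is looked up by its label, so from_dict['b'] works the same way a dictionary lookup does.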

2B. Pandas DataFrame: 2D Labeled Table

A DataFrame is like a full Excel table, with many rows and columns together. Each column is itself a Series.

  • 2D Labeled Table: Has rows and columns, just like a spreadsheet.
  • Collection of Series: Every column in a DataFrame is a Series.
  • Rows (Observations): Each row is one data record (e.g., one student, one transaction).
  • Columns (Features): Each column is one attribute (e.g., Name, Age, Score).

df = pd.DataFrame({
    'Name': ['Aman', 'Sara'],
    'Age': [25, 22],
    'Score': [88, 95],
})

| Structure | Dimensions | Analogy | Creation Method |
|---|---|---|---|
| Series | 1D (one column) | Single Excel column | pd.Series([1,2,3]) |
| DataFrame | 2D (rows + columns) | Full Excel sheet | pd.DataFrame({...}) |

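
To see that a DataFrame really is a collection of Series, a small sketch using the same sample table can pull out one column and inspect the table's shape. The variable name ages is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara'],
    'Age': [25, 22],
    'Score': [88, 95],
})

# Selecting a single column returns a Series
ages = df['Age']
print(type(ages).__name__)    # Series

# shape is (rows, columns); columns holds the column labels
print(df.shape)               # (2, 3)
print(df.columns.tolist())    # ['Name', 'Age', 'Score']
```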
🟠 3. DATA ACCESS AND INDEXING

3A. Indexers: Selecting Data

Indexers are used to pick specific rows and columns, like clicking specific cells in Excel.

▸ loc: Label-based Selection

Use loc when you know the row/column name (label).

df.loc[0, 'Name']                # Row 0, column 'Name' → 'Aman'
df.loc[1:2, ['Name', 'Score']]   # Rows labeled 1 through 2 (both included), two columns

▸ iloc: Integer Position Selection

Use iloc when you know the position number (0, 1, 2...).

df.iloc[0, 1]       # Row 0, Column 1 (integer) → 25
df.iloc[0:2, 0:2]   # First 2 rows, first 2 columns
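
The difference between loc and iloc is easiest to see when the row labels are not 0, 1, 2. The sketch below assumes hypothetical roll numbers 101 to 103 as row labels; they are not part of the course data.

```python
import pandas as pd

# Hypothetical roll numbers used as row labels
df = pd.DataFrame(
    {'Name': ['Aman', 'Sara', 'Ravi'], 'Score': [85, 92, 78]},
    index=[101, 102, 103],
)

# loc works with labels, and a label slice includes both endpoints
by_label = df.loc[101:102, 'Name']

# iloc works with positions, and a position slice excludes the end
by_position = df.iloc[0:2, 0]

print(by_label.tolist())        # ['Aman', 'Sara']
print(by_position.tolist())     # ['Aman', 'Sara']
print(df.loc[103, 'Score'])     # 78
print(df.iloc[2, 1])            # 78
```

Here df.loc[0] would raise a KeyError because no row carries the label 0, while df.iloc[0] still returns the first row.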

▸ at / iat: Single Cell Access (Fastest)

df.at[0, 'Name']   # By label → 'Aman'
df.iat[0, 0]       # By position → 'Aman'

3B. Methods

▸ Head and Tail

df.head(3)   # First 3 rows
df.tail(3)   # Last 3 rows

▸ Boolean Indexing: Filter by Condition

df[df['Score'] > 80] # Only rows where Score is greater than 80
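
As a minimal sketch of what happens behind a filter, the comparison first builds a True/False Series, and that mask then selects the rows. The Age values and the second condition are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi'],
    'Age': [25, 22, 24],
    'Score': [85, 92, 78],
})

# The comparison yields one True/False value per row
mask = df['Score'] > 80
print(mask.tolist())                      # [True, True, False]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
high_and_young = df[(df['Score'] > 80) & (df['Age'] < 25)]
print(high_and_young['Name'].tolist())    # ['Sara']
```

Python's plain and / or keywords do not work on a Series, which is why the element-wise operators & and | are used.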

▸ Select Dtypes

df.select_dtypes(include=['int64']) # Only integer columns
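
When the goal is "all numeric columns" rather than one exact dtype, select_dtypes also accepts the generic 'number' selector. The sketch below uses a made-up Height column to show both include and exclude.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara'],
    'Age': [25, 22],
    'Height': [1.72, 1.65],
})

# 'number' matches every numeric dtype, integer and float alike
numeric = df.select_dtypes(include='number')
print(numeric.columns.tolist())       # ['Age', 'Height']

# exclude works the other way round
non_numeric = df.select_dtypes(exclude='number')
print(non_numeric.columns.tolist())   # ['Name']
```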

| Method | Used For | Example |
|---|---|---|
| loc | Label-based row/col access | df.loc[0,'Name'] |
| iloc | Integer position access | df.iloc[0,1] |
| at | Single label-based cell | df.at[0,'Age'] |
| iat | Single integer-based cell | df.iat[0,1] |
| head(n) | First n rows preview | df.head(5) |
| tail(n) | Last n rows preview | df.tail(5) |
| Boolean indexing | Filter rows by condition | df[df['A']>10] |
| select_dtypes | Filter columns by data type | df.select_dtypes('float') |

🟣 4. KEY OPERATIONS

4A. Arithmetic Operations

  • Broadcasting: Apply one value to all rows automatically. Example: add 5 to every student's score.
  • Inter-column Operations: Do math between two columns. Example: Total = Math + Science.
  • add, sub, mul, div Methods: Named equivalents of + - * / that accept a fill_value argument, which stands in for missing values (NaN) before the calculation.

df['Score'] + 5                            # Broadcasting: adds 5 to every row
df['Total'] = df['Math'] + df['Science']   # Inter-column
df['Score'].add(10, fill_value=0)          # Named method: a missing score counts as 0
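
The difference between the + operator and the named add() method only shows up when data is missing. The following sketch uses made-up marks with two gaps to compare them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Math': [80.0, 70.0, np.nan],
    'Science': [90.0, np.nan, 60.0],
})

# Plain +: any row with a missing value gives NaN
plain = df['Math'] + df['Science']
print(plain.isna().tolist())   # [False, True, True]

# add() with fill_value=0: a missing value on one side counts as 0
filled = df['Math'].add(df['Science'], fill_value=0)
print(filled.tolist())         # [170.0, 70.0, 60.0]
```

If both values in a row were missing, add() would still return NaN for that row, since there is nothing to add.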

4B. Statistical Operations

  • mean, median, mode: Average, middle value, and most repeated value of a column.
  • sum, min, max: Total, lowest, and highest values in a column.
  • idxmin, idxmax: Return the row index (label) of the minimum or maximum value.
  • describe(): Gives a complete statistical summary in one shot.

df['Score'].mean()     # Average score
df['Score'].idxmax()   # Which row has the highest score?
df.describe()          # Full statistical summary
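
A small worked example makes the statistics above concrete. The four scores and the name 'Neha' are made up for illustration, and names are used as row labels so that idxmax() returns something readable.

```python
import pandas as pd

scores = pd.Series([85, 92, 78, 92], index=['Aman', 'Sara', 'Ravi', 'Neha'])

print(scores.mean())            # 86.75
print(scores.median())          # 88.5
print(scores.mode().tolist())   # [92]
print(scores.idxmax())          # Sara
print(scores.idxmin())          # Ravi
```

mode() returns a Series because several values can tie for most frequent, and idxmax() returns the first label when the maximum occurs more than once.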

4C. Data Manipulation

  • Sort Values and Index: Arrange rows by a column's value or by the row index.
  • astype Conversion: Convert a column's data type (e.g., text → number).
  • Value Counts: Count how many times each unique value appears in a column.
  • Unique Values: See only the distinct (non-repeating) values in a column.
  • Inplace Parameter: inplace=True applies the change directly to the DataFrame, so there is no need to reassign. The method then returns None.

df.sort_values('Score', ascending=False)   # Descending sort
df['Age'] = df['Age'].astype(int)          # Convert to integer
df['City'].value_counts()                  # Count each city
df['City'].unique()                        # Unique cities only
df.drop_duplicates(inplace=True)           # Remove duplicates directly
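
The operations above can be tried on a small made-up table. In this sketch the Age column starts out as text, and one row is a deliberate duplicate so that the effect of inplace is visible.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Delhi'],
    'Age': ['25', '22', '25', '30'],   # stored as text on purpose
})

df['Age'] = df['Age'].astype(int)             # text -> integer
print(df['City'].value_counts().to_dict())    # {'Delhi': 3, 'Mumbai': 1}
print(df['City'].unique().tolist())           # ['Delhi', 'Mumbai']

# Without inplace, the method returns a new DataFrame and df is unchanged
deduped = df.drop_duplicates()
print(len(df), len(deduped))                  # 4 3

# With inplace=True, df itself changes and the call returns None
result = df.drop_duplicates(inplace=True)
print(result, len(df))                        # None 3
```

Because an inplace call returns None, writing df = df.drop_duplicates(inplace=True) would replace the DataFrame with None, which is a common slip.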

| Operation | Method | Simple Meaning |
|---|---|---|
| Mean | .mean() | Average of all values |
| Median | .median() | Middle value when sorted |
| Mode | .mode() | Most frequently occurring value |
| Sum / Min / Max | .sum() .min() .max() | Total / Smallest / Largest |
| idxmin / idxmax | .idxmin() .idxmax() | Row index of min/max value |
| Describe | .describe() | Full statistics summary report |
| Sort Values | .sort_values() | Arrange rows by column value |
| AsType | .astype() | Change column data type |
| Value Counts | .value_counts() | Count each unique value |
| Unique | .unique() | List of all distinct values |
| Inplace | inplace=True | Save change to same object |

🟡 5. ADVANCED FUNCTIONS

5A. Group By

groupby() splits your data into groups and lets you analyse each group separately, much like a Pivot Table in Excel.

df.groupby('City')['Score'].mean()   # Average score for each city separately

Real Example: If the data has students from Delhi and Mumbai, groupby('City')['Score'].mean() gives the average score of the Delhi students and of the Mumbai students separately, in one line.
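
The Delhi and Mumbai case can be worked through on four made-up students. The names and scores are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi', 'Neha'],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Score': [80, 90, 70, 100],
})

# One average per city; the city names become the index of the result
city_mean = df.groupby('City')['Score'].mean()

print(city_mean['Delhi'])    # 75.0
print(city_mean['Mumbai'])   # 95.0
```

Selecting ['Score'] before calling mean() keeps text columns such as Name out of the calculation.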

5B. Aggregate (agg)

agg() lets you apply multiple statistics at once on grouped data, which saves you running many commands separately.

df.groupby('City')['Score'].agg(['mean', 'max', 'min', 'count'])
# Shows mean, max, min, count for each city, all at once
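
agg() also supports a keyword form, often called named aggregation, in which each output column gets its own name. The sketch below assumes the same made-up student table; the output names avg_score, top_score, and students are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi', 'Neha'],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Score': [80, 90, 70, 100],
})

# Each keyword is an output column: (source column, statistic)
summary = df.groupby('City').agg(
    avg_score=('Score', 'mean'),
    top_score=('Score', 'max'),
    students=('Name', 'count'),
)

print(summary.loc['Delhi', 'avg_score'])    # 75.0
print(summary.loc['Mumbai', 'top_score'])   # 100
print(summary.loc['Delhi', 'students'])     # 2
```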

5C. Apply and Lambda

apply() runs your own custom function on every value of a column, or on every row or column of a DataFrame. Lambda is a one-line shortcut function, so there is no need to write a full def block.

df['Grade'] = df['Score'].apply(lambda x: 'Pass' if x >= 50 else 'Fail')
# Adds a Grade column: 'Pass' if score ≥ 50, otherwise 'Fail'
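
When the logic needs more than one column, apply() can be run row by row with axis=1. This sketch uses a hypothetical pass rule (a combined total of at least 120) and a helper named result, both invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara'],
    'Math': [40, 90],
    'Science': [70, 80],
})

def result(row):
    # row is a Series holding one student's values
    total = row['Math'] + row['Science']
    return 'Pass' if total >= 120 else 'Fail'   # hypothetical pass mark

# axis=1 passes each row to the function; the default axis=0 passes each column
df['Result'] = df.apply(result, axis=1)
print(df['Result'].tolist())   # ['Fail', 'Pass']
```

A full def block is used here instead of a lambda because the rule takes two steps; for a simple total, df['Math'] + df['Science'] would be shorter and faster than apply().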

5D. ApplyMap

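applymap() applies a function to every single cell in the entire DataFrame, which is useful for formatting or transforming all values. From pandas 2.1 onward, applymap() is deprecated in favour of the equivalent DataFrame.map().

df.applymap(lambda x: round(x, 2))   # Round every cell to 2 decimals (all columns must be numeric)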
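
Since the cell-wise method is named applymap() in older pandas and map() in pandas 2.1 and later, a sketch that runs on either can pick whichever one exists. The helper name cellwise and the sample prices are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Price': [10.456, 20.123], 'Tax': [1.2345, 2.5]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap
cellwise = df.map if hasattr(df, 'map') else df.applymap
rounded = cellwise(lambda x: round(x, 2))

print(rounded['Price'].tolist())   # [10.46, 20.12]
print(rounded['Tax'].tolist())     # [1.23, 2.5]
```

For plain rounding, the built-in df.round(2) gives the same result without a custom function; the cell-wise call is mainly useful for logic that has no built-in equivalent.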

| Function | Works On | Purpose |
|---|---|---|
| groupby() | DataFrame | Split data into groups for separate analysis |
| agg() | GroupBy result | Multiple statistics applied at once |
| apply() | Column or Row | Apply any custom function |
| lambda | Inline in apply() | One-line function without def |
| applymap() | Entire DataFrame | Function applied to every individual cell |

🔷 6. DATA LOADING

6A. read_csv: Load a CSV File

Loads data from a CSV file (Comma Separated Values: a plain text file in which values are separated by commas, such as an export from Excel).

df = pd.read_csv('students.csv')
print(df.head())   # Preview first 5 rows
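
To try read_csv() without a file on disk, the CSV text can be wrapped in io.StringIO, which read_csv() accepts in place of a path. The sample rows below are made up, and the nrows and dtype arguments are shown only as examples of commonly used parameters.

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file
csv_text = """Name,Age,Score
Aman,25,85
Sara,22,92
Ravi,24,78
"""

# nrows limits how many data rows are read; dtype forces a column's type
df = pd.read_csv(io.StringIO(csv_text), nrows=2, dtype={'Score': 'float64'})

print(df.shape)                # (2, 3)
print(df['Score'].tolist())    # [85.0, 92.0]
```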

6B. read_excel: Load an Excel File

Loads data from an Excel file (.xlsx or .xls). You can also choose which specific sheet to load.

df = pd.read_excel('marks.xlsx', sheet_name='Sheet1')

6C. index_col: Set a Column as Row Index

Tells Pandas which column should become the row label (index) instead of the default 0, 1, 2, 3...

df = pd.read_csv('data.csv', index_col='RollNo')
# Now the RollNo column becomes the row index label

Simple Example: If your CSV has a "StudentID" column, using index_col='StudentID' makes each student's ID the row label, so you can look up a student with df.loc['S001'].
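
The StudentID case can be sketched with in-memory CSV text, so no file is needed. The IDs and rows are made up for illustration.

```python
import io

import pandas as pd

csv_text = """StudentID,Name,Score
S001,Aman,85
S002,Sara,92
"""

df = pd.read_csv(io.StringIO(csv_text), index_col='StudentID')

print(df.index.tolist())         # ['S001', 'S002']
print(df.columns.tolist())       # ['Name', 'Score']
print(df.loc['S001', 'Name'])    # Aman
```

StudentID no longer appears among the columns, because it has moved into the index.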

| Function | File Type | Key Parameters | Quick Example |
|---|---|---|---|
| read_csv() | CSV (.csv) | sep, header, dtype, nrows | pd.read_csv('file.csv') |
| read_excel() | Excel (.xlsx/.xls) | sheet_name, usecols, skiprows | pd.read_excel('file.xlsx') |
| index_col | Both (CSV & Excel) | Column name or column number | index_col='RollNo' |

📊 PANDAS LEARNING FLOWCHART

Step-by-step learning path for Pandas, from installation to professional data analysis.

🐍 STEP 1: Install Python & Pandas
pip install pandas numpy

📦 STEP 2: Import Pandas
import pandas as pd

📂 STEP 3: Load Your Data
read_csv() / read_excel() / index_col

🗂️ STEP 4: Understand Data Structures
Series (1D) & DataFrame (2D)

STEP 5: Access & Filter Data
loc / iloc / at / iat / Boolean Indexing

STEP 6: Arithmetic Operations
Broadcasting / add / sub / mul / div

📊 STEP 7: Statistical Analysis
mean / median / mode / describe / idxmax

🔧 STEP 8: Data Manipulation
sort / astype / value_counts / unique / inplace

⚙️ STEP 9: Advanced Functions
groupby / agg / apply / lambda / applymap

📈 STEP 10: Visualise & Export
df.plot() / to_csv() / to_excel()

✅ DATA ANALYSIS COMPLETE!
Insights ready for decisions 🎯

🧠 PANDAS MIND MAP

Complete visual overview of all Pandas topics from the mind map.

🗺️ PANDAS LEARNING ROADMAP 2025

Structured 6-Phase Roadmap: follow it week by week to master Pandas from zero to professional level.

🔵 PHASE 1: Foundation (Week 1)

  • Install Python 3.x and Pandas using pip
  • Understand what Pandas is and why data analysts use it
  • Learn NumPy basics, the foundation beneath Pandas
  • Create your first Series and DataFrame from lists and dictionaries
  • Understand index, columns, values, shape, and dtypes

🟢 PHASE 2: Data Loading (Week 2)

  • Load CSV files using read_csv() with all key parameters
  • Load Excel files using read_excel(), choosing sheets and columns
  • Use index_col, header, usecols, dtype, nrows parameters
  • Explore loaded data: head(), tail(), info(), shape, describe()
  • Handle missing values: isnull(), dropna(), fillna()
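
As a minimal sketch of the missing-value methods named in this phase, the example below uses a made-up table with one missing score and shows counting, dropping, and filling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Sara', 'Ravi'],
    'Score': [85.0, np.nan, 78.0],
})

# isnull() marks missing cells; sum() counts them
print(df['Score'].isnull().sum())     # 1

# dropna() removes rows that contain a missing value
dropped = df.dropna()
print(len(dropped))                   # 2

# fillna() replaces missing values, here with the column average
filled = df.fillna({'Score': df['Score'].mean()})
print(filled['Score'].tolist())       # [85.0, 81.5, 78.0]
```

Whether to drop or fill depends on the data; filling with the average is only one possible choice.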

🟠 PHASE 3: Data Access & Indexing (Week 3)

  • Master loc (label-based) and iloc (integer-based) selection
  • Use at and iat for fast single-cell access
  • Filter rows using Boolean Indexing with conditions
  • Select specific column types with select_dtypes()
  • Understand head(), tail(), and slicing techniques

🟣 PHASE 4: Operations & Statistics (Week 4)

  • Arithmetic: Broadcasting and inter-column math operations
  • Use add(), sub(), mul(), div() named methods safely
  • Calculate mean, median, mode, sum, min, max statistics
  • Use idxmin(), idxmax(), describe() for quick summaries
  • Sort with sort_values() and sort_index()
  • Convert data types with astype() properly
  • Analyse with value_counts() and unique()

🔴 PHASE 5: Advanced Functions (Week 5)

  • Master groupby() for grouped data analysis (like a Pivot Table)
  • Use agg() for multiple statistics on groups at once
  • Write custom logic with apply() and lambda functions
  • Apply cell-level transformations with applymap()
  • Merge, join, and concatenate multiple DataFrames
  • Create Pivot Tables and cross-tabulations
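
The last two items of this phase can be previewed with a short sketch. The tables students, marks, and more are made up for illustration, and the sketch uses merge(), pd.concat(), and pivot_table() in their simplest forms.

```python
import pandas as pd

students = pd.DataFrame({'RollNo': [1, 2, 3], 'Name': ['Aman', 'Sara', 'Ravi']})
marks = pd.DataFrame({
    'RollNo': [1, 2, 2],
    'Subject': ['Math', 'Math', 'Science'],
    'Score': [85, 92, 88],
})

# merge: join two tables on a shared column (inner join by default)
merged = students.merge(marks, on='RollNo')
print(len(merged))      # 3, because Ravi has no marks and is left out

# concat: stack tables with the same columns on top of each other
more = pd.DataFrame({'RollNo': [4], 'Name': ['Neha']})
stacked = pd.concat([students, more], ignore_index=True)
print(len(stacked))     # 4

# pivot_table: one row per student, one column per subject
pivot = merged.pivot_table(index='Name', columns='Subject', values='Score', aggfunc='mean')
print(pivot.loc['Sara', 'Science'])   # 88.0
```

Passing how='left' to merge() would keep Ravi in the result, with missing values in the Subject and Score columns.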

🔷 PHASE 6: Real Projects & Export (Week 6)

  • Work on real-world datasets such as Sales, Marks, Finance, and GST data
  • Visualise data with df.plot() and matplotlib integration
  • Export clean results to CSV with to_csv() and Excel with to_excel()
  • Build an end-to-end data analysis mini-project from scratch
  • Present and share analysis reports professionally
⚠️ Educational Disclaimer: This resource is for educational purposes only and does not constitute legal or professional advice. All code examples are for learning purposes only. Refer to official Pandas documentation at pandas.pydata.org for authoritative guidance.
© 2025 Digital E-Filing Coach – Amanuddin Education. All Rights Reserved.