🐼 PANDAS FULL COURSE 2025
Complete Educational Guide · Data Analysis with Python · Digital E-Filing Coach - Amanuddin Education
What is Pandas?
Pandas is like a super-powered Excel inside Python. It lets you load, clean, analyse, and transform data using simple Python commands; no complicated coding is needed.
Key Points
- Open Source Python Library: Free to use, with no payment needed. Install it with one command: pip install pandas.
- Built on NumPy: Pandas is built on top of NumPy (another Python library), which makes numeric operations on large datasets fast.
- Panel Data Derivation: The name "Pandas" comes from "Panel Data", a term used in statistics for multi-dimensional structured data.
- Data Analysis and Manipulation: You can filter rows, add columns, calculate averages, sort data, and much more, often with just 1-2 lines of code.
Simple Example
Aman → 85 | Sara → 92 | Ravi → 78
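As a rough sketch of how the three students above could look in Pandas, the snippet below puts them in a small table and asks two simple questions. The column names Name and Score are assumptions chosen for illustration.

```python
import pandas as pd

# The three students from the example, as a small table.
# Column names "Name" and "Score" are illustrative assumptions.
scores = pd.DataFrame({"Name": ["Aman", "Sara", "Ravi"], "Score": [85, 92, 78]})

# Average of the Score column: (85 + 92 + 78) / 3
average = scores["Score"].mean()

# Name found on the row that holds the highest score
top_name = scores.loc[scores["Score"].idxmax(), "Name"]

print(scores)
print("Average:", average)
print("Top scorer:", top_name)
```

Both answers take one line each, which is the main appeal compared with writing loops by hand.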
| Feature | What it Means | Simple Analogy |
|---|---|---|
| Open Source | Free, public code | Like a free public library |
| Built on NumPy | Uses NumPy arrays inside | Building on a strong foundation |
| Panel Data | Multi-dimensional structured data | Big Excel sheet with rows & columns |
| Data Manipulation | Filter, sort, merge, clean data | Editing a spreadsheet with superpowers |
2A. Pandas Series: 1D Labeled Array
A Series is like a single column of an Excel sheet: one list of values, each with a label (index).
- 1D Labeled Array: One-dimensional, like one column of a table.
- Holds Any Data Type: Numbers, text, dates, booleans; anything goes in a Series.
- Labeled Indices: Each value has a label, either the default 0, 1, 2 or custom labels like 'a', 'b', 'c'.
- Creation via Lists or Dictionaries: Create a Series from a Python list or dictionary.
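The points above can be tried out with a short sketch. It creates a Series three ways: from a list, from a list with custom labels, and from a dictionary. The values and names are only sample data.

```python
import pandas as pd

# From a list: pandas assigns the default labels 0, 1, 2.
from_list = pd.Series([85, 92, 78])

# From a list with custom labels.
labelled = pd.Series([85, 92, 78], index=["a", "b", "c"])

# From a dictionary: the keys become the labels.
from_dict = pd.Series({"Aman": 85, "Sara": 92, "Ravi": 78})

print(from_list.index.tolist())  # default integer labels
print(labelled["b"])             # look up a value by its custom label
print(from_dict["Ravi"])         # look up a value by its dictionary key
```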
2B. Pandas DataFrame: 2D Labeled Table
A DataFrame is like a full Excel table, with many rows and columns together. Each column is actually a Series.
- 2D Labeled Table: Has rows and columns, just like a spreadsheet.
- Collection of Series: Every column in a DataFrame is a Series.
- Rows (Observations): Each row = one data record (e.g., one student, one transaction).
- Columns (Features): Each column = one attribute (e.g., Name, Age, Score).
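A minimal sketch of these ideas, assuming a made-up table of three students (the ages are invented for illustration):

```python
import pandas as pd

# Build a DataFrame from a dictionary: each key becomes a column.
students = pd.DataFrame({
    "Name": ["Aman", "Sara", "Ravi"],
    "Age": [20, 21, 19],        # hypothetical ages
    "Score": [85, 92, 78],
})

print(students.shape)            # (rows, columns)
print(type(students["Score"]))   # a single column is a pandas Series
print(students.iloc[0])          # one row = one observation
```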
| Structure | Dimensions | Analogy | Creation Method |
|---|---|---|---|
| Series | 1D (one column) | Single Excel column | pd.Series([1,2,3]) |
| DataFrame | 2D (rows + columns) | Full Excel sheet | pd.DataFrame({...}) |
3A. Indexers: Selecting Data
Indexers are used to pick specific rows and columns, like clicking specific cells in Excel.
▸ loc: Label-based Selection
Use loc when you know the row/column name (label).
▸ iloc: Integer Position Selection
Use iloc when you know the position number (0, 1, 2...).
▸ at / iat: Single Cell Access (Fastest)
Use at (by label) or iat (by integer position) when you need exactly one cell.
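The four indexers can be compared side by side in a small sketch. The row labels r1, r2, r3 are assumptions added so that labels and positions are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Aman", "Sara", "Ravi"], "Age": [20, 21, 19]},
    index=["r1", "r2", "r3"],   # hypothetical row labels
)

by_label = df.loc["r2", "Name"]    # row label "r2", column label "Name"
by_position = df.iloc[1, 0]        # second row, first column (same cell)
one_cell = df.at["r3", "Age"]      # single cell by label
one_cell_pos = df.iat[2, 1]        # single cell by position (same cell)

# loc slices include the end label; iloc slices exclude the end position.
label_slice = df.loc["r1":"r2"]    # rows r1 and r2
position_slice = df.iloc[0:1]      # row at position 0 only
```

The slice behaviour is a common source of confusion: loc["r1":"r2"] returns two rows, while iloc[0:1] returns one.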
3B. Methods
▸ Head and Tail
▸ Boolean Indexing: Filter by Condition
▸ Select Dtypes
| Method | Used For | Example |
|---|---|---|
| loc | Label-based row/col access | df.loc[0,'Name'] |
| iloc | Integer position access | df.iloc[0,1] |
| at | Single label-based cell | df.at[0,'Age'] |
| iat | Single integer-based cell | df.iat[0,1] |
| head(n) | First n rows preview | df.head(5) |
| tail(n) | Last n rows preview | df.tail(5) |
| Boolean | Filter rows by condition | df[df['A']>10] |
| select_dtypes | Filter columns by data type | df.select_dtypes('float') |
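The methods in the table can be sketched on a small sample table. The fourth student and the Height column are invented so that there is a float column to select.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Sara", "Ravi", "Zoya"],   # "Zoya" is a made-up extra row
    "Score": [85, 92, 78, 64],
    "Height": [1.70, 1.62, 1.75, 1.68],          # hypothetical float column
})

first_two = df.head(2)            # first 2 rows
last_two = df.tail(2)             # last 2 rows

# Boolean indexing: keep only the rows where the condition is True.
high = df[df["Score"] > 80]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
middle = df[(df["Score"] > 70) & (df["Score"] < 90)]

floats = df.select_dtypes("float")     # float columns only
numeric = df.select_dtypes("number")   # all numeric columns
```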
4A. Arithmetic Operations
- Broadcasting: Apply one value to all rows automatically. Example: add 5 to every student's score.
- Inter-column Operations: Do math between two columns. Example: Total = Maths + Science.
- add, sub, mul, div Methods: Named equivalents of + - * /. Their fill_value parameter lets you substitute a value for missing data (NaN) instead of getting NaN in the result.
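A short sketch of all three points, assuming a marks table in which each subject has one missing value:

```python
import numpy as np
import pandas as pd

marks = pd.DataFrame({
    "Maths": [80.0, 90.0, np.nan],
    "Science": [70.0, np.nan, 60.0],
})

# Broadcasting: the single value 5 is added to every row.
bonus = marks["Maths"] + 5

# Inter-column operation with +: any NaN on either side gives NaN.
total_plain = marks["Maths"] + marks["Science"]

# Named method: fill_value=0 treats a missing mark as 0 before adding.
total_filled = marks["Maths"].add(marks["Science"], fill_value=0)

print(total_plain)
print(total_filled)
```

Whether a missing mark should count as 0 is a decision about your data, not something Pandas can decide for you.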
4B. Statistical Operations
- mean, median, mode: Average, middle value, and most repeated value of a column.
- sum, min, max: Total, lowest, and highest values in a column.
- idxmin, idxmax: Return the row index (label) of the minimum or maximum value.
- describe(): Gives a complete statistical summary in one shot.
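These statistics can be checked on a small Series. The sample below uses student names as labels, with one invented extra student so that a value repeats.

```python
import pandas as pd

scores = pd.Series([85, 92, 78, 92], index=["Aman", "Sara", "Ravi", "Zoya"])

average = scores.mean()          # 347 / 4
middle = scores.median()         # midpoint of the two middle values 85 and 92
most_common = scores.mode()      # returned as a Series, since ties are possible
total = scores.sum()
lowest, highest = scores.min(), scores.max()

lowest_label = scores.idxmin()   # label of the smallest value
highest_label = scores.idxmax()  # label of the first occurrence of the largest value

summary = scores.describe()      # count, mean, std, min, quartiles, max
print(summary)
```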
4C. Data Manipulation
- Sort Values and Index: Arrange rows by a column's value or by the row index.
- AsType Conversion: Convert a column's data type (e.g., text → number).
- Value Counts: Count how many times each unique value appears in a column.
- Unique Values: See only the distinct (non-repeating) values in a column.
- Inplace Parameter: inplace=True changes the DataFrame itself and returns None, so there is no need to reassign the result.
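The operations above can be sketched in a few lines. The City column and its values are assumptions for illustration, and Score starts as text so that the conversion has something to do.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ravi", "Aman", "Sara", "Zoya"],
    "City": ["Delhi", "Pune", "Delhi", "Pune"],   # hypothetical cities
    "Score": ["78", "85", "92", "85"],            # stored as text
})

df["Score"] = df["Score"].astype(int)                 # text -> integer

by_score = df.sort_values("Score", ascending=False)   # highest score first
by_index = by_score.sort_index()                      # back to index order 0..3

city_counts = df["City"].value_counts()               # rows per city
distinct_scores = df["Score"].unique()                # distinct values, in order of appearance

# inplace=True modifies df itself and returns None.
result = df.sort_values("Name", inplace=True)
```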
| Operation | Method | Simple Meaning |
|---|---|---|
| Mean | .mean() | Average of all values |
| Median | .median() | Middle value when sorted |
| Mode | .mode() | Most frequently occurring value |
| Sum / Min / Max | .sum() .min() .max() | Total / Smallest / Largest |
| idxmin / idxmax | .idxmin() .idxmax() | Row index of min/max value |
| Describe | .describe() | Full statistics summary report |
| Sort Values | .sort_values() | Arrange rows by column value |
| AsType | .astype() | Change column data type |
| Value Counts | .value_counts() | Count each unique value |
| Unique | .unique() | List of all distinct values |
| Inplace | inplace=True | Save change to same object |
5A. Group By
groupby() splits your data into groups and lets you analyse each group separately, much like a Pivot Table in Excel.
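A minimal sketch, assuming a made-up sales table with a Region and an Amount column:

```python
import pandas as pd

# Hypothetical sales data for illustration.
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Amount": [100, 200, 150, 50, 250],
})

# Split rows by Region, then total the Amount within each group.
total_by_region = sales.groupby("Region")["Amount"].sum()

# Same split, different statistic.
mean_by_region = sales.groupby("Region")["Amount"].mean()

print(total_by_region)
```

The result is a Series whose index holds the group names, so total_by_region["North"] returns that region's total.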
5B. Aggregate (agg)
agg() lets you apply multiple statistics at once on grouped data, so you do not have to run each one as a separate command.
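As a sketch, the same kind of made-up sales table can be summarised with several statistics in one call. The output column names total and largest are illustrative choices.

```python
import pandas as pd

# Hypothetical sales data for illustration.
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Amount": [100, 200, 150, 50, 250],
})

# A list of statistic names gives one output column per statistic.
summary = sales.groupby("Region")["Amount"].agg(["sum", "mean", "max"])

# Named aggregation: choose your own output column names.
named = sales.groupby("Region").agg(
    total=("Amount", "sum"),
    largest=("Amount", "max"),
)

print(summary)
print(named)
```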
5C. Apply and Lambda
apply() runs your own custom function on every row or column. A lambda is a one-line shortcut function, so there is no need to write a full def block.
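The sketch below shows apply() on rows, on a single column, and with a regular def function. The pass mark of 50 and the helper name spread are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"Maths": [80, 45, 90], "Science": [70, 30, 95]})

# Row-wise: axis=1 passes each row to the lambda.
df["Total"] = df.apply(lambda row: row["Maths"] + row["Science"], axis=1)

# On one column: the lambda receives each value in turn.
# The pass mark of 50 is an assumption for this example.
df["MathsGrade"] = df["Maths"].apply(lambda m: "Pass" if m >= 50 else "Fail")

# The same idea with a full def block (hypothetical helper).
def spread(column):
    """Difference between the largest and smallest value in a column."""
    return column.max() - column.min()

# Column-wise (the default): each column is passed to spread().
ranges = df[["Maths", "Science"]].apply(spread)

print(df)
print(ranges)
```

For simple arithmetic such as Maths + Science, plain column math is usually faster; apply() earns its place when the logic does not fit a built-in operation.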
5D. ApplyMap
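applymap() applies a function to every single cell in the entire DataFrame, which is useful for formatting or transforming all values. Since pandas 2.1 the same method is available as DataFrame.map(), and applymap() is deprecated.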
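A minimal sketch of cell-by-cell transformation follows. Because DataFrame.applymap() was renamed to DataFrame.map() in pandas 2.1, the snippet picks whichever one the installed version provides; the price values are invented.

```python
import pandas as pd

# Hypothetical prices for illustration.
prices = pd.DataFrame({"Jan": [10.456, 20.1], "Feb": [3.14159, 7.0]})

# Use DataFrame.map on pandas 2.1+, fall back to applymap on older versions.
cell_wise = prices.map if hasattr(prices, "map") else prices.applymap

# The function runs once per cell.
rounded = cell_wise(lambda x: round(x, 1))

# Formatting every cell as text with two decimals.
labels = cell_wise(lambda x: f"{x:.2f}")

print(rounded)
print(labels)
```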
| Function | Works On | Purpose |
|---|---|---|
| groupby() | DataFrame | Split data into groups for separate analysis |
| agg() | GroupBy result | Multiple statistics applied at once |
| apply() | Column or Row | Apply any custom function |
| lambda | Inline in apply() | One-line function without def |
| applymap() | Entire DataFrame | Function applied to every individual cell |
6A. read_csv: Load a CSV File
Loads data from a CSV file (Comma Separated Values: a plain text file where values are separated by commas, like an export from Excel).
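To keep the sketch self-contained, the CSV content below lives in a string and is read through io.StringIO; with a real file you would pass its path instead, as in pd.read_csv('file.csv'). The roll numbers are made up.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (hypothetical data).
csv_text = "RollNo,Name,Score\n101,Aman,85\n102,Sara,92\n103,Ravi,78\n"

# Basic load: the first line becomes the column names.
df = pd.read_csv(io.StringIO(csv_text))

# nrows limits how many data rows are read; dtype forces a column type.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2, dtype={"Score": "float64"})

# sep handles files that use another separator, such as a semicolon.
semicolon_text = csv_text.replace(",", ";")
semi = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(df)
```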
6B. read_excel: Load an Excel File
Loads data from an Excel file (.xlsx or .xls). You can also choose which specific sheet to load.
6C. index_col: Set a Column as Row Index
Tells Pandas which column should become the row label (index) instead of the default 0, 1, 2, 3...
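A small sketch of index_col, again reading from an in-memory string rather than a real file. The RollNo values are invented sample data.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (hypothetical data).
csv_text = "RollNo,Name,Score\n101,Aman,85\n102,Sara,92\n103,Ravi,78\n"

# By column name: RollNo becomes the row label instead of 0, 1, 2.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="RollNo")

# By column number: 0 means the first column in the file.
by_number = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Rows can now be looked up by roll number with loc.
name_of_102 = by_name.loc[102, "Name"]

print(by_name)
```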
| Function | File Type | Key Parameters | Quick Example |
|---|---|---|---|
| read_csv() | CSV (.csv) | sep, header, dtype, nrows | pd.read_csv('file.csv') |
| read_excel() | Excel (.xlsx/.xls) | sheet_name, usecols, skiprows | pd.read_excel('file.xlsx') |
| index_col | Both (CSV & Excel) | Column name or column number | index_col='RollNo' |
Step-by-step learning path for Pandas, from installation to professional data analysis.
- pip install pandas numpy
- import pandas as pd
- read_csv() / read_excel() / index_col
- Series (1D) & DataFrame (2D)
- loc / iloc / at / iat / Boolean Indexing
- Broadcasting / add / sub / mul / div
- mean / median / mode / describe / idxmax
- sort / astype / value_counts / unique / inplace
- groupby / agg / apply / lambda / applymap
- df.plot() / to_csv() / to_excel()
- Insights ready for decisions 🎯
Complete visual overview of all Pandas topics from the mind map.
Structured 6-Phase Roadmap: follow week by week to master Pandas from zero to professional level.
🔵 PHASE 1: Foundation (Week 1)
- Install Python 3.x and Pandas using pip
- Understand what Pandas is and why data analysts use it
- Learn NumPy basics, the foundation beneath Pandas
- Create your first Series and DataFrame from lists and dictionaries
- Understand index, columns, values, shape, and dtypes
🟢 PHASE 2: Data Loading (Week 2)
- Load CSV files using read_csv() with all key parameters
- Load Excel files using read_excel(), choosing sheets and columns
- Use index_col, header, usecols, dtype, nrows parameters
- Explore loaded data: head(), tail(), info(), shape, describe()
- Handle missing values: isnull(), dropna(), fillna()
🟠 PHASE 3: Data Access & Indexing (Week 3)
- Master loc (label-based) and iloc (integer-based) selection
- Use at and iat for fast single-cell access
- Filter rows using Boolean Indexing with conditions
- Select specific column types with select_dtypes()
- Understand head(), tail(), and slicing techniques
🟣 PHASE 4: Operations & Statistics (Week 4)
- Arithmetic: Broadcasting and inter-column math operations
- Use add(), sub(), mul(), div() named methods safely
- Calculate mean, median, mode, sum, min, max statistics
- Use idxmin(), idxmax(), describe() for quick summaries
- Sort with sort_values() and sort_index()
- Convert data types with astype() properly
- Analyse with value_counts() and unique()
🔴 PHASE 5: Advanced Functions (Week 5)
- Master groupby() for grouped data analysis (like a Pivot Table)
- Use agg() for multiple statistics on groups at once
- Write custom logic with apply() and lambda functions
- Apply cell-level transformations with applymap()
- Merge, join, and concatenate multiple DataFrames
- Create Pivot Tables and cross-tabulations
🔷 PHASE 6: Real Projects & Export (Week 6)
- Work on real-world datasets: Sales, Marks, Finance, GST data
- Visualise data with df.plot() and matplotlib integration
- Export clean results to CSV with to_csv() and Excel with to_excel()
- Build an end-to-end data analysis mini-project from scratch
- Present and share analysis reports professionally
© 2025 Digital E-Filing Coach - Amanuddin Education. All Rights Reserved.
