Module 2: Tools and Technologies for Data Analysis
Week 3: Excel for Data Analysis
Course Overview:
This week is designed to give you practical experience with Excel’s essential tools for data analysis. You will explore how to handle, analyze, and manipulate data using key Excel functions. The exercises will focus on common data analysis tasks, such as sorting, filtering, using pivot tables, performing lookups, and automating processes with macros.
Learning Objectives:
By the end of this week, you will be able to:
- Efficiently sort, filter, and summarize large datasets using pivot tables.
- Perform data lookups and use advanced functions like VLOOKUP, INDEX-MATCH, and conditional formatting to analyze data.
- Create and run simple macros to automate repetitive tasks in Excel.
Day 1: Data Handling in Excel
Topics:
- Sorting and Filtering: Learn how to organize and filter data to focus on key subsets.
- Pivot Tables: Explore how to summarize and analyze large datasets with pivot tables.
Activities:
- Sorting & Filtering:
- Load a provided dataset (e.g., sales data, product information, customer records).
- Sort the data by one or multiple columns (e.g., by date or product category).
- Use filters to display data from specific regions or product types.
- Pivot Tables:
- Create a pivot table to summarize sales by product category and region.
- Explore different ways to aggregate data (e.g., sum, average, count).
- Learn how to drill down into pivot table data for deeper insights.
Practice Exercise:
- Using a retail dataset, sort the data by sales amount, filter the data by year, and create a pivot table to summarize total sales by product category and customer region.
Tools Needed: Excel
Day 2: Basic Functions in Excel
Topics:
- VLOOKUP: Find specific data in large datasets across multiple sheets.
- INDEX-MATCH: Perform flexible lookups when VLOOKUP's limitations arise (for example, VLOOKUP can only search the leftmost column of a range and return values from columns to its right).
- Conditional Formatting: Automatically highlight important trends or outliers in your data.
Activities:
- VLOOKUP:
- Use VLOOKUP to find product prices from a lookup table based on product IDs.
- Practice using VLOOKUP across different sheets.
- INDEX-MATCH:
- Explore how to use INDEX-MATCH as a more versatile alternative to VLOOKUP.
- Perform a lookup for customer names based on transaction IDs, retrieving data from multiple columns.
- Conditional Formatting:
- Apply conditional formatting to highlight rows where sales exceed a certain threshold.
- Set up rules to color-code data based on conditions like “Top 10 sales” or “Sales below average.”
Practice Exercise:
- Use VLOOKUP to retrieve customer contact details from a lookup sheet. Then, use INDEX-MATCH to retrieve product prices based on product names, and apply conditional formatting to highlight sales greater than $1,000.
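The exercise above can be sketched with formulas like the following. The layout is an assumption for illustration: a lookup sheet named "Prices" with product IDs in column A and prices in column B, and sales amounts in column D of the active sheet.

```
VLOOKUP: price for the product ID in cell A2:

    =VLOOKUP(A2, Prices!A:B, 2, FALSE)

INDEX-MATCH equivalent, which also works when the price column sits to
the LEFT of the ID column (a case VLOOKUP cannot handle):

    =INDEX(Prices!B:B, MATCH(A2, Prices!A:A, 0))

Conditional-formatting "Use a formula" rule to highlight rows where the
sale in column D exceeds $1,000:

    =$D2>1000
```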
Tools Needed: Excel
Day 3: Introduction to Macros
Topics:
- Recording Macros: Learn how to record a macro that automates repetitive tasks.
- Running Macros: Understand how to run and modify macros to streamline data tasks.
Activities:
- Creating Simple Macros:
- Record a macro that automates formatting (e.g., changing column widths, bolding headers, applying conditional formatting).
- Assign the macro to a button for easy execution.
- Editing Macros:
- Learn how to view and edit the macro code (written in VBA) to customize your automation.
- Make small modifications to an existing macro, such as adding extra formatting steps.
Practice Exercise:
- Record a macro that automates the process of formatting a report. The macro should include applying bold to column headers, setting filters, and adding conditional formatting to highlight sales that meet a certain target.
Tools Needed: Excel
Day 4: Project-Based Practice
Objective:
Integrate everything you’ve learned this week into a final project that simulates a real-world data analysis scenario.
Project Description:
You are given a dataset of sales transactions from a retail company. Your task is to:
- Sort & Filter: Sort the data by region and filter it to focus only on transactions from the current year.
- Pivot Table Analysis: Create a pivot table summarizing total sales by product category and customer region.
- Data Lookup: Use VLOOKUP or INDEX-MATCH to retrieve the corresponding discount applied for each transaction from a lookup table.
- Conditional Formatting: Apply conditional formatting to highlight transactions where the discount exceeded 20%.
- Macro Automation: Record a macro that automates part of the report creation process, such as formatting the pivot table or applying conditional formatting.
Submission:
Submit your final Excel file, with the pivot table, lookup functions, and macro included, for review and feedback.
Tools Needed: Excel
Outcome by the End of Week 3:
By the end of this week, you will have a strong command of essential Excel tools for data handling, including the ability to summarize data using pivot tables, perform advanced lookups, and automate tasks using macros. These skills will be essential for more efficient data analysis and reporting in both academic and professional environments.
Module 2: Tools and Technologies for Data Analysis
Week 4: SQL for Data Analysts
Course Overview:
This week focuses on learning SQL, a crucial language for data analysis, especially when working with relational databases. By the end of this week, you will understand how to interact with databases, run basic and advanced SQL queries, and manipulate data for analysis. The course is designed for a single participant, allowing for personalized learning and hands-on practice with SQL.
Learning Objectives:
By the end of this week, you will be able to:
- Understand how relational databases work and how to interact with them using SQL.
- Write basic SQL queries to retrieve and manipulate data.
- Apply advanced SQL techniques like subqueries, common table expressions (CTEs), and window functions for complex data analysis tasks.
Day 1: Introduction to Databases
Topics:
- Understanding Relational Databases
- What are relational databases, and how do they store data in tables?
- Understanding key concepts like tables, rows, columns, primary keys, and foreign keys.
- How relationships between tables are established using keys.
Activities:
- Exploring a Database:
- You will be introduced to a sample database (e.g., a customer-order-product database for an e-commerce business).
- Explore the schema of the database to understand how the tables are related (e.g., customer details linked to orders via customer ID).
- Identifying Keys:
- Identify primary keys and foreign keys in the provided database schema and understand their importance in ensuring data integrity and establishing relationships between tables.
Practice Exercise:
- Review a database schema and identify which tables relate to customers, orders, and products. Understand how foreign keys are used to link tables (e.g., customer_id linking the “customers” and “orders” tables).
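To see primary and foreign keys in action before Day 2, the schema described above can be sketched with Python's built-in sqlite3 module. The table layout and names here are illustrative, not the actual course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# customers.customer_id is a primary key: it uniquely identifies each row.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")

# orders.customer_id is a foreign key referencing customers -- this is
# how the relationship between the two tables is established.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, '2024-01-15')")

# An order pointing at a nonexistent customer violates the foreign key,
# so the database rejects it -- the data-integrity guarantee in practice.
fk_rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (11, 999, '2024-01-16')")
except sqlite3.IntegrityError:
    fk_rejected = True
print("foreign key violation rejected:", fk_rejected)
```

Note that SQLite disables foreign-key enforcement by default, which is why the PRAGMA line matters.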
Tools Needed: SQL environment (e.g., MySQL Workbench, DB Fiddle, or any SQL database tool)
Day 2: Basic SQL Queries
Topics:
- SELECT Statement
- Learn how to retrieve specific columns from a table using the SELECT statement.
- JOINs
- Understand different types of joins (INNER JOIN, LEFT JOIN, RIGHT JOIN) and how they combine data from multiple tables.
- WHERE Clause
- Use the WHERE clause to filter data based on specific conditions.
- GROUP BY Clause
- Learn how to group data by specific columns to perform aggregate functions like SUM, COUNT, and AVG.
Activities:
- Writing Simple Queries:
- Write basic SQL queries to retrieve data from a single table (e.g., select customer names and their corresponding orders).
- JOIN Practice:
- Use INNER JOIN to combine data from “customers” and “orders” tables, retrieving details like customer names, order dates, and product names.
- Filtering Data:
- Apply WHERE clauses to filter orders by date range or customers from specific regions.
- Group and Aggregate:
- Use GROUP BY to summarize sales by product category or calculate the total number of orders per customer.
Practice Exercise:
- Write a query that retrieves customer names, their orders, and total order values using JOIN, WHERE, and GROUP BY clauses. Display the results in an aggregated format (e.g., total sales by customer).
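The JOIN + WHERE + GROUP BY pattern from this exercise can be run end to end with sqlite3 on a tiny made-up dataset (names and amounts are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER,
                         order_date TEXT,
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (10, 1, '2024-01-05', 120.0),
        (11, 1, '2024-02-10',  80.0),
        (12, 2, '2024-03-01', 250.0);
""")

# JOIN the two tables, filter with WHERE, aggregate with GROUP BY:
# total order value per customer for orders placed in 2024.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total_sales
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.order_date >= '2024-01-01'
    GROUP BY c.name
    ORDER BY total_sales DESC
""").fetchall()

for name, total in rows:
    print(name, total)
```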
Tools Needed: SQL environment
Day 3: Advanced SQL
Topics:
- Subqueries
- Learn how to nest queries within another query to perform more complex data retrieval (e.g., find customers who placed the largest orders).
- Common Table Expressions (CTEs)
- Understand how CTEs make SQL queries more readable and how they can be used to break down complex queries into manageable steps.
- Window Functions
- Explore window functions such as ROW_NUMBER() and RANK(), used with the OVER() clause, which let you perform calculations across specific “windows” of data without aggregating rows.
Activities:
- Subquery Practice:
- Write a query that uses a subquery to find all customers who placed orders above the average order value.
- CTE Practice:
- Use a CTE to break down a complex query. For example, first create a CTE that retrieves total sales per customer, then use that CTE to find the top 5 customers by sales.
- Window Function Practice:
- Write a query that ranks products by total sales using the RANK() window function. Also, use ROW_NUMBER() to assign row numbers to customers based on their order frequency.
Practice Exercise:
- Write a query that:
- Uses a CTE to find the top 5 selling products.
- Implements a window function (RANK()) to rank these products by total sales.
- Uses a subquery to compare the top 5 products’ sales against the overall average.
Tools Needed: SQL environment
Day 4: Project-Based Practice
Objective:
Combine everything you’ve learned about SQL this week into a final project that simulates a real-world data analysis scenario using SQL.
Project Description:
You are working as a data analyst for an e-commerce company. The company wants to understand customer purchase behavior and product performance. You are provided with three related tables: “customers”, “orders”, and “products”. Your tasks are to:
- Basic Queries:
- Write a SELECT query to retrieve customer names, order dates, and product names from the database.
- Joins & Aggregation:
- Perform an INNER JOIN between the “orders” and “products” tables to calculate the total revenue for each product category.
- Subqueries:
- Use a subquery to find customers who have placed orders exceeding the overall average order value.
- CTEs & Window Functions:
- Create a CTE to calculate total revenue for each customer, then use the RANK() window function to rank customers by their total spending.
Submission:
Submit your SQL queries along with the results for review and feedback.
Tools Needed: SQL environment
Outcome by the End of Week 4:
By the end of this week, you will be proficient in writing SQL queries, joining tables, filtering data, and performing advanced analysis using subqueries, CTEs, and window functions. You’ll have a solid understanding of relational databases and how to apply SQL in real-world data analysis scenarios.
Module 2: Tools and Technologies for Data Analysis
Week 5: Python for Data Analysis
Course Overview:
This week will focus on introducing Python programming and its core libraries for data analysis. Python has become a popular language in the data science community due to its ease of use and powerful libraries like Pandas, NumPy, and Matplotlib. By the end of this week, you will have hands-on experience using Python for data manipulation, numerical analysis, and data visualization.
Learning Objectives:
By the end of this week, you will be able to:
- Write basic Python code to manipulate and analyze data.
- Use libraries like Pandas for data handling, NumPy for numerical computations, and Matplotlib for visualizing data.
- Perform basic data cleaning and analysis using Python.
Day 1: Getting Started with Python
Topics:
- Introduction to Python Programming
- Learn about Python syntax, data types (integers, floats, strings, lists, dictionaries), and basic programming concepts like loops and conditionals.
- Set up a Python environment using Jupyter Notebook, an interactive tool that makes running Python code and analyzing data easier.
Activities:
- Setting up Jupyter Notebook:
- Install Anaconda (which includes Jupyter Notebook) or use Google Colab for online Python coding.
- Open your first notebook and understand how cells work in Jupyter for running code.
- Basic Python Coding:
- Write Python code to perform basic operations (addition, subtraction, multiplication, and division).
- Work with Python lists and dictionaries. Example:
- Create a list of customer names.
- Use a loop to print each name.
- Conditionals and Loops:
- Practice writing if-else statements and loops (for, while).
- Example: Write a loop that iterates over a list of numbers and prints “Even” for even numbers and “Odd” for odd numbers.
Practice Exercise:
- Write a Python script that:
- Takes a list of sales numbers.
- Uses a loop to print whether each sale is “above average” or “below average” based on the average sales value in the list.
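One possible solution sketch for this exercise; the sales numbers are invented sample data.

```python
sales = [200, 450, 300, 800, 150]
average = sum(sales) / len(sales)  # mean of the list

labels = []
for sale in sales:
    # Compare each sale to the overall average and label it.
    label = "above average" if sale > average else "below average"
    labels.append(label)
    print(sale, "is", label)
```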
Tools Needed: Jupyter Notebook (via Anaconda or Google Colab)
Day 2: Libraries for Data Analysis
Topics:
- Introduction to Pandas, NumPy, and Matplotlib
- Pandas: Learn how to use the Pandas library to load, manipulate, and explore datasets.
- NumPy: Explore NumPy for handling arrays and performing numerical calculations.
- Matplotlib: Create simple plots and charts for visualizing data.
Activities:
- Pandas: Working with DataFrames:
- Import a CSV file using Pandas.
- Explore DataFrames and practice basic operations like selecting columns, filtering rows, and calculating summary statistics (mean, sum, etc.).
- Example:
import pandas as pd
data = pd.read_csv('sales_data.csv')
print(data.head()) # Show the first 5 rows of the dataset
- NumPy: Numerical Operations:
- Use NumPy to create arrays and perform mathematical operations like calculating the mean, standard deviation, and matrix multiplication.
- Example:
import numpy as np
sales = np.array([200, 300, 400, 500])
print(sales.mean()) # Calculate the average sales
- Matplotlib: Basic Plotting:
- Create simple bar charts, line graphs, and histograms to visualize data using Matplotlib.
- Example:
import matplotlib.pyplot as plt
plt.bar(['Product A', 'Product B', 'Product C'], [300, 400, 500])
plt.title('Sales by Product')
plt.show()
Practice Exercise:
- Load a dataset of customer transactions using Pandas.
- Use NumPy to calculate the total and average sales.
- Visualize the sales by product category using Matplotlib (e.g., a bar chart showing total sales per product).
Tools Needed: Jupyter Notebook, Pandas, NumPy, Matplotlib
Day 3: Data Cleaning and Manipulation with Pandas
Topics:
- Data Cleaning and Preprocessing
- Handling missing values: Learn how to identify and fill missing data.
- Data type conversion: Understand how to convert data types (e.g., strings to integers, dates).
- Duplicates: Remove or handle duplicate data entries.
Activities:
- Handling Missing Data:
- Load a dataset with missing values. Use Pandas functions like .isnull() to find missing values and .fillna() to replace them with meaningful values (e.g., the column mean or median).
data['sales'] = data['sales'].fillna(data['sales'].mean())
- Data Type Conversion:
- Convert a column of date strings into a proper datetime format for easier analysis.
data['order_date'] = pd.to_datetime(data['order_date'])
- Removing Duplicates:
- Identify and drop duplicate records in the dataset.
data.drop_duplicates(inplace=True)
Practice Exercise:
- Load a dataset of customer orders.
- Identify and handle missing data, convert the “order_date” column to a datetime format, and remove any duplicate entries.
- Create a summary report that shows the cleaned dataset with no missing values or duplicates.
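The three cleaning steps above can be combined into one short sketch. To keep it self-contained, a tiny DataFrame is built in memory rather than loaded from a file; the data is invented for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "order_id":   [1, 2, 2, 3],
    "order_date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
    "sales":      [100.0, None, None, 300.0],
})

# 1. Fill missing sales with the column mean (one common strategy).
data["sales"] = data["sales"].fillna(data["sales"].mean())

# 2. Convert the date strings to a real datetime dtype.
data["order_date"] = pd.to_datetime(data["order_date"])

# 3. Drop exact duplicate rows (order 2 appears twice).
data = data.drop_duplicates()

print(data)
print("missing values:", data.isnull().sum().sum())
```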
Tools Needed: Jupyter Notebook, Pandas
Day 4: Data Analysis with Pandas and Visualization with Matplotlib
Topics:
- Analyzing Data with Pandas
- Grouping data and performing aggregations (sum, mean, etc.).
- Filtering data based on conditions.
- Data Visualization with Matplotlib
- Create multiple types of charts (e.g., bar charts, line graphs, scatter plots) to visualize trends and insights.
Activities:
- Grouping and Aggregating Data:
- Use Pandas to group data by a specific column (e.g., product category) and calculate the total sales for each group.
total_sales_by_category = data.groupby('category')['sales'].sum()
print(total_sales_by_category)
- Filtering Data:
- Filter the dataset to show only sales above a certain threshold.
high_sales = data[data['sales'] > 500]
print(high_sales)
- Creating Visualizations:
- Create a line chart showing sales trends over time, and a bar chart showing total sales by region.
plt.plot(data['order_date'], data['sales'])
plt.title('Sales Over Time')
plt.show()
Practice Exercise:
- Group the dataset by “region” and calculate the total sales for each region using Pandas.
- Create a line plot showing how sales have changed over time.
- Create a bar chart to visualize the total sales by region.
Tools Needed: Jupyter Notebook, Pandas, Matplotlib
Day 5: Project-Based Assignment
Objective:
Combine everything you’ve learned throughout the week to perform a real-world data analysis project using Python.
Scenario:
You have been hired as a data analyst for an online retail company. You are given a dataset of customer orders and sales performance and asked to provide insights into sales trends, customer behavior, and product performance.
Tasks:
- Data Exploration:
- Load the dataset and explore the first few rows using Pandas.
- Perform basic summary statistics (e.g., total sales, average sales, top-selling product).
- Data Cleaning:
- Identify and handle missing data (if any).
- Convert columns like “order_date” to the correct format and remove duplicate entries.
- Data Analysis:
- Group the data by product category and calculate the total sales for each category.
- Filter the data to show high-value customers (e.g., customers who spent more than $1,000).
- Data Visualization:
- Create a bar chart to visualize the total sales by product category.
- Create a line plot showing sales trends over the past year.
Exercise Deliverables:
- Submit your Jupyter Notebook file with the following:
- Data cleaning and manipulation steps.
- Analytical queries (grouping, filtering, and aggregating).
- Visualizations (bar charts, line plots).
Outcome by the End of Week 5:
By the end of this week, you will have a solid understanding of Python programming for data analysis. You will be able to use Pandas for data manipulation, NumPy for numerical computations, and Matplotlib for visualizing data. You will also have hands-on experience in cleaning and analyzing real-world datasets.