sql report
SQL FOR DATA ANALYSIS
a) Using SELECT, WHERE, ORDER BY, GROUP BY
1. Basic SELECT
SELECT ID, Mode_of_Shipment, Cost_of_the_Product
FROM ecommerce_shipping;
This query retrieves only the columns ID, Mode of Shipment, and Cost of the Product from the
dataset.
2. Filter Records with WHERE
SELECT *FROM ecommerce_shipping
WHERE Reached_on_Time_YN = 0;
This query selects only those shipments that were not delivered on time (value = 0).
3. Sort Data with ORDER BY
SELECT ID, Cost_of_the_Product, Discount_offered
FROM ecommerce_shipping
ORDER BY Cost_of_the_Product DESC;
Here, shipments are ordered by product cost in descending order to find the most expensive
items first.
4. Group and Aggregate
SELECT Warehouse_block, AVG(Customer_rating) AS avg_rating
FROM ecommerce_shipping
GROUP BY Warehouse_block;
This groups shipments by warehouse and calculates the average customer rating for each
warehouse.
b) Using JOINS (INNER, LEFT, RIGHT)
We created another table warehouse_details to demonstrate joins.
1. INNER JOIN
SELECT e.ID, e.Warehouse_block, w.Location
FROM ecommerce_shipping e
INNER JOIN warehouse_details w
ON e.Warehouse_block = w.Warehouse_block;
Fetches only the orders where warehouse information exists in both tables.
2. LEFT JOIN
SELECT e.ID, e.Warehouse_block, w.Location
FROM ecommerce_shipping e
LEFT JOIN warehouse_details w
ON e.Warehouse_block = w.Warehouse_block;
Fetches all orders, even if there is no matching warehouse in the reference table.
3. RIGHT JOIN
SELECT e.ID, e.Warehouse_block, w.Location
FROM ecommerce_shipping e
RIGHT JOIN warehouse_details w
ON e.Warehouse_block = w.Warehouse_block;
Fetches all warehouses, even if no orders were shipped from them.
C) Using Subqueries
SELECT ID, Mode_of_Shipment, Cost_of_the_Product
FROM ecommerce_shipping
WHERE Cost_of_the_Product = (
SELECT MAX(Cost_of_the_Product)
FROM ecommerce_shipping);
The subquery finds the maximum product cost, and the outer query fetches shipment(s) with
that cost.
d) Using Aggregate Functions (SUM, AVG, COUNT)
1. Total Revenue
SELECT SUM(Cost_of_the_Product) AS total_revenue
FROM ecommerce_shipping;
Calculates the sum of product costs to estimate total revenue.
2. Average Discount by Shipment Mode
SELECT Mode_of_Shipment, AVG(Discount_offered) AS avg_discount
FROM ecommerce_shipping
GROUP BY Mode_of_Shipment;
Groups shipments by mode and calculates the average discount offered.
3. Count Orders by Gender
SELECT Gender, COUNT(*) AS total_orders
FROM ecommerce_shipping
GROUP BY Gender;
Counts how many orders were placed by each gender.
e) Creating Views for Analysis
1. Late Deliveries View
CREATE VIEW late_deliveries AS
SELECT ID, Warehouse_block, Mode_of_Shipment, Cost_of_the_Product
FROM ecommerce_shipping
WHERE Reached_on_Time_YN = 0;
Creates a reusable virtual table showing only late deliveries.
2. Average Cost by Shipment Mode
CREATE VIEW shipment_avg_cost AS
SELECT Mode_of_Shipment, AVG(Cost_of_the_Product) AS avg_cost
FROM ecommerce_shipping
GROUP BY Mode_of_Shipment;
Creates a view that stores the average cost per shipment mode.
f) Query Optimization with Indexes
CREATE INDEX idx_reached_time ON ecommerce_shipping(Reached_on_Time_YN);
CREATE INDEX idx_mode_shipment ON ecommerce_shipping(Mode_of_Shipment);
Indexes speed up queries that filter or group by these columns.
Summary of Work Done So Far-.
Created a table for the dataset.
Wrote queries using SELECT, WHERE, ORDER BY, and GROUP BY.
Used JOINS to combine shipment data with warehouse details.
Applied subqueries to find maximum values.
Implemented aggregate functions (SUM, AVG, COUNT).
Built views for reusable analysis.
Added indexes to improve query performance.
OUTPUT: