MY PORTFOLIO
Hi, my name is Emmanuel Musyoki.
I have been a data analyst for 2 years with experience in Business Intelligence (BI) analysis, Operations Research Analysis, and Market Research Analysis.
PROJECTS
DATA EXPLORATION AND CLEANING USING MYSQL
Data link UNICEF_Data .CSV— HIV early infant diagnosis
Purpose: To create a clear visualization and show the trend in HIV infant diagnosis in Africa between 2011 to 2022
The report will help understand the trend in the spread of HIV from mother to child during birth and also understand the rate at which expectant mothers visit centers to know their HIV status.
I have made it as simple as possible for easier navigation and understanding.
UNICEF CSV Data
Step 1: Importing the above CSV data into MySQL workbench for cleaning
Database name health_db
Table name hiv_Africa_CSV
--creating a table in MySQL
USE health_db;
CREATE table HIV_infant_diagnosis (Country_ID INT PRIMARY KEY NOT NULL,
type VARCHAR(50),country VARCHAR(50), UNICEF_region VARCHAR(100),
indicator VARCHAR(100), Data_source VARCHAR(30), year INT,value INT,
lower INT, upper INT);
Step 2: Import the information into the created table.
--importing data into MySQL table
LOAD DATA LOCAL INFILE "C:\\Users\\USER\\Downloads\\HIV_Early_Infant_Diagnosis_2022"
INTO TABLE HIV_infant_diagnosis
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n"
IGNORE 1 LINES
(Country_ID,type,country,
UNICEF_region, indicator,Data_source,
year,value,lower,upper);
DATA CLEANING
Filtering the data to retain African countries only
DELETE FROM hiv_africa_csv
WHERE UNICEF_Region NOT LIKE '%africa%'
order by country;
select * from hiv_africa_csv
The Table now contains information for Africa Countries only
filtering to remove unnecessary columns.
-- removing unecessary columns from the whole table
ALTER TABLE hiv_africa_csv
DROP COLUMN ISO3,
DROP COLUMN type,
DROP COLUMN Lower,
DROP COLUMN upper;
select * from hiv_africa_csv;
Creating a table for every year, from 2010 to 2021 (using subquery)
CREATE TABLE 2010_data(
select * from hiv_africa_csv where year = 2010);
CREATE TABLE 2011_data(
select * from hiv_africa_csv where year = 2011);
CREATE TABLE 2012_data(
select * from hiv_africa_csv where year = 2012);
CREATE TABLE 2013_data(
select * from hiv_africa_csv where year = 2013);
CREATE TABLE 2014_data(
select * from hiv_africa_csv where year = 2014);
CREATE TABLE 2015_data(
select * from hiv_africa_csv where year = 2015);
CREATE TABLE 2016_data(
select * from hiv_africa_csv where year = 2016);
CREATE TABLE 2017_data(
select * from hiv_africa_csv where year = 2017);
CREATE TABLE 2018_data(
select * from hiv_africa_csv where year = 2018);
CREATE TABLE 2019_data(
select * from hiv_africa_csv where year = 2019);
CREATE TABLE 2020_data(
select * from hiv_africa_csv where year = 2020);
CREATE TABLE 2021_data(
select * from hiv_africa_csv where year = 2021);
Sample table for each year
NB: I have already removed duplicates using an excel filter function.
Now we have the table data for every year, let's compare the data for two or more years using a Join statement.
--comparing 2012 and 2017 data
SELECT * from 2012_data
join 2017_data on 2012_data.country = 2017_data.country;
you can easily compare the percentage value of the two years' data by using joins. In the case above, the percentage value for Burundi in 2012 was 9.5% and in 2017 the value had increased to 18.3%.
In Liberia, the value dropped from 28.5% in 2012 to 9.2% in 2017.
Therefore, one can easily see the trend for two or more years in the same country by joining the tables.
Overview
Suppose you were asked to filter the data to show the results for African countries/ regions only, That is how I approached this problem and as you can see, with just a few lines of code we can retrieve information for every year, country, or compare results for 2 years.
You can also perform more analysis on this data example:
- Which country has the maximum percentage rate in a certain year
- the average HIV infection rate in each year
- Perform statistical analysis to predict the spread of HIV from mother to child during birth
- Use the window function to fetch certain information from the table.