Ehijie Collins Agbadu | Freelancer Exploration Data Analysis Report


EXPLORATORY DATA ANALYSIS REPORT
Group 5B | 0801-DV Associate Internship

INTRODUCTION
The main purpose of this Exploratory Data Analysis Report is to find out why most of the users who apply for opportunities do not complete them, and what enables the few who do complete them. The two datasets used to draw these insights are the User Data and the Opportunity Wise Data:
User data: this dataset contains non-identifying information about every user who has ever created an account on Excelerate. It covers all users, regardless of their engagement with specific opportunities.
Opportunity Wise Data (opportunity signup and completion data): this dataset contains non-identifying information about learners who have engaged with specific opportunities on the Excelerate platform.

DATA OVERVIEW
USER DATA
The User Data covers every user who has ever created an account on the Excelerate platform and has 8 columns: the user's preferred sponsors, gender, country of residence at signup, degree or academic level, signup date and time, city of residence, zip or postal code, and whether or not the user got the information through social media. It has a total of 27,563 rows, corresponding to the same number of users with an account on Excelerate.

OPPORTUNITY WISE DATA
The Opportunity Wise Data has 17 columns and 20323 rows. After duplicates were removed, the rows were reduced to 11482. The Profile ID column was used to identify duplicates because it is the most unique field in the dataset.

COLUMN ANALYSIS
USER DATA
Preferred sponsors: the sponsors the user can choose from, namely GlobalShala (GLS), Grant Thornton (GT), Illinois University (ILS), Saint Louis University (SLU) and Excelerate. According to the dataset, a user can pick one or more sponsors from the list.
Gender: whether the user is male or female. This field can be left blank (null) because it is not mandatory at signup. The frequency ratio of male to female to null is
Country: the user's country of residence at signup; users are spread across the globe. The frequency of the countries using Excelerate.
Degree: the user's level of education at signup: High School, Undergraduate, Graduate, or Not in Education.
Signup date: the date and time at which the user signed up.
City: the user's city of residence at signup.
Zip: the zip or postal code of the user's city of residence at signup.
Is from social media: true or false, depending on whether or not the user got the information through social media.

OPPORTUNITY WISE DATA
The Opportunity Wise Data has a wide structure with 17 columns:
Profile ID: the unique identifier of the user on the Excelerate platform.
Opportunity ID: the unique identifier of the opportunity the user applied for.
Opportunity Name: the name given to each opportunity on the Excelerate platform.
Opportunity Category: the category of the opportunity.
Opportunity End Date: the date on which the opportunity ends.
Gender: the gender given by the user.
City: the user's city of residence at signup.
State: the user's state of residence within their country, as given when signing up for the Excelerate platform.
Country: the user's country of residence at signup.
Zip: the zip or postal code of the user's city of residence at signup.
Graduating Date (yyyy mm): the user's graduating date, as given for the opportunity applied for.
Current Student Status: the user's current level of education, that is, whether the user is still in High School, is an Undergraduate or Graduate student, or is Not in Education.
Current/Intended Major: the user's current or intended major before applying for any opportunity.
Status Description: the status of the user's application for the opportunity. The counts are: Rejected = 340, Applied = 26, Not Started = 732, Team Allocation = 8077, Drop Out = 17, Reward Award = 1285, Started = 693, Withdraw = 311.
Apply Date: the date the user applied for the opportunity.
Opportunity Start Date: the date the user started the opportunity they applied for.
Reward Amount: the amount given to a user who completed the opportunity.
Badge ID: the unique ID of the badge given to a user who earned a badge for the opportunity.
Badge Name: the name of the badge given to the user.
Skill Points Earned: the points the user earned during the opportunity.
Skills Earned: the number of skills the user gained after completing the opportunity.

PROFILE ID ANALYSIS
The profile ID is the unique ID of a particular user on the Excelerate platform, and it is the unique key of the opportunity dataset. It was fundamental to our data-cleaning process, which used this column to identify duplicates. During cleaning, 8841 duplicates were found in this column and removed to preserve the dataset's integrity, leaving a total of 11481 unique values.

OPPORTUNITY STATUS DESCRIPTION
The Status Description column is dominated by team allocations, and few users started the opportunities. Of those who started, only a small number received rewards. We need to find out why so few users received a reward at the end of the opportunity.

BASIC STATISTICS
From the exploration so far, many users applied for various opportunities. However, very few users received rewards at the end of the opportunity. The number of users that got

INITIAL OBSERVATIONS
Our initial observation is that although many users applied for opportunities, comparatively few received rewards. Some users allocated to a team never started, and some who started did not receive a reward for the opportunity.

VISUALIZATION

CHALLENGES FACED
Cleaning the dataset was challenging because the Zip code column was irregular, and the visualizations were not easy to generate or design.

NEXT STEPS
Our next step is to focus on visualization so that we can design a dashboard.
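The duplicate removal described under PROFILE ID ANALYSIS and the tally in the Status Description column can be sketched in Python with only the standard library. This is a minimal illustration, not the team's cleaning code: the CSV column names (`Profile Id`, `Opportunity Id`, `Status Description`), the toy rows, the helper names, and the keep-the-first-row rule are all assumptions. The block also re-adds the status counts reported above, which sum to 11481, and computes the share of records with a Reward Award status.

```python
import csv
import io
from collections import Counter

# Hypothetical toy export; the real Opportunity Wise Data column names may differ.
SAMPLE_CSV = """Profile Id,Opportunity Id,Status Description
p1,o1,Team Allocation
p1,o2,Reward Award
p2,o1,Started
p3,o3,Team Allocation
p3,o3,Team Allocation
"""


def dedupe(rows, keys=("Profile Id",)):
    """Keep the first row for each distinct key tuple (first-wins is an assumption)."""
    seen = set()
    unique = []
    for row in rows:
        k = tuple(row[name].strip() for name in keys)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


def status_counts(rows, column="Status Description"):
    """Count records per status value."""
    return Counter(row[column].strip() for row in rows)


def reward_share(counts, reward_label="Reward Award"):
    """Fraction of records whose status is the reward status (0.0 if empty)."""
    total = sum(counts.values())
    return counts.get(reward_label, 0) / total if total else 0.0


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_profile = dedupe(rows)  # one record per user
by_profile_and_opp = dedupe(rows, ("Profile Id", "Opportunity Id"))  # one per user and opportunity

# Status counts as stated in the report's Status Description column.
REPORTED = Counter({
    "Rejected": 340, "Applied": 26, "Not Started": 732,
    "Team Allocation": 8077, "Drop Out": 17, "Reward Award": 1285,
    "Started": 693, "Withdraw": 311,
})

if __name__ == "__main__":
    print(len(rows), len(by_profile), len(by_profile_and_opp))  # 5 3 4
    print(status_counts(by_profile))
    print(sum(REPORTED.values()), round(reward_share(REPORTED), 3))  # 11481 0.112
```

Deduplicating on Profile ID alone keeps one record per user, so a user who applied to several opportunities contributes only one status (in the toy data, p1's Reward Award row is dropped). Passing a composite key such as `("Profile Id", "Opportunity Id")` keeps one record per user and opportunity instead; which key fits depends on whether the analysis counts users or applications.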
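The report does not describe exactly how the Zip code column was irregular, so the following Python sketch only illustrates one way such a column could be normalized before grouping or mapping. The helper `normalize_zip`, the placeholder tokens treated as missing, the accepted spellings of the United States, and the leading-zero rule are hypothetical assumptions, not rules confirmed in the Excelerate data.

```python
import re

# Assumed placeholder strings that mean "no zip/postal code given".
MISSING_TOKENS = {"", "NA", "N/A", "NONE", "NULL", "NAN", "-"}
# Assumed spellings of the United States in the Country column.
US_NAMES = {"UNITED STATES", "USA", "US"}


def normalize_zip(raw, country=None):
    """Return a cleaned zip/postal code string, or None when it is missing.

    Hypothetical helper: the rules target common export problems,
    not irregularities confirmed in the Excelerate data.
    """
    if raw is None:
        return None
    code = re.sub(r"\s+", " ", str(raw).strip().upper())  # trim, uppercase, single spaces
    if code in MISSING_TOKENS:
        return None
    if re.fullmatch(r"\d+\.0", code):  # numeric export artifact such as "60601.0"
        code = code[:-2]
    is_us = country is not None and country.strip().upper() in US_NAMES
    if is_us and re.fullmatch(r"\d{3,4}", code):
        code = code.zfill(5)  # restore US leading zeros, e.g. "2139" -> "02139"
    return code


if __name__ == "__main__":
    samples = [(" 60601 ", "United States"), ("2139.0", "USA"), ("n/a", "India"), ("sw1a  1aa", None)]
    for raw, country in samples:
        print(repr(raw), "->", normalize_zip(raw, country))
```

Padding is applied only to short numeric codes tagged with a US country name, because codes from other countries, such as UK postcodes or four-digit Australian postcodes, are valid as written.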