In this data analytics project, we will build a Data Professional Survey Breakdown in Power-BI.
Segregated the data in Excel with different types of questions and made into 3 groups as pointed below:
-
Basic analysis on respondents:
- Switching to data career
- Difficulty level of breaking to career of data
- Most favourite programming language
- Comparing the job title along with the industry
-
Analysis on Work Culture based on the survey questions:
- Important factors that respondents look for the new job
- Average rating by the respondents for each survey question
- Respondents are happy with and not at all happy with?
- The relation between Upward-Mobility and learning new things
-
Demographic Analysis:
- Respondents with ethnicity and salary
- Comparing salary with job title and age group
- Number of respondents from country with age group
- % of male and female respondents
Build a Mock up on dashboard in Power Point and shared as PDF file name 'Mock_Up.pdf'
The following Power BI features were incorporated:
- Dynamic title
- DAX functions
- Key Measures
- Page navigation
- Modelling
Following are the tables used :
- Final_Survey_Data
- Score_Sentiments with (0-10) scores:
- 0-4: POOR
- 5-8: AVERAGE
- 9-10: EXCELLENT
- Question_tab :
- Duplicated the “Final_Survey_Data” table and removed all the columns except the 6 Questions.
- Unpivot the 6 Question columns (converts the question in columns to question in rows)
- Delete the Value column and remove the duplicate values in Attribute column
- Rename the Attribute column to Question_Code
- Response table :
- Duplicated the “Final_Survey_Data” table and removed all the columns except ‘Unique ID’, 6 Questions
- Unpivot the 6 Question columns (converts the question in columns to question in rows)
- Renamed the Unique ID column to Respondent ID ,Attribute column to Question_Code and Value column to Score.
- Respondent table: Duplicated the “Final_Survey_Data” table and removed all the columns except:
- Unique ID
- Date Taken
- Time Taken
- Time Spent
- Total_Respondents_Survey = COUNT(Final_Survey_Data[Unique ID])
- Total_M = CALCULATE(COUNT(Final_Survey_Data[Q14 - Gender]),Final_Survey_Data[Q14 - Gender]="Male")
- Total_F = CALCULATE(COUNT(Final_Survey_Data[Q14 - Gender]),Final_Survey_Data[Q14 - Gender] ="Female")
- Male% = DIVIDE([Total_M],[Total_Respondents_Survey])
- Female% = DIVIDE([Total_F],[Total_Respondents_Survey])
- Dynamic_Sub_Title_BA1 = IF( ISFILTERED(Final_Survey_Data[Q16 -Country]),"Respondents from " & SELECTEDVALUE(Final_Survey_Data[Q16 -Country]) & " switched to data career","Respondents from all countries switched to data career")
- Dynamic_Sub_Title_BA3 = IF( ISFILTERED(Final_Survey_Data[Q16 -Country]), "Respondents from " & SELECTEDVALUE(Final_Survey_Data[Q16 -Country]) & " with their favourite language","Respondents from all countries with their favourite language")
- Dynamic_Sub_Title_BA4 = IF( ISFILTERED(Final_Survey_Data[Q16 -Country]),"Respondents from " & SELECTEDVALUE(Final_Survey_Data[Q16 -Country]) & " with Job title along with the industry they belong to","Respondents from all countries with Job title along with the industry they belong to")
- Dynamic_Sub_Title_WC1 = IF(ISFILTERED(Final_Survey_Data[Q1.Job title]),"Important factors " & SELECTEDVALUE(Final_Survey_Data[Q1.Job title]) & " look for the new job in future","Important factors Respondents look for the new job in future")
- Dynamic_Sub_Title_WC2 = IF( ISFILTERED(Final_Survey_Data[Q1.Job title]),"Average rating by " & SELECTEDVALUE(Final_Survey_Data[Q1.Job title]) & " for each question","Average rating from Respondents for each question")
- Dynamic_Sub_Title_WC3 = IF(ISFILTERED(Final_Survey_Data[Q1.Job title]),SELECTEDVALUE(Final_Survey_Data[Q1.Job title]) & " are happy with and not at all happy with?","Respondents are happy with and not at all happy with?")
- Dynamic_Sub_Title_WC4 = IF(ISFILTERED(Final_Survey_Data[Q1.Job title]),"Relationship between Upward-Mobility and Learning new things by "&SELECTEDVALUE(Final_Survey_Data[Q1.Job title]),"Relationship between Upward-Mobility and Learning new things by Respondents")
- Dynamic_Sub_Title_DA1 = IF(ISFILTERED(Final_Survey_Data[Q16 -Country]),"Respondents from "&SELECTEDVALUE(Final_Survey_Data[Q16 -Country])&"with their respective Ethnicity and Salary","Respondents with their respective Ethnicity and Salary")
- Dynamic_Sub_Title_DA2 = IF(ISFILTERED(Final_Survey_Data[Q16 -Country]), "Respondents from "&SELECTEDVALUE(Final_Survey_Data[Q16 -Country]), "Respondents")
- Dynamic_Sub_Title_DA3 = IF(ISFILTERED(Final_Survey_Data[Q16 -Country]), "Respondents from "&SELECTEDVALUE(Final_Survey_Data[Q16 -Country])& " comparing Salary with their respective Job Title and Age","Respondents comparing Salary with their respective Job Title and Age")
- Dynamic_Sub_Title_DA4 = IF(ISFILTERED(Final_Survey_Data[Q16 -Country]), "No. of Respondents from "&SELECTEDVALUE(Final_Survey_Data[Q16 -Country])& " with Age Group","No. of Respondents with Age Group")
- Data -> Final_Survey_Data ->Current Age -> … -> New Group -> Group type : List -> Under Ungrouped values select the values needed and click on Group icon were the selected values move under Groups and Members.
- Rename the grouped values. Here, we have grouped the age into 3 groups and are as follows:
- 18-30 -> Youth Group
- 31-55 -> Middle Aged Group
- 56-100 -> Elder Group
- Average_Score = AVERAGE(Response[Score]) # to get the average rating for each question
- No._of_respondents = COUNTROWS(Response) # since we have unpivoted the 6 columns with the respective scores the number of rows has been increased and to get the count we use this measure
Power BI automatically connected related tables resulting in a star schema model.Response table is a fact for the model and Score Sentiments,Question_tab,Final_Survey_Data and Respondent tables is dimension for the model. The relationship and the Data model are shown in the below screenshots:
Relationship | Data-model |
---|---|
Downloaded the data in Excel from Alex the analyst learning YouTube channel. He took a survey from Data professionals which was posted on LinkedIn and Twitter with response of 700.
Connected the excel sheet in Power Bi and transformed the data which took to Power Query Editor
- Removed the Unwanted columns from the table (Final_Survey_Data) from column ‘Browser’ to column ‘Reference’.
- In Q1. Column, since cleaning each and every ‘Other’ rows will take time. Also, as of now we do not want 1000 other options instead we can simplify it and make it to 6 or 7 options. For that, select Q1. Column -> Home->Split column -> By Delimiter (we have parenthesis after other). Under Select or enter delimiter->Custom-> ‘(’ -> Split at:’Left Most’, which will create another column and remove the extra column. Now we have few options instead of 100.
- Repeat the same steps for the Columns: Q4, Q5, Q8, Q11, Q13
- In Q3 Current Salary Column (eg:106k-125k), we will break up the numbers by taking average of the given range.
- Duplicate the Column Q3, use the copy - We are going to split the column.Home->Split Column->Digit to Non digit
- This helped to split the column into 3 columns.Replace ‘k’ with nothing,Replace ‘-’ with nothing and Replace ‘+’ with 225.
- Convert the Data type of Q3. Copy1 and Q3.Copy2 to Whole No.
- Now to calculate the Average for the column.Add column ->Custom Column -> New Column name: Average_Salary -> Formula : ([Q3.Copy1]+[Q3.Copy2])/2
- Remove Q3.Copy1 and Q3.Copy2 columns
- Basic Analysis
- Analysis on Work- Culture
- Demographic analysis
Filtered by the country name Canada and can see the number of total respondents,male and female at the top right of the page.We can see that 68.7% of respondents from Canada were switched to Data career i.e, initially these 68% were working on some other field.
The graph says that Only 1 middle aged person from Canada felt Very Easy to break in to world of data.None of the youngsters from Canada says that breaking into the world of data is Very Easy but 4 youngsters do say that it was Easy.And 2 middle aged and 2 youth rates that the process of getting into the world of data is Very Difficult.
People from Canada rated Python as their favourite language with 65.63% ,R language with 18.75%.
‘Respondents from Canada with Job title along with the industry they belong to’ [100% stack bar chart]:
In Canada, we can see 80% of data analyst work for Finance industry whereas 100% of data analyst and data engineers works under Construction and Real estate industry respectively.
The graph implies that 188 data analyst rate for Better salary in which 49 are Women. 74 data analyst wants to work anywhere grabs the second most important factor Remote work and 63 votes to have a Good work-life balance.
The bar chart shows 381 data analysts are happy to work with their Co-Workers with an average rate of 6.09 and score 6 is given as they are happy with their current Work-Life balance. But the reality is all the 381 data analysts are not happy with Current Salary in which 101 are women.
The chart says that 52% of Data analysts shows Poor status for their current salary and 23% of respondents are very excited and love to learn new things.
Learning new things can promote upward mobility, and this is a way to nurture and elevate competitive advantage. Here, we can see that there is a high chance of 15 data analysts getting promoted to upward mobility and this is because all 15 are excited and love to learn new things which leads to a happy upward mobility.
Among 32 Canadian respondents 7 White or Caucasian ethnic group bags the highest salary with only one woman. Black or African American & Other are the two ethnic groups with same salary of $67.3.And only 6 Asian or Asian American ethnic women are getting paid with $62 which is the least among all other ethnic groups in Canada.
‘Respondents from Canada comparing Salary with their respective Job title and Age’[Stack bar chart] :
The graph shows that the Youngsters from Canada are working with a low salary of $31 for the job titles student/looking/none implies those respondents may be working as an intern but there people get paid high with $96 for the role Data scientist at their young age. In Canada, people with Middle aged group get paid high ($188) for other job roles and then comes Data engineer with $105.
- Even though the process of breaking into the world of data is difficult, most of the people are giving their efforts to get into it and also 372 out of 630 professionals have switched their initial career to the Data stream.This shows the demand for Data analytics and Data Science is high.
- Learning Python is highly demanded, 66% of total respondents have voted their favourite language.
- Age no matters in the world of data, because out of 630 responses 5 are elder people in which 3 female and 2 male different parts of the world. None of the elder group feel difficulty to get into Data world but the sad part is that they are paying high for ‘Other Job roles’.
- Among 630 responses, 297 have voted to get a better salary were 66 women needs to improve their income.
- Even though the respondents are ready and excited to learn new things they are not happy with their current upward mobility. This might affect the employees’ competitive advantage and performance too.