DataViz Makeover 2

This visualisation presents public sentiment towards a COVID-19 vaccine.

Andy WONG https://www.linkedin.com/in/andywongkh
02-19-2021

1.0 Introduction

One year on, the COVID-19 pandemic continues raging and is still very much on everyone’s mind. 2021 started with a faint glimmer of hope as vaccinations begin to roll out across the world. Countries that have access to the various vaccines are doing their best to fight against a current of anti-vaccine sentiment. While many people are hopeful that the vaccine will finally bring the pandemic under control, many others are wary of the potential dangers of being inoculated with a vaccine so new that any potential long-term side effects have yet to be uncovered. Compounding this fear is the fact that many of these vaccines were pushed through approval processes at lightning speed, causing people to question if the proper processes and due diligence were maintained.

YouGov, together with Imperial College London, conducts an ongoing survey to gather data on people’s behaviours in response to COVID-19. This blog uses the publicly available data and focuses on people’s sentiments to the COVID-19 vaccines in particular. The scope of this visualisation is for survey responses from January 2021.

1.1 The Scene

A research team is currently conducting a study to understand public sentiment regarding the COVID-19 vaccines. Below are two data visualisations created by one of the research scientists.

1.2 The Task

  1. Critique the graph for both its clarity and aesthetics. At least three observations from each evaluation criterion.
  2. With reference to the critique above, suggest an alternative graphical presentation to improve the current design. Sketch out the proposed design. The proposed alternative design should include interactive techniques. Support your design by describing the advantages or which part of the issue(s) your alternative design is trying to overcome.
  3. Using Tableau, design your proposed data visualisation.
  4. Provided step-by-step descriptions on how the data visualisation was prepared.
  5. Describe three major observations revealed by the data visualisation prepared.

The remade data visualisation can be viewed on Tableau Public.

2.0 Makeover

2.1 Critique of the visualisation, and suggested improvements

2.1.1 Clarity

S/n Critique Suggestions for improvement
1 The graphs do not show the confidence intervals of the responses. This does not provide a good sense of how reliable the survey scores are. A better sense of the variability and reliability of the survey responses can be given if a confidence interval were presented in the visualisation. Include the confidence interval of the survey scores for a more holistic and reliable presentation of the survey results.
2 The visualisation is one-dimensional and too simplistic.

a) It does not take into account demographic data such as gender, age, employment status, and household constitution. The data presented is too broad and simplistic for useful sense-making.

b) It only presents data from one single survey question, which does not provide sufficient context to the responses and sentiment of the respondents. To have a fuller understanding of public sentiment, other relevant survey questions should be included.
Introduce more dimensions into the visualisation.

a) Include demographic profiles into the visualisation for a more nuanced presentation of the data.

b) Introduce other relevant survey questions for a deeper and fuller understanding of the data.
3 The stacked bar chart can more accurately reflect what the survey responses are. While the stacked bars show the proportion of responses according to the survey score, it does not clearly show the positive- or negative-leaning of the responses to each survey question. It would be useful to know the public’s general leaning in response to the various questions. Redesign the stacked bar chart to show the general consensus of the responses in a more visually intuitive way.

2.1.2 Aesthetics

S/n Critique Suggestions for improvement
1 The colour scheme for the right-hand chart does not corroborate to the usual meaning of the colours. Red typically has the opposite meaning of blue, and so if blue is 1, red should be 5. Improve the colour scheme of the chart to be more intuitive.
2 The right-hand chart does not add much more value to the visualisation other than to sort the countries in descending order of responses to survey score 1. This is too broad a generalisation for the visualisation to offer any meaningful insight other than the countries that are more ‘pro-vaccine’ based on score 1. Redesign the right-hand chart entirely to provide more insight to the data. This could include any of the suggestions for improvement discussed in sections 2.1.1 and 2.1.2.
3 Lack of editing and proofreading of the title of the graph. Spelling mistakes and grammatical errors, although not critical in themselves, distract from that which is critical: the main content and message of the visualisations. In the left-hand chart, ‘vaccine’ was spelt as ‘vacinne’. Ensure proper editing and proofreading of the content (eg. ‘vaccine’, not ‘vacinne’) to enhance the overall presentation and bring focus to the critical message of the visualisation. The names of the countries could also be shown to their proper format (‘United Kingdom’ instead of ‘united-kingdom’), and capitalised.

Based on the above assessment, the chart falls across Quadrants I and IV of the matrix:

2.2 Sketch an alternative visualisation


The proposed visualisation above offers comparison with a country, as well as across countries. This will provide a better comparison of public sentiment across regions, for each survey question.

The visualisation will allow for customisation of demographic data for more nuanced visualisation, depending on the user’s scope of interest.

A confidence interval will be shown in the visualisation (right-hand chart) to allow for more accurate understanding of the survey responses.

The colours of the left-hand chart will be adjusted to be more intuitive.

2.3 Build the proposed visualisation

2.3.1 Import the data

Step 1: Open Tableau. Add in any one of the data files as the data source.

Step 2: Remove the table as the dataset.

Step 3: Double-click on “New Union”, multi-select all the country csv files and drag them into the Union dialogue under “Specific (manual).

Click OK. This will union all the tables into one single dataset.

2.3.2 Clean the data

Endtime column
Step 4: The “endtime” column will be used to ascertain the date of the survey response.
First, change the data type to string.

Step 5: Click on Custom Split.

Step 6: Type a space into “Use the separator” field to split by space.
Click OK.

Step 7: “endtime – Split 1” is created.

Step 8: Change data type to Date.

Step 9: Rename as “Endtime” for spelling consistency.

Step 10: Click on the column that is not needed and click Hide.

Employment status merged column
Step 11: Click on Manage metadata.

Keep only the relevant columns:
endtime Split-1
household_children
age
gender
household_size
employment_status
employment_status_1
employment_status_2
employment_status_3
employment_status_4
employment_status_5
employment_status_6
employment_status_7
Table Name.

Some countries saved employment_status data as Boolean type. This is mismatched from other countries that did not. As such, these columns needs to be merged into one single column to aid the building of the visualization.

Step 12: Create a calculated field as shown.

Step 13: Ensure data type is String.

Step 14: Rename the column.

Household children recoded column
Step 15: Create a calculated field for household_children to remove null values.

Step 16: Rename column.

Step 17: Ensure data type is integer.

Household size recoded column
Step 18: Create a calculated field for household_size to remove null values.

Step 19: Ensure data type is integer.

Step 20: Rename column.

Survey Question and Survey Response (INT) columns
Step 21: Pivot all survey responses to make the table long and narrow.

Step 22: Edit the aliases of all the survey questions for clarity.

Step 23: Ensure data type for Survey Question is String.

Step 24: Rename column.

Step 25: Split the Survey responses to retain only the number.

Step 26: Change the data type to integer.

Country column
Step 27: Change the country names to the proper spelling.

Step 28: Ensure data type is String, and Geographic Role is Country/Region.

Step 29: Rename the field as “Country”.

Age column
Step 30: Ensure data type is integer.

Step 31: Rename column for consistency.

Gender column
Step 32: Ensure “gender” column is String type.

Step 33: Rename column for consistency.

Saving a new data set
Step 34: Open a new Sheet and click View Data.

Step 35: Click on Export All and save data file as COVID_survey.csv. This will save only the cleaned fields and remove the hidden fields, thereby having a smaller dataset to use, for improved efficiency.

2.3.3 Creating the calculated fields

Count Negative field
The Count Negative field orientates the centre of the neutral survey responses to 0, so that the Strongly Agree, Agree, Disagree, and Strongly Disagree can be presented in a visually more intuitive way.

Step 36: Create a calculated field named Count Negative.

Gantt Start field
The Gantt Start field defines the starting point of the stacked bar chart.

Step 37: Create a calculated field named Gantt Start.

Gantt Percent field
The Gantt Percent field defines the length of each portion of the stacked bar chart.

Step 38: Create a calculated field named Gantt Percent.

Number of Records field
The Number of Records field counts the number of records in the dataset.

Step 39: Create a calculated field named Number of Records.

Percentage field
The Percentage field converts the count of the entries to a percentage.

Step 40: Create a calculated field named Percentage.

Strongly Agree and Agree Scores field
The Strongly Agree and Agree Scores field counts the number of Strongly Agree (1) and Agree (2) survey scores.

Step 41: Create a calculated field named Strongly Agree and Agree Scores.

Total Count field
The Total Count field totals up the number of entries for each response.

Step 42: Create a calculated field named Total Count.

Total Count Negative field
The Total Count Negative field totals up the numbers given in the Count Negative field.

Step 43: Create a calculated field named Total Count Negative.

Proportion field
The Proportion field calculates the proportion of Strongly Agree and Agee Scores column against the total number of records.

Step 44: Create a calculated field named Proportion column.

Prop_SE field
The Prop_SE field calculates the standard error of the Proportion field. This will be used to calculate the confidence interval of the survey responses.

Step 45: Create a calculated field named Prop_SE.

Z_95% field
The Z_95% field calculates Z-score for the 95% confidence interval, to be used later.

Step 46: Create a calculated field named Z_95%.

Prop_Margin of Error 95% field
The Prop_Margin of Error 95% field calculates standard error for a Z-score of 95%. This will be used later.

Step 47: Create a calculated field named Prop_Margin of Error 95%.

Prop_Lower Limit 95% field
The Prop_Lower Limit 95% field calculates the lower limit of the standard error at a 95% confidence interval.

Step 48: Create a calculated field named Prop_Lower Limit 95%.

Prop_Upper Limit 95% field
The Prop_Upper Limit 95% field calculates the upper limit of the standard error at a 95% confidence interval.

Step 49: Create a calculated field named Prop_Upper Limit 95%.

2.3.3 Create the worksheet (Comparison within country)

Step 50: Drag Endtime to the Filters Pane and filter to January 2021, since the scope of this visualisation is only for that month.

Step 51: Right-click on Endtime and select January 2021. Click OK.

Step 52: Drag Gantt Percent to the Columns Pane and Survey Questions to the Rows Pane.

Step 53: On Gantt Percent, click on the small arrow and ensure Continuous format, and Compute Using, Survey Response (INT).

Step 54: On Survey Question, click on the small arrow and ensure Continuous format, Compute Using Survey Response (INT).

Step 55: Drag Survey Response (INT) to the Colour Mark, and set to Dimension and Discrete.

Step 56: Drag Percentage to the Size mark and set to Continuous.

Step 57: Drag Survey Response (INT) to the Detail Mark and set to Dimension and Discrete.

Step 58: Click on the Colour Mark, click on Edit Colours and customise the colour scheme of Survey Response (INT) accordingly.

Step 59: Adjust the size of Percentage accordingly to make the bars in the chart of the appropriate thickness.

Step 60: Drag Country to Filters Pane and select Show Filter.

Step 61: Click on the small arrow beside Country and select Single Value (dropdown).

Step 62: Drag the Filter to the top of the graph.

Step 63: Drag gender to Filters Pane and select Show Filter.

Step 64: Click on the small arrow beside gender and select Single Value (dropdown).

Step 65: Click on the small arrow beside gender and select Edit Title. Type “Gender”.

Step 66: Drag age to Filters Pane and select Show Filter.

Step 67: Click on the small arrow beside age and select Range of Values.

Step 68: Click on the small arrow beside age and select Edit Title. Type “Age”.

Step 69: Drag Employment status merged to Filters Pane and select Show Filter.

Step 70: Click on the small arrow beside Employment status merged and select Single Value (dropdown).

Step 71: Click on the small arrow beside Employment status merged and select Edit Title. Type “Employment Status”.

Step 72: Drag Household Children recoded to Filters Pane and select Show Filter.

Step 73: Click on the small arrow beside Household Children recoded and select Range of Values.

Step 74: Click on the small arrow beside Household Children recoded and select Edit Title. Type “No. of Children in Household”.

Step 75: Drag Household Size recoded to Filters Pane and select Show Filter.

Step 76: Click on the small arrow beside Household Size recoded and select Range of Values.

Step 77: Click on the small arrow beside Household Size recoded and select Edit Title. Type “Size of Household”.

Step 78: Double click on the title and edit accordingly to show the customized settings selected by the user.

Step 79: Double-click on the worksheet tab and rename accordingly.

Step 80: Click on the y-axis label Survey Question and sort in descending order.

Step 81: The final chart is shown below.

2.3.4 Create the worksheet (Comparison across countries)

Step 82: Drag Endtime to the Filters Pane and filter to January 2021, since the scope of this visualisation is only for that month.

Step 83: Right-click on Endtime and select January 2021. Click OK.

Step 84: Drag Measure Values to the Columns Pane and Country to the Rows Pane.

Step 85: Drag Gender to the y-axis and to the Colour Mark.

Step 86: Click on Colour then Edit Colours, and edit the colours accordingly.

Step 87: Drag Proportion to the top of the chart.

Step 88: Right-click and select Synchronise Axis.

Step 89: Drag Measure Names to the Filters Pane, click on the small arrow and select Edit Filter.

Step 90: Select the Prop_Lower Limit 95% and Prop_Upper Limit 95% to define the upper and lower limits of the 95% confidence interval.

Step 91: Drag Survey Question to the Filters Pane. Click on the small arrow and select Show Filter.

Step 92: Drag the Filter to the top of the graph.

Step 93: Drag both Prop_Lower Limit 95% and Prop_Upper Limit 95% to the Measure Values Pane.

Step 94: Ensure Measure Names is in the Measure Values Marks.

Step 95: Select Line as the Mark Type.

Step 96: Ensure Measure Names is also in AGG(Proportion) Pane. Change Mark Type to Colour and Circle.

Step 97: Edit the colour of the Proportion measure.

Step 98: Edit the colour of the Measure Values.

Step 99: Drag gender to Filters Pane and select Show Filter.

Step 100: Click on the small arrow beside gender and select Single Value (dropdown).

Step 101: Click on the small arrow beside gender and select Edit Title. Type “Gender”.

Step 102: Drag age to Filters Pane and select Show Filter.

Step 103: Click on the small arrow beside age and select Range of Values.

Step 104: Click on the small arrow beside age and select Edit Title. Type “Age”.

Step 105: Drag Employment status merged to Filters Pane and select Show Filter.

Step 106: Click on the small arrow beside Employment status merged and select Single Value (dropdown).

Step 107: Click on the small arrow beside Employment status merged and select Edit Title. Type “Employment Status”.

Step 108: Drag Household Children recoded to Filters Pane and select Show Filter.

Step 109: Click on the small arrow beside Household Children recoded and select Range of Values.

Step 110: Click on the small arrow beside Household Children recoded and select Edit Title. Type “No. of Children in Household”.

Step 111: Drag Household Size recoded to Filters Pane and select Show Filter.

Step 112: Click on the small arrow beside Household Size recoded and select Range of Values.

Step 113: Click on the small arrow beside Household Size recoded and select Edit Title. Type “Size of Household”.

Step 114: Sort Proportion of responses in descending order.

Step 115: Double click on the title and edit accordingly to show the customized settings selected by the user.

Step 116: Double-click on the worksheet tab and rename accordingly.

Step 117: The final graph is shown below.

2.3.5 Create the dashboard

Step 118: Drag “Comparison within country” and “Comparison across countries” into the left-hand and right-hand pane in the dashboard.

Step 119: Rearrange the filters accordingly.

Step 120: Change filter settings to “All Values in Database” so that filter options affect the whole dashboard.

Step 121: Edit dashboard title.

Step 122: Rename dashboard.

Step 123: The final dashboard is shown below.

3.0 Insights from the new visualisation

Observation 1: People are generally open to being vaccinated against COVID-19.

This pertains to survey questions Vac 1, Vac 2-3, Vac 2-6, and Vac 3.

Overall, there is an openness to being vaccinated against COVID-19 across all countries. This can be seen from the longer bars for Strongly Agree (blue bar) and Agree (teal bar) vis-a-vis Strongly Disagree (red bar) and Disagree (pink bar), indicating a higher proportion of responses to Strongly Agree and Agree. See sample below.

Observation 2: Females are generally more concerned about contracting COVID-19 than males.

This pertains to survey question Vac 2-1. All countries except Singapore show that, at a confidence interval of 95%, the female population (orange line) are more concerned about contracting the coronavirus compared to males (blue line).

Observation 3: Older people, typically males, are generally less concerned about the side-effects of a vaccine.

This pertains to survey question Vac 2-2. Across all countries, respondents aged 65 to 99 years indicate that they are less concerned about the side-effects of a vaccine, compared to younger people (18 to 64 years old).

From the visualisation below, we can see that respondents in the 65 to 99 year age group have longer 95% confidence interval lines compared to across all age groups. This could be because there are comparatively less respondents from this particular age group. Males in this age category tend to be less concerned about side effects than females.

Appendix

The survey questions used in this visualisation are as follows.

Code Question
Vac 1 If a Covid-19 vaccine were made available to me this week, I would definitely get it.
Vac 2-1 I am worried about getting COVID-19.
Vac 2-2 I am worried about potential side effects of a COVID-19 vaccine.
Vac 2-3 I believe government health authorities in my country will provide me with an effective COVID-19 vaccine.
Vac 2-6 If I do not get a COVID-19 vaccine when it is available, I will regret it.
Vac 3 If a COVID-19 vaccine becomes available to me a year from now, I definitely intend to get it.

Thanks for visiting my blog!

This post is a data visualisation assignment for the MITB programme at the Singapore Management University.

Distill is a publication format for scientific and technical writing, native to the web.
Learn more about using Distill at https://rstudio.github.io/distill.