Data Analytics Capstone Projects’s Docs ▸ Google Data Analytics Original Capstone Project (1A)

Read
History

Ask Questions to Make Data-Driven Decisions

Step 1: Define the Problem

There are Six main different types of problems that we can look into for the project. However, the problem type that would fit best would be to Categorize things. In essence we are categorizing how members think about their benefits.

Categorizing Things

An example of a problem requiring analysts to categorize things is a organization’s goal to improve customer satisfaction. Analysts might classify customer service calls based on certain keywords or scores. This could help identify top-performing customer service representatives or help correlate certain actions taken with higher customer satisfaction scores.

With the organization I worked with we need to get a feel for how the members feel about the benefits they get with their time here. We’ll be looking at satisfaction levels so categorizing things will be the best bet for our problem to solve. There might be another problem we could try to solve, but in this case we’ll be sticking with the Categorizing Things problem and solution.

Step 2 Ask SMART Questions:

We will consider what kind of questions to ask the members of the organization. Our survey can’t be too long, so we’ll aim for about 20 to 25 questions. We need to make sure that our questions follow the SMART outline that is given in our notes for the Google Data Analytics course. SMART Questions include questions that are:

Specific– Does our question address the problem in step 1? Does the question have context?

Measurable– The question needs to have a measurable answer so that we can compare our results, this is not limited to quantitative data, however, quantitative data is more easily measurable than qualitative data and I personally prefer quantitative data to qualitative data for this reason.

Action-Oriented– Our questions should inspire action, if our questions don’t inspire action but are supplementary and cover topics that we would not need to solve then they wouldn’t be very helpful to us.

Relevant– The question needs to be about the particular problem we need to solve.

Time-Bound– Are our questions relevant to the specific time being studied? It does us not good to answer a question that is out of the scope of the time frame we want. What good would it do us to solve a problem that isn’t currently an issue?

This is a very specific question; it gets at whether the person is currently using the organization’s healthcare plan

This question is also very relevant to the survey that we’re taking, it is asking whether the members use the plan by the organization, if the members isn’t using our plan, then their responses will be less important than those who currently use our plan. They shouldn’t be completely town out but because people using our plan are affected more, they should have more weight put on their responses.

This question could be more specific by asking what other kinds of “another plan” there are. It could be considered closed ended because it doesn’t get to what those other types of plans are. It should have a free response option to offer a more open-ended option.

This question is open ended because it allows for people to explain why they would be uninsured if not for the cost, this is a good way to paste the question and could give us answers that we did not anticipate in our survey.

This question Is qualitative, however there is scale to this question so it can possibly be converted into a quantitative answer. For this to be more measurable I would assign values to the answers on a scale between 1 and 5 with 1 being the least and 5 being the most. I tried to do this within the question itself on the survey but the members above me told me that they wanted qualitative answers.

This question has good context and allows for multiple selections, perhaps we should do a ranking system for which is most preferred, but that is not necessary for our purposes. It even allows for a free response which is a good way to answer and is open ended (which could give us a lot of options we didn’t foresee).

This is a very measurable question, which is good for our purposes. However, if it was up to me, I would make 5 be “the most understood well” and 1 “The least understood” because having more of something (like a higher number) implies that you have more of an understanding (it scales much better than 1 being the most and 5 being the least). One example where 1 would be the most and the highest number would be the least would a ranking system. We’re not ranking multiple options here so we should use an option that is more scalable.

I would make this question more measurable (or quantitative) by putting 1 tough 5 complimentary to the answers that are there. However, this may be easier for the members to understand than assigning a number to it and would be less confusing for those taking the survey.

The objective of this question would be an important aspect of it. Considering whether we want the members to pay more in the long run or pay more every time the visit the hospital. This doesn’t need to be open ended (or include a short answer) because we want a very specific answer (one or the other).

This question needs to be very specific so I would not make it open ended. We need to know for sure whether our members want to medical plans to pick from. There should be no ambiguity here and having it be close ended is preferred. The only open-endedness this question should have is the reason behind picking no (as a free response).

I like how open ended this question is. However, I would have made this question be required and there be a “None” selectable option available as well in addition to the free response too if possible.

I like how open ended this question is with the free response at the end. However, it is tempting to quantify these results, but for our purposes the members may not be able to quantify their results so using qualitative answers may be the most accurate way they can respond.

We need to consider whether we want to ask qualitative questions or quantitative questions for our survey. In our SMART Questions we had both qualitative and quantitative questions. We took survey’s that were fairly scalar (Strongly Understand to Not Understand at All) so we can convert these later to numerical values if we want to look at things such as relatedness. We also asked for qualitive questions such as number 17 where the user just answers with a free response. A good survey usually includes both and we seem to be able to use both from our survey answers.

We need to consider whether we’re using Big Data or Small Data with our analysis. For our purposes we’ll be using small data.

Small data includes data that:

Describes specific metrics over a short, well-defined period (collected over the course to about two weeks),
Usually organized and analyzed in spreadsheets (using Excel or Google Sheets to organize and analyze) and it is not necessary to use programs such as RStudio (with R Code)
Used by a small to mid-sized the organization (total number of the populations is around 100)
Is a manageable size for analysis

Step 3: Create a Scope of Work for the Project

Data Analyst: Patrick Burcham

Purpose: The survey will be conducted to determine what the satisfaction of the members are about their benefits. We’ll determine what benefits need to be added and subtracted based on their responses.

Scope/Major Project Activities:

Activity	Description
Survey the Members at the Organization	In this activity I will help create a survey that will be distributed to the members at the organization, this will gauge their satisfaction with current and future benefit plans
Presentation	I will give a presentation based on the findings after working with ‘S’ to help with determining the best course of action for the upper management at the organization, this will include a lot of visuals so there will be no need for a dashboard (the data doesn’t update periodically)

This project does not include:

Specify the things that this project isn’t responsible for doing (out of scope). For instance, “this project does not involve a summation of 2019 data analysis”

Does not include recipients that the organization interacts with daily
Does not include past recipients that are no longer using the organization
Does not focus heavily on members with the organization that do not get benefits

Schedule Overview / Major Milestones:

The expected schedule for the project. This can be defined by milestones (e.g. “all data is cleaned and processed”), periods of time (“Week 1 / Week 2”), or other ways based on the needs of the project.

Milestone	Expected Completion Date	Description/Details
Completed Collection of the Survey	09/23/21	The survey should be completely collected by this point
Completed Presentation	10/22/21	The presentation should be completed by this point, I’m not sure I’ll give one, but I’ll be ready to if the opportunity presents itself

Estimated Date for Completion:

October 22^nd, 2021

Step 4: Working with Stake holders

This is the email(s) that I received for creating the survey that the organization would be using.

[Start Interaction here]

the organization

Winston-Salem

Direct: [Censored]

Main: [Censored]

Fax: [Censored]

Email: [Censored]

Website:

View Help Line’s Directory of Services for Recipients in ******* County:

Too many local Recipients are alone, hungry and struggling to meet basic needs. Your gift will change their lives.

Confidentiality Notice
This message is intended exclusively for the individual or entity to which it is addressed. This communication may contain information that is proprietary, privileged or confidential. If you are not the named addressee, you are not authorized to read, print, retain, copy or disseminate this message or any part of it. If you have received this message in error, please notify the sender immediately either by phone (***)***-**** or reply to this email and delete all copies of this message.

From: Patrick Burcham
Sent: Thursday, September 2, 2021 10:31 AM
To: “B”
Cc: “J”
Subject: Re: Member Benefits Survey

Q2 skip logic, appears if Q1 is no

Q2 now has display logic

Q3 skip logic, appears if Q2 is uninsured

Q3 now has display logic

Q4 skip logic, appears if Q3 is other OR you could make it so the answer choice on Q3 is other, with a write in option.

Q4 has display logic now

Q5 don’t need the numbers or colons “:” beside the answer choices

Colons are gone from question 5 now

Q6 add (mark all that apply)

Added

Q10/Q11 – maybe this should be a rank too, instead of asking them to pick only 3. Is there a different way to do ranking questions beyond the drag and drop? Those results sometimes show up funny.

Now Question 11 is a ranking question

For questions 11 – 16/12-17 isn’t there an option in Survey Gizmo (like the radio button grid, perhaps) that work well for condensing/combining these questions?

Question 12 is now 12–17. I did not delete those questions just in case it was not what you were looking for. I think I tried that before but ‘B’ said the ranking single question with a single answer might be better for the spreadsheets that our survey will take up later.

Question 18/19 – we don’t need the numbers beside the answer choice options

Done

For questions 19-32/21-34 same note as above about the numbers beside the answer choices and also combining/condensing these into one question…I’d suggest using a radio button grid or some other option that combines all these things…

Question 20 is now 21–34. I did not delete those questions just in case it was not what you were looking for. I think I tried that before but ‘B’ said the ranking single question with a single answer might be better for the spreadsheets that our survey will take up later.

Need to add question asking people to rank what is most important to them as we select a plan:
- Out of pocket costs/deductibles
- Premium/cost out of each pay check
- Prescription formulary
- Provider network (keeping same doctor)
- Other (write in option)

Added as question 35

[End Interaction here]

Prepare Data for Exploration:

Step 5: Consider the Population Sample Size and Collection

The population size of the organization is close to 100 people. We’ll calculate the sample size that we need to get the confidence interval that we are pursuing. For our example we’ll only need a 90% confidence interval. Although the accuracy needs to be at least 90% we’re not using something that needs as much confidence as something like a drug test at a pharmacy (which needs to be at 99%). We have more leeway for inaccuracy with our survey. Our Margin of Error is 8% (once again we’re not running pharmacy tests, so our accuracy doesn’t need to be the strictest).

I plugged the information for our sample size into a Sample Size Calculator that I have in my Google Drive. This can be calculated manually, but there is no need for us to calculate it manually.

When we plug these numbers into our online survey calculator, we get 52 people for what we need to randomly survey to get the accuracy that we need.

How the Data will be collected:

The data will be collected as a first party data from the members themselves within the organization directly. The survey will be conducted tough Survey Alchemer and even though we are using a third-party tool to collect the data it still will be considered Primary data (or First party Data) because The organization is the one conducting the survey.

Primary vs. Secondary Data:

Data Format Classification	Definition	Examples
Primary data	Collected by a researcher from first-hand sources	– Data from an interview you conducted – Data from a survey returned from 20 participants – Data from questionnaires you got back from a group of members
Secondary data	Gathered by other people or from other research	– Data you bought from a local data analytics firm’s customer profiles – Demographic data collected by a university – Census data gathered by the federal government

The data will be Primary Data because the organization will be the ones collecting it. There will not be second party or third-party sources that the data collection will go tough. This means we will have more control and accuracy over the results than if we were using a second party or third-party source.

Internal vs. External Data:

Data Format Classification	Definition	Examples
Internal data	Data that lives inside a organization’s own systems	– Wages of members across different business units tracked – Sales data by store location – Product inventory levels across distribution centers
External data	Data that lives outside of a organization or the organization	– National average wages for the various positions throughout your the organization – Credit reports for customers of an auto dealership

This data will be Internal or contained within the organization itself. It will be stored within Survey Alchemer, but the organization has essentially total control over the data.

Discrete vs. Continuous Data:

Data Format Classification	Definition	Examples
Continuous data	Data that is measured and can have almost any numeric value	– Height of kids in third grade classes (52.5 inches, 65.7 inches) – Runtime markers in a video – Temperature
Discrete data	Data that is counted and has a limited number of values	– Number of people who visit a hospital on a daily basis (10, 20, 200) – Room’s maximum capacity allowed – Tickets sold in the current month

This data is discrete and not continuous. The data is very qualitative and that plays a part in looking at the Discrete option. For example, look at Question 9

The answers are split up into different “Buckets” (Excellent, Above Average, Average, Below Average, Do Not Like). These buckets are different levels from each other and are separate and are not continuous (or is limited).

Qualitative vs. Quantitative:

Data Format Classification	Definition	Examples
Qualitative	Subjective and explanatory measures of qualities and characteristics	– Exercise activity most enjoyed – Favorite brands of most loyal customers – Fashion preferences of young adults
Quantitative	Specific and objective measures of numerical facts	– Percentage of board certified doctors who are women – Population of elephants in Africa – Distance from Earth to Mars

Most of these answers are qualitative, not quantitative. I had requested that the questions be more quantitative.

With question 9 I had requested that we use numbers to mark the answers (1-5), instead of qualitative (Excellent, Above Average, Average, Below Average, Poor, Do Not Use). However, the stake holder decided to go with qualitative data.

Nominal vs. Ordinal:

Data Format Classification	Definition	Examples
Nominal	A type of qualitative data that isn’t categorized with a set order	– First time customer, returning customer, regular customer – New job applicant, existing applicant, internal applicant – New listing, reduced price listing, foreclosure
Ordinal	A type of qualitative data with a set order or scale	– Movie ratings (number of stars: 1 star, 2 stars, 3 stars) – Ranked-choice voting selections (1st, 2nd, 3rd) – Income level (low income, middle income, high income)

This data is Ordinal in questions that have qualitative answers like 9 and 23. It is on a scale instead of being just unordered qualitative data.

That is why I wanted to use quantitative data (because it was ordered) for a more exact measurement. However, the stake holder felt that this could potentially confuse the members taking the survey.

Structured vs. Unstructured:

Data Format Classification	Definition	Examples
Structured data	Data organized in a certain format, like rows and columns	– Expense reports – Tax returns – Store inventory
Unstructured data	Data that isn’t organized in any easily identifiable manner	– Social media posts – Emails – Videos

The data is structured. If it was unstructured, it would be in a format like emails, social media posts or videos. The data should be stored in tables that are accessible to users later. I do have one example of data that is somewhat unstructured (Question 3) but structured data is far more common in my capstone.

Step 6: Create Models for the Database

Data Modeling:

When building a model for our data we need to go in tee major steps. First is the conceptual model (which is the least detailed). Next we need to build the Logical Model (which is more detailed than the conceptual model, but less detailed than the physical model). Lastly we build a physical model for our data base program. This will contain some of the actual options that we will need to select when importing out data into the database.

The Data Modeling for this Capstone will not be needed. The data is normalized in first normal form, so the entire dataset is in a single excel or google sheet. However, we’ll create a Conceptual and Logical Data Diagram just for the sake of doing so. We won’t need a Physical Data Model for our example since we won’t be using a Database program, but you can guess that we’d add things like Varchar and Int to such code.

Conceptual Model

Logical Data Model

Process Data from Dirty to Clean

Step 7: Cleaning Up Excel Data

These steps are completed after we have collected the data and constructed our Conceptual, Logical and Physical Models. On 09/23/21 the survey had closed and we got 70 responses. We met our goal of 52 responses for our sample size so we’ll be able to continue on with our project.

We’ll need to export this data to Excel so that we can clean it up. We could use Google Sheets, but using Excel is a personal preference that I have. In order to do that we’ll have to go to the Results Tab and then select Export. Then we’ll select CSV/Excel to download the file. We’ll exclude “Include Email Invitation and Contact Fields” so that the user’s data can remain anonymous.

It’s important to make sure that the data aligns with our business objective. Our Business Objective is to understand the levels of satisfaction that members have with their current benefits plan and what they want with for the potential future for their benefits. We won’t be able to substitute data that we are missing with this dataset because it is specific to our members. We can’t use another organization’s members to determine how ours fill about benefits offered by our organization.

When there’s no data:

The first thing to look at when cleaning up the data Is blank spaces within the Excel document that holds the data. One thing to consider is whether this data is Null or a purposely “No” (which in this case it was left blank even though it should be a “No”). There is a difference between “No” and Null. Null is unanswered, a lack of data or an unknown. A “No” is an actual answer, which could be a zero.

In our first example the columns Contact ID, Legacy Comments and Comments are all unanswered. In this case we will delete the Contact ID column (since we’re publishing this data and thus want to keep the users anonymous). For the Legacy Comments and Comments column we will enter the word “None”. This is the same a zero, but not the same as null. If the answer is null, then there is potential for a future answer. If the answer is “None” that means we will not have a different future answer that has another value.

After we’ve deleted columns that we do not need we’ll fill in the rest of the data that is needed. Some of the data will need to be left blank because of the structure of the question (Mark All that Apply). This is necessary for the visualization software to be able to work well with our table.

I spent about 2 to 3 hours cleaning up all the blank and Null values. This will make looking at each question much easier when dealing with the data. I also numbered each question with a number and a letter. This will help keep straight the data. Since the data is in first normal form there is the possibility that there will be multiple columns will be multiple answers the same question. If we had made the database that we had made models for in Step 6 this would be less of a problem.

I went back and scanned the data again after fixing the blanks and Nulls. I checked spelling, formatting and other issues like that. I fixed all the issues I could find, and I finished cleaning the data. The data is clean and formatted (within Excel) as of 09/23/21 16:28:00 EST.

One thing that would have made the cleaning go much quicker is using filters to find blank spaces in questions where those would have to be filled in as blank or Null. I manually searched for all the spaces, which took more time (In the future, I’ll remember to use the filter option in Excel). I did use ‘Control-H’ to ‘Replace All’ for empty spaces, however, that stopped working about halfway tough the cleaning process and I had to manually drag copies of the word “Null” around to fill in the rest of the spaces.

Perhaps there’s a way to automate this process. I doubt using SQL would have helped with cleaning this data, also it wasn’t a very large dataset (only 70 entries large) so working on it manually wasn’t as time consuming as one might have thought.

I needed to reformat the data (Create copy columns) and relabel them to give numbers to the qualitative answers that they have. They’re scaled qualitative answers, so this is possible. This way I can do some correlation comparisons when looking at different variables and seeing how they are related to each other with scatter plots.

Analyze Data to Answer Questions

Step 8: Organize the Data

This is a list of the Columns for the Dataset (a lot of the names were so long that it was too difficult to fit them all within a good viewing angle on the sheet, this will make it easier to work with the Pivot tables). Columns listed in green are copied columns that were converted to quantitative values.

Data Range:

‘Raw Data’!$A$1:$CN$71

Columns

Response ID

Time Started

Date Submitted

Status

Language

Referer

SessionID

User Agent

IP Address

Longitude

Latitude

Country

City

State/Region

Postal

1A) Are you currently enrolled in our organization’s members health care plan?

2A) Are you: Part-Time/Currently Ineligible for Benefits

2B) Are you: Covered under spouse’s plan?

2C) Are you: Covered under another plan?

2D) Are you: Uninsured?

3A) Cost: Are you uninsured because of:

3B) Other – Please Explain::Are you uninsured because of:

3C) Other – Please Explain::Are you uninsured because of:

4A) How would you rate the information you receive from the organization about your benefit plans?

4B) How would you rate the information you receive from the organization about your benefit plans?

5A) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) Printed

5B) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) Easy to Access Website

5C) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) Slide or Video Presentations

5D) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) Member Meetings

5E) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) E-Mail

5F) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) Tough Mail

5G) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) Other – Please Explain (Selected)

5H) What is your preferred method(s) for receiving benefits communication? (Mark all that apply) Other – Please Explain (Explained)

6A) Supervisor: When you want detailed information about how your benefits work, where would you turn. Please rank your answers from first choice to last choice, from top to bottom.

6B) Department: When you want detailed information about how your benefits work, where would you turn. Please rank your answers from first choice to last choice, from top to bottom.

6C) Organization Dashboard: When you want detailed information about how your benefits work, where would you turn. Please rank your answers from first choice to last choice, from top to bottom.

6D) Insurance Provider: When you want detailed information about how your benefits work, where would you turn. Please rank your answers from first choice to last choice, from top to bottom.

6E) Benefits Booklet: When you want detailed information about how your benefits work, where would you turn. Please rank your answers from first choice to last choice, from top to bottom.

7A) How well do you currently understand how your benefits work? 1 being very well and 5 being not at all.

7B) How well do you currently understand how your benefits work? 1 being very well and 5 being not at all.

8A) How well do you WANT to understand how your benefits work? 1 being very well and 5 being not at all.

8B) How well do you WANT to understand how your benefits work? 5 being very well and 1 being not at all.

9A) Medical Plan: Please select the response that best describes your overall feeling about each of our benefit plans or plan elements.

9B) Medical Plan Provider Network: Please select the response that best describes your overall feeling about each of our benefit plans or plan elements.

9C) Dental Plan: Please select the response that best describes your overall feeling about each of our benefit plans or plan elements.

9D) Short-Term Disability Plan: Please select the response that best describes your overall feeling about each of our benefit plans or plan elements.

9E) Life-Insurance Plan: Please select the response that best describes your overall feeling about each of our benefit plans or plan elements.

9F) Vision Plan: Please select the response that best describes your overall feeling about each of our benefit plans or plan elements.

9G) Accident and Critical Illness: Please select the response that best describes your overall feeling about each of our benefit plans or plan elements.

10A) Health (Medical Plan): Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10B) Vision Plan: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10C) Dental Plan: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10D) Member Assistance Program: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10E) Supplemental Life Insurance: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10F) Dependent Life Insurance: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits

10G) Accidental & Critical Illness: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10H) Short-Term Disability: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10I) 403(b) Retirement Plan and Organization Match: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10J) Holidays: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10K) Vacation: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10L) Sick Pay: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10M) Paid Lunch Break: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10N) Access to Exercise Equipment On-Site (*********):Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10O) Dependent Care Spending Account: Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

10P) Medical Care Spending Account (Flexible Spending Account): Rate these benefits in terms of importance. Please select the bubble that best corresponds to the degree of importance you place on the following benefits.

11A) In a typical year, how often have you used the exercise equipment at the ********* location?

12A) If you had to contribute more towards your health care, which ONE statement best describes you?

13A) What is your impression of our organization’s benefit plans compared to other local organizations?

13B) What is your impression of our organization’s benefit plans compared to other local organizations?

14A) Out of pocket costs/deductibles: What is most important to you as we select a plan? Please let 1 indicate which is most important to you and 5 indicate which is least important to you.

14B) Premium/cost out of each paycheck: What is most important to you as we select a plan? Please let 1 indicate which is most important to you and 5 indicate which is least important to you.

14C) Prescription formulary: What is most important to you as we select a plan? Please let 1 indicate which is most important to you and 5 indicate which is least important to you.

14D) Provider Network (Keeping the same doctor): What is most important to you as we select a plan? Please let 1 indicate which is most important to you and 5 indicate which is least important to you.

14E) Other (write in option): What is most important to you as we select a plan? Please let 1 indicate which is most important to you and 5 indicate which is least important to you.

15A) Would you like to have two different medical plans to choose from? One may cost more and provide higher benefits, while another may cost less and provide lesser benefits (a Core Plan and a Buy-Up Plan)?

16A) Would you like the opportunity to trade some of your current benefits for others of more importance to you?

17A) What benefits would you like to have that are not currently offered?

18A) Would you consider trading (or giving up) some of your benefits in order to receive more money in your paycheck each pay period?

19A) What benefits would you be willing to give up in exchange for money in your paycheck? See questions 19-32 for a list of benefits.

20A) The IRS allows members to establish an members-owned health savings account (HSA) that secures pre-tax dollars from your earnings in a fund for future medical needs. HSAs are established with high deductible health plans that come with much lower premiums than traditional plans. If you had the option of participating in a high deductible health plan in conjunction with owning an HSA, would you choose to do so?

21A) The IRS allows organizations to establish a Health Care Reimbursement Account to help members pay for certain health care costs with pre-tax income that they must otherwise pay for with after-tax income (for example, deductibles and non-covered medical expenses). If you could allocate a portion of your income to that account, would you choose to do so?

22A) Please provide additional comments on how our organization can improve upon its members benefit plans, or better meet your members benefit plan needs.

23A) Training to enhance job skills in current role: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23B) Ability to try new things within current role: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23C) Opportunity to cross train: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23D) Opportunity to expand knowledge and learn new skills in current role: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23E) Opportunity to advance to a different role with new responsibilities: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23F) Assistance with continuing education: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23G) For our pay to reflect the job we do and the number of years we have been here.: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23H) Hoping other members desire to enhance skills & knowledge: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23I) Open to any learning opportunities and ability to contribute to growth of our agency and services provided for older adults: To what degree would you like access to the following career advancement/growth opportunities at the organization:

23J) some crossing training cannot be done due to regulations. I cannot do a social members job or a nursing job.: To what degree would you like access to the following career advancement/growth opportunities at the organization:

Once the data has been organized (and relabeled with Numbers and Letters) we will Sort the Data in Pivot Tables in Excel. We’ll play around with different numbers to get charts that show information based on what is going on. Because we are dealing with only 71 entries there isn’t really a need for much SQL on this project. We can filter and sort fairly well in Excel. Also we’d need to rename all the columns in the table to do this in order to be able to use the name of those columns in SQL. They’re too long for us to type out as they stand now. There was no need to connect multiple tables together so there isn’t really a need for joining tables using the data in SQL as well.

We’ll need to rename the columns to be shorter names (and to eliminate spaces) in SQL by loading it into Big Query. We’ll sort and filter a little bit using Big Query as an example.

Converting Columns to Type Integer:

Converting the _8B_Want_Understand Column to an Integer Value (You will need to change the query settings in the “More” tab option to overwrite the current table, you’ll have to reset this every time you cast when you change just the query settings). Converting columns to type integer is necessary if we want to make calculations. If they column remains type Varchar we won’t be able to do calculations on them.

SELECT *, SAFE_CAST(_8B_Want_Understand AS INT) AS _8B_Want_Understand_Int

FROM `project1-322422.*****_*******.Survey`

(Another Example of this)

SELECT *, SAFE_CAST(_7B_Currently_Understand AS INT) AS _7B_Currently_Understand_Int

FROM `project1-322422.*****_*******.Survey`

Figuring Out Which Columns are Categories and Which are Subcategories:

This is our table of “Categories” and “Subcategories”. The Categories are measurements that can be measured independently of subcategories (this means it can be split up based on time (time is not supposed to be a part of your subcategories). and you can use aggregate functions in a meaningful way such as average, median, mode or sometimes sum, max and minimum, without another column. You only need themselves to get results based on some aggregation.

The Subcategories are the measurements that cannot be measured independently of categories or other sub categories. They always have to be paired with a category or another subcategory. Categories can be independent of subcategories or they can be paired with other Categories or Subcategories (but it is not necessary to pair them to get a chart/measurement from them or use an aggregate function with them). Things such as map coordinates, geographical locations or id columns are usually subcategories. IDs cannot be added or subtracted (using aggregate functions such as average on them is meaningless). You can use count or count distinct, but not much more than that. This is also true for latitude and longitude values.

Categories	Subcategories
1) Current Enrollment	Location
2) Covered Under Other Plan	Week Answered (1 or 2)
4) Rating Information Received	Cities
5) Receiving Benefits Communication	State
6) Ranking Benefits	On/Off Site
7) Understanding Benefits	Geographical Region
8) Desire to Learn	Postal Zip Code
9) Feelings About Benefits
10) Rate Benefits by Importance
11) How Often do you use Exercise Equipment
12) Contribute More Towards Healthcare
13) Impression of Plan Compared to Other’s
14) Rank Out of Pocket Costs
15) Two Medical Plans to Pick From
16) Trade Benefits
17) What Benefits Do You Want
18) Consider Giving Benefits for More Money
19) What Benefits Would you Give Up for More Money
20) IRS HSA Accounts
22) Addition Comments
23) Access to Growth Opportunities at the organization

This is our chart based on the questions that we have about the data. Dark blue text are Subcategories. Red text are questions we need to ask in the future. These measurements are how we’ll be able to get our data into a form for answering these questions.

Top 12 Question:	Measurements
1) What is the best combination of preferred communication to members?	5) Receiving Benefits Communication	4) Rating Information Received
2) Are On-Site members more likely to be good participants in meetings?	7) Understanding Benefits	8) Desire to Learn	On/Off Site
3) What benefits should we consider giving up/trading for our members?	17) What Benefits Do You Want	19) What Benefits Would you Give Up for More Money
4) How do our plans compare to other’s based upon geographical areas?	13) Impression of Plan Compared to Other’s	Geographical Region
5) What is the least and most important ranking of out of pocket costs with our members?	14) Rank Out of Pocket Costs
6) Should we consider turning the exercise room into something else?	11) How Often do you use Exercise Equipment
7) Would you use a treadmill desk in place of an exercise room?	Null
8) In what cities/zip codes is our benefits perceived as competitive with other organizations?	13) Impression of Plan Compared to Other’s	Cities	Postal Zip Code
9) Are our members’ that are serious about healthcare taking exercise seriously?	11) How Often do you use Exercise Equipment	12) Contribute More Towards Healthcare
10) Should we offer two medical plans?	15) Two Medical Plans to Pick From
11) Is there a Correlation between How Well Member’s Understand Benefits and How Well the Organization has Explained Benefits to Them	4) Rating Information Received	7) Understanding Benefits
12) Is there a Correlation Between How Well Member’s Currently Understand Benefits and How much they want to	7) Understanding Benefits	8) Desire to Learn

Answering Our Top Questions:

Question 1) What is the best combination of preferred communication to members?

We first need to figure out which combinations are the most understood that people would use from the organization. I split up the 6 total possible into combinations of 3.

Category	Totals
Printed + Website + Presentations	6
Printed + Website + Meetings	13
Printed + Website + Email	14
Printed + Website + Mail	5
Printed + Presentations + Meetings	8
Printed + Presentations + Email	5
Printed + Presentations + Mail	4
Printed + Meetings + Email	11
Printed + Meetings + Mail	6
Printed + Email + Mail	6
Website + Presentations + Meetings	8
Website + Presentations + Email	7
Website + Presentations + Mail	4
Website + Meetings + Email	15
Website + Meetings + Mail	6
Website + Email + Mail	6
Presentations + Meetings + Email	8
Presentations + Meetings + Mail	5
Presentations + Email + Mail	5
Meetings + Email + Mail	7

I got this chart from looking at the different combinations and how well they are understood. This chart is based off the Categories and Total Data. However, it is easier to understand because it uses surface area to determine how effective the combination is. The further to the left and top the box is and the more area it takes up, the more effective it is for the members.

Based on this chart the 4 most effective combinations of conveying information are (in order from most effective to least effective)

Website + Meetings + Email, Printed + Website + Email, Printed + Website + Meetings or

Printed + Meetings + Email

We’ll filter for these 4 combinations of communications in our SQL code in Big Query:

SELECT _5A_Printed, _5B_Website, _5D_Meetings, _5E_Email, AVG(_4B_Rate_Information_Int) AS Average_Understanding

FROM `project1-322422.*****_*******.Survey`

GROUP BY _5A_Printed, _5B_Website, _5D_Meetings, _5E_Email;

According to the data, Printed, Meetings and Emails have the highest understanding of all the top 4 combinations to investigate. Having a printed and email copy of the benefits gives the user a digital and physical ledger for their records while meetings give them interactivity they would not normally get on a website with other people)

The solution where members understand the most includes meetings. Not all of our members meet on campus so we’ll have to determine if the enthusiasm for members’ and their understanding of their benefits is better for those who meet on site for members meetings.

2) Are On-Site members more likely to be good participants in meetings?

We’ll put this SQL code into Big Query and get the table that we need to make the following chart:

SELECT AVG(_7B_Currently_Understand_Int) AS Average_of_Currently_Understand, AVG(_8B_Want_Understand_Int) AS Average_of_Want_to_Understand, Work_Location

FROM `project1-322422.*****_*******.Survey`

WHERE Work_Location IS NOT NULL

GROUP BY Work_Location;

Calculating the Percentage Difference between the two columns:

= 4.90229% difference

Solution:
Calculate percentage difference
between V₁ = 8.666666667 and V₂ = 8.251966122

|V1−V2|[(V1+V2)2]×100=?|V1−V2|[(V1+V2)2]×100=?

=|8.666666667−8.251966122|[(8.666666667+8.251966122)2]×100=|8.666666667−8.251966122|[(8.666666667+8.251966122)2]×100

=|0.414700545|[16.9186327892]×100=|0.414700545|[16.9186327892]×100

=0.4147005458.4593163945×100=0.4147005458.4593163945×100

=0.0490229×100=0.0490229×100

=4.90229%difference

There is only a 5% difference between both column’s total. This shows that having offsite members’ at the meetings would be a good idea and it will not have a major effect on how effective the meetings are.

3) What benefits should we consider giving up/getting for our members?

All the benefits that are a part of question 10 (Please select the bubble that best corresponds to the degree of importance you place on the following benefits.) was filtered in SQL Code on Big Query.

SELECT _17A_What_Benefits

FROM `project1-322422.*****_*******.Survey`

WHERE _17A_What_Benefits LIKE ‘%health%’ OR _17A_What_Benefits LIKE ‘%vision%’

OR _17A_What_Benefits LIKE ‘%dental%’ OR _17A_What_Benefits LIKE ‘%members assistance%’

OR _17A_What_Benefits LIKE ‘%life insurance%’ OR _17A_What_Benefits LIKE ‘%illness%’

OR _17A_What_Benefits LIKE ‘%disability%’ OR _17A_What_Benefits LIKE ‘%holiday%’

OR _17A_What_Benefits LIKE ‘%vacation%’ OR _17A_What_Benefits LIKE ‘%sick%’

OR _17A_What_Benefits LIKE ‘%lunch%’ OR _17A_What_Benefits LIKE ‘%exercise%’

OR _17A_What_Benefits LIKE ‘%medical care spending account%’;

This was the result of the previous SQL query in Big Query.

Row	_17A_What_Benefits
1	Fitness reimbursement plan: get rewarded for staying healthy. Treadmill desk/standup desk. Mental health days for living at home dept. Travel benefits, or 4 day work week.
2	More Vacation days. Instead of years of service certificate, how about, every year we stay increased in vacation time. There is value in long term members. Paid day off for our birthday etc. My husbands job , they keep giving them more and more vacation has a retention program. I am stuck at the same amount. His organization keeps adding days.
3	combine sick and vacation
4	Is there such a thing as rewards/ incentives for being healthy and not utilizing medical and Rx benefits? 🙂 My PCP doctor (who I only see for sick visits) unbeknownst to me, dropped me as a patient because I had not been in in a couple years. Scheduler said I was too healthy. I had to get reestablished as a patient with this provider. I thankfully (knock on wood) and historically now for several years, I only see one medical provider (not my PCP) that can do all my wellness and preventative services in one visit annually. Also, the pre-loaded cards we received for certain purchases was VERY nice to have.

This is the SQL Code put into Big Query to determine which people wanted to give up benefits:

SELECT _19A_Give_Up

FROM `project1-322422.*****_*******.Survey`

WHERE _19A_Give_Up LIKE ‘%health%’ OR _19A_Give_Up LIKE ‘%vision%’

OR _19A_Give_Up LIKE ‘%dental%’ OR _19A_Give_Up LIKE ‘%members assistance%’

OR _19A_Give_Up LIKE ‘%life insurance%’ OR _19A_Give_Up LIKE ‘%holiday%’

OR _19A_Give_Up LIKE ‘%vacation%’ OR _19A_Give_Up LIKE ‘%sick%’

OR _19A_Give_Up LIKE ‘%lunch%’ OR _19A_Give_Up LIKE ‘%exercise%’

OR _19A_Give_Up LIKE ‘%medical care spending account%’;

Only one person wanted to give up benefits that were a part of the number 10 in the questionnaire.

Row	_19A_Give_Up
1	Everything but dental and health

4) How do our plans compare to other’s based upon geographical areas?

Our benefits are about average when compared to the benefits of other organizations in different areas based upon the geographical region they’re in. A lot of our benefits are perceived to be within .5 of being average compared to others.