# Quantitative Variables, Descriptive Statistics & Linear Regression

PART #1

Choose two quantitative variables that you think might be related, then write a short survey that you could give to at least 20 participants. Your survey needs to gather data about each quantitative variable.

#### EXAMPLE

You may believe that there is an association between the number of hours a week a student studies for their math class and their current GPA.

Survey Questions:

- How many hours a week do you study for your math class?
- What is your current GPA?

#### PART 1 OF CAPSTONE PROJECT

Submit a 1-page research proposal using the following format and headings. The description in italics indicates the information required in each section.

*Name:* *Project Name:* *Statistics Fall 2018*

__Introduction__

*Introduce the variables you will be collecting data on and explain why you think there is a relationship between these variables. Indicate which variable you think might cause a change in the other variable. Address why your research is important or interesting.*

__Population__

*Discuss the population you intend to focus your questions on and what your data will consist of. Answer the following questions in your response: Who is the population of interest (from which you will select your participants in part II of the project)? For the example above relating hours studied and GPA, I might want to consider all college students or I might want to focus on just community college students when I look at the relationship. What ethical issues do you anticipate with your research? Address how any ethical issues identified will be addressed. For example, I might consider that students might not be honest about their GPA unless they can answer the questions privately by writing their responses on paper and dropping it in a box.*

__Materials__

*Include your actual survey (with the survey questions you will ask).*

*PART #2*

Part 2: Data and Descriptive Statistics (45 points total)

In this section you will expand your project to include your actual data set, descriptive statistics for your quantitative variables, and a discussion of your results so far.

Your project should be submitted as a professional report including everything from Part 1 and 2 using the following template. If you did not do Part 1, you must obtain approval from your instructor on your variables before collecting data. If Part 1 is missing, you will still lose completeness points.

The description in italics indicates the information required in each section.

*(Paste Part 1 of your project here)*

__Sampling Method__

*Based on the population identified in Part 1, describe an appropriate sampling technique to use for collecting data. Discuss *__how__* your participants will be identified/contacted; when and where will your study take place? For example, I might want to use a sample stratified by gender when looking at the relationship between study hours and GPA. To acquire my random sample within each stratum, I might use a systematic sampling technique where I ask every 10th female or male entering RCF to participate.*
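As a rough illustration of the "every 10th person within each stratum" idea, here is a minimal Python sketch. The function name, the stratum labels `"A"` and `"B"`, and the arrival log are all hypothetical and made up for this example. It also starts counting at the first arrival, whereas a stricter systematic design would usually pick a random starting point.

```python
from collections import defaultdict

def systematic_sample_by_stratum(arrivals, k):
    """Keep every k-th arrival within each stratum, in arrival order.

    arrivals: iterable of (person_id, stratum) pairs in the order observed.
    k: sampling interval (10 means every 10th person in a stratum).
    """
    seen = defaultdict(int)      # arrivals counted so far, per stratum
    chosen = defaultdict(list)   # selected person ids, per stratum
    for person_id, stratum in arrivals:
        seen[stratum] += 1
        if seen[stratum] % k == 0:
            chosen[stratum].append(person_id)
    return dict(chosen)

# Hypothetical arrival log: 40 people alternating between two strata.
arrivals = [(i, "A" if i % 2 == 0 else "B") for i in range(1, 41)]
sample = systematic_sample_by_stratum(arrivals, k=10)
print(sample)
```

With only 20 arrivals per stratum, this yields two participants per stratum, so a real study would need a longer observation window or a smaller interval to reach 20 cases.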

*NOTE: Ideally every study would have a very carefully designed sampling method where every person in the population is equally likely to be in the sample. Unfortunately, we have a limited budget and timeline for our classroom projects. There are still weaknesses in my example above that would need to be raised in the discussion of my results (see below).*

__Data__

*Collect your own data set using your survey and following the plan you set out in part I. You should have at least 20 cases (subjects).*

*In order to satisfy good ethical practices, if you plan to survey human subjects, it is essential that they consent to being surveyed and that their information is protected. Please do not pressure anyone to participate and please keep identities confidential.*

| Participant | Variable 1: | Variable 2: |
| --- | --- | --- |
| 1 | enter data | … |
| 2 | … | … |
| … | … | … |

__Descriptive Statistics for <Variable 1 name>__

*Briefly introduce your variable and report the summary statistics (mean, standard deviation, 5-number summary and IQR).*
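One possible way to obtain these summary statistics is sketched below using Python's standard `statistics` module. The data values are hypothetical placeholders, not real survey results, and the quartiles use the median-of-halves convention. Other tools (and other textbooks) use slightly different quartile rules, so your numbers may differ a little depending on the technology you choose.

```python
import statistics

# Hypothetical values (e.g. weekly study hours); replace with your own data.
hours = [2, 3, 4, 5, 5, 6, 7, 8, 10, 20]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]             # values below the median position
    upper = values[half + (n % 2):]   # values above the median position
    return (values[0], statistics.median(lower), statistics.median(values),
            statistics.median(upper), values[-1])

minimum, q1, median, q3, maximum = five_number_summary(hours)
mean = statistics.mean(hours)
sd = statistics.stdev(hours)   # sample standard deviation (divides by n - 1)
iqr = q3 - q1

print("mean:", mean, "sd:", round(sd, 2))
print("five-number summary:", minimum, q1, median, q3, maximum)
print("IQR:", iqr)
```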

*Create both a histogram and modified boxplot for variable 1. You can choose what technology to use, but it must look professional (hand drawn histograms or boxplots will receive no credit). Indicate what technology was used and a brief description of the process.*

*Use the histograms and boxplots to describe the distribution. Discuss the shape (modality, symmetry/skew) and unusual features of your data. For the boxplot report the fences and any outliers, sharing the formula/calculations used.*
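The fences for a modified boxplot are commonly computed as Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. The sketch below applies that rule to hypothetical data. The quartile values are assumed to have been found already (here by the median-of-halves convention), and the helper names are invented for illustration.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the values that fall outside the fences."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Hypothetical values; replace with your own data and quartiles.
hours = [2, 3, 4, 5, 5, 6, 7, 8, 10, 20]
q1, q3 = 4, 8   # quartiles of the hypothetical data above

lower_fence, upper_fence = fences(q1, q3)
outliers = find_outliers(hours, q1, q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```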

*Compare the mean and median, discussing whether the mean or median is a better measure of center and explain why. Compare the standard deviation and IQR, discussing which is the better measure of spread. Explain why.*

__Descriptive Statistics for <Variable 2 name>__

*Briefly introduce your variable and report the summary statistics (mean, standard deviation, 5-number summary and IQR).*

*Create both a histogram and modified boxplot for variable 2. You can choose what technology to use, but it must look professional (hand drawn histograms or boxplots will receive no credit). Indicate what technology was used and a brief description of the process.*

*Use the histograms and boxplots to describe the distribution. Discuss the shape (modality, symmetry/skew) and unusual features of your data. For the boxplot report the fences and any outliers, sharing the formula/calculations used.*

*Compare the mean and median, discussing whether the mean or median is a better measure of center and explain why. Compare the standard deviation and IQR, discussing which is the better measure of spread. Explain why.*

__Discussion__

*Based on the analysis above, discuss at least 2 interesting results from your analysis. For example, when looking at data about the hours students study, I might be surprised at the variation among students and discuss how this is demonstrated by the measures of spread found.*

*Ideally every study would have a very carefully designed sampling method where every person in the population is equally likely to be in the sample. Unfortunately, we have a limited budget and timeline for our classroom projects. Review the sampling method you used and discuss what limitations you see with your research, including sources of bias or other problems that might limit how well your research generalizes to the greater population (for example, if my survey on study hours and GPA was taken on campus between 2pm and 3pm, few night students are likely to be included).*

*PART #3*

Part 3: Z-scores and Linear Regression (45 points total)

This section will further investigate possible outliers in your data and investigate your initial hypothesis that your two variables are associated.

Your project should be submitted as a professional report including everything from Part 1 and 2 using the following template. If you did not do Part 1, you must obtain approval from your instructor on your variables before collecting data. If you did not submit Part 1 or Part 2, you can still submit Part 3 but will lose completeness points for any missing sections.

The description in italics indicates the information required in each section.

*(Paste Part 1 and Part 2 of your project here)*

__Z-Scores and Outliers__

*Above, you decided whether you have any outliers for your two variables. A second definition of an outlier is for a point to lie more than 2 standard deviations from the mean. For each variable, find the data point farthest from the mean (it could be either above or below the mean). Find the z-score for each of these points. Based on the z-score, is either of these points considered an outlier by this definition? If so, was it also an outlier based on the fences for that variable (which you found in part 2)?*
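The z-score check described above can be sketched as follows, using z = (x − mean) / standard deviation. The data are hypothetical, the function name is made up for this example, and the sample standard deviation is assumed. You would still need to type out the calculation itself in your report.

```python
import statistics

def farthest_point_z(data):
    """Return the value farthest from the mean and its z-score."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation
    farthest = max(data, key=lambda x: abs(x - mean))
    return farthest, (farthest - mean) / sd

# Hypothetical values; replace with your own data.
hours = [2, 3, 4, 5, 5, 6, 7, 8, 10, 20]

point, z = farthest_point_z(hours)
is_outlier = abs(z) > 2   # the "more than 2 standard deviations" definition
print("farthest point:", point, "z-score:", round(z, 2), "outlier:", is_outlier)
```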

*Show all your calculations typed neatly using appropriate word processing software with a mathematics package (for example Equation Editor or MathType in MS Word).*

__Linear Regression and Correlation__

*In your research proposal (part 1) you discussed the relationship that might exist between your two quantitative variables. You are now going to examine this relationship using Linear Regression. Based on what you wrote in Part 1 of your project, state which variable you are selecting to be your explanatory (x) variable and which variable you are selecting to be your response (y) variable and explain why you made this decision.*

*Create a scatterplot of your explanatory and response variable. It must look professional (hand drawn scatterplots will receive no credit). Indicate what technology was used and a brief description of the process. Based on your scatterplot, discuss the direction, form, and strength of the association using appropriate statistical terminology. Are there any suspected outliers or clusters? Is a linear model appropriate based on your scatterplot?*

*Find the linear regression equation and report the correlation coefficient using your choice of technology. Indicate what technology was used and a brief description of the process. Does your correlation coefficient confirm your observations of the scatterplot from the previous section? Explain why or why not.*
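For reference, one way to compute the least-squares line and the correlation coefficient by hand is sketched below in plain Python. The five data pairs are invented purely for illustration (a real submission needs at least 20 cases), and the function and variable names are assumptions of this sketch rather than a required method.

```python
import math

def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical data: explanatory variable (hours) and response variable (GPA).
hours = [1, 2, 3, 4, 5]
gpa = [2.0, 2.5, 2.9, 3.4, 3.7]

slope, intercept, r = least_squares(hours, gpa)
print(f"predicted GPA = {intercept:.2f} + {slope:.2f} * hours, r = {r:.3f}")
```

A value of r close to 1 or −1 suggests a strong linear association, but it should always be read alongside the scatterplot, since r alone cannot show curvature or outliers.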

__Discussion__

*Do the results of your linear regression and correlation analysis appear to confirm or contradict your initial belief that these variables were associated in some way? Critically evaluate this conclusion by addressing both the evidence in support of your conclusion about the purported relationship between your two variables, as well as cautions or problems with your data that would weaken your case. The form, shape, and strength of your scatterplot, as well as the strength of the correlation coefficient should be discussed in this evaluation. If your data contained outliers, explain how that impacts the validity of the model. Additionally, given the sampling method you used and the limitations you reported in part II, discuss how this might limit how your answer to your original supposition generalizes to the greater population.*

*PLEASE READ: THIS IS A THREE-PART PROJECT. THE FIRST PART IS DUE IN 3 DAYS, BUT THE SECOND AND THIRD PARTS ARE NOT DUE FOR ANOTHER 2 WEEKS. I INCLUDED ALL 3 SECTIONS TO KEEP THIS PROJECT WITH JUST ONE TUTOR, SO THAT THE TUTOR IS FAMILIAR WITH ALL 3 PARTS OF THE PROJECT. IF POSSIBLE, I WOULD LIKE TO FIND SOMEONE WHO CAN SUBMIT THE FIRST AND SECOND PARTS OF THE PROJECT IN 3 DAYS AND THEN STAY CONNECTED WITH ME SO THEY CAN FINISH THE LAST PART IN 2 WEEKS. IF YOU ARE ABLE TO FINISH ALL THREE PARTS PROPERLY IN 3 DAYS, DISREGARD THE REQUEST ABOVE. ALL I ASK IS THAT THE PROJECT BE SUBMITTED IN 3 SEPARATE SECTIONS.*