This is an R data analysis project from a Canadian course; its topic is Research Proposal and Data Introduction.

## Goal of the Assessment:

The purpose of this assessment is to give you a head start on your final project. You will find an area of interest to study and real-world data to work with, and you will do some research into that area to see what has already been accomplished around your question. This is the general process for proposing a research question, and it will form the basis for a solid introduction section in your final project report. It will also give you the chance to think about the appropriateness of linear regression as a tool for statisticians. Lastly, it provides an opportunity to get feedback on your writing and research question that you can use to improve your final report.

## Instructions:

1. Decide on one (or a few possible) areas of interest that you may want to explore. This can be anything that matters or is of interest to you. Some examples could be (but are certainly not limited to) sports, medicine, public health, economics, video games, literature, etc. Pick something that you really care about.

2. Next, think about possible research questions you may want to study in these areas. What do you want to know in this area? Make sure that your question can be answered or studied using linear regression models, so frame it around modelling a relationship or predicting a value based on that relationship. You'll also want to consider whether the variable of interest would allow the assumptions of linear regression to hold (see Module 3 content). For example, a yes/no response cannot have normally distributed errors with constant variance.

3. After coming up with a research question, you will need to find some open-source data that you can use in your data analysis. Make sure that the data you find contain your response variable of interest (or variables that could be used to create it), as well as any other variables you may want to use as predictors. While looking for data online, you may realize that you need to modify your research question slightly, or pick another one, if you can't quite find the data you're looking for. Alternatively, you can stick with your research question, but be sure to mention that you expect the dataset to have many limitations because it doesn't quite meet your needs. Step 4 can also help you decide which predictors you might need to answer your question.

4. Once you've found your dataset and decided on your research question (or you can work on steps 2-4 simultaneously and use what you find in all of them to finalize your research question), look at what others have studied in relation to your research question. Do a quick search on the University of Toronto library website to learn about anything related to your area of interest and research question. Look for academic papers that studied the same question, or something related, and that tell you more about what you may need to consider in your analysis and why your research question is important.

• Focus on giving your reader a rough idea of how many papers have studied this topic (or related topics) – this tells us how popular the area of research is and how much research has been done.

• Give examples from a few important papers of what has been found to be important in relation to your question (important variables, important results, surprising results, etc.) – this tells us that you are aware of prior results and that you will use them to plan your analysis.

• Think about how your research question fits into the area of research. Is it different or new (e.g., nobody has studied it, or it hasn't been studied in this way or in this population)? – this tells us that you see the importance of what you are researching and can frame it against what has already been done.

5. Lastly, perform a short exploratory data analysis of your chosen dataset. Focus on identifying anything that you may need to consider moving forward. This includes skewness, statistical outliers, variables with high spread, observations that don't make sense, missing data, and signs that the dataset doesn't quite have what you need. Present numerical and/or graphical summaries describing the variables. Choose summaries that highlight the features of the data you want to point out while also letting your reader clearly understand the data you'll be working with.

• Specifically mention the presence (or absence) of any of the above characteristics and what it means for the analysis you will eventually perform (i.e., whether it might cause problems for the linear regression results or for their generalizability).
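
The numerical summaries in step 5 can be sketched in a few lines. The course works in R, where `summary()` and `quantile()` cover much of this. The sketch below uses only Python's standard library so that it is self-contained. It makes several assumptions. Missing entries are coded as `None`. Skewness is the moment-based sample skewness. Potential outliers are flagged with the common 1.5 × IQR rule. The function name `summarize` and the example values are hypothetical.

```python
import statistics


def summarize(values):
    """Numerical EDA summary for one variable (hypothetical helper).

    Assumes missing entries are coded as None and at least two values are observed.
    """
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = statistics.mean(observed)
    # Moment-based sample skewness: > 0 suggests a long right tail, < 0 a long left tail.
    m2 = sum((x - mean) ** 2 for x in observed) / n
    m3 = sum((x - mean) ** 3 for x in observed) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    # "inclusive" quartiles use the same interpolation as R's default quantile() (type 7).
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n_observed": n,
        "n_missing": len(values) - n,
        "mean": mean,
        "sd": statistics.stdev(observed),  # sample standard deviation (spread)
        "median": median,
        "q1": q1,
        "q3": q3,
        "skewness": skewness,
        # Values outside the 1.5 * IQR fences are candidates to inspect, not to delete.
        "outliers": [x for x in observed if x < low or x > high],
    }


# Hypothetical response variable (e.g. income in $1000s) with two missing entries.
income = [32, 35, 36, 38, 40, 41, None, 43, 45, 120, None]
for name, value in summarize(income).items():
    print(f"{name}: {value}")
```

In this hypothetical example, the quartiles are 36 and 43. The value 120 falls above the upper fence (43 + 1.5 × 7 = 53.5) and pulls the skewness positive. In the proposal, you would then explain what that means for the regression. For instance, it might call for a closer look at that observation or a transformation of the variable.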
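
Step 2 asks whether the assumptions of linear regression could plausibly hold for your variable of interest. A first look can come from fitting a simple least-squares line and inspecting the residuals. In R, `lm()` followed by `plot()` on the fitted model gives the standard residual plots. The plain-Python sketch below makes the idea concrete, and its helpers `fit_line` and `spread_ratio` are hypothetical. `spread_ratio` is a crude, informal indicator of non-constant variance, not a formal test. It compares the residual variance in the upper and lower halves of the fitted values.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def spread_ratio(fitted, resid):
    """Crude check (not a formal test): residual variance for the upper half of
    fitted values divided by that for the lower half. Values far from 1 hint at
    non-constant variance. Needs at least four observations."""
    pairs = sorted(zip(fitted, resid))
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return statistics.variance(upper) / statistics.variance(lower)


# Hypothetical predictor and response values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.6, 14.5, 15.7]
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print("residual spread ratio:", round(spread_ratio(fitted, resid), 2))
```

With only a handful of points, this ratio is very noisy. The residual-versus-fitted plot is usually the more informative diagnostic. A clear funnel or curve in that plot is a sign that the assumptions may not hold for the chosen response.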