Research Proposal and Data Introduction (R data analysis project)
The purpose of this assessment is to give you a head start with your final project by finding an
area of interest to study, real-world data to work with, and to research a little into your area of
interest to see what has been accomplished surrounding your question. This is the general
process in proposing a research question and will form the basis for a solid introduction section
for your final project report. It will also give you the chance to think about the appropriateness
of linear regression as a tool for statisticians. Lastly, it provides an opportunity to get some
feedback on your writing and research question that can be used to improve your final report.
1. Decide on one (or a few possible) areas of interest that you may want to explore. This
can be anything that matters or is of interest to you. Some examples could be (but are
certainly not limited to) sports, medicine, public health, economics, video games,
literature, etc. Pick something that you really care about.
2. Next, think about possible research questions you may want to study in these areas.
What do you want to know in this area? Make sure that your question can be
answered or studied using linear regression models. To do so, frame
your question as something related to modelling a relationship or predicting a value
based on this relationship. You’ll also want to consider whether the variable of interest
would allow the assumptions of linear regression to hold (see Module 3 content).
3. After coming up with a research question, you will need to find some open-source data
that you may use in your data analysis. You want to make sure that the data you find
has your response variable of interest (or has variables that could be used to create that
variable), as well as any other variable you may want to use as predictors. By looking for
data online, you may realize you need to modify your research question slightly or pick
another one if you can’t quite find the data you’re looking for. Alternatively, you can
stick with your research question but be sure to mention that you expect there to be
many limitations to the dataset because it doesn’t quite meet your needs. Step 4 can
also help you decide what predictors might be needed for you to answer your question.
4. Once you’ve found your dataset and have decided on your research question (or you
can work on steps 2-4 simultaneously and use what you find in all of them to finalize
your research question), you need to look at what others have studied in relation to
your research question. Do a quick search on the University of Toronto library website
to learn about anything related to your area of interest and research question. Look for
academic papers that studied the same question, or something related, that tell you a
bit more about what you may need to consider in your analysis and why your research
question is important.
• Focus on giving your reader a rough idea of how many papers have studied this
topic (or topics related to it) – this tells us how popular the area of research is
and how much research has been done.
• Give examples from a few important papers about what has been
found/discovered to be important in relation to your question (this can be
important variables, important results, surprising results, etc.) – this tells us that
you are aware of prior results and that you will be using these to plan your analysis.
• Think about how your research question fits into the area of research. Is it
different or new (e.g. nobody has studied this, or maybe it hasn’t been done in
this way or this population, etc.)? – this tells us that you see the importance of
what you are researching and can frame it against what has already been done.
5. Lastly, perform a short exploratory data analysis of your chosen dataset. You’ll want to
focus on identifying anything that you may need to consider moving forward. This
includes identifying skewed distributions, statistical outliers, variables with high spread,
observations that don’t make sense, missing data, or a dataset that you think doesn’t quite have
what you need. You’ll need to present numerical and/or graphical summaries describing
the variables. Choose the options that highlight the features of the data that you want
to point out but will also let your reader clearly understand the data that you’ll be using.
• You want to make sure you specifically mention the presence of any of the above
characteristics (or lack thereof) and what this means for the analysis you will
eventually perform (i.e. how this might cause problems (or not) with the results
of linear regression or generalizability).
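The numerical summaries described in step 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python with the standard library (the course itself may expect R), the function name `summarize` and the `incomes` values are hypothetical and not taken from any real dataset, and the 1.5 × IQR rule for flagging outliers is one common convention rather than something this assignment prescribes.

```python
import statistics


def summarize(values):
    """Return basic EDA summaries for one numeric variable.

    `values` may contain None to mark a missing observation.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    mean = statistics.mean(present)

    # Quartiles and the interquartile range (a measure of spread).
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1

    # Assumed convention: flag points beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    # Moment-based skewness: positive values indicate a long right tail.
    m2 = sum((v - mean) ** 2 for v in present) / n
    m3 = sum((v - mean) ** 3 for v in present) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    return {
        "n_missing": len(values) - n,
        "mean": mean,
        "median": statistics.median(present),
        "sd": statistics.stdev(present),  # sample standard deviation
        "iqr": iqr,
        "skewness": skewness,
        "outliers": outliers,
    }


# Hypothetical variable with one missing value and one extreme value.
incomes = [32, 35, 38, 40, 41, 43, 45, 47, None, 250]
summary = summarize(incomes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A large gap between the mean and the median, a clearly positive skewness, or flagged outliers would each be worth mentioning in the write-up, along with what they might imply for the linear regression assumptions.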