1. Word Counts: Copy the Plain Text UTF-8 version of Bram Stoker’s Dracula from the site:
http://www.gutenberg.org/ebooks/345 into a text document called ‘dracula.txt’. Write Python code that reads the text document, counts the unique words, and stores the words and their counts in a dictionary. Write the words and counts to a text file called ‘dracula_words.txt’ using the Python file handle method seen in class. Generate a list called ‘sorted_counts’ that contains the (value, key) pairs as tuples, sorted in descending order of word frequency. For example, it should be in the form:
. . .
Here, the word ‘the’ occurs most frequently. Output the 10 most frequently occurring words. How often does the word ‘Dracula’ come up in the entire novel?
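The reading, counting, and writing steps can be sketched as follows. This is a minimal sketch, not necessarily the method from class: it assumes words are runs of ASCII letters with optional internal apostrophes (for example don't), compared case-insensitively, and it writes one "word count" pair per line. The names count_words, write_counts, and main are hypothetical. Note that the Gutenberg file includes a license header and footer, which will be counted unless you trim them first.

```python
import re

# Assumption: a word is a run of ASCII letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def count_words(text):
    """Return a dict mapping each lowercased word to its number of occurrences."""
    # Treat the typographic apostrophe (U+2019) like a straight one.
    text = text.lower().replace("\u2019", "'")
    counts = {}
    for word in WORD_RE.findall(text):
        counts[word] = counts.get(word, 0) + 1
    return counts


def write_counts(counts, path):
    """Write one 'word count' pair per line through an explicit file handle."""
    with open(path, "w", encoding="utf-8") as fhand:
        for word, count in counts.items():
            fhand.write(f"{word} {count}\n")


def main(src="dracula.txt", dst="dracula_words.txt"):
    """Read the novel, count its words, and write the counts to dst."""
    with open(src, encoding="utf-8") as fhand:
        counts = count_words(fhand.read())
    write_counts(counts, dst)
    return counts
```

Calling counts = main() once dracula.txt is in the working directory produces the dictionary and the output file.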
Hint: use a list comprehension. For the dictionary ‘counts’, the command ‘counts.items()’ returns an object of type dict_items that contains the key-value pairs as tuples. You can iterate over these with the command: for (key, value) in counts.items(). Thus, you can get the list by completing the following code: sorted_counts = [(value, key) for (key, value) in ??? ??? ].
You can then sort this list directly, because tuples compare element by element: first by the count, then by the word when counts tie. Sort in descending order so the most frequent words come first.
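Following the hint, building sorted_counts, taking the top 10, and looking up ‘Dracula’ might look like this. It assumes the counts dictionary was built case-insensitively (so the key is 'dracula'); sort_counts and word_count are hypothetical helper names. With reverse=True, words that tie on frequency come out in reverse alphabetical order.

```python
def sort_counts(counts):
    """Return (count, word) tuples, most frequent first."""
    # Flip each (key, value) pair so the tuples compare by count first.
    sorted_counts = [(value, key) for (key, value) in counts.items()]
    # Tuples compare element by element; reverse=True gives descending order.
    sorted_counts.sort(reverse=True)
    return sorted_counts


def word_count(counts, word):
    """Occurrences of `word`, assuming the dictionary keys are lowercase."""
    return counts.get(word.lower(), 0)
```

Usage: sorted_counts = sort_counts(counts), then print(sorted_counts[:10]) for the top 10 and print(word_count(counts, "Dracula")) for the novel's title character. Under this tokenization, possessives such as "dracula's" are counted as separate words, so decide whether to add them in.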
2. Linear and Non-Linear Regression:
(a) Run linear regression on the BTC closing price data for 12/15/2020 through 12/28/2020. Report the r-squared value on your training data. Use this model to predict the BTC closing prices on 1/1/2021 and 1/8/2021. How close are your predictions to the actual closing prices?
Repeat with polynomial regression.
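One way to fit both models is to regress the closing price on a day number (0 for the first day in the range) with numpy.polyfit, where degree 1 gives linear regression and a higher degree gives polynomial regression. This sketch assumes one close per consecutive calendar day starting at the start date, and that you load the closes yourself from whatever price source you use; fit_and_predict and r_squared are hypothetical names, and the polynomial degree is your choice.

```python
from datetime import date

import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination of the fitted values on the training data."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_predict(start, closes, targets, degree=1):
    """Fit closing price vs. day number and predict closes on target dates.

    start:   date of the first close (assumes consecutive daily closes)
    closes:  sequence of closing prices, one per day
    targets: list of datetime.date objects to predict
    degree:  1 for linear regression, >1 for polynomial regression
    """
    y = np.asarray(closes, dtype=float)
    x = np.arange(len(y), dtype=float)  # day 0, 1, 2, ...
    coeffs = np.polyfit(x, y, degree)  # least-squares fit, highest power first
    r2 = r_squared(y, np.polyval(coeffs, x))
    # Convert each target date into the same day-number scale as x.
    x_new = np.array([(t - start).days for t in targets], dtype=float)
    return coeffs, r2, np.polyval(coeffs, x_new)
```

For part (a), fit_and_predict(date(2020, 12, 15), closes, [date(2021, 1, 1), date(2021, 1, 8)], degree=1) returns the coefficients, the training r-squared, and the two predictions; rerun with degree=2 or higher for the polynomial model. A higher degree will raise the training r-squared but can extrapolate poorly a week or more past the data.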
(b) Run linear regression on the BTC closing price data for 2/2/2021 through 2/15/2021. Report the r-squared value on your training data. Use this model to predict the BTC closing prices on 2/19/2021 and 2/21/2021. How close are your predictions to the actual closing prices?
Repeat with polynomial regression.
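To quantify how close a prediction is, one option (an assumption, since the assignment does not name a metric) is to report the absolute and percentage error of each prediction against the actual close on that date. prediction_errors is a hypothetical helper, and it assumes the actual prices are nonzero.

```python
def prediction_errors(predicted, actual):
    """Return (absolute error, percent error) for each predicted/actual pair."""
    results = []
    for p, a in zip(predicted, actual):
        abs_err = abs(p - a)
        # Percent error relative to the actual close (assumes a != 0).
        results.append((abs_err, 100.0 * abs_err / a))
    return results
```

The same helper works for both parts (a) and (b) and for both the linear and polynomial predictions.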
(c) Based on the predictions from your regression on the 12/15/2020 to 12/28/2020 data, what is the average daily rate of change of BTC closing prices? What about for the 2/2/2021 to 2/15/2021 data?
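Under one reading of (c), the average daily rate of change is the difference between two predicted closes divided by the number of days between them. For a linear fit this equals the slope of the fitted line whichever two dates you use, while for a polynomial fit it depends on the dates chosen. A sketch with a hypothetical helper:

```python
from datetime import date


def average_daily_change(date1, price1, date2, price2):
    """Average change in closing price per day between two dated prices."""
    days = (date2 - date1).days
    if days == 0:
        raise ValueError("the two dates must differ")
    return (price2 - price1) / days
```

For example, average_daily_change(date(2021, 1, 1), p_jan1, date(2021, 1, 8), p_jan8), where p_jan1 and p_jan8 are your predicted closes for those dates.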