Written assignment: follow the document for the question.
- Chapter 1 outlines the tidy text format and the unnest_tokens() function. It also introduces the gutenbergr and janeaustenr packages, which provide useful literary text datasets that we’ll use throughout this book.
- Chapter 2 shows how to perform sentiment analysis on a tidy text dataset, using the sentiments dataset from tidytext and inner_join() from dplyr.
- Chapter 3 describes the tf-idf statistic (term frequency times inverse document frequency), a quantity used for identifying terms that are especially important to a particular document.
- Chapter 4 introduces n-grams and how to analyze word networks in text using the widyr and ggraph packages.
- Chapter 5 introduces methods for tidying document-term matrices and corpus objects from the tm and quanteda packages, as well as for casting tidy text datasets into those formats.
- Chapter 6 explores the concept of topic modeling, and uses the tidy() method to interpret and visualize the output of the topicmodels package.
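
To make the first and third items more concrete, here is a minimal sketch of how the tidy text format and the tf-idf statistic fit together. It assumes the dplyr and tidytext packages are installed; the two-document corpus and the names docs, tidy_words, and word_tf_idf are invented for illustration and do not come from the book.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus: two tiny "documents" made up for this sketch
docs <- tibble(
  doc  = c("a", "b"),
  text = c("the cat sat on the mat", "the dog chased the cat")
)

# Tidy text format: one token (here, one word) per row
tidy_words <- docs %>%
  unnest_tokens(word, text)

# Count each word within each document, then add tf, idf, and tf_idf columns
word_tf_idf <- tidy_words %>%
  count(doc, word, sort = TRUE) %>%
  bind_tf_idf(word, doc, n)

# Words with the highest tf-idf are the most distinctive for their document
word_tf_idf %>%
  arrange(desc(tf_idf))
```

In this toy corpus, "the" and "cat" occur in both documents, so their inverse document frequency is ln(2/2) = 0 and their tf-idf is 0. Words that occur in only one document, such as "sat" or "dog", get a positive score, which is the sense in which tf-idf picks out terms that are important to a particular document.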