Why?
To understand the different credibility levels of sources that discuss Covid within the AoT collection
What?
Plain text retrieved from working URLs in AoT that mention Covid-related topics
How?
- Checked with a script which links still work
- Downloaded all the articles into a pandas data frame: text, link, keywords, summaries (Newspaper3k package)
- Ran a text search with the 10 most frequent Covid-related terms, then manually filtered out the irrelevant results in OpenRefine
- After the literature review, decided to use the "information credibility" framework. This framework divides sources into three categories: credible, questionable, and non-credible.
- Manually categorised the articles following the framework guidelines
- Ran word frequency analysis and topic modelling on all categories (Python: NLTK, spaCy, gensim)
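
The term search and the per-category word count in the steps above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's actual code, which used pandas, NLTK, spaCy and gensim. The term list, the stop-word list and the example rows are all made up for the sketch; the notes do not record the 10 terms that were actually used.

```python
import re
from collections import Counter

# Hypothetical term list; the 10 terms actually used are not recorded in these notes.
COVID_TERMS = ["covid", "coronavirus", "pandemic", "lockdown", "vaccine"]

# Tiny illustrative stop-word list; NLTK and spaCy ship fuller ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "that"}


def mentions_covid(text, terms=COVID_TERMS):
    """Return True if any word in text starts with one of the terms (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def word_frequencies(texts, top_n=10):
    """Count lowercase alphabetic tokens across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up example rows: (article text, manually assigned category).
articles = [
    ("The vaccine trial results were published today.", "credible"),
    ("Local football results and weather.", "credible"),
    ("Lockdown is a hoax and the vaccine is a hoax.", "non-credible"),
]

# Step 1: keep only the articles that mention a Covid-related term.
relevant = [(text, cat) for text, cat in articles if mentions_covid(text)]

# Step 2: group the remaining texts by category and count words per category.
by_category = {}
for text, cat in relevant:
    by_category.setdefault(cat, []).append(text)

freqs = {cat: word_frequencies(texts, top_n=3) for cat, texts in by_category.items()}
print(freqs)
```

A prefix match is used here so that "covid" also catches forms such as "COVID-19", at the cost of some false positives. That is one reason a manual filtering pass, like the one done in OpenRefine, is still needed afterwards.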
Results
Credible
104 rows
Non-credible
33 rows