Why?

To understand the different credibility levels of Covid-related content within the AoT collection

What?

Plain text extracted from working AoT URLs that mention Covid-related topics

How?

  1. Checked which links still work, using code
  2. Downloaded all the articles into a pandas DataFrame: text, link, keywords, summaries (Newspaper3k package)
  3. Ran a text search with the 10 most frequent Covid-related terms, then manually filtered out the irrelevant results in OpenRefine
  4. After the literature review, decided to use the "information credibility" framework. This framework divides sources into three categories: credible, questionable, and non-credible.
  5. Manually categorised the articles following the framework guidelines
  6. Ran word frequency analysis and topic modelling on all categories (Python: nltk, spaCy, gensim)
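
A minimal sketch of steps 1, 3 and 6 (word frequency only), using just the Python standard library so it runs without the packages listed above. It is not the code used in the project: the function names, the term list, and the stopword list are all assumptions for illustration, and the notes do not record which 10 terms were actually searched.

```python
import http.client
import re
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical term list; the 10 terms actually used are not recorded here.
COVID_TERMS = ["covid", "coronavirus", "lockdown", "vaccine"]

# Tiny illustrative stopword list; nltk and spaCy ship fuller ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def is_working(url, timeout=10.0):
    """Step 1: True if the URL answers a HEAD request without an HTTP error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        # Covers 4xx/5xx responses, DNS failures, timeouts and malformed URLs.
        # Some servers reject HEAD, so a False here may deserve a GET retry.
        return False


def mentions_covid(text, terms=COVID_TERMS):
    """Step 3: case-insensitive search for words starting with any term."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def word_frequencies(texts, top_n=10):
    """Step 6: count lowercase alphabetic tokens, ignoring stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up example texts, not articles from the collection.
articles = [
    "Lockdown rules and the vaccine rollout",
    "A recipe for sourdough bread",
    "Vaccine myths about Covid-19",
]
relevant = [a for a in articles if mentions_covid(a)]
print(len(relevant))                        # 2
print(word_frequencies(relevant, top_n=1))  # [('vaccine', 2)]
```

The keyword search only narrows the set; as in step 3, the remaining articles still need a manual pass to drop false positives.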

Results

Credible

104 rows

Non-credible

33 rows