Previous datasets

Archived dataset: WEBSPAM-UK2002

This collection was labeled at the University of Rome La Sapienza. The following file contains more than 5,300 class labels for hosts in the .UK domain in 2002. The labels were assigned manually, based on domain names:

Details about the classification process can be found in [1,2]. The URLs and hyperlinks are available from the UK-2002 download page at the Laboratory of Web Algorithmics. Unfortunately, we do not have the contents of the pages.

For inquiries, contact Carlos Castillo.