The Web spam datasets on this site are provided to advance research on Web spam detection, thanks to a collaborative effort by a team of volunteers. These labels are intended for research purposes only. We advise you not to use these labels directly for search engine ranking or filtering.
The labels and graphs are freely available as per this license (CC-by-nc-sa), which basically states that you are free to use the labels and that we make no warranties about them. You can download and use the labels for research at any institution, public or private. The "nc-sa" (non-commercial, share-alike) rule applies if you want to redistribute the labels publicly.
As per our privacy policy, all human judgments about the collection have been anonymized.
Researchers must sign a research-only data usage agreement before obtaining the contents of the pages.
The WEBSPAM-UK2002 and WEBSPAM-UK2006 datasets were obtained at the Algorithmic Engineering group at Università di Roma "La Sapienza" as part of the DELIS project.
The .UK crawls 2006-05 and 2007-05 were downloaded by the Laboratory of Web Algorithmics, Università degli Studi di Milano, with the support of the DELIS EU-FET research project.
For inquiries, contact Carlos Castillo.