Privacy Policy

Scope

This policy applies to all volunteers participating in the Web spam assessment process.

Policy

All human-provided labels are anonymized. The final result of the classification process is a text file essentially equivalent to the following:

...
site1.example.com user#12:spam user#34:normal user#6:spam
site2.example.com user#6:spam user#12:spam
...
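As an illustration only, a line of this file could be read as sketched below. The sketch assumes that fields are separated by whitespace and that each assessment has the form user:label, as in the sample above; the function name parse_label_line is hypothetical and not part of the collection.

```python
from typing import Dict, Tuple


def parse_label_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split one line into the host name and a user -> label mapping.

    Assumes whitespace-separated fields of the form "user#N:label",
    as in the sample shown in this policy.
    """
    host, *pairs = line.split()
    labels: Dict[str, str] = {}
    for pair in pairs:
        # Split on the last colon so the anonymized user ID stays intact.
        user, _, label = pair.rpartition(":")
        labels[user] = label
    return host, labels


host, labels = parse_label_line(
    "site1.example.com user#12:spam user#34:normal user#6:spam"
)
print(host, labels)
```

Only anonymized identifiers such as user#12 appear in the parsed result; no volunteer names are present in the file.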

A list of all "view" and "label" events will also be publicly available. It includes the timestamp at which each host was viewed and labeled, and possibly re-labeled if the assessor corrected the label. The list has the form:

...
user#12 view site1.example.com timestamp1
user#12 label site1.example.com borderline timestamp2
user#12 label site1.example.com spam timestamp3
user#34 view site1.example.com timestamp4
user#34 label site1.example.com normal timestamp5
...
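As a rough sketch of how such an event list might be processed, the example below parses each event and keeps the last label each assessor gave to each host, so that corrections override earlier labels. It assumes whitespace-separated fields, a timestamp that is a single token, and events listed in chronological order; none of this is confirmed by the policy, and the names Event, parse_event_line and final_labels are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    user: str
    action: str            # "view" or "label"
    host: str
    label: Optional[str]   # None for "view" events
    timestamp: str


def parse_event_line(line: str) -> Event:
    """Parse one "view" (4 fields) or "label" (5 fields) event line."""
    fields = line.split()
    if len(fields) == 4 and fields[1] == "view":
        user, action, host, timestamp = fields
        return Event(user, action, host, None, timestamp)
    if len(fields) == 5 and fields[1] == "label":
        user, action, host, label, timestamp = fields
        return Event(user, action, host, label, timestamp)
    raise ValueError(f"unrecognized event line: {line!r}")


def final_labels(events: Iterable[Event]) -> Dict[Tuple[str, str], str]:
    """Keep the last label per (user, host), assuming chronological order."""
    result: Dict[Tuple[str, str], str] = {}
    for event in events:
        if event.action == "label" and event.label is not None:
            result[(event.user, event.host)] = event.label
    return result


lines = [
    "user#12 view site1.example.com timestamp1",
    "user#12 label site1.example.com borderline timestamp2",
    "user#12 label site1.example.com spam timestamp3",
]
print(final_labels(parse_event_line(line) for line in lines))
```

In this sample, the later "spam" label replaces the earlier "borderline" label for the same assessor and host.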

Comments about sites may be included in the resulting dataset, in the form:

...
site1.example.com comment
...

We include a list of the names of all volunteers on the acknowledgments/credits page of the collection, unless they opt out.

If you help during the classification, you may see other volunteers' labels, but the public collections include no information that would allow anyone to connect your name with your assessments.

For inquiries, contact Carlos Castillo.