This applies to all volunteers participating in the Web spam assessment process.
All the human-provided labels are anonymized. The final result of the classification process is a text file essentially equivalent to the following:
    ...
    site1.example.com user#12:spam user#34:normal user#6:spam
    site2.example.com user#6:spam user#12:spam
    ...
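As an illustration of how such a file could be read, here is a minimal Python sketch. It assumes one host per line, followed by whitespace-separated `user:label` pairs; the exact layout of the released file may differ, and the function names `parse_label_line` and `label_counts` are hypothetical helpers, not part of the collection.

```python
from collections import Counter


def parse_label_line(line):
    """Split 'host user#N:label ...' into (host, {user: label}).

    Assumption: fields are whitespace-separated and each label field
    has the form 'user:label'.
    """
    host, *pairs = line.split()
    labels = {}
    for pair in pairs:
        # rpartition splits on the last ':' so the user id stays intact.
        user, _, label = pair.rpartition(":")
        labels[user] = label
    return host, labels


def label_counts(labels):
    """Count how many anonymized assessors gave each label."""
    return Counter(labels.values())


host, labels = parse_label_line(
    "site1.example.com user#12:spam user#34:normal user#6:spam"
)
counts = label_counts(labels)
```

Note that the user identifiers are the anonymized ones from the file, so nothing in this sketch ties a label to a volunteer's name.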
A list of all the "view" and "label" events will also be publicly available. It includes the timestamp at which each host was viewed and labeled (and possibly re-labeled, if the assessor corrected the label), in the form:
    ...
    user#12 view site1.example.com timestamp1
    user#12 label site1.example.com borderline timestamp2
    user#12 label site1.example.com spam timestamp3
    user#34 view site1.example.com timestamp4
    user#34 label site1.example.com normal timestamp4
    ...
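Because an assessor may correct a label, the event list can contain several "label" events for the same user and host. The following Python sketch keeps only the last one. It assumes one event per line, whitespace-separated fields in the order shown above, and events listed in chronological order; `final_labels` is a hypothetical helper name.

```python
def final_labels(lines):
    """Return {(user, host): label}, keeping the most recent label.

    Assumptions: each line is one event, 'label' events have five
    fields (user, 'label', host, label, timestamp), and lines appear
    in chronological order, so later lines overwrite earlier ones.
    """
    final = {}
    for line in lines:
        fields = line.split()
        # 'view' events have only four fields and carry no label.
        if len(fields) == 5 and fields[1] == "label":
            user, _event, host, label, _timestamp = fields
            final[(user, host)] = label
    return final


events = [
    "user#12 view site1.example.com timestamp1",
    "user#12 label site1.example.com borderline timestamp2",
    "user#12 label site1.example.com spam timestamp3",
    "user#34 view site1.example.com timestamp4",
    "user#34 label site1.example.com normal timestamp4",
]
result = final_labels(events)
```

In this example the earlier "borderline" label from user#12 is replaced by the later "spam" label, which matches the final labels shown in the classification file.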
Comments about sites may be included in the resulting dataset, in the form:
    ...
    site1.example.com comment
    ...
We include a list of the names of all the volunteers in the acknowledgments/credits page of the collection, unless they opt out.
If you help with the classification, you may see other volunteers' labels, but the public collections include no information that would allow anyone to connect your name with your assessments.
For inquiries, contact Carlos Castillo.