Information Credibility on Twitter (presentation)

Here is the presentation I gave on this paper:

C. Castillo, M. Mendoza, B. Poblete: "Information Credibility on Twitter". Proc. of WWW 2011, Hyderabad, India. ACM Press.

Lawrence Lessig: "Citizens"

Lessig explains corruption in 10 minutes.

"There are a thousand hacking at the branches of evil to one who is striking at the root." — Thoreau

TAXOMO sequence-mining tool available

I am glad to announce that today we released the TAXOMO sequence mining software under a BSD license.

TAXOMO is a data-mining tool for sequences. It takes as input a set of sequences and a taxonomy, and generates a succinct description of the sequences (specifically, a Markov chain with lumped states).

The input sequences may represent any kind of data, e.g., trajectories on a map or web pages visited by a user. The taxonomy should be defined over the states in the sequences. For a map, for instance, the taxonomy can consist of regions and sub-regions containing the points in the map. For a web site, it can consist of categories and sub-categories of the pages.
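To make the idea of "a Markov chain with lumped states" more concrete, here is a minimal Python sketch. It is not TAXOMO's code and does not reproduce how TAXOMO decides which states to lump: it simply replaces every state by its parent category and estimates first-order transition probabilities. The function names (`lump`, `transition_probabilities`), the page names, and the one-level taxonomy are all made up for illustration.

```python
from collections import Counter, defaultdict


def lump(sequences, parent_of):
    """Replace each state by its parent category; states without a parent stay as-is."""
    return [[parent_of.get(state, state) for state in seq] for seq in sequences]


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from a list of sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every consecutive pair (a, b) as one observed transition a -> b.
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }


# Hypothetical input: pages visited by three users of a web site.
sequences = [
    ["home", "football", "tennis", "home"],
    ["home", "politics", "economy"],
    ["football", "tennis", "politics"],
]

# Hypothetical one-level taxonomy: page -> category.
parent_of = {
    "football": "sports",
    "tennis": "sports",
    "politics": "news",
    "economy": "news",
}

lumped = lump(sequences, parent_of)
chain = transition_probabilities(lumped)

for state, row in chain.items():
    print(state, row)
```

With this toy input, the five original pages collapse into three lumped states ("home", "sports", "news"), so the resulting chain is smaller and easier to read than one built on the raw pages.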

TAXOMO was developed at Yahoo! Research Barcelona, and it is described in:

Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis: "Taxonomy-driven lumping for sequence mining". Data Mining and Knowledge Discovery, Springer, Volume 19, Issue 2, pp. 227-244 (2009)

For more information and download, see: http://taxomo.sourceforge.net/

Finding the "best" and the "worst" on the Web and Social Media

Call for papers: Workshop on Web Quality (joint WICOW/AIRWeb workshop)

In conjunction with the 20th International World Wide Web Conference in Hyderabad, India. DEADLINE: 31/Jan/2011

The objective of the workshop is to provide the research communities working on web spam, abuse, credibility, and reputation topics with a survey of current problems and potential solutions. It will present an opportunity for close interaction between practitioners who may have focused on more isolated sub-areas previously. We also want to gather crucial feedback for the academic community from participants representing major industry players on how web content quality research can contribute to practice.

On the one hand, the joint workshop will cover the more blatant and malicious attempts to degrade web quality, such as spam, plagiarism, and various forms of abuse, along with ways to prevent them or neutralize their impact on information retrieval. On the other hand, it will also provide a venue for exchanging ideas on quantifying finer-grained issues of content credibility and author reputation, and on modeling them in web information retrieval.


Modern Information Retrieval - Second Edition is finally out!

Modern Information Retrieval, 2nd ed. will be a key textbook in information retrieval during the coming years, and I expect it to be as successful as its first incarnation.

The book has been completely rewritten, with four new chapters, a new appendix, many new topics, and more teaching resources. Overall, it has 18 chapters and appendices, 913 pages (more than 1,100 in the font of the first edition), and 1,800 references.

Ricardo Baeza-Yates and Berthier Ribeiro-Neto wrote about 60% of the book and participated in all but 5 chapters. Many people collaborated in several chapters; in my case I co-authored the chapter on Web Crawling.

Several chapters, as well as slides for teaching, can be downloaded from http://mir2ed.org/
