⚠️ This website has moved to chato.cl/life

Profession and academia

Modern Information Retrieval - Second Edition is finally out!

Modern Information Retrieval, 2nd ed. will be a key textbook in information retrieval during the coming years, and I expect it to be as successful as its first incarnation.

The book has been completely rewritten, including four new chapters, a new appendix, many new topics, and more teaching resources. Overall, it has 18 chapters and appendices, 913 pages (more than 1,100 if typeset in the font of the first edition), and 1,800 references.

Ricardo Baeza-Yates and Berthier Ribeiro-Neto wrote about 60% of the book and participated in all but 5 chapters. Many people collaborated on several chapters; in my case, I co-authored the chapter on Web Crawling.

Several chapters, as well as slides for teaching, can be downloaded from http://mir2ed.org/

Yahoo! Clues lets you explore how people search

Today Yahoo! launched Clues, a service based on science from Yahoo! Research Barcelona.

Yahoo! Clues lets you explore how people are using Yahoo! Search. When you enter a word or phrase in the "Search Term" field and click Discover, you’ll see information about that search term’s popularity over time, across demographic groups, and in different locations.

You can also enter a second search term in the "Compare With" field. This will show you information on both search terms, side by side.

Among other things, it allows users to browse a real query-flow graph:

clues.yahoo.com

Slides from paper on automatic creation of teams

These are the slides from our paper [pdf] on creating teams automatically. It was presented last week by Aris Gionis at CIKM'10 in Toronto, Canada.

This research is about creating and assigning teams on-the-fly as a stream of tasks arrives. It is particularly useful for "horizontal" organizations where there is no single control point deciding who gets to do what. The algorithm we present tries to balance effectiveness (allocating the right team to each task) and fairness (dividing the workload evenly among people, even if they have different skills).

Aris Anagnostopoulos, Carlos Castillo, Aristides Gionis, Luca Becchetti, Stefano Leonardi: "Power in Unity: Forming Teams in Large-Scale Community Systems" [pdf]. Proc. of CIKM 2010, pp. 599-608. Toronto, Canada. ACM Press.
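As a rough illustration of the effectiveness/fairness trade-off described above, and not the algorithm from the paper, a simple greedy heuristic could cover each incoming task's required skills while penalizing people who already carry a heavy workload. The `Person` type, the `form_team` function, the skill names, and the squared-load penalty are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: people are compared by identity, not by field values
class Person:
    name: str
    skills: frozenset
    load: int = 0  # number of tasks assigned to this person so far


def form_team(task_skills, people):
    """Greedily build a team that covers every skill the task requires.

    Hypothetical heuristic: at each step pick the person with the highest
    score (newly covered skills) / (1 + current load) ** 2, so a heavily
    loaded person must bring many more missing skills to be chosen again.
    Returns the chosen people, or None if the skills cannot be covered.
    """
    uncovered = set(task_skills)
    team = []
    while uncovered:
        best, best_score = None, 0.0
        for person in people:
            if person in team:
                continue
            gain = len(uncovered & person.skills)
            score = gain / (1 + person.load) ** 2
            if score > best_score:  # strict ">" keeps the earliest person on ties
                best, best_score = person, score
        if best is None:
            return None  # nobody left covers any of the missing skills
        team.append(best)
        uncovered -= best.skills
    for person in team:
        person.load += 1  # workload grows, lowering this person's future scores
    return team


people = [
    Person("ana", frozenset({"ml", "web"})),
    Person("ben", frozenset({"ml"})),
    Person("eva", frozenset({"web"})),
]
for task in [{"ml", "web"}, {"ml", "web"}, {"ml"}]:
    print([p.name for p in form_team(task, people)])
# prints ['ana'], then ['ben', 'eva'], then ['ana']
```

The first task goes to the single person who covers everything; once that person has a task, the second identical task is spread over two less-loaded people instead. The strength of the load penalty is an arbitrary knob that trades team size (effectiveness) against even workload (fairness).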

Twitter After a Disaster: Is It Reliable?

Wall Street Journal, Digits blog
By Jennifer Valentino-DeVries

Can you trust Twitter in a disaster?

Researchers at Yahoo analyzed tweets after the Chilean earthquake earlier this year and found evidence that the Twitter community works like a “collaborative filter,” questioning reports that turn out to be fake and confirming those that are true.

Lately the microblogging service has been a source of news after catastrophes such as earthquakes. It’s quick and easy and can be a direct conduit from eyewitnesses to the outside world. But like anything else coming from unsubstantiated sources, news from Twitter faces big questions of credibility.

The Yahoo researchers didn’t find anything to suggest that information should be considered reliable because it’s tweeted or re-tweeted. But what they found was that when false rumors entered Twitter, about half of the tweets related to the information denied it.

When the researchers studied tweets about rumors that were later confirmed to be true, they found that less than 1% of tweets about that information denied it. Some tweets questioned the true information, but eventually it became supported, and almost all tweets affirmed it.

In one example, news of a tsunami after the Chilean earthquake on Feb. 27 spread quickly through Twitter “while government authorities ignored its existence,” the authors — Marcelo Mendoza, Barbara Poblete and Carlos Castillo — wrote. That rumor turned out to be true.

News of a baseless rumor about a tsunami warning elsewhere in Chile also spread quickly through Twitter, but the vast majority of tweets mentioning it denied or questioned it.

The authors of the Yahoo study suggest that Twitter programs could provide a service to analyze tweets and warn users when a lot of other tweeters are questioning information they are reading. “This would provide signals for users to determine how much to trust a certain piece of information,” they wrote in the paper, which was presented at a social-media analytics workshop in late July.

The paper also takes a look at how information spread through Twitter after the earthquake — who was retweeting whom and how the conversation about the quake changed over time. On the first day, tweets about the earthquake itself and about tsunami warnings dominated. That news was replaced by tweets focused on missing people and requests for help — and then finally about news that the quake might have shifted the Earth’s axis.



Marcelo Mendoza, Barbara Poblete, Carlos Castillo: "Twitter Under Crisis: Can we trust what we RT?". In SOMA 2010: KDD Workshop on Social Media Analytics, Washington, DC. July 2010. [bib|soma]


Original article in WSJ.COM
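In its simplest form, the warning service the authors suggest could compute the share of on-topic tweets that deny or question a claim and raise a flag above some threshold. This is a hypothetical sketch: it assumes each tweet has already been labeled with a stance, and the label names, the `credibility_signal` function, and the 0.3 threshold are illustrative choices, not taken from the paper.

```python
from collections import Counter


def credibility_signal(stances, threshold=0.3):
    """Return (fraction of on-topic tweets that deny or question a claim, warn?).

    `stances` holds one hypothetical label per tweet: "affirms", "denies",
    "questions" or "unrelated". The default threshold is an arbitrary example.
    """
    counts = Counter(stances)  # missing labels count as 0
    on_topic = counts["affirms"] + counts["denies"] + counts["questions"]
    if on_topic == 0:
        return 0.0, False  # nothing to judge yet
    doubt = (counts["denies"] + counts["questions"]) / on_topic
    return doubt, doubt >= threshold


# Made-up counts that loosely mirror the proportions described in the article
print(credibility_signal(["affirms"] * 50 + ["denies"] * 50))    # (0.5, True)
print(credibility_signal(["affirms"] * 99 + ["questions"] * 1))  # (0.01, False)
```

A real system would also need to classify stances automatically and decide when enough tweets have accumulated to trust the ratio; both are left out here.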

New Scientist coverage of our [Weber and Castillo 2010] SIGIR paper

The New Scientist magazine featured our upcoming SIGIR paper; it will be presented in Geneva next week:

Demographic data can help, say Ingmar Weber and Carlos Castillo at Yahoo Research Barcelona, Spain. For example, they say that when US women type in the search term "wagner", they are most likely to be thinking of the 19th-century German composer. US men, on the other hand, may well be thinking about the makers of spray painters.

By giving a search engine some basic demographic information, such as age, gender and educational background, it is possible to boost the engine's chances of identifying user intent correctly, say Weber and Castillo. That personal information can be gleaned when people sign up to the other services, such as email, that search engines provide.

To check their theory, the researchers analysed data collected from Yahoo account holders through its search engine over a 12-month period. They then identified ambiguous search terms by looking at searches for which the top few results concerned wildly divergent concepts, and recorded which result the user chose to click on. By running those ambiguous search terms through their demographically modified search engine, they managed to get the chosen link to appear as the top-ranked result 7 per cent more often than in the standard Yahoo search.

Ingmar Weber and Carlos Castillo: The Demographics of Web Search. SIGIR 2010.

Update 2010-07-13: the paper was also posted on Slashdot.

Update 2010-07-14: the paper was also recommended by The Economist (free exchange blog).
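The coverage does not say how demographic information would be folded into the ranking. As a purely illustrative sketch, and not the method of the paper, one could rerank the results of an ambiguous query by how often users in the same demographic segment clicked each result, blending in the clicks of all users. The `DemographicReranker` class, the `alpha` parameter, the segment tuples, the result identifiers, and the click counts below are all hypothetical toy values.

```python
from collections import defaultdict


class DemographicReranker:
    """Rerank results for a query using clicks from the user's demographic segment.

    A segment is any hashable value, e.g. (country, gender). `alpha` linearly
    interpolates between the segment's click shares and the click shares of all
    users, so segments with little or no data fall back to overall behaviour.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.segment_clicks = defaultdict(lambda: defaultdict(int))  # (query, segment) -> url -> clicks
        self.global_clicks = defaultdict(lambda: defaultdict(int))   # query -> url -> clicks

    def record_click(self, query, segment, url):
        self.segment_clicks[(query, segment)][url] += 1
        self.global_clicks[query][url] += 1

    def rerank(self, query, segment, results):
        seg = self.segment_clicks.get((query, segment), {})
        glob = self.global_clicks.get(query, {})
        seg_total, glob_total = sum(seg.values()), sum(glob.values())

        def score(url):
            seg_share = seg.get(url, 0) / seg_total if seg_total else 0.0
            glob_share = glob.get(url, 0) / glob_total if glob_total else 0.0
            return (1 - self.alpha) * seg_share + self.alpha * glob_share

        # sorted() is stable, so results with equal scores keep the engine's order
        return sorted(results, key=score, reverse=True)


reranker = DemographicReranker()
toy_clicks = [(("US", "F"), "composer", 8), (("US", "F"), "sprayers", 2),
              (("US", "M"), "composer", 3), (("US", "M"), "sprayers", 7)]
for segment, url, n in toy_clicks:
    for _ in range(n):
        reranker.record_click("wagner", segment, url)

print(reranker.rerank("wagner", ("US", "F"), ["sprayers", "composer"]))  # ['composer', 'sprayers']
print(reranker.rerank("wagner", ("US", "M"), ["composer", "sprayers"]))  # ['sprayers', 'composer']
```

Interpolating with global clicks keeps the ranking sensible for segments that have never issued the query, at the cost of diluting strong segment-specific preferences.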

