I had the privilege to work with Wei Chen (Microsoft Research) and Laks V.S. Lakshmanan (University of British Columbia) on a book for the Synthesis Lectures on Data Management series, edited by M. Tamer Özsu and published by Morgan and Claypool.
This book starts with a detailed description of well-established diffusion models, including the independent cascade model and the linear threshold model, which have been successful at explaining propagation phenomena. We describe their properties as well as numerous extensions, introducing aspects such as competition, budget, and time-criticality, among many others. We delve deep into the key problem of influence maximization: selecting a set of key individuals to activate in order to influence a large fraction of a network. Influence maximization under the classic diffusion models, including both the independent cascade and the linear threshold models, is computationally intractable: the optimization problem is NP-hard, and even computing the exact influence spread of a given seed set is #P-hard. We describe several approximation algorithms and scalable heuristics that have been proposed in the literature. Finally, we deal with key issues that need to be tackled in order to turn this research into practice, such as learning the strength with which individuals in a network influence each other, along with practical aspects of this research, including the availability of datasets and software tools for facilitating research. We conclude with a discussion of research problems that remain open, both from a technical perspective and from the viewpoint of transferring research results into industry-strength applications.
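To make the influence maximization problem concrete, here is a minimal sketch of the standard greedy approach: estimate the expected spread of a seed set by Monte Carlo simulation of the independent cascade model, and repeatedly add the node with the largest estimated marginal gain. The graph representation and parameter names are illustrative, not taken from the book; real implementations use far more sophisticated estimators to scale.

```python
import random

def simulate_ic(graph, seeds, trials=200):
    """Estimate the expected spread of `seeds` under the independent
    cascade model. `graph` maps each node to a list of
    (neighbor, activation_probability) pairs."""
    total = 0
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            node = frontier.pop()
            # Each newly active node gets one chance to activate
            # each inactive neighbor, independently.
            for neighbor, p in graph.get(node, []):
                if neighbor not in active and random.random() < p:
                    active.add(neighbor)
                    frontier.append(neighbor)
        total += len(active)
    return total / trials

def greedy_influence_max(graph, k, trials=200):
    """Greedily pick k seeds by estimated marginal gain in spread.
    For monotone submodular spread functions this achieves a
    (1 - 1/e) approximation (up to simulation error)."""
    seeds = []
    candidates = set(graph)
    for _ in range(k):
        base = simulate_ic(graph, seeds, trials) if seeds else 0.0
        best, best_gain = None, -1.0
        for node in candidates:
            gain = simulate_ic(graph, seeds + [node], trials) - base
            if gain > best_gain:
                best, best_gain = node, gain
        seeds.append(best)
        candidates.discard(best)
    return seeds
```

The nested simulation loop is exactly why scalable heuristics matter: this naive version re-simulates the cascade for every candidate node in every round.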
Wired UK, 30 September 2013.
On 24 September a 7.7-magnitude earthquake struck south-west Pakistan, killing at least 300 people. The following day Patrick Meier at the Qatar Computer Research Institute (QCRI) received a call from the UN Office for the Coordination of Humanitarian Affairs (OCHA) asking him to help deal with the digital fallout -- the thousands of tweets, photos and videos that were being posted on the web containing potentially valuable information about the disaster.
[...] AIDR (Artificial Intelligence for Disaster Response) was the second project tested for the first time during the Pakistan earthquake, and is due to be launched officially at the CrisisMappers conference in Nairobi in November. It's an open-source tool relying on both human and machine computing, allowing human users to train algorithms to automatically classify tweets and determine whether or not they are relevant to a particular disaster.
In Pakistan, SBTF volunteers tagged 1,000 tweets, out of which 130 were used to create a classifier and train an algorithm that could be used to recognise relevant tweets with up to 80 percent accuracy ...
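The workflow described above, training a relevance classifier from a small set of human-labeled tweets, can be illustrated with a tiny multinomial Naive Bayes sketch. This is not AIDR's actual pipeline; the labels, tweets, and function names are hypothetical, and a production system would use proper tokenization and feature extraction.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_tweets):
    """Train a multinomial Naive Bayes model from (text, label) pairs,
    e.g. label 'relevant' or 'not_relevant'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(model, text):
    """Return the most likely label for `text` under the trained model,
    using add-one (Laplace) smoothing for unseen words."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, float('-inf')
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With roughly a hundred labeled examples, as in the Pakistan deployment, even a simple model like this can separate disaster-related tweets from background chatter reasonably well.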
QCRI/AJE press release: QCRI and Al Jazeera launch predictive web analytics platform for news
New platform developed by QCRI and Al Jazeera can predict visits to news articles by taking cues from social media
News organisations have vast archives of information, as well as a number of web analytics tools that help allocate editorial resources to cover different news events and capitalise on this information. These tools allow editors and media managers to react to shifts in their audience's interest, but what is lacking is a tool to help predict such shifts.
Qatar Computing Research Institute (QCRI) and Al Jazeera are announcing the launch of FAST (Forecast and Analytics of Social Media and Traffic), a platform that analyses, in real time, the life cycle of news stories on the web and social media, and provides predictive analytics that gauge audience interest.
“The explosion of big data in the media domain has provided QCRI an excellent research opportunity to develop an innovative way to derive value from the information,” said Dr Ahmed Elmagarmid, Executive Director of QCRI. “Together with our valued partner, Al Jazeera, the QCRI team has developed a platform that will help shift the way media does business.”
With Janette Lehmann (UPF), Mounia Lalmas (Yahoo!) and Ethan Zuckerman (MIT Civic Media), we developed an automatic method (pdf, blog post) that groups together all the users who tweet a particular news item, and later detects new content posted by them that is related to the original news item.
We call each such group a transient news crowd. The beauty of this approach, in addition to being fully automatic, is that there is no need to pre-define topics, and the crowd becomes available immediately, allowing journalists to cover news beats while incorporating shifts in their audiences' interests.
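The two-step idea, form the crowd from everyone who tweeted the item, then watch that crowd for related follow-up posts, can be sketched as below. The data layout, URL, and keyword-overlap matching are illustrative assumptions; the actual method in the paper uses more than keyword overlap to decide relatedness.

```python
def build_crowd(initial_tweets, story_url):
    """The transient crowd: the set of users who tweeted the story URL.
    Each tweet is a dict with 'user' and 'text' keys (assumed layout)."""
    return {t['user'] for t in initial_tweets if story_url in t['text']}

def crowd_followups(crowd, later_tweets, story_keywords):
    """Keep later tweets by crowd members that share at least one
    keyword with the original story (a crude proxy for relatedness)."""
    related = []
    for t in later_tweets:
        words = set(t['text'].lower().split())
        if t['user'] in crowd and story_keywords & words:
            related.append(t)
    return related
```

Because the crowd is defined by a single shared action, it exists the moment the story starts circulating, with no topic model or editorial setup in advance.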