Contrary to what seems to be the norm in Hollywood movies, people don't run in circles screaming and shouting when facing an emergency. The immediate, widespread, and ineffective mayhem so often portrayed in disaster movies is to a large extent a plot device, not very different from typical scenes in horror films in which people irrationally split up and run straight into danger.

Sociologists of disaster, some of whom have researched these situations for decades, tell us a different story. When faced with a sudden crisis, people quickly try to gather as much information as they can from the sources most available at that moment: the people around them, radio, television, or the internet. Based on this information, they evaluate the alternatives and take cover, flee, or otherwise act in a usually life-saving way. While panic can sometimes get in the way of safety, in most cases people's reactions are fast, calm, and, most importantly, effective.

For example, in 2008, Qantas Flight 30 suffered an explosive decompression in midair when an oxygen cylinder ruptured, blowing a hole the size of a small car in the fuselage. Passengers heard a loud noise, oxygen masks fell, and the aircraft rapidly descended to an altitude where the air is breathable. Little panic followed; one passenger described the scene: "No one panicked, there was no screaming. It was not your typical television movie. Everyone listened to the cabin staff."

People are effective not only at saving their own lives, but also at saving others'. Most of the rescues in the immediate aftermath of a disaster are not performed by fire brigades or professional emergency responders: it is the people directly affected by a disaster who take decisive action and are, indeed, the first responders.

With all this in mind, it is only natural that as social media spread and flourished in the past decade, it gradually took on an important role in people's lives during emergencies, including natural and man-made disasters.

'Command and control' vs 'engage and listen'

Despite these realities, the "command and control" approach to disasters is fairly prevalent. In this framework, official authorities are expected to provide instructions to an uninformed and passive population. Indeed, this is the most common way in which government officials see social media: as simply one more channel to push information out to the public.

While new and emerging volunteer organisations are often tech-savvy and native to online spaces, governments and formal non-governmental organisations that actually engage with and listen to affected populations through social media are still the exception rather than the norm. The American Red Cross was one of the pioneers, creating a Digital Operations Center to monitor social media, answer questions from the public, and disseminate life-saving information. The United Nations' Office for the Coordination of Humanitarian Affairs was another pioneer in the field; it co-founded the Digital Humanitarian Network, which extracts information from social media to monitor developing disaster situations.

At the government level, the disaster response strategies of both the Federal Emergency Management Agency (FEMA) in the US and the Philippines' government include social media; the latter even chooses an "official" hashtag for every large crisis event. At a more local level, the Twitter accounts of the offices for emergency management of both New York (@NYCOEM) and San Francisco (@SF_Emergency) often answer questions from the public through Twitter (my co-worker Patrick Meier has blogged extensively about these and similar initiatives).

The social media data deluge

Social media activity flares up in areas affected by disasters, often reaching thousands of postings and hundreds of photos per minute. Facebook data scientists have measured such bursts of activity during earthquakes. Others have even observed (jokingly, but accurately) that tweets posted immediately after an earthquake travel so fast that, in theory, you could read a tweet about an earthquake before the seismic waves actually reach you.

The huge volume and velocity of the data make it hard for everyone to make sense of social media, but this is not the only problem. There are also concerns about the authenticity and veracity of messages, as social media is assumed to be less trustworthy than traditional media, mostly due to the anonymity users enjoy.


There are many problems with this assumption. In general, there is no reason to blindly trust everything anyone says, whether online or offline, and independently of their credentials or past performance. Nobody is above making mistakes, including traditional media (such as CBS when it recently reported a "sideways tornado"). Particularly during emergencies, false rumours are often spread by well-intentioned people who simply weigh the risk of not sharing potentially life-saving information, which may or may not end up being true.

The fact that many users share information without verifying it first may be a disadvantage of participative and social media, but it is also what makes social media so fast. Forbidding users from spreading "false news" can be dangerous in the face of a crisis, as it might also discourage them from spreading true news. In reality, being able to spread unverified information during an emergency is a key capacity of social media, and one that can save lives. During a crisis, people don't make important decisions based on a single source, but instead contrast information from different sources. Also, rumours online are to a large extent self-correcting, and people question and correct social media news they consider dubious or false.

Crisis computing

Computational methods can contribute to rapidly filtering, sorting and aggregating vast volumes of social media during disasters. By a recent count, over 150 research articles have been published on algorithms for processing social media during crises.

These have focused on methods for collecting crisis-relevant data, detecting events and subevents, georeferencing information, determining information credibility, classifying information into categories, visualising the needs of affected populations in time and space, and even automatically generating summaries and timelines of a developing crisis from millions of postings - all this in the short time frame available during an emergency.

Interestingly, the key to a new wave of computational methods for processing social media data is people themselves. Hybrid methods combine human and machine intelligence by employing digital volunteers along with artificial intelligence (machine learning) methods. The resulting systems are able to make sense of ambiguous data, something humans do much better than machines, while also handling large volumes of data in a deterministic and reliable way, something machines do much better than humans.
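One common way to combine the two (a minimal sketch under assumed names, not the actual pipeline of any particular system) is to let the machine handle messages it classifies with high confidence, and to route ambiguous ones to digital volunteers:

```python
# Hypothetical hybrid human/machine triage loop: the classifier labels
# messages it is confident about; low-confidence ones go to volunteers.

CONFIDENCE_THRESHOLD = 0.8

def machine_classify(message):
    """Toy stand-in for a trained classifier: returns (label, confidence)."""
    damage_words = {"collapsed", "destroyed", "rubble"}
    hits = sum(1 for w in message.lower().split() if w.strip(".,!") in damage_words)
    if hits >= 1:
        return ("damage", 0.9)
    return ("no-damage", 0.55)   # low confidence: needs a human look

def triage(messages):
    auto_labeled, for_volunteers = [], []
    for msg in messages:
        label, confidence = machine_classify(msg)
        if confidence >= CONFIDENCE_THRESHOLD:
            auto_labeled.append((msg, label))     # machine decides
        else:
            for_volunteers.append(msg)            # humans resolve ambiguity
    return auto_labeled, for_volunteers

auto, pending = triage(["The bridge collapsed!", "Feeling shaken, hope all ok"])
```

Here the first message is labeled automatically, while the second, which a keyword-based model cannot resolve, ends up in the volunteers' queue.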

For a researcher, being able to use computer science to help with problems of societal value, such as emergency response and, in general, data science for social good, is a great opportunity and an invitation to participate in some of the most interesting challenges of applied computing.

The author wishes to thank research collaborators Muhammad Imran, Sarah Vieweg, Alexandra Olteanu, Hemant Purohit, Fernando Diaz and Patrick Meier.

Published in Al Jazeera: How tweets and algorithms can save lives »
December 5th, 2014.

IDD (I Don't Dance) Mixtape — November 2014, with huge kudos to Gui Boratto and Yamil Colucci.

If I remember correctly, the last mixtape I made was 20 years ago :-)


Download: IDD (I Don't Dance) Mixtape - November 2014.mp3 [67 MB]

Made with SeratoDJ + Pioneer DDJ SX1. Also available in Mixcrate.

Note: I have lived in Qatar since 2012, working for a local research institution as a computer scientist specialized in social media. As with everything in this blog, my personal opinions do not reflect the position of the institutions I'm part of.

Despite widespread criticism, Qatar authorities have promulgated a new "Cybercrime Prevention Law". The law basically addresses three very distinct topics. The first topic (Chapter 1) is related to unauthorized access to computer systems, stealing or deletion of data, electronic fraud, etc., which together constitute what is usually considered "cybercrime", i.e. crimes that involve a computer or network.

There is, however, a second topic (Chapter 2) that is not cybercrime, but what the law refers to as "Content Crimes". Content crimes include helping terrorist organizations and disseminating child pornography, both punished with up to three years in prison and a fine of up to QR500,000 (~140K USD). They also include electronic forgery and blackmail.

Prison for "false news" or violating "social values" online

Between the articles about terrorism and the ones about child pornography there is a vague provision regarding "false news" that basically extinguishes freedom of the press in Qatar, which is guaranteed in article 48 of its Constitution:

Article 6.- A sentence of not more than three years and a fine of not more than QR500,000 (~140K USD), or either of these penalties, shall be imposed on any person who through an information network or an information technology technique sets up or runs a website to publish false news to threaten the safety and security of the State or its public order or domestic and foreign security. A sentence of not more than a year in a prison and a fine of not more than QR250,000, or either of these penalties, shall be imposed on any person who promotes, disseminates or publishes in any way such false news for the same purpose.

Next, between the article about child pornography and the one about blackmail, there is an article that ends freedom of expression in Qatar, which is guaranteed in article 47 of its Constitution (emphasis added):

Article 8.- A sentence of not more than three years in prison and a fine of not more than QR100,000 (~27K USD), or either of these penalties, shall be imposed on any person who, through an information network or information technology technique, violates social values or principles, publishes news, photos or video or audio recordings related to the sanctity of people’s private or family life, even if the same is true, or insults or slanders others.

Additionally, a third major topic (Chapter 5) establishes a maximalist view of intellectual property, in which copyright infringement is punished with up to three years in prison and a fine of up to QR500,000 (~140K USD). This is approximately the fine that US law provides, which is one of the largest in the world (up to 150K USD per infraction), with the addition of jail time. Copyright law has repeatedly been used in several countries to censor expression; for instance, reproducing someone's past speech without their authorization has been construed as a copyright violation.

What does it mean?

Personally, I find this extremely disheartening and a tremendous setback for a country that is progressing on many fronts.

The opinions of anybody are likely to challenge, in some way or another, the values or principles of somebody else.

As an atheist who believes in the separation of church and state, a vegan who abhors animal sacrifices including religious ones, a supporter of LGBT rights who considers inhumane the laws that punish homosexuality, a person who favors the legalization of drugs for adults, who defends freedom of expression and a sharing economy of knowledge, etc., I feel that most of my opinions (and those of anyone except drones!) challenge in some way the values or principles of other people. To me, challenging other people's views is part of cosmopolitanism; the opposite (ignoring each other's positions completely) has nothing to do with living together.

As a scientist who has researched social media credibility extensively, I have to say that false news and rumors are inevitable in social media (and in media in general), particularly in times of crisis. At the same time, there are mechanisms that correct false rumors, to the point that in a typical crisis misinformation is actually hard to find. Most people who broadcast information that ends up being erroneous are moved by a desire to help. Discouraging people from posting information on social media unless it is verified is dangerous: it creates a blind spot in the awareness that we can get from it during a crisis situation.

Finally, and here I echo what Amnesty International has said on the matter, a key issue is vagueness. The law defines "user", "provider", "network", etc., but does not define false news or the social values that people are not supposed to challenge through social media. In that sense, this law has an incredible potential for abuse and will have a chilling effect on the development of information technologies in Qatar.

See: unofficial translation to English [PDF] of the law promulgated on September 15th, 2014. Twitter bird and scissor: Carlos Latuff.

In this paper we study the emotional expression of people who edit Wikipedia. We classify Wikipedia editors by administrator status (administrators vs. regular editors) and by gender (male vs. female).

Several patterns emerge from the data:

  • Administrators maintain a rather neutral, impersonal tone, and are more task-oriented than regular editors (who are more relationship-oriented).
  • Female editors communicate more often in a manner that promotes emotional connection.
  • Editors tend to interact more often with editors having similar emotional styles (e.g. editors who often express anger connect more with one another, as shown in the graphic where "angry" editors appear in red).

Paper available (open access): Emotions under Discussion: Gender, Status and Communication in Online Collaboration by Daniela Iosub, David Laniado, Carlos Castillo, Mayo Fuster Morell, and Andreas Kaltenbrunner. PLOS ONE. August 2014. DOI:10.1371/journal.pone.0104880

A friend asked me to explain how an automatic system for classifying documents, such as AIDR, works.

We are going to do this in three steps: first a preliminary example on the risk of having a stroke, then some generalities, and then the real thing.

Preliminary: predicting stroke risk

Imagine a doctor with several patients whom she has been following for several years. She has a clinical file for each patient in which she has noted the following: whether the patient smokes or not (which she writes as "smokes=y" or "smokes=n"), whether the patient has high blood pressure or not ("hypertensive=y" or "hypertensive=n"), and whether the patient practices sports or not ("sports=y" or "sports=n").

Finally, the doctor also notes whether the patient has had a stroke, written as "STROKE=y" or "STROKE=n":

  • Patient 1: smokes=y, hypertensive=y, sports=n, STROKE=y
  • Patient 2: smokes=y, hypertensive=n, sports=n, STROKE=y
  • Patient 3: smokes=y, hypertensive=n, sports=y, STROKE=n
  • Patient 4: smokes=n, hypertensive=y, sports=y, STROKE=n
  • Patient 5: smokes=n, hypertensive=y, sports=n, STROKE=y

Now, one can extract certain statistics from this data. For instance, patients 3 and 4 practice sports and didn't have a stroke, while patients 1, 2, and 5 don't practice sports and did have a stroke. From this data alone, one could conclude that practicing sports may help prevent a stroke (where the "may help" part doesn't come from the data itself, but from the recognition that five patients is not a lot).

We can also learn that 66% of the patients who smoke had a stroke in this sample.

Now, if we look for combinations of factors, we can extract more information. For instance, by looking carefully at the data, one can realize that everybody in this sample who does not practice sports and who either smokes or is hypertensive has had a stroke. Obviously, with more data we can be more certain about how good the combinations of factors we learn are, in terms of how closely they relate to a certain outcome.
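These statistics can be computed mechanically. A minimal Python sketch over the five-patient table above (with True/False standing in for "y"/"n"):

```python
# The doctor's table as a list of dictionaries.
patients = [
    {"smokes": True,  "hypertensive": True,  "sports": False, "stroke": True},
    {"smokes": True,  "hypertensive": False, "sports": False, "stroke": True},
    {"smokes": True,  "hypertensive": False, "sports": True,  "stroke": False},
    {"smokes": False, "hypertensive": True,  "sports": True,  "stroke": False},
    {"smokes": False, "hypertensive": True,  "sports": False, "stroke": True},
]

def stroke_rate(records, **conditions):
    """Fraction of records matching `conditions` that had a stroke."""
    matching = [r for r in records
                if all(r[k] == v for k, v in conditions.items())]
    return sum(r["stroke"] for r in matching) / len(matching)

print(stroke_rate(patients, smokes=True))   # 2 of the 3 smokers: 0.666...
print(stroke_rate(patients, sports=False))  # all non-sporty patients: 1.0
```

Exhaustively trying every combination of conditions is exactly what becomes infeasible by hand as the number of factors grows, which is the problem discussed next.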

Statistical machine learning

There are so many combinations of factors that even in the small dataset above, with five patients, exploring all the combinations and outcomes is very time-consuming. Fortunately, this is where a well-established research field, statistical machine learning, comes in: it studies precisely this problem.

This research field has for years studied different methods to automatically and quickly find relationships between elements in large-scale data. This process is known as learning, and there are many, many techniques for doing it.

In general, what these methods need in order to learn effectively is: (i) a large amount of data, and (ii) the "right" data. In the example above, the medical doctor who interviewed the patients asked the "right" questions. If she had instead written down their eye color or some other irrelevant factor, learning something about stroke risk would have been much more difficult.

Classifying text

Text classification is not much different. Instead of 3 factors (smokes, hypertensive, and sports), we have hundreds of thousands of factors: one for each word in the dictionary. The factors take the form "word=y" or "word=n", where "word" can be any word, and we write "y" when the document contains the word and "n" when it doesn't.

The outcomes will be different types of documents. Suppose our documents are tweets, and we want to separate those that contain information about damage to infrastructure (DAMAGE=y) from those that don't (DAMAGE=n). Again, we can build the following table, in which each tweet has one factor for each word it contains, and the outcome has been written by an expert who has looked at the tweet and decided whether it reports infrastructure damage or not:

  • Tweet1: ... building=y, ..., collapsed=y, ..., DAMAGE=y
  • Tweet2: ... bridge=y, ..., collapsed=y, ..., DAMAGE=y
  • Tweet3: ... bridge=y, ..., playing=y, ..., DAMAGE=n
  • Tweet1000: ... bridge=y, ..., hearts=y, ..., DAMAGE=n
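Turning a tweet into these "word=y" factors is straightforward; a minimal sketch (real systems do fancier tokenization, but the idea is the same) represents each tweet by the set of words it contains, with every absent word implicitly "n":

```python
# Convert a tweet into its word factors: a set of lowercase words,
# with basic punctuation and hashtag/mention markers stripped.

def featurize(text):
    return {w.strip(".,!?#@") for w in text.lower().split()} - {""}

featurize("The bridge COLLAPSED!")
# {"the", "bridge", "collapsed"}
```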

Again, we can apply any of the statistical machine learning methods to learn which combinations of words indicate the presence of infrastructure damage reports.

That's all. Once we learn those combinations, we can use them to automatically evaluate new tweets. In this case, the learning method will also output a confidence, which you can understand roughly as the percentage of tweets having those factors that were found to have the predicted outcome in the data used for learning (it is more complex than that, but it's a good approximation).
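To make this concrete, here is a deliberately tiny learner (a toy for illustration, not the algorithm AIDR or any production system actually uses): for each word it records how often tweets containing that word were labeled DAMAGE=y, and it scores a new tweet by averaging those per-word rates, which also serves as a rough confidence:

```python
from collections import defaultdict

def train(labeled_tweets):
    """word -> [count labeled damage, count total] over the training tweets."""
    seen = defaultdict(lambda: [0, 0])
    for text, is_damage in labeled_tweets:
        for word in set(text.lower().split()):
            seen[word][0] += is_damage
            seen[word][1] += 1
    return seen

def classify(model, text):
    """Return (label, confidence): confidence is the mean per-word damage rate."""
    words = {w for w in text.lower().split() if w in model}
    rates = [d / t for d, t in (model[w] for w in words)]
    if not rates:
        return ("unknown", 0.0)
    confidence = sum(rates) / len(rates)
    return ("damage" if confidence >= 0.5 else "no-damage", confidence)

training = [
    ("building collapsed downtown", True),
    ("bridge collapsed near river", True),
    ("kids playing on the bridge", False),
    ("bridge of hearts card game", False),
]
model = train(training)
label, conf = classify(model, "another bridge collapsed")
# label is "damage": "collapsed" always co-occurred with damage (rate 1.0),
# while "bridge" appeared in damage tweets 1 time out of 3 (rate 0.33).
```

Note how a single ambiguous word like "bridge" is not enough to decide, but the combination with "collapsed" is: this is the kind of pattern a real learning method discovers at scale.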

When the data is large, it is in general very difficult for a human to spot a pattern better than a computer algorithm can. This is why crafting rules by hand (containing "bridge" implies "DAMAGE" unless the tweet also contains "playing" or "play" or "ace" or "heart" or ...) is not only time-consuming but also tends to yield lower accuracy than automatic methods, and is in general a bad idea.

In the case of text, we also use other factors (we call them "features" or "attributes") in addition to single words. For instance, we can take all sequences of two or three consecutive words (which we call "word bi-grams" and "word tri-grams"). We can also look at the position of a word in the phrase, whether it was capitalized or not, how many times it appeared, etc. For the learning method this is all the same: simply more factors that can be exploited to learn about the data.
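Extracting word n-grams is a one-liner; a small sketch of the idea (just the feature extraction, not a full pipeline):

```python
# Consecutive word n-grams: for n=2, every adjacent pair of words
# becomes an additional factor for the learning method.

def word_ngrams(text, n=2):
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

word_ngrams("the bridge collapsed last night")
# ['the bridge', 'bridge collapsed', 'collapsed last', 'last night']
```

A bi-gram like "bridge collapsed" is far more indicative of damage than either word alone, which is exactly why these features help.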

Further reading: there is lots of it, but you can start with the Wikipedia page on decision trees, a popular and easy-to-understand method for statistical machine learning.

