IDD (I Don't Dance) Mixtape — November 2014

IDD (I Don't Dance) Mixtape — November 2014, with huge kudos to Gui Boratto and Yamil Colucci.

If I remember correctly, the last mixtape I made was 20 years ago :-)

IDD (I Don't Dance) Mixtape — November 2014 by Chato on Mixcloud

Download: IDWD (I Don't Dance) Mixtape - November 2014.mp3 [67 MB]

Made with Serato DJ and a Pioneer DDJ SX1. Also available on Mixcrate.

Censorship of social media in Qatar

Note: I have lived in Qatar since 2012, working for a local research institution as a computer scientist specializing in social media. As with everything on this blog, these are my personal opinions and do not reflect the position of the institutions I am part of.

Despite widespread criticism, Qatari authorities have promulgated a new "Cybercrime Prevention Law". The law basically addresses three very distinct topics. The first topic (Chapter 1) covers unauthorized access to computer systems, theft or deletion of data, electronic fraud, etc., which together constitute what is usually considered "cybercrime", i.e., crimes that involve a computer or network.

There is, however, a second topic (Chapter 2) that is not cybercrime but what the law refers to as "Content Crimes". Content crimes include helping terrorist organizations and disseminating child pornography, both punished with up to 3 years in prison and a fine of up to QR500,000 (~140K USD). They also include electronic forgery and blackmail.

Prison for "false news" or violating "social values" online

Between the articles about terrorism and the ones about child pornography, there is a vague provision regarding "false news" that basically extinguishes freedom of the press in Qatar, which is guaranteed by article 48 of its Constitution:

Article 6.- A sentence of not more than three years and a fine of not more than QR500,000 (~140K USD), or either of these penalties, shall be imposed on any person who through an information network or an information technology technique sets up or runs a website to publish false news to threaten the safety and security of the State or its public order or domestic and foreign security. A sentence of not more than a year in a prison and a fine of not more than QR250,000, or either of these penalties, shall be imposed on any person who promotes, disseminates or publishes in any way such false news for the same purpose.

Next, between the article about child pornography and the one about blackmail, there is an article that ends freedom of expression in Qatar, which is guaranteed in article 47 of its Constitution (emphasis added):

Article 8.- A sentence of not more than three years in prison and a fine of not more than QR100,000 (~27K USD), or either of these penalties, shall be imposed on any person who, through an information network or information technology technique, violates social values or principles, publishes news, photos or video or audio recordings related to the sanctity of people’s private or family life, even if the same is true, or insults or slanders others.

Emotions and the collaboration of Wikipedia editors

In this paper we study the emotional expression of people who edit Wikipedia. We classify Wikipedia editors according to administrator status (administrators vs. regular editors) and by gender (male vs. female).

Several patterns emerge from the data:

  • Administrators maintain a rather neutral, impersonal tone, and are more task-oriented than regular editors (who are more relationship-oriented).
  • Female editors communicate more often in a manner that promotes emotional connection.
  • Editors tend to interact more often with editors having similar emotional styles (e.g. editors who often express anger connect more with one another, as shown in the graphic where "angry" editors appear in red).

Paper available (open access): Emotions under Discussion: Gender, Status and Communication in Online Collaboration by Daniela Iosub, David Laniado, Carlos Castillo, Mayo Fuster Morell, and Andreas Kaltenbrunner. PLOS ONE. August 2014. DOI:10.1371/journal.pone.0104880

How does automatic classification of documents using machine learning work?

A friend asked me to explain how an automatic system for classifying documents, such as AIDR, works.

We are going to do this in three steps: first, a preliminary example about the risk of having a heart attack; then, some generalities; and finally, the real thing.

Preliminary: predicting heart attack risk

Imagine a doctor who has been following several patients for several years. She keeps a clinical file for each patient, in which she has noted the following: whether the patient smokes or not (which she writes as "smokes=y" or "smokes=n"), whether the patient has high blood pressure or not (which she writes as "hypertensive=y" or "hypertensive=n"), and whether the patient practices sports or not (which she writes as "sports=y" or "sports=n").

Finally, the doctor also notes whether the patient has had a heart attack, which she writes as "STROKE=y" or "STROKE=n":

  • Patient 1: smokes=y, hypertensive=y, sports=n, STROKE=y
  • Patient 2: smokes=y, hypertensive=n, sports=n, STROKE=y
  • Patient 3: smokes=y, hypertensive=n, sports=y, STROKE=n
  • Patient 4: smokes=n, hypertensive=y, sports=y, STROKE=n
  • Patient 5: smokes=n, hypertensive=y, sports=n, STROKE=y

Now, one can extract certain statistics from this data. For instance, patients 3 and 4 practice sports and did not have a heart attack (STROKE=n), while patients 1, 2, and 5 do not practice sports and did have one (STROKE=y). From this data alone, one could conclude that practicing sports may help prevent a heart attack (where the "may help" part does not come from this data, but simply from recognizing that 5 patients are not a lot).

We can also learn that, in this sample, two out of the three patients who smoke (about 67%) had a heart attack.
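These counts are easy to reproduce mechanically. Here is a minimal Python sketch that stores the five clinical files as dictionaries in the doctor's own y/n notation and, for each feature value, counts how many of the matching patients had STROKE=y. The dictionary layout and the helper name `fraction_with_outcome` are my own illustrative choices, not part of AIDR or any other particular system.

```python
# The five clinical files from the example, as feature -> value dictionaries
# using the doctor's y/n notation.
patients = [
    {"smokes": "y", "hypertensive": "y", "sports": "n", "STROKE": "y"},
    {"smokes": "y", "hypertensive": "n", "sports": "n", "STROKE": "y"},
    {"smokes": "y", "hypertensive": "n", "sports": "y", "STROKE": "n"},
    {"smokes": "n", "hypertensive": "y", "sports": "y", "STROKE": "n"},
    {"smokes": "n", "hypertensive": "y", "sports": "n", "STROKE": "y"},
]


def fraction_with_outcome(records, feature, value, outcome="STROKE"):
    """Among records where record[feature] == value, return (hits, total),
    where hits is the number of those records with record[outcome] == "y"."""
    subset = [r for r in records if r[feature] == value]
    hits = sum(1 for r in subset if r[outcome] == "y")
    return hits, len(subset)


if __name__ == "__main__":
    for feature in ("smokes", "hypertensive", "sports"):
        for value in ("y", "n"):
            hits, total = fraction_with_outcome(patients, feature, value)
            if total > 0:  # skip feature values that no patient has
                print(f"{feature}={value}: {hits}/{total} had STROKE=y "
                      f"({100 * hits / total:.0f}%)")
```

Running it prints one line per feature value; for example, "smokes=y: 2/3 had STROKE=y (67%)" and "sports=y: 0/2 had STROKE=y (0%)", matching the observations above.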

Predictive Web Analytics Dataset and Challenge

A new and exciting dataset is available. It contains the number of visitors, average visit time, "tweets" on Twitter, and "likes" on Facebook for thousands of web pages. The data is aggregated in 5-minute windows over a period of 48 hours.

We are inviting researchers to participate in a competition: an ECML/PKDD Discovery Challenge that consists of predicting the total activity after 48 hours by observing only the first hour of life of a web page. This task has significant practical applications.
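To make the setup concrete: with 5-minute windows, 48 hours give 576 windows per page, and the first hour corresponds to the first 12 of them. Below is a minimal Python sketch of a naive ratio baseline under that layout. The single activity count per window, the function names, and the median-ratio rule are my own illustrative assumptions, not the challenge's official data format or a recommended method.

```python
import statistics

# Layout of each series as described above: 5-minute windows over 48 hours.
WINDOW_MINUTES = 5
TOTAL_HOURS = 48
WINDOWS_PER_HOUR = 60 // WINDOW_MINUTES         # 12 windows per hour
TOTAL_WINDOWS = TOTAL_HOURS * WINDOWS_PER_HOUR  # 576 windows in 48 hours


def first_hour(series):
    """Return the part of a page's series a predictor may observe:
    its first WINDOWS_PER_HOUR windows."""
    if len(series) != TOTAL_WINDOWS:
        raise ValueError(f"expected {TOTAL_WINDOWS} windows, got {len(series)}")
    return series[:WINDOWS_PER_HOUR]


def fit_ratio(training_series):
    """Hypothetical baseline: the median, over training pages with some
    first-hour activity, of (48-hour total) / (first-hour total)."""
    ratios = []
    for series in training_series:
        head = sum(first_hour(series))
        if head > 0:
            ratios.append(sum(series) / head)
    if not ratios:
        raise ValueError("no training page has first-hour activity")
    return statistics.median(ratios)


def predict_total(first_hour_counts, ratio):
    """Predict the 48-hour total by scaling the first-hour total by `ratio`."""
    return ratio * sum(first_hour_counts)


if __name__ == "__main__":
    # Toy data, NOT from the Chartbeat dataset: three pages with constant activity.
    training = [[c] * TOTAL_WINDOWS for c in (1, 2, 5)]
    ratio = fit_ratio(training)
    print(ratio)                                         # 48.0
    print(predict_total([3] * WINDOWS_PER_HOUR, ratio))  # 1728.0
```

On constant toy pages the ratio is exactly 576 / 12 = 48; real pages typically decay after publication, which is precisely what makes the prediction task non-trivial.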

Dataset available courtesy of Chartbeat Inc.

Carlos Castillo and Josh Schwartz
Predictive Web Analytics Challenge Co-Chairs

