⚠️ This website has moved to chato.cl/life

Write-ups

We are building systems that encourage ignorance and weaken democracy

For centuries we've designed complex systems through a superposition of layers. The outer layer, the "user interface," is in direct contact with users. The inner layers correspond to subsystems that work on behalf of the users, but with which users have little or no direct interactions.

Everything is designed that way, not only software and gadgets:

  • Food is a complex system in which the user interface includes package labels, supermarket shelves, and restaurant menus.
  • Democracy is a complex system in which the user interface includes voting ballots, media coverage of politics, and demonstrations.
  • Banking is a complex system in which the user interface includes automatic and human tellers, point of sale systems, and stock tickers.
  • Healthcare is a complex system in which the user interface includes waiting rooms, nurses, doctors, and pharmacists.
  • Cars are complex systems in which the user interface includes the steering wheel and the dashboard.

We use many metaphors to speak about the inner layers. We speak of what happens in the "back office" or "under the hood." We assume whatever happens there doesn't concern us as long as the job gets done. System designers assume, in turn, that most of the time users don't need to know about the inner layers. Indeed, when an inner layer is inadvertently exposed through the interface we see this as a system failure; sometimes a benign one such as a bit of dirt in supermarket vegetables, sometimes a creepy one such as a chicken head in a bag of McNuggets.

All systems require transparency to use them effectively, to recover from failure, and to build upon them.

Despite what naïve system designers may think, in most systems users often need to know about what internal layers are doing: to use a system effectively, to recover from failure, and to build upon it or customize it to their needs. For instance:

  • A healthy diet requires knowing what goes into our food. A sustainable diet requires knowing where it comes from and how it is prepared.
  • Democracy cannot be construed to mean blindly voting every few years, and actual progress requires understanding how legislative change works.
  • Reducing the harm from the next economic crisis requires understanding what each of us can do to prevent it from being too deep.
  • Maintaining good health and recovering from illness without bankrupting hospitals increasingly requires relying on patients' self-care.
  • Fuel economy and safe driving require a basic understanding of how a car engine works and what the symptoms of possible failure are.

Some of the biggest crises we're facing today are caused or aggravated by the fact that we're hiding too much information in the name of "simplicity." We're increasingly becoming separated from subsystems that are vital to us. Complete opacity is bad design, sometimes intentionally so. Usable systems have a transparent outer layer and ways of interacting with inner layers progressively.

Power structures that predate the information age deal with opacity by creating more opacity.

The way in which this information problem is addressed, and sometimes ignored, mirrors pre-information age power structures. Indeed, they tend to encourage more opacity: instead of transparency and access to information, we see more and more layers of specialists, representatives and regulators who are supposed to represent our interests and keep us safe:

  • Food production companies fight tooth and nail against any initiative to expand information to consumers. Instead, they encourage industry-led committees that determine what is good for us.
  • Instead of implementing transparency by default, most governments implement transparency on request, often behind a bureaucratic maze.
  • Banking oversight as implemented now requires us to blindly trust the same people who repeatedly failed to prevent banking crises.
  • The only information we get about pharmaceuticals is advertising that encourages us to use the newest medications.
  • Car diagnostics are hidden through proprietary systems that make (self-)maintenance impossible or artificially more expensive.

Professionals who create computing and information processing systems bear our share of responsibility for this. Not only do we design big parts of these systems, we also create the wrong metaphors that shape the industries and the expectations of users.

We're encouraging users to believe that ignorance is a good thing.

Most notably, we're encouraging people to believe that ignorance is a good thing. Systems where users are "taken care of" and "don't need to worry about anything" should arouse suspicion, not praise. As the world becomes more complex and people have a proportionally vanishing understanding of what happens around them, we should enable and encourage exploration.

A possible framework to achieve this is what Jonathan Zittrain calls generative systems, but there is much more to it. The starting point, in my opinion, is to realize that an entire generation is actively being prevented from understanding the systems around them. This is a huge step backwards. People learn about the systems around them by using them: let's encourage and enable that learning.

Censorship of social media in Qatar

Note: I have lived in Qatar since 2012, working for a local research institution as a computer scientist specialized in social media. As with everything in this blog, my personal opinions do not reflect the position of the institutions I'm part of.

Despite widespread criticism, Qatari authorities have promulgated a new "Cybercrime Prevention Law". The law basically addresses three very distinct topics. The first topic (Chapter 1) is related to unauthorized access to computer systems, theft or deletion of data, electronic fraud, etc., which together make up what is usually considered "cybercrime," i.e. crimes that involve a computer or network.

There is, however, a second topic (Chapter 2) that is not cybercrime but what the law refers to as "Content Crimes". Content crimes include helping terrorist organizations or disseminating child pornography, both punished with up to 3 years in prison and a fine of up to QR500,000 (~140K USD). They also include electronic forgery and blackmail.

Prison for "false news" or violating "social values" online

Between the articles about terrorism and the ones about child pornography there is a vague provision regarding "false news" that basically extinguishes freedom of the press in Qatar, which is guaranteed in article 48 of its Constitution:

Article 6.- A sentence of not more than three years and a fine of not more than QR500,000 (~140K USD), or either of these penalties, shall be imposed on any person who through an information network or an information technology technique sets up or runs a website to publish false news to threaten the safety and security of the State or its public order or domestic and foreign security. A sentence of not more than a year in a prison and a fine of not more than QR250,000, or either of these penalties, shall be imposed on any person who promotes, disseminates or publishes in any way such false news for the same purpose.

Next, between the article about child pornography and the one about blackmail, there is an article that ends freedom of expression in Qatar, which is guaranteed in article 47 of its Constitution (emphasis added):

Article 8.- A sentence of not more than three years in prison and a fine of not more than QR100,000 (~27K USD), or either of these penalties, shall be imposed on any person who, through an information network or information technology technique, violates social values or principles, publishes news, photos or video or audio recordings related to the sanctity of people’s private or family life, even if the same is true, or insults or slanders others.

Additionally a third major topic (Chapter 5) establishes a maximalist view of intellectual property, in which copyright infringement is punished with up to 3 years in prison and a fine of up to QR500,000 (~140K USD). This is approximately the fine that the US law provides, which is one of the largest in the world (up to 150K USD per infraction), with the addition of jail time. Copyright law has been repeatedly used in the past in several countries to censor expression; for instance reproducing a past speech of someone without his/her authorization has been construed as a copyright violation.

What does it mean?

Personally, I find this extremely disheartening and a tremendous setback for a country that on many fronts is progressing.

The opinions of anybody are likely to challenge, in some way or another, the values or principles of somebody else.

As an atheist who believes in the separation of church and state, a vegan who abhors animal sacrifices including religious ones, a supporter of LGBT rights who considers the laws that punish homosexuality inhumane, a person who favors the legalization of drugs for adults and who defends freedom of expression and a sharing economy of knowledge, etc., I feel that most of my opinions (and those of anyone except drones!) challenge in some way the values or principles of other people. To me, challenging other people's views is part of cosmopolitanism; the opposite (ignoring each other's positions completely) has nothing to do with living together.

As a scientist who has done extensive research on social media credibility, I have to say that false news and rumors are inevitable in social media (and in media in general), particularly in times of crisis. At the same time, there are mechanisms that correct false rumors, to the point that in a typical crisis misinformation is actually hard to find! Most people who broadcast information that ends up being erroneous are moved by a desire to help. Discouraging people from posting information in social media unless it is verified is dangerous: it creates a blind spot in the awareness that we can get from it during a crisis situation.

Finally, and here I echo what Amnesty International has said on the matter, a key issue is vagueness. The law defines "user", "provider", "network", etc. but does not define "false news" or which social values people are not supposed to challenge through social media. In that sense, this law has an incredible potential for abuse and will have a chilling effect on the development of information technologies in Qatar.


See: unofficial translation to English [PDF] of the law promulgated on September 15th, 2014. Twitter bird and scissor: Carlos Latuff.

How does automatic classification of documents using machine learning work?

A friend asked me to explain how an automatic system for classifying documents, such as AIDR, works.

We are going to do this in three steps: first a preliminary with an example on the risk of having a heart attack, then a few generalities, then the real thing.

Preliminary: predicting heart attack risk

Imagine a doctor with several patients whom she has been following for several years. She has a clinical file for each patient in which she has noted the following: whether the patient smokes or not (which she writes as "smokes=y" or "smokes=n"), whether the patient has high blood pressure or not (which she writes as "hypertensive=y" or "hypertensive=n"), and whether the patient practices sports or not (which she writes as "sports=y" or "sports=n").

Finally, the doctor also notes whether the patient has had a heart attack, written as "STROKE=y" or "STROKE=n":

  • Patient 1: smokes=y, hypertensive=y, sports=n, STROKE=y
  • Patient 2: smokes=y, hypertensive=n, sports=n, STROKE=y
  • Patient 3: smokes=y, hypertensive=n, sports=y, STROKE=n
  • Patient 4: smokes=n, hypertensive=y, sports=y, STROKE=n
  • Patient 5: smokes=n, hypertensive=y, sports=n, STROKE=y

Now, one can extract certain statistics from this data. For instance, patients 3 and 4 practice sports and didn't have a heart attack, while patients 1, 2, and 5 don't practice sports and did have one. From this data alone, one could conclude that practicing sports may help prevent a heart attack (where the "may help" part doesn't come from this data but just from the recognition that 5 patients is not a lot).

We can also learn that two out of three (about 67%) of the patients who smoke had heart attacks in this sample.

Now, if we look for combinations of factors, we can extract more information. For instance, by looking with care at the data, one can see that smoking alone does not settle the outcome (patient 3 smokes and had no heart attack), but every patient in this sample who smokes and does not practice sports has had a heart attack. Obviously, with more data we can be more certain about how good the combinations of factors that we learn are, in terms of how closely they are related to a certain outcome.
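
To make the counting concrete, here is a minimal Python sketch that computes such statistics from the five patient records above. The helper name `outcome_rate` is an assumption made for illustration; it simply counts matching patients and is not a learning method by itself.

```python
# The five patient records from the list above.
patients = [
    {"smokes": "y", "hypertensive": "y", "sports": "n", "STROKE": "y"},
    {"smokes": "y", "hypertensive": "n", "sports": "n", "STROKE": "y"},
    {"smokes": "y", "hypertensive": "n", "sports": "y", "STROKE": "n"},
    {"smokes": "n", "hypertensive": "y", "sports": "y", "STROKE": "n"},
    {"smokes": "n", "hypertensive": "y", "sports": "n", "STROKE": "y"},
]


def outcome_rate(records, **conditions):
    """Fraction of the records matching all conditions that have STROKE=y.

    Returns None when no record matches the conditions.
    """
    matching = [r for r in records
                if all(r[factor] == value for factor, value in conditions.items())]
    if not matching:
        return None
    return sum(r["STROKE"] == "y" for r in matching) / len(matching)


# Single factors.
smokers_rate = outcome_rate(patients, smokes="y")      # 2 of the 3 smokers
sports_rate = outcome_rate(patients, sports="y")       # 0 of the 2 who do sports
no_sports_rate = outcome_rate(patients, sports="n")    # 3 of the 3 who do not

# A combination of two factors: smokers who do not practice sports.
smoker_no_sports_rate = outcome_rate(patients, smokes="y", sports="n")  # 2 of 2

print(smokers_rate, sports_rate, no_sports_rate, smoker_no_sports_rate)
```

With three factors there are already dozens of possible conditions to try, which is why doing this by hand stops being practical very quickly.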

Statistical machine learning

There are so many combinations of factors that even in the small dataset above, with five patients, exploring all the combinations and outcomes is very time consuming. Fortunately, there is a well-established research field, statistical machine learning, that studies precisely this problem.

This research field has studied for years different methods to automatically and quickly find relationships between elements in large-scale data. This process is known as learning, and there are many, many techniques to do it.

In general, what these methods need in order to be able to learn effectively is: (i) a large amount of data, and (ii) the "right" data. In the example above, the medical doctor who interviewed the patients asked the "right" questions. If she had instead written down their eye color or some other irrelevant factor, learning something about heart attack risk would have been much more difficult.

Classifying text

Text classification is not much different. Instead of 3 factors (smokes, hypertensive, and sports), we will have hundreds of thousands of factors, one for each word in the dictionary. The factors will take the form "word=y, word=n" where the "word" can be any word, and we write "y" when the document contains the word and "n" when it doesn't.

The outcomes will be different types of documents. Suppose our documents are tweets and we want to separate those that contain information about damage to infrastructure (DAMAGE=y) from those that don't (DAMAGE=n). Again, you can have the following table, in which for each tweet you have one factor for each word in the tweet, and the outcome has been written by an expert who has looked at the tweet and decided if it reports infrastructure damage or not:

Tweet1: ... building=y, ..., collapsed=y, ..., DAMAGE=y
Tweet2: ... bridge=y, ..., collapsed=y, ..., DAMAGE=y
Tweet3: ... bridge=y, ..., playing=y, ..., DAMAGE=n
...
Tweet1000: ... bridge=y, ..., hearts=y, ..., DAMAGE=n
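
As a rough sketch of how one row of such a table could be produced, the Python snippet below turns a piece of text into "word=y" / "word=n" factors. The tweet text, the tiny vocabulary, and the function name `word_factors` are all invented for illustration; a real system would use a vocabulary with many thousands of words.

```python
import re


def word_factors(text, vocabulary):
    """Map each vocabulary word to "y" if it occurs in the text, else "n"."""
    present = set(re.findall(r"[a-z']+", text.lower()))
    return {word: ("y" if word in present else "n") for word in sorted(vocabulary)}


# Hypothetical five-word vocabulary; real ones are much larger.
vocabulary = {"building", "bridge", "collapsed", "playing", "hearts"}

# Hypothetical tweet; the DAMAGE label is what the human expert would add.
row = word_factors("The bridge collapsed this morning", vocabulary)
row["DAMAGE"] = "y"

print(row)
```

Words outside the vocabulary (here "the", "this", "morning") are simply ignored in this sketch.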

Again, we can apply any statistical machine learning method to learn which combinations of words indicate the presence of infrastructure damage reports.

That's all. Once we learn those combinations, we can use them automatically to evaluate new tweets. In this case, the learning method will also output a confidence, which you can understand roughly as the percentage of tweets having those factors that were found to have that predicted outcome in the data used to learn (it is more complex than that, but that is a good approximation).
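
To illustrate learning followed by prediction with a confidence, here is a small sketch using one particular method, a naive Bayes classifier over word presence with add-one smoothing. This is only one of the many possible techniques, chosen here because it is short; it is not necessarily what AIDR or any other system uses. The four training tweets and all function names are invented for illustration.

```python
import math
import re
from collections import Counter


def words(text):
    """Set of lowercase words present in the text (the "word=y" factors)."""
    return set(re.findall(r"[a-z']+", text.lower()))


# Hypothetical labeled tweets; the second element is the expert's DAMAGE label.
training = [
    ("building collapsed downtown", "y"),
    ("bridge collapsed over the river", "y"),
    ("playing bridge with friends", "n"),
    ("bridge night, hearts are trumps", "n"),
]


def train(examples):
    """Count, per label, the number of tweets and how many contain each word."""
    label_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for w in words(text):
            counts[w] += 1
            vocabulary.add(w)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return (label, confidence) for a new text."""
    label_counts, word_counts, vocabulary = model
    present = words(text)
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)
        for w in vocabulary:
            # Smoothed estimate of P(word=y | label).
            p = (word_counts[label][w] + 1) / (n + 2)
            score += math.log(p if w in present else 1 - p)
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


model = train(training)
label, confidence = predict(model, "the bridge collapsed")
print(label, round(confidence, 2))
```

With so few training tweets the confidence value means very little; it only becomes informative once the method has seen a large number of labeled examples.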

When the data is large, in general it is very difficult for a human to be able to spot a pattern better than what a computer algorithm can do. This is why crafting rules by hand (containing "bridge" implies "DAMAGE" unless the tweet also contains "playing" or "play" or "ace" or "heart" or ...) is not only time-consuming but also tends to yield lower accuracy than automatic methods, and is in general a bad idea.

In the case of text, we also use other factors (we call them "features" or "attributes") in addition to words. For instance, we can take all sequences of two words or three words (which we call "word bi-grams" and "word tri-grams"). We can also look at the position of some words in the phrase, whether a word was capitalized or not, how many times it appeared, etc. For the learning method, this is all the same: simply more factors that can be exploited to learn about the data.
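
A minimal sketch of extracting these additional features could look as follows. The feature naming scheme ("word=", "bigram=", and so on) and the function names are assumptions made for illustration, not a standard.

```python
import re
from collections import Counter


def ngrams(items, n):
    """All runs of n consecutive items, joined with spaces."""
    return [" ".join(items[i:i + n]) for i in range(len(items) - n + 1)]


def extract_features(text):
    """Count words, word bi-grams, word tri-grams and a few extra features."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lower = [t.lower() for t in tokens]
    features = Counter()
    for w in lower:
        features["word=" + w] += 1            # how many times each word appears
    for bigram in ngrams(lower, 2):
        features["bigram=" + bigram] += 1
    for trigram in ngrams(lower, 3):
        features["trigram=" + trigram] += 1
    for t in tokens:
        if t[0].isupper():
            features["capitalized=" + t.lower()] += 1
    if lower:
        features["first_word=" + lower[0]] = 1  # a simple position feature
    return features


features = extract_features("Bridge collapsed near the bridge")
print(features["word=bridge"], features["bigram=bridge collapsed"])
```

Each key of the resulting counter is just one more factor for the learning method, exactly like the single words before.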


Further reading: there is a lot, but you can start with the Wikipedia page on decision trees, a popular and easy-to-understand method for statistical machine learning.

News and Social Media (SNOW 2013 Keynote)

Slides from keynote at the Social News on the Web Workshop. Rio de Janeiro, Brazil, May 2013.

Doha II - June 2012

Shortly before it caught fire, I visited the Villagio mall, one of the three largest in Doha. As far as I understand, it is a copy of a place in Las Vegas that intends to give visitors the impression that they are in an "Italian village", including Gucci stores, a Venetian canal with gondoliers, etc.

I also had the opportunity to meet the (admirable) crazy cat ladies from "Cats in Qatar". Doha is full of abandoned cats. One of them is in this picture; we found her in the -3 parking of Tornado Tower, the maintenance people from the building took care of her for a few days and for now I am taking care of her.

Fortunately my immigration paperwork is done. This is a record. In Italy my residence permit took 7 months. In Spain it took 3 months, without considering the time when they expelled me from the country. Here it was only 5 weeks.

It is evident that the Qatar Foundation as an employer has a lot of influence. They put us at the front of the queue at every step, and that saves entire days of paperwork. Queues are never well arranged, and often you cannot trust that they will be respected. In my medical checkup my queue was reversible. You could be at the beginning or at the end, depending on the decision of the security guard.

Throughout the city you can hear the speakers of the mosques calling to prayer. SpeakerS, mosqueS, plural. From here I can hear at least three. To me it sounds like a cacophony of Gregorian chant.

Being without Fabiola is weird; I have had some critical days in which I don't even want to eat, especially on weekends. My workmates are practically all foreigners, many from Egypt and India. We hang out together a lot. I have gone to the movies a couple of times ... in sex scenes, they blank the screen and you can hear only the audio.

A little problem derived from the local customs is that in shopping malls and restaurants there are areas for "families". This means that men on their own cannot enter these areas. This obviously discriminates against the poorer immigrants, because to bring your family here you have to have a well-paid job.

* * *

I met my neighbors, an absolutely lovely British couple, who, believing I was not at home, took my garden furniture. But they gave it back ;-) They are very nice; we went for brunch at the Ritz-Carlton, where there is a free buffet and a free bar of sparkling white wine if you want -- meaning, the perfect place for getting drunk on a Friday noon (!) Here people work from Sunday to Thursday, and Friday, especially Friday morning, is the most relaxed time of the week.

* * *

To get my driver's license I have to do a mini-course of 12 sessions. I did not like the idea much at the beginning; I had hoped to just exchange my Chilean license for a Qatari one, and maybe if I had insisted enough I would have done that. But the course has not been bad. The first two classes are in a simulator, which is fun because in the simulator everybody is very imprudent, they don't respect traffic lights, etc. I killed a guy in the simulator, who practically waited for me to get closer before throwing himself into the street. But I saved a camel in the "country" scenario.

Then there are the practical lessons where you get ready for the exam (traffic signs and practical exam with L-parking, parking in a tight spot, and on-the-road test). I think I will get a PhD in L-parking. I am also practicing defensive driving a lot.

It is worth doing. People are very aggressive when driving and they do stuff I haven't seen elsewhere. For instance, on the way back from work a colleague who was giving me a ride unintentionally cut in front of a Land Cruiser. Later the Land Cruiser pulled in front of his car and stopped suddenly, to make us crash into him. Fortunately, the other people in the car warned my colleague that this would happen, because Qataris know that in a trial between them and a foreigner, the foreigner always loses.

The instructors in the driving school work 10 hours a day and I think mine is always about to throw himself out of the car window out of boredom. When instructors are not giving lessons, they wait in a room with air conditioning and watch wrestling matches.

* * *

Last week one of my colleagues was speaking with the son of someone from the office, who was complaining about bullies in school. My colleague told the kid that he had to do like Napoleon, who studied a lot in school to be better than the bullies. His answer:

-- I am sorry sir, that is a bad example. I don't want to be like Napoleon.
-- Why?
-- Because I am Egyptian.
