Profession and academia

Decisions that are partially or completely based on the analysis of large datasets are becoming more common every day. Data-driven decisions can bring multiple benefits, including increased efficiency and scale. Decisions made by algorithms and based on data also carry an implicit promise of "neutrality." However, this supposed algorithmic neutrality has been brought into question by both researchers and practitioners.

Algorithms are not really "neutral." They embody many design choices, and in the case of data-driven algorithms, these include decisions about which datasets to use and how to use them. One particular area of concern is datasets containing patterns of past and present discrimination against disadvantaged groups, such as records of past hiring decisions that reflect subtle or not-so-subtle discriminatory practices against women or racial minorities, to name just two main concerns. These datasets, when used to train new machine-learning-based algorithms, can deepen and perpetuate these disadvantages. There are potentially many sources of bias, including platform affordances, written and unwritten norms, different demographics, and external events, among many others.

The study of algorithmic fairness can be understood as two interrelated efforts: first, to detect discriminatory situations and practices, and second, to mitigate discrimination. Detection is necessary for mitigation and hence a number of methodologies and metrics have been proposed to find and measure discrimination. As these methodologies and metrics multiply, comparing across works is becoming increasingly difficult.

We have created a new website, where we would like to collaborate with others to create benchmarks for algorithmic fairness. To start, we have implemented a number of basic statistical measures in Python, and prepared several example datasets so that the same measurements can be extracted across all of them.
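To give a flavor of what a basic statistical measure can look like, here is a minimal sketch. It is not the code from the website: the function names and the toy data are assumptions made for illustration. It compares the rate of favorable outcomes between a protected and a non-protected group, both as a difference and as a ratio.

```python
from statistics import mean


def positive_rate(outcomes):
    """Share of favorable outcomes (1 = favorable, 0 = unfavorable)."""
    return mean(outcomes) if outcomes else 0.0


def split_by_group(outcomes, protected):
    """Split outcomes into (protected, non-protected) lists."""
    prot = [o for o, p in zip(outcomes, protected) if p]
    nonprot = [o for o, p in zip(outcomes, protected) if not p]
    return prot, nonprot


def rate_difference(outcomes, protected):
    """Non-protected rate minus protected rate; 0 means equal rates."""
    prot, nonprot = split_by_group(outcomes, protected)
    return positive_rate(nonprot) - positive_rate(prot)


def rate_ratio(outcomes, protected):
    """Protected rate divided by non-protected rate; 1 means equal rates.

    Returns None when the non-protected rate is zero (ratio undefined).
    """
    prot, nonprot = split_by_group(outcomes, protected)
    base = positive_rate(nonprot)
    return positive_rate(prot) / base if base > 0 else None


# Toy data (hypothetical): 8 hiring decisions, the last 4 for protected candidates.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
protected = [False, False, False, False, True, True, True, True]

difference = rate_difference(outcomes, protected)
ratio = rate_ratio(outcomes, protected)
```

In this toy example the non-protected group is hired at a rate of 0.75 and the protected group at 0.25. Running the same two functions over several datasets is the kind of like-for-like comparison a benchmark is meant to enable.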

We invite you to check the data and code available on this website and let us know what you think. We would love to hear your feedback:

Contact e-mail: Meike Zehlike, TU Berlin

Meike Zehlike, Carlos Castillo, Francesco Bonchi, Ricardo Baeza-Yates, Sara Hajian, Mohamed Megahed (2017): Fairness Measures: Datasets and software for detecting algorithmic discrimination.

The PhD Grind (2012) by Philip Guo is a must-read if you want to do a PhD in Computer Science. Even if you're already doing one, I definitely recommend it. It's a first-person account of what it was like for the author to complete a six-year PhD at Stanford. Basically all of the obstacles that the author had to overcome during his PhD are things that either happen to all of us, or that I've heard many times from PhD students. Read this. It's a very short free book that articulates well some of the common negative experiences of PhD students -- and what you can learn from those experiences.

If you are already doing your PhD, especially if you're in your last years, read A PhD Is Not Enough (2nd ed, 2011) by Peter J. Feibelman. Again, it's a short book; it focuses on the transition from PhD student to tenure-track professor to tenure. It has tons of great concrete advice and tips.

Finally, a very comprehensive book is The Professor Is In (2015) by Karen Kelsky, which is really a career guide for academics. The author, a former tenured professor and department head, maintains a popular blog on the subject and offers professional career counseling/coaching services. Her approach, as she admits openly, has neoliberal tones: this is a competition that you want to win. However, the author also gives a very reasonable justification as to why this is a good mindset with which to approach some key career steps. The book is a really detailed guide that covers basically every aspect of an academic career, starting with choosing a PhD advisor and going all the way to valuable tips for those holding a more advanced, or tenured, position.

My advice: read all three, starting with the first one, which is the shortest.

Bonus: slides by José L. Balcázar on doing research, publishing, writing, defending, and applying for grants.

I am currently looking for students interested in pursuing a PhD in Information and Communications Technology at Universitat Pompeu Fabra in Barcelona, under my supervision, starting October 1st, 2017. My topics of interest are social computing, crisis informatics, news, and social media, plus all kinds of computing applications that address issues of social significance.

General information links:

You can apply using the above links. Your application has a higher chance of succeeding if accompanied by a recommendation letter from a potential advisor for your PhD thesis.

If you would like me to be your advisor, please fill in this Expression of Interest (by August 4th, 2017) -- DEADLINE PASSED, FOLLOW ON LINKEDIN OR TWITTER FOR THE NEXT ONE.

The Data Transparency Lab has awarded our project "FA*IR: A tool for fair rankings in search" one of their grants for the year 2017. The grant will enable the development of an open-source API implementing fair ranking methods within a widely used search engine (Apache Solr).

People search engines are increasingly common for job recruiting, for finding a freelancer, and even for finding companionship or friendship. As in similar cases, a top-k ranking algorithm is used to shortlist and order the items (persons, in this case), since most users will not scan the entire list when the number of candidates matching a query is large. Conventionally, these lists are ranked in descending order of some measure of the relative quality of items (e.g. years of experience or education, up-votes, or inferred attractiveness). Unsurprisingly, the results of these ranking and search algorithms can have an impact on the people who are ranked, and contribute to shaping the experience of everybody online and offline.

Given this importance and impact, our aim is to develop the first fair open-source search API. This fair ranking tool will enforce ranked group fairness, ensuring that all prefixes of the ranking have a fair share of items across the groups of interest, and ranked individual fairness, reducing the number of cases in which a less qualified or lower-scoring item is placed above a more qualified or higher-scoring item. We will create this fair search API by extending a popular, well-tested open-source search engine: Apache Solr. We will develop it considering both the specific use case of people search and the more general case of a general-purpose search engine with fairness criteria. Taking a long-term view, we believe the use of this tool will be an important step towards achieving diversity and reducing inequality and discrimination in the online world, and consequently in society as a whole.
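The two criteria above can be illustrated with a short sketch. This is not the FA*IR implementation: the function names are hypothetical, and "fair share" is approximated here by a simple assumption, namely that every prefix of length k must contain at least floor(p * k) protected items for a chosen target proportion p. The criterion used by the actual tool may well differ.

```python
import math


def min_protected(k, p):
    """Assumed requirement: at least floor(p * k) protected items in the top k."""
    return math.floor(p * k)


def first_unfair_prefix(is_protected, p):
    """Return the length of the first prefix that violates the assumed
    group-fairness requirement, or None if every prefix satisfies it.

    is_protected: list of booleans, one per ranked item, best item first.
    """
    count = 0
    for k, flag in enumerate(is_protected, start=1):
        if flag:
            count += 1
        if count < min_protected(k, p):
            return k
    return None


def count_inversions(scores):
    """Number of pairs in which a lower-scoring item is ranked above a
    higher-scoring one (a rough proxy for individual unfairness).

    scores: list of scores in ranked order, best position first.
    """
    n = len(scores)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if scores[i] < scores[j]
    )


# Hypothetical rankings: True marks a candidate from the protected group.
ranking_a = [False, False, True, True]   # protected candidates pushed down
ranking_b = [False, True, False, True]   # protected candidates interleaved

violation_a = first_unfair_prefix(ranking_a, p=0.5)
violation_b = first_unfair_prefix(ranking_b, p=0.5)
inversions = count_inversions([0.9, 0.5, 0.7])
```

With p = 0.5, the first ranking fails at the top-2 prefix, which contains no protected candidate, while the interleaved ranking passes for every prefix. Moving protected candidates up to satisfy the group criterion can introduce inversions, which is why the two criteria have to be balanced against each other.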

The DTL grant was awarded to Meike Zehlike (Technische Universität Berlin), Francesco Bonchi (ISI Foundation and Eurecat), Carlos Castillo (Eurecat), Sara Hajian (Eurecat), and Odej Kao (Technische Universität Berlin). Together with Ricardo Baeza-Yates (NTENT) and Mohamed Megahed (Technische Universität Berlin), we have been doing joint research on fair top-k ranking. Some of our results can be found in arXiv preprint 1706.06368.

More details: DTL Grantees 2017 announced.

The good news is that I got my accreditation as an advanced researcher, which is the requirement to become a full professor in Catalonia. This is awarded by the AQU (Agency for the Quality of the University System).

More information:

