
Profession and academia

Java Developer sought for developing Open Source ElasticSearch plug-ins


Skills required:

  1. Java development experience of 3+ years.
  2. Familiarity with search engines such as Apache SOLR or ElasticSearch, a significant plus.
  3. Experience in a research environment, a plus.

Description:

  • Our research team has been awarded a prestigious grant from the Data Transparency Lab. The grant is for "FA*IR: A tool for fair rankings in search," which is a new ranking method proposed by our team to avoid discrimination by gender, race, or other protected characteristics. The team includes researchers from Universitat Pompeu Fabra in Barcelona (Dr. C. Castillo, Dr. R. Baeza-Yates), TU Berlin (Mrs. M. Zehlike), NTENT Hispania (Dr. R. Baeza-Yates, Dr. Sara Hajian), and ISI Torino (Dr. F. Bonchi).
  • Within this grant, we are searching for a Java developer to write a series of plug-ins for ElasticSearch (or, alternatively, for SOLR) and to interact with our research team. The plug-ins will implement re-ranking strategies for queries in which the documents correspond to descriptions of people (e.g., resumes). There are two groups of plug-ins, each implementing algorithms parametrized by a configuration file.
    1. The first group of plug-ins will implement a series of criteria that must be fulfilled by every response to a query (e.g., that for every query, the resulting list of documents must contain a minimum proportion of women in the first positions). These criteria will be based on the paper by Zehlike et al. at CIKM 2017.
    2. The second group of plug-ins will implement a learning-to-rank re-ranking strategy. They will receive a set of training documents, in which the ranking has been manually established, and will learn how to rank new, unseen documents based on these training documents and on fairness criteria to be established during the research.
  • In both cases, the plug-ins should not degrade the performance of the search engine, i.e., at most a small extra latency can be incurred. We expect efficient fair ranking plug-ins to be a significant contribution to ElasticSearch, and since they will be released as Open Source software, they should have a significant impact on ElasticSearch's large user base.
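To illustrate the kind of criterion the first group of plug-ins would enforce, the prefix check can be sketched as follows. This is a simplified sketch, not the FA*IR algorithm itself (which uses a statistical test over each prefix, per Zehlike et al., CIKM 2017); the function name and the floor-of-proportion threshold are illustrative assumptions.

```python
import math

def satisfies_minimum_proportion(ranking, is_protected, min_prop):
    """Check that every prefix of `ranking` contains at least
    floor(min_prop * k) protected items among its first k positions.

    Simplified stand-in for the FA*IR criterion; names are illustrative.
    """
    protected_so_far = 0
    for k, item in enumerate(ranking, start=1):
        if is_protected(item):
            protected_so_far += 1
        if protected_so_far < math.floor(min_prop * k):
            return False  # the k-th prefix violates the minimum share
    return True
```

For example, with `min_prop = 0.5` and `is_protected = lambda x: x == "f"`, the ranking `["m", "m", "m", "f"]` fails at the second position, while `["m", "f", "m", "f"]` passes.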

Location:

  • The developer will meet a team member once per week to report progress. Ideally, at least half of these meetings will be in person; the rest can be remote. The team is based in Barcelona and Berlin, so the developer should be able to attend the in-person meetings in one of these cities. A developer based in the Barcelona or Berlin area is preferred, but a developer located elsewhere is also acceptable.

Timing:

  • The project will start in February or March and end in July 2018 (5-6 months). The first group of plug-ins can be implemented immediately; the second group can be implemented from April 2018, as the team's research progresses.
  • Bids will be reviewed starting February 1st, 2018, until a suitable developer is found.

What we offer:

  • Interaction with a team of international researchers.
  • Working on an application for social good, to mitigate or remove discrimination.
  • Contributing to Open Source software.

How to bid:

  • Questions may be asked by e-mail to Carlos Castillo carlos.castillo@upf.edu; please include the word "FA*IR" in the subject.
  • To bid, use this form.
    1. Include your CV with 2-3 recent relevant projects and your role in them.
    2. Include your bid: a work plan with 2-3 phases for the project, the estimated number of work hours and timeline for each phase, and the cost of each phase. A payment will be issued after the completion of each phase.
  • Contracting will be done directly between the developer and the Technical University of Berlin.

Salary and expenses: the total project cost should not exceed 24,000€, including all applicable taxes and deductions.


Fairness-Measures.org: a new resource of data and code for algorithmic fairness

Decisions that are partially or completely based on the analysis of large datasets are becoming more common every day. Data-driven decisions can bring multiple benefits, including increased efficiency and scale. Decisions made by algorithms and based on data also carry an implicit promise of "neutrality." However, this supposed algorithmic neutrality has been brought into question by both researchers and practitioners.

Algorithms are not really "neutral." They embody many design choices and, in the case of data-driven algorithms, decisions about which datasets to use and how to use them. One particular area of concern is datasets containing patterns of past and present discrimination against disadvantaged groups, such as hiring decisions made in the past that reflect subtle or not-so-subtle discriminatory practices against women or racial minorities, to name just two main concerns. When used to train new machine-learning based algorithms, these datasets can deepen and perpetuate these disadvantages. There are potentially many sources of bias, including platform affordances, written and unwritten norms, different demographics, and external events, among many others.

The study of algorithmic fairness can be understood as two interrelated efforts: first, to detect discriminatory situations and practices, and second, to mitigate discrimination. Detection is necessary for mitigation, and hence a number of methodologies and metrics have been proposed to find and measure discrimination. As these methodologies and metrics multiply, comparing across works is becoming increasingly difficult.

We have created a new website where we would like to collaborate with others to create benchmarks for algorithmic fairness. To start, we have implemented a number of basic statistical measures in Python, and prepared several example datasets so that the same measurements can be extracted across all of them.
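As a rough sketch of what one such basic statistical measure can look like, the following computes the statistical parity difference between a protected group and everyone else. This is an illustrative example only, not the actual fairness-measures.org code or API; the function name and signature are assumptions.

```python
def statistical_parity_difference(outcomes, groups, protected_group):
    """Difference between the positive-outcome rates of the
    non-protected items and the protected items.

    `outcomes` are 0/1 decisions (e.g., hired or not). A value of 0
    means both groups receive positive outcomes at the same rate.
    """
    protected = [o for o, g in zip(outcomes, groups) if g == protected_group]
    others = [o for o, g in zip(outcomes, groups) if g != protected_group]
    rate = lambda xs: sum(xs) / len(xs)
    return rate(others) - rate(protected)
```

For instance, if group "a" receives positive outcomes at rate 2/3 and protected group "b" at rate 1/3, the measure returns 1/3, flagging a disparity.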

We invite you to check the data and code available on this website, and let us know what you think. We would love to hear your feedback: http://fairness-measures.org/.

Contact e-mail: Meike Zehlike, TU Berlin meike.zehlike@tu-berlin.de.

Meike Zehlike, Carlos Castillo, Francesco Bonchi, Ricardo Baeza-Yates, Sara Hajian, Mohamed Megahed (2017): Fairness Measures: Datasets and software for detecting algorithmic discrimination. http://fairness-measures.org/

A book you MUST read if you want to do a PhD in Computer Science ... and more

The PhD Grind (2012) by Philip Guo is a must-read if you want to do a PhD in Computer Science. Even if you're already doing one, I definitely recommend it. It's a first-person account of how the author completed a six-year PhD at Stanford. Basically all of the obstacles that the author had to overcome during his PhD are things that either happen to all of us, or that I've heard many times from PhD students. Read this. It's a very short, free book that articulates well some of the common negative experiences of PhD students -- and what you can learn from those experiences.

If you are already doing your PhD, especially if you're in your last years, read A PhD Is Not Enough (2nd ed., 2011) by Peter J. Feibelman. Again, it's a short book, one that focuses on the transition from PhD student to tenure-track professor to tenure. It has tons of great, concrete advice and tips.

Finally, a very comprehensive book is The Professor Is In (2015) by Karen Kelsky, which is really a career guide for academics. The author, a former tenured professor and department head, maintains a popular blog on the subject and offers professional career counseling/coaching services. Her approach, as she openly admits, has neoliberal tones: this is a competition that you want to win. However, the author also gives a very reasonable justification as to why this is a good mindset with which to approach some key career steps. The book is a really detailed guide that covers basically every aspect of an academic career, starting with choosing a PhD advisor but going well into valuable tips for those holding a more advanced or tenured position.

My advice: read all three, starting with the first one, which is the shortest.

Bonus: slides by José L. Balcázar on doing research, publishing, writing, defending, and applying for grants.

Data Transparency Lab names our project on fair rankings as one of their grantees for 2017

The Data Transparency Lab has awarded our project "FA*IR: A tool for fair rankings in search" one of their grants for the year 2017. The grant will enable the development of an open source API implementing fair ranking methods within a widely-used search engine (Apache SOLR).

People search engines are increasingly common for job recruiting, for finding a freelancer, and even for finding companionship or friendship. As in similar settings, a top-k ranking algorithm is used to shortlist and order the items (persons, in this case), considering that if the number of candidates matching a query is large, most users will not scan the entire list. Conventionally, these lists are ranked in descending order of some measure of the relative quality of items (e.g., years of experience or education, up-votes, or inferred attractiveness). Unsurprisingly, the results of these ranking and search algorithms potentially have an impact on the people who are ranked, and contribute to shaping the experience of everybody online and offline.

Given this importance and impact, our aim is to develop the first fair open source search API. This fair ranking tool will enforce ranked group fairness, ensuring that all prefixes of the ranking have a fair share of items across the groups of interest, and ranked individual fairness, reducing the number of cases in which a less qualified or lower-scoring item is placed above a more qualified or higher-scoring item. We will create this fair search API by extending a popular, well-tested open source search engine: Apache Solr. We will develop this search API considering both the specific use case of people search and the general case of a search engine with fairness criteria.

Taking a long-term view, we believe the use of this tool will be an important step towards achieving diversity and reducing inequality and discrimination in the online world, and consequently in society as a whole.
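The ranked individual fairness notion described above can be made concrete with a small sketch that counts how often a lower-scoring item is placed above a higher-scoring one. This is our own illustrative reading of the criterion, not code from the project, and the function name is hypothetical.

```python
def count_score_inversions(ranked_scores):
    """Count pairs where an item is ranked above another item
    that has a strictly higher score.

    `ranked_scores[i]` is the score of the item at rank i (rank 0
    is the top). A perfectly score-ordered ranking yields 0; a fair
    re-ranker would try to keep this count low.
    """
    n = len(ranked_scores)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if ranked_scores[i] < ranked_scores[j]
    )
```

For example, `count_score_inversions([0.9, 0.5, 0.7])` returns 1, since the item scored 0.5 sits above the item scored 0.7.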

The DTL grant was awarded to Meike Zehlike (Technische Universität Berlin), Francesco Bonchi (ISI Foundation and Eurecat), Carlos Castillo (Eurecat), Sara Hajian (Eurecat), and Odej Kao (Technische Universität Berlin). Together with Ricardo Baeza-Yates (NTENT) and Mohamed Megahed (Technische Universität Berlin), we have been doing joint research on fair top-k ranking. Some of our results can be found in arXiv preprint 1706.06368.

More details: DTL Grantees 2017 announced.
