In this project, we study interdependent privacy risks, that is, how some individuals can compromise the privacy of others (family members, friends, colleagues or even complete strangers), either directly (typically when personal data involve multiple data subjects) or indirectly (typically when the personal data of individuals are correlated due to phenomena such as homophily or genetic inheritance). As application domains, we focus mostly on (co-)location data, genomic data and photos. Relying on statistical inference techniques, we analyze the extent to which sensitive information about individuals can be inferred from the data of other individuals; using game theory, we also study the interplay between individuals’ decisions in terms of information sharing and, more generally, privacy behavior. We also design and build cryptography-based solutions for sharing multi-subject and interdependent data (e.g., group photos and genomic data). Finally, in collaboration with legal scholars, we assess the protective capacity of privacy and data protection laws (in particular the EU GDPR) for interdependent data.
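The following toy model illustrates the indirect case. It is not the project's inference method. It shows how one parent's genotype at a single biallelic SNP changes what an observer can infer about a child's genotype. The sketch makes three assumptions. Inheritance is Mendelian. The other, unobserved parent comes from a population in Hardy-Weinberg equilibrium with a hypothetical minor-allele frequency `p`. Genotypes are encoded as the number of minor alleles (0, 1 or 2). The function names are illustrative.

```python
def population_prior(p):
    """Genotype distribution under Hardy-Weinberg equilibrium (minor-allele frequency p)."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


def child_given_parent(parent_genotype, p):
    """Child's genotype distribution given one parent's genotype.

    The observed parent transmits a minor allele with probability genotype / 2
    (Mendelian inheritance); the unobserved parent, drawn from the population,
    transmits it with probability p.
    """
    a = parent_genotype / 2
    return {
        0: (1 - a) * (1 - p),
        1: a * (1 - p) + (1 - a) * p,
        2: a * p,
    }


p = 0.1  # hypothetical minor-allele frequency
print(population_prior(p))       # belief about the child without any relative's data
print(child_given_parent(2, p))  # belief once a homozygous parent shares their genome
```

Take `p = 0.1` and a parent homozygous for the minor allele. The probability that the child carries two minor alleles rises from 1% to 10%. The probability that the child carries none drops from 81% to 0. The child disclosed nothing.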
Analysis of the privacy/utility trade-off in location-based services.
In this project, we study the location-privacy implications of location-based services, in particular when auxiliary information is available. Examples are semantic information about the visited places (e.g., “hotel”) and co-locations of users (e.g., Alice and Bob appearing together in a photo or having the same IP address). In addition, we study how privacy-protection mechanisms affect the users’ privacy and perceived utility. These mechanisms include generalization, e.g., replacing the exact coordinates of a location with the street name, or replacing the precise semantic information about a location with a less precise version (“hotel” becomes “travel place”). To do so, we rely on statistical inference and machine-learning techniques, which we apply to data collected through guided surveys and data-collection campaigns involving real users. The ultimate goal of this project is to build accurate privacy and utility models (which depend on the accuracy of the disclosed information) for exploring and optimizing the privacy/utility trade-off in location-based systems.
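The next sketch illustrates semantic generalization with a hypothetical place taxonomy. The project's taxonomy and its privacy and utility models are not described here, so the two metrics below are deliberately crude assumptions:
- **Privacy proxy:** the number of precise tags that remain consistent with what is disclosed.
- **Utility proxy:** the reciprocal of that number.

```python
# Hypothetical semantic hierarchy (child -> parent); the project's taxonomy may differ.
PARENT = {
    "hotel": "travel place",
    "hostel": "travel place",
    "airport": "travel place",
    "restaurant": "food place",
    "cafe": "food place",
    "travel place": "place",
    "food place": "place",
}


def generalize(tag, levels):
    """Replace a semantic tag by its ancestor `levels` steps up the hierarchy."""
    for _ in range(levels):
        if tag not in PARENT:  # already at the root
            break
        tag = PARENT[tag]
    return tag


def leaves_under(tag):
    """Most precise tags that are consistent with a (possibly generalized) tag."""
    children = [child for child, parent in PARENT.items() if parent == tag]
    if not children:
        return {tag}
    leaves = set()
    for child in children:
        leaves |= leaves_under(child)
    return leaves


def toy_tradeoff(tag, levels):
    """Return (disclosed tag, candidate-set size, toy utility).

    Privacy proxy: number of precise tags an observer cannot distinguish.
    Utility proxy: 1 / that number (a uniform guess among the candidates).
    """
    disclosed = generalize(tag, levels)
    k = len(leaves_under(disclosed))
    return disclosed, k, 1 / k


for levels in range(3):
    print(toy_tradeoff("hotel", levels))
```

With this hierarchy, disclosing “travel place” instead of “hotel” hides the visit among three candidate tags, and disclosing “place” hides it among five. The toy utility drops proportionally.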
Automatic and dynamic information sharing.
In this project, we study the factors that influence how mobile users share information, such as their locations, with their friends and with service providers (through mobile apps, for instance). Through targeted user surveys and field experiments, we collect data about users’ sharing decisions to gain insights into the users’ decision process. The possible decisions are grant, deny, or obfuscate (i.e., share a less precise version of the information). We also evaluate the potential of (semi-)automatic decision making for information sharing using machine-learning techniques. We design and implement a system that predicts users’ decisions based on a number of contextual features, including location and time. If the system is sufficiently confident, it makes the decision on behalf of the user. Otherwise, the user is asked to make the decision manually, and this decision is used to train the system dynamically. We apply our approach to instant messaging and to permissions for mobile apps. We are currently running a field experiment for our new permission system for Android. Feel free to try it out!
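The next sketch shows the confidence-gated loop described above: decide automatically when confident, otherwise ask the user and learn from the answer. It makes three assumptions:
- The class and parameter names are hypothetical.
- The contextual features are reduced to the requester and a day/night flag.
- A per-context majority vote stands in for the machine-learning classifier used in the project.

```python
from collections import Counter, defaultdict


class SharingAssistant:
    """Toy confidence-gated sharing assistant (hypothetical design).

    A per-context majority vote stands in for a machine-learning classifier.
    """

    def __init__(self, threshold=0.8, min_samples=3):
        self.threshold = threshold      # minimum confidence for an automatic decision
        self.min_samples = min_samples  # minimum number of past decisions in this context
        self.history = defaultdict(Counter)  # context -> counts of past manual decisions

    @staticmethod
    def context(requester, hour):
        """Coarse contextual features: who asks, and whether it is day or night."""
        period = "day" if 8 <= hour < 20 else "night"
        return (requester, period)

    def decide(self, requester, hour, ask_user):
        counts = self.history[self.context(requester, hour)]
        total = sum(counts.values())
        if total >= self.min_samples:
            decision, n = counts.most_common(1)[0]
            if n / total >= self.threshold:
                return decision, "automatic"
        # Not confident enough: ask the user and learn from the answer.
        decision = ask_user(requester, hour)
        counts[decision] += 1
        return decision, "manual"


assistant = SharingAssistant()
for hour in (9, 12, 17):
    assistant.decide("alice", hour, lambda who, when: "grant")  # user grants manually
print(assistant.decide("alice", 14, lambda who, when: "deny"))  # ('grant', 'automatic')
```

After three consistent manual grants to the same requester during the day, the fourth daytime request is decided automatically. A night-time request falls into a new context and is still sent to the user.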
Press coverage: magazine of the Fédération romande des consommateurs (FRC, consumer association in the French-speaking part of Switzerland), in French.
Privacy and security of trajectory-based online services.
In this project, we design and build a system that enables mobile users to prove to a third party certain aggregated properties of the routes they take (e.g., covered distance) without disclosing the routes themselves. Our system relies on existing Wi-Fi access point infrastructures and involves cryptographic techniques. It has direct applications in activity-based social networks (e.g., GarminConnect, RunKeeper) and in the location-based activity tracking performed by health insurance companies. For instance, a user can prove that she covered a given distance and a given elevation gain without disclosing where she carried out her physical activity. We evaluate our system using large datasets of Wi-Fi access point locations and of GPS traces of location-based activities. We also design and build private and secure ride-hailing/ride-sharing systems.
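One common building block for revealing an aggregate without revealing its parts is additively homomorphic encryption. The sketch below implements textbook Paillier encryption with toy parameters:
- A party that cannot read them can combine ciphertexts of hypothetical per-segment distances (e.g., between two Wi-Fi access points encountered along a run).
- The key holder then decrypts only the total.

This is an assumption-laden illustration, not the project's protocol. It omits two things: how access points certify segment distances, and how the key holder is prevented from decrypting individual segments. It requires Python 3.9+ for `math.lcm`.

```python
import math
import secrets
from functools import reduce

# Toy Paillier parameters: two well-known primes (2^31 - 1 and 2^61 - 1).
# Far too small and too public for real use; illustration only.
P, Q = 2_147_483_647, 2_305_843_009_213_693_951


def keygen(p=P, q=Q):
    """Return the public modulus n and the secret key (lambda, mu), with g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lambda mod n^2) = lambda mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (n + 1)^m * r^n mod n^2, with random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def add(n, c1, c2):
    """Homomorphic addition: Enc(m1) * Enc(m2) decrypts to m1 + m2 (mod n)."""
    return c1 * c2 % (n * n)


def decrypt(n, sk, c):
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


n, sk = keygen()
segments_m = [1200, 850, 430]  # hypothetical per-segment distances in meters
ciphertexts = [encrypt(n, d) for d in segments_m]
total = reduce(lambda a, b: add(n, a, b), ciphertexts)  # computed without decrypting
print(decrypt(n, sk, total))  # 2480
```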
Press coverage: Wired.
De-anonymization of genomic databases.
In this project, we study the extent to which genomic databases can be de-anonymized. The attack exploits two kinds of knowledge. The first is the phenotypic traits of the users whose genomes appear in the database (e.g., blood type, eye and hair color). The second is the statistical relationships between genomic and phenotypic information (e.g., the probability that an individual has blue eyes given that the value of SNP rs1800407 in her genome is GG). To do so, we design and implement a de-anonymization attack that relies on a standard maximum-weight matching algorithm executed on the (bipartite) genotype-phenotype compatibility graph. We evaluate it on a large dataset from the OpenSNP platform. We also study possible countermeasures.
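The next sketch is a miniature of the matching step. Anonymized genomes form one side of the bipartite graph and identified individuals with known eye colors form the other. It makes three assumptions:
- The edge weights are log-likelihoods taken from a hypothetical P(phenotype | genotype) table (illustrative numbers, not real SNP statistics).
- Both sides have the same size.
- The matching is found by exhaustive search, which is O(n!) and only viable for tiny examples.

A realistic implementation would use a polynomial-time assignment algorithm such as the Hungarian algorithm, e.g., `scipy.optimize.linear_sum_assignment(weights, maximize=True)`.

```python
import math
from itertools import permutations

# Hypothetical P(eye color | genotype) for a single SNP; illustrative numbers only.
P_EYE_GIVEN_GENOTYPE = {
    "GG": {"blue": 0.60, "green": 0.30, "brown": 0.10},
    "AG": {"blue": 0.30, "green": 0.40, "brown": 0.30},
    "AA": {"blue": 0.05, "green": 0.15, "brown": 0.80},
}


def edge_weight(genotype, eye_color):
    """Log-likelihood of an observed phenotype given a genotype (bipartite edge weight)."""
    return math.log(P_EYE_GIVEN_GENOTYPE[genotype][eye_color])


def max_weight_matching(genotypes, phenotypes):
    """Exhaustive maximum-weight perfect matching; returns {genome index: individual index}."""
    best, best_score = None, -math.inf
    for perm in permutations(range(len(phenotypes))):
        score = sum(edge_weight(g, phenotypes[j]) for g, j in zip(genotypes, perm))
        if score > best_score:
            best, best_score = perm, score
    return dict(enumerate(best))


anonymous_genomes = ["AA", "GG", "AG"]        # genotypes of the anonymized records
known_eye_colors = ["blue", "brown", "green"]  # e.g., Alice, Bob, Carol
print(max_weight_matching(anonymous_genomes, known_eye_colors))  # {0: 1, 1: 0, 2: 2}
```

The matching maximizes the sum of log-likelihoods, i.e., the joint likelihood of all assignments. It therefore re-identifies each genome with the individual whose phenotype it best explains *collectively*, not greedily one record at a time.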
Past projects (most recent)
Efficient and Transparent Wi-Fi Offloading
In this project, we design and build a system that enables mobile users to offload their upload tasks (e.g., photos and videos to be uploaded to Facebook while on the go) to Wi-Fi access points at full speed. Full speed means the speed of the Wi-Fi link, rather than that of the access point's broadband connection, which often constitutes a bottleneck. To do so, our system relies on the storage and processing capabilities of common devices located on the access point's LAN (e.g., NAS, set-top boxes, gateways). Our system operates seamlessly on HTTP(S) POSTs, making it highly generic and widely applicable. Moreover, it requires only limited software changes on the access points and on the target web servers, and none to existing protocols or browsers. We evaluate our system using a large dataset of Wi-Fi access point locations.
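A back-of-the-envelope calculation shows why offloading at Wi-Fi speed matters. The user only needs to stay in range long enough to hand the upload to a device on the LAN, which then forwards it over the slower uplink. All figures below are hypothetical, and protocol overhead is ignored.

```python
def upload_seconds(size_mb, rate_mbit_s):
    """Time to transfer size_mb megabytes at rate_mbit_s megabits per second."""
    return size_mb * 8 / rate_mbit_s


video_mb = 200     # hypothetical video size
wifi_mbit_s = 100  # hypothetical Wi-Fi link rate
uplink_mbit_s = 5  # hypothetical broadband uplink of the access point

print(upload_seconds(video_mb, wifi_mbit_s))    # 16.0: time in range with offloading
print(upload_seconds(video_mb, uplink_mbit_s))  # 320.0: time in range without it
```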
Social-Aware Data Replication for Geo-Distributed Social Networks
In this project, we design and build a system that flattens the replication traffic between the geo-distributed datacenters operated by a social network provider, while minimizing the inconsistency perceived by the users. Flattening the traffic also flattens the traffic costs, as traffic is usually charged based on peak use (so-called burstable billing). To do so, our system exploits information about the social ties between users, as well as about user time zones and activity patterns. It uses this information to delay the replication of updates that are unlikely to be read in the coming hours. Two examples:
- the replication, to a datacenter in Europe, of an update from a user who has no or very few friends in Europe;
- the replication of an update to a datacenter in Europe at 3am, as most European users are not connected to the social network at that time of day.

We evaluate our system using a large dataset from Twitter.
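The two examples above can be turned into a simple per-datacenter deferral rule. The sketch below assumes the following, and the project's actual scheduler may differ:
- The policy, thresholds and names (`min_friends`, active hours 7am to 11pm local time, a fixed 6-hour deferral) are hypothetical.
- Time-zone offsets are fixed, with no daylight saving.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    region: str
    utc_offset: int  # hours relative to UTC (hypothetical fixed offset, no DST)


def replication_delay(utc_hour, friends_in_region, dc,
                      min_friends=1, active_start=7, active_end=23, max_defer=6):
    """Hours to wait before pushing an update to `dc` (0 = replicate now).

    Hypothetical policy: updates with too few likely readers in the region are
    deferred by `max_defer` hours; otherwise, updates arriving outside the
    region's active hours wait until `active_start` local time.
    """
    if friends_in_region < min_friends:
        return max_defer
    local = (utc_hour + dc.utc_offset) % 24
    if active_start <= local < active_end:
        return 0
    return (active_start - local) % 24


europe = Datacenter("Europe", 1)
print(replication_delay(2, 5, europe))   # 4: 3am in Europe, wait until 7am
print(replication_delay(12, 5, europe))  # 0: 1pm in Europe, replicate now
print(replication_delay(12, 0, europe))  # 6: no friends in Europe, defer
```

Deferred updates are pushed later instead of at the moment they are posted, which spreads replication traffic away from peak hours. The updates are ones that few users are expected to read in the meantime.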
Trajectory Prediction in Multi-Player Online Games
In this project, we build a method for predicting the short-term trajectories of avatars in multi-player online games (including role-playing games and first-person shooter games). Such prediction techniques are key in multi-player online games, as they make it possible to reduce position-update traffic; this is particularly important in decentralized peer-to-peer architectures. Through a physics-inspired force model, our method incorporates semantic information about the game environment, such as neighboring items and points of interest, and the states of the avatars (e.g., health points). We evaluate our method using datasets of avatar trajectories from Quake III and World of Warcraft.
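The following sketch illustrates a physics-inspired predictor under explicitly hypothetical modeling choices. It is not the project's calibrated model. The choices are:
- Items attract the avatar with an inverse-square force.
- Health packs pull harder when the avatar's health is low.
- Motion is integrated with damped explicit Euler steps.

With such a predictor, peers can, for instance, skip a position update whenever the actual position stays close to the predicted one, in the spirit of dead reckoning.

```python
import math


def attraction(pos, items, health):
    """Sum of inverse-square attractions toward items (hypothetical force model)."""
    fx = fy = 0.0
    for x, y, kind in items:
        dx, dy = x - pos[0], y - pos[1]
        d = math.hypot(dx, dy)
        if d < 1e-9:
            continue
        weight = 1.0
        if kind == "health":
            # Assumption: the lower the avatar's health, the stronger the pull of health packs.
            weight += (100 - health) / 25
        magnitude = weight / d ** 2
        fx += magnitude * dx / d
        fy += magnitude * dy / d
    return fx, fy


def predict(pos, vel, items, health, steps=10, dt=0.1, mass=1.0, damping=0.9):
    """Predict the next `steps` positions with damped explicit Euler integration."""
    x, y = pos
    vx, vy = vel
    path = []
    for _ in range(steps):
        fx, fy = attraction((x, y), items, health)
        vx = damping * vx + fx / mass * dt
        vy = damping * vy + fy / mass * dt
        x += vx * dt
        y += vy * dt
        path.append((x, y))
    return path


items = [(5.0, 0.0, "health"), (0.0, 8.0, "weapon")]
print(predict((0.0, 0.0), (1.0, 0.0), items, health=20)[-1])
```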