In this project, we study interdependent privacy risks, that is, how some individuals can compromise the privacy of others (family members, friends, colleagues or even complete strangers), either directly (typically when personal data involve multiple data subjects) or indirectly (typically when the personal data of individuals are correlated due to phenomena such as homophily or genetic inheritance). As application domains, we focus mostly on (co-)location data, genomic data and photos. By relying on statistical inference techniques, we analyze the extent to which sensitive information about individuals can be inferred from the data of other individuals; we also study, by using game theory, the interplay between individuals’ decisions in terms of information sharing and privacy behavior in general (dataset). We also design and build technical (e.g., encryption) and non-technical (e.g., negotiation and mediation) solutions for consensually sharing multi-subject and interdependent data, such as group photos and genomic data (video). In collaboration with jurists, we assess the protective capacity of privacy and data protection laws (in particular the EU GDPR) for interdependent data. Finally, we develop an interactive tool for estimating kin genomic privacy: try it out below or on the original website! We also recently wrote a comprehensive survey and an encyclopedia entry on the topic.
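To illustrate why interdependence calls for game theory, here is a minimal sketch of a symmetric two-player sharing game in which each player's privacy also suffers when the *other* player shares. The payoff parameters (`BENEFIT`, `OWN_COST`, `OTHER_COST`) are hypothetical, and the best-response check below is a generic way to find pure-strategy Nash equilibria, not the specific model used in this project.

```python
from itertools import product

ACTIONS = ("withhold", "share")

# Hypothetical parameters, for illustration only.
BENEFIT = 2.0     # utility a player gets from sharing her own data
OWN_COST = 1.0    # privacy loss a player suffers from her own sharing
OTHER_COST = 2.0  # privacy loss a player suffers when the OTHER player shares


def payoff(mine: str, theirs: str) -> float:
    """Payoff of a player given her own action and the other player's action."""
    s_mine = 1 if mine == "share" else 0
    s_theirs = 1 if theirs == "share" else 0
    return (BENEFIT - OWN_COST) * s_mine - OTHER_COST * s_theirs


def pure_nash_equilibria() -> list:
    """Action profiles in which neither player gains by deviating unilaterally."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        # The game is symmetric, so player 2's payoff is payoff(a2, a1).
        p1_best = all(payoff(a1, a2) >= payoff(d, a2) for d in ACTIONS)
        p2_best = all(payoff(a2, a1) >= payoff(d, a1) for d in ACTIONS)
        if p1_best and p2_best:
            equilibria.append((a1, a2))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria())  # [('share', 'share')]
```

With these numbers, sharing is each player's best response whatever the other does, so mutual sharing is the only equilibrium, even though it leaves both players worse off (payoff -1 each) than mutual withholding (payoff 0): the privacy externality produces a prisoner's-dilemma-like outcome.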
Interdependent security for Web resources
In this project, we study the interdependent security risks that arise when a website relies on resources (e.g., scripts, downloads) stored on external servers (e.g., mirrors, content delivery networks). More specifically, we study the security behavior of Internet users when downloading files from the Web, and we develop automated tools to help them. We study the usability and effectiveness of one common integrity-verification mechanism: checksums. To do so, we conduct large-scale surveys and in situ experiments with eye tracking. We also develop a Chrome browser extension that makes checksum verification automatic; try it out! (demo, website). We conduct an experiment to study users’ download behaviors and their exposure and reactions to checksums “in the wild”. Finally, we focus on another integrity-verification mechanism (for scripts and stylesheets): subresource integrity (SRI), a W3C recommendation adopted by most browsers. We perform a large-scale longitudinal analysis of the Web (based on the CommonCrawl dataset) to measure its adoption and usage. We also conduct a survey of web developers to assess their knowledge and understanding of SRI and of the issues it addresses.
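Both integrity mechanisms boil down to comparing a cryptographic digest of the fetched bytes with a reference value. The sketch below, which assumes SHA-256 checksums published as hex strings (download pages also use other algorithms), shows the manual check a user would perform and how an SRI `integrity` value (algorithm name, a dash, then the base64-encoded raw digest) is derived. The function names are illustrative, not part of the extension described above.

```python
import base64
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large downloads fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(path: str, published_hex: str) -> bool:
    """Compare a file's digest with the checksum published on the download page."""
    # Published checksums are often upper-case or padded with whitespace.
    return file_sha256(path) == published_hex.strip().lower()


def sri_integrity(content: bytes, algorithm: str = "sha384") -> str:
    """SRI integrity value: '<algorithm>-<base64 of the raw digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    # Demo: write a small "download" and verify it against its own checksum.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example download")
    published = file_sha256(tmp.name)
    print(checksum_matches(tmp.name, published.upper()))  # True
    os.remove(tmp.name)
    print(sri_integrity(b"console.log('hi');"))
```

In a page, the SRI value goes into the `integrity` attribute of a `<script>` or `<link>` element (together with `crossorigin="anonymous"` for cross-origin resources); the browser then refuses to run the resource if its digest differs.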
Coverage: The Morning Paper.
Privacy/utility trade-off in location-based and wearable-based services
In this project, we study the location-privacy implications of location-based services, in particular when auxiliary information is available, for instance semantic information about the visited places (e.g., “hotel”) and co-locations of users (e.g., Alice and Bob appear together in a photo or have the same IP address). In addition, we study the effect of privacy-protection mechanisms, including generalization (e.g., replacing the exact coordinates of a location with the street name, or replacing the precise semantic information of a location with a less precise version, such as “hotel” becoming “travel place”), on users’ privacy and perceived utility. To do so, we rely on statistical inference and machine-learning techniques, which we apply to data collected through guided surveys and data-collection campaigns involving real users (dataset). The ultimate goal of this project is to build accurate privacy and utility models (which depend on the accuracy of the disclosed information) for exploring and optimizing the privacy/utility trade-off in location-based systems. Recently, we started investigating the privacy implications of the use of wearable devices (typically wristbands), and in particular the extent to which users of such devices can be (psychologically) profiled.
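Generalization can be pictured as two independent knobs: how far up a semantic hierarchy a place label is lifted, and how coarse the spatial unit is. The sketch below is a minimal illustration under stated assumptions: the semantic hierarchy is hypothetical (not the taxonomy used in the project's surveys), and a fixed latitude/longitude grid stands in for street-level generalization.

```python
import math

# Hypothetical semantic hierarchy (child -> parent); not the taxonomy used in the project.
SEMANTIC_PARENT = {
    "hotel": "travel place",
    "hostel": "travel place",
    "restaurant": "food place",
    "travel place": "place",
    "food place": "place",
}


def generalize_semantic(label: str, levels: int) -> str:
    """Replace a semantic label by its ancestor `levels` steps up (stops at the root)."""
    for _ in range(levels):
        label = SEMANTIC_PARENT.get(label, label)
    return label


def generalize_coordinates(lat: float, lon: float, cell_deg: float) -> tuple:
    """Replace exact coordinates by the index of the grid cell (cell_deg degrees wide) containing them."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


if __name__ == "__main__":
    print(generalize_semantic("hotel", 1))                # travel place
    print(generalize_coordinates(46.5191, 6.5668, 0.01))  # (4651, 656)
```

Turning either knob up lowers the precision of what is disclosed; privacy and utility models of the kind described above would then quantify how much an adversary learns and how much the service degrades at each setting.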
In this project, we study the extent to which genomic databases can be de-anonymized by exploiting knowledge about phenotypic traits of the users whose genomes appear in the database (e.g., blood type, eye and hair color) and statistical relationships between genomic and phenotypic information (e.g., the probability that an individual has blue eyes given that the value of SNP rs1800407 in her genome is GG). To do so, we design and implement a de-anonymization attack that relies on a standard maximum-weight matching algorithm executed on the genotype-phenotype compatibility (bipartite) graph, and we evaluate it on a large dataset from the OpenSNP platform. We also study possible countermeasures. We develop an interactive tool for estimating kin genomic privacy based only on the family tree of the target individual and on the list of relatives whose genomes are known (e.g., because they used a direct-to-consumer genetic testing service such as 23andMe): try it out below or on the original website!
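The matching step can be sketched as follows: each (anonymized genome, known phenotype profile) pair gets an edge weighted by the log-likelihood of the profile given the genome, and the attack picks the one-to-one assignment with the largest total weight. In this minimal sketch, the probability tables are made up (not real statistics for rs1800407 or any other SNP), each trait depends on a single hypothetical SNP, and the matching is found by exhaustive search for readability; at realistic sizes one would use a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`.

```python
import math
from itertools import permutations

# Hypothetical conditional probabilities P(trait value | genotype), one SNP per trait.
# These numbers are made up for illustration; they are not real SNP statistics.
P_EYE = {"GG": {"blue": 0.7, "brown": 0.3},
         "AG": {"blue": 0.4, "brown": 0.6},
         "AA": {"blue": 0.1, "brown": 0.9}}
P_HAIR = {"CC": {"blond": 0.6, "dark": 0.4},
          "CT": {"blond": 0.3, "dark": 0.7},
          "TT": {"blond": 0.1, "dark": 0.9}}


def weight(genome: tuple, profile: tuple) -> float:
    """Edge weight: log-likelihood of a phenotype profile given a genome."""
    (eye_snp, hair_snp), (eye, hair) = genome, profile
    return math.log(P_EYE[eye_snp][eye]) + math.log(P_HAIR[hair_snp][hair])


def max_weight_matching(genomes: list, profiles: list) -> dict:
    """Maximum-weight perfect matching by exhaustive search (tiny, equal-size inputs only)."""
    best, best_total = None, -math.inf
    for perm in permutations(range(len(profiles))):
        total = sum(weight(genomes[i], profiles[j]) for i, j in enumerate(perm))
        if total > best_total:
            best, best_total = perm, total
    return dict(enumerate(best))  # genome index -> profile index


if __name__ == "__main__":
    genomes = [("GG", "CC"), ("AA", "TT"), ("AG", "CT")]                   # anonymized genomes
    profiles = [("brown", "dark"), ("blue", "blond"), ("brown", "blond")]  # known people
    print(max_weight_matching(genomes, profiles))  # {0: 1, 1: 0, 2: 2}
```

Summing log-probabilities makes the total weight of an assignment equal to the log of its joint likelihood, so the maximum-weight matching is the most likely re-identification under these (independence) assumptions.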
Estimate your kin genomic privacy! (1) build your (or a) family tree, (2) indicate the individuals who might have used a genetic-testing service (e.g., 23andMe), (3) indicate the “target” (you or any other family member) whose genomic privacy you want to estimate, and (4) read the target’s genomic privacy score in the bar on the right.
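The intuition behind the tool can be shown on the smallest possible family tree: one parent whose genome is known and a target child, at a single SNP. This sketch assumes Hardy-Weinberg equilibrium for the unknown parent, Mendelian transmission, and a privacy score defined as the target's genotype entropy normalized by its maximum (log2 3); these are illustrative assumptions, not the tool's actual model, which handles whole family trees.

```python
import math

# Genotypes at one SNP are encoded as the number of minor alleles: 0, 1 or 2.


def hwe_prior(p: float) -> list:
    """Genotype distribution under Hardy-Weinberg equilibrium, minor-allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def child_given_parent(parent_genotype: int, p: float) -> list:
    """Child's genotype distribution when one parent's genotype is known (Mendelian inheritance)."""
    t_known = parent_genotype / 2  # P(known parent transmits the minor allele)
    t_other = p                    # other parent unknown: assume population frequency
    p0 = (1 - t_known) * (1 - t_other)
    p2 = t_known * t_other
    return [p0, 1 - p0 - p2, p2]


def entropy_bits(dist: list) -> float:
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in dist if q > 0)


def privacy_score(dist: list) -> float:
    """Entropy normalized by its maximum (log2 3): 1 = maximal uncertainty, 0 = genotype known."""
    return entropy_bits(dist) / math.log2(3)


if __name__ == "__main__":
    p = 0.3  # hypothetical minor-allele frequency
    print(round(privacy_score(hwe_prior(p)), 3))              # no relative's genome known
    print(round(privacy_score(child_given_parent(2, p)), 3))  # a parent with 2 minor alleles is known
```

With these inputs, the second score is lower than the first: the parent's genome reveals part of the target's genotype even though the target never shared anything.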
Past projects (most recent)
Privacy and security of trajectory-based online services
In this project, we design and build a system that enables mobile users to prove to a third party certain aggregated properties of the routes they take (e.g., covered distance) without disclosing the routes themselves. Our system relies on existing Wi-Fi access point infrastructures and involves cryptographic techniques. It has direct applications in activity-based social networks (e.g., GarminConnect, RunKeeper) and in the location-based activity tracking performed by health insurance companies, as a user can prove that she covered a given distance and a given elevation gain without disclosing where she carried out her physical activity. We evaluate our system by using large datasets of Wi-Fi access point locations and of GPS traces of location-based activities. We also design and build private and secure ride-hailing/ride-sharing systems; check it out!
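The kind of aggregate a user might prove can be illustrated without any cryptography: if she was within range of a sequence of access points, the great-circle distances between consecutive access points, each reduced by twice the radio range, give a sound lower bound on the distance she covered. This sketch shows only that arithmetic; the haversine formula, the mean Earth radius and the `ap_range_m` parameter are illustrative assumptions, and the cryptographic protocol that lets her prove the bound without revealing positions is not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def haversine_m(a: tuple, b: tuple) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def distance_lower_bound_m(ap_positions: list, ap_range_m: float = 0.0) -> float:
    """Lower bound on the distance covered by someone seen near each AP in turn.

    The user was within ap_range_m of each AP, so between two consecutive APs she
    moved at least (AP-to-AP distance - 2 * ap_range_m), and never less than zero.
    """
    total = 0.0
    for a, b in zip(ap_positions, ap_positions[1:]):
        total += max(0.0, haversine_m(a, b) - 2 * ap_range_m)
    return total
```

Using a lower bound (rather than an estimate) matters here: a verifier such as an insurer should only accept claims that the access-point evidence cannot overstate.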
Automatic and dynamic information sharing
In this project, we study the factors that influence how mobile users share information, such as their locations, with their friends and with service providers (through mobile apps, for instance). Through targeted user surveys and field experiments, we collect data about users’ sharing decisions (grant, deny, or obfuscate, i.e., share a less precise version of the information) to gain insights into the users’ decision process. We also evaluate the potential of (semi-)automatic decision making for information sharing using machine-learning techniques. We design and implement a system that predicts users’ decisions based on a number of contextual features, including location and time: if the system is sufficiently confident, it makes the decision on behalf of the user; otherwise, it asks the user to make the decision manually and uses the answer to further train itself, so that the system is trained dynamically. We apply our approach to instant messaging and to permissions for mobile apps. We ran a field experiment with our new permission system for Android (dataset, website); check it out!
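The confidence-gated loop described above can be sketched with a deliberately simple predictor: past decisions are counted per context, the majority decision is applied automatically only when its share reaches a threshold, and every manual answer is fed back as a new training example. The class name, the exact-match contexts, and the `threshold`/`min_samples` values are hypothetical; the actual system uses machine-learning classifiers over contextual features.

```python
from collections import Counter, defaultdict


class SharingDecider:
    """Hypothetical semi-automatic decider: majority vote per context, gated by confidence."""

    def __init__(self, threshold: float = 0.8, min_samples: int = 3):
        self.threshold = threshold      # minimum share of the majority decision
        self.min_samples = min_samples  # do not decide before seeing this many examples
        self.history = defaultdict(Counter)  # context -> counts of past decisions

    def predict(self, context):
        """Return (decision, confidence); decision is None when the user must be asked."""
        counts = self.history.get(context, Counter())
        n = sum(counts.values())
        if n < self.min_samples:
            return None, 0.0
        decision, k = counts.most_common(1)[0]
        confidence = k / n
        return (decision if confidence >= self.threshold else None), confidence

    def decide(self, context, ask_user):
        """Decide automatically if confident; otherwise ask the user and learn from the answer."""
        decision, _ = self.predict(context)
        if decision is None:
            decision = ask_user(context)  # e.g. "grant", "deny" or "obfuscate"
            self.history[context][decision] += 1
        return decision
```

For example, with `ctx = ("home", "evening", "friend")`, the first three requests in that context are forwarded to the user; if she granted all three, the fourth is granted automatically.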
Coverage: [fr] magazine of the Fédération romande des consommateurs (FRC, consumer association).
Mobile app privacy
In this project, we study the privacy threats related to access to the list of apps installed on mobile devices. We also design and implement HideMyApp (HMA), an effective and practical solution for hiding the presence of (sensitive) mobile apps from nosy apps. HMA relies on a combination of virtualization and obfuscation techniques based on container apps (demo, website); check it out!
Efficient and Transparent Wi-Fi Offloading
In this project, we design and build a system that enables mobile users to offload their upload tasks (e.g., photos and videos to be uploaded to Facebook while on the go) to Wi-Fi access points at full speed, that is, at the speed of the Wi-Fi link rather than at that of the access point’s broadband connection, which often constitutes a bottleneck. To do so, our system relies on the storage and processing capabilities of common devices located on the access point’s LAN (e.g., NAS, set-top boxes, gateways). Our system operates seamlessly on HTTP(S) POSTs, making it highly generic and widely applicable; moreover, it requires only limited software changes on the access points and on the target web servers, and none to existing protocols or browsers. We evaluate our system by using a large dataset of Wi-Fi access point locations.
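A back-of-the-envelope calculation shows where the gain comes from. The numbers below are hypothetical (a 100 MB video, 50 Mbit/s of Wi-Fi throughput, a 5 Mbit/s broadband uplink) and megabytes are taken as 10^6 bytes. Offloading does not make the data reach the web server sooner; it shortens the time the mobile user must stay in range, while the device on the LAN forwards the upload over the slower uplink afterwards.

```python
def transfer_time_s(size_mb: float, rate_mbit_s: float) -> float:
    """Seconds needed to move size_mb megabytes (10**6 bytes) at rate_mbit_s megabits per second."""
    return size_mb * 8 / rate_mbit_s


if __name__ == "__main__":
    size_mb = 100    # hypothetical video upload
    wifi_rate = 50   # hypothetical Wi-Fi throughput between phone and LAN device (Mbit/s)
    uplink_rate = 5  # hypothetical broadband uplink of the access point (Mbit/s)
    print(transfer_time_s(size_mb, uplink_rate))  # 160.0 s without offloading
    print(transfer_time_s(size_mb, wifi_rate))    # 16.0 s with local offloading
```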
Social-Aware Data Replication for Geo-Distributed Social Networks
In this project, we design and build a system that flattens the replication traffic between the geo-distributed datacenters operated by a social-network provider, and hence reduces traffic costs (traffic is usually charged based on peak use, i.e., so-called burstable billing), while minimizing the inconsistency perceived by users. To do so, our system exploits information about the social ties between users, as well as their time zones and activity patterns, in order to delay the replication of updates that are unlikely to be read in the coming hours (e.g., replicating to a European datacenter an update from a user who has no or very few friends in Europe, or replicating an update to a European datacenter at 3 a.m., when most European users are not connected to the social network). We evaluate our system by using a large dataset from Twitter.
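The two examples above translate into a simple deferral rule, sketched here under stated assumptions: the friend-count threshold, the “quiet” local hours (1 a.m. to 5 a.m.) and the function names are hypothetical, whereas the actual system relies on learned activity patterns. Updates that are deferred can be batched into off-peak periods, which is what flattens the traffic peak.

```python
def should_replicate_now(utc_hour: int, friends_in_region: int, region_utc_offset: int,
                         min_friends: int = 1, quiet_hours: range = range(1, 6)) -> bool:
    """Hypothetical rule: push an update to a remote datacenter immediately only if it is
    likely to be read there soon; otherwise defer it to an off-peak replication batch."""
    if friends_in_region < min_friends:
        return False  # nobody in that region is likely to read it soon: defer
    local_hour = (utc_hour + region_utc_offset) % 24
    return local_hour not in quiet_hours  # defer during the region's night hours


def split_updates(updates: list, region_utc_offset: int) -> tuple:
    """Partition (utc_hour, friends_in_region) updates into 'now' and 'deferred' lists."""
    now, deferred = [], []
    for u in updates:
        (now if should_replicate_now(*u, region_utc_offset) else deferred).append(u)
    return now, deferred
```

For instance, for a datacenter at UTC+1, an update posted at 02:00 UTC by a user with three friends in that region is deferred, since it is 3 a.m. locally.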
Trajectory Prediction in Multi-Player Online Games
In this project, we build a method for predicting short-term trajectories of avatars in multi-player online games (incl. role-playing games and first-person shooter games). Such prediction techniques are key in multi-player online games because they reduce position-update traffic, which is particularly important in decentralized peer-to-peer architectures. Our method incorporates, through a physics-inspired force model, semantic information about the game environment, such as neighboring items and points of interest, and the states of the avatars (e.g., health points). We evaluate our method by using datasets of avatar trajectories from Quake III and World of Warcraft.
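A physics-inspired predictor can be sketched as follows: points of interest exert attractive forces on the avatar, the avatar's state modulates their strength, and the position is extrapolated with a few explicit Euler steps. Everything here is an assumption for illustration (constant-magnitude attraction, linear damping, a speed cap, and `health_pack_strength` scaling a health pack's pull by missing health); it is not the force model from the project.

```python
import math


def health_pack_strength(health: float, max_health: float = 100.0, base: float = 2.0) -> float:
    """Hypothetical: a health pack attracts an avatar more the more health it is missing."""
    return base * (1 - health / max_health)


def predict_positions(pos, vel, attractors, steps, dt=0.1, damping=0.5, max_speed=5.0):
    """Euler-integrate a 2-D avatar position under attractive forces.

    attractors: list of ((x, y), strength) pairs, e.g. items or points of interest.
    Returns the list of predicted (x, y) positions, one per time step.
    """
    x, y = pos
    vx, vy = vel
    path = []
    for _ in range(steps):
        fx = fy = 0.0
        for (ax, ay), strength in attractors:
            dx, dy = ax - x, ay - y
            dist = math.hypot(dx, dy)
            if dist > 1e-9:  # unit vector toward the attractor, scaled by its strength
                fx += strength * dx / dist
                fy += strength * dy / dist
        vx += (fx - damping * vx) * dt
        vy += (fy - damping * vy) * dt
        speed = math.hypot(vx, vy)
        if speed > max_speed:  # avatars cannot move arbitrarily fast
            vx, vy = vx * max_speed / speed, vy * max_speed / speed
        x += vx * dt
        y += vy * dt
        path.append((x, y))
    return path
```

When peers share such a model, a position update only needs to be sent when the actual trajectory drifts away from the predicted one, which is how prediction saves position-update traffic.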