Datasets

Runtime mobile permissions dataset

(Android permissions)

This dataset contains runtime permission decisions made on mobile phones running Android, in the context of the SmarPer project. As a first step, we focused on a few permissions (i.e., location, contacts, and storage) for several popular applications (i.e., Facebook, Twitter, Instagram, WhatsApp, Viber, Skype, Snapchat, The Weather Channel, and AccuWeather). Specifically, the dataset contains user runtime permission decisions (allow, deny, or obfuscate; obfuscation is data-dependent, see the project website for more details) together with contextual information including:

  • Participant ID (i.e., a random pseudonym assigned to each participant, e.g., A0G811C)
  • Type of permission request (i.e., location, contacts or storage)
  • Android API method used for the request (e.g., requestLocationUpdates)
  • App name (e.g., Facebook)
  • Whether the app was in the foreground (i.e., the participant was using the app)
  • Whether denying the request could cause the app to crash
  • Day of month (e.g., 21)
  • Type of place indicated by the participant (i.e., home, work, in transit, other)
  • Whether the screen was locked
  • Battery level (e.g., 50%)
  • Network connection type (e.g., Wi-Fi, cellular).

The data was collected in mid-2016, from about 40 active Android users (totaling 8k+ runtime decisions). The participants were provided with rooted Nexus 5 phones (Android v5.1.1), which they used as their main phones (configured with their SIM card and their Google account) for at least ten days. The data collection was performed through the SmarPer app, which is based on the Xposed framework/XPrivacy (v3.6.19). The participants also had to fill out an entry survey and an exit survey.

Please e-mail smarper@epfl.ch to obtain the dataset. The data is in the CSV format.
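As a rough illustration of how the CSV file could be explored once obtained, the sketch below counts decisions per permission type. The column names (participant_id, permission, method, app, foreground, decision) and the sample rows are assumptions made up for this example; the actual header and values in the dataset may differ.

```python
import csv
import io
from collections import Counter, defaultdict

# Synthetic placeholder rows with HYPOTHETICAL column names;
# the real CSV header may differ and should be substituted here.
SAMPLE_CSV = """participant_id,permission,method,app,foreground,decision
A0G811C,location,requestLocationUpdates,Facebook,1,allow
A0G811C,contacts,placeholderMethod,WhatsApp,0,deny
A0G811C,location,requestLocationUpdates,AccuWeather,0,obfuscate
A0G811C,location,requestLocationUpdates,Facebook,0,deny
"""


def decision_counts(csv_file):
    """Count decisions (allow/deny/obfuscate) per permission type."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["permission"]][row["decision"]] += 1
    return counts


counts = decision_counts(io.StringIO(SAMPLE_CSV))
for permission, per_decision in sorted(counts.items()):
    print(permission, dict(per_decision))
```

To run this on the real file, replace io.StringIO(SAMPLE_CSV) with an open file handle and adjust the column names to match the header.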

Please cite the following article if you use our dataset in your research:


User (co-)location sharing/viewing preferences dataset

(Facebook usage)

This dataset contains survey participants’ responses to three questions used to quantify the relative benefits of sharing/viewing location and co-location information on Facebook, and the associated relative costs in terms of location privacy. Specifically, we assessed the participants’ preferences regarding, respectively, (1) sharing vs. viewing posts with location information (i.e., check-in posts), (2) sharing posts with location information vs. sharing posts with co-location information, and (3) location privacy vs. benefits of sharing location information. We designed these survey questions by following a rigorous full-profile conjoint analysis approach. The dataset also contains general information about the participants’ Facebook usage and their location and co-location sharing on Facebook.

The data was collected in early 2016, from 250 active Facebook users recruited via the Amazon Mechanical Turk platform, through an online survey. The survey participants were asked to rank by preference a number of scenarios in which posts were removed from Facebook (e.g., “two of your recent posts are kept and one of your friend’s recent posts is kept”, “none of your recent posts is kept and one of your friend’s recent posts is kept”). Preference factors can be extracted from the responses.
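One generic way to extract preference factors from a ranking is to regress rank-derived scores on the scenario attributes. The sketch below is only an illustration of that idea, not necessarily the procedure used in the associated article: the encoding of a scenario as (own posts kept, friend's posts kept), the linear utility model, and the example ranking are all assumptions.

```python
def part_worths(ranking):
    """Least-squares fit of score ~ c + a*own + b*friend.

    `ranking` lists scenarios as (own_kept, friend_kept) tuples,
    most preferred first. Scores are rank-derived: the top scenario
    gets n, the last gets 1. Returns the slopes (a, b).
    """
    n = len(ranking)
    scores = [n - i for i in range(n)]
    own = [s[0] for s in ranking]
    friend = [s[1] for s in ranking]

    def mean(values):
        return sum(values) / len(values)

    mo, mf, ms = mean(own), mean(friend), mean(scores)
    # Centered sums of squares and cross-products (normal equations).
    s11 = sum((o - mo) ** 2 for o in own)
    s22 = sum((f - mf) ** 2 for f in friend)
    s12 = sum((o - mo) * (f - mf) for o, f in zip(own, friend))
    s1y = sum((o - mo) * (y - ms) for o, y in zip(own, scores))
    s2y = sum((f - mf) * (y - ms) for f, y in zip(friend, scores))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("attributes are collinear; cannot fit")
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# HYPOTHETICAL ranking by one participant, most preferred first.
example_ranking = [(2, 1), (2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
a, b = part_worths(example_ranking)
print(f"weight of own posts: {a:.2f}, weight of friend's posts: {b:.2f}")
```

Under these assumptions, the ratio a / b indicates how many of a friend's posts the participant would trade for one of their own.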

Please e-mail kevin.huguenin@unil.ch to obtain the dataset. The dataset is in CSV format.

Please cite the following article if you use our dataset in your research:


Utility dataset

(Foursquare check-ins)

This dataset contains self-reported utility ratings, on a scale from 1 to 5, for obfuscated location check-ins (both at the geographic and semantic level). Each check-in is described by 14 attributes including the users’ age and gender, the types of venue, and the dates and times.

The data was collected in early 2015, from about 75 active Foursquare users recruited via the Amazon Mechanical Turk platform, through a personalized online survey. The survey participants were presented with some of their own past Foursquare check-ins; for each check-in, they were asked to rate on a scale from 1 to 5 the perceived utility of obfuscated versions of the check-ins (e.g., “hotel, on Dearborn St. (Chicago 60654, IL, USA)”, “Travel & transport place, on Dearborn St. (Chicago 60654, IL, USA)”, “hotel, in Chicago (IL, USA)”, and “travel & transport place, in Chicago (IL, USA)”).

Please e-mail kevin.huguenin@unil.ch to obtain the dataset. The dataset is in the Attribute-Relation File Format (ARFF) used by Weka.
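For readers who do not use Weka, the sketch below shows how a simple dense ARFF file could be read with the Python standard library. The attribute names and the two sample rows are invented placeholders, not the dataset's actual 14 attributes, and the parser deliberately ignores sparse rows, quoted attribute names, and other less common ARFF features.

```python
import csv

# Synthetic placeholder content; attribute names are HYPOTHETICAL.
SAMPLE_ARFF = """@relation checkins
@attribute age numeric
@attribute gender {female,male}
@attribute venue_type string
@attribute utility {1,2,3,4,5}
@data
30,female,'hotel',4
25,male,'travel & transport place',2
"""


def read_arff(lines):
    """Minimal reader for dense ARFF: returns (attribute names, rows)."""
    names, rows, in_data = [], [], False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if in_data:
            # ARFF commonly quotes strings with single quotes.
            values = next(csv.reader([line], quotechar="'"))
            rows.append(dict(zip(names, values)))
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


names, rows = read_arff(SAMPLE_ARFF.splitlines())
mean_utility = sum(int(r["utility"]) for r in rows) / len(rows)
print(names, mean_utility)
```

All values are returned as strings, so numeric attributes need to be converted explicitly, as done for the utility rating above.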

Please cite the following article if you use our dataset in your research:


Community hotspots dataset

(FON Wi-Fi access points)

This dataset contains the locations (geographic coordinates) and IDs of Wi-Fi access points from the FON community network in Europe. FON is a large community network with over 18 million hotspots worldwide (e.g., in Belgium, France, Germany, Italy, the Netherlands, Portugal, and the UK, thanks to strategic partnerships with leading national ISPs). The access points composing the FON network are mostly routers and set-top boxes provided and operated by the ISPs. The data was collected in early 2013.
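A typical use of access-point coordinates is to measure local hotspot density. The sketch below counts access points within a given radius of a point using the haversine great-circle distance. The (id, latitude, longitude) record layout and the three sample access points are assumptions for illustration and are not taken from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(access_points, lat, lon, radius_km):
    """Number of access points at most radius_km away from (lat, lon)."""
    return sum(
        1 for _ap_id, ap_lat, ap_lon in access_points
        if haversine_km(lat, lon, ap_lat, ap_lon) <= radius_km
    )


# HYPOTHETICAL records (id, latitude, longitude), not real FON data.
sample_aps = [
    ("ap-1", 50.8503, 4.3517),  # central Brussels
    ("ap-2", 50.8510, 4.3520),  # under 100 m from ap-1
    ("ap-3", 48.8566, 2.3522),  # central Paris
]

nearby = count_within(sample_aps, 50.8503, 4.3517, radius_km=1.0)
print(nearby)
```

This linear scan is adequate for small regions; for the full dataset, a spatial index (e.g., a grid or k-d tree) would likely be preferable.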

Please e-mail kevin.huguenin@unil.ch to obtain the dataset.

Please cite one of the following articles if you use our dataset in your research:


Note: If you are looking for a dense Wi-Fi access point dataset in the US, I recommend using the locations of the new LinkNYC network in New York (note that although the locations of the so-called Links are not all known yet, the locations of payphones, which the Links will replace, can be obtained from the NYC OpenData website).