Ideas for Experimenting With Using Data

The Packard Foundation's Visiting Scholar Lucy Bernholz and program analyst Katherine Murtha are exploring how foundations and nonprofits can take advantage of big data to enhance their work--while attending to the concerns about rights and good governance that meaningful participation in digital society creates. They are tracking some of their thoughts and ideas here. Please contribute your own experiments, thoughts, questions, concerns, and challenges below. We'd love to hear from you!

Conservation and Science

Supporting the Development of Infrastructure to Analyze (and Store) Data
Scientists could find cures or insights for diseases like allergies, obesity and Crohn’s by using a big data process, with computers analyzing raw data on the genomes of the microbes that live in the human body. Many biologists feel that there is not enough funding for infrastructure for computers that can handle these processes.

In 2012, NIH started the “Big Data to Knowledge” (BD2K) initiative to promote big data usage in science. Example projects in this area include the Earth Microbiome Project and the Human Microbiome Project. The EU is funding a project to create a model of the human brain.

Big Data Techniques and Tools to Accelerate Progress in STEM
In 2012, the National Science Foundation issued a solicitation for proposals to develop technologies and techniques to use big data for scientific discovery.
In October 2012, NSF made $15M in grants for big data research projects to innovate and “accelerate progress” in STEM fields. Grantees were:
  1. At Rutgers, scientists sought a new way of structuring data so users don’t have to sacrifice speed for data freshness, among other tradeoffs that cause bottlenecks.
  2. Researchers at UNC Chapel Hill developed DataBridge to publish, store and search for datasets.
  3. At University of Washington, scientists will develop “a formal foundation for big data management,” that is, open-source software to analyze big data.
  4. Brown University will work on testing math and statistics techniques to deal with “noisy” data, especially molecular biology/cancer genome data.
  5. Carnegie Mellon researchers are developing a way for machines to learn and analyze big data.
  6. Researchers will use big data to make it easier to sequence DNA.
  7. Language processing scholars will develop theories to use big data.
  8. Scientists will use big data to make it easier to search text repositories.

Cellphone Sensors Track Parkinson's Disease
Using cellphones’ sensors, Kaggle was able to collect large amounts of data on Parkinson’s patients passively, that is, without interrupting the patients’ everyday lives or burdening them with self-reporting requirements. This type of technology enables doctors to prescribe the correct dosages in response to patient conditions. The Michael J. Fox Foundation awarded Lionsolver $10,000 in the Parkinson’s Data Challenge, which encouraged empirical studies using data collected by Kaggle. Lionsolver’s machine learning used the phones’ accelerometer data to tell when phone owners were having tremors.

Population and Reproductive Health

Children, Families, and Communities

Detect Needs With Cellphone Sensors
Perhaps policy analysts could use cellphone sensors to identify where kids are not playing outside because of gang violence, or where girls are not playing enough because of a lack of girls' empowerment.
Other possibilities include collecting text-message metadata to learn about patterns in where teens go, and content analysis of text messages.

Open data and transparency can facilitate creating a digital clearinghouse, such as a Design Lab to rapidly prototype innovations in education. Another example is the Find What Works clearinghouse on best practices in education.

Food & Shelter


Track Audience Diversity
Use of big data could enable researchers to tease out more trends and make more nuanced findings about the audience for the arts.
Building off the Theatre Bay Area's Arts Diversity Index, which looked at audiences of arts programming by their age, race, political leanings, etc., a new report might include more granular data or more details. Or it may make it easier for other counties to do the same type of analysis. Other reports on art's economic impact and on local arts engagement could also benefit from richer and bigger datasets.

Organizational Effectiveness and Philanthropy

Hone a Foundation's Giving Approach
KUITY uses sentiment analysis to find which interventions are effective. It also provides the infrastructure for “ongoing, organization-wide, analytical processing and data visualization capabilities including, dashboards, key performance indicators (KPIs), scorecards, adaptive reports, and data mining and analytics.” Questions KUITY asks involve before and after measures of the grantee (and comparison organizations), measuring whether the grantee made its intended impact in the field, and benchmarking against other foundations and investors on program and operations.

Help Grantees Share Information
In 2013 the Packard Foundation gave a $100,000 grant to Zago to develop and launch a system for sharing information within the Fisheries subprogram and externally with partners. The Packard Foundation gave two other grants to Zago in 2012.

Use Big Data to Inform Policy
For the UN Office of the Secretary General, Zago LLC created a platform, Global Pulse, to collect open data from countries around the world in real time.

Use Big Data to Improve Grantees' Communications/Branding
Zago LLC can help grantees with branding and advocacy, as well as use big data for initiatives. For example, it has pushed messages to be heard during UN conventions about climate change and women’s rights. Zago helped Human Rights Watch change its communications strategy and share its report with a wider audience by creating a website with multimedia that can be viewed on computers, phones, and tablets.

Program-Related Investments

Passive data collection could make it easier to measure impact

Using Big Data to See Trends More Easily

Reverse Hypothesis Testing

Collecting data and then using powerful technology to identify trends that otherwise wouldn't have been visible enables reverse hypothesis testing. For example, a nonprofit organization called DoSomething set up a text line to inform teens about ways they could volunteer. When the text data revealed that teens were texting for help with various crises, the organization started a Crisis Text Line in response.

Tools for Spotting Trends

Kimonify is a tool that takes unstructured data and makes it structured. Fusion Tables is an easy way to search for, collect, and mesh multiple datasets.
National Center on Missing Children
Palantir video - tells about bringing multiple data sets together. Useful - addresses multiple types of data that go into "big" data

Improving Impact Evaluation

Matching Pairs: The Nurse Family Partnership used big data to evaluate impact of its interventions. NFP is a nonprofit that provides care through two years of site visits to at-risk pregnant women, new mothers, and infants. The Nurse Family Partnership leveraged big data and statistics by taking on summer fellows from Data Science for Social Good. In order for the Partnership to assess the impact of the visit interventions and to determine what would have happened to their clients but for the intervention, it needed a large-scale study comparing clients to very similar women who did not receive the intervention. The Fellows used the vast consumer databases to identify the closest matches for the clients, accomplishing in a summer what would have taken NFP ten months to do otherwise. Rayid Ghani's Edgeflip supported Nurse Family Partnership's effort.
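
The matching idea can be sketched in a few lines. This is a minimal illustration, not the Fellows' actual method: it assumes each woman is described by a handful of numeric covariates, rescales them so no single one dominates, and greedily pairs each client with the closest not-yet-used woman from the comparison pool. The covariates (`age`, `income`), the records, and the function names are all hypothetical.

```python
import math

def standardize(rows, keys):
    """Rescale each covariate to mean 0 and standard deviation 1."""
    stats = {}
    for k in keys:
        vals = [r[k] for r in rows]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats[k] = (mean, std or 1.0)  # avoid dividing by zero
    return [[(r[k] - stats[k][0]) / stats[k][1] for k in keys] for r in rows]

def match_pairs(clients, pool, keys):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a list of (client_id, comparison_id) pairs.
    """
    z = standardize(clients + pool, keys)
    z_clients, z_pool = z[:len(clients)], z[len(clients):]
    available = set(range(len(pool)))
    pairs = []
    for i, point in enumerate(z_clients):
        if not available:
            break
        # sorted() keeps tie-breaking deterministic
        best = min(sorted(available), key=lambda j: math.dist(point, z_pool[j]))
        available.remove(best)
        pairs.append((clients[i]["id"], pool[best]["id"]))
    return pairs

# Hypothetical records, for illustration only.
clients = [
    {"id": "c1", "age": 19, "income": 12000},
    {"id": "c2", "age": 31, "income": 30000},
]
pool = [
    {"id": "p1", "age": 30, "income": 29000},
    {"id": "p2", "age": 20, "income": 13000},
    {"id": "p3", "age": 45, "income": 80000},
]
pairs = match_pairs(clients, pool, ["age", "income"])
```

A real study would use many more covariates and would check how well the matched groups balance before comparing outcomes. Greedy matching is only one of several possible strategies.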

DataKind reported that the World Bank assembled a team of stats experts to combine data on poverty with visualizations to see if they could use the amount of light in an area as an accurate predictor of poverty - thus providing a shortcut to determining where to implement anti-poverty initiatives.
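
A first check on whether light is a usable proxy is simply to measure how strongly it tracks poverty where both are known. The sketch below computes a Pearson correlation. The numbers are made up for illustration, and the variable names are assumptions; the source does not describe the team's actual method or data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: nighttime light intensity and poverty rate (%) per district.
light = [5, 20, 35, 50, 80]
poverty = [62, 48, 40, 25, 10]

r = pearson(light, poverty)  # strongly negative here: brighter areas, less poverty
```

A strong correlation in areas with survey data is only a starting point. The proxy would still need to be validated on held-out areas before it is used to target programs.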

Using big data as literacy training: Khan Academy learns about its own materials from data on how they are used.

Using Big Data to Guide Policy

Price Data in Kenya
DataKind reported that the World Bank organized data experts to use innovative technology to collect prices in real time to measure inflation to guide Kenya's monetary policy. Accurately measuring inflation was and is crucial for determining the optimal interest rate that would enable people to put food on their tables and to keep the economy running.

Map of Toilets in Slums
A public health student led a team in mapping the toilets in slums in India, where previously there had been no concrete numbers or information. The government had sought to ignore the problem. The team published its findings on an open-access Google Map showing where there were no toilets, or where there were only toilets of poor quality. This enabled policy to be shaped around improving sanitation.

Disaster response
As the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) reported, the Digital Humanitarian Network used social media data to map which areas in the Philippines were hit hardest by Typhoon Pablo/Typhoon Bopha in 2012, in order to help aid organizations focus their efforts.
International Peace Institute: “OCHA Publication Launch: Humanitarianism in the Network Age.” A new report discusses using big data and open data to help during crises (like using social media to find victims after a natural disaster). Experts suggest making sure to work with the community rather than prescribing solutions with big data.

HHI/SSP and Sudanese violence
The Harvard Humanitarian Initiative (HHI) is working with satellite imagery data and on-the-ground volunteers to document violations of Sudanese peace agreements.

Neighborhood mapping of slums for public policy change

Finding missing people – big data SIG alerts

Alliance: “Data for good.” Talks through examples of big data and how it can help enterprises and nonprofits. WASH leverages big data about water access and sanitation to share best practices.


Impact Investment standards

COOP Metrics
CoopMetrics is a company that gathers information from its member companies and uses it to create tools that (among other services) help members benchmark their progress against their peers. The goal is to provide local independent businesses with the same sophisticated data and trend information that Fortune 500 companies use, to support a thriving local sector and help small businesses compete with big corporations.

Using Open Data and Big Data to Connect Beneficiaries and Funders

Feedback and beneficiary voice

Somalia Speaks
A nonprofit texted Somalis, in areas controlled by al Qaeda, asking how they were affected by the famine. The nonprofit mapped their text replies to inform and guide the international response to the crisis.

Haiti Sentiment Analysis
After the 2010 earthquake in Haiti, many aid organizations went to assist Haitians. By conducting sentiment analysis on texts from people in Haiti, it was possible to get a better sense of how the affected population was doing and to score the performance of the aid groups.


Faster sharing of grants data
Reporting Commitment, RSS Feeds, Website interactivity

Quantifying Externalities
The Sustainability Accounting Standards Board™ is a new shared metrics and accountability standards effort to create industry accounting practices that put value on externalities. Typical cost effectiveness analysis considers the monetary costs of inputs and outputs but doesn't take into account non-monetized costs (or benefits). SASB improved economic efficiency by enabling both private and social cost to be reflected in cost-benefit analysis. For example, a company deciding whether to launch a new pharmaceutical project might consider environmental effects of production in addition to the usual considerations.

Info for donors/nonprofits
Place2Give improved efficiency by linking donors to causes best matched to their interests. It is a data intermediary that facilitates customized donations. Donors use the Place2Give search engine to find which charities they should donate to, based on much more data than is usually available. It compiles information from many sources and uses algorithms to parse and group charities. Similarly, compiling data on grants and donations from multiple sources (databases, for example) enabled Ajah to create a database that helps nonprofits identify the potential funding sources among government and corporate donors that will be the best match. A Globe and Mail article discussed Ajah's process and results.

Supply/Demand Tracking
World Bank organized data experts to use innovative technology to collect prices which informed monetary policy and improved efficiency.

Rayid Ghani, the Obama for America ‘12 data science team lead (he’s also an Accenture Technology Labs scientist) started Edgeflip. Example projects: tracking abandoned properties for Cook County, tracking bike share demand and supply, “optimizing public transportation routes and schedules, optimizing garbage collection; working with emergency rooms; predicting crime; and… helping the Nurse Family Partnership measure the effectiveness of its program that provides guidance to at-risk first-time mothers.”

Intersection of Open Data and Philanthropy

Crowdfunded medical nonprofit Watsi is 100% transparent about its financials and posts updates daily to its Google document.
Nominet Trust “Open Data and Charities.” Discusses how charities can use open data.

Funding others to open their data as part of a Theory of Change

Opportunity in Data-Sharing Platform
Rayid Ghani, who started a big data for nonprofits organization, says it’s difficult to share data in different systems and that a large foundation or funder collaborative could invest in building a legitimate data-sharing platform.

Nighttime Satellite Maps
Collective strategy mapping

HRW and Syrian chemical weapons
Human Rights Watch (HRW) used open data from Google Maps, YouTube, SketchUp, Twitter feeds, and blogs to reconstruct the path of chemical weapons missiles and determine who deployed them. HRW relies on 1.5 full-time employees in its Emergencies Division, plus on-the-ground volunteers, to do this kind of data collection and analysis.
why this is useful: Good example of using openly ... visual data; mix of professional and volunteers to analyze; independence of NGOs

Black Male Achievement BMAFunders
Open Society Foundations and Foundation Center project, BMAFunders, is “a go-to source for data and information related to black male achievement;” created “interactive mapping tool with funding data, a timeline of philanthropic milestones in the field, a toolkit for assessing project outcomes, a comprehensive collection of research reports, descriptive case studies of work on the ground, and multi-media content.”

China Foundation Center
China Foundation Center leapfrogs other transparency efforts that have legacy data in analog form with its Foundation Transparency Index

Openness Improving Democracy

Civic Technology

The Knight Foundation published a report on “civic tech,” that is, the use of technology to engage people in local government and community. The report groups examples of organizations using civic tech into two clusters:
Open Government: Socrata and placr (making government data accessible, improving transparency and accountability), AlertID and mySociety (making data usable so people can improve the provision of public services), Localocracy and OurSay (getting people involved in the decisions facing elected leaders), SeeClickFix and PublicStuff (letting residents give feedback about services), Azavea and PublicEngines (letting users map and visualize information), TurboVote and Votizen (voter participation).
Community Action: Citizinvestor (crowdfunding and peer-to-peer lending for projects that benefit the community), BangTheTable (community organizing), Waze and NoiseTube (crowdsourcing information regarding civic issues), Nextdoor and Front Porch Forum (neighborhood forums), Lyft and carshare (peer-to-peer sharing, which helps build a sense of community).
More information on the Knight Foundation’s blog.

Crowdsourcing information regarding civic issues using Waze and NoiseTube also elevates democracy.

Innovate SF: Mayor Lee launched this new clearinghouse for the public to get access to databases and government data on a wide array of issues including crime.

Direct democracy

Localocracy and OurSay get regular people involved in the decisions facing elected leaders (per Knight Foundation). The Knight Foundation also points to BangTheTable as an example of community organizing through technology.

Open Gov Foundation: Darrell Issa's staffers started this website. Its Project Madison lets citizens propose changes, line by line, to currently debated bills. Project Madison won a $200,000 grant from the Knight Foundation.

New forms of news: TwitterAlert; PING

Voter Tools - aggregators of election contributions information: MapLight gathers information about political contributions and sponsors of ballot measures. It developed a voter information tool, VotersEdge, which it complements with crowd-sourced election information.

Daily Kos published an extensive list of democracy technologies highlighted by the Facebook group Upgrade Democracy (see table below)
Circle Voting
Deliberative Democracy (Stanford)
Dynamic Democracy (US)
Open Assembly
Personal Democracy Forum
Participant Labs
Seasteading Institute
Village Votes

Daily Kos also published this list of organizations that could potentially be activated by the Upgrade Democracy community:
1Party4All (UK)
Adhocracy (Germany)
Center for Democracy & Technology
Citizens in Charge
Debate Graph
DemoEx (UK & Sweden)
Digital Democracy (US)
Digital Democracy (UK)
Direct Representation
Dynamic Alignment
Dynamic Democracy (UK)
Ideal Government
International Open Source Party
Journal of eDemocracy & Open Government
Liquid Democracy
Metagovernment Project
Modern Ballots
Moxy Vote
Occupy Assembly
Online Democracy
Open Democracy
Open Source Democracy Foundation
Opinion Space
Participatory Politics Foundation
Party X
Planetary I/O
Porto Alegre (city in Brazil)
Program on Networked Governance (Harvard)
Project Vote Smart
ReWired State (UK)
SmartVote (Switzerland)
Virtual Parliament (UK)
Vote For Policies (UK)
YourFreedom (UK)

Open Data Intermediaries That Are Related to Nonprofit Sector

New middlemen in the world of data sharing make data more democratic, with the drawback that nicely packaged data is less flexible. While searches are easy and user-friendly, information is pared down or missing for ease of use (or with the intention of conveying a specific message). It would be helpful to have harder-to-use raw data that encompass more variables, but the computer software to digest those data can be expensive. And the more barriers there are to using data, the less the data can be used.
One intermediary gathers data from government databases for research.
HealthyCity compiles datasets from a wide variety of sources for policy research and action.
Another aggregates information from police agencies across the country to summarize crimes by type in a specific location, plotting crimes on a map.
One search box allows you to search for statistics on children's well-being from a variety of original data sources like the Query System, Census Bureau, and others.
KidsCount Data Center Annie E. Casey Foundation-funded project that allows a one-box search for a wide range of data on children's well-being.
County Health Rankings This data aggregator pulls data from numerous sources to rank counties by how healthy they are (using a score based on weighted metrics). You can also pull customized reports from the data pool.
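
A weighted-metric ranking of this kind can be sketched as follows. This is an illustration of the general technique only, not the County Health Rankings methodology: each metric is converted to a z-score across counties, the z-scores are combined with weights, and counties are sorted by the composite. The metric names, weights, and county values are hypothetical, and the sketch assumes higher values are worse for every metric.

```python
from statistics import mean, pstdev

def rank_counties(counties, weights):
    """Return county names ordered from healthiest to least healthy.

    The composite score is a weighted sum of per-metric z-scores.
    Lower is better, because every metric here is "higher is worse".
    """
    names = list(counties)
    scores = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        values = [counties[n][metric] for n in names]
        mu, sigma = mean(values), pstdev(values)
        for n in names:
            z = (counties[n][metric] - mu) / sigma if sigma else 0.0
            scores[n] += weight * z
    return sorted(names, key=lambda n: scores[n])

# Hypothetical data, for illustration only.
counties = {
    "County A": {"premature_death_rate": 300, "smoking_rate": 10},
    "County B": {"premature_death_rate": 400, "smoking_rate": 20},
    "County C": {"premature_death_rate": 500, "smoking_rate": 15},
}
weights = {"premature_death_rate": 0.6, "smoking_rate": 0.4}

ranking = rank_counties(counties, weights)
```

The choice of weights drives the result, which is one reason packaged rankings can convey a particular message: changing the weights can reorder the counties without any change in the underlying data.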

Example: Open data on BART employee salaries was gathered by the Mercury News and the Bay Area News Group, and a computer programmer asked the Bay Area d3 User Group to use d3.js to create visualizations of a range of data related to the BART strike. He hosted the visualizations on his website. In addition to publishing infographics, he also shared the raw data so curious users can download and manipulate the data themselves.

Pitfalls of Big Data and Open Data Use

Data Aggregators Package Information, Can Distort Takeaway

One aggregator collects data from public agencies around the country to map crimes in a specific location. The crimes called in are aggregated on the map view, but details show the time and nature of the specific crime. The sex offender data, however, include the offender's name and home address. This information is pulled from each state's website. Although the information is already publicly available, making it so easy to find in one central clearinghouse may raise ethical and privacy questions. On the other hand, it may be providing a valuable service and making locations a lot safer.

A self-described "Open Source Software fanatic" approached the Bay Area d3 User Group to use d3.js to create visualizations of a range of data pertaining to the BART strike. The group's visualization of striking BART employees' salaries is dramatic and has been widely re-posted. The presentation of the salaries, however, unintentionally conveys its own message. It shows a grid of dots, one dot per employee, in order from highest paid at the top to lowest paid at the bottom. The dots shrink, row by row, from top to bottom, so that at first glance the median salary seems much higher than it really is. The group doesn't appear to be particularly anti-union, so it is likely that the representation was not deliberately misleading. The group also shares the raw data so curious users can use and manipulate the data themselves.
Why it matters: this use of open data makes it easier for laymen to comprehend the numbers -- but it can also distort the takeaway, intentionally or not.

Not Packaging Information Can Make Data Harder for Laymen to Use

In an opinion piece in the Washington Post, Robert Samuelson bemoans the government's decision to cut the number of charts and graphs it publishes each year. While the federal government explained it was a cost-saving measure that made sense because people could generate their own graphs with publicly available data, Samuelson writes that sometimes it is hard for a layman to navigate data sources and to manipulate data once he finds it.

Other Resources

Must-see Blogs

iRevolution blog is full of examples of ways government and NGOs have used big data for humanitarian purposes like disaster response.

Thegovlab is a blog that offers tips on getting into big data.


Transparency Camp

May 30-31 Arlington, VA
A conference organized by the Sunlight Foundation, it brings together people interested in open data (hackathons, open gov advocates, etc). Proposed topics include using open data to improve philanthropy and disaster response.

Scaling What Works

The conference description has examples of using big data for good. Now that the conference has ended, the updates page is useful.

After The Leap

Nearly two dozen high-performing organizations will be presenting on use of data December 3, 2013 at the After the Leap Conference (4:30pm).

Primers on Big Data

“The ‘What’ and The ‘How’ of Big Data.”
McKinsey Global Institute: “Big data: The next frontier for innovation, competition, and productivity.” Suggests that the nonprofit sector and foundations start thinking about big data, hiring people with the knowledge and understanding of how to use big data, and analyzing its findings.

Three Strains of Skepticism

1. Technology can’t solve everything: big data is way overhyped

“Is Big Data Overhyped?”
Big data lets us avoid traffic jams - but should we be focused on the cause of traffic rather than the quick fix?

On Being a Data Skeptic - Discusses the limitations of big data, particularly surrounding the way businesses use the results of big data analysis. Among the pitfalls of reliance on big data is the assumption that big data allows for the observation of every data point from every person, whereas in fact selection bias can confound results. Other issues are that reliance on big data and data/metrics in general promotes a narrow-minded approach and that people can miss important non-numerical trends.

Eight (No, Nine!) Problems with Big Data - This op-ed notes that "although big data is very good at detecting correlations, [it] never tells us which correlations are meaningful." Drawing conclusions from big data trends can be dangerous because it may imply causation and it sounds "scientific." Large sample sizes increase statistical power, so even a connection that is not meaningful can show up as statistically significant.
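
The sample-size point can be made concrete with the standard t-statistic for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below is illustrative and not drawn from the op-ed: it shows that the same negligible correlation (r = 0.01) fails a conventional significance test with a hundred observations but passes easily with a million. The 1.96 cutoff is the usual large-sample approximation for a two-sided 5% test; the function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t-statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def looks_significant(r, n, critical=1.96):
    """Compare |t| to an approximate two-sided 5% critical value."""
    return abs(t_statistic(r, n)) > critical

tiny_r = 0.01  # explains about 0.01% of the variance
small_sample = looks_significant(tiny_r, 100)
huge_sample = looks_significant(tiny_r, 1_000_000)
```

With a large enough dataset almost any correlation clears the significance bar, so effect size and a plausible causal story matter more than the p-value.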

Berkman Institute: Peacebuilding in the Information Age: Sifting Hype from Reality.

2. We’ve let the hype of big data get in the way of privacy

“How Companies Learn Your Secrets.”
Widely circulated NY Times article about how data scientists developed ways to use consumer data to find out when a customer was pregnant - even before she herself knew.

From the Editor of The Economist “Big data will transform the world, but issues around privacy and propensity need to be resolved, says Kenneth Cukier.” Kenneth Cukier says the problem most people are aware of is privacy - that the current default of taking people's data and then asking for consent after the fact necessitates a change to privacy laws. Propensity is using people's data to determine whether they are likely to default on a loan, commit a crime, etc. This is the next big concern regarding reliance on big data.

Acxiom: This is one of several data brokers that collect huge amounts of consumer data and sell it to marketers and others. This type of company is not well regulated, and the industry is not centralized. After the FTC criticized these data brokers, one of them created a "transparency tool" allowing consumers to see some of the data collected on them. However, this appears to serve a PR purpose more than to provide any consumer protection. “Data Broker Acxiom Launches Transparency Tool, But Consumers Still Lack Control.” “Find out (some of) what one big data broker knows about you.” More on Acxiom. Consumer data collected and distributed by these brokers is often incorrect -- and that can be harmful when it's used to speculate on a borrower's propensity to default on a loan or commit a crime. FTC Member’s “Reclaim Your Name” Transparency Proposal: “F.T.C. Member Starts ‘Reclaim Your Name’ Campaign for Personal Data.” An FTC member suggested centralizing the data brokers and giving consumers the right to opt out of this data collection and distribution.

3. Big Data can place civil liberties, even lives at risk

“Eight Problems With ‘Big Data’”
Raises issues about how use of big data to predict behavior can punish people before they do anything wrong. For example, big data might determine that a person is predisposed to default on a loan, and therefore the bank does not give him a loan.

“Facebook friends could change your credit score.”
Some non-bank lenders are using large quantities of data to put together a picture of a loan applicant. They may deny a person a loan based on their being friends with people who default on loans. New companies even use metadata -- like the amount of time the applicant takes reading the fine print and whether the applicant writes in sentence case or all caps -- to determine whether to issue a loan.

“Local Cops Following Big Brother's Lead, Getting Cell Phone Location Data Without a Warrant.”
Often police are allowed to take a person's cellphone incident to arrest and they can look through the person's phone. A NY Times article cited in the blog post reported that police departments warned against advertising police officers' use of cellphone data because it was bound to stir controversy.