
Posts Tagged ‘AEA’

(Reposting, original appears here)

Back in 2014, the humanitarian and development sectors were in the heyday of excitement over innovation and Information and Communication Technologies for Development (ICT4D). The role of ICTs specifically for monitoring, evaluation, research and learning (aka “MERL Tech”) had not been systematized (as far as I know), and it was unclear whether there actually was “a field.” I had the privilege of writing a discussion paper with Michael Bamberger to explore how and why new technologies were being tested and used in the different steps of a traditional planning, monitoring and evaluation cycle. (See graphic 1 below, from our paper).

[Graphic 1: ICTs across the traditional planning, monitoring and evaluation cycle, from the 2014 discussion paper]

The approaches highlighted in 2014 focused on mobile phones, for example: text messages (SMS), mobile data gathering, use of mobiles for photos and recording, and mapping with dedicated handheld global positioning system (GPS) devices or GPS built into mobile phones. Promising technologies included tablets, which were only beginning to be used for M&E; “the cloud,” which enabled easier updating of software and applications; remote sensing and satellite imagery; dashboards; and online software that helped evaluators do their work more easily. Social media was also really taking off in 2014. It was seen as a potential way to monitor discussions among and gather feedback from program participants, and as an underutilized tool for wider dissemination of evaluation results and learning. Real-time data, big data, and feedback loops were emerging as ways to improve program monitoring and enable quicker adaptation.

In our paper, we outlined five main challenges for the use of ICTs for M&E: selectivity bias; technology- or tool-driven M&E processes; over-reliance on digital and remotely collected data; low institutional capacity and resistance to change; and privacy and protection. We also suggested key areas to consider when integrating ICTs into M&E: quality M&E planning; design validity; the value-add (or not) of ICTs; using the right combination of tools; adapting and testing new processes before roll-out; technology access and inclusion; motivation to use ICTs; privacy and protection; unintended consequences; local capacity; measuring what matters (not just what the tech allows you to measure); and effectively using and sharing M&E information and learning.

We concluded that:

  • The field of ICTs in M&E is emerging, with activity happening at multiple levels and involving a wide range of tools, approaches, and actors. 
  • The field needs more documentation on the utility and impact of ICTs for M&E. 
  • Pressure to show impact may open up space for testing new M&E approaches. 
  • A number of pitfalls need to be avoided when designing an evaluation plan that involves ICTs. 
  • Investment in the development, application and evaluation of new M&E methods could help evaluators and organizations adapt their approaches throughout the entire program cycle, making them more flexible and adjusted to the complex environments in which development initiatives and M&E take place.

Where are we now? MERL Tech in 2019

Much has happened globally over the past five years in the wider field of technology, communications, infrastructure, and society, and these changes have influenced the MERL Tech space. Our 2014 focus on basic mobile phones, SMS, mobile surveys, mapping, and crowdsourcing might now appear quaint, considering that worldwide access to smartphones and the Internet has expanded beyond the expectations of many. We know that access is not evenly distributed, but the fact that more and more people are getting online cannot be disputed. Some MERL practitioners are using advanced artificial intelligence, machine learning, biometrics, and sentiment analysis in their work. And as smartphone and Internet use continue to grow, more data will be produced by people around the world. The way that MERL practitioners access and use data will likely continue to shift, and the composition of MERL teams and their required skillsets will also change.

The excitement over innovation and new technologies seen in 2014 could also be considered naive, however, given some of the negative consequences that have emerged: for example, social media-inspired violence (such as that in Myanmar), election and political interference through the Internet, misinformation and disinformation, and the race to the bottom through the online “gig economy.”

In this changing context, a team of MERL Tech practitioners (both enthusiasts and skeptics) embarked on a second round of research in order to try to provide an updated “State of the Field” for MERL Tech that looks at changes in the space between 2014 and 2019.

Based on MERL Tech conferences and wider conversations in the MERL Tech space, we identified three general waves of technology emergence in MERL:

  • First wave: Tech for Traditional MERL. Use of technology (including mobile phones, satellites, and increasingly sophisticated databases) to do ‘what we’ve always done,’ with a focus on digital data collection and management. For these uses of “MERL Tech” there is a growing evidence base. 
  • Second wave: Big Data. Exploration of big data and data science for MERL purposes. While plenty has been written about big data for other sectors, the literature on the use of big data and data science for MERL is limited and more focused on potential than on actual use. 
  • Third wave: Emerging approaches. Technologies and approaches that generate new sources and forms of data, offer different modalities of data collection, provide new ways to store and organize data, and provide new techniques for data processing and analysis. Their potential has been explored, but there is little evidence on their actual use for MERL. 

We’ll be doing a few sessions at the American Evaluation Association conference this week to share what we’ve been finding in our research. Please join us if you’ll be attending the conference!

Session Details:

Thursday, Nov 14, 2.45-3.30pm: Room CC101D

Friday, Nov 15, 3.30-4.15pm: Room CC101D

Saturday, Nov 16, 10.15-11am. Room CC200DE


(Joint post from Linda Raftree, MERL Tech and Megan Colnar, Open Society Foundations)

The American Evaluation Association Conference happens once a year and offers literally hundreds of sessions. It can take a while to sort through all of them, and it’s easy to feel a bit lost in the crowds of people and content.

So, Megan Colnar (Open Society Foundations) and I thought we’d share some of the sessions that caught our eye.

I’m on the look-out for innovative tech applications, responsible and gender-sensitive data collection practices, and virtual or online/social media-focused evaluation techniques and methods. Megan plans to tune into sessions on policy change, complexity-aware techniques, and better MEL practices for funders. 

We both can’t wait to learn about evaluation in the post-truth and fake news era. Full disclosure, our sessions are also featured below.

Hope we see you there!

Wednesday, November 8th

3.15-4.15

4.30-6.00

We also think a lot of the ignite talks during this session in the Thurgood Salon South look interesting, like:

6.15-7.15

7.00-8.30

Tour of a few poster sessions before dinner. Highlights might include:

  • M&E for Journalism (51)
  • Measuring Advocacy (3)
  • Survey measures of corruption (53)
  • Theory of change in practice (186)
  • Using social networks as a decision-making tool (225)


Thursday, Nov 9th

8.00-9.00 – early risers are rewarded with some interesting options

9.15-10.15

10.30-11.15

12.15-1.15

1.15-2.00

2.15-3.00

3.15-4.15

4.30-5.15


Friday, Nov 10th

8.00-9.30 – early risers rewarded again!

11.00-11.45

1.45-3.15

3.30-4.15

4.30-5.15

5.30-6.15 – if you can hold out for one more on a Friday evening

6.30-7.15


Saturday, Nov 11th – you’re on your own! Let us know what treasures you discover.


At the 2016 American Evaluation Association conference, I chaired a session on benefits and challenges with ICTs in Equity-Focused Evaluation. The session frame came from a 2016 paper on the same topic. Panelists Kecia Bertermann from Girl Effect and Herschel Sanders from RTI added fascinating insights on the methodological challenges to consider when using ICTs for evaluation purposes, and discussant Michael Bamberger closed out with critical points based on his 50+ years doing evaluations.

ICTs include a host of technology-based tools, applications, services, and platforms that are overtaking the world. We can think of them in three key areas: technological devices, social media/internet platforms and digital data.

An equity-focused evaluation implies ensuring space for the voices of excluded groups and avoiding the traditional top-down approach. It requires:

  • Identifying vulnerable groups
  • Opening up space for them to make their voices heard through channels that are culturally responsive, accessible and safe
  • Ensuring their views are communicated to decision makers

It is believed that ICTs, especially mobile phones, can help with inclusion in the implementation of development and humanitarian programming. Mobile phones are also held up as devices that can allow evaluators to reach isolated or marginalized groups and individuals who are not usually engaged in research and evaluation. Often, however, mobiles only help overcome geographic exclusion. Evaluators need to think harder when it comes to other types of exclusion – such as that related to disability, gender, age, political status or views, ethnicity, literacy, or economic status – and we need to consider how these various types of exclusion can combine to exacerbate marginalization (e.g., “intersectionality”).

We are seeing increasing use of ICTs in the evaluation of programs aimed at improving equity. Yet these tools also create new challenges. The way we design evaluations and how we apply ICT tools can make all the difference between including new voices and feedback loops, and reinforcing existing exclusions or even creating new gaps and exclusions.

Some of the concerns with the use of ICTs in equity-focused evaluation include:

Methodological aspects:

  • Are we falling victim to ‘elite capture’ — only hearing from more highly educated, comparatively wealthy men, for example? How does that bias our information? How can we offset that bias or triangulate with other data and multiple methods rather than depending on only one tool-based method?
  • Are we relying too heavily on things that we can count or multiple-choice responses because that’s what most of these new ICT tools allow?
  • Are we spending all of our time on a device rather than in communities engaging with people and seeking to understand what’s happening there in person?
  • Is reliance on mobile devices or self-reporting through mobile surveys causing us to miss contextual clues that might help us better interpret the data?
  • Are we falling into the trap of fallacy in numbers – in other words, imagining that because lots of people are saying something, that it’s true for everyone, everywhere?

Organizational aspects:

  • Do digital tools require a costly, up-front investment that some organizations are not able to make?
  • How do fear and resistance to using digital tools impact on data gathering?
  • What kinds of organizational change processes are needed amongst staff or community members to address this?
  • What new skills and capacities are needed?

Ethical aspects:

  • How are researchers and evaluators managing informed consent, considering the new challenges to privacy that come with digital data? (Also see: Rethinking Consent in the Digital Age)
  • Are evaluators and non-profit organizations equipped to keep data safe?
  • Is it possible to anonymize data in the era of big data given the capacity to cross data sets and re-identify people?
  • What new risks might we be creating for community members? To local enumerators? To ourselves as evaluators? (See: Developing and Operationalizing Responsible Data Policies)

Evaluation of Girl Effect’s online platform for girls

Kecia walked us through how Girl Effect has designed an evaluation of an online platform and applications for girls. She spoke of how the online platform itself brings constraints because it only works on feature phones and smartphones. For this reason, it was decided to work with 14-16 year old urban girls in megacities who have access to these types of devices yet still experience multiple vulnerabilities, such as gender-based and sexual violence, early pregnancy, low levels of school completion, poor health services and lack of reliable health information, and/or low self-esteem and self-confidence.

The big questions for this program include:

  • Is the content reaching the girls that Girl Effect set out to reach?
  • Is the content on the platform contributing to change?

Because the girl users are on the platform, Girl Effect can use features such as polls and surveys for self-reported change. However, because the girls are under 18, there are privacy and security concerns that sometimes limit the extent to which the organization feels comfortable tracking user behavior. In addition, the type of phones that the girls are using and the fact that they may be borrowing others’ phones to access the site adds another level of challenges. This means that Girl Effect must think very carefully about the kind of data that can be gleaned from the site itself, and how valid it is.

The organization is using a knowledge, attitudes and practices (KAP) framework and exploring ways that KAP can be measured through some of the exciting data capture options that come with an online platform. However, it’s hard to know whether offline behavior is actually shifting, making it important to also gather information that helps interpret the self-reported behavior data.

Girl Effect is complementing traditional KAP indicators with web analytics (unique users, repeat visitors, dwell times, bounce rates, ways that users arrive at the site), push surveys that go out to users, and polls that appear after an article (“Was this information helpful? Was it new to you? Did it change your perceptions? Are you planning to do something different based on this information?”). Proxy indicators are also being developed to help interpret the data. For example, does an increase in a particular user’s frequency of commenting on the site correlate with greater self-esteem or self-efficacy?
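For illustration, here is a minimal sketch of how a proxy indicator such as commenting frequency could be checked against poll-reported self-esteem. The file names, column names, and scoring are invented assumptions for the example and do not reflect Girl Effect’s actual data model or analysis pipeline.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical exports; file and column names are illustrative assumptions.
events = pd.read_csv("platform_events.csv")   # columns: user_id, event_type, timestamp
polls = pd.read_csv("poll_responses.csv")     # columns: user_id, self_esteem_score (1-5)

# Proxy indicator: how often each user comments on articles.
comments_per_user = (
    events[events["event_type"] == "comment"]
    .groupby("user_id")
    .size()
    .reset_index(name="comment_count")
)

# Join the proxy with the self-reported measure and check the association.
merged = polls.merge(comments_per_user, on="user_id", how="left").fillna({"comment_count": 0})
rho, p_value = spearmanr(merged["comment_count"], merged["self_esteem_score"])
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A weak or unstable correlation would be a warning against leaning too hard on the proxy.
```

Even a strong correlation here would only justify treating the proxy as a clue to follow up with the qualitative methods described below, not as evidence of change in itself.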

However, there is only so much that can be gleaned from an online platform when it comes to behavior change, so the organization is complementing the online information with traditional, in-person, qualitative data gathering. The site is helpful there, however, for recruiting users for focus groups and in-depth interviews. Girl Effect wants to explore KAP and online platforms, yet also wants to be careful about making assumptions and using proxy indicators, so the traditional methods are incorporated into the evaluation as a way of triangulating the data. The evaluation approach is a careful balance of security considerations, attention to proxy indicators, digital data and traditional offline methods.

Using SMS surveys for evaluation: Who do they reach?

Herschel took us through a study conducted by RTI (Sanders, Lau, Lombaard, Baker, Eyerman, Thalji) in partnership with TNS about the use of SMS surveys for evaluation. She noted that the rapid growth of mobile phones, particularly in African countries, opens up new possibilities for data collection, and that there has been an explosion in the use of SMS for national, population-based surveys.

Like most ICT-enabled MERL methods, use of SMS for general population surveys brings both promise:

  • High mobile penetration in many African countries means we can theoretically reach a large segment of the population.
  • These surveys are much faster and less expensive than traditional face-to-face surveys.
  • SMS surveys work on virtually any GSM phone.
  • SMS offers the promise of reach. We can reach a large and geographically dispersed population, including some areas that are excluded from face-to-face surveys because of security concerns.

And challenges:

  • Coverage: We cannot include illiterate people or those without access to a mobile phone. Also, some sample frames may not include the entire population with mobile phones.
  • Non-response: Response rates are expected to be low for a variety of reasons, including limited network connectivity or electricity; if two or more people share a phone, we may not reach everyone associated with that phone; and people may lack confidence with technology. These factors might affect certain sub-groups differently, so we might underrepresent the poor, rural areas, or women.
  • Quality of measurement: We have only 160 characters for both the question and the response options. Further, an interviewer is not present to clarify any questions.
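As a small illustration of that constraint, here is a minimal sketch for checking whether a question and its numbered response options fit within a single 160-character GSM message; the question text and options are invented for the example, and it assumes plain GSM-7 characters (some symbols count double).

```python
SINGLE_SMS_LIMIT = 160  # characters in one plain GSM-7 text message

def fits_in_one_sms(question: str, options: list[str]) -> bool:
    # One common convention: numbered options appended after the question.
    rendered = question + " " + " ".join(
        f"{i}){opt}" for i, opt in enumerate(options, start=1)
    )
    over = len(rendered) - SINGLE_SMS_LIMIT
    status = "OK" if over <= 0 else f"{over} characters over the limit"
    print(f"{len(rendered)} characters: {status}")
    return over <= 0

# Invented example question and options.
fits_in_one_sms(
    "How many people in your household own a mobile phone? Reply with one number.",
    ["None", "One", "Two", "Three or more"],
)
```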

RTI’s research aimed to answer the question: How representative are general population SMS surveys and are there ways to improve representativeness?

Three core questions were explored via SMS invitations sent in Kenya, Ghana, Nigeria and Uganda:

  • Does the sample frame match the target population?
  • Does non-response have an impact on representativeness?
  • Can we improve quality of data by optimizing SMS designs?

One striking finding was the extent to which response rates may vary by country, Herschel said. In some cases this was affected by the agreements in place in each country, some of which required a stronger opt-in process. In Kenya and Uganda, where a higher percentage of users had already gone through an opt-in process and had already participated in SMS-based surveys, there was a higher rate of response.

[Chart: SMS survey response rates by country]

These response rates, especially in Ghana and Nigeria, are noticeably low, and their impact is evident in the data. In Nigeria, where researchers compared the SMS survey results against the face-to-face data, there was a clear skew away from older females and towards respondents with higher levels of education and full-time employment.

Additionally, 14% of the face-to-face sample, filtered on mobile users, had a post-secondary education, whereas in the SMS data this figure is 60%.

Compared to face-to-face data, SMS respondents were also:

  • More likely to have more than 1 SIM card
  • Less likely to share a SIM card
  • More likely to be aware of and use the Internet.

This sketches a portrait of a more technologically savvy respondent in the SMS surveys, said Herschel.

[Chart: characteristics of SMS survey respondents compared to face-to-face respondents]
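As a rough illustration of how an evaluator might check whether such a gap is larger than chance, here is a minimal sketch of a two-proportion test using the post-secondary-education figures quoted above; the sample sizes are placeholders invented for the example, since the study’s actual Ns are not given in this post.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Placeholder sample sizes (invented); the proportions reproduce the 60% (SMS)
# vs 14% (face-to-face, filtered on mobile users) education figures above.
n_sms, n_ftf = 1000, 1000
count = np.array([int(0.60 * n_sms), int(0.14 * n_ftf)])  # respondents with post-secondary education
nobs = np.array([n_sms, n_ftf])

z_stat, p_value = proportions_ztest(count, nobs)
print(f"z = {z_stat:.1f}, p = {p_value:.3g}")
# A tiny p-value simply confirms what the raw gap already shows: the SMS sample
# skews heavily toward more-educated, more connected respondents.
```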

The team also explored incentives and found that a higher incentive had no meaningful impact, but adding reminders to the design of the SMS survey process helped achieve a wider slice of the sample and a more diverse profile.

Response order effects were explored, along with issues related to questionnaire designers trying to pack as much as possible onto the screen rather than asking yes/no questions. Herschel highlighted that when multiple-choice options were given, 76% of SMS survey respondents gave only one response, compared to 12% for the face-to-face data.

Lastly, the research found no meaningful difference in response rate between a survey with 8 questions and one with 16 questions, she said. This may go against the common convention that “the shorter, the better” for an SMS survey. There was no observable difference in break-off rates based on survey length, suggesting that SMS surveys can be longer than initially thought.

Herschel noted that some conclusions can be drawn:

  • SMS excels for rapid response (e.g., Ebola)
  • SMS surveys have substantial non-response errors
  • SMS surveys overrepresent more educated and more technologically savvy respondents

These errors mean SMS cannot replace face-to-face surveys … yet. However, we can optimize SMS survey design now by:

  • Using reminders during data collection
  • Being aware of response order effects and randomizing substantive response options to avoid bias (a small sketch of this randomization follows this list)
  • Avoiding “select all that apply” questions — and remembering that it’s OK to have longer surveys
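To make the response-order point concrete, here is a minimal sketch of per-respondent randomization of substantive options, keeping a ‘Don’t know’ option in a fixed final position; the question, options, and phone numbers are invented for illustration.

```python
import random

def randomize_options(respondent_id: str, options: list[str], fixed_last: str = "Don't know") -> list[str]:
    # Seed the shuffle with the respondent ID so the presented order is
    # reproducible and can be stored for later analysis of order effects.
    rng = random.Random(respondent_id)
    substantive = [o for o in options if o != fixed_last]
    rng.shuffle(substantive)
    return substantive + ([fixed_last] if fixed_last in options else [])

# Invented example: main source of news, with "Don't know" always shown last.
options = ["Radio", "Television", "Mobile phone", "Newspaper", "Don't know"]
for rid in ["+254700000001", "+254700000002"]:
    print(rid, randomize_options(rid, options))
```

Logging the order actually shown to each respondent matters as much as the shuffle itself, since it is what lets the analyst test for residual order effects afterwards.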

However, she also noted that the landscape is rapidly changing and so future research may shed light on changing reactions as familiarity with SMS and greater access grow.

Summarizing the opportunities and challenges with ICTs in Equity-Focused Evaluation

Finally we heard some considerations from Michael, who said that people often get so excited about possibilities for ICT in monitoring, evaluation, research and learning that they neglect to address the challenges. He applauded Girl Effect and RTI for their careful thinking about the strengths and weaknesses in the methods they are using. “It’s very unusual to see the type of rigor shown in these two examples,” he said.

Michael commented that a clear message from both presenters and from other literature and experiences is the need for mixed methods. Some things can be done on a phone, but not all things. “When the data collection is remote, you can’t observe the context. For example, if it’s a teenage girl answering the voice or SMS survey, is the mother-in-law sitting there listening or watching? What are the contextual clues you are missing out on? In a face-to-face context an evaluator can see if someone is telling the girl how to respond.”

Additionally, “no survey framework will cover everyone,” he said. “There may be children who are not registered on the school attendance list that is being used to identify survey respondents. What about immigrants who are hiding from sight out of fear and not registered by the government?” He cautioned evaluators not to forget about people in the community who are missed out and skipped over entirely, and noted that the use of new technology could make that problem even greater.

Another point Michael raised is that communicating through technology channels creates a different behavior dynamic. One is not better than the other, but evaluators need to be aware that they are different. “Everyone with teenagers knows that the kind of things we communicate online are very different than what we communicate in a face-to-face situation,” he said. “There is a style of how we communicate. You might be more frank and honest on an online platform. Or you may see other differences in just your own behavior dynamics on how you communicate via different kinds of tools,” he said.

He noted that a range of issues has been raised in connection with ICTs in evaluation, but that it’s been rare to see priority given to evaluation rigor. The study Herschel presented was one example of a focus on rigor and issues of bias, but people often get so excited that they forget to think about this. “Who has access? Are people sharing phones? What are the gender dynamics? Is a husband restricting what a woman is doing on the phone? There’s a range of selection bias issues that are ignored,” he said.

Quantitative bias and mono-methods are another issue in ICT-focused evaluation. The choice of tool determines what an evaluator can ask, and that in turn affects the quality of responses. This leads to issues with construct validity. If you are trying to measure complex ideas like girls’ empowerment and you reduce this to a proxy, there can often be a large jump in interpretation. This doesn’t happen only when using mobile phones for evaluation data collection, but certain issues may be exacerbated when the phone is the tool. So evaluators need to better understand behavior dynamics and how they relate to the technical constraints of a particular digital or mobile platform.

The aspect of information dissemination is another one worth raising, said Michael. “What are the dynamics? When we incorporate new tools, we tend to assume there is just one step between the information sharer and receiver, yet there is plenty of literature that shows this is normally at least two steps. Often people don’t get information directly, but rather they share and talk with someone else who helps them verify and interpret the information they get on a mobile phone. There are gatekeepers who control or interpret, and evaluators need to better understand those dynamics. Social network analysis can help with that sometimes – looking at who communicates with whom? Who is part of the main influencer hub? Who is marginalized? This could be exciting to explore more.”

Lastly, Michael reiterated the importance of mixed methods and needing to combine online information and communications with face-to-face methods and to be very aware of invisible groups. “Before you do an SMS survey, you may need to go out to the community to explain that this survey will be coming,” he said. “This might be necessary to encourage people to even receive the survey, to pay attention or to answer it.” The case studies in the paper “The Role of New ICTs in Equity-Focused Evaluation: Opportunities and Challenges” explore some of these aspects in good detail.


Traditional development evaluation has been characterized as ‘backward looking’ rather than forward looking, and as too focused on proving over improving. Some believe applying an ‘agile’ approach in development would be more useful — the assumption being that if you design a program properly and iterate rapidly and constantly based on user feedback and data analytics, you are more likely to achieve your goal or outcome without requiring expensive evaluations. The idea is that big data could eventually allow development agencies to collect enough passive data about program participants that there would no longer be a need to actively survey people or conduct a final evaluation, because there would be obvious patterns that would allow implementers to understand behaviors and improve programs along the way.

The above factors have made some evaluators and data scientists question whether big data and real-time availability of multiple big data sets, along with the technology that enables their collection and analysis, will make evaluation as we know it obsolete. Others have argued that it’s not the end of evaluation, but rather we will see a blending of real-time monitoring, predictive modeling, and impact evaluation, depending on the situation. Big questions remain, however, about the feasibility of big data in some contexts. For example, are big data approaches useful when it comes to people who are not producing very much digital data? How will the biases in big data be addressed to ensure that the poorest, least connected, and/or most marginalized are represented?

The Technology Salon on Big Data and Evaluation, hosted during November’s American Evaluation Association Conference in Chicago, opened these questions up for consideration by a roomful of evaluators and a few data scientists. We discussed the potential role of new kinds and quantities of data. We asked how to incorporate static and dynamic big data sources into development evaluation. We shared ideas on what tools, skills, and partnerships we might require if we aim to incorporate big data into evaluation practice. This rich and well-informed conversation was catalyzed by our lead discussants: Andrew Means, Associate Director of the Center for Data Science & Public Policy at the University of Chicago and Founder of Data Analysts for Social Good and The Impact Lab; Michael Bamberger, Independent Evaluator and co-author of Real World Evaluation; and Veronica Olazabal from The Rockefeller Foundation. The Salon was supported by ITAD via a Rockefeller Foundation grant.

What do we mean by ‘big data’?

The first task was to come up with a general working definition of what was understood by ‘big data.’ Very few of the organizations present at the Salon were actually using ‘big data’ and definitions varied. Some talked about ‘big data sets’ as those that could not be collected or analyzed by a human on a standard computer. Others mentioned that big data could include ‘static’ data sets (like government census data – if digitized — or cellphone record data) and ‘dynamic’ data sets that are being constantly generated in real time (such as streaming data input from sensors or ‘cookies’ and ‘crumbs’ generated through use of the Internet and social media). Others considered big data to be real time, socially-created and socially-driven data that could be harvested without having to purposely collect it or budget for its collection. ‘It’s data that has a life of its own. Data that just exists out there.’ Yet others felt that for something to be ‘big data’ multiple big data sets needed to be involved, for example, genetic molecular data crossed with clinical trial data and other large data sets, regardless of static or dynamic nature. Big data, most agreed, is data that doesn’t easily fit on a laptop and that requires a specialized skill set that most social scientists don’t have. ‘What is big data? It’s hard to define exactly, but I know it when I see it,’ concluded one discussant.

Why is big data a ‘thing’?

As one discussant outlined, recent changes in technology have given rise to big data. Data collection, data storage and analytical power are becoming cheaper and cheaper. ‘We live digitally now and we produce data all the time. A UPS truck has anywhere from 50-75 sensors on it to do everything from optimize routes to indicate how often it visits a mechanic,’ he said. ‘The analytic and computational power in my iPhone is greater than what the space shuttle had.’ In addition, we have ‘seamless data collection’ in the case of Internet-enabled products and services, meaning that a person creates data as they access products or services, and this can then be monetized, which is how companies like Google make their money. ‘There is not someone sitting at Google going — OK, Joe just searched for the nearest pizza place, let me enter that data into the system — Joe is creating the data about his search while he is searching, and this data is a constant stream.’

What does big data mean for development evaluation?

Evaluators are normally tasked with making a judgment about the merit of something, usually for accountability, learning and/or to improve service delivery, and usually looking back at what has already happened. In the wider sense, the learning from evaluation contributes to program theory, needs assessment, and many other parts of the program cycle.

This approach differs in some key ways from big data work, because most of the new analytical methods used by data scientists are good at prediction but not very good at understanding causality, which is what social scientists (and evaluators) are most often interested in. ‘We don’t just look at giant data sets and find random correlations,’ however, explained one discussant. ‘That’s not practical at all. Rather, we start with a hypothesis and make a mental model of how different things might be working together. We create regression models and see which performs better. This helps us to know if we are building the right hypothesis. And then we chisel away at that hypothesis.’
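A minimal sketch of that ‘fit competing models and see which performs better’ step, using synthetic data and invented variable names rather than any method or dataset referenced by the discussant:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a program dataset; variable names are invented.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "training_hours": rng.uniform(0, 40, n),
    "baseline_income": rng.normal(200, 50, n),
    "distance_to_market_km": rng.uniform(0, 30, n),
})
df["endline_income"] = (
    df["baseline_income"] + 2.5 * df["training_hours"]
    - 1.5 * df["distance_to_market_km"] + rng.normal(0, 20, n)
)

# Two hypotheses about what drives the outcome, expressed as two model specifications.
specs = {
    "A (training + baseline)": ["training_hours", "baseline_income"],
    "B (adds market access)": ["training_hours", "baseline_income", "distance_to_market_km"],
}
for name, cols in specs.items():
    scores = cross_val_score(LinearRegression(), df[cols], df["endline_income"],
                             cv=5, scoring="r2")
    print(f"Spec {name}: mean cross-validated R^2 = {scores.mean():.3f}")
# The better-performing specification is the one to keep probing ('chiselling away at').
```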

Some challenges come up when we think about big data for development evaluation because the social sector lacks the resources of the private sector. In addition, data collection in the world of international development is not often seamless because ‘we care about people who do not live in the digital world,’ as one person put it. Populations we work with often do not leave a digital trail. Moreover, we only have complete data about the entire population in some cases (for example, when it comes to education in the US), meaning that development evaluators need to figure out how to deal with bias and sampling.

Satellite imagery can bring in some data that was unavailable in the past, and this is useful for climate and environmental work, but we still do not have a lot of big data for other types of programming, one person said. What’s more, wholly machine-based learning, and the kind of ‘deep learning’ made possible by today’s computational power is currently not very useful for development evaluation.

Evaluators often develop counterfactuals so that they can determine what would have happened without an intervention. They may use randomized controlled trials (RCTs), differentiation models, statistics and economics research approaches to do this. One area where data science may provide some support is in helping to answer questions about counterfactuals.

More access to big data (and open data) could also mean that development and humanitarian organizations stop duplicating data collection functions. Perhaps most interestingly, big data’s predictive capabilities could in the future be used in the planning phase to inform the kinds of programs that agencies run, where they should be run, and who should be let into them to achieve the greatest impact, said one discussant. Computer scientists and social scientists need to break down language barriers and come together more often so they can better learn from one another and determine where their approaches can overlap and be mutually supportive.

Are we all going to be using big data?

Not everyone needs to use big data. Not everyone has the capacity to use it, and it doesn’t exist for offline populations, so we need to be careful that we are not forcing it where it’s not the best approach. As one discussant emphasized, big data is not magic, and it’s not universally applicable. It’s good for some questions and not others, and it should be considered as another tool in the toolbox rather than the only tool. Big data can provide clues to what needs further examination using other methods, and thus most often it should be part of a mixed methods approach. Some participants felt that the discussion about big data was similar to the one 20 years ago on electronic medical records or to the debate in the evaluation community about quantitative versus qualitative methods.

What about groups of people who are digitally invisible?

There are serious limitations when it comes to the data we have access to in the poorest communities, where there are no tablets and fewer cellphones. We also need to be aware of ‘micro-exclusion’ (who within a community or household is left out of the digital revolution?) and intersectionality (how do different factors of exclusion combine to limit certain people’s digital access?) and consider how these affect the generation and interpretation of big data. There is also a question about the intensity of the digital footprint: How much data and at what frequency is it required for big data to be useful?

Some Salon participants felt that over time, everyone would have a digital presence and/or data trail, but others were skeptical. Some data scientists are experimenting with calibrating small amounts of data and comparing them to human-collected data in an attempt to make big data less biased, a discussant explained. Another person said that by digitizing and validating government data on thousands (in the case of India, millions) of villages, big data sets could be created for those that are not using mobiles or data.

Another person pointed out that generating digital data is a process that involves much more than simple access to technology. ‘Joining the digital discussion’ also requires access to networks, local language content, and all kinds of other precursors, she said. We also need to be very aware that these kinds of data collection processes impact on people’s participation and input into data collection and analysis. ‘There’s a difference between a collective evaluation activity where people are sitting around together discussing things and someone sitting in an office far from the community getting sound bites from a large source of data.’

Where is big data most applicable in evaluation?

One discussant laid out areas where big data would likely be the most applicable to development evaluation:

[Slide: areas where big data is likely to be most applicable to development evaluation]

It would appear that big data has huge potential in the evaluation of complex programs, he continued. ‘It’s fairly widely accepted that conventional designs don’t work well with multiple causality, multiple actors, multiple contextual variables, etc. People chug on valiantly, but it’s expected that you may get very misleading results. This is an interesting area because there are almost no evaluation designs for complexity, and big data might be a possibility here.’

In what scenarios might we use big data for development evaluation?

This discussant suggested that big data might be considered useful for evaluation in three areas:

  1. Supporting a conventional evaluation design by adding new, big data-generated variables. For example, one could add transaction data from ATMs to conventional survey-generated poverty indicators.
  2. Increasing the power of a conventional evaluation design by using big data to strengthen the sample selection methodology. For example, satellite images were combined with data collected on the ground, and propensity score matching was used to strengthen comparison group selection for an evaluation of the effects of interventions on protecting forest cover in Mexico (a minimal sketch of this matching step follows the list).
  3. Replacing a conventional design with a big data analytics design, for instance by replacing regression-based models with systems analysis. For example, one could use systems analysis to compare the effectiveness of 30 ongoing interventions that may reduce stunting in a sample of villages. Real-time observations could generate a time series that could help to estimate the effectiveness of each intervention in different contexts.
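For the second scenario, here is a minimal sketch of the propensity-score-matching step on synthetic, village-level data with invented covariate names (standing in for satellite-derived and ground-collected variables); it is not the design of the Mexico forest-cover evaluation itself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Synthetic village-level data; covariate names are invented for illustration.
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "baseline_forest_cover": rng.uniform(0.2, 0.9, n),
    "slope_degrees": rng.uniform(0, 30, n),
    "distance_to_road_km": rng.uniform(0, 50, n),
})
# Synthetic treatment assignment loosely related to the covariates.
logit = 1.5 * df["baseline_forest_cover"] - 0.03 * df["distance_to_road_km"]
df["treated"] = rng.random(n) < 1 / (1 + np.exp(-logit))

covariates = ["baseline_forest_cover", "slope_degrees", "distance_to_road_km"]

# 1. Estimate propensity scores: modeled probability of treatment given covariates.
ps_model = LogisticRegression().fit(df[covariates], df["treated"])
df["pscore"] = ps_model.predict_proba(df[covariates])[:, 1]

# 2. Match each treated village to the untreated village nearest on the score.
treated = df[df["treated"]]
control = df[~df["treated"]]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_controls = control.iloc[idx.ravel()]

print(f"{len(treated)} treated villages matched (with replacement) to "
      f"{matched_controls.index.nunique()} distinct comparison villages.")
```

Balance on the covariates would then be checked between the treated group and the matched comparison group before estimating any effect.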

It is important to remember construct validity too. ‘If big data is available, but it’s not quite answering the question that you want to ask, it might be easy to decide to do something with it, to run some correlations, and to think that maybe something will come out. But we should avoid this temptation,’ he cautioned. ‘We need to remember and respect construct validity and focus on measuring what we think we are measuring and what we want to measure, not get distracted by what a data set might offer us.’

What about bias in data sets?

We also need to be very aware that big data carries with it certain biases that need to be accounted for, commented several participants; notably, when working with low connectivity populations and geographies or when using data from social media sites that cater to a particular segment of the population. One discussant shared an example where Twitter was used to identify patterns in food poisoning, and suddenly the upscale, hipster restaurants in the city seemed to be the problem. Obviously these restaurants were not the sole source of the food poisoning, but rather there was a particular kind of person that tended to use Twitter.

‘People are often unclear about what’s magical and what’s really possible when it comes to big data. We want it to tell us impossible things and it can’t. We really need to engage human minds in this process; it’s not a question of everything being automated. We need to use our capacity for critical thinking and ask: Who’s creating the data? How’s it created? Where’s it coming from? Who might be left out? What could go wrong?’ emphasized one discussant. ‘Some of this information can come from the metadata, but that’s not always enough to make certain big data is a reliable source.’ Bias may also be introduced through the viewpoints and unconscious positions, values and frameworks of the data scientists themselves as they are developing algorithms and looking for/finding patterns in data.

What about the ethical and privacy implications?

Big data carries significant ethical and privacy implications. Issues of consent and potential risk are critical considerations, especially when working with populations that are newly online and/or may not have a good understanding of data privacy and of how their data may be used by third parties who are collecting and/or selling it. However, one participant felt that a protectionist mentality is misguided. ‘We are pushing back and saying that social media and data tracking are bad. Instead, we should realize that having a digital life and being counted in the world is a right and it’s going to be inevitable in the future. We should be working with the people we serve to better understand digital privacy and help them to be more savvy digital citizens.’ It’s also imperative, she said, that aid and development agencies abandon their slow and antiquated data collection systems and use the new digital tools that are available.

How can we be more responsible with the data we gather and use?

Development and humanitarian agencies do need to be more responsible with data policies and practices, however. Big data approaches may contribute to negative data extraction tendencies if we mine data and deliver it to decision-makers far away from the source. It will be critical for evaluators and big data practitioners to find ways to engage people ‘on the ground’ and involve more communities in interpreting and querying their own big data. (For more on responsible data use, see the Responsible Development Data Book. Oxfam also has a responsible data policy that could serve as a reference. The author of this blog is working on a policy and practice guide for protecting girls’ digital safety, security and privacy as well.)

Who should be paying for big data sets to be made available?

One participant asked about costs and who should bear the expense of creating big data sets and/or opening them up to evaluators and/or data scientists. Others asked for examples of the private sector providing data to the social sector. This highlighted additional ethical and privacy issues. One participant gave an example from the healthcare space where there is lots of experience in accessing big data sets generated by government and the private sector. In this case, public and private data sets needed to be combined. There were strict requirements around anonymization and the effort ended up being very expensive, which made it difficult to build a business case for the work.

This can be a problem for the development sector, because it is difficult to generate resources for resolving social problems; there is normally only investment if there is some kind of commercial gain to be had. Some organizations are now hiring ‘data philanthropist’ positions that help to negotiate these kinds of data relationships with the private sector. (Global Pulse has developed a set of big data privacy principles to guide these cases.)

So, is big data going to replace evaluation or not?

In conclusion, big data will not eliminate the need for evaluation. Rather, it’s likely that it will be integrated as another source of information for strengthening conventional evaluation design. ‘Big Data and the underlying methods of data science are opening up new opportunities to answer old questions in new ways, and ask new kinds of questions. But that doesn’t mean that we should turn to big data and its methods for everything,’ said one discussant. ‘We need to get past a blind faith in big data and get more practical about what it is, how to use it, and where it adds value to evaluation processes,’ said another.

Thanks again to all who participated in the discussion! If you’d like to join (or read about) conversations like this one, visit Technology Salon. Salons run under Chatham House Rule, so no attribution has been made in this summary post.
