Online data collection survey results

In preparation for the release of psiTurk 2.0, we sent out a short survey to gauge behavioral researchers’ interest in online data collection and the types of tools they’re looking for. We got some great feedback from the community, with 201 people responding. The complete results are posted below.

We heard from a wide range of academic fields. While most respondents (unsurprisingly) hailed from psychology, we also heard from researchers in linguistics, marketing, neuroscience, and economics. Most researchers (85%) had some experience collecting behavioral data online in their labs.

Two key insights emerged from these data.

First, there is clear interest in and acceptance of the use of online data in research. This is notable because even a few years ago there was much more skepticism that random people doing your experiment over the Internet would give valid data. Nearly 40% said they treat papers reporting data collected online identically to papers reporting data collected in a lab. Nearly all respondents selected large sample sizes (93%) and fast data collection (98%) as potential advantages of collecting data online, with most listing a more diverse population (60%) and cost (75%) as factors as well. However, some researchers felt that online data is unreliable (35%), the population is unrepresentative (25%), and the technology required to collect data online is too complex (26%). Half of respondents also stated that the experiment designs they were interested in do not work well online, and 23% indicated that they are based outside the US and find it difficult to use services like Amazon Mechanical Turk.

Second, significant software challenges appear to remain before most researchers can collect behavioral data online. A majority of our respondents (56%) listed Qualtrics, a service for conducting online surveys, as the tool they were currently most likely to use. At the same time, 79% of respondents were interested in running full experiments online, including multiple trials, fixation crosses, etc.

The vast majority (94%) of those surveyed indicated they were interested in new tools that simplify online data collection. Of the features people hoped such software would include, 90% listed the ability to block repeat participation, and 69% listed the ability to automatically pay people (incidentally, both are features of our lab’s psiTurk package). In addition, 64% said the availability of example code that they could use to jump-start their own experiment design was important (also a feature of psiTurk’s Experiment Exchange). A majority of respondents thought a cloud-based solution with a graphical user interface would be the preferred form for this type of software, and 64% indicated a general lack of experience or knowledge about web-based programming (e.g., Javascript, HTML, etc.). Interestingly, this is currently not how psiTurk works! (Stay tuned though, as we are currently developing greater cloud-compatibility.)

The take-home message seems to be that acceptance of online data collection is increasing, but the tools for doing it are still lacking in the ways most relevant to behavioral researchers.

We conducted this survey precisely because our lab is working on open-source tools to help with this. While our approach doesn’t solve every problem that exists in the community, it does seem to address many of the concerns these respondents have. This is definitely an interesting space for future development, and there seems to be a market for tools that simplify online data collection and that provide services for those outside the US who can’t leverage the Amazon Mechanical Turk platform. It will be interesting to re-run this survey in a couple of years as progress in this area continues!

Who took the survey?

Data were collected via a variety of psychology-, linguistics-, and neuroscience-oriented bulletin boards and mailing lists, as well as social media outlets (e.g., Twitter/Facebook). Data collection ran from late March 2014 until mid-April. 201 respondents completed the survey.

What is your main area of work?

Have you collected behavioral data online in your lab?

Do you know how to program using web technologies?

If yes, what languages do you know?

If yes, what languages do you PREFER?

If no, would you be willing to learn Javascript or Flash if there were helpful guides and example code specifically for experimental tasks?

Yes 102 60%
Maybe 45 26%
No 7 4%
No, but I might ask a grad student or research assistant to learn these languages 16 9%

What do people think about online data?

What are the major CHALLENGES in using behavioral data collected online in research?

Data is unreliable 70 35%
Experiment designs I’m interested in do not work well online 100 50%
Population is unrepresentative 51 25%
The technology required is too complex 53 26%
I cannot get IRB approval to do online studies 3 1%
I am based outside the US and find it difficult to use services like Amazon Mechanical Turk 46 23%
Other 35 17%

What are the major BENEFITS in using behavioral data collected online?

Large sample sizes 187 93%
More diverse population 121 60%
Fast data collection 196 98%
Cost 150 75%
Other 9 4%

When reviewing a paper with data collected online in my area (e.g., Mechanical Turk):

I sometimes subjectively feel the researchers were too lazy to run a good study in the lab, but might still accept 14 6%
I treat the data the same as data collected in the lab 92 40%
I hold the study to a higher standard because I believe online data is more unreliable 33 14%
I tend to reject these papers 0 0%
I have never reviewed a paper involving online data collection 71 31%
Other 18 8%

What are people looking for in an online data collection system?

If you were to use online data in your research, what role would it primarily play?

Norming stimuli 80 40%
Conducting simple surveys 135 67%
Conducting full experiments (i.e., multiple trials, fixation crosses, etc…) 159 79%
Transcription of audio or video files 22 11%
Group experiments 43 21%
Multi-day experiments 34 17%
Coding of video files 17 8%
Other 4 2%

If you were required to run an online experiment tomorrow what tool would you use?

Qualtrics 113 56%
Google Forms 24 12%
I’d create my experiment using Javascript/HTML5 45 22%
I’d create my experiment using Adobe Flash 8 4%
I’d hire a programmer 25 12%
I have no idea! 23 11%
Other 34 17%

Would you be interested in a software tool that helped simplify online data collection?

Yes 189 94%
No, I’m not interested 2 1%
No, I have my own system/method 11 5%

If yes, select features of such a system you would find appealing/useful or add your own:

Ability to automatically pay people 138 69%
Ability to automatically assign bonuses based on performance 117 58%
Ability to design experiment in the browser without programming (e.g., no Javascript or Flash needed)… similar to E-prime or OpenSesame 129 64%
No need to have a separate server/webserver installed and maintained 123 61%
Ability to block participants from doing the same experiment twice 181 90%
Ability to record information about browser (e.g. when people switch windows during task) 125 60%
Ability to see if an online participant has done a similar study to yours in the past 125 60%
Automatically fill in conditions randomly and evenly 140 52%
Open-source software I can read/edit myself if needed 121 60%
Ability to save data from experiment incrementally (in case browser or network crash can get partial data) 106 53%
Ability to obtain and visualize geographic information about where participants are connecting from 101 50%
Availability of example code that I can use to jump-start my own experiment 128 64%
Ability to run group experiments (i.e., multiple people interacting) 99 49%
Ability to run multi-day experiments where people come back for multiple sessions 115 57%
Ability to store data from my experiments in the cloud and have them automatically backed up 100 50%
A graphical user interface for managing my online experiments (e.g., paying participants, viewing data) 121 60%
Tools to help document payments for reimbursement from University or Business (i.e., accounting issues) 97 48%
Ability to easily coordinate the running of multiple experiments at the same time 110 55%
Other 20 10%

If yes, would you prefer this tool to use the command line or a graphical user interface?

Graphical user interface 137 74%
Command line 30 16%
Other 17 9%

How comfortable are you using the command line (e.g., terminal)?

I’m a pro 22 11%
Very comfortable 37 18%
Moderate 51 25%
Not very comfortable, but I can get around. 62 31%
What’s a command line? 30 15%

Online data collection requires a server that can run on your local computer or in the cloud. Which do you prefer?

Cloud server (e.g., the way Google Docs or Facebook works) 118 58%
Local computer 36 18%
Unsure 48 24%

Does your university or company provide you with an internet-addressable IP address (e.g., a static IP address you can use to connect to your office computer from home)?