Online data collection survey results
by Alex Rich and Todd Gureckis
In preparation for the release of psiTurk 2.0, we sent out a short survey to gauge behavioral researchers’ interest in online data collection and the types of tools they’re looking for. We got some great feedback from the community, with 201 people responding. The complete results are posted below.
We heard from a wide range of academic fields. While most respondents (unsurprisingly) hailed from psychology, we also heard from researchers in linguistics, marketing, neuroscience, and economics. Most researchers (85%) had some experience collecting behavioral data online in their labs.
We drew two key insights from this data.
First, there is clear interest in and acceptance of the use of online data in research. This is notable because even a few years ago there was far more skepticism that anonymous people doing your experiment over the Internet would produce valid data. Nearly 40% of respondents said they treat papers reporting data collected online identically to papers reporting lab data. Nearly all respondents selected large sample sizes (93%) and fast data collection (98%) as potential advantages of collecting data online, with most also citing a more diverse population (60%) and cost (75%). However, some researchers felt that online data is unreliable (35%), that the population is unrepresentative (25%), or that the technology required to collect data online is too complex (26%). Half of respondents also stated that the experiment designs they are interested in do not work well online, and researchers outside the US reported difficulty using services like Amazon Mechanical Turk.
Second, significant software challenges remain in helping most researchers collect behavioral data online. The majority of our respondents listed Qualtrics, a service for conducting online surveys, as the tool they would currently be most likely to use. At the same time, 79% of respondents were interested in running full experiments online, including multiple trials, fixation crosses, and so on.
The take-home, then, seems to be that acceptance of online data collection is increasing, but the tools for doing it are still lacking in the ways most relevant to behavioral researchers.
We conducted this survey precisely because our lab is working on open-source tools to help with this. While our approach doesn't solve every problem in the community, it does seem to address many of the concerns these respondents raised. This is an interesting space for future development, and there appears to be a market for tools that simplify online data collection and that serve researchers outside the US who can't leverage the Amazon Mechanical Turk platform. It will be interesting to re-run this survey in a couple of years as progress in this area continues!
Who took the survey?
Data were collected via a variety of psychology-, linguistics-, and neuroscience-oriented bulletin boards and mailing lists, as well as social media outlets (e.g., Twitter and Facebook). Data collection ran from late March 2014 until mid-April; 201 respondents completed the survey.
What is your main area of work?
Have you collected behavioral data online in your lab?
Do you know how to program using web technologies?
If yes, what languages do you know?
If yes, what languages do you PREFER?
| Response | Count | % |
| --- | --- | --- |
| No, but I might ask a grad student or research assistant to learn these languages | 16 | 9% |
What do people think about online data?
What are the major CHALLENGES in using behavioral data collected online in research?
| Response | Count | % |
| --- | --- | --- |
| Data is unreliable | 70 | 35% |
| Experiment designs I'm interested in do not work well online | 100 | 50% |
| Population is unrepresentative | 51 | 25% |
| The technology required is too complex | 53 | 26% |
| I cannot get IRB approval to do online studies | 3 | 1% |
| I am based outside the US and find it difficult to use services like Amazon Mechanical Turk | 46 | 23% |
What are the major BENEFITS in using behavioral data collected online?
| Response | Count | % |
| --- | --- | --- |
| Large sample sizes | 187 | 93% |
| More diverse population | 121 | 60% |
| Fast data collection | 196 | 98% |
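As a sanity check on the tables in this post, each percentage is just the option's count divided by the 201 respondents, rounded to the nearest whole percent (respondents could select multiple options, so columns need not sum to 100%). A minimal sketch of that calculation, using the benefit counts above:

```python
# Percentages are counts out of the 201 survey respondents.
RESPONDENTS = 201

benefits = {
    "Large sample sizes": 187,
    "More diverse population": 121,
    "Fast data collection": 196,
}

for option, count in benefits.items():
    pct = round(100 * count / RESPONDENTS)
    print(f"{option}: {count}/{RESPONDENTS} = {pct}%")
```

Running this reproduces the 93%, 60%, and 98% figures reported above.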
When reviewing a paper with data collected online in my area (e.g., Mechanical Turk):
| Response | Count | % |
| --- | --- | --- |
| I sometimes subjectively feel the researchers were too lazy to run a good study in the lab, but might still accept | 14 | 6% |
| I treat the data the same as data collected in the lab | 92 | 40% |
| I hold the study to a higher standard because I believe online data is more unreliable | 33 | 14% |
| I tend to reject these papers | 0 | 0% |
| I have never reviewed a paper involving online data collection | 71 | 31% |
What are people looking for in an online data collection system?
If you were to use online data in your research, what role would it primarily play?
| Response | Count | % |
| --- | --- | --- |
| Conducting simple surveys | 135 | 67% |
| Conducting full experiments (i.e., multiple trials, fixation crosses, etc.) | 159 | 79% |
| Transcription of audio or video files | 22 | 11% |
| Coding of video files | 17 | 8% |
If you were required to run an online experiment tomorrow what tool would you use?
| Response | Count | % |
| --- | --- | --- |
| I'd create my experiment using Adobe Flash | 8 | 4% |
| I'd hire a programmer | 25 | 12% |
| I have no idea! | 23 | 11% |
Would you be interested in a software tool that helped simplify online data collection?
| Response | Count | % |
| --- | --- | --- |
| No, I'm not interested | 2 | 1% |
| No, I have my own system/method | 11 | 5% |
If yes, select features of such a system you would find appealing/useful or add your own:
| Response | Count | % |
| --- | --- | --- |
| Ability to automatically pay people | 138 | 69% |
| Ability to automatically assign bonuses based on performance | 117 | 58% |
| No need to have a separate server/webserver installed and maintained | 123 | 61% |
| Ability to block participants from doing the same experiment twice | 181 | 90% |
| Ability to record information about the browser (e.g., when people switch windows during a task) | 125 | 60% |
| Ability to see if an online participant has done a similar study to yours in the past | 125 | 60% |
| Automatically fill in conditions randomly and evenly | 140 | 52% |
| Open-source software I can read/edit myself if needed | 121 | 60% |
| Ability to save experiment data incrementally (so partial data survive a browser or network crash) | 106 | 53% |
| Ability to obtain and visualize geographic information about where participants are connecting from | 101 | 50% |
| Availability of example code that I can use to jump-start my own experiment | 128 | 64% |
| Ability to run group experiments (i.e., multiple people interacting) | 99 | 49% |
| Ability to run multi-day experiments where people come back for multiple sessions | 115 | 57% |
| Ability to store data from my experiments in the cloud and have them automatically backed up | 100 | 50% |
| A graphical user interface for managing my online experiments (e.g., paying participants, viewing data) | 121 | 60% |
| Tools to help document payments for reimbursement from a university or business (i.e., accounting issues) | 97 | 48% |
| Ability to easily coordinate running multiple experiments at the same time | 110 | 55% |
If yes, would you prefer this tool to use the command line or a graphical user interface?
| Response | Count | % |
| --- | --- | --- |
| Graphical user interface | 137 | 74% |
How comfortable are you using the command line (e.g., terminal)?
| Response | Count | % |
| --- | --- | --- |
| I'm a pro | 22 | 11% |
| Not very comfortable, but I can get around | 62 | 31% |
| What's a command line? | 30 | 15% |
Online data collection requires a server, which can run on your local computer or in the cloud. Which do you prefer?
| Response | Count | % |
| --- | --- | --- |
| Cloud server (e.g., the way Google Docs or Facebook works) | 118 | 58% |
Does your university or company provide you with an internet-addressable IP address (e.g., a static IP address you can use to connect to your office computer from home)?