CrowdForge Replication
Devised experimental methods for a replication study on the benefits and limitations of a crowdsourcing framework, as part of my work in the Computer Science Department at Oberlin College
ROLE
Research Assistant
(Computer Science)
DURATION
Jan 2023 - May 2024
(16 months)
TEAM
Molly Feldman, PhD
Boo Elliot
TOOLS
R, Figma, Amazon Mechanical Turk (MTurk)
Aniket Kittur, Boris Smus, Susheel Khamkar, and Robert E. Kraut. 2011. CrowdForge: crowdsourcing complex work. In Proceedings of the 24th annual ACM symposium on User interface software and technology (UIST '11). Association for Computing Machinery, New York, NY, USA, 43–52. https://doi.org/10.1145/2047196.2047202
Overview
What is crowdsourcing?
Crowdsourcing is the practice of obtaining services, ideas, or content from a large group of people, typically via the Internet. It leverages collective intelligence to solve problems or complete tasks, often through digital platforms like Amazon Mechanical Turk and crowdfunding sites. This approach allows organizations to tap into a diverse pool of resources and creativity beyond traditional employees.
Our Purpose
Our paper replicates a study introducing a fundamental crowdsourcing framework in Human-Computer Interaction. Replication is essential in academia to validate the reliability of research findings by confirming if results hold across different conditions or datasets. This ensures that conclusions are sound and not influenced by chance or bias. Replication also helps identify methodological flaws, leading to more credible and rigorous research across disciplines.
CrowdForge Framework
"CrowdForge: Crowdsourcing Complex Work" by Kittur et al. presents a framework designed to tackle complex, interdependent tasks through crowdsourcing platforms like Amazon Mechanical Turk. By breaking down complicated tasks into smaller subtasks, CrowdForge allows for efficient distribution and coordination among crowd workers. The paper consists of a few case studies to determine the effectiveness of this CrowdForge framework.
One part of the study involves having a crowd write an encyclopedia article. The framework divides this complex task into a pipeline of smaller subtasks, outlined below (a rough code sketch of the flow follows the list):
1. Partition tasks: outline the encyclopedia article
2. Map tasks: gather facts for each outline heading
3. Reduce tasks: compile the facts and merge them into a complete article
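To make the flow concrete, here is a rough sketch in R (the language we used for our analysis code) of how the three stages chain together. Everything in it is illustrative: in the real framework, each step is posted to MTurk and completed by crowd workers rather than computed locally.

# Illustrative sketch of the CrowdForge partition/map/reduce flow.
# Function names and bodies are placeholders, not part of the original system.

# Partition: one worker turns the topic into an outline (a vector of headings)
partition_task <- function(topic) {
  c("History", "Geography", "Economy")  # example outline a worker might return
}

# Map: a worker contributes a fact for a given heading
map_task <- function(heading) {
  paste("A fact about", heading)        # placeholder for a worker-supplied fact
}

# Reduce: a worker merges the collected facts for a heading into prose
reduce_task <- function(heading, facts) {
  paste0(heading, ": ", paste(facts, collapse = " "))
}

# Chaining the three stages together for a single article
headings <- partition_task("Example topic")
facts    <- lapply(headings, function(h) replicate(3, map_task(h)))
sections <- mapply(reduce_task, headings, facts)
cat(sections, sep = "\n")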

Devising Methods
Boo (my lab-mate) and I combed through the Kittur et al. article to fully understand the methodology, but we faced several obstacles:
Challenges:
— The original article sometimes lacked specific details about certain steps in the methodology, making accurate replication difficult.
— We needed to update aspects to reflect today’s standards; for example, the payment for crowd workers was very low in 2011 (~ 5 cents per subtask), requiring revisions for economic and ethical reasons.
— Each subtask in the experiment builds on the previous work of different workers, leading to significant variations in results based on the quantity and quality of contributions. This interdependence complicates replication efforts.
Solutions:
— We used context clues from the article and the source code to address the gaps in information.
— Payments for workers were recalculated based on the current minimum wage and the estimated time per subtask (a simple version of this calculation is sketched after this list).
— In cases where the original article's methodology was unclear, we developed a best practices approach, conducting the experiment according to our understanding of optimal methods.
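For illustration, here is a minimal sketch of that minimum-wage calculation in R, using placeholder figures rather than the exact values from our study.

# Placeholder wage and time estimates, for illustration only
min_wage_per_hour   <- 7.25  # e.g., US federal minimum wage in USD
minutes_per_subtask <- 3     # estimated time a worker spends on one subtask

payment_per_subtask <- min_wage_per_hour / 60 * minutes_per_subtask
round(payment_per_subtask, 2)  # ~0.36 USD, versus ~0.05 USD in the 2011 study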
Next, we focused on drafting the IRB application and consent forms tailored to each specific subtask. We also created detailed instructions for each subtask for workers on MTurk, providing clear examples of the expectations for each step.
Data Analysis
While waiting for the Principal Investigator to resolve issues with MTurk, I developed code in R to prepare for analyzing the text data we would eventually receive. The code organizes the final articles once they are imported into R and extracts key information, such as word count, from the various subtask outputs, allowing us to efficiently manage and analyze the data across the different phases of the experiment.
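As an illustration of that preparation work, here is a minimal R sketch in the same spirit. The folder layout, file format, and helper names are assumptions for the example, not details taken from our actual pipeline.

# Assumes the final articles arrive as plain-text files in an "articles/" folder
library(tools)

# Collect the article files produced by the final (reduce) step
article_files <- list.files("articles", pattern = "\\.txt$", full.names = TRUE)

# Read each article into a single character string
read_article <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = " ")
}
articles <- vapply(article_files, read_article, character(1))

# Simple word count: split on whitespace and count non-empty tokens
word_count <- function(text) {
  tokens <- unlist(strsplit(text, "\\s+"))
  sum(nzchar(tokens))
}

# Organize the results in a data frame for later analysis
article_summary <- data.frame(
  article   = file_path_sans_ext(basename(article_files)),
  words     = vapply(articles, word_count, integer(1)),
  row.names = NULL
)
print(article_summary)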
Drafting
I also wrote the introduction for the final paper, where I discussed the concept of crowdsourcing and the significance of replication studies. To support our points, my lab-mate and I conducted an extensive literature review, gathering relevant sources. Throughout this process, I learned how to format and structure Computer Science papers, which was a new experience for me since I'm more familiar with Psychology papers. The differences in style and structure helped expand my understanding of academic writing across disciplines.
Designing Visuals
To make the task flow easier to understand, we created visual diagrams in Figma as aids for the three experiments performed in the original paper. My experience with Figma from UX projects came in handy here!



Takeaways
My work on this project wrapped up when I graduated from Oberlin, though the study itself is still ongoing. Once it is published, I am excited to be listed as a co-author. Some key takeaways from my experience include:
— Open data and transparency are essential for research replication and should become the norm.
— Science is inherently ambiguous, and replication helps mitigate uncertainties and solidify knowledge over time.
— Ethical considerations, such as fair compensation for MTurk workers, are crucial.