Tap into the wisdom of the crowd on a large scale
Our methodology, based on years of research and unique industry expertise, can help you successfully tap into the wisdom of the crowd on a large scale. If you want to use the knowledge of thousands of people efficiently to get clean, accurate data for your ML needs, follow our tips for each of these essential steps.
Quality control lies at the heart of crowdsourcing.
Use our examples as benchmarks to achieve the described levels of quality on popular research datasets.
Our dataset has balanced age and gender distributions, with the well-known IMDB-WIKI dataset serving as ground truth. We describe how the dataset is built and then compare several baseline methods, demonstrating its suitability for model evaluation.
Domain-specific data is key to the successful transfer of machine learning systems from benchmarks to real life. Crowdsourcing has become one of the standard tools for cheap and time-efficient data collection for simple problems such as image classification, thanks in large part to advances in research on aggregation methods.
In this paper, we demonstrate Crowd-Kit, a general-purpose computational quality control toolkit for crowdsourcing. It provides efficient Python implementations of quality control algorithms, including uncertainty measures and crowd consensus (aggregation) methods.
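The simplest crowd consensus method is majority voting over redundant worker labels. As a minimal illustration of the idea (a pure-Python sketch, not Crowd-Kit's actual API), one might write:

```python
from collections import Counter, defaultdict

def majority_vote(annotations):
    """Aggregate (task, worker, label) triples into one label per task.

    Ties are broken by choosing the lexicographically smallest label,
    a common deterministic convention.
    """
    by_task = defaultdict(list)
    for task, _worker, label in annotations:
        by_task[task].append(label)
    result = {}
    for task, labels in by_task.items():
        counts = Counter(labels)
        # Sort by descending count, then by label for determinism.
        result[task] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]
    return result

# Example: three workers label task t1, two label task t2.
votes = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"),
]
print(majority_vote(votes))  # → {'t1': 'cat', 't2': 'dog'}
```

More sophisticated consensus methods (e.g. Dawid-Skene-style models) additionally estimate per-worker reliability rather than weighting all votes equally.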
This paper reviews a shared task on crowdsourced audio transcription, co-organized with the Crowd Science Workshop at VLDB 2021.
We study the problem of predicting future hourly earnings and task completion time for a crowdsourcing platform user who sees the list of available tasks and wants to select one of them to execute.
In this paper, we address the problem of labeling text images via CAPTCHA, where user identification is generally impossible. We propose a new algorithm to aggregate multiple guesses collected through CAPTCHA.
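A simple baseline for aggregating such free-text guesses (a hedged sketch, not the algorithm proposed in the paper) is position-wise majority voting over the guesses of the most common length:

```python
from collections import Counter

def aggregate_guesses(guesses):
    """Position-wise majority vote over string guesses.

    Keeps only guesses of the most common length, then picks the most
    frequent character at each position (ties broken alphabetically).
    """
    length = Counter(len(g) for g in guesses).most_common(1)[0][0]
    kept = [g for g in guesses if len(g) == length]
    chars = []
    for i in range(length):
        counts = Counter(g[i] for g in kept)
        chars.append(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0][0])
    return "".join(chars)

print(aggregate_guesses(["h3llo", "hell0", "h3ll0", "he1lo"]))  # → "h3ll0"
```

This baseline ignores insertions and deletions; handling guesses of differing lengths properly requires aligning them first (e.g. with edit-distance-based multiple sequence alignment), which is where dedicated aggregation algorithms improve on it.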