Check out our Editors-in-Chief’s selection of papers from the July issue of PLOS Computational Biology.
Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning
Food security is a growing global concern. Farmers, plant breeders, and geneticists are hastening to address the challenges presented to agriculture by climate change, dwindling arable land, and population growth. Scientists in the field of plant phenomics are using satellite and drone images to understand how crops respond to a changing environment and to combine genetics and environmental measures to maximize crop growth efficiency. However, the terabytes of image data require new computational methods to extract useful information. Machine learning algorithms are effective at recognizing selected parts of images, but they must be trained on high-quality data curated by people, a process that can be laborious and costly. Naihui Zhou and colleagues examined how well crowdsourcing works in providing training data for plant phenomics, specifically for segmenting a corn tassel (the male flower of the corn plant) from the often-cluttered images of a cornfield. They provided images to students and to Amazon MTurkers, the latter being an on-demand workforce brokered by Amazon.com and paid on a task-by-task basis. They report on best practices in crowdsourcing image labeling for phenomics, and compare the different groups on measures such as fatigue and accuracy over time. They find that crowdsourcing is a good way of generating quality labeled data, rivaling that of experts.
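As a rough illustration of how crowd labels might be scored against an expert's, the sketch below computes intersection over union (IoU) between binary masks and builds a pixel-wise majority vote across workers. The summary above does not say which accuracy measure the authors used, so IoU, majority voting, the function names, and the toy masks are all assumptions made for this example.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def majority_vote(masks):
    """Pixel-wise majority vote: a pixel is 1 if more than half the workers marked it."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Toy 3x4 masks (hypothetical data): 1 marks a "tassel" pixel.
expert = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
worker_1 = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  # misses one pixel
worker_2 = [[0, 1, 1, 1], [0, 1, 1, 0], [0, 0, 0, 0]]  # one extra pixel
worker_3 = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0]]  # one extra pixel

consensus = majority_vote([worker_1, worker_2, worker_3])
print(iou(worker_1, expert))   # 0.75
print(iou(consensus, expert))  # 1.0
```

In this toy case each worker makes a different single-pixel error, so the consensus mask matches the expert even though no individual worker does. That is one intuition for why aggregated crowd labels can approach expert quality.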
An enormous potential for niche construction through bacterial cross-feeding in a homogeneous environment
Biodiversity can emerge in a completely homogeneous environment from populations of initially genetically identical individuals. This striking observation comes from experimental evolution of bacteria, which create new ecological niches when they excrete nutrient-rich waste products that can sustain the life of other bacteria. It is difficult to estimate experimentally the potential of any one organism for such metabolic niche construction, because it is challenging to screen for novel metabolic abilities on a large scale. Magdalena San Roman and Andreas Wagner therefore used experimentally validated models of bacterial metabolism to predict how many novel niches organisms like Escherichia coli can construct, where a novel niche must be able to sustain a stable community of microbes that differ in the nutrients they consume. They identify thousands of such niches, which differ in their primary carbon source and in a secondary carbon source that is excreted by some microbes and used by others. Because the authors restricted themselves to chemically simple environments, they may even have underestimated the enormous potential of microbes for niche construction.
The evolutionary dynamics of metabolic protocells
The protocell hypothesis conjectures that a vesicle containing catalytic and replicating sequences was the primordial cellular organization during the early stages of the evolution of life. Mathematical models of protocells traditionally consider encapsulated RNA sequences that play both an informational and a catalytic role in the same molecule. Because of this dual function, the protocell sequences are evolutionarily constrained. Such models have been used extensively to study the evolutionary dynamics of protocells, with a focus on processes such as mutation or stochastic sequence assortment upon division that affect the protocell's information capacity, measured as the coexistence of different sequence types. Here Ximo Pechuan and colleagues introduce a simple model of metabolic networks whose output determines the survival of the protocell, with the aim of studying how modifying the kinetic and architectural properties of the network affects sequence coexistence. They find that stochastic assortment and mutation limit which architectures a protocell can encapsulate while a given fraction of the population still harbours all possible sequence types.
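To give a feel for why stochastic assortment upon division threatens sequence coexistence, here is a minimal sketch. It assumes each molecule goes independently to one of two daughters with probability 1/2, with no replication, mutation, or metabolism. This is a deliberately simplified stand-in, not the authors' model, and the function names are invented for the example.

```python
import random


def split(counts, rng):
    """Send each molecule to daughter A with probability 1/2; the rest go to B.

    counts[i] is the number of copies of sequence type i in the parent.
    """
    a = [sum(rng.random() < 0.5 for _ in range(n)) for n in counts]
    b = [n - x for n, x in zip(counts, a)]
    return a, b


def p_keeps_all_types(counts):
    """Exact probability that one given daughter gets at least one copy of every type."""
    p = 1.0
    for n in counts:
        p *= 1.0 - 0.5 ** n  # 0.5**n is the chance the daughter gets no copy of this type
    return p


def simulate(counts, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    complete = sum(
        all(x > 0 for x in split(counts, rng)[0]) for _ in range(trials)
    )
    return complete / trials


# Three sequence types with two copies each (hypothetical numbers).
print(p_keeps_all_types([2, 2, 2]))  # 0.421875
print(simulate([2, 2, 2]))           # Monte Carlo estimate of the same quantity
```

Under these assumptions the chance of a daughter keeping every type falls geometrically with the number of types, which suggests why networks that need many distinct sequences are harder to maintain.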
miRAW: A deep learning-based approach to predict microRNA targets by analyzing whole microRNA transcripts
MicroRNAs are small RNA molecules that regulate biological processes by binding to the 3' UTR of a gene, and their dysregulation is associated with several diseases. Computationally predicting their targets remains a challenge because a microRNA only partially matches its target, so a single microRNA can have hundreds of targets. Current tools assume that most of the knowledge defining a microRNA-gene interaction can be captured by analysing the binding produced in the seed region (roughly the first 8 nt of the miRNA). However, recent studies show that the whole microRNA can be important and can give rise to non-canonical targets. Here, Albert Pla and colleagues use a target prediction methodology that relies on deep neural networks to automatically learn the relevant features describing microRNA-gene interactions. This means they make no assumptions about what is important, leaving that task to the deep neural network. A key part of the work is obtaining a suitable dataset, so they collected and curated more than 150,000 experimentally verified microRNA targets and used them to train the network. Using this approach, they are able to gain a better understanding of non-canonical targets and to improve on the accuracy of state-of-the-art prediction tools.
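For contrast with the whole-transcript approach, the seed-only assumption made by conventional tools can be sketched in a few lines: take the seed of the miRNA, reverse-complement it, and look for exact matches in the 3' UTR. The sketch assumes the seed is nucleotides 2 to 8 (one common convention within "roughly the first 8 nt") and allows only perfect Watson-Crick pairing. The function names and the example UTR are made up for illustration, and this is not how miRAW works.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, start=1, end=8):
    """Return 0-based UTR positions that perfectly pair with the miRNA seed.

    start/end are 0-based slice bounds; the default 1:8 selects nucleotides 2-8
    (an assumed seed definition). DNA input is accepted by converting T to U.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement(mirna[start:end])
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # step by 1 so overlapping sites are found
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"          # let-7a, 5'->3'
toy_utr = "AAACUACCUCAGGGCUACCUCUU"       # hypothetical UTR with two seed matches

print(reverse_complement(let7a[1:8]))  # CUACCUC
print(seed_sites(let7a, toy_utr))      # [3, 14]
```

A scan like this cannot see sites that pair imperfectly in the seed but compensate elsewhere along the miRNA, which is exactly the class of non-canonical targets the paragraph above describes.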