## Tip 1: Decide whether deep learning is appropriate for your problem {#appropriate}

In recent years, the number of projects and publications applying deep learning in biology has risen tremendously [@doi:10.1089/omi.2018.0097; @doi:10.1007/978-981-15-3383-9_54; @doi:10.3390/info11040193]. This trend is likely driven by deep learning's usefulness across a range of scientific questions and data modalities, but it can also create the impression that deep learning is a panacea for nearly all modeling problems. Indeed, neural networks are universal function approximators and derive tremendous power from this theoretical capacity to learn any function [@doi:10.1007/BF02551274; @doi:10/dzwxkd]. In reality, however, deep learning is not suited to every modeling situation and can be significantly limited by its large demands for data, computing power, programming skill, and modeling expertise.

While large amounts of high-quality data may be available in areas of biology where data collection is thoroughly automated, such as DNA sequencing, areas of biology that rely on manual data collection may not possess enough data to train and apply deep learning models effectively. Methods that increase the effective amount of training data, such as data augmentation (in which existing data are slightly manipulated to yield "new" samples) [@doi:10.3390/info11020125] and weak supervision (in which simple labeling heuristics are combined to produce noisy, probabilistic labels) [@arxiv:1605.07723v3], are valuable for small- to medium-scale datasets. For example, when classifying microalgae using models trained on 21,000 images, data augmentation improved prediction accuracy by 17% [@doi:10.1109/ICMLA.2017.0-183]. In a text detection task based on a small dataset of 229 fully annotated scene text images, weakly supervised learning improved precision by 11% [@doi:10.1109/ICCV.2017.166]. However, these methods cannot overcome substantial data shortages in many practical scenarios, and recent research investigating machine learning methods in neuroimaging studies of depression suggests that high prediction accuracies obtained from small datasets may reflect misestimation caused by insufficient test dataset sizes [@doi:10.1038/s41386-021-01020-7].
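
To make the idea of data augmentation concrete, the following is a minimal sketch using torchvision's transform API. The specific transforms, parameters, and the input file `cell_image.png` are illustrative assumptions, not the pipelines used in the cited studies.

```python
# Minimal image augmentation sketch (illustrative only; not the cited studies' pipelines).
from torchvision import transforms
from PIL import Image

# Each transform perturbs an existing image to yield a plausible "new" training sample.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # mirror the image half the time
    transforms.RandomRotation(degrees=15),                 # small random rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # mild lighting variation
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # random crop, fixed output size
])

image = Image.open("cell_image.png")                        # hypothetical input image
augmented_views = [augment(image) for _ in range(4)]        # four augmented variants of one sample
```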

In the fields of computer vision and natural language processing, deep neural networks are routinely trained on hundreds of thousands to millions of training examples [@doi:10.1109/ICCV.2017.97; @doi:10.18653/v1/2020.acl-main.747]. Datasets of this size are unavailable in many biological contexts. Still, under certain circumstances, deep learning can be considered for datasets with as few as one hundred samples per class [@arxiv:1511.06348]. Nonetheless, deep learning is generally best suited to datasets that contain orders of magnitude more samples.

Training deep learning models to state-of-the-art performance often requires extensive computing infrastructure and patience [@doi:10.1109/JPROC.2017.2761740]. In some deep learning contexts, such as generating human-like text, state-of-the-art models have over one hundred billion parameters [@arxiv:2005.14165] and require very costly and time-consuming training procedures [@arxiv:1906.02243]. Such large language models are being used in biology to learn representations of protein sequences [@arxiv:2004.03497; @arxiv:2007.06225; @doi:10.1073/pnas.2016239118]. Although most deep learning applications in biology require far less training, they can still demand computational resources beyond those available on consumer-grade devices such as laptops or office desktops. Specialized hardware such as discrete graphics processing units (GPUs) and custom deep learning accelerators can dramatically reduce the time and cost required to train models [@doi:10.3390/info11040193]. Still, this hardware is not universally accessible, and cloud-based rentals introduce additional cost and complexity. Such specialized hardware is likely to become more broadly available as deep learning grows in popularity; recent-generation iPhones, for example, already include dedicated accelerators. In contrast to the large-scale computational demands of deep learning, traditional machine learning models can often be trained on laptops (or even on a $5 computer [@arxiv:1809.00238]) in seconds to minutes. Due to this enormous disparity in resource demands alone, traditional machine learning approaches may be preferable in many biological applications.
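
As a minimal sketch of what using specialized hardware looks like in practice, the snippet below checks for a GPU in PyTorch and falls back to the CPU otherwise; the toy model and tensor sizes are placeholders, not a recommendation.

```python
# Check for a deep learning accelerator before committing to large-scale training.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training will run on: {device}")

# Moving a model and its inputs onto the GPU (when present) is a one-line change,
# but training time and cost still scale with model and dataset size.
model = torch.nn.Linear(128, 2).to(device)     # toy model for illustration
batch = torch.randn(32, 128, device=device)    # toy input batch
output = model(batch)
```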

Beyond requiring more data and computational capacity, building and training deep learning models often requires more expertise than training traditional machine learning models. Popular deep learning frameworks such as TensorFlow [@arxiv:1603.04467] and PyTorch [@arxiv:1912.01703] allow users to create and deploy entirely novel model architectures, but this flexibility, combined with the field's rapid development, has produced large and complex frameworks that can be daunting to new users. For readers new to software development but experienced in biology, gaining computational skills while also interfacing with such complex industrial-grade tools can be a prohibitive challenge. Conversely, traditional machine learning methods are generally more straightforward to apply and are more accessible through popular frameworks such as scikit-learn [@https://www.jmlr.org/papers/v12/pedregosa11a.html]. Furthermore, there are currently more tools that automate model selection and training for traditional machine learning than for deep learning. For example, automated machine learning (AutoML) tools such as TPOT [@doi:10.1007/978-3-319-31204-0_9] and Turi Create [@https://github.com/apple/turicreate] can test and optimize multiple machine learning models automatically, allowing users to achieve competitive performance with only a few lines of code, as sketched below. Efforts are underway to extend these and other automation frameworks to reduce the expertise required to build and use deep learning models. TPOT, Turi Create, and AutoKeras [@arxiv:1806.10282] already abstract away much of the programming required for "standard" deep learning tasks, and high-level interfaces such as Keras [@https://keras.io] and fastai [@doi:10.3390/info11020108] make it increasingly straightforward to design and test custom deep learning architectures. Projects such as these are likely to make deep learning accessible to a much wider range of researchers in the future.
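
The following minimal sketch shows the "few lines of code" workflow with TPOT's scikit-learn-style interface; the digits dataset and search settings are stand-ins chosen for illustration, not recommendations for any particular biological problem.

```python
# AutoML sketch with TPOT (illustrative dataset and search settings).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT searches over preprocessing steps, models, and hyperparameters automatically.
automl = TPOTClassifier(generations=5, population_size=20,
                        random_state=42, verbosity=2)
automl.fit(X_train, y_train)
print("Held-out accuracy:", automl.score(X_test, y_test))

# Export the winning pipeline as plain scikit-learn code for inspection and reuse.
automl.export("best_pipeline.py")
```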

Despite these limitations, deep learning is strongly indicated over traditional machine learning for certain research questions and problems. In general, these are problems that feature hidden patterns across the data, complex relationships, and interrelated variables. Problems in computer vision and natural language processing often exhibit exactly these features, which helps explain why these areas were some of the first to experience significant breakthroughs during the recent deep learning revolution [@doi:10.1145/3065386]. As long as large amounts of accurate and labeled data are available, applications to areas of the life sciences with related data characteristics, such as genetic medicine [@doi:10.3389/fpsyt.2018.00290], radiology [@doi:10.1007/s11604-018-0726-3], microscopy [@doi:10.1364/OPTICA.4.001437], and pharmacovigilance [@doi:10.1093/jamia/ocw180], are similarly likely to benefit from deep learning techniques. For example, Ferreira et al. used deep learning to recognize individual birds from images [@doi:10.1111/2041-210X.13436], a task that had historically been very difficult. By combining automatic data collection using radio-frequency identification tags with data augmentation and transfer learning, the authors achieved 90% accuracy across several species. Another research area where deep learning excels is generative modeling, in which new samples are created based on the training data [@doi:10.1109/ISACV.2018.8354080]; for example, deep learning can generate realistic compendia of gene expression samples [@doi:10.1093/gigascience/giaa117]. Deep learning has also revolutionized reinforcement learning, which is concerned with training agents to interact with an environment [@arXiv:1709.06560] and has been applied to design druglike small molecules [@doi:10.1038/s41598-019-47148-x]. Overall, evaluating whether similar problems (including analogous ones in other domains) have been solved successfully using deep learning can help researchers gauge its potential to address their needs.

On the other hand, depending on the amount and type of data available and the nature of the problem, deep learning may not always outperform conventional methods [@doi:10.1145/3106237.3106256; @doi:10.1186/s12859-020-3427-8]. As an illustration, Rajkomar et al. [@doi:10.1038/s41746-018-0029-1] found that simpler baseline models achieved performance comparable to deep learning in several clinical prediction tasks using electronic health records. Another example is provided by Koutsoukas et al., who benchmarked several traditional machine learning approaches against deep neural networks for modeling bioactivity data on moderately sized datasets [@doi:10.1186/s13321-017-0226-y]. They found that while well-tuned deep learning approaches generally outperform conventional classifiers, simple methods such as naive Bayes classification tend to outperform deep learning as the noise in the dataset increases. Similarly, Chen et al. [@doi:10.1038/s41746-019-0122-0] tested deep learning and a variety of traditional machine learning methods, such as logistic regression and random forests, on five different clinical datasets. They found that the traditional methods matched or exceeded the accuracy of the deep learning model in all cases while requiring an order of magnitude less training time.
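
A minimal sketch of such traditional baselines, using scikit-learn with a synthetic dataset standing in for a real biological one, is shown below; the models, hyperparameters, and dataset dimensions are illustrative assumptions.

```python
# Sketch of traditional baselines to report alongside any deep learning model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset: 500 samples, 50 features.
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

baselines = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Cross-validated accuracy gives a reference that a deep model should beat
# convincingly to justify its extra data, compute, and engineering cost.
for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```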

In conclusion, deep learning should only be used after careful consideration of its strengths and weaknesses for the problem at hand. Even after choosing deep learning as a potential solution, practitioners should still evaluate traditional methods as performance baselines.