Deep learning: new computational modelling techniques for genomics


As a data-driven science, genomics largely utilizes machine learning to capture dependencies in data and derive novel biological hypotheses.

Figure 1: A biological neuron and its computational model used in an NN [1].

This paper applies deep CNNs to predict chromatin features and transcription factor binding from DNA sequence and demonstrates its utility in non-coding variant effect prediction.

In this paper, two models, a deep CNN and a linear model, are stacked to predict tissue-specific gene expression from DNA sequence, which demonstrates the utility of this approach in non-coding variant effect prediction.

Typically, each subsequent convolutional layer increases the dilation by a factor of two, thus achieving an exponentially increasing receptive field with each additional layer.
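The doubling of the dilation from layer to layer can be checked with a few lines of arithmetic. The sketch below assumes stride-1 one-dimensional convolutions with a single shared kernel size; the function names are illustrative and not taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in input positions, of stacked stride-1 1D convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation positions.
        field += (kernel_size - 1) * dilation
    return field


def doubling_dilations(num_layers):
    """Dilation schedule 1, 2, 4, ... (assumed: the first layer is undilated)."""
    return [2 ** layer for layer in range(num_layers)]


if __name__ == "__main__":
    for num_layers in (1, 2, 4, 8):
        dilated = receptive_field(3, doubling_dilations(num_layers))
        plain = receptive_field(3, [1] * num_layers)
        print(num_layers, "layers: dilated =", dilated, "undilated =", plain)
```

With a kernel size of 3, the undilated stack grows linearly (2 positions per layer), whereas the doubling schedule roughly doubles the field with every added layer.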
By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing.

Advances in deep learning created an unprecedented momentum in biomedical informatics and have given rise to new bioinformatics and computational biology research areas.

Among these submissions, we observed more papers coming from emerging fields such as deep learning, new genomic technologies, and big medical data science.

We highlight the difference and similarity in widely utilized models in deep learning …

Sigmoid function: a function that maps real numbers to [0,1], defined as 1/(1 + e^(−x)).

https://kundajelab.github.io/dragonn/tutorials.html
Kaggle machine learning competitions: https://www.kaggle.com/sudalairajkumar/winning-solutions-of-kaggle-competitions
Eraslan, G., Avsec, Ž., Gagneur, J. et al. Deep learning: new computational modelling techniques for genomics. Nature Reviews Genetics.

These authors contributed equally: Gökcen Eraslan, Žiga Avsec.

Memory (hidden state): an array that stores the information of the patterns observed in the sequence elements previously processed by a recurrent neural network.

Graph convolutional: referring to neural networks that process graph-structured data; they generalize convolution beyond regular structures, such as DNA sequences and images, to graphs with arbitrary structures.

Transformation of the log-odds with the sigmoid activation function leads to predicted probabilities.
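As a small illustration of how the sigmoid turns log-odds into a predicted probability, the sketch below uses a hypothetical linear model; the weights, bias and feature values are made up for the example.

```python
import math


def sigmoid(x):
    """Map a real number to [0, 1] using 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Algebraically identical form that avoids overflow for very negative x.
    z = math.exp(x)
    return z / (1.0 + z)


def predict_probability(weights, bias, features):
    """Compute log-odds as a weighted sum, then squash them with the sigmoid."""
    log_odds = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(log_odds)


if __name__ == "__main__":
    # Hypothetical weights and features, chosen only for illustration.
    print(predict_probability([1.5, -0.5], 0.2, [1.0, 2.0]))
```

Log-odds of zero correspond to a probability of 0.5; positive log-odds push the prediction towards 1 and negative log-odds towards 0.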
Deep learning: new computational modelling techniques for genomics. Authors: Gökcen Eraslan, Žiga Avsec, Julien Gagneur, Fabian J. Theis.

However, the ability to extract new insights from the exponentially increasing volume of genomics data requires more expressive machine learning models.

Single cell RNA sequencing (scRNAseq), method of the year 2013 (Nature Methods), has now matured and large …

This paper describes the application of a deep CNN to predict chromatin accessibility in 164 cell types from DNA sequence.

Deep generative models are often implemented by a neural network that transforms samples from a standard distribution (such as normal or uniform) into samples from a complex distribution (such as gene expression levels or sequences that encode a splice site).
Gökcen Eraslan, et al. Nat Rev Genet 20, 389–403 (2019).

The intersection of deep learning methods and genomic research may lead to a profound understanding of genomics that will benefit multiple fields including precision medicine (Leung et al., 2016), …

Generative models: models able to generate points from the desired distribution.

Activation function: a function applied to an intermediate value x within a neural network.

k-mers: character sequences of a certain length.
Since their introduction (refs 3, 4), deep learning methods have dominated computational modeling strategies in genomics where they are now routinely used to address a variety of questions …

Machine learning methods are general-purpose approaches to learn functional relationships from data without the need to define them a priori (Hastie et al, 2005; Murphy, 2012; Michalski et al, 2013). In computational biology, their appeal is the ability to derive predictive models …

Layers are lists of artificial neurons that collectively represent a function that takes as input an array of real numbers and returns an array of real numbers corresponding to neuron activations.

Target: the desired output used to train a supervised model.

Gradient-boosted decision trees: supervised learning algorithms that train multiple decision trees in a sequential manner; at each time step, a new decision tree is trained on the residual or pseudo-residual of the previous decision tree.
Recurrent: referring to a neural network layer that processes sequential data.

Neural networks: a wide class of machine learning models with a design that is loosely based on biological neural networks.

An artificial neuron aggregates the inputs from other neurons and emits an output called activation.

Gradients of the loss function with respect to the neural network parameters are used to update those parameters during training.

Channel: an axis other than one of the positional axes.

A new deep-learning method, DeepCpG, helps scientists better understand the epigenome: the biochemical activity around the genome.

In this paper, a deep CNN is trained to call genetic variants from different DNA-sequencing technologies.

This paper describes a pioneering convolutional neural network application in genomics.

In this paper, a deep CNN was trained to predict more than 4,000 genomic measurements including gene expression as measured by cap analysis of gene expression (CAGE) for every 150 bp in the genome using a receptive field of 32 kb.
Correspondence to Julien Gagneur or Fabian J. Theis.

Nature Reviews Genetics thanks C. Greene and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing.

It is evident that deep learning models can provide higher accuracies in specific tasks of genomics …

One or more bottleneck layers have lower dimensionality than the input, which leads to compression of data and forces the autoencoder to extract useful features and omit unimportant features in the reconstruction.

Receptive field: the region of the input that affects the output of a convolutional neuron.
Published in Genome Biology by researchers at EMBL-EBI, the Babraham Institute and the Sanger Institute, DeepCpG leverages ‘deep neural networks’, multi-layered machine-learning models inspired by the brain, to gain new …

More recently, deep learning was adopted to process DNA sequence data, and convolutional neural networks (CNNs) are the most widely used deep learning model …

Filters: parameters of a convolutional layer.

Feature importance scores: values that quantify the contributions of features to a given model prediction.

Activation functions are usually nonlinear yet very simple, such as the rectified-linear unit or the sigmoid function.

Convolutional: referring to a neural network layer that processes data stored in n-dimensional arrays, such as images.
Deep learning, as an emerging branch of machine learning, has exhibited unprecedented performance in quite a few applications from academia and industry.

Inputs and activations of artificial neurons are real numbers.

Rectified-linear unit (ReLU): widely used activation function defined as max(0, x).

Pooling: a function that replaces the output at a certain location with a summary statistic of the nearby outputs.

Dilated convolution: filters that skip some values in the input layers.

Input-masked gradient (gradient × input): feature importance scores defined as the gradient of the model output with respect to the model input multiplied by the input values.
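Three of the definitions above are short enough to write out directly: the activation max(0, x), pooling as a summary statistic of nearby outputs, and importance scores computed as gradient times input. The sketch below is a minimal illustration only. It assumes non-overlapping max pooling and approximates the gradient by central finite differences rather than backpropagation; all function names and the toy model are hypothetical.

```python
def relu(x):
    """Rectified-linear unit: max(0, x)."""
    return max(0.0, x)


def max_pool(values, window):
    """Replace each non-overlapping window by its maximum (trailing remainder dropped)."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def gradient_times_input(model, inputs, eps=1e-6):
    """Importance score per feature: d(model)/d(input_i) * input_i.

    The gradient is approximated numerically here; deep learning frameworks
    would obtain it by automatic differentiation instead.
    """
    scores = []
    for i in range(len(inputs)):
        upper, lower = list(inputs), list(inputs)
        upper[i] += eps
        lower[i] -= eps
        gradient_i = (model(upper) - model(lower)) / (2 * eps)
        scores.append(gradient_i * inputs[i])
    return scores


if __name__ == "__main__":
    # Toy linear model with made-up coefficients.
    toy_model = lambda v: 2.0 * v[0] - 3.0 * v[1] + 0.5 * v[2]
    print(max_pool([relu(x) for x in [-1.0, 3.0, 2.0, -5.0]], 2))
    print(gradient_times_input(toy_model, [1.0, 2.0, 0.0]))
```

Note that a feature whose input value is zero always receives a score of zero under gradient times input, regardless of how sensitive the model is to it.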
http://pytorch.org

This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies.

Decision trees: supervised learning algorithms in which the prediction is made through a series of decisions of the type ‘is feature i larger than x’ (internal nodes of the tree), followed by predicting a constant value for all points satisfying the same series of decisions (leaf nodes).

End-to-end models: machine learning models that embed the entire data-processing pipeline to transform raw input data into predictions without requiring a preprocessing step.

k-means clustering: an unsupervised method for partitioning the observations into clusters by alternating between refining cluster centroids and updating cluster assignments of observations.
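The alternation between updating cluster assignments and refining centroids can be sketched in plain Python. This is a minimal illustration that assumes squared Euclidean distance and caller-supplied initial centroids; it is not a substitute for a library implementation, and the names are illustrative.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid updates until assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Step 1: assign every observation to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # converged: no observation changed cluster
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its assigned observations.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

The result depends on the initial centroids, which is why practical implementations usually restart from several initializations.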
Nature Reviews Genetics volume 20, pages 389–403 (2019). https://doi.org/10.1038/s41576-019-0122-6

PyTorch model zoos: https://pytorch.org/docs/stable/torchvision/models.html

The same neural network is applied to each node and edge in the graph.

The same fully connected layer is applied to multiple local patches of the input array.

Overfitting: the scenario in which the model fits the training set very well but does not generalize well to unseen data.

For example, the so-called L2 regularization adds the sum of the squares of the model parameters to the loss function to penalize large model parameters.
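The L2 penalty can be written out as a short sketch. The mean squared error used as the data-fit term and the `strength` multiplier are assumptions for illustration; the text above only states that the sum of squared parameters is added to the loss.

```python
def mean_squared_error(predictions, targets):
    """Data-fit term (assumed here): average squared prediction error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l2_penalty(parameters, strength):
    """Sum of squared parameters, scaled by a hypothetical strength hyperparameter."""
    return strength * sum(w * w for w in parameters)


def regularized_loss(predictions, targets, parameters, strength):
    """Loss plus L2 penalty: large parameters raise the loss even if the fit is perfect."""
    return mean_squared_error(predictions, targets) + l2_penalty(parameters, strength)


if __name__ == "__main__":
    # A perfect fit still incurs a cost because the parameters are large.
    print(regularized_loss([1.0, 2.0], [1.0, 2.0], [3.0, -4.0], strength=0.1))
```

Setting `strength` to zero recovers the unregularized loss; larger values favour smaller parameters at the expense of training fit.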
Automatic differentiation: a set of techniques, which consist of a sequence of elementary arithmetic operations, used to automatically differentiate a computer program.
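One way to see how differentiation reduces to a sequence of elementary arithmetic operations is forward-mode automatic differentiation with dual numbers. The sketch below is illustrative only and supports just addition and multiplication; the `Dual` class and `derivative` helper are hypothetical names, and deep learning frameworks typically rely on the reverse mode (backpropagation) instead.

```python
class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, value, derivative=0.0):
        self.value = value
        self.derivative = derivative

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value,
                    self.derivative + other.derivative)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.derivative * other.value + self.value * other.derivative)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, whose derivative is zero."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(function, x):
    """Evaluate d(function)/dx at x by seeding the input derivative with 1."""
    return function(Dual(float(x), 1.0)).derivative


if __name__ == "__main__":
    polynomial = lambda x: 3 * x * x + 2 * x + 1
    print(derivative(polynomial, 2.0))
```

Every arithmetic operation in the program carries its derivative along, so no symbolic manipulation or finite-difference approximation is needed.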