Rayan Chikhi
rayan.chikhi@univ-lille1.fr
@RayanChikhi

I am a CNRS researcher in bioinformatics at University of Lille 1, France. Broadly speaking, my work consists of analyzing genomes using computers. We can read the DNA of humans, plants, and animals using sequencing instruments. This has transformed biology in the last decade, e.g. by making it possible to identify mutations in genes, including those linked to diseases, to study evolution, and much more. We would like to have a complete and precise understanding of genomes, but this isn't easy: the instruments produce a lot of data, so people like me develop methods to do the analysis. Recently, I contributed to the assembly of the giraffe genome and the gorilla Y chromosome.

Short bio
I studied Computer Science at ENS Rennes and obtained a PhD under the supervision of D. Lavenier. After a postdoc at Penn State in P. Medvedev's lab, CNRS hired me as a junior researcher in 2014. I am currently part of the Bonsai bioinformatics team.

Research topics
Genome analysis
Algorithms and data structures
De novo assembly

Software

Minia assembler
Whole genome de novo assembler with very low memory usage, described in [11].

Kmergenie
Automatic detection of the k-mer size for de novo assembly, described in [14].

DSK
K-mer counting software, low-memory, low disk usage, supports large values of k, described in [13].

BCALM 2
Very scalable de Bruijn graph compaction, described in [24].

GATB Library
C++ library for the development of reference-free Illumina data analysis software, described in [17].

Publications

[25] The Computational Pan-Genomics Consortium, Computational pan-genomics: status, promises and challenges, Briefings in Bioinformatics (2016) [PDF]
[24] R. Chikhi, A. Limasset, P. Medvedev, Compacting de Bruijn graphs from sequencing data quickly and in low memory, ISMB (2016) [PDF]
[23] M. Agaba et al., Giraffe genome sequence reveals clues to its unique morphology and physiology, Nature Communications (2016) [PDF]
[22] M. Tomaszkiewicz et al., A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y, Genome Research (2016) [PDF]
[21] K. Sahlin, R. Chikhi, L. Arvestad, Genome scaffolding with PE-contaminated mate-pair libraries, WABI (2015) [Open-access]
[20] R. Chikhi, P. Medvedev, M. Milanic, S. Raskhodnikova, On the readability of overlap digraphs, CPM (2015) [Open-access]
[19] R. Uricaru et al., Reference-free detection of isolated SNPs, Nucleic Acids Research (2014) [Open-access] [Webpage]
[18] G. Rizk, A. Gouin, R. Chikhi, C. Lemaitre, MindTheGap: integrated detection and assembly of short and long insertions, Bioinformatics (2014) [Open-access] [Webpage]
[17] E. Drezen et al., GATB: Genome Assembly & Analysis Tool Box, Bioinformatics (2014) [Open-access] [Webpage]
[16] R. Chikhi, A. Limasset, S. Jackman, J. Simpson, P. Medvedev, On the representation of de Bruijn graphs, RECOMB (2014) [PDF]
[15] K. R. Bradnam et al., Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species, GigaScience (2013) [PDF]
[14] R. Chikhi, P. Medvedev, Informed and Automated k-Mer Size Selection for Genome Assembly, Bioinformatics (2013), HiTSeq (2013) Best Paper Award [PDF] [Webpage]
[13] G. Rizk, D. Lavenier, R. Chikhi, DSK: k-mer counting with very low memory usage, Bioinformatics (2013) [PDF] [Webpage]
[12] N. Maillet, C. Lemaitre, R. Chikhi, D. Lavenier, P. Peterlongo, Compareads: comparing huge metagenomic experiments, RECOMB Comparative Genomics (2012) [PDF] [Webpage]
[11] R. Chikhi, G. Rizk, Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI (2012) [PDF] [Webpage]
[10] P. Peterlongo, R. Chikhi, Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer, BMC Bioinformatics (2012) [PDF] [Webpage]
[9] G. Sacomoto et al., KisSplice: de novo calling alternative splicing events from RNA-seq data, RECOMB-seq, BMC Bioinformatics (2012) [PDF] [Webpage]
[8] D. A. Earl et al., Assemblathon 1: A competitive assessment of de novo short read assembly methods, Genome Research (2011) [PDF]
[7] G. Chapuis, R. Chikhi, D. Lavenier, Parallel and memory-efficient reads indexing for genome assembly, PPAM Parallel Bio-Computing Workshop (2011) [PDF]
[6] R. Chikhi, D. Lavenier, Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph, WABI (2011) [PDF]
[5] R. Chikhi, L. Sael, D. Kihara, Protein binding ligand prediction using moment-based methods, Protein function prediction for omics era, D. Kihara ed., Springer (2011) [PDF]
[4] D. Kihara, L. Sael, R. Chikhi, J. Esquivel-Rodriguez, Molecular surface representation using 3D Zernike descriptors for protein shape comparison and docking, Curr. Protein and Peptide Science (2010) [PDF]
[3] R. Chikhi, L. Sael, D. Kihara, Real-time ligand binding pocket database search using local surface descriptors, Proteins: Structure, Function, and Bioinformatics (2010) [PDF]
[2] R. Chikhi, D. Lavenier, Paired-end read length lower bounds for genome re-sequencing (Meeting Abstract), BMC Bioinformatics (2009) [PDF]
[1] R. Chikhi, S. Derrien, A. Noumsi, P. Quinton, Combining flash memory and FPGAs to efficiently implement a massively parallel algorithm for content-based image retrieval, International Journal of Electronics (2008) [PDF]

Talks

Colib'Read Workshop, 2016, Graph representations of reference-free sequencing data [PDF]
ISMB, 2016, Compacting de Bruijn graphs from sequencing data quickly and in low memory [PDF]
ALEA, 2016, On the representation of de Bruijn graphs (focusing on navigational data structures) [PDF]
SMPGD keynote, 2016, de Bruijn graphs of sequencing data [PDF]
Evomics Workshop on Genomics, 2016, de novo assembly [PDF] [Lab]
RECOMB, 2014, On the representation of de Bruijn graphs [PDF]
Evomics Workshop on Genomics, 2014, de novo assembly [PDF] [Blog post] [Lab]
ISMB/HiTSeq, 2013, Informed and Automated k-Mer Size Selection for Genome Assembly [PDF]
Evomics Workshop on Genomics, 2013, de novo assembly (introduction) [PDF]
WABI, 2012, Space-efficient and exact de Bruijn graph representation based on a Bloom filter [PDF]
Thesis slides, 2012, Computational methods for de novo assembly of NGS data [PDF]
WABI, 2011, Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph [PDF]
IBL, 2011, de novo assembly tools, Monument, Mapsembler [PDF]
ISCBSC, 2009, Paired-end read length lower bounds for genome re-sequencing [PDF]

Reports

R. Chikhi, Computational Methods for de novo Assembly of Next-Generation Genome Sequencing Data, PhD Thesis, 2008-2012 [PDF]
Summary: We discuss computational methods (theoretical models and algorithms) to perform the reconstruction (de novo assembly) of DNA sequences produced by high-throughput sequencers. This thesis introduces the following contributions:
- quantification of the maximum theoretical genome coverage achievable by recent sequencing data (Chapter 2)
- theoretical models for paired-end assembly (Chapter 3)
- two concepts for practical assembly: localized assembly and memory-efficient paired reads indexing (Chapter 4)
- implementation details of a de novo assembly software, the Monument assembler (Chapter 5)
- an algorithm that enumerates variants in sequencing data, implemented in the Mapsembler software (Chapter 6)

R. Chikhi, Study of Unentanglement in Quantum Computing, Manuscript, research internship at MIT, Spring 2008 [PDF]
Summary: We investigate the conjecture that one cannot simulate QMA(2) protocols in QMA using a quantum operation called a disentangler. Our results show that, when exponential precision is required, this conjecture holds unless P = NP. Moreover, also in the exponential precision case, we show that one only needs a stronger hypothesis to prove the conjecture.

R. Chikhi, Protein surface descriptors for binding sites comparison and ligand prediction, Manuscript, research internship at Purdue University, Summer 2007 [PDF]
Summary: We present a model for two-dimensional ligand binding pocket representation and apply it to pocket-pocket matching and binding ligand prediction.

Retired software

Mapsembler
Targeted assembly on a desktop computer, see reference [10].

Paired reads repetitions
Software package for computing the ratio of single and paired (as in paired NGS reads) exact repetitions within a genome. Useful for obtaining re-sequencing lower bounds inspired by [Whiteford 05]. See [2] and the corresponding talk for sample results and details.

Monument
Whole genome de novo assembler, described in [6], [7] and [PhD Thesis]. (recommended instead: Minia)

de Bruijn graph construction
Hash table-free implementation of the de Bruijn graph for a set of reads.
Also includes a tool that computes the union of two de Bruijn graphs and the cartesian product of abundances, useful for constructing a multi-dataset de Bruijn graph. (recommended instead: BCALM 2)

Pocket-Surfer
Protein ligand binding pocket type prediction using a database of known binding sites. See [3] for more details. (recommended instead: 3D-Surfer)