This past week I was visiting the University of Delaware to attend the 3rd Skate Genome Annotation workshop, sponsored by the IDeA Network for Biomedical Research Excellence (INBRE) Program from the National Center for Research Resources at the National Institutes of Health.
As the title suggests, we’re looking at real data from the genome project of the Little Skate (Leucoraja erinacea). Why is this cool? Well, firstly because all marine animals are totally awesome (even vertebrates…I guess). Secondly, the Little Skate is often used as a model organism for understanding human biology. L. erinacea is a chondrichthyan (cartilaginous) fish, a primitive jawed vertebrate whose lineage branched off early from the one leading to all other living jawed vertebrates. The Little Skate functions like any typical vertebrate, possessing an adaptive immune system and a pressurized circulatory system (plus it grows fairly easily in a tank); research on this species has significantly enhanced our knowledge of human physiology, immunology, stem cell and cancer biology, pharmacology, toxicology, and neurobiology. Having a complete genome sequence for the Little Skate will have huge benefits for developmental biology and biomedical research, and will also help to decipher evolution in sharks, rays, and higher vertebrates. With 49 chromosomes and an estimated genome size of 3.42 billion base pairs (slightly larger than the human genome), this species has considerably less genetic material than other cartilaginous fishes (the genome of the dogfish shark, another closely related model organism, is double the size).
The most awesome thing about the Little Skate is its phenomenal power of regeneration. As in, it can grow back organs and amputated limbs. One of the big reasons for sequencing the genome is to characterize the genetic pathways and patterns of gene expression that enable this wound-healing response. If the Little Skate is an ancient vertebrate, then you could reason that limb regeneration is an ancestral trait that was subsequently lost in higher vertebrates. If we understand which genes turn on, could we eventually learn how to switch on limb regeneration in humans?
A smaller genome is also cheaper to sequence (making scientists happy) and means a better chance of a successful assembly. Although we hear about new genomes being sequenced on a weekly basis, this stuff is hard work. Take the human genome: our Homo sapiens genome assembly is pretty damn good (never mind that it’s already ten years old), yet there are STILL bits and pieces of sequence that don’t seem to fit in anywhere. Imagine you’re thisclose to completing a 2,000-piece puzzle, but you’ve got a bunch of holes and your extra pieces are all the wrong shape and size (you probably jammed a piece into the wrong spot somewhere in there…now you have to find it and swap it out).
In a genome, we’ve got genes flanked by repetitive ‘junk’ DNA. A lot of times we don’t have long enough sequence bridges to span these repetitive regions: with the newest technologies, each sequenced fragment of DNA (what we biologists refer to as a ‘read’) is 100-150 base pairs long, but repetitive regions can be thousands of bases in length. A read that falls entirely inside a repeat matches every copy of that repeat equally well, so there’s no way to tell where it belongs. The Skate Genome project has currently sequenced over 3 BILLION reads, giving us 59x coverage of the genome (meaning each position in the genome has, on average, been sequenced 59 times), but there STILL isn’t enough data for a good assembly. The current assembly has the genome split into 3 million contigs (longer stretches of DNA pieced together from overlapping reads), with the longest contigs around 21,000 bases in length. Curse you, repetitive elements! Nevertheless, we’re making slow and steady progress; the data that we do have is helping to train undergrads, postdocs, and new genomicists in genome assembly and protein identification. Our undergrad from UNH was dreaming about gene annotation by the end of the week…
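Why doesn’t 59x coverage fix everything? Here’s a minimal Python sketch of two textbook ideas (not code from the Skate Genome project itself). First, the idealized Lander–Waterman model: average depth is C = N × L / G (reads × read length / genome size), and if reads land uniformly at random, the expected fraction of bases never sequenced is e^(−C). Second, a toy check showing that reads no longer than a repeat can’t be placed uniquely. The function names, the uniform-random-reads assumption, and the tiny GATTACA “genome” are all made up for illustration.

```python
import math
from collections import Counter


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth, C = N * L / G (Lander-Waterman)."""
    return num_reads * read_length / genome_size


def fraction_uncovered(depth: float) -> float:
    """Expected fraction of bases hit by zero reads, ASSUMING reads start
    uniformly at random so per-base depth is Poisson(depth): P(0) = e^-depth."""
    return math.exp(-depth)


def ambiguous_reads(genome: str, read_length: int) -> int:
    """Count read start positions whose sequence occurs more than once in the
    genome, i.e. reads an assembler cannot place uniquely."""
    starts = range(len(genome) - read_length + 1)
    counts = Counter(genome[i:i + read_length] for i in starts)
    return sum(1 for i in starts if counts[genome[i:i + read_length]] > 1)


# Hypothetical toy genome: unique flanks around two copies of a 7-base repeat.
REPEAT = "GATTACA"
TOY_GENOME = "CCC" + REPEAT + "GGG" + REPEAT + "TTT"

if __name__ == "__main__":
    # Idealized model only: at the post's 59x depth, missing bases are vanishingly rare.
    print(f"uncovered fraction at 59x: {fraction_uncovered(59):.1e}")
    # Reads no longer than the repeat can come from either copy.
    for length in (4, 7, 8):
        print(f"read length {length}: {ambiguous_reads(TOY_GENOME, length)} ambiguous reads")
```

Under that idealized model, e^(−59) is roughly 2.4e-26, so a lack of raw coverage isn’t what’s holding the assembly back. In the toy genome, 7-base reads (the repeat length) include two that could come from either copy, while every 8-base read is unique, because it has to reach a flanking base that differs between the two copies. Scale the repeat up to thousands of bases and the 100-150 base reads from the post are stuck in exactly the same way.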