cole lyman Bioinformatics, Emacs, Programming, and Life

Intelligence to Arrogance Ratio

Yesterday I graduated with my Bachelor’s degree from Brigham Young University and attended the Convocation services for my college (College of Physical and Mathematical Sciences). Our dean, Dr. Shane Reese, gave a very inspirational talk, of which I would like to share part of it with you.

Dr. Reese’s talk centered on two pieces of advice to the graduates, working hard and being meek. He also provided tools of measurement in order for us to see how well we are following his advice.

His first piece of advice was to work really hard. He shared that there is no substitute to working hard, and that one can measure each day if one has given an honest day’s work.

The second piece of advice is what I would like to primarily focus on, and that advice was to be meek. As a tool to measure one’s meekness, he shared a conversation that he had overheard where a faculty member was discussing the president of our university, Dr. Kevin J. Worthen. This faculty member described Dr. Worthen as having an incredibly high intelligence to arrogance ratio, illustrated below.


Imagine that one’s intelligence and arrogance can be quantified, then the intelligence to arrogance ratio would be your quantified intelligence divided by your quantified arrogance. For example, if one is incredibly dumb but also quite full of themselves, then the ratio would be extremely low. On the other hand, if one is reasonably intelligent while still humble, then the ratio is quite high. As a third example, if one is intelligent, but also pompous, then the ratio will be around 1.

In my experience, the individuals with a low ratio are the most pleasant to be around, because they offer interesting insights and advice while not being condescending. They are the ones that show you (rather than tell you) how intelligent they are.

I found this advice to be incredibly insightful and useful to all. I hope to increase my own intelligence to arrogance ratio as I pursue higher education and gain more life experience.

Colored de Bruijn Graphs

I have really wanted to write this post for a long time, but seem to only now get around to it. For more than a year now my research in the Computational Sciences Lab (CSL) at Brigham Young University (BYU) we have been researching various applications of the Colored de Bruijn Graph (CdBG). It all started when we explored some novel phylogenetic reconstruction methods in the CS 601R class during Winter semester 2017. We (or at least, I) kept being drawn back to CdBG’s and their potential for phylogeny reconstruction. Here are some of the things that I have learned along the way!

As with most scientific endeavors, this project certainly stands on the shoulders of giants. Some of these giants include the following papers and their respective authors. I think that they have done amazing work and I admire their methods.


We want to use the CdBG to reconstruct phylogenetic trees because it is very efficient computationally. The CdBG can be constructed in \(O(n)\) time and space and it can utilize whole genome sequences, which is a shortcoming of many of the traditional phylogenetic tree reconstruction algorithms.

Furthermore, we also figured that the CdBG contains more information than many of the kmer counting methods, and if they can perform so well then the CdBG will only be able to perform better because it not only stored the kmers (as nodes in the graph), but it also stores the context in which those kmers occur (as edges where \(k - 1\) basepairs overlap on either end of the kmer).

Our Contribution


In order to prove our hypothesis, we did what every self-respecting Computer Scientist would do, we wrote a program to figure out if it worked. We call our program kleuren, which is Dutch for “colors” (referring to the colors in the CdBG).

kleuren works by finding bubble regions of the CdBG. A bubble is defined as a subgraph of the CdBG that consists of a start and end node that are present in \(n\) or more colors, and there are multiple paths connecting the start node to the end node; where \(n\) is a given parameter and is no greater than the total number of colors in the CdBG and a path is simply a traversal from one node to another.

After the bubbles are found, they are aligned through Multiple Sequence Alignment (MSA) via MAFFT and then each MSA block is concatenated to form a supermatrix. The supermatrix is then fed into a Maximum-Likelihood program (IQ-TREE) to reconstruct the phylogenetic tree of the taxa.

How Bubbles are Found

kleuren uses fairly simple and straightforward algorithms to find the bubbles, which is broken up into two steps: Finding the End Node and Extending the Paths.

Finding the End Node

kleuren iterates over the super-set (the union of all kmers from all taxa) as potential start nodes (in a dBG the nodes are kmers, thus \(node == kmer\)). Given a kmer, it is queried in the CdBG and the number of taxa (or colors, thus \(taxon == color\)) is calculated to determine if the number of colors for that kmer, \(c\), is greater than or equal to \(n\), where \(n\) is a parameter provided by the user.

If \(c \geq n\) then the node is a valid start node and a breadth-first search is performed starting from this node until another node is found where the number of colors that it contains is greater than or equal to \(n\), which then becomes the end node.

Extending the Paths

After an end node is discovered, the sequence of each path between the start and end nodes must be calculated. In order to discover a path in a dBG, one must collapse edges by appending the last nucleotide of the next node to the previous node’s sequence. For example, if a node is ACT -> CTG, then the collapsed sequence will turn out to be ACTG.

This is implemented as at most \(c\) depth-first searches, where \(c\) is the number of colors. The number of depth-first searches decreases as the number of paths with shared colors increases.

Further Reading

If you are interested in the details of our algorithm and would like to see some results, please check out our paper Whole Genome Phylogenetic Tree Reconstruction Using Colored de Bruijn Graphs (preprint). We are currently working on extending kleuren to improve its efficiency.

Genome Mapping Post Processing

So you have mapped your reads to the reference genome, but what comes next? How can you tell how many reads were aligned, where they were aligned, and actually see what your mapping algorithm did? This post will show you how you can analyze the results of your genome mapper by using samtools and IGV.


SAM File Format

Most modern genome aligners will output the aligned reads in the SAM file format. If you have a lot of time on your hands you can read through this SAM file and see where the reads are mapped to and read the information for each read (which probably adds up to millions of lines). A’int nobody got time for that. Instead, we are going to have samtools do the work for us.


  1. Download samtools.
  2. Go to the directory that you downloaded it and unzip the file
    • $ cd ~/Downloads
    • $ tar jxvf samtools-1.3.1.tar.bz2
  3. Go to the upzipped directory and prepare for install
    • $ cd samtools-1.3.1
    • $ ./configure
    • Note: You may get a library dependency error if you don’t have the development files for the ncurses library installed. If you do get this error, install either libncurses5-dev or ncurses-devel.
  4. If the ./configure command executed without errors, install samtools
    • $ sudo make install
    • Note: If you do not have super-user privileges then you can run $ make install --prefix=<directory of your choice>
  5. Test if the install worked by running samtools
    • $ samtools

If the installation was successful then you should see a usage prompt for samtools, otherwise your installation failed (or you aren’t using the correct path to the samtools binary).

BAM File Format

The BAM file format is the binary cousin to the SAM file format, it is the same thing, just in binary form (this means that you can’t directly view a BAM file in a text editor like vim like you could a SAM tile).

Converting SAM to BAM

We will now use samtools to convert our SAM file to a BAM file. The SAM file that we are going to use is found in the examples directory of the samtools-1.3.1 directory, the file is toy.sam. samtools is broken up into subprograms that perform different functions. In order to convert the SAM file we use samtools view, like so: $ samtools view -b -S examples/toy.sam -o examples/toy.bam.

Sorting BAM

Next, we want to sort the reads in the BAM file. This is a necessary step because many visualization programs cannot handle unsorted BAM files. To sort the BAM file, run: $ samtools sort examples/toy.bam -o examples/toy.sorted.bam.

Indexing BAM

We then have to index the sorted BAM file, which is also a necessary step for visualization tools. Run $ samtools index examples/toy.sorted.bam.

Viewing BAM Using samtools tview

Now we can finally view the aligned reads in our terminal! Run $ samtools tview examples/toy.sorted.bam examples/toy.fa and you will suddenly see four reads aligned to an extremely small genome. You should see something along these lines:

1         11              21        31         41        51
  ........    ....  ......K.K......K. ..........
  ........AGAG....***...      ,,,,,    ,,,,,,,,,
    ..C...**** ...**...>>>>>>>>>>>>>>T.....```

The numbers at the top signify the index of the genome, and the first line of characters represents the reference genome itself. The third line is the (consensus sequence) to discover what K represents). Each line under the consensus sequence is a read.

You may be wondering why most of the reads are made of ., well that is because they match the reference genome. There are many different settings that you can play with in samtools tview, to view all of the settings type ? and a help menu will come appear.

Integrative Genome Viewer (IGV)

If you want more flexibility and a more robust way of viewing your aligned reads you can use IGV. It has a GUI, which makes things nice sometimes.


Download the appropriate files according to the system that you are running. IGV is written in Java (using Java 7, not sure if Java 89 will work, but it should), so you need to make sure that you have Java installed on your computer. Once you have Java installed and IGV downloaded, go ahead and unzip the downloaded file, if needed. Then if you are in an Unix-like OS (Mac or Linux) you can run $ ./IGV_2.3.88/ to open up the IGV GUI.

Loading the Reference Genome

Once the program is open, click the Genomes button at the top of the window, then select Load Genome from File.... Select the file samtools-1.3.1/examples/toy.fa.

Loading the Reads

After the reference genome is loaded we can load in the reads by clicking the File button at the top of the window, then select Load from File.... Select the file samtools-1.3.1/examples/toy.sorted.bam.

Seeing the Reads

You probably can’t see any changes in the view, that’s ok. In the second row click on the dropdown arrow that says All and select either ref or ref2 and then you will be able to see what we saw in samtools tview, except it is way easier to figure out what everything means!

Suffix Trees

Suffix Trees

What is a Suffix Tree?

A suffix tree is a data sructure that contains each suffix of a string, a suffix is defined as the end part of a string. For example, the suffixes for the string banana are:

  • a
  • na
  • ana
  • nana
  • anana
  • banana

Notice that the entire string is also considered a suffix. We can also include the empty string as a suffix, so in order to include the empty string we need to append a character that isn’t in the alphabet to the string. We will assume that the character $ is not in the alphabet. Our string is now banana$, and the suffixes are:

  • $
  • a$
  • na$
  • ana$
  • nana$
  • anana$
  • banana$

How to Construct a Suffix Tree

The easiest way to understand how to construct a suffix tree is to first construct a suffix trie, then collapse nodes to convert the trie to a tree. Here is the graphical representation of the suffix trie: suffix trie

Now we collapse the nodes that only have one child, and the resulting suffix tree is this: suffix tree

How to Use a Suffix Tree

Observe that each suffix is represented in this structure. This means that we can efficiently search for a suffix (or substring) by traversing the suffix tree. For example if we were to search for the substring ana in the suffix tree for banana$, this would be the traversal path:

0 -> 2 -> 3

Note: We can stop at any node because we are searching for a substring rather than a suffix, if we were searching for a suffix, the search string would have been ana$.

Suffix Tree Construction

How would you create a naive algorithm to construct a suffix tree? You can easily create a simple algorithm to construct a suffix tree in O(n2), but Esko Ukkonen discovered a way to construct a suffix tree in O(n) (linear time).

Genome Assembly

What is Genome Assembly?

Genome assembly is often compared to putting a puzzle together. In this analogy, the pieces of the puzzle are individual reads that we get from a DNA sequencer. The ultimate goal of putting a puzzle together is to find out where each and every piece fit exactly, this way you can see the completed picture. While this is also true of genome assembly, we need to be realistic. The first step in genome assembly is to generate contigs, or in the words of our puzzle analogy, to combine pieces that we know go together.

What is a Contig?

A contig is short for contiguous sequence. It is a sequence that is longer than the reads, and shorter than the genome (technically the whole assembled genome could be considered a contig in an organism with a single chromosome, but for our purposes it is considered to be shorter than the entire genome). If you have a puzzle that is a picture of a beach and a ocean, you would combine all of the tan colored pieces and then all of the blue colored pieces. You may even combine the pieces that make up the colorful beach umbrellas littering the beach. When you do this you are generating contigs! You are making small chunks of the whole because they are easy to identify, and it is the same with genome assembly.

How Do We Discover Contigs?

Well, we know that the pieces of our puzzle are reads. The regular puzzle piece has four sides with different shapes that define exactly where it should go. Our reads also have “sides” that tell us the proper place it should be in the contig. These “sides” are revealed when we break the read up into kmers and inserting them into a de Bruijn graph. We then can construct the contigs by continuing the non-branching nodes (a node that has exactly one incoming edge and one outgoing edge).

Imagine that we are assembling an incredibly small genome, with way too few reads, which are the following:


The reads are of length 5. The de Bruijn graph using the kmer length of 5 (the entire read) for this set of reads would look like this (the non-branching node is in red): de Bruijn Graph

As you can see there is only one non-branching node in the graph. We can only generate the following contigs:


I hope it is clear that this is not a very good assembly. Why isn’t this assembly good? Well for the most part all of our contigs are the same length of our reads, except for one, which is only one more base pair than the read length. This assembly is equivalent to only connecting one puzzle piece. Let’s see how we can improve this assembly.

Kmer Length

The length of kmer that you use to construct your de Bruijn graph will greatly influence your assembly, for better or for worse. What will happen if we decrease the kmer length from 5 to 3?

Here is the de Bruijn graph of kmer length 3:

Here is the de Bruijn graph of kmer size 3: de Bruijn Graph

What is up with all of the edges? Each edge represents an occurance of that kmer, for example the kmer ACT has 4 edges to CTG because there are 4 occurances of CTG. Let’s clean this graph up by giving weights to the edges. The number on each edge represents how many times that edge is repeated.

Here is the de Bruijn graph of kmer size 3 with weighted edges (the non-branching nodes are in red): de Bruijn Graph

Let’s see if a kmer length of 3 is any better; we definitely have more branhcing nodes, but will that lead to longer contigs? Essentially we have made our puzzle pieces smaller, so just because we can put together more puzzle pieces doesn’t mean that we are building more of the puzzle. Here are the contigs generated by the graph of kmer length 3:


Well, compared to the assembly using kmers of length 5 we have the same number of contigs, with a shorter average, and the longest contig is only the length of the reads. Is this assembly better or worse than the last one? Short answer, yes. You have to define “better” when classifying assemblies. You may want to have the longest average contig length, or simply the longest contig. You may have some other metric like N50 in which you determine which assembly is “better.”

Filtering Errors

One way of accounting for errors in your algorithm is to remove edges below a certain weight. When an edge has a higher value, it occurs more often. If an edge occurs more often, then it is more likely to be a valid contig.

What is Next?

You may be asking yourself, “I can see how we generate contigs, and how that can be useful, but we still don’t have one sequence that represents the genome. We haven’t fully completed the puzzle. How do we do that?” If you would like to fully complete the puzzle, to put all of these contigs together, you would have to construct scaffolds. A scaffold is a representation of how contigs relate to each other as well as accounting for gaps in the sequence. Depending on how much data you have, you may not be able to create one continuous sequence.

NOTE: Scaffolding is not required for the CS 418/BIO 365 Genome Assembler project. Generating contigs is good enough!

How to Assemble using Velvet

Genome Assembly using Velvet

What the heck is Genome Assembly?

Genome assembly is the process of constructing long contiguous sequences from shorter sequences. Think of this problem at a genomic scale. Same approach, just a lot more data.

What the heck is Velvet?

Velvet is a genome assembler that uses a de Bruijn graph to generate contigs. If you are interested in reading the paper describing how Velvet works, feel free to read Velvet: Algorithms for de novo short read assembly using de Bruijn graphs.

Installing Velvet

I say that the most important part of using software is figuring out how to install it. Sometimes it can be harder than you think.

Here is how you install Velvet:

  1. Download the source
    • Optional: Check out the sweet Velvet website complete with web 2.0 design. Al Gore would be proud.
  2. Go to the directory in which you downloaded the file $ cd ~/Downloads and unzip the file $ tar zxvf velvet_1.2.10.tgz
  3. Go to the just unzipped directory $ cd velvet_1.2.10 and compile Velvet by issuing the command $ make
    • Error warning!! If you get an error that says something along the lines of fatal error: zlib.h: NO such file or directory then try installing the package zlib1g-dev then running $ make again
  4. If you didn’t get any errors, it looks like you have installed Velvet! If you got errors, Google the error and figure out how to fix it.

Running Velvet

To execute the Velvet program make sure that you are in the velvet_1.2.10 directory and then type $ ./velveth and it should return a short help message. If it didn’t, check to see if you are in the correct directory by issuing the command $ pwd.

Assembling the Zika virus genome

Prepping the files for assembly

We have some reads from the Zika virus, fresh from Florida. We want to assemble the Zika virus genome to help find a cure. Download the reads zika.read1.fastq and zika.read2.fastq, then run this command $ ./velveth zika_genome 20 -fastq -shortPaired ~/Downloads/zika.read1.fastq ~/Downloads/zika.read2.fastq. This command is a sort of preprocessing command that constructs your dataset so that it can assemble it. Here are what the parameters mean:

  • ./velveth- the program that we use
  • zika_genome- this is the output directory of all the files
  • 20- this is the hash (in other words, kmer) size that we use, you will want to play around with this
  • -fastq- this is the type of input files that we have
  • -shortPaired- this is the type of input reads that we have
  • ~/Downloads/zika.read1.fastq- this is the first file of reads
  • ~/Downloads/zika.read2.fastq- this is the second file of reads

Note: You can have an unlimited number of input files.

Assembling the reads

We are now going to use the program ./velvetg to actually construct the de Bruijn graph and assemble the reads. Issue the command $ ./velvetg zika_genome/ -cov_cutoff 4 -min_contig_lgth 100, and now you have assembled your first genome! Here are what the parameters mean:

  • ./velvetg- the program that we use
  • -cov_cutoff 4- this removes the nodes that have a coverage less than 4
  • -min_contig_lgth 100- this gives us all of the contigs that are greater than 100 bases

Viewing the generated contigs

Now we can see the contigs we generated by this command, $ cd ./zika_genome and $ less contigs.fa. Feel free to explore around in this directory for other cool stuff about the contigs!

Happy assembling!

Above and beyond…

You can compare your generated contigs with the NCBI Reference Sequence for the Zika virus to see how well (or how poorly) your genome assembly actually is!

Python Tutorial CS 418/BIO 365

Python Tutorial

CS 418/BIO 365

30 August 2016

# This is a comment
""" This is a 
''' This is also a 
# How do I declare variables?

# String
my_string = 'Hello, CS 418/BIO 365!!'
my_other_string = "Hello, friends."
# Numbers
my_number = 418
my_other_number = 365.578
# List
my_list = ['this', 'is', 'fun', 56]
# or
my_other_list = []
# or
my_other_list2 = list()

# Dictionary (Map)
my_dict = {'key': 'value', 'other_key': 'other_value'}
# or
my_other_dict = {}
# or
my_other_dict2 = dict()

my_other_dict['my_key'] = 'my_value'
# How do I make loops?

# NOTE: Whitespace matters!!

# for loop
for i in my_list:
# NOTE: "print(i)" is used in Python 3, "print i" is used in Python 2 
# for loop over range
for i in range(0, 5):
['bioinformatics', 0, 1, 2, 3, 4]
# while loop
while len(my_list) > 0:
    del my_list[-1]
['this', 'is', 'fun']
['this', 'is']
# break 
for val in my_other_list:
    if val is 2:
# continue
for val in my_other_list:
    if val is not 2:
# How do I perform file I/O?

# file input
import sys # this imports the library to get command line arguments
with open(sys.argv[1]) as fh: # sys.argv[1] is the first command line argument, sys.argv[2] is the second ... and so on.
    my_first_line = next(fh) # Python 2 & 3
    my_next_line = # Python 2 
with open(sys.argv[1]) as other_fh:
    # iterate over all lines in file
    for line in other_fh:

# file output
with open('./my_file.txt', 'w') as writable: ''' NOTE: the second 
    argument ('w') makes this file writable '''
    writable.write('This is my test file\n')
# How do I manipulate strings?

# string slice
my_string = 'banana'

Rosalind Tutorial

You can register for the course at:

Use your NetID as your username!!

The first problem is found here:

# What do we need to do in pseudo code?

''' -Read in the sequence and the 
    length of the kmer (a kmer is essentially a substring)
    -Break up the sequence into kmers
    -Count each kmer
    -Find the kmer(s) with the highest counts
    -Print out the highest count kmers'''
# Read in the sequence and the length of the kmer from a file
import sys
with open(sys.argv[1]) as file:
    seq = next(file).strip() # REMEMBER: in Python 2 use
    kmer_len = int(next(file).strip())
# Break up the sequence into kmers
    for i in range(0, len(seq)):
        kmer = seq[ i : i + kmer_len]
# Count each kmer
    counts = {}
    for i in range(0, len(seq)):
        kmer = seq[ i : i + kmer_len]
        if kmer not in counts:
            counts[kmer] = 1
            counts[kmer] += 1
# Find the kmer(s) with the highest counts
    max_count = 0
    max_kmers = []
    for kmer, count in counts.items(): # NOTE: use counts.iteritems() in Python 2
        if count > max_count:
            max_count = count
        elif count == max_count:
# Print out the kmers with the highest counts
    print(" ".join(max_kmers))

You Don't Know How Bad You Are Until You Try to Be Good

One interesting phenomenon that I have noticed in myself is that I perceive that I am better than I actually am in reality. Call it pride or high expectations; either way, it seems that I never realize where I was until I begin to improve. When we don’t try to improve ourselves, we will have nothing to compare ourselves to in the future. If we never surpass our current state, then we will never realize that we were ever lacking in our old state.

I have also noticed that the moment that we begin on the path of improvement it seems as if that path is impossibly difficult, so much more difficult than it was before. I believe that this is due to the fact that change is difficult, and as stated previously, I believe that my abilities are higher than they are in reality.

Why try to improve?

Some may ask what the point is in striving to do your best? Is your current state not good enough? I can only speak for myself (obviously), so why do I strive to do my best? I see achieving your highest potential as the greatest challenge this life has to offer. Whether it be in sports, music, academics, or business, it is human nature to embrace difficulties and overcome them. I see overcoming weaknesses and flaws in my character as a challenge that is to be overcome, that I enjoy immensely.

Why is improving so difficult?

You don’t know how hard improving is going to be until you attempt it. If our current state of being isn’t easy or comfortable, then we wouldn’t that way in the first place. It is much easier to stay the way you are than to fundamentally change your character; however, change for the better is always worth it, no matter how difficult it was. Doing what we have always done is definitely easier, but making the change that we want can bring the satisfaction back to life that we may be lacking.


Wait for it Or not

Wait for it… Or not.

Wait for it… Or not.

When you can’t wait…

Wait for it… Or not.

When you can’t wait…


In a world where prayers to the ‘omniscient being of the universe’1 begin with OK Google, and more information than you will ever want to know are answered instantaneously. Furthermore, having the ability to obtain almost any product that one could desire with free 2 day shipping doesn’t help us learn how to wait either.

What is the point of waiting?

What value does patience add to our lives? Learning how to wait is valuable because at some point in your life there will be something that you can’t receive/achieve instantaneously. There are many worthy endeavors that will require you to work long and hard, and if you aren’t patient enough to work to achieve those goals you will never achieve them.

Patience is arguably one of the most difficult attributes to develop. I believe the only way that you can develop patience is by waiting for it (get it?).


Precision Medicine Rogue Therapeutics Harvard

Precision Medicine- Rogue Therapeutics Harvard 2016

Precision Medicine- Rogue Therapeutics Harvard 2016

Rare disease, genomics and patient-driven medicine may be terms that you have never heard of. All of these terms relate to precision…

Precision Medicine- Rogue Therapeutics Harvard 2016

Rare disease, genomics and patient-driven medicine may be terms that you have never heard of. All of these terms relate to precision medicine, medical treatments that are tailored to a specific patient.

These patients usually have some sort of rare disease, which is any disease that affects less than 200,000 people (in the United States). However, while the diseases themselves may be rare; having a rare disease is not rare. There are approximately 30 million people in the United States that suffer from a rare disease.

What is the purpose of precision medicine?

The goal of precision medicine is to develop treatments to alleviate rare diseases. The range of treatments available are quite limited because 80% of rare disease are genetic based. This means that while traditional medications could treat rare diseases, they will never permanently cure genetic disorders.

Undiagnosed Disease Network

Even though Precision Medicine as a discipline is still in its infant stages, there are many organizations that aim to make precision medicine available to every patient. One such organization is the Undiagnosed Disease Network (UDN). This is an initiative funded by the NIH to aid those with rare diseases to have a diagnosis. While only 50% of the current existing rare diseases have a foundation studying and supporting patients with that disease (let alone a cure or treatment for that disease), the UDN can help patients towards a diagnosis.

Karen, Ornella & Lysogene

Another example of an organization that supports rare diseases and have successfully produced one form of precision medicine (so far) is Lysogene. This company was founded by a mother, Karen Aiach, whose daughter, Ornella, was diagnosed with Sanfilippo Syndrome. At the time of diagnosis the doctors stated that the prognosis of Ornella was highly dysfunctional childhood development and an early death at the age of 20 (if not earlier).

“There will be no cure for 20 years, go home and enjoy the time left with your child.” — The doctors diagnosing Ornella

In spite of the doctors’ prognosis, Karen was determined to do something about her daughter’s health condition. She did research and talked to the right people and eventually brought a treatment for Sanfilippo Syndrome to clinical trials. Her daughter still lives, and has given hope to other patients with Sanfilippo Syndrome.

Unfortunately Karen and Ornella’s story is still rare when it comes to the rare disease community. There are thousands that die annually without any cure, or even any hope for a cure. This must change. As research progresses and discoveries about rare diseases are made, more cures will come. Karen, among others, are the pioneers of precision medicine. I hope for a time when all patients diagnosed with a rare disease will be as fortunate as Ornella.

You can read more about Karen and Ornella’s story here at