Journal of Leisure 閒記

It is indeed very hard to keep a homepage up to date. I can't count how many homepages I have created and then given up on. As a result, I decided to write a blog, because it is much easier to manage, so I presume I will update it more often. But a blog that is rarely updated is a blog no one reads. So I also decided to write something here that best describes my daily life, just to keep it not-so-outdated :-)
It's been a hard day's night
And I've been working like a dog
It's been a hard day's night
I should be sleeping like a log

Tuesday, March 13, 2012

CG vs. Illumina (Sensitivity)

Came across MJ's post in response to CG's post about our sequencing platform paper in Nature Biotechnology:


MJ made a good point that our small set of Sanger sequencing data was only suggestive. Here are my thoughts.

A confidence level of 95% with a margin of error of 5% for each of the platform-specific call sets requires a minimum sample size of ~380. Any further estimation based on a set smaller than that is inconclusive. That's why we went on to SureSelect at a larger scale, which gives us a statistically significant result.
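For anyone who wants to check the ~380 figure, here is a minimal sketch of the standard sample-size formula for estimating a proportion. It assumes the most conservative case (p = 0.5) and z = 1.96 for 95% confidence; the function name and the optional finite-population correction are my own illustration, not something taken from the paper.

```python
import math

def min_sample_size(margin=0.05, z=1.96, p=0.5, population=None):
    """Minimum sample size to estimate a proportion within +/- margin.

    Assumptions (illustrative, not from the paper):
    - p = 0.5 is the worst case, giving the largest required sample
    - z = 1.96 corresponds to a 95% confidence level
    - population, if given, applies the finite-population correction
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    if population is not None:
        # Correction shrinks the requirement slightly for finite sets
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

if __name__ == "__main__":
    print(min_sample_size())                    # infinite-population case
    print(min_sample_size(population=100_000))  # hypothetical call set size
```

Without the correction this gives 385, and with a finite call set the number drops a little, which is where a figure of roughly 380 comes from.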

As mentioned in the paper, the SureSelect validation may be biased, since it was followed by Illumina sequencing. But if there were a strong bias towards Illumina due to systematic errors, the invalidation rate for Illumina itself probably wouldn't be as high as that for Complete.

Let's take the existing Sanger numbers and recalculate them with their possible errors. At the same 95% confidence level, the best possible validation rate for Illumina is 30% and the worst possible for Complete is 83%, which convert into 104K and 83K true positives in their platform-specific call sets, respectively. That said, Illumina still has higher sensitivity, whereas Complete is more accurate (lower FDR).
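To show how bounds like these can be obtained, here is a rough sketch using the Wilson score interval for a proportion. This is an assumption on my part: the post does not say which interval method was used, and the counts and call set size below are hypothetical placeholders, not the Sanger numbers from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a 95% confidence level. Wilson is one common
    choice for small samples; the original analysis may have used another.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

if __name__ == "__main__":
    # Hypothetical inputs for illustration only
    hypothetical_validated = 5
    hypothetical_tested = 10
    hypothetical_call_set_size = 1000

    low, high = wilson_interval(hypothetical_validated, hypothetical_tested)
    # Extrapolate the rate bounds to the whole platform-specific call set
    print(f"validation rate: {low:.3f} to {high:.3f}")
    print(f"true positives: {low * hypothetical_call_set_size:.0f} "
          f"to {high * hypothetical_call_set_size:.0f}")
```

With only a handful of validated sites the interval is very wide, which is exactly why the extrapolated true-positive counts swing so much.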

If it looks unfair, that's the problem with extrapolating from a set with big error bars. What is certain is that larger-scale Sanger sequencing of the platform-specific calls would give us a better, less controversial sense of the potential ground truth.

Until then, we gotta believe that both platforms have their strengths and weaknesses, and that both performed very well overall.
