Journal of Leisure 閒記

It is indeed very hard to keep a homepage up-to-date. I can't count how many homepages I have created and then given up on. So I decided to write a blog instead, because it is much easier to manage, and I presume I will update it more often. But a blog that is rarely updated means no one is going to read it. So I also decided to write something here that best describes my daily life, just to keep it not-so-outdated :-)
It's been a hard day's night
And I've been working like a dog
It's been a hard day's night
I should be sleeping like a log

Tuesday, March 13, 2012

CG vs. Illumina (Sensitivity)

I came across MJ's post responding to CG's post about our sequencing platform paper in Nature Biotechnology.


MJ made a good point: our small set of Sanger sequencing data was only suggestive. Here are my thoughts.

To estimate the validation rate of each platform-specific call set with a 95% confidence level and a margin of error of ±5%, we need a minimum sample size of ~385. Any estimate based on a smaller set has error bars too wide to be conclusive. That's why we went on to SureSelect validation at a larger scale, which gives us a statistically meaningful result.
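The ~385 figure comes from the standard sample-size formula for estimating a proportion, n = z²·p(1−p)/E², with z = 1.96 for 95% confidence, margin E = 0.05, and the worst-case p = 0.5 (used when the true validation rate is unknown). A minimal sketch, assuming the normal approximation and no finite-population correction; the function name is just for illustration:

```python
import math


def required_sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Minimum number of calls to validate so that the normal-approximation
    confidence interval for a proportion has half-width <= margin.

    p = 0.5 maximises p * (1 - p), so it is the conservative choice
    when the true validation rate is not known in advance.
    """
    n = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n)  # round up: a partial sample is not enough


print(required_sample_size(0.05))  # 95% confidence, ±5% margin -> 385
```

Halving the margin to ±2.5% roughly quadruples the required sample, which is why small Sanger sets leave such wide error bars.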

As mentioned in the paper, the SureSelect validation may be biased, since the captured regions were then sequenced on Illumina. But if there were a strong bias towards Illumina due to systematic errors, the invalidation rate for Illumina's own calls probably would not be as high as that for Complete's.

Let's take the existing Sanger numbers and recalculate, this time taking their error bars into account. At the same 95% confidence level, the best-case validation rate for Illumina is 30% and the worst-case rate for Complete is 83%, which translate into 104K and 83K true positives in their respective platform-specific call sets. Even then, Illumina still has the higher sensitivity, whereas Complete is more accurate (lower FDR).
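The extrapolation above boils down to taking a confidence interval on a validation rate and multiplying its bounds by the size of the platform-specific call set. A minimal sketch, assuming a simple normal-approximation (Wald) interval; the post does not say which interval was actually used, and for small Sanger sets a Wilson or exact Clopper-Pearson interval is usually better behaved. The function names and the counts in the usage example are hypothetical, not the paper's data:

```python
import math


def wald_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a validation rate,
    clipped to [0, 1]."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def true_positive_range(successes: int, trials: int, call_set_size: int,
                        z: float = 1.96) -> tuple[float, float]:
    """Extrapolate the validation-rate interval to the whole call set:
    (worst-case, best-case) number of true positives."""
    low, high = wald_interval(successes, trials, z)
    return low * call_set_size, high * call_set_size


# Hypothetical example: 50 of 100 Sanger-tested calls validate,
# in a hypothetical platform-specific call set of 1,000 variants.
print(wald_interval(50, 100))               # about (0.402, 0.598)
print(true_positive_range(50, 100, 1000))   # about (402, 598) true positives
```

Comparing one platform's upper bound with the other's lower bound, as done above, is the most favourable comparison for the first platform; if it still comes out behind, the conclusion is fairly robust to the error bars.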

If this looks unfair, that is the problem with extrapolating from a set with big error bars. What we can do is run Sanger sequencing on the platform-specific calls at a larger scale, which would give us a better, and less controversial, sense of the ground truth.

Until then, we gotta accept that both platforms have their pros and cons, and that both performed very well overall.

Detecting and annotating genetic variations using the HugeSeq pipeline

Deciphering genome sequences is important for the mapping of genetic diseases and prediction of their risks. Advances in high-throughput DNA sequencing technologies using short read lengths have enabled rapid sequencing of entire human genomes and unlocked the potential for comprehensive identification of their underlying genetic variations. Various computational algorithms for identifying and characterizing variants have been developed; however, most of these computational methods are neither integrated nor interoperable, making it difficult for biologists to extract all the genetic information from billions of sequences generated by these sequencing technologies. We developed HugeSeq, an integrated computational pipeline to fully automate the process of variant detection from alignment of these genomic sequences to detection and annotation of all types of genetic variations (single nucleotide polymorphisms (SNPs), short insertions or deletions (indels) and larger structural variations (SVs)).

Reference: Nature Biotechnology. 2012 Mar 7;30(3):226-9.



Performance comparison of whole-genome sequencing platforms

Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. We sequenced the genome of an individual with both technologies to a high average coverage of ~76X, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions...


Monday, May 16, 2011

卡通王國 (Cartoon Kingdom)

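I'm very glad that Ah Don and Kennis have bought the long-awaited 卡通王國 (Cartoon Kingdom). I really didn't expect this classic collection to leave out 六神合體, 相聚一刻 and 小雙俠... What other songs are missing?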



CD 1
01. 足球小將(足球小將片尾曲) - 張衛健
02. 美少女戰士 - 王馨平 湯寶如 周慧敏
03. IQ博士 - 梅艷芳
04. 現代足球小將(現代足球小將) - 李克勤
05. 伙頭仔昆布 - 草蜢
06. 藍精靈 - 小太陽兒童合唱團
07. 數數小精靈 (寵物小精靈) - 陳浩民
08. 龍珠 (龍珠Z) - 張崇基+張崇德
09. 黃金戰士 - 胡渭康
10. 你讓我找到未來(高智能方程式) - 草蜢
11. 新世紀福音戰士(新世紀福音戰士) - 楊千嬅
12. 跅跅步哈姆太郎 (哈姆太郎) - 2R
13. 寶貝甜心 (媽媽是小學四年生) - 湯寶如
14. 忍者 (忍者小靈精) - 小太陽兒童合唱團
15. 時代節奏 (童夢) - 劉錫明

CD 2
01. 夢中天使 (長腿叔叔) - 唐韋琪
02. 我係小忌廉 - 戴蘊慧
03. 多拉A夢 - 陳慧琳
04. 問題天天都多(櫻桃小丸子) - 歐倩怡
05. 小甜甜 - 路家敏
06. 人人愛愉快 (得意快獸仔) - 楊千嬅
07. 皇家雙妹嘜 - 歐倩怡
08. 美少女變身(美少女戰士R) - 陳琪
09. 愛心一百次(飛天小女豬事丁) - 戴恩玲
10. 命運鬥士(鎧甲聖鬥士) - 張衛健
11. 神秘的花園 - 何詠琳
12. 情誼心中印 (龍貓) - 江欣燕
13. 超人小子 (超人小子) - 黎瑞恩
14. 每點愛都記住 (娛樂金魚眼) - 車沅沅
15. 降魔者 (仙魔大戰) - 劉彩玉

CD 3
01. 超人的主題曲 (超人迪加) - 陳奕迅
02. 城市獵人 - 劉德華
03. 宇宙大帝 - 張國榮
04. 千年女王 - 露雲娜
05. 魔法咕嚕咕嚕 (咕嚕咕嚕魔法陣) - 滕麗名
06. 聖鬥士星矢 - 譚耀文
07. 達爾大冒險 - 蘇永康
08. 戰士高狄安 - 蔣慶龍
09. 忍者亂太郎 - 鄭嘉穎
10. 魔法時代 (魔動王) - 曾航生
11. 跟你啄一啄 (福星大咀鳥) - 李彩樺
12. 伏魔小皇子 - 梁漢文
13. 無名小子(伙頭智多星) - 黃凱芹
14. 冰河戰士 - 黃思雯
15. 劍神傳說 (反斗劍神) - 許志安

CD 4
01. 傳說 (千年女王) - 露雲娜
02. 百變小櫻 (百變小櫻MAGIC咭) - 曾嘉莉
03. 足球小旋風 - 李克勤
04. 爆裂旋風 (爆旋陀螺) - 郭富城
05. 超速龍球 (超速搖搖) - 方力申
06. 超級小黑咪 - 陳慧琳
07. 小丸子的心事(櫻桃小丸子) - 何韻詩
08. 變出我精彩(七小花) - 2R
09. 彈珠人的愛心炸彈 (BOMBOM彈珠人) - 楊千嬅
10. 正義 (無敵3x1) - 黎瑞恩
11. 虎威戰士 - 林子祥
12. 腦波子 - 陳美齡
13. 四驅小子 - 吳婉芳
14. 一股歪風 (電子神童) - 梅艷芳
15. 海 (冒險少女娜丁亞) - 吳國敬

Saturday, January 15, 2011

Birth Announcement - Atticus

Libretto Script Blue Birth Announcement

Tuesday, July 13, 2010

Graduation and Beyond

Five years ago, I started a journey that turned out to be more challenging than I had ever expected: I gave up my IT job and began studying a subject that was completely new to me. I guess the most difficult thing is that, as a PhD student in the States, you never know exactly when you will graduate until the very last minute. Given the nature of research, this should not be surprising; getting through it, however, is another matter. As in PHD Comics, it is never easy to watch all your friends' achievements (buying houses, getting promotions, having children, etc.) in the same year while you are still a student at the ground level :-P

I wasn't sure I had made the right decision, and I am still not sure. Nevertheless, I have certainly learnt a lot in many respects over the years, things I would not have had time to learn or even think about if I were still working in Hong Kong. No doubt, doing a PhD is a tough process. But in retrospect, it's an unforgettable and fruitful experience for life. So yes, at the time of this writing I have basically graduated; otherwise I would still be complaining about this and that :-) I am delighted to announce that I successfully defended my thesis on May 20 and submitted it on May 25.



So what's next? There must be a reason it took me one and a half months to update this blog :-P For the past two months, I have been super duper busy, not only preparing for my graduation but also packing for the move, because I got a postdoc offer at Stanford starting in June. (A postdoc is a period of postdoctoral training, usually one to three years, after a PhD.) So yes, I am still sort of a student, and looking forward to getting a permanent job some day :-)

Last but not least, my graduation would not have been possible without the support of all my friends, my family and especially my dearest wife. Thank you!

Saturday, April 10, 2010

My Graduation Progress

Just submitted the first draft of my thesis and my defense presentation to my advisor. Now I'm waiting for the defense date to be confirmed, and then I will be all set for my PhD! Can't wait to move on and start something new somewhere new :-)

Tuesday, January 26, 2010

New Way To Locate Big Genetic Variants

Finally published my paper. And it's on Yale News too!

New Haven, Conn. — Yale University researchers, analyzing hundreds of billions of bits of genetic information, have collated and standardized 2,000 signposts that mark the boundaries of large blocks of human genomic variants.



Citation: Lam et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nature Biotechnology. 2010 Jan;28(1):47-55. Epub 2009 Dec 27.

PubMed ID: 20037582