Blog post

[Repost] [Computer Science] [2018.05] Deep Learning: From Data Extraction to Large-Scale Analysis

921 reads, posted 2021-1-30 16:25 | Category: Research notes | Source: Repost

This is a master's thesis from UiT The Arctic University of Norway (author: Mike Voets), 68 pages in total.

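We aim to give insight into developing and deploying a deep learning algorithm that automates biomedical image analysis. We anonymize sensitive data from a medical archive system, attempt to replicate and further improve published methods, and scale out our algorithm to support large-scale analysis. Specifically, our contributions are as follows.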

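First, to anonymize and extract mammograms for developing a breast cancer detection algorithm, we wrote a script for mammograms stored in a data-locking, sensitive, and proprietary PACS (picture archiving and communication system). The script will be used in a larger project to extract mammograms from all screening points in Norway. Second, because the script is still awaiting authorization by Helsenord IKT, we instead developed an algorithm for a similar screening problem in the biomedical field.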
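This summary does not describe the script's internals. The following is a hedged sketch of what header anonymization can involve. It assumes the DICOM header has already been parsed into a plain dict keyed by attribute keyword. It drops a small, illustrative set of identifying attributes and replaces `PatientID` with a keyed-hash (HMAC-SHA256) pseudonym. The keyword list, the helper names `anonymize_header` and `pseudonymize_id`, and the 16-character pseudonym length are all hypothetical and are not taken from the thesis. A real pipeline would follow a complete profile such as DICOM PS3.15 Annex E. It would also need to handle private tags and any text burned into the pixel data.

```python
import hashlib
import hmac

# Hypothetical, illustrative subset of identifying DICOM attribute keywords.
# This is NOT a complete de-identification profile and NOT the thesis author's list.
IDENTIFYING_KEYWORDS = {
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
    "AccessionNumber",
}


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Map a patient ID to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated hex digest; length is an arbitrary choice


def anonymize_header(header: dict, secret_key: bytes) -> dict:
    """Return a new dict without identifying fields and with PatientID pseudonymized.

    The input dict is left unmodified.
    """
    cleaned = {k: v for k, v in header.items() if k not in IDENTIFYING_KEYWORDS}
    if "PatientID" in cleaned:
        cleaned["PatientID"] = pseudonymize_id(str(cleaned["PatientID"]), secret_key)
    return cleaned


if __name__ == "__main__":
    header = {"PatientName": "Doe^Jane", "PatientID": "12345", "Modality": "MG"}
    print(anonymize_header(header, b"keep-this-key-secret"))
```

The same key always maps the same ID to the same pseudonym. A patient's exams from different screening rounds can therefore still be linked without storing the real ID, provided the key itself is protected like any other secret.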

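To avoid reinventing the wheel, we investigated earlier work. The high-impact article JAMA 2016; 316(22) [1] describes a high-performance deep learning algorithm that detects diabetic retinopathy, reporting an area under the receiver operating characteristic curve (AUC) of 0.99. We attempted to replicate this method, but our AUCs of 0.74 and 0.59 fell short of the published results. This is possibly due to differences in the data or to details missing from the published methodology.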
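AUC is the probability that a model ranks a randomly chosen positive case (here, an image with diabetic retinopathy) above a randomly chosen negative one. A value of 1.0 means perfect ranking and 0.5 means chance, so the replication's 0.59 is close to chance. The sketch below computes AUC directly from that definition, counting ties as one half (the Mann-Whitney formulation). `roc_auc` is a hypothetical helper name, not code from the thesis. The pairwise loop is quadratic, so for large datasets a rank-based version or scikit-learn's `roc_auc_score` would be preferable.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```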

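Third, by slightly modifying the data preprocessing in the diabetic retinopathy algorithm, we raised the AUCs to 0.94 and 0.82. These findings highlight the challenges of replicating deep learning methods whose source code is unpublished and that do not use publicly available data.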

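Fourth, we ran benchmarks to assess the resources needed for algorithm development and automated analysis at a national (Norwegian) scale. We estimate that a breast cancer detection algorithm can be trained on 4 GPUs in under 17 hours, a sublinear speed-up of 3.36 times compared with 1 GPU. Evaluation on inexpensive GPUs was shown to be practically instantaneous. Finally, drawing on our experience and lessons learned, we conclude with suggestions from the literature and recommendations for developing and deploying a breast cancer detection algorithm in a large-scale screening program.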
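The speed-up figure can be unpacked with the standard definitions of speed-up, $S_p = T_1 / T_p$, and parallel efficiency, $E_p = S_p / p$. Take the two reported numbers at face value: $T_4$ under 17 hours and $S_4 = 3.36$. The derived efficiency and single-GPU time are back-of-the-envelope estimates, not values reported in the thesis:

```latex
S_4 = \frac{T_1}{T_4} = 3.36, \qquad
E_4 = \frac{S_4}{4} = \frac{3.36}{4} = 0.84, \qquad
T_1 = S_4\,T_4 < 3.36 \times 17\ \text{h} \approx 57\ \text{h}
```

Ideal linear scaling would give $S_4 = 4$ and $E_4 = 1$. An efficiency of about 0.84 means roughly 16% of the added GPU capacity is lost to overheads, such as synchronization between GPUs.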

We aim to give insight into aspects of developing and deploying a deep learning algorithm to automate biomedical image analyses. We anonymize sensitive data from a medical archive system, attempt to replicate and further improve published methods, and scale out our algorithm to support large-scale analyses. Specifically, our contributions are as follows. First, to anonymize and extract mammograms for the development of a breast cancer detection algorithm, we wrote a script for mammograms that reside in a data-locking, sensitive, and proprietary PACS. The script will be used in a larger project to extract mammograms from all screening points in Norway. Second, because this script is currently awaiting authorization by Helsenord IKT, we instead developed an algorithm for a similar screening problem in the biomedical field. In order not to reinvent the wheel, we investigated earlier work. The high-impact article JAMA 2016; 316(22) [1] describes a high-performance deep learning algorithm that detects diabetic retinopathy, reporting an area under the receiver operating characteristic curve (AUC) of 0.99. We attempted to replicate the method. However, our AUCs of 0.74 and 0.59 did not reach the reported results, possibly due to differences in data or missing details in the methodology. Third, by modifying the data preprocessing methods in the diabetic retinopathy algorithm slightly, the AUC increased to 0.94 and 0.82. These findings emphasize the challenges of replicating deep learning methods whose source code is not published and that do not use publicly available data. Fourth, benchmarks were run to assess the resources needed to run algorithm development and automated analyses on a national (Norwegian) scale. We estimate that a breast cancer detection algorithm can be trained on 4 GPUs in less than 17 hours, with a sublinear speed-up of 3.36 times compared to 1 GPU. Evaluation on inexpensive GPUs was shown to be practically instantaneous.
Lastly, with our experiences and lessons learned in mind, we conclude with suggestions from the literature and recommendations for developing and deploying an algorithm for breast cancer detection in a large-scale screening program.

Shared from the blog 大工至善|大学至真: http://blog.sciencenet.cn/u/lcj2212916

1. Introduction

2. Retrieving Data from PACS

3. Replicating and Improving a High-Impact Study

4. Evaluation at Scale

5. Conclusion

Posted by 刘春静 (Liu Chunjing)