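Target Audience
Grade: High school and undergraduate students
Major: Students majoring in epidemiology, public health, epidemiological statistics, and related fields; students interested in applications of data science and statistics in public health and biomedicine. Students should have a foundation in mathematical statistics, R, and data manipulation with commonly used R packages.
Recommended prior course: Crash Course in Statistical Analysis with R
Instructor Profile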
R. Todd
Columbia University
Tenured Full Professor & Vice Chair of Department
Prof. R. Todd is a tenured full professor and vice chair of the Department of Biostatistics at Columbia University. He is a member of the International Statistical Institute, a Fellow of the American Statistical Association, a Fellow of the Glenda Garvey Teaching Academy at CUMC, and an associate editor of the leading international journals Biometrics and International Statistical Review.
His research focuses on biostatistical methodology and its applications in a variety of fields. He is currently collaborating with researchers at the New York State Psychiatric Institute on a range of statistical modeling problems that arise in analyzing data from brain imaging studies. His other ongoing interests include functional data analysis, nonparametric regression, wavelet methods, statistical modeling, and statistical computing.
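Program Background
Biostatistics applies mathematical principles and methods to analyze and interpret biological data and phenomena, seeking to uncover the underlying patterns and to solve problems in biology, medicine, and public health. The rapid growth of data science, and its adoption in finance and many other fields, has brought new methods to statistical analysis in biomedicine and public health. R, MATLAB, and SPSS are currently among the most widely used tools for biostatistical and bioinformatics analysis worldwide. This program offers a broad introduction to cutting-edge applications of statistical data science in public health and biomedicine. It guides students in using these techniques and software to carry out exploratory and more advanced regression analyses and helps them apply these skills to real problems. Along the way, students experience firsthand the significant potential impact of statistical data science on biomedicine.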
Program Description
This program explores in depth the wide-ranging applications of R in public health, giving students comprehensive statistical knowledge and practical skills. Students will learn to use R for data processing, analysis, and visualization, with an emphasis on specific applications in public health research. The content covers the basic syntax and data structures of R and how to use R to carry out common statistical methods such as regression analysis, analysis of variance, and survival analysis, using R packages including the tidyverse, dplyr, and ggplot2. Through theoretical instruction and practical cases, students will acquire advanced R programming skills for effectively handling large health datasets. The course places particular emphasis on applications of R in epidemiological research, health data analysis, and clinical trial design. Through hands-on projects and case studies, students will develop the ability to process and interpret real health data, and thereby better understand and apply statistical methods. Whether students are beginners or already have some background in statistics, the course provides comprehensive training in statistics with R that will enable them to apply statistical methods flexibly in future public health research and practice. At the end of the program, students submit a project report and present their results.
Suggested Personalized Research Topics: Infectious disease prediction and early warning; Application of biostatistical models to risk assessment of population health damage caused by PAHs (polycyclic aromatic hydrocarbons); Application of biostatistics in evaluating the efficacy of new blood-glucose-lowering drugs
Program Outline
Introduction to statistical data science; applications of data science in public health and biomedicine. Introduction to R, RStudio, and the tidyverse.
Further practice with R and RStudio, illustrated with real-life example data.
Reading data; data manipulation with dplyr; data visualization and exploratory data analysis with ggplot2; predictive modeling with linear regression.
Practice with data manipulation; case studies of graphical data summaries.
Linear regression modeling, logistic regression modeling, and an introduction to machine learning.
Case studies applying regression techniques to real-life data, including diagnostic plots.
Supervised and unsupervised learning algorithms, tree-based methods such as decision trees, clustering, and other approaches; validation of methods.
Application of the algorithms discussed in lecture to sample data; illustration of validation analysis.
Program review and presentation of results
Research paper and project deliverables tutoring
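Program Outcomes
7 weeks of online group research study + 5 weeks of flexible-schedule paper tutoring; 125 class hours in total
Project report
Reference letter from the lead instructor for outstanding students
Guidance on full-paper submission to, and publication in, international conferences indexed by EI/CPCI/Scopus/ProQuest/Crossref/EBSCO or equivalent (usable for academic applications)
Certificate of completion
Transcript
Official LASER Award in Research Skills for Academic Study certificate, convertible to 8 UCAS Tariff Points (optional)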