Assignment | # of graded assignments | Weight each | Weight total |
---|---|---|---|
In-class participation | 1 | 5% | 5% |
Weekly homework assignments | 10 | 2.5% | 25% |
Quizzes | 2.5* | 8% | 20% |
Midterm | 1 | 17% | 17% |
Take-home final exam | 1 | 33% | 33% |

\* The lowest of the three quiz scores counts at half weight, so the three quizzes count as 2.5 graded quizzes.

STAT 463 — Introduction to Exploratory Data Analysis
Administrative
- Course meetings: Tuesday and Thursday, Aug 27 – Dec 5 (final exam period: Dec 17). No class on Tuesday, Nov 5 (Election Day) or Thursday, Nov 28 (Thanksgiving break).
- Class time: 1:30 – 2:45 P.M. in Nguyen Engineering Building (ENGR) 5358
- Important dates:
  - Sep 3: last day to add
  - Sep 9: final drop deadline (no tuition penalty)
  - Oct 1: end of self-withdrawal period
  - Final exam due: Dec 17
- Instructor: Dr. David Kepplinger (he/him/his)
  - Email: dkepplin@gmu.edu
  - Office: Nguyen Engineering Building (ENGR), Room 1711
  - Office hours: Tuesday, 3 – 4 P.M. in person in ENGR 1711 (virtual office hours over Zoom by appointment only; Zoom link)
- GTA: Siqi Wei
  - Email: swei3@gmu.edu
  - Office hours: Wednesday, 10 – 11 A.M. (virtual over Zoom; Zoom link)
- Canvas course page: https://canvas.gmu.edu/courses/25176
Course description
Overview
The following topics will be covered in some detail.
- Introduction to the programming language R and the “tidyverse”.
- Visualizing data via ggplot2 in R.
- Simple and multivariate linear regression.
- Variable selection in regression models.
- Regularized estimation in regression models (LASSO, EN, and Ridge regression).
- Principal component analysis.
- Cross-validation.
- Regression/classification trees and random forests.
- Generalized Linear Models, particularly the logistic regression model.
- Cluster analysis.
Learning outcomes
After successfully completing this course, you are expected to be able to do the following.
- Access data from various sources.
- Wrangle data for the data analysis workflow.
- Compute basic descriptive statistics and perform comparisons to explore quantitative data.
- Use and interpret appropriate statistical models to find patterns in data.
- Visualize data to gain insights about the data, the underlying phenomena, and patterns, and to communicate selected results to a defined audience.
- Understand and be able to compute metrics for comparing statistical models.
- Use methods of statistical learning to generate candidate models and identify relevant features.
Prerequisites
STAT 350, 354, 360, or BUS 310. Learners should have a working knowledge of R and be able to read and write R code.
Textbooks
The main textbooks for this course are
- James, G., Witten, D., Hastie, T., Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R. 2nd Edition. Springer. This book is available online for free.
- Wickham, H., Çetinkaya-Rundel, M., Grolemund, G. (2023). R for Data Science. 2nd Edition. O’Reilly. This book is available online for free.
- Lander, J. (2017). R for Everyone: Advanced Analytics and Graphics. 2nd Edition. Pearson. This book is available online through the GMU Library.
Further material that will likely not be covered in detail during class, but that may be of interest to some learners, can be found in
- Wickham, H. (2019). Advanced R. Chapman and Hall/CRC. doi:10.1201/b17487. Available online.
Additional articles and book chapters may be posted in our Canvas course as different topics are discussed.
Academic integrity
Mason is an Honor Code university; please see the Office for Academic Integrity for a full description of the code and the honor committee process. Three fundamental principles to follow at all times are that:
- all work submitted be your own, as defined by the assignment;
- when you use the work, the words, or the ideas of others, including fellow students or online sites, you give full credit through accurate citations; and
- if you are uncertain about the ground rules on a particular assignment or exam, ask for clarification.
No grade is important enough to justify academic misconduct.
Use of generative AI
Students may use Generative AI tools whenever they believe it would be useful to their learning of course material. However, data files shared with learners must not be uploaded to any generative AI tools. The usage license of the data files provided for this course does not allow this type of use. Students must properly cite the tool(s) used and include a statement of usage. In R code, the citations and the statement of usage must be included as comments. All academic integrity violations will be reported to the Office for Academic Integrity.
Generative AI tools may only be used if following the fundamental principles of the Honor Code. This includes being honest about the use of these tools for submitted work and including citations when using the work of others, whether individual people or Generative-AI tools.
Although you are unrestricted with your use of Generative AI tools, you will be responsible for any incorrect, biased, or unethical information that is submitted. Your assignment grade will reflect the inclusion of any material that is incorrect or offensive. Uploading data files shared with learners of this course to generative AI tools is an honor code violation and will be handled as such. In addition, learners may be liable for copyright infringement.
Logistics
The class is scheduled as a face-to-face meeting on-campus. All learners taking courses with a face-to-face component are required to follow the university’s public health and safety precautions and procedures outlined on the university Safe Return to Campus webpage. If the campus closes, or if a class meeting needs to be canceled or adjusted due to weather or other concern, learners should check the Canvas course for updates on how to continue learning and for information about any changes to events or assignments.
Communications
The Canvas site for this course is the primary channel of communication. Please check the Canvas course regularly for updates! Information posted on the Canvas site includes
- announcements,
- lecture notes,
- homework assignments, quizzes, midterm and final exam,
- changes to the posted office hours,
- handouts and readings.
Any question related to concepts and topics should be asked on the Course Q&A (under Discussions > Course Q&A). Questions will be visible to all registered students, and everyone is expected to actively participate in answering questions posted by peers. Active participation in answering questions will be counted towards the participation grade.
E-mail communication must be restricted to questions relating to sensitive and confidential information (such as grade concerns, personal circumstances requiring specific accommodations, etc.).
- E-mails will be returned within 2 business days and may not be returned on weekends/holidays.
- When you send an e-mail to me, please put STAT 463 at the beginning of the subject line.
- E-mails related to this course must be sent and received via your Mason e-mail account. E-mails sent from other e-mail accounts may not be answered. (This is a university policy and part of your guaranteed rights under FERPA.)
- E-mails with questions that should be posted to our course Q&A may not be answered.
Should you have concerns that you may not be able to fully participate or engage in any of the activities listed below, please do not hesitate to contact me by e-mail or to speak to me in person during office hours or after class. We can discuss alternative arrangements that suit your needs.
Hardware requirements
We will frequently use laptop computers for in-class activities. Please be respectful of your peers and your instructor and do not engage in activities that are unrelated to the class.
Software requirements
This class will use R (version 4.4 or higher) and the RStudio IDE (version 2024.04.0 or newer). Assignments and in-class activities will use interactive tutorials powered by Posit Connect (formerly known as RStudio Connect). To access these assignments, a recent web browser with JavaScript fully enabled is required. For all assignments, the complete code must be submitted for reproducibility.
Activities and assignments in this course may sometimes use web-conferencing software (Zoom). In addition to the requirements above, students are required to have a device with a functional camera and microphone. In an emergency, students can connect through a telephone call, but video connection is the expected norm.
Grading
Your grade in this course will be based on weekly homework assignments of various types, in-class quizzes, an in-class midterm, a take-home final exam, and participation. The number of quizzes and homework assignments, and their relative grading emphasis, may be adjusted. The instructor reserves the right to change the weights if needed.
Written and oral communication are an integral part of any statistical work, and as such, grammar, style, and spelling are part of grading rubrics applied to all deliverables. You are strongly encouraged to use the resources and tutoring offered by the writing center (https://writingcenter.gmu.edu).
All assignments in this course are designated as individual assignments, which are to be undertaken independently. You may discuss your ideas with others but everything you turn in must be your own work. You may not share analyses, graphs, code, and other materials. You are responsible for making sure that there is no reason to doubt that the work you hand in is your own.
Attendance
Attendance is mandatory and in-class participation counts towards your final grade. In case of an approved absence, you are expected to get notes from your peers. You are responsible for material covered in class and announcements made during class.
Participation
Success in this course requires active participation in in-class activities and discussions, for which you will need to prepare in advance for each class period. Accordingly, you are expected to prepare for each class period by
- reading the corresponding sections of textbooks or research articles to be covered in class,
- reviewing class materials posted on Canvas,
- familiarizing yourself with the use of the covered methods and techniques in R.
Homework assignments
There will be 13 weekly homework assignments throughout the term, which will vary in length and content. Some involve in-class activities and continuing problems started in class; others involve solving exercises related to the material covered in class. Only 10 of these 13 homework assignments will be graded. It is your responsibility not to submit a homework assignment if you do not want it to be graded. Once a homework assignment is submitted, you cannot withdraw it and it will count towards your final grade. Once you have submitted 10 homework assignments over the course of the semester, any further homework assignments will not be graded or considered for the final grade. Due dates and links to the homework assignments will be posted on Canvas.
Late submissions will be penalized by reducing the total number of points possible by 10% of the original total number of points for each day late. For example, if a homework assignment is worth a total of 10 points, it will be worth only 9 points when submitted within the first 24 hours after the due date, 8 points when submitted between 24–48 hours after the due date, and so on. Submissions will not be accepted more than 4 days past the due date.
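The late-penalty rule above can be sketched in a few lines of code. This is only an illustration, not the official grading script: the function name `max_points_possible` is hypothetical, Python is used purely for illustration, and the sketch assumes that every started 24-hour period after the due date counts as one day late (so a submission exactly 24 hours late is treated as one day late).

```python
import math

MAX_DAYS_LATE = 4           # submissions more than 4 days late are not accepted
PENALTY_TENTHS_PER_DAY = 1  # 10% of the original total for each day late


def max_points_possible(total_points, hours_late):
    """Return the maximum points still available, or None if not accepted.

    Assumption (not stated in the syllabus): every started 24-hour
    period after the due date counts as one full day late.
    """
    if hours_late <= 0:
        return float(total_points)  # on time: no penalty
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_DAYS_LATE:
        return None  # too late to be accepted
    # Work in tenths of the total to avoid floating-point drift.
    return total_points * (10 - PENALTY_TENTHS_PER_DAY * days_late) / 10


print(max_points_possible(10, 5))    # 9.0
print(max_points_possible(10, 30))   # 8.0
print(max_points_possible(10, 100))  # None
```

The first two calls reproduce the 10-point example from the paragraph above; the last one falls outside the 4-day window.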
Quizzes
There will be 3 in-class quizzes spread across the semester. For each quiz you will have 1 hour. Quizzes will cover the materials presented up to and including the previous lecture, with an emphasis on the most recent topics.
Your worst quiz out of the three will count only half the weight of the other two quizzes towards the final grade.
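As a rough sketch of how this weighting could work out, the snippet below combines three quiz scores using the weights from the grading table (8% for each full-weight quiz, 20% in total). The function name `quiz_contribution` is hypothetical, scores are assumed to be percentages between 0 and 100, and the weights themselves may be adjusted by the instructor.

```python
FULL_QUIZ_WEIGHT = 8  # percentage points per full-weight quiz (grading table)


def quiz_contribution(scores):
    """Percentage points (out of 20) that three quiz scores contribute.

    `scores` are quiz results in percent (0-100). The lowest score
    counts with half the weight of the other two.
    """
    if len(scores) != 3:
        raise ValueError("expected exactly three quiz scores")
    ordered = sorted(scores)
    worst, others = ordered[0], ordered[1:]
    weighted = FULL_QUIZ_WEIGHT * sum(others) + (FULL_QUIZ_WEIGHT / 2) * worst
    return weighted / 100


print(quiz_contribution([90, 50, 80]))  # 15.6
```

In this example, the 50% quiz is the lowest score, so it contributes only 2 of its possible 4 percentage points, while the other two contribute 7.2 and 6.4 points.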
Midterm
There will be an in-class midterm exam. You will have 1 hour and 30 minutes for the midterm. It will cover all materials presented up to and including the previous lecture.
Grading Policies
Percentage grade | Letter grade |
---|---|
≥ 90% | A |
≥ 80% but < 90% | B |
≥ 70% but < 80% | C |
≥ 60% but < 70% | D |
< 60% | F |
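
For illustration, the table can be read as a simple threshold lookup. The sketch below is not an official grade calculator: the function name `letter_grade` is hypothetical, and it assumes the percentage is compared against the cutoffs without any rounding, which the syllabus does not specify.

```python
# Lower cutoffs from the table, checked from the highest grade down.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(percentage):
    """Map a final percentage (0-100) to a letter grade."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"  # below 60%


print(letter_grade(90))    # A
print(letter_grade(89.9))  # B
print(letter_grade(59.9))  # F
```
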
Regrading policies
You have at most one week after a score is posted for an assignment to appeal the score. If you want parts of an assignment regraded, send me an e-mail specifying the question or part and the reason for requesting a review of the grading. If you do not notify me in writing of any issues with your score within that time, the posted score stands (whether or not it is correct).
Further policies
Please see the George Mason Common Course Policies for additional policies governing this course.