Math 531 Regression I. Fall 2017.

  • Instructor: Xingye Qiao

  • Phone number: (607) 777-2593

  • Office: WH-134

  • Meeting time & location: Tuesday and Thursday 8:30–9:55 am at WH-100E.

  • Office hours: Tuesday & Thursday 10–11 am

This course is a 4-credit course, which means that in addition to the scheduled lectures/discussions, students are expected to do at least 9.5 hours of course-related work each week during the semester. This includes things like: completing assigned readings, participating in lab sessions, studying for tests and examinations, preparing written assignments, completing internship or clinical placement requirements, and other tasks that must be completed to earn credit in the course.

Prerequisite and corequisite

I assume that you have knowledge of (basic) Linear Algebra and have taken at least an undergraduate-level Statistical Inference (or Mathematical Statistics) class.

Students are expected to take this class concurrently with MATH 501 (Probability) and MATH 530 (Advanced Linear Algebra).

Graduate students from outside the Department of Mathematical Sciences and senior undergraduate students may take this course with the instructor's approval.

Topics

  • Basic theory of linear regression models: estimation, statistical inference, prediction, model diagnosis, model selection, etc.

  • Proficient use of the R programming language with applications to regression models.

  • Basic training in scientific writing.

  • Basic training in presentation.

Learning Outcomes

  • Process and visualize different data types.

  • Identify and evaluate appropriate regression models to be used.

  • Understand the underlying mechanism of predictive models and evaluate and interpret such models.

  • Use analytical tools and software widely used in practice.

  • Work both independently and in a team to solve problems.

  • Learn to present and communicate the findings effectively.

Recommended Texts

The required textbook is Faraway (2014) (see below for details).

  • Required text

    • Faraway (2014). Linear Models with R, Second Edition. (Chapman & Hall/CRC Texts in Statistical Science)

    • Link to R scripts of the book: R codes

  • Recommended additional reading

    • Sheather (2009). A Modern Approach to Regression with R. (Springer Texts in Statistics)

Piazza

Please use Piazza (www.piazza.com) for all electronic communications with me rather than email. Piazza is a question-and-answer platform. It supports LaTeX, code formatting, embedding of images, and attaching of files. You are encouraged to ask questions when you have difficulty understanding a concept or working around a piece of code – you can even ask questions anonymously. Moreover, you can also answer questions from your classmates. I constantly monitor the answers and endorse those which make more sense to me.

Announcements will be sent to the class using Piazza. All enrolled students should create an account with Piazza (www.piazza.com) by visiting their website. Click “enroll now” and select “Binghamton University,” then search for “Math 531.” Alternatively, use this link.

Blackboard

Blackboard will only be used for recording grades on assignments and exams and for distributing solutions.

Grading

  • Homework (20%): homework is assigned between weekly and biweekly.

    • Assigned after each class session. Don't skimp on the homework if you want a good grade.

    • You may discuss the problems with each other in general terms, but you must write your own solution. (If you are found copying, or allowing others to copy, your homework more than three times, you and your copying source will share the grade equally.)

    • All sources, including those from friends and colleagues, must be cited.

  • Midterm exam (25%): October 24th (tentative, subject to change).

  • Final exam (25%): date to be determined.

  • Course project (25%): a group project will be assigned to each student. Successful completion of the project includes an initial report, a presentation, and a final report.

  • Lecture attendance and participation (5%): meaningful activities (such as asking and answering questions) on Piazza also count.
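
As a worked illustration of how the weights above combine, here is a minimal sketch. It assumes each component is scored on a 0 to 100 scale, which the syllabus does not state. The symbols H (homework), M (midterm), F (final exam), P (course project), and A (attendance and participation) are introduced here only for illustration.

```latex
% Weighted course score under the assumed 0-100 scale for each component.
% The weights 20%, 25%, 25%, 25%, 5% are taken from the grading list and sum to 100%.
\[
  \text{Course score} = 0.20\,H + 0.25\,M + 0.25\,F + 0.25\,P + 0.05\,A
\]
% Hypothetical example with made-up scores H=90, M=80, F=85, P=88, A=100:
\[
  0.20(90) + 0.25(80) + 0.25(85) + 0.25(88) + 0.05(100)
  = 18 + 20 + 21.25 + 22 + 5 = 86.25
\]
```

The example scores are invented, and how a weighted score maps to a letter grade is not specified here.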

Course project

  • Use an existing data set or create your own, apply inferential and analytic techniques learned in the class, write reports and give a presentation.

  • You may choose to work with 1–2 other persons and you may submit the work products as a team. If you cannot find a team member, I may assign one to you. Before the team-forming deadline, you can freely switch teams. After the team-forming deadline, changes to the composition of a team are rare. If you feel the teammates that you initially chose or were assigned to are uncooperative, you can choose to leave the team and work alone. You may continue to work on the same project you have been working on, but you must write your own reports and make your own presentation. You must choose to do so voluntarily. Nobody can force a team member out. However, you are not allowed to switch to another team. If you decide to leave your team and work alone, you must do so by October 19, 2017.

  • Members on the same team will receive the same grade for the course project. The project is worth 100 points in total, divided into three graded parts:

    • Team-forming: On or before October 3, 2017, each team sends me a note stating the composition of the team.

    • Initial report (10 pts): due October 19, 2017.

    • Presentation (30 pts): each team will give a 30-minute presentation about the outcome of the project.

    • Final report (60 pts): due in the final exam week.

  • The initial report should give a description of the data, potential research questions, and possible methods to use. The initial report should not exceed one page. The initial report should be sent to me by the leader of each team via an individual note on Piazza, cc-ing the other team members.

  • Tips on writing the final report
    The final report should include:

    • Description of the research questions or issues (either scientific or statistical) and the significance of the problems.

    • Description of the data.

    • Preliminary studies: data visualization, dimension reduction, feature extraction, feature selection, statistical inference, model assumption checking (normality? transformation needed?), etc.

    • Statistical analysis

      • Methods: what analyses were done and why. If there are any challenges in the analysis, describe your approach to tackling them.

      • Results: A small number of well-designed and tailored tables and graphics may be appropriate. No copy-paste of large chunks of software output!

      • Conclusion: Convey your findings to a broader audience. Discuss any broader impact.

    • Send all your computer code in an individual message to me on Piazza, not as part of the final report.

    • Typos and grammatical errors will be harshly penalized. If you are not yet a master of writing, read The Elements of Style. There are a few copies in the library.

    • The final report should be written with the assumption that the audience consists of college-educated persons who have taken only elementary statistics. You are NOT writing a report for your professor to read.

    • The final report should not exceed 6 pages, including figures and tables, and must begin with an appropriate title highlighting your choice of topic and analysis.

  • Data Sources
    Find your own data set online (e.g., google “predictive analytics data set”); you will find plenty. Below are some data repositories.

Ph.D. students

In both the homework assignments and the midterm exam, there are sets of extra problems. Ph.D. students in the Department of Mathematical Sciences, and those who are interested in pursuing a Ph.D. in the department, must complete the extra problems. Completion of the extra problems will not lead to bonus points, but unsatisfactory performance on the extra problem sets may have a negative impact on your continuation to the Ph.D. program.

Software

  • R is the statistical software used in this course. Many online resources are available for students to learn the basics of R.

  • Downloads:

    • R - mirror hosted at UC Berkeley.

    • RStudio - a more user-friendly platform for R. Note: This is not an R class. R will not even be taught in class. You are expected to learn R programming by yourself.