Natural language (NL) refers to the language spoken and written by humans, and it is the primary mode of human communication. With the growth of the World Wide Web, the amount of data available as text has grown enormously. This calls for algorithms and techniques that process natural language, both for automation and for building intelligent machines. This course will primarily focus on understanding and developing linguistic techniques, statistical learning algorithms, and models for processing language. We will take a statistical approach to natural language processing: we will learn how to develop natural language understanding models from statistical regularities in large corpora of natural language text while leveraging linguistic theories.
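As a small illustration of what "statistical regularities in corpora" can mean, the sketch below estimates bigram probabilities (how likely one word is to follow another) by simple relative-frequency counting. It is not taken from the course material: the function name `bigram_probabilities`, the `<s>`/`</s>` boundary markers, and the two-sentence toy corpus are all assumptions made for this example, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter, defaultdict


def bigram_probabilities(sentences):
    """Estimate P(next_word | word) by relative frequency (hypothetical helper)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Naive whitespace tokenization, with assumed sentence-boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts into a conditional distribution.
    return {
        prev: {word: c / sum(following.values()) for word, c in following.items()}
        for prev, following in counts.items()
    }


# Toy corpus, invented for illustration only.
toy_corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
probs = bigram_probabilities(toy_corpus)
print(probs["sat"])  # every occurrence of "sat" is followed by "on"
print(probs["the"])  # probability mass is split across four different nouns
```

Even on this tiny corpus the counts capture a regularity: "sat" is always followed by "on", while "the" is followed by several different nouns. Real models estimate such quantities from far larger corpora and need smoothing for unseen word pairs.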
Pre-requisites:
Must: Proficiency in Linear Algebra, Probability and Statistics, Proficiency in Python Programming
Desirable: Introduction to Machine Learning (CS771), Probabilistic Machine Learning (CS772), Topics in Probabilistic Modeling and Inference (CS775), or an equivalent course.
Course Instructor:
Dr. Ashutosh Modi
Course TAs:
Alok Kumar Trivedi (Email: alokt@cse.iitk.ac.in)
Amar Raja Dibbu (Email: amard@cse.iitk.ac.in)
Chabil Kansal (Email: chabilk@cse.iitk.ac.in)
Rahul Kumar (Email: rahulkumar@cse.iitk.ac.in)
Tanikella Sai Kiran (Email: tskiran@cse.iitk.ac.in)
Course Email:
To communicate with the instructor, please do not email the instructor directly (such emails will most likely end up in spam). Use this course email instead:
nlp.course.iitk@gmail.com
Weekly Sessions:
Tuesday 1200-1315 Hrs
Wednesday 1200-1315 Hrs
Lecture Venue:
CSE Dept., KD101
Course Announcements:
The course will be managed via Slack. Please sign up on Slack for course announcements, study material, and resources. To join the workspace, please contact the instructor or one of the TAs.
Tentative Grading:
This is a research-project-oriented course, and the project carries the maximum weightage. The tentative weightage of each component is as follows.
Class Participation: 3%
Quizzes/Exams: 7%
NLP Challenge: 20%
Project: 70%
Lectures
References:
There are no specific references; this course gleans information from a variety of sources such as books, research papers, and other courses. Relevant references will be suggested in the lectures. Some frequently used references are as follows:
- Speech and Language Processing, Daniel Jurafsky, James H. Martin
- Foundations of Statistical Natural Language Processing, Christopher D. Manning, Hinrich Schütze
- Introduction to Natural Language Processing, Jacob Eisenstein
- Natural Language Understanding, James Allen
- Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola
- Neural Network Methods for Natural Language Processing, Yoav Goldberg
Useful Resources:
- ACL Anthology: Repo for NLP Research Papers
- Writing Code for NLP Research
- Deep Learning with PyTorch
- spaCy NLP Toolkit
- Guide To ML Research