Wiser By A Semester Of Statistics

– Himanshu Goyal


The first step is the hardest.
– Unknown

When one first hears about the PGDBA course, the aspect that captures one’s attention the most is that it is taught at three different institutes. It seems a bit confusing at the start, but soon one gets fascinated by it. Having each aspect of business analytics taught by a specialized institute seems like a perfect horses-for-courses strategy.

When I first heard about it, there was one name out of the three which captured my imagination the most and made me take this course seriously. That was ISI Kolkata – the pinnacle for statisticians. This is not to say that IIT Kharagpur and IIM Calcutta are insignificant names – in terms of popularity, the opposite is true – but the fact that an institute known more for its research-oriented approach than for fat paychecks was part of this curriculum made me think that I might actually learn something in this course and not just bide my time until the placement season. Hence I can say that ISI is one of the primary reasons for my being in this course. Now that the ISI phase is over, I feel it imperative to share my experience. Due to space constraints, I will limit this article to my academic experience and leave out the student life there.

We studied five courses there, and though I will not be able to do justice to each of them in this short space, I will try my best to give an overview of what we studied in each.

1. Stochastic Processes and its Applications:

Randomness is the true foundation of mathematics. – Gregory Chaitin

This course can be thought of as 3-in-1. It started with probability problems (check out the gambler’s ruin problem if gambling is your thing) and their ingenious solutions, and then expanded to probability distributions. In almost no time at all, we shifted gears from simple coin tosses to complex problems combining probability and calculus. When we got done with it all in one month, we found out that it was only a precursor to stochastic processes. We then spent a good amount of time studying Markov chains, which have innumerable practical applications in today’s world (Google’s PageRank algorithm is one example). If that was not enough for one semester, we also got an introduction to time series forecasting in the final stages of the course.

Probability, Stochastic Processes and Time Series forecasting – all three have the potential to be a separate, semester-long course in themselves. While we couldn’t cover each of them extensively, the course was a good starting point for someone looking to study this area further. It also covered enough of the three sub-topics that we wouldn’t be sitting ducks when facing practical problems in this subject.
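For readers curious about the gambler’s ruin problem mentioned above, here is a minimal Python sketch. It is not taken from the course material; the function names, stake sizes and win probabilities are my own illustrative assumptions. It computes the probability of going broke in two ways: with the standard closed-form formula, and by treating the gambler’s fortune as a Markov chain and repeatedly applying the one-step relation h(i) = p·h(i+1) + q·h(i−1).

```python
from fractions import Fraction


def ruin_probability_closed_form(start, target, p):
    """Probability of hitting 0 before `target`, starting with `start` units.

    Each round the gambler wins 1 unit with probability p, else loses 1 unit.
    """
    p = Fraction(p)
    q = 1 - p
    if p == q:
        # Fair game: ruin probability is linear in the starting fortune.
        return Fraction(target - start, target)
    r = q / p
    return (r**start - r**target) / (1 - r**target)


def ruin_probability_markov(start, target, p, sweeps=5000):
    """Same quantity, found by iterating the Markov chain's one-step relation.

    h[i] is the ruin probability from fortune i; states 0 and `target`
    are absorbing, so h[0] = 1 and h[target] = 0 stay fixed.
    """
    p = float(p)
    q = 1.0 - p
    h = [0.0] * (target + 1)
    h[0] = 1.0
    for _ in range(sweeps):
        for i in range(1, target):
            h[i] = p * h[i + 1] + q * h[i - 1]
    return h[start]


# Hypothetical example: start with 3 units, stop at 10, fair coin.
fair_exact = ruin_probability_closed_form(3, 10, Fraction(1, 2))
fair_iterated = ruin_probability_markov(3, 10, 0.5)
print(fair_exact, round(fair_iterated, 6))
```

With a fair coin, the gambler who starts with 3 units and aims for 10 goes broke with probability 7/10, and the two approaches agree. Making the game slightly unfavourable (say p = 9/20) pushes the ruin probability noticeably higher.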

2. Statistical Structures in Data:

Statistics is the grammar of science. – Karl Pearson

Arguably the most important course of the semester. This course delved into pure statistics, starting from simple properties of a distribution such as mean, median and variance. We then moved on to regression problems, which started with the univariate case and culminated in multivariate statistics. This course also introduced us to various concepts used in machine learning, such as PCA, Factor Analysis, GLM and Decision Trees. Don’t worry if all these terms look alien to you; many of us weren’t aware of them either before stepping into ISI.

Going by the internship interview experience of the first batch of PGDBA, this was considered the most important course. I believe that is because being a data analyst is not about being able to run code in a software package; rather, it is about understanding the concepts behind the scenes and using them to extract the maximum information. This course teaches exactly that.

3. Inferences:

It is a hypothesis that the sun will rise tomorrow; and this means that we do not know whether it will rise. – Ludwig Wittgenstein

English is never going to be the same for you once you attend this course. This course gave a glimpse into how technical one can get in statistics. We found that many distinctions which would seem like mere nitpicking to a ‘normal’ person actually make a lot of difference when it comes to statistics. Likelihood and Probability take on a new meaning altogether; Parameter, Statistic, Estimator and Estimate will always seem like a case of so-near-yet-so-far. We also covered the concept of hypothesis testing and its applications. Any statistics-101 book will have these topics, and this particular course, coupled with the one above, forms the crux of what one comes to ISI for.

It is quite easy to make mistakes in statistics by missing one or two seemingly trivial assumptions. This course taught us to be really careful with what we state and what we assume. At the same time, it taught us concepts which have direct applications in the real world and in the realm of statistics.
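To make the Likelihood-versus-Probability and Estimator-versus-Estimate distinctions a little more concrete, here is a small Python sketch of my own (not from the course; the coin-toss data and function names are hypothetical). Probability fixes the parameter and asks about the data; likelihood fixes the observed data and asks about the parameter. An estimator is a rule, and an estimate is the number that rule produces for one particular sample.

```python
from math import comb


def binomial_probability(k, n, p):
    """Probability of k heads in n tosses when the coin's bias p is known."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def likelihood(p, k, n):
    """Same formula, read the other way: the data (k, n) are fixed, p varies."""
    return binomial_probability(k, n, p)


def mle_estimator(tosses):
    """Estimator: a rule mapping any sample to a guess for the parameter p.

    For coin tosses, the maximum-likelihood rule is the sample proportion.
    """
    return sum(tosses) / len(tosses)


# Hypothetical sample: 1 = heads, 0 = tails (7 heads in 10 tosses).
tosses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
heads, n = sum(tosses), len(tosses)

# Estimate: the single number the estimator yields for this sample.
estimate = mle_estimator(tosses)

# The likelihood is largest at the estimate and smaller at other values of p.
for p in (0.5, estimate, 0.9):
    print(p, likelihood(p, heads, n))
```

Here the parameter is the coin’s true (unknown) bias, the statistic is the number of heads in the sample, the estimator is the function `mle_estimator`, and the estimate is the value 0.7 it returns for these ten tosses.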

4. Computing for Data Sciences (CDS):

Computers are incredibly fast, accurate and stupid; humans are slow, inaccurate and brilliant; together they are powerful beyond imagination. – Leo Cherne (This quote is often wrongly attributed to Einstein)

One of my batch-mates has already written a whole article dedicated to this course and its faculty. The reason for that is simple – we learnt the most in this course. It takes you on a journey through data science that leaves you fascinated by the subject. Each of our classes focussed on one topic, and the number of topics we ended up covering over the semester was humongous. If covering so much theory wasn’t enough, we also had hands-on sessions in class on R. This was also the only course at ISI in which we did a complete data science project from scratch. You can find the list of projects done by the 2016 batch here (and here for the 2015 batch).

We have spent one month at IIT KGP and we are already seeing the benefits of what we covered in CDS. Having got a glimpse of the multitude of topics earlier, it is now much easier for us to get into the flow of these topics, which are being taught in separate courses.

5. Fundamentals of Database Systems:

Unless structure follows strategy, inefficiency results. – Alfred D. Chandler

This was probably the least thought-out course of the five in this semester, and that worked both ways for us. We finished the stipulated curriculum halfway through the semester, and hence the faculty left it up to us to decide what we wanted to cover. Students suggested the topics and were duly obliged. The first half of the semester got us acquainted with MySQL. We then studied normalization of databases in detail, which showed how an efficient and accurate database can be designed. We also studied Information Retrieval, MapReduce and the Market-basket model in the latter half of the semester.
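As a flavour of the Market-basket model mentioned above, here is a minimal Python sketch of counting frequent item pairs. It is only an illustration under my own assumptions: the baskets, the support threshold and the function name are hypothetical, and it uses a plain brute-force count rather than any particular algorithm from the course.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk", "eggs"],
    ["bread", "butter"],
    ["bread", "butter", "milk"],
]

pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```

With a support threshold of 3, only the pairs (bread, butter) and (bread, milk) survive, each appearing in three of the five baskets. On real data the number of candidate pairs grows quickly, which is why this model is often discussed alongside techniques such as MapReduce.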

In total, we spent exactly 4 months at ISI. That is by no means a sufficient amount of time to get the most out of that place, but you make peace with what you get. The learning at ISI was unique at the very least, and it provided the perfect kick-start we needed in this course. We covered a breadth of topics in a couple of courses and went into depth wherever necessary. These courses have laid a solid foundation, and now we are in a position to apply these concepts and see if we can really make the data speak. The first and the hardest step has been taken. Now is the time to cover the distance.
