What is Data Science?

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in either structured or unstructured form.

From scientific discovery to business intelligence, data science is changing our world. The dissemination of nearly all information in digital form, the proliferation of sensors, breakthroughs in machine learning and visualization, and dramatic improvements in cost, bandwidth, and scalability are combining to create enormous opportunity.

The field also presents enormous challenges, thanks to the relentless increase in the volume, velocity, and variety of information ripe for mining and analysis.

It employs concepts and techniques from mathematics, statistics, information science, and computer science, in particular from machine learning, classification, cluster analysis, data mining, databases, and visualization.

“Data scientist” has become a popular occupation with the Harvard Business Review dubbing it “The Sexiest Job of the 21st Century” and McKinsey & Company projecting a global excess demand of 1.5 million new data scientists.

What do Data Scientists do?

Data scientists use their data and analytical ability to:

  • find and interpret rich data sources
  • manage large amounts of data despite hardware, software, and bandwidth constraints
  • merge data sources
  • ensure consistency of datasets
  • create visualizations to aid in understanding data
  • build mathematical models using the data
  • present and communicate the data insights/findings.
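
Two of the tasks above, merging data sources and ensuring consistency, can be sketched in a few lines of plain Python. This is only a rough illustration under stated assumptions: the function `merge_sources`, the `crm` and `billing` records, and every field name are hypothetical, and a real project would more likely use a data-frame library or a database join.

```python
def merge_sources(primary, secondary, key):
    """Join two lists of records on `key` and report fields where they disagree."""
    merged = {row[key]: dict(row) for row in primary}
    conflicts = []
    for row in secondary:
        target = merged.setdefault(row[key], {})
        for field, value in row.items():
            if field in target and target[field] != value:
                # Keep the primary value, but record the inconsistency.
                conflicts.append((row[key], field, target[field], value))
            else:
                target[field] = value
    return list(merged.values()), conflicts


# Hypothetical records from two systems that describe the same customers.
crm = [
    {"id": 1, "name": "Ann", "city": "Austin"},
    {"id": 2, "name": "Raj", "city": "Boston"},
]
billing = [
    {"id": 1, "city": "Austin", "plan": "basic"},
    {"id": 2, "city": "Chicago", "plan": "pro"},
    {"id": 3, "plan": "basic"},
]

records, conflicts = merge_sources(crm, billing, key="id")
print(len(records), "merged records")
for customer_id, field, first, second in conflicts:
    print(f"id {customer_id}: {field} is {first!r} in one source, {second!r} in the other")
```

Here the sketch keeps the value from the first source and only reports the disagreement; deciding which source to trust is a judgment call that depends on the data.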

They are often expected to produce answers in days rather than months, to work by exploratory analysis and rapid iteration, and to present results with dashboards.

How to Build Your Profile for MS in Data Science?

Thinking of pursuing an MS in Data Science (or Machine Learning)?

Head to the Home of Data Science and Machine Learning: Kaggle Competitions!

Kaggle is a platform for predictive modelling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know at the outset which technique or analyst will be most effective.

Kagglers come from a wide variety of backgrounds, including fields such as computer science, computer vision, biology, medicine, and even glaciology. They also include many of the world’s best-known researchers, including members of IBM Watson’s Jeopardy-winning team and the team working on Google’s DeepMind. Many of these researchers publish papers in peer-reviewed journals based on their performance in Kaggle competitions.

How do Kaggle Competitions Work?

  1. Companies and organizations prepare the data and a description of the problem. Kaggle helps them frame the competition, anonymize the data, and integrate the winning model into their operations.
  2. Participants, like you, experiment with different techniques and compete against each other to produce the best models. Work is shared publicly through Kaggle Scripts to achieve a better benchmark and to inspire new ideas. Submissions are made through Scripts or through private manual upload. For most competitions, submissions are scored immediately (based on their predictive accuracy relative to a hidden solution file) and summarized on a live leaderboard.
  3. After the deadline passes, the host company pays the prize money for the winning solution. Many companies also recruit participants based on their place on the leaderboard, final score, and submitted scripts.
  4. Alongside its public competitions, Kaggle also offers private competitions limited to Kaggle’s top participants.
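
The scoring step described in item 2 can be sketched roughly as follows. This is a toy illustration, not Kaggle's actual implementation: the functions `accuracy` and `leaderboard`, the participant names, and the labels are all hypothetical, and plain classification accuracy stands in for whatever metric a given competition uses.

```python
def accuracy(predictions, solution):
    """Fraction of ids in the hidden solution that were predicted correctly."""
    correct = sum(1 for key, truth in solution.items()
                  if predictions.get(key) == truth)
    return correct / len(solution)


def leaderboard(submissions, solution):
    """Rank participants by score, best first (ties broken by name)."""
    scores = {name: accuracy(preds, solution)
              for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hidden solution file: id -> true label.
hidden_solution = {1: 0, 2: 1, 3: 1, 4: 0}

# Hypothetical submissions: participant -> {id: predicted label}.
submissions = {
    "alice": {1: 0, 2: 1, 3: 1, 4: 1},  # 3 of 4 correct
    "bob": {1: 0, 2: 1, 3: 1, 4: 0},    # 4 of 4 correct
    "carol": {1: 1, 2: 1, 3: 0, 4: 0},  # 2 of 4 correct
}

for rank, (name, score) in enumerate(leaderboard(submissions, hidden_solution), start=1):
    print(rank, name, f"{score:.2f}")
```

Participants never see `hidden_solution`; they only see their score, which is why the leaderboard works as a fair comparison.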

What Kaggle competition should a beginner start with?

I’d start with the tutorials first just to make sure you have a good grasp of the primary tools and techniques that most people use: https://www.kaggle.com/wiki/Home

Afterwards, Titanic: Machine Learning from Disaster is a good competition to start with. It will prepare you with the fundamentals of data science: the data size is manageable, the problem is interesting, and the computational requirements are minimal.

If you aren’t decided on your weapon of choice, I would suggest that you start with R. The tutorial can be found at Titanic: Machine Learning from Disaster. Follow this up with Python, Titanic: Machine Learning from Disaster.

Since your objective is learning, the most important place for you is the Kaggle forum. There is just tons of valuable information buried in those posts. What worked, what didn’t work, the issues others are facing, interesting patterns and visualizations, and neat tricks. I find it to be the best “practical” data science guide out there.

Once you have a sound footing, maybe in a couple of weeks, the next step would be to try something with text data like Sentiment Analysis on Movie Reviews.

Add to that a competition that uses audio and/or video data. There may be a few running, or you can always dig up old ones such as Challenges in Representation Learning: Facial Expression Recognition Challenge and The Marinexplore and Cornell University Whale Detection Challenge.

Career in Data Science

A career in Data Science involves statistics, mathematics, business, economics and Computer Science.

After a Master’s in Data Science, you can work in various sectors such as finance, healthcare, consulting, retail or consumer products – basically any field where there is lots of data and there is a requirement to analyze large data sets to develop custom models and algorithms to drive business solutions.

With regard to Data Science, the primary focus is on applications rather than research. You use some knowledge from Computer Science (data structures, deep learning, computer vision, natural language processing, machine learning) in your data science role.

Typical employers include Walmart, Tesla, Intuit, Collective Health and numerous financial/trading companies on Wall Street.

The average salary for a job in Data Science in the US is about $113,000 as per Glassdoor. Another source – Payscale – puts the median salary at about $93,000.

Let’s have a look at the application of data science in different fields.

#1 Data Science in Retail

With online commerce, retail data is growing exponentially in volume, in the velocity at which it is generated, and in its value for the insights and profit it could offer. As per McKinsey’s report on Big Data, retailers using big data analytics could raise their operating margins by as much as 60 percent.

The following points are a few of the applications of big data in retail:

  1. Customer Experience: Personalized recommendation based on purchase history, sentiment analysis, predictive analytics for improving customer experience across all channels and devices
  2. Merchandising: Improving layout, product placement and promotional display, identifying cross-selling opportunities
  3. Marketing: Location-based personalized offers on mobile phones, real-time pricing, better targeted campaigns
  4. Supply chain logistics: real-time inventory tracking and management, demand-driven forecasting, route optimization and efficient GPS-enabled transportation
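
To make the first item concrete, here is a minimal sketch of recommendations driven by purchase history. It assumes a simple co-occurrence count ("customers who bought X also bought Y"), which is just one of many possible techniques; the function names and the baskets are made up for illustration.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    counts = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(history, cooccurrence, top_n=2):
    """Suggest items that co-occur most often with the customer's past purchases."""
    scores = Counter()
    for item in history:
        for other, count in cooccurrence.get(item, {}).items():
            if other not in history:
                scores[other] += count
    # Sort by score (descending), then by name, so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase baskets.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
    ["bread", "milk"],
]

cooccurrence = build_cooccurrence(baskets)
print(recommend(["bread"], cooccurrence))  # ['butter', 'jam']
```

Production recommenders typically add normalization, recency, and far more data, but the core idea of scoring items by shared purchase patterns is the same.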


#2 Data Science in Health Care

In the US, health care expenses represented 17.6% of GDP in 2013, with annual spending of $2.6 trillion. By 2020, this share of GDP is estimated to rise to nearly 20%. Of the $2.6 trillion, $600 billion was consumed by waste and fraud.

Big Data has the potential to help physicians make better decisions across the board – from personalized treatments to preventive care, while, at the same time, slashing the cost of providing health care services.

The following list details some of the applications of big data in health care:

  1. Personalized medicine: Create a personalized treatment plan based on individual biology using data from various sources, including clinical trials, electronic medical records, online patient networks, genomics research, etc.
  2. Genomics: Inexpensive DNA sequencing and next-generation genomic technologies are changing the way health care providers do business. Providers are gaining a better understanding of the genetic bases of drug response and disease by combining genomic data with other data in disease research.
  3. Predictive analytics and preventive measures: Some examples are: Mount Sinai Medical Center reduced its readmission rate, Texas Health identified high-risk patients to offer them customized interventions and Methodist Health System predicted patients who will need high cost care in future.
  4. Patient monitoring and home devices: Wearable body sensors – sensors tracking everything from heart rate to testosterone to body water – can take vital stats of the patients every minute of the day. Personal ECG heart monitor, medical monitoring devices and mobile applications are cropping up daily.

#3 Data Science in Finance

There has been a flood of financial data in recent times from various sources such as social media activity, mobile interactions, server logs, real-time market feeds, customer service records, transaction details and, of course, information from existing databases.

The following list details some of the applications of big data in finance:

  1. Sentiment analysis: Use natural-language processing, text analysis and computational linguistics to discover what people really think.
  2. Automated credit risk management: Alibaba has successfully used big data to offer loans to entrepreneurial online vendors without any collateral by using their transaction records, customer ratings, shipping records and a host of other information.
  3. Real-time analytics: Helps in fighting financial fraud, improving credit ratings and providing more accurate pricing.
  4. Predictive analytics: For example, predicting whether certain customers are likely to pay off their credit cards, based on the demographic characteristics of their neighborhoods.
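
As a rough illustration of the sentiment analysis item, the sketch below scores text against a tiny word list. It is deliberately simplistic: the `POSITIVE` and `NEGATIVE` lexicons and the function names are hypothetical, and real systems rely on much larger lexicons or trained language models.

```python
import re

# Hypothetical, tiny sentiment lexicons chosen only for this example.
POSITIVE = {"good", "great", "strong", "beat", "growth"}
NEGATIVE = {"bad", "weak", "miss", "loss", "fraud"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map the crude score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for headline in [
    "Great quarter, strong growth",
    "Weak sales and a big loss",
    "The meeting is on Tuesday",
]:
    print(label(headline), "<-", headline)
```

A word-counting approach like this misses negation and sarcasm ("not good"), which is one reason practical systems use statistical models instead.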

#4 Data Science in Telecom

Mind Commerce, a market research firm, predicts that the big-data-driven telecom analytics market will grow by nearly 50 percent from 2014 to 2019 and forecasts that by the end of 2019, the market will be up to $5.4 billion in annual revenue.

Here are some applications of big data in telecom:

  1. Personalized services: applications include determining a subscriber’s lifetime value, revealing cross-channel insights and avoiding customer churn
  2. Network optimization: using real-time and predictive analytics, analyze subscriber behavior and create individual network usage policies
  3. Location-based initiatives: using geo-fencing and sensor technology, data scientists can predict a subscriber’s location and specific data needs with stunning accuracy to, for example, create targeted offers when a subscriber is in a supermarket
  4. Churn prevention: combine variables such as calls made, minutes used, number of texts sent and average bill amount with behavior such as visiting a competitor’s website to predict the likelihood of a subscriber switching to a competitor for bargains
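
The churn prevention item can be sketched as a small scoring function. This assumes a logistic model, which is a common choice but only one option; the feature names mirror the variables listed above, while the weights, the bias, and the two example subscribers are invented for illustration rather than fitted to any real data.

```python
import math

# Hypothetical weights; in practice these would be fitted to historical data.
WEIGHTS = {
    "calls_made": -0.02,
    "minutes_used": -0.005,
    "texts_sent": -0.01,
    "avg_bill": 0.03,
    "visited_competitor_site": 1.5,
}
BIAS = -1.0


def churn_probability(subscriber):
    """Logistic model: sigmoid of a weighted sum of subscriber features."""
    z = BIAS + sum(WEIGHTS[name] * subscriber[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up subscribers: a heavy user, and a light user with a high bill
# who has visited a competitor's website.
loyal = {"calls_made": 120, "minutes_used": 600, "texts_sent": 200,
         "avg_bill": 40, "visited_competitor_site": 0}
at_risk = {"calls_made": 10, "minutes_used": 50, "texts_sent": 5,
           "avg_bill": 90, "visited_competitor_site": 1}

for name, subscriber in [("loyal", loyal), ("at_risk", at_risk)]:
    print(name, round(churn_probability(subscriber), 3))
```

With these illustrative weights, the second subscriber receives a much higher churn probability than the first, which is the kind of signal a retention team could act on.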

There are similar applications of big data in other domains such as Utilities, Travel and Transportation, Insurance, Pharmaceutical, Manufacturing, Gaming, Hospitality, Biotech and Energy.


Let’s quickly compare a career in Data Science with a career in Machine Learning.

Career in Machine Learning

Machine learning is the study of how computers can learn complex concepts from data and experience, and seeks to answer the fundamental research questions underpinning the challenges outlined above.

The field of machine learning crosses a wide variety of disciplines that use data to find patterns in the ways both living systems, such as the human body, and artificial systems, such as robots, are constructed and perform.

Whether it’s being applied to analyze and learn from medical data, or to model financial markets, or to create autonomous vehicles, machine learning builds and learns from both algorithm and theory to understand the world around us and create the tools we need and want.

In a Machine Learning job, you are expected to solve new and emerging technical challenges related to human-machine interactions.

In your role, you will utilize core computer science and engineering skills like high-performance computing, distributed systems and applied math.

You are expected to have 5+ years of experience in programming parallel and distributed systems, debugging low-level problems, performance analysis and optimizations, and numerical methods.

Other expected qualifications include experience in using machine learning techniques for classification, regression, or ranking problems; experience in building predictive models for recommendations or personalization; and the design and implementation of shipping, innovative consumer products.

Typical employers include Facebook, Amazon, Apple, Google and Microsoft.

Check out this tip to learn more about MS in Machine Learning.

How to shortlist universities for MS in Data Science?

Important factors for identifying the best universities for machine learning / data science

1) University reputation (rankings)

This factor is important in general, but more so for data science programs. Most of them are relatively new, i.e. around 2-4 years old, and it’s difficult to establish credibility in the industry in such a short time. Thus, the university brand name plays a key role in how your candidature will be perceived in the industry after completing the degree. No doubt your knowledge will always matter more, but university reputation plays a crucial role for new programs.

2) Location

Location plays a pivotal role in practical learning opportunities outside the campus. Practical training typically comes in the form of internships, capstone projects, weekend hackathons, etc. Given that data science is a highly application-oriented domain, practical training plays a crucial role in your overall development.

  • While you are in the program, its location can have quite an impact on your profile in terms of getting good internship opportunities. Also, a strong data science community gives access to specialized skill meetups and hackathons. For instance, the data science communities in New York or Silicon Valley will be much stronger than those in suburban locations.
  • After the program, a good location definitely helps with the job search, as there will be ample employment opportunities.

3) Curriculum

I believe this is the most important aspect and the first thing you should check. The curriculum tells you what subjects you’ll be studying and gives an impression of the relevance of the program for you. Typically, coursework is divided into core (compulsory) courses and electives. You should also check the list of courses from which you can choose the electives. Curriculum flexibility, i.e. the ratio of elective courses, is another important factor. It can vary from as high as 60-70% in some programs to almost none in others.

4) Industry Collaborations

Since most data science programs are professional programs, industry collaborations will play a key role in your experience. You should check which companies are involved, which domains they belong to, and what sort of activities are conducted, such as technical talks, research collaborations, and capstone projects.

5) First Hand Experience

The first step is to visit the university’s website and look at the details of the program. You can do a first level of filtering based on the information available there. But an equally important step is to talk to people who are already studying there, as well as the university’s alumni. You can definitely apply to all the colleges you like, but for making the final choice, I can’t over-emphasize the importance of this step, which will give you a true picture of the college administration and the program’s recognition in the industry. These factors are really hard to judge from any university’s website. Also, given that these programs are mostly new, the amount of discussion on third-party websites like Quora is limited. If you’re wondering how to find these people, LinkedIn and Facebook are your best friends!

6) Program name is not so important!

The traditional philosophy, ‘Don’t judge a book by its cover’, works in this case as well. Since Data Science (and Machine Learning) is a non-traditional program, you’ll find all sorts of names like Masters in Analytics, Masters in Business Analytics, Masters in Data Science, Masters in Predictive Analytics, Masters in Marketing Analytics, Masters in Information Systems, etc. Trust me, names can be very misleading. Although they do give you an idea of what the program is about, the name of the program should definitely be your last concern, if at all!

16 Top Schools for MS in Data Science

The following schools are some of the best schools that offer programs in Data Science, and you can consider these for your reach, match and safe shortlist.

  1. Carnegie Mellon University: MS in Computational Data Science
  2. Stanford University: MS in Statistics: Data Science
  3. Georgia Institute of Technology: MS in Analytics
  4. MIT Sloan: Master of Business Analytics
  5. Columbia University: Master in Data Science
  6. Michigan State University: MS in Business Analytics
  7. University of Washington: MS in Data Science
  8. University of Southern California: MS in Data Informatics
  9. University of Chicago: MS in Analytics
  10. New York University: MS in Data Science
  11. Northwestern University: MS in Analytics
  12. North Carolina State University: MS in Analytics
  13. Texas A&M University: Masters in Analytics
  14. University of Cincinnati: MS in Business Analytics
  15. Arizona State University: Master in Business Analytics
  16. Illinois Institute of Technology: Master of Data Science

Do you need help with shortlisting the right school for you? Opt for our Premium Counseling for personalized guidance at every step of your application process for MS in Data Science.
