By reading this, you’ve already taken your first steps on the path to becoming a data scientist. Here are a few reasons to stick around!
- Data Science is fast becoming one of the most sought-after professions in India and around the world.
- More than 1.5 lakh job openings for data scientists were projected for 2020, an increase of 62% over 2019.
- Data is everywhere; it is a universal currency. Learning how to gain insights from data is an invaluable skill to have.
Data is the new oil and Data Science is its combustion engine! While there are many definitions of what data science really is, we have found it best to describe it as a field revolving around five data-related operations.
- Collection: Data Collection is the process of gathering data (numerical, text, video, audio, etc.). It is influenced by two major factors: the question that the data scientist needs to answer and the environment that the data scientist is working in.
- Storing: Storing data involves maintaining the collected data for use during the data science pipeline. Structured data is typically stored in relational databases and aggregated in data warehouses. With the advent of Big Data, data lakes are now used to store multimodal structured and unstructured data.
- Processing: Data Processing is a set of three main sub-processes: Data Wrangling (extraction, transformation, and loading of the data), Data Cleaning (handling missing values, outliers, etc.), and Data Scaling, Normalization, and Standardization.
- Describing: Data Description has two aspects. Data Visualisation involves representing processed data using graphs, charts, diagrams, and other visualisations. Data Summarisation involves calculating various summary statistics like the mean, median, mode, standard deviation, and variance.
- Modelling: Statistical Modelling of data involves modelling the underlying data distribution and relations in the data and then making inferences on top of the model. Algorithmic Modelling involves using large volumes of data and optimization techniques to best estimate the distribution and relations of the data, e.g., Machine Learning and Deep Learning.
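To make the Processing and Describing operations a little more concrete, here is a minimal Python sketch using only the standard library. The readings in `raw` are made up for illustration, and the helper names (`fill_missing`, `min_max_scale`, `standardize`, `summarize`) are our own, not part of any library. Filling missing values with the median is just one of several possible cleaning strategies.

```python
import statistics

# Hypothetical raw readings; None marks a missing value.
raw = [12.0, 15.0, None, 14.0, 13.0, None, 16.0]


def fill_missing(values):
    """Data Cleaning: replace missing values with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Data Scaling / Normalization: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: shift to zero mean and rescale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # no spread, every z-score is zero
    return [(v - mean) / sd for v in values]


def summarize(values):
    """Data Summarisation: the summary statistics named in the list above."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
    }


cleaned = fill_missing(raw)
scaled = min_max_scale(cleaned)
z_scores = standardize(cleaned)
summary = summarize(cleaned)

print(cleaned)
print(summary)
```

In practice, libraries such as pandas and scikit-learn provide these operations for whole datasets, but the underlying arithmetic is the same.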
If you are familiar with programming (in any language) and comfortable with mathematics at 12th standard (high school) level, then you should be able to follow along with this course. This course is well suited for the following learning objectives:
- Understand the value of data science and the process behind using it.
- Learn the fundamentals of statistics and probability required for data science.
- Use Python to gather, store, clean, analyse, and visualise datasets.
- Apply statistical methods to formulate and test hypotheses about data.
- Apply statistical inference to uncover relationships within datasets.
- Understand the role of Machine Learning (ML) and Deep Learning (DL) in the data science pipeline.
- Understand real-world challenges through several case studies.