Mathematical essentials for data science

Data science is a young discipline that requires a solid mathematical foundation. How much is required depends on whether you are an applied data scientist who uses existing algorithms to solve business problems or someone who develops new algorithms.
Rigorous mathematical training is undoubtedly helpful for understanding the algorithms and methods used in data science in general and machine learning in particular.

In this article, we highlight some of the mathematical topics that an applied data scientist needs, ranging from the very fundamental to the more advanced.

Geometry and Calculus

Knowledge of geometry is essential for graphing, plotting, and data visualization in general. Basic geometry and trigonometry, conic sections, Cartesian and polar coordinates, and analytic geometry are among the key topics in the data scientist's toolbox. Calculus is the study of continuous change and one of the oldest and most important mathematical disciplines. Differential and integral calculus is used in almost every field of physical science, and data science is no exception. Mastering differentiation, definite and indefinite integrals, vector analysis, and basic differential equations is therefore essential for data visualization, statistical analysis, and optimization.
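
As a small illustration of the coordinate and calculus topics above, here is a minimal Python sketch that converts between Cartesian and polar coordinates and approximates a derivative and a definite integral numerically. The function names, step size, and number of subintervals are illustrative choices, not part of any particular library.

```python
import math

def to_polar(x, y):
    """Convert Cartesian (x, y) to polar (r, theta), with theta in radians."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Convert polar (r, theta) back to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x); h is an assumed step size."""
    return (f(x + h) - f(x - h)) / (2 * h)

def integrate(f, a, b, n=1000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step

def square(x):
    return x * x

r, theta = to_polar(3.0, 4.0)       # r is 5.0
slope = derivative(square, 3.0)     # close to 6, the exact derivative of x^2 at 3
area = integrate(square, 0.0, 1.0)  # close to 1/3, the exact integral
```

Both numerical routines are approximations; their accuracy depends on the step size and on how smooth the function is.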

Linear Algebra

Linear algebra is the study of systems of linear equations, and it is a fundamental tool in machine learning. Notions such as vector space, subspace, basis, and linear transformation are key mathematical concepts, and data scientists are strongly advised to learn them. Matrix techniques, including computing eigenvalues and eigenvectors, diagonalization, LU decomposition, and singular value decomposition, are used frequently in machine learning and statistics. In deep learning, linear algebra is used to represent neural networks and carry out their computations. Tools from more advanced topics in multilinear algebra, such as tensor decomposition, have been applied in data science and information retrieval.
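
To make the idea of eigenvalues and eigenvectors concrete, the following sketch uses power iteration, one simple way to approximate the dominant eigenvalue of a matrix. It is written in plain Python for readability; the example matrix, the starting vector, and the iteration count are assumptions chosen for illustration, and in practice a numerical library would normally be used instead.

```python
import math

def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_iteration(A, iterations=100):
    """Approximate the dominant eigenvalue and a unit eigenvector of A.

    Assumes A has a unique eigenvalue of largest magnitude and that the
    starting vector is not orthogonal to its eigenvector.
    """
    v = [1.0] + [0.0] * (len(A) - 1)  # assumed starting vector
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = mat_vec(A, v)
    eigenvalue = sum(x * y for x, y in zip(v, Av))  # Rayleigh quotient (v has unit length)
    return eigenvalue, v

A = [[2.0, 1.0],
     [1.0, 2.0]]  # symmetric example with eigenvalues 3 and 1
eigenvalue, eigenvector = power_iteration(A)
```

For this matrix the iteration converges to the eigenvalue 3, with an eigenvector whose two components are equal.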

Probability Theory

Probability theory is the mathematical foundation of statistics and data analysis. A data scientist needs to be comfortable with basic combinatorics, random variables, probability distributions, and stochastic processes. In statistics, topics such as hypothesis testing, confidence intervals, p-values, and A/B testing are fundamental to conducting statistical tests on data. Tools for modeling random processes, such as random walks and Markov processes, are an integral part of the data science toolbox. Probabilistic graphical models, which represent the conditional dependence between random variables as a graph, have many applications in computer vision, speech recognition, and bioinformatics, among other fields related to data science.
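
As an example of a Markov process, the sketch below propagates a probability distribution through a two-state Markov chain until it settles on the stationary distribution. The transition probabilities are made up for illustration, and the helper names are hypothetical.

```python
def step(distribution, P):
    """Advance a distribution one step: new[j] = sum_i distribution[i] * P[i][j]."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, iterations=200):
    """Approximate the stationary distribution by repeated stepping.

    Assumes the chain is irreducible and aperiodic, so the limit exists
    and does not depend on the starting distribution.
    """
    n = len(P)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        dist = step(dist, P)
    return dist

# Row i holds the probabilities of moving from state i to each state.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
```

For this chain the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives the stationary distribution (5/6, 1/6), which the iteration approaches quickly.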

Discrete Mathematics

Discrete mathematics is the mathematical foundation of computer science. It includes combinatorics, set theory, propositional logic, recurrence relations, data structures, analysis of algorithms, and graph theory. Data science is a multidisciplinary field, and software development is a core component of any AI-based solution. A solid understanding of these topics is therefore essential for any software developer and enables the developer to write efficient, high-quality code. In addition, a data scientist performing network analysis will naturally use computational graph theory algorithms and techniques to investigate the structure of the graph.
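
A basic graph algorithm used in network analysis is breadth-first search, which finds the number of edges on the shortest path from one node to every reachable node in an unweighted graph. The sketch below uses a small made-up graph stored as an adjacency list; the node labels and function name are purely illustrative.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Return a dict mapping each reachable node to its hop distance from source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:  # first visit is the shortest route
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

# Undirected example graph: each edge is listed in both directions.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
distances = shortest_path_lengths(graph, "a")
```

Each node and edge is processed at most once, so the running time grows linearly with the size of the graph.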


Optimization

Optimization and operations research are relevant to many fields in science and engineering, such as economics, industrial engineering, computer science, and electrical engineering. Optimization sits at the heart of machine learning, and it is the primary tool used in the majority of modeling algorithms.

Optimization is concerned with finding the best element of a set with respect to a predefined criterion, for example, maximizing a real-valued function over a given domain. Linear regression can be formulated as an unconstrained optimization problem, whereas Lasso and ridge regression are constrained optimization problems. Maximum likelihood estimation, maximum a posteriori estimation, and expectation maximization are examples of common optimization problems in statistics and data science.
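
To illustrate the difference between the two formulations, the sketch below fits a one-parameter model y = w * x with no intercept. Ordinary least squares minimizes the sum of squared errors directly. Ridge regression is shown here in its penalized form, which adds penalty * w^2 to the objective and is equivalent to constraining the size of w for a suitable penalty value. The data, the penalty value, and the function names are assumptions made for illustration.

```python
def ols_slope(xs, ys):
    """Slope w minimizing sum((y - w * x)^2): w = sum(x * y) / sum(x^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def ridge_slope(xs, ys, penalty):
    """Slope w minimizing sum((y - w * x)^2) + penalty * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # made-up data lying exactly on y = 2x

w_ols = ols_slope(xs, ys)                   # 2.0
w_ridge = ridge_slope(xs, ys, penalty=14.0) # 1.0, shrunk toward zero
```

The penalty pulls the ridge estimate toward zero, and setting the penalty to zero recovers the ordinary least squares solution.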

Many optimization algorithms are used in machine learning to fit model parameters. For example, stochastic and batch gradient descent are used in linear regression models. Lagrangian duality is one of the main ideas behind the theory of support vector machines and their extension to higher-dimensional spaces. In deep learning, optimization methods such as conjugate gradient and adaptive moment estimation (Adam) are commonly used for parameter estimation. Heuristic search algorithms such as genetic algorithms and ant colony optimization are frequently applied in feature selection and as learning methods in their own right.
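
As a minimal sketch of batch gradient descent for linear regression, the code below fits y = w * x + b by repeatedly stepping against the gradient of the mean squared error computed over the full data set. The data, learning rate, and number of epochs are assumptions chosen so that this small example converges; real problems usually need these values tuned.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
        grad_b = 2.0 / n * sum(errors)                             # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data lying exactly on y = 2x + 1

w, b = gradient_descent(xs, ys)
```

On this data the estimates approach w = 2 and b = 1. The stochastic variant would update the parameters using one randomly chosen data point (or a small batch) per step instead of the full data set.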