Basic Mathematics for Data Science (Part 1)

When starting out in Machine Learning or Data Science, learners are often overwhelmed by the sheer amount of mathematics involved. This is quite natural, because mathematics is the foundation of every modern scientific field.
A solid grasp of the mathematics behind the algorithms gives you a significant advantage when doing research or working in Data Science. Moreover, businesses are willing to spend a lot of money on the technology inside their machines, but not on operators who have no deep knowledge of it. Do you see the point?
To dig deeper, consider a typical scientific workflow in audio processing:
- Extracting information about the physical characteristics of sound and modeling how sound is produced.
- Building hypotheses about them.
- Estimating data source quality.
- Quantifying the uncertainty in the data and in the predictions.
- Determining hidden patterns from the information flow.
- Exploring and understanding the limits of a model.
- Studying the abstract mathematical and logical reasoning behind it.
So Data Science is not tied to one specific field or topic. It can address a wide range of problems and phenomena, such as cancer diagnosis or weather forecasting.
1) Understanding Variables, Functions, Equations, and Graphs
This is the basic mathematics we usually learn from secondary school onwards. Let’s summarize the foundations:
- Logarithmic functions, exponential functions, polynomial functions, rational numbers.
- Geometry and basic theorems, knowledge of trigonometry.
- Real numbers and complex numbers with basic properties.
- Knowledge of sequences, sums, inequalities.
- Drawing and reading graphs, Cartesian and polar coordinates, conic sections.
How to apply this knowledge?
As programmers, we have all learned about searching and sorting algorithms. If you have not encountered them yet, here is a quick explanation:
- Search algorithms are like finding your seat number in the exam room from the list.
- Sorting algorithms simply arrange the elements of a list in ascending or descending order. Execution speed depends on factors such as data set size, algorithm complexity, …
So when we try to optimize searching in a database with tens of millions of items, we quickly run into the concept of “Binary Search”. However, if we want to optimize the program, we should not just sit there and run someone else's source code, right? ^^
To make a search program optimal, we need to understand logarithms and recurrence equations. Binary search halves the search space at every step, so its running time satisfies the recurrence T(n) = T(n/2) + O(1), which solves to O(log n). Reasoning about the running time this way lets you see how the search space shrinks and choose an appropriate data structure for the algorithm.
That is why “Binary Search” comes in many variants, each tuned for better performance.
Reference: Binary search algorithm: https://en.wikipedia.org/wiki/Binary_search_algorithm
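To make the link between binary search and logarithms concrete, here is a minimal sketch in Python. The function name `binary_search` and the step counter are my own choices for illustration, and the data is just a range of integers standing in for a sorted database column.

```python
import math


def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 when the target is absent."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return -1, steps


if __name__ == "__main__":
    n = 10_000_000
    data = range(n)  # already sorted; a range can be indexed without storing every item
    index, steps = binary_search(data, n - 1)
    bound = math.floor(math.log2(n)) + 1  # worst-case number of probes
    print(f"found at index {index} in {steps} steps (upper bound {bound})")
```

Even with ten million items, the number of probes never exceeds floor(log2(n)) + 1, which is the practical meaning of O(log n).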
Where can you build up your mathematics knowledge?
- Basic Machine Learning website (Machine Learning cơ bản): https://machinelearningcoban.com/math/#luu-y-ve-ky-hieu
- Data Science Math Skills - Coursera: https://www.coursera.org/learn/datasciencemathskills
- Introduction to Algebra - edX: https://www.edx.org/course/introduction-algebra-schoolyourself-algebrax-1
- Algebra I - Khan Academy: https://www.khanacademy.org/math/algebra
Note: You can apply for Financial Aid on Coursera to study for free and still earn certificates. I have successfully applied for Financial Aid for 2 courses. ^^
2) Statistics Knowledge
Many people still think that Statistics is mainly related to Economics. You may be surprised to learn how closely it is tied to Data Science.
It can be said that if you are in the field of Data Science, understanding Statistics and Probability is essential.
Informally, there is even a formula: Data Science = Big Data + Statistics + Computer Science
In Vietnam today, there are always plenty of practical problems to solve. With such a broad subject, a study plan is extremely important to make sure we cover everything:
- Data summaries and descriptive statistics: central tendency, variance, covariance.
- Basic probability: fundamental concepts, expectation, probability calculus, Bayes’ theorem, conditional probability.
- Probability distributions: uniform, normal, binomial, chi-square, and the central limit theorem.
- Sampling, measurement, random number generation.
- Hypothesis testing, confidence intervals, errors, …
- Linear regression, regularization techniques.
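Two of the topics above, descriptive statistics and Bayes’ theorem, fit in a few lines of Python. This is only a sketch using the standard `statistics` module. The helper names `covariance` and `bayes_posterior`, as well as all the numbers, are hypothetical and chosen purely for illustration.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) computed from P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


if __name__ == "__main__":
    hours = [1, 2, 3, 4, 5]        # hypothetical study hours
    scores = [52, 55, 61, 68, 74]  # hypothetical exam scores
    print("mean:", statistics.fmean(scores))
    print("median:", statistics.median(scores))
    print("sample variance:", statistics.variance(scores))
    print("covariance:", covariance(hours, scores))
    # Hypothetical screening test: 1% prevalence, 95% sensitivity, 5% false positives
    print("posterior:", bayes_posterior(0.01, 0.95, 0.05))
```

With these made-up screening numbers the posterior is only about 16%, even though the test looks accurate. That kind of counterintuitive result is why conditional probability is worth studying carefully.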
How to apply this knowledge?
Knowledge of Mathematics, Statistics, … is what sets a Data Scientist's mindset apart. Your task is to ask questions and thoroughly investigate the real problems you need to solve. Once the problem is defined, you collect data and analyze it carefully, much like Sherlock Holmes.
With knowledge of Mathematics and Statistics, you can formulate hypotheses about the data you are examining, or predict results from those hypotheses. You do not stop there: you keep forming new questions so that the data is analyzed accurately.
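The cycle of forming a hypothesis and checking it against data can be sketched with a simple permutation test. This is one possible approach among many, not a recommended default. The function name `permutation_test` and the page load times below are hypothetical, and the null hypothesis here is that both groups come from the same distribution.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = abs(statistics.fmean(pooled[:size_a]) - statistics.fmean(pooled[size_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical page load times in seconds for two versions of a site
    old_version = [12.1, 11.8, 12.6, 12.9, 12.3, 12.7]
    new_version = [11.2, 11.5, 11.0, 11.9, 11.4, 11.6]
    p_value = permutation_test(old_version, new_version)
    print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the group labels did not matter, which is exactly the moment to ask the next question about the data.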
Where can you build up your Statistics knowledge?
- Statistics and Probability in Data Science using Python - edX: https://courses.edx.org/courses/course-v1:UCSanDiegoX+DSE210x+3T2017/course/
- Data Science - D2Academics: https://d2academics.thinkific.com/courses/take/data-science/
- Statistics with R Specialization - Coursera: https://www.coursera.org/specializations/statistics
- Business Statistics and Analysis Specialization - Coursera: https://www.coursera.org/specializations/business-statistics-analysis
- (Vietnamese-language book) Phân Tích Dữ Liệu Với R – Hỏi Và Đáp (Data Analysis with R: Questions and Answers, 2018 reprint) - Nguyễn Văn Tuấn
With this first part on the mathematics behind Data Science, I hope I have given you something useful to carry on the road to becoming a real Data Scientist. The remaining mathematical topics will be covered in the next part. Please follow me to stay updated on the latest posts!