From the world of data.

Abhishek V Tatachar
May 10, 2021

“I need 1 meter of cloth.” “This ice cream is chunky.” “The average height of people in my class is 5 feet 6 inches.” If you are wondering why I mentioned these statements, it is because each of them carries a unit of information called data. Data is a collection of facts, and it is not always numbers: it can be numerical (also called quantitative data), like the 1 meter of cloth, or categorical (also called qualitative data), like the chunky texture of the ice cream. Data is often stored in tables, called data tables, which make it easy for a person to find the required data. In this article, we will see how data can be visualized and analyzed.

Visualizing data:

Data visualization is the task of presenting data in graphical form so that it is easier to draw inferences from it. Charts and graphs make information easier to see than data simply lying in tables, and they help draw attention to trends in the data or to how the data varies under certain conditions. There are many types of graphs; some of the common ones are bar graphs, line graphs, pie charts and ogives.

Bar graphs: These are also called bar charts. A bar graph is a set of rectangular bars, placed either vertically or horizontally, whose lengths (heights, for vertical bars) depict particular values. Consider the following data table on the number of people who like a particular flavour of ice cream, and the bar graph that depicts the data.

Each bar has a height that represents a particular value. Using a bar chart, it is easy to read the value of each element, compare elements with one another and make inferences.
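To make the "length represents value" idea concrete, here is a minimal sketch that prints a text-only bar chart. The flavour names and counts are hypothetical, since the original ice-cream table is not reproduced here, and the function name `text_bar_chart` is made up for illustration.

```python
# Hypothetical counts: the article's ice-cream table is not reproduced here.
flavour_counts = {"Vanilla": 12, "Chocolate": 18, "Strawberry": 7, "Mango": 10}


def text_bar_chart(counts, symbol="#"):
    """Return one text line per category, with a bar as long as the category's value."""
    width = max(len(name) for name in counts)  # align bars after the longest label
    lines = []
    for name, value in counts.items():
        lines.append(f"{name:<{width}} | {symbol * value} {value}")
    return lines


for line in text_bar_chart(flavour_counts):
    print(line)
```

In practice a plotting library such as matplotlib would draw real bars (for example with `matplotlib.pyplot.bar`), but the principle is the same: each bar's length is proportional to its value.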

Line Graphs: Line graphs are similar to bar charts, but instead of bars we place a small marker at the corresponding value of each element and then connect consecutive markers with straight lines. Unlike bar graphs, line graphs are used when the values form a continuous sequence, such as measurements taken over time. Consider the following example of Mr. Alex’s income in each month for the first six months of a year.

The graph shows how his earnings change from one month to the next. For example, we can infer that his earnings dropped from February to March. Line graphs therefore make it easier to understand how a quantity changed over a period of time.
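Each straight segment of a line graph joins two consecutive points, so reading the graph amounts to comparing neighbouring values. The sketch below does that comparison numerically. The income figures are made up, chosen only so that they include a February-to-March drop like the one described, and `month_to_month_changes` is a hypothetical helper name.

```python
# Hypothetical monthly income (in rupees) for Mr. Alex; the article's graph is
# not reproduced, so these numbers are invented, with a drop from Feb to Mar.
income = {"Jan": 30000, "Feb": 34000, "Mar": 28000,
          "Apr": 31000, "May": 33000, "Jun": 35000}


def month_to_month_changes(series):
    """Change between each pair of consecutive points, i.e. each segment of the line."""
    months = list(series)
    return {f"{a}->{b}": series[b] - series[a] for a, b in zip(months, months[1:])}


changes = month_to_month_changes(income)
drops = [segment for segment, change in changes.items() if change < 0]
print(changes)
print("Earnings dropped over:", drops)  # → Earnings dropped over: ['Feb->Mar']
```

A downward-sloping segment on the graph corresponds to a negative change here.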

Ogives: Ogives are used to represent cumulative sums (most often, cumulative frequencies). A cumulative sum is the running total of the values in a sequence: each point is the sum of all values up to and including that point. Imagine this situation: Mr. Alex saves 5000 rupees every month, so his total savings increase by 5000 rupees every month. This growth can be depicted using an ogive.
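The points plotted on an ogive are just running totals. Here is a minimal sketch that computes them for Mr. Alex's savings with Python's `itertools.accumulate`. The six-month window and the month labels are assumptions, since the example does not say how many months are shown.

```python
from itertools import accumulate

monthly_saving = 5000  # rupees saved each month, as in the example above
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]  # assumed six-month window

# Running total: each entry is the sum of all monthly savings up to that month.
cumulative_savings = list(accumulate([monthly_saving] * len(months)))

for month, total in zip(months, cumulative_savings):
    print(f"{month}: {total} rupees")
# cumulative_savings → [5000, 10000, 15000, 20000, 25000, 30000]
```

Plotting these totals against the months and joining the points gives the ogive. Because the monthly amount is constant, the result here is a straight rising line.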

Pie Chart: A pie chart is a circular graphical representation of data. The circle is divided into slices, and the size of each slice is proportional to the amount of data it represents. Let us understand this with an example. A class of 100 students is asked to choose one of three subjects as an elective. We could represent how many students chose each subject using a pie chart.
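Because a full circle is 360 degrees, each slice's angle is its share of the total multiplied by 360. The sketch below shows this calculation. The subject names and the 50/30/20 split are hypothetical, since the example does not give the actual counts.

```python
# Hypothetical split of the 100 students; the article does not give the actual counts.
elective_choices = {"Subject A": 50, "Subject B": 30, "Subject C": 20}


def pie_slice_angles(counts):
    """Angle (in degrees) of each slice: the category's share of the total times 360."""
    total = sum(counts.values())
    return {name: 360 * count / total for name, count in counts.items()}


angles = pie_slice_angles(elective_choices)
print(angles)  # → {'Subject A': 180.0, 'Subject B': 108.0, 'Subject C': 72.0}
```

The angles always add up to 360 degrees, which is why a pie chart only makes sense when the categories together make up one whole.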

Analyzing Data

Data analysis is usually done to find either the center of the data or its spread. The center of the data is described by the mean, median or mode, which are called measures of central tendency. The spread of the data, on the other hand, is described by measures such as the range, variance and standard deviation. We will look at measures of both central tendency and spread in this article.

Mean: The mean is the average of a set of values. It is calculated by summing all the values and dividing the total by the number of values.
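A minimal sketch of that definition in Python, with hypothetical heights as input, cross-checked against the standard library's `statistics.mean`:

```python
import statistics


def mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


heights_cm = [160, 165, 170, 172, 168]  # hypothetical class heights
print(mean(heights_cm))  # → 167.0
# Cross-check against Python's standard library.
assert mean(heights_cm) == statistics.mean(heights_cm)
```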

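Median: The median is the middle value of a set of values after they have been sorted. It separates the higher half of the data from the lower half. If there is an even number of values, there is no single middle value, so the median is the average of the two middle values.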
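A minimal sketch of the median, covering both the odd and even cases. Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # odd count  → 3
print(median([4, 1, 3, 2]))  # even count → 2.5
```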

Mode: The mode is the value that occurs most frequently in a given set of values.

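If two values tie for the highest frequency, the data set is called bimodal. If three or more values tie, the data set is usually called multimodal, although some introductory texts say such a data set has no mode. When every value occurs the same number of times (for example, when no value repeats), the data set has no mode.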
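The sketch below returns every value that ties for the highest frequency. It uses `collections.Counter` and follows one assumed convention: it returns an empty list when all distinct values occur equally often. The function name `modes` is made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest frequency.

    One value -> unimodal, two -> bimodal, three or more -> multimodal.
    Returns [] when every distinct value occurs equally often (no mode).
    """
    counts = Counter(values)  # keeps first-seen order of the values
    if not counts:
        return []
    top = max(counts.values())
    winners = [value for value, count in counts.items() if count == top]
    if len(counts) > 1 and len(winners) == len(counts):
        return []  # every value is equally frequent, so no value stands out
    return winners


print(modes([1, 2, 2, 3]))     # → [2]
print(modes([1, 1, 2, 2, 3]))  # → [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # → []      (no mode)
```

Python 3.8+ also provides `statistics.multimode`, which returns all the most common values. However, it returns every value when they all occur equally often, so the "no mode" rule would still have to be added on top.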

The measures of spread that we will discuss are the range, interquartile range, variance and standard deviation.

Range: The range is the difference between the highest and the lowest value in a given set of numbers. To find the range, the data does not need to be sorted; the range is simply the highest value minus the lowest value.
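A one-line sketch of the range that illustrates why no sorting is needed: `max` and `min` each make a single pass over the data. The function is called `data_range` so that it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Highest value minus lowest value; the data does not need to be sorted."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)


print(data_range([12, 3, 45, 7]))  # → 42
```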

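Interquartile range: The interquartile range (IQR) is the difference between the 75th percentile and the 25th percentile. These are also called the upper (Q3) and lower (Q1) quartiles, and one common way to obtain them is to take the median of the upper half and of the lower half of the sorted data. Because it ignores the lowest and highest quarters of the data, the IQR is less affected by extreme values than the range.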
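A minimal sketch of the median-of-halves method described above. It assumes one convention: when the number of values is odd, the overall median is left out of both halves. Other conventions, and the interpolation used by tools such as `numpy.percentile` or Python's `statistics.quantiles`, can give slightly different quartiles for the same data.

```python
def _median_of_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method.

    Assumed convention: when the count is odd, the overall median is left
    out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]  # skip the middle value when n is odd
    q1 = _median_of_sorted(lower)
    q3 = _median_of_sorted(upper)
    return q1, q3, q3 - q1


print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))     # → (2.5, 6.5, 4.0)
print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # → (2.5, 7.5, 5.0)
```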

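Variance: Variance measures how far the values in a data set are spread out from the mean of the data set (which also reflects how far the values are from one another). It is calculated as the average of the squared differences between each value and the mean. When the data is a sample rather than the whole population, the sum of squared differences is usually divided by n - 1 instead of n, where n is the number of values.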
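A minimal sketch of both versions of the variance, using a small set of hypothetical values whose mean is 5. The results are cross-checked against Python's `statistics.pvariance` (population) and `statistics.variance` (sample).

```python
import math
import statistics


def population_variance(values):
    """Average squared distance of each value from the mean (divide by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Same sum of squared distances, divided by n - 1 for a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5
print(population_variance(data))  # → 4.0
print(sample_variance(data))      # 32 / 7, about 4.571
# Cross-check against the standard library.
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
```

Squaring the differences keeps values below the mean from cancelling out values above it. It also means the variance is expressed in squared units.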

Standard Deviation: The standard deviation is simply the square root of the variance. Because it is expressed in the same units as the data, it is easier to interpret as the typical distance of a value from the mean of the data set.
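A minimal sketch of the population standard deviation, reusing the same hypothetical values (variance 4), with a cross-check against `statistics.pstdev`:

```python
import math
import statistics


def population_std(values):
    """Square root of the population variance; same units as the data."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation requires at least one value")
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5, variance = 4
print(population_std(data))  # → 2.0
assert math.isclose(population_std(data), statistics.pstdev(data))
```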

This sums up how data can be pictured, perceived and inferred. There are many more ways to visualize data than the ones covered here, for example Venn diagrams, histograms and stem-and-leaf plots. However, it is important to note that in each of the examples above we have used one-way tables. A one-way table is one in which a value can be found by asking a single question. For example, in a table of names and heights, if we want to find the height of an individual, we just ask one question: “Whose height?”. There is another kind of data table, called the two-way table, in which we have to ask two questions in order to find a piece of information.
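One way to picture the difference is with Python dictionaries. In this sketch, a one-way table is a single lookup by one key, and a two-way table is a nested lookup by two keys. All names, grades and counts are hypothetical.

```python
# One-way table (hypothetical names and heights): one question, "Whose height?",
# is enough, so a single key lookup finds the value.
heights_cm = {"Asha": 162, "Ravi": 175, "Meera": 158}
print(heights_cm["Ravi"])  # → 175

# Two-way table (hypothetical counts of students by grade and favourite subject):
# two questions, "Which grade?" and "Which subject?", so two keys are needed.
students = {
    "Grade 9": {"Maths": 12, "Science": 8},
    "Grade 10": {"Maths": 9, "Science": 11},
}
print(students["Grade 10"]["Science"])  # → 11
```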

Thank you for reading this far. If you liked this article, kindly follow me. I will be bringing out new articles every month. I wish all my readers good health. Stay home and stay safe. Go out only if necessary. Wear a mask and maintain social distance. Get vaccinated as soon as possible. Thanks again.
