Data is a set of observations. These observations can be collected in various ways, such as field notes, surveys, or experiments. In the social sciences and biology the observations are often about people, but in subjects like physics the observational units may not be people at all.
In statistics we have a very clear way of organising data: we put it into something called a dataset or data frame. The rows of the dataset are called cases or observational units; sometimes rows are simply referred to as observations. The columns of the dataset are called variables. Variables are the characteristics or traits of the cases. To put it another way, each row is a different person, state, or other kind of observational unit, and each column is some characteristic or trait recorded for each of those units.
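The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration: the respondents, the variable names (`name`, `age`, `smoker`), and the helper `column` are all made up for the example, and a plain list of dictionaries stands in for a real data frame.

```python
# A tiny made-up dataset: each dict is one row (a case or observational unit),
# and each key is one column (a variable).
dataset = [
    {"name": "Respondent A", "age": 34, "smoker": "no"},
    {"name": "Respondent B", "age": 51, "smoker": "yes"},
    {"name": "Respondent C", "age": 27, "smoker": "no"},
]


def column(rows, variable):
    """Return the values of one variable across all cases."""
    return [row[variable] for row in rows]


n_cases = len(dataset)              # number of rows
variables = list(dataset[0].keys())  # names of the columns
ages = column(dataset, "age")        # one variable, read down the column

print(n_cases, variables, ages)
```

Reading across one dictionary gives everything recorded about a single case; reading one key across all dictionaries gives a single variable.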
There are two main types of variables, and each main type has two subtypes. A variable is either quantitative or qualitative; these two types are also called numerical and categorical, respectively.
Numerical Variable: Continuous Variable Vs Discrete Variable
Numerical variables take numbers as their values. For a numerical variable, it usually makes sense to add, subtract, or compute a numerical summary such as the average of those values. Some examples are the age of a respondent, the miles per gallon of a car, or the ranking of different colleges.
For numerical variables, it is useful to distinguish further between continuous and discrete. A continuous numerical variable can take any number within its range. Examples include height in inches, the percentage of children who have dyslexia, and the number of miles a person has moved away from their childhood home. Each of these variables is continuous in the sense that it can take decimal values. So, a height might be 70.23 inches.
In contrast, a discrete numerical variable can take only non-negative counting numbers: 0, 1, 2, 3, 4, 5, and so on. These variables are discrete because there are jumps between the values they can take. A discrete numerical variable cannot take the value 1.5; it doesn’t make sense. One example is the number of votes for a politician: a politician might get 3 votes or 500 votes, but a politician can’t get 3.5 votes.
As a different example, the rating of a restaurant on a five-point scale, where the scale consists only of 1, 2, 3, 4, and 5, is a discrete numerical variable. Another example is the count of arrivals at an airport on a given day: you might have 500 arrivals, but you can’t have 500.5. So the key difference between continuous and discrete numerical variables is whether they can take any number or only non-negative counting numbers.
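The distinction can be sketched as a simple check on recorded values. The function name `looks_discrete` and the sample numbers below are hypothetical, and the check is only a rough heuristic: it looks at the values as recorded, so a continuous variable that happens to be rounded to whole numbers would also pass.

```python
def looks_discrete(values):
    """Return True if every value is a non-negative whole number (0, 1, 2, ...)."""
    return all(v >= 0 and float(v).is_integer() for v in values)


votes = [3, 500, 0, 42]             # counts of votes: whole numbers only
heights_in = [70.23, 64.5, 68.0]    # heights in inches: decimals are possible

print(looks_discrete(votes))        # every value is a counting number
print(looks_discrete(heights_in))   # 70.23 and 64.5 are not whole numbers
```

In practice the decision should come from what the variable measures (a count versus a measurement), with a check like this serving only as a sanity test on the data.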
Exercise: Let’s go through a few examples. The annual household income of a respondent, in thousands of US dollars, is a continuous numerical variable, because it can take decimal values such as 10, 101.65, or 108.77 (thousand dollars). A rating of a movie from 1 to 4 stars, assuming the stars can only take the values 1, 2, 3, or 4, is a discrete numerical variable. To give another example, the number of murders last year in a neighborhood is a discrete numerical variable, because it can only take the values 0, 1, 2, 3, 4, and so on.
Categorical Variable
Besides numerical variables and their two subtypes, continuous and discrete, the other main type of variable you will encounter is the categorical variable. Categorical variables are sometimes called qualitative variables; their values are categories, or qualities. Here are some examples of categorical variables: the gender of a respondent, which could be recorded as male or female; approval or disapproval of a social policy; and whether somebody is a regular smoker or not (yes or no). These variables are categorical in the sense that each value is a category or quality rather than a number.
Nominal Vs Ordinal
The two subtypes of categorical variables are nominal and ordinal. For nominal categorical variables, there is no inherent order to the categories or levels. Examples include religious identification, political party affiliation, and the names of the countries you visited in the past year. Each of these is a nominal categorical variable because there is no inherent order: it does not make sense to say that, for religious identification, Catholic is somehow higher or lower than Jewish.
For ordinal categorical variables, there is a natural, inherent order to the different categories or levels. Examples include the highest educational degree a person has attained, or the level of approval of a country’s economic policy, from strongly disapprove to strongly approve. Each of these ordinal variables has an ordering to it. For the highest educational degree obtained, a graduate degree is higher than a high school degree. For approval, strongly approve is the topmost category, below that is somewhat approve, and so on down through somewhat disapprove to strongly disapprove. For ordinal categorical variables it makes sense to order the levels, but the ordering is not always clear-cut: one person’s idea of the natural ordering might be culturally specific, and for a different person or a different society there may be no ranking of the levels that is obvious. So, if you are unsure whether a variable is nominal or ordinal, you should probably stick with nominal, because with an ordinal variable there is a certain subjective component in saying that one level is higher than another.
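A minimal sketch of the practical difference, using made-up responses: for an ordinal variable we can state the order of the levels explicitly and then sort or take the highest response, whereas for a nominal variable only counting the categories is meaningful. The four approval levels, the party names, and all variable names here are assumptions chosen for illustration.

```python
from collections import Counter

# Ordinal: the order of the levels is stated explicitly, lowest to highest.
APPROVAL_LEVELS = [
    "strongly disapprove",
    "somewhat disapprove",
    "somewhat approve",
    "strongly approve",
]
rank = {level: i for i, level in enumerate(APPROVAL_LEVELS)}

responses = ["somewhat approve", "strongly disapprove", "strongly approve"]
sorted_responses = sorted(responses, key=rank.__getitem__)  # lowest to highest
highest = max(responses, key=rank.__getitem__)

# Nominal: no order exists, so we only count how often each category occurs.
parties = ["Party A", "Party B", "Party A"]
most_common_party = Counter(parties).most_common(1)[0]

print(sorted_responses)
print(highest)
print(most_common_party)
```

Note that the ordering lives in `APPROVAL_LEVELS`, which the analyst has to write down. That is exactly the subjective step discussed above, and it is the reason to fall back to nominal when the order is in doubt.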
Here is an example to help explain the different types of variables further.