100 People: About The Data
Fritz J. Erickson: Dean of Professional and Graduate Studies,
John A. Vonk: Professor Emeritus, University of Northern Colorado
April 7, 2006
How do we know how many people are in the world at any given moment? Is it possible to count everyone at the same time? Is it possible to know exactly how many people are of any given race, religion, geographic location, gender, or any number of variables without counting each person or asking each person the same set of questions? Do different countries and cultures have different meanings for the same questions about population? If we can't count everyone, if we can't ask the same questions, if we have different definitions about the human condition (e.g., schooling, safe drinking water, transportation), and if we can't determine exactly where each person falls within the host of variables that make us individually and wholly unique, then how do we determine which 100 among us represent all of us?
These are but a few of the questions we asked in determining how to develop a sound statistical basis for representing the mosaic of the human population. The world varies on so many different characteristics. Our task was to define a few variables and develop a rational, logical, and well-founded basis for representing the world's population. Are we 100% accurate? No. Is it possible to be 100% accurate? No. Is it possible to have a high degree of confidence that what is represented here likely represents the world population? Yes. Here's why.
Understanding Estimates and Projections
Another reason for using estimates and projections is that, around the world, population data is collected by different agencies and different countries using different techniques. Most agencies and countries operate on their own cycle, which may or may not align with others. When dealing with the population of the world, keep in mind that not every country takes a census every ten years, let alone on the same ten-year cycle or beginning in the same year (e.g., the year 2000). So, each organization collecting data on world population characteristics makes a projection about the world's characteristics based on data that it has collected at different times and intervals in the past. This means that each population projection differs depending on when, and how long ago, the underlying census was taken.
Just as different agencies or countries have different strategies for collecting population data, they also have different interests or goals for the data. For example, not all countries are interested in determining the level of literacy, the use of technology, gender differences, or even the diversity of language or religion within a country. The ways in which questions are asked also cause variations in estimates or projections. The definition of educational levels varies from country to country, religious identification often varies, and there are even variations in how language is defined.
Finally, and maybe most importantly, because there are differences in estimates or projections, it is critical that multiple data sources be considered. Using multiple sources provides the best opportunity to address inconsistencies between data sources because it allows for the broadest level of consideration. For example, one data source might indicate that 5 percent of the global population resides in North America. Another source might indicate that the number is 7 percent, while yet another suggests that it is 4 percent. Considering multiple data sources allows these inconsistencies, if they exist, to be addressed. Of course, broad agreement among multiple data sources provides a great deal of confidence in the accuracy of the estimate or projection.
In most cases, there are minor inconsistencies between data sources. The question then becomes what to do when estimates vary or when multiple sources don't agree. One option is to simply average the projections. That is, take all of the projections and calculate a measure of central tendency, such as the mean or the middle point. Another option would be to simply take the most recent projection or estimate, assuming that because it is the most recent it is likely to be the most accurate. Another option would be to select the modal estimate, that is, the projection that appears most often in the literature. In the end, the solution is to select the option that best fits the data.
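The options described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the authors actually ran: the source names and years are invented, the values echo the illustrative 5, 7, and 4 percent figures used earlier, and the function name `reconcile` is hypothetical.

```python
from statistics import mean, median, multimode

# Hypothetical estimates of the share of the world's population living in
# North America. Source names and years are invented for illustration.
estimates = [
    {"source": "Source A", "year": 2003, "value": 5.0},
    {"source": "Source B", "year": 2005, "value": 7.0},
    {"source": "Source C", "year": 2004, "value": 4.0},
]


def reconcile(estimates, method):
    """Combine several estimates of one variable into a single figure."""
    values = [e["value"] for e in estimates]
    if method == "mean":
        # Arithmetic average of all estimates.
        return mean(values)
    if method == "median":
        # The middle point of the estimates.
        return median(values)
    if method == "most_recent":
        # Trust the newest estimate.
        return max(estimates, key=lambda e: e["year"])["value"]
    if method == "mode":
        # The value reported most often; None when no single value wins.
        modes = multimode(values)
        return modes[0] if len(modes) == 1 else None
    raise ValueError(f"unknown method: {method}")


for method in ("mean", "median", "most_recent", "mode"):
    print(method, reconcile(estimates, method))
```

With three sources that all disagree, the modal option has no single answer, which is one reason the choice of method has to depend on the data at hand.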
When using multiple data sources to achieve accuracy, each characteristic must be viewed as a discrete variable, that is, as a variable considered independently of any other variable. For example, when reporting on the number of males and females living in the world today, and using multiple sources, it is only gender that can be examined. Gender is examined without regard for age, ethnicity, religion, or urban/rural living. One reason for this is that there is not complete agreement among multiple sources on the questions to be answered when collecting data. That is, not every data collection agency attaches the same importance to the data. When assessing gender, not all data collection agencies around the world will ask the additional question of how many males and females live in an urban location, how many males and females are age 14 and under, and how many live in a rural area; or, how many are Muslim and how many are Hindu. We may know that five percent of the world's population lives in North America, and we might assume that it means two and one half percent are men and two and one half percent are women, but we cannot be certain because this specific gender information is not reported in multiple sources.
Another problem arises when dealing with world population statistics. One of the goals of this project is to focus on characteristics that are shared by at least one percent of the world's population. While one percent sounds small, the actual number can be rather large. For example, let's use the estimate that the world's population is approaching 7 billion people. One percent of 7 billion is 70 million. For a characteristic to occur one percent of the time, it must affect 70 million people at any point in time. Reporting on the number of people with AIDS requires that there be at least 70 million people suffering with AIDS at any one time to be included at the one percent level. Unfortunately, the mortality rate among those people with AIDS is so high that the number never reaches the 70 million (1%) level at any single point in time. A similar problem arises for religions in the world. For many religions, the number of believers, while very large, may not reach the level of 70 million needed to be included in world population statistics at the 1% level. The category "Other Religions" may make up 12 percent of the world's population, but a single religion in this group does not make it to the 1% level.
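The arithmetic behind the one percent threshold can be checked with a short sketch. It assumes the round figure of 7 billion used above; the function name `meets_threshold` and the 40 million example are hypothetical, and integer arithmetic is used to avoid floating-point rounding.

```python
WORLD_POPULATION = 7_000_000_000  # round figure used in the text


def meets_threshold(count, population=WORLD_POPULATION, percent=1):
    """Return True if `count` people make up at least `percent` of `population`."""
    # Compare count / population >= percent / 100 without using floats.
    return count * 100 >= population * percent


# One percent of 7 billion people.
print(WORLD_POPULATION // 100)        # 70000000

# A hypothetical characteristic shared by 40 million people falls short.
print(meets_threshold(40_000_000))    # False
print(meets_threshold(70_000_000))    # True
```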
Raw Data vs. Secondary Data Sources: A Problem of Access
A Pretty Good Estimate
After identifying each variable, we set about the task of identifying data sources to provide insight into each variable. To increase our confidence in the outcome, our goal was to include a minimum of three data sources per variable whenever possible. We also tried to select data sources that were as objective as possible. Of course, making this decision is subjective, but we wanted data sources that are solid, well-grounded, and less prone to bias.
After identifying data sources we had to consider how to deal with issues of inconsistencies between the data sources. We considered each data source and then each variable independently. In some cases, we selected the middle point of the data to reduce the amount of error. In other cases, we selected the most current data. In other cases, we tried to examine a broader literature base as a reason for selecting one data source to be most representative.
Finally, we tried to apply the "reasonable" test. In other words, does the data represented make sense? Does the data appear to be representative? Have we considered the full range of data possibilities? If the answer was yes, then we concluded that the data for that variable was accurate and that it likely represents the world's population.
It is important to remember that each variable is being represented independently. In other words, when taking pictures of 50 males and 50 females, it can be any 50 males and any 50 females. We cannot combine other variables with gender. In other words, we can't say that of the 50 females, 3 need to be from North America, or X are Muslim, or X are literate. Only gender can be considered, so any 50 females is appropriate. Of course, this does not mean you can't decide to take 5 pictures of women who are old, or 10 pictures of women who are Muslim, or 3 pictures of women who speak English, or any other variable. It simply means that according to the statistical requirement any 50 will do.
This process needs to continue through each of the fifteen variables. Once that is complete, the next step is to determine how many pictures are needed to reflect the variable of interest. In other words, how many pictures do you want to represent gender? How many pictures do you want to represent Latin America & the Caribbean? If ten pictures of males and females are desired, a 10% sample of the pictures of males and females can then be drawn at random until five pictures of males and five pictures of females have been selected from the total of 100 pictures. This process can be repeated for each variable.
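The random draw described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical set of 100 picture records with a single "gender" field. Rather than drawing repeatedly until the quotas are filled, it samples the same fraction from each category directly, which gives the same five-and-five result. The names `build_pictures` and `sample_by_category` are invented for this example.

```python
import random


def build_pictures():
    """Build 100 hypothetical picture records: 50 male, 50 female."""
    return [
        {"id": i, "gender": "male" if i < 50 else "female"}
        for i in range(100)
    ]


def sample_by_category(pictures, variable, fraction, seed=None):
    """Draw the same fraction of pictures at random from each category."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    groups = {}
    for picture in pictures:
        groups.setdefault(picture[variable], []).append(picture)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)
        # rng.sample picks k distinct pictures without replacement.
        sample.extend(rng.sample(members, k))
    return sample


# A 10% sample: five pictures of males and five pictures of females.
chosen = sample_by_category(build_pictures(), "gender", 0.10, seed=1)
print(len(chosen))
```

Because each variable is treated independently, the same function could be reused for any other variable by changing the field name, as long as each picture record carries that field.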
This photoscript provides a great deal of latitude to represent the full level of diversity that represents the world's population while at the same time following a strong empirical structure. The outcome is one that will allow this project to represent our commonalities, our differences, and our shared humanity.