100 People: About The Data

Fritz J. Erickson: Dean of Professional and Graduate Studies,
University of Wisconsin - Green Bay

John A. Vonk: Professor Emeritus, University of Northern Colorado

April 7, 2006

How do we know how many people are in the world at any given moment? Is it possible to count everyone at the same time? Is it possible to know exactly how many people are of any given race, religion, geographic location, gender, or any number of other variables without counting each person or asking each person the same set of questions? Do different countries and cultures have different meanings for the same questions about population? If we can't count everyone, if we can't ask the same questions, if we have different definitions of the human condition (e.g., schooling, safe drinking water, transportation), and if we can't determine exactly where each person falls within the host of variables that make us individually and wholly unique, then how do we determine which 100 among us represent all of us?

These are but a few of the questions we asked in determining how to develop a sound statistical basis for representing the mosaic of the human population. The world varies on so many different characteristics. Our task was to define a few variables and develop a rational, logical, and well-founded basis for representing the world's population. Are we 100% accurate? No. Is it possible to be 100% accurate? No. Is it possible to have a high degree of confidence that what is presented here likely represents the world population? Yes. Here's why.

Understanding Estimates and Projections
Our knowledge of the world's population is based on estimates or projections. There are several reasons why estimates or projections are used rather than complete counts. One of the most important relates to the size of the world's population. Currently, there are nearly 7 billion people. Because our global population is so large, even under the best of circumstances it can take several years for data collection to be complete. For example, the census (population count) in the United States alone requires about ten years of work. The most recent U.S. census was taken in 2000. For the ten years that follow, the census data continues to be supplemented, analyzed, and refined. The next complete U.S. census will be taken in 2010, and the process of refining the data will begin again. So when statements are made about the size of the U.S. population in 2006, they are "estimates" or "projections" based on census data collected in 2000 and then modified over a long period of time.

Another reason for using estimates and projections is that population data around the world is collected by different agencies and different countries using different techniques. Most agencies and countries operate on their own cycles, which may or may not align with one another. When dealing with the population of the world, keep in mind that not every country takes a census every ten years, and those that do are not necessarily on the same cycle or starting in the same year (e.g., 2000). So each organization collecting data on world population characteristics makes projections based on data collected at different times and intervals in the past. As a result, projections differ depending on when each complete census was taken and how long ago that was.

Just as different agencies or countries have different strategies for collecting population data, they also have different interests or goals for the data. For example, not all countries are interested in determining the level of literacy, the use of technology, gender differences, or even the diversity of language or religion within a country. The ways in which questions are asked also cause variations in estimates or projections. The definition of educational levels varies from country to country, religious identification often varies, and there are even variations in how language is defined.

Finally, and maybe most importantly, because estimates and projections differ, it is critical that multiple data sources be considered. Using multiple sources provides the best opportunity to address inconsistencies between data sources because it allows for the broadest level of consideration. For example, one data source might indicate that 5 percent of the global population resides in North America, another might put that figure at 7 percent, and yet another at 4 percent. Considering multiple data sources allows these inconsistencies, if they exist, to be addressed. And when multiple data sources broadly agree, that agreement provides a great deal of confidence in the accuracy of the estimate or projection.

In most cases, there are minor inconsistencies between data sources. The question then becomes: what should be done when estimates vary, or when multiple sources don't agree? One option is to average the projections, that is, to take all of the projections and calculate their middle point or another measure of central tendency. Another option is to take the most recent projection or estimate, on the assumption that the most recent is likely to be the most accurate. A third option is to select the modal estimate, that is, the projection that appears most often in the literature. In the end, the solution is to select the option that best fits the data.
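
The options above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: the `Estimate` type, the `combine` function, the source labels, the publication years, and the fourth estimate are all hypothetical, chosen only so that each option produces a different answer from the same inputs.

```python
from dataclasses import dataclass
from statistics import mean, median, multimode


@dataclass(frozen=True)
class Estimate:
    source: str   # hypothetical source label
    year: int     # hypothetical publication year
    value: float  # estimated share of world population, in percent


def combine(estimates, method):
    """Reduce several published estimates of one variable to a single figure."""
    values = [e.value for e in estimates]
    if method == "mean":          # average of all projections
        return mean(values)
    if method == "median":        # middle point of the projections
        return median(values)
    if method == "most_recent":   # trust the newest estimate
        return max(estimates, key=lambda e: e.year).value
    if method == "mode":          # the value reported most often
        modes = multimode(values)
        if len(modes) != 1:
            raise ValueError("no single modal estimate")
        return modes[0]
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical estimates of North America's share of world population.
estimates = [
    Estimate("Source A", 2003, 5.0),
    Estimate("Source B", 2004, 7.0),
    Estimate("Source C", 2005, 4.0),
    Estimate("Source D", 2002, 5.0),
]

for method in ("mean", "median", "most_recent", "mode"):
    print(method, combine(estimates, method))
```

With these made-up inputs the four rules give 5.25, 5.0, 4.0, and 5.0 percent respectively, which is why the choice of rule has to be made per variable rather than once for the whole project.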

When using multiple data sources to achieve accuracy, each characteristic must be treated as a separate variable, that is, as a variable considered independently of every other variable. For example, when reporting on the number of males and females living in the world today using multiple sources, only gender can be examined. Gender is examined without regard for age, ethnicity, religion, or urban/rural living. One reason for this is that multiple sources do not fully agree on which questions to ask when collecting data; not every data collection agency attaches the same importance to the same data. When assessing gender, not every agency around the world will also ask how many males and females live in urban or rural areas, how many are age 14 and under, or how many are Muslim and how many are Hindu. We may know that five percent of the world's population lives in North America, and we might assume that this means two and one half percent are men and two and one half percent are women, but we cannot be certain because this specific gender information is not reported in multiple sources.
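
A small worked example makes the point concrete: the single-variable figures that sources agree on do not pin down a combined figure such as "women in North America." The two tables below are hypothetical, invented only for illustration. Both produce a 5% North America share and a 50/50 gender split, yet they disagree on the combined share. The 2.5% figure mentioned above is what you get only if you assume the two variables are independent.

```python
from fractions import Fraction

# Two hypothetical joint distributions of world population over
# (region, gender). The numbers are illustrative, not real data.
table_a = {
    ("North America", "female"): Fraction(25, 1000),
    ("North America", "male"): Fraction(25, 1000),
    ("Elsewhere", "female"): Fraction(475, 1000),
    ("Elsewhere", "male"): Fraction(475, 1000),
}
table_b = {
    ("North America", "female"): Fraction(30, 1000),
    ("North America", "male"): Fraction(20, 1000),
    ("Elsewhere", "female"): Fraction(470, 1000),
    ("Elsewhere", "male"): Fraction(480, 1000),
}


def marginal(table, axis):
    """Sum a joint table over every variable except the one at `axis`."""
    totals = {}
    for key, share in table.items():
        totals[key[axis]] = totals.get(key[axis], 0) + share
    return totals


# Both tables report the same single-variable figures...
assert marginal(table_a, 0) == marginal(table_b, 0)  # region: 5% / 95%
assert marginal(table_a, 1) == marginal(table_b, 1)  # gender: 50% / 50%

# ...yet disagree about women in North America.
na = marginal(table_a, 0)["North America"]
female = marginal(table_a, 1)["female"]
print("assuming independence:", float(na * female) * 100, "%")
print("table A:", float(table_a[("North America", "female")]) * 100, "%")
print("table B:", float(table_b[("North America", "female")]) * 100, "%")
```

Exact fractions are used so the equality checks on the marginals are not affected by floating-point rounding.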

Another problem arises when dealing with world population statistics. One of the goals of this project is to focus on characteristics that apply to at least one percent of the world's population. While one percent sounds small, the actual number can be rather large. For example, let's use the estimate that the world's population is approaching 7 billion people. One percent of seven billion is 70 million. For a characteristic to reach the one percent level, it must apply to seventy million people at any given point in time. Including people with AIDS at the one percent level would require at least 70 million people to be living with AIDS at one time. Unfortunately, the mortality rate among people with AIDS is so high that the number never reaches the 70 million (1%) level at any single point in time. A similar problem arises for the world's religions. Many religions have very large numbers of believers that still fall short of the 70 million needed to be included at the 1% level. The category "Other Religions" may make up 12 percent of the world's population, but no single religion in this group reaches the 1% level.
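
The one percent arithmetic can be checked directly. This sketch assumes the rounded world population of 7 billion used in the text; the constant and function names are illustrative, not part of the project. Integer arithmetic is used so that a count sitting exactly on the boundary is not misjudged because of floating-point rounding.

```python
WORLD_POPULATION = 7_000_000_000  # the text's rounded "approaching 7 billion"
THRESHOLD_PERCENT = 1             # the project's 1% inclusion level


def threshold_count(population=WORLD_POPULATION, percent=THRESHOLD_PERCENT):
    """Number of people a characteristic must reach to hit `percent`."""
    return population * percent // 100


def meets_threshold(count, population=WORLD_POPULATION, percent=THRESHOLD_PERCENT):
    """True if `count` people is at least `percent` of `population`."""
    # Compare count/population >= percent/100 without division.
    return count * 100 >= population * percent


print(threshold_count())                               # 70000000
print(meets_threshold(69_999_999))                     # False
print(meets_threshold(12 * WORLD_POPULATION // 100))   # a 12% group: True
```

A combined category such as "Other Religions" can clear the threshold even when none of its members does on its own.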

Raw Data vs. Secondary Data Sources: A Problem of Access
Beyond the complexities of large population numbers and the need to use estimates or projections is the issue of which data sources are available. There are fundamentally two types of data sources: raw data and published secondary data. Raw data refers to the actual, unprocessed numbers, and secondary data refers to reported data or published interpretations. With raw data, one can actually manipulate the data. That is, a researcher asks the questions, determines which cross tabulations are appropriate for analyzing the data, and then reports the results. With secondary data sources, someone else asks the questions and publishes the results. Because there is no single raw data source and access to smaller raw sources is so difficult, this project focuses on multiple secondary data sources. The impact is important. If there were one primary raw data source for the entire world's population, it would be possible to calculate the percentage for any one variable in combination with any other variable. For example, with one raw data source it would be possible to determine the number of males 65 years old and older who are Hindu, live in North America, have access to clean water, and have graduated from secondary school, but do not have a cell phone. While these may be interesting questions, with secondary data we are limited to the questions asked by others. We cannot manipulate the variables directly to find answers to these questions; we can only search multiple sources until we find someone else who has asked, and published, a question similar to our own.
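
To illustrate what raw data makes possible, here is a minimal sketch of a query and a cross tabulation over individual records. The `Person` record layout, the `count` helper, and the four records are hypothetical; no real raw data source is implied. With records like these, any combination of variables can be counted; with secondary data, only the published totals are available.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """One hypothetical raw, census-style record."""
    sex: str
    age: int
    religion: str
    region: str
    clean_water: bool
    secondary_graduate: bool
    cell_phone: bool


# Hypothetical raw records; a real raw source would have billions of rows.
records = [
    Person("male", 70, "Hindu", "North America", True, True, False),
    Person("male", 68, "Hindu", "North America", True, True, True),
    Person("female", 30, "Muslim", "Asia", False, False, True),
    Person("male", 41, "Christian", "Europe", True, True, True),
]


def count(records, predicate):
    """Count records satisfying an arbitrary combination of conditions."""
    return sum(1 for r in records if predicate(r))


# The multi-variable question from the text, answerable only with raw data.
n = count(records, lambda r: r.sex == "male" and r.age >= 65
          and r.religion == "Hindu" and r.region == "North America"
          and r.clean_water and r.secondary_graduate and not r.cell_phone)

# A two-way cross tabulation of sex by region.
crosstab = Counter((r.sex, r.region) for r in records)

print(n)  # 1
print(crosstab)
```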

A Pretty Good Estimate
Having considered the problems associated with representing the world's population, we were able to develop an informed basis for confidence in the statistics used in this project. We began by identifying fifteen variables that we felt represent factors that make us both different and the same. Among these are gender, age, geography, religion, first language, overall literacy, literacy by gender, level of education, and many others. Are there others we could have included? Obviously, yes. However, in our judgment the selected variables represent some of the most common characteristics of the world's population.

After identifying each variable, we set about identifying data sources to provide insight into it. To increase our confidence in the outcome, our goal was to include a minimum of three data sources per variable whenever possible. We also tried to select data sources that were as objective as possible. Of course, this decision is itself subjective, but we wanted data sources that are solid, well grounded, and less prone to bias.

After identifying data sources, we had to decide how to deal with inconsistencies between them. We considered each data source, and then each variable, independently. In some cases, we selected the middle point of the data to reduce the amount of error. In others, we selected the most current data. In still others, we examined the broader literature to justify selecting one data source as the most representative.

Finally, we applied the "reasonable" test. In other words, does the data make sense? Does it appear to be representative? Have we considered the full range of data possibilities? If the answers were yes, we concluded that the data for that variable was accurate and likely represents the world's population.

Photoscript
With the data in hand, the next step in this process was to develop a photoscript specifying which pictures to take and which pictures to use to represent the data. The process we recommend is to take one hundred pictures for each variable. These pictures should reflect the proportion of the population for that variable and that variable only. For example, for the variable "gender" there are 50 males and 50 females, and the one hundred pictures should reflect this distribution. The same holds true for "geography." Out of 100 pictures, 5 need to be from North America, 9 from Latin America & the Caribbean, 12 from Europe, 61 from Asia, and 13 from Africa.
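
The text gives whole-number counts for a 100-picture script directly, but when shares do not come out as whole numbers, or when a smaller set of pictures is wanted, some rounding rule is needed. The source does not specify one. The sketch below assumes the largest-remainder (Hamilton) method, a common apportionment rule that always produces counts summing to the requested total; the function name `allocate` is hypothetical.

```python
import math


def allocate(shares, n):
    """Split n pictures across categories in proportion to `shares`.

    Largest-remainder (Hamilton) method: give each category the
    whole-number part of its quota, then hand the leftover pictures to
    the categories with the largest fractional remainders.
    """
    total = sum(shares.values())
    quotas = {k: v * n / total for k, v in shares.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Geography shares (percent) from the photoscript.
geography = {
    "North America": 5,
    "Latin America & Caribbean": 9,
    "Europe": 12,
    "Asia": 61,
    "Africa": 13,
}

print(allocate(geography, 100))  # reproduces the percentages exactly
print(allocate(geography, 10))   # a smaller script still sums to 10
```

Other rounding rules exist and can give slightly different small-script counts; the point is only that some explicit rule is needed once the numbers stop being whole.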

It is important to remember that each variable is represented independently. In other words, when taking pictures of 50 males and 50 females, it can be any 50 males and any 50 females. We cannot combine other variables with gender. We can't say that of the 50 females, 3 need to be from North America, or X are Muslim, or X are literate. Only gender is being considered, so any 50 females is appropriate. Of course, this does not mean you can't decide to take 5 pictures of women who are old, or 10 pictures of women who are Muslim, or 3 pictures of women who speak English, or pictures based on any other variable. It simply means that, according to the statistical requirement, any 50 will do.

This process needs to continue through each of the fifteen variables. Once that is complete, the next step is to determine how many pictures are needed to reflect each variable of interest. In other words, how many pictures do you want to represent gender? How many pictures do you want to represent Latin America & the Caribbean? If ten pictures are desired for gender, a 10% sample can then be drawn at random from the 100 gender pictures until five pictures of males and five pictures of females have been selected. This process can be repeated for each variable.
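
The random draw described above amounts to a quota (stratified) random sample: pictures are drawn in random order and kept only while their category still has room. A minimal sketch, assuming each picture has already been tagged with the category it represents; the function `draw_until_quotas`, the picture IDs, and the seed are hypothetical, and the seed is fixed only so the draw can be reproduced.

```python
import random


def draw_until_quotas(pictures, quotas, rng):
    """Draw pictures at random until every category's quota is filled.

    `pictures` is a list of (picture_id, category) pairs; `quotas` maps each
    category to the number of pictures wanted from it.
    """
    remaining = dict(quotas)
    order = list(pictures)
    rng.shuffle(order)  # random order of draws
    chosen = []
    for picture_id, category in order:
        if remaining.get(category, 0) > 0:  # keep it only if quota still open
            chosen.append(picture_id)
            remaining[category] -= 1
        if not any(remaining.values()):     # every quota filled: stop drawing
            break
    if any(remaining.values()):
        raise ValueError("not enough pictures to fill every quota")
    return chosen


# The 100 gender pictures from the photoscript (hypothetical IDs).
pictures = [(f"male-{i:02d}", "male") for i in range(50)] + [
    (f"female-{i:02d}", "female") for i in range(50)
]

# A 10% sample: five males and five females.
selection = draw_until_quotas(pictures, {"male": 5, "female": 5}, random.Random(2006))
print(sorted(selection))
```

Because every picture within a category is equally likely to come up first in the shuffled order, each category's selection is a simple random sample of that category, which matches the "any 50 will do" rule above.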

This photoscript provides a great deal of latitude to represent the full diversity of the world's population while still following a strong empirical structure. The outcome will allow this project to represent our commonalities, our differences, and our shared humanity.

 


