Frequently Asked Questions about HINTS
What is HINTS?
The Health Information National Trends Survey (HINTS) is a biennial, cross-sectional survey of a nationally representative sample of American adults used to assess the impact of the health information environment. Specifically, HINTS measures how people access and use health information; how people use information technology to manage health and health information; and the degree to which people are engaged in healthy behaviors. In addition, several items in HINTS focus specifically on cancer prevention and control.
Three iterations of HINTS exist: HINTS 2003, 2005, and 2007. As such, researchers can examine items and constructs that are common to all three iterations as a way to measure trends over time.
How can these data be used for planning in cancer control?
HINTS data provide reliable estimates of the American public's engagement in behaviors related to health information and preventive health care with a specific emphasis on cancer. As such, HINTS data can be used to inform the design, content, and evaluation of cancer control initiatives by revealing where knowledge of and/or behavioral engagement in cancer control strategies are low; by highlighting the media used most often to obtain cancer control information; and by providing national-level trends in behavioral factors related to cancer control. Additionally, subgroup-specific analyses of HINTS data can be used to inform tailored cancer control planning efforts (e.g., by gender, race/ethnicity, socioeconomic status, cancer history).
How are the survey instruments created?
Some items in HINTS are borrowed from existing national-level surveys (e.g., CDC's Behavioral Risk Factor Surveillance System); some come from smaller surveys related to health; and some are created by members of the HINTS program at the National Cancer Institute. In all cases, items are carefully tested before the survey is fielded to ensure that they are psychometrically sound. The online codebook documents the source of every item in HINTS (see the Survey Notes field).
How should I report response rates if I'm using HINTS 2007 data from one mode or both?
You should report the response rate for the mode or modes used in the analysis. These can be found for each HINTS iteration in the respective Final Report found here: http://hints.cancer.gov/instrument.aspx. The weighted response rate for the 2007 random-digit-dial (RDD) frame was 24.23%. The weighted response rate for the 2007 address frame was 30.99%.
What geographic units of analysis are possible in HINTS?
HINTS data provide nationally representative estimates, but there are variables in the data set that allow researchers to compare rural versus urban metropolitan statistical areas (MSAs) as well as census regions. Because of the relatively small sample size for respondents from any given state, state-specific HINTS analyses are generally not recommended, as they can be statistically unreliable. In certain cases, however, pooling data across multiple iterations of HINTS may increase the sample size enough that a state-based investigation is statistically appropriate.
Do I need permission to use the data?
No. All data from every HINTS survey are available on the HINTS Web site for public use.
Can I use these items in my own survey or research?
Yes; however, in order to promote the growth of the HINTS community, we ask that you let us know about any HINTS-related published articles, measures that you derive from HINTS items, and associated presentations so they may be posted on the HINTS Web site. The full citation and, if available, a PDF or PowerPoint should be sent to NCIHINTS@nih.gov.
How do I contact the program?
For information about the HINTS program, contact us.
What publications/presentations already exist?
What journals would you recommend for HINTS studies?
Where can I find information on sampling procedures?
Each iteration of the survey has a final report that is available on the HINTS Web site. These final reports describe the sampling procedures in great detail. Briefly, there are two sampling methods used for the HINTS survey. The 2003, 2005, and 2007 iterations used a list-assisted random-digit-dial (RDD) sampling plan to collect data via a computer-assisted telephone interview. The 2007 survey additionally used a comprehensive national listing of addresses available from the United States Postal Service to collect data via mailed questionnaire.
What are the limitations of the data?
Because HINTS is a cross-sectional survey, it is not possible to infer causal relationships between constructs or items in the survey. Additionally, while researchers can examine trends over time at the national level for outcomes included in multiple iterations of the survey, one cannot assess change over time at the individual level.
How is HINTS different from other surveys?
HINTS is the only national surveillance vehicle exclusively devoted to monitoring the impact of and changes in cancer communication as well as key processes in health among American adults. Compared to other population-level surveys of health, HINTS is unique in its emphasis on cancer, health communication, and the health information environment (including use of health information technology).
What is the jackknife replicate weight type used in HINTS?
The type of jackknife replication is JK1. The number of jackknife replicates is 50.
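As a sketch, the JK1 variance of an estimate is computed from the replicate estimates as ((R−1)/R) times the sum of squared deviations from the full-sample estimate, with R = 50 for HINTS. The function name below is illustrative; in practice, `replicate_estimates` would be the statistic recomputed under each of the 50 replicate weights.

```python
import numpy as np

def jk1_variance(full_estimate, replicate_estimates):
    """JK1 jackknife variance: v = ((R - 1) / R) * sum((theta_r - theta)^2),
    where theta is the full-sample estimate and theta_r are the R replicate
    estimates (R = 50 for HINTS)."""
    reps = np.asarray(replicate_estimates, dtype=float)
    r = reps.size
    return (r - 1) / r * np.sum((reps - full_estimate) ** 2)
```

The square root of this value is the standard error used in significance tests of HINTS estimates.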
Questions about mode
What is the "mode"?
The mode is the medium used to communicate the survey questions.
What modes have been used to collect HINTS data?
HINTS 2003 and 2005 were administered by telephone using a random-digit-dial (RDD) sample frame.
HINTS 2007 was administered in two different modes. One was by telephone, with an interviewer reading the questions. The second was by mail. This involved sending a paper questionnaire to the respondent's home and asking the respondent to fill out the survey by him/herself.
The telephone mode was administered by drawing a sample using an RDD sample frame. This involves randomly generating telephone numbers among those exchanges that are used for landline telephones. The sample for the mail survey was based on a list of all addresses to which the United States Postal Service delivers residential mail.
Why were HINTS 2007 data collected using two modes?
Prior HINTS surveys were conducted by drawing a sample using a random-digit-dial (RDD) procedure and administering the interview by telephone. Over the last few years, however, completing RDD surveys has become increasingly difficult. Response rates to RDD surveys have been decreasing steadily because of the public's reluctance to participate, which affects both the quality and the cost of conducting the survey: quality, because it lowers the overall response rate; cost, because more calls have to be placed to complete the same number of interviews. In addition, RDD surveys have not traditionally included cell phones in the survey population. Because of the proliferation of cell phones, the percentage of persons who do not have access to a landline telephone has been growing steadily; at the time HINTS 2007 was conducted, it was greater than 15% of all adults. Those who do not have access to a landline phone differ from those who do.
HINTS 2007 introduced an address sampling frame, used to conduct a mail survey, to counteract the above problems with RDD. The address frame includes those who do not have a landline telephone, and a pilot for HINTS 2007 found that the address frame can achieve response rates equivalent to or better than the telephone frame. Both an address (mail) and an RDD (telephone) survey were implemented in 2007 to allow HINTS to bridge between survey administrations. Since certain estimates may differ by mode of interview, it could be difficult to compare the 2007 mail results to the telephone results from prior years. Including both frames allows trend analyses while keeping the mode constant: the 2007 RDD survey lets users examine trends against prior HINTS surveys with a consistent mode. Moving forward, HINTS has the option to continue with the telephone if there are significant changes in that survey methodology; alternatively, it will be possible to shift completely to an address frame and continue with a mail and/or Web survey. Regardless of the mode used in the future, it will be possible to maintain the trends over time.
What are mode effects?
Mode effects are differences in results associated with the mode used to administer the survey. In HINTS 2007, both a mail and a telephone survey were used. Research has shown that for certain types of questions, results will differ depending on the mode. For example, a mail survey is completed by the respondent without an interviewer; this methodology has been shown to yield more reports of socially sensitive information than an interviewer-administered survey. There may also be effects due to differences in the channel of communication between the two modes: the mail survey is completed through a visual process, with respondents reading and interpreting the questions, while the telephone interview is completed through auditory cues, without any visual aid. For certain types of questions, this can lead to different results.
There are also other survey properties associated with mode of interview that may affect the results. The telephone survey on HINTS included a Spanish interview, while the mail mode did not. The mail survey included individuals living in households without a landline telephone. The telephone survey did not. Both of these differences in the sample might lead to different estimates across the two different modes.
How can I tell if I need to be concerned about mode effects?
Compare the estimates from the address sample to the estimates from the RDD sample. To do this, use the address full-sample and RDD full-sample weights to produce the two estimates. A simple review of the data should provide an initial assessment of whether there are differences in the estimates and whether they are large enough to be of concern.
After reviewing the above differences, you may want to conduct a formal test to see if the estimates are statistically different. Remember that statistical significance is not particularly meaningful for a sample as large as HINTS: relatively small differences will be statistically significant but not substantively meaningful.
The method for conducting formal significance tests will depend on the type of analysis being conducted. For descriptive analyses, you can use procedures described in Rizzo et al. (2008) pertaining to Goal 1 (e.g., Chapter 3). This involves generating separate estimates and standard errors for each sample frame and conducting a z test:
z = (Xr – Xa) / sqrt[V(Xr) + V(Xa)]
where Xr is the estimate for the RDD sample, Xa is the estimate for the address sample, and V(Xr) and V(Xa) are the variance estimates for the RDD and address samples, respectively.
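The z statistic can be computed directly, as in the sketch below. The estimates and variances shown are hypothetical values for illustration only; in a two-sided test at the 5% level, |z| would be compared to 1.96.

```python
import math

def mode_difference_z(x_rdd, x_addr, v_rdd, v_addr):
    """z statistic for the RDD-vs-address difference:
    (Xr - Xa) / sqrt(V(Xr) + V(Xa))."""
    return (x_rdd - x_addr) / math.sqrt(v_rdd + v_addr)

# hypothetical proportions and jackknife variances, for illustration only
z = mode_difference_z(0.55, 0.50, 0.0003, 0.0006)
```

The variances here would come from the JK1 jackknife replicate weights supplied with each sample frame.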
If you are conducting a multivariate analysis concerned with the relationship between two variables, then you should include in the regression a dichotomous variable, Si, representing the type of sample: Si would be 0 if respondent i is from the RDD sample and 1 if from the address sample. An interaction term between sample type and the variable of interest should also be included. For example, if one were looking at the relationship between age and whether someone looked for cancer information, the regression would include a term for age, a term for sample type, and an interaction between the two. A statistically significant interaction suggests that the relationship between age and looking for cancer information differs by mode. One can then review the magnitude of this difference and its implications for the particular analysis being conducted.
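As an illustration of this design, the sketch below fits an unweighted linear-probability-style regression on simulated (not HINTS) data; all variable names and values are hypothetical. A real analysis would use the survey weights and jackknife replicates rather than unweighted least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
age = rng.uniform(18, 85, n)                 # hypothetical ages
sample_type = rng.integers(0, 2, n)          # S_i: 0 = RDD frame, 1 = address frame
# simulated outcome standing in for "looked for cancer information"
y = 0.3 + 0.004 * age + 0.05 * sample_type + rng.normal(0.0, 0.1, n)

# design matrix: intercept, age, sample-type dummy, and the age x mode interaction
X = np.column_stack([np.ones(n), age, sample_type, age * sample_type])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[3] is the interaction coefficient; a significant estimate would suggest
# the age/information-seeking relationship differs by mode
```

In survey software, the same model would be specified with the combined replicate weights described below so that the significance test on the interaction accounts for the complex design.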
One source of a significant interaction is differing responses because the mode of communication differs (self-administered mail survey vs. interviewer-administered telephone survey). Keep in mind, however, that there may be differences across modes for other reasons. For example, the address sample contains households that do not have landlines, while the telephone sample does not. Hence, it is possible that a significant interaction is due to different responses between households that have landlines and households that do not.
To conduct the multivariate analysis, weights should be created using the procedures described in Rizzo et al. (2008: Chapter 4), in which the two sets of replicate weights are combined into a single set of replicates. The procedure in Rizzo et al. describes how to do this when conducting tests between two survey years; for testing for mode effects, use the same procedure but treat each sample type as if it were a different year. The weights created by applying this procedure to the two 2007 samples should be used only for testing for mode effects. They should not be used to calculate estimates of totals, because their sum over the RDD sample plus the address sample is two times the number of adults in the United States in 2006. (The American Community Survey results for 2007 were not available at the time the weights for HINTS 2007 were calculated.) Nor should they be used to estimate quantities that involve sums across both of the 2007 samples, because they do not maintain the correct relative relationships between the weights of RDD cases, the weights of address-sample cases with landlines, and the weights of address-sample cases without landlines.
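The combination step can be sketched as follows. This is only our reading of the general approach described in Rizzo et al. (2008, Chapter 4), in which each sample's records keep their own replicates in one block of columns and repeat their full-sample weight in the other block; consult the handbook for the authoritative procedure. All array names are illustrative.

```python
import numpy as np

def combine_replicates(wt_a, reps_a, wt_b, reps_b):
    """Stack two independent samples' replicate weights into one doubled set.
    Sample A records keep A's replicates in columns 1..R and repeat A's
    full-sample weight in columns R+1..2R; sample B records do the reverse.
    wt_a, wt_b: full-sample weight vectors; reps_a, reps_b: (n x R) arrays."""
    n_a, r = reps_a.shape
    n_b = reps_b.shape[0]
    combined = np.empty((n_a + n_b, 2 * r))
    combined[:n_a, :r] = reps_a
    combined[:n_a, r:] = wt_a[:, None]   # full weight repeated for sample A
    combined[n_a:, :r] = wt_b[:, None]   # full weight repeated for sample B
    combined[n_a:, r:] = reps_b
    return combined

# tiny illustrative example (2 sample-A records, 1 sample-B record, 3 replicates)
combined = combine_replicates(
    np.array([1.0, 2.0]), np.ones((2, 3)),
    np.array([3.0]), np.full((1, 3), 5.0),
)
```

Because each sample perturbs only its own block of replicates, a jackknife run over the combined set estimates the variance of a between-frame difference as the sum of the two frames' variances.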
How do I address mode effects in my analyses?
When conducting and reporting analyses of HINTS, the different modes should be assessed and discussed as a strength of the design. By collecting data in different modes, you are able to assess the robustness of your results under different measurement processes. In some cases, the measurement process of one mode is superior for a particular item; in that case, you should choose that mode. However, if results differ and one cannot argue that one mode is better than the other, then you should report the results both ways.
If there are no mode effects, then it is appropriate to combine the two samples into a single analysis. Estimates should be generated using the composite weights. Unlike the weights created by the procedure described in Rizzo et al., the composite weights yield correct estimates of totals, because their sum over the RDD sample plus the address sample equals the number of adults in the United States. The composite weights also maintain the correct relative relationships between the weights of the RDD sample, the weights of address-sample cases in households with landlines, and the weights of address-sample cases without landlines. When writing up your results, report that analyses were run separately and that results did not differ by mode; again, this speaks to the robustness of the results.
If there are significant mode effects, there are several possibilities. One is to assess which mode provides the most valid data. For example, if the result involves socially sensitive information (e.g., reports of serious psychological distress), then the mail mode should be selected. Similarly, the telephone mode is more likely to be subject to primacy and recency effects (i.e., picking the first or last response on an ordinal scale); if this is the case, then one might choose to use the mail survey. If the result involves comparing to prior HINTS surveys, then use the telephone sample, because this keeps the mode consistent across years. Similarly, if the focus is on Hispanics, then use the telephone sample: the telephone interview included both English-speaking and Spanish-speaking respondents, while the mail survey was administered only in English. Among Hispanics, therefore, who answered the questionnaire differs by mode.
If you cannot decide which mode is "better," report both analyses. An equivalent method is to conduct analyses that control for mode (e.g., with a mode term and an interaction term in a regression).
What should I do if some of the items in my analysis have mode effects and some do not?
The main advantage of using the combined sample is increased statistical precision (larger sample size). If it is possible, report the results not affected by mode using the combined sample. For the results that are affected by mode, either use the "preferred" mode or report it for each mode (if there is no preferred mode). If the increased precision is not essential to the analysis, then consider reporting all of the results in either the preferred mode or for both modes (if there is no preferred mode).
Do I need to consider mode effects if I am looking at trends across HINTS years?
You do need to consider mode effects for trend analyses. If there is a mode effect, use the RDD sample. This keeps mode consistent across the different HINTS surveys.
If there are no mode effects for an item, can I always combine data collected by both modes?
Yes. Use the composite weights for this analysis.
Questions about trends
If HINTS data are cross-sectional, how can I look at trends over time?
HINTS is a series of repeated cross-sectional surveys. By comparing measures across the different survey years, one can examine net change over time. For example, using the HINTS data, it is possible to see if the proportion of adults in the United States who have looked for information about cancer has changed between 2005 and 2007. This is a standard methodology that is applied to virtually all social and economic surveys that examine change over time. For example, the unemployment rate reports net change from repeated surveys.
How many ways can I examine trends over time?
HINTS trend analyses provide the net change in the item of interest. For example, comparing the estimate "proportion looking for cancer information" between 2005 and 2007 measures whether the proportion of adults in 2005 looking for cancer was higher, lower, or the same as in 2007. HINTS cannot provide information on "gross change." Gross change enumerates the status of individuals across both time periods. For example, a measure of gross change would be the number of people who changed their information-seeking behavior between years. In order to get this type of measure, one would have to interview the same people in both 2005 and 2007 to enumerate whether their status changed between the two time periods.
If, in my analysis, I do not observe any changes over time for HINTS items, may I combine data across HINTS years to increase my sample size?
Combining data across years will increase statistical power by increasing the sample size. This can be especially useful when focusing on particular population groups for which a single survey administration does not provide a large enough sample. Before combining data across years, one should first determine that the items of interest have not changed between years. To generate estimates and standard errors for combined data, it is necessary to create a new set of weights that permits the statistical program to compute the correct standard errors. The procedure for doing this is provided in Rizzo et al. (2008), Chapter 4 (combining data files)—see http://hints.cancer.gov/docs/HINTS_Data_Users_Handbook-2008.pdf.
Are there any published papers using HINTS data that examined trends over time?
Does the HINTS program provide any guidance on how to examine trends over time?
Do I need to consider mode effects if I'm looking at trends across HINTS years?
You do need to consider mode effects for trend analyses. If there is a mode effect, use the RDD sample. This keeps mode consistent across the different HINTS surveys.
How can I tell if an item is appropriate to examine in trends analyses?
You should examine the question wording for the item for each year. If the wording has not changed, then it is appropriate for looking at trends. The HINTS documentation has compiled a crosswalk of the different items for each of the survey administrations. This is available on the HINTS Web site—see "HINTS Items Across Years" document found here: http://hints.cancer.gov/instrument.aspx
Do I need to consider differences in response rates or sample demographics when examining trends across HINTS years?
The unweighted demographic distributions of the sample do change across years. The primary reasons for this are the decline in response rates and the deterioration of the random-digit-dial (RDD) sample frame. Between 2003 and 2005 the response rate dropped from 33% to 20%; between 2005 and 2007 it increased slightly (20% to 24% for the RDD sample). The drop between 2003 and 2005 is generally attributed to the growing difficulty of getting cooperation from the general public for RDD surveys. The proportion of those without access to a landline telephone also went up during this period, and this group is disproportionately composed of younger persons. This generally reduced the number of younger people in the sample over the later HINTS telephone administrations.
The weights for each year will distribute the samples to reflect the demographic distribution of adults living in the United States for that particular year. Using the weighted estimates, therefore, would compensate for differential distributions across years.
A more nuanced question is whether the weights compensate for all possible differential response bias. For example, suppose the research involved the item on seeking information about cancer. Further suppose that in 2005, young people who were not interviewed were less likely to seek information than those who were interviewed. However, in 2003, young people who were not interviewed were just as likely to look for information as those who were interviewed. In this case, the weighted estimates would not eliminate bias in estimates of change across the years.
It is difficult to know whether there is differential bias for any of the HINTS questionnaire items. Research has found that large differences in response rates are not a deterministic indicator of bias.[1] As noted above, for a bias to exist, there has to be a significant shift in who is not responding relative to the item being analyzed. For this reason, we believe that once weighted estimates are used, it is generally valid to compare across years. However, researchers should assess their own situation and decide based on the groups they are analyzing.
The one exception to this general rule is Hispanics: the mail survey did not include a Spanish version, while the telephone survey did. Consequently, the Hispanic samples are likely quite different between the two modes, regardless of the weighting adjustment, and should not be combined in analyses that focus on this group.
[1] Groves, R. (2006). "Nonresponse Rates and Nonresponse Bias in Household Surveys." Public Opinion Quarterly, 70, 646–675.
I want to use the HINTS 2007 random-digit-dial sample for my research. However, I currently use Stata 8, which has no jackknife function. What should I do to use the SVRset function in my Stata 8 to address replicate weight problems?
Stata® began to support replicate weights with Version 9; there is no replicate-weights feature in Stata® 8 or earlier versions. Some suggestions are:
- Upgrade the Stata® software.
- Use SUDAAN® software or SAS® v9.2. SAS® began to support replicate weights starting with Version 9.2, though subpopulation analysis is not available with replicate weights.