Longitudinal Observational Research Paper

A longitudinal study (or longitudinal survey, or panel study) is a research design that involves repeated observations of the same variables (e.g., people) over long periods of time, often many decades (i.e., it uses longitudinal data). Longitudinal studies are most often observational, although they can also be structured as longitudinal randomized experiments.[1]

Longitudinal studies are often used in psychology, to study developmental trends across the life span, and in sociology, to study life events throughout lifetimes or generations. Unlike cross-sectional studies, in which different individuals with the same characteristics are compared,[2] longitudinal studies track the same people, so the differences observed in those people are less likely to be the result of cultural differences across generations. Because they measure change within the same individuals, longitudinal studies are also applied in many other fields. In medicine, the design is used to uncover predictors of certain diseases. In advertising, it is used to identify the changes that an advertising campaign has produced in the attitudes and behaviors of those within the target audience who have seen it. Longitudinal studies also allow social scientists to distinguish short-term from long-term phenomena, such as poverty. If the poverty rate is 10% at a point in time, this may mean that 10% of the population are always poor or that the whole population experiences poverty for 10% of the time. It is impossible to tell which of these is the case from a one-off cross-sectional study; the sketch below makes the distinction concrete.
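A minimal simulation sketch of the poverty example, with all numbers invented for illustration: both scenarios show a 10% poverty rate in any single survey year, but they look completely different once the same individuals are followed over time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_years = 1_000, 50

# Scenario A: the same 10% of the population is poor in every year.
always_poor = np.zeros((n_people, n_years), dtype=bool)
always_poor[: n_people // 10, :] = True

# Scenario B: each person is poor in a random 10% of years.
transient_poor = rng.random((n_people, n_years)) < 0.10

for name, history in [("chronic", always_poor), ("transient", transient_poor)]:
    point_rate = history[:, 0].mean()       # a one-off cross-sectional survey
    ever_poor = history.any(axis=1).mean()  # longitudinal: ever poor in 50 years
    print(f"{name}: point-in-time rate = {point_rate:.2f}, "
          f"ever poor = {ever_poor:.2f}")
```

The cross-sectional rate is about 0.10 in both scenarios; only the longitudinal view reveals that in one case 10% of people are always poor while in the other nearly everyone is poor at some point.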

When longitudinal studies are observational, in the sense that they observe the state of the world without manipulating it, it has been argued that they may have less power to detect causal relationships than experiments. However, because they observe the same individuals repeatedly, they have more power than cross-sectional observational studies: they can exclude time-invariant unobserved individual differences, and they observe the temporal order of events, as the sketch below illustrates.[3] Their main disadvantages are that they take a long time to conduct and are expensive to run.[4]
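The within-person logic can be sketched with simulated data (variable names and effect sizes are invented; this illustrates the general fixed-effects idea, not the specific analysis of reference [3]): demeaning each individual's repeated observations removes any time-invariant unobserved trait that would otherwise confound a cross-sectional comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 500, 6                                 # 500 people, 6 waves
trait = rng.normal(size=(n, 1))               # unobserved, time-invariant confounder
x = 0.8 * trait + rng.normal(size=(n, t))     # exposure correlated with the trait
y = 2.0 * x + 3.0 * trait + rng.normal(size=(n, t))  # true effect of x is 2.0

# Pooled (cross-sectional style) OLS slope: biased by the omitted trait.
pooled_slope = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# Within (fixed-effects) estimator: subtracting each person's mean removes
# everything constant over time, including the unobserved trait.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
within_slope = (xd * yd).sum() / (xd**2).sum()

print(f"pooled: {pooled_slope:.2f}, within: {within_slope:.2f} (truth: 2.00)")
```

The pooled slope absorbs the omitted trait (here it comes out near 3.5), while the within estimate recovers the true effect of about 2.0; the design's other advantage, observing the temporal order of events, is not modeled in this sketch.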

Longitudinal studies can be retrospective (looking back in time, thus using existing data such as medical records or claims databases) or prospective (requiring the collection of new data).

Types of longitudinal studies include cohort studies, which sample a cohort (a group of people who share a defining characteristic, typically having experienced a common event in a selected period, such as birth or graduation) and perform cross-sectional observations at intervals through time. Not all longitudinal studies are cohort studies; a panel may instead follow a group of people who do not share a common event.[5]

Examples

Study name | Type | Country or region | Year started | Participants | Remarks
Alzheimer's Disease Neuroimaging Initiative | Panel | International | 2004 | n/a | –
Australian Longitudinal Study on Women's Health (ALSWH) | Cohort | Australia | 1996 | 50,000 | Includes four cohorts of women, born 1921–1926, 1946–1951, 1973–1978 and 1989–1995
The Jyväskylä Longitudinal Study of Personality and Social Development (JYLS)[6] | Cohort | Finland | 1968 | 369 | Sample drawn from 12 complete school classes; data collected when participants were 8, 14, 20, 27, 33, 36, 42 and 50 years old
Building a New Life in Australia: The Longitudinal Study of Humanitarian Migrants (BNLA)[7][8] | Cohort | Australia | 2013 | 2,399 | Longitudinal study of the settlement experience of humanitarian arrivals in Australia
Colombian Longitudinal Survey by Universidad de los Andes (ELCA)[9] | Panel | Colombia | 2010 | 15,363[10] | Follows rural and urban households to improve understanding of social and economic changes in Colombia
Avon Longitudinal Study of Parents and Children (ALSPAC) | Cohort | United Kingdom | 1991 | 14,000 | –
Born in Bradford | Cohort | United Kingdom | 2007 | 12,500 | –
1970 British Cohort Study (BCS70) | Cohort | United Kingdom | 1970 | 17,000 | Monitors the development of babies born in the UK in one particular week in April 1970
British Doctors Study | Cohort | United Kingdom | 1951 | 40,701 | Monitored the health of British male doctors; provided convincing evidence of the link between smoking and cancer
British Household Panel Survey | Panel | United Kingdom | 1991 | n/a | Modeled on the US Panel Study of Income Dynamics (PSID)
Busselton Health Study[11] | Panel | Australia | 1966 | 10,000 | –
Caerphilly Heart Disease Study | Cohort | United Kingdom | 1979 | 2,512 | Male subjects (Wales)
Canadian Longitudinal Study on Aging (CLSA-ÉLCV)[12] | Cohort | Canada | 2012 | 1,000 | Planned as a 20-year study[13]
Child Development Project[14] | Cohort | United States | 1987 | 585 | Follows children recruited the year before they entered kindergarten in three US cities: Nashville and Knoxville, Tennessee, and Bloomington, Indiana
Children of Immigrants Longitudinal Study (CILS) | Cohort | United States | 1992 | 5,262 | Florida
Congenital Heart Surgeons' Society (CHSS) | Cohort | Canada | – | 5,000 | Various studies, managed by the Data Center Studies on Congenital Heart Diseases
Dunedin Multidisciplinary Health and Development Study | Cohort | New Zealand | 1972 | 1,037 | Participants born in Dunedin during 1972–73
Study of migrants and squatters in Rio's Favelas | Cohort | Brazil | 1968 | n/a | The work of Janice Perlman, reported in her book Favela (2014)[15]
Footprints in Time: The Longitudinal Study of Indigenous Children[16] | Cohort | Australia | 2008 | 1,680 | Study of Aboriginal and Torres Strait Islander children in selected locations across Australia
Fragile Families and Child Wellbeing Study | Cohort | United States | 1998 | n/a | Conducted in 20 cities
Framingham Heart Study | Cohort | United States | 1948 | 5,209 | Massachusetts
Genetic Studies of Genius | Cohort | United States | 1921 | 1,528 | The world's oldest and longest-running longitudinal study
Socio-Economic Panel (SOEP) | Panel | Germany | 1984 | 12,000 | –
Growing Up in Scotland (GUS) | Cohort | United Kingdom | 2003 | 14,000[17] | Scotland
Health and Retirement Study | Cohort | United States | 1988 | 22,000 | –
Household, Income and Labour Dynamics in Australia Survey | Panel | Australia | 2001 | 25,000 | –
Grant Study | Cohort | United States | 1939 | 268 | A 75-year longitudinal study of physically and mentally healthy Harvard College sophomores from the classes of 1939–1944
Human Speechome Project | Cohort | United States | 2005 | 1 | The single participant was the researcher's son; studied language development. Project concluded in 2008
Growing Up in Australia: The Longitudinal Study of Australian Children[18] | Cohort | Australia | 2004 | 10,000 | –
Luxembourg Income Study (LIS) | Cohort | International | 1983 | n/a | 30 countries
Midlife in the United States | Cohort | United States | 1983 | 6,500 | –
Manitoba Follow-Up Study (MFUS)[19] | Cohort | Canada | 1948 | 3,983 men | Canada's largest and longest-running investigation of cardiovascular disease and successful aging
Millennium Cohort Study (MCS) | Cohort | United Kingdom | 2000 | 19,000 | Study of child development, social stratification and family life
Millennium Cohort Study | Cohort | United States | 2000 | 200,000 | Evaluation of long-term health effects of military service, including deployments
Minnesota Twin Family Study | Cohort | United States | 1983 | 17,000 (8,500 twin pairs) | –
National Child Development Study (NCDS) | Cohort | United Kingdom | 1958 | 17,000 | –
National Longitudinal Surveys (NLS) | Cohort | United States | 1979 | NLSY79: 12,686; NLSY97: approx. 9,000 | Includes four cohorts: NLSY79 (born 1957–64), NLSY97 (born 1980–84), NLSY79 Children and Young Adults, and the National Longitudinal Surveys of Young Women and Mature Women (NLSW)
National Longitudinal Survey of Children and Youth (NLSCY) | Cohort | Canada | 1994 | 35,795 | Inactive since 2009
National Health and Nutrition Examination Survey (NHANES) | Cohort | United States | 1971 | 8,837 (since 1999) | Continual since 1999
Pacific Islands Families Study | Cohort | New Zealand | 2000 | 1,398 | –
Panel Study of Belgian Households[20] | Panel | Belgium | 1992 | 11,000[21] | –
Panel Study of Income Dynamics | Panel | United States | 1968 | 70,000 | Possibly the oldest household longitudinal survey in the US
Rotterdam Study | Cohort | Netherlands | 1990 | 15,000 | Focuses on inhabitants of Ommoord, a suburb of Rotterdam
Seattle 500 Study | Cohort | United States | 1974 | 500 | Study of the effects of prenatal health habits on human development
Stirling County Study | Cohort | Canada | 1952 | 639 | Long-term study of the epidemiology of psychiatric disorders; two cohorts were studied (575 from 1952–1970; 639 from 1970–1992)[22]
Study of Health in Pomerania | Cohort | Germany | 1997 | 15,000 | Investigates common risk factors, subclinical disorders and manifest diseases in a high-risk population
Study of Mathematically Precocious Youth | Cohort | United States | 1972 | 5,000 | Follows highly intelligent people identified by age 13
Survey of Health, Ageing and Retirement in Europe (SHARE) | Panel | Europe | 2002 | 120,000 | Multidisciplinary, cross-national panel database of microdata on the health, socio-economic status, and social and family networks of individuals aged 50 or over
The Irish Longitudinal Study on Ageing (TILDA) | Cohort | Ireland | 2009 | 8,500 | Studies the health, social and financial circumstances of the older Irish population
New Zealand Attitudes and Values Study | – | New Zealand | 2009 | n/a | –
Seattle Longitudinal Study | Cohort | United States | 1956 | 6,000[23] | –
Understanding Society: The UK Household Longitudinal Study | Panel | United Kingdom | 2009 | 100,000 | Incorporates the British Household Panel Survey
Up Series | Cohort | United Kingdom | 1964 | 14 | Documentary film project by Michael Apted
Study on Global Ageing and Adult Health (SAGE) | Cohort | International | 2002 | 65,964 | Studies the health and well-being of adult populations and the ageing process in six countries: China, Ghana, India, Mexico, the Russian Federation and South Africa
Wisconsin Longitudinal Study[24] | Cohort | United States | 1957 | 10,317 | Follows graduates of Wisconsin high schools in 1957
ONS Longitudinal Study[25] | Panel | England and Wales | 1974 (data from 1971) | 1% sample of the population of England and Wales; records on over 500,000 people resident at each point in time | See notes below
Scottish Longitudinal Study (SLS)[26] | Panel | Scotland | 1991 | 5.3% sample of the Scottish population (approximately 274,000 individuals, selected using 20 random birthdates) | See notes below
Northern Ireland Longitudinal Study (NILS)[27] | Panel | Northern Ireland | 2006 | About 28% of the Northern Ireland population (approximately 500,000 individuals and about 50% of households) | See notes below

ONS Longitudinal Study. The sample comprises people born on one of four selected dates of birth and therefore makes up about 1% of the total population. The sample was initiated at the time of the 1971 Census, and the four dates were used to update the sample at the 1981, 1991, 2001 and 2011 Censuses and in routine event registrations. Fresh LS members enter the study through birth and immigration, and existing members leave through death and emigration.

Thus, the LS represents a continuous sample of the population of England and Wales, rather than a sample taken at one time point only. It now includes records for over 950,000 study members.

In addition to the census records, the individual LS records contain data for events such as deaths, births to sample mothers, emigrations and cancer registrations.

Census information is also included for all people living in the same household as the LS member. However, it is important to emphasise that the LS does not follow up household members in the same way from census to census.

Support for potential users and further information are available from CeLSIUS.

Scottish Longitudinal Study. The SLS is a large-scale linkage study built upon census records from 1991 onwards, with links to vital events (births, deaths, marriages, emigration); geographical and ecological data (deprivation indices, pollution, weather); primary and secondary education data (attendance, Schools Census, qualifications); and NHS Scotland ISD datasets, including cancer registrations, maternity records, hospital admissions, prescribing data and mental health admissions. The research potential is considerable. The SLS is a replica of the ONS Longitudinal Study, with a few key differences: sample size, commencement point and the inclusion of certain variables.

The SLS is supported and maintained by the SLS Development & Support Unit, with a safe setting at the National Records of Scotland in Edinburgh.

Further information and support for potential users are available from SLS-DSU.

Northern Ireland Longitudinal Study. The NILS is a large-scale, representative data-linkage study created by linking data from the Northern Ireland Health Card Registration system to the 1981, 1991, 2001 and 2011 census returns and to administrative data from other sources. These include vital events registered with the General Register Office for Northern Ireland (such as births, deaths and marriages) and migration events data from the Health Card registration system. The result is a longitudinal data set spanning more than 30 years, which is regularly updated. There is also the potential to link further health and social care data via distinct linkage projects (DLPs).

The NILS is designed for statistics and research purposes only and is managed by the Northern Ireland Statistics and Research Agency under Census legislation. The data are de-identified at the point of use; access is only from within a strictly controlled secure environment and is governed by protocols and procedures that ensure data confidentiality.

References

  1. ^Shadish, William R.; Cook, Thomas D.; Campbell, Donald T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference (2nd ed.). Boston: Houghton Mifflin Company. p. 267. ISBN 0-395-61556-9. 
  2. ^Carlson, Neil, et al. Psychology: The Science of Behavior, p. 361. Pearson Canada, United States of America. 
  3. ^van der Krieke, L., Blaauw, F.J., Emerencia, A.C., Schenk, H.M., Slaets, J., Bos, E.H., de Jonge, P., Jeronimus, B.F. (2016). "Temporal Dynamics of Health and Well-Being: A Crowdsourcing Approach to Momentary Assessments and Automated Generation of Personalized Feedback". Psychosomatic Medicine: 1. doi:10.1097/PSY.0000000000000378. PMID 27551988. 
  4. ^Cherry, Kendra. "What Is Longitudinal Research?". About.com Guide. Retrieved 22 February 2012. 
  5. ^"What is the difference between a Panel Study and a Cohort Study?". Academia Stack Exchange. Retrieved 3 February 2016. 
  6. ^FSD. "Jyväskylä Longitudinal Study of Personality and Social Development (JYLS)". www.fsd.uta.fi. Retrieved 2017-03-30. 
  7. ^"Building a New Life in Australia (BNLA): The Longitudinal Study of Humanitarian Migrants - Department of Social Services, Australian Government". Retrieved 1 December 2016. 
  8. ^"Building a New Life in Australia (BNLA): The Longitudinal Study of Humanitarian Migrants - Department of Social Services, Australian Government". Retrieved 1 December 2016. 
  9. ^Colombian Longitudinal Survey by Universidad de los Andes (ELCA)
  10. ^Encuesta Longitudinal Colombiana de la Universidad de los Andes - ELCA 2013
  11. ^"Busselton Health Study - Past Projects - BPMRI". Retrieved 1 December 2016. 
  12. ^"Canadian Longitudinal Study on Aging - Canadian Longitudinal Study on Aging". Retrieved 1 December 2016. 
  13. ^Teotonio, Isabel (24 April 2012). "Landmark study on aging to follow 50,000 Canadians over the next two decades". Toronto Life. Toronto Star Newspapers Ltd. Retrieved 28 July 2014. 
  14. ^"Child Development Project - Developmental Pathways to Adjustment and Well-being in Early Adulthood - Center for Child & Family Policy - Duke University". Retrieved 1 December 2016. 
  15. ^Favela: Longitudinal Multi-Generational Study of migrants and squatters in Rio’s Favelas, 1968-2014
  16. ^"Overview of Footprints in Time - The Longitudinal Study of Indigenous Children (LSIC) - Department of Social Services, Australian Government". Retrieved 1 December 2016. 
  17. ^Growing Up in Scotland, Study design
  18. ^Studies, Australian Institute of Family. "Growing Up in Australia: The Longitudinal Study of Australian Children (LSAC) - Australian Institute of Family Studies (AIFS)". Retrieved 1 December 2016. 
  19. ^"Manitoba Follow-up Study - About The Study". Retrieved 1 December 2016. 
  20. ^Panel Study of Belgian Households
  21. ^Panel Study of Belgian Households, Survey summary
  22. ^Murphy JM, Laird NM, Monson RR, Sobol AM, Leighton AH (May 2000). "Incidence of depression in the Stirling County Study: historical and comparative perspectives". Psychol Med. 30 (3): 505–14. PMID 10883707. 
  23. ^"About the Seattle Longitudinal Study". Retrieved 1 December 2016. 
  24. ^"Wisconsin Longitudinal Study Homepage". Retrieved 1 December 2016. 
  25. ^ONS Longitudinal Study
  26. ^"Home :: SLS - Scottish Longitudinal Study Development & Support Unit". Retrieved 1 December 2016. 
  27. ^"Queen's University Belfast - NILS Research Support Unit - NILS Research Support Unit". Retrieved 1 December 2016. 

Abstract

Observational longitudinal research is particularly useful for assessing etiology and prognosis and for providing evidence for clinical decision making. However, there are no structured reporting requirements for studies of this design to assist authors, editors, and readers. The authors developed and tested a checklist of criteria related to threats to the internal and external validity of observational longitudinal studies. The checklist criteria concerned recruitment, data collection, biases, and data analysis and descriptive issues relevant to study rationale, study population, and generalizability. Two raters independently assessed 49 randomly selected articles describing stroke research published from 1999 to 2003 in six journals: American Journal of Epidemiology, Journal of Epidemiology and Community Health, Stroke, Annals of Neurology, Archives of Physical Medicine and Rehabilitation, and American Journal of Physical Medicine and Rehabilitation. On average, 17 of the 33 checklist criteria were reported. Criteria describing the study design were better reported than those related to internal validity. No relation was found between study type (etiologic or prognostic) or word count and quality of reporting. A flow diagram for summarizing participant flow through a study was developed. Editors and authors should consider using a checklist and flow diagram when reporting on observational longitudinal research.

Keywords: epidemiologic factors; longitudinal studies

Abbreviations: CONSORT, Consolidated Standards of Reporting Trials; SD, standard deviation.

Received for publication July 9, 2004; accepted for publication August 31, 2004.

Reporting requirements for clinical trials have improved substantially since the 1960s, when researchers first identified a lack of rigor (1–3). The movement toward standardized reporting has arisen from the recognition that inadequate reporting, for example, of concealed allocation, can lead to biased interpretation (4). The tangible outcome of this improvement is the revised Consolidated Standards of Reporting Trials (CONSORT) statement, comprising a 22-item checklist and flow diagram (5). CONSORT has been adopted by over 150 journals worldwide (6), and its use has been linked with improved quality of reporting of clinical trials, although inadequacies persist (7–9). Since CONSORT, other statements of reporting requirements for nonrandomized interventions (10), meta-analyses (11, 12), and diagnostic tests (13) have appeared. As yet, however, no equivalent standards for reporting observational longitudinal studies are known to exist. The strength of this design, particularly for assessing etiology and prognosis, is increasingly being recognized (14–17), as is the value of the evidence for clinical decision making (18–21).

In the absence of standard reporting guidelines, authors may refer to theoretical papers and texts describing observational longitudinal research designs (22–24). Although some of these sources provide comprehensive coverage of aspects of observational longitudinal research on which internal and external validity of results depend, others are brief. A few authors have developed checklists with which to assess the quality of reporting of articles, including observational longitudinal research (10, 25–28). These checklists differ in their coverage of elements relevant to the design of observational longitudinal research. The majority are brief or nonspecific and focus on quality judgments. The most comprehensive of these is the Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) statement (10), which is very detailed and places a particular emphasis on interventions. It provides a detailed assessment of the quality of these designs and has suggestions for better reporting. However, none of the checklists offers a simple or straightforward set of guidelines for how observational longitudinal studies should be reported. Adequate reporting is the only means by which proper interpretation can occur (20). The success of CONSORT illustrates the benefits to be gained from improved communication between authors, editors, and readers about research design fundamentals.

The aim of this study was to identify desirable elements in the reporting of observational longitudinal research, construct a CONSORT-style checklist and flow diagram, and test the checklist against published observational longitudinal research. Like other authors (3, 27), we focused on the adequacy of reporting (i.e., whether or not an aspect was reported) and did not attempt to assess quality per se. The secondary focus was to explore the likely value to editors and authors of developing a checklist and flow diagram, covering desirable reporting elements, that would help readers evaluate observational longitudinal research.

MATERIALS AND METHODS

Development of guidelines

We examined the literature on reporting of observational longitudinal research by using the MEDLINE (National Library of Medicine, Bethesda, Maryland), PSYCHLIT (American Psychological Association, Washington, DC), and CINAHL (Cinahl Information Systems, Glendale, California) online databases and by hand searching. Search terms included observational, longitudinal, prospective, follow-up, cohort, and outcomes. The literature retrieved described threats to the internal and external validity of longitudinal research and epidemiologic methods in general (e.g., Grimes and Schulz (16), McKee et al. (18), the Epidemiology Work Group of the Interagency Regulatory Liaison Group (22), Wolfe (23), Hartz and Marsh (24), Kleinbaum et al. (29), Greenland (30), Zapf et al. (31), Wolfe et al. (32), Zaccai (33), and Grimes et al. (34)). Several authors had developed their own checklists to assess reporting in epidemiologic studies, and these were reviewed (10, 25–28). Published checklists for reporting randomized (5) and nonrandomized trials (10), meta-analyses (11, 12), and reports on diagnostic accuracy (13) were also reviewed. We also examined textbooks on epidemiology (e.g., Rothman and Greenland (35) and Hennekens and Buring (36)).

A draft outline of essential elements related to threats to the internal validity of observational longitudinal research was created. A working group of nine epidemiologists, biostatisticians, and social scientists, with a wide range of qualifications, experience, and clinical interests, contributed and revised checklist criteria. For each essential element identified (e.g., selection bias), the most important criteria (descriptors) to describe an observational longitudinal study were identified (e.g., sampling frame, consent rates, loss to follow-up, item nonresponse). Through this iterative revision process, other criteria fundamental to describing observational longitudinal research adequately (e.g., setting) and to considering generalizability were added to the checklist. Criteria were to be scored as reported (yes), not reported (no), or not applicable to report. To score “yes,” each criterion must be reported in enough detail to allow the reader to judge that the definition had been met. If inadequate information about a criterion was reported, it was scored “no.” If authors referred readers to another publication for specific details about the study methods (e.g., sampling or eligibility), the criterion was scored “no.”

The draft checklist was piloted by the first two authors (L. T. and R. W.), who independently rated 10 articles describing observational longitudinal research (defined as studies in which any designated group of persons was followed or traced over a period of time) (37). Following the pilot study, the criteria were reviewed and were modified by the working group. Once the final checklist was agreed upon, it was tested on a random selection of articles describing observational longitudinal research. The clinical area of stroke was chosen as an example because it is the current field of interest of the first author. None of the other authors or members of the working group had substantive experience in stroke research. Six journals publishing epidemiology, clinical, and rehabilitation stroke research, with a range of impact factors (from 0.9 to 8.6), were chosen: American Journal of Epidemiology, Journal of Epidemiology and Community Health, Stroke, Annals of Neurology, Archives of Physical Medicine and Rehabilitation, and American Journal of Physical Medicine and Rehabilitation.

Ten articles reporting observational longitudinal research were randomly sampled from each journal. The sampling frame was every volume of the six journals published between June 1999 and June 2003 inclusive. Ten randomly generated volume/issue “pairs” (e.g., issue 3, 2002) were produced for each journal. Potentially eligible articles were identified from words such as “longitudinal,” “follow-up,” “outcomes,” “prospective,” or “observational” appearing in the title or abstract. Content eligibility was assessed by the presence of any of the words “stroke,” or “cerebrovascular accident,” or “CVA,” or “acquired brain injury,” or “infarct” coupled with a structure or hemisphere of the brain; or words illustrative of stroke symptoms, for example, “hemiplegia,” “hemiparesis,” or “neglect.” Exclusion criteria were words indicating that the study was randomized; an intervention; a case series; a case-control, cross-sectional, or retrospective study; or a systematic review. Studies of animals were also excluded. When more than one eligible article was identified in a particular volume/issue pair for a selected journal, all were numbered and one selected randomly. When a volume/issue pair had no eligible articles, a new volume/issue pair was randomly generated for the same journal. The American Journal of Epidemiology and the Journal of Epidemiology and Community Health had only three and six eligible articles, respectively, within the sampling frame, so all were included. None of the authors or the members of the working group was an author of any of the sampled publications.

Of the 49 articles selected, six were published from June to December 1999, 11 during 2000, 10 during 2001, and 11 each during 2002 and from January to June 2003. The article list is available at the following website: http://www.sph.uq.edu.au/hisdu/bias_refs.html. Each article was independently rated with the checklist by the first two authors, who then compared ratings and resolved disagreements by consensus. When disagreements could not be resolved, a third independent rater made the final judgment. Besides the rating of each article with the checklist, it was noted whether the study was primarily etiologic (n = 20), prognostic (n = 25), or both (n = 4). The text word count of each article was also estimated. The working group also drafted a summary flow diagram to represent the essential elements of participant recruitment and follow-up in observational longitudinal studies.

Statistical analysis

Descriptive statistics were computed for each checklist criterion by type of study (etiologic or prognostic), journal, and word count. Agreement between the two raters on the 33 criteria was summarized by percentage agreement, presented here by the median and quartiles. For each article, the number of criteria reported was divided by the number of relevant criteria to give a score reflecting the proportion of relevant or applicable criteria reported. For example, if 12 criteria were reported when 33 were applicable, the proportion was 0.36; if 12 criteria were reported when 31 were applicable, the proportion was 0.39. The comparison between type of study (etiologic or prognostic) and proportion of criteria reported was analyzed by using an independent-samples t test. The association between estimated word count and the proportion of criteria reported was analyzed by using Spearman’s correlation coefficient. Analyses were performed with SPSS software (version 11.5; SPSS Inc., Chicago, Illinois).
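These analyses are simple to reproduce. The following sketch uses invented per-article data (the authors used SPSS 11.5; the same tests are available in SciPy, which is assumed here):

```python
import numpy as np
from scipy import stats

# Invented example data: criteria reported and applicable per article,
# estimated word count, and whether the study was etiologic.
reported   = np.array([12, 20, 17, 25,  9, 22, 15, 18])
applicable = np.array([33, 33, 31, 33, 31, 33, 33, 32])
word_count = np.array([3200, 4100, 2800, 5000, 2600, 4400, 3600, 3900])
etiologic  = np.array([True, False, True, False, True, False, True, False])

# Proportion of applicable criteria reported, e.g. 12/33 = 0.36, 12/31 = 0.39.
proportion = reported / applicable

# Independent-samples t test: etiologic vs. prognostic articles.
t_stat, p_t = stats.ttest_ind(proportion[etiologic], proportion[~etiologic])

# Spearman correlation between word count and proportion reported.
rho, p_rho = stats.spearmanr(word_count, proportion)

print(f"t = {t_stat:.2f} (p = {p_t:.2f}); rho = {rho:.2f} (p = {p_rho:.2f})")
```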

RESULTS

Development of the checklist

The final checklist comprised 33 criteria (table 1). The definitions used for the criteria and their sources are also included in table 1. The criteria reflect design and interpretation aspects covering the study rationale and population, recruitment, measurement and biases, data analysis, and generalizability of the results. The criteria represent two principal categories: 1) aspects that could possibly influence effect estimates and 2) more descriptive or contextual elements. Not all criteria were deemed applicable to all studies. For example, in some epidemiologic studies, the investigators do not have access to data on the nonconsenting members of the target population and cannot then compare them with consenters.

Application of the checklist to the 49 articles

For the two independent raters, the median percentage agreement was 75 percent (quartiles 62 percent–93 percent). The criteria that the raters had to discuss most often were number of participants at each stage, reliability of measurement methods, validity of measurement methods, reasons for loss to follow-up at each stage, missing data items at each stage, and absolute effect sizes. The raters resolved most coding discrepancies by consensus. A third independent rater was required to make the final decision about reliability of measurement methods in three articles and validity of measurement methods in one article.

Across the 49 articles, the mean proportion of applicable criteria reported was 0.51 (standard deviation (SD), 0.15; range, 0.12–0.82). The association between type of study (etiologic or prognostic) and proportion of criteria reported was not statistically significant (t(43 df) = 0.31, p = 0.76, two sided; studies with both an etiologic and prognostic focus, n = 4, were not included). When analyzed by journal type, the mean proportions of applicable criteria reported by the journals were 0.66 (SD, 0.03) for the American Journal of Epidemiology (impact factor = 4.2), 0.57 (SD, 0.11) for the Journal of Epidemiology and Community Health (impact factor = 2.1), 0.54 (SD, 0.13) for Archives of Physical Medicine and Rehabilitation (impact factor = 1.3), 0.49 (SD, 0.13) for Stroke (impact factor = 5.1), 0.46 (SD, 0.13) for the American Journal of Physical Medicine and Rehabilitation (impact factor = 0.9), and 0.46 (SD, 0.19) for Annals of Neurology (impact factor = 8.6). We found no relation between word count and proportion of checklist criteria reported (Spearman’s correlation coefficient = 0.12, p = 0.41, two sided).

Table 2 shows the total number of articles that reported each of the 33 criteria overall and by type of study. The table also shows the total number (and percentage) of articles where it was applicable to report each of the criteria. Eleven articles had one or more criteria that were not applicable to report. Table 2 shows that “reasons for loss to follow-up at each stage,” accounting for “loss to follow-up in the analysis,” and “missing data in the analysis” were the criteria to which “not applicable” most often applied.

In total, 16 articles (nine etiologic, seven prognostic) referred the reader to another publication for methodological details. In 13 articles, this referral was accomplished directly by using wording such as “full details are reported elsewhere”; three articles were less direct, citing a reference to a previous publication that used the same data.

The best reporting was for criteria describing the study rationale and population as well as how data were collected and analyzed (each criterion reported in 45 or more articles). Qualitative and quantitative assessments of bias (30–35 articles) and confounders (38 articles) were also generally well reported. The most poorly reported criteria (reported in fewer than 10 articles each) were justification for the numbers in the study (e.g., in terms of power to detect effects), reasons for not meeting eligibility criteria, numbers consenting/not consenting, reasons for nonconsent, comparison of consenters with nonconsenters, and accounting for missing data items or loss to follow-up in analyses. Also notable was the general lack of reporting of measures of absolute effects, even though it is regularly described in epidemiology textbooks as a particular strength of observational longitudinal studies.

Development of the flow diagram

As a result of developing the checklist and rating the articles, we produced a flow diagram, modeled on CONSORT (5), to help clarify the numerical history of an observational longitudinal study (figure 1). It records the numbers of, and reasons behind, eligibility, consent, participation in each wave, and attrition. These main elements were chosen because they provide information at a glance on probable selection-driven threats to internal and external validity.

DISCUSSION

More than 20 years ago, DerSimonian et al. wrote in relation to clinical trials that “although all may not agree on our specific list of items, editors could greatly improve the reporting … by providing authors with a list of items that they expect to be strictly reported” (3, p. 1336). They pointed out that while weakness in design may occur for good reason, weakness in reporting should not occur. Their statements apply just as cogently to observational longitudinal research, and use of a checklist such as ours may be useful to help prevent weak reporting.

We have shown variable reporting of some of the major threats to the internal and external validity of observational longitudinal studies. In the articles sampled, on average about half of the 33 checklist criteria were reported, with no differences found between study type or by word count. The criteria in the checklist representing selection bias were the least frequently reported overall, although issues of measurement quality were also neglected, with fewer than half of the articles discussing either reliability or validity. These findings are concerning because if observational longitudinal studies are to be accepted as valuable sources of evidence, complete reporting is required.

Aspects of recruitment, particularly the proportion of sampled subjects meeting the eligibility criteria and then consenting to participate, were poorly reported. In addition, the reasons that people did not consent, and comparisons of consenters with nonconsenters in terms of baseline demographic or clinical features, were also typically not reported. These aspects of selection bias are potentially important; if consenters differ from nonconsenters, the study findings may be affected. Dunn et al. (38) recently showed nonconsent in five large epidemiologic studies to be about 30 percent and illustrated how nonconsenters and nonresponders can account for 30–60 percent of the original sample. They recommend that researchers plan a priori their sample sizes to account for potential losses and consider the biases likely to be associated with nonconsent and dropout.

Although the numbers of participants at each stage of a study were recorded in half the articles, accounting for loss to follow-up and missing data items in the analyses were rarely reported. Data missing not at random can be a source of bias affecting internal validity and can also influence estimates of absolute prevalence or incidence (39, 40). In this study, we assessed how missing data were handled by whether the articles described imputation, weighting, or sensitivity analyses. It is acknowledged that while authors may not statistically account for missing data in this way, they may postulate on the likely impact of missing data on results. When authors did so, it was captured under criterion 31 of the checklist: “Was the impact of biases estimated quantitatively?” Approximately 60 percent of articles acknowledged the possible quantitative impact of various biases, illustrating a general awareness by authors, or determination by editors, of the necessity for doing so. Methods for dealing with missing data in observational longitudinal research range from simple analysis of between-group differences to complex imputation techniques (41). Although debate exists about the benefits of using such imputation methods, it is at least desirable to determine the pattern of missingness, how ignorable or informative the missing data are, and the potential impact that imputation or other approaches may have on the final estimates (40).
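As an illustration of why data that are missing not at random bias estimates, consider this simulated sketch (entirely invented; not derived from the reviewed articles), in which sicker participants are more likely to be lost to follow-up:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
outcome = rng.normal(50, 10, n)  # e.g., a health score at follow-up

# Dropout probability depends on the outcome itself (missing not at random):
# lower scores (sicker participants) are more likely to be lost.
p_drop = 1.0 / (1.0 + np.exp((outcome - 40.0) / 5.0))
observed = rng.random(n) > p_drop

print(f"true mean: {outcome.mean():.1f}")
print(f"complete-case mean: {outcome[observed].mean():.1f} "
      f"({100 * (~observed).mean():.0f}% lost to follow-up)")
```

The complete-case mean overstates the population mean because the loss mechanism is informative; imputation, weighting, and sensitivity analyses are the kinds of remedies the checklist asks authors to report.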

None of the 49 articles included any justification for the sample size. An issue for many longitudinal observational studies is lack of statistical power or precision to determine real differences until sufficient follow-up time has passed to accumulate enough outcomes (42). Although the appropriateness of calculating statistical power for these research designs has been questioned (41), a priori consideration of the precision of a longitudinal study to accurately quantify the difference between effects of exposures on an outcome is desirable (35, 38).

Absolute effect sizes, defined in this study as the difference in rates of disease between groups defined by an exposure, for example, attributable risk, were also infrequently reported. Inclusion of this criterion was strongly debated by the working group because it is not relevant for all observational longitudinal studies. However, absolute effect estimates are a useful measure of association in epidemiologic research (39) and are an underutilized strength of observational longitudinal studies. In the checklist, absolute effects can be seen primarily as a descriptive criterion rather than as an element representing threats to internal validity.

About 40 percent of the articles reported the reliability and validity of instruments used. In a study of reporting of psychometric qualities of measures in 171 articles describing rehabilitation studies, Dijkers et al. (43) also found poor reporting, with reliability and validity mentioned in only 20 percent and 7 percent of articles, respectively. Having reliable and valid instruments is one of the best ways of reducing measurement bias in epidemiologic research. Requiring authors to report these psychometric properties may improve the quality of the instruments used, and the confidence with which conclusions can be drawn from the results. Obviously, this requirement is unrealistic for every measure in a long list of variables, but it is desirable to have some assessment of measurement quality for the core variables, including confounders, in a particular analysis.

Only four criteria were universally reported in the articles: the study objectives, the study population, the number of participants at the beginning, and the method of data collection. Criteria about confounding, and actions to account for confounding in the analysis, were also generally well reported (in more than 60 percent of the articles). This issue is important because confounding is one of the major limitations of nonrandomized designs such as observational longitudinal studies, and adjustment in the analysis is essential for identifying true effects.

Despite the variable reporting of actions taken to reduce bias, chance, and confounding, three quarters of the articles discussed generalizability of the results to the target population. In some cases, authors acknowledged caveats to generalizability because of limitations such as selection bias. However, it is important to recognize that generalizability should be considered only once assumptions of internal validity are satisfied.

We have shown a need for improved reporting of observational longitudinal research, through application of a reasonable set of criteria and a flow diagram. Even though the clinical example used in this study was stroke, the checklist and flow diagram are independent of topic and so are directly applicable to other fields. If authors are required to report criteria such as those listed in the present study, they may think more carefully about design and analysis issues from the beginning of the study, thus raising the overall quality of research (23, 34). Epidemiologists and biostatisticians may be more prone to report these features because of their training (44), which may partially explain why the articles in epidemiology journals in this study reported the most checklist criteria. Journal policy toward reporting observational longitudinal research can clearly contribute. A review of authors' guidelines for the six journals used in this study showed a rather low level of required detail specific to nonrandomized designs. The reporting of methodological detail about aspects that threaten internal validity is the domain of editors (and journal policy) and authors. Higher journal quality indicators, such as impact factors, have been linked to better overall reporting in randomized and nonrandomized studies (45); however, we failed to show a clear trend in this study.

We developed a flow diagram that summarizes sample selection, participant recruitment, eligibility criteria, consent and reasons for nonconsent, timing of follow-ups, and attrition at each stage. The choice of criteria to include was based on the desire to capture the key aspects that allow editors and readers to rapidly judge threats to the internal and external validity of the study, balanced with the need to keep the diagram relatively simple. Detail about the analysis was not included to avoid complicating the diagram. As expressed by Rennie, commenting on the benefits of CONSORT, “[when using a] … checklist and flow diagram, it takes a fraction of the time to get the essential information necessary to assess the quality of a trial” (46, p. 2006).

We recommend that editors move to require authors to use a structured approach to presenting the architecture of observational longitudinal research to communicate essential details about the study design. Doing so may force researchers to organize their thinking during an early stage of their research. The combination of a checklist such as ours, a flow diagram, and, ideally, a structured abstract (47) offers a starting point for consideration.

ACKNOWLEDGMENTS

Drs. L. Tooth and R. Ware were supported by a National Health and Medical Research Council of Australia Capacity Building Grant (252834).

Assistance is acknowledged from Drs. A. Barnett, Z. Clavarino, J. Najman, A. Lopez, P. Schluter, G. Williams, J. Van Der Pols, A. Mamun, and R. Alati from the Longitudinal Studies Unit at the School of Population Health, University of Queensland (URL: http://hisdu.sph.uq.edu.au/lsu/); and from Dr. A. Green from the Queensland Institute of Medical Research.

TABLE 1.

The checklist criteria, and their definitions, used to rate the articles included in the study*

Criterion | Definition
1. Are the objectives or hypotheses of the study stated? | Self-explanatory.
2. Is the target population defined? | The group of persons toward whom inferences are directed. Sometimes the population from which a study group is drawn.
3. Is the sampling frame defined? | The list of units from which the study population will be drawn. Ideally, the sampling frame would be identical to the target population, but this is not always possible.
4. Is the study population defined? | The group selected for investigation.
5. Are the study setting (venues) and/or geographic location stated? | Comment required about location of research. Could include name of center, town, or district.
6. Are the dates between which the study was conducted stated or implicit? | Self-explanatory.
7. Are eligibility criteria stated? | The words "eligibility criteria" or equivalent are needed, unless the entire population is the study population.
8. Are issues of "selection in" to the study mentioned?† | Any aspect of recruitment or setting that results in the selective choice of participants (e.g., gender or health status influenced recruitment).
9. Is the number of participants justified? | Justification of the number of subjects needed to detect anticipated effects. Evidence that power calculations were considered and/or conducted.
10. Are numbers meeting and not meeting the eligibility criteria stated? | Quantitative statement of numbers.
11. For those not eligible, are the reasons why stated? | Broad mention of the major reasons.
12. Are the numbers of people who did/did not consent to participate stated? | Quantitative statement of numbers.
13. Are the reasons that people refused to consent stated? | Broad mention of the major reasons.
14. Were consenters compared with nonconsenters? | Quantitative comparison of the different groups.
15. Was the number of participants at the beginning of the study stated? | Total number of participants (after screening for eligibility and consent) included in the first stage of data collection.
16. Were methods of data collection stated? | Descriptions of tools (e.g., surveys, physical examinations) and processes (e.g., face-to-face, telephone).
17. Was the reliability (repeatability) of measurement methods mentioned? | Evidence of reproducibility of the tools used.
18. Was the validity (against a "gold standard") of measurement methods mentioned? | Evidence that the validity was examined against, or discussed in relation to, a gold standard.
19. Were any confounders mentioned? | A confounder was defined as a variable that can cause or prevent the outcome of interest, is not an intermediate variable, and is associated with the factors under investigation.
20. Was the number of participants at each stage/wave specified? | Quantitative statement of numbers at each follow-up point.
21. Were reasons for loss to follow-up quantified? | Broad mention and quantification of the major reasons.
22. Was the missingness of data items at each wave mentioned? | Differences in numbers of data points (indicating missing data items) explained.
23. Was the type of analyses conducted stated? | Specific statistical methods mentioned by name.
24. Were "longitudinal" analysis methods stated? | Longitudinal analyses were defined as those assessing change in an outcome over two or more time points that take into account the fact that the observations are likely to be correlated.
25. Were absolute effect sizes reported? | Absolute effect was defined as the outcome of an exposure expressed, for example, as the difference between rates, proportions, or means, as opposed to the ratios of these measures.
26. Were relative effect sizes reported? | Relative effects were defined as a ratio of rates, proportions, or other measures of an effect.
27. Was loss to follow-up taken into account in the analysis? | Specific mention of adjusting for, or stratifying by, loss to follow-up.
28. Were confounders accounted for in analyses? | Specific mention of adjusting for, or stratifying by, confounders.
29. Were missing data accounted for in the analyses? | Specific mention of adjusting for, stratifying by, or imputation of missing data items.
30. Was the impact of biases assessed qualitatively? | Specific mention of bias affecting results, but magnitude not quantified.
31. Was the impact of biases estimated quantitatively? | Specific mention of the numerical magnitude of bias.
32. Did authors relate results back to a target population? | A study is generalizable if it can produce unbiased inferences regarding a target population (beyond the subjects in the study). Discussion could include that generalizability is not possible.
33. Was there any other discussion of generalizability? | Discussion of generalizability beyond the target population.