Forecasting Papers

These papers deal primarily with the methodological and practical applications of event data to various forecasting models. The methodologies employed in this scholarship include hidden Markov models and cluster analysis, among others.

Predicting Risk Factors Associated with Forced Migration: An Early Warning Model of Haitian Flight

Stephen Shellman (University of Georgia) and Brandon Stewart (William and Mary)

While most forced migration studies focus on explanation, this study focuses on prediction. The study predicts forced migration events by predicting the civil violence, poor economic conditions, and foreign interventions known to cause individuals to flee their homes in search of refuge. By accounting for the interaction between civil conflict intensity levels, the ebb and flow of origin and potential host countries' economies, and impinging foreign policy pressures on countries' governments and dissidents, the model can better predict the occurrence and magnitude of forced migration events. Policy makers can use these predictions to aid their planning for humanitarian crises. While the study is limited to predicting Haitian flight to the United States, its strength is its ability to predict weekly flows as opposed to annual flows, providing a greater level of predictive detail than its "country-year" counterparts. Given the model's performance, the study calls for the collection of disaggregated data in additional countries to provide more precise and useful early warning models of forced migrant events.

Paper prepared for delivery at the Annual Meeting of the International Studies Association, San Diego, March 2006.

Link to Adobe .pdf file of the paper

Forecasting Israeli-Palestinian Conflict with Hidden Markov Models

Robert Shearer (Center for Army Analysis)

This paper presents research into conflict analysis, utilizing Hidden Markov models to capture the patterns of escalation in a conflict and Markov chains to forecast future escalations. Hidden Markov models have an extensive history in a wide variety of pattern classification applications. In these models, an unobserved finite state Markov chain generates observed symbols whose distribution is conditioned on the current state of the chain. Training algorithms estimate model parameters based upon known patterns of symbols. Assignment rules classify unknown patterns according to the likelihood of known models generating the observed symbols. The research presented here utilized much of the Hidden Markov model methodology, but not for pattern classification, rather to identify the underlying finite state Markov chain for a symbol realization. Machine coded newswire story leads provided event data that served as the symbol realization for the Hidden Markov model. Fundamental matrices derived from the Markov chain led to forecasts that provide insight into the dynamic behavior of the conflict and describe potential futures of the conflict in probabilistic terms, to include the likelihood of conflict, the time to conflict, and the time in conflict.
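To make the role of the fundamental matrix concrete, here is a minimal sketch of how a "time to conflict" and a "time in conflict" can be read off a Markov chain. The three-state chain (calm, tension, conflict) and its transition probabilities are invented for illustration and are not taken from the paper; the function names are likewise hypothetical. The time to conflict is computed by treating the conflict state as absorbing, so that the expected number of steps is the row sum of the fundamental matrix (I - Q)^-1.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def expected_steps_to(P, target):
    """Expected number of steps until `target` is first reached,
    for every other starting state (target treated as absorbing)."""
    others = [s for s in range(len(P)) if s != target]
    # I - Q, where Q is P restricted to the non-target states.
    i_minus_q = [[(1.0 if r == c else 0.0) - P[r][c] for c in others]
                 for r in others]
    # (I - Q)^-1 applied to a vector of ones = row sums of the
    # fundamental matrix.
    steps = solve(i_minus_q, [1.0] * len(others))
    return dict(zip(others, steps))


def expected_stay(P, state):
    """Expected length of one uninterrupted stay in `state`."""
    return 1.0 / (1.0 - P[state][state])


# Hypothetical weekly transition matrix; rows and columns are
# 0 = calm, 1 = tension, 2 = conflict.
P = [[0.8, 0.2, 0.0],
     [0.3, 0.5, 0.2],
     [0.0, 0.4, 0.6]]

time_to_conflict = expected_steps_to(P, target=2)
time_in_conflict = expected_stay(P, state=2)
```

With these made-up probabilities the expected time to conflict is 17.5 weeks from calm and 12.5 weeks from tension, and a spell of conflict lasts 2.5 weeks on average.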

Link to Adobe .pdf file of the paper

A New Kind of Social Science: The Path Beyond Current (IR) Methodologies May Lie Beneath Them

Valerie M. Hudson, Philip A. Schrodt, and Ray D. Whitmer

Existing formal models of political behavior have followed the lead of the natural sciences and generally focused on methods that use continuous-variable mathematics. Stephen Wolfram has recently produced an extended critique of that approach in the natural sciences, and suggested that a great deal of natural behavior can be accounted for using rules that involve discrete patterns. Wolfram's work generally does not consider models in the social sciences but given the similarity between many of the techniques for modeling in the natural and social sciences, his critique can readily be applied to models of social behavior as well. We argue further that pattern-based models are particularly relevant to modeling human behavior because human cognitive abilities are far more developed in the domain of pattern recognition than in the domain of continuous-variable mathematics. We test the possibility of finding pattern-based behavior in international behavior by looking at event data for the Israel-Palestine conflict for the period 1979-2003. Using a new web-based tool explicitly designed for the analysis of event data patterns, we experiment with three general patterns: the classic tit-for-tat, an "olive branch" pattern designed to detect attempts at de-escalation, and four "meta-rules" that look at the relationship between prior conflict and the propensity of the actors to engage in reciprocal behavior. Our analysis shows that these patterns can be found repeatedly in the data, and their frequency corresponds to changes in the qualitative characteristics of the conflict.
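As a rough illustration of what a discrete pattern search over event data can look like, the sketch below counts tit-for-tat exchanges: an event from one actor toward another that is answered in kind by the target within a few days. The event record layout, the two-way conflict/cooperation coding, the three-day window, and the actor codes are all assumptions made for the example; the paper's own pattern definitions and web-based tool may differ.

```python
from typing import NamedTuple


class Event(NamedTuple):
    day: int      # day index of the report
    source: str   # actor initiating the event
    target: str   # actor receiving the event
    kind: str     # "conflict" or "cooperation" (simplified coding)


def count_tit_for_tat(events, window=3):
    """Count events that the target answers in kind within `window` days."""
    events = sorted(events)  # NamedTuples sort by day first
    count = 0
    for i, first in enumerate(events):
        for reply in events[i + 1:]:
            if reply.day - first.day > window:
                break  # later events are outside the window
            if (reply.day > first.day
                    and reply.source == first.target
                    and reply.target == first.source
                    and reply.kind == first.kind):
                count += 1
                break  # at most one reply per initiating event
    return count


# Illustrative data only.
events = [
    Event(1, "ISR", "PAL", "conflict"),
    Event(2, "PAL", "ISR", "conflict"),
    Event(3, "ISR", "PAL", "conflict"),
    Event(10, "PAL", "ISR", "cooperation"),
    Event(11, "ISR", "PAL", "cooperation"),
    Event(20, "ISR", "PAL", "conflict"),
]
matches = count_tit_for_tat(events, window=3)
```

Counting such matches per month or per quarter gives a frequency series that could then be compared against qualitative phases of a conflict.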

Paper prepared for delivery at the Annual Meeting of the International Studies Association, Montreal, Quebec, Canada, March 2004.

Link to Adobe .pdf file of the paper

The analytical web site for this project (formerly at http://ep.jhax.org), which includes the graphic tools for analyzing event patterns, is http://nkss.byu.edu/.

Using Event Data to Monitor Contemporary Conflict in the Israel-Palestine Dyad

Philip A. Schrodt, Deborah J. Gerner, and Ömür Yilmaz

For the past eighteen months, the Kansas Event Data System (KEDS) project has been using event data and other web-based sources to produce quarterly reports on the Israel-Palestine conflict for the swisspeace (Swiss Peace Foundation) FAST Project, which is sponsored by Swiss Agency for Development and Cooperation and a number of non-governmental organizations. This paper describes the indicators that we are monitoring, the process we have developed to generate the reports, and the supplemental sources we are using. We address the issue of the differences between newspaper and news wire reports with respect to "media fatigue" effects and also analyze some of the strengths and weaknesses of this approach to conflict monitoring.

Paper prepared for delivery at the Annual Meeting of the International Studies Association, Montreal, Quebec, Canada, March 2004.

Forthcoming in International Studies Perspectives, 2005.

Link to Adobe .pdf file of the paper

Link to detailed list of steps used to update the FAST data

Forecasts and Contingencies: From Methodology to Policy

Philip A. Schrodt

A "folk criticism" in political science maintains that the discipline should confine its efforts to explanation and avoid venturing down the dark, dirty, and dangerous path to forecasting and prediction. I argue that not only is this position inconsistent with the experiences of other sciences, but in fact the questions involved in making robust and valid predictions invoke many core methodological issues in political analysis. Those issues include, among others, the question of the level of predictability in political behavior, the problem of case selection in small-N situations, and the various alternative models that could be used to formalize predictions. This essay focuses on the problem of forecasting in international politics, and concludes by noting some of the problems of institutional culture - bureaucratic and academic - that have inhibited greater use of systematic forecasting methods in foreign policy.

Paper presented at the theme panel "Political Utility and Fundamental Research: The Problem of Pasteur's Quadrant" at the American Political Science Association meetings, Boston, 29 August - 1 September 2002

Link to Adobe .pdf file of the paper

Forecasting Conflict in the Balkans using Hidden Markov Models

Philip A. Schrodt

This study uses hidden Markov models (HMM) to forecast conflict in the former Yugoslavia for the period January 1991 through January 1999. The political and military events reported in the lead sentences of Reuters news service stories were coded into the World Events Interaction Survey (WEIS) event data scheme. The forecasting scheme involved randomly selecting eight 100-event "templates" taken at a 1-, 3- or 6-month forecasting lag for high-conflict and low-conflict weeks. A separate HMM is developed for the high-conflict-week sequences and the low-conflict-week sequences. Forecasting is done by determining whether a sequence of observed events fits the high-conflict or low-conflict model with higher probability.
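The classification step described above can be sketched as follows: compute the log-likelihood of an observed event sequence under each of two HMMs with the forward algorithm, and forecast whichever model fits better. The two-state models, the three-symbol coding (0 = neutral, 1 = cooperation, 2 = conflict), and every probability below are placeholders invented for the example; the study itself works with WEIS categories and estimated parameters.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n))
                     * emit[s][obs[t]] for s in range(n)]
        # Rescale at every step to avoid underflow on long sequences;
        # the product of the scale factors is P(obs | model).
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


# Hypothetical models; real parameters would be estimated from
# template sequences.
HIGH = dict(start=[0.5, 0.5],
            trans=[[0.7, 0.3], [0.2, 0.8]],
            emit=[[0.3, 0.2, 0.5], [0.1, 0.1, 0.8]])
LOW = dict(start=[0.5, 0.5],
           trans=[[0.8, 0.2], [0.3, 0.7]],
           emit=[[0.6, 0.3, 0.1], [0.3, 0.6, 0.1]])


def forecast(obs):
    """Return 'high' if the high-conflict model fits better, else 'low'."""
    high = log_likelihood(obs, **HIGH)
    low = log_likelihood(obs, **LOW)
    return "high" if high > low else "low"
```

For instance, `forecast([2, 2, 2, 2])` returns `'high'` and `forecast([0, 1, 0, 1])` returns `'low'` with these placeholder parameters.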

Models were selected to maximize the difference between correct and incorrect predictions, evaluated by week. Three weighting schemes were used: unweighted (U), penalize false positives (P) and penalize false negatives (N). There is a relatively high level of convergence in the estimates -- the best and worst models of a given type vary in accuracy by only about 15% to 20%. In full-sample tests, the U and P models produce an overall accuracy of around 80%. However, these models correctly forecast only about 25% of the high-conflict weeks, although about 60% of the cases where a high-conflict week has been forecast turn out to have high conflict. In contrast, the N model has an overall accuracy of only about 50% in full-sample tests, but it correctly forecasts high-conflict weeks with 85% accuracy in the 3- and 6-month horizon and 92% accuracy in the 1-month horizon. However, this is achieved by excessive predictions of high-conflict weeks: only about 30% of the cases where a high-conflict week has been forecast are high-conflict. Models that use templates from only the previous year usually do about as well as models based on the entire sample.
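The three figures quoted for each model are different quantities: overall accuracy, the share of high-conflict weeks that were forecast (recall), and the share of high-conflict forecasts that were right (precision). The sketch below computes them from weekly labels and adds one possible form of a weighted selection score. The data are a ten-week toy example, not results from the study, and the scoring formula is an assumption about how penalties on false positives or false negatives might be expressed.

```python
def forecast_metrics(actual, predicted):
    """Accuracy, recall and precision for binary weekly forecasts
    (1 = high-conflict week, 0 = low-conflict week)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual high-conflict weeks that were forecast.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of high-conflict forecasts that turned out correct.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "counts": (tp, tn, fp, fn),
    }


def selection_score(tp, tn, fp, fn, fp_weight=1.0, fn_weight=1.0):
    """Correct minus (weighted) incorrect predictions. Raising
    fp_weight or fn_weight is an assumed stand-in for the P and N
    weighting schemes."""
    return (tp + tn) - fp_weight * fp - fn_weight * fn


# Toy labels for ten weeks.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = forecast_metrics(actual, predicted)
```

Here accuracy is 0.6, recall 0.25 and precision 0.5, which shows how a model can look acceptable on overall accuracy while missing most high-conflict weeks.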

The models are remarkably insensitive to the length of the forecasting horizon -- the drop-off in accuracy at longer forecasting horizons is very small, typically around 2%-4%. There is also no clear difference in the estimated coefficients for the 1-month and 6-month models. An extensive analysis was done of the coefficient estimates in the full-sample model to determine what the model was "looking at" in order to make predictions. While a number of statistically significant differences exist between the high and low conflict models, these do not fall into any neat patterns. This is probably due to a combination of the large number of parameters being estimated, the multiple local maxima in the estimation surface, and the complications introduced by the presence of a number of very low probability event categories. Some experiments with simplified models indicate that it is possible to use models with substantially fewer parameters without markedly decreasing the accuracy of the predictions; in fact predictions of the high conflict periods actually increase in accuracy quite substantially.

Paper presented at the American Political Science Association meetings, 31 August - 3 September, 2000

Link to Adobe .pdf version of this paper

The Impact of Early Warning on Institutional Responses to Complex Humanitarian Crises

Philip A. Schrodt and Deborah J. Gerner

This paper considers the problems of institutional response to the early warning of complex humanitarian crises (CHCs). We start with a typology of six different modes of early warning failure: strategic deception, conventional concealment, institutional ignorance, reflexive response, exogenous shifts, and systemic complexity. We discuss the extent to which each of these can affect the early warning of CHCs. We then consider the problems of cognitive, bureaucratic, and political constraints to effective early warning. The paper concludes that the early warning of CHCs is likely to remain decentralized in academic, nongovernmental (NGO) and intergovernmental (IGO) projects, but that because of increases in the availability of information this decentralization does not necessarily preclude effective early warning, and may in fact enhance it. There is, however, a need to augment the credibility, visibility, and efficacy of these efforts, as is being done through efforts such as Forum for Early Warning and Emergency Response (FEWER) and ReliefWeb.

Paper presented at the Third Pan-European International Relations Conference and Joint Meeting with the International Studies Association, Vienna, 16 - 19 September 1998

Link to Adobe .pdf version of this paper

Early Warning of Conflict in Southern Lebanon using Hidden Markov Models

Philip A. Schrodt
in Harvey Starr, ed. The Understanding and Management of Global Violence: New Approaches to Theory and Research on Protracted Conflict, pp. 131-162.
New York: St. Martin's Press, 1999.

This paper extends earlier work on the use of hidden Markov models (HMMs) to the problem of forecasting international conflict. HMMs are a sequence comparison method that is widely used in computerized speech recognition; they can easily be adapted to work with sequences of international event data. The HMM is a computationally efficient method of generalizing a set of example sequences observed in a noisy environment. The paper provides a theoretical "micro-foundation" for the sequence comparison approach based on co-adaptation of standard operating procedures.

The left-right (LR) HMM used in speech recognition problems is first extended to a left-right-left (LRL) model that allows a crisis to escalate and de-escalate. This model is tested for its ability to correctly discriminate between Behavioral Correlates of War (BCOW) crises that do and do not involve war. The LRL model provides slightly more accurate classification than the LR model. The interpretation of the hidden states in the LRL models, however, is more ambiguous than found in the LR model.
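The difference between the two topologies lies in which entries of the state transition matrix are allowed to be non-zero. The sketch below builds starting matrices for both, assuming that the LRL model simply adds a transition back to the previous state; the exact topology used in the chapter may differ, and the function name is hypothetical.

```python
def transition_template(n_states, allow_back):
    """Row-stochastic starting matrix for a left-right HMM.

    Each state may stay put or advance one state; with allow_back=True
    it may also retreat one state (the left-right-left variant as
    assumed here). Allowed moves start with equal probability.
    """
    matrix = []
    for i in range(n_states):
        allowed = [i]
        if i + 1 < n_states:
            allowed.append(i + 1)      # escalate
        if allow_back and i > 0:
            allowed.append(i - 1)      # de-escalate
        p = 1.0 / len(allowed)
        matrix.append([p if j in allowed else 0.0
                       for j in range(n_states)])
    return matrix


lr = transition_template(3, allow_back=False)
lrl = transition_template(3, allow_back=True)
```

Entries that start at zero remain zero under Baum-Welch re-estimation, so the template fixes the topology while the data determine the remaining probabilities.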

The HMM is then applied to the problem of forecasting the outbreak of armed violence between Israel and Arab forces in south Lebanon during the period 1979 to 1997 (excluding 1982-1985). An HMM first is estimated using six cases of "tit-for-tat" escalation, then fitted to the entire time period. The model identifies about half of the TFT conflicts -- including all of the training cases -- that occur in the full sequence, with only one false positive. This result suggests that HMMs could be used in an event-driven continuous monitoring system. However, the fit of the model is very sensitive to the number of nonevents found in a sequence, and consequently that measure is ineffective as an early warning indicator.

In a subset of models the maximum likelihood estimate of the sequence of hidden Markov states characterizing a sequence provides a robust early warning indicator with a three to six-month lead. These models are valid in a split-sample test, and the patterns of cross-correlation of the individual states of the model are consistent with the theoretical expectations. While this approach clearly needs further validation, it appears promising.
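The maximum likelihood sequence of hidden states is conventionally obtained with the Viterbi algorithm. A minimal sketch follows, using a two-state left-right model whose states, symbols and probabilities are invented for the example; in this toy setting, the point at which the decoded path enters the later state plays the role of the warning signal.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most likely hidden-state path for a discrete HMM (log space)."""
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    n = len(start)
    score = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: state 0 = routine, state 1 = pre-crisis;
# symbol 0 = non-hostile event, symbol 1 = hostile event.
start = [1.0, 0.0]
trans = [[0.9, 0.1], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.2, 0.8]]

path = viterbi([0, 0, 0, 1, 1, 1], start, trans, emit)
```

For this sequence the decoded path is `[0, 0, 0, 1, 1, 1]`: the model switches to the pre-crisis state at the fourth observation.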

The paper concludes with observations on the extent to which the HMM approach can be generalized to various categories of conflict, some suggestions on how the method of estimation can be improved, and the implications that sequence-based forecasting techniques have for the theoretical understanding of the causes of conflict.

This paper was presented at the annual meetings of the American Political Science Association, Washington, DC, August 1997.

Link to supplementary graphics

Link to Adobe .pdf version of "A Landscape Model of Rule-Based Co-Adaptation in International Behavior"

Left-right-left hidden Markov model source code and data files (.sit)
Left-right-left hidden Markov model source code and data files (.zip)

Pattern Recognition of International Crises using Hidden Markov Models

Philip A. Schrodt
in Diana Richards, ed. Political Complexity: Nonlinear Models of Politics, pp. 296-328.
Ann Arbor: University of Michigan Press, 2000.

Event data are one of the most widely used indicators in quantitative international relations research. To date, most of the models using event data have constructed numerical indicators based on the characteristics of the events measured in isolation and then aggregated. An alternative approach is to use quantitative pattern recognition techniques to compare an existing sequence of behaviors to a set of similar historical cases. This has much in common with human reasoning by historical analogy while providing the advantages of systematic and replicable analysis possible using machine-coded event data and statistical models.

This chapter uses "hidden Markov models" -- a recently developed sequence-comparison technique widely used in computational speech recognition -- to measure similarities among international crises. The models are first estimated using the Behavioral Correlates of War data set of historical crises, then applied to an event data set covering political behavior in the contemporary Middle East for the period April 1979 through February 1997.

A split-sample test of the hidden Markov models perfectly differentiates crises involving war from those not involving war in the cases used to estimate the models. The models also provide a high level of discrimination in a set of test cases not used in the estimation, and most of the erroneously-classified cases have plausible distinguishing features. The difference between the war and nonwar models also correlates significantly with a scaled measure of conflict in the contemporary Middle East. This suggests that hidden Markov models could be used to develop conflict measures based on event similarities to historical conflicts rather than on aggregated event scores.

The file includes the paper in Postscript, MS-Word 5.1a (Macintosh) and Adobe Acrobat (.pdf) formats. It also includes the C source code for the program used to estimate the model, and pointers to the data sets used in the analysis.

An earlier version of the paper was presented in March 1997 at the annual meeting of the International Studies Association and at the "Synergy in Early Warning" Conference, Centre for Refugee Studies, York University, Toronto.

Link to Adobe .pdf version of this paper

Using Cluster Analysis to Derive Early Warning Indicators for Political Change in the Middle East, 1979-1996

Philip A. Schrodt and Deborah J. Gerner
American Political Science Review 94,4: 803-818 (December 2000)

This paper uses event data to develop an early warning model of major political change in the Levant for the period April 1979 to December 1998. Following a general review of statistical early warning research, the analysis focuses on the behavior of eight Middle Eastern actors -- Egypt, Israel, Jordan, Lebanon, the Palestinians, Syria, the United States and USSR/Russia -- using WEIS-coded event data generated from Reuters news service lead sentences with the KEDS machine-coding system.

The analysis extends earlier work (Schrodt and Gerner 1995) demonstrating that clusters of behavior identified by conventional statistical methods correspond well with changes in political behavior identified a priori. We employ a new clustering algorithm that uses the correlation between the dyadic behaviors at two points in time as a measure of distance, and identifies cluster breaks as those time points that are closer to later points than to preceding points. We also demonstrate that these data clusters begin to "stretch" prior to breaking apart; this characteristic can be used as an early-warning indicator. A Monte-Carlo analysis shows that the clustering and early warning measures perform very differently in simulated data sets having the same mean, variance, and autocorrelation as the observed data (but no cross-correlation), which reduces the likelihood that the observed clustering patterns are due to chance.
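One way to read the clustering rule is sketched below. Each month is represented by a vector of dyadic behavior scores, the distance between two months is taken to be one minus their Pearson correlation, and a month is flagged as a cluster break when it is much farther from the preceding months than from the following ones. The window length k, the threshold, the distance formula, and the data are all assumptions made for the illustration, not the authors' specification.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def lead_lag_gap(series, t, k):
    """Mean distance from month t to the k preceding months minus
    the mean distance to the k following months."""
    def dist(s):
        return 1.0 - pearson(series[t], series[s])

    before = sum(dist(s) for s in range(t - k, t)) / k
    after = sum(dist(s) for s in range(t + 1, t + k + 1)) / k
    return before - after


def cluster_breaks(series, k=2, threshold=1.5):
    """Months that sit far closer to what follows than to what precedes.
    Windows that straddle a break give intermediate gaps, hence the
    fairly high default threshold."""
    return [t for t in range(k, len(series) - k)
            if lead_lag_gap(series, t, k) > threshold]


# Toy data: eight months, four dyads; the pattern reverses at month 4.
series = [
    [1, 2, 3, 4], [1, 2, 3, 5], [2, 2, 3, 4], [1, 3, 3, 4],
    [4, 3, 2, 1], [5, 3, 2, 1], [4, 3, 2, 2], [4, 3, 1, 1],
]
breaks = cluster_breaks(series, k=2, threshold=1.5)
```

On the toy data only month 4 is flagged. A rising gap in the months before a break would be one candidate for the "stretching" signal, although that is not implemented here.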

The initial analysis uses Goldstein's (1992) weighting system to aggregate the WEIS-coded data. In an attempt to improve on the Goldstein scale, we use a genetic algorithm to optimize the weighting of the WEIS event categories for the purpose of clustering. This does not prove very successful and only differentiates clusters in the first half of the data set, a result similar to one we obtained using the cross-sectional K-Means clustering procedure. Correlating the frequency of events in the twenty-two 2-digit WEIS categories, on the other hand, gives clustering and early warning results similar to those produced by the Goldstein scale. The paper concludes with some general remarks on the role of quantitative early warning and directions for further research.

Paper originally presented at the American Political Science Association, San Francisco, 28 August - 1 September 1996

Link to Adobe .pdf file of the original paper

Link to Adobe .pdf file of the APSR version of the paper

The Statistical Characteristics of Event Data

Philip A. Schrodt
International Interactions 20,1-2: 35-53

This paper explores event data as an abstract statistical object. It briefly traces the historical development of event data, with particular attention to how nominal events have come to be used primarily in interval-level studies. A formal definition of event data and its stochastic error structure is presented. From this definition, some concrete suggestions are made for statistically compensating for misclassification and censoring errors in frequency-based studies. The paper argues for returning to the analysis of events as discrete structures. This type of analysis was not possible when event data were initially developed, but electronic information processing capabilities have improved dramatically in recent years and many new techniques for generating and analyzing event data may soon be practical.

Paper originally presented at the International Studies Association, St. Louis, March 1988

Link to Adobe .pdf file of the original paper

LAST UPDATED: 4 APRIL 2006