PDI Models Section
The Models section of your PDI account includes predictive models or scores, other data such as Census or geographic-based data, and action-based or observed data.
Note: In addition to the PDI models and other data described below, your account could include your campaign’s own custom models or data.
PDI Models
We built these models to improve targeting of voters who are otherwise hard to differentiate, and to distinguish key attributes that influence political behavior. PDI models (like most models) work best when used to supplement or refine other available data, or when no other data is available. For example, if you are trying to identify degrees of partisanship or ideology, start with the actual party of the voters you want to further distinguish, such as including only INDEPENDENT voters or excluding all DemPlus and RepPlus voters. Then use the partisanship or ideology model score to segment the remaining voters whose partisanship is unknown.
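The two-step approach described above (filter on known party first, then segment the remainder by model score) can be sketched as follows. This is a minimal illustration only: the record layout, the field names `party` and `model_score`, and the cut point of 50 are assumptions made for the example, not PDI's export format or a recommended threshold.

```python
# Hypothetical voter records; field names and values are illustrative only.
voters = [
    {"id": 1, "party": "DEM", "model_score": 20},
    {"id": 2, "party": "INDEPENDENT", "model_score": 35},
    {"id": 3, "party": "INDEPENDENT", "model_score": 80},
    {"id": 4, "party": "REP", "model_score": 90},
]

def segment_unknown_partisans(records, party="INDEPENDENT", cut=50):
    """Step 1: keep only voters whose registration leaves partisanship unknown.
    Step 2: split that pool by model score at an assumed cut point."""
    pool = [r for r in records if r["party"] == party]
    low = [r["id"] for r in pool if r["model_score"] < cut]
    high = [r["id"] for r in pool if r["model_score"] >= cut]
    return low, high

low_ids, high_ids = segment_unknown_partisans(voters)
```

Voters with a known party (ids 1 and 4 here) never reach the scoring step, so the model score is only used where registration data cannot answer the question.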
PDI Ideology Liberal / Conservative
This model identifies how conservative or liberal a voter is based on a number of factors. The analysis begins with the survey responses, but then also includes elements such as the actual party registration, household partisan composition using PDI's 27 different household party type codes, and precinct level election outcomes in key past elections.
Category | Count | Percentage (%) |
Very Liberal | 962,040 | 4.47% |
Liberal | 7,522,419 | 34.99% |
Moderate | 7,856,581 | 36.54% |
Conservative | 4,821,438 | 22.43% |
Very Conservative | 337,617 | 1.57% |
To use this model in PDI, select a score from 1 to 100, where 1 is the most liberal and 100 is the most conservative. As can be seen in the following chart, the median result is a more liberal voter; however, there is a second hump in the data showing a strong population of conservative, but not very conservative, voters. A score below 42 represents someone classified as liberal, and a score over 72 represents someone classified as conservative. Within the range of 42-72, where voters are categorized as moderate, scores from 42 to 57 lean liberal and scores from 57 to 72 lean conservative.
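As a rough sketch, the score ranges above can be expressed as a small lookup function. Two points are assumptions rather than documented behavior: a score of exactly 57 is placed in the lean-conservative half (the two stated ranges overlap at 57), and anything outside 1-100 is treated as unscored. The cutoffs for Very Liberal and Very Conservative are not stated, so the sketch does not attempt them.

```python
def ideology_band(score):
    """Map a 1-100 ideology score to the bands described in the text."""
    if score is None or not 1 <= score <= 100:
        return "Unscored"  # assumption: out-of-range values are not classified
    if score < 42:
        return "Liberal"
    if score > 72:
        return "Conservative"
    if score < 57:
        return "Moderate (leans liberal)"
    return "Moderate (leans conservative)"  # assumption: 57 falls here

bands = [ideology_band(s) for s in (10, 42, 57, 72, 73)]
```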
PDI Partisanship of Independent Voters
The current PDI system includes party registration and a set of party descriptions called DemPlus and RepPlus. These capture both the party's registrants and people who have donated to, pulled ballots for, or previously been registered as Democratic (DemPlus) or Republican (RepPlus).
This model builds on DemPlus and RepPlus by allowing campaigns to target independents who are modeled to vote primarily with Democrats or Republicans.
This model uses some of the same building blocks, then adds household partisan makeup, ethnicity, age, registration date, and voter surveys in which independent voters were asked whether they primarily sided with Democrats or Republicans.
Category | Count | Percentage (%) |
Mostly or Always Democrats | 9,672,898 | 42.67% |
Usually through Mostly Democrats | 1,887,644 | 8.33% |
True Swing Voter | 3,046,306 | 13.44% |
Usually through Mostly Republicans | 2,594,097 | 11.44% |
Mostly or Always Republicans | 5,467,880 | 24.12% |
PDI Children in Household/Likely Parent
To use this model in PDI, select all scores greater than 50 for households with school-aged children under 18, and all scores less than 50 for households without children under 18. The closer a score is to either extreme, the more confidence there should be that children are (or are not) in the household. Because this is a binary model, we recommend using 50 as the cut point. You can also use the “Likely Parents” option in the Demographics tab of PDI to select the same voters.
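The cut-point rule can be sketched as below. The function name and return shape are hypothetical, and treating a score of exactly 50 as undetermined is an assumption, since the text assigns only scores above and below 50. The second value returned is the distance from the cut point, a simple stand-in for the idea that scores nearer the extremes carry more confidence.

```python
def likely_parent(score, cut=50):
    """Return (label, margin), where margin is the distance from the cut point."""
    if score > cut:
        label = "children under 18"
    elif score < cut:
        label = "no children under 18"
    else:
        label = "undetermined"  # assumption: the text does not assign exactly 50
    return label, abs(score - cut)

examples = [likely_parent(s) for s in (85, 50, 10)]
```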
PDI Support for Abortion/Choice (2023-2024)
This probabilistic model measures how likely someone is to respond on a survey that they support Pro-Choice policies. Importantly, scores at either extreme do not necessarily reflect intensity of support; they reflect our confidence in the voter’s propensity to support Pro-Choice policies. For example, scores of 75 are not necessarily more pro-choice than scores of 60; we simply have more confidence that they are pro-choice. As a cutoff for messaging, we recommend starting with a cut point of 50 to exclude anti-abortion voters from paid media outreach.
NOTE: If a voter has a “0” value or a null value, it is because they could not be scored for some reason. One possible reason is that the voter record is new and was added after the scores were completed.
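A minimal sketch of applying the recommended cut point while skipping unscored records might look like this. The field name `choice_score` and the sample records are invented for illustration, and including a score of exactly 50 in the audience is an assumption.

```python
# Hypothetical records; "choice_score" is an illustrative field name.
voters = [
    {"id": 1, "choice_score": 75},
    {"id": 2, "choice_score": 60},
    {"id": 3, "choice_score": 30},
    {"id": 4, "choice_score": 0},     # unscored
    {"id": 5, "choice_score": None},  # unscored
]

def paid_media_audience(records, cut=50):
    """Keep scored voters at or above the cut point; skip 0/null scores."""
    audience = []
    for r in records:
        score = r.get("choice_score")
        if not score:  # 0 or None means the voter could not be scored
            continue
        if score >= cut:
            audience.append(r["id"])
    return audience

audience_ids = paid_media_audience(voters)
```

Voters 1 and 2 both land in the audience. Per the text, the gap between 75 and 60 says more about confidence than about how strongly either voter holds the position.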
Deep dive into PDI model scores
On a technical level, all of our models are built in a similar way. Since 2016, PDI has continuously run large email surveys via the PDI Emailing module and the SurveyMonkey platform, giving us access to over 200,000 voter-file matched responses. We then take the relevant questions from the surveys, along with the respondents’ corresponding voter file data, and fit machine learning models to figure out which variables are most predictive of their responses. After validating these models on a holdout sample of survey responses (i.e., some survey responses that are intentionally left out of the modeling process in order to measure how predictive these models are), we then score the entire voter file with these models.
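The holdout idea can be illustrated with a few lines of plain Python. This is a generic sketch of a random train/holdout split, not PDI's actual pipeline; the 20% holdout fraction, the fixed seed, and the function name are all assumptions.

```python
import random

def split_holdout(responses, holdout_fraction=0.2, seed=0):
    """Shuffle survey responses and set a fraction aside for validation.

    The holdout rows are never shown to the model during fitting, so
    accuracy measured on them estimates performance on unseen voters.
    """
    shuffled = list(responses)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Stand-in for matched survey responses (here just row numbers).
train_rows, holdout_rows = split_holdout(range(10))
```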
In order to understand how these models work, it’s useful to crack them open and look at individual examples. Using a methodology developed by researchers at the University of Washington known as SHAP, we can examine the individual “contribution” of each variable to each individual’s score. For an illustrated example, here’s a plot explaining the author’s partisanship score.
This seems fairly obvious -- the author is a registered Democrat, who lives with another Democrat, in a dense urban city. We have pretty high confidence this is a Liberal!
Other Data found in the Models Section
CalEnviroScreen
The CalEnviroScreen was created by the California OEHHA (Office of Environmental Health Hazard Assessment) as a screening methodology that can be used to help identify California communities that are disproportionately burdened by multiple sources of pollution.
It is essentially a score identifying the census tracts where people come into the most contact with pollution, that is, the most polluted areas in the state. The higher the score, the more polluted the area the voter lives in. Our version of this data uses the CES Score field rounded to the nearest whole number, 0 through 99.
For more details about CalEnviroScreen, visit https://oehha.ca.gov/calenviroscreen
Donated Data
PDI Donor data is based on name and address matches between the PDI voter file and publicly reported donations at the state and local levels. The field groupings include the following:
- Donated All is any voter that has donated or contributed to any campaign or political organization.
- Donated Large Amount is any voter that has donated a cumulative total of at least $1,000.
- Donated to Democratic is a voter of any party that donated directly to a Democratic candidate or party.
- Donated to Republican is a voter of any party that donated directly to a Republican candidate or party.
- Donated to PAC is a voter that donated directly to a PAC that we may or may not identify as partisan.
- Donated to Local Candidate is a voter that donated directly to a local campaign (mostly non-partisan campaigns for city council, school board or local jurisdiction).
- Donated LGBT (or PRO-LGBT Donor) is a voter that donated directly to an LGBT organization or campaign. The majority of this data is made up of voters that donated to the No on Prop 8 campaign.
Educational Attainment (EDU) data
Educational Attainment (EDU) data is the percentage of the Age 25+ population within a Census Block Group, per ACS (American Community Survey) 5-Year Estimate Census data.
- Graduate Degree (EDUGD) = % have Graduate Degree of population Age 25+
- College Educated (EDUCG) = % have Bachelor Degree or Higher of population Age 25+
- Some College Education (EDUSC) = % have Some College of population Age 25+
- High School Education (EDUHSG) = % have High School Diploma of population Age 25+
- No High School Education (EDUNHSG) = % have No High School Education of population Age 25+
These values are applied to all voters in the same census block group, regardless of age. So, a value of College Educated = 20 means that the voter lives in a census block group in which 20% of residents Age 25+ have a Bachelor degree or higher. A higher number for College Educated or Graduate Degree means the voter lives in an area with a higher percentage of people Age 25+ that have a Bachelor or Graduate Degree, which indicates that the voter is more likely to be college educated.
If you want to split the state’s voters into roughly two equal halves, pull the voters in census block groups that have a College Educated/College Degree value of about 21% or higher. In other words, if you select voters with EDUCG > 20 you will get roughly 50% of total registered voters. If you increase it to 25% or greater (EDUCG > 24) you will pull roughly 40% of voters. Pushing it to 50% plus (EDUCG > 49) only pulls about 2% of voters in the state. Different districts or regions within the state will produce different results based on the average education level in a given area.
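To see how raising the threshold shrinks the selection, here is a small sketch that computes the share of voters selected at each cutoff. The four sample records are invented, so the shares will not match the statewide figures quoted above; only the EDUCG field name comes from the text.

```python
# Hypothetical voters; each carries the EDUCG value of their census block group.
voters = [
    {"id": 1, "EDUCG": 10},
    {"id": 2, "EDUCG": 21},
    {"id": 3, "EDUCG": 25},
    {"id": 4, "EDUCG": 50},
]

def share_selected(records, field="EDUCG", threshold=20):
    """Fraction of voters whose block-group value exceeds the threshold."""
    selected = [r for r in records if r[field] > threshold]
    return len(selected) / len(records)

shares = {t: share_selected(voters, threshold=t) for t in (20, 24, 49)}
```

Running the same calculation on a district's own voter list is one way to find the threshold that yields the audience size you want, since the text notes that results vary by region.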