Retrofitting building systems is known to provide cost-effective energy savings. However, prioritizing retrofits and computing their expected energy savings, costs, and benefits can be a complicated, costly, and uncertain effort. Prioritizing retrofits for a portfolio of buildings can be even more difficult if the owner must determine a different investment strategy for each building. Meanwhile, data on building energy use, characteristics, and equipment are becoming increasingly available. These data provide opportunities for the development of algorithms that link building characteristics and retrofits empirically. In this paper we explore the potential of using such data to predict the expected energy savings from equipment retrofits for a large number of buildings. We show that building data combined with statistical algorithms can provide savings estimates when detailed energy audits and physics-based simulations are not cost- or time-feasible. We develop a multivariate linear regression model with numerical predictors (e.g., operating hours, occupant density) and categorical indicator variables (e.g., climate zone, heating system type) to predict energy use intensity. The model quantifies the contribution of building characteristics and systems to energy use, and we use it to infer the expected savings from modifying particular equipment. We verify the model using residual analysis and cross-validation. We demonstrate the retrofit analysis by providing a probabilistic estimate of energy savings for several hypothetical building retrofits. We discuss how understanding the risk associated with retrofit investments can inform decision making. The contributions of this work are the development of a statistical model for estimating energy savings, its application to a large empirical building dataset, and a discussion of its use in informing building retrofit decisions.
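To make the modeling idea concrete, the following is a minimal sketch of a regression with numerical predictors and categorical indicator variables, followed by a savings estimate for one equipment change. It is not the model or dataset used in this paper: the data are synthetic, and the variable names, coefficient values, units, and the three heating system categories are assumptions chosen only for illustration. The interval shown reflects uncertainty in the estimated average effect, not the full spread of outcomes for an individual building.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # number of synthetic buildings (hypothetical)

# Synthetic predictors (all values are illustrative assumptions)
hours = rng.uniform(40, 100, n)    # weekly operating hours
density = rng.uniform(1, 10, n)    # occupants per unit floor area
heating = rng.integers(0, 3, n)    # 0 = boiler (baseline), 1 = furnace, 2 = heat pump

# Synthetic energy use intensity (EUI) with a known effect per heating type
true_heat_effect = np.array([0.0, -5.0, -15.0])
eui = (30 + 0.4 * hours + 2.0 * density
       + true_heat_effect[heating] + rng.normal(0, 5, n))

# Design matrix: intercept, numerical predictors, and indicator
# variables for the non-baseline heating categories
X = np.column_stack([
    np.ones(n), hours, density, heating == 1, heating == 2
]).astype(float)

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, eui, rcond=None)

# Coefficient covariance from the residual variance
resid = eui - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)

# Hypothetical retrofit: boiler (baseline) -> heat pump.
# The change in predicted EUI is the heat pump indicator coefficient.
c = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
delta = c @ beta
se = np.sqrt(c @ cov @ c)

savings_mean = -delta
ci = (savings_mean - 1.96 * se, savings_mean + 1.96 * se)  # approx. 95% interval

print(f"Estimated EUI savings: {savings_mean:.1f} "
      f"(approx. 95% interval {ci[0]:.1f} to {ci[1]:.1f})")
```

Because the retrofit effect is read off as a difference in indicator coefficients, its standard error follows directly from the coefficient covariance matrix, which is one simple way to attach a probabilistic statement to a savings estimate.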