Prior research has shown that, among the baseline methods currently in use, those based on the highest X of Y days with a weather adjustment produce the best-performing load reduction estimates for residential event-driven demand response programs (e.g., critical peak pricing). However, these estimates are still biased relative to estimates from randomized controlled trials (RCTs), the unbiased “gold standard” evaluation method. In this paper we identify underlying factors that cause some of the bias in one commonly used baseline method. Rather than simply quantifying the bias, our research provides a deeper understanding of its causes and can therefore inform the development of more accurate methods that are not subject to these factors. Previous studies have compared baseline methods only against each other; because all baseline methods are biased, such comparisons cannot determine the true bias of any of them. We have access to a unique RCT dataset: the Sacramento Municipal Utility District’s study of critical peak pricing. Our analysis of 23 event days over two summers allows us to identify the true bias in load reduction estimates by using the RCT estimates as the unbiased benchmark against which we compare the estimates from the baseline methods. We find that spillover of energy reductions, from the hours targeted by a program onto other hours, is a major underlying cause of bias in baseline methods. We discuss alternative baseline methods that may not be subject to this bias.
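
To make the baseline approach concrete, the following is a minimal Python sketch of a highest X of Y days baseline with an additive day-of adjustment. It is an illustration under stated assumptions, not the exact method evaluated in this paper: the function names, the choice of adjustment hours, and the toy load data are all hypothetical. In particular, we assume here that the weather adjustment shifts the baseline by the average gap between observed and baseline load in a few hours before the event. The toy example then shows one way that spillover of reductions into those adjustment hours could distort the estimate.

```python
from statistics import mean

def high_x_of_y_baseline(prior_days, event_hours, x):
    """Average the X prior days with the highest mean load in the event hours.

    prior_days: list of Y days, each a list of 24 hourly loads (kW).
    Returns a 24-element hourly baseline.
    """
    ranked = sorted(
        prior_days,
        key=lambda day: mean([day[h] for h in event_hours]),
        reverse=True,
    )
    top = ranked[:x]
    return [mean([day[h] for day in top]) for h in range(24)]

def additive_adjustment(baseline, event_day, adjust_hours):
    """Assumed adjustment: mean gap between observed and baseline load
    over the (hypothetical) adjustment hours."""
    return mean([event_day[h] - baseline[h] for h in adjust_hours])

def estimated_reduction(prior_days, event_day, event_hours, adjust_hours, x):
    """Mean estimated load reduction (kW) over the event hours."""
    baseline = high_x_of_y_baseline(prior_days, event_hours, x)
    shift = additive_adjustment(baseline, event_day, adjust_hours)
    return mean([baseline[h] + shift - event_day[h] for h in event_hours])

# Toy data (invented for illustration): Y = 5 flat non-event days, X = 3.
prior_days = [[level] * 24 for level in (1.0, 1.5, 2.0, 2.5, 3.0)]
event_hours = [16, 17, 18]
adjust_hours = [12, 13, 14]  # assumed pre-event adjustment window

# Event day: counterfactual load is 3.0 kW; true event reduction is 1.0 kW.
event_day = [3.0] * 24
for h in event_hours:
    event_day[h] = 2.0

# Same day, but with 0.5 kW of the reduction spilling into the adjustment hours.
event_day_spill = list(event_day)
for h in adjust_hours:
    event_day_spill[h] = 2.5

no_spill = estimated_reduction(prior_days, event_day, event_hours, adjust_hours, 3)
with_spill = estimated_reduction(prior_days, event_day_spill, event_hours, adjust_hours, 3)
print(no_spill, with_spill)
```

In this toy case the estimate recovers the true 1.0 kW reduction when there is no spillover and falls to 0.5 kW when reductions spill into the adjustment hours, because the adjustment no longer raises the baseline to the counterfactual load. The direction and size of this effect depend entirely on the assumed adjustment window and invented data, so the sketch should not be read as a statement of the bias measured in the RCT comparison.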