1.300 Public Finance Modeling#
Explainer
Domain Explainer: Public Finance Modeling#
What is Public Finance Modeling?#
Public finance modeling refers to the computational simulation of tax and benefit policies to understand their impacts on individuals, households, and populations. Unlike general-purpose financial modeling (which might analyze corporate cash flows or investment portfolios), public finance modeling specifically deals with government revenue systems and social programs.
Why It Matters#
Policy decisions affect millions of lives. When legislators propose changing tax rates, creating new credits, or modifying benefit eligibility, they need to understand:
- Who wins and who loses? Distributional analysis shows which income groups benefit or pay more
- How much will it cost? Revenue estimates project fiscal impacts
- What are the incentive effects? Marginal tax rate calculations reveal work incentives
- Will it reduce poverty? Benefit simulations estimate poverty reduction
Without computational models, policy analysis relies on back-of-the-envelope estimates or small samples that may not represent the full population.
How It Works: Microsimulation#
The core technique is microsimulation:
1. Start with representative microdata (Census, IRS data, surveys)
   - Each record represents a household or individual
   - Contains income, family structure, state of residence, etc.
2. Encode tax/benefit rules as code
   - Federal income tax: brackets, deductions, credits
   - State income tax: rates, conformity with federal rules
   - Benefits: eligibility criteria, phase-outs
3. Apply rules to every record
   - Calculate tax liability or benefit eligibility
   - Weight by population to get national/state estimates
4. Compare baseline vs. reform
   - Baseline: Current law
   - Reform: Proposed policy change
   - Difference: Who gains/loses, revenue impact
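The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's API. The records, weights, and the flat-tax "current law" and "reform" rules are all made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """One household record from a hypothetical microdata file."""
    income: float
    weight: float  # number of real-world households this record stands for


Rule = Callable[[Record], float]  # maps a record to its tax liability


def baseline_rule(r: Record) -> float:
    # Illustrative "current law": 20% tax on income above a $10,000 allowance.
    return 0.20 * max(r.income - 10_000, 0)


def reform_rule(r: Record) -> float:
    # Illustrative "reform": raise the allowance to $15,000.
    return 0.20 * max(r.income - 15_000, 0)


def simulate(records: List[Record], rule: Rule) -> float:
    """Apply a rule to every record and weight the results up to a population total."""
    return sum(rule(r) * r.weight for r in records)


def compare(records: List[Record], base: Rule, reform: Rule) -> Dict[str, float]:
    """Baseline vs. reform totals and the revenue change (negative = cost)."""
    baseline = simulate(records, base)
    reformed = simulate(records, reform)
    return {"baseline": baseline, "reform": reformed, "revenue_change": reformed - baseline}


sample = [Record(8_000, 1_000), Record(30_000, 2_000), Record(120_000, 500)]
print(compare(sample, baseline_rule, reform_rule))
```

A real model encodes hundreds of provisions in its rules and runs over hundreds of thousands of records, but the baseline/reform/weight structure stays the same.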
Example (illustrative figures):
- Baseline: Child Tax Credit is $2,000 per child
- Reform: Increase to $4,000 per child
- Model: Apply both rules to 200,000 household records
- Output: “Reform reduces poverty by 1.2M people, costs $100B/year”
Who Uses These Tools?#
Government Agencies#
- Congressional Budget Office (CBO): Scores legislation for fiscal impact
- Treasury Department: Revenue estimates for tax proposals
- UK HM Treasury: Uses PolicyEngine for official modeling
- French Government: Uses OpenFisca for tax-benefit analysis
Think Tanks & Policy Organizations#
- Tax Policy Center: Analyzes tax reform proposals
- Center on Budget and Policy Priorities: Evaluates anti-poverty programs
- American Enterprise Institute: Conservative policy analysis
- (Tools are used across the political spectrum)
Academic Researchers#
- Study tax incidence (who really pays taxes)
- Analyze behavioral responses (do tax changes affect work decisions?)
- Evaluate program effectiveness (does EITC reduce poverty?)
- Publish in journals like National Tax Journal, Journal of Public Economics
Advocacy Groups#
- Labor unions: Analyze wage tax interactions
- Business groups: Corporate tax burden analysis
- Anti-poverty advocates: Benefit program expansions
- State-level organizations: State budget analysis
Technical Challenges#
1. Data Quality#
- Microdata is expensive/restricted: IRS Public Use File costs money, has privacy limitations
- Survey underreporting: High-income households underrepresented
- Imputation needed: Link multiple datasets to get full picture
2. Rule Complexity#
- Federal tax code: 6,000+ pages of rules
- Interactions: EITC + Child Tax Credit + SNAP + Medicaid + housing assistance all interact
- State variations: 50 different state tax codes
- Temporal changes: Rules change every year, sometimes multiple times per year
3. Validation Difficulty#
- Aggregate statistics: Can compare model output to published IRS totals
- Individual accuracy: Hard to validate individual-level calculations
- Behavioral responses: Models are often “static” (assume no behavior change)
Why Open Source Matters#
Transparency in policymaking:
- Tax laws affect everyone - models should be publicly auditable
- Proprietary models (like ITEP) are black boxes - can’t verify methodology
- Open-source models can be peer-reviewed by academics
Reproducibility:
- Academic papers should provide replication code
- Policy organizations should show their work
- Open models enable independent validation
Ecosystem effects:
- Open tools lower barriers to entry for new researchers
- Collaboration improves quality (many eyes on the code)
- Government adoption is easier (no licensing fees, no vendor lock-in)
Current State of Open Source#
Strong foundation:
- Tax-Calculator: US federal, public domain, widely used
- PolicyEngine: US + UK, all 50 states, web interface
- OpenFisca: Multi-country, government-adopted (France)
Gaps:
- Property tax: No open-source solution
- Sales tax: Commercial APIs exist (TaxJar, Avalara) but not research tools
- Local income taxes: NYC, Philadelphia underserved
- Benefits: SNAP, Medicaid less mature than tax modeling
Comparison to Corporate Finance#
| Aspect | Public Finance | Corporate Finance |
|---|---|---|
| Focus | Taxes, benefits, equity | Investment, valuation, risk |
| Users | Government, think tanks | Companies, investors |
| Data | Microdata (census, IRS) | Financial statements |
| Rules | Tax code, eligibility criteria | Accounting standards |
| Goals | Equity, revenue, poverty | Profit, shareholder value |
| Tools | Tax-Calculator, OpenFisca | Excel, Bloomberg Terminal |
Why separate? Tax rules are qualitatively different from financial modeling:
- Discontinuities (phase-outs, cliffs) not smooth curves
- Legal complexity (6,000-page tax code vs. GAAP)
- Distributional focus (who pays) vs. aggregate focus (total profit)
- Public interest (open source, transparency) vs. competitive advantage (proprietary)
Example: Child Tax Credit Reform#
Context: Current Child Tax Credit (CTC) is $2,000 per child under 17. Proposal: increase to $4,000.
Questions policymakers ask:
- How much does it cost?
- Who benefits?
- Does it reduce poverty?
- What are the work incentive effects?
How a model answers:
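The sketch below runs the same households through both rule sets, which is how a model answers these questions. It is deliberately simplified and hypothetical. It ignores refundability limits and the earnings phase-in. It applies the current-law phase-out: $50 per $1,000 (or fraction) of AGI above $200,000 for single filers or $400,000 for joint filers. The three household records and their weights are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Household:
    agi: float
    children: int  # qualifying children under 17
    married: bool
    weight: float  # households represented by this record


def ctc(h: Household, per_child: float) -> float:
    """Simplified CTC: per-child amount minus $50 per $1,000 (or fraction) of AGI over the threshold."""
    threshold = 400_000 if h.married else 200_000
    excess = max(h.agi - threshold, 0)
    reduction = 50 * math.ceil(excess / 1_000)
    return max(per_child * h.children - reduction, 0)


households = [
    Household(agi=40_000, children=2, married=True, weight=3_000),
    Household(agi=60_000, children=0, married=False, weight=5_000),
    Household(agi=210_000, children=1, married=False, weight=200),
]

# Per-household gain from raising the credit from $2,000 to $4,000 per child.
gains = [ctc(h, 4_000) - ctc(h, 2_000) for h in households]
# Weighted cost of the reform across the represented population.
cost = sum(g * h.weight for g, h in zip(gains, households))
print(gains, cost)
```

The gain scales with the number of children. The childless household gets nothing. The phase-out trims the same dollar amount from both the baseline and reform credits.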
Output (illustrative):
- Cost: $120B per year
- Poverty reduction: 1.4M people (especially children)
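- Gains: $2,000 per child per year for families already receiving the full credit, $0 for households without children
- Work incentives: Depend on design (the refundable portion of the current CTC phases in at 15% of earnings above $2,500, so changes to that phase-in alter MTRs)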
Policy debate:
- Proponents: Reduces child poverty significantly
- Critics: Expensive, could prioritize other anti-poverty programs
- Model doesn’t resolve debate, but quantifies trade-offs
Key Terms#
- Microsimulation: Applying rules to individual records and aggregating
- Marginal Tax Rate (MTR): Extra tax paid on next dollar earned
- Effective Tax Rate (ETR): Total tax / total income
- Distributional Analysis: Who pays / who benefits by income group
- Revenue Estimate: Projected government revenue under a policy
- Baseline vs. Reform: Current law vs. proposed change
- Static Model: Assumes no behavioral response to policy changes
- Dynamic Model: Estimates behavioral responses (labor supply, saving)
- Incidence: Who ultimately bears the economic burden of a tax
Further Reading#
Academic#
- Microsimulation Modeling for Policy Analysis (O’Donoghue, 2001)
- Income Tax Progression, 1929-48 (Musgrave & Thin, 1948)
- Optimal Taxation in Theory and Practice (Mankiw, Weinzierl, Yagan, 2009)
Practitioner#
Technical Documentation#
Bottom line: Public finance modeling is the quantitative foundation for evidence-based tax and benefit policymaking. Open-source tools like Tax-Calculator, PolicyEngine, and OpenFisca democratize access to these capabilities, enabling transparent, reproducible policy analysis.
S1: Rapid Discovery
S1: Problem Overview#
The Core Problem#
How do we know what a tax or benefit policy will do before we enact it?
Policymakers face a dilemma:
- Pass legislation → wait months/years → see actual effects
- OR: Model the policy beforehand → identify problems → adjust
Traditional approach (pre-2000s):
- Government agencies build proprietary models
- Black-box calculations, no public access
- Researchers can’t verify claims
- Each state/country duplicates effort
Problem: No transparency, no reproducibility, wasteful duplication
Why This Is Hard#
1. Complexity of Tax Codes#
US federal income tax alone:
- 100+ forms and schedules
- Hundreds of interacting provisions
- Phase-ins, phase-outs, cliffs, kinks
- Different definitions of income (AGI, MAGI, earned income)
- Credits vs. deductions vs. exemptions
- Alternative Minimum Tax (parallel tax system)
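Phase-ins, plateaus, and phase-outs turn credits into piecewise-linear functions of income, and each breakpoint is a kink. The sketch below shows an EITC-shaped schedule. The rates and thresholds are illustrative placeholders, not any tax year's parameters. It also simplifies the real EITC, which phases out on the greater of AGI and earned income.

```python
def credit(earnings: float, phase_in_rate: float = 0.40, max_credit: float = 6_000.0,
           phase_out_start: float = 20_000.0, phase_out_rate: float = 0.21) -> float:
    """Piecewise-linear credit: phase in, plateau at max_credit, then phase out to zero."""
    phased_in = min(phase_in_rate * earnings, max_credit)               # phase-in and plateau
    reduction = phase_out_rate * max(earnings - phase_out_start, 0.0)   # phase-out
    return max(phased_in - reduction, 0.0)


for e in (5_000, 15_000, 25_000, 60_000):
    print(e, round(credit(e), 2))
```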
Example interaction complexity:
EITC (Earned Income Tax Credit) depends on:
→ Earned income (wages, self-employment)
→ AGI (for phase-out)
→ Number of qualifying children
→ Filing status
→ Investment income limit ($11,000 threshold in tax year 2023; indexed annually)
Change an above-the-line deduction →
→ Changes AGI →
→ Changes CTC phase-out (and EITC phase-out when AGI exceeds earned income) →
→ Changes net refund
2. Multi-Level Government#
In the US:
- Federal income tax
- State income tax (43 states)
- Local income tax (cities: NYC, Philadelphia)
- Property tax (3,000+ counties)
- Sales tax (11,000+ jurisdictions)
Interactions:
- State taxes deductible on federal return (SALT cap)
- Some states conform to federal rules, others don’t
- Credits for taxes paid to other states
- Reciprocity agreements (commuter states)
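The federal deduction for state and local taxes shows how one level of government feeds into another. This minimal sketch assumes a $10,000 SALT cap, the TCJA-era value, treated here as an adjustable parameter. The $14,600 standard deduction and the household amounts are illustrative.

```python
def federal_taxable_income(agi: float, state_income_tax: float, property_tax: float,
                           other_itemized: float, standard_deduction: float,
                           salt_cap: float = 10_000.0) -> float:
    """Federal taxable income after the larger of the standard or itemized deduction."""
    salt = min(state_income_tax + property_tax, salt_cap)  # capped SALT deduction
    itemized = salt + other_itemized
    deduction = max(itemized, standard_deduction)
    return max(agi - deduction, 0.0)


# A state tax increase lowers federal taxable income only while SALT is under the cap.
low = federal_taxable_income(150_000, 4_000, 3_000, 8_000, 14_600)
high = federal_taxable_income(150_000, 6_000, 3_000, 8_000, 14_600)
capped = federal_taxable_income(150_000, 9_000, 3_000, 8_000, 14_600)
print(low, high, capped)
```

Once state and property taxes together exceed the cap, further state tax increases no longer change the federal result. A model has to compute state taxes before federal taxes to capture this interaction.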
3. Data Requirements#
To model policies accurately, you need:
- Microdata: Representative sample of population (Census, IRS)
- Tax units: Convert individuals to filing units (married/single)
- Income components: Wages, capital gains, dividends, etc.
- Demographics: Age, kids, disability status
- Weights: Scale sample to full population
Problem: Privacy laws limit access to detailed data
4. Behavioral Responses#
Tax changes affect behavior:
- Higher marginal rates → work less (labor supply response)
- Tax credits for kids → more childbearing? (demographic response)
- Corporate rate changes → investment decisions (capital response)
Static vs. dynamic modeling:
- Static: Assume behavior doesn’t change (simpler)
- Dynamic: Model behavioral responses (complex, uncertain)
Most libraries are static (documented limitation)
What Public Finance Modeling Solves#
1. Revenue Estimation#
Question: “Will this policy pay for itself?”
Process:
- Load representative sample (e.g., 200,000 tax returns representing 150M filers)
- Apply current law rules → aggregate revenue
- Apply reformed rules → aggregate revenue
- Difference = revenue impact
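The four-step process amounts to two weighted sums. The sketch below uses a made-up two-bracket schedule, a hypothetical top-rate cut, and three records. In a real run the records would number in the hundreds of thousands, and their weights would sum to the filing population.

```python
from typing import List, Tuple

Brackets = List[Tuple[float, float]]  # (lower threshold, marginal rate), ascending


def bracket_tax(income: float, brackets: Brackets) -> float:
    """Progressive schedule: each rate applies only to the income inside its band."""
    tax = 0.0
    for i, (lower, rate) in enumerate(brackets):
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if income > lower:
            tax += rate * (min(income, upper) - lower)
    return tax


current_law: Brackets = [(0, 0.10), (50_000, 0.30)]
reform: Brackets = [(0, 0.10), (50_000, 0.25)]  # hypothetical top-rate cut

records = [(20_000, 60_000_000), (80_000, 40_000_000), (400_000, 1_000_000)]  # (income, weight)

baseline_rev = sum(bracket_tax(inc, current_law) * w for inc, w in records)
reform_rev = sum(bracket_tax(inc, reform) * w for inc, w in records)
print(f"Revenue impact: {reform_rev - baseline_rev:,.0f}")
```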
Output:
Baseline revenue: $2.1 trillion
Reformed revenue: $1.9 trillion
Cost: $200 billion
2. Distributional Analysis#
Question: “Who wins and loses?”
Process:
- Calculate current tax for each household
- Calculate reformed tax for each household
- Group by income quintile (bottom 20%, next 20%, …)
- Average change by group
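Grouping by quintile is easy to get wrong, because each record counts for as many people as its weight. The minimal sketch below assumes each record carries an income, a weight, and its baseline-to-reform change in net income. All values are made up to reproduce the pattern in the table that follows. For simplicity, records are not split across quintile boundaries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def quintile_changes(rows: List[Tuple[float, float, float]]) -> Dict[int, float]:
    """rows: (income, weight, change in net income).
    Returns the weighted mean change for each income quintile (1 = bottom 20%)."""
    rows = sorted(rows, key=lambda r: r[0])  # rank records by income
    total_weight = sum(w for _, w, _ in rows)
    sums: Dict[int, float] = defaultdict(float)
    weights: Dict[int, float] = defaultdict(float)
    cum = 0.0
    for _, w, change in rows:
        # Assign each record by the share of the weighted population ranked below it.
        q = min(int(5 * cum / total_weight), 4) + 1
        sums[q] += w * change
        weights[q] += w
        cum += w
    return {q: sums[q] / weights[q] for q in sorted(sums)}


changes = [2_000, 2_000, 500, 500, 0, 0, -300, -300, -5_000, -5_000]
rows = [(10_000.0 * i, 1.0, c) for i, c in zip(range(1, 11), changes)]
print(quintile_changes(rows))
```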
Output:
| Income Group | Average Change | % Change in Net Income |
|---|---|---|
| Bottom 20% | +$2,000 | +15% |
| Second 20% | +$500 | +2% |
| Middle 20% | $0 | 0% |
| Fourth 20% | -$300 | -1% |
| Top 20% | -$5,000 | -2% |

Conclusion: Progressive (helps lower incomes)
3. Policy Reform Testing#
Question: “What if we change X?”
Scenarios:
- Increase standard deduction by $5,000
- Expand EITC to childless workers
- Add new child allowance
- Change capital gains rate
- Phase out deductions for high earners
Output: Revenue cost, distributional impact, administrative complexity
4. Marginal Tax Rate Analysis#
Question: “What’s my incentive to earn one more dollar?”
Why it matters:
- High MTRs discourage work
- Phase-outs can create 50%+ MTRs
- Benefit phase-outs stack on top of tax MTRs
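Models usually compute MTRs numerically. They add a small amount of earnings, recompute taxes and benefits, and divide the change by the amount added. The schedule below is invented to mirror the $25,000 row of the table that follows: a flat 12% federal rate, a 5% state rate, and a hypothetical $6,000 credit phasing out at 21% above $20,000. It is not a real tax schedule.

```python
def net_tax(earnings: float) -> float:
    """Hypothetical combined schedule: flat federal + state tax minus a phasing-out credit."""
    federal = 0.12 * earnings
    state = 0.05 * earnings
    credit = max(6_000.0 - 0.21 * max(earnings - 20_000.0, 0.0), 0.0)
    return federal + state - credit


def marginal_tax_rate(tax_fn, earnings: float, delta: float = 1.0) -> float:
    """Finite-difference MTR: share of an extra `delta` dollars lost to taxes and phase-outs."""
    return (tax_fn(earnings + delta) - tax_fn(earnings)) / delta


print(round(marginal_tax_rate(net_tax, 25_000), 4))  # inside the credit phase-out
print(round(marginal_tax_rate(net_tax, 50_000), 4))  # credit fully phased out
```

Keep `delta` small enough that it does not straddle a kink, or the result blends two different rates.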
Output:
| Income Level | Federal MTR | State MTR | EITC Phase-out | Effective MTR |
|---|---|---|---|---|
| $25,000 | 12% | 5% | 21% | 38% |
| $50,000 | 22% | 5% | 0% | 27% |
| $500,000 | 37% | 10% | 0% | 47% |

Scope of This Research#
This survey covers:
In Scope#
- Microsimulation engines: Apply rules to population samples
- Tax-benefit systems: Income tax, payroll tax, refundable credits
- Country-specific models: US, UK, France (major implementations)
- Open-source libraries: Publicly available, reproducible
Out of Scope#
- Proprietary models: Government/think tank internal tools
- Commercial tax software: TurboTax, H&R Block (compliance, not modeling)
- Sales tax APIs: TaxJar, Avalara (transaction-level, not policy analysis)
- Spreadsheet models: Ad-hoc Excel/Google Sheets calculators
- Dynamic scoring: Behavioral response models (future research)
Target Users#
Primary Users#
- Policy analysts (government, think tanks)
- Academic researchers (economics, public finance)
- Advocacy organizations (evaluating proposals)
- Journalists (fact-checking, explainers)
User Needs#
- Transparency: See how calculations work
- Reproducibility: Others can verify results
- Flexibility: Test novel policy ideas
- Accuracy: Match official government projections
- Performance: Simulate 150M+ people in reasonable time
Required Skills#
- Python programming (intermediate)
- Tax policy knowledge (understand terms like AGI, MAGI, credits)
- Statistics (survey weighting, sampling)
- Microdata experience (Census, IRS data)
Learning curve: 2-3 months to become productive
Why Existing Solutions Fall Short#
Problem 1: Multi-State Complexity#
- Most tools focus on federal taxes
- State income taxes have 43 different rule sets
- Federal-state interactions (SALT deduction)
- Cross-border workers (NY resident, NJ job)
Gap: No comprehensive open-source multi-state model (until PolicyEngine 2024)
Problem 2: Property Tax#
- Roughly 1/3 of state/local tax revenue
- 3,000+ counties with unique rules
- Complex exemptions (homestead, senior, agricultural)
- Gap: No open-source property tax library exists
Problem 3: Sales Tax Research#
- 11,000+ jurisdictions
- Product-specific exemptions
- Gap: Commercial APIs exist (TaxJar), but not for policy research
- Too expensive for academics
- Not designed for counterfactual analysis
Problem 4: Integration#
- Comprehensive tax burden = income + payroll + property + sales
- Each tax type has separate tools (if any)
- Gap: No unified household tax burden calculator
Success Criteria#
A successful public finance modeling library should:
- Encode official rules accurately (validation against published examples)
- Handle edge cases (AMT, child tax credit phase-outs, NIIT)
- Scale to full population (150M+ tax units in US)
- Support counterfactual reforms (easy to modify rules)
- Provide distributional outputs (by income, age, geography)
- Be maintainable (annual tax law changes)
- Have comprehensive tests (known correct answers)
- Offer good documentation (examples, not just API reference)
Cross-Cutting Concerns#
Data Privacy#
- Microdata contains sensitive info (income, family structure)
- Public Use Files (PUFs) have reduced detail
- Some models use synthetic data (algorithmically generated)
Computational Performance#
- 150M tax units × 100+ calculations each = 15B operations
- Need efficient vectorized operations (NumPy)
- Typically 5-30 seconds per full simulation
Version Control#
- Tax laws change every year (TCJA sunset in 2026)
- Need to model historical years (for research)
- Parameters vs. structure (rate changes vs. new provisions)
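Separating parameters from structure usually means storing each parameter with the year it takes effect. The model then looks up the value in force for the simulated year, while the formulas stay in code. This is a minimal sketch. The `Parameter` class is hypothetical and the per-child credit history is illustrative.

```python
import bisect
from typing import Dict


class Parameter:
    """A value that changes on given effective years; the formulas using it stay unchanged."""

    def __init__(self, history: Dict[int, float]):
        self.years = sorted(history)
        self.values = [history[y] for y in self.years]

    def at(self, year: int) -> float:
        # Latest effective year <= the requested year.
        i = bisect.bisect_right(self.years, year) - 1
        if i < 0:
            raise ValueError(f"no value in force in {year}")
        return self.values[i]


# Illustrative history for a per-child credit amount.
ctc_per_child = Parameter({2017: 1_000, 2018: 2_000, 2021: 3_000, 2022: 2_000})
print([ctc_per_child.at(y) for y in (2017, 2019, 2021, 2024)])
```

A rate change is then a new entry in the history. A new provision, by contrast, requires new code.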
International Portability#
- OpenFisca approach: Core engine + country packages
- Challenge: Each country has unique concepts (France: “quotient familial”, US: “filing status”)
Related Problems#
These research areas intersect with public finance modeling:
- 1.094 Constraint Solving: Budget optimization (maximize benefits, minimize tax)
- 1.101 PDF Processing: Extract tables from tax forms (IRS instructions)
- 1.301 Government Data Access: APIs for Census, IRS, BLS data
- 1.302 Budget Document Parsing: Extract spending data from CAFRs
- 1.303 Civic Entity Resolution: Match taxpayers across datasets
Why This Matters#
Quote from Tax Policy Center:
“Microsimulation models have become the standard for analyzing tax proposals. Without them, policy debates would rely on guesswork and ideology rather than evidence.”
Real impact:
- UK Treasury uses PolicyEngine for Universal Credit analysis (8M households)
- US CBO uses Tax-Calculator for official revenue scores
- France runs its social benefit system on OpenFisca code
Benefit: Better-informed policy, fewer unintended consequences, transparent debate.
S2: Comprehensive
S2: Prior Art - Existing Tools#
Overview#
This section catalogs existing public finance modeling libraries, their capabilities, limitations, and adoption patterns.
1. OpenFisca#
- Language: Python
- License: AGPL-3.0
- Maintenance: Active (latest commit: December 2024, 207 stars, 5,153 commits)
- Python Support: 3.9+
Description#
OpenFisca is a versatile microsimulation engine that models tax and benefit systems as code. It originated in France in 2011 and has been adopted by multiple governments internationally. The architecture separates the core engine (openfisca-core) from country-specific packages (OpenFisca-France, OpenFisca-Italy, etc.).
Design philosophy:
- Tax legislation should be expressed as executable code
- Same code used for simulation, administration, and compliance
- Web API enables non-programmers to run simulations
- International: Core engine works for any country’s rules
Key Features#
- Rules as code: Tax legislation expressed as Python code (each variable is a class with formula methods)
- Web API: REST interface for simulations without Python
- Survey data integration: Analyze reforms using census/administrative data
- Multi-country support: France, Italy, UK (via PolicyEngine fork), Tunisia, Senegal
- Interactive reforms: Calculate effects on single situations or entire populations
- Formula versioning: Track how rules change over time
- Period handling: Daily, monthly, yearly calculations
- Entity modeling: Persons, families, households, tax units
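Entity modeling means some variables live on persons and others on groups such as tax units and households. Formulas move values between these levels. The sketch below illustrates the idea with plain Python lists. It is not OpenFisca's API, which works on vectorized arrays. The variables and the age limit are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Person:
    household_id: str
    salary: float
    age: int


def household_income(people: List[Person]) -> Dict[str, float]:
    """Project a person-level variable (salary) up to the household by summing members."""
    totals: Dict[str, float] = defaultdict(float)
    for p in people:
        totals[p.household_id] += p.salary
    return dict(totals)


def household_children(people: List[Person], max_age: int = 16) -> Dict[str, int]:
    """Count members at or below a hypothetical age limit in each household."""
    counts = {p.household_id: 0 for p in people}
    for p in people:
        if p.age <= max_age:
            counts[p.household_id] += 1
    return counts


people = [
    Person("h1", 50_000, 40),
    Person("h1", 20_000, 38),
    Person("h1", 0, 5),
    Person("h2", 70_000, 30),
]
print(household_income(people), household_children(people))
```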
Architecture#
Installation#
Example Usage#
Notable Users#
- French government: Official tax-benefit modeling, used by Direction générale du Trésor
- Tunisian government: Social benefit eligibility
- Academic researchers: Worldwide tax policy studies
- International organizations: World Bank, ILO for policy analysis
Strengths#
- Mature codebase: 10+ years of development
- Battle-tested: Used for official government calculations
- International adoption: Proven in multiple countries
- Web API: Non-programmers can use it
- Formula versioning: Historical analysis possible
Limitations#
- Country package quality varies: France is comprehensive, others less so
- AGPL-3.0 license: Modified versions, including those offered to users over a network, must be released under the same license (may deter some commercial use)
- Steep learning curve: Complex entity modeling, class-based variable definitions
- Documentation inconsistency: Varies by country package
- Performance: Full-population runs on large datasets can be slow (improving)
Governance#
- Core maintained by OpenFisca team (nonprofit)
- Country packages maintained by respective governments/communities
- Monthly contributor calls
- RFC process for major changes
Sources: OpenFisca Documentation, GitHub, France Digital Service
2. PolicyEngine#
- Language: Python
- License: AGPL-3.0
- Maintenance: Active (16 stars on core, 5,139 commits, 11 open PRs)
- Python Support: 3.10-3.13
Description#
PolicyEngine is a nonprofit platform offering free, open-source tax-benefit microsimulation for the US and UK. Built on a fork of OpenFisca-Core, it provides both Python libraries and web applications for policy analysis. Major milestone: As of 2024, PolicyEngine covers all 50 US states plus DC for comprehensive state income tax modeling.
Design philosophy:
- Make policy analysis accessible to everyone (web app)
- Combine traditional microsimulation with machine learning
- Use official government microdata + synthetic enhancement
- Free, no paywalls or usage limits
Key Features#
- Web application: No-code interface for reform design and analysis
- US coverage: Federal + all 50 states (launched 2024)
- UK model: Full UK tax-benefit system with official government use
- Machine learning: Addresses undersampling and measurement errors in survey data
- Individual & population analysis: Household impacts + distributional effects
- API access: Programmatic access to all functionality
- Fast computation: < 1 second for an individual household, ~30 seconds for a full population
- Household calculator: Input your situation, see your tax
Architecture#
PolicyEngine-Core → PolicyEngine-US / PolicyEngine-UK
Installation#
Example Usage#
Notable Users#
- UK HM Treasury: Official government use documented in algorithmic transparency records
- US policy researchers: Think tanks, advocacy organizations
- Academic institutions: Teaching and research
- Journalists: Fact-checking policy claims
Strengths#
- Web app accessibility: Non-coders can design reforms
- All 50 US states: Most comprehensive open-source US model
- Official UK adoption: Validated by government use
- Active development: 50+ commits/month
- ML enhancement: Better population representation
- Free API: No usage quotas
Limitations#
- US state models new: Launched in 2024, need more validation
- Requires domain knowledge: Understanding tax concepts still needed
- Web app limitations: Complex reforms may need Python
- AGPL-3.0 license: Same restrictions as OpenFisca
- Data dependencies: Uses proprietary enhancements to public data
Innovation: ML-Enhanced Microsimulation#
Traditional approach: Use survey data as-is (undersampling, measurement error)
PolicyEngine approach:
- Start with public microdata (CPS, FRS)
- Train ML models on administrative aggregates (IRS totals)
- Adjust weights and impute missing values
- Validate against published statistics
Result: Better matches to official revenue/poverty numbers
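The weight-adjustment step can be illustrated with raking (iterative proportional fitting). Raking repeatedly rescales weights so that weighted counts match administrative targets on each margin. It is a standard survey-calibration technique, shown here only as a stand-in. It is not PolicyEngine's actual machine-learning method, and the records and targets are invented.

```python
from typing import Dict, List


def rake(weights: List[float], groups: List[Dict[str, str]],
         targets: Dict[str, Dict[str, float]], iterations: int = 50) -> List[float]:
    """Raking: rescale weights margin by margin until weighted counts
    match the target total for every category of every margin."""
    w = list(weights)
    for _ in range(iterations):
        for margin, margin_targets in targets.items():
            # Current weighted total for each category of this margin.
            totals = {category: 0.0 for category in margin_targets}
            for wi, g in zip(w, groups):
                totals[g[margin]] += wi
            # Scale each record so its category's total hits the target.
            w = [wi * margin_targets[g[margin]] / totals[g[margin]] for wi, g in zip(w, groups)]
    return w


groups = [
    {"income": "low", "kids": "yes"},
    {"income": "low", "kids": "no"},
    {"income": "high", "kids": "yes"},
    {"income": "high", "kids": "no"},
]
targets = {"income": {"low": 60.0, "high": 40.0}, "kids": {"yes": 30.0, "no": 70.0}}
new_weights = rake([1.0, 1.0, 1.0, 1.0], groups, targets)
print([round(x, 2) for x in new_weights])
```

Raking needs every margin to share the same grand total and every category to contain at least one record. Otherwise it fails to converge or divides by zero.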
Governance#
- Nonprofit organization (PolicyEngine Inc.)
- Public GitHub development
- Community contributions welcome
- Partnerships with governments (UK Treasury)
Sources: PolicyEngine Website, Core Docs, UK Gov
3. Tax-Calculator#
- Language: Python
- License: CC0 1.0 (Public Domain)
- Maintenance: Active (latest release: 6.4.0 on Feb 4, 2026, 292 stars, 124 releases)
- Python Support: 3.11-3.13
Description#
Tax-Calculator is the most established US federal income and payroll tax microsimulation model, maintained by the Policy Simulation Library (PSL). It’s widely used in policy analysis, think tank research, and academic studies. The model can estimate aggregate revenue and distributional effects of tax reforms when paired with representative population data.
Design philosophy:
- Academic rigor: extensive validation and testing
- Public domain: zero restrictions on use
- Federal focus: deep not broad (federal only, but comprehensive)
- Integration: Works with other PSL models (Cost-of-Capital-Calculator)
Key Features#
- Comprehensive US federal tax code: Income tax, payroll tax, refundable credits
- Marginal tax rates: Calculate MTRs for 18 different income types
- Reform analysis: Compare current law vs. proposed reforms
- Integration: Works with Cost-of-Capital-Calculator for business tax analysis
- TCJA modeling: Includes 2026 provision expirations (TCJA sunset)
- Extensive testing: Complete code coverage with hundreds of tests
- Public domain: No license restrictions on use or modification
- Historical capability: Model taxes back to 2013
Architecture#
Three main components:
- Policy: Tax rules and parameters
- Records: Microdata representing tax filers
- Calculator: Applies Policy to Records
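This three-way split means a reform only touches the policy object, while the data and the calculation logic stay fixed. The toy sketch below illustrates that separation in plain Python. The `Toy*` classes and their parameters are hypothetical and deliberately do not reproduce Tax-Calculator's actual API.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class ToyPolicy:
    """Policy role: parameters only (hypothetical names), no data and no computation."""
    rate: float = 0.15
    exemption: float = 12_000.0

    def with_changes(self, changes: Dict[str, float]) -> "ToyPolicy":
        # Returns a new policy; the baseline object is never mutated.
        return replace(self, **changes)


@dataclass
class ToyRecords:
    """Records role: microdata for each tax unit plus its sampling weight."""
    income: List[float]
    weight: List[float]


class ToyCalculator:
    """Calculator role: applies a policy to a set of records."""

    def __init__(self, policy: ToyPolicy, records: ToyRecords):
        self.policy = policy
        self.records = records

    def tax(self) -> List[float]:
        p = self.policy
        return [p.rate * max(y - p.exemption, 0.0) for y in self.records.income]

    def total_revenue(self) -> float:
        return sum(t * w for t, w in zip(self.tax(), self.records.weight))


records = ToyRecords(income=[10_000, 40_000, 100_000], weight=[500.0, 300.0, 200.0])
base = ToyCalculator(ToyPolicy(), records)
reform = ToyCalculator(ToyPolicy().with_changes({"exemption": 15_000.0}), records)
print(reform.total_revenue() - base.total_revenue())
```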
Installation#
Example Usage#
Notable Users#
- Tax Policy Center: Major think tank co-founded by Urban Institute and Brookings
- Congressional Budget Office (CBO): Official US government budget analysis
- Academic researchers: Cited in hundreds of papers
- Policy advocacy organizations: Across the political spectrum
Strengths#
- Most cited: Academic gold standard for US federal tax analysis
- Public domain: No licensing concerns
- CBO validation: Used for official scoring
- Extensive tests: 100% code coverage
- Documentation: Tutorial, cookbook, API reference
- Historical analysis: Model past years
Limitations#
- Federal only: Does not model state or local taxes
- No benefit programs: Focuses on taxes, not SNAP/Medicaid/etc.
- Data requirements: Needs microdata (PUF or CPS) for population-level estimates
- Behavioral responses: Static model (no labor supply or saving responses)
- Learning curve: Requires understanding of tax terminology
Validation#
Tax-Calculator validates against:
- IRS Statistics of Income (SOI) aggregates
- Tax Policy Center estimates
- JCT (Joint Committee on Taxation) revenue scores
- CBO baseline projections
Published validation reports: Annual comparisons to official statistics
Governance#
- Policy Simulation Library (PSL) project
- Community-driven development
- Academic advisory board
- Annual contributor meetings
Sources: Tax-Calculator Docs, GitHub, PSL
4. Cost-of-Capital-Calculator (CCC)#
- Language: Python
- License: CC0 1.0 (Public Domain)
- Maintenance: Active (latest release: 2.1.0 on Aug 25, 2025, 19 stars, 1,781 commits)
- Python Support: 3.11-3.13
Description#
Cost-of-Capital-Calculator (CCC) evaluates how US federal taxes affect corporate and non-corporate investment incentives. It computes marginal effective tax rates (METRs) on new investments by combining business asset data with individual tax filer microdata. CCC is part of the Policy Simulation Library ecosystem and integrates with Tax-Calculator.
Design philosophy:
- Business tax analysis complement to Tax-Calculator
- Academic foundation (Jorgenson-Hall cost of capital framework)
- Integration of entity and shareholder taxation
- Focus on investment incentives, not revenue estimation
Key Features#
- Marginal effective tax rates: Cost of capital by asset type and industry
- Corporate & non-corporate: Pass-through entities and C-corporations
- Individual integration: Models taxation at both entity and shareholder levels
- Depreciation schedules: Handles complex depreciation rules (MACRS, bonus depreciation)
- Policy scenarios: Analyze investment incentive effects of tax reforms
- Tax-Calculator integration: Combines business and individual tax modeling
- Web interface: Available at ccc.pslmodels.org (limited functionality)
- Asset-level detail: 96 asset types, 10 financing strategies
Concepts#
Marginal Effective Tax Rate (METR):
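In the standard cost-of-capital framework, the METR is usually defined by comparing two rates. ρ is the pre-tax rate of return a marginal investment must earn (the cost of capital). s is the after-tax return received by savers:

$$
\text{METR} = \frac{\rho - s}{\rho}, \qquad \text{equivalently} \qquad \rho = \frac{s}{1 - \text{METR}}
$$

With METR = 20%, ρ = s / 0.8 = 1.25s. That is the 25% increase in the required return listed under the interpretation below.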
Interpretation:
- METR = 0%: No tax distortion (tax doesn’t affect investment)
- METR = 20%: Tax increases required return by 25% (1/0.8 - 1)
- METR < 0%: Tax subsidy (encourages investment)
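These interpretations can be checked numerically. The minimal sketch below assumes the common definition METR = (ρ − s)/ρ, where ρ is the pre-tax required return and s the savers' after-tax return. The function names and inputs are illustrative and are not CCC's API.

```python
def metr(rho: float, s: float) -> float:
    """Marginal effective tax rate: the wedge between the pre-tax required return
    rho and savers' after-tax return s, as a share of rho."""
    return (rho - s) / rho


def required_return_markup(m: float) -> float:
    """How much higher rho must be than s: rho / s - 1 = 1 / (1 - METR) - 1."""
    return 1.0 / (1.0 - m) - 1.0


print(metr(0.05, 0.04))              # positive wedge: tax raises the hurdle rate
print(required_return_markup(0.20))  # required-return increase implied by a 20% METR
print(metr(0.04, 0.05))              # negative: the tax system subsidizes the investment
```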
Installation#
Example Usage#
Notable Users#
- Academic researchers: Studying corporate taxation and investment
- Policy analysts: Evaluating business tax reforms (R&D credits, depreciation)
- Think tanks: Combined with Tax-Calculator for comprehensive tax analysis
Strengths#
- Unique focus: Only open-source business tax library
- Academic rigor: Based on established economic framework
- Integration: Works with Tax-Calculator
- Asset detail: 96 asset types
- Public domain: No licensing restrictions
Limitations#
- US federal only: No state/local business taxes
- Static model: No dynamic investment or growth responses
- Complexity: Requires understanding of corporate tax and financial economics
- Data needs: Relies on SOI and other IRS data sources
- Integration required: Most useful when combined with Tax-Calculator
- Niche use case: Smaller user community than Tax-Calculator
Integration with Tax-Calculator#
Combined analysis:
- Tax-Calculator: Individual/household tax impacts
- CCC: Business investment incentives
- Together: Comprehensive tax reform analysis (household + business)
Governance#
- Policy Simulation Library (PSL) project
- Maintained by American Enterprise Institute researchers
- Community contributions welcome
Sources: CCC Website, GitHub, PSL
Comparison Matrix#
| Feature | OpenFisca | PolicyEngine | Tax-Calculator | CCC |
|---|---|---|---|---|
| Countries | France, Italy, Tunisia, etc. | US, UK | US only | US only |
| Scope | Income + benefits | Income + benefits + state | Federal income/payroll | Business taxes |
| Web App | Yes (limited) | Yes (full-featured) | Yes (basic) | Yes (basic) |
| License | AGPL-3.0 | AGPL-3.0 | Public Domain | Public Domain |
| API | REST | REST + Python | Python only | Python only |
| State Taxes | N/A | All 50 states | No | No |
| Maturity | 10+ years | 3 years | 15+ years | 5+ years |
| Government Use | France (official) | UK Treasury (official) | CBO, TPC | Research only |
| Learning Curve | Steep | Moderate | Moderate | Steep |
| Documentation | Good (varies) | Excellent | Excellent | Good |
Ecosystem Tools#
These aren’t microsimulation engines but are commonly used alongside them:
Data Access#
- census (Python): Census Bureau API wrapper
- tidycensus (R): Tidy interface to US Census data
- pandas (Python): DataFrame manipulation
Analysis#
- statsmodels (Python): Regression analysis for tax incidence
- scikit-learn (Python): ML for imputation, weighting
- survey (R): Survey-weighted estimation
Visualization#
- matplotlib/seaborn (Python): Charts, distributional plots
- plotly (Python): Interactive dashboards
- ggplot2 (R): Publication-quality graphics
Historical Context#
Evolution of Public Finance Modeling#
1960s-1980s: Government agencies build proprietary models
- No transparency
- No reproducibility
- Each country/agency duplicates effort
1990s-2000s: First open-source attempts
- TAXSIM (NBER): Web-based tax calculator (still active)
- Early microsimulation models (FORTRAN, SAS)
2010s: Modern Python era
- OpenFisca (2011): Rules as code philosophy
- Tax-Calculator (2010): PSL ecosystem
- PolicyEngine (2021): Web app + open source
2020s: Maturation
- Government adoption (France, UK)
- All 50 US states (PolicyEngine 2024)
- ML-enhanced microsimulation
Key Innovations#
- Rules as code (OpenFisca 2011)
- Public domain licensing (PSL 2010)
- Web accessibility (PolicyEngine 2021)
- ML data enhancement (PolicyEngine 2022)
- Comprehensive state coverage (PolicyEngine 2024)
Gaps Remain#
Even with these excellent tools, gaps persist:
- Property tax: No open-source library (3,000+ counties)
- Sales tax research: Commercial APIs exist, not policy modeling tools
- Multi-jurisdictional: Cross-border workers, part-year residents
- Behavioral responses: All tools are static models
- Integration: No unified household tax burden calculator (income + property + sales)
See S3 (Solution Space) for approaches to these gaps.
S3: Need-Driven
S3: Solution Space - Approaches to Filling Gaps#
Overview#
This section explores approaches to building new public finance modeling tools, addressing the gaps identified in S1 and S2. We focus on three major gaps:
- Property tax calculation libraries
- Sales tax modeling for policy research
- Multi-jurisdictional integration
Gap 1: Property Tax Calculation Libraries#
The Problem#
Property tax generates ~$600B annually in the US (roughly 1/3 of state/local tax revenue), yet there's no open-source calculation library. Current state:
- Data availability: Many jurisdictions publish assessment data (open data portals)
- No calculation engine: Data exists, but no library to compute taxes
- Extreme locality: 3,000+ counties, each with unique rules
Approach 1A: Top-N Metro Areas (Incremental)#
Strategy: Start with largest metro areas, expand incrementally
Steps:
- Identify top 50 metro areas by population (~60% of US)
- For each metro, encode:
  - Assessment methodology (market value, Prop 13, use-based)
  - Rate structures (mill levies, voter overrides)
  - Exemptions (homestead, senior, veteran, agricultural)
  - Special districts (school, fire, library)
- Build validation dataset (scrape public tax bills)
- Create Python library with pluggable jurisdiction modules
Example API:
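The sketch below shows what a pluggable jurisdiction module could look like. Everything in it is hypothetical: the `Jurisdiction` interface, the `ExampleCounty` rules (40% assessment ratio, homestead and senior exemptions, mill rates), and the registry key. None of it describes a real county's law.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass
class Property:
    market_value: float
    homestead: bool = False
    senior_owner: bool = False


class Jurisdiction(ABC):
    """Interface every jurisdiction module implements."""

    @abstractmethod
    def assessed_value(self, prop: Property) -> float: ...

    @abstractmethod
    def exemptions(self, prop: Property) -> float: ...

    @abstractmethod
    def mill_rates(self) -> Dict[str, float]: ...  # levy -> mills ($ per $1,000 taxable)

    def tax(self, prop: Property) -> Dict[str, float]:
        """Shared engine logic: assess, subtract exemptions, apply each levy."""
        taxable = max(self.assessed_value(prop) - self.exemptions(prop), 0.0)
        return {levy: taxable * mills / 1_000 for levy, mills in self.mill_rates().items()}


class ExampleCounty(Jurisdiction):
    def assessed_value(self, prop: Property) -> float:
        return 0.40 * prop.market_value  # hypothetical assessment ratio

    def exemptions(self, prop: Property) -> float:
        amount = 25_000.0 if prop.homestead else 0.0
        if prop.senior_owner:
            amount += 10_000.0
        return amount

    def mill_rates(self) -> Dict[str, float]:
        return {"county": 8.0, "school": 20.0, "fire": 2.0}


REGISTRY: Dict[str, Jurisdiction] = {"example-county": ExampleCounty()}

bill = REGISTRY["example-county"].tax(Property(market_value=300_000, homestead=True))
print(bill, sum(bill.values()))
```

Returning the bill itemized by levy keeps special districts visible, which matters for analyzing reforms that touch only one levy.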
Pros:
- Focused scope (50 metros is achievable)
- High impact (covers majority of US population)
- Incremental validation (one jurisdiction at a time)
Cons:
- Still substantial work (50+ jurisdictions)
- Annual maintenance (rate changes, new levies)
- Incomplete coverage (rural areas, small cities)
Estimated Effort: 2-3 person-years for initial 50 metros
Approach 1B: Crowdsourced Rule Encoding#
Strategy: Build framework, let community contribute jurisdiction rules
Inspired by: OpenFisca’s country-package model
Steps:
Create core engine (exemption logic, rate application, aggregation)
Define jurisdiction rule DSL (domain-specific language):
Provide tools for validation (compare calculated vs. actual bills)
Gamify contributions (leaderboard, jurisdiction coverage map)
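The rule DSL in step 2 could be as simple as structured data interpreted by the shared core engine. A minimal sketch, assuming a dict schema (in practice perhaps one YAML file per contributed jurisdiction) with two hypothetical exemption types, `flat` and `percent`, applied in listed order, plus a validation helper for step 3. `example-city`, the schema keys and all figures are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical contributed rule file for one jurisdiction, expressed as plain data.
EXAMPLE_CITY = {
    "jurisdiction": "example-city",
    "assessment_ratio": 0.10,
    "exemptions": [  # applied in listed order
        {"name": "homestead", "type": "flat", "amount": 5_000, "requires": "homestead"},
        {"name": "senior", "type": "percent", "rate": 0.50, "requires": "senior"},
    ],
    "levies_mills": {"county": 20.0, "school": 45.0},
}


def compute_tax(rules: Dict, market_value: float, flags: Iterable[str] = ()) -> float:
    """Core engine: assess, apply eligible exemptions in order, then apply mill levies."""
    flags = set(flags)
    value = market_value * rules["assessment_ratio"]
    for ex in rules["exemptions"]:
        if ex.get("requires") and ex["requires"] not in flags:
            continue  # property does not qualify for this exemption
        if ex["type"] == "flat":
            value -= ex["amount"]
        elif ex["type"] == "percent":
            value *= 1 - ex["rate"]
        else:
            raise ValueError(f"unknown exemption type: {ex['type']}")
        value = max(value, 0.0)
    total_mills = sum(rules["levies_mills"].values())
    return value * total_mills / 1000


def validate(rules: Dict, bills: List[Tuple[float, Tuple[str, ...], float]],
             tolerance: float = 0.01) -> List[Tuple[int, float]]:
    """Compare calculated vs. actual bills; return (index, relative error) for misses."""
    misses = []
    for i, (market_value, flags, actual) in enumerate(bills):
        calculated = compute_tax(rules, market_value, flags)
        rel_err = abs(calculated - actual) / max(abs(actual), 1e-9)
        if rel_err > tolerance:
            misses.append((i, rel_err))
    return misses


print(compute_tax(EXAMPLE_CITY, 200_000, ["homestead"]))            # 975.0
print(compute_tax(EXAMPLE_CITY, 200_000, ["homestead", "senior"]))  # 487.5
```

Because contributors only write data, reviewers can check a new jurisdiction by running `validate` against its published bills rather than auditing code.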
Pros:
- Scales beyond any single team’s capacity
- Community ownership → sustainability
- Framework reusable across all jurisdictions
Cons:
- Slow initial adoption (cold-start problem)
- Quality variance (need review process)
- Complex rules hard to encode (edge cases)
Estimated Effort: 1 person-year for framework, 3-5 years to 500+ jurisdictions
Approach 1C: ML-Assisted Estimation#
Strategy: Train models on assessment + tax bill data, skip rule encoding
Steps:
- Scrape public data (assessments, tax bills)
- Features: [assessed_value, location, property_type, size, age, ...]
- Target: total_tax
- Train gradient boosting model (XGBoost, LightGBM)
- Validate against holdout jurisdictions
Pros:
- No manual rule encoding needed
- Works even where rules are unclear
- Handles complex interactions automatically
Cons:
- Black box (can’t explain calculations)
- Requires substantial data (10k+ tax bills per jurisdiction)
- No counterfactual policy analysis (can’t model “what if senior exemption increased”)
- Legal/transparency concerns (how was this calculated?)
Use case: Rough estimates for real estate platforms, NOT policy analysis
Estimated Effort: 6 months for prototype, 1 year for production
Recommended Approach: Hybrid 1A + 1B#
Phase 1: Core team builds top 20 metros (Approach 1A)
- Proves viability
- Establishes patterns
- Creates validation methodology
Phase 2: Open to community (Approach 1B)
- Core team provides framework + examples
- Community contributes remaining metros
- Maintains quality through validation tools
Why not 1C? The ML approach cannot model counterfactual reforms or explain its results, both of which policy analysis (the primary use case) requires
Gap 2: Sales Tax Modeling for Policy Research#
The Problem#
Commercial APIs (TaxJar, Avalara) exist for e-commerce compliance but:
- Expensive for research use ($1000s/month)
- Not designed for counterfactual analysis
- No access to underlying rate database
- Address-level precision unnecessary for policy modeling
Research needs: “What if we exempt groceries?” not “What tax for this exact address?”
Approach 2A: Open Rate Database (Snapshot)#
Strategy: Public database of sales tax rates, quarterly updates
Scope:
- State rates (all 50)
- County rates (top 200 by population)
- City rates (top 100 by population)
- Coverage: ~70-80% of US sales transactions
Data sources:
- State revenue department websites (public data)
- Municipal code databases
- Federation of Tax Administrators publications
Database schema:
Example:
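A minimal sketch of a schema and an example lookup, using SQLite from the Python standard library. It assumes one table of jurisdictions (state, county, city) and one table of category-level rates with effective dates, so each quarterly update appends rows instead of overwriting them. Table and column names, the `S1`/`C1`/`M1` ids and every rate are hypothetical placeholders, not real rates.

```python
import sqlite3

SCHEMA = """
CREATE TABLE jurisdiction (
    id        TEXT PRIMARY KEY,           -- e.g. a FIPS-style code (hypothetical)
    level     TEXT NOT NULL CHECK (level IN ('state', 'county', 'city')),
    name      TEXT NOT NULL,
    parent_id TEXT REFERENCES jurisdiction(id)
);
CREATE TABLE rate (
    jurisdiction_id TEXT NOT NULL REFERENCES jurisdiction(id),
    category        TEXT NOT NULL,        -- broad product category, not UPC-level
    rate            REAL NOT NULL,        -- decimal rate; 0 means exempt
    effective_from  TEXT NOT NULL,        -- ISO date; quarterly updates add new rows
    PRIMARY KEY (jurisdiction_id, category, effective_from)
);
"""


def combined_rate(conn, jurisdiction_ids, category, as_of):
    """Sum the latest rate in effect on `as_of` across stacked jurisdictions.

    A missing row is treated as no tax levied (a simplifying assumption).
    """
    total = 0.0
    for jid in jurisdiction_ids:
        row = conn.execute(
            """SELECT rate FROM rate
               WHERE jurisdiction_id = ? AND category = ? AND effective_from <= ?
               ORDER BY effective_from DESC LIMIT 1""",
            (jid, category, as_of),
        ).fetchone()
        total += row[0] if row else 0.0
    return total


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO jurisdiction VALUES (?, ?, ?, ?)", [
    ("S1", "state", "Example State", None),
    ("C1", "county", "Example County", "S1"),
    ("M1", "city", "Example City", "C1"),
])
conn.executemany("INSERT INTO rate VALUES (?, ?, ?, ?)", [
    ("S1", "general", 0.05, "2024-01-01"),
    ("S1", "groceries", 0.00, "2024-01-01"),
    ("C1", "general", 0.01, "2024-01-01"),
    ("C1", "groceries", 0.01, "2024-01-01"),
    ("M1", "general", 0.02, "2024-01-01"),
    ("M1", "general", 0.025, "2024-07-01"),  # mid-year city increase
    ("M1", "groceries", 0.02, "2024-01-01"),
])
print(combined_rate(conn, ["S1", "C1", "M1"], "general", "2024-03-31"))
```

Keeping an effective date on every row lets researchers reconstruct the rates in force on any past date, which matters when a reform is compared against a specific baseline year.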
Pros:
- Open data (no API fees)
- Sufficient for research (don’t need 11,000 jurisdictions)
- Quarterly updates adequate
- Enables policy modeling
Cons:
- Manual maintenance (quarterly scraping)
- Not suitable for real-time compliance
- Product categories broad (no UPC-level)
Estimated Effort: 3 months for initial build, 1 week/quarter maintenance
Approach 2B: Microsimulation Integration#
Strategy: Combine rate database (2A) with consumer expenditure data
Data source: Consumer Expenditure Survey (CEX) from BLS
Steps:
Load CEX microdata (household expenditures by category)
Map CEX categories to tax categories:
For each household, calculate sales tax:
Aggregate: total revenue, by income quintile, by state
Policy analysis:
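The mapping, per-household calculation, aggregation and policy question can be sketched end to end. This assumes a tiny hand-made table standing in for CEX microdata, with each household's income quintile already assigned; the category names, the crosswalk, the single combined rate per tax category and all dollar figures are hypothetical. The reform is the research question posed earlier: exempt groceries.

```python
from collections import defaultdict

# Hypothetical crosswalk from CEX-style expenditure categories to tax categories.
CEX_TO_TAX = {"food_at_home": "groceries", "food_away": "prepared_food",
              "apparel": "general", "rent": "exempt_service"}

# Hypothetical combined state+local rates by tax category (decimal).
BASELINE = {"groceries": 0.05, "prepared_food": 0.07, "general": 0.07, "exempt_service": 0.0}
REFORM = {**BASELINE, "groceries": 0.0}  # policy question: exempt groceries

# Hypothetical households: weight, income quintile, income, spending by CEX category ($/yr).
HOUSEHOLDS = [
    {"weight": 1000, "quintile": 1, "income": 20_000,
     "spend": {"food_at_home": 5_000, "food_away": 1_000, "apparel": 1_000, "rent": 9_000}},
    {"weight": 1000, "quintile": 5, "income": 200_000,
     "spend": {"food_at_home": 10_000, "food_away": 6_000, "apparel": 8_000, "rent": 30_000}},
]


def household_sales_tax(spend, rates):
    """Map each CEX category to a tax category and apply that category's rate."""
    return sum(amount * rates[CEX_TO_TAX[cat]] for cat, amount in spend.items())


def revenue(households, rates):
    """Weighted total sales tax revenue."""
    return sum(hh["weight"] * household_sales_tax(hh["spend"], rates) for hh in households)


def by_quintile(households, rates):
    """Weighted sales tax as a share of income, per quintile (a regressivity measure)."""
    tax, income = defaultdict(float), defaultdict(float)
    for hh in households:
        tax[hh["quintile"]] += hh["weight"] * household_sales_tax(hh["spend"], rates)
        income[hh["quintile"]] += hh["weight"] * hh["income"]
    return {q: tax[q] / income[q] for q in sorted(tax)}


revenue_change = revenue(HOUSEHOLDS, REFORM) - revenue(HOUSEHOLDS, BASELINE)
print(by_quintile(HOUSEHOLDS, BASELINE))
print(by_quintile(HOUSEHOLDS, REFORM))
print(revenue_change)
```

A tax share that falls as the quintile rises is the regressivity pattern the distributional output is meant to surface; the reform changes both total revenue and that gradient.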
Pros:
- Answers key policy questions
- Distributional analysis (regressivity)
- Integration with existing tools (Tax-Calculator)
Cons:
- CEX has small sample (~20k households)
- Measurement error in expenditures
- Doesn’t capture behavioral responses
Estimated Effort: 6 months (assuming 2A completed)
Approach 2C: Collaborate with Commercial Providers#
Strategy: Partner with TaxJar/Avalara for research access
Model:
- Commercial providers offer API access for research (reduced rate or free)
- Academic/nonprofit researchers use for policy analysis
- Providers benefit from research visibility
Precedent:
- Google Cloud academic grants
- AWS research credits
- Qualtrics academic licenses
Pros:
- No need to maintain rate database
- Access to full 11,000 jurisdictions
- Real-time accuracy
Cons:
- Dependent on provider cooperation
- May have usage limits
- Not truly open-source
- Commercial entity could terminate access
Estimated Effort: 3 months to negotiate partnership
Recommended Approach: 2A + 2B#
Why not 2C? Dependency on commercial entity undermines open research principles
Implementation:
- Build open rate database (2A) - 3 months
- Create microsimulation module (2B) - 6 months
- Validate against state revenue reports
- Publish as library + dataset
Gap 3: Multi-Jurisdictional Integration#
The Problem#
Comprehensive tax burden requires integrating:
- Federal income tax (Tax-Calculator ✓)
- State income tax (PolicyEngine ✓)
- Property tax (Gap 1)
- Sales tax (Gap 2)
Challenge: Each tax has different data requirements, computation order, interactions
Approach 3A: Orchestration Layer#
Strategy: Build wrapper that coordinates existing + new tools
Architecture:
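One possible shape for the orchestration layer, sketched in Python. Each wrapped tool is reduced to a plain function; the flat rates and fixed property and sales amounts are made-up stand-ins, not Tax-Calculator or PolicyEngine APIs. What the sketch shows is the ordering problem: the state return starts from federal AGI, and state income (or general sales) taxes plus property taxes then feed the federal SALT itemized deduction, subject to a cap. Some states instead allow a deduction for federal income tax, which creates a circular dependency this sketch ignores.

```python
from dataclasses import dataclass

SALT_CAP = 10_000.0  # cap enacted by the 2017 TCJA; treat as a reform parameter


@dataclass
class TaxUnit:
    wages: float
    other_itemized: float = 0.0  # e.g. mortgage interest, charity (simplified)


# Hypothetical stand-ins for wrapped tools (flat illustrative rates, fixed amounts).
def federal_agi(unit: TaxUnit) -> float:
    return unit.wages


def state_income_tax(agi: float) -> float:
    return 0.05 * agi  # state base conforms to federal AGI


def property_tax(unit: TaxUnit) -> float:
    return 6_000.0  # would come from a property tax module (Gap 1)


def sales_tax(unit: TaxUnit) -> float:
    return 2_500.0  # would come from a sales tax module (Gap 2)


def federal_income_tax(agi: float, deduction: float) -> float:
    return 0.20 * max(agi - deduction, 0.0)


def total_burden(unit: TaxUnit, standard_deduction: float = 14_600.0,  # 2024 single filer, illustrative
                 salt_cap: float = SALT_CAP) -> dict:
    agi = federal_agi(unit)                            # 1. federal AGI
    state = state_income_tax(agi)                      # 2. state starts from AGI
    prop, sales = property_tax(unit), sales_tax(unit)  # 3. local taxes
    # 4. SALT: income OR general sales tax (the larger), plus property tax, capped.
    #    Simplification: same-year liabilities stand in for taxes actually paid.
    salt = min(max(state, sales) + prop, salt_cap)
    deduction = max(standard_deduction, salt + unit.other_itemized)
    federal = federal_income_tax(agi, deduction)       # 5. federal depends on 2 and 3
    return {"federal": federal, "state": state, "property": prop,
            "sales": sales, "total": federal + state + prop + sales}


print(total_burden(TaxUnit(wages=100_000, other_itemized=5_000)))
```

Raising `salt_cap` in this sketch lowers the federal figure without touching the state or local modules, which is exactly the kind of cross-jurisdiction interaction a single-tax tool cannot show.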
Coordination challenges:
- Data harmonization (different income definitions)
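- Computation order (most state income taxes start from federal AGI, while state and local taxes feed back into federal itemized deductions via the SALT deduction)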
- Entity mapping (tax units vs. households vs. individuals)
Pros:
- Leverages existing tools
- No duplication of effort
- Can start before Gaps 1 & 2 fully solved
Cons:
- Brittle (depends on all upstream libraries)
- API mismatches (different conventions)
- Version incompatibilities
Estimated Effort: 6 months for orchestration layer
Approach 3B: Unified Microsimulation#
Strategy: Build comprehensive model from scratch (like OpenFisca, but US + all taxes)
Scope:
- Federal income + payroll
- 50 state income taxes
- Property tax (top metros)
- Sales tax (policy-level)
Pros:
- Consistent API
- Optimized performance
- Designed for integration from start
Cons:
- Massive duplication of effort
- 5-10 person-years to match existing tools
- Maintenance burden enormous
Verdict: Not recommended (reinventing the wheel)
Approach 3C: Incremental Integration into PolicyEngine#
Strategy: Contribute property + sales modules to PolicyEngine-US
Rationale:
- PolicyEngine already has federal + 50 states
- Active development, responsive maintainers
- Would create most comprehensive US model
Steps:
- Propose integration to PolicyEngine team
- Build property tax module using PolicyEngine’s framework
- Build sales tax module
- Submit PRs, iterate with maintainers
Pros:
- Single comprehensive tool
- Maintained by nonprofit
- Leverages PolicyEngine’s web app
- Community benefits
Cons:
- Dependent on PolicyEngine roadmap
- Must conform to their architecture
- Governance not under your control
Estimated Effort: 1 year (if accepted by PolicyEngine)
Recommended Approach: 3A (Short-term) + 3C (Long-term)#
Phase 1: Build orchestration layer (3A)
- Proves value of integration
- Works with current tools
- 6 months
Phase 2: Contribute to PolicyEngine (3C)
- Approach PolicyEngine team with working prototype
- Discuss integration
- If accepted: dedicate resources to contribution
- If not: maintain orchestration layer
Implementation Priorities#
Based on impact, feasibility, and user needs:
Priority 1: Sales Tax Research Tools (Gap 2)#
- Rationale: Highest impact/effort ratio
- Approach: 2A + 2B (open rate database + microsimulation)
- Timeline: 9 months
- Users: State policy analysts, think tanks, academics
Priority 2: Property Tax (Top 20 Metros) (Gap 1)#
- Rationale: Proof of concept for larger effort
- Approach: 1A (top metros, then expand)
- Timeline: 1 year for top 20
- Users: Local governments, real estate platforms, researchers
Priority 3: Multi-Jurisdictional Integration (Gap 3)#
- Rationale: Enables comprehensive tax burden analysis
- Approach: 3A (orchestration layer)
- Timeline: 6 months (after Gaps 1 & 2 progress)
- Users: Researchers studying overall tax progressivity
Cross-Cutting Technical Decisions#
Language: Python#
- All existing tools reviewed here use Python
- Rich ecosystem (pandas, NumPy, scikit-learn)
- Easy integration
License: Public Domain (CC0) or Apache 2.0#
- CC0: Like Tax-Calculator (most permissive)
- Apache 2.0: If need contributor agreements
- NOT AGPL: Want to enable proprietary commercial reuse (real estate, compliance), which AGPL's network copyleft discourages
Data Format: Parquet#
- Efficient columnar storage
- Fast reads (pandas, Polars)
- Cross-language (R, Python, Julia)
Documentation: Jupyter Notebooks#
- Examples with real data
- Narrative + code
- Reproducible
Testing: Extensive Validation#
- Known correct answers (published tax bills)
- Edge cases (phase-outs, cliffs)
- Regression tests (changes don’t break existing)
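The three kinds of tests can be written as ordinary pytest-style functions. A minimal sketch; `credit` is a hypothetical rule (a $2,000 credit reduced by $50 for each $1,000, or fraction of $1,000, of income above $200,000) chosen because its step-wise phase-out creates cliffs, and the "known" and snapshot values are hand-computed, not published bills.

```python
import math


def credit(income: float) -> float:
    """Hypothetical credit: $2,000 less $50 per $1,000 (or fraction) of income over $200,000."""
    excess = max(income - 200_000, 0)
    steps = math.ceil(excess / 1_000)
    return max(2_000 - 50 * steps, 0)


def test_known_answers():
    # Hand-computed cases standing in for published tax bills
    assert credit(150_000) == 2_000
    assert credit(210_000) == 1_500


def test_phase_out_edges():
    assert credit(200_000) == 2_000   # at the threshold: no reduction
    assert credit(200_001) == 1_950   # $1 over counts as a full $1,000 step
    assert credit(239_000) == 50
    assert credit(239_001) == 0       # fully phased out
    assert credit(10_000_000) == 0    # never negative


def test_cliffs_are_bounded():
    # A $1 income increase never removes more than one $50 step
    for income in range(199_000, 242_000, 250):
        assert credit(income) - credit(income + 1) <= 50


# Regression snapshot: outputs recorded from a previous release of the rules
SNAPSHOT = {150_000: 2_000, 205_500: 1_700, 250_000: 0}


def test_regression_snapshot():
    for income, expected in SNAPSHOT.items():
        assert credit(income) == expected
```

Run under pytest, any rule change that shifts a snapshot value fails loudly, which turns the annual update into a reviewable diff rather than a silent change.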
Web API: Optional#
- Start with Python library (CLI + API)
- Add REST API if demand exists
- Learn from PolicyEngine’s web app success
Risks and Mitigations#
Risk 1: Maintenance Burden#
Problem: Tax rules change annually, data sources change
Mitigations:
- Build automated tests (detect when rules break)
- Annual update process (documented, scheduled)
- Community contributions (share maintenance)
- Funding model (grants, sponsorships for maintenance)
Risk 2: Data Access#
Problem: Some data proprietary (e.g., IRS PUF restricted)
Mitigations:
- Use public data (Census CPS, CEX, open data portals)
- Synthetic data generation (PolicyEngine approach)
- Partner with universities for data access
Risk 3: Legal/Liability#
Problem: What if calculations are wrong and someone relies on them?
Mitigations:
- Disclaimer: “Not tax advice, for research only”
- Extensive validation + testing
- Public domain license (no warranty)
- Insurance (if forming organization)
Risk 4: Adoption#
Problem: Researchers don’t switch from existing tools
Mitigations:
- Interoperability with existing tools (don’t require switching)
- Demonstrate value (fill gaps, don’t duplicate)
- Publish papers using the tool (lead by example)
- Partner with influential researchers
Success Metrics#
Adoption#
- GitHub stars, forks, downloads (PyPI)
- Academic citations (papers using the tool)
- Government use (any official adoption)
Quality#
- Validation accuracy (within X% of official statistics)
- Test coverage (>90%)
- Issues opened/closed ratio
Impact#
- Policy reforms analyzed using the tool
- Legislation influenced by analyses
- Improved transparency (vs. black-box models)
Related Work#
These approaches build on patterns from:
- OpenFisca: Pluggable country packages, rules as code
- PolicyEngine: Web accessibility, ML enhancement
- Tax-Calculator: Academic rigor, extensive validation
- OpenStreetMap: Crowdsourced geographic data (inspiration for 1B)
- Zillow/Redfin: Real estate data platforms (users for property tax tools)
Recommendations Summary#
| Gap | Recommended Approach | Timeline | Priority |
|---|---|---|---|
| Sales Tax | Open rate DB + microsimulation | 9 months | 1 (High) |
| Property Tax | Top 20 metros → crowdsource | 1 year → ongoing | 2 (Medium) |
| Integration | Orchestration layer | 6 months | 3 (Low) |
Long-term vision: Contribute to PolicyEngine for unified US model
Next step: Prototype sales tax research tool (highest impact/effort ratio)
S4: Strategic
S4: Selection Criteria - Evaluating Public Finance Modeling Tools#
Overview#
This section provides criteria for evaluating existing public finance modeling tools and assessing approaches to filling identified gaps.
Evaluation Framework#
1. Functional Requirements#
1.1 Scope & Coverage#
What to evaluate:
- Geographic coverage (federal, state, local)
- Tax types covered (income, payroll, property, sales, excise)
- Benefit programs (SNAP, Medicaid, housing, EITC)
- Time period support (current law, historical, future projections)
Scoring:
- ⭐⭐⭐⭐⭐ Comprehensive (federal + state + local, all major taxes/benefits)
- ⭐⭐⭐⭐ Broad (federal + state OR all major taxes at one level)
- ⭐⭐⭐ Moderate (single level, multiple taxes OR single tax, multiple levels)
- ⭐⭐ Limited (single level, single tax type)
- ⭐ Narrow (proof of concept only)
Examples:
- PolicyEngine US: ⭐⭐⭐⭐ (Federal + 50 states income/payroll/benefits, no property/sales)
- Tax-Calculator: ⭐⭐⭐ (Federal only, comprehensive)
- Proposed property tax tool (top 20 metros): ⭐⭐ (Limited geographic)
1.2 Accuracy & Validation#
What to evaluate:
- Matches official statistics (IRS, Census, state revenue departments)
- Validation methodology (published tests, known correct answers)
- Edge case handling (phase-outs, AMT, cliffs)
- Error rates (% deviation from official aggregates)
Scoring:
- ⭐⭐⭐⭐⭐ Official use by government (CBO, HM Treasury)
- ⭐⭐⭐⭐ Validated against official stats (<5% error)
- ⭐⭐⭐ Some validation (published tests, no official comparison)
- ⭐⭐ Limited validation (unit tests only)
- ⭐ No validation
Examples:
- Tax-Calculator: ⭐⭐⭐⭐⭐ (CBO uses it)
- PolicyEngine UK: ⭐⭐⭐⭐⭐ (HM Treasury uses it)
- OpenFisca France: ⭐⭐⭐⭐⭐ (Official French government use)
Validation evidence:
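Error-rate evidence usually comes from comparing a weighted model aggregate with the matching official total. A minimal sketch; the records, weights and official figure are placeholders, and the 5% bar is the ⭐⭐⭐⭐ cutoff from the scale above.

```python
def weighted_total(values, weights):
    """Population estimate from microdata: sum of value x sampling weight."""
    return sum(v * w for v, w in zip(values, weights))


def pct_deviation(modelled: float, official: float) -> float:
    """Absolute deviation from the official aggregate, in percent."""
    return 100 * abs(modelled - official) / abs(official)


# Placeholders: per-record liability ($), weight (units represented), official total ($).
liability = [1_200.0, 8_500.0, 31_000.0]
weights = [50_000, 30_000, 5_000]
official_total = 460_000_000.0

modelled = weighted_total(liability, weights)    # 470,000,000
error = pct_deviation(modelled, official_total)  # about 2.17%
meets_four_star_bar = error < 5.0
print(f"{error:.2f}% deviation; under 5%: {meets_four_star_bar}")
```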
1.3 Policy Reform Capability#
What to evaluate:
- Ease of specifying reforms (change rates, add provisions, modify phase-outs)
- Counterfactual analysis (compare baseline vs. reform)
- Interaction effects (how changes ripple through system)
- Behavioral modeling (optional: labor supply responses)
Scoring:
- ⭐⭐⭐⭐⭐ Parameter changes + structural reforms, tested framework
- ⭐⭐⭐⭐ Parameter changes + some structural reforms
- ⭐⭐⭐ Parameter changes only (rates, thresholds)
- ⭐⭐ Limited reform capability (hard-coded scenarios)
- ⭐ No reform capability (current law only)
Examples:
- PolicyEngine: ⭐⭐⭐⭐⭐ (Web app for designing reforms, structural changes possible)
- Tax-Calculator: ⭐⭐⭐⭐ (Parameter reforms easy, structural reforms require code)
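The gap between the top two tiers is easiest to see in code. In this toy model (hypothetical names; not any tool's API, and not the actual Child Tax Credit rules), a parameter reform only overrides values in a parameter dictionary, while a structural reform swaps in a different rule function.

```python
from typing import Callable, Dict, Optional

Params = Dict[str, float]
Rule = Callable[[float, int, Params], float]

BASELINE: Params = {"credit_per_child": 2_000.0}


def nonrefundable_rule(tax: float, children: int, p: Params) -> float:
    """Baseline structure: the credit can reduce tax to zero but not below."""
    return max(tax - children * p["credit_per_child"], 0.0)


def refundable_rule(tax: float, children: int, p: Params) -> float:
    """Structural reform: unused credit is paid out, so net tax can go negative."""
    return tax - children * p["credit_per_child"]


def net_tax(tax: float, children: int, rule: Rule = nonrefundable_rule,
            overrides: Optional[Params] = None) -> float:
    params = {**BASELINE, **(overrides or {})}  # parameter reform = value overrides
    return rule(tax, children, params)


print(net_tax(10_000, 2))                                           # baseline: 6000.0
print(net_tax(10_000, 2, overrides={"credit_per_child": 4_000.0}))  # parameter reform: 2000.0
print(net_tax(3_000, 2, rule=refundable_rule))                      # structural reform: -1000.0
```

A tool scores higher on this scale when the second kind of change, replacing `rule`, is supported by a tested framework rather than requiring edits to the model's source.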
1.4 Distributional Analysis#
What to evaluate:
- Outputs by income quintile/decile/percentile
- Winners and losers (% benefit, $ change)
- Poverty impacts (SPM, FPL)
- Demographic breakdowns (age, race, geography)
Scoring:
- ⭐⭐⭐⭐⭐ Multiple dimensions (income + age + geography + demographics)
- ⭐⭐⭐⭐ Income quintiles + one other dimension
- ⭐⭐⭐ Income quintiles only
- ⭐⭐ Aggregate statistics only
- ⭐ No distributional output
Examples:
- PolicyEngine: ⭐⭐⭐⭐⭐ (Income, poverty, age, state, race)
- Tax-Calculator: ⭐⭐⭐⭐ (Income, age, filing status)
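Winners-and-losers output reduces to comparing each record's net income under baseline and reform, then summarizing with sampling weights. A minimal sketch; the records are made up, deciles are assumed precomputed, and the $1 threshold for counting a change is an assumption.

```python
from collections import defaultdict

# Placeholder records: weight, income decile, net income under baseline and reform ($/yr).
RECORDS = [
    {"weight": 2.0, "decile": 1, "base": 18_000.0, "reform": 18_600.0},
    {"weight": 1.0, "decile": 1, "base": 22_000.0, "reform": 22_000.0},
    {"weight": 1.0, "decile": 10, "base": 250_000.0, "reform": 248_500.0},
    {"weight": 1.0, "decile": 10, "base": 300_000.0, "reform": 300_200.0},
]


def winners_losers(records, threshold=1.0):
    """Weighted share of units gaining or losing more than `threshold` dollars."""
    total = sum(r["weight"] for r in records)
    gain = sum(r["weight"] for r in records if r["reform"] - r["base"] > threshold)
    lose = sum(r["weight"] for r in records if r["base"] - r["reform"] > threshold)
    return {"share_gaining": gain / total, "share_losing": lose / total}


def mean_change_by_decile(records):
    """Weighted mean dollar change in net income within each decile."""
    change, weight = defaultdict(float), defaultdict(float)
    for r in records:
        change[r["decile"]] += r["weight"] * (r["reform"] - r["base"])
        weight[r["decile"]] += r["weight"]
    return {d: change[d] / weight[d] for d in sorted(change)}


print(winners_losers(RECORDS))
print(mean_change_by_decile(RECORDS))
```

Adding another dimension (age, state, race) is a matter of grouping on a different key, which is why richer input data, rather than code, usually limits how many stars a tool earns here.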
2. Technical Quality#
2.1 Performance#
What to evaluate:
- Runtime for full population (300M+ people, or 150M+ tax units)
- Memory usage
- Optimization techniques (vectorization, caching)
- Scalability (can it handle larger datasets?)
Benchmarks:
- ⭐⭐⭐⭐⭐ < 1 minute for full population
- ⭐⭐⭐⭐ 1-5 minutes
- ⭐⭐⭐ 5-30 minutes
- ⭐⭐ 30-60 minutes
- ⭐ > 1 hour or doesn’t scale
Examples:
- PolicyEngine: ⭐⭐⭐⭐⭐ (~30 seconds for 300M people)
- Tax-Calculator: ⭐⭐⭐⭐ (~2 minutes)
Measurement:
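A simple measurement approach is wall-clock timing of repeated full runs with `time.perf_counter`, plus one separate run under `tracemalloc` for peak memory (tracing slows execution, so it is kept out of the timed runs). `toy_run` is a hypothetical stand-in for whatever full-population entry point the tool under evaluation exposes.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict


def benchmark(run_simulation: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Time repeated full runs, then measure peak traced memory in one extra run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_simulation()
        timings.append(time.perf_counter() - start)
    # Separate run: tracing slows execution, so it must not pollute the timings.
    tracemalloc.start()
    run_simulation()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "best_s": min(timings),                  # least affected by background load
        "median_s": statistics.median(timings),
        "peak_mb": peak_bytes / 1e6,             # only memory traced by Python's allocators
    }


def toy_run() -> float:
    # Hypothetical stand-in workload: a flat 10% tax over 200,000 synthetic incomes.
    incomes = [1_000.0 * (i % 100) for i in range(200_000)]
    return sum(0.10 * x for x in incomes)


print(benchmark(toy_run))
```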
2.2 Code Quality#
What to evaluate:
- Test coverage (%)
- Documentation (API reference, tutorials, examples)
- Type hints (Python 3.6+)
- Linting (consistent style)
- CI/CD (automated testing)
Scoring:
- ⭐⭐⭐⭐⭐ >90% coverage, comprehensive docs, full CI/CD
- ⭐⭐⭐⭐ 70-90% coverage, good docs, basic CI
- ⭐⭐⭐ 50-70% coverage, minimal docs
- ⭐⭐ <50% coverage, API reference only
- ⭐ No tests, no docs
Examples:
- Tax-Calculator: ⭐⭐⭐⭐⭐ (100% coverage, extensive docs)
- OpenFisca: ⭐⭐⭐⭐ (Good coverage, variable docs by country)
2.3 Maintainability#
What to evaluate:
- Active development (commits/month, recent release)
- Contributor community (number of contributors, responsiveness)
- Governance (who maintains, funding model)
- Breaking changes (API stability)
Scoring:
- ⭐⭐⭐⭐⭐ Active (weekly commits), funded, multiple maintainers
- ⭐⭐⭐⭐ Active (monthly commits), some funding
- ⭐⭐⭐ Occasional updates (quarterly)
- ⭐⭐ Minimal updates (annual)
- ⭐ Abandoned (no updates in 2+ years)
Examples:
- PolicyEngine: ⭐⭐⭐⭐⭐ (50+ commits/month, nonprofit funded)
- Tax-Calculator: ⭐⭐⭐⭐ (Monthly updates, PSL supported)
- tenforty: ⭐⭐ (Last update 2018)
2.4 Integration & Interoperability#
What to evaluate:
- APIs (REST, Python, other languages)
- Data formats (Parquet, CSV, JSON)
- Compatibility with other tools
- Extension mechanisms (plugins, custom rules)
Scoring:
- ⭐⭐⭐⭐⭐ REST API + Python + documented extension framework
- ⭐⭐⭐⭐ Python API + extension framework
- ⭐⭐⭐ Python API only, some documentation for extensions
- ⭐⭐ Python API, hard to extend
- ⭐ Single-file script, no API
Examples:
- OpenFisca: ⭐⭐⭐⭐⭐ (REST API, Python, country packages)
- PolicyEngine: ⭐⭐⭐⭐⭐ (REST API, Python, web app)
- Tax-Calculator: ⭐⭐⭐⭐ (Python API, parameter system)
3. Usability#
3.1 Learning Curve#
What to evaluate:
- Prerequisites (programming, tax knowledge)
- Documentation quality (tutorials, examples)
- Community support (forums, Stack Overflow, GitHub issues)
- Quickstart time (how long to first working example)
Scoring:
- ⭐⭐⭐⭐⭐ Web interface (no code) OR excellent tutorials (<1 day)
- ⭐⭐⭐⭐ Good tutorials, active community (1-3 days)
- ⭐⭐⭐ API reference, some examples (1-2 weeks)
- ⭐⭐ Minimal docs, expert only (1+ months)
- ⭐ No docs, read the code
Examples:
- PolicyEngine: ⭐⭐⭐⭐⭐ (Web app requires zero coding)
- Tax-Calculator: ⭐⭐⭐⭐ (Excellent tutorials)
- OpenFisca: ⭐⭐⭐ (Steeper curve, entity modeling complex)
3.2 Accessibility#
What to evaluate:
- Cost (free, freemium, paid)
- Installation difficulty (pip install, Docker, complex setup)
- Data availability (includes sample data? requires proprietary data?)
- Target audience (researchers, policymakers, general public)
Scoring:
- ⭐⭐⭐⭐⭐ Free, easy install, sample data included, web interface
- ⭐⭐⭐⭐ Free, easy install, sample data
- ⭐⭐⭐ Free, moderate install, need to source data
- ⭐⭐ Free but complex setup OR paid but easy
- ⭐ Expensive and complex
Examples:
- PolicyEngine: ⭐⭐⭐⭐⭐ (Free web app, includes data)
- Tax-Calculator: ⭐⭐⭐⭐ (Free, pip install, includes CPS data)
- TaxJar: ⭐⭐ (Paid API, $1000+/month)
4. Licensing & Governance#
4.1 License#
What to evaluate:
- Open source? (OSI-approved license)
- Permissiveness (MIT, Apache vs. GPL, AGPL)
- Commercial use allowed?
- Attribution requirements
Scoring:
- ⭐⭐⭐⭐⭐ Public domain (CC0) or highly permissive (MIT)
- ⭐⭐⭐⭐ Permissive with attribution (BSD, Apache 2.0)
- ⭐⭐⭐ Weak copyleft (LGPL)
- ⭐⭐ Strong copyleft (GPL, AGPL)
- ⭐ Proprietary
Examples:
- Tax-Calculator: ⭐⭐⭐⭐⭐ (CC0 - Public Domain)
- PolicyEngine: ⭐⭐ (AGPL-3.0)
- OpenFisca: ⭐⭐ (AGPL-3.0)
Why it matters:
- AGPL requires modified versions, including those offered as a network service, to make their source available under the AGPL (affects commercial products)
- Public domain enables maximum reuse
- For public finance: transparency matters, so open-source preferred even if copyleft
4.2 Governance#
What to evaluate:
- Who controls direction? (government, nonprofit, company, community)
- Funding model (grants, donations, commercial)
- Contributor process (easy to contribute? CLA required?)
- Decision-making (BDFL, committee, consensus)
Scoring:
- ⭐⭐⭐⭐⭐ Open governance, multiple funders, easy contributions
- ⭐⭐⭐⭐ Nonprofit/academic, some funders
- ⭐⭐⭐ Government-backed, clear roadmap
- ⭐⭐ Single company/individual, limited input
- ⭐ Closed governance
Examples:
- Tax-Calculator: ⭐⭐⭐⭐⭐ (PSL open governance, community-driven)
- PolicyEngine: ⭐⭐⭐⭐ (Nonprofit, transparent, accepts contributions)
- OpenFisca: ⭐⭐⭐⭐ (French gov + community)
5. Impact & Adoption#
5.1 User Base#
What to evaluate:
- Official government use
- Academic citations (Google Scholar)
- Industry use (think tanks, media)
- GitHub metrics (stars, forks, downloads)
Scoring:
- ⭐⭐⭐⭐⭐ Official government use + widespread academic/industry
- ⭐⭐⭐⭐ Government use OR extensive academic citations
- ⭐⭐⭐ Some academic use, niche adoption
- ⭐⭐ Small community, few citations
- ⭐ No known users
Examples:
- Tax-Calculator: ⭐⭐⭐⭐⭐ (CBO, TPC, 100+ academic papers)
- PolicyEngine: ⭐⭐⭐⭐⭐ (UK HM Treasury, growing US adoption)
- OpenFisca: ⭐⭐⭐⭐⭐ (French government, international)
5.2 Influence on Policy#
What to evaluate:
- Has it influenced actual legislation?
- Used in official government projections?
- Cited in policy debates?
- Impact on public understanding?
Evidence:
- Direct: “CBO used Tool X to score Bill Y”
- Indirect: Academic papers using tool cited in Congressional testimony
- Public: Media coverage using tool’s analyses
Examples:
- Tax-Calculator: Influenced TCJA debate (2017), CBO uses for official scores
- PolicyEngine: UK Treasury uses for Universal Credit modeling
- OpenFisca: French social benefit system runs on OpenFisca code
Applying the Framework#
Existing Tools Scorecard#
| Criterion | OpenFisca | PolicyEngine | Tax-Calculator | CCC |
|---|---|---|---|---|
| Scope & Coverage | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Reform Capability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Distributional | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| Performance | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Code Quality | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Maintainability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Integration | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Learning Curve | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| Accessibility | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| License | ⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Governance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| User Base | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Policy Influence | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| TOTAL | 59/70 | 65/70 | 61/70 | 45/70 |
Interpretation#
PolicyEngine (65/70): Best overall, especially for accessibility (web app) and comprehensive US coverage
Tax-Calculator (61/70): Gold standard for academic use, public domain license
OpenFisca (59/70): Most international, mature, but steeper learning curve
CCC (45/70): Specialized tool, limited audience, but fills a unique niche
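Each total is a plain sum of 14 star ratings, so the scorecard can be rechecked mechanically whenever a rating changes. A minimal sketch; the lists transcribe the scorecard rows in order, from Scope & Coverage to Policy Influence.

```python
# Star ratings per tool, in scorecard row order (Scope & Coverage ... Policy Influence).
SCORES = {
    "OpenFisca":      [4, 5, 5, 5, 3, 4, 5, 5, 3, 4, 2, 4, 5, 5],
    "PolicyEngine":   [4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 4, 5, 5],
    "Tax-Calculator": [3, 5, 4, 4, 4, 5, 4, 4, 4, 4, 5, 5, 5, 5],
    "CCC":            [2, 4, 3, 2, 4, 4, 3, 3, 2, 3, 5, 4, 3, 3],
}
MAX_SCORE = 5 * 14  # 14 criteria, 5 stars each

totals = {tool: sum(stars) for tool, stars in SCORES.items()}
for tool, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{tool}: {total}/{MAX_SCORE}")
# PolicyEngine: 65/70, Tax-Calculator: 61/70, OpenFisca: 59/70, CCC: 45/70
```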
Recommendations for Tool Selection#
Use Case 1: US Federal Tax Policy Analysis#
Recommendation: Tax-Calculator
Why:
- CBO uses it (official validation)
- Public domain license
- Most cited in academic literature
- Comprehensive federal tax modeling
When to use PolicyEngine instead:
- Need state taxes
- Want web interface
- Non-programmer audience
Use Case 2: US State + Federal Integration#
Recommendation: PolicyEngine US
Why:
- Only tool with all 50 states (as of 2024)
- Active development
- Web app for accessibility
Limitations:
- AGPL license (if building commercial product)
- State models new (need more validation)
Use Case 3: International (Non-US)#
Recommendation: OpenFisca
Why:
- Proven in multiple countries
- Core engine country-agnostic (country rules live in separate packages)
- Government adoption (France, Tunisia)
Check: Does country package exist? Quality varies.
Use Case 4: Business Tax / Investment Analysis#
Recommendation: Cost-of-Capital-Calculator
Why:
- Only open-source business tax tool
- Integrates with Tax-Calculator
- Based on established economic framework
Combine with: Tax-Calculator for comprehensive analysis
Use Case 5: Property Tax (Future)#
Recommendation: Build new tool (none exist)
Approach: See S3 (Approach 1A: Top metros, incremental)
Evaluation criteria when choosing approach:
- Incremental (1A) scores high on feasibility, moderate on coverage
- Crowdsourced (1B) scores high on coverage, moderate on feasibility
- ML (1C) scores high on automation, low on explainability
For policy research: Avoid ML (1C), prefer rule-based (1A/1B)
Use Case 6: Sales Tax Research (Future)#
Recommendation: Build new tool (commercial APIs not suitable)
Approach: See S3 (Approach 2A + 2B: Open rate database + microsimulation)
Why not TaxJar/Avalara:
- Expensive ($1000+/month)
- Not designed for counterfactual analysis
- No access to underlying data
Decision Matrix for New Tool Development#
Should You Build a New Tool?#
| Question | If YES → | If NO → |
|---|---|---|
| Does existing tool cover this? | Use existing | Consider building |
| Can you contribute to existing? | Contribute | Build standalone |
| Do you have 2+ person-years? | Maybe build | Probably don’t |
| Is there ongoing funding? | Maybe build | Probably don’t |
| Are users waiting? | Build | Don’t build |
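One reading of the matrix as code, assuming the questions are asked in order and that building needs a "yes" to all three resource and demand questions ("Maybe build" is treated as build). The function and argument names are illustrative.

```python
def build_decision(existing_covers: bool, can_contribute: bool, has_person_years: bool,
                   has_ongoing_funding: bool, users_waiting: bool) -> str:
    """Walk the matrix top to bottom and return the recommended action."""
    if can_contribute:
        return "contribute"    # extending an existing tool beats a standalone build
    if existing_covers:
        return "use existing"
    if has_person_years and has_ongoing_funding and users_waiting:
        return "build"
    return "don't build"


print(build_decision(False, False, True, True, True))   # property tax tool: build
print(build_decision(True, True, False, False, False))  # OpenFisca web UI: contribute
```

Both worked examples that follow map onto this function: the property tax tool reaches "build" only because every resource and demand answer is yes, while the OpenFisca UI stops at the contribution question.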
Example: Property Tax Tool#
- Existing tool? NO → Consider building
- Contribute to existing? NO (none exist) → Build standalone
- Resources? YES (grant-funded) → Build
- Funding? YES (3-year grant) → Build
- Users? YES (real estate platforms, local govs) → BUILD
Example: Better OpenFisca Web UI#
- Existing tool? YES (OpenFisca has web API) → Use existing
- Contribute to existing? YES (open-source) → CONTRIBUTE, don’t fork
- Resources? Doesn’t matter → Contribute
- Funding? Doesn’t matter → Contribute
- Users? Doesn’t matter → Contribute
Red Flags: When NOT to Build#
- Duplicating Tax-Calculator: Federal US taxes are a solved problem
- Country-specific without maintenance plan: Will bitrot with annual tax changes
- Proprietary data requirements: Users can’t access = tool unusable
- No validation strategy: How will you know if it’s correct?
- One-person project, no succession: Bus factor = 1
- “Better OpenFisca” without distinct value: Just contribute instead
Green Lights: When TO Build#
- Clear gap: No existing open-source solution (property tax, sales tax research)
- User demand: Researchers/policymakers asking for it
- Validation path: Can compare to official statistics
- Maintenance plan: Funding for 3+ years
- Interoperability: Works with existing tools
- Unique value: Can’t be achieved by contributing to existing tool
Summary: Choosing Wisely#
For using existing tools:
- US federal: Tax-Calculator (academic) or PolicyEngine (accessible)
- US state: PolicyEngine US (only comprehensive option)
- International: OpenFisca (if country package exists)
- Business tax: Cost-of-Capital-Calculator
For building new tools:
- Sales tax research: High priority (highest impact/effort ratio), clear gap, moderate demand ✅
- Property tax: Medium priority (larger effort), clear gap, user demand ✅
- Multi-jurisdictional: Build orchestration, contribute to PolicyEngine ✅
- Duplicate existing: Don’t do it ❌
The golden rule: Contribute to existing tools when possible, build new tools when necessary.