Daniel Hardesty Lewis

Data Scientist & Researcher

Developing interpretable ML systems and scalable computational infrastructure.

I'm a researcher at Columbia University focused on explainable AI and distributionally robust machine learning for high-stakes financial applications.

Previously, I spent 5+ years at the Texas Advanced Computing Center scaling climate and flood models on world-leading supercomputers, leading projects for the $40M Texas Disaster Information System.

My research has been published in ACM TIST, received a mention in the AAAI Presidential Address, and contributed to the Bagnold Medal in geomorphology.

I founded Summit Geospatial to deliver the highest quality terrain data in Texas, and I'm building PoliBOM (Top 5% YC applicant) for tariff intelligence.

About

I apply high-performance computing techniques to terrain and flood models, developing reproducible and scalable workflows that run from source data through to web services, at scales ranging from a single parcel to entire countries.

My current research focuses on developing attribution methods that remain faithful under distribution shift for high-stakes financial applications, combining insights from variational inference and robust optimization.

Education

M.S. Urban Planning

Columbia University · Expected 2026

B.S. Pure Mathematics

University of Texas at Austin · 2017

Certificate, Scientific Computation

University of Texas at Austin · 2017

Languages

English (Native) · Spanish (Professional) · French (Limited)

Experience

2024 — Present

Research Assistant · Columbia University

Working under Ali Hirsa, Director of Financial Engineering, on latent factor models that achieve better-than-commercial R² on backtested holdout prediction. Developing SHAP explainability methods for financial deep learning.

VAE · Financial ML · Explainability · SHAP
2023 — Present

Founder · Summit Geospatial

Developed the highest quality seamless elevation data for Texas. Targeting licensing to AI labs including OpenAI and Cohere.

Geospatial · LiDAR · HPC · AWS
2024 — Present

Co-founder · PoliBOM

Top 5% YC applicant. Built a tariff intelligence platform that influenced KPMG's Tariff Modeller launch.

RegTech · Next.js · PostgreSQL
2021 — 2023

Senior Data Scientist & Technical Lead · Texas Advanced Computing Center

Led principal project for $40M TDIS disaster resiliency initiative. Scaled climate models on world-leading supercomputers with million-node jobs. Contributed research towards Prof. Passalacqua's Bagnold Medal.

HPC · GDAL · Python · Fortran · ParFlow
2018 — 2021

Data Scientist & Research Engineer · Texas Advanced Computing Center

Competed in the DARPA World Modelers program. Research mentioned in AAAI Presidential Address by Prof. Yolanda Gil. Provided deep learning consulting for Petrobras.

DARPA · Deep Learning · MODFLOW · HAND
2018 — 2020

Co-instructor · University of Texas at Austin

Taught graduate courses: Machine Learning for the Geosciences, Intelligent Systems for the Geosciences, Scientific Computation (C++, CUDA, HPC, optimization).

Teaching · CUDA · C++ · Machine Learning

Projects

Properlytic

Individual home price forecasting using a VAE architecture. Achieved 12% MAPE in Manhattan, the hardest US market, against Zillow's 8.4%. Interest from a16z & MetaProp.

VAE · Real Estate · PyTorch

Texas Disaster Information System

Led development of core elevation data layer for web-based spatial data system supporting resilient decision-making at state and local levels. Part of $40M GLO initiative.

HPC · PostGIS · GDAL · Dash

Real-Time Inundation Mapping

Developed computationally efficient methods to produce high-resolution (1m) flood inundation maps from National Water Model outputs for emergency response personnel.

Python · RasterIO · GeoFlood

MINT Platform

Integrated climate, hydrology, agriculture, and socioeconomic models for DARPA World Modelers. Published in ACM TIST and mentioned in AAAI Presidential Address.

DARPA · Model Integration · Ontologies

Flood Hazard Assessment

Partnered with US Army Corps of Engineers to statistically model compound flood hazards in coastal Texas, integrating high-resolution topographic data with hydrological models.

USACE · HEC-RAS · Statistics

terrain_aggregator

Workflow to aggregate terrain imagery at scale to a single seamless image dataset. 11 stars on GitHub.

Shell · GDAL · HPC

Publications

Peer-Reviewed Articles

2021

Artificial Intelligence for Modeling Complex Systems: Taming the Complexity of Expert Models to Improve Decision Making

Gil, Y., Garijo, D., Khider, D., Knoblock, C.A., et al. (incl. D. Hardesty Lewis)

ACM Transactions on Interactive Intelligent Systems (TIST), 11(2)

89 citations
2019

An Intelligent Interface for Integrating Climate, Hydrology, Agriculture, and Socioeconomic Models

Garijo, D., et al. (incl. D. Hardesty Lewis)

ACM IUI'19

6 citations
2018

A Semantic Model Catalog to Support Comparison and Reuse

Garijo, D., et al. (incl. D. Hardesty Lewis)

9th International Congress on Environmental Modelling and Software

6 citations

Selected Presentations

2021

Estimating Inundation Extent and Depth from National Water Model Outputs and High Resolution Topographic Data

Presented to NOAA

2020

Vector and Raster GIS Processing with Python in Jupyter Notebooks

TACC Institute of Planet Texas 2050

2017

From MODFLOW-96 to MODFLOW-2005, ParFlow, and Others

American Geophysical Union Fall Meeting

Skills

Programming Languages

Python · Bash · PL/pgSQL · Fortran · C++ · Perl · TypeScript

Libraries & Frameworks

GDAL · CUDA · GeoPandas · RasterIO · NumPy · PyTorch · Plotly/Dash

Database Systems

PostgreSQL · PostGIS · Redis · Supabase

Scientific Software

MODFLOW · HAND · HEC-RAS · ParFlow · GeoFlood

Cloud & Infrastructure

AWS · Azure · Docker · Singularity · HPC

ML & Statistics

VAE · Tree-based models · SHAP · DBSCAN · Hierarchical clustering

Contact

I'm always interested in discussing research collaborations, consulting opportunities, or just connecting with others working on interpretable ML, HPC, or geospatial computing.

Currently based in New York City, pursuing my M.S. at Columbia University.
