Moosehead
Internal querying platform replacing Stata + Dropbox for analyst data extraction
Work·Aug 2022 — Present·Team Lead & Implementer
Problem
- Analysts at Grassroots used Stata as their primary data extraction tool, pulling from CSV, JSON, and XML files on Dropbox.
- Dropbox synced to every analyst laptop, so posed a security risk.
- Stata runs a single process at a time, so each query took 10 to 15 minutes. Analysts often needed several iterations before a client ask was correctly filtered, turning a morning request into a multi-hour afternoon scramble.
- New analysts took roughly a month before they could be trusted to write production queries, and query quality and workflow varied wildly with each analyst, producing inconsistent client deliverables.
- The existing tech was the bottleneck preventing Grassroots from continuing to scale.
Approach
- Moved all analyst data to the cloud, pulled Dropbox access, and tied data access to individual identities with real enforcement mechanisms instead of implicit trust.
- Built a querying UI on top of our output schema so analysts could generate SQL through the portal instead of writing Stata code. Rolled it out iteratively, covering the 80 to 90 percent of common client asks first, then chipping away at complex cases.
- Added client hierarchy and tenants, per-query permissions, shared query history, rerunnable queries, and a secure file exchange for client deliverables and source material.
- Chose Vue, GCP, and BigQuery because that matched the existing team's skills. Rather than slow everyone down teaching them a new stack, I learned Vue and GCP myself and led the team in what they already knew.
- On the backend, much of my hands-on work was translating a Pydantic model of the output schema into valid, optimized BigQuery SQL.
Tech Stack
Frontend
VueJavaScriptPrimeVue
Backend
PythonFlask
Data & Storage
BigQueryFirestoreCloud Storage
Infrastructure
GCPCloud RunCloud FunctionsFirebase HostingFirebase AuthFirebase AnalyticsSecret ManagerArtifact Registry
Testing
Playwrightpytest
Other
DockerGitHub Actions
Outcomes
- End-to-end client deliverable cycles went from hours to single-digit minutes.
- Client continuity no longer depends on email threads and manual Google Sheets. Assignments, history, and outputs live in the system and are shared across the analyst team.
- Eliminated local data exposure by moving to identity-based access controls. No analyst laptop holds a copy of the dataset.
- Unlocked company growth from ~$5M to more than $30M in ARR.
Query time
15 min< 15 sec
Deliverable cycle
hoursminutes
Weekly requests
10,000+
Analyst time saved
100s of hrs/week