CatalystOps analyzes your PySpark and Databricks code inline as you type — 30+ anti-pattern detectors, dry-run plan analysis on your cluster, and actionable fixes. No context switching.
Use df.isEmpty() instead: it short-circuits on the first partition.
if not df.isEmpty(): ...
From static code analysis to live Databricks plan inspection — without leaving the editor.
Detects 30+ PySpark and Databricks anti-patterns inline as you type. No cluster required. Catches collect(), UDFs, cross joins, unsafe writes, SQL injection, schema drift, and more.
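To illustrate the kind of cluster-free check described here, the following is a minimal sketch built on Python's standard ast module. The rule table and the find_risky_calls name are hypothetical and are not taken from CatalystOps; the real detectors presumably do far more (scope tracking, type inference, and so on).

```python
import ast

# Hypothetical rule table (method name -> message); not CatalystOps's rule set.
RISKY_METHODS = {
    "collect": "collect() pulls every row to the driver",
    "crossJoin": "crossJoin() produces a Cartesian product",
}


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, message) for each risky method call in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Match calls of the form <expr>.<method>(...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            message = RISKY_METHODS.get(node.func.attr)
            if message:
                findings.append((node.lineno, message))
    return sorted(findings)


sample = (
    "rows = df.filter(df.age > 21).collect()\n"
    "pairs = left.crossJoin(right)\n"
)
for line, message in find_risky_calls(sample):
    print(f"line {line}: {message}")
```

Because the check only parses source text, it needs neither a Spark session nor a cluster, which is what makes inline feedback while typing feasible.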
Submits a neutralized version of your script to Databricks (cluster or Serverless) and returns the physical Catalyst plan — with sort-merge join detection, broadcast thresholds, shuffle analysis, and cost estimation.
Interactive sidebar tree of the physical plan with per-node cost scores. One-click DAG webview. Context-aware quick fixes directly on plan nodes — broadcast hint, repartition, persist, AQE config.
Tracks DBU and dollar spend per period directly from system.billing.usage with a 1-hour cache. After each serverless run, optionally fetches actual DBU consumption.
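The 1-hour cache can be pictured as a small time-to-live wrapper around whatever function runs the billing query. This is only a sketch under assumptions: the TTLCache class, the 30-day window, and the query text are illustrative, and the column names (usage_date, sku_name, usage_quantity, usage_unit) are assumed from the Databricks system table schema rather than from CatalystOps itself.

```python
import time
from typing import Any, Callable, Optional

# Illustrative query only; the 30-day window and grouping are assumptions.
USAGE_QUERY = """
SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
  AND usage_date >= date_sub(current_date(), 30)
GROUP BY usage_date, sku_name
"""


class TTLCache:
    """Cache one fetched value and refresh it once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float = 3600.0,  # one hour, matching the text above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        # Fetch on first use, or when the cached value is older than the TTL.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Stand-in for a function that would run USAGE_QUERY against a SQL warehouse.
fetch_log = []


def fake_fetch() -> dict:
    fetch_log.append("fetched")
    return {"dbus": 12.5}


cache = TTLCache(fake_fetch)
cache.get()
cache.get()  # served from the cache, so fake_fetch is not called again
print(len(fetch_log))
```

Injecting the clock keeps the expiry logic testable without waiting an hour.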
Tracks inferred schemas across DataFrames. Validates join column existence and type compatibility. Detects union column-order mismatches that silently corrupt data at runtime.
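The union check matters because DataFrame.union() matches columns by position, not by name. As a rough sketch of what such a validation could look like, the snippet below compares two schemas represented as ordered (name, type) pairs. The Schema alias and check_union function are hypothetical and say nothing about how CatalystOps infers or stores schemas.

```python
# Ordered (column name, type name) pairs; a hypothetical stand-in for an inferred schema.
Schema = list[tuple[str, str]]


def check_union(left: Schema, right: Schema) -> list[str]:
    """Return human-readable problems with a positional union of two schemas."""
    if len(left) != len(right):
        return [f"column count differs: {len(left)} vs {len(right)}"]

    problems = []
    left_names = [name for name, _ in left]
    right_names = [name for name, _ in right]
    # Same set of columns, different order: union() would silently misalign them.
    if left_names != right_names and sorted(left_names) == sorted(right_names):
        problems.append(
            "same columns in a different order; union() matches by position, "
            "consider unionByName()"
        )
    # Report each position where the types disagree.
    for i, ((l_name, l_type), (r_name, r_type)) in enumerate(zip(left, right)):
        if l_type != r_type:
            problems.append(f"position {i}: {l_name}:{l_type} vs {r_name}:{r_type}")
    return problems


orders = [("id", "bigint"), ("name", "string")]
returns = [("name", "string"), ("id", "bigint")]
for problem in check_union(orders, returns):
    print(problem)
```

When both columns happen to share a type, the positional check finds nothing, which is exactly the silent-corruption case; the name-order comparison is what catches it.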
Exposes a Streamable HTTP MCP server auto-discovered by VS Code 1.99+. Lets Claude and other AI tools analyze your PySpark code, fetch billing summaries, and run dry runs through natural language.
Install from the VS Code Marketplace. The moment you open a .py file, local analysis kicks in, with no configuration needed.
30+ rules light up immediately for any PySpark anti-patterns.
Add your workspace URL and personal access token via CatalystOps: Configure Databricks Connection. Pick cluster or Serverless execution mode. CatalystOps reads your ~/.databrickscfg automatically if it exists.
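For readers curious what reading that file involves: ~/.databrickscfg is an INI-style file whose profiles typically carry host and token keys. The sketch below parses it with Python's standard configparser. The read_databricks_profile function is hypothetical, and the extension's own lookup logic may differ.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional


def read_databricks_profile(
    path: Optional[Path] = None, profile: str = "DEFAULT"
) -> Optional[dict]:
    """Return {'host', 'token'} for a profile, or None if it cannot be resolved."""
    path = path or Path.home() / ".databrickscfg"
    if not path.is_file():
        return None
    # Disable interpolation so '%' characters in tokens are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if profile != "DEFAULT" and not parser.has_section(profile):
        return None
    section = parser[profile]
    host, token = section.get("host"), section.get("token")
    if not host or not token:
        return None
    return {"host": host.rstrip("/"), "token": token}


# Demo against a throwaway file with placeholder values instead of a real config.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "databrickscfg"
    cfg_path.write_text(
        "[DEFAULT]\n"
        "host = https://example.cloud.databricks.com/\n"
        "token = dapi-placeholder\n"
    )
    result = read_databricks_profile(cfg_path)
    missing = read_databricks_profile(cfg_path, profile="staging")

print(result["host"] if result else "no profile found")
```

Returning None rather than raising keeps the "if it exists" behavior simple: the caller can fall back to prompting for a URL and token.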
Press ⌘⇧K to submit the current file. CatalystOps neutralizes side-effects, executes the Catalyst planner on your cluster, and returns the physical plan with cost annotations, join strategies, and actionable fixes — all in the sidebar.
Rules span Spark actions, joins, streaming, Delta Lake, DLT pipelines, and security.
CatalystOps ships a built-in MCP server auto-discovered by VS Code 1.99+. Claude and other AI clients can call CatalystOps tools directly — analyze code, fetch billing data, run dry-runs, and read plan results through natural language.
Free, open-source, and available for any Databricks or PySpark project.
Also available on Open VSX for Cursor, Theia, and other editors.