Performance Guide¶

This guide covers observed performance for sqlalchemy-cubrid and practical optimization patterns.

Table of Contents¶

Overview
Benchmark Results
Performance Characteristics
Optimization Tips
Running Benchmarks

Overview¶

sqlalchemy-cubrid adds SQLAlchemy Core/ORM behavior on top of the CUBRID Python driver.

flowchart LR
    App[Application / ORM Models] --> SA[SQLAlchemy Core / ORM]
    SA --> Dialect[sqlalchemy-cubrid Dialect]
    Dialect --> Driver[pycubrid DBAPI]
    Driver --> CAS[CAS Binary Protocol]
    CAS --> Server[(CUBRID Server)]

flowchart TD
    Query[ORM Query] --> Compile[SQL compilation]
    Compile --> Bind[Parameter binding]
    Bind --> Execute[DBAPI execute]
    Execute --> Fetch[Row fetch + object materialization]
    Fetch --> AppOut[Application objects]

Benchmark Results¶

Source: cubrid-benchmark

Environment: Intel Core i5-9400F @ 2.90GHz, 6 cores, Linux x86_64, Docker containers.

Baseline driver workload: Python pycubrid vs PyMySQL, 10000 rows x 5 rounds.

Scenario	CUBRID (pycubrid baseline)	MySQL (PyMySQL)	Ratio (CUBRID/MySQL)
insert_sequential	10.47s	1.74s	6.0x
select_by_pk	15.99s	3.52s	4.5x
select_full_scan	10.31s	1.86s	5.5x
update_indexed	10.70s	2.19s	4.9x
delete_sequential	10.75s	2.10s	5.1x

Note: SQLAlchemy adds extra overhead for SQL compilation, ORM identity mapping, and object creation.

Issue #104 — ORM dialect profile and optimization¶

Measured on the local benchmark environment used for Issue #104:

Scenario	Before	After	Delta
Tier 2 single-row CRUD (mean)	155.99 ms	268.41 ms	72.1% slower*
Tier 2 bulk insert (100 rows, mean)	370.33 ms	356.09 ms	3.9% faster
Tier 2 bulk insert (1000 rows, mean)	2864.56 ms	2511.52 ms	12.3% faster
Tier 2 query builder select-all (mean)	39.76 ms	31.83 ms	19.9% faster

Profiling with scripts/profile_orm.py showed that dialect-layer self time is a very small fraction of total request time on CRUD-heavy ORM paths, with most elapsed time still spent in SQLAlchemy ORM/Core coordination and the pycubrid driver. In the profiled workload captured for Issue #104:

dialect-layer self time was 0.0009% of total request time
SQLAlchemy Core self time was 6.55% of total request time
pycubrid self time was 7.01% of total request time
ORM select overhead vs raw SQL select was 18.08% of ORM select time

Top dialect-layer hotspots from the compilation-focused profiler pass were:

compiler.py:visit_on_duplicate_key_update
compiler.py:limit_clause
compiler.py:update_limit_clause
compiler.py:<dictcomp> inside visit_on_duplicate_key_update
compiler.py:replace inside visit_on_duplicate_key_update

Shipped optimization:

Enabled SQLAlchemy 2.x insertmanyvalues for the CUBRID dialect (use_insertmanyvalues=True and use_insertmanyvalues_wo_returning=True) so ORM bulk inserts can batch into multi-values INSERT forms without changing the public API.

* The single-row CRUD benchmark showed high variance in this local run, so the clear repeatable improvement from this change set is the bulk insert path rather than singleton unit-of-work latency.

Performance Characteristics¶

The dialect inherits pycubrid transport behavior and CAS protocol costs.
Core query compilation is fast but non-zero; repeated dynamic SQL can accumulate overhead.
ORM paths add identity map and model materialization costs compared to raw DBAPI usage.
Pool configuration strongly impacts latency under concurrency.
Bulk APIs and Core statements generally outperform row-by-row ORM unit-of-work patterns.

Optimization Tips¶

Configure pooling explicitly (example: pool_size, max_overflow, pool_pre_ping=True).
Use SQLAlchemy Core for high-volume bulk writes and large read pipelines.
Use executemany-friendly patterns for insert/update bursts.
Keep transactions explicit and avoid autocommit-style tiny transactions.
Limit ORM object hydration when only scalar/tuple output is needed.

flowchart TD
    Start[Performance tuning] --> Pool{Pool saturated?}
    Pool -->|Yes| TunePool[Increase pool_size / max_overflow]
    Pool -->|No| ORM{Heavy ORM hydration?}
    ORM -->|Yes| CorePath[Switch hot path to Core statements]
    ORM -->|No| Batch{Batch operations possible?}
    Batch -->|Yes| Bulk[Use bulk operations / executemany]
    Batch -->|No| IndexCheck[Validate SQL plans and indexes]

Running Benchmarks¶

Clone: git clone https://github.com/cubrid-lab/cubrid-benchmark.
Start the benchmark database containers per the benchmark documentation.
Run the Python benchmark suite to establish DBAPI baseline metrics.
Run SQLAlchemy-specific scenarios on the same host and dataset shape.
Compare driver baseline vs ORM/Core runs to isolate framework overhead.

Use the benchmark repository documentation for the exact command set and runner scripts.