Performance Guide¶
This guide covers observed performance for sqlalchemy-cubrid and practical optimization patterns.
Table of Contents¶

- Overview
- Benchmark Results
- Performance Characteristics
- Optimization Tips
- Running Benchmarks

Overview¶
sqlalchemy-cubrid adds SQLAlchemy Core/ORM behavior on top of the CUBRID Python driver.
```mermaid
flowchart LR
    App[Application / ORM Models] --> SA[SQLAlchemy Core / ORM]
    SA --> Dialect[sqlalchemy-cubrid Dialect]
    Dialect --> Driver[pycubrid DBAPI]
    Driver --> CAS[CAS Binary Protocol]
    CAS --> Server[(CUBRID Server)]
```
```mermaid
flowchart TD
    Query[ORM Query] --> Compile[SQL compilation]
    Compile --> Bind[Parameter binding]
    Bind --> Execute[DBAPI execute]
    Execute --> Fetch[Row fetch + object materialization]
    Fetch --> AppOut[Application objects]
```
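The compilation and binding stages above can be observed with SQLAlchemy Core alone, with no server connection; the `users` table below is illustrative and not part of sqlalchemy-cubrid:

```python
# Sketch of the compile/bind stages, using SQLAlchemy Core only
# (no database connection required). Table and column names are
# illustrative, not part of sqlalchemy-cubrid.
from sqlalchemy import Column, Integer, MetaData, String, Table, select

metadata = MetaData()
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

# Build a statement; nothing touches a server yet.
stmt = select(users.c.id, users.c.name).where(users.c.id == 42)

# Compile to SQL text plus bound parameters: the work the dialect
# performs before the DBAPI execute call.
compiled = stmt.compile()
print(str(compiled))     # SELECT text with a named :id_1 placeholder
print(compiled.params)   # {'id_1': 42}
```

Caching the compiled form (which SQLAlchemy does internally for repeated statements) is what keeps this per-query cost small.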
Benchmark Results¶
Source: cubrid-benchmark
Environment: Intel Core i5-9400F @ 2.90GHz, 6 cores, Linux x86_64, Docker containers.
Baseline driver workload: Python pycubrid vs PyMySQL, 10000 rows x 5 rounds.
| Scenario | CUBRID (pycubrid baseline) | MySQL (PyMySQL) | Ratio (CUBRID/MySQL) |
|---|---|---|---|
| insert_sequential | 10.47s | 1.74s | 6.0x |
| select_by_pk | 15.99s | 3.52s | 4.5x |
| select_full_scan | 10.31s | 1.86s | 5.5x |
| update_indexed | 10.70s | 2.19s | 4.9x |
| delete_sequential | 10.75s | 2.10s | 5.1x |
Note: SQLAlchemy adds extra overhead for SQL compilation, ORM identity mapping, and object creation.
Issue #104 — ORM dialect profile and optimization¶
Measured on the local benchmark environment used for Issue #104:
| Scenario | Before | After | Delta |
|---|---|---|---|
| Tier 2 single-row CRUD (mean) | 155.99 ms | 268.41 ms | 72.1% slower* |
| Tier 2 bulk insert (100 rows, mean) | 370.33 ms | 356.09 ms | 3.9% faster |
| Tier 2 bulk insert (1000 rows, mean) | 2864.56 ms | 2511.52 ms | 12.3% faster |
| Tier 2 query builder select-all (mean) | 39.76 ms | 31.83 ms | 19.9% faster |
Profiling with scripts/profile_orm.py showed that dialect-layer self time is a very small fraction of total
request time on CRUD-heavy ORM paths, with most elapsed time still spent in SQLAlchemy ORM/Core coordination
and the pycubrid driver. In the profiled workload captured for Issue #104:
- dialect-layer self time was 0.0009% of total request time
- SQLAlchemy Core self time was 6.55% of total request time
- pycubrid self time was 7.01% of total request time
- ORM select overhead vs raw SQL select was 18.08% of ORM select time
Top dialect-layer hotspots from the compilation-focused profiler pass were:
- `compiler.py:visit_on_duplicate_key_update`
- `compiler.py:limit_clause`
- `compiler.py:update_limit_clause`
- `compiler.py:<dictcomp>` inside `visit_on_duplicate_key_update`
- `compiler.py:replace` inside `visit_on_duplicate_key_update`
Shipped optimization:
- Enabled SQLAlchemy 2.x `insertmanyvalues` for the CUBRID dialect (`use_insertmanyvalues=True` and `use_insertmanyvalues_wo_returning=True`) so ORM bulk inserts can batch into multi-values INSERT forms without changing the public API.
* The single-row CRUD benchmark showed high variance in this local run, so the clear repeatable improvement from this change set is the bulk insert path rather than singleton unit-of-work latency.
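The call pattern that the batched bulk-insert path serves is a single `execute` with a list of parameter dictionaries. A minimal sketch, shown against SQLite for portability (the `items` table and row data are illustrative); on CUBRID the same pattern goes through the sqlalchemy-cubrid dialect:

```python
# Sketch of the batched bulk-insert call pattern that insertmanyvalues
# serves. SQLite stands in for a CUBRID connection here; the table and
# data are illustrative.
from sqlalchemy import (
    Column, Integer, MetaData, String, Table,
    create_engine, func, insert, select,
)

metadata = MetaData()
items = Table(
    "items",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

rows = [{"name": f"item-{i}"} for i in range(1000)]

with engine.begin() as conn:
    # One execute with a list of parameter dicts lets the dialect batch
    # into multi-values INSERT statements instead of 1000 round trips.
    conn.execute(insert(items), rows)

with engine.connect() as conn:
    count = conn.execute(select(func.count()).select_from(items)).scalar_one()
print(count)  # 1000
```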
Performance Characteristics¶
- The dialect inherits pycubrid transport behavior and CAS protocol costs.
- Core query compilation is fast but non-zero; repeated dynamic SQL can accumulate overhead.
- ORM paths add identity map and model materialization costs compared to raw DBAPI usage.
- Pool configuration strongly impacts latency under concurrency.
- Bulk APIs and Core statements generally outperform row-by-row ORM unit-of-work patterns.
Optimization Tips¶
- Configure pooling explicitly (for example `pool_size`, `max_overflow`, `pool_pre_ping=True`).
- Use SQLAlchemy Core for high-volume bulk writes and large read pipelines.
- Use `executemany`-friendly patterns for insert/update bursts.
- Keep transactions explicit and avoid autocommit-style tiny transactions.
- Limit ORM object hydration when only scalar/tuple output is needed.
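The pooling tip can be sketched as an engine configuration fragment; the `cubrid://` URL scheme, host, port, and database name here are assumptions for illustration, not verified connection details:

```python
# Sketch of explicit pool configuration for a sqlalchemy-cubrid engine.
# The URL (scheme, credentials, host, port, database) is illustrative.
from sqlalchemy import create_engine

engine = create_engine(
    "cubrid://user:password@localhost:33000/demodb",
    pool_size=10,        # steady-state connections kept open
    max_overflow=20,     # extra connections allowed during bursts
    pool_pre_ping=True,  # validate a connection before handing it out
    pool_recycle=3600,   # recycle connections older than one hour
)
```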
```mermaid
flowchart TD
    Start[Performance tuning] --> Pool{Pool saturated?}
    Pool -->|Yes| TunePool[Increase pool_size / max_overflow]
    Pool -->|No| ORM{Heavy ORM hydration?}
    ORM -->|Yes| CorePath[Switch hot path to Core statements]
    ORM -->|No| Batch{Batch operations possible?}
    Batch -->|Yes| Bulk[Use bulk operations / executemany]
    Batch -->|No| IndexCheck[Validate SQL plans and indexes]
```
Running Benchmarks¶
- Clone: `git clone https://github.com/cubrid-lab/cubrid-benchmark`.
- Start the benchmark database containers per the benchmark documentation.
- Run the Python benchmark suite to establish DBAPI baseline metrics.
- Run SQLAlchemy-specific scenarios on the same host and dataset shape.
- Compare driver baseline vs ORM/Core runs to isolate framework overhead.
Use the benchmark repository documentation for the exact command set and runner scripts.
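The "driver baseline vs batched" comparison the steps above describe can be sketched with the stdlib `sqlite3` module so it runs anywhere; the real benchmarks target pycubrid and CUBRID, and the schema and row count here are illustrative:

```python
# Minimal, self-contained sketch of a row-by-row vs executemany baseline
# comparison. Uses stdlib sqlite3 for portability; the real benchmark
# suite runs the same shape of workload against pycubrid/CUBRID.
import sqlite3
import time

ROWS = [("name-%d" % i,) for i in range(10_000)]

def run(batched):
    """Insert ROWS one way or the other; return (seconds, row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    start = time.perf_counter()
    if batched:
        conn.executemany("INSERT INTO t (name) VALUES (?)", ROWS)
    else:
        for row in ROWS:
            conn.execute("INSERT INTO t (name) VALUES (?)", row)
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count

row_time, n1 = run(batched=False)
batch_time, n2 = run(batched=True)
print(f"row-by-row: {row_time:.4f}s  executemany: {batch_time:.4f}s")
```

Running both variants on the same host and dataset shape, as the steps above recommend, is what isolates framework overhead from driver and server cost.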