-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement disconnect method for backend connections #5940
Comments
Nothing like this currently exists, but would be something we could add. Right now the connection should be cleaned up when the backend object is garbage collected, but ensuring no references exist in a session may not always be easy. |
@ianmcook Is there a specific use case you had in mind for this method? |
The circumstance that prompted me to create this ticket was: I was using Ibis interactively with Trino and realized I had connected to Trino with the wrong catalog. The easiest way to fix this was to disconnect and then reconnect to the correct catalog. I didn't know whether the old connection would be properly cleaned up if I simply deleted or overwrote the connection object, and having some OCD tendencies, I looked for a disconnect method and did not find one :) |
Ah ok. Thanks for the clarification. Some of our backends do have a As @jcrist said we are depending on GC behavior for closing things right now. Ideally we could avoid doing this work until we have bug or use case that requires a |
AFAIK, DuckDB only accepts one open write connection at a time. So if you're working interactively with a .duckdb that you write to and then want to run the whole script in another process, you will get an error due to the open connection in the interactive session. Just happened to me while drafting #5962 as a Quarto gfm document. |
This is also a problem in IPython / Jupyter notebooks because of their atypical garbage collection. You have to connect, write your code, and close and delete the connection variable in a single cell, otherwise the connection will not be garbage collected. For now I wrote this as a shortcut/workaround to avoid typing as much when working in Jupyter notebooks: import gc
from collections.abc import Generator
from contextlib import contextmanager
import ibis
@contextmanager
def ibis_connect(resource: str, **kwargs) -> Generator:
connection = ibis.connect(resource, **kwargs)
yield connection
connection.con.dispose()
gc.collect() Can be used like this: resource = 'duckdb://test.duckdb'
with ibis_connect(resource) as con:
con.create_table('my_table', my_memtable.to_pyarrow(), overwrite=True)
print(con.tables) |
This is a genuine pain point. According the DuckDB FAQ page, there are ways for multiple processes to write to DuckDB through application logic but it's not supported natively by the database itself. My current workaround is to do something similar to @jeffchiou above - I run |
## Description of changes This adds a `disconnect` method to all backends. Previously we didn't do this since the actual connection was often wrapped in SQLAlchemy and our ability to terminate the connection was unclear. The DB-API spec states that there should be a `close` method, and that any subsequent operations on a given `Connection` should raise an error after `close` is called. This _mostly_ works. Trino, Clickhouse, Impala, and BigQuery do not conform to the DB-API in this way. They have the `close` method but don't raise when you make a subsequent call. Spark is not a DB-API, but the `spark.sql.session.stop` method does the right thing. For the in-memory backends and Flink, this is a no-op as there is either nothing to disconnect or no exposed method to disconnect. ## Issues closed Resolves #5940
## Description of changes This adds a `disconnect` method to all backends. Previously we didn't do this since the actual connection was often wrapped in SQLAlchemy and our ability to terminate the connection was unclear. The DB-API spec states that there should be a `close` method, and that any subsequent operations on a given `Connection` should raise an error after `close` is called. This _mostly_ works. Trino, Clickhouse, Impala, and BigQuery do not conform to the DB-API in this way. They have the `close` method but don't raise when you make a subsequent call. Spark is not a DB-API, but the `spark.sql.session.stop` method does the right thing. For the in-memory backends and Flink, this is a no-op as there is either nothing to disconnect or no exposed method to disconnect. ## Issues closed Resolves ibis-project#5940
Is your feature request related to a problem?
Is there a method that an Ibis user can call to disconnect from a backend (without quitting/restarting the Python session)?
Describe the solution you'd like
Implement a disconnect method, or better document it if it already exists.
What version of ibis are you running?
5.0.0
What backend(s) are you using, if any?
Trino
Code of Conduct
The text was updated successfully, but these errors were encountered: