Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement disconnect method for backend connections #5940

Closed
1 task done
ianmcook opened this issue Apr 7, 2023 · 7 comments · Fixed by #8341
Closed
1 task done

feat: implement disconnect method for backend connections #5940

ianmcook opened this issue Apr 7, 2023 · 7 comments · Fixed by #8341
Assignees
Labels
backends Issues related to all backends feature Features or general enhancements

Comments

@ianmcook
Copy link
Contributor

ianmcook commented Apr 7, 2023

Is your feature request related to a problem?

Is there a method that an Ibis user can call to disconnect from a backend (without quitting/restarting the Python session)?

Describe the solution you'd like

Implement a disconnect method, or better document it if it already exists.

What version of ibis are you running?

5.0.0

What backend(s) are you using, if any?

Trino

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ianmcook ianmcook added the feature Features or general enhancements label Apr 7, 2023
@jcrist
Copy link
Member

jcrist commented Apr 7, 2023

Nothing like this currently exists, but would be something we could add. Right now the connection should be cleaned up when the backend object is garbage collected, but ensuring no references exist in a session may not always be easy.

@cpcloud
Copy link
Member

cpcloud commented Apr 7, 2023

@ianmcook Is there a specific use case you had in mind for this method?

@ianmcook
Copy link
Contributor Author

The circumstance that prompted me to create this ticket was: I was using Ibis interactively with Trino and realized I had connected to Trino with the wrong catalog. The easiest way to fix this was to disconnect and then reconnect to the correct catalog. I didn't know whether the old connection would be properly cleaned up if I simply deleted or overwrote the connection object, and having some OCD tendencies, I looked for a disconnect method and did not find one :)

@cpcloud
Copy link
Member

cpcloud commented Apr 10, 2023

Ah ok. Thanks for the clarification. Some of our backends do have a close method. We could in theory add that to all backends.

As @jcrist said we are depending on GC behavior for closing things right now.

Ideally we could avoid doing this work until we have bug or use case that requires a close method or someone that very vocally wants all the backends to match WRT this method 😄.

@cpcloud cpcloud added the backends Issues related to all backends label Apr 10, 2023
@dpprdan
Copy link

dpprdan commented Apr 11, 2023

AFAIK, DuckDB only accepts one open write connection at a time. So if you're working interactively with a .duckdb that you write to and then want to run the whole script in another process, you will get an error due to the open connection in the interactive session.

Just happened to me while drafting #5962 as a Quarto gfm document.

@jeffchiou
Copy link

This is also a problem in IPython / Jupyter notebooks because of their atypical garbage collection. You have to connect, write your code, and close and delete the connection variable in a single cell, otherwise the connection will not be garbage collected. For now I wrote this as a shortcut/workaround to avoid typing as much when working in Jupyter notebooks:

import gc
from collections.abc import Generator
from contextlib import contextmanager

import ibis

@contextmanager
def ibis_connect(resource: str, **kwargs) -> Generator:
    connection = ibis.connect(resource, **kwargs)
    yield connection
    connection.con.dispose()
    gc.collect()

Can be used like this:

resource = 'duckdb://test.duckdb'
with ibis_connect(resource) as con:
    con.create_table('my_table', my_memtable.to_pyarrow(), overwrite=True)
    print(con.tables)

@chris-park
Copy link
Contributor

chris-park commented Dec 13, 2023

AFAIK, DuckDB only accepts one open write connection at a time. So if you're working interactively with a .duckdb that you write to and then want to run the whole script in another process, you will get an error due to the open connection in the interactive session.

Just happened to me while drafting #5962 as a Quarto gfm document.

This is a genuine pain point. According the DuckDB FAQ page, there are ways for multiple processes to write to DuckDB through application logic but it's not supported natively by the database itself. My current workaround is to do something similar to @jeffchiou above - I run conn.con.pool.dispose() (assuming the DuckDB connection is called conn) but it would be nice to have a conn.disconnect() method.

@lostmygithubaccount lostmygithubaccount moved this from todo to backlog in Ibis planning and roadmap Jan 26, 2024
@gforsyth gforsyth self-assigned this Feb 13, 2024
@gforsyth gforsyth moved this from backlog to cooking in Ibis planning and roadmap Feb 13, 2024
@gforsyth gforsyth moved this from cooking to review in Ibis planning and roadmap Feb 13, 2024
gforsyth added a commit that referenced this issue Feb 14, 2024
## Description of changes

This adds a `disconnect` method to all backends. Previously we didn't do
this since the actual connection was often wrapped in SQLAlchemy and our
ability to terminate the connection was unclear.

The DB-API spec states that there should be a `close` method, and that
any subsequent operations on a given `Connection` should raise an error
after `close` is called.

This _mostly_ works.

Trino, Clickhouse, Impala, and BigQuery do not conform to the DB-API in
this way.  They have the `close` method but don't raise when you make a
subsequent call.

Spark is not a DB-API, but the `spark.sql.session.stop` method does the
right thing.

For the in-memory backends and Flink, this is a no-op as there is either 
nothing to disconnect or no exposed method to disconnect.



## Issues closed

Resolves #5940
@github-project-automation github-project-automation bot moved this from review to done in Ibis planning and roadmap Feb 14, 2024
ncclementi pushed a commit to ncclementi/ibis that referenced this issue Feb 21, 2024
## Description of changes

This adds a `disconnect` method to all backends. Previously we didn't do
this since the actual connection was often wrapped in SQLAlchemy and our
ability to terminate the connection was unclear.

The DB-API spec states that there should be a `close` method, and that
any subsequent operations on a given `Connection` should raise an error
after `close` is called.

This _mostly_ works.

Trino, Clickhouse, Impala, and BigQuery do not conform to the DB-API in
this way.  They have the `close` method but don't raise when you make a
subsequent call.

Spark is not a DB-API, but the `spark.sql.session.stop` method does the
right thing.

For the in-memory backends and Flink, this is a no-op as there is either 
nothing to disconnect or no exposed method to disconnect.



## Issues closed

Resolves ibis-project#5940
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backends Issues related to all backends feature Features or general enhancements
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

7 participants