Update database info including integrating #218 #239

Merged
merged 2 commits into from
Nov 10, 2023
49 changes: 46 additions & 3 deletions docs/rdbms.md
@@ -10,7 +10,14 @@ SQLite

SQLite is the default database. It is simple in the sense that it keeps your entire database within a single file which can easily be transferred to different systems. Additionally, the SQLite driver is included with Python and so it's quick to get started.

There are two major, related drawbacks to SQLite. The first is that accessing the database from
another machine means copying the file over, and there is no automated way to keep copies
synchronised between hosts. (Probably the best thing to do is to write to the database on only one
cluster, and then `rsync` it to the relevant analysis machines.) The second is that SQLite is not
really designed for parallel writes, so when tangos is writing to the database it must manually
synchronise writes between different workers. Tangos does a pretty good job of this, but some
network file systems can be slow to release the file locks that SQLite uses extensively. If you run
into 'database is locked' errors, you have reached the limit of how many tangos processes can
safely write to SQLite simultaneously.
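
The 'database is locked' error can be reproduced without tangos. The following is a minimal sketch using only Python's built-in `sqlite3` module. The file name and the zero timeout are illustrative assumptions, and it does not reflect how tangos itself coordinates its writers. One connection holds the write lock while a second tries to start its own write transaction:

```python
import os
import sqlite3
import tempfile

# Illustrative only: a throwaway database file, not a real tangos database.
db_path = os.path.join(tempfile.mkdtemp(), "lock_demo.db")

# isolation_level=None puts the connections in autocommit mode, so the
# explicit BEGIN/COMMIT statements below control the transactions.
first_writer = sqlite3.connect(db_path, isolation_level=None)
first_writer.execute("CREATE TABLE demo (value INTEGER)")

# timeout=0 makes the second connection give up immediately instead of
# waiting for the lock to be released.
second_writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)

first_writer.execute("BEGIN IMMEDIATE")  # take the write lock
first_writer.execute("INSERT INTO demo VALUES (1)")

try:
    second_writer.execute("BEGIN IMMEDIATE")  # a second write lock is refused
    lock_error = None
except sqlite3.OperationalError as error:
    lock_error = str(error)

first_writer.execute("COMMIT")  # release the lock

# Once the first writer has committed, the second one can proceed.
second_writer.execute("BEGIN IMMEDIATE")
second_writer.execute("INSERT INTO demo VALUES (2)")
second_writer.execute("COMMIT")

row_count = second_writer.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(lock_error, row_count)

first_writer.close()
second_writer.close()
```

A larger `timeout` makes a connection wait for the lock rather than fail straight away, which helps only when the file system releases locks promptly.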

PostgreSQL and MySQL
--------------------
@@ -56,5 +63,41 @@ or for PostgreSQL:
export TANGOS_DB_CONNECTION=postgresql+psycopg2://tangos:my_secret_password@localhost/database_name
```
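
If the password contains characters such as `@` or `/`, pasting it into the URL unchanged will not parse as intended. URLs of this `dialect+driver://` form follow SQLAlchemy's convention, which expects such characters to be percent-encoded. The helper below, `build_connection_url`, is a hypothetical convenience function (not part of tangos) that sketches this with the standard library; the second password is made up for illustration:

```python
from urllib.parse import quote_plus


def build_connection_url(driver, user, password, host, database, port=None):
    """Assemble a dialect+driver URL, percent-encoding the user and password."""
    location = host if port is None else f"{host}:{port}"
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{location}/{database}"


# Values taken from the PostgreSQL example above.
postgres_url = build_connection_url(
    "postgresql+psycopg2", "tangos", "my_secret_password", "localhost", "database_name"
)
print(postgres_url)

# A made-up password containing reserved characters.
mysql_url = build_connection_url(
    "mysql+pymysql", "tangos", "p@ss/word", "localhost", "database_name", port=3306
)
print(mysql_url)
```

The printed string can then be exported as `TANGOS_DB_CONNECTION` in the usual way.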

You can now create new users that can access your MySQL server with their own username and password.

```bash
echo "create user 'my_new_user'@'%' identified by 'new_password';" | docker exec -i mysql-server mysql -pmy_secret_password
```

Note that in MySQL the `%` acts as a wildcard, so this command creates a new user who can
log in from any host.

The new user would then connect to the database:

```bash
export TANGOS_DB_CONNECTION=mysql+pymysql://my_new_user:new_password@localhost:3306/database_name
```

The database can be accessed remotely, if any applicable firewalls allow it, by replacing `localhost`
with the actual host name, such as `fancy_computer.astro.fancy_school.edu`. Note, however, that
running a database server open to the world has security implications and may be disallowed by
relevant institutions. The simplest approach, rather than opening up firewalls, is to tunnel in.
For example, the server can be accessed as though it's running on `localhost` if the user
first opens an SSH tunnel to `fancy_computer.astro.fancy_school.edu`:

```bash
ssh -N -f -L localhost:3306:localhost:3306 my_username@fancy_computer.astro.fancy_school.edu
```
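
Before pointing tangos at the tunnel, it can be useful to confirm that the forwarded port is accepting connections. The sketch below uses only Python's `socket` module; `port_is_open` is a hypothetical helper name, and the host and port simply mirror the tunnel command above:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if port_is_open("localhost", 3306):
    print("Port 3306 is accepting connections")
else:
    print("Nothing is listening on port 3306; is the tunnel running?")
```

This only shows that something is listening on the port. It does not verify that the listener is the intended database server or that the credentials are valid.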

Note that new users will by default only be able to view a database. Granting
additional permissions should be done on a case-by-case basis. Only the root user can
do this by default. To give a user complete permission to edit an existing database:

```bash
echo "grant all on database_name.* to 'my_new_user'@'%';" | docker exec -i mysql-server mysql -pmy_secret_password
echo "flush privileges;" | docker exec -i mysql-server mysql -pmy_secret_password
```

You (and whatever users you choose) can now use all the tangos tools as normal, and they will
populate the MySQL/PostgreSQL database instead of a SQLite file.