Merge remote-tracking branch 'origin/mysql-optimize' into mysql-optimize
apontzen committed Nov 6, 2023
2 parents 28accd0 + 7857404 commit e13e59a
Showing 7 changed files with 302 additions and 146 deletions.
1 change: 1 addition & 0 deletions docs/advanced.md
@@ -4,6 +4,7 @@ Advanced topics
_Tangos_ is a highly flexible, customisable system. Tutorials are available covering the following
topics:

- Working with [different database systems](rdbms.md) (e.g. MySQL and PostgreSQL)
- Writing code to [calculate your own properties](custom_properties.md)
- [Tracking](tracking.md) groups of particles across timesteps
- [Parallelisation strategies](mpi.md)
39 changes: 0 additions & 39 deletions docs/index.md
@@ -77,45 +77,6 @@ MySQL / MariaDB below.
Remember, you will need to set these environment variables *every* time you start a new session on your computer prior
to booting up the database, either with the webserver or the python interface (see below).

Using PostgreSQL, MySQL or MariaDB
----------------------------------

As stated above, tangos is agnostic to the underlying SQL flavour. It is easiest to get started with
SQLite which doesn't need any special server. But version 1.5+ should also work well with [MySQL](https://www.mysql.com),
[MariaDB](https://mariadb.org) and version 1.7+ also with [PostgreSQL](https://www.postgresql.org).

To try this out, if you have [docker](https://docker.com), you can run a test
MySQL server very easily:

```bash
docker pull mysql
docker run -d --name=mysql-server -p3306:3306 -e MYSQL_ROOT_PASSWORD=my_secret_password mysql
echo "create database database_name;" | docker exec -i mysql-server mysql -pmy_secret_password
```

Or, just as easily, you can get going with PostgreSQL:
```bash
docker pull postgres
docker run --name tangos-postgres -e POSTGRES_USER=tangos -e POSTGRES_PASSWORD=my_secret_password -e POSTGRES_DB=database_name -p 5432:5432 -d postgres
```

To be sure that python can connect to MySQL or PostgreSQL, install the appropriate modules:
```bash
pip install PyMySQL # for MySQL
pip install psycopg2-binary # for PostgreSQL
```

Tangos can now connect to your test MySQL server using the connection:
```bash
export TANGOS_DB_CONNECTION=mysql+pymysql://root:my_secret_password@localhost:3306/database_name
```
or for PostgreSQL:
```bash
export TANGOS_DB_CONNECTION=postgresql+psycopg2://tangos:my_secret_password@localhost/database_name
```

You can now use all the tangos tools as normal, and they will populate the MySQL/PostgreSQL database
instead of a SQLite file.


Where next?
62 changes: 62 additions & 0 deletions docs/rdbms.md
@@ -0,0 +1,62 @@
Working with different database systems
=======================================

Tangos is built on SQLAlchemy, which means that it is in principle possible to use any database system that SQLAlchemy supports. However, different database systems have different features and limitations, and it is worth being aware of them.

The tangos tests are run with SQLite, MySQL and PostgreSQL. Other databases, while supported by SQLAlchemy, have not been directly tested. The following sections contain some notes on using these different systems.

SQLite
------

SQLite is the default database. It is simple in the sense that it keeps your entire database within a single file which can easily be transferred to different systems. Additionally, the SQLite driver is included with Python and so it's quick to get started.
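
As a minimal sketch of the single-file behaviour, the following uses only the `sqlite3` module from the Python standard library. The file name `example.db` and the two-column table are hypothetical and chosen for illustration; they are not the schema that tangos creates.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a scratch directory (illustration only)
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

# Creating the connection creates the file; no server process is involved
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE halos (id INTEGER PRIMARY KEY, halo_number INTEGER)")
conn.execute("INSERT INTO halos (halo_number) VALUES (?)", (1,))
conn.commit()
conn.close()

# The whole database now lives in this one file, which can be copied elsewhere
database_is_single_file = os.path.isfile(db_path)

# Re-opening the same file gives access to the stored rows
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT id, halo_number FROM halos").fetchall()
conn.close()

print(database_is_single_file, rows)
```

Copying the file to another machine and opening it with the same call is all that is needed to move such a database.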

There are two major, related drawbacks to SQLite. The first is that the

PostgreSQL and MySQL
--------------------

PostgreSQL and MySQL are both server-based systems, and as such take a little more effort to set up and maintain. If one exposes PostgreSQL to the outside world, there are potential security implications. One can of course run it on a firewalled computer and manage access appropriately, but this takes some expertise of its own (that will not be covered here). The major advantage is that you can host your data in a single location and allow multiple users to connect.



MySQL
-----

MySQL is a server-based system, and as such takes a little more effort to set up. The advantage is that you can host your data in a single location and allow multiple users to connect. Additionally, it is able to cope much better with complex parallel writes than SQLite.

For most users, MySQL and PostgreSQL are

To try this out, if you have [docker](https://docker.com), you can run a test
MySQL server very easily:

```bash
docker pull mysql
docker run -d --name=mysql-server -p3306:3306 -e MYSQL_ROOT_PASSWORD=my_secret_password mysql
echo "create database database_name;" | docker exec -i mysql-server mysql -pmy_secret_password
```

Or, just as easily, you can get going with PostgreSQL:
```bash
docker pull postgres
docker run --name tangos-postgres -e POSTGRES_USER=tangos -e POSTGRES_PASSWORD=my_secret_password -e POSTGRES_DB=database_name -p 5432:5432 -d postgres
```

To be sure that Python can connect to MySQL or PostgreSQL, install the appropriate modules:
```bash
pip install PyMySQL # for MySQL
pip install psycopg2-binary # for PostgreSQL
```

Tangos can now connect to your test MySQL server using the connection:
```bash
export TANGOS_DB_CONNECTION=mysql+pymysql://root:my_secret_password@localhost:3306/database_name
```
or for PostgreSQL:
```bash
export TANGOS_DB_CONNECTION=postgresql+psycopg2://tangos:my_secret_password@localhost/database_name
```

You can now use all the tangos tools as normal, and they will populate the MySQL/PostgreSQL database
instead of a SQLite file.
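
The two connection strings above follow the same `dialect+driver://user:password@host:port/database` layout. The sketch below splits them into their parts using only the standard library; `describe_connection` is a hypothetical helper written for this illustration, and SQLAlchemy does its own URL parsing internally.

```python
from urllib.parse import urlsplit


def describe_connection(url):
    """Split a database URL into its parts (illustrative helper, not part of tangos)."""
    parts = urlsplit(url)
    # The scheme holds the SQL dialect and, after a '+', the Python driver
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not give a port
        "database": parts.path.lstrip("/"),
    }


mysql_info = describe_connection(
    "mysql+pymysql://root:my_secret_password@localhost:3306/database_name"
)
postgres_info = describe_connection(
    "postgresql+psycopg2://tangos:my_secret_password@localhost/database_name"
)

print(mysql_info)
print(postgres_info)
```

The PostgreSQL example URL omits the port, so the `port` entry is `None` there and the driver's default applies.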


3 changes: 2 additions & 1 deletion setup.py
@@ -21,7 +21,8 @@
     'hupper',
     'scipy >= 0.14.0',
     'more_itertools >= 8.0.0',
-    'matplotlib >= 3.0.0' # for web interface
+    'matplotlib >= 3.0.0', # for web interface
+    'tqdm >= 4.59.0'
 ]

 tests_require = [
8 changes: 4 additions & 4 deletions tangos/core/halo.py
@@ -1,5 +1,5 @@
 import numpy as np
-from sqlalchemy import Column, ForeignKey, Integer, orm, types
+from sqlalchemy import Column, ForeignKey, Integer, orm, types, BigInteger
 from sqlalchemy.orm import Session, backref, relationship

 from . import Base, creator, extraction_patterns
@@ -30,9 +30,9 @@ class SimulationObjectBase(Base):
     __tablename__= "halos"

     id = Column(Integer, primary_key=True) #the unique ID value of the database object created for this halo
-    halo_number = Column(Integer) #by default this will be the halo's rank in terms of particle count
-    finder_id = Column(UnsignedInteger) #raw halo ID from the halo catalog
-    finder_offset = Column(Integer) #index of halo within halo catalog, primary identifier used when reading catalog/simulation data
+    halo_number = Column(BigInteger) #by default this will be the halo's rank in terms of particle count
+    finder_id = Column(BigInteger) #raw halo ID from the halo catalog
+    finder_offset = Column(BigInteger) #index of halo within halo catalog, primary identifier used when reading catalog/simulation data
     timestep_id = Column(Integer, ForeignKey('timesteps.id'))
     timestep = relationship(TimeStep, backref=backref(
         'objects', order_by=halo_number, cascade_backrefs=False, lazy='dynamic'), cascade='')
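
A note on the column type change in this hunk: SQLAlchemy's `Integer` maps to a 32-bit signed `INTEGER` column on MySQL and PostgreSQL, whereas `BigInteger` maps to a 64-bit `BIGINT`. The commit does not state its motivation, so reading the change as making room for large halo identifiers is an assumption. The sketch below shows the range difference with the standard `struct` module rather than with a database; `hypothetical_finder_id` is a made-up value.

```python
import struct

INT32_MAX = 2**31 - 1   # largest value of a 32-bit signed integer
INT64_MAX = 2**63 - 1   # largest value of a 64-bit signed integer


def fits(value, fmt):
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# Made-up identifier, larger than a 32-bit signed integer can hold
hypothetical_finder_id = 3_000_000_000

fits_in_32_bit = fits(hypothetical_finder_id, "<i")  # 4-byte signed
fits_in_64_bit = fits(hypothetical_finder_id, "<q")  # 8-byte signed

print(fits_in_32_bit, fits_in_64_bit)
```

SQLite stores integers in up to 8 bytes regardless of the declared type, so a limit of this kind would only show up on the server-based systems.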