Cep27 #259
Open: jbae11 wants to merge 6 commits into cyclus:source from jbae11:cep27
Commits:
069dfe8 cep27 rough draft (jbae11)
3230670 updates to CEP27 (scopatz)
eec6cbc Merge pull request #1 from scopatz/cep27 (jbae11)
1d67bca added explicit inventory tables for reference (jbae11)
b63c111 past tables (jbae11)
b06a941 spelling.. (jbae11)
@@ -0,0 +1,313 @@
CEP 27 - |Cyclus| Database Restructuring
********************************************

:CEP: 27
:Title: |Cyclus| Database Restructuring
:Last-Modified: 2017-09-11
:Author: Jin whan Bae & Anthony Scopatz
:Status: Draft
:Type: Standards Track
:Created: 2013-09-11

Abstract
============
This CEP proposes restructuring the |Cyclus| output database in order to
reduce the number of tables and the redundancy of data, and ultimately to reduce
the number of ``joins`` required for data analysis. Doing so would reduce the
computing time for end-user analysis and yield a clearer, more concise output database.

Motivation
==========
The current output database requires the user to join multiple tables to obtain
meaningful material data, such as quantity and composition. This leads to long
analysis times and is confusing for the user.

Rationale
=========
The proposed restructuring aims to reduce the number of tables the user has to query
for analysis. This can be done in two ways:

1. Combine redundant tables.
2. Reduce a table (the ``Compositions`` table) to a single map-typed column
   (``map<int,double>``).

Additionally, this CEP proposes to store both **Inventories** and **Transactions**
by default. Either table may be backed out of the other (with additional
information coming from **Materials**, etc.). However, this backing-out process has proven
extraordinarily expensive, increasing the number of operations needed to reconstruct the
table that is not present by millions to billions. Even for small databases, this has
proven prohibitive.

While storing both **Inventories** and **Transactions** may seem inefficient, consider
that:

* Data storage is cheap,
* Material inventories are what most analysis tasks require, and
* This is precisely double-entry bookkeeping, as applied to the nuclear fuel cycle.

Double-entry bookkeeping was a major innovation in accounting systems. When implemented
correctly and without fraud, it leads to a self-consistent system. This enables errors
to be discovered and corrected earlier. This CEP argues that |Cyclus| should provide
the information needed to verify the mass balances, if requested.

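As a rough illustration of the double-entry idea, the sketch below replays a list of
transactions and compares the result against recorded inventories. It is only a sketch
under stated assumptions: the records are hand-made, the function name ``find_imbalances``
is hypothetical, and effects such as decay or material creation and destruction inside a
facility are ignored.

```python
from collections import defaultdict

# Hypothetical, hand-made records (not output of a real simulation).
# Transaction: (time, sender_id, receiver_id, quantity)
transactions = [(0, 1, 2, 10.0), (1, 1, 2, 5.0), (2, 2, 3, 4.0)]
# Assumed starting holdings per agent.
initial = {1: 20.0, 2: 0.0, 3: 0.0}
# Inventory: (time, agent_id) -> quantity held at the end of that time step.
inventories = {
    (0, 1): 10.0, (0, 2): 10.0, (0, 3): 0.0,
    (1, 1): 5.0, (1, 2): 15.0, (1, 3): 0.0,
    (2, 1): 5.0, (2, 2): 11.0, (2, 3): 4.0,
}


def find_imbalances(transactions, inventories, initial, tol=1e-9):
    """Replay transactions and list (time, agent, expected, recorded) mismatches."""
    moves = defaultdict(list)
    for time, sender, receiver, qty in transactions:
        moves[time].append((sender, receiver, qty))
    expected = defaultdict(float, initial)
    problems = []
    times = sorted({t for t, _ in inventories} | set(moves))
    for time in times:
        # Apply every transfer of this time step to the running balance.
        for sender, receiver, qty in moves[time]:
            expected[sender] -= qty
            expected[receiver] += qty
        # Compare the running balance with what the inventory table recorded.
        for agent in sorted(expected):
            recorded = inventories.get((time, agent))
            if recorded is None or abs(recorded - expected[agent]) > tol:
                problems.append((time, agent, expected[agent], recorded))
    return problems


print(find_imbalances(transactions, inventories, initial))
```

An empty result means the two records agree; any entry points at the time step and agent
where they diverge.
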
Specification & Implementation
===============================
The following tables in the current output database are candidates for modification:

1. Compositions
2. Transactions
3. Recipes
4. ExplicitInventory
5. ExplicitInventoryCompact
6. Info
7. InfoExplicitInv
8. ResCreators
9. Resources

Material and Product
--------------------

Currently, both **Material** and **Product** resources are stored in the
**Resources** table. The internal state of a **Material** is stored in the
**Compositions** table, and the internal state of a **Product** is stored in
the **Products** table. This requires the user to perform joins to obtain the
internal state of the resources.

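To make the cost concrete, here is a minimal sketch of the join needed today to get
per-nuclide masses for a material. It uses Python's built-in ``sqlite3`` module with an
in-memory database; the column names follow the current tables shown in this CEP, while
the SQL column types and the rows themselves are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resources (SimId TEXT, ResourceId INTEGER, ObjId INTEGER, Type TEXT,
    TimeCreated INTEGER, Quantity REAL, Units TEXT, QualId INTEGER,
    Parent1 INTEGER, Parent2 INTEGER);
CREATE TABLE Compositions (SimId TEXT, QualId INTEGER, NucId INTEGER, MassFrac REAL);
""")

# Made-up rows: one 100 kg material whose composition has QualId 7.
conn.execute(
    "INSERT INTO Resources VALUES ('sim', 1, 1, 'Material', 0, 100.0, 'kg', 7, 0, 0)"
)
conn.executemany(
    "INSERT INTO Compositions VALUES ('sim', 7, ?, ?)",
    [(922350000, 0.04), (922380000, 0.96)],
)

# The composition lives in a second table, so a join on (SimId, QualId) is required.
rows = conn.execute("""
    SELECT r.ResourceId, c.NucId, r.Quantity * c.MassFrac AS Mass
    FROM Resources AS r
    JOIN Compositions AS c ON r.SimId = c.SimId AND r.QualId = c.QualId
    WHERE r.Type = 'Material'
    ORDER BY c.NucId
""").fetchall()
print(rows)
```
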
We can avoid unnecessary joins by creating separate **Materials** and
**Products** tables, each with the internal state (composition or quality)
as a column.

In short, we propose to replace the **Compositions**, **Products**, and
**Resources** tables with **Materials** and **Products** tables. In the
process, the **QualId** column would be removed.

Currently:

================ ================
Resources
---------------------------------
Column           Type
================ ================
SimId            uuid
ResourceId       int
ObjId            int
Type             string
TimeCreated      int
Quantity         double
Units            string
QualId           int
Parent1          int
Parent2          int
================ ================

================ ================
Products
---------------------------------
Column           Type
================ ================
SimId            uuid
QualId           int
Quality          string
================ ================

================ ================
Compositions
---------------------------------
Column           Type
================ ================
SimId            uuid
QualId           int
NucId            int
MassFrac         double
================ ================

These would be restructured to:

================ ================
Materials
---------------------------------
Column           Type
================ ================
SimId            uuid
ResourceId       int
ObjId            int
TimeCreated      int
Parent1          int
Parent2          int
Units            string
Quantity         double
Composition      map<int,double>
================ ================

where the **Composition** column maps ``NucId`` to ``MassFrac``.

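The sketch below shows how a per-nuclide mass could be read from such a table without a
join. How a ``map<int,double>`` column is physically stored depends on the backend; JSON
text in SQLite is used here purely as a stand-in, and the helper names ``encode``,
``decode``, and ``nuclide_mass`` are hypothetical. Nuclide ids are written in the
ZZZAAASSSS form (Pu-239 is ``942390000``).

```python
import json
import sqlite3

PU239 = 942390000  # nuclide id in ZZZAAASSSS form

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Materials (SimId TEXT, ResourceId INTEGER, ObjId INTEGER,
        TimeCreated INTEGER, Parent1 INTEGER, Parent2 INTEGER, Units TEXT,
        Quantity REAL, Composition TEXT)
""")


def encode(comp):
    # Assumption: the map is stored as JSON text; JSON object keys must be strings.
    return json.dumps({str(nuc): frac for nuc, frac in comp.items()})


def decode(text):
    return {int(nuc): frac for nuc, frac in json.loads(text).items()}


# Made-up rows: one material containing Pu-239, one without it.
conn.executemany(
    "INSERT INTO Materials VALUES ('sim', ?, ?, ?, 0, 0, 'kg', ?, ?)",
    [
        (1, 1, 3, 50.0, encode({PU239: 0.25, 942400000: 0.75})),
        (2, 2, 4, 10.0, encode({922380000: 1.0})),
    ],
)


def nuclide_mass(conn, nuc_id):
    """Yield (ResourceId, TimeCreated, mass of nuc_id) for every material row."""
    query = ("SELECT ResourceId, TimeCreated, Quantity, Composition "
             "FROM Materials ORDER BY ResourceId")
    for res_id, time, qty, comp in conn.execute(query):
        yield res_id, time, qty * decode(comp).get(nuc_id, 0.0)


print(list(nuclide_mass(conn, PU239)))
```

Note that the nuclide filter now runs in the client rather than in the query, since the
map is opaque to plain SQL in this stand-in encoding.
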
================ ================
Products
---------------------------------
Column           Type
================ ================
SimId            uuid
ResourceId       int
ObjId            int
TimeCreated      int
Parent1          int
Parent2          int
Units            string
Quantity         double
Quality          string
================ ================

Since **QualId** is removed, the **Recipes** table
also needs to be edited:

================ ================
Recipes
---------------------------------
Column           Type
================ ================
SimId            uuid
Recipes          string
Composition      map<int,double>
================ ================

Transactions
------------
The **Transactions** table would be modified to have an integer flag indicating whether
the traded resource is a material or a product. This flag lets anyone inspecting
the **Transactions** table know which resource table (either **Materials** or
**Products**) to consult to find the actual concrete resource.

**Current:**

================ ================
Transactions
---------------------------------
Column           Type
================ ================
SimId            uuid
TransactionId    int
SenderId         int
ReceiverId       int
ResourceId       int
Commodity        string
Time             int
================ ================

**Proposed:**

================ ================
Transactions
---------------------------------
Column           Type
================ ================
SimId            uuid
TransactionId    int
SenderId         int
ReceiverId       int
**ResourceType** **int**
ResourceId       int
Commodity        string
Time             int
================ ================

Writing this table to the database will now be optional. The default will be to
write it (true).

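A minimal sketch of how the flag could be used to look up a traded resource follows.
The CEP does not assign values to **ResourceType**, so the mapping ``0`` for
**Materials** and ``1`` for **Products** is an assumption, as are the helper name
``traded_resource``, the made-up rows, and the trimmed-down resource tables (only the
columns needed here are created).

```python
import sqlite3

# Assumed flag values; the CEP does not specify them.
RESOURCE_TABLES = {0: "Materials", 1: "Products"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (SimId TEXT, TransactionId INTEGER, SenderId INTEGER,
    ReceiverId INTEGER, ResourceType INTEGER, ResourceId INTEGER,
    Commodity TEXT, Time INTEGER);
CREATE TABLE Materials (SimId TEXT, ResourceId INTEGER, Quantity REAL, Units TEXT);
CREATE TABLE Products (SimId TEXT, ResourceId INTEGER, Quantity REAL, Units TEXT);
INSERT INTO Transactions VALUES ('sim', 0, 10, 11, 0, 100, 'fuel', 1);
INSERT INTO Transactions VALUES ('sim', 1, 11, 12, 1, 200, 'power', 2);
INSERT INTO Materials VALUES ('sim', 100, 33.0, 'kg');
INSERT INTO Products VALUES ('sim', 200, 1000.0, 'MWe');
""")


def traded_resource(conn, transaction_id):
    """Return (table name, quantity, units) for the resource of one transaction."""
    res_type, res_id = conn.execute(
        "SELECT ResourceType, ResourceId FROM Transactions WHERE TransactionId = ?",
        (transaction_id,),
    ).fetchone()
    # The table name comes from a fixed dictionary, never from user input.
    table = RESOURCE_TABLES[res_type]
    qty, units = conn.execute(
        f"SELECT Quantity, Units FROM {table} WHERE ResourceId = ?", (res_id,)
    ).fetchone()
    return table, qty, units


print(traded_resource(conn, 0))
print(traded_resource(conn, 1))
```
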
ResCreators
-----------
Like **Transactions**, the **ResCreators**
table would need another column, **ResourceType**:

================ ================
ResCreators
---------------------------------
Column           Type
================ ================
SimId            uuid
ResourceId       int
AgentId          int
ResourceType     int
================ ================

Merge ExplicitInventory & ExplicitInventoryCompact
----------------------------------------------------
The **ExplicitInventory** and **ExplicitInventoryCompact** tables
should be merged into a single table, called **Inventories**.
The current **ExplicitInventory** and **ExplicitInventoryCompact**
tables have the following structures:

================ ================
ExplicitInventory
---------------------------------
Column           Type
================ ================
SimId            uuid
AgentId          int
Time             int
InventoryName    string
NucId            int
Quantity         double
================ ================

================ ================
ExplicitInventoryCompact
---------------------------------
Column           Type
================ ================
SimId            uuid
AgentId          int
Time             int
InventoryName    string
Quantity         double
Composition      map<int,double>
================ ================

The merged **Inventories** table would have the following structure:

================ ================
Inventories
---------------------------------
Column           Type
================ ================
SimId            uuid
AgentId          int
Time             int
InventoryName    string
Quantity         double
Composition      int
================ ================

Writing this table to the database will be optional. The default will be to
write it (true).

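To show how the two existing layouts relate, the sketch below folds per-nuclide rows in
the **ExplicitInventory** layout into one row per inventory carrying a composition map,
as in **ExplicitInventoryCompact**. It assumes that ``Quantity`` in the per-nuclide
layout is the mass of that nuclide and that the compact ``Composition`` holds mass
fractions; the function name ``to_compact`` and the rows are hypothetical.

```python
from collections import defaultdict

# Made-up rows: (SimId, AgentId, Time, InventoryName, NucId, Quantity)
explicit_rows = [
    ("sim", 5, 0, "core", 922350000, 4.0),
    ("sim", 5, 0, "core", 922380000, 96.0),
    ("sim", 5, 1, "core", 922380000, 50.0),
]


def to_compact(rows):
    """Group per-nuclide rows into (SimId, AgentId, Time, Name, total, fractions)."""
    masses = defaultdict(dict)
    for sim, agent, time, name, nuc, qty in rows:
        key = (sim, agent, time, name)
        masses[key][nuc] = masses[key].get(nuc, 0.0) + qty
    compact = []
    for key in sorted(masses):
        total = sum(masses[key].values())
        # Normalise to mass fractions; an empty inventory gets an empty map.
        comp = {nuc: q / total for nuc, q in masses[key].items()} if total > 0 else {}
        compact.append(key + (total, comp))
    return compact


for row in to_compact(explicit_rows):
    print(row)
```
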
Merge Info & InfoExplicitInv
----------------------------
We saw little reason to keep the two tables separate. Combining them is a matter of cleanliness.
Additionally, the single **Info** table will need an extra column, **RecordTransactions**.
Furthermore, the **RecordInventory** column is no longer needed and will be removed.

Other informational tables may also be merged into the single table.

Backwards Compatibility
=======================
This CEP is not backwards compatible.

Document History
================
This document is released under the CC-BY 3.0 license.

References and Footnotes
========================

.. rubric:: References

The previous layout anticipated the desire to select on nuclide in the query, and hence a different column for each NucId. Perhaps this need has not emerged in the wild, but it seems that a consequence of this change is that such a selection would no longer be possible.
Hello, that is a valid point.
Maybe it's necessary for us to define what
For example, if one wants the time series of Pu239 mass,
the query would be like the following:
in the newer database structure, it would be:
followed by a script that processes the result:
So I assume that it would take longer to accomplish
what you mentioned (and it would also need additional scripting outside of the SQLite query)...
You probably know much more than I do, but @scopatz's and my initial thought was that
this would have more benefit than loss. What do you think?
I believe this will not be optimized for a large calculation (1000-10000 facilities). If you want to see the plutonium inventory in the fleet, you will need to load all the compositions, extract the information you need, and then regenerate a table.
I would prefer a system that allows us to filter by facility name and NucId, but I am not sure it is possible without having a gigantic table :(
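To make the trade-off discussed in this thread concrete, here is a minimal sketch of a
fleet-wide Pu-239 time series computed from both layouts. It is not the code originally
posted in the comments above: the rows are made up, and storing the composition map as
JSON text in SQLite is an assumption used only for illustration.

```python
import json
import sqlite3
from collections import defaultdict

PU239 = 942390000  # nuclide id in ZZZAAASSSS form

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ExplicitInventory (SimId TEXT, AgentId INTEGER, Time INTEGER,
    InventoryName TEXT, NucId INTEGER, Quantity REAL);
CREATE TABLE Inventories (SimId TEXT, AgentId INTEGER, Time INTEGER,
    InventoryName TEXT, Quantity REAL, Composition TEXT);
""")
# The same made-up inventories, written once per layout.
conn.executemany(
    "INSERT INTO ExplicitInventory VALUES ('sim', ?, ?, 'spent', ?, ?)",
    [(1, 0, PU239, 2.0), (1, 0, 922380000, 6.0), (2, 0, PU239, 1.0), (1, 1, PU239, 3.0)],
)
conn.executemany(
    "INSERT INTO Inventories VALUES ('sim', ?, ?, 'spent', ?, ?)",
    [
        (1, 0, 8.0, json.dumps({str(PU239): 0.25, "922380000": 0.75})),
        (2, 0, 1.0, json.dumps({str(PU239): 1.0})),
        (1, 1, 3.0, json.dumps({str(PU239): 1.0})),
    ],
)

# Per-nuclide layout: the nuclide filter and the sum both happen inside SQL.
long_series = conn.execute(
    "SELECT Time, SUM(Quantity) FROM ExplicitInventory "
    "WHERE NucId = ? GROUP BY Time ORDER BY Time",
    (PU239,),
).fetchall()

# Map layout: every row is fetched and decoded, then filtered in Python.
totals = defaultdict(float)
for time, qty, comp in conn.execute("SELECT Time, Quantity, Composition FROM Inventories"):
    totals[time] += qty * json.loads(comp).get(str(PU239), 0.0)
map_series = sorted(totals.items())

print(long_series)
print(map_series)
```

Both approaches give the same series here; the difference is where the filtering work
is done, which is the scaling concern raised for simulations with thousands of facilities.
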