6. Creating DataFrame
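There are four ways of creating a data frame: from an array of rows or columns, from a matrix, from a file, or by loading one of the built-in datasets.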
The easiest and most straightforward way of creating a DataFrame is by passing all data in an array of arrays to the fromRows: or fromColumns: message. Here is an example of initializing a DataFrame with rows:
df := DataFrame fromRows: #(
('Barcelona' 1.609 true)
('Dubai' 2.789 true)
('London' 8.788 false)).
The same data frame can be created from an array of columns:
df := DataFrame fromColumns: #(
('Barcelona' 'Dubai' 'London')
(1.609 2.789 8.788)
(true true false)).
Since the names of rows and columns are not provided, they are initialized with their default values: (1 to: self numberOfRows) and (1 to: self numberOfColumns). Both rowNames and columnNames can always be changed by passing an array of new names to the corresponding accessor. The array of row names must have as many elements as there are rows, and the array of column names as many elements as there are columns.
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).
You can convert this data frame to a pretty-printed table that can be copied and pasted into letters, blog posts, and tutorials (such as this one) by sending it the asStringTable message (df asStringTable):
   |  City       Population  BeenThere
---+----------------------------------
A  |  Barcelona       1.609       true
B  |  Dubai           2.789       true
C  |  London          8.788      false
By its nature, DataFrame is similar to a matrix. It works like a table of values, supports matrix accessors such as at:at: or at:at:put:, and in some cases can be treated like a matrix. Some classes provide tabular data in matrix format. One example is the TabularWorksheet class of the Tabular package, which is used for reading XLSX files. To initialize a DataFrame from a matrix of values, use the fromMatrix: method:
matrix := Matrix
rows: 3 columns: 3
contents:
#('Barcelona' 1.609 true
'Dubai' 2.789 true
'London' 8.788 false).
df := DataFrame fromMatrix: matrix.
Once again, the names of rows and columns are set to their default values.
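The matrix accessors mentioned above can be tried on the same data. This is a minimal sketch that assumes the first argument of at:at: is the row index and the second is the column index, as in a matrix. The variable name city is only illustrative.

```smalltalk
"Assumption: at:at: takes the row index first, then the column index."
df := DataFrame fromRows: #(
   ('Barcelona' 1.609 true)
   ('Dubai' 2.789 true)
   ('London' 8.788 false)).

"Read the value in row 2, column 1."
city := df at: 2 at: 1.

"Overwrite the value in row 3, column 2."
df at: 3 at: 2 put: 8.9.
```

Under that assumption, city holds the name stored in the second row.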
In most real-world scenarios the data is located in a file or database. Support for database connections will be added in future releases. Right now DataFrame provides methods for loading data from the two most common file formats, CSV and XLSX:
DataFrame fromCSV: 'path/to/your/file.csv'.
DataFrame fromXLSX: 'path/to/your/file.xlsx'.
Since JSON does not store data as a table, it is not possible to read such a file directly into a DataFrame. However, you can parse the JSON using NeoJSON or any other library, construct an array of rows, and pass it to the fromRows: message, as described in the previous sections.
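The JSON route described above could look roughly like this. It is a sketch, not the library's prescribed method: it assumes NeoJSON is loaded in the image and that the JSON is an array of objects sharing the same keys. The JSON text, the key names, and the variables json, records, keys, and rows are made up for illustration.

```smalltalk
"Hypothetical input: an array of JSON objects with identical keys."
json := '[
   {"city": "Barcelona", "population": 1.609, "beenThere": true},
   {"city": "Dubai", "population": 2.789, "beenThere": true},
   {"city": "London", "population": 8.788, "beenThere": false}]'.

"By default NeoJSONReader answers an Array of Dictionaries for this input."
records := NeoJSONReader fromString: json.

"Fix the column order explicitly, because dictionaries do not guarantee one."
keys := #('city' 'population' 'beenThere').

"Turn each dictionary into a row: an array of values in key order."
rows := records collect: [ :each |
   keys collect: [ :key | each at: key ] ].

df := DataFrame fromRows: rows.
df columnNames: #(City Population BeenThere).
```

Listing the keys by hand keeps the columns in a known order and makes a missing key fail early, with an error from at:.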
DataFrame provides several famous datasets for you to play with. They are compact and can be loaded with a simple message. At this point there are three datasets that can be loaded in this way: the Iris flower dataset, a simplified Boston Housing dataset, and the Restaurant tipping dataset.
DataFrame loadIris.
DataFrame loadHousing.
DataFrame loadTips.