The ColumnObject has a loosely specified `type` field that's intended to capture the column's dtype:

> Data type of the column. If using a file format with a type system (like Parquet), we recommend you use those types.
This definition is pretty vague, but maybe that's OK. It depends on what this is used for. When the Asset is a typed file format like Parquet, the `type` in ColumnObject is irrelevant when you're actually loading the data. Under that scenario, I think it's mostly just useful for humans learning about the data. But if the Asset is something like a CSV, a reader might want to use the types from `table:columns` to avoid inferring a data type from the values.
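To make the CSV case concrete, here is a rough sketch of a reader that casts values using declared column types instead of inferring them. The column names, the type names (`string`, `int64`, `float64`), and the `CONVERTERS` mapping are all assumptions for illustration; the current spec doesn't define which values `type` can take.

```python
import csv
import io

# Hypothetical table:columns metadata. The type names are assumed for this
# sketch; the spec does not currently mandate any particular vocabulary.
table_columns = [
    {"name": "station_id", "type": "string"},
    {"name": "elevation", "type": "int64"},
    {"name": "temperature", "type": "float64"},
]

# Assumed mapping from declared type names to Python converters.
CONVERTERS = {"string": str, "int64": int, "float64": float}


def read_typed_csv(text, columns):
    """Parse CSV text, casting each field with its declared type."""
    casts = {col["name"]: CONVERTERS[col["type"]] for col in columns}
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: casts[name](value) for name, value in row.items()}
        for row in reader
    ]


csv_text = "station_id,elevation,temperature\n0042,310,12.5\n"
rows = read_typed_csv(csv_text, table_columns)
print(rows)
# [{'station_id': '0042', 'elevation': 310, 'temperature': 12.5}]
```

Note that `station_id` stays the string `'0042'`. A reader that inferred types from the values would likely turn it into the integer 42, which is the kind of mistake declared types can prevent.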
IMO, we have a few choices:
- Leave `type` as is: don't take a strong stance on what values `type` can take. Make it clear to users that this field is primarily informational.
- Adopt a specific type system (e.g. Parquet's or json-table-schema's) and require users to translate the Asset's type to that system.
- Drop `type` in favor of multiple fields like `parquet_type`, `arrow_type`, `jsontableschema_type`, etc. Let the provider choose which ones to provide.
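For comparison, here is roughly what a single column might look like under each option, written as Python dicts. The column name and the specific type values are illustrative assumptions, and `pick_type` is a hypothetical helper showing how a reader might consume the third option; none of this is taken from the spec.

```python
# Option 1: free-form, informational `type` (any string the provider likes).
option_1 = {"name": "temperature", "type": "double"}

# Option 2: `type` restricted to one type system. Here the value is assumed
# to come from json-table-schema, which has a "number" type.
option_2 = {"name": "temperature", "type": "number"}

# Option 3: one field per type system; the provider fills in whichever
# ones they know.
option_3 = {
    "name": "temperature",
    "parquet_type": "DOUBLE",
    "arrow_type": "float64",
}


def pick_type(column, preferred=("arrow_type", "parquet_type")):
    """Return (field, value) for the first type field present, else None."""
    for field in preferred:
        if field in column:
            return field, column[field]
    return None


print(pick_type(option_3))  # ('arrow_type', 'float64')
print(pick_type(option_1))  # None
```

The trade-off shows up in `pick_type`: with the third option a reader has to handle columns where none of the type systems it understands were provided.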