Category:Parquet

From Traxel Wiki
Revision as of 12:44, 14 September 2023 by RobertBushman

Overview

Parquet is a file format that stores a schema, columnar data, and per-column metadata, usually as a partitioned collection of files. The partitioned files let many processes work on the data at the same time, the columnar layout makes aggregate calculations for data analytics fast because a query reads only the columns it needs, and the column metadata gives each process the information it needs to locate fields quickly.
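As a rough illustration of why the columnar layout and column metadata help, the sketch below uses plain Python lists as stand-ins for column data. It is not the Parquet API; the sample rows and the helper names `to_columns` and `column_stats` are hypothetical and exist only for this example.

```python
# Conceptual sketch only: plain Python lists stand in for Parquet column data.
# The sample rows and helper names are made up for illustration.
rows = [
    {"city": "Austin", "year": 2021, "sales": 120},
    {"city": "Boston", "year": 2021, "sales": 80},
    {"city": "Austin", "year": 2022, "sales": 150},
    {"city": "Boston", "year": 2022, "sales": 95},
]


def to_columns(rows):
    """Pivot row dicts into one list per column (a columnar layout)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def column_stats(values):
    """Summarize a column, similar in spirit to per-column min/max metadata."""
    return {"min": min(values), "max": max(values), "count": len(values)}


columns = to_columns(rows)
stats = {name: column_stats(values) for name, values in columns.items()}

# An aggregate reads only the single column it needs.
total_sales = sum(columns["sales"])

# Metadata can rule out a block of data without reading its values.
may_contain_2023 = stats["year"]["min"] <= 2023 <= stats["year"]["max"]
```

Here the sum touches only the `sales` list, and the `year` statistics are enough to decide that a search for 2023 can skip this data entirely.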

Partitioning

Partitioning your data well means each part file within a partition holds fewer rows that have to be filtered out, because a query can skip the partitions it does not need.
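The effect can be sketched in plain Python, assuming a layout where each partition is named `key=value` (a convention commonly used for partitioned Parquet datasets). The `partition_rows` and `read_partition` helpers and the sample `events` are hypothetical; a real reader would open part files on disk rather than index a dictionary.

```python
from collections import defaultdict

# Conceptual sketch only: a dict of lists stands in for partition directories.


def partition_rows(rows, key):
    """Group rows under 'key=value' partition names."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{key}={row[key]}"].append(row)
    return dict(partitions)


def read_partition(partitions, key, value):
    """Return only the rows of the matching partition; others are never touched."""
    return partitions.get(f"{key}={value}", [])


events = [
    {"year": 2022, "user": "a", "amount": 10},
    {"year": 2022, "user": "b", "amount": 20},
    {"year": 2023, "user": "a", "amount": 30},
    {"year": 2023, "user": "c", "amount": 40},
    {"year": 2023, "user": "b", "amount": 50},
]

partitions = partition_rows(events, "year")

# A query for 2023 reads one partition instead of scanning every row.
rows_2023 = read_partition(partitions, "year", 2023)
rows_scanned = len(rows_2023)
total_2023 = sum(row["amount"] for row in rows_2023)
```

If queries usually filter on `user` rather than `year`, this partition key would not help, which is why the choice of key should follow the queries you expect to run.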

Reducing the number of rows in a part file means you can include more columns in a Parquet part file of the same size.

Assuming this aligns well with your queries, this means increasing the number of
