Category:Parquet
Revision as of 12:44, 14 September 2023 by RobertBushman
Overview
Parquet is a file format that stores a schema, columnar data, and columnar metadata, typically in a partitioned collection of files. The partitioned files allow many processes to work on the data simultaneously, the columnar layout makes aggregate values for data analytics fast to compute, and the columnar metadata gives each process the information it needs to access fields quickly.
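The three ingredients named above (schema, columnar data, columnar metadata) can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not the real Parquet encoding; the sample rows and the names `schema`, `columns`, `metadata`, and `may_contain` are all invented for illustration.

```python
# Toy model of a columnar layout; this is NOT the real Parquet encoding.
rows = [
    {"city": "Oslo", "temp": 3},
    {"city": "Lima", "temp": 19},
    {"city": "Cairo", "temp": 28},
]

# Schema: column names and types, stored once for the whole file.
schema = {"city": str, "temp": int}

# Columnar data: one list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in schema}

# Columnar metadata: per-column statistics a reader can consult first.
metadata = {
    name: {"min": min(values), "max": max(values), "count": len(values)}
    for name, values in columns.items()
}

# An aggregate touches only the column it needs, never the other columns.
mean_temp = sum(columns["temp"]) / len(columns["temp"])


def may_contain(column, value):
    """Return False when the statistics prove the value is absent."""
    stats = metadata[column]
    return stats["min"] <= value <= stats["max"]
```

Here `may_contain("temp", 40)` can answer `False` from the metadata alone, without reading any column values, which is the kind of shortcut that column statistics make possible.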
Partitioning
Partitioning your data well means each part file within a partition contains fewer rows that a query has to filter out.
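A small sketch of why this helps, again as a toy model in plain Python rather than real Parquet I/O. The `key=value` partition naming, the sample `events`, and the helper names `partition_by` and `total_for_year` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample data; "year" is the partition key in this sketch.
events = [
    {"year": 2022, "user": "a", "amount": 10},
    {"year": 2022, "user": "b", "amount": 5},
    {"year": 2023, "user": "a", "amount": 7},
    {"year": 2023, "user": "c", "amount": 1},
    {"year": 2023, "user": "b", "amount": 4},
]


def partition_by(rows, key):
    """Group rows into partitions named 'key=value' (an assumed convention)."""
    parts = defaultdict(list)
    for row in rows:
        parts[f"{key}={row[key]}"].append(row)
    return dict(parts)


def total_for_year(parts, year):
    """Sum amounts for one year, scanning only the matching partition."""
    scanned = parts.get(f"year={year}", [])
    return sum(row["amount"] for row in scanned), len(scanned)


parts = partition_by(events, "year")
total, rows_scanned = total_for_year(parts, 2023)
```

Because the query filters on the partition key, only the rows in the `year=2023` partition are scanned and none of them has to be discarded. A query that filters on a different field, such as `user`, would get no such benefit from this partitioning.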
Reducing the number of rows in each part file means you can fit more columns into a part file of the same size.
Assuming this aligns well with your queries, this means increasing the number of