<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.traxel.com/index.php?action=history&amp;feed=atom&amp;title=Category%3AParquet</id>
	<title>Category:Parquet - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.traxel.com/index.php?action=history&amp;feed=atom&amp;title=Category%3AParquet"/>
	<link rel="alternate" type="text/html" href="https://wiki.traxel.com/index.php?title=Category:Parquet&amp;action=history"/>
	<updated>2026-04-28T13:26:54Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.2</generator>
	<entry>
		<id>https://wiki.traxel.com/index.php?title=Category:Parquet&amp;diff=3094&amp;oldid=prev</id>
		<title>RobertBushman: RobertBushman moved page Category:Parquert to Category:Parquet: Typo</title>
		<link rel="alternate" type="text/html" href="https://wiki.traxel.com/index.php?title=Category:Parquet&amp;diff=3094&amp;oldid=prev"/>
		<updated>2023-10-14T23:41:33Z</updated>

		<summary type="html">&lt;p&gt;RobertBushman moved page &lt;a href=&quot;/index.php/Category:Parquert&quot; class=&quot;mw-redirect&quot; title=&quot;Category:Parquert&quot;&gt;Category:Parquert&lt;/a&gt; to &lt;a href=&quot;/index.php/Category:Parquet&quot; title=&quot;Category:Parquet&quot;&gt;Category:Parquet&lt;/a&gt;: Typo&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:41, 14 October 2023&lt;/td&gt;
				&lt;/tr&gt;
&lt;!-- diff cache key traxel_wiki:diff::1.12:old-2741:rev-3094 --&gt;
&lt;/table&gt;</summary>
		<author><name>RobertBushman</name></author>
	</entry>
	<entry>
		<id>https://wiki.traxel.com/index.php?title=Category:Parquet&amp;diff=2741&amp;oldid=prev</id>
		<title>RobertBushman: /* Partitioning */</title>
		<link rel="alternate" type="text/html" href="https://wiki.traxel.com/index.php?title=Category:Parquet&amp;diff=2741&amp;oldid=prev"/>
		<updated>2023-09-14T13:00:52Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Partitioning&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:00, 14 September 2023&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l4&quot;&gt;Line 4:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 4:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Partitioning ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Partitioning ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Partitioning your data well means each part file within a partition will have fewer rows &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;that do not need &lt;/del&gt;to be &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;filtered&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Partitioning your data well means each part file within a partition will have fewer rows&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;, a greater percentage of them will be relevant to your query, and fewer accesses &lt;/ins&gt;to &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;external datasets will &lt;/ins&gt;be &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;required&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt; &lt;/del&gt;Reducing the row length of a collection of rows means you can include more columns in the same size Parquet part file.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Reducing the row length of a collection of rows means you can include more columns &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;or a larger share of the total dataset &lt;/ins&gt;in the same size Parquet part file.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Assuming this aligns well &lt;/del&gt;with your &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;queries&lt;/del&gt;, &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;this means increasing &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;number &lt;/del&gt;of&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Having more columns means you can do more complex statistics without joining against other datasets. Increasing the share of the total dataset means you can get the same job done &lt;/ins&gt;with &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;fewer task nodes.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;But the cost is that you are splitting up the dataset. If &lt;/ins&gt;your &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;query mismatches the index&lt;/ins&gt;, &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;you lose all &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;benefit &lt;/ins&gt;of &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;indexing and may introduce memory footprint problems for a single task.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key traxel_wiki:diff::1.12:old-2740:rev-2741 --&gt;
&lt;/table&gt;</summary>
		<author><name>RobertBushman</name></author>
	</entry>
	<entry>
		<id>https://wiki.traxel.com/index.php?title=Category:Parquet&amp;diff=2740&amp;oldid=prev</id>
		<title>RobertBushman: Created page with &quot;Category:Hacking = Overview = Parquet is a file format that collects schema, columnar data, and columnar metadata in a partitioned collection of files. The partitioned files allow many processes to work together simultaneously, the columnar data enables fast aggregate values for data analytics, and the columnar metadata ensures each process has the information it needs to access the fields quickly.  == Partitioning == Partitioning your data well means each part file...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.traxel.com/index.php?title=Category:Parquet&amp;diff=2740&amp;oldid=prev"/>
		<updated>2023-09-14T12:44:25Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;&lt;a href=&quot;/index.php/Category:Hacking&quot; title=&quot;Category:Hacking&quot;&gt;Category:Hacking&lt;/a&gt; = Overview = Parquet is a file format that collects schema, columnar data, and columnar metadata in a partitioned collection of files. The partitioned files allow many processes to work together simultaneously, the columnar data enables fast aggregate values for data analytics, and the columnar metadata ensures each process has the information it needs to access the fields quickly.  == Partitioning == Partitioning your data well means each part file...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;[[Category:Hacking]]&lt;br /&gt;
= Overview =&lt;br /&gt;
Parquet is a file format that collects schema, columnar data, and columnar metadata in a partitioned collection of files. The partitioned files allow many processes to work together simultaneously, the columnar data enables fast aggregate values for data analytics, and the columnar metadata ensures each process has the information it needs to access the fields quickly.&lt;br /&gt;
&lt;br /&gt;
== Partitioning ==&lt;br /&gt;
Partitioning your data well means each part file within a partition will have fewer rows that do not need to be filtered.&lt;br /&gt;
&lt;br /&gt;
 Reducing the row length of a collection of rows means you can include more columns in the same size Parquet part file.&lt;br /&gt;
&lt;br /&gt;
Assuming this aligns well with your queries, this means increasing the number of&lt;/div&gt;</summary>
		<author><name>RobertBushman</name></author>
	</entry>
</feed>