MediaWiki HSL Migration

From Traxel Wiki

Latest revision as of 00:52, 31 March 2024

2024-03-30 Attempt

Fixing Broken Categories

cd /var/www/mediawiki/maintenance/
php update.php # this didn't do it
php rebuildall.php # this worked

Perl Regex for Page Names

perl -ne 'print "$2\n" while m|<a href="[^"]*"( class="mw-redirect")* title="[^"]*">([^<>]*)</|g' < pages-02.html

Perl Regex for File Links

a href="/wiki/File:01mendel-ABS_solid-bed-height-spacer-31m.jpg"
cat files-*.html | perl -ne 'print "$1\n" while m|a href="(/wiki/File:[^"]*)"|g'

Handling Files

  1. Hit the /wiki/File:... page.
  2. grep for "Full resolution".
    • That didn't work for every file. Some File pages have no "Full resolution" link; the link inside the fullMedia div shows the file name instead, so grep for div class="fullMedia" on those:
    • <div class="fullMedia"><a href="/w/images/6/63/3d_Printing_BYOF.jpg" class="internal" title="3d Printing BYOF.jpg">3d_Printing_BYOF.jpg</a>‎ <span class="fileInfo">(482 × 541 pixels, file size: 80 KB, MIME type: image/jpeg)</span>
    • For comparison, a page that does have the "Full resolution" link:
    • <div class="fullMedia"><a href="/w/images/e/e5/3d_Printing_Donation_Box.jpg" class="internal" title="3d Printing Donation Box.jpg">Full resolution</a>‎ <span class="fileInfo">(1,177 × 753 pixels, file size: 247 KB, MIME type: image/jpeg)</span>
  3. Get the URL from the href of that link: a href="/w/images/1/1c/zoom-into-rectangle.png" class="internal"
  4. Use the awk-generated curl -sO commands below to download the files.
  5. Go to the All Pages special page and export the Files category.
    1. Import those pages to the new wiki.
  6. Upload the files, ignoring warnings. (Maybe find a way to bulk this, but it shouldn't be more than an hour of work.)
curl -s 'https://wiki.heatsynclabs.org/wiki/File:zoom-into-rectangle.png' | grep 'Full resolution' | perl -ne 'print "$1\n" while m|a href="(/w/images/[^"]*)" class="internal"|g'
curl -s 'https://wiki.heatsynclabs.org/wiki/File:zoom-into-rectangle.png' | grep 'div class="fullMedia"' | grep -v 'Full resolution' | perl -ne 'print "$1\n" while m|a href="(/w/images/[^"]*)" class="internal"|g'
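The two pipelines above differ only in which link text they expect. A possible single-pass variant, sketched here with grep and sed instead of the Perl one-liner, keys on the fullMedia div alone and so should cover both cases. The function names are hypothetical, and the sample lines are trimmed copies of the two fullMedia examples in the list above.

```shell
#!/bin/sh
# Hypothetical helper: reads one File page's HTML on stdin and prints the
# /w/images/... path from its fullMedia div, whatever the link text is.
extract_full_media_path() {
    grep 'div class="fullMedia"' |
        grep -o 'a href="/w/images/[^"]*" class="internal"' |
        sed 's|^a href="\(/w/images/[^"]*\)" class="internal"$|\1|'
}

# Trimmed samples: one link labelled with the file name, one with "Full resolution".
sample_file_pages() {
    cat <<'EOF'
<div class="fullMedia"><a href="/w/images/6/63/3d_Printing_BYOF.jpg" class="internal" title="3d Printing BYOF.jpg">3d_Printing_BYOF.jpg</a></div>
<div class="fullMedia"><a href="/w/images/e/e5/3d_Printing_Donation_Box.jpg" class="internal" title="3d Printing Donation Box.jpg">Full resolution</a></div>
EOF
}

sample_file_pages | extract_full_media_path
```

Against the live wiki this would be used as `curl -s 'https://wiki.heatsynclabs.org/wiki/File:zoom-into-rectangle.png' | extract_full_media_path`, which would avoid keeping two separate path lists.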
awk '{print "curl -sO https://wiki.heatsynclabs.org"$1}' < ../file-full-res-paths.txt
awk '{print "echo "$1" && curl -sO https://wiki.heatsynclabs.org"$1}' < ../file-full-res-paths-nofull.txt
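The awk commands above generate unquoted curl commands, which could break if a path ever contains a space, parenthesis, or ampersand. A hedged alternative is a read loop that quotes each URL and downloads directly. `download_paths`, `BASE_URL`, and the `DRY_RUN` switch are names invented for this sketch, and the parenthesised file name in the sample is hypothetical.

```shell
#!/bin/sh
BASE_URL='https://wiki.heatsynclabs.org'

# Hypothetical helper: reads /w/images/... paths on stdin, one per line.
# With DRY_RUN=1 it only prints the URLs it would fetch.
download_paths() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$BASE_URL$path"
        else
            printf '%s\n' "$path"           # progress output, like the echo in the awk version
            curl -sO "$BASE_URL$path"       # quoted, so special characters survive
        fi
    done
}

# Dry run over two sample paths; nothing is downloaded.
printf '%s\n' '/w/images/1/1c/zoom-into-rectangle.png' '/w/images/6/63/3d_Printing_(BYOF).jpg' |
    DRY_RUN=1 download_paths
```

A real run would read the saved list instead, for example `download_paths < ../file-full-res-paths.txt`.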
bob@lap-2021:~/Documents/hacking/hsl/wiki-migrate$ cat file-paths.txt | ./file-get-commands.pl | sh
bob@lap-2021:~/Documents/hacking/hsl/wiki-migrate$ cat file-get-commands.pl 
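#!/usr/bin/env perl
# Reads /wiki/File:... paths on stdin and prints one shell pipeline per path.
# Piped to sh, each pipeline appends that file's /w/images/... path to test.sh.
use strict;
use warnings;

while (my $path = <>) {
    chomp $path;
    next unless $path;    # skip blank lines
    print "curl -s 'https://wiki.heatsynclabs.org$path'";
    print " | grep 'Full resolution'";
    print ' | perl -ne \'print "$1\n" while m|a href="(/w/images/[^"]*)" class="internal"|g;\'';
    print " >> test.sh\n";
}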