One of my keyboards at home is a very clickety (model M style, too clickety for the office) "matias tactile pro". Here is the vendor blurb.
While the typing is great, the plastic shell didn't stand the test of time. It got brittle very quickly, internal clips broke off, and the top and bottom case parts became completely separated at the front. Occasionally, the "return" key got stuck on the raised shell and went into auto-repeat.
"All the world's problems can be fixed with either gaffa tape or WD40", they say. For the past five years, I used clear adhesive tape but its stickiness is limited, too. Time for some quick renewal:
We will see whether this will do the job for the next five days or five years. At least Piet should approve the solution:
And now for something completely different:
The newest addition to the labs is not 7+ years old but spanking new gear: i5-9400, 32GB RAM, 500GB SSD, ... and very, very quiet.
I usually don't put stickers on my machines. The Abteilung-fuer-Redundanz-Abteilung deserves an exception to this rule: their sticker not only matches the colour scheme of the mini-tower/shoebox enclosure perfectly but the six wings also hint at the six cores in the i5-9400.
The machine comes with a big monitor (2560x1440 on 27") and is my new home office workplace.
This replaces the prior work setup of my Lenovo X220 laptop + 1280x1024 external monitor. (Which is way too noisy when used the whole day long.)
I had to shoot the pictures right away because such a clean desk is very rare with me. It was only possible because I shifted all of the entropy to the neighbouring desk:
Both desks feature a row of 2cm holes in the back. These were originally used as air vents for the heating elements the desks covered in the 70ies. Now they are great for much of the cabling.
The cables will be tied up nicely once things have settled.
I still need to learn how to route the audio via the DisplayPort instead of the analog LineOut/Headphone jacks. Also, having the keyboard USB-hubbed off the monitor means that I have to re-run my
xmodmap ~/.Xmodmap
every time after switching the monitor off/on. (Probably a job for udev; for now, I let the screensaver do the work.)
I intend to run hubert as a 24/7 server+workstation. I decided on Proxmox-VE/Debian as OS, extended with the desktop services and applications I need at the hypervisor level.
Hubert should inherit many tasks from alexis, the Olinuxino ARM board which provided the 24/7 services for the last years (XDMCP, bootp/tftp, X11 xfs, DNS, NTP, lpd). Once fz, the NCD X Terminal in the kitchen, can boot/work from hubert, I can give alexis its badly needed system upgrade.
... but in different ways.
Sometimes the integrated electronics board of a disk dies. I cannot cope with these failures -- such an HDD is truly dead for me.
Sometimes, just "a few" disk blocks become unreadable (and/or unwritable).
And if you are lucky, the disk got just reported as bad but is actually fine for use: I have seen a RAID enclosure for 16 disks which liked to complain about various disks but always in the same two slots -- I rather suspect a problem with the backplane. Then there are those disks which were just subjected to unfair work conditions: too hot. Give them enough cool air to breathe and they cooperate nicely again.
Ditching an entire HDD because a few first blocks become unreadable is justified in a production setting where HDDs are cheaper than the labor to deal with a corrupted disk and the associated data loss.
In my lab, though, this is where the fun begins. Can I still put HDDs "with known issues" to some proper use? Of course:
Not all of my data is important. I would not shed a tear about a lost mp3 file ripped from one of my CDs.
I can forgo the loss of a filesystem backup -- there will be new ones soon enough.
Build RAID systems from risky disks -- that's what RAIDs are for, after all.
This article takes a closer look at a faulty 250 GB disk. It was pulled from a RAID-1 system where it was easily replaced. (By a 500 GB disk because that was the smallest size sold at that time.)
Summary: dd(1), awk(1), jqt(1), smartctl(1).
Here is my current backlog of disks to check:
These are mostly 3.5" SATA HDDs, mostly fallout from production servers in the company.
Disks which are checked with some hope for an afterlife go here:
These are mostly SCSI disks of various vintages. I have no shortage of 68-pin WIDE-SCSI disks. 80-pin SCA disks and 50-pin "FAST" SCSI disks are the real treasure items; I need these most for the workstation gear I carry. (It has actually become more and more difficult to find the 50-pin disks.)
Testing is most easily done with external drive sockets such as these:
This one takes 3.5" and 2.5" SATA disks (rotating dust or SSDs). It is attached via eSATA to fred.marshlabs.gaertner.de, a Core2Duo box running DragonFlyBSD.
IDE (PATA) disks went mostly out of interest but I would have some IDE-to-USB connector kit for these, too. It's the SCSI drives which cause a bit of pain: these have to be mounted into some computer or enclosure, depending on the interface type.
dd(1) is the swiss army knife to deal with storage devices of all kinds. It works at the block level and isn't dependent on the actual data. If vital blocks on the disk have taken a hit so that your partitioning or filesystem are hosed, dd is not impacted by that.
In the rest of this article, I move from SI-metric Kilo/Mega/Giga units to 2^10 = 1024-based unit factors throughout. Our disk marketed as "250 GB" is really:
ad10: 239429MB <WDC WD2502ABYS-02B7A0 02.03B03> at ata5-master SATA300
according to dmesg. That 239429MB is just a rounded value, though. The disk advertises itself to the BIOS with a geometry of
cylinders=486459 heads=16 sectors/track=63
yielding a total of 490350672 blocks (with 512 bytes), or 251059544064 bytes. This justifies the 250 10^9 marketing GB, but we better see this as "239429 and a bit (2^20) MiB (MebiBytes)" or "almost 234 (2^30) GiB (GibiBytes)".
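The unit juggling above is easy to double-check with a few lines of Python. This is only a sketch: the helper name `chs_capacity` is made up for illustration, and the geometry values are the ones the disk reports to the BIOS.

```python
SECTOR_BYTES = 512  # block size used throughout this article


def chs_capacity(cylinders, heads, sectors_per_track):
    """Return (blocks, bytes) for a classic cylinder/head/sector geometry."""
    blocks = cylinders * heads * sectors_per_track
    return blocks, blocks * SECTOR_BYTES


blocks, nbytes = chs_capacity(486459, 16, 63)
print(blocks, "blocks")
print(nbytes, "bytes")
print(nbytes / 10**9, "GB  (marketing, 10^9)")
print(nbytes / 2**20, "MiB (2^20)")
print(nbytes / 2**30, "GiB (2^30)")
```

The MiB figure comes out as 239429 plus a small fraction, which is where the rounded dmesg value comes from.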
My first command was simply for reading as many blocks from the disk as possible, starting at the beginning (block 0):
% dd if=/dev/ad10 of=/dev/null
This command supposedly runs through the entire disk (/dev/ad10) as "input file", reading all blocks and "copying" them over to Unix' big bit bucket, /dev/null, as output (pseudo) file. In case of a problem with reading a disk block, it bails out at that point with an error message. In any case we get informed how many blocks were read/written.
When dd runs, it is silent. On BSD systems, you can issue a SIGINFO signal to the running process by hitting Ctrl-T. Dd will respond with the block position it is currently working on.
Reading just one block at a time is slow. On the first day, I aborted the dd command after having read 200.000.000 blocks (100 GB) successfully.
On the second day, I picked up reading more blocks where I stopped the day before:
% dd if=/dev/ad10 of=/dev/null skip=200000000
This command stopped with this output:
dd: /dev/ad10: Input/output error
135382768+0 records in
135382768+0 records out
69315977216 bytes transferred in 16899.7 secs (4101616 bytes/sec)
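Taking the skipped blocks from yesterday's successful but partial run into account, this translates into 200000000 + 135382768 = 335382768 good blocks. Counting from block 0, the first bad block is therefore block number 335382768.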
I noted that block number down into a simple ascii file and then continued testing, skipping all the 335382768 good and 1 bad block just found:
% dd if=/dev/ad10 of=/dev/null skip=335382769
After just another 1482 good blocks, dd would detect the next error at block no. 335384251. Note that down, skip past it (skip=335384252) for the next run, rinse and repeat a few times to get an initial feeling for how the errors are distributed. I ended up with these notes:
200000000
135382768
335382768 blks OK.
335382768 bad
335384251 bad
335402546 bad
335404029 bad
335408476 bad
335449276 bad
335450759 bad
335452242 bad
335453724 bad
335455208 bad
I finished with these manual stop-and-go tests at this point and had a look at the distances from one bad block to the next:
% awk '/bad/ {if (old) print $1-old; old=$1}' ad10-hd.errs
1483
18295
1483
4447
40800
1483
1483
1482
1484
There's some pattern here: if something bad happened to the magnetic coating in one place, we should see the same sector destroyed along a series of neighboring tracks. Perhaps 1483 is the physical track length at this spot, or the sum of the track lengths across all disk surfaces. Perhaps a surface defect in a single spot would look like this?
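The same distance check can be written in Python for those who prefer it over awk. This is a minimal sketch; the function name `gaps` is made up, and the second output column merely expresses each distance in units of the suspected 1483-block stride, without proving anything about the drive's real geometry.

```python
def gaps(bad_blocks):
    """Distances between consecutive bad block numbers."""
    return [b - a for a, b in zip(bad_blocks, bad_blocks[1:])]


# The ten bad blocks found with the manual stop-and-go runs.
bad = [335382768, 335384251, 335402546, 335404029, 335408476,
       335449276, 335450759, 335452242, 335453724, 335455208]

for g in gaps(bad):
    # Second column: the gap as a multiple of the suspected stride.
    print(g, round(g / 1483, 2))
```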
For further bad block detection, I let dd do the job: the conv=noerror option will note every problem but automatically advance past the troubling block (just like we did before) and continue from there, until the end of the disk. I used an initial skip to shortly before the first error already found, block number 335382768:
% dd if=/dev/ad10 of=/dev/null skip=335000000 conv=noerror |& tee ~/ad10-disk.errs
The resulting output starts like this:
dd: /dev/ad10: Input/output error
382768+0 records in
382768+0 records out
195977216 bytes transferred in 26.313414 secs (7447807 bytes/sec)
dd: /dev/ad10: Input/output error
dd: /dev/ad10: Input/output error
384250+0 records in
384250+0 records out
196736000 bytes transferred in 28.703818 secs (6854001 bytes/sec)
dd: /dev/ad10: Input/output error
dd: /dev/ad10: Input/output error
402544+0 records in
402544+0 records out
206102528 bytes transferred in 32.030519 secs (6434567 bytes/sec)
dd: /dev/ad10: Input/output error
dd: /dev/ad10: Input/output error
404026+0 records in
404026+0 records out
206861312 bytes transferred in 34.588290 secs (5980675 bytes/sec)
dd: /dev/ad10: Input/output error
and continues in this style for another 1500 lines. If you look closely, you'll notice the offset-adjusted "records in" numbers increasingly deviate from the bad block numbers established manually:
335382768 335384251 335402546 335404029 ...
335382768 335384250 335402544 335404026 ...
This is because the bad, skipped blocks are not counted as input records. Lesson learned: conv=sync,noerror would replace the unreadable blocks with zeroed blocks for the destination and include these blocks in the "records in/out" counts.
We will fix this up when extracting the numbers with awk below.
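To make the fix-up explicit, here is a small Python sketch of the idea. It assumes that every "records in" line in the log directly follows a read error (so no Ctrl-T status lines and no final summary line in between); the function name `bad_blocks_from_dd` is made up for illustration.

```python
import re


def bad_blocks_from_dd(log_text, skip):
    """Turn dd's 'N+0 records in' lines into absolute bad block numbers.

    With conv=noerror (and no sync), dd does not count the unreadable
    block itself, so the k-th report (k = 0, 1, 2, ...) is short by k.
    """
    counts = [int(m.group(1))
              for m in re.finditer(r"^(\d+)\+\d+ records in$", log_text, re.M)]
    return [skip + n + k for k, n in enumerate(counts)]


# First four reports from the run started with skip=335000000.
sample = (
    "dd: /dev/ad10: Input/output error\n"
    "382768+0 records in\n"
    "382768+0 records out\n"
    "dd: /dev/ad10: Input/output error\n"
    "384250+0 records in\n"
    "384250+0 records out\n"
    "dd: /dev/ad10: Input/output error\n"
    "402544+0 records in\n"
    "402544+0 records out\n"
    "dd: /dev/ad10: Input/output error\n"
    "404026+0 records in\n"
    "404026+0 records out\n"
)

print(bad_blocks_from_dd(sample, skip=335000000))
```

The result agrees with the block numbers established by hand in the first row of the comparison above.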
I aborted the run after 13910434 blocks before shutting down the machine and going to bed. The next day, I picked things up at skip=485000000, finding no further errors on the disk.
So these runs located 309 errors on the disk:
% awk '/records in$/ {print 335000000+$1+bad++}' ~/ad10-disk.errs > ad10.bb
% pr -t5 ad10.bb
335382768 335789372 336188550 336561556 337125282
335384251 335790855 336190033 336561768 337126698
335402546 335792337 336191515 336563251 337129770
335404029 335793820 336197446 336566256 337131186
335408476 335795302 336200663 336566468 337132602
335449276 335798520 336205111 336567739 337134018
335450759 335804450 336206593 336567951 337142754
335452242 335805933 336208076 336569221 337199178
335453724 335807415 336214259 336569433 337200594
335455208 335812115 336215741 336570704 337455984
335458424 335813598 336226371 336570916 337458816
335459907 335815080 336227854 336572186 337473216
335461389 335822745 336229337 336572398 337484784
335462872 335831641 336230819 336573669 337487616
335464354 335860568 336232302 336573881 337497768
335465837 335880093 336238484 336575364 337499184
335469054 335887758 336239967 336739950 337503672
335470537 335892207 336241449 336741432 337507920
335472020 335896906 336250597 336742915 337509336
335473502 335898389 336252080 336744397 337510752
335474985 335899871 336253562 336747363 337513824
335476467 335901354 336496544 336753545 337515240
335477950 335902836 336498027 336755028 348457280
335481167 335904320 336504208 336757993 348796478
335482650 335907536 336513105 337054458 348797894
335484132 335909019 336513317 337057290 348800966
335485615 335910502 336514799 337058706 348802382
335487098 335911984 336517805 337061778 348805214
335488580 335913467 336518017 337063194 348806630
335490063 335914949 336522252 337066026 348808046
335494763 335918167 336522464 337068858 348809462
335496245 335919649 336525217 337073346 348812534
335497728 335921132 336525429 337074762 348813950
335499210 335922614 336526912 337076178 348815366
335500693 335924097 336528434 337077594 348816782
335502176 335925580 336529917 337080426 348818198
335505393 335927062 336530129 337081842 348821030
335506876 335930279 336534365 337084914 348829766
335508358 335930280 336534577 337086330 348831182
335509841 335931761 336535848 337087746 348832598
335511323 335931762 336536060 337089162 348837086
335512806 336094401 336537330 337090578 348838502
335514288 336095883 336537542 337091994 348842750
335517506 336133916 336542030 337093410 348844166
335518988 336135399 336542242 337096482 348847238
335520471 336156659 336543513 337097898 348848654
335521954 336158142 336543725 337099314 348850070
335523436 336159625 336544995 337100730 348852902
335532584 336164325 336545207 337102146 348854318
335541732 336165807 336546478 337103562 348857390
335543214 336167290 336546690 337104978 348858806
335553844 336168772 336547961 337108050 348860222
335771077 336170255 336548173 337109466 348861638
335772560 336171737 336552661 337110882 348864470
335775777 336173220 336554143 337112298 348871790
335777259 336176437 336554355 337113714 348873206
335778742 336177920 336555626 337115130 348874622
335780224 336179403 336555838 337118202 348876038
335781707 336180885 336557108 337119618 348877454
335783189 336182368 336557320 337121034 348883358
335784673 336183850 336558591 337122450 348910742
335787889 336185333 336558803 337123866
And here is a plot. The x-axis is along the trouble spots, y-axis the faulty block number. I added the blocks 0 and 490350671 for the first and last block of the disk to put the error positions in relation to the whole disk:
So the first 270 bad blocks cluster in one area, the last 39 in another.
Entire faulty range: 335382768 to 348910742 = 13527974 blocks = 6.45 GB
Inner good gap: 337515240 to 348457280 = 10942040 blocks = 5.22 GB
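For the record, this is how the two figures can be reproduced in Python. It is a sketch only: the helper name `span_gib` is made up, and it counts the distance between the two block numbers (last minus first), just like the figures above.

```python
SECTOR_BYTES = 512


def span_gib(first, last):
    """Return (blocks, GiB) for the distance between two block numbers."""
    blocks = last - first
    return blocks, blocks * SECTOR_BYTES / 2**30


for label, first, last in [
    ("entire faulty range", 335382768, 348910742),
    ("inner good gap", 337515240, 348457280),
]:
    blocks, gib = span_gib(first, last)
    print(f"{label}: {blocks} blocks = {gib:.2f} GiB")
```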
With 490.350.672 blocks on our 233 GB disk, we should not get mad if one block gets bad. Bad blocks are just a fact of life.
It would sure be nice if this does not go unnoticed. Then again: the system should not panic right away due to a bad block. Disk drives are using checksums and error correction internally to prevent wrong bits being returned to the system. Bits can get flipped at many points though, and filesystems such as ZFS or btrfs will do their own checksums to guard against these things.
Even brand new disks will have bad blocks, and more will evolve with time. It wouldn't be economical for a vendor (or buyer) to bet on "perfect" disks.
There are several methods to handle bad blocks:
Sometimes, simply rewriting a bad block can unwedge it again. You have lost the data, of course.
Let the disk designate the bad blocks.
Digital Equipment Corporation ("DEC") formulated a standard numbered 144 for their system RK06/RK07 disks. The 144 standard describes how bad blocks are marked as bad. Both the drive controller and the OS can then avoid the blocks where the magic pixie dust has fallen off. The standard is designed to cope with up to 126 bad blocks. Tools such as bad144(8) let you investigate the current list of bad blocks and mark more blocks as bad.
The technology is not dead. Modern drives will do their best to hide such issues from your eyes, too. Disk drives usually come with spare blocks not advertised to the system. The internal drive controller will automatically remap blocks gone bad into spare blocks if possible.
Let the filesystem mask out the bad blocks.
For a BSD filesystem, badsect(8) can prevent bad blocks from being used for regular files. For Linux ext2/3/4 filesystems, there is badblocks(8).
Partitioning around the fault
Use fdisk/gpt/... to prevent suspicious, large areas from being used.
Repartitioning is what I am going to do here. I will define the 7.15 GB from, say, block 335000000 to 349999999 as a "bad" partition, and the space before and after that is still good for healthy partitions. I won't micro-manage the disk by dealing with the good gap between the two bad streaks differently. I would rather play it safe and include more leading/trailing space with the known-bad area. 20 GB should be plenty to play it safe and yet keep most of the disk for continued use.
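A quick Python sketch of the resulting three-way split, using the example boundaries from the paragraph above. The function name `layout` is made up, and the sketch ignores the space taken by the partition table itself and any alignment the partitioning tool may insist on.

```python
SECTOR_BYTES = 512
TOTAL_BLOCKS = 490350672


def layout(bad_first, bad_last, total=TOTAL_BLOCKS):
    """Split the disk into good / bad / good block ranges (inclusive)."""
    return [
        ("good", 0, bad_first - 1),
        ("bad", bad_first, bad_last),
        ("good", bad_last + 1, total - 1),
    ]


for name, first, last in layout(335000000, 349999999):
    blocks = last - first + 1
    gib = blocks * SECTOR_BYTES / 2**30
    print(f"{name:4} {first:>9} .. {last:>9} {gib:7.2f} GiB")
```

Widening the bad range is just a matter of moving the two boundary numbers further apart.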
Scanning a disk block by block gives you detailed information where the errors are but is very slow. This is why I spent 3 days on this 233 GB disk.
We can significantly speed things up by reading many blocks at a time, i.e. tackle the disk with a much larger blocksize option. Here are the first and second GB read from the disk, once transferred in single blocks, and once with 2048 blocks aggregated into a 1 MB transfer size. We read different GBs from the disk to avoid cache effects:
% dd if=/dev/ad10 of=/dev/null bs=1b count=2097152
2097152+0 records in
2097152+0 records out
1073741824 bytes transferred in 134.774307 secs (7966962 bytes/sec)
% dd if=/dev/ad10 of=/dev/null bs=1m skip=1024 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 9.100007 secs (117993515 bytes/sec)
Whoa! From 7.6 MB/s to 112.5 MB/s! Warp factor 15! It turns out that even a modest blocksize such as 16 KB gives me 112.2 MB/s throughput.
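The MB/s figures are dd's bytes/sec values divided by 2^20. A tiny Python sketch of that conversion (the helper name `mib_per_sec` is made up):

```python
def mib_per_sec(bytes_per_sec):
    """Convert dd's bytes/sec figure into MiB/s (2^20 bytes)."""
    return bytes_per_sec / 2**20


slow = mib_per_sec(7966962)     # bs=1b run
fast = mib_per_sec(117993515)   # bs=1m run
print(f"{slow:.1f} MiB/s -> {fast:.1f} MiB/s, factor {fast / slow:.1f}")
```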
The maximum throughput is limited by the bottleneck in the chain of
Exercise for you: What dd speed can you achieve with your disk? Is it related to any marketing number you are expecting from your system? (For example: "SATA-II -- 3 Gbit/s!")
Using a sensible aggregated blocksize, we can map out the 233 GB disk in a mere 40 minutes. Any chunk flagged as unreadable can then be subjected to fine-grained, block-wise analysis.
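A back-of-the-envelope check of that estimate in Python. This sketch assumes the throughput measured for the second GB holds for the whole disk, which is optimistic: drives usually get slower towards the end of the block range. The helper name `scan_minutes` is made up.

```python
def scan_minutes(total_bytes, bytes_per_sec):
    """Time to read total_bytes at a constant rate, in minutes."""
    return total_bytes / bytes_per_sec / 60


DISK_BYTES = 251059544064   # 490350672 blocks of 512 bytes
RATE = 117993515            # bytes/sec from the bs=1m run

print(f"{scan_minutes(DISK_BYTES, RATE):.1f} minutes at constant speed")
```

This lands a bit below the 40 minutes quoted above, which leaves room for the slower regions of the disk.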
My first attempts turned out to be miserable failures, though: Using bs=1g first and then bs=256m, I caused the machine to
I blame the OS for this, DragonFlyBSD-5.8.1 in this case. I finally got lucky with bs=1m.
% dd if=/dev/ad10 of=/dev/null conv=noerror bs=1m |& tee ad10-1m.dd
created a listing with 175 faults, and these were accompanied by 175 messages from the kernel in /var/log/messages:
Sep 18 16:33:06 fred kernel: ad10: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=335382528
Sep 18 16:33:08 fred kernel: ad10: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=335402496
Sep 18 16:33:11 fred kernel: ad10: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=335408384
[...]
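The LBA values can be pulled out of such kernel messages and mapped to the 1 MB chunk they belong to. The following Python sketch assumes the message format shown above; the names `LBA_RE` and `failed_chunks` are made up for illustration.

```python
import re

BLOCKS_PER_CHUNK = 2048  # 1 MiB / 512 bytes

LBA_RE = re.compile(r"\bLBA=(\d+)")


def failed_chunks(log_lines):
    """Yield (lba, chunk_index) for every kernel line carrying an LBA."""
    for line in log_lines:
        m = LBA_RE.search(line)
        if m:
            lba = int(m.group(1))
            yield lba, lba // BLOCKS_PER_CHUNK


log = [
    "Sep 18 16:33:06 fred kernel: ad10: FAILURE - READ_DMA48 "
    "status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=335382528",
    "Sep 18 16:33:08 fred kernel: ad10: FAILURE - READ_DMA48 "
    "status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=335402496",
    "Sep 18 16:33:11 fred kernel: ad10: FAILURE - READ_DMA48 "
    "status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=335408384",
]

for lba, chunk in failed_chunks(log):
    print(lba, chunk)
```

Note that the reported LBA is not the bad block itself: the first bad block found earlier, 335382768, lies a few hundred blocks behind the first LBA listed here.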
The plots about the error distributions look very much like the first plot, of course:
These plots were made by awking the block numbers from the logs and plotting them with the jqt IDE for the j interactive programming language. J is very handy to juggle around with data. I used it all the time for this article, mostly as a calculator for dealing with the various values and units, for example turning bytes/sec into MB/s.
If you are stuck with a text terminal, have a look at ministat(1). This is part of the FreeBSD/DragonFlyBSD base system and will give you a simple ascii rendition of how values are distributed.
Because our 175 values stack up in just two tall peaks, ending up in some 130 rows with the default ministat output, let me sample down the data to just every 10th value for compactness, and provide the first/last (good) blocks to put the bad block locations into context:
% printf %d\\n 0 0 239429 239429 > bounds
% awk '0==NR%10' bad-1m > bad-samples
% wc -l bad-samples
17 bad-samples
% ministat -s -w 65 bounds bad-samples
x bounds
+ bad-samples
+---------------------------------------------------------------+
| + |
| + |
| + |
| + |
| + |
| + |
| + |
| + |
| + |
| + |
| + |
| + |
| + |
| x ++ x |
| x ++ x |
||______________________________A______________________________||
| A| |
+---------------------------------------------------------------+
N Min Max Median Avg Stddev
x 4 0 239429 119714.5 119714.5 138234.4
+ 17 163796 170278 164278 164922 2035.2878
No difference proven at 95.0% confidence
dd - be SMART!
The dd command is transferring the data from the disk drive across the busses into memory -- just to /dev/null it there.
Modern disks support the SMART feature. You can ask the drive to do self-tests internally, without impacting the I/O system of the computer.
How this is used shall be part of another article, though.
After missing the 2000 days milestone:
Last login: Tue Jun 9 13:16:53 from kenny.gaertner.de
ULTRIX V4.4 (Rev. 69) System #17: Tue May 28 10:12:31 MET DST 1996
~~~
---
You have mail.
neitzel 1 > date
Tue Jun 9 13:18:47 MET DST 2020
neitzel 2 > uptime
1:18pm up 2222 days, 22:22, 8 users, load average: 0.21, 0.05, 0.00
neitzel 3 >
Upgrading the RouterOS on MikroTik devices is a simple affair:
All this is usually done in five minutes. So far I never had any issues caused by a RouterOS update.
Which makes you buy more MikroTik gear. The youngster in my home is...
[neitzel@lowell] > /system routerboard print
routerboard: yes
board-name: hAP lite
model: RouterBOARD 941-2nD
serial-number: 7C2C07C4455B
firmware-type: qca9531L
factory-firmware: 3.36
current-firmware: 6.46.6
upgrade-firmware: 6.46.6
This is a cheap (22 EUR), small wireless router/switch/access point serving my kitchen. Permanently attached nodes are the NCD X terminal fz and the DAB+/FM/Internet/LAN-Media radio gaga.
Lowell is supposed to replace the small cisco switch lab there but, as of now, they all still share the window sill:
I have run out of my public IPv4 addresses at home long ago. Because lowell is currently mostly just operating as an access point, it doesn't need any layer-3 address except for management. And so it became my first IPv6-only node, without any IPv4 address at all.
This was all fine. Until I tried the first RouterOS upgrade.
As tcpdump showed, the upgrade process will resolve the server name
download.mikrotik.com has address 159.148.172.226
download.mikrotik.com has address 159.148.147.204
download.mikrotik.com has IPv6 address 2a02:610:7501:4000::226
download.mikrotik.com has IPv6 address 2a02:610:7501:1000::196
which apparently would serve both the current and the vintage protocol flavours. The hAP though will first try an IPv4 server, notice that that network is unreachable, and... give up. What a shame!
This shortcoming is particularly disappointing because the RouterOS can transfer data via IPv6 when asked manually:
[neitzel@lowell] /file> /tool fetch url="http://hackett.6.ml.gaertner.de/index.html" output=user
status: finished
downloaded: 0KiB
data: All my friends and I are crazy. That's the only thing that
keeps us sane.
[neitzel@lowell] /file>
D'oh!
Workaround: for RouterOS updates, I temporarily /ip dhcp-client enable 0. Do the upgrade dance, and /ip dhcp-client disable 0 again.
Not nice but there are worse things in life.
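What I would have expected from the updater is the usual "try the next address" loop. Here is a sketch in Python of that idea. It says nothing about how RouterOS works internally; `first_reachable` and `ipv6_only_connect` are made-up names, and the connect function is only a stand-in that simulates a node without any IPv4 route.

```python
def first_reachable(addresses, try_connect):
    """Return the first address for which try_connect() succeeds, else None.

    try_connect is a caller-supplied callable raising OSError on failure.
    """
    for addr in addresses:
        try:
            try_connect(addr)
            return addr
        except OSError:
            continue  # e.g. "network is unreachable" on an IPv6-only node
    return None


def ipv6_only_connect(addr):
    """Hypothetical stand-in for a real connect(): IPv4 is unreachable."""
    if ":" not in addr:
        raise OSError("network is unreachable")


candidates = ["159.148.172.226", "159.148.147.204",
              "2a02:610:7501:4000::226", "2a02:610:7501:1000::196"]
print(first_reachable(candidates, ipv6_only_connect))
```

In real Python code, socket.create_connection() already walks through all addresses returned by the resolver in this fashion.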
Hey, let's just spend fifteen minutes on upgrading all three MikroTik gadgets, I thought around 8pm. When I went to bed, it was around 5am.
The upgrades went without a hitch on the two larger devices, billy and hall, but on lowell strange things would happen. The "download" step went fine but the "reboot for install" step would end up in the same old package versions as before (6.46.2), with the downloaded new version (6.46.6) purged from the /file area. Repeated attempts didn't help.
WLKIKIV, as we say here, and a quick /log print shows the problem: not enough disk space.
The "hdd" flash memory is indeed much more constrained on lowell:
% echo billy hall lowell | \
> xargs -n1 -Ixx ssh xx /system resource print | \
> grep -E 'hdd|board-name'
free-hdd-space: 107.3MiB
total-hdd-space: 128.0MiB
board-name: RB2011L
free-hdd-space: 109.0MiB
total-hdd-space: 128.0MiB
board-name: CRS125-24G-1S-2HnD
free-hdd-space: 7.1MiB
total-hdd-space: 16.0MiB
board-name: hAP lite
%
The size of the stock "combo" release packages is now approaching half of the 16.0MiB disk size:
% echo 2 6 | xargs -n1 -I X lynx -head -dump \
> https://download.mikrotik.com/routeros/6.46.X/routeros-smips-6.46.X.npk |\
> grep Length
Content-Length: 7651050
Content-Length: 7700154
%
During an upgrade, both the old and new version have to sit side-by-side on the disk, the filesystem structure needs some space, the config needs some space, ... the official documentation is asking for 2 MB spare capacity. After this download though I was down to the last 44 KB(!) on the disk.
A few months ago, in the same situation, I found a surplus support-dump I could delete to gain enough breathing space. No such luck tonight.
With the current RouterOS "Stable" images, things have now simply become too tight for a stock "hAP lite" and similar devices, even without much extra config/data on the flash medium, to upgrade to newer stock RouterOS versions. To be frank, this is a major surprise if not a disgrace.
The official RouterOS documentation doesn't address this problem.
Grudgingly I dived into the "community support". I simply hate sifting through web fora, no matter which ones. It took hours.
Yes, I was not the only one with the problem. There were messages about the problem without any followup at all; there were quite a handful of wrong explanations; there was even a bit of ad-hominem and mud-slinging.
It did point me to the proper solution, though:
By default, MikroTik devices have their software installed from a "combined routeros package" which contains a selection of individual feature packages. It should not happen but the combined package can become too big for smaller platforms. You then have to switch over to dealing with the packages individually, selecting your own mix.
My first idea was to delete a few of the 6.46.2 packages which I currently don't use in order to create the space for the complete new kit.
Turns out that you cannot /system package uninstall anything when everything comes from the "combined routeros".
The only way forward is this:
Make an extra backup of your configuration beyond what the automatic reboot/reset backup is providing. The commands are simple and the demands on precious flash space are small:
[neitzel@lowell] > /system backup save
[neitzel@lowell] > /export compact file=cfg-mn
[neitzel@lowell] > /file print where type!=directory
# NAME TYPE SIZE CREATION-TIME
0 cfg-mn.rsc script 6.4KiB may/27/2020 03:18:50
1 auto-before-reset.b... backup 19.0KiB jan/02/1970 02:42:11
2 lowell-20200527-031... backup 30.1KiB may/27/2020 03:14:53
Download the "Extra packages" kit matching your hardware from https://www.mikrotik.com/download.
This kit does not contain just "extra" packages for the more obscure features as the title suggests to me. Instead, the filename is much more appropriate: all_packages-smips-6.46.6.zip. This zip contains the ten packages which comprise the "combined" = "Main" package (= routeros-smips-6.46.6.npk), and only three extra pkgs: multicast, openflow, tr069-client. (A full listing is below.)
Download this .zip archive elsewhere and extract the .npk packages.
Use scp, ftp or RouterOS' /tool fetch to copy a subset of the packages into the /file area for installation. Everybody needs the system package which weighs in with 5.5 MB alone. Another essential package for me is ipv6 (196 KB) to be able to access/manage the hAP-lite at all. dhcp might be that thing for you, and in that case you also need security (155 + 307 = 462 KB). And since security is also required for ssh access, I used that, too. These four pkgs already total at 6+ MB, enough to get nervous.
Reboot to install these packages.
You only get the few selected new packages. All packages/features from the old version get removed. The result is not a mix of old and updated packages.
Your new reduced feature set will load your old configuration as much as possible. Settings for now missing features will be lost. For example, without the wireless pkg, I lost my WLAN definition.
Luckily, you didn't skip the first step, saving your config, did you?
With the old version's packages gone, you now have plenty of disk space for the other new packages. Install as many as you want by copying them to the /file area and rebooting.
With all wanted new packages in place, you can now reload your configuration:
/system backup load name=lowell-20200527-0314.backup
or
/import cfg-mn.rsc
As of now I haven't figured out which is better in which case. I suppose that either would do for me.
I believe you can choose between these two strategies:
Exercise some restraint and aim at "below 7 MB for everything", so that future upgrades are completely painless. The standard /system package update process should download only those packages you have in use.
In my case, this would be: system, ipv6, wireless, security, dhcp. As of now, this already totals in 7149168 bytes aka 6.8MiB. Hrrmmm....
If you prefer an "all packages" setup, you will have to go through the "update to/with minimal package set / add extras later" procedure on every single update. The only relief is that you can get rid of ballast before doing the upgrade: /system package uninstall will now work. You can then do the (minimal) upgrade and re-add non-minimal packages afterwards. Again, this requires the download of the "Extras" .zip-file. And, of course, the backup of your configuration.
I am wondering how all this will pan out for me. I'll try to automate the "all packages" updates, i.e. the second approach.
For reference, here is my current lowell installation and sizes of the corresponding all_packages:
neitzel 373 > unzip -l all_packages-smips-6.46.6.zip
Archive: all_packages-smips-6.46.6.zip
Length Date Time Name
-------- ---- ---- ----
69713 05-14-20 12:14 advanced-tools-6.46.6-smips.npk
155729 05-14-20 12:14 dhcp-6.46.6-smips.npk
147537 05-14-20 12:14 hotspot-6.46.6-smips.npk
196689 05-14-20 12:14 ipv6-6.46.6-smips.npk
57425 05-14-20 12:14 mpls-6.46.6-smips.npk
36945 05-14-20 12:14 multicast-6.46.6-smips.npk
49233 05-14-20 12:14 openflow-6.46.6-smips.npk
258129 05-14-20 12:14 ppp-6.46.6-smips.npk
69713 05-14-20 12:14 routing-6.46.6-smips.npk
307281 05-14-20 12:14 security-6.46.6-smips.npk
5330220 05-14-20 12:14 system-6.46.6-smips.npk
114769 05-14-20 12:14 tr069-client-6.46.6-smips.npk
1159249 05-14-20 12:14 wireless-6.46.6-smips.npk
[neitzel@lowell] > /system package print
Flags: X - disabled
# NAME VERSION SCHEDULED
0 security 6.46.6
1 ipv6 6.46.6
2 dhcp 6.46.6
3 advanced-tools 6.46.6
4 system 6.46.6
5 wireless 6.46.6
6 hotspot 6.46.6
7 mpls 6.46.6
8 multicast 6.46.6
9 openflow 6.46.6
10 ppp 6.46.6
11 routing 6.46.6
12 tr069-client 6.46.6
[neitzel@lowell] > /system reso print
uptime: 8h49m20s
version: 6.46.6 (testing)
build-time: Apr/27/2020 10:32:16
factory-software: 6.28
free-memory: 7.7MiB
total-memory: 32.0MiB
cpu: MIPS 24Kc V7.4
cpu-count: 1
cpu-frequency: 650MHz
cpu-load: 0%
free-hdd-space: 7.0MiB
total-hdd-space: 16.0MiB
write-sect-since-reboot: 215
write-sect-total: 194269
bad-blocks: 0%
architecture-name: smips
board-name: hAP lite
platform: MikroTik
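With the sizes from the unzip listing above, checking a package selection against the flash budget is simple arithmetic. Here is a Python sketch; the dictionary just repeats the listed .npk sizes, the helper name `total_bytes` is made up, and the installed footprint on the device may well differ from these download sizes.

```python
SIZES = {  # bytes, from the all_packages-smips-6.46.6.zip listing
    "advanced-tools": 69713,
    "dhcp": 155729,
    "hotspot": 147537,
    "ipv6": 196689,
    "mpls": 57425,
    "multicast": 36945,
    "openflow": 49233,
    "ppp": 258129,
    "routing": 69713,
    "security": 307281,
    "system": 5330220,
    "tr069-client": 114769,
    "wireless": 1159249,
}


def total_bytes(names):
    """Sum of the .npk sizes for the given package names."""
    return sum(SIZES[name] for name in names)


minimal = ["system", "ipv6", "wireless", "security", "dhcp"]
for label, names in [("minimal", minimal), ("all", list(SIZES))]:
    size = total_bytes(names)
    print(f"{label:8} {size:>8} bytes = {size / 2**20:.1f} MiB")
```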
The 22,- EUR are dirt cheap but my time isn't. Automating the "all packages" updates will certainly be a worthwhile learning experience.
How about 44,- EUR for a non-lite hAP? Or a 50,- hAP ac lite? Flash is still sized at 16MiB but you can add a USB stick. Would that help? I couldn't find any statements on this in the manual or product brochures.
If not, then the entire hAP/cAP/wAP range of "16 MB Flash" MikroTik products does not really have a future in the "Stable" RouterOS track for consumers. MikroTik must resolve this issue somehow.
The RB951Ui-2HnD comes at 80,- EUR and with 128MiB NAND storage. This would definitely remove the upgrade pains albeit at a noticeable price increase.
fred.marshlabs.gaertner.de is now running DragonFlyBSD-5.8.1, a bugfix release from three days ago.
quick builds did the job. Total time for this source upgrade including everything from git pull to merging /etc files, rebooting, updating pkgs, and a final make initrd: some 45min.
fred has an Intel E6550 Core2 Duo CPU with exactly that: 2 cores, running at 2.33GHz, plus 4 GB RAM and 230 GB of spinning rust.
Good keyboards do not die. They just get a bit dirty.
Here's a quick run through the marshlabs, hunting for all those Cherry G80ies with their original DIN plugs still in active use. For the youngsters: we are talking about these:
G80-DIN PS/2-adaptered into the IGEL thin client / host susan.marshlabs.gaertner.de:
G80-DIN PS/2-adaptered into the NCD X terminal fz.marshlabs.gaertner.de:
G80-DIN straight into the i486-DX66 miles.marshlabs.gaertner.de:
Front side: 3 * Pentium1 (133 MHz), backside: 3 DIN kbd sockets:
Nerds move with boxes dedicated to gear. Let's home in on the bottom one:
(The small adaptor there connects a PS/2 keyboard to a DIN computer socket. It lets you take a modern kbd and adapt USB to PS/2 to DIN to miles.)
Update in the evening: a quick tally in our office rooms (tech and sales = 14 people) yields five DIN-plugged keyboards still in active use in the company.
The highlights there:
Some work in the aftermath of Sunday's DSL troubles:
To recap: my "CPE machinery" consists of
A Fritz!BOX WLAN 3170 in modem mode. (The routing/NAT and WLAN features are not active.) The FB provides 4 ethernet ports.
A MikroTik RouterBoard 2011 (billy.marshlabs.gaertner.de) is acting as PPPoE client and is the actual Internet gateway for the marshlabs.
The PPPoE packets between the RB-2011 and FB-3170 travel over a dedicated 7 meter ethernet cable.
The FB-3170 is still manageable via IP. I defined its IP address to be 192.168.77.1/24 instead of the default 192.168.178.1/24 (which my neighbor's WLAN already uses -- another FritzBox in da house, and within reach :-). For trouble-shooting on Sunday/Monday, I moved my laptop close to the FB and plugged an extra cable into one of the three remaining free LAN ports.
Today, I reconfigured the RB-2011 to have things a bit more convenient.
Before: the PPPoE was defined on top of the Ethernet port with the cross-link to the FB. This uplink port (number 10) was not part of any other bridge. (Bridges are the RouterBoard-RouterOS way to tie ports together.)
After: there is now a new small "cpe-bridge" defined on the RouterBoard, with ports 9 and 10 as members. So port 9 becomes another option to attach to the FB management LAN, right next to my desk. Packets travel over the existing 7m crosslink cable. The pppoe-client interface had to be moved a little bit: it now sits on top of the cpe-bridge, no longer on top of the single port.
With this setup, it was already possible to keep the laptop at the desk when connecting to the FB-3170.
Even more luxury was possible by making the RB-2011 an active player in and router for the mgmt LAN: just add an IP address to the bridge. I decided on the static 192.168.77.2/24 instead of some DHCP assignment from the FB.
I then tested two alternatives to connect to the FB-3170 (192.168.77.1/24) from my real LAN (217.13.64.128/26):
Let the RouterBoard do the work and hide my LAN via NAT behind its 192.168.77.2 bridge address:
[neitzel@billy] /ip firewall nat> print
Flags: X - disabled, I - invalid, D - dynamic
0 chain=srcnat action=masquerade dst-address=192.168.77.1
out-interface=cpe-bridge log=no log-prefix=""
Let FB3170 do the work: even in "modem-only" mode, it will take additional static routes to extend the "LAN" side.
Of course I settled on the latter. Let's avoid NAT wherever possible.
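For the record, the static route for the second alternative boils down to three values. Here is a minimal Python sketch that derives them from the addresses above using the standard ipaddress module. That the route is entered as network / netmask / gateway is my assumption for illustration, and the dictionary keys are made up.

```python
import ipaddress

lan = ipaddress.ip_network("217.13.64.128/26")   # the real LAN behind the RouterBoard
gateway = ipaddress.ip_address("192.168.77.2")   # RouterBoard address on the cpe-bridge
mgmt = ipaddress.ip_network("192.168.77.0/24")   # FB-3170 management LAN

# The next hop must be directly reachable from the FB, i.e. inside the mgmt LAN.
assert gateway in mgmt

# Hypothetical field names; only the three values matter.
route = {
    "network": str(lan.network_address),
    "netmask": str(lan.netmask),
    "gateway": str(gateway),
}
print(route)
# prints {'network': '217.13.64.128', 'netmask': '255.255.255.192', 'gateway': '192.168.77.2'}
```

With such a route in place, replies from the FB find their way back to the LAN without any address rewriting on the RouterBoard.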
My first actual "management action" today was to install new firmware to the FB-3170. I went from 49.04.24 to 49.04.58. The release notes promised "more stable DSL". Well, it turns out that the downstream now syncs at only 10.700 Kbps instead of 11.300 as before. (Nope, there is no Go Faster! option; I could just throttle things further down.)
And there is now an "energy monitor". Fancy.
On the lucky side: they didn't nuke the modem-only operational mode.
TODO: in four weeks, check if April and May differ noticeably in the RIPE ATLAS measurements. Will atlas.marshlabs.gaertner.de aka p2781.probes.atlas.ripe.net be more reachable than before?
alexis.marshlabs.gaertner.de is an OLinuXino ARM-Board. It still runs the original Debian wheezy distribution which, by now, is "a bit" long in the tooth.
There's a reason for that old system. In earlier years, the special MALI-400 GPU support tied to the 3.4 linux kernel prevented an upgrade.
In the recent past, though, the mainlining efforts have made huge steps forward.
Today, I downloaded the Olimex-provided images for their "Armbianish" Debian-buster and Ubuntu-bionic releases, and even more current versions are afoot. Another candidate for the upgrade is Arch Linux ARM.
I need to find out if it's possible to use a btrfs root filesystem. This would be just great for an extensive systemd-nspawn/machinectl setup.
I still have 90 GB unassigned on the 120 GB SSD and two unused partition entries in the MBR, set aside from the start for such an upgrade.
I don't dare to do the tests during the week: the board serves as XDMCP, font, and TFTP server for the X terminal in the kitchen, as well as for the telephone (tftp). It's also the DNS server for the marshlabs. So, I wouldn't mind a rainy weekend.
What a wasted day.
My DSL-CPE elected to reboot itself every three or four minutes. Ping packets run as usual initially, with 44ms RTT into the office, then times ramp into the hundreds, thousands, ... and between 10000ms and 20000ms the CPE reboots.
It's a lowly Fritz!Box 7130 operated just as an ADSL2+ modem, not as router. (FritzBoxen do a forced NAT. I run public IP space at home.)
Turning the machine off for a short (10 min) or long (5 hours) period didn't change anything. However, since exactly(!) midnight everything was OK again.
Let's see how this develops. I had just a single reboot today. A matching replacement power supply is at hand as well: the one of my small 5" VGA LCD monitor delivers 12V/1A, too. (Hey, 1.2A even.)
Thanks to my friendly upstairs neighbor who lets me use her WLAN & internet connection in times like these.
And yes: It's always a good idea to choose something other than the default vendor LAN addresses. Being connected both to my neighbor's WLAN 192.168.178.0/24 and on the LAN side to my CPE's 192.168.178.0/24 management LAN just doesn't cut it.
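The clash can be spelled out in a few lines of Python with the standard ipaddress module. This is just an illustrative sketch: the variable names are made up, and 192.168.77.0/24 stands in for any non-default choice.

```python
import ipaddress

neighbour_wlan = ipaddress.ip_network("192.168.178.0/24")  # vendor default on her box
cpe_default = ipaddress.ip_network("192.168.178.0/24")     # vendor default on my CPE
cpe_custom = ipaddress.ip_network("192.168.77.0/24")       # example of a non-default choice

# Same prefix on both sides: the laptop cannot tell the two nets apart.
print(neighbour_wlan.overlaps(cpe_default))  # True
# A different prefix keeps both nets distinguishable.
print(neighbour_wlan.overlaps(cpe_custom))   # False
```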
Makes me wonder: can we already use the IPv6 fe80:...%if notation for RFC-3927 169.254.0.0/16 IPv4 link-local nets?