Wednesday, December 31, 2014

Pre-decimal Machins

Just been mounting the set of 1967 Machins I got for Christmas... so nice to have the values that I've never seen before. The golden brown ½d, the beautiful powder-blue 8d (I'm guessing this replaced the red 8d after the red version of the 5d came out) and the green 9d.

Friday, December 19, 2014

Secondlife

Persistently orange-clouded? With the Developer menu enabled, and the Debug console selected, you may see useful messages that will explain that 'self is clouded' due to missing (unworn) items, eg SHAPE or EYES.

Sunday, December 14, 2014

Smallpox

Right, so although Janet Parker had been vaccinated against smallpox in 1966, she was infected in the Birmingham University outbreak of 1978 because the vaccination is only good for about 10 years.

By the way, kudos to the unnamed doctor called by Ms Parker's parents, who on August 24 1978 realised that she was suffering from smallpox even though it was a very unlikely diagnosis at that point in time, the last British outbreak having been in 1966. (I'd like to know a bit more about this previous outbreak - the writer of this article seems to know more than he's saying).

By the way, don't do an image search for 'smallpox lesions'. Though if you do, it will teach you why so much effort was put into eradicating the disease. (Though the US have nobly retained their stocks, for research and definitely not for use against 'enemies of freedom', that is right out, they deny that absolutely.)

Monday, December 8, 2014

Colon classification

Empty, emptying etc

Empty digit

An empty digit (Z, 0, 9) is effectively not there. The leading 9s in the octave device (7, 8, 91, 92 etc) are empty digits. It is as if 91 and 92 are new single digits that follow 8.

Emptying digit

An emptying digit (T, V, X) nullifies the semantic value of the preceding digit, but not its ordinal value. So in the sequence K, KX, L the X causes KX to sort after K, but not to be a subcategory of K. KX here is co-ordinate with K and L just as M is - it's as if KX is a new single digit that comes between K and L. NB that when these digits are initial digits, they don't have any emptying effect (because there is no preceding digit!)

Empty-emptying digit

An empty-emptying digit (U, W, Y) combines both roles. It enables infinite interpolation between any two digits, eg

L Medicine
LX Pharmacognosy
LY1 Nursing
LY7 Anaesthesiology
M Useful arts

Without empty-emptying digits, we would only be able to interpolate 3 times between L and M (LT, LV, LX).

If there was an LX1 in the above list, it would be a subcategory of LX. LY1 and LY7 however are not subcategories of LY, because a number LY cannot exist, just like a number 9 cannot exist with the octave device. Again, it's as if LY1 and LY7 are new single digits that come between LX and M.

(I think of empty-emptying digits as having 'zero width' or being 'control characters' - they aren't actually in the final output, they just indicate how the other characters are to be interpreted).

Partial comprehension

Better thought of as agglomeration - I think 'partial' here is meant to be read as 'of parts', ie 'comprehension of parts'.

Is done with Z as a contextual emptying digit (normally it's just an empty digit) and * as an anteriorising digit (to make Z sort before 'no digit').

16 Upper extremity
17*Z Head and neck
17 Neck
18 Head

Friday, November 14, 2014

Command line Rosetta stone

C

argc is the number of elements in argv[], which is 0-based.

Executable name is argv[0] and arguments are argv[1..]

Perl

$#ARGV is the index of the last element in @ARGV, which is 0-based.

Executable name is $0 and arguments are $ARGV[0..]

Autoit3

$CmdLine[0] holds the number of arguments, which is also the index of the last element in $CmdLine; the arguments themselves are 1-based.

Executable name is @AutoItExe and arguments are $CmdLine[1..]

Command line as entered is $CmdLineRaw.

C#

args.Length is the number of elements in args, which is 0-based.

The executable name isn't available from args: arguments are args[0..], so args[0] is the first argument. Environment.GetCommandLineArgs() is a different array, which according to the documentation has the executable name at [0] and the arguments from [1] onwards.

Command line as entered is Environment.CommandLine.

Friday, October 31, 2014

Unicode thinking aloud

I've got this set of text files that have a lot of C2 bytes in them, usually prefixing characters that I think of as 'extended ASCII', like 97 for an em dash. Suspecting this might be something to do with this new-fangled Unicode stuff, I had Firefox display one: it correctly rendered the 97s and hid the preceding C2s, and reported the encoding as UTF-8.

So I happily went off to look at a UTF-8 table, only to find that C297 is a control character, and not an em dash.

But this page has the correct translation.

So thinking about it really hard - till it hurts:

In the old extended ASCII codepage 1252 character set, 97 hex is an em dash. (Codepage 1252 is not quite the same thing as ISO Latin-1: in ISO Latin-1, 80-9F are control characters.)
UTF-8 is one of the modern systems for encoding modern Unicode characters, whose code points run from 0 to 10FFFF hex (21 bits). There are several such systems:
- UTF-32 which has a nice sensible invariant 4-byte-wide character
- UTF-16 and UTF-8 which use variable-length characters, each aiming to give a certain specific subset of characters the shortest code
- UTF-8 favours original ASCII characters 00-7F - they all get 1-byte characters, but higher codes take up to 4 bytes to express (the original design allowed up to 6)
- But we digress.
When 97 is encoded in UTF-8, it translates to C2 97

Why does it translate thusly?

I've adapted this from http://sydney.edu.au/engineering/it/~graphapp/package/src/utility/utf8.c:

Max Sig Bits  Pattern
------------  -------
           7  0xxxxxxx
          11  110xxxxx 10xxxxxx
          16  1110xxxx 10xxxxxx 10xxxxxx
          21  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
          26  111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
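          31  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

(The 5- and 6-byte patterns come from the original UTF-8 design. Since Unicode stops at 10FFFF, current UTF-8 never uses more than 4 bytes.)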

So for example, 00-7F have 7 significant bits - xxxxxxx, or to put it more helpfully, abcdefg - and these are rendered 0abcdefg. So 00-7F translate to single bytes 00 to 7F.

But our character, 97 (1001 0111) has 8 significant bits, which means it falls in the 11 bit category, and is rendered 110xxxxx 10xxxxxx, or again to put it more helpfully, 110abcde 10fghijk. Thusly:

             9    7
          1001 0111
      000 1001 0111
      abc defg hijk
      abcde   fghijk
      00010   010111
   110abcde 10fghijk
   11000010 10010111
  11000010  10010111
 1100 0010 1001 0111
    C    2    9    7

Ta da!
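
The bit-shuffling above can be checked with a few lines of Python. This is only a sketch: the function name utf8_encode is made up for illustration, it covers just the 1- to 4-byte patterns that current UTF-8 uses, and it does no validation beyond a range check.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by hand, following the pattern table."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if code_point < 0x80:
        # up to 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Hand-rolled result next to Python's own encoder
print(utf8_encode(0x97).hex(), chr(0x97).encode("utf-8").hex())
```

The two hex strings printed should agree, which is a quick way of confirming that the hand-worked split into 5 bits plus 6 bits is what a real encoder does.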

The only remaining mystery is why the first chart I consulted listed C2 97 as the translation for a control character. In fact the chart is right: C2 97 is the UTF-8 encoding of U+0097, and in Unicode (as in ISO Latin-1) U+0097 is the control character 'END OF GUARDED AREA'. A real Unicode em dash is U+2014, which UTF-8 encodes as E2 80 94.

The explanation is that UTF-8 is just a coding scheme - a dumb algorithm for converting code point numbers to byte groups. UTF-8 doesn't know or care what 97 meant in codepage 1252, it just knows to translate it to C2 97.

In fact yes, that's it.

When the text files in question were originally created, an em dash was turned into a codepage 1252 extended ASCII character 97.
The files were UTF-8 encoded, presumably without first translating codepage 1252 to Unicode, and this turned 97 into C2 97 rather than E2 80 94
My text editor rendered that as two characters - C2 (Â) and 97 (—) - because it only understands 8-bit extended ASCII
But Firefox took C2 97 as a 2 byte encoding of 97, and duly rendered a single character 97, evidently displaying it the codepage 1252 way (—)
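
The sequence of events above can be replayed in Python. This is a sketch under an assumption: that whatever converted the files to UTF-8 treated byte 97 as code point U+0097 (which is what a Latin-1 decode does) instead of mapping it through codepage 1252 first. I don't know what tool actually did the conversion.

```python
# One em dash as stored in a codepage 1252 file
original = b"\x97"

# Proper conversion: cp1252 -> Unicode -> UTF-8
proper = original.decode("cp1252").encode("utf-8")

# Assumed naive conversion: byte value taken as the code point itself
naive = original.decode("latin-1").encode("utf-8")

# What an editor that only knows cp1252 makes of the naive bytes
editor_view = naive.decode("cp1252")

print("proper:", proper.hex())   # UTF-8 bytes of U+2014
print("naive: ", naive.hex())    # UTF-8 bytes of U+0097
print("editor sees", len(editor_view), "characters")
```

The editor view comes out as two characters, capital A circumflex followed by an em dash, which matches what my text editor showed.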

Monday, August 18, 2014

Sidereal time offset

The offset of sidereal time from UK civil time increases by 2 hours every month, but is disturbed by summer time. Assuming that the clocks go forward on Mar 28 and back on Oct 28, and that the offset is closest to an integer number of hours on the 6th and 22nd of each month, then the crucial points to remember are

March +11 and +12 (On Mar 22 Sun has RA 0h and is 0 hours from culmination at 1200)
April +12 and +13 (April fools you by repeating 12)
October +0 and +1 (October)
November +3 and +4 (November skips over 2)

People are asking me what I mean by 'sidereal time offset'. The idea is to have a quick way of working out the approximate current sidereal time in my head. The sidereal time offset is the number of hours you add to the civil time to get the sidereal time.
For example, at the spring equinox sidereal time is 12 hours ahead of UK civil time (GMT in this case) so the sidereal time offset is +12 - if it's 11am, then the sidereal time is 2300.
Similarly, at the autumn equinox sidereal time is 1 hour behind UK civil time (BST) - the Sun has RA 12h and is 0 hours from culmination at 1pm BST (1300), so the offset is -1 (or +23 if you like). Remember, when BST is in force then the sun isn't due south at noon, because noon BST is actually only 11am GMT.
(Yes, I know about the equation of time but its maximum effect is a variation of about 16 minutes, which we can easily afford to neglect when making an approximation).

All right, you've talked me into it, you can have a table. The pattern is disrupted by UK daylight saving time in April and November. US DST has started a few weeks earlier in the spring than the UK's (on the second Sunday in March) since 2007.

Month	Offset 6th (h)	Offset 22nd (h)
January	+7	+8
February	+9	+10
March	+11	+12
April	+12	+13
May	+14	+15
June	+16	+17
July	+18	+19
August	+20	+21
September	+22	+23
October	+0	+1
November	+3	+4
December	+5	+6

Or to put it another way:

Sidereal time is civil time...		Sidereal time is civil time...
+0	Oct 6	+0
+1	Oct 22	-23
+2	-	-22
+3	Nov 6	-21
+4	Nov 22	-20
+5	Dec 6	-19
+6	Dec 22	-18
+7	Jan 6	-17
+8	Jan 22	-16
+9	Feb 6	-15
+10	Feb 22	-14
+11	Mar 6	-13
+12	Mar 22, Apr 6	-12
+13	Apr 22	-11
+14	May 6	-10
+15	May 22	-9
+16	Jun 6	-8
+17	Jun 22	-7
+18	Jul 6	-6
+19	Jul 22	-5
+20	Aug 6	-4
+21	Aug 22	-3
+22	Sep 6	-2
+23	Sep 22	-1
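
For anyone who would rather compute than memorise, the rule of thumb can be sketched in Python. The function name is made up, and the assumptions are the same rough ones as in the text: the offset is +12 on Mar 22, it grows by 24 hours per year (about 2 hours a month), and BST runs from Mar 28 to Oct 28.

```python
from datetime import date


def sidereal_offset_hours(d: date) -> float:
    """Approximate hours to add to UK civil time to get sidereal time."""
    # Offset is +12 at the spring equinox (GMT in force) ...
    days_from_equinox = (d - date(d.year, 3, 22)).days
    # ... and gains 24 hours over a year
    offset = 12 + days_from_equinox * 24 / 365.25
    # Simplified BST rule from the text: clocks forward Mar 28, back Oct 28
    if date(d.year, 3, 28) <= d < date(d.year, 10, 28):
        offset -= 1
    return offset % 24


for month in range(1, 13):
    sixth = sidereal_offset_hours(date(2014, month, 6))
    twenty_second = sidereal_offset_hours(date(2014, month, 22))
    print(f"{month:2d}  {sixth:5.2f}  {twenty_second:5.2f}")
```

Rounded to the nearest hour, the printed values should line up with the table of offsets for the 6th and 22nd of each month.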

Thursday, July 17, 2014

MERGE

MERGE (compared to how I expected it to work)

merge targett [ alias ]
using /* targett join */ sourcet st
on /* tt.*/ field1=st.field1...
when matched then
update /* targett */ set field2=st.field2...
when not matched then
insert /* into targett */ (field2...)
values ( st.field2... )
;

Essentially you do not (as you would with update...from or delete...from) specify the target table where it could be assumed. A join between target and source is both assumed and required.

NB that an alias for targett is allowed, and sometimes required for disambiguation.

You can't put WHERE clauses in the join part, but you can put eg "AND age > 70" after the WHEN (NOT) MATCHED bits.

MERGE (with expectations omitted for clarity)

merge targett [ alias ]
using sourcet st
on field1=st.field1...
when matched then
update set field2=st.field2...
when not matched then
insert (field2...)
values ( st.field2... )
;

NB that unlike traditional SQL statements, MERGE has to be followed by a semicolon. If using DbVisualizer, make sure that neither of the Statement Delimiter options is set to ';', as DbV will interpret it as a 'go' if so.

If you need to tighten the join between sourcet and targett, eg to include only records matching a profile, do it with an 'AND' clause after each 'MATCHED'. It's no good putting such conditions in the join itself.

The using clause doesn't just have to be a table name, it can be a subquery:
using (select * from ... ) st

Thursday, May 1, 2014

Perl returned list

Ooh that's interesting. If a sub returns a list

sub a {
return ("name", 99);
}

but it's assigned to a scalar

$f = a();

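the value assigned is the last item in the list (which you'd expect if you thought about list items being pushed onto a stack in order) and not the first item, and not the scalar value of the list either (the number of items). The real reason is that there is no list at all in scalar context: the commas in the return expression are evaluated as the comma operator, which yields its right-hand operand. Returning an array (return @items;) would give the number of items instead.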

Friday, April 25, 2014

unpack

You skip characters in the input with x, eg

unpack("A1x2A8", "1..Hartnell")

This is badly explained in the documentation, which is all written in terms of pack and so explains 'x' as meaning 'insert a null byte'.

Case is significant in pack codes, and particularly in this one: 'X' means 'back up a byte in the input'.

Thursday, April 10, 2014

Perl chain

I knew it was possible to 'chain' one Perl script from another with do, but I could never see how to give the second script arguments.

But of course it turns out to be easy: you just explicitly set @ARGV before you 'do'. The second script is eval'd in the same global scope that the 'do' is in, so it gets the @ARGV that you set.

Btw in this situation, $0 does not change inside the 'do' - it's still the name of the first script.

Should point out that 'chain' isn't quite the right word - control of course returns to the first script after the 'do' is complete. 'Chain' implies that script 1 finishes when script 2 begins.

Friday, February 28, 2014

sqlite cross-table update

sqlite still won't do cross-table updates (stuff like update b from table1 b join table2 a on....), but there is that lame replacement:

  update b set f1 = (select f2 from a where a.f3=b.f3)

which of course has the side-effect of setting b.f1 to null where there isn't a corresponding record in a.

However, despite the magic that controls which rows actually get updated, the (select ...) bit is still just an expression like in standard SQL. Which means it can be an argument to coalesce():

  update b set f1 = coalesce((select f2 from a where a.f3=b.f3), b.f1)

While this isn't ideal it does solve the null problem.
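
Here is a small sketch of the coalesce trick using Python's built-in sqlite3 module and an in-memory database. The table and column names follow the snippets above; the sample rows are invented for illustration. Note that the column being set is written without a table prefix, since sqlite doesn't accept a qualified name on the left of the '=' in SET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (f3 integer, f2 text);
    create table b (f3 integer, f1 text);
    insert into a values (1, 'new');
    insert into b values (1, 'old');
    insert into b values (2, 'keep');  -- no matching row in a
""")

# Without coalesce, the row with f3 = 2 would have f1 set to null
conn.execute("""
    update b
    set f1 = coalesce((select f2 from a where a.f3 = b.f3), f1)
""")

rows = conn.execute("select f3, f1 from b order by f3").fetchall()
print(rows)
```

The row with a match picks up the value from a, and the row without one keeps what it had.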

sqlite performance

Definitely do follow the advice on stackoverflow about wrapping batch queries (eg a single query with 1000 insert statements) in their own transaction with

begin transaction
..
end transaction

I got a speed increase of two orders of magnitude just from trying that; it was the only optimisation I needed.
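
A minimal sketch of the same idea from Python's sqlite3 module. Passing isolation_level=None stops the module managing transactions itself, so the explicit begin/end below are the only ones in play. The table name and the 1000 rows are invented, and an in-memory database is used to keep the example self-contained, so don't expect to see the speed-up here: that shows up with a database file on disk, where each statement outside a transaction has to be committed separately.

```python
import sqlite3

# isolation_level=None: no implicit transactions from the Python module
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table t (n integer)")

conn.execute("begin transaction")
for n in range(1000):
    conn.execute("insert into t values (?)", (n,))
conn.execute("end transaction")  # sqlite treats this the same as commit

count = conn.execute("select count(*) from t").fetchone()[0]
print(count)
```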

Saturday, February 15, 2014

HTTP::Daemon

Firefox and Iron are more forgiving with undelimited responses from HTTP::Daemon than Opera or other http clients (eg curl, or Uniface's UHTTP component*).

I find that the latter need to have their responses terminated with force_last_request, or accompanied by a correct Content-Length header ('correct' appears to be length($whatever_message_string_you_sent) ).

*Don't use this, it leaks memory like a sieve, spawn curl instead. And consider not using Uniface either

Friday, February 14, 2014

gcc link option

Option order is significant with gcc - if you want to link, eg, to the psapi library, you must not only use -l psapi but it must come after your source code in the command line:

gcc f:\src\c\ex-proclist.c -l psapi

(I picked that up from stackoverflow but it wouldn't let me vote it up)

Wednesday, February 12, 2014

Perl shift

shift doesn't have a list context - it always removes and returns a single element, so it can't fill a hash from the argument list -

%settings = shift();  # no

%settings = @_;       # yes

Friday, January 31, 2014

Perl use constant

The new constants declared by 'use constant' are very useful, but if like me you test your new packages by putting a stub at the top of the .pm file, you'll run into this gotcha:

$m = new Package;
$m->doit(Package::RED);

{
package Package;

use constant RED => 0xFF0000;

...
}

The call to doit will supply the string "Package::RED" not 0xFF0000 - because constant declaration and substitution both happen at compile time, and the doit call is compiled before the use constant because the former is reached first.

The solution is to put the stub after the package.

Thursday, January 30, 2014

Perl => quoting

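=> quotes the thing to the left of it, but only if that thing looks like an identifier. A number is still parsed as a numeric literal, so leading zeroes are not preserved: %h = ( 00 => "null" ) will result in '0' being the hash key, not '00'. (And since a leading zero means octal, 010 => "..." gives the key '8'.)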

Quick