• Random line from a file

    From Amcleod@VERT to All on Tue Aug 21 01:39:00 2001
    The question arises periodically -- how to select a line randomly from a text file? If the file were filled with fixed-length records you could compute the offset of a random line and FSEEK/FREAD the apropriate line. Alas, text files are annoyingly variable in line-length. Or you could read all the lines into an array (simulated on disk if needed) and select randomly from the array. Ugh! So here is my method-of-choice, which reads through the entire file from top to bottom, and randomly picks one line out of the file:

    #----------------------------------------------------#
    # randomly select one random-length line from a file #
    #----------------------------------------------------#

    !INCLUDE FILE_IO.INC

    STR randomfile inbuffer winner
    INT handle line_count rval cval

    #--------------------#
    # set up some values #
    #--------------------#
    SET randomfile "c:\\sbbs\\exec\\oneliner.txt"
    SET line_count 0
    SET winner "Default Value -- in case routine fails"

    #---------------#
    # open the file #
    #---------------#
    FOPEN handle O_RDONLY randomfile
    IF_FALSE
    PRINTF "Error opening %s\n" randomfile
    RETURN
    END_IF

    #----------------------------------------#
    # loop through entire file, line by line #
    #----------------------------------------#
    :Loop
    # check for End-of-File
    FEOF handle
    IF_TRUE
    # we're done!
    GOTO Were_Done
    END_IF

    # read line -- watch for read failure
    FREAD_LINE handle inbuffer
    IF_FALSE
    # we're done!
    GOTO Were_Done
    END_IF
    ADD line_count 1

    # compute random value and comparison value
    RANDOM rval 1000000
    SET cval 1000000
    DIV cval line_count

    # is this line a winner, replacing previous winner?
    COMPARE rval cval
    IF_LESS
    # we have a winner
    COPY winner inbuffer
    END_IF

    # check remaining lines
    GOTO Loop

    #------------------------------------------------------------------------#
    # we should have a winner by this point, and can do with it what we want #
    #------------------------------------------------------------------------#
    :Were_Done
    PRINT winner

    It's quite clever how it works (no, I didn't think of it first!). It reads each line and determines whether that line would have been selected _assuming_ there are no more lines in the file. If it _does_ find another line, it determines whether it would have been chosen over what ever line had _already_ been chosen. Like this:

    It reads the first line. Possibly the ONLY line. So this line must be selected 100% of the time. Hence Line #1 is selected randomly 1/1 of the time. Then it reads the second (and possibly last) line. If there are only two lines in the file, then there is a 1/2 chance that line #2 will be selected. Should a random comparison show that the 1/2 chance is met, line #2 replaces what was previoulsy selected (IOW line #1). Now it reads line #3. This line is 1/3 likely to be selected. If it is, it replaces the previously selected line (either #1 or #2). Line #4 replaces the previously selected line 1/4 of the time. Line #5 replaces the previous winner 1/5 of the time... and so on.

    The only thing tricky is that since we don't have FP values, we have to generate integer random numbers and scale the count accordingly. (You can't have values of 1/4 = 0.25 to compare with randiom numbers between 0 and 1, so you have to use a scaling factor -- 1,000,000 in this case -- and compare 1000000/4 with a random number between 0 and 1000000.)

    Caution -- this routine will select blank lines as well as any other, so watch out for blank lines PARTICULARLY at the end of the file. Or check for blank lines immediately after reading and "GOTO Loop" immediately, without adjusting the counter, if you find one. Skip Comments the same way.

    You could also select "cookies" which consist of more than one line using a method similar to this. You'd have to separate groups of lines somehow (say a blank line), accumulate all non-blank lines as a possible winner, and when you discover a blank, THEN you incriment your counter, make your random check, and if successful replace the previous winning lines with the new set of lines you have accumulated.

    I've run this algorythm in a loop 250,000 times with a file containing 25 different lines, and each line is selected 10,000 times +/- 1% which is good enough for government work.

    ---
    þ Synchronet þ Vertrauen þ Home of Synchronet þ [vert/cvs/bbs].synchro.net