|
Assembly
Language
Extensions of Visual
Prolog for Data Mining
of Huge
Data A lecture and presentation at the
INTRODUCTION In this paper we present
an
overview of very specific methodologies that were used in real-world
problems,
producing quite adequate -or even impressive-
results by the combined use of Visual
Prolog and Assembly Language.
Visual Prolog has been
designed with serious optimizations in mind: Whenever possible, it
tries to
avoid unnecessary computations (that other Prolog compilers perform all
the
time) if these computations are not required for the operation of each
specific
(e.g. deterministic) predicate. All linked “foreign language code”
works
seamlessly inside native Prolog code. As a result, the linking of
Assembly
language subroutines “as if they are Prolog predicates” is efficient
from the
outset, and these foreign predicates run as fast as they would in any
other
(“mainstream”) developer environment.
At the time of writing
(March 2006) there are more than 1100 Visual Prolog predicates that
have been
implemented in Assembler, as well as more than 150 predicates
implemented as
‘C’ glue-functions. In addition, about 20 new predicates are added to
these
libraries every month (depending on the practical needs of programming
projects).
There are families of such predicates dealing with lists, strings and
characters, “binaries”, CSV Excel-files (texts delimited by
separators),
time-series, dates, tree-like data structures and other objects. All
these
predicates originated from real needs in
real software applications. The efficiency and speed of these
applications
destroys the myth of Prolog being “just an academic language”, while
outperforming commercial applications of similar nature developed using
mainstream languages by a significant factor, typically of an order of
magnitude - and quite often more. WORK ORGANIZATION This work has been
organized on the basis of the fact that Assembly Language code is not easy to modify or debug unless it is properly
commented, and it is not
useable at all unless the comments include instructions
for calling each routine externally as a
Prolog predicate.
Such conversions are typically necessary for 95% of current Prolog projects. The growing need for such predicates led to about a dozen of them implemented in Assembly. Their full global definitions are listed in Appendix A: This library includes customized predicates implementing many such conversions, from strings to lists -and vice versa- in a variety of possible ways. When
dealing with large
texts, our most basic need is to
copy them inside main memory for further processing. Unfortunately, it
has been
verified that -after certain specific list or text size limits- almost
any Prolog predicate that uses recursion
is useless for simple tasks, like
converting texts to lists! This situation can (of course) be improved: If we don’t want to use Assembly language, it is certainly possible to create optimized Visual Prolog predicates that minimize the use of the stack, so that texts of somewhat larger sizes (than before) can be converted, at somewhat improved speeds. Visual Prolog includes implementations of (non-ISO) predicates for string-handling that can access the memory of strings (or lists) directly. Examples of code using these built-in predicates are already given inside Visual Prolog’s documentation for advanced users, so they won’t be repeated here. The important thing, however, is that even such optimized coding, impossible to do in most other Prolog compilers (which lack those extra predicates) and possible only in Visual Prolog, inevitably and very quickly hits against some new limits of speed and memory, that can be shown to be (this time) unsurpassable. One of our most highly optimized conversion predicates is ‘strx2list’: It can convert strings (typically large texts, which can be of several megabytes size) to Visual Prolog string-lists. Here is a table of measured execution times, for the optimized predicate “strx2list”, in all its possible variants and memory methods.
The
timings (in seconds) of columns A, B, C refer to literally millions
or tens of millions of
elements! Alas, it
was discovered that (after a few
hundred
thousand elements) most high-level
language code does
not work (Prolog, as well as any
language).
Optimised Prolog code executes at about one order of
magnitude (i.e. 10 times) slower than optimised
Assembly for the
restricted small range of text and list sizes where it does
work. Now, although these results are correct (even at higher data-sizes than those shown in the table) at a turning point of about 250 Mb of text (and 25 million elements) the execution time of the ASM-predicates increases steeply and unpredictably, due to RAM limitations of the specific computer used (P4, 512 Mb). Experienced users of Visual Prolog may have noticed that columns A, B, C (of the previous timing-table) carry annotations of the memory allocation method used in each case. And the three possible ways of allocating memory are A) Visual Prolog’s stack, B) Visual Prolog’s heap, and C) no new allocation (i.e. memory must be already allocated, before the call). All these three memory allocation methods are possible, using the predicate “strx2list”: It was implemented in two variants, one with two and one with three input-arguments. PREDICATES strx2list :
(string8 STR, char sepc) ->
slist procedure(i,i)
language c strx2list : (string8 STR, char
sepc,
binary Buffer) è
slist
procedure(i,i,i) language
c
If
you are somewhat familiar with Assembly language, you might see what is
going
on in detail, by looking into Appendix B ,
the sourcecode of “strx2list”: The assembly
language main
loop checks out every byte of the input string for presence of a
“separator-character” (arg. 2) while also planting
into this position a zero-byte
(signifier of the “end of a string” in ‘C’ and Visual Prolog) and then
inserts
as a list-element only a pointer to
the start of each chopped part of the
string (rather than a new string). Consequently, we could say that the
input-text is “recycled”, to play the
part of the output-strings (in the list) as
well.
As
regards the internal structure of a “list” in Visual Prolog, suffice it
to say
that it consists of 12-byte structures, where the first 4 bytes are
indicators
of “element-type” (“normal” or end elements), the next 4 bytes are
(pointers
to) each element itself, and the last 4 bytes are pointers to the “next
list-element”. So, instead of
allocating small pieces of memory N times (for N list-members), memory
can be
allocated only once, for an entire
list, if the number of list-elements is known:
For this reason, the first predicate strx2list/2
has two internal loops: (1) One for
finding the number of lines in the text (which is also the number of
list-elements) and (2) a second loop that uses
this number to create each list-element.
Before the second loop begins, a single
call of an external memory allocation predicate is enough to
reserve memory
for the entire list (either in the
stack, or in the heap). The
second variant of strx2list/3 was created for those rare cases when we
need to
call this predicate several times, inside another loop (external to
it). In
this case, it is not really worth
allocating memory during every call.
Instead, memory is allocated prior to
the outer loop only once, to create a
binary buffer which is used by strx2list/3 as a third (bound) argument,
a
Visual Prolog “binary”, pre-allocated
with 12*(N+1) bytes, where N is the pre-calculated number
of
list-elements. So, the second variant, strx2list/3,
uses no memory allocation at all.
Instead of this, it re-cycles all the
memory of its own inputs. The reason we use a “binary”
is that Visual Prolog binaries are equipped with an indicator
of size embedded in a memory location before their
pointer. This
size-indicator can be used later to release
the heap-memory -if in the heap- precisely as required, without
side-effects that can arise from intermediate processing, e.g.
zero-bytes
inside a string making it appear shorter -confusingly for de-allocation-
etc. Of course, there are
other
similar conversion predicates in this family that don’t
behave as stingily
towards memory, as does “strx2list”:
Predicates “str2list” (with no “x”
before the 2) and “conclist/2” are
routinely used in most ordinary text-to-list
/ list-to-text conversions.
As
a final note -about the predicate strx2list-
that may be of general value, if you examine the source code of
strx2list, in Appendix B,
you may be surprised to realize that
it doesn’t call literally any
external memory allocation predicates. Instead of
this, it uses function-pointers to an external
allocation predicate, included in the library’s Global
data segment as “_allocfunc”. This was
done for flexibility; so that the actual
allocation predicate may be
changed, at any time. E.g. the test-program, which produced the results
for the
previous timing-table, used different
assignments for “_allocfunc” for each column of the results: Column A was
generated while using (as the actual
contents of the location
“_allocfunc”) Visual Prolog’s stack-allocation predicate
“_MEM_AllocGStack”,
and column B was created by using
another -similar- predicate, which allocates memory in the heap,
instead. The
results (shown in the timing table) should not
surprise experienced Visual Prolog users: Heap
memory is much
more suitable -than
the stack- for
storing huge data! In conclusion, programming techniques such as these may seem “dirty” to “Prolog Purists” (since they are “procedural as hell”) but many years of Visual Prolog programming dictate the necessity of using such “dirty methods” when we are in desperate need to process huge amounts of data in RAM, rather than in hard disks. Of
course, today’s hard disk databases (with SQL, etc) are more efficient
than
ever before. However, there are important
advantages in keeping as much as possible of our data in RAM, when
we want
to do (e.g.) exhaustive fuzzy
string-matching, cross-examining each list-element versus all other
list-elements, etc. NO HARD-CODED
EXTERNAL
CALLS Avoiding hard-coded external function-calls was found to be good programming practice. Today all Assembly language predicates in these libraries use pointers to external calls instead of literal external predicates. So, if (e.g.) a newer version of Visual Prolog (or a new ‘C’ compiler, used for mixed language programming) is installed, it is still possible to use the same libraries, with minor changes in very few pointer-definition files where the names of some external predicates are stored (and are compiled into pointers immediately). Alternatively, it is also possible to use initial calls of certain “configuration predicates”, to fill the locations of external call pointers with addresses of external predicates at run-time. So, instead of _IO_WrStr (a built-in Visual Prolog/‘C’ predicate that writes strings to the screen or to current output, defined in the file “pdcrunt.h”) many ASM predicates use a call to the function pointer “_wr_str”, located in the Data Segment. This makes it possible to do much more than just writing on the screen: E.g. if the contents of location _wr_str are changed at run-time to contain the address of hp_wrstr, a new ASM predicate that writes strings to “virtual devices” inside the heap, it becomes possible to accumulate text that may be the result of some processing, writing it in small lumps inside the heap, and in the end to dump it -all at once- to a file, or to the screen, after certain conditions are met. Nevertheless,
the default-values of all these memory
locations are pre-filled with Visual
Prolog predicates, and are rarely
expected to change: Location “_wr_ch” contains Visual Prolog’s
“_IO_WrCh”,
location “_wr_int” contains Visual Prolog’s “_IO_WrIntg”, and so on. If
any of
these need changing, there are appropriate ASM-predicates to do the
job. For
example, in order to change at run-time the
current string-writing predicate, there is a configuration
predicate, “set_wr_str/1”
(in Visual Prolog 5.* syntax): GLOBAL
DOMAINS STR_i:
determ (STRING) –(i)
language c GLOBAL
PREDICATES set_wr_str( STR_i ) –(i)
language c Calling “set_wr_str(foo)” where “foo” is any predicate of domain “STR_i”, results in “foo” becoming the current string-writing predicate globally called by every ASM predicate (that does any calls to any external string-writing). Now, “foo” is not “obliged” to do any literal string-writing; Instead of this, it could very happily be asked to do other things, such as calling “assert” (so that -instead of writing- it asserts facts in a Prolog database, and so on). Another common example (part of which was discussed earlier, with “strx2list”) is this: To change at run-time the current global memory allocation predicate used (by “strx2list”, as well as others), there is a global configuration predicate, “set_allocfunc”: GLOBAL
DOMAINS allocbin_func = determ
BINARY (UNSIGNED size) -(i)
language c GLOBAL
PREDICATES set_allocfunc(allocbin_func) -(i)
language c The
possibility of changing the currently
used memory-allocation predicate
(globally) by a program on the fly is
also useful when calling these predicates inside a “main program” that
is in a
language other than Visual Prolog. SPECIALISED
WRITING
PREDICATES The ASM-predicate write_slist, which writes a list of strings with separators, belongs to a large family of similar writing predicates that can have their output redirected anywhere by changing the function pointers called by them for writing. The default-values of the pointers (as explained before) are set to Visual Prolog writing predicates, but can be changed to any other predicates at any time. The family where “write_slist” belongs also includes predicates that perform sophisticated writing scripts: Some of them can write the contents of Visual Prolog binaries in ways understandable by humans; others may write integer-lists, strings and binaries in ways that are needed by specific software applications, e.g. to create CSV Excel-files or HTML files. All
these can be redirected for writing
to the heap (in “virtual strings”), or for doing other (often
unexpected) tasks irrelevant to writing: Any predicate of similar type to
“_IO_WrStr” may be used as a
“virtual writing predicate” by inserting
its
address in location “_wr_str”. In Appendix C
there is a complete listing of
all
these “special writing predicates”, with some brief
comments about their operation. Appendix C also
includes another big
family
of ASM predicates, which are
often useful for experimentations and software development, but are not really so important for most final
applications: Random generation predicates. CSV-FILE HANDLING
PREDICATES In Appendix D
there are
Predicate
definitions and descriptions of an Assembly language library to process
CSV-
(Excel) files, in many possible ways. Some
of these ASM predicates analyse CSV files to extract parameters from
them (e.g.
their number of data-fields); others write them out in various ways;
others
select, delete or insert columns, rows, or cells, etc. However,
due to space and time limitations this
CSV-file handling library
is not
included in the main part of this
presentation. USING THE HEAP
FOR
“STREAMING” A common need is to suspend actual writing (to the screen, or to a file), writing everything to a virtual device or “stream”, which can be flushed. So, another family of predicates was created for this; writing strings, numbers and characters to “virtual strings” inside the heap. First,
the initialisation-predicate “heap_init”
should be called, with a single argument: the number of virtual heap
strings required. This
call allocates heap-memory for the array
of pointers (and all the “position indicators”) needed by the
“virtual
strings”, but not for the strings themselves.
To allocate heap memory for
each individual virtual string,
predicate heap_create/1 must be
called, once for each “virtual
string”: It has only one argument, the required
size of every string; It returns (in register EAX) the “integer-handle”
for each “virtual string”. This “handle” can then
be re-used for “virtual writing”, i.e. adding content to the specific
“virtual
string” in ways that are similar to writing on the screen, or to a
file. If we don’t need many such “virtual strings”, or if we want to use one of them most of the time, a “current default string” can be defined by predicate “hpstream/1”, with a single argument: -This virtual string’s “handle”. Some Assembly language predicates in this “heap-string library” emulate precisely Visual Prolog’s _IO_WrStr, _IO_WrCh, IO_WrIntg, etc, but (this time) writing to “virtual strings”, rather than to the screen or to a file. Their “target” is the “current virtual string” (which can be changed by “hpstream/1”). These are predicates that can be inserted as “clones” of standard writing predicates, in the locations invoked by “write_slist” (and many others). See Appendix
E
for a listing of all the global predicate
definitions in this specialised heap-writing and “streaming”
library. ENHANCED SEARCH PREDICATES
An
entirely new family of
predicates for searching (inside strings, as well as inside binaries)
was
implemented in Pure Assembly language. This family of predicates
contains many
enhancements, including fuzzy
search, non-deterministic search, etc), in order to:
1)
Start from a
pre-specified position (rather
than
always at the
beginning) 2)
Return the “rest” (after
the search, as a pointer; not through
memory allocation) 3)
Search backwards (used in parsers) 4)
Search
non-deterministically (a Prolog–specific
feature, useful for Data Mining) 5)
Search
case-insensitively, and with Generalized
Case-insensitivity based
on Character
Table techniques 6)
Do fuzzy searches and
inexact pattern- matching Searches (in many ways), 7)
Search optimally, using the Boyer-Moore
algorithm (and
other types of optimized
search) 8)
Do everything for both
Strings
and Binaries (providing all possible variants of all
these predicates, together with all their features; as a result, e.g. there are 15 different variants of “nd_search”
alone, which does non-deterministic search) 9)
Handle
all possible types of objects (I.e. be
able to
find strings or
binaries, or characters inside strings or binaries, in all possible
type-combinations). In Appendix F there is a complete
listing of
all the Global Predicate
definitions for this large family of universal
search-predicates. Of
particular interest are non-deterministic and “fuzzy search”
predicates in this family.
Of
course, these simple (sometimes ad-hoc)
methods of fuzzy search may
seem somewhat childish to bio-computing
experts, who know more efficient algorithms for
inexact
string matching,, used in today’s DNA
Research. However, the boosting effects of Assembly Language on an algorithm’s execution
speed often compensate for the algorithm's minor
imperfections. Besides, it may be worthwhile to rewrite
today’s bestbio-computing
search
algorithms in hand-optimized Assembly
, and then perhaps
use Visual Prolog for the more intelligent parts of
DNA
analysis, e.g. using professor David Searle’s recently
discovered String
Variable Grammar (already written in Prolog) to delve into semantic
DNA
parsing! In any
case, a
combination of character table- based
methods with Levenstein’ s algorithm was used in a Visual
Prolog ENB application that correctly identified the
names of many hundreds of
hydrological stations, matching each station’s true name to many commonly used aliases and sub-station names. The lack of
a common convention about
hydrological station names -in previous years- had caused a painfully
prolific number of aliases, inside paper documents
of the raw hydrological
data for previous
decades. In Appendix G there
are selected screen-shots from
this Visual Prolog
Windows program,
which was used by ENB Ltd to
compile and correct the final and official Nationwide
List of Hydrological
Stations in Greece. In Appendix
G there are also
some
source
code sample listings, for both fuzzy and non-deterministic
search predicates (of global definitions provided in Appendix
F). DATA MINING IN
BITS (1): “DATE-BINARIES” “Date-Binaries” are
data-structures of Dates
encoded as bits. A “Date-Binary”
is a Visual Prolog Binary
with the following structure:
In
addition,
In short, date-binaries consist
of bit-patterns or
bit-sets, representing the days of specific
months
within
specific -series of- years in a
compact way, allowing fast and easy
access. Date-binaries
are complete and adequate, as specifications of any
single Boolean attribute, over calendar-time. This attribute
can
denote
truth
or existence of any property, of any type of data in
relation to
calendar time. E.g.
date-binaries could be
used to analyze the market
behavior of customers who
did or did not buy a certain product,
over time,
and who did or did not have an “attribute” that we think
is relevant, in the likelihood to buy this product,
within given periods of time. In this particular -hypothetical-
example, all we
need to do, to verify a suspected latent correlation between buying patterns of customers who do
or don’t have a “specified attribute”,
is to use date-binary bit-operations (AND, OR and NOT) at a massive scale, in various ways. So,
Assembly
language predicates have been written, that implement a
variety of logic and numeric bit-operations on Date-Binaries: Date-binaries
were originally created to assist the quest for the longest
intervals of valid and contiguous hydrological time-series data,
in a large-scale project of hydrological predictions financed by the
Greek
government. The
problem of this project was that much of the raw data, sent from
hydrological
stations throughout the country, was partly incomplete and unusable.
Soon enough, it was proved to be more
tedious to distinguish the good
from the bad data, than to actually use
it! The
company's main hydrological software used by ENB's engineers at the
final phase
became useless if the data fed to it had gaps of erroneous
data, inside the raw time-series. So,
a Visual Prolog Windows application
was developed, based on date-binaries, to produce reliable
data-summaries,
readable by humans, for use in important decisions
on this project. To
identify the erroneous
data-gaps, a five-phase
extraction process was devised:
The
use of date-binaries in this project was a clear success. All
known
previous methods for achieving similar goals became obsolete.
Visual
Prolog program development was
also fast: Less than a day, plus
about ten working person-days to… invent the “Date-Binaries Library”
itself; i.e. the ASM predicates used by
the program. This program was so
fast that it took longer to save results to a file than to...
calculate
them, even though the data included several
decades and many dozens of stations. Admittedly, it would be possible to use any “mainstream language”, such as Visual Basic or C++, to achieve the same goals. However, even forgetting the advantage of a very short development time (using Visual Prolog) the ultra-high execution speed of the program empowered the company’s engineers to experiment widely with many different ranges, settings, thresholds and time-periods, so as to re-adjust and re-define the complete picture of the data. This had a very positive impact on the quality of the decisions taken, about which parts of the raw data to accept. We
now have good reason to believe, that the software methodology
used for
assisting this project, can achieve similarly good results in any
field
where incomplete or erroneous time-series data is an
obstacle, which means: every other
company working with raw time-series
data. However, the feeling of the author is that this
is only
the beginning. A logical next step is to associate date-binaries
with other types of data and rule extraction
algorithms, e.g. operating on raw time-series to discover hidden
patterns
or hidden relations inside the data, at a very high speed. Now,
“Inductive
Logic Programming” methods are known to demand vast amounts of time to
arrive
at the simplest results, but if they are assisted or boosted with such
low-level innovations (as date-binaries), they are bound to become much
faster
and much more practical, in business fields that are extremely
demanding for instant
and powerful results. In Appendix
H there is also a
listing
of all Global Predicate definitions
in our libraries that deal with Date-Binaries. In
the last section, we will take a brief
look into a methodology much more powerful and important
than
“date-binaries”, but also 100% compatible with them: It's a methodology
that
can assist serious and extremely reliable data-mining processes at
a
fraction of the time normally required for them: A logic
inference algorithm which (like date-binaries) also
uses bits and is also based on low-level logic bit-operations,
but it is
of more general scope, a part of an
“Alternative Logic System” which could be described as a “Reduced Instruction Set
Inference Engine for Boolean
Logic”. Appendices
A, B,
C, D, E, F, G, H, I:
Appendix I (iota): ALGORITHM
IPHIGENIA The
“Iphigenia algorithm” is a certain kind of “mathematical curiosity” or
invention of the author, dated back to the mid-eighties. Today, after
many
years -during which the theory of this algorithm was... buried
in the
closet- there are some fresh results and theorems, proved in 2004 (and
more recently). Assembly
Language
implementations of this algorithm also led to
some advances in the theory behind it: “Multiple
Form Logic”, about which you
can
find out
more in the site http://multiforms.netfirms.com (a non-commercial research
site).
Here is some information about
the theory
leading to “Algorithm
Iphigenia”, as well as a list of all
the predicates implementing it,
in Visual Prolog 6.3 syntax, towards the end:
Working with BITS, Multiple Form Logic
and “ALGORITHM
IPHIGENIA”: Unfortunately,
“Multiple Form Logic” is
not easy to expound in just
a few
minutes, and it is also considerably off topic for the
concerns of
this conference. So this particular section will come to an end with a
brief
discussion of practical results obtained by using algorithm Iphigenia
and also
some hints about how this algorithm can be combined with
certain other compatible
methodologies, implemented in Assembly language, such as date-binaries,
in the context of Visual Prolog
programming.
The
algorithm operates blindly and with brute force, as a bit-pattern
reduction and
inference engine, in a way that is “pseudo-parallel”, and it crunches
our
bit-encoded knowledge base in a fraction of a second, to
find the
result. There
only very few “bit reduction rules”, all of which are implemented as bit
operations on Visual Prolog binaries. There
are a couple of
rules
for “vertical or horizontal” reduction, a couple of “row elimination
rules”,
and... If
our Knowledge Base is encoded as bits, it can also be easily extended
to include date-binaries, or any
other bit-encoded
attributes (of
raw unexplored information). Algorithm
Iphigenia can then also be
used as a powerful data-mining
engine, able to deduce
new
logic
patterns and inferences (in bits) that were not
known
(at the
start), provided we make sure that the interface
between the
“raw data”
and our logical “meta-rules” (used for the data
mining strategy)
is kept transparently clear and well-designed. Quite
often, it is
necessary to include external criteria
or some types of
“thresholds” at
an intermediate level, before “raw data” is allowed to pass
into the
“blind” logic inference phases.
E.g. in the previously
mentioned
hydrological project (at ENB Ltd) where
we had used date-binaries,
without using
algorithm Iphigenia, a threshold
of
25 days per month was decided to be the “minimum number
of days for a particular month's data
to become acceptable” by subsequent
phases. A
common mistake that one can make easily, if one uses this algorithm
carelessly
is to ask a wrong question, causing an “avalanche” of useless
but... correct inferences, which
-even though are
(logically) correct- they are also... impossible to digest,
as
regards their results, unless you possess... supernatural abilities for
logical
complexity comprehension. The larger our knowledge base, the more
logic
facts will be deduced massively and blindly (but also correctly)
as logic consequences of our query. If
our question is reasonable, then the answer will also be
reasonable. If
-however- our query is ambiguous, then the answers are not produced
by anything... decent, polite and sequential as Prolog-like
back-tracking, but
-instead- they are hammered out by a brute-force potentially
parallel
inhuman algorithm of blind reduction at lightning-fast speed,
but also
hard to stop or control, once it begins! Here
are sample screen
shots from a test-program (an EasyWin Visual
Prolog executable) linked
to the “Iphigenia Inference Engine Library”: (1) First example: ALGORITHM IPHIGENIA TESTING PROGRAM (EasyWin interface). Select: 1 to load and test, 2 to create new, 0 to exit: ?- 1 IPHIGENIA ALGORITHM Expert-System KBS/memory diagnostics: *********************************************************************** Number of expert-system RULES = 4. Terms in the Expert-system 'dictionary' = 6. Active Bits = 7 (Number of Terms + 1 'flag-bit'). Covered by 1 dword(s), i.e. 4 bytes. (7 / 32 = 0, 0 * 32 = 0, 1 *
32 =
32) Size of the main array: (2 x 4 fact-bytes x 4 rules) = 32
bytes. Total memory allocation (including 'conclusion-line'): 4 bytes x (4+1) x 2 = 40 bytes. ********************************************************************** Sentences in Logic Array: 4, Dwords per row: 2 0)01100000000000000000000000000000
00010000000000000000000000000000: intelligent(1)&have
time(2)=>can learn
programming(3) 1)00010000000000000000000000000000
00001000000000000000000000000000: can learn programming(3)=>can
learn
Prolog(4) 2)00001000000000000000000000000000
00000100000000000000000000000000: can learn Prolog(4)=>can learn
Logic(5) 3)00000100000000000000000000000000
00000010000000000000000000000000: can learn Logic(5)=>can learn
Iphigenia(6) (*)
00000000000000000000000000000000 00000000000000000000000000000000: VOID
?- 2 Give conclusion_row: > intelligent&have time=>can learn Iphigenia [1,2] => [6] (*) 01100000000000000000000000000000
00000010000000000000000000000000:
intelligent(1)&have time(2)=>can learn Iphigenia(6) ok(y/n)?- y 1:Proof 2:Query 3:Edit 4:HRC 5:LogFile 6:List 7:DHC
9:REFRESH 10:Save
0:Exit ?- 1 FINISHED(3) - The proposed conclusion is TRUE (by HRC): (*)
01100000000000000000000000000000 00000010000000000000000000000000:
intelligent(1)&have time(2)=>can learn Iphigenia(6) 1:Proof 2:Query 3:Edit 4:HRC 5:LogFile 6:List 7:DHC
9:REFRESH 10:Save
0:Exit ?- 0 (2) Second
example: ALGORITHM IPHIGENIA TESTING PROGRAM (EasyWin interface). Select: 1 to load and test, 2 to create new, 0 to exit: ?- 1 IPHIGENIA ALGORITHM Expert-System KBS/memory diagnostics: *********************************************************************** Number of expert-system RULES = 37. Terms in the Expert-system 'dictionary' = 29. Active Bits = 30 (Number of Terms + 1 'flag-bit'). Covered by 1 dword(s), i.e. 4 bytes. (30 / 32 = 0, 0 * 32 = 0, 1
* 32
= 32) Size of the main array: (2 x 4 fact-bytes x 37 rules) =
296 bytes. Total memory allocation (including 'conclusion-line'): 4 bytes x (37+1) x 2 = 304 bytes. *********************************************************************** Sentences in Logic Array: 37, Dwords per row: 2 0)01000000000000000000000000000000
00100000000000000000000000000000 :
bird(1)=>covered by feathers(2) 1)01000000000000000000000000000000
00010000000000000000000000000000 :
bird(1)=>can fly(3) 2)01000000000000000000000000000000
00001000000000000000000000000000 :
bird(1)=>has eggs(4) 3)00000100000000000000000000000000
00000010000000000000000000000000 :
mammal(5)=>has hair(6) 4)00000100000000000000000000000000
00000001000000000000000000000000 :
mammal(5)=>has live birth(7) 5)00000000100000000000000000000000
00000000010000000000000000000000 : carnivore(8)=>eats meat(9) 6)00000000001000000000000000000000
00000000000100000000000000000000 :
ungulate(10)=>has hoofs(11) 7)00000000000010000000000000000000
00000000000001000000000000000000 :
cheetah(12)=>tawny color(13) 8)00000000000010000000000000000000
00000000000000100000000000000000 :
cheetah(12)=>has darkspots(14) 9)00000000000000010000000000000000
00000000000001000000000000000000 :
tiger(15)=>tawny color(13) 10)00000000000000010000000000000000
00000000000000100000000000000000 : tiger(15)=>has darkspots(14) 11)00000000000000001000000000000000
00000000000001000000000000000000 : giraffe(16)=>tawny color(13) 12)00000000000000001000000000000000
00000000000000000100000000000000 : giraffe(16)=>long-legged(17) 13)00000000000000001000000000000000
00000000000000000010000000000000 : giraffe(16)=>long-necked(18) 14)00000000000000001000000000000000
00000000000000100000000000000000 : giraffe(16)=>has darkspots(14) 15)00000000000000000001000000000000
00000000000000000000100000000000 : zebra(19)=>white color(20) 16)00000000000000000001000000000000
00000000000000000000010000000000 : zebra(19)=>has black stripes(21) 17)00000000000000000000001000000000
00000000000000000100000000000000 : ostrich(22)=>long-legged(17) 18)00000000000000000000001000000000
00000000000000000000000100000000 : ostrich(22)=>cannot fly(23) 19)00000000000000000000001000000000
00000000000000000010000000000000 : ostrich(22)=>long-necked(18) 20)00000000000000000000001000000000
00000000000000000000000010000000 : ostrich(22)=>black color(24) 21)00000000000000000000001000000000
00000000000000000000000001000000 : ostrich(22)=>has white marks(25) 22)00000000000000000000000000100000
00000000000000000000000000010000 : penguin(26)=>can swim(27) 23)00000000000000000000000000100000
00000000000000000000000010000000 : penguin(26)=>black color(24) 24)00000000000000000000000000100000
00000000000000000000000001000000 : penguin(26)=>has white marks(25) 25)00000000000000000000000000001000
00000000000000000000100000000000
: albatross(28)=>white color(20) 26)00000100000000000000000000000000
00000000000000000000000000000100 : mammal(5)=>animal(29) 27)00000000100000000000000000000000
00000100000000000000000000000000 : carnivore(8)=>mammal(5) 28)00000000001000000000000000000000
00000100000000000000000000000000 : ungulate(10)=>mammal(5) 29)01000000000000000000000000000000
00000000000000000000000000000100 : bird(1)=>animal(29) 30)00000000000000000000001000000000
01000000000000000000000000000000 : ostrich(22)=>bird(1) 31)00000000000000000000000000100000
01000000000000000000000000000000 : penguin(26)=>bird(1) 32)00000000000000000000000000001000
01000000000000000000000000000000 : albatross(28)=>bird(1) 33)00000000000010000000000000000000
00000000100000000000000000000000
: cheetah(12)=>carnivore(8) 34)00000000000000010000000000000000
00000000100000000000000000000000 : tiger(15)=>carnivore(8) 35)00000000000000001000000000000000
00000000001000000000000000000000 : giraffe(16)=>ungulate(10) 36)00000000000000000001000000000000
00000000001000000000000000000000 : zebra(19)=>ungulate(10) (*)
00000000000000010000000000000000 00000000000000000000000000000000:
tiger(15)=> 1:Proof 2:Query 3:Edit 4:HRC 5:LogFile 6:List 7:DHC
9:REFRESH 10:Save
0:Exit ?- 2 Give conclusion_row: > tiger=>animal [15] => [29] (*)
00000000000000010000000000000000 00000000000000000000000000000100:
tiger(15)=>animal(29) ok(y/n)?- y 1:Proof 2:Query 3:Edit 4:HRC 5:LogFile 6:List 7:DHC
9:REFRESH 10:Save
0:Exit ?- 1 FINISHED(1) - The proposed conclusion is TRUE (by RVC): (*)
00000000000000010000000000000000 00000000000000000000000000000100:
tiger(15)=>animal(29) Press a key for more proof-cycles (or ESC to stop): -------------------------------------------
(end of
sample screen-shots)------------------------------------ INTERPRETATION:
26)00000100000000000000000000000000
00000000000000000000000000000100 :
mammal (5)=>animal (29) 27)00000000100000000000000000000000
00000100000000000000000000000000 :
carnivore(8)=>mammal(5) 34)00000000000000010000000000000000
00000000100000000000000000000000 :
tiger (15)=>carnivore(8) The proposed assertion is
the following (bottom bit-row): (*) 00000000000000010000000000000000
00000000000000000000000000000100:
tiger(15)=>animal(29) Interestingly, this was not proved by “following implications”
(as a human prover, or a conventional
mechanical theorem prover
might do). Instead, the assertion was proved by more elementary
low-level bit-reductions, in a
Logic System
that is an alternative to Boolean Logic itself. In this
Logic system there are only three axioms
(used in the proofs) and no need for
“logic implication” as such. In fact, logic
implication can be proved as a theorem in this
system, rather than
taken for granted as an axiom. A
complete proof of this claim is in
the web-page: http://multiforms.netfirms.com/mflogic_proofs.html#theorem_T7 It
seems that the “quantum”,
or the “elementary particle” of logic
computation, is not
“implication”, but a more primitive
operation, the act of “drawing a distinction”. From this low-level
operation,
logic implications arise as consequences. The theory about this is
“Multiple
Form Logic”, explained in the (strictly non-commercial) research
site: http://multiforms.netfirms.com. IPHIGENIA
ALGORITHM PREDICATES: % Applies brute-force RVC (Rule of Vertical Cancellation) to an entire Iphigenia array: iphig_rvc : (binary Array,binary Conclusion) -> integer procedure(i, i) language c as "_iphig_rvc_0". % Returns 0 if array was PROVED, 1 if array was changed, -1 if no change.
% Like the previous, but also produces (as output arg.3) an array of “rows changed”: iphig_rvc : (binary Array, binary Conclusion, binary RowsChangedBuffer) -> integer procedure(i, i, i) language c as "_iphig_rvc_1". % Returns 0 if array was PROVED, 1 if array was changed, -1 if no change.
% Applies RDT (“Rule of Diagonal Transfer”) % to a single-row Iphigenia array/query-line (only used for testing purposes): iphig_rdt : (binary SingleRowArray, binary ConclusionRow) -> integer procedure(i, i) language c as "_iphig_rdt_0". % Returns 0 if the array was PROVED, 1 if array was changed, -1 if no change.
% Applies reduction rule “RDT” to a given row (and query line) of an Iphigenia array: iphig_rdt : (binary Array, binary ConclusionRow, unsigned RowIndex) -> integer procedure(i, i, i) language c as "_iphig_rdt_1". % Returns 0 if the array was PROVED, 1 if array was changed, -1 if no change.
% Applies reduction rule “RDT” to all specified rows (ARG 3) of an Iphigenia array: iphig_rdt_rows: (binary Array, binary ConclusionRow, binary RowIndexesAffected) -> integer procedure(i, i, i) language c as "_iphig_rdt_rows". % Returns 0 if the array was PROVED, 1 if array was changed, -1 if no change.
% Applies reduction rule “RDT” to ALL the rows of an Iphigenia array: iphig_rdt_all: (binary Array, binary ConclusionRow) -> integer procedure(i, i) language c as "_iphig_rdt_all". % Returns 0 if the array was PROVED; 1 if array was changed; -1 if no change.
% Checks if the (32-bit-encoded) input number contains an isolated “1”-bit: single_onbit: (unsigned Num) -> integer procedure(i) language as "_single_onbit_0".
% Checks if a particular row of an Iphigenia Array contains an isolated “1”-bit: single_onbit: (binary Array, unsigned NumDwords, unsigned DwPosOut) -> integer procedure(i, i, o) language c as "_single_onbit_1".
% Reverses left-hand & right-sides of a “Query Row” relevant to an Iphigenia array: iphig_revsides : (binary ConclusionRow,unsigned DwordsInBinary) -> integer procedure(i, i) language c as "_iphig_revsises".
% Sets a doubleword value (32 bits) at a row and column of an Iphigenia array: iphig_setdword : (binary array,unsigned Nrows,unsigned Row,unsinged Col,unsigned Value) -> integer procedure(i, i, i, i, i) language c as "_iphig_setdword".
% Sets the byte-value : (all 8 bits) of a row and column in an Iphigenia array: iphig_setbyte : (binary array,unsigned Nrows,unsigned Row,unsinged Col,byte Value) -> integer procedure(i, i, i, i, i) language c as "_iphig_setbyte".
% Sets the bit-value of a specific row and column, in an Iphigenia array: iphig_setbit : (binary array,unsigned Nrows,unsigned Row,unsinged Col,byte Value) -> integer procedure(i, i, i, i, i) language c as "_iphig_setbit_0".
% Sets the bit-value of a specific row and column in a SIDE of an Iphigenia array: iphig_setbit : (binary Array, unsigned Nrows,unsigned Row,unsinged Col, byte Side, byte Value) -> integer procedure(i, i, i, i, i, i) language c as "_iphig_setbit_1".
% Gets the bit-value of a specific row and column, in an Iphigenia array: iphig_getbit : (binary Array, unsigned Nrows,unsigned Row,unsinged Col) -> integer procedure(i, i, i, i) language c as "_iphig_getbit_0".
% Gets the bit-value of a specific row and column in a SIDE of an Iphigenia array: iphig_getbit : (binary Array, unsigned Nrows,unsigned Row,unsinged Col,byte Side) -> integer procedure(i, i, i, i, i) language c as "_iphig_getbit_0".
% Tests if the Conclusion-Row of an Iphigenia Array contains “…&X& => X”: iphig_xsb : (binary ConclusionRow) -> integer procedure(i) language c as "_iphig_xsb_0".
% Tests if any Row of an Iphigenia Array contains “…&X& => X”: iphig_xsb : (binary Array,unsigned RowWidth) -> integer procedure(i, i) language c as "_iphig_xsb_1".
% Deletes (setting “invalid bit” to 1) all the rows satisfying condition “DHC”: iphig_dhc : (binary Array, binary ConclusionRow) -> integer procedure(i, i) language c as "_iphig_dhc".
% Writes an Iphigenia array’s “conclusion row” to current output: iphig_wrbin : (binary ConclusionRow) -> integer procedure(i) language c as "_iphig_wrbin_0".
% Writes an entire Iphigenia array, as well as a “conclusion row” to current output: iphig_wrbin : (binary Array, unsigned NumDwordsPerRow) -> integer procedure(i, i) language c as "_iphig_wrbin_1".
% Writes a specific Row (arg.3) of an Iphigenia array to current output: iphig_wrbin : (binary Array, unsigned NumDwordsPerRow, unsigned Row) -> integer procedure(i, i, i) language c as "_iphig_wrbin_2".
% Clears a row in an Iphigenia array by setting the “invalid bit” to 1: iphig_clear_row : (binary Array, unsigned NDwordsPerRow, unsigned Row) -> integer procedure(i, i, i) language c as "_iphig_clear_row".
% Deletes completely an Iphigenia Array’s row (by setting all bits to 0): iphig_del_row : (binary Array, unsigned NDwordsPerRow, unsigned Row) -> integer procedure(i, i, i) language c as "_iphig_del_row".
% Inserts (i.e. modifies) a given row (arg.3) in an Iphigenia array: iphig_ins_row : (binary Array, unsigned NDwordsPerRow, unsigned Row, binary Content) ->
integer procedure(i, i, i, i) language c as "_iphig_ins_row". NOTES
MULTIPLE FORM
LOGIC -the theory that led to “Algorithm Iphigenia”:
Multiple Form
Logic is
a simple but strange mutation
of Propositional Calculus, based solely on operations OR and
XOR, and
also (which is even more strange) requiring no
absolute or
pre-defined truth values (like 0 and 1): Instead of 0 or 1, there is a
“Universal Object” equivalent to 1, but not exactly the
same,
ontologically. This “Universal Object” is not given, but
instead it
is constructed, as the “ Multiple Form
Logic is
an enhanced descendant of another system
of logic, expressed in a book by British author George Spencer-Brown:
“Laws of
Form”. Although the logic theories of George Spencer-Brown have not
been
entirely accepted by the international academic community, there are
many
people in the world working seriously with “Laws of Form”, some of them
respected professors of Computer Science and Mathematics: Professor Lou
Kauffman (the inventor of “Knot Theory”, from the University of
Illinois),
Richard Shoup (who also runs the “Laws of form site” www.lawsofform.org),
David Keenan, William Bricken, Eddie Oshins, and many others. Since long ago
(mid-eighties) the three
axioms of Multiple Form Logic were proved
to be:
However...
|