Trono's world: November 2006

Monday, 27 November 2006

Method invocation in queries

In my object persistence engine I expect to be able to submit queries which can call methods on the stored objects:

SELECT Class c WHERE c.method();

Let's try to classify methods by their arguments, return values and side effects. The simplest kind of method I can use in a query is a “const” method (one that doesn't modify the internal state of the object) which returns a boolean and takes no arguments. Such a method can be invoked easily and safely during query execution, as it doesn't modify the object and is quite easy to evaluate. Methods that return something other than a boolean value could cause problems during the evaluation of the WHERE clause. Methods that expect arguments could also be difficult to evaluate, as the query engine needs a mechanism to choose which arguments to pass. But the most problematic kind of method is the “non-const” one, which modifies the object's internal state during execution: this could require the query to re-evaluate some previously evaluated objects or, if the evaluated object modifies some other related object contained in the database, could trigger a cascade of updates, which could be difficult to handle. And, even worse, a method could throw an exception! How should that be handled?

Constness: if not const, could cause cascading modifications in the persistence layer
Return value: if not boolean (or primitive), could make evaluation difficult
Arguments: if needed, the query engine would have difficulty choosing which values to pass
Exceptions: could cause difficulties in the execution thread
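
A minimal sketch of the “easy” case above, assuming the engine calls methods through Java reflection. The class name MethodPredicateFilter and the policy of treating a thrown exception as "no match" are assumptions made for illustration, not part of the engine. Java has no notion of a const method, so constness cannot be checked here and is simply trusted.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: evaluates "WHERE c.method()" over a list of candidates.
class MethodPredicateFilter {

    // Returns the candidates for which the named no-argument boolean method returns true.
    static <T> List<T> select(List<T> candidates, String methodName) {
        List<T> result = new ArrayList<>();
        for (T candidate : candidates) {
            if (matches(candidate, methodName)) {
                result.add(candidate);
            }
        }
        return result;
    }

    static boolean matches(Object candidate, String methodName) {
        try {
            // getMethod without parameter types only finds public no-argument methods.
            Method method = candidate.getClass().getMethod(methodName);
            Class<?> returnType = method.getReturnType();
            if (returnType != boolean.class && returnType != Boolean.class) {
                throw new IllegalArgumentException(methodName + " does not return a boolean");
            }
            return Boolean.TRUE.equals(method.invoke(candidate));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalArgumentException("Cannot call " + methodName, e);
        } catch (InvocationTargetException e) {
            // Assumed policy: an object whose method throws does not match the query.
            return false;
        }
    }
}
```

For example, MethodPredicateFilter.select(strings, "isEmpty") would keep only the empty strings of a list. Methods with arguments or non-boolean return values are rejected outright, which sidesteps those two problems rather than solving them.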

Friday, 24 November 2006

Object ID

9223372036854775807 is the maximum value of a Java long. It should be a sufficiently big number to be used for object identification in my ODBMS (well, actually it's an object persistence and query engine, but it could be used by a fully featured object oriented database management system in the future).
I'd like to be able to use consecutive numbers starting from 0, but I need an efficient hash function for indexing: java.lang.Long's hashCode() won't do the job, as it just returns an XOR of the two halves of the value, so I was thinking about using something like this function - look for hash64shift(long key).
Let's do some speed tests:
Computing 5 million hash values with hash64shift() took ~2450 ms, while the usual Long.hashCode() took 422 ms. Sure, the weak hash function is much faster, but 2.5 seconds for computing 5M hash keys is quite a small amount of CPU time!
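A rough sketch of the comparison, for reference. The hash64shift body below is written from memory after Thomas Wang's shift-based 64-bit mix, so the shift constants should be treated as an assumption and checked against the linked page. The class name ObjectIdHash and the timing loop are made up for illustration and are not the original test. Note that for small non-negative ids Long.hashCode() returns the id itself, so consecutive ids produce consecutive hash values.

```java
// Sketch only: shift constants recalled from Thomas Wang's hash64shift (assumption).
class ObjectIdHash {

    static long hash64shift(long key) {
        key = (~key) + (key << 21);
        key = key ^ (key >>> 24);
        key = (key + (key << 3)) + (key << 8);   // key * 265
        key = key ^ (key >>> 14);
        key = (key + (key << 2)) + (key << 4);   // key * 21
        key = key ^ (key >>> 28);
        key = key + (key << 31);
        return key;
    }

    // What java.lang.Long.hashCode() computes: XOR of the two 32-bit halves.
    static int weakHash(long key) {
        return (int) (key ^ (key >>> 32));
    }

    public static void main(String[] args) {
        final int count = 5_000_000;
        long sink = 0; // accumulate the results so the loops are not optimized away

        long start = System.nanoTime();
        for (long id = 0; id < count; id++) sink += hash64shift(id);
        long strongMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        for (long id = 0; id < count; id++) sink += weakHash(id);
        long weakMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("hash64shift: " + strongMillis + " ms, weak: " + weakMillis
                + " ms (checksum " + sink + ")");
    }
}
```

Timings from a loop like this depend heavily on the JVM and on warm-up, so they are only indicative.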

Thursday, 23 November 2006

Using the file system as a persistence engine for a database

I'm currently designing a sort of object-oriented database engine in Java. I'd like to have a plug-in design for all the components, so that I can implement each of them however I want. Now I'm thinking about the persistence engine, and I'd like to approach the problem from a different perspective. Let's see which features I'd like to get from this component: first of all I'd like it to be easy to implement, then I'd like it to perform well enough to be usable for more than just examples; I want caching, expandability, and eventually replication and multi-volume support. I started thinking about an “ad hoc” file format, but its design is not so trivial: deleting records could lead to inefficient use of space, and eventually to fragmentation; and caching could be quite a problem too. Dealing with these problems isn't easy, which means that this approach could take a lot of time. So is there anything able to perform this sort of operation efficiently? Sure: the file system! Modern file systems (have a look at Sun's ZFS) offer a lot, and they are virtually free, in the sense that any modern OS offers a set of features that includes caching, mirroring, space-saving features such as compression, reliability features such as journalling and error recovery, and security features such as file content encryption.
So the choice is made: I'll think about a persistence engine based on the file system and its directory structure (I could implement inheritance via subdirectory organization, and object references with simple URLs or links).
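A minimal sketch of what such a component might look like, assuming plain Java serialization and one file per object. The class FileSystemStore, the ".obj" extension and the flat one-directory-per-class layout are all assumptions for illustration; inheritance via nested subdirectories and references via links are left out.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical file-system-backed store: one directory per class, one file per object id.
class FileSystemStore {
    private final Path root;

    FileSystemStore(Path root) {
        this.root = root;
    }

    // Maps (class, id) to <root>/<fully.qualified.ClassName>/<id>.obj
    private Path pathFor(Class<?> type, long id) {
        return root.resolve(type.getName()).resolve(id + ".obj");
    }

    void store(long id, Serializable object) throws IOException {
        Path file = pathFor(object.getClass(), id);
        Files.createDirectories(file.getParent());
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(object);
        }
    }

    <T> T load(Class<T> type, long id) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(pathFor(type, id)))) {
            return type.cast(in.readObject());
        }
    }

    // Deleting is a plain file removal; reclaiming the space is left to the file system.
    boolean delete(Class<?> type, long id) throws IOException {
        return Files.deleteIfExists(pathFor(type, id));
    }
}
```

With this layout, caching, compression and mirroring are whatever the underlying file system provides; the store itself keeps no state beyond the root path.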

Linux drivers

I should practice my English, so I'll try to write some posts in English from now on.
I've plugged my Logitech web-cam into my Fedora 6 laptop, hoping that the OS would recognize it. Sadly that didn't happen, so I went to Logitech's site, but they don't supply Linux drivers. I haven't tried my Epson Photo R300 printer yet, but I guess I won't get the same printing results as in Windows, because of the lack of an Epson-specific driver. Why are all these hardware vendors not supplying Linux drivers? Is it a matter of licenses? If so, why are they giving out free drivers for Windows and Mac? ATI and nVidia have closed-source drivers for Linux, so I guess it's not a question of licenses. Maybe they don't care about Linux users; they think that spending on the development of Linux drivers won't give a return on investment... The solution could be letting the community develop the software by publishing the hardware specifications, but that would let competitors learn about the technology used. So is this a problem with no solution?
No. In object-oriented programming there is a paradigm which enables programmers to expose only part of their work – that is, the “interface” – while hiding the implementation, the algorithms behind a piece of software: the information hiding paradigm. Can this paradigm be used in another context, namely in the software interface between the operating system and a hardware device? Well, actually it IS used, in exactly this context! Think about all the PROTOCOLS used between devices, some of which are even OPEN PROTOCOLS... maybe the community itself could provide open protocols between devices! Imagine a “universal printer protocol” which relies on one or more well-known communication layers (via USB, or parallel port, or even over a network adapter), includes all the PostScript and PCL features, and adds some special features like color calibration and so on (the availability of these optional features could be negotiated between the devices by the “universal printer protocol” itself). Or a “universal A/V protocol”, which includes bidirectional audio and video streaming over an open compression format (ogg?), or without compression on simpler devices (am I thinking about my web-cam?), everything transported over a well-known connection layer (of course USB 2, or FireWire, or even a wireless or wired network protocol). The vendors could even enhance these open protocols by proposing or adding features, and the same protocol could be used by every operating system (Linux, Windows, Mac, but also Symbian via infrared or Bluetooth, or by cell phones). A device could easily expose the features it implements, and higher versions of the protocol could provide new or enhanced features as well: a printer protocol 1.0 could provide black and white printing, while 2.0 could provide colors, and 2.1 even more colors...
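Just to make the negotiation idea concrete, here is a tiny sketch. Everything in it is hypothetical: no such protocol exists, and the PrinterProtocol class, its Feature names and the version-to-feature mapping are invented for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical feature negotiation for an imaginary "universal printer protocol".
class PrinterProtocol {

    enum Feature { MONOCHROME, COLOR, COLOR_CALIBRATION }

    // Mandatory features of a (made-up) protocol major version: 1.x prints black and white, 2.x adds color.
    static Set<Feature> featuresOf(int major) {
        return major >= 2
                ? EnumSet.of(Feature.MONOCHROME, Feature.COLOR)
                : EnumSet.of(Feature.MONOCHROME);
    }

    // Both sides end up using only the features that each of them supports.
    static Set<Feature> negotiate(Set<Feature> host, Set<Feature> device) {
        EnumSet<Feature> common = EnumSet.noneOf(Feature.class);
        common.addAll(host);
        common.retainAll(device);
        return common;
    }
}
```

An optional feature such as COLOR_CALIBRATION would simply be added to a device's set when it is available, and dropped by negotiate when the other side does not know it.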

Tuesday, 21 November 2006

Streaming applied to a DBMS

Recently I have used database services in two ways:
- In a client/server model, with client and server deployed on two different machines
- In a Java EE-like model (or Unify, for those who know what I mean..) where the DBMS and the application server reside on the same machine (or cluster) and interact with a client (or a remote terminal), transferring the information "a little at a time, as much as fits on the screen"

In the first case there is a limit beyond which the result of a query is large enough to saturate the network: the client blocks waiting for the transfer of the result set to complete (so the user is unhappy), the DBMS struggles to process other requests (the network is saturated!), and the DBMS's processor + memory + disk system does not work efficiently.
In these cases it could be convenient to use a streaming-style technology, or rather a producer/consumer model for transferring the query results: the DB transfers the results to the client "as it finds them", while the client stays in a non-blocking condition on a buffer and "consumes" the records as they arrive. The server could also take the client's consumption of records into account: when the buffer is empty, give precedence to the query; when the buffer is full, lower the thread's priority, leaving more resources available to other users.
With this approach, however, you lose the possibility of getting an ordered result set from the DBMS, except in a few special cases. Let's imagine an object DBMS, and suppose we can pass a comparator to the query to determine its ordering:

SELECT Cliente c ORDER USING new Comparator<Cliente>() {
   int compare( Cliente a, Cliente b )
   {
      return a.indirizzo.compare(b.indirizzo);
   }
};
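
For reference, the unordered producer/consumer transfer described above could be sketched with a bounded java.util.concurrent.BlockingQueue. This is only an in-process illustration under stated assumptions: the StreamingQuery class and its END marker are hypothetical, there is no network involved, and instead of adjusting thread priorities the producer simply blocks when the buffer is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical streaming result set: the "server" thread produces records into a
// bounded buffer while the client consumes them as they arrive.
class StreamingQuery {
    private static final Object END = new Object(); // marks the end of the stream

    static List<Object> run(List<?> matchingRecords, int bufferSize) throws InterruptedException {
        BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(bufferSize);

        Thread producer = new Thread(() -> {
            try {
                for (Object record : matchingRecords) {
                    buffer.put(record); // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Object> received = new ArrayList<>();
        for (Object record = buffer.take(); record != END; record = buffer.take()) {
            received.add(record); // a real client would display the record here
        }
        producer.join();
        return received;
    }
}
```

The client here waits on take(); a client that must never block could call poll() instead and do other work whenever it returns null. Records arrive in the order the producer finds them, which is exactly why ordering becomes the hard part.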

Now the streaming query can be ordered if and only if an index based on the comparator already exists and the DBMS is able to use it. Otherwise, if the ordering has to be done “on the fly”, the DBMS must first obtain all the results of the query and then sort them using the comparator. But how can the DBMS tell whether two comparators (the one in the example is built “on the fly”) are equivalent, that is, whether they can use the same indexes? One approach could be to keep the instances of the comparators used inside the DBMS, and to implement a method for comparing comparators:

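import java.io.Serializable;

interface Comparator<T> extends Serializable
{
   int compare(T a, T b);
   // true if the other comparator produces the same ordering
   boolean isEquivalent(Comparator<?> b);
   // equivalent comparators should return the same hash code
   int hashCode();
}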

With comparators of this kind, the DBMS could create indexes as the comparators are used (possibly with usage statistics and a heuristic for removing unused comparators and their indexes). Of course it has to maintain the indexes when the data is modified, but it can compare comparators with each other (to iterate is human, to recurse divine!) and decide to use ordering indexes that already exist. Of course, if the requested index is not there, it has to be created, further lengthening the query's execution time. This could be good for an application where queries repeat, but decidedly harmful in a data warehouse.
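
A minimal sketch of the index reuse idea, under several assumptions: the interface is renamed QueryComparator to avoid clashing with java.util.Comparator, an "index" is just a sorted copy of the data, equivalent comparators are found by a linear scan rather than through the hash code, and index maintenance on data changes is ignored. IndexRegistry and ByLength are hypothetical names.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator type with the equivalence test proposed in the post.
interface QueryComparator<T> extends Comparator<T>, Serializable {
    boolean isEquivalent(QueryComparator<?> other);
}

// Hypothetical registry: one sorted index per group of equivalent comparators.
class IndexRegistry<T> {
    private final List<QueryComparator<T>> comparators = new ArrayList<>();
    private final List<List<T>> indexes = new ArrayList<>();
    private final List<T> data;

    IndexRegistry(List<T> data) {
        this.data = data;
    }

    // Reuses the index of an equivalent comparator, or builds a new one on first use.
    List<T> sortedBy(QueryComparator<T> comparator) {
        for (int i = 0; i < comparators.size(); i++) {
            if (comparators.get(i).isEquivalent(comparator)) {
                return indexes.get(i);
            }
        }
        List<T> index = new ArrayList<>(data);
        index.sort(comparator);
        comparators.add(comparator);
        indexes.add(index);
        return index;
    }

    int indexCount() {
        return indexes.size();
    }
}

// Example comparator: any two instances of this class are considered equivalent.
class ByLength implements QueryComparator<String> {
    public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
    }

    public boolean isEquivalent(QueryComparator<?> other) {
        return other instanceof ByLength;
    }
}
```

The first query ordered by a new ByLength() pays for the sort; a later query with another ByLength instance gets the existing index back, which is the case where a streaming query could stay ordered.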

Java 5 won't work with Compiz

And bad news right away...
I downloaded the latest JDK 5 with NetBeans 5.5 (see here), but unfortunately it doesn't work with Compiz. I read on some newsgroup that JDK 1.4.2 doesn't have this problem, and also that the guys at Sun are working on it. Damn... ah, of course it works on Windoze, and it actually looks "very goood" too.
I also had some problems with Eclipse with the CDT downloaded via apt-get, but after removing the apt package and installing it as an Eclipse plugin, it works (see here for how to do it).
Bonci.

Composition manager

A few days ago I installed Ubuntu and Fedora on my PCs, to try out Compiz and to figure out whether next year it will be worth upgrading to Vista to get the new graphical theme. I've always been a fan of Xgl, and I always have a Kororaa live CD in my bag!
Following a HOWTO, I installed nVidia's proprietary drivers on the desktop running Ubuntu so that I could enable AIGLX and Compiz, while on the laptop Anaconda recognized the ATI Mobility without batting an eyelid and installed Compiz by default (you just have to enable it from an entry in the preferences menu). The result is fantastic, I can't do without it any more, and Windows had never looked so ugly to me!
But, but, but...
1) Gnome is slower than Windows' graphical front end
2) JBuilder won't install; it's configured for some earlier release of the libraries (I'll look into this further)
3) The Logitech webcam doesn't work out of the box, so I'll have to look into this too
4) Eclipse is so slow that it takes away any desire to program
Now I'm downloading NetBeans, hoping to find a satisfactory Java development environment, while for C++ I'll still have to run a few experiments.

Hello world!

Obviously, as a first post... ;)

#include <iostream>
using namespace std;
int main()
{
    cout << "Welcome to Trono's Blog" << endl;
    return 0;
}