biojava-l@biojava.org

Re: [Biojava-l] File parsing in BJ3 Richard Holland Tue Oct 21 02:00:41 2008

Spot on.

Annotation vs. interface: I think an Annotation is probably better, as you
suggest, but I'd have to look into that. I'm not sure how it works with
collections and generics. If it does turn out to be a better bet, I'll
change it over.
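
To make the collections/generics concern concrete, here is a minimal sketch of the trade-off. It is not the BJ3 code: Thing is redeclared locally, and ThingTag, Genbank, Fasta, collect() and isTagged() are hypothetical names invented for illustration. The point is only that a marker interface can be used as a generic bound and checked by the compiler, while a marker annotation can only be checked at run time via reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class MarkerSketch {

    // Option A: a marker interface (local stand-in for Thing, declares no methods).
    interface Thing { }

    // Option B: a hypothetical marker annotation. It must have RUNTIME
    // retention, otherwise reflection cannot see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ThingTag { }

    // Tagged the interface way.
    static class Genbank implements Thing { }

    // Tagged the annotation way.
    @ThingTag
    static class Fasta { }

    // With an interface the tag can be a generic bound, so the compiler
    // rejects anything that is not a Thing.
    static <T extends Thing> List<T> collect(T first) {
        List<T> out = new ArrayList<T>();
        out.add(first);
        return out;
    }

    // With an annotation there is no type to bound on; the check is deferred
    // to run time.
    static boolean isTagged(Object o) {
        return o.getClass().isAnnotationPresent(ThingTag.class);
    }

    public static void main(String[] args) {
        List<Genbank> ok = collect(new Genbank());
        // collect(new Fasta());  // would not compile: Fasta is not a Thing
        System.out.println(ok.size());                 // prints 1
        System.out.println(isTagged(new Fasta()));     // prints true
        System.out.println(isTagged(new Genbank()));   // prints false
    }
}
```

So if the framework classes are declared with bounds such as <T extends Thing>, switching to an annotation would mean giving up that compile-time check.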

For the BioSQL dependencies, take a look at the pom.xml file inside the
biojava-dna module. It declares a dependency on biojava-core. If you want to
add dependencies to external JARs, take a look at biojava-biosql's pom.xml
to see how it depends on javax.persistence. (The easiest way to add these is
via an IDE such as NetBeans, which is what I'm using at the moment.)

cheers,
Richard

2008/10/21 Mark Schreiber <[EMAIL PROTECTED]>

> So if I want to build a BioSQL loader from Genbank, then would the
> classes (or their wrappers) in the BioSQL Entity package need to
> implement Thing?  Would Maven have an issue with that, or would it just
> create a dependency on core? (You can tell I've never used Maven,
> right?)
>
> From a design point of view should Thing be an interface or an
> Annotation? The reason I ask is that it doesn't define any methods so
> it is more of a tag than an interface.
>
> Anyway, my understanding is that I would use a Genbank parser (or
> write one). Write an EntityReceiver interface (probably more than one
> given the number of entities in BioSQL), implement an EntityBuilder
> (again possibly more than one) that implements EntityReceiver and
> builds Entity beans from messages it receives. In this case I probably
> wouldn't provide a writer as JPA would be writing the beans to the
> database.  Would this be how you imagine it?
>
> - Mark
>
>
> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
> <[EMAIL PROTECTED]> wrote:
> > (From now on I will only be posting these development messages to
> > biojava-dev, which is the intended purpose of that list. Those of you who
> > wish to keep track of things but are currently only subscribed to
> > biojava-l should also subscribe to biojava-dev in order to keep up to
> > date.)
> >
> > As promised, I've committed a new package in the biojava-core module that
> > should help understand how to do file parsing and conversion and writing
> > in the new BJ3 modules. Here's an example of how to use it to write a
> > Genbank parser (note no parsers actually exist yet!):
> >
> > 1. Design yourself a Genbank class which implements the interface Thing
> > and can fully represent all the data that might possibly occur inside a
> > Genbank file.
> >
> > 2. Write an interface called GenbankReceiver, which extends ThingReceiver
> > and defines all the methods you might need in order to construct a
> > Genbank object in an asynchronous fashion.
> >
> > 3. Write a GenbankBuilder class which implements GenbankReceiver and
> > ThingBuilder. Its job is to receive data via method calls, use that data
> > to construct a Genbank object, then provide that object on demand.
> >
> > 4. Write a GenbankWriter class which implements GenbankReceiver and
> > ThingWriter. Its job is similar to GenbankBuilder, but instead of
> > constructing new Genbank objects, it writes Genbank records to file that
> > reflect the data it receives.
> >
> > 5. Write a GenbankReader class which implements ThingReader. It can read
> > Genbank files and output the data to the methods of the ThingReceiver
> > provided to it, which in this case could be anything which implements the
> > interface GenbankReceiver.
> >
> > 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
> > Genbank object and will fire off data from it to the provided
> > ThingReceiver (a GenbankReceiver instance) as if the Genbank object was
> > being read from a file or some other source.
> >
> > That's it! OK so it's a minimum of 6 classes instead of the original 1 or
> > 2, but the additional steps are necessary for flexibility in converting
> > between formats.
> >
> > Now to use it (you'll probably want a GenbankTools class to wrap these
> > steps up for user-friendliness, including various options for opening
> > files, etc.):
> >
> > 1. To read a file - instantiate ThingParser with your GenbankReader as
> > the reader, and GenbankBuilder as the receiver. Use the iterator methods
> > on ThingParser to get the objects out.
> >
> > 2. To write a file - instantiate ThingParser with a GenbankEmitter
> > wrapping your Genbank object, and a GenbankWriter as the receiver. Use the
> > parseAll() method on the ThingParser to dump the whole lot to your chosen
> > output.
> >
> > The clever bit comes when you want to convert between files. Imagine
> > you've done all the above for Genbank, and you've also done it for FASTA.
> > How to convert between them? What you need to do is this:
> >
> > 1. Implement all the classes for both Genbank and FASTA.
> >
> > 2. Write a GenbankFASTAConverter class that implements
> > ThingConverter<FASTA> and GenbankReceiver, and will internally convert the
> > data received and pass it on out to the receiver provided, which will be a
> > FASTAReceiver instance.
> >
> > 3. Write a FASTAGenbankConverter class that operates in exactly the
> > opposite way, implementing ThingConverter<Genbank> and FASTAReceiver.
> >
> > Then to convert you use ThingParser again:
> >
> > 1. From FASTA file to Genbank object: Instantiate ThingParser with a
> > FASTAReader reader, a GenbankBuilder receiver, and add a
> > FASTAGenbankConverter instance to the converter chain. Use the iterator
> > to get your Genbank objects out of your FASTA file.
> >
> > 2. From FASTA file to Genbank file: Same as option 1, but provide a
> > GenbankWriter instead and use parseAll() instead of the iterator methods.
> >
> > 3. From FASTA object to Genbank object: Same as option 1, but provide a
> > FASTAEmitter wrapping your FASTA object as the reader instead.
> >
> > 4. From FASTA object to Genbank file: Same as option 1, but swap both the
> > reader and the receiver as per options 2 and 3.
> >
> > 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
> > mentions of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >
> > One last and very important feature of this approach is that if you
> > discover that nobody has written the appropriate converter for your chosen
> > pair of formats A and C, but converters do exist to map A to some other
> > format B and that other format B on to C, then you can just put the two
> > converters A-B and B-C into the ThingParser chain and it'll work
> > perfectly.
> >
> > Enjoy!
> >
> > cheers,
> > Richard
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: [EMAIL PROTECTED]
> > http://www.eaglegenomics.com/
> > _______________________________________________
> > Biojava-l mailing list  -  [EMAIL PROTECTED]
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
>
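
As a rough illustration of the reader -> converter -> receiver chain described in the quoted message, here is a minimal, self-contained Java sketch. It does not use the real biojava-core types, whose method signatures are not shown in this thread. Everything here (FastaReceiver, RawReceiver, FastaReader, RawBuilder, FastaRawConverter, convertAll, and the two toy formats themselves) is a hypothetical stand-in, and convertAll() only plays the role that the message gives to ThingParser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ParserSketch {

    // Events for a toy FASTA-like format (hypothetical method names).
    interface FastaReceiver {
        void startRecord();
        void name(String name);
        void residues(String residues);
        void endRecord();
    }

    // Events for a second toy format that stores only a sequence string.
    interface RawReceiver {
        void startRecord();
        void sequence(String sequence);
        void endRecord();
    }

    // Reader: turns input lines into FastaReceiver calls, one record at a time.
    // Assumes exactly two lines per record: ">name" followed by the residues.
    static class FastaReader {
        private final Iterator<String> lines;
        FastaReader(List<String> lines) { this.lines = lines.iterator(); }
        boolean hasNext() { return lines.hasNext(); }
        void readNext(FastaReceiver receiver) {
            receiver.startRecord();
            receiver.name(lines.next().substring(1)); // strip leading '>'
            receiver.residues(lines.next());
            receiver.endRecord();
        }
    }

    // Builder: collects the events for one record and provides the result.
    static class RawBuilder implements RawReceiver {
        private String current;
        public void startRecord() { current = null; }
        public void sequence(String sequence) { current = sequence; }
        public void endRecord() { }
        String getRecord() { return current; }
    }

    // Converter: receives FASTA events and forwards equivalent raw events
    // to whatever RawReceiver it was given (a builder here, but it could
    // equally be a writer or another converter).
    static class FastaRawConverter implements FastaReceiver {
        private final RawReceiver next;
        FastaRawConverter(RawReceiver next) { this.next = next; }
        public void startRecord() { next.startRecord(); }
        public void name(String name) { /* raw format has no name: dropped */ }
        public void residues(String residues) { next.sequence(residues); }
        public void endRecord() { next.endRecord(); }
    }

    // Stand-in for the parser: wires reader -> converter -> builder and
    // pulls one converted record out per iteration.
    static List<String> convertAll(List<String> fastaLines) {
        RawBuilder builder = new RawBuilder();
        FastaReader reader = new FastaReader(fastaLines);
        FastaReceiver chain = new FastaRawConverter(builder);
        List<String> out = new ArrayList<String>();
        while (reader.hasNext()) {
            reader.readNext(chain);
            out.add(builder.getRecord());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(">seq1", "acgt", ">seq2", "ttga");
        System.out.println(convertAll(input)); // prints [acgt, ttga]
    }
}
```

Because a converter is just a receiver that forwards to another receiver, chaining A-B and B-C converters amounts to nesting them in the same way: the reader never needs to know what sits at the end of the chain.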


