
Ward Weistra

Bioinformatics Project Manager

@TheHyveNL

With the growing popularity of the tranSMART data mart and analysis platform, the quality standards for all its modules are continuously rising. The ETL framework (Extract, Transform, Load; the part that reshapes and uploads the data) has, since the conception of the open source tool, been developed as a combination of shell scripts, Kettle code and stored procedures. However, this leaves room for improvement on a few fronts:

  • There is little Kettle experience in the community, and most Java developers are not familiar with debugging Kettle code.
  • There are no tests in the current ETL pipeline, and the current setup makes creating them very difficult.
  • The stored procedures are maintained separately for each supported database (currently Oracle and Postgres). This leaves a lot of room for the two sets to diverge, since developers most often work on only one of the two databases.

Starting from these action points, we have begun developing a new ETL framework whose code is familiar to current tranSMART developers, and is therefore easy to debug, and which renders the stored procedures obsolete. Funding for this effort is provided by IMI EMIF, Janssen and CTMM TraIT.

The framework is based on Spring Batch. Spring is familiar to tranSMART developers, as the Spring framework is a component of Grails, the web application framework on which tranSMART is built.
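For readers who have not worked with Spring Batch, the following is a minimal sketch of the chunk-oriented read, process, write pattern that it is built around. It is not code from transmart-batch and does not use Spring at all: the `Reader`, `Processor` and `Writer` interfaces are hypothetical stand-ins for Spring Batch's `ItemReader`, `ItemProcessor` and `ItemWriter`, and the trim-and-uppercase transform is only a placeholder for a real transformation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Plain-Java sketch of chunk-oriented processing; all names are hypothetical. */
class ChunkSketch {

    /** Supplies one item per call, or null when the input is exhausted. */
    interface Reader<T> { T read(); }

    /** Transforms one item; returning null filters the item out. */
    interface Processor<I, O> { O process(I item); }

    /** Receives a whole chunk at once, e.g. for one batched database insert. */
    interface Writer<O> { void write(List<O> chunk); }

    /** Reads, processes and writes items in chunks of at most chunkSize (assumed > 0). */
    static <I, O> void runStep(Reader<I> reader, Processor<I, O> processor,
                               Writer<O> writer, int chunkSize) {
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            O result = processor.process(item);
            if (result == null) {
                continue; // item was filtered out
            }
            chunk.add(result);
            if (chunk.size() == chunkSize) {
                writer.write(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk); // flush the final, partially filled chunk
        }
    }

    /** Runs the step over in-memory lines and returns the chunks that were written. */
    static List<List<String>> run(List<String> lines, int chunkSize) {
        Iterator<String> it = lines.iterator();
        List<List<String>> written = new ArrayList<>();

        Reader<String> reader = () -> it.hasNext() ? it.next() : null;
        // Placeholder transform: drop blank lines, trim and uppercase the rest.
        Processor<String, String> processor =
            line -> line.trim().isEmpty() ? null : line.trim().toUpperCase();
        // A real writer would insert into the database; here we only record chunks.
        Writer<String> writer = written::add;

        runStep(reader, processor, writer, chunkSize);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("subj1,age,34", " ", "subj2,age,41", "subj3,age,29"), 2));
    }
}
```

Because each step is an ordinary object, it can be run against in-memory data in a unit test and stepped through in a regular Java debugger, which is difficult to do with logic that lives in stored procedures.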

Currently, the upload of clinical data, including across-trial support, has been implemented in the transmart-batch project, which can be found in our GitHub repository at https://github.com/thehyve/transmart-batch.

On Tuesday, November 18th, our senior developer Gustavo Lopes, who has been the main developer on the project together with our colleague Carlos Silva, gave a presentation on transmart-batch, introducing both his colleagues and the wider community to the effort. The screen capture of this presentation is on YouTube, embedded below.

We welcome everyone to check out this GitHub project, test it, and contribute to extending it and getting it production ready. We are eager to hear your thoughts on transmart-batch in the comments below.

One Response to The Hyve presents on the new transmart-batch ETL framework

  • Janneke Schoots

    Great work! As an ‘ETL user’ I’m really looking forward to actually using this. Keep up the good work.

