Most of the LEADS contributions are available as open source. Some are integrated into larger projects, and all are available on the project's GitHub page.
LEADS open source contributions to existing projects
Several of our results were contributed directly to open-source projects and made available to the community. They are listed below.
Infinispan is a distributed and transactional key/value datastore. The LEADS fork of Infinispan can be found on GitHub, and the project's improvements are gradually being integrated into Infinispan's main branch. We list the contributions below:
- Ensemble: a library for federating several independent Infinispan clusters located on different sites and transparently offering a single view of the federated data store. Ensemble supports customizable geographical distribution and replication guarantees. This feature is available on the LEADS GitHub page along with a tutorial, and is being staged for inclusion in the Infinispan main branch.
- Atomic Object Factory: an efficient, scalable and dependable implementation of the distributed object paradigm in Infinispan, based on state-machine replication. This feature is now part of the Infinispan main branch. Check our announcements on this site and on the Infinispan blog. A usage example is available on this page.
- Support for Apache Avro: data serialization using Apache Avro. This feature is available on the LEADS GitHub page along with a tutorial, and is being staged for inclusion in the Infinispan main branch. Check our announcements on this site and on the Infinispan blog.
- Support for Apache Gora: allows using the Apache Gora APIs with Infinispan as the persistent storage system. This feature is available on the LEADS GitHub page along with a tutorial, and is being staged for inclusion in the Infinispan main branch. Check our announcements on this site and on the Infinispan blog.
- Support for versioning: LEADS contributed efficient mechanisms to store and retrieve data kept under multiple versions. This feature is available in the LEADS Infinispan fork. The principles were described in an IEEE SRDS 2014 paper.
- Clustered listeners and Continuous queries: LEADS contributed two important improvements for handling cluster-wide event notifications and tracking changes in near real time. Support for Clustered listeners is part of the Infinispan main branch since version 7 (see Infinispan blog post), and support for Continuous queries since version 8 (see Infinispan blog post).
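The state-machine replication principle mentioned for the Atomic Object Factory can be illustrated with a minimal sketch. It is not the Infinispan or Atomic Object Factory API: the `Replica` class, the `replicate` function and the operation names are hypothetical, and the totally ordered log is simply assumed to exist. The point is only that replicas applying the same deterministic operations in the same order end up in the same state.

```python
# Illustrative sketch only: NOT the Atomic Object Factory API.
# Replica, replicate and the operation names are hypothetical.


class Replica:
    """Holds a local copy of a counter object and applies logged operations."""

    def __init__(self):
        self.value = 0
        self.applied = 0  # number of log entries applied so far

    def apply(self, op, arg):
        # Operations must be deterministic so that all replicas converge.
        if op == "add":
            self.value += arg
        elif op == "reset":
            self.value = 0
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied += 1


def replicate(log, replicas):
    """Apply the not-yet-applied suffix of the ordered log to each replica."""
    for replica in replicas:
        for op, arg in log[replica.applied:]:
            replica.apply(op, arg)


# Assumption: some ordering protocol has already agreed on this log.
log = [("add", 5), ("add", 2), ("reset", None), ("add", 7)]
replicas = [Replica() for _ in range(3)]
replicate(log, replicas)
values = [r.value for r in replicas]
```

After `replicate` returns, every entry of `values` is identical, which is the property a replicated object relies on.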
Hibernate Search is an indexing and querying library offering advanced search capabilities, including full-text search. It also offers advanced clustering capabilities and high performance. LEADS made many improvements to Hibernate Search to extend its feature set and to use it as part of the LEADS search engine. Hibernate Search is also the search engine of Infinispan.
Hibernate OGM is an Object Mapper for NoSQL solutions, and in particular for Infinispan. It has been used in LEADS, and various improvements have been contributed to the project.
Apache ZooKeeper is a coordination kernel for data center applications. LEADS contributed the ZooFence library, which provides a global-scale coordination service by partitioning the service and aggregating several independent ZooKeeper instances running on multiple, geographically distant sites. ZooFence does not require any modification to ZooKeeper itself and only uses its standard API. ZooFence is available on the LEADS GitHub along with a tutorial. The principles of ZooFence were presented in an IEEE SRDS 2014 paper.
OpenStack is the reference open-source platform, in particular for IaaS installations. The LEADS project contributed automated scaling mechanisms that scale the VMs supporting an application both vertically and horizontally. This software and its documentation are available on the LEADS GitHub and require the UniMon framework (see below).
LEADS independent open source contributions
In addition to the direct contributions to open-source projects listed above, the LEADS project contributed the following stand-alone components to the community.
Multi-cloud efficient crawling with Unicrawl
Unicrawl is a geo-distributed crawler that orchestrates several geographically distributed sites. Each site operates an independent crawler and relies on well-established techniques for fetching and parsing the content of the web. Unicrawl splits the crawled domain space across the sites and federates their storage and computing resources, while minimizing the inter-site communication cost. Unicrawl builds upon Apache Nutch, Apache Gora, Apache Hadoop and LEADS-enhanced Infinispan. Unicrawl and a usage tutorial are available on the LEADS GitHub. A paper presenting Unicrawl appeared at IEEE Cloud 2015.
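As an illustration of what splitting the domain space across sites can look like, here is a minimal sketch that assigns each URL to a site by hashing its host name. This is an assumption made for the example, not a description of Unicrawl's actual partitioning scheme; the site names and the `site_for` and `partition` functions are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical site names, used only for this example.
SITES = ["site-a", "site-b", "site-c"]


def site_for(url, sites=SITES):
    """Return the site responsible for the host of the given URL."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer hash.
    return sites[int.from_bytes(digest[:8], "big") % len(sites)]


def partition(urls, sites=SITES):
    """Group URLs by responsible site so each site crawls its own share."""
    assignment = {site: [] for site in sites}
    for url in urls:
        assignment[site_for(url, sites)].append(url)
    return assignment


urls = [
    "http://example.org/a",
    "https://example.org/b?x=1",
    "http://example.com/",
]
assignment = partition(urls)
```

Because all pages of a host map to the same site, links within a domain can be followed without contacting another site, which is one simple way to keep inter-site traffic low.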
Efficiently supporting MapReduce jobs over multiple clouds with Multi-Cloud MapReduce
MapReduce is the reference approach for Big Data processing. In LEADS, the data is stored on multiple sites (Infinispan instances aggregated by Ensemble). The project developed a new approach to efficiently support MapReduce jobs in such a distributed environment. This contribution and a link to installation and usage tutorials are available on the LEADS GitHub.
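The text above does not detail the project's approach, so the following is only a generic sketch of one common pattern for multi-site MapReduce: each site maps and pre-aggregates its local data, and only the compact partial results cross site boundaries for the final reduce. The site names, the sample documents and the `map_and_combine` and `global_reduce` functions are hypothetical and do not come from the LEADS code base.

```python
from collections import Counter


def map_and_combine(documents):
    """Run the map and local combine steps on one site's documents (word count)."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts


def global_reduce(partials):
    """Merge the per-site partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical data held at two sites.
site_data = {
    "site-a": ["infinispan stores data", "data is replicated"],
    "site-b": ["ensemble federates infinispan clusters"],
}

# Only these per-site summaries would need to travel between sites.
partials = [map_and_combine(docs) for docs in site_data.values()]
result = global_reduce(partials)
```

In this sketch the raw documents never leave their site; only the per-site `Counter` summaries are shipped to the final reduce step.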
Multi-cloud monitoring with the UniMon framework
Monitoring the performance and costs of multiple clusters at different sites in real time can be a cumbersome process. To ease this task, the LEADS project contributes the UniMon framework, which orchestrates the monitoring of multiple clouds from a single tool. The tool and instructions for using it are available on the LEADS GitHub.
Other contributions and tools
Note that the LEADS GitHub page contains several other projects that can be of interest to the community but target more experienced users of our tools with specific needs (e.g. specific deployment contexts). Do not hesitate to drop us a line if you have a request, cannot find what you are looking for, or are interested in collaborating on future developments.