Our Blog

Ongoing observations by End Point people

MongoDB and OpenStack - OSI Days 2014, India

By Selvakumar Arumugam · Wednesday, November 19, 2014

Tags: big-data, cloud, conference, mongodb, open-source

The 11th edition of Open Source India, 2014 was held in Bengaluru, India. The two-day conference featured three parallel tracks of tech talks and workshops spread across various open source technologies.


## In-depth look at Architecting and Building solutions using MongoDB

Aveekshith Bushan & Ranga Sarvabhouman from MongoDB started the session by comparing the hardware cost of storage systems in earlier days and today. Earlier, storage hardware was very expensive, so the approach was to filter the data to reduce its size before storing it in the database. Results could only be generated from the filtered data, and there was no option to process the source data. Now that storage has become cheap, we can store the raw data first, then do all the filtering and processing, and then distribute it.


        Earlier: Filter -> Store -> Distribute


        Now:     Store -> Filter -> Distribute
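The two orderings above can be sketched in a few lines of Python. This is only an illustration of the idea from the talk, not anything MongoDB-specific: the event records, the `is_error` filter and the function names are all hypothetical.

```python
# Hypothetical sample data: a few log-like events.
events = [
    {"level": "info", "msg": "started"},
    {"level": "error", "msg": "disk full"},
    {"level": "warn", "msg": "slow query"},
]


def is_error(record):
    """Example filter (an assumption): keep only error events."""
    return record["level"] == "error"


def filter_then_store(records, keep):
    """Older approach: only the filtered records reach the store."""
    return [r for r in records if keep(r)]


def store_then_filter(records, keep):
    """Newer approach: store everything, apply the filter when reading."""
    store = list(records)
    return store, [r for r in store if keep(r)]


old_store = filter_then_store(events, is_error)
raw_store, errors = store_then_filter(events, is_error)

# A question asked later ("show the warnings") can only be answered
# from the raw store, because the old store already dropped them.
warnings_old = [r for r in old_store if r["level"] == "warn"]
warnings_new = [r for r in raw_store if r["level"] == "warn"]
print(len(warnings_old), len(warnings_new))
```

Both approaches return the same error events, but only the store-first variant still holds the source data when a new question comes up.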

When the raw data is stored first, we end up storing a huge amount of data, so we need a processing system that can handle and analyse it efficiently. Today data is growing rapidly, and the 3Vs characterise this growth of (Big) Data: we need to handle a huge Volume and Variety of data arriving at high Velocity. MongoDB takes certain approaches to meet these requirements.
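One of the things MongoDB does, as the session went on to explain, is keep an entity and its related data together in a single document instead of spreading it over several tables. The sketch below is only an illustration using plain Python dicts, not the MongoDB driver. The names (`users`, `user_documents`, `load_user_relational`, `load_user_document`) are assumptions made for this example, and the lookup counter is a loose stand-in for disk seeks.

```python
# Relational-style layout (hypothetical data): one entity spread over
# three "tables", linked by user_id.
users = {1: {"name": "Asha"}}
addresses = [{"user_id": 1, "city": "Bengaluru"}]
orders = [{"user_id": 1, "item": "book"}, {"user_id": 1, "item": "pen"}]

# Document-style layout: the same entity kept as one nested document.
user_documents = {
    1: {
        "name": "Asha",
        "addresses": [{"city": "Bengaluru"}],
        "orders": [{"item": "book"}, {"item": "pen"}],
    }
}


def load_user_relational(user_id):
    """Assemble the entity with one lookup per table."""
    lookups = 0
    user = dict(users[user_id])
    lookups += 1
    user["addresses"] = [
        {"city": a["city"]} for a in addresses if a["user_id"] == user_id
    ]
    lookups += 1
    user["orders"] = [
        {"item": o["item"]} for o in orders if o["user_id"] == user_id
    ]
    lookups += 1
    return user, lookups


def load_user_document(user_id):
    """Fetch the whole entity with a single lookup."""
    return user_documents[user_id], 1


relational_user, relational_lookups = load_user_relational(1)
document_user, document_lookups = load_user_document(1)
print(relational_lookups, document_lookups)
```

Both functions return the same entity. The document layout gets there with one lookup at the price of keeping related data, possibly duplicated, inside the document. Real seek counts depend on the storage engine and caching, so the counter is only indicative.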

MongoDB simply stores the data as a document without any data type constraints, which helps to store huge amounts of data quickly. It leaves constraint checks to the application level to increase storage speed at the database end. It does, however, recognise the data type once the data is stored as a document. In simple words, the philosophy is: why do we need to check the same things (data type or other constraints) in two places (application and database)?

MongoDB can store an entity together with its related data as a single document and fetch it in a single disk seek. Avoiding multiple disk seeks makes retrieval fast. In a relational database, by contrast, the relations are stored in different tables, which leads to multiple disk seeks to retrieve the complete data of an entity. MongoDB doesn’t support joins, but it has a reference option to refer to another collection (table) without imposing foreign key constraints.

As per the [db-engines](http://db-engines.com/en/ranking) rankings, MongoDB stays at the top of the NoSQL database world. It also provides certain key features, which I have noted from the session:

- Sub-documents duplicate the data, but they help to gain performance (since storage is cheap, the duplication doesn’t affect much)
- Auto-sharding (Scalability)
  - Sharding helps parallel access to the system
  - Range Based Sharding
- Replica Sets (High availability)
- Secondary indexes available
  - Indexes are the single tunable part of the MongoDB system
- Partition across systems
- Rolling upgrades
- Schema free
- Rich document based queries
- Read from secondary

When do you need MongoDB?

- The data grows beyond the system capacity of a relational database
- Performance is needed for online requests

Finally, the speakers emphasised understanding the use case clearly and choosing the right features of MongoDB to get effective performance.

## OpenStack Mini Conf

A special half-day OpenStack mini conference was organised in the second half of the first day.
The talks ranged from the basics to in-depth coverage of the OpenStack project. I have summarised all the talks here to give an idea of the OpenStack software platform.

OpenStack is an open source cloud computing platform for provisioning Infrastructure as a Service (IaaS). There is a wonderful project, [DevStack](http://docs.openstack.org/developer/devstack/), for setting up OpenStack in a development environment in the easiest and fastest way. The well-written [documentation](http://docs.openstack.org/) of the OpenStack project explains everything clearly. In addition, anyone can contribute to OpenStack with the help of the [How to contribute](https://wiki.openstack.org/wiki/How_To_Contribute) guide. The project uses the Gerrit review system and the Launchpad bug tracking system.

OpenStack has multiple components that provide the various features of Infrastructure as a Service. Here is the list of OpenStack components and the purpose of each one.

- Nova (Compute) - manages the pool of computer resources
- Cinder (Block Storage) - provides storage volumes to machines
- Neutron (Network) - manages the networks and IP addresses
- Swift (Object Storage) - provides distributed, highly available (replicated) storage
- Glance (Image) - provides a repository to store disk and server images
- Keystone (Identity) - enables a common authentication system across all components
- Horizon (Dashboard) - provides a GUI for users to interact with OpenStack components
- Ceilometer (Telemetry) - provides service usage and billing reports
- Ironic (Bare Metal) - provisions bare metal instead of virtual machines
- Sahara (Map Reduce) - provisions Hadoop clusters for big data processing

OpenStack services are often mapped to AWS services to better understand the purpose of each component. The following table depicts the mapping of similar services in OpenStack and AWS:

| OpenStack | AWS |
| --- | --- |
| Horizon (Dashboard) | AWS Console |
| Sahara (Map Reduce) | Elastic MapReduce |

Along with the overview of the OpenStack architecture, there were a couple of in-depth talks, which are listed below with slides.

- Neependra Khare from Red Hat gave a presentation on using Docker in OpenStack Nova.
- Pushpesh Sharma presented a comparison between the OpenStack storage component Swift and Ceph
- Sridhar Rao presented on OpenStack and its role in Network Function Virtualisation (NFV) - Slides

That was a wonderful Day One of OSI 2014, which helped me get a better understanding of MongoDB and OpenStack.