Simplify Data Transfer from MongoDB to PostgreSQL

Purrito is a free, fast and customizable ETL tool for automating data transfer.

Our Story

The Beginning

Booster is a company that has been making people’s lives easier and more convenient for the past 4 years. The idea is simple: you order a boost via the mobile app and you get high-quality gas at your parking spot while you are away. A wonderful team of engineers makes all this possible.

In the beginning, we had to choose the DBMS around which we would build Booster: we needed flexibility, high availability and easy development. MongoDB offered all of these for free. To make our loyal users happy, we constantly improve our applications, which sometimes changes our data structures. MongoDB handles these situations well and lets us focus on other problems.

Booster quickly became part of people’s everyday lives in Silicon Valley, resulting in rapid growth that has not stopped since.

Need for PostgreSQL

Just like many other companies, we recognized the importance of data analysis as we grew. Data analysis helps us increase our productivity and make better decisions for the future. We needed the ability to perform joins, which meant that we needed a relational system. Our data engineers were mostly familiar with SQL, so we just needed to pick an RDBMS for our data warehouse.

Finding the right RDBMS is not an easy task and there are many database vendors to choose from. We wanted to keep our costs as low as possible.

We chose Postgres since it offered the most important things we were looking for:

object-relational
fast
flexible (the JSON support lets us go schema-less whenever we want to)
free

Adventures with MoSQL

We used an ETL tool called MoSQL to transfer our data from MongoDB to Postgres. MoSQL was created by Stripe and announced in 2013: free, open-source and easy to use. It transfers all of the collections defined in a collection map and syncs with MongoDB for changes. We loved MoSQL, but unfortunately it is no longer maintained. In 2018 we faced multiple issues: MoSQL crashed on schema changes and on connectivity issues. As a result, we had to reimport millions of documents from scratch on a weekly basis. Before each reimport started, MoSQL dropped all of the tables, so we could not generate reports or plan for the coming days and weeks when we needed to. A full reimport took more than 5 hours. As time passed, the issues became more and more frequent, and the error messages we were left with did not describe what went wrong. We recognized the need to take things into our own hands.

A Solution that Works

We needed a service our analytics team could rely on. We created Purrito to do what MoSQL did right and to handle the issues MoSQL could no longer handle. We did not want to make drastic changes in our app logic because we wanted the transition to be as smooth as possible. We kept the collection map and generated output similar to what MoSQL generates. This way we could avoid refactoring most of our SQL queries and integrate our solution with only a few changes. We also kept in mind the connectivity issues MoSQL could not always handle.
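To make the idea of a collection map more concrete, here is a minimal Python sketch. The map layout, the collection and field names, and the to_row helper are all assumptions made for illustration; they are not the actual format used by Purrito or MoSQL.

```python
from typing import Any, Dict

# Hypothetical collection map: MongoDB collection -> target table and columns.
# The layout and all names below are assumptions made for this example.
COLLECTION_MAP = {
    "orders": {
        "table": "orders",
        "columns": {
            # MongoDB field -> (PostgreSQL column, PostgreSQL type)
            "_id": ("id", "TEXT"),
            "customerId": ("customer_id", "TEXT"),
            "gallons": ("gallons", "DOUBLE PRECISION"),
        },
    },
}


def to_row(collection: str, document: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the mapped fields and rename them to their column names."""
    columns = COLLECTION_MAP[collection]["columns"]
    return {
        column: document.get(field)  # missing fields become None (SQL NULL)
        for field, (column, _pg_type) in columns.items()
    }


doc = {"_id": "abc123", "customerId": "c42", "gallons": 12.5, "note": "unmapped"}
print(to_row("orders", doc))
```

Fields that are not listed in the map (such as note above) are simply ignored, which is why the SQL queries written against the mapped tables can stay unchanged when the documents gain new fields.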

Comparison: Purrito vs MoSQL

We decided to extend Purrito with some additional features:

transfer data to a custom schema
generate the schema based on MongoDB collections
when restarting, update, drop or truncate the existing tables, or create or empty the whole schema
in case of connectivity issues, wait for a while, attempt to reconnect to PostgreSQL or MongoDB, and start tailing the oplog again from the timestamp saved in the database
update the collection map without dropping or stopping anything
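The reconnect behavior in the list above can be sketched roughly as follows. This is not Purrito's implementation: the tail_with_resume function, its parameters, and the modeling of an oplog entry as a (timestamp, payload) pair are assumptions for illustration, and a real service would persist the checkpoint in the database instead of keeping it in a local variable.

```python
import time
from typing import Callable, Iterable, Tuple

# An oplog entry is modeled here as a (timestamp, payload) pair (an assumption).
Entry = Tuple[int, str]


def tail_with_resume(
    read_from: Callable[[int], Iterable[Entry]],
    apply: Callable[[Entry], None],
    checkpoint: int,
    max_retries: int = 5,
    wait_seconds: float = 1.0,
) -> int:
    """Apply entries newer than `checkpoint`, reconnecting on failure.

    `read_from(ts)` must yield the entries whose timestamp is greater than ts.
    Returns the timestamp of the last entry that was applied.
    """
    retries = 0
    while True:
        try:
            for entry in read_from(checkpoint):
                apply(entry)
                checkpoint = entry[0]  # a real service would persist this
                retries = 0  # progress was made, so reset the retry budget
            return checkpoint
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise  # give up after too many consecutive failures
            time.sleep(wait_seconds)  # wait, then resume from the checkpoint
```

Because the checkpoint only advances after an entry has been applied, a dropped connection means reading again from the last applied timestamp instead of reimporting everything from the beginning.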

	Purrito	MoSQL
Transfer time (approx. 50 GB of data, 10,000,000 rows)	~ 3.5 h	~ 5 h
Size	x MB	y MB
Schema changes coming from MongoDB	logs an error and continues tailing	-
Database unavailable	tries to reconnect to the database	-
Schema changes	update the collection map in the database	update the collection map file and restart
Tailing	tailing from any date and time from the past or from the current timestamp	tail from the current timestamp
Types	create a collection map or choose auto typecheck	create a collection map
Written in	Python	Ruby
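The table mentions an auto typecheck option. One way such type inference could work is sketched below in Python. This is an assumption about the general technique, not a description of how Purrito does it: the pg_type and infer_columns helpers and the chosen type mapping are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Iterable


def pg_type(value: Any) -> str:
    """Map a decoded MongoDB value to a PostgreSQL type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    if isinstance(value, datetime):
        return "TIMESTAMP"
    if isinstance(value, (dict, list)):
        return "JSONB"  # nested documents and arrays stay schema-less
    return "TEXT"  # strings and anything else fall back to text


def infer_columns(documents: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Infer one column type per field; conflicting types fall back to TEXT."""
    columns: Dict[str, str] = {}
    for document in documents:
        for field, value in document.items():
            if value is None:
                continue  # a null carries no type information
            inferred = pg_type(value)
            if columns.setdefault(field, inferred) != inferred:
                columns[field] = "TEXT"
    return columns
```

Falling back to TEXT when a field changes type between documents is one simple way to keep a transfer running after a schema change in MongoDB; other policies, such as widening BIGINT to DOUBLE PRECISION, are equally possible.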

Ready to get started?

Learn everything you need to know about Purrito.

Quickstart

Follow the instructions to get started quickly.

Docker

We created an example to help you get started even quicker.

TRY

Docs

Take a look at our docs to find out more about Purrito.

READ

Help Purrito become purrfect

This is just the beginning for Purrito. No tool is perfect but we would love to improve what we created and help other teams just like MoSQL helped us. We are always open to new ideas and suggestions.

Want to contribute?

Contributing to an open-source project is a great opportunity to help the community and increase your knowledge. Purrito is open-source and free and we would like to keep it that way.

Have you implemented a new feature? Fixed an issue or just a typo in our documentation? We would be happy to see your pull request on GitHub! Check out our guidelines on how to contribute.

Contributors

By improving Purrito you will hold a special place in our hearts, but we also keep a spot on our page for those who contribute the most. Click here to meet the contributors. We hope that you are the next one we can add here!

Found any issues?

Purrito is actively maintained by Booster. Head over to GitHub and help the community by opening an issue.

Contact

Got an idea on how to improve Purrito?

We are always open to new ideas and suggestions that can help us become better and more efficient. Don't hesitate to give us a shout at dev@boosterfuels.com.