MongoDB utility to inflate the contents of small collection to a new larger collection

Overview

MongoDB Data Inflater ("data-inflater")

The data-inflater tool is a MongoDB utility to automate the creation of a new large database collection using data sourced from an existing smaller database collection.

By default, the utility will use the Atlas 'sample data set' database collection sample_mflix.movies as the source collection. However, most users will provide parameters to the utility to specify the use of their own database and source collection. If you do want to use the Atlas sample data set, see the sample data manual page for more information.

The data-inflater utility issues multiple concurrent aggregation processes, each copying batches of records in parallel for increased performance. The resulting collection will contain documents with duplicated data but with new unique _id field values. The variance ratio of data in the new collection will approximately reflect the variance ratio of the source collection. Therefore, you should ensure you have supplied at least a few different documents (if not a few hundred or thousand) in the source collection.

If you are running a sharded cluster, the utility will ensure the target collection is sharded with a shard key, and where it can, it will pre-split the chunks to avoid subsequent needless balancer overhead. For example, if you specify the --shardkey parameter for this utility to reference a field (e.g. product_name) as the range based shard key, before creating the target collection, the utility will introspect the spread of values for the shard key field (e.g. product_name). The utility will then create pre-split chunks in the new empty target collection before any data is copied to it, to maximise performance.

How To Run

In a running MongoDB cluster (self-managed or running in Atlas), ensure you have created and populated a source collection with at least one sample record in it (ideally more with varying values for the fields across the different documents to reflect the shape and variance you desire).

Ensure Python3 (version 3.8 or greater) and the MongoDB Python Driver (PyMongo) are already installed on your workstation. Example to install PyMongo:

pip3 install --user pymongo

Ensure the .py script is executable and then execute the following to view the utility's help instructions and the full list of parameters that you can provide:

./data-inflater.py -h

Execute the following to connect to a locally running single server database (default port) to copy and expand the data from an existing source collection, mydb.mySrcColl, to an a new collection, mydb.myDestColl, which will contain 1 million records:

./data-inflater.py --url 'mongodb://localhost:27017' -d 'mydb' -c 'mySrcColl' -t 'myDestColl' -s 1000000

Execute the following to connect to an Atlas cluster (ensure you've already loaded the Atlas sample data set), to inflate the data from the source movies collection to the new movies_big collection, which will contain 100 million records (note, first change the URL username, password and hostname shown, to match the URL of your Atlas cluster):

./data-inflater.py --url 'mongodb+srv://usr:[email protected]/'
Owner
Paul Done
Paul Done
Random Number Generator Analysis With Python

Random-Number-Generator-Analysis Governor's Honors Program Project to determine

Jack Prewitt 2 Jan 23, 2022
Generate random german words

Generate random german words / Generiere zufällige deutsche Wörter Getting Started Pip install with pip install zufallsworte Install the library with

Maximilian Freitag 5 Mar 24, 2022
Python type-checker written in Rust

pravda Python type-checker written in Rust Features Fully typed with annotations and checked with mypy, PEP561 compatible Add yours! Installation pip

wemake.services 31 Oct 21, 2022
Helpful functions for use alongside the rich Python library.

🔧 Rich Tools A python package with helpful functions for use alongside with the rich python library. 󠀠󠀠 The current features are: Convert a Pandas

Avi Perl 14 Oct 14, 2022
Daiho Tool is a Script Gathering for Windows/Linux systems written in Python.

Daiho is a Script Developed with Python3. It gathers a total of 22 Discord tools (including a RAT, a Raid Tool, a Nuker Tool, a Token Grabberr, etc). It has a pleasant and intuitive interface to faci

AstraaDev 32 Jan 05, 2023
Random Number Generator

Application for generating a random number.

Michael J Bailey 1 Oct 12, 2021
A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis.

Jupyter Pylinter A simple tool to extract python code from a Jupyter notebook, and then run pylint on it for static analysis. If you find this tool us

Edmund Goodman 10 Oct 13, 2022
Utility to add/remove licenses to/from source files

Utility to add/remove licenses to/from source files. Supports processing any combination of globs, files, and directories (recurse). Pruning options allow skipping non-licensing files.

Eduardo Ponce Mojica 2 Jan 29, 2022
A program to convert celcius to faranheit. made with python

Temp-Converter What is Temp-Converter Temp-Converter is little program made with pyhton to convert celcius to faranheit. Needed A python interpreter P

Chandula Janith 0 Nov 27, 2021
Helper script to bootstrap a Python environment with the tools required to build and install packages.

python-bootstrap Helper script to bootstrap a Python environment with the tools required to build and install packages. Usage $ python -m bootstrap.bu

Filipe Laíns 7 Oct 06, 2022
WindowsDebloat - Windows Debloat with python

Windows Debloat 🗑️ Quickly and easily configure Windows 10. Disclaimer I am NOT

1 Mar 26, 2022
Python program for analyzing the output files of phonopy.

PhononTools Description Python program to analyze the results generated by phonopy. Using the .yaml and .dat files that phonopy generates one can plot

Harry LaBollita 8 Nov 27, 2022
Dill_tils is a package that has my commonly used functions inside it for ease of use.

DilllonB07 Utilities Dill_tils is a package that has my commonly used functions inside it for ease of use. Installation Anyone can use this package by

Dillon Barnes 2 Dec 05, 2021
Spacegit is a .git exposed finder

Spacegit Spacegit is a basic .git exposed finder Usage: You need python3 installed to run spacegit use: python3 spacegit.py (url) Disclaimer: **This i

2 Nov 30, 2021
Blender 2.93 addon for loading Quake II MD2 files

io_mesh_md2 is a Blender 2.93 addon for importing Quake II MD2 files.

Joshua Skelton 11 Aug 31, 2022
A python module to validate input.

A python module to validate input.

Matthias 6 Sep 13, 2022
Michael Vinyard's utilities

Install vintools To download this package from pypi: pip install vintools Install the development package To download and install the developmen

Michael Vinyard 2 May 22, 2022
A python lib for generate random string and digits and special characters or A combination of them

A python lib for generate random string and digits and special characters or A combination of them

Torham 4 Nov 15, 2022
Manage your exceptions in Python like a PRO

A linter to manage all your python exceptions and try/except blocks (limited only for those who like dinosaurs).

Guilherme Latrova 353 Dec 31, 2022
Utility to extract Fantasy Grounds Unity Line-of-sight and lighting files from a Univeral VTT file exported from Dungeondraft

uvtt2fgu Utility to extract Fantasy Grounds Unity Line-of-sight and lighting files from a Univeral VTT file exported from Dungeondraft This program wo

Andre Kostur 29 Dec 05, 2022