
MongoDB import #542

Open
cwdabba opened this issue Nov 15, 2022 · 5 comments
Labels
enhancement New feature or request

Comments


cwdabba commented Nov 15, 2022

Hi, thank you very much for the library; it's really useful. I tried importing the countries, states & cities tables into MongoDB, but now I end up with both an _id and an id column.

Will that cause any issues? Or should I rename the id column to _id before importing?

Any suggestions would help me a lot.

dr5hn (Owner) commented Nov 24, 2022

I would suggest keeping the id column as is.
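
If you keep the id column next to MongoDB's auto-generated _id, it can also help to put a unique index on id so lookups by the original ids stay fast and accidental re-imports fail loudly. A minimal sketch using pymongo; the connection string, database name, and collection names below are assumptions, so adjust them to your setup:

from pymongo import MongoClient, ASCENDING

# Assumed local MongoDB and database name; replace with your own.
client = MongoClient('mongodb://localhost:27017')
db = client['world']

# A unique index on the imported `id` field lets you query by the original
# ids and rejects duplicates, while MongoDB keeps managing its own `_id`.
for name in ('countries', 'states', 'cities'):
    db[name].create_index([('id', ASCENDING)], unique=True)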


aprilmintacpineda commented Nov 17, 2023

For those looking to import this into their MongoDB, I wrote a script that can do that (using Prisma):

import { Country, PrismaClient, State } from '@prisma/client';
import axios from 'axios';

/**
 * We want to group items so that we can limit the number of
 * asynchronous requests we send to the database
 */
function group<T = any>(items: T[], max: number) {
  const grouped: T[][] = [];

  do {
    if (items.length > max) grouped.push(items.splice(0, max));
    else grouped.push(items.splice(0));
  } while (items.length > 0);

  return grouped;
}

async function populate() {
  const client = new PrismaClient();
  await client.$connect();
  const maxItems = 20;

  async function populateCountries(_countries: any[]) {
    const countriesGroups = group(_countries, maxItems);

    for (const countries of countriesGroups) {
      await Promise.all(
        countries.map(async _country => {
          const numericCode = String(_country.numeric_code);

          const country = await client.country.upsert({
            where: { numericCode },
            create: {
              name: _country.name,
              iso2: String(_country.iso2),
              iso3: String(_country.iso3),
              numericCode,
              phoneCode: String(_country.phone_code)
            },
            update: {
              name: _country.name,
              phoneCode: String(_country.phone_code)
            }
          });

          await populateStates(_country.states, country);
        })
      );
    }

    console.log('Upserted countries');
  }

  async function populateStates(_states: any[], country: Country) {
    const statesGroups = group(_states, maxItems);

    for (const states of statesGroups) {
      await Promise.all(
        states.map(async (_state: any) => {
          const ref = String(_state.id);

          const state = await client.state.upsert({
            where: { ref },
            create: {
              name: _state.name,
              ref,
              country: {
                connect: {
                  id: country.id
                }
              }
            },
            update: {
              name: _state.name
            }
          });

          await populateCities(_state.cities, state);
        })
      );
    }

    console.log('Upserted states for', country.name);
  }

  async function populateCities(cities: any[], state: State) {
    await client.$transaction(
      cities.map(city => {
        const ref = String(city.id);

        return client.city.upsert({
          where: { ref },
          create: {
            name: city.name,
            ref,
            state: {
              connect: {
                id: state.id
              }
            }
          },
          update: {
            name: city.name
          }
        });
      })
    );

    console.log('Upserted cities for', state.name);
  }

  try {
    const { data } = await axios.get<any>(
      'https://raw.githubusercontent.com/dr5hn/countries-states-cities-database/master/countries%2Bstates%2Bcities.json'
    );

    const start = Date.now();
    await populateCountries(data);
    console.log('took', (Date.now() - start) / 1000);
  } catch (error) {
    console.error(error);
  } finally {
    // Disconnect on success as well as on failure so the process can exit cleanly.
    await client.$disconnect();
  }
}

populate();

This will also handle updates. I added a throttle so you don't overload your database with all the async requests the script sends. Running it locally on macOS against a MongoDB Docker container, I found that maxItems = 20 works well; it takes roughly 300 seconds to import everything. There are 150,573 cities, 5,084 states, and 250 countries.

dr5hn (Owner) commented Mar 4, 2024

@dosubot write a script which will help convert the current db to mongodb


dosubot bot commented Mar 4, 2024

Hey @dr5hn! Great to see you diving into the depths of data again 🚀😄

In MongoDB, the _id field is a special field that acts as the primary key for the document. By default, MongoDB creates this field and assigns a unique ObjectId for each document. However, you can also provide your own _id when you insert the document.

If you have an id field in your data and you want to use that as the _id in MongoDB, you can rename the id field to _id before importing. This way, MongoDB will use your id as the _id and won't create a new one.

Here's a Python script that reads the JSON file, renames the id field to _id, and writes the modified data back to a new JSON file:

import json

with open('countries.json', 'r') as f:
    data = json.load(f)
    for doc in data:
        doc['_id'] = doc.pop('id')

with open('countries_mongodb.json', 'w') as f:
    json.dump(data, f)

You can then import the new JSON file into MongoDB using the mongoimport command:

mongoimport --db yourDbName --collection yourCollectionName --file countries_mongodb.json

Please replace 'countries.json' and 'countries_mongodb.json' with the actual paths to your files, and replace 'yourDbName' and 'yourCollectionName' with the actual names of your MongoDB database and collection.

Remember to do this for all the tables (countries, states, cities) you are importing.
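
For example, here is a small sketch that applies the same rename to all three exports in one go; it assumes the split countries.json, states.json, and cities.json files from the repository sit next to the script, so adjust the paths if yours differ:

import json

# Assumed file names: the three split JSON exports from the repository.
for name in ('countries', 'states', 'cities'):
    with open(f'{name}.json', 'r') as f:
        data = json.load(f)

    # Rename `id` to `_id` so MongoDB reuses it as the primary key.
    for doc in data:
        doc['_id'] = doc.pop('id')

    with open(f'{name}_mongodb.json', 'w') as f:
        json.dump(data, f)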

This approach will not create any issues as long as the id values in your data are unique. If there are duplicate id values, MongoDB will throw an error when you try to import the data, because _id values must be unique within the collection.
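
As a quick sanity check before importing, you can confirm the id values really are unique; the sketch below uses cities.json as an example and assumes the same file layout as above:

import json
from collections import Counter

with open('cities.json', 'r') as f:
    ids = [doc['id'] for doc in json.load(f)]

# Any id that appears more than once would make the import fail on `_id`.
duplicates = [value for value, count in Counter(ids).items() if count > 1]
print('duplicate ids:', duplicates if duplicates else 'none')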

I hope this helps! If you have any other questions, feel free to ask.



FractalMind commented Jun 27, 2024

import json

with open('countries.json', 'r') as f:
    data = json.load(f)
    for doc in data:
        doc['_id'] = doc.pop('id')

with open('countries_mongodb.json', 'w') as f:
    json.dump(data, f)

mongoimport --db yourDbName --collection yourCollectionName --file countries_mongodb.json

@dr5hn That script saved my life. By the way, don't forget --jsonArray at the end, e.g.:
mongoimport --db yourDbName --collection yourCollectionName --file countries_mongodb.json --jsonArray

dr5hn added the enhancement (New feature or request) label Sep 7, 2024