-1

I have a two dimensional data with one dimension is ordered and another one is categorical, for example, country and city_age:

country age city
Italy 2773 Rome
Germany 784 Berlin
USA 397 New York

Suppose this database is bigdata.

Now I want to set age range, for example 500-1500 and then search for countries in O(1). Is it possible?

If I store data in

using TCountryName = string;
using TCityAge = int;
using TCityName = string;

unordered_map<TCountryName, map<TCityAge, TCityName>> data;

It will take me O(Log[n]) to filter inner map in each call, and if I define

map<TCityAge, unordered_map<TCountyName, TCityName>> data;

if will to iterate over inner map with O(n), and if I define

unordered_map<tuple<TCountyName,TCityAge>, TCityName> data;

it will be O(1) but only if I know age beforehead.

Is it possible to have data structure to search both by range and by id fast?

7
  • "Suppose this database is bigdata." Can you clarify this statement? Does your data set fit in memory? Commented Oct 6, 2021 at 12:19
  • @VincentSavard the question is general, so suppose it fits in memory. BidData means to avoid brute force scanning...
    – Dims
    Commented Oct 6, 2021 at 12:22
  • No, it is not possible to find O(n) items in O(1) time. I hope this seems trivial to you. Just returning the items will take O(n) time because there are O(n) items. Commented Oct 6, 2021 at 13:03
  • 3
    @Dims: "Bigdata" usually means thousands of Terabytes of Data, in the order of magnitudes what companies like Amazon or Google process. You should really edit this misleading term out of the question. And to the order of running time: you probably meant O(m) where m is the number of cities returned. O(1) is nonsense.
    – Doc Brown
    Commented Oct 6, 2021 at 13:57
  • 1
    @Dims: there is pretty huge difference wether you are after a simple solution you can implement in main memory, in a single program, or if you need a distributed system with hundreds of machines working in parallel on externally stored data. When you ask for "Bigdata", you ask for the latter. So no. this is not just the same problem on a bigger scale, it is a way larger problem.
    – Doc Brown
    Commented Oct 6, 2021 at 19:01

1 Answer 1

2

Store your data in a simple std::vector. Then create (and maintain) separate lookup data-structures which map search criteria to sets of array indexes. Like this:

std::unordered_multimap<TCountryName, size_t> citiesByCountryName;
std::multimap<TCityAge, size_t> citiesByCityAge;
std::map<TCityName, size_t> cityByName; // assumes city names are unique!

You can now query your data like this:

std::cout<<"Cities in Switzerland are: ";
auto range = citiesByCountryName.equal_range("Switzerland");
for_each (range.first, range.second, [](citiesByCountryName.value_type& i) {
     std::cout << " " << cities[i.second].city;
}

This is basically how indexes work in databases. Speaking of which: When you need a lot of database-like features like this in your application, then you might want to consider to save yourself a lot of work by just using an embedded database. There are even some which understand SQL and run completely in-memory.

OK, but you said you want to search by country name first, and then search for age range within the result-set. For that use-case you could create an unordered map which maps names to multimaps, which then in turn each contain the indexes of all cities within that one country by age:

<std::map<TCountryName, <std::multimap<TCityAge, size_t>> cityByCountryAndAge
  1. you look in this map by country name to get a std::multimap<TCityAge, size_t>
  2. you look in that multimap by age to get an iterator to a set of array indexes
  3. You use those array indexes to retrieve the data payload from the cities array
2
  • 1
    I guess the OP is after a solution where they can apply two search criteria simultanously, and the running time is proportinal to the number of returned items from that combined condition.
    – Doc Brown
    Commented Oct 6, 2021 at 14:00
  • @DocBrown Answer updated
    – Philipp
    Commented Oct 6, 2021 at 14:35

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.