All Questions

1 vote
2 answers
1k views

Finding similar objects in a large data set

I have a large collection of (C#) objects and these objects have a large number of properties, mostly strings and numbers. This collection is stored in a database. When a new object is about to be ...
Mats
  • 163
1 vote
1 answer
1k views

Full, 2-way table synchronization

I need to synchronize tables of data between two different systems. This is a multi-master setup; data can get changed in either system. After a synchronization runs I'd like the data in each table to ...
ccleve
  • 171
5 votes
5 answers
1k views

How can a billion integer IDs be stored and specific ones checked for existence most efficiently? (persistent solution, not just in-memory)

Let me preface this by saying that I am familiar with RDBMS. I have a solution using MySQL/MariaDB, but I am not happy with its efficiency, so I'm looking for alternatives. I'm trying ...
Jimbotron
1 vote
3 answers
1k views

Efficiently checking whether two very large text contents differ

I have a MySQL database with a column Body MEDIUMTEXT. Until now I only stored content in it. There was no update option for the users of the application. Now I want to add an update ...
SkrewEverything
0 votes
1 answer
17k views

How to represent data fetched from a database in a flowchart

I have an existing piece of code that uses a database. In order to understand its algorithm, I am trying to represent it with a flowchart. In this algorithm I need to fetch some data from the database in ...
Dimitrios Desyllas
1 vote
3 answers
187 views

Architecture/algorithm for unusual recommendation system

There are thousands (or tens of thousands) of possible movie titles. A new user enters my website and selects hundreds of titles that they like. The sole goal of my website is to output the list ...
userQWERTY
0 votes
1 answer
117 views

I have a data set of over a million addresses and I want to display the closest N locations to a given address or current location

I am a student working on a personal project which is essentially a location finder that will be on a website. I have a data set of over a million addresses and I want to display the closest N ...
Jason Hong
3 votes
3 answers
1k views

Aggregation of data from two Microservices

I have two microservices, A and B. Microservice B has a large set of an entity called User. Microservice A stores the User entity in its own DB if the User is configured by an agent. There is no flag ...
Puneeth mypadi
11 votes
1 answer
752 views

Concurrent fault-safe data structure

I am building an application that aims to process ~10M data items per second. Each item is exactly 42 bytes (including a sorting key), which means the total data rate will not be large: 420 MB/s. ...
SmartArray
7 votes
3 answers
2k views

System Design: Very Large CSV Imports Every Month

We have a webapp that will rely on large CSVs from external vendors every month. When I say large, we are looking at around 6 GB, so a few million rows. Probably 2-5 CSVs. This webapp will also allow ...
user2370642
6 votes
3 answers
4k views

Efficient algorithm for data deduplication in procedural code

I have written a data cleansing application which, for the most part, works well. It is not designed to handle large volumes of data: nothing more than about half a million rows. So early on in the ...
Bob Tway
  • 3,636
4 votes
1 answer
1k views

In what way is an XML database different from (is not specialization of) a graph database?

XML is a labelled tree, a special case of a graph. So why are there separate XML databases, and why is XML not stored in graph databases? At first glance it seems so easy - one should map ...
TomR
  • 1,009
0 votes
1 answer
67 views

Add related record on record creation

My question is at high risk of being a duplicate, but that is not for lack of investigation. I've been struggling with this issue for a long time and have found nothing to solve it, but it seems such an ...
tebastian
4 votes
3 answers
138 views

Finding differences in big live (almost identical) databases

I have a replicated database (not SQL, a triple store, but specifics should not matter too much) running on several hosts. Each of them holds a copy of the database which is updated by feeding from ...
StasM
  • 3,367
1 vote
0 answers
107 views

High-dimensional index structures

I'm looking for information on searching (filtering) high-dimensional data. I'm not interested in nearest-neighbour search or clustering, but rather in filtering/sorting by a subset of dimensions - like in ...
mabn
  • 161
