Import into table ignoring duplicates

srondeau

New member
I am trying to come up with a solution to import a CSV file into a table which is possibly already populated with some of the same information as in the CSV file. For example, I previously imported a list of students (names with a unique student number) from a class list into a table, and I want to import another list of students from a different class list -- and there may be some of the same students in both class lists.

If I turn on duplicate detection in the Forms Configurator, any import fails atomically the first time it attempts to insert a duplicate entry. What I would like is for the import to be transactional per row rather than for the entire file, so that any row containing a duplicate fails to insert while all the non-duplicate rows are still inserted.

MySQL has a statement, LOAD DATA INFILE, which allows that behavior with the right options. I realize that DaDaBIK wants to support multiple DBMSes and therefore may not be able to support that specific statement.
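For reference, this is roughly the statement I have in mind (just a sketch; the table, column names and file path are placeholders, and IGNORE only skips rows that collide with a UNIQUE key):

    -- Sketch only: per-row duplicate handling with LOAD DATA INFILE.
    -- Assumes a students table with a UNIQUE key on student_number.
    LOAD DATA LOCAL INFILE '/tmp/class_list.csv'
    IGNORE                      -- skip rows that would violate a UNIQUE key
    INTO TABLE students
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES              -- skip the CSV header row
    (student_number, first_name, last_name);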

Am I missing an easier solution to this problem? I am trying to use DaDaBIK as a low-code solution, but I am finding that I need more and more code.

DaDaBIK 11.5, Chrome browser under Windows (server OS is CentOS 7.9), DBMS (MariaDB 10.1.48), Web server (Apache 2.4.6)
 

eugenio

Administrator
Staff member
Hello,
you didn't mention which edition you are using. If you are running DaDaBIK Platinum, you can enable CSV synch: when DaDaBIK finds a record that already exists (based on one or more fields you set), instead of inserting it, the record is updated (not ignored as you mentioned, but updated, which may still be what you need).
Consider that, more generally, the import from Excel/CSV feature in DaDaBIK does much more than LOAD DATA INFILE.
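In plain SQL terms, the synch behaviour is closer to this than to a plain ignore (an illustrative sketch only, not what DaDaBIK runs internally; table and column names are placeholders):

    -- Illustrative: an existing row (matched on a UNIQUE key) is updated, not skipped.
    INSERT INTO students (student_number, first_name, last_name)
    VALUES ('S1001', 'Jane', 'Doe')
    ON DUPLICATE KEY UPDATE
      first_name = VALUES(first_name),
      last_name = VALUES(last_name);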

Best,
 

srondeau

New member
DaDaBIK Enterprise. You don't ask for the edition when one creates a new post, just the version, which to me is the version number.

Yes, Platinum would do what I want with an update vs. ignore... like LOAD DATA's REPLACE vs. IGNORE.
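To spell out the difference (a sketch using the same placeholder table as above):

    -- REPLACE: a row colliding on a UNIQUE key is deleted and re-inserted with the new values.
    LOAD DATA LOCAL INFILE '/tmp/class_list.csv'
    REPLACE INTO TABLE students
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES
    (student_number, first_name, last_name);

    -- IGNORE: the existing row is kept and the incoming duplicate is silently skipped.
    LOAD DATA LOCAL INFILE '/tmp/class_list.csv'
    IGNORE INTO TABLE students
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES
    (student_number, first_name, last_name);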

I had to work with CSV import to get it to import:
  • I was a bit dismayed not to be able to use a tab character, so I had to change one of the CSV files I create to use an acceptable printable character as a delimiter.
  • I think the column names of the CSV file must match the case of the column names of the database -- this was before I read about "$csv_col_names" mapping. I changed my table column names instead, then later added the mapping for a different CSV file and table.
  • I also thought that the table id field (auto-increment primary key) should just work without being part of the CSV file, but it didn't seem to initially. I ended up providing the unique id number myself in the CSV file. However, a different CSV file imported to another table worked without needing an id number in the CSV file, so it must have been an initial misunderstanding on my part.
Nevertheless, when something doesn't work the first time even though you think you understand what is going on (importing CSV files isn't rocket science), you start to wonder about the choice of CSV import tools. I think I investigated the one you use for additional insight.

After I sent the support request yesterday, I managed to figure out how to get a custom page to perform a file upload and then to get the database table to automatically ignore duplicates (using a column other than the primary key, which is a simple auto-increment integer). I defined a unique index on the table for the input column from the CSV that I wanted to check for duplicates, and that worked well with LOAD DATA INFILE.
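The index part is just this (placeholder names; the column is whichever CSV field the rows should be deduplicated on):

    -- A UNIQUE index on the natural key makes LOAD DATA ... IGNORE skip duplicates,
    -- independently of the auto-increment primary key.
    ALTER TABLE students
      ADD UNIQUE INDEX idx_student_number (student_number);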

It's a delicate balance to provide a tool that can save time and allow customization. It means that the user must adapt to the "DaDaBIK way" of doing things, and look at the application and database through that lens. I have spent the last two weeks trying to figure out the "DaDaBIK way", from documentation and experimentation, so I can customize the web app the way I need to do it. I am still learning.
 

eugenio

Administrator
Staff member
Hello,
you should think of the DaDaBIK import process as a normal INSERT operation coming from an INSERT form. For example, if your IDs are auto-increment, you typically don't show the ID field in the insert form (so you probably don't want it in the CSV file either).
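In table terms (an illustrative sketch, not something DaDaBIK requires; names are placeholders):

    -- With an AUTO_INCREMENT primary key, the id column is simply omitted
    -- from the column list (and from the CSV); MariaDB assigns it.
    CREATE TABLE students (
      id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      student_number VARCHAR(20) NOT NULL UNIQUE,
      first_name VARCHAR(100),
      last_name VARCHAR(100)
    );

    INSERT INTO students (student_number, first_name, last_name)
    VALUES ('S1001', 'Jane', 'Doe');  -- no id supplied; it is generated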
The chapter "Import data from MS Excel/CSV/ODS" in the documentation explains everything in detail (including the synch option provided by DaDaBIK Platinum).

Best,
 

deep64blue

DaDaBIK Guru
DaDaBIK Enterprise. You don't ask for the edition when one creates a new post, just the version, which to me is the version number.
Most of us use the data from "Edit this App" -> "About/Upgrade"; this gives @eugenio the information he needs and guarantees accuracy:

Your current DaDaBIK version

You are using DaDaBIK version 11.5-Elba enterprise, installed on May 9, 2022 (installation code: 1729969999999380a), the latest version of DaDaBIK is 11.5-Elba released on Apr 28, 2022

You are running the latest release of DaDaBIK

In case you want to upgrade to a more powerful edition (from Pro to Enterprise/Platinum, from Enterprise to Platinum) please contact us.

System info

PHP Version: 7.4.29

mysql version: 5.5.5-10.4.22-MariaDB-log

Web server: Apache

Client: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36

URL installation: http://localhost/Webme/sblpa/all/
 