Understanding PostgreSQL UPSERT with ON CONFLICT
PostgreSQL provides a powerful feature called ON CONFLICT for handling duplicate key violations during data insertion. In this post, we’ll explore the following query:
INSERT INTO grp_nodes (group_id, node_id, x, y)
VALUES
(1, 101, 10.0, 20.0),
(2, 102, 30.0, 40.0),
(3, 103, 50.0, 60.0)
ON CONFLICT (group_id, node_id)
DO UPDATE SET
x = EXCLUDED.x,
y = EXCLUDED.y;
This query illustrates how to perform a conditional insert or update, also known as an “upsert” operation. Let’s break it down in detail.
The Table Structure
Before running this query, assume the table grp_nodes is defined as:
CREATE TABLE grp_nodes (
group_id INT NOT NULL,
node_id INT NOT NULL,
x FLOAT NOT NULL,
y FLOAT NOT NULL,
PRIMARY KEY (group_id, node_id)
);
group_idandnode_idform a composite primary key, ensuring uniqueness across these two columns.- The columns
xandystore coordinate data for each node.
Query Explanation
1. INSERT INTO Clause
This clause specifies the target table (grp_nodes) and the columns to populate:
INSERT INTO grp_nodes (group_id, node_id, x, y)
The subsequent VALUES section provides the data to insert:
VALUES
(1, 101, 10.0, 20.0),
(2, 102, 30.0, 40.0),
(3, 103, 50.0, 60.0)
Each row represents a node with its group_id, node_id, and coordinates (x, y).
2. ON CONFLICT Clause
The ON CONFLICT clause specifies how to handle violations of the primary key constraint (composite key of group_id and node_id) when inserting data. Here, it targets the conflict on (group_id, node_id):
ON CONFLICT (group_id, node_id)
This ensures that if a row with the same group_id and node_id already exists, the database will not throw an error.
3. DO UPDATE SET Clause
If a conflict occurs, the DO UPDATE SET clause determines how to update the existing row:
DO UPDATE SET
x = EXCLUDED.x,
y = EXCLUDED.y;
- The keyword
EXCLUDEDrefers to the row that caused the conflict. - In this case, the conflicting row’s
xandyvalues are updated with the new values provided in theVALUESclause.
Complete Query Flow
- PostgreSQL attempts to insert each row.
- If no conflict occurs, the row is inserted as-is.
- If a conflict occurs on
(group_id, node_id), the existing row is updated with the newxandyvalues.
Example Execution
Initial Table State
| group_id | node_id | x | y |
|---|---|---|---|
| 1 | 101 | 5.0 | 10.0 |
Query Execution
When the query runs, here’s what happens:
- The first row
(1, 101, 10.0, 20.0)encounters a conflict because(1, 101)already exists. The existing row’sxandyvalues are updated to10.0and20.0, respectively. - The second and third rows
(2, 102, 30.0, 40.0)and(3, 103, 50.0, 60.0)are inserted successfully because they do not conflict with existing rows.
Final Table State
| group_id | node_id | x | y |
|---|---|---|---|
| 1 | 101 | 10.0 | 20.0 |
| 2 | 102 | 30.0 | 40.0 |
| 3 | 103 | 50.0 | 60.0 |
Key Benefits of ON CONFLICT
-
Atomicity:
- The operation ensures that insertion and updates are performed atomically, avoiding race conditions in concurrent environments.
-
Simplicity:
- Replaces the need for complex
MERGEorUPSERTlogic often required in other databases.
- Replaces the need for complex
-
Efficiency:
- Avoids unnecessary updates or deletes when rows already match the desired state.
Practical Applications
- Data Synchronization: Updating existing records with new data while inserting missing entries.
- Event Tracking: Storing the latest state of entities in real-time systems.
- Geospatial Updates: Updating coordinates for nodes in mapping or network topology applications.
Conclusion
This query is an elegant solution for handling duplicate key conflicts in PostgreSQL. By leveraging the ON CONFLICT clause, developers can streamline their workflows and write efficient, maintainable SQL for upsert operations.