Understanding PostgreSQL UPSERT with ON CONFLICT

Understanding PostgreSQL UPSERT with ON CONFLICT

PostgreSQL provides a powerful feature called ON CONFLICT for handling duplicate key violations during data insertion. In this post, we’ll explore the following query:

INSERT INTO grp_nodes (group_id, node_id, x, y)
VALUES
    (1, 101, 10.0, 20.0),
    (2, 102, 30.0, 40.0),
    (3, 103, 50.0, 60.0)
ON CONFLICT (group_id, node_id)
DO UPDATE SET
    x = EXCLUDED.x,
    y = EXCLUDED.y;

This query illustrates how to perform a conditional insert or update, also known as an “upsert” operation. Let’s break it down in detail.

The Table Structure

Before running this query, assume the table grp_nodes is defined as:

CREATE TABLE grp_nodes (
    group_id INT NOT NULL,
    node_id INT NOT NULL,
    x FLOAT NOT NULL,
    y FLOAT NOT NULL,
    PRIMARY KEY (group_id, node_id)
);
  • group_id and node_id form a composite primary key, ensuring uniqueness across these two columns.
  • The columns x and y store coordinate data for each node.

Query Explanation

1. INSERT INTO Clause

This clause specifies the target table (grp_nodes) and the columns to populate:

INSERT INTO grp_nodes (group_id, node_id, x, y)

The subsequent VALUES section provides the data to insert:

VALUES
    (1, 101, 10.0, 20.0),
    (2, 102, 30.0, 40.0),
    (3, 103, 50.0, 60.0)

Each row represents a node with its group_id, node_id, and coordinates (x, y).

2. ON CONFLICT Clause

The ON CONFLICT clause specifies how to handle violations of the primary key constraint (composite key of group_id and node_id) when inserting data. Here, it targets the conflict on (group_id, node_id):

ON CONFLICT (group_id, node_id)

This ensures that if a row with the same group_id and node_id already exists, the database will not throw an error.

3. DO UPDATE SET Clause

If a conflict occurs, the DO UPDATE SET clause determines how to update the existing row:

DO UPDATE SET
    x = EXCLUDED.x,
    y = EXCLUDED.y;
  • The keyword EXCLUDED refers to the row that caused the conflict.
  • In this case, the conflicting row’s x and y values are updated with the new values provided in the VALUES clause.

Complete Query Flow

  1. PostgreSQL attempts to insert each row.
  2. If no conflict occurs, the row is inserted as-is.
  3. If a conflict occurs on (group_id, node_id), the existing row is updated with the new x and y values.

Example Execution

Initial Table State

group_id node_id x y
1 101 5.0 10.0

Query Execution

When the query runs, here’s what happens:

  1. The first row (1, 101, 10.0, 20.0) encounters a conflict because (1, 101) already exists. The existing row’s x and y values are updated to 10.0 and 20.0, respectively.
  2. The second and third rows (2, 102, 30.0, 40.0) and (3, 103, 50.0, 60.0) are inserted successfully because they do not conflict with existing rows.

Final Table State

group_id node_id x y
1 101 10.0 20.0
2 102 30.0 40.0
3 103 50.0 60.0

Key Benefits of ON CONFLICT

  1. Atomicity:

    • The operation ensures that insertion and updates are performed atomically, avoiding race conditions in concurrent environments.
  2. Simplicity:

    • Replaces the need for complex MERGE or UPSERT logic often required in other databases.
  3. Efficiency:

    • Avoids unnecessary updates or deletes when rows already match the desired state.

Practical Applications

  • Data Synchronization: Updating existing records with new data while inserting missing entries.
  • Event Tracking: Storing the latest state of entities in real-time systems.
  • Geospatial Updates: Updating coordinates for nodes in mapping or network topology applications.

Conclusion

This query is an elegant solution for handling duplicate key conflicts in PostgreSQL. By leveraging the ON CONFLICT clause, developers can streamline their workflows and write efficient, maintainable SQL for upsert operations.