<![CDATA[Cemal Turkoglu]]>https://turkogluc.com/https://turkogluc.com/favicon.pngCemal Turkogluhttps://turkogluc.com/Ghost 5.49Thu, 25 May 2023 09:22:16 GMT60<![CDATA[Centralized Logging and Monitoring with Elastic Stack]]>https://turkogluc.com/centralised-logging-and-monitoring-with-elastic-stack/646f24c59b6311000195da66Sun, 13 Feb 2022 21:29:19 GMT

Recent versions of Elasticsearch come with very handy features for collecting logs and monitoring hosts or Docker containers. We can run Elasticsearch and Kibana with the following docker-compose file:

version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    container_name: es01
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=true
      - xpack.security.authc.api_key.enabled=true
      - xpack.security.audit.enabled=true
      - ELASTIC_PASSWORD=somethingsecret
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic


  kib01:
    image: docker.elastic.co/kibana/kibana:7.17.0
    container_name: kib01
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_URL: http://es01:9200
      ELASTICSEARCH_HOSTS: '["http://es01:9200"]'
      ELASTICSEARCH_USERNAME: elastic
      ELASTICSEARCH_PASSWORD: somethingsecret
      SERVER_PUBLICBASEURL: http://localhost:5601
      XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY: "something_at_least_32_characters"
      XPACK_REPORTING_ENCRYPTIONKEY: "something_at_least_32_characters"
      XPACK_SECURITY_ENCRYPTIONKEY: "something_at_least_32_characters"
    depends_on:
      - es01
    networks:
      - elastic


volumes:
  data01:
    driver: local

networks:
  elastic:
    driver: bridge

Reference on configuring the kibana security:

Configure security in Kibana | Kibana Guide [8.0] | Elastic
A list of the supported authentication mechanisms in Kibana.

Collecting Metrics with Metricbeat

We can run the following Docker container on each host that we want to collect metrics from.

version: '3.8'
services:

  metricbeat:
    image: docker.elastic.co/beats/metricbeat:7.17.0
    user: root
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./metricbeat.docker.yml:/usr/share/metricbeat/metricbeat.yml:ro
      - /sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro
      - /proc:/hostfs/proc:ro
      - /:/hostfs:ro
      - /var/log:/var/log:rw
      - /var/lib/docker/containers:/var/lib/docker/containers:rw
    network_mode: "host"

The Metricbeat configuration is provided in the metricbeat.docker.yml file:

metricbeat.max_start_delay: 10s
# setup.ilm.enabled: true
setup.dashboards.enabled: true
setup.dashboards.beat: metricbeat


#==========================  Modules configuration =============================
metricbeat.modules:

#-------------------------------- System Module --------------------------------
- module: system
  metricsets:
    - cpu             # CPU usage
    - load            # CPU load averages
    - memory          # Memory usage
    - network         # Network IO
    - process         # Per process metrics
    - process_summary # Process summary
    - uptime          # System Uptime
    - socket_summary  # Socket summary
    #- core           # Per CPU core usage
    #- diskio         # Disk IO
    #- filesystem     # File system usage for each mountpoint
    #- fsstat         # File system summary metrics
    #- raid           # Raid
    #- socket         # Sockets and connection info (linux only)
    #- service        # systemd service information
  enabled: true
  period: 10s
  processes: ['.*']

  # Configure the mount point of the host’s filesystem for use in monitoring a host from within a container
  #system.hostfs: "/hostfs"

  # Configure the metric types that are included by these metricsets.
  cpu.metrics:  ["percentages","normalized_percentages"]  # The other available option is ticks.
  core.metrics: ["percentages"]  # The other available option is ticks.

#-------------------------------- Docker Module --------------------------------
- module: docker
  metricsets:
    - "container"
    - "cpu"
    - "diskio"
    - "event"
    - "healthcheck"
    - "info"
    #- "image"
    - "memory"
    - "network"
    #- "network_summary"
  hosts: ["unix:///var/run/docker.sock"]
  period: 10s
  enabled: true

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["localhost:9200"]
  username: "elastic"
  password: "somethingsecret"

setup.kibana:
    host: "localhost:5601"
    username: "elastic"
    password: "somethingsecret"

We use the system module to collect host metrics and the docker module to collect container stats via the Docker engine. Metricbeat supports many more modules that can be used to collect metrics from Docker, Kubernetes, many databases, message queues like RabbitMQ and Kafka, and many other applications. You can find the modules and their configuration here:

Modules | Metricbeat Reference [8.0] | Elastic

It is very easy to configure these modules and collect metrics from various systems. For example, we can collect metrics from a RabbitMQ server with the following configuration:

#------------------------------ Rabbit Module ---------------------------------

- module: rabbitmq
  metricsets: ["node", "queue", "connection", "exchange"]
  enabled: true
  period: 10s
  hosts: ["rabbithost:15672"]
  username: admin
  password: rabbit-password

As we enabled the setup.dashboards.enabled option, Metricbeat loads ready-made dashboards into Kibana. We can customise them or create new views and dashboards. Some of the ready-made dashboards look as follows:

[Image: example Metricbeat dashboards in Kibana]

From the observability -> inventory we can view hosts and/or containers with metrics.

[Image: Observability inventory view]

We can check the logs (if we are collecting them) or the metrics of any host or container by clicking on the views here.


As mentioned before, we can collect metrics from databases, e.g. PostgreSQL, and view the logs as follows:

[Image: PostgreSQL logs view]

Index Lifecycle Management

One of the nice features of Elasticsearch in recent versions is ILM (Index Lifecycle Management), which provides an out-of-the-box way to clean up documents from the indices and reclaim storage. As far as I know, with older versions people had to run a separate process (Curator) to clean up old records.

At the moment Elasticsearch provides the following phases for ILM:

  • Hot: The index is actively being updated and queried.
  • Warm: The index is no longer being updated but is still being queried.
  • Cold: The index is no longer being updated and is queried infrequently. The information still needs to be searchable, but it’s okay if those queries are slower.
  • Frozen: The index is no longer being updated and is queried rarely. The information still needs to be searchable, but it’s okay if those queries are extremely slow.
  • Delete: The index is no longer needed and can safely be removed.

ILM moves indices through the lifecycle according to their age. You can set how long the records should stay in a phase before the index moves to the next phase and is finally deleted (if configured so).


As we configured Metricbeat to collect metrics every 10 seconds, the disk usage grows pretty fast and can reach several GBs per day. In order to get rid of the older data we can clean it up via ILM policies. For further details please check the documentation:

Index lifecycle | Elasticsearch Guide [8.0] | Elastic

Setting Alerts For Metrics

To create an alert on metrics, we can go to Stack Management -> Alerts and Insights -> Rules and Connectors -> Create Rule. There are multiple rule types we can use, but I would like to show the Metric Threshold type. We need to set a name for the rule and an interval that defines how often to check, and for the notify setting we can select "Only on status change". Then we define the condition to evaluate, for example: system.cpu.total.norm.pct average above 80%. We can also group by a field.


The actions in the free version are limited to Index and Server Log. The paid subscriptions offer many more options, such as Email, Slack, Jira, Webhook, etc. As I am using the free version for now, I will select the Index action, which writes the alert details to an index. When we select the alert action we need to define which fields from the alert event we want to record as a JSON document.

{
    "actionGroup": "{{alert.actionGroup}}",
    "actionGroupName":"{{alert.actionGroupName}}",
    "actionSubgroup":"{{alert.actionSubgroup}}",
    "alertId":"{{alert.id}}",
    "alertState":"{{context.alertState}}",
    "contextGroup":"{{context.group}}",
    "contextMetric":"{{context.metric}}",
    "contextReason":"{{context.reason}}",
    "contextThreshold":"{{context.threshold}}",
    "contextTimestamp":"{{context.timestamp}}",
    "contextValue":"{{context.value}}",
    "date":"{{date}}",
    "kibanaBaseUrl":"{{kibanaBaseUrl}}",
    "ruleId":"{{rule.id}}",
    "ruleName":"{{rule.name}}",
    "ruleSpaceId":"{{rule.spaceId}}",
    "ruleTags":"{{rule.tags}}",
    "ruleType":"{{rule.type}}"
}

We created a new index to write the alerts to and specified it with a connector as follows:

[Image: index connector configuration]

If we check the alert-logs index we can see that the triggered alerts are saved there.


If you are using the free license and want to receive email alerts, you can use the following tool that I prepared for triggering email alerts.

GitHub - turkogluc/elastic-email-alerts: Elasticsearch alerts email action
Elasticsearch alerts email action. Contribute to turkogluc/elastic-email-alerts development by creating an account on GitHub.

There are a few environment variables to configure before running it. It is a very lightweight tool and can be used to send email alerts from indices. An example email looks as follows:

[Image: example alert email]

Collecting Logs

Metricbeat is used to collect the metrics, and to collect the logs we can use filebeat as follows:

version: '3.8'
services:

  filebeat:
    image: docker.elastic.co/beats/filebeat:7.17.0
    user: root
    volumes:
      - ./filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/log/custom:/var/log/custom:ro

We can provide the configuration filebeat.docker.yml as follows:

setup.dashboards.enabled: true
setup.dashboards.beat: filebeat

#=========================== Filebeat inputs =============================

filebeat.inputs:

- type: log
  enabled: true
  paths:
    - /var/log/custom/*.json

  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
  json.expand_keys: true

  # Decode JSON options. Enable this if your logs are structured in JSON.
  # JSON key on which to apply the line filtering and multiline settings. This key
  # must be top level and its value must be string, otherwise it is ignored. If
  # no text key is defined, the line filtering and multiline features cannot be used.
  #json.message_key:

  # By default, the decoded JSON is placed under a "json" key in the output document.
  # If you enable this setting, the keys are copied top level in the output document.
  #json.keys_under_root: false

  # If keys_under_root and this setting are enabled, then the values from the decoded
  # JSON object overwrite the fields that Filebeat normally adds (type, source, offset, etc.)
  # in case of conflicts.
  #json.overwrite_keys: false

  # If this setting is enabled, then keys in the decoded JSON object will be recursively
  # de-dotted, and expanded into a hierarchical object structure.
  # For example, `{"a.b.c": 123}` would be expanded into `{"a":{"b":{"c":123}}}`.
  #json.expand_keys: false

  # If this setting is enabled, Filebeat adds a "error.message" and "error.key: json" key in case of JSON
  # unmarshaling errors or when a text key is defined in the configuration but cannot
  # be used.
  #json.add_error_key: false


# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  enabled: true
  hosts: ["localhost:9200"]
  username: "elastic"
  password: "somethingsecret"

setup.kibana:
  host: "localhost:5601"
  username: "elastic"
  password: "somethingsecret"

We can run this on every host from which we want to collect logs, and it will ship them. Note that logs are collected from the /var/log/custom folder and only JSON files are picked up. The applications should write their logs into this folder.

Application Level Logging for Java Application

For Spring or Java applications we can make the application logging compatible with log collectors and Elasticsearch by using ecs-logging.

Get started | ECS Logging Java Reference [1.x] | Elastic

We need to add the following dependency:

implementation 'co.elastic.logging:logback-ecs-encoder:1.3.2'

And add the following logback.xml configuration:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property name="LOG_FILE" value="${LOG_FILE:-spring.log}"/>
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <include resource="org/springframework/boot/logging/logback/console-appender.xml" />
    <include resource="org/springframework/boot/logging/logback/file-appender.xml" />
    <include resource="co/elastic/logging/logback/boot/ecs-file-appender.xml" />
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="ECS_JSON_FILE"/>
        <appender-ref ref="FILE"/>
    </root>
</configuration>

The LOG_FILE environment variable can be used to specify the path and file name. For example, after export LOG_FILE=/var/log/custom/app.log an app.log.json file will be created and written as well, so Filebeat can collect the logs from that folder.
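
With this setup the application code itself does not need to change: plain SLF4J calls are enough and the ECS encoder takes care of the JSON format. A minimal sketch (the class and the log message are made up for illustration):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {

    private static final Logger LOGGER = LoggerFactory.getLogger(DemoApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
        // Written to the console, to the plain log file and, via the ECS appender,
        // to app.log.json as a single-line JSON document with fields such as
        // @timestamp, log.level, log.logger and message.
        LOGGER.info("Application started");
    }
}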

]]>
<![CDATA[SQL Window Functions Introduction]]>https://turkogluc.com/sql-window-functions/646f24c59b6311000195da65Sat, 23 Oct 2021 11:45:20 GMT

Window functions perform a calculation over a set of rows that are connected in some way to the current row. They can be compared with GROUP BY and aggregate functions; however, with window functions rows keep their separate identities in the output instead of being grouped into a single output row. Under the hood, the window function is able to access more than just the current row of the query result. A window definition has the following syntax:

WINDOW window_name AS
(
	[ PARTITION BY expression [, ...] ]
	[ ORDER BY expression [ ASC | DESC ] [, ...] ]
)

The ORDER BY and PARTITION BY clauses define what is called the "window": the ordered subset of data over which the calculations are made.

PARTITION BY

Let's focus on the partitioning concept. When we partition by a field, the table is divided into partitions (groups) and each row can individually access the rows of its own partition. For example:

SELECT emp_no,
       department,
       salary,
       AVG(salary) OVER w
FROM emp_salary
    WINDOW w AS (PARTITION BY department)

The aggregate function AVG is applied OVER the window w, and the result is:

[Image: partition by query result]

Note that each color represents a partition.

The rows and the first 3 columns come directly from the emp_salary table; we would see exactly that part with a SELECT * FROM emp_salary query. The last column comes from the aggregate function AVG, which operates on the partition of the current row and calculates the average salary of that partition.

For each row, the window function is calculated over the rows that fall in the same partition as the current row.

As you can see from the following EXPLAIN ANALYZE result, it is important to notice that the PARTITION BY clause first sorts the table by the partitioning column:

+--------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                          |
+--------------------------------------------------------------------------------------------------------------------+
|WindowAgg  (cost=83.37..104.37 rows=1200 width=72) (actual time=0.589..0.879 rows=10 loops=1)                       |
|  ->  Sort  (cost=83.37..86.37 rows=1200 width=40) (actual time=0.371..0.477 rows=10 loops=1)                       |
|        Sort Key: department                                                                                        |
|        Sort Method: quicksort  Memory: 25kB                                                                        |
|        ->  Seq Scan on emp_salary  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.037..0.129 rows=10 loops=1)|
|Planning Time: 0.068 ms                                                                                             |
|Execution Time: 1.053 ms                                                                                            |
+--------------------------------------------------------------------------------------------------------------------+

ORDER BY

The ORDER BY clause is optional and can be omitted; however, there is another important concept needed to understand the behaviour of using it. For each row, there is a set of rows within its partition called its window frame.

  • When the ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row.
  • When ORDER BY is omitted the default frame consists of all rows in the partition.

If we compare the result of the following query with the previous one, we can see that AVG is calculated within the frame, which means from the start of the partition up to the current row:

SELECT emp_no,
       department,
       salary,
       AVG(salary) OVER w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
[Image: order by query result]

As we can see in the example emp#5 has the average of the first 4 rows.


Usual Aggregates: SUM, COUNT, and AVG

We can use the usual aggregate functions, which we normally use without windows, over a window as well.

SELECT emp_no,
       department,
       salary,
       AVG(salary) OVER w,
       SUM(salary) OVER w,
       COUNT(salary) OVER w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
[Image: sum, avg, count query result]

Remember that the frame would be the whole partition if we did not use ORDER BY statement.

ROW_NUMBER

As the name implies, it shows the number of the row within the partition, and it does not take any parameter.

SELECT emp_no,
       department,
       salary,
       ROW_NUMBER() over w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
[Image: row_number query result]

RANK and DENSE_RANK

RANK is similar to ROW_NUMBER; however, when 2 rows have the same ordering value (based on the ORDER BY clause) they are given the same rank, and the following rank is skipped, so the next row's rank reflects its actual position. The DENSE_RANK function does not skip any rank and assigns the next consecutive rank to the following row.

SELECT emp_no,
       department,
       salary,
       RANK() over w,
       DENSE_RANK() over w
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary)
[Image: rank, dense_rank query result]

LAG and LEAD

It can often be useful to compare rows with preceding or following rows, especially if you have the data in a meaningful order. The LAG function can access data of the previous rows, and the LEAD function can access the following rows relative to the current row. Both functions take an offset parameter that specifies the number of rows to go backward or forward.

SELECT emp_no,
       department,
       salary,
       LAG(salary, 1) over w as preceding_salary,
       LEAD(salary, 1) over w as following_salary
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary ASC)
[Image: lag and lead query result]

FIRST_VALUE and LAST_VALUE

These functions return the first or last value of the window frame.

SELECT emp_no,
       department,
       salary,
       first_value(salary) over w as smallest_salary,
       last_value(salary) over w  as biggest_salary
FROM emp_salary
    WINDOW w AS (PARTITION BY department ORDER BY salary ASC)
[Image: first_value and last_value query result]

Please note that, as well as using named windows, we could define the window directly in the projection column:

SELECT emp_no,
       department,
       salary,
       first_value(salary) over (PARTITION BY department ORDER BY salary) as smallest_salary,
       last_value(salary) over (PARTITION BY department ORDER BY salary)  as biggest_salary
FROM emp_salary

References

3.5. Window Functions
A window function performs a calculation across a set of table rows that are somehow related to the current row.
SQL Window Functions | Advanced SQL - Mode
This lesson of the SQL tutorial for data analysis covers SQL windowing functions such as ROW_NUMBER(), NTILE, LAG, and LEAD.
]]>
<![CDATA[Server Sent Events with Spring Boot and ReactJS]]>https://turkogluc.com/server-sent-events-with-spring-boot-and-reactjs/646f24c59b6311000195da64Sat, 16 Jan 2021 14:43:57 GMT

Server-Sent Events (SSE) is an HTTP standard that gives servers the capability to push streaming data to the client. The flow is unidirectional from server to client, and the client receives updates whenever the server pushes some data.


SSE has an EventSource interface with a straightforward API in the client side:

var source = new EventSource('sse-endpoint-address');
source.onmessage = function (event) {
  console.log(event.data);
};

The data sent is always decoded as UTF-8. The server sends the events with the text/event-stream MIME type, and the default event type is a message event. The onmessage event handler captures these default messages.

The client API has 3 predefined event handlers:

  • onopen: called when the connection to the server has been opened.
  • onmessage: called when a default message event is received.
  • onerror: called when an error occurs, for example when the connection is lost.

The server can also send custom event types, and in that case the client should register an event listener for that event type:

event: add
data: 73857293

event: remove
data: 2153

event: add
data: 113411

On the client side, a listener is registered for each custom event type:

source.addEventListener('add', addHandler, false);
source.addEventListener('remove', removeHandler, false);

Spring MVC Server Sent Events

Spring Boot provides a way to implement SSE by using Flux, which is a reactive representation of a stream of events; however, in this post I use Spring MVC, which provides 3 important classes:

  • ResponseBodyEmitter
  • SseEmitter
  • StreamingResponseBody

ResponseBodyEmitter is a parent class which handles async responses, and SseEmitter is a subclass of ResponseBodyEmitter that provides additional support for Server-Sent Events. Let us see some example implementations with SseEmitter in action.

Pushing Time As a Simple Message Event

We can create a controller in the backend side as follows:

@RestController
public class Controller {

    private static final Logger LOGGER = LoggerFactory.getLogger(Controller.class);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    @PostConstruct
    public void init() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            executor.shutdown();
            try {
                executor.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                LOGGER.error(e.toString());
            }
        }));
    }

    @GetMapping("/time")
    @CrossOrigin
    public SseEmitter streamDateTime() {

        SseEmitter sseEmitter = new SseEmitter(Long.MAX_VALUE);

        sseEmitter.onCompletion(() -> LOGGER.info("SseEmitter is completed"));

        sseEmitter.onTimeout(() -> LOGGER.info("SseEmitter is timed out"));

        sseEmitter.onError((ex) -> LOGGER.info("SseEmitter got error:", ex));

        executor.execute(() -> {
            for (int i = 0; i < 15; i++) {
                try {
                    sseEmitter.send(LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy hh:mm:ss")));
                    sleep(1, sseEmitter);
                } catch (IOException e) {
                    e.printStackTrace();
                    sseEmitter.completeWithError(e);
                }
            }
            sseEmitter.complete();
        });

        LOGGER.info("Controller exits");
        return sseEmitter;
    }

    private void sleep(int seconds, SseEmitter sseEmitter) {
        try {
            Thread.sleep(seconds * 1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
            sseEmitter.completeWithError(e);
        }
    }
}

Note that the SseEmitter instance is created and handed to the thread pool to be used in the async task, and it is also returned as the response to the REST call. The async task uses the send method to push data, and since only the data is provided to the method, it is pushed as a default message event. So the REST call immediately returns the emitter and logs "Controller exits", and whenever something is ready to push, the executor thread will do it.

At the client side, we can simply create a react project with create-react-app and use the EventSource interface to subscribe to an endpoint as follows:

function App() {

  const [listening, setListening] = useState(false);
  const [data, setData] = useState([]);
  let eventSource = undefined;

  useEffect(() => {
    if (!listening) {
      eventSource = new EventSource("http://localhost:8080/time");

      eventSource.onopen = (event) => {
        console.log("connection opened")
      }

      eventSource.onmessage = (event) => {
        console.log("result", event.data);
        setData(old => [...old, event.data])
      }

      eventSource.onerror = (event) => {
        console.log(event.target.readyState)
        if (event.target.readyState === EventSource.CLOSED) {
          console.log('eventsource closed (' + event.target.readyState + ')')
        }
        eventSource.close();
      }

      setListening(true);
    }

    return () => {
      eventSource.close();
      console.log("eventsource closed")
    }

  }, [])

  return (
    <div className="App">
      <header className="App-header">
        Received Data
        {data.map(d =>
          <span key={d}>{d}</span>
        )}
      </header>
    </div>
  );
}

export default App;

So each received message is pushed to the data array and that array is displayed on the App page:

[Image: the App page listing the received messages]

Pushing Custom Progress Event

This time we can use the SseEventBuilder to push a custom JSON data and give it an event name:

sseEmitter.send(SseEmitter.event().name("Progress").data(progress, MediaType.APPLICATION_JSON));

We will push instances of the following class to represent the progress:

@JsonInclude(JsonInclude.Include.NON_EMPTY)
public class ObservableProgress {
    private final int target;
    private final AtomicInteger value = new AtomicInteger(0);

    public ObservableProgress(int target) {
        this.target = target;
    }

    public ObservableProgress increment(int v){
        value.getAndAdd(v);
        return this;
    }

    public int getTarget() {
        return target;
    }

    public int getValue() {
        return value.get();
    }

    @Override
    public String toString() {
        return "ObservableProgress{" +
                "target=" + target +
                ", value=" + value +
                '}';
    }
}

In the controller I want to simulate handling a job with multiple steps, where each step is an I/O blocking operation. Every time we complete a step we push some progress points back to the client. Its implementation is as follows:

@RestController
public class Controller {

    private static final Logger LOGGER = LoggerFactory.getLogger(Controller.class);

    @GetMapping("/run")
    @CrossOrigin
    public SseEmitter doTheJob() {

        SseEmitter sseEmitter = new SseEmitter(Long.MAX_VALUE);

        sseEmitter.onCompletion(() -> LOGGER.info("SseEmitter is completed"));

        sseEmitter.onTimeout(() -> LOGGER.info("SseEmitter is timed out"));

        sseEmitter.onError((ex) -> LOGGER.info("SseEmitter got error:", ex));

        ObservableProgress progress = new ObservableProgress(100);

        runAsync(() -> {
            sleep(1, sseEmitter);
            pushProgress(sseEmitter, progress.increment(10));
        })
        .thenRunAsync(() -> {
            sleep(1, sseEmitter);
            pushProgress(sseEmitter, progress.increment(20));
        })
        .thenRunAsync(() -> {
            sleep(1, sseEmitter);
            pushProgress(sseEmitter, progress.increment(10));
        })
        .thenRunAsync(() -> {
            sleep(1, sseEmitter);
            pushProgress(sseEmitter, progress.increment(20));
        })
        .thenRunAsync(() -> {
            sleep(1, sseEmitter);
            pushProgress(sseEmitter, progress.increment(20));
        })
        .thenRunAsync(() -> {
            sleep(1, sseEmitter);
            pushProgress(sseEmitter, progress.increment(20));
        })
        .thenRunAsync(sseEmitter::complete)
        .exceptionally(ex -> {
            sseEmitter.completeWithError(ex);
            throw (CompletionException) ex;
        });

        LOGGER.info("Controller exits");
        return sseEmitter;
    }

    private void pushProgress(SseEmitter sseEmitter, ObservableProgress progress) {
        try {
            LOGGER.info("Pushing progress: {}", progress.toString());
            sseEmitter.send(SseEmitter.event().name("Progress").data(progress, MediaType.APPLICATION_JSON));
        } catch (IOException e) {
            LOGGER.error("An error occurred while emitting progress.", e);
        }
    }

    private void sleep(int seconds, SseEmitter sseEmitter) {
        try {
            Thread.sleep(seconds * 1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
            sseEmitter.completeWithError(e);
        }
    }
}

Note that I use CompletableFuture to simulate handling the async tasks.

On the client side we need to register a listener for our new custom event type:

eventSource.addEventListener("Progress", (event) => {
        const result = JSON.parse(event.data);
        console.log("received:", result);
        setData(result)
});

So the App.js becomes as follows:

import React, {useEffect, useState} from "react";
import {Card, Progress, Row} from "antd";

function App() {

  const [listening, setListening] = useState(false);
  const [data, setData] = useState({value: 0, target: 100});
  let eventSource = undefined;

  useEffect(() => {
    if (!listening) {
      eventSource = new EventSource("http://localhost:8080/run");

      eventSource.addEventListener("Progress", (event) => {
        const result = JSON.parse(event.data);
        console.log("received:", result);
        setData(result)
      });

      eventSource.onerror = (event) => {
        console.log(event.target.readyState)
        if (event.target.readyState === EventSource.CLOSED) {
          console.log('SSE closed (' + event.target.readyState + ')')
        }
        eventSource.close();
      }

      eventSource.onopen = (event) => {
        console.log("connection opened")
      }
      setListening(true);
    }
    return () => {
      eventSource.close();
      console.log("event closed")
    }

  }, [])

  return (

    <>
      <Card title="Progress Circle">
        <Row justify="center">
          <Progress type="circle" percent={data.value / data.target * 100}/>
        </Row>
      </Card>
      <Card title="Progress Line">
        <Row justify="center">
          <Progress percent={data.value / data.target * 100} />
        </Row>
      </Card>
    </>


  );
}

Note that I use Ant Design Progress component to visualise the progress.


Limitation to the maximum number of open connections

Note that SSE has a drawback when not used over HTTP/2: a browser is not allowed to open more than 6 SSE connections to the same address (www.ex-adress.com). So in Chrome and Firefox we can open at most 6 tabs to the same address that opens an SSE connection. When using HTTP/2 there is no such low limit; by default up to 100 connections are allowed.

Server-Sent Events vs. WebSockets

Websockets and SSE (Server-Sent Events) are both capable of pushing data to browsers.

WebSocket connections are bidirectional: they can send data to the browser and receive data from the browser. Games, messaging apps, and other cases where you need near real-time updates in both directions are good examples of WebSocket usage.

SSE connections are unidirectional and can only push data to the browser. Stock tickers, push notifications, and Twitter's updating timeline are good examples of applications that could benefit from SSE.

In practice, everything that can be done with SSE can also be done with WebSockets, and WebSockets provide a richer protocol; that is why WebSockets get more attention and are more widely used. However, they can be overkill for some types of applications, and the backend can be easier to implement with a protocol such as SSE.

References:

Using server-sent events - Web APIs | MDN
Developing a web application that uses server-sent events is straightforward. You’ll need a bit of code on the server to stream events to the front-end, but the client side code works almost identically to websockets in part of handling incoming events. This is one-way connection, so you can’t send …
WebSockets vs. Server-Sent events/EventSource
Both WebSockets and Server-Sent Events are capable of pushing data to browsers. To me they seem to be competing technologies. What is the difference between them? When would you choose one over the...
Stream Updates with Server-Sent Events - HTML5 Rocks
The EventSource API is designed for receiving push notifications from a server, removing the need for client-side XHR polling.
]]>
<![CDATA[Creating PDF Reports with iText 7 in Java]]>https://turkogluc.com/java-creating-pdf-reports-with-itext/646f24c59b6311000195da63Fri, 27 Nov 2020 19:29:39 GMT

I have been using React-pdf for PDF report generation for quite some time in one of my React projects. It is a nice library for reports of a certain size, as the content is prepared as React components and styling becomes way easier. However, if the page content is generated dynamically and there is no certainty about the number of pages, it might cause serious problems on the frontend side. I ended up with a frozen browser or memory-limit-exceeded errors. So for the huge reports I started looking for Java modules to handle the generation on the backend server, and it did not take me long to come across iText 7.

The first thing that caught my attention was the html2pdf module, and I gave it a try. HTML content can be provided as a string or a file.

public static void main(String[] args) throws FileNotFoundException {
    String html = "<h1>Test</h1>" +
            "<p>Hello World</p>";

    String dest = "hello.pdf";
    HtmlConverter.convertToPdf(html, new FileOutputStream(dest));
}

Resulting PDF:

[Image: resulting PDF]

If you have a certain design in HTML, or you would like to generate HTML with Thymeleaf, this module could be a good fit for your case.

In my scenario, each item is retrieved from the database and inserted into the report together with its images. The length of the text in the fields and the number of images differ for each item. I decided to use the iText core module and its APIs to build up the report rather than generating HTML and converting it to PDF. So let's have a look at the building blocks of the core API.

Basic Components

We can add the dependency as follows:

implementation 'com.itextpdf:itext7-core:7.1.13'

PDF Document

PDF files are represented by the PdfDocument class, which has a wrapper called Document. Instantiating the PdfDocument class can be done by providing a PdfReader or a PdfWriter in the constructor. As we intend to write to a file, we can instantiate it with a PdfWriter, which wraps an OutputStream or a destination File.

// Creating a PdfWriter
String dest = "example.pdf";
PdfWriter writer = new PdfWriter(dest);

// Creating a PdfDocument
PdfDocument pdfDoc = new PdfDocument(writer);

// Adding a new page
pdfDoc.addNewPage();

// Creating a Document
Document document = new Document(pdfDoc);

// Closing the document
document.close();
System.out.println("PDF Created");

We are going to add the components that we want to insert to the Document object by using the following method:

public Document add(IBlockElement element)

Paragraph

The Paragraph class is a container for textual information. It can be filled by passing a string to the constructor or by adding text with the add method. It has many useful settings related to the text and the paragraph layout.

String content = "Lorem ipsum dolor sit amet...";
Paragraph paragraph = new Paragraph(content);
paragraph.setFontSize(14);
paragraph.setTextAlignment(TextAlignment.CENTER);
paragraph.setBorder(Border.NO_BORDER);
paragraph.setFirstLineIndent(20);
paragraph.setItalic();
paragraph.setBold();
paragraph.setBackgroundColor(new DeviceRgb(245, 245, 245));
paragraph.setMargin(10);
paragraph.setPaddingLeft(10);
paragraph.setPaddingRight(10);
paragraph.setWidth(1000);
paragraph.setHeight(100);
document.add(paragraph);

document.add(paragraph); // add second time

AreaBreak

AreaBreak can be used when we would like to start an element at the beginning of the next page. The remaining part of the current page is left empty and the following elements start on the next page.

String content = "Lorem ipsum dolor sit amet...";
document.add(new Paragraph(content));
document.add(new AreaBreak());
document.add(new Paragraph("This text will be located in the next pagee"));

List

We can insert a List as follows:

Paragraph paragraph = new Paragraph("Lorem ipsum dolor...");
document.add(paragraph);

List list = new List();
list.add("Java");
list.add("Go");
list.add("React");
list.add("Apache Kafka");
list.add("Jenkins");
list.add("Elastic Search");
document.add(list);

Table

The Table class can be instantiated by providing either the number of columns or an array of column widths. We can insert Cell objects into a table, and a cell may contain any IBlockElement object.

float [] pointColumnWidths = {150F, 150F, 150F, 150F};
Table table = new Table(pointColumnWidths);
// Table table = new Table(4); init by number of columns

table.addCell(new Cell().add(new Paragraph("Id")));
table.addCell(new Cell().add(new Paragraph("Name")));
table.addCell(new Cell().add(new Paragraph("Location")));
table.addCell(new Cell().add(new Paragraph("Date")));

table.addCell(new Cell().add(new Paragraph("1000")));
table.addCell(new Cell().add(new Paragraph("Item-1")));
table.addCell(new Cell().add(new Paragraph("Istanbul")));
table.addCell(new Cell().add(new Paragraph("01/12/2020")));

table.addCell(new Cell().add(new Paragraph("1005")));
table.addCell(new Cell().add(new Paragraph("Item-2")));
table.addCell(new Cell().add(new Paragraph("Warsaw")));
table.addCell(new Cell().add(new Paragraph("05/12/2020")));

The Table and Cell classes contain many methods related to styling, such as borders, alignments, background options, margin/padding settings, etc.

Image

The Image class is used to insert an image into the PDF file. It can be created from a local file, a remote URL or a stream.

String imFile = "images/logo2.png";
ImageData data = ImageDataFactory.create(imFile);
Image image = new Image(data);
image.setPadding(20);
image.setMarginTop(20);
image.setWidth(200);
image.setMaxHeight(250);
image.setAutoScale(false);
document.add(image);

The ImageDataFactory class can create an image instance from a local path or a remote URL. The Image class has many styling methods, like the other components.


Exporting Reports With REST APIs

As a solution to my scenario, first of all I prepared some general styling for each report type so that I can use it for different entity types. I created an AbstractPdfDocument in order to reuse the common functionality, for example having the title and logo at the top of each page, or common helpers for inserting remote images and tables. I achieved placing the logo and title at the top of each page by registering an IEventHandler on the PdfDocument. The components are listed as follows:

ImageDownloader

As the images are going to be downloaded from remote URLs, I decided to create an ImageDownloader service that downloads the images in parallel with a thread pool.

public class ImageDownloader {

    private static final Logger logger = LoggerFactory.getLogger(ImageDownloader.class);
    private final int NUMBER_OF_THREADS = 12;
    private final ExecutorService executorService = Executors.newFixedThreadPool(NUMBER_OF_THREADS);

    public ConcurrentHashMap <String, Future <Image>> downloadImagesInParallel(
            List <String> imageList, ConcurrentHashMap <String, Future <Image>> imageMap) {

        if (imageList != null && imageList.size() > 0) {
            imageList.forEach(image -> startDownloadImageTask(imageMap, image));
        }
        return imageMap;
    }

    private void startDownloadImageTask(ConcurrentHashMap <String, Future <Image>> imageMap, String image) {
        Future <Image> imageFuture = executorService.submit(() -> getImageObject(image));
        imageMap.put(image, imageFuture);
    }

    private Image getImageObject(String url) {
        try {
            return new Image(ImageDataFactory.create(url, false));
        } catch (MalformedURLException e) {
            logger.error("download image failed: {}", e.getMessage());
            return null;
        }
    }
}

TableHeaderEventHandler

This event handler is used to add a table with the logo and title at the top of each page, and its code is as follows:

class TableHeaderEventHandler implements IEventHandler {
    private Table table;
    private float tableHeight;
    private Document doc;

    public TableHeaderEventHandler(Document doc, String documentTitle) {
        this.doc = doc;
        // Calculate top margin to be sure that the table will fit the margin.
        initTable(documentTitle);

        TableRenderer renderer = (TableRenderer) table.createRendererSubTree();
        renderer.setParent(new DocumentRenderer(doc));

        // Simulate the positioning of the renderer to find out how much space the header table will occupy.
        LayoutResult result = renderer.layout(new LayoutContext(new LayoutArea(0, PageSize.A4)));
        tableHeight = result.getOccupiedArea().getBBox().getHeight();

        // set top margin
        float topMargin = 36 + getTableHeight();
        doc.setMargins(topMargin, 36, 36, 36);
    }

    @Override
    public void handleEvent(Event currentEvent) {
        PdfDocumentEvent docEvent = (PdfDocumentEvent) currentEvent;
        PdfDocument pdfDoc = docEvent.getDocument();
        PdfPage page = docEvent.getPage();
        PdfCanvas canvas = new PdfCanvas(page.newContentStreamBefore(), page.getResources(), pdfDoc);
        PageSize pageSize = pdfDoc.getDefaultPageSize();
        float coordX = pageSize.getX() + doc.getLeftMargin();
        float coordY = pageSize.getTop() - doc.getTopMargin();
        float width = pageSize.getWidth() - doc.getRightMargin() - doc.getLeftMargin();
        float height = getTableHeight();
        Rectangle rect = new Rectangle(coordX, coordY, width, height);

        new Canvas(canvas, rect)
                .add(table)
                .close();
    }

    public float getTableHeight() {
        return tableHeight;
    }

    private void initTable(String documentTitle) {
        table = new Table(new float[]{320F, 200F});
        table.useAllAvailableWidth();
        Cell title = new Cell();
        title.setBorder(Border.NO_BORDER);
        Paragraph movement_report = new Paragraph(documentTitle).setFontSize(17);
        title.add(movement_report);
        table.addCell(title);
        table.setMarginBottom(20);
        ImageData data = ImageDataFactory.create("images/rsz_logo10.png");
        Image img = new Image(data);
        img.setWidth(200);
        Cell logo = new Cell();
        logo.setBorder(Border.NO_BORDER);
        logo.add(img);
        table.addCell(logo);
    }
}

AbstractPdfDocument

AbstractPdfDocument contains the common functionality, such as adding the header to pages, the creation and closing of the Document object, and helper methods for inserting images, tables, titles, etc.

public abstract class AbstractPdfDocument<T> {

    protected final int MAX_IMAGE_NUM = 4;
    protected final Color GRAY = new DeviceRgb(245, 245, 245);
    protected final Color GRAY_LINE = new DeviceRgb(212, 212, 212);
    protected final Color WHITE = new DeviceRgb(255, 255, 255);
    ConcurrentHashMap <String, Future <Image>> imageMap = new ConcurrentHashMap<>();

    private final ImageDownloader imageDownloader;
    protected final String documentTitle;

    AbstractPdfDocument(ImageDownloader imageDownloader, String documentTitle) {
        this.imageDownloader = imageDownloader;
        this.documentTitle = documentTitle;
    }

    public final byte[] generatePdf(List<T> data) {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        PdfDocument pdfDocument = new PdfDocument(new PdfWriter(outputStream));
        Document document = new Document(pdfDocument);
        TableHeaderEventHandler handler = new TableHeaderEventHandler(document, documentTitle);
        pdfDocument.addEventHandler(PdfDocumentEvent.END_PAGE, handler);

        writeData(document, data);

        document.close();
        return outputStream.toByteArray();
    }

    public void startDownloadingImages(List <String> imageList) {
        if (imageList != null && imageList.size() > 0) {
            imageDownloader.downloadImagesInParallel(imageList, imageMap);
        }
    }

    protected void insertImageTable(Document document, List <String> imageList) {
        Table imageTable = new Table(4);

        imageList.forEach(image -> {
            try {
                Image img = imageMap.get(image).get();
                imageTable.addCell(new Cell()
                        .setTextAlignment(TextAlignment.CENTER)
                        .setHorizontalAlignment(HorizontalAlignment.CENTER)
                        .setVerticalAlignment(VerticalAlignment.MIDDLE)
                        .setBorder(new SolidBorder(GRAY_LINE, 1))
                        .add(img.scaleToFit(114, 114))
                        .setPadding(7));
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        });
        document.add(new Paragraph().setMarginTop(10));
        document.add(imageTable);
    }

    protected Table createTable(TableFields tableFields) {
        Table table = new Table(new float[]{220F, 300F});
        AtomicInteger rowCounter = new AtomicInteger(0);

        tableFields.fieldList.forEach(field ->
                insertIfNotNull(field.displayName, field.value, table, rowCounter));
        return table;
    }

    protected Paragraph getBlockTitle(String title) {
        return new Paragraph(title)
                .setFontSize(13)
                .setBorderBottom(new SolidBorder(GRAY_LINE, 1))
                .setMarginTop(35);
    }

    protected void insertIfNotNull(String displayName, Object value, Table table, AtomicInteger rowCounter) {
        if (value != null) {
            Color color = rowCounter
                    .getAndIncrement() % 2 == 0 ?
                    GRAY :
                    WHITE;

            table.addCell(new Cell()
                    .setBorder(Border.NO_BORDER)
                    .setBackgroundColor(color)
                    .add(new Paragraph(displayName)));

            table.addCell(new Cell()
                    .setBorder(Border.NO_BORDER)
                    .setBackgroundColor(color)
                    .add(new Paragraph(String.valueOf(value))));
        }
    }

    protected static class TableField {
        public String displayName;
        public Object value;

        protected TableField(String displayName, Object value) {
            this.displayName = displayName;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof TableField)) return false;
            TableField that = (TableField) o;
            return Objects.equals(displayName, that.displayName) &&
                    Objects.equals(value, that.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(displayName, value);
        }
    }

    protected static class TableFields {
        private List <TableField> fieldList = new ArrayList <>();

        protected void add(String displayName, Object value) {
            fieldList.add(new TableField(displayName, value));
        }

        protected void add(TableField field) {
            fieldList.add(field);
        }
    }

    protected abstract void writeData(Document document, List<T> data);
}

This class uses the Template Method design pattern with its abstract writeData method. The inheritors of this class have to implement this method to drive the insertion of the actual content. For each entity the content is inserted differently, and by implementing this method the concrete classes define how it should be inserted. The child classes can also use the helper methods for common tasks.

ItemPdfDocument

ItemPdfDocument is the concrete class which extends AbstractPdfDocument and implements the writeData method. It handles inserting the item details and then uses the insertImageTable helper method of its parent class.

public class ItemPdfDocument extends AbstractPdfDocument<Item> {

    ItemPdfDocument(ImageDownloader imageDownloader) {
        super(imageDownloader, "Item Report");
    }

    @Override
    protected void writeData(Document document, List <Item> items) {
        startDownloadingItemImages(items);
        items.forEach(item -> {
            addItemFields(document, item);
            addItemImages(document, item);
        });

    }

    private void startDownloadingItemImages(List <Item> items) {
        if(items != null && items.size() > 0) {
            startDownloadingImages(
                    items.stream()
                            .map(Item::getImgList)
                            .flatMap(Collection::stream)
                            .collect(Collectors.toList())
            );
        }
    }

    private void addItemFields(Document document, Item item) {
        document.add(getBlockTitle("Report: " + item.getId()).setMarginTop(0));

        TableFields movementFields = new TableFields();
        movementFields.add("Report Id", item.getId());
        movementFields.add("Item Name", item.getName());
        movementFields.add("Title", item.getTitle());
        movementFields.add("Description", item.getDescription());
        movementFields.add("Area", item.getArea());
        movementFields.add("Location", item.getLocation());
        Table movementDetails = createTable(movementFields);

        document.add(movementDetails);
    }

    private void addItemImages(Document document, Item item) {
        insertImageTable(document, item.getImgList());
        document.add(new Paragraph());
    }
}

So introducing a new report for another entity type is easy: we create a new concrete class and define how to insert the content in its writeData method.

An example report generated by this document looks like as follows:

[Image: example item report]

Controller

In the controller I create an endpoint to retrieve the generated PDF file. At the first attempt I returned the raw byte array from the output stream; however, I realized that might cause corrupted PDF files on the client side. So I encode it with Base64, and the client should decode it before saving the file.

@PostMapping("/pdf")
public void exportPdf(@RequestParam(required = false) List<Long> idList, HttpServletResponse response) throws IOException {

    response.setHeader("Expires", "0");
    response.setHeader("Cache-Control", "must-revalidate, post-check=0, pre-check=0");
    response.setHeader("Pragma", "public");
    response.setContentType("application/pdf");

    byte[] byteArray = itemService.exportPdf(idList);

    byte[] encodedBytes = Base64.getEncoder().encode(byteArray);
    response.setContentLength(encodedBytes.length);
    OutputStream os = response.getOutputStream();
    os.write(encodedBytes);
    os.flush();
    os.close();
}
]]>
<![CDATA[Java Concurrency - Understanding the Executor Framework And Thread Pool Management]]>https://turkogluc.com/java-concurrency-executor-services/646f24c59b6311000195da60Mon, 02 Nov 2020 22:29:39 GMT

In the previous post I wrote about the basics of threads. This post is going to focus on a higher-level abstraction for thread creation and management.

Concurrent programs generally run a large number of tasks. Creating a thread on demand for each task is not a good approach in terms of performance and resource usage, as thread creation and the threads themselves are very expensive. There is also a limit on the maximum number of threads a program can create: a couple of thousand, depending on your machine (this is going to change with Project Loom).

A call center is a good example given to illustrate this kind of parallelism: you can have a bounded number of customer representatives in the call center, and if more customers call than you have representatives, the customers wait in the queue until one becomes available to take the next call. So, hiring a new representative for each call would not make sense.

Therefore it is a better idea to have a thread pool containing a number of threads that execute the tasks we submit. A thread pool may create its threads statically (at the time the pool is created) or dynamically (on demand), but it should have a reasonable upper bound. If you would like to see a simple thread pool implementation that queues the submitted tasks and uses the pooled threads to execute them, please check the example in the previous post.

Using the low-level Thread API is also hard, and it requires a lot of attention each time we need to use it. The Java Executor framework helps us in this manner by decoupling the creation and management of threads from the rest of the application.

In the following sections I will explain, in order: the ExecutorService interface and its methods, the implementations of ExecutorService, and the factory methods of the Executors class that simplify the creation of an ExecutorService.

The Executor Service

At the heart of the executor framework, there is the Executor interface which has the single execute method:

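In java.util.concurrent it is declared simply as:

public interface Executor {
    void execute(Runnable command);
}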

ExecutorService is the main interface extending Executor, and it is the one we mostly interact with. It is an abstraction around a thread pool, and it exposes the submit method that we use to send tasks. The number of threads in its pool depends on the implementation, which we will see in the following sections.

When we send Runnable or Callable tasks with the submit method, threads from the pool run them.

  • Runnable: So far we have mentioned only Runnable, which does not return a result and cannot throw a checked exception.
  • Callable: Similar to Runnable, it is designed for classes whose instances are potentially executed by another thread, but it returns a result and may throw an exception.
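
For comparison, the two interfaces (Runnable vs Callable) look like this in the JDK:

@FunctionalInterface
public interface Runnable {
    void run();
}

@FunctionalInterface
public interface Callable<V> {
    V call() throws Exception;
}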

The submit method returns a Future object that represents the result of an asynchronous computation. Future has methods to check whether the task is complete, to wait for its completion, and to retrieve the result. Its get method returns the result, but it is a blocking method, so we can postpone calling it as long as possible and do other work in the meantime. Once we need the result of the task, we call get; if the result is not ready, the calling thread blocks until it is. If we cannot afford to wait for a long time, we can call the overload that takes a timeout.

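For reference, the core methods of the Future interface are:

public interface Future<V> {
    boolean cancel(boolean mayInterruptIfRunning);
    boolean isCancelled();
    boolean isDone();
    V get() throws InterruptedException, ExecutionException;
    V get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException;
}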

The ExecutorService also provides methods for submitting a collection of tasks at once. We can use the invokeAll method to send multiple tasks, and it returns a List of Futures. The invokeAny method can be used to run a set of similar tasks, and it returns the result of the first one that completes successfully.

Another important advantage the executor service provides is its shutdown functionality for stopping the pool and its threads. There is an important difference between the shutdown and shutdownNow methods:

  • shutdown: Calling this method indicates that no new tasks will be accepted into the queue, and previously submitted tasks are allowed to complete. Note that if the tasks are long running (an infinite loop, for example) they will never complete.
  • shutdownNow: This method interrupts all active threads, stops the processing of the tasks waiting in the queue, and returns the list of tasks that were awaiting execution.

Note that in order to stop the processing, we need to handle the interruption in our Runnable/Callable tasks. Otherwise shutdownNow triggers the interruption but no thread reacts to it, and it behaves the same as shutdown.
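
A common pattern (a sketch, not from the original example) is to combine the two: call shutdown first, give the running tasks some time to finish, and fall back to shutdownNow if they do not:

void shutdownGracefully(ExecutorService pool) {
    pool.shutdown(); // stop accepting new tasks
    try {
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            pool.shutdownNow(); // interrupt whatever is still running
        }
    } catch (InterruptedException e) {
        pool.shutdownNow();
        Thread.currentThread().interrupt(); // preserve the interrupt status
    }
}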

We can also see the ScheduledExecutorService in the first diagram; it extends ExecutorService and provides methods to run scheduled tasks. It is a higher-level abstraction than Timer, and an easier and better way to run periodic tasks.

Implementations of the ExecutorService

ExecutorService instances are mostly created by using the Executors factory methods, which I will show in the next section. The Executors factory methods are an easy way to generate thread pools; however, before using them I would like to show the important concrete classes that implement ExecutorService, because the factory methods internally return one of these implementations, and I believe it is important to understand the internals. Knowing the concrete ExecutorService classes, we can also create custom pools in case we have specific needs.

If we look at the following diagram, we have AbstractExecutorService, which provides default implementations of the submit, invokeAny and invokeAll methods of ExecutorService. The concrete implementations override some of the implementation details.

(Class diagram: implementations of the Executor interface)

1- ThreadPoolExecutor

ThreadPoolExecutor is one of the core implementations of ExecutorService, and it executes each submitted task using one of possibly several pooled threads. This class provides many adjustable parameters and extension hooks for configuring and managing the pool. We can configure the following parameters:

  • corePoolSize: the minimum number of threads kept in the pool.
  • maximumPoolSize: the upper bound of the pool size. By setting corePoolSize and maximumPoolSize to the same number, we simply create a fixed-size pool.
  • threadFactory: new threads are created using a ThreadFactory, which by default is Executors#defaultThreadFactory; it creates threads that all belong to the same ThreadGroup, with the same priority and non-daemon status. If you would like to customise this, you can set a different ThreadFactory.
  • keepAliveTime: when the pool has more than the core number of threads and some of them are idle, the excess threads are terminated after the keepAliveTime.
  • workQueue: a BlockingQueue can be configured to hold the submitted tasks. Example queues and queueing strategies are as follows:
  1. SynchronousQueue: it provides a direct handoff strategy, which means that tasks are delivered directly to the workers without being stored in a queue. If no thread is available to take the received task, a new thread is constructed. If maximumPoolSize is set and that limit is reached, the task is rejected.
  2. LinkedBlockingQueue: it provides an unbounded queue strategy, meaning a queue without a predefined capacity. New tasks wait in the queue while all corePoolSize threads are busy, so no more than corePoolSize threads are ever created and the value of maximumPoolSize has no effect.
  3. ArrayBlockingQueue: it provides a bounded queue strategy, meaning a queue with a predefined capacity. Since the queue space is limited, there should be enough threads to consume the tasks rapidly. While the queue is not full, tasks are added to the queue. When the queue becomes full and the number of threads is less than maximumPoolSize, a new thread is created. Finally, when the number of threads reaches the limit, the task is rejected (see the sketch after this list).
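
As an illustration of the bounded-queue case (a sketch, not from the original post): rejected tasks are handed to a RejectedExecutionHandler, and the JDK ships with ready-made policies such as AbortPolicy (the default), CallerRunsPolicy, DiscardPolicy and DiscardOldestPolicy.

// Sketch: a bounded pool where rejected tasks run on the submitting thread
// instead of throwing RejectedExecutionException.
ThreadPoolExecutor boundedPool = new ThreadPoolExecutor(
    2,                                         // corePoolSize
    4,                                         // maximumPoolSize
    30L, TimeUnit.SECONDS,                     // keepAliveTime
    new ArrayBlockingQueue<>(100),             // bounded work queue
    Executors.defaultThreadFactory(),
    new ThreadPoolExecutor.CallerRunsPolicy()  // rejection policy
);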

We can configure the parameters mentioned above at construction time or later with the setter methods.

public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler)

An example of running multiple tasks with a ThreadPoolExecutor:

public class Main {

    private static final int CORE_POOL_SIZE = 4;
    private static final int MAX_POOL_SIZE = 4;

    private static final AtomicInteger taskCounter = new AtomicInteger(0);
    private static final ThreadFactory threadFactory = (runnable) -> new Thread(runnable,
        "thread " + taskCounter.incrementAndGet()); // name each thread

    public static void main(String[] args) throws InterruptedException {

        ThreadPoolExecutor pool = new ThreadPoolExecutor(CORE_POOL_SIZE,
            MAX_POOL_SIZE,
            0L, // keepAliveTime (irrelevant here, since core size == max size)
            TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(),
            threadFactory);

        Collection<Callable<Long>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            int var = i;
            tasks.add(() -> {
                System.out.println("[" + Thread.currentThread().getName() + "]"
                    + " running the task: " + var);
                return Long.valueOf(var * var);
            });
        }

        List<Future<Long>> futures = pool.invokeAll(tasks);
        futures.forEach(longFuture -> {
            try {
                Long result = longFuture.get();
                System.out.println("Result: " + result);
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        });

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}

It gives output similar to the following (the interleaving of threads may vary):

[thread 4] running the task: 3
[thread 2] running the task: 1
[thread 3] running the task: 2
[thread 1] running the task: 0
[thread 4] running the task: 4
[thread 2] running the task: 5
[thread 1] running the task: 6
[thread 2] running the task: 7
[thread 1] running the task: 8
[thread 2] running the task: 9
Result: 0
Result: 1
Result: 4
Result: 9
Result: 16
Result: 25
Result: 36
Result: 49
Result: 64
Result: 81

As you can see in the class diagram, apart from its configuration the ThreadPoolExecutor class provides many more useful methods related to the queue, the tasks and the threads. So it is more verbose than interacting with the pool through a plain Executor or ExecutorService reference.

2- ForkJoinPool

The fork/join framework is designed to recursively split a parallelizable task into smaller tasks and then combine the results of each subtask to produce the overall result. The way it works is quite different from the other ExecutorService implementations, as it is built upon a divide-and-conquer algorithm.

(Figure: the fork/join framework)

As can be seen in the figure above, tasks are divided into smaller tasks, each small task is run separately, and the results are combined and merged. When the tasks do not depend on each other and can be divided into smaller subtasks, this framework can be used to process them in parallel.

ForkJoinPool contains a single common queue holding the tasks submitted from outside, and a fixed number of worker threads. Each thread also has its own deque (double-ended queue). Once a thread takes a task from the common queue, the task is divided into smaller pieces and the thread adds these subtasks to its deque. The thread takes subtasks from the front of its deque and processes them, so each thread keeps its subtasks in its own queue. When a thread has no more tasks in its deque and there are no tasks waiting in the common queue, it starts to take tasks from the back of other threads' deques. This behaviour is called work stealing. In this way the fork/join framework maximises the utilisation of each thread, decreases the competition between threads for tasks, and provides improved parallelism.

(Figure: work stealing between worker threads)

In order to work with the fork/join API in Java, we create RecursiveTask (or RecursiveAction) instances, which define how to compute a result and how to split the work. The API details are not covered in this post.
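
Still, a minimal sketch (a hypothetical SumTask that sums an array) illustrates the divide-and-combine idea:

public class SumTask extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1_000;
    private final long[] numbers;
    private final int start;
    private final int end;

    public SumTask(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        int mid = (start + end) / 2;
        SumTask left = new SumTask(numbers, start, mid);
        SumTask right = new SumTask(numbers, mid, end);
        left.fork();                        // run the left half asynchronously
        long rightResult = right.compute(); // compute the right half in this thread
        return left.join() + rightResult;   // wait for the left half and combine
    }
}

// usage: long total = new ForkJoinPool().invoke(new SumTask(array, 0, array.length));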

The fork/join framework is used in many places in Java: in executor services as we have seen, in parallel streams, in CompletableFuture and so on. The Streams API in particular relies heavily on the common ForkJoinPool, so it is important to understand how it works and how it integrates with streams.

3- ScheduledThreadPoolExecutor

This executor is used to run a task once at some point in the future, or periodically. It serves a similar purpose to the Timer class, but it is a higher-level implementation. After creating an instance, we can simply use one of the schedule methods:

public ScheduledFuture<?> scheduleAtFixedRate(Runnable command,
                                                  long initialDelay,
                                                  long period,
                                                  TimeUnit unit);

An example logging task can be implemented as follows:

public static void main(String[] args) {
    Runnable runnable = () -> {
        System.out.println("[" + Thread.currentThread().getName() + "]"
            + " running the scheduled task");
    };

    ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    executor.scheduleAtFixedRate(runnable, 1, 1, TimeUnit.SECONDS);

    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }));
}

Executors Factory Methods

In the previous section we saw the concrete implementations of ExecutorService. It is good to know how to create, configure and manage them directly. However, we can easily obtain an ExecutorService by calling the static factory methods of the Executors class, which provides the following methods for creating thread pools:

(Screenshot: the factory methods of the Executors class)

So we can create most of the commonly used thread pools:

ExecutorService fixedPool = Executors.newFixedThreadPool(4);
ExecutorService workStealingPool = Executors.newWorkStealingPool();
ExecutorService singleThreadPool = Executors.newSingleThreadExecutor();
ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
ScheduledExecutorService scheduledPool = Executors.newScheduledThreadPool(1);

The Executors class provides a convenient and easy way to generate pools, and generally that is how we create them, rather than using the new keyword and instantiating the ExecutorService types ourselves. As mentioned before, the Executors class uses the concrete implementations of the ExecutorService interface to generate these pools with reasonable default configurations. As a matter of fact, if we inline the method calls we can see the details of how they are created:

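Simplified from the OpenJDK sources (a sketch; the details vary by version), the factory methods delegate to the concrete classes roughly like this:

// Executors.newFixedThreadPool(nThreads) is roughly:
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
            0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
}

// Executors.newCachedThreadPool() is roughly:
public static ExecutorService newCachedThreadPool() {
    return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
            60L, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>());
}

// Executors.newSingleThreadExecutor() builds a ThreadPoolExecutor of size 1
// (the real implementation additionally wraps it in a delegating class so
// that its configuration cannot be changed afterwards):
public static ExecutorService newSingleThreadExecutor() {
    return new ThreadPoolExecutor(1, 1,
            0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
}

// Executors.newScheduledThreadPool(corePoolSize) is roughly:
public static ScheduledExecutorService newScheduledThreadPool(int corePoolSize) {
    return new ScheduledThreadPoolExecutor(corePoolSize);
}

// Executors.newWorkStealingPool() is roughly:
public static ExecutorService newWorkStealingPool() {
    return new ForkJoinPool(Runtime.getRuntime().availableProcessors(),
            ForkJoinPool.defaultForkJoinWorkerThreadFactory,
            null, true);
}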

Gracefully closing the Executor Services

We need to remember to close thread pools when we are done with them, in order to free their resources. If we forget to close a pool whose threads are non-daemon, the JVM will not exit and will hang as long as the pool exists. If we need a pool for a very specific purpose, we can close it immediately after use. However, the common approach is to keep the thread pool in a singleton, so the pool lives as long as the program is running and does its job during its lifetime. In such cases we can add a shutdown hook in order to close the pool right before the JVM exits.

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
}));
]]>
<![CDATA[Java Concurrency - Basics of Threads]]>https://turkogluc.com/java-concurrency-basics-of-threads/646f24c59b6311000195da5fFri, 30 Oct 2020 09:13:47 GMT

Java Thread objects allow us to run our code in separate threads. When an application starts, the JVM creates the initial thread, named main, and the main method runs on it. Inside the application we can create new threads to execute other tasks in parallel with the main thread.

Java uses native operating system threads, so each Java thread is mapped to one OS thread.

Creating Threads

The constructor of the Thread class takes a Runnable object. The Runnable interface has an abstract run method, which is invoked once the thread is started with Thread#start(). The Runnable can be instantiated with a lambda, an anonymous class, or a class that implements the Runnable interface.

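For example, a quick sketch with an anonymous class implementing Runnable:

Thread thread = new Thread(new Runnable() {
    @Override
    public void run() {
        System.out.println("Running in " + Thread.currentThread().getName());
    }
});
thread.start();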

Using a lambda is generally easier and more compact:

Thread thread = new Thread(() -> {
    // content of run command
});
thread.start();

A thread lives as long as its run hook method has not returned. The scheduler can suspend and resume the thread many times. For a thread to execute forever, it needs an infinite loop that prevents run from returning.

The join method allows one thread to wait for the completion of another. This is a simple form of barrier synchronisation.

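A small sketch (join can throw InterruptedException, which the caller must handle or declare):

Thread worker = new Thread(() -> {
    for (int i = 0; i < 5; i++) {
        System.out.println("working...");
    }
});
worker.start();
worker.join(); // the calling thread blocks here until worker's run() returns
System.out.println("worker finished");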

Java Thread Types: User and Daemon Threads

When the JVM starts, it contains a single user thread, the main thread. The main difference between user and daemon threads is how they affect when the JVM exits.

  • A user thread continues its lifecycle even if the main thread exits.
  • However, all daemon threads are terminated when all user threads have exited.
  • The JVM itself exits when all user threads have exited.

The Thread class contains a boolean daemon field that specifies whether the thread is a daemon. It can be set with the setDaemon method, which must be called before the thread is started.

Thread thread = new Thread(getRunnable());
thread.setDaemon(true);
thread.start();

The daemon field defaults to false, so most of the threads we create are user threads. A thread copies the daemon status of its parent thread if it is not set explicitly. Java itself uses daemon threads in some places, such as ForkJoinPool and Timer. To illustrate, we can use the following example:

public class Main {

    public static void main(String[] args) throws InterruptedException, ExecutionException {
//        runDeamonThread();
        runUserThread();
        System.out.println(getCurrentThreadName() + " exits");
    }

    private static void runDeamonThread() throws ExecutionException, InterruptedException {
        ExecutorService executorService = Executors.newWorkStealingPool(10);
        executorService.execute(getRunnable());
    }

    private static void runUserThread() {
        Thread thread = new Thread(getRunnable());
        thread.start();
    }

    private static Runnable getRunnable() {
        return () -> {
            for (int i = 0; i <= 200; i++) {
                System.out.print(".");
                Thread.yield();
            }
            System.out.println(getCurrentThreadName() + " exits. isDeamon: " + isDaemon());
        };
    }

    private static boolean isDaemon() {
        return Thread.currentThread().isDaemon();
    }

    private static String getCurrentThreadName() {
        return Thread.currentThread().getName();
    }
}
  • When we invoke the runUserThread method, it shows output like the following:
................................................
main exits
........................................................................................
Thread-0 exits. isDeamon: false
  • The second case is invoking runDeamonThread, which uses a ForkJoinPool (via Executors.newWorkStealingPool) as an example of daemon threads. I could simply have used the setDaemon(true) method, but I wanted to show a real usage. Output:
main exits

So when the main method exits and no other user threads are left, the JVM exits and kills all daemon threads, so we do not even get a chance to see output from the daemon threads.

Stopping Threads

Compared to creating a thread, stopping one is quite hard. Once a thread starts running it diverges from the caller and has its own lifecycle. It can either complete its task and exit, or, if it performs a long-running operation, it can run forever. Java does not provide a (non-deprecated) method to stop a thread from the outside; the thread has to cooperate.

  1. A naive approach could be using a stop flag:
volatile boolean isStopped = false;

public void test() {
    new Thread(() -> {
        while (!isStopped) {
            System.out.print(".");
        }
        System.out.println("Child Exits");
    }).start();

    try {
        Thread.sleep(100);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    isStopped = true;
    System.out.println("Main exits");
}

Note that the flag is volatile in order to make its up-to-date value visible to both threads. However, this approach fails if the thread is performing blocking operations such as sleep, wait, join or blocking I/O.

2. Another way to stop the thread is to use the thread's interrupt() method.

An interrupt request to a thread is an indication that it should stop what it is doing and do something else. It is up to the programmer to decide exactly how a thread responds to an interrupt, but it is very common for the thread to terminate.

For the interrupt mechanism to work correctly, the interrupted thread must support its own interruption. There are two cases we can examine:

  • Non Blocking and Long Running Tasks

In this case, calling the thread.interrupt() method sets the interrupt flag of that thread, but if the task itself does not check the status of the flag, the interrupt has no effect. For example:

public void test() throws InterruptedException {
    Thread thread = new Thread(() -> {
        System.out.println("Child Starts");
        while (true) {
            System.out.print(".");
        }
    });

    thread.start();
    thread.interrupt();

    thread.join();
    System.out.println("Main exits");
}

In order for the thread to catch the interrupt, it should periodically check the status of the interrupt flag so that it can detect a pending interruption request and handle it accordingly.

So we can check the flag in our while loop, and if it is set we can return or break out of the loop. Inside the lambda we cannot throw a checked exception from run, but in appropriate places (methods that declare it) we can throw InterruptedException as well.

public void test() throws InterruptedException {
    Thread thread = new Thread(() -> {
        System.out.println("Child Starts");
        while (true) {
            if (Thread.interrupted()) {
                break;
            }
            System.out.print(".");
        }
        System.out.println("Child exits");
    });

    thread.start();
    thread.interrupt();

    thread.join();
    System.out.println("Main exits");
}

Note that the static Thread.interrupted() method returns the value of the flag and clears it if it was set. So if we want to preserve the interrupted state of the thread for upper levels of the stack, we can set it back with Thread.currentThread().interrupt();

  • Blocking Tasks

If a thread frequently calls blocking methods such as wait, join or sleep, which are designed to be interruptible, these methods internally check whether the thread has been interrupted and, if so, throw InterruptedException. This exception should be caught and handled in the appropriate context. The following example uses the interruption to break out of a loop blocked in a sleep operation:

public void test() throws InterruptedException {
    Thread thread = new Thread(() -> {
        System.out.println("Child Starts");
        try {
            while (true) {
                Thread.sleep(10000);
            }
        } catch (InterruptedException e) {
            System.out.println("Thread interrupted: " + e.getMessage());
        }
        System.out.println("Child Exits");
    });

    thread.start();
    thread.interrupt();

    thread.join();
    System.out.println("Main exits");
}

There are patterns for dealing with Java InterruptedException:

  • One approach is propagating the exception to the callers, so that a higher layer is responsible for handling it.
  • Before re-throwing, we can do task-specific clean-up.
  • If it is not possible to re-throw, we can set the interrupted status to true again with Thread.currentThread().interrupt() to preserve the evidence in case the higher layers want to check it (see the sketch after this list).
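
A sketch of the first and last patterns, using BlockingQueue#take only as an example of a blocking call:

// Propagate: declare InterruptedException and let a higher layer handle it.
Runnable takeNextTask(BlockingQueue<Runnable> queue) throws InterruptedException {
    return queue.take();
}

// Restore the status: if re-throwing is not possible, set the flag back before returning.
void runNextTaskQuietly(BlockingQueue<Runnable> queue) {
    try {
        queue.take().run();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the evidence for callers
    }
}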

In conclusion, if we want to implement cancellable tasks we need to periodically check the interrupt status and handle the interruption in a way that lets the thread exit.

Thread Groups

In order to simplify thread management, multiple threads can be organised with java.lang.ThreadGroup objects that group related threads. Each thread group has a parent group. In the hierarchy there is the main group, which is the parent of the other groups and threads we create in the program. We can create a ThreadGroup by calling its constructor with a parent group and/or a name. To add threads to a group, we specify the group in the Thread's constructor.

public void test() {
    ThreadGroup tg1 = new ThreadGroup("Thread-group-1");
    ThreadGroup tg2 = new ThreadGroup(tg1, "Thread-group-2");

    // Give each thread a short-lived task so that it is still alive
    // when we enumerate the group below.
    Runnable task = () -> {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    };

    Thread thread1 = new Thread(tg1, task, "thread-1");
    Thread thread2 = new Thread(tg2, task, "thread-2");
    Thread thread3 = new Thread(tg2, task, "thread-3");

    thread1.start();
    thread2.start();
    thread3.start();

    Thread[] threads = new Thread[tg2.activeCount()];
    tg2.enumerate(threads);

    Arrays.asList(threads).forEach(t -> System.out.println(t.getName()));
    tg1.list();
}

We can iterate over the threads by calling the enumerate method, which fills the given array with the thread references of the group.

We can implement a Thread Pool by making use of Thread Groups:

public class ThreadPool {
    // Create a thread group field
    private final ThreadGroup group = new ThreadGroup("ThreadPoolGroup");
    // Create a LinkedList field containing Runnable
    private final List<Runnable> tasks = new LinkedList<>();

    public ThreadPool(int poolSize) {
        // create several Worker threads in the thread group
        for (int i = 0; i < poolSize; i++) {
            var worker = new Worker(group, "worker-" + i);
            worker.start();
        }
    }

    private Runnable take() throws InterruptedException {
        synchronized (tasks) {
            // if the LinkedList is empty, we wait
            while (tasks.isEmpty()) tasks.wait();
            // remove the first job from the LinkedList and return it
            return tasks.remove(0);
        }
    }

    public void submit(Runnable job) {
        // Add the job to the LinkedList and notifyAll
        synchronized (tasks) {
            tasks.add(job);
            tasks.notifyAll();
        }
    }

    public int getRunQueueLength() {
        // return the length of the LinkedList
        // remember to also synchronize!
        synchronized (tasks) {
            return tasks.size();
        }
    }

    public void shutdown() {
        // this should stop all threads in the group
        group.interrupt();
    }

    private class Worker extends Thread {
        public Worker(ThreadGroup group, String name) {
            super(group, name);
        }

        public void run() {
            // we run in an infinite loop:
            while(true) {
                // remove the next job from the linked list using take()
                // we then call the run() method on the job
                try {
                    take().run();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                    break;
                }
            }
        }
    }
}

Thread Local Variables

The Java ThreadLocal class can be used to create variables whose values are accessible only from the same thread. So even if two threads execute the same code, and that code has a reference to the same ThreadLocal variable, the two threads cannot see each other's ThreadLocal values.

public class Main {

    public static class ThreadLocalStorage {

        private static final ThreadLocal<String> threadLocal = new ThreadLocal<>();

        public static void setName(String name) {
            threadLocal.set(name);
        }

        public static String getName() {
            return threadLocal.get();
        }
    }

    public static void main(String[] args) {

        ThreadLocalStorage.setName("Main thread");

        Runnable runnable = () -> {
            ThreadLocalStorage.setName(getCurrentThreadName());
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("Thread: [" + getCurrentThreadName() + "] " +
                "- value: [" + ThreadLocalStorage.getName() + "]");
        };

        Thread thread1 = new Thread(runnable);
        Thread thread2 = new Thread(runnable);

        thread1.start();
        thread2.start();

        System.out.println("Main exits");
    }

    private static String getCurrentThreadName() {
        return Thread.currentThread().getName();
    }
}

If we run the code we can see that each thread keeps its own value inside the ThreadLocal.

Main exits
Thread: [Thread-0] - value: [Thread-0]
Thread: [Thread-1] - value: [Thread-1]

In contrast, InheritableThreadLocal grants access to a value both to the thread that set it and to all child threads created by that thread.
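
A minimal sketch of the difference:

InheritableThreadLocal<String> context = new InheritableThreadLocal<>();
context.set("request-42"); // set in the parent thread

new Thread(() -> {
    // the child thread inherits the parent's value at creation time;
    // a plain ThreadLocal would return null here
    System.out.println("child sees: " + context.get());
}).start();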

References

]]>
<![CDATA[Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-3)]]>https://turkogluc.com/developing-admin-portal-with-react-redux-and-ant-design-part-3/646f24c59b6311000195da5eSun, 18 Oct 2020 09:30:24 GMT

In part-1, I started with installing React and its dependencies, prepared the environment, and showed the implementation of the main layout of the admin panel, demonstrating routing in React with the SiderMenu and Content of the layout.

In part-2, I demonstrated using generic components with custom hooks, created example pages containing tables to list the items and forms for the create and update pages, and designed those pages by invoking the reusable hooks.

In this part I would like to show more visual components by designing the Dashboard page. For charting I use bizcharts, a chart component library with a wide variety of chart types. Thanks to its Example Charts gallery, it is easy to copy the components and use them in our pages.

Designing the Dashboard

We start by installing the bizcharts library:

yarn add bizcharts

ChartCard Component

I would like to have a card component that contains summary info and takes up 1/4 of a row, so that I can place 4 of them in a row and display a small piece of information in each. It is going to look as follows:

(Screenshot: the ChartCard component)

The component code is as follows:

import React from 'react';
import { Card } from 'antd';
import './ChartCard.less';

function ChartCard(props) {
  const renderContent = () => {
    const {
      contentHeight,
      title,
      avatar,
      action,
      total,
      footer,
      children,
      loading,
    } = props;

    return (
      <div className="chartCard">
        <div className="chartTop">
          <div className="avatar">{avatar}</div>
          <div className="metaWrap">
            <div className="meta">
              <span className="title">{title}</span>
              <span className="action">{action}</span>
            </div>
            <div className="total">{total}</div>
          </div>
        </div>
        {children && (
          <div className="content" style={{ height: contentHeight || 'auto' }}>
            <div className="contentHeight">{children}</div>
          </div>
        )}
        {footer && <div className="footer">{footer}</div>}
      </div>
    );
  };
  return (
    <Card loading={false} bodyStyle={{ padding: '20px 24px 8px 24px' }}>
      {renderContent()}
    </Card>
  );
}

export default ChartCard;
ChartCard.js

And it has a styling file:

.chartCard {
  position: relative;
  .chartTop {
    position: relative;
    width: 100%;
    overflow: hidden;
  }
  .chartTopMargin {
    margin-bottom: 12px;
  }
  .chartTopHasMargin {
    margin-bottom: 20px;
  }
  .metaWrap {
    float: left;
  }
  .avatar {
    position: relative;
    top: 4px;
    float: left;
    margin-right: 20px;
    img {
      border-radius: 100%;
    }
  }
  .meta {
    height: 22px;
    color: fade(#000, 45%);
    font-size: 14px;
    line-height: 22px;
  }
  .action {
    position: absolute;
    top: 4px;
    right: 0;
    line-height: 1;
    cursor: pointer;
  }
  .total {
    height: 38px;
    margin-top: 4px;
    margin-bottom: 0;
    overflow: hidden;
    color: fade(#000, 85%);
    font-size: 30px;
    line-height: 38px;
    white-space: nowrap;
    text-overflow: ellipsis;
    word-break: break-all;
  }
  .content {
    position: relative;
    width: 100%;
    margin-bottom: 12px;
  }
  .contentFixed {
    position: absolute;
    bottom: 0;
    left: 0;
    width: 100%;
  }
  .footer {
    margin-top: 8px;
    padding-top: 9px;
    border-top: 1px solid hsv(0, 0, 91%);
    & > * {
      position: relative;
    }
  }
  .footerMargin {
    margin-top: 20px;
  }
  .trendText {
    margin-left: 8px;
    margin-right: 4px;
    color: fade(#000, 85%);
  }
  .boldText {
    color: fade(#000, 85%);
  }
}
ChartCard.less

Mini Charting Components for ChartCard

I would like to add some bar/line charts inside the ChartCard to display a visual summary, but to fit them into such a small area I need some styling and an autoHeight helper.

import React from 'react';

function computeHeight(node) {
  const { style } = node;
  style.height = '100%';
  const totalHeight = parseInt(`${getComputedStyle(node).height}`, 10);
  const padding =
    parseInt(`${getComputedStyle(node).paddingTop}`, 10) +
    parseInt(`${getComputedStyle(node).paddingBottom}`, 10);
  return totalHeight - padding;
}
function getAutoHeight(n) {
  if (!n) {
    return 0;
  }
  const node = n;
  let height = computeHeight(node);
  const { parentNode } = node;
  if (parentNode) {
    height = computeHeight(parentNode);
  }
  return height;
}
function autoHeight() {
  return WrappedComponent => {
    class AutoHeightComponent extends React.Component {
      constructor(props) {
        super(props);
        this.state = {
          computedHeight: 0,
        };
        this.root = undefined;
        this.handleRoot = node => {
          this.root = node;
        };
      }

      componentDidMount() {
        // eslint-disable-next-line react/prop-types
        const { height } = this.props;
        if (!height) {
          let h = getAutoHeight(this.root);
          this.setState({ computedHeight: h });
          if (h < 1) {
            h = getAutoHeight(this.root);
            this.setState({ computedHeight: h });
          }
        }
      }

      render() {
        // eslint-disable-next-line react/prop-types
        const { height } = this.props;
        const { computedHeight } = this.state;
        const h = height || computedHeight;
        return (
          <div ref={this.handleRoot}>
            {/* eslint-disable-next-line react/jsx-props-no-spreading */}
            {h > 0 && <WrappedComponent {...this.props} height={h} />}
          </div>
        );
      }
    }
    return AutoHeightComponent;
  };
}

export default autoHeight;
autoHeight.js

Some general styling:

.miniChart {
  position: relative;
  width: 100%;
  .chartContent {
    position: absolute;
    bottom: -28px;
    width: 100%;
    > div {
      margin: 0 -5px;
      overflow: hidden;
    }
  }
  .chartLoading {
    position: absolute;
    top: 16px;
    left: 50%;
    margin-left: -7px;
  }
}
chart.less

MiniArea Component

import React from 'react';
import { Axis, Chart, Geom, Tooltip } from 'bizcharts';
import './chart.less';
import autoHeight from './autoHeight';

function MiniArea(props) {
  const {
    height = 1,
    data = [],
    forceFit = true,
    color = 'rgba(24, 144, 255, 0.2)',
    borderColor = '#1089ff',
    scale = { x: {}, y: {} },
    borderWidth = 2,
    line,
    xAxis,
    yAxis,
    animate = true,
  } = props;

  const padding = [36, 5, 30, 5];

  const scaleProps = {
    x: {
      type: 'cat',
      range: [0, 1],
      ...scale.x,
    },
    y: {
      min: 0,
      ...scale.y,
    },
  };

  const tooltip = [
    'x*y',
    (x, y) => ({
      name: x,
      value: y,
    }),
  ];

  const chartHeight = height + 54;

  return (
    <div className="miniChart" style={{ height }}>
      <div className="chartContent">
        {height > 0 && (
          <Chart
            animate={animate}
            scale={scaleProps}
            height={chartHeight}
            forceFit={forceFit}
            data={data}
            padding={padding}
          >
            <Axis
              key="axis-x"
              name="x"
              label={null}
              line={null}
              tickLine={null}
              grid={null}
              {...xAxis}
            />
            <Axis
              key="axis-y"
              name="y"
              label={null}
              line={null}
              tickLine={null}
              grid={null}
              {...yAxis}
            />
            <Tooltip showTitle={false} crosshairs={false} />
            <Geom
              type="area"
              position="x*y"
              color={color}
              tooltip={tooltip}
              shape="smooth"
              style={{
                fillOpacity: 1,
              }}
            />
            {line ? (
              <Geom
                type="line"
                position="x*y"
                shape="smooth"
                color={borderColor}
                size={borderWidth}
                tooltip={false}
              />
            ) : (
              <span style={{ display: 'none' }} />
            )}
          </Chart>
        )}
      </div>
    </div>
  );
}

export default autoHeight()(MiniArea);
MiniArea.js

We will use the MiniArea component inside a ChartCard, and it looks as follows:

(Screenshot: a MiniArea chart inside a ChartCard)

MiniBar Component

import React from 'react';
import { Chart, Interval, Interaction } from 'bizcharts';
import './chart.less';
import autoHeight from './autoHeight';

function MiniBar(props) {
  const data = [
    { year: '1951 year', sales: 38 },
    { year: '1952 year', sales: 52 },
    { year: '1956 year', sales: 61 },
    { year: '1957 year', sales: 45 },
    { year: '1958 year', sales: 48 },
    { year: '1959 year', sales: 38 },
    { year: '1960 year', sales: 38 },
    { year: '1962 year', sales: 38 },
    { year: '1963 year', sales: 10 },
    { year: '1965 year', sales: 90 },
    { year: '1966 year', sales: 80 },
    { year: '1967 year', sales: 20 },
    { year: '1968 year', sales: 80 },
    { year: '1970 year', sales: 50 },
  ];

  return (
    <div style={{ paddingTop: '20px' }}>
      <Chart autoFit pure data={data}>
        <Interval position="year*sales" />
        <Interaction type="element-highlight" />
        <Interaction type="active-region" />
      </Chart>
    </div>
  );
}

export default MiniBar;
Minibar.js

We will use the MiniBar component inside a ChartCard, and it looks as follows:

(Screenshot: a MiniBar chart inside a ChartCard)

Adding ChartCards to Dashboard

We can display summary figures or small charts in the ChartCards and lay them out on the Dashboard as follows:

import React from 'react';
import { Card, Col, Row, Layout, Tooltip } from 'antd';
import { InfoCircleFilled, CaretUpFilled } from '@ant-design/icons';
import ChartCard from '../../component/chart/ChartCard';
import MiniArea from '../../component/chart/MiniArea';
import MiniBar from '../../component/chart/MiniBar';
import MiniProgress from '../../component/chart/MiniProgress';
import { movementSummary, visitSummary } from './Constants';
import ProductBarChart from '../../component/chart/ProductBarChart';
import ProductPieChart from '../../component/chart/ProductPieChart';

function Dashboard() {
  const topColResponsiveProps = {
    xs: 24,
    sm: 12,
    md: 12,
    lg: 12,
    xl: 6,
    style: { marginBottom: 24 },
  };

  return (
    <>
      <Row gutter={24} type="flex">
        <Col {...topColResponsiveProps}>
          <ChartCard
            bordered={false}
            title="Total Items"
            action={
              <Tooltip title="Total number of items">
                <InfoCircleFilled />
              </Tooltip>
            }
            loading={false}
            total={12}
            footer={
              <>
                <span className="boldText">{13}</span> Items added in the last{' '}
                <span className="boldText">7</span> days
              </>
            }
            contentHeight={46}
          >
            <div style={{ position: 'absolute', bottom: 0, left: 0 }}>
              Weekly Changes
              <span className="trendText">{14}%</span>
              <CaretUpFilled style={{ color: '#52c41a' }} />
            </div>
          </ChartCard>
        </Col>
        <Col {...topColResponsiveProps}>
          <ChartCard
            bordered={false}
            title="Portal Visits"
            action={
              <Tooltip title="Total number of active users in the last month.">
                <InfoCircleFilled />
              </Tooltip>
            }
            loading={false}
            total={10}
            footer={
              <>
                <span className="boldText">{12}</span> Average daily visits per
                day
              </>
            }
            contentHeight={46}
          >
            <MiniArea color="#975FE4" data={visitSummary} />
          </ChartCard>
        </Col>
        <Col {...topColResponsiveProps}>
          <ChartCard
            bordered={false}
            title="Items Moved"
            action={
              <Tooltip title="Item movement in the last year.">
                <InfoCircleFilled />
              </Tooltip>
            }
            loading={false}
            total={124}
            footer={
              <>
                <span className="boldText">{123}</span> Items moved in the last
                month
              </>
            }
            contentHeight={46}
          >
            <MiniBar data={movementSummary} />
          </ChartCard>
        </Col>
        <Col {...topColResponsiveProps}>
          <ChartCard
            bordered={false}
            title="Item Returns"
            action={
              <Tooltip title="Percentage of returned items.">
                <InfoCircleFilled />
              </Tooltip>
            }
            loading={false}
            total={10 + ' %'}
            footer={
              <>
                <span className="boldText">{12}</span> Items in the last year
              </>
            }
            contentHeight={46}
          >
            <MiniProgress
              percent={10}
              strokeWidth={16}
              color="#13C2C2"
              target={100}
            />
          </ChartCard>
        </Col>
      </Row>
    </>
  );
}

export default Dashboard;
Dashboard.js

Now the Dashboard page looks as follows:

(Screenshot: the Dashboard with a row of ChartCards)

BarChart Component

import React from 'react';
import { Chart, Interval, Tooltip } from 'bizcharts';
import { Card } from 'antd';

const barData = [
  { x: 'W-1', y: 44 },
  { x: 'W-2', y: 201 },
  { x: 'W-3', y: 41 },
  { x: 'W-4', y: 197 },
  { x: 'W-5', y: 173 },
  { x: 'W-6', y: 184 },
  { x: 'W-7', y: 109 },
  { x: 'W-8', y: 55 },
  { x: 'W-9', y: 28 },
  { x: 'W-10', y: 153 },
  { x: 'W-11', y: 76 },
  { x: 'W-12', y: 27 },
];

function ProductBarChart() {
  return (
    <Card bordered={false}>
      <Chart
        height={250}
        autoFit
        data={barData}
        interactions={['active-region']}
      >
        <Interval position="x*y" />
        <Tooltip shared />
      </Chart>
    </Card>
  );
}

export default ProductBarChart;
ProductBarChart.js

PieChart Component

import React from 'react';
import { Interaction, PieChart } from 'bizcharts';
import { Card } from 'antd';

const pieData = [
  {
    type: 'home',
    value: 27,
  },
  {
    type: 'living',
    value: 25,
  },
  {
    type: 'accessories',
    value: 18,
  },
  {
    type: 'jewellery',
    value: 15,
  },
  {
    type: 'clothing',
    value: 10,
  },
  {
    type: 'handmade',
    value: 5,
  },
];

function ProductPieChart() {
  return (
    <Card bordered={false}>
      <PieChart
        forceFit
        height={250}
        data={pieData}
        radius={0.8}
        angleField="value"
        colorField="type"
        label={{
          visible: true,
          type: 'outer',
          offset: 20,
          formatter: val => `${val}%`,
        }}
      >
        <Interaction type="element-single-selected" />
      </PieChart>
    </Card>
  );
}

export default ProductPieChart;
ProductPieChart.js

These components are just example usages of the bizcharts chart components. We can display them on the dashboard inside half-row-width cards. Right after the first row on the Dashboard page, we can add these components as a second row:

<Row gutter={24} type="flex">
  <Col span={12}>
    <Card title="Weekly Sale Report">
      <ProductBarChart />
    </Card>
  </Col>
  <Col span={12}>
    <Card title="Sale Summary">
      <ProductPieChart />
    </Card>
  </Col>
</Row>
Dashboard.js

Now the Dashboard page looks as follows:

(Screenshot: the Dashboard with the bar and pie chart row)

See the commit for the changes: 22f2798.

]]>
<![CDATA[Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-2)]]>https://turkogluc.com/developing-admin-portal-with-react-redux-and-ant-design-part-2/646f24c59b6311000195da5dSun, 18 Oct 2020 09:30:16 GMT

In part-1, I demonstrated the implementation of the main layout of the admin panel. In this part I will show some generic components that can be used in the content part of the layout, such as tables, forms and charts.

Header Component

We can define a generic component that can be used as the header above the table views. It contains a search bar, an Add New button and a Delete button, and looks as follows:

(Screenshot: the Header component)
import React from 'react';
import { Button, Col, Divider, Input, Popconfirm, Row } from 'antd';
import {
  DeleteOutlined,
  PlusOutlined,
  QuestionCircleOutlined,
} from '@ant-design/icons';
import { useHistory } from 'react-router-dom';

const { Search } = Input;

function Header({ addNewPath, hasSelected, handleSearch }) {
  const history = useHistory();

  const handleAddNew = () => {
    history.push('/' + addNewPath);
  };

  return (
    <>
      <Row>
        <Col>
          <Search
            placeholder="Search"
            onSearch={handleSearch}
            allowClear
            style={{ float: 'left', width: 350 }}
          />
        </Col>
        <Col flex="auto">
          <Button
            icon={<PlusOutlined />}
            type="primary"
            style={{ float: 'right' }}
            onClick={handleAddNew}
          >
            Add New
          </Button>

          <Button
            icon={<DeleteOutlined />}
            disabled={!hasSelected}
            style={{ float: 'right', marginRight: 12 }}
          >
            <Popconfirm
              title="Sure to delete?"
              icon={<QuestionCircleOutlined style={{ color: 'red' }} />}
              onConfirm={() => {}}
            >
              Delete
            </Popconfirm>
          </Button>
        </Col>
      </Row>
      <Divider />
    </>
  );
}

export default Header;
Header.js

Now we can add the Header component to our pages, for example the product page:

import React, { useState } from 'react';
import Header from '../../component/Header';

function ShowProducts() {
  const [hasSelected, setHasSelected] = useState(false);

  return (
    <>
      <Header addNewPath="add-product" hasSelected={hasSelected} />
    </>
  );
}

export default ShowProducts;
ShowProducts.js

See the commit for changes: 5c6f738

DataTable Component

Ant Design has a Table component with a wide variety of features, such as selectable rows, pagination, custom column rendering, and handling user events on the table. However, using it directly in each page causes duplicate code, so we can create our own custom hook for tables. The table hook is as follows:

import React, { useState } from 'react';
import { Table } from 'antd';
import useActionMenu from './ActionMenu';

const DEFAULT_PAGE_SIZE = 10;
const DEFAULT_PAGE_NUMBER = 0;

function useDataTable({ columns, dataSource, updateEntityPath }) {
  const [selectedRowKeys, setSelectedRowKeys] = useState([]);
  const [selectedRow, setSelectedRow] = useState(null);
  const [currentPage, setCurrentPage] = useState(DEFAULT_PAGE_NUMBER);
  const [pageSize, setPageSize] = useState(DEFAULT_PAGE_SIZE);
  const [actionColumnView] = useActionMenu({ selectedRow, updateEntityPath });

  const hasSelected = selectedRowKeys.length > 0;

  const rowSelection = {
    selectedRowKeys,
    onChange: selected => {
      setSelectedRowKeys(selected);
    },
  };

  const updatedColumns = [
    ...columns,
    {
      title: 'Action',
      key: 'action',
      render: () => actionColumnView,
    },
  ];

  const handleSingleDelete = () => {
    console.log('handleSingleDelete, selected:', selectedRow);
  };

  const resetPagination = () => {
    setCurrentPage(DEFAULT_PAGE_NUMBER);
  };

  const handleTableChange = pagination => {
    console.log('pagination:', pagination);
    setCurrentPage(pagination.current - 1);
  };

  const DataTable = () => (
    <Table
      rowKey={record => record.id}
      rowSelection={rowSelection}
      columns={updatedColumns}
      dataSource={dataSource.content}
      onRow={record => {
        return {
          onClick: () => {
            setSelectedRow(record);
          },
        };
      }}
      onChange={handleTableChange}
      pagination={{
        pageSize: DEFAULT_PAGE_SIZE,
        current: currentPage + 1,
        total: dataSource.totalElements,
        showTotal: (total, range) => {
          return `${range[0]}-${range[1]} of ${total} items`;
        },
      }}
    />
  );

  return {
    DataTable,
    hasSelected,
    selectedRow,
    selectedRowKeys,
    currentPage,
    pageSize,
    resetPagination,
  };
}

export default useDataTable;
DataTable.js

It returns:

  • DataTable, the component that renders the table
  • hasSelected, a boolean that is true when any rows are selected
  • selectedRow, the single row that was last clicked
  • selectedRowKeys, the array containing the keys of all selected rows
  • currentPage, pageSize and resetPagination for the pagination

These values can be used in the component that wraps the table. We also add an action column at the end of the column list to display the update and delete actions. This view is implemented as a custom hook in the ActionMenu file as follows:

import React from 'react';
import { Dropdown, Menu, Popconfirm } from 'antd';
import {
  DeleteOutlined,
  DownOutlined,
  EditOutlined,
  QuestionCircleOutlined,
} from '@ant-design/icons';
import { useHistory } from 'react-router-dom';

function useActionMenu({ selectedRow, updateEntityPath }) {
  const history = useHistory();

  const handleMenuClick = (action) => {
    if (action.key === 'edit') {
      const updatePath = '/' + updateEntityPath + '/' + selectedRow.id;
      history.push(updatePath);
    }
  };

  const handleSingleDelete = () => {
    console.log('handleSingleDelete, selected:', selectedRow);
  };

  const actionMenu = (
    <Menu onClick={handleMenuClick}>
      <Menu.Item key="edit">
        <EditOutlined />
        Update
      </Menu.Item>
      <Menu.Item key="delete">
        <Popconfirm
          title="Sure to delete?"
          placement="left"
          icon={<QuestionCircleOutlined style={{ color: 'red' }} />}
          onConfirm={handleSingleDelete}
        >
          <DeleteOutlined type="delete" />
          Delete
        </Popconfirm>
      </Menu.Item>
    </Menu>
  );

  const actionColumnView = (
    <span>
      <Dropdown overlay={actionMenu} trigger={['click']}>
        <a className="ant-dropdown-link" href="#">
          Actions <DownOutlined />
        </a>
      </Dropdown>
    </span>
  );

  return [actionColumnView];
}

export default useActionMenu;
ActionMenu.js

Now we can use the table hook in our pages, for example in the product page:

import React from 'react';
import Header from '../../component/Header';
import useDataTable from '../../component/DataTable';
import * as constants from './Constants';

function ShowProducts() {
  const {
    DataTable,
    hasSelected,
    currentPage,
    pageSize,
    resetPagination,
  } = useDataTable({
    columns: constants.columns,
    dataSource: constants.data,
    updateEntityPath: 'update-product',
  });

  return (
    <>
      <Header addNewPath="add-product" hasSelected={hasSelected} />
      <DataTable />
    </>
  );
}

export default ShowProducts;
ShowProducts.js

The columns and the data source for the table are defined in a constants file as follows:

import React from 'react';
import { Tag } from 'antd';

export const columns = [
  {
    title: 'Id',
    dataIndex: 'key',
    key: 'key',
  },
  {
    title: 'Name',
    dataIndex: 'name',
    key: 'name',
    render: text => <a>{text}</a>,
  },
  {
    title: 'Description',
    dataIndex: 'description',
    key: 'description',
  },
  {
    title: 'Quantity',
    dataIndex: 'qty',
    key: 'qty',
  },
  {
    title: 'owner',
    dataIndex: 'owner',
    key: 'owner',
  },
  {
    title: 'Category',
    key: 'category',
    dataIndex: 'category',
    render: tags => (
      <>
        {tags.map(tag => {
          let color = 'blue';
          if (tag === 'accessory') {
            color = 'volcano';
          } else if (tag === 'clothing') {
            color = 'geekblue';
          } else if (tag === 'jewellery') {
            color = 'green';
          }
          return (
            <Tag color={color} key={tag}>
              {tag.toUpperCase()}
            </Tag>
          );
        })}
      </>
    ),
  },
];

export const data = {
  totalElements: 8,
  content: [
    {
      key: '1',
      name: 'Personalized Bar Bracelet',
      description: 'This is a metal bracelet',
      qty: 32,
      owner: 'John Brown',
      category: ['jewellery', 'accessory'],
    },
    {
      key: '2',
      name: 'Handcraft Boots',
      description: 'Vegan-friendly leather',
      qty: 12,
      owner: 'John Green',
      category: ['clothing', 'living'],
    },
    {
      key: '3',
      name: 'Personalized Bar Bracelet',
      description: 'This is a metal bracelet',
      qty: 32,
      owner: 'John Brown',
      category: ['jewellery', 'clothing'],
    },
    // ...
  ],
};
Constants.js

Of course, the data variable here is a mock; in real-life applications it would be retrieved from the backend via API calls. The table view then looks as follows:

(Screenshot: the products table)

See the commit for changes: 1694602

Add New Item with Forms

Ant Design has a feature-rich Form component that handles most of the work and design hassle for us. I will demonstrate a save-product page using the antd Form.

import React from 'react';
import {
  Switch,
  Card,
  Form,
  Input,
  Row,
  Col,
  Select,
  Divider,
  Button,
  InputNumber,
} from 'antd';
import { CheckOutlined, CloseOutlined } from '@ant-design/icons';

const { Option } = Select;

function AddProduct() {
  const [form] = Form.useForm();

  const handleSave = values => {
    console.log('onFinish', values);
    // call save API
  };

  const requiredFieldRule = [{ required: true, message: 'Required Field' }];

  const ownerArray = [
    {
      id: 1,
      value: 'John Nash',
    },
    {
      id: 2,
      value: 'Leonhard Euler',
    },
    {
      id: 3,
      value: 'Alan Turing',
    },
  ];

  const categoryArray = [
    {
      id: 1,
      value: 'Clothing',
    },
    {
      id: 2,
      value: 'Jewelery',
    },
    {
      id: 3,
      value: 'Accessory',
    },
  ];

  return (
    <Card title="Add Product" loading={false}>
      <Row justify="center">
        <Col span={12}>
          <Form
            labelCol={{ span: 4 }}
            wrapperCol={{ span: 16 }}
            form={form}
            name="product-form"
            onFinish={handleSave}
          >
            <Form.Item label="Name" name="name" rules={requiredFieldRule}>
              <Input />
            </Form.Item>
            <Form.Item label="Description" name="description">
              <Input />
            </Form.Item>
            <Form.Item label="Owner" name="owner">
              <Select>
                {ownerArray.map(item => (
                  <Option key={item.id} value={item.id}>
                    {item.value}
                  </Option>
                ))}
              </Select>
            </Form.Item>
            <Form.Item label="Category" name="category">
              <Select>
                {categoryArray.map(item => (
                  <Option key={item.id} value={item.id}>
                    {item.value}
                  </Option>
                ))}
              </Select>
            </Form.Item>
            <Form.Item label="Quantity" name="qty">
              <InputNumber />
            </Form.Item>
            <Form.Item
              label="Status"
              name="active"
              valuePropName="checked"
              initialValue={false}
            >
              <Switch
                checkedChildren={<CheckOutlined />}
                unCheckedChildren={<CloseOutlined />}
              />
            </Form.Item>
            <Divider />
            <Row justify="center">
              <Button type="primary" htmlType="submit">
                Save
              </Button>
            </Row>
          </Form>
        </Col>
      </Row>
    </Card>
  );
}

export default AddProduct;
AddProduct.js

We need to add the new component to the RoutingList as a new route. The Add New button on the Show Products page forwards to the /add-product path.

const routes = [
  // ...
  {
    path: '/add-product',
    component: AddProduct,
    key: '/add-product',
  },
];

The view looks as follows:

Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-2)

See the commit for the changes: 790615.

The handleSave method is called when the Save button is clicked, and the values parameter contains the JSON that can be sent to the backend save API.
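As a sketch of what that call could look like, assuming a hypothetical /api/products endpoint on the backend:

const handleSave = async values => {
  // Hypothetical endpoint; replace with your real save API.
  const response = await fetch('/api/products', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(values),
  });
  if (response.ok) {
    // Clear the antd form instance created with Form.useForm()
    form.resetFields();
  }
};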

What is Next

]]>
<![CDATA[Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-1)]]>https://turkogluc.com/developing-react-admin-portal-with-redux-and-ant-design/646f24c59b6311000195da5cSun, 18 Oct 2020 09:30:05 GMT

Getting Started

In this series I would like to share a step-by-step guide for developing high-quality admin portals with React and Ant Design. I will share reusable, generic components that speed up ground-up development. In order to focus on certain aspects, I have divided the series into multiple parts.

In this part I present the steps for preparing React and the necessary modules, and then gradually show the implementation of each component to build up the admin portal. If you would like to see the complete code, it is committed in the following public repository:

turkogluc/react-admin-portal
Contribute to turkogluc/react-admin-portal development by creating an account on GitHub.
Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-1)

I use Ant Design as the user interface framework, as it contains a set of high-quality components and ready-to-use demos for building rich, interactive user interfaces. The list of Ant Design React components can be found at the following link:

Components Overview - Ant Design
antd provides plenty of UI components to enrich your web applications, and we will improve components experience consistently. We also recommend some great Third-Party Libraries additionally.
Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-1)

Initializing the React project

We can use the following boilerplate as it already contains React 16, Webpack 4 with Babel 7, webpack-dev-server, react-hot-loader and CSS Modules:

git clone https://github.com/HashemKhalifa/webpack-react-boilerplate.git
mv webpack-react-boilerplate react-admin-portal
cd react-admin-portal
yarn install
yarn start

This repository is maintained continuously and its dependencies are regularly upgraded, so it is a good starting point: most of the environment is already prepared and the technologies are up to date.

The web application starts at http://localhost:8080.

Installing Ant Design

yarn add antd
yarn add babel-plugin-import
yarn add less-loader
yarn add less

While using antd, we can either import the complete style file in our root file, or add styling only for the components we actually use, which is better in terms of performance. That is why we added the babel-plugin-import module, and we will update the webpack configuration to import the less styles as follows:

Add the antd library option to the babel-loader rule in webpack-common.js:

{
  test: /\.(js|jsx)$/,
  loader: 'babel-loader',
  exclude: /(node_modules)/,
  options: {
    presets: ['@babel/react'],
    plugins: [['import', { libraryName: 'antd', style: true }]],
  },
},
webpack-common.js

In the same file, also add .less to the extensions:

extensions: ['*', '.js', '.jsx', '.css', '.scss', '.less'],

Add less-loader to the webpack-dev.js:

{
  test: /\.less$/,
  use: [
    'style-loader',
    'css-loader',
    'sass-loader',
    {
      loader: 'less-loader',
      options: {
        lessOptions: {
          javascriptEnabled: true,
        },
      },
    },
  ],
},
webpack-dev.js

You can find these changes in the commit 6f0001f.

Installing Redux

yarn add @reduxjs/toolkit
yarn add react-redux
yarn add redux-logger

Let's create our first reducer just as an example:

const INITIAL_STATE = {};

export default (state = INITIAL_STATE, action) => {
  switch (action.type) {
    default:
      return state;
  }
};
initReducer.js

And add this reducer to global reducers list:

import { combineReducers } from 'redux';
import initReducer from './initReducer';

export default combineReducers({
  initReducer,
});
reducer/index.js

Now we can create our store:

import { createLogger } from 'redux-logger';
import { applyMiddleware, createStore } from 'redux';
import thunkMiddleware from 'redux-thunk';
import reducer from './reducer';

const loggerMiddleware = createLogger();

export const store = createStore(
  reducer,
  applyMiddleware(thunkMiddleware, loggerMiddleware)
);
store.js

And wrap the App component with Provider:

function App() {
  return (
    <Provider store={store}>
      Hello world
    </Provider>
  );
}
App.js

See the commit for changes: 98dd23b.

Installing React Router

yarn add react-router-dom

Let's create browser history:

import { createBrowserHistory } from 'history';

export default createBrowserHistory();
history.js

We can create an empty dashboard page as the initial page in the routing list.

import React from 'react';

function Dashboard() {
  return <div>Dashboard Page</div>;
}

export default Dashboard;
Dashboard.js

Now we can create the RoutingList component that will contain the path-to-component mapping. The routes variable contains the list of path and component mappings. The Dashboard page is added as the first entry, at the root path (/).

import React from 'react';
import { Route } from 'react-router-dom';
import Dashboard from '../page/dashboard/Dashboard';

const routes = [
  {
    path: '/',
    component: Dashboard,
    key: '/',
  },
];

function RoutingList() {
  return routes.map(item => {
    if (item.path.split('/').length === 2) {
      return (
        <Route
          exact
          path={item.path}
          component={item.component}
          key={item.key}
        />
      );
    }
    return <Route path={item.path} component={item.component} key={item.key} />;
  });
}

export default RoutingList;
RoutingList.js

And we can wrap the App component with BrowserRouter. So the App component becomes as follows:

import React from 'react';
import { hot } from 'react-hot-loader/root';
import { Provider } from 'react-redux';
import { BrowserRouter, Switch } from 'react-router-dom';
import { store } from './redux/store';
import history from './router/history';
import MainLayout from './page/layout/MainLayout';

function App() {
  return (
    <Provider store={store}>
      <BrowserRouter history={history}>
        <Switch>
          <MainLayout />
        </Switch>
      </BrowserRouter>
    </Provider>
  );
}
export default hot(App);
App.js

I have added another, for now empty, component named MainLayout. It is the component that will contain the structure of the layout.

See the commit for changes: 7afcd0c.


Designing the Layout

The MainLayout component contains the sider menu on the left, the header at the top and the content in the middle. The content part will be controlled by the router: we will place the <RoutingList /> component in the content area so that we can change the content by routing to different paths. See the structure of the mentioned components in the next picture.

Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-1)

Let's see the implementation of each component in detail.

User Avatar Component

I use the Ant Design Avatar component to represent the user avatar based on the user's first name.

import React from 'react';
import { Avatar } from 'antd';

function getColor(username) {
  const colors = [
    '#ffa38a',
    '#a9a7e0',
    '#D686D4',
    '#96CE56',
    '#4A90E2',
    '#62b3d0',
    '#ef7676',
  ];
  const firstChar = username.charCodeAt(0);
  const secondChar = username.charCodeAt(1);
  const thirdChar = username.charCodeAt(2);

  return colors[(firstChar + secondChar + thirdChar) % 7];
}

export const getUsernameAvatar = (username, size = 'large') => {
  return (
    <div>
      <Avatar
        style={{
          backgroundColor: getColor(username),
          verticalAlign: 'middle',
        }}
        size={size}
      >
        {username ? username.charAt(0).toUpperCase() : ''}
      </Avatar>
    </div>
  );
};
UserAvatar.js

It looks as follows:

Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-1)

Layout Banner Component

It is the header component of the layout and contains some horizontal menus from Ant Design, such as the user menu, language switcher, etc.

import React from 'react';
import {
  MenuUnfoldOutlined,
  MenuFoldOutlined,
  QuestionCircleOutlined,
  GlobalOutlined,
  BellOutlined,
  UserOutlined,
  LogoutOutlined,
} from '@ant-design/icons';
import { Layout, Menu, Badge } from 'antd';
import './Style.less';
import { getUsernameAvatar } from '../../component/UserAvatar';

const { Header } = Layout;
const { SubMenu } = Menu;

function LayoutBanner({ collapsed, handleOnCollapse }) {
  const getCollapseIcon = () => {
    if (collapsed) {
      return (
        <MenuUnfoldOutlined onClick={handleOnCollapse} className="trigger" />
      );
    }
    return <MenuFoldOutlined onClick={handleOnCollapse} className="trigger" />;
  };

  const handleLanguageMenuClick = () => {};
  const handleSettingMenuClick = () => {};
  const handleLogout = () => {};

  return (
    <Header className="header" style={{ background: '#fff', padding: 0 }}>
      <div
        style={{
          float: 'left',
          width: '100%',
          alignSelf: 'center',
          display: 'flex',
        }}
      >
        {window.innerWidth > 992 && getCollapseIcon()}
      </div>
      <Menu
        // onClick={this.handleLanguageMenuClick}
        mode="horizontal"
        className="menu"
      >
        <SubMenu title={<QuestionCircleOutlined />} />
      </Menu>
      <Menu
        // onClick={this.handleLanguageMenuClick}
        mode="horizontal"
        className="menu"
      >
        <SubMenu
          title={
            <Badge dot>
              <BellOutlined />
            </Badge>
          }
        />
      </Menu>
      <Menu
        onClick={handleLanguageMenuClick}
        mode="horizontal"
        className="menu"
      >
        <SubMenu title={<GlobalOutlined />}>
          <Menu.Item key="en">
            <span role="img" aria-label="English">
              🇺🇸 English
            </span>
          </Menu.Item>
          <Menu.Item key="it">
            <span role="img" aria-label="Italian">
              🇮🇹 Italian
            </span>
          </Menu.Item>
        </SubMenu>
      </Menu>
      <Menu onClick={handleSettingMenuClick} mode="horizontal" className="menu">
        <SubMenu title={getUsernameAvatar('Cemal')}>
          <Menu.Item key="setting:1">
            <span>
              <UserOutlined />
              Profile
            </span>
          </Menu.Item>
          <Menu.Item key="setting:2">
            <span>
              <LogoutOutlined onClick={handleLogout} />
              Logout
            </span>
          </Menu.Item>
        </SubMenu>
      </Menu>
    </Header>
  );
}

export default LayoutBanner;
LayoutBanner.js

Adding styles:

.header {
  display: flex;
}

.trigger {
  margin-left: 16px;
  margin-right: 16px;
  align-self: center;

}

.menu {
  .ant-menu-horizontal {
    & > .ant-menu-submenu {
      float: right;
    }
    border: none;
  }
  box-shadow: #e4ecef;
  position: relative;
  .ant-menu-submenu-title {
    width: 64px;
    height: 64px;
    text-align: center;
    padding-top: 8px;
  }
}
Style.less

So it looks as follows:

Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-1)

Sider Menu Component

This component uses Sider, Menu, Icon components of Ant Design.

import React from 'react';
import { Layout, Menu } from 'antd';
import { useHistory } from 'react-router-dom';
import {
  DashboardOutlined,
  FundProjectionScreenOutlined,
  PartitionOutlined,
  SettingOutlined,
  TeamOutlined,
} from '@ant-design/icons';
import './Style.less';

const { SubMenu } = Menu;

const { Sider } = Layout;

function SiderMenu({ handleOnCollapse, collapsed }) {
  const theme = 'light';

  const history = useHistory();

  const handleSiderMenuClick = action => {
    console.log('menu:', action);
    switch (action.key) {
      case 'dashboard':
        history.push('/');
        break;
      case 'showProducts':
        history.push('/products');
        break;
      case 'addProduct':
        history.push('/add-product');
        break;
      case 'showCustomers':
        history.push('/customers');
        break;
      case 'addCustomer':
        history.push('/add-customer');
        break;
      default:
        history.push('/');
    }
  };

  return (
    <Sider
      breakpoint="lg"
      collapsedWidth="80"
      onCollapse={handleOnCollapse}
      collapsed={collapsed}
      width="256"
      theme={theme}
    >
      <a>
        <div className="menu-logo" />
      </a>
      <Menu mode="inline" theme={theme} onClick={handleSiderMenuClick}>
        <Menu.Item key="dashboard">
          <DashboardOutlined />
          <span className="nav-text">Dashboard</span>
        </Menu.Item>
        <SubMenu
          key="products"
          title={
            <span>
              <PartitionOutlined />
              <span>Products</span>
            </span>
          }
        >
          <Menu.Item key="showProducts">
            <span className="nav-text">Show Products</span>
          </Menu.Item>
          <Menu.Item key="addProduct">
            <span className="nav-text">Add Product</span>
          </Menu.Item>
        </SubMenu>
        <SubMenu
          key="customers"
          title={
            <span>
              <TeamOutlined />
              <span>Customers</span>
            </span>
          }
        >
          <Menu.Item key="showCustomers">
            <span className="nav-text">Show Customers</span>
          </Menu.Item>
          <Menu.Item key="addCustomer">
            <span className="nav-text">Add Customer</span>
          </Menu.Item>
        </SubMenu>
        <Menu.Item key="settings">
          <SettingOutlined />
          <span className="nav-text">Settings</span>
        </Menu.Item>
        <Menu.Item key="reports">
          <FundProjectionScreenOutlined />
          <span className="nav-text">Reports</span>
        </Menu.Item>
      </Menu>
    </Sider>
  );
}

export default SiderMenu;
SiderMenu.js

We can add a logo above the sider menu, within the less style file.

.menu-logo {
  background-image: url('../../../public/icon.png');
  background-repeat: no-repeat;
  background-position: center;
  height: 35px;
  background-size: 100%;
  margin: 20px;
  color: #ffffff;
}
Style.less

Main Layout Component

The MainLayout component contains the Layout and Content components of Ant Design. Layout wraps the entire body; our custom SiderMenu is placed on the left, and LayoutBanner stands above the content. The RoutingList component is added within the Content, and React Router will render the matched route in this place.

import React, { useState } from 'react';
import { Layout } from 'antd';
import SiderMenu from './SiderMenu';
import LayoutBanner from './LayoutBanner';
import './Style.less';
import RoutingList from '../../router/RoutingList';

const { Content } = Layout;

function MainLayout() {
  const [collapsed, setCollapsed] = useState(false);

  const handleOnCollapse = () => {
    setCollapsed(prevState => !prevState);
  };

  return (
    <Layout style={{ minHeight: '100vh' }}>
      <SiderMenu collapsed={collapsed} handleOnCollapse={handleOnCollapse} />
      <Layout>
        <LayoutBanner
          collapsed={collapsed}
          handleOnCollapse={handleOnCollapse}
        />
        <Content style={{ margin: '24px 16px 0' }}>
          <div style={{ padding: 24, background: '#fff', minHeight: 20 }}>
            <RoutingList />
          </div>
        </Content>
      </Layout>
    </Layout>
  );
}

export default MainLayout;
MainLayout.js

See the commit for changes: 229b776.


Adding New Pages/Routes

We create new components with simple content:

function ShowCustomers() {
  return <div>Customer Page</div>;
}

function ShowProducts() {
  return <div>Product Page</div>;
}

And we can add new routes to the RoutingList component as follows:

const routes = [
  {
    path: '/',
    component: Dashboard,
    key: '/',
  },
  {
    path: '/customers',
    component: ShowCustomers,
    key: '/customers',
  },
  {
    path: '/products',
    component: ShowProducts,
    key: '/products',
  },
];
RoutingList.js

In the SiderMenu, the click handler pushes these paths for the clicked keys:

case 'showProducts':
	history.push('/products');
	break;

case 'showCustomers':
    history.push('/customers');
    break;
SiderMenu.js

Now clicking Show Products renders the ShowProducts component within the content area.

Developing a Modern Admin Portal with React, Redux, and Ant Design (Part-1)

See the commit for changes: 4c52d5b.

What is Next?

]]>
<![CDATA[Making Sense of Change Data Capture Pipelines for Postgres with Debezium Kafka Connector]]>https://turkogluc.com/postgresql-capture-data-change-with-debezium/646f24c59b6311000195da52Sun, 27 Sep 2020 13:39:38 GMT

Should you need to get familiar with Kafka Connect basics or the Kafka JDBC connector, check out the previous posts. This post focuses on the PostgreSQL backup and replication mechanism and on streaming data from the database to Kafka using the Debezium connector.

There are basically 3 major methods to perform backups or replication in PostgreSQL:

  • Logical dumps (Extracting SQL script that represents the data, for example with using pg_dump)
  • Transactional Log Shipping (Using Write-Ahead Log)
  • Logical Decoding

The first method is relatively harder to maintain and creates more load on the database. Log shipping and logical decoding are low-level solutions that make use of the transaction log and have the major advantage of efficiency.

Understanding the Transactional Log

The transaction log is an essential part of modern relational database systems. It is basically a history log of all actions and changes applied to the database. The database eventually stores the data in the filesystem; however, this I/O operation is relatively costly, and if a failure interrupts the writing process the file becomes inconsistent and is not easy to recover.

Therefore the database writes the data directly to a log, the Write-Ahead Log (WAL), and the transaction is completed. Once the data is logged in the WAL it is considered to be successfully stored, even though it has not been written to the file system yet. So even if the system crashes or some failure arises, the data will be read back from the log when the system restarts.

A process constantly reads the WAL, sets a checkpoint as a point in time, and writes the changes appended since the last checkpoint to the actual database files (every 5 minutes by default). As disk space is limited, the WAL files that have already been processed are either archived or recycled (old ones are removed) in order to free up disk space.

The transaction log is divided into 16 MB files called WAL segments, which consist of records. Each record has an identifier named Log Sequence Number (LSN) that shows the physical location of the record. Processing the transactions and checkpointing can be illustrated as follows:

Making Sense of Change Data Capture Pipelines for Postgres with Debezium Kafka Connector

It's All About WAL!

The WAL plays an important role in replication as it contains the details of every transaction, and these records are very useful for representing the database state. Note that WAL segments are not stored forever; once a segment is consumed it is removed from disk. So one way to back up locally is to copy the segment files to a different directory (default dir: pg_wal). Sharing these records with remote servers gives the opportunity to create replicas efficiently.
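To get a feel for the WAL on a running server, the current write position and its segment file can be inspected with a few built-in functions. This is a small sketch assuming PostgreSQL 10 or newer; pg_ls_waldir additionally requires superuser (or pg_monitor) privileges.

-- Current write position in the WAL
SELECT pg_current_wal_lsn();

-- The WAL segment file that position belongs to
SELECT pg_walfile_name(pg_current_wal_lsn());

-- Number of segment files currently kept in pg_wal
SELECT count(*) FROM pg_ls_waldir();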

Transactional Log Shipping

It is a solution that replicates the database on different servers by shipping the WAL segments. The replica server operates in recovery mode all the time and replays the WAL records it receives in order to stay consistent with the primary server. This solution can be implemented by sending the complete 16 MB WAL segment files or by streaming individual records as they are written. Streaming is a better solution in terms of durability. See the implementation details of physical replication: Log Shipping.

Making Sense of Change Data Capture Pipelines for Postgres with Debezium Kafka Connector

In the log shipping method, the segment files, which contain binary data, are copied physically (byte by byte) to another server, so it can also be called physical replication. This approach has some limitations and drawbacks: it is not possible to replicate between different versions of Postgres or different operating systems, and it cannot replicate only part of a database. Logical replication was introduced to address these limitations.

Logical Replication

Logical replication is a method that decodes the binary WAL records into a more understandable format and sends this decoded data to a remote server. It uses a publish and subscribe model, with one or more subscribers receiving data from publishers. In contrast to physical replication there are no primary and standby servers; data can flow in both directions, but always from publishers to subscribers. The publisher creates a PUBLICATION:

CREATE PUBLICATION mypublication FOR TABLE users, departments;
CREATE PUBLICATION alltables FOR ALL TABLES;

And the subscriber creates a SUBSCRIPTION:

CREATE SUBSCRIPTION mysubscription
         CONNECTION 'host=localhost port=5432 user=foo dbname=foodb'
        PUBLICATION mypublication, alltables;

The changes sent by the publisher are replicated in the subscriber database to ensure that the two databases remain in sync. It is important to remember that WAL segments are deleted after some time. So if a subscriber stops for a while and the WAL records it needs are deleted on the publisher, it causes a FATAL error.

Replication slots solve this problem for us. By assigning a subscriber to a replication slot we can guarantee that the WAL records related to the subscription are not removed until the subscriber consumes them. The CREATE SUBSCRIPTION command automatically generates a replication slot on the publisher side for us, so we do not need to create it manually.
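For reference, slots can be inspected and managed with standard catalog views and functions. A small sketch (the slot name my_test_slot is only an example):

-- List existing replication slots and how far they have been consumed
SELECT slot_name, plugin, slot_type, active, restart_lsn FROM pg_replication_slots;

-- A logical slot can also be created and dropped manually if needed
SELECT pg_create_logical_replication_slot('my_test_slot', 'pgoutput');
SELECT pg_drop_replication_slot('my_test_slot');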

Making Sense of Change Data Capture Pipelines for Postgres with Debezium Kafka Connector

Now that we have the necessary knowledge of PostgreSQL replication concepts, we can proceed to configure and manage the Debezium connector.

Debezium Connector

Debezium is an open-source Change Data Capture platform that turns an existing database into event streams. The Debezium Kafka connector captures each row-level change in the database and sends it to Kafka topics.

Making Sense of Change Data Capture Pipelines for Postgres with Debezium Kafka Connector

Debezium uses the logical replication feature of PostgreSQL in order to capture the transaction records from the WAL. The connector acts in the subscriber role for the changes published from the tables. It handles all the low-level configuration for us, such as creating the publication and the replication slot.

In order to use the connector we need an output plugin installed in the PostgreSQL server so that it can decode the WAL records. The Debezium connector supports a number of output plugins:

  • decoderbufs: based on Protobuf and maintained by the Debezium community
  • wal2json: based on JSON
  • pgoutput: the standard logical decoding output plug-in in PostgreSQL (version 10+), also supported by the Debezium community.

I personally recommend using pgoutput as it is the native plugin and does not require any additional installation. If you would like to use another plugin, you need to install it on the database server; see the documentation for installing plugins. We can set up the whole architecture with the following steps.

1-Running Kafka Cluster

We can use the following docker-compose file to get a Kafka cluster with a single broker up and running.

version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:latest
    hostname: kafka
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  control-center:
    image: confluentinc/cp-enterprise-control-center:5.5.1
    hostname: control-center
    container_name: control-center
    depends_on:
      - zookeeper
      - kafka
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'kafka:9092'
      CONTROL_CENTER_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
      CONFLUENT_METRICS_TOPIC_REPLICATION: 1
      PORT: 9021

  postgresql:
    image: postgres:12
    hostname: postgresql
    container_name: postgresql
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: demo
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: root

Note that Confluent Control Center is optional and is used only as a user interface for the Kafka broker.

2- Preparing Debezium Connector Plugin

Download the Debezium PostgreSQL connector plugin and extract the archive to Kafka Connect's plugin path. When we start Kafka Connect we can specify a plugin path that will be used to access the plugin libraries, for example plugin.path=/usr/local/share/kafka/plugins. Check the Install Connector Manually documentation for details.

3- Running Kafka Connect

We can run Kafka Connect with the connect-distributed.sh script that is located inside the Kafka bin directory. We need to provide a properties file to this script to configure the worker properties.

We can create a connect-distributed.properties file to specify the worker properties as follows:

# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers=localhost:29092

# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster

# The converters specify the format of data in Kafka and how to translate it into Connect data.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
config.storage.topic=connect-configs
config.storage.replication.factor=1

# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
status.storage.topic=connect-status
status.storage.replication.factor=1

# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

plugin.path=/Users/cemalturkoglu/kafka/plugins

Note that plugin.path is the path where we need to place the plugin library that we downloaded.

After running Kafka Connect we can confirm that the worker's REST endpoint is accessible, and we can confirm that the Debezium connector is in the plugin list by calling http://localhost:8083/connector-plugins.
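For example, the plugin list and the registered connectors can be checked from the command line (a quick sketch using the standard Kafka Connect REST API):

# List installed connector plugins; io.debezium.connector.postgresql.PostgresConnector should appear
curl -s http://localhost:8083/connector-plugins

# List running connectors (empty until we register one)
curl -s http://localhost:8083/connectors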

4- Preparing PostgreSQL Server for Logical Replication

The first thing we need to configure in PostgreSQL is wal_level, which can be one of minimal, replica and logical. This configuration specifies how much information is written to the WAL. The default value is replica; we need to set it to logical to be able to stream records with logical replication.

The current wal_level can be seen with the SHOW wal_level; command:

postgres=# SHOW wal_level;
 wal_level
-----------
 replica
(1 row)

This configuration is in the postgresql.conf file, which is generally located at /var/lib/postgresql/data/pgdata/postgresql.conf. To find the config location you can use the SHOW config_file; command. On the PostgreSQL server, go to this location and edit the file:

~ docker exec -it 62ce968539a6 /bin/bash
root@db:/# nano /var/lib/postgresql/data/pgdata/postgresql.conf

Set wal_level to logical and save the file. This change requires restarting the PostgreSQL server. This is the bare minimum configuration required to run the connector.
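Alternatively (a sketch, not part of the original setup), the same change can be made without editing the file manually by using ALTER SYSTEM, which writes the setting to postgresql.auto.conf; a server restart is still required:

ALTER SYSTEM SET wal_level = logical;
-- restart the server, then verify:
SHOW wal_level;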

Optionally, you can also create another user who is not a superuser, for security reasons. This user needs to have the LOGIN and REPLICATION attributes.

CREATE ROLE name REPLICATION LOGIN;

And this user should be added to the pg_hba.conf file for authentication. If this config already has a host all all ... line, it accepts all connections and there is no need to add the new user.
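If a dedicated entry is needed, it could look like the following sketch, assuming a hypothetical user named debezium connecting to the demo database with password authentication:

# TYPE  DATABASE  USER      ADDRESS      METHOD
host    demo      debezium  0.0.0.0/0    md5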

5- Starting the Connector

We run the connectors by calling REST endpoints with a configuration JSON. We can provide the configuration payload from a file to the curl command. The following command starts the connector:

curl -d @"debezium-config.json" \
-H "Content-Type: application/json" \
-X POST http://localhost:8083/connectors

The configuration for the connector, stored in the debezium-config.json file, can be as follows:

{
  "name": "debezium-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "root",
    "database.dbname" : "demo",
    "plugin.name": "pgoutput",
    "database.server.name": "demo-server"
  }
}
  • The connector connects to the database using the connection details.
  • It reads the current position of the server's transaction log.
  • It starts by performing an initial consistent snapshot of each of the database schemas, sending a message with a READ event for each row encountered during the snapshot.
  • After the snapshot is completed, it starts streaming changes as CREATE, UPDATE and DELETE events from the transaction log, starting from the saved position. So if new records are added during the snapshot, they will be captured from the WAL and sent to Kafka.

We can see that the 4 tables from my demo database are shipped to Kafka topics. Topic names are generated as <server-name>.<schema-name>.<table-name>, for example demo-server.public.address.

Making Sense of Change Data Capture Pipelines for Postgres with Debezium Kafka Connector
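To peek at one of these topics from the command line, a console consumer can be used. A sketch, assuming the Apache Kafka scripts are on the path (in Confluent distributions the script is named kafka-console-consumer, without the .sh suffix):

kafka-console-consumer.sh --bootstrap-server localhost:29092 \
  --topic demo-server.public.address --from-beginning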

And each row in the tables is loaded as a message:

Making Sense of Change Data Capture Pipelines for Postgres with Debezium Kafka Connector

Each message contains schema and payload fields. The payload has before and after fields describing the object that is being changed. The payload also contains the op field, which stands for the operation and maps as follows:

  • c: create
  • r: read
  • u: update
  • d: delete
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "struct",
        "fields": [
          {
            "type": "int64",
            "optional": false,
            "field": "id"
          },
          // ...
        ],
        "optional": true,
        "name": "demo_server.public.address.Value",
        "field": "after"
      },
    ],
    "optional": false,
    "name": "demo_server.public.address.Envelope"
  },
  "payload": {
    "before": null,
    "after": {
      "id": 5,
      "flat": "flat-1",
      "postal_code": "ABC",
      "street": "street",
      "title": "new address",
      "city_id": 7,
      "user_id": 2
    },
    "source": {
      "version": "1.2.5.Final",
      "connector": "postgresql",
      "name": "demo-server",
      "ts_ms": 1601157566900,
      "snapshot": "false",
      "db": "demo",
      "schema": "public",
      "table": "address",
      "txId": 848,
      "lsn": 24940072,
      "xmin": null
    },
    "op": "c",
    "ts_ms": 1601157567132,
    "transaction": null
  }
}

In this example payload.op is c and the payload.before field is null as this is a create operation.

Filtering Schema and Tables

  • schema.whitelist and schema.blacklist configuration properties can be used to choose the schemas to be subscribed to.
  • table.whitelist and table.blacklist configuration properties can be used to choose the tables to be subscribed to. The table list should be a comma-separated list in <schema-name>.<table-name> format.
  • column.whitelist and column.blacklist configuration properties can be used to choose a subset of columns. An example combining these filters is shown after this list.
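For illustration, such filters would be added to the connector's config object; the table and column names below are hypothetical and only show the expected formats:

"schema.whitelist": "public",
"table.whitelist": "public.address,public.users",
"column.blacklist": "public.users.password"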

Replica Identity

Replica Identity is a table-level PostgreSQL setting that determines the amount of information written to the WAL for update and delete events. In the UPDATE and DELETE events there is a payload.before field which contains the previous version of the row and some of its details. For example, in a delete event:

{
   "payload": {
      "before": {
         "id": 5,
         "flat": null,
         "postal_code": null,
         "street": null,
         "title": null,
         "city_id": null,
         "user_id": null
      },
      "after": null,
      "op": "d",
      "ts_ms": 1601157654097,
      "transaction": null
   }
}

after is null as the row is removed, and before contains only the primary key of the object. The level of detail of this object depends on the REPLICA IDENTITY of the table, which has the following options:

  • DEFAULT: contains only primary keys
  • FULL: contains all of the columns
  • NOTHING: contains no information

We can set this config at the source table with the following command:

ALTER TABLE table_name REPLICA IDENTITY FULL;

After this change on the address table, if I update a column I get the following message, with the payload containing the full details of the before object:

{
   "payload": {
      "before": {
         "id": 4,
         "flat": "10/1",
         "postal_code": "KLM123",
         "street": "Street3",
         "title": "office address",
         "city_id": 8,
         "user_id": 3
      },
      "after": {
         "id": 4,
         "flat": "10/1",
         "postal_code": "KLM02",
         "street": "St 15",
         "title": "office alternative address",
         "city_id": 8,
         "user_id": 3
      },
      "op": "u",
      "ts_ms": 1601212592317,
      "transaction": null
   }
}

Final words

In the Kafka JDBC Connector post, a high-level implementation of copying data from a relational database to Kafka is discussed. The JDBC connector uses SQL queries to retrieve data from the database, so it creates some load on the server. It is also not possible to capture DELETED rows with that solution.

Although it is easy to start and set up JDBC connectors, they have pitfalls. Transaction-log-based Change Data Capture pipelines are a better way to stream every single event from the database to Kafka. CDC pipelines are more complex to set up at first than the JDBC connector; however, as they interact directly with the low-level transaction log, they are far more efficient and put very little additional load on the database.

One drawback of this approach is that it is not possible to get schema changes as events. If the schema changes in the source database, the destination client should be adjusted manually.

References

]]>
<![CDATA[Kafka Connect JDBC Source Connector]]>https://turkogluc.com/kafka-connect-jdbc-source-connector/646f24c59b6311000195da5bTue, 22 Sep 2020 15:00:40 GMT

Getting data from a database to Apache Kafka is certainly one of the most popular use cases of Kafka Connect. Kafka Connect provides a scalable and reliable way to move data in and out of Kafka. As it uses plugins for specific connectors and is run by configuration only (without writing code), it is an easy integration point.

Visit the Kafka Connect Basics post if you would like to get an introduction.

1- Running Kafka Cluster

We can use the following docker-compose file to get a Kafka cluster with a single broker up and running.

version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:latest
    hostname: kafka
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

If you would like to use a user interface rather than console tools to manage Kafka, Confluent Control Center is one of the best choices. It is a commercial tool but comes with a 30-day licence. There is also Landoop UI, which has a Kafka Connect management interface as well. If you would like to use Confluent Control Center, you can add it as a service to the docker-compose file as follows:

control-center:
    image: confluentinc/cp-enterprise-control-center:5.5.1
    hostname: control-center
    container_name: control-center
    depends_on:
      - zookeeper
      - kafka
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'kafka:9092'
      CONTROL_CENTER_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
      CONFLUENT_METRICS_TOPIC_REPLICATION: 1
      PORT: 9021

2- Preparing Connector Library

Download the Kafka Connect JDBC plugin from Confluent Hub and extract the zip file to Kafka Connect's plugin path. When we start Kafka Connect we can specify a plugin path that will be used to access the plugin libraries, for example plugin.path=/usr/local/share/kafka/plugins. Check the Install Connector Manually documentation for details.

We also need a JDBC 4.0 driver, as it will be used by the connector to communicate with the database. PostgreSQL and SQLite drivers are already shipped with the JDBC connector plugin. If you would like to connect to another database system, add its driver to the same folder as the kafka-connect-jdbc jar file. See Installing JDBC Driver Manual.

Kafka Connect JDBC Source Connector

3- Running Kafka Connect

We can run Kafka Connect with the connect-distributed.sh script that is located inside the Kafka bin directory. We need to provide a properties file to this script to configure the worker properties.

connect-distributed.sh <worker properties file>

We can create a connect-distributed.properties file to specify the worker properties as follows:

# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers=localhost:29092

# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster

# The converters specify the format of data in Kafka and how to translate it into Connect data.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
config.storage.topic=connect-configs
config.storage.replication.factor=1

# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
status.storage.topic=connect-status
status.storage.replication.factor=1

# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

plugin.path=/Users/cemalturkoglu/kafka/plugins

Note that plugin.path is the path where we need to place the library that we downloaded.

After running Kafka Connect we can confirm that the worker's REST endpoint is accessible, and we can confirm that the JDBC connector is in the plugin list by calling http://localhost:8083/connector-plugins:

[{"class":"io.confluent.connect.jdbc.JdbcSinkConnector","type":"sink","version":"5.5.1"},{"class":"io.confluent.connect.jdbc.JdbcSourceConnector","type":"source","version":"5.5.1"},{"class":"org.apache.kafka.connect.file.FileStreamSinkConnector","type":"sink","version":"2.6.0"},{"class":"org.apache.kafka.connect.file.FileStreamSourceConnector","type":"source","version":"2.6.0"},{"class":"org.apache.kafka.connect.mirror.MirrorCheckpointConnector","type":"source","version":"1"},{"class":"org.apache.kafka.connect.mirror.MirrorHeartbeatConnector","type":"source","version":"1"},{"class":"org.apache.kafka.connect.mirror.MirrorSourceConnector","type":"source","version":"1"}]

4. Starting the JDBC Connector

As we operate in distributed mode, we run the connectors by calling REST endpoints with a configuration JSON. We can provide the configuration payload from a file to the curl command. The following command starts the connector:

curl -d @"jdbc-source.json" \
-H "Content-Type: application/json" \
-X POST http://localhost:8083/connectors

The configuration for the connector, stored in the jdbc-source.json file, can be as follows:

{
    "name": "jdbc_source_connector_postgresql_01",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/demo",
        "connection.user": "postgres",
        "connection.password": "root",
        "topic.prefix": "postgres-01-",
        "poll.interval.ms" : 3600000,
        "mode":"bulk"
    }
}
  • The connector connects to the database using the JDBC URL and connection credentials.
  • It will create a Kafka topic per table. Topics are named as topic.prefix + <table_name>.
  • The data is retrieved from the database at the interval specified by the poll.interval.ms config.
  • The mode configuration specifies the working mode, which is discussed below. Bulk mode is used to load all the data.

We can see that the 4 tables of my demo database are loaded into 4 Kafka topics:

Kafka Connect JDBC Source Connector

And each row in the tables is loaded as a message.

Kafka Connect JDBC Source Connector

The message contains the following fields:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int64",
        "optional": false,
        "field": "id"
      },
      {
        "type": "string",
        "optional": true,
        "field": "flat"
      },
      {
        "type": "string",
        "optional": true,
        "field": "postal_code"
      },
      {
        "type": "string",
        "optional": true,
        "field": "street"
      },
      {
        "type": "string",
        "optional": true,
        "field": "title"
      },
      {
        "type": "int64",
        "optional": true,
        "field": "city_id"
      },
      {
        "type": "int64",
        "optional": true,
        "field": "user_id"
      }
    ],
    "optional": false,
    "name": "address"
  },
  "payload": {
    "id": 3,
    "flat": "3/1B",
    "postal_code": "ABX501",
    "street": "Street2",
    "title": "work address",
    "city_id": 7,
    "user_id": 3
  }
}

Note that it contains the schema attribute with information about the fields, and the payload with the actual data.

Selecting Schema and Tables To Copy

We can use catalog.pattern or schema.pattern to filter the schemas to be copied.
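As an illustration, the schema filter is just another property in the connector configuration; the value below is a hypothetical example that copies only the public schema:

"schema.pattern": "public"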

By default all tables are queried to be copied. However, we can include or exclude tables from copying with the table.whitelist and table.blacklist configurations. We can use either the whitelist or the blacklist, but not both at the same time.

table.whitelist:"Users,Address,City"

table.blacklist:"Groups"

Query Modes

There are alternative incremental query modes to the bulk mode used in the demonstration above. Incremental modes can be used to load the data only if there is a change. Certain columns are used to detect whether there is a change in the table or row.

bulk: In this mode the connector loads all the selected tables in each iteration. If the iteration interval is set to a small number (5 seconds by default), it won't make much sense to load all the data as there will be duplicates. It can be useful for a periodical backup or for dumping the entire database.

incrementing: This mode uses a single column that is unique for each row, ideally an auto-incremented primary key, to detect changes in the table. If a new row with a new ID is added, it will be copied to Kafka. However, this mode lacks the capability of catching update operations on a row, as an update does not change the ID. incrementing.column.name is used to configure the column name.

timestamp: Uses a single column that shows the last modification timestamp, and in each iteration queries only for rows that have been modified since that time. As the timestamp is not a unique field, it can miss some updates that have the same timestamp. timestamp.column.name is used to configure the column name.

timestamp+incrementing: The most robust and accurate mode, using both a unique incrementing ID and a timestamp. The only drawback is that a modification timestamp column needs to be added to legacy tables.

query: The connector supports using custom queries to fetch data in each iteration. It is not very flexible in terms of incremental changes. It can be useful to fetch only necessary columns from a very wide table, or to fetch a view containing multiple joined tables. If the query gets complex, the load and the performance impact on the database increases.

Incremental Querying with Timestamp

Using only a unique ID or only a timestamp has pitfalls, as mentioned above. It is a better approach to use them together. The following configuration shows an example of the timestamp+incrementing mode:

{
    "name": "jdbc_source_connector_postgresql_02",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/demo-db",
        "connection.user": "postgres",
        "connection.password": "root",
        "topic.prefix": "postgres-02-",
        "table.whitelist": "store,tag,category,address,city",
        "mode":"timestamp+incrementing",
        "timestamp.column.name": "last_modified_date",
        "validate.non.null": false,
        "db.timezone": "Europe/Warsaw"
    }
}

Note that validate.non.null is used because the connector requires the timestamp column to be NOT NULL. We can either set these columns NOT NULL or disable this validation by setting validate.non.null to false.
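If the source tables do not have such a column yet, one way to add it (a sketch, not part of the original setup, using PostgreSQL 11+ trigger syntax and the store table as an example) is:

-- Add the modification timestamp column used by the connector
ALTER TABLE store ADD COLUMN last_modified_date TIMESTAMP NOT NULL DEFAULT now();

-- Keep it updated on every UPDATE
CREATE OR REPLACE FUNCTION set_last_modified() RETURNS trigger AS $$
BEGIN
  NEW.last_modified_date = now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER store_set_last_modified
BEFORE UPDATE ON store
FOR EACH ROW EXECUTE FUNCTION set_last_modified();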

When using the timestamp column, the timezone of the database system matters. There might be different behaviour because of time mismatches, so it can be configured with db.timezone.

The table.whitelist configuration is used to limit the tables to the given list, so these 5 tables are copied to Kafka topics.

Kafka Connect JDBC Source Connector

As mentioned above, using the incrementing mode without a timestamp means that UPDATE operations on the table are not captured. With the timestamp+incrementing mode, update operations are captured as well.

Kafka Connect JDBC Source Connector

Final words

The JDBC connector is a great way to start shipping data from relational databases to Kafka. It is easy to set up and use; only a few properties need to be configured to get your data streamed out. However, the JDBC connector has some drawbacks as well:

  • It needs to constantly run queries, so it generates some load on the physical database. To avoid performance impacts, the queries should be kept simple and polling should not be too aggressive.
  • As an incremental timestamp column is usually needed, working on a legacy datastore requires extra work to add columns, and there can be cases where it is not possible to update the schema.
  • The JDBC connector cannot capture DELETE operations, as it uses SELECT queries to retrieve data and there is no sophisticated mechanism to detect deleted rows. You would need to implement your own solution to overcome this problem.

References

]]>
<![CDATA[Spring Data JPA Auditing]]>https://turkogluc.com/spring-data-jpa-auditing/646f24c59b6311000195da5aSun, 20 Sep 2020 13:54:53 GMT

Spring Data provides great support for keeping track of persistence-layer changes. By using auditing, we can store or log information about changes to an entity, such as who created or changed the entity and when the change was made.

We can use annotations like @CreatedBy, @CreatedDate, @LastModifiedBy and @LastModifiedDate on entity fields to instruct Spring Data JPA to fill these fields transparently. We can use the annotations as follows:

@Entity
public class Category {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String title;

    @CreatedBy
    private Long createdBy;

    @CreatedDate
    private LocalDateTime createdDate;

    @LastModifiedBy
    private Long lastModifiedBy;

    @LastModifiedDate
    private LocalDateTime lastModifiedDate;

    // getters and setter..
}

The auditing feature is generally needed in most of the entities, so it is a better approach to create an abstract Auditable class that contains the auditing fields and to extend it in the entities that need auditing. This way we avoid duplicating the same fields in all entities.

Creating Abstract Auditable Class

We can create an Abstract class to contain the audit related fields as follows:

@MappedSuperclass
@EntityListeners(AuditingEntityListener.class)
public abstract class Auditable {

    @CreatedBy
    @Column(columnDefinition = "bigint default 1", updatable = false)
    protected Long createdBy;

    @CreatedDate
    @Column(columnDefinition = "timestamp default '2020-04-10 20:47:05.967394'", updatable = false)
    protected LocalDateTime createdDate;

    @LastModifiedBy
    @Column(columnDefinition = "bigint default 1")
    protected Long lastModifiedBy;

    @LastModifiedDate
    @Column(columnDefinition = "timestamp default '2020-04-10 20:47:05.967394'")
    protected LocalDateTime lastModifiedDate;

    public Long getCreatedBy() {
        return createdBy;
    }

    public void setCreatedBy(Long createdBy) {
        this.createdBy = createdBy;
    }

    public LocalDateTime getCreatedDate() {
        return createdDate;
    }

    public void setCreatedDate(LocalDateTime createdDate) {
        this.createdDate = createdDate;
    }

    public Long getLastModifiedBy() {
        return lastModifiedBy;
    }

    public void setLastModifiedBy(Long lastModifiedBy) {
        this.lastModifiedBy = lastModifiedBy;
    }

    public LocalDateTime getLastModifiedDate() {
        return lastModifiedDate;
    }

    public void setLastModifiedDate(LocalDateTime lastModifiedDate) {
        this.lastModifiedDate = lastModifiedDate;
    }
}

The @MappedSuperclass annotation is used to specify that the class itself is not an entity, but its attributes can be mapped in the same way as an entity's; however, these mappings apply only to its subclasses. So each class that inherits the abstract Auditable class will contain these attributes.

The @EntityListeners annotation is used to configure the AuditingEntityListener, which contains @PrePersist and @PreUpdate callbacks in order to capture the auditing information.

Enable Auditing Feature

In order to enable the auditing feature in Spring we need to use the @EnableJpaAuditing annotation.

@SpringBootApplication
@EnableJpaAuditing
public class BackendApplication {
	public static void main(String[] args) {
		SpringApplication.run(BackendApplication.class, args);
	}
}

Provide Auditor

The createdDate and lastModifiedDate fields are filled with the current time. However, the createdBy and lastModifiedBy fields need a way to obtain the user who is performing the action. In order to provide this information, we need to implement the AuditorAware interface.

@Component
public class AuditAwareImpl implements AuditorAware <Long> {

    @Override
    public Optional <Long> getCurrentAuditor() {
        ApplicationUser principal = (ApplicationUser) SecurityContextHolder.getContext().getAuthentication().getPrincipal();
        return Optional.of(principal.getId());
    }
}

We provide the implementation of the getCurrentAuditor method because it is invoked to retrieve the user who is performing the operation.

Extend the Auditable Class in Entities

Now we can extend the Auditable class in the entities for which we want auditing. For example:

@Entity
@Data
public class Category extends Auditable {

    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Id
    private Long id;
    private String title;

    // ..
}

And the auditing fields will be filled automatically:

Spring Data JPA Auditing
]]>
<![CDATA[Understanding the effective data fetching with JPA Entity Graphs (Part-2)]]>https://turkogluc.com/understanding-the-effective-data-fetching-with-jpa-entity-graphs-part-2/646f24c59b6311000195da58Fri, 18 Sep 2020 19:30:14 GMT

Part-2: The Solution

In the previous post I tried to demonstrate JPA fetching strategies. There are a number of problems with using FetchType.LAZY or FetchType.EAGER as the only fetch-plan configuration for an entity:

  • It causes the N+1 problem, which runs many unnecessary queries and affects the data access layer performance very badly.
  • It does not give the flexibility to choose different fetch strategies for different scenarios, so it is limiting.
  • It is very likely to put you in trouble with the JPA lazy initialisation exception, which is caused by accessing, from outside the transaction context, proxy objects that were not fetched.

What is needed to improve this behaviour is using SQL JOINs when we intend to retrieve the referenced objects. One solution could be writing your own queries for more predictable and better-performing data access.

@Query(
    value = "SELECT a FROM Address AS a " +
            "JOIN FETCH a.user user " +
            "JOIN FETCH a.city city "
)
List<Address> findAllAddresses();

Although it is good enough to perform join queries when necessary, in a large-scale enterprise project the number of such methods will most probably grow a lot, and many almost-duplicate SQL queries may be needed. For such cases Entity Graphs seem to be a better fit.

Entity Graphs

Entity graphs were introduced in JPA 2.1 and allow partial or specified fetching of objects. When an entity has references to other entities, we can specify a fetch plan with entity graphs in order to determine which fields or properties should be fetched together. We can describe a fetch plan, with its paths and boundaries, using the @NamedEntityGraph annotation on the entity class.

@NamedEntityGraph(
        name = "address-city-user-graph",
        attributeNodes = {
                @NamedAttributeNode("city"),
                @NamedAttributeNode("user")
        }
)
@Entity
public class Address {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String title;
    private String street;
    private String flat;
    private String postalCode;

    @OneToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "city_id")
    private City city;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id")
    private User user;

    // getters and setters

}

I switched the fetch type to lazy and defined an entity graph named address-city-user-graph. The entity graph contains the city and user attributes, so those objects will be retrieved together with the address. For example, when I want to retrieve only the address data, I can use the findAll method and the child objects will not be queried because of the lazy fetch strategy. However, if I would like to retrieve the user and city details as well, I can use the address-city-user-graph entity graph that I defined. To apply a defined NamedEntityGraph to queries, we use the @EntityGraph annotation on the repository methods.

@Repository
public interface AddressRepository extends JpaRepository<Address, Long> {

    @EntityGraph("address-city-user-graph")
    List <Address> findByUserId(Long userId);

    @EntityGraph("address-city-user-graph")
    List<Address> findByCityId(Long cityId);
}

The @EntityGraph annotation takes the name of a NamedEntityGraph, which is just a user-defined String identifier. So we can define the fetch plan once as an entity graph and reuse it multiple times, whereas with hand-written queries I would have to duplicate the JOIN statements. Now when I call the findByUserId method it runs a single query with JOINs:

-- hibernate
    select
        address0_.id as id1_0_0_,
        user1_.id as id1_2_1_,
        city2_.id as id1_1_2_,
        address0_.city_id as city_id6_0_0_,
        address0_.flat as flat2_0_0_,
        address0_.postal_code as postal_c3_0_0_,
        address0_.street as street4_0_0_,
        address0_.title as title5_0_0_,
        address0_.user_id as user_id7_0_0_,
        user1_.email as email2_2_1_,
        user1_.name as name3_2_1_,
        user1_.password as password4_2_1_,
        user1_.phone as phone5_2_1_,
        city2_.name as name2_1_2_
    from
        address address0_
    left outer join
        users user1_
            on address0_.user_id=user1_.id
    left outer join
        city city2_
            on address0_.city_id=city2_.id
    where
        user1_.id=?
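
As a side note, when a named graph feels like overkill, Spring Data JPA also accepts an ad-hoc graph declared directly on the repository method through the attributePaths element of @EntityGraph; a minimal sketch:

@Repository
public interface AddressRepository extends JpaRepository<Address, Long> {

    // Ad-hoc graph: city and user are fetched together with the address for this method only
    @EntityGraph(attributePaths = {"city", "user"})
    List<Address> findByUserId(Long userId);
}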

We can define multiple entity graphs on an entity with the @NamedEntityGraphs annotation:

@NamedEntityGraphs({
        @NamedEntityGraph(
                name = Address.WITH_USER_GRAPH,
                attributeNodes = {
                        @NamedAttributeNode("user")
                }
        ),
        @NamedEntityGraph(
                name = Address.WITH_CITY_GRAPH,
                attributeNodes = {
                        @NamedAttributeNode("city"),
                }
        ),
        @NamedEntityGraph(
                name = Address.WITH_USER_AND_CITY_GRAPH,
                attributeNodes = {
                        @NamedAttributeNode("user"),
                        @NamedAttributeNode("city")
                }
        )
})
@Entity
public class Address {

    public static final String WITH_USER_GRAPH = "address-with-user-graph";
    public static final String WITH_CITY_GRAPH = "address-with-city-graph";
    public static final String WITH_USER_AND_CITY_GRAPH = "address-with-user-and-city-graph";

    // fields..

}

And we can assign each entity graph to repository methods as we'd like:

@Repository
public interface AddressRepository extends JpaRepository<Address, Long> {

    @EntityGraph(Address.WITH_USER_AND_CITY_GRAPH)
    List<Address> findAll();

    @EntityGraph(Address.WITH_USER_GRAPH)
    List <Address> findByUserId(Long userId);

    @EntityGraph(Address.WITH_CITY_GRAPH)
    List<Address> findByCityId(Long cityId);

    Optional<Address> findById(Long id);

}

In the example there is no entity graph assigned to the findById method, so it uses the FetchType defined on the entity, which is lazy; it will only retrieve the fields of the address but not its children. If I run the findByCityId method, thanks to its entity graph the city of the address is fetched in the same query as well.

Hibernate:
    select
        address0_.id as id1_0_0_,
        city1_.id as id1_1_1_,
        address0_.city_id as city_id6_0_0_,
        address0_.flat as flat2_0_0_,
        address0_.postal_code as postal_c3_0_0_,
        address0_.street as street4_0_0_,
        address0_.title as title5_0_0_,
        address0_.user_id as user_id7_0_0_,
        city1_.name as name2_1_1_
    from
        address address0_
    left outer join
        city city1_
            on address0_.city_id=city1_.id
    where
        city1_.id=?

Types of Entity Graph

The @EntityGraph annotation takes a type parameter with 2 possible values:

  • FETCH: It is the default graph type. When it is selected the attributes that are specified by attribute nodes of the entity graph are treated as FetchType.EAGER and attributes that are not specified are treated as FetchType.LAZY.
  • LOAD: When this type is selected the attributes that are specified by attribute nodes of the entity graph are treated as FetchType.EAGER and attributes that are not specified are treated according to their specified or default FetchType.

Selecting LOAD type example:

@EntityGraph(value = Address.WITH_USER_AND_CITY_GRAPH, type = EntityGraph.EntityGraphType.LOAD)
List<Address> findAll();
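
Named entity graphs are plain JPA, so they can also be applied outside of Spring Data repositories by passing them as query hints to the EntityManager. The hint names javax.persistence.fetchgraph and javax.persistence.loadgraph correspond to the FETCH and LOAD semantics described above; a minimal sketch assuming an injected EntityManager:

EntityGraph<?> graph = entityManager.getEntityGraph(Address.WITH_USER_AND_CITY_GRAPH);

List<Address> addresses = entityManager
        .createQuery("SELECT a FROM Address a", Address.class)
        .setHint("javax.persistence.fetchgraph", graph) // use "javax.persistence.loadgraph" for LOAD semantics
        .getResultList();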

Subgraphs

Subgraphs can be defined in a NamedEntityGraph in order to specify the fetch plan of the fields that belong to a child entity.

In the previous post the User entity did not have a reference to the Address relation, so the relation was mapped with a unidirectional @ManyToOne on the Address entity. To demonstrate subgraphs I will make this relation bidirectional by adding a @OneToMany to the User entity, and I will also create another entity named Group that will be the parent entity of User.

@Entity
@Table(name = "users")
public class User {

	// ..

	@OneToMany(fetch = FetchType.LAZY, mappedBy = "user")
    private Set <Address> addressList = new HashSet <>();

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "group_id")
    private Group group;

}

And the group entity:

@Entity
public class Group {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;

    @OneToMany(fetch = FetchType.LAZY, mappedBy = "group")
    private Set <User> userList = new HashSet <>();

    // getter and setters..

}

Now I will define an entity graph in order to retrieve a group together with its users, the users' addresses, and the city of each address. In other words, I would like to fetch the whole tree from parent to child in 1 query.

[Image: Understanding the effective data fetching with JPA Entity Graphs (Part-2)]

The entity graph references subgraphs, and subgraphs are defined with the @NamedSubgraph annotation. This annotation also takes @NamedAttributeNode entries to specify the attributes to be fetched.

@NamedEntityGraphs(
        @NamedEntityGraph(
                name = Group.WITH_USER_AND_SUB_GRAPH,
                attributeNodes = {
                        @NamedAttributeNode( value = "userList", subgraph = "userSubGraph") },
                subgraphs = {
                        @NamedSubgraph(
                                name = "userSubGraph",
                                attributeNodes = { @NamedAttributeNode( value = "addressList", subgraph = "addressSubGraph") }
                        ),
                        @NamedSubgraph(
                                name = "addressSubGraph",
                                attributeNodes = {
                                        @NamedAttributeNode("city")
                                }
                        )
                }
        )
)
@Entity
@Table(name = "groups")
public class Group {

	// ..
}
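
The graph is attached to a repository method just like before; a hypothetical GroupRepository could look as follows:

@Repository
public interface GroupRepository extends JpaRepository<Group, Long> {

    // Fetches groups together with their users, addresses and cities in one query
    @EntityGraph(Group.WITH_USER_AND_SUB_GRAPH)
    List<Group> findAll();
}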

So the whole tree is retrieved in one query by joining the tables:

Hibernate:
    select
        group0_.id as id1_2_0_,
        userlist1_.id as id1_3_1_,
        addresslis2_.id as id1_0_2_,
        city3_.id as id1_1_3_,
        group0_.name as name2_2_0_,
        userlist1_.email as email2_3_1_,
        userlist1_.group_id as group_id6_3_1_,
        userlist1_.name as name3_3_1_,
        userlist1_.password as password4_3_1_,
        userlist1_.phone as phone5_3_1_,
        userlist1_.group_id as group_id6_3_0__,
        userlist1_.id as id1_3_0__,
        addresslis2_.city_id as city_id6_0_2_,
        addresslis2_.flat as flat2_0_2_,
        addresslis2_.postal_code as postal_c3_0_2_,
        addresslis2_.street as street4_0_2_,
        addresslis2_.title as title5_0_2_,
        addresslis2_.user_id as user_id7_0_2_,
        addresslis2_.user_id as user_id7_0_1__,
        addresslis2_.id as id1_0_1__,
        city3_.name as name2_1_3_
    from
        groups group0_
    left outer join
        users userlist1_
            on group0_.id=userlist1_.group_id
    left outer join
        address addresslis2_
            on userlist1_.id=addresslis2_.user_id
    left outer join
        city city3_
            on addresslis2_.city_id=city3_.id

So I have the object with its references loaded:

[Image: Understanding the effective data fetching with JPA Entity Graphs (Part-2)]
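
Because everything was fetched in that single query, the whole tree can be traversed without triggering further selects or a LazyInitializationException; a small illustrative sketch using the hypothetical GroupRepository above:

List<Group> groups = groupRepository.findAll();
for (Group group : groups) {
    for (User user : group.getUserList()) {
        for (Address address : user.getAddressList()) {
            // No additional queries: the associations are already initialised
            System.out.println(group.getName() + " / " + user.getName() + " / " + address.getCity().getName());
        }
    }
}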
]]>
<![CDATA[Understanding the effective data fetching with JPA Entity Graphs (Part-1)]]>https://turkogluc.com/understanding-jpa-entity-graphs/646f24c59b6311000195da57Fri, 18 Sep 2020 15:38:33 GMTPart-1: The problemUnderstanding the effective data fetching with JPA Entity Graphs (Part-1)

JPA provides 2 types of fetching strategies for entities that have relationships with each other (such as @OneToOne, @OneToMany):

  • FetchType.LAZY
  • FetchType.EAGER

This configuration alone is criticised for being statically declared and applied to every fetch call. I would like to explain what exactly that means with an example, using the following entities:

[Image: Entity-Relation Diagram]

City Entity:

@Entity
public class City {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;

    // getters and setters

}

User Entity:

@Entity
@Table(name = "users")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;
    private String email;
    private String password;
    private String phone;

    // getters and setters
}

Address entity

@Entity
public class Address {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String title;
    private String street;
    private String flat;
    private String postalCode;

    @OneToOne
    @JoinColumn(name = "city_id")
    private City city;

    @ManyToOne
    @JoinColumn(name = "user_id")
    private User user;

    // getters and setters
}

I have inserted the following entities into the database:

[Image: Understanding the effective data fetching with JPA Entity Graphs (Part-1)]

We should keep in mind that Address is the owning side of both relations, because it holds the foreign keys; it is also called the child entity.

Now if I invoke the find-all method of the address repository, addressRepository.findAll(), the following queries are executed:

-- Hibernate:
    select
        address0_.id as id1_0_,
        address0_.city_id as city_id6_0_,
        address0_.flat as flat2_0_,
        address0_.postal_code as postal_c3_0_,
        address0_.street as street4_0_,
        address0_.title as title5_0_,
        address0_.user_id as user_id7_0_
    from
        address address0_
-- Hibernate:
    select
        city0_.id as id1_1_0_,
        city0_.name as name2_1_0_
    from
        city city0_
    where
        city0_.id=?
-- Hibernate:
    select
        user0_.id as id1_2_0_,
        user0_.email as email2_2_0_,
        user0_.name as name3_2_0_,
        user0_.password as password4_2_0_,
        user0_.phone as phone5_2_0_
    from
        users user0_
    where
        user0_.id=?
-- Hibernate:
    select
        user0_.id as id1_2_0_,
        user0_.email as email2_2_0_,
        user0_.name as name3_2_0_,
        user0_.password as password4_2_0_,
        user0_.phone as phone5_2_0_
    from
        users user0_
    where
        user0_.id=?
-- Hibernate:
    select
        city0_.id as id1_1_0_,
        city0_.name as name2_1_0_
    from
        city city0_
    where
        city0_.id=?

So what happens here, step by step, is as follows:

  • find all addresses -> returns A1, A2, A3
  • For A1, find the referenced city -> returns C1
  • For A1, find the referenced user -> returns U1
  • For A2, the referenced city is already stored in the persistence context, so it is not retrieved again; find the referenced user -> returns U2
  • For A3, the referenced user is already in the persistence context, so it is not retrieved again; find the referenced city -> returns C2

This behaviour is thanks to the first-level caching mechanism. Once an entity is in the managed state, the EntityManager keeps it in the persistence context, so it is not retrieved from the database again.

But why are there so many calls? The problem resides in the fetching strategy. The child-side annotations (@OneToOne, @ManyToOne) are configured to use FetchType.EAGER by default. So when an address is retrieved from the database, JPA immediately fetches its parents as well.

If we had a very large number of rows in each table, the list of separate calls would be incredibly long as well, which is obviously very bad in terms of performance. The eager fetching strategy causes this issue, known as the N+1 problem: we invoke 1 select query and it generates N additional separate calls for the referenced entities. That's why it is recommended to use the lazy fetching mechanism. So if I update my Address entity to use lazy fetching:

@OneToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "city_id")
private City city;

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "user_id")
private User user;

and invoke the find-all method again, I can see that Hibernate logs only 1 select query:

-- Hibernate:
    select
        address0_.id as id1_0_,
        address0_.city_id as city_id6_0_,
        address0_.flat as flat2_0_,
        address0_.postal_code as postal_c3_0_,
        address0_.street as street4_0_,
        address0_.title as title5_0_,
        address0_.user_id as user_id7_0_
    from
        address address0_

The retrieved address objects contain proxies (not the actual references) to the parent objects. If I access the city or user of a retrieved address within the transaction, it is fetched from the database at that moment. And if I do this in a loop over the addresses, the N+1 problem appears again, as shown in the sketch below.
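
A minimal sketch of that situation, assuming a transactional service method with the AddressRepository injected:

@Transactional
public void printAddressOwners() {
    List<Address> addresses = addressRepository.findAll(); // 1 query

    for (Address address : addresses) {
        // Each access to an uninitialised proxy triggers an extra SELECT: N more queries
        System.out.println(address.getUser().getName() + " lives in " + address.getCity().getName());
    }
}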

Another problem is that sometimes I would like to retrieve the addresses lazily, but at other times together with their user and/or city information. Setting the fetch type on the entity affects all retrieval calls and does not give us that flexibility. That's why we can make use of entity graphs to define multiple fetch plans for different purposes.

Possible solutions, and entity graphs in particular, are discussed in Part-2, keep reading.

]]>
<![CDATA[Kafka Connect Basics]]>https://turkogluc.com/apache-kafka-connect-introduction/646f24c59b6311000195da56Mon, 14 Sep 2020 12:53:17 GMTKafka Connect BasicsKafka Connect Basics

Kafka Connect is an open source Apache Kafka component that helps to move data IN or OUT of Kafka easily. It provides a scalable, reliable, and simple way to move data between Kafka and other data systems. According to the direction of the data movement, a connector is classified as:

  • Source Connector: Reads data from a datasource and writes to Kafka topic.
  • Sink Connector: Reads data from Kafka topic and writes to a datasource.

Kafka Connect uses connector plugins, which are community-developed libraries that cover the most common data movement cases. Developers mostly need to move data between the same well-known data systems, such as PostgreSQL, MySQL, Cassandra, MongoDB, Redis, JDBC, FTP, MQTT, Couchbase, REST APIs, S3 and ElasticSearch, and the connector plugins provide standardised implementations for moving data to and from those datastores. You can find all available Kafka connectors on Confluent Hub.

So what Kafka Connect provides is that, rather than writing our own consumer or producer code, we can use a connector that takes care of all the implementation details such as fault tolerance, delivery semantics and ordering, and simply get the data moved. For example, we can move all of the data from a Postgres database to Kafka, and from Kafka to ElasticSearch, without writing any code. It makes it easy even for less experienced developers to get data in or out of Kafka reliably.

Concepts

Connector plugins implement the connector API, which consists of connectors and tasks.

  • Connector: a job that manages and coordinates the tasks. It decides how to split the data-copying work between the tasks.
  • Task: a piece of work that does the actual data copying.

Connectors divide the actual job into smaller pieces, the tasks, in order to provide scalability and fault tolerance. The state of the tasks is stored in special Kafka topics configured with offset.storage.topic, config.storage.topic and status.storage.topic. Since a task does not keep its state itself, it can be started, stopped and restarted at any time and on any node.

For example, the JDBC connector is used to copy data from databases, and it creates a task per table in the database.

  • Worker: the node (process) that runs the connectors and their tasks.

Kafka Connect workers run in one of 2 modes:

  • Standalone mode: All work is performed in a single worker as a single process. It is easier to set up and configure and can be useful where a single worker makes sense. However, it does not provide fault tolerance or scalability.
  • Distributed mode: Multiple workers form a cluster and connectors are configured through a REST API. It provides scalability and fault tolerance: when one worker dies, its tasks are redistributed among the other workers by the rebalance mechanism.

Running Kafka Connect

Kafka Connect ships with the Apache Kafka binaries, so there is no need to install it separately; we only need to download the Kafka binaries to run it. The executables are in the bin directory and the configuration files are in the config directory.

[Image: Kafka Connect Basics]

I would personally recommend starting to practise with distributed mode, as it gets unnecessarily confusing to work with standalone mode first and switch to distributed mode afterwards. Distributed mode is also the recommended choice for production, and if we don't want a cluster we can simply run a single worker in distributed mode.

1 - Running Kafka Cluster

Let's start by getting a Kafka cluster up and running. We can set up a cluster with one ZooKeeper and one broker in a Docker environment using the following docker-compose file.

version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:latest
    hostname: kafka
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

Run the docker-compose up -d command to start the containers. One thing to pay attention to here is that KAFKA_ADVERTISED_LISTENERS is set to localhost:29092 for access from outside of the Docker network, and to kafka:9092 for access from inside the Docker network. So from our host machine we can reach the Kafka instance at localhost:29092.
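
To quickly check that the broker is reachable from the host machine, the CLI tools shipped with the Kafka binaries can be used, for example:

kafka-topics.sh --bootstrap-server localhost:29092 --list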

2 - Running Kafka Connect

We can run Kafka Connect with the connect-distributed.sh script that is located inside the Kafka bin directory. We need to provide a properties file to this script in order to configure the worker.

connect-distributed.sh <worker properties file>

We can create a connect-distributed.properties file to specify the worker properties as follows:

# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers=localhost:29092

# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster

# The converters specify the format of data in Kafka and how to translate it into Connect data.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
config.storage.topic=connect-configs
config.storage.replication.factor=1

# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
status.storage.topic=connect-status
status.storage.replication.factor=1

# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

group.id is one of the most important configurations in this file. Worker clusters are formed according to the group id, so if we start multiple workers with the same group id they will join the same Connect cluster.

The offset.storage.topic, config.storage.topic and status.storage.topic configurations are also needed so that connector offsets, configurations and statuses are stored in Kafka topics, and new or restarted workers can pick them up accordingly.

Now we can start Kafka connect with the following command:

connect-distributed.sh /path-to-config/connect-distributed.properties

3 - Starting Connector

Now we have ZooKeeper, a Kafka broker, and Kafka Connect running in distributed mode. As mentioned before, in distributed mode connectors are managed via the REST API. Our Kafka Connect worker exposes its REST API at http://localhost:8083/.
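
Before creating any connector, it is worth checking that the worker is up and seeing which connector plugins are installed, for example:

# Worker and Kafka Connect version information
curl http://localhost:8083/

# List of installed connector plugins
curl http://localhost:8083/connector-plugins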

As an example, we can run a FileStreamSource connector that copies data from a file to a Kafka topic. To start a connector we need to send a POST request to the http://localhost:8083/connectors endpoint with the configuration of the connector that we want to run. An example configuration looks as follows:

{
  "name": "local-file-source",
  "config": {
    "connector.class": "FileStreamSource",
    "tasks.max": 1,
    "file": "/Users/cemalturkoglu/kafka/shared-folder/file.txt",
    "topic": "file.content"
  }
}

Every connector has its own specific configuration options, which can be found on the connector's Confluent Hub page.

  • connector.class specifies the connector plugin that we want to use.
  • file is a specific config for the FileStreamSource plugin, and it is used to point to the file to be read.
  • topic is the name of the Kafka topic that the data read from the file will be written to.

We need to send this JSON config in the request body of the REST call. With curl we can read the config from a file as follows:

curl -d @"connect-file-source.json" \
-H "Content-Type: application/json" \
-X POST http://localhost:8083/connectors

After this call the connector starts running; it reads data from the file and sends it to the Kafka topic, which is file.content in this example. If we start a consumer on this topic:

[Image: Kafka Connect Basics]
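
For reference, a console consumer like the one in the screenshot can be started with the Kafka CLI tools, assuming the host listener on port 29092:

kafka-console-consumer.sh --bootstrap-server localhost:29092 --topic file.content --from-beginning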

We can see that every line of file.txt is sent to the Kafka topic as a message. Note that key.converter.schemas.enable and value.converter.schemas.enable were set to true in the worker configuration at the beginning, so the messages are wrapped in a JSON schema envelope.
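
With those settings, each record value produced by the connector is wrapped by the JsonConverter roughly like this (the payload is just an illustrative line from file.txt):

{"schema":{"type":"string","optional":false},"payload":"a line from file.txt"}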

In order to scale up the worker cluster, you only need to start additional Kafka Connect workers with the same group id; the connector configuration is shared through the internal topics and the tasks are rebalanced across the workers. The high-level overview of the architecture looks as follows:

[Image: Kafka Connect Overview]

Running Kafka Connect in Docker

In the above example the Kafka cluster was running in Docker, but we started Kafka Connect on the host machine with the Kafka binaries. If you wish to run Kafka Connect in a Docker container as well, you need a Linux image with Java 8 installed in which you download Kafka and run the connect-distributed.sh script. As a very simple example, you can use the following Dockerfile to run workers:

FROM openjdk:8-jre-slim

RUN apt-get update && \
    apt-get install wget -y

COPY start.sh start.sh
COPY connect-distributed.properties connect-distributed.properties

RUN echo "Downloading Apache Kafka" && \
    wget "http://ftp.man.poznan.pl/apache/kafka/2.6.0/kafka_2.12-2.6.0.tgz" &&\
    tar -xzvf kafka*.tgz && \
    rm kafka*.tgz && \
    mv kafka* kafka && \
    export PATH="$PATH:$(pwd)kafka/bin"


CMD /kafka/bin/connect-distributed.sh connect-distributed.properties

You can customise the Dockerfile according to your needs and improve it or you can use Confluent's Kafka Connect image by adding it to the docker-compose file as follows:

connect:
    image: confluentinc/cp-kafka-connect:latest
    hostname: connect
    container_name: connect
    depends_on:
      - zookeeper
      - kafka
    ports:
      - "8083:8083"
    volumes:
      - ./shared-folder:/shared-folder
    environment:
      CONNECT_BOOTSTRAP_SERVERS: kafka:9092
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_ZOOKEEPER_CONNECT: zookeeper:2181
      CONNECT_PLUGIN_PATH: /usr/share/java
]]>