Our Blog

Ongoing observations by End Point people

Testing to defend against nginx add_header surprises

By Jon Jensen
May 29, 2020

Cute calico cat perched securely upon a trepidatious shoe

These days when hosting websites it is common to configure the web server to send several HTTP response headers with every single request for security purposes.

For example, using the nginx web server we may add these directives to our http configuration scope to apply to everything served, or to specific server configuration scopes to apply only to particular websites we serve:

add_header Strict-Transport-Security max-age=2592000 always;
add_header X-Content-Type-Options    nosniff         always;

(See HTTP Strict Transport Security and X-Content-Type-Options at MDN for details about these two particular headers.)

The surprise (problem)

Once upon a time I ran into a case where nginx usually added the expected HTTP response headers, but later appeared to be inconsistent and sometimes did not. This is distressing!

Troubleshooting led to the (re-)discovery that add_header directives are not always additive throughout the configuration, as one would expect and as practically every other web server I can think of behaves.

If you define your add_header directives in the http block and then use an add_header directive in a server block, those from the http block will disappear.

If you define some add_header directives in the server block and then add another add_header directive in a location block, those from the http and/or server blocks will disappear.

This is even the case in an if block.
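
For example, with a hypothetical configuration like the following (server name and header values are only illustrative), responses served from the /downloads/ location carry only the X-Frame-Options header; the two headers defined at the http level silently disappear for that location:

http {
    add_header Strict-Transport-Security max-age=2592000 always;
    add_header X-Content-Type-Options    nosniff         always;

    server {
        server_name www.example.com;

        location / {
            # No add_header here, so both http-level headers are inherited and sent.
        }

        location /downloads/ {
            # Any add_header at this level discards the inherited ones, so responses
            # from this location get only X-Frame-Options.
            add_header X-Frame-Options DENY always;
        }
    }
}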

In the nginx add_header documentation we find the reason for the behavior explained:

There could be several add_header directives. These directives are inherited from the previous level if and only if there are no add_header directives defined on the current level.

This nginx directive has always behaved this way. Various people have warned about it in blog posts and online discussions for many years. But the situation remains the same, a trap for the unwary.

I have tried to imagine the rationale behind this behavior. Response headers often are set in groups, so the programmer...


sysadmin nginx security javascript nodejs testing

Implementing SummAE neural text summarization with a denoising auto-encoder

By Kamil Ciemniewski
May 28, 2020

Book open on lawn with dandelions

If there’s any problem space in machine learning with no shortage of (unlabelled) data to train on, it’s easily natural language processing (NLP).

In this article, I’d like to take on the challenge of taking a paper that came from Google Research in late 2019 and implementing it. It’s going to be a fun trip into the world of neural text summarization. We’re going to go through the basics, the coding, and then we’ll look at what the results actually are in the end.

The paper we’re going to implement here is: Peter J. Liu, Yu-An Chung, Jie Ren (2019) SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders.

Here’s the paper’s abstract:

We propose an end-to-end neural model for zero-shot abstractive text summarization of paragraphs, and introduce a benchmark task, ROCSumm, based on ROCStories, a subset for which we collected human summaries. In this task, five-sentence stories (paragraphs) are summarized with one sentence, using human summaries only for evaluation. We show results for extractive and human baselines to demonstrate a large abstractive gap in performance. Our model, SummAE, consists of a denoising auto-encoder that embeds sentences and paragraphs in a common space, from which either can be decoded. Summaries for paragraphs are generated by decoding a sentence from the paragraph representations. We find that traditional sequence-to-sequence auto-encoders fail to produce good summaries and describe how specific architectural choices and pre-training techniques can significantly improve performance, outperforming extractive baselines. The data, training, evaluation code, and best model weights are open-sourced.

Preliminaries

Before we go any further, let’s talk a little bit about neural summarization in general. There are two main approaches to it:

The first approach makes the model “focus” on the most important parts of the longer text, extracting them to form a summary.

Let’s take a recent...


python machine-learning artificial-intelligence natural-language-processing

Designing flexible CI pipelines with Jenkins and Docker

By Will Plaut
May 25, 2020

Pipes

Photo by Tian Kuan on Unsplash

When deciding on how to implement continuous integration (CI) for a new project, you are presented with lots of choices. Whatever you end up choosing, your CI needs to work for you and your team. Keeping the CI process and its mechanisms clear and concise helps everyone working on the project. The setup we are currently employing, and what I am going to showcase here, has proven to be flexible and powerful. Specifically, I’m going to highlight some of the things Jenkins and Docker do that are really helpful.

Jenkins

Jenkins provides us with all the CI functionality we need and it can be easily configured to connect to projects on GitHub and our internal GitLab. Jenkins has support for something it calls a multibranch pipeline. A Jenkins project follows a repo and builds any branch that has a Jenkinsfile. A Jenkinsfile configures an individual pipeline that Jenkins runs against a repo on a branch, tag or merge request (MR).

To keep it even simpler, we condense the steps that a Jenkinsfile runs into shell scripts living in /scripts/ at the root of the source repo, such as /scripts/test.sh, to do things like test, build, or deploy. If a team member wants to know how the tests are run, the commands are right there in that file to reference.

The Jenkinsfile can be written in a declarative syntax or in plain Groovy. We have landed on the scripted Groovy syntax for its more fine-grained control of Docker containers. Jenkins also provides several ways to inspect and debug the pipelines with things like “Replay” in its GUI and using input('wait here') in a pipeline to debug a troublesome step. The input() function is especially useful when paired with Docker. The function allows us to pause the job and go to the Jenkins server where we use docker ps to find the running container’s name. Then we use docker exec -it {container name} bash to debug inside of the container with all of the Jenkins environment variables loaded. This has proven to be a great...
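
To make the shape of such a pipeline concrete, here is a minimal scripted Jenkinsfile sketch along these lines, assuming the Docker Pipeline plugin is installed; the Docker image, stage names, and script paths are illustrative assumptions rather than our actual pipeline:

// Minimal scripted-pipeline sketch; image name, stages, and script paths are illustrative.
node {
    stage('Checkout') {
        checkout scm
    }

    // Run the repo's own scripts inside a container for a reproducible build environment.
    docker.image('node:12').inside {
        stage('Test') {
            sh './scripts/test.sh'
        }

        stage('Build') {
            sh './scripts/build.sh'
        }

        // Uncomment to pause the job here, then docker exec into the running container to debug.
        // input('wait here')
    }
}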


jenkins docker groovy

Creating a Messaging App Using Spring for Apache Kafka, Part 3

By Kürşat Kutlu Aydemir
May 21, 2020

Spring-Kafka

Photo by Pascal Debrunner on Unsplash

This article is part of a series.

In this article we’ll create the persistence and cache models and repositories. We’re also going to create our PostgreSQL database and the basic schema that we’re going to map to the persistence model.

Persistence

Database

We are going to keep the persistence model as simple as possible so we can focus on the overall functionality. Let’s first create our PostgreSQL database and schema. Here is the list of tables that we’re going to create:

  • users: will hold the users who are registered to use this messaging service.
  • access_token: will hold the unique authentication tokens per session. We’re not going to implement an authentication and authorization server specifically in this series but rather will generate a simple token and store it in this table.
  • contacts: will hold relationships of existing users.
  • messages: will hold messages sent to users.

Let’s create our tables:

CREATE TABLE kafkamessaging.users (
    user_id BIGSERIAL PRIMARY KEY,
    fname VARCHAR(32) NOT NULL,
    lname VARCHAR(32) NOT NULL,
    mobile VARCHAR(32) NOT NULL,
    created_at DATE NOT NULL
);

CREATE TABLE kafkamessaging.access_token (
    token_id BIGSERIAL PRIMARY KEY, 
    token VARCHAR(256) NOT NULL,
    user_id BIGINT NOT NULL REFERENCES kafkamessaging.users(user_id),
    created_at DATE NOT NULL
);

CREATE TABLE kafkamessaging.contacts (
    contact_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES kafkamessaging.users(user_id),
    contact_user_id BIGINT NOT NULL REFERENCES kafkamessaging.users(user_id)
);

CREATE TABLE kafkamessaging.messages (
    message_id BIGSERIAL PRIMARY KEY,
    from_user_id BIGINT NOT NULL REFERENCES kafkamessaging.users(user_id),
    to_user_id BIGINT NOT NULL REFERENCES kafkamessaging.users(user_id),
    message VARCHAR(512) NOT NULL,
    sent_at DATE NOT NULL
);
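
Jumping ahead a little to the model these tables will map to: as a rough, hedged sketch (class and package names here are assumptions, and the actual entities later in this series may differ), the users table could map to a JPA entity like this:

// Hedged sketch of how the users table might map to a JPA entity with Lombok;
// the real classes in this series may differ in naming and detail.
package com.example.kafkamessaging.model;   // package name is illustrative

import java.util.Date;
import javax.persistence.*;
import lombok.Data;

@Data   // Lombok generates getters, setters, equals/hashCode, and toString
@Entity
@Table(name = "users", schema = "kafkamessaging")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)  // matches BIGSERIAL
    @Column(name = "user_id")
    private Long userId;

    @Column(name = "fname", nullable = false, length = 32)
    private String fname;

    @Column(name = "lname", nullable = false, length = 32)
    private String lname;

    @Column(name = "mobile", nullable = false, length = 32)
    private String mobile;

    @Column(name = "created_at", nullable = false)
    @Temporal(TemporalType.DATE)
    private Date createdAt;
}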

Model

Before creating the models we’ll add another dependency called Lombok in pom.xml as shown...


java spring frameworks kafka spring-kafka-series

Shopify Admin API: Importing Products in Bulk

By Patrick Lewis
May 4, 2020

Cash Register Photo by Chris Young, used under CC BY-SA 2.0, cropped from original.

I recently worked on an interesting project for a store owner who was facing a daunting task: he had an inventory of hundreds of thousands of Magic: The Gathering (MTG) cards that he wanted to sell online through his Shopify store. The logistics of tracking down artwork and current market pricing for each card made it impossible to do manually.

My solution was to create a custom Rails application that retrieves inventory data from a combination of APIs and then automatically creates products for each card in Shopify. The resulting project turned what would have been a months- or years-long task into a bulk upload that only took a few hours to complete and allowed the store owner to immediately start selling his inventory online. The online store launch turned out to be even more important than initially expected due to current closures of physical stores.

Application Requirements

The main requirements for the Rails application were:

  • Retrieving product data for MTG cards by merging results from a combination of sources/APIs
  • Mapping card attributes and metadata into the format expected by the Shopify Admin API for creating Product records
  • Performing a bulk push of products to Shopify

There were some additional considerations like staying within rate limits for both the inventory data and Shopify APIs, but I will address those further in a follow-up post.

Retrieving Card Artwork and Pricing

I ended up using a combination of two APIs to retrieve MTG card data: MTGJSON for card details like the name of the card and the set it belonged to, and Scryfall for retrieving card images and current market pricing. It was relatively easy to combine the two because MTGJSON provided Scryfall IDs for all of its records, allowing me to merge results from the two APIs together.
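
Conceptually, the merge keyed on the Scryfall ID looks something like this simplified Ruby sketch; the field names are made up for illustration and are not the application’s actual code:

# Simplified sketch: combine MTGJSON card details with Scryfall images and pricing
# by joining on the Scryfall ID. Field names here are illustrative.
def merge_card_data(mtgjson_cards, scryfall_cards)
  scryfall_by_id = scryfall_cards.each_with_object({}) do |card, index|
    index[card[:id]] = card
  end

  mtgjson_cards.map do |card|
    scryfall = scryfall_by_id[card[:scryfall_id]] || {}
    {
      name:      card[:name],
      set:       card[:set_name],
      image_url: scryfall[:image_url],
      price:     scryfall[:price]
    }
  end
end

merged = merge_card_data(mtgjson_cards, scryfall_cards)
# merged is now ready to be mapped into Shopify product attributes.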

Working With the Shopify Admin API in Ruby

The Shopify Admin API deals in terms of generic Product records with predefined attributes...


shopify ecommerce ruby rails

Creating a Messaging App Using Spring for Apache Kafka, Part 2

By Kürşat Kutlu Aydemir
April 29, 2020

Spring pasture

This article is part of a series.

In this part I’ll walk through Kafka’s servers and processes, the basics of spring-kafka producers and consumers, persistence, and caching configurations.

Kafka Servers

Kafka uses Apache ZooKeeper as the distributed coordination server. You can download the Apache Kafka bundle, which includes ZooKeeper, from the Apache Kafka downloads page.

When you download and untar the Kafka bundle, you’ll find Kafka’s console scripts in the bin directory. To enable Kafka connectivity and prepare the configuration, let’s start the Kafka servers and see how to create topics and test console producers and consumers.

ZooKeeper

To start ZooKeeper with the default properties run the following command:

bin/zookeeper-server-start.sh config/zookeeper.properties

Kafka Server

A single Kafka server with the default properties can be started with following command:

bin/kafka-server-start.sh config/server.properties

Kafka Topics

Creating Kafka Topics

Let’s create a test Kafka topic:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic myTestTopic

List Topics

To list all previously created Kafka topics:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Start a Producer

To start a console producer run the following command and send some messages from console:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic myTestTopic
> This is a message
> This is another message

Start a Consumer

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic myTestTopic --from-beginning

When you run the console consumer with the --from-beginning parameter, you’ll see all of the previously sent messages displayed in the console.

Here we ran Kafka as a single server. You’ll need to optimize and scale Kafka clusters for production and large-scale distributed systems. So far, we’ve become familiar with some Kafka components, but for further Kafka configuration you can refer to the corresponding...


java spring frameworks kafka spring-kafka-series

Convenient Reporting with Jasper

By Árpád Lajos
April 28, 2020

Basalt pillars

Business Intelligence (BI) reporting is a huge problem space in custom software. There’s a wide range of business needs for looking at past and predictive behavior. Building a reporting tool can be a very cost effective way to get this data, especially compared to writing individual queries or manually generating reports.

I’ve been working with Jasper in the Java project space and wanted to write about some research I’ve collected on the topic.

JasperReports takes .jrxml report definitions as input and compiles them into .jasper files, which are then filled with data and exported (see the Java sketch after the list below). Possible output targets include:

  • Screen
  • Printer
  • PDF
  • HTML
  • Excel files
  • RTF
  • ODT
  • CSV
  • XML
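
For a sense of the workflow, a typical compile/fill/export round trip in Java looks roughly like this; the report name, parameters, and data source are placeholders, not code from a real project:

// Hedged sketch of the usual JasperReports workflow: compile the .jrxml design,
// fill it with data, then export. Names and the data source are placeholders.
import java.util.HashMap;
import java.util.Map;
import net.sf.jasperreports.engine.JREmptyDataSource;
import net.sf.jasperreports.engine.JasperCompileManager;
import net.sf.jasperreports.engine.JasperExportManager;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.JasperReport;

public class ReportDemo {
    public static void main(String[] args) throws Exception {
        // .jrxml design -> compiled .jasper template
        JasperReport report = JasperCompileManager.compileReport("invoice.jrxml");

        // Fill the template with parameters and a data source (an empty one here for brevity)
        Map<String, Object> params = new HashMap<>();
        params.put("ReportTitle", "Invoice");
        JasperPrint print = JasperFillManager.fillReport(report, params, new JREmptyDataSource());

        // Export to one of the supported targets, e.g. PDF
        JasperExportManager.exportReportToPdfFile(print, "invoice.pdf");
    }
}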

Jasper history

  • June 2001: Teodor Danciu began working on JasperReports.
  • September 2001: Jasper was registered on SourceForge.
  • November 2001: JasperReports 0.1.5 was released.
  • 2004: Panscopic teamed up with Teodor Danciu, acquired ownership of the product and changed its name to Jaspersoft.
  • 2005: JasperReports 1.0 was released.
  • 2007: Brian Gentile became CEO of the company.
  • 2014: TIBCO acquired Jaspersoft for ~$185 million.

Best reporting tools

Let’s compare some popular reporting tools:

  • JasperReports is a free and open source Java-based reporting tool, which supports lots of possible outputs, as mentioned earlier. Generating reports can be difficult if you’re less technical. More technical aspects can be more difficult as well; embedding JasperReports into a project is not necessarily simple, but once it’s done, the tool will be reliable.
  • Crystal Reports supports many inputs, including Access, Excel, XML, ODBC, and JDBC. It also has good multi-language support. It’s easy to embed into a .NET project, but software updates are unstable. The process can be very slow and there is no control of data-level security. A trial version is offered, but if one wants to use it long-term, then the price is a one-time payment of $495 or more, for larger companies.
  • Domo is another popular reporting tool. It provides a trial version, and a 5 user plan costs $5700/year*...

java reporting jasper

Migrating large PostgreSQL databases

By Árpád Lajos
April 21, 2020

Migration

Photo by Harshil Gudka on Unsplash

The challenge

One of our clients has a large and important health-related application. It’s built on an end-of-life Ruby on Rails-based open source framework, heavily customized over the years. They wanted to upgrade to a newer, supported, Java-based open source application a partner organization had developed as a replacement. Both organizations used the old system previously. To do that we would need to migrate all their existing PostgreSQL data from the old system to the new one, retaining important customizations while adapting to the new database schema.

Although there were many similarities between the old system and the new, the differences were significant enough to require careful study of the database schemas and the migration scripts designed to move the data:

  • There were schema-level differences between our old database and the partner organization’s old database.
  • Even where the two old databases were similar there were differences on the data level, such as different standards, different values in table records, different representation, etc.
  • We had different content, so if a script was working well for their data, it was not necessarily correct for us.
  • There were dynamically generated tables in both old databases, and we had to figure out how to convert our current schema elements and their records to the planned schema elements and records.

We had to understand the differences between our old database and theirs. Due to the number of tables and the average number of columns, manual comparison between the databases was not really an option. We knew that the algorithm for handling the scripts would look like this:

For each S in Scripts
    Analyze S and understand the intent behind it
    Compute a read-only version of S to avoid write operations
    Execute the read-only version of S
    Analyze the results and find out whether they are different from the expected results
    Convert our read-only...

postgres big-data database
