Our Blog

Ongoing observations by End Point people

Conference Recap: PyCon Asia Pacific (APAC) 2017 in Kuala Lumpur, Malaysia

By Muhammad Najmi bin Ahmad Zabidi
December 2, 2017

I got a chance to attend the annual PyCon APAC 2017 (Python Conference, Asia Pacific), which was hosted in my homeland, Malaysia. In previous years, Python conferences in Malaysia were held at the national level, and this year Malaysia’s PyCon committee worked hard to organize a broader, Asia-level regional conference.

Highlights from Day 1

The first day of the conference began with a keynote delivered by Luis Miguel Sanchez, the founder of SGX Analytics, a New York City-based data science/data strategy advisory firm. Luis shared his thoughts on advances in artificial intelligence and machine learning across many fields, and demonstrated automated music generation by presenting an application of his that composed a song using his AI algorithm. He also told us a bit about the legal aspects of music produced by his algorithm.

Luis speaking to the audience. Photo from PyCon’s Flickr.

Then I attended Amir Othman’s talk, which discussed data mining techniques for news in the Malay and German languages (he received his education at a German tertiary institution). His discussion covered verifying the source of the news and the language structure of German and Malay, which have similarities with English. First, Amir mentioned language detection using pycld2. He then shared the backend setup for his news crawler, which includes RSS and Twitter feeds for input, Redis as a message queue, and spaCy and Polyglot for “entity recognition”.

Quite a number of speakers spoke about gensim, including Amir, who used it for “topic modelling”. Amir also used TF-IDF (term frequency–inverse document frequency), a numerical statistic intended to reflect how significant a word is to a document in a corpus. For similarity lookup, he used word2vec on the entire corpus, and for full-text search he used Elasticsearch.
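
To make the TF-IDF idea concrete, here is a minimal sketch in plain Python. It uses the simplest unsmoothed formula (term frequency times the log of inverse document frequency); libraries such as gensim apply their own weighting and normalization, and the three tiny "documents" are made up for illustration rather than taken from the talk.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: score} dict per document in a tokenized corpus."""
    n_docs = len(corpus)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        scores.append({
            # tf = share of the document taken up by the term
            # idf = log(total documents / documents containing the term)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical mini-corpus, already split into lowercase tokens
docs = [
    "the election results were announced today".split(),
    "the football results were announced today".split(),
    "the weather today is sunny".split(),
]
scores = tf_idf(docs)

# Print the terms of the first document from most to least distinctive
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10} {score:.3f}")
```

A word that appears in every document, such as “the”, scores zero, while a word unique to one document, such as “election”, scores highest there.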

Later I attended Mr. Ng Swee Meng’s talk in which he shared his effort in the...


conference python

Logstash: Removing fields with empty values

By Jon Jensen
November 22, 2017

The Elastic stack is a nice toolkit for collecting, transporting, transforming, aggregating, searching, and reporting on log data from many sources. It was formerly known as the ELK stack, after its main components Elasticsearch, Logstash, and Kibana, but with the addition of Beats and other tools, the company now calls it simply the Elastic stack.

We are using it in a common configuration, on a central log server that receives logs via rsyslog over TLS, which are then stored in local files and processed further by Logstash.

When conservation is recommended

When forwarding logs on to SaaS log services such as Logentries, SumoLogic, etc., we have a limited amount of data transfer and storage allotted to us. So we need to either economize on what we send them, pay for a more expensive plan, or retain a shorter period of history.

For some very busy logs (nginx logs in JSON format) we decided to delete fields with empty values from the log event during the filter phase in Logstash. This removes a lot of data from the log message we send to the log service over the wire, and reduces the size of each log event stored in their system.

I expected this to be simple, but that expectation sometimes proves to be false. :)

Trying the prune filter

The most obvious way would be to use the Logstash prune filter, which is designed for just such a use case. However, the prune filter doesn’t handle nested keys, as explained in the documentation:

NOTE: This filter currently only support operations on top-level fields, i.e. whitelisting and blacklisting of subfields based on name or value does not work.

That is too bad.
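
To show what pruning nested keys involves, here is a small sketch in Python. It is not Logstash code and not the solution from this post; it only illustrates the recursive walk that a top-level-only filter cannot do. The field names are hypothetical, and treating None as empty alongside the empty string is an assumption.

```python
# Values treated as "empty" (assumption: empty string and None)
EMPTY = ("", None)

def prune_empty(event):
    """Return a copy of a nested dict without keys whose values are empty."""
    pruned = {}
    for key, value in event.items():
        if isinstance(value, dict):
            value = prune_empty(value)
            if not value:
                # Drop sub-dicts that end up with no fields at all
                continue
        elif value in EMPTY:
            continue
        pruned[key] = value
    return pruned

# Hypothetical nginx-style log event with nested fields
event = {
    "status": 200,
    "referrer": "",
    "request": {"method": "GET", "query": "", "user": None},
    "upstream": {"addr": ""},
}
cleaned = prune_empty(event)
print(cleaned)
```

Values such as 0 or False are kept, since they carry information even though they are falsy.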

Pruning with custom Ruby code

Several people have posted alternative solutions to this in the past. A representative recipe to have Logstash delete empty fields looked like this:

# This doesn’t work in Logstash 5 and newer ...
filter {
  ruby {
    code => "event.to_hash.delete_if {|field, value| value == '' }"
  }
}

And sadly, it doesn’t work.

Logstash...


hosting logging

From Zero to HTTPS in an afternoon

By Matt Vollrath
November 20, 2017

I’ve been hosting my own humble personal web site since 2012. I had never bothered setting up HTTPS for my domain, but after hearing about the Let’s Encrypt project, I was completely out of excuses.

For the unfamiliar, Let’s Encrypt offers free and fully automatic HTTPS certificates. The web cares about HTTPS now more than ever. Deeply interactive interfaces like geolocation and user media (camera, microphone) are too sensitive to trust to an insecure transport. By leveraging the security features present in modern browsers, users can expect reasonable safety from attacks that would exploit the weaknesses of HTTP.

To take the security mission even further, I decided to completely containerize my server and expose only a couple of ports. Using a Docker composition made it very easy to deploy up-to-date nginx and keep it isolated from the rest of my host shard.

The first mission was to set up certificates with certbot, the EFF’s free certificate tool. certbot has a plugin that writes nginx configuration for you, but in this case I didn’t want nginx installed on my host at all. Instead of following the nginx-specific instructions for my platform, I opted for the webroot plugin to just give me a certificate and let me figure out how to set it up. One certbot invocation and a few seconds later, I had certificates for my site in /etc/letsencrypt/live/www.mvollrath.net.

Next I went shopping for nginx Docker images. The official nginx image has everything I want: the latest and greatest mainline nginx based on stable Debian. I considered the Alpine variant, but felt like Debian was a better choice for me; familiarity outweighs a few tens of MB of image size.

The nginx image ships with a default configuration serving a single root directory over HTTP. Since HTTPS was the point of this experiment, I set out to correct this. I started by creating a project directory on the host to house all the configuration needed to build out my server. Then I started up a container with...


hosting security tls nginx

President of UN General Assembly Thanks End Point

By Jonathan Blessing
November 17, 2017

The President of the UN General Assembly, Peter Thomson, thanked End Point for supporting the Ocean Conference, which was held at the United Nations Headquarters this past summer to bring attention and action to saving the world’s oceans.

End Point’s Liquid Galaxy helped bring to life “Reconnecting Humanity to the Sea,” an exhibition meant to showcase the beauty of the ocean and the challenges it faces today. End Point created the presentation’s content and showcased it at the conference.

“We were very pleased to see End Point’s Liquid Galaxy used to promote a hopeful future for the world’s oceans. It’s very satisfying to see our technology used to make an important story that much more compelling.”

Rick Peltzman
End Point, CEO

This UN press release explains more about the conference and its results:
“UN Ocean Conference wraps up with actions to restore ocean health, protect marine life”

See the letter:


company liquid-galaxy

Using GitHub for Blog Comments

By Phin Jensen
November 14, 2017

Last Saturday, November 11, we rolled out a new website that we’ve been working on for a few months. Part of this update was moving from Blogger as our blogging platform to static HTML generated by Middleman. We were more than happy to move away from Blogger for a variety of reasons, including its lack of HTTPS support for custom domains and how difficult it was to keep its templating and styling up to date with our main website. We were also able to move from http://blog.endpoint.com to https://www.endpoint.com/blog.

The most obvious thing missing from Middleman’s blog extension is a commenting system. After exploring some options for comments, we settled on using GitHub issues and comments, inspired by Don Williamson’s post about doing the same thing. It’s a bit of an unconventional approach, so this post will explain how to use our commenting system and how we implemented it.

Commenting requires a GitHub account, which is free and easy to sign up for, and much of our blog’s target audience will already have one. At the bottom of each post, at the top of the list of comments (if there are any), there is a link to a GitHub issue. Click on the issue, write and post your comment, and it’ll appear on the blog post when you reload the page. It’s pretty simple.

Now, for the gritty details. Don’s post shows the JavaScript he uses to find the correct GitHub issue and fetch its comments, but it required a bit of modification to work in our Middleman project. First, we needed a way to get the right GitHub issue number. Don’s example has that worked into the JavaScript with Hugo, fetching it from some parameter value, but I opted to create a hidden input on the page that would provide the right number:

<input type="hidden" name="gh_issue" value="<%= current_page.data.gh_issue_number %>">

Or in HAML:

%input{type: "hidden", name: "gh_issue", value: current_page.data.gh_issue_number}

And we can fetch that value with jQuery:

var...
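
The lookup described above boils down to one GitHub REST API call per post: list the comments on the issue whose number is stored in the page. As a rough sketch of that idea in Python (not the JavaScript the site uses), the snippet below builds the API URL and pulls the author and text out of a response. The owner and repository names and the sample payload are hypothetical; it does no network access.

```python
import json

API_ROOT = "https://api.github.com"

def comments_url(owner, repo, issue_number):
    """Build the REST endpoint that lists the comments on one issue."""
    return f"{API_ROOT}/repos/{owner}/{repo}/issues/{issue_number}/comments"

def summarize_comments(payload):
    """Turn the JSON array returned by the endpoint into (author, body) pairs."""
    return [(c["user"]["login"], c["body"]) for c in json.loads(payload)]

# Hypothetical repository and issue number, standing in for the hidden input's value
url = comments_url("example-org", "example-blog", 42)

# Hand-written sample in the shape the API returns, so the sketch runs offline
sample = json.dumps([
    {"user": {"login": "octocat"}, "body": "Nice post!"},
    {"user": {"login": "hubot"}, "body": "Thanks for sharing."},
])
comments = summarize_comments(sample)
print(url)
print(comments)
```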

javascript community

A Collaborative Timezone Utility

By Joe Marrero
October 30, 2017

Try It Out Yourself

The code for this project is hosted on GitHub and can be cloned from here.

At End Point Corporation, our team is spread out across 10 time zones. This gives us the advantage of being able to work around the clock on projects. When one co-worker leaves for the day, another can take over. Consider this scenario. It's Monday evening and Martin needs to continue installing software on that Linux cluster, but it's already 6pm and his wife is going to murder him if he's not ready to go out for their anniversary dinner. Let's see who can take over... Ah, yes, Sanjay in Bangalore can continue with the maintenance. On Tuesday morning, the client wakes up surprised to find that 16 hours of work was completed in a day. With respect to software development, the same efficiencies can be realized by parallelizing tasks across time zones. Code reviews and further development can be continued after normal business hours.

With all the blessings of a distributed engineering team, collaborating with co-workers can occasionally be challenging. Some of these challenges stem from the complexities of our system of time. Every co-worker may be operating in a time zone different from yours. Each time zone has an associated offset relative to Coordinated Universal Time (UTC). These offsets are usually whole hours, but they do not have to be.

For example, Eastern Standard Time (EST) has an offset of -5 (five hours behind UTC) and India Standard Time (IST) has an offset of 5.5 (five and a half hours ahead of UTC). Furthermore, these UTC offsets can be completely arbitrary. In 1995, Kiribati, an island nation in the Pacific, changed the UTC offset of its easternmost islands from -10 to +14 so that the whole country shares the same calendar date. To further complicate things, some regions may not observe daylight saving time (DST) while other regions do. In fact, in the United States, Indiana only started observing DST statewide on April 2, 2006. Some states like Arizona and...
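
The offsets mentioned above can be tried out with a few lines of Python. This is a minimal sketch using fixed offsets from the standard library, so it deliberately ignores DST; the hand-off time and the zone labels are made up for illustration and are not taken from the project's code.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones (assumption: no DST handling at all)
EST = timezone(timedelta(hours=-5), "EST")
IST = timezone(timedelta(hours=5, minutes=30), "IST")
LINT = timezone(timedelta(hours=14), "LINT")  # Line Islands Time, Kiribati

def local_times(moment, zones):
    """Express one instant in each zone, keyed by the zone's name."""
    return {tz.tzname(None): moment.astimezone(tz) for tz in zones}

# Hypothetical hand-off: Monday at 6pm Eastern Standard Time
handoff = datetime(2018, 1, 15, 18, 0, tzinfo=EST)

times = local_times(handoff, [EST, IST, LINT])
for name, local in times.items():
    print(name, local.strftime("%Y-%m-%d %H:%M"), "UTC" + local.strftime("%z"))
```

All three values describe the same instant; only the wall-clock reading differs, which is why a colleague in Bangalore is already into Tuesday morning when the Monday evening hand-off happens.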


linux open-source tools

Hot-deploy Java classes and assets in Wildfly 8/9/10

By Piotr Hankiewicz
October 27, 2017

Introduction

Java development can be really frustrating when you need to re-build your project and restart a server every time you change something. I know about JRebel, but while it’s a good tool, it’s also pretty expensive. You can use the open-source version, but then you need to send project statistics to the JRebel server, which is not a viable option for your more serious projects.

Fortunately, there is an open-source project called HotSwapAgent and it does the same thing as JRebel, for free (thank you, guys!).

I will explain how to combine it with Wildfly in order to hot-deploy Java classes as well as how to hot-deploy other resources (JavaScript, CSS, images).

Wildfly configuration

Let’s assume that we use the standalone-full.xml configuration file.

We need to use exploded deployment instead of deploying WAR or EAR. You can do this in production as well to allow for application changes with zero downtime.

Start by configuring the metaspace size; we had to increase defaults for our application, but it’s possible that it will be just fine in your case. It’s encouraged that you play with these values after completing all steps.

In:

WILDFLY_DIR/bin/standalone.conf

set:

-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512m

so it looks like this:

JAVA_OPTS="-Xms512m -Xmx1024m -XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512m"

Now, look for the deployment-scanner node in:

WILDFLY_DIR/standalone/configuration/standalone-full.xml

Replace it so it looks like this:

<deployment-scanner path="PATH_TO_DEPLOYMENT_DIR" relative-to="RELATIVE_TO_PATH" scan-enabled="true" scan-interval="2000" auto-deploy-exploded="false" runtime-failure-causes-rollback="${jboss.deployment.scanner.rollback.on.failure:false}"/>

Note:

PATH_TO_DEPLOYMENT_DIR is WILDFLY_DIR/standalone/deployments

RELATIVE_TO_PATH is, as the name suggests, the directory that PATH_TO_DEPLOYMENT_DIR is relative to.

HotSwapAgent installation and configuration

We need to download and install the latest...


java

Using tail_n_mail after hours

By Greg Sabino Mullane
October 23, 2017


(Photo of Turtle Island by Edwin Poon)

Someone recently asked me something about tail_n_mail, a program that watches over your log files, scans for certain patterns, and sends out an email if matches are found. It is frequently used to watch over Postgres logs so you can receive an automatic email alert when Bad Things start happening to your database. The questioner wanted to know if it was possible for tail_n_mail to change its behavior based on the time of day — would it be able to do things differently outside of “business hours”? Although tail_n_mail cannot do so directly, a simple solution is to use alternate configuration files — which get swapped by cron — and the INHERIT keyword.

To demonstrate the solution, let’s spin up a Postgres 10 instance, route the logs to syslog, set up tail_n_mail, and then create separate configuration files for different times of the week. First, some setup:

$ initdb --version
initdb (PostgreSQL) 10.0
$ initdb --data-checksums data
$ cat >> data/postgresql.conf << EOT
log_line_prefix=''
log_destination='syslog'
EOT
$ echo 'local0.*  /var/log/postgres.log' | sudo tee -a /etc/rsyslog.conf > /dev/null
$ sudo systemctl restart rsyslog
$ pg_ctl start -D data -l logfile

Grab the latest version of tail_n_mail and verify it:

$ wget --no-verbose https://bucardo.org/downloads/tail_n_mail{,.asc}
2017-03-03 10:00:33 URL:https://bucardo.org/downloads/tail_n_mail [98767/98767] -> "tail_n_mail" [1]
2017-03-03 10:00:33 URL:https://bucardo.org/downloads/tail_n_mail.asc [163/163] -> "tail_n_mail.asc" [1]
FINISHED --2017-03-03 10:00:33--
Total wall clock time: 0.3s
Downloaded: 2 files, 96K in 0.1s (702 KB/s)
$ gpg --verify tail_n_mail.asc
gpg: assuming signed data in `tail_n_mail'
gpg: Signature made Sun 01 Oct 2017 11:14:07 AM EDT using DSA key ID 14964AC8
gpg: Good signature from "Greg Sabino Mullane <greg@turnstep.com>"
gpg:                 aka "Greg Sabino Mullane (End Point Corporation) <greg@endpoint.com>"
gpg: WARNING: This key is not...

postgres monitoring