Emails arrive on mobile device but not Outlook client

In a single AD Domain with an Exchange 2016 environment that was hosting multiple email domains, there was a power user that has several mailboxes with different email suffixes that would sporadically stop receiving inbound emails to his fully patched, Outlook 2016 client. (The 2013 client behaved exactly the same.)

The Exchange server system is a simple 2 server setup, the databases are replicated in a DAG array, with several different databases split out by company/department.

Exchange DB1

 

As you see in the figure, User1 has four different user accounts with four different mailboxes with different suffixes hosted on the same database, as he is from Company1, but needs to receive separated email to different mailboxes (reply with those unique email addresses), and authenticate separately.

After several hours of combing through the environment, and Microsoft support services unable to find anything amiss, one of the tests were creating a new Outlook profile, adding just one user account, and testing, well what do you know, it works! When a second mailbox is added to the profile, inbound mail stops to the client though. (Again, a mobile device receives the inbound mail immediately, but nothing occurs for the desktop Outlook client)

A hint on how to fix it came when I looked at User2. In this case, User2 also was opening up multiple mailboxes with the same clients, but there were no issues at all. As is evident, even though the mailboxes open from the same Exchange environment, the back end databases are separate.

After creating a new database for “@Othersuffix.com”, and migrating the User1 mailbox over to it, when that additional mailbox was opened in Outlook, mail flow continued!

The Exchange environment pictured has a lot more complexity, to end users it is completely separate, seemingly different Auth Domains, DNS, URLs, etc., but in reality is all the same back end infrastructure for ease of maintenance, (hint, KEMP is used to do a bunch of backward and forward URL rewriting) so adding some additional mailbox databases in the back end didn’t really complicate efforts too much.

Inside access to external NAT IP services

The corporate office has an environment where there is a separate “guest” network for vendors, visitors, etc. that can use their own devices, to use internet services through Wi-Fi.
Due to the fact some internal services are needed such as joining internal Audio/Video conferences, and access to collaborative services, we had a requirement that access to those services be available through this semi-public network that is “external”, i.e. uses external DNS resolution, but is still “inside” the firewall boundary as shown below.

Access from GuestNet-Diag

To spell it out, we had the following infrastructure:

  • Internal services only accessible from inside the corporate network and internal devices.
  • External services accessible from outside the corporate network.
  • Guest network with external/public DNS resolution.

The requirements are as follows:

  • Internal services accessible externally if secured with boundary extending services, primarily Microsoft Direct Access, or if explicitly approved, Cisco VPN software.
  • Skype for Business availability for Guest Wi-Fi to join conferences and collaboration.
  • SharePoint services (specifically Office Online Server) for collaboration access from Guest Wi-Fi.

Cisco has a good article here: https://www.cisco.com/c/en/us/support/docs/security/asa-5500-x-series-next-generation-firewalls/72273-dns-doctoring-3zones.html to make this process work, the D-NAT methodology was used, but the article can a bit confusing, so I just wanted to explain a couple things to clarify it, and show how the final NAT rules end up looking.

The following are the relevant NAT rules:

Access from GuestNet

  • For the Match Criteria:
    • Source is the Guest network.
    • Destination for the Skype for Business services is the “inside” DMZ interface, as in this case, the DMZ is sandwiched, has an internally facing network, and an externally facing network, or wherever the service resides, as some services are on the “services” network.
    • IP source is the Guest network object.
    • IP destination is the External IP of the service.
  • For the Action:
    • Source is wherever the service resides.
    • Destination is the Internal IP of the service.

I hope this helps clarify the Cisco article.

 

Broken ADFS! Service Unavailable – Error 503

Came in this morning to a lovely issue, ADFS authenticated services were completely unavailable! Office 365 archive mailboxes, hosted CRM, etc.

Upon testing the URL: /adfs/services/trust/mex a lovely “Error 503” was displayed!

Key prob 1

I changed the internal ADFS certs to use the new EKU requirements (Server and Client Authentication), verified NT SERVICE\drs and NT SERVICE\adfssrv had the correct permissions on the private keys, but still no dice for external usage.

After using my trusty bing.com, I came across this lovely Microsoft article about the KeySpec property for the Web Application Proxy server: https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/technical-reference/ad-fs-and-keyspec-property

Checking the server’s keys using the Powershell command dir cert:/LocalMachine/My reveals the following problem:

KeySpec = 0

Ext Cert wrong

Ok, so the fix is easy right? Just export the cert to a pfx file, import it with

certutil -csp "Microsoft RSA SChannel Cryptographic Provider" -importpfx

or as the article says:

certutil –importpfx extcert.pfx AT_KEYEXCHANGE

In this case, I got a lovely -importPFX command FAILED: 0x80090029 The requested operation is not supported. error message as shown:

Key prob 3

After looking around for a while, I remembered the article I wrote back in September 2017: LS Audio/Video Authentication Server Error 19008 – Private Key not found, went through that process, and what do you know, it worked!!

Ext Cert right

The URL: /adfs/services/trust/mex now works perfectly, and all services that depend on ADFS are up!

Exchange database contains one or more mailboxes…

What do you do when you have what appears to be an empty mailbox base, but you get the dreaded: “This mailbox database contains one or more mailboxes, mailbox plans….” message?

remove-db error

The following are some commands to run:

Get-MailboxStatistics -Database DatabaseToRemove | ForEach { Update-StoreMailboxState -Database $_.Database -Identity $_.MailboxGuid -Confirm:$false }

Get-MailboxStatistics -Database DatabaseToRemove | where {$_.DisconnectReason -eq "SoftDeleted"} | foreach {Remove-StoreMailbox -Database $_.database -Identity $_.mailboxguid -MailboxState SoftDeleted}
Get-Mailbox -Database DatabaseToRemove -Archive
Get-Mailbox -Database DatabaseToRemove -PublicFolder
Get-Mailbox -Database DatabaseToRemove -Arbitration
Get-Mailbox -Database DatabaseToRemove -AuditLog

If after all those back empty you still have the issue, try to remove the database with the -Verbose parameter, as that parameter will show you what mailboxes (if any) still reside on the database.

Remove-MailboxDatabase DatabaseToRemove -Verbose

If the removal process still fails, a possibility is that the database in question is an Archive Database for a mailbox residing on a different mailbox database.

The following command helps you list mailboxes using a specific database as Archive Database:

Get-Mailbox | where {$_.ArchiveDatabase -eq DatabaseToRemove}

You can migrate just the archive mailbox to another database like so:

New-MoveRequest username -ArchiveOnly

The database can now be removed after the move is completed!

Net Neutrality, my thoughts

My take on Net Neutrality:

The old adage of free market vs government boundaries! – Let’s take a look at some different mediums, the telephone, heavily regulated, how much innovation have we seen there in the past 20 years? What about broadcast spectrum? Sure, we have HDTV now, but innovation? I worked on Microsoft Mediaroom a while back (it’s called ATT Uverse here in the States), do you know 10 years ago they had the capability of choosing what car you wanted to be in while a car race was going on, the angle of the  football field or team you wanted to see from during a game? All from your remote control! Why haven’t we seen it yet? Government regulations! – Let’s look at another medium not regulated in the slightest, the Internet! – When I got involved in it the rage was BBS technologies, a bank of modems where files/bulletin boards, etc were exchanged, then E-Mail took over and went nutso. Groupwise, Lotus Notes and Microsoft slugged it out. During that HTML went nutso, we know that story, then we have PKI, VoIP, IoT, and the plethora of other technologies, what’s the latest? Blockchain, who knows what else? Do you see a contrast?

What’s the big difference here? Free market vs heavily regulated mediums!

I’m no expert, but that’s how I see it.

LS Audio/Video Authentication Server Error 19008 – Private Key not found

After happily running for several years, one of the Skype for Business edge servers for one implementation decided it was not going to start its Audio/Video Authentication and Audio/Video Edge service!

Looking at the event viewer, the following two Event IDs were raised: 19008 and 19005. Specifically: 19008: Private key for server certificate not found by the LS A/V Authentication service or the service does not have sufficient permissions to access the certificate.

After verifying the private key permissions are set correctly (NETWORK SERVICE: Read, etc) in the Certificate MMC snap-in, I checked to see what the certs looked like in PowerShell

PS Cert:\LocalMachine\My\> dir .\ 5E670E493E5EBAACC5B26E219ACA8A629F9485D4 | fl 

HasPrivateKey  : True
PrivateKey     :
PublicKey      : System.Security.Cryptography.X509Certificates.PublicKey

Notice that there is no PrivateKey provider defined here, which means the cert broke somehow! Strange, as in this environment there were two Skype for Business edge servers, one worked perfectly, the other did not.

Anyways, the fix was to tear the certs apart, and put them back together as shown in this Merge certificate public and private key with OpenSSL TechNet article.

Specifically

    1. I got the OpenSSL binaries from: https://indy.fulgan.com/SSL
    2. I extracted the keys using the following commands:
      openssl pkcs12 -in egdev1.pfx -nocerts -out private_key.pem -nodes
      openssl pkcs12 -in egdev1.pfx -nokeys -out public_key.crt
    3. I merged the keys back together using the following command:
      openssl pkcs12 -export -in public_key.crt -inkey private_key.pem -out lync_edge_merged.pfx

After certificate import, and applying it to the services, I checked to see what the certs looked like in PowerShell

PS Cert:\LocalMachine\My\> dir .\ 5E670E493E5EBAACC5B26E219ACA8A629F9485D4 | fl 

HasPrivateKey  : True
PrivateKey     : System.Security.Cryptography.RSACryptoServiceProvider
PublicKey      : System.Security.Cryptography.X509Certificates.PublicKey

Notice that there is now a PrivateKey provider defined here, and the two Audio/Video Authentication and Audio/Video Edge services started up just perfectly!

Skype for Business presentation size limits.

I recently had an implementation where very large PowerPoint presentations was needed. When those pptx files were pre-uploaded to the meeting, the following dreaded “allowable file size exceeded” message occurred:

Exceeds File size SfB

It got me interested in finding out what the allowable file sizes are for Skype, and after scouring documentation, I discovered the following:

As of September 2017:

  • With Office Web Apps 2013, the max file size is 150Mb
  • With Office Online Server, the max file size is 300Mb

The explicit limits, where applicable, are listed in the table below. However, note that there is a 60-second file download time out that applies to all GetFile operations, and this time out can affect the perceived file size limit. In practice, this time out is rarely hit, since connectivity and bandwidth is typically very good between Office Online and host datacenters. However, hosts should be aware of this limit.

File size limits
Application Mode Limit Notes
Excel Online View 5MB
Excel Online Edit 5MB
PowerPoint Online View See notes No limit, but subject to the 60 second time out for file downloads as described above.
PowerPoint Online Edit 300MB While the upper limit is 300MB, this is still subject to the overall 60 second time out for file downloads so it is possible that smaller files will hit that timeout.
Word Online View See notes No limit, but subject to the 60 second time out for file downloads as described above.
Word Online Edit See notes The technical limit is 100,000,000 (100 million) characters in the document XML; however, this does not correlate with file size in a meaningful way. For example, a 1000-page document, hundreds of MB in size does not hit this limit. For the vast majority of use-cases, this limit is irrelevant.

The process to configure these max sizes is fairly simple, and is configured in the “Settings_Service.ini” configuration file. The default location for that file is:

C:\Program Files\Microsoft Office Web Apps\PPTConversionService

At the bottom of the file, just add the following entries:

For Office Web Apps 2013, add:

PowerPointEditServerMaxFileSizeBytes=(System.UInt64)153600000
PowerPointServerMediaEmbeddedMaxSize=(System.UInt64)153600000

For Office Online Server, add:

PowerPointEditServerMaxFileSizeBytes=(System.UInt64)307200000
PowerPointServerMediaEmbeddedMaxSize=(System.UInt64)307200000

Once the changes are saved, restart the server service. You may do so in PowerShell with the following command:

Restart-service WACSM

Please note, these max sizes are for the entire meeting, not per attachment, therefore, if your meeting has much larger files, you will have to split them, upload one, go through it, remove it, upload the next. – You can upload several files at a time, but they cannot collectively be larger than the total MB size limits.

Setting the Network Profile in Windows 2012 or higher

I recently had a non-domain joined edge Windows 2016 machine with two separate NICs that I needed to set different Windows Firewall settings to. Why? For instance, I wanted to allow RDP for the Internal NIC, and not allowing it for the External one, etc. The problem was, the NICs were set with the wrong network profile, the external public facing one was set to Private, and the other was reversed as shown in the following screen capture:

Network Connect profile - 1-1

New with Windows Server 2012 and higher, to change the network profile, PowerShell v4 cmdlets need to be used! Those cmdlets are:

Get-NetConnectionProfile
Set-NetConnectionProfile

Here are the results with the “Get” command:

Network Connect profile - 1

We can see the results are reversed, as the “Internet” connection has the “Private” designation, thus the wrong Windows Firewall profile is assigned to it.

To fix that, we run the “Set” command as shown in the bottom of the capture above, and the correct firewall profile is assigned!

Note: I named the external facing NIC “Public”, and the internal facing one “Private”. You can name it whatever you’d like, and identify it with the -InterfaceAlias property.

Network Connect profile - 1-4

The default profile in Windows Server 2012+ is Public. It automatically changes when you join the server to the Domain. In my instance, I was not joining this server to a Domain, and thus had to set it manually, on top of that, in this instance, the automatic designation was configured incorrectly.

 

Dropped Calls at 10 min with Skype for Business SIP trunk. (& what is SIP ALG?)

The case of the dropped call at the 10 min mark:

We have a SIP trunk provider that due to their implementation of RFC requirements, send down a Re-Invite packet at the 10 minute mark. If you’re just running a single server, there’s no issue, as the FE that gets the packet sends back an ACK packet saying “got it”, and communication continues. When you have multiple servers, it can be an issue, as in this provider’s case, they only send that Re-Invite packet to the first registered IP, and if the outgoing call originates from one of the other servers, they never get that Re-Invite packet, can’t send back an ACK, and the call gets dropped due to timeout.

To mitigate the above mentioned case, an SBC was put in the middle so that the trunk would only have one IP to communicate to, all was fine and dandy, but due to the nature of things changing in an IT environment, a new random call drop issue came up. (Ugh!)

To see what was going on, client and server Skype for Business logs were looked at for the time the drop occurred, they showed a standard “BYE” happening, not an error, so onto looking at the next step, infrastructure.

In order to give some concept of the environment, here is a basic diagram of what it looks like:

Internal Diag

Note that there is NAT’ing going on between the networks, the SBC was configured with the IP 192.168.70.44 (in the syslog figures, it shows up as “Device”) the internal FE servers had NAT addresses of 192.168.70.20, 21, and 22. We’ll get back to that in a bit.

Syslog was enabled on the SBC, perused through those, and this is what I found:

In the process to narrow down what was going on, we found that in the majority of complaints, the calls were getting dropped at the 10 min mark, and since the Re-Invite gets sent at that time, the SIP provider was in the crosshairs, and that was our main focus. After much going back and forth with them, going back and forth with the SBC vender, tearing up the servers, time to start fresh, and look deeper.

In looking at the logs, there are some calls that the re-invites are handled correctly: Here are two long ones, one lasting 13 min, the other almost 30, both with Re-Invite packets, (the second one with three).

Work1-1

Eventually I found some calls that failed, here’s an example of one that wasn’t handled correctly:

Notice the call is only 2 seconds long, but still dropped!

Non0-1

Here’s a screenshot of the Re-Invite hangup problem:

Non3

Here we see the Re-Invite packet coming down at exactly 10 min after the call started (15:08:58), when the internal server answers that with its SDP packet, the address doesn’t get NAT’d, the SBC then tries to send it to the Internal IP, that communication doesn’t occur, and after a period of no communication (14 sec in this instance), the server assumes the call has ended, has the BYE packet sent to drop the call.

So, from looking at all the calls with the syslog viewer, I saw sporadically there were calls failing, and others succeeding, there didn’t seem to be any rhyme or reason to it, except for the failure was always right after an SDP packet coming from the inside server.

This now had me looking at ALG, which in this case, is controlled by the firewall. What is ALG? To put it simply, it is NAT for SIP SDP packets. What is SDP? While SIP deals with creating, modifying, and closing down sessions, SDP deals with the media within those sessions. and what we’re specifically interested in this scenario, is the IP addresses the two endpoints should talk over. Since one of the endpoints is behind a NAT’d device, ALG is the mechanism that changes the data in these packets to have IPs the two devices can talk to!

Thankfully, this firewall makes it easy to capture inbound packets and outbound packets separately in the same session, producing two separate pcap files which can be compared to side by side! Let’s look at packet #24 of a call that does work:

Here is the inbound packet, notice the O and C attributes have the internal IP, as it’s originating from inside the network:

cap1

Once it goes through the firewall, here is the outbound version of that exact same capture, you can see the O and C attributes have been changed to the NAT IP by ALG:

cap2

Here’s an example of a complete failure capture. Note, this was from an “updated” firmware version of the firewall that completely blew up NAT, so the capture was easy to catch, but the concept is the same which is occurring with the sporadic failures above:

Here is the inbound packet #4, notice the O and C attributes have the internal IP, as it’s originating from inside the network:

cap3

Once it goes through the firewall, here is the outbound version of that packet #4, you can see the O and C attributes have NOT been changed to the NAT IP by ALG:

cap4

Let’s take a look at that syslog trace again:

We can see the original invite comes from the Skype FE server (10.50.2.46, which is NAT’d to 192.168.70.22), communication goes to the SBC, and then gets sent up to the SIP provider’s CPE SIP router. Everything works great until one of the SDP packets does not get translated through ALG correctly, when that happens, the SBC does not know where to send the packet, it gets sent to the “Internal” IP, subsequently goes nowhere, and 14 seconds later, the stream is disconnected with a “BYE” packet due to inactivity.

Non1

Why did this problem arise out of the blue? I’m guessing there have been some good firewall firmware versions that have not caused any issues, so there has been some periods of tranquility, with other firmware versions that have “sometimes” caused this issue. The current firewall version installed is 7.1.5. I’ve tried 7.1.8, 8.0, and 8.0.2, they are progressively worse (with 8.0 and higher not working at all), so I went back down to 7.1.5, and am awaiting resolution from the manufacturer.

Another solution is to change the architecture of the network boundaries affected to be Routed instead of NAT’d, that way ALG does not come into the picture. That might not be a feasible solution due to your security or infrastructure constraints, but has solved the issue (temporarily) in this case. It is not a permanent fix, as any SIP conversations going over NAT will continue to be an issue until the vendor resolves it in their firmware.